<a href="https://colab.research.google.com/github/YogeshGadade/Analytics/blob/master/FIFA_World_Cup_2022_Twitter_Data_Sentiment_Analysis_and_Predictions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



### About the FIFA World Cup 2022
- Dates: Nov 20, 2022 – Dec 18, 2022
- The FIFA World Cup, a global football tournament held every four years, took place in Qatar in 2022.


## Problems Addressed:
### 1. Sentiment Analysis
#### Problem:
- The decision to hold the World Cup in Qatar has sparked several controversies, including allegations of corruption and human rights violations.

#### Solution: 
- So, what do football lovers think about the FIFA World Cup 2022? To find out, in this post I'll run a sentiment analysis on English-language tweets tagged #WorldCup2022.

### 2. Predictions
Questions to answer:
1. Champion team
2. Best player(s)
3. Top scorer(s)

Additional questions to answer:
1. Which team has the most wins?
2. What is the win percentage when the highest-ranked team plays the lowest-ranked team?
3. What is the win percentage when the team with the highest attack rank plays the team with the lowest attack rank?
4. Do teams with stronger offense scores score more goals?
5. Do teams with stronger goalkeepers concede fewer goals?
6. Which team has the longest win streak?
7. What is the win percentage of the better team?
8. Which team has the best goalkeeper, defense, midfield, and offense scores?
9. Which teams have a high win percentage?
10. Which team is predicted to win the FIFA World Cup 2022?
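Several of these questions (2, 3, and 7) reduce to the same computation: among matches that were not drawn, how often did the stronger side win? Below is a minimal pandas sketch, assuming a hypothetical match table with `rank_a`/`rank_b` (lower number = stronger, ranks distinct) and `score_a`/`score_b` columns. The column names, the function name `better_ranked_win_pct`, and the four toy fixtures are invented for illustration and are not real World Cup results.

```python
import pandas as pd

# Hypothetical toy fixtures, invented for illustration (NOT real World Cup 2022 results).
# Assumed columns: rank_a / rank_b are rankings (lower number = stronger team, ranks distinct),
# score_a / score_b are the goals scored by each side.
matches = pd.DataFrame({
    "team_a": ["Team A", "Team B", "Team C", "Team D"],
    "team_b": ["Team E", "Team F", "Team G", "Team H"],
    "rank_a": [1, 30, 5, 12],
    "rank_b": [40, 2, 20, 3],
    "score_a": [2, 0, 1, 2],
    "score_b": [0, 1, 1, 1],
})


def better_ranked_win_pct(df: pd.DataFrame) -> float:
    """Percentage of decided (non-drawn) matches won by the better-ranked side."""
    decided = df[df["score_a"] != df["score_b"]]          # drop draws
    a_is_better = decided["rank_a"] < decided["rank_b"]   # is side A the stronger team?
    a_won = decided["score_a"] > decided["score_b"]       # did side A win?
    # The better-ranked team won exactly when these two flags agree.
    better_won = a_is_better == a_won
    return float(100 * better_won.mean())


print(round(better_ranked_win_pct(matches), 1))  # 66.7
```

For question 3, the same function applies with attack ranks in place of overall ranks.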


## Performance of Finals Prediction
1. Actual results vs. predictions


In [None]:
# Install snscrape, which is used to scrape tweets
!pip install snscrape

In [None]:
import snscrape.modules.twitter as sntwitter
import pandas as pd

# List that collects the attributes of each scraped tweet
tweets = []

# Search query: English-language tweets tagged #WorldCup2022, from 2022-11-18 until 2022-11-19
query = '#WorldCup2022 lang:en since:2022-11-18 until:2022-11-19'
q = sntwitter.TwitterSearchScraper(query)

# Scrape tweets and append their attributes; stop after 1,000 tweets (indices 0-999)
for i, tweet in enumerate(q.get_items()):
    if i >= 1000:
        break
    tweets.append([tweet.date, tweet.likeCount, tweet.sourceLabel, tweet.content])

# Converting data to dataframe
tweets_df = pd.DataFrame(tweets, columns=["Date Created", "Number of Likes", "Source of Tweet", "Tweet"])
tweets_df.head()
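A loop that breaks on `if i > N` keeps indices 0 through N, that is N + 1 items. One way to cap any iterator at exactly N items is `itertools.islice`. In this sketch a hypothetical endless generator stands in for `q.get_items()` so it runs without network access; `first_n` is an illustrative name, not part of snscrape.

```python
from itertools import count, islice


def first_n(items, n):
    """Return the first n items of any iterable (fewer if it runs out first)."""
    return list(islice(items, n))


# Hypothetical stand-in for q.get_items(): an endless generator of fake tweet texts.
fake_stream = (f"tweet {i}" for i in count())

batch = first_n(fake_stream, 1000)
print(len(batch), batch[0], batch[-1])  # 1000 tweet 0 tweet 999
```

In the scraping cell the same idea would read `for tweet in islice(q.get_items(), 1000):`, which removes the manual counter entirely.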

In [None]:
# Number of tweets scraped, and the earliest and latest creation times
len(tweets_df), tweets_df['Date Created'].min(), tweets_df['Date Created'].max()

In [None]:
# Install Hugging Face transformers, which provides the model we'll use for Twitter sentiment analysis
!pip install transformers

In [None]:
from transformers import pipeline

# RoBERTa model fine-tuned on tweets; it labels each text as negative, neutral, or positive
sentiment_analysis = pipeline(model="cardiffnlp/twitter-roberta-base-sentiment-latest")
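A text-classification pipeline can also be called on a list of texts, returning one `{"label": ..., "score": ...}` dict per input, which avoids a separate call per tweet. The sketch below assumes that behavior. `label_texts` and `fake_classifier` are hypothetical names, and the fake classifier stands in for `sentiment_analysis` so the code runs without downloading the model.

```python
from typing import Callable, Dict, List

# Shape of a text-classification pipeline called on a list of strings:
# one {"label": ..., "score": ...} dict per input text.
Classifier = Callable[[List[str]], List[Dict[str, object]]]


def label_texts(texts: List[str], classifier: Classifier) -> List[str]:
    """Classify all texts in one call and return just the predicted labels."""
    predictions = classifier(texts)
    return [p["label"] for p in predictions]


# Hypothetical stand-in for `sentiment_analysis` so the sketch runs offline;
# it labels any text containing "great" as positive and everything else as neutral.
def fake_classifier(texts: List[str]) -> List[Dict[str, object]]:
    return [{"label": "positive" if "great" in t else "neutral", "score": 1.0} for t in texts]


print(label_texts(["What a great goal!", "Kick-off at 19:00"], fake_classifier))
# ['positive', 'neutral']
```

With the real model, the call would be `label_texts(texts, sentiment_analysis)`.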

In [None]:
# Let's find the sentiment of each tweet with a for loop.
# List that collects each tweet's attributes plus its predicted sentiment
tweet_sa = []

# Search query: English-language tweets tagged #WorldCup2022, from 2022-11-20 (opening day) until 2022-11-21
query = '#WorldCup2022 lang:en since:2022-11-20 until:2022-11-21'
q = sntwitter.TwitterSearchScraper(query)

# Preprocess text: replace usernames with the '@user' placeholder and links with 'http'
def preprocess(text):
    new_text = []
    for t in text.split(" "):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)

# Predict the sentiment of 100 tweets (indices 0-99); the original text is kept in the 'Tweet' column
for i, tweet in enumerate(q.get_items()):
    if i >= 100:
        break
    content = preprocess(tweet.content)
    sentiment = sentiment_analysis(content)
    tweet_sa.append({"Date Created": tweet.date, "Number of Likes": tweet.likeCount,
                     "Source of Tweet": tweet.sourceLabel, "Tweet": tweet.content, 'Sentiment': sentiment[0]['label']})
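To see what `preprocess` does to a tweet, the function is repeated here unchanged so the example is self-contained, followed by calls on invented sample tweets.

```python
def preprocess(text: str) -> str:
    """Replace @mentions with '@user' and links with 'http' (same rules as the cell above)."""
    new_text = []
    for t in text.split(" "):
        t = "@user" if t.startswith("@") and len(t) > 1 else t
        t = "http" if t.startswith("http") else t
        new_text.append(t)
    return " ".join(new_text)


print(preprocess("@FIFAWorldCup what a match! https://t.co/abc123 #WorldCup2022"))
# @user what a match! http #WorldCup2022
print(repr(preprocess("Goal!\n@fifa")))
# 'Goal!\n@fifa'
```

Because the text is split on single spaces, a mention or link that directly follows a line break (as in the second call) is not replaced. Splitting with `text.split()` would catch it, at the cost of collapsing newlines and repeated spaces into single spaces.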

In [None]:
# Let's convert the data into a DataFrame.
import pandas as pd
# Show the full tweet text instead of truncating long cells
pd.set_option('display.max_colwidth', None)

# Converting data to dataframe
df = pd.DataFrame(tweet_sa)
df.head()

In [None]:
df.tail()

In [None]:
# Data visualization: first, let's count the number of tweets per sentiment.

import matplotlib.pyplot as plt

sentiment_counts = df.groupby(['Sentiment']).size()
print(sentiment_counts)
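Raw counts are easier to compare with the pie chart when expressed as percentages. A sketch using `value_counts(normalize=True)`; the five labels below are illustrative only, and in the notebook the input would be `df["Sentiment"]`.

```python
import pandas as pd

# Illustrative labels only; in the notebook the input would be df["Sentiment"].
sentiments = pd.Series(["positive", "neutral", "positive", "negative", "positive"])

# Percentage of tweets per sentiment, rounded to one decimal place.
shares = sentiments.value_counts(normalize=True).mul(100).round(1)
print(shares)
```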

In [None]:
# Now let's draw a pie chart showing the share of tweets per sentiment.
fig = plt.figure(figsize=(6, 6), dpi=100)
ax = plt.subplot()
sentiment_counts.plot.pie(ax=ax, autopct='%1.1f%%', startangle=270, fontsize=12, label="")
plt.show()

In [None]:
# Creating a word cloud with positive tweets.

In [None]:
from wordcloud import WordCloud, STOPWORDS

# Extra stopwords: URL fragments, the retweet marker, and the search hashtag itself
stop_words = ["https", "co", "RT", "WorldCup2022"] + list(STOPWORDS)

# Word cloud with positive tweets; the texts are joined into one string because
# str() on a Series returns its printed form (index numbers, dtype footer, truncated rows)
positive_tweets = df.loc[df["Sentiment"] == "positive", "Tweet"]
positive_wordcloud = WordCloud(width=800, height=400, background_color="black", stopwords=stop_words).generate(" ".join(positive_tweets))
plt.figure(figsize=[20, 10])
plt.title("Positive Tweets - Word Cloud")
plt.imshow(positive_wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
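`WordCloud.generate` takes a single string of text. Passing `str(series)` hands it the Series' printed representation, which contains the index labels and a `Name: ..., dtype: ...` footer and, for Series longer than pandas' `display.max_rows`, only some head and tail rows. A small check with two invented tweets shows the difference from `" ".join(series)`.

```python
import pandas as pd

# Two invented tweets with non-default index labels, as after boolean filtering.
tweets = pd.Series(["Great goal!", "What a save"], index=[10, 42], name="Tweet")

as_repr = str(tweets)       # printed representation: index, values, and a Name/dtype footer
as_text = " ".join(tweets)  # only the tweet texts, separated by spaces

print(as_repr)
print(as_text)  # Great goal! What a save
```

With `str()`, the index numbers and the words "Name", "Tweet", and "dtype" can end up in the word cloud alongside the real tweet vocabulary.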

In [None]:
# Next, let's create a word cloud with negative tweets (reusing stop_words from the previous cell).

# Wordcloud with negative tweets; join the texts into one string rather than using str()
negative_tweets = df.loc[df["Sentiment"] == "negative", "Tweet"]
negative_wordcloud = WordCloud(width=800, height=400, background_color="black", stopwords=stop_words).generate(" ".join(negative_tweets))
plt.figure(figsize=[20, 10])
plt.title("Negative Tweets - Word Cloud")
plt.imshow(negative_wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
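The word clouds scale words by how often they occur, so a plain frequency table is a useful cross-check of what they show. Below is a minimal, dependency-free sketch with `collections.Counter`, assuming a simple regex tokenizer and a small hypothetical stopword set. It only approximates `WordCloud`'s own text processing, which also normalizes plurals and can add two-word collocations; `top_words` and `STOP` are illustrative names.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword set (lower-case); the notebook uses
# wordcloud's STOPWORDS plus "https", "co", "RT", and "WorldCup2022".
STOP = {"https", "co", "rt", "worldcup2022", "the", "a", "what", "is"}


def top_words(texts, stopwords, n=5):
    """Return the n most common lower-cased words across texts, skipping stopwords."""
    counts = Counter()
    for text in texts:
        # Simple tokenizer: a word character followed by word characters or apostrophes.
        for word in re.findall(r"\w[\w']*", text):
            word = word.lower()
            if word not in stopwords:
                counts[word] += 1
    return counts.most_common(n)


sample = ["What a great goal! #WorldCup2022", "RT great save https://t.co/x", "GREAT team"]
print(top_words(sample, STOP, n=3))  # [('great', 3), ('goal', 1), ('save', 1)]
```

In the notebook, `top_words(negative_tweets, STOP)` would list the terms that dominate the negative word cloud; link fragments such as `t` in the sample show why URL pieces need to be filtered.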