
## Worst Crisis

**India has reported more than 300,000 daily infections for 21 consecutive days, highlighting the country's slide into the world's worst health crisis. One research model is predicting deaths could quadruple to 1,018,879 from the current official count of almost 254,200. Just as some countries needed ventilators in large quantities last year, India is now desperately seeking oxygen supplies and concentrators.**

## About Dataset

* user_name - The name of the user, as they’ve defined it.
* user_location - The user-defined location for this account’s profile.
* user_description - The user-defined UTF-8 string describing their account.
* user_created - Time and date when the account was created.
* user_followers - The number of followers an account currently has.
* user_friends - The number of friends an account currently has.
* user_favourites - The number of favorites an account currently has.
* user_verified - When true, indicates that the user has a verified account.
* date - UTC time and date when the Tweet was created.
* text - The actual UTF-8 text of the Tweet.
* hashtags - All the other hashtags posted in the Tweet along with #IndiaWantsOxygen.
* source - Utility used to post the Tweet. Tweets from the Twitter website have a source value of web.
* is_retweet - Indicates whether this Tweet has been Retweeted by the authenticating user.
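
A quick way to confirm that a loaded file matches the column list above is to compare its header against the expected names. The sketch below is only illustrative: the two sample rows are invented and stand in for the real CSV, and `EXPECTED_COLUMNS` is simply copied from the list above.

```python
import io
import pandas as pd

# Column names copied from the dataset description above.
EXPECTED_COLUMNS = [
    "user_name", "user_location", "user_description", "user_created",
    "user_followers", "user_friends", "user_favourites", "user_verified",
    "date", "text", "hashtags", "source", "is_retweet",
]

# Invented two-row sample (assumption); the notebook itself reads the real CSV.
sample_csv = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    "A,Delhi,desc,2015-01-01 00:00:00,10,5,3,False,2021-05-01 10:00:00,need oxygen,['IndiaWantsOxygen'],Twitter for Android,False\n"
    "B,Pune,desc,2016-02-02 00:00:00,20,7,1,True,2021-05-02 11:00:00,stay safe,,Twitter Web App,False\n"
)

# Parse the two timestamp columns while loading.
df = pd.read_csv(sample_csv, parse_dates=["user_created", "date"])

# Any expected column that is absent from the file ends up in this list.
missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
print("missing columns:", missing)
print(df.dtypes[["user_verified", "date"]])
```

Parsing the dates at load time is a design choice for this sketch; the notebook converts the `date` column later instead.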

## Importing Required Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.figure_factory as ff
from collections import Counter
from PIL import Image
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
import warnings
import nltk
from nltk.corpus import stopwords
warnings.filterwarnings("ignore")

## Preprocessing

In [None]:
tweet=pd.read_csv('../input/indianeedsoxygen-tweets/IndiaWantsOxygen.csv')
tweet.head()

In [None]:
tweet.info()

In [None]:
tweet.shape

In [None]:
tweet.isnull().sum()

In [None]:
# Dropping Null Values
tweet.dropna(inplace=True)

In [None]:
tweet.isnull().sum()

**All rows containing null values have been dropped.**
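
Note that `dropna()` without arguments removes a row as soon as any single column is null. A minimal sketch on invented data, assuming you might prefer to keep tweets that merely lack hashtags or a location, shows the difference when the check is restricted with `subset`:

```python
import pandas as pd

# Invented three-row frame (assumption); column names follow the dataset.
demo = pd.DataFrame({
    "text": ["need oxygen", "stay safe", None],
    "hashtags": ["['IndiaWantsOxygen']", None, "['Covid19']"],
    "user_location": ["Delhi", None, "Pune"],
})

# Drops every row that has a null in ANY column.
all_cols = demo.dropna()

# Drops only rows whose text is missing; other gaps are tolerated.
text_only = demo.dropna(subset=["text"])

print(len(demo), len(all_cols), len(text_only))
```

Which variant is appropriate depends on the analysis: the word clouds further down need non-null hashtags and locations, while the text statistics only need the `text` column.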

In [None]:
len(tweet.user_name.unique())

In [None]:
tweet.describe().T


In [None]:
tweet['user_location'].nunique()

In [None]:
tweet['user_location']=tweet['user_location'].apply(lambda x: x.split("/")[0])
tweet['user_location']=tweet['user_location'].apply(lambda x: x.split(",")[0])
tweet['user_location'].head()
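
The two `split` calls above keep only the part of each location that precedes the first "/" or ",". A small sketch on invented values shows the effect using the vectorised `.str` accessor; the final `.str.strip()` is an addition assumed here to remove leftover whitespace and is not part of the original cell.

```python
import pandas as pd

# Invented sample locations (assumption).
locations = pd.Series(["New Delhi, India", "Mumbai / Pune", "Karachi"])

normalised = (
    locations
    .str.split("/").str[0]   # keep the text before the first slash
    .str.split(",").str[0]   # then the text before the first comma
    .str.strip()             # extra step: trim surrounding whitespace
)
print(normalised.tolist())
```

Collapsing "City, Country" strings this way is what reduces the number of unique locations between the two `nunique()` calls.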

In [None]:
tweet['user_location'].nunique()

In [None]:
import re
import string
def clean_text(text):
    '''Make text lowercase, remove text in square brackets, links, HTML tags,
    punctuation, newlines and words containing numbers.'''
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)                # text in square brackets
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # links
    text = re.sub(r'<.*?>+', '', text)                 # HTML tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # punctuation
    text = re.sub(r'\n', ' ', text)                    # newlines become spaces so words do not merge
    text = re.sub(r'\w*\d\w*', '', text)               # words containing digits
    return text
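
To see what the cleaning steps above do, the sketch below applies a condensed copy of them to one invented tweet. Two things are assumptions of this sketch: the sample text, and the choice to replace newlines with a space so that words on adjacent lines stay separate.

```python
import re
import string

def clean_text_demo(text):
    """Condensed copy of the cleaning steps, for illustration only."""
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)                # text in square brackets
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # links
    text = re.sub(r'<.*?>+', '', text)                 # HTML tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # punctuation
    text = re.sub(r'\n', ' ', text)                    # newlines (assumption: space, not '')
    text = re.sub(r'\w*\d\w*', '', text)               # words containing digits
    return text

# Invented sample tweet (assumption).
sample = "Need #Oxygen in Delhi! Call 9876543210 https://t.co/abc123 [URGENT]"
cleaned = clean_text_demo(sample)

# split() hides the leftover runs of spaces where tokens were removed.
print(cleaned.split())
```

The hashtag survives as a plain word because only the `#` character is stripped, which is why hashtag terms dominate the word counts later on.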

In [None]:
tweet['text'] = tweet['text'].apply(lambda x:clean_text(x))

In [None]:
tweet['text'].head()

## EDA

In [None]:
from collections import Counter
tweet['temp_list'] = tweet['text'].apply(lambda x:str(x).split())
top = Counter([item for sublist in tweet['temp_list'] for item in sublist])
temp = pd.DataFrame(top.most_common(20))
temp.columns = ['Common_words','count']
temp.style.background_gradient(cmap='Blues')

**Observation:** The most common word in the tweets is indianeedsoxygen.

In [None]:
fig = px.bar(temp, x="count", y="Common_words", title='Common Words in Tweet Text', orientation='h',
             width=800, height=700,color='Common_words')
fig.show()

In [None]:
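# Removing stopwords (needs the NLTK corpus: run nltk.download('stopwords') once)
stop_words = set(stopwords.words('english'))  # build the set once instead of per word
def remove_stopword(x):
    return [y for y in x if y not in stop_words]
tweet['temp_list'] = tweet['temp_list'].apply(remove_stopword)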
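
The filter-then-count pattern used in this part of the notebook can be sketched on a few invented token lists. The small `STOP_WORDS` set below is a hand-written stand-in for NLTK's English list, assumed here so the example runs without downloading the corpus.

```python
from collections import Counter

# Tiny stand-in for nltk's English stopword list (assumption for this sketch).
STOP_WORDS = {"the", "is", "in", "we", "for", "of"}

# Invented token lists, shaped like the temp_list column.
token_lists = [
    ["we", "need", "oxygen", "in", "delhi"],
    ["oxygen", "is", "the", "need", "of", "the", "hour"],
    ["oxygen", "cylinders", "for", "delhi"],
]

# Set membership is a constant-time lookup, so filtering stays fast on large data.
filtered = [[w for w in tokens if w not in STOP_WORDS] for tokens in token_lists]

# Flatten the lists and count every remaining word.
counts = Counter(w for tokens in filtered for w in tokens)
print(counts.most_common(1))
```

Building the stopword collection once, outside the per-row function, matters on the full dataset because the list would otherwise be rebuilt for every single word.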

In [None]:
top = Counter([item for sublist in tweet['temp_list'] for item in sublist])
temp = pd.DataFrame(top.most_common(20))
temp = temp.iloc[1:,:]  # skip the single most frequent word
temp.columns = ['Common_words','count']
temp.style.background_gradient(cmap='Purples')

In [None]:
fig = px.treemap(temp, path=['Common_words'], values='count',title='Tree of Most Common Words')
fig.show()

In [None]:
tweet.user_location.value_counts().head(15)

In [None]:
x= tweet.user_location.value_counts().head(15)
plt.figure(figsize= (10,7))
sns.set_style("whitegrid")
ax= sns.barplot(x=x.values, y=x.index)  # keyword arguments are required by recent seaborn versions
ax.set_xlabel("No of tweets")
ax.set_ylabel("Locations")
plt.show()

**Observation:** Most of the tweets come from India's neighbouring country, Pakistan.


In [None]:
tweet["date"]= pd.to_datetime(tweet.date).dt.date  # keep only the calendar day
tweet.date.head()

In [None]:
x= tweet.groupby("date").date.count()
plt.figure(figsize= (15,7))
sns.set_style("whitegrid")
ax= sns.lineplot(x=x.index, y=x.values)  # keyword arguments are required by recent seaborn versions
ax.set_xlabel("date")
ax.set_ylabel("No of tweets")
plt.title('Number of tweets over time')
plt.show()
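
The daily counts behind the line plot can be illustrated on three invented timestamps. This is only a sketch of the idea: it uses `value_counts()` on the calendar day rather than the `groupby` call in the cell above, and both give one count per day.

```python
import pandas as pd

# Invented timestamps (assumption): two on 1 May, one just after midnight on 2 May.
stamps = pd.to_datetime(pd.Series([
    "2021-05-01 10:00:00",
    "2021-05-01 23:59:00",
    "2021-05-02 00:01:00",
]))

# Reduce each timestamp to its calendar day, count, and order chronologically.
per_day = stamps.dt.date.value_counts().sort_index()
print(per_day.tolist())
```

Sorting by index matters because `value_counts()` orders by frequency by default, which would scramble the time axis of a line plot.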

In [None]:
tweet.source.value_counts()
plt.figure(figsize= (15,7))
ax= sns.countplot(x= "source",data= tweet)
plt.xticks(rotation=90)
plt.title('Devices used for Tweet')
plt.show()

**Observation:** Most tweets were posted from Android devices.

In [None]:
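x= tweet.user_verified.value_counts()
plt.figure(figsize= (15,7))
# value_counts() sorts by frequency, so derive the labels from its index instead of hardcoding their order
labels= ["Verified" if v else "Non verified" for v in x.index]
plt.pie(x,labels= labels,autopct= "%1.1f%%")
plt.show()

**Observation:** 98.4% of the tweets fall into a single verification class; the slice label shows which one.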
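
Because `value_counts()` orders its result by frequency, a fixed tuple of pie labels can end up attached to the wrong slice. The sketch below, on invented data, shows one way to derive the labels from the index so that they always match the counts; the name `LABEL_FOR` is hypothetical.

```python
import pandas as pd

# Invented flags (assumption): three non-verified accounts, one verified.
verified = pd.Series([False, False, True, False])

counts = verified.value_counts()   # most frequent value comes first

# Hypothetical lookup from flag value to display label.
LABEL_FOR = {True: "Verified", False: "Non verified"}
labels = [LABEL_FOR[bool(v)] for v in counts.index]

print(list(zip(labels, counts.tolist())))
```

With this data the first slice is the non-verified majority, so a hardcoded `("Verified", "Non verified")` tuple would have swapped the two names.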

In [None]:
# Top 20 accounts by number of tweets
tweet.user_name.value_counts().head(20)


In [None]:
x= tweet.user_name.value_counts().head(20)
plt.figure(figsize= (7,10))
ax= sns.barplot(x=x.values, y=x.index)  # keyword arguments are required by recent seaborn versions
ax.set_xlabel("No of tweets")
ax.set_ylabel("Usernames")
plt.show()

## It's Time For WordClouds

In [None]:
# Word cloud of user locations
tweet= tweet[pd.notnull(tweet["user_name"])]
x = tweet[pd.notnull(tweet["user_location"])]
plt.figure(figsize= (20,20))
# Join the values themselves; str(Series) would only give its truncated printed form
words= " ".join(x["user_location"].astype(str))
final = WordCloud(width = 2000, height = 800, background_color ="black",min_font_size = 10).generate(words)
plt.imshow(final)
plt.axis("off")
plt.show()
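
The text passed to `WordCloud.generate` should contain every value of the column, separated by spaces. A small sketch on invented values contrasts that with calling `str()` on the whole Series, which returns the printed table (index, values and dtype line) rather than the data alone.

```python
import pandas as pd

# Invented locations (assumption).
locations = pd.Series(["Delhi", "Mumbai", "Delhi"])

# Printed representation: includes index numbers and a dtype footer,
# and is abbreviated with "..." for long Series.
as_repr = str(locations)

# Actual values, one space between them so neighbouring words do not fuse.
as_text = " ".join(locations.astype(str))

print(repr(as_text))
print("dtype" in as_repr)
```

Joining with an empty string instead of a space would glue the values together ("DelhiMumbaiDelhi"), which a word cloud would treat as one long word.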

In [None]:
# Word cloud of usernames
plt.figure(figsize= (20,20))
words= " ".join(tweet["user_name"].astype(str))  # space separator keeps names apart
final = WordCloud(width = 2000, height = 800, background_color ="black",min_font_size = 10).generate(words)
plt.imshow(final)
plt.axis("off") 
plt.show()   
     

In [None]:
# Word cloud of hashtags
plt.figure(figsize= (20,20))
words= " ".join(tweet["hashtags"].astype(str))  # space separator keeps hashtags apart
final = WordCloud(width = 2000, height = 800, background_color ="black",min_font_size = 10).generate(words)
plt.imshow(final)
plt.axis("off") 
plt.show()   

In [None]:
# Word cloud of the cleaned tweet text
plt.figure(figsize= (20,20))
words= " ".join(tweet["text"].astype(str))  # space separator keeps tweets apart
final = WordCloud(width = 2000, height = 800, background_color ="black",min_font_size = 10).generate(words)
plt.imshow(final)
plt.axis("off") 
plt.show()   
     
