# Keyword Analysis - Texas

This notebook analyses the tweets scraped using the keywords ``northamericawinterstorm``, ``winterstormtexas``, ``texasblizzard`` and ``iceagetexas``. The objective of this notebook is to identify relevant keywords associated with the Texas Blizzard. These tweets were scraped for the time period between ``2021-02-10`` and ``2021-08-10``.

In [1]:
# Import library to display plot
from IPython.display import display, HTML
import emoji

## 1. Tweets at a glance
First, the number of tweets, retweets, etc. made by verified and unverified accounts in the dataset is checked.

In [2]:
display(HTML('verified_proportion_overall.html'))
print("Hover over the pie to see the count. Single click on the pie to select.")

Hover over the pie to see the count. Single click on the pie to select.


The following plots show the distribution of tweet languages in the dataset. Most of the tweets are in English (approx. 93%), followed by Spanish and Dutch.

In [3]:
display(HTML('languages.html'))
print("Hover over the plot to see the count.")

Hover over the plot to see the count.


In [4]:
display(HTML('hashtag_languages.html'))
print("Hover over the plot to see the count.")

Hover over the plot to see the count.


In [5]:
display(HTML('context_languages.html'))
print("Hover over the plot to see the count.")

Hover over the plot to see the count.


## 2. Hashtags and context of tweets
Users add hashtags to associate a tweet with a topic on social media. However, users often add unrelated hashtags to gain relevance or get more impressions on their tweets. Therefore, along with hashtags, the context of each tweet is also analysed using context annotations. Context annotations are generated by Twitter's NLP algorithm to provide context for the tweet.
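As a rough illustration of where these two signals live, the sketch below pulls hashtags and context annotation names out of a single tweet dictionary. It assumes the tweets follow the Twitter API v2 layout (`entities.hashtags[].tag` and `context_annotations[].entity.name`); the helper names and the sample tweet are hypothetical, and the notebook does not state which annotation field it actually used.

```python
def extract_hashtags(tweet):
    """Return the lowercased hashtags of a v2-style tweet dictionary."""
    entities = tweet.get("entities") or {}
    return [h["tag"].lower() for h in entities.get("hashtags", [])]


def extract_contexts(tweet):
    """Return lowercased, de-duplicated entity names from context annotations."""
    names = []
    for annotation in tweet.get("context_annotations", []):
        name = annotation["entity"]["name"].lower()
        if name not in names:  # the same entity can appear under several domains
            names.append(name)
    return names


# Hypothetical example tweet, not taken from the dataset.
sample_tweet = {
    "text": "Power is out again #TexasFreeze #ERCOT",
    "entities": {"hashtags": [{"tag": "TexasFreeze"}, {"tag": "ERCOT"}]},
    "context_annotations": [
        {"domain": {"name": "Politician"}, "entity": {"name": "Greg Abbott"}},
        {"domain": {"name": "Political Figures"}, "entity": {"name": "Greg Abbott"}},
    ],
}

sample_hashtags = extract_hashtags(sample_tweet)
sample_contexts = extract_contexts(sample_tweet)
```

Lowercasing both fields keeps variants such as `TexasFreeze` and `texasfreeze` from being counted as different hashtags.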

**Following are the common hashtags observed in the dataset.**

<br>
<img src='hashtags_wordcloud.png' alt='word cloud' width='800' title='hashtag word cloud'>
<br>

Some notable hashtags are `energypoweroutage`, `blackout`, `ercot`, `texasblackout`, `winterstorm`, `texaswinterstorm`, and `texasblizzard`.

**Following are the popular context annotations in the dataset.**

<br>
<img src='context_annotations_wordcloud.png' alt='word cloud' width='800' title='context annotation word cloud'>
<br>

Most of the context annotations shown above appear to be relevant to the topic.

However, trending topics, contexts and hashtags keep changing with time. Therefore, it is important to track the usage of hashtags and contexts over time. The following plot shows the top ten trending hashtags and their relevance over time.

In [6]:
display(HTML('hashtags.html'))
print("Hover over the bars to see the hashtag count. Single click on the legend to switch on and off. Double click to hide others.")

Hover over the bars to see the hashtag count. Single click on the legend to switch on and off. Double click to hide others.


The most common hashtags based on the above figure are `texasfreeze`, `txlege`, `snowstorm`, `winter`, `texasblizzard`, `texas`, and `winterstorm`. Moreover, tweets should be collected from an earlier start date than the current one (``2021-02-10``).
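The counts behind a "top hashtags over time" plot can be sketched as follows. This is a minimal illustration, not the code that produced `hashtags.html`: the function name, the `(timestamp, hashtags)` input shape, and the sample records are assumptions.

```python
from collections import Counter, defaultdict


def top_hashtags_by_day(records, top_n=10):
    """Count the overall top_n hashtags per calendar day.

    records: list of (iso_timestamp, [hashtags]) pairs.
    Returns (top_tags, daily), where daily maps 'YYYY-MM-DD' to a Counter.
    """
    # First pass: find the most frequent hashtags over the whole period.
    overall = Counter()
    for _, tags in records:
        overall.update(tag.lower() for tag in tags)
    top_tags = [tag for tag, _ in overall.most_common(top_n)]

    # Second pass: count only those hashtags, grouped by day.
    daily = defaultdict(Counter)
    for created_at, tags in records:
        day = created_at[:10]  # ISO 8601 timestamps start with YYYY-MM-DD
        for tag in tags:
            tag = tag.lower()
            if tag in top_tags:
                daily[day][tag] += 1
    return top_tags, dict(daily)


# Hypothetical records, not taken from the dataset.
records = [
    ("2021-02-15T08:00:00.000Z", ["TexasFreeze", "ercot"]),
    ("2021-02-15T09:30:00.000Z", ["texasfreeze"]),
    ("2021-02-16T10:00:00.000Z", ["texasfreeze", "txlege"]),
]
top_tags, daily_counts = top_hashtags_by_day(records, top_n=2)
```

A hashtag that already has high daily counts on the first day of the collection window is one hint that the window starts too late.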

Since hashtags alone are not sufficient to understand the context of a tweet, the following plot shows the top ten tweet contexts in the dataset.

In [7]:
from IPython.display import display, HTML
display(HTML('context_annotations.html'))
print("Hover over the bars to see the context count. Single click on the legend to switch on and off. Double click to hide others.")

Hover over the bars to see the context count. Single click on the legend to switch on and off. Double click to hide others.


## 3. Relevant hashtags related to the event
To identify relevant hashtags, the relevant context annotations in the dataset are identified first. In this case, the relevant context annotations are ``ted cruz, greg abbott, theweatherchannel, global environmental issues, politics, science, political figures, weather``.

After identifying the relevant context annotations, the hashtags used in tweets having a relevant context are marked.
The following figure shows the hashtags used in tweets having a relevant context.

<br>
<img src='most_relevant_hashtags.png' alt='most_relevant_hashtags' width='800' title='most_relevant_hashtags'>
<br>

Based on the above figure, the following hashtags can be added to the search query: ``tornado``, ``polarair``, ``weatherchannel``.
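The filtering step described in this section can be sketched as below: keep only tweets whose context annotations overlap the relevant set, then count the hashtags in those tweets. The set of relevant contexts is the one listed in this section; the function name, the `hashtags`/`contexts` keys, and the sample tweets are assumptions made for illustration.

```python
from collections import Counter

# Relevant context annotations as listed in this section.
RELEVANT_CONTEXTS = {
    "ted cruz", "greg abbott", "theweatherchannel",
    "global environmental issues", "politics", "science",
    "political figures", "weather",
}


def hashtags_in_relevant_tweets(tweets, relevant_contexts=RELEVANT_CONTEXTS):
    """Count hashtags in tweets that carry at least one relevant context."""
    counts = Counter()
    for tweet in tweets:
        contexts = {c.lower() for c in tweet["contexts"]}
        if contexts & relevant_contexts:  # non-empty intersection
            counts.update(tag.lower() for tag in tweet["hashtags"])
    return counts


# Hypothetical tweets, not taken from the dataset.
tweets = [
    {"hashtags": ["Tornado", "weatherchannel"], "contexts": ["Weather"]},
    {"hashtags": ["giveaway"], "contexts": ["Video Games"]},
    {"hashtags": ["tornado"], "contexts": ["Ted Cruz", "Politics"]},
]
relevant_counts = hashtags_in_relevant_tweets(tweets)
```

Hashtags that only appear in tweets without a relevant context, such as `giveaway` in the sample, drop out of the count entirely.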

## 4. Verified users and top tweets
Apart from the hashtags used in the most relevant tweets (based on the context), hashtags are often brought into trend by popular users and viral tweets.

The following figure shows the proportion of tweets from verified users for the most popular hashtags in the dataset. Based on this proportion of verified-user engagement, ``txlege, winter, texasfreeze, cowx, winterstorm, nlwx`` can be added to the search query.

<br>
<img src='hashtags_verified.png' alt='hashtags_verified' width='800' title='hashtags_verified'>
<br>
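A proportion like the one plotted above could be computed as in the following sketch, which divides the number of tweets from verified accounts by the total number of tweets for each hashtag. The function name, the `hashtags`/`verified` keys, and the sample tweets are hypothetical; the notebook's own computation is not shown in the source.

```python
from collections import defaultdict


def verified_share_by_hashtag(tweets):
    """Return, per hashtag, the fraction of its tweets posted by verified users."""
    totals = defaultdict(int)
    verified = defaultdict(int)
    for tweet in tweets:
        # Use a set so a hashtag repeated within one tweet is counted once.
        for tag in {t.lower() for t in tweet["hashtags"]}:
            totals[tag] += 1
            if tweet["verified"]:
                verified[tag] += 1
    return {tag: verified[tag] / totals[tag] for tag in totals}


# Hypothetical tweets, not taken from the dataset.
tweets = [
    {"hashtags": ["txlege"], "verified": True},
    {"hashtags": ["txlege", "winter"], "verified": False},
    {"hashtags": ["winter"], "verified": False},
    {"hashtags": ["TXlege"], "verified": True},
]
shares = verified_share_by_hashtag(tweets)
```

In practice it may be worth restricting this to the most popular hashtags, since a share computed from a handful of tweets is not very informative.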

Other than hashtags, it is important to look into the proportion of verified users tweeting about popular contexts in the dataset.

<br>
<img src='context_annotations_verified.png' alt='context_annotations_verified' width='800' title='context_annotations_verified'>
<br>

Most of the verified users are tweeting about weather and political figures.

The following plot shows the popular users with the highest number of tweets in the dataset. This list comprises news media and celebrity accounts. Hover over the bars to see the count and hashtags.

In [8]:
display(HTML('popular_users.html'))
print("Single click on the legend to switch on and off. Double click to hide others.")

Single click on the legend to switch on and off. Double click to hide others.


The following plot shows the hashtags used by the most popular users in the dataset.

In [9]:
display(HTML('hashtags_popular_users.html'))
print("Single click on the legend to switch on and off. Double click to hide others.")

Single click on the legend to switch on and off. Double click to hide others.


## 5. Conclusion

This analysis helped in understanding the dataset of tweets about the Texas Blizzard, including the hashtags and context of the tweets. The relevant hashtags for further tweet collection are identified through relevant context annotations and the hashtags used by popular users.

Based on this analysis, the following keywords will be used to collect further tweets related to the Texas Blizzard.

`txlege`, `winter`, `texasfreeze`, `cowx`, `IAwz`, `NEwx`, `SDwx`, `NDwx`, `WYwx`, `KSwx`, `MNwx`, `winterstorm`, `nlwx`, `snowstorm`, `texasblizzard`, `texas`, `ercot`, `texasblackout`.

Tweets will be collected from ``2021-01-01`` to ``2021-12-31``.
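For the next collection run, the keyword list can be turned into a single search string. The sketch below is only an illustration: it assumes the collection tool accepts a query of the form `#tag OR #tag`, which the notebook does not confirm, and the function name is hypothetical. The keywords are copied from the list above.

```python
KEYWORDS = [
    "txlege", "winter", "texasfreeze", "cowx", "IAwz", "NEwx", "SDwx",
    "NDwx", "WYwx", "KSwx", "MNwx", "winterstorm", "nlwx", "snowstorm",
    "texasblizzard", "texas", "ercot", "texasblackout",
]


def build_hashtag_query(keywords):
    """Join keywords into '#a OR #b ...', skipping blanks and duplicates."""
    seen = set()
    unique = []
    for keyword in keywords:
        cleaned = keyword.strip()
        key = cleaned.lower()  # compare case-insensitively
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return " OR ".join(f"#{keyword}" for keyword in unique)


query = build_hashtag_query(KEYWORDS)
```

De-duplicating case-insensitively guards against a keyword being listed twice, which only lengthens the query without changing the results.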