# Keyword Analysis - Australia

This notebook analyses tweets scraped using the keywords ``australianbushfires``, ``australianforestfire`` and ``australiafire``, and helps identify relevant keywords associated with the Australian bushfires. The tweets were scraped for the period between ``2020-01-01`` and ``2021-01-31``.

In [None]:
# Import libraries to display the exported HTML plots
from IPython.display import display, HTML
import emoji
import plotly.offline as pyo
import plotly.graph_objs as go
# Initialise Plotly's offline mode for use inside the notebook
pyo.init_notebook_mode()

## 1. Tweets at a glance
First, the numbers of tweets, retweets and other tweet types made by verified and unverified accounts in the dataset are examined.

In [None]:
display(HTML('verified_proportion_overall.html'))
print("Hover over the pie to see the count. Single click on the pie to select.")

As shown above, the majority of tweets in the pool (around 67%) are retweets and only ``21% of tweets are original``. Additionally, most tweets are made by unverified users; only ``around 5% of tweets in the dataset are made by verified users``.
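
Proportions like these can be computed directly from the raw tweets. The sketch below assumes each tweet is a dict shaped like a Twitter API v2 tweet object (using its `referenced_tweets` field) and that the author's `verified` flag has been joined onto the tweet under a hypothetical `author` key; the notebook's own preprocessing is not shown.

```python
from collections import Counter

def tweet_type(tweet: dict) -> str:
    """Classify a tweet from its Twitter API v2 `referenced_tweets` field."""
    types = {ref["type"] for ref in tweet.get("referenced_tweets") or []}
    # Precedence for tweets with several references is an arbitrary choice here.
    if "retweeted" in types:
        return "retweet"
    if "quoted" in types:
        return "quote"
    if "replied_to" in types:
        return "reply"
    return "original"  # no referenced tweets at all

def type_shares(tweets: list) -> dict:
    """Percentage of tweets of each type."""
    counts = Counter(tweet_type(t) for t in tweets)
    total = sum(counts.values())
    return {kind: 100 * n / total for kind, n in counts.items()}

def verified_share(tweets: list) -> float:
    """Percentage of tweets whose (hypothetical) `author.verified` flag is true."""
    verified = sum(1 for t in tweets if t.get("author", {}).get("verified"))
    return 100 * verified / len(tweets)

# Illustrative tweets, not taken from the dataset
sample = [
    {"referenced_tweets": [{"type": "retweeted", "id": "9"}], "author": {"verified": False}},
    {"referenced_tweets": [{"type": "retweeted", "id": "9"}], "author": {"verified": False}},
    {"referenced_tweets": [{"type": "replied_to", "id": "8"}], "author": {"verified": False}},
    {"author": {"verified": True}},
]
print(type_shares(sample))     # {'retweet': 50.0, 'reply': 25.0, 'original': 25.0}
print(verified_share(sample))  # 25.0
```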

To check whether the keywords capture the global trend, tweets with available geolocation are plotted on the following map. Out of 5000 tweets, only 42 (under 1%) contain geolocation data. Although this sample is small, it suggests that the selected hashtags capture both local and global tweets.
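
Selecting the geotagged tweets could look like the sketch below. It assumes the Twitter API v2 `geo` field, which holds a `place_id` and, only for exact locations, a GeoJSON `coordinates` point in `[longitude, latitude]` order. Tweets tagged with a place only would still need a place lookup (for example through the `geo.place_id` expansion) before they can be drawn on a map.

```python
def geotagged(tweets: list) -> list:
    """Keep tweets whose v2 `geo` field is present and non-empty."""
    return [t for t in tweets if t.get("geo")]

def point_coordinates(tweet: dict):
    """Return (longitude, latitude) for an exact point, or None for place-only tweets."""
    coords = (tweet.get("geo") or {}).get("coordinates")
    if coords and coords.get("type") == "Point":
        lon, lat = coords["coordinates"]
        return lon, lat
    return None

# Illustrative tweets with made-up ids and place ids
tweets = [
    {"id": "1", "geo": {"place_id": "abc123"}},
    {"id": "2"},
    {"id": "3", "geo": {"place_id": "def456",
                        "coordinates": {"type": "Point", "coordinates": [151.2, -33.9]}}},
]
located = geotagged(tweets)
print(len(located), point_coordinates(located[1]))  # 2 (151.2, -33.9)
```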

In [None]:
display(HTML('_tweet_location.html'))
print("Hover over the plot to see the hashtags used.")

Lastly, the distribution of tweet languages in the dataset is shown below. Most tweets are in English (approx. 87%), followed by German and Hindi.

In [None]:
display(HTML('_languages.html'))
print("Hover over the plot to see the count.")

In [None]:
display(HTML('hashtag_languages.html'))
print("Hover over the plot to see the count.")

In [None]:
display(HTML('context_languages.html'))
print("Hover over the plot to see the count.")
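
A language breakdown like the ones above can be derived from each tweet's `lang` field, which the Twitter API v2 returns as a BCP 47 tag (`und` when undetermined). A minimal sketch, assuming v2-style tweet dicts:

```python
from collections import Counter

def language_shares(tweets: list, top: int = 3) -> list:
    """Top `top` languages with their share of all tweets, in percent."""
    counts = Counter(t.get("lang", "und") for t in tweets)
    total = sum(counts.values())
    return [(lang, round(100 * n / total, 1)) for lang, n in counts.most_common(top)]

# Illustrative sample, not the real distribution
sample = [{"lang": "en"}] * 7 + [{"lang": "de"}] * 2 + [{"lang": "hi"}]
print(language_shares(sample))  # [('en', 70.0), ('de', 20.0), ('hi', 10.0)]
```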

## 2. Hashtags and context of tweets
Users add hashtags to associate a tweet with a topic on social media. However, users often add unrelated hashtags to gain relevance or more impressions for their tweets. Therefore, in addition to hashtags, the context of each tweet is analysed using context annotations, which are generated by Twitter's NLP algorithm to describe the context of a tweet.
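
In the Twitter API v2, hashtags are found in `entities.hashtags[].tag`, and each context annotation pairs a `domain` with an `entity`. The helpers below are a sketch of how both could be extracted; lower-casing and keeping only entity names are assumptions chosen to match the lower-case labels in the figures, not necessarily the notebook's own preprocessing.

```python
def hashtags(tweet: dict) -> list:
    """Lower-cased hashtags from the v2 `entities.hashtags` field."""
    return [h["tag"].lower() for h in tweet.get("entities", {}).get("hashtags", [])]

def contexts(tweet: dict) -> list:
    """Lower-cased, de-duplicated entity names from `context_annotations`."""
    names = [a["entity"]["name"].lower() for a in tweet.get("context_annotations", [])]
    return list(dict.fromkeys(names))  # drop repeats, keep first-seen order

# Illustrative tweet; domain names are placeholders
tweet = {
    "entities": {"hashtags": [{"tag": "Auspol"}, {"tag": "ClimateCrisis"}]},
    "context_annotations": [
        {"domain": {"name": "example domain A"}, "entity": {"name": "Politics"}},
        {"domain": {"name": "example domain B"}, "entity": {"name": "Politics"}},
    ],
}
print(hashtags(tweet))  # ['auspol', 'climatecrisis']
print(contexts(tweet))  # ['politics']
```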

**The following are the common hashtags observed in the dataset.**

<br>
<img src='hashtags_wordcloud.png' alt='word cloud' width='800' title='hashtag word cloud'>
<br>

Some notable hashtags are ``australiabushfire`` and ``climatereaction``. Other hashtags target specific themes:
- mining industry (Adani): `stopadani`
- fossil fuels: `dumpfossilfuel`
- climate activism: `fridaysforfuture` and `fireecology`

**The following are the popular context annotations in the dataset.**

<br>
<img src='context_annotations_wordcloud.png' alt='word cloud' width='800' title='context annotation word cloud'>
<br>

Most of the contexts shown above appear to be relevant to the topic.

However, topic trends, contexts and hashtags change over time, so it is important to track how hashtag and context usage evolves. The following figure shows the top ten trending hashtags and their usage over time.

In [None]:
display(HTML('hashtags.html'))
print("Hover over the bars to see the hashtag count. Single click on the legend to switch on and off. Double click to hide others.")

Based on the above figure, the most common hashtags are `australiabushfire`, `tiredearth`, `australiaburning`, `australiaonfire`, `auspol`, `australianfire` and `australia`. Moreover, tweets should also be collected from an earlier start date, since the 2019-20 bushfire season was already under way before ``2020-01-01``.
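
Tracking hashtags over time amounts to counting, per period, the hashtags that are most used overall. A sketch with monthly buckets, assuming v2-style dicts whose `created_at` is an ISO 8601 string such as `2020-01-05T10:00:00.000Z`; the bucket size behind the figure is not stated, so monthly is an assumption.

```python
from collections import Counter, defaultdict

def monthly_hashtag_counts(tweets: list, top: int = 10) -> dict:
    """Per-month counts of the `top` most used hashtags overall."""
    overall = Counter()
    per_month = defaultdict(Counter)
    for t in tweets:
        month = t["created_at"][:7]  # "YYYY-MM" prefix of the ISO timestamp
        tags = [h["tag"].lower() for h in t.get("entities", {}).get("hashtags", [])]
        overall.update(tags)
        per_month[month].update(tags)
    top_tags = [tag for tag, _ in overall.most_common(top)]
    # Absent hashtags read as 0 because Counter returns 0 for missing keys.
    return {m: {tag: per_month[m][tag] for tag in top_tags} for m in sorted(per_month)}

def make_tweet(created_at, *tags):
    # Illustrative tweet with made-up timestamp and hashtags
    return {"created_at": created_at, "entities": {"hashtags": [{"tag": t} for t in tags]}}

sample = [
    make_tweet("2020-01-03T08:00:00.000Z", "AustraliaBurning"),
    make_tweet("2020-01-20T12:00:00.000Z", "australiaburning", "auspol"),
    make_tweet("2020-02-01T09:30:00.000Z", "auspol"),
]
print(monthly_hashtag_counts(sample))
# {'2020-01': {'australiaburning': 2, 'auspol': 1}, '2020-02': {'australiaburning': 0, 'auspol': 1}}
```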

Since hashtags alone are not sufficient to understand the context of a tweet, the top ten tweet contexts in the dataset are shown below.

In [None]:
from IPython.display import display, HTML
display(HTML('context_annotations.html'))
print("Hover over the bars to see the context count. Single click on the legend to switch on and off. Double click to hide others.")

## 3. Relevant hashtags related to the event
To identify relevant hashtags, the relevant context annotations in the dataset are identified first. In this case, the relevant context annotations are ``political issues``, ``global environmental issues``, ``politics``, ``science`` and ``rain relieves eastern australian fires as west prepares for extreme heat wave``.

After identifying the relevant context annotations, the hashtags used in tweets with a relevant context are marked.
The following are the hashtags used in tweets with a relevant context.

<br>
<img src='most_relevant_hashtags.png' alt='most_relevant_hashtags' width='800' title='most_relevant_hashtags'>
<br>

Based on the above figure, the following hashtags can be added to the search query: ``australia``, ``australiabushfire``, ``dumpfossilfuel``, ``australiafire`` and ``auspol``.
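
The selection described in this section (keep tweets with at least one relevant context annotation, then count their hashtags) can be sketched as follows. It assumes v2-style tweet dicts and that the relevant contexts are matched against lower-cased entity names; whether the notebook matched entity or domain names is not stated.

```python
from collections import Counter

# Relevant contexts listed above, lower-cased for matching
RELEVANT_CONTEXTS = {
    "political issues",
    "global environmental issues",
    "politics",
    "science",
    "rain relieves eastern australian fires as west prepares for extreme heat wave",
}

def relevant_hashtag_counts(tweets: list, relevant: set = RELEVANT_CONTEXTS) -> Counter:
    """Count hashtags in tweets that have at least one relevant context annotation."""
    counts = Counter()
    for t in tweets:
        names = {a["entity"]["name"].lower() for a in t.get("context_annotations", [])}
        if names & relevant:  # non-empty intersection means a relevant context
            counts.update(h["tag"].lower() for h in t.get("entities", {}).get("hashtags", []))
    return counts

def make_tweet(context, *tags):
    # Illustrative tweet with one context annotation and made-up hashtags
    return {"context_annotations": [{"domain": {"name": "example"}, "entity": {"name": context}}],
            "entities": {"hashtags": [{"tag": tag} for tag in tags]}}

sample = [
    make_tweet("Politics", "Auspol", "AustraliaFire"),
    make_tweet("Music", "auspol"),
    make_tweet("Science", "auspol"),
]
print(relevant_hashtag_counts(sample).most_common())  # [('auspol', 2), ('australiafire', 1)]
```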

## 4. Verified users and top tweets
Apart from the hashtags used in the most relevant tweets (based on context), hashtags are often brought into trend by popular users and viral tweets.

The following figure shows the proportion of tweets made by verified users for the most popular hashtags in the dataset. For most of the shortlisted hashtags, around 5% of tweets are made by verified users, the same as the proportion of verified users' tweets in the whole dataset.

<br>
<img src='hashtags_verified.png' alt='hashtags_verified' width='800' title='hashtags_verified'>
<br>
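
Per-hashtag verified proportions like those in the figure could be computed as below. This sketch again assumes the author's `verified` flag sits under a hypothetical `author` key on each tweet, and counts each hashtag at most once per tweet.

```python
from collections import Counter

def verified_share_by_hashtag(tweets: list, tags: set) -> dict:
    """Percentage of tweets from verified authors, per hashtag in `tags`."""
    total, verified = Counter(), Counter()
    for t in tweets:
        is_verified = bool(t.get("author", {}).get("verified"))  # hypothetical field
        # A set, so a hashtag repeated within one tweet is counted once
        for tag in {h["tag"].lower() for h in t.get("entities", {}).get("hashtags", [])}:
            if tag in tags:
                total[tag] += 1
                verified[tag] += int(is_verified)
    return {tag: 100 * verified[tag] / total[tag] for tag in total}

def make_tweet(verified, *tags):
    # Illustrative tweet with made-up hashtags
    return {"author": {"verified": verified},
            "entities": {"hashtags": [{"tag": tag} for tag in tags]}}

sample = [
    make_tweet(True, "auspol"),
    make_tweet(False, "AusPol", "nswfires"),
    make_tweet(False, "nswfires", "unrelated"),
]
print(verified_share_by_hashtag(sample, {"auspol", "nswfires"}))
```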

Besides hashtags, it is also important to look at the proportion of verified users tweeting about the popular contexts in the dataset.

<br>
<img src='context_annotations_verified.png' alt='context_annotations_verified' width='800' title='context_annotations_verified'>
<br>

The following are the popular users with the most tweets in the dataset. This list comprises news media and celebrity accounts.

In [None]:
display(HTML('popular_users.html'))
print("Hover over the bars to see the count and hashtags. Single click on the legend to switch on and off. Double click to hide others.")
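
A ranking like this one, with each user's most used hashtags for the hover text, might be computed as below. The `author.username` key is a hypothetical flattened field; in the Twitter API v2 the username lives on the user object linked through `author_id`.

```python
from collections import Counter, defaultdict

def top_users(tweets: list, n: int = 10, tags_per_user: int = 3) -> list:
    """(username, tweet count, most used hashtags) for the `n` most active users."""
    counts = Counter()
    user_tags = defaultdict(Counter)
    for t in tweets:
        user = t["author"]["username"]  # hypothetical flattened author field
        counts[user] += 1
        user_tags[user].update(h["tag"].lower() for h in t.get("entities", {}).get("hashtags", []))
    return [(user, count, [tag for tag, _ in user_tags[user].most_common(tags_per_user)])
            for user, count in counts.most_common(n)]

def make_tweet(username, *tags):
    # Illustrative tweet with made-up usernames and hashtags
    return {"author": {"username": username},
            "entities": {"hashtags": [{"tag": tag} for tag in tags]}}

sample = [
    make_tweet("news_a", "AustraliaFires", "auspol"),
    make_tweet("news_a", "auspol"),
    make_tweet("news_a", "nswfires"),
    make_tweet("celeb_b", "australiaburning"),
]
print(top_users(sample, n=1))  # [('news_a', 3, ['auspol', 'australiafires', 'nswfires'])]
```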

The following are the hashtags used by the most popular users in the dataset.

<br>
<img src='hashtags_top_verified_users.png' alt='hashtags_top_verified_users' width='800' title='hashtags_top_verified_users'>
<br>

One notable hashtag here is `nswfires`, which targets the New South Wales (NSW) region of Australia; it will be added to the search query.

The most retweeted tweet was authored by `_tisyia`, with a retweet count of ``14184 🔁``:
> RT @usmandevries: Muslims in Australia performed *Salat Ul  Istasqa" (a special prayers for the rain) & it rained in Australia now ALHAMDUL…


The most liked and most replied-to tweet was authored by ``Elizabeth_Ruler``, with ``560 🔁`` retweets, ``34 ❤️`` likes and ``29 ↩️`` replies:
> After the devastating bushfires Australia is coming back to life...
https://t.co/vR3KsN6OKm
#Tiredearth #Australia #AustraliaBurning #AustraliaBushfires #australiafire #ClimateChange #ClimateCrisis



The tweet with the most impressions was authored by ``darla_reeves`` 👁️:
> RT @DouglasReeves: Want to help Australia?  My friend John Hattie @john_hattie  recommends this as the place to give:  https://t.co/3m9AodA…
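
Finding the top tweet for each engagement metric comes down to a `max` over the tweets' `public_metrics`. A sketch assuming v2-style dicts: `retweet_count`, `reply_count`, `like_count` and `quote_count` are v2 public metric names, while the field the notebook used for impressions is not stated. Because the top retweeted item above is itself a retweet (`RT @...`), its count may belong to the original tweet, so the sketch lets retweets be skipped.

```python
def is_retweet(tweet: dict) -> bool:
    """True if the v2 `referenced_tweets` field marks this tweet as a retweet."""
    return any(r["type"] == "retweeted" for r in tweet.get("referenced_tweets") or [])

def top_by_metric(tweets: list, metric: str, skip_retweets: bool = False) -> dict:
    """Tweet with the highest value of a v2 `public_metrics` field (missing counts as 0)."""
    pool = [t for t in tweets if not (skip_retweets and is_retweet(t))]
    return max(pool, key=lambda t: t.get("public_metrics", {}).get(metric, 0))

# Illustrative tweets with made-up metrics
sample = [
    {"id": "1", "public_metrics": {"retweet_count": 900, "like_count": 0},
     "referenced_tweets": [{"type": "retweeted", "id": "7"}]},
    {"id": "2", "public_metrics": {"retweet_count": 40, "like_count": 12}},
    {"id": "3", "public_metrics": {"retweet_count": 5, "like_count": 30}},
]
print(top_by_metric(sample, "retweet_count")["id"])                      # 1
print(top_by_metric(sample, "retweet_count", skip_retweets=True)["id"])  # 2
print(top_by_metric(sample, "like_count")["id"])                         # 3
```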

## 5. Conclusion

This analysis helped in understanding the dataset of tweets about the Australian bushfires, including the hashtags and context of the tweets. The relevant hashtags for collecting further tweets were identified through the relevant contexts and the hashtags used by popular users.

Based on this analysis, the following keywords will be used to collect further tweets related to the Australian bushfires.

``australia``, ``australiabushfire``, ``dumpfossilfuel``, ``australiafire``, ``stopadani``, ``auspol``, ``nswfires``

Tweets will be collected from ``2019-06-01`` to ``2020-06-30``.
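
The keywords and date range above could be turned into search parameters along the lines of the sketch below. It assumes the Twitter API v2 search syntax, where terms are combined with `OR` (prefixing a term with `#` would restrict it to hashtags), and the `start_time`/`end_time` parameters, which take RFC 3339 timestamps. The 512-character query limit and the treatment of `end_time` as an exclusive bound are assumptions to check against the access level and endpoint in use.

```python
from datetime import date, timedelta

def build_query(keywords: list, max_len: int = 512) -> str:
    """OR-combine keywords into one search query; max_len is an assumed limit."""
    query = "(" + " OR ".join(keywords) + ")"
    if len(query) > max_len:
        raise ValueError(f"query has {len(query)} characters (assumed limit {max_len})")
    return query

def time_window(start: str, end: str) -> tuple:
    """Turn an inclusive date range into RFC 3339 start_time/end_time strings.

    The end is moved to midnight of the following day, assuming end_time acts as an
    exclusive upper bound. date.fromisoformat rejects impossible dates (e.g. June 31).
    """
    first, last = date.fromisoformat(start), date.fromisoformat(end)
    if last < first:
        raise ValueError("end date precedes start date")
    after_last = last + timedelta(days=1)
    return f"{first.isoformat()}T00:00:00Z", f"{after_last.isoformat()}T00:00:00Z"

keywords = ["australia", "australiabushfire", "dumpfossilfuel", "australiafire",
            "stopadani", "auspol", "nswfires"]
print(build_query(keywords))
# (australia OR australiabushfire OR dumpfossilfuel OR australiafire OR stopadani OR auspol OR nswfires)
print(time_window("2019-06-01", "2020-06-30"))
# ('2019-06-01T00:00:00Z', '2020-07-01T00:00:00Z')
```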