#### **LSE Data Analytics Online Career Accelerator** 
#### Course 201: Data Analytics with Python

## Assignment template: Covid data

## Student Note
This template is intended to help you understand the suggested workflow and how to approach the questions. You are welcome to add code and Markdown cells to any section. Add code cells as applicable, and comment all of your code.

You can either include all the elements of a typical report in your notebook or submit a separate report. If you submit the notebook, embed images, links and text where appropriate alongside your notes.

**SPECIAL NOTE**
- Submit your Jupyter Notebook with the following naming convention: `LSE_DA201_assignment_[your name]_[your surname]`
- You should submit a zipped folder containing all the elements used in your notebook (data files, images, etc)

> ***Markdown notes:*** Remember to change cell types to `Markdown` and take a look here: [Markdown basics](https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax) to add formatted text, links and images to your markdown documents.

### 0) Environment preparation
These settings are provided for you. You do not need to make any changes.

In [1]:
# Import the required libraries and set plotting options
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set(rc = {'figure.figsize':(15,10)})

### 1) Assignment activity 1: [Prepare your GitHub repository](https://fourthrev.instructure.com/courses/313/pages/assignment-activity-1-prepare-your-github-repository?module_item_id=20647)

#### 1.1) Report/notebook expectations:
- Demonstrate your GitHub setup, showing that all the Jupyter Notebook files have been loaded and that updates have been pushed. (**Hint**: Make sure that your GitHub username, the directory structure and the updates are visible in the screenshot. Provide a zipped copy of the folder containing your submission notebook and any supporting files, such as images used in the notebook.)

#### Required: Report submission:
Insert URL to your public GitHub repository and a screenshot (double click cell to edit)
- [My Github Repo](https://github.com/username/reponame)
- Screenshot demo (replace with your own)

![My GitHub screenshot](http://github.com/apvoges/lse-ca/blob/main/GitHubScreenshot.png?raw=true)
(Note that this only works if your repo is set to **public**. Alternatively you need to refer to a local image and include this image in your submission.)

#### 1.2) Presentation expectations:
- Describe the role of workflow tools such as GitHub and how they can add value to organisations.

#### Optional for notebook/Required for presentation.
- You can use this cell as a placeholder for bullet points to include in your presentation.
- Note that this section will not be graded in the notebook; grades are awarded based on presentation content only.

(Double-click to edit)

### 2) Assignment activity 2: [Import and explore data](https://fourthrev.instructure.com/courses/313/pages/assignment-activity-2-import-and-explore-the-data?module_item_id=20648)

#### 2.1) Report expectations:
- Load the files `covid_19_uk_cases.csv` and `covid_19_uk_vaccinated.csv` and explore the data.
- Explore the data using the `info()`, `describe()` and `value_counts()` methods and the `shape` attribute, and note your observations regarding data types, number of records and features
- Identify missing data
- Filter/subset data
- Aggregate data (totals and by month)
- Note observations
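
As a rough illustration of the steps listed above, the sketch below runs them on a small made-up DataFrame. The column names (`Province/State`, `Date`, `Cases`) and all values are assumptions for demonstration only; the real files may be structured differently.

```python
import pandas as pd

# Hypothetical stand-in for the cases data; all values are made up
demo = pd.DataFrame({
    'Province/State': ['Gibraltar', 'Gibraltar', 'Bermuda', 'Bermuda'],
    'Date': ['2020-01-31', '2020-02-29', '2020-01-31', '2020-02-29'],
    'Cases': [1.0, 3.0, 2.0, None],
})

# Structure, summary statistics, dimensions and category counts
demo.info()
print(demo.describe())
print(demo.shape)
print(demo['Province/State'].value_counts())

# Missing values per column
missing = demo.isna().sum()
print(missing)

# Filter/subset: rows for a single province
gib = demo[demo['Province/State'] == 'Gibraltar']

# Aggregate: totals per province and per month
demo['Date'] = pd.to_datetime(demo['Date'])
total_by_province = demo.groupby('Province/State')['Cases'].sum()
total_by_month = demo.groupby(demo['Date'].dt.to_period('M'))['Cases'].sum()
print(total_by_province)
print(total_by_month)
```

Note that `sum()` skips missing values by default, so a province with gaps in its records can look smaller than it really is.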

In [2]:
# Load the covid cases and vaccine data sets as cov and vac respectively

In [None]:
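# File names are taken from section 2.1; adjust the paths if the files are stored elsewhere
cov = pd.read_csv('covid_19_uk_cases.csv')
vac = pd.read_csv('covid_19_uk_vaccinated.csv')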

In [None]:
# Explore the DataFrames with the appropriate functions

In [None]:
# Create DataFrame based on Gibraltar data
# Hint: newdf = df[df[col]==index]

In [None]:
# Explore behaviour over time

In [None]:
# Explore and note observations
# Are there any of the visualisations that could be added here to make it easier?

#### 2.2) Presentation expectations:
Use the process of exploring the data for Gibraltar as an example to give your team a brief description of the various phases. Keep it high level and focus on both the specifics of the case (first dose, second dose per region, total and over time) and brief observations regarding the process. Assignment activity 2 considers basic data exploration.
- Can we make decisions based on total numbers only, or do trends over time offer additional insights?
- Why is it important to explore the data, and what are the typical mistakes made in this phase?

### 3) Assignment activity 3: [Merge and analyse the data](https://fourthrev.instructure.com/courses/313/pages/assignment-activity-3-merge-and-analyse-the-data?module_item_id=20649)

#### 3.1) Report expectations:
- Merge and explore the data
- Convert the data type of the Date column from object to datetime
- Create a dataset that meets the expected parameters
- Add calculated features to dataframes (difference between first and second dose vaccinations)
- Filter and sort output
- Observe totals and percentages as a total and over time
- Note observations

Merge the DataFrames without duplicating columns. The new DataFrame (e.g. `covid`) will have `7584` rows and the following columns: Province/State, Country/Region, Date, Vaccinated, First Dose, Second Dose, Deaths, Cases, Recovered, Hospitalised.
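
One way to avoid duplicated columns is to merge on every column the two DataFrames share. The sketch below shows this on two made-up miniature DataFrames; the names `cov_demo`, `vac_demo`, `covid_demo` and `Dose Gap` are hypothetical, and the real files may contain further shared columns that would need to be added to `keys`.

```python
import pandas as pd

# Hypothetical miniature versions of cov and vac; all values are made up
cov_demo = pd.DataFrame({
    'Province/State': ['Gibraltar', 'Gibraltar'],
    'Country/Region': ['United Kingdom', 'United Kingdom'],
    'Date': ['2021-01-01', '2021-01-02'],
    'Deaths': [1, 2],
    'Cases': [10, 15],
})
vac_demo = pd.DataFrame({
    'Province/State': ['Gibraltar', 'Gibraltar'],
    'Country/Region': ['United Kingdom', 'United Kingdom'],
    'Date': ['2021-01-01', '2021-01-02'],
    'First Dose': [100, 120],
    'Second Dose': [40, 50],
})

# Merging on every shared column keeps a single copy of each key column,
# so no _x/_y suffixed duplicates appear in the result
keys = ['Province/State', 'Country/Region', 'Date']
covid_demo = pd.merge(cov_demo, vac_demo, on=keys, how='inner')

# Convert Date from object (string) to datetime
covid_demo['Date'] = pd.to_datetime(covid_demo['Date'])

# Calculated feature: first doses not yet followed by a second dose
covid_demo['Dose Gap'] = covid_demo['First Dose'] - covid_demo['Second Dose']
print(covid_demo.dtypes)
print(covid_demo)
```

Checking `covid_demo.shape` after the merge is a quick way to confirm that the row count matches the expected figure.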

In [None]:
# Join the DataFrames as covid where you merge cov and vac

In [None]:
# Explore the new DataFrame

In [None]:
# Fix the date column data type

In [None]:
# Clean up / drop unnecessary columns 

In [None]:
# Groupby and calculate difference between first and second dose

In [None]:
# Groupby and calculate difference between first and second dose over time

#### 3.2) Presentation expectations:
We use similar calculations and representations as we had in activity 2, but now expand to look at all provinces. Assignment 3 is concerned with exploring data in the context of a specific business question (as opposed to general exploration in assignment 2).
- What insights can be gained from the data? (Description of all regions, assumptions and concerns, trends or patterns you have observed.)
- Are there limitations or assumptions that need to be considered?
- Make sure to provide a brief overview of the data and typical considerations at this phase of analysis

### 4) Assignment activity 4: [Visualise and identify initial trends](https://fourthrev.instructure.com/courses/313/pages/assignment-activity-4-visualise-and-identify-initial-trends?module_item_id=21381)

The government wants to promote second-dose vaccinations and is looking for the first area in which to test a new campaign: the area with the highest number of people who have received a first dose but not a second dose.
- Where should they target?
- Which provinces have the highest number (actual numbers) and highest relative numbers (second dose only/first dose)?
- Visualise both outputs.
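
The absolute and relative views can rank provinces differently, as the minimal sketch below shows. The province names and figures are made up, and the percentage used here (first doses without a second dose, divided by first doses) is only one possible reading of "relative numbers".

```python
import pandas as pd

# Hypothetical dose totals per province; all values are made up
doses = pd.DataFrame({
    'Province/State': ['A', 'B', 'C'],
    'First Dose': [1000, 400, 800],
    'Second Dose': [800, 250, 760],
})

totals = doses.groupby('Province/State')[['First Dose', 'Second Dose']].sum()

# Absolute gap: first dose received, second dose outstanding
totals['Gap'] = totals['First Dose'] - totals['Second Dose']

# Relative gap as a percentage of first doses (an assumed definition)
totals['Gap %'] = (totals['Gap'] / totals['First Dose'] * 100).round(1)

by_absolute = totals.sort_values('Gap', ascending=False)
by_relative = totals.sort_values('Gap %', ascending=False)
print(by_absolute)
print(by_relative)
```

In this made-up example province A leads on absolute numbers while province B leads on the relative measure, which is why both outputs are worth visualising.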

#### 4.1) Report expectations:
- Consider additional features (deaths and recoveries)
- Visualise the data
- Note observations
 - Do deaths follow the same patterns observed in vaccination data (daily vs cumulative)?
 - Do we need to separate groups of data for specific variables and analyse them in isolation (Others) to be able to observe the patterns?
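
When comparing daily and cumulative patterns, it can help to convert one into the other. The sketch below assumes a cumulative series that starts from zero before the first recorded day; whether the `Deaths` column in the real data is cumulative is something to check, not something this example establishes.

```python
import pandas as pd

# Hypothetical cumulative deaths for one province; all values are made up
dates = pd.date_range('2021-01-30', periods=4, freq='D')
cumulative = pd.Series([5, 7, 7, 10], index=dates, name='Deaths')

# Daily figures are the day-to-day differences of the cumulative total;
# the first day keeps its own value (assumes the count started from zero)
daily = cumulative.diff().fillna(cumulative.iloc[0])

# Monthly totals smooth out day-to-day noise
monthly = daily.groupby(daily.index.to_period('M')).sum()
print(daily)
print(monthly)
```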

#### 4.2) Presentation expectations:
- What insights can be gained from the data?
- Why do we need to consider other features?
- **Note**: Cover the different features evaluated to improve decision making (deaths and recoveries), explain why it is important to explore data and use different views, and highlight two or three good-practice suggestions to get junior team members started.

In [None]:
# Absolute numbers

In [None]:
# Relative numbers (%)

In [None]:
# Sort and display

In [None]:
# Visualise

In [None]:
# Let's smooth out the data by looking at monthly figures

In [None]:
# Other features evaluated (data preparation, output and plots)

***Notes and observations:***
Your observations here. (Double click to edit)

***Examples could include:***
- Are there other trends in terms of recoveries or hospitalisations compared to other features that you found interesting and that may add value in terms of the decision making process?
- Any other observations regarding the data?
- Any suggestions for improvements and further analysis?
- What would your future data requirements be?

### 5) Assignment activity 5: External data: [Analyse the Twitter data](https://fourthrev.instructure.com/courses/313/pages/assignment-activity-5-analyse-the-twitter-data?module_item_id=21383)
In this section, you are supplied with a sample file and asked to determine whether there are additional hashtags or keywords that could provide insights into your COVID analysis. Although the sample set is limited, review the provided file, demonstrate the typical steps, and make recommendations on how similar datasets could be used in future to provide richer insights.

#### 5.1) Report expectations:
- Demonstrate basic ability to work with Twitter data
- Search for hashtags or keywords
- Create dataframes and visualisations
- Note observations

In [4]:
# Import the tweet dataset (`tweets_2.csv`)
import pandas as pd

# Assignment activity 5: import the Twitter data
tweets = pd.read_csv('tweets_2.csv')

# Print the shape of the DataFrame (rows, columns)
print(tweets.shape)

(100, 29)


In [5]:
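# The period of time over which the data were gathered.
# Note: created_at is stored as text, so this sort is alphabetical
# (Fri, Mon, ..., Sun) rather than chronological.
tweets.sort_values('created_at').copy()

# The tweets run from Fri Mar 18 to Mon Mar 21 2022 (rows 0-4 are dated Mar 21)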

Unnamed: 0,created_at,id,id_str,text,truncated,entities,metadata,source,in_reply_to_status_id,in_reply_to_status_id_str,...,retweet_count,favorite_count,favorited,retweeted,lang,possibly_sensitive,quoted_status_id,quoted_status_id_str,quoted_status,extended_entities
99,Fri Mar 18 10:47:26 +0000 2022,1504771451940974611,1504771451940974611,Right it couldn't happen here could it. Brexit...,True,"{'hashtags': [], 'symbols': [], 'user_mentions...","{'iso_language_code': 'en', 'result_type': 're...","<a href=""http://twitter.com/download/android"" ...",,,...,2,5,False,False,en,False,,,,
98,Fri Mar 18 11:20:59 +0000 2022,1504779896786243585,1504779896786243585,#COVID19 was mentioned on the death certificat...,True,"{'hashtags': [{'text': 'COVID19', 'indices': [...","{'iso_language_code': 'en', 'result_type': 're...","<a href=""https://mobile.twitter.com"" rel=""nofo...",,,...,0,0,False,False,en,False,,,,
97,Fri Mar 18 11:31:50 +0000 2022,1504782625491333123,1504782625491333123,@joeldommett @ZoeTheBall @kylieminogue so all...,True,"{'hashtags': [], 'symbols': [], 'user_mentions...","{'iso_language_code': 'en', 'result_type': 're...","<a href=""http://twitter.com/download/android"" ...",,,...,0,0,False,False,en,,,,,
96,Fri Mar 18 11:32:16 +0000 2022,1504782734924918830,1504782734924918830,"The sun is shining. The doors are open, it’s n...",True,"{'hashtags': [], 'symbols': [], 'user_mentions...","{'iso_language_code': 'en', 'result_type': 're...","<a href=""http://twitter.com/download/iphone"" r...",,,...,0,0,False,False,en,,,,,
95,Fri Mar 18 11:43:35 +0000 2022,1504785582656008202,1504785582656008202,#CovidIsNotOver #COVID19 #coronavirus #Omicron...,True,"{'hashtags': [{'text': 'CovidIsNotOver', 'indi...","{'iso_language_code': 'und', 'result_type': 'r...","<a href=""http://twitter.com/download/iphone"" r...",,,...,1,0,False,False,und,False,1.504715e+18,1.504715e+18,{'created_at': 'Fri Mar 18 07:03:00 +0000 2022...,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
30,Sun Mar 20 16:36:22 +0000 2022,1505584040170205191,1505584040170205191,ONS swabs and bloods done today. I even manage...,True,"{'hashtags': [], 'symbols': [], 'user_mentions...","{'iso_language_code': 'en', 'result_type': 're...","<a href=""http://twitter.com/download/android"" ...",,,...,0,1,False,False,en,False,,,,
29,Sun Mar 20 17:31:33 +0000 2022,1505597927074549766,1505597927074549766,Idiot anti-vax conspiracy theorists at Elk Mil...,True,"{'hashtags': [], 'symbols': [], 'user_mentions...","{'iso_language_code': 'en', 'result_type': 're...","<a href=""http://twitter.com/download/iphone"" r...",,,...,0,1,False,False,en,False,,,,
28,Sun Mar 20 18:22:43 +0000 2022,1505610806297300996,1505610806297300996,After #Johnson comparing #Brexit to the #Ukrai...,True,"{'hashtags': [{'text': 'Johnson', 'indices': [...","{'iso_language_code': 'en', 'result_type': 're...","<a href=""http://twitter.com/download/android"" ...",,,...,0,0,False,False,en,False,1.505588e+18,1.505588e+18,{'created_at': 'Sun Mar 20 16:52:22 +0000 2022...,
27,Sun Mar 20 20:57:05 +0000 2022,1505649654460997633,1505649654460997633,@Cella_dunn90 It's little to do with trade as ...,True,"{'hashtags': [], 'symbols': [], 'user_mentions...","{'iso_language_code': 'en', 'result_type': 're...","<a href=""http://twitter.com/download/android"" ...",1.505538e+18,1.505538e+18,...,0,0,False,False,en,,,,,
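
Because `created_at` is read in as text, sorting it orders the rows by weekday name rather than by time. A minimal sketch of parsing the column first is shown below; it uses three timestamps copied from the outputs in this notebook, and the format string is an assumption based on how those values look.

```python
import pandas as pd

# Three timestamps in the same format as the created_at column
sample = pd.DataFrame({'created_at': [
    'Mon Mar 21 21:45:28 +0000 2022',
    'Fri Mar 18 10:47:26 +0000 2022',
    'Sun Mar 20 20:57:05 +0000 2022',
]})

# Sorting the raw strings is alphabetical (Fri < Mon < Sun), not chronological
print(sample.sort_values('created_at')['created_at'].tolist())

# Parse to timezone-aware datetimes, then sort and read off the range
sample['created_at'] = pd.to_datetime(
    sample['created_at'], format='%a %b %d %H:%M:%S %z %Y')
ordered = sample.sort_values('created_at')
first, last = ordered['created_at'].min(), ordered['created_at'].max()
print(first.date(), 'to', last.date())
```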


In [6]:
# info() must be called with parentheses; `tweets.info` on its own only
# displays the bound method together with a preview of the DataFrame
tweets.info()


In [7]:
# Explore the data: info(), head()
tweets.head()

Unnamed: 0,created_at,id,id_str,text,truncated,entities,metadata,source,in_reply_to_status_id,in_reply_to_status_id_str,...,retweet_count,favorite_count,favorited,retweeted,lang,possibly_sensitive,quoted_status_id,quoted_status_id_str,quoted_status,extended_entities
0,Mon Mar 21 21:45:28 +0000 2022,1506024218571464717,1506024218571464717,@Johnrashton47 When #diabetes has been the big...,True,"{'hashtags': [{'text': 'diabetes', 'indices': ...","{'iso_language_code': 'en', 'result_type': 're...","<a href=""http://twitter.com/download/android"" ...",1.50566e+18,1.50566e+18,...,0,0,False,False,en,,,,,
1,Mon Mar 21 21:31:13 +0000 2022,1506020629849391104,1506020629849391104,Disturbing figures from @fsb_policy @indparltr...,True,"{'hashtags': [], 'symbols': [], 'user_mentions...","{'iso_language_code': 'en', 'result_type': 're...","<a href=""http://twitter.com/download/android"" ...",,,...,4,3,False,False,en,False,,,,
2,Mon Mar 21 19:04:53 +0000 2022,1505983803822592004,1505983803822592004,NEW: #Stormont MLAs voted 57 ~ 25 to extend Do...,True,"{'hashtags': [{'text': 'Stormont', 'indices': ...","{'iso_language_code': 'en', 'result_type': 're...","<a href=""http://twitter.com/download/iphone"" r...",,,...,2,9,False,False,en,False,,,,
3,Mon Mar 21 18:39:58 +0000 2022,1505977533841481731,1505977533841481731,I'm do sick of coming on twitter to see the sa...,True,"{'hashtags': [{'text': 'borisOut', 'indices': ...","{'iso_language_code': 'en', 'result_type': 're...","<a href=""http://twitter.com/download/android"" ...",,,...,0,0,False,False,en,,,,,
4,Mon Mar 21 18:23:08 +0000 2022,1505973299502850052,1505973299502850052,The rollout of new #COVID19 #Booster jabs to #...,True,"{'hashtags': [{'text': 'COVID19', 'indices': [...","{'iso_language_code': 'en', 'result_type': 're...","<a href=""https://mobile.twitter.com"" rel=""nofo...",,,...,0,1,False,False,en,False,,,,


In [8]:
# View the shape, data types and column names of the tweets DataFrame
print(tweets.shape)
print(tweets.dtypes)
print(tweets.columns)

# Count the missing values in each column
print("----------tweets DataFrame----------")
print(tweets.isnull().sum())

(100, 29)
created_at                    object
id                             int64
id_str                         int64
text                          object
truncated                       bool
entities                      object
metadata                      object
source                        object
in_reply_to_status_id        float64
in_reply_to_status_id_str    float64
in_reply_to_user_id          float64
in_reply_to_user_id_str      float64
in_reply_to_screen_name       object
user                          object
geo                           object
coordinates                   object
place                         object
contributors                 float64
is_quote_status                 bool
retweet_count                  int64
favorite_count                 int64
favorited                       bool
retweeted                       bool
lang                          object
possibly_sensitive            object
quoted_status_id             float64
quoted_status_id_str        

In [9]:
# drop the unnecessary columns
tweets.drop(columns=['created_at', 'id', 'id_str', 'truncated', 'entities',
                     'metadata', 'source', 'in_reply_to_status_id',
                     'in_reply_to_status_id_str', 'in_reply_to_user_id',
                     'in_reply_to_user_id_str', 'in_reply_to_screen_name', 
                     'user', 'geo', 'coordinates', 'place', 'contributors',
                     'is_quote_status', 'favorited', 'retweeted', 'lang',
                     'possibly_sensitive', 'quoted_status_id', 'quoted_status_id_str',
                     'quoted_status', 'extended_entities'],
            inplace=True)

tweets.head()

Unnamed: 0,text,retweet_count,favorite_count
0,@Johnrashton47 When #diabetes has been the big...,0,0
1,Disturbing figures from @fsb_policy @indparltr...,4,3
2,NEW: #Stormont MLAs voted 57 ~ 25 to extend Do...,2,9
3,I'm do sick of coming on twitter to see the sa...,0,0
4,The rollout of new #COVID19 #Booster jabs to #...,0,1


In [10]:
def contains_vaccine(x):
    """ does the text contain vaccine? """
    y = x.lower()
    return "vaccine" in y

print(contains_vaccine(x='vaccine'))

cv = tweets['text'].apply(contains_vaccine)

tweets[cv]

True


Unnamed: 0,text,retweet_count,favorite_count
21,It’s never too late to get your vaccine. You c...,2,2
56,"@theblade113 Hi @theblade113 , you are right, ...",0,3


In [11]:
def contains_covid(x):
    """ does the text contain COVID? """
    y = x.lower()
    return "covid" in y

print(contains_covid(x='covid'))

cc = tweets['text'].apply(contains_covid)

tweets[cc]

True


Unnamed: 0,text,retweet_count,favorite_count
0,@Johnrashton47 When #diabetes has been the big...,0,0
2,NEW: #Stormont MLAs voted 57 ~ 25 to extend Do...,2,9
3,I'm do sick of coming on twitter to see the sa...,0,0
4,The rollout of new #COVID19 #Booster jabs to #...,0,1
5,Trying to keep a mask in inside to see if we c...,0,2
...,...,...,...
94,Spliced alignments exercise 😃#RNAseq #training...,1,5
95,#CovidIsNotOver #COVID19 #coronavirus #Omicron...,1,0
96,"The sun is shining. The doors are open, it’s n...",0,0
97,@joeldommett @ZoeTheBall @kylieminogue so all...,0,0
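
The two helper functions above can also be written with pandas' vectorised string methods. The sketch below uses a made-up Series of texts (including a missing value) rather than the real tweets.

```python
import pandas as pd

# Hypothetical tweet texts; made up for illustration
texts = pd.Series([
    'Get your Vaccine today',
    'Nothing relevant here',
    '#COVID19 update',
    None,
])

# case=False ignores capitalisation; na=False treats missing text as no match
has_vaccine = texts.str.contains('vaccine', case=False, na=False)
has_covid = texts.str.contains('covid', case=False, na=False)
print(texts[has_vaccine])
print(has_covid.sum())
```

Unlike calling `x.lower()` inside `apply`, this version does not fail when a text value is missing.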


In [12]:
import re


In [13]:
tweettext = tweets['text']

print(type(tweettext))

<class 'pandas.core.series.Series'>


In [14]:
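# Note: str() on a Series returns its printed (truncated) representation,
# not the full text of every tweet
tweettext = str(tweettext)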

In [16]:
tweettext

"0     @Johnrashton47 When #diabetes has been the big...\n1     Disturbing figures from @fsb_policy @indparltr...\n2     NEW: #Stormont MLAs voted 57 ~ 25 to extend Do...\n3     I'm do sick of coming on twitter to see the sa...\n4     The rollout of new #COVID19 #Booster jabs to #...\n                            ...                        \n95    #CovidIsNotOver #COVID19 #coronavirus #Omicron...\n96    The sun is shining. The doors are open, it’s n...\n97    @joeldommett @ZoeTheBall @kylieminogue  so all...\n98    #COVID19 was mentioned on the death certificat...\n99    Right it couldn't happen here could it. Brexit...\nName: text, Length: 100, dtype: object"

https://regex101.com/r/wP6yX5/1

In [17]:
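# One or more '#' followed by letters, digits, underscores or literal parentheses
pat = r'(#+[a-zA-Z0-9(_)]{1,})'

# findall returns the list of hashtags found in each tweet
hashtags = tweets['text'].str.findall(pat, flags=0)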

hashtags

0                                 [#diabetes, #COVID19]
1                                                    []
2                                 [#Stormont, #covid19]
3        [#borisOut, #BrexitDisaster, #Trans, #COVID19]
4         [#COVID19, #Booster, #vulnerable, #IsleofMan]
                            ...                        
95    [#CovidIsNotOver, #COVID19, #coronavirus, #Omi...
96                                                   []
97                                                   []
98                         [#COVID19, #NorthernIreland]
99                                                   []
Name: text, Length: 100, dtype: object

In [20]:
clean = pd.DataFrame(hashtags, columns = ['text'])

clean

Unnamed: 0,text
0,"[#diabetes, #COVID19]"
1,[]
2,"[#Stormont, #covid19]"
3,"[#borisOut, #BrexitDisaster, #Trans, #COVID19]"
4,"[#COVID19, #Booster, #vulnerable, #IsleofMan]"
...,...
95,"[#CovidIsNotOver, #COVID19, #coronavirus, #Omi..."
96,[]
97,[]
98,"[#COVID19, #NorthernIreland]"


In [23]:
clean['text'].value_counts()

TypeError: unhashable type: 'list'

Exception ignored in: 'pandas._libs.index.IndexEngine._call_map_locations'
Traceback (most recent call last):
  File "pandas\_libs\hashtable_class_helper.pxi", line 5231, in pandas._libs.hashtable.PyObjectHashTable.map_locations
TypeError: unhashable type: 'list'


[]                                                                                                                             29
[#COVID19]                                                                                                                     27
[#covid19]                                                                                                                      2
[#diabetes, #COVID19]                                                                                                           1
[#BorisJohnson, #covid19]                                                                                                       1
[#Russia, #Ukraine, #Brexit]                                                                                                    1
[#Covid_19, #COVID19, #COVID, #sleep, #insomnia]                                                                                1
[#Uber, #COVID19]                                                                         
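
Calling `value_counts()` on a column of lists counts whole combinations of hashtags and triggers the unhashable-type message shown above. A minimal sketch of counting individual hashtags instead is given below, using a made-up Series shaped like `hashtags`; lower-casing the tags is an optional design choice, not part of the original analysis.

```python
import pandas as pd

# Hypothetical per-tweet hashtag lists, shaped like the hashtags Series
tag_lists = pd.Series([
    ['#diabetes', '#COVID19'],
    [],
    ['#Stormont', '#covid19'],
    ['#COVID19'],
])

# explode() gives each hashtag its own row; empty lists become NaN and are dropped
tags = tag_lists.explode().dropna()

# Lower-casing merges variants such as #COVID19 and #covid19
tag_counts = tags.str.lower().value_counts()
print(tag_counts)
```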

In [25]:
import numpy as np

test = np.array(hashtags)

test

array([list(['#diabetes', '#COVID19']), list([]),
       list(['#Stormont', '#covid19']),
       list(['#borisOut', '#BrexitDisaster', '#Trans', '#COVID19']),
       list(['#COVID19', '#Booster', '#vulnerable', '#IsleofMan']),
       list(['#COVID19']), list([]),
       list(['#breaking', '#ShanghaiDisneyland', '#COVID19', '#China']),
       list([]), list([]), list(['#COVID19', '#StPatricksDay']),
       list(['#Television', '#University', '#UniversityofWestminster', '#competition']),
       list(['#COVID19', '#hmrc', '#banking']),
       list(['#COVID19', '#CovidIsNotOver']), list([]), list([]),
       list(['#hospital', '#COVID19', '#Devon']),
       list(['#COVID19', '#Scotland']), list(['#carehomes', '#immune']),
       list([]), list([]),
       list(['#COVID19', '#covid', '#CovidVaccination']),
       list(['#COVID19']), list(['#HongKong', '#COVID19']),
       list(['#covid19']), list([]), list(['#Erasmus', '#COVID19']),
       list([]), list(['#Johnson', '#Brexit', '#Ukraine', 

In [26]:
def flatmatrix(matrix):
    """ flatten a sequence of lists into a single list """
    result = []
    for row in matrix:
        result.extend(row)
    # Return the list (rather than only printing it) so it can be reused
    return result

print(flatmatrix(test))

['#diabetes', '#COVID19', '#Stormont', '#covid19', '#borisOut', '#BrexitDisaster', '#Trans', '#COVID19', '#COVID19', '#Booster', '#vulnerable', '#IsleofMan', '#COVID19', '#breaking', '#ShanghaiDisneyland', '#COVID19', '#China', '#COVID19', '#StPatricksDay', '#Television', '#University', '#UniversityofWestminster', '#competition', '#COVID19', '#hmrc', '#banking', '#COVID19', '#CovidIsNotOver', '#hospital', '#COVID19', '#Devon', '#COVID19', '#Scotland', '#carehomes', '#immune', '#COVID19', '#covid', '#CovidVaccination', '#COVID19', '#HongKong', '#COVID19', '#covid19', '#Erasmus', '#COVID19', '#Johnson', '#Brexit', '#Ukraine', '#Russia', '#COVID19', '#Partygate', '#COVID19', '#COVID19', '#covid19', '#RecoveryTrial', '#COVID19', '#COVID19', '#vaccinations', '#testingpositive', '#COVID19', '#coronavirus', '#covid19UK', '#COVID19', '#COVID19', '#lostsmellandtaste', '#COVID19', '#covid19', '#friday20thmarch2020', '#marinadalglishcentre', '#China', '#COVID19', '#COVID19', '#COVID19', '#Russia'

In [28]:
from collections import Counter

# Flatten the per-tweet hashtag lists into one list
flat_text = flatmatrix(test)

word_counts = Counter(flat_text)
# The three most frequent hashtags
top_three = word_counts.most_common(3)
print(top_three)
# Source: https://itw01.com/2UIXEFU.html

In [13]:
# Explore the structure, count the tweets, get the elements of interest


In [None]:
# Create a dataframe with the text only


In [None]:
# Loop through the messages and build a list of values containing the #-symbol


In [None]:
# Filter and sort


In [None]:
# Plot


#### 5.2) Presentation expectations:
Discuss whether external data could potentially be used and whether it is a viable solution to pursue. Discuss your assumptions and suggestions. 

Points to consider:
- What insights can be gained from the data?
- What are the advantages and disadvantages of using external data?
- How would you suggest using external data in the project?

### 6) Assignment activity 6: [Perform time-series analysis](https://fourthrev.instructure.com/courses/313/pages/assignment-activity-6-perform-time-series-analysis?module_item_id=22584)

#### 6.1) Report expectations:
- Demonstrate the use of an external function and interpret the results
- Note observations

In [None]:
# You can copy and paste the relevant code cells from the provided template here.

#### 6.2) Presentation expectations:
- **Question 1**: We have heard of both qualitative and quantitative data from the previous consultant. What are the differences between the two? Should we use only one or both of these types of data and why? How can these be used in business predictions? Could you provide examples of each?
- **Question 2**: We have also heard a bit about the need for continuous improvement. Why should this be implemented? It seems like a waste of time. Why can’t we just implement the current project as it stands and move on to other pressing matters?
- **Question 3**: As a government, we adhere to all data protection requirements and have good governance in place. We only work with aggregated data and therefore will not expose any personal details. We have covered everything from a data ethics standpoint, correct? There’s nothing else we need to implement from a data ethics perspective, right?