## Project ideas

- General analysis: time series forecasting models, K-means clustering


- Twitter COVID analysis:
  + Visualize sentiment based on the available data, looking at articles
  + Predict case counts
  + What a social media monitoring dashboard would look like for COVID
  + What it would look like for crisis management

In [2]:
# Data handling and plotting
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Silence library warnings to keep the notebook output readable
import warnings
warnings.filterwarnings('ignore')

# Preprocessing and train/test splitting
from sklearn.preprocessing import OrdinalEncoder
from sklearn.model_selection import train_test_split

# Logistic regression, evaluation metrics, and feature selection
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.feature_selection import RFE

# Synthetic data generation and k-nearest neighbours
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Cross-validation and hyperparameter search
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV

## Some datasets to look at

In [10]:
covid = pd.read_csv('dataset/owid-covid-data.csv')

covid

Unnamed: 0,iso_code,continent,location,date,total_cases,new_cases,new_cases_smoothed,total_deaths,new_deaths,new_deaths_smoothed,...,male_smokers,handwashing_facilities,hospital_beds_per_thousand,life_expectancy,human_development_index,population,excess_mortality_cumulative_absolute,excess_mortality_cumulative,excess_mortality,excess_mortality_cumulative_per_million
0,AFG,Asia,Afghanistan,2020-01-03,,0.0,,,0.0,,...,,37.746,0.5,64.83,0.511,41128772.0,,,,
1,AFG,Asia,Afghanistan,2020-01-04,,0.0,,,0.0,,...,,37.746,0.5,64.83,0.511,41128772.0,,,,
2,AFG,Asia,Afghanistan,2020-01-05,,0.0,,,0.0,,...,,37.746,0.5,64.83,0.511,41128772.0,,,,
3,AFG,Asia,Afghanistan,2020-01-06,,0.0,,,0.0,,...,,37.746,0.5,64.83,0.511,41128772.0,,,,
4,AFG,Asia,Afghanistan,2020-01-07,,0.0,,,0.0,,...,,37.746,0.5,64.83,0.511,41128772.0,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
330827,ZWE,Africa,Zimbabwe,2023-07-29,265693.0,,,5712.0,1.0,0.143,...,30.7,36.791,1.7,61.49,0.571,16320539.0,,,,
330828,ZWE,Africa,Zimbabwe,2023-07-30,265693.0,0.0,0.0,5712.0,0.0,0.143,...,30.7,36.791,1.7,61.49,0.571,16320539.0,,,,
330829,ZWE,Africa,Zimbabwe,2023-07-31,265693.0,0.0,0.0,5712.0,0.0,0.143,...,30.7,36.791,1.7,61.49,0.571,16320539.0,,,,
330830,ZWE,Africa,Zimbabwe,2023-08-01,265693.0,0.0,0.0,5712.0,0.0,0.143,...,30.7,36.791,1.7,61.49,0.571,16320539.0,,,,


In [3]:
nswcovid = pd.read_csv('dataset/confirmed_cases_table4_location_likely_source.csv')

In [4]:
nswcovid

Unnamed: 0,notification_date,postcode,likely_source_of_infection,lhd_2010_code,lhd_2010_name,lga_code19,lga_name19
0,2020-01-25,2134,Overseas,X700,Sydney,11300,Burwood (A)
1,2020-01-25,2071,Overseas,X760,Northern Sydney,14500,Ku-ring-gai (A)
2,2020-01-25,2121,Overseas,X760,Northern Sydney,16260,Parramatta (C)
3,2020-01-27,2033,Overseas,X720,South Eastern Sydney,16550,Randwick (C)
4,2020-03-01,2163,Overseas,X710,South Western Sydney,12850,Fairfield (C)
...,...,...,...,...,...,...,...
79274,2021-11-18,2360,Under initial investigation,X800,Hunter New England,14200,Inverell (A)
79275,2021-11-18,2360,Under initial investigation,X800,Hunter New England,14200,Inverell (A)
79276,2021-11-18,2430,Locally acquired - linked to known case or clu...,X800,Hunter New England,15240,Mid-Coast (A)
79277,2021-11-18,2034,Under initial investigation,X720,South Eastern Sydney,16550,Randwick (C)


In [6]:
nswcovidage = pd.read_csv('dataset/confirmed_cases_table2_age_group_agg.csv')

In [7]:
nswcovidage

Unnamed: 0,notification_date,age_group,confirmed_by_pcr,confirmed_cases_count
0,2020-03-09,AgeGroup_0-19,,1
1,2020-03-09,AgeGroup_20-24,,1
2,2020-03-09,AgeGroup_25-29,,1
3,2020-03-09,AgeGroup_35-39,,3
4,2020-03-09,AgeGroup_40-44,,1
...,...,...,...,...
17920,2023-08-24,AgeGroup_60-64,Yes,2
17921,2023-08-24,AgeGroup_65-69,No,14
17922,2023-08-24,AgeGroup_65-69,Yes,5
17923,2023-08-24,AgeGroup_70+,No,11


In [8]:
nswcovidpcr = pd.read_csv('dataset/pcr_testing_table1_location_agg.csv')

In [9]:
nswcovidpcr

Unnamed: 0,test_date,postcode,lhd_2010_code,lhd_2010_name,lga_code19,lga_name19,test_count
0,2020-01-01,2038,X700,Sydney,14170,Inner West (A),1
1,2020-01-01,2039,X700,Sydney,14170,Inner West (A),1
2,2020-01-01,2040,X700,Sydney,14170,Inner West (A),2
3,2020-01-01,2041,X700,Sydney,14170,Inner West (A),1
4,2020-01-01,2069,X760,Northern Sydney,14500,Ku-ring-gai (A),1
...,...,...,...,...,...,...,...
552630,2023-02-08,2870,X850,Western NSW,16200,Parkes (A),10
552631,2023-02-08,2871,X850,Western NSW,12900,Forbes (A),2
552632,2023-02-08,2873,X850,Western NSW,14600,Lachlan (A),1
552633,2023-02-08,2874,X850,Western NSW,16200,Parkes (A),1


In [11]:
covid_pol = pd.read_csv('dataset/covid-19-testing-policy.csv')

covid_pol

Unnamed: 0,Entity,Code,Day,testing_policy
0,Afghanistan,AFG,2020-01-01,0
1,Afghanistan,AFG,2020-01-02,0
2,Afghanistan,AFG,2020-01-03,0
3,Afghanistan,AFG,2020-01-04,0
4,Afghanistan,AFG,2020-01-05,0
...,...,...,...,...
202755,Zimbabwe,ZWE,2022-12-27,3
202756,Zimbabwe,ZWE,2022-12-28,3
202757,Zimbabwe,ZWE,2022-12-29,3
202758,Zimbabwe,ZWE,2022-12-30,3


## About Stringency Index

The OxCGRT project calculates a Government Stringency Index, a composite measure of nine of the response metrics.

The nine metrics used to calculate the Government Stringency Index are: school closures; workplace closures; cancellation of public events; restrictions on public gatherings; closures of public transport; stay-at-home requirements; public information campaigns; restrictions on internal movements; and international travel controls.

The index on any given day is calculated as the mean score of the nine metrics, each taking a value between 0 and 100. See the authors’ full description of how this index is calculated.

A higher score indicates a stricter government response (i.e. 100 = strictest response). If policies vary at the subnational level, the index is shown as the response level of the strictest sub-region.

It’s important to note that this index simply records the strictness of government policies. It does not measure or imply the appropriateness or effectiveness of a country’s response. A higher score does not necessarily mean that a country’s response is ‘better’ than others lower on the index.
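The calculation described above can be sketched in a few lines. This is a minimal illustration of the description given here (a plain mean of nine scores, each between 0 and 100, with the strictest sub-region taken as the national value), not the OxCGRT implementation. The metric key names and both function names are hypothetical, chosen only for this example, and how each metric's 0 to 100 score is derived is left out.

```python
# Hypothetical key names for the nine metrics listed above.
METRICS = [
    "school_closures",
    "workplace_closures",
    "cancel_public_events",
    "restrictions_on_gatherings",
    "close_public_transport",
    "stay_at_home_requirements",
    "public_information_campaigns",
    "restrictions_on_internal_movement",
    "international_travel_controls",
]


def stringency_index(scores):
    """Return the mean of the nine metric scores for one day.

    `scores` maps each name in METRICS to a value between 0 and 100.
    """
    missing = [m for m in METRICS if m not in scores]
    if missing:
        raise ValueError(f"missing metric scores: {missing}")
    for name in METRICS:
        if not 0 <= scores[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    return sum(scores[m] for m in METRICS) / len(METRICS)


def national_index(sub_region_indices):
    """Return the index of the strictest sub-region (the highest value)."""
    return max(sub_region_indices)


# Example: eight metrics at 50 and school closures at 100.
example = {m: 50.0 for m in METRICS}
example["school_closures"] = 100.0
print(round(stringency_index(example), 2))
print(national_index([20.0, 75.5, 40.0]))
```

Because the index is a simple average, a single metric at its maximum moves the overall score by at most 100 / 9, or about 11 points.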

https://github.com/owid/covid-19-data/tree/master/public/data

## About testing policies

The testing policy dataset records government policies on testing for COVID-19. Note that this relates to PCR testing for the virus only; it does not include non-PCR antibody testing.

Countries are grouped into four categories:

- No testing policy
- Testing only for those who both (a) have symptoms AND (b) meet specific criteria (e.g. key workers, admitted to hospital, came into contact with a known case, returned from overseas)
- Testing of anyone showing COVID-19 symptoms
- Open public testing (e.g. “drive-through” testing available to asymptomatic people)
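To make the `testing_policy` column easier to read in plots and tables, the numeric codes can be mapped to the four categories above. The sketch below assumes that codes 0 to 3 follow the order in which the categories are listed (the data shown earlier contains 0 and 3, which is consistent with this, but the mapping is an assumption here). The dictionary and function names are hypothetical.

```python
# Assumed mapping: codes 0-3 in the same order as the category list above.
TESTING_POLICY_LABELS = {
    0: "No testing policy",
    1: "Symptomatic and meeting specific criteria",
    2: "Anyone showing COVID-19 symptoms",
    3: "Open public testing",
}


def describe_testing_policy(code):
    """Return a readable label for a testing_policy code, or 'Unknown'."""
    try:
        return TESTING_POLICY_LABELS[int(code)]
    except (KeyError, ValueError, TypeError):
        # Covers codes outside 0-3, NaN, and None.
        return "Unknown"


print(describe_testing_policy(0))
print(describe_testing_policy(3))
```

With the `covid_pol` DataFrame loaded earlier, something like `covid_pol['testing_policy'].map(describe_testing_policy)` should produce a labelled column.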