## Analysis of COVID-19 cases in Tamil Nadu

Import the basic libraries.

In [1]:
import pandas as pd
import numpy as np

We first scrape a table from Wikipedia that contains district-wise information on the COVID-19 situation in Tamil Nadu.
Because the page is updated continually, the results may differ each time this code is run.


In [2]:
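URL1 = "https://en.wikipedia.org/wiki/COVID-19_pandemic_in_Tamil_Nadu"
# read_html returns one DataFrame per table found on the page
dfs = pd.read_html(URL1, header=0)
# The district-wise table was at index 7 when this was run; the position may change as the page is edited
df = dfs[7]
# Row 1 holds the column names; the last column is dropped
columns = df.iloc[1, :-1]
# Row 2 is kept separately as the Tamil Nadu total
tn_total = df.iloc[2, :]
# Rows 3 to 39 hold the per-district figures
df = df.iloc[3:40, :-1]
columns[1] = "Diagnosed cases"
columns[4] = "Active cases"
df.columns = columns
# Columns between the district name and the date column
df_temp = df.iloc[:, 1:-1]
district = list(df["District"])
df.set_index("District", inplace=True, drop=True)
df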



District,Diagnosed cases,Deaths,Recovered cases,Active cases,Population,Cases per M,Last case reported on
Ariyalur,414,0,387,27,752481,550,20 June 2020
Chengalpattu,3620,45,1831,1744,2556423,1416,20 June 2020
Chennai,39641,559,21796,17286,7088000,5593,20 June 2020
Coimbatore,255,1,164,90,3172578,80,20 June 2020
Cuddalore,663,3,479,181,2600880,255,20 June 2020
Dharmapuri,30,0,16,14,1502900,20,20 June 2020
Dindigul,278,4,202,72,2161367,129,20 June 2020
Erode,78,1,72,5,2259608,35,20 June 2020
Kallakurichi,366,0,292,74,548950,667,20 June 2020
Kancheepuram,1095,10,538,547,1863174,588,20 June 2020
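
Because the Wikipedia page changes over time, the table's position in the list returned by `pd.read_html` (index 7 above) may shift. Below is a minimal sketch of one way to select the table by content instead. `find_table` is a hypothetical helper, not part of pandas, and the two stand-in tables hold made-up values so that the example runs offline. `pd.read_html` also accepts a `match` argument that filters tables by text, which may serve the same purpose.

```python
import pandas as pd

def find_table(tables, keyword, header_rows=3):
    """Return the first DataFrame whose column labels or first few rows mention keyword."""
    for table in tables:
        labels = [str(c) for c in table.columns]
        # Header text sometimes ends up in the first data rows, so check those too
        cells = table.head(header_rows).astype(str).to_numpy().ravel().tolist()
        if any(keyword in text for text in labels + cells):
            return table
    raise ValueError(f"no table mentions {keyword!r}")

# Stand-in tables (made-up values) in place of pd.read_html(URL1, header=0)
tables = [
    pd.DataFrame({"Date": ["1 June"], "Tests": ["100"]}),
    pd.DataFrame({"District": ["Ariyalur"], "Deaths": ["0"]}),
]
district_table = find_table(tables, "District")
```

With the real page, `tables` would be the list returned by `pd.read_html`, and the keyword would need to be one that appears only in the district-wise table.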


The numeric columns are read as strings and must be converted to integers before analysis. Districts whose rows contain any non-numeric value are dropped.

In [3]:
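droplist = []
numlist = [str(n) for n in range(10)]  # the characters "0" to "9"
for i in range(1, len(columns) - 1):
    for num, j in enumerate(df_temp[columns[i]]):
        # A cell is invalid if any of its characters is not a digit
        if any(k not in numlist for k in str(j)):
            # Record each district once, even if several of its columns are invalid,
            # so that the drop below is not attempted twice for the same district
            if df.index[num] not in droplist:
                droplist.append(df.index[num])
for i in droplist:
    df.drop(i, inplace=True)
    district.remove(i)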

df = df.astype({"Diagnosed cases":"int64","Deaths":"int64","Recovered cases":"int64","Active cases":"int64","Population":"int64","Cases per M":"int64"})
df

District,Diagnosed cases,Deaths,Recovered cases,Active cases,Population,Cases per M,Last case reported on
Ariyalur,414,0,387,27,752481,550,20 June 2020
Chengalpattu,3620,45,1831,1744,2556423,1416,20 June 2020
Chennai,39641,559,21796,17286,7088000,5593,20 June 2020
Coimbatore,255,1,164,90,3172578,80,20 June 2020
Cuddalore,663,3,479,181,2600880,255,20 June 2020
Dharmapuri,30,0,16,14,1502900,20,20 June 2020
Dindigul,278,4,202,72,2161367,129,20 June 2020
Erode,78,1,72,5,2259608,35,20 June 2020
Kallakurichi,366,0,292,74,548950,667,20 June 2020
Kancheepuram,1095,10,538,547,1863174,588,20 June 2020
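
The character-by-character check above can also be written with pandas string methods. This is a minimal sketch on a small made-up frame, where the `"N/A"` and `"1,095"` entries are invented to stand in for malformed cells. Note that `str.isdigit` also accepts some non-ASCII digit characters, so it is slightly more permissive than comparing against `"0"` to `"9"`.

```python
import pandas as pd

# Made-up rows; "N/A" and "1,095" stand in for malformed entries
raw = pd.DataFrame(
    {"Diagnosed cases": ["414", "N/A", "1,095"], "Deaths": ["0", "45", "10"]},
    index=pd.Index(["Ariyalur", "Chengalpattu", "Kancheepuram"], name="District"),
)

# True where every character of the cell is a digit
is_digits = raw.apply(lambda col: col.str.isdigit())

# Keep only districts whose cells are all valid, then convert to integers
clean = raw[is_digits.all(axis=1)].astype("int64")
```

Here only the `Ariyalur` row survives, because each of the other two rows has one cell that is not purely digits.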


Next, we convert the date column from strings to date objects.

In [4]:
import datetime

In [5]:
# Parse strings such as "20 June 2020" into datetime.date objects.
# Assigning the whole column at once avoids the chained indexing (df[col][i] = ...)
# that triggers pandas' SettingWithCopyWarning.
df['Last case reported on'] = [
    datetime.datetime.strptime(date_time_str, '%d %B %Y').date()
    for date_time_str in df['Last case reported on']
]
df


District,Diagnosed cases,Deaths,Recovered cases,Active cases,Population,Cases per M,Last case reported on
Ariyalur,414,0,387,27,752481,550,2020-06-20
Chengalpattu,3620,45,1831,1744,2556423,1416,2020-06-20
Chennai,39641,559,21796,17286,7088000,5593,2020-06-20
Coimbatore,255,1,164,90,3172578,80,2020-06-20
Cuddalore,663,3,479,181,2600880,255,2020-06-20
Dharmapuri,30,0,16,14,1502900,20,2020-06-20
Dindigul,278,4,202,72,2161367,129,2020-06-20
Erode,78,1,72,5,2259608,35,2020-06-20
Kallakurichi,366,0,292,74,548950,667,2020-06-20
Kancheepuram,1095,10,538,547,1863174,588,2020-06-20


One final pre-processing step is to combine newly created districts with the districts they were carved out of, because GeoJSON files that include the districts created in the last few years are not available.
This means we combine the following districts for the purpose of the choropleth map:

1. Tirunelveli and Tenkasi
2. Tirupattur, Ranipet and Vellore
3. Chengalpattu and Kancheepuram
4. Kallakurichi and Villupuram
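
The notebook does not show the merging step itself. Below is a minimal sketch of one way it could be done, assuming the merged row takes the name of the older district and that the GeoJSON uses those older names. `MERGE_INTO` and `merge_districts` are hypothetical names introduced here. The sample rows are taken from the table above, and the date column is left out for brevity.

```python
import pandas as pd

# Assumed mapping from each newer district to the older district it is merged into
MERGE_INTO = {
    "Tenkasi": "Tirunelveli",
    "Tirupattur": "Vellore",
    "Ranipet": "Vellore",
    "Chengalpattu": "Kancheepuram",
    "Kallakurichi": "Villupuram",
}

def merge_districts(df):
    """Sum the count columns over merged districts and recompute cases per million."""
    merged_name = df["District"].map(lambda d: MERGE_INTO.get(d, d))
    out = df.groupby(merged_name).agg({
        "Diagnosed cases": "sum",
        "Deaths": "sum",
        "Recovered cases": "sum",
        "Active cases": "sum",
        "Population": "sum",
    })
    # A rate cannot be summed, so it is recomputed from the merged totals
    out["Cases per M"] = (
        out["Diagnosed cases"] * 1_000_000 / out["Population"]
    ).round().astype("int64")
    return out.reset_index()

sample = pd.DataFrame({
    "District": ["Ariyalur", "Chengalpattu", "Kancheepuram"],
    "Diagnosed cases": [414, 3620, 1095],
    "Deaths": [0, 45, 10],
    "Recovered cases": [387, 1831, 538],
    "Active cases": [27, 1744, 547],
    "Population": [752481, 2556423, 1863174],
})
merged = merge_districts(sample)
```

The counts and populations are summed, while `Cases per M` is recomputed from the merged totals rather than added or averaged. If the date column were kept, taking the latest date within each group would be a natural choice.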

In [6]:
!conda install -c conda-forge folium=0.5.0 --yes
import folium

print('Folium installed and imported!')

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Folium installed and imported!


In [7]:
df["District"] = district
    
tn_geo = r'https://raw.githubusercontent.com/HindustanTimesLabs/shapefiles/master/state_ut/tamilnadu/district/tamilnadu_district.json'

latitude = 11.1271 
longitude = 78.6569

tn_map = folium.Map(location=[latitude, longitude], zoom_start=7)

tn_map.choropleth(
    geo_data=tn_geo,
    data=df,
    columns=['District','Active cases'],
    key_on='feature.properties.district',
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Active cases'  # the legend should name the column being plotted
)
tn_map
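
The choropleth joins each row to a shape by matching the `District` column against `feature.properties.district`, so any spelling difference between the two sources prevents that district from being matched. Below is a minimal sketch of a check that could be run before plotting. The `geo` dictionary is a hypothetical, heavily trimmed stand-in for the real file, with its property layout assumed from the `key_on` value above.

```python
# Hypothetical, trimmed GeoJSON with the property layout implied by key_on
geo = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"district": "Ariyalur"}, "geometry": None},
        {"type": "Feature", "properties": {"district": "Kancheepuram"}, "geometry": None},
    ],
}

# District names as they would appear in the DataFrame's District column
data_names = ["Ariyalur", "Chengalpattu", "Kancheepuram"]

geo_names = {feature["properties"]["district"] for feature in geo["features"]}

# Names present in the data but absent from the map, and the reverse
missing_from_map = sorted(set(data_names) - geo_names)
missing_from_data = sorted(geo_names - set(data_names))
```

In this example `Chengalpattu` has no matching shape, which is the situation the district merging step is meant to resolve. With the real file, `geo` would be obtained by downloading `tn_geo` and parsing it with `json.load`.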