
## Impact of COVID19 on Mobility to Various Public Places





### Project Introduction


The project performs exploratory data analysis on the Community Mobility Reports published by Google during the pandemic. We analyse how people's mobility to various popular public places changed with respect to the number of COVID-19 cases registered, and we try to extract insights such as how visits and length of stay at different places changed compared to a baseline. The baseline is the median value for the corresponding day of the week during the 5-week period Jan 3–Feb 6, 2020.
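
To make the baseline definition concrete, here is a minimal sketch of how a percent change from a day-of-week median could be computed. The visit counts are hypothetical (Google does not publish raw counts), and the variable names are ours.

```python
import pandas as pd

# Hypothetical daily visit counts for one place category (not Google's raw data).
dates = pd.date_range("2020-01-03", "2020-02-23", freq="D")
visits = pd.Series(100.0, index=dates)
visits[dates >= "2020-02-17"] = 80.0  # simulate a 20% drop in the final week

# Baseline: median per day of week over the 5-week window Jan 3 - Feb 6, 2020.
window = visits["2020-01-03":"2020-02-06"]
baseline = window.groupby(window.index.dayofweek).median()

# Percent change of each day relative to the baseline for the same weekday.
day_baseline = pd.Series(visits.index.dayofweek, index=visits.index).map(baseline)
pct_change = (visits - day_baseline) / day_baseline * 100

print(pct_change.loc[pd.Timestamp("2020-02-16")], pct_change.loc[pd.Timestamp("2020-02-17")])
```

A value of 0 therefore means "same as the baseline for that weekday", and negative values mean fewer visits than the baseline.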



## Data Preparation

### Introduction of Data

Our analysis uses data provided by Google, collected from users who have opted in to Location History for their Google Account. We also use data provided by the CDC on the daily number of COVID-19 positive cases. We combine both datasets in order to analyse mobility trends with respect to COVID-19 positive cases.

In [1]:
import numpy as np
import pandas as pd

df_mobility = pd.read_csv('mobility.csv')
df_cases = pd.read_csv('cases_cleaned.csv')

In [2]:
print("shape of mobility dataset",df_mobility.shape)
print("shape of cases dataset",df_cases.shape)

shape of mobility dataset (14280, 9)
shape of cases dataset (14280, 16)


- Dataset 'mobility' comprises 14280 observations and 9 features.
- Dataset 'cases' comprises 14280 observations and 16 features.

In [3]:
# Prints the top 5 rows of the mobility dataset
df_mobility.head(5)

Unnamed: 0,state,iso_code,date,retail_and_recreation,grocery_and_pharmacy,parks,transit_stations,workplaces,residential
0,Alabama,AL,2020-02-15,5.0,2.0,39.0,7.0,2.0,-1.0
1,Alabama,AL,2020-02-16,0.0,-2.0,-7.0,3.0,-1.0,1.0
2,Alabama,AL,2020-02-17,3.0,0.0,17.0,7.0,-17.0,4.0
3,Alabama,AL,2020-02-18,-4.0,-3.0,-11.0,-1.0,1.0,2.0
4,Alabama,AL,2020-02-19,4.0,1.0,6.0,4.0,1.0,0.0


In [4]:
# Prints the top 5 rows of the cases dataset
df_cases.head(5)

Unnamed: 0,submission_date,state,tot_cases,conf_cases,prob_cases,new_case,pnew_case,tot_death,conf_death,prob_death,new_death,pnew_death,created_at,consent_cases,consent_deaths,total_tests
0,2/15/20,AK,0.0,,,0.0,,0.0,,,0.0,,3/26/20 16:22,,,0.0
1,2/16/20,AK,0.0,,,0.0,,0.0,,,0.0,,3/26/20 16:22,,,0.0
2,2/17/20,AK,0.0,,,0.0,,0.0,,,0.0,,3/26/20 16:22,,,0.0
3,2/18/20,AK,0.0,,,0.0,,0.0,,,0.0,,3/26/20 16:22,,,0.0
4,2/19/20,AK,0.0,,,0.0,,0.0,,,0.0,,3/26/20 16:22,,,0.0


List of columns of cases and mobility datasets

In [5]:
df_cases.columns

Index(['submission_date', 'state', 'tot_cases', 'conf_cases', 'prob_cases',
       'new_case', 'pnew_case', 'tot_death', 'conf_death', 'prob_death',
       'new_death', 'pnew_death', 'created_at', 'consent_cases',
       'consent_deaths', 'total_tests'],
      dtype='object')

In [6]:
df_mobility.columns

Index(['state', 'iso_code', 'date', 'retail_and_recreation',
       'grocery_and_pharmacy', 'parks', 'transit_stations', 'workplaces',
       'residential'],
      dtype='object')

In [7]:
df_mobility.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14280 entries, 0 to 14279
Data columns (total 9 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   state                  14280 non-null  object 
 1   iso_code               14280 non-null  object 
 2   date                   14280 non-null  object 
 3   retail_and_recreation  14280 non-null  float64
 4   grocery_and_pharmacy   14280 non-null  float64
 5   parks                  14280 non-null  float64
 6   transit_stations       14280 non-null  float64
 7   workplaces             14280 non-null  float64
 8   residential            14280 non-null  float64
dtypes: float64(6), object(3)
memory usage: 1004.2+ KB


In [8]:
df_cases.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14280 entries, 0 to 14279
Data columns (total 16 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   submission_date  14280 non-null  object 
 1   state            14280 non-null  object 
 2   tot_cases        14268 non-null  float64
 3   conf_cases       5947 non-null   float64
 4   prob_cases       5947 non-null   float64
 5   new_case         14263 non-null  float64
 6   pnew_case        11157 non-null  float64
 7   tot_death        14262 non-null  float64
 8   conf_death       6327 non-null   float64
 9   prob_death       6327 non-null   float64
 10  new_death        14262 non-null  float64
 11  pnew_death       11065 non-null  float64
 12  created_at       14262 non-null  object 
 13  consent_cases    12022 non-null  object 
 14  consent_deaths   12302 non-null  object 
 15  total_tests      13988 non-null  float64
dtypes: float64(11), object(5)
memory usage: 1.7+ MB


From the describe() tables below, we can observe the mean, maximum and minimum values of the mobility and cases columns. On average, mobility increased most for parks (mean of about +58) and decreased most for workplaces (mean of about -28).

In [9]:
df_mobility.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
retail_and_recreation,14280.0,-15.040126,16.608661,-77.0,-24.0,-13.0,-4.0,41.0
grocery_and_pharmacy,14280.0,-1.336485,11.198563,-62.0,-7.0,-1.0,5.0,61.0
parks,14280.0,58.381092,78.785946,-77.0,0.0,38.0,97.0,636.0
transit_stations,14280.0,-19.79937,23.00747,-82.0,-36.0,-20.0,-1.0,73.0
workplaces,14280.0,-27.609384,15.668133,-78.0,-38.0,-30.0,-16.0,18.0
residential,14280.0,8.477381,6.363153,-5.0,4.0,8.0,12.0,33.0


In [10]:
df_cases.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
tot_cases,14268.0,70649.603448,129817.6,0.0,2475.75,20795.5,87691.0,1072698.0
conf_cases,5947.0,72553.584496,80985.06,0.0,11651.0,44642.0,108785.0,634395.0
prob_cases,5947.0,3659.07281,5599.577,0.0,129.5,1252.0,4829.5,41184.0
new_case,14263.0,805.884246,1489.443,-10427.0,37.0,297.0,910.5,17844.0
pnew_case,11157.0,46.488931,197.4092,-6259.0,0.0,0.0,23.0,5014.0
tot_death,14262.0,2035.965082,3381.592,0.0,58.0,547.0,2420.0,20296.0
conf_death,6327.0,2614.613877,3040.045,0.0,449.5,1552.0,3566.0,14900.0
prob_death,6327.0,156.958116,320.612,0.0,0.0,38.0,207.0,5482.0
new_death,14262.0,16.004628,45.61464,-1824.0,0.0,4.0,16.0,2185.0
pnew_death,11065.0,0.922368,79.92104,-5482.0,0.0,0.0,0.0,5482.0


## Basic Data cleaning

### Dealing with Data types

Our datasets contain two kinds of data:
- Numeric
- Categorical

We convert the date columns to a datetime type so that we can filter the data by date.

In [11]:
df_mobility['date']= pd.to_datetime(df_mobility['date'])

df_cases['submission_date']= pd.to_datetime(df_cases['submission_date'])

In [12]:
print(df_mobility.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14280 entries, 0 to 14279
Data columns (total 9 columns):
 #   Column                 Non-Null Count  Dtype         
---  ------                 --------------  -----         
 0   state                  14280 non-null  object        
 1   iso_code               14280 non-null  object        
 2   date                   14280 non-null  datetime64[ns]
 3   retail_and_recreation  14280 non-null  float64       
 4   grocery_and_pharmacy   14280 non-null  float64       
 5   parks                  14280 non-null  float64       
 6   transit_stations       14280 non-null  float64       
 7   workplaces             14280 non-null  float64       
 8   residential            14280 non-null  float64       
dtypes: datetime64[ns](1), float64(6), object(2)
memory usage: 1004.2+ KB
None


We can observe that datatype of date columns has been changed to datetime.
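
The two files store dates in different text formats, as the head() previews show. As a small sketch (the sample strings are copied from those previews; the explicit format strings are our assumption about the full files), parsing with a stated format makes the conversion unambiguous:

```python
import pandas as pd

# Sample values copied from the two head() previews above.
mobility_dates = pd.Series(["2020-02-15", "2020-02-16"])
case_dates = pd.Series(["2/15/20", "2/16/20"])

# Explicit formats avoid any day/month guessing during parsing.
parsed_mobility = pd.to_datetime(mobility_dates, format="%Y-%m-%d")
parsed_cases = pd.to_datetime(case_dates, format="%m/%d/%y")

# Once parsed, both columns compare equal and can be filtered by date.
same = (parsed_mobility == parsed_cases).all()
after = parsed_cases[parsed_cases > pd.Timestamp("2020-02-15")]
```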

### Feature Selection


The raw 'mobility' dataset contains values for each county within a state. We focus primarily on state-wise trends, so we drop all rows that correspond to individual counties.



In [13]:
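# Keep state-level rows only. The county-level columns exist only in the raw
# export, so guard against files where they have already been removed.
df_mobility = df_mobility[df_mobility['state'].notnull()]
if 'county' in df_mobility.columns:
    df_mobility = df_mobility[df_mobility['county'].isnull()]
df_mobility = df_mobility.drop(columns=['county', 'metro_area', 'census_fips_code', 'country_code', 'country_region'], errors='ignore')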

We create a new column 'cases_percent' holding the percentage of COVID-19 positive cases, computed from the columns 'new_case' and 'total_tests'.

In [None]:
df_cases["cases_percent"]=(df_cases['new_case']*100)/df_cases['total_tests']

In [None]:
#checking if the new column 'cases_percent' has been added to df_cases
df_cases.tail()
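
The head() preview shows rows where 'total_tests' is 0, and dividing by zero yields NaN or inf. One possible way to handle this, sketched on toy values (the numbers are made up; only the column names come from the dataset), is to treat zero tests as unknown before dividing:

```python
import numpy as np
import pandas as pd

# Toy rows; column names follow the cases dataset, values are illustrative.
toy = pd.DataFrame({"new_case": [0.0, 5.0, 10.0], "total_tests": [0.0, 100.0, 0.0]})

# Treat zero tests as "unknown" so the ratio becomes NaN instead of inf.
tests = toy["total_tests"].replace(0, np.nan)
toy["cases_percent"] = toy["new_case"] * 100 / tests
```

The NaN values can then be handled together with the other missing values in the next step.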

### Handling missing data

A few values in the columns 'parks' and 'transit_stations' are null. We replace those values with zeros, which treats the missing days as being at the baseline, so that we can calculate mobility trends for those categories of places.


In [None]:
df_mobility.isnull().sum().sort_values(ascending=False)

In [None]:
df_mobility['parks']= df_mobility['parks'].fillna(0)
df_mobility['transit_stations']= df_mobility['transit_stations'].fillna(0)

In [None]:
df_mobility.isnull().sum().sort_values(ascending=False)
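
Zero-filling is a modelling choice, because 0 means "no change from baseline". The toy sketch below (made-up values) shows how it can pull a mean towards zero compared with simply skipping the missing days:

```python
import numpy as np
import pandas as pd

# Made-up mobility values with one missing day.
parks = pd.Series([40.0, np.nan, 20.0])

mean_skip = parks.mean()            # pandas skips NaN by default
mean_zero = parks.fillna(0).mean()  # missing day counted as "at baseline"
```

Here `mean_skip` is 30.0 and `mean_zero` is 20.0, so the choice matters most for columns with many gaps.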

The 'cases' dataset contains null values. We drop the columns that have a large number of null values and replace the remaining null values with zeros.

In [None]:
df_cases.isnull().sum().sort_values(ascending=False)

In [None]:
df_cases.drop(['prob_cases','created_at','conf_cases','prob_death','conf_death','pnew_death','pnew_case','consent_cases','consent_deaths'], axis = 1,inplace=True)

In [None]:
df_cases['new_death']= df_cases['new_death'].fillna(0)
df_cases['tot_death']= df_cases['tot_death'].fillna(0)
df_cases['new_case']= df_cases['new_case'].fillna(0)
df_cases['tot_cases']= df_cases['tot_cases'].fillna(0)
df_cases['cases_percent']= df_cases['cases_percent'].fillna(0)
df_cases['total_tests']= df_cases['total_tests'].fillna(0)

In [None]:
df_cases.isnull().sum().sort_values(ascending=False)
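
The list of columns to drop was chosen by hand above. As an alternative sketch, the same idea can be expressed with a null-fraction cut-off; the 0.5 threshold and the toy frame are assumptions for illustration, not the rule used in this notebook.

```python
import numpy as np
import pandas as pd

# Toy frame with one sparse column; values are made up.
toy = pd.DataFrame({
    "new_case": [1.0, 2.0, np.nan, 4.0],
    "conf_cases": [np.nan, np.nan, np.nan, 7.0],
    "state": ["AK", "AK", "AL", "AL"],
})

max_null_fraction = 0.5  # hypothetical cut-off
null_fraction = toy.isnull().mean()
sparse_cols = null_fraction[null_fraction > max_null_fraction].index.tolist()

# Drop the sparse columns, then zero-fill the remaining numeric gaps.
cleaned = toy.drop(columns=sparse_cols)
numeric_cols = cleaned.select_dtypes(include="number").columns
cleaned[numeric_cols] = cleaned[numeric_cols].fillna(0)
```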

Checking if both datasets have same number of rows

In [None]:
df_mobility.shape

In [None]:
df_cases.shape

Saving the cleaned dataset into a separate file

In [None]:
df_mobility.to_csv('mobility_cleaned.csv',header=True,index=False)
df_mobility= pd.read_csv('mobility_cleaned.csv')
df_mobility['date']=pd.to_datetime(df_mobility['date'])

### Merging columns of both datasets



In [None]:
df_cases['submission_date'] = pd.to_datetime(df_cases['submission_date'])

states_mobility = df_mobility['iso_code'].unique()

# Keep rows inside the study window for states present in the mobility data.
# Boolean filtering replaces DataFrame.append, which was removed in pandas 2.0.
in_window = df_cases['submission_date'].between('2020-02-15', '2020-11-20')
df_cases1 = df_cases[in_window & df_cases['state'].isin(states_mobility)].copy()


In [None]:
df_cases1.shape
df_cases1['submission_date']= pd.to_datetime(df_cases1['submission_date'])

In [None]:
df_cases1 = df_cases1.sort_values(by = 'state') 
df_mobility = df_mobility.sort_values(by = 'state') 
df_cases.tail()

In [None]:
df_mobility.head()

#### Adding the columns 'new_case', 'cases_percent' of df_cases1 to df_mobility dataset

In [None]:
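# Join on state code and date; plain column assignment aligns on the row index, not on state and date.
cases_cols = df_cases1[['state', 'submission_date', 'new_case', 'cases_percent']].rename(
    columns={'state': 'iso_code', 'submission_date': 'date', 'new_case': 'cases'})
df_mobility = df_mobility.merge(cases_cols, on=['iso_code', 'date'], how='left')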

In [None]:
df_mobility
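
When combining the two datasets it helps to confirm that every mobility row found exactly one matching case row. The sketch below uses toy frames (the Alaska parks value and all case counts are made up) and relies on the `validate` and `indicator` options of `DataFrame.merge`:

```python
import pandas as pd

mob = pd.DataFrame({
    "iso_code": ["AL", "AL", "AK"],
    "date": pd.to_datetime(["2020-02-15", "2020-02-16", "2020-02-15"]),
    "parks": [39.0, -7.0, 10.0],  # AK value is made up
})
cases = pd.DataFrame({
    "state": ["AK", "AL", "AL"],
    "submission_date": pd.to_datetime(["2020-02-15", "2020-02-15", "2020-02-16"]),
    "new_case": [0.0, 3.0, 4.0],  # made-up counts
})

merged = mob.merge(
    cases.rename(columns={"state": "iso_code", "submission_date": "date", "new_case": "cases"}),
    on=["iso_code", "date"],
    how="left",
    validate="one_to_one",  # raises if a (state, date) pair is duplicated
    indicator=True,         # adds a _merge column describing each row's match
)
unmatched = (merged["_merge"] != "both").sum()
```

Note that the two frames are in different row orders, yet each mobility row still receives the case count for its own state and date.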

### Exploratory Data Analysis


#### Summarising the mobility trends of different place categories across all the US States.

In [None]:
df=pd.DataFrame()

df['transit_stations']=df_mobility.groupby(['iso_code'])['transit_stations'].mean()
df['parks']=df_mobility.groupby(['iso_code'])['parks'].mean()
df['workplaces']=df_mobility.groupby(['iso_code'])['workplaces'].mean()
df['grocery_and_pharmacy']=df_mobility.groupby(['iso_code'])['grocery_and_pharmacy'].mean()
df['retail_and_recreation']=df_mobility.groupby(['iso_code'])['retail_and_recreation'].mean()
df['residential']=df_mobility.groupby(['iso_code'])['residential'].mean()
df['cases']=df_mobility.groupby(['iso_code'])['cases'].mean()
df['total cases']=df_mobility.groupby(['iso_code'])['cases'].sum()
state=df_mobility['iso_code'].unique()
df['state']=sorted(state)
sum1=df['cases'].sum()

df = df.sort_values(by = 'state')


In [None]:
df_color=pd.read_csv('red_blue.csv')
red_blue=[]
red_blue=df_color['Red/Blue'].tolist()
state_name=df_color['State/District'].tolist()
df['Red/Blue']=red_blue
df['state_name']=state_name


In [None]:
df

In [None]:
df.describe().T

From the above table we can observe the following:

- Mobility has dropped to a greater extent for places such as workplaces, transit_stations and retail_and_recreation.
- Mobility to parks has increased substantially.
- There is no significant difference in the mobility to grocery and pharmacy stores.

#### Heatmap showing correlation between columns of mobility dataset

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Correlate numeric columns only; recent pandas versions raise on text columns.
correlation = df_mobility.select_dtypes(include='number').corr()
plt.figure(figsize=(20, 20))
sns_plot = sns.heatmap(correlation, annot=True)
#plt.savefig("output.png")
plt.show()
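
For readers who want to see what the heatmap cells contain, here is a tiny sketch of the underlying computation. `DataFrame.corr` uses the Pearson coefficient by default; the values below are made up so that the two columns move in exactly opposite directions.

```python
import pandas as pd

# Made-up values: workplaces falls while residential rises in lockstep.
toy = pd.DataFrame({
    "state": ["AL"] * 4,
    "workplaces": [-10.0, -20.0, -30.0, -40.0],
    "residential": [2.0, 4.0, 6.0, 8.0],
})

# Text columns are excluded before computing pairwise Pearson correlations.
corr = toy.select_dtypes(include="number").corr()
r = corr.loc["workplaces", "residential"]
```

A coefficient near -1 like this one would appear as the strongest negative cell in the heatmap.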

### Histograms of the numeric columns of the state-level summary

In [None]:
sns.set_style("whitegrid");
num=df.select_dtypes(include=['int64','float64'])
num.hist(figsize=(20, 20))
#plt.savefig('Histograms.png')
plt.show() 



### Outlier Detection

In [None]:
columns = ['parks',
       'workplaces','transit_stations','retail_and_recreation','grocery_and_pharmacy','residential']

for i in columns:
    print(i)
    
    df_mobility[str(i)].plot(kind='box', subplots=True, sharex=False, sharey=False, figsize=(6,6))
   
    plt.show()
    

We can observe that there are outliers in every category of places.
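
The box plots mark points beyond 1.5 times the interquartile range as outliers (the matplotlib default). A minimal sketch of counting such points directly is shown below; the helper name `count_iqr_outliers` is ours, and apart from 636 (the maximum of 'parks' in the describe() table) the sample values are made up.

```python
import pandas as pd

def count_iqr_outliers(values: pd.Series, k: float = 1.5) -> int:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    mask = (values < q1 - k * iqr) | (values > q3 + k * iqr)
    return int(mask.sum())

parks = pd.Series([0.0, 10.0, 20.0, 30.0, 40.0, 636.0])
n_outliers = count_iqr_outliers(parks)
```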

In [None]:
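# Ten states with the fewest total cases (df itself keeps its current order).
df_bottom = df.sort_values('total cases').head(10)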

In [None]:
df_bottom

In [None]:
# Ten states with the most total cases.
df_top = df.sort_values('total cases').tail(10)
df_top

### States with lowest and highest positive cases

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
plt.bar(df_bottom['state'], df_bottom['total cases'])
plt.xlabel("States")
plt.ylabel("Cases")
plt.title("List of states with lowest positive cases")
#plt.savefig('low.png')
plt.show()

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
plt.bar(df_top['state'], df_top['total cases'])
plt.xlabel("States")
plt.ylabel("Cases")
plt.title("List of states with highest positive cases")
#plt.savefig('top.png')
plt.show()

In [None]:
df=df.round(2)

## Analysis &  Observations

### Visualizing Trends of Mobility

We visualise the trends of mobility to different public places during COVID-19 using graphs plotted with the plotly library. Hovering over an individual state shows the mobility figure for that state. Darker colours indicate states with higher mobility; as the colour intensity decreases, mobility is lower relative to the baseline.

To implement these graphs we need to install the following two libraries: <b>plotly</b> and <b>chart_studio</b>.

In [None]:
#pip install plotly

In [None]:
#pip install chart_studio

In [None]:
import chart_studio.plotly as py
import plotly.graph_objects as go
import pandas as pd
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import sys
if not sys.warnoptions:
    import warnings
    warnings.simplefilter("ignore")

### Total Number of positive cases - statewise

In [None]:
data = dict(type='choropleth',
            locations = df['state'],
            locationmode = 'USA-states',
            colorscale = 'Reds',
            text = df['state'],
            z = df['total cases'],
            colorbar = {'title':"cases Count"},
        
            )

layout = dict(title = 'Total number of cases across states',
              geo = dict(scope='usa')
           )

choromap = go.Figure(data = [data],layout = layout)
iplot(choromap)

The above graph shows the total number of COVID-19 positive cases. States in darker colours have more cases, and the number decreases as the colour intensity decreases. States like California, Texas and Florida clearly have a very large number of cases. On the other hand, states in the central part of America have comparatively fewer cases.

In [None]:
df=round(df,2)

In [None]:
df.loc[df['Red/Blue'] == 'red', 'state_color'] = '2'
df.loc[df['Red/Blue'] == 'blue', 'state_color'] = '1'
df = df.sort_values(by = 'Red/Blue')


## Summary of Mobility Trends in Red and Blue States


The graph below summarises the number of cases and the mobility trends for each of the 50 states; hover over a state to see its values. States in red were won by the Republicans and states in blue were won by the Democrats. This graph helps to analyse how the election campaign has impacted the mobility of people.

- States in Red - Republican states

- States in Blue - Democrat states

In [None]:
import plotly.express as px

fig = px.choropleth(df, locations=df['state'], hover_data=['state_name','total cases','parks','retail_and_recreation','workplaces','grocery_and_pharmacy','residential'],locationmode="USA-states",color=df['state_color'], scope="usa")

fig.show()

## Plots for change in mobility to different public places


We visualise the trends of mobility for each of the six public place categories. Hovering over an individual state shows the mobility figure for that state. Darker colours indicate states with higher mobility; as the colour intensity decreases, mobility is lower relative to the baseline.

#### 1. workplaces

In [None]:
data = dict(type='choropleth',
            locations = df['state'],
            locationmode = 'USA-states',
            colorscale = 'Reds',
            text = df['state'],
            z = df['workplaces'],
            colorbar = {'title':"Mobility"},
           # hover_data =df['parks']
            )

layout = dict(title = 'Mobility to workplaces',
              geo = dict(scope='usa')
           )

choromap = go.Figure(data = [data],layout = layout)
iplot(choromap)
#fig.show()

#### 2. transit_stations

In [None]:
data = dict(type='choropleth',
            locations = df['state'],
            locationmode = 'USA-states',
            colorscale = 'Reds',
            text = df['state'],
            z = df['transit_stations'],
            colorbar = {'title':"Mobility"},
           # hover_data =df['parks']
            )

layout = dict(title = 'Mobility to transit_stations',
              geo = dict(scope='usa')
           )

choromap = go.Figure(data = [data],layout = layout)
iplot(choromap)

#### 3. parks

In [None]:
data = dict(type='choropleth',
            locations = df['state'],
            locationmode = 'USA-states',
            colorscale = 'Reds',
            text = df['state'],
            z = df['parks'],
            colorbar = {'title':"Mobility"},
           # hover_data =df['parks']
            )

layout = dict(title = 'Mobility to parks',
              geo = dict(scope='usa')
           )

choromap = go.Figure(data = [data],layout = layout)
iplot(choromap)

#### 4. retail_and_recreation

In [None]:
data = dict(type='choropleth',
            locations = df['state'],
            locationmode = 'USA-states',
            colorscale = 'Reds',
            text = df['state'],
            z = df['retail_and_recreation'],
            colorbar = {'title':"Mobility"},
           # hover_data =df['parks']
            )

layout = dict(title = 'Mobility to retail_and_recreation',
              geo = dict(scope='usa')
           )

choromap = go.Figure(data = [data],layout = layout)
iplot(choromap)

#### 5. grocery_and_pharmacy

In [None]:
data = dict(type='choropleth',
            locations = df['state'],
            locationmode = 'USA-states',
            colorscale = 'Blues',
            text = df['state'],
            z = df['grocery_and_pharmacy'],
            colorbar = {'title':"Mobility"},
           # hover_data =df['parks']
            )

layout = dict(title = 'Mobility to grocery_and_pharmacy',
              geo = dict(scope='usa')
           )

choromap = go.Figure(data = [data],layout = layout)
iplot(choromap)

#### 6. residential

In [None]:
data = dict(type='choropleth',
            locations = df['state'],
            locationmode = 'USA-states',
            colorscale = 'Blues',
            text = df['residential'],
            z = df['residential'],
            colorbar = {'title':"Mobility"},
           # hover_data =df['parks']
            )

layout = dict(title = 'Mobility to residential places',
              geo = dict(scope='usa')
           )

choromap = go.Figure(data = [data],layout = layout)
iplot(choromap)
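
The six cells above differ only in the column plotted, the title and the colour scale. One way to reduce the repetition is a small helper that builds the same `data` and `layout` dictionaries; the function name `choropleth_spec` and the toy values are ours, and the sketch deliberately avoids importing plotly so that it runs on its own.

```python
import pandas as pd

def choropleth_spec(frame, value_col, title, colorscale="Reds"):
    """Return (data, layout) dicts in the same shape as the cells above."""
    data = dict(
        type="choropleth",
        locations=frame["state"],
        locationmode="USA-states",
        colorscale=colorscale,
        text=frame["state"],
        z=frame[value_col],
        colorbar={"title": "Mobility"},
    )
    layout = dict(title=title, geo=dict(scope="usa"))
    return data, layout

# Made-up state means, only to show the call.
toy = pd.DataFrame({"state": ["AL", "AK"], "workplaces": [-25.0, -22.0]})
data, layout = choropleth_spec(toy, "workplaces", "Mobility to workplaces")
```

With plotly imported as in the earlier cell, the result can be drawn with `iplot(go.Figure(data=[data], layout=layout))`.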

### Finding average change in mobility across the states

In [None]:
# Copy the slice so that adding a column does not trigger SettingWithCopyWarning.
df_filtered = df[['transit_stations','parks','retail_and_recreation','workplaces','grocery_and_pharmacy','residential']].copy()
# Unweighted mean of the six place categories for each state.
df_filtered['avg_drop'] = df_filtered.mean(axis=1)

In [None]:
df_filtered=round(df_filtered,2)

In [None]:
data = dict(type='choropleth',
            locations = df['state'],
            locationmode = 'USA-states',
            colorscale = 'Blues',
            text = df['state'],
            z = df_filtered['avg_drop'],
            colorbar = {'title':"Mobility"},
           # hover_data =df['parks']
            )

layout = dict(title = 'Average change in mobility to all six public places',
              geo = dict(scope='usa')
           )
choromap = go.Figure(data = [data],layout = layout)
iplot(choromap)

### Mobility trend in all 50 states (Red and Blue states)

In [None]:
from matplotlib.pyplot import figure


fig, ax = plt.subplots(figsize=(8,16))
df_filtered['avg_drop'].plot(kind='barh',color = df['Red/Blue'], legend = False, ax=ax)

ax.set_xlabel('Change in mobility')
ax.set_ylabel('States')

#fig.savefig("barh.png")

                 

In [None]:
df_places=round(df.describe().T,2)
df_places

In [None]:
df_filtered=df_filtered.sort_values('iso_code')
df_filtered['state']=state_name
df_filtered['Red/Blue']=df.sort_values('iso_code')['Red/Blue']
df_filtered=df_filtered.sort_values('avg_drop')
df_filtered_top=df_filtered.head(5)
df_filtered_bottom=df_filtered.tail(5)
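
The same top and bottom selection can also be written without an explicit sort. This is a sketch on made-up values using `DataFrame.nsmallest` and `DataFrame.nlargest`:

```python
import pandas as pd

# Made-up average changes, indexed by state code.
toy = pd.DataFrame(
    {"avg_drop": [-12.0, 3.5, -20.0, 1.0]},
    index=pd.Index(["AL", "ID", "HI", "WY"], name="iso_code"),
)

largest_drop = toy.nsmallest(2, "avg_drop")   # most negative change first
smallest_drop = toy.nlargest(2, "avg_drop")   # highest value first
```

Unlike `tail(5)` on an ascending sort, `nlargest` returns rows in descending order, which only affects the bar order in the plots.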

In [None]:
df_filtered_bottom

### Top and Bottom 5 states in mobility change

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
plt.bar(df_filtered_top['state'], df_filtered_top['avg_drop'],color=df_filtered_top['Red/Blue'])
plt.xlabel("States")
plt.ylabel("Average change in mobility")
plt.title("List of states with highest drop in mobility")
#plt.savefig('highdrop.png')
plt.show()

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
plt.bar(df_filtered_bottom['state'], df_filtered_bottom['avg_drop'],color=df_filtered_bottom['Red/Blue'])
plt.xlabel("States")
plt.ylabel("Average change in mobility")
plt.title("List of states with lowest drop in mobility")
#plt.savefig('lowdrop.png')
plt.show()

## Summary

The project performs exploratory data analysis on the Community Mobility Reports data provided by Google during the pandemic period. We analyze how the mobility of people to various popular public places has changed with respect to the number of COVID-19 cases registered, and extract insights such as how the visits and duration of stay at different places changed compared to a baseline. From our analysis of the mobility dataset we reached the following conclusions.

- Mobility dropped mostly in Blue states, while it increased in Red states.
- There is a large drop in mobility in places like Washington DC, Hawaii and Florida.
- These places are visited mostly by tourists, which may explain why the pandemic caused such a large drop in mobility there.
- On the other hand, there is not much decrease in mobility in states like Idaho, Wisconsin, Wyoming and South Dakota.
- These are mostly mid-central states, which were not affected much by COVID-19.



#### What was unique about the data? Did you have to deal with imbalance? What data cleaning did you do?

We combined the COVID-19 community mobility dataset and the COVID-19 cases dataset, covering February to November, to get the required results. There was a lot of imbalance between the two datasets. The data has two kinds of types, numeric and categorical. The date columns were of data type object, and we converted them to a datetime type. We also removed extra columns and irrelevant observations that are not useful for our analysis; for example, in feature selection the data contained rows for each county within a state, and since we concentrated on state-wise data we dropped the county-wise rows. The datasets had many missing values that we could not ignore, so we dropped the columns with many missing values and replaced the remaining missing values with 0. We also observed that there are outliers in all place categories.

#### Did you create any new additional features / variables?

We added the new columns 'cases' (taken from 'new_case') and 'cases_percent' from df_cases1 to the df_mobility dataset.

#### What was the process you used for evaluation?  What was the best result?

We used exploratory data analysis to analyze the mobility trends of different places across all the US states, and we observed that mobility has dropped to a greater extent in retail_and_recreation, transit_stations, and workplaces. We used a heatmap to show the correlation between the columns of the mobility dataset. Histograms show the distribution of the state-level summary columns, and bar charts show the states with the lowest and highest numbers of positive cases. We used the plotly and chart_studio libraries to analyze the mobility trends in different places, with values shown when hovering over a state. Darker colors indicate states with higher mobility; as the color intensity decreases, mobility is lower relative to the baseline.

### Future work

In the current project we performed our analysis only at the state level in the US. As future work, this analysis can be extended to the county level, which could give more accurate results and more insights. Because we are dealing with time-series data, including the counties would have made the data much more extensive and would have required more time.


## Conclusion

We used exploratory data analysis to analyze the mobility trends of different places across all the US states, and we observed that mobility has dropped to a greater extent in retail_and_recreation, transit_stations and workplaces. We used a heatmap to show the correlation between the columns of the mobility dataset. Histograms show the distribution of the state-level summary columns, and bar charts show the states with the lowest and highest numbers of positive cases.

We visualized the trends of mobility to different public places during COVID-19 using graphs plotted with the plotly library. Hovering over an individual state gives statistics such as the total number of positive cases and the change in mobility to the six public place categories. Darker colors indicate states with higher mobility; as the color intensity decreases, mobility is lower relative to the baseline. To implement these graphs we need to install two libraries, plotly and chart_studio.

 
## References
 
1. https://www.google.com/covid19/mobility/
 
2. https://www.gstatic.com/covid19/mobility/2020-11-13_US_Mobility_Report_en.pdf

3. https://docs.oracle.com/cd/E11882_01/datamine.112/e16808/algo_apriori.htm#DMCON287

4. https://plotly.com/python/choropleth-maps/

5. https://matplotlib.org/gallery/index.html#event-handling