# LearnPlatform COVID-19 Impact

**Problem Statement:**

* The **COVID-19** pandemic has disrupted learning for more than **56 million** students in the United States.
 * In the spring of **2020**, most *state and local governments* across the U.S. closed educational institutions to stop the spread of the virus.
 * In response, schools and teachers have attempted to reach students remotely through *distance learning tools and digital platforms.*
 * Concerns about an exacerbated *digital divide and long-term learning loss* among America’s most vulnerable learners continue to grow.

To get started, we use Python for data processing, exploration, and insights.

First, import the Python libraries needed to load and explore the data; further libraries can be imported later as needed.

In [None]:
import glob
import pandas as pd
import numpy as np
import missingno as msno
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.offline as po
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.express as px
import random
import plotly.figure_factory as ff

In [None]:
product_df = pd.read_csv("/kaggle/input/learnplatform-covid19-impact-on-digital-learning/products_info.csv")
district_df = pd.read_csv("/kaggle/input/learnplatform-covid19-impact-on-digital-learning/districts_info.csv")

| No. | Feature Name | Description of the feature |
| :-- | :--| :--| 
|01| **district_id**   | The unique identifier of the school district|
|02| **state** | The state where the district is located |
|03| **locale**   | NCES locale classification that categorizes U.S. territory into four types of areas: City, Suburban, Town, and Rural. See Locale Boundaries User's Manual for more information. |
|04| **pct_black/hispanic** | Percentage of students in the districts identified as Black or Hispanic based on 2018-19 NCES data |
|05| **pct_free/reduced**   | Percentage of students in the districts eligible for free or reduced-price lunch based on 2018-19 NCES data |
|06| **county_connections_ratio**   | Ratio of residential fixed high-speed connections (over 200 kbps in at least one direction) to households, based on county-level data from FCC Form 477 (December 2018 version). See FCC data for more information.|
|07| **pp_total_raw**   | Per-pupil total expenditure (sum of local and federal expenditure) from Edunomics Lab's National Education Resource Database on Schools (NERD$) project. The expenditure data are school-by-school, and we use the median value to represent the expenditure of a given school district.|

In [None]:
path = "/kaggle/input/learnplatform-covid19-impact-on-digital-learning/engagement_data"
all_files = glob.glob(path + "/*.csv")

li = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)
    
engagement_df = pd.concat(li, axis=0, ignore_index=True)
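The loop above discards the file names, so rows from different files cannot be told apart after concatenation. The sketch below shows one way to keep that information. It assumes, without confirmation from this notebook, that each file stem (for example `1000` in `1000.csv`) identifies a district. The helper name `load_engagement` and the demo columns are illustrative only.

```python
import glob
import os
import tempfile

import pandas as pd


def load_engagement(folder):
    """Concatenate every CSV in `folder`, tagging each row with its file stem."""
    frames = []
    for filename in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        frame = pd.read_csv(filename)
        # Assumption: the file stem (e.g. "1000" for "1000.csv") is the district id.
        frame["district_id"] = os.path.splitext(os.path.basename(filename))[0]
        frames.append(frame)
    return pd.concat(frames, axis=0, ignore_index=True)


# Self-contained demo with two made-up files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"lp_id": [1, 2], "pct_access": [0.1, 0.2]}).to_csv(
        os.path.join(tmp, "1000.csv"), index=False
    )
    pd.DataFrame({"lp_id": [3], "pct_access": [0.3]}).to_csv(
        os.path.join(tmp, "2000.csv"), index=False
    )
    demo_df = load_engagement(tmp)

print(demo_df)
```

The stem is stored as a string here; if it has to be joined against an integer `district_id` column, it would need to be cast with `astype(int)` first.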

In [None]:
# Shape of the data files ( number of rows and number of columns) 
print('\033[1m'"Shape of the Engagement File "'\033[0m',engagement_df.shape )
print('\033[1m'"Shape of the District file"'\033[0m', district_df.shape)
print('\033[1m'"Shape of the Product File"'\033[0m',product_df.shape)

In [None]:
engagement_df.head(10).style.set_caption("Engagement Dataframe").set_properties(**{'background-color': 'black',
                           'color': 'lawngreen','border': '1.5px  solid white'})

In [None]:
district_df.head(10).style.set_caption("District Dataframe").set_properties(**{'background-color': 'black',
                           'color': 'lawngreen','border': '1.5px  solid white'})

In [None]:
product_df.head(10).style.set_caption("Product Dataframe").set_properties(**{'background-color': 'black',
                           'color': 'lawngreen','border': '1.5px  solid white'})

In [None]:
print('\033[1m'"Data types of each column in district data file\n"'\033[0m',district_df.dtypes)
print('\033[1m'"Data types of each column in product data file\n"'\033[0m',product_df.dtypes)
print('\033[1m'"Data types of each column in engagement data file\n"'\033[0m',engagement_df.dtypes)

Missing Values

In [None]:
# Missing Value check
print('\033[1m'"Missing value present in each column of district data file\n"'\033[0m',district_df.isna().any())
print('\033[1m'"Missing value present in each column of product data file\n"'\033[0m',product_df.isna().any())
print('\033[1m'"Missing value present in each column of engagement data file\n"'\033[0m',engagement_df.isna().any())

Missing Values count

In [None]:
#Missing value count  
print('\033[1m'"Missing value count in each column of district data file\n"'\033[0m',district_df.isna().sum())
print('\033[1m'"Missing value count in each column of product data file\n"'\033[0m',product_df.isna().sum())
print('\033[1m'"Missing value count in each column of engagement data file\n"'\033[0m',engagement_df.isna().sum())
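Raw counts are hard to compare across dataframes of very different sizes, so a percentage column can help. The following is a minimal sketch on made-up data; `missing_summary` is a hypothetical helper, not part of pandas or of this notebook.

```python
import numpy as np
import pandas as pd


def missing_summary(df):
    """Return missing-value count and percentage per column, worst first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing_count": counts,
        "missing_pct": (counts / len(df) * 100).round(1),
    })
    return summary.sort_values("missing_count", ascending=False)


# Made-up frame with the same kind of gaps as the district data.
demo = pd.DataFrame({
    "state": ["Utah", None, "Ohio", None],
    "locale": ["City", "Rural", np.nan, "Town"],
    "district_id": [1, 2, 3, 4],
})
summary = missing_summary(demo)
print(summary)
```

The same function could be called on `district_df`, `product_df`, and `engagement_df` in turn.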

In [None]:
# Visualize how missingness in one column correlates with missingness in the others as a heatmap

msno.heatmap(district_df,figsize=(10,5))

In [None]:
# Visualize the correlation between the number of missing values in different columns as a heatmap

msno.heatmap(product_df,figsize=(10,5))

In [None]:
# Visualize the correlation between the number of missing values in different columns as a heatmap

msno.heatmap(engagement_df,figsize=(10,5))

In [None]:
# Drop the rows of the district dataframe where the state is missing
district_df = district_df[district_df['state'].notna()].reset_index(drop=True)

In [None]:
# Recheck the number of missing values in each column of the district dataframe

district_df.isna().sum()

In [None]:
#First Five records 

fig = ff.create_table(district_df.head(5),height_constant=50)
fig.update_layout(width=3500, height=400)
fig.show()

In [None]:
#Last five records 

colorscale = [[0, 'red'],[.5, '#DCE775'],[1, '#C0CA33']]
font=['white', '#212121' , 'red']
fig = ff.create_table(district_df.tail(5),height_constant=50,colorscale=colorscale,font_colors=font)
for i in range(len(fig.layout.annotations)):
    fig.layout.annotations[i].font.size = 17
fig.update_layout(width=4500, height=400)
fig.show()

In [None]:
# Number of districts per state
plt.figure(figsize=(10,12))
sns.countplot(y ='state',data = district_df,order=district_df['state'].value_counts().index)
plt.show()

In [None]:
# Pie chart of districts per state
# Labels come from value_counts() itself so that each label matches its count
state_counts = district_df["state"].value_counts()
data = go.Pie(
    values=state_counts.values,
    labels=state_counts.index,
)
layout = go.Layout(
    title=dict(text="State", x=0.46, y=0.95, font_size=20)
)
fig = go.Figure(data=data, layout=layout)
fig.show()
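Because `value_counts()` orders its result by frequency, pie labels stay aligned with their counts only when both come from the same result. A minimal sketch on made-up data (the `demo` frame is illustrative):

```python
import pandas as pd

# Made-up sample: Utah appears three times, Ohio twice, Texas once.
demo = pd.DataFrame({"state": ["Utah", "Ohio", "Utah", "Texas", "Utah", "Ohio"]})

counts = demo["state"].value_counts()  # sorted by frequency, most common first
labels = counts.index.tolist()         # labels in the same order as the counts
values = counts.tolist()

print(list(zip(labels, values)))
```

Passing `labels` and `values` built this way to `go.Pie` avoids maintaining a hand-written label list that can drift out of order when the data changes.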

In [None]:
# Count of each locale in the dataframe
plt.figure(figsize=(10,12))
sns.countplot(x ='locale',data = district_df,order=district_df['locale'].value_counts().index)
plt.show()

In [None]:
# Pie chart of districts per locale (counts taken from the locale column)
locale_counts = district_df["locale"].value_counts()
colors = ['#8BC34A','#D4E157','#FFB300','#FF7043']
data = go.Pie(
    values=locale_counts.values,
    labels=locale_counts.index,
    marker=dict(colors=colors),
    textinfo='label+value+percent'
)
layout = go.Layout(
    title=dict(text="Locale", x=0.46, y=0.95, font_size=20)
)
fig = go.Figure(data=data, layout=layout)
fig.show()

In [None]:
# Count of each pct_black/hispanic in the dataframe
plt.figure(figsize=(10,12))
sns.countplot(x ='pct_black/hispanic',data = district_df,order=district_df['pct_black/hispanic'].value_counts().index)
plt.show()

In [None]:
# Pie chart of districts per pct_black/hispanic range
black_hispanic_counts = district_df["pct_black/hispanic"].value_counts()
colors = ['#1f77b4','#ff7f0e','#2ca02c','#d62728','#9467bd']
data = go.Pie(
    values=black_hispanic_counts.values,
    labels=black_hispanic_counts.index,
    marker=dict(colors=colors),
    textinfo='label+value+percent'
)
layout = go.Layout(
    title=dict(text="pct_black/hispanic", x=0.46, y=0.95, font_size=20)
)
fig = go.Figure(data=data, layout=layout)
fig.show()

In [None]:
#count of pct_free/reduced in the district dataframe

plt.figure(figsize=(10,12))
sns.countplot(x ='pct_free/reduced',data = district_df,order=district_df['pct_free/reduced'].value_counts().index)
plt.show()

In [None]:
# Pie chart of districts per pct_free/reduced range
free_reduced_counts = district_df["pct_free/reduced"].value_counts()
colors = ['#8c564b','#e377c2','#7f7f7f','#bcbd22','#17becf']
data = go.Pie(
    values=free_reduced_counts.values,
    labels=free_reduced_counts.index,
    marker=dict(colors=colors),
    textinfo='label+value+percent'
)
layout = go.Layout(
    title=dict(text="pct_free/reduced", x=0.46, y=0.95, font_size=20)
)
fig = go.Figure(data=data, layout=layout)
fig.show()

In [None]:
# Pie chart of districts per county_connections_ratio range
connection_counts = district_df["county_connections_ratio"].value_counts()
colors = ['#17becf', '#E1396C']
data = go.Pie(
    values=connection_counts.values,
    labels=connection_counts.index,
    marker=dict(colors=colors),
    textinfo='label+value+percent'
)
layout = go.Layout(
    title=dict(text="county_connections_ratio", x=0.46, y=0.95, font_size=20)
)
fig = go.Figure(data=data, layout=layout)
fig.show()

In [None]:
#pp_total_raw

district_df["pp_total_raw"].value_counts()

In [None]:
#count of pp_total_raw in the district dataframe

plt.figure(figsize=(10,12))
sns.countplot(x ='pp_total_raw',data = district_df,order=district_df['pp_total_raw'].value_counts().index)
plt.xticks(rotation=90)
plt.show()

In [None]:
# Pie chart of districts per pp_total_raw range
pp_total_counts = district_df["pp_total_raw"].value_counts()
colors = ['#8c564b','#e377c2','#7f7f7f','#bcbd22','#17becf','#1f77b4','#ff7f0e','#2ca02c','#d62728','#9467bd','#E1396C']
data = go.Pie(
    values=pp_total_counts.values,
    labels=pp_total_counts.index,
    marker=dict(colors=colors),
    textinfo='label+value+percent'
)
layout = go.Layout(
    title=dict(text="pp_total_raw", x=0.46, y=0.95, font_size=20)
)
fig = go.Figure(data=data, layout=layout)
fig.show()

In [None]:
plt.figure(figsize=(20,40))
value_to_int = {j:i for i,j in enumerate(pd.unique(district_df.values.ravel()))} # map each distinct value to an integer code
n = len(value_to_int)     
# discrete colormap (n samples from a given cmap)
cmap = sns.color_palette("Pastel2", n) 
ax = sns.heatmap(district_df.replace(value_to_int), cmap=cmap) 
# modify colorbar:
colorbar = ax.collections[0].colorbar 
r = colorbar.vmax - colorbar.vmin 
colorbar.set_ticks([colorbar.vmin + r / n * (0.5 + i) for i in range(n)])
colorbar.set_ticklabels(list(value_to_int.keys()))                                          
plt.show()

In [None]:
states = district_df.groupby(by ='state').count()[['district_id']]
plt.figure(figsize=(15,10))
plt.title("States with most Districts Mentioned")
plt.ylabel('No of districts')
plt.xlabel('States')
sns.set(rc={"axes.facecolor":"#283747", "axes.grid":False,'xtick.labelsize':14,'ytick.labelsize':14})
sns.barplot(x=states.index,y=list(states['district_id']))
plt.xticks(rotation=90)

In [None]:
locale=district_df.groupby(by ='locale').count()[['district_id']]
plt.figure(figsize=(15,10))
plt.title("Number of districts per locale")
plt.ylabel('No of districts')
plt.xlabel('Locale')
sns.set(rc={"axes.facecolor":"#283747", "axes.grid":False,'xtick.labelsize':14,'ytick.labelsize':14})
sns.barplot(x=locale.index,y=list(locale['district_id']))
plt.xticks(rotation=90)

In [None]:
black_hispanic=district_df.groupby(by ='pct_black/hispanic').count()[['district_id']]
plt.figure(figsize=(15,10))
plt.title("Number of districts per pct_black/hispanic range")
plt.ylabel('No of districts')
plt.xlabel('pct_black/hispanic')
sns.set(rc={"axes.facecolor":"#283747", "axes.grid":False,'xtick.labelsize':14,'ytick.labelsize':14})
sns.barplot(x=black_hispanic.index,y=list(black_hispanic['district_id']))
plt.xticks(rotation=90)

In [None]:
free_reduced=district_df.groupby(by ='pct_free/reduced').count()[['district_id']]
plt.figure(figsize=(15,10))
plt.title("Number of districts per pct_free/reduced range")
plt.ylabel('No of districts')
plt.xlabel('pct_free/reduced')
sns.set(rc={"axes.facecolor":"#283747", "axes.grid":False,'xtick.labelsize':14,'ytick.labelsize':14})
sns.barplot(x=free_reduced.index,y=list(free_reduced['district_id']))
plt.xticks(rotation=90)

In [None]:
# county_connections_ratio

county_connection=district_df.groupby(by ='county_connections_ratio').count()[['district_id']]
plt.figure(figsize=(10,10))
plt.title("Number of districts per county_connections_ratio range")
plt.ylabel('No of districts')
plt.xlabel('county_connections_ratio')
sns.set(rc={"axes.facecolor":"#283747", "axes.grid":False,'xtick.labelsize':14,'ytick.labelsize':14})
sns.barplot(x=county_connection.index,y=list(county_connection['district_id']))
plt.xticks(rotation=90)

In [None]:
plt.figure(figsize=(10,10))
sns.set(rc={'xtick.labelsize':12,'ytick.labelsize':12,'axes.labelsize':12})
sns.swarmplot(x="state", y="pct_black/hispanic", hue="locale", data=district_df)
plt.xticks(rotation=90)

In [None]:
# Distribution of districts by state, colored by locale
# displot is figure-level and creates its own figure, so no plt.figure() call is needed
sns.displot(data=district_df, x='state', hue= 'locale', height=8, aspect=3)
plt.xticks(rotation=90)

In [None]:
# pct_black/hispanic holds one range per district, e.g. "[0.2, 0.4[" (share of students
# identified as Black or Hispanic), so the two parts are the bounds of that range
pct_bounds = district_df['pct_black/hispanic'].str.split(",",n=1,expand=True)
# regex=False treats '[' literally; a bare '[' is not a valid regular expression
district_df['pct_bh_lower']=pd.to_numeric(pct_bounds[0].str.replace('[','',regex=False))
district_df['pct_bh_upper']=pd.to_numeric(pct_bounds[1].str.replace('[','',regex=False).str.strip())
# Midpoint of the range, used as a numeric stand-in for the category
district_df['pct_black_and_hispanic']=(district_df['pct_bh_lower'] + district_df['pct_bh_upper'])/2
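Several district columns store half-open ranges as strings such as `[0.2, 0.4[`, so the same parsing may be useful for more than one column. The helper below is a minimal sketch of that idea on made-up data; `interval_bounds` is a hypothetical name, and it assumes every non-missing value follows the `[lower, upper[` pattern.

```python
import pandas as pd


def interval_bounds(series):
    """Split strings such as '[0.2, 0.4[' into numeric lower and upper bounds."""
    # Strip the surrounding brackets, then split once on the comma.
    parts = series.str.strip("[]").str.split(",", n=1, expand=True)
    lower = pd.to_numeric(parts[0].str.strip())
    upper = pd.to_numeric(parts[1].str.strip())
    return lower, upper


# Made-up sample including one missing value, which stays NaN.
demo = pd.Series(["[0, 0.2[", "[0.2, 0.4[", None, "[0.8, 1["])
lower, upper = interval_bounds(demo)
midpoint = (lower + upper) / 2
print(midpoint.tolist())
```

The midpoint is only a numeric stand-in for the range; the underlying data remain categorical, so density plots built on it should be read with that in mind.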

In [None]:
sns.displot(data=district_df, x='pct_black_and_hispanic', hue='locale',kind='kde',multiple="stack",height=8.27, aspect=11.7/8.27)

In [None]:
sns.displot(data=district_df, x="pct_black_and_hispanic", hue='state', height=8.27, aspect=11.7/8.27)

To be continued...