# **Impact of Covid-19 on Digital Learning**


![](https://i0.wp.com/www.woschool.com/wp-content/uploads/2020/04/Woodland.jpg)

# Importing the Libraries and Basic EDA

In [None]:
import numpy as np 
import pandas as pd 
import seaborn as sns
import plotly.express as px
import matplotlib.pyplot as plt
import missingno as msno
import plotly.graph_objects as go
import math
import glob
import os

### Product information data
*The product file ```products_info.csv``` includes information about the characteristics of the top 372 products with most users in 2020. The categories listed in this file are part of LearnPlatform's product taxonomy.*

| Name                       | Description                                                                                                                                                                                                                                                                                                                    |
|----------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| LP ID                      | The unique identifier of the product                                                                                                                                                                                                                                                                                           |
| URL                        | Web Link to the specific product                                                                                                                                                                                                                                                                                               |
| Product Name               | Name of the specific product                                                                                                                                                                                                                                                                                                   |
| Provider/Company Name      | Name of the product provider                                                                                                                                                                                                                                                                                                   |
| Sector(s)                  | Sector of education where the product is used                                                                                                                                                                                                                                                                                  |
| Primary Essential Function | The basic function of the product. There are two layers of labels here. Products are first labeled as one of these three categories: LC = Learning & Curriculum, CM = Classroom Management, and SDO = School & District Operations. Each of these categories has multiple sub-categories with which the products were labeled |
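
The two label layers described above can be separated into their own columns. The sketch below uses made-up rows and assumes the layers are joined by `" - "`; that separator and the new column names `main_category` and `sub_category` are assumptions for illustration, not part of the data dictionary.

```python
import pandas as pd

# Toy rows. The " - " separator between the two label layers is an assumption
# about how the column is formatted; check it against the real file first.
toy_products = pd.DataFrame({
    "Primary Essential Function": [
        "LC - Digital Learning Platforms",
        "CM - Classroom Engagement & Instruction - Classroom Management",
        None,
    ]
})

# Split once, on the first " - " only: the left part is the top-level category
# and everything after it is the sub-category. Missing values stay missing.
parts = toy_products["Primary Essential Function"].str.split(" - ", n=1, expand=True)
toy_products["main_category"] = parts[0]
toy_products["sub_category"] = parts[1]

print(toy_products[["main_category", "sub_category"]])
```

Splitting only once keeps a sub-category that itself contains `" - "` in a single piece.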

In [None]:
products_df = pd.read_csv("../input/learnplatform-covid19-impact-on-digital-learning/products_info.csv")
print(products_df.shape)
products_df.head()

In [None]:
msno.bar(products_df, sort="ascending", figsize=(10,5), fontsize=12,color='black')
plt.show()

### District information data

The district file ```districts_info.csv``` includes information about the **characteristics of school districts**, including data from 
- NCES (2018-19), 
- FCC (Dec 2018), and 
- Edunomics Lab. 

Steps taken to preserve Privacy  
- Identifiable information about the school districts has been removed. 
- An open source tool ARX (Prasser et al. 2020) was used to transform several data fields and reduce the risks of re-identification. 

To generalize the data, some values are released as a range that contains the actual value. Many values are also missing (marked as 'NaN'), which indicates that they were suppressed to maximize the anonymization of the dataset.
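
A range label can be turned into a single number by taking its midpoint. The helper below is a minimal sketch, assuming labels look like `[0.2, 0.4[` (closed on the left, open on the right); `range_midpoint` is a hypothetical name, and the midpoint is only a stand-in for the true value, which is not recoverable.

```python
def range_midpoint(label):
    """Return the midpoint of a half-open range label such as '[0.2, 0.4['."""
    if not isinstance(label, str):
        # Suppressed values arrive as NaN (a float), so there is no range to parse.
        return None
    # Drop the surrounding brackets and spaces, then split the two bounds.
    low, high = label.strip("[] ").split(",")
    return (float(low) + float(high)) / 2


print(range_midpoint("[4000, 6000["))
print(range_midpoint(float("nan")))
```

With pandas, the same helper could be applied to a column with `Series.map(range_midpoint)`.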

| Name                   | Description                                                                                                                                                                                                                                                                              |
|------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| district_id            | The unique identifier of the school district                                                                                                                                                                                                                                             |
| state                  | The state in which the district is located |
| locale                 | NCES locale classification that categorizes U.S. territory into four types of areas: City, Suburban, Town, and Rural. See Locale Boundaries User's Manual for more information.                                                                                                          |
| pct_black/hispanic     | Percentage of students in the districts identified as Black or Hispanic based on 2018-19 NCES data                                                                                                                                                                                       |
| pct_free/reduced       | Percentage of students in the districts eligible for free or reduced-price lunch based on 2018-19 NCES data                                                                                                                                                                              |
| county_connections_ratio | Ratio (residential fixed high-speed connections over 200 kbps in at least one direction/households) based on the county-level data from FCC Form 477 (December 2018 version). See FCC data for more information. |
| pp_total_raw | Per-pupil total expenditure (sum of local and federal expenditure) from Edunomics Lab's National Education Resource Database on Schools (NERD$) project. The expenditure data are school-by-school, and we use the median value to represent the expenditure of a given school district. |
                                                         

In [None]:
districts_df = pd.read_csv("../input/learnplatform-covid19-impact-on-digital-learning/districts_info.csv")
print(districts_df.shape)
districts_df.head()

In [None]:
msno.bar(districts_df, sort="ascending", figsize=(10,5), fontsize=12)

### Engagement data
The engagement data are aggregated at school district level, and each file in the folder ```engagement_data``` represents data from **one school district**. 

- The 4-digit file name represents ```district_id```, which can be used to link to district information in ```districts_info.csv```. 

- The ```lp_id``` can be used to link to product information in ```products_info.csv```.

| Name             | Description                                                                                                    |
|------------------|----------------------------------------------------------------------------------------------------------------|
| time             | date in "YYYY-MM-DD"                                                                                           |
| lp_id            | The unique identifier of the product                                                                           |
| pct_access       | Percentage of students in the district who have at least one page-load event of a given product on a given day |
| engagement_index | Total page-load events per one thousand students of a given product and on a given day                         |
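
The two links described above can be sketched on toy data. All values below are made up, and the frame names `districts`, `products`, `engagement` and `linked` are hypothetical; only the column names follow the data dictionary.

```python
import pandas as pd

# Made-up miniature versions of the three tables.
districts = pd.DataFrame({"district_id": [1000, 2000], "state": ["Utah", "Illinois"]})
products = pd.DataFrame({"LP ID": [11, 22], "Product Name": ["Product A", "Product B"]})
engagement = pd.DataFrame({
    "time": ["2020-01-01", "2020-01-01", "2020-01-02"],
    "lp_id": [11, 22, 11],
    "pct_access": [0.5, 0.1, 0.7],
    "engagement_index": [12.0, 3.0, 20.0],
    # IDs read from file names are strings, so they need a cast before merging.
    "district_id": ["1000", "1000", "2000"],
})
engagement["district_id"] = engagement["district_id"].astype("int64")

# Left merges keep every engagement row, even if a district or product is unknown.
linked = (
    engagement
    .merge(districts, on="district_id", how="left")
    .merge(products, left_on="lp_id", right_on="LP ID", how="left")
)
print(linked[["time", "state", "Product Name", "engagement_index"]])
```

A left merge is a design choice here: an inner merge would silently drop engagement rows whose product is not among the listed products.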

In [None]:
path='../input/learnplatform-covid19-impact-on-digital-learning/engagement_data'
all_files = glob.glob(path + "/*.csv")
li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    # Take the file name without its extension, independent of the folder depth
    district_id = os.path.basename(filename).split(".")[0]
    df["district_id"] = district_id
    li.append(df)
    
engagement_df = pd.concat(li)

engagement_df = engagement_df.reset_index(drop=True)
print(engagement_df.shape)
engagement_df.head()

In [None]:
msno.bar(engagement_df, sort="ascending", figsize=(10,5), fontsize=12)

# EDA

In [None]:
f1=lambda x: x['Primary Essential Function'].split()[0] if (not pd.isnull(x['Primary Essential Function'])) else None
products_df['Pef']=products_df.apply(f1,axis=1)

In [None]:
fig=px.pie(products_df,names=products_df['Primary Essential Function'].value_counts().index,
           values=products_df['Primary Essential Function'].value_counts(),
           color_discrete_sequence= px.colors.sequential.Tealgrn_r,title='Primary Essential Function(Sub)')
fig.update_traces(textposition='inside', textinfo='percent+label')
fig.show()

In [None]:
fig=px.pie(products_df,names=products_df['Pef'].value_counts().index,values=products_df['Pef'].value_counts(),
                   color_discrete_sequence= px.colors.sequential.Tealgrn_r,opacity=0.8,hole=0.3,title="Primary Essential Function",height=600,width=1000
           )
fig.update_traces(textposition='inside', textinfo='percent+label')
fig.show()

In [None]:
fig=px.bar(products_df,x=products_df['Provider/Company Name'].value_counts()[:20],y=products_df['Provider/Company Name'].value_counts().index[:20],color=products_df['Provider/Company Name'].value_counts()[:20],
color_continuous_scale= px.colors.sequential.Tealgrn_r,title='Products per Provider/Company',labels={'x':'Count','y':'Provider/Company Name'},width=1200,height=800).update_yaxes(categoryorder='total ascending')

fig.show()

In [None]:
fig=px.bar(districts_df,x=districts_df['state'].value_counts(),y=districts_df['state'].value_counts().index,color=districts_df['state'].value_counts(),
color_continuous_scale = "tealgrn",title='School Districts per State',labels={'x':'Count','y':'States'}).update_yaxes(categoryorder='total ascending')

fig.show()

In [None]:
fig=px.pie(districts_df,names=districts_df['locale'].value_counts().index,values=districts_df['locale'].value_counts(),
                   color_discrete_sequence= px.colors.sequential.Tealgrn_r,opacity=0.9,hole=0.3,title="Distribution of locales across Country",height=600,width=1000
           )
fig.update_traces(textposition='inside', textinfo='percent+label')
fig.show()

### Merging Files


In [None]:
engagement_df['district_id']=engagement_df['district_id'].astype('int64')

In [None]:
prod_engagement_df=pd.merge(products_df,engagement_df,left_on='LP ID',right_on='lp_id')
prod_engagement_df.head()

In [None]:
district_engagement_df=pd.merge(districts_df,engagement_df,on='district_id')
district_engagement_df.head()

In [None]:
district_engagement_df.shape

In [None]:
prod_engagement_df['Sector(s)'].unique()

In [None]:
pct_dict={'[0, 0.2[': 0.1 ,'[0.2, 0.4[': 0.3,'[0.4, 0.6[':0.5,'[0.6, 0.8[': 0.7,'[0.8, 1[':0.9}
district_engagement_df['pct_black']=district_engagement_df['pct_black/hispanic'].map(pct_dict)
district_engagement_df['pct_free']=district_engagement_df['pct_free/reduced'].map(pct_dict)

In [None]:
expenditure_dict={'[4000, 6000[':5000,'[6000, 8000[':7000,'[8000, 10000[':9000,'[10000, 12000[':11000,'[12000, 14000[':13000,
                  '[14000, 16000[':15000,'[16000, 18000[':17000,'[18000, 20000[':19000,'[20000, 22000[':21000,'[22000, 24000[':23000,
                 '[32000, 34000[':33000}


district_engagement_df['avg_expenditure']=district_engagement_df['pp_total_raw'].map(expenditure_dict)

# 1. Does the expenditure in different locales affect the engagement index ?

In [None]:
px.bar(district_engagement_df.groupby('locale')['avg_expenditure'].mean(),title='Average Expenditure of Students based on locale',
       color=district_engagement_df.groupby('locale')['avg_expenditure'].mean().index,labels={'value':'Expenditure','locale':'Locale'})

In [None]:
px.bar(district_engagement_df.groupby('locale')['engagement_index'].mean(),title='Engagement Index of Students based on locale',
       color=district_engagement_df.groupby('locale')['engagement_index'].mean().index,labels={'value':'engagement_index','locale':'Locale'})

In [None]:
px.bar(district_engagement_df.groupby('locale')['pct_access'].mean(),
       color=district_engagement_df.groupby('locale')['pct_access'].mean().index,
       title='Percentage Access of Students based on locale',
       labels={'value':'Percentage Access','locale':'Locale'})

* In Rural districts, the average expenditure is slightly higher than in other locales, and the average engagement index and percentage access are also comparatively high.
* In City districts, the average expenditure is comparable to that of Suburb districts, but the average percentage of students with page loads is lower than in Suburb districts.
* In Town districts, the average per-pupil expenditure is low, yet the engagement index and the average percentage of students with at least one page load are higher than in Suburb and City districts.
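
The three per-locale averages compared above can also be read side by side from a single table. This is a minimal sketch on made-up numbers; `toy` and `locale_summary` are hypothetical names, and the column names mirror those of `district_engagement_df`.

```python
import pandas as pd

# Made-up rows standing in for district_engagement_df.
toy = pd.DataFrame({
    "locale": ["City", "City", "Rural", "Rural", "Town"],
    "avg_expenditure": [9000, 11000, 13000, 15000, 9000],
    "engagement_index": [100.0, 200.0, 300.0, 500.0, 250.0],
    "pct_access": [0.4, 0.6, 1.0, 2.0, 0.8],
})

# One groupby gives the mean of all three measures per locale.
locale_summary = (
    toy.groupby("locale")[["avg_expenditure", "engagement_index", "pct_access"]].mean()
)
print(locale_summary)
```

Computing the summary once also avoids repeating the same `groupby` call for each chart.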

**Let's break the question into further subparts by considering the percentage of Black/Hispanic students and the percentage of students eligible for free or reduced-price lunch.**

# 2. How does the percentage of Black/Hispanic students affect the engagement index and percentage access in general?

In [None]:
px.bar(district_engagement_df.groupby('pct_black/hispanic')['avg_expenditure'].mean(),
      title=' Per-pupil Expenditure based on black/hispanic people',
      color=district_engagement_df.groupby('pct_black/hispanic')['avg_expenditure'].mean().index,
      labels={'value':'Per-pupil Expenditure'})

In [None]:
px.bar(district_engagement_df.groupby('pct_black/hispanic')['engagement_index'].mean(),
      title='Engagement Index based on black/hispanic people',
      color=district_engagement_df.groupby('pct_black/hispanic')['engagement_index'].mean().index,
      labels={'value':'Engagement Index'})

In [None]:
px.bar(district_engagement_df.groupby('pct_black/hispanic')['pct_access'].mean(),
      title='Percentage Access based on black/hispanic people',
      color=district_engagement_df.groupby('pct_black/hispanic')['pct_access'].mean().index,
      labels={'value':'Percentage Access'})

- The average per-pupil expenditure is similar across all groups.
- The engagement index and percentage access are highest in districts where Black/Hispanic students make up 80-100% of the students.
- The engagement index and percentage access are second highest in districts where Black/Hispanic students make up 0-20% of the students.

# 3. How does the percentage of students eligible for free or reduced-price lunch affect the engagement index and percentage access in general?

In [None]:
px.bar(district_engagement_df.groupby('pct_free/reduced')['avg_expenditure'].mean(),
      title=' Per-pupil Expenditure based on pct_free/reduced people',
      color=district_engagement_df.groupby('pct_free/reduced')['avg_expenditure'].mean().index,
      labels={'value':'Per-pupil Expenditure'})

In [None]:
px.bar(district_engagement_df.groupby('pct_free/reduced')['engagement_index'].mean(),
      title='Engagement Index based on pct_free/reduced people',
      color=district_engagement_df.groupby('pct_free/reduced')['engagement_index'].mean().index,
      labels={'value':'Engagement Index'})

In [None]:
px.bar(district_engagement_df.groupby('pct_free/reduced')['pct_access'].mean(),
      title='Percentage Access based on pct_free/reduced people',
      color=district_engagement_df.groupby('pct_free/reduced')['pct_access'].mean().index,
      labels={'value':'Percentage Access'})

- The average per-pupil expenditure is similar across all groups.
- The engagement index and percentage access are highest in districts where 80-100% of the students are eligible for free or reduced-price lunch.
- The engagement index and percentage access are second highest in districts where 0-20% of the students are eligible for free or reduced-price lunch.

# 4. How does the average per-pupil expenditure vary with the percentage of Black/Hispanic students across locales, and how does it affect the engagement index and percentage access?

In [None]:
px.bar(y=district_engagement_df.groupby(['locale','pct_black/hispanic'])['avg_expenditure'].mean().reset_index()['avg_expenditure'],
       x=district_engagement_df.groupby(['locale','pct_black/hispanic'])['avg_expenditure'].mean().reset_index()['locale'],
       color=district_engagement_df.groupby(['locale','pct_black/hispanic'])['avg_expenditure'].mean().reset_index()['pct_black/hispanic'],
       title='Average Expenditure of Students based on locale',
       labels={'x':'Locale','y':'Expenditure'},
       barmode="group",
       )

In [None]:
px.bar(y=district_engagement_df.groupby(['locale','pct_black/hispanic'])['engagement_index'].mean().reset_index()['engagement_index'],
       x=district_engagement_df.groupby(['locale','pct_black/hispanic'])['engagement_index'].mean().reset_index()['locale'],
       color=district_engagement_df.groupby(['locale','pct_black/hispanic'])['engagement_index'].mean().reset_index()['pct_black/hispanic'],
       title='Engagement index of Students based on locale and the percent of black and hispanic people',
       labels={'x':'Locale','y':'Engagement index'},
       barmode="group",
       )

In [None]:
px.bar(y=district_engagement_df.groupby(['locale','pct_black/hispanic'])['pct_access'].mean().reset_index()['pct_access'],
       x=district_engagement_df.groupby(['locale','pct_black/hispanic'])['pct_access'].mean().reset_index()['locale'],
       color=district_engagement_df.groupby(['locale','pct_black/hispanic'])['pct_access'].mean().reset_index()['pct_black/hispanic'],
       title='Percentage Access of Students based on locale and the percent of black and hispanic people',
       labels={'x':'Locale','y':'Percentage Access'},
       barmode="group",
       )

- ### Average per-pupil expenditure for groups with different percentages of Black/Hispanic students seems to vary by locale. Let's look at some interesting observations
    * Average per-pupil expenditure in **City** districts tends to be high where Black/Hispanic students form 80-100% of the student population (**around 35% more than in City districts with 40-60% Black/Hispanic students**). The average engagement index and page loads are also high in these districts (around 200% more than in City districts with 40-60% Black/Hispanic students). In general, **the average per-pupil expenditure in City districts with 80-100% Black/Hispanic students is around 35-70% higher than in other City districts, while the engagement index is around 200-300% higher**.
    * In **Rural** locales the average per-pupil expenditure is high where Black/Hispanic students are **0-20%** of the student population, the opposite of the trend in cities. The **engagement index and percentage access also seem to be considerably higher than in other regions**.
    * In both **City and Rural** locales the general trend of the **engagement index and page loads or percentage access** seems to **follow the per-pupil expenditure**.
    * In Suburb districts the per-pupil expenditure is similar across groups, but districts with less than 40% Black/Hispanic students have slightly better engagement than those with more than 40%.
    * In Town districts the average per-pupil expenditure where Black/Hispanic students are less than 20% of the student population is similar to or slightly lower than where they are 20-40%, but engagement is considerably higher in the under-20% group than in the 20-40% group.
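
The "around X% more" comparisons above are relative differences against a baseline group. A minimal sketch of that arithmetic follows; the numbers are made up for illustration and are not values from the dataset, and `pct_difference` is a hypothetical helper.

```python
import pandas as pd


def pct_difference(value, baseline):
    """Relative difference of value against baseline, in percent."""
    return (value - baseline) / baseline * 100


# Made-up group means, keyed by the range labels used in the data.
group_means = pd.Series({"[0.4, 0.6[": 10000.0, "[0.8, 1[": 13500.0})

diff = pct_difference(group_means["[0.8, 1["], group_means["[0.4, 0.6["])
print(f"{diff:.0f}% more than the baseline group")
```

Note that "200% more" means three times the baseline, not twice, so it helps to state the baseline group explicitly with each figure.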

# 5. Is there any relationship between the percentage of students eligible for free or reduced-price lunch and the average expenditure, and how does it impact the engagement index?

In [None]:
px.bar(y=district_engagement_df.groupby(['locale','pct_free/reduced'])['avg_expenditure'].mean().reset_index()['avg_expenditure'],
       x=district_engagement_df.groupby(['locale','pct_free/reduced'])['avg_expenditure'].mean().reset_index()['locale'],
       color=district_engagement_df.groupby(['locale','pct_free/reduced'])['avg_expenditure'].mean().reset_index()['pct_free/reduced'],
       title='Average Expenditure of Students based on locale and the percentage of people eligible for free or reduced food',
       labels={'x':'Locale','y':'Expenditure'},
       barmode="group",
       )

In [None]:
px.bar(y=district_engagement_df.groupby(['locale','pct_free/reduced'])['engagement_index'].mean().reset_index()['engagement_index'],
       x=district_engagement_df.groupby(['locale','pct_free/reduced'])['engagement_index'].mean().reset_index()['locale'],
       color=district_engagement_df.groupby(['locale','pct_free/reduced'])['engagement_index'].mean().reset_index()['pct_free/reduced'],
       title='Engagement index of Students based on locale and the percent of people eligible for free or reduced food',
       labels={'x':'Locale','y':'Engagement index'},
       barmode="group",
       )

In [None]:
px.bar(y=district_engagement_df.groupby(['locale','pct_free/reduced'])['pct_access'].mean().reset_index()['pct_access'],
       x=district_engagement_df.groupby(['locale','pct_free/reduced'])['pct_access'].mean().reset_index()['locale'],
       color=district_engagement_df.groupby(['locale','pct_free/reduced'])['pct_access'].mean().reset_index()['pct_free/reduced'],
       title='Percentage Access of Students based on locale and the percent of people eligible for free or reduced food',
       labels={'x':'Locale','y':'Percentage Access'},
       barmode="group",
       )

**We don't see any relationship between average expenditure and the engagement index based on the percentage of students eligible for free or reduced-price lunch. But some important points can be noted here.**
- In City districts where 80-100% of the students are eligible for free or reduced-price lunch, the average engagement index and percentage access are very high.
- In Town districts where 0-20% of the students are eligible for free or reduced-price lunch, the average engagement index and percentage access are comparatively high.

In [None]:
district_engagement_df.groupby(['state','locale','pct_black/hispanic'])['engagement_index'].mean()

In [None]:
sns.heatmap(district_engagement_df[['pct_black','pct_free','avg_expenditure']].corr(),annot=True)

**The percentage of Black/Hispanic students correlates well with the percentage of students eligible for free or reduced-price lunch.**

In [None]:
px.sunburst(district_engagement_df.dropna(),path=['locale','pct_black/hispanic','pct_free/reduced'])

- The chart above also shows that the percentage of students eligible for free or reduced-price lunch is related to the percentage of Black/Hispanic students.
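
The relationship between the two binned columns can also be checked with a cross-tabulation. The rows below are made up, and `toy` and `table` are hypothetical names; this is a sketch of the technique, not a result from the dataset.

```python
import pandas as pd

# Made-up districts, one row each, using the range labels from the data.
toy = pd.DataFrame({
    "pct_black/hispanic": ["[0, 0.2[", "[0, 0.2[", "[0.8, 1[", "[0.8, 1[", "[0, 0.2["],
    "pct_free/reduced": ["[0, 0.2[", "[0.2, 0.4[", "[0.8, 1[", "[0.8, 1[", "[0, 0.2["],
})

# normalize="index" turns each row into shares that sum to 1, so the table shows
# how each pct_black/hispanic bin is distributed over the pct_free/reduced bins.
table = pd.crosstab(toy["pct_black/hispanic"], toy["pct_free/reduced"], normalize="index")
print(table)
```

Running this on a frame with one row per district, rather than one row per engagement record, keeps districts with more engagement rows from weighing more in the shares.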

# 6. How are students across different states and locales engaging?

In [None]:
px.bar(y=district_engagement_df.groupby('state')['engagement_index'].mean().reset_index()['engagement_index'],
      x=district_engagement_df.groupby('state')['engagement_index'].mean().reset_index()['state'],
      color=district_engagement_df.groupby('state')['engagement_index'].mean().reset_index()['engagement_index'],
       color_continuous_scale= px.colors.sequential.Tealgrn_r,title='Engagement Index per State',labels={'x':'State','y':'Engagement Index'}
      )

In [None]:
px.bar(y=district_engagement_df.groupby('state')['pct_access'].mean().reset_index()['pct_access'],
      x=district_engagement_df.groupby('state')['pct_access'].mean().reset_index()['state'],
      color=district_engagement_df.groupby('state')['pct_access'].mean().reset_index()['pct_access'],
       color_continuous_scale= px.colors.sequential.Tealgrn_r,title='Percentage Access per State',labels={'x':'State','y':'Percentage Access'}
      )

In [None]:
px.bar(y=district_engagement_df.groupby('state')['engagement_index'].mean().reset_index()['engagement_index'],
      x=district_engagement_df.groupby('state')['engagement_index'].mean().reset_index()['state'],
      color=district_engagement_df.groupby('state')['engagement_index'].mean().reset_index()['state'],
       color_continuous_scale= px.colors.sequential.Tealgrn_r
      )

In [None]:
fig=go.Figure()
states=list(district_engagement_df['state'].unique())
for region_name in states:
    region=district_engagement_df[district_engagement_df['state']==region_name]
    
    fig.add_trace(go.Scatter(x=region.groupby('time')['engagement_index'].mean().index,
                             y=region.groupby('time')['engagement_index'].mean(),name=region_name))
    

fig.add_vline(x='2020-03-11', line_width=3, line_color="red")
fig.add_vrect(x0="2020-06-1", x1="2020-08-31", 
              annotation_text="Summer Holidays", annotation_position="top left",
              fillcolor="green", opacity=0.25, line_width=0)

fig.update_layout(title="Engagement Index of Different States",yaxis_title="Engagement Index",xaxis_title="Date")
fig.show()

In [None]:
fig=go.Figure()
states=list(district_engagement_df['state'].unique())
for region_name in states:
    region=district_engagement_df[district_engagement_df['state']==region_name]
    
    fig.add_trace(go.Scatter(x=region.groupby('time')['pct_access'].mean().index,
                             y=region.groupby('time')['pct_access'].mean(),name=region_name))
    

fig.add_vline(x='2020-03-11', line_width=3, line_color="red")
fig.add_vrect(x0="2020-06-1", x1="2020-08-31", annotation_text="Summer Holidays", annotation_position="top left",
              fillcolor="green", opacity=0.25, line_width=0)

fig.update_layout(title="Percentage Access of Different States",yaxis_title="Percentage Access",xaxis_title="Date")

fig.show()

# 7. How does engagement vary throughout the year across districts with different percentages of Black/Hispanic students and of students eligible for free or reduced-price lunch?

In [None]:
fig=px.line(y=district_engagement_df.groupby(['time','pct_black/hispanic'])['engagement_index'].mean().reset_index()['engagement_index'],
        x=district_engagement_df.groupby(['time','pct_black/hispanic'])['engagement_index'].mean().reset_index()['time'],
        color=district_engagement_df.groupby(['time','pct_black/hispanic'])['engagement_index'].mean().reset_index()['pct_black/hispanic'])

fig.add_vline(x='2020-03-11', line_width=3, line_color="red")
fig.add_vrect(x0="2020-06-1", x1="2020-08-31", annotation_text="Summer Holidays", annotation_position="top left",
              fillcolor="green", opacity=0.25, line_width=0)

fig.update_layout(title="Engagement Index based on Black/Hispanic Students",yaxis_title="Engagement Index",xaxis_title="Date")


In [None]:
fig=px.line(y=district_engagement_df.groupby(['time','pct_black/hispanic'])['pct_access'].mean().reset_index()['pct_access'],
        x=district_engagement_df.groupby(['time','pct_black/hispanic'])['pct_access'].mean().reset_index()['time'],
        color=district_engagement_df.groupby(['time','pct_black/hispanic'])['pct_access'].mean().reset_index()['pct_black/hispanic'])

fig.add_vline(x='2020-03-11', line_width=3, line_color="red")
fig.add_vrect(x0="2020-06-1", x1="2020-08-31", annotation_text="Summer Holidays", annotation_position="top left",
              fillcolor="green", opacity=0.25, line_width=0)

fig.update_layout(title="Percentage Access based on Black/Hispanic Students",yaxis_title="Percentage Access",xaxis_title="Date")


In [None]:
fig=px.line(y=district_engagement_df.groupby(['time','pct_free/reduced'])['engagement_index'].mean().reset_index()['engagement_index'],
        x=district_engagement_df.groupby(['time','pct_free/reduced'])['engagement_index'].mean().reset_index()['time'],
        color=district_engagement_df.groupby(['time','pct_free/reduced'])['engagement_index'].mean().reset_index()['pct_free/reduced'])

fig.add_vline(x='2020-03-11', line_width=3, line_color="red")
fig.add_vrect(x0="2020-06-1", x1="2020-08-31", annotation_text="Summer Holidays", annotation_position="top left",
              fillcolor="green", opacity=0.25, line_width=0)

fig.update_layout(title="Engagement Index based on pct_free/reduced Students",yaxis_title="Engagement Index",xaxis_title="Date")


In [None]:
fig=px.line(y=district_engagement_df.groupby(['time','pct_free/reduced'])['pct_access'].mean().reset_index()['pct_access'],
        x=district_engagement_df.groupby(['time','pct_free/reduced'])['pct_access'].mean().reset_index()['time'],
        color=district_engagement_df.groupby(['time','pct_free/reduced'])['pct_access'].mean().reset_index()['pct_free/reduced'])

fig.add_vline(x='2020-03-11', line_width=3, line_color="red")
fig.add_vrect(x0="2020-06-1", x1="2020-08-31", annotation_text="Summer Holidays", annotation_position="top left",
              fillcolor="green", opacity=0.25, line_width=0)

fig.update_layout(title="Percentage Access based on pct_free/reduced Students",yaxis_title="Percentage Access",xaxis_title="Date")


# 8. Engagement Index and Percentage Access across different Sectors throughout the Year

In [None]:
fig=px.line(y=prod_engagement_df.groupby(['time','Sector(s)'])['engagement_index'].mean().reset_index()['engagement_index'],
        x=prod_engagement_df.groupby(['time','Sector(s)'])['engagement_index'].mean().reset_index()['time'],
        color=prod_engagement_df.groupby(['time','Sector(s)'])['engagement_index'].mean().reset_index()['Sector(s)'])

fig.add_vline(x='2020-03-11', line_width=3, line_color="red")
fig.add_vrect(x0="2020-06-1", x1="2020-08-31", annotation_text="Summer Holidays", annotation_position="top left",
              fillcolor="green", opacity=0.25, line_width=0)

fig.update_layout(title="Engagement based on Sectors",yaxis_title="Engagement Index",xaxis_title="Date")

In [None]:
fig=px.line(y=prod_engagement_df.groupby(['time','Sector(s)'])['pct_access'].mean().reset_index()['pct_access'],
        x=prod_engagement_df.groupby(['time','Sector(s)'])['pct_access'].mean().reset_index()['time'],
        color=prod_engagement_df.groupby(['time','Sector(s)'])['pct_access'].mean().reset_index()['Sector(s)'])

fig.add_vline(x='2020-03-11', line_width=3, line_color="red")
fig.add_vrect(x0="2020-06-1", x1="2020-08-31", annotation_text="Summer Holidays", annotation_position="top left",
              fillcolor="green", opacity=0.25, line_width=0)

fig.update_layout(title="Percentage Access based on Sectors",yaxis_title="Percentage Access",xaxis_title="Date")

In [None]:
fig=go.Figure()
fig.add_trace(go.Scatter(x=district_engagement_df[district_engagement_df['locale']=='Suburb'].groupby(['time'])['pct_access'].mean().index,
                      y=district_engagement_df[district_engagement_df['locale']=='Suburb'].groupby(['time'])['pct_access'].mean(),mode='lines',
                      name='Suburb'))

fig.add_trace(go.Scatter(x=district_engagement_df[district_engagement_df['locale']=='Rural'].groupby(['time'])['pct_access'].mean().index,
                      y=district_engagement_df[district_engagement_df['locale']=='Rural'].groupby(['time'])['pct_access'].mean(),mode='lines',
                      name='Rural'))

fig.add_trace(go.Scatter(x=district_engagement_df[district_engagement_df['locale']=='City'].groupby(['time'])['pct_access'].mean().index,
                      y=district_engagement_df[district_engagement_df['locale']=='City'].groupby(['time'])['pct_access'].mean(),mode='lines',
                      name='City'))

fig.add_trace(go.Scatter(x=district_engagement_df[district_engagement_df['locale']=='Town'].groupby(['time'])['pct_access'].mean().index,
                      y=district_engagement_df[district_engagement_df['locale']=='Town'].groupby(['time'])['pct_access'].mean(),mode='lines',
                      name='Town'))

fig.add_vline(x='2020-03-11', line_width=3, line_color="red")
fig.add_vrect(x0="2020-06-1", x1="2020-08-31", annotation_text="Summer Holidays", annotation_position="top left",
              fillcolor="green", opacity=0.25, line_width=0)

fig.update_layout(title='Access of Different locales',
                   xaxis_title='Month',
                   yaxis_title='Percentage of Access')

fig.show()

In [None]:
state=pd.DataFrame(district_engagement_df.groupby('state')['engagement_index'].mean())
state['pct_access']=district_engagement_df.groupby('state')['pct_access'].mean()

values=pd.DataFrame(data=districts_df['state'].value_counts(),index=districts_df['state'].value_counts().index)
values.index.name='state'
values.columns=['Count']

state_info=pd.concat((state,values),axis=1)

fig=px.scatter_3d(state_info,x='pct_access' ,y='engagement_index',z='Count',
              color=state_info.index,labels={'pct_access':'Pct Access','engagement_index':'Engagement Index','Count':'Count'},
                  title='Engagement Index and Access of different states')

fig.show()

- Connecticut and Utah each have more than 25 districts in the data, but their average engagement index and average percentage access are low.
- For Arizona and North Dakota we have information about only a few districts, but their engagement index and percentage access are high.
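
States like those in the first observation can be picked out with a boolean filter instead of being read off the 3D chart. The sketch below uses made-up values; the threshold of 25 comes from the observation above, while using the median as the cutoff for "low" engagement is an assumption.

```python
import pandas as pd

# Made-up per-state summary with the same columns as state_info.
state_info = pd.DataFrame(
    {
        "engagement_index": [80.0, 300.0, 150.0],
        "pct_access": [0.3, 1.2, 0.6],
        "Count": [30, 3, 12],
    },
    index=["State A", "State B", "State C"],
)

# Many districts in the data, but below-median engagement (assumed cutoff).
many_districts = state_info["Count"] > 25
low_engagement = state_info["engagement_index"] < state_info["engagement_index"].median()

flagged = state_info[many_districts & low_engagement]
print(flagged)
```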

**References:**
* Used [😷COVID-19 Impact on Digital Learning💻: EDA + W&B](https://www.kaggle.com/ruchi798/covid-19-impact-on-digital-learning-eda-w-b) by Ruchi Bhatia as reference for reading the datasets

