# COVID-19 Impact on Digital Learning

**Summary**

This report presents my analysis of the impact of Covid-19 on digital learning (LearnPlatform). The data available to me highlight three essential points about student engagement.
- Student engagement in digital learning is highest when students have easy and reliable internet access to educational technology (Google Docs, Google Meet, etc.). Engagement also depends on the basic needs of students and teachers being met.
- Socio-economic status is the main factor that can reduce pupils' engagement: a parent who loses wages during lockdown and the closure of economic activities can fall into poverty or face eviction, which can lead to the child abruptly stopping or missing digital learning.
- To reduce the socio-economic impact of Covid-19, some American states chose to raise the minimum hourly wage and to reopen certain economic activities. To address evictions, some states issued moratoriums on evictions (some are still active; others have expired). As a result, we observe an improvement in student engagement in some states.

## Introduction

The Covid-19 pandemic has affected education systems around the world, leading to the near-total closure of schools, early childhood education and care services, universities and colleges. Most governments decided to temporarily shut down educational institutions in an attempt to reduce the spread of Covid-19. As of January 2021, around 825 million learners were affected by school closures in response to the pandemic.

Therefore, UNESCO has recommended the use of distance learning programs and open educational apps and platforms that schools and teachers can use to reach distance learners and limit disruption to education.

![unicef](https://s32152.pcdn.co/wp-content/uploads/2020/10/Digital-Learning_South-Africa-UNI363429.jpg)

### The consequences of Covid-19 on education in the United States
1. In March 2020, schools in the United States began to close.
2. 55.1 million students in 124,000 US public and private schools were affected by school closures.
3. The effect of the widespread closure of schools has been felt nationwide and has worsened several inequalities: social, gender, technological, academic, etc.
4. Most schools shifted to online learning, which led to other problems such as unequal access to technology and educational resources, student absenteeism, and accommodation for students with special needs.

The objective of this notebook is to better understand and measure the scope and impact of the pandemic on students' digital learning, based on the two challenges below.

- state of digital learning 2020
- the link between engagement in digital learning and factors such as district demographics, broadband access, and state / national policies and events.

This notebook covers:

- **[School closures worldwide](#closures)**: 
    - **[USA vs ROW](#usr)**
    - **[USA school closure](#usc)**
- **[Challenge of the digital learning in USA](#challenge)**
    - **[Part I: district study](#dist)**
    - **[Part I: product study](#prod)**
    - **[The picture of digital connectivity and engagement in 2020](#digi2020)**
    - **[The effect of the Covid-19 pandemic on online and distance learning](#effect)**
    - **[Student engagement and type of education technology](#student)**
    - **[Student engagement and geographic, demographic factors](#socio)**
- **[Student engagement and socio-economic status](#status)**
    - **[Socio-economic impact of Covid-19](#covid)**
- **[State intervention, practice or policy.](#inter)**
    - **[Closing and reopening of bars, restaurants, casinos, gym, etc.](#clo)**
    - **[Moratoriums on evictions](#mora)**
    
- **[Conclusion](#conc)**
    



### Sources
- [Impact of COVID-19 on education in the United States](https://en.wikipedia.org/wiki/Impact_of_COVID-19_on_education_in_the_United_States)
- [UNESCO School Closures](https://data.humdata.org/showcase/unesco-school-closures)
- [Global School Closures COVID-19](https://data.humdata.org/m/dataset/global-school-closures-covid19)
- [US school closure & distance learning database](https://osf.io/tpwqf/files/)
- [EuropeanSchoolnet](http://www.eun.org/news/detail?articleId=4993184)
- [Parolin, Z., Lee, E.K. (2021) Large socio-economic, geographic and demographic disparities exist in exposure to school closures. Nature Human Behaviour](https://www.nature.com/articles/s41562-021-01087-8)
- [South Africa: COVID-19, schools reopening and digital learning](https://blogs.unicef.org/blog/south-africa-covid-19-schools-reopening-digital-learning/)
- [COVID-19 US State Policy database](https://www.openicpsr.org/openicpsr/project/119446/version/V75/view;jsessionid=851ECB80E6CB42252D396C29564184DC)


In [None]:
import matplotlib 
import seaborn as sns
import statsmodels as sm
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import geopy as gpy
import geopandas as gpd
from warnings import filterwarnings
import os

In [None]:
filterwarnings('ignore')
plt.style.use('fivethirtyeight')

In [None]:
for root, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(root, filename))

In [None]:
!pip install xlrd
!pip install openpyxl

<a id = "closures"></a>

# I/ School closures worldwide

In [None]:
file1 = '/kaggle/input/school-closures/covid_impact_education.csv'
file2 = '/kaggle/input/school-closures/COVID_19_Education data on school closures - Schools data.csv'
file3 = '/kaggle/input/school-closures/COVID-19 US state policy database 8_3_2021.xlsx'
file4 = '/kaggle/input/school-closures/duration_school_closures.csv'
file5 = '/kaggle/input/school-closures/school-closure-evolution.xlsx'

In [None]:
covidImpEduc = pd.read_csv(file1)# covid-19 impact education data
educSchClos = pd.read_csv(file2)# COVID_19_Education data on school closures - Schools data 
usPolicy = pd.read_excel(file3) # COVID-19 US state policy database 8_3_2021
durationSchcl = pd.read_csv(file4) # duration_school_closures
schCloEvo = pd.read_excel(file5) # school-closure-evolution

In [None]:
covidImpEduc.head(2)

In [None]:
educSchClos.head(2)

In [None]:
usPolicy.head(2)

In [None]:
durationSchcl.head(2)

In [None]:
schCloEvo.head(2)

**Attribute definitions**

- **Full school closures** refer to situations where all schools were closed at the nation-wide level due to COVID-19.
- **Partial school closures** refer to school closures in some regions or for some grades, or with reduced in-person instruction.

In [None]:
#missing values
educSchClos.isnull().sum()[educSchClos.isnull().sum()>0]

The `ISO3 country` column has missing values, so we drop the incomplete rows. We also drop the two date columns (source date and update date).

In [None]:
educSchClos.dropna(inplace=True)

In [None]:
educSchClos.drop([educSchClos.columns[5], educSchClos.columns[6]], axis=1, inplace=True)

<a id="usr"></a>

## I.1/ USA and ROW

In this section, we explore, analyse and visualize the data.

### Covid-19 impact on education.

**Definition**

- Closed due to COVID-19: Government-mandated closures of educational institutions affecting most or all of the student population enrolled from pre-primary through to upper secondary levels [ISCED levels 0 to 3]. In most cases, various distance learning strategies are deployed to ensure educational continuity.
- Academic break: Most schools across the country are on scheduled academic breaks. All study during this period is suspended.
- Fully open: For the majority of schools, classes are being held exclusively in person, noting that measures to ensure safety and hygiene in schools vary considerably from context to context and/or by level of education.
- Partially open: Schools are: (a) open/closed in certain regions only; and/or (b) open/closed for some grade levels/age groups only; and/or (c) open but with reduced in-person class time, combined with distance learning (hybrid approach).

In [None]:
print(f"Number of countries in this data: {covidImpEduc['Country'].nunique()}.")

In [None]:
print(f"Number of statuses: {covidImpEduc['Status'].nunique()}; they are: {covidImpEduc['Status'].unique()}")

In [None]:
plt.figure(figsize=(15,5))
gc = sns.countplot(x='Status', data=covidImpEduc)
plt.title(f"Covid-19 impact on education from 2020-02-17 to 2021-07-31.", fontsize=18)
plt.show()

Fully open and Academic break are the most frequent statuses in this data. Next, we look at how Covid-19 affected education in different countries between 2020-02-17 and 2021-07-31.

In [None]:
#initialize
country_status = {}
name_ctry = covidImpEduc.ISO.unique()# name of the countries

In [None]:
for u in covidImpEduc.Status.unique().tolist():# for each status
    row = covidImpEduc[covidImpEduc.Status == u]# we select data.
    #we want to get table such that columns is status and index is the country 
    country_status[u] = {c:row[row.ISO == c].ISO.value_counts().values[0] if c in row.ISO.unique()
                        else 0 for c in name_ctry}

In [None]:
#Dataframe our data
country_status = pd.DataFrame(country_status)

In [None]:
#This table shows how many times each status appears for each country between 2020-02-17 and 2021-07-31.
#For example, Yemen had fully open schools 29 times, partially open schools 6 times, schools closed due to
#Covid-19 14 times and an academic break 54 times, in varying order.
country_status.tail(2)
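
The same country-by-status table can also be built in one call. The sketch below is only an illustration on a small made-up frame (`toy` stands in for `covidImpEduc`; its values are not taken from the real data) and uses `pd.crosstab` instead of the nested loop.

```python
import pandas as pd

# Toy stand-in for covidImpEduc: one row per country per observation (illustrative values only).
toy = pd.DataFrame({
    "ISO": ["USA", "USA", "USA", "YEM", "YEM", "FRA"],
    "Status": ["Partially open", "Partially open", "Academic break",
               "Fully open", "Closed due to COVID-19", "Fully open"],
})

# Rows = countries, columns = statuses, cells = number of observations.
# Combinations that never occur are filled with 0 automatically.
status_counts = pd.crosstab(toy["ISO"], toy["Status"])
print(status_counts)
```

With the real data, `pd.crosstab(covidImpEduc.ISO, covidImpEduc.Status)` should give a table with the same layout as `country_status`, although the row and column order may differ.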

### Status map
We build a new table to map the different statuses.

In [None]:
#load world earth data
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))

In [None]:
#merge it with country_status
statusMap = world.merge(country_status, right_on=country_status.index, left_on='iso_a3', suffixes=('', ''))
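
An inner merge silently drops map rows whose ISO code has no match in the counts table. As a hedged sketch on plain pandas frames (`world_toy` and `counts_toy` are made-up stand-ins, and the `-99` code is only an assumed example of an invalid code), a left merge with `indicator=True` makes the unmatched countries visible:

```python
import pandas as pd

# Made-up stand-ins for the world map attributes and the status counts.
world_toy = pd.DataFrame({
    "iso_a3": ["USA", "FRA", "-99"],
    "name": ["United States", "France", "Norway"],
})
counts_toy = pd.DataFrame({"Fully open": [10, 20]}, index=["USA", "FRA"])

# how="left" keeps every map row; _merge tells where each row came from.
merged = world_toy.merge(counts_toy, left_on="iso_a3", right_index=True,
                         how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "name"].tolist()
print(unmatched)
```

Checking `unmatched` before plotting helps explain blank areas on a choropleth map.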

In [None]:
colors = ['Greens', 'Blues', 'OrRd', 'cividis_r']
fig = plt.figure(figsize=(15,20), dpi=150)
for i, u in enumerate(covidImpEduc.Status.unique().tolist()):
    plt.style.use('ggplot')
    ax = fig.add_subplot(4, 1, i+1)
    plt.suptitle("Global school status between 2020-02-17 to 2021-07-31.", fontsize=20)
    statusMap.plot(cmap=colors[i], column=u, legend=True, scheme='quantiles', k=5, ax=ax)
    ax.set_title(f'School status: {u}', fontsize=15)
plt.show()

Over the course of the pandemic between the two dates shown on the maps, some countries went through all four statuses while others did not.

For example, the majority of African countries kept their schools fully open over a very long period. Others, such as Mexico, Brazil and India, closed schools completely because of Covid-19 and continued with digital learning.

In the USA, all four statuses are present, although partial opening and academic breaks dominate. Note that in the USA each state defines its own strategy for dealing with the impact of Covid-19 on education.

### COVID-19 Education data on school closures

In [None]:
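#normalize the capitalization of the status labels (each label maps to its own correctly cased form)
educSchClos.replace(['Partially Open', 'fully open'], ['Partially open', 'Fully open'], inplace=True)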

In [None]:
#some values are stored as strings such as '16,500,100', which pandas does not parse as numbers.
#we create the function convertString2Int to fix this; the placeholders '--' and '..' are mapped to 0.
def convertString2Int(row):
    if row == '--' or row == '..':
        return 0
    else:
        return int("".join(u for u in str(row).split(',')))

In [None]:
cols = educSchClos.columns# columns
#apply function 
educSchClos[cols[2]] = educSchClos[cols[2]].apply(convertString2Int)
educSchClos[cols[3]] = educSchClos[cols[3]].apply(convertString2Int)
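
The same cleaning can be done without a row-wise `apply`. The following is a minimal sketch on a made-up Series (`raw` is hypothetical, not a column from the dataset) using vectorized string methods and `pd.to_numeric`; like `convertString2Int`, it treats the placeholders `--` and `..` as 0.

```python
import pandas as pd

# Made-up sample mixing formatted numbers, placeholders and a plain integer.
raw = pd.Series(["16,500,100", "--", "..", "42", 7])

cleaned = (
    raw.astype(str)                            # make every entry a string
       .str.replace(",", "", regex=False)      # drop thousands separators
       .replace({"--": "0", "..": "0"})        # placeholders become zero
)
as_int = pd.to_numeric(cleaned).astype("int64")
print(as_int.tolist())
```

Mapping placeholders to 0 follows the notebook's choice; mapping them to a missing value instead would keep them out of the totals computed later.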

In [None]:
#define 
fully_open = educSchClos[educSchClos['Status of school closures']=='Fully open']
closed = educSchClos[educSchClos['Status of school closures']=='Closed due to COVID-19']
partial_open = educSchClos[educSchClos['Status of school closures']=='Partially open']
aca_break = educSchClos[educSchClos['Status of school closures'] =='Academic break']

In [None]:
#remove duplicated rows (reassigning avoids modifying a slice of educSchClos in place)
fully_open = fully_open.drop_duplicates()
closed = closed.drop_duplicates()
partial_open = partial_open.drop_duplicates()
aca_break = aca_break.drop_duplicates()

#### Fully Open school
The majority of countries chose social distancing and hygiene measures in schools. We show this on the map.

In [None]:
map_fullOpen = world.merge(fully_open, right_on='ISO3 country',  left_on='iso_a3', suffixes=('', ''))

In [None]:
fig1 = plt.figure(figsize=(15,20), dpi=150)
for i, u in enumerate([cols[2], cols[3]]):
    ax = fig1.add_subplot(2, 1, i+1)
    plt.suptitle("Fully open school.", fontsize=20)
    map_fullOpen.plot(cmap='Greens', column=u, legend=True, scheme='quantiles', k=3, ax=ax)
    ax.set_title(f'{u}', fontsize=15)
plt.show()

Note that the countries that fully opened their schools are generally those that managed to control the spread of Covid-19, i.e. the recovery rate is higher than the transmission rate and the case fatality rate is very low; examples include Cameroon, China and Japan.

In [None]:
print(f'Total {cols[2]} for Fully open school in the world are: {fully_open[cols[2]].sum()}.')

In [None]:
print(f'Total {cols[3]} for Fully open school in the world are: {fully_open[cols[3]].sum()}.')

#### Partially open school
Here, many countries partially opened schools in some regions and opted for digital learning.

In [None]:
map_par = world.merge(partial_open, right_on='ISO3 country',  left_on='iso_a3', suffixes=('', ''))


In [None]:
fig1 = plt.figure(figsize=(15,20),dpi=150)
for i, u in enumerate([cols[2], cols[3]]):
    ax = fig1.add_subplot(2, 1, i+1)
    plt.suptitle("Partially open school.", fontsize=20)
    map_par.plot(cmap='OrRd', column=u, legend=True, scheme='quantiles', k=3, ax=ax)
    ax.set_title(f'{u}', fontsize=15)
plt.show()

In [None]:
print(f'Total {cols[2]} for Partially open school in the world are: {partial_open[cols[2]].sum()}.')

In [None]:
print(f'Total {cols[3]} for Partially open school in the world are: {partial_open[cols[3]].sum()}.')

#### Closed due to Covid-19
Countries that could not control the evolution of the pandemic were forced to close their schools and switch to digital learning, as recommended by UNESCO. This status can change over time.

In [None]:
plt.style.use('fivethirtyeight')
closed.plot(x=cols[1], y=[cols[2], cols[3]], kind='barh', subplots=True, figsize=(10, 20),
            sharex=True, sharey=True, logx=True)
plt.show()

In [None]:
print(f'Total {cols[2]} for closed schools in the world: {closed[cols[2]].sum()}.')

In [None]:
print(f'Total {cols[3]} for closed schools in the world: {closed[cols[3]].sum()}.')

#### Academic break

In [None]:
aca_break

### Duration school closures
We show each country's duration of school closures on the map.

In [None]:
plt.figure(figsize=(10,5), dpi=150)
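#correlate numeric columns only; text columns such as the country name cannot be correlated
sns.heatmap(durationSchcl.select_dtypes('number').corr(), center=0, annot=True)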
plt.show()

We keep the last four columns because they correspond to school periods and correlate strongly with the other features.

In [None]:
school = durationSchcl[['ISO', 'Country']+list(durationSchcl.columns[5:])]

In [None]:
#merge the world map with the school-period durations
geoSchool = world.merge(school, right_on='ISO', left_on='iso_a3', suffixes=('', ''))

<a id="sch"></a>

In [None]:
colors = ['Reds', 'Blues', 'OrRd', 'cividis_r']
figx = plt.figure(figsize=(20,20), dpi=200)
#plt.style.use('ggplot')
for i, u in enumerate(school.columns[2:]):
    ax = figx.add_subplot(4, 1, i+1)
    plt.suptitle("Duration school closures.", fontsize=20)
    geoSchool.plot(cmap=colors[i], column=u, legend=True, scheme='quantiles', k=3, ax=ax)
    ax.set_title(f'School period: {u}', fontsize=15)
plt.show()

During the Mar-Aug 2020 school period, at the peak of the pandemic (in March, Europe was the main focus, with Italy), the USA did not completely close its schools, and the same holds for the Sep 2020-Jun 2021 school year. The USA opted for a partial closure of schools between March and August 2020, lasting 19 weeks. Unfortunately, in the following school year the partial closure lasted 20 weeks longer (see table below), i.e. 39 weeks. This is likely linked to the third and fourth waves, the Delta variant and the limited effectiveness of vaccines against the disease.

During the Sep 2020-Jun 2021 school year, many countries used the digital learning recommended by UNESCO. Over the same period, many countries went through a third or fourth wave and new Covid-19 variants, for example the USA (CAL.20C), Brazil (P.4), India (hybrid variant), England (Delta), etc.

N.B.: The duration of school closures depends on how well each country controls the pandemic.

<a id="usc"></a>

## I.2/ USA school closure
In this section, we study the evolution of school closures and look at closures in each state.

In [None]:
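#.copy() gives an independent frame, so the in-place drop below does not act on a slice of usPolicy
us = usPolicy[['STATE','POSTCODE', 'CLSCHOOL']].copy()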

#### The date a state closed K-12 public schools statewide

In [None]:
us.drop(index=[0,1,2,3], inplace=True)

In [None]:
state = {}
for u in us.CLSCHOOL.unique().tolist():# for each date
    row = us[us.CLSCHOOL == u]# we select data.
    #we want to get table such that columns is date
    state[str(u)] = len(row['STATE'].tolist())

In [None]:
#sorted 
temp = {}
for u in sorted(state.keys()):
    temp[u] = state[u]
state = temp

<a id="k12"></a>

In [None]:
plt.figure(figsize=(9,8), dpi=180)
plt.pie(list(state.values()), labels=list(state.keys()), shadow=True, startangle=180, autopct='%1.1f%%',
       explode=(0,0.1,0,0,0,0,0,0,0,0,0))
plt.suptitle('The date a state closed K-12 public schools statewide.', fontsize=18)
plt.axis('equal')
plt.show()

41.2% of the states closed schools on 2020-03-16, and a further 17.6% did so the next day.
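
The shares shown in the pie chart can also be computed directly. This is a small sketch on made-up closure dates (`toy_us` is hypothetical and much smaller than the real policy table), using `value_counts(normalize=True)` rather than a manual counting loop.

```python
import pandas as pd

# Made-up closure dates for five fictional states.
toy_us = pd.DataFrame({
    "STATE": ["A", "B", "C", "D", "E"],
    "CLSCHOOL": pd.to_datetime(
        ["2020-03-16", "2020-03-16", "2020-03-17", "2020-03-16", "2020-03-18"]
    ),
})

# Percentage of states per closure date, in chronological order.
share = toy_us["CLSCHOOL"].value_counts(normalize=True).sort_index() * 100
share.index = share.index.strftime("%Y-%m-%d")  # drop the 00:00:00 time part
print(share.round(1))
```

Formatting the index as plain dates also removes the `00:00:00` suffix from chart labels.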

### School closure evolution

In [None]:
usa = schCloEvo[schCloEvo.Country == 'United States of America'][['Date', 'Status']]

We encode the status as:

- 1 = Fully open
- 0.5 = Partially open
- 0 = any other status (e.g. Academic break)

In [None]:
#Labelize 
def labelizer(row):
    if row == 'Fully open':
        return 1
    elif row == 'Partially open':
        return 0.5
    else:
        return 0

In [None]:
usa['Label_status'] = usa['Status'].apply(labelizer)
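
An explicit dictionary is one alternative to the `if`/`else` chain. The sketch below is an assumption-labelled illustration (`STATUS_TO_LEVEL` and the sample `status` Series are made up); unlike `labelizer`, it leaves unknown labels as missing values instead of turning them into 0.

```python
import pandas as pd

# Hypothetical mapping; the last two statuses share level 0, as in labelizer's else branch.
STATUS_TO_LEVEL = {
    "Fully open": 1.0,
    "Partially open": 0.5,
    "Academic break": 0.0,
    "Closed due to COVID-19": 0.0,
}

status = pd.Series(["Fully open", "Partially open", "Academic break",
                    "Closed due to COVID-19", "Unknown"])

# Labels missing from the dictionary become NaN, which makes typos easy to spot.
level = status.map(STATUS_TO_LEVEL)
print(level.tolist())
```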

In [None]:
usa.plot(x='Date', y='Label_status', figsize=(15,5), title='School closure evolution in the USA.')
plt.show()

This graph looks like a square wave whose period remains to be determined. The periodic shape may come from the alternation between partial opening of schools and academic breaks.
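
One way to estimate the period of such a signal is to look for the first peak of its autocorrelation. The example below is only a sketch on a synthetic square wave (the 90-sample period is chosen for the demonstration, not measured from the data) and uses NumPy alone.

```python
import numpy as np

true_period = 90
t = np.arange(4 * true_period)
# Synthetic square wave: 1.0 during the first half of each cycle, 0.0 during the second half.
signal = (t % true_period < true_period // 2).astype(float)

# Autocorrelation of the mean-centered signal, keeping non-negative lags only.
centered = signal - signal.mean()
acf = np.correlate(centered, centered, mode="full")[centered.size - 1:]
acf = acf / acf[0]

# The first local maximum after lag 0 is taken as the period estimate.
lags = np.arange(1, acf.size - 1)
is_peak = (acf[1:-1] > acf[:-2]) & (acf[1:-1] >= acf[2:])
estimated_period = int(lags[is_peak][0])
print(estimated_period)
```

On real, noisier data the first local maximum may be spurious, so it is worth plotting the autocorrelation before trusting the estimate.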

In [None]:
#load package
from statsmodels.tsa import seasonal  

In [None]:
label_status = usa.set_index('Date')['Label_status']
season = seasonal.seasonal_decompose(label_status, period=90)

In [None]:
fig, (ax1, ax2, ax3) = plt.subplots(3,1, figsize=(15,15))
season.trend.plot(ax=ax1)
season.seasonal.plot(ax=ax2)
season.resid.plot(ax=ax3)
plt.show()

From this decomposition, the school closure status appears to change roughly every 3 months, depending on the pandemic situation. Note that the 90-day period is an input to the decomposition, not something it estimates.

Despite the spread of Covid-19 around the world, each country tries to limit disruptions to education by partially opening schools. UNESCO recommends online or distance learning, but this solution comes with major challenges, which we explore below.

<a id="challenge"></a>

# II/ The challenges of digital learning in the USA

<a id="dist"></a>

# Part I: district study

In this part, we study the districts CSV file.

In [None]:
file6='/kaggle/input/learnplatform-covid19-impact-on-digital-learning/districts_info.csv'

In [None]:
district = pd.read_csv(file6)

In [None]:
district.head()

In [None]:
district.tail()

In [None]:
district.info()

In [None]:
district.isnull().sum()[district.isnull().sum()>0]

In [None]:
#We drop rows with a missing state because they carry no usable district information.
district.dropna(subset=['state'], inplace=True)

In [None]:
from sklearn.impute import SimpleImputer

In [None]:
imputer = SimpleImputer(strategy = "most_frequent")

In [None]:
cols_with_missing = [col for col in district.columns if district[col].isnull().any()]

In [None]:
imputed_data = imputer.fit_transform(district[cols_with_missing])

In [None]:
district[cols_with_missing] = imputed_data
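
Most-frequent imputation can also be written with pandas alone. This is a hedged sketch on a made-up frame (`toy_district` only mimics the shape of the district table) that fills each column with its own mode.

```python
import numpy as np
import pandas as pd

# Made-up sample with gaps in two categorical columns.
toy_district = pd.DataFrame({
    "locale": ["Suburb", "Suburb", np.nan, "Rural"],
    "pct_free/reduced": ["[0, 0.2[", np.nan, "[0, 0.2[", "[0.2, 0.4["],
})

# mode() returns one row per tied mode; the first row holds one mode per column.
modes = toy_district.mode().iloc[0]
filled = toy_district.fillna(modes)
print(filled)
```

Keep in mind that filling with the mode inflates the most common category, which can flatten differences between districts in later plots.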

## Visualization

In [None]:
fig, axis1 = plt.subplots(1, 1, figsize=(5,10), dpi=180)
sns.countplot(y='state', data=district, ax=axis1, color='blue')
plt.suptitle('State distribution.', fontsize=18)
plt.show()

In [None]:
print(f"Mode is {district['state'].mode().values[0]}")

In [None]:
#We create function for plotting
def categoricalPlot(y1=None, y2=None, hue=None, data=None, title=None):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 12), dpi=200)
    plt.suptitle(title, fontsize=18)
    sns.countplot(y=y1, hue=hue, data=data, ax=ax1)
    ax1.set_title('States', fontsize=18)
    sns.countplot(x=y2, hue=hue, data=data, ax=ax2)
    ax2.set_title('Locale', fontsize=18)
    plt.show()

In [None]:
categoricalPlot(y1='state', y2='pct_black/hispanic', hue='locale', data=district,
                title='Percentage of students identified as black/hispanic.')

In [None]:
categoricalPlot(y1='state', y2='pct_free/reduced', hue='locale', data=district,
                title='Percentage of students eligible for free or reduced-price lunch.')

In [None]:
categoricalPlot(y1='state', y2='county_connections_ratio', hue='locale', data=district,
                title='Residential fixed high-speed connections over 200 kbps.')

In 22 states, the connection ratio lies between 0.18 and 1. Only a rural locale (in North Dakota) has a connection ratio between 1 and 2.

In [None]:
fig, (ax11, ax12) = plt.subplots(nrows=1, ncols=2, figsize=(15, 10), dpi=180)
sns.countplot(y='pp_total_raw', hue='locale', data=district, ax=ax11)
sns.countplot(y='pp_total_raw', hue='pct_free/reduced', data=district, ax=ax12)
plt.suptitle('Per-pupil total expenditure by each category. ', fontsize=20)
plt.show()

In [None]:
_, (ax21, ax22) = plt.subplots(nrows=1, ncols=2, figsize=(15, 10), dpi=180)
sns.countplot(y='pp_total_raw', hue='pct_black/hispanic', data=district, ax=ax21)
sns.countplot(y='pp_total_raw', hue='county_connections_ratio', data=district, ax=ax22)
plt.suptitle('Per-pupil total expenditure by each category. ', fontsize=20)
plt.show()

For the majority of districts, per-pupil total expenditure lies between 8000 and 10000 US dollars.

<a id="black"></a>

In [None]:
plt.figure(figsize=(15,5))
sns.countplot(hue='pct_black/hispanic', x='county_connections_ratio', data=district)
plt.title('The black/hispanic students internet connectivity.')
plt.show()

This graph shows that many black or hispanic students live in counties with a connection ratio between 0.18 and 1, and only 20% of black/hispanic students live in counties with a connection ratio between 1 and 2.

In [None]:
plt.figure(figsize=(15,5))
sns.countplot(hue='pct_black/hispanic', x='pct_free/reduced', data=district)
plt.show()

### EDA

In [None]:
#We create a function that returns the lower bound of an interval string such as '[0.18, 1['
def minimun_value(row):
    row1 = row.split(',')# split by ','
    a = row1[0].split('[')[1]# split 
    #b = row1[1].split('[')[0]
    return float(a) #+float(b))/2

In [None]:
data = pd.DataFrame()

In [None]:
for u in ['pct_black/hispanic', 'pct_free/reduced', 'county_connections_ratio', 'pp_total_raw']:
    data[u] = district[u].apply(minimun_value)
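
Using only the lower bound discards the width of each interval. As an illustrative sketch (`parse_interval` is a hypothetical helper and the sample strings are made up in the dataset's `[a, b[` style), both bounds and a midpoint can be extracted like this:

```python
import pandas as pd

def parse_interval(text):
    """Return (lower, upper) as floats from a string such as '[0.18, 1['."""
    lower, upper = text.strip("[]").split(",")
    return float(lower), float(upper)

# Made-up interval strings in the same format as the district columns.
intervals = pd.Series(["[0.18, 1[", "[1, 2[", "[8000, 10000["])

bounds = pd.DataFrame(intervals.map(parse_interval).tolist(),
                      columns=["lower", "upper"])
bounds["midpoint"] = (bounds["lower"] + bounds["upper"]) / 2
print(bounds)
```

Whether the lower bound or the midpoint is the better summary depends on the analysis; correlations computed on either remain coarse because the underlying values are binned.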

In [None]:
data.tail()

In [None]:
#correlation
data.corr()

pct_black/hispanic and pct_free/reduced seem to trend together, i.e. districts with a higher share of students eligible for free or reduced-price lunch also tend to have a higher share of black/hispanic students.

<a id = "prod"></a>

# Part I: product study

In [None]:
file7 = "/kaggle/input/learnplatform-covid19-impact-on-digital-learning/products_info.csv"

In [None]:
product = pd.read_csv(file7)

In [None]:
product.head(3)

In [None]:
product.info()

## EDA & Visualization

In [None]:
print('------------ Inventory -------------')
print(f"Total Company: {product['Provider/Company Name'].nunique()}.")
print(f"Total Sector: {product['Sector(s)'].nunique()}.")
print(f"Total Function: {product['Primary Essential Function'].nunique()}.")
print(f"Total product: {product['Product Name'].nunique()}.")

In [None]:
#count value
company_name = product['Provider/Company Name'].value_counts().sort_values(ascending = True)
sector = product['Sector(s)'].value_counts()
pessfunc = product['Primary Essential Function'].value_counts() 

<a id="prov"></a>

In [None]:
#company name with counts > 1
company_name[company_name>1].plot(kind='barh', figsize=(10,15),
                                  title=' Name of the product provider (counts>1).')
for i, u in enumerate(company_name[company_name>1]):
    plt.text(u, i, str(u), bbox=dict(facecolor='yellow', alpha=0.5), fontsize=15, ha='left')
plt.xlabel('counts')
plt.ylabel('Company')
plt.show()

According to this graph, Google LLC is the most represented provider in the data with 30 products, followed by four other providers: Houghton Mifflin Harcourt (6 products), Microsoft, Learning A-Z (4 products) and IXL Learning (4 products).

In [None]:
sector.plot(kind='bar', figsize=(15,5))
for i, u in enumerate(sector):
    plt.text(i, u, str(u), bbox=dict(facecolor='green', alpha=0.5), fontsize=15, ha='center')
plt.ylabel('counts')
plt.title("Sectors", fontsize=18)
plt.show()

170 products are in the PreK-12 sector, 115 in PreK-12; Higher Ed; Corporate, and 65 in PreK-12; Higher Ed.

In [None]:
pessfunc[pessfunc>5].sort_values(ascending=True).plot(kind='barh', figsize=(10, 15))
plt.title('Primary Essential Function(counts>5)', fontsize=18)
for i, u in enumerate(pessfunc[pessfunc>5].sort_values(ascending=True)):
    plt.text(u, i, str(u), bbox=dict(facecolor='yellow', alpha=0.5), fontsize=15, ha='left')
plt.xlabel('counts')
plt.show()

74 products have LC-Digital Learning Platform as their primary function, 47 have LC-Sites, Resources and Reference, 36 have LC-Content Creation and Curation, and 25 have LC-Study Tools.

## Company exploration and visualization
We now study the five companies most represented in the data (Google LLC, Houghton Mifflin Harcourt, Microsoft, Learning A-Z, IXL Learning). For each company, we ask:

- How many products does the company provide?
- In which sectors are its products?
- What are their functions?

### GOOGLE LLC company

In [None]:
google = product[product['Provider/Company Name'] == 'Google LLC']

In [None]:
google.head(2)

In [None]:
print(f"Google LLC provides {google['Product Name'].nunique()} products:\n{google['Product Name'].tolist()}")

In [None]:
#use the google subset here, not the full product table
print(f"The sectors where Google provides its products: {google['Sector(s)'].unique()}")

In [None]:
print(f"Total functions: {google['Primary Essential Function'].nunique()}")

In [None]:
google['Sector(s)'].value_counts().plot(kind='bar', figsize=(15,5))
for i, u in enumerate(google['Sector(s)'].value_counts()):
    plt.text(i, u, str(u), bbox=dict(facecolor='red', alpha=0.5), fontsize=18, ha='center')
plt.title('The sectors where Google offers its products.') 
plt.ylabel('counts')
plt.show()

23 of Google LLC's products are in the PreK-12; Higher Ed; Corporate sector, against 2 in PreK-12 and 2 in PreK-12; Higher Ed.

In [None]:
google['Primary Essential Function'].value_counts().sort_values(ascending=True).plot(kind='barh',
                                                                                     figsize=(10,15),
                                        title='Primary Essential Function of the products.')
for i, u in enumerate(google['Primary Essential Function'].value_counts().sort_values(ascending=True)):
    plt.text(u, i, str(u), bbox=dict(facecolor='yellow', alpha=0.5), fontsize=18, ha='center')  
plt.xlabel('counts')
plt.show()

### Houghton Mifflin Harcourt company

In [None]:
houghton = product[product['Provider/Company Name'] == 'Houghton Mifflin Harcourt']

In [None]:
houghton

### Microsoft company

In [None]:
microsoft = product[product['Provider/Company Name'] == 'Microsoft']

In [None]:
microsoft

### Learning A-Z company

In [None]:
az =product[product['Provider/Company Name'] == 'Learning A-Z'] 

In [None]:
az

### IXL Learning company

In [None]:
ixl =product[product['Provider/Company Name'] == 'IXL Learning']  

In [None]:
ixl

In this first part, we learned that:

- Most black/hispanic students live in counties with a connection ratio between 0.18 and 1.0.

- For the majority of districts, per-pupil total expenditure lies between 8000 and 10000 US dollars.

- pct_black/hispanic and pct_free/reduced seem to trend together.

- Google LLC is the most represented company in digital learning, followed by Microsoft and Houghton Mifflin Harcourt.

# Part II: The challenges of digital learning in the USA

We now work with the engagement data. To begin, we focus only on the five companies most represented in the data.

In [None]:
#For every district file, we keep only the rows whose product id (lp_id) is in `tech`,
#e.g. the products of the five most represented companies shown above,
#and collect the filtered frames in the engagement list.
def select_product(tech=None, take_state=False):
    """Return a list of per-district frames filtered on lp_id; optionally tag district_id and state."""
    
    tech = [] if tech is None else tech # isin() needs a list-like, not None
    engagement = [] #list for concatenation
    
    for root, _, filenames in os.walk('/kaggle/input/learnplatform-covid19-impact-on-digital-learning/engagement_data/'):
        for filename in filenames:
            data = pd.read_csv(os.path.join(root, filename))
        
            #.copy() so that adding columns below does not act on a slice of data
            xdata = data[data['lp_id'].isin(tech)].copy()
            dist_id = os.path.splitext(filename)[0]# file name without extension = district id
            state = district[district['district_id'] == int(dist_id)]
            state = state['state'].unique()
            
            #tag the rows only when the state is known and requested; the frame is kept either way
            if len(state) != 0 and take_state:
                xdata['district_id'] = int(dist_id)
                xdata['state'] = state[0]
            engagement.append(xdata)
            
    return engagement

In [None]:
#create assemble_data function
def assemble_data():
    """Concatenate time, engagement_index and state for every district whose state is known."""
    
    assemble = [] #list
    for root, _, filenames in os.walk('/kaggle/input/learnplatform-covid19-impact-on-digital-learning/engagement_data/'):
        for filename in filenames:
            #load data
            data = pd.read_csv(os.path.join(root, filename))
            
            #select columns (.copy() so that adding a column does not act on a slice of data)
            xdata = data[['time', 'engagement_index']].copy()
            dist_id = os.path.splitext(filename)[0]# file name without extension = district id
            #filter district
            state = district[district['district_id'] == int(dist_id)]
            state = state['state'].unique()# choose unique value due to duplication
            
            #keep only districts whose state is known
            if len(state) != 0:
                #create state column
                xdata['state'] = state[0]
                assemble.append(xdata)# append data
                
    return pd.concat(assemble)# concatenation
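
Both helpers rely on the file name being the district id. The pattern can be sketched on temporary files as follows; `load_engagement` is a hypothetical helper, and the two CSV files are made up for the demonstration rather than taken from the dataset.

```python
import tempfile
from pathlib import Path

import pandas as pd

def load_engagement(folder):
    """Concatenate every <district_id>.csv in folder, tagging rows with the id."""
    frames = []
    for path in sorted(Path(folder).glob("*.csv")):
        frame = pd.read_csv(path, parse_dates=["time"])
        frame["district_id"] = int(path.stem)  # file name without extension
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)

# Demo on two tiny, made-up district files.
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"time": ["2020-01-01"], "lp_id": [1],
                  "engagement_index": [5.0]}).to_csv(Path(tmp) / "1000.csv", index=False)
    pd.DataFrame({"time": ["2020-01-02"], "lp_id": [2],
                  "engagement_index": [7.5]}).to_csv(Path(tmp) / "2000.csv", index=False)
    demo = load_engagement(tmp)

print(demo)
```

Parsing `time` while reading means later date-based operations work without a separate conversion step.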

<a id="digi2020"></a>

# The picture of digital connectivity and engagement in 2020
According to the [black/hispanic student connectivity](#black) chart, the county connection ratio (based on residential fixed high-speed connections over 200 kbps) mostly lies between 0.18 and 1. Below, we look at how engagement behaves.

In [None]:
#we take all filename in engagement data folder after we concatenate.
engagement_2020 = []
for root, _, filenames in os.walk('/kaggle/input/learnplatform-covid19-impact-on-digital-learning/engagement_data/'):
    for filename in filenames:
        data = pd.read_csv(os.path.join(root, filename))
        engagement_2020.append(data)

In [None]:
engagement_2020 = pd.concat(engagement_2020, ignore_index=True)

In [None]:
engagement_2020.head()

In [None]:
engagement_2020.info()

In [None]:
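#convert to datetime and let invalid dates raise: errors='ignore' would silently keep strings,
#and the resampling below needs a real datetime index
engagement_2020['time'] = pd.to_datetime(engagement_2020['time'])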

In [None]:
engage_usa_2020 = engagement_2020.groupby('time')['engagement_index'].agg('sum')

In [None]:
mean_engage_2020 = engage_usa_2020.rolling(7).mean()

In [None]:
engage_usa_2020.plot(figsize=(15,5), legend=True)
mean_engage_2020.plot(label='Moving average', legend=True)
plt.title('Digital learning: Engagement for year 2020.', fontsize=15)
plt.ylabel('engagement_index')
plt.xlabel('time: day')
plt.axis('tight')
plt.show()

According to the [The date a state closed K-12 public schools statewide](#k12) chart, the engagement index increases between March and April. It then decreases slowly between April and June, the period in which each state partially reopened its schools. From September 2020 to January 2021, the engagement index increases much faster than in the previous academic year, because the USA closed schools for 40 weeks (see the [Duration school closures](#sch) chart).
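
The moving average drawn on the chart is a trailing 7-day mean. As a minimal sketch on hypothetical daily totals (not the LearnPlatform data), the snippet below shows how `rolling(7).mean()` behaves, and in particular that the first six values are undefined because the window is not yet full:

```python
import pandas as pd

# Hypothetical daily totals; in the notebook the real series is engage_usa_2020.
days = pd.date_range("2020-01-01", periods=10, freq="D")
daily = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], index=days, dtype=float)

# Trailing 7-day mean: each value averages the current day and the 6 days before it.
smooth = daily.rolling(7).mean()

# The first 6 entries are NaN because fewer than 7 observations are available.
print(smooth)
```

This is why the moving-average curve starts a few days after the raw curve.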

## Weekly, monthly and quarterly engagement

In [None]:
week_engage = engage_usa_2020.resample('W').sum()
month_engage = engage_usa_2020.resample('M').sum()
quart_engage = engage_usa_2020.resample('Q').sum()

In [None]:
fig, (ax1, ax2, ax3) = plt.subplots(3,1, figsize=(15,20), dpi=180)
week_engage.plot(ax=ax1, legend=True)
month_engage.plot(ax=ax2, legend=True)
quart_engage.plot(ax=ax3, legend=True)
ax1.set_title('Weekly engagement', fontsize=15)
ax2.set_title('Monthly engagement', fontsize=15)
ax3.set_title('Quarterly engagement', fontsize=15)
plt.show()

We observe the engagement index at three scales: weeks, months and quarters. It increases at each scale, so we can say that engagement had a positive trend in 2020.
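
To make the three scales concrete, here is a small sketch of how resampling aggregates a daily series. The series is synthetic (one event per day for 2020), and the aliases `"MS"` and `"QS"` are an assumption of this sketch: they label each bin by its first day, whereas the notebook's `'M'` and `'Q'` label it by its last day. The sums per period are the same.

```python
import pandas as pd

# Synthetic daily series: one event per day over the leap year 2020 (366 days).
days = pd.date_range("2020-01-01", "2020-12-31", freq="D")
daily = pd.Series(1.0, index=days)

# Sum the daily values into calendar weeks, months and quarters.
weekly = daily.resample("W").sum()      # weeks ending on Sunday
monthly = daily.resample("MS").sum()    # bins labelled by month start
quarterly = daily.resample("QS").sum()  # bins labelled by quarter start

print(monthly.tolist())
print(quarterly.tolist())
```

Whatever the scale, the total is preserved; only the granularity of the curve changes.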

<a id="effect"></a>

# The effect of the COVID-19 pandemic on online and distance learning

On March 13, 2020, President Trump declared COVID-19 a national emergency, and on March 19, 2020, California issued a statewide stay-at-home order. Three days earlier, on March 16, 2020, 41.2% of states closed K-12 schools, and over the next two days a further 17.6% and 13.7% of states in the USA did the same (see: [The date a state closed K-12 public schools statewide](#k12)).

Between March and August 2020, the partial closure of schools in the USA lasted 19 weeks, while for the current school year (September 2020 to June 2021) it lasted 39 weeks, 20 more than the previous year (see: [Duration school closures](#sch)). This is why the following figure shows higher engagement.

In [None]:
quart_engage.plot(kind='area', figsize=(15, 5), legend=True)
plt.title('USA: Digital learning Engagement area')
plt.ylabel('engagement_index')
plt.show()

We note that the engagement index is 0.6 between Q1 and Q2 and drops to 0.5 in Q3 (remember that the holidays begin between Q2 and Q3), then quickly rises to 1 in Q4.

<a id="student"></a>

# Student engagement and the type of education technology
We use the five most represented providers: Google LLC, Microsoft, Houghton Mifflin Harcourt, Learning A-Z and IXL Learning. See: [provider](#prov).

## EduTech: Google LLC

In [None]:
# take all Google products
product_name_google = google['Product Name'].tolist()
lp_id_google = google['LP ID'].tolist()

In [None]:
google_engagement = engagement_2020[engagement_2020['lp_id'].isin(lp_id_google)]

In [None]:
#now, we can plot.
figure = plt.figure(figsize=(18, 18), dpi=180)
figure.subplots_adjust(wspace=0.2, hspace=0.5)
for i, u in enumerate(lp_id_google):
    name = product_name_google[i]
    data = google_engagement[google_engagement['lp_id'] == u]
    ax = figure.add_subplot(6, 5, i+1)
    data.plot(ax=ax, x='time', y='pct_access')
    ax.set_title(name)
    ax.set_xlabel(' ')
plt.suptitle('Percentage access in different google product.', fontsize=20)
plt.show()

In [None]:
figure1 = plt.figure(figsize=(18, 18), dpi=200)
figure1.subplots_adjust(wspace=0.2, hspace=0.5)
for i, u in enumerate(lp_id_google):
    name = product_name_google[i]
    data = google_engagement[google_engagement['lp_id'] == u]
    ax = figure1.add_subplot(6, 5, i+1)
    data.plot(ax=ax, x='time', y='engagement_index')
    ax.set_title(name)
    ax.set_xlabel(' ')
plt.suptitle('Engagement index in different google product.', fontsize=20)
plt.show()

The products with increasing pct_access in 2020 are:
- Google Docs
- Google Forms
- Google Sites
- Google Translate
- YouTube
- Google Classroom
- Google Drawings
- Google Sheets
- Google Calendar
- Kiddle
- Meet

Meet and Google Classroom are relevant and stable, which reflects well the relation between students and teachers.

As we can see, Google Docs, Classroom, Calendar, Meet and YouTube are the products with the highest engagement index. This also shows that students have easier access to these educational technologies than to other Google products.

These are the Google products with good student engagement.

In [None]:
best_prod = google[google['Product Name'].isin(["Google Docs", "Google Classroom", 
                                                "Google Calendar", "Meet", "YouTube"])]

In [None]:
figur_ = plt.figure(figsize=(18, 18), dpi=180)
figur_.subplots_adjust(hspace=0.5)
# take each title from the same row as its LP ID so that labels cannot get out of order
for i, (u, name) in enumerate(zip(best_prod['LP ID'], best_prod['Product Name'])):
    data = google_engagement[google_engagement['lp_id'] == u]
    ax = figur_.add_subplot(5, 1, i+1)
    data.plot(ax=ax, x='time', y='engagement_index')
    ax.set_title(name)
    ax.set_xlabel(' ')
plt.suptitle('Best google product.', fontsize=20)
plt.show()

## EduTech: Microsoft

In [None]:
# take all Microsoft products
product_name_microsoft = microsoft['Product Name'].tolist()
lp_id_microsoft = microsoft['LP ID'].tolist()

In [None]:
microsoft_engagement = engagement_2020[engagement_2020['lp_id'].isin(lp_id_microsoft)]

In [None]:
figure2 = plt.figure(figsize=(15, 10))
figure2.subplots_adjust(wspace=0.2, hspace=0.5)
for i, u in enumerate(lp_id_microsoft):
    name = product_name_microsoft[i]
    data = microsoft_engagement[microsoft_engagement['lp_id'] == u]
    ax = figure2.add_subplot(2, 3, i+1)
    data.plot(ax=ax, x='time', y='pct_access')
    ax.set_title(name)
    ax.set_xlabel(' ')
plt.suptitle('Percentage access in different microsoft product.', fontsize=20)
plt.show()

Only Microsoft Office 365 and Microsoft OneDrive have good pct_access. The same shows in the engagement index.

In [None]:
figure3 = plt.figure(figsize=(15, 10))
figure3.subplots_adjust(wspace=0.2, hspace=0.5)
for i, u in enumerate(lp_id_microsoft):
    name = product_name_microsoft[i]
    data = microsoft_engagement[microsoft_engagement['lp_id'] == u]
    ax = figure3.add_subplot(2, 3, i+1)
    data.plot(ax=ax, x='time', y='engagement_index')
    ax.set_title(name)
    ax.set_xlabel(' ')
plt.suptitle('Engagement index in different microsoft product.', fontsize=20)
plt.show()

## EduTech: Houghton Mifflin Harcourt

In [None]:
product_name_houghton = houghton['Product Name'].tolist()
lp_id_houghton = houghton['LP ID'].tolist()

In [None]:
houghton_engagement = engagement_2020[engagement_2020['lp_id'].isin(lp_id_houghton)]

In [None]:
figure3 = plt.figure(figsize=(15, 10))
figure3.subplots_adjust(wspace=0.2, hspace=0.5)
for i, u in enumerate(lp_id_houghton):
    name = product_name_houghton[i]
    data = houghton_engagement[houghton_engagement['lp_id'] == u]
    ax = figure3.add_subplot(2, 3, i+1)
    data.plot(ax=ax, x='time', y='pct_access')
    ax.set_title(name)
    ax.set_xlabel(' ')
plt.suptitle('Percentage access in different houghton product.', fontsize=20)
plt.show()

In [None]:
figure4 = plt.figure(figsize=(15, 10))
figure4.subplots_adjust(wspace=0.2, hspace=0.5)
for i, u in enumerate(lp_id_houghton):
    name = product_name_houghton[i]
    data = houghton_engagement[houghton_engagement['lp_id'] == u]
    ax = figure4.add_subplot(2, 3, i+1)
    data.plot(ax=ax, x='time', y='engagement_index')
    ax.set_title(name)
    ax.set_xlabel(' ')
plt.suptitle('Engagement index in different houghton product.', fontsize=20)
plt.show()

## EduTech: Learning A-Z and IXL Learning

In [None]:
product_name_az = az['Product Name'].tolist()
lp_id_az = az['LP ID'].tolist()

In [None]:
product_name_ixl = ixl['Product Name'].tolist()
lp_id_ixl = ixl['LP ID'].tolist()

In [None]:
az_engagement = engagement_2020[engagement_2020['lp_id'].isin(lp_id_az)]
ixl_engagement = engagement_2020[engagement_2020['lp_id'].isin(lp_id_ixl)]


In [None]:
figure5 = plt.figure(figsize=(15, 10))
figure5.subplots_adjust(wspace=0.2, hspace=0.5)
# pair every LP ID with its own product name instead of relying on hard-coded positions
az_ixl_names = product_name_az + product_name_ixl
az_ixl_engagement = pd.concat([az_engagement, ixl_engagement])
for i, (u, name) in enumerate(zip(lp_id_az + lp_id_ixl, az_ixl_names)):
    data = az_ixl_engagement[az_ixl_engagement['lp_id'] == u]
    ax = figure5.add_subplot(2, 4, i+1)
    data.plot(ax=ax, x='time', y='engagement_index')
    ax.set_title(name)

plt.suptitle('Engagement index in different Learning A-Z and IXL Learning product.', fontsize=20)
plt.show()

Student engagement varies with the type of educational technology: if a technology is easy to use and easy to reach over the internet, teaching becomes easier for both students and teachers.

If a technology addresses the basic needs of students and teachers, engagement is positive. For example, the curves for Google Docs, YouTube and Meet increase and then stabilize, which suggests that students keep using them to learn.

The other technologies do not follow this pattern.

<a id="socio"></a>

# Student engagement and geographic, demographic and socio-economic status

## Geographic context
Student engagement can also vary with the geographic context, so we examine it state by state.

In [None]:
assemble_engage = assemble_data()

In [None]:
#pivot table
state_student_engagement = assemble_engage.pivot_table(values='engagement_index', index='time', columns='state', 
                                                      aggfunc='sum')

In [None]:
state_student_engagement.head(3)

We have 23 states because of missing values: some districts have no information about state, pct_black/hispanic, etc. in the district csv file. We remove Minnesota and North Dakota because they have only 95 and 33 non-null values, respectively.
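
The non-null counts quoted above can be obtained with `DataFrame.count()`. The sketch below uses a tiny hypothetical table and a hypothetical threshold `min_obs`; the notebook itself drops the two states by name rather than by threshold.

```python
import numpy as np
import pandas as pd

# Hypothetical pivot table: rows are days, columns are states.
table = pd.DataFrame({
    "State A": [1.0, 2.0, 3.0, 4.0],
    "State B": [1.0, np.nan, np.nan, np.nan],
    "State C": [5.0, 6.0, np.nan, 8.0],
})

# Number of non-null observations per state.
non_null = table.count()

# Hypothetical rule: drop every state with fewer than min_obs observations.
min_obs = 3
sparse = non_null[non_null < min_obs].index.tolist()
dense = table.drop(columns=sparse)

print(non_null)
print(sparse)
```

A threshold makes the choice reproducible if the input files change, at the cost of one more parameter to justify.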

In [None]:
state_student_engagement.drop(columns=['Minnesota', 'North Dakota'], inplace=True)

In [None]:
# rename columns (the list must follow the alphabetical order of the pivot table columns)
state_student_engagement.columns = ['Arizona', 'California', 'Connecticut', 'District of Columbia',
       'Florida', 'Illinois', 'Indiana', 'Massachusetts', 'Michigan',
       'Missouri', 'New Hampshire', 'New Jersey', 'New York', 'North Carolina',
       'Ohio', 'Tennessee', 'Texas', 'Utah', 'Virginia', 'Washington',
       'Wisconsin']

We are going to plot student engagement by state in each region of the USA.

In [None]:
state_region = pd.read_csv('/kaggle/input/usa-states-to-region/states.csv')

In [None]:
region ={} #dict
for u in state_region["Region"].unique().tolist():
    region[u] = state_region[state_region['Region'] == u].State.unique().tolist()

In [None]:
for u in state_region["Region"].unique():
    # sort the states so that the subplot order is reproducible
    res = sorted(set(state_student_engagement.columns).intersection(region[u]))
    state_student_engagement[res].plot(figsize=(15,15), subplots=True)
    plt.suptitle(f'Student engagement index: {u} region.', fontsize=20)
plt.show()  # display the figures

The student engagement index differs considerably between regions, and within each region the states do not follow the same trend.

In the West region, California and Utah reached an engagement index of one million in 2020. On the other hand, Arizona and Washington reached 200,000 and 400,000 respectively, with a decreasing trend. We notice anomalies between March 2020 and April 2020.

In the South region, Virginia, Texas, Tennessee, Florida and North Carolina achieved engagement indexes of 500,000, 150,000, 125,000, 150,000 and 400,000 respectively throughout 2020. However, North Carolina, Texas and Tennessee suffered an interruption of the engagement index between March and April 2020 (an abrupt one for Texas). This may be caused by the socio-economic situation in these states.

In the Northeast region, only Connecticut, New York and Massachusetts reached high engagement indexes (4 million, one million and 2.5 million respectively), against 300,000 for New Hampshire and 200,000 for New Jersey. This difference may be caused by the total number of students in each state.

In the Midwest region, Indiana, Ohio and Illinois also reached engagement indexes between one million and 3 million with a good trend, even though we see an anomaly between March and April 2020. The other Midwest states show a good rise in the index after this anomaly, while Michigan saw its engagement index fade away like an evanescent signal.

Looking at the figures, we find that all the states experience a strange anomaly between March and April 2020. After this anomaly, some states such as North Carolina, Tennessee, Texas, the District of Columbia and Michigan see their engagement index curve disappear after April 2020.

We need to know what is causing this anomaly.

## Demographic context
We know that engagement_index is the total number of page-load events per one thousand students for a given product on a given day. We are going to see how engagement_index evolves with the student population of each state.
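
The definition above can be written as a one-line formula. The sketch below is only an illustration of that definition: the function names and the numbers are hypothetical and do not come from the notebook.

```python
def engagement_index(page_loads: float, students: float) -> float:
    """Page-load events per 1,000 students (definition quoted in the text)."""
    return 1000.0 * page_loads / students


def page_loads_from_index(index: float, students: float) -> float:
    """Invert the definition: recover the raw number of page-load events."""
    return index * students / 1000.0


# Hypothetical example: 500 page loads in a population of 2,000 students.
idx = engagement_index(500, 2000)
loads = page_loads_from_index(idx, 2000)
print(idx, loads)
```

Because the index is already normalized per thousand students, two states with very different populations can be compared directly on it.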

In [None]:
#load data schools state
school_st = pd.read_csv('/kaggle/input/osf-data/schools_state_csv.csv')

In [None]:
# state_name is written in upper case; keep only the first letter capitalized
school_st['state_name'] = school_st['state_name'].str.capitalize()

In [None]:
#replace values
school_st.replace(value=['North Carolina', 'New Jersey', 'New York', 'New Hampshire', 'District of Columbia'],
                  to_replace=['North carolina', 'New jersey', 'New york', 'New hampshire',
                              'District of columbia'], inplace=True)

In [None]:
schools_state = school_st.groupby('state_name')['total_students'].agg('sum').reset_index()

In [None]:
# select the states we need and use state_name as the index
relevant_schools_state = schools_state[schools_state['state_name'].isin(state_student_engagement.columns)]
relevant_schools_state = relevant_schools_state.set_index('state_name')

In [None]:
relevant_schools_state.sort_values(ascending=False, by=['total_students']).plot(figsize=(15, 5), kind='bar')
plt.title('Total students by state', fontsize=18)
plt.ylabel("number of students")
plt.show()

California and Texas have more students in total than Florida, New York, Illinois and Ohio.

In [None]:
# map each state to its total number of students
total_student = relevant_schools_state.to_dict()['total_students']

In [None]:
for res in  [['California', 'Texas'], ['Florida', 'New York'], ['Illinois', 'Ohio'], ['North Carolina', 'Michigan']]:
    state_student_engagement[res].plot(figsize=(15,5))
    plt.ylabel('engagement_index')
    plt.title(f'{res[0]}: {total_student[res[0]]} students vs {res[1]}: {total_student[res[1]]} students.',
              fontsize=18)
    plt.show()

These four figures show that the engagement index differs with the student population. California and Texas both have huge student populations, but the engagement index for Texas is very low and is practically zero between February and May 2020.

We see the same behavior again between Florida and New York.

In [None]:
ncols_states = [['California', 'Texas'], ['Florida', 'New York'], 
                ['Illinois', 'Ohio'], ['North Carolina', 'Michigan'], ['New Jersey', 'Virginia'],
               ['Arizona', 'Washington'], ['Indiana', 'Tennessee'], ["Massachusetts","Missouri"],
               ['Connecticut','New Hampshire']]

In [None]:
for res in ncols_states:
    state_student_engagement[res].plot(figsize=(15,5), kind="box", logy=True)
    plt.ylabel('engagement_index')
    plt.title(f'{res[0]}: {total_student[res[0]]} students vs {res[1]}: {total_student[res[1]]} students.',
              fontsize=18)
    plt.show()

As we can see, the engagement index tends to follow the number of students in each state. Some states have a higher minimum engagement index than others. Also, some states have well-behaved observations of the engagement index, while others have outliers.
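
The outliers drawn by a box plot are, with matplotlib's default settings, the points lying more than 1.5 times the interquartile range beyond the quartiles. A minimal sketch of that rule, on hypothetical values and with a hypothetical helper name `iqr_outliers`:

```python
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the observations outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    mask = (values < q1 - k * iqr) | (values > q3 + k * iqr)
    return values[mask]


# Hypothetical engagement values with one obvious spike.
sample = pd.Series([10, 12, 11, 13, 12, 11, 100])
print(iqr_outliers(sample))
```

Applied to one column of the state table, this lists the days that appear as isolated points on the box plot.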

# Student engagement and socio-economic status

The socio-economic situation of a state is very important for healthy education.

## Impact of Covid-19 on socio-economic status

In [None]:
policy = pd.read_excel(file3)

In [None]:
district_ = pd.read_csv(file6)

In [None]:
#we want to take category as column
inv_policy = policy.T # transpose policy data

In [None]:
inv_policy.columns= inv_policy.iloc[0, :].tolist()# first row becomes header 

In [None]:
inv_policy.drop(index=['STATE','POSTCODE','FIPS'], inplace=True)

In [None]:
inv_policy['category'].unique()

In [None]:
# categories we need
need_category = ['state_of_emergency', 'physical_distance_closure', 'shelter', 'physical_distance_closures',
                'second_closures', 'second_closures)', 'third_closures', 'housing', 'food_security',
                'unemployment', 'workplace', 'SUD_policies', 'state_characteristics', 'minimum_wage']

In [None]:
category = {}# dictionary
for u in need_category:
    category[u] = inv_policy[inv_policy['category'] == u].index.tolist()

In [None]:
# In this cell, each category becomes its own DataFrame.

# state of emergency data
state_of_emergency = policy[['STATE']+category['state_of_emergency']]

# food security data
food_security = policy[['STATE']+category['food_security']]

# physical distance closure data
physical_distance_closure = policy[['STATE']+category['physical_distance_closure']+category['physical_distance_closures']]
shelter = policy[['STATE']+category['shelter']]  # shelter data
housing = policy[['STATE']+category['housing']]  # housing data (eviction moratoriums)
unemployment = policy[['STATE']+category['unemployment']]  # unemployment data
state_characteristics = policy[['STATE']+category['state_characteristics']]  # state characteristics
minimun_wage = policy[['STATE']+category['minimum_wage']]  # minimum wage data
# second and third closures
closures = policy[['STATE']+category['second_closures']+category['second_closures)']+category['third_closures']]
workplace = policy[['STATE']+category['workplace']]  # workplace data
sud_policy = policy[['STATE']+category['SUD_policies']]  # SUD policies data

In [None]:
state_of_emergency.drop(index=[0,1,2,3]).head()

In [None]:
# assign the result back: an in-place replace on a selected column may not update district_
district_['state'] = district_['state'].replace('District Of Columbia', 'District of Columbia')

In [None]:
district_.dropna(subset=['state'], inplace=True)

In [None]:
state = district_['state'].unique().tolist()

In [None]:
state.remove('Minnesota')

In [None]:
state.remove('North Dakota')

In [None]:
emergency = state_of_emergency[state_of_emergency.STATE.isin(state)]

In [None]:
# emergency is a filtered selection, so reassign instead of dropping in place
emergency = emergency.drop(columns="STEMERG2")

### State of emergency and shelter

The majority of states have not ended the state of emergency.

**N.B:** 0 means not implemented.

In [None]:
shel = shelter[shelter['STATE'].isin(state)][["STATE","STAYHOME","END_STHM"]]

In [None]:
emerg_shel = emergency.merge(shel, on='STATE')

In [None]:
emerg_shel

All states in the table declared a state of emergency. Some have since ended it (Florida, Massachusetts, Michigan, New Jersey, New York, Ohio, Virginia), while the others have not (0 in the table). Following the state of emergency, all states ordered confinement in March, except three states for which we do not know the date: Texas, Utah and Connecticut.

### Minimum wage

In [None]:
wage = minimun_wage[minimun_wage['STATE'].isin(state)][["STATE", "MINWAGEJAN2020", "MINWAGESEP2020"]]

In [None]:
fig, axis = plt.subplots(1, 2, figsize=(15,15), dpi=180)
wage_jan = wage.set_index('STATE')["MINWAGEJAN2020"].sort_values()
wage_jan.plot(kind="barh", ax=axis[0])
# annotate each bar with its value
for i, u in enumerate(wage_jan):
    axis[0].text(u, i, "$"+str(u), bbox=dict(facecolor='yellow', alpha=0.25), fontsize=15, ha='center')
# titles and labels only need to be set once, outside the loop
axis[0].set_title(" January 2020")
axis[0].set_xlabel("wage")
axis[0].axis('tight')

wage_sep = wage.set_index('STATE')["MINWAGESEP2020"].sort_values()
wage_sep.plot(kind="barh", ax=axis[1], rot=60)
for i, u in enumerate(wage_sep):
    axis[1].text(u, i, "$"+str(u), bbox=dict(facecolor='yellow', alpha=0.25), fontsize=15, ha='center')
axis[1].set_title(" September 2020")
axis[1].set_xlabel("wage")
axis[1].axis('tight')

plt.suptitle('Minimum hourly wage', fontsize=18, ha='center')
plt.show()

This graph shows that some states have raised their minimum hourly wage. Connecticut gains one place and is 6th in the September 2020 list. Illinois, which was 11th in the January 2020 list, raised its minimum hourly wage by 0.75 dollars, gains two places and is now 9th in the September 2020 list.
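
The change of position between the two lists can also be computed rather than read off the chart. The sketch below uses hypothetical states and wages (not the values of the policy file) and ranks them so that rank 1 is the highest wage:

```python
import pandas as pd

# Hypothetical minimum hourly wages in dollars.
wages = pd.DataFrame(
    {"jan_2020": [12.00, 10.00, 9.25, 7.25],
     "sep_2020": [12.00, 10.00, 10.50, 7.25]},
    index=["State A", "State B", "State C", "State D"],
)

# Rank 1 is the highest wage; method="min" gives tied states the same (best) rank.
rank_jan = wages["jan_2020"].rank(ascending=False, method="min").astype(int)
rank_sep = wages["sep_2020"].rank(ascending=False, method="min").astype(int)

summary = pd.DataFrame({
    "raise": wages["sep_2020"] - wages["jan_2020"],
    "places_gained": rank_jan - rank_sep,  # positive means the state moved up
})
print(summary)
```

In this toy table, State C raises its wage by 1.25 dollars and moves up one place, while State B moves down one place without changing its wage.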

The confinement probably contributed to these increases in the hourly minimum wage.

### Wage losses 

Many families lost their wages during confinement. Using the hourly minimum wage of each state above, we can estimate the lost wages in each state, assuming 30 working hours per week. We take a working week of 6 days, with Sunday as the day of rest.
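
Under these assumptions, the estimate is: lost hours = 30 × (days of confinement / 6), and lost wage = lost hours × hourly minimum wage. A short worked sketch follows; the dates and the 12-dollar wage are hypothetical, and the function names are introduced only for this illustration.

```python
import pandas as pd

HOURS_PER_WEEK = 30  # assumption stated in the text
DAYS_PER_WEEK = 6    # working week of 6 days, Sunday as day of rest


def lost_hours(start: str, end: str) -> float:
    """Working hours lost between the start and the end of a stay-at-home order."""
    days = (pd.Timestamp(end) - pd.Timestamp(start)).days
    return HOURS_PER_WEEK * days / DAYS_PER_WEEK


def lost_wage(start: str, end: str, hourly_wage: float) -> float:
    """Lost wage in dollars for an employee paid the hourly minimum wage."""
    return lost_hours(start, end) * hourly_wage


# Hypothetical order lasting 60 days, with a minimum wage of 12 dollars per hour.
hours = lost_hours("2020-03-19", "2020-05-18")
loss = lost_wage("2020-03-19", "2020-05-18", 12.0)
print(hours, loss)
```

The estimate is linear in both the duration and the wage, so a state with a long order and a high minimum wage dominates the ranking.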

In [None]:
# remove Texas, Utah and Connecticut because their stay-at-home start date is not implemented
stayhome = shel[~shel['STATE'].isin(['Texas', 'Utah', 'Connecticut'])].copy()

In [None]:
# parse the dates strictly; invalid values raise instead of being silently kept
stayhome['STAYHOME'] = pd.to_datetime(stayhome['STAYHOME'])
stayhome['END_STHM'] = pd.to_datetime(stayhome['END_STHM'])

In [None]:
stayhome['DURATION_STAYHOME'] = stayhome['END_STHM'] - stayhome['STAYHOME']

In [None]:
stayhome['LOSS_WAGE(Hour)'] = stayhome['DURATION_STAYHOME'].apply(lambda v: 30*(v.days/6.0))

In [None]:
stayhome.sort_values(by='DURATION_STAYHOME', ascending=False, inplace=True)

In [None]:
wage_stayhome = stayhome.merge(wage[['STATE', 'MINWAGEJAN2020']], left_on="STATE", right_on="STATE")  

In [None]:
wage_stayhome['MINWAGEJAN2020'] = pd.to_numeric(wage_stayhome["MINWAGEJAN2020"], errors="coerce")

In [None]:
wage_stayhome["LOSS_WAGE($)"] = wage_stayhome['LOSS_WAGE(Hour)']*wage_stayhome['MINWAGEJAN2020']

In [None]:
wage_stayhome.style.background_gradient('OrRd')

**NB:** Texas, Utah and Connecticut are missing here for lack of information.

In [None]:
def currency(x, pos):
    """The two args are the value and tick position"""
    if x >= 1e3:
        s = '${:1.0f}K'.format(x*1e-3)
    else:
        s = '${:1.0f}'.format(x)
    return s

In [None]:
figp, axp = plt.subplots(1, 2, figsize=(15, 10), dpi=180)
axp[0].barh(wage_stayhome["STATE"].tolist(), [u.days for u in wage_stayhome["DURATION_STAYHOME"].tolist()])
axp[1].barh(wage_stayhome['STATE'].tolist(), wage_stayhome['LOSS_WAGE($)'].tolist())
labels1 = axp[0].get_xticklabels()
labels2 = axp[1].get_xticklabels()
plt.setp(labels1, rotation=45, horizontalalignment='right')
plt.setp(labels2, rotation=45, horizontalalignment='right')
axp[0].set(xlim=[0, 325], xlabel='Days', ylabel='STATE',
       title='Duration stay home in each state')
axp[1].set(xlim=[0, 20500], xlabel='Loss wage', ylabel='STATE',
       title='Employee wage loss in each state ')
axp[1].xaxis.set_major_formatter(currency)
axp[0].axvline(np.mean([u.days for u in wage_stayhome["DURATION_STAYHOME"].tolist()]),
               ls='--', color='r', label="mean")
axp[1].axvline(wage_stayhome["LOSS_WAGE($)"].mean(), ls="--", color='r', label="mean")
axp[0].legend(loc="best")
axp[1].legend(loc='best')
plt.show()

California has a very long stay-at-home duration and a large wage loss, for reasons covered in **[Covid-19 wave & variant tracker](https://www.kaggle.com/lumierebatalong/covid-19-wave-variant-tracker)**.

### Estimated unemployment and poverty

In [None]:
unemp_pov18 = state_characteristics[state_characteristics["STATE"].isin(state)][["STATE","UNEMP18","POV18"]]

In [None]:
unemp_pov18["UNEMP18"] = pd.to_numeric(unemp_pov18['UNEMP18'], errors="coerce")
unemp_pov18["POV18"] = pd.to_numeric(unemp_pov18['POV18'], errors="coerce")

In [None]:
unemp_pov18.corr()

UNEMP18 and POV18 do not show a simple linear correlation.
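
`DataFrame.corr()` computes the Pearson (linear) coefficient by default, so it can understate a relation that is monotonic but not linear. A minimal sketch on hypothetical values (not the UNEMP18/POV18 data) compares it with a rank-based coefficient, computed here as the Pearson correlation of the ranks:

```python
import pandas as pd

# Hypothetical data: y grows with x, but not along a straight line.
x = pd.Series([1, 2, 3, 4, 5, 6], dtype=float)
df = pd.DataFrame({"x": x, "y": x ** 3})

# Pearson coefficient: measures linear association only.
pearson = df["x"].corr(df["y"])

# Rank correlation (Spearman): Pearson coefficient computed on the ranks.
spearman = df["x"].rank().corr(df["y"].rank())

print(round(pearson, 3), round(spearman, 3))
```

If the two coefficients differ noticeably on the real columns, that would support describing the relation as nonlinear.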

In [None]:
fig1, axis1 = plt.subplots(1, 2, figsize=(15,15), dpi=180)
unemp18 = unemp_pov18.set_index('STATE')["UNEMP18"].sort_values()
unemp18.plot(kind="barh", ax=axis1[0])
# annotate each bar with its value
for i, u in enumerate(unemp18):
    axis1[0].text(u, i, str(u), bbox=dict(facecolor='yellow', alpha=0.25), fontsize=15, ha='center')
# titles and labels only need to be set once, outside the loop
axis1[0].set_title(" Unemployment 2018")
axis1[0].set_xlabel("unemp(%)")
axis1[0].axis('tight')

pov18 = unemp_pov18.set_index('STATE')["POV18"].sort_values()
pov18.plot(kind="barh", ax=axis1[1], rot=30)
for i, u in enumerate(pov18):
    axis1[1].text(u, i, str(u), bbox=dict(facecolor='yellow', alpha=0.25), fontsize=15, ha='center')
axis1[1].set_title(" Poverty 2018")
axis1[1].set_xlabel("pov (%)")
axis1[1].axis('tight')

plt.suptitle('State characteristics', fontsize=18)
plt.show()

<a id="inter"></a>

# State intervention, practice or policy.

<a id="clo"></a>

## Closing and reopening of bars, restaurants, casinos, gyms, etc.
We will see how the interventions, practices or policies of each state evolve and how they correlate with the engagement index.

To do so, we study bars, cinemas, hairdressing salons, gyms, restaurants and casinos according to each state's strategy against Covid-19. We then look at the decisions made on the closing and reopening of these economic activities.

In [None]:
closures.head()

In [None]:
# keep only the states under study
clos = closures[closures['STATE'].isin(state)]


In [None]:
bars = clos[['STATE', 'BCLBAR2', 'END_BRS2', 'CLBAR3', 'END_CLBAR3']]
movie_theaters = clos[['STATE', 'CLMV2', 'END_CLMV2']]
hair_barber = clos[['STATE', 'CLHAIR2', 'END_CLHAIR2']]
gym = clos[['STATE', 'CLGYM2', 'END_CLGYM2']]
restaurants = clos[['STATE', 'CLRST2', 'ENDREST2', 'CLRST3', 'END_CLRST3']]
casinos = clos[['STATE', 'CASCLOSE2', 'CASOPEN2']]

### The bars and the wage loss per hour

In [None]:
bars

Only 7 states decided to close bars, and five of them have reopened. Only the state of New York gives no reopening date for bars. For Michigan, we see a first and a second phase of bar closure and reopening. Most of the bar closings started in May 2020.

In [None]:
# convert values to datetimes; 0 (not implemented) becomes the epoch 1970-01-01
bars_ = bars.set_index('STATE').applymap(pd.to_datetime)

In [None]:
# duration between closing and reopening of bars in each state, for the first and second closure
bars_['DURATION_BARS2'] = bars_['END_BRS2'] - bars_['BCLBAR2']
bars_['DURATION_BARS3'] = bars_['END_CLBAR3'] - bars_['CLBAR3']

In [None]:
# set negative durations (closure without a reopening date) to zero
bars_[['DURATION_BARS2', 'DURATION_BARS3']] = bars_[['DURATION_BARS2', 'DURATION_BARS3']].applymap(lambda x: x if x > pd.Timedelta(0) else  pd.Timedelta(0) )

**Duration bars closures**

In [None]:
bars_[['DURATION_BARS2', 'DURATION_BARS3']].sort_values(by='DURATION_BARS2', ascending=False).style.background_gradient('OrRd')

This table gives the number of days that bars were closed in each state, during which a bar employee loses his wage. To express this loss in hours, we take the total number of days without work, convert it into weeks (7 days) and assume that the employee works 60 hours per week. This gives what a bartender loses in every state.
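
The calculation described above can be sketched as follows. The dates are hypothetical, the helper names `closed_days` and `lost_hours` are introduced only for this illustration, and a closure without a reopening date is counted as zero days, as in the table:

```python
import pandas as pd


def closed_days(start, end) -> int:
    """Days between closing and reopening; 0 if a date is missing or the interval is negative."""
    if pd.isna(start) or pd.isna(end):
        return 0
    return max((pd.Timestamp(end) - pd.Timestamp(start)).days, 0)


def lost_hours(days: int, hours_per_week: float = 60, days_per_week: int = 7) -> float:
    """Convert days without work into lost working hours."""
    return hours_per_week * days / days_per_week


# Hypothetical state with two closure phases.
phase_1 = closed_days("2020-03-16", "2020-06-08")  # first closure
phase_2 = closed_days("2020-07-01", "2020-07-15")  # second closure
total_hours = lost_hours(phase_1 + phase_2)
print(phase_1, phase_2, total_hours)
```

Multiplying `total_hours` by the hourly minimum wage of the state would give the corresponding loss in dollars.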

In [None]:
barman = pd.DataFrame()
barman['TOTAL_DURATION_BARS'] = bars_['DURATION_BARS2'] + bars_['DURATION_BARS3']

In [None]:
barman['LOSS_WAGE(Hour)'] = barman['TOTAL_DURATION_BARS'].apply(lambda v: 60*(v.days/7.0))

In [None]:
barman.sort_values(by='TOTAL_DURATION_BARS', ascending=False).style.background_gradient('OrRd')

### Cinema and wage loss per hour

In [None]:
movie_theaters

Arizona, Illinois and Michigan have closed cinemas and theaters.

In [None]:
# convert values to datetimes; 0 (not implemented) becomes the epoch 1970-01-01
movie_ = movie_theaters.set_index('STATE').applymap(pd.to_datetime)

In [None]:
movie_['DURATION_CLMV2'] = movie_['END_CLMV2'] - movie_['CLMV2']

We proceed in the same way as for the bars.

In [None]:
movie_['LOSS_WAGE(Hour)'] = movie_['DURATION_CLMV2'].apply(lambda v: 60*(v.days/7.0))

In [None]:
movieman = movie_[['DURATION_CLMV2', 'LOSS_WAGE(Hour)']]

In [None]:
movieman.sort_values(by="DURATION_CLMV2", ascending=False).style.background_gradient('OrRd')

### Restaurants and wage loss per hour

In [None]:
restaurants

A little bit of change here.

In [None]:
restau_ = restaurants.set_index('STATE').applymap(pd.to_datetime)

In [None]:
restau_['DURATION_RST2'] = restau_['ENDREST2'] - restau_['CLRST2'] 

We proceed in the same way as for the bars.

In [None]:
restau_['LOSS_WAGE(Hour)'] = restau_['DURATION_RST2'].apply(lambda v: 60*(v.days/7.0))

In [None]:
restau_man = restau_[['DURATION_RST2', 'LOSS_WAGE(Hour)']]

In [None]:
restau_man.sort_values(by="DURATION_RST2", ascending=False).style.background_gradient('OrRd')

### Gym

In [None]:
gym

Only two states appear here. For what follows, we use the same method as for the bars.

In [None]:
gym_ = gym.set_index('STATE').applymap(pd.to_datetime)

gym_['DURATION_GYM2'] = gym_['END_CLGYM2'] - gym_['CLGYM2'] 

gym_['LOSS_WAGE(Hour)'] = gym_['DURATION_GYM2'].apply(lambda v: 60*(v.days/7.0))

In [None]:
gym_man = gym_[['DURATION_GYM2', 'LOSS_WAGE(Hour)']]

In [None]:
gym_man.sort_values(by="DURATION_GYM2", ascending=False).style.background_gradient('OrRd')

### Casinos

In [None]:
casinos

Illinois and Michigan closed casinos.

In [None]:
casinos_ = casinos.set_index('STATE').applymap(pd.to_datetime)

In [None]:
# duration between the closing and the reopening of casinos
casinos_['DURATION_CAS'] = casinos_['CASOPEN2'] - casinos_['CASCLOSE2']

casinos_['LOSS_WAGE(Hour)'] = casinos_['DURATION_CAS'].apply(lambda v: 60*(v.days/7.0))

casinos_man = casinos_[['DURATION_CAS', 'LOSS_WAGE(Hour)']]

In [None]:
casinos_man.sort_values(by="DURATION_CAS", ascending=False).style.background_gradient('OrRd')

We will summarize all the results in a single graph.

In [None]:
#plot bars, casinos, gym, restaurants, 
import matplotlib.gridspec as gridspec
figp = plt.figure(figsize=(15,10), constrained_layout=True, dpi=180)
gs = figp.add_gridspec(2, 3)
f3_ax1 = figp.add_subplot(gs[0, 0])
cas_ = casinos_man[casinos_man["LOSS_WAGE(Hour)"] > 0]
f3_ax1.barh(cas_.index, [u.days for u in cas_["DURATION_CAS"].tolist()])
f3_ax1.set_title('Duration casinos closures')
f3_ax1.set_xlabel("days")
f3_ax1.set_ylabel("STATE")
#-----------------------------------------------------------------------------------
gyman_ = gym_man[gym_man['LOSS_WAGE(Hour)'] > 0]
f3_ax2 = figp.add_subplot(gs[0, 1])
f3_ax2.barh(gyman_.index, [u.days for u in gyman_["DURATION_GYM2"].tolist()])
f3_ax2.set_title('Duration gym closures')
f3_ax2.set_xlabel('days')
f3_ax2.set_ylabel("STATE")
#---------------------------------------------------------------------------------------------
rst = restau_man[restau_man["LOSS_WAGE(Hour)"] > 0]
f3_ax3 = figp.add_subplot(gs[0, 2])
f3_ax3.barh(rst.index, [u.days for u in rst['DURATION_RST2'].tolist()])
f3_ax3.set_title('Duration restaurants closures')
f3_ax3.set_xlabel('days')
f3_ax3.set_ylabel("STATE")
f3_ax3.axvline(np.mean([u.days for u in rst['DURATION_RST2'].tolist()]),
              ls='--', color='r', label='mean')
f3_ax3.legend(loc='best')
#---------------------------------------------------------------------------------------------
mov = movieman[movieman['LOSS_WAGE(Hour)']>0]
f3_ax4 = figp.add_subplot(gs[1, 0])
f3_ax4.barh(mov.index, [u.days for u in mov['DURATION_CLMV2'].tolist()])
f3_ax4.set_title('Duration movie & theaters closures')
f3_ax4.set_xlabel("days")
f3_ax4.set_ylabel("STATE")
f3_ax4.axvline(np.mean([u.days for u in mov['DURATION_CLMV2'].tolist()]),
              ls='--', color='r', label='mean')
f3_ax4.legend(loc='best')
#*******************************************************************************************
drink_bar = barman[barman['LOSS_WAGE(Hour)']>0]
f3_ax5 = figp.add_subplot(gs[1, 1:])
f3_ax5.bar(drink_bar.index, [u.days for u in drink_bar['TOTAL_DURATION_BARS'].tolist()])
f3_ax5.axhline(np.mean([u.days for u in drink_bar['TOTAL_DURATION_BARS'].tolist()]),
               ls='--', color='r', label="mean")
f3_ax5.set_title('Duration bars closures')
f3_ax5.set_ylabel('days')
f3_ax5.set_xlabel("STATE")
f3_ax5.legend(loc='best')
labels_f3_ax5 = f3_ax5.get_xticklabels()
plt.setp(labels_f3_ax5, rotation=45, horizontalalignment='right')
plt.show()

By observing this graph, we see that:

- Michigan closed bars, cinemas and theaters, restaurants and casinos. It took less than 100 days to reopen cinemas and theaters, restaurants and finally casinos, while reopening bars took more than 50 days. Michigan opted to reopen activities gradually.
- Illinois opted to reopen bars, restaurants, casinos, etc. all at once, within 80 days.

Employees in each of these activities suffered considerable wage losses in every state, which weighs heavily on their ability to pay rent, an internet subscription, etc., and can in turn hinder their children's digital learning.
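The wage-loss columns used in this section follow a simple rule: the closure duration in days is converted to weeks and multiplied by 60 hours per week. The sketch below reproduces that rule on made-up closure dates. The state names, the dates and the helper `lost_hours` are hypothetical; the 60-hour week is simply the constant that appears in the notebook's own formula, not an official figure.

```python
import pandas as pd

HOURS_PER_WEEK = 60  # constant taken from the notebook's LOSS_WAGE(Hour) formula


def lost_hours(duration: pd.Timedelta, hours_per_week: float = HOURS_PER_WEEK) -> float:
    """Convert a closure duration into an estimate of lost working hours."""
    return hours_per_week * (duration.days / 7.0)


# Hypothetical closure windows, for illustration only
closures = pd.DataFrame(
    {
        "closed": pd.to_datetime(["2020-03-16", "2020-03-20"]),
        "reopened": pd.to_datetime(["2020-06-24", "2020-05-29"]),
    },
    index=["State A", "State B"],
)
closures["DURATION"] = closures["reopened"] - closures["closed"]
closures["LOSS_WAGE(Hour)"] = closures["DURATION"].apply(lost_hours)
print(closures[["DURATION", "LOSS_WAGE(Hour)"]])
```

With these invented dates, a 70-day closure corresponds to 600 lost hours and a 100-day closure to roughly 857 hours.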

<a id="mora"></a>

## Overall moratoriums on evictions

In [None]:
housing.head()

In [None]:
eviction_moratorium_housing = housing[housing['STATE'].isin(state)] 

In [None]:
overall_evic_moratorium = eviction_moratorium_housing[["STATE", "EMSTART", "EMEND", "EMSTART2","EMEND2",
                                                       "EMSTART3", "EMEND3", "EMNOW"]]

### Moratorium on evictions currently in place and expired

In [None]:
evic_mora_now_state = overall_evic_moratorium[overall_evic_moratorium['EMNOW'] == 1]
evic_mora_expired_state = overall_evic_moratorium[overall_evic_moratorium['EMNOW'] == 0]

### Moratorium on eviction currently in place and student engagement

Here, we study how student engagement evolves in states where a moratorium is currently in place.

In [None]:
evic_mora_now_state.head()

In [None]:
print(f"States where a moratorium on evictions is currently in place:\n{evic_mora_now_state.STATE.unique().tolist()}")

In [None]:
# Select the student engagement series of these states for plotting
#now_mora_dates = [str(u) for u in evic_mora_now_state['EMSTART']]
mora_student_engagement = state_student_engagement[evic_mora_now_state.STATE.unique().tolist()]

In [None]:
mora_student_engagement.index = pd.to_datetime(mora_student_engagement.index, errors='ignore')

In [None]:
figg = plt.figure(figsize=(15,20))
figg.subplots_adjust(hspace=0.5)
for i, u in enumerate(evic_mora_now_state.STATE.unique().tolist()):
    date = evic_mora_now_state[evic_mora_now_state['STATE'] == u]['EMSTART'].values[0]
    day = str(date).split(' ')[0] #take date
    data = mora_student_engagement[u]
    #print(day in data.index)
    ax = figg.add_subplot(9, 1, i+1)
    data.plot(ax=ax, legend=True)
    ax.set_xlabel(' ')
    # ymin/ymax of axvline are axis fractions (0 to 1); the defaults span the full height
    ax.axvline(x=day, linestyle='--', color='red', label="Start moratorium")
    ax.legend(loc='upper left')
    ax.text('2020-05-30', max(data)//2, 'Currently in place',
           bbox=dict(facecolor='yellow', alpha=0.25), fontsize=15, ha='center')
plt.suptitle("Moratorium on eviction and student engagement")
plt.show()
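
The plots mark the moratorium start with a vertical line, but a numeric before/after comparison can make the reading less subjective. Below is a minimal sketch, assuming a daily engagement series indexed by date. The series values and the start date are synthetic, and `mean_before_after` is a hypothetical helper that is not part of the notebook.

```python
import numpy as np
import pandas as pd


def mean_before_after(series: pd.Series, start: str) -> tuple[float, float]:
    """Return mean engagement strictly before `start` and on or after it."""
    start_ts = pd.Timestamp(start)
    before = series[series.index < start_ts].mean()
    after = series[series.index >= start_ts].mean()
    return float(before), float(after)


# Synthetic daily engagement index: 10 days at 100, then 10 days at 150
dates = pd.date_range("2020-03-01", periods=20, freq="D")
engagement = pd.Series(np.r_[np.full(10, 100.0), np.full(10, 150.0)], index=dates)

before, after = mean_before_after(engagement, "2020-03-11")
print(f"before: {before:.1f}, after: {after:.1f}")
```

A difference between the two means only describes the series; on its own it does not show that the moratorium caused the change.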

### Moratorium on eviction expired and student engagement

Here, we study how student engagement evolves in states where the moratorium has expired.

In [None]:
# Drop rows where EMSTART is 0
evic_mora_expired_state = evic_mora_expired_state[evic_mora_expired_state['EMSTART'] != 0]

In [None]:
evic_mora_expired_state.head()

In [None]:
print(f"States where the moratorium on evictions has expired:\n{evic_mora_expired_state.STATE.unique().tolist()}")

In [None]:
exp_mora_student_engagement = state_student_engagement[evic_mora_expired_state.STATE.unique().tolist()]

In [None]:
exp_mora_student_engagement.index = pd.to_datetime(exp_mora_student_engagement.index, errors='ignore')

In [None]:
figg1 = plt.figure(figsize=(15,20))
figg1.subplots_adjust(hspace=0.5)
for i, u in enumerate(['Arizona', 'Florida', 'Indiana', 'Michigan', 'New Hampshire']):
    date1 = evic_mora_expired_state[evic_mora_expired_state['STATE'] == u]['EMSTART'].values[0]
    date2 = evic_mora_expired_state[evic_mora_expired_state['STATE'] == u]['EMEND'].values[0]
    day1 = str(date1).split(' ')[0] #take date
    day2 = str(date2).split(' ')[0]
    data = exp_mora_student_engagement[u]# state engagement index
    #print(day in data.index)
    ax = figg1.add_subplot(5, 1, i+1)
    data.plot(ax=ax, legend=True)
    ax.set_xlabel(' ')
    # ymin/ymax of axvline are axis fractions (0 to 1); the defaults span the full height
    ax.axvline(x=day1, linestyle='--', color='green', label="Start moratorium")
    ax.axvline(x=day2, linestyle='--', color='red', label="End moratorium")
    ax.legend(loc='upper left')
    ax.text('2020-06-30', max(data)//2, 'Expired',
           bbox=dict(facecolor='red', alpha=0.25), fontsize=15, ha='center')
plt.suptitle("Moratorium on eviction and student engagement")
plt.show()

In [None]:
figg1 = plt.figure(figsize=(15,20))
figg1.subplots_adjust(hspace=0.5)
for i, u in enumerate(['Tennessee', 'Texas', 'Utah', 'Virginia', 'Wisconsin']):
    date1 = evic_mora_expired_state[evic_mora_expired_state['STATE'] == u]['EMSTART'].values[0]
    date2 = evic_mora_expired_state[evic_mora_expired_state['STATE'] == u]['EMEND'].values[0]
    day1 = str(date1).split(' ')[0] #take date
    day2 = str(date2).split(' ')[0]
    data = exp_mora_student_engagement[u]# state engagement index
    #print(day in data.index)
    ax = figg1.add_subplot(5, 1, i+1)
    data.plot(ax=ax, legend=True)
    ax.set_xlabel(' ')
    # ymin/ymax of axvline are axis fractions (0 to 1); the defaults span the full height
    ax.axvline(x=day1, linestyle='--', color='green', label="Start moratorium")
    ax.axvline(x=day2, linestyle='--', color='red', label="End moratorium")
    ax.legend(loc='upper left')
    ax.text('2020-04-15', max(data)//2, 'Expired',
           bbox=dict(facecolor='red', alpha=0.25), fontsize=15, ha='center')
plt.suptitle("Moratorium on eviction and student engagement")
plt.show()

We can see that some states show good student engagement under a moratorium on evictions, while others show less.

**N.B.:** The states with good student engagement are those where a moratorium on evictions is currently in place.
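
One way to check this note is to compare average engagement between the two groups of states directly. The sketch below assumes a wide table shaped like `state_student_engagement` (one column per state) and a status table with `STATE` and `EMNOW` columns, as in `overall_evic_moratorium`. All state names and values are invented for illustration.

```python
import pandas as pd

# Invented engagement index per state (columns) over three dates (rows)
engagement = pd.DataFrame(
    {
        "State A": [100.0, 120.0, 140.0],
        "State B": [80.0, 90.0, 100.0],
        "State C": [60.0, 60.0, 60.0],
    },
    index=pd.to_datetime(["2020-09-01", "2020-10-01", "2020-11-01"]),
)

# Invented moratorium status: 1 = currently in place, 0 = expired
status = pd.DataFrame(
    {"STATE": ["State A", "State B", "State C"], "EMNOW": [1, 1, 0]}
)

# Mean engagement per state, then averaged within each moratorium group
state_mean = engagement.mean().rename("mean_engagement")
merged = status.merge(state_mean, left_on="STATE", right_index=True)
group_mean = merged.groupby("EMNOW")["mean_engagement"].mean()
print(group_mean)
```

On the real data, the same group means would indicate whether the visual impression from the plots holds on average, without establishing causation.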

# Conclusion

The main takeaways from this analysis are the following:
- Internet access should be affordable for low-income people.
- Digital learning needs technology that lets basic educational resources (timetable, school program, etc.) circulate between students and teachers.
- People's socio-economic status is a real obstacle to digital learning.
- State moratoriums on evictions have allowed student engagement to improve as the school year progresses.