## LearnPlatform COVID-19 Impact on Digital Learning
#### Use digital learning data to analyze the impact of COVID-19 on student learning

### Table of Contents
1. Business Objective & Problem Statement

2. Importing Required Libraries

3. Data Review and Cleansing:

*     Checking missing/null values
*     Dropping columns that are not useful
*     Data preparation: converting columns to categorical variables
*     Univariate analysis
*     Handling outliers
*     EDA
*     Correlation analysis

Business Objective & Problem Statement:

The COVID-19 pandemic has disrupted learning for more than 56 million students in the United States. In the spring of 2020, most states and local governments across the U.S. closed educational institutions to stop the spread of the virus. In response, schools and teachers have attempted to reach students remotely through distance learning tools and digital platforms. To this day, concerns about the widening digital divide and long-term learning loss among America’s most vulnerable learners continue to grow.

Problem statement:

*    What is the picture of digital connectivity and engagement in 2020?
*    What is the effect of the COVID-19 pandemic on online and distance learning, and how might this also evolve in the future?
*    How does student engagement with different types of education technology change over the course of the pandemic?
*    How does student engagement with online learning platforms relate to different geography? Demographic context (e.g., race/ethnicity, ESL, learning disability)? Learning context? Socioeconomic status?
*    Do certain state interventions, practices or policies (e.g., stimulus, reopening, eviction moratorium) correlate with increases or decreases in online engagement?


In [None]:
#import packages
import numpy as np
import pandas as pd 
#import matplotlib as mpl
import h2o
from h2o.automl import H2OAutoML
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
districts_df=pd.read_csv("../input/learnplatform-covid19-impact-on-digital-learning/districts_info.csv")

In [None]:
districts_df.head()

In [None]:
print("Column names",districts_df.columns)
print("Dataset shape",districts_df.shape)

In [None]:
#details of numeric data
districts_df.describe()

In [None]:
#check the Null value 
round(districts_df.isnull().sum()/len(districts_df)*100,2)

In [None]:
districts_df[districts_df.isna().any(axis=1)]
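One possible way to act on the null check above is sketched below. It drops columns that are mostly empty and then drops rows that lack key fields. The toy frame, the helper name `drop_sparse`, the 70% threshold and the choice of `state` and `locale` as key columns are all assumptions made for illustration, not part of the original analysis.

```python
import numpy as np
import pandas as pd

# Toy stand-in for districts_df; the values are made up for illustration.
toy_districts = pd.DataFrame({
    "district_id": [1, 2, 3, 4],
    "state": ["Utah", np.nan, "Connecticut", "Utah"],
    "locale": ["Suburb", np.nan, "City", "Rural"],
    "pp_total_raw": [np.nan, np.nan, "[10000, 12000[", np.nan],
})

def drop_sparse(df, key_cols, max_null_pct=70.0):
    """Drop columns that are mostly null, then rows missing any key column.

    max_null_pct is an assumed threshold, not a value from the notebook.
    """
    null_pct = df.isnull().mean() * 100
    sparse_cols = [c for c in null_pct[null_pct > max_null_pct].index
                   if c not in key_cols]
    trimmed = df.drop(columns=sparse_cols)
    return trimmed.dropna(subset=key_cols).reset_index(drop=True)

clean_districts = drop_sparse(toy_districts, key_cols=["state", "locale"])
print(clean_districts)
```

The same helper could be called on `districts_df` once a threshold and a set of key columns have been chosen for the real data.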

In [None]:
# Univariate analysis

In [None]:
#count plots for the categorical columns
cat_col=['state', 'locale', 'pct_black/hispanic','pct_free/reduced', 'county_connections_ratio', 'pp_total_raw']
plt.figure(figsize=(15, 15))
sns.set(style="darkgrid")
i=1
for col in cat_col:
    plt.subplot(4,2,i)
    sns.countplot(data=districts_df, x=col)
    i=i+1
    plt.xticks(rotation=90)
plt.subplots_adjust(left=0.1, bottom=0.1, right=0.9,  top=1.1,  wspace=0.4, hspace=0.9)
plt.show()

#### Observations:
1. Most school districts are from Connecticut and Utah.
2. Most school districts are in suburban areas.
3. Most districts fall in the 0-40% range for students eligible for free or reduced-price lunch.
4. A large number of districts have high-speed connections.
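The count plots are easier to read when the range columns are sorted by value rather than by order of appearance. The sketch below converts such a column to an ordered categorical. It assumes the ranges are stored as strings like `[0.2, 0.4[`. The toy data and the helper name `to_ordered_ranges` are hypothetical.

```python
import pandas as pd

# Toy column; the string format of the ranges is an assumption.
toy = pd.DataFrame({
    "pct_free/reduced": ["[0.4, 0.6[", "[0, 0.2[", None, "[0.2, 0.4[", "[0, 0.2["]
})

def to_ordered_ranges(series):
    """Convert range strings such as '[0.2, 0.4[' to an ordered categorical."""
    labels = series.dropna().unique()
    # Sort labels by their numeric lower bound (the text between '[' and ',').
    ordered = sorted(labels, key=lambda s: float(s.strip("[").split(",")[0]))
    return pd.Categorical(series, categories=ordered, ordered=True)

toy["pct_free/reduced"] = to_ordered_ranges(toy["pct_free/reduced"])
print(toy["pct_free/reduced"].cat.categories.tolist())
```

With a categorical dtype, `sns.countplot` should draw the bars in category order, so the ranges appear from lowest to highest.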


#### **products_info.csv file**

In [None]:
products_df = pd.read_csv("../input/learnplatform-covid19-impact-on-digital-learning/products_info.csv")
products_df.head()

In [None]:
print("Column names",products_df.columns)
print("Dataset shape",products_df.shape)

In [None]:
#details of numeric data
products_df.describe()

In [None]:
#check the Null value 
round(products_df.isnull().sum()/len(products_df)*100,2)

In [None]:
#count plots for the categorical columns
cat_col=[ 'Sector(s)','Primary Essential Function']
plt.figure(figsize=(10, 10))
sns.set(style="darkgrid")
i=1
for col in cat_col:
    plt.subplot(2,1,i)
    sns.countplot(data=products_df, x=col)
    i=i+1
    plt.xticks(rotation=90)
plt.subplots_adjust(left=0.1, bottom=0.1, right=0.9,  top=1.1,  wspace=0.4, hspace=0.9)
plt.show()

In [None]:
from wordcloud import WordCloud, STOPWORDS 
cloud = WordCloud(width=1440, height=1080).generate(" ".join(products_df['Product Name'].astype(str)))
plt.figure(figsize=(15, 10))
plt.imshow(cloud,interpolation='bilinear')
plt.axis('off')
plt.show()

#### Engagement data

In [None]:
engagement_df = pd.read_csv("/kaggle/input/learnplatform-covid19-impact-on-digital-learning/engagement_data/5802.csv", low_memory=False)
engagement_df.head()
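To see how engagement changes over the course of the pandemic, the daily records can be averaged by week. The sketch below uses a small made-up frame. The `time` and `lp_id` column names are assumptions about the engagement file layout, while `pct_access` and `engagement_index` are the columns plotted in this notebook.

```python
import pandas as pd

# Toy stand-in for engagement_df; all values are made up for illustration.
toy_engagement = pd.DataFrame({
    "time": ["2020-01-06", "2020-01-07", "2020-01-13", "2020-01-14"],
    "lp_id": [1, 1, 1, 1],
    "pct_access": [0.5, 1.5, 2.0, 4.0],
    "engagement_index": [10.0, 30.0, 50.0, 70.0],
})

# Parse the date strings so the frame can be resampled by calendar week.
toy_engagement["time"] = pd.to_datetime(toy_engagement["time"])

# Weekly mean of both metrics; "W" bins end on Sunday by default.
weekly = (
    toy_engagement.set_index("time")[["pct_access", "engagement_index"]]
    .resample("W")
    .mean()
)
print(weekly)
```

A line plot of `weekly` would then show the trend across the year for a single district file.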

In [None]:
#boxplots
cat_col=[ 'pct_access','engagement_index']
plt.figure(figsize=(10, 10))
sns.set(style="darkgrid")
i=1
for col in cat_col:
    plt.subplot(2,1,i)
    sns.boxplot(data=engagement_df, x=col)
    i=i+1
    plt.xticks(rotation=90)
plt.subplots_adjust(left=0.1, bottom=0.1, right=0.9,  top=1.1,  wspace=0.4, hspace=0.9)
plt.show()
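The box plots above flag points beyond the whiskers as outliers. A minimal sketch of one common way to handle them, the 1.5 × IQR rule, is given below on toy data. The helper name `iqr_bounds` and the choice to cap values rather than drop them are assumptions, not a decision taken in this notebook.

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return (low, high) fences at k * IQR beyond the quartiles."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Toy engagement values with one obvious outlier; made up for illustration.
toy_index = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 500.0], name="engagement_index")

low, high = iqr_bounds(toy_index)

# Rows outside the fences are the candidate outliers.
outliers = toy_index[(toy_index < low) | (toy_index > high)]

# Capping keeps every row but pulls extreme values back to the fences.
capped = toy_index.clip(lower=low, upper=high)
print(outliers.tolist(), capped.max())
```

Engagement data is typically right-skewed, so it may be worth checking how many rows the rule flags before deciding between capping, dropping, or a log transform.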

Work in progress....
