<div class="alert alert-block alert-info">  
    <h3><strong>📖 LearnPlatform COVID-19 Impact on Digital Learning 😷</strong></h3>
    <h4><strong>Use digital learning data to analyze the impact of COVID-19 on student learning</strong></h4>
</div>

<center><img src="https://www.un.org/sites/un2.un.org/files/field/image/1597430311.7857.jpg" alt="covid_impact on digital learning" width="500" height="600"></center>

<div class="alert alert-block alert-success">  
 
<hr>
<b>Source of Data: </b> 
<hr> 
 <a href="https://www.kaggle.com/c/learnplatform-covid19-impact-on-digital-learning/data">https://www.kaggle.com/c/learnplatform-covid19-impact-on-digital-learning/data</a>
   
</div>

<div class="alert alert-block alert-info">  
 
<hr>
<b>Data Description: </b> 
<hr> 
    <p>We include three basic sets of files to help you get started. The engagement data are based on LearnPlatform’s Student Chrome Extension. The extension collects page load events of over 10K education technology products in our product library, including websites, apps, web apps, software programs, extensions, ebooks, hardware, and services used in educational institutions. The engagement data have been aggregated at the school district level, and each file represents data from one school district. The product file includes information about the characteristics of the top 372 products with the most users in 2020. The district file includes information about the characteristics of school districts, including data from the National Center for Education Statistics (NCES), the Federal Communications Commission (FCC), and Edunomics Lab.</p>
 <a href="https://www.kaggle.com/c/learnplatform-covid19-impact-on-digital-learning/data">https://www.kaggle.com/c/learnplatform-covid19-impact-on-digital-learning/data</a>
   
</div>

<div class="alert alert-block alert-info">  
 
<hr>
<b>Objectives: </b> 
<hr> 
    <ul>
<li> Uncover trends in digital learning. </li>
<li> Visualize the trends of digital connectivity and engagement in 2020. </li>
<li> Understand and measure the scope and impact of the pandemic on digital learning. </li>
<li> How does student engagement with different types of education technology change over the course of the pandemic? </li>
<li> How does student engagement with online learning platforms relate to geography? To demographic context (e.g., race/ethnicity, ESL, learning disability)? To learning context? To socioeconomic status?</li>
<li> Do certain state interventions, practices, or policies (e.g., stimulus, reopening, eviction moratorium) correlate with an increase or decrease in online engagement?</li>
    </ul>
   
</div>

## Importing Python Libraries  📘
- Libraries provide the functions we call to perform different actions on our data and to train models.
- Loading the required libraries is the first step of any task.

In [None]:
import pandas as pd
import numpy as np  
import seaborn as sns 
pal = sns.color_palette()
from wordcloud import WordCloud
import plotly.express as px
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn import preprocessing
import glob
import plotly.offline as py
import plotly.graph_objs as go
import plotly.tools as tls
from keras.models import Sequential
from tensorflow.keras.layers import Activation, Dense, Dropout

#for geospatial analysis
import math
import folium
from geopy.geocoders import Nominatim
from folium import Choropleth, Circle, Marker
from folium.plugins import HeatMap, MarkerCluster

<h4>How can we install libraries in Python?</h4>

<h4>Installing a Python library is easy:</h4>
- pip install name_of_library 
<h5>For example, to install tensorflow:</h5>
- pip install tensorflow
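
Before running the notebook, it can help to check which libraries are already installed. The sketch below uses only the standard library; `missing_packages` is a hypothetical helper name, and the list holds the import names used in this notebook (note that the import name `sklearn` is installed with `pip install scikit-learn`).

```python
import importlib.util

# Top-level import names used in this notebook's import cell.
required = ["pandas", "numpy", "seaborn", "wordcloud", "plotly", "matplotlib",
            "sklearn", "keras", "tensorflow", "folium", "geopy"]

def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(required)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required libraries are available.")
```

Anything reported as missing can then be installed with `pip install`.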

## Loading the data 📁 

In [None]:
districts_data=pd.read_csv("../input/learnplatform-covid19-impact-on-digital-learning/districts_info.csv")

## Exploratory data analysis 🔎 📊

#### First five records of data

In [None]:
districts_data.head()

#### Last five records of data

In [None]:
districts_data.tail()

#### Columns/features in data

In [None]:
districts_data.columns

#### Length of data

In [None]:
print('length of data is', len(districts_data))

#### Shape of data

In [None]:
districts_data.shape

#### Data information

In [None]:
districts_data.info()

#### Data types of all columns

In [None]:
districts_data.dtypes

#### Checking Null values / missing values

In [None]:
np.sum(districts_data.isnull().any(axis=1))

#### Rows and columns in the dataset

In [None]:
print('Count of columns in the data is:  ', len(districts_data.columns))

In [None]:
print('Count of rows in the data is:  ', len(districts_data))

### Deleting the duplicate rows

In [None]:
current=len(districts_data)
print('Rows of data before deleting ', current)

In [None]:
districts_data=districts_data.drop_duplicates()

In [None]:
now=len(districts_data)
print('Rows of data after deleting ', now)

In [None]:
diff=current-now
print('Duplicated rows deleted ', diff)
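
The same before/after bookkeeping can be checked on a small example. This is a minimal sketch on a made-up frame (`toy` stands in for `districts_data`; its values are illustrative only) showing that `duplicated().sum()` gives the number of rows `drop_duplicates()` will remove.

```python
import pandas as pd

# Toy frame standing in for districts_data (values are made up for illustration).
toy = pd.DataFrame({
    "district_id": [1, 2, 2, 3],
    "state": ["Utah", "Ohio", "Ohio", "Utah"],
})

n_duplicates = int(toy.duplicated().sum())  # rows identical to an earlier row
deduplicated = toy.drop_duplicates()

print("Duplicated rows:", n_duplicates)
print("Rows after dropping:", len(deduplicated))
```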

## Data Visualization 📝
###  Univariate Analysis 
Distribution of state

In [None]:
plt.figure(figsize=(12,10))
sns.countplot(x=districts_data.state)  # pass the column as x; recent seaborn versions require keyword arguments
plt.xticks(rotation=90)

In [None]:
districts_data["state"].value_counts().head(10).plot(kind = 'pie', autopct='%1.1f%%', figsize=(10, 10), startangle=0).legend()

Distribution of locale

In [None]:
plt.figure(figsize=(12,10))
sns.countplot(x=districts_data.locale)  # pass the column as x; recent seaborn versions require keyword arguments
plt.xticks(rotation=90)

In [None]:
districts_data["locale"].value_counts().head(10).plot(kind = 'pie', autopct='%1.1f%%', figsize=(10, 10), startangle=0).legend()

Distribution of pct_black/hispanic

In [None]:
sns.countplot(data= districts_data, x = "pct_black/hispanic")
plt.show()

pct_free/reduced

In [None]:
sns.countplot(data= districts_data, x = "pct_free/reduced")
plt.show()

county_connections_ratio

In [None]:
sns.countplot(data= districts_data, x = "county_connections_ratio")
plt.show()

In [None]:
plt.figure(figsize=(12,10))
sns.countplot(x=districts_data.pp_total_raw)  # pass the column as x; recent seaborn versions require keyword arguments
plt.xticks(rotation=90)

## Loading the Products data

In [None]:
products_data = pd.read_csv("../input/learnplatform-covid19-impact-on-digital-learning/products_info.csv")
products_data

In [None]:
plt.figure(figsize=(16, 10))
sns.countplot(y='Provider/Company Name', data=products_data, order=products_data["Provider/Company Name"].value_counts().index[:10])
plt.title("Top 10 Provider/Company Names",font="Serif", size=20)
plt.show()

In [None]:
c1=c2=c3=0
for s in products_data["Sector(s)"]:
    if(not pd.isnull(s)):
        s = s.split(";")
        for i in range(len(s)):
            sub = s[i].strip()
            if(sub == 'PreK-12'): c1+=1
            if(sub == 'Higher Ed'): c2+=1
            if(sub == 'Corporate'): c3+=1

fig, ax  = plt.subplots(figsize=(16, 8))
fig.suptitle('Sector Distribution', size = 30, font="Serif")
explode = (0.05, 0.05, 0.05)
labels = ['PreK-12','Higher Ed','Corporate']
sizes = [c1,c2, c3]
ax.pie(sizes, explode=explode,startangle=60, labels=labels,autopct='%1.2f%%', pctdistance=0.7, colors=["#ff228a","#20b1fd","#ffb703"])
ax.add_artist(plt.Circle((0,0),0.4,fc='white'))
plt.show()
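
The sector counts computed in the loop above can also be obtained with pandas string methods. The following is a sketch on a made-up Series (`sectors` stands in for `products_data["Sector(s)"]`); it splits on `;`, puts each sector on its own row, and counts the occurrences.

```python
import pandas as pd

# Toy stand-in for products_data["Sector(s)"]; the values are illustrative only.
sectors = pd.Series(["PreK-12", "PreK-12; Higher Ed", None,
                     "PreK-12; Higher Ed; Corporate"])

sector_counts = (
    sectors.dropna()      # skip products without a sector
    .str.split(";")       # "A; B" -> ["A", " B"]
    .explode()            # one row per sector
    .str.strip()          # remove the surrounding spaces
    .value_counts()
)
print(sector_counts)
```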

In [None]:
primary_essential_main = []
primary_essential_sub = []
for s in products_data["Primary Essential Function"]:
    if(not pd.isnull(s)):
        s1 = s.split("-",1)[0].strip()
        primary_essential_main.append(s1)
    else:
        primary_essential_main.append(np.nan)
    
    if(not pd.isnull(s)):
        s2 = s.split("-",1)[1].strip()
        primary_essential_sub.append(s2)
    else:
        primary_essential_sub.append(np.nan)

products_data["primary_essential_main"] = primary_essential_main
products_data["primary_essential_sub"] = primary_essential_sub
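
As an alternative to the explicit loop, the main and sub categories can be split in one step with `str.split(..., n=1, expand=True)`. This is a minimal sketch on a made-up frame (`toy` and its rows are illustrative), not a replacement for the cell above.

```python
import pandas as pd

# Illustrative rows only; the real column is products_data["Primary Essential Function"].
toy = pd.DataFrame({"Primary Essential Function": [
    "LC - Digital Learning Platforms",
    "CM - Classroom Engagement & Instruction - Classroom Management",
    None,
]})

# Split on the first "-" only, so any later dashes stay in the sub category.
parts = toy["Primary Essential Function"].str.split("-", n=1, expand=True)
toy["primary_essential_main"] = parts[0].str.strip()
toy["primary_essential_sub"] = parts[1].str.strip()
print(toy[["primary_essential_main", "primary_essential_sub"]])
```

Missing values stay missing in both new columns, which matches the behaviour of the loop.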

In [None]:
c1=c2=c3=0

for s in products_data["primary_essential_main"]:
    if(not pd.isnull(s)):
        c1 += s.count("CM")
        c2 += s.count("LC")
        c3 += s.count("SDO")

fig, ax  = plt.subplots(figsize=(16, 8))
fig.suptitle('Primary Essential Function', size = 20, font="Serif")
explode = (0.05, 0.05, 0.05)
labels = ['CM','LC','SDO']
sizes = [c1, c2, c3]
ax.pie(sizes, explode=explode,startangle=60, labels=labels,autopct='%1.2f%%', pctdistance=0.7, colors=["#18ff9f","#2cfbff","#ffb703"])
ax.add_artist(plt.Circle((0,0),0.4,fc='white'))
plt.show()

In [None]:
plt.figure(figsize=(16, 20))
sns.countplot(y='primary_essential_sub', data=products_data, order=products_data["primary_essential_sub"].value_counts().index)
plt.title("Primary Essential Function(Sub)",font="Serif", size=20)
plt.show()

### Distribution of Sector(s) in the Product Information Data

In [None]:
ds = products_data['Sector(s)'].value_counts().reset_index()
ds.columns = [
    'Sector(s)', 
    'percent'
]
ds['percent'] /= len(products_data)

fig = px.pie(
    ds, 
    names='Sector(s)', 
    values='percent',
    color_discrete_sequence=px.colors.sequential.Mint,
    title='Distribution of Sector(s) in the Product Information Data:', 
    width=700,
    height=500
)
fig.show()

### Word cloud of district states

In [None]:
cloud = WordCloud(width=1440, height=1080).generate(" ".join(districts_data['state'].astype(str)))
plt.figure(figsize=(15, 10))
plt.imshow(cloud)
plt.axis('off')

### Occurrence of states in the District Information Data

In [None]:
ds = districts_data['state'].value_counts().reset_index()
ds.columns = [
    'state', 
    'percent'
]
ds['percent'] /= len(districts_data)

fig = px.pie(
    ds, 
    names='state', 
    values='percent',
    color_discrete_sequence=px.colors.sequential.Mint,
    title='Occurrence of states in the District Information Data:', 
    width=700,
    height=500
)
fig.show()

### Occurrence of Locale in the District Information Data

In [None]:
ds = districts_data['locale'].value_counts().reset_index()
ds.columns = [
    'locale', 
    'percent'
]
ds['percent'] /= len(districts_data)

fig = px.pie(
    ds, 
    names='locale', 
    values='percent',
    color_discrete_sequence=px.colors.sequential.Mint,
    title='Occurrence of Locale in the District Information Data:', 
    width=700,
    height=500
)
fig.show()

# Loading All Engagement Files

In [None]:
address = glob.glob('../input/learnplatform-covid19-impact-on-digital-learning/engagement_data/*.csv')
# Read at most 233 district files and concatenate them in a single call
CSV_files = pd.concat([pd.read_csv(f) for f in address[:233]], axis=0)
CSV_files

### Numeric features distribution

In [None]:
CSV_files.hist(figsize=(20,20),bins = 20, color="#107009AA")
plt.title("Numeric Features Distribution")
plt.show()

### Bivariate Analysis


In [None]:
colormap = plt.cm.RdBu
plt.figure(figsize=(14,12))
plt.title('Pearson Correlation of Features', y=1.05, size=15)
# Correlate the numeric columns only (the 'time' column is text)
sns.heatmap(CSV_files.select_dtypes(include='number').corr(),linewidths=0.1,vmax=1.0, 
            square=True, cmap=colormap, linecolor='white', annot=True)

### Missing value Treatment



In [None]:
CSV_files.isnull().sum()

### Let's calculate the total and percentage of missing values in each column

In [None]:
data_total = CSV_files.isnull().sum()
data_percent = ((CSV_files.isnull().sum()/CSV_files.shape[0])*100).round(2)
missing_data = pd.concat([data_total, data_percent],
                                axis=1, 
                                keys=['Data_Total', 'Data_Percent %'],
                                sort = True)
missing_data.style.bar(color = ['gold'])

## Geospatial Analysis
- For the geospatial analysis, we first merge the files and then analyze states against pct_access and engagement_index.
### Merging files 📁

In [None]:
path = '../input/learnplatform-covid19-impact-on-digital-learning/engagement_data' 
all_files = glob.glob(path + "/*.csv")

li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    district_id = filename.split("/")[-1].split(".")[0]  # file name without the extension
    df["district_id"] = district_id
    li.append(df)
    
engagement_data = pd.concat(li)
engagement_data = engagement_data.reset_index(drop=True)
engagement_data.head()
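
The merging step can be tried out without the Kaggle files. The sketch below writes two tiny made-up CSV files to a temporary folder and loads them the same way; `load_engagement` is a hypothetical helper, and it assumes, as in this dataset, that each file name is the numeric district id.

```python
import tempfile
from pathlib import Path

import pandas as pd

def load_engagement(folder):
    """Read every CSV in `folder` and tag its rows with the file stem as district_id."""
    frames = []
    for csv_path in sorted(Path(folder).glob("*.csv")):
        frame = pd.read_csv(csv_path)
        frame["district_id"] = int(csv_path.stem)  # "1000.csv" -> 1000
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)

# Demo on two tiny made-up files written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("1000", "2000"):
        pd.DataFrame({"lp_id": [1, 2], "pct_access": [0.1, 0.2]}).to_csv(
            Path(tmp) / f"{name}.csv", index=False)
    demo = load_engagement(tmp)

print(demo)
```

Using `Path.stem` avoids relying on the position of the file name within the path.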

In [None]:
engagement_data['time'] = pd.to_datetime(engagement_data['time'])

In [None]:
print(products_data["LP ID"].nunique())
print(engagement_data["lp_id"].nunique())

In [None]:
products_engagement_data = pd.merge(products_data, engagement_data, left_on='LP ID', right_on='lp_id')
products_engagement_data.head()

In [None]:
print(districts_data["district_id"].nunique())
print(engagement_data["district_id"].nunique())

In [None]:
engagement_data["district_id"] = engagement_data["district_id"].astype(str).astype(int)
districts_engagement_data = pd.merge(districts_data, engagement_data, left_on='district_id', right_on='district_id')
districts_engagement_data.head()
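
The cast to `int` matters because ids taken from file names are strings, while `districts_info.csv` stores them as integers. This is a small sketch on made-up frames (`districts` and `engagement` are illustrative); it also uses the `indicator` option of `pd.merge` to show which districts have no engagement rows.

```python
import pandas as pd

# Made-up frames: district ids are integers on one side and strings on the other.
districts = pd.DataFrame({"district_id": [1000, 2000], "state": ["Utah", "Ohio"]})
engagement = pd.DataFrame({"district_id": ["1000", "1000", "3000"],
                           "pct_access": [0.5, 0.7, 0.1]})

# Align the key dtype before merging.
engagement["district_id"] = engagement["district_id"].astype(int)

merged = pd.merge(districts, engagement, on="district_id",
                  how="left", indicator=True)
print(merged)
```

Rows marked `left_only` in the `_merge` column are districts without any engagement data.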

### Let's start Geospatial analysis

In [None]:
geolocator = Nominatim(user_agent="Ruch")

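def feature_generation(df):
    lat=[]
    long=[]
    for i in df['state']: 
        try:
            # geocode() may return None or raise (e.g. on a timeout)
            location = geolocator.geocode(i)
            lat.append(location.latitude)
            long.append(location.longitude)
        except Exception:
            # store NaN so that the math.isnan() checks in the map cells work
            lat.append(np.nan)
            long.append(np.nan)
    df['Latitude'] = lat
    df['Longitude'] = long
    
    return df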

def map_df(df,col1,col2):
    df = pd.DataFrame(df[[col1,col2]]\
            .groupby([col1])[col2].mean()\
            .sort_values(ascending=False)[:20]).reset_index()
    df = feature_generation(df)
    
    return df

state_access = map_df(districts_engagement_data, "state", "pct_access")
state_engagement_index = map_df(districts_engagement_data, "state", "engagement_index")

### Base Map 🌎

In [None]:
# 'Stamen Watercolor' tiles are not bundled with recent folium releases, so OpenStreetMap is used here
north_america_map = folium.Map(location=[38.9, -77.05], tiles='OpenStreetMap', zoom_start=3)
north_america_map

### State and Percentage Access (Top 20) 🌎

In [None]:
mc = MarkerCluster()
for idx, row in state_access.iterrows():
    if not math.isnan(row['Longitude']) and not math.isnan(row['Latitude']):
        popup = """
        State : <b>%s</b><br>
        Percentage Access : <b>%s</b><br>
        """ % (row['state'], row['pct_access'])
        mc.add_child(Marker([row['Latitude'], row['Longitude']],tooltip=popup))
# add the cluster to the map once, after all markers have been collected
north_america_map.add_child(mc)
north_america_map

### State and Engagement Index (Top 20) 🌎

In [None]:
mc = MarkerCluster()
for idx, row in state_engagement_index.iterrows():
    if not math.isnan(row['Longitude']) and not math.isnan(row['Latitude']):
        popup = """
        State : <b>%s</b><br>
        Engagement Index : <b>%s</b><br>
        """ % (row['state'], row['engagement_index'])
        mc.add_child(Marker([row['Latitude'], row['Longitude']],tooltip=popup))
# add the cluster to the map once, after all markers have been collected
north_america_map.add_child(mc)
north_america_map

Summary:
- The product file ```products_info.csv``` includes information about the characteristics of the top 372 products with the most users in 2020. The categories listed in this file are part of LearnPlatform's product taxonomy. 
- The district file ```districts_info.csv``` includes information about the **characteristics of school districts**, including data from 
>- NCES (2018-19), 
>- FCC (Dec 2018), and 
>- Edunomics Lab. 
- The engagement data are aggregated at the school district level, and each file in the folder ```engagement_data``` represents data from **one school district**.
- We first analyzed ```districts_info.csv```, then ```products_info.csv```, and finally performed the geospatial analysis on the merged data.
- I'm humbled and thankful to everyone who shared their code on Kaggle; I learned a great deal from their notebooks.

<p><center> <h3>More code is coming soon. Please upvote; you will get notifications when additions are made.</h3></center> </p>

<center><img src="https://cdn.dribbble.com/users/32897/screenshots/3564812/1.gif" width= 600px length=600px></center>