<img src="https://raw.githubusercontent.com/heyrobin/Covid-Digital-Learning/main/Untitled-1.jpg">

<div style="display:fill;
            border-radius: False;
            border-style: solid;
            border-color:#527B91;
            border-style: false;
            border-width: 3px;
            color:#121212;
            font-size:15px;
            font-family: Georgia;
            background-color:'';
            text-align:center;
            letter-spacing:0.5px;
            padding: 0.7em;
            text-align:left">
    
**About Competition**

We challenge the Kaggle community to explore (1) the state of digital learning in 2020 and (2) how the engagement of digital learning relates to factors such as district demographics, broadband access, and state/national level policies and events.

We encourage you to guide the analysis with questions that are related to the themes that are described above (in bold font). Below are some examples of questions that relate to our problem statement:

* What is the picture of digital connectivity and engagement in 2020?
* What is the effect of the COVID-19 pandemic on online and distance learning, and how might this also evolve in the future?
* How does student engagement with different types of education technology change over the course of the pandemic?
* How does student engagement with online learning platforms relate to different geography? Demographic context (e.g., race/ethnicity, ESL, learning disability)? Learning context? Socioeconomic status?
* Do certain state interventions, practices or policies (e.g., stimulus, reopening, eviction moratorium) correlate with an increase or decrease in online engagement?


<h3>Data Definition</h3>

<table class="Data_Definition">
  <thead>
    <tr>
      <th class="header">Name</th>
      <th class="header">Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>time</td>
      <td>date in "YYYY-MM-DD"</td>
    </tr>
    <tr>
      <td>lp_id</td>
      <td>The unique identifier of the product</td>
    </tr>
    <tr>
      <td>pct_access</td>
      <td>Percentage of students in the district who have at least one page-load event of a given product on a given day</td>
    </tr>
    <tr>
      <td>engagement_index</td>
      <td>Total page-load events per one thousand students of a given product and on a given day</td>
    </tr>
    <tr>
      <td>district_id </td>
      <td>The unique identifier of the school district</td>
    </tr>
    <tr>
      <td>state</td>
      <td>The state where the district resides</td>
    </tr>
    <tr>
      <td>locale</td>
      <td>NCES locale classification that categorizes U.S. territory into four types of areas: City, Suburban, Town, and Rural. See Locale Boundaries User's Manual for more information.</td>
    </tr>
    <tr>
      <td>pct_black/hispanic</td>
      <td>Percentage of students in the districts identified as Black or Hispanic based on 2018-19 NCES data</td>
    </tr>
    <tr>
      <td>pct_free/reduced</td>
      <td>Percentage of students in the districts eligible for free or reduced-price lunch based on 2018-19 NCES data</td>
    </tr>
    <tr>
      <td>county_connections_ratio</td>
      <td>Ratio (residential fixed high-speed connections over 200 kbps in at least one direction/households) based on the county-level data from FCC Form 477 (December 2018 version). See FCC data for more information.</td>
    </tr>
    <tr>
      <td>pp_total_raw</td>
      <td>Per-pupil total expenditure</td>
    </tr>
    <tr>
      <td>LP ID</td>
      <td>The unique identifier of the product</td>
    </tr>
    <tr>
      <td>URL</td>
      <td>Web link to the specific product</td>
    </tr>
    <tr>
      <td>Product Name</td>
      <td>Name of the specific product</td>
    </tr>
    <tr>
      <td>Provider/Company Name</td>
      <td>Name of the product provider</td>
    </tr>
    <tr>
      <td>Sector(s)</td>
      <td>Sector of education where the product is used</td>
    </tr>
    <tr>
      <td>Primary Essential Function</td>
      <td>The basic function of the product. There are two layers of labels here. Products are first labeled as one of these three categories: LC = Learning & Curriculum, CM = Classroom Management, and SDO = School & District Operations. Each of these categories has multiple sub-categories with which the products were labeled</td>
    </tr>
  </tbody>
</table>
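
The two engagement measures in the table are simple ratios. The sketch below works them through for a made-up district; the function names and the numbers are hypothetical, and it assumes `pct_access` is expressed on a 0 to 100 scale.

```python
def pct_access(students_with_page_load: int, total_students: int) -> float:
    """Share of students with at least one page-load event, in percent."""
    return 100.0 * students_with_page_load / total_students


def engagement_index(page_loads: int, total_students: int) -> float:
    """Total page-load events per one thousand students."""
    return 1000.0 * page_loads / total_students


# Hypothetical district: 2,000 students, 150 of whom opened the product
# at least once that day, producing 4,200 page loads in total.
print(pct_access(150, 2000))
print(engagement_index(4200, 2000))
```

A product can therefore have a low `pct_access` and a high `engagement_index` at the same time if a small group of students uses it heavily.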

# **<center><span style='font-family:Georgia'> <span style='background:skyblue'> 📕 Importing Libraries and DataSet </span>**

<p><strong><span style="color:Black;"> <span style="font-size:135%">LIBRARIES</span>

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import glob # for assembling multiple csvs
import missingno as msno

#for visualization
import seaborn as sns 
import matplotlib.pyplot as plt
import matplotlib as mpl
from wordcloud import WordCloud, STOPWORDS

#silence FutureWarning messages only; other warnings and errors are still shown
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

In [None]:
colors = ['#90C6E8','#F1887D','#DEA060','#A8A3C7','#527B91']
sns.palplot(sns.color_palette(colors))

In [None]:
plt.style.use('fivethirtyeight')

plt.rcParams['font.sans-serif'] = 'Arial'
plt.rcParams['font.family'] = 'sans-serif'
plt.rcParams['text.color'] = '#000000'
plt.rcParams['axes.labelcolor']= '#000000'
plt.rcParams['xtick.color'] = '#000000'
plt.rcParams['ytick.color'] = '#000000'
plt.rcParams['font.size']=12

<p><strong><span style="color:Black;"> <span style="font-size:135%">DATASETS</span>

In [None]:
# importing dataset
districts_info = pd.read_csv("../input/learnplatform-covid19-impact-on-digital-learning/districts_info.csv")
products_info = pd.read_csv("../input/learnplatform-covid19-impact-on-digital-learning/products_info.csv")


# importing all the csv from engagement folder
folder = glob.glob("../input/learnplatform-covid19-impact-on-digital-learning/engagement_data/*.csv")

merged = []

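for CSV in folder:
    df = pd.read_csv(CSV, index_col = None, header = 0)
    # the file name without its extension is the district id (".../engagement_data/<id>.csv" -> "<id>")
    district_id = CSV.rsplit("/", 1)[-1].split(".")[0]
    df["district_id"] = district_id
    merged.append(df)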
    
engagement_info = pd.concat(merged)
engagement_info = engagement_info.reset_index(drop=True)

<blockquote><p><strong> 📌 Notes :</strong></p>
<ul>
<li>The three datasets are now loaded. The next cell prints the shape of each dataframe and displays its first five rows.</li>
</ul>
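
The loading loop above can be tried without the Kaggle input folder. The sketch below is a minimal, self-contained version that writes two tiny made-up files to a temporary directory and reads them back. The sample values and file names are hypothetical. Converting the id to `int` is an assumption that every file name is purely numeric; the notebook itself keeps the id as a string.

```python
import glob
import os
import tempfile

import pandas as pd

# Build a throwaway folder with two tiny, made-up engagement files.
tmp_dir = tempfile.mkdtemp()
sample = pd.DataFrame({
    "time": ["2020-01-01", "2020-01-02"],
    "lp_id": [1, 2],
    "pct_access": [0.5, 1.5],
    "engagement_index": [10.0, 20.0],
})
for name in ("1000.csv", "1039.csv"):
    sample.to_csv(os.path.join(tmp_dir, name), index=False)

frames = []
for path in sorted(glob.glob(os.path.join(tmp_dir, "*.csv"))):
    df = pd.read_csv(path)
    # File name without directory or extension, e.g. "1000".
    stem = os.path.splitext(os.path.basename(path))[0]
    # Assumption: file names are numeric district ids.
    df["district_id"] = int(stem)
    frames.append(df)

# ignore_index=True gives a fresh 0..n-1 index, like reset_index(drop=True).
engagement = pd.concat(frames, ignore_index=True)
print(engagement)
```

Using `os.path.basename` instead of a fixed position in the split path keeps the id extraction working if the input folder is moved.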


In [None]:
df_name = ['districts_info','products_info','engagement_info']
df_list = [districts_info,products_info,engagement_info]
for i in range(3):
    print('****'*10)
    print(f'Dataframe {df_name[i]} has {df_list[i].shape[0]} Rows and {df_list[i].shape[1]} Columns')
    print('****'*10)
    display(df_list[i].head(5).style.set_properties(**{'background-color': colors[1],'color': 'White','border': '1.5px  solid black'}))

<p><strong><span style="color:Black;"> <span style="font-size:135%">MISSING VALUES</span>

In [None]:
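# create the three panels up front and hand each one to msno.bar explicitly
fig, axes = plt.subplots(1, 3, figsize=(24, 6))

msno.bar(districts_info, color = colors[3], ax = axes[0])
msno.bar(products_info, color = colors[3], ax = axes[1])
msno.bar(engagement_info, color = colors[3], ax = axes[2]);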

<blockquote><p><strong> 📌 Notes :</strong></p>
<ul>
<li>All three datasets contain some missing values.</li>
</ul>
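
The bar charts show missingness visually. As a complement, the sketch below computes the share of missing values per column on a small made-up dataframe; the same two calls could be applied to any of the three datasets.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for a dataframe with gaps.
toy = pd.DataFrame({
    "district_id": [1, 2, 3, 4],
    "state": ["Utah", np.nan, "Illinois", np.nan],
    "locale": ["City", np.nan, "Rural", "Town"],
})

# isna() marks missing cells; the mean of a boolean column is its share of True.
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)
```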


<p><strong><span style="color:Black;"> <span style="font-size:135%">PREPROCESSING</span>

In [None]:
districts_info = districts_info[districts_info.state.notna()].reset_index(drop=True)

In [None]:
districts_info.head(5)

<blockquote><p><strong> 📌 Notes :</strong></p>
<ul>
<li>Dropping the districts whose <code>state</code> is missing. Missing values in other columns are kept.</li>
</ul>
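
The filter used in the preprocessing cell removes only the rows without a `state`. The sketch below, on made-up data, shows that it is equivalent to `dropna(subset=["state"])` and that gaps in other columns survive.

```python
import numpy as np
import pandas as pd

# Made-up districts: one has no state, another is missing only its locale.
toy = pd.DataFrame({
    "district_id": [1, 2, 3],
    "state": ["Utah", np.nan, "Illinois"],
    "locale": [np.nan, np.nan, "Rural"],
})

# Boolean-mask version, as in the preprocessing cell.
by_mask = toy[toy.state.notna()].reset_index(drop=True)

# Equivalent dropna version restricted to the state column.
by_dropna = toy.dropna(subset=["state"]).reset_index(drop=True)

print(by_dropna)
```

Calling `dropna()` without `subset` would instead remove every row that has any missing value, which is a much stricter filter.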


# **<center><span style='font-family:Georgia'> <span style='background:skyblue'> 📈 Exploratory Data Analysis </span>**

<p><strong><span style="color:Black;"> <span style="font-size:135%">STATE DISTRIBUTIONS</span>

In [None]:
plt.figure(figsize=(18,8))

plotting = sns.countplot(y = districts_info['state'],
                         order=districts_info['state'].value_counts().index,
                        color = colors[2])
plt.ylabel('State')

#Text
plotting.text(x = -5, y = -4.2, s = "State Distribution",fontsize = 26, weight = 'bold', alpha = .90);
plotting.text(x = -5, y = -3, s = "Number of districts per state",fontsize = 19, alpha = .85)
plotting.text(x = 30, y = 0.28, s = 'Highest', weight = 'bold',backgroundcolor = '#f0f0f0')
plotting.text(x = 1.1, y = 22.3, s = 'Lowest', weight = 'bold',backgroundcolor = '#f0f0f0')
plotting.text(x = -5, y = 26,s = '    ©HeyRobin/Kaggle                                                                                                                                                                    Source: LearnPlatform COVID-19 Impact on Digital Learning   '      ,
              fontsize = 14, color = colors[0], backgroundcolor = 'grey')

plt.show()

In [None]:
print("="*100)
print(f"NUMBER of UNIQUE VALUES {districts_info['state'].nunique()} and NULL VALUES : {districts_info['state'].isna().sum()}")
print("="*100)

<blockquote><p><strong> 📌 Notes :</strong></p>
<ul>
<li>Connecticut has the highest number of districts and Florida has the lowest.</li>
</ul>
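
The highest and lowest states can also be read from the counts rather than from the chart. The sketch below uses a made-up list of states, so its output says nothing about the real data; it also shows how to list every state tied for the lowest count.

```python
import pandas as pd

# Made-up state column; the real one is districts_info['state'].
states = pd.Series(
    ["Connecticut", "Connecticut", "Connecticut", "Utah", "Utah", "Florida", "Arizona"],
    name="state",
)

counts = states.value_counts()

# State with the most districts.
top_state = counts.idxmax()

# All states sharing the smallest count, in case of ties.
least_states = counts[counts == counts.min()].index.tolist()

print(top_state, least_states)
```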


<p><strong><span style="color:Black;"> <span style="font-size:135%">DISTRICTS</span>

In [None]:
plt.figure(figsize=(18,4))

plt.subplot(1,4,1)
A=sns.countplot(x =districts_info['locale'],color=colors[1])
A.text(x = -1,
       y = 120,
       s = "Districts count",
       fontsize = 26,
       weight = 'bold',
       alpha = .90)

plt.xlabel('Locale')


plt.subplot(1,4,2)
sns.countplot(x =districts_info['pct_black/hispanic'],color=colors[1])
plt.xticks(rotation=90)
plt.xlabel('% Black/Hispanic')

plt.subplot(1,4,3)
sns.countplot(x =districts_info['pct_free/reduced'],color=colors[1])
plt.xticks(rotation=90)
plt.xlabel('% Free/Reduced')

plt.subplot(1,4,4)
sns.countplot(x =districts_info['county_connections_ratio'],color=colors[1])
plt.xlabel('County Connection Ratio')
plt.xticks(rotation=90);

In [None]:
plt.figure(figsize=(17,6))
B = sns.countplot(y =districts_info['pp_total_raw'],color=colors[4])
plt.ylabel('Per-Pupil')
plt.title('Per-pupil Total Count');

<p><strong><span style="color:Black;"> <span style="font-size:135%">COMPANIES</span>

In [None]:
plt.figure(figsize=(16, 10))
plotting = sns.countplot(y='Provider/Company Name', data=products_info, order=products_info["Provider/Company Name"].value_counts().index[:15],color=colors[2])

plotting.text(x = -14, y = -2.0, s = "Company Distribution",fontsize = 26, weight = 'bold', alpha = .90);
plotting.text(x = -14, y = -1.4, s = "Distribution of Digital Learning Providers ",fontsize = 19, alpha = .85)
plotting.text(x = 30.3, y = 0.11, s = 'Highest', weight = 'bold',backgroundcolor = '#f0f0f0')
plotting.text(x = 2.3, y = 14.1, s = 'Lowest', weight = 'bold',backgroundcolor = '#f0f0f0')
plotting.text(x = -14, y = 16,s = '    ©HeyRobin/Kaggle                                                                                                                                                                                                     Source: LearnPlatform COVID-19 Impact on Digital Learning    '      ,
              fontsize = 14, color = '#f0f0f0', backgroundcolor = 'grey');


<p><strong><span style="color:Black;"> <span style="font-size:135%">PRODUCTS NAME</span>

In [None]:
cloud = WordCloud(width=1080,height=270,background_color='white').generate(" ".join(products_info['Product Name'].astype(str)))
plt.figure(figsize=(22, 10))
plt.imshow(cloud)
plt.axis('off');

<p><strong><span style="color:Black;"> <span style="font-size:150%">SECTORS</span>

In [None]:
plt.figure(figsize=(28, 6))

plt.subplot(1,2,1)
# take the labels from the counts themselves so each slice keeps its own name
sector_counts = products_info["Sector(s)"].value_counts()
plt.pie(x=sector_counts,
        labels=sector_counts.index,
        pctdistance=0.5,
        autopct='%1.1f%%', 
        shadow=False,
        startangle=40,
        labeldistance=1.4, explode=[0,0.1,0.1,0.5,0.6])  # assumes five sector categories
plt.axis('off')



plt.subplot(1,2,2)
plotting = sns.countplot(y = products_info["Sector(s)"],color=colors[1])
plotting.text(x = -190, y = -2.0, s = "Distribution of all the Sectors",fontsize = 26, weight = 'bold', alpha = .90)
plotting.text(x = -190, y = -1.7, s = "Distribution of Digital Learning Providers ",fontsize = 19, alpha = .85)
plt.ylabel('Sectors')

plt.show()

<div style="display:fill;
            border-radius: false;
            border-style: solid;
            border-color:#527B91;
            border-style: false;
            border-width: 2px;
            color:#CF673A;
            font-size:15px;
            font-family: Georgia;
            background-color:'';
            text-align:center;
            letter-spacing:0.1px;
            padding: 0.1em;">

<h2>END</h2>

<h2><center> <span style="font-family:Georgia"> <span style="color:Black;font-weight:bold"> <span style="background:skyblue">✌️ If you like my notebook and found it useful, please do upvote