> # **Problem Statement**

The COVID-19 pandemic has disrupted learning for more than 56 million students in the United States. In the spring of 2020, most state and local governments across the U.S. closed educational institutions to stop the spread of the virus. In response, schools and teachers have attempted to reach students remotely through distance learning tools and digital platforms. Concerns about a widening digital divide and long-term learning loss among America’s most vulnerable learners continue to grow.

<img src = 'https://www.deccanherald.com/sites/dh/files/styles/article_detail/public/article_images/2017/10/25/639382.jpg?itok=3gVHgLWL'>

COVID-19 has resulted in school closures all across the world. Globally, over 1.2 billion children are out of the classroom. As a result, education has changed dramatically, with a distinctive rise in e-learning, whereby teaching is undertaken remotely and on digital platforms. Some research suggests that online learning can increase retention of information and take less time, meaning the changes the coronavirus has caused might be here to stay.
Some students without reliable internet access and/or technology struggle to participate in digital learning; this gap is seen across countries and between income brackets within countries. In the US, there is a significant gap between those from privileged and disadvantaged backgrounds: whilst virtually all 15-year-olds from a privileged background said they had a computer to work on, nearly 25% of those from disadvantaged backgrounds did not. While some schools and governments have been providing digital equipment to students in need, such as in New South Wales, Australia, many are still concerned that the pandemic will widen the digital divide.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt # For visualisation
import seaborn as sns # For visualisation
from warnings import filterwarnings
filterwarnings('ignore')

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

> # **Data Overview**

In [None]:
products = pd.read_csv("/kaggle/input/learnplatform-covid19-impact-on-digital-learning/products_info.csv")

In [None]:
products.head()

> ## **Products Data Description**

* LP ID - The unique identifier of the product
* URL - Web Link to the specific product
* Product Name - Name of the specific product
* Provider/Company Name - Name of the product provider
* Sector(s) - Sector of education where the product is used
* Primary Essential Function - The basic function of the product. There are two layers of labels here. Products are first labeled as one of these three categories: LC = Learning & Curriculum, CM = Classroom Management, and SDO = School & District Operations. Each of these categories has multiple sub-categories with which the products were labeled
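
The two label layers described above can be separated into their own columns. The following is a minimal sketch on toy rows, assuming the labels are stored as a single string of the form "CATEGORY - Sub-category"; the sample label strings, the " - " separator, and the names `split_function_label`, `main_category`, and `sub_category` are illustrative assumptions rather than something confirmed by the data shown here.

```python
import pandas as pd

# Toy rows for illustration only; the label strings are assumed, not read from products_info.csv.
toy_products = pd.DataFrame({
    "LP ID": [1, 2, 3],
    "Primary Essential Function": [
        "LC - Digital Learning Platforms",
        "CM - Classroom Engagement & Instruction",
        None,  # products without a label stay missing in both new columns
    ],
})


def split_function_label(df, column="Primary Essential Function"):
    """Return a copy of df with the label split into a main category and a sub-category."""
    # Split on the first " - " only, so sub-categories containing dashes stay intact.
    parts = df[column].str.split(" - ", n=1, expand=True)
    # Guarantee both columns exist even if no row contains the separator.
    parts = parts.reindex(columns=[0, 1])
    out = df.copy()
    out["main_category"] = parts[0].str.strip()
    out["sub_category"] = parts[1].str.strip()
    return out


labelled = split_function_label(toy_products)
print(labelled[["main_category", "sub_category"]])
```

If the assumption about the separator holds, the same function could be applied to the `products` frame loaded earlier and followed by `value_counts()` on `main_category` to see how products are distributed across LC, CM, and SDO.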

In [None]:
products.shape

In [None]:
districts = pd.read_csv("/kaggle/input/learnplatform-covid19-impact-on-digital-learning/districts_info.csv")

In [None]:
districts.head()

> ## **Districts Data Description**

* district_id - The unique identifier of the school district
* state - The state in which the district is located
* locale - NCES locale classification that categorizes U.S. territory into four types of areas: City, Suburban, Town, and Rural. See Locale Boundaries User's Manual for more information.
* pct_black/hispanic - Percentage of students in the districts identified as Black or Hispanic based on 2018-19 NCES data
* pct_free/reduced - Percentage of students in the districts eligible for free or reduced-price lunch based on 2018-19 NCES data
* county_connections_ratio - ratio (residential fixed high-speed connections over 200 kbps in at least one direction/households) based on the county level data from FCC Form 477 (December 2018 version). See FCC data for more information.
* pp_total_raw - Per-pupil total expenditure (sum of local and federal expenditure) from Edunomics Lab's National Education Resource Database on Schools (NERD$) project. The expenditure data are school-by-school, and we use the median value to represent the expenditure of a given school district.
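
The per-pupil expenditure field is described as the median of school-level values within a district. A minimal sketch of that aggregation is shown below; the school-level table, its column name `pp_total`, and all numbers are made up for illustration, since the school-by-school data is not part of this notebook.

```python
import pandas as pd

# Hypothetical school-level expenditures per pupil; the values are invented for this example.
toy_schools = pd.DataFrame({
    "district_id": [1, 1, 1, 2, 2],
    "pp_total": [9000.0, 11000.0, 20000.0, 8000.0, 12000.0],
})

# One value per district: the median across that district's schools.
district_median = (
    toy_schools.groupby("district_id")["pp_total"]
    .median()
    .rename("pp_total_median")
)
print(district_median)
```

Using the median rather than the mean keeps a single unusually high-spending school (20000 in district 1 above) from pulling the district value upward.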

In [None]:
districts.shape

In [None]:
engagement = pd.read_csv("/kaggle/input/learnplatform-covid19-impact-on-digital-learning/engagement_data/5802.csv")

In [None]:
engagement.head()

> ## **Engagement Data Description**

* time - date in "YYYY-MM-DD"
* lp_id - The unique identifier of the product
* pct_access - Percentage of students in the district who have at least one page-load event of a given product on a given day
* engagement_index - Total page-load events per one thousand students of a given product and on a given day
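
The two engagement measures can be illustrated with a small worked example. This sketch follows the definitions above and assumes that pct_access is expressed on a 0 to 100 scale; the function and parameter names are hypothetical and the counts are made up.

```python
def pct_access(students_with_page_load, total_students):
    """Share of students with at least one page-load event, as a percentage (0-100 assumed)."""
    return 100.0 * students_with_page_load / total_students


def engagement_index(page_load_events, total_students):
    """Total page-load events per one thousand students."""
    return 1000.0 * page_load_events / total_students


# Made-up district: 500 students, 50 of whom opened the product, producing 1200 page loads in one day.
example_pct = pct_access(50, 500)
example_index = engagement_index(1200, 500)
print(example_pct, example_index)
```

Because engagement_index counts events rather than students, it can exceed 1000 when students load several pages each, whereas pct_access is bounded by the number of students.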

In [None]:
engagement.shape
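
The cell above reads a single engagement file. To analyse several districts together, the files can be concatenated into one frame. The sketch below assumes that each file in the `engagement_data` folder holds one district and that the file name (for example `5802.csv`) is that district's `district_id`; this assumption, and the helper name `load_engagement`, are not confirmed by anything shown in this notebook.

```python
from pathlib import Path

import pandas as pd


def load_engagement(directory):
    """Concatenate every CSV in directory, tagging rows with the id taken from the file name."""
    frames = []
    for csv_path in sorted(Path(directory).glob("*.csv")):
        frame = pd.read_csv(csv_path)
        # Assumption: the file stem is the numeric district identifier.
        frame["district_id"] = int(csv_path.stem)
        frames.append(frame)
    if not frames:
        raise FileNotFoundError(f"no CSV files found in {directory}")
    return pd.concat(frames, ignore_index=True)


# Possible usage on Kaggle (path taken from the read_csv call above):
# all_engagement = load_engagement(
#     "/kaggle/input/learnplatform-covid19-impact-on-digital-learning/engagement_data"
# )
```

With a `district_id` column in place, the combined frame could be joined to `districts` on `district_id` and to `products` by matching `lp_id` with `LP ID`.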