This school-level database contains data from the following sources: the National Center for Education Statistics’ Common Core of Data (CCD), the Civil Rights Data Collection (CRDC), the US Department of Education’s EDFacts, and IPUMS’ National Historical Geographic Information System (NHGIS). Although the CCD includes information on schools over a long period, the topics covered are more limited than in the CRDC. 

The CCD includes basic information about the school (e.g., location, grade offerings, and charter status) and student enrollment demographics. 

The CRDC contains three years of data on every public school on such topics as student discipline, retention, course enrollment, chronic absenteeism, and SAT or ACT test-taking. 

EDFacts provides assessment and graduation rate data broken down by various subgroups. 

NHGIS provides census geographies such as tract and block group for each school.

Data sources:
Common Core of Data
The Common Core of Data is the US Department of Education's primary database on public elementary and secondary education. It provides directory and enrollment information at the school level and directory, enrollment, and finance data at the school district level.
https://nces.ed.gov/ccd/

The Civil Rights Data Collection
The Civil Rights Data Collection (CRDC) is a biennial survey required by the US Department of Education's Office for Civil Rights. The CRDC features data about enrollment, math and science courses, Advanced Placement courses, discipline, school expenditures, and teacher experiences. The following endpoints are currently available:
Directory
Enrollment
Discipline
Harassment and bullying
Restraint and seclusion
Chronic absenteeism
Advanced coursework
AP exams
SAT and ACT participation
https://ocrdata.ed.gov/

EDFacts
The US Department of Education's EDFacts initiative collects, analyzes, and centralizes data from state education agencies on various topics. Currently included in the explorer and portal are assessment data for reading and math for grades 3–12 at both the school and school district level.
https://www2.ed.gov/about/inits/ed/edfacts/index.html

National Historical Geographic Information System
Provided through IPUMS, the National Historical Geographic Information System contains population, housing, agriculture, and economic data for all census geographies from 1790 through the present.
https://www.nhgis.org/

Model Estimates of Poverty in Schools (MEPS)
Model Estimates of Poverty in Schools is the Urban Institute's school-level measure of the percentage of students living in poverty. It is comparable across states and time. Derived from the CCD and the SAIPE, it provides the estimated percentage of students with family incomes up to 100 percent of the federal poverty level for the years 2013 through 2018.
https://www.urban.org/research/publication/model-estimates-poverty-schools

In [69]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os


#The directory endpoint contains school-level information on location, mailing addresses, school types, highest and lowest grades offered, and free and reduced-price lunch. It also contains school-level data on the number of full-time equivalent teachers.

In [70]:
from urllib.request import urlopen
from json import loads
url = "https://educationdata.urban.org/api/v1/schools/ccd/directory/1988/"
response = urlopen(url)
data = loads(response.read())

In [71]:
data.keys()

dict_keys(['count', 'next', 'previous', 'results'])

In [72]:
directory=pd.DataFrame(data)

In [73]:
directory.head()

Unnamed: 0,count,next,previous,results
0,84968,https://educationdata.urban.org/api/v1/schools...,,"{'year': 1988, 'ncessch': '010000201704', 'sch..."
1,84968,https://educationdata.urban.org/api/v1/schools...,,"{'year': 1988, 'ncessch': '010000201705', 'sch..."
2,84968,https://educationdata.urban.org/api/v1/schools...,,"{'year': 1988, 'ncessch': '010000201706', 'sch..."
3,84968,https://educationdata.urban.org/api/v1/schools...,,"{'year': 1988, 'ncessch': '010000500870', 'sch..."
4,84968,https://educationdata.urban.org/api/v1/schools...,,"{'year': 1988, 'ncessch': '010000500871', 'sch..."


In [74]:
print(directory["results"])

0       {'year': 1988, 'ncessch': '010000201704', 'sch...
1       {'year': 1988, 'ncessch': '010000201705', 'sch...
2       {'year': 1988, 'ncessch': '010000201706', 'sch...
3       {'year': 1988, 'ncessch': '010000500870', 'sch...
4       {'year': 1988, 'ncessch': '010000500871', 'sch...
                              ...                        
9995    {'year': 1988, 'ncessch': '063537008981', 'sch...
9996    {'year': 1988, 'ncessch': '063537008983', 'sch...
9997    {'year': 1988, 'ncessch': '063537009441', 'sch...
9998    {'year': 1988, 'ncessch': '063537009442', 'sch...
9999    {'year': 1988, 'ncessch': '063543006033', 'sch...
Name: results, Length: 10000, dtype: object
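
The response reports a count of 84968, but the results column holds only 10000 rows, which suggests the API returns one page at a time and that `next` holds the URL of the following page. A minimal sketch of collecting every page is below. It assumes `next` is None on the last page; the helper names `fetch_json` and `fetch_all_results` and the `max_pages` parameter are made up for this notebook and are not part of the API.

```python
from json import loads
from urllib.request import urlopen


def fetch_json(url):
    """Download one page of the API response and decode the JSON body."""
    with urlopen(url) as response:
        return loads(response.read())


def fetch_all_results(url, fetch=fetch_json, max_pages=None):
    """Follow the 'next' links and collect every record found in 'results'.

    `fetch` can be swapped for a stand-in so the loop can be tried offline.
    `max_pages` (hypothetical) caps the number of requests while exploring.
    """
    records = []
    pages = 0
    while url:
        page = fetch(url)
        records.extend(page["results"])
        url = page["next"]  # assumed to be None once the last page is reached
        pages += 1
        if max_pages is not None and pages >= max_pages:
            break
    return records
```

Calling `fetch_all_results(url, max_pages=2)` with the directory URL would download at most two pages, which keeps exploratory runs short.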


From the directory, the following variables are important: free_or_reduced_price_lunch, school name, state, school type, year, and grade level.
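
Because `pd.DataFrame(data)` keeps each school as a dict inside the results column, the variables are not yet separate columns. One way to get a column per variable is to build the frame from the list of records instead. The sketch below uses a small stand-in response: the `ncessch` values come from the output above, while the `free_or_reduced_price_lunch` numbers are invented for illustration. `results_to_frame` is a hypothetical helper name.

```python
import pandas as pd


def results_to_frame(response, columns=None):
    """Turn the 'results' list of an API response into one column per field.

    If `columns` is given, keep only the requested fields that are present.
    """
    frame = pd.DataFrame(response["results"])
    if columns is not None:
        present = [name for name in columns if name in frame.columns]
        frame = frame[present]
    return frame


# Stand-in response with the same top-level keys as the API output.
# The lunch counts are invented; they are not real data.
sample_response = {
    "count": 2,
    "next": None,
    "previous": None,
    "results": [
        {"year": 1988, "ncessch": "010000201704", "free_or_reduced_price_lunch": 10},
        {"year": 1988, "ncessch": "010000201705", "free_or_reduced_price_lunch": 20},
    ],
}

sample_directory = results_to_frame(
    sample_response, columns=["year", "ncessch", "free_or_reduced_price_lunch"]
)
```

With the real response, `results_to_frame(data)` should give one row per school and one column per key in each record.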

In [75]:
from urllib.request import urlopen
from json import loads
url = "https://educationdata.urban.org/api/v1/schools/ccd/enrollment/2014/grade-8/"
response = urlopen(url)
enrollmentbygrade = loads(response.read())

#This endpoint contains student enrollment for each school by students' race and sex.

In [76]:
enrollmentbygrade.keys()

dict_keys(['count', 'next', 'previous', 'results'])

In [77]:
enrollmentbygrade=pd.DataFrame(enrollmentbygrade)

In [78]:
enrollmentbygrade.head()

Unnamed: 0,count,next,previous,results
0,36585,https://educationdata.urban.org/api/v1/schools...,,"{'year': 2014, 'ncessch': '010000200277', 'nce..."
1,36585,https://educationdata.urban.org/api/v1/schools...,,"{'year': 2014, 'ncessch': '010000201402', 'nce..."
2,36585,https://educationdata.urban.org/api/v1/schools...,,"{'year': 2014, 'ncessch': '010000201667', 'nce..."
3,36585,https://educationdata.urban.org/api/v1/schools...,,"{'year': 2014, 'ncessch': '010000201670', 'nce..."
4,36585,https://educationdata.urban.org/api/v1/schools...,,"{'year': 2014, 'ncessch': '010000201705', 'nce..."


In [80]:
enrollmentbygrade["results"][1]

{'year': 2014,
 'ncessch': '010000201402',
 'ncessch_num': 10000201402,
 'grade': 8,
 'race': 99,
 'sex': 99,
 'enrollment': None,
 'fips': 1,
 'leaid': '100002'}

In [None]:
#From the enrollment-by-grade endpoint, the following fields might be important: year, grade, race, sex
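
In the record shown above, race and sex are both 99 and enrollment is None. Assuming 99 is the code for "total" (this should be checked against the portal's codebook) and None marks a missing count, a sketch for keeping only the total rows that have a usable count could look like this. `TOTAL` and `total_rows` are names chosen here, and the second sample record is invented for illustration.

```python
import pandas as pd

TOTAL = 99  # assumed code for "all races" / "all sexes"


def total_rows(records):
    """Keep rows where race and sex are both totals and enrollment is known."""
    frame = pd.DataFrame(records)
    totals = frame[(frame["race"] == TOTAL) & (frame["sex"] == TOTAL)]
    # None becomes a missing value in pandas, so dropna removes those rows.
    return totals.dropna(subset=["enrollment"])


sample_records = [
    # Copied from the output above: the count is missing.
    {"year": 2014, "ncessch": "010000201402", "grade": 8,
     "race": 99, "sex": 99, "enrollment": None},
    # Invented record with a known count, for illustration only.
    {"year": 2014, "ncessch": "010000201705", "grade": 8,
     "race": 99, "sex": 99, "enrollment": 25},
]

usable = total_rows(sample_records)
```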


In [81]:
from urllib.request import urlopen
from json import loads
url = "https://educationdata.urban.org/api/v1/schools/ccd/enrollment/2013/grade-3/race/"
response = urlopen(url)
EnrollmentBygradeandrace = loads(response.read())

In [82]:
EnrollmentBygradeandrace=pd.DataFrame(EnrollmentBygradeandrace)

In [83]:
EnrollmentBygradeandrace.head()

Unnamed: 0,count,next,previous,results
0,433640,https://educationdata.urban.org/api/v1/schools...,,"{'year': 2013, 'ncessch': '010000201402', 'nce..."
1,433640,https://educationdata.urban.org/api/v1/schools...,,"{'year': 2013, 'ncessch': '010000500889', 'nce..."
2,433640,https://educationdata.urban.org/api/v1/schools...,,"{'year': 2013, 'ncessch': '010000600876', 'nce..."
3,433640,https://educationdata.urban.org/api/v1/schools...,,"{'year': 2013, 'ncessch': '010000600877', 'nce..."
4,433640,https://educationdata.urban.org/api/v1/schools...,,"{'year': 2013, 'ncessch': '010000600880', 'nce..."


In [86]:
EnrollmentBygradeandrace["results"][1]

{'year': 2013,
 'ncessch': '010000500889',
 'ncessch_num': 10000500889,
 'grade': 3,
 'race': 1,
 'sex': 99,
 'enrollment': 213,
 'fips': 1,
 'leaid': '100005'}

In [87]:
#The EnrollmentBygradeandrace and enrollmentbygrade datasets overlap.

In [89]:
from urllib.request import urlopen
from json import loads
url = "https://educationdata.urban.org/api/v1/schools/ccd/enrollment/2012/grade-5/sex/"
response = urlopen(url)
EnrollmentBygradeandsex = loads(response.read())

In [90]:
EnrollmentBygradeandsex=pd.DataFrame(EnrollmentBygradeandsex)

In [91]:
EnrollmentBygradeandsex.head()

Unnamed: 0,count,next,previous,results
0,159039,https://educationdata.urban.org/api/v1/schools...,,"{'year': 2012, 'ncessch': '010000201402', 'nce..."
1,159039,https://educationdata.urban.org/api/v1/schools...,,"{'year': 2012, 'ncessch': '010000500879', 'nce..."
2,159039,https://educationdata.urban.org/api/v1/schools...,,"{'year': 2012, 'ncessch': '010000600193', 'nce..."
3,159039,https://educationdata.urban.org/api/v1/schools...,,"{'year': 2012, 'ncessch': '010000600872', 'nce..."
4,159039,https://educationdata.urban.org/api/v1/schools...,,"{'year': 2012, 'ncessch': '010000600876', 'nce..."


In [92]:
EnrollmentBygradeandsex["results"][1]

{'year': 2012,
 'ncessch': '010000500879',
 'ncessch_num': 10000500879,
 'grade': 5,
 'race': 99,
 'sex': 1,
 'enrollment': 148,
 'fips': 1,
 'leaid': '100005'}

#The enrollmentbygrade, EnrollmentBygradeandrace, and EnrollmentBygradeandsex datasets contain overlapping data.
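
One way to check the overlap is to sum the by-sex counts for each school and compare them with the by-grade total. The sketch below assumes both endpoints are queried for the same year and grade (the cells above use different years, so they would need to be re-run), and that sex code 99 means "total". Only the first record comes from the output above; the other counts are invented so the example adds up. `compare_with_totals` is a hypothetical helper.

```python
import pandas as pd

KEYS = ["year", "ncessch", "grade"]


def compare_with_totals(by_sex_records, total_records):
    """Sum enrollment across sex categories and compare with the total rows."""
    by_sex = pd.DataFrame(by_sex_records)
    totals = pd.DataFrame(total_records)
    summed = (
        by_sex[by_sex["sex"] != 99]  # drop any total rows before summing
        .groupby(KEYS, as_index=False)["enrollment"]
        .sum()
        .rename(columns={"enrollment": "enrollment_summed"})
    )
    merged = totals.merge(summed, on=KEYS, how="inner")
    merged["matches"] = merged["enrollment"] == merged["enrollment_summed"]
    return merged


by_sex_sample = [
    # From the output above.
    {"year": 2012, "ncessch": "010000500879", "grade": 5, "sex": 1, "enrollment": 148},
    # Invented for illustration.
    {"year": 2012, "ncessch": "010000500879", "grade": 5, "sex": 2, "enrollment": 140},
]
total_sample = [
    # Invented total, chosen to equal 148 + 140.
    {"year": 2012, "ncessch": "010000500879", "grade": 5, "sex": 99, "enrollment": 288},
]

check = compare_with_totals(by_sex_sample, total_sample)
```

Rows where `matches` is False would point to schools whose disaggregated counts do not add up to the reported total.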

In [95]:
from urllib.request import urlopen
from json import loads
url = "https://educationdata.urban.org/api/v1/schools/ccd/enrollment/2007/grade-6/race/sex/"
response = urlopen(url)
EnrollmentBygradesexrace = loads(response.read())

In [96]:
EnrollmentBygradesexrace=pd.DataFrame(EnrollmentBygradesexrace)

In [97]:
EnrollmentBygradesexrace.head()

Unnamed: 0,count,next,previous,results
0,705582,https://educationdata.urban.org/api/v1/schools...,,"{'year': 2007, 'ncessch': '100000600017', 'nce..."
1,705582,https://educationdata.urban.org/api/v1/schools...,,"{'year': 2007, 'ncessch': '100000600017', 'nce..."
2,705582,https://educationdata.urban.org/api/v1/schools...,,"{'year': 2007, 'ncessch': '100000600017', 'nce..."
3,705582,https://educationdata.urban.org/api/v1/schools...,,"{'year': 2007, 'ncessch': '100000600017', 'nce..."
4,705582,https://educationdata.urban.org/api/v1/schools...,,"{'year': 2007, 'ncessch': '100000600017', 'nce..."


#The enrollmentbygrade, EnrollmentBygradeandrace, EnrollmentBygradeandsex, and EnrollmentBygradesexrace datasets contain overlapping data.
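
The four enrollment requests above differ only in the year, the grade, and whether `race` and `sex` are appended to the path. A small helper can build these URLs in one place. This is a convenience sketch based only on the URL patterns used in this notebook; `enrollment_url` is a made-up name, not part of the API.

```python
BASE_URL = "https://educationdata.urban.org/api/v1/schools/ccd/enrollment"


def enrollment_url(year, grade, by_race=False, by_sex=False):
    """Build a CCD enrollment URL following the patterns used in this notebook."""
    parts = [BASE_URL, str(year), f"grade-{grade}"]
    if by_race:
        parts.append("race")
    if by_sex:
        parts.append("sex")
    # The notebook's URLs end with a trailing slash, so add one here too.
    return "/".join(parts) + "/"
```

For example, `enrollment_url(2007, 6, by_race=True, by_sex=True)` reproduces the URL used for the last request.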