# Acquire Data - US Department of Education data API

**Table of Contents**

1. [Intro](#1.-Intro)
2. [URL](#2.-URL)
3. [Functions](#3.-Functions)

## 1. Intro

An organization by the name of the Urban Institute is the gateway to the US Department of Education datasets. They make this data available via paginated API.


## 2. URL
https://educationdata.urban.org/documentation/schools.html#crdc-sat-and-act-participation-by-race-and-sex


## 3. Functions



In [6]:
import pandas as pd
import json
import pandas as pd
import requests
from tqdm import tqdm_notebook

In [2]:
def school_data(source, topic, target_file, year_begin, year_end=False, grade=False):
     '''This function is designed to download data from the Urban Institute's data portal
     by leveraging its API. The API is paginated and each page contains 3k records. We use the streaming feature of Python's Requests
     library to save the data to disk instead of holding it in memory.
     source can be CCD or CRCD, topic can be enrollment, sat-act-participation, etc, grade is filtered for 11th grade'''
     
     base_url =  "https://educationdata.urban.org/api/v1/schools/"
     target_file  = open(str(target_file)+'.txt', 'wb')

     for year in range(year_begin, year_end):
          try:
               if (source, topic) in {('ccd', 'directory'), ('crdc', 'directory'), ('crdc', 'school-finance')}:
                    url = base_url+source+"/"+topic+"/"+str(year)+"/"                    
               elif (source, topic) in {('ccd', 'enrollment'), ('crdc', 'retention')}:
                    url = base_url+source+"/"+topic+"/"+str(year)+"/"+'grade-'+str(grade)+"/race/sex/"
               elif (source, topic) in {('crdc', 'enrollment'), ('crdc', 'sat-act-participation')}:
                    url = base_url+source+"/"+topic+"/"+str(year)+"/race/sex/"

               response = requests.get(url, stream=True)#--------->Streaming data into bytes
               data = response.json()
               number_pages = math.floor(data['count']/3000)
               print(url, "number of records: {}, number of pages: {}".format(data['count'], number_pages))
               i = 1

               for data['next'] in tqdm_notebook(range(0,number_pages)): #---------->use next to retrieve all pages
                    i += 1
                    url1 = url+"?page="+str(i)
                    print('year:{} downloading page # {}'.format(year, data['next']))
                    response = requests.get(url1, stream =True)
                    for chunk in response.iter_content(chunk_size= 1024*1000):#---->chunk response into ios
                         target_file.write(chunk) 
                    target_file.write(b'\n')#-------------------------------------> write chuncks to text file and insert a new line the end of each year loop
          except:
              pass
          
     target_file.close()

     return target_file

In [2]:
enrollment_data = school_data('ccd','enrollment', 'crcd_enrollment', 2015, 2020, 11) #download  enrollment information at the school level

https://educationdata.urban.org/api/v1/schools/ccd/enrollment/2015/grade-11/race/sex/
https://educationdata.urban.org/api/v1/schools/ccd/enrollment/2015/grade-11/race/sex/ number of records: 644592, number of pages: 214


Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`
  for data['next'] in tqdm_notebook(range(0,number_pages)):


  0%|          | 0/214 [00:00<?, ?it/s]

year:2015 downloading page # 0
year:2015 downloading page # 1
year:2015 downloading page # 2
year:2015 downloading page # 3
year:2015 downloading page # 4
year:2015 downloading page # 5
year:2015 downloading page # 6
year:2015 downloading page # 7
year:2015 downloading page # 8
year:2015 downloading page # 9
year:2015 downloading page # 10
year:2015 downloading page # 11
year:2015 downloading page # 12
year:2015 downloading page # 13
year:2015 downloading page # 14
year:2015 downloading page # 15
year:2015 downloading page # 16
year:2015 downloading page # 17
year:2015 downloading page # 18
year:2015 downloading page # 19
year:2015 downloading page # 20
year:2015 downloading page # 21
year:2015 downloading page # 22
year:2015 downloading page # 23
year:2015 downloading page # 24
year:2015 downloading page # 25
year:2015 downloading page # 26
year:2015 downloading page # 27
year:2015 downloading page # 28
year:2015 downloading page # 29
year:2015 downloading page # 30
year:2015 download

  0%|          | 0/298 [00:00<?, ?it/s]

year:2016 downloading page # 0
year:2016 downloading page # 1
year:2016 downloading page # 2
year:2016 downloading page # 3
year:2016 downloading page # 4
year:2016 downloading page # 5
year:2016 downloading page # 6
year:2016 downloading page # 7
year:2016 downloading page # 8
year:2016 downloading page # 9
year:2016 downloading page # 10
year:2016 downloading page # 11
year:2016 downloading page # 12
year:2016 downloading page # 13
year:2016 downloading page # 14
year:2016 downloading page # 15
year:2016 downloading page # 16
year:2016 downloading page # 17
year:2016 downloading page # 18
year:2016 downloading page # 19
year:2016 downloading page # 20
year:2016 downloading page # 21
year:2016 downloading page # 22
year:2016 downloading page # 23
year:2016 downloading page # 24
year:2016 downloading page # 25
year:2016 downloading page # 26
year:2016 downloading page # 27
year:2016 downloading page # 28
year:2016 downloading page # 29
year:2016 downloading page # 30
year:2016 download

  0%|          | 0/290 [00:00<?, ?it/s]

year:2017 downloading page # 0
year:2017 downloading page # 1
year:2017 downloading page # 2
year:2017 downloading page # 3
year:2017 downloading page # 4
year:2017 downloading page # 5
year:2017 downloading page # 6
year:2017 downloading page # 7
year:2017 downloading page # 8
year:2017 downloading page # 9
year:2017 downloading page # 10
year:2017 downloading page # 11
year:2017 downloading page # 12
year:2017 downloading page # 13
year:2017 downloading page # 14
year:2017 downloading page # 15
year:2017 downloading page # 16
year:2017 downloading page # 17
year:2017 downloading page # 18
year:2017 downloading page # 19
year:2017 downloading page # 20
year:2017 downloading page # 21
year:2017 downloading page # 22
year:2017 downloading page # 23
year:2017 downloading page # 24
year:2017 downloading page # 25
year:2017 downloading page # 26
year:2017 downloading page # 27
year:2017 downloading page # 28
year:2017 downloading page # 29
year:2017 downloading page # 30
year:2017 download

  0%|          | 0/284 [00:00<?, ?it/s]

year:2018 downloading page # 0
year:2018 downloading page # 1
year:2018 downloading page # 2
year:2018 downloading page # 3
year:2018 downloading page # 4
year:2018 downloading page # 5
year:2018 downloading page # 6
year:2018 downloading page # 7
year:2018 downloading page # 8
year:2018 downloading page # 9
year:2018 downloading page # 10
year:2018 downloading page # 11
year:2018 downloading page # 12
year:2018 downloading page # 13
year:2018 downloading page # 14
year:2018 downloading page # 15
year:2018 downloading page # 16
year:2018 downloading page # 17
year:2018 downloading page # 18
year:2018 downloading page # 19
year:2018 downloading page # 20
year:2018 downloading page # 21
year:2018 downloading page # 22
year:2018 downloading page # 23
year:2018 downloading page # 24
year:2018 downloading page # 25
year:2018 downloading page # 26
year:2018 downloading page # 27
year:2018 downloading page # 28
year:2018 downloading page # 29
year:2018 downloading page # 30
year:2018 download

  0%|          | 0/278 [00:00<?, ?it/s]

year:2019 downloading page # 0
year:2019 downloading page # 1
year:2019 downloading page # 2
year:2019 downloading page # 3
year:2019 downloading page # 4
year:2019 downloading page # 5
year:2019 downloading page # 6
year:2019 downloading page # 7
year:2019 downloading page # 8
year:2019 downloading page # 9
year:2019 downloading page # 10
year:2019 downloading page # 11
year:2019 downloading page # 12
year:2019 downloading page # 13
year:2019 downloading page # 14
year:2019 downloading page # 15
year:2019 downloading page # 16
year:2019 downloading page # 17
year:2019 downloading page # 18
year:2019 downloading page # 19
year:2019 downloading page # 20
year:2019 downloading page # 21
year:2019 downloading page # 22
year:2019 downloading page # 23
year:2019 downloading page # 24
year:2019 downloading page # 25
year:2019 downloading page # 26
year:2019 downloading page # 27
year:2019 downloading page # 28
year:2019 downloading page # 29
year:2019 downloading page # 30
year:2019 download

In [6]:
crcd_directory= school_data('crdc','directory', 'crcd_directory', 2017, 2020) #-->download school location information

https://educationdata.urban.org/api/v1/schools/crdc/directory/2017/
https://educationdata.urban.org/api/v1/schools/crdc/directory/2017/ number of records: 97632, number of pages: 32


Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`
  for data['next'] in tqdm_notebook(range(0,number_pages)):


  0%|          | 0/32 [00:00<?, ?it/s]

year:2017 downloading page # 0
year:2017 downloading page # 1
https://educationdata.urban.org/api/v1/schools/crdc/directory/2018/
https://educationdata.urban.org/api/v1/schools/crdc/directory/2018/ number of records: 0, number of pages: 0


0it [00:00, ?it/s]

https://educationdata.urban.org/api/v1/schools/crdc/directory/2019/
https://educationdata.urban.org/api/v1/schools/crdc/directory/2019/ number of records: 0, number of pages: 0


0it [00:00, ?it/s]

In [8]:
crcd_SAT = school_data('crdc','sat-act-participation', 'crcd_SAT_2017', 2017, 2018) #--> download sat-act participation counts per school
crcd_SAT

https://educationdata.urban.org/api/v1/schools/crdc/sat-act-participation/2017/race/sex/
https://educationdata.urban.org/api/v1/schools/crdc/sat-act-participation/2017/race/sex/ number of records: 2343168, number of pages: 781


Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`
  for data['next'] in tqdm_notebook(range(0,number_pages)):


  0%|          | 0/781 [00:00<?, ?it/s]

year:2017 downloading page # 0
year:2017 downloading page # 1
year:2017 downloading page # 2
year:2017 downloading page # 3
year:2017 downloading page # 4
year:2017 downloading page # 5


<_io.BufferedWriter name='crcd_SAT_2017.txt'>

In [22]:
school_directory

<_io.BufferedWriter name='crcd_school_directory2017.txt'>