# North Dakota

This notebook downloads the high school enrollment data for North Dakota. The files are a bit tricky to acquire. The school lookup file, which includes the zip code used for plotting in the final data deliverable, is straightforward and can be obtained with just wget. The enrollment file titles contain the upload date, so the download links need to be found with a partial title search.

First the needed packages are loaded into memory.

In [6]:
import datetime
import time
import wget
import numpy as np
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

Next, a label for the current school year is built from today's date. It rolls forward automatically each year and is used to look up the data file.

In [7]:
ThisYearFile = str(datetime.datetime.now().year - 1)
Text = ThisYearFile +'-' + str(int(ThisYearFile[2:]) + 1) + ' Fall Enrollment'
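The label logic above can be wrapped in a small helper, sketched below. The function name `school_year_label` is hypothetical, and the zero-padded two-digit end year is an assumption about how the file titles are written; the cell above would produce `2008-9` rather than `2008-09` for a school year starting in 2008.

```python
import datetime


def school_year_label(start_year, suffix="Fall Enrollment"):
    """Build a label such as '2017-18 Fall Enrollment' for the school year starting in start_year."""
    # Two-digit end year, zero-padded and wrapped at the century (2099 -> 00).
    end_two_digits = (start_year + 1) % 100
    return f"{start_year}-{end_two_digits:02d} {suffix}"


# Same rolling window as the cell above: the school year that started last calendar year.
Text = school_year_label(datetime.datetime.now().year - 1)
print(Text)
```

Keeping the year as a function argument makes it easy to request an earlier school year without editing the date arithmetic.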

This notebook uses Selenium to find the desired enrollment files. Selenium launches a browser and searches the page for the download links. The browser preferences are set below so that Firefox saves the listed file types to the download directory without opening a pop-up or asking for permission.

In [32]:
# Set the preferences for the firefox web browser
fp = webdriver.FirefoxProfile()
fp.set_preference('browser.download.folderList', 2)
fp.set_preference('browser.download.manager.showWhenStarting', False)
fp.set_preference('browser.download.dir', '/tmp')
fp.set_preference("http.response.timeout", 300)
fp.set_preference("dom.max_script_run_time", 300)
fp.set_preference('webdriver.load.strategy', 'unstable')
fp.set_preference('browser.helperApps.neverAsk.saveToDisk', 'text/plain, application/vnd.ms-excel, text/csv, text/comma-separated-values, application/octet-stream, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')
fp.update_preferences()

After the preferences are set, the browser is launched and navigated to the page where the desired enrollment files are listed.

In [33]:
# ND DPI school finance resources page, which lists the enrollment files
searchURL = 'https://www.nd.gov/dpi/SchoolStaff/SchoolFinance/Resources/'
time.sleep(3)

# Launch web browser and navigate to the searchURL
driver = webdriver.Firefox(fp)
driver.get(searchURL)
time.sleep(3)

Next, set up the partial-title XPath search and collect every link whose title contains "Enrollment".

In [18]:
searchConditional = '//a[@href and contains(@title,"Enrollment")]'
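# find_elements(By.XPATH, ...) replaces find_elements_by_xpath, which newer Selenium releases removed
EnrollmentLinks = driver.find_elements(By.XPATH, searchConditional)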
TReport_href_Enrollment = EnrollmentLinks
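To make the XPath condition concrete, here is a minimal browser-free sketch of the same filter using the standard library's `html.parser`. The class name `TitledLinkParser` and the sample markup are invented for illustration and are not taken from the ND DPI page.

```python
from html.parser import HTMLParser


class TitledLinkParser(HTMLParser):
    """Collect (title, href) pairs for <a> tags whose title contains a substring."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        title = attributes.get("title") or ""
        href = attributes.get("href")
        # Mirrors //a[@href and contains(@title, needle)]; the match is case-sensitive.
        if href is not None and self.needle in title:
            self.matches.append((title, href))


# Hypothetical markup, invented for illustration only.
sample_html = """
<a href="/files/a.xlsx" title="2017-18 Fall Enrollment 10-02-2017">Enrollment</a>
<a href="/files/b.pdf" title="Finance Manual">Manual</a>
<a title="Enrollment notes">No href here</a>
"""

parser = TitledLinkParser("Enrollment")
parser.feed(sample_html)
print(parser.matches)
```

Only the first anchor is kept: the second lacks "Enrollment" in its title, and the third has no `href` attribute.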

Each link's title is then compared against the school-year label. Matching files are downloaded, and their titles and download URLs are recorded.

In [19]:
# All Enrollment by Ethnicity/Gender datasets
downloadsTitle = []
downloadURL = []
for ii in TReport_href_Enrollment:
    title = ii.get_attribute('title')
    href = ii.get_attribute('href')
    if Text in title:
        print(title)  # log which file is being fetched
        wget.download(href)
        downloadsTitle.append(title)
        downloadURL.append(href)

The link can also be looked up by its exact link text. wget then downloads the file and returns the local filename.

In [26]:
links = driver.find_elements(By.LINK_TEXT, Text)
for link in links:
    File = wget.download(link.get_attribute("href"))

100% [............................................................................] 109928 / 109928

wget returns the local filename of the download, which Pandas uses to read the Excel file into memory.

In [75]:
# sheet_name replaces the older sheetname keyword, which current pandas no longer accepts
ND = pd.read_excel(File, sheet_name = 'School', skiprows = 2)
ND.head()

Unnamed: 0,SchNo,SchoolName,PKReg,PKSpEd,K,Gr1,Gr2,Gr3,Gr4,Gr5,Gr6,Gr7,Gr8,Gr9,Gr10,Gr11,Gr12,K-12 Total
0,01-013-3599,Hettinger Public School,31,8,27,25,21,22,16,28,22,20,20,14,17,17,26,275
1,02-002-4601,Jefferson Elem School,0,11,67,78,80,64,0,0,0,0,0,0,0,0,0,289
2,02-002-8954,Valley City Jr-Sr High School,0,0,0,0,0,0,0,0,0,110,89,88,81,78,92,538
3,02-002-9241,Washington Elem School,0,0,0,0,0,0,70,78,77,0,0,0,0,0,0,225
4,02-007-9463,Barnes County North Public School,12,0,12,15,24,17,24,19,20,23,18,28,19,24,13,256


The first few rows of the file are shown above. Below, the file is subset to the school identifier, school name, and the grade 11 and grade 12 enrollment columns.

In [77]:
NDReduced = ND[['SchNo', 'SchoolName','Gr11', 'Gr12']]

Next, the lookup file for the schools is downloaded. It is needed to map the high school enrollment in the final data deliverable. First navigate to the page where the school listings are located.

In [54]:
# ND DPI school directory listings page
searchURL = 'https://www.nd.gov/dpi/data/directory/schoollistings/'

# Navigate the already-open browser to the searchURL
driver.get(searchURL)
time.sleep(3)

Next find the link to download the lookup file.

In [60]:
ids_a = driver.find_elements(By.XPATH, "//*[text()='LEAs and Schools (Excel)']")
for ii in ids_a:
    link = ii.get_attribute('href')
time.sleep(4)

After the link has been acquired, the file can be downloaded using wget.

In [62]:
LookUpFile = wget.download(link)

100% [............................................................................] 243712 / 243712

'sys_plnt.xls'

Next, use Pandas to read the Excel file into memory.

In [65]:
LookUp = pd.read_excel(LookUpFile)
LookUp.head()

Unnamed: 0,StateIssuedID,LEA Name,School Name,Phone,Mailing Address,City,St,ZIP,ZIP4,Site Address,Site City,Site St,Site ZIP,LEA Type,School Type
0,01-013,Hettinger 13,,701-567-5315,PO Box 1188,Hettinger,ND,58639,1188,209 8th St S,Hettinger,ND,58639,High School LEA,
1,01-013-3599,,Hettinger Public School,701-567-5315,PO Box 1188,Hettinger,ND,58639,1188,209 S 8th St,Hettinger,ND,58639,,Elementary/Middle/Junior High/Secondary Combin...
2,02-002,Valley City 2,,701-845-0483 Ext 102,460 Central Ave N,Valley City,ND,58072,2997,460 Central Ave N,Valley City,ND,58072,High School LEA,
3,02-002-4601,,Jefferson Elem School,701-845-0622,460 Central Ave N,Valley City,ND,58072,2997,1150 Central Ave N,Valley City,ND,58072,,Elementary (two or more teachers)
4,02-002-8954,,Valley City Jr-Sr High School,701-845-0483 Ext 2,460 Central Ave N,Valley City,ND,58072,2997,493 Central Ave N,Valley City,ND,58072,,Elementary/Middle/Junior High/Secondary Combin...


After the data file has been read into memory, subset to only the desired columns.

In [69]:
LookUp = LookUp[['StateIssuedID', 'LEA Name', 'School Name', 'Site Address', 'Site City', 'Site St',
       'Site ZIP']]

Once both the enrollment and lookup files are downloaded and cleaned, they can be joined with Pandas on the school identifier, as shown below.

In [80]:
NDClean = pd.merge(LookUp, 
         NDReduced, 
         how = 'inner', 
         left_on = 'StateIssuedID', right_on = 'SchNo')
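The effect of the inner join can be checked on a few rows copied from the outputs shown earlier. This is a small sketch, not part of the original pipeline; the names `LookUpSample` and `NDSample` are hypothetical, and `validate="one_to_one"` is an added safeguard that raises an error if an identifier is duplicated on either side.

```python
import pandas as pd

# Sample rows copied from the lookup and enrollment outputs shown earlier.
LookUpSample = pd.DataFrame({
    "StateIssuedID": ["01-013", "01-013-3599", "02-002", "02-002-8954"],
    "School Name": [None, "Hettinger Public School", None, "Valley City Jr-Sr High School"],
    "Site ZIP": [58639, 58639, 58072, 58072],
})
NDSample = pd.DataFrame({
    "SchNo": ["01-013-3599", "02-002-8954"],
    "Gr11": [17, 78],
    "Gr12": [26, 92],
})

# The inner join keeps only identifiers present in both tables, so the
# LEA-level rows ("01-013", "02-002") drop out and school-level rows remain.
merged = pd.merge(LookUpSample, NDSample, how="inner",
                  left_on="StateIssuedID", right_on="SchNo",
                  validate="one_to_one")
print(merged[["StateIssuedID", "School Name", "Site ZIP", "Gr11", "Gr12"]])
```

Because the lookup file mixes LEA rows and school rows, the inner join also acts as a filter: only rows whose `StateIssuedID` is a full school number survive.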

Finally, the cleaned data file is written to CSV.

In [81]:
NDClean.to_csv('ND.csv')