# Wisconsin

First, import the packages needed to access and clean the data. Selenium needs to launch a web browser; for this project I used geckodriver. pyautogui is used to download the lookup table: after the search button is clicked, every Wisconsin school is listed and Selenium cannot find the download button in the viewport. Scrolling with pyautogui brings the download button into view.

In [31]:
import datetime
import time
import wget
import numpy as np
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
import zipfile
from geopy.geocoders import Nominatim
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
import pyautogui

First, create an object containing the URL for this year's enrollment file.

In [2]:
ThisYear = str(datetime.datetime.now().year - 1)  + '-' + str(datetime.datetime.now().year)[2:]
PreString = 'https://dpi.wi.gov/sites/default/files/wise/downloads/enrollment_certified_'
PostString = '.zip'
File = PreString + ThisYear + PostString
File

'https://dpi.wi.gov/sites/default/files/wise/downloads/enrollment_certified_2017-18.zip'
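
The year label used in the URL can also be built by a small helper, which keeps the string arithmetic in one place. This is only a sketch: the function name `school_year_label` and its `end_year` parameter are hypothetical and not part of the notebook.

```python
import datetime


def school_year_label(end_year):
    """Return a label such as '2017-18' for the school year ending in end_year."""
    return f"{end_year - 1}-{str(end_year)[2:]}"


# Hypothetical usage: labels for this year's and last year's files.
current = datetime.datetime.now().year
this_year = school_year_label(current)
last_year = school_year_label(current - 1)
```

Passing the year in explicitly also makes it easy to rebuild the URL for an earlier file without changing the system clock.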

Using wget, download the file from the URL above.

In [3]:
ThisYearFile = wget.download(File)

100% [............................................................................] 749325 / 749325

Next, build the URL for last year's enrollment file and download it.

In [4]:
LastYear = str(datetime.datetime.now().year - 2)  + '-' + str(datetime.datetime.now().year -1)[2:]
File = PreString + LastYear + PostString
LastYearFile = wget.download(File)

100% [............................................................................] 749421 / 749421

The files are zipped, so the data must be extracted and then read into memory with pandas.

In [5]:
# Open this year's archive and read the CSV it contains.
zf = zipfile.ZipFile(ThisYearFile)
WIThYr = pd.read_csv(zf.open('enrollment_certified_' + ThisYear + '.csv'))

# Do the same for last year's archive.
zf = zipfile.ZipFile(LastYearFile)
WILsYr = pd.read_csv(zf.open('enrollment_certified_' + LastYear + '.csv'))
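
Reading a CSV straight out of a zip archive can be tried on a small in-memory example before pointing it at the real downloads. The sketch below builds a throwaway archive; the member name `enrollment_sample.csv` and its two rows are made up for illustration.

```python
import io
import zipfile

import pandas as pd

# Build a small zip archive in memory to stand in for a downloaded file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "enrollment_sample.csv",
        "SCHOOL_CODE,STUDENT_COUNT\n20,150\n40,175\n",
    )

# Open the archive with a context manager so it is closed after reading.
with zipfile.ZipFile(buffer) as archive:
    members = archive.namelist()  # member names, useful if the CSV name is unknown
    with archive.open(members[0]) as member:
        sample = pd.read_csv(member)
```

Listing the members with `namelist()` first avoids hard-coding the CSV name, which helps if the file inside the archive is ever named differently from the archive itself.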

Next, filter each file down to the grade-level rows for juniors and seniors (grades 11 and 12).

In [8]:
WILsYr = WILsYr[(WILsYr.GROUP_BY == 'Grade Level') & (WILsYr.GRADE_GROUP != '[All]') &
((WILsYr.GROUP_BY_VALUE == '11') | (WILsYr.GROUP_BY_VALUE == '12'))]

WIThYr = WIThYr[(WIThYr.GROUP_BY == 'Grade Level') & (WIThYr.GRADE_GROUP != '[All]') &
((WIThYr.GROUP_BY_VALUE == '11') | (WIThYr.GROUP_BY_VALUE == '12'))]
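
The same kind of filter can be written with `isin`, which reads a little more easily when several grade values are accepted. The toy frame below is invented for illustration (in particular the `GRADE_GROUP` and `Gender` values are assumptions, not taken from the real file); only the column names match the notebook.

```python
import pandas as pd

# Made-up rows that mimic the layout of the enrollment file.
toy = pd.DataFrame({
    "GROUP_BY": ["Grade Level", "Grade Level", "Grade Level", "Gender"],
    "GRADE_GROUP": ["High School", "High School", "[All]", "High School"],
    "GROUP_BY_VALUE": ["11", "12", "12", "Female"],
    "STUDENT_COUNT": [100, 90, 500, 95],
})

# Keep grade-level rows for grades 11 and 12, excluding the '[All]' roll-ups.
mask = (
    (toy["GROUP_BY"] == "Grade Level")
    & (toy["GRADE_GROUP"] != "[All]")
    & toy["GROUP_BY_VALUE"].isin(["11", "12"])
)
upper_grades = toy[mask]
```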

The data is grouped by grade level. Below, the rows for juniors and seniors are extracted from this year's and last year's files. These will later be cast from row values to column values.

In [13]:
# Separate grades into different dfs for last year and this year
WILsYrSr = WILsYr[WILsYr['GROUP_BY_VALUE'] == '12']
WIThYrSr = WIThYr[WIThYr['GROUP_BY_VALUE'] == '12']

WILsYrJr = WILsYr[WILsYr['GROUP_BY_VALUE'] == '11']
WIThYrJr = WIThYr[WIThYr['GROUP_BY_VALUE'] == '11']

Next, create objects holding the column names of interest for the data files.

In [54]:
columnsThsYr = ['DISTRICT_CODE', 'SCHOOL_CODE', 'SCHOOL_NAME', 'STUDENT_COUNT', 'DISTRICT_NAME']
columnsLstYr = ['DISTRICT_CODE', 'SCHOOL_CODE', 'STUDENT_COUNT']

Next, merge the junior and senior tables back together for each year, so that each school has a single row.

In [56]:
# Combine Jr and Sr as separate columns instead of rows
WIL = pd.merge(WILsYrSr[columnsThsYr], WILsYrJr[columnsLstYr], how = 'inner', on = ['DISTRICT_CODE', 'SCHOOL_CODE'])
WIT = (pd.merge(WIThYrSr[columnsThsYr], WIThYrJr[columnsLstYr], how = 'inner', on = ['DISTRICT_CODE', 'SCHOOL_CODE']))

Next rename the student count columns to more understandable names.

In [58]:
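# In the merges above the senior table is the left frame, so STUDENT_COUNT_x
# holds seniors and STUDENT_COUNT_y holds juniors.
WIL = WIL.rename(columns = {'STUDENT_COUNT_x': 'Seniors Last Year',
                       'STUDENT_COUNT_y': 'Juniors Last Year' })

WIT = WIT.rename(columns = {'STUDENT_COUNT_x': 'Seniors This Year',
                       'STUDENT_COUNT_y': 'Juniors This Year' })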
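
When two frames share a non-key column name, `pd.merge` appends `_x` to the column from the left frame and `_y` to the column from the right frame. The sketch below, on made-up one-row tables, shows that default and the `suffixes` argument, which names the columns at merge time. The suffix strings are arbitrary choices for illustration.

```python
import pandas as pd

keys = ["DISTRICT_CODE", "SCHOOL_CODE"]

# Made-up counts for a single school.
seniors = pd.DataFrame({"DISTRICT_CODE": [7], "SCHOOL_CODE": [20], "STUDENT_COUNT": [90]})
juniors = pd.DataFrame({"DISTRICT_CODE": [7], "SCHOOL_CODE": [20], "STUDENT_COUNT": [100]})

# Default suffixes: _x marks the left frame (seniors), _y the right frame (juniors).
default = pd.merge(seniors, juniors, how="inner", on=keys)

# Explicit suffixes label the overlapping columns directly.
named = pd.merge(seniors, juniors, how="inner", on=keys,
                 suffixes=("_SENIORS", "_JUNIORS"))
```

Checking which frame is on the left before renaming is worthwhile, because a swapped mapping produces no error, only mislabeled counts.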

Next, create objects for the renamed columns, to be used when combining this year's and last year's data.

In [60]:
columnsThsYr = ['DISTRICT_CODE', 'SCHOOL_CODE', 'SCHOOL_NAME', 'DISTRICT_NAME', 'Juniors This Year', 'Seniors This Year']
columnsLstYr = ['DISTRICT_CODE', 'SCHOOL_CODE',  'Juniors Last Year', 'Seniors Last Year']

Next, join this year's and last year's data together.

In [61]:
WI = pd.merge(WIL[columnsLstYr], WIT[columnsThsYr], how = 'inner', on = ['DISTRICT_CODE', 'SCHOOL_CODE'])
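
An inner join silently drops any school that appears in only one of the two years. One way to see which schools would be lost is an outer join with `indicator=True`, sketched here on made-up data; the codes and counts are invented for illustration.

```python
import pandas as pd

keys = ["DISTRICT_CODE", "SCHOOL_CODE"]

# Made-up tables: school 40 exists only last year, school 60 only this year.
last = pd.DataFrame({"DISTRICT_CODE": [7, 7], "SCHOOL_CODE": [20, 40],
                     "Seniors Last Year": [90, 80]})
this = pd.DataFrame({"DISTRICT_CODE": [7, 7], "SCHOOL_CODE": [20, 60],
                     "Seniors This Year": [95, 70]})

# The _merge column reports 'both', 'left_only' or 'right_only' for each row.
check = pd.merge(last, this, how="outer", on=keys, indicator=True)
unmatched = check[check["_merge"] != "both"]
```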

In [1]:
# TODO: create enrollment delta columns.

The lookup file is a bit tricky to acquire for Wisconsin. To get the data, Selenium in conjunction with pyautogui can be used. First set the preferences for the Selenium browser.

In [25]:
# Set the preferences for the firefox web browser
fp = webdriver.FirefoxProfile()
fp.set_preference('browser.download.folderList', 2)
fp.set_preference('browser.download.manager.showWhenStarting', False)
fp.set_preference('browser.download.dir', 'C:\\Users\\karlk\\GitHubRepos\\SCSU-Reciprocity-HS-Enrollment\\Wisconsin')
fp.set_preference("http.response.timeout", 300)
fp.set_preference("dom.max_script_run_time", 300)
fp.set_preference('webdriver.load.strategy', 'unstable')
fp.set_preference('browser.helperApps.neverAsk.saveToDisk', 'text/plain, application/vnd.ms-excel, text/csv, text/comma-separated-values, application/octet-stream, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')
fp.update_preferences()

Launch the browser and navigate to the public school search.

In [26]:
# Wisconsin DPI school directory search for public schools
searchURL = "https://apps4.dpi.wi.gov/SchoolDirectory/Search/PublicSchoolsSearch"
time.sleep(3)

# Launch web browser and navigate to the searchURL
driver = webdriver.Firefox(fp)
driver.get(searchURL)
time.sleep(3)

Click the search button so that the download button for the file is displayed.

In [30]:
Search = driver.find_element(By.ID, "txtSearch")
searchbutton = driver.find_element(By.XPATH, "//input[@value='Search']")

# Pause briefly before clicking
time.sleep(2)
ActionChains(driver).move_to_element(searchbutton).click().perform()

time.sleep(3)

When the search button is clicked, all of the Wisconsin public schools are displayed, which pushes the download button outside the viewport. Use pyautogui to scroll down to where the download button can be clicked.

In [48]:
# Click inside the browser window so it has focus, then scroll down repeatedly
pyautogui.moveTo(x=50, y=100)
pyautogui.click()
time.sleep(1)
for i in range(200):
    pyautogui.scroll(-100000000)
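
An alternative to driving the mouse wheel is to ask the browser itself to scroll an element into view through Selenium's `execute_script`. This is a sketch of that idea, not what the notebook does; the helper name `scroll_into_view` is hypothetical, and whether it is enough on this particular page has not been checked here.

```python
def scroll_into_view(driver, element):
    """Ask the browser to scroll `element` into the viewport via JavaScript."""
    driver.execute_script("arguments[0].scrollIntoView(true);", element)


# Hypothetical usage, once the Download element has been located:
# scroll_into_view(driver, downloadbutton)
```

Because this runs inside the page, it does not depend on screen coordinates or on the browser window having keyboard and mouse focus.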

After scrolling down, click the download button.

In [49]:
downloadbutton = driver.find_element(By.XPATH, "//input[@value='Download']")
ActionChains(driver).move_to_element(downloadbutton).click().perform()
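
The click returns before the download has necessarily finished, so reading the file straight away can fail if it is not yet on disk. One possible guard is to poll for the file first. The sketch below is an assumption-laden illustration: the helper name `wait_for_download` is hypothetical, and it relies on Firefox keeping a sibling `.part` file next to the target while a download is in progress.

```python
import os
import time


def wait_for_download(path, timeout=30.0, poll=0.5):
    """Return True once `path` exists with no sibling '.part' file.

    Returns False if that has not happened within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path) and not os.path.exists(path + ".part"):
            return True
        time.sleep(poll)
    return False


# Hypothetical usage before reading the lookup file:
# if not wait_for_download('Directory.csv'):
#     raise RuntimeError('Directory.csv did not finish downloading')
```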

Read the downloaded directory file, a CSV, into memory.

In [50]:
WILookup = pd.read_csv('Directory.csv')

Combine the enrollment and lookup files with pandas.

In [69]:
Full = pd.merge(WI, WILookup, how = 'inner',
                left_on = ['DISTRICT_NAME', 'SCHOOL_CODE'],
                right_on = ['District Name', 'School Code'])
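
Joining on a name column only matches rows whose strings are identical, so differences in case or stray whitespace would drop schools from an inner join. Whether the two Wisconsin files actually differ in this way has not been checked here; the sketch below just shows one way to normalise a name key on made-up rows. The helper `normalise`, the `district_key` column, and the `City` column are all hypothetical.

```python
import pandas as pd

# Made-up rows: the district name differs in case and trailing whitespace.
enrollment = pd.DataFrame({"DISTRICT_NAME": ["Abbotsford "], "SCHOOL_CODE": [20],
                           "Seniors This Year": [45]})
lookup = pd.DataFrame({"District Name": ["ABBOTSFORD"], "School Code": [20],
                       "City": ["Abbotsford"]})


def normalise(names):
    """Strip surrounding whitespace and upper-case a Series of names."""
    return names.str.strip().str.upper()


enrollment["district_key"] = normalise(enrollment["DISTRICT_NAME"])
lookup["district_key"] = normalise(lookup["District Name"])

# Join on the normalised name plus the school code.
combined = pd.merge(enrollment, lookup, how="inner",
                    left_on=["district_key", "SCHOOL_CODE"],
                    right_on=["district_key", "School Code"])
```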

In [None]:
# Year-over-year ratio of this year's enrollment to last year's
Full['JuniorDelta'] = Full['Juniors This Year']/Full['Juniors Last Year']
Full['SeniorDelta'] = Full['Seniors This Year']/Full['Seniors Last Year']
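
Dividing one enrollment column by another gives `inf` wherever last year's count is zero and this year's is not. A minimal sketch of one way to handle that, on made-up counts, is to turn those values into missing values; the column name `JuniorRatio` is chosen only for this illustration.

```python
import numpy as np
import pandas as pd

# Made-up counts, including schools with zero juniors last year.
toy = pd.DataFrame({
    "Juniors This Year": [110, 50, 0],
    "Juniors Last Year": [100, 0, 0],
})

# Element-wise ratio; x/0 gives inf and 0/0 gives NaN.
ratio = toy["Juniors This Year"] / toy["Juniors Last Year"]

# Treat infinite ratios as missing rather than as real values.
toy["JuniorRatio"] = ratio.replace([np.inf, -np.inf], np.nan)
```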

Finally, save the cleaned Wisconsin data to a .csv file.

In [71]:
Full.to_csv('FullWI.csv', index = False)