<a href="https://colab.research.google.com/github/TuckerRasbury/coding_sample_eviction-lab_data-engineer/blob/main/Eviction_Lab_Data_Engineer_Coding_Sample.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Coding Sample - Data Pipeline for Evictions Filings
#### _Submission of Candidate Coding Sample for the role of Data Engineer at The Eviction Lab of Princeton University_


## Initial Prompt
---

In order to apply to the Data Engineer role at the Eviction Lab, I am spinning up a concise data pipeline to meet the fourth criteria laid out in the listing.

_"Applicants should submit a dossier including... (4) a coding sample or data product that speaks to applicant’s experience with relevant tasks"_

## Tasks Required of the Data Engineer
---

Here is an excerpt from the listing including what will be required of the Data Engineer for context.

_"The responsibilities of the position are to lead the development of a data construction pipeline for processing large-scale administrative records. This would involve writing code to create new data products (e.g., geocoding addresses, cleaning names, combining multiple sources of data) in a reproducible way; writing tests to assess the quality of the data products created by the pipeline; writing tests to assess the speed of the pipeline; optimizing the code to improve quality and speed; cleaning and reformatting incoming datasets to conform to the pipeline; running the pipeline using these datasets; and identifying and fixing bugs, among other tasks. The datasets used are very large and require the use of remote computing clusters. Applicants with experience using very large datasets and optimizing code to run efficiently are preferred."_

## Beginning of Script
---
In order to provide a coding sample to demonstrate some of the pre-requisite skills for this opening, herein I will spin up a light weight data pipeline using data from the Legal Services Corporation [1] and Zillow's publicly available housing data [2]. I would ideally like to gather more data, but in light of the U.S. Government Accountability Office's research  on eviction data availability being limited, I am going to leverage the csv download available datasets above [3]. Should I need to talk about my skills with larger datasets or datasets being pulled from APIs, I will talk my interviewer through my experience in those areas.


Data Sources
---
[1] [Civil Court Data Initiative. Legal Services Corporation, 2022.(accessed May 16, 2025)](https://civilcourtdata.lsc.gov/data/eviction)

[2] [Rental Data. Zillow. (accessed May 16, 2025)](https://www.zillow.com/research/data/)

[3] [Government Accountability Office - Evictions: National Data Are Limited and Challenging to Collect](https://www.gao.gov/products/gao-24-106637)


In [38]:
## Establishing Libraries
import pandas as pd

### Ingesting Raw Data

In [39]:
# Importing Datasets - Prt 1

## Legal Services Corporation - Civil Court Data Initiative
### Weekly County Data
lsc_weekly_url = 'https://raw.githubusercontent.com/TuckerRasbury/coding_sample_eviction-lab_data-engineer/main/data/weekly_county_data_download.csv'
lsc_weekly_df = pd.read_csv(lsc_weekly_url)

### Weekly State Data
lsc_state_url = 'https://raw.githubusercontent.com/TuckerRasbury/coding_sample_eviction-lab_data-engineer/main/data/weekly_state_data_download.csv'
lsc_state_df = pd.read_csv(lsc_state_url)

In [40]:
# Importing Datasets - Prt 2

## Zillow House Value Data
## Zillow Home Value Index (ZHVI): A measure of the typical home value and market changes across a given region and housing type.


### Publicly Available Housing Data - County
zillow_county_url = 'https://raw.githubusercontent.com/TuckerRasbury/coding_sample_eviction-lab_data-engineer/main/data/County_zhvi_uc_sfrcondo_tier_0.33_0.67_sm_sa_month.csv'
zillow_county_df = pd.read_csv(zillow_county_url)

### Publicly Available Housing Data - State
zillow_state_url = 'https://raw.githubusercontent.com/TuckerRasbury/coding_sample_eviction-lab_data-engineer/main/data/State_zhvi_uc_sfrcondo_tier_0.33_0.67_sm_sa_month.csv'
zillow_state_df = pd.read_csv(zillow_state_url)

## Zillow Rental Price Data - County
## Zillow Observed Rent Index (ZORI): A smoothed measure of the typical observed market rate rent across a given region

### Publicly Available Rental Data - County
zillow_county_rental_url = 'https://raw.githubusercontent.com/TuckerRasbury/coding_sample_eviction-lab_data-engineer/main/data/County_zori_uc_sfrcondomfr_sm_month.csv'
zillow_county_rental_df = pd.read_csv(zillow_state_url)

In [41]:
# Confirming Datasets Ingested - LSC Weekly County
lsc_weekly_df.head()

Unnamed: 0,fips,name,date,filings_count
0,33019,Sullivan,2022-01-03 12:00:00,4
1,33005,Cheshire,2022-02-07 12:00:00,3
2,33015,Rockingham,2022-03-21 12:00:00,14
3,33019,Sullivan,2022-03-28 12:00:00,2
4,33007,Coos,2022-04-04 12:00:00,2


In [42]:
# Confirming Datasets Ingested - LSC Weekly State
lsc_state_df.head()

Unnamed: 0,fips,name,date,filings_count
0,33,New Hampshire,2022-03-28 12:00:00,138
1,33,New Hampshire,2022-05-02 12:00:00,196
2,33,New Hampshire,2022-06-06 12:00:00,168
3,33,New Hampshire,2022-07-18 12:00:00,248
4,33,New Hampshire,2022-08-15 12:00:00,256


In [43]:
# Confirming Datasets Ingested - Zillow County
zillow_county_df.head()

Unnamed: 0,RegionID,SizeRank,RegionName,RegionType,StateName,State,Metro,StateCodeFIPS,MunicipalCodeFIPS,2000-01-31,...,2024-07-31,2024-08-31,2024-09-30,2024-10-31,2024-11-30,2024-12-31,2025-01-31,2025-02-28,2025-03-31,2025-04-30
0,3101,0,Los Angeles County,county,CA,CA,"Los Angeles-Long Beach-Anaheim, CA",6,37,217063.294177,...,899984.68281,905415.981451,912328.703331,917692.306861,922000.785648,925563.153495,924758.178085,920819.715333,913682.426345,908153.620878
1,139,1,Cook County,county,IL,IL,"Chicago-Naperville-Elgin, IL-IN-WI",17,31,151585.670641,...,313394.930038,314242.997161,314932.706887,315299.009518,315703.773776,316389.055754,317240.306214,318115.823473,318621.886143,318892.678007
2,1090,2,Harris County,county,TX,TX,"Houston-The Woodlands-Sugar Land, TX",48,201,112783.78044,...,291732.658728,291383.570694,291162.232894,290784.730752,290320.284084,290081.040114,289884.353421,289580.399918,288761.159762,287713.386
3,2402,3,Maricopa County,county,AZ,AZ,"Phoenix-Mesa-Chandler, AZ",4,13,148232.335171,...,486286.547379,485133.357775,484155.656561,483401.584763,482714.105903,482116.687325,481150.25881,479626.268969,477407.146572,475352.523454
4,2841,4,San Diego County,county,CA,CA,"San Diego-Chula Vista-Carlsbad, CA",6,73,220345.421445,...,968248.190213,967797.735617,967337.284267,966604.540776,966802.864361,967327.270415,967437.473088,966889.661388,964709.314264,961467.703608


In [44]:
# Confirming Datasets Ingested - Zillow State
zillow_state_df.head()

Unnamed: 0,RegionID,SizeRank,RegionName,RegionType,StateName,2000-01-31,2000-02-29,2000-03-31,2000-04-30,2000-05-31,...,2024-07-31,2024-08-31,2024-09-30,2024-10-31,2024-11-30,2024-12-31,2025-01-31,2025-02-28,2025-03-31,2025-04-30
0,9,0,California,state,,193983.727686,194635.764895,195516.174793,197427.282169,199648.656602,...,794980.24427,796808.201343,799181.290966,800825.235948,802657.943456,804511.977573,804682.106782,803246.384103,799679.501223,796255.446598
1,54,1,Texas,state,,114650.702784,114713.006937,114743.225427,114893.202475,114990.329964,...,311418.064885,310990.147334,310723.158782,310457.399594,310077.350638,309797.301597,309617.220803,309438.86518,308702.238205,307629.100564
2,14,2,Florida,state,,108255.684033,108490.175922,108774.426833,109352.235824,109975.340556,...,400530.850389,399502.810192,398636.533116,397676.491442,396460.569464,395352.419957,394396.150784,393304.369817,391588.148457,389400.049453
3,43,3,New York,state,,154607.546443,155158.101181,155688.098901,156827.97992,158007.349303,...,480811.117639,483958.227544,486867.601548,489325.354271,491186.473381,492969.583322,494116.872029,495314.625285,496232.69297,497620.899427
4,47,4,Pennsylvania,state,,100690.506117,100905.875484,101108.264837,101520.703585,101944.381449,...,272989.988321,273246.79799,273644.011695,274407.296766,275318.143615,276537.984748,277632.128284,278567.608841,279045.436228,279450.903628


In [45]:
# Confirming Datasets Ingested - Zillow Rental County
zillow_county_rental_df.head()

Unnamed: 0,RegionID,SizeRank,RegionName,RegionType,StateName,2000-01-31,2000-02-29,2000-03-31,2000-04-30,2000-05-31,...,2024-07-31,2024-08-31,2024-09-30,2024-10-31,2024-11-30,2024-12-31,2025-01-31,2025-02-28,2025-03-31,2025-04-30
0,9,0,California,state,,193983.727686,194635.764895,195516.174793,197427.282169,199648.656602,...,794980.24427,796808.201343,799181.290966,800825.235948,802657.943456,804511.977573,804682.106782,803246.384103,799679.501223,796255.446598
1,54,1,Texas,state,,114650.702784,114713.006937,114743.225427,114893.202475,114990.329964,...,311418.064885,310990.147334,310723.158782,310457.399594,310077.350638,309797.301597,309617.220803,309438.86518,308702.238205,307629.100564
2,14,2,Florida,state,,108255.684033,108490.175922,108774.426833,109352.235824,109975.340556,...,400530.850389,399502.810192,398636.533116,397676.491442,396460.569464,395352.419957,394396.150784,393304.369817,391588.148457,389400.049453
3,43,3,New York,state,,154607.546443,155158.101181,155688.098901,156827.97992,158007.349303,...,480811.117639,483958.227544,486867.601548,489325.354271,491186.473381,492969.583322,494116.872029,495314.625285,496232.69297,497620.899427
4,47,4,Pennsylvania,state,,100690.506117,100905.875484,101108.264837,101520.703585,101944.381449,...,272989.988321,273246.79799,273644.011695,274407.296766,275318.143615,276537.984748,277632.128284,278567.608841,279045.436228,279450.903628
