# Capstone: Unaccounted Homeless Individuals in New York City
**_Author: Despina Daisy Matos_**

## Table of Contents 
- [Libraries](#Libraries)
- [Outside Research](#Outside-Research)
    - [Feature Selection](#Feature-Selection)
- [Data Cleaning](#Data-Cleaning)
    - [Individuals in the Shelter System in NYC](#Individuals-in-the-Shelter-System-in-NYC)
    - [Homelessness Count by State](#Homelessness-Count-by-State)
    - [Shelter Locations in NYC](#Shelter-Locations-in-NYC)
    - [Placement Housing in NYC](#Placement-Housing-in-NYC)
    - [Mental Health in NYC](#Mental-Health-in-NYC)
    - [Shelter Repair in NYC](#Shelter-Repair-in-NYC)
    - [Evictions in NYC](#Evictions-in-NYC)
    - [Using DHS Services in NYC](#Using-DHS-Services-in-NYC)
    - [Homelessness Prediction Model by State](#Homelessness-Prediction-Model-by-State)
- [Concatenating Data](#Concatenating-Data)
    - [Saved Final Dataset](#)
    - [Data Dictionary](#)
- [Exploratory Data Analysis](#)
    - [Summary Statistics](#)
    - [Graphs](#)
    - [Outliers](#)
- [Preprocessing](#)
    - [Feature Engineering](#)
    - [Creating X feature and y](#)
    - [Train-test Split](#)
    - [Determining the Baseline Score](#)
- [Modeling](#Modeling)
- [Conclusion and Recommendations](#)
- [Sources](#Sources)

## Libraries

In [1]:
import pandas as pd

# More libraries will be added later

## Outside Research

Explaining my findings and key takeaways.

### Feature Selection

The impetus for determining the number of homeless people results largely from increased interest in the projection of service needs and the distribution of resources for the homeless.

- Permanent Supportive Housing
- Emergency Shelter
- Temporary Housing
- Rapid Rehousing
- Unemployment rates
- Mental illness
- Shelter systems in NYC
- Chronically Homeless
- Eviction rates
- 311 Service requests
- Total Vacant Units
- Housing Units Available

## Data Cleaning

We will group the datasets by date.

### Individuals in the Shelter System in NYC

In [4]:
#dhs_daily_report 
df = pd.read_csv('./data/DHS_Daily_Report.csv')
df.head()

Unnamed: 0,Date of Census,Total Adults in Shelter,Total Children in Shelter,Total Individuals in Shelter,Single Adult Men in Shelter,Single Adult Women in Shelter,Total Single Adults in Shelter,Families with Children in Shelter,Adults in Families with Children in Shelter,Children in Families with Children in Shelter,Total Individuals in Families with Children in Shelter,Adult Families in Shelter,Individuals in Adult Families in Shelter
0,09/24/2019,38107,21582,59689,11936,4561,16497,12193,16324,21582,37906,2499,5286
1,09/23/2019,38039,21612,59651,11911,4510,16421,12207,16332,21612,37944,2499,5286
2,09/22/2019,37960,21563,59523,11887,4527,16414,12174,16279,21563,37842,2489,5267
3,09/21/2019,37880,21559,59439,11806,4520,16326,12178,16283,21559,37842,2492,5271
4,09/20/2019,37980,21623,59603,11844,4537,16381,12208,16324,21623,37947,2497,5275
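
Since the plan is to group the datasets by date, one possible first step for the daily report is to parse `Date of Census` and roll the daily counts up to months. This is a minimal sketch that uses only the five rows shown above as sample data; the monthly mean is an assumed aggregation, not necessarily the one the final analysis will use.

```python
import pandas as pd

# Sample rows copied from the df.head() output above (subset of columns).
sample = pd.DataFrame({
    "Date of Census": ["09/24/2019", "09/23/2019", "09/22/2019",
                       "09/21/2019", "09/20/2019"],
    "Total Individuals in Shelter": [59689, 59651, 59523, 59439, 59603],
})

# Parse the MM/DD/YYYY strings into datetimes.
sample["Date of Census"] = pd.to_datetime(sample["Date of Census"], format="%m/%d/%Y")

# Assumed aggregation: average daily census per calendar month.
monthly = (
    sample.groupby(sample["Date of Census"].dt.to_period("M"))
    ["Total Individuals in Shelter"]
    .mean()
)
print(monthly)
```

Grouping on a monthly period gives a key that the other datasets can share once their date columns are parsed the same way.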


### Homelessness Count by State

In [6]:
#Homelessness_Count
df_2 = pd.read_csv('./data/Homelessness_Count_-_USA.csv')
df_2.head()

Unnamed: 0,CoC Number,CoC Name,Year,Attribute Name,Value,Year - Text
0,AK-500,Anchorage CoC,01/01/2014 12:00:00 AM,Chronically Homeless,101,2014
1,AK-500,Anchorage CoC,01/01/2014 12:00:00 AM,Sheltered Homeless People in Families,282,2014
2,AK-500,Anchorage CoC,01/01/2014 12:00:00 AM,Homeless Individuals,736,2014
3,AK-500,Anchorage CoC,01/01/2014 12:00:00 AM,Homeless People in Families,287,2014
4,AK-500,Anchorage CoC,01/01/2014 12:00:00 AM,Unsheltered Homeless Individuals,48,2014


In [13]:
df_2[df_2["CoC Name"] == "New York City CoC"]

Unnamed: 0,CoC Number,CoC Name,Year,Attribute Name,Value,Year - Text
6033,NY-600,New York City CoC,01/01/2014 12:00:00 AM,Chronically Homeless,5873,2014
6034,NY-600,New York City CoC,01/01/2014 12:00:00 AM,Sheltered Homeless People in Families,41633,2014
6035,NY-600,New York City CoC,01/01/2014 12:00:00 AM,Homeless Individuals,26177,2014
6036,NY-600,New York City CoC,01/01/2014 12:00:00 AM,Homeless People in Families,41633,2014
6037,NY-600,New York City CoC,01/01/2014 12:00:00 AM,Unsheltered Homeless Individuals,3357,2014
...,...,...,...,...,...,...
20725,NY-600,New York City CoC,01/01/2015 12:00:00 AM,Homeless People in Families,45711,2015
20726,NY-600,New York City CoC,01/01/2015 12:00:00 AM,Sheltered Homeless Veterans,1499,2015
20727,NY-600,New York City CoC,01/01/2015 12:00:00 AM,Parenting Youth Age 18-24,2114,2015
20728,NY-600,New York City CoC,01/01/2015 12:00:00 AM,Parenting Youth (Under 25),2114,2015
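
The count data is in long format, with one row per CoC, year, and attribute. A sketch of how the NYC rows could be reshaped to one row per year is below. It uses a handful of rows copied from the outputs above; filtering on `CoC Number` instead of `CoC Name` and keeping the first value per cell are my assumptions.

```python
import pandas as pd

columns = ["CoC Number", "CoC Name", "Year", "Attribute Name", "Value"]
rows = [
    ("AK-500", "Anchorage CoC", "01/01/2014 12:00:00 AM", "Chronically Homeless", 101),
    ("NY-600", "New York City CoC", "01/01/2014 12:00:00 AM", "Chronically Homeless", 5873),
    ("NY-600", "New York City CoC", "01/01/2014 12:00:00 AM", "Homeless Individuals", 26177),
    ("NY-600", "New York City CoC", "01/01/2014 12:00:00 AM", "Homeless People in Families", 41633),
    ("NY-600", "New York City CoC", "01/01/2015 12:00:00 AM", "Homeless People in Families", 45711),
    ("NY-600", "New York City CoC", "01/01/2015 12:00:00 AM", "Sheltered Homeless Veterans", 1499),
]
sample = pd.DataFrame(rows, columns=columns)

# Reduce the timestamp string to a plain year.
sample["Year"] = pd.to_datetime(
    sample["Year"], format="%m/%d/%Y %I:%M:%S %p"
).dt.year

# Keep only NYC, then turn each attribute into its own column.
nyc = sample[sample["CoC Number"] == "NY-600"]
wide = nyc.pivot_table(
    index="Year", columns="Attribute Name", values="Value", aggfunc="first"
)
print(wide)
```

Attributes that are missing for a given year show up as `NaN` in the wide table.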


### Shelter Locations in NYC

In [15]:
#Buildings_by_Borough_and_Community_District
df_3 = pd.read_csv('./data/Buildings_by_Borough_and_Community_District.csv')
df_3.head()

Unnamed: 0,Report Date,Borough,Community District,Adult Family Comm Hotel,Adult Family Shelter,Adult Shelter,Adult Shelter Comm Hotel,FWC Cluster,FWC Comm Hotel,FWC Shelter
0,07/31/2018,Bronx,201,,1.0,2.0,,4.0,1.0,7.0
1,07/31/2018,Bronx,202,,1.0,,,3.0,,6.0
2,07/31/2018,Bronx,203,,2.0,5.0,,5.0,2.0,9.0
3,07/31/2018,Bronx,204,,,2.0,,18.0,,14.0
4,07/31/2018,Bronx,205,,2.0,3.0,,12.0,,5.0
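
Many of the building counts are `NaN`. A minimal sketch of one way to handle them is below, assuming a missing value means the district has no buildings of that type (this still needs to be checked against the dataset documentation). Only two of the building-type columns from the output above are used.

```python
import pandas as pd

# Sample rows copied from the df_3.head() output above (two building types only).
sample = pd.DataFrame({
    "Borough": ["Bronx"] * 5,
    "Community District": [201, 202, 203, 204, 205],
    "Adult Shelter": [2.0, None, 5.0, 2.0, 3.0],
    "FWC Shelter": [7.0, 6.0, 9.0, 14.0, 5.0],
})

building_cols = ["Adult Shelter", "FWC Shelter"]

# Assumption: NaN means zero buildings of that type in the district.
sample[building_cols] = sample[building_cols].fillna(0)

# Total buildings per district, then per borough.
sample["Total Buildings"] = sample[building_cols].sum(axis=1)
by_borough = sample.groupby("Borough")["Total Buildings"].sum()
print(by_borough)
```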


### Placement Housing in NYC

In [18]:
#placement housing
df_4 = pd.read_csv('./data/Three-Quarter_Housing_Report_-_Placements.csv')
df_4.head()

Unnamed: 0,Reporting Period,Former Narco Freedom Three-Quarter Sites,Three-Quarter Sites Managed by Other Operators,Total Three-Quarter Sites,DOB Violations,HPD Violations,FDNY Violations,Total Violations,Total Individuals Relocated to Temporary Emergency Housing,Number Three-Quarter Houses Relocated From,...,NYCHA - Total,Supportive Housing - Clients from Former Narco Freedom Buildings,Supportive Housing - Clients from TQH Buildings Managed by Other Operators,Supportive Housing - Total,Other - Clients from Former Narco Freedom Buildings,Other - Clients from TQH Buildings Managed by Other Operators,Other - Total,Total - Clients from Former Narco Freedom Buildings,Total - Clients from TQH Buildings Managed by Other Operators,Total Placements into Permanent Housing
0,"June 1, 2015 to March 31, 2019",18,97,115,1169,2417,177,3763,692,57,...,1,6,5,11,12,22,34,417,392,809
1,"June 1, 2015 to December 31, 2018",18,97,115,1167,2470,183,3820,677,56,...,1,6,5,11,12,21,33,417,386,803
2,"June 1, 2015 to September 30, 2018",18,97,115,1122,2376,180,3678,671,56,...,1,6,5,11,12,20,32,416,376,792
3,"June 1, 2015 to June 30, 2018",18,97,115,1060,2229,188,3477,663,56,...,1,6,5,11,12,20,32,416,365,781
4,"June 1, 2015 to March 31, 2018",18,96,114,949,2276,184,3409,648,56,...,1,6,5,11,12,20,32,415,350,765
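
Every `Reporting Period` starts on June 1, 2015, so the totals look cumulative. Under that assumption, a sketch of how to split the period into start and end dates and recover the placements added between reports is below. The values are copied from the output above.

```python
import pandas as pd

sample = pd.DataFrame({
    "Reporting Period": [
        "June 1, 2015 to March 31, 2019",
        "June 1, 2015 to December 31, 2018",
        "June 1, 2015 to September 30, 2018",
        "June 1, 2015 to June 30, 2018",
        "June 1, 2015 to March 31, 2018",
    ],
    "Total Placements into Permanent Housing": [809, 803, 792, 781, 765],
})

# Split "start to end" into two date columns.
parts = sample["Reporting Period"].str.split(" to ", expand=True)
sample["period_start"] = pd.to_datetime(parts[0], format="%B %d, %Y")
sample["period_end"] = pd.to_datetime(parts[1], format="%B %d, %Y")

# Assumption: totals are cumulative, so differences give new placements per report.
sample = sample.sort_values("period_end").reset_index(drop=True)
sample["new_placements"] = sample["Total Placements into Permanent Housing"].diff()
print(sample[["period_end", "new_placements"]])
```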


### Mental Health in NYC 

In [22]:
#Community_Mental_Health_Survey
df_5 = pd.read_csv('./data/DOHMH_Community_Mental_Health_Survey.csv')
df_5.head()

Unnamed: 0,Survey,Year,Denominator,Question,Prevalence,Lower 95% CI,Upper 95% CI
0,CMHS,2012,All CHS respondents,"Serious psychological distress (SPD), past 30 ...",5.39,4.47,6.5
1,CMHS,2012,All CHS respondents,"Serious mental illness (SMI), past 12 months",3.80,3.03,4.77
2,CMHS,2012,Follow-up respondents with SMI,"Received one-on-one counseling or therapy, pas...",44.34*,33.37,55.9
3,CMHS,2012,Follow-up respondents with SMI,"Took presciption medication to treat emotions,...",45.24*,34.17,56.81
4,CMHS,2012,Follow-up respondents with SMI,Received counseling for drugs or alcohol in pa...,7.65*,3.63,15.42
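
Some `Prevalence` values carry a trailing asterisk (for example `44.34*`), so the column cannot be used as a number directly. A minimal sketch of a cleanup step is below. It keeps the asterisk as a separate flag because its meaning is not confirmed here; the column name `Flagged` is my own.

```python
import pandas as pd

# Prevalence values copied from the df_5.head() output above.
sample = pd.DataFrame({
    "Year": [2012] * 5,
    "Prevalence": ["5.39", "3.80", "44.34*", "45.24*", "7.65*"],
})

# Remember which estimates were marked, then strip the marker and convert.
sample["Flagged"] = sample["Prevalence"].str.endswith("*")
sample["Prevalence"] = sample["Prevalence"].str.rstrip("*").astype(float)
print(sample)
```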


### Shelter Repair in NYC

In [23]:
#Shelter_Repair_Scorecard
df_6 = pd.read_csv('./data/Shelter_Repair_Scorecard.csv')
df_6.head()

Unnamed: 0,Month,DHS_Bld_ID,Shelter_Name_All,Landlord,Borough,Facility_Type,Capacity,HighPriority_Closed_Monthly_DOB,HighPriority_New_Monthly_DOB,HighPriority_Open_Monthly_DOB,...,CommissionerOrder_Open_Monthly_DOB,CommissionerOrder_Closed_Monthly_HPD,CommissionerOrder_New_Monthly_HPD,CommissionerOrder_Open_Monthly_HPD,CommissionerOrder_Closed_Monthly_FDNY,CommissionerOrder_New_Monthly_FDNY,CommissionerOrder_Open_Monthly_FDNY,CommissionerOrder_Closed_Monthly_DOHMH,CommissionerOrder_New_Monthly_DOHMH,CommissionerOrder_Open_Monthly_DOHMH
0,06/01/2018 12:00:00 AM,1664.0,"Bld ID: 1664 -- Women's Shelter, Help U.S.A, A...",NYC Owned (DHS),Bronx,Adult Shelter,200.0,1.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
1,12/01/2018 12:00:00 AM,1654.0,"Bld ID: 1654 -- Veterans Shelter, Institute Fo...",NYC Owned (DHS),Queens,Veterans Short Term Housing,254.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,11/01/2016 12:00:00 AM,1345.0,"Bld ID: 1345 -- EAST RIVER - WIN, WOMEN IN NEE...",XTH STREET REALTY LLC,MANHATTAN,Family Tier 2,146.0,0.0,1.0,3.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,11/01/2019 12:00:00 AM,18804.0,"Bld ID: 18804 -- HELP SEC, Help U.S.A, Adult S...",NYC Owned (Parks),Manhattan,Adult Shelter,24.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,09/01/2018 12:00:00 AM,111633.0,"Bld ID: 111633 -- URI HARLEM FAMILY RESIDENCE,...",WEST 133 REALTY LLC,Manhattan,Family Tier 2,14.0,0.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
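
The `Borough` column mixes cases (`MANHATTAN` and `Manhattan`), which would split one borough into two groups. A small sketch of normalizing the labels and parsing `Month`, using values copied from the output above:

```python
import pandas as pd

sample = pd.DataFrame({
    "Month": [
        "06/01/2018 12:00:00 AM", "12/01/2018 12:00:00 AM",
        "11/01/2016 12:00:00 AM", "11/01/2019 12:00:00 AM",
        "09/01/2018 12:00:00 AM",
    ],
    "Borough": ["Bronx", "Queens", "MANHATTAN", "Manhattan", "Manhattan"],
})

# Normalize borough labels so "MANHATTAN" and "Manhattan" match.
sample["Borough"] = sample["Borough"].str.strip().str.title()

# Parse the month timestamp strings.
sample["Month"] = pd.to_datetime(sample["Month"], format="%m/%d/%Y %I:%M:%S %p")

rows_per_borough = sample["Borough"].value_counts()
print(rows_per_borough)
```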


### Evictions in NYC

In [24]:
#Evictions (named df_8 so it does not overwrite the Shelter Repair Scorecard in df_6)
df_8 = pd.read_csv('./data/Evictions.csv')
df_8.head()

Unnamed: 0,COURT_INDEX_NUMBER,DOCKET_NUMBER,EVICTION_ADDRESS,EVICTION_APT_NUM,EXECUTED_DATE,MARSHAL_FIRST_NAME,MARSHAL_LAST_NAME,RESIDENTIAL_COMMERCIAL_IND,BOROUGH,EVICTION_ZIP
0,60175/19A,97706,126 WEST 112TH ST,1B,12/10/2019,Justin,Grossman,Residential,MANHATTAN,10026
1,75023/16,462105,82 RUTGERS SLIP,6G,01/26/2017,Danny,Weinheim,Residential,MANHATTAN,10002
2,82479/17,80161,301-03 WEST 152ND ST THE ENTIRE STORE# 2,,03/20/2018,Henry,Daley,Commercial,MANHATTAN,10039
3,K87249/16,75872,368 EAST 26TH STREET,BASEMENT,02/21/2017,Ileana,Rivera,Residential,BROOKLYN,11226
4,79217/18,351976,564 WEST 160TH STREET,06,05/29/2019,Thomas,Bia,Commercial,MANHATTAN,10032
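
The eviction records include commercial evictions, which are probably not relevant to homelessness. A sketch of keeping only residential evictions and counting them per year is below. The yearly grouping is an assumption about how this dataset will line up with the others, and the rows are copied from the output above.

```python
import pandas as pd

sample = pd.DataFrame({
    "EXECUTED_DATE": ["12/10/2019", "01/26/2017", "03/20/2018",
                      "02/21/2017", "05/29/2019"],
    "RESIDENTIAL_COMMERCIAL_IND": ["Residential", "Residential", "Commercial",
                                   "Residential", "Commercial"],
    "BOROUGH": ["MANHATTAN", "MANHATTAN", "MANHATTAN", "BROOKLYN", "MANHATTAN"],
})

# Keep residential evictions only.
residential = sample[sample["RESIDENTIAL_COMMERCIAL_IND"] == "Residential"].copy()

# Parse the execution date and count evictions per year.
residential["EXECUTED_DATE"] = pd.to_datetime(
    residential["EXECUTED_DATE"], format="%m/%d/%Y"
)
per_year = residential.groupby(residential["EXECUTED_DATE"].dt.year).size()
print(per_year)
```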


### Using DHS Services in NYC

In [26]:
df_7 = pd.read_csv('./data/Local_Law_37_-_DHS_Report.csv')
df_7.head()

Unnamed: 0,Category,SINGLE_MEN,SINGLE_WOMEN,TOTAL_SINGLE_ADULTS,FAMILIES_WITH_CHILDREN,ADULT_FAMILIES,TOTAL_FAMILIES,TOTAL_ADULTS_IN_FAMILIES,TOTAL_CHILDREN,DATA_PERIOD
0,DHS-administered facilities,72.0,40.0,112.0,160.0,23.0,183.0,,,201903.0
1,Average daily overnight census DHS drop-in cen...,166.0,73.0,239.0,,,,,,201903.0
2,Average daily overnight census DHS faith-based...,136.0,32.0,168.0,,,,,,201903.0
3,Average daily census: DHS-administered facilities,12041.0,4438.0,16479.0,12458.0,2521.0,14979.0,21983.0,21951.0,201903.0
4,Census: Safe Havens,865.0,135.0,1000.0,,,,,,201903.0
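
`DATA_PERIOD` is read as a float such as `201903.0`. Assuming it encodes year and month as `YYYYMM`, a minimal sketch of converting it to a proper date is below. Rows without a period are dropped first so the integer cast does not fail.

```python
import pandas as pd

# Two rows copied from the df_7.head() output above, plus one hypothetical
# row with a missing period to show why dropna is needed.
sample = pd.DataFrame({
    "Category": ["DHS-administered facilities", "Census: Safe Havens", "(hypothetical)"],
    "TOTAL_SINGLE_ADULTS": [112.0, 1000.0, None],
    "DATA_PERIOD": [201903.0, 201903.0, None],
})

# Drop rows with no period, then convert 201903.0 -> "201903" -> 2019-03-01.
sample = sample.dropna(subset=["DATA_PERIOD"]).copy()
sample["DATA_PERIOD"] = pd.to_datetime(
    sample["DATA_PERIOD"].astype(int).astype(str), format="%Y%m"
)
print(sample)
```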


### Homelessness Prediction Model by State

In [20]:
#Homelessness prediction model
df_10 = pd.read_csv('./data/05b_analysis_file_update.csv')
df_10.head()

Unnamed: 0,year,cocnumber,pit_tot_shelt_pit_hud,pit_tot_unshelt_pit_hud,pit_tot_hless_pit_hud,pit_ind_shelt_pit_hud,pit_ind_unshelt_pit_hud,pit_ind_hless_pit_hud,pit_perfam_shelt_pit_hud,pit_perfam_unshelt_pit_hud,...,sub_high_cost_rent75,sub_high_cost_homeval75,sub_high_rent_share75,tight_high_cost_rental_mkt,sub_tight_high_cost_rent,sub_west_coast_all_urb,sub_west_census,major_city,suburban,rural
0,2010,AK-500,1113.0,118.0,1231.0,633.0,107.0,740.0,480.0,11.0,...,1,1,1,3,1,1,1,1,0,0
1,2011,AK-500,1082.0,141.0,1223.0,677.0,117.0,794.0,405.0,24.0,...,1,1,0,3,1,1,1,1,0,0
2,2012,AK-500,1097.0,50.0,1147.0,756.0,35.0,791.0,341.0,15.0,...,1,1,1,3,1,1,1,1,0,0
3,2013,AK-500,1070.0,52.0,1122.0,792.0,52.0,844.0,278.0,0.0,...,1,1,0,3,1,1,1,1,0,0
4,2014,AK-500,970.0,53.0,1023.0,688.0,48.0,736.0,282.0,5.0,...,1,1,1,3,1,1,1,1,0,0


In [21]:
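#Note: the Homelessness Count data above lists New York City CoC as NY-600, so NY-500 is a different CoC
df_10[df_10["cocnumber"] == "NY-500"]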

Unnamed: 0,year,cocnumber,pit_tot_shelt_pit_hud,pit_tot_unshelt_pit_hud,pit_tot_hless_pit_hud,pit_ind_shelt_pit_hud,pit_ind_unshelt_pit_hud,pit_ind_hless_pit_hud,pit_perfam_shelt_pit_hud,pit_perfam_unshelt_pit_hud,...,sub_high_cost_rent75,sub_high_cost_homeval75,sub_high_rent_share75,tight_high_cost_rental_mkt,sub_tight_high_cost_rent,sub_west_coast_all_urb,sub_west_census,major_city,suburban,rural
1968,2010,NY-500,705.0,4.0,709.0,371.0,4.0,375.0,334.0,0.0,...,1,1,1,3,1,0,0,0,1,0
1969,2011,NY-500,694.0,0.0,694.0,360.0,0.0,360.0,334.0,0.0,...,0,0,0,0,0,0,0,0,1,0
1970,2012,NY-500,699.0,0.0,699.0,377.0,0.0,377.0,322.0,0.0,...,1,1,1,3,1,0,0,0,1,0
1971,2013,NY-500,883.0,124.0,1007.0,375.0,66.0,441.0,508.0,58.0,...,0,0,0,0,0,0,0,0,1,0
1972,2014,NY-500,787.0,51.0,838.0,448.0,46.0,494.0,339.0,5.0,...,1,1,1,3,1,0,0,0,1,0
1973,2015,NY-500,727.0,35.0,762.0,371.0,35.0,406.0,356.0,0.0,...,0,0,0,0,0,0,0,0,1,0
1974,2016,NY-500,791.0,69.0,860.0,477.0,64.0,541.0,314.0,5.0,...,1,1,1,3,1,0,0,0,1,0
1975,2017,NY-500,752.0,65.0,817.0,445.0,65.0,510.0,307.0,0.0,...,0,0,0,0,0,0,0,0,1,0
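
The prediction dataset only has CoC codes, not names, so it is easy to filter on the wrong one. A sketch of attaching names from the Homelessness Count data to check which CoC a code refers to is below. The two small frames are built from rows shown in the outputs above, and the variable names are hypothetical.

```python
import pandas as pd

# Code-to-name pairs taken from the Homelessness Count outputs above.
counts = pd.DataFrame({
    "CoC Number": ["AK-500", "AK-500", "NY-600"],
    "CoC Name": ["Anchorage CoC", "Anchorage CoC", "New York City CoC"],
})

# Two rows taken from the prediction dataset outputs above.
model = pd.DataFrame({
    "year": [2010, 2010],
    "cocnumber": ["AK-500", "NY-500"],
    "pit_tot_hless_pit_hud": [1231.0, 709.0],
})

# One row per code, then a left join so unmatched codes stay visible.
names = counts.drop_duplicates()
labeled = model.merge(names, how="left", left_on="cocnumber", right_on="CoC Number")
print(labeled[["year", "cocnumber", "CoC Name", "pit_tot_hless_pit_hud"]])
```

In this sample, `NY-500` gets no name because the lookup only contains `AK-500` and `NY-600`. With the full lookup table, the same join would show which CoC each code belongs to.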


## Questions
- Is my problem statement a data science problem?
- Which model should I use?
- Does my data answer my problem statement?

## Sources

- [DHS Daily Count](https://data.cityofnewyork.us/Social-Services/DHS-Daily-Report/k46n-sa2m)

- [State of Homelessness](https://endhomelessness.org/homelessness-in-america/homelessness-statistics/state-of-homelessness-report/)

- [Homelessness Count - USA](https://catalog.data.gov/dataset/homelessness-count-usa)

- [Buildings by Borough and Community District](https://data.cityofnewyork.us/Social-Services/Buildings-by-Borough-and-Community-District/3qem-6v3v)

- [The State of Homelessness in NYC](https://council.nyc.gov/data/homeless/)

- [FY 2019 Fair Market Rent Documentation System](https://www.huduser.gov/portal/datasets/fmr/fmrs/FY2019_code/2019summary.odn)

- [HOMELESSNESS PREDICTION MODEL](https://www.huduser.gov/portal/datasets/hpmd.html)

- [NYC Homeless Outreach Population Estimate (HOPE)](https://www1.nyc.gov/assets/dhs/downloads/pdf/hope-2019-results.pdf)

- [Three-Quarter Housing Report - Placements](https://data.cityofnewyork.us/City-Government/Three-Quarter-Housing-Report-Placements/vntq-qu86)

- [Basic Facts About NYC homelessness](https://www.coalitionforthehomeless.org/basic-facts-about-homelessness-new-york-city/)

- [DOHMH Community Mental Health Survey](https://data.cityofnewyork.us/Health/DOHMH-Community-Mental-Health-Survey/wi3r-8uzb)

- [NYT article about homelessness](https://www.nytimes.com/2019/05/30/nyregion/homeless-nyc.html)

- [Data on Student Homelessness in NYS](https://nysteachs.org/topic-resource/data-on-student-homelessness-nys/)

- [Understanding homelessness](http://www.understandhomelessness.com/)

- [HUD 2019 Continuum of Care Homeless Assistance Programs Homeless Populations and Subpopulations](https://files.hudexchange.info/reports/published/CoC_PopSub_State_NY_2019.pdf)

- [NYC Open Data (Evictions)](https://data.cityofnewyork.us/City-Government/Evictions/6z8x-wfk4/data)

- [SAIPE Model Input Data (census)](https://www.census.gov/data/datasets/time-series/demo/saipe/model-tables.html#)

- [Poverty is Down — But Concerns About Homelessness Remain Up(NETEH)](https://endhomelessness.org/poverty-is-down-but-concerns-about-homelessness-remain-up/)

- [2019 PIT Estimate of Veteran Homelessness in the U.S.](https://www.hudexchange.info/resource/5877/2019-pit-estimate-of-veteran-homelessness-in-the-us/)

- [311 Service Requests from 2010 to Present](https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9/data)

- [Shelter Repair Scorecard](https://data.cityofnewyork.us/Social-Services/Shelter-Repair-Scorecard/dvaj-b7yx/data)

- [Capital Improvement Evictions](https://www.antievictionmap.com/capital-improvement-evictions-san-francisco/)

- [Directory Of Unsheltered Street Homeless To General Population Ratio 2012](https://data.cityofnewyork.us/Social-Services/Directory-Of-Unsheltered-Street-Homeless-To-Genera/483x-fy9e)

- [Stats & Reports - DHS](https://www1.nyc.gov/site/dhs/about/stats-and-reports.page)

- [Review of models of homelessness](https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/788838/Homelessness_Models.pdf)

- [Homelessness Service Provision: A Data Science Perspective](https://www.cse.wustl.edu/~sanmay/papers/hsp-dsp.pdf)

- [Panel Paper: Predicting Family Homelessness Using Machine Learning](https://appam.confex.com/appam/2016/webprogram/Paper19172.html)

- [The Methodology of Counting the Homeless](https://www.ncbi.nlm.nih.gov/books/NBK218229/)

- [Identifying and Supporting Angelenos At Risk of Experiencing Homelessness](https://urbanlabs.uchicago.edu/projects/using-predictive-analytics-to-prevent-homelessness-in-los-angeles)

- [Predicting Homeless Shelter Entry](https://www1.nyc.gov/site/cidi/projects/predicting-homeless-shelter-entry.page)

- [Homelessness Prevention](https://www1.nyc.gov/site/hra/help/homelessness-prevention.page) 

- [Housing New York Units by Building](https://data.cityofnewyork.us/Housing-Development/Housing-New-York-Units-by-Building/hg8x-zxpr/data) 

- [Local Law 37 - DHS Report](https://data.cityofnewyork.us/Social-Services/Local-Law-37-DHS-Report/2mqz-v5im)