# NYC OpenData - Department of Homeless Services Analysis

#### Sources

| priority | dataset | description | link |
|--------------|--------------|-------------|---------------|
| 1 | DHS Daily Report | This dataset includes the daily number of families and individuals residing in the Department of Homeless Services (DHS) shelter system and the daily number of families applying to the DHS shelter system. | https://dev.socrata.com/foundry/data.cityofnewyork.us/k46n-sa2m |
| 2 | Evictions | This dataset lists pending, scheduled and executed evictions within the five boroughs from 2017 to the present. <br> Eviction data is compiled from the majority of New York City Marshals. | https://dev.socrata.com/foundry/data.cityofnewyork.us/6z8x-wfk4 |
| 3 | DYCD Demographics by Zip Code | This dataset provides a demographic breakdown of only DYCD-funded participants within a zip code of NYC. | https://dev.socrata.com/foundry/data.cityofnewyork.us/hebw-6hze |



#### References

| source  | link |
|--------------|---------------|
| Time Series | https://dev.socrata.com/blog/2019/10/07/time-series-analysis-with-jupyter-notebooks-and-socrata.html |


## Imports


In [540]:
import dash
from dash import Dash, dcc, html, Input, Output, dash_table, callback
import dash_mantine_components as dmc
import dash_bootstrap_components as dbc
import pandas as pd
from sodapy import Socrata
import plotly.express as px
import nbformat
import matplotlib.pyplot as plt
from pandas.plotting import register_matplotlib_converters
from datetime import *
import statsmodels.api as sm
import seaborn as sns
import plotly.io as pio
import numpy as np
from sklearn.linear_model import LinearRegression
import plotly.graph_objects as go
import plotly.subplots as sp

## Read data from API to Dataframes

**Department of Homeless Services (DHS) Daily Report**

In [541]:
# https://dev.socrata.com/foundry/data.cityofnewyork.us/k46n-sa2m
client = Socrata("data.cityofnewyork.us", None)
results = client.get("k46n-sa2m",limit=80000)

# Convert to pandas DataFrame
dhs_daily_df = pd.DataFrame.from_records(results)



**Evictions**

In [542]:
# https://dev.socrata.com/foundry/data.cityofnewyork.us/6z8x-wfk4
client = Socrata("data.cityofnewyork.us", None)
results = client.get("6z8x-wfk4",limit=80000)

# Convert to pandas DataFrame
evictions_df = pd.DataFrame.from_records(results)



**DYCD (Department of Youth & Community Development)** - Demographics by Zip Code

In [543]:
# https://dev.socrata.com/foundry/data.cityofnewyork.us/hebw-6hze

client = Socrata("data.cityofnewyork.us", None)
results = client.get("hebw-6hze",limit=80000)


# Convert to pandas DataFrame
demos_by_zip_df = pd.DataFrame.from_records(results)
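
The three cells above repeat the same two steps: call `client.get(...)`, then build a DataFrame from the records. A minimal sketch of how that could be wrapped in one helper follows. The names `fetch_dataset` and `FakeClient` are hypothetical and not part of the notebook or of sodapy; the fake client only stands in for `Socrata` so the sketch runs without network access.

```python
import pandas as pd


def fetch_dataset(client, dataset_id, limit=80000):
    """Fetch one dataset and return it as a DataFrame.

    `client` is any object with a sodapy-style `get(dataset_id, limit=...)`
    method that returns a list of dict records.
    """
    records = client.get(dataset_id, limit=limit)
    return pd.DataFrame.from_records(records)


class FakeClient:
    """Hypothetical stand-in for sodapy.Socrata (no network calls)."""

    def get(self, dataset_id, limit=1000):
        rows = [
            {"date_of_census": "2023-07-12", "total_individuals_in_shelter": "81603"},
            {"date_of_census": "2023-07-11", "total_individuals_in_shelter": "81494"},
        ]
        return rows[:limit]


sample_df = fetch_dataset(FakeClient(), "k46n-sa2m", limit=2)
print(sample_df)  # values arrive as strings, as they do from the API
```

With a real client, the call would presumably look like `fetch_dataset(Socrata("data.cityofnewyork.us", None), "k46n-sa2m")`.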



## Exploratory Data Analysis

### Schema Info + Data Cleaning

#### dhs_report

In [544]:
# dhs_daily_df: the API returns every column as object (string) dtype, which breaks
# plotting downstream, so convert every column except the date to numeric.
cols = dhs_daily_df.columns.drop('date_of_census')
dhs_daily_df[cols] = dhs_daily_df[cols].apply(pd.to_numeric)
dhs_daily_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3550 entries, 0 to 3549
Data columns (total 13 columns):
 #   Column                                                   Non-Null Count  Dtype 
---  ------                                                   --------------  ----- 
 0   date_of_census                                           3550 non-null   object
 1   total_adults_in_shelter                                  3550 non-null   int64 
 2   total_children_in_shelter                                3550 non-null   int64 
 3   total_individuals_in_shelter                             3550 non-null   int64 
 4   single_adult_men_in_shelter                              3550 non-null   int64 
 5   single_adult_women_in_shelter                            3550 non-null   int64 
 6   total_single_adults_in_shelter                           3550 non-null   int64 
 7   families_with_children_in_shelter                        3550 non-null   int64 
 8   adults_in_families_with_children_in_sh
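
`pd.to_numeric` as used above raises an error on the first value it cannot parse. If the feed ever contained a non-numeric entry, one option is `errors="coerce"`, which turns such values into `NaN` so they can be counted. The sketch below uses a toy frame; the `"n/a"` value is made up for illustration and was not observed in the dataset.

```python
import pandas as pd

# Toy frame mimicking the API response: every value is a string.
raw = pd.DataFrame({
    "date_of_census": ["2023-07-12", "2023-07-11"],
    "total_adults_in_shelter": ["53507", "n/a"],  # "n/a" is a made-up bad value
})

numeric_cols = raw.columns.drop("date_of_census")

# errors="coerce" turns unparseable strings into NaN instead of raising.
converted = raw[numeric_cols].apply(pd.to_numeric, errors="coerce")

# Count how many values per column failed to convert.
bad_counts = converted.isna().sum()
print(bad_counts)
```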

In [545]:
dhs_daily_df['date_of_census'] = pd.to_datetime(dhs_daily_df['date_of_census'])
dhs_daily_df

Unnamed: 0,date_of_census,total_adults_in_shelter,total_children_in_shelter,total_individuals_in_shelter,single_adult_men_in_shelter,single_adult_women_in_shelter,total_single_adults_in_shelter,families_with_children_in_shelter,adults_in_families_with_children_in_shelter,children_in_families_with_children_in_shelter,total_individuals_in_families_with_children_in_shelter_,adult_families_in_shelter,individuals_in_adult_families_in_shelter
0,2023-07-12,53507,28096,81603,16781,4742,21523,16445,25850,28096,53946,2887,6134
1,2023-07-11,53469,28025,81494,16819,4732,21551,16411,25773,28025,53798,2893,6145
2,2023-07-09,53486,27895,81381,16973,4710,21683,16337,25683,27895,53578,2881,6120
3,2023-07-08,53471,27865,81336,16943,4729,21672,16329,25665,27865,53530,2887,6134
4,2023-07-07,53545,27859,81404,16968,4719,21687,16308,25654,27859,53513,2922,6204
...,...,...,...,...,...,...,...,...,...,...,...,...,...
3545,2013-08-25,28458,21400,49858,7230,2704,9934,10324,14694,21400,36094,1804,3830
3546,2013-08-24,28274,21343,49617,7110,2690,9800,10291,14650,21343,35993,1801,3824
3547,2013-08-23,28257,21291,49548,7149,2671,9820,10266,14611,21291,35902,1802,3826
3548,2013-08-22,28366,21324,49690,7201,2716,9917,10274,14622,21324,35946,1803,3827


#### evictions

In [546]:
#evictions
#evictions_df.info()
evictions_df['executed_date'] = pd.to_datetime(evictions_df['executed_date'])
evictions_df.sort_values(by='executed_date',ascending=False)

Unnamed: 0,court_index_number,docket_number,eviction_address,eviction_apt_num,executed_date,marshal_first_name,marshal_last_name,residential_commercial_ind,borough,eviction_zip,ejectment,eviction_possession,latitude,longitude,community_board,council_district,census_tract,bin,bbl,nta
71981,313742/22,29428,37-05 79TH STREET,4Q,2023-07-13,Edward,Guida,Residential,QUEENS,11372,Not an Ejectment,Possession,40.749136,-73.887031,3,25,285,4029968,4012900001,Jackson Heights
6024,58175/19,26428,32-30 93 STREET,,2023-07-13,Edward,Guida,Residential,QUEENS,11369,Not an Ejectment,Possession,40.757934,-73.875386,3,21,351,4034859,4014220019,Jackson Heights
46839,319709/22,30167,204-09 HOLLIS AVENUE,,2023-07-13,Edward,Guida,Commercial,QUEENS,11412,Not an Ejectment,Possession,40.706806,-73.753593,12,27,508,4448896,4109070129,Hollis
59639,323873/22,364183,2776 WEBSTER AVENUE,,2023-07-13,Thomas,Bia,Commercial,BRONX,10458,Not an Ejectment,Possession,40.866087,-73.885639,7,15,397,2016233,2032730100,Belmont
34715,47371/19,358331,315 EAST 162ND STREET,3RD FLOOR,2023-07-13,Thomas,Bia,Residential,BRONX,10451,Not an Ejectment,Possession,40.826024,-73.916371,4,17,173,2001971,2024220041,East Concourse-Concourse Village
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
53218,48132/16,318190,2108 RYER AVENUE,B5,2017-01-03,John,Villanueva,Residential,BRONX,10457,Not an Ejectment,Possession,40.853434,-73.900648,5,15,381,2013534,2031490084,Mount Hope
3659,81583/16,329339,220 SOUTH STREET,STORAGE,2017-01-03,Thomas,Bia,Commercial,MANHATTAN,10002,Not an Ejectment,Possession,40.709452,-73.994330,3,1,8,1003205,1002500001,Chinatown
184,75869/15,6297,148-48 88TH AVENUE,5N,2017-01-03,Edward,Guida,Residential,QUEENS,11435,Not an Ejectment,Possession,40.705260,-73.807710,12,24,238,4207097,4096930029,Jamaica
13816,77609/16,058316,310 EAST 44TH STREET,723,2017-01-03,Justin,Grossman,Residential,MANHATTAN,10017,Not an Ejectment,Possession,40.750894,-73.970758,6,4,88,1038671,1013360040,Turtle Bay-East Midtown


#### demos by zip

In [547]:
#demos_by_zip_df.info()
demos_by_zip_df['data_os_of_date'] = pd.to_datetime(demos_by_zip_df['data_os_of_date'])
demos_by_zip_df

Unnamed: 0,data_os_of_date,zip_code,program_type,female_count,female_percentage,male_count,male_percentage,gender_nonconforming_count,gender_nonconforming,american_indian_or_alaskan,...,non_binary_not_female_or_male_count,non_binary_not_female_or_male_percentage,not_sure_gender_identity_count,not_sure_gender_identity_percentage,two_spirit_native_american_first_nations_count,two_spirit_native_american_first_nations_percentage,middle_eastern_and_north_african_count,middle_eastern_and_north_african_percentage,other_count,other_percentage
0,2020-07-16,0,Beacon,3,0.1,4,0.13,0,0,0,...,0,0,0,0,0,0,0,0,5,0.16
1,2020-07-16,1017,Cornerstone,1,0.03,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0.03
2,2020-07-16,2121,Beacon,0,0,1,0.03,0,0,0,...,0,0,0,0,0,0,0,0,1,0.03
3,2020-07-16,2420,Beacon,1,0.03,0,0,0,0,0,...,0,0,0,0,0,0,1,0.03,0,0
4,2020-07-16,2453,Beacon,0,0,1,0.03,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4908,2019-06-01,91103,Beacon,1,0.03,0,0,0,0,0,...,,,,,,,,,,
4909,2019-06-01,92620,Beacon,2,0.06,0,0,0,0,0,...,,,,,,,,,,
4910,2019-06-01,94539,Cornerstone,0,0,5,0.15,0,0,0,...,,,,,,,,,,
4911,2019-06-01,95351,Beacon,0,0,1,0.03,0,0,0,...,,,,,,,,,,


### Handling Nulls

In [548]:
# check for nulls 
evictions_df.isna().any()

court_index_number            False
docket_number                 False
eviction_address              False
eviction_apt_num               True
executed_date                 False
marshal_first_name            False
marshal_last_name             False
residential_commercial_ind    False
borough                       False
eviction_zip                  False
ejectment                     False
eviction_possession           False
latitude                       True
longitude                      True
community_board                True
council_district               True
census_tract                   True
bin                            True
bbl                            True
nta                            True
dtype: bool

In [549]:
# check for nulls 
dhs_daily_df.isna().any()

date_of_census                                             False
total_adults_in_shelter                                    False
total_children_in_shelter                                  False
total_individuals_in_shelter                               False
single_adult_men_in_shelter                                False
single_adult_women_in_shelter                              False
total_single_adults_in_shelter                             False
families_with_children_in_shelter                          False
adults_in_families_with_children_in_shelter                False
children_in_families_with_children_in_shelter              False
total_individuals_in_families_with_children_in_shelter_    False
adult_families_in_shelter                                  False
individuals_in_adult_families_in_shelter                   False
dtype: bool

In [550]:
demos_by_zip_df.isna().any()

data_os_of_date                                        False
zip_code                                               False
program_type                                           False
female_count                                           False
female_percentage                                      False
male_count                                             False
male_percentage                                        False
gender_nonconforming_count                             False
gender_nonconforming                                   False
american_indian_or_alaskan                             False
american_indian_or_alaskan_1                           False
asian_count                                            False
asian_percentage                                       False
black_or_african_american                              False
black_or_african_american_1                            False
multi_race_count                                       False
multi_race_percentage   
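
`isna().any()` only reports whether a column has at least one null. To judge how much data is actually missing, the share of nulls per column can be more useful. A small sketch on a toy frame follows; the frame is a stand-in for `evictions_df`, and the pattern of missing values in it is illustrative only.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for evictions_df; the null pattern is illustrative.
toy = pd.DataFrame({
    "docket_number": ["29428", "26428", "30167", "364183"],
    "eviction_apt_num": ["4Q", None, None, None],
    "latitude": [40.749136, 40.757934, np.nan, 40.866087],
})

# The mean of a boolean mask is the fraction of missing values per column.
null_share = toy.isna().mean().sort_values(ascending=False)
print(null_share)
```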

### Data Pivots

#### DHS Time Series Analysis

Questions to Answer <br>
- How has the homeless population in NYC evolved over time?

In [551]:
dhs_daily_df

Unnamed: 0,date_of_census,total_adults_in_shelter,total_children_in_shelter,total_individuals_in_shelter,single_adult_men_in_shelter,single_adult_women_in_shelter,total_single_adults_in_shelter,families_with_children_in_shelter,adults_in_families_with_children_in_shelter,children_in_families_with_children_in_shelter,total_individuals_in_families_with_children_in_shelter_,adult_families_in_shelter,individuals_in_adult_families_in_shelter
0,2023-07-12,53507,28096,81603,16781,4742,21523,16445,25850,28096,53946,2887,6134
1,2023-07-11,53469,28025,81494,16819,4732,21551,16411,25773,28025,53798,2893,6145
2,2023-07-09,53486,27895,81381,16973,4710,21683,16337,25683,27895,53578,2881,6120
3,2023-07-08,53471,27865,81336,16943,4729,21672,16329,25665,27865,53530,2887,6134
4,2023-07-07,53545,27859,81404,16968,4719,21687,16308,25654,27859,53513,2922,6204
...,...,...,...,...,...,...,...,...,...,...,...,...,...
3545,2013-08-25,28458,21400,49858,7230,2704,9934,10324,14694,21400,36094,1804,3830
3546,2013-08-24,28274,21343,49617,7110,2690,9800,10291,14650,21343,35993,1801,3824
3547,2013-08-23,28257,21291,49548,7149,2671,9820,10266,14611,21291,35902,1802,3826
3548,2013-08-22,28366,21324,49690,7201,2716,9917,10274,14622,21324,35946,1803,3827


In [552]:
# get percentage of total per subgroup
# denominator is always total_individuals_in_shelter
# create a for loop for every column except total_individuals_in_shelter, divide & then append result as column name + concat perc_ as prefix

# Create a list to store the new column names
new_columns = []

# Iterate over each column
for column in dhs_daily_df.columns:
    if column != 'total_individuals_in_shelter' and column != 'date_of_census':
        # Generate the new column name with the prefix "perc_" followed by the original column name
        new_column = 'perc_' + column
        
        # Divide the values in the current column by the values in the 'total_individuals_in_shelter' column
        new_values = ((dhs_daily_df[column].astype(float) / dhs_daily_df['total_individuals_in_shelter'].astype(float))*100).round(2)
        
        # Append the new column to the DataFrame
        dhs_daily_df[new_column] = new_values
        
        # Append the new column name to the list
        new_columns.append(new_column)
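
The loop above can also be written without iterating over columns. The following is a sketch of a vectorized equivalent using `DataFrame.div` with `axis=0`; it uses two rows taken from the table above and is an alternative formulation, not the notebook's own code.

```python
import pandas as pd

toy = pd.DataFrame({
    "total_individuals_in_shelter": [81603, 81494],
    "total_adults_in_shelter": [53507, 53469],
    "total_children_in_shelter": [28096, 28025],
})

denominator = toy["total_individuals_in_shelter"]
subgroups = toy.drop(columns="total_individuals_in_shelter")

# Divide every subgroup column by the total, row by row, then prefix the names.
perc = subgroups.div(denominator, axis=0).mul(100).round(2).add_prefix("perc_")
result = toy.join(perc)
print(result.filter(regex="^perc_"))
```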

In [553]:
dhs_daily_df.columns

Index(['date_of_census', 'total_adults_in_shelter',
       'total_children_in_shelter', 'total_individuals_in_shelter',
       'single_adult_men_in_shelter', 'single_adult_women_in_shelter',
       'total_single_adults_in_shelter', 'families_with_children_in_shelter',
       'adults_in_families_with_children_in_shelter',
       'children_in_families_with_children_in_shelter',
       'total_individuals_in_families_with_children_in_shelter_',
       'adult_families_in_shelter', 'individuals_in_adult_families_in_shelter',
       'perc_total_adults_in_shelter', 'perc_total_children_in_shelter',
       'perc_single_adult_men_in_shelter',
       'perc_single_adult_women_in_shelter',
       'perc_total_single_adults_in_shelter',
       'perc_families_with_children_in_shelter',
       'perc_adults_in_families_with_children_in_shelter',
       'perc_children_in_families_with_children_in_shelter',
       'perc_total_individuals_in_families_with_children_in_shelter_',
       'perc_adult_families_

In [554]:
'''filtered_columns = []
for column in dhs_daily_df.columns:
    if column.startswith('perc_'):
        filtered_columns.append(column)  
dhs_daily_df[filtered_columns].info()'''

In [555]:
# Aggregate the daily data to monthly, quarterly, and yearly averages
# Set 'date_of_census' as the index (resample needs a DatetimeIndex)
dhs_daily_df.set_index('date_of_census', inplace=True)

# Resample to monthly average data
monthly_avg_df = dhs_daily_df.resample('M').mean()

# Resample to quarterly average data
quarterly_avg_df = dhs_daily_df.resample('Q').mean()

# Resample to yearly average data
yearly_avg_df = dhs_daily_df.resample('Y').mean()
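
One caveat: pandas 2.2 deprecated the `'M'`, `'Q'`, and `'Y'` resample aliases in favor of `'ME'`, `'QE'`, and `'YE'`, so the cell above may emit warnings on newer versions. A sketch of an alternative that does not depend on those aliases is to group by calendar period instead. The toy values below are made up to keep the arithmetic easy to check.

```python
import pandas as pd

dates = pd.to_datetime(["2023-06-29", "2023-06-30", "2023-07-01", "2023-07-02"])
toy = pd.DataFrame({"total_individuals_in_shelter": [100, 200, 300, 500]}, index=dates)

# Group daily rows by the calendar month they fall in and average them.
monthly_avg = toy.groupby(toy.index.to_period("M")).mean()
print(monthly_avg)
```

The result is indexed by month periods rather than month-end timestamps, which may or may not suit the plotting code further down.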

In [556]:
#dhs_daily_df
#monthly_avg_df
#quarterly_avg_df
#yearly_avg_df

dhs_daily_df


date_of_census,total_adults_in_shelter,total_children_in_shelter,total_individuals_in_shelter,single_adult_men_in_shelter,single_adult_women_in_shelter,total_single_adults_in_shelter,families_with_children_in_shelter,adults_in_families_with_children_in_shelter,children_in_families_with_children_in_shelter,total_individuals_in_families_with_children_in_shelter_,...,perc_total_children_in_shelter,perc_single_adult_men_in_shelter,perc_single_adult_women_in_shelter,perc_total_single_adults_in_shelter,perc_families_with_children_in_shelter,perc_adults_in_families_with_children_in_shelter,perc_children_in_families_with_children_in_shelter,perc_total_individuals_in_families_with_children_in_shelter_,perc_adult_families_in_shelter,perc_individuals_in_adult_families_in_shelter
2023-07-12,53507,28096,81603,16781,4742,21523,16445,25850,28096,53946,...,34.43,20.56,5.81,26.38,20.15,31.68,34.43,66.11,3.54,7.52
2023-07-11,53469,28025,81494,16819,4732,21551,16411,25773,28025,53798,...,34.39,20.64,5.81,26.44,20.14,31.63,34.39,66.01,3.55,7.54
2023-07-09,53486,27895,81381,16973,4710,21683,16337,25683,27895,53578,...,34.28,20.86,5.79,26.64,20.07,31.56,34.28,65.84,3.54,7.52
2023-07-08,53471,27865,81336,16943,4729,21672,16329,25665,27865,53530,...,34.26,20.83,5.81,26.65,20.08,31.55,34.26,65.81,3.55,7.54
2023-07-07,53545,27859,81404,16968,4719,21687,16308,25654,27859,53513,...,34.22,20.84,5.80,26.64,20.03,31.51,34.22,65.74,3.59,7.62
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2013-08-25,28458,21400,49858,7230,2704,9934,10324,14694,21400,36094,...,42.92,14.50,5.42,19.92,20.71,29.47,42.92,72.39,3.62,7.68
2013-08-24,28274,21343,49617,7110,2690,9800,10291,14650,21343,35993,...,43.02,14.33,5.42,19.75,20.74,29.53,43.02,72.54,3.63,7.71
2013-08-23,28257,21291,49548,7149,2671,9820,10266,14611,21291,35902,...,42.97,14.43,5.39,19.82,20.72,29.49,42.97,72.46,3.64,7.72
2013-08-22,28366,21324,49690,7201,2716,9917,10274,14622,21324,35946,...,42.91,14.49,5.47,19.96,20.68,29.43,42.91,72.34,3.63,7.70


In [557]:
monthly_avg_df = monthly_avg_df.filter(regex='^perc_')
#monthly_avg_df

In [558]:
quarterly_avg_df = quarterly_avg_df.filter(regex='^perc_')
#quarterly_avg_df

In [559]:
yearly_avg_df = yearly_avg_df.filter(regex='^perc_')
#yearly_avg_df

#### Evictions

In [560]:
counts_by_boro_piv = pd.pivot_table(evictions_df,values='docket_number', aggfunc='count',index='executed_date',columns='borough').reset_index().sort_values(by='executed_date',ascending=False)
#counts_by_boro_piv = pd.pivot_table(results_df,values='docket_number', aggfunc='count',index='executed_date',columns='borough')
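
In the pivot output, a borough with no evictions on a given date shows up as `NaN`, which also forces the counts to floats. If zeros are preferred, `pivot_table` accepts `fill_value`. A sketch on a toy frame follows; the docket numbers are placeholders, not real records.

```python
import pandas as pd

toy = pd.DataFrame({
    "docket_number": ["a1", "a2", "a3", "a4"],  # placeholder IDs
    "executed_date": pd.to_datetime(
        ["2023-07-12", "2023-07-12", "2023-07-13", "2023-07-13"]
    ),
    "borough": ["BRONX", "QUEENS", "BRONX", "BRONX"],
})

# fill_value=0 replaces the NaN that appears when a borough has no rows on a date.
counts = pd.pivot_table(
    toy,
    values="docket_number",
    index="executed_date",
    columns="borough",
    aggfunc="count",
    fill_value=0,
)
print(counts)
```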

In [561]:
# Convert 'executed_date' column to datetime
counts_by_boro_piv['executed_date'] = pd.to_datetime(counts_by_boro_piv['executed_date'])

In [562]:
counts_by_boro_piv

borough,executed_date,BRONX,BROOKLYN,MANHATTAN,QUEENS,STATEN ISLAND
1268,2023-07-13,3.0,1.0,,3.0,
1267,2023-07-12,4.0,17.0,1.0,18.0,1.0
1266,2023-07-11,34.0,3.0,21.0,8.0,2.0
1265,2023-07-10,7.0,23.0,3.0,15.0,
1264,2023-07-07,21.0,19.0,19.0,4.0,
...,...,...,...,...,...,...
4,2017-01-09,24.0,49.0,18.0,12.0,2.0
3,2017-01-06,50.0,59.0,14.0,26.0,1.0
2,2017-01-05,52.0,18.0,25.0,27.0,5.0
1,2017-01-04,44.0,64.0,11.0,19.0,6.0


In [563]:
evictions_df.columns

Index(['court_index_number', 'docket_number', 'eviction_address',
       'eviction_apt_num', 'executed_date', 'marshal_first_name',
       'marshal_last_name', 'residential_commercial_ind', 'borough',
       'eviction_zip', 'ejectment', 'eviction_possession', 'latitude',
       'longitude', 'community_board', 'council_district', 'census_tract',
       'bin', 'bbl', 'nta'],
      dtype='object')

In [564]:
# create new dataframe with eviction counts
evictions_counts_df = evictions_df.groupby('executed_date')['docket_number'].count().reset_index()

# Rename the count column to 'eviction_count'
evictions_counts_df = evictions_counts_df.rename(columns={'docket_number': 'eviction_count'})

In [565]:
#evictions_counts_df.columns
#dhs_daily_df.columns

#### data merge

This section merges dhs_daily_df with evictions_counts_df (the daily eviction counts derived from evictions_df). <br>
- dhs_daily_df goes back to 2013
- evictions_df goes back to 2017
- merged_df therefore goes back only to 2017
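
Because `pd.merge` defaults to an inner join, dates that appear in only one of the two frames are dropped silently. A quick way to see how many rows fall out is an outer merge with `indicator=True`. The sketch below uses toy frames: the 2017-01-04 and 2017-01-05 rows echo values from this notebook, while the other two rows are made up to show the unmatched cases.

```python
import pandas as pd

shelter = pd.DataFrame({
    "date_of_census": pd.to_datetime(["2016-12-30", "2017-01-04", "2017-01-05"]),
    "total_individuals_in_shelter": [60000, 60041, 60150],  # first row is made up
})
evictions = pd.DataFrame({
    "executed_date": pd.to_datetime(["2017-01-04", "2017-01-05", "2017-01-07"]),
    "eviction_count": [144, 127, 3],  # last row is made up
})

# how="outer" keeps every date; indicator=True adds a "_merge" column
# labelling each row as both / left_only / right_only.
audit = pd.merge(
    shelter,
    evictions,
    left_on="date_of_census",
    right_on="executed_date",
    how="outer",
    indicator=True,
)
match_counts = audit["_merge"].value_counts()
print(match_counts)
```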

In [566]:
# dhs_daily_df still has 'date_of_census' as its index; pd.merge accepts an index level name in left_on.
# Uncomment the next line to turn the index back into a regular column first.
#dhs_daily_df.reset_index(inplace=True)

# Inner-join the two DataFrames on matching dates ('date_of_census' = 'executed_date')
merged_df = pd.merge(dhs_daily_df, evictions_counts_df, left_on='date_of_census', right_on='executed_date')

# Scale 'eviction_count' by an arbitrary factor of 10 for plotting; the values are no longer raw counts
merged_df['eviction_count'] = merged_df['eviction_count'] * 10
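
Multiplying by 10 is a display tweak, not normalization in the statistical sense. If the goal is to put both series on a comparable scale, one common option is min-max scaling. The sketch below is an assumption about what could be done and is not used elsewhere in this notebook; `min_max` is a hypothetical helper, and the four rows use unscaled values from the tables above.

```python
import pandas as pd

toy = pd.DataFrame({
    "total_individuals_in_shelter": [60041, 60150, 81494, 81603],
    "eviction_count": [144, 127, 68, 41],
})


def min_max(series):
    """Rescale a numeric Series linearly onto [0, 1]."""
    return (series - series.min()) / (series.max() - series.min())


# Apply the rescaling column by column.
scaled = toy.apply(min_max)
print(scaled)
```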

In [567]:
# Note: 'date_of_census' was the index of dhs_daily_df, so it is not a column here; 'executed_date' carries the date
merged_df.columns

Index(['total_adults_in_shelter', 'total_children_in_shelter',
       'total_individuals_in_shelter', 'single_adult_men_in_shelter',
       'single_adult_women_in_shelter', 'total_single_adults_in_shelter',
       'families_with_children_in_shelter',
       'adults_in_families_with_children_in_shelter',
       'children_in_families_with_children_in_shelter',
       'total_individuals_in_families_with_children_in_shelter_',
       'adult_families_in_shelter', 'individuals_in_adult_families_in_shelter',
       'perc_total_adults_in_shelter', 'perc_total_children_in_shelter',
       'perc_single_adult_men_in_shelter',
       'perc_single_adult_women_in_shelter',
       'perc_total_single_adults_in_shelter',
       'perc_families_with_children_in_shelter',
       'perc_adults_in_families_with_children_in_shelter',
       'perc_children_in_families_with_children_in_shelter',
       'perc_total_individuals_in_families_with_children_in_shelter_',
       'perc_adult_families_in_shelter',
     

In [568]:
#dhs_daily_df
merged_df

Unnamed: 0,total_adults_in_shelter,total_children_in_shelter,total_individuals_in_shelter,single_adult_men_in_shelter,single_adult_women_in_shelter,total_single_adults_in_shelter,families_with_children_in_shelter,adults_in_families_with_children_in_shelter,children_in_families_with_children_in_shelter,total_individuals_in_families_with_children_in_shelter_,...,perc_single_adult_women_in_shelter,perc_total_single_adults_in_shelter,perc_families_with_children_in_shelter,perc_adults_in_families_with_children_in_shelter,perc_children_in_families_with_children_in_shelter,perc_total_individuals_in_families_with_children_in_shelter_,perc_adult_families_in_shelter,perc_individuals_in_adult_families_in_shelter,executed_date,eviction_count
0,53507,28096,81603,16781,4742,21523,16445,25850,28096,53946,...,5.81,26.38,20.15,31.68,34.43,66.11,3.54,7.52,2023-07-12,410
1,53469,28025,81494,16819,4732,21551,16411,25773,28025,53798,...,5.81,26.44,20.14,31.63,34.39,66.01,3.55,7.54,2023-07-11,680
2,53545,27859,81404,16968,4719,21687,16308,25654,27859,53513,...,5.80,26.64,20.03,31.51,34.22,65.74,3.59,7.62,2023-07-07,630
3,53579,27818,81397,17025,4729,21754,16287,25628,27818,53446,...,5.81,26.73,20.01,31.49,34.18,65.66,3.58,7.61,2023-07-06,640
4,53763,27811,81574,17148,4721,21869,16306,25663,27811,53474,...,5.79,26.81,19.99,31.46,34.09,65.55,3.60,7.64,2023-07-05,570
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1249,36889,23484,60373,9868,3805,13673,13040,17910,23484,41394,...,6.30,22.65,21.60,29.67,38.90,68.56,4.19,8.79,2017-01-09,1050
1250,36705,23386,60091,9807,3767,13574,12990,17837,23386,41223,...,6.27,22.59,21.62,29.68,38.92,68.60,4.20,8.81,2017-01-06,1500
1251,36785,23365,60150,9873,3785,13658,12992,17847,23365,41212,...,6.29,22.71,21.60,29.67,38.84,68.52,4.18,8.78,2017-01-05,1270
1252,36733,23308,60041,9850,3787,13637,12951,17804,23308,41112,...,6.31,22.71,21.57,29.65,38.82,68.47,4.20,8.81,2017-01-04,1440


In [569]:
'''# Set the dark style
plt.style.use('dark_background')

# Multiply 'eviction_count' by arbitrary number of 10 to normalize given the limitations of the dataset
merged_df['eviction_count'] = merged_df['eviction_count'] * 10

# Plot the data
fig, ax1 = plt.subplots()

# Line plot for total_individuals_in_shelter
color = 'orange'
ax1.set_xlabel('Date')
ax1.set_ylabel('Number of Individuals in Shelter', color=color)
ax1.plot(merged_df['executed_date'], merged_df['total_individuals_in_shelter'], color=color)
ax1.tick_params(axis='y', labelcolor=color)

# Create a second y-axis for eviction_count
ax2 = ax1.twinx()

# Bar plot for eviction_count
color = 'white'
ax2.set_ylabel('Eviction Count', color=color)
ax2.bar(merged_df['executed_date'], merged_df['eviction_count'], color=color, alpha=0.8)
ax2.tick_params(axis='y', labelcolor=color)

# Set the x-axis labels to display vertically for better readability
plt.xticks(rotation='vertical')

# Add a title and legend
plt.title('Individuals in Shelter vs. Eviction Counts in NYC\n (2017-Present)')
#plt.legend(['Total Individuals in Shelter', 'Eviction Count'])

# Adjust the layout
plt.tight_layout()

# Display the plot
plt.show()'''

In [570]:
# Set the template to 'plotly_dark'
#pio.templates.default = "plotly_dark"

In [571]:
# 'eviction_count' was already scaled by 10 in the merge cell, so it is not scaled again here
#merged_df['eviction_count'] = merged_df['eviction_count'] * 10

# Create subplots with shared x-axis
fig = sp.make_subplots(specs=[[{"secondary_y": True}]])

# Line plot for total_individuals_in_shelter
fig.add_trace(
    go.Scatter(
        x=merged_df['executed_date'],
        y=merged_df['total_individuals_in_shelter'],
        mode='lines',
        name='Number of Individuals in Shelter',
        line=dict(color='orange'),
    ),
    secondary_y=False,
    
)

# Bar plot for eviction_count
fig.add_trace(
    go.Bar(
        x=merged_df['executed_date'],
        y=merged_df['eviction_count'],
        name='Eviction Count',
        marker_color='white',
        opacity=0.8
    ),
    secondary_y=True,
    
)

# Set the y-axis titles
fig.update_yaxes(title_text='Number of Individuals in Shelter', color='orange', secondary_y=False, tickfont=dict(size=30))
fig.update_yaxes(title_text='Eviction Count', color='white', secondary_y=True, tickfont=dict(size=30))

# Set the x-axis title
fig.update_xaxes(title_text='Date', tickfont=dict(size=30))

# Set the chart title
fig.update_layout(
    title='Individuals in Shelter vs. Eviction Counts in NYC<br>(2017-Present)',
    paper_bgcolor='rgba(0,0,0,0)',
    plot_bgcolor='rgba(0,0,0,0)',
    legend=dict(orientation='h', y=1.1, x=0, font=dict(size=30))
)

# Keep x-axis tick labels horizontal
fig.update_layout(xaxis_tickangle=0)

# Show the figure
fig.show()

### Data Visualizations

#### evictions

In [572]:
# Melt the DataFrame
counts_by_boro_fig = counts_by_boro_piv.melt(id_vars='executed_date', var_name='borough', value_name='count')
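
`melt` turns the wide borough table into one row per date and borough, which is the long format Plotly Express expects for `color='borough'`. The missing counts from the pivot carry over as `NaN`, and dropping them is one option. A sketch using two dates and two boroughs from the pivot output above:

```python
import pandas as pd

wide = pd.DataFrame({
    "executed_date": pd.to_datetime(["2023-07-12", "2023-07-13"]),
    "BRONX": [4.0, 3.0],
    "MANHATTAN": [1.0, None],
})

long = wide.melt(id_vars="executed_date", var_name="borough", value_name="count")

# Dates with no recorded evictions for a borough come through as NaN; drop them.
long = long.dropna(subset=["count"]).reset_index(drop=True)
print(long)
```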

In [573]:
# Create bar plot using Plotly
#fig = px.bar(counts_by_boro_fig, x='executed_date', y='count', color='borough', barmode='group')

# Display the plot
#fig.show()

#### dept of homeless services

##### time series

In [574]:
dhs_daily_df.info()
#monthly_avg_df.info()
#quarterly_avg_df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3550 entries, 2023-07-12 to 2013-08-21
Data columns (total 23 columns):
 #   Column                                                        Non-Null Count  Dtype  
---  ------                                                        --------------  -----  
 0   total_adults_in_shelter                                       3550 non-null   int64  
 1   total_children_in_shelter                                     3550 non-null   int64  
 2   total_individuals_in_shelter                                  3550 non-null   int64  
 3   single_adult_men_in_shelter                                   3550 non-null   int64  
 4   single_adult_women_in_shelter                                 3550 non-null   int64  
 5   total_single_adults_in_shelter                                3550 non-null   int64  
 6   families_with_children_in_shelter                             3550 non-null   int64  
 7   adults_in_families_with_children_in_shelter        

In [575]:
# filter functions

# date filter: True when date_of_census falls within [start_date, end_date], inclusive
def get_date(date_of_census, start_date, end_date):
    return start_date <= date_of_census <= end_date

# column filter: keep the columns whose positions are listed in desired_columns
def get_columns(df, desired_columns):
    columns = [column for i, column in enumerate(df.columns) if i in desired_columns]
    return df[columns]
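
Since the frame is indexed by date, the row-by-row date check can also be expressed as a boolean mask over the whole index. The following sketch shows that alternative on four rows from the table above; it is not how the notebook itself filters.

```python
import pandas as pd

dates = pd.to_datetime(["2023-07-12", "2023-07-07", "2017-01-04", "2013-08-22"])
toy = pd.DataFrame(
    {"total_individuals_in_shelter": [81603, 81404, 60041, 49690]},
    index=pd.DatetimeIndex(dates, name="date_of_census"),
)

start_date = pd.Timestamp(2013, 1, 1)
end_date = pd.Timestamp(2022, 12, 31)

# Compare the whole index at once; both bounds are inclusive.
in_range = (toy.index >= start_date) & (toy.index <= end_date)
filtered = toy.loc[in_range]
print(filtered)
```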

In [576]:
desired_columns = [0,1,2,3,4,5,6,7,8,9,10]
start_date = datetime(2013, 1, 1)
end_date = datetime(2022, 12, 31)

# Select the desired columns by position (avoid shadowing the built-in filter)
column_subset = get_columns(dhs_daily_df,desired_columns)
#column_subset.columns
dhs_daily_df.columns

Index(['total_adults_in_shelter', 'total_children_in_shelter',
       'total_individuals_in_shelter', 'single_adult_men_in_shelter',
       'single_adult_women_in_shelter', 'total_single_adults_in_shelter',
       'families_with_children_in_shelter',
       'adults_in_families_with_children_in_shelter',
       'children_in_families_with_children_in_shelter',
       'total_individuals_in_families_with_children_in_shelter_',
       'adult_families_in_shelter', 'individuals_in_adult_families_in_shelter',
       'perc_total_adults_in_shelter', 'perc_total_children_in_shelter',
       'perc_single_adult_men_in_shelter',
       'perc_single_adult_women_in_shelter',
       'perc_total_single_adults_in_shelter',
       'perc_families_with_children_in_shelter',
       'perc_adults_in_families_with_children_in_shelter',
       'perc_children_in_families_with_children_in_shelter',
       'perc_total_individuals_in_families_with_children_in_shelter_',
       'perc_adult_families_in_shelter',
     

In [577]:
desired_columns = [0,1,2,3,4,5,6,7,8,9,10]
start_date = datetime(2013, 1, 1)
end_date = datetime(2022, 12, 31)

# Select the desired columns by position
filtered_data = get_columns(dhs_daily_df,desired_columns)

# Apply date filter
filtered_data = filtered_data[filtered_data.index.to_series().apply(get_date, start_date=start_date, end_date=end_date)]
filtered_data.reset_index(inplace=True)
filtered_data

# Set the template to 'plotly_dark'
#pio.templates.default = "plotly_dark"

# Create the line chart
fig = px.line(filtered_data, x='date_of_census', y='total_individuals_in_shelter',
             color_discrete_sequence=['orange',],
             labels=dict(
                 date_of_census='Year Start',
                 total_individuals_in_shelter='Total Sheltered Individuals'
             ),             
             title='Daily Total Individuals in NYC Shelters',            
             )

# Customize the layout if needed
fig.update_layout( xaxis_tickangle=0,
                  legend=dict(orientation='h', yanchor='bottom', y=1.05, xanchor='left', x=0),                  
                  yaxis=dict(tickmode='auto', tickformat='d', range=[40000, filtered_data['total_individuals_in_shelter'].max()]),
                  margin=dict(l=100, r=50, t=80, b=20),  # Adjust the left and right margin values as desired
                  width=1500,  # Set the desired width of the figure
                  height=500  # Set the desired height of the figure
                  )


# Display the plot
fig.show()
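
The date filter in the cell above calls `get_date` once per row through `apply`. Assuming `get_date` simply returns `True` for dates between `start_date` and `end_date` inclusive (its definition is not shown in this part of the notebook), a vectorised comparison on the index should give the same rows. The frame below is synthetic and only stands in for `dhs_daily_df`.

```python
import pandas as pd

# Synthetic stand-in for dhs_daily_df: a daily DatetimeIndex named date_of_census.
idx = pd.date_range("2012-12-30", "2013-01-03", freq="D", name="date_of_census")
demo = pd.DataFrame({"total_individuals_in_shelter": [100, 101, 102, 103, 104]}, index=idx)

start_date = pd.Timestamp(2013, 1, 1)
end_date = pd.Timestamp(2013, 1, 2)

# Inclusive range check on the whole index at once (no per-row function call).
mask = (demo.index >= start_date) & (demo.index <= end_date)
in_range = demo[mask].reset_index()
print(in_range)
```

After `reset_index()`, `date_of_census` is an ordinary column again, which is what the `px.line` call expects for its `x` argument.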

In [578]:
desired_columns = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24]
start_date = datetime(2013, 1, 1)
end_date = datetime(2022, 12, 31)

# Select the desired columns by position
filtered_data = get_columns(dhs_daily_df,desired_columns)
filtered_data.columns

Index(['total_adults_in_shelter', 'total_children_in_shelter',
       'total_individuals_in_shelter', 'single_adult_men_in_shelter',
       'single_adult_women_in_shelter', 'total_single_adults_in_shelter',
       'families_with_children_in_shelter',
       'adults_in_families_with_children_in_shelter',
       'children_in_families_with_children_in_shelter',
       'total_individuals_in_families_with_children_in_shelter_',
       'adult_families_in_shelter', 'individuals_in_adult_families_in_shelter',
       'perc_total_adults_in_shelter', 'perc_total_children_in_shelter',
       'perc_single_adult_men_in_shelter',
       'perc_single_adult_women_in_shelter',
       'perc_total_single_adults_in_shelter',
       'perc_families_with_children_in_shelter',
       'perc_adults_in_families_with_children_in_shelter',
       'perc_children_in_families_with_children_in_shelter',
       'perc_total_individuals_in_families_with_children_in_shelter_',
       'perc_adult_families_in_shelter',
     

In [579]:
desired_columns = [0,1,2,3,4,5,6,7,8,9,10,11]
start_date = datetime(2013, 1, 1)
end_date = datetime(2022, 12, 31)

# Select the desired columns by position
filtered_data = get_columns(dhs_daily_df,desired_columns)

# Apply date filter
filtered_data = filtered_data[filtered_data.index.to_series().apply(get_date, start_date=start_date, end_date=end_date)]
filtered_data.reset_index(inplace=True)
filtered_data

# Set the template to 'plotly_dark'
#pio.templates.default = "plotly_dark"

# Create the line chart (wide form: one line per column listed in y)
fig = px.line(filtered_data, x='date_of_census', y=['total_individuals_in_families_with_children_in_shelter_','total_single_adults_in_shelter','individuals_in_adult_families_in_shelter'],
             color_discrete_sequence=['sienna', 'orange','silver'],
             # In wide form, Plotly Express names the y-axis 'value', so relabel that key
             labels=dict(
                 date_of_census='Year Start',
                 value='Total Sheltered Individuals'
             ),             
             title='Daily Total Individuals in NYC Shelters',            
             )

# Customize the layout if needed
fig.update_layout( xaxis_tickangle=0,
                  legend=dict(orientation='h', yanchor='bottom', y=1.0, xanchor='left', x=0),                  
                  yaxis=dict(tickmode='auto', tickformat='d', range=[0, filtered_data['total_individuals_in_shelter'].max()]),
                  margin=dict(l=100, r=50, t=80, b=20),  # Adjust the left and right margin values as desired
                  width=1500,  # Set the desired width of the figure
                  height=500  # Set the desired height of the figure
                  )


# Display the plot
fig.show()

In [580]:
desired_columns = [0,1,2,3,4,5,6,7,8,9,10]
start_date = datetime(2013, 1, 1)
end_date = datetime(2022, 12, 31)

# Select the desired columns by position
filtered_data = get_columns(monthly_avg_df,desired_columns)

# Apply date filter
filtered_data = filtered_data[filtered_data.index.to_series().apply(get_date, start_date=start_date, end_date=end_date)]
filtered_data.reset_index(inplace=True)
filtered_data

# Set the template to 'plotly_dark'
#pio.templates.default = "plotly_dark"

# Create the line chart
fig = px.line(filtered_data, x='date_of_census', y=['perc_total_adults_in_shelter', 'perc_total_children_in_shelter','perc_total_single_adults_in_shelter','perc_total_individuals_in_families_with_children_in_shelter_',],
             color_discrete_sequence=['silver','purple','orange', 'sienna'],
             labels={'date_of_census': 'Year_Start', 'value': 'Percentage'},
             title='Average Population in NYC Shelters (%)',
             
             )

# Customize the layout if needed
fig.update_layout( xaxis_tickangle=0,
                  legend=dict(orientation='h', yanchor='bottom', y=1.05, xanchor='left', x=0),
                  xaxis=dict(tickmode='auto', tickformat='%Y'),
                  width=1500,  # Set the desired width of the figure
                  height=500  # Set the desired height of the figure
                  )

# Display the plot
fig.show()

In [581]:
desired_columns = [0,1,2,3,4,5,6,7,8,9,10]
start_date = datetime(2013, 1, 1)
end_date = datetime(2022, 12, 31)

# Select the desired columns by position
filtered_data = get_columns(monthly_avg_df,desired_columns)

# Apply date filter
filtered_data = filtered_data[filtered_data.index.to_series().apply(get_date, start_date=start_date, end_date=end_date)]
filtered_data.reset_index(inplace=True)
filtered_data

# Set the template to 'plotly_dark'
#pio.templates.default = "plotly_dark"

# Create the line chart
fig = px.line(filtered_data, x='date_of_census', y=['perc_total_adults_in_shelter', 'perc_total_children_in_shelter'],
             color_discrete_sequence=['silver', 'orange'],
             labels={'date_of_census': 'Year_Start', 'value': 'Percentage'},
             title='Average Population in NYC Shelters (%)',
             width=1500,  # Set the desired width of the figure
             height=500  # Set the desired height of the figure
             )

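# Keep both traces on the primary y-axis ("y"); no second axis is created here
fig.update_traces(yaxis="y")

# Attach data labels (only drawn when the trace mode includes 'text', e.g. mode='lines+text')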
texts = [filtered_data['perc_total_adults_in_shelter'].apply(lambda x: f'{round(x)}%'),
         filtered_data['perc_total_children_in_shelter'].apply(lambda x: f'{round(x)}%')]
for i, t in enumerate(texts):
    fig.data[i].text = t

# Customize the layout if needed
fig.update_layout(xaxis_tickangle=0,
                  legend=dict(orientation='h', yanchor='bottom', y=1.05, xanchor='left', x=0),
                  xaxis=dict(tickmode='auto', tickformat='%Y'),
                  yaxis=dict(title='Percentage', side='left'),  # Set side='left' for the first y-axis
                  )

# Display the plot
fig.show()


##### distribution

In [582]:
desired_columns = [0,1,2,3,4,5,6,7,8,9,10]
start_date = datetime(2013, 1, 1)
end_date = datetime(2022, 12, 31)

# Select the desired columns by position
filtered_data = get_columns(yearly_avg_df,desired_columns)

# Apply date filter
filtered_data = filtered_data[filtered_data.index.to_series().apply(get_date, start_date=start_date, end_date=end_date)]
filtered_data.reset_index(inplace=True)
filtered_data

# Set the template to 'plotly_dark'
#pio.templates.default = "plotly_dark"
#plt.style.use('dark_background')

# Create the bar chart
fig = px.bar(filtered_data, x='date_of_census', y=['perc_total_individuals_in_families_with_children_in_shelter_','perc_total_single_adults_in_shelter','perc_individuals_in_adult_families_in_shelter'],
             color_discrete_sequence=['sienna', 'orange','silver'],
             labels={'date_of_census': 'Year_Start', 'value': 'Percentage'},
             #title='NYC Sheltered Population Year-over-Year',
             width=1500,  # Set the desired width of the figure
             height=500,  # Set the desired height of the figure
             )

# Customize the layout if needed
fig.update_layout(barmode='group', xaxis_tickangle=0,
                  legend=dict(orientation='h', yanchor='bottom', y=1.05, xanchor='left', x=0,font=dict(size=30)),
                  font=dict(size=30),
                  paper_bgcolor='rgba(0,0,0,0)',
                  plot_bgcolor='rgba(0,0,0,0)',

)

# Add data labels
texts = [filtered_data['perc_total_individuals_in_families_with_children_in_shelter_'].apply(lambda x: f'{round(x)}%'),
         filtered_data['perc_total_single_adults_in_shelter'].apply(lambda x: f'{round(x)}%'),
         filtered_data['perc_individuals_in_adult_families_in_shelter'].apply(lambda x: f'{round(x)}%'),
         ]
for i, t in enumerate(texts):
    fig.data[i].text = t
fig.update_traces(textposition='inside', textfont=dict(color='white', size=40), textangle=0,
        )

# Display the plot
fig.show()

In [583]:
desired_columns = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
start_date = datetime(2013, 1, 1)
end_date = datetime(2022, 12, 31)

# Select the desired columns by position
filtered_data = get_columns(yearly_avg_df, desired_columns)

# Apply date filter
filtered_data = filtered_data[filtered_data.index.to_series().apply(get_date, start_date=start_date, end_date=end_date)]
filtered_data.reset_index(inplace=True)

# Create the bar chart
fig = go.Figure()

# Add the bar traces
fig.add_trace(
    go.Bar(
        x=filtered_data['date_of_census'],
        y=filtered_data['perc_total_individuals_in_families_with_children_in_shelter_'],
        name='Total Individuals in Families with Children',
        marker_color='sienna',
        text=filtered_data['perc_total_individuals_in_families_with_children_in_shelter_'].apply(lambda x: f'{round(x)}%'),
        textposition='inside',
        textangle=0,
        textfont=dict(color='white', size=18),
        opacity=0.8
    )
)

fig.add_trace(
    go.Bar(
        x=filtered_data['date_of_census'],
        y=filtered_data['perc_total_single_adults_in_shelter'],
        name='Total Single Adults',
        marker_color='orange',
        text=filtered_data['perc_total_single_adults_in_shelter'].apply(lambda x: f'{round(x)}%'),
        textposition='inside',
        textangle=0,
        textfont=dict(color='white', size=18),
        opacity=0.8
    )
)

fig.add_trace(
    go.Bar(
        x=filtered_data['date_of_census'],
        y=filtered_data['perc_individuals_in_adult_families_in_shelter'],
        name='Individuals in Adult Families',
        marker_color='silver',
        text=filtered_data['perc_individuals_in_adult_families_in_shelter'].apply(lambda x: f'{round(x)}%'),
        textposition='inside',
        textangle=0,
        textfont=dict(color='white', size=18),
        opacity=0.8
    )
)

# Customize the layout
fig.update_layout(
    barmode='group',
    xaxis_tickangle=0,
    legend=dict(
        orientation='h',
        yanchor='bottom',
        y=1.05,
        xanchor='left',
        x=0,
        font=dict(size=24),
        title=dict(font=dict(size=24))
    ),
    font=dict(size=24),
    paper_bgcolor='rgba(0,0,0,0)',
    plot_bgcolor='rgba(0,0,0,0)',
    width=1500,
    height=500
)

# Set x-axis and y-axis labels
fig.update_xaxes(title_text='Year Start')
fig.update_yaxes(title_text='Percentage')

# Display the plot
fig.show()


In [584]:
desired_columns = [0,1,2,3,4,5,6,7,8,9,10]
start_date = datetime(2013, 1, 1)
end_date = datetime(2022, 12, 31)

# Select the desired columns by position
filtered_data = get_columns(yearly_avg_df,desired_columns)

# Apply date filter
filtered_data = filtered_data[filtered_data.index.to_series().apply(get_date, start_date=start_date, end_date=end_date)]
filtered_data.reset_index(inplace=True)
filtered_data

# Set the template to 'plotly_dark'
pio.templates.default = "plotly_dark"

# Create the bar chart
fig = px.bar(filtered_data, x='date_of_census', y=['perc_total_adults_in_shelter', 'perc_total_children_in_shelter'],
             color_discrete_sequence=['silver', 'orange'],
             labels={'date_of_census': 'Year_Start', 'value': 'Percentage'},
             title='Percentage of Adults vs. Children in NYC Shelter YoY',
             
             )

# Customize the layout if needed
fig.update_layout(barmode='group', xaxis_tickangle=0,
                  legend=dict(orientation='h', yanchor='bottom', y=1.05, xanchor='left', x=0),
                  xaxis=dict(tickmode='auto', tickformat='%Y'),
                  width=1500,  # Set the desired width of the figure
                  height=500  # Set the desired height of the figure
                  )

# Add data labels
texts = [filtered_data['perc_total_adults_in_shelter'].apply(lambda x: f'{round(x)}%'),
         filtered_data['perc_total_children_in_shelter'].apply(lambda x: f'{round(x)}%')]
for i, t in enumerate(texts):
    fig.data[i].text = t
fig.update_traces(textposition='inside', textfont=dict(color='black', size=10), textangle=0)

# Display the plot
fig.show()

In [585]:
desired_columns = [0,1,2,3,4,5,6,7,8,9,10]
start_date = datetime(2013, 1, 1)
end_date = datetime(2022, 12, 31)

# Select the desired columns by position
filtered_data = get_columns(yearly_avg_df,desired_columns)

# Apply date filter
filtered_data = filtered_data[filtered_data.index.to_series().apply(get_date, start_date=start_date, end_date=end_date)]
filtered_data.reset_index(inplace=True)
filtered_data

# Set the template to 'plotly_dark'
pio.templates.default = "plotly_dark"

# Create the bar chart
fig = px.bar(filtered_data, x='date_of_census', y=['perc_single_adult_men_in_shelter', 'perc_single_adult_women_in_shelter'],
             color_discrete_sequence=['sienna', 'silver'],
             labels={'date_of_census': 'Year_Start', 'value': 'Percentage'},
             title='NYC Sheltered Population by Gender YoY',
             
             )

# Customize the layout if needed
fig.update_layout(barmode='group', xaxis_tickangle=0,
                  legend=dict(orientation='h', yanchor='bottom', y=1.05, xanchor='left', x=0),
                  width=1500,  # Set the desired width of the figure
                  height=500  # Set the desired height of the figure
                  )


# Add data labels
texts = [filtered_data['perc_single_adult_men_in_shelter'].apply(lambda x: f'{round(x)}%'),
         filtered_data['perc_single_adult_women_in_shelter'].apply(lambda x: f'{round(x)}%')]
for i, t in enumerate(texts):
    fig.data[i].text = t
fig.update_traces(textposition='inside', textfont=dict(color='black', size=10), textangle=0)

# Display the plot
fig.show()
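
Several of the chart cells above build their data labels with the same `apply(lambda x: f'{round(x)}%')` expression. One possible way to avoid repeating it is a small helper; `percent_labels` is a hypothetical name, and the three sample values are taken from the percentage columns only for illustration.

```python
import pandas as pd

def percent_labels(values):
    """Format numeric values as whole-number percentage strings, e.g. 34.43 -> '34%'."""
    return [f"{value:.0f}%" for value in values]

shares = pd.Series([34.43, 65.81, 7.52])
print(percent_labels(shares))  # ['34%', '66%', '8%']
```

The returned list can be assigned to `fig.data[i].text` in the same way as the Series used in the cells above.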

##### correlation analyses
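
The correlation cells in this section are currently disabled. If they are re-enabled after `reset_index()`, the `date_of_census` column is part of the frame, so it may be safer to restrict the correlation to numeric columns first. The sketch below uses made-up numbers, not values from the DHS dataset, and leaves out the heatmap call.

```python
import pandas as pd

# Made-up values for illustration only; not taken from the DHS dataset.
demo = pd.DataFrame({
    "date_of_census": pd.to_datetime(["2020-01-01", "2021-01-01", "2022-01-01", "2023-01-01"]),
    "total_adults_in_shelter": [10.0, 20.0, 30.0, 40.0],
    "total_children_in_shelter": [8.0, 6.0, 4.0, 2.0],
})

# Keep only numeric columns so the date column does not enter the correlation.
correlation_matrix = demo.select_dtypes(include="number").corr()
print(correlation_matrix.round(2))
```

The resulting square matrix can be passed to `sns.heatmap(correlation_matrix, annot=True, cmap='rocket_r')` as in the disabled cells.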

In [586]:
# Correlation heatmap (disabled)
# correlation_matrix = filtered_data.corr()  # Correlation matrix
# sns.heatmap(correlation_matrix, annot=True, cmap='rocket_r')
# plt.show()

In [587]:
# Correlation heatmap on quarterly averages (disabled)
# desired_columns = [0,1,2,3,4,5,6,7,8]
# filtered_data = get_columns(quarterly_avg_df,desired_columns)
#
# correlation_matrix = filtered_data.corr()  # Correlation matrix
# sns.heatmap(correlation_matrix, annot=True, cmap='rocket_r')
# plt.show()

#### web app development

In [588]:
app = dash.Dash(__name__)

date_range_select = dcc.DatePickerRange(
        id="date-range",
        min_date_allowed=datetime(2013, 1, 1),
        max_date_allowed=datetime.today(),
        start_date=datetime(2013, 1, 1),
        end_date=datetime.today(),
    )

output = dcc.Graph(style={'width':'90vw',
                          'height':'50vw'},
                   id="graph")

app.layout = html.Div([
    dbc.Row(date_range_select),
    dbc.Row(output),
])

@app.callback(
    Output("graph", "figure"),
    Input("date-range", "start_date"),
    Input("date-range", "end_date")
)
def data_visualization(start_date, end_date):
    filtered_data = get_columns(dhs_daily_df, desired_columns)

    # Apply date filter
    filtered_data = filtered_data[
        (filtered_data.index >= start_date) & (filtered_data.index <= end_date)
    ]

    # Set the template to 'plotly_dark'
    pio.templates.default = "plotly_dark"

    # Create the line chart
    fig = px.line(
        filtered_data,
        x=filtered_data.index,
        y='total_individuals_in_shelter',
        color_discrete_sequence=['orange'],
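        # Label keys must match the column/index names, not the argument names
        labels={
            'date_of_census': 'Date',
            'total_individuals_in_shelter': 'Total Sheltered Individuals'
        },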
        title='Daily Total Individuals in NYC Shelters'
    )

    # Customize the layout if needed
    fig.update_layout(
        xaxis_tickangle=0,
        legend=dict(orientation='h', yanchor='bottom', y=1.05, xanchor='left', x=0),
        yaxis=dict(tickmode='auto', tickformat='d', range=[40000, filtered_data['total_individuals_in_shelter'].max()]),
        margin=dict(l=100, r=50, t=80, b=20),  # Adjust the left and right margin values as desired
        width=1500,  # Set the desired width of the figure
        height=500  # Set the desired height of the figure
    )
    return fig

if __name__ == '__main__':
    app.run(debug=True)  # app.run_server is deprecated in recent Dash releases
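
Dash passes the `DatePickerRange` values to the callback as ISO date strings, and either one can be `None` while the user is still picking a range. A hedged sketch of how the filtering step could be pulled out into a plain function that tolerates both cases is shown below. `filter_by_date` is a hypothetical helper, and the middle row of the demo frame is invented.

```python
import numpy as np
import pandas as pd

def filter_by_date(df, start_date=None, end_date=None):
    """Return rows of df whose DatetimeIndex lies in [start_date, end_date].

    Either bound may be None (no limit) or anything pd.Timestamp accepts,
    such as the ISO strings sent by dcc.DatePickerRange.
    """
    mask = np.ones(len(df), dtype=bool)
    if start_date is not None:
        mask &= df.index >= pd.Timestamp(start_date)
    if end_date is not None:
        mask &= df.index <= pd.Timestamp(end_date)
    return df[mask]

# Small stand-in for dhs_daily_df; the 2018 row is invented for the demo.
idx = pd.to_datetime(["2013-08-22", "2018-01-01", "2023-07-12"])
demo = pd.DataFrame({"total_individuals_in_shelter": [49690, 60000, 81603]}, index=idx)

print(len(filter_by_date(demo, "2014-01-01")))                         # 2
print(len(filter_by_date(demo, "2014-01-01T00:00:00", "2020-12-31")))  # 1
```

Keeping the filter separate from the callback also makes it possible to test it without starting the Dash server.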

In [589]:
dhs_daily_df

date_of_census,total_adults_in_shelter,total_children_in_shelter,total_individuals_in_shelter,single_adult_men_in_shelter,single_adult_women_in_shelter,total_single_adults_in_shelter,families_with_children_in_shelter,adults_in_families_with_children_in_shelter,children_in_families_with_children_in_shelter,total_individuals_in_families_with_children_in_shelter_,...,perc_total_children_in_shelter,perc_single_adult_men_in_shelter,perc_single_adult_women_in_shelter,perc_total_single_adults_in_shelter,perc_families_with_children_in_shelter,perc_adults_in_families_with_children_in_shelter,perc_children_in_families_with_children_in_shelter,perc_total_individuals_in_families_with_children_in_shelter_,perc_adult_families_in_shelter,perc_individuals_in_adult_families_in_shelter
2023-07-12,53507,28096,81603,16781,4742,21523,16445,25850,28096,53946,...,34.43,20.56,5.81,26.38,20.15,31.68,34.43,66.11,3.54,7.52
2023-07-11,53469,28025,81494,16819,4732,21551,16411,25773,28025,53798,...,34.39,20.64,5.81,26.44,20.14,31.63,34.39,66.01,3.55,7.54
2023-07-09,53486,27895,81381,16973,4710,21683,16337,25683,27895,53578,...,34.28,20.86,5.79,26.64,20.07,31.56,34.28,65.84,3.54,7.52
2023-07-08,53471,27865,81336,16943,4729,21672,16329,25665,27865,53530,...,34.26,20.83,5.81,26.65,20.08,31.55,34.26,65.81,3.55,7.54
2023-07-07,53545,27859,81404,16968,4719,21687,16308,25654,27859,53513,...,34.22,20.84,5.80,26.64,20.03,31.51,34.22,65.74,3.59,7.62
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2013-08-25,28458,21400,49858,7230,2704,9934,10324,14694,21400,36094,...,42.92,14.50,5.42,19.92,20.71,29.47,42.92,72.39,3.62,7.68
2013-08-24,28274,21343,49617,7110,2690,9800,10291,14650,21343,35993,...,43.02,14.33,5.42,19.75,20.74,29.53,43.02,72.54,3.63,7.71
2013-08-23,28257,21291,49548,7149,2671,9820,10266,14611,21291,35902,...,42.97,14.43,5.39,19.82,20.72,29.49,42.97,72.46,3.64,7.72
2013-08-22,28366,21324,49690,7201,2716,9917,10274,14622,21324,35946,...,42.91,14.49,5.47,19.96,20.68,29.43,42.91,72.34,3.63,7.70


# forecasting

In [590]:
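def forecast_sarimax(time_series, forecast_steps):
    """Fit one SARIMAX model per column and return a dict of forecast DataFrames.

    order=(1, 0, 0) with an all-zero seasonal_order is a plain AR(1) model
    with no seasonal component.
    """
    forecasts = {}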

    for column in time_series.columns:
        # Extract the individual time series
        ts = time_series[column]

        # Fit the SARIMAX model
        model = sm.tsa.SARIMAX(ts, order=(1, 0, 0), seasonal_order=(0, 0, 0, 0))
        model_fit = model.fit()

        # Forecast future values
        forecast = model_fit.get_forecast(steps=forecast_steps)

        # Extract the forecasted values and confidence intervals
        forecast_values = forecast.predicted_mean
        forecast_ci = forecast.conf_int()

        # Create a DataFrame with the forecasted values and confidence intervals
        forecast_df = pd.DataFrame({'Forecast': forecast_values,
                                    'Lower CI': forecast_ci.iloc[:, 0],
                                    'Upper CI': forecast_ci.iloc[:, 1]})

        # Store the forecast for the current variable
        forecasts[column] = forecast_df

    return forecasts
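
With `order=(1, 0, 0)` and no seasonal terms, the model above reduces to a first-order autoregression. To illustrate what its point forecast looks like, here is a NumPy-only sketch that estimates the AR(1) coefficient by least squares without an intercept and projects it forward. This is a simplified stand-in, not the maximum-likelihood fit that statsmodels performs, and it produces no confidence intervals. `ar1_point_forecast` and the toy series are hypothetical.

```python
import numpy as np

def ar1_point_forecast(series, steps):
    """Least-squares AR(1) without intercept: y[t] = phi * y[t-1] + noise."""
    y = np.asarray(series, dtype=float)
    prev, curr = y[:-1], y[1:]
    # Closed-form least-squares slope through the origin.
    phi = float(prev @ curr / (prev @ prev))
    # The h-step-ahead point forecast is phi**h times the last observation.
    horizons = np.arange(1, steps + 1)
    return phi, y[-1] * phi ** horizons

# Toy series that halves at every step, so phi comes out as 0.5.
phi, forecast = ar1_point_forecast([80.0, 40.0, 20.0, 10.0], steps=3)
print(phi, forecast)  # phi is 0.5; the forecast is 5.0, 2.5, 1.25
```

Because there is no intercept, forecasts from such a model drift toward zero whenever the coefficient is below 1, which is worth keeping in mind when the input columns are raw shelter counts rather than centred values.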

In [591]:
# Example usage (disabled)
# forecasts = forecast_sarimax(filtered_data, forecast_steps=5)
# variable_name = 'perc_total_adults_in_shelter'
# forecast_variable_1 = forecasts[variable_name]
# forecast_variable_1