# NYC OpenData - Department of Homeless Services Analysis

#### Sources

| priority | dataset | description | link |
|--------------|--------------|-------------|---------------|
| 1 | DHS Daily Report | Daily number of families and individuals residing in the Department of Homeless Services (DHS) shelter system, and daily number of families applying to the DHS shelter system. | https://dev.socrata.com/foundry/data.cityofnewyork.us/k46n-sa2m |
| 2 | Evictions | Pending, scheduled, and executed evictions within the five boroughs, from 2017 to the present. Eviction data is compiled from the majority of New York City Marshals. | https://dev.socrata.com/foundry/data.cityofnewyork.us/6z8x-wfk4 |
| 3 | DYCD Demographics by Zip Code | Demographic breakdown of DYCD-funded participants only, by NYC zip code. | https://dev.socrata.com/foundry/data.cityofnewyork.us/hebw-6hze |



#### References

| source  | link |
|--------------|---------------|
| Time Series | https://dev.socrata.com/blog/2019/10/07/time-series-analysis-with-jupyter-notebooks-and-socrata.html |


## Imports


In [802]:
import pandas as pd
from sodapy import Socrata
import plotly.express as px
import nbformat
import matplotlib.pyplot as plt
from pandas.plotting import register_matplotlib_converters
from datetime import datetime
import statsmodels.api as sm
import seaborn as sns
import plotly.io as pio
import dash_bootstrap_components as dbc

## Read data from API to Dataframes

**Department of Homeless Services (DHS) Daily Report**

In [803]:
# https://dev.socrata.com/foundry/data.cityofnewyork.us/k46n-sa2m
client = Socrata("data.cityofnewyork.us", None)
results = client.get("k46n-sa2m",limit=80000)

# Convert to pandas DataFrame
dhs_daily_df = pd.DataFrame.from_records(results)



**Evictions**

In [804]:
# https://dev.socrata.com/foundry/data.cityofnewyork.us/6z8x-wfk4
client = Socrata("data.cityofnewyork.us", None)
results = client.get("6z8x-wfk4",limit=80000)

# Convert to pandas DataFrame
evictions_df = pd.DataFrame.from_records(results)



**DYCD (Department of Youth & Community Development)** - Demographics by Zip Code

In [805]:
# https://dev.socrata.com/foundry/data.cityofnewyork.us/hebw-6hze

client = Socrata("data.cityofnewyork.us", None)
results = client.get("hebw-6hze",limit=80000)


# Convert to pandas DataFrame
demos_by_zip_df = pd.DataFrame.from_records(results)
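The three load cells above repeat the same two steps. A minimal sketch of how they could be folded into one helper follows. `load_dataset` and `StubClient` are hypothetical names introduced here for illustration. The stub stands in for `sodapy.Socrata` so the example runs offline, and its two rows copy values from the DHS output shown later in this notebook.

```python
import pandas as pd


def load_dataset(client, dataset_id, limit=80000):
    """Fetch one Socrata dataset and return it as a DataFrame.

    `client` is any object with a sodapy-style `get(dataset_id, limit=...)`
    method. `load_dataset` is a hypothetical helper, not part of sodapy.
    """
    records = client.get(dataset_id, limit=limit)
    return pd.DataFrame.from_records(records)


class StubClient:
    """Offline stand-in for sodapy.Socrata, used only for this demo."""

    def get(self, dataset_id, limit=1000):
        rows = [
            {"date_of_census": "2023-07-08", "total_individuals_in_shelter": "81336"},
            {"date_of_census": "2023-07-07", "total_individuals_in_shelter": "81404"},
        ]
        return rows[:limit]


demo_df = load_dataset(StubClient(), "k46n-sa2m")
print(demo_df)  # values arrive as strings, not numbers
```

With a real client, the call would presumably look like `load_dataset(Socrata("data.cityofnewyork.us", None), "k46n-sa2m")`.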



## Exploratory Data Analysis

### Schema Info + Data Cleaning

#### dhs_report

In [806]:
# dhs_daily_report
# The API returns every column as object (string), which causes issues downstream when graphing,
# so convert every column except the date to numeric.
cols = dhs_daily_df.columns.drop('date_of_census')
dhs_daily_df[cols] = dhs_daily_df[cols].apply(pd.to_numeric)
dhs_daily_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3547 entries, 0 to 3546
Data columns (total 13 columns):
 #   Column                                                   Non-Null Count  Dtype 
---  ------                                                   --------------  ----- 
 0   date_of_census                                           3547 non-null   object
 1   total_adults_in_shelter                                  3547 non-null   int64 
 2   total_children_in_shelter                                3547 non-null   int64 
 3   total_individuals_in_shelter                             3547 non-null   int64 
 4   single_adult_men_in_shelter                              3547 non-null   int64 
 5   single_adult_women_in_shelter                            3547 non-null   int64 
 6   total_single_adults_in_shelter                           3547 non-null   int64 
 7   families_with_children_in_shelter                        3547 non-null   int64 
 8   adults_in_families_with_children_in_sh
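The conversion above assumes every value parses cleanly. As a hedged sketch of a more defensive variant, `errors="coerce"` turns unparseable strings into `NaN` instead of raising. The frame below is toy data, and the `"n/a"` entry is a made-up bad value, not something observed in the DHS feed.

```python
import pandas as pd

raw = pd.DataFrame({
    "date_of_census": ["2023-07-08", "2023-07-07"],
    "total_adults_in_shelter": ["53471", "53545"],
    "total_children_in_shelter": ["27865", "n/a"],  # hypothetical bad value
})

# Convert everything except the date column
cols = raw.columns.drop("date_of_census")

clean = raw.copy()
# errors="coerce" replaces unparseable strings with NaN instead of raising
clean[cols] = clean[cols].apply(pd.to_numeric, errors="coerce")

print(clean.dtypes)
print(clean.isna().sum())  # shows which columns contained bad values
```

Checking `isna().sum()` right after the conversion makes silently coerced values visible.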

In [807]:
dhs_daily_df['date_of_census'] = pd.to_datetime(dhs_daily_df['date_of_census'])
dhs_daily_df

Unnamed: 0,date_of_census,total_adults_in_shelter,total_children_in_shelter,total_individuals_in_shelter,single_adult_men_in_shelter,single_adult_women_in_shelter,total_single_adults_in_shelter,families_with_children_in_shelter,adults_in_families_with_children_in_shelter,children_in_families_with_children_in_shelter,total_individuals_in_families_with_children_in_shelter_,adult_families_in_shelter,individuals_in_adult_families_in_shelter
0,2023-07-08,53471,27865,81336,16943,4729,21672,16329,25665,27865,53530,2887,6134
1,2023-07-07,53545,27859,81404,16968,4719,21687,16308,25654,27859,53513,2922,6204
2,2023-07-06,53579,27818,81397,17025,4729,21754,16287,25628,27818,53446,2918,6197
3,2023-07-05,53763,27811,81574,17148,4721,21869,16306,25663,27811,53474,2935,6231
4,2023-07-04,53463,27684,81147,16965,4699,21664,16246,25552,27684,53236,2946,6247
...,...,...,...,...,...,...,...,...,...,...,...,...,...
3542,2013-08-25,28458,21400,49858,7230,2704,9934,10324,14694,21400,36094,1804,3830
3543,2013-08-24,28274,21343,49617,7110,2690,9800,10291,14650,21343,35993,1801,3824
3544,2013-08-23,28257,21291,49548,7149,2671,9820,10266,14611,21291,35902,1802,3826
3545,2013-08-22,28366,21324,49690,7201,2716,9917,10274,14622,21324,35946,1803,3827


#### evictions

In [808]:
#evictions
#evictions_df.info()
evictions_df['executed_date'] = pd.to_datetime(evictions_df['executed_date'])
evictions_df.sort_values(by='executed_date',ascending=False)

Unnamed: 0,court_index_number,docket_number,eviction_address,eviction_apt_num,executed_date,marshal_first_name,marshal_last_name,residential_commercial_ind,borough,eviction_zip,ejectment,eviction_possession,latitude,longitude,community_board,council_district,census_tract,bin,bbl,nta
54074,317973/22,16662,90-11 160TH STREET APT B506,B506,2023-07-06,Robert,Renzulli,Residential,QUEENS,11432,Not an Ejectment,Possession,40.704969,-73.799903,12,24,44601,4000000,4097577501,Jamaica
16797,305757/21,117070,10 RICHMAN PLAZA,37H,2023-07-06,Justin,Grossman,Residential,BRONX,10453,Not an Ejectment,Possession,40.851533,-73.922961,5,16,53,2093973,2028820229,University Heights-Morris Heights
29840,300506/23,16286,259 MONTAUK AVENUE,2ND FL APT,2023-07-06,Robert,Renzulli,Residential,BROOKLYN,11208,Not an Ejectment,Possession,40.672539,-73.877532,5,42,1194,3393986,3040400104,East New York
63148,300902/22,120059,196 SPRING STREET,4,2023-07-06,Justin,Grossman,Residential,MANHATTAN,10012,Not an Ejectment,Possession,40.725130,-74.003153,2,3,47,1007354,1004890018,SoHo-TriBeCa-Civic Center-Little Italy
58447,318157/22N,115285,75 SHERMAN AVENUE A/ K/A 73-75 SHERMAN AVENUE,32,2023-07-06,Justin,Grossman,Residential,MANHATTAN,10040,Not an Ejectment,Possession,40.863397,-73.926465,12,10,285,1064160,1021740034,Washington Heights North
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5373,31314/16,053818,65 EAST 190TH STREET,2J,2017-01-03,Justin,Grossman,Residential,BRONX,10468,Not an Ejectment,Possession,40.862984,-73.898367,7,14,401,2013930,2031750007,Bedford Park-Fordham North
66577,N070028/16,371877,315 EAST 102ND ST,729,2017-01-03,Richard,McCoy,Residential,MANHATTAN,10029,Not an Ejectment,Possession,40.787834,-73.943766,11,8,164,1088305,1016740010,East Harlem South
69040,55509/16,462258,1880 LAFAYETTE AVENU E,22D,2017-01-03,Danny,Weinheim,Residential,BRONX,10473,Not an Ejectment,Possession,40.821892,-73.859109,9,18,16,2092711,2036000004,Soundview-Castle Hill-Clason Point-Harding Park
34942,74907/16,7346,108-02 LIBERTY AVENUE,,2017-01-03,Edward,Guida,Commercial,QUEENS,11419,Not an Ejectment,Possession,40.683729,-73.833682,10,28,96,4202271,4095310001,South Ozone Park


#### demos by zip

In [809]:
#demos_by_zip_df.info()
demos_by_zip_df['data_os_of_date'] = pd.to_datetime(demos_by_zip_df['data_os_of_date'])
demos_by_zip_df

Unnamed: 0,data_os_of_date,zip_code,program_type,female_count,female_percentage,male_count,male_percentage,gender_nonconforming_count,gender_nonconforming,american_indian_or_alaskan,...,non_binary_not_female_or_male_count,non_binary_not_female_or_male_percentage,not_sure_gender_identity_count,not_sure_gender_identity_percentage,two_spirit_native_american_first_nations_count,two_spirit_native_american_first_nations_percentage,middle_eastern_and_north_african_count,middle_eastern_and_north_african_percentage,other_count,other_percentage
0,2020-07-16,0,Beacon,3,0.1,4,0.13,0,0,0,...,0,0,0,0,0,0,0,0,5,0.16
1,2020-07-16,1017,Cornerstone,1,0.03,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0.03
2,2020-07-16,2121,Beacon,0,0,1,0.03,0,0,0,...,0,0,0,0,0,0,0,0,1,0.03
3,2020-07-16,2420,Beacon,1,0.03,0,0,0,0,0,...,0,0,0,0,0,0,1,0.03,0,0
4,2020-07-16,2453,Beacon,0,0,1,0.03,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4908,2019-06-01,91103,Beacon,1,0.03,0,0,0,0,0,...,,,,,,,,,,
4909,2019-06-01,92620,Beacon,2,0.06,0,0,0,0,0,...,,,,,,,,,,
4910,2019-06-01,94539,Cornerstone,0,0,5,0.15,0,0,0,...,,,,,,,,,,
4911,2019-06-01,95351,Beacon,0,0,1,0.03,0,0,0,...,,,,,,,,,,


### Handling Nulls

In [810]:
# check for nulls 
evictions_df.isna().any()

court_index_number            False
docket_number                 False
eviction_address              False
eviction_apt_num               True
executed_date                 False
marshal_first_name            False
marshal_last_name             False
residential_commercial_ind    False
borough                       False
eviction_zip                  False
ejectment                     False
eviction_possession           False
latitude                       True
longitude                      True
community_board                True
council_district               True
census_tract                   True
bin                            True
bbl                            True
nta                            True
dtype: bool
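`isna().any()` only says whether a column has nulls, not how many. A small sketch on toy data (the values are illustrative, not the real evictions table) shows how counts and percentages could be reported together:

```python
import pandas as pd

sample = pd.DataFrame({
    "docket_number": ["16662", "117070", "7346"],
    "eviction_apt_num": ["B506", "37H", None],
    "latitude": [40.704969, None, None],
})

null_counts = sample.isna().sum()                  # nulls per column
null_share = (sample.isna().mean() * 100).round(1)  # percent of rows that are null

summary = pd.DataFrame({"nulls": null_counts, "percent": null_share})
print(summary)
```

The same two lines applied to `evictions_df` would show whether the geocoding columns are missing for a handful of rows or for a large share of them.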

In [811]:
# check for nulls 
dhs_daily_df.isna().any()

date_of_census                                             False
total_adults_in_shelter                                    False
total_children_in_shelter                                  False
total_individuals_in_shelter                               False
single_adult_men_in_shelter                                False
single_adult_women_in_shelter                              False
total_single_adults_in_shelter                             False
families_with_children_in_shelter                          False
adults_in_families_with_children_in_shelter                False
children_in_families_with_children_in_shelter              False
total_individuals_in_families_with_children_in_shelter_    False
adult_families_in_shelter                                  False
individuals_in_adult_families_in_shelter                   False
dtype: bool

In [812]:
demos_by_zip_df.isna().any()

data_os_of_date                                        False
zip_code                                               False
program_type                                           False
female_count                                           False
female_percentage                                      False
male_count                                             False
male_percentage                                        False
gender_nonconforming_count                             False
gender_nonconforming                                   False
american_indian_or_alaskan                             False
american_indian_or_alaskan_1                           False
asian_count                                            False
asian_percentage                                       False
black_or_african_american                              False
black_or_african_american_1                            False
multi_race_count                                       False
multi_race_percentage   

### Data Pivots

#### DHS Time Series Analysis

Questions to Answer <br>
- How has the homeless population in NYC evolved over time?

In [813]:
dhs_daily_df

Unnamed: 0,date_of_census,total_adults_in_shelter,total_children_in_shelter,total_individuals_in_shelter,single_adult_men_in_shelter,single_adult_women_in_shelter,total_single_adults_in_shelter,families_with_children_in_shelter,adults_in_families_with_children_in_shelter,children_in_families_with_children_in_shelter,total_individuals_in_families_with_children_in_shelter_,adult_families_in_shelter,individuals_in_adult_families_in_shelter
0,2023-07-08,53471,27865,81336,16943,4729,21672,16329,25665,27865,53530,2887,6134
1,2023-07-07,53545,27859,81404,16968,4719,21687,16308,25654,27859,53513,2922,6204
2,2023-07-06,53579,27818,81397,17025,4729,21754,16287,25628,27818,53446,2918,6197
3,2023-07-05,53763,27811,81574,17148,4721,21869,16306,25663,27811,53474,2935,6231
4,2023-07-04,53463,27684,81147,16965,4699,21664,16246,25552,27684,53236,2946,6247
...,...,...,...,...,...,...,...,...,...,...,...,...,...
3542,2013-08-25,28458,21400,49858,7230,2704,9934,10324,14694,21400,36094,1804,3830
3543,2013-08-24,28274,21343,49617,7110,2690,9800,10291,14650,21343,35993,1801,3824
3544,2013-08-23,28257,21291,49548,7149,2671,9820,10266,14611,21291,35902,1802,3826
3545,2013-08-22,28366,21324,49690,7201,2716,9917,10274,14622,21324,35946,1803,3827


In [814]:
# Compute each subgroup's share of the total shelter population.
# The denominator is always total_individuals_in_shelter.
# For every other column, divide by the total and store the result in a new column prefixed with 'perc_'.

# Create a list to store the new column names
new_columns = []

# Iterate over each column
for column in dhs_daily_df.columns:
    if column != 'total_individuals_in_shelter' and column != 'date_of_census':
        # Generate the new column name with the prefix "perc_" followed by the original column name
        new_column = 'perc_' + column
        
        # Divide the values in the current column by the values in the 'total_individuals_in_shelter' column
        new_values = ((dhs_daily_df[column].astype(float) / dhs_daily_df['total_individuals_in_shelter'].astype(float))*100).round(2)
        
        # Append the new column to the DataFrame
        dhs_daily_df[new_column] = new_values
        
        # Append the new column name to the list
        new_columns.append(new_column)

#dhs_daily_df
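The loop above can also be written as one vectorized expression. The sketch below is an alternative under the same assumptions (the denominator is `total_individuals_in_shelter`, results are rounded to two decimals). It uses two rows copied from the DHS output shown earlier and only two subgroup columns for brevity.

```python
import pandas as pd

df = pd.DataFrame({
    "date_of_census": pd.to_datetime(["2023-07-08", "2023-07-07"]),
    "total_adults_in_shelter": [53471, 53545],
    "total_children_in_shelter": [27865, 27859],
    "total_individuals_in_shelter": [81336, 81404],
})

total = df["total_individuals_in_shelter"]
subgroups = df.columns.drop(["date_of_census", "total_individuals_in_shelter"])

# Divide every subgroup column by the total row-wise, scale to percent,
# and prefix the new column names with 'perc_'
shares = df[subgroups].div(total, axis=0).mul(100).round(2).add_prefix("perc_")

result = pd.concat([df, shares], axis=1)
print(result.filter(regex="^perc_"))
```

Adults and children partition the total, so the two percentages in each row should sum to roughly 100, which is a quick sanity check on the denominator.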

In [815]:
# filtered_columns = []
# for column in dhs_daily_df.columns:
#     if column.startswith('perc_'):
#         filtered_columns.append(column)
# dhs_daily_df[filtered_columns].info()

In [816]:
# Convert daily data to monthly, quarterly, and yearly averages
# Set 'date_of_census' as the index so the frame can be resampled
dhs_daily_df.set_index('date_of_census', inplace=True)

# Resample to monthly average data
monthly_avg_df = dhs_daily_df.resample('M').mean()

# Resample to quarterly average data
quarterly_avg_df = dhs_daily_df.resample('Q').mean()

# Resample to yearly average data
yearly_avg_df = dhs_daily_df.resample('Y').mean()
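To make the averaging step concrete, here is a minimal sketch on three toy rows. It groups by calendar month with `to_period` rather than `resample`, which is a deliberate substitution: as far as I know, recent pandas releases (2.2 and later) prefer the `'ME'`, `'QE'`, and `'YE'` aliases for `resample`, while the period alias `'M'` used below is unaffected.

```python
import pandas as pd

daily = pd.DataFrame(
    {"total_individuals_in_shelter": [10, 20, 30]},  # toy values
    index=pd.to_datetime(["2023-01-01", "2023-01-02", "2023-02-01"]),
)

# Group rows by calendar month and average within each month
monthly = daily.groupby(daily.index.to_period("M")).mean()
print(monthly)
```

One difference to keep in mind: this labels each row with a monthly period, whereas `resample('M')` labels it with the month-end timestamp.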

In [817]:
monthly_avg_df = monthly_avg_df.filter(regex='^perc_')
#monthly_avg_df

In [818]:
quarterly_avg_df = quarterly_avg_df.filter(regex='^perc_')
#quarterly_avg_df

In [819]:
yearly_avg_df = yearly_avg_df.filter(regex='^perc_')
#yearly_avg_df

#### Evictions

In [820]:
# Count evictions per day and borough (one row per date, one column per borough)
counts_by_boro_piv = (
    pd.pivot_table(evictions_df, values='docket_number', aggfunc='count',
                   index='executed_date', columns='borough')
    .reset_index()
    .sort_values(by='executed_date', ascending=False)
)

In [821]:
# Convert 'executed_date' column to datetime
counts_by_boro_piv['executed_date'] = pd.to_datetime(counts_by_boro_piv['executed_date'])

In [822]:
counts_by_boro_piv

borough,executed_date,BRONX,BROOKLYN,MANHATTAN,QUEENS,STATEN ISLAND
1263,2023-07-06,17.0,13.0,9.0,2.0,
1262,2023-07-05,1.0,21.0,7.0,11.0,2.0
1261,2023-07-03,6.0,6.0,11.0,7.0,
1260,2023-06-30,22.0,20.0,1.0,6.0,1.0
1259,2023-06-29,34.0,8.0,17.0,1.0,
...,...,...,...,...,...,...
4,2017-01-09,24.0,49.0,18.0,12.0,2.0
3,2017-01-06,50.0,59.0,14.0,26.0,1.0
2,2017-01-05,52.0,18.0,25.0,27.0,5.0
1,2017-01-04,44.0,64.0,11.0,19.0,6.0
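The blanks in the output above are date and borough combinations with no executed evictions, which `pivot_table` leaves as `NaN` (hence the float counts). A sketch on toy rows, assuming zero is the intended reading of a missing combination, passes `fill_value=0` and then reshapes to long form for plotting:

```python
import pandas as pd

events = pd.DataFrame({
    "executed_date": pd.to_datetime(["2023-07-06", "2023-07-06", "2023-07-05"]),
    "borough": ["BRONX", "BRONX", "QUEENS"],
    "docket_number": ["1", "2", "3"],  # toy identifiers
})

# fill_value=0 records "no evictions" explicitly instead of leaving NaN
counts = pd.pivot_table(events, values="docket_number", aggfunc="count",
                        index="executed_date", columns="borough", fill_value=0)

# Long form: one row per (date, borough) pair
long_counts = counts.reset_index().melt(
    id_vars="executed_date", var_name="borough", value_name="count"
)
print(counts)
print(long_counts)
```

Whether to fill with zero depends on the question: for daily totals it is the natural choice, but it would hide days on which a borough's data was simply not reported.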


### Data Visualizations

#### evictions

In [823]:
# Melt the DataFrame
counts_by_boro_fig = counts_by_boro_piv.melt(id_vars='executed_date', var_name='borough', value_name='count')

In [824]:
# Create bar plot using Plotly
fig = px.bar(counts_by_boro_fig, x='executed_date', y='count', color='borough', barmode='group')

# Display the plot
fig.show()

#### dept homelessness

##### time series

In [825]:
dhs_daily_df.info()
#monthly_avg_df.info()
#quarterly_avg_df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3547 entries, 2023-07-08 to 2013-08-21
Data columns (total 23 columns):
 #   Column                                                        Non-Null Count  Dtype  
---  ------                                                        --------------  -----  
 0   total_adults_in_shelter                                       3547 non-null   int64  
 1   total_children_in_shelter                                     3547 non-null   int64  
 2   total_individuals_in_shelter                                  3547 non-null   int64  
 3   single_adult_men_in_shelter                                   3547 non-null   int64  
 4   single_adult_women_in_shelter                                 3547 non-null   int64  
 5   total_single_adults_in_shelter                                3547 non-null   int64  
 6   families_with_children_in_shelter                             3547 non-null   int64  
 7   adults_in_families_with_children_in_shelter        

In [826]:
desired_columns = [0,1,2,3,4,5,6,7,8,9,10]
start_date = datetime(2013, 1, 1)
end_date = datetime(2022, 12, 31)

# Select columns by position (get_columns is defined in the next cell, so use iloc here)
selected = dhs_daily_df.iloc[:, desired_columns]
#selected.columns
dhs_daily_df.columns

Index(['total_adults_in_shelter', 'total_children_in_shelter',
       'total_individuals_in_shelter', 'single_adult_men_in_shelter',
       'single_adult_women_in_shelter', 'total_single_adults_in_shelter',
       'families_with_children_in_shelter',
       'adults_in_families_with_children_in_shelter',
       'children_in_families_with_children_in_shelter',
       'total_individuals_in_families_with_children_in_shelter_',
       'adult_families_in_shelter', 'individuals_in_adult_families_in_shelter',
       'perc_total_adults_in_shelter', 'perc_total_children_in_shelter',
       'perc_single_adult_men_in_shelter',
       'perc_single_adult_women_in_shelter',
       'perc_total_single_adults_in_shelter',
       'perc_families_with_children_in_shelter',
       'perc_adults_in_families_with_children_in_shelter',
       'perc_children_in_families_with_children_in_shelter',
       'perc_total_individuals_in_families_with_children_in_shelter_',
       'perc_adult_families_in_shelter',
     

In [827]:
# filter functions

# date filter
def get_date(date_of_census, start_date, end_date):
    """Return True when date_of_census falls within [start_date, end_date]."""
    return start_date <= date_of_census <= end_date

# column filter
def get_columns(df, desired_columns):
    """Return the columns of df whose positions are listed in desired_columns."""
    columns = [column for i, column in enumerate(df.columns) if i in desired_columns]
    return df[columns]
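The two filters can also be combined with a boolean mask on the index, which avoids calling a Python function per row. This is a sketch, and `select_window` is a hypothetical name. It assumes the frame is indexed by date, as `dhs_daily_df` is after `set_index`.

```python
import pandas as pd
from datetime import datetime


def select_window(df, column_positions, start_date, end_date):
    """Hypothetical helper: pick rows by date range and columns by position."""
    mask = (df.index >= start_date) & (df.index <= end_date)
    return df.loc[mask].iloc[:, column_positions]


frame = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]},  # toy columns
    index=pd.to_datetime(["2023-07-08", "2022-06-01", "2013-08-21"]),
)

window = select_window(frame, [0, 2], datetime(2013, 1, 1), datetime(2022, 12, 31))
print(window)
```

The mask works regardless of whether the index is sorted ascending or descending, which matters here because the DHS data arrives newest first.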

In [828]:
desired_columns = [0,1,2,3,4,5,6,7,8,9,10]
start_date = datetime(2013, 1, 1)
end_date = datetime(2022, 12, 31)

# Select columns by position
filtered_data = get_columns(dhs_daily_df,desired_columns)

# Apply date filter
filtered_data = filtered_data[filtered_data.index.to_series().apply(get_date, start_date=start_date, end_date=end_date)]
filtered_data.reset_index(inplace=True)
filtered_data

# Set the template to 'plotly_dark'
pio.templates.default = "plotly_dark"

# Create the line chart
fig = px.line(filtered_data, x='date_of_census', y='total_individuals_in_shelter',
             color_discrete_sequence=['orange',],
             labels=dict(
                 date_of_census='Date',
                 total_individuals_in_shelter='Total Sheltered Individuals'
             ),             
             title='Daily Total Individuals in NYC Shelters',            
             )

# Customize the layout if needed
fig.update_layout( xaxis_tickangle=0,
                  legend=dict(orientation='h', yanchor='bottom', y=1.05, xanchor='left', x=0),                  
                  yaxis=dict(tickmode='auto', tickformat='d', range=[40000, filtered_data['total_individuals_in_shelter'].max()]),
                  margin=dict(l=100, r=50, t=80, b=20),  # Adjust the left and right margin values as desired
                  width=1500,  # Set the desired width of the figure
                  height=500  # Set the desired height of the figure
                  )


# Display the plot
fig.show()

In [829]:
import dash
from dash import dcc, html, Input, Output

app = dash.Dash(__name__)

date_range_select = dcc.DatePickerRange(
        id="date-range",
        min_date_allowed=datetime(2013, 1, 1),
        max_date_allowed=datetime.today(),
        start_date=datetime(2013, 1, 1),
        end_date=datetime.today(),
    )

output = dcc.Graph(style={'width':'90vw',
                          'height':'50vw'},
                   id="graph")

app.layout = html.Div([
    dbc.Row(date_range_select),
    dbc.Row(output),
])

@app.callback(
    Output("graph", "figure"),
    Input("date-range", "start_date"),
    Input("date-range", "end_date")
)
def data_visualization(start_date, end_date):
    # Select columns by position
    filtered_data = get_columns(dhs_daily_df, desired_columns)

    # Apply date filter
    filtered_data = filtered_data[
        (filtered_data.index >= start_date) & (filtered_data.index <= end_date)
    ]

    # Set the template to 'plotly_dark'
    pio.templates.default = "plotly_dark"

    # Create the line chart
    fig = px.line(
        filtered_data,
        x=filtered_data.index,
        y='total_individuals_in_shelter',
        color_discrete_sequence=['orange'],
        labels=dict(
            date_of_census='Date',
            total_individuals_in_shelter='Total Sheltered Individuals'
        ),
        title='Daily Total Individuals in NYC Shelters'
    )

    # Customize the layout if needed
    fig.update_layout(
        xaxis_tickangle=0,
        legend=dict(orientation='h', yanchor='bottom', y=1.05, xanchor='left', x=0),
        yaxis=dict(tickmode='auto', tickformat='d', range=[40000, filtered_data['total_individuals_in_shelter'].max()]),
        margin=dict(l=100, r=50, t=80, b=20),  # Adjust the left and right margin values as desired
        width=1500,  # Set the desired width of the figure
        height=500  # Set the desired height of the figure
    )
    return fig

if __name__ == '__main__':
    app.run_server(debug=True)

In [830]:
desired_columns = [0,1,2,3,4,5,6,7,8,9,10]
start_date = datetime(2013, 1, 1)
end_date = datetime(2022, 12, 31)

# Select columns by position
filtered_data = get_columns(monthly_avg_df,desired_columns)

# Apply date filter
filtered_data = filtered_data[filtered_data.index.to_series().apply(get_date, start_date=start_date, end_date=end_date)]
filtered_data.reset_index(inplace=True)
filtered_data

# Set the template to 'plotly_dark'
pio.templates.default = "plotly_dark"

# Create the line chart
fig = px.line(filtered_data, x='date_of_census', y=['perc_total_adults_in_shelter', 'perc_total_children_in_shelter','perc_total_single_adults_in_shelter','perc_total_individuals_in_families_with_children_in_shelter_',],
             color_discrete_sequence=['silver','purple','orange', 'sienna'],
             labels={'date_of_census': 'Year_Start', 'value': 'Percentage'},
             title='Average Population in NYC Shelters (%)',
             
             )

# Customize the layout if needed
fig.update_layout( xaxis_tickangle=0,
                  legend=dict(orientation='h', yanchor='bottom', y=1.05, xanchor='left', x=0),
                  xaxis=dict(tickmode='auto', tickformat='%Y'),
                  width=1500,  # Set the desired width of the figure
                  height=500  # Set the desired height of the figure
                  )

# Display the plot
fig.show()

In [831]:
desired_columns = [0,1,2,3,4,5,6,7,8,9,10]
start_date = datetime(2013, 1, 1)
end_date = datetime(2022, 12, 31)

# Select columns by position
filtered_data = get_columns(monthly_avg_df,desired_columns)

# Apply date filter
filtered_data = filtered_data[filtered_data.index.to_series().apply(get_date, start_date=start_date, end_date=end_date)]
filtered_data.reset_index(inplace=True)
filtered_data

# Set the template to 'plotly_dark'
pio.templates.default = "plotly_dark"

# Create the line chart
fig = px.line(filtered_data, x='date_of_census', y=['perc_total_adults_in_shelter', 'perc_total_children_in_shelter'],
             color_discrete_sequence=['silver', 'orange'],
             labels={'date_of_census': 'Year_Start', 'value': 'Percentage'},
             title='Average Population in NYC Shelters (%)',
             width=1500,  # Set the desired width of the figure
             height=500  # Set the desired height of the figure
             )

# Keep both traces on the primary y-axis (no secondary axis is created)
fig.update_traces(yaxis="y")

# Attach percentage labels as trace text (line traces show it on hover unless the mode includes 'text')
texts = [filtered_data['perc_total_adults_in_shelter'].apply(lambda x: f'{round(x)}%'),
         filtered_data['perc_total_children_in_shelter'].apply(lambda x: f'{round(x)}%')]
for i, t in enumerate(texts):
    fig.data[i].text = t

# Customize the layout if needed
fig.update_layout(xaxis_tickangle=0,
                  legend=dict(orientation='h', yanchor='bottom', y=1.05, xanchor='left', x=0),
                  xaxis=dict(tickmode='auto', tickformat='%Y'),
                  yaxis=dict(title='Percentage', side='left'),  # Set side='left' for the first y-axis
                  )

# Display the plot
fig.show()


##### distribution

In [832]:
desired_columns = [0,1,2,3,4,5,6,7,8,9,10]
start_date = datetime(2013, 1, 1)
end_date = datetime(2022, 12, 31)

# Select columns by position
filtered_data = get_columns(yearly_avg_df,desired_columns)

# Apply date filter
filtered_data = filtered_data[filtered_data.index.to_series().apply(get_date, start_date=start_date, end_date=end_date)]
filtered_data.reset_index(inplace=True)
filtered_data

# Set the template to 'plotly_dark'
pio.templates.default = "plotly_dark"

# Create the bar chart
fig = px.bar(filtered_data, x='date_of_census', y=['perc_total_adults_in_shelter', 'perc_total_children_in_shelter'],
             color_discrete_sequence=['silver', 'orange'],
             labels={'date_of_census': 'Year_Start', 'value': 'Percentage'},
             title='Percentage of Adults vs. Children in NYC Shelter by Year',
             
             )

# Customize the layout if needed
fig.update_layout(barmode='group', xaxis_tickangle=0,
                  legend=dict(orientation='h', yanchor='bottom', y=1.05, xanchor='left', x=0),
                  xaxis=dict(tickmode='auto', tickformat='%Y'),
                  width=1500,  # Set the desired width of the figure
                  height=500  # Set the desired height of the figure
                  )

# Add data labels
texts = [filtered_data['perc_total_adults_in_shelter'].apply(lambda x: f'{round(x)}%'),
         filtered_data['perc_total_children_in_shelter'].apply(lambda x: f'{round(x)}%')]
for i, t in enumerate(texts):
    fig.data[i].text = t
fig.update_traces(textposition='inside', textfont=dict(color='black', size=10), textangle=0)

# Display the plot
fig.show()

In [833]:
desired_columns = [0,1,2,3,4,5,6,7,8,9,10]
start_date = datetime(2013, 1, 1)
end_date = datetime(2022, 12, 31)

# Select columns by position
filtered_data = get_columns(yearly_avg_df,desired_columns)

# Apply date filter
filtered_data = filtered_data[filtered_data.index.to_series().apply(get_date, start_date=start_date, end_date=end_date)]
filtered_data.reset_index(inplace=True)
filtered_data

# Set the template to 'plotly_dark'
pio.templates.default = "plotly_dark"

# Create the bar chart
fig = px.bar(filtered_data, x='date_of_census', y=['perc_single_adult_men_in_shelter', 'perc_single_adult_women_in_shelter'],
             color_discrete_sequence=['sienna', 'silver'],
             labels={'date_of_census': 'Year_Start', 'value': 'Percentage'},
             title='Percentage of Single Adults by Gender in NYC Shelter by Year',
             
             )

# Customize the layout if needed
fig.update_layout(barmode='group', xaxis_tickangle=0,
                  legend=dict(orientation='h', yanchor='bottom', y=1.05, xanchor='left', x=0),
                  width=1500,  # Set the desired width of the figure
                  height=500  # Set the desired height of the figure
                  )


# Add data labels
texts = [filtered_data['perc_single_adult_men_in_shelter'].apply(lambda x: f'{round(x)}%'),
         filtered_data['perc_single_adult_women_in_shelter'].apply(lambda x: f'{round(x)}%')]
for i, t in enumerate(texts):
    fig.data[i].text = t
fig.update_traces(textposition='inside', textfont=dict(color='black', size=10), textangle=0)

# Display the plot
fig.show()

In [834]:
desired_columns = [0,1,2,3,4,5,6,7,8,9,10]
start_date = datetime(2013, 1, 1)
end_date = datetime(2022, 12, 31)

# Select columns by position
filtered_data = get_columns(yearly_avg_df,desired_columns)

# Apply date filter
filtered_data = filtered_data[filtered_data.index.to_series().apply(get_date, start_date=start_date, end_date=end_date)]
filtered_data.reset_index(inplace=True)
filtered_data

# Set the template to 'plotly_dark'
pio.templates.default = "plotly_dark"

# Create the bar chart
fig = px.bar(filtered_data, x='date_of_census', y=['perc_total_individuals_in_families_with_children_in_shelter_','perc_total_single_adults_in_shelter','perc_individuals_in_adult_families_in_shelter'],
             color_discrete_sequence=['sienna', 'orange','silver'],
             labels={'date_of_census': 'Year_Start', 'value': 'Percentage'},
             title='Percentage of Individuals by Household Type in NYC Shelter by Year',
             width=1500,  # Set the desired width of the figure
             height=500  # Set the desired height of the figure
             )

# Customize the layout if needed
fig.update_layout(barmode='group', xaxis_tickangle=0,
                  legend=dict(orientation='h', yanchor='bottom', y=1.05, xanchor='left', x=0))

# Add data labels
texts = [filtered_data['perc_total_individuals_in_families_with_children_in_shelter_'].apply(lambda x: f'{round(x)}%'),
         filtered_data['perc_total_single_adults_in_shelter'].apply(lambda x: f'{round(x)}%'),
         filtered_data['perc_individuals_in_adult_families_in_shelter'].apply(lambda x: f'{round(x)}%'),
         ]
for i, t in enumerate(texts):
    fig.data[i].text = t
fig.update_traces(textposition='inside', textfont=dict(color='black', size=10), textangle=0)

# Display the plot
fig.show()

##### correlation analyses

In [835]:
# correlation_matrix = filtered_data.corr()  # Correlation matrix
# sns.heatmap(correlation_matrix, annot=True, cmap='rocket_r')
# plt.show()

In [836]:
# desired_columns = [0,1,2,3,4,5,6,7,8]
# filtered_data = get_columns(quarterly_avg_df,desired_columns)
#
# correlation_matrix = filtered_data.corr()  # Correlation matrix
# sns.heatmap(correlation_matrix, annot=True, cmap='rocket_r')
# plt.show()

## Forecasting

In [837]:
# forecasting
def forecast_sarimax(time_series, forecast_steps):
    forecasts = {}

    for column in time_series.columns:
        # Extract the individual time series
        ts = time_series[column]

        # Fit an AR(1) model via SARIMAX (order=(1, 0, 0), no seasonal terms)
        model = sm.tsa.SARIMAX(ts, order=(1, 0, 0), seasonal_order=(0, 0, 0, 0))
        model_fit = model.fit()

        # Forecast future values
        forecast = model_fit.get_forecast(steps=forecast_steps)

        # Extract the forecasted values and confidence intervals
        forecast_values = forecast.predicted_mean
        forecast_ci = forecast.conf_int()

        # Create a DataFrame with the forecasted values and confidence intervals
        forecast_df = pd.DataFrame({'Forecast': forecast_values,
                                    'Lower CI': forecast_ci.iloc[:, 0],
                                    'Upper CI': forecast_ci.iloc[:, 1]})

        # Store the forecast for the current variable
        forecasts[column] = forecast_df

    return forecasts
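With `order=(1, 0, 0)` and no seasonal terms, the model above reduces to a first-order autoregression. As far as I can tell, SARIMAX adds no constant by default, so the model is `y_t = phi * y_{t-1} + e_t` and the h-step forecast is `phi**h` times the last observation. The sketch below illustrates that recursion in plain Python. It estimates `phi` by least squares, which is a simplifying assumption (statsmodels fits by maximum likelihood), and `fit_ar1` and `forecast_ar1` are hypothetical names.

```python
def fit_ar1(series):
    """Least-squares estimate of phi in y_t = phi * y_{t-1} + e_t (no constant)."""
    pairs = list(zip(series[:-1], series[1:]))
    numerator = sum(prev * curr for prev, curr in pairs)
    denominator = sum(prev * prev for prev, _ in pairs)
    return numerator / denominator


def forecast_ar1(last_value, phi, steps):
    """Iterate the recursion forward; step h equals phi**h * last_value."""
    forecasts = []
    value = last_value
    for _ in range(steps):
        value = phi * value
        forecasts.append(value)
    return forecasts


history = [1.0, 0.5, 0.25, 0.125]  # toy series that halves each step
phi = fit_ar1(history)
future = forecast_ar1(history[-1], phi, steps=2)
print(phi, future)
```

One consequence worth checking before relying on the forecasts: without a constant, predictions for any `|phi| < 1` drift toward zero, which may not suit percentage series that hover around a stable nonzero level.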

In [838]:
# forecasts = forecast_sarimax(filtered_data, forecast_steps=5)
# variable_name = 'perc_total_adults_in_shelter'
# forecast_variable_1 = forecasts[variable_name]
# forecast_variable_1