### Homeless Not Hopeless ###

- Data Source:
  - Housing Availability and Affordability:
    - Monthly Rent Data/Rent Burden/Median Income: https://data.cccnewyork.org/

    - Children in Shelter: https://data.cccnewyork.org/data

- Project Description: In this project, homelessness_study, we used data from several sources to examine why homelessness is getting worse in NYC. Many factors contribute to homelessness, such as:
    - The increasing price of renting
    - Housing is STILL not affordable
    - Median Income of Renters: $34,255
    - Average 2- Bedroom Rent: $1,651
    - Income to afford that rent: $59,430
    - Other factors
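
To make the affordability gap concrete, here is a small sketch that only uses the three figures quoted in the list above. The variable names are our own, and the "share of income" line is simply annual rent divided by median renter income, which is an illustrative ratio rather than an official affordability measure.

```python
# Figures quoted in the project description above
median_renter_income = 34_255   # annual, USD
avg_two_bedroom_rent = 1_651    # monthly, USD
income_needed = 59_430          # annual, USD

# Derived quantities (variable names are illustrative)
annual_rent = avg_two_bedroom_rent * 12
income_gap = income_needed - median_renter_income
rent_share_of_median_income = annual_rent / median_renter_income

print(f"Annual rent: ${annual_rent:,}")
print(f"Income gap: ${income_gap:,}")
print(f"Rent as a share of median renter income: {rent_share_of_median_income:.1%}")
```

Under these figures, a median renter would need roughly $25,000 more per year to reach the income level quoted as affordable.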


## Libraries

In [None]:
import pandas as pd
import numpy as np
# Visualizing time series data
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(rc={'figure.figsize':(11, 4)})
import matplotlib.dates as mdates

%matplotlib inline 
import scipy.stats as stats
import re

## An Overview of Monthly Rent Data
- What are the minimum and maximum amounts people can afford to pay for rent?

In [None]:
# If you want to upload the files into Google Colab instead, use this code:
# from google.colab import files
# uploaded=files.upload()
# Median Monthly Rent
# path1='/content/Median Monthly Rent.csv'
# #Creating dataframe
# df=pd.read_csv(path1, skiprows=5)
# df.head(5)

In [None]:
!curl https://raw.githubusercontent.com/MBouchaqour/homelessness_project/main/data/rent/Median%20Monthly%20Rent.csv -o monthly_rent.csv


In [None]:
df=pd.read_csv("monthly_rent.csv")
df.head(5)

In [None]:
#The period of data
start_date=df['TimeFrame'].min()
end_date=df['TimeFrame'].max()
print(f'Data is ranged between {start_date} and {end_date}')

In [None]:
# Filtering locations: keep only the five boroughs
target_location=['Manhattan','Queens','Staten Island','Bronx','Brooklyn']
filter_by_location=df['Location'].isin(target_location)
df=df[filter_by_location]
df.head(5)

In [None]:
data_info=[('Null values check: ', df.isnull().sum().sort_values(ascending=False)),
           ('Duplicate data: ', df.duplicated().sum()), 
           ('Type: ' , df.dtypes) ]
for item in data_info:
  print(item[0], end='\t')
  print(item[1])

In [None]:
pivot = df.pivot_table(
    index=['Location'], 
    values=['Data'], 
    aggfunc='mean')
pivot

In [None]:
# colours = ["green", "red", "purple", "blue", "orange" ]
pivot.plot(kind='bar', colormap='Paired')
plt.title("Monthly Median Rent By Boroughs")
plt.xlabel("Boroughs")
plt.ylabel("Monthly Median Rent")
plt.show()

* The graph above shows that Manhattan has the highest median rent, while the Bronx has the lowest.

In [None]:
plt.scatter(df['TimeFrame'],df['Data'])

plt.title("The increasing price in rent with time")
plt.xlabel('Years')
plt.ylabel('Rent Pricing')
plt.show()

* According to the data, rent has been rising year after year, which may contribute to increasing homelessness.
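
One way to put a number on "rising year after year" is to fit a straight line to the yearly mean rent. The sketch below assumes the same `TimeFrame` and `Data` column names as the rent dataframe; the helper name `rent_trend_per_year` is hypothetical and the toy values are made up for illustration, not taken from the dataset.

```python
import numpy as np
import pandas as pd

def rent_trend_per_year(frame, year_col="TimeFrame", value_col="Data"):
    """Return the least-squares slope of the yearly mean of value_col."""
    yearly = frame.groupby(year_col)[value_col].mean()
    slope, _intercept = np.polyfit(yearly.index.astype(float), yearly.values, deg=1)
    return slope

# Illustrative values only, not taken from the dataset
toy = pd.DataFrame({
    "TimeFrame": [2015, 2015, 2016, 2016, 2017, 2017],
    "Data": [1000.0, 1200.0, 1050.0, 1250.0, 1100.0, 1300.0],
})

# The yearly means rise by 50 each year, so the slope is close to 50
print(rent_trend_per_year(toy))
```

A positive slope means the average rent grows over the period; applying the helper to `df` would give the average yearly change in the real data.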

## Rent Burden: What percentage of their income do low-income households pay for rent?

In [None]:
# If working from Google Colab with local files, use this code:
# from google.colab import files
# uploaded=files.upload()
# # Renting in Burden
# path1='/content/Severe Rent Burden.csv'
# #Creating dataframe
# df_2=pd.read_csv(path1, skiprows=5)
# df_2.head(5)

In [None]:
!curl https://raw.githubusercontent.com/MBouchaqour/homelessness_project/main/data/rent/Severe%20Rent%20Burden.csv -o severe_rent_burden.csv


In [None]:
df_2=pd.read_csv("severe_rent_burden.csv")
df_2.head(5)

In [None]:
# Filtering locations: keep only the five boroughs
target_location=['Manhattan','Queens','Staten Island','Bronx','Brooklyn']
filter_by_location=df_2['Location'].isin(target_location)
df_2=df_2[filter_by_location]
df_2.head(5)

In [None]:
data_info=[('Null values check: ', df_2.isnull().sum().sort_values(ascending=False)),
           ('Duplicate data: ', df_2.duplicated().sum()), 
           ('Type: ' , df_2.dtypes) ]
for item in data_info:
  print(item[0], end='\t')
  print(item[1])

In [None]:
dt={
    'Location':df_2['Location'],
    'AverageIncome%':(df_2['Data']).round(2)
}
data_frame=pd.DataFrame.from_dict(dt)
pivot = data_frame.pivot_table(
    index=['Location'], 
    values=['AverageIncome%'], 
    aggfunc='mean')
pivot


In [None]:
# Reset the index so the rent share can be compared with income later; the result is table_per_income
table_per_income=pivot.reset_index()
table_per_income.style.format({'AverageIncome%': "{:.2%}"})

* Households in every borough except Manhattan pay almost 28% of their income in rent. For example, an individual who earns $25,000 a year and lives in Brooklyn pays at least $7,000 in rent, leaving $18,000. However, most people who are likely to become homeless have an income of either $0 or less than $1,000 a year. The next dataset investigates this point further.
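
The arithmetic in the example above can be written as a tiny helper. The function name `rent_and_remainder` is our own, and the 28% share is the rounded figure from the text.

```python
def rent_and_remainder(annual_income, rent_share):
    """Split an annual income into the rent portion and what is left over."""
    rent = annual_income * rent_share
    return rent, annual_income - rent

rent, remainder = rent_and_remainder(25_000, 0.28)
print(f"Rent: ${rent:,.0f}, remaining: ${remainder:,.0f}")
```

The same helper shows why very low incomes are the critical case: at $1,000 a year, a 28% share is only $280, far below any rent in the data.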

## Median Income Data

In [None]:
# If using Google Colab, use this code:
# from google.colab import files
# uploaded=files.upload()
# # Median Income
# path1='/content/Median Incomes.csv'
# #Creating dataframe
# df_3=pd.read_csv(path1, skiprows=5)
# df_3.head(5)

In [None]:
!curl https://raw.githubusercontent.com/MBouchaqour/homelessness_project/main/data/income/Median%20Incomes.csv -o median_income.csv

In [None]:
df_3=pd.read_csv("median_income.csv")
df_3.head(5)

In [None]:
# Filtering locations: keep only the five boroughs
target_location=['Manhattan','Queens','Staten Island','Bronx','Brooklyn']
filter_by_location=df_3['Location'].isin(target_location)
df_3=df_3[filter_by_location]
df_3.tail(5)

In [None]:
print('Null values: ', df_3.isnull().sum().sort_values(ascending=False))
print()
print("Duplicate: ", df_3.duplicated().sum())
print()
print('Data type: ', df_3.dtypes)

In [None]:
df_3['Data'] = df_3['Data'].astype(float)
print('Data type: ', df_3.dtypes)

In [None]:
pivot = df_3.pivot_table(
    index=['Location'], 
    columns=['Household Type'],
    values=['Data'], 
    aggfunc='median')
pivot

In [None]:
# Extract the median income of families with children so it can be compared with the share spent on rent; the result is table_income
table_income=pivot.reset_index()
Picked_cols=[('Location', ''), ('Data', 'Families with Children')]
table_income=table_income[Picked_cols]
new_cols=['Location','Families_With_Children']
table_income.columns=new_cols
table_income

In [None]:
pivot.plot()
plt.show()

* Note that the lowest income is in the Bronx, where Families with Children earn roughly $37,000. A later section investigates children in shelter.

In [None]:
# Investigating income based on year
def get_location(loc, data=df_3):
  # keep only the rows of `data` that belong to the requested borough
  cond=data['Location']==loc
  gp=data[cond]
  return gp

In [None]:
#choose the location
data=get_location('Bronx')

#See the pattern
plt.scatter(data['TimeFrame'],data['Data'])
plt.title('Bronx')
plt.xlabel('Year')
plt.ylabel('Median Income')
plt.show()

* Some more investigation regarding the income and how much goes for rent

In [None]:
# Merging the two dataframes table_income and table_per_income (joined on their shared 'Location' column)
merged_income_percentage = pd.merge(table_income, table_per_income)
merged_income_percentage['How much goes for rent']=merged_income_percentage['Families_With_Children'] * merged_income_percentage['AverageIncome%']
merged_income_percentage
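
The merge above relies on pandas picking the shared column automatically. A slightly more defensive sketch names the key explicitly and asks pandas to check that each borough appears once on both sides. The two small frames below are toy stand-ins for `table_income` and `table_per_income`, with made-up values.

```python
import pandas as pd

# Toy stand-ins for table_income and table_per_income (values are made up)
income = pd.DataFrame({"Location": ["Bronx", "Queens"],
                       "Families_With_Children": [40000.0, 60000.0]})
burden = pd.DataFrame({"Location": ["Bronx", "Queens"],
                       "AverageIncome%": [0.30, 0.25]})

# on= makes the join key explicit; validate= raises if a borough is duplicated
merged = pd.merge(income, burden, on="Location", how="inner", validate="one_to_one")
merged["How much goes for rent"] = (
    merged["Families_With_Children"] * merged["AverageIncome%"]
)
print(merged)
```

Note that the multiplication only gives a dollar amount if `AverageIncome%` is stored as a fraction (0.30) rather than a percentage (30).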

In [None]:
cond=merged_income_percentage['Location']=='Bronx'
data=merged_income_percentage[cond]
# barWidth = 0.25
# fig = plt.subplots(figsize =(12, 8))
# plot median income next to the amount that goes to rent for the selected borough
data.set_index('Location')[['Families_With_Children','How much goes for rent']].plot(kind='bar')
plt.title('Median Income vs. Rent')
plt.show()

## Children in Shelter

In [None]:
DHS_Daily_Report = pd.read_csv('https://raw.githubusercontent.com/MBouchaqour/homelessness_project/main/data/DHS_Daily_Report.csv')
DHS_Daily_Report.head(5)

In [None]:
# The period we deal with is started from 2013-08-21 and end at 2021-11-03
check_date=list(pd.date_range('2013-08-21', '2021-11-03', freq='D'))
check_date

In [None]:
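# Note: this cell and the cells below that use .loc['2013'] expect 'Date of Census'
# to be the datetime index, which is set in a later cell; run that cell first.
current_date= list(DHS_Daily_Report.index)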


In [None]:
DHS_Daily_Report.columns

In [None]:
DHS_Daily_Report.loc['2013']

In [None]:
diff=[item for item in check_date if item not in current_date ]
def filter_years(year,data=diff):
  new_list=[]
  for n in data:
    txt=str(n)
    if txt.find(year)!=-1:
      new_list.append(n)
  return new_list

In [None]:
filter_years('2013')
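
The list comprehension and string search above can also be expressed with index set operations, which tend to be faster on long date ranges. This is a sketch on a toy index; the helper name `missing_days` is hypothetical.

```python
import pandas as pd

def missing_days(observed, start, end):
    """Return the calendar days in [start, end] that are absent from observed."""
    expected = pd.date_range(start, end, freq="D")
    return expected.difference(pd.DatetimeIndex(observed))

# Toy example: two days are missing between 21 and 25 August 2013
observed = pd.to_datetime(["2013-08-21", "2013-08-22", "2013-08-25"])
gaps = missing_days(observed, "2013-08-21", "2013-08-25")
print(gaps)

# Filtering by year replaces the string search in filter_years
gaps_2013 = gaps[gaps.year == 2013]
```

With the real data, `observed` would be the datetime index of `DHS_Daily_Report` once 'Date of Census' has been set as the index.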

In [None]:
data={
       'Date of Census':filter_years('2013'),
      'Total Adults in Shelter':DHS_Daily_Report.loc['2013']['Total Adults in Shelter'].median(), 
      'Total Children in Shelter':DHS_Daily_Report.loc['2013']['Total Children in Shelter'].median(),
       'Total Individuals in Shelter':DHS_Daily_Report.loc['2013']['Total Individuals in Shelter'].median(), 
      'Single Adult Men in Shelter':DHS_Daily_Report.loc['2013']['Single Adult Men in Shelter'].median(),
       'Single Adult Women in Shelter':DHS_Daily_Report.loc['2013']['Single Adult Women in Shelter'].median(), 
      'Total Single Adults in Shelter':DHS_Daily_Report.loc['2013']['Total Single Adults in Shelter'].median(),
       'Families with Children in Shelter':DHS_Daily_Report.loc['2013']['Families with Children in Shelter'].median(),
       'Adults in Families with Children in Shelter':DHS_Daily_Report.loc['2013']['Adults in Families with Children in Shelter'].median(),
       'Children in Families with Children in Shelter':DHS_Daily_Report.loc['2013']['Children in Families with Children in Shelter'].median(),
       'Total Individuals in Families with Children in Shelter ':DHS_Daily_Report.loc['2013']['Total Individuals in Families with Children in Shelter '].median(),
       'Adult Families in Shelter':DHS_Daily_Report.loc['2013']['Adult Families in Shelter'].median(), 
      'Individuals in Adult Families in Shelter':DHS_Daily_Report.loc['2013']['Individuals in Adult Families in Shelter'].median()
      
}



In [None]:
print("data shape: ", DHS_Daily_Report.shape)
print("Data type: ", DHS_Daily_Report.dtypes)

In [None]:
#changing the date from object to date
DHS_Daily_Report['Date of Census']=pd.to_datetime(DHS_Daily_Report['Date of Census'])
DHS_Daily_Report.dtypes

In [None]:
DHS_Daily_Report.sort_values(by=['Date of Census'], inplace=True)
DHS_Daily_Report.head(5)

In [None]:
DHS_Daily_Report.isnull().sum() # checking for missing data

In [None]:
# Checking for duplicate data
n_dupes=DHS_Daily_Report.duplicated().sum()
print("Number of duplicate rows: %i." % n_dupes)

In [None]:
# Inspect the duplicated rows
duplicates=DHS_Daily_Report[DHS_Daily_Report.duplicated()]
duplicates

In [None]:
cond=DHS_Daily_Report['Date of Census']=="2021-10-21"
DHS_Daily_Report[cond]

In [None]:
DHS_Daily_Report.drop_duplicates(inplace=True)
n_dupes=DHS_Daily_Report.duplicated().sum()
print("Number of duplicate rows: %i." % n_dupes)


In [None]:
# Getting the date period
start_date=min(DHS_Daily_Report['Date of Census'])
end_date=max(DHS_Daily_Report['Date of Census'])
print(f"The period we deal with starts on {start_date.date()} and ends on {end_date.date()}")

In [None]:
#indexing the data by date and creating 3 columns
DHS_Daily_Report=DHS_Daily_Report.set_index('Date of Census')
DHS_Daily_Report['Year'] = DHS_Daily_Report.index.year
DHS_Daily_Report['Month'] = DHS_Daily_Report.index.month_name()
DHS_Daily_Report['Weekday_Name'] = DHS_Daily_Report.index.day_name()

In [None]:
#Target variables
Childrens=['Total Children in Shelter','Families with Children in Shelter']
Adults=['Total Adults in Shelter','Single Adult Men in Shelter']
Individuals=['Total Individuals in Shelter','Individuals in Adult Families in Shelter']

In [None]:
def slicing_Data(group, period, data=DHS_Daily_Report):
  try:
  #collecting the target group
    Childrens=['Total Children in Shelter','Families with Children in Shelter']
    Adults=['Total Adults in Shelter','Single Adult Men in Shelter']
    Individuals=['Total Individuals in Shelter','Individuals in Adult Families in Shelter']
    if group=='Childrens':
      group=Childrens[:]
    elif group=='Adults':
      group=Adults[:]
    elif group=='Individuals':
      group=Individuals[:]
    else:
      return 'Type does not exist'
    if period not in ["Year", "Month", "Weekday_Name"]:
      return 'You picked a wrong Period'
    group.append(period)
    df_table=data[group]
    return df_table
  except:
    print('An error occurs while we respond to your request')

In [None]:
# Slicing data by period (Year, Month, or Weekday_Name) and group (Childrens, Adults, or Individuals)
table=slicing_Data("Childrens","Year")
table

In [None]:
table.filter(
    items=['Total Children in Shelter','Families with Children in Shelter']
).plot()
plt.show()

* As the graph shows, there is a large drop in the number of children in shelter. We can't really infer the cause; the drop may be a good or a bad sign:
  - Good interpretation: children left the shelters and moved elsewhere, for example to homes where they can have a normal life.
  - Bad interpretation: children moved outdoors or onto the street because of poor conditions in the shelters (probably not safe), encouraged by the weather (when it is hot outside, families and children leave the shelters).

In [None]:
# Run this cell after you create the table above:
# This function returns a pivot table
def pivote_table(group, period, data=table):
  Childrens=['Total Children in Shelter','Families with Children in Shelter']
  Adults=['Total Adults in Shelter','Single Adult Men in Shelter']
  Individuals=['Total Individuals in Shelter','Individuals in Adult Families in Shelter']
  if group=='Childrens':
    group=Childrens[:]
  elif group=='Adults':
    group=Adults[:]
  elif group=='Individuals':
    group=Individuals[:]
  else:
    return 'Type does not exist'
  # use the `data` argument rather than the global `table`
  pivot=data.pivot_table(index=period,values=group,aggfunc='sum', margins=True)
  return pivot

In [None]:
#making sure you pick the same period (Year/Month/Day) as the table above
pivot =pivote_table("Childrens","Year")
pivot


In [None]:
#graphing the pivot table
# new_pivot=pivot.reset_index()
pivot.plot(kind='bar')
plt.show()

* This graph suggests that the number of children entering shelters increased in 2021. We will investigate this case further.
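
One caveat when reading yearly sums: the data starts on 2013-08-21 and ends on 2021-11-03, so the first and last years are partial, and a sum of daily counts grows with the number of days recorded. The sketch below uses a made-up constant daily count to show how the sum and the mean behave for a full and a partial year; comparing yearly means may therefore be a fairer basis.

```python
import pandas as pd

# Toy daily series: all of 2020 and only the first 100 days of 2021,
# with a constant (made-up) count of 10 per day
idx = pd.date_range("2020-01-01", "2021-04-10", freq="D")
counts = pd.Series(10, index=idx, name="Total Children in Shelter")

# Aggregate per calendar year in three ways
by_year = counts.groupby(counts.index.year).agg(["sum", "mean", "count"])
print(by_year)
```

Here the sums differ by a factor of more than three even though the daily count never changes, while the means are identical.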

In [None]:
#Intro to time series
#Checking the difference between the two children columns for each date

new_table=table.iloc[:,0:2].copy()  # copy so that adding a column does not trigger SettingWithCopyWarning
new_table.columns=['Total', 'T_familes']
new_table['Difference']=new_table.eval('Total - T_familes')
new_table

In [None]:
# dates range from 2013-08-21 to 2021-11-03
ax1 = new_table.loc['2021-8':'2021-9', 'Difference']
ax1.plot(linewidth=2.5)

* The increase we see at the beginning of 2021 is shown in more detail in the graph above. We saw a large drop again from January to July: families and children first decided to go back to the shelters and then started leaving them. From July to September there was almost no change, and then the numbers started increasing again in September.

In [None]:
# Rolling mean over the period selected above
# To investigate another period, change the slice in the cell above
ax = ax1.plot(alpha=0.30)
ax1.rolling(5).mean().plot(ax=ax)
ax.legend(["2021","Rolling Mean"])
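
As a reminder of what `rolling(5).mean()` does, here is a small sketch on a toy series. Each output value is the average of the current row and the four rows before it, so the first four values are missing unless `min_periods` is lowered.

```python
import pandas as pd

# Toy daily values, not taken from the dataset
values = pd.Series([1, 2, 3, 4, 5, 6, 7],
                   index=pd.date_range("2021-08-01", periods=7, freq="D"))

smoothed = values.rolling(5).mean()                         # NaN until 5 rows are available
smoothed_partial = values.rolling(5, min_periods=1).mean()  # averages whatever is available

print(pd.DataFrame({"raw": values,
                    "rolling_5": smoothed,
                    "min_periods_1": smoothed_partial}))
```

This is why the rolling-mean line in the plot starts a few days later than the raw line.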

In [None]:
#An overview of what is happening
new_table.plot()
plt.ylabel('Children in shelter (daily)');

In [None]:
#QS stands for Quarter Start
#MS stands for Month Start
#BA stands for Business year end
#Q stands for Quarter end

#Resampling to business-year-end means and checking the change in the graph
yearly = new_table.resample('BA').mean()
yearly.plot(style=[':', '--', '-'])
plt.ylabel('Yearly mean of children in shelter');
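
To see how the choice of alias changes the result, the sketch below resamples a toy daily series at two frequencies. The values are made up. We use the start-anchored aliases `MS` and `QS` here because, as far as we are aware, newer pandas releases deprecate some end-anchored spellings such as `Q` and `BA` in favor of `QE` and `BYE`.

```python
import pandas as pd

# Toy daily series covering the first quarter of 2021 (made-up counts 0..89)
idx = pd.date_range("2021-01-01", "2021-03-31", freq="D")
daily = pd.Series(range(len(idx)), index=idx)

monthly_mean = daily.resample("MS").mean()   # one value per month, labeled at the month start
quarterly_sum = daily.resample("QS").sum()   # one value per quarter, labeled at the quarter start

print(monthly_mean)
print(quarterly_sum)
```

The aggregation function matters as much as the frequency: `.mean()` keeps values on the scale of a daily count, while `.sum()` scales with the number of days in each bin.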

In [None]:
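#Quarterly totals
quarterly = new_table.resample('Q').sum()
# a rolling window of 1 leaves the values unchanged; increase it to smooth the curve
quarterly.rolling(1, center=True).sum().plot(style=[':', '--', '-'])
plt.ylabel('Quarterly total of daily counts');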

In [None]:
#More details in data
#Requesting data by a single date or by a range written as date1:date2
def data_time_base(given_date,col=DHS_Daily_Report.columns,data=DHS_Daily_Report):
  if ":" in given_date:
    # strip stray spaces so that 'date1: date2' also works
    start, end = [part.strip() for part in given_date.split(":")]
    if date_exist(start) and date_exist(end):
      return data.loc[start:end][col]
  elif given_date in data.index:
    return data.loc[given_date][col]
  # reached when the date, or either end of the range, is not in the data
  return 'Wrong date or not exist in data'

In [None]:
#Check if the date exists in the data
def date_exist(given_date,data=DHS_Daily_Report.index):
  return given_date in data
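
For comparison, a `DatetimeIndex` already supports partial-string selection through `.loc`, so a month or a range of months can be requested directly. The sketch below uses a toy frame with made-up counts; only the column name is borrowed from the dataset.

```python
import pandas as pd

# Toy frame indexed by day (counts are made up)
idx = pd.date_range("2021-06-28", "2021-08-03", freq="D")
frame = pd.DataFrame({"Total Children in Shelter": range(len(idx))}, index=idx)

# Every day in July 2021
july = frame.loc["2021-07", "Total Children in Shelter"]

# From the start of July through the last available day of August (both ends inclusive)
jul_to_aug = frame.loc["2021-07":"2021-08", "Total Children in Shelter"]

print(len(july), len(jul_to_aug))
```

The helper functions above add the friendlier error message; the slicing itself is done by pandas.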

In [None]:
#Picking a specific period within the dataset
reques_date=data_time_base('2021-7:2021-11','Total Children in Shelter')
reques_date.plot()
plt.show()

* Roughly speaking, from July to November 2020 we see a decrease in the number of children in shelters. In contrast, in the other years, 2013 through 2019 and also 2021, we see the same pattern: an increase in the number of children entering shelters.

In [None]:
reques_date=data_time_base('2021-07:2021-11','Total Children in Shelter')

In [None]:
reques_date.plot()
plt.show()

In [None]:
cols_plot =Childrens
axes=DHS_Daily_Report[cols_plot].plot(marker='.', alpha=0.5, linestyle='None', figsize=(11, 9), subplots=True)
# axes = data_time_base('2020-01:2020-02',cols_plot).plot(marker='.', alpha=0.5, linestyle='None', figsize=(11, 9), subplots=True)
for ax in axes:
    ax.set_ylabel('Children in shelter')

In [None]:
# Some visualization
ax=data_time_base('2017','Total Children in Shelter').plot()
ax.set_ylabel('Number of children in shelter');

In [None]:
ax =data_time_base('2017-01:2017-02','Total Children in Shelter').plot(marker='o', linestyle='-')
ax.set_ylabel('Number of children in shelter');