# SDCTA - RTFH Project

This notebook contains every data cleaning step that turns the raw data into the processed datasets. Each section is named after the CSV file it produces.

## Data Pipeline
Raw datasets:
- `CityExpendituresRaw.csv`
- `PITCount.csv`

| **INPUT DATASET(S)**  | **DATA CLEANING SCRIPT** | **OUTPUT DATASET(S)** |
| ------------- | ------------- | ------------- |
| `CityExpendituresRaw.csv`  | `raw_data_cleaning.ipynb`  | `processed.csv`  |
| `processed.csv`  | `updated_dataset.ipynb`  | `updated_dataset.csv`  |
| `updated_dataset.csv` `PITCount.csv`  | `RP_preprocessing_20241025.ipynb`  | `expenditures_and_PIT.csv` |
| `expenditures_and_PIT.csv` | `pivoted_dataset (1).ipynb` | `pivoted_dataset.csv`  |
| `expenditures_and_PIT.csv` `PITCount.csv`*  | `RP_preprocessing_20241025.ipynb`  | `pivoted_and_PIT.csv`  |
| `pivoted_and_PIT.csv` `expenditures_and_PIT.csv`  | `pivoted_pit_grantee.ipynb`  | `pivoted_pit_grantee.csv`  |

## Setup

In [203]:
# Imports
import numpy as np
import pandas as pd 

import os

## `processed.csv`
Input dataset:
- `CityExpendituresRaw.csv`

In [204]:
# Reading in data
raw = pd.read_csv("../data/raw/CityExpendituresRaw.csv")
raw.head(5)

Unnamed: 0,Unique.ID,Grantor,Grantee,Program,Year,Date,EndDate,Amount,AmendmentNumber,Funding.Agency,...,Issued,Funding.Type,Years,Average.By.Year,City.Year,Population,Amount.Per.Capita,Amount.Per.PEH,Population.PEH,ExpenditureType
0,,City of Imperial Beach,,,2022.0,,,0,,,...,,,,,City of Imperial Beach|2022,,$0.00,$0.00,0.0,Other/Unknown
1,,City of Imperial Beach,,,2021.0,,,0,,,...,,,,,City of Imperial Beach|2021,,$0.00,$0.00,0.0,Other/Unknown
2,,City of Imperial Beach,,,2020.0,,,0,,,...,,,,,City of Imperial Beach|2020,,$0.00,$0.00,16.0,Other/Unknown
3,,City of Imperial Beach,,,2019.0,,,0,,,...,,,,,City of Imperial Beach|2019,,$0.00,$0.00,12.0,Other/Unknown
4,,City of Imperial Beach,,,2018.0,,,0,,,...,,,,,City of Imperial Beach|2018,,$0.00,$0.00,7.0,Other/Unknown


**Basic Filtering**

The goal is to reduce the set of unique program names as far as possible by combining programs that do the same thing but are named slightly differently. As a first pass, checking whether any `Program` string contains the name of another program can shrink the set somewhat.


In [205]:
# Function to update program names by checking for occurrences in the 'Program' column
def update_program_column(processed):
    """
    Collapse program names that contain another program name as a substring.

    Parameters:
    - processed: The DataFrame containing the 'Program' column.

    Returns:
    - The same DataFrame, with the 'Program' column updated in place.
    """
    # Identity mapping of the unique program values present before any update
    program_mapping = {val: val for val in processed['Program'].dropna().unique()}

    # Iterate over each row in the DataFrame
    for index, row in processed.iterrows():
        program = row['Program']
        # Check if the program is a string
        if isinstance(program, str):
            # Relabel every row whose name contains this program name.
            # Note: str.contains treats `program` as a regular expression by default.
            processed.loc[processed['Program'].str.contains(program, na=False), 'Program'] = program_mapping[program]

    return processed


# Function to replace dashes with spaces in the 'Program' column
def replace_dashes_with_spaces(processed):
    """
    Replace all dashes in the 'Program' column with spaces.

    Parameters:
    - processed: The DataFrame containing the 'Program' column.

    Returns:
    - The same DataFrame, with dashes replaced by spaces in place.
    """
    # Replace dashes with spaces in the 'Program' column
    processed['Program'] = processed['Program'].str.replace('-', ' ', regex=False)

    return processed

# Create a copy of the raw data frame for data cleaning
processed = raw.copy()

# Convert all program names to lowercase for standardization
processed['Program'] = processed['Program'].str.lower()

# Replace dashes with spaces in the 'Program' column
processed = replace_dashes_with_spaces(processed)

# Update program names based on occurrences in the DataFrame
processed = update_program_column(processed)

# Display the number of unique program names before and after processing
print("Unique program names in raw data:", len(raw["Program"].unique()))
print("Unique program names in processed data:", len(processed['Program'].unique()))


Unique program names in raw data: 186
Unique program names in processed data: 107
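
`Series.str.contains` treats its pattern as a regular expression unless `regex=False` is passed, so program names that contain parentheses or periods (for example `project h.o.p.e.`) may match more or less than intended. The following is a minimal sketch of the same substring idea with literal matching. The helper `collapse_by_substring` and the toy names are illustrative assumptions, not part of the notebook, and its results may differ from the counts reported above.

```python
import pandas as pd


def collapse_by_substring(names: pd.Series) -> pd.Series:
    """Relabel every name that contains another name as a literal substring."""
    result = names.copy()
    # Each candidate name absorbs every current label that contains it.
    for program in names.dropna().unique():
        # regex=False: treat `program` as plain text, not a pattern.
        mask = result.str.contains(program, na=False, regex=False)
        result.loc[mask] = program
    return result


# Toy data (hypothetical program names).
toy = pd.Series(["outreach", "street outreach (pilot)", "shelter",
                 "emergency shelter", None])
collapsed = collapse_by_substring(toy)
print(collapsed.tolist())
print("unique before:", toy.nunique(), "after:", collapsed.nunique())
```

Because short names absorb longer ones, a very generic name such as `shelter` can swallow programs that are only loosely related, so the merged labels are worth reviewing by hand.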


**Manual Cleaning Attempt**

In [206]:
def convert_program_value(processed, value_to_convert, new_value):
    """
    Convert a specified value in the 'Program' column of the processed DataFrame.

    Parameters:
    - processed: The DataFrame containing the 'Program' column.
    - value_to_convert: The value in the 'Program' column that you want to convert.
    - new_value: The new value to replace the old value with.

    Returns:
    - The same DataFrame, with the specified conversion applied in place.
    """
    
    # Replace the specified value in the 'Program' column
    processed['Program'] = processed['Program'].replace(value_to_convert, new_value)

    return processed

In [207]:
processed = convert_program_value(
    processed,
    'address homeless issues through case management; provide food, shelter vouchers, as well as skill development for long-term self-sufficiency to 200 residents.',
    'address homeless issues through case management, provide food, shelter vouchers, and skill development for long-term self sufficiency'
)

processed = convert_program_value(
    processed,
    'outreeach',
    'outreach'
)
print("Unique program names in processed data:", processed['Program'].nunique())

Unique program names in processed data: 105
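
Note that this count is not directly comparable with the earlier ones: `len(series.unique())` counts `NaN` as a distinct value, while `series.nunique()` excludes it by default. A small sketch with toy data (the values are made up for illustration) shows the difference:

```python
import numpy as np
import pandas as pd

toy = pd.Series(["outreach", "shelter", np.nan, "outreach"])

with_nan = len(toy.unique())          # NaN is counted as a distinct value
without_nan = toy.nunique()           # NaN is excluded (dropna=True by default)
explicit = toy.nunique(dropna=False)  # NaN is included again

print(with_nan, without_nan, explicit)
```

Using the same counting method before and after a cleaning step keeps the reported reduction honest.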


This completes the data cleaning from which `processed.csv` is constructed.

In [208]:
# Saving is disabled here; uncomment the lines below to write the file.
# Create the 'data/processed' directory if it doesn't exist
# os.makedirs('../data/processed', exist_ok=True)

# Define the path for the CSV file (inside the directory created above)
# csv_file_path = os.path.join('..', 'data', 'processed', 'processed.csv')

# Save the DataFrame to a CSV file
# processed.to_csv(csv_file_path, index=False)

# print(f"Data saved to {csv_file_path}")
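
One way to keep the created directory and the written path in agreement is to build both from a single variable. The following is a minimal sketch, assuming a hypothetical helper named `save_csv`; it is not part of the notebook.

```python
from pathlib import Path

import pandas as pd


def save_csv(frame: pd.DataFrame, out_dir: Path, name: str) -> Path:
    """Create `out_dir` if needed and write `frame` there as a CSV file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / name
    frame.to_csv(path, index=False)  # index=False omits the row index
    return path


# Hypothetical usage inside this notebook:
# save_csv(processed, Path("../data/processed"), "processed.csv")
```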

## `updated_dataset.csv`
Input dataset:
- `processed.csv`

In [209]:
#df = pd.read_csv('../data/processed/processed.csv')
df = processed
df.head()

Unnamed: 0,Unique.ID,Grantor,Grantee,Program,Year,Date,EndDate,Amount,AmendmentNumber,Funding.Agency,...,Issued,Funding.Type,Years,Average.By.Year,City.Year,Population,Amount.Per.Capita,Amount.Per.PEH,Population.PEH,ExpenditureType
0,,City of Imperial Beach,,,2022.0,,,0,,,...,,,,,City of Imperial Beach|2022,,$0.00,$0.00,0.0,Other/Unknown
1,,City of Imperial Beach,,,2021.0,,,0,,,...,,,,,City of Imperial Beach|2021,,$0.00,$0.00,0.0,Other/Unknown
2,,City of Imperial Beach,,,2020.0,,,0,,,...,,,,,City of Imperial Beach|2020,,$0.00,$0.00,16.0,Other/Unknown
3,,City of Imperial Beach,,,2019.0,,,0,,,...,,,,,City of Imperial Beach|2019,,$0.00,$0.00,12.0,Other/Unknown
4,,City of Imperial Beach,,,2018.0,,,0,,,...,,,,,City of Imperial Beach|2018,,$0.00,$0.00,7.0,Other/Unknown


In [210]:
unique_program_names = df['Program'].unique()
print(unique_program_names)

[nan 'rental assistance' 'homeless services' 'shelter' 'motel voucher'
 'rapid re housing' 'project h.o.p.e.' 'take back the streets'
 'work for hope' 'housing navigator' 'improve fencing' 'hvac replacements'
 'acquisition of facilility for provision of homeless'
 'railing replacement' 'security fencing' 'outreach' 'scattered site'
 'supportive service  a way back home'
 'general funding for homelessness services' 'housing stability services'
 'housing navigation services' 'hygiene supplies' 'gift cards'
 'city housing support' 'fair housing' 'program development'
 '211 assistance' 'case management'
 'provide emergency housing to imminently homeless, or episodically and chronically homeless individuals and families in the city of santee, and who are unable to access housing during the coronavirus pandemic'
 'provide support for regional homeless service providers, networking and communication for organizations serving and impacted by homeless persons, and building capacity of the east 

In [211]:
program_mapping = {
    'rapid rehousing': 'rapid rehousing program',
    'nousing navigation': 'housing navigation services',
    'provide services to families, abused youth, seniors and veterans experiencing homelessness and domestic violence with housing and wrap around services.': 'flexible funds',
    'homless prevention plan': 'homeless prevention',
    'provide support for regional homeless service providers, networking and communication for organizations serving and impacted by homeless persons, and building capacity of the east county homeless task force': 'flexible funds',
    'motel program': 'motel voucher',
    'navigation center': 'housing navigation services',
    'hsg/econ dev homeless brochure': 'homelessness educational initiatives',
    'postage homelessness education': 'homelessness educational initiatives',
    'homelessness education mailer': 'homelessness educational initiatives',
    'think dignity tsc manual': 'homelessness educational initiatives',
    'homeless ed postcard design': 'homelessness educational initiatives',
}
df['Program'] = df['Program'].str.lower().str.strip()
df['Program'] = df['Program'].replace(program_mapping)

print(df['Program'].unique())

[nan 'rental assistance' 'homeless services' 'shelter' 'motel voucher'
 'rapid re housing' 'project h.o.p.e.' 'take back the streets'
 'work for hope' 'housing navigator' 'improve fencing' 'hvac replacements'
 'acquisition of facilility for provision of homeless'
 'railing replacement' 'security fencing' 'outreach' 'scattered site'
 'supportive service  a way back home'
 'general funding for homelessness services' 'housing stability services'
 'housing navigation services' 'hygiene supplies' 'gift cards'
 'city housing support' 'fair housing' 'program development'
 '211 assistance' 'case management'
 'provide emergency housing to imminently homeless, or episodically and chronically homeless individuals and families in the city of santee, and who are unable to access housing during the coronavirus pandemic'
 'flexible funds' 'point in time count' 'consulting services homeless'
 'communities in action' 'legal services solutions for change'
 'translation homeless plan mtg' 'cdbg consolida
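
`Series.replace` silently ignores dictionary keys that do not occur in the data, so a key that differs from the stored name by a space or a typo has no effect (for example, a key spelled `rapid rehousing` would not touch rows labelled `rapid re housing`). A minimal sketch of a check for such keys follows; `unmatched_keys` and the toy values are assumptions made for illustration.

```python
import pandas as pd


def unmatched_keys(series: pd.Series, mapping: dict) -> list:
    """Return mapping keys that never occur in `series` after lower/strip."""
    present = set(series.dropna().str.lower().str.strip())
    return [key for key in mapping if key not in present]


toy = pd.Series(["Rapid Re Housing ", "motel program", None])
toy_mapping = {
    "rapid rehousing": "rapid rehousing program",  # no space: will not match
    "motel program": "motel voucher",
}
print(unmatched_keys(toy, toy_mapping))
```

Running such a check before each `replace` call makes it easier to spot rules that no longer apply after an earlier cleaning pass.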

In [212]:
program_mapping = {
    'homelessness prevenetion': 'homelessness prevention', # fixed typo
    'provide emergency housing to imminently homeless, or episodically and chronically homeless individuals and families in the city of santee, and who are unable to access housing during the coronavirus pandemic': 'flexible funds',
    'homeless prevention and intervention': 'homlessness prevention and intervention',
    'rapid rehousing program': 'rapid re housing',
    'homlessness services': 'homeless services',
    'litter removal': 'neighborhood revitalization services',
    'emcampment/trash cleanup': 'neighborhood revitalization services',
    'facility imporvement': 'facility improvement', #fixed typo
    'hvac replacements': 'facility improvement',
    'improve fencing': 'railing/fencing improvements',
    'railing replacement': 'railing/fencing improvements',
    'security fencing': 'railing/fencing improvements',
}
df['Program'] = df['Program'].str.lower().str.strip()
df['Program'] = df['Program'].replace(program_mapping)

print(df['Program'].unique())

[nan 'rental assistance' 'homeless services' 'shelter' 'motel voucher'
 'rapid re housing' 'project h.o.p.e.' 'take back the streets'
 'work for hope' 'housing navigator' 'railing/fencing improvements'
 'facility improvement'
 'acquisition of facilility for provision of homeless' 'outreach'
 'scattered site' 'supportive service  a way back home'
 'general funding for homelessness services' 'housing stability services'
 'housing navigation services' 'hygiene supplies' 'gift cards'
 'city housing support' 'fair housing' 'program development'
 '211 assistance' 'case management' 'flexible funds' 'point in time count'
 'consulting services homeless' 'communities in action'
 'legal services solutions for change' 'translation homeless plan mtg'
 'cdbg consolidated plan consult' 'plha plan 7699835'
 'homeless prevention program' 'bridge to housing program'
 'social worker program' 'homeshare program'
 'homelessness educational initiatives'
 'regional task force homelessness meeting'
 'vista ho

In [213]:
program_mapping = {
    'rapid rehousing services': 'rapid rehousing',
    'homeshare for seniros': 'homeshare for seniors', # typo
    'cortez hill family center interim housing program': 'interim housing facility',
    'homeshare for seniors': 'homeshare program',
    'encampment/trash cleanup': 'neighborhood revitalization services',
    'permanent housing with supportive services and property management': 'fair housing',
    'homlessness prevention and intervention': 'homelessness prevention',
    'general homelessness services': 'homeless services',
    'homeless storage check in center': 'transitional storage',
}
df['Program'] = df['Program'].str.lower().str.strip()
df['Program'] = df['Program'].replace(program_mapping)

print(df['Program'].unique())

[nan 'rental assistance' 'homeless services' 'shelter' 'motel voucher'
 'rapid re housing' 'project h.o.p.e.' 'take back the streets'
 'work for hope' 'housing navigator' 'railing/fencing improvements'
 'facility improvement'
 'acquisition of facilility for provision of homeless' 'outreach'
 'scattered site' 'supportive service  a way back home'
 'general funding for homelessness services' 'housing stability services'
 'housing navigation services' 'hygiene supplies' 'gift cards'
 'city housing support' 'fair housing' 'program development'
 '211 assistance' 'case management' 'flexible funds' 'point in time count'
 'consulting services homeless' 'communities in action'
 'legal services solutions for change' 'translation homeless plan mtg'
 'cdbg consolidated plan consult' 'plha plan 7699835'
 'homeless prevention program' 'bridge to housing program'
 'social worker program' 'homeshare program'
 'homelessness educational initiatives'
 'regional task force homelessness meeting'
 'vista ho
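
The mapping in this cell sends `homeshare for seniros` to `homeshare for seniors` and, in the same dictionary, `homeshare for seniors` to `homeshare program`. Dictionary-based `replace` appears to match each key against the original values rather than chaining the rules, which would explain why the next cell repeats the second rule. A minimal sketch of flattening such chains up front is shown below; `resolve_chains` is a hypothetical helper, not the notebook's method.

```python
import pandas as pd


def resolve_chains(mapping: dict) -> dict:
    """Point every key at its final target by following key -> value links."""
    resolved = {}
    for key, value in mapping.items():
        seen = {key}
        # Follow the chain until the value is not a key (or a cycle appears).
        while value in mapping and value not in seen:
            seen.add(value)
            value = mapping[value]
        resolved[key] = value
    return resolved


chained = {
    "homeshare for seniros": "homeshare for seniors",
    "homeshare for seniors": "homeshare program",
}
flat = resolve_chains(chained)
toy = pd.Series(["homeshare for seniros", "homeshare for seniors", "shelter"])
print(toy.replace(flat).tolist())
```

With the chains flattened, one `replace` call gives the same labels regardless of how many intermediate spellings a name passes through.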

In [214]:
program_mapping = {
    'homeshare for seniors': 'homeshare program',
    'rapid rehousing': 'rapid re housing',
    'afforable housing fund services': 'housing assistance',
    'prevention programs': 'homeless prevention',
    'food drive services': 'food and nutrition',
    'emergency food supplies': 'food and nutrition',
    'bridge housing transitional housing': 'transitional housing and supportive services',
    'turning point transitional living program': 'transitional housing and supportive services',
}
df['Program'] = df['Program'].str.lower().str.strip()
df['Program'] = df['Program'].replace(program_mapping)

print(df['Program'].unique())

[nan 'rental assistance' 'homeless services' 'shelter' 'motel voucher'
 'rapid re housing' 'project h.o.p.e.' 'take back the streets'
 'work for hope' 'housing navigator' 'railing/fencing improvements'
 'facility improvement'
 'acquisition of facilility for provision of homeless' 'outreach'
 'scattered site' 'supportive service  a way back home'
 'general funding for homelessness services' 'housing stability services'
 'housing navigation services' 'hygiene supplies' 'gift cards'
 'city housing support' 'fair housing' 'program development'
 '211 assistance' 'case management' 'flexible funds' 'point in time count'
 'consulting services homeless' 'communities in action'
 'legal services solutions for change' 'translation homeless plan mtg'
 'cdbg consolidated plan consult' 'plha plan 7699835'
 'homeless prevention program' 'bridge to housing program'
 'social worker program' 'homeshare program'
 'homelessness educational initiatives'
 'regional task force homelessness meeting'
 'vista ho

In [215]:
program_mapping = {
    'facility improvements': 'facility improvement',
    'scattered site': 'shelter',
    'restroom rental': 'restrooms', # not sure if these need to be combined but they seem similar
    'homelessness response center services': 'service center',
    'operation of city of san diego day center for homeless adults': 'service center',
    'supportive service  a way back home': 'family reunification program',
    'safetay network program': 'outreach',
}
df['Program'] = df['Program'].str.lower().str.strip()
df['Program'] = df['Program'].replace(program_mapping)

print(df['Program'].unique())

[nan 'rental assistance' 'homeless services' 'shelter' 'motel voucher'
 'rapid re housing' 'project h.o.p.e.' 'take back the streets'
 'work for hope' 'housing navigator' 'railing/fencing improvements'
 'facility improvement'
 'acquisition of facilility for provision of homeless' 'outreach'
 'family reunification program'
 'general funding for homelessness services' 'housing stability services'
 'housing navigation services' 'hygiene supplies' 'gift cards'
 'city housing support' 'fair housing' 'program development'
 '211 assistance' 'case management' 'flexible funds' 'point in time count'
 'consulting services homeless' 'communities in action'
 'legal services solutions for change' 'translation homeless plan mtg'
 'cdbg consolidated plan consult' 'plha plan 7699835'
 'homeless prevention program' 'bridge to housing program'
 'social worker program' 'homeshare program'
 'homelessness educational initiatives'
 'regional task force homelessness meeting'
 'vista homeless prevention and ec

In [216]:
program_mapping = {
    'hygiene supplies': 'homeless services',
    'city housing support': 'housing assistance',
    'animal care': 'homeless services',
    'low income housing services': 'homeless services',
    'interprovider networking and program facilitation': 'flexible funds',
    'homeless prevention program': 'homeless prevention',
}
df['Program'] = df['Program'].str.lower().str.strip()
df['Program'] = df['Program'].replace(program_mapping)

print(df['Program'].unique())

[nan 'rental assistance' 'homeless services' 'shelter' 'motel voucher'
 'rapid re housing' 'project h.o.p.e.' 'take back the streets'
 'work for hope' 'housing navigator' 'railing/fencing improvements'
 'facility improvement'
 'acquisition of facilility for provision of homeless' 'outreach'
 'family reunification program'
 'general funding for homelessness services' 'housing stability services'
 'housing navigation services' 'gift cards' 'housing assistance'
 'fair housing' 'program development' '211 assistance' 'case management'
 'flexible funds' 'point in time count' 'consulting services homeless'
 'communities in action' 'legal services solutions for change'
 'translation homeless plan mtg' 'cdbg consolidated plan consult'
 'plha plan 7699835' 'homeless prevention' 'bridge to housing program'
 'social worker program' 'homeshare program'
 'homelessness educational initiatives'
 'regional task force homelessness meeting'
 'vista homeless prevention and economic recovery project'
 'hom

In [217]:
program_mapping = {
    'meal delivery for seniors': 'homeless services',
    'housing homeless assistance program (hhap) housing navigaiton/ casem management': 'flexible funds',
    'd76 housing prevention and intervention program': 'housing assistance',
    'vista homeless prevention and economic recovery project': 'flexible funds',
    'gift cards': 'homeless services', 
    'regional task force homelessness meeting': 'homeless prevention',
    'interim housing services for downtown chronically homeless': 'interim housing facility',
}
df['Program'] = df['Program'].str.lower().str.strip()
df['Program'] = df['Program'].replace(program_mapping)

print(df['Program'].unique())

[nan 'rental assistance' 'homeless services' 'shelter' 'motel voucher'
 'rapid re housing' 'project h.o.p.e.' 'take back the streets'
 'work for hope' 'housing navigator' 'railing/fencing improvements'
 'facility improvement'
 'acquisition of facilility for provision of homeless' 'outreach'
 'family reunification program'
 'general funding for homelessness services' 'housing stability services'
 'housing navigation services' 'housing assistance' 'fair housing'
 'program development' '211 assistance' 'case management' 'flexible funds'
 'point in time count' 'consulting services homeless'
 'communities in action' 'legal services solutions for change'
 'translation homeless plan mtg' 'cdbg consolidated plan consult'
 'plha plan 7699835' 'homeless prevention' 'bridge to housing program'
 'social worker program' 'homeshare program'
 'homelessness educational initiatives' 'homelessness prevention'
 'training/technical assistance' 'restrooms' 'service center'
 'neighborhood revitalization ser

In [218]:
program_mapping = {
    'womens resource center transitional housing': 'transitional housing and supportive services',
    'fair housing': 'housing assistance', # not sure about this one
    'clinical social worker': 'case management',
    'program development': 'general funding for homelessness services',
    'social worker program': 'case management',
    'covid 19 homeless response full time caseworker': 'case management',
    'acquisition of facilility for provision of homeless': 'housing assistance',
}
df['Program'] = df['Program'].str.lower().str.strip()
df['Program'] = df['Program'].replace(program_mapping)

print(df['Program'].unique())

[nan 'rental assistance' 'homeless services' 'shelter' 'motel voucher'
 'rapid re housing' 'project h.o.p.e.' 'take back the streets'
 'work for hope' 'housing navigator' 'railing/fencing improvements'
 'facility improvement' 'housing assistance' 'outreach'
 'family reunification program'
 'general funding for homelessness services' 'housing stability services'
 'housing navigation services' '211 assistance' 'case management'
 'flexible funds' 'point in time count' 'consulting services homeless'
 'communities in action' 'legal services solutions for change'
 'translation homeless plan mtg' 'cdbg consolidated plan consult'
 'plha plan 7699835' 'homeless prevention' 'bridge to housing program'
 'homeshare program' 'homelessness educational initiatives'
 'homelessness prevention' 'training/technical assistance' 'restrooms'
 'service center' 'neighborhood revitalization services'
 'employment/benefits' 'temporary housing and services'
 'food and nutrition' 'emergency stabalization and supp

In [219]:
program_mapping = {
    'housing navigator': 'housing navigation services',
    'homeless prevention': 'homelessness prevention',
}
df['Program'] = df['Program'].str.lower().str.strip()
df['Program'] = df['Program'].replace(program_mapping)

print(df['Program'].unique())

[nan 'rental assistance' 'homeless services' 'shelter' 'motel voucher'
 'rapid re housing' 'project h.o.p.e.' 'take back the streets'
 'work for hope' 'housing navigation services'
 'railing/fencing improvements' 'facility improvement'
 'housing assistance' 'outreach' 'family reunification program'
 'general funding for homelessness services' 'housing stability services'
 '211 assistance' 'case management' 'flexible funds' 'point in time count'
 'consulting services homeless' 'communities in action'
 'legal services solutions for change' 'translation homeless plan mtg'
 'cdbg consolidated plan consult' 'plha plan 7699835'
 'homelessness prevention' 'bridge to housing program' 'homeshare program'
 'homelessness educational initiatives' 'training/technical assistance'
 'restrooms' 'service center' 'neighborhood revitalization services'
 'employment/benefits' 'temporary housing and services'
 'food and nutrition' 'emergency stabalization and supportive services'
 'call center' 'safe parki

In [220]:
# df.to_csv('../data/processed/updated_dataset.csv', index=False)  # index=False prevents saving row numbers

## `expenditures_and_PIT.csv`

Input datasets: 
- `updated_dataset.csv`
- `PITCount.csv`

In [221]:
#exp = pd.read_csv('../data/processed/updated_dataset.csv')
exp = df
exp.head()

Unnamed: 0,Unique.ID,Grantor,Grantee,Program,Year,Date,EndDate,Amount,AmendmentNumber,Funding.Agency,...,Issued,Funding.Type,Years,Average.By.Year,City.Year,Population,Amount.Per.Capita,Amount.Per.PEH,Population.PEH,ExpenditureType
0,,City of Imperial Beach,,,2022.0,,,0,,,...,,,,,City of Imperial Beach|2022,,$0.00,$0.00,0.0,Other/Unknown
1,,City of Imperial Beach,,,2021.0,,,0,,,...,,,,,City of Imperial Beach|2021,,$0.00,$0.00,0.0,Other/Unknown
2,,City of Imperial Beach,,,2020.0,,,0,,,...,,,,,City of Imperial Beach|2020,,$0.00,$0.00,16.0,Other/Unknown
3,,City of Imperial Beach,,,2019.0,,,0,,,...,,,,,City of Imperial Beach|2019,,$0.00,$0.00,12.0,Other/Unknown
4,,City of Imperial Beach,,,2018.0,,,0,,,...,,,,,City of Imperial Beach|2018,,$0.00,$0.00,7.0,Other/Unknown


In [222]:
# Remove rows with a missing Date; copy so that later column assignments do not act on a view
exp = exp[exp['Date'].notna()].copy()
exp.head()

Unnamed: 0,Unique.ID,Grantor,Grantee,Program,Year,Date,EndDate,Amount,AmendmentNumber,Funding.Agency,...,Issued,Funding.Type,Years,Average.By.Year,City.Year,Population,Amount.Per.Capita,Amount.Per.PEH,Population.PEH,ExpenditureType
8,,City of Chula Vista,SBCS CORPORATION,rental assistance,2017.0,10/23/2017,,"$14,142.74",,,...,,,,,City of Chula Vista|2017,266427.0,$0.05,$38.54,367.0,Prevention
9,,City of Chula Vista,SBCS CORPORATION,rental assistance,2017.0,10/23/2017,,"$12,931.10",,,...,,,,,City of Chula Vista|2017,266427.0,$0.05,$35.23,367.0,Prevention
10,,City of Chula Vista,SBCS CORPORATION,rental assistance,2017.0,10/23/2017,,"$10,285.61",,,...,,,,,City of Chula Vista|2017,266427.0,$0.04,$28.03,367.0,Prevention
11,1068.0,City of Chula Vista,SBCS CORPORATION,homeless services,2017.0,11/7/2017,,$276.00,,,...,,,,,City of Chula Vista|2017,266427.0,$0.00,$0.75,367.0,Crisis Management
12,1067.0,City of Chula Vista,INTERFAITH SHELTER NETWORK,shelter,2017.0,11/8/2017,,"$1,971.50",,,...,,,,,City of Chula Vista|2017,266427.0,$0.01,$5.37,367.0,Crisis Management


In [223]:
exp.columns

Index(['Unique.ID', 'Grantor', 'Grantee', 'Program', 'Year', 'Date', 'EndDate',
       'Amount', 'AmendmentNumber', 'Funding.Agency', 'Funding.Source',
       'Category', 'Location', 'Issued', 'Funding.Type', 'Years',
       'Average.By.Year', 'City.Year', 'Population', 'Amount.Per.Capita',
       'Amount.Per.PEH', 'Population.PEH', 'ExpenditureType'],
      dtype='object')

In [224]:
columns_to_drop = ['Unique.ID', 'EndDate', 'AmendmentNumber', 'Funding.Agency', 'Funding.Source',
       'Category', 'Location', 'Issued', 'Funding.Type', 'Years',
       'Average.By.Year', 'City.Year', 'Population', 'Amount.Per.Capita',
       'Amount.Per.PEH', 'Population.PEH']
exp = exp.drop(columns=columns_to_drop)
exp.head()

|  | Grantor | Grantee | Program | Year | Date | Amount | ExpenditureType |
| ------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
| 8 | City of Chula Vista | SBCS CORPORATION | rental assistance | 2017.0 | 10/23/2017 | $14,142.74 | Prevention |
| 9 | City of Chula Vista | SBCS CORPORATION | rental assistance | 2017.0 | 10/23/2017 | $12,931.10 | Prevention |
| 10 | City of Chula Vista | SBCS CORPORATION | rental assistance | 2017.0 | 10/23/2017 | $10,285.61 | Prevention |
| 11 | City of Chula Vista | SBCS CORPORATION | homeless services | 2017.0 | 11/7/2017 | $276.00 | Crisis Management |
| 12 | City of Chula Vista | INTERFAITH SHELTER NETWORK | shelter | 2017.0 | 11/8/2017 | $1,971.50 | Crisis Management |


In [225]:
# Rename the columns
exp = exp.rename(columns={
    'Grantor': 'city', 
    'Grantee': 'grantee', 
    'Program': 'program', 
    'Year': 'year', 
    'Date': 'date', 
    'Amount': 'amount', 
    'ExpenditureType': 'exp_type'
})

# Convert the 'date' column to datetime. format='mixed' (pandas 2.0 or later) infers the
# format of each value separately; errors='coerce' turns unparseable values into NaT.
exp['date'] = pd.to_datetime(exp['date'], format='mixed', errors='coerce')

# Collect rows with invalid dates (i.e., rows where 'date' is NaT after coercion) for inspection
invalid_dates = exp[exp['date'].isna()]

# Add the "month" column with month names in word form
exp['month'] = exp['date'].dt.strftime('%B')
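
The coercion step can be illustrated on toy data. The sketch below is only an approximation of the cell above: it assumes a single `%m/%d/%Y` format instead of `format='mixed'`, and it uses `dt.month_name()` in place of `strftime('%B')` so that the month names do not depend on the system locale. The toy values are made up.

```python
import pandas as pd

toy = pd.DataFrame({"date": ["10/23/2017", "11/7/2017", "not a date"]})

# errors='coerce' turns unparseable strings into NaT instead of raising.
toy["date"] = pd.to_datetime(toy["date"], format="%m/%d/%Y", errors="coerce")

# Rows that failed to parse, kept for inspection before any further use.
invalid_dates = toy[toy["date"].isna()]
print(len(invalid_dates), "row(s) with invalid dates")

# Full English month name; rows with NaT yield a missing value.
toy["month"] = toy["date"].dt.month_name()
print(toy["month"].tolist())
```

Checking `invalid_dates` before moving on shows how many rows were silently turned into `NaT` by the coercion.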

In [226]:
exp['program'].unique()

array(['rental assistance', 'homeless services', 'shelter',
       'motel voucher', 'rapid re housing', 'project h.o.p.e.',
       'take back the streets', 'work for hope',
       'housing navigation services', 'railing/fencing improvements',
       'facility improvement', 'housing assistance', 'outreach',
       'family reunification program',
       'general funding for homelessness services',
       'housing stability services', '211 assistance', 'case management',
       'flexible funds', 'point in time count',
       'consulting services homeless', 'communities in action',
       'legal services solutions for change',
       'translation homeless plan mtg', 'cdbg consolidated plan consult',
       'plha plan 7699835', 'homelessness prevention',
       'bridge to housing program', 'homeshare program',
       'homelessness educational initiatives',
       'training/technical assistance', 'restrooms', 'service center',
       'neighborhood revitalization services', 'employment/benefi

In [227]:
#cleanexp = pd.read_csv('../data/processed/updated_dataset.csv')
cleanexp = df
cleanexp.head()

Unnamed: 0,Unique.ID,Grantor,Grantee,Program,Year,Date,EndDate,Amount,AmendmentNumber,Funding.Agency,...,Issued,Funding.Type,Years,Average.By.Year,City.Year,Population,Amount.Per.Capita,Amount.Per.PEH,Population.PEH,ExpenditureType
0,,City of Imperial Beach,,,2022.0,,,0,,,...,,,,,City of Imperial Beach|2022,,$0.00,$0.00,0.0,Other/Unknown
1,,City of Imperial Beach,,,2021.0,,,0,,,...,,,,,City of Imperial Beach|2021,,$0.00,$0.00,0.0,Other/Unknown
2,,City of Imperial Beach,,,2020.0,,,0,,,...,,,,,City of Imperial Beach|2020,,$0.00,$0.00,16.0,Other/Unknown
3,,City of Imperial Beach,,,2019.0,,,0,,,...,,,,,City of Imperial Beach|2019,,$0.00,$0.00,12.0,Other/Unknown
4,,City of Imperial Beach,,,2018.0,,,0,,,...,,,,,City of Imperial Beach|2018,,$0.00,$0.00,7.0,Other/Unknown


In [228]:
cleanexp['Program'].unique()

array([nan, 'rental assistance', 'homeless services', 'shelter',
       'motel voucher', 'rapid re housing', 'project h.o.p.e.',
       'take back the streets', 'work for hope',
       'housing navigation services', 'railing/fencing improvements',
       'facility improvement', 'housing assistance', 'outreach',
       'family reunification program',
       'general funding for homelessness services',
       'housing stability services', '211 assistance', 'case management',
       'flexible funds', 'point in time count',
       'consulting services homeless', 'communities in action',
       'legal services solutions for change',
       'translation homeless plan mtg', 'cdbg consolidated plan consult',
       'plha plan 7699835', 'homelessness prevention',
       'bridge to housing program', 'homeshare program',
       'homelessness educational initiatives',
       'training/technical assistance', 'restrooms', 'service center',
       'neighborhood revitalization services', 'employment/b

In [229]:
prog_to_drop = ['point in time count','railing/fencing improvements','facility improvement','training/technical assistance']
cleanexp = cleanexp[~cleanexp['Program'].isin(prog_to_drop)]
cleanexp = cleanexp.dropna(subset=['Program'])
cleanexp

Unnamed: 0,Unique.ID,Grantor,Grantee,Program,Year,Date,EndDate,Amount,AmendmentNumber,Funding.Agency,...,Issued,Funding.Type,Years,Average.By.Year,City.Year,Population,Amount.Per.Capita,Amount.Per.PEH,Population.PEH,ExpenditureType
8,,City of Chula Vista,SBCS CORPORATION,rental assistance,2017.0,10/23/2017,,"$14,142.74",,,...,,,,,City of Chula Vista|2017,266427.0,$0.05,$38.54,367.0,Prevention
9,,City of Chula Vista,SBCS CORPORATION,rental assistance,2017.0,10/23/2017,,"$12,931.10",,,...,,,,,City of Chula Vista|2017,266427.0,$0.05,$35.23,367.0,Prevention
10,,City of Chula Vista,SBCS CORPORATION,rental assistance,2017.0,10/23/2017,,"$10,285.61",,,...,,,,,City of Chula Vista|2017,266427.0,$0.04,$28.03,367.0,Prevention
11,1068,City of Chula Vista,SBCS CORPORATION,homeless services,2017.0,11/7/2017,,$276.00,,,...,,,,,City of Chula Vista|2017,266427.0,$0.00,$0.75,367.0,Crisis Management
12,1067,City of Chula Vista,INTERFAITH SHELTER NETWORK,shelter,2017.0,11/8/2017,,"$1,971.50",,,...,,,,,City of Chula Vista|2017,266427.0,$0.01,$5.37,367.0,Crisis Management
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2026,1066,City of Chula Vista,ONE TIME VENDOR,motel voucher,2021.0,12/23/2021,,$560.30,,CDBG,...,,,,,,,,,,Crisis Management
2027,,City of Chula Vista,KIKU GARDENS,rental assistance,2021.0,12/28/2021,,$212.00,,HOME,...,,,,,,,,,,Prevention
2028,,City of Chula Vista,KIKU GARDENS,rental assistance,2021.0,12/28/2021,,$212.00,,HOME,...,,,,,,,,,,Prevention
2029,,City of Chula Vista,KIKU GARDENS,rental assistance,2021.0,12/28/2021,,$212.00,,HOME,...,,,,,,,,,,Prevention
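
The filter above uses two steps because `isin` never matches missing values, so rows with no `Program` survive the first step and have to be dropped separately. A minimal sketch on toy data (the rows below are invented for illustration, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "Program": ["shelter", "point in time count", np.nan, "outreach"],
    "Amount": [100, 200, 0, 50],
})

prog_to_drop = ["point in time count"]

# Step 1: drop excluded programs; rows with a missing Program are kept here
step1 = toy[~toy["Program"].isin(prog_to_drop)]

# Step 2: drop rows whose Program is missing
step2 = step1.dropna(subset=["Program"])

print(step1["Program"].tolist())
print(step2["Program"].tolist())
```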


In [230]:
cleanexp.columns

Index(['Unique.ID', 'Grantor', 'Grantee', 'Program', 'Year', 'Date', 'EndDate',
       'Amount', 'AmendmentNumber', 'Funding.Agency', 'Funding.Source',
       'Category', 'Location', 'Issued', 'Funding.Type', 'Years',
       'Average.By.Year', 'City.Year', 'Population', 'Amount.Per.Capita',
       'Amount.Per.PEH', 'Population.PEH', 'ExpenditureType'],
      dtype='object')

In [231]:
col_to_drop = ['Unique.ID', 'EndDate', 'AmendmentNumber', 'Funding.Agency', 'Funding.Source', 'Category','Location', 'Issued', 'Funding.Type', 'Years','Average.By.Year', 'City.Year', 'Population', 'Amount.Per.Capita','Amount.Per.PEH', 'Population.PEH']
cleanexp = cleanexp.drop(columns=col_to_drop)
cleanexp.head()

Unnamed: 0,Grantor,Grantee,Program,Year,Date,Amount,ExpenditureType
8,City of Chula Vista,SBCS CORPORATION,rental assistance,2017.0,10/23/2017,"$14,142.74",Prevention
9,City of Chula Vista,SBCS CORPORATION,rental assistance,2017.0,10/23/2017,"$12,931.10",Prevention
10,City of Chula Vista,SBCS CORPORATION,rental assistance,2017.0,10/23/2017,"$10,285.61",Prevention
11,City of Chula Vista,SBCS CORPORATION,homeless services,2017.0,11/7/2017,$276.00,Crisis Management
12,City of Chula Vista,INTERFAITH SHELTER NETWORK,shelter,2017.0,11/8/2017,"$1,971.50",Crisis Management


In [232]:
cleanexp['Program'].unique()

array(['rental assistance', 'homeless services', 'shelter',
       'motel voucher', 'rapid re housing', 'project h.o.p.e.',
       'take back the streets', 'work for hope',
       'housing navigation services', 'housing assistance', 'outreach',
       'family reunification program',
       'general funding for homelessness services',
       'housing stability services', '211 assistance', 'case management',
       'flexible funds', 'consulting services homeless',
       'communities in action', 'legal services solutions for change',
       'translation homeless plan mtg', 'cdbg consolidated plan consult',
       'plha plan 7699835', 'homelessness prevention',
       'bridge to housing program', 'homeshare program',
       'homelessness educational initiatives', 'restrooms',
       'service center', 'neighborhood revitalization services',
       'employment/benefits', 'temporary housing and services',
       'food and nutrition',
       'emergency stabalization and supportive services', 

In [233]:

consolidation_map = {
    'flexible funds': ['general funding for homelessness services', 'flexible funds'],
    'transitional housing': ['temporary housing and services', 'transitional storage', 'transitional housing and supportive services', 'interim housing facility'],
    'emergency shelter': ['shelter', 'emergency stabilization and supportive services'],
    'staff and operations': ['211 assistance', 'case management', 'translation homeless plan mtg', 'cdbg consolidated plan consult', 'call center', 'employment/benefits']
}

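# The raw data misspells 'stabilization', so that entry is mapped explicitly before the keyword match
cleanexp['Program'] = cleanexp['Program'].replace('emergency stabalization and supportive services', 'emergency shelter')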

def consolidate_program(program):
    for consolidated_name, keywords in consolidation_map.items():
        if any(keyword in program.lower() for keyword in keywords):
            return consolidated_name
    return program  # Return original if no match

cleanexp['Program'] = cleanexp['Program'].apply(consolidate_program)
cleanexp['Program'].unique()

array(['rental assistance', 'homeless services', 'emergency shelter',
       'motel voucher', 'rapid re housing', 'project h.o.p.e.',
       'take back the streets', 'work for hope',
       'housing navigation services', 'housing assistance', 'outreach',
       'family reunification program', 'flexible funds',
       'housing stability services', 'staff and operations',
       'consulting services homeless', 'communities in action',
       'legal services solutions for change', 'plha plan 7699835',
       'homelessness prevention', 'bridge to housing program',
       'homeshare program', 'homelessness educational initiatives',
       'restrooms', 'service center',
       'neighborhood revitalization services', 'transitional housing',
       'food and nutrition', 'safe parking', 'opening doors program',
       'bridge to housing network', 'contract sobering services'],
      dtype=object)
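
`consolidate_program` matches by substring, so any program whose name contains a keyword is folded into that group. The sketch below replays the same logic on a few names from the output above, plus one hypothetical name (`'shelter outreach'`, which is not in the dataset) to show that substring matching can capture more than an exact lookup would. The trimmed `consolidation_map` is an illustrative subset, not the full map.

```python
import pandas as pd

# Illustrative subset of the consolidation map used above
consolidation_map = {
    "flexible funds": ["general funding for homelessness services", "flexible funds"],
    "emergency shelter": ["shelter"],
}

def consolidate_program(program):
    # Return the first group whose keyword appears anywhere in the program name
    for consolidated_name, keywords in consolidation_map.items():
        if any(keyword in program.lower() for keyword in keywords):
            return consolidated_name
    return program  # unchanged when no keyword matches

programs = pd.Series([
    "shelter",
    "general funding for homelessness services",
    "outreach",
    "shelter outreach",  # hypothetical name, not in the dataset
])

result = programs.apply(consolidate_program)
print(result.tolist())
```

If exact matching is preferred, a flat `{old_name: new_name}` dictionary passed to `Series.replace` would be a stricter alternative.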

Merging `updated_dataset.csv` with `PITCount.csv`

In [234]:
PIT = pd.read_csv('../data/raw/PITCount.csv')
PIT.head()

Unnamed: 0,City,Year,Total PEH,Unsheltered PEH,Population,"PEH Per 100,000","Unsheltered Per 100,000",Latitude,Longitude
0,Carlsbad,2015,88,21,112662,78.109744,18.639825,33.158092,-117.350594
1,Chula Vista,2015,498,321,264206,188.489285,121.496105,32.639954,-117.106705
2,El Cajon,2015,711,191,103230,688.753269,185.023733,32.794773,-116.962524
3,Encinitas,2015,123,80,62590,196.517016,127.815945,33.039139,-117.295425
4,Escondido,2015,430,112,150683,285.367294,74.328225,33.124722,-117.080833


In [235]:
# Only 'Grantor' is renamed; every other column keeps its name
cleanexp = cleanexp.rename(columns={'Grantor': 'City'})

In [236]:
cleanexp['City'] = cleanexp['City'].str.replace('City of ', '')
cleanexp.head()

Unnamed: 0,City,Grantee,Program,Year,Date,Amount,ExpenditureType
8,Chula Vista,SBCS CORPORATION,rental assistance,2017.0,10/23/2017,"$14,142.74",Prevention
9,Chula Vista,SBCS CORPORATION,rental assistance,2017.0,10/23/2017,"$12,931.10",Prevention
10,Chula Vista,SBCS CORPORATION,rental assistance,2017.0,10/23/2017,"$10,285.61",Prevention
11,Chula Vista,SBCS CORPORATION,homeless services,2017.0,11/7/2017,$276.00,Crisis Management
12,Chula Vista,INTERFAITH SHELTER NETWORK,emergency shelter,2017.0,11/8/2017,"$1,971.50",Crisis Management


In [237]:
cleanexp = cleanexp[~cleanexp['Year'].isna()]
cleanexp['Year'] = cleanexp['Year'].astype(int)
cleanexp.head()

Unnamed: 0,City,Grantee,Program,Year,Date,Amount,ExpenditureType
8,Chula Vista,SBCS CORPORATION,rental assistance,2017,10/23/2017,"$14,142.74",Prevention
9,Chula Vista,SBCS CORPORATION,rental assistance,2017,10/23/2017,"$12,931.10",Prevention
10,Chula Vista,SBCS CORPORATION,rental assistance,2017,10/23/2017,"$10,285.61",Prevention
11,Chula Vista,SBCS CORPORATION,homeless services,2017,11/7/2017,$276.00,Crisis Management
12,Chula Vista,INTERFAITH SHELTER NETWORK,emergency shelter,2017,11/8/2017,"$1,971.50",Crisis Management


In [238]:
cleanexp['City'] = cleanexp['City'].replace('SDHC', 'San Diego')
PIT['Year'] -= 1 # Offset `Year` in `PIT` by one so each year's expenditures are associated with the next PIT count of PEH
df = pd.merge(PIT, cleanexp, on=['City', 'Year'], how='inner')
df.head()

Unnamed: 0,City,Year,Total PEH,Unsheltered PEH,Population,"PEH Per 100,000","Unsheltered Per 100,000",Latitude,Longitude,Grantee,Program,Date,Amount,ExpenditureType
0,El Cajon,2014,711,191,103230,688.753269,185.023733,32.794773,-116.962524,East County Transitional Living Center,emergency shelter,7/1/2013,"$75,000.00",Crisis Management
1,El Cajon,2014,711,191,103230,688.753269,185.023733,32.794773,-116.962524,East County Transitional Living Center,emergency shelter,7/1/2013,"$75,000.00",Crisis Management
2,El Cajon,2015,321,218,103527,310.064041,210.573087,32.794773,-116.962524,East County Transitional Living Center,emergency shelter,7/1/2014,"$120,000.00",Crisis Management
3,El Cajon,2015,321,218,103527,310.064041,210.573087,32.794773,-116.962524,East County Transitional Living Center,emergency shelter,7/1/2014,"$120,000.00",Crisis Management
4,San Marcos,2015,99,44,94932,104.285173,46.348966,33.1385,-117.1688,North County Lifeline,housing assistance,,21600,Prevention
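
The year offset and inner merge above can be checked on a small example. This is a minimal sketch with invented cities and values; it uses an outer merge with `indicator=True` only to show which rows an inner join would discard.

```python
import pandas as pd

# Toy data, invented for illustration only
pit = pd.DataFrame({
    "City": ["A", "A", "B"],
    "Year": [2015, 2016, 2015],
    "Total PEH": [100, 120, 40],
})
spend = pd.DataFrame({
    "City": ["A", "A", "C"],
    "Year": [2014, 2015, 2014],
    "Amount": [1000.0, 2000.0, 500.0],
})

# Shift PIT years back by one so year-Y spending lines up with the year-(Y+1) count
pit_shifted = pit.assign(Year=pit["Year"] - 1)

# The `_merge` column marks rows found in both tables or in only one of them
check = pd.merge(pit_shifted, spend, on=["City", "Year"], how="outer", indicator=True)
matched = check[check["_merge"] == "both"]

print(check["_merge"].value_counts())
print(matched[["City", "Year", "Total PEH", "Amount"]])
```

Using `assign` rather than an in-place `-=` leaves the original frame untouched, so re-running the cell does not shift the years a second time.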


In [239]:
df = df.drop(columns= ['Latitude', 'Longitude', 'Date'])
df['Amount'] = df['Amount'].str.replace('$', '', regex=False)
df['Amount'] = df['Amount'].str.replace(',', '',regex=False)
df.head()

Unnamed: 0,City,Year,Total PEH,Unsheltered PEH,Population,"PEH Per 100,000","Unsheltered Per 100,000",Grantee,Program,Amount,ExpenditureType
0,El Cajon,2014,711,191,103230,688.753269,185.023733,East County Transitional Living Center,emergency shelter,75000.0,Crisis Management
1,El Cajon,2014,711,191,103230,688.753269,185.023733,East County Transitional Living Center,emergency shelter,75000.0,Crisis Management
2,El Cajon,2015,321,218,103527,310.064041,210.573087,East County Transitional Living Center,emergency shelter,120000.0,Crisis Management
3,El Cajon,2015,321,218,103527,310.064041,210.573087,East County Transitional Living Center,emergency shelter,120000.0,Crisis Management
4,San Marcos,2015,99,44,94932,104.285173,46.348966,North County Lifeline,housing assistance,21600.0,Prevention
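
The `str.replace` calls above return strings. A minimal sketch, assuming the column should be numeric before any arithmetic, is to follow them with `pd.to_numeric`; the sample values below mirror the mixed formats seen in `Amount` but are otherwise illustrative.

```python
import pandas as pd

# Toy values mirroring the mixed formats seen in `Amount`
amount = pd.Series(["$75,000.00", "$1,971.50", "21600", None])

# Strip the currency symbol and thousands separators
cleaned = (
    amount.str.replace("$", "", regex=False)
          .str.replace(",", "", regex=False)
)

# Convert to float; anything that cannot be parsed becomes NaN
numeric = pd.to_numeric(cleaned, errors="coerce")

print(numeric.tolist())
```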


In [240]:
df['Year'] = df['Year'].replace(0, np.nan)
df = df.dropna()


df.to_csv('../data/processed/expenditures_and_PIT.csv', index=False)

## `pivoted_dataset.csv`
Input datasets:
- `expenditures_and_PIT.csv`

In [241]:
df = pd.read_csv("../data/processed/expenditures_and_PIT.csv")  # file name must match the case used when saving
df.head()

Unnamed: 0,City,Year,Total PEH,Unsheltered PEH,Population,"PEH Per 100,000","Unsheltered Per 100,000",Grantee,Program,Amount,ExpenditureType
0,El Cajon,2014,711,191,103230,688.753269,185.023733,East County Transitional Living Center,emergency shelter,75000.0,Crisis Management
1,El Cajon,2014,711,191,103230,688.753269,185.023733,East County Transitional Living Center,emergency shelter,75000.0,Crisis Management
2,El Cajon,2015,321,218,103527,310.064041,210.573087,East County Transitional Living Center,emergency shelter,120000.0,Crisis Management
3,El Cajon,2015,321,218,103527,310.064041,210.573087,East County Transitional Living Center,emergency shelter,120000.0,Crisis Management
4,San Marcos,2015,99,44,94932,104.285173,46.348966,North County Lifeline,housing assistance,21600.0,Prevention


In [242]:
# Group by City, Year, and Program, then sum the amount spent
pivoted_df = df.groupby(['City', 'Year', 'Program'])['Amount'].sum().reset_index()

pivoted_df = pivoted_df.pivot(index=['City', 'Year'], columns='Program', values='Amount').fillna(0).reset_index()
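
The long-to-wide reshaping above can be seen on a small example. The sketch below uses invented data and also shows `pivot_table` as a one-step alternative that should produce the same totals.

```python
import pandas as pd

# Toy long-format data, invented for illustration only
long_df = pd.DataFrame({
    "City": ["A", "A", "A", "B"],
    "Year": [2020, 2020, 2020, 2020],
    "Program": ["outreach", "outreach", "shelter", "shelter"],
    "Amount": [10.0, 5.0, 7.0, 3.0],
})

# Two-step version, as in the cell above
totals = long_df.groupby(["City", "Year", "Program"])["Amount"].sum().reset_index()
wide = (
    totals.pivot(index=["City", "Year"], columns="Program", values="Amount")
          .fillna(0)
          .reset_index()
)

# One-step alternative using pivot_table
wide_alt = long_df.pivot_table(
    index=["City", "Year"], columns="Program", values="Amount",
    aggfunc="sum", fill_value=0,
).reset_index()

print(wide)
```

Each (City, Year) pair becomes one row, and a program with no spending in that city-year is filled with 0.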

In [243]:
# Note: the pivoted table contains many zeroes because some programs occur only once or twice in the entire dataset
#pivoted_df.to_csv('../data/processed/pivoted_dataset.csv', index=False)

## `pivoted_and_PIT.csv`
Input datasets:
- `pivoted_dataset.csv`
- `PITCount.csv` (with the same `Year` offset applied as in the `expenditures_and_PIT.csv` section)

In [244]:
#pivoted = pd.read_csv('../data/processed/pivoted_dataset.csv')
pivoted = pivoted_df
pivoted.head()

Program,City,Year,bridge to housing network,emergency shelter,family reunification program,flexible funds,food and nutrition,homeless services,homelessness prevention,homeshare program,...,project h.o.p.e.,rapid re housing,rental assistance,restrooms,safe parking,service center,staff and operations,take back the streets,transitional housing,work for hope
0,Carlsbad,2017,0.0,14896.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Carlsbad,2018,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,122472.0,0.0,0.0,0.0
2,Carlsbad,2019,0.0,0.0,0.0,0.0,0.0,20000.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,122472.0,0.0,0.0,0.0
3,Carlsbad,2021,0.0,52574.0,0.0,0.0,0.0,419583.25,0.0,0.0,...,0.0,350000.0,41211.0,125000.0,0.0,204217.0,40000.0,0.0,0.0,0.0
4,Carlsbad,2022,0.0,202698.0,0.0,0.0,0.0,29825.0,0.0,0.0,...,0.0,350000.0,42396.0,25000.0,0.0,204217.0,180000.0,0.0,0.0,0.0


In [245]:
PIT = pd.read_csv('../data/raw/PITCount.csv')
PIT.head()

Unnamed: 0,City,Year,Total PEH,Unsheltered PEH,Population,"PEH Per 100,000","Unsheltered Per 100,000",Latitude,Longitude
0,Carlsbad,2015,88,21,112662,78.109744,18.639825,33.158092,-117.350594
1,Chula Vista,2015,498,321,264206,188.489285,121.496105,32.639954,-117.106705
2,El Cajon,2015,711,191,103230,688.753269,185.023733,32.794773,-116.962524
3,Encinitas,2015,123,80,62590,196.517016,127.815945,33.039139,-117.295425
4,Escondido,2015,430,112,150683,285.367294,74.328225,33.124722,-117.080833


In [246]:
PIT['Year'] -= 1
df = pd.merge(pivoted, PIT, on=['Year', 'City'], how='left')
df = df.drop(columns= ['Latitude', 'Longitude'])
df.head()

Unnamed: 0,City,Year,bridge to housing network,emergency shelter,family reunification program,flexible funds,food and nutrition,homeless services,homelessness prevention,homeshare program,...,service center,staff and operations,take back the streets,transitional housing,work for hope,Total PEH,Unsheltered PEH,Population,"PEH Per 100,000","Unsheltered Per 100,000"
0,Carlsbad,2017,0.0,14896.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,210,152,115518,181.789851,131.581225
1,Carlsbad,2018,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,122472.0,0.0,0.0,0.0,161,102,115382,139.536496,88.402004
2,Carlsbad,2019,0.0,0.0,0.0,0.0,0.0,20000.0,0.0,0.0,...,0.0,122472.0,0.0,0.0,0.0,148,94,114747,128.979407,81.919353
3,Carlsbad,2021,0.0,52574.0,0.0,0.0,0.0,419583.25,0.0,0.0,...,204217.0,40000.0,0.0,0.0,0.0,118,75,114160,103.3637,65.697267
4,Carlsbad,2022,0.0,202698.0,0.0,0.0,0.0,29825.0,0.0,0.0,...,204217.0,180000.0,0.0,0.0,0.0,103,60,113792,90.516029,52.727784


In [247]:
# Define the columns you want at the front
front_columns = ['Year', 'City', 'Total PEH', 'Unsheltered PEH', 'Population', 'PEH Per 100,000', 'Unsheltered Per 100,000']

# Reorder the DataFrame
df = df[front_columns + [col for col in df.columns if col not in front_columns]]

df.to_csv('../data/processed/pivoted_and_PIT.csv', index=False)

## `pivoted_pit_grantee.csv`
Input datasets:
- `pivoted_and_PIT.csv`
- `expenditures_and_PIT.csv`

In [248]:
pivoted_df = pd.read_csv("../data/processed/pivoted_and_PIT.csv")
pivoted_df.head()

Unnamed: 0,Year,City,Total PEH,Unsheltered PEH,Population,"PEH Per 100,000","Unsheltered Per 100,000",bridge to housing network,emergency shelter,family reunification program,...,project h.o.p.e.,rapid re housing,rental assistance,restrooms,safe parking,service center,staff and operations,take back the streets,transitional housing,work for hope
0,2017,Carlsbad,210,152,115518,181.789851,131.581225,0.0,14896.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2018,Carlsbad,161,102,115382,139.536496,88.402004,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,122472.0,0.0,0.0,0.0
2,2019,Carlsbad,148,94,114747,128.979407,81.919353,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,122472.0,0.0,0.0,0.0
3,2021,Carlsbad,118,75,114160,103.3637,65.697267,0.0,52574.0,0.0,...,0.0,350000.0,41211.0,125000.0,0.0,204217.0,40000.0,0.0,0.0,0.0
4,2022,Carlsbad,103,60,113792,90.516029,52.727784,0.0,202698.0,0.0,...,0.0,350000.0,42396.0,25000.0,0.0,204217.0,180000.0,0.0,0.0,0.0


In [249]:
original_df = pd.read_csv("../data/processed/expenditures_and_PIT.csv")  # file name must match the case used when saving
original_df.head()

Unnamed: 0,City,Year,Total PEH,Unsheltered PEH,Population,"PEH Per 100,000","Unsheltered Per 100,000",Grantee,Program,Amount,ExpenditureType
0,El Cajon,2014,711,191,103230,688.753269,185.023733,East County Transitional Living Center,emergency shelter,75000.0,Crisis Management
1,El Cajon,2014,711,191,103230,688.753269,185.023733,East County Transitional Living Center,emergency shelter,75000.0,Crisis Management
2,El Cajon,2015,321,218,103527,310.064041,210.573087,East County Transitional Living Center,emergency shelter,120000.0,Crisis Management
3,El Cajon,2015,321,218,103527,310.064041,210.573087,East County Transitional Living Center,emergency shelter,120000.0,Crisis Management
4,San Marcos,2015,99,44,94932,104.285173,46.348966,North County Lifeline,housing assistance,21600.0,Prevention


In [250]:
# merging datasets to add 'Grantee' as a column on the pivoted dataset that contains pit count
try:
    merged_df = pivoted_df.merge(original_df[['City', 'Year', 'Grantee']].drop_duplicates(), on=['City', 'Year'], how='left')
    print("\nMerged Data:")
    print(merged_df.head())
except TypeError as e:
    print("TypeError:", e)


Merged Data:
   Year      City  Total PEH  Unsheltered PEH  Population  PEH Per 100,000  \
0  2017  Carlsbad        210              152      115518       181.789851   
1  2018  Carlsbad        161              102      115382       139.536496   
2  2018  Carlsbad        161              102      115382       139.536496   
3  2018  Carlsbad        161              102      115382       139.536496   
4  2019  Carlsbad        148               94      114747       128.979407   

   Unsheltered Per 100,000  bridge to housing network  emergency shelter  \
0               131.581225                        0.0            14896.0   
1                88.402004                        0.0                0.0   
2                88.402004                        0.0                0.0   
3                88.402004                        0.0                0.0   
4                81.919353                        0.0                0.0   

   family reunification program  ...  rapid re housing  rent
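
The merged output above repeats Carlsbad 2018 three times because the left merge produces one row per distinct grantee in each city-year. A minimal sketch with invented data illustrates the effect:

```python
import pandas as pd

# Toy data, invented for illustration only
pivoted = pd.DataFrame({
    "City": ["A", "A"],
    "Year": [2017, 2018],
    "shelter": [100.0, 0.0],
})
original = pd.DataFrame({
    "City": ["A", "A", "A", "A"],
    "Year": [2017, 2018, 2018, 2018],
    "Grantee": ["G1", "G1", "G2", "G2"],
})

# One row per distinct (City, Year, Grantee)
grantees = original[["City", "Year", "Grantee"]].drop_duplicates()

# A left merge repeats each pivoted row once per grantee in that city-year
merged = pivoted.merge(grantees, on=["City", "Year"], how="left")

print(len(pivoted), len(merged))
print(merged)
```

Because the program totals are repeated on every grantee row, summing a program column across the merged table would count the same city-year total once per grantee.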

In [251]:
# Move 'Grantee' to the third column so it's more accessible
columns = merged_df.columns.tolist()

columns.insert(2, columns.pop(columns.index('Grantee'))) 
merged_df = merged_df[columns] 

merged_df.to_csv('../data/processed/pivoted_pit_grantee.csv', index=False)  

## Acknowledgement of Use of Generative AI

During the preparation of this work, the authors used ChatGPT as a coding assistant. After using this tool, the authors identified and reviewed the content as needed and take full responsibility for the content of the code and the resulting processed data.
