### Introduction:

The problem addressed is the lack of affordable and sustainable housing, particularly in California, which impacts individuals' financial stability, health, and overall well-being. This issue is significant as it disproportionately affects low-income families and minority communities, exacerbating existing disparities.

To address this problem, I plan to use U.S. datasets such as the Affordable Housing and Sustainable Communities Awards data and housing cost information from sources like HUD and the Census Bureau. Through exploratory data analysis (EDA) techniques such as correlation matrices, pairplots, and other data visualizations, I aim to identify patterns and relationships between sustainable housing initiatives, affordability measures, and their impacts.

By employing this approach, we can gain valuable insights into the effectiveness of sustainable housing initiatives in different regions of California. Understanding the correlations between award amounts, project components, greenhouse gas reductions, and housing affordability metrics can inform policymakers, urban planners, and housing developers about which strategies are most effective in promoting both affordability and environmental sustainability. This analysis could lead to more informed decision-making and targeted interventions to address housing challenges in California and beyond.


### The modules used and why they are necessary:

1. **numpy (np):**
   - For numerical operations and efficient array handling.
<br><br>
2. **pandas (pd):**
   - Data manipulation and analysis with structured data.
<br><br>
3. **matplotlib.pyplot (plt):**
   - Creating static and interactive visualizations easily.
<br><br>
4. **seaborn (sns):**
   - High-level interface for attractive statistical graphics.

In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Source of each dataset, with its original purpose, collection timeframe, number of variables, and peculiarities:

1. **California Affordable Housing and Sustainable Communities**  
   - **Source**: [Data.ca.gov](https://data.ca.gov/dataset/california-affordable-housing-and-sustainable-communities)
   - **Original Purpose**: To provide information on Affordable Housing and Sustainable Communities Awards in California, including project details, award amounts, and associated benefits. This data is essential for tracking investments in affordable housing and sustainable development initiatives.
   - **Collection Time**: Created August 10, 2019, 3:29 AM (UTC-07:00)
   - **Number of Variables**: Multiple, including project name, developer, award date, project location, funding allocations, greenhouse gas reductions, and more.
   - **Peculiarities**: The data includes monetary values and categorical variables for project types and eligibility criteria. Every column except `PIN` loads as an object dtype.
<br><br>
2. **Housing Cost Burden**  
   - **Source**: [Data.ca.gov](https://data.ca.gov/dataset/housing-cost-burden)
   - **Original Purpose**: To provide data on the percentage of households paying more than 30% (or 50%) of their monthly household income towards housing costs for California, its regions, counties, cities/towns, and census tracts. This data is vital for understanding housing affordability issues and their impact on health and well-being.
   - **Collection Time**: Created December 4, 2023, 3:11 PM (UTC-08:00)
   - **Number of Variables**: Multiple, including household income, housing costs, location identifiers, and more.
   - **Peculiarities**: Data sourced from the U.S. Department of Housing and Urban Development (HUD) and the U.S. Census Bureau. Each record reports either the 30% or the 50% burden threshold. The file contains NaN entries, triggers a mixed-dtypes warning when loaded, and the kernel keeps dying when running `.info()`.
<br><br>
3. **Housing Element Compliance Report**  
   - **Source**: [Data.ca.gov](https://data.ca.gov/dataset/housing-element-compliance-report)
   - **Original Purpose**: To track the compliance status of local government housing plans (housing elements) in California, ensuring they meet state requirements. This data is crucial for monitoring and enforcing housing planning regulations at the local level.
   - **Collection Time**: Created August 30, 2022, 11:31 AM (UTC-07:00)
   - **Number of Variables**: Multiple, including county, jurisdiction, planning period, review status, compliance status, and more.
   - **Peculiarities**: Data includes categorical variables indicating compliance status and review progress. These statuses are coded as IN and OUT rather than True or False.
   <br><br>


## California Affordable Housing and Sustainable Communities

In [5]:
awarddf = pd.read_csv('2017-ca-affordable-housing-and-sustainable-communities-ahsc-round-3-awards.csv', delimiter='|')
awarddf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19 entries, 0 to 18
Data columns (total 9 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   Project Type            19 non-null     object
 1   PIN                     19 non-null     int64 
 2   Project Name            19 non-null     object
 3   Applicants              19 non-null     object
 4   Project Location        19 non-null     object
 5   Project Area Type       19 non-null     object
 6   DAC %                   19 non-null     object
 7   Low Income              19 non-null     object
 8    Total AHSC Requested   19 non-null     object
dtypes: int64(1), object(8)
memory usage: 1.5+ KB
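Since every column except `PIN` loads as an object dtype, and ` Total AHSC Requested ` carries leading and trailing spaces in its name, a cleanup step is likely needed before any correlation analysis. The following is a minimal sketch under stated assumptions: the cell values (`"100%"`, `" $1,000,000 "`) are hypothetical, because the notebook output does not show the real values, and `to_number` is an illustrative helper, not part of pandas.

```python
import pandas as pd

# Hypothetical rows: the real cell formats are not shown in the output above.
sample = pd.DataFrame({
    "PIN": [40963, 41576],
    "DAC %": ["100%", "0%"],
    " Total AHSC Requested ": [" $1,000,000 ", " $2,500,000 "],
})

# Remove stray whitespace around the column names.
sample.columns = sample.columns.str.strip()


def to_number(series):
    """Drop $, %, commas and whitespace, then convert; unparseable values become NaN."""
    cleaned = series.astype(str).str.replace(r"[$,%\s]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")


for col in ["DAC %", "Total AHSC Requested"]:
    sample[col] = to_number(sample[col])

print(sample.dtypes)
```

If the real columns use a different format, the character class in `to_number` would need adjusting; `errors="coerce"` at least makes such cases visible as NaN instead of raising.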


In [6]:
awarddf.describe()

Unnamed: 0,PIN
count,19.0
mean,41238.421053
std,186.089977
min,40963.0
25%,41095.0
50%,41248.0
75%,41289.5
max,41576.0


In [7]:
awarddf.isna()

Unnamed: 0,Project Type,PIN,Project Name,Applicants,Project Location,Project Area Type,DAC %,Low Income,Total AHSC Requested
0,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False
5,False,False,False,False,False,False,False,False,False
6,False,False,False,False,False,False,False,False,False
7,False,False,False,False,False,False,False,False,False
8,False,False,False,False,False,False,False,False,False
9,False,False,False,False,False,False,False,False,False
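A cell-by-cell boolean table is hard to scan, so a per-column count of missing values may be easier to read. The sketch below uses a small hypothetical frame (the values are made up for illustration); the same two calls could be applied to `awarddf`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for awarddf.
sample = pd.DataFrame({
    "Project Name": ["Project A", None, "Project C"],
    "Low Income": ["Yes", np.nan, np.nan],
})

# Number of missing values in each column.
missing_per_column = sample.isna().sum()

# Single True/False answer: is anything missing at all?
any_missing = bool(sample.isna().any().any())

print(missing_per_column)
print(any_missing)
```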


## Housing Cost Burden

In [8]:
df_burden = pd.read_csv('hci_acs_chas_raceincome_housingcostburden_ct_pl_co_re_st_7-30-14-ada (1).csv')



In [9]:
df_burden.head()

Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,Unnamed: 18,Unnamed: 19,Unnamed: 20,Unnamed: 21,Unnamed: 22,Unnamed: 23,Unnamed: 24,Table 1
ind_id,ind_definition,datasource,reportyear,burden,tenure,race_eth_code,race_eth_name,income_level,geotype,geotypevalue,geoname,county_name,county_fips,region_name,region_code,total_households,burdened_households,percent,LL95CI,UL95CI,SE,rse,CA_decile,CA_RR,version
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,"> 30% of monthly household income consumed by monthly, selected, housing costs",Owner-occupied households,9,Total,Monthly household income at <=30% of HUD-adjusted family median income,CA,06,California,,,,,482570,357025,74,74,74,0,1,,1,29Jul2014
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,"> 30% of monthly household income consumed by monthly, selected, housing costs",Owner-occupied households,9,Total,Monthly household income at all levels of HUD-adjusted family median income,CA,06,California,,,,,7112050,2929600,41,41,42,0,0,,1,29Jul2014
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,"> 50% of monthly household income consumed by monthly, selected, housing costs",Owner-occupied households,9,Total,Monthly household income at <=30% of HUD-adjusted family median income,CA,06,California,,,,,482570,293485,61,60,61,0,1,,1,29Jul2014
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,"> 50% of monthly household income consumed by monthly, selected, housing costs",Owner-occupied households,9,Total,Monthly household income at all levels of HUD-adjusted family median income,CA,06,California,,,,,7112050,1302270,18,18,18,0,0,,1,29Jul2014
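In the output above, the real column names (`ind_id`, `ind_definition`, ...) appear as the first data row, while the only named column is `Table 1`. This suggests that the file starts with a title line before the actual header. A minimal sketch of one way to handle this, assuming the first physical line holds only the title: pass `header=1` so pandas takes the second line as the header. The in-memory text below is a shortened stand-in for the real file, with values copied from the rows shown above.

```python
import io

import pandas as pd

# Shortened stand-in for the CSV: a title line, then the real header.
raw = io.StringIO(
    "Table 1\n"
    "ind_id,reportyear,geotype,total_households,percent,version\n"
    "106,2006-2010,CA,482570,74,29Jul2014\n"
    "106,2006-2010,CA,7112050,41,29Jul2014\n"
)

# header=1 uses the second line as column names and skips the title line.
# low_memory=False makes pandas infer each column's dtype from the whole file,
# which is one common way to avoid the mixed-dtypes warning.
df = pd.read_csv(raw, header=1, low_memory=False)

print(df.columns.tolist())
print(df.dtypes)
```

With the header in place, `describe()` would summarize numeric columns such as `total_households` and `percent` instead of only the `Table 1` column.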


In [10]:
df_burden.describe()

Unnamed: 0,Table 1
count,521263
unique,2
top,29Jul2014
freq,521262


In [11]:
df_burden.isna()

Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,Unnamed: 18,Unnamed: 19,Unnamed: 20,Unnamed: 21,Unnamed: 22,Unnamed: 23,Unnamed: 24,Table 1
ind_id,ind_definition,datasource,reportyear,burden,tenure,race_eth_code,race_eth_name,income_level,geotype,geotypevalue,geoname,county_name,county_fips,region_name,region_code,total_households,burdened_households,percent,LL95CI,UL95CI,SE,rse,CA_decile,CA_RR,False
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,"> 30% of monthly household income consumed by monthly, selected, housing costs",Owner-occupied households,9,Total,Monthly household income at <=30% of HUD-adjusted family median income,CA,06,California,,,,,482570,357025,74,74,74,0,1,,1,False
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,"> 30% of monthly household income consumed by monthly, selected, housing costs",Owner-occupied households,9,Total,Monthly household income at all levels of HUD-adjusted family median income,CA,06,California,,,,,7112050,2929600,41,41,42,0,0,,1,False
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,"> 50% of monthly household income consumed by monthly, selected, housing costs",Owner-occupied households,9,Total,Monthly household income at <=30% of HUD-adjusted family median income,CA,06,California,,,,,482570,293485,61,60,61,0,1,,1,False
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,"> 50% of monthly household income consumed by monthly, selected, housing costs",Owner-occupied households,9,Total,Monthly household income at all levels of HUD-adjusted family median income,CA,06,California,,,,,7112050,1302270,18,18,18,0,0,,1,False
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,"> 30% of monthly household income consumed by monthly, gross rent",Renter-occupied households,7,Multiple,All income levels,PL,86804.0,Yolo CDP,Yolo,6113.0,Sacramento Area,8,0.0,0.0,,,,,,,,False
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,"> 50% of monthly household income consumed by monthly, gross rent",Renter-occupied households,7,Multiple,All income levels,PL,86804.0,Yolo CDP,Yolo,6113.0,Sacramento Area,8,0.0,0.0,,,,,,,,False
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,"> 30% of monthly household income consumed by monthly, gross rent or selected housing costs",Total households (owner- and renter-occupied),7,Multiple,All income levels,PL,86804.0,Yolo CDP,Yolo,6113.0,Sacramento Area,8,0.0,0.0,,,,,,,,False
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,"> 50% of monthly household income consumed by monthly, gross rent or selected housing costs",Total households (owner- and renter-occupied),7,Multiple,All income levels,PL,86804.0,Yolo CDP,Yolo,6113.0,Sacramento Area,8,0.0,0.0,,,,,,,,False


In [12]:
df_burden.isnull()

Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,Unnamed: 18,Unnamed: 19,Unnamed: 20,Unnamed: 21,Unnamed: 22,Unnamed: 23,Unnamed: 24,Table 1
ind_id,ind_definition,datasource,reportyear,burden,tenure,race_eth_code,race_eth_name,income_level,geotype,geotypevalue,geoname,county_name,county_fips,region_name,region_code,total_households,burdened_households,percent,LL95CI,UL95CI,SE,rse,CA_decile,CA_RR,False
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,"> 30% of monthly household income consumed by monthly, selected, housing costs",Owner-occupied households,9,Total,Monthly household income at <=30% of HUD-adjusted family median income,CA,06,California,,,,,482570,357025,74,74,74,0,1,,1,False
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,"> 30% of monthly household income consumed by monthly, selected, housing costs",Owner-occupied households,9,Total,Monthly household income at all levels of HUD-adjusted family median income,CA,06,California,,,,,7112050,2929600,41,41,42,0,0,,1,False
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,"> 50% of monthly household income consumed by monthly, selected, housing costs",Owner-occupied households,9,Total,Monthly household income at <=30% of HUD-adjusted family median income,CA,06,California,,,,,482570,293485,61,60,61,0,1,,1,False
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,"> 50% of monthly household income consumed by monthly, selected, housing costs",Owner-occupied households,9,Total,Monthly household income at all levels of HUD-adjusted family median income,CA,06,California,,,,,7112050,1302270,18,18,18,0,0,,1,False
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,"> 30% of monthly household income consumed by monthly, gross rent",Renter-occupied households,7,Multiple,All income levels,PL,86804.0,Yolo CDP,Yolo,6113.0,Sacramento Area,8,0.0,0.0,,,,,,,,False
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,"> 50% of monthly household income consumed by monthly, gross rent",Renter-occupied households,7,Multiple,All income levels,PL,86804.0,Yolo CDP,Yolo,6113.0,Sacramento Area,8,0.0,0.0,,,,,,,,False
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,"> 30% of monthly household income consumed by monthly, gross rent or selected housing costs",Total households (owner- and renter-occupied),7,Multiple,All income levels,PL,86804.0,Yolo CDP,Yolo,6113.0,Sacramento Area,8,0.0,0.0,,,,,,,,False
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,"> 50% of monthly household income consumed by monthly, gross rent or selected housing costs",Total households (owner- and renter-occupied),7,Multiple,All income levels,PL,86804.0,Yolo CDP,Yolo,6113.0,Sacramento Area,8,0.0,0.0,,,,,,,,False


In [13]:
df_burden.dropna()

Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,Unnamed: 18,Unnamed: 19,Unnamed: 20,Unnamed: 21,Unnamed: 22,Unnamed: 23,Unnamed: 24,Table 1
ind_id,ind_definition,datasource,reportyear,burden,tenure,race_eth_code,race_eth_name,income_level,geotype,geotypevalue,geoname,county_name,county_fips,region_name,region_code,total_households,burdened_households,percent,LL95CI,UL95CI,SE,rse,CA_decile,CA_RR,version
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,"> 30% of monthly household income consumed by monthly, selected, housing costs",Owner-occupied households,9,Total,Monthly household income at <=30% of HUD-adjusted family median income,CA,06,California,,,,,482570,357025,74,74,74,0,1,,1,29Jul2014
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,"> 30% of monthly household income consumed by monthly, selected, housing costs",Owner-occupied households,9,Total,Monthly household income at all levels of HUD-adjusted family median income,CA,06,California,,,,,7112050,2929600,41,41,42,0,0,,1,29Jul2014
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,"> 50% of monthly household income consumed by monthly, selected, housing costs",Owner-occupied households,9,Total,Monthly household income at <=30% of HUD-adjusted family median income,CA,06,California,,,,,482570,293485,61,60,61,0,1,,1,29Jul2014
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,"> 50% of monthly household income consumed by monthly, selected, housing costs",Owner-occupied households,9,Total,Monthly household income at all levels of HUD-adjusted family median income,CA,06,California,,,,,7112050,1302270,18,18,18,0,0,,1,29Jul2014
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,"> 50% of monthly household income consumed by monthly, selected, housing costs",Owner-occupied households,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,"> 50% of monthly household income consumed by monthly, selected, housing costs",Owner-occupied households,7,Multiple,All income levels,PL,86804.0,Yolo CDP,Yolo,6113.0,Sacramento Area,8,0.0,0.0,,,,,,,,29Jul2014
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,"> 30% of monthly household income consumed by monthly, gross rent",Renter-occupied households,7,Multiple,All income levels,PL,86804.0,Yolo CDP,Yolo,6113.0,Sacramento Area,8,0.0,0.0,,,,,,,,29Jul2014
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,"> 50% of monthly household income consumed by monthly, gross rent",Renter-occupied households,7,Multiple,All income levels,PL,86804.0,Yolo CDP,Yolo,6113.0,Sacramento Area,8,0.0,0.0,,,,,,,,29Jul2014
106,Percent of households spending more than 30% (50%) of monthly household income on monthly gross rent or selected housing costs,CHAS,2006-2010,"> 30% of monthly household income consumed by monthly, gross rent or selected housing costs",Total households (owner- and renter-occupied),7,Multiple,All income levels,PL,86804.0,Yolo CDP,Yolo,6113.0,Sacramento Area,8,0.0,0.0,,,,,,,,29Jul2014


#### Note: the kernel keeps dying when running `df_burden.info()`
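The cause of the crash is not established here. One possible workaround is to avoid holding the whole table in memory and to collect the same kind of summary (row count and missing values per column) chunk by chunk. The sketch below assumes the file has a title line before the header, as the `head()` output suggests, and uses a tiny illustrative in-memory sample with `chunksize=2`; a real run would use the CSV path and a much larger chunk size.

```python
import io

import pandas as pd

# Illustrative stand-in for the large CSV (simplified rows).
raw = io.StringIO(
    "Table 1\n"
    "ind_id,geotype,county_name,total_households,percent\n"
    "106,CA,,482570,74\n"
    "106,CA,,7112050,41\n"
    "106,PL,Yolo,0,\n"
)

null_counts = None
rows = 0

# Read a few rows at a time and accumulate the per-column missing counts.
for chunk in pd.read_csv(raw, header=1, chunksize=2):
    counts = chunk.isna().sum()
    null_counts = counts if null_counts is None else null_counts + counts
    rows += len(chunk)

print(rows)
print(null_counts)
```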

## Housing Element Compliance Report

In [14]:
ca_co_df = pd.read_csv('2dcd1cd4-1348-4fc5-9c9c-219f82daac00.csv')

In [15]:
ca_co_df.head()

Unnamed: 0,_id,County,Jurisdiction,Planning Period,Record Type,Review Status,Date Received,Date Reviewed,Compliance Status
0,1,ALAMEDA,ALAMEDA,6.0,ADOPTED,IN,11/16/2022,12/20/2022,IN
1,2,ALAMEDA,ALAMEDA COUNTY,6.0,INITIAL DRAFT,OUT,10/6/2023,1/4/2024,OUT
2,3,ALAMEDA,ALBANY,6.0,ADOPTED,IN,8/16/2023,9/8/2023,IN
3,4,ALAMEDA,BERKELEY,6.0,ADOPTED,IN,1/24/2023,2/28/2023,IN
4,5,ALAMEDA,DUBLIN,6.0,ADOPTED,IN,11/22/2023,1/19/2024,IN


In [16]:
ca_co_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 541 entries, 0 to 540
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   _id                541 non-null    int64  
 1   County             540 non-null    object 
 2   Jurisdiction       539 non-null    object 
 3   Planning Period    539 non-null    float64
 4   Record Type        539 non-null    object 
 5   Review Status      539 non-null    object 
 6   Date Received      539 non-null    object 
 7   Date Reviewed      480 non-null    object 
 8   Compliance Status  539 non-null    object 
dtypes: float64(1), int64(1), object(7)
memory usage: 38.2+ KB
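The date columns load as object dtype and the statuses are stored as the strings IN and OUT, so both may need converting before analysis. A minimal sketch using the first three rows from the `head()` output above; the column names `in_compliance` and `review_days` are introduced here for illustration only, and the mapping assumes IN and OUT are the only status values (any other value would become NaN).

```python
import pandas as pd

# First three rows from the head() output above.
sample = pd.DataFrame({
    "Jurisdiction": ["ALAMEDA", "ALAMEDA COUNTY", "ALBANY"],
    "Date Received": ["11/16/2022", "10/6/2023", "8/16/2023"],
    "Date Reviewed": ["12/20/2022", "1/4/2024", "9/8/2023"],
    "Compliance Status": ["IN", "OUT", "IN"],
})

# Parse month/day/year strings; unparseable or missing values become NaT.
for col in ["Date Received", "Date Reviewed"]:
    sample[col] = pd.to_datetime(sample[col], format="%m/%d/%Y", errors="coerce")

# Map the IN/OUT codes to booleans.
sample["in_compliance"] = sample["Compliance Status"].map({"IN": True, "OUT": False})

# Days between receipt and review.
sample["review_days"] = (sample["Date Reviewed"] - sample["Date Received"]).dt.days

print(sample[["Jurisdiction", "in_compliance", "review_days"]])
```

In the full dataset, `Date Reviewed` has only 480 non-null values out of 541 rows, so `review_days` would be missing for the records not yet reviewed.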


In [11]:
ca_co_df.describe()

Unnamed: 0,_id,Planning Period
count,541.0,539.0
mean,271.0,6.003711
std,156.317519,0.060858
min,1.0,6.0
25%,136.0,6.0
50%,271.0,6.0
75%,406.0,6.0
max,541.0,7.0
