# Homelessness in the US (2023): Exploratory Data Analysis
This notebook provides an exploratory data analysis (EDA) of homelessness in the United States for the year 2023. For each state, the dataset records the number of individuals and family members experiencing homelessness, along with the state's population and region. The goal is to identify how homelessness is distributed across states and regions, both in absolute numbers and relative to population.

## Data Loading

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
df = pd.read_csv('homelessness_in_US_2023.csv')
df.head()
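The rest of the notebook relies on the columns `region`, `state`, `individuals`, `family_members` and `state_pop`. Below is a minimal sketch of a loader that fails early if one of them is missing. `load_homelessness` and `REQUIRED_COLUMNS` are hypothetical names, and the two-row CSV uses made-up values in place of the real file.

```python
import io

import pandas as pd

# Columns used by the later cells of this notebook
REQUIRED_COLUMNS = {"region", "state", "individuals", "family_members", "state_pop"}


def load_homelessness(source):
    """Hypothetical helper: read the CSV and raise if an expected column is absent."""
    df = pd.read_csv(source)
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df


# Made-up two-row CSV standing in for homelessness_in_US_2023.csv
toy_csv = io.StringIO(
    "region,state,individuals,family_members,state_pop\n"
    "West,AA,100,50,1000000\n"
    "South,BB,80,20,2000000\n"
)
df = load_homelessness(toy_csv)
print(df.shape)
```

In the notebook itself, the call would be `load_homelessness('homelessness_in_US_2023.csv')`.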

## Basic Inspection

In [None]:
# Shape of the dataset
df.shape

In [None]:
# Data types and missing values
df.info()

In [None]:
# Descriptive statistics
df.describe()

In [None]:
# Unique regions
df['region'].unique()

## Data Cleaning

In [None]:
# Check for null values
df.isnull().sum()

In [None]:
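# Ensure the count and population columns are numeric (unparseable values become NaN)
numeric_cols = ['individuals', 'family_members', 'state_pop']
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors='coerce')
df[numeric_cols].isnull().sum()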
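With `errors='coerce'`, `pd.to_numeric` does not raise on a value it cannot parse. It returns `NaN` instead, so a malformed entry can quietly become a missing value after the earlier null check has already passed. The sketch below uses a made-up three-row frame. It counts missing values before and after coercion to show which columns lost data. The thousands separator in one `state_pop` value is an assumed example of such a problem, not something observed in the real file.

```python
import pandas as pd

# Made-up frame: "2,000,000" cannot be parsed by pd.to_numeric and will be coerced to NaN
df = pd.DataFrame({
    "state": ["AA", "BB", "CC"],
    "individuals": ["100", "80", "60"],
    "family_members": [50, 20, 10],
    "state_pop": ["1000000", "2,000,000", "500000"],
})

numeric_cols = ["individuals", "family_members", "state_pop"]
before = df[numeric_cols].isna().sum()
for col in numeric_cols:
    df[col] = pd.to_numeric(df[col], errors="coerce")
after = df[numeric_cols].isna().sum()

# A positive difference means coercion turned real values into NaN
newly_missing = after - before
print(newly_missing[newly_missing > 0])
```

If the real file does use thousands separators, `pd.read_csv(..., thousands=',')` parses them at load time.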

In [None]:
# Add the total number of people experiencing homelessness and the rate per 100,000 state residents
df['total_homeless'] = df['individuals'] + df['family_members']
df['homeless_per_100k'] = (df['total_homeless'] / df['state_pop']) * 100000
df.head()
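For a single state, the rate is `total_homeless / state_pop * 100000`. For example, 150 people in a state of 1,000,000 gives 15 per 100,000. A regional summary can be computed in two ways. Averaging the state rates weights every state equally. Pooling the counts and populations weights states by population. The two results can differ noticeably. The sketch below compares both on made-up numbers.

```python
import pandas as pd

# Made-up numbers for illustration only
df = pd.DataFrame({
    "region": ["West", "West", "South"],
    "state": ["AA", "BB", "CC"],
    "individuals": [100, 900, 80],
    "family_members": [50, 100, 20],
    "state_pop": [1_000_000, 4_000_000, 2_000_000],
})
df["total_homeless"] = df["individuals"] + df["family_members"]
df["homeless_per_100k"] = df["total_homeless"] / df["state_pop"] * 100_000

# Unweighted: each state counts once, regardless of its population
mean_of_rates = df.groupby("region")["homeless_per_100k"].mean()

# Pooled: the region's total homeless count over the region's total population
totals = df.groupby("region")[["total_homeless", "state_pop"]].sum()
pooled_rate = totals["total_homeless"] / totals["state_pop"] * 100_000

print(pd.DataFrame({"mean_of_rates": mean_of_rates, "pooled_rate": pooled_rate}))
```

In this toy data, the West's mean of state rates is (15 + 25) / 2 = 20. Its pooled rate is 1,150 / 5,000,000 × 100,000 = 23, because the larger state also has the higher rate.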

## Data Visualization

In [None]:
top_10_states = df.sort_values('total_homeless', ascending=False).head(10)
plt.figure(figsize=(12,6))
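# hue='state' with legend=False applies the palette without the warning seaborn >= 0.13 emits for palette without hue
sns.barplot(x='state', y='total_homeless', hue='state', data=top_10_states, palette='Reds_r', legend=False)
plt.title('Top 10 States by Total Homeless Population (2023)')
plt.xlabel('State')
plt.ylabel('People experiencing homelessness')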
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
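The bar chart shows raw counts. A companion table with each state's share of the national total makes a claim like "by far" easy to quantify. `top_n_with_share` is a hypothetical helper, shown here on made-up counts. `DataFrame.nlargest` selects the rows with the n largest values, much like the `sort_values(...).head(10)` call above.

```python
import pandas as pd


def top_n_with_share(df, n=10):
    """Hypothetical helper: the n states with the largest total_homeless,
    with each state's share and the running share of the national total (percent)."""
    national_total = df["total_homeless"].sum()
    top = df.nlargest(n, "total_homeless")[["state", "total_homeless"]].copy()
    top["share_pct"] = top["total_homeless"] / national_total * 100
    top["cumulative_pct"] = top["share_pct"].cumsum()
    return top.reset_index(drop=True)


# Made-up counts for illustration only
toy = pd.DataFrame({
    "state": ["AA", "BB", "CC", "DD"],
    "total_homeless": [600, 250, 100, 50],
})
result = top_n_with_share(toy, n=2)
print(result)
```

In the notebook, this would be called as `top_n_with_share(df)` after `total_homeless` has been created.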

In [None]:
plt.figure(figsize=(12,6))
sns.boxplot(x='region', y='homeless_per_100k', data=df)
plt.title('Homelessness per 100,000 People by Region (2023)')
plt.xlabel('Region')
plt.ylabel('People experiencing homelessness per 100,000 residents')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
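The boxplot displays each region's median and quartiles visually. The same comparison as a table is easier to quote in the findings. The sketch below uses made-up rates. It sorts regions by median rather than mean, so a single extreme state does not decide a region's position.

```python
import pandas as pd

# Made-up per-100k rates for illustration only
toy = pd.DataFrame({
    "region": ["West", "West", "West", "South", "South"],
    "homeless_per_100k": [40.0, 25.0, 10.0, 8.0, 12.0],
})

summary = (
    toy.groupby("region")["homeless_per_100k"]
    .agg(["count", "median", "min", "max"])
    .sort_values("median", ascending=False)
)
print(summary)
```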

## Key Findings
- **California** has the highest number of people experiencing homelessness (individuals plus family members) by far.
- Homelessness per 100,000 residents varies considerably by region, with western states showing higher per-capita rates.
- Some less populous states (such as Alaska and Hawaii) have high homelessness rates relative to their populations.
- Data quality appears to be good, with no missing values or major formatting issues.
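The third finding compares states by rate rather than by count. One way to check it is to rank states on both measures and look at the gap, as sketched below with made-up numbers. `rank_count`, `rank_rate` and `rank_gap` are hypothetical column names. A large positive `rank_gap` marks a state that ranks much higher on the rate than on the raw count.

```python
import pandas as pd

# Made-up numbers for illustration only
toy = pd.DataFrame({
    "state": ["AA", "BB", "CC", "DD"],
    "total_homeless": [5000, 1500, 1200, 300],
    "state_pop": [10_000_000, 5_000_000, 1_000_000, 500_000],
})
toy["homeless_per_100k"] = toy["total_homeless"] / toy["state_pop"] * 100_000

# Rank 1 = highest value on each measure
toy["rank_count"] = toy["total_homeless"].rank(ascending=False).astype(int)
toy["rank_rate"] = toy["homeless_per_100k"].rank(ascending=False).astype(int)

# Positive gap: the state ranks higher on the rate than on the raw count
toy["rank_gap"] = toy["rank_count"] - toy["rank_rate"]

ranked = toy.sort_values("rank_gap", ascending=False)
print(ranked[["state", "rank_count", "rank_rate", "rank_gap"]])
```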

## Conclusion
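This exploratory analysis summarizes how homelessness in the US was distributed across states and regions in 2023, both in absolute numbers and per 100,000 residents. Further analysis could add multi-year data to examine trends, assess policy impacts, or correlate homelessness rates with other socioeconomic factors.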