# 🎯 Achievement 6.2 – Exploratory Visual Analysis
**Dataset:** Gun Violence in the U.S.

Author: Alexandru Cojocari

This notebook explores relationships within the cleaned gun violence dataset. Time series and mapping will be handled in later tasks.

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

In [None]:
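# Load the cleaned data
df = pd.read_csv('cleaned_gunviolence.csv')
# Drop the identifier (no analytical meaning) and the date (time series are handled in a later task)
df = df.drop(columns=['incident_id', 'date'])
df.head()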

In [None]:
plt.figure(figsize=(10, 8))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)
plt.title("Correlation Heatmap")
plt.show()

### 🔍 Interpretation: Correlation Heatmap
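- Strong positive correlations:
  - `n_killed` and `n_victims`
  - `n_injured` and `n_victims`
- If `n_victims` was computed as `n_killed + n_injured` during cleaning, these two correlations are partly true by construction
- Moderate positive correlation between `n_guns_involved` and `n_victims`
- Hypothesis: incidents involving more guns result in more harm (more victims)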
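
The pairs called out above are read off the heatmap by eye. A minimal sketch of how the same ranking could be produced numerically follows. The helper name `top_correlations` and the small `demo` frame are hypothetical (the values are made up for illustration, and `n_victims` is assumed to be the sum of killed and injured); on the real data the helper would be called on `df`.

```python
import numpy as np
import pandas as pd


def top_correlations(frame: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n column pairs with the largest absolute Pearson correlation."""
    corr = frame.corr(numeric_only=True)
    # Keep only the upper triangle (k=1 skips the diagonal) so each pair appears once.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    # Sort by absolute value so strong negative correlations are not missed.
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)


# Made-up illustration data, not taken from the gun violence dataset.
demo = pd.DataFrame({
    "n_killed": [0, 1, 0, 2, 1, 0],
    "n_injured": [1, 0, 2, 3, 0, 1],
    "n_guns_involved": [1, 1, 2, 1, 3, 1],
})
demo["n_victims"] = demo["n_killed"] + demo["n_injured"]  # assumed definition

ranked = top_correlations(demo, n=3)
print(ranked)
```

Ranking by absolute value keeps the list useful even when an important relationship is negative.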

In [None]:
sns.scatterplot(data=df, x='n_guns_involved', y='n_victims')
plt.title("Guns Involved vs. Total Victims")
plt.show()

sns.scatterplot(data=df, x='n_injured', y='n_killed')
plt.title("Injured vs. Killed")
plt.show()

### 🔍 Interpretation: Scatterplots
- Positive trend: incidents with more guns tend to have more victims
- Injured vs. killed: related, but with wide variance (incidents with injuries do not always involve deaths)
- Outliers exist: some incidents have many deaths despite few guns involved
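
To look at the outliers mentioned above more closely, one option is to filter for incidents with many deaths but few guns. This is only a sketch: the function name `few_guns_many_deaths`, both thresholds, and the `demo` values are hypothetical choices for illustration, not values derived from the dataset.

```python
import pandas as pd


def few_guns_many_deaths(frame: pd.DataFrame, max_guns: int = 1, min_killed: int = 4) -> pd.DataFrame:
    """Return incidents with at most max_guns guns and at least min_killed deaths."""
    mask = (frame["n_guns_involved"] <= max_guns) & (frame["n_killed"] >= min_killed)
    # Rows with a missing gun count compare as False and are therefore excluded.
    return frame.loc[mask].sort_values("n_killed", ascending=False)


# Made-up illustration data.
demo = pd.DataFrame({
    "n_guns_involved": [1, 1, 5, 1, 2],
    "n_killed": [0, 6, 7, 4, 9],
})

flagged = few_guns_many_deaths(demo)
print(flagged)
```

On the real data, `few_guns_many_deaths(df)` would return the rows worth inspecting by hand before deciding whether they are data errors or genuine mass-casualty events.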

In [None]:
sns.pairplot(df)
plt.show()

### 🔍 Interpretation: Pairplot
- Variables are right-skewed (especially deaths and injuries)
- Most incidents are low-scale; a few mass events distort the distribution
- The relationships involving `n_suspects` and `n_unharmed` are unclear and need further investigation
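
The right skew noted above can be quantified with `DataFrame.skew`, and a `log1p` transform is one common way to reduce it before further modelling. The sketch below uses a made-up `demo` frame. Applying it to the real data would mean replacing `demo` with the numeric columns of `df`.

```python
import numpy as np
import pandas as pd

# Made-up illustration data: mostly small counts plus one large event.
demo = pd.DataFrame({
    "n_killed": [0, 0, 0, 1, 0, 1, 0, 12],
    "n_injured": [1, 0, 2, 0, 1, 0, 1, 20],
})

# Positive skewness indicates a long right tail.
raw_skew = demo.skew(numeric_only=True)

# log1p(x) = log(1 + x) is defined at zero, which matters for count data.
log_skew = np.log1p(demo).skew(numeric_only=True)

print(pd.DataFrame({"raw": raw_skew, "log1p": log_skew}))
```

The transform does not remove the mass events. It only compresses their influence on plots and summary statistics.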

In [None]:
plt.figure(figsize=(14, 6))
top_states = df.groupby('state')['n_killed'].mean().sort_values(ascending=False).head(10)
sns.barplot(x=top_states.index, y=top_states.values)
plt.title("Top 10 States by Average Deaths per Incident")
plt.ylabel("Avg Killed")
plt.xticks(rotation=45)
plt.show()

### 🔍 Interpretation: State-wise Violence
- States like Illinois, Louisiana, and Missouri have high average deaths per incident
- These states are potential focus areas for regional policy analysis or later clustering
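
A per-state mean can be dominated by states with very few recorded incidents, so it may help to report the incident count next to the average and to require a minimum sample size. The sketch below assumes the `state` and `n_killed` columns used in the bar chart. The helper name `state_fatality_summary`, the `min_incidents` cutoff, and the `demo` data are hypothetical.

```python
import pandas as pd


def state_fatality_summary(frame: pd.DataFrame, min_incidents: int = 3) -> pd.DataFrame:
    """Average deaths per incident by state, keeping states with enough incidents."""
    summary = frame.groupby("state")["n_killed"].agg(avg_killed="mean", incidents="count")
    # Drop states whose average rests on too few incidents to be stable.
    summary = summary[summary["incidents"] >= min_incidents]
    return summary.sort_values("avg_killed", ascending=False)


# Made-up illustration data with placeholder state labels.
demo = pd.DataFrame({
    "state": ["A", "A", "A", "A", "B", "C", "C", "C"],
    "n_killed": [0, 1, 1, 0, 3, 1, 2, 0],
})

ranked_states = state_fatality_summary(demo, min_incidents=3)
print(ranked_states)
```

In this toy example, state "B" has the highest average but only one incident, so it is excluded from the ranking.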

### 🤔 Hypotheses and Further Questions
**Hypotheses:**
1. Incidents with more guns involved result in more victims.
2. Certain states consistently show higher fatality averages.
3. Most gun violence incidents involve few victims; outliers inflate national stats.
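
Hypothesis 3 could be checked by comparing the mean with the median and by measuring the share of small incidents. The sketch below uses a made-up series. The cutoff of two victims for a "small" incident is an arbitrary assumption.

```python
import pandas as pd

# Made-up illustration data: nine small incidents and one mass event.
victims = pd.Series([0, 1, 1, 1, 2, 1, 0, 1, 1, 25], name="n_victims")

mean_victims = victims.mean()      # pulled upward by the single large event
median_victims = victims.median()  # robust to that event
share_small = (victims <= 2).mean()  # fraction of incidents with at most 2 victims

print(f"mean={mean_victims:.2f}, median={median_victims:.1f}, share_small={share_small:.0%}")
```

A mean well above the median, combined with a high share of small incidents, would be consistent with outliers inflating the overall average.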

**New Questions:**
- Are multiple suspects associated with higher harm?
- Is there a threshold after which gun count stops affecting victim count?
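
One way to start on the threshold question is to bin `n_guns_involved` and compare the average victim count per bin. The bin edges, labels, helper name `victims_by_gun_bin`, and `demo` data below are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd


def victims_by_gun_bin(frame: pd.DataFrame) -> pd.DataFrame:
    """Mean victims and incident count per gun-count bin."""
    # Intervals are right-closed: (0, 1], (1, 2], (2, 5], (5, inf).
    bins = [0, 1, 2, 5, np.inf]
    labels = ["1", "2", "3-5", "6+"]
    gun_bin = pd.cut(frame["n_guns_involved"], bins=bins, labels=labels)
    return (
        frame.assign(gun_bin=gun_bin)
        .groupby("gun_bin", observed=True)["n_victims"]
        .agg(["mean", "count"])
    )


# Made-up illustration data.
demo = pd.DataFrame({
    "n_guns_involved": [1, 1, 2, 3, 4, 8, 10],
    "n_victims": [1, 3, 2, 4, 2, 3, 3],
})

by_bin = victims_by_gun_bin(demo)
print(by_bin)
```

If the mean stops rising across the higher bins, that would suggest a plateau. The same grouping could be applied to `n_suspects` for the first question.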
