# Electric Vehicle (EV) Adoption and Charging Station Optimization

## Project Overview
This project analyzes Electric Vehicle (EV) adoption trends in India and uses clustering to identify candidate locations for new charging stations. We use two datasets:
1. **EV Sales Data**: State-wise and vehicle-category-wise sales data.
2. **Charging Station Data**: A simulated dataset of roughly 29,000 stations, intended to approximate the late-2025 distribution of charging infrastructure across India.

## Objectives
1. Analyze EV sales growth over time and by state.
2. Visualize the distribution of existing charging infrastructure.
3. Apply K-Means clustering to propose candidate zones for new charging stations.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Some exports of the sales CSV are not UTF-8; fall back to Latin-1 only if decoding fails
try:
    sales_df = pd.read_csv('electric_vehicle_sales_by_state.csv')
except UnicodeDecodeError:
    sales_df = pd.read_csv('electric_vehicle_sales_by_state.csv', encoding='latin1')

stations_df = pd.read_csv('charging_stations_india.csv')

print("Sales Data Head:")
display(sales_df.head())
print("\nStations Data Head:")
display(stations_df.head())
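
Later cells read specific columns from each file: `date`, `state` and `electric_vehicles_sold` from the sales data, and `State`, `Latitude`, `Longitude` and `Charger_Type` from the station data. A small check like the one below surfaces a schema mismatch immediately instead of as a `KeyError` several cells later. This is a sketch: `missing_columns` is a hypothetical helper, and the toy frames stand in for the real CSVs.

```python
import pandas as pd

# Columns that later cells in this notebook read from each dataset
REQUIRED_SALES_COLS = {"date", "state", "electric_vehicles_sold"}
REQUIRED_STATION_COLS = {"State", "Latitude", "Longitude", "Charger_Type"}


def missing_columns(df: pd.DataFrame, required: set) -> list:
    """Hypothetical helper: return required column names absent from df, sorted."""
    return sorted(required - set(df.columns))


# Toy frames standing in for the real CSVs (hypothetical values)
toy_sales = pd.DataFrame({"date": ["01-Apr-21"], "state": ["Goa"], "electric_vehicles_sold": [10]})
toy_stations = pd.DataFrame({"State": ["Goa"], "Latitude": [15.5], "Longitude": [73.8]})

print(missing_columns(toy_sales, REQUIRED_SALES_COLS))      # []
print(missing_columns(toy_stations, REQUIRED_STATION_COLS))  # ['Charger_Type']
```

In the notebook itself, `assert not missing_columns(sales_df, REQUIRED_SALES_COLS)` right after loading would serve the same purpose.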

## Data Preprocessing

In [None]:
# Convert date column to datetime
sales_df['date'] = pd.to_datetime(sales_df['date'], format='%d-%b-%y', errors='coerce')

# Check for missing values
print(sales_df.isnull().sum())

# Drop rows whose date could not be parsed (coerced to NaT above); other columns are assumed clean
sales_df.dropna(subset=['date'], inplace=True)
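
The preprocessing cell relies on `errors='coerce'` to turn any date string that does not match `'%d-%b-%y'` (for example `01-Apr-21`) into `NaT`, which `dropna` then removes. The toy rows below are hypothetical, including one deliberately malformed date, and illustrate that behaviour. Note that `%b` matches abbreviated month names, which are locale-dependent.

```python
import pandas as pd

# Toy rows in the assumed 'dd-Mon-yy' format, plus one malformed entry
demo = pd.DataFrame({
    "date": ["01-Apr-21", "01-May-21", "2021/06/01"],
    "electric_vehicles_sold": [5, 7, 9],
})

# errors='coerce' turns any string that does not match the format into NaT
demo["date"] = pd.to_datetime(demo["date"], format="%d-%b-%y", errors="coerce")
print(demo["date"].isna().sum())  # 1 unparseable row

# Dropping NaT rows removes the malformed entry, as the preprocessing cell does
demo = demo.dropna(subset=["date"])
print(len(demo))  # 2
```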

## 1. EV Adoption Analysis
### Trend over time

In [None]:
monthly_sales = sales_df.groupby('date')['electric_vehicles_sold'].sum().reset_index()

plt.figure(figsize=(12, 6))
sns.lineplot(data=monthly_sales, x='date', y='electric_vehicles_sold', marker='o')
plt.title('Total EV Sales in India Over Time')
plt.ylabel('EVs Sold')
plt.xlabel('Date')
plt.grid(True)
plt.show()
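
Objective 1 asks for growth, not just levels. One way to quantify it, sketched here on hypothetical monthly totals shaped like `monthly_sales`, is month-over-month and year-over-year percentage change. Resampling to a month-start (`'MS'`) index assumes the sales dates fall on the first of each month; `pct_change(12)` then compares each month with the same month a year earlier, which sidesteps month-of-year seasonality.

```python
import numpy as np
import pandas as pd

# Toy monthly totals (hypothetical numbers): 24 months growing ~5% per month
dates = pd.date_range("2021-04-01", periods=24, freq="MS")
monthly = pd.DataFrame({
    "date": dates,
    "electric_vehicles_sold": (1000 * 1.05 ** np.arange(24)).round(),
})

# Put the series on a regular month-start index so shifts are by calendar month
series = monthly.set_index("date")["electric_vehicles_sold"].resample("MS").sum()

growth = pd.DataFrame({
    "sold": series,
    "mom_pct": series.pct_change() * 100,    # vs previous month
    "yoy_pct": series.pct_change(12) * 100,  # vs same month one year earlier
})
print(growth.tail(3).round(1))
```

Applying the same lines to `monthly_sales` would give the growth series for the real data; the first 12 rows of `yoy_pct` are necessarily `NaN`.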

### State-wise Adoption

In [None]:
state_sales = sales_df.groupby('state')['electric_vehicles_sold'].sum().sort_values(ascending=False).head(10)

plt.figure(figsize=(12, 6))
sns.barplot(x=state_sales.values, y=state_sales.index, palette='viridis')
plt.title('Top 10 States by Total EV Sales')
plt.xlabel('Total EVs Sold')
plt.show()
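
The top-10 bar chart shows absolute volumes. Expressing each state as a share of national sales, with a running cumulative share, makes concentration explicit, for instance how few states account for most EV sales. The frame below uses hypothetical states and numbers in place of `sales_df`.

```python
import pandas as pd

# Toy per-row sales (hypothetical states and numbers)
sales = pd.DataFrame({
    "state": ["A", "A", "B", "C", "C", "D"],
    "electric_vehicles_sold": [40, 20, 25, 5, 5, 5],
})

totals = sales.groupby("state")["electric_vehicles_sold"].sum().sort_values(ascending=False)
share = (totals / totals.sum() * 100).round(1)  # % of national EV sales
cumulative = share.cumsum()                     # running share, largest state first
print(pd.DataFrame({"sold": totals, "share_pct": share, "cum_share_pct": cumulative}))
```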

## 2. Charging Infrastructure Analysis

In [None]:
plt.figure(figsize=(10, 8))
sns.scatterplot(data=stations_df, x='Longitude', y='Latitude', hue='Charger_Type', alpha=0.6)
plt.title('Distribution of Charging Stations (Simulated)')
plt.show()
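
Combining the two datasets gives a rough measure of infrastructure pressure: EVs sold per charging station in each state. This is a sketch on hypothetical toy frames, and it assumes the sales `state` column and the station `State` column spell state names identically, which should be checked on the real files. Because sales accumulate over the analysis period while stations are a snapshot, the ratio is only an indicator for ranking states, not a literal load per charger.

```python
import pandas as pd

# Toy inputs (hypothetical): EV sales rows and one row per station
sales = pd.DataFrame({"state": ["X", "X", "Y", "Z"], "electric_vehicles_sold": [300, 200, 400, 50]})
stations = pd.DataFrame({"State": ["X", "X", "Y", "Y", "Y", "Y"]})

ev_by_state = sales.groupby("state")["electric_vehicles_sold"].sum()
stations_by_state = stations.groupby("State").size()

# Align on state name (assumes identical spelling in both files); missing counts become 0
ratio = pd.concat({"evs": ev_by_state, "stations": stations_by_state}, axis=1).fillna(0)
# Leave the ratio undefined (NaN) where a state has no stations at all
ratio["evs_per_station"] = ratio["evs"] / ratio["stations"].where(ratio["stations"] > 0)
print(ratio.sort_values("evs_per_station", ascending=False))
print("States with sales but no stations:", ratio.index[ratio["stations"] == 0].tolist())
```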

## 3. Optimization: Facility Location using K-Means Clustering
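K-Means partitions the existing stations into *k* groups so as to minimize the within-cluster sum of squared distances, and returns the centroid of each group. Because these centroids sit at the centers of mass of existing stations, they mark where infrastructure is already concentrated, which may indicate demand hot-spots, and we use them as candidate sites for new "Super Hubs". Note that this approach follows existing supply rather than measured demand.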
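
To make the objective concrete, here is a minimal NumPy sketch of Lloyd's algorithm, the classic iteration behind K-Means: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the centroids stop moving. It uses plain random initialisation for simplicity; scikit-learn's `KMeans` defaults to k-means++ initialisation and keeps the best of `n_init` runs, so its results will generally differ in detail. The coordinates and the function name `lloyd_kmeans` are hypothetical.

```python
import numpy as np


def lloyd_kmeans(points: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean updates."""
    rng = np.random.default_rng(seed)
    # Initialise centroids at k distinct data points chosen uniformly at random
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance from each point to each centroid
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Final assignment and inertia (within-cluster sum of squared distances)
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    return centroids, labels, d2.min(axis=1).sum()


# Hypothetical (lat, lon) points forming two well-separated groups
pts = np.array([[19.00, 72.80], [19.10, 72.90], [18.90, 72.85],
                [18.50, 73.80], [18.60, 73.90], [18.55, 73.85]])
centers, labels, inertia = lloyd_kmeans(pts, k=2)
print(labels, centers.round(3), round(float(inertia), 4))
```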

In [None]:
from sklearn.cluster import KMeans

# Restrict to one state (Maharashtra) and keep only the coordinates;
# .copy() avoids a SettingWithCopyWarning when the 'Cluster' column is added below
mh_stations = stations_df[stations_df['State'] == 'Maharashtra'][['Latitude', 'Longitude']].copy()

if not mh_stations.empty:
    # Fit 5 clusters; each centroid is a candidate hub near a concentration of existing stations
    kmeans = KMeans(n_clusters=5, random_state=42, n_init=10)
    mh_stations['Cluster'] = kmeans.fit_predict(mh_stations[['Latitude', 'Longitude']])
    centers = kmeans.cluster_centers_  # columns in fit order: (Latitude, Longitude)

    plt.figure(figsize=(10, 8))
    sns.scatterplot(data=mh_stations, x='Longitude', y='Latitude', hue='Cluster', palette='tab10', legend='full')
    plt.scatter(centers[:, 1], centers[:, 0], c='red', s=200, marker='X', label='Proposed New Hubs')
    plt.title('Optimal Hub Locations in Maharashtra (K-Means)')
    plt.legend()
    plt.show()

    print("Proposed Hub Coordinates (Lat, Lon):")
    for i, center in enumerate(centers):
        print(f"Hub {i+1}: {center[0]:.4f}, {center[1]:.4f}")
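
The cell above fixes `n_clusters=5`. One common way to choose *k* is the silhouette score, which compares each point's mean distance to its own cluster with its mean distance to the nearest other cluster; values near 1 indicate well-separated clusters. The sketch below scans a range of *k* on hypothetical coordinates with three obvious groups. Run on `mh_stations[['Latitude', 'Longitude']]`, the same loop would suggest a data-driven *k*, although a practical limit on the number of hubs may override it.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Toy coordinates (hypothetical): three tight groups of (lat, lon) points
rng = np.random.default_rng(0)
group_centres = np.array([[19.08, 72.88], [18.52, 73.86], [21.15, 79.09]])
coords = np.vstack([c + rng.normal(scale=0.02, size=(30, 2)) for c in group_centres])

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(coords)
    scores[k] = silhouette_score(coords, labels)  # higher = better-separated clusters

best_k = max(scores, key=scores.get)
print({k: round(float(s), 3) for k, s in scores.items()}, "best k:", best_k)
```

Both this loop and the cell above cluster raw latitude/longitude degrees. Across Maharashtra (roughly 16 to 22°N) a degree of longitude spans about 4 to 7% less ground distance than a degree of latitude, so east-west separation is slightly overstated; projecting coordinates to kilometres before clustering would remove that distortion.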

## Conclusion
This notebook loads and analyzes EV sales data over time and by state, maps the simulated charging-station network, and uses K-Means clustering to suggest candidate locations for new charging hubs in Maharashtra.