
### Clustering Electric Vehicles by Model Year and Electric Range

#### Overview:
In this notebook, we will use **K-Means Clustering** to group Battery Electric Vehicles (BEVs) by **Model Year** and **Electric Range**. We will walk through the process step by step, from loading the dataset to visualizing the clusters. The goal is to identify the cluster with the highest average electric range and to see which vehicles it contains.

---

#### Objectives:
1. Install the necessary libraries.
2. Import the libraries.
3. Load the dataset and filter it to BEVs.
4. Select the features for clustering and standardize them.
5. Apply K-Means Clustering to group the vehicles.
6. Identify and analyze the "winning" cluster with the highest average electric range.
7. Visualize the clusters with a focus on the winning group.

Let's begin!



#### Step 1: Installing Required Libraries
First, ensure that you have the necessary libraries installed. We will be using **scikit-learn** for clustering, **pandas** for data manipulation, and **matplotlib** for plotting.


In [None]:

%pip install scikit-learn pandas matplotlib
print("Package dependencies have been installed... keep going!")



#### Step 2: Importing Libraries
Now, let's import the libraries we will use in this notebook: **pandas** for data handling, **numpy** for numerical operations, **scikit-learn** for machine learning, and **matplotlib** for visualizations.


In [None]:

import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
print("Libraries have been successfully imported.")



#### Step 3: Loading and Preprocessing Data
We will now load the dataset of electric vehicles, specifically focusing on **Battery Electric Vehicles (BEVs)**. We'll also ensure that we exclude any vehicles with an electric range of zero.


In [None]:

# Load the dataset
df = pd.read_csv('https://jerrycuomo.github.io/Think_Artificial_Intelligence/datasets/electric_vehicle_population_data.csv')

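# Keep only BEVs with a positive electric range.
# .copy() gives an independent DataFrame, so adding the 'Cluster' column later
# does not trigger pandas' SettingWithCopyWarning.
df = df[df['Electric Vehicle Type'] == 'Battery Electric Vehicle (BEV)']
df = df[df['Electric Range'] > 0].copy()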
print(f"Dataset loaded and filtered. Total records: {df.shape[0]}")



#### Step 4: Selecting Features and Normalizing Data
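We will select **Model Year** and **Electric Range** as our features for clustering. Because K-Means relies on distances and these features are on different scales, we'll standardize them with **StandardScaler**, which rescales each feature to zero mean and unit variance so that neither feature dominates the clustering.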


In [None]:

# Select features for clustering: Model Year and Electric Range
features = df[['Model Year', 'Electric Range']]

# Standardize the features (zero mean, unit variance per column)
scaler = StandardScaler()
features_scaled = scaler.fit_transform(features)
print("Features selected and normalized.")
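To see what the scaling step does, the sketch below applies **StandardScaler** to a small made-up table and checks the column means and standard deviations afterwards. The `toy` values are hypothetical and are not taken from the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data (not from the real dataset), on two very different scales
toy = pd.DataFrame({
    'Model Year': [2013, 2016, 2018, 2020, 2022],
    'Electric Range': [75, 84, 151, 259, 330],
})

# Same call pattern as in the cell above
toy_scaled = StandardScaler().fit_transform(toy)

# After scaling, each column should have mean ~0 and standard deviation ~1
col_means = toy_scaled.mean(axis=0)
col_stds = toy_scaled.std(axis=0)
print("Column means:", col_means)
print("Column standard deviations:", col_stds)
```

Running the same two checks on `features_scaled` is a quick way to confirm that the real features were scaled as intended.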



#### Step 5: Performing K-Means Clustering
Now, we will apply the **K-Means Clustering** algorithm to group BEVs into clusters based on their **Model Year** and **Electric Range**.


In [None]:

# Perform K-Means Clustering
# n_init is set explicitly because its default differs between scikit-learn versions
num_clusters = 5
kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=42)
df['Cluster'] = kmeans.fit_predict(features_scaled)
print(f"K-Means clustering completed with {num_clusters} clusters.")
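Because the model is fitted on scaled features, its cluster centers are also in scaled units. One way to read them as years and ranges is to pass them through the scaler's `inverse_transform`. The sketch below shows the idea on a hypothetical toy table with two obvious groups; the names `toy`, `toy_scaler`, and `toy_kmeans` are illustrative only.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data with two clearly separated groups
toy = pd.DataFrame({
    'Model Year': [2012, 2013, 2014, 2019, 2020, 2021],
    'Electric Range': [70, 75, 80, 250, 260, 270],
})

toy_scaler = StandardScaler()
toy_scaled = toy_scaler.fit_transform(toy)

toy_kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(toy_scaled)

# cluster_centers_ are in scaled units; map them back to the original units
centers = pd.DataFrame(
    toy_scaler.inverse_transform(toy_kmeans.cluster_centers_),
    columns=toy.columns,
)
print(centers.round(1))
```

In the notebook itself, the equivalent call would be `scaler.inverse_transform(kmeans.cluster_centers_)`.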



#### Step 6: Identifying the Winning Cluster
Next, we will identify the cluster with the highest average electric range and print out a sample of vehicles from this cluster.


In [None]:

# Identify the "winning" cluster with the highest average electric range
winning_cluster = df.groupby('Cluster')['Electric Range'].mean().idxmax()
print(f"The winning cluster is: {winning_cluster}")

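# Sample up to five vehicles from the winning cluster
# (min() avoids an error if the cluster has fewer than five rows)
winning_rows = df.loc[df['Cluster'] == winning_cluster, ['Model Year', 'Make', 'Model', 'Electric Range']]
winning_cars = winning_rows.sample(min(5, len(winning_rows)), random_state=42)
print("Sample vehicles from the winning cluster:")
print(winning_cars)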
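It can also help to look at every cluster's statistics rather than only the winner. The sketch below builds a per-cluster summary with `groupby` and `agg` on a hypothetical toy table that already has a `Cluster` column; the values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data with cluster labels already assigned
toy = pd.DataFrame({
    'Cluster': [0, 0, 1, 1, 2],
    'Electric Range': [80, 100, 220, 260, 150],
})

# Size, mean, minimum, and maximum range per cluster, best average first
summary = (
    toy.groupby('Cluster')['Electric Range']
       .agg(['count', 'mean', 'min', 'max'])
       .sort_values('mean', ascending=False)
)
print(summary)

# The winning cluster is the one with the highest mean, as in the cell above
toy_winner = summary['mean'].idxmax()
print(f"Winning cluster in the toy data: {toy_winner}")
```

Applied to `df`, the same summary shows how far the winning cluster's average range is from the others and how many vehicles each cluster holds.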



#### Step 7: Visualizing the Clusters
Finally, we will visualize the clusters. Each cluster gets its own color; the winning cluster is drawn with larger markers and is the only one labeled in the legend.


In [None]:

# Colors for each cluster
colors = plt.cm.rainbow(np.linspace(0, 1, num_clusters))

# Plotting the clusters
plt.figure(figsize=(12, 8))
for cluster in range(num_clusters):
    cluster_data = df[df['Cluster'] == cluster]
    plt.scatter(cluster_data['Model Year'], cluster_data['Electric Range'], 
                color=colors[cluster], 
                label=f'Cluster {cluster}' if cluster == winning_cluster else None, 
                s=250 if cluster == winning_cluster else 150, 
                alpha=0.7)

plt.title('BEV Clusters by Model Year and Electric Range')
plt.xlabel('Model Year')
plt.ylabel('Electric Range')
plt.grid(True)

# Add legend for the winning cluster
winning_legend = plt.legend(title="Winning Cluster", loc="upper left", fontsize='large', fancybox=True)
plt.setp(winning_legend.get_title(), fontsize='small', fontweight='bold')

plt.tight_layout()
plt.show()

print("Clusters have been visualized.")



#### Conclusion:
In this notebook, we successfully applied **K-Means Clustering** to group Battery Electric Vehicles (BEVs) based on their **Model Year** and **Electric Range**. We identified the cluster with the highest average electric range and visualized the clusters. 

You can further explore this analysis by changing the number of clusters or adding more features to see how it affects the grouping. Happy experimenting!
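As a starting point for experimenting with the number of clusters, the sketch below fits K-Means for several values of `k` and prints two common diagnostics: inertia (the within-cluster sum of squared distances) and the silhouette score. It uses synthetic points generated around three hypothetical centers rather than the BEV data, so the numbers say nothing about the real dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: 50 points around each of three hypothetical centers
rng = np.random.default_rng(42)
blob_centers = np.array([[-2.0, -2.0], [0.0, 2.0], [2.0, -1.0]])
points = np.vstack([c + 0.3 * rng.standard_normal((50, 2)) for c in blob_centers])

# Fit K-Means for several k and record inertia and silhouette score
results = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(points)
    inertia = model.inertia_
    silhouette = silhouette_score(points, model.labels_)
    results[k] = (inertia, silhouette)
    print(f"k={k}: inertia={inertia:.1f}, silhouette={silhouette:.3f}")

# Pick the k with the highest silhouette score
best_k = max(results, key=lambda k: results[k][1])
print(f"Best k by silhouette score: {best_k}")
```

To try this on the notebook's data, replace `points` with `features_scaled`. Computing the silhouette score on a large dataset can be slow, so the `sample_size` parameter of `silhouette_score` may be worth setting.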
