# K-Means Clustering Analysis on Story Points

This Jupyter notebook walks through a K-Means clustering analysis that uses 'story points' as its single input feature. The steps are data loading, cleaning, K-Means clustering, and a visual check of the resulting clusters.

## Importing Necessary Libraries

Before we start, let's import the necessary libraries for our analysis.

In [None]:
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Display settings
pd.options.display.max_columns = None
sns.set_theme(style="whitegrid")

## Data Loading

Let's load the dataset and take a quick look at its first few rows to understand its structure.

In [None]:
# Load the dataset
data_path = 'path_to_your_dataset.csv'
data = pd.read_csv(data_path)

# Display the first few rows of the dataset
data.head()

## Data Exploration and Cleaning

In this section, we will explore the 'story points' column, handle missing values, and prepare the data for clustering.

In [None]:
# Exploring 'story points' column
print(data['Story points'].describe())

# Handling missing values
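# Drop rows where 'Story points' is missing, since the column is the only feature we cluster on.
# .copy() gives us an independent DataFrame, so adding columns later does not raise SettingWithCopyWarning.
cleaned_data = data.dropna(subset=['Story points']).copy()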

# Let's look at the cleaned data
cleaned_data['Story points'].describe()
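
If the column was read as text (for example because some entries are not numbers), `dropna` alone will not catch those rows. The following is a minimal sketch of one way to handle that, assuming such entries should be treated as missing. The `sample` frame is hypothetical and stands in for the real dataset.

```python
import pandas as pd

# Hypothetical sample standing in for the real dataset
sample = pd.DataFrame({"Story points": ["3", "5", None, "eight", "13"]})

# Coerce non-numeric entries to NaN so they can be dropped with the missing rows
sample["Story points"] = pd.to_numeric(sample["Story points"], errors="coerce")
cleaned_sample = sample.dropna(subset=["Story points"]).copy()

# Report how many rows were removed, so the data loss is visible
print(f"Dropped {len(sample) - len(cleaned_sample)} of {len(sample)} rows")
```

Reporting the number of dropped rows makes it easier to judge whether dropping is acceptable or whether the missing values need a closer look.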

## K-Means Clustering

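We will now perform K-Means clustering on the 'story points'. We standardize the data first. With a single feature this does not change which points end up together, but it keeps the workflow ready for additional features, where differing scales would distort the distance calculations.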

In [None]:
# Scaling the 'Story points'
scaler = StandardScaler()
scaled_features = scaler.fit_transform(cleaned_data[['Story points']])

# Applying K-Means
# k=3 is an arbitrary starting point; ideally k is chosen with a method such as the elbow method.
# n_init is set explicitly because its default differs across scikit-learn versions.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans.fit(scaled_features)

# Adding the cluster labels to our dataset
cleaned_data['Cluster'] = kmeans.labels_

cleaned_data.head()
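
The comment above mentions the elbow method for choosing k. A minimal sketch of it follows, using synthetic data in place of the real 'Story points' column. The three group means and spreads are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 'Story points' column: three loose groups (assumed values)
rng = np.random.default_rng(42)
points = np.concatenate([
    rng.normal(2, 0.5, 50),
    rng.normal(8, 1.0, 50),
    rng.normal(20, 2.0, 50),
]).reshape(-1, 1)
scaled = StandardScaler().fit_transform(points)

# Fit K-Means for a range of k and record the inertia
# (sum of squared distances from each point to its nearest cluster center)
inertias = {}
for k in range(1, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(scaled)
    inertias[k] = model.inertia_

for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.2f}")
```

Plotting `inertias` against k and looking for the point where the curve flattens gives a candidate value for `n_clusters`. On real data the elbow is often less clear-cut than on synthetic groups like these.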

## Evaluation

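Let's visualize the clusters formed by our K-Means model to see how the 'story points' have been grouped. Because there is only one feature, each cluster appears as a horizontal band of story-point values plotted against the row index.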

In [None]:
# Visualizing the clusters
plt.figure(figsize=(10, 6))
sns.scatterplot(x=cleaned_data.index, y='Story points', hue='Cluster', data=cleaned_data, palette='viridis')
plt.title('K-Means Clustering of Story Points')
plt.xlabel('Index')
plt.ylabel('Story Points')
plt.show()
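
A plot gives only a visual impression. As a complement, the sketch below computes a silhouette score (which ranges from -1 to 1, with higher values indicating better-separated clusters) and a per-cluster summary in the original units. It runs on synthetic data, and the `demo` frame and its group parameters are assumptions for illustration, not values from the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned dataset (assumed values)
rng = np.random.default_rng(0)
demo = pd.DataFrame({"Story points": np.concatenate([
    rng.normal(2, 0.5, 40),
    rng.normal(8, 1.0, 40),
    rng.normal(20, 2.0, 40),
])})

# Same pipeline as in the notebook: scale, fit, attach labels
scaler = StandardScaler()
scaled = scaler.fit_transform(demo[["Story points"]])
model = KMeans(n_clusters=3, n_init=10, random_state=42).fit(scaled)
demo["Cluster"] = model.labels_

# Silhouette score over all points
score = silhouette_score(scaled, model.labels_)
print(f"Silhouette score: {score:.3f}")

# Size and range of each cluster, in original story-point units
summary = demo.groupby("Cluster")["Story points"].agg(["count", "min", "mean", "max"])
print(summary)

# Cluster centers mapped back from the scaled space to story points
centers = scaler.inverse_transform(model.cluster_centers_).ravel()
print("Centers (story points):", np.sort(centers).round(2))
```

The per-cluster ranges are usually the most interpretable output here, since they show which story-point values each cluster label actually covers.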