# Customer Segmentation using Unsupervised Learning

1. Project Goal
2. Libraries Used
3. Dataset Description
4. Loading the Dataset
5. Initial Data Inspection
6. Summary of Findings

## Project Goal

The goal of this project is to segment mall customers into meaningful groups
using unsupervised learning techniques. Customer segmentation helps businesses
understand customer behavior and design targeted marketing strategies.

This project uses the Mall Customers dataset and applies clustering algorithms
such as K-Means, Hierarchical Clustering, and DBSCAN.
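As a rough illustration of how these three algorithms are invoked, the sketch below runs them with scikit-learn on a small synthetic dataset. The use of scikit-learn, the synthetic blobs, and every parameter value (`n_clusters=3`, `eps=3.0`, `min_samples=5`) are assumptions made for illustration; none of them come from this project's data or results.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN

# Synthetic stand-in data (NOT the Mall Customers dataset):
# three well-separated 2-D blobs of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# K-Means: partitions points around k centroids (k is chosen up front).
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Hierarchical (agglomerative) clustering: merges the closest groups bottom-up.
hier_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# DBSCAN: groups dense regions; points in sparse regions get the label -1.
dbscan_labels = DBSCAN(eps=3.0, min_samples=5).fit_predict(X)

results = [("K-Means", kmeans_labels),
           ("Hierarchical", hier_labels),
           ("DBSCAN", dbscan_labels)]
for name, labels in results:
    n_clusters = len(set(labels.tolist()) - {-1})  # ignore DBSCAN noise label
    print(name, "found", n_clusters, "clusters")
```

K-Means and hierarchical clustering need the number of clusters as input, whereas DBSCAN infers it from the density parameters, so the settings that suit the real dataset would have to be chosen separately.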

## Libraries Used

In [2]:
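import pandas as pd              # tabular data loading and manipulation
import numpy as np               # numerical arrays
import matplotlib.pyplot as plt  # plotting
import seaborn as sns            # statistical visualization

# Show every column when a DataFrame is displayed.
pd.set_option('display.max_columns', None)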

## Dataset Description

The Mall Customers dataset contains demographic and behavioral information
about customers visiting a shopping mall.

###### Features:
- CustomerID: Unique identifier for each customer
- Gender: Male or Female
- Age: Age of the customer
- Annual Income (k$): Annual income in thousands of dollars
- Spending Score (1-100): Score assigned based on customer spending behavior

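The three numeric features (Age, Annual Income, Spending Score) are suitable
for distance-based clustering algorithms. CustomerID is only an identifier and
should be excluded, and Gender is categorical, so it must be encoded
numerically before it can enter a distance calculation.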
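One possible way to prepare these features for a distance-based algorithm is sketched below. It uses the first five rows of `Mall_Customers.csv` as a small inline sample. The 0/1 encoding of Gender, the column name `Gender_Male`, and the choice of z-score standardization are assumptions for illustration, not steps taken in this notebook.

```python
import pandas as pd

# First five rows of the dataset, typed in by hand as a small sample.
sample = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 5],
    "Gender": ["Male", "Male", "Female", "Female", "Female"],
    "Age": [19, 21, 20, 23, 31],
    "Annual Income (k$)": [15, 15, 16, 16, 17],
    "Spending Score (1-100)": [39, 81, 6, 77, 40],
})

# CustomerID is an identifier, not a feature, so it is left out.
numeric_cols = ["Age", "Annual Income (k$)", "Spending Score (1-100)"]
features = sample[numeric_cols].astype(float)

# Assumption: encode Gender as 0/1 so it can take part in a distance.
features["Gender_Male"] = (sample["Gender"] == "Male").astype(float)

# Z-score standardization: each numeric column gets mean 0 and unit variance,
# so no single column dominates the Euclidean distance.
scaled = features.copy()
col_mean = features[numeric_cols].mean()
col_std = features[numeric_cols].std(ddof=0)
scaled[numeric_cols] = (features[numeric_cols] - col_mean) / col_std

print(scaled.round(2))
```

Whether Gender should be included at all is a modeling decision; a binary column mixed with standardized numeric columns can weigh differently in the distance than intended.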
 


## Loading the Dataset

In [5]:
df = pd.read_csv("Mall_Customers.csv")
df.head()

| | CustomerID | Gender | Age | Annual Income (k$) | Spending Score (1-100) |
|---|---|---|---|---|---|
| 0 | 1 | Male | 19 | 15 | 39 |
| 1 | 2 | Male | 21 | 15 | 81 |
| 2 | 3 | Female | 20 | 16 | 6 |
| 3 | 4 | Female | 23 | 16 | 77 |
| 4 | 5 | Female | 31 | 17 | 40 |
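A slightly more defensive loading step could check that the expected columns are present before any analysis starts. The sketch below is one way to do that; the helper `load_customers` and the constant `EXPECTED_COLUMNS` are hypothetical names, and an in-memory string holding the first five rows shown above stands in for `Mall_Customers.csv` so the example is self-contained.

```python
import io
import pandas as pd

# Stand-in for Mall_Customers.csv: the header plus the first five rows.
csv_text = """CustomerID,Gender,Age,Annual Income (k$),Spending Score (1-100)
1,Male,19,15,39
2,Male,21,15,81
3,Female,20,16,6
4,Female,23,16,77
5,Female,31,17,40
"""

EXPECTED_COLUMNS = [
    "CustomerID",
    "Gender",
    "Age",
    "Annual Income (k$)",
    "Spending Score (1-100)",
]

def load_customers(source):
    """Read the CSV and fail early if an expected column is missing."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return df

# pd.read_csv accepts a file path or a file-like object, so the same
# helper could be called as load_customers("Mall_Customers.csv").
sample_df = load_customers(io.StringIO(csv_text))
print(sample_df.shape)  # (5, 5)
```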


## Initial Data Inspection

In [6]:
df.shape

(200, 5)

In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 5 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   CustomerID              200 non-null    int64 
 1   Gender                  200 non-null    object
 2   Age                     200 non-null    int64 
 3   Annual Income (k$)      200 non-null    int64 
 4   Spending Score (1-100)  200 non-null    int64 
dtypes: int64(4), object(1)
memory usage: 7.9+ KB


In [9]:
df.isnull().sum()

CustomerID                0
Gender                    0
Age                       0
Annual Income (k$)        0
Spending Score (1-100)    0
dtype: int64

In [10]:
df.describe()

| | CustomerID | Age | Annual Income (k$) | Spending Score (1-100) |
|---|---|---|---|---|
| count | 200.0 | 200.0 | 200.0 | 200.0 |
| mean | 100.5 | 38.85 | 60.56 | 50.2 |
| std | 57.879185 | 13.969007 | 26.264721 | 25.823522 |
| min | 1.0 | 18.0 | 15.0 | 1.0 |
| 25% | 50.75 | 28.75 | 41.5 | 34.75 |
| 50% | 100.5 | 36.0 | 61.5 | 50.0 |
| 75% | 150.25 | 49.0 | 78.0 | 73.0 |
| max | 200.0 | 70.0 | 137.0 | 99.0 |
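The separate inspection calls above can also be gathered into one per-column overview. The sketch below shows one way to do this; the helper name `summarize_columns` is hypothetical, the duplicate-row check is an extra step not performed in this notebook, and the five-row sample stands in for the full 200-row DataFrame.

```python
import pandas as pd

def summarize_columns(df):
    """Return one row per column: dtype, missing count, distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isnull().sum(),
        "unique": df.nunique(),
    })

# First five rows of the dataset, used here as a small stand-in.
sample = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 5],
    "Gender": ["Male", "Male", "Female", "Female", "Female"],
    "Age": [19, 21, 20, 23, 31],
    "Annual Income (k$)": [15, 15, 16, 16, 17],
    "Spending Score (1-100)": [39, 81, 6, 77, 40],
})

summary = summarize_columns(sample)
print(summary)

# Extra check (an assumption, not in the notebook): fully duplicated rows.
print("duplicate rows:", int(sample.duplicated().sum()))
```

On the full dataset the same call would be `summarize_columns(df)`.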


## Summary of Findings

- The dataset contains both demographic and behavioral features.
- No missing values are present in the dataset.
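- Spending Score is defined on a 1-100 scale; the observed values range from 1 to 99.
- The numeric features (Age, Annual Income, Spending Score) are suitable for unsupervised clustering. Their ranges differ (18 to 70, 15 to 137, and 1 to 99), so standardizing them before applying distance-based methods is advisable.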