#### iPhone purchases are increasing day by day, and many stores want to predict whether a customer will purchase an iPhone from their store given the customer's gender, age and salary.
#### Build a Decision Tree Classifier model by performing EDA and applying the necessary transformations using Python.
##### Prediction - Whether the customer will purchase or not
Dataset Name - iPhone_purchase_records.csv

In [1]:
## Importing all the necessary libraries

import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay


In [2]:
## Load the Dataset
data = pd.read_csv("D:/TopMentor_DS_course/Decision Tree Project8/iphone_purchase_records.csv")


In [3]:
data.head()

   Gender  Age  Salary  Purchase Iphone
0    Male   19   19000                0
1    Male   35   20000                0
2  Female   26   43000                0
3  Female   27   57000                0
4    Male   19   76000                0


In [4]:
print("Columns in the Dataset: ", data.columns)

Columns in the Dataset:  Index(['Gender', 'Age', 'Salary', 'Purchase Iphone'], dtype='object')


In [5]:
print("Shape of the Dataset: ", data.shape)
print()
print("Information of Dataset")
print("====================================================")
data.info()

Shape of the Dataset:  (400, 4)

Information of Dataset
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 400 entries, 0 to 399
Data columns (total 4 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Gender           400 non-null    object
 1   Age              400 non-null    int64 
 2   Salary           400 non-null    int64 
 3   Purchase Iphone  400 non-null    int64 
dtypes: int64(3), object(1)
memory usage: 12.6+ KB


In [6]:
print("Null Values in the Dataset: ", data.isnull().sum())

Null Values in the Dataset:  Gender             0
Age                0
Salary             0
Purchase Iphone    0
dtype: int64


In [7]:
## Descriptive Analysis 

round(data.describe(),2)

          Age     Salary  Purchase Iphone
count   400.0      400.0            400.0
mean    37.66    69742.5             0.36
std     10.48   34096.96             0.48
min      18.0    15000.0              0.0
25%     29.75    43000.0              0.0
50%      37.0    70000.0              0.0
75%      46.0    88000.0              1.0
max      60.0   150000.0              1.0


##### 🔍 Insight

No missing values present

Target column: Purchase Iphone (0 = No, 1 = Yes)

Age and Salary are numerical → usable by a Decision Tree directly, without feature scaling

#### EXPLORATORY DATA ANALYSIS (EDA)

In [None]:
# Heatmap of missing values (a uniform colour means no nulls)

plt.figure(figsize=(8,5))
plt.title("Missing Data Heatmap")
sns.heatmap(data.isnull(), cmap='viridis')
plt.savefig("D:/TopMentor_DS_course/Decision Tree Project8/Graphs/Heatmap.png")
plt.show()

In [None]:
###  Distribution of Target Variable
plt.figure(figsize=(6,4))
sns.countplot(x='Purchase Iphone', data=data)
plt.title("Purchase Distribution")
plt.savefig("D:/TopMentor_DS_course/Decision Tree Project8/Graphs/Target_var_Distr.png")
plt.show()

#### 📌 Inference
More customers did not purchase compared to those who did

Slight class imbalance (acceptable for a Decision Tree)
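
The size of the imbalance can be put into numbers rather than read off the plot. The sketch below uses a small made-up `target` series as a stand-in for the `Purchase Iphone` column; the values are hypothetical and only illustrate the call.

```python
import pandas as pd

# Hypothetical stand-in for data['Purchase Iphone'] (not the real data)
target = pd.Series([0, 0, 0, 1, 0, 1, 0, 1, 0, 0], name="Purchase Iphone")

# Share of each class as a fraction of all rows, ordered by class label
class_share = target.value_counts(normalize=True).sort_index()
print(class_share)
```

In the notebook itself the same call would presumably be `data['Purchase Iphone'].value_counts(normalize=True)`.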

In [None]:
### Gender vs Purchase
plt.figure(figsize=(8,4))
sns.countplot(x='Gender', hue='Purchase Iphone', data=data)
plt.title("Gender vs Purchase")  # set the title before saving so it appears in the file
plt.savefig("D:/TopMentor_DS_course/Decision Tree Project8/Graphs/Gender vs Purchase.png")
plt.show()

#### 📌 Inference
Both males and females purchase iPhones

Gender alone is not a strong deciding factor
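
One way to check this inference numerically is to compare the purchase rate per gender. This is a minimal sketch on a made-up `sample` frame (hypothetical rows, not the real records); because the target is 0/1, the group mean equals the share of buyers.

```python
import pandas as pd

# Hypothetical rows standing in for the real dataset
sample = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Female", "Male", "Male", "Female"],
    "Purchase Iphone": [0, 1, 0, 1, 1, 0, 1, 0],
})

# Mean of a 0/1 target per group = purchase rate for that group
rate_by_gender = sample.groupby("Gender")["Purchase Iphone"].mean()
print(rate_by_gender)
```

If the two rates are close on the real data, that supports treating Gender as a weak predictor.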

In [None]:
## Age Distribution 
plt.figure(figsize=(8,4))
sns.histplot(data['Age'], bins=20, kde=True)
plt.title("Age Distribution")
plt.savefig("D:/TopMentor_DS_course/Decision Tree Project8/Graphs/Age Distribution.png")
plt.show()


#### 📌 Inference

Most users are between 25 and 45 years old

Middle-aged users dominate the dataset

In [None]:
###  Age vs Purchase
plt.figure(figsize=(6,4))
sns.boxplot(x='Purchase Iphone', y='Age', data=data)
plt.title("Age vs Purchase")
plt.savefig("D:/TopMentor_DS_course/Decision Tree Project8/Graphs/Age vs Purchase.png")
plt.show()

#### 📌 Inference
Customers who purchase iPhones tend to be older

Age is an important feature

In [None]:
### Salary vs Purchase
plt.figure(figsize=(6,4))
sns.boxplot(x='Purchase Iphone', y='Salary', data=data)
plt.title("Salary vs Purchase")
plt.savefig("D:/TopMentor_DS_course/Decision Tree Project8/Graphs/Salary vs Purchase.png")
plt.show()

#### 📌 Inference

Higher salary customers are more likely to purchase

Salary strongly influences buying decision
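
The Age and Salary box plots can be backed by group-wise medians. The sketch below assumes a tiny made-up `sample` frame with the same column names as the dataset; the numbers are illustrative only.

```python
import pandas as pd

# Hypothetical rows standing in for the real dataset
sample = pd.DataFrame({
    "Age": [19, 35, 26, 45, 48, 52],
    "Salary": [19000, 20000, 43000, 90000, 120000, 110000],
    "Purchase Iphone": [0, 0, 0, 1, 1, 1],
})

# Median Age and Salary for non-buyers (0) and buyers (1)
medians = sample.groupby("Purchase Iphone")[["Age", "Salary"]].median()
print(medians)
```

On the real data, a clear gap between the two rows would match what the box plots suggest.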

## MODEL BUILDING

In [None]:
## Encoding Gender Column 

data['Gender'] = data['Gender'].map({'Male': 1, 'Female': 0})

print("Gender encoded successfully!")
data.head()

In [None]:
## Features and Target Selection 

X = data[['Gender', 'Age', 'Salary']]
y = data['Purchase Iphone']

print("Features and target selected!")

In [None]:
## Splitting the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)

print("Training set size:", X_train.shape)
print("Testing set size:", X_test.shape)
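
The split above is not stratified, so the share of buyers can differ slightly between the training and testing sets. A possible variant, shown here on synthetic data as an assumption rather than as the notebook's method, passes `stratify` to `train_test_split` to keep the class proportions roughly equal in both sets.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical values, not the real CSV)
rng = np.random.default_rng(42)
n_rows = 100
X_demo = pd.DataFrame({
    "Gender": rng.integers(0, 2, size=n_rows),
    "Age": rng.integers(18, 61, size=n_rows),
    "Salary": rng.integers(15_000, 150_001, size=n_rows),
})
# 36 buyers out of 100, roughly mirroring the 0.36 target mean from describe()
y_demo = pd.Series([1] * 36 + [0] * 64, name="Purchase Iphone")

# stratify=y_demo keeps the buyer share similar in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=42, stratify=y_demo
)

print("Buyer share in train:", round(y_tr.mean(), 2))
print("Buyer share in test: ", round(y_te.mean(), 2))
```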

### Decision Tree Classifier

In [None]:
## Building Decision Tree model 

model = DecisionTreeClassifier(criterion='gini', max_depth=5, random_state=42)
model.fit(X_train,y_train)
print("Decision Tree model trained successfully!")

In [None]:
model.feature_importances_

In [None]:
## Making predictions on test data

y_pred = model.predict(X_test)
print(y_pred)
print("Predictions completed!")

In [None]:
## Model Accuracy
accuracy = accuracy_score(y_test, y_pred)

print("\nModel Accuracy:", round(accuracy * 100, 2), "%")

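#### 📌 Inference

Accuracy for this kind of model on this dataset is typically around 85–90%; compare with the value printed above

Precision and recall are not computed in this notebook, so their balance should be checked separately (for example from the confusion matrix)

Performance on unseen data is estimated from the 20% test split only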
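
Accuracy alone does not show how precision and recall compare. The sketch below uses hand-written label lists (hypothetical, not the model's real predictions) to show how both can be computed with scikit-learn.

```python
from sklearn.metrics import classification_report, precision_score, recall_score

# Hypothetical true labels and predictions (not the notebook's real output)
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred_demo = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0]

# Precision: of the predicted buyers, how many really bought
precision = precision_score(y_true_demo, y_pred_demo)
# Recall: of the real buyers, how many were predicted as buyers
recall = recall_score(y_true_demo, y_pred_demo)

print("Precision:", precision)
print("Recall:", recall)
print(classification_report(y_true_demo, y_pred_demo, target_names=["No", "Yes"]))
```

In the notebook the same functions would presumably be called with `y_test` and `y_pred`.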

In [None]:
from sklearn.tree import plot_tree

plt.figure(figsize=(18,10))
plot_tree(
    model,
    feature_names= ['Gender', 'Age', 'Salary'],
    class_names=['No', 'Yes'],
    filled=True
)
plt.show()
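
A plotted tree can be hard to read at depth 5. As a possible complement, `export_text` from `sklearn.tree` prints the learned rules as plain text. The sketch below fits a small tree on made-up rows (hypothetical data in which only Age separates the classes) purely to show the call.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical rows; in this toy data only Age separates buyers from non-buyers
X_demo = pd.DataFrame({
    "Gender": [0, 1, 0, 1, 0, 1, 0, 1],
    "Age": [20, 25, 30, 35, 45, 50, 55, 60],
    "Salary": [30000, 90000, 40000, 80000, 35000, 85000, 45000, 95000],
})
y_demo = [0, 0, 0, 0, 1, 1, 1, 1]

demo_tree = DecisionTreeClassifier(max_depth=2, random_state=42)
demo_tree.fit(X_demo, y_demo)

# Text version of the split rules, one line per node
rules = export_text(demo_tree, feature_names=list(X_demo.columns))
print(rules)
```

For the notebook's model the call would presumably be `export_text(model, feature_names=['Gender', 'Age', 'Salary'])`.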


In [None]:
## Confusion Matrix

cm = confusion_matrix(y_test, y_pred)
print(cm)

plt.figure(figsize=(5,4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title("Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()

In [None]:
f_imp = pd.Series(model.feature_importances_, index=X.columns)
f_imp.plot(kind='barh')
plt.title("Feature Importance")
plt.show()

### Final Inference & Insights

#### ✅ Key Findings

Age is the strongest predictor

Higher salary → higher purchase probability

Gender has minimal impact

Decision Tree captures non-linear patterns effectively

#### 🚀 Business Impact

Marketing teams can target:

Users aged 30+

Users with higher income

Improves conversion rate

Reduces unnecessary ad spend