# Title-Customer Purchase Prediction & Effect of Micro-Numericity


# Objective-
The objective of this project is to build a Random Forest Classifier that predicts whether a customer will make a purchase from the attributes available in the dataset: age, gender, education level, and review rating. Such a model could help businesses identify potential customers and tune their marketing strategies by showing which factors influence purchasing behavior. The target outcome is a model accurate enough to support data-driven decision-making.

In [None]:
#step 1 : import library
import pandas as pd

In [2]:
#step 2 : import data
purchase = pd.read_csv('https://github.com/YBIFoundation/Dataset/raw/main/Customer%20Purchase.csv')

In [3]:
purchase.head()

Unnamed: 0,Customer ID,Age,Gender,Education,Review,Purchased
0,1021,30,Female,School,Average,No
1,1022,68,Female,UG,Poor,No
2,1023,70,Female,PG,Good,No
3,1024,72,Female,PG,Good,No
4,1025,16,Female,UG,Average,No


In [4]:
purchase.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Customer ID  50 non-null     int64 
 1   Age          50 non-null     int64 
 2   Gender       50 non-null     object
 3   Education    50 non-null     object
 4   Review       50 non-null     object
 5   Purchased    50 non-null     object
dtypes: int64(2), object(4)
memory usage: 2.5+ KB


In [6]:
purchase.describe()

Unnamed: 0,Customer ID,Age
count,50.0,50.0
mean,1045.5,54.16
std,14.57738,25.658161
min,1021.0,15.0
25%,1033.25,30.25
50%,1045.5,57.0
75%,1057.75,74.0
max,1070.0,98.0


In [7]:
#step 3 :define target (y) and features (x)

In [8]:
purchase.columns

Index(['Customer ID', 'Age', 'Gender', 'Education', 'Review', 'Purchased'], dtype='object')

In [9]:
y = purchase["Purchased"]

In [11]:
x = purchase.drop(['Purchased','Customer ID'],axis = 1)

In [13]:
#encoding categorical variables as ordinal integers (replace returns a new DataFrame)
x = x.replace({
    'Review': {'Poor': 0, 'Average': 1, 'Good': 2},
    'Education': {'School': 0, 'UG': 1, 'PG': 2},
    'Gender': {'Male': 0, 'Female': 1},
})

In [14]:
#display first 5 rows
x.head()

Unnamed: 0,Age,Gender,Education,Review
0,30,1,0,1
1,68,1,1,0
2,70,1,2,2
3,72,1,2,2
4,16,1,1,1
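The encoding step silently leaves any category that is missing from the mapping untouched, which would later break model fitting. The sketch below shows one possible way to apply the same mappings and fail loudly on unknown categories. It uses only the five rows printed by `purchase.head()`, typed in by hand so it runs without the CSV; the helper name `encode` and the variables `sample` and `mappings` are hypothetical and not part of the notebook.

```python
import pandas as pd

# The five feature rows shown by purchase.head(), entered by hand so the
# sketch runs without downloading the CSV.
sample = pd.DataFrame({
    "Age": [30, 68, 70, 72, 16],
    "Gender": ["Female"] * 5,
    "Education": ["School", "UG", "PG", "PG", "UG"],
    "Review": ["Average", "Poor", "Good", "Good", "Average"],
})

# Same category-to-integer mappings as the notebook's encoding cell.
mappings = {
    "Review": {"Poor": 0, "Average": 1, "Good": 2},
    "Education": {"School": 0, "UG": 1, "PG": 2},
    "Gender": {"Male": 0, "Female": 1},
}


def encode(frame, mappings):
    """Return an encoded copy of frame; raise if a category has no mapping."""
    encoded = frame.copy()
    for column, mapping in mappings.items():
        # Series.map turns every value that is absent from the dict into NaN.
        encoded[column] = frame[column].map(mapping)
        if encoded[column].isna().any():
            unknown = sorted(set(frame[column].dropna()) - set(mapping))
            raise ValueError(f"Unmapped values in {column}: {unknown}")
        encoded[column] = encoded[column].astype(int)
    return encoded


encoded_sample = encode(sample, mappings)
print(encoded_sample)
```

The printed frame should match the encoded `x.head()` output above, and a misspelled category such as `'good'` would raise a `ValueError` instead of passing through unnoticed.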


In [15]:
#step 4 : Train test split
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test = train_test_split(x,y,train_size=0.8,random_state=2529)

In [16]:
#check shape of train and test sample
x_train.shape,x_test.shape,y_train.shape,y_test.shape

((40, 4), (10, 4), (40,), (10,))
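With only 10 test rows, a purely random split can leave the test set with a class mix that differs from the training set. One option, not used in this notebook, is the `stratify` argument of `train_test_split`, which keeps the class proportions equal in both parts. The sketch below uses hypothetical toy data (`x_toy`, `y_toy`) with an assumed 30/70 No/Yes mix, because the class balance of the full dataset is not shown here.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 50 rows with an assumed 30/70 No/Yes mix.
# These are NOT the real labels from the Customer Purchase dataset.
y_toy = pd.Series(["No"] * 15 + ["Yes"] * 35, name="Purchased")
x_toy = pd.DataFrame({"Age": range(20, 70)})

# stratify=y_toy asks for the same class proportions in train and test.
x_tr, x_te, y_tr, y_te = train_test_split(
    x_toy, y_toy, train_size=0.8, random_state=2529, stratify=y_toy
)

train_counts = y_tr.value_counts().to_dict()
test_counts = y_te.value_counts().to_dict()
print(train_counts)
print(test_counts)
```

In the notebook itself this would amount to adding `stratify=y` to the existing call; whether it changes the result depends on the actual class balance.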

In [18]:
#step 5 : select the model
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()

In [19]:
#step 6: train or fit model
model.fit(x_train,y_train)

In [20]:
#step 7 : predict with the model
y_pred = model.predict(x_test)

In [21]:
y_pred

array(['No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes'],
      dtype=object)

In [22]:
#step 8 : evaluate the model
from sklearn.metrics import confusion_matrix, accuracy_score,classification_report

In [23]:
confusion_matrix(y_test,y_pred)

array([[2, 1],
       [3, 4]], dtype=int64)

In [24]:
accuracy_score(y_test,y_pred)

0.6
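An accuracy figure is easier to judge next to a trivial baseline. As a rough sanity check, the sketch below computes the accuracy of always predicting the most common test label. The label counts come from the row sums of the confusion matrix above (3 actual 'No', 7 actual 'Yes'); the variable names are illustrative only, and a stricter baseline would take the majority class from the training labels instead.

```python
from collections import Counter

# Test-set labels reconstructed from the confusion matrix row sums:
# 2 + 1 = 3 actual 'No' and 3 + 4 = 7 actual 'Yes'. Row order is unknown
# and does not matter for this calculation.
y_test_labels = ["No"] * 3 + ["Yes"] * 7

# Accuracy of a "model" that always predicts the most frequent label.
majority_label, majority_count = Counter(y_test_labels).most_common(1)[0]
baseline_accuracy = majority_count / len(y_test_labels)

model_accuracy = 0.6  # value returned by accuracy_score above

print(f"always predict {majority_label!r}: {baseline_accuracy:.2f}")
print(f"random forest:           {model_accuracy:.2f}")
```

On this particular split the always-'Yes' baseline scores 0.70, which is above the model's 0.60.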

In [25]:
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

          No       0.40      0.67      0.50         3
         Yes       0.80      0.57      0.67         7

    accuracy                           0.60        10
   macro avg       0.60      0.62      0.58        10
weighted avg       0.68      0.60      0.62        10
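The numbers in the classification report follow directly from the confusion matrix. The sketch below recomputes them by hand, assuming scikit-learn's convention that rows are true labels and columns are predicted labels, both in sorted order ('No', 'Yes'). The function name `per_class_metrics` is hypothetical.

```python
# Confusion matrix from the notebook: rows = true labels, columns = predicted
# labels, both ordered ['No', 'Yes'].
cm = [[2, 1],
      [3, 4]]
labels = ["No", "Yes"]


def per_class_metrics(cm, labels):
    """Return (accuracy, {label: {precision, recall, f1, support}})."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(labels)))
    report = {}
    for i, label in enumerate(labels):
        predicted = sum(row[i] for row in cm)  # column sum: predicted as label
        actual = sum(cm[i])                    # row sum: truly label (support)
        precision = cm[i][i] / predicted if predicted else 0.0
        recall = cm[i][i] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": actual,
        }
    return correct / total, report


accuracy, report = per_class_metrics(cm, labels)
print(f"accuracy: {accuracy:.2f}")
for label, m in report.items():
    print(f"{label:>3}: precision={m['precision']:.2f} "
          f"recall={m['recall']:.2f} f1={m['f1']:.2f} support={m['support']}")
```

For example, precision for 'No' is 2 / (2 + 3) = 0.40, because 5 rows were predicted 'No' and only 2 of them were correct.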



# Conclusion-
Data Preparation:

The dataset contains 50 entries with the columns Customer ID, Age, Gender, Education, Review, and Purchased, and has no missing values.
Target variable: 'Purchased'.
Features: 'Age', 'Gender', 'Education', and 'Review' ('Customer ID' was dropped).
The categorical variables 'Gender', 'Education', and 'Review' were encoded as integers.
Model Training:

The data was split into 80% training (40 rows) and 20% testing (10 rows) sets.
A Random Forest Classifier with default settings was trained on the training set.
Model Performance:

Confusion Matrix: Of the 3 actual 'No' cases, 2 were predicted correctly and 1 was predicted as 'Yes'; of the 7 actual 'Yes' cases, 4 were predicted correctly and 3 were predicted as 'No'.
Accuracy Score: 60% (6 of 10 test rows).
Classification Report: Precision, recall, and F1-score were 0.40, 0.67, and 0.50 for 'No', and 0.80, 0.57, and 0.67 for 'Yes'.
Prediction:

The model predicted customer purchase behavior with only moderate accuracy (60%) on a test set of just 10 rows, so the estimate is highly uncertain.
In summary, the Random Forest Classifier reached 60% accuracy in predicting whether customers would make a purchase. With 50 rows in total, this result is too weak and too noisy to guide marketing decisions; more data and further evaluation are needed before the model can be used to target potential customers.