# **Title of Project**

**Customer Purchase Prediction & Effect of Micro-Numerosity**

# **Objective**

The goal of this project is to build a model that predicts whether a customer has purchased the product, based on the customer's age, gender, education, and product review.

# **Data Source**

**YBI Foundation GitHub Page**

# **Import Library**

In [1]:
import pandas as pd

# **Import Data**

In [2]:
purchase = pd.read_csv('https://github.com/YBIFoundation/Dataset/raw/main/Customer%20Purchase.csv')

# **Describe Data**

In [3]:
purchase.head()

Unnamed: 0,Customer ID,Age,Gender,Education,Review,Purchased
0,1021,30,Female,School,Average,No
1,1022,68,Female,UG,Poor,No
2,1023,70,Female,PG,Good,No
3,1024,72,Female,PG,Good,No
4,1025,16,Female,UG,Average,No


# **Data Preprocessing**

In [4]:
purchase.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Customer ID  50 non-null     int64 
 1   Age          50 non-null     int64 
 2   Gender       50 non-null     object
 3   Education    50 non-null     object
 4   Review       50 non-null     object
 5   Purchased    50 non-null     object
dtypes: int64(2), object(4)
memory usage: 2.5+ KB


In [5]:
purchase.describe()

Unnamed: 0,Customer ID,Age
count,50.0,50.0
mean,1045.5,54.16
std,14.57738,25.658161
min,1021.0,15.0
25%,1033.25,30.25
50%,1045.5,57.0
75%,1057.75,74.0
max,1070.0,98.0


# **Define Target Variable (Y) and Feature Variables (X)**

In [6]:
purchase.columns

Index(['Customer ID', 'Age', 'Gender', 'Education', 'Review', 'Purchased'], dtype='object')

In [7]:
Y = purchase['Purchased']

In [8]:
Y.shape

(50,)

In [9]:
X = purchase.drop(['Purchased', 'Customer ID'], axis=1)

In [10]:
X.shape

(50, 4)

In [11]:
X.head()

Unnamed: 0,Age,Gender,Education,Review
0,30,Female,School,Average
1,68,Female,UG,Poor
2,70,Female,PG,Good
3,72,Female,PG,Good
4,16,Female,UG,Average


# **Encoding Categorical Variable**

In [12]:
X.replace({'Review':{'Poor':0,'Average':1,'Good':2}},inplace=True)

In [13]:
X.replace({'Education':{'School':0,'UG':1,'PG':2}},inplace=True)

In [14]:
X.replace({'Gender':{'Male': 0,'Female':1}},inplace=True)

In [15]:
X.head()

Unnamed: 0,Age,Gender,Education,Review
0,30,1,0,1
1,68,1,1,0
2,70,1,2,2
3,72,1,2,2
4,16,1,1,1


# **Train Test Split Data**

In [16]:
from sklearn.model_selection import train_test_split

In [17]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=2529)

In [18]:
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape

((40, 4), (10, 4), (40,), (10,))

# **Model Selection**

In [19]:
from sklearn.ensemble import RandomForestClassifier

In [20]:
model = RandomForestClassifier()

# **Model Training**

In [21]:
model.fit(X_train,Y_train)

# **Predict Test Data**

In [22]:
Y_pred = model.predict(X_test)

In [23]:
Y_pred

array(['No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes'],
      dtype=object)

# **Model Accuracy**

In [24]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

In [25]:
confusion_matrix(Y_test,Y_pred)

array([[2, 1],
       [3, 4]])

In [26]:
accuracy_score(Y_test,Y_pred)

0.6

In [27]:
print(classification_report(Y_test,Y_pred))

              precision    recall  f1-score   support

          No       0.40      0.67      0.50         3
         Yes       0.80      0.57      0.67         7

    accuracy                           0.60        10
   macro avg       0.60      0.62      0.58        10
weighted avg       0.68      0.60      0.62        10
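
The per-class figures in the report can be checked by hand against the confusion matrix shown earlier. The sketch below is a plain-Python recomputation, assuming scikit-learn's convention that rows are actual classes and columns are predicted classes, in sorted label order (`No`, then `Yes`). The helper name `per_class_metrics` is made up for this illustration.

```python
# Confusion matrix copied from the notebook output.
# Rows: actual class, columns: predicted class, order: ['No', 'Yes'].
cm = [[2, 1],
      [3, 4]]


def per_class_metrics(cm, k):
    """Return (precision, recall, f1) for the class at index k."""
    true_positive = cm[k][k]
    predicted_as_k = sum(row[k] for row in cm)  # column total
    actually_k = sum(cm[k])                     # row total
    precision = true_positive / predicted_as_k
    recall = true_positive / actually_k
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


total = sum(sum(row) for row in cm)
accuracy = (cm[0][0] + cm[1][1]) / total
no_metrics = per_class_metrics(cm, 0)
yes_metrics = per_class_metrics(cm, 1)

print("accuracy:", round(accuracy, 2))
print("No :", [round(m, 2) for m in no_metrics])
print("Yes:", [round(m, 2) for m in yes_metrics])
```

For example, precision for `No` is 2 / (2 + 3) = 0.40: of the five customers predicted `No`, only two actually were.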



# **Explanation**

**A "Purchase Prediction" Machine Learning (ML) project involves developing a model that predicts, from a set of input features, whether a customer will buy the product.**

Here's an explanation of the key steps and components involved in such a project:

**Data Collection and Preprocessing:**

Gather a dataset that contains information about customers' purchases. The dataset should include features such as the customer's age, gender, education, and product review. Clean and preprocess the data to handle missing values and outliers, and encode categorical variables as numbers.
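
The Customer Purchase data used in this notebook has no missing values, so the cleaning step is not exercised above. As a rough sketch of what it could look like, the example below uses a few hypothetical rows (with gaps added on purpose) and fills numeric gaps with the median and categorical gaps with the most frequent value. Those fill strategies are assumptions chosen for illustration, not the notebook's method. The ordinal encodings are the same ones the notebook applies.

```python
import pandas as pd

# Hypothetical rows shaped like the Customer Purchase data,
# with one missing Age and one missing Review added for illustration.
raw = pd.DataFrame({
    'Age': [30, None, 70, 16],
    'Gender': ['Female', 'Male', 'Female', 'Male'],
    'Education': ['School', 'UG', 'PG', 'UG'],
    'Review': ['Average', 'Poor', None, 'Average'],
})

clean = raw.copy()

# Numeric gap: fill with the column median (an assumed strategy).
clean['Age'] = clean['Age'].fillna(clean['Age'].median()).astype(int)

# Categorical gap: fill with the most frequent value (an assumed strategy).
clean['Review'] = clean['Review'].fillna(clean['Review'].mode()[0])

# Ordinal encodings, matching the mappings used in the notebook.
clean['Review'] = clean['Review'].map({'Poor': 0, 'Average': 1, 'Good': 2})
clean['Education'] = clean['Education'].map({'School': 0, 'UG': 1, 'PG': 2})
clean['Gender'] = clean['Gender'].map({'Male': 0, 'Female': 1})

print(clean)
```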

**Feature Selection/Engineering:**

Identify the features that contribute most to predicting a purchase. You may need to transform or engineer features to make them more suitable for modeling.
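
One way to get a first impression of feature relevance is the `feature_importances_` attribute of a fitted random forest. The sketch below runs on synthetic data, not the real dataset: the frame `X_demo` and the rule that generates `y_demo` (purchase depends only on `Review`) are assumptions made so the expected ranking is known in advance.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic, already-encoded features (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n = 200
X_demo = pd.DataFrame({
    'Age': rng.integers(15, 99, size=n),
    'Gender': rng.integers(0, 2, size=n),
    'Education': rng.integers(0, 3, size=n),
    'Review': rng.integers(0, 3, size=n),
})

# Assumed rule: a customer buys unless the review is 'Poor' (encoded 0).
y_demo = np.where(X_demo['Review'] >= 1, 'Yes', 'No')

model_demo = RandomForestClassifier(n_estimators=100, random_state=0)
model_demo.fit(X_demo, y_demo)

# Impurity-based importances; they are normalized to sum to 1.
importances = pd.Series(
    model_demo.feature_importances_, index=X_demo.columns
).sort_values(ascending=False)
print(importances)
```

Impurity-based importances can favor features with many distinct values (such as `Age`), so they are best treated as a hint rather than a verdict, especially on a dataset as small as the one in this notebook.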

**Data Splitting:**

Split the dataset into training and testing subsets. The training subset will be used to train the model, while the testing subset will be used to evaluate its performance on unseen data.
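
With only 50 rows, a purely random split can leave the test set with a class balance quite different from the full data. Passing `stratify` to `train_test_split` keeps the class proportions the same in both subsets. The sketch below uses hypothetical labels (30 `Yes`, 20 `No`), because the real class balance is not shown in the notebook.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical data: 50 rows, like the notebook, with an assumed 30/20 class balance.
X_demo = [[i] for i in range(50)]
y_demo = ['Yes'] * 30 + ['No'] * 20

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo,
    train_size=0.8,
    random_state=2529,
    stratify=y_demo,  # keep the Yes/No ratio equal in train and test
)

print("train:", Counter(y_tr))
print("test :", Counter(y_te))
```

Both subsets keep the 60/40 ratio of the full data: 24 `Yes` and 16 `No` for training, 6 `Yes` and 4 `No` for testing.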

**Model Selection:**

Choose an appropriate algorithm for the purchase prediction task. A Random Forest classifier, the model used in this notebook, is a common choice for classification problems on tabular data.

**Model Training:**

Train the selected model using the training data. During training, the model learns the relationship between the input features and the target (Purchased) by adjusting its internal parameters.

**Model Evaluation:**

Use the testing data to evaluate the model's performance. Common evaluation metrics for classification tasks include the confusion matrix, accuracy score, classification report, and Area Under the Curve (AUC) score. These metrics show how well the model's predictions match the actual purchase outcomes.
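
AUC is the one metric in this list that the notebook does not compute. A minimal sketch of how it could be obtained with `roc_auc_score` follows. It runs on synthetic data (the arrays `X_demo` and `y_demo` are assumptions), and it scores the predicted probability of the `Yes` class rather than the hard `Yes`/`No` labels.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data (hypothetical): the label depends on the first feature plus noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = np.where(X_demo[:, 0] + 0.5 * rng.normal(size=200) > 0, 'Yes', 'No')

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, train_size=0.8, random_state=0, stratify=y_demo
)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# predict_proba columns follow clf.classes_, so look up the 'Yes' column explicitly.
yes_col = list(clf.classes_).index('Yes')
yes_scores = clf.predict_proba(X_te)[:, yes_col]

# Treat 'Yes' as the positive class.
auc = roc_auc_score(y_te == 'Yes', yes_scores)
print(round(auc, 3))
```

In the notebook itself, the equivalent call would use `model.predict_proba(X_test)` and `Y_test == 'Yes'`. With only 10 test rows, the resulting value would be a coarse estimate.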

**Prediction and Deployment (Optional):**

Once you're satisfied with the model's performance, you can deploy it for making real-world predictions. Users could input Customer details, and the model would predict whether the Customer is likely to buy the Product or not.
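
New customer details have to be encoded the same way as the training data before they reach the model. The sketch below shows one possible shape for that step. The helper `encode_customer`, the stand-in training rows, and `demo_model` are all hypothetical. Only the three encoding dictionaries come from the notebook.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Same ordinal encodings the notebook applies to X.
ENCODINGS = {
    'Gender': {'Male': 0, 'Female': 1},
    'Education': {'School': 0, 'UG': 1, 'PG': 2},
    'Review': {'Poor': 0, 'Average': 1, 'Good': 2},
}
FEATURES = ['Age', 'Gender', 'Education', 'Review']


def encode_customer(customer):
    """Turn one raw customer record (a dict) into a one-row feature frame."""
    row = {'Age': customer['Age']}
    for column, mapping in ENCODINGS.items():
        value = customer[column]
        if value not in mapping:
            raise ValueError(f"Unknown {column} value: {value!r}")
        row[column] = mapping[value]
    return pd.DataFrame([row], columns=FEATURES)


# Stand-in training data (hypothetical, already encoded) so the sketch runs on its own.
train_X = pd.DataFrame(
    [[22, 0, 0, 0], [35, 1, 1, 2], [48, 0, 2, 2], [61, 1, 1, 0],
     [29, 1, 2, 1], [73, 0, 0, 1], [54, 1, 2, 2], [19, 0, 1, 0]],
    columns=FEATURES,
)
train_y = ['No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'No']
demo_model = RandomForestClassifier(random_state=0).fit(train_X, train_y)

new_customer = {'Age': 45, 'Gender': 'Female', 'Education': 'PG', 'Review': 'Good'}
prediction = demo_model.predict(encode_customer(new_customer))[0]
print(prediction)
```

Rejecting unknown categories with an error, instead of silently passing them through, avoids feeding the model values it never saw during training.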

**Iterative Refinement:**

If your model's performance is not satisfactory, you can revisit earlier steps to improve it. This might involve collecting more data, experimenting with different features, or trying different algorithms.
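
When comparing alternatives on a small dataset, a single 10-row test set gives a noisy signal. Cross-validation is one option for getting a steadier estimate, since every row is used for testing exactly once. The sketch below is an illustration on synthetic data (`X_demo` and `y_demo` are assumptions), using stratified 5-fold cross-validation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in with 50 rows, the same size as the notebook's dataset.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 4))
y_demo = np.where(X_demo[:, 0] > 0, 'Yes', 'No')

# Shuffled, stratified folds with a fixed seed so the result is repeatable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(
    RandomForestClassifier(random_state=0),
    X_demo, y_demo,
    cv=cv,
    scoring='accuracy',
)

print("fold accuracies:", scores)
print("mean:", scores.mean(), "std:", scores.std())
```

The spread across folds is as informative as the mean: a large standard deviation suggests that differences between candidate models may be within the noise.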

**Communication:**

Finally, communicate your results and findings. This could involve creating visualizations that show how well the model's predictions align with the actual purchase outcomes. Explain the model's strengths, weaknesses, and potential applications.
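
One weakness worth reporting here is the size of the test set. The notebook's accuracy of 0.6 comes from 6 correct predictions out of 10, and an estimate from so few rows carries wide uncertainty. The sketch below quantifies that with a 95% Wilson score interval. The choice of the Wilson interval and the helper name `wilson_interval` are assumptions of this illustration, not part of the original analysis.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# 6 correct predictions out of 10 test rows, as in the notebook (accuracy 0.6).
low, high = wilson_interval(6, 10)
print(f"95% interval for accuracy: {low:.2f} to {high:.2f}")  # roughly 0.31 to 0.83
```

An interval this wide means the test result alone cannot distinguish a useful model from one that does no better than guessing.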

**Overall, a Purchase Prediction ML project demonstrates how machine learning can be applied to a real-world retail problem: predicting whether a customer is likely to buy the product.**