**Customer Purchase Prediction & Effect of Micro-Numerosity**

The Customer Purchase dataset is a small collection of customer records (50 rows) for predicting whether a customer makes a purchase and for examining the effect of micro-numerosity. It contains the following columns:

Customer ID: A unique identifier for each customer.

Age: The age of the customer.

Gender: The gender of the customer.

Education: The highest education level attained by the customer (School, UG, or PG).

Review: The customer's rating of their purchase experience (Poor, Average, or Good).

Purchased: Whether the customer made a purchase ("Yes" or "No"); this is the target variable.

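This dataset supports the analysis of purchasing patterns and illustrates micro-numerosity: the problem of having too few observations to estimate a model reliably. With only 50 customers, the model below is trained on 40 rows and evaluated on 10, so every performance figure rests on very little data.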
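To make the small-sample concern concrete: any proportion estimated from n observations (a purchase rate, or a model's accuracy on a test set) has a binomial standard error of sqrt(p(1 - p)/n), which shrinks only as 1/sqrt(n). A minimal sketch, assuming the worst case p = 0.5 and comparing a 10-row test set, the full 50-row dataset, and a hypothetical larger sample of 500:

```python
import math

def proportion_se(p: float, n: int) -> float:
    """Binomial standard error of a proportion p estimated from n observations."""
    return math.sqrt(p * (1 - p) / n)

# 10 = size of the test split used below, 50 = full dataset, 500 = hypothetical larger sample.
for n in (10, 50, 500):
    print(n, round(proportion_se(0.5, n), 3))
```

With 10 rows the standard error is about 0.16, i.e. roughly plus or minus 16 percentage points for a single standard error; even the full 50 rows only bring it down to about 0.07.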

In [2]:
import pandas as pd

In [3]:
purchase = pd.read_csv('Customer Purchase.csv')

In [4]:
purchase.head()

   Customer ID  Age  Gender  Education   Review  Purchased
0         1021   30  Female     School  Average         No
1         1022   68  Female         UG     Poor         No
2         1023   70  Female         PG     Good         No
3         1024   72  Female         PG     Good         No
4         1025   16  Female         UG  Average         No


In [5]:
purchase.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Customer ID  50 non-null     int64 
 1   Age          50 non-null     int64 
 2   Gender       50 non-null     object
 3   Education    50 non-null     object
 4   Review       50 non-null     object
 5   Purchased    50 non-null     object
dtypes: int64(2), object(4)
memory usage: 2.5+ KB


In [6]:
purchase.describe()

       Customer ID        Age
count    50.000000  50.000000
mean   1045.500000  54.160000
std      14.577380  25.658161
min    1021.000000  15.000000
25%    1033.250000  30.250000
50%    1045.500000  57.000000
75%    1057.750000  74.000000
max    1070.000000  98.000000


In [7]:
purchase.columns

Index(['Customer ID', 'Age', 'Gender', 'Education', 'Review', 'Purchased'], dtype='object')

In [8]:
# The target is Purchased; the features are every other column except the ID.
y = purchase['Purchased']
X = purchase.drop(['Purchased', 'Customer ID'], axis=1)

In [9]:
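# Encode the categorical variables as integers.
# Review and Education are ordinal, so their codes follow the natural order;
# Gender has two values and becomes a 0/1 indicator.
X['Review'] = X['Review'].map({'Poor': 0, 'Average': 1, 'Good': 2})
X['Education'] = X['Education'].map({'School': 0, 'UG': 1, 'PG': 2})
X['Gender'] = X['Gender'].map({'Male': 0, 'Female': 1})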
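The hand-written mappings in the cell above are an ordinal encoding. As an alternative (not what this notebook uses), scikit-learn's `OrdinalEncoder` can do the same with the category order stated explicitly, which also makes the encoding reusable on new data. A sketch using only the five rows shown by `purchase.head()`:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# The first five rows from purchase.head(), restricted to the categorical columns.
sample = pd.DataFrame({
    'Gender': ['Female', 'Female', 'Female', 'Female', 'Female'],
    'Education': ['School', 'UG', 'PG', 'PG', 'UG'],
    'Review': ['Average', 'Poor', 'Good', 'Good', 'Average'],
})

# Category lists follow the same order as the manual mappings, so the codes match:
# Male=0/Female=1, School<UG<PG, Poor<Average<Good.
encoder = OrdinalEncoder(categories=[
    ['Male', 'Female'],
    ['School', 'UG', 'PG'],
    ['Poor', 'Average', 'Good'],
])
encoded = encoder.fit_transform(sample)  # NumPy array of floats
print(encoded)
```

With the default `handle_unknown='error'`, the encoder raises an error on any category missing from its lists instead of passing it through silently.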

In [10]:
# display first 5 rows
X.head()

   Age  Gender  Education  Review
0   30       1          0       1
1   68       1          1       0
2   70       1          2       2
3   72       1          2       2
4   16       1          1       1


In [12]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, train_size=0.8, random_state=2529)

In [13]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((40, 4), (10, 4), (40,), (10,))
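With only 10 test rows, the Yes/No mix of the test set can shift noticeably from one random split to another. Passing `stratify=y` to `train_test_split` keeps the class proportions of the full data in both parts. The notebook does not show the overall Yes/No counts, so this sketch uses a hypothetical 50-row label vector with 15 "No" and 35 "Yes" and a placeholder feature column:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical labels: 15 'No' and 35 'Yes' (30% / 70%); not the real dataset's counts.
y_demo = pd.Series(['No'] * 15 + ['Yes'] * 35)
X_demo = pd.DataFrame({'row': range(50)})  # placeholder feature column

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, train_size=0.8, stratify=y_demo, random_state=2529
)

# Stratification reproduces the 30/70 class split in both the 40-row and 10-row parts.
print(y_tr.value_counts().to_dict())
print(y_te.value_counts().to_dict())
```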

In [14]:
from sklearn.ensemble import RandomForestClassifier
# No random_state is set, so the fitted forest, and the scores below, change between runs.
model = RandomForestClassifier()

In [15]:
model.fit(X_train,y_train)

In [16]:
y_pred = model.predict(X_test)

In [17]:
y_pred

array(['No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes'],
      dtype=object)

In [18]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

In [19]:
confusion_matrix(y_test,y_pred)

array([[2, 1],
       [3, 4]], dtype=int64)
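`confusion_matrix` orders its rows (true labels) and columns (predicted labels) by the sorted class names, here `['No', 'Yes']`. Every per-class figure in the classification report can be derived from these four counts; a short sketch using the matrix printed above:

```python
import numpy as np

# Rows = true class, columns = predicted class, in label order ['No', 'Yes'].
cm = np.array([[2, 1],
               [3, 4]])

correct = np.diag(cm)                  # correctly classified rows per class
precision = correct / cm.sum(axis=0)   # divide by how often each class was predicted
recall = correct / cm.sum(axis=1)      # divide by how often each class actually occurs
f1 = 2 * precision * recall / (precision + recall)
accuracy = correct.sum() / cm.sum()

for label, p, r, f in zip(['No', 'Yes'], precision, recall, f1):
    print(f'{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}')
print(f'accuracy={accuracy:.2f}')
```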

In [20]:
accuracy_score(y_test,y_pred)

0.6
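An accuracy of 0.6 means 6 of the 10 test rows were classified correctly. With so few rows the estimate is very uncertain. A Wilson score interval (a standard binomial confidence interval, swapped in here; the notebook itself reports only the point estimate) makes the width explicit:

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

low, high = wilson_interval(successes=6, n=10)
print(f'95% interval for accuracy: {low:.2f} to {high:.2f}')
```

The interval runs from roughly 0.31 to 0.83. For comparison, the test set contains 7 "Yes" and 3 "No" rows, so always predicting "Yes" would already score 0.7 on it.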

In [21]:
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

          No       0.40      0.67      0.50         3
         Yes       0.80      0.57      0.67         7

    accuracy                           0.60        10
   macro avg       0.60      0.62      0.58        10
weighted avg       0.68      0.60      0.62        10
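Because a 10-row test set moves every score above in steps of a whole tenth, a steadier estimate on a dataset this small is repeated stratified k-fold cross-validation, in which all 50 rows take turns in the test fold. The notebook does not do this; the sketch below runs it on a synthetic 50-row, 4-feature dataset from `make_classification` as a stand-in for `X` and `y`, so its scores say nothing about the real data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic stand-in with the same shape as X (50 rows, 4 features); substitute X, y in practice.
X_demo, y_demo = make_classification(n_samples=50, n_features=4, n_informative=3,
                                     n_redundant=0, random_state=0)

# 5 folds x 5 repeats = 25 train/test splits, each with a stratified 10-row test fold.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0)
rf = RandomForestClassifier(random_state=0)
scores = cross_val_score(rf, X_demo, y_demo, cv=cv, scoring='accuracy')

print(f'accuracy: mean={scores.mean():.2f}, std={scores.std():.2f}, '
      f'min={scores.min():.2f}, max={scores.max():.2f}')
```

Reporting the spread (standard deviation, minimum, maximum) alongside the mean shows directly how far a single 10-row split like the one above can land from the typical result.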

