# Wholesale Customers Data - XGBoost Classifier Report

## 1. Introduction

The goal of this project is to classify wholesale customers by sales channel (the `Channel` column) using an XGBoost classifier. This report covers dataset analysis, preprocessing, model building, evaluation, and conclusions.

## 2. Libraries Used

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix


## 3. Dataset Overview

The dataset is loaded using Pandas:

In [2]:
data = pd.read_csv(r"C:\Users\devad\Downloads\Wholesale customers data.csv")

### Columns in the Dataset

- **Channel**: The target variable, coded 1 (Horeca: hotels, restaurants, cafés) or 2 (Retail).
- **Region**: Geographical region of the customer, coded 1 to 3.
- **Fresh, Milk, Grocery, Frozen, Detergents_Paper, Delicassen**: Annual spending on each product category.

### Relationships

- Some features are highly correlated, notably **Grocery** and **Detergents_Paper**: customers who spend more on one tend to spend more on the other, so the two columns carry overlapping information.
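A correlation claim like this can be checked with `DataFrame.corr()`. The sketch below is a minimal illustration: the helper name `strongest_pairs`, the 0.8 threshold, and the five-row `toy` frame are assumptions made for the example, and the toy numbers are invented rather than taken from the dataset.

```python
import pandas as pd


def strongest_pairs(df, threshold=0.8):
    """Return (col_a, col_b, r) for column pairs with |Pearson r| >= threshold."""
    corr = df.corr()
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:  # visit each unordered pair once
            r = float(corr.loc[a, b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    # Strongest relationships first
    return sorted(pairs, key=lambda p: -abs(p[2]))


# Invented values for illustration only, NOT rows from the real file
toy = pd.DataFrame({
    "Fresh": [5000, 1000, 4000, 2000, 3000],
    "Grocery": [1000, 2000, 3000, 4000, 5000],
    "Detergents_Paper": [400, 900, 1300, 1900, 2300],
})

pairs = strongest_pairs(toy)
print(pairs)
```

On the real data, the same call would be `strongest_pairs(data.drop(columns=["Channel"]))`.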


## 4. Data Analysis

### Basic Info and Statistics

In [3]:
data.info()
data.describe()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 440 entries, 0 to 439
Data columns (total 8 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   Channel           440 non-null    int64
 1   Region            440 non-null    int64
 2   Fresh             440 non-null    int64
 3   Milk              440 non-null    int64
 4   Grocery           440 non-null    int64
 5   Frozen            440 non-null    int64
 6   Detergents_Paper  440 non-null    int64
 7   Delicassen        440 non-null    int64
dtypes: int64(8)
memory usage: 27.6 KB


| Statistic | Channel | Region | Fresh | Milk | Grocery | Frozen | Detergents_Paper | Delicassen |
|---|---|---|---|---|---|---|---|---|
| count | 440.0 | 440.0 | 440.0 | 440.0 | 440.0 | 440.0 | 440.0 | 440.0 |
| mean | 1.322727 | 2.543182 | 12000.297727 | 5796.265909 | 7951.277273 | 3071.931818 | 2881.493182 | 1524.870455 |
| std | 0.468052 | 0.774272 | 12647.328865 | 7380.377175 | 9503.162829 | 4854.673333 | 4767.854448 | 2820.105937 |
| min | 1.0 | 1.0 | 3.0 | 55.0 | 3.0 | 25.0 | 3.0 | 3.0 |
| 25% | 1.0 | 2.0 | 3127.75 | 1533.0 | 2153.0 | 742.25 | 256.75 | 408.25 |
| 50% | 1.0 | 3.0 | 8504.0 | 3627.0 | 4755.5 | 1526.0 | 816.5 | 965.5 |
| 75% | 2.0 | 3.0 | 16933.75 | 7190.25 | 10655.75 | 3554.25 | 3922.0 | 1820.25 |
| max | 2.0 | 3.0 | 112151.0 | 73498.0 | 92780.0 | 60869.0 | 40827.0 | 47943.0 |


- **No null values** are found in the dataset.
- The dataset contains **440 rows and 8 columns**.
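The summary statistics also reveal the class balance. Because `Channel` takes only the values 1 and 2, its mean minus 1 is the share of rows in channel 2. The short calculation below works this out from the `describe()` output above; the variable names are illustrative.

```python
n_rows = 440
channel_mean = 1.322727  # "mean" row of the describe() output

# For a variable coded 1/2, mean - 1 is the fraction of rows equal to 2
share_channel_2 = channel_mean - 1
n_channel_2 = round(share_channel_2 * n_rows)
n_channel_1 = n_rows - n_channel_2

# Accuracy of always predicting the larger class, a useful reference point
majority_baseline = max(n_channel_1, n_channel_2) / n_rows

print(n_channel_1, n_channel_2, round(majority_baseline * 100, 2))
```

The classes are therefore imbalanced at roughly two to one, and any reported accuracy is best read against the majority-class baseline of about 68%.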

### Checking for Null Values

In [4]:
data.isnull().sum()


Channel             0
Region              0
Fresh               0
Milk                0
Grocery             0
Frozen              0
Detergents_Paper    0
Delicassen          0
dtype: int64

## 5. Data Preprocessing


### Feature and Target Selection

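In [5]:
# Features: every column except the target
x = data.drop(['Channel'], axis=1)
# Target: recode Channel from 1/2 to 0/1, since recent XGBoost releases
# require class labels to start at 0
y = data['Channel'] - 1


- **x (features)**: All columns except `Channel`.

- **y (target variable)**: `Channel`, recoded so that 0 is the original channel 1 and 1 is the original channel 2.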

### Splitting the Data

In [6]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

- 70% training data (308 rows), 30% testing data (132 rows); `random_state=0` makes the split reproducible.
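The split sizes can be confirmed on placeholder data of the same length. The sketch below uses dummy arrays (`X_demo`, `y_demo`) rather than the real file; the 298/142 class counts follow from the `Channel` mean of 1.3227 over 440 rows. It also shows the `stratify` option of `train_test_split`, which the report does not use but which keeps the class ratio the same in both subsets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: only the row count and class balance mirror the dataset
y_demo = np.array([1] * 298 + [2] * 142)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # dummy single feature

# Same call as in the report: 30% of rows held out for testing
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)
print(len(X_tr), len(X_te))

# Optional variant (an assumption, not the report's method): stratified split
X_tr_s, X_te_s, y_tr_s, y_te_s = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)
test_share_2 = float((y_te_s == 2).mean())
print(round(test_share_2, 3))
```

Without `stratify`, the share of each channel in the test set depends on the random draw.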

## 6. Model Building

### Initializing and Training the XGBoost Classifier

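In [None]:
from xgboost import XGBClassifier

params = {
    'objective': 'binary:logistic',  # binary classification with probability output
    'max_depth': 4,                  # maximum depth of each tree
    'alpha': 10,                     # L1 regularization on leaf weights (alias of reg_alpha)
    'learning_rate': 0.1,            # shrinkage applied to each boosting step
    'n_estimators': 100              # number of boosting rounds (trees)
}

xgb_clf = XGBClassifier(**params)
xgb_clf.fit(x_train, y_train)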


### Predictions

In [None]:
y_pred = xgb_clf.predict(x_test)

## 7. Model Evaluation

### Accuracy Score

In [None]:
accuracy = accuracy_score(y_test, y_pred) * 100
print(f"Accuracy: {accuracy:.2f}%")


Accuracy: 87.88%
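
On a test set of 132 rows, 87.88% corresponds to 116 correct predictions and 16 errors. Accuracy alone does not show which channel those errors fall in. The notebook already imports `confusion_matrix` and `classification_report`, which give that breakdown. The sketch below runs them on placeholder labels, since the report's actual predictions are not reproduced here; the `_demo` names and values are illustrative only.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Placeholder labels for illustration only, NOT the report's predictions
y_true_demo = [1, 1, 1, 1, 2, 2, 2, 1]
y_pred_demo = [1, 1, 2, 1, 2, 2, 1, 1]

labels = sorted(set(y_true_demo))

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=labels)
print(cm)

# Per-class precision, recall, and F1
print(classification_report(y_true_demo, y_pred_demo, labels=labels))

acc_demo = accuracy_score(y_true_demo, y_pred_demo)
print(f"Accuracy: {acc_demo * 100:.2f}%")
```

In the notebook, the same calls would take `y_test` and `y_pred` in place of the placeholder lists.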