<a href="https://colab.research.google.com/github/bobyrajtamuli/Customer-Analytics/blob/main/Session_2_Hands_on_Lab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Session 2: Hands-On Lab: AI-Powered Customer Data Collection and Management**

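This hands-on lab walks through generating synthetic customer data, standardizing it, segmenting customers with K-Means, scoring feedback sentiment, and fine-tuning DistilBERT to predict segments, using **Hugging Face Transformers**, scikit-learn, and PyTorch.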

## **Step 1: Setting Up the Environment**

### **Install Necessary Libraries**

In [1]:
!pip install pandas numpy scikit-learn tensorflow transformers torch seaborn matplotlib faker


### **Import Required Libraries**

In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from faker import Faker
from transformers import pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
import torch

## **Step 2: Generate and Explore Synthetic Customer Data**

In [None]:
fake = Faker()
sentiment_pipeline = pipeline("sentiment-analysis", model='distilbert-base-uncased-finetuned-sst-2-english')
def generate_synthetic_data(n=500):
    data = []
    for _ in range(n):
        age = np.random.randint(18, 70)
        gender = np.random.choice(['Male', 'Female'])
        income = np.random.randint(20000, 120000)
        spending_score = np.random.randint(1, 101)
        visit_frequency = np.random.randint(1, 30)
        purchase_count = np.random.randint(1, 50)
        feedback = fake.sentence()
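        # The pipeline's 'score' is its confidence in the predicted label, so negate it for NEGATIVE
        result = sentiment_pipeline(feedback)[0]
        sentiment_score = result['score'] if result['label'] == 'POSITIVE' else -result['score']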
        data.append([fake.uuid4(), age, gender, income, spending_score, visit_frequency, purchase_count, feedback, sentiment_score])
    return pd.DataFrame(data, columns=['Customer_ID', 'Age', 'Gender', 'Annual_Income', 'Spending_Score', 'Visit_Frequency', 'Purchase_Count', 'Feedback', 'Sentiment_Score'])

customer_data = generate_synthetic_data()
print(customer_data.head())
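
The generator above draws from NumPy's global random state, so every run produces different customers. A minimal sketch of a seeded, vectorized alternative for the numeric columns follows; the function name `generate_numeric_features` and the `seed` parameter are assumptions introduced here for illustration, and the feedback and sentiment columns are left out so the sketch runs without a model download.

```python
import numpy as np
import pandas as pd

def generate_numeric_features(n=500, seed=42):
    """Draw the lab's numeric columns in one vectorized, reproducible pass."""
    rng = np.random.default_rng(seed)  # seeded generator: same seed, same data
    return pd.DataFrame({
        # rng.integers, like np.random.randint, excludes the upper bound
        'Age': rng.integers(18, 70, size=n),               # 18..69
        'Gender': rng.choice(['Male', 'Female'], size=n),
        'Annual_Income': rng.integers(20000, 120000, size=n),
        'Spending_Score': rng.integers(1, 101, size=n),    # 1..100
        'Visit_Frequency': rng.integers(1, 30, size=n),    # 1..29
        'Purchase_Count': rng.integers(1, 50, size=n),     # 1..49
    })

features = generate_numeric_features()
# Show the observed range of each numeric column
print(features.describe().loc[['min', 'max']])
```

Calling `Faker.seed(...)` before creating sentences should make the text column repeatable in the same way.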

## **Step 3: AI-Powered Data Processing & Integration**

In [None]:
# Encode gender as 0/1 so every feature is numeric
customer_data['Gender'] = customer_data['Gender'].map({'Male': 0, 'Female': 1})
# Standardize each numeric column to zero mean and unit variance
scaler = StandardScaler()
numeric_cols = ['Age', 'Annual_Income', 'Spending_Score', 'Visit_Frequency', 'Purchase_Count', 'Sentiment_Score']
customer_data[numeric_cols] = scaler.fit_transform(customer_data[numeric_cols])
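
To make the scaling step concrete, the sketch below checks `StandardScaler` against the formula it applies, z = (x - mean) / std, on a toy income column. The four income values are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy income column (values are illustrative only)
income = np.array([[20000.0], [50000.0], [80000.0], [110000.0]])

scaler = StandardScaler()
scaled = scaler.fit_transform(income)

# Same computation by hand: subtract the mean, divide by the population std (ddof=0)
manual = (income - income.mean(axis=0)) / income.std(axis=0)
print(np.allclose(scaled, manual))    # True

# inverse_transform maps scaled values back to the original units
restored = scaler.inverse_transform(scaled)
print(np.allclose(restored, income))  # True
```

Keeping the fitted `scaler` around is useful later, since `inverse_transform` lets you describe each segment in dollars and visit counts rather than z-scores.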

## **Step 4: Customer Segmentation Using AI**

In [None]:
# Cluster on the standardized features; n_init=10 restarts K-Means from 10 initializations
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
customer_data['Segment'] = kmeans.fit_predict(customer_data[numeric_cols])
# Both axes are in standardized units (z-scores), not dollars or raw scores
sns.scatterplot(data=customer_data, x='Annual_Income', y='Spending_Score', hue='Segment', palette='viridis')
plt.title("Customer Segmentation Based on Spending and Income")
plt.show()
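
The lab fixes the number of segments at three. One common way to sanity-check that choice is the silhouette score, sketched below on toy data from `make_blobs`; the blob centers are assumptions chosen so that three groups are clearly separated, standing in for `customer_data[numeric_cols]`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Toy data with three well-separated groups (a stand-in for the scaled features)
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=1.0, random_state=42)

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # in [-1, 1]; higher is better

best_k = max(scores, key=scores.get)
print(scores)
print("Best k by silhouette:", best_k)
```

On the lab's synthetic customers, whose numeric features are drawn uniformly at random, no value of k is likely to stand out; that is a property of the data rather than of K-Means.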

## **Step 5: NLP using Hugging Face Transformers for Sentiment Analysis**

In [None]:
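def analyze_sentiment(text):
    # 'score' is the confidence in the predicted label; negate it for NEGATIVE predictions
    result = sentiment_pipeline(text)[0]
    return result['score'] if result['label'] == 'POSITIVE' else -result['score']

# Note: this overwrites the standardized Sentiment_Score from Step 3 with unscaled scores
customer_data['Sentiment_Score'] = customer_data['Feedback'].apply(analyze_sentiment)
print(customer_data[['Feedback', 'Sentiment_Score']].head())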
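
Calling the pipeline once per row is slow for larger datasets. A Hugging Face pipeline also accepts a list of texts and returns one result dict per text, so the feedback can be scored in chunks. The sketch below shows the chunking logic; `score_in_batches` and `stub_classifier` are hypothetical names, and the stub only imitates the pipeline's output shape so the example runs without downloading a model.

```python
import pandas as pd

def score_in_batches(texts, classifier, batch_size=32):
    """Score texts in chunks and return one row per text, in the original order."""
    results = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        # Called with a list, the classifier returns one {'label', 'score'} dict per text
        results.extend(classifier(chunk))
    return pd.DataFrame(results)

# Hypothetical stand-in for sentiment_pipeline (same output shape, no model needed)
def stub_classifier(chunk):
    return [{'label': 'POSITIVE' if 'good' in t else 'NEGATIVE', 'score': 0.9}
            for t in chunk]

scored = score_in_batches(['good service', 'slow delivery', 'good value'],
                          stub_classifier, batch_size=2)
print(scored)
```

In the lab you would pass `sentiment_pipeline` and `customer_data['Feedback'].tolist()` instead of the stub and the toy list. Keeping the `label` column alongside `score` makes it clear whether a high score means confidently positive or confidently negative.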

## **Step 6: AI-Powered Personalization Using Deep Learning Transformers**

In [None]:
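# Tabular features and segment labels; Customer_ID and Feedback are dropped as non-numeric
X = customer_data.drop(['Customer_ID', 'Segment', 'Feedback'], axis=1)
y = customer_data['Segment']
# Hold out 20% of customers; the DistilBERT steps below work from the Feedback text instead of this split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)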
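
The split above is not used again in this lab. As one possible use, the sketch below fits a simple baseline classifier on tabular features and scores it on the held-out rows. Logistic regression is a technique swapped in here for brevity, not the network the Keras imports suggest, and the random feature matrix is a stand-in for the lab's `X`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Stand-in for the lab's scaled numeric features (500 customers, 6 columns)
X = StandardScaler().fit_transform(rng.uniform(size=(500, 6)))
# Segment labels come from K-Means, as in Step 4
y = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# stratify=y keeps the segment proportions the same in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")
```

A baseline like this gives the text-only DistilBERT model in the next steps something to be compared against.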

## **Step 7: Fine-Tune DistilBERT for Customer Segmentation**

In [None]:
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
import torch
from torch.utils.data import DataLoader, TensorDataset
import torch.optim as optim

# Load tokenizer and model
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=3)

# Tokenize customer feedback
tokenized_texts = tokenizer(customer_data['Feedback'].tolist(), padding=True, truncation=True, return_tensors="pt")

# Convert labels to tensor and ensure correct data type
labels = torch.tensor(customer_data['Segment'].values, dtype=torch.long)  # Convert to long

# Create DataLoader
dataset = TensorDataset(tokenized_texts["input_ids"], tokenized_texts["attention_mask"], labels)
train_loader = DataLoader(dataset, batch_size=16, shuffle=True)

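# Define optimizer and loss function
optimizer = optim.AdamW(model.parameters(), lr=2e-5)
# Not used by the training loop below: the model computes cross-entropy itself when `labels` is passed
loss_function = torch.nn.CrossEntropyLoss()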

# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

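# Training loop
model.train()
for epoch in range(3):
    total_loss = 0.0
    for batch in train_loader:
        # Use batch_labels so the full `labels` tensor defined above is not overwritten
        input_ids, attention_mask, batch_labels = [x.to(device) for x in batch]
        optimizer.zero_grad()
        outputs = model(input_ids, attention_mask=attention_mask, labels=batch_labels)
        loss = outputs.loss  # cross-entropy computed inside the model
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    # Report the mean loss over the epoch rather than only the last batch
    print(f"Epoch {epoch+1}: Average loss = {total_loss / len(train_loader):.4f}")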
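
The loop above trains on every customer, which leaves no rows for checking how well the model generalizes. A minimal sketch of a stratified index split follows; the random `segments` array is a stand-in for `customer_data['Segment'].values`, and the 80/20 ratio is an assumption.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in for customer_data['Segment'].values (three segments, 500 customers)
segments = np.random.default_rng(0).integers(0, 3, size=500)

indices = np.arange(len(segments))
# stratify keeps each segment's share equal in the training and validation sets
train_idx, val_idx = train_test_split(
    indices, test_size=0.2, random_state=42, stratify=segments)

print(len(train_idx), len(val_idx))             # 400 100
# The two index arrays are disjoint, so no validation row leaks into training
print(np.intersect1d(train_idx, val_idx).size)  # 0
```

The index lists can then be wrapped with `torch.utils.data.Subset(dataset, train_idx.tolist())` and `Subset(dataset, val_idx.tolist())` to build separate training and validation loaders.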

## **Step 8: Predict Customer Segments Using Fine-Tuned Model**

In [None]:
model.eval()
# Note: these are the same feedback texts the model was trained on, so this is not a held-out evaluation
test_texts = tokenizer(customer_data['Feedback'].tolist(), padding=True, truncation=True, return_tensors="pt")
input_ids = test_texts["input_ids"].to(device)
attention_mask = test_texts["attention_mask"].to(device)

# Disable gradient tracking for inference
with torch.no_grad():
    outputs = model(input_ids, attention_mask=attention_mask)
# Pick the highest-scoring class for each row
predictions = torch.argmax(outputs.logits, dim=1).cpu().numpy()
customer_data["Predicted_Segment"] = predictions

## **Step 9: Compare Actual vs. Predicted Segments**

In [None]:
print(customer_data[['Customer_ID', 'Feedback', 'Segment', 'Predicted_Segment']].head())
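
Printing the first five rows gives only a rough impression of agreement. The comparison can be quantified with an accuracy score and a confusion matrix, as in the sketch below; the toy `actual` and `predicted` lists stand in for `customer_data['Segment']` and `customer_data['Predicted_Segment']`.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix

# Toy labels standing in for the Segment and Predicted_Segment columns
actual = [0, 0, 1, 1, 2, 2, 2, 1]
predicted = [0, 1, 1, 1, 2, 0, 2, 1]

accuracy = accuracy_score(actual, predicted)
matrix = confusion_matrix(actual, predicted, labels=[0, 1, 2])

# Majority-class rate: the accuracy of always predicting the most common segment
majority_rate = pd.Series(actual).value_counts(normalize=True).max()

print(f"Accuracy: {accuracy:.2f}")                   # Accuracy: 0.75
print(f"Majority-class rate: {majority_rate:.3f}")   # Majority-class rate: 0.375
print(matrix)  # rows = actual segment, columns = predicted segment
```

An accuracy close to the majority-class rate would suggest the model has learned little beyond the most frequent segment.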