# **AI Chatbot for Telecom**

This project builds a chatbot that answers questions about telecom customer data. Using Haystack's NLP framework, it converts customer records into text documents, indexes them for retrieval, and lets users ask questions about churn reasons, available offers, and contract details.

# **Key Features:**
1. Data Processing → Cleans and structures customer data, including churn status and offers.

2. Text-Based Document Storage → Converts records into a retrievable format.

3. BM25 Retrieval System → Indexes the customer documents and ranks them by keyword relevance for each query.

4. Chatbot with Q&A Pipeline → Allows users to ask questions about churn, offers, and contracts.

5. Data Persistence → Saves and reloads processed documents for future use.



## **Install Required Packages**

In [8]:
pip install -U pydantic pandas farm-haystack


Collecting pydantic
  Using cached pydantic-2.11.1-py3-none-any.whl.metadata (63 kB)


## **Load & Preprocess Customer Churn Data**

In [2]:
import pandas as pd
from haystack.schema import Document
import pickle

df = pd.read_csv("Updated_7034_with_Offers.csv")
required_columns = ["customerID", "Churn", "comments", "Offers", "Contract",
                    "PaymentMethod", "InternetService", "MonthlyCharges",
                    "TotalCharges", "tenure"]

missing_cols = [col for col in required_columns if col not in df.columns]
if missing_cols:
    raise ValueError(f"❌ Missing columns in dataset: {missing_cols}")

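# Non-numeric TotalCharges values (such as blank strings) become NaN.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
# Drop every row that has a missing value in any column.
df = df.dropna()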

docs = []
for _, row in df.iterrows():
    text = f"Customer {row['customerID']} has {'churned' if row['Churn'] == 'Yes' else 'not churned'}. " \
           f"Feedback: {row['comments']}. Offer: {row['Offers']}. Contract: {row['Contract']}. " \
           f"Payment Method: {row['PaymentMethod']}. Monthly Charges: {row['MonthlyCharges']}. " \
           f"Total Charges: {row['TotalCharges']}. Tenure: {row['tenure']} months."

    docs.append(Document(content=text, meta={"customerID": row["customerID"]}))

if not docs:
    raise ValueError("❌ No customer records processed. Check your dataset!")

# Persist the processed documents so later sessions can skip preprocessing.
with open("processed_docs.pkl", "wb") as f:
    pickle.dump(docs, f)

print(f"✅ Processed {len(docs)} customer records and saved.")



✅ Processed 7032 customer records and saved.
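The text template used in the loop above can also be factored into a small function, which makes it easier to check on a single record. The following is a minimal sketch, not part of the original notebook: the name `record_to_text` and the sample values are hypothetical, and the function assumes each record is a plain dict with the same column names as the dataset.

```python
def record_to_text(record):
    """Build the sentence-style text for one customer record (a plain dict)."""
    status = "churned" if record["Churn"] == "Yes" else "not churned"
    return (
        f"Customer {record['customerID']} has {status}. "
        f"Feedback: {record['comments']}. Offer: {record['Offers']}. "
        f"Contract: {record['Contract']}. "
        f"Payment Method: {record['PaymentMethod']}. "
        f"Monthly Charges: {record['MonthlyCharges']}. "
        f"Total Charges: {record['TotalCharges']}. "
        f"Tenure: {record['tenure']} months."
    )


# Hypothetical sample record, for illustration only (not taken from the dataset).
sample = {
    "customerID": "0001-TEST",
    "Churn": "Yes",
    "comments": "Service was too slow",
    "Offers": "Long-Term Discount",
    "Contract": "Month-to-month",
    "PaymentMethod": "Electronic check",
    "MonthlyCharges": 50.7,
    "TotalCharges": 608.4,
    "tenure": 12,
}

text = record_to_text(sample)
print(text)
```

With pandas, `df.to_dict("records")` yields one such dict per row, so the same function could be applied to the full dataset.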


## **Store Data in Haystack's In-Memory Document Store**

In [3]:
from haystack.document_stores import InMemoryDocumentStore
document_store = InMemoryDocumentStore(use_bm25=True)
document_store.write_documents(docs)

print("Data stored in Haystack with BM25 enabled!")


Updating BM25 representation...: 100%|██████████| 7032/7032 [00:00<00:00, 27898.71 docs/s]


Data stored in Haystack with BM25 enabled!
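To give an intuition for what BM25 ranking does, here is a small, self-contained sketch of the scoring formula in plain Python. It is an illustration only and not Haystack's implementation: the whitespace tokenizer, the toy documents, and the parameter values `k1=1.5` and `b=0.75` are assumptions chosen for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Very simple tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            tf = term_freq[term]
            norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(tokens) / avg_len))
            score += idf * norm
        scores.append(score)
    return scores


# Toy documents, loosely shaped like the customer texts built earlier.
toy_docs = [
    "customer has churned feedback slow internet offer discount",
    "customer has not churned feedback happy with service offer none",
    "customer has churned feedback billing issues offer upgrade",
]
scores = bm25_scores("slow internet", toy_docs)
best = max(range(len(toy_docs)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```

Only the first toy document contains the query terms, so it receives the only non-zero score. Terms that appear in every document, such as "customer", would contribute little to the ranking.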


## **Implement Question-Answering Pipeline**

In [4]:
from haystack.nodes import BM25Retriever, FARMReader
from haystack.pipelines import ExtractiveQAPipeline

# The retriever narrows the search with BM25; the reader extracts answer spans.
retriever = BM25Retriever(document_store=document_store)
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=False)
pipeline = ExtractiveQAPipeline(reader, retriever)


def ask_chatbot(query):
    """Return the top extracted answer for a query, or a fallback message."""
    result = pipeline.run(query=query, params={"Retriever": {"top_k": 5}, "Reader": {"top_k": 1}})
    answers = result["answers"]
    return answers[0].answer if answers else "No relevant information found."





## **Test the Chatbot with Example Queries**

In [5]:
print(ask_chatbot("Why did customer 123 churn?"))
print(ask_chatbot("What offer is available for long-term customers?"))


Inferencing Samples: 100%|██████████| 1/1 [00:05<00:00,  5.39s/ Batches]


Despite spending 50.7 monthly, Customer 123 might look for alternative plans


Inferencing Samples: 100%|██████████| 1/1 [00:04<00:00,  4.72s/ Batches]

Long-Term Discount
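The answers above are shown without any indication of how confident the reader was or which customer record they came from. The sketch below shows one way to surface that information. It is a hedged illustration, not part of the original notebook: the helper name `summarize_answers` and the `min_score` threshold of 0.3 are hypothetical, and it assumes each answer object exposes `answer`, `score`, and `meta` attributes, with the source document's metadata (here `customerID`) copied into `meta`. Stand-in objects are used so the example runs without loading a model.

```python
from types import SimpleNamespace


def summarize_answers(answers, min_score=0.3):
    """Format answers with their score and source customer, dropping weak ones."""
    summaries = []
    for ans in answers:
        score = ans.score if ans.score is not None else 0.0
        if score < min_score:
            continue  # Skip low-confidence answers (threshold is an assumption).
        customer_id = (ans.meta or {}).get("customerID", "unknown")
        summaries.append(f"{ans.answer} (customer {customer_id}, score {score:.2f})")
    return summaries or ["No relevant information found."]


# Stand-in objects that mimic the assumed answer attributes.
fake_answers = [
    SimpleNamespace(answer="Long-Term Discount", score=0.82, meta={"customerID": "0001-TEST"}),
    SimpleNamespace(answer="Month-to-month", score=0.05, meta={"customerID": "0002-TEST"}),
]

lines = summarize_answers(fake_answers)
print(lines)
```

In the notebook, the same helper could be called on `result["answers"]` inside `ask_chatbot`, after raising the reader's `top_k` so that more than one candidate is returned.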





## **Save Document Store to JSON File**

In [6]:
import json

# Serialize every stored document to a plain dict so it can be written as JSON.
docs = document_store.get_all_documents()
docs_json = [doc.to_dict() for doc in docs]
with open("customer_churn_store.json", "w") as f:
    json.dump(docs_json, f)

print("Documents saved successfully to customer_churn_store.json!")


Documents saved successfully to customer_churn_store.json!


## **Load Document Store from JSON File**

In [7]:
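import json

from haystack.document_stores import InMemoryDocumentStore
from haystack.schema import Document

with open("customer_churn_store.json", "r") as f:
    docs_json = json.load(f)

# Rebuild the documents and index them in a fresh BM25-enabled store.
docs = [Document.from_dict(doc) for doc in docs_json]
document_store = InMemoryDocumentStore(use_bm25=True)
document_store.write_documents(docs)

# Note: the retriever and pipeline must be re-created to use this new store.
print("Documents loaded successfully!")


Updating BM25 representation...: 100%|██████████| 7032/7032 [00:00<00:00, 19970.66 docs/s]


Documents loaded successfully!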


# **Conclusion:**
This project provides a chatbot that helps businesses explore customer churn patterns and available offers by combining BM25 retrieval with an extractive question-answering model.

# **Key Takeaways:**
1. Quick Search → Users can ask natural-language questions about churn reasons, offers, and payment details.

2. Efficient Data Handling → Stores customer records in a structured and searchable way.

3. Future Enhancements → Can be improved with real-time chatbot deployment, advanced retrievers, or deeper sentiment analysis.

With this system, businesses can gain better insights into customer behavior, offer effectiveness, and retention strategies, helping them make data-driven decisions.