# **Telco Customer Retention: Churn Prediction and Feedback Insights Using Classification and Topic Modelling**

## I. Import Libraries

In [19]:
import hf_xet
import bertopic
import pandas as pd
import numpy as np
from bertopic import BERTopic
import re
import nltk
import pickle, json

from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from sentence_transformers import SentenceTransformer

nltk.download('stopwords')
nltk.download('punkt_tab')
nltk.download('wordnet')
nltk.download('vader_lexicon')

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Asus\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt_tab to
[nltk_data]     C:\Users\Asus\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\Asus\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\Asus\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

## II. Model Load

In [20]:
# load the list of numeric columns
with open('num_cols.txt', 'r') as f_num:
    num_cols = json.load(f_num)

# load the list of categorical columns
with open('cat_cols.txt', 'r') as f_cat:
    cat_cols = json.load(f_cat)

# load the classification model
with open('best_model.pkl', 'rb') as f_model:
    best_model = pickle.load(f_model)

# load the fitted BERTopic topic model
model = BERTopic.load("bertopic_model")
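
The load step above expects two JSON files and one pickle file on disk. As a minimal sketch of how such artifacts could be written and read back, the round trip below uses only the standard library. The helper names `save_artifacts` and `load_artifacts`, the example column names, and the placeholder model object are assumptions for illustration; the notebook does not show how the original files were produced.

```python
import json
import pickle
import tempfile
from pathlib import Path


def save_artifacts(folder, num_cols, cat_cols, model):
    """Write the column lists as JSON and the model as a pickle (hypothetical helper)."""
    folder = Path(folder)
    with open(folder / "num_cols.txt", "w") as f_num:
        json.dump(num_cols, f_num)
    with open(folder / "cat_cols.txt", "w") as f_cat:
        json.dump(cat_cols, f_cat)
    with open(folder / "best_model.pkl", "wb") as f_model:
        pickle.dump(model, f_model)


def load_artifacts(folder):
    """Read back the three files using the same calls as the notebook cell."""
    folder = Path(folder)
    with open(folder / "num_cols.txt", "r") as f_num:
        num_cols = json.load(f_num)
    with open(folder / "cat_cols.txt", "r") as f_cat:
        cat_cols = json.load(f_cat)
    with open(folder / "best_model.pkl", "rb") as f_model:
        model = pickle.load(f_model)
    return num_cols, cat_cols, model


# Round trip in a temporary folder; the dict stands in for a trained model.
with tempfile.TemporaryDirectory() as tmp:
    save_artifacts(tmp, ["tenure"], ["gender"], {"name": "placeholder model"})
    loaded_num, loaded_cat, loaded_model = load_artifacts(tmp)
```

Only unpickle files from a trusted source, since `pickle.load` can execute arbitrary code.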

## II. Data Input

### A. Input for Classification Models

| Column | User |
| --- | --- |
| customerID | 4500-ruyt |
| gender | male |
| SeniorCitizen | no |
| Partner | no |
| Dependents | no |
| tenure | 12 |
| PhoneService | no |
| MultipleLines | no |
| InternetService | fiber optic |
| OnlineSecurity | yes |
| StreamingTV | yes |
| Contract | month-to-month |
| PaperlessBilling | yes |
| PaymentMethod | electronic check |
| MonthlyCharges | 30.00 |
| TotalCharges | 360.00 |

In [21]:
# user input for classification
customer_id = '4500-ruyt'
gender = 'male'
senior_citizen = 'no'
partner = 'no'
dependents = 'no'
tenure = 12
phone_service = 'no'
multiple_lines = 'no'
internet_service = 'fiber optic'
online_security = 'yes'
streaming_tv = 'yes'
contract = 'month-to-month'
paperless_billing = 'yes'
payment_method = 'electronic check'
monthly_charges = 30.00
total_charges = 360.00

In [22]:
# build a single-row DataFrame from the inputs
data = pd.DataFrame([{'customer_id': customer_id,
                     'gender': gender,
                     'senior_citizen': senior_citizen,
                     'partner': partner,
                     'dependents': dependents,
                     'tenure': tenure,
                     'phone_service': phone_service,
                     'multiple_lines': multiple_lines,
                     'internet_service': internet_service,
                     'online_security': online_security,
                     'streaming_tv': streaming_tv,
                     'contract': contract,
                     'paperless_billing': paperless_billing,
                     'payment_method': payment_method,
                     'monthly_charges': monthly_charges,
                     'total_charges': total_charges
                     }])

df_class = data.copy()
df_class

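| | customer_id | gender | senior_citizen | partner | dependents | tenure | phone_service | multiple_lines | internet_service | online_security | streaming_tv | contract | paperless_billing | payment_method | monthly_charges | total_charges |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 4500-ruyt | male | no | no | no | 12 | no | no | fiber optic | yes | yes | month-to-month | yes | electronic check | 30.0 | 360.0 |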
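
Because `num_cols` and `cat_cols` are loaded earlier, one possible sanity check before inference is to confirm that the input row contains every column those lists name. The sketch below is an assumption about how that could be done: `missing_columns` is a hypothetical helper, and the example column lists are illustrative, since the actual contents of `num_cols.txt` and `cat_cols.txt` are not shown in this notebook.

```python
def missing_columns(available, required):
    """Return the required column names that are absent, in their original order."""
    available = set(available)
    return [col for col in required if col not in available]


# Illustrative lists only; the real ones come from num_cols.txt and cat_cols.txt.
example_num_cols = ["tenure", "monthly_charges", "total_charges"]
example_cat_cols = ["gender", "contract", "payment_method"]

# Column names of a deliberately incomplete input row.
example_input_columns = ["customer_id", "gender", "tenure", "contract", "monthly_charges"]

missing = missing_columns(example_input_columns, example_num_cols + example_cat_cols)
print(missing)  # ['total_charges', 'payment_method']
```

With the notebook's objects, the call would be `missing_columns(df_class.columns, num_cols + cat_cols)`, and an empty list means the row is complete.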


### B. Input for NLP Models

| Column | User |
| --- | --- |
| customer_feedback | The intenet service is bad For me. it is has expensive monthly charges, might switch to another provider. |

In [23]:
# sample customer review used as input for the topic model
review = "The intenet service is bad For me. it is has expensive monthly charges, might switch to another provider."

## III. Prediction Process

In [24]:
# function for churn prediction
def predict_churn(table):
    # predict the churn label for a single-row table
    y_pred_inf = best_model.predict(table)

    # predict() returns an array, so read its first element: 1 is churn, 0 is not churn
    if y_pred_inf[0] == 1:
        print('Customer predicted will churn')
    else:
        print('Customer predicted will not Churn')

In [25]:
# prediction result
predict_churn(df_class)

Customer predicted will churn


Explanation:

The model predicts that this customer will churn. The input describes a customer on a month-to-month contract, with a 12-month tenure and no partner. In the Exploratory Data Analysis section, customers with this profile showed a tendency to churn; the explanation given there is that customers without a partner (single, widowed, or divorced) tend to move more often and generally show short-term loyalty. The prediction is therefore consistent with the EDA findings, although a single inference without a ground-truth label cannot by itself confirm how well the model performs.
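
A hard label alone does not show how confident the model is. As a minimal sketch, assuming the loaded classifier follows the scikit-learn convention of exposing `predict_proba` and `classes_` (not confirmed for `best_model` in this notebook), the churn probability could be read as follows. `churn_probability` and the `StubModel` class are hypothetical names used only to keep the example self-contained.

```python
def churn_probability(model, table):
    """Return the predicted probability of class 1 (churn) for the first row."""
    # Locate the column of predict_proba that corresponds to the churn class.
    churn_column = list(model.classes_).index(1)
    probabilities = model.predict_proba(table)
    return float(probabilities[0][churn_column])


class StubModel:
    """Hypothetical stand-in for best_model with fixed probabilities."""

    classes_ = [0, 1]

    def predict_proba(self, table):
        # One [P(no churn), P(churn)] pair per input row.
        return [[0.25, 0.75] for _ in table]


example_probability = churn_probability(StubModel(), [{"tenure": 12}])
print(example_probability)  # 0.75
```

If the real model supports these attributes, `churn_probability(best_model, df_class)` would give a score that can be compared against a chosen threshold instead of relying on the default label.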

## IV. Suggested Topic

In [None]:
# human-readable description for each topic index
TOPIC_LABELS = {
    -1: "Customers satisfied with the product",
    0: "Customers satisfied with internet experience.",
    1: "Customers complaining about internet connectivity, customer support, and buggy payment app.",
    2: "Customers complaining about added extras with extra charges without warning.",
    3: "Customers complaining with overly technical customer support email.",
    4: "Customers complaining about renewed contract with different terms.",
    5: "Customers complaining about slow respond from customer support.",
    6: "Customers complaining about buffering video streaming.",
    7: "Customers satisfied with customer support, but with minimal user-friendly documentation.",
}

# description used for any topic index not listed above
DEFAULT_TOPIC_LABEL = "Customers complaining about confusing bills and reconsider to cancelling their subscriptions."

# show the suggested topic and the related topics for a review
def show_best_topic(text):
    # assign the review to a topic
    topics, probs = model.transform([text])

    # index of the assigned topic
    topic_index = topics[0]

    # top 5 keywords of the assigned topic
    main_topic_words = model.get_topic(topic_index)
    main_keywords = [word for word, _ in main_topic_words[:5]]

    print("- Suggested Topic")
    print(TOPIC_LABELS.get(topic_index, DEFAULT_TOPIC_LABEL))
    print(f"Keywords: {', '.join(main_keywords)}\n")

    # find the topics most similar to the review text
    print("- Related Topics (Keywords):")
    similar_topic_indices, _ = model.find_topics(text, top_n=5)
    for related_index in similar_topic_indices:
        if related_index == topic_index:
            continue  # skip the main topic
        related_words = model.get_topic(related_index)
        related_keywords = [word for word, _ in related_words[:5]]

        print(TOPIC_LABELS.get(related_index, DEFAULT_TOPIC_LABEL))
        print(f"Keywords: {', '.join(related_keywords)}")

In [31]:
# give suggested topics
show_best_topic(review)

- Suggested Topic
Customers complaining about internet connectivity, customer support, and buggy payment app.
Keywords: drops, support, ages, terrible, buggy

- Related Topics (Keywords):
Customers satisfied with internet experience.
Keywords: internet, month, monthly, charges, satisfied
Customers satisfied with the product
Keywords: overall, churn, check, would, others
Customers complaining about buffering video streaming.
Keywords: drop, video, everything, looked, calls
Customers complaining about slow respond from customer support.
Keywords: days, usable, right, polite, perfectly


Explanation:

The suggested topic for the review is "Customers complaining about internet connectivity, customer support, and buggy payment app", whereas the review itself complains about the quality of the internet service and expensive monthly charges, so the match is only partial. Among the related topics, the keywords that overlap with the review (such as "monthly" and "charges") belong to "Customers satisfied with internet experience", which is a positive topic. This is influenced by the training data, in which the majority of reviews carry positive sentiment. The related topic "Customers complaining about buffering video streaming" is closer in meaning, since the review complains about bad internet service. These results are limited by the performance of the embedding model used here, whose semantic similarity score is 37.47. The best embedding model is Qwen3-embedding-model, with a semantic similarity score of 81.00.
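
The keyword comparison discussed above can be made concrete with a simple word-overlap count. The sketch below is only an illustration of that reasoning, not how BERTopic ranks topics (it works on embeddings). The keyword lists are copied from the output above, while the helper name `keyword_overlap` and the regex-based tokenisation are assumptions.

```python
import re

review_text = ("The intenet service is bad For me. it is has expensive monthly "
               "charges, might switch to another provider.")

# Top keywords per topic index, copied from the output shown above.
topic_keywords = {
    1: ["drops", "support", "ages", "terrible", "buggy"],
    0: ["internet", "month", "monthly", "charges", "satisfied"],
    -1: ["overall", "churn", "check", "would", "others"],
    6: ["drop", "video", "everything", "looked", "calls"],
    5: ["days", "usable", "right", "polite", "perfectly"],
}


def keyword_overlap(text, keywords):
    """Return the keywords that literally appear as words in the text, sorted."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(words.intersection(keywords))


# Shared words between the review and each topic's keyword list.
overlaps = {index: keyword_overlap(review_text, words)
            for index, words in topic_keywords.items()}

for index, shared in overlaps.items():
    print(index, shared)
```

Only topic 0 shares literal words with the review ("charges" and "monthly"). The misspelled "intenet" does not match "internet", which shows why exact keyword matching is brittle compared with embedding-based similarity.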