# Business Problem

This project aims to help telecommunication operators in Kenya identify the customers who are most likely to use voicemail services. The goal is to increase adoption of an underutilised feature and open up a new revenue stream.

# Business Understanding

Despite its potential to improve customer communication and increase revenue for telecommunication carriers, voicemail remains a feature that mobile users in Kenya underutilise. The goal of this project is to help telecommunication providers determine which customers are more likely to use voicemail services. By understanding the usage habits and characteristics of its customers, a provider can design targeted campaigns that market voicemail to the appropriate audience. In a competitive telecommunication market, this strategy can boost service adoption, encourage customer engagement, and create a new source of income.

# Data Understanding

The dataset contains customer records from a telecommunication operator, including plan types, usage patterns, and interactions with customer support. It has 3,333 rows and 21 columns.
- The target variable is `voice mail plan`, which indicates whether a customer subscribes to a voicemail plan.
- The features (independent variables) include `account length`, `international plan`, `total intl minutes`, `total intl calls`, and `total intl charge`, along with the other columns that remain after the drop step in Data Preparation (15 features in total, as the printed shapes confirm).
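
Before modelling, it helps to know how balanced the target is. The following is a minimal sketch on a hypothetical toy frame (the rows are made up, only the column names come from the dataset described above), and `summarise_target` is an illustrative helper, not part of the notebook.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real CSV; values are invented.
toy = pd.DataFrame({
    "voice mail plan": ["yes", "no", "no", "no", "yes", "no", "no", "no"],
    "international plan": ["no", "no", "yes", "no", "no", "no", "yes", "no"],
    "account length": [128, 107, 137, 84, 75, 118, 121, 147],
})

def summarise_target(frame: pd.DataFrame, target: str) -> pd.Series:
    """Return the share of each class in the target column."""
    return frame[target].value_counts(normalize=True)

shares = summarise_target(toy, "voice mail plan")
print(toy.shape)   # rows and columns
print(shares)      # class proportions of the target
```

On the real data, the same call on `df` would show whether voicemail subscribers are a minority class, which affects how the split and the evaluation metrics should be read.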

# Data Preparation

In [1]:
# importing the necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix


In [2]:
# Load the data
df = pd.read_csv("bigml_59c28831336c6604c800002a.csv")

In [3]:
# Drop irrelevant or leakage columns
df = df.drop(columns=['phone number', 'state', 'area code', 'number vmail messages', 'churn'])
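
One plausible reason `number vmail messages` is treated as leakage is that only customers with a voicemail plan can have a nonzero message count, so the column would reveal the target. That is an assumption about the data, not something the notebook states. The sketch below shows how such a check could look, using hypothetical toy rows.

```python
import pandas as pd

# Hypothetical toy rows; values are invented for illustration only.
toy = pd.DataFrame({
    "voice mail plan": ["yes", "no", "no", "yes", "no"],
    "number vmail messages": [25, 0, 0, 31, 0],
})

has_plan = toy["voice mail plan"].eq("yes")
has_messages = toy["number vmail messages"].gt(0)

# Share of rows where "has messages" agrees with "has a plan".
# A value near 1.0 means the column almost gives the target away.
agreement = (has_plan == has_messages).mean()
print(agreement)
```

Running the same comparison on the real `df` before the drop would confirm or refute the assumption.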

In [4]:
# Encode binary categorical variables
df['voice mail plan'] = df['voice mail plan'].map({'yes': 1, 'no': 0})
df['international plan'] = df['international plan'].map({'yes': 1, 'no': 0})
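
`Series.map` with a dictionary returns `NaN` for any value that is not a key, so stray whitespace or different casing would silently create missing values. A minimal sketch of a stricter encoder follows; `encode_yes_no` is a hypothetical helper, and the example values are invented.

```python
import pandas as pd

def encode_yes_no(series: pd.Series) -> pd.Series:
    """Map 'yes'/'no' strings to 1/0, failing loudly on anything else."""
    cleaned = series.astype(str).str.strip().str.lower()
    encoded = cleaned.map({"yes": 1, "no": 0})
    if encoded.isna().any():
        bad = sorted(cleaned[encoded.isna()].unique())
        raise ValueError(f"Unexpected values in {series.name!r}: {bad}")
    return encoded.astype(int)

# Invented example values, including messy spacing and casing.
example = pd.Series(["yes", "no", " Yes ", "NO"], name="voice mail plan")
encoded = encode_yes_no(example)
print(encoded.tolist())
```

If the raw columns only ever contain `yes` and `no`, the plain `map` call above behaves the same way, and the helper simply makes that assumption explicit.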

In [5]:
# Define features and target
X = df.drop(columns='voice mail plan')
y = df['voice mail plan']

In [6]:
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
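
If the target classes are imbalanced, passing `stratify` to `train_test_split` keeps the class proportions the same in both subsets. The sketch below uses hypothetical data (100 rows, 30% positives) to illustrate the option; it is a suggestion, not the split used in this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 30 positives and 70 negatives.
rng = np.random.default_rng(42)
X_demo = pd.DataFrame({"account length": rng.integers(1, 200, size=100)})
y_demo = pd.Series([1] * 30 + [0] * 70, name="voice mail plan")

# stratify=y_demo preserves the 30/70 ratio in train and test.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)
print(y_tr.mean(), y_te.mean())
```

With a purely random split, the positive rate in a small test set can drift from the overall rate, which makes test metrics harder to compare with training metrics.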

In [7]:
# Standardise the features: fit the scaler on the training set only, then apply it to the test set
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
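
The scaler is fitted on the training set and only applied to the test set, so no test-set statistics leak into preprocessing. The sketch below illustrates that behaviour on hypothetical random data; the names `demo_scaler`, `train`, and `test` are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric data standing in for the real features.
rng = np.random.default_rng(0)
train = rng.normal(loc=100.0, scale=40.0, size=(200, 2))
test = rng.normal(loc=100.0, scale=40.0, size=(50, 2))

demo_scaler = StandardScaler()
train_scaled = demo_scaler.fit_transform(train)  # mean and std learned here
test_scaled = demo_scaler.transform(test)        # training statistics reused

# Training columns end up with mean 0 and std 1; test columns only roughly so.
print(train_scaled.mean(axis=0).round(3), train_scaled.std(axis=0).round(3))
print(test_scaled.mean(axis=0).round(3), test_scaled.std(axis=0).round(3))
```

The test set is transformed with the training mean and standard deviation, so its scaled columns are close to, but not exactly, zero mean and unit variance.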

In [8]:
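# Optional: convert back to DataFrame for readability
# Keep the original row index so the features stay aligned with y_train and y_test
X_train_scaled = pd.DataFrame(X_train_scaled, columns=X.columns, index=X_train.index)
X_test_scaled = pd.DataFrame(X_test_scaled, columns=X.columns, index=X_test.index)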

In [9]:
# Confirm dimensions
print("Train shape:", X_train_scaled.shape)
print("Test shape:", X_test_scaled.shape)

Train shape: (2666, 15)
Test shape: (667, 15)


# Testing code

# Modelling

# Evaluation