The workflow for the project consists of the following key steps:
- Data Loading: Load data from an Azure SQL (SQL Server) database using pyodbc.
- Data Preprocessing: Clean and preprocess the dataset for model training.
- Model Training: Train a Logistic Regression model using scikit-learn.
- Model Evaluation: Measure the model's accuracy on a held-out test set.
- MLOps Integration: Log the metric and model to MLflow, and register the model in the Azure Machine Learning workspace.


### Step 1: Set up Workspace & Experiment

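We begin by connecting to the Azure ML workspace, the central resource for managing experiments, models, and data. Workspace.from_config() reads the workspace details from a local config.json file, and an Experiment named customer-churn-prediction is defined to group the project's runs.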


In [1]:
from azureml.core import Workspace, Experiment

# Connect to your workspace
ws = Workspace.from_config()

# Create an experiment in the workspace
experiment = Experiment(workspace=ws, name='customer-churn-prediction')
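Workspace.from_config() expects a config.json file containing the subscription ID, resource group, and workspace name. By default it searches the current directory and its parents, and a different location can be passed through its path argument; the file can typically be downloaded from the workspace page in the Azure portal. The sketch below writes a file in that layout. The helper write_workspace_config and the values passed to it are hypothetical placeholders, not part of the original project.

```python
import json
from pathlib import Path


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write config.json with the three keys Workspace.from_config() reads.

    Hypothetical helper for illustration; pass your own workspace details.
    """
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(directory) / "config.json"
    path.write_text(json.dumps(config, indent=2))
    return path
```

For example, calling write_workspace_config(".", "<subscription-id>", "<resource-group>", "<workspace-name>") with real values and then Workspace.from_config() would connect to that workspace from the current directory.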


### Step 2: Data Loading and Preprocessing

- The customer churn data is loaded from an Azure SQL Server database using the pyodbc library.
- Rows with missing values are dropped, and the Customer_Status column is converted into a binary Churn target (1 for Churned, 0 otherwise). Categorical columns are then one-hot encoded, and the Customer_ID and Customer_Status columns are removed.

In [2]:
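import os

import pandas as pd
import pyodbc

# Connection details for SQL Server. The username and password are read from
# environment variables instead of being hardcoded in the notebook;
# SQL_USER and SQL_PASSWORD are illustrative variable names.
conn_str = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=group1server.database.windows.net;"
    "DATABASE=DEPI_DB;"
    f"UID={os.environ['SQL_USER']};"
    f"PWD={os.environ['SQL_PASSWORD']}"
)
conn = pyodbc.connect(conn_str)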

# Load data into DataFrame
query = "SELECT * FROM [dbo].[telecom_customer_churn];"
df = pd.read_sql(query, conn)
conn.close()

# Data preprocessing
df = df.dropna()
df['Churn'] = df['Customer_Status'].apply(lambda x: 1 if x == 'Churned' else 0)

# Include actual categorical columns
categorical_columns = ['Gender', 'City', 'Offer', 'Phone_Service', 'Multiple_Lines', 'Internet_Service', 'Internet_Type', 'Online_Security', 'Online_Backup', 'Device_Protection_Plan', 'Premium_Tech_Support', 'Streaming_TV', 'Streaming_Movies', 'Streaming_Music', 'Unlimited_Data', 'Contract', 'Paperless_Billing', 'Payment_Method', 'Churn_Category', 'Churn_Reason']

# One-hot encode categorical columns
df = pd.get_dummies(df, columns=categorical_columns, drop_first=True)
df = df.drop(columns=['Customer_ID', 'Customer_Status'])
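To make these preprocessing steps concrete, the sketch below applies the same sequence (drop rows with missing values, derive the binary Churn target, one-hot encode with drop_first=True, drop the identifier and status columns) to a tiny hand-made DataFrame. The rows, the reduced column set, and the preprocess helper are invented for illustration and are not taken from the project data.

```python
import pandas as pd


def preprocess(df: pd.DataFrame, categorical_columns: list) -> pd.DataFrame:
    """Same steps as the notebook cell, applied to an arbitrary DataFrame."""
    df = df.dropna()
    # 1 if the customer churned, 0 for any other status (e.g. 'Stayed', 'Joined')
    df = df.assign(Churn=(df["Customer_Status"] == "Churned").astype(int))
    # drop_first=True keeps k-1 indicator columns for a category with k values
    df = pd.get_dummies(df, columns=categorical_columns, drop_first=True)
    return df.drop(columns=["Customer_ID", "Customer_Status"])


# Hypothetical toy rows; A4 has a missing Gender and will be dropped
toy = pd.DataFrame({
    "Customer_ID": ["A1", "A2", "A3", "A4"],
    "Gender": ["Male", "Female", "Female", None],
    "Contract": ["Month-to-Month", "One Year", "Two Year", "One Year"],
    "Monthly_Charge": [70.5, 55.0, 20.0, 90.0],
    "Customer_Status": ["Churned", "Stayed", "Joined", "Churned"],
})

processed = preprocess(toy, ["Gender", "Contract"])
print(processed)
```

Because dropna() runs on the whole table before encoding, a single missing value anywhere in a row removes that customer entirely (row A4 here); with several sparsely populated columns this can discard a large share of the data.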


### Step 3: Model Training

A Logistic Regression model is trained on the processed dataset. The data is split into training (70%) and test (30%) sets with a fixed random seed, the model is fitted on the training set, and its accuracy is measured on the test set.


In [3]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X = df.drop(columns=['Churn'])
y = df['Churn']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy}")


Model Accuracy: 1.0
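A test accuracy of exactly 1.0 is a strong hint of target leakage rather than a genuinely perfect model. Churn_Category and Churn_Reason describe why a customer churned, so they are only known once the outcome is known, and their one-hot columns can reveal the label directly. The sketch below illustrates the effect on synthetic data: a hypothetical Churn_Reason_Competitor column that mirrors the label lets Logistic Regression score near-perfectly, while dropping every column with a leaky prefix brings accuracy back to chance. The data, the noise column names, and the helpers drop_leaky_features and fit_and_score are illustrative assumptions; only the two prefixes come from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Prefixes of one-hot columns derived from post-churn information
LEAKY_PREFIXES = ("Churn_Category_", "Churn_Reason_")


def drop_leaky_features(X: pd.DataFrame, prefixes=LEAKY_PREFIXES) -> pd.DataFrame:
    """Remove every column whose name starts with one of the leaky prefixes."""
    leaky = [col for col in X.columns if col.startswith(prefixes)]
    return X.drop(columns=leaky)


def fit_and_score(X: pd.DataFrame, y: pd.Series) -> float:
    """70/30 stratified split, fit Logistic Regression, return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


# Synthetic data: three noise features that carry no information about the label
rng = np.random.default_rng(0)
n = 2000
y = pd.Series(rng.integers(0, 2, size=n), name="Churn")
X = pd.DataFrame(rng.normal(size=(n, 3)), columns=["Tenure", "Monthly_Charge", "Age"])
# Hypothetical leaky column: 1 exactly when the customer churned
X["Churn_Reason_Competitor"] = y.to_numpy()

leaky_acc = fit_and_score(X, y)
clean_acc = fit_and_score(drop_leaky_features(X), y)
print(f"with leaky column: {leaky_acc:.2f}, without: {clean_acc:.2f}")
```

Applied to the notebook, the analogous step would be removing Churn_Category and Churn_Reason from the feature set before training, and treating the resulting accuracy as the more realistic estimate.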


### Step 4: Track with MLflow


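MLflow is used to log the test accuracy and the trained Logistic Regression model. The model is logged with a signature, inferred from the training features and the model's predictions, that describes its input and output schema, together with a small input example.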

In [4]:
import mlflow
import mlflow.sklearn
from mlflow.models.signature import infer_signature

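# Log the metric and the model inside a run context, which ends the run
# automatically even if one of the logging calls raises an exception
with mlflow.start_run():
    mlflow.log_metric("accuracy", accuracy)
    signature = infer_signature(X_train, model.predict(X_train))
    mlflow.sklearn.log_model(
        model,
        "logistic_regression_model",
        signature=signature,
        input_example=X_train.head(5),
    )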





2024/10/18 13:17:59 INFO mlflow.tracking._tracking_service.client: 🏃 View run bright_octopus_v3kdt0vv at: https://eastus2.api.azureml.ms/mlflow/v2.0/subscriptions/2d57ab39-534b-4ea0-9f64-e8d1c37adc8c/resourceGroups/mm30207021600537-rg/providers/Microsoft.MachineLearningServices/workspaces/group1/#/experiments/94b469b1-b31a-48bf-a59a-f710d728eba3/runs/9f68fda6-3fac-4a40-8166-a8230ccca80d.
2024/10/18 13:17:59 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: https://eastus2.api.azureml.ms/mlflow/v2.0/subscriptions/2d57ab39-534b-4ea0-9f64-e8d1c37adc8c/resourceGroups/mm30207021600537-rg/providers/Microsoft.MachineLearningServices/workspaces/group1/#/experiments/94b469b1-b31a-48bf-a59a-f710d728eba3.


### Step 5: Register Model in Azure ML


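Once the model is trained and logged, it is serialized with joblib and registered in the Azure Machine Learning workspace, where it also appears in Azure Machine Learning Studio. Registering a model under an existing name creates a new version, which provides version tracking and a stable reference for later deployment.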


In [5]:
from azureml.core import Workspace, Model
import joblib

# Connect to your workspace
ws = Workspace.from_config()

# Save the model
joblib.dump(model, 'logistic_regression_model.pkl')

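# Register the model; the result is stored in a separate variable so that
# the trained scikit-learn model object is not overwritten
registered_model = Model.register(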
    workspace=ws, 
    model_name="customer_churn_model", 
    model_path="logistic_regression_model.pkl",
    description="Logistic Regression model for customer churn prediction"
)


Registering model customer_churn_model
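Before relying on the registered file, it can be worth confirming that the serialized model restores correctly. The sketch below trains a small stand-in Logistic Regression on synthetic data (an assumption, since the churn data is not available here), dumps it with joblib.dump as in the cell above, reloads it with joblib.load, and checks that the predictions are unchanged. A pickled scikit-learn model should generally be loaded with the same scikit-learn version it was saved with.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Stand-in model trained on synthetic data (not the churn dataset)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "logistic_regression_model.pkl")
    joblib.dump(model, path)        # same call as in the registration cell
    restored = joblib.load(path)    # as a scoring script typically would

# The restored model should reproduce the original predictions exactly
same_predictions = np.array_equal(model.predict(X), restored.predict(X))
print(f"Predictions identical after reload: {same_predictions}")
```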
