
<h2>Tasks</h2>

*  Set up Python’s logging module to capture and store logs.

*  Implement logging at key stages of the machine learning pipeline, including data preprocessing, model training, and predictions.

*  Log errors and exceptions to a file for debugging purposes.

<h3>Step 1: Set up logging</h3>

In [1]:
import logging

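# Set up logging to a file. force=True (Python 3.8+) replaces any handlers
# already attached to the root logger (as can happen in Colab), which would
# otherwise make this call a no-op.
logging.basicConfig(filename='ml_pipeline.log', level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(message)s', force=True)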

# Example log message
logging.info("Logging setup complete.")
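
As an alternative to configuring the root logger, a named logger with its own handlers keeps pipeline messages separate from library output. The following is a minimal sketch, not the notebook's method: the helper `build_pipeline_logger` and the logger name `ml_pipeline` are hypothetical, and only the file name `ml_pipeline.log` comes from the step above.

```python
import logging

def build_pipeline_logger(log_path="ml_pipeline.log", name="ml_pipeline"):
    """Return a logger that writes timestamped records to a file and the console."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output through the root logger

    # Re-running a notebook cell would otherwise attach the same handlers again
    if not logger.handlers:
        formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")

        file_handler = logging.FileHandler(log_path)
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)

        console_handler = logging.StreamHandler()
        console_handler.setFormatter(formatter)
        logger.addHandler(console_handler)
    return logger

logger = build_pipeline_logger()
logger.info("Logging setup complete.")
```

Calling `build_pipeline_logger()` a second time returns the same logger without adding handlers, so each message is written only once per destination.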

<h3>Step 2: Log data preprocessing</h3>

In [2]:
import pandas as pd

# Log the start of data loading
logging.info("Loading dataset...")

# Load the dataset
df = pd.read_csv('/content/Customer_churn.csv')
logging.info("Dataset loaded successfully.")

# Log the start of preprocessing
logging.info("Starting data preprocessing...")

# Example preprocessing: handling missing values
missing_count = int(df.isna().sum().sum())  # total number of missing cells
df.fillna(0, inplace=True)
logging.info("Filled %d missing values with 0.", missing_count)

# Log the completion of preprocessing
logging.info("Data preprocessing completed.")
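
The start and completion messages above follow the same pattern for every stage, so they can be wrapped in a context manager that also records the duration and any failure. This is a sketch under assumptions: `log_stage` is a hypothetical helper, and the list comprehension is a placeholder standing in for real preprocessing.

```python
import logging
import time
from contextlib import contextmanager

@contextmanager
def log_stage(name, logger=None):
    """Log the start, the outcome, and the duration of one pipeline stage."""
    if logger is None:
        logger = logging.getLogger()  # fall back to the root logger
    logger.info("Starting %s...", name)
    start = time.perf_counter()
    try:
        yield
    except Exception:
        # Record the failure with its traceback, then let the error propagate
        logger.exception("%s failed after %.2f s", name, time.perf_counter() - start)
        raise
    else:
        logger.info("%s completed in %.2f s", name, time.perf_counter() - start)

# Usage with a placeholder computation standing in for real preprocessing
with log_stage("data preprocessing"):
    cleaned = [value if value is not None else 0 for value in [1, None, 3]]
```

Because the exception is re-raised after being logged, a failing stage still stops the pipeline instead of continuing with partial data.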

<h3>Step 3: Log model training</h3>

In [4]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Log the start of model training
logging.info("Starting model training...")

try:
    # 'df' is the DataFrame from Step 2 and 'Churn' is the target column;
    # replace 'Churn' with the name of your own target column if it differs
    X = df.drop('Churn', axis=1)
    y = df['Churn']

    # Convert every feature column to numeric; non-numeric values become NaN
    X = X.apply(pd.to_numeric, errors='coerce')
    # Drop columns containing NaN before splitting, so that the training and
    # test sets always keep the same columns
    X = X.dropna(axis=1)

    # Check if there are still features left to train on
    if X.shape[1] == 0:
        logging.error("No numeric features left after preprocessing to train the model.")
    else:
        # Split the data into training and testing sets
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

        # Train the decision tree model
        model = DecisionTreeClassifier()
        model.fit(X_train, y_train)
        logging.info("Model trained successfully.")

        # Accuracy on the training data only; it does not measure generalization
        accuracy = model.score(X_train, y_train)
        logging.info(f"Training accuracy: {accuracy:.2f}")

except Exception:
    # logging.exception logs at ERROR level and appends the full traceback
    logging.exception("Error during model training")
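
The task list also mentions logging predictions. One possible sketch is shown here, assuming only that the model exposes a `predict` method, as scikit-learn estimators do. The helper `predict_with_logging` and the stub class `AlwaysZeroModel` are hypothetical; in the notebook, the trained `model` and `X_test` would take their place.

```python
import logging
from collections import Counter

def predict_with_logging(model, features, logger=None):
    """Call model.predict(features) and log how many predictions fall in each class."""
    if logger is None:
        logger = logging.getLogger()  # fall back to the root logger
    logger.info("Generating predictions for %d samples...", len(features))
    try:
        predictions = model.predict(features)
    except Exception:
        # Keep the traceback in the log, then let the caller handle the error
        logger.exception("Error during prediction")
        raise
    # Labels are converted to strings so the log reads the same for any label type
    class_counts = Counter(str(label) for label in predictions)
    logger.info("Predicted class counts: %s", dict(class_counts))
    return predictions

class AlwaysZeroModel:
    """Hypothetical stand-in so the sketch runs without a trained model."""
    def predict(self, features):
        return [0 for _ in features]

predictions = predict_with_logging(AlwaysZeroModel(), [[1.0, 2.0], [3.0, 4.0]])
```

Logging the class counts rather than every individual prediction keeps the log file small while still revealing problems such as a model that predicts a single class.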

<h3>Step 5: Log errors and exceptions</h3>

In [5]:
# Example: logging an exception during data validation
def validate_data(data):
    try:
        if not isinstance(data, pd.DataFrame):
            raise ValueError("Input must be a pandas DataFrame.")
        logging.info("Data validation successful.")
    except ValueError as e:
        logging.error(f"Data validation error: {e}")

# Validate the dataset
validate_data(df)
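
When debugging from a log file, the traceback is often more useful than the error message alone. The sketch below uses `logging.exception`, which logs at ERROR level and appends the traceback of the exception being handled. The function `validate_columns` and the column name `feature_a` are hypothetical; `Churn` is the target column used in Step 3.

```python
import logging

def validate_columns(columns, required):
    """Return True if every required column is present; log the problem otherwise."""
    try:
        missing = [name for name in required if name not in columns]
        if missing:
            raise ValueError(f"Missing required columns: {missing}")
    except ValueError:
        # Must be called from inside an except block to capture the traceback
        logging.exception("Data validation error")
        return False
    logging.info("Data validation successful.")
    return True

# Usage with hypothetical column lists
has_target = validate_columns(["feature_a", "Churn"], ["Churn"])
lacks_target = validate_columns(["feature_a"], ["Churn"])
```

Returning a boolean lets the caller decide whether to stop the pipeline, while the log keeps the full details of the failure.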