
1. Data Acquisition and Exploration


1.1 Importing Necessary Libraries


In [None]:
# Basic libraries
import pandas as pd
import numpy as np

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Ignore warnings
import warnings
warnings.filterwarnings('ignore')

1.2 Loading the Data

In [None]:
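# Load the dataset using pandas.
# The column names used below ('survived', 'who', 'deck', ...) assume the
# seaborn version of the Titanic dataset, saved locally as 'titanic.csv'.
df = pd.read_csv('titanic.csv')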

In [None]:
# Alternative: load the Titanic dataset through seaborn
# df = sns.load_dataset('titanic')

1.3 Exploratory Data Analysis (EDA)


In [None]:
# Display the first 5 rows
print(df.head())

# Dataset dimensions
print("Dataset shape:", df.shape)

# Check for missing values
print(df.isnull().sum())

# Basic statistics
print(df.describe())
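
The raw counts from isnull().sum() are easier to compare as percentages. The following is a minimal sketch of such a summary; the helper name missing_value_report and the toy frame are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

def missing_value_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value counts and percentages per column, worst first."""
    report = pd.DataFrame({
        "missing": frame.isnull().sum(),
        "percent": (frame.isnull().mean() * 100).round(1),
    })
    # Keep only columns that actually contain missing values
    report = report[report["missing"] > 0]
    return report.sort_values("percent", ascending=False)

# Tiny illustrative frame; with the notebook's data you would pass `df` instead
toy = pd.DataFrame({
    "age": [22.0, None, 26.0, None],
    "deck": [None, "C", None, None],
    "sex": ["male", "female", "female", "male"],
})
print(missing_value_report(toy))
```

Columns without missing values are left out of the report, which keeps the output short on wide datasets.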

Visualization Examples:





In [None]:
# Survival count based on the 'survived' column
sns.countplot(x='survived', data=df)
plt.title('Survival Distribution')
plt.show()

# Age distribution
sns.histplot(df['age'].dropna(), bins=30)
plt.title('Age Distribution')
plt.show()

2. Data Preprocessing

2.1 Handling Missing Values


In [None]:
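# Keep only columns with at least 20% non-missing values, i.e. drop columns
# that are more than 80% missing (thresh is the minimum count of non-null values)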
threshold = len(df) * 0.2
df = df.dropna(axis=1, thresh=threshold)

# Fill remaining missing values appropriately.
# Assign the result back instead of calling fillna(inplace=True) on a column
# selection, which may not update df in recent pandas versions.
# Fill missing 'age' values with the median
df['age'] = df['age'].fillna(df['age'].median())

# Fill missing 'embarked' values with the mode
df['embarked'] = df['embarked'].fillna(df['embarked'].mode()[0])
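
The thresh argument of dropna is easy to misread: it is the minimum number of non-missing values a column needs in order to be kept, not the number of missing values allowed. The sketch below contrasts it with a helper that filters on the missing fraction directly; drop_sparse_columns and the toy frame are hypothetical names used only for illustration.

```python
import pandas as pd

def drop_sparse_columns(frame: pd.DataFrame, max_missing_frac: float) -> pd.DataFrame:
    """Drop columns whose share of missing values exceeds max_missing_frac."""
    missing_frac = frame.isnull().mean()
    keep = missing_frac[missing_frac <= max_missing_frac].index
    return frame[keep]

toy = pd.DataFrame({
    "mostly_full": [1, 2, 3, 4, None],        # 20% missing
    "half_empty": [1, None, None, 3, None],   # 60% missing
    "complete": list("abcde"),                # 0% missing
})

# Filtering on the missing fraction removes 'half_empty'
print(drop_sparse_columns(toy, max_missing_frac=0.25).columns.tolist())

# thresh=1 only requires one non-missing value, so every column survives
print(toy.dropna(axis=1, thresh=int(len(toy) * 0.2)).columns.tolist())
```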

2.2 Handling Categorical Variables

In [None]:
# Convert categorical variables into dummy/indicator variables.
# 'who', 'adult_male', 'deck' and 'alone' are not encoded because
# Section 2.3 drops them.
categorical_vars = ['sex', 'embarked', 'class']
df = pd.get_dummies(df, columns=categorical_vars, drop_first=True)
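
To see what drop_first=True does, it helps to encode a tiny frame and inspect the resulting column names. This is an illustrative sketch on made-up rows, not the notebook's data.

```python
import pandas as pd

toy = pd.DataFrame({
    "sex": ["male", "female", "female"],
    "embarked": ["S", "C", "Q"],
    "age": [22, 38, 26],
})

# Each categorical column becomes one indicator column per category,
# minus the first category (in sorted order), which acts as the baseline
encoded = pd.get_dummies(toy, columns=["sex", "embarked"], drop_first=True)
print(encoded.columns.tolist())
```

The dropped categories ('female' and 'C' here) are implied when all indicators for that variable are zero, which avoids perfectly redundant columns.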

2.3 Dropping Unnecessary Columns


In [None]:
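# Drop columns that are not useful for modeling ('alive' and 'embark_town'
# duplicate 'survived' and 'embarked'); errors='ignore' skips absent names
df = df.drop(columns=['name', 'ticket', 'fare', 'adult_male', 'alive', 'who', 'deck', 'alone', 'embark_town'],
             errors='ignore')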

2.4 Defining Features and Target Variable

In [None]:
# Target variable
y = df['survived']

# Features
X = df.drop('survived', axis=1)

2.5 Splitting Data into Training and Testing Sets


In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
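
When the target classes are imbalanced, a plain random split can leave the test set with a different class ratio than the training set. One common option, shown here as a sketch on synthetic data rather than as the notebook's method, is the stratify argument of train_test_split.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced target: 70 zeros and 30 ones
X_toy = pd.DataFrame({"feature": range(100)})
y_toy = pd.Series([0] * 70 + [1] * 30)

# stratify=y_toy keeps the share of each class equal in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)
print("Positive share (train):", y_tr.mean())
print("Positive share (test):", y_te.mean())
```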

3. Model Development and Training


3.1 Model Selection and Training


In [None]:
from sklearn.ensemble import RandomForestClassifier

# Define the model
model = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the model
model.fit(X_train, y_train)

3.2 Hyperparameter Optimization (Optional)


In [None]:
from sklearn.model_selection import GridSearchCV

param_grid = {
    'n_estimators': [100, 200],
    'max_depth': [4, 6, 8],
    'min_samples_split': [2, 5]
}

grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=5, n_jobs=-1, scoring='accuracy')
grid_search.fit(X_train, y_train)

# Best model after hyperparameter tuning
best_model = grid_search.best_estimator_
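
After fitting, GridSearchCV exposes the winning parameter combination and its cross-validated score. The sketch below shows how to read them; it uses synthetic data from make_classification and a deliberately small grid so it runs quickly, both of which are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for X_train / y_train
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=42)

search = GridSearchCV(
    estimator=RandomForestClassifier(n_estimators=20, random_state=42),
    param_grid={"max_depth": [2, 4]},
    cv=3,
    scoring="accuracy",
)
search.fit(X_demo, y_demo)

# best_params_ is the winning combination, best_score_ its mean CV accuracy
print("Best parameters:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))
```

With the default refit=True, best_estimator_ has already been retrained on all the data passed to fit, so it can be used for prediction directly.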

4. Model Evaluation


In [None]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Make predictions on the test set
# (use best_model here instead if you ran the optional tuning step in 3.2)
y_pred = model.predict(X_test)

# Accuracy score
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy Score:", accuracy)

# Classification report
print(classification_report(y_test, y_pred))

# Confusion matrix visualization
sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, fmt='d')
plt.title('Confusion Matrix')
plt.show()
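
The numbers in the classification report can be derived by hand from the confusion matrix, which is a useful sanity check. The following sketch assumes scikit-learn's layout for a binary problem (rows are true labels, columns are predicted labels, class 0 first); the helper name and the counts are made up for illustration.

```python
import numpy as np

def metrics_from_confusion(cm) -> dict:
    """Compute accuracy, precision and recall for a 2x2 confusion matrix.

    Assumes scikit-learn's layout: rows are true labels, columns are
    predicted labels, with class 0 first. Does not guard against
    division by zero when a class is never predicted or never present.
    """
    tn, fp, fn, tp = np.asarray(cm).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }

# Made-up counts for illustration only, not results from this notebook
example_cm = [[5, 1], [2, 2]]
print(metrics_from_confusion(example_cm))
```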

5. Model Saving


In [None]:
import joblib

# Save the trained model
joblib.dump(model, 'model.joblib')

# Save the feature names as a plain list so the API can reorder incoming data
joblib.dump(list(X.columns), 'features.joblib')
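
Before shipping the saved file, it is worth confirming that a reloaded model predicts exactly like the one in memory. This is a minimal round-trip sketch on a small synthetic model written to a temporary directory; the variable names are illustrative assumptions.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Small synthetic model standing in for the trained Titanic model
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(50, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)
demo_model = RandomForestClassifier(n_estimators=10, random_state=42).fit(X_demo, y_demo)

# Save to a temporary file and load it back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.joblib")
    joblib.dump(demo_model, path)
    restored = joblib.load(path)

same_predictions = np.array_equal(demo_model.predict(X_demo), restored.predict(X_demo))
print("Round trip preserved predictions:", same_predictions)
```

Saved models are generally only safe to load with the same scikit-learn version that produced them, so it can help to record that version alongside the file.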

6. API Creation using FastAPI


6.1 Installing Necessary Libraries
Install FastAPI and Uvicorn if you haven't already:

In [None]:
# bash
pip install fastapi uvicorn

6.2 Writing the API Code (app.py)


In [None]:
from fastapi import FastAPI
import joblib
import pandas as pd

app = FastAPI()

# Load the model and features
model = joblib.load('model.joblib')
features = joblib.load('features.joblib')

@app.post('/predict')
def predict(data: dict):
    # Convert incoming data to DataFrame
    df = pd.DataFrame([data])

    # Ensure the DataFrame has the correct columns
    df = df.reindex(columns=features, fill_value=0)

    # Make prediction
    prediction = model.predict(df)
    probability = model.predict_proba(df)

    return {
        'prediction': int(prediction[0]),
        'probability': probability[0].tolist()
    }
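
The reindex step in the endpoint is what lets the API accept partial or differently ordered input. The sketch below isolates that step so its behavior is visible; align_payload and the feature list are hypothetical and only stand in for the values loaded from features.joblib.

```python
import pandas as pd

def align_payload(data: dict, feature_names: list) -> pd.DataFrame:
    """Build a one-row frame with exactly the model's columns, in order."""
    frame = pd.DataFrame([data])
    # Missing features are filled with 0; unknown keys are discarded
    return frame.reindex(columns=feature_names, fill_value=0)

# Hypothetical feature list for illustration
feature_names = ["pclass", "age", "sibsp", "parch", "sex_male"]
payload = {"age": 29, "pclass": 3, "sex_male": 1, "unexpected": "x"}

aligned = align_payload(payload, feature_names)
print(aligned.to_dict(orient="records")[0])
```

Note the trade-off: a misspelled key is silently dropped and its feature set to 0, so a production API would likely want stricter validation of the request body.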

6.3 Running the API


In [None]:
# bash
uvicorn app:app --host 0.0.0.0 --port 8000
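
Once the server is running, the /predict endpoint can be called with a JSON POST request. The sketch below builds such a request with the standard library; the helper name, the payload keys, and the localhost URL are assumptions, and the actual network call is left commented out because it needs a live server.

```python
import json
import urllib.request

def build_predict_request(payload: dict, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Create a JSON POST request for the /predict endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical payload; the keys must match the saved feature names
request = build_predict_request({"pclass": 3, "age": 29, "sex_male": 1})
print(request.get_method(), request.full_url)

# With the server running, send the request like this:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```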

7. Dockerization


7.1 Creating a Dockerfile

Create a Dockerfile with the following content:



In [None]:
# DOCKER FILE

# Base image
FROM python:3.8-slim

# Set the working directory
WORKDIR /app

# Copy requirements.txt and install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy all application files
COPY . .

# Expose port
EXPOSE 80

# Run the app with Uvicorn
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "80"]

7.2 Creating requirements.txt


Include the necessary libraries in requirements.txt:




fastapi
uvicorn
pandas
joblib
scikit-learn

7.3 Building the Docker Image


In [None]:
docker build -t titanic-api .

7.4 Running the Docker Container


In [None]:
docker run -d -p 80:80 titanic-api

This command runs the Docker container in detached mode and maps port 80 of the container to port 80 on the host machine.

8. Preparation for Cloud Deployment


For deploying to a cloud provider (AWS, GCP, Azure, etc.):

Push Docker Image to Cloud Registry: For example, using AWS ECR to store your Docker images.
Deployment with Kubernetes: Use Kubernetes to manage your containers in a scalable way.
Create Deployment and Service YAMLs: Prepare deployment.yaml and service.yaml files for Kubernetes.
Set Up CI/CD Pipeline for Automatic Deployment: Integrate your code repository with continuous deployment tools to automatically deploy changes.

9. CI/CD Pipeline Example (using GitHub Actions)


9.1 Creating a GitHub Actions Workflow File


Create a file named .github/workflows/docker-image.yml in your repository:






In [None]:
# YAML file
name: Docker Image CI

on:
  push:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
    - name: Check out code
      uses: actions/checkout@v4

    - name: Log in to Docker Hub
      uses: docker/login-action@v3
      with:
        username: ${{ secrets.DOCKER_USERNAME }}
        password: ${{ secrets.DOCKER_PASSWORD }}

    - name: Build and push Docker image
      uses: docker/build-push-action@v5
      with:
        push: true
        tags: yourdockerhubusername/titanic-api:${{ github.sha }}

Note: Replace yourdockerhubusername with your actual Docker Hub username, and add DOCKER_USERNAME and DOCKER_PASSWORD to your GitHub repository secrets. A Docker Hub access token can be used as the DOCKER_PASSWORD value instead of the account password.

