# Data Drift Detection:
* This code serves a trained loan default prediction model through an API and checks new loan data for drift against a historical baseline.

## Data Preparation:
* The baseline loan data is loaded from a CSV file named baseline_data.csv and converted to a NumPy array.
* This baseline data will be used to compare against new data to detect any data drift over time.
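
The loading step can be sketched as follows, using an in-memory CSV so the example runs without the real file. The column names `loan_amount` and `interest_rate` and the helper `load_baseline` are hypothetical, since the columns of baseline_data.csv are not listed here.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for baseline_data.csv (the real columns are not given).
csv_text = """loan_amount,interest_rate
10000,5.5
25000,7.25
5000,11.0
"""


def load_baseline(source) -> np.ndarray:
    """Read a CSV and return its numeric columns as a 2-D float array."""
    frame = pd.read_csv(source)
    numeric = frame.select_dtypes(include="number")  # drop any text columns
    return numeric.to_numpy(dtype=float)


baseline = load_baseline(io.StringIO(csv_text))
print(baseline.shape)  # (rows, features)
```

Keeping only numeric columns is a design choice of this sketch: the KS test used later works on numbers, so text columns would need separate handling.
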
## API Development:
* The code uses the FastAPI framework to build an API with two endpoints:
  * /predict: This endpoint accepts a list of input data and returns the prediction made by the trained model.
  * /detect_data_drift: This endpoint accepts a list of new data and checks for data drift compared to the baseline data.
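
As an illustration of what each endpoint expects, the request bodies can be sketched as plain JSON. The field names `input_data` and `new_data` come from the request models in the code below; the numeric values are made up for the example.

```python
import json

# Body for POST /predict: one inner list per loan, one float per feature.
predict_body = {"input_data": [[10000.0, 5.5], [25000.0, 7.25]]}

# Body for POST /detect_data_drift: a batch of new rows in the same column layout.
drift_body = {"new_data": [[12000.0, 6.0], [8000.0, 9.5], [30000.0, 4.75]]}

for path, body in [("/predict", predict_body), ("/detect_data_drift", drift_body)]:
    print(path, json.dumps(body))
```
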
## Data Drift Detection:
* The detect_data_drift function uses the Kolmogorov-Smirnov (KS) test to compare the baseline data and the new data.
* The KS test is a non-parametric test that compares the empirical cumulative distribution functions of two one-dimensional samples to determine whether they come from the same underlying distribution.
* If the p-value of the KS test is less than the specified significance level (default is 0.05), the function returns True, indicating that data drift has been detected.
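
The decision rule above can be tried on synthetic data. This is a minimal sketch: the helper name `drifted` and the normal distributions are assumptions chosen for illustration, not part of the original code.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(seed=0)
baseline = rng.normal(loc=0.0, scale=1.0, size=500)  # reference sample
shifted = rng.normal(loc=1.0, scale=1.0, size=500)   # same spread, mean moved by 1


def drifted(reference, current, alpha=0.05):
    """Return True when the two-sample KS test rejects at level alpha."""
    result = ks_2samp(reference, current)
    return bool(result.pvalue < alpha)


print("baseline vs shifted:", drifted(baseline, shifted))
print("baseline vs itself: ", drifted(baseline, baseline))
```

Comparing a sample with itself gives a KS statistic of 0 and therefore no drift, while a one-standard-deviation shift with 500 points per sample is rejected decisively.
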
## Prediction Endpoint:
* The /predict endpoint loads the trained model from a file named trained_model.pkl.
* It then performs the prediction on the input data provided in the request and returns the prediction results.
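
The data flow of the prediction step can be sketched without the real model file. `StubModel`, `get_model` and `predict_rows` are hypothetical names, and the stub's rule is arbitrary; in the real service `get_model` is where `joblib.load('trained_model.pkl')` would go. Caching the loader is a design choice of this sketch so that the model is read once rather than on every request.

```python
from functools import lru_cache

import numpy as np


class StubModel:
    """Hypothetical stand-in for the object stored in trained_model.pkl."""

    def predict(self, features: np.ndarray) -> np.ndarray:
        # Arbitrary rule: flag a row as default (1) when its first feature exceeds 0.5.
        return (features[:, 0] > 0.5).astype(int)


@lru_cache(maxsize=1)
def get_model() -> StubModel:
    # Runs once; later calls return the cached instance.
    return StubModel()


def predict_rows(rows: list[list[float]]) -> dict:
    """Convert request rows to an array, predict, and return a JSON-friendly dict."""
    features = np.asarray(rows, dtype=float)
    if features.ndim != 2:
        raise ValueError("input_data must be a list of equal-length rows")
    prediction = get_model().predict(features)
    return {"prediction": prediction.tolist()}


response = predict_rows([[0.9, 1.0], [0.1, 2.0]])
print(response)
```

If the stored model is a scikit-learn classifier (an assumption), `predict` returns class labels; estimated default probabilities would come from `predict_proba` instead.
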
## Data Drift Endpoint:
* The /detect_data_drift endpoint receives the new data in the request and compares it to the baseline data using the detect_data_drift function.
* The function logs the detection process and returns a JSON response indicating whether data drift has been detected or not.
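
Because the KS test is univariate, one way to handle a table of loan features is to test each column separately and report which ones moved. The sketch below does that; `drift_report` and its response keys other than `data_drift` are hypothetical, and the data is synthetic.

```python
import numpy as np
from scipy.stats import ks_2samp


def drift_report(baseline: np.ndarray, new: np.ndarray, alpha: float = 0.05) -> dict:
    """Run one KS test per column and summarise the result as a JSON-friendly dict."""
    p_values = [
        float(ks_2samp(baseline[:, col], new[:, col]).pvalue)
        for col in range(baseline.shape[1])
    ]
    drifted = [col for col, p in enumerate(p_values) if p < alpha]
    return {"data_drift": bool(drifted), "drifted_columns": drifted, "p_values": p_values}


rng = np.random.default_rng(seed=42)
baseline = rng.normal(size=(300, 2))
new = baseline.copy()
new[:, 1] += 2.0  # shift only the second feature

report = drift_report(baseline, new)
print(report["data_drift"], report["drifted_columns"])
```

Testing many columns at the same significance level raises the chance of a false alarm; dividing alpha by the number of columns (a Bonferroni-style adjustment) is one common way to compensate.
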
## Logging:
* The code sets up basic logging configuration to log the data drift detection process.
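
A slightly more structured variant of the logging setup is sketched below. The logger name `drift_monitor`, the message format and the helper `log_drift_result` are assumptions; the original code logs through the root logger with `logging.basicConfig`.

```python
import io
import logging

# Hypothetical named logger, kept separate from the root logger.
logger = logging.getLogger("drift_monitor")
logger.setLevel(logging.INFO)
logger.propagate = False

buffer = io.StringIO()  # stands in for a log file or stderr
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
logger.addHandler(handler)


def log_drift_result(drift_detected: bool, n_rows: int) -> None:
    """Log drift at WARNING level so it stands out from routine checks."""
    if drift_detected:
        logger.warning("Data drift detected in batch of %d rows", n_rows)
    else:
        logger.info("No data drift detected in batch of %d rows", n_rows)


log_drift_result(True, 120)
log_drift_result(False, 80)
print(buffer.getvalue())
```
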
## Main Execution:
* The if __name__ == "__main__": block is used to run the FastAPI application using the Uvicorn server.
* nest_asyncio.apply() patches asyncio so that an event loop can be started while another one is already running, which is what lets uvicorn.run work inside a Jupyter notebook.
* This code provides a foundation for serving a loan default prediction model and monitoring data drift over time.
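
The reason nest_asyncio is needed in a notebook can be reproduced with the standard library alone. The sketch below tries to start a second event loop from inside a running one, which plain asyncio refuses; the function names are illustrative.

```python
import asyncio


async def inner() -> str:
    return "done"


async def outer() -> str:
    coro = inner()
    try:
        # A loop is already running here, just as it is inside a Jupyter cell.
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "never awaited" warning for the unused coroutine
        return str(exc)
    return "nested run succeeded"


message = asyncio.run(outer())
print(message)
```

Without patching, the nested call raises a RuntimeError about the running event loop, which is the situation `nest_asyncio.apply()` is meant to work around.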

## Conclusion
* The /predict endpoint can be used to make predictions on new loan data, while the /detect_data_drift endpoint can be used to monitor for any changes in the data distribution that may affect the model's performance. By regularly checking for data drift, financial institutions can ensure that their predictive models remain accurate and reliable over time.



import numpy as np
import pandas as pd
from scipy.stats import ks_2samp
from fastapi import FastAPI
import uvicorn
from pydantic import BaseModel
from typing import List
import joblib

# Load the baseline CSV once at import time; it is kept in memory for drift checks
baseline_data_csv = pd.read_csv('baseline_data.csv')
baseline_data = baseline_data_csv.to_numpy()  # NumPy array of shape (n_rows, n_features)

app = FastAPI()

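def detect_data_drift(baseline_data, new_data, alpha=0.05):
    """
    Detect data drift using the two-sample Kolmogorov-Smirnov test.

    The KS test is univariate, so it is run once per feature column and
    drift is reported if any column rejects at level alpha.

    Args:
        baseline_data (np.ndarray): Baseline sample, shape (n_rows, n_features).
        new_data (np.ndarray): New sample with the same number of columns.
        alpha (float): Significance level for each test. Default is 0.05.

    Returns:
        bool: True if data drift is detected, False otherwise.
    """
    baseline_data = np.asarray(baseline_data, dtype=float)
    new_data = np.asarray(new_data, dtype=float)
    if baseline_data.shape[1] != new_data.shape[1]:
        raise ValueError("baseline_data and new_data must have the same number of columns")
    p_values = [
        ks_2samp(baseline_data[:, i], new_data[:, i]).pvalue
        for i in range(baseline_data.shape[1])
    ]
    # A plain bool keeps the endpoint's `if` check and JSON response unambiguous
    return bool(min(p_values) < alpha)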

class PredictionRequest(BaseModel):
    input_data: List[List[float]]

@app.post("/predict")
async def predict(request: PredictionRequest):
    # Load your trained model
    your_model = joblib.load('trained_model.pkl')

    # Perform the prediction
    input_data = np.array(request.input_data)
    prediction = your_model.predict(input_data)
    return {"prediction": prediction.tolist()}

class DataDriftRequest(BaseModel):
    new_data: List[List[float]]

import logging

logging.basicConfig(level=logging.INFO)

# Registered once: with two identical routes, only the first one would ever be matched
@app.post("/detect_data_drift")
async def detect_data_drift_endpoint(request: DataDriftRequest):
    # Check for data drift and log the outcome
    new_data = np.array(request.new_data)
    logging.info("Detecting data drift...")
    if detect_data_drift(baseline_data, new_data):
        logging.info("Data drift detected!")
        return {"data_drift": True}
    else:
        logging.info("No data drift detected.")
        return {"data_drift": False}



if __name__ == "__main__":
    # nest_asyncio lets uvicorn start inside an already running event loop (e.g. Jupyter)
    import nest_asyncio
    nest_asyncio.apply()
    uvicorn.run(app, host="0.0.0.0", port=8080)

INFO:     Started server process [10424]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
