# Predicting One Variable from Another (e.g., Choosing the Right Medication to Avoid Side Effects)

In this notebook, we explore two approaches to a prediction problem: predicting the value of one variable from a set of known characteristics. As a simple illustration, consider a medical setting in which we must choose the appropriate medication for a patient, based on their characteristics, so as to avoid undesirable side effects. The same approach applies to many other scenarios, particularly in medicine.

# Generate random data

We generate this data randomly, purely to illustrate the problem; the particular variables chosen are not important. We generate two numerical variables, 'White Blood Cells' (`Wb`) and 'Hemoglobin' (`Hb`), and a binary categorical variable, 'Sex' (`sexe`, 0 or 1). We also generate a binary variable, `medication` (0 or 1), for the two types of drug. Finally, we generate a binary variable, `Side_Effects` (0 or 1), where 0 means no adverse effects. Our goal is to select the right `medication`, i.e. the one for which `Side_Effects` = 0.

In [132]:
import pandas as pd
import numpy as np

n = 1000  # Number of samples

np.random.seed(42)
Wb = np.random.uniform(10, 20, n)
Hb = np.random.uniform(12, 18, n)

sexe = np.random.choice([0, 1], n)

medication = np.random.choice([0, 1], n)

Side_Effects = np.random.choice([0, 1], n)

# Build the DataFrame
df = pd.DataFrame({'Wb': Wb, 'Hb': Hb, 'sexe': sexe, 'medication': medication, 'Side_Effects': Side_Effects})

df.head()

,Wb,Hb,sexe,medication,Side_Effects
0,13.745401,13.110798,1,0,0
1,19.507143,15.251406,0,0,0
2,17.319939,17.237675,0,0,0
3,15.986585,16.393349,0,0,1
4,11.560186,16.839367,1,1,0
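Because every column is drawn independently of the others, the side-effect rate should come out close to 50% for both medications, so neither approach below is expected to do better than chance on this particular dataset. A quick check, as a minimal sketch that rebuilds the same DataFrame and uses only standard pandas calls:

```python
import numpy as np
import pandas as pd

# Rebuild the synthetic dataset with the same seed and draw order as the cell above
n = 1000
np.random.seed(42)
Wb = np.random.uniform(10, 20, n)
Hb = np.random.uniform(12, 18, n)
sexe = np.random.choice([0, 1], n)
medication = np.random.choice([0, 1], n)
Side_Effects = np.random.choice([0, 1], n)
df = pd.DataFrame({'Wb': Wb, 'Hb': Hb, 'sexe': sexe,
                   'medication': medication, 'Side_Effects': Side_Effects})

# Share of patients with side effects for each medication
side_effect_rate = df.groupby('medication')['Side_Effects'].mean()
print(side_effect_rate)

# Number of patients in each (medication, Side_Effects) combination
counts = pd.crosstab(df['medication'], df['Side_Effects'])
print(counts)
```

With real clinical data, the same two tables give a first impression of whether one medication is associated with more side effects than the other.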


# Approach 1

In this approach, we split the data into two parts: one where `Side_Effects` = 0 and one where `Side_Effects` = 1. We fit one model on each part, so that each model predicts the medication from the patient's characteristics. The first model predicts the medication associated with no adverse effects; the second predicts the medication associated with adverse effects. Finally, we average the predicted probabilities of the two models to decide which medication to give the patient in order to avoid adverse effects.

## Model: no side effects

In [133]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

NoSide_df = df[df['Side_Effects'] == 0]

X = NoSide_df[['Wb', 'Hb', 'sexe']]
y = NoSide_df['medication'] 

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

modelNoSide = RandomForestClassifier(random_state=42)

modelNoSide.fit(X_train, y_train)

## Model: side effects

In [134]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Keep only the samples where 'Side_Effects' equals 1
Side_df = df[df['Side_Effects'] == 1]

X = Side_df[['Wb', 'Hb', 'sexe']]
y = Side_df['medication']  

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

modelSide = RandomForestClassifier(random_state=42)

modelSide.fit(X_train, y_train)
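The cells above import `accuracy_score` and `confusion_matrix` but never use them. The sketch below shows one way the two models could be checked on their held-out splits. It rebuilds the dataset so that it runs on its own, and `fit_and_score` is a hypothetical helper introduced here for illustration, not part of the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Rebuild the synthetic dataset with the same seed and draw order as the notebook
n = 1000
np.random.seed(42)
df = pd.DataFrame({
    'Wb': np.random.uniform(10, 20, n),
    'Hb': np.random.uniform(12, 18, n),
    'sexe': np.random.choice([0, 1], n),
    'medication': np.random.choice([0, 1], n),
    'Side_Effects': np.random.choice([0, 1], n),
})


def fit_and_score(subset):
    """Hypothetical helper: fit a medication classifier on `subset`
    and score it on a held-out 20% split."""
    X = subset[['Wb', 'Hb', 'sexe']]
    y = subset['medication']
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)
    model = RandomForestClassifier(random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return (accuracy_score(y_test, y_pred),
            confusion_matrix(y_test, y_pred, labels=[0, 1]))


# One model per value of Side_Effects, as in Approach 1
results = {
    'no side effects': fit_and_score(df[df['Side_Effects'] == 0]),
    'side effects': fit_and_score(df[df['Side_Effects'] == 1]),
}

for name, (accuracy, matrix) in results.items():
    print(name, 'model: accuracy =', round(accuracy, 3))
    print(matrix)
```

Since `medication` is assigned at random here, accuracies near 0.5 are to be expected; the point of the sketch is the evaluation pattern, not the scores.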

## New data

In [135]:
new_Wb = 19.74
new_Hb = 19.11
new_sexe = 0

new_data = pd.DataFrame({'Wb': [new_Wb], 'Hb': [new_Hb], 'sexe': [new_sexe]})

## Prediction: no-side-effect model

In [136]:
prediction_medicamentNoside = modelNoSide.predict(new_data)

proba_prediction_medicamentNoside = modelNoSide.predict_proba(new_data)

probability_class_NOSide0 = proba_prediction_medicamentNoside[0, 0] 
probability_class_NoSide1 = proba_prediction_medicamentNoside[0, 1]  

## Prediction: side-effect model

In [137]:
prediction_medicamentSide = modelSide.predict(new_data)

# Use the model to predict class probabilities for the medication
proba_prediction_medicamentSide = modelSide.predict_proba(new_data)

# proba_prediction_medicamentSide holds the probability of each output class (medication 0 and 1)
probability_class_Side0 = proba_prediction_medicamentSide[0, 0]
probability_class_Side1 = proba_prediction_medicamentSide[0, 1]

## Combined probabilities

In [138]:
# Medication predicted by each model for the new patient
med_noside = prediction_medicamentNoside[0]
med_side = prediction_medicamentSide[0]

# Confidence of the no-side-effect model in its predicted medication
max_probability_modelNoSide = max(probability_class_NOSide0, probability_class_NoSide1)

if med_noside != med_side:
    # The models disagree: the side-effect model's top class is the other medication
    probability_modelSide = max(probability_class_Side0, probability_class_Side1)
else:
    # The models agree: take the side-effect model's probability for the other medication
    probability_modelSide = min(probability_class_Side0, probability_class_Side1)

average_max_probability = (max_probability_modelNoSide + probability_modelSide) / 2.0

if average_max_probability >= 0.5:
    print("medication =", med_noside, "does not have side effects with a probability of ",
          average_max_probability)
else:
    print("medication =", 1 - med_noside, "does not have side effects with a probability of ",
          1 - average_max_probability)

medication = 1 does not have side effects with a probability of  0.615
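Both branches of the rule above reduce to the same quantity. For a medication m, the score is the average of the no-side-effect model's probability for m and the side-effect model's probability for the other medication, i.e. (P_noside(m) + 1 - P_side(m)) / 2. The two scores sum to 1, so the rule amounts to picking the medication whose score is at least 0.5. The sketch below writes this out directly; `no_side_effect_scores` is a hypothetical helper name and the input probabilities are made up for illustration. Exact ties at 0.5 aside, it selects the same medication as the cell above.

```python
def no_side_effect_scores(proba_noside, proba_side):
    """Average the two models' evidence that each medication avoids side effects.

    proba_noside[m]: P(medication = m) from the model fit on Side_Effects == 0
    proba_side[m]:   P(medication = m) from the model fit on Side_Effects == 1
    Returns one score per medication (index 0 and index 1).
    """
    return [(proba_noside[m] + (1.0 - proba_side[m])) / 2.0 for m in (0, 1)]


# Illustrative probabilities only (not taken from the notebook's models)
proba_noside = [0.30, 0.70]
proba_side = [0.60, 0.40]

scores = no_side_effect_scores(proba_noside, proba_side)

# Recommend the medication with the higher score
best_medication = max((0, 1), key=lambda m: scores[m])

print("scores:", scores)
print("recommended medication:", best_medication)
```

In the notebook, the two inputs would be `proba_prediction_medicamentNoside[0]` and `proba_prediction_medicamentSide[0]`.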


# Approach 2

In this approach, we build a single classification model that uses the patient's characteristics and the type of medication as independent variables to predict whether the patient will have side effects. Once the model is trained, we evaluate a new patient with a loop that tries both medications, which gives the probability of no side effects under each medication.

In [139]:
# Separate the features (X) from the target variable (y)
X = df[['Wb', 'Hb', 'sexe', 'medication']]
y = df['Side_Effects']  # Target variable: 0 (no side effects) or 1 (side effects)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the RandomForestClassifier model
model = RandomForestClassifier(random_state=42)

# Train the model on the training data
model.fit(X_train, y_train)
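Before using this model for recommendations, it helps to know whether it predicts side effects better than a trivial baseline. The following is a minimal sketch, assuming 5-fold cross-validation and a most-frequent-class `DummyClassifier` as the reference; neither choice comes from the notebook. It rebuilds the dataset so that it runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Rebuild the synthetic dataset with the same seed and draw order as the notebook
n = 1000
np.random.seed(42)
df = pd.DataFrame({
    'Wb': np.random.uniform(10, 20, n),
    'Hb': np.random.uniform(12, 18, n),
    'sexe': np.random.choice([0, 1], n),
    'medication': np.random.choice([0, 1], n),
    'Side_Effects': np.random.choice([0, 1], n),
})

X = df[['Wb', 'Hb', 'sexe', 'medication']]
y = df['Side_Effects']

# Cross-validated accuracy of the random forest (cv=5 is an assumption)
forest_scores = cross_val_score(
    RandomForestClassifier(random_state=42), X, y, cv=5, scoring='accuracy')

# Baseline that always predicts the most frequent class in the training fold
baseline_scores = cross_val_score(
    DummyClassifier(strategy='most_frequent'), X, y, cv=5, scoring='accuracy')

print('random forest accuracy:', forest_scores.mean().round(3))
print('baseline accuracy:     ', baseline_scores.mean().round(3))
```

On this randomly generated data the two numbers should be close to each other; on real data, a clear gap over the baseline would be the first sign that the features carry information about side effects.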

In [140]:
new_data = pd.DataFrame({
    'Wb': [19.23],
    'Hb': [20.15],
    'sexe': [0],
})

# Medication values to try (0 and 1)
medication_values = [0, 1]

# For each medication, predict the probability of no side effects (class 0)
for medication_value in medication_values:
    new_data['medication'] = medication_value
    predict_proba = model.predict_proba(new_data)
    print('for medication = ', medication_value, 'has no side effect with probability = ', predict_proba[:, 0])

for medication =  0 has no side effect with probability =  [0.66]
for medication =  1 has no side effect with probability =  [0.37]
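The loop can also be written without mutating `new_data`: build one candidate row per medication, score both rows in a single `predict_proba` call, and keep the medication with the highest probability of no side effects. The sketch below is one possible way to do this. It retrains the same model so that it runs on its own, and it looks up the column for class 0 through `model.classes_` rather than assuming its position; the `candidates` and `p_no_side_effect` names are introduced here for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Rebuild the synthetic dataset with the same seed and draw order as the notebook
n = 1000
np.random.seed(42)
df = pd.DataFrame({
    'Wb': np.random.uniform(10, 20, n),
    'Hb': np.random.uniform(12, 18, n),
    'sexe': np.random.choice([0, 1], n),
    'medication': np.random.choice([0, 1], n),
    'Side_Effects': np.random.choice([0, 1], n),
})

# Train the Approach 2 model as in the cells above
X = df[['Wb', 'Hb', 'sexe', 'medication']]
y = df['Side_Effects']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# One row per candidate medication for the same patient
candidates = pd.DataFrame({
    'Wb': [19.23, 19.23],
    'Hb': [20.15, 20.15],
    'sexe': [0, 0],
    'medication': [0, 1],
})

# Column of predict_proba that corresponds to Side_Effects == 0
no_side_col = list(model.classes_).index(0)

# Probability of no side effects for each candidate row
candidates['p_no_side_effect'] = model.predict_proba(
    candidates[X.columns])[:, no_side_col]

# Keep the medication with the highest probability of no side effects
best = candidates.loc[candidates['p_no_side_effect'].idxmax()]

print(candidates[['medication', 'p_no_side_effect']])
print('recommended medication:', int(best['medication']))
```

Selecting `candidates[X.columns]` keeps the feature order identical to the one used for training, which scikit-learn expects at prediction time.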
