**Title: Predicting Abnormal Blood Pressure with Random Forest and Grid Search CV**

**Description:**
The "Predicting Abnormal Blood Pressure with Random Forest and Grid Search CV" notebook analyzes a dataset of 2,000 patient records and predicts abnormal blood pressure with the Random Forest algorithm, using Grid Search Cross-Validation (CV) for hyperparameter tuning. It combines code, outputs, and explanatory text to guide users through data preprocessing, model building, hyperparameter tuning, and evaluation.

**Key Components:**

1. **Data Loading and Preprocessing:**
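   - Load the patient dataset. It contains demographic, clinical, and lifestyle features (age, sex, BMI, hemoglobin level, genetic pedigree coefficient, pregnancy, smoking, physical activity, dietary salt, daily alcohol consumption, stress level, chronic kidney disease, and adrenal/thyroid disorders) and a binary Blood_Pressure_Abnormality label.
   - Handle missing values: drop the mostly empty Pregnancy column and fill the gaps in Genetic_Pedigree_Coefficient and alcohol_consumption_per_day with the column mean. The binary and ordinal variables are already stored as integer codes, and tree-based models do not require feature scaling.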

2. **Exploratory Data Analysis (EDA):**
   - Explore the dataset to understand the distribution of features, identify outliers, and visualize relationships between the patient risk factors and the Blood_Pressure_Abnormality label.

3. **Feature Engineering:**
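   - Engineer new features or transform existing ones to capture information relevant to predicting abnormal blood pressure. (Derived measures such as mean arterial pressure or pulse pressure would require systolic and diastolic readings, which this dataset does not include.)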

4. **Model Building with Random Forest:**
   - Choose Random Forest as the predictive modeling algorithm. It is an ensemble of decision trees that captures nonlinear relationships and feature interactions and does not require feature scaling.
   - Train the Random Forest model on the training data and evaluate it on the test data with a confusion matrix and accuracy.

5. **Hyperparameter Tuning with Grid Search CV:**
   - Define a parameter grid for hyperparameter tuning using Grid Search Cross-Validation to optimize model performance.
   - Perform grid search CV to find the best combination of hyperparameters for the Random Forest model.

6. **Model Evaluation and Interpretation:**
   - Evaluate the performance of the tuned Random Forest model on the testing data using appropriate evaluation metrics.
   - Interpret the results and provide insights into the factors influencing abnormal blood pressure predictions.

**Note:**
The "Predicting Abnormal Blood Pressure with Random Forest and Grid Search CV" notebook walks through building and evaluating a predictive model for detecting abnormal blood pressure. By combining Random Forest with Grid Search CV, it aims to optimize model performance and to provide insights that can support healthcare professionals in patient risk assessment and personalized treatment planning.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
dfbp = pd.read_csv(r"Patient_with_abnormal_bloodpressure.csv")

In [3]:
dfbp.head()

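|  | Patient_Number | Blood_Pressure_Abnormality | Level_of_Hemoglobin | Genetic_Pedigree_Coefficient | Age | BMI | Sex | Pregnancy | Smoking | Physical_activity | salt_content_in_the_diet | alcohol_consumption_per_day | Level_of_Stress | Chronic_kidney_disease | Adrenal_and_thyroid_disorders |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 11.28 | 0.9 | 34 | 23 | 1 | 1.0 | 0 | 45961 | 48071 | NaN | 2 | 1 | 1 |
| 1 | 2 | 0 | 9.75 | 0.23 | 54 | 33 | 1 | NaN | 0 | 26106 | 25333 | 205.0 | 3 | 0 | 0 |
| 2 | 3 | 1 | 10.79 | 0.91 | 70 | 49 | 0 | NaN | 0 | 9995 | 29465 | 67.0 | 2 | 1 | 0 |
| 3 | 4 | 0 | 11.0 | 0.43 | 71 | 50 | 0 | NaN | 0 | 10635 | 7439 | 242.0 | 1 | 1 | 0 |
| 4 | 5 | 1 | 14.17 | 0.83 | 52 | 19 | 0 | NaN | 0 | 15619 | 49644 | 397.0 | 2 | 0 | 0 |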


In [4]:
dfbp.isnull().sum()

Patient_Number                      0
Blood_Pressure_Abnormality          0
Level_of_Hemoglobin                 0
Genetic_Pedigree_Coefficient       92
Age                                 0
BMI                                 0
Sex                                 0
Pregnancy                        1558
Smoking                             0
Physical_activity                   0
salt_content_in_the_diet            0
alcohol_consumption_per_day       242
Level_of_Stress                     0
Chronic_kidney_disease              0
Adrenal_and_thyroid_disorders       0
dtype: int64

In [5]:
# Pregnancy is missing in 1558 of 2000 rows, too sparse to impute, so drop it
dfbp = dfbp.drop(["Pregnancy"],axis=1)

In [6]:
dfbp.isnull().sum()

Patient_Number                     0
Blood_Pressure_Abnormality         0
Level_of_Hemoglobin                0
Genetic_Pedigree_Coefficient      92
Age                                0
BMI                                0
Sex                                0
Smoking                            0
Physical_activity                  0
salt_content_in_the_diet           0
alcohol_consumption_per_day      242
Level_of_Stress                    0
Chronic_kidney_disease             0
Adrenal_and_thyroid_disorders      0
dtype: int64
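Cell [5] drops Pregnancy because 1558 of its 2000 values (77.9%) are missing. A minimal sketch of the same decision written as an explicit rule, assuming a 50% missing-value cutoff chosen here for illustration (the notebook does not state one) and a small hypothetical frame built from the first five rows shown by `head()`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dfbp: values copied from the first five rows of head()
df = pd.DataFrame({
    "Pregnancy": [1.0, np.nan, np.nan, np.nan, np.nan],
    "alcohol_consumption_per_day": [np.nan, 205.0, 67.0, 242.0, 397.0],
    "Age": [34, 54, 70, 71, 52],
})

MAX_MISSING_FRACTION = 0.5  # assumed cutoff, not taken from the notebook

# Share of missing values in each column (isnull() gives booleans, mean() their ratio)
missing_fraction = df.isnull().mean()

# Drop only the columns whose missing share exceeds the cutoff
to_drop = missing_fraction[missing_fraction > MAX_MISSING_FRACTION].index.tolist()
df_reduced = df.drop(columns=to_drop)

print(missing_fraction.round(2).to_dict())
print("dropped:", to_drop)
```

On the full data, `dfbp.isnull().mean()` would put Pregnancy at 0.779 (1558/2000), while Genetic_Pedigree_Coefficient (92/2000 = 0.046) and alcohol_consumption_per_day (242/2000 = 0.121) stay well under such a cutoff and are imputed instead.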

In [7]:
dfbp.Genetic_Pedigree_Coefficient.value_counts()

0.86    32
0.13    30
0.63    28
0.56    27
0.17    27
        ..
0.78    11
0.91    11
0.16    10
0.15     9
0.65     9
Name: Genetic_Pedigree_Coefficient, Length: 101, dtype: int64

In [8]:
dfbp.Genetic_Pedigree_Coefficient = dfbp.Genetic_Pedigree_Coefficient.fillna(dfbp.Genetic_Pedigree_Coefficient.mean())

In [9]:
dfbp.alcohol_consumption_per_day.value_counts()

253.0    11
401.0    10
302.0    10
144.0    10
485.0     9
         ..
21.0      1
406.0     1
346.0     1
244.0     1
326.0     1
Name: alcohol_consumption_per_day, Length: 488, dtype: int64

In [10]:
dfbp.alcohol_consumption_per_day = dfbp.alcohol_consumption_per_day.fillna(dfbp.alcohol_consumption_per_day.mean())

In [11]:
dfbp.isnull().sum()

Patient_Number                   0
Blood_Pressure_Abnormality       0
Level_of_Hemoglobin              0
Genetic_Pedigree_Coefficient     0
Age                              0
BMI                              0
Sex                              0
Smoking                          0
Physical_activity                0
salt_content_in_the_diet         0
alcohol_consumption_per_day      0
Level_of_Stress                  0
Chronic_kidney_disease           0
Adrenal_and_thyroid_disorders    0
dtype: int64
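Cells [8] and [10] fill gaps with means computed over all 2000 rows before the train/test split in cell [16], so test rows contribute to the fill value (row 0's imputed alcohol value, 251.008532, is that full-data mean). The effect is small here, but a leak-free variant splits first and reuses the training mean on both sets. A minimal sketch on a hypothetical toy frame; scikit-learn's `SimpleImputer(strategy="mean")`, fitted on the training data only, does the same job:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dfbp: one numeric column with 25 gaps
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "alcohol_consumption_per_day": rng.uniform(0, 500, size=200),
    "Blood_Pressure_Abnormality": rng.integers(0, 2, size=200),
})
missing_rows = rng.choice(200, size=25, replace=False)
df.loc[missing_rows, "alcohol_consumption_per_day"] = np.nan

# Split first (75/25, as in the notebook), then impute
train = df.sample(frac=0.75, random_state=0)
test = df.drop(train.index)

# Learn the fill value from the training rows only (mean() skips NaN) ...
train_mean = train["alcohol_consumption_per_day"].mean()

# ... and apply that same value to both splits
train = train.fillna({"alcohol_consumption_per_day": train_mean})
test = test.fillna({"alcohol_consumption_per_day": train_mean})
```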

In [12]:
dfbp.head()

|  | Patient_Number | Blood_Pressure_Abnormality | Level_of_Hemoglobin | Genetic_Pedigree_Coefficient | Age | BMI | Sex | Smoking | Physical_activity | salt_content_in_the_diet | alcohol_consumption_per_day | Level_of_Stress | Chronic_kidney_disease | Adrenal_and_thyroid_disorders |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 11.28 | 0.9 | 34 | 23 | 1 | 0 | 45961 | 48071 | 251.008532 | 2 | 1 | 1 |
| 1 | 2 | 0 | 9.75 | 0.23 | 54 | 33 | 1 | 0 | 26106 | 25333 | 205.0 | 3 | 0 | 0 |
| 2 | 3 | 1 | 10.79 | 0.91 | 70 | 49 | 0 | 0 | 9995 | 29465 | 67.0 | 2 | 1 | 0 |
| 3 | 4 | 0 | 11.0 | 0.43 | 71 | 50 | 0 | 0 | 10635 | 7439 | 242.0 | 1 | 1 | 0 |
| 4 | 5 | 1 | 14.17 | 0.83 | 52 | 19 | 0 | 0 | 15619 | 49644 | 397.0 | 2 | 0 | 0 |


In [13]:
dfbp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 14 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Patient_Number                 2000 non-null   int64  
 1   Blood_Pressure_Abnormality     2000 non-null   int64  
 2   Level_of_Hemoglobin            2000 non-null   float64
 3   Genetic_Pedigree_Coefficient   2000 non-null   float64
 4   Age                            2000 non-null   int64  
 5   BMI                            2000 non-null   int64  
 6   Sex                            2000 non-null   int64  
 7   Smoking                        2000 non-null   int64  
 8   Physical_activity              2000 non-null   int64  
 9   salt_content_in_the_diet       2000 non-null   int64  
 10  alcohol_consumption_per_day    2000 non-null   float64
 11  Level_of_Stress                2000 non-null   int64  
 12  Chronic_kidney_disease         2000 non-null   i

In [14]:
#sampling

In [15]:
from sklearn.model_selection import train_test_split

In [16]:
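# No random_state is set, so the split (and every result below) changes on each run;
# pass e.g. random_state=0 to make it reproducible
dfbp_train,dfbp_test = train_test_split(dfbp,test_size=.25)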

In [17]:
# Column 0 is Patient_Number (an ID) and column 1 is the label, so features start at column 2
dfbp_train_x = dfbp_train.iloc[:,2:]
dfbp_train_y = dfbp_train.iloc[:,1]

In [18]:
dfbp_test_x = dfbp_test.iloc[:,2:]
dfbp_test_y = dfbp_test.iloc[:,1]
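Cells [16] to [18] split without a `random_state` and then pick columns by position (`iloc[:, 2:]` skips Patient_Number and the label). A sketch of an alternative, assuming the column names shown in cell [13]: select by name, which keeps working if the column order changes, and pass `stratify` and `random_state` to `train_test_split` so both splits keep the same class ratio and the split is repeatable. The toy data below is hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for dfbp with the notebook's ID and label columns
rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "Patient_Number": np.arange(1, n + 1),
    "Blood_Pressure_Abnormality": rng.integers(0, 2, size=n),
    "Age": rng.integers(18, 80, size=n),
    "BMI": rng.integers(15, 50, size=n),
})

# Select by name rather than position: drop the ID and the label from the features
X = df.drop(columns=["Patient_Number", "Blood_Pressure_Abnormality"])
y = df["Blood_Pressure_Abnormality"]

# stratify=y keeps the 0/1 ratio equal across splits; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
```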

In [19]:
#model building
from sklearn.ensemble import RandomForestClassifier

In [20]:
ra = RandomForestClassifier()

In [21]:
ra.fit(dfbp_train_x,dfbp_train_y) 

In [22]:
#prediction

In [23]:
pred_test = ra.predict(dfbp_test_x)

In [24]:
pred_test

array([1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1,
       0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1,
       0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0,
       0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0,
       0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0,
       0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0,
       0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1,
       0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0,
       0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1,
       1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1,
       0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1,
       1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0,
       1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0,
       1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1,

In [25]:
#Evaluation

In [26]:
from sklearn.metrics import confusion_matrix

In [27]:
dfbp_tab = confusion_matrix(dfbp_test_y,pred_test)

In [28]:
dfbp_tab

array([[227,  35],
       [ 31, 207]], dtype=int64)
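The matrix follows scikit-learn's layout: rows are the true labels and columns the predicted labels, both ordered 0, 1. Treating 1 (abnormal) as the positive class, the usual metrics follow directly from its four counts; for example, accuracy is (227 + 207) / 500 = 0.868, which matches cell [30]. A short sketch using the counts above:

```python
# Confusion matrix from cell [28]: rows = true label, columns = predicted label
cm = [[227, 35],
      [31, 207]]

tn, fp = cm[0]  # true 0: correctly normal, falsely flagged abnormal
fn, tp = cm[1]  # true 1: missed abnormal, correctly flagged abnormal

accuracy = (tp + tn) / (tp + tn + fp + fn)   # 434 / 500
precision = tp / (tp + fp)                   # share of "abnormal" predictions that are right
recall = tp / (tp + fn)                      # sensitivity: share of abnormal cases caught
specificity = tn / (tn + fp)                 # share of normal cases correctly cleared
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} specificity={specificity:.3f} f1={f1:.3f}")
```

`sklearn.metrics.classification_report(dfbp_test_y, pred_test)` reports precision, recall, and F1 for both classes in one call.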

In [29]:
# 2. Accuracy

In [30]:
from sklearn.metrics import accuracy_score
accuracy_score(dfbp_test_y,pred_test)*100

86.8
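Item 6 of the description calls for interpreting which factors drive the predictions, which the cells above do not yet do. A sketch of two common options for a fitted forest, shown on synthetic data from `make_classification` rather than the patient dataset: the impurity-based `feature_importances_` attribute, and `permutation_importance` from `sklearn.inspection`, computed on held-out data. Applied to this notebook, `ra`, `dfbp_test_x`, and `dfbp_test_y` would take the place of `model`, `X_test`, and `y_test`:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the patient features (not the real data)
X_arr, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                               n_redundant=0, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(5)])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Impurity-based importances, derived from the training fit; they sum to 1
impurity_imp = pd.Series(model.feature_importances_, index=X.columns)

# Permutation importances: mean drop in test accuracy when one column is shuffled
perm = permutation_importance(model, X_test, y_test, n_repeats=10,
                              random_state=0)
perm_imp = pd.Series(perm.importances_mean, index=X.columns)

print(impurity_imp.sort_values(ascending=False))
print(perm_imp.sort_values(ascending=False))
```

Impurity-based scores can favor features with many distinct values, such as Physical_activity or salt_content_in_the_diet here, so the permutation scores are a useful cross-check.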

In [31]:
from sklearn.tree import DecisionTreeClassifier

In [32]:
# Note: the grid search below tunes this single decision tree, not the Random Forest `ra`
trtest = DecisionTreeClassifier()

In [33]:
from sklearn.model_selection import GridSearchCV

In [34]:
search_dict = {"criterion":["entropy","gini"],
               "max_depth":(5,6,7,8,9,10),
               "min_samples_split":(40,50,60,70,80,90,110,120),
               "class_weight":("balanced",None)
              }
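Grid search fits every combination in the grid once per cross-validation fold, so its cost grows multiplicatively. For `search_dict` that is 2 × 6 × 8 × 2 = 192 candidates. Cell [35] passes no `cv`, so current scikit-learn versions default to 5 folds, which comes to 960 fits plus one final refit on the full training set. A quick check:

```python
from math import prod

search_dict = {"criterion": ["entropy", "gini"],
               "max_depth": (5, 6, 7, 8, 9, 10),
               "min_samples_split": (40, 50, 60, 70, 80, 90, 110, 120),
               "class_weight": ("balanced", None)}

# Number of parameter combinations = product of the option counts
n_candidates = prod(len(values) for values in search_dict.values())
n_folds = 5  # GridSearchCV's default cv in recent scikit-learn versions
n_fits = n_candidates * n_folds  # excludes the single refit with refit=True

print(n_candidates, n_fits)  # → 192 960
```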

In [35]:
grid = GridSearchCV(trtest,param_grid =search_dict)
grid.fit(dfbp_train_x,dfbp_train_y)
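Despite the notebook's title, the search in cell [35] tunes `DecisionTreeClassifier`, not the Random Forest from cell [20]. A sketch of the same technique applied to `RandomForestClassifier`, on synthetic data and with a deliberately small, hypothetical grid to keep the run short; the grid values are illustrative, not tuned recommendations. For this notebook, `dfbp_train_x`/`dfbp_train_y` and `dfbp_test_x`/`dfbp_test_y` would replace the synthetic splits:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the patient data (not the real dataset)
X, y = make_classification(n_samples=400, n_features=8, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

# Hypothetical, deliberately small grid: 2 * 2 * 2 = 8 candidates
rf_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 6],
    "min_samples_split": [2, 20],
}

rf_search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=rf_grid,
    cv=5,
    scoring="accuracy",
)
rf_search.fit(X_train, y_train)  # refits the best candidate on all of X_train

print(rf_search.best_params_, round(rf_search.best_score_, 3))

# Held-out accuracy of the refitted best forest
test_acc = rf_search.score(X_test, y_test)
print(round(test_acc, 3))
```

Passing `n_jobs=-1` to `GridSearchCV` runs the fits in parallel across CPU cores, which matters as the grid grows.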

In [36]:
grid.best_params_

{'class_weight': 'balanced',
 'criterion': 'gini',
 'max_depth': 6,
 'min_samples_split': 80}

In [37]:
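# Rebuild the tree with the parameters the search actually selected in cell [36]
# (the previous hard-coded max_depth=7, min_samples_split=50 did not match them)
trcvbp = DecisionTreeClassifier(**grid.best_params_)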

In [38]:
trcvbp.fit(dfbp_train_x,dfbp_train_y)

In [53]:
# grid.predict uses best_estimator_, which GridSearchCV refits on the full training set
predgrid_test = grid.predict(dfbp_test_x)

In [54]:
np.set_printoptions(threshold=np.inf)
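`np.set_printoptions(threshold=np.inf)` changes array printing for the rest of the session. NumPy's documentation suggests `sys.maxsize` for an untruncated repr, and the `np.printoptions` context manager restores the previous settings when its block exits. A minimal sketch with a hypothetical all-zeros array:

```python
import sys

import numpy as np

before = np.get_printoptions()["threshold"]

# Hypothetical array longer than the default summarization threshold (1000)
preds = np.zeros(1500, dtype=int)

# Print in full only inside the block; the old threshold comes back afterwards
with np.printoptions(threshold=sys.maxsize):
    inside = np.get_printoptions()["threshold"]
    full_text = np.array2string(preds)

after = np.get_printoptions()["threshold"]
```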

In [55]:
predgrid_test

array([1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1,
       0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1,
       0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0,
       0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0,
       0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0,
       0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0,
       0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1,
       0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0,
       0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1,
       1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
       0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1,
       1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0,
       1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0,
       0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1,

In [58]:
dfbpgrid_tab = confusion_matrix(dfbp_test_y,predgrid_test)

In [59]:
dfbpgrid_tab

array([[235,  27],
       [ 39, 199]], dtype=int64)

In [60]:
accuracy_score(dfbp_test_y,predgrid_test)*100

86.8
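Both models reach 86.8% on the same 500-row test set, yet they make different errors. The Random Forest misses 31 abnormal cases and raises 35 false alarms, while the tuned tree misses 39 and raises 27. A single split cannot show whether either model is genuinely better. A sketch of a repeated comparison with stratified 5-fold cross-validation, on synthetic data; the tree uses the `best_params_` from cell [36], and applying this to the notebook would mean passing the full feature matrix and label in place of `X` and `y`:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the patient data (not the real dataset)
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           random_state=0)

# Same shuffled folds for both models, so the scores are directly comparable
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "random_forest": RandomForestClassifier(random_state=0),
    "tuned_tree": DecisionTreeClassifier(class_weight="balanced", criterion="gini",
                                         max_depth=6, min_samples_split=80,
                                         random_state=0),
}

# One accuracy score per fold for each model
scores = {name: cross_val_score(model, X, y, cv=cv, scoring="accuracy")
          for name, model in models.items()}

for name, fold_scores in scores.items():
    print(f"{name}: mean={fold_scores.mean():.3f} std={fold_scores.std():.3f}")
```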