For this model to predict the presence or absence of any sleep disorder, the following details are needed:

* Person ID - any identifier (dropped before prediction)
* Gender - Male/Female
* Age - in years
* Occupation (dropped before prediction)
* Sleep duration - in hours
* Sleep quality - an integer from 1 to 10
* Physical activity level - in min/day
* Stress level - an integer from 1 to 10
* BMI category - Normal/Obese/Overweight
* BP - a string of the form 'Upper/Lower', i.e. systolic over diastolic pressure (e.g. '140/90')
* Heart rate - in bpm
* Daily steps
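The input constraints listed above can be checked before a row is built. The following is a minimal sketch; the function name `validate_record` and its messages are hypothetical, and it only checks the categorical values, the 1-10 scales, and the BP string format described in the list.

```python
VALID_GENDERS = {'Male', 'Female'}
VALID_BMI = {'Normal', 'Obese', 'Overweight'}  # no 'Underweight': absent from the training data


def validate_record(gender, sleep_quality, stress_level, bmi_category, bp):
    """Return a list of problems found in one input record (empty if none).

    Hypothetical helper: it checks only the constraints listed above.
    """
    problems = []
    if gender not in VALID_GENDERS:
        problems.append(f'gender must be one of {sorted(VALID_GENDERS)}')
    # both scales must be integers in the closed range 1..10
    for name, value in (('sleep quality', sleep_quality), ('stress level', stress_level)):
        if not (isinstance(value, int) and 1 <= value <= 10):
            problems.append(f'{name} must be an integer from 1 to 10')
    if bmi_category not in VALID_BMI:
        problems.append(f'BMI category must be one of {sorted(VALID_BMI)}')
    # BP must be exactly two integers separated by '/'
    parts = bp.split('/')
    if len(parts) != 2 or not all(p.strip().isdigit() for p in parts):
        problems.append("BP must be a string of the form 'Upper/Lower', e.g. '140/90'")
    return problems


print(validate_record('Male', 4, 8, 'Obese', '140/90'))             # []
print(validate_record('Female', 11, 3, 'Underweight', '120-80'))    # three problems
```

Returning a list of problems instead of raising on the first one lets all issues in a record be reported at once.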


Some limitations of this model:
* The model is not trained for people with an underweight BMI because no such data was available.
* The model was trained mainly on people in the high and normal BP ranges.
* Although the data was imbalanced, with most people having no sleep disorder (219 out of 374, to be exact), the model accurately predicts whether an individual has a sleep disorder.
    * Note, though, that the model may still confuse the specific disorder (Sleep Apnea vs. Insomnia).
    * This could be improved with more data, or by obtaining additional features that specifically affect the diagnosis of each disorder.
    * Also, some sources suggest that these disorders are sometimes confused in diagnosis even when no machine learning is involved.
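To put the class imbalance in perspective: a trivial classifier that always predicts 'No disorder' would already be right for 219 of the 374 people. The sketch below computes that majority-class baseline from the counts quoted above; an accuracy figure for the model is most meaningful when compared against it.

```python
total = 374        # people in the dataset
no_disorder = 219  # people without a sleep disorder
with_disorder = total - no_disorder

# accuracy of always predicting the majority class ('No disorder')
baseline_accuracy = no_disorder / total

print(with_disorder)                # 155
print(round(baseline_accuracy, 3))  # 0.586
```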


In [1]:
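# each row: Person ID, Gender, Age, Occupation, sleep duration (h), sleep quality (1-10),
# physical activity (min/day), stress level (1-10), BMI category, BP, heart rate (bpm), daily steps
pred_data_1 = [1,'Male',28,'Sales rep',5.9,4,30,8,'Obese','140/90',85,3000]
pred_data_2 = [2,'Male',28,'Doctor',6.2,6,60,8,'Normal','125/80',75,10000]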
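Under the thresholds that `bld_press` applies in the next cell, both sample readings fall in the 'High' BP category: '140/90' on both numbers, and '125/80' because a diastolic value of 80 or more counts as high even when the systolic value is below 130. The sketch below reproduces the same thresholds in vectorized form with `np.select`, which returns the choice of the first condition that holds, mirroring the if/elif chain. It is an illustrative alternative to `bld_press` plus `np.vectorize`, not the author's code, and the name `bp_category` is hypothetical.

```python
import numpy as np
import pandas as pd


def bp_category(bp: pd.Series) -> np.ndarray:
    """Categorize 'Upper/Lower' BP strings with the same thresholds as bld_press."""
    parts = bp.str.split('/', expand=True).astype(int)
    upper, lower = parts[0], parts[1]
    conditions = [
        (upper < 90) & (lower < 60),                    # Low
        (upper < 120) & (lower < 80),                   # Normal
        (upper >= 120) & (upper < 130) & (lower < 80),  # Elevated
        (upper >= 130) | (lower >= 80),                 # High
    ]
    # np.select picks the choice of the first True condition, like the if/elif chain
    return np.select(conditions, ['Low', 'Normal', 'Elevated', 'High'], default='Unknown')


readings = pd.Series(['140/90', '125/80', '125/75', '115/75', '85/55'])
print(bp_category(readings).tolist())  # ['High', 'High', 'Elevated', 'Normal', 'Low']
```

The parentheses around each comparison are required, because `&` and `|` bind more tightly than `<` and `>=` in Python.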

In [2]:
import joblib
import pandas as pd
import numpy as np

# load the original column names, as well as the scaled + one-hot-encoded column names needed for the model
original_cols_lst = joblib.load('original_col_names.pkl')
scaled_cols = joblib.load('scaled_col_names.pkl')

# collect the input rows into a DataFrame with the original column names
data_lst = [pred_data_1,pred_data_2]
data_df = pd.DataFrame(data=data_lst,columns=original_cols_lst)

# split the blood pressure string into systolic (upper) and diastolic (lower) values, then derive a BP category
# keep in mind that the model was trained mostly on people with high and normal BP

data_df[['upper','lower']] = data_df['Blood Pressure'].str.split('/',expand=True)
data_df[['upper','lower']] = data_df[['upper','lower']].astype(int)

def bld_press(upp, low):
    """Map systolic (upp) and diastolic (low) readings to the BP category used by the model."""
    if upp < 90 and low < 60:
        return 'Low'
    elif upp < 120 and low < 80:
        return 'Normal'
    elif 120 <= upp < 130 and low < 80:
        return 'Elevated'
    elif upp >= 130 or low >= 80:
        return 'High'

data_df['BP Category'] = np.vectorize(bld_press)(data_df['upper'],data_df['lower'])

# removing columns we don't need 
data_df_rev = data_df.drop(['Person ID','Occupation','Blood Pressure', 'upper', 'lower'],axis=1)

# load the trained model and predict (the column names were already loaded above as scaled_cols)
loaded_model = joblib.load('final_clf_model.pkl')

loaded_model.predict(data_df_rev)

array(['Insomnia', 'No disorder'], dtype=object)
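Since, as noted in the limitations, the model is more reliable at detecting whether a sleep disorder is present than at naming which one, it can help to report its output as presence or absence. A minimal sketch applied to the output shown above; it assumes that 'No disorder' is the only label meaning absence, as in that output.

```python
import numpy as np

# predictions as returned by loaded_model.predict(...) above
preds = np.array(['Insomnia', 'No disorder'], dtype=object)

# any label other than 'No disorder' (e.g. Insomnia or Sleep Apnea) counts as a disorder
has_disorder = preds != 'No disorder'
summary = np.where(has_disorder, 'Disorder present', 'No disorder')

print(has_disorder.tolist())  # [True, False]
print(summary.tolist())       # ['Disorder present', 'No disorder']
```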