# Operationalising ML

![](https://www.researchgate.net/profile/Philipp-Hartlieb/publication/361258805/figure/fig1/AS:1168965330042880@1655714461815/Main-phases-of-the-ML-life-cycle-targeted-at-operationalising-ML-models-in-production.ppm)

In [2]:
import pandas as pd
df3 = pd.read_csv('datasets/drug200.csv')

In [3]:
df3

Unnamed: 0,Age,Sex,BP,Cholesterol,Na_to_K,Drug
0,23,F,HIGH,HIGH,25.355,DrugY
1,47,M,LOW,HIGH,13.093,drugC
2,47,M,LOW,HIGH,10.114,drugC
3,28,F,NORMAL,HIGH,7.798,drugX
4,61,F,LOW,HIGH,18.043,DrugY
...,...,...,...,...,...,...
195,56,F,LOW,HIGH,11.567,drugC
196,16,M,LOW,HIGH,12.006,drugC
197,52,M,NORMAL,HIGH,9.894,drugX
198,23,M,NORMAL,NORMAL,14.020,drugX


# Making it work for 2 inputs. This is your task

In [4]:
test_data = [[28, 'F', 'NORMAL', 'HIGH', 7.798], [61, 'F', 'LOW', 'HIGH', 18.043]]

In [5]:
df3.iloc[3:5,:]

Unnamed: 0,Age,Sex,BP,Cholesterol,Na_to_K,Drug
3,28,F,NORMAL,HIGH,7.798,drugX
4,61,F,LOW,HIGH,18.043,DrugY


In [6]:
# get_prediction('my_df3_decision_tree.joblib', 
#                'my_df3_encoder.joblib', 
#                'my_df3_label_encoder.joblib', 
#                [
#                    [28, 'F', 'NORMAL', 'HIGH', 7.798], 
#                    [61, 'F', 'LOW', 'HIGH', 18.043]
#                ])

# Answer

In [19]:
from joblib import load
import pandas as pd

def get_prediction(model_path, encoder_path, label_encoder_path, user_input):
    
    # Let's load our model
    clf = load(model_path) # load and reuse the trained model
    print('Model Successfully Loaded')

    # Let's load our encoder
    enc = load(encoder_path) # load and reuse the fitted encoder
    print('Encoder Successfully Loaded')
 
    # Let's load my label encoder
    le = load(label_encoder_path) # load and reuse the fitted label encoder
    print('Label Encoder Successfully Loaded')
    
    # 1. Firstly, create a DataFrame out of the user input
    
    # Now, I know I told you that you can do this with dictionary comprehensions.
    # But I was just sneakily trying to teach you about them.
    # The truth is, you could have easily done the same with pd.DataFrame. Look!
    # Hey, don't be mad at me, we all learned something here.
    
    pd_columns = ['Age', 'Sex', 'BP', 'Cholesterol', 'Na_to_K']

    # user_input is a list of rows, so each inner list becomes one row
    df_temp = pd.DataFrame(user_input, columns=pd_columns)
    print("DataFrame Successfully created from user data.")
    
    # 2. Get your categorical df with df_temp[['Sex', 'BP', 'Cholesterol']]
    cat_data = df_temp[['Sex', 'BP', 'Cholesterol']]
    
    # 3. Get your numerical df with df_temp[['Age', 'Na_to_K']]
    num_data = df_temp[['Age', 'Na_to_K']]
    
    # 4. Encode your categorical columns.
    # enc.transform gives a sparse matrix; .toarray() gives the dense array I want.
    encoded_array = enc.transform(cat_data).toarray()
    
    # 5. Save your encoded df, with the encoder's feature names as columns
    df_encoded = pd.DataFrame(encoded_array, columns=enc.get_feature_names_out())
    print("DataFrame Successfully Encoded.")
    
    # 6. Combine your encoded df with your numerical df
    df_X = num_data.join(df_encoded)
    print("Data Successfully Transformed to desired format.")
    
    # 7. Think about this: at this point, your data looks exactly like
    #    the data you used to train your model
    
    # 8. So you can just do clf.predict(yourdata).
    #    This gives an array of encoded labels, one per input row.
    prediction = clf.predict(df_X)
    print("Successfully captured prediction.")
    
    # 9. Convert the encoded labels back to the actual labels with your label encoder
    output = le.inverse_transform(prediction)
    print("Label Generated")
    
    print('------------------------------------------------------')
    
    # 10. Return the whole array: one label per row of user_input
    return output

# Let's see if this works now

In [20]:
get_prediction('my_df3_decision_tree.joblib', 
               'my_df3_encoder.joblib', 
               'my_df3_label_encoder.joblib', 
               [
                   [28, 'F', 'NORMAL', 'HIGH', 7.798], 
                   [61, 'F', 'LOW', 'HIGH', 18.043]
               ])

Model Successfully Loaded
Encoder Successfully Loaded
Label Encoder Successfully Loaded
DataFrame Successfully created from user data.
DataFrame Successfully Encoded.
Data Successfully Transformed to desired format.
Successfully captured prediction.
Label Generated
------------------------------------------------------


array(['drugX', 'DrugY'], dtype=object)
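
One detail of step 6 is worth spelling out: `DataFrame.join` matches rows by index label, not by position. Inside `get_prediction` both frames carry the default 0, 1, ... index, so they line up. The sketch below shows what could happen if the numerical frame kept labels 3 and 4 (as a slice like `df3.iloc[3:5]` would) and how `reset_index(drop=True)` restores the alignment. The encoded column names here are made up for illustration.

```python
import pandas as pd

# Numerical columns that kept their original row labels (3 and 4).
num_data = pd.DataFrame(
    {"Age": [28, 61], "Na_to_K": [7.798, 18.043]},
    index=[3, 4],
)

# Encoded columns built from an array always start at index 0.
df_encoded = pd.DataFrame({"Sex_F": [1.0, 1.0], "BP_LOW": [0.0, 1.0]})

# Labels 3, 4 never meet labels 0, 1, so the encoded columns come back as NaN.
misaligned = num_data.join(df_encoded)

# Dropping the old labels first makes both frames share the index 0, 1.
aligned = num_data.reset_index(drop=True).join(df_encoded)
```

If the model ever complains about NaN values after the join, the index is a good first place to look.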

### An explanation of how I used pd.DataFrame

In [7]:
# Import pandas library
import pandas as pd

In [9]:
# initialize list of lists
data = [['SomeText', 20, 10], ['Sometext2', 5, 10], ['some_text3', 100, 10]]

In [10]:
pd.DataFrame(data,columns=['text', 'number1', 'number2'])

Unnamed: 0,text,number1,number2
0,SomeText,20,10
1,Sometext2,5,10
2,some_text3,100,10


In [13]:
pd.DataFrame([['one_value']], columns=['mycolumns'])

Unnamed: 0,mycolumns
0,one_value


In [15]:
user_data = [23,'F','HIGH','HIGH',25.355]

In [16]:
pd_columns = ['Age', 'Sex', 'BP', 'Cholesterol', 'Na_to_K']

In [18]:
pd.DataFrame([user_data], columns=pd_columns)

Unnamed: 0,Age,Sex,BP,Cholesterol,Na_to_K
0,23,F,HIGH,HIGH,25.355


In [19]:
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Age', 'random'])

In [20]:
# Let's look at the dataframe.
df

Unnamed: 0,Name,Age,random
0,SomeText,20,10
1,Sometext2,5,10
2,some_text3,100,10


### For 1 column

In [30]:
# initialize list of lists
data = [['SomeText'], ['Sometext2'], ['some_text3']]

In [31]:
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name'])

In [32]:
# Let's look at the dataframe.
df

Unnamed: 0,Name
0,SomeText
1,Sometext2
2,some_text3


### An explanation for our use case. 

In [22]:
data = [23,'F','HIGH','HIGH',25.355]

In [24]:
df = pd.DataFrame([data], columns = ['Age', 'Sex', 'BP', 'Cholesterol', 'Na_to_K'])

In [29]:
df

Unnamed: 0,Age,Sex,BP,Cholesterol,Na_to_K
0,23,F,HIGH,HIGH,25.355
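
The comments in `get_prediction` mention that the same frame can also be built with a dictionary comprehension. As a small sketch of that comparison, the example below builds the frame both ways from the two test rows used earlier; the variable names `as_dict`, `df_from_dict` and `df_direct` are only for illustration.

```python
import pandas as pd

pd_columns = ["Age", "Sex", "BP", "Cholesterol", "Na_to_K"]
user_input = [
    [28, "F", "NORMAL", "HIGH", 7.798],
    [61, "F", "LOW", "HIGH", 18.043],
]

# Dictionary comprehension: one key per column, holding that column's values.
as_dict = {
    column: [row[position] for row in user_input]
    for position, column in enumerate(pd_columns)
}
df_from_dict = pd.DataFrame(as_dict)

# Direct route: hand pandas the rows and name the columns.
df_direct = pd.DataFrame(user_input, columns=pd_columns)
```

Both routes give the same frame; the direct one is simply less to type when the input is already a list of rows.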


# What if we want to read from a CSV?

That would make my life much easier.

In [23]:
import pandas as pd
df3 = pd.read_csv('datasets/drug200.csv')

In [24]:
df3

Unnamed: 0,Age,Sex,BP,Cholesterol,Na_to_K,Drug
0,23,F,HIGH,HIGH,25.355,DrugY
1,47,M,LOW,HIGH,13.093,drugC
2,47,M,LOW,HIGH,10.114,drugC
3,28,F,NORMAL,HIGH,7.798,drugX
4,61,F,LOW,HIGH,18.043,DrugY
...,...,...,...,...,...,...
195,56,F,LOW,HIGH,11.567,drugC
196,16,M,LOW,HIGH,12.006,drugC
197,52,M,NORMAL,HIGH,9.894,drugX
198,23,M,NORMAL,NORMAL,14.020,drugX


In [25]:
df3 = df3.head()

In [26]:
df3

Unnamed: 0,Age,Sex,BP,Cholesterol,Na_to_K,Drug
0,23,F,HIGH,HIGH,25.355,DrugY
1,47,M,LOW,HIGH,13.093,drugC
2,47,M,LOW,HIGH,10.114,drugC
3,28,F,NORMAL,HIGH,7.798,drugX
4,61,F,LOW,HIGH,18.043,DrugY


In [28]:
import numpy as np
df3['Drug'] = pd.Series(np.zeros(5))

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df3['Drug'] = pd.Series(np.zeros(5))
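
The warning above is pandas flagging that `df3.head()` may be a slice of the original frame, so the assignment might not land where you expect. One common way to avoid it, sketched here with a small stand-in frame (the data is cut down for illustration), is to take an explicit `.copy()` before changing the column. Assigning a scalar also avoids relying on the index alignment that `pd.Series(np.zeros(5))` needs.

```python
import pandas as pd

# Stand-in for the full dataset, cut down to two columns for illustration.
full = pd.DataFrame(
    {
        "Age": [23, 47, 47, 28, 61, 22],
        "Drug": ["DrugY", "drugC", "drugC", "drugX", "DrugY", "drugX"],
    }
)

# An explicit copy makes the subset an independent frame.
subset = full.head(5).copy()

# A scalar is broadcast to every row, so no index alignment is involved.
subset["Drug"] = 0.0
```

The original frame keeps its labels, and the subset can be changed freely.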


In [29]:
df3

Unnamed: 0,Age,Sex,BP,Cholesterol,Na_to_K,Drug
0,23,F,HIGH,HIGH,25.355,0.0
1,47,M,LOW,HIGH,13.093,0.0
2,47,M,LOW,HIGH,10.114,0.0
3,28,F,NORMAL,HIGH,7.798,0.0
4,61,F,LOW,HIGH,18.043,0.0


In [30]:
df3.to_csv('get_predictions.csv',index=False)

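# Let's load our dataset

Note: the rows below are not the five rows saved above. The file appears to have been edited by hand before this cell ran, so that it holds two new patients with an empty Drug column.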

In [31]:
df4 = pd.read_csv('get_predictions.csv')

In [32]:
df4

Unnamed: 0,Age,Sex,BP,Cholesterol,Na_to_K,Drug
0,35,M,HIGH,HIGH,15,
1,25,F,LOW,NORMAL,25,


In [33]:
from joblib import load
import pandas as pd

def get_prediction(model_path, encoder_path, label_encoder_path, user_file_path):
    
    # Let's load our model
    clf = load(model_path) # load and reuse the trained model
    print('Model Successfully Loaded')

    # Let's load our encoder
    enc = load(encoder_path) # load and reuse the fitted encoder
    print('Encoder Successfully Loaded')
 
    # Let's load my label encoder
    le = load(label_encoder_path) # load and reuse the fitted label encoder
    print('Label Encoder Successfully Loaded')
    
    # 1. Let's load the user file
    df_user = pd.read_csv(user_file_path)
    print('User File successfully loaded')
    
    # 2. Get your categorical df with df_user[['Sex', 'BP', 'Cholesterol']]
    cat_data = df_user[['Sex', 'BP', 'Cholesterol']]
    
    # 3. Get your numerical df with df_user[['Age', 'Na_to_K']]
    num_data = df_user[['Age', 'Na_to_K']]
    
    # 4. Encode your categorical columns.
    # enc.transform gives a sparse matrix; .toarray() gives the dense array I want.
    encoded_array = enc.transform(cat_data).toarray()
    
    # 5. Save your encoded df, with the encoder's feature names as columns
    df_encoded = pd.DataFrame(encoded_array, columns=enc.get_feature_names_out())
    print("DataFrame Successfully Encoded.")
    
    # 6. Combine your encoded df with your numerical df
    df_X = num_data.join(df_encoded)
    print("Data Successfully Transformed to desired format.")
    
    # 7. Think about this: at this point, your data looks exactly like
    #    the data you used to train your model
    
    # 8. So you can just do clf.predict(yourdata).
    #    This gives an array of encoded labels, one per row of the file.
    prediction = clf.predict(df_X)
    print("Successfully captured prediction.")
    
    # 9. Convert the encoded labels back to the actual labels with your label encoder
    output = le.inverse_transform(prediction)
    print("Label Generated")
    
    # 10. Now that we have our predictions, save them to the Drug column
    df_user['Drug'] = output
    
    # 11. Save the DataFrame with our predictions.
    #     Note that this overwrites the file the user gave us.
    df_user.to_csv(user_file_path, index=False)
    
    print('------------------------------------------------------')
    
    return output

# Let's see if this works now

In [34]:
# Let's see
get_prediction('my_df3_decision_tree.joblib', 
               'my_df3_encoder.joblib', 
               'my_df3_label_encoder.joblib', 
               'get_predictions.csv')

Model Successfully Loaded
Encoder Successfully Loaded
Label Encoder Successfully Loaded
User File successfully loaded
DataFrame Successfully Encoded.
Data Successfully Transformed to desired format.
Successfully captured prediction.
Label Generated
------------------------------------------------------


array(['DrugY', 'DrugY'], dtype=object)
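
The function above writes the predictions back into the same file it read. If you would rather leave the user's file untouched, one option is to write to a separate path. The sketch below assumes a hypothetical helper `save_predictions` and a hypothetical output file name; the labels are passed in by hand so that the example runs without the saved model.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_predictions(input_path, labels, output_path):
    """Hypothetical helper: attach labels to the input rows and save a new file."""
    df = pd.read_csv(input_path)
    if len(labels) != len(df):
        raise ValueError(f"got {len(labels)} labels for {len(df)} rows")
    df["Drug"] = list(labels)
    df.to_csv(output_path, index=False)
    return df


# Recreate the two-row input file shown above in a temporary directory.
work_dir = Path(tempfile.mkdtemp())
input_path = work_dir / "get_predictions.csv"
output_path = work_dir / "get_predictions_with_labels.csv"  # assumed name

pd.DataFrame(
    {
        "Age": [35, 25],
        "Sex": ["M", "F"],
        "BP": ["HIGH", "LOW"],
        "Cholesterol": ["HIGH", "NORMAL"],
        "Na_to_K": [15, 25],
        "Drug": [None, None],
    }
).to_csv(input_path, index=False)

# Labels as returned by get_prediction in the cell above.
result = save_predictions(input_path, ["DrugY", "DrugY"], output_path)
```

The length check guards against silently attaching labels to the wrong number of rows.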

# One Last Problem
How will the user know which values are allowed for each input?
- Time to solve that.
- By you!

In [63]:
df3['Sex'].unique()

array(['F', 'M'], dtype=object)

In [64]:
df3['BP'].unique()

array(['HIGH', 'LOW', 'NORMAL'], dtype=object)

In [65]:
df3['Cholesterol'].unique()

array(['HIGH', 'NORMAL'], dtype=object)

In [60]:
df3['Cholesterol'].value_counts()

HIGH      103
NORMAL     97
Name: Cholesterol, dtype: int64
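
One possible way to tackle this, offered only as a sketch: collect the allowed values printed by the `unique()` calls above into a dictionary and check each input row against it before predicting. The names `ALLOWED_VALUES` and `validate_row` are hypothetical. If the saved encoder is a `OneHotEncoder` (an assumption), the same lists could also be read from its `categories_` attribute instead of being typed in.

```python
# Allowed values, copied from the unique() outputs above.
ALLOWED_VALUES = {
    "Sex": {"F", "M"},
    "BP": {"HIGH", "LOW", "NORMAL"},
    "Cholesterol": {"HIGH", "NORMAL"},
}
COLUMNS = ["Age", "Sex", "BP", "Cholesterol", "Na_to_K"]
NUMERIC_COLUMNS = ("Age", "Na_to_K")


def validate_row(row):
    """Return a list of problems found in one input row (empty if it is fine)."""
    if len(row) != len(COLUMNS):
        return [f"expected {len(COLUMNS)} values, got {len(row)}"]

    record = dict(zip(COLUMNS, row))
    problems = []

    # Categorical columns must use one of the values seen in the dataset.
    for column, allowed in ALLOWED_VALUES.items():
        if record[column] not in allowed:
            problems.append(
                f"{column}={record[column]!r} is not one of {sorted(allowed)}"
            )

    # Numerical columns must be real numbers (bool is excluded on purpose).
    for column in NUMERIC_COLUMNS:
        value = record[column]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{column}={value!r} is not a number")

    return problems


good = validate_row([28, "F", "NORMAL", "HIGH", 7.798])
bad = validate_row([28, "female", "NORMAL", "VERY HIGH", "7.8"])
```

Printing the messages in `bad` tells the user exactly which values are possible, which is the question this section asks.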

# Now, we move on to scripting

# We meet back at 11:40
![](https://media.tenor.com/obSoMvKiIVwAAAAC/peanuts-snoopy.gif)