# Machine Learning Case Study: Absenteeism 
#### by Sooyeon Won 

### Part 3: Model Deployment

### Keywords 
- Model Deployment
- Model vs. Module
- Model Prediction


### Contents 

<ul>    
<li><a href="#Preprocessing">1.  Data Preprocessing</a></li>
<li><a href="#Analysis">2.  Machine Learning</a></li>
<li><a href="#Deployment">3.  Model Deployment</a></li>
</ul>


### 3. Model Deployment

In [1]:
# Import the relevant module - customized libraries
from absenteeism_module import *

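> The wildcard import also exposes the libraries that `absenteeism_module` itself imports (NumPy, pandas, and scikit-learn), so names such as `pd`, used in the next cell, are available in this notebook without separate import statements.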

**Model vs. Module** 
- Model: The analytical tool applied to solve the business problem
- Module: A software component containing the code that helps execute the model. The Python documentation defines a module as a file containing Python definitions and statements; the file name is the module name with the suffix `.py` appended.
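
To make the module definition concrete, the sketch below writes a tiny module to a temporary folder, imports it, and lists the names a wildcard import would expose. The file name `tiny_module.py`, its contents, and the helper `star_import_names` are hypothetical and only serve as illustration; they are not part of `absenteeism_module`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module source: it imports a library and defines one function.
MODULE_SOURCE = '''
import math as m

def circle_area(radius):
    return m.pi * radius ** 2
'''


def star_import_names(module):
    """Return the names that `from module import *` would bring in."""
    if hasattr(module, "__all__"):
        return list(module.__all__)
    # Without __all__, every name that does not start with "_" is exported.
    return [name for name in vars(module) if not name.startswith("_")]


with tempfile.TemporaryDirectory() as tmp:
    # A module is simply a .py file that Python can find on its search path.
    Path(tmp, "tiny_module.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        tiny_module = importlib.import_module("tiny_module")
    finally:
        sys.path.remove(tmp)

exported = star_import_names(tiny_module)
print(exported)
print(tiny_module.circle_area(1.0))
```

The exported names include the alias `m` as well as `circle_area`, which is the same mechanism that makes `pd` usable here after `from absenteeism_module import *`.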

In [2]:
# Data before preprocessing 
pd.read_csv('Absenteeism_new_data.csv').head()

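| | ID | Reason for Absence | Date | Transportation Expense | Distance to Work | Age | Daily Work Load Average | Body Mass Index | Education | Children | Pets |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 22 | 27 | 01/06/2018 | 179 | 26 | 30 | 237.656 | 19 | 3 | 0 | 0 |
| 1 | 10 | 7 | 04/06/2018 | 361 | 52 | 28 | 237.656 | 27 | 1 | 1 | 4 |
| 2 | 14 | 23 | 06/06/2018 | 155 | 12 | 34 | 237.656 | 25 | 1 | 2 | 0 |
| 3 | 17 | 25 | 08/06/2018 | 179 | 22 | 40 | 237.656 | 22 | 2 | 2 | 0 |
| 4 | 14 | 10 | 08/06/2018 | 155 | 12 | 34 | 237.656 | 25 | 1 | 2 | 0 |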


In [3]:
# Create an instance of the 'absenteeism_model' class and store it in the variable 'adj_model'
adj_model = absenteeism_model('model', 'scaler')

> As defined in the module, the `absenteeism_model` class requires two files: 'model', which contains the final, fine-tuned logistic regression model, and 'scaler', which contains the fitted scaling parameters needed to bring new inputs onto the same scale as the training data.
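
The notebook does not show how the two files are read. A minimal sketch of the pattern, assuming both objects were saved with `pickle`, could look like the following. The class name `PickledModelWrapper` is hypothetical, and plain dictionaries with made-up numbers stand in for the fitted model and scaler so that the example runs on its own.

```python
import pickle
import tempfile
from pathlib import Path


class PickledModelWrapper:
    """Hypothetical wrapper that restores a model and a scaler from two pickle files."""

    def __init__(self, model_file, scaler_file):
        # Each file holds one serialized Python object.
        with open(model_file, "rb") as f:
            self.model = pickle.load(f)
        with open(scaler_file, "rb") as f:
            self.scaler = pickle.load(f)


# Stand-ins for the fitted objects; the numbers are invented for illustration.
stand_in_model = {"intercept": -0.2, "coefficients": [0.5, -1.0]}
stand_in_scaler = {"mean": [30.0, 25.0], "scale": [5.0, 4.0]}

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp, "model")
    scaler_path = Path(tmp, "scaler")
    # Save both objects, as would happen once at the end of training.
    model_path.write_bytes(pickle.dumps(stand_in_model))
    scaler_path.write_bytes(pickle.dumps(stand_in_scaler))
    # Restore them, as would happen at deployment time.
    wrapper = PickledModelWrapper(model_path, scaler_path)

print(wrapper.model)
print(wrapper.scaler)
```

Storing the scaler next to the model matters because new data must be transformed with the parameters learned from the training data, not with parameters re-estimated on the new file.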

In [4]:
# Instance_variable.method()
adj_model.load_and_clean_data('Absenteeism_new_data.csv')

> As defined in the `absenteeism_model` class:
>- **.load_and_clean_data()** preprocesses the newly provided dataset.
>- **.predicted_outputs()**, used below, feeds the cleaned data into the model and returns the preprocessed data together with the predicted probability and class.
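
The body of `.load_and_clean_data()` is not shown in this notebook. The sketch below reproduces, for the five rows displayed earlier, the transformations that can be read off by comparing the raw data with the cleaned output. It uses only the standard library instead of pandas so that it runs on its own. The reason-code ranges and the Education mapping are assumptions that are consistent with these five rows, not confirmed details of the module.

```python
import csv
import io
from datetime import datetime

RAW_CSV = """ID,Reason for Absence,Date,Transportation Expense,Distance to Work,Age,Daily Work Load Average,Body Mass Index,Education,Children,Pets
22,27,01/06/2018,179,26,30,237.656,19,3,0,0
10,7,04/06/2018,361,52,28,237.656,27,1,1,4
14,23,06/06/2018,155,12,34,237.656,25,1,2,0
17,25,08/06/2018,179,22,40,237.656,22,2,2,0
14,10,08/06/2018,155,12,34,237.656,25,1,2,0
"""

# Assumed reason-code ranges behind the four dummy columns.
REASON_GROUPS = {
    "Reason_1": range(1, 15),
    "Reason_2": range(15, 18),
    "Reason_3": range(18, 22),
    "Reason_4": range(22, 29),
}


def clean_row(row):
    """Turn one raw record into the feature layout of the cleaned table."""
    reason = int(row["Reason for Absence"])
    # One dummy column per reason group.
    cleaned = {name: int(reason in codes) for name, codes in REASON_GROUPS.items()}
    # Dates are day-first, so 01/06/2018 falls in month 6.
    cleaned["Month Value"] = datetime.strptime(row["Date"], "%d/%m/%Y").month
    cleaned["Transportation Expense"] = int(row["Transportation Expense"])
    cleaned["Age"] = int(row["Age"])
    cleaned["Body Mass Index"] = int(row["Body Mass Index"])
    # Assumed mapping: education level 1 becomes 0, every higher level becomes 1.
    cleaned["Education"] = 0 if row["Education"] == "1" else 1
    cleaned["Children"] = int(row["Children"])
    cleaned["Pet"] = int(row["Pets"])
    # ID, Distance to Work and Daily Work Load Average are not carried over.
    return cleaned


cleaned_rows = [clean_row(row) for row in csv.DictReader(io.StringIO(RAW_CSV))]
for cleaned in cleaned_rows:
    print(cleaned)
```

For these five rows the result matches the feature columns of the cleaned table shown in this notebook; other rows of the real dataset may follow rules that cannot be inferred from this sample.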

In [5]:
adj_model.predicted_outputs().head()

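| | Reason_1 | Reason_2 | Reason_3 | Reason_4 | Month Value | Transportation Expense | Age | Body Mass Index | Education | Children | Pet | Probability | Prediction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.0 | 0 | 1 | 6 | 179 | 30 | 19 | 1 | 0 | 0 | 0.137536 | 0 |
| 1 | 1 | 0.0 | 0 | 0 | 6 | 361 | 28 | 27 | 0 | 1 | 4 | 0.889385 | 1 |
| 2 | 0 | 0.0 | 0 | 1 | 6 | 155 | 34 | 25 | 0 | 2 | 0 | 0.290087 | 0 |
| 3 | 0 | 0.0 | 0 | 1 | 6 | 179 | 40 | 22 | 1 | 2 | 0 | 0.209434 | 0 |
| 4 | 1 | 0.0 | 0 | 0 | 6 | 155 | 34 | 25 | 0 | 2 | 0 | 0.743461 | 1 |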
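
The relation between the `Probability` and `Prediction` columns can be sketched as follows. A logistic regression turns a weighted sum of the scaled features into a probability, and the class label follows from a cut-off. The 0.5 threshold is an assumption that agrees with the five rows above; the function names are hypothetical, and the module itself presumably relies on the fitted scikit-learn model rather than on hand-written code like this.

```python
import math


def predict_probability(features, coefficients, intercept):
    """Logistic function applied to the linear combination of the features."""
    z = intercept + sum(w * x for w, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


def classify(probability, threshold=0.5):
    """Return 1 when the probability exceeds the threshold, otherwise 0."""
    return int(probability > threshold)


# Probabilities copied from the five rows shown above.
shown_probabilities = [0.137536, 0.889385, 0.290087, 0.209434, 0.743461]
predictions = [classify(p) for p in shown_probabilities]
print(predictions)
```

Applied to the displayed probabilities, this rule reproduces the `Prediction` column of the five rows.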


In [6]:
# Export the obtained dataset
adj_model.predicted_outputs().to_csv('Absenteeism_predictions.csv', index=False)