# Project: kedro_iris_ml_ops

### Author: Dale Jacques
### Contact: djacques@uwalumni.com
### Repository: https://github.com/AIgenVectorLabs/kedro_iris_ml_ops
#### Description: This notebook demonstrates the Kedro "Session", "Context", and "Catalog" concepts, which we will need when deploying our model into production. Specifically, it shows how a Kedro pipeline can be parameterized and run, and how its results can be extracted.

In [1]:
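# Load the context, then load and examine our training set from the catalog.
# `session` is created automatically when the notebook is launched with `kedro jupyter notebook`.
my_context = session.load_context()
my_context.catalog.load("iris_data")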

2021-08-08 19:04:07,164 - kedro.io.data_catalog - INFO - Loading data from `iris_data` (CSVDataSet)...


,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa
...,...,...,...,...,...
145,6.7,3.0,5.2,2.3,virginica
146,6.3,2.5,5.0,1.9,virginica
147,6.5,3.0,5.2,2.0,virginica
148,6.2,3.4,5.4,2.3,virginica
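To make the catalog's role more concrete, here is a toy, dictionary-backed stand-in written for illustration only. It is not Kedro's `DataCatalog` and shares none of its code; the class `ToyCatalog`, its methods, and the ratio value `0.2` are all hypothetical. It mimics two behaviours visible in this notebook's logs: datasets are saved and loaded by name, and parameters appear as in-memory entries prefixed with `params:` (for example `params:test_data_ratio`).

```python
from typing import Any, Dict, List


class ToyCatalog:
    """Hypothetical, minimal stand-in for a data catalog (not Kedro's API)."""

    def __init__(self, parameters: Dict[str, Any]) -> None:
        # Expose each parameter under a "params:" name, mirroring entries
        # such as `params:test_data_ratio` in the Kedro run logs.
        self._data: Dict[str, Any] = {
            f"params:{key}": value for key, value in parameters.items()
        }

    def save(self, name: str, data: Any) -> None:
        # Keep an output in memory under its dataset name.
        self._data[name] = data

    def load(self, name: str) -> Any:
        # Retrieve a dataset by name; unknown names raise a clear error.
        if name not in self._data:
            raise KeyError(f"Dataset '{name}' not found in catalog")
        return self._data[name]

    def list_datasets(self) -> List[str]:
        # Sorted names of everything currently registered.
        return sorted(self._data)


# Usage: the parameter value is made up for illustration.
catalog = ToyCatalog(parameters={"test_data_ratio": 0.2})
catalog.save("train_x", [[5.1, 3.5, 1.4, 0.2]])
print(catalog.load("params:test_data_ratio"))  # 0.2
print(catalog.list_datasets())  # ['params:test_data_ratio', 'train_x']
```

The real catalog additionally maps names to dataset types (such as the `CSVDataSet`, `MemoryDataSet`, and `PickleDataSet` entries in the logs) that decide where and how each object is stored.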


In [2]:
# Run the training pipeline
session.run(pipeline_name="train")

2021-08-08 19:04:07,237 - root - INFO - ** Kedro project kedro_iris_ml_ops
2021-08-08 19:04:07,679 - kedro.io.data_catalog - INFO - Loading data from `iris_data` (CSVDataSet)...
2021-08-08 19:04:07,683 - kedro.io.data_catalog - INFO - Loading data from `params:test_data_ratio` (MemoryDataSet)...
2021-08-08 19:04:07,684 - kedro.pipeline.node - INFO - Running node: split: split_data([iris_data,params:test_data_ratio]) -> [test_x,test_y,train_x,train_y]
2021-08-08 19:04:07,703 - kedro.io.data_catalog - INFO - Saving data to `train_x` (MemoryDataSet)...
2021-08-08 19:04:07,709 - kedro.io.data_catalog - INFO - Saving data to `train_y` (MemoryDataSet)...
2021-08-08 19:04:07,712 - kedro.io.data_catalog - INFO - Saving data to `test_x` (MemoryDataSet)...
2021-08-08 19:04:07,715 - kedro.io.data_catalog - INFO - Saving data to `test_y` (MemoryDataSet)...
2021-08-08 19:04:07,723 - kedro.runner.sequential_runner - INFO - Completed 1 out of 7 tasks
2021-08-08 19:04:07,731 - kedro.io.data_catalog - 

{}
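The log above shows the `split` node calling `split_data` with `iris_data` and `params:test_data_ratio` and producing `train_x`, `train_y`, `test_x`, and `test_y`. The project's actual implementation is not shown in this notebook, so the following is only a hypothetical sketch of what such a node could look like, assuming the four feature columns and the `species` label seen in `iris_data`. Returning a dictionary keyed by output name is a design choice that avoids relying on output order; the `seed` argument is also an assumption.

```python
from typing import Any, Dict

import pandas as pd

FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def split_data(
    data: pd.DataFrame, test_data_ratio: float, seed: int = 42
) -> Dict[str, Any]:
    """Hypothetical split node: random train/test split of features and labels."""
    if not 0.0 < test_data_ratio < 1.0:
        raise ValueError("test_data_ratio must be strictly between 0 and 1")
    # Randomly sample the test rows; every remaining row goes to training.
    test = data.sample(frac=test_data_ratio, random_state=seed)
    train = data.drop(test.index)
    return {
        "train_x": train[FEATURES],
        "train_y": train["species"],
        "test_x": test[FEATURES],
        "test_y": test["species"],
    }


# Usage with a tiny frame built from rows shown in the catalog output above.
toy = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 6.7, 6.3],
    "sepal_width": [3.5, 3.0, 3.2, 3.0, 2.5],
    "petal_length": [1.4, 1.4, 1.3, 5.2, 5.0],
    "petal_width": [0.2, 0.2, 0.2, 2.3, 1.9],
    "species": ["setosa", "setosa", "setosa", "virginica", "virginica"],
})
splits = split_data(toy, test_data_ratio=0.2)
print(len(splits["train_x"]), len(splits["test_x"]))  # 4 1
```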

In [3]:
import pandas as pd
from kedro.framework.session import KedroSession

# Create a pandas DataFrame of new observations to predict.
# The last row (all 1.0) has sepal measurements far smaller than any in the Iris training data.
predict_input = pd.DataFrame({
    "sepal_length": [5.1, 5.0, 4.5, 4.8, 6.2, 1.0],
    "sepal_width": [3.2, 3.3, 3.5, 3.7, 3.2, 1.0],
    "petal_length": [1.3, 1.3, 1.4, 1.4, 5.2, 1.0],
    "petal_width": [0.2, 0.2, 0.2, 0.2, 2.0, 1.0],
})

predict_input

,sepal_length,sepal_width,petal_length,petal_width
0,5.1,3.2,1.3,0.2
1,5.0,3.3,1.3,0.2
2,4.5,3.5,1.4,0.2
3,4.8,3.7,1.4,0.2
4,6.2,3.2,5.2,2.0
5,1.0,1.0,1.0,1.0


In [4]:
# Close the existing session before creating a new one
session.close()

# Execute the prediction pipeline on new data. The DataFrame is passed as an
# extra parameter, so the pipeline sees it as `params:prediction_input`.
with KedroSession.create(
    package_name="kedro_iris_ml_ops",
    project_path="../",
    extra_params={"prediction_input": predict_input},
) as session:
    session.run(pipeline_name="predict")

2021-08-08 19:04:08,439 - kedro.framework.session.store - INFO - `save()` not implemented for `BaseSessionStore`. Skipping the step.
2021-08-08 19:04:08,758 - kedro.framework.session.store - INFO - `read()` not implemented for `BaseSessionStore`. Assuming empty store.
2021-08-08 19:04:08,825 - root - INFO - ** Kedro project kedro_iris_ml_ops
2021-08-08 19:04:08,861 - kedro.io.data_catalog - INFO - Loading data from `params:prediction_input` (MemoryDataSet)...
2021-08-08 19:04:08,862 - kedro.pipeline.node - INFO - Running node: predict_input_validation([params:prediction_input]) -> [predict_df]
2021-08-08 19:04:08,869 - kedro.io.data_catalog - INFO - Saving data to `predict_df` (MemoryDataSet)...
2021-08-08 19:04:08,879 - kedro.runner.sequential_runner - INFO - Completed 1 out of 3 tasks
2021-08-08 19:04:08,885 - kedro.io.data_catalog - INFO - Loading data from `predict_df` (MemoryDataSet)...
2021-08-08 19:04:08,893 - kedro.io.data_catalog - INFO - Loading data from `scaler` (PickleData
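The first node of the prediction pipeline, `predict_input_validation`, turns `params:prediction_input` into `predict_df`. Its code is not shown here, so this is a hypothetical sketch of the kind of checks such a node might perform: confirm the expected feature columns exist, keep them in a fixed order as floats, and reject missing values. The helper `out_of_range_rows` and the bounds in `TRAINING_RANGES` are assumptions; the bounds are the per-feature minimum and maximum of the classic Iris dataset, not values read from this project.

```python
from typing import Dict, List, Tuple

import pandas as pd

FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Assumed bounds: min/max of each feature in the classic Iris dataset.
TRAINING_RANGES: Dict[str, Tuple[float, float]] = {
    "sepal_length": (4.3, 7.9),
    "sepal_width": (2.0, 4.4),
    "petal_length": (1.0, 6.9),
    "petal_width": (0.1, 2.5),
}


def validate_prediction_input(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical validation node: check the schema and return a clean copy."""
    missing = [col for col in FEATURES if col not in df.columns]
    if missing:
        raise ValueError(f"Missing feature columns: {missing}")
    # Keep only the model's features, in a fixed order, cast to float
    # (non-numeric strings raise a ValueError here).
    clean = df[FEATURES].astype(float)
    if clean.isna().any().any():
        raise ValueError("Prediction input contains missing values")
    return clean


def out_of_range_rows(df: pd.DataFrame) -> List[int]:
    """Return index labels of rows with any feature outside TRAINING_RANGES."""
    outside = pd.Series(False, index=df.index)
    for col, (low, high) in TRAINING_RANGES.items():
        outside |= (df[col] < low) | (df[col] > high)
    return outside[outside].index.tolist()
```

Applied to the `predict_input` DataFrame from cell 3, `out_of_range_rows(validate_prediction_input(predict_input))` returns `[5]`: only the all-1.0 row falls below the assumed minimum sepal length and width. Flagging rather than dropping such rows is a choice; a production pipeline might instead reject them outright.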

In [5]:
# Extract predictions from our catalog
session.load_context().catalog.load("predictions")

2021-08-08 19:04:09,066 - kedro.io.data_catalog - INFO - Loading data from `predictions` (PickleDataSet)...


array([0, 0, 0, 0, 2, 1])
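The predictions come back as integer class labels. How they map to species depends on how the training pipeline encoded `species`, which this notebook does not show. As a sketch, assuming the species were encoded in alphabetical order (as scikit-learn's `LabelEncoder` would do), the labels can be decoded like this; `CLASS_NAMES` and `decode_predictions` are hypothetical names.

```python
from typing import Dict, Iterable, List

# Assumed encoding: species sorted alphabetically, as scikit-learn's
# LabelEncoder would produce. Verify this against the training pipeline.
CLASS_NAMES: Dict[int, str] = {0: "setosa", 1: "versicolor", 2: "virginica"}


def decode_predictions(predictions: Iterable[int]) -> List[str]:
    """Translate integer class predictions (list or NumPy array) into species names."""
    return [CLASS_NAMES[int(p)] for p in predictions]


print(decode_predictions([0, 0, 0, 0, 2, 1]))
# ['setosa', 'setosa', 'setosa', 'setosa', 'virginica', 'versicolor']
```

Under that assumption, the last, all-1.0 row is classified as versicolor even though its sepal measurements lie well outside those in the classic Iris data, which is the kind of case an input-validation step can flag before prediction.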