# *[Title]*

**Author:** [Computas AS](https://github.com/computas) ([kontakt@computas.com](mailto:kontakt@computas.com))

**Achievement:** *[Short, preferably single-line, statement of what has been accomplished. For example, "Assuming ... and using ... we show that ...".]*

## Introduction

*[Short introduction to this notebook. State motivation, assumptions, method and results in more detail than in the one-line achievement above.]*

## Reproducibility and code formatting

In [1]:
# To watermark the environment
%load_ext watermark

# Automatic code formatting: keep only the line that matches your front end.

# JupyterLab
%load_ext lab_black

# Classic Jupyter Notebook
%load_ext nb_black

In [None]:
%watermark -gb -iv -m -v
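
Where the `watermark` extension is unavailable, a rough equivalent can be put together from the standard library. The sketch below is only an illustration: `environment_report` is a hypothetical helper, not part of `watermark` or ZenML, and the package names passed to it are assumptions about what is worth recording for this notebook.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages):
    """Collect interpreter, platform and package versions in a dict."""
    report = {
        "python": sys.version.split()[0],
        "system": platform.system(),
        "machine": platform.machine(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence rather than failing the whole report.
            report[name] = "not installed"
    return report


# Hypothetical package list; adjust to the packages the analysis imports.
report = environment_report(["zenml", "tensorflow"])
for key, value in report.items():
    print(f"{key}: {value}")
```

Unlike `%watermark -g`, this sketch does not record the Git commit hash.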

## Analysis

In [2]:
# Imports
# -------

# System
import sys

# Logging
import logging
logging.basicConfig(format='%(message)s', level=logging.INFO, stream=sys.stdout)

# Other packages
import zenml
from zenml.core.datasources.csv_datasource import CSVDatasource
from zenml.core.pipelines.training_pipeline import TrainingPipeline
from zenml.core.steps.evaluator.tfma_evaluator import TFMAEvaluator
from zenml.core.steps.preprocesser.standard_preprocesser.standard_preprocesser import StandardPreprocesser
from zenml.core.steps.split.random_split import RandomSplit
from zenml.core.steps.trainer.feedforward_trainer.trainer import FeedForwardTrainer

Using Any for unsupported type: typing.Sequence[~T]


In [3]:
training_pipeline = TrainingPipeline(name='Quickstart')

2021-01-22 13:02:10,842 — zenml.core.pipelines.base_pipeline — INFO — Pipeline Quickstart created.


In [4]:
# Add a datasource. This will automatically track and version it.
ds = CSVDatasource(name='Pima Indians Diabetes Dataset', 
                   path='gs://zenml_quickstart/diabetes.csv')
training_pipeline.add_datasource(ds)

2021-01-22 13:02:33,511 — zenml.core.datasources.base_datasource — INFO — Datasource Pima Indians Diabetes Dataset created.


In [5]:
# Add a random 70/30 train-eval split
training_pipeline.add_split(RandomSplit(split_map={'train': 0.7, 'eval': 0.3}))
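
To make the meaning of `split_map` concrete, here is a minimal sketch of a random split in plain Python. It is not ZenML's implementation: `random_split` and its `seed` parameter are hypothetical, and the sketch assumes each row is assigned independently with the given probabilities, so the resulting sizes are only approximately 70/30.

```python
import random


def random_split(rows, split_map, seed=42):
    """Assign each row to one split, using the split_map values as weights."""
    rng = random.Random(seed)
    names = list(split_map)
    weights = [split_map[name] for name in names]
    splits = {name: [] for name in names}
    for row in rows:
        chosen = rng.choices(names, weights=weights, k=1)[0]
        splits[chosen].append(row)
    return splits


splits = random_split(range(1000), {"train": 0.7, "eval": 0.3})
print({name: len(part) for name, part in splits.items()})
```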

In [6]:
# StandardPreprocesser() applies sensible default transforms to the features.
# The overwrite below leaves the label 'has_diabetes' untransformed.
training_pipeline.add_preprocesser(
    StandardPreprocesser(
        features=['times_pregnant', 'pgc', 'dbp', 'tst', 'insulin', 'bmi',
                  'pedigree', 'age'],
        labels=['has_diabetes'],
        overwrite={'has_diabetes': {
            'transform': [{'method': 'no_transform', 'parameters': {}}]}}
    ))

In [7]:
# Add a trainer
training_pipeline.add_trainer(FeedForwardTrainer(
    loss='binary_crossentropy',
    last_activation='sigmoid',
    output_units=1,
    metrics=['accuracy'],
    epochs=20))
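
The trainer pairs a sigmoid output with a binary cross-entropy loss. As a reminder of what that combination computes, the sketch below evaluates both by hand. It is an illustration only, not the code that `FeedForwardTrainer` runs; the function names, the clipping constant `eps` and the example logits are all assumptions.

```python
import math


def sigmoid(z):
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Hypothetical logits and labels for three examples.
logits = [2.0, -1.0, 0.0]
labels = [1, 0, 1]
probs = [sigmoid(z) for z in logits]
loss = binary_crossentropy(labels, probs)
print(f"loss = {loss:.4f}")
```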

In [8]:
# Add an evaluator
training_pipeline.add_evaluator(
    TFMAEvaluator(slices=[['has_diabetes']],
                  metrics={'has_diabetes': ['binary_crossentropy',
                                            'binary_accuracy']}))
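
Slicing on `has_diabetes` means the metrics are reported separately for each value of that column. The sketch below shows the idea for binary accuracy in plain Python. It does not use TFMA: `binary_accuracy_by_slice`, the `prob` key, the 0.5 threshold and the example rows are all hypothetical.

```python
from collections import defaultdict


def binary_accuracy_by_slice(rows, slice_key, label_key, prob_key, threshold=0.5):
    """Return {slice value: fraction of correct thresholded predictions}."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for row in rows:
        predicted = 1 if row[prob_key] > threshold else 0
        hits[row[slice_key]] += int(predicted == row[label_key])
        counts[row[slice_key]] += 1
    return {value: hits[value] / counts[value] for value in counts}


# Hypothetical predictions for four evaluation rows.
rows = [
    {"has_diabetes": 1, "prob": 0.9},
    {"has_diabetes": 1, "prob": 0.4},
    {"has_diabetes": 0, "prob": 0.2},
    {"has_diabetes": 0, "prob": 0.1},
]
result = binary_accuracy_by_slice(rows, "has_diabetes", "has_diabetes", "prob")
print(result)
```

Because the slice column here is also the label, the per-slice accuracy for `has_diabetes = 1` equals the recall and the one for `has_diabetes = 0` equals the specificity.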


In [9]:
# Run the pipeline locally
training_pipeline.run()

2021-01-22 13:03:14,790 — zenml.core.backends.orchestrator.local.zenml_local_orchestrator — INFO — Component DataGen is running.
Dependencies required for Interactive Beam PCollection visualization are not available, please use: `pip install apache-beam[interactive]` to install necessary dependencies to enable all data visualization features.



In [10]:
# See schema of data
training_pipeline.view_schema()

| Feature name | Type | Presence | Valency | Domain |
|---|---|---|---|---|
| 'age' | FLOAT | required | single | - |
| 'bmi' | FLOAT | required | single | - |
| 'dbp' | FLOAT | required | single | - |
| 'has_diabetes' | FLOAT | required | single | - |
| 'insulin' | FLOAT | required | single | - |
| 'pedigree' | FLOAT | required | single | - |
| 'pgc' | FLOAT | required | single | - |
| 'times_pregnant' | FLOAT | required | single | - |
| 'tst' | FLOAT | required | single | - |


In [11]:
# See statistics of train and eval
training_pipeline.view_statistics()

2021-01-22 13:04:31,169 — zenml.core.pipelines.training_pipeline — INFO — Viewing statistics. If magic=False then a new window will open up with a notebook for evaluation. If magic=True, then an attempt will be made to append to the current notebook.
Starting Bokeh server version 1.4.0 (running on Tornado 6.1)
User authentication hooks NOT provided (default user enabled)
Launching server at http://localhost:39455

In [12]:
# Creates a notebook for evaluation
training_pipeline.evaluate()

2021-01-22 13:41:27,134 — zenml.core.pipelines.training_pipeline — INFO — Evaluating pipeline. If magic=False then a new window will open up with a notebook for evaluation. If magic=True, then an attempt will be made to append to the current notebook.
