
## TFX: Why?
  - Provides an end-to-end solution for taking ML from research to production, including model versioning.
  - Bundles all the tasks into one package, ranging from
    - Reading data
    - Preprocessing
    - Model building / validation
    - Model deployment
    - Model versioning with rollback

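If we do not use TFX and instead keep the preprocessing as separate steps that are independent of the model, we cannot reliably roll back: when the preprocessing changes along with a model version update, rolling back only the model leaves the older model paired with the newer preprocessing.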
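
A toy sketch of the rollback argument, assuming each model version is stored together with the preprocessing it was trained with. `ModelVersion` and `ModelRegistry` are hypothetical names, and a single weight stands in for a trained model. This illustrates the idea only and is not how TFX stores models internally.

```python
# Toy illustration (not the TFX API): every model version carries the
# preprocessing it was trained with, so a rollback restores both together.
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ModelVersion:
    preprocess: Callable[[float], float]  # preprocessing used at training time
    weight: float                         # stand-in for the trained model

    def predict(self, raw_x: float) -> float:
        # Serving always applies the preprocessing bundled with this version.
        return self.weight * self.preprocess(raw_x)


class ModelRegistry:
    """Hypothetical registry that keeps every pushed version."""

    def __init__(self) -> None:
        self.versions: List[ModelVersion] = []
        self.active = -1

    def push(self, version: ModelVersion) -> None:
        self.versions.append(version)
        self.active = len(self.versions) - 1

    def rollback(self) -> None:
        if self.active <= 0:
            raise RuntimeError("no earlier version to roll back to")
        self.active -= 1

    def predict(self, raw_x: float) -> float:
        return self.versions[self.active].predict(raw_x)


registry = ModelRegistry()
# v1 was trained on inputs divided by 10; v2 on inputs divided by 100.
registry.push(ModelVersion(preprocess=lambda x: x / 10, weight=2.0))
registry.push(ModelVersion(preprocess=lambda x: x / 100, weight=20.0))
print(registry.predict(50.0))  # 10.0 (v2 with v2's preprocessing)
registry.rollback()
print(registry.predict(50.0))  # 10.0 (v1 with v1's preprocessing)
```

Because the preprocessing travels with the version, the rollback stays consistent. Pairing v1's weight with v2's preprocessing would instead give `2.0 * 0.5 = 1.0`.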

### TFX Components:
A TFX pipeline consists of the following components, which we discuss here:
  - ExampleGen: Reads data into the TFX pipeline
  - StatisticsGen: Calculates exploratory statistics about the data
  - SchemaGen: Creates a data schema based on the statistics
  - ExampleValidator: Analyses the data for anomalies / inconsistencies
  - Transform: Performs the necessary transformations on the data
  - Trainer: Trains the model
  - Evaluator: Evaluates model performance to determine whether the model is ready for deployment or should be discarded
  - Pusher: Deploys the model to production
  ![TFX pipeline components](https://www.tensorflow.org/tfx/guide/images/diag_all.png)
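
The data-validation part of this flow (StatisticsGen, then SchemaGen, then ExampleValidator) can be illustrated with a toy, pure-Python analogue. The functions `generate_statistics`, `infer_schema` and `validate` are hypothetical. A real pipeline would use TFDV through those components instead, which compute much richer statistics and schemas.

```python
# Toy analogue of StatisticsGen -> SchemaGen -> ExampleValidator.
# All function names here are hypothetical; this is NOT the TFDV/TFX API.
from typing import Any, Dict, List

Example = Dict[str, Any]


def generate_statistics(examples: List[Example]) -> Dict[str, Dict[str, Any]]:
    """StatisticsGen analogue: per-feature types, presence count and observed values."""
    stats: Dict[str, Dict[str, Any]] = {}
    for ex in examples:
        for name, value in ex.items():
            s = stats.setdefault(name, {"types": set(), "count": 0, "values": set()})
            s["types"].add(type(value).__name__)
            s["count"] += 1
            s["values"].add(value)
    return stats


def infer_schema(stats: Dict[str, Dict[str, Any]], num_examples: int) -> Dict[str, Dict[str, Any]]:
    """SchemaGen analogue: expected type, required-ness and (for strings) the allowed domain."""
    schema: Dict[str, Dict[str, Any]] = {}
    for name, s in stats.items():
        (ftype,) = s["types"]  # assumes one consistent type per feature in the training data
        schema[name] = {
            "type": ftype,
            "required": s["count"] == num_examples,
            "domain": set(s["values"]) if ftype == "str" else None,
        }
    return schema


def validate(examples: List[Example], schema: Dict[str, Dict[str, Any]]) -> List[str]:
    """ExampleValidator analogue: report anomalies in new data relative to the schema."""
    anomalies: List[str] = []
    for i, ex in enumerate(examples):
        for name, spec in schema.items():
            if name not in ex:
                if spec["required"]:
                    anomalies.append(f"example {i}: missing required feature '{name}'")
                continue
            value = ex[name]
            actual = type(value).__name__
            if actual != spec["type"]:
                anomalies.append(f"example {i}: '{name}' has type {actual}, expected {spec['type']}")
            elif spec["domain"] is not None and value not in spec["domain"]:
                anomalies.append(f"example {i}: '{name}' value {value!r} is outside the domain")
        for name in sorted(ex.keys() - schema.keys()):
            anomalies.append(f"example {i}: unexpected feature '{name}'")
    return anomalies


train = [{"age": 34, "country": "NP"}, {"age": 51, "country": "IN"}, {"age": 27, "country": "NP"}]
schema = infer_schema(generate_statistics(train), num_examples=len(train))

serving = [{"age": 40, "country": "US"}, {"country": "IN"}, {"age": "42", "country": "NP"}]
for anomaly in validate(serving, schema):
    print(anomaly)
```

The schema is inferred only from the training data, so an unseen country, a missing feature and a type change in the serving data are all reported.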

## TF Libraries for the Components:
  - StatisticsGen / SchemaGen / ExampleValidator: [(TFDV) TensorFlow Data Validation](https://www.tensorflow.org/tfx/guide/tfdv) generates statistics, infers and inspects the schema, and analyses/validates the data. It is also used to detect and record drift/anomalies, which helps identify when a model needs retraining.
  - ExampleValidator: (TFMD) TensorFlow Metadata defines the formats in which the data schema and the summary statistics of the data are stored, which the validation step relies on.
  - Transform: [(TFT) TensorFlow Transform](https://www.tensorflow.org/tfx/guide/tft) preprocesses features, including transformations that need a full pass over the dataset (e.g. computing a mean or a vocabulary).
  - Evaluator: (TFMA) TensorFlow Model Analysis evaluates models and can compute metrics over large amounts of data in a distributed way.
  - MLMD: [ML Metadata](https://www.tensorflow.org/tfx/guide/mlmd) stores the relevant metadata about the ML workflow other than the data statistics themselves, such as the pipeline's artifacts and component runs.
  - Pusher: [SavedModel](https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/saved_model) + [TF Serving](https://www.tensorflow.org/tfx/tutorials/serving/rest_simple). SavedModel is the recommended, portable serialization format for TF models. It can be served directly or converted for other targets such as mobile (TFLite) or JavaScript (TF.js). TF Serving loads models exported as SavedModels and serves them.
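
Once a SavedModel has been pushed, clients can query it through TF Serving's REST predict endpoint. The endpoint has the form `/v1/models/<name>[/versions/<n>]:predict` and takes a JSON body with an `instances` list. Port 8501 is TF Serving's default REST port. The sketch below only builds such a request with the standard library and does not send it. The host, the model name `my_model`, version `2` and the input values are placeholders, and `build_predict_request` is a hypothetical helper.

```python
# Builds (but does not send) a TF Serving REST predict request.
# Host, model name, version and inputs are placeholders for illustration.
import json
import urllib.request
from typing import List, Optional


def build_predict_request(host: str, model_name: str, instances: List[list],
                          version: Optional[int] = None,
                          port: int = 8501) -> urllib.request.Request:
    path = f"/v1/models/{model_name}"
    if version is not None:
        path += f"/versions/{version}"  # address one specific pushed version
    url = f"http://{host}:{port}{path}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(url, data=body,
                                  headers={"Content-Type": "application/json"},
                                  method="POST")


req = build_predict_request("localhost", "my_model", [[1.0, 2.0, 5.0]], version=2)
print(req.full_url)       # http://localhost:8501/v1/models/my_model/versions/2:predict
print(req.data.decode())  # {"instances": [[1.0, 2.0, 5.0]]}

# Against a running TF Serving instance, the response JSON has a "predictions" key:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["predictions"])
```

Leaving out `/versions/<n>` targets whichever version the server is currently serving. Pinning a version is useful when comparing or rolling back model versions.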



## Visualisation:
We can inspect the data visually with the TFDV library, using
  - `tfdv.load_statistics()` to load previously computed statistics
  - `tfdv.visualize_statistics()` to render them

## Evaluator:
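
The Evaluator decides whether a trained model is good enough to deploy ("blesses" it) or should be discarded. A toy version of that decision follows. It assumes a single metric where higher is better (e.g. accuracy), an absolute lower bound, and a maximum allowed regression against the currently deployed baseline. `is_blessed` is a hypothetical helper. TFMA expresses comparable value and change thresholds in its own evaluation config, which this sketch does not reproduce.

```python
# Toy sketch of an Evaluator-style "blessing" decision (not the TFMA API).
# Assumption: a single metric where higher is better, e.g. accuracy.
from typing import Optional


def is_blessed(candidate: float, baseline: Optional[float],
               lower_bound: float, max_regression: float = 0.0) -> bool:
    """Bless the candidate only if it clears an absolute threshold and does not
    regress more than `max_regression` versus the deployed baseline (if any)."""
    if candidate < lower_bound:
        return False  # fails the absolute (value) threshold
    if baseline is not None and candidate < baseline - max_regression:
        return False  # worse than the deployed model by more than allowed
    return True


print(is_blessed(0.91, baseline=0.89, lower_bound=0.80))   # True
print(is_blessed(0.78, baseline=None, lower_bound=0.80))   # False
print(is_blessed(0.88, baseline=0.89, lower_bound=0.80))   # False
print(is_blessed(0.885, baseline=0.89, lower_bound=0.80, max_regression=0.01))  # True
```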

## [TFX Guidelines](https://www.tensorflow.org/tfx/guide/train):
If using TFX to develop pipelines, then
- The model's input layer must consume from the SavedModel created by the Transform component
- The Transform graph must be included in the model, so that the transformations are exported along with the model in the SavedModel
- The model must be saved as both a SavedModel (used by TF Serving) and an EvalSavedModel (used by TF Model Analysis for evaluation)
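
These guidelines can be illustrated with a toy sketch of the two phases behind a TFT transform. First, an "analyze" step makes a full pass over the training data to compute constants, here a mean and standard deviation for a z-score. Second, those constants are frozen into a function that is exported together with the model, so serving applies exactly the training-time preprocessing. `analyze_z_score` and `ExportedModel` are hypothetical names. This is not the `tensorflow_transform` API.

```python
# Toy sketch of "analyze, then freeze the transform into the exported model".
# Names are hypothetical; this is NOT the tensorflow_transform API.
import math
from typing import Callable, List


def analyze_z_score(values: List[float]) -> Callable[[float], float]:
    # Full pass over the training data computes the constants once.
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

    def transform(x: float) -> float:
        # mean/std are captured constants; they are NOT recomputed on serving data.
        return (x - mean) / std if std > 0 else 0.0

    return transform


class ExportedModel:
    """Hypothetical export that bundles the transform with the model weights."""

    def __init__(self, transform: Callable[[float], float], weight: float, bias: float) -> None:
        self.transform = transform
        self.weight = weight
        self.bias = bias

    def serve(self, raw_x: float) -> float:
        # Raw input in, prediction out: the client never re-implements preprocessing.
        return self.weight * self.transform(raw_x) + self.bias


train_ages = [20.0, 30.0, 40.0, 50.0]
model = ExportedModel(analyze_z_score(train_ages), weight=2.0, bias=1.0)
print(model.serve(35.0))  # 1.0, since 35.0 is the training mean
```

Because the client sends raw features and the exported model applies the frozen transform, training and serving cannot drift apart in how they preprocess inputs.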

REF: 

## Additional Dependencies:
  - Apache Beam: Develop on a single node, run on multiple nodes

  We want our ML pipeline to run in parallel for greater speed while also being scalable. E.g. we would likely develop the ML model on a single computer, but in production we want to run the same data processing on a multi-node cluster to handle large volumes of data in parallel. Apache Beam provides this abstraction: whatever we research/develop on a single node scales to a multi-node cluster with little or no code modification, mostly by choosing a different runner.
  - Apache Airflow / Kubeflow: Orchestrate, deploy, scale and manage ML pipelines automatically.

  Some example cases Airflow / Kubeflow handle:
    - Define 100 nodes / input file paths / checkpoint paths / source-code GitHub paths
    - Install the required libraries on a 100-node cluster
    - Then deploy the ML model to all of them
    - When receiving a tremendous volume of requests for the ML model, scale the nodes up
    - In a disaster where 50 nodes have gone down, bring another 50 up to compensate
    - Efficiently utilise CPU / GPU and minimise cost
    - Train the model cheaply using AWS spot instances (i.e. use lower-cost spot capacity when available)
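
A toy analogue of the Apache Beam idea "write the pipeline once, choose where it runs". The word-count logic and the combine step are defined once, and only `num_workers` changes how the work is executed. Threads stand in for cluster workers. `count_words` and `run` are hypothetical helpers, not the Beam API. In real Beam, the same pipeline would be submitted to e.g. the DirectRunner locally or to the DataflowRunner, FlinkRunner or SparkRunner on a cluster via pipeline options.

```python
# Toy analogue of Beam's "define once, pick a runner" idea; NOT the Apache Beam API.
# The pipeline logic (count_words + combine) is written once; only `num_workers`
# changes how it is executed. Threads stand in for cluster workers here.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List


def count_words(lines: List[str]) -> Counter:
    """Per-shard 'map' step: independent work that any worker can run."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def run(lines: List[str], num_workers: int = 1) -> Counter:
    # Split the input into one shard per worker (round-robin).
    shards = [lines[i::num_workers] for i in range(num_workers)]
    if num_workers == 1:
        partials = [count_words(s) for s in shards]  # like running locally
    else:
        with ThreadPoolExecutor(max_workers=num_workers) as pool:
            partials = list(pool.map(count_words, shards))
    # 'Combine' step: merge the partial results into the final answer.
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


lines = ["TFX uses Beam", "Beam scales TFX", "beam"]
print(run(lines, num_workers=1) == run(lines, num_workers=3))  # True
print(run(lines, num_workers=3)["beam"])  # 3
```

The result does not depend on the number of workers because the per-shard step is independent and the combine step (adding counts) is associative. This property is what lets a Beam pipeline move from one machine to a cluster unchanged.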

REF : https://docs.agilestacks.com/article/gkyq26pzmr-creating-an-ml-pipeline