## **Working of the `with` Keyword**

[GPT_Explanation](https://chatgpt.com/share/681f4657-6520-8006-b695-a215a1783899)

[Real_Python_Implementation](https://www.youtube.com/watch?v=iba-I4CrmyA)
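
As a quick reminder of what `with` does, here is a minimal sketch of a context manager. `TrackedRun` is a made-up toy class, not part of MLFlow; it only shows that `__enter__` runs before the block and `__exit__` runs after it, even when the block raises. This is the same mechanism that lets `with mlflow.start_run():` close the run automatically.

```python
events = []  # records the order in which things happen


class TrackedRun:
    """Toy stand-in for a run object; hypothetical, not an MLFlow class."""

    def __enter__(self):
        events.append("start")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        events.append("end")
        return False  # do not swallow exceptions raised inside the block


# Normal case: enter, run the body, exit
with TrackedRun() as run:
    events.append("body")

# Error case: __exit__ still runs before the exception propagates
try:
    with TrackedRun():
        raise ValueError("training failed")
except ValueError:
    events.append("error handled")

print(events)
```

The design point is that cleanup lives in `__exit__`, so the caller cannot forget it.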


## **Before Starting Project**

`source mlflow_env/bin/activate`

`mlflow ui`


## **ML Flow**

ML Flow is an open-source tool for tracking Machine Learning projects, for example their hyperparameters and performance metrics.

## **Lifecycle of a Data Science Project**

**Data Pre**

**EDA**

**Feature Engineering**

**Model Training**

**Model Validation**

**Deployment**

**Monitoring**

## **How ML Flow is used by Data Scientists**

- Experiment Tracking

- Hypothesis Testing in EDA

- Code Structuring (Pipeline)

- Model Packaging and Dependency Management

- Evaluating Hyperparameters: track every combination of hyperparameters

- Compare the results of the models and deploy the best-performing model

## **How ML Flow is used by ML Engineers**

- Manage the lifecycle of trained models both pre and post deployment

- Deploy models securely to the production environment

- Manage Deployment Dependencies


## **ML Flow Starter**

### **ML Flow Tracking Server**

To track our experiments we need a tracking server.

To start the server we use `mlflow ui`.

Then we point MLFlow at that server by setting the tracking URI, so that everything is tracked there: `mlflow.set_tracking_uri("http://127.0.0.1:5000")`

Then we log our parameters, performance metrics, and model as below:

```py

import mlflow
import mlflow.sklearn
from mlflow.models import infer_signature

mlflow.set_experiment("Day2")

# Start the MLFlow run
with mlflow.start_run():
  # Log the hyperparameters
  mlflow.log_params(params)

  # Log the accuracy metric
  mlflow.log_metric("Accuracy", accuracy)

  # Set a tag that we can use to remind ourselves what this run was for
  mlflow.set_tag("Training Info", "Basic LR Model for Iris Data")

  # Infer the model signature
  signature = infer_signature(x_train, model.predict(x_train))

  # Log the model
  model_info = mlflow.sklearn.log_model(
    sk_model=model,
    artifact_path="Iris Mode",
    signature=signature,
    input_example=x_train,
    registered_model_name="Tracking-quickstart"
  )

```

A new folder named `mlruns` is created which stores all the info about our experiments. We should not delete the `mlruns` folder.
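
The logging snippet above assumes that `params`, `model`, `accuracy`, and `x_train` already exist. A minimal sketch of how they might be produced with scikit-learn is shown here; the hyperparameter values and the 80/20 split are assumptions for illustration, not the values used in the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the Iris data and hold out a test split
x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)

# Hypothetical hyperparameters; this dict is what mlflow.log_params would receive
params = {"solver": "lbfgs", "max_iter": 1000, "random_state": 42}

# Train the logistic regression model
model = LogisticRegression(**params)
model.fit(x_train, y_train)

# Accuracy on the held-out split; this value is what mlflow.log_metric would receive
accuracy = accuracy_score(y_test, model.predict(x_test))
print(f"Accuracy: {accuracy:.3f}")
```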

## **Tracking a ML Project with MLFlow**

`project.ipynb`

Let's create a separate folder for our ML Project.

Once the ML Project is set up, we need to keep track of the performance metrics produced by each set of hyperparameters. We use `ML Flow` for this.


## **Inference of Model Artifacts**

### **UI**

`path` : Path of artifacts

**Validate Before Deployment**

As soon as training completes, the model is saved as `model.pkl` in the run's `artifacts`, but before using the model in production we need to validate it.

The base code for this is already provided in the UI.

```Py

# Validate The Model

import mlflow
from mlflow.models import Model

model_uri = 'runs:/cd866b98bcfb4235bbe3b225ece9fce9/Iris Mode'
# The model is logged with an input example
pyfunc_model = mlflow.pyfunc.load_model(model_uri)

predictions = pyfunc_model.predict(x_test)

predictions

```

`mlflow.pyfunc.load_model` loads the model as a generic Python function (`pyfunc`), so it exposes the same `predict` interface regardless of the library that trained it.
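
The snippet from the UI only produces predictions. A minimal sketch of an actual validation check could compare those predictions with the true labels and gate on a threshold. The helper names, the example labels, and the `0.9` threshold are all hypothetical; in the notebook the inputs would be `pyfunc_model.predict(x_test)` and the true test labels.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(int(p == t) for p, t in zip(predictions, labels))
    return correct / len(labels)


def passes_validation(predictions, labels, threshold=0.9):
    """Return True when accuracy reaches the (hypothetical) threshold."""
    return accuracy(predictions, labels) >= threshold


# Made-up values standing in for model predictions and true test labels
example_predictions = [0, 1, 2, 2, 1, 0, 1, 2, 0, 0]
example_labels = [0, 1, 2, 2, 1, 0, 1, 1, 0, 0]

print(accuracy(example_predictions, example_labels))
print(passes_validation(example_predictions, example_labels))
```

Only a model that passes such a check would then be considered for registration or deployment.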

## **Model Registry Tracking**

Model Registry is a centralized model store, set of APIs, and UI to collaboratively manage the full lifecycle of an MLFlow Model. It provides model lineage (which MLFlow experiments and runs produced the model), model versioning, model aliasing, model tagging, and annotations.

In the previous code, we registered the model directly without validating whether it is the best model, because we passed the `registered_model_name="Tracking-quickstart"` argument to the `log_model` function, which registers the model and maintains its versions.

To avoid this, we should not pass this parameter. If we do not pass it, the `UI` shows a `Register Model` button for the run. If the model has already been registered, it shows `Model Registered` with its version instead.

How do we choose the best model? We compare the runs of the experiment, find the run with the highest accuracy, and register the model from that run.
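
The selection step can also be sketched in code. In this sketch the run records are made-up dictionaries (real ones would come from the MLFlow UI or API), and `pick_best_run` and `register_best_run` are hypothetical helper names. `register_best_run` calls `mlflow.register_model`, so it needs MLFlow installed and a reachable tracking server; it is defined here but not called.

```python
def pick_best_run(runs, metric="Accuracy"):
    """Return the run record with the highest value of `metric`."""
    scored = [run for run in runs if metric in run["metrics"]]
    if not scored:
        raise ValueError(f"no run logged the metric {metric!r}")
    return max(scored, key=lambda run: run["metrics"][metric])


def register_best_run(best_run, artifact_path="Iris Mode", name="Tracking-quickstart"):
    """Register the model logged by `best_run` (requires a tracking server)."""
    import mlflow  # imported here so the selection logic above runs without MLFlow

    model_uri = f"runs:/{best_run['run_id']}/{artifact_path}"
    return mlflow.register_model(model_uri, name)


# Hypothetical run records for illustration only
runs = [
    {"run_id": "run-a", "metrics": {"Accuracy": 0.93}},
    {"run_id": "run-b", "metrics": {"Accuracy": 0.97}},
    {"run_id": "run-c", "metrics": {}},  # a run that never logged accuracy
]

best = pick_best_run(runs)
print(best["run_id"])
```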

Okay, we've saved our best model but how are we going to predict from the saved best model?

```Py

# Inferencing the Model from the Model Registry (Prediction from the Best Model)

import mlflow.sklearn

model_name = 'Tracking-quickstart'
model_version = '6' # Version of the best model {latest, number_version, ..}

# Path for the model from the Model Registry
model_uri = f"models:/{model_name}/{model_version}"

model = mlflow.sklearn.load_model(model_uri)

model.predict(x_test)

```


## **House Price Prediction (MLFlow)**

Refer to `ML_Project/Phase2(House).ipynb` file


## **ANN with MLFlow**

Refer to `ML_Flow(ANN_Project)`

### **Pipeline**

- Build an ANN Project

- Run a hyperparameter sweep on a training script.

- Compare the results of the runs in the MLFlow UI

- Choose the best run and register it as a model

- Deploy the Model to a REST API

- Build a container image suitable for deployment to a cloud platform

**Libraries**

`keras`

`tensorflow`

`hyperopt` : Hyperparameter Tuning for the `ANN`

**Documentation**

[Hyperopt](https://hyperopt.github.io/hyperopt/)

**Dataset**

```Py

# Load the Wine Quality (white) data
import pandas as pd

data = pd.read_csv(
  'https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv',
  sep=';'
)

data

```

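In the above dataset, the `target` is `quality`, an integer score for each wine (values from 3 to 9 in this file). The model below predicts it as a number, with a single output unit trained on mean squared error, so the task is handled as regression rather than classification.

Now, we need to build an ANN to predict it.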

```Py

model.compile(
  optimizer=keras.optimizers.SGD(
    learning_rate=params['lr'], momentum=params["momentum"]
  )
)

```

We could also treat the `Dense` layers (for example, the number of units) as hyperparameters to tune, but that would take a lot of time. Instead we tune only the `learning_rate` and `momentum` hyperparameters.

Now, we train our model for different combinations of `learning_rate` and `momentum` values and track a separate run for each combination.

To generate these combinations of hyperparameter values we use the `Hyperopt` library.

With the model compilation code written, the full training function is:

```Py

# ANN Model

import numpy as np
import mlflow
import mlflow.tensorflow
from hyperopt import STATUS_OK
from tensorflow import keras


def train_model(params, epochs, train_x, train_y, valid_x, valid_y, test_x, test_y):

	# Normalization statistics, computed on the training data only
	mean = np.mean(train_x,axis=0) # Mean of each col
	var = np.var(train_x,axis=0) # Var of Each col

	model = keras.Sequential(
		[
			keras.Input([train_x.shape[1]]),
			keras.layers.Normalization(mean=mean,variance=var),
			keras.layers.Dense(64,activation='relu'),
			keras.layers.Dense(1) # Single output: the predicted quality score
		]
	)

	# Model Compile
	model.compile(optimizer=keras.optimizers.SGD(
		learning_rate=params['lr'],momentum=params["momentum"]
	),
	loss="mean_squared_error",
	metrics=[keras.metrics.RootMeanSquaredError()])

	# Train and Track the Hyperparam with MLFlow tracking

	with mlflow.start_run(nested=True): # As we are trying with multiple combination, nested = True
		model.fit(train_x,train_y,validation_data=(valid_x,valid_y),
						epochs=epochs,
						batch_size=64)

		# Evaluate the model
		eval_result = model.evaluate(valid_x, valid_y, batch_size=64)

		eval_rmse = eval_result[1]

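		# Log the params (log_params takes the whole dict) and the evaluation metric
		mlflow.log_params(params)
		mlflow.log_metric("Eval Rms", eval_rmse)

		# Log the model (`signature` is assumed to be defined earlier with infer_signature)
		mlflow.tensorflow.log_model(
			model,
			"model",
			signature=signature
		)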

		return {
			'loss': eval_rmse,
			'status': STATUS_OK,
			'model': model
		}
```
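
The `Normalization` layer above is given the per-column mean and variance of the training data. A minimal NumPy sketch of the arithmetic it is assumed to perform, `(x - mean) / sqrt(var)`, is shown here. The small feature matrix is made up for illustration, and the real Keras layer additionally guards against division by zero.

```python
import numpy as np

# Hypothetical feature matrix: 4 samples, 2 columns on very different scales
train_x = np.array(
    [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0], [4.0, 400.0]]
)

mean = np.mean(train_x, axis=0)  # mean of each column
var = np.var(train_x, axis=0)    # variance of each column

# Standardize each column: subtract its mean, divide by its standard deviation
normalized = (train_x - mean) / np.sqrt(var)

print(normalized.mean(axis=0))  # per-column mean after scaling
print(normalized.var(axis=0))   # per-column variance after scaling
```

After this step both columns are on the same scale, which usually makes SGD less sensitive to the choice of learning rate.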

We need to create an `objective` function for `Hyperopt` to minimize.

```Py

# Objective Function for Hyperopt

def objective(params):

  # MlFlow will track the params and results for each run

  result = train_model(
    params,
    epochs=3,
    train_x=train_x,
    train_y=train_y,
    valid_x=valid_x,
    valid_y=valid_y,
    test_x=test_x,
    test_y=test_y
  )

  return result

```

**Parameters**

```Py

# Set all the parameters (the search space)
import numpy as np
from hyperopt import hp

space = {
 'lr': hp.loguniform('lr',np.log(1e-5),np.log(1e-1)),
 'momentum': hp.uniform("momentum",0.0,1.0)
}

```
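
To make the search space above more concrete, here is a standard-library sketch of what the two distributions mean. It does not use Hyperopt; `sample_space` is a hypothetical helper that draws the learning rate uniformly in log space (the idea behind `hp.loguniform`) and the momentum uniformly between 0 and 1.

```python
import math
import random


def sample_space(rng):
    """Draw one (lr, momentum) pair the way the search space describes."""
    # Log-uniform: sample uniformly in log space, then exponentiate
    lr = math.exp(rng.uniform(math.log(1e-5), math.log(1e-1)))
    # Plain uniform between 0 and 1
    momentum = rng.uniform(0.0, 1.0)
    return {"lr": lr, "momentum": momentum}


rng = random.Random(0)  # fixed seed so the draws are repeatable
samples = [sample_space(rng) for _ in range(1000)]

# Roughly half of the learning rates fall below 1e-3, the midpoint in log space
share_small = sum(s["lr"] < 1e-3 for s in samples) / len(samples)
print(f"share of lr below 1e-3: {share_small:.2f}")
```

Sampling in log space gives small learning rates such as `1e-5` the same chance as large ones such as `1e-2`, which a plain uniform draw would not.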

**Parent Run**

```Py

# Set the experiment

import mlflow
import mlflow.tensorflow
from hyperopt import fmin, tpe, Trials


mlflow.set_experiment("/wine-quality")

# Create a parent run so that the nested runs have a run to nest under
with mlflow.start_run():

  # Conduct Hyperparameter search using Hyperopt
  trials = Trials()
  best = fmin(
    fn=objective,
    space=space,
    algo=tpe.suggest,
    max_evals=4,
    trials=trials
  )

  # Fetch the details of the best run
  best_run = sorted(trials.results, key=lambda x: x['loss'])[0]

  # Log the best parameters, loss and model
  for key, value in best.items():
    mlflow.log_param(key, value)

  mlflow.log_metric("Eval_RMSE", best_run['loss'])
  mlflow.tensorflow.log_model(
    best_run['model'],
    "model",
    signature=signature,
  )

  # Print out the best params and loss
  print(f"Best Param: {best}")
  print(f"Best Eval RMSE: {best_run['loss']}")

```

The `fmin` function calls the `objective` function 4 times because `max_evals=4`. Each call trains a new model with a new hyperparameter combination. The `results` list of the `Trials` object contains the result of every call, and the entry with the lowest `loss` is taken as the best run.
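
The call-collect-select loop can be sketched without Hyperopt. In this sketch plain random search stands in for the TPE algorithm, and `toy_objective` is a made-up function that replaces real model training; only the overall shape (call the objective `max_evals` times, keep every result, pick the lowest loss) mirrors the code above.

```python
import random


def toy_objective(params):
    """Hypothetical stand-in for training: loss is lowest near lr=0.01, momentum=0.9."""
    loss = (params["lr"] - 0.01) ** 2 + (params["momentum"] - 0.9) ** 2
    return {"loss": loss, "status": "ok", "params": params}


def random_search(objective, max_evals, seed=0):
    """Call `objective` max_evals times and collect every result."""
    rng = random.Random(seed)
    results = []
    for _ in range(max_evals):
        params = {"lr": rng.uniform(1e-5, 1e-1), "momentum": rng.uniform(0.0, 1.0)}
        results.append(objective(params))
    return results


results = random_search(toy_objective, max_evals=4)

# Same selection step as in the notes: sort by loss and take the first entry
best_run = sorted(results, key=lambda r: r["loss"])[0]
print(best_run["params"], best_run["loss"])
```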

**Inferencing (Load and Predict)**

```Py

# Inferencing Model

import mlflow

model_uri = 'runs:/d6afa02fdf47443bb97a85e0068fd121/model'

loaded_model = mlflow.pyfunc.load_model(model_uri)
predictions = loaded_model.predict(test_x)
predictions

```


## **DVC and DagsHub**

### **DVC (Data Version Control)**

DVC is a tool for versioning data.

It is useful if you store and process data files or datasets to produce other data or machine learning models, and you want to:

- track and save data and machine learning models the same way you capture code;

- create and switch between versions of data and ML models easily;

- understand how datasets and ML artifacts were built in the first place;

- compare model metrics among experiments;

- adopt engineering tools and best practices in data science projects.

`pip install dvc`

**Initialize the DVC**

`git` must already be initialized in the project before DVC can be initialized.

`dvc init`

`.dvc` folder is created.

Note that git should not track the `Dataset` folder.

Now, add the files or data that you want to keep track of. `dvc add location/file.txt`

Whenever the dataset changes, add the file again using `dvc add 'Datasets(DVC)/Day1.txt'`

Then git tracks only the `.dvc` file (which stores the hash of the data) and the `.gitignore` file, not the dataset itself. `git add 'Datasets(DVC)/Day1.txt.dvc' 'Datasets(DVC)/.gitignore'`
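
The reason the file must be re-added after every change is that the `.dvc` file records a hash of the data's contents. A minimal sketch of that idea using an MD5 content hash follows; this illustrates the concept only and is not DVC's actual implementation. `file_md5` is a hypothetical helper and the file contents are made up.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path):
    """Return the MD5 hex digest of a file's contents, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as folder:
    data_file = Path(folder) / "Day1.txt"

    # First version of the data
    data_file.write_bytes(b"first version of the data\n")
    hash_v1 = file_md5(data_file)

    # The data changes, so the content hash changes too
    data_file.write_bytes(b"second version of the data\n")
    hash_v2 = file_md5(data_file)

print(hash_v1)
print(hash_v2)
print(hash_v1 != hash_v2)
```

Because git only stores the small hash file, switching between commits switches which version of the data that hash points to.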


### **Next Day**

**Revise Everything Before Starting**

https://www.udemy.com/course/complete-mlops-bootcamp-with-10-end-to-end-ml-projects/learn/lecture/46093051#overview
