AutoML is a feature of Azure Databricks that automates the process of training and validating machine learning models. It tries multiple algorithms and hyperparameters with your data to find the best-performing model, which significantly reduces the effort needed to run and track model training experiments.

An AutoML experiment works as follows:

- You start an AutoML experiment, specifying a table in your Azure Databricks workspace as the data source for training and the specific performance metric for which you want to optimize.
- The AutoML experiment generates multiple MLflow runs, each producing a notebook with code to preprocess the data before training and validating a model. The trained models are saved as artifacts in the MLflow runs or files in the DBFS store.
- The experiment runs are listed in order of performance, with the best performing models shown first. You can explore the notebooks that were generated for each run, choose the model you want to use, and then register and deploy it.

# PowerShell to get a Databricks environment provisioned

rm -r mslearn-databricks -f

git clone https://github.com/MicrosoftLearning/mslearn-databricks

./mslearn-databricks/setup.ps1

To provision the workspace in a specific region, pass the region name as an argument, for example:

./mslearn-databricks/setup.ps1 eastus

# Exercise

## Upload training data to a SQL Warehouse

- Download the penguins.csv file from https://raw.githubusercontent.com/MicrosoftLearning/mslearn-databricks/main/data/penguins.csv to your local computer, saving it as penguins.csv.
- In the Azure Databricks workspace portal, in the sidebar, select (+) New and then select Add or upload data. 
- In the Add data page, select Create or modify table and upload the penguins.csv file you downloaded to your computer.
- In the Create or modify table from file upload page, select the default schema and set the table name to penguins. 
- Then select Create table.
- When the table has been created, review its details.
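Before uploading, it can help to check the file locally. The following is a minimal sketch using only the Python standard library; the helper names `download_csv` and `summarize_csv` are hypothetical, and the sketch makes no assumption about the file beyond it being a CSV with a header row.

```python
import csv
import io
import urllib.request

PENGUINS_URL = (
    "https://raw.githubusercontent.com/MicrosoftLearning/"
    "mslearn-databricks/main/data/penguins.csv"
)


def download_csv(url: str = PENGUINS_URL, dest: str = "penguins.csv") -> str:
    """Download the CSV to a local file and return the local path."""
    urllib.request.urlretrieve(url, dest)
    return dest


def summarize_csv(text: str) -> dict:
    """Return the column names, row count, and number of rows with a blank field."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    incomplete = sum(
        1
        for row in rows
        if any(v is None or str(v).strip() == "" for v in row.values())
    )
    return {
        "columns": list(reader.fieldnames or []),
        "rows": len(rows),
        "incomplete_rows": incomplete,
    }


# Example usage (requires network access):
#   path = download_csv()
#   with open(path, newline="", encoding="utf-8") as f:
#       print(summarize_csv(f.read()))
```

Rows with blank fields are worth noting because the notebooks that AutoML generates include preprocessing code, and knowing where values are missing makes that code easier to follow.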

## Create an AutoML experiment

Now that you have some data, you can use it with AutoML to train a model.

- In the sidebar on the left, select Experiments.
- On the Experiments page, find the Classification tile and select Start training.
- Configure the AutoML experiment with the following settings:
  - Cluster: Select your cluster
  - Input training dataset: Browse to the default database and select the penguins table
  - Prediction target: Species
  - Experiment name: Penguin-classification
  - Advanced configuration:
    - Evaluation metric: Precision
    - Training frameworks: lightgbm, sklearn, xgboost
    - Timeout: 5
    - Time column for training/validation/testing split: Leave blank
    - Positive label: Leave blank
    - Intermediate data storage location: MLflow Artifact
- Use the Start AutoML button to start the experiment. Close any information dialogs that are displayed.
- Wait for the experiment to complete. You can view details of the runs that are generated under the Runs tab.
- The experiment ends when the five-minute timeout elapses. Refresh the runs list to show the run that produced the best-performing model (based on the precision metric you selected) at the top.
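The same configuration can also be expressed with the AutoML Python API instead of the UI. The sketch below assumes a notebook attached to a Databricks Runtime ML cluster (where the `databricks.automl` module is available) and assumes the uploaded table can be addressed as `default.penguins`. The helper name `run_penguin_automl` is hypothetical. All three training frameworks are left enabled, so no frameworks are excluded.

```python
# Settings mirrored from the UI steps above.
AUTOML_SETTINGS = {
    "target_col": "Species",            # Prediction target
    "primary_metric": "precision",      # Evaluation metric
    "timeout_minutes": 5,               # Timeout
    "experiment_name": "Penguin-classification",
}


def run_penguin_automl(table_name: str = "default.penguins"):
    """Start an AutoML classification experiment.

    Only works on a Databricks Runtime ML cluster; the import is done
    inside the function so this module can be loaded elsewhere.
    """
    from databricks import automl  # assumption: Databricks Runtime ML

    summary = automl.classify(dataset=table_name, **AUTOML_SETTINGS)
    # summary.best_trial describes the best run, including the URL of its
    # generated notebook and the MLflow path of the trained model.
    return summary
```

In a notebook, calling `run_penguin_automl()` should block until the timeout elapses and then return a summary object describing the trials.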

## Deploy the best performing model

Having run an AutoML experiment, you can explore the best performing model that it generated.

- In the Penguin-classification experiment page, select View notebook for best model to open the notebook used to train the model in a new browser tab.
- Scroll through the cells in the notebook, noting the code that was used to train the model.
- Close the browser tab containing the notebook to return to the Penguin-classification experiment page.
- In the list of runs, select the name of the first run (which produced the best model) to open it.
- In the Artifacts section, note that the model has been saved as an MLflow artifact. Then use the Register model button to register the model as a new model named Penguin-Classifier.
- In the sidebar on the left, switch to the Models page. Then select the Penguin-Classifier model you just registered.
- On the Penguin-Classifier page, use the Use model for inference button to create a new real-time endpoint with the following settings:
  - Model: Penguin-Classifier
  - Model version: 1
  - Endpoint: classify-penguin
  - Compute size: Small
The serving endpoint is hosted in a new cluster, which may take several minutes to create.
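The endpoint settings above can also be submitted through the Databricks serving endpoints REST API. This is a sketch under stated assumptions, not the lab's method: the workspace URL and token are placeholders you supply, the payload uses the `served_entities` form of the API, and `scale_to_zero_enabled` is an added assumption that is not part of the settings listed above. The helper names are hypothetical.

```python
import json
import urllib.request


def endpoint_spec(name="classify-penguin", model="Penguin-Classifier",
                  version="1", size="Small") -> dict:
    """Build the request body describing the serving endpoint."""
    return {
        "name": name,
        "config": {
            "served_entities": [
                {
                    "entity_name": model,
                    "entity_version": version,
                    "workload_size": size,
                    "scale_to_zero_enabled": True,  # assumption, not from the lab
                }
            ]
        },
    }


def build_create_request(workspace_url: str, token: str, spec=None):
    """Build (but do not send) the POST request that creates the endpoint."""
    body = json.dumps(spec or endpoint_spec()).encode("utf-8")
    url = f"{workspace_url.rstrip('/')}/api/2.0/serving-endpoints"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# To send the request:
#   with urllib.request.urlopen(build_create_request(url, token)) as resp:
#       print(json.load(resp))
```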

- When the endpoint has been created, use the Query endpoint button at the top right to open an interface from which you can test the endpoint. Then in the test interface, on the Browser tab, enter the following JSON request and use the Send Request button to call the endpoint and generate a prediction.

      {
        "dataframe_records": [
          {
            "Island": "Biscoe",
            "CulmenLength": 48.7,
            "CulmenDepth": 14.1,
            "FlipperLength": 210,
            "BodyMass": 4450
          }
        ]
      }

- Experiment with a few different values for the penguin features and observe the results that are returned. Then, close the test interface.
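The same request can be sent from code rather than the test interface. The sketch below uses only the Python standard library and assumes you supply your workspace URL and a personal access token (both placeholders). The helper names `build_invocation_request` and `classify_penguin` are hypothetical.

```python
import json
import urllib.request

# The sample record from the JSON request above.
SAMPLE_RECORD = {
    "Island": "Biscoe",
    "CulmenLength": 48.7,
    "CulmenDepth": 14.1,
    "FlipperLength": 210,
    "BodyMass": 4450,
}


def build_invocation_request(workspace_url: str, token: str,
                             endpoint: str = "classify-penguin",
                             records=None):
    """Build (but do not send) the scoring request for the endpoint."""
    payload = {"dataframe_records": records or [SAMPLE_RECORD]}
    url = f"{workspace_url.rstrip('/')}/serving-endpoints/{endpoint}/invocations"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def classify_penguin(workspace_url: str, token: str, records=None):
    """Send the request and return the parsed JSON response."""
    request = build_invocation_request(workspace_url, token, records=records)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Passing a list of several records to `classify_penguin` scores them in a single call, which is convenient when trying different feature values.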


## Delete the endpoint

- When the endpoint is no longer required, you should delete it to avoid unnecessary costs.

- In the classify-penguin endpoint page, in the ⁝ menu, select Delete.
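Deletion can also be scripted, which is useful for cleanup after repeated experiments. This is a minimal sketch that assumes the Databricks serving endpoints REST API and a workspace URL and token that you supply; the helper name `build_delete_request` is hypothetical.

```python
import urllib.request


def build_delete_request(workspace_url: str, token: str,
                         endpoint: str = "classify-penguin"):
    """Build (but do not send) the DELETE request for a serving endpoint."""
    url = f"{workspace_url.rstrip('/')}/api/2.0/serving-endpoints/{endpoint}"
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},
    )


# To send the request:
#   urllib.request.urlopen(build_delete_request(url, token))
```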