# Lab: Packaging a Project

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) In this lab you:<br>
- Define an MLproject file
- Define a Conda environment
- Define your machine learning script
- Execute your solution as a run

## Prerequisites
- Web browser: Chrome
- A cluster configured with **8 cores** and **DBR 7.3 ML**

In [0]:
%run "../Includes/Classroom-Setup"

## Defining the MLproject file

Write an MLproject file called `MLproject` to the path defined for you below.

In [0]:
path = f"{workingDir}/03-lab/"

dbutils.fs.rm(path, True) # Clears the directory if it already exists
dbutils.fs.mkdirs(path)

print("Created directory `{}` to house the project files.".format(path))

The file should specify the following:<br><br>

0. The name should be `Lab-03`
0. It should use the environment `conda.yaml`
0. It should take the following parameters:
   - `data_path`: a string with a default of `/dbfs/mnt/training/airbnb/sf-listings/airbnb-cleaned-mlflow.csv`
   - `bootstrap`: a boolean with a default of `True`
   - `min_impurity_decrease`: a float with a default of `0.`
0. It should define a command that uses the parameters listed above
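
As a rough illustration of the structure listed above, the following sketch renders MLproject text from Python. MLflow documents `string`, `float`, `path`, and `uri` as parameter types, so declaring `bootstrap` as a `string` here is an assumption rather than the required answer. The helper name `render_mlproject` is hypothetical, and `main` is used as the entry point name because that is what MLflow runs by default.

```python
def render_mlproject(name, conda_env, parameters, script):
    """Return MLproject (YAML) text with a single `main` entry point.

    `parameters` maps each parameter name to a (type, default) pair.
    """
    lines = [
        f"name: {name}",
        "",
        f"conda_env: {conda_env}",
        "",
        "entry_points:",
        "  main:",
        "    parameters:",
    ]
    for param, (param_type, default) in parameters.items():
        lines.append(f"      {param}: {{type: {param_type}, default: {default}}}")
    # Each parameter is passed to the script as `--name {name}`;
    # MLflow fills in the braces when the project runs.
    flags = " ".join(f"--{param} {{{param}}}" for param in parameters)
    lines.append(f'    command: "python {script} {flags}"')
    return "\n".join(lines)


mlproject_text = render_mlproject(
    name="Lab-03",
    conda_env="conda.yaml",
    parameters={
        "data_path": ("string", '"/dbfs/mnt/training/airbnb/sf-listings/airbnb-cleaned-mlflow.csv"'),
        "bootstrap": ("string", "True"),  # assumption: booleans travel as text
        "min_impurity_decrease": ("float", "0."),
    },
    script="train.py",
)
print(mlproject_text)
```

The printed text could be pasted into the `dbutils.fs.put` call in the next cell, but treat it as one possible shape for the file, not the reference solution.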

In [0]:
#  TODO
dbutils.fs.put(path + "MLproject", 
'''

  FILL_IN

'''.strip())

## Defining the Environment

Define the Conda environment. It should include the following libraries:<br><br>

  - `cloudpickle`
  - `numpy`
  - `pandas`
  - `scikit-learn`
  - `pip:`
    - `mlflow`

Make sure that the library versions match those in the current environment.
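
One way to keep the pins in step with the cluster is to read each library's `__version__` and render the YAML from those values. The sketch below is a minimal illustration under stated assumptions: the helper names `current_version` and `render_conda_yaml`, the environment name `lab-03-env`, and the `defaults` channel are all hypothetical choices, and the `<version>` strings are placeholders, not real version numbers.

```python
import importlib
import platform


def current_version(module_name):
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name).__version__
    except (ImportError, AttributeError):
        return None


def render_conda_yaml(env_name, python_version, conda_versions, pip_versions):
    """Return Conda environment YAML text with every dependency pinned."""
    lines = [
        f"name: {env_name}",
        "channels:",
        "  - defaults",
        "dependencies:",
        f"  - python={python_version}",
    ]
    # Conda pins use a single `=`; pip pins use `==`.
    lines += [f"  - {pkg}={ver}" for pkg, ver in conda_versions.items()]
    lines.append("  - pip:")
    lines += [f"    - {pkg}=={ver}" for pkg, ver in pip_versions.items()]
    return "\n".join(lines)


# Placeholder versions only; substitute the values reported on your cluster,
# for example current_version("numpy") or current_version("sklearn").
conda_text = render_conda_yaml(
    env_name="lab-03-env",  # hypothetical environment name
    python_version=platform.python_version(),
    conda_versions={
        "cloudpickle": "<version>",
        "numpy": "<version>",
        "pandas": "<version>",
        "scikit-learn": "<version>",
    },
    pip_versions={"mlflow": "<version>"},
)
print(conda_text)
```

Note that the package is named `scikit-learn` in the environment file but is imported as `sklearn`, so the version lookup uses the import name.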

In [0]:
#  TODO
dbutils.fs.put(path + "conda.yaml", 
'''

  FILL_IN

'''.strip())

## Defining the Machine Learning Script

Based on the script from Lesson 3, create a Random Forest model that uses the parameters `data_path`, `bootstrap`, and `min_impurity_decrease`.
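
One detail to watch: MLflow substitutes parameters into the entry point command as text, so `bootstrap` reaches the script as the string `True` or `False` and needs converting before it is handed to the model (in Python, `bool("False")` is `True`). The sketch below covers only the argument handling. It uses `argparse` to stay within the standard library, which may differ from the approach taken in the Lesson 3 script; `str_to_bool` and `parse_args` are illustrative names, and the model training itself is omitted.

```python
import argparse


def str_to_bool(value):
    """Convert command-line text such as 'True' or 'False' to a bool."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def parse_args(argv=None):
    """Parse the three lab parameters; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(description="Argument parsing for train.py")
    parser.add_argument(
        "--data_path",
        type=str,
        default="/dbfs/mnt/training/airbnb/sf-listings/airbnb-cleaned-mlflow.csv",
    )
    parser.add_argument("--bootstrap", type=str_to_bool, default=True)
    parser.add_argument("--min_impurity_decrease", type=float, default=0.0)
    return parser.parse_args(argv)


# Simulate the command line that the project run would produce.
args = parse_args(["--bootstrap", "False", "--min_impurity_decrease", "0.1"])
print(args.bootstrap, args.min_impurity_decrease)  # prints: False 0.1
```

The parsed values can then be passed to the Random Forest constructor and logged as parameters, following the pattern from Lesson 3.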

In [0]:
#  TODO
dbutils.fs.put(path + "train.py", 
'''

  FILL_IN
  
'''.strip())

## Executing your Solution

First, make sure that the three required files are where they need to be.

In [0]:
dbutils.fs.ls(path)
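
If you prefer an explicit check over reading the listing by eye, a small helper can report which files are absent. This is a sketch under the assumption that the project directory is reachable as an ordinary local path; `missing_project_files` is a hypothetical name, and the demonstration runs against a temporary directory, not the lab directory.

```python
import os
import tempfile

REQUIRED_FILES = ("MLproject", "conda.yaml", "train.py")


def missing_project_files(project_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent from project_dir."""
    return [
        name
        for name in required
        if not os.path.isfile(os.path.join(project_dir, name))
    ]


# Demonstration on a temporary directory that lacks train.py.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("MLproject", "conda.yaml"):
        open(os.path.join(demo_dir, name), "w").close()
    demo_missing = missing_project_files(demo_dir)

print(demo_missing)  # prints: ['train.py']
```

An empty list means all three files are present.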

Execute your solution with the following code.

In [0]:
import mlflow

# MLflow reads the project from the local file system, so point it at the /dbfs mount.
mlflow.projects.run(
  uri=path.replace("dbfs:", "/dbfs"),
  parameters={
    "data_path": "/dbfs/mnt/training/airbnb/sf-listings/airbnb-cleaned-mlflow.csv",
    "bootstrap": False,
    "min_impurity_decrease": 0.1
  }
)
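
The `uri` argument above converts a `dbfs:` URI into the corresponding path under the `/dbfs` mount. A slightly stricter variant, sketched below, rewrites only a leading `dbfs:` prefix and leaves every other path untouched. The function name `dbfs_uri_to_local_path` and the example directory are hypothetical.

```python
def dbfs_uri_to_local_path(uri):
    """Map 'dbfs:/some/dir' to '/dbfs/some/dir'; leave other paths unchanged."""
    prefix = "dbfs:"
    if uri.startswith(prefix):
        return "/dbfs" + uri[len(prefix):]
    return uri


# Hypothetical working directory, used only for illustration.
print(dbfs_uri_to_local_path("dbfs:/user/example/03-lab/"))  # prints: /dbfs/user/example/03-lab/
```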

<img alt="Side Note" title="Side Note" style="vertical-align: text-bottom; position: relative; height:1.75em; top:0.05em; transform:rotate(15deg)" src="https://files.training.databricks.com/static/images/icon-note.webp"/> See the solutions folder for an example solution to this lab.

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Classroom-Cleanup<br>

Run the **`Classroom-Cleanup`** cell below to remove any artifacts created by this lesson.

In [0]:
%run "../Includes/Classroom-Cleanup"

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Next Steps<br>

Start the next lesson, Model Management.

&copy; 2020 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="http://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="http://help.databricks.com/">Support</a>