### Q1. Install the Package
To get started with Weights & Biases you'll need to install the appropriate Python package.

For this, we recommend creating a separate Python environment (for example, a conda environment) and installing the package there with pip or conda.

Install the following libraries:

* pandas
* matplotlib
* scikit-learn
* pyarrow
* wandb

Once you have installed the package, run the command `wandb --version` and check the output.

What's the version that you have?

wandb, version 0.15.4

Set up the environment first.

Check the available environments:
`conda env list`

Create a new environment:
`conda create -n wandb-tracking-env`

Activate the environment, so that the libraries are installed into it rather than into the currently active one:
`conda activate wandb-tracking-env`

Install the libraries:
`pip install -r requirements.txt`

Then check the version:
`wandb --version`
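
As a complementary check from inside Python, the sketch below looks up the installed version of each library listed above using the standard-library `importlib.metadata` module. The `installed_versions` helper and the `REQUIRED` list are names made up for this illustration; they are not part of the homework.

```python
from importlib.metadata import PackageNotFoundError, version

# Libraries listed in the homework (pip distribution names).
REQUIRED = ["pandas", "matplotlib", "scikit-learn", "pyarrow", "wandb"]


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(REQUIRED)
for name, ver in versions.items():
    print(f"{name}: {ver or 'not installed'}")
```

Running it inside the activated environment shows at a glance whether anything from the list is still missing.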


### Q2. Download and preprocess the data
We'll use the Green Taxi Trip Records dataset to predict the amount of tips for each trip.

Download the data for January, February and March 2022 in parquet format from here.

The `preprocess_data.py` script will:

* initialize a Weights & Biases run,
* load the data from the folder `<TAXI_DATA_FOLDER>` (the folder where you have downloaded the data),
* fit a `DictVectorizer` on the training set (January 2022 data),
* save the preprocessed datasets and the `DictVectorizer` to your Weights & Biases dashboard as an artifact of type `preprocessed_dataset`.

Your task is to download the datasets and then execute this command:

    python preprocess_data.py \
      --wandb_project <WANDB_PROJECT_NAME> \
      --wandb_entity <WANDB_USERNAME> \
      --raw_data_path <TAXI_DATA_FOLDER> \
      --dest_path ./output
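
To make the `DictVectorizer` step described above more concrete, here is a minimal sketch of fitting the vectorizer on training records and reusing it on validation records. The toy records and the feature names (`PU_DO`, `trip_distance`) are assumptions for illustration, not taken from the script, and the size check assumes the vectorizer is serialized with `pickle`.

```python
import pickle

from sklearn.feature_extraction import DictVectorizer

# Toy records standing in for taxi trips; feature names are assumed, not from the script.
train_dicts = [
    {"PU_DO": "74_75", "trip_distance": 1.2},
    {"PU_DO": "41_42", "trip_distance": 0.8},
    {"PU_DO": "74_75", "trip_distance": 2.5},
]
val_dicts = [
    {"PU_DO": "7_8", "trip_distance": 3.0},  # category not seen during fitting
]

dv = DictVectorizer()
X_train = dv.fit_transform(train_dicts)  # learn the feature mapping on training data only
X_val = dv.transform(val_dicts)          # reuse the mapping; unseen categories are ignored

print(dv.feature_names_)
print(X_train.shape, X_val.shape)

# Size in bytes of the serialized vectorizer (assuming pickle is used to save it).
dv_bytes = pickle.dumps(dv)
print(len(dv_bytes))
```

Fitting on the training month only means the validation and test matrices share the same columns as the training matrix, which is why the vectorizer has to be saved alongside the datasets.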

Once you navigate to the Files tab of your artifact on your Weights & Biases page, what's the size of the saved DictVectorizer file?

* 54 kB
* 154 kB *
* 54 MB
* 154 MB

### Q3. Train a model with Weights & Biases logging
We will train a RandomForestRegressor (from Scikit-Learn) on the taxi dataset.
Once you have successfully run the script, navigate to the Overview section of the run in the Weights & Biases UI and scroll down to the Configs. What is the value of the `max_depth` parameter?

* 4
* 6
* 8
* 10 *
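
For orientation, the sketch below shows how a `RandomForestRegressor` with `max_depth=10` might be trained and evaluated. It is not the homework's training script: the synthetic data, the `random_state` value, and the metric name are assumptions. The Weights & Biases calls are left as comments because they need an account and a project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Hyperparameters that would show up under Configs in the W&B run overview.
# max_depth matches the value reported above; random_state is an assumption.
config = {"max_depth": 10, "random_state": 0}

# Synthetic data standing in for the preprocessed taxi dataset.
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(200, 2))
y_train = 2.0 * X_train[:, 0] + rng.normal(0, 0.5, size=200)
X_val = rng.uniform(0, 10, size=(50, 2))
y_val = 2.0 * X_val[:, 0] + rng.normal(0, 0.5, size=50)

model = RandomForestRegressor(**config)
model.fit(X_train, y_train)

rmse = float(np.sqrt(mean_squared_error(y_val, model.predict(X_val))))
print(f"validation RMSE: {rmse:.3f}")

# With W&B logging, the same config and metric could be recorded like this:
# import wandb
# wandb.init(project="<WANDB_PROJECT_NAME>", config=config)
# wandb.log({"RMSE": rmse})
```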

### Q4. Tune model hyperparameters
Now let's try to reduce the validation error by tuning the hyperparameters of the `RandomForestRegressor` using Weights & Biases Sweeps. We have prepared the script `sweep.py` for this exercise in the `homework-wandb` directory.

Which hyperparameter is the most important?

* max_depth *
* n_estimators
* min_samples_split
* min_samples_leaf
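
A sweep over these four hyperparameters is driven by a configuration dictionary. The sketch below shows what such a configuration could look like; the search method, metric name, and value ranges are assumptions for illustration and may differ from what `sweep.py` actually uses. The calls that start the sweep are commented out because they need a W&B account.

```python
# Illustrative sweep configuration; method, metric name, and ranges are assumed.
sweep_config = {
    "method": "bayes",
    "metric": {"name": "MSE", "goal": "minimize"},
    "parameters": {
        "max_depth": {"distribution": "int_uniform", "min": 1, "max": 20},
        "n_estimators": {"distribution": "int_uniform", "min": 10, "max": 50},
        "min_samples_split": {"distribution": "int_uniform", "min": 2, "max": 10},
        "min_samples_leaf": {"distribution": "int_uniform", "min": 1, "max": 4},
    },
}

swept_parameters = sorted(sweep_config["parameters"])
print(swept_parameters)

# Starting the sweep (run_train is a hypothetical training function that
# reads its hyperparameters from wandb.config and logs the metric):
# import wandb
# sweep_id = wandb.sweep(sweep_config, project="<WANDB_PROJECT_NAME>")
# wandb.agent(sweep_id, function=run_train, count=5)
```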

### Q5. Link the best model to the model registry
Now that we have obtained the optimal set of hyperparameters and trained the best model, we can assume that we are ready to test some of these models in production. In this exercise, you'll create a model registry and link the best model from the Sweep to the model registry.
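
Linking can be done in the Weights & Biases UI. As a rough programmatic alternative, the sketch below uses `run.use_artifact` and `run.link_artifact`. The helper names, the `job_type` value, and every argument value are hypothetical; the artifact name of the best model depends on your own sweep. `wandb` is imported inside the function so that the file can be loaded without an account; actually calling `link_best_model` requires being logged in.

```python
def registry_target_path(registered_model_name: str) -> str:
    """Build the target path of a registered model in the model registry."""
    return f"model-registry/{registered_model_name}"


def link_best_model(project: str, entity: str, artifact_name: str,
                    registered_model_name: str) -> None:
    """Link an existing model artifact to a registered model (needs W&B login)."""
    import wandb  # imported lazily; only needed when the function is called

    run = wandb.init(project=project, entity=entity, job_type="model-linking")
    artifact = run.use_artifact(artifact_name)  # "<artifact_name>:<version>"
    run.link_artifact(artifact, registry_target_path(registered_model_name))
    run.finish()


# Hypothetical usage; replace the placeholders with your own values:
# link_best_model("<WANDB_PROJECT_NAME>", "<WANDB_USERNAME>",
#                 "<MODEL_ARTIFACT>:<VERSION>", "<REGISTERED_MODEL_NAME>")
print(registry_target_path("example-model"))
```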