## Weights and Biases Fundamentals
* This notebook covers W&B fundamentals:
  * Artefacts
  * Runs
  * Using scripts to create and use an artefact
  * Using W&B to log EDA
* See the slides [here](https://docs.google.com/presentation/d/1R5RLXXCvCXvFeUONR1R81ZZSkRxhMYO8CA63Nmvh7ww/edit?usp=sharing) for a summary of how the different elements of W&B fit together

In [None]:
import pandas as pd

import wandb

### 1. Artefacts and Runs
* Create the artefact (a file, e.g. the output of a component)
* Start a run
    * Attach the file to a W&B artefact object
    * Log the artefact to the run
* Finish (close) the run
* Runs can also be managed using context managers, which close the run automatically

* Once we have a file we want to attach as an artefact, we:
    * create an artefact object in W&B
    * add the file to the artefact object
    * log the artefact object to the run
* Note that we have to create and finish a run to successfully log the artefact; finishing the run waits for the upload to complete

In [None]:
# create run
run = wandb.init(project="demo_artifact_1", group="experiment_1")

In [None]:
# create artifact object
artifact = wandb.Artifact(
    name="my_artifact.txt",
    type="data",
    description="This is an example of an artifact",
    metadata={
        "key_1": "value_1"
    }
)

In [None]:
# add the file to the artifact object
# (assumes my_artifact.txt exists in the working directory)
artifact.add_file("my_artifact.txt")

In [None]:
# log the artifact to the run, then finish the run so the upload to the W&B server completes
run.log_artifact(artifact)
run.finish()

In [None]:
# use context managers: each `with` block starts a run and finishes it automatically on exit
# here we log the same artifact from two separate runs in the "multiple_runs" project

with wandb.init(project="multiple_runs") as run:
    artifact = wandb.Artifact(
        name="my_artifact.txt",
        type="data",
        description="This is an example of an artifact",
        metadata={"key_1": "value_1"},
    )
    artifact.add_file("my_artifact.txt")
    run.log_artifact(artifact)

with wandb.init(project="multiple_runs") as run:
    artifact = wandb.Artifact(
        name="my_artifact.txt",
        type="data",
        description="This is an example of an artifact",
        metadata={"key_1": "value_1"},
    )
    artifact.add_file("my_artifact.txt")
    run.log_artifact(artifact)
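The `with` blocks above rely on Python's context-manager protocol: leaving the block calls the object's `__exit__` method, which is where the run gets finished, even if the code inside the block raises an exception. The sketch below illustrates the idea with a hypothetical `ToyRun` class (not part of wandb) that only records whether `finish()` was called:

```python
class ToyRun:
    """Hypothetical stand-in for a W&B run; it only records whether finish() was called."""

    def __init__(self, project):
        self.project = project
        self.finished = False
        self.exit_code = None

    def finish(self, exit_code=0):
        self.finished = True
        self.exit_code = exit_code

    def __enter__(self):
        # this is the object bound by `as run`
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # always close the run; record a non-zero exit code if the block raised
        self.finish(exit_code=0 if exc_type is None else 1)
        # returning False lets any exception propagate to the caller
        return False


# normal exit: the run is finished automatically
with ToyRun("multiple_runs") as run:
    pass
print(run.finished, run.exit_code)  # True 0

# exit via an exception: the run is still finished before the error propagates
failing = ToyRun("multiple_runs")
try:
    with failing:
        raise ValueError("something went wrong inside the run")
except ValueError:
    pass
print(failing.finished, failing.exit_code)  # True 1
```

This is why the context-manager version needs no explicit `run.finish()` call.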

In [None]:
%%bash
# example of adding an artefact from the command line
# (%%bash runs this notebook cell as a shell script)
wandb artifact put \
      --name exercise_4/genres_mod.parquet \
      --type raw_data \
      --description "A modified version of the songs dataset" genres_mod.parquet

### 2. Using scripts to create and use artefacts

Example scripts to:
* create an artefact: see `upload_artifact.py`, which can be run using:

`python upload_artifact.py --input_file zen.txt \
              --artifact_name zen_of_python \
              --artifact_type text_file \
              --artifact_description "20 aphorisms about writing good python code"`

* this will log a `v0` version of `zen.txt` to W&B
* if we edit `zen.txt` and re-run the script, this will log a `v1` version to W&B

* use an artefact: see `use_artifact.py`, which can be run using:

`python use_artifact.py --artifact_name exercise_1/zen_of_python:v1`

* this will return the file contents.
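W&B identifies artefact versions by the contents of their files, so re-running `upload_artifact.py` only produces a new version (`v1`, `v2` and so on) when `zen.txt` has actually changed. The in-memory sketch below models that behaviour with a hypothetical `ToyArtifactStore` class, assuming an MD5 digest of the contents as the version key; it is an illustration, not how the wandb client is implemented:

```python
import hashlib


class ToyArtifactStore:
    """Hypothetical, in-memory model of artefact versioning (not the wandb API)."""

    def __init__(self):
        # artefact name -> list of content digests, one entry per version
        self.versions = {}

    def log(self, name, content):
        """Log `content` (bytes) under `name` and return the version label it maps to."""
        digest = hashlib.md5(content).hexdigest()
        history = self.versions.setdefault(name, [])
        if digest in history:
            # identical contents already stored: reuse that version instead of creating one
            return f"v{history.index(digest)}"
        history.append(digest)
        return f"v{len(history) - 1}"


store = ToyArtifactStore()
print(store.log("zen_of_python", b"Beautiful is better than ugly."))     # v0
print(store.log("zen_of_python", b"Beautiful is better than ugly."))     # v0 (unchanged)
print(store.log("zen_of_python", b"Explicit is better than implicit."))  # v1
```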

In [None]:
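# assumes a run has already been started with wandb.init(...)

# fetch the artifact (marking it as an input of this run), download it and get its local path
# 'path' is a placeholder for the artifact name, in the form 'project/artifact_name:version'
artifact = run.use_artifact('path')
local_path = artifact.file()

pd.read_parquet(local_path)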
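The string passed to `run.use_artifact` (the `'path'` placeholder above, or `exercise_1/zen_of_python:v1` in the script example) is an artefact reference of the form `[[entity/]project/]name[:version]`. As an illustration, the hypothetical helper `parse_artifact_ref` below (not part of wandb) splits such a reference into its parts; defaulting the version to `latest` when none is given is an assumption of this sketch:

```python
from typing import NamedTuple, Optional


class ArtifactRef(NamedTuple):
    entity: Optional[str]
    project: Optional[str]
    name: str
    version: str


def parse_artifact_ref(ref: str) -> ArtifactRef:
    """Split a reference such as 'exercise_1/zen_of_python:v1' into its parts.

    Hypothetical helper for illustration only; it is not part of the wandb package.
    """
    path, _, version = ref.partition(":")
    parts = path.split("/")
    if not 1 <= len(parts) <= 3 or not all(parts):
        raise ValueError(f"unexpected artifact reference: {ref!r}")
    # pad the missing leading components (entity, project) with None
    entity, project, name = [None] * (3 - len(parts)) + parts
    return ArtifactRef(entity, project, name, version or "latest")


print(parse_artifact_ref("exercise_1/zen_of_python:v1"))
# ArtifactRef(entity=None, project='exercise_1', name='zen_of_python', version='v1')
```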

### 3. Using W&B to log EDA
* Write an MLflow component that installs Jupyter and all the libraries that we need, and execute the EDA as a notebook from within this component
* Embed plots and comments into the Jupyter notebook itself
* Track the notebook's inputs and outputs with an artifact tracking tool, in our case Weights & Biases

In [None]:
# Tracking a notebook in W&B is as simple as adding the save_code=True option when creating the run:

run = wandb.init(
  project="my_exercise",
  save_code=True
)