# Dataset and model versioning with W&B
Use W&B Artifacts for dataset versioning and model management. The core Artifacts API comes down to a handful of calls:

### Save a dataset or model version
1. `artifact = wandb.Artifact('my-artifact', type='artifact-type')`: Construct a new artifact to log. You can also set a dictionary of metadata on the artifact.
2. `artifact.add_file('my-file.csv')`: Add files to the artifact to save. You can also add whole directories, or log a reference to an external bucket like S3 instead of uploading files to W&B.
3. `run.log_artifact(artifact)`: Save the artifact to W&B. This happens asynchronously, so make sure to let the upload finish before trying to use the artifact elsewhere.
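
The three steps above, including the optional metadata dictionary and the wait for the asynchronous upload, might look like the sketch below. It assumes `wandb` is installed and you are logged in. `describe_file` and `log_dataset` are hypothetical helpers written for this illustration, and the metadata keys, the `data/` directory, and the S3 bucket name are placeholders rather than anything W&B requires.

```python
import hashlib
import os


def describe_file(path):
    """Hypothetical helper: summarize a file for use as artifact metadata."""
    with open(path, "rb") as f:
        data = f.read()
    return {
        "filename": os.path.basename(path),
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def log_dataset(path, project="artifacts-quickstart"):
    """Log `path` as a new version of the 'my-dataset' artifact."""
    import wandb  # imported lazily so describe_file works without wandb

    run = wandb.init(project=project, job_type="dataset-creation")

    # Step 1: construct the artifact, attaching a metadata dictionary
    artifact = wandb.Artifact(
        "my-dataset", type="dataset", metadata=describe_file(path)
    )

    # Step 2: add content (alternatives shown as comments; paths are placeholders)
    artifact.add_file(path)
    # artifact.add_dir("data/")                       # a whole directory
    # artifact.add_reference("s3://my-bucket/data")   # reference, no upload

    # Step 3: log the artifact, then block until the upload has finished
    logged = run.log_artifact(artifact)
    logged.wait()

    run.finish()
    return logged
```

Calling `wait()` on the logged artifact is one way to make sure the upload is complete before another step tries to use it; finishing the run also waits for pending uploads.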

### Use a saved dataset or model version
1. `artifact = run.use_artifact('my-artifact:latest')`: Mark that saved artifact as an input to the current pipeline step.
2. `artifact_dir = artifact.download()`: Download the saved artifact data and get back the local directory it was written to.
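
The string passed to `use_artifact` has the form `name:alias`. W&B assigns version aliases such as `v0`, `v1`, and so on, and `latest` points to the most recently logged version. A minimal sketch of choosing between the newest version and a pinned one follows; `artifact_ref` and `fetch_artifact` are hypothetical helpers, not part of the W&B API, and the project name is the one used in this quickstart.

```python
def artifact_ref(name, version=None):
    """Hypothetical helper: build a 'name:alias' string for use_artifact."""
    alias = "latest" if version is None else f"v{version}"
    return f"{name}:{alias}"


def fetch_artifact(name, version=None, project="artifacts-quickstart"):
    """Mark an artifact as an input of a new run and download its files."""
    import wandb  # imported lazily so artifact_ref works without wandb

    run = wandb.init(project=project, job_type="model-training")
    artifact = run.use_artifact(artifact_ref(name, version))
    artifact_dir = artifact.download()  # local directory holding the files
    run.finish()
    return artifact_dir
```

For example, `fetch_artifact("my-dataset")` would pull whatever is currently `latest`, while `fetch_artifact("my-dataset", version=0)` pins the run to the first logged version, which keeps a training step reproducible after the dataset changes.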

### In this quickstart
In this tiny example, we'll simulate a common workflow:
1. Save a dataset with `log_artifact`
2. Use the dataset to train a model with `use_artifact`
3. Save the resulting model with `log_artifact`
4. Save an updated dataset version with `log_artifact`

In [None]:
# Install the Weights & Biases library
!pip install wandb -qqq

In [None]:
# 1. Log a dataset version as an artifact
import wandb
import os

# Initialize a new W&B run to track this job
run = wandb.init(project="artifacts-quickstart", job_type="dataset-creation")

# Create a sample dataset to log as an artifact
with open('my-dataset.txt', 'w') as f:
    f.write('Imagine this is a big dataset.')

# Create a new artifact, which is a sample dataset
dataset = wandb.Artifact('my-dataset', type='dataset')
# Add files to the artifact, in this case a simple text file
dataset.add_file('my-dataset.txt')
# Log the artifact to save it as an output of this run
run.log_artifact(dataset)

wandb.finish()

In [None]:
# 2. Use that dataset to train a model
run = wandb.init(project="artifacts-quickstart", job_type="model-training")

# Pull down that dataset you logged in the last run
artifact = run.use_artifact('my-dataset:latest')
artifact_dir = artifact.download()
# Read the downloaded file back to confirm its contents
with open(os.path.join(artifact_dir, 'my-dataset.txt')) as f:
    print(f.read())

# Simulate tracking a model file with this simple txt
with open('my-model.txt', 'w') as f:
    f.write('Imagine this is a model file.')

# Save a model after training
model = wandb.Artifact('my-model', type='model')
model.add_file('my-model.txt')
run.log_artifact(model)

wandb.finish()

In [None]:
# 3. Log a new dataset version

# Initialize a new W&B run to track this job
run = wandb.init(project="artifacts-quickstart", job_type="dataset-creation")

# Update the dataset
with open('my-dataset.txt', 'w') as f:
    f.write('Here is an edited dataset!')

# Log to the same named artifact, but with updated data
artifact = wandb.Artifact('my-dataset', type='dataset')
artifact.add_file('my-dataset.txt')
run.log_artifact(artifact)

# Now you have a new artifact version logged!

wandb.finish()

In [None]:
# Thank you

Now in W&B you have a new version of that dataset logged. It looks something like [this page](https://wandb.ai/carey/artifacts-quickstart/artifacts/dataset/my-dataset/c51a86259b3821da2c2f/files/my-dataset.txt):

<img src="https://i.imgur.com/4LFApxA.png" width="500" alt="Weights & Biases" />
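
To check from code that both dataset versions exist, one option is the W&B public API, which can fetch an artifact outside of a run. This is a sketch under assumptions: `qualified_name` and `read_dataset_version` are hypothetical helpers, `entity` must be your own W&B username or team, and the aliases `v0` and `v1` assume the two dataset versions logged in this quickstart are the only ones in the project.

```python
import os


def qualified_name(entity, project, name, alias="latest"):
    """Hypothetical helper: build 'entity/project/name:alias' for wandb.Api."""
    return f"{entity}/{project}/{name}:{alias}"


def read_dataset_version(entity, alias, project="artifacts-quickstart"):
    """Download one version of 'my-dataset' and return the text it contains."""
    import wandb  # imported lazily so qualified_name works without wandb

    api = wandb.Api()
    artifact = api.artifact(qualified_name(entity, project, "my-dataset", alias))
    artifact_dir = artifact.download()
    with open(os.path.join(artifact_dir, "my-dataset.txt")) as f:
        return f.read()
```

With your own entity filled in, `read_dataset_version(entity, "v0")` should return the original text and `read_dataset_version(entity, "v1")` the edited one.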
