## Introduction

Artifacts live in a Comet Workspace and provide an easy way to track dataset versions.

In this guide, we will demonstrate how to use the Artifact class. Use an instance of the Artifact class to log and retrieve both files and remote assets, and to automatically keep track of the dataset version used to train your models.

### Workflow Overview    

0. Input filepath, project name and workspace
1. Import packages and read data from local disk (if needed)
2. Initialize Comet and set your workspace and API key
3. Create an Experiment object to log the Artifact
4. Create an Artifact object and provide metadata
5. Add the dataset to the Artifact object
6. Upload the data to Comet using experiment.log_artifact
7. Reference the logged artifact from the Artifact Registry

#### 0. Input filepath, project name and workspace

In [None]:
INPUT_FILE_PATH = ""  # example: "data/my_dataset_file.csv"
PROJECT_NAME = ""     # name of your Comet project
WORKSPACE = ""        # name of your Comet workspace
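
Before moving on, it can help to confirm that the three inputs above are filled in. The following is a minimal sketch, not part of the Comet SDK: `check_inputs` is a hypothetical helper, and the example values are placeholders rather than real Comet resources.

```python
from pathlib import Path


def check_inputs(input_file_path, project_name, workspace):
    """Return a list of problems; an empty list means the inputs look usable."""
    problems = []
    # The path must be non-empty and point to an existing file.
    if not input_file_path or not Path(input_file_path).is_file():
        problems.append(f"INPUT_FILE_PATH does not point to a file: {input_file_path!r}")
    if not project_name:
        problems.append("PROJECT_NAME is empty")
    if not workspace:
        problems.append("WORKSPACE is empty")
    return problems


# Placeholder values for illustration only (hypothetical names).
for problem in check_inputs("data/my_dataset_file.csv", "my-project", "my-workspace"):
    print(problem)
```

In the notebook, the call would be `check_inputs(INPUT_FILE_PATH, PROJECT_NAME, WORKSPACE)`.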

#### 1. Import packages and read data from local disk

If the data is stored in a remote location, skip reading it from disk. The imports are still required for the remaining steps.

In [None]:
import comet_ml
from comet_ml import Experiment, Artifact
import pandas as pd

import os

In [None]:
# sep='\t' assumes a tab-delimited file; use sep=',' for comma-separated data
raw_data = pd.read_csv(INPUT_FILE_PATH, sep='\t')

In [None]:
raw_data.shape
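
If you are unsure which delimiter the file uses, it can be guessed from a sample of the file before calling `pd.read_csv`. This is a minimal sketch using Python's standard `csv.Sniffer`; `detect_delimiter` is a hypothetical helper, and the candidate delimiters are an assumption about what the file might contain.

```python
import csv


def detect_delimiter(path, candidates=",\t;|", sample_size=4096):
    """Guess the field delimiter from the first few KB of a text file."""
    with open(path, newline="", encoding="utf-8") as fh:
        sample = fh.read(sample_size)
    # sniff() raises csv.Error if no candidate fits the sample consistently.
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter


# Usage in the notebook (assumes pandas is imported as pd):
# raw_data = pd.read_csv(INPUT_FILE_PATH, sep=detect_delimiter(INPUT_FILE_PATH))
```

Sniffing is a heuristic, so checking `raw_data.shape` afterwards remains a useful sanity check.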

#### 2. Initialize Comet and set your workspace and API key

In [None]:
comet_ml.init(workspace=WORKSPACE, project_name=PROJECT_NAME)

#### 3. Create an Experiment object to log the Artifact

In [None]:
experiment = Experiment(
    project_name=PROJECT_NAME,
    workspace=WORKSPACE
)

experiment.add_tag('log-data')

#### 4. Create an Artifact object and provide metadata

Artifact aliases are specific to a particular version of an artifact. The exception is the alias "latest", which is automatically assigned to the most recent version of the artifact.

See the documentation for more information: https://www.comet.ml/docs/python-sdk/Artifact/

In [None]:
artifact = Artifact(
    name="my-dataset",
    artifact_type="tabular dataset",
    aliases=["raw-data"],
    metadata={'filetype': 'csv',
              'original_source': 'Downloaded from local drive'}
)
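
The metadata dictionary can also be derived from the file itself, which makes it easier to tell later whether two versions contain identical data. This is a minimal sketch using only the standard library; `describe_file` and the keys `size_bytes` and `sha256` are illustrative assumptions, not fields that Comet requires.

```python
import hashlib
from pathlib import Path


def describe_file(path, chunk_size=1 << 20):
    """Collect simple, reproducible facts about a file for use as metadata."""
    file_path = Path(path)
    digest = hashlib.sha256()
    # Read in chunks so large datasets do not need to fit in memory.
    with file_path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "filetype": file_path.suffix.lstrip(".").lower(),
        "size_bytes": file_path.stat().st_size,
        "sha256": digest.hexdigest(),
    }


# Usage in the notebook (assumes Artifact is imported from comet_ml):
# artifact = Artifact(name="my-dataset", artifact_type="tabular dataset",
#                     aliases=["raw-data"], metadata=describe_file(INPUT_FILE_PATH))
```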

#### 5. Add the dataset to the Artifact object

In [None]:
artifact.add(INPUT_FILE_PATH)

**Remote Assets**

To log a remote asset, call the method "add_remote" on the Artifact object in place of "add". See the docs for its input arguments: https://www.comet.ml/docs/python-sdk/Artifact/#artifactadd_remote
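
When several remote files belong to the same dataset, the calls can be wrapped in a small loop. This is a minimal sketch: `add_remote_assets` is a hypothetical helper, the bucket URI in the usage comment is a placeholder, and the choice of the last URI component as the logical name is an assumption. It relies only on `add_remote` accepting a URI and a `logical_path` keyword.

```python
def add_remote_assets(artifact, uris):
    """Register each URI on the artifact as a remote asset.

    Returns the logical names that were assigned, in order.
    """
    logical_names = []
    for uri in uris:
        # Use the last path component of the URI as the asset's logical name.
        logical_name = uri.rstrip("/").rsplit("/", 1)[-1]
        artifact.add_remote(uri, logical_path=logical_name)
        logical_names.append(logical_name)
    return logical_names


# Usage in the notebook (placeholder URI, not a real bucket):
# add_remote_assets(artifact, ["s3://my-bucket/data/my_dataset_file.csv"])
```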

#### 6. Upload the data to Comet using experiment.log_artifact

In [None]:
experiment.log_artifact(artifact)
experiment.end()

The artifact now appears in the Artifacts tab of the workspace view. To update the dataset, repeat the steps above with the new data; a new version of the artifact is created automatically.

#### 7. Reference the logged artifact from the Artifact Registry

To access the Artifact, initialize an Experiment and call the method "get_artifact()". If needed, download the artifact to your local machine with "download()" and specify the destination path.

Note that calling "get_artifact" automatically associates the artifact and its version with your currently open Experiment, where it can be found in the "Assets & Artifacts" tab. In practice, this means you do not need to keep track of the dataset version manually.

In [None]:
comet_ml.init()
experiment = Experiment(workspace=WORKSPACE, project_name=PROJECT_NAME)
experiment.set_name('fetching-data')
logged_artifact = experiment.get_artifact("my-dataset")  # name given when the artifact was created
logged_artifact.download(path='./')

# get the dataset version ID from the artifact's assets
DATASET_VERSION_ID = None
for asset in logged_artifact.assets:
    DATASET_VERSION_ID = asset.artifact_version_id

print("Dataset version:", DATASET_VERSION_ID)

experiment.end()
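
To fetch a specific version or alias, such as the "raw-data" alias assigned in step 4, rather than the most recent version, the artifact name can be qualified. This is a minimal sketch that assumes the `workspace/artifact-name:version_or_alias` naming format described in the Comet documentation; `qualified_artifact_name` is a hypothetical helper, and the workspace name is a placeholder.

```python
def qualified_artifact_name(name, workspace=None, version_or_alias=None):
    """Build an artifact reference such as 'my-workspace/my-dataset:raw-data'."""
    full_name = f"{workspace}/{name}" if workspace else name
    if version_or_alias:
        full_name = f"{full_name}:{version_or_alias}"
    return full_name


# Placeholder workspace name for illustration only.
print(qualified_artifact_name("my-dataset", "my-workspace", "raw-data"))

# Usage in the notebook:
# logged_artifact = experiment.get_artifact(
#     qualified_artifact_name("my-dataset", WORKSPACE, "raw-data"))
```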