# The CamVid Dataset

<!--- @wandbcode{sagemaker-studio-lab} -->

In this notebook we pull the Cambridge-driving Labeled Video Database (`CamVid`) to train our model. It contains a collection of videos with object class semantic labels, complete with metadata. The database provides ground truth labels that associate each pixel with one of 32 semantic classes.

We first upload the full dataset to Weights and Biases as a `wandb.Artifact`, then compute which classes are present in each image and upload the processed dataset as a `wandb.Table`. Doing so lets us use the `wandb` UI to visualize and filter images.

```python
import wandb
from fastai.vision.all import *
```

```python
# log to wandb
wandb.login()
```

## Log the raw dataset
We grab a copy of `CamVid` using `fastai`'s `untar_data` function. Afterwards, we use the `Artifact.add_dir()` method to upload the full folder to our wandb workspace.

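```python
# Download and extract CamVid (cached under ~/.fastai/data)
path = untar_data(URLs.CAMVID)
# codes.txt lists the class names, one per line
codes = np.loadtxt(path/'codes.txt', dtype=str)
fnames = get_image_files(path/"images")
# Map each integer value stored in a mask to its class name
class_labels = {k: v for k, v in enumerate(codes)}
```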
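`class_labels` relies on the line order of `codes.txt`: the class on line `i` is the one stored as pixel value `i` in the masks. A minimal sketch of that mapping, using a hypothetical three-line excerpt in place of the real file (the `toy_` names are illustrative only, so running it does not overwrite `class_labels`):

```python
import io

import numpy as np

# Hypothetical three-line excerpt standing in for path/'codes.txt'
toy_codes = np.loadtxt(io.StringIO("Animal\nArchway\nBicyclist\n"), dtype=str)

# Line i of the file names the class stored as pixel value i in the masks
toy_class_labels = {k: v for k, v in enumerate(toy_codes)}
for idx, name in toy_class_labels.items():
    print(idx, name)
# 0 Animal
# 1 Archway
# 2 Bicyclist
```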

- We create the project under `user/project`, with the project name set in `PROJECT`.
- If you are working on a team, set `ENTITY` to the team name; leaving it as `None` logs to your default entity.

```python
PROJECT = "sagemaker_camvid_demo"
ENTITY = None  # set to a team name to log under that team
```

```python
# A run whose only job is to upload the raw dataset
with wandb.init(
    project=PROJECT,
    name="upload_camvid",
    entity=ENTITY,
    job_type="upload",
):
    artifact = wandb.Artifact(
        'camvid-dataset',
        type='dataset',
        metadata={
            "url": URLs.CAMVID,
            "class_labels": class_labels
        },
        description="The Cambridge-driving Labeled Video Database (CamVid) is the first collection of videos with object class semantic labels, complete with metadata. The database provides ground truth labels that associate each pixel with one of 32 semantic classes."
    )
    # Add the whole extracted folder (images, labels, codes.txt, ...)
    artifact.add_dir(path)
    wandb.log_artifact(artifact)
```

[34m[1mwandb[0m: Adding directory to artifact (/home/studio-lab-user/.fastai/data/camvid)... Done. 4.1s





## Log a `wandb.Table`
Next, we log a `wandb.Table` containing each image, its segmentation mask, and the pixel count of every class in that image.

![](images/camvid_table.png)

```python
def label_func(fn):
    "Return the mask path for image `fn`: images/<stem>.png -> labels/<stem>_P.png"
    return fn.parent.parent/"labels"/f"{fn.stem}_P{fn.suffix}"
```
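`label_func` depends on the folder layout of the extracted dataset: each image in `images/` has a mask in the sibling `labels/` folder with the same stem plus a `_P` suffix. The sketch below applies the same function to a made-up path (`camvid/images/0001TP_006690.png` is illustrative, not taken from the source) using only `pathlib`, so the mapping can be checked without downloading anything:

```python
from pathlib import Path

def label_func(fn):
    # images/<stem><suffix>  ->  labels/<stem>_P<suffix>
    return fn.parent.parent / "labels" / f"{fn.stem}_P{fn.suffix}"

# Hypothetical dataset root and file name, for illustration only
image_file = Path("camvid/images/0001TP_006690.png")
mask_file = label_func(image_file)
print(mask_file.as_posix())  # camvid/labels/0001TP_006690_P.png
```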

```python
def get_frequency_distribution(mask_data):
    "Count how many pixels of `mask_data` belong to each class in `class_labels`."
    unique, counts = np.unique(mask_data, return_counts=True)
    pixel_counts = dict(zip(unique.tolist(), counts.tolist()))
    # Every class gets an entry; classes absent from the mask count as 0
    return {name: pixel_counts.get(idx, 0) for idx, name in class_labels.items()}
```
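To see what the frequency distribution looks like, here is a self-contained toy version of the same counting logic. It takes the class mapping as a parameter (the name `count_pixels_per_class` and the three-class mapping are hypothetical) and runs on a 3×4 mask. Classes missing from the mask still get a 0 entry, which is what allows every class to become its own numeric column in the table.

```python
import numpy as np

def count_pixels_per_class(mask, labels_map):
    """Pixel count for every class in `labels_map` (absent classes get 0)."""
    unique, counts = np.unique(mask, return_counts=True)
    pixel_counts = dict(zip(unique.tolist(), counts.tolist()))
    return {name: pixel_counts.get(idx, 0) for idx, name in labels_map.items()}

# Hypothetical three-class mapping and a tiny 3x4 mask of class indices
toy_labels = {0: "Road", 1: "Car", 2: "Pedestrian"}
toy_mask = np.array([
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
], dtype=np.uint8)

print(count_pixels_per_class(toy_mask, toy_labels))
# {'Road': 9, 'Car': 3, 'Pedestrian': 0}
```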

```python
# Replace `capecape` with your own W&B username or team
ARTIFACT_ID = 'capecape/sagemaker_camvid_demo/camvid-dataset:latest'
```
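`ARTIFACT_ID` is a W&B artifact path of the form `entity/project/artifact_name:alias`, where `latest` is the alias W&B moves to the most recently logged version. Because the `capecape` entity is hard-coded, a small helper can make it easier to rebuild the ID for your own account. `artifact_path` below is a hypothetical convenience function, not part of the `wandb` API:

```python
def artifact_path(entity: str, project: str, name: str, alias: str = "latest") -> str:
    """Hypothetical helper: build a W&B artifact path entity/project/name:alias."""
    return f"{entity}/{project}/{name}:{alias}"

# Rebuilds the ID used in this notebook; swap in your own username or team
print(artifact_path("capecape", "sagemaker_camvid_demo", "camvid-dataset"))
# capecape/sagemaker_camvid_demo/camvid-dataset:latest
```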

```python
def log_dataset():
    with wandb.init(
        project=PROJECT,
        name="visualize_camvid",
        entity=ENTITY,
        job_type="data_viz"
    ):
        # Fetch the raw dataset logged by the upload run
        artifact = wandb.use_artifact(ARTIFACT_ID, type='dataset')
        artifact_dir = artifact.download()

        table_data = []
        image_files = get_image_files(Path(artifact_dir)/"images")
        labels = [str(name) for name in class_labels.values()]

        print("Creating Table...")
        for image_file in progress_bar(image_files):
            image = np.array(Image.open(image_file))
            mask_data = np.array(Image.open(label_func(image_file)))
            frequency_distribution = get_frequency_distribution(mask_data)
            # One row: file name, raw image, image with mask overlay, then one pixel count per class
            table_data.append(
                [
                    str(image_file.name),
                    wandb.Image(image),
                    wandb.Image(image, masks={
                        "predictions": {
                            "mask_data": mask_data,
                            "class_labels": class_labels
                        }
                    })
                ] + [
                    frequency_distribution[_lab] for _lab in labels
                ]
            )
        wandb.log({
            "CamVid_Dataset": wandb.Table(
                data=table_data,
                columns=["File_Name", "Images", "Segmentation_Masks"] + labels
            )
        })
```

```python
log_dataset()
```

[34m[1mwandb[0m: Downloading large artifact camvid-dataset:latest, 572.51MB. 1405 files... Done. 0:0:0


Creating Table...





## View the dataset in Weights and Biases workspace

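In the workspace, each row of the table shows an image next to its segmentation mask overlay, and the per-class pixel-count columns can be used to filter the images.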

![](images/camvid_mask.gif)
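The kind of filtering the UI offers can also be mimicked locally. The sketch below assumes rows laid out like `table_data` in `log_dataset` (file name, image, mask, then one pixel count per class) and keeps the files in which a chosen class covers at least a given fraction of the pixels. The rows, class list, threshold, and the `files_with_class` helper are all hypothetical, and plain strings stand in for the `wandb.Image` objects.

```python
# Hypothetical rows laid out like `table_data`:
# [File_Name, Images, Segmentation_Masks, <one pixel count per class>]
META_COLUMNS = ["File_Name", "Images", "Segmentation_Masks"]
toy_labels = ["Road", "Car", "Pedestrian"]
toy_columns = META_COLUMNS + toy_labels
toy_rows = [
    ["a.png", "<image>", "<mask>", 900, 100, 0],
    ["b.png", "<image>", "<mask>", 700, 250, 50],
    ["c.png", "<image>", "<mask>", 500, 480, 20],
]

def files_with_class(rows, columns, class_name, min_fraction):
    """File names where `class_name` covers at least `min_fraction` of the pixels."""
    col = columns.index(class_name)
    first_count = len(META_COLUMNS)  # pixel counts start after the metadata columns
    selected = []
    for row in rows:
        total = sum(row[first_count:])
        if total and row[col] / total >= min_fraction:
            selected.append(row[0])
    return selected

print(files_with_class(toy_rows, toy_columns, "Pedestrian", 0.03))  # ['b.png']
print(files_with_class(toy_rows, toy_columns, "Car", 0.2))          # ['b.png', 'c.png']
```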