# Train UNet Model

In this notebook you will learn how to train a UNet architecture with Dataloop and PyTorch.

UNet is an encoder-decoder architecture for producing segmentation maps.
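To make the encoder-decoder idea concrete, here is a small sketch that traces feature-map sizes through a UNet-style network. It is only an illustration: the depth of 4, the base width of 64 channels, and the 'same'-padded convolutions are assumptions borrowed from the classic U-Net layout, not values read from the Dataloop adapter. The 640x960 input matches the `input_shape` used later in this notebook, assuming the order is height, width.

```python
def unet_shapes(height, width, depth=4, base_channels=64):
    """Trace (channels, height, width) through a UNet-style encoder and decoder.

    Assumes 'same'-padded convolutions, so only the 2x pooling and
    2x upsampling steps change the spatial size.
    """
    if height % 2 ** depth or width % 2 ** depth:
        raise ValueError("height and width must be divisible by 2**depth")

    encoder = []
    channels, h, w = base_channels, height, width
    for _ in range(depth):
        encoder.append((channels, h, w))  # kept for the skip connection
        channels, h, w = channels * 2, h // 2, w // 2
    bottleneck = (channels, h, w)

    decoder = []
    for skip_channels, skip_h, skip_w in reversed(encoder):
        # Upsample 2x, then merge with the encoder map of the same size.
        h, w = h * 2, w * 2
        assert (h, w) == (skip_h, skip_w)
        decoder.append((skip_channels, h, w))
    return encoder, bottleneck, decoder


encoder, bottleneck, decoder = unet_shapes(640, 960)
stages = (
    [("encoder", shape) for shape in encoder]
    + [("bottleneck", bottleneck)]
    + [("decoder", shape) for shape in decoder]
)
for name, (channels, h, w) in stages:
    print(f"{name:<10} channels={channels:<5} size={h}x{w}")
```

The encoder halves the resolution at each level while widening the channels, and the decoder mirrors it back to the input size, which is why the output can be a per-pixel segmentation map.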

In [1]:
import datetime
import matplotlib as mpl
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
import pandas as pd
import os
import logging
import sys
import dtlpy as dl
from dtlpy.ml import train_utils
from dtlpy.ml.dataset_generators.torch_dataset_generator import DataGenerator

import tempfile

### Get the Dataloop entities

Let's get the model, project, and dataset entities from the Dataloop platform.

In [2]:
model = dl.models.get('unet')  # This is the global model
# Data entities
project = dl.projects.get('shefi-contests', '50f0fc03-4d70-455d-b485-c78cca53f2be')
dataset = dl.datasets.get('carvana', '61b9bbc1e8ad454a9aa7d285')

### Snapshot

Now we can create a new snapshot. We add your user name and the current date and time as a suffix so that the snapshot has a unique name.

In [5]:
whoami = dl.client_api.info()['user_email']
now = datetime.datetime.now()

# Create a new snapshot, named after the user and the current datetime
snapshot_name = f"carvana-train-example-{whoami.split('@')[0]}-{now.isoformat(timespec='minutes')}"
snapshot = model.snapshots.create(
    snapshot_name=snapshot_name,
    dataset_id=dataset.id,
    description='train unet example',
    bucket=project.buckets.create(bucket_type=dl.BucketType.ITEM, model_name=model.name, snapshot_name=snapshot_name),
    tags=['example', 'notebook'],
    configuration={'id_to_label_map': {'1': 'car'}, 'image_normalize_mu': 0, 'image_normalize_std': 1, 'input_shape': [640, 960]},
    project_id=project.id,
    labels=['car']
)


2022-01-03 09:40:59.019 [ERROR]-[MainThread]-[v1.47.1]dtlpy.repositories.snapshots: Snapshot does not support 'unlocked dataset'. Please change 'carvana' to readonly
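The naming scheme used for the snapshot can be tried on its own, without touching the platform. The sketch below repeats the same string formatting with a made-up e-mail address and a fixed time; `build_snapshot_name` is a hypothetical helper written for this illustration, not part of `dtlpy`.

```python
import datetime


def build_snapshot_name(user_email, now, prefix="carvana-train-example"):
    """Combine a prefix, the local part of the e-mail, and a minute-resolution timestamp."""
    user = user_email.split("@")[0]
    return f"{prefix}-{user}-{now.isoformat(timespec='minutes')}"


# Hypothetical user and a fixed time, so the result is reproducible.
example_name = build_snapshot_name(
    "jane@example.com", datetime.datetime(2022, 1, 3, 9, 40, 59)
)
print(example_name)
```

Because the timestamp is truncated to minutes, two snapshots created by the same user within the same minute would get the same name; adding seconds to the timestamp is one way to avoid that.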


### Let's view the Model and Snapshot entities

We use `to_df()` to convert each entity to a DataFrame and view it.

In [8]:
model.to_df()

|  | id | creator | name | description | version | tags | inputType | outputType | projectId | entryPoint | className | codebase | createdAt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 619d001117bf2dab2b6aa4c3 | yair@dataloop.ai | unet | Global Dataloop U-net implemented in pytorch | 1.0.1 | [torch, unet, semantic] | image | binary | 296bc0d5-46fc-447f-b3ee-25899e7268bc | unet_adapter.py | UNetAdapter | {'type': 'git', 'gitUrl': 'https://github.com/... | 2021-11-23T14:52:01.787Z |


In [9]:
snapshot.to_df()

|  | id | creator | name | description | is_global | status | tags | configuration | modelId | projectId | datasetId | createdAt | bucket | ontologySpec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 61d2a88a275d078c4b370427 | yair@dataloop.ai | carvana-train-example-yair-2022-01-03T09:40 | train unet example | False | created | [example, notebook] | {'id_to_label_map': {'1': 'car'}, 'image_norma... | 619d001117bf2dab2b6aa4c3 | 50f0fc03-4d70-455d-b485-c78cca53f2be | 61b9bbc1e8ad454a9aa7d285 | 2022-01-03T07:40:58.895Z | {'type': 'item', 'itemId': '61d2a88a80d978833d... | {'labels': ['car'], 'ontologyId': 'null'} |


### One last thing to make sure before we train

Our `adapter` train method expects the data to be organized into train, validation, and test partitions.  
On small datasets, these can be created manually using `train_utils.create_dataset_partition()`.

Our dataset is already prepared, so we will just verify it.

In [10]:
train_items = dataset.get_partitions(partitions=dl.SnapshotPartitionType.TRAIN)
val_items = dataset.get_partitions(partitions=dl.SnapshotPartitionType.VALIDATION)
test_items = dataset.get_partitions(partitions=dl.SnapshotPartitionType.TEST)

print(f"Dataset {dataset.name} Data partition, TRAIN: {train_items.items_count}, VALIDATION {val_items.items_count}, TEST {test_items.items_count} ")

Dataset carvana Data partition, TRAIN: 4070, VALIDATION 1018, TEST 0 
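The counts above correspond to roughly an 80/20 train-validation split with no test items. As a rough illustration of how such a partition can be produced, here is a plain-Python sketch. It is not the implementation of `train_utils.create_dataset_partition()`; `split_partitions`, its ratios, and the placeholder ids are all assumptions made for this example.

```python
import random


def split_partitions(item_ids, train_ratio=0.8, test_ratio=0.0, seed=0):
    """Shuffle item ids and split them into train / validation / test lists."""
    if train_ratio < 0 or test_ratio < 0 or train_ratio + test_ratio > 1:
        raise ValueError("ratios must be non-negative and sum to at most 1")
    ids = list(item_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_train = int(len(ids) * train_ratio)
    n_test = int(len(ids) * test_ratio)
    return {
        "train": ids[:n_train],
        "validation": ids[n_train + n_test:],  # whatever is left over
        "test": ids[n_train:n_train + n_test],
    }


# 5088 placeholder ids: the total of the counts printed above (4070 + 1018 + 0).
parts = split_partitions(range(5088))
print({name: len(ids) for name, ids in parts.items()})
```

Giving the validation partition "whatever is left" guarantees that every item lands in exactly one partition, even when the ratios do not divide the dataset evenly.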


### Finally we can start to train

We initialize the adapter using the `build` method.

The `Adapter` is the base class that connects the Dataloop platform to our specific model.  
Some methods are inherited from the base adapter and some are written specifically for each model.  
Each architecture has its own adapter, whose source code you can view.
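The split between shared and model-specific methods can be pictured with a toy example. The classes below are hypothetical stand-ins written for this notebook (`BaseAdapter` and `ToyUNetAdapter` are not the `dtlpy` classes, and the snapshot is a plain dict), but they show the pattern: the base class owns the platform-facing flow and defers to methods that each model implements.

```python
from abc import ABC, abstractmethod


class BaseAdapter(ABC):
    """Hypothetical stand-in for a platform-facing base adapter."""

    def __init__(self):
        self.snapshot = None
        self.weights = None

    def load_from_snapshot(self, snapshot):
        # Shared logic: remember the snapshot, then defer to the model-specific loader.
        self.snapshot = snapshot
        self.weights = self.load(snapshot["configuration"])

    @abstractmethod
    def load(self, configuration):
        """Model-specific: build the network from the configuration."""

    @abstractmethod
    def train(self, data_path, output_path):
        """Model-specific: run the training loop."""


class ToyUNetAdapter(BaseAdapter):
    """Hypothetical model-specific adapter; the method bodies are placeholders."""

    def load(self, configuration):
        return {"input_shape": configuration["input_shape"], "trained": False}

    def train(self, data_path, output_path):
        self.weights["trained"] = True
        return output_path


toy_adapter = ToyUNetAdapter()
toy_adapter.load_from_snapshot({"configuration": {"input_shape": [640, 960]}})
toy_adapter.train(data_path="data", output_path="output")
print(toy_adapter.weights)
```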


In [None]:
adapter = model.build()
adapter.load_from_snapshot(snapshot=snapshot)
# adapter._set_adapter_handler('DEBUG')

In [None]:
root_path, data_path, output_path = adapter.prepare_training()
adapter.train(data_path=data_path, output_path=output_path)



### SAVING

The current adapter now holds the best model fit for our data.

To upload the weights and other configurations, we need to save our snapshot.  
We will use a temporary directory: we save all content to that directory and upload it (the other option is to upload the entire *`output_path`*, which contains more runtime files).

In [None]:

# '%Y-%m-%d' is the portable spelling of '%F', which not every platform supports
temp_dir = tempfile.mkdtemp(prefix=snapshot.name, suffix=now.strftime('%Y-%m-%d-%H%M%S'))
adapter.save_to_snapshot(local_path=temp_dir)
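If you want to see what the temporary directory step does by itself, the sketch below creates one and writes a placeholder file into it. `make_save_dir` is a hypothetical helper, not part of `dtlpy`. It makes two defensive choices that are assumptions on our side: it replaces characters such as `:` (which appear in the ISO timestamp of the snapshot name and are not valid in paths on every operating system), and it spells the date as `%Y-%m-%d`, since shorthand directives such as `%F` are not available on every platform.

```python
import datetime
import os
import re
import tempfile


def make_save_dir(snapshot_name, now):
    """Create a fresh temp directory named after the snapshot and a timestamp."""
    # Replace characters such as ':' that are not valid in paths on every OS.
    safe_prefix = re.sub(r"[^A-Za-z0-9._-]", "_", snapshot_name) + "-"
    suffix = now.strftime("-%Y-%m-%d-%H%M%S")
    return tempfile.mkdtemp(prefix=safe_prefix, suffix=suffix)


save_dir = make_save_dir(
    "carvana-train-example-jane-2022-01-03T09:40",
    datetime.datetime(2022, 1, 3, 9, 45, 0),
)
# Stand-in for the files an adapter would write before the upload.
with open(os.path.join(save_dir, "weights.placeholder"), "w") as handle:
    handle.write("not real weights")
print(os.path.basename(save_dir), os.listdir(save_dir))
```

`tempfile.mkdtemp` inserts a random component between the prefix and the suffix, so repeated runs never collide, and the caller is responsible for deleting the directory when it is no longer needed.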


## USING THE MODEL - PREDICTION

We will use the `DataGenerator` to view the image (this utility already connects to our Dataloop items and annotations).


In [None]:
datagen = DataGenerator(
    data_path=os.path.join(data_path, 'train'),
    dataset_entity=snapshot.dataset,
    annotation_type=dl.AnnotationType.SEGMENTATION,
)


In [None]:
# Example: get one entry and visualize it
datagen.visualize(20)

In [None]:
# Assumption: entries are read by indexing the generator, not the Dataloop dataset entity
data_item = datagen[20]
data_item