# Introduction to Brevetti AI Job API
brevettiai is a lightweight API for interfacing with cloud resources.

This notebook shows, through examples, basic usage of the Job API to access a model training job's artifacts, datasets and other resources in a development context.

This uses the Brevetti AI SDK Job access API:

* Low-level API for models, test reports and development against the platform

Job access is granted on a per-instance basis and is tied to the concepts of models and test reports on the platform. In Python, the job context is managed with the **Job** object.


# Brevetti AI package installation

In [None]:
pip install -U git+https://bitbucket.org/criterionai/core@CORE-22-add-augmentation-to-image-classi

In [None]:
# Imports and setup to avoid extensive verbosity
import logging
log = logging.getLogger(__name__)
logging.basicConfig()
log.root.setLevel(logging.DEBUG)
logging.getLogger("urllib3").setLevel(logging.WARNING)
logging.getLogger("matplotlib").setLevel(logging.WARNING)

## BrevettiAI Job API description

The Job API is a lower-level API that forms the basis of development, model training and test reports. It is a set of tools focused on access to your data, tracking of settings and output to the website.

This section explores its usage from the perspective of a developer training models on their own computer.
Except for the initialization of the environment, all of the code is transferable to a Docker-containerized deployment of model training on the platform.




## Initialization of the development job context

A developer accesses the development context (**CriterionConfig**) of unstarted jobs (models) via the web API.

When you have the criterion config object you are ready to build your models, add visualizations to your model page, and persist artifacts.

Use `help(CriterionConfig)` to get an overview of the available methods.

When you are finished with the job/model, call `upload_job_output()` and `complete_job()` on the config object to finalize the model and enable the comparison tool.

```python
config.upload_job_output()
config.complete_job(path_to_model_artifact)
```

## Platform Job Interface
To access a job you need the model id, found for example in the model path `https://platform.brevetti.ai/models/<model_id>`, and the API key, also accessible from the platform. Together they grant you access to the data storage resources.

If you instead used the web API to get access to and start a model, the id and key can be found in the response:

* model_id = model_def["id"]

* api_key = model_def["apiKey"]
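
As a minimal sketch of the two lookups above, assume the web API response has already been received as a JSON string. The response body and its values below are made-up placeholders; only the `id` and `apiKey` keys come from the text above.

```python
import json

# Hypothetical response body; values are placeholders, not real credentials
response_body = '{"id": "example-model-id", "apiKey": "example-api-key", "name": "demo"}'
model_def = json.loads(response_body)

model_id = model_def["id"]
api_key = model_def["apiKey"]

# Avoid printing the key itself; only report that it was found
print(f"model_id={model_id}, api key present: {bool(api_key)}")
```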


In [None]:
# Job info. NB: replace with the ID and API key from your job
import os
import getpass

model_id = os.getenv("job_id") or input("Training job model id (can be read from url https://platform.brevetti.ai/models/{model_id})")
api_key = os.getenv("api_key") or getpass.getpass("Training job Api Key:")

In [None]:
from brevettiai.platform import Job
from brevettiai.interfaces import vue_schema_utils
 
job = Job.init(job_id=model_id, api_key=api_key)

## Storage

In the job context you have two storage modes: temporary and persisted storage. Temporary storage is local to the machine, while persisted storage is in the cloud in the form of artifacts.

In [None]:
temp_path = job.temp_path("something_i_want_to_save_temporarily.txt")
print(temp_path)

job.io.write_file(temp_path, "Valuable information")
print(str(job.io.read_file(temp_path), "utf-8"))

In [None]:
artifact_path = job.artifact_path("something_i_want_to_save.txt")
print(f"Available on the website at: {job.host_name}/models/{job.id}/artifacts")

# And from within the job
job.io.write_file(artifact_path, "Valuable information")
print(str(job.io.read_file(artifact_path), "utf-8"))


# API: Accessing datasets and downloading samples
Samples in a dataset can be accessed via the dataset objects in a platform job object. Access rights are managed seamlessly.

Sample integrity and purpose management are handled by the sample integrity module, which splits the samples into test and training sets, taking duplicates, stratification, etc. into account.
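
To make the idea concrete, here is a small standard-library sketch of a duplicate-aware, stratified split. It is not the `SampleSplit` implementation; the function `assign_purpose`, the record layout and the `test_fraction` parameter are all assumptions made for illustration. Samples with identical content get the same hash and therefore the same purpose, and the test share is taken per category.

```python
import hashlib
from collections import defaultdict

def assign_purpose(samples, test_fraction=0.25):
    """Add 'id' and 'purpose' ('train' or 'test') to each sample dict in place."""
    # Collect the unique content hashes seen in each category
    by_category = defaultdict(list)
    for s in samples:
        s["id"] = hashlib.sha1(s["content"]).hexdigest()
        if s["id"] not in by_category[s["category"]]:
            by_category[s["category"]].append(s["id"])

    purpose = {}
    for digests in by_category.values():
        ordered = sorted(digests)  # hash order acts as a deterministic pseudo-shuffle
        n_test = round(len(ordered) * test_fraction)
        for i, digest in enumerate(ordered):
            # setdefault keeps the first decision if a hash occurs in two categories
            purpose.setdefault(digest, "test" if i < n_test else "train")

    for s in samples:
        s["purpose"] = purpose[s["id"]]
    return samples

demo_samples = [
    {"category": "good", "content": b"img-1"},
    {"category": "good", "content": b"img-1"},  # exact duplicate of the first
    {"category": "good", "content": b"img-2"},
    {"category": "good", "content": b"img-3"},
    {"category": "good", "content": b"img-4"},
    {"category": "bad", "content": b"img-5"},
    {"category": "bad", "content": b"img-6"},
    {"category": "bad", "content": b"img-7"},
    {"category": "bad", "content": b"img-8"},
]
assign_purpose(demo_samples)
for s in demo_samples:
    print(s["category"], s["id"][:8], s["purpose"])
```

Because the assignment depends only on content hashes, re-running it on the same data gives the same split, and a duplicate can never end up in both the training and the test set.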

In [None]:
from brevettiai.platform import get_image_samples
samples = get_image_samples(job.datasets)

In [None]:
from brevettiai.data.sample_integrity import SampleSplit
samples = SampleSplit().update_unassigned(samples, id_path=job.artifact_path("sample_identification.csv"))

In [None]:
samples.head(5)

## Loading datasets
File operations can be performed via the io_tools object. This object manages access to local and remote resources across Windows and Linux platforms, along with local caching of files.
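
Local caching of remote files generally follows a read-through pattern: look in a local cache folder first and only fetch remotely on a miss. The sketch below illustrates that pattern with the standard library only. It does not show how `io_tools` is implemented; `read_with_cache`, `fake_fetch` and the example path are hypothetical names used for illustration.

```python
import hashlib
import os
import tempfile

def read_with_cache(path, fetch, cache_root):
    """Return the bytes for `path`, calling `fetch` only on a cache miss."""
    key = hashlib.sha1(path.encode("utf-8")).hexdigest()
    cached = os.path.join(cache_root, key)
    if os.path.exists(cached):
        with open(cached, "rb") as fp:
            return fp.read()
    data = fetch(path)
    os.makedirs(cache_root, exist_ok=True)
    with open(cached, "wb") as fp:
        fp.write(data)
    return data

calls = []

def fake_fetch(path):
    # Stand-in for a remote download; records every call it receives
    calls.append(path)
    return b"remote bytes for " + path.encode("utf-8")

cache_root = tempfile.mkdtemp()
first = read_with_cache("remote://bucket/image.png", fake_fetch, cache_root)
second = read_with_cache("remote://bucket/image.png", fake_fetch, cache_root)
print(f"remote fetches: {len(calls)}")
```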

In [None]:
# io_tools is accessible from the job object or directly via import 'from brevettiai.io import io_tools'
# Note that access rights are configured on the IoTools object, so different instances of the object
# do not necessarily have access to the same files.
io_tools = job.io
buf = io_tools.read_file(samples.path[0])
buf[:10]

In [None]:
# Set caching of remote objects globally for all operations on the IoTools object
io_tools.set_cache_root(job.temp_path("cache", dir=True))
# Or as a key in the read_file method

## Loading image data with tensorflow datasets
Samples may be easily loaded into TensorFlow datasets with the **DataGenerator** class. **DataGenerator** contains a lot of functionality out of the box, including sampling, shuffling and seeding of your data generation.

In [None]:
from brevettiai.data.data_generator import StratifiedSampler, DataGenerator, OneHotEncoder
from brevettiai.data.image import ImagePipeline, ImageAugmenter, SegmentationLoader

ds = StratifiedSampler().get(samples, shuffle=True, batch_size=8, output_structure=("path", "folder"))

The DataGenerator has four methods to iterate over data.

Two of them return TensorFlow datasets:

* `get_samples()` returning the dataset sampled, but with no mapping
* `get_dataset()` returning the dataset sampled and mapped

Likewise, `get_samples_numpy()` and `get_dataset_numpy()` return NumPy iterators.

In [None]:
# Return the DataGenerator as TensorFlow dataset objects to loop over samples, or over "img" and "category"
ds.get_samples(), ds.get_dataset()

In [None]:
# Get iterator of numpy objects
ds.get_samples_numpy(), ds.get_dataset_numpy()

As with TensorFlow datasets, you can map the dataset with functions.
Premade functions include ImagePipeline, ImageAugmenter, OneHotEncoder and AnnotationParser.
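
Conceptually, a mapping function takes a sample record and returns it with new keys added. The sketch below shows this for one-hot encoding in plain Python. It is an illustration of the idea only, not the library's `OneHotEncoder`; the helper `one_hot` and the example record are assumptions.

```python
def one_hot(label, classes):
    """Return a list with 1.0 at the position of `label` in `classes`, else 0.0."""
    return [1.0 if c == label else 0.0 for c in classes]

# A fixed, sorted class order keeps the encoding stable between runs
classes = sorted({"good", "bad", "missing"})
record = {"path": "a.png", "folder": "good"}

# Mapping adds a new key and leaves the existing ones untouched
mapped = {**record, "onehot": one_hot(record["folder"], classes)}
print(classes)           # ['bad', 'good', 'missing']
print(mapped["onehot"])  # [0.0, 1.0, 0.0]
```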

In [None]:
ds = DataGenerator(samples, shuffle=True, batch_size=8, output_structure=("img", "onehot"))
ds = ds.map(ImagePipeline(target_size=(64,64), antialias=True, rescale="imagenet")) \
    .map(OneHotEncoder(samples.folder.unique(), output_key="onehot"))

# Use the structure argument to change the default structure of the output
ds.get_dataset(structure=("path", "img", "onehot"))

In [None]:
from brevettiai.data.image.utils import tile2d
import matplotlib.pyplot as plt

# Use structure=None to access all the dataset elements
x = next(ds.get_dataset_numpy(structure=None))
plt.imshow(tile2d(x["img"], (2,4))[...,0])
plt.colorbar()

In [None]:
# Use structure="img" to get just the image
x = next(ds.get_dataset_numpy(structure="img"))
plt.imshow(tile2d(x, (2,4))[...,0])

Using `build_image_data_generator` makes a simple dataset, combining loading, augmentation and one-hot encoding of categories, and returning an (image, onehot) tuple which may be used directly as input to Keras.

In [None]:
from brevettiai.data.data_generator import build_image_data_generator
ds = build_image_data_generator(samples, batch_size=8, image=dict(target_size=(224, 224), antialias=True, rescale="imagenet"))

The standard modules of TfDataset are deterministic, and randomness may be seeded. Thus, multiple runs of the same dataset object will result in the same output sequence. By setting the `seed` parameter, this also holds across multiple similar TfDataset objects.
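
The principle of seeded, repeatable sampling can be shown without TensorFlow. The following standard-library sketch is an analogy, not the library's mechanism; the `batches` function and the file names are hypothetical. Each call builds its own random generator from the seed, so two runs with the same seed produce the same batch order.

```python
import random

def batches(items, batch_size, seed):
    """Yield shuffled batches; the order depends only on `items` and `seed`."""
    rng = random.Random(seed)  # private generator, independent of global random state
    order = list(items)
    rng.shuffle(order)
    for i in range(0, len(order), batch_size):
        yield order[i:i + batch_size]

paths = [f"img_{i}.png" for i in range(8)]
run_1 = list(batches(paths, 4, seed=42))
run_2 = list(batches(paths, 4, seed=42))
print(run_1 == run_2)  # True: same seed, same sequence
```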

In [None]:
from brevettiai.data.data_generator import build_image_data_generator
ds = build_image_data_generator(samples, batch_size=8, image=dict(target_size=(224, 224), antialias=True, rescale="imagenet"))
x = next(ds.get_dataset_numpy())
plt.figure()
plt.title("Run 1")
plt.imshow(tile2d(x[0], (2,4))[...,0])
plt.figure()
plt.title("Run 2 of the same dataset results in the same sampling and augmentation performed on the dataset")
x = next(ds.get_dataset_numpy())
plt.imshow(tile2d(x[0], (2,4))[...,0])

# API: Interfaces / integrations
## Job output to platform website
A number of different outputs are available on the platform; here is a subset.

## Metrics
Metrics which may be compared across models can be added via the config object.

In [None]:
print(f"Uploading metrics and outputs to {job.host_name}/models/{model_id}/artifacts")
job.add_output_metric("My custom metric", 277)
job.upload_job_output()

## Progress monitoring (Models only)
Add progress metrics to monitor your model while it is running, by adding the RemoteMonitor callback to your Keras training loop or calling it yourself in your training code.
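
Calling a callback yourself amounts to invoking `on_epoch_end(epoch, logs)` once per epoch from your own loop. The sketch below uses a stand-in class, `RecordingMonitor`, which is hypothetical and only records what it receives; the `train` function and its halving loss are placeholders for real training code.

```python
class RecordingMonitor:
    """Stand-in for a Keras-style callback; it only stores the reported values."""

    def __init__(self):
        self.history = []

    def on_epoch_end(self, epoch, logs=None):
        self.history.append((epoch, dict(logs or {})))
        print(f"epoch {epoch}: {logs}")

def train(epochs, callbacks):
    loss = 1.0
    for epoch in range(epochs):
        loss *= 0.5  # placeholder for a real optimisation step
        for callback in callbacks:
            callback.on_epoch_end(epoch, {"loss": loss})

monitor = RecordingMonitor()
train(3, [monitor])
```

Any object exposing the same `on_epoch_end` method can be passed in the `callbacks` list, which is what makes it possible to reuse a Keras callback in a hand-written loop.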

In [None]:
from brevettiai.interfaces.remote_monitor import RemoteMonitor
remote_monitor_callback = RemoteMonitor(root=job.host_name, path=job.api_endpoints["remote"])
# Simulate training epochs and produce callbacks
remote_monitor_callback.on_epoch_end(11, {"loss": 0.9, "accuracy": 0.5})
remote_monitor_callback.on_epoch_end(12, {"loss": 0.7, "accuracy": 0.8})

print(f"Training progress visible on {job.host_name}/models/{model_id}")

## Pivot tables

Create pivot tables on the web platform to get an overview of your data.
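
A pivot table groups samples by a row key and a column key and aggregates within each cell. The plain-Python sketch below illustrates this with `dataset_id` as rows and `category` as columns, keeping a count and the first URL per cell. The records are made-up examples, and this is not how `export_pivot_table` is implemented.

```python
# Made-up sample records for illustration
records = [
    {"dataset_id": "ds-1", "category": "good", "url": "a.png"},
    {"dataset_id": "ds-1", "category": "bad", "url": "b.png"},
    {"dataset_id": "ds-1", "category": "good", "url": "c.png"},
    {"dataset_id": "ds-2", "category": "good", "url": "d.png"},
]

cells = {}
for r in records:
    key = (r["dataset_id"], r["category"])
    # The first record seen for a cell provides its "first" URL
    cell = cells.setdefault(key, {"count": 0, "first_url": r["url"]})
    cell["count"] += 1

for (dataset_id, category), cell in sorted(cells.items()):
    print(dataset_id, category, cell)
```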

In [None]:
from brevettiai.interfaces.pivot import export_pivot_table, get_default_fields, pivot_fields
export_pivot_table(job.artifact_path("pivot", dir=True), samples,
                   datasets=job.datasets,
                   fields=None,
                   tags=job.get_root_tags(),
                   rows=["dataset_id"],
                   cols=["category"],
                   agg={"url": "first"})
print(f"Pivot table visible on {job.host_name}/models/{model_id}")

## Facets
Create facet dives to explore your data in depth by creating a dataset that outputs a 64x64 thumbnail per sample.
![Facet example](https://gblobscdn.gitbook.com/assets%2F-LY12YhLSCDWlqNaQqWT%2F-MIdFH6dqJxgrYtQH83E%2F-MIdJ3qn1kPxLh6K0YI0%2Fimage.png?alt=media&token=d59993dc-9dd0-4f97-a548-4d6ceddf257d)

Put the files in the facets folder in your artifacts. To use the built-in tools you need to supply a DataGenerator which outputs a 64x64 thumbnail image, and category.

In [None]:
from brevettiai.interfaces.facets_atlas import build_facets
from brevettiai.data.data_generator import StratifiedSampler, DataGenerator
fds = DataGenerator(samples, shuffle=True, output_structure=("img", "category")) \
    .map(ImagePipeline(target_size=(64,64), antialias=True))

build_facets(fds, job.artifact_path("facets", dir=True), count=32)

print(f"Facets visible on {job.host_name}/models/{model_id}")


## Vega-Lite charts
Add Vega-Lite charts to your model page by calling `upload_chart` on the configuration object. Several standard charts are available under `brevettiai.interfaces.vegalite_charts`.

In [None]:
from brevettiai.interfaces import vegalite_charts

vegalite_json = vegalite_charts.dataset_summary(samples)
job.upload_chart(vegalite_json)

# Complete job to update the following on the platform
* The path to the model file (optional)
* That the training or testing process is finished, so the UI can be updated
* That access to write to the artifact path, and access to the datasets should no longer be granted

In [None]:
# job.complete_job()