# Autodistill
This notebook covers the use of [Roboflow's autodistillation](https://blog.roboflow.com/autodistill/) feature.
## Overview
Model distillation is a technique for compressing a large model into a smaller one: the smaller model is trained to mimic the behavior of the larger model. This is useful for deploying models to devices with limited memory and processing power, such as mobile phones and embedded devices.

*Autodistill* aims to automate the process of training a computer vision model from initially unlabeled data. Instead of relying on manual annotation, it uses a large model trained on a large dataset (the base model) to label the unlabeled data. The labeled data is then used to train a smaller model that can be deployed to edge devices.
## Setup
### Install Dependencies
This notebook uses the pre-trained [Segment Anything Model (SAM)](https://segment-anything.com) ([GitHub](https://github.com/facebookresearch/segment-anything), [paper](https://ai.facebook.com/research/publications/segment-anything/)) and [Grounding DINO](https://huggingface.co/spaces/ShilongLiu/Grounding_DINO_demo) ([GitHub](https://github.com/IDEA-Research/GroundingDINO), [paper](https://arxiv.org/pdf/2303.05499.pdf)) models. The following dependencies are needed to run this notebook:
- [AutoDistill](https://github.com/autodistill)
- [autodistill-grounded-sam](https://github.com/autodistill)
- [autodistill-yolov8](https://github.com/autodistill/autodistill-yolov8)
- [supervision](https://github.com/roboflow/supervision)
- **optionally** [autodistill-sam-clip](https://github.com/autodistill/autodistill-sam-clip)

There are many base models available, e.g. grounded-sam or sam-clip, that can be used to label the unlabeled data. Use the one that works best for your use case.

**Note:** Do not label all of your unlabeled data at once. Instead, check that the model works well on a small subset of your data before labeling all of it. If the first base model does not work well, try a different one. See the [autodistill repo](https://github.com/autodistill) for the latest updates on available base models.

To install the dependencies, run the following command:
```bash
pip install autodistill autodistill-grounded-sam autodistill-yolov8 supervision
```
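It is recommended to install the dependencies in a virtual environment. The import cell below also imports `autodistill_sam_clip`, so install the optional `autodistill-sam-clip` package as well if you run the notebook as written.

With the environment set up, we start the auto-distillation process by importing the necessary libraries.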

### Supervision - sv
A Roboflow library of computer vision utilities, imported here as `sv`, for working with state-of-the-art vision models. It provides an interface between the various models and a common Roboflow format.
### Autodistill
The core library, containing the main functions and classes for the auto-distillation process.
### Autodistill-grounded-sam
A base model such as grounded-sam is used to label the unlabeled data. It can be replaced with any of the other base models available for the autodistill library; the ontology cell in this notebook instantiates `SAMCLIP`.

```python
import supervision as sv
from autodistill.detection import CaptionOntology
from autodistill_grounded_sam import GroundedSAM
from autodistill_sam_clip import SAMCLIP
```

### Dataset setup
We now need to specify the folder that holds the images to be labeled and the folder where the labeled dataset should be written.

```python
image_folder = 'test_folder'
output_folder = 'output_folder'
```
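
Before labeling the whole folder, it can help to try the base model on a small subset first. The sketch below copies a reproducible random sample of images into a separate folder using only the standard library. The helper name `copy_image_sample`, the set of image extensions, and the default sample size are assumptions made for illustration; none of them come from autodistill.

```python
import random
import shutil
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def copy_image_sample(source_folder, sample_folder, sample_size=20, seed=0):
    """Copy a random sample of images into sample_folder and return their names.

    Hypothetical helper for illustration; it is not part of autodistill.
    """
    source = Path(source_folder)
    images = sorted(
        path for path in source.iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )
    if not images:
        raise FileNotFoundError(f"no images found in {source}")

    # A fixed seed keeps the sample identical between runs.
    chosen = random.Random(seed).sample(images, min(sample_size, len(images)))

    target = Path(sample_folder)
    target.mkdir(parents=True, exist_ok=True)
    for path in chosen:
        shutil.copy2(path, target / path.name)
    return sorted(path.name for path in chosen)
```

For example, `copy_image_sample(image_folder, 'sample_folder')` fills a hypothetical `sample_folder`, which can then be labeled in place of `image_folder` for a quick quality check.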

## Specify the 'Ontology' and the base-model
This is a crucial part of the process: here you specify the classes that you want to label, using one descriptive text per class. The large base models use the text embedding generated from each description to label the images, so the descriptions should be concise and descriptive. The mapping goes from description to class name:
```python
descriptions = {
    "description of class 1": "class 1",
    "description of class 2": "class 2",
    "description of class 3": "class 3",
}
```


**Note:** These models are very large; they take a long time to download and require a lot of memory to run.

```python
ontology = CaptionOntology({
    'Number 0': '0',
    'Number 1': '1',
    'Number 2': '2',
    'Number 3': '3',
    'Number 4': '4',
    'Number 5': '5',
    'Number 6': '6',
    'Number 7': '7',
    'Number 8': '8',
    'Number 9': '9',
})
base_model = SAMCLIP(ontology)
```
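
When the classes follow a regular pattern, as the digits do here, the description-to-class mapping can be generated instead of typed out. The following is a minimal sketch using plain dictionaries; `build_digit_ontology` and `find_blank_entries` are hypothetical helper names introduced for illustration and are not part of autodistill.

```python
def build_digit_ontology(digits=range(10)):
    """Return a {description: class_name} mapping such as {"Number 3": "3"}."""
    return {f"Number {digit}": str(digit) for digit in digits}


def find_blank_entries(mapping):
    """Return the descriptions whose description or class name is blank."""
    return [
        description
        for description, class_name in mapping.items()
        if not description.strip() or not class_name.strip()
    ]


digit_mapping = build_digit_ontology()

# Descriptions are the text prompts; values are the class names to be saved.
prompts = list(digit_mapping.keys())
class_names = list(digit_mapping.values())
```

The resulting `digit_mapping` has the same contents as the dictionary written out above, so it could be passed to `CaptionOntology` in its place.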

## Label the data
With the base model set up, labeling takes a single call. The base model's `label` method labels the images in the input folder and saves the labels to the specified output directory.

```python
dataset = base_model.label(
    input_folder=image_folder,
    output_folder=output_folder,
)
```
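
After labeling, a quick per-class count can reveal classes that the base model rarely or never detected. The sketch below assumes the output folder contains YOLO-style label files, that is, `.txt` files inside directories named `labels` where each line starts with an integer class index. That layout is an assumption about the saved dataset, so check it against your own output folder. The helper name `count_labels_per_class` is hypothetical.

```python
from collections import Counter
from pathlib import Path


def count_labels_per_class(dataset_folder, class_names=None):
    """Count labeled objects per class across YOLO-style label files.

    Assumes files matching **/labels/*.txt whose lines begin with a class index.
    """
    counts = Counter()
    for label_file in Path(dataset_folder).glob("**/labels/*.txt"):
        for line in label_file.read_text().splitlines():
            fields = line.split()
            if not fields:
                continue  # skip empty lines
            try:
                index = int(fields[0])
            except ValueError:
                continue  # skip lines that do not start with a class index
            # Use the class name when one is known, else the index as text.
            if class_names and 0 <= index < len(class_names):
                counts[class_names[index]] += 1
            else:
                counts[str(index)] += 1
    return counts
```

Calling `count_labels_per_class(output_folder)` returns counts keyed by class index; passing a list of class names in index order gives counts keyed by name instead.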