# One-Shot Document Classification


BoonAI's One-Shot learning feature allows you to train a model with a single representative example.

This Jupyter notebook demonstrates the workflow used in the documentation for **Boon AI's Python SDK**. You can find that walk-through [here](https://app.gitbook.com/@zorroa/s/boonsdk/solutions/single-shot).

## Setup: 

In [None]:
import boonsdk

# You must copy your API key into the parent directory for it to be loaded.
app = boonsdk.app_from_keyfile("../apikey.json")


## Dataset:
Now that we have set up the environment, we can define the dataset. We will use a sample set of two resumes and two cover letters to train our model. Remember, a dataset is a collection of assets to which labels have been assigned. A model must be attached to a dataset before it can be trained. For more information, refer to the Datasets subheading under Custom Models in our documentation. You can find it [here](https://app.gitbook.com/@zorroa/s/boonsdk/training-models/datasets).

In [None]:
# Create the dataset. If you've already created this dataset,
# you can skip this step.
app.datasets.create_dataset(
    "job-documents-dataset", 
    boonsdk.DatasetType.Classification
)

There is an images.csv file in this directory. It consists of two columns and five rows, the first of which is a header. The first column holds the URI of an image and the second holds the label to apply to that image. If you have downloaded this file, be sure to provide the path to your local version of "images.csv". Because each row carries a label and the import is tied to the dataset we created, importing the file associates these images with that dataset. After running the next block of code, you will receive an object listing any errors, the number of assets created, and the number of assets that already exist. A nonzero count under 'exists' means you have attempted to upload a duplicate of a file already in your project.
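Before importing, it can help to sanity-check the CSV locally. The sketch below uses only Python's standard `csv` module and is not part of the Boon AI SDK; the helper name `summarize_label_csv` is hypothetical. It assumes the layout described above: a header row followed by two-column rows of URI and label.

```python
import csv
from collections import Counter


def summarize_label_csv(path, uri_column=0, label_column=1, header=True):
    """Count labels in a (uri, label) CSV and reject malformed rows."""
    counts = Counter()
    with open(path, newline="") as fp:
        # csv.reader defaults to comma-delimited, double-quoted fields.
        reader = csv.reader(fp)
        if header:
            next(reader, None)  # skip the header row
        for row in reader:
            if not row:
                continue  # ignore blank lines
            if len(row) <= max(uri_column, label_column):
                raise ValueError(
                    f"line {reader.line_num}: expected a uri and a label, got {row!r}"
                )
            uri = row[uri_column].strip()
            label = row[label_column].strip()
            if not uri or not label:
                raise ValueError(f"line {reader.line_num}: empty uri or label")
            counts[label] += 1
    return counts
```

Calling `summarize_label_csv("./images.csv")` returns a count per label, which should show two labels with two images each for the sample file described here.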

In [None]:
# Point this at your local copy of "images.csv".
path_to_csv = "./images.csv"

# Look up the dataset by name so this cell works even if the creation step was skipped.
dataset = app.datasets.get_dataset("job-documents-dataset")

# The file's values are separated by commas, which is the default delimiter.
# Each URI and label is enclosed in double quotes.
csvfile = boonsdk.CsvFileImport(path_to_csv,
                                uri_column=0,
                                dataset=dataset,
                                label_column=1,
                                header=True)

result = app.assets.import_csv(csvfile)

# Wait on the import job to complete.
app.jobs.wait_on_job(result)

## Model:
Now that we have a dataset imported, we can create and train a model that will predict the type of any future documents we add to this project. We will use a KNN Classifier and attach the dataset we created earlier. 

In [None]:
# Grab the dataset we made earlier.
dataset = app.datasets.get_dataset("job-documents-dataset")

# Create model and link the dataset.
model = app.models.create_model(
    "document-classifier",
    boonsdk.ModelType.KNN_CLASSIFIER,
    dataset=dataset
)

## Training: 
The next step after creating a model with an attached dataset is to train the model. KNN Classifiers are relatively quick to train, so we shouldn't have to wait long for this step to be complete. You can use the job ID returned from the following line of code to check when the training is complete. 

In [None]:
result = app.models.train_model(model)

# Wait for the training job to finish; this should take 10-20 seconds.
app.jobs.wait_on_job(result)

## Testing: 
Now we'll test the model we've trained by providing a single image we would like to classify as either a resume or a cover letter. The result we print is a JSON object with one prediction, made up of a label and a confidence score. If the predicted label is 'Unrecognized', the model believes the image you provided is neither a resume nor a cover letter.
**Note: this does not upload a new asset; it only analyzes the image you provide.**

In [None]:
with open("test_resume1.jpg", "rb") as fp:
    result = app.assets.analyze_file(fp, ["document-classifier"])
print(result.get_analysis("document-classifier"))

## Improving Performance
There are many different styles of resumes, and you might eventually run into a case where your model fails to recognize the correct type of document. When this happens, you can improve your model's performance by labeling more examples and retraining your model.
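To see why extra labeled examples help, consider a toy nearest-neighbor classifier. This is a minimal sketch of the general KNN idea in plain Python, not Boon AI's implementation; the two-dimensional feature vectors, the `max_distance` cutoff, and the function name `classify` are all invented for illustration. A query takes the label of its closest labeled example, or 'Unrecognized' when nothing is close enough.

```python
import math


def classify(query, examples, max_distance=1.0):
    """Return (label, distance) of the nearest example, or 'Unrecognized' if too far."""
    best_label, best_dist = "Unrecognized", math.inf
    for features, label in examples:
        dist = math.dist(query, features)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    if best_dist > max_distance:
        return "Unrecognized", best_dist
    return best_label, best_dist


# One made-up example per class (the one-shot setting).
examples = [((0.0, 0.0), "resume"), ((4.0, 0.0), "cover letter")]

# A resume in an unfamiliar style sits far from the only resume example.
unusual_resume = (0.0, 3.0)
before = classify(unusual_resume, examples)

# Labeling one more resume close to that style brings the query within range.
examples.append(((0.0, 2.5), "resume"))
after = classify(unusual_resume, examples)

print(before[0], "->", after[0])
```

In this toy setup the query is first rejected because its nearest neighbor lies beyond the cutoff, and it is accepted as a resume once a closer labeled example exists. Labeling more examples and retraining has an analogous effect on a real model, although the features and thresholds it uses will differ.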