# Bee Classification using Custom Vision - Training images - Augmented

## MPP AI Capstone

The data files for the Capstone are available here after registering and logging in:
https://www.datasciencecapstone.org/competitions/5/bumblebee-or-honeybee/page/16/

Extract the Capstone files into the same directory as this Jupyter Notebook and you are ready to go.

## Custom Vision init

https://customvision.ai

Documentation:

https://docs.microsoft.com/en-us/azure/opbuildpdf/cognitive-services/Custom-Vision-Service/toc.pdf?branch=live


To use the tutorial, you need to do the following: 
- Install either Python 2.7+ or Python 3.5+. 
- Install pip. 
- Install Git.

To build this example, you need to install the Preview Python SDK for the Custom Vision API from GitHub as follows:

pip install "git+https://github.com/Azure/azure-sdk-for-python#egg=azure-cognitiveservices-visioncustomvision&subdirectory=azure-cognitiveservices-vision-customvision"

If you encounter a "Filename too long" error, make sure long path support is enabled in Git:

git config --system core.longpaths true
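
Before running the cells below, it can help to confirm that the SDK installed correctly. The following is a minimal sketch that only checks whether the package used by this notebook's imports can be located; the helper name `sdk_is_installed` is hypothetical and not part of the SDK.

```python
import importlib.util


def sdk_is_installed(module_name="azure.cognitiveservices.vision.customvision"):
    """Return True if module_name can be located by the import system."""
    try:
        # find_spec imports the parent packages, but not the target module itself
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package (for example no "azure" package at all)
        # raises ModuleNotFoundError, which is a subclass of ImportError
        return False


print("Custom Vision SDK installed:", sdk_is_installed())
```

If this prints `False`, rerun the `pip install` command above in the same environment that the notebook kernel uses.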


### Initializing the Custom Vision pipe and creating a new project


In [5]:
from azure.cognitiveservices.vision.customvision.training import training_api 
from azure.cognitiveservices.vision.customvision.training.models import ImageUrlCreateEntry

# Replace with a valid key 
# Obtain your training and prediction key by signing in to Custom Vision Service and going to your account settings. 

training_key = "7a712806471e45b8b99ccb8ec0221fa1" 

trainer = training_api.TrainingApi(training_key)

# Create a new project

print("Creating project...")
project = trainer.create_project("MPP AI Capstone Aug5")
print("Project created!")


Creating project...
Project created!
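
Keys pasted into a notebook are easy to share by accident. As a sketch of one alternative, the key can be read from an environment variable instead. The variable name `CUSTOM_VISION_TRAINING_KEY` and the helper `load_key` are assumptions made for this example; neither is defined by the Custom Vision SDK.

```python
import os


def load_key(env_var="CUSTOM_VISION_TRAINING_KEY"):
    """Read a service key from an environment variable (the name is hypothetical)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            "Set the {0} environment variable before running this notebook".format(env_var)
        )
    return key
```

With this in place, the cell above could use `training_key = load_key()` and pass the result to `training_api.TrainingApi` exactly as before.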


### Adding tags

In [6]:
# Make two tags in the new project

bumblebee_tag = trainer.create_tag(project.id, "bumblebee") 
honeybee_tag = trainer.create_tag(project.id, "honeybee")


### Sorting images

import os
import pandas as pd
import shutil as sh
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

# Create one subfolder per class (os.path.join keeps the paths portable across operating systems)

train_dir = "train"
bumblebee_dir = os.path.join(train_dir, "bumblebee")
honeybee_dir = os.path.join(train_dir, "honeybee")

if not os.path.exists(bumblebee_dir):
    os.makedirs(bumblebee_dir)

if not os.path.exists(honeybee_dir):
    os.makedirs(honeybee_dir)

# Read the labels (columns used below: id, bee_type)

labels = pd.read_csv("train_labels.csv", header=0, dtype=str)
# print(labels)

# Build the list of image file names for each label

bb_labels = labels[labels.bee_type == "bumble_bee"]
bb_files = bb_labels.id + ".jpg"
# print(bb_files)

hb_labels = labels[labels.bee_type == "honey_bee"]
hb_files = hb_labels.id + ".jpg"
# print(hb_files)

# Copy each image into the subfolder that matches its label

for file_name in bb_files:
    sh.copy2(os.path.join(train_dir, file_name), bumblebee_dir)

for file_name in hb_files:
    sh.copy2(os.path.join(train_dir, file_name), honeybee_dir)
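
The sorting step can also be written without pandas. The sketch below uses only the standard library and assumes the same `id` and `bee_type` columns and `.jpg` file names as the cell above. The function name `sort_images_by_label` and the demo data are made up for illustration.

```python
import csv
import shutil
import tempfile
from pathlib import Path


def sort_images_by_label(train_dir, labels_csv, label_to_folder):
    """Copy <id>.jpg files into per-label subfolders; return the copy count per folder."""
    train_dir = Path(train_dir)
    copied = {folder: 0 for folder in label_to_folder.values()}
    with open(str(labels_csv), newline="") as f:
        for row in csv.DictReader(f):
            folder = label_to_folder.get(row["bee_type"])
            if folder is None:
                continue  # label has no target folder
            source = train_dir / (row["id"] + ".jpg")
            if not source.is_file():
                continue  # id has no matching image on disk
            target_dir = train_dir / folder
            target_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(str(source), str(target_dir))
            copied[folder] += 1
    return copied


# Demo on a throwaway dataset (file names and labels are made up)
with tempfile.TemporaryDirectory() as tmp:
    train = Path(tmp) / "train"
    train.mkdir()
    for image_id in ("1", "2", "3"):
        (train / (image_id + ".jpg")).write_bytes(b"")
    labels_path = Path(tmp) / "train_labels.csv"
    labels_path.write_text("id,bee_type\n1,bumble_bee\n2,honey_bee\n3,bumble_bee\n")
    demo_counts = sort_images_by_label(
        train, labels_path, {"bumble_bee": "bumblebee", "honey_bee": "honeybee"}
    )
    demo_has_copy = (train / "bumblebee" / "1.jpg").is_file()
```

Returning the per-folder counts gives a quick sanity check that every labelled image was found before the upload starts.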


### Uploading images

In [7]:
import os

bumblebee_dir = os.path.join("train", "bumblebee")
honeybee_dir = os.path.join("train", "honeybee")

# Upload the images one at a time. This might take a while: up to 1 hour with 4000 images.

for image in os.listdir(bumblebee_dir):
    with open(os.path.join(bumblebee_dir, image), mode="rb") as img_data:
        trainer.create_images_from_data(project.id, img_data.read(), [bumblebee_tag.id])

for image in os.listdir(honeybee_dir):
    with open(os.path.join(honeybee_dir, image), mode="rb") as img_data:
        trainer.create_images_from_data(project.id, img_data.read(), [honeybee_tag.id])
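
Uploading one image per request is what makes this step slow. If the installed SDK version offers a batch upload call, grouping the files first may reduce the number of requests. The sketch below shows only the grouping logic: `upload_batch` is a hypothetical callable that you would write around whichever batch call your SDK version provides, and the default batch size of 64 is an assumption to check against the service limits.

```python
def batched(items, batch_size=64):
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter, batch


def upload_in_batches(paths, upload_batch, batch_size=64):
    """Call upload_batch once per batch of paths; return the number of batches sent."""
    sent = 0
    for batch in batched(paths, batch_size):
        upload_batch(batch)  # hypothetical callable supplied by the caller
        sent += 1
    return sent
```

For example, `upload_in_batches(os.listdir(bumblebee_dir), my_uploader)` would call a user-written `my_uploader` with lists of up to 64 file names.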


### Training

This is the first iteration in the project, so mark it as the default iteration.

In [8]:
import time

print("Training...")
iteration = trainer.train_project(project.id)
while iteration.status == "Training":
    iteration = trainer.get_iteration(project.id, iteration.id)
    print("Training status: " + iteration.status)
    time.sleep(1)

# The iteration is now trained. Make it the default project endpoint

trainer.update_iteration(project.id, iteration.id, is_default=True)
print("Done!")

Training...


HttpOperationError: Operation returned an invalid status code 'Bad Request'
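
The polling loop in the training cell runs until the status changes, with no upper bound. A minimal sketch of the same loop with a timeout follows. The helper `wait_for_training` and its parameters are hypothetical; only the `"Training"` status string comes from the cell above.

```python
import time


def wait_for_training(get_status, poll_seconds=1.0, timeout_seconds=600.0, sleep=time.sleep):
    """Poll get_status() until it stops returning "Training"; return the final status."""
    waited = 0.0
    status = get_status()
    while status == "Training":
        if waited >= timeout_seconds:
            raise TimeoutError("Training did not finish within the timeout")
        sleep(poll_seconds)
        waited += poll_seconds
        status = get_status()
    return status
```

With the objects from this notebook, `get_status` could be `lambda: trainer.get_iteration(project.id, iteration.id).status`. The `sleep` parameter exists so the loop can be exercised quickly with a stand-in function.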

### Testing 1st run (then use separate TestingImages notebook)

In [9]:
from azure.cognitiveservices.vision.customvision.prediction import prediction_endpoint 
from azure.cognitiveservices.vision.customvision.prediction.prediction_endpoint import models

prediction_key = "a4aea911d9fa4015b5a445164c70552a"
predictor = prediction_endpoint.PredictionEndpoint(prediction_key)


# Send each test image to the prediction endpoint and collect the results.
# This might take some time: up to 15 min with 1000 images.
# Write the results to a file, one row per image.

test_dir = "test"

with open("predictions.csv", "w") as f:
    f.write("id,pred1,pred2\n")
    for image in os.listdir(test_dir):
        with open(os.path.join(test_dir, image), mode="rb") as test_data:
            results = predictor.predict_image(project.id, test_data.read())
        # Build one row: the file name followed by "tag: probability" for each tag
        row = [image] + ["{0}: {1:.2f}".format(p.tag, p.probability) for p in results.predictions]
        f.write(",".join(row) + "\n")

print("Done testing, results in predictions.csv")

HttpOperationError: Operation returned an invalid status code 'Not Found'
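
If the predictions are going to be loaded back into pandas or a spreadsheet, a fixed column per tag may be easier to work with than `tag: probability` strings. The following sketch uses the standard `csv` module. The function `write_predictions` is hypothetical and the probabilities in the demo are made up, not real model output.

```python
import csv
import io


def write_predictions(rows, out_file, tags=("bumblebee", "honeybee")):
    """Write one CSV row per image with one probability column per tag; return the row count."""
    writer = csv.writer(out_file)
    writer.writerow(["id"] + list(tags))
    count = 0
    for image_id, probabilities in rows:
        # Tags missing from the prediction are written as 0.00
        writer.writerow(
            [image_id] + ["{0:.2f}".format(probabilities.get(tag, 0.0)) for tag in tags]
        )
        count += 1
    return count


# Demo with made-up probabilities written to an in-memory buffer
buffer = io.StringIO()
rows_written = write_predictions(
    [("1.jpg", {"bumblebee": 0.9, "honeybee": 0.1})], buffer
)
```

Each `rows` entry is a pair of an image name and a dictionary that maps tag names to probabilities, which could be built from `results.predictions` in the cell above.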