# Data exploration for Capstone 2

Exploring the labels and images included in the [Lemons dataset](https://github.com/softwaremill/lemon-dataset).

The annotations are stored in a single .json file. It contains simple classification targets as well as semantic segmentation targets for additional quality attributes of both the image and the fruit.

### The Data
The data is split across two folders: one with the images and one with the annotations .json file.

In [103]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pycocotools.coco import COCO
from skimage import io, color, filters
from skimage.transform import resize, rotate

In [61]:
coco = COCO('../data/raw/annotations/instances_default.json')

loading annotations into memory...
Done (t=0.75s)
creating index...
index created!


### Let's look at our categories

In [62]:
cats = coco.loadCats(coco.getCatIds())
cats

[{'id': 1, 'supercategory': '', 'name': 'image_quality'},
 {'id': 2, 'supercategory': '', 'name': 'illness'},
 {'id': 3, 'supercategory': '', 'name': 'gangrene'},
 {'id': 4, 'supercategory': '', 'name': 'mould'},
 {'id': 5, 'supercategory': '', 'name': 'blemish'},
 {'id': 6, 'supercategory': '', 'name': 'dark_style_remains'},
 {'id': 7, 'supercategory': '', 'name': 'artifact'},
 {'id': 8, 'supercategory': '', 'name': 'condition'},
 {'id': 9, 'supercategory': '', 'name': 'pedicel'}]

### Drop the unused 'supercategory' field

In [63]:
# Each entry is a plain dict, so the key can be deleted in place
for cat in cats:
    del cat['supercategory']

In [64]:
cats

[{'id': 1, 'name': 'image_quality'},
 {'id': 2, 'name': 'illness'},
 {'id': 3, 'name': 'gangrene'},
 {'id': 4, 'name': 'mould'},
 {'id': 5, 'name': 'blemish'},
 {'id': 6, 'name': 'dark_style_remains'},
 {'id': 7, 'name': 'artifact'},
 {'id': 8, 'name': 'condition'},
 {'id': 9, 'name': 'pedicel'}]

### Make it a dataframe!

In [135]:
cats_df = pd.DataFrame.from_dict(cats).set_index('id')
cats_df

| id | name |
|----|------|
| 1 | image_quality |
| 2 | illness |
| 3 | gangrene |
| 4 | mould |
| 5 | blemish |
| 6 | dark_style_remains |
| 7 | artifact |
| 8 | condition |
| 9 | pedicel |


### Let's look at our annotations

Two example annotations (`anns[3:5]`), both belonging to the same image

In [145]:
anns = coco.loadAnns(coco.getAnnIds())
# Collect the IDs of all images that have at least one annotation
image_id_set = {ann['image_id'] for ann in anns}
anns[3:5]

[{'id': 4,
  'iscrowd': 0,
  'area': 30.0,
  'category_id': 5,
  'image_id': 100,
  'segmentation': [[311.98046875,
    494.6767578125,
    308.92625953626884,
    496.7767247414795,
    309.8807842578408,
    500.21301373913593,
    311.2171188680404,
    501.3584434050208,
    314.0806930327544,
    500.78572857207837,
    315.2261226986393,
    498.68577418462155,
    315.03521775432455,
    496.0131049642223]],
  'bbox': [308.92625953626884,
   494.6767578125,
   6.299863162370457,
   6.681685592520807]},
 {'id': 5,
  'iscrowd': 0,
  'area': 31.0,
  'category_id': 2,
  'image_id': 100,
  'segmentation': [[606.7744140625,
    489.2041015625,
    602.8160214904838,
    491.8431300845732,
    605.4550914406846,
    495.1419675223224,
    609.0838126222097,
    495.1419675223224,
    610.4033475973083,
    490.853478853247]],
  'bbox': [602.8160214904838,
   489.2041015625,
   7.587326106824548,
   5.937865959822375]}]
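In the COCO format, each `segmentation` polygon is a flat list of alternating x and y coordinates, and `bbox` is `[x, y, width, height]` measured from the top-left corner. As a quick sanity check, the sketch below rebuilds the bounding box of annotation 5 from its polygon. The helper name `polygon_to_bbox` is made up for illustration and is not part of pycocotools.

```python
import numpy as np

# Flat polygon copied from annotation id 5 above: [x1, y1, x2, y2, ...]
segmentation = [606.7744140625, 489.2041015625,
                602.8160214904838, 491.8431300845732,
                605.4550914406846, 495.1419675223224,
                609.0838126222097, 495.1419675223224,
                610.4033475973083, 490.853478853247]

# Bounding box reported for the same annotation: [x, y, width, height]
reported_bbox = [602.8160214904838, 489.2041015625,
                 7.587326106824548, 5.937865959822375]


def polygon_to_bbox(flat_polygon):
    """Hypothetical helper: derive [x, y, width, height] from a flat COCO polygon."""
    pts = np.asarray(flat_polygon, dtype=float).reshape(-1, 2)  # one (x, y) pair per row
    x_min, y_min = pts.min(axis=0)
    x_max, y_max = pts.max(axis=0)
    return [x_min, y_min, x_max - x_min, y_max - y_min]


bbox = polygon_to_bbox(segmentation)
matches = np.allclose(bbox, reported_bbox)
print(bbox, matches)
```

The stored `area` is not checked here, since it may have been computed from the rasterised mask rather than from the polygon itself.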

### Number of annotations: `len(anns)`
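Beyond the raw count, it can be useful to see how the annotations spread over the categories. The sketch below is one possible way to do that with pandas. So that it runs without the dataset, `sample_anns` is a hypothetical stand-in holding only three fields from the two annotations shown above; in the notebook, the full `anns` list would take its place.

```python
import pandas as pd

# Category table as printed earlier in the notebook
cats = [{'id': 1, 'name': 'image_quality'}, {'id': 2, 'name': 'illness'},
        {'id': 3, 'name': 'gangrene'}, {'id': 4, 'name': 'mould'},
        {'id': 5, 'name': 'blemish'}, {'id': 6, 'name': 'dark_style_remains'},
        {'id': 7, 'name': 'artifact'}, {'id': 8, 'name': 'condition'},
        {'id': 9, 'name': 'pedicel'}]

# Hypothetical stand-in for `anns`: only the fields needed for counting
sample_anns = [
    {'id': 4, 'category_id': 5, 'image_id': 100},
    {'id': 5, 'category_id': 2, 'image_id': 100},
]

cats_df = pd.DataFrame(cats).set_index('id')
anns_df = pd.DataFrame(sample_anns)

# Count annotations and distinct images per category, then attach the category names
summary = (
    anns_df.groupby('category_id')
    .agg(n_annotations=('id', 'count'), n_images=('image_id', 'nunique'))
    .join(cats_df)
)
print(summary)
```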

## Load the images!

In [88]:
cat_ids = coco.getCatIds(catNms=['artifact'])
img_ids = coco.getImgIds(catIds=cat_ids)
len(img_ids)

451

In [79]:
img = coco.loadImgs(img_ids[np.random.randint(0, len(img_ids))])[0]
# Relative paths are resolved against the notebook's working directory (notebooks/)
I = io.imread('data/raw/images/')

FileNotFoundError: No such file: '/home/chris/Dropbox/galvanize/capstones/capstone-2/notebooks/data/raw/images'
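The error suggests two problems: the path is resolved relative to the `notebooks/` directory, and it points at the folder rather than at a single image file. A possible fix, sketched below, joins a data root with the `file_name` field that COCO image records carry. The helper `resolve_image_path` is hypothetical, and the root `'../data/raw/images'` is an assumption that mirrors the annotations path used earlier. If `file_name` already includes a subfolder, the root would need adjusting.

```python
from pathlib import Path


def resolve_image_path(img, data_root):
    """Hypothetical helper: locate the file for a COCO image record under data_root."""
    path = Path(data_root) / img['file_name']
    if not path.is_file():
        # Fail early with the absolute path, which makes a wrong root easy to spot
        raise FileNotFoundError(f'No such file: {path.resolve()}')
    return path


# Intended use in the notebook (assumed root, not verified against the dataset):
#   img = coco.loadImgs(img_ids[np.random.randint(0, len(img_ids))])[0]
#   I = io.imread(str(resolve_image_path(img, '../data/raw/images')))
```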