# WHU-RS19 Dataset

> This dataset is composed of 19 classes of different scenes, including
airport, beach, bridge, commercial area, desert, farmland, football field, forest,
industrial area, meadow, mountain, park, parking, pond, port, railway station,
residential area, river and viaduct. Each class has 50 images, with the size of 600×600
pixels.

http://www.escience.cn/people/yangwen/WHU-RS19.html


The purpose of this script is to import and clean the WHU-RS19 dataset. This consists of:

    - Ensuring all pixel values are in the range [0, 255]
    - Ensuring images have 3 RGB color channels, i.e. black-and-white images are converted to 3-channel black-and-white images, and transparent alpha channels are dropped
    - Chopping images into subimages of shape 200x200 px, i.e. each 600x600 image is converted to 9 images of 200x200
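
The cleaning steps listed above can be sketched with NumPy as follows. This is a minimal illustration, not the code in `pipeline.raw_data`: the function names `to_uint8_rgb` and `split_into_tiles` are hypothetical, and the sketch assumes that floating-point images are scaled to [0, 1].

```python
import numpy as np


def to_uint8_rgb(img):
    """Return `img` as an (H, W, 3) uint8 array with values in [0, 255]."""
    img = np.asarray(img)
    # Assumption: float images are scaled to [0, 1]; rescale them to [0, 255].
    if np.issubdtype(img.dtype, np.floating):
        img = np.rint(np.clip(img, 0.0, 1.0) * 255)
    img = np.clip(img, 0, 255).astype(np.uint8)
    if img.ndim == 2:
        # Grayscale: repeat the single channel three times.
        img = np.stack([img] * 3, axis=-1)
    elif img.shape[-1] == 1:
        # Grayscale stored with an explicit channel axis.
        img = np.repeat(img, 3, axis=-1)
    elif img.shape[-1] == 4:
        # RGBA: drop the alpha channel.
        img = img[..., :3]
    return img


def split_into_tiles(img, tile=200):
    """Split an (H, W, C) image into non-overlapping tile x tile subimages."""
    h, w = img.shape[:2]
    return [
        img[r:r + tile, c:c + tile]
        for r in range(0, h - tile + 1, tile)
        for c in range(0, w - tile + 1, tile)
    ]


# A 600x600 grayscale float image becomes nine 200x200 RGB uint8 tiles.
example_tiles = split_into_tiles(to_uint8_rgb(np.zeros((600, 600))))
```

Any remainder that does not fill a whole tile is discarded in this sketch; for 600x600 inputs and 200x200 tiles there is no remainder.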
    
Not all 19 classes of the WHU-RS19 dataset are imported. Which classes are kept is determined by the `MAP_TO_LOCAL_LABELS` dictionary in `pipeline.raw_data.clean_whu_rs19`.

Images are then pickled and stored in training and testing files named Xtest, Xtrain, Ytest, and Ytrain, which hold the test and train images and the test and train categorical labels, respectively.
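
As a rough sketch of that last step, the snippet below shuffles a `{label: [images]}` dictionary, splits it, and pickles the four lists. The function name `split_and_pickle`, the `test_fraction` and `seed` parameters, and the `.p` file extension are assumptions made for illustration; the project's own pipeline utilities may differ.

```python
import os
import pickle
import random


def split_and_pickle(data, out_dir, test_fraction=0.2, seed=0):
    """Shuffle (image, label) pairs, split them, and pickle the four lists."""
    pairs = [(img, label) for label, imgs in data.items() for img in imgs]
    random.Random(seed).shuffle(pairs)
    n_test = int(len(pairs) * test_fraction)
    splits = {
        'Xtest': [img for img, _ in pairs[:n_test]],
        'Ytest': [label for _, label in pairs[:n_test]],
        'Xtrain': [img for img, _ in pairs[n_test:]],
        'Ytrain': [label for _, label in pairs[n_test:]],
    }
    for name, values in splits.items():
        # Hypothetical file naming: one pickle file per split.
        with open(os.path.join(out_dir, name + '.p'), 'wb') as f:
            pickle.dump(values, f)
    return splits
```

Shuffling the pairs together keeps each image aligned with its label across the X and Y files.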

In [1]:
# Move back to the root project directory
%cd ../..

/Users/bdhammel/Documents/insight/harvesting


In [2]:
%matplotlib inline 
import matplotlib.pyplot as plt
from pipeline.raw_data import clean_whu_rs19
from pipeline.raw_data import utils as clean_utils
from pipeline import utils as pipe_utils
import glob
import os
import csv

# These paths should not change
dataset_dir = os.path.join(os.getcwd(), 'datasets')
cleaned_image_dir = os.path.join(dataset_dir, 'obj_detection/dota/images')
cleaned_label_dir = os.path.join(dataset_dir, 'obj_detection/dota/')

labels_to_load = [label for label, local_label in clean_whu_rs19.MAP_TO_LOCAL_LABELS.items() if local_label is not None]

Define the paths to the raw datasets here. The raw dataset can be downloaded from the link above; unzip it on your local machine and define the path here:

In [3]:
whu_rs19_image_dir = '/Volumes/insight/data/WHU-RS19/RSDataset/'

Load the raw dataset. This imports the categories listed in `labels_to_load` and saves them in a dictionary of the form:

```
raw_dataset = {
    'farmland': [farmland_image1, farmland_image2, ...],
    'parking': [parking_image1, parking_image2, ...],
    ...
}
```
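
A loader that produces a dictionary of this shape might look like the sketch below. It is not the implementation of `clean_utils.load_from_categorized_directory`: it assumes one subdirectory per category under the root, named after the label, and it takes a caller-supplied `read_image` function (hypothetical) rather than committing to a particular image library.

```python
import os

IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.tif')


def load_categorized_sketch(root, labels, read_image):
    """Return {label: [image, ...]} for each requested label found under `root`."""
    dataset = {}
    for label in labels:
        # Assumption: each category lives in its own subdirectory of `root`.
        category_dir = os.path.join(root, label)
        if not os.path.isdir(category_dir):
            continue
        paths = sorted(
            os.path.join(category_dir, name)
            for name in os.listdir(category_dir)
            if name.lower().endswith(IMAGE_EXTENSIONS)
        )
        dataset[label] = [read_image(path) for path in paths]
    return dataset
```

Passing the reader as an argument keeps the directory walk separate from image decoding and cleaning, so either part can be swapped out independently.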

In [4]:
raw_dataset = clean_utils.load_from_categorized_directory(whu_rs19_image_dir, labels_to_load)

Image Farmland-01.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,211.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image Farmland-02.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,187.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image Farmland-03.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,187.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image Farmland-04.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (11.00,147.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image Farmland-05.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,189.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image Farmland-06.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (2.00,116.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image Farmland-07.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (9.00,127.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image Farmland-08.jpg load

Image forest_18.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,201.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image forest_19.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,234.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image forest_20.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,255.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image forest_21.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,255.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image forest_22.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,239.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image forest_23.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,236.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image forest_24.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,255.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image forest_25.jpg loaded
	Shape:  (600,

Image meadow_26.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (11.00,235.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image meadow_27.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,189.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image meadow_28.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (13.00,161.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image meadow_29.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (1.00,255.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image meadow_30.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,238.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image meadow_31.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,217.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image meadow_32.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,255.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image meadow_33.jpg loaded
	Shape:  (60

Values: (0.00,255.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image parking_31.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,255.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image parking_32.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,255.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image parking_33.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,255.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image parking_34.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,255.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image parking_35.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,255.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image parking_36.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,255.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image parking_37.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,255.00)
Cleaned To:


Image pond_47.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,255.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image pond_48.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,255.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image pond_49.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,255.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image pond_50.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,224.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image pond_51.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,254.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image pond_52.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,255.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image pond_53.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (2.00,255.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image pond_54.jpg loaded
	Shape:  (600, 600, 3)
	dtype:

Image residential_53.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,255.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8
Image residential_54.jpg loaded
	Shape:  (600, 600, 3)
	dtype:  uint8
Values: (0.00,255.00)
Cleaned To:
	Shape:  (600, 600, 3)
	dtype:  uint8


Because this dataset is merged with the UC Merced Aerial dataset, some of the class names are changed. In addition, some classes that are not of interest are dropped from the dataset. This behavior is defined in `pipeline.raw_data.clean_whu_rs19.py`.

In [5]:
data = clean_utils.convert_classes(raw_dataset, clean_whu_rs19.MAP_TO_LOCAL_LABELS)
del raw_dataset # clear up some memory
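
The renaming and dropping of classes can be pictured with the following sketch. It assumes the mapping has the same form as the one used to build `labels_to_load`, i.e. raw label to local label, with `None` marking a dropped class. The function name and the example label values are hypothetical and are not taken from `MAP_TO_LOCAL_LABELS`.

```python
def convert_classes_sketch(raw_dataset, label_map):
    """Rename categories via `label_map`; drop those mapped to None or unlisted."""
    converted = {}
    for raw_label, images in raw_dataset.items():
        local_label = label_map.get(raw_label)
        if local_label is None:
            continue  # class is not of interest
        # Several raw labels may map to one local label, so merge their images.
        converted.setdefault(local_label, []).extend(images)
    return converted


# Hypothetical mapping, for illustration only.
example_map = {'farmland': 'field', 'meadow': 'field', 'desert': None}
example = convert_classes_sketch(
    {'farmland': ['f1'], 'meadow': ['m1'], 'desert': ['d1']}, example_map
)
# example == {'field': ['f1', 'm1']}
```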