## Land Classification Using Supervised Learning
### Pattern Recognition Mini Project (VII Semester CSE, A Section)
Submitted by:
B P Gayathri Ananya - ENG17CS0047  
Bharat Nilam - ENG17CS0050  
Chirag P D - ENG17CS0059  

### Introduction
Land use classification provides information on land cover and on the types
of human activity involved in land use. It may also facilitate the assessment of environmental
impacts on, and potential or alternative uses of, land. Classifying and mapping land cover is an
integral step in understanding the Earth's biophysical systems. Data on the area and distribution
of wildlife habitat, for example, are useful in managing and mitigating development impacts
on protected and endangered species.

### (Optional) Installing package dependencies if running on Google Colab

In [None]:
!pip install boto3

### 1. Import Libraries

In [None]:
%matplotlib inline
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
import boto3
from botocore import UNSIGNED
from botocore.client import Config
import os
import cv2

### 2. Define functions
Define a function that reads an image object streamed from Amazon S3 into memory as a NumPy byte array.

In [None]:
def stream2npy(img_stream):
    arr = np.asarray(bytearray(img_stream['Body'].read()), dtype=np.uint8)
    return arr
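
As a rough illustration of what `stream2npy` returns, the sketch below feeds it a stand-in for the S3 response. The `fake_response` dict and its three bytes are hypothetical; the only assumption is that a real `get_object` response exposes a `Body` with a `read()` method in the same way.

```python
import io
import numpy as np

def stream2npy(img_stream):
    # Same logic as the function above: read all bytes and view them as uint8.
    arr = np.asarray(bytearray(img_stream['Body'].read()), dtype=np.uint8)
    return arr

# Hypothetical stand-in for an S3 get_object response (no network access needed).
fake_response = {'Body': io.BytesIO(bytes([0, 127, 255]))}
raw = stream2npy(fake_response)
print(raw.dtype, raw.shape, raw.tolist())
```

The result is a flat array of encoded file bytes, not pixels; decoding into an image happens later with `cv2.imdecode`.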

### 3. Get data from cloud
We will get the data from an S3 bucket in the form of a GeoTIFF.

In [None]:
BUCKET_NAME = 'uw-geohack'
KEY_DG = 'la_digitalglobe_small.tif' 

# Unsigned client: requests are sent without AWS credentials
s3_client = boto3.client('s3', config=Config(signature_version=UNSIGNED))

img_stream_dg = s3_client.get_object(Bucket=BUCKET_NAME, Key=KEY_DG)

print('DG:', img_stream_dg)

In [None]:
# Read the data into memory and convert it to a numpy array in RGB space

img1=stream2npy(img_stream_dg)
img1_de = cv2.imdecode(img1, -1)
img1_rgb = cv2.cvtColor(img1_de, cv2.COLOR_BGR2RGB)

### 4. Visualize results

In [None]:
fig=plt.figure(figsize=(18, 16))
plt.subplot(1, 2, 2)
plt.imshow(img1_rgb)

### 5. Get training data

In [None]:
TRAIN_DG = 'training_data_dg.tif'

train_stream2 = s3_client.get_object(Bucket=BUCKET_NAME, Key=TRAIN_DG)
train2=stream2npy(train_stream2)
train2_de = cv2.imdecode(train2, -1)

print('Shape of the Training data:', train2_de.shape)
print('Shape of the Image data:', img1_rgb.shape)

### 6. How the training data was made
The training data was created in QGIS as a GeoJSON vector layer. This layer was then rasterized using the rasterio command-line tools.

### 7. Evaluate training data


In [None]:
classes = {'pool': 1, 'street': 2, 'grass': 3, 'roof': 4, 'tree': 5, 'shadow': 6}
n_classes = len(classes)
print('Unique values in training array: ',np.unique(train2_de))

# create a colour palette we will use to colour the predictions
palette = np.uint8([[0, 0, 0],[0, 255, 255], [128, 128, 128], [0, 255, 0],[255, 255, 255],[0, 102, 0],[51, 51, 51]])
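
The palette is later applied by indexing it with an array of class labels. The sketch below shows that lookup on a tiny, hypothetical 2x2 label image (`labels` and `coloured` are illustrative names, not part of this notebook).

```python
import numpy as np

# Same palette as above: row i holds the RGB colour for class i (0 = unlabelled).
palette = np.array([[0, 0, 0], [0, 255, 255], [128, 128, 128], [0, 255, 0],
                    [255, 255, 255], [0, 102, 0], [51, 51, 51]], dtype=np.uint8)

# Hypothetical 2x2 label image: pool, grass / tree, unlabelled.
labels = np.array([[1, 3],
                   [5, 0]])

coloured = palette[labels]   # integer-array indexing: one RGB triple per label
print(coloured.shape)
print(coloured[0, 0])        # colour assigned to class 1 (pool)
```

Indexing a `(7, 3)` palette with an `(h, w)` label array yields an `(h, w, 3)` image that `plt.imshow` can display directly.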

In [None]:
def statsdata(arr):
    '''generate histogram of training data'''
    fig=plt.figure(figsize=(7, 7))
    bins = range(8)
    plt.hist(arr, bins=bins) 
    bins_labels(bins, fontsize=20)
    plt.title('Distribution of Training Data Classes')
    plt.xlabel('Classes')

def bins_labels(bins, **kwargs):
    '''center each tick label under its histogram bin'''
    bin_w = (max(bins) - min(bins)) / (len(bins) - 1)
    # one tick per bin (len(bins) - 1 of them), labelled with the bin's left edge
    plt.xticks(np.arange(min(bins)+bin_w/2, max(bins), bin_w), list(bins)[:-1], **kwargs)
    plt.xlim(bins[0], bins[-1])

statsdata(train2_de[train2_de>0].ravel())

### 8. Create training data mask
Mask out the parts of the RGB image we won't be using.

In [None]:
rows, cols, bands = img1_rgb.shape
full=img1_rgb[:,:,0:3]   
full=full.ravel()
full=full.reshape((-1, 1))  

red=img1_rgb[:,:,0]
green=img1_rgb[:,:,1] 
blue=img1_rgb[:,:,2]

# zero out every pixel that has no training label (class 0)
red=np.where(train2_de>0, red,0)
green=np.where(train2_de>0, green,0)
blue=np.where(train2_de>0, blue,0)

# rebuild the masked RGB image and repeat the labels once per band
Xtrain=np.dstack((red,green,blue))
Ylabel=np.dstack((train2_de,train2_de,train2_de))

# flatten
data = Xtrain.ravel()     
label= Ylabel.ravel()  

# keep only labelled values; each band value becomes a one-feature sample
l=label[label>0]
d=data[label>0]
d=d.reshape((-1, 1)) 


plt.imshow(Xtrain)
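
The cell above flattens every band value into its own one-feature sample, so the classifier sees red, green, and blue intensities separately. A common alternative, shown here only as a sketch on a tiny synthetic array, keeps each pixel's RGB triple together as one three-feature sample. The names `img`, `lab`, `X_pix`, and `y_pix` are hypothetical and are not used elsewhere in this notebook.

```python
import numpy as np

# Hypothetical 2x2 RGB image and matching label raster (0 = unlabelled).
img = np.array([[[10, 20, 30], [40, 50, 60]],
                [[70, 80, 90], [15, 25, 35]]], dtype=np.uint8)
lab = np.array([[1, 0],
                [2, 1]], dtype=np.uint8)

mask = lab > 0            # True where a training label exists
X_pix = img[mask]         # shape (n_labelled, 3): one RGB row per labelled pixel
y_pix = lab[mask]         # shape (n_labelled,): one class per labelled pixel

print(X_pix.shape, y_pix.tolist())
```

With this layout, predicting over a whole image would use `img.reshape(-1, 3)` as input and reshape the output to `(rows, cols)`.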

### 9. Training the SVM
What is a Support Vector Machine (SVM)? Given a set of labeled training data (supervised learning), the SVM outputs an optimal hyperplane that categorizes new data (data not used in training). In the simple case of two classes in two-dimensional space, the hyperplane is a line dividing the plane into two parts, with each class lying on its own side. SVMs are useful for classification problems where you would like to differentiate among multiple classes.
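
To make the two-class, two-dimensional case concrete, the sketch below applies the decision rule of a linear separator: a point is assigned to a class according to the sign of `w · x + b`. The weights `w` and bias `b` are hand-picked for illustration and are not learned; a trained linear SVM would choose them to maximize the margin between the classes.

```python
import numpy as np

# Hypothetical separating line x1 + x2 = 5, written as w . x + b = 0.
w = np.array([1.0, 1.0])
b = -5.0

points = np.array([[1.0, 1.0],   # below the line
                   [2.0, 1.5],   # below the line
                   [4.0, 4.0],   # above the line
                   [3.0, 5.0]])  # above the line

scores = points @ w + b                  # signed score for each point
side = np.where(scores >= 0, 1, -1)      # +1 on one side of the line, -1 on the other
print(side.tolist())
```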

In [None]:
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(d, l, test_size=0.25)

clf = SVC()
clf.fit(X_train, y_train)
y_t = clf.predict(full)
predicted=y_t.reshape(rows, cols,3)

fig=plt.figure(figsize=(18, 16))
plt.imshow(palette[predicted][:,:,0])

### 10. Model Performance
Print the classification report, confusion matrix, and accuracy score for the held-out test split.

In [None]:
from sklearn import metrics
from sklearn.metrics import accuracy_score

# evaluate on the held-out test split
expected = y_test
predicted = clf.predict(X_test)

print("Classification report for classifier %s:\n%s\n"
      % (clf, metrics.classification_report(expected, predicted)))
print("Confusion matrix:\n%s" % metrics.confusion_matrix(expected, predicted))
print('Accuracy Score :', accuracy_score(expected, predicted))

### 11. Train Random Forest Classifier
What is a random forest (RF) classifier? An RF classifier constructs decision trees during supervised training and outputs the class that is the mode of the classification of the individual trees.
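
The "mode of the individual trees" can be sketched as a majority vote. The `tree_votes` array and the `majority_vote` helper below are hypothetical and only illustrate the idea; scikit-learn's `RandomForestClassifier` combines its trees by averaging their class probability estimates rather than counting hard votes.

```python
import numpy as np

# Hypothetical predictions: 3 trees (rows) voting on 4 samples (columns).
tree_votes = np.array([[1, 2, 3, 3],
                       [1, 2, 2, 3],
                       [4, 2, 3, 1]])

def majority_vote(votes):
    """Return the most frequent class in each column (ties go to the smallest label)."""
    winners = []
    for column in votes.T:
        classes, counts = np.unique(column, return_counts=True)
        winners.append(int(classes[np.argmax(counts)]))
    return winners

print(majority_vote(tree_votes))
```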

In [None]:
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(max_depth=2, random_state=0)
clf.fit(X_train, y_train)
y_rf = clf.predict(full)
predictedRF=y_rf.reshape(rows, cols,3)

fig=plt.figure(figsize=(18, 16))
plt.imshow(palette[predictedRF][:,:,0])

### 12. Model Performance - RF Classifier

In [None]:
from sklearn import metrics
from sklearn.metrics import accuracy_score

# evaluate on the held-out test split
expected = y_test
predicted = clf.predict(X_test)

print("Classification report for classifier %s:\n%s\n"
      % (clf, metrics.classification_report(expected, predicted)))
print("Confusion matrix:\n%s" % metrics.confusion_matrix(expected, predicted))
print('Accuracy Score :', accuracy_score(expected, predicted))

### 13. Metrics Used
recall = true positive / (true positive + false negative)

precision = true positive / (true positive + false positive)

accuracy = (true positive + true negative) / (true positive + false negative + false positive + true negative)

f1 = 2 * (recall * precision) / (recall + precision)

support = number of true instances of each class (the row sums of the confusion matrix)
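
As a worked example, the sketch below derives these metrics per class from a small confusion matrix, using the usual definitions recall = TP / (TP + FN) and precision = TP / (TP + FP). The matrix values are made up for illustration, and the layout assumes scikit-learn's convention of rows for true classes and columns for predicted classes.

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows = true class, columns = predicted class.
cm = np.array([[5, 1, 0],
               [2, 6, 2],
               [0, 1, 3]])

tp = np.diag(cm).astype(float)       # correct predictions per class
support = cm.sum(axis=1)             # true instances per class (row sums)
predicted_totals = cm.sum(axis=0)    # predictions per class (column sums)

recall = tp / support                # TP / (TP + FN)
precision = tp / predicted_totals    # TP / (TP + FP)
f1 = 2 * precision * recall / (precision + recall)
accuracy = tp.sum() / cm.sum()       # all correct predictions over all samples

print('support :', support.tolist())
print('accuracy:', accuracy)
```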