# Computer Detection of Algal Blooms

![algal bloom](download.jpeg)

## The problem 

Phytoplankton are single-celled algae that live in freshwater, saltwater, and everywhere in between. Algal blooms, rapid surges of phytoplankton growth, occur naturally every late spring and early fall around the globe. Sometimes blooms grow out of control, for reasons that include excess nutrients from land-based runoff, an influx of freshwater, and higher-than-normal temperatures. Such blooms are called Harmful Algal Blooms (HABs), and they threaten both human and ecosystem health.

![hab facts](download1.jpeg)

Early detection of algal blooms from imagery can help scientists and policy makers sample blooms as soon as possible to determine whether they are toxic, and then make important public health decisions: beach closures, warnings to fishermen, and seafood advisories can all help people avoid the harmful effects of HABs. My goal is to create a model that classifies algal blooms using imagery obtained from Google. Algal blooms come in many different shapes, sizes, and colors, and this highly varied dataset will hopefully lead to a model that can classify algal blooms in all their forms.

In [1]:
import pandas as pd
import numpy as np
import os
import glob
import PIL
from PIL import Image
import cv2
import time
import matplotlib.pyplot as plt
import scipy
from scipy import ndimage
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
from data_cleaning import *
from keras import models
from keras import layers
%matplotlib inline

Using TensorFlow backend.


In [3]:
# get_img_stats is a function loaded in from data_cleaning.py
# it loops through each directory and gets the dimensions of each image and prints the minimum
test_algae = "split/test/algae/*"
test_not_algae = "split/test/not_algae/*"
get_img_stats(test_algae)
get_img_stats(test_not_algae)

train_algae = "split/train/algae/*"
train_not_algae = "split/train/not_algae/*"
get_img_stats(train_algae)
get_img_stats(train_not_algae)

val_algae = "split/validation/algae/*"
val_not_algae = "split/validation/not_algae/*"
get_img_stats(val_algae)
get_img_stats(val_not_algae)

images appended!
(140, 100)
images appended!
(160, 237)
images appended!
(140, 100)
images appended!
(153, 110)
images appended!
(140, 100)
images appended!
(200, 160)
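
The body of `get_img_stats` lives in `data_cleaning.py` and is not shown here. As a rough sketch of the behavior described in the cell comment, the helpers below collect image sizes for a glob pattern and report the smallest width and height. The names `image_sizes` and `smallest_dimensions` are hypothetical, and taking the two minima independently is an assumption; the real function may instead report the single smallest image.

```python
import glob


def image_sizes(pattern):
    """Return the (width, height) of every image matching a glob pattern."""
    # Pillow is imported here so the pure helper below works without it.
    from PIL import Image

    sizes = []
    for path in sorted(glob.glob(pattern)):
        with Image.open(path) as img:
            sizes.append(img.size)  # PIL reports size as (width, height)
    return sizes


def smallest_dimensions(sizes):
    """Return (min width, min height), each taken independently (assumption)."""
    if not sizes:
        raise ValueError("no image sizes given")
    widths, heights = zip(*sizes)
    return min(widths), min(heights)


# Example with made-up sizes (not taken from the dataset).
print(smallest_dimensions([(140, 300), (200, 100), (180, 150)]))  # (140, 100)
```

On the real data this could be called as `smallest_dimensions(image_sizes("split/train/algae/*"))`.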


In [14]:
# rescale and reshape the images

train_gen = ImageDataGenerator(rescale=1./255).flow_from_directory(
    'split/train/', target_size=(140,140), batch_size=871, class_mode='binary')

test_gen = ImageDataGenerator(rescale=1./255).flow_from_directory(
    'split/test/', target_size=(140,140), batch_size=291, class_mode='binary')

val_gen = ImageDataGenerator(rescale=1./255).flow_from_directory(
    'split/validation/', target_size=(140,140), batch_size=294, class_mode='binary')

Found 867 images belonging to 2 classes.
Found 288 images belonging to 2 classes.
Found 291 images belonging to 2 classes.
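
To make the preprocessing above concrete: `rescale=1./255` multiplies every pixel value by 1/255, so 8-bit intensities in the range 0 to 255 end up between 0 and 1. The NumPy sketch below reproduces only that arithmetic on a made-up 2x2 RGB image. It does not call Keras, and the array `fake_image` is invented for illustration.

```python
import numpy as np

# A made-up 2x2 RGB image with 8-bit pixel values (illustration only).
fake_image = np.array(
    [[[0, 128, 255], [64, 64, 64]],
     [[255, 255, 255], [10, 20, 30]]],
    dtype=np.uint8,
)

# Same arithmetic as rescale=1./255: multiply each pixel by 1/255.
scaled = fake_image.astype(np.float32) * (1.0 / 255)

print(scaled.dtype)  # float32
print(scaled.min(), scaled.max())
```

Because each `batch_size` above is at least as large as the number of images found, a single `next()` call on each generator returns the whole split in one batch.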


In [15]:
# split the images and labels
train_images, train_labels = next(train_gen)
test_images, test_labels = next(test_gen)
val_images, val_labels = next(val_gen)

In [16]:
# check the dataset specs
m_train = train_images.shape[0]
m_test = test_images.shape[0]
m_val = val_images.shape[0]
img_size = train_images.shape[1]  # side length of each resized image, in pixels

print ("Number of training samples: " + str(m_train))
print ("Number of testing samples: " + str(m_test))
print ("Number of validation samples: " + str(m_val))
print ("train_images shape: " + str(train_images.shape))
print ("train_labels shape: " + str(train_labels.shape))
print ("test_images shape: " + str(test_images.shape))
print ("test_labels shape: " + str(test_labels.shape))
print ("val_images shape: " + str(val_images.shape))
print ("val_labels shape: " + str(val_labels.shape))

Number of training samples: 867
Number of testing samples: 288
Number of validation samples: 291
train_images shape: (867, 140, 140, 3)
train_labels shape: (867,)
test_images shape: (288, 140, 140, 3)
test_labels shape: (288,)
val_images shape: (291, 140, 140, 3)
val_labels shape: (291,)


In [17]:
train_labels[:5]

array([1., 0., 0., 1., 0.], dtype=float32)
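
The 1-D output above follows from `class_mode='binary'`, which yields one scalar label per image rather than a one-hot row. A small sketch with made-up label arrays (not the real data) shows what this means for later steps: 1-D label arrays concatenate directly, while column indexing such as `[:, 0]` raises an `IndexError`.

```python
import numpy as np

# Made-up binary labels standing in for three splits (illustration only).
train_y = np.array([1., 0., 0., 1., 0.], dtype=np.float32)
test_y = np.array([0., 1.], dtype=np.float32)
val_y = np.array([1.], dtype=np.float32)

print(train_y.ndim, train_y.shape)  # 1 (5,)

# 1-D arrays can be joined end to end without any extra indexing.
all_y = np.concatenate((train_y, test_y, val_y))
print(all_y.shape)  # (8,)

# Asking for a column of a 1-D array fails, because there is no second axis.
try:
    train_y[:, 0]
except IndexError as err:
    print("IndexError:", err)
```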

In [18]:
# concatenate the images and labels from all three splits
images = np.concatenate((train_images, test_images, val_images))
# class_mode='binary' yields 1-D label arrays, so no column indexing is needed
labels = np.concatenate((train_labels, test_labels, val_labels))

# check the shape of the training images to make sure everything went as planned
train_images.shape

(867, 140, 140, 3)

In [None]:
# check size stats post-reshaping
# show image

In [None]:
# split data 

In [None]:
# design the base model (NO AUGMENTATION)

In [None]:
# evaluate base model

In [None]:
# save model 

In [None]:
# augmentation 

In [None]:
# build new model

In [None]:
# evaluate augmented model

In [None]:
# Final evaluation

In [None]:
# results, conclusions