<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

## *Data Science Unit 4 Sprint 3 Assignment 2*
# Convolutional Neural Networks (CNNs)

# Assignment

Load a pretrained network from TensorFlow Hub, [ResNet50](https://tfhub.dev/google/imagenet/resnet_v1_50/classification/1) - a 50-layer deep network trained to recognize [1000 objects](https://storage.googleapis.com/download.tensorflow.org/data/ImageNetLabels.txt). Starting usage:

```python
module = hub.Module("https://tfhub.dev/google/imagenet/resnet_v1_50/classification/1")
height, width = hub.get_expected_image_size(module)
images = ...  # A batch of images with shape [batch_size, height, width, 3].
logits = module(images)  # Logits with shape [batch_size, num_classes].
```

Apply it to classify the images downloaded below (images from a search for animals in national parks):

In [0]:
!pip install tensorflow tensorflow-hub

In [0]:
import tensorflow as tf
import tensorflow_hub as hub

In [4]:
from keras.preprocessing import image
from keras.applications.resnet50 import preprocess_input, decode_predictions
import numpy as np

Using TensorFlow backend.


In [6]:
from google_images_download import google_images_download

response = google_images_download.googleimagesdownload()
arguments = {"keywords": "animal national park", "limit": 20,
             "print_urls": True}
absolute_image_paths = response.download(arguments)


Item no.: 1 --> Item name = animal national park
Evaluating...
Starting Download...
Image URL: https://i.ytimg.com/vi/P8NJa_YoRxk/maxresdefault.jpg
Completed Image ====> 1.maxresdefault.jpg
Image URL: https://k6u8v6y8.stackpathcdn.com/blog/wp-content/uploads/2017/06/Royal-Bengal-Tiger.jpg
Completed Image ====> 2.Royal-Bengal-Tiger.jpg
Image URL: https://www.nps.gov/arch/learn/nature/images/ARK_6.jpg?maxwidth=1200&maxheight=1200&autorotate=false
Completed Image ====> 3.ARK_6.jpg
Image URL: https://www.corbettnationalpark.in/blog/wp-content/uploads/2015/08/cropped-13625772024_1fd7467d29_k1.jpg
Completed Image ====> 4.cropped-13625772024_1fd7467d29_k1.jpg
Image URL: https://npca.s3.amazonaws.com/images/8135/2c7e0d75-c7ff-4336-99d7-259448d03a4d-banner.jpg?1445969501
Completed Image ====> 5.2c7e0d75-c7ff-4336-99d7-259448d03a4d-banner.jpg
Image URL: https://k6u8v6y8.stackpathcdn.com/blog/wp-content/uploads/2014/04/national-parks-and-wildlife-sanctuaries-in-india.png
Completed Image ====> 6.

In [0]:
image_list = absolute_image_paths[0]['animal national park']

In [0]:
def preprocess_image(img):
  img = tf.image.decode_jpeg(img, channels=3)
  img = tf.image.resize(img, [224, 224])
  img /= 255.0 # normalize to [0,1]
  return img

In [0]:
def load_and_preprocess(path):
  img = tf.read_file(path)
  return preprocess_image(img)

In [10]:
# test on single image
load_and_preprocess(image_list[0])

<tf.Tensor 'truediv:0' shape=(224, 224, 3) dtype=float32>

In [11]:
# tf.stack packs the list of tensors into one tensor with rank one higher than
# each input, by packing them along the `axis` dimension (default 0).
# Given a list of length `N` of tensors of shape `(A, B, C)`, the result has
# shape `(N, A, B, C)`; here that is (number of images, 224, 224, 3).

stacked_input = tf.stack([load_and_preprocess(i) for i in image_list])
stacked_input

<tf.Tensor 'stack:0' shape=(20, 224, 224, 3) dtype=float32>

## Module Instantiation

In [12]:
# must set trainable=True to modify the weights
module = hub.Module("https://tfhub.dev/google/imagenet/resnet_v1_50/classification/1")
height, width = hub.get_expected_image_size(module)
height, width

(224, 224)

In [17]:
logits = module(stacked_input, signature='image_classification')
logits

<tf.Tensor 'module_apply_image_classification_3/resnet_v1_50/SpatialSqueeze:0' shape=(20, 1001) dtype=float32>

In [18]:
logits2 = module(dict(images=stacked_input))
logits2

<tf.Tensor 'module_apply_default/resnet_v1_50/SpatialSqueeze:0' shape=(20, 1001) dtype=float32>

In [0]:
softmax = tf.nn.softmax(logits)
top_predictions = tf.nn.top_k(softmax, k=3, name='top_predictions')

<tf.Tensor 'top_predictions:0' shape=(20, 3) dtype=float32>
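
The cell above only builds graph operations, so nothing is computed until a session runs them. As a rough illustration of what `tf.nn.softmax` followed by `tf.nn.top_k` computes for a single row of logits, here is a minimal pure-Python sketch. The helper names `softmax` and `top_k` below are hypothetical stand-ins written for this example, not the TensorFlow implementations.

```python
import math

def softmax(logits):
    """Convert one row of raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(values, k=3):
    """Return (values, indices) of the k largest entries, largest first.

    Ties are broken in favour of the lower index, matching the behaviour
    described in the tf.nn.top_k docstring.
    """
    order = sorted(range(len(values)), key=lambda i: (-values[i], i))[:k]
    return [values[i] for i in order], order

example_logits = [1.0, 3.0, 2.0, 0.5]  # made-up scores for four classes
probs = softmax(example_logits)
top_values, top_indices = top_k(probs, k=2)
print(top_indices)  # [1, 2]
```

The indices are what map back to class labels; the values are the confidences to inspect when looking for uncertain predictions.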

In [20]:
help(tf.nn.top_k)

Help on function top_k in module tensorflow.python.ops.nn_ops:

top_k(input, k=1, sorted=True, name=None)
    Finds values and indices of the `k` largest entries for the last dimension.
    
    If the input is a vector (rank=1), finds the `k` largest entries in the vector
    and outputs their values and indices as vectors.  Thus `values[j]` is the
    `j`-th largest entry in `input`, and its index is `indices[j]`.
    
    For matrices (resp. higher rank input), computes the top `k` entries in each
    row (resp. vector along the last dimension).  Thus,
    
        values.shape = indices.shape = input.shape[:-1] + [k]
    
    If two elements are equal, the lower-index element appears first.
    
    Args:
      input: 1-D or higher `Tensor` with last dimension at least `k`.
      k: 0-D `int32` `Tensor`.  Number of top elements to look for along the last
        dimension (along each row for matrices).
      sorted: If true the resulting `k` elements will be sorted by the values in

In [27]:
decode_predictions(logits, top=3)

ValueError: ignored

In [26]:
help(decode_predictions)  # pass the function itself; calling it with no arguments raises TypeError

In [0]:
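# Will's solution
from keras.applications.resnet50 import ResNet50

model = ResNet50(weights='imagenet')

# Assumes every path in image_list points to a readable image file
for i, path in enumerate(image_list, start=1):
    # Load each file at the 224x224 size ResNet50 expects
    x = image.img_to_array(image.load_img(path, target_size=(224, 224)))
    x = preprocess_input(np.expand_dims(x, axis=0))  # add the batch dimension
    features = model.predict(x)
    results = decode_predictions(features, top=3)[0]
    print("Image " + str(i))
    for result in results:
        print(result[1:])  # (label, probability)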

In [0]:
# why won't you work!?
results = decode_predictions(logits, top=3)
print(results)

ValueError: ignored
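
A likely cause of the `ValueError`: Keras's `decode_predictions` expects an evaluated 2D array of shape `(samples, 1000)`, while `logits` here is a symbolic tensor of shape `(20, 1001)`. The extra column in the TF Hub output appears to be the `background` class at index 0 of the linked label file. The sketch below shows only the trimming step, on plain Python lists standing in for scores already fetched with `session.run`. `drop_background` is a hypothetical helper, and it assumes the remaining 1000 columns follow the same class order as Keras's index, which is worth verifying against the label file.

```python
def drop_background(rows):
    """Remove column 0 (assumed to be 'background') from each 1001-wide row."""
    for row in rows:
        if len(row) != 1001:
            raise ValueError("expected 1001 scores per image, got %d" % len(row))
    return [list(row[1:]) for row in rows]

# Stand-in for evaluated module output: 2 images x 1001 scores.
fake_scores = [[0.0] * 1001 for _ in range(2)]
fake_scores[0][5] = 9.0  # class 5 in the 1001-way output

trimmed = drop_background(fake_scores)
print(len(trimmed), len(trimmed[0]))  # 2 1000
print(trimmed[0].index(9.0))          # 4: every index shifts down by one
```

With real outputs, the trimmed array (converted to a NumPy array) has the `(samples, 1000)` shape that `decode_predictions` checks for.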

## Expected Inputs

In [0]:
print(module.get_input_info_dict())

{'images': <hub.ParsedTensorInfo shape=(?, 224, 224, 3) dtype=float32 is_sparse=False>}


In [0]:
print(module.get_input_info_dict(signature='image_feature_vector'))

{'images': <hub.ParsedTensorInfo shape=(?, 224, 224, 3) dtype=float32 is_sparse=False>}


In [0]:
# shape will be (?, 1001) for default signature
print(module.get_output_info_dict(signature='image_classification'))

{'resnet_v1_50/block2/unit_3/bottleneck_v1': <hub.ParsedTensorInfo shape=(?, 28, 28, 512) dtype=float32 is_sparse=False>, 'default': <hub.ParsedTensorInfo shape=(?, 1001) dtype=float32 is_sparse=False>, 'resnet_v1_50/block4/unit_2/bottleneck_v1': <hub.ParsedTensorInfo shape=(?, 7, 7, 2048) dtype=float32 is_sparse=False>, 'resnet_v1_50/conv1': <hub.ParsedTensorInfo shape=(?, 112, 112, 64) dtype=float32 is_sparse=False>, 'resnet_v1_50/block3/unit_1/bottleneck_v1/shortcut': <hub.ParsedTensorInfo shape=(?, 14, 14, 1024) dtype=float32 is_sparse=False>, 'resnet_v1_50/predictions': <hub.ParsedTensorInfo shape=(?, 1001) dtype=float32 is_sparse=False>, 'resnet_v1_50/block1/unit_3/bottleneck_v1': <hub.ParsedTensorInfo shape=(?, 28, 28, 256) dtype=float32 is_sparse=False>, 'resnet_v1_50/block3/unit_2/bottleneck_v1': <hub.ParsedTensorInfo shape=(?, 14, 14, 1024) dtype=float32 is_sparse=False>, 'resnet_v1_50/block3/unit_4/bottleneck_v1/conv1': <hub.ParsedTensorInfo shape=(?, 14, 14, 256) dtype=floa

## Collecting required layers of module

In [0]:
images = tf.placeholder(tf.float32, (None, 224, 224, 3))

In [0]:
help(images)

In [0]:
logits1 = module(dict(images=stacked_input))
print(logits1)
# Tensor("module_apply_default_44/resnet_v1_50/SpatialSqueeze:0", shape=(?, 1001), dtype=float32)

module_features = module(dict(images=stacked_input), signature='image_classification', as_dict=True)
# stores all layers in key-value pairs

logits2 = module_features['resnet_v1_50/logits']
print(logits2)
# Tensor("module_apply_image_classification_2/resnet_v1_50/logits/BiasAdd:0", shape=(?, 1, 1, 1001), dtype=float32)

global_pool = module_features['resnet_v1_50/global_pool']
print(global_pool)
# Tensor("module_apply_image_classification_3/resnet_v1_50/pool5:0", shape=(?, 1, 1, 2048), dtype=float32) 

Tensor("module_1_apply_default_1/resnet_v1_50/SpatialSqueeze:0", shape=(21, 1001), dtype=float32)
Tensor("module_1_apply_image_classification/resnet_v1_50/logits/BiasAdd:0", shape=(21, 1, 1, 1001), dtype=float32)
Tensor("module_1_apply_image_classification/resnet_v1_50/pool5:0", shape=(21, 1, 1, 2048), dtype=float32)


## Initialising TF Hub Operations

In [0]:
with tf.Session() as ses:
  # hub.Module needs both its variables and its lookup tables initialised
  ses.run([tf.global_variables_initializer(), tf.tables_initializer()])

Report the most likely estimated class for each image, and also investigate (a) images where the classifier isn't that certain (the best estimate is low), and (b) images where the classifier fails.

Answer (in writing in the notebook) the following - "What sorts of images do CNN classifiers do well with? What sorts do they not do so well? And what are your hypotheses for why?"

In [0]:
### YOUR CODE HERE
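
One possible starting point for the reporting step, as a sketch only: given per-image class probabilities already evaluated to plain lists, pick the top class and flag images whose best estimate falls below a threshold. The function name `summarize_predictions`, the `threshold` of 0.5, and the three-label list are all assumptions made for illustration. Judging where the classifier fails still requires looking at the images themselves.

```python
def summarize_predictions(probabilities, labels, threshold=0.5):
    """Return one (image_index, label, confidence, is_uncertain) tuple per image."""
    report = []
    for i, row in enumerate(probabilities):
        best = max(range(len(row)), key=lambda j: row[j])  # index of top class
        report.append((i, labels[best], row[best], row[best] < threshold))
    return report

labels = ["bison", "tiger", "elk"]  # hypothetical label list
probs = [[0.90, 0.05, 0.05],        # confident prediction
         [0.40, 0.35, 0.25]]        # best estimate is low

report = summarize_predictions(probs, labels)
for idx, label, conf, uncertain in report:
    flag = "  <- low confidence" if uncertain else ""
    print("Image %d: %s (%.2f)%s" % (idx, label, conf, flag))
```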

# Resources and Stretch Goals

Stretch goals
- Enhance your code to use classes/functions and accept terms to search and classes to look for in recognizing the downloaded images (e.g. download images of parties, recognize all that contain balloons)
- Check out [other available pretrained networks](https://tfhub.dev), try some and compare
- Image recognition/classification is somewhat solved, but describing an image and the *relationships* between its entities is not - check out some of the extended resources (e.g. [Visual Genome](https://visualgenome.org/)) on the topic
- Transfer learning - using images you source yourself, [retrain a classifier](https://www.tensorflow.org/hub/tutorials/image_retraining) with a new category
- (Not CNN related) Use [piexif](https://pypi.org/project/piexif/) to check out the metadata of images passed in to your system - see if they're from a national park! (Note - many images lack GPS metadata, so this won't work in most cases, but still cool)
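
For the metadata stretch goal, EXIF GPS coordinates are typically stored as three rational numbers (degrees, minutes, seconds) plus a hemisphere reference. The sketch below converts that form to decimal degrees so it can be compared with a park's location. It assumes the values arrive as `(numerator, denominator)` pairs with a reference such as `"N"` or `b"N"`, which is how they are expected to look when read from the `"GPS"` section of a loaded EXIF dictionary. `dms_to_decimal` is a hypothetical helper, not part of piexif.

```python
def dms_to_decimal(dms, ref):
    """Convert ((deg_num, deg_den), (min_num, min_den), (sec_num, sec_den)) to decimal degrees."""
    if isinstance(ref, bytes):  # EXIF readers may return the reference as bytes
        ref = ref.decode()
    degrees, minutes, seconds = (num / den for num, den in dms)
    decimal = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative by convention.
    return -decimal if ref in ("S", "W") else decimal

print(dms_to_decimal(((44, 1), (30, 1), (0, 1)), "N"))    # 44.5
print(dms_to_decimal(((110, 1), (30, 1), (0, 1)), b"W"))  # -110.5
```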

Resources
- [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) - influential paper (introduced ResNet)
- [YOLO: Real-Time Object Detection](https://pjreddie.com/darknet/yolo/) - an influential convolution-based object detection system, focused on inference speed (for applications such as self-driving vehicles)
- [R-CNN, Fast R-CNN, Faster R-CNN, YOLO](https://towardsdatascience.com/r-cnn-fast-r-cnn-faster-r-cnn-yolo-object-detection-algorithms-36d53571365e) - comparison of object detection systems
- [Common Objects in Context](http://cocodataset.org/) - a large-scale object detection, segmentation, and captioning dataset
- [Visual Genome](https://visualgenome.org/) - a dataset, a knowledge base, an ongoing effort to connect structured image concepts to language