<a href="https://colab.research.google.com/github/KAmbuske02/Pi-in-the-Sky/blob/main/Intro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Machine Learning for Remote Reconnaissance Project

In this research project, I combine recent deep learning models for computer vision with a camera-equipped Raspberry Pi so that a modified RC plane can identify people, objects, and targets from the sky.

The research problem addressed in this study is the tradeoff between a computer vision model's target identification accuracy and its ability to run on hardware with substantial power and weight limitations. I investigate whether the latest pretrained deep learning models from Google AI are appropriate for target identification.


There are many deep learning models available for general object identification. However, as we ultimately want this model to run on airborne computer hardware, it is crucial that the algorithm run without the enormous computational requirements typical of many models.

I ultimately found that Google's [*MobileNets*](https://ai.googleblog.com/2017/06/mobilenets-open-source-models-for.html) family of computer vision models for TensorFlow offered a reasonable balance between algorithm performance and hardware requirements.

More specifically, I used Google's pretrained MobileNetV2 deep learning model. MobileNetV2 was designed to have a much smaller parameter count than typical computer vision models. A smaller model needs less processing power, which makes it well suited to running on a UAV.
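One reason MobileNet-style models have few parameters is that they replace most standard convolutions with depthwise separable convolutions. The sketch below compares the weight counts of the two layer types. The layer shape used here (3x3 kernel, 128 input channels, 256 output channels) is an illustrative assumption and is not taken from the actual MobileNetV2 architecture; bias terms are ignored.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a standard 2-D convolution (bias terms ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size, in_channels, out_channels):
    """Weights in a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel_size * kernel_size * in_channels  # one filter per input channel
    pointwise = in_channels * out_channels               # 1x1 convolution mixes channels
    return depthwise + pointwise


# Illustrative layer shape, chosen for this example only.
k, c_in, c_out = 3, 128, 256
standard = standard_conv_params(k, c_in, c_out)
separable = depthwise_separable_params(k, c_in, c_out)
print("standard:", standard)
print("depthwise separable:", separable)
print("reduction factor:", round(standard / separable, 1))
```

For this particular shape the separable layer needs several times fewer weights, which is the kind of saving that matters on a Raspberry Pi class device.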





I used Google's open-source TensorFlow library for deep learning and applied it to the target identification problem.

In [None]:
# Clone the TensorFlow models repository into the Colab environment
!git clone https://github.com/tensorflow/models



Cloning into 'models'...
remote: Enumerating objects: 77698, done.
remote: Counting objects: 100% (77/77), done.
remote: Compressing objects: 100% (59/59), done.
remote: Total 77698 (delta 36), reused 37 (delta 18), pack-reused 77621
Receiving objects: 100% (77698/77698), 593.34 MiB | 40.33 MiB/s, done.
Resolving deltas: 100% (55195/55195), done.



In [None]:
# Download and unpack the pretrained MobileNetV2 checkpoint
from __future__ import print_function
from IPython import display 
base_name = checkpoint_name = 'mobilenet_v2_1.0_224' #@param
url = 'https://storage.googleapis.com/mobilenet_v2/checkpoints/' + checkpoint_name + '.tgz'
print('Downloading from ', url)
!wget {url}
print('Unpacking')
!tar -xvf {checkpoint_name}.tgz
checkpoint = checkpoint_name + '.ckpt'

display.clear_output()
print('Successfully downloaded checkpoint from ', url,
      '. It is available as', checkpoint)


Successfully downloaded checkpoint from  https://storage.googleapis.com/mobilenet_v2/checkpoints/mobilenet_v2_1.0_224.tgz . It is available as mobilenet_v2_1.0_224.ckpt


In [None]:
!wget https://upload.wikimedia.org/wikipedia/commons/f/fe/Giant_Panda_in_Beijing_Zoo_1.JPG -O panda.jpg

--2022-09-30 20:52:03--  https://upload.wikimedia.org/wikipedia/commons/f/fe/Giant_Panda_in_Beijing_Zoo_1.JPG
Resolving upload.wikimedia.org (upload.wikimedia.org)... 208.80.154.240, 2620:0:861:ed1a::2:b
Connecting to upload.wikimedia.org (upload.wikimedia.org)|208.80.154.240|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 116068 (113K) [image/jpeg]
Saving to: ‘panda.jpg’


2022-09-30 20:52:03 (23.2 MB/s) - ‘panda.jpg’ saved [116068/116068]



In [None]:
# Here we install tf_slim, a lightweight library for  
# evaluating models in TensorFlow
import sys
sys.path.append('/content/models/research/slim')
!pip install tf_slim

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
# load the trained model into tensorflow
import tensorflow.compat.v1 as tf
import tf_slim as slim
from nets.mobilenet import mobilenet_v2

tf.compat.v1.disable_eager_execution()
tf.reset_default_graph()

# For simplicity, decode the JPEG inside TensorFlow.
# Any other source of image tensors would work as input.
file_input = tf.placeholder(tf.string, ())

image = tf.image.decode_jpeg(tf.read_file(file_input))

# Add a batch dimension, then scale pixel values from [0, 255] to roughly [-1, 1).
images = tf.expand_dims(image, 0)
images = tf.cast(images, tf.float32) / 128.  - 1
images.set_shape((None, None, None, 3))
# Resize to the 224x224 input size that this checkpoint expects.
images = tf.image.resize_images(images, (224, 224))

# Note: arg_scope is optional for inference.
with slim.arg_scope(mobilenet_v2.training_scope(is_training=False)):
  logits, endpoints = mobilenet_v2.mobilenet(images)
  
# Restore the exponential-moving-average copies of the weights, since they
# produce 1.5-2% higher accuracy.
ema = tf.train.ExponentialMovingAverage(0.999)
variables_to_restore = ema.variables_to_restore()  # avoids shadowing the builtin vars()

saver = tf.train.Saver(variables_to_restore)

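The checkpoint is restored from exponential moving averages of the weights rather than from the raw weights. As a rough illustration of what such an average does, the sketch below applies the standard update rule `shadow = decay * shadow + (1 - decay) * value` to a made-up sequence of values for a single weight. The values, the helper name `ema_update`, and the smaller decay of 0.9 (used so the smoothing is visible over a few steps) are assumptions for this example; the cell above uses a decay of 0.999.

```python
def ema_update(shadow, value, decay=0.999):
    """One exponential-moving-average step: keep most of the old average."""
    return decay * shadow + (1.0 - decay) * value


# Hypothetical values taken by one weight over successive training steps.
weight_history = [0.50, 0.80, 0.20, 0.90, 0.10]

shadow = weight_history[0]  # start the average at the first value
for w in weight_history[1:]:
    # A decay of 0.9 is used here only to make the smoothing easy to see.
    shadow = ema_update(shadow, w, decay=0.9)

print("last raw value:", weight_history[-1])
print("smoothed value:", shadow)
```

The smoothed value moves much less than the raw value from step to step, which is one plausible reason averaged weights tend to evaluate more stably.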


## Test Images
Here we demonstrate processing of images to generate text labels for the target candidates.

In [None]:
# The wget command downloads images from the internet.
!wget 'https://www.defensenews.com/resizer/U2oS79JL6Z7cX9bXU9zUMBI9bJk=/1024x0/filters:format(jpg):quality(70)/cloudfront-us-east-1.images.arcpublishing.com/archetype/OIFRAG2XKFDPPKI5KVUURA5474.jpg' -O sideTank.jpg
!wget https://thumbs.dreamstime.com/z/us-marines-marching-bucharest-romania-december-aerial-photo-united-states-soldiers-romania-s-national-day-military-63207644.jpg -O soldiers.jpg

--2022-10-03 17:11:42--  https://www.defensenews.com/resizer/U2oS79JL6Z7cX9bXU9zUMBI9bJk=/1024x0/filters:format(jpg):quality(70)/cloudfront-us-east-1.images.arcpublishing.com/archetype/OIFRAG2XKFDPPKI5KVUURA5474.jpg
Resolving www.defensenews.com (www.defensenews.com)... 23.215.176.48, 23.215.176.49, 2600:1409:5000::1723:6219, ...
Connecting to www.defensenews.com (www.defensenews.com)|23.215.176.48|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 77080 (75K) [image/jpeg]
Saving to: ‘sideTank.jpg’


2022-10-03 17:11:42 (8.84 MB/s) - ‘sideTank.jpg’ saved [77080/77080]

--2022-10-03 17:11:43--  https://thumbs.dreamstime.com/z/us-marines-marching-bucharest-romania-december-aerial-photo-united-states-soldiers-romania-s-national-day-military-63207644.jpg
Resolving thumbs.dreamstime.com (thumbs.dreamstime.com)... 192.229.163.122
Connecting to thumbs.dreamstime.com (thumbs.dreamstime.com)|192.229.163.122|:443... connected.
HTTP request sent, awaiting response... 200 OK

<img src='https://www.defensenews.com/resizer/U2oS79JL6Z7cX9bXU9zUMBI9bJk=/1024x0/filters:format(jpg):quality(70)/cloudfront-us-east-1.images.arcpublishing.com/archetype/OIFRAG2XKFDPPKI5KVUURA5474.jpg' width=300>

In [None]:
from IPython import display
from datasets import imagenet
# Uncomment to show the input image inline:
# display.display(display.Image('sideTank.jpg'))

# Restore the checkpoint and run the classifier on the tank image.
with tf.Session() as sess:
  saver.restore(sess, checkpoint)
  x = endpoints['Predictions'].eval(feed_dict={file_input: 'sideTank.jpg'})

# Map the highest-scoring class index to its human-readable ImageNet label.
label_map = imagenet.create_readable_names_for_imagenet_labels()
print("Top 1 prediction: ", x.argmax(), label_map[x.argmax()], x.max())

Top 1 prediction:  848 tank, army tank, armored combat vehicle, armoured combat vehicle 0.82173944


This demonstrates that the stripped-down MobileNetV2 model is still capable of identifying tanks, here with a top-1 confidence of about 0.82.
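The cell above prints only the single most probable class. When judging how a classifier behaves, it can help to look at the next few candidates as well. The following is a minimal sketch of a top-k helper; the function name `top_k_predictions` and the toy probabilities and labels are made up for illustration and are not model outputs.

```python
def top_k_predictions(probabilities, label_map, k=5):
    """Return the k most probable (class_id, label, probability) triples."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return [(idx, label_map.get(idx, "unknown"), float(p)) for idx, p in ranked[:k]]


# Toy stand-ins for the real prediction vector and label map (made-up values).
toy_probs = [0.05, 0.60, 0.10, 0.25]
toy_labels = {0: "background", 1: "tank", 2: "maze", 3: "half track"}

for idx, label, p in top_k_predictions(toy_probs, toy_labels, k=3):
    print(idx, label, round(p, 2))
```

In the notebook this could be called as `top_k_predictions(x[0], label_map)`, assuming `x` carries a leading batch dimension of size 1.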

In [None]:
# add image where it fails

<img src='https://thumbs.dreamstime.com/z/us-marines-marching-bucharest-romania-december-aerial-photo-united-states-soldiers-romania-s-national-day-military-63207644.jpg' width=300>

In [None]:
from IPython import display
from datasets import imagenet
# Uncomment to show the input image inline:
# display.display(display.Image('soldiers.jpg'))

# Restore the checkpoint and run the classifier on the aerial photo of soldiers.
with tf.Session() as sess:
  saver.restore(sess, checkpoint)
  x = endpoints['Predictions'].eval(feed_dict={file_input: 'soldiers.jpg'})

# Map the highest-scoring class index to its human-readable ImageNet label.
label_map = imagenet.create_readable_names_for_imagenet_labels()
print("Top 1 prediction: ", x.argmax(), label_map[x.argmax()], x.max())

Top 1 prediction:  647 maze, labyrinth 0.1894241


Here the limitations of the lightweight model become apparent: the aerial photo of soldiers is classified as "maze, labyrinth", with a top-1 confidence of only about 0.19.
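The two test images differ sharply in top-1 confidence (about 0.82 for the tank, about 0.19 for the soldiers). One simple way to use that signal is to discard predictions below a confidence cutoff instead of reporting them. The sketch below shows the idea; the helper name `accept_prediction` and the 0.5 threshold are assumptions for illustration and have not been tuned on any data.

```python
def accept_prediction(label, confidence, threshold=0.5):
    """Return the label only if the confidence reaches the threshold, else None."""
    return label if confidence >= threshold else None


# Labels and confidences copied from the two outputs printed in this notebook.
tank = accept_prediction(
    "tank, army tank, armored combat vehicle, armoured combat vehicle", 0.82173944)
soldiers = accept_prediction("maze, labyrinth", 0.1894241)

print("tank image:", tank)
print("soldiers image:", soldiers)
```

With this cutoff the tank label is kept and the "maze, labyrinth" label is rejected. A suitable threshold would have to be chosen from a labeled set of aerial images.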
