<a href="https://colab.research.google.com/github/LilySu/DS-Unit-4-Sprint-3-Deep-Learning/blob/master/LS_DS_Unit_4_Sprint_Challenge_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

# Major Neural Network Architectures Challenge
## *Data Science Unit 4 Sprint 3 Challenge*

In this sprint challenge, you'll explore some of the cutting edge of Data Science. This week we studied several famous neural network architectures: 
recurrent neural networks (RNNs), long short-term memory (LSTMs), convolutional neural networks (CNNs), and Generative Adverserial Networks (GANs). In this sprint challenge, you will revisit these models. Remember, we are testing your knowledge of these architectures not your ability to fit a model with high accuracy. 

__*Caution:*__  these approaches can be pretty heavy computationally. All problems were designed so that you should be able to achieve results within at most 5-10 minutes of runtime on Colab or a comparable environment. If something is running longer, doublecheck your approach!

## Challenge Objectives
*You should be able to:*
* <a href="#p1">Part 1</a>: Train a RNN classification model
* <a href="#p2">Part 2</a>: Utilize a pre-trained CNN for objective detection
* <a href="#p3">Part 3</a>: Describe the difference between a discriminator and generator in a GAN
* <a href="#p4">Part 4</a>: Describe yourself as a Data Science and elucidate your vision of AI

<a id="p1"></a>
## Part 1 - RNNs

Use an RNN to fit a multi-class classification model on reuters news articles to distinguish topics of articles. The data is already encoded properly for use in an RNN model. 

Your Tasks: 
- Use Keras to fit a predictive model, classifying news articles into topics. 
- Report your overall score and accuracy

For reference, the [Keras IMDB sentiment classification example](https://github.com/keras-team/keras/blob/master/examples/imdb_lstm.py) will be useful, as well the RNN code we used in class.

__*Note:*__  Focus on getting a running model, not on maxing accuracy with extreme data size or epoch numbers. Only revisit and push accuracy if you get everything else done!

In [0]:
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np
import scipy.ndimage as nd
import tensorflow as tf
import tensorflow_hub as hub
import imageio
from google_images_download import google_images_download
from keras.applications.resnet50 import ResNet50
from keras.preprocessing import image
from keras.applications.resnet50 import preprocess_input, decode_predictions
from PIL import Image, ImageOps
from scipy.spatial import cKDTree
from skimage.feature import plot_matches
from skimage.filters import gaussian
from skimage.measure import ransac
from skimage.transform import AffineTransform
from skimage import color, io
from skimage.exposure import rescale_intensity

In [0]:
import numpy as np
np_load_old = np.load
np.load = lambda *a, **k: np_load_old(*a, allow_pickle=True, **k) 
from tensorflow.keras.datasets import reuters

(x_train, y_train), (x_test, y_test) = reuters.load_data(num_words=None,
                                                         skip_top=0,
                                                         maxlen=None,
                                                         test_split=0.2,
                                                         seed=723812,
                                                         start_char=1,
                                                         oov_char=2,
                                                         index_from=3)

In [2]:

y_train

array([19, 41, 16, ..., 19,  3, 11])

In [3]:
# Demo of encoding

word_index = reuters.get_word_index(path="reuters_word_index.json")

print(f"Iran is encoded as {word_index['iran']} in the data")
print(f"London is encoded as {word_index['london']} in the data")
print("Words are encoded as numbers in our dataset.")

Iran is encoded as 779 in the data
London is encoded as 544 in the data
Words are encoded as numbers in our dataset.


In [5]:
from __future__ import print_function
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Embedding
from keras.layers import LSTM

max_features = 20000
maxlen = 80
batch_size = 32

print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)#Each review is shortened to 80-character chunks
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

print('Build model...')
model = Sequential()
model.add(Embedding(max_features, 128))#take our inputs and extract a fixed-length vector of 128
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))#128 input nodes, dense output for classification
model.add(Dense(1, activation='sigmoid'))#one dense for classification

# try using different optimizers and different optimizer configs
model.compile(loss='binary_crossentropy',
              optimizer='adam',#optimizer
              metrics=['accuracy'])

print('Train...')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=2,
          validation_data=(x_test, y_test))
score, acc = model.evaluate(x_test, y_test,
                            batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)


Using TensorFlow backend.
W0823 15:22:11.604213 139726867335040 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:74: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

W0823 15:22:11.640569 139726867335040 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

W0823 15:22:11.646985 139726867335040 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.



Pad sequences (samples x time)
x_train shape: (8982, 80)
x_test shape: (2246, 80)
Build model...


W0823 15:22:11.788582 139726867335040 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:133: The name tf.placeholder_with_default is deprecated. Please use tf.compat.v1.placeholder_with_default instead.

W0823 15:22:11.800472 139726867335040 deprecation.py:506] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
W0823 15:22:12.096877 139726867335040 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/keras/optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

W0823 15:22:12.116796 139726867335040 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/keras/backend/tenso

Train...
Train on 8982 samples, validate on 2246 samples
Epoch 1/2
Epoch 2/2
Test score: -124.4584993139931
Test accuracy: 0.0396260017809439


<a id="p2"></a>
## Part 2- CNNs

### Find the Frog

Time to play "find the frog!" Use Keras and ResNet50 (pre-trained) to detect which of the following images contain frogs:

<img align="left" src="https://d3i6fh83elv35t.cloudfront.net/newshour/app/uploads/2017/03/GettyImages-654745934-1024x687.jpg" width=400>


In [6]:
!pip install google_images_download



In [18]:
from google_images_download import google_images_download

response = google_images_download.googleimagesdownload()
arguments = {"keywords": "animal pond", "limit": 20, "print_urls": True}
absolute_image_paths = response.download(arguments)


Item no.: 1 --> Item name = animal pond
Evaluating...
Starting Download...
Image URL: https://www.enchantedlearning.com/pgifs/Pondanimals.GIF
Completed Image ====> 1.Pondanimals.GIF
Image URL: https://i.ytimg.com/vi/NCbu0TND9vE/hqdefault.jpg
Completed Image ====> 2.hqdefault.jpg
Image URL: https://pklifescience.com/staticfiles/articles/images/PKLS4116_inline.png
Completed Image ====> 3.PKLS4116_inline.png
Image URL: https://pklifescience.com/staticfiles/articles/images/PKLS4116.png
Completed Image ====> 4.PKLS4116.png
Image URL: https://get.pxhere.com/photo/water-animal-pond-wildlife-mammal-fish-eat-fauna-whiskers-vertebrate-otter-mink-marmot-sea-otter-mustelidae-1383482.jpg
Completed Image ====> 5.water-animal-pond-wildlife-mammal-fish-eat-fauna-whiskers-vertebrate-otter-mink-marmot-sea-otter-mustelidae-1383482.jpg
Image URL: https://cdn.pixabay.com/photo/2017/04/19/20/37/frog-2243543_960_720.jpg
Completed Image ====> 6.frog-2243543_960_720.jpg
Image URL: https://i.pinimg.com/origina

At time of writing at least a few do, but since the Internet changes - it is possible your 5 won't. You can easily verify yourself, and (once you have working code) increase the number of images you pull to be more sure of getting a frog. Your goal is to validly run ResNet50 on the input images - don't worry about tuning or improving the model.

*Hint* - ResNet 50 doesn't just return "frog". The three labels it has for frogs are: `bullfrog, tree frog, tailed frog`

*Stretch goal* - also check for fish.

In [19]:
image_list = absolute_image_paths[0]['animal pond']
image_list

['/content/downloads/animal pond/1.Pondanimals.GIF',
 '/content/downloads/animal pond/2.hqdefault.jpg',
 '/content/downloads/animal pond/3.PKLS4116_inline.png',
 '/content/downloads/animal pond/4.PKLS4116.png',
 '/content/downloads/animal pond/5.water-animal-pond-wildlife-mammal-fish-eat-fauna-whiskers-vertebrate-otter-mink-marmot-sea-otter-mustelidae-1383482.jpg',
 '/content/downloads/animal pond/6.frog-2243543_960_720.jpg',
 '/content/downloads/animal pond/7.12aee2aa186a7b69a66563f138bba822.jpg',
 '/content/downloads/animal pond/8.Alligator_animal_on_pond.jpg',
 '/content/downloads/animal pond/9.frog-3312038__340.jpg',
 '/content/downloads/animal pond/10.Gold-fish.jpg',
 '/content/downloads/animal pond/11.birds-in-a-pond-5986310798966784.jpg',
 '/content/downloads/animal pond/12.goose-2650209_960_720.jpg',
 '/content/downloads/animal pond/13.06af3a_f89e7596d5254e6e8896f054e8c4ea7b~mv2_d_1650_1275_s_2.jpg',
 '/content/downloads/animal pond/14.87827228_XS.jpg',
 '/content/downloads/ani

In [0]:
def resize_image(filename, new_width=256, new_height=256):#the delf model was trained on images 256x256 the input of our model also should be 256x256
  pil_image = Image.open(filename)
  pil_image = ImageOps.fit(pil_image, (new_width, new_height), Image.ANTIALIAS)
  pil_image_rgb = pil_image.convert('RGB')
  pil_image_rgb.save(filename, format='JPEG', quality=90)

for i in image_list:
     resize_image(i)

In [0]:
import numpy as np

from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions#take probabilities

def process_img_path(img_path):
  return image.load_img(img_path, target_size=(224, 224))

def img_contains_frog(img):
  x = image.img_to_array(img)#takes in the image
  x = np.expand_dims(x, axis=0)
  x = preprocess_input(x)
  model = ResNet50(weights='imagenet')
  features = model.predict(x)#predict what's in the model
  results = decode_predictions(features, top=3)[0]#get the top 3 probabilities and also label associated
  print(results)
  for entry in results:
    if entry[1] == 'banana':#if the first entry returns banana, do nothing, if not, return second entry
      return entry[2]#outputs a list of probabilities
  return 0.0

In [22]:
result = img_contains_frog(process_img_path(image_list[0]))

[('n06359193', 'web_site', 0.89605397), ('n04404412', 'television', 0.07584102), ('n03598930', 'jigsaw_puzzle', 0.009310877)]


In [24]:
labels = []
for i in image_list:
    results = img_contains_frog(process_img_path(i))
    labels.append(results)

[('n06359193', 'web_site', 0.89605397), ('n04404412', 'television', 0.07584102), ('n03598930', 'jigsaw_puzzle', 0.009310877)]
[('n01443537', 'goldfish', 0.55898356), ('n02536864', 'coho', 0.32859007), ('n01630670', 'common_newt', 0.02376141)]
[('n04243546', 'slot', 0.8985829), ('n04476259', 'tray', 0.025335675), ('n03908618', 'pencil_box', 0.019952878)]
[('n03485794', 'handkerchief', 0.71689796), ('n04209239', 'shower_curtain', 0.11719423), ('n02834397', 'bib', 0.023235068)]
[('n02444819', 'otter', 0.90565825), ('n02441942', 'weasel', 0.05342591), ('n02442845', 'mink', 0.034301735)]
[('n01641577', 'bullfrog', 0.99305624), ('n01644900', 'tailed_frog', 0.006250196), ('n01644373', 'tree_frog', 0.00034542466)]
[('n02116738', 'African_hunting_dog', 0.8971374), ('n02117135', 'hyena', 0.034927733), ('n02105162', 'malinois', 0.026862286)]
[('n01698640', 'American_alligator', 0.8215397), ('n01737021', 'water_snake', 0.057155196), ('n01689811', 'alligator_lizard', 0.048066698)]
[('n01641577', 'b

In [0]:
labelmatrix = [[('n06359193', 'web_site', 0.89605397), ('n04404412', 'television', 0.07584102), ('n03598930', 'jigsaw_puzzle', 0.009310877)],
[('n01443537', 'goldfish', 0.55898356), ('n02536864', 'coho', 0.32859007), ('n01630670', 'common_newt', 0.02376141)],
[('n04243546', 'slot', 0.8985829), ('n04476259', 'tray', 0.025335675), ('n03908618', 'pencil_box', 0.019952878)],
[('n03485794', 'handkerchief', 0.71689796), ('n04209239', 'shower_curtain', 0.11719423), ('n02834397', 'bib', 0.023235068)],
[('n02444819', 'otter', 0.90565825), ('n02441942', 'weasel', 0.05342591), ('n02442845', 'mink', 0.034301735)],
[('n01641577', 'bullfrog', 0.99305624), ('n01644900', 'tailed_frog', 0.006250196), ('n01644373', 'tree_frog', 0.00034542466)],
[('n02116738', 'African_hunting_dog', 0.8971374), ('n02117135', 'hyena', 0.034927733), ('n02105162', 'malinois', 0.026862286)],
[('n01698640', 'American_alligator', 0.8215397), ('n01737021', 'water_snake', 0.057155196), ('n01689811', 'alligator_lizard', 0.048066698)],
[('n01641577', 'bullfrog', 0.97481173), ('n01737021', 'water_snake', 0.011244673), ('n01644900', 'tailed_frog', 0.010225667)],
[('n01443537', 'goldfish', 0.99945635), ('n02536864', 'coho', 0.0004327336), ('n01985128', 'crayfish', 2.9877303e-05)],
[('n02009912', 'American_egret', 0.7405644), ('n02012849', 'crane', 0.15032986), ('n02009229', 'little_blue_heron', 0.026411064)],
[('n01860187', 'black_swan', 0.7273625), ('n01855672', 'goose', 0.12300291), ('n02457408', 'three-toed_sloth', 0.06162966)],
[('n06359193', 'web_site', 0.9964599), ('n03291819', 'envelope', 0.0012711672), ('n06596364', 'comic_book', 0.00019414631)],
[('n01443537', 'goldfish', 0.9999902), ('n02536864', 'coho', 3.0037447e-06), ('n09256479', 'coral_reef', 1.7858109e-06)],
[('n03291819', 'envelope', 0.1650426), ('n04476259', 'tray', 0.10622762), ('n03876231', 'paintbrush', 0.083313994)],
[('n04476259', 'tray', 0.41268253), ('n03938244', 'pillow', 0.08933346), ('n02909870', 'bucket', 0.05083196)],
[('n01847000', 'drake', 0.999673), ('n01855032', 'red-breasted_merganser', 0.0002866547), ('n02018207', 'American_coot', 1.4786872e-05)],
[('n06359193', 'web_site', 0.36451733), ('n01667778', 'terrapin', 0.27395737), ('n04243546', 'slot', 0.12207158)],
[('n02363005', 'beaver', 0.5943077), ('n02361337', 'marmot', 0.24198961), ('n02444819', 'otter', 0.15658511)],
[('n01440764', 'tench', 0.73850137), ('n02514041', 'barracouta', 0.056688532), ('n02641379', 'gar', 0.056272756)]]

In [26]:
import pandas as pd
df = pd.DataFrame(labelmatrix)
df

Unnamed: 0,0,1,2
0,"(n06359193, web_site, 0.89605397)","(n04404412, television, 0.07584102)","(n03598930, jigsaw_puzzle, 0.009310877)"
1,"(n01443537, goldfish, 0.55898356)","(n02536864, coho, 0.32859007)","(n01630670, common_newt, 0.02376141)"
2,"(n04243546, slot, 0.8985829)","(n04476259, tray, 0.025335675)","(n03908618, pencil_box, 0.019952878)"
3,"(n03485794, handkerchief, 0.71689796)","(n04209239, shower_curtain, 0.11719423)","(n02834397, bib, 0.023235068)"
4,"(n02444819, otter, 0.90565825)","(n02441942, weasel, 0.05342591)","(n02442845, mink, 0.034301735)"
5,"(n01641577, bullfrog, 0.99305624)","(n01644900, tailed_frog, 0.006250196)","(n01644373, tree_frog, 0.00034542466)"
6,"(n02116738, African_hunting_dog, 0.8971374)","(n02117135, hyena, 0.034927733)","(n02105162, malinois, 0.026862286)"
7,"(n01698640, American_alligator, 0.8215397)","(n01737021, water_snake, 0.057155196)","(n01689811, alligator_lizard, 0.048066698)"
8,"(n01641577, bullfrog, 0.97481173)","(n01737021, water_snake, 0.011244673)","(n01644900, tailed_frog, 0.010225667)"
9,"(n01443537, goldfish, 0.99945635)","(n02536864, coho, 0.0004327336)","(n01985128, crayfish, 2.9877303e-05)"


In [48]:
import re

original = '(n06359193, web_site, 0.89605397)'	
bad_characters = ['(', '1','2','3','4','5','6','7','8','9','0',')','.',',',' ']


def replace(original):
  newlabel = ''
  for i in original:
    if i not in bad_characters:
      newlabel += i
  newlabel = re.sub('n', '', newlabel, count=1)
  return newlabel
    
replace(original)

'web_site'

In [0]:
# bad_characters = ['(', '1','2','3','4','5','6','7','8','9','0',')','.',',',' ']


# def replace(original):
#   newlabel = ''
#   for i in original:
#     if i not in bad_characters:
#       newlabel += i
#   newlabel = re.sub('n', '', newlabel, count=1)
#   return newlabel

# df['Primary_Label'] = df.apply(replace)
# df

In [64]:

[x[1] for x in df[0].values]

['web_site',
 'goldfish',
 'slot',
 'handkerchief',
 'otter',
 'bullfrog',
 'African_hunting_dog',
 'American_alligator',
 'bullfrog',
 'goldfish',
 'American_egret',
 'black_swan',
 'web_site',
 'goldfish',
 'envelope',
 'tray',
 'drake',
 'web_site',
 'beaver',
 'tench']

In [59]:
# x = df[0].values
# np.array2string(x, separator=',')

"[('n06359193', 'web_site', 0.89605397),\n ('n01443537', 'goldfish', 0.55898356),('n04243546', 'slot', 0.8985829),\n ('n03485794', 'handkerchief', 0.71689796),\n ('n02444819', 'otter', 0.90565825),('n01641577', 'bullfrog', 0.99305624),\n ('n02116738', 'African_hunting_dog', 0.8971374),\n ('n01698640', 'American_alligator', 0.8215397),\n ('n01641577', 'bullfrog', 0.97481173),\n ('n01443537', 'goldfish', 0.99945635),\n ('n02009912', 'American_egret', 0.7405644),\n ('n01860187', 'black_swan', 0.7273625),\n ('n06359193', 'web_site', 0.9964599),('n01443537', 'goldfish', 0.9999902),\n ('n03291819', 'envelope', 0.1650426),('n04476259', 'tray', 0.41268253),\n ('n01847000', 'drake', 0.999673),('n06359193', 'web_site', 0.36451733),\n ('n02363005', 'beaver', 0.5943077),('n01440764', 'tench', 0.73850137)]"

<a id="p3"></a>
## Part 3 - Autoencoders

Describe a use case for an autoencoder given that an autoencoder tries to predict its own input. 

Autoencoders are great for content-based recommendations because when a topic is inputed, the autoencoder can learn non-linear, high-dimensional relationships. 

<a id="p4"></a>
## Part 4 - More...

Answer the following questions, with a target audience of a fellow Data Scientist:

- What do you consider your strongest area, as a Data Scientist?
  Creative problem solving especially as it comes to feature engineering
  and business strategy due to my creative and entrepreneurial background.
- What area of Data Science would you most like to learn more about, and why?
  I would immediately be learning about more word frequency in relation to ratings for business via webscraping review websites because there are direct and immediate demands in this field and projects focuses for small business is short term.
- Where do you think Data Science will be in 5 years?
In 5 years, there will be more sophisticated libraries in 3D computer generated imagery use cases for video conversion into augmented reality experiences. There may be use of AI for harmful political agendas and technological warfar in general. It could be that a recession will exasterbate increases in the United States in job changes.
- What are the threats posed by AI to our society?
The threats posed by AI in society are the proliferation of fake news, of addition, of misinformation in all senses of human knowledge and fraud
- How do you think we can counteract those threats? 
There may be more subject matter experts in the field of technological regulation to counter threats. The U.S. may take a more socialist turn in providing assistance to those experiencing job loss. 
- Do you think achieving General Artifical Intelligence is ever possible?
Yes I do believe achieving General Artificial Intelligence is possible especially when there are fields studying how the human brain can be interfaced with computational systems.

A few sentences per answer is fine - only elaborate if time allows.

## Congratulations! 

Thank you for your hard work, and congratulations! You've learned a lot, and you should proudly call yourself a Data Scientist.


In [0]:
from IPython.display import HTML

HTML("""<iframe src="https://giphy.com/embed/26xivLqkv86uJzqWk" width="480" height="270" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/mumm-champagne-saber-26xivLqkv86uJzqWk">via GIPHY</a></p>""")