<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

# Major Neural Network Architectures Challenge
## *Data Science Unit 4 Sprint 3 Challenge*

In this sprint challenge, you'll explore some of the cutting edge of Data Science. This week we studied several famous neural network architectures: 
recurrent neural networks (RNNs), long short-term memory (LSTMs), convolutional neural networks (CNNs), and Generative Adverserial Networks (GANs). In this sprint challenge, you will revisit these models. Remember, we are testing your knowledge of these architectures not your ability to fit a model with high accuracy. 

__*Caution:*__  these approaches can be pretty heavy computationally. All problems were designed so that you should be able to achieve results within at most 5-10 minutes of runtime on Colab or a comparable environment. If something is running longer, doublecheck your approach!

## Challenge Objectives
*You should be able to:*
* <a href="#p1">Part 1</a>: Train a RNN classification model
* <a href="#p2">Part 2</a>: Utilize a pre-trained CNN for objective detection
* <a href="#p3">Part 3</a>: Describe the difference between a discriminator and generator in a GAN
* <a href="#p4">Part 4</a>: Describe yourself as a Data Science and elucidate your vision of AI

<a id="p1"></a>
## Part 1 - RNNs

Use an RNN to fit a multi-class classification model on reuters news articles to distinguish topics of articles. The data is already encoded properly for use in an RNN model. 

Your Tasks: 
- Use Keras to fit a predictive model, classifying news articles into topics. 
- Report your overall score and accuracy

For reference, the [Keras IMDB sentiment classification example](https://github.com/keras-team/keras/blob/master/examples/imdb_lstm.py) will be useful, as well the RNN code we used in class.

__*Note:*__  Focus on getting a running model, not on maxing accuracy with extreme data size or epoch numbers. Only revisit and push accuracy if you get everything else done!

In [45]:
from tensorflow.keras.datasets import reuters

(x_train, y_train), (x_test, y_test) = reuters.load_data(num_words=None,
                                                         skip_top=0,
                                                         maxlen=None,
                                                         test_split=0.2,
                                                         seed=723812,
                                                         start_char=1,
                                                         oov_char=2,
                                                         index_from=3)

In [46]:
x_train.shape, x_test.shape, y_train.shape, y_test.shape

((8982,), (2246,), (8982,), (2246,))

In [47]:
#since multiclass we need to know how many outpude nodes we'd have, 45 different topics

import pandas as pd 
y_explore = pd.DataFrame(y_train)
y_explore.describe()

Unnamed: 0,0
count,8982.0
mean,8.911712
std,9.169627
min,0.0
25%,3.0
50%,4.0
75%,15.0
max,45.0


In [48]:
# Demo of encoding

word_index = reuters.get_word_index(path="reuters_word_index.json")

print(f"Iran is encoded as {word_index['iran']} in the data")
print(f"London is encoded as {word_index['london']} in the data")
print("Words are encoded as numbers in our dataset.")

Iran is encoded as 779 in the data
London is encoded as 544 in the data
Words are encoded as numbers in our dataset.


In [49]:
from __future__ import print_function

from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding
from tensorflow.keras.layers import LSTM
from tensorflow.keras.datasets import imdb

max_features = 50000
# cut text after this number of words
maxlen = 10
batch_size = 10

print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

print('Build model...')
model = Sequential()
model.add(Embedding(max_features, 80))
model.add(LSTM(46, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(46, activation='softmax'))

# try using different optimizers and different optimizer configs
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

print('Train...')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=1,
          validation_data=(x_test, y_test))
score, acc = model.evaluate(x_test, y_test,
                            batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)

Pad sequences (samples x time)
x_train shape: (8982, 10)
x_test shape: (2246, 10)
Build model...
Train...
Train on 8982 samples, validate on 2246 samples
Test score: 1.9143075510952372
Test accuracy: 0.48174533


Conclusion - RNN runs, and gives pretty decent improvement over a naive model. To *really* improve the model, more playing with parameters would help. Also - RNN may well not be the best approach here, but it is at least a valid one.

<a id="p2"></a>
## Part 2- CNNs

### Find the Frog

Time to play "find the frog!" Use Keras and ResNet50 (pre-trained) to detect which of the following images contain frogs:

<img align="left" src="https://d3i6fh83elv35t.cloudfront.net/newshour/app/uploads/2017/03/GettyImages-654745934-1024x687.jpg" width=400>


In [14]:
!pip install google_images_download

Collecting google_images_download
  Downloading https://files.pythonhosted.org/packages/18/ed/0319d30c48f3653802da8e6dcfefcea6370157d10d566ef6807cceb5ec4d/google_images_download-2.8.0.tar.gz
Collecting selenium (from google_images_download)
[?25l  Downloading https://files.pythonhosted.org/packages/80/d6/4294f0b4bce4de0abf13e17190289f9d0613b0a44e5dd6a7f5ca98459853/selenium-3.141.0-py2.py3-none-any.whl (904kB)
[K    100% |████████████████████████████████| 911kB 1.9MB/s ta 0:00:01
Building wheels for collected packages: google-images-download
  Building wheel for google-images-download (setup.py) ... [?25ldone
[?25h  Stored in directory: /Users/TomasFox/Library/Caches/pip/wheels/1f/28/ad/f56e7061e1d2a9a1affe2f9c649c2570cb9198dd24ede0bbab
Successfully built google-images-download
Installing collected packages: selenium, google-images-download
Successfully installed google-images-download-2.8.0 selenium-3.141.0


In [31]:
from google_images_download import google_images_download

response = google_images_download.googleimagesdownload()
arguments = {"keywords": "animal pond", "limit": 15, "print_urls": True}
absolute_image_paths = response.download(arguments)


Item no.: 1 --> Item name = animal pond
Evaluating...
Starting Download...
Image URL: https://www.enchantedlearning.com/pgifs/Pondanimals.GIF
Completed Image ====> 1.Pondanimals.GIF
Image URL: https://i.ytimg.com/vi/NCbu0TND9vE/hqdefault.jpg
Completed Image ====> 2.hqdefault.jpg
Image URL: https://get.pxhere.com/photo/water-animal-pond-wildlife-mammal-fish-eat-fauna-whiskers-vertebrate-otter-mink-marmot-sea-otter-mustelidae-1383482.jpg
Completed Image ====> 3.water-animal-pond-wildlife-mammal-fish-eat-fauna-whiskers-vertebrate-otter-mink-marmot-sea-otter-mustelidae-1383482.jpg
Image URL: https://pklifescience.com/staticfiles/articles/images/PKLS4116_inline.png
Completed Image ====> 4.PKLS4116_inline.png
Image URL: https://i.pinimg.com/originals/12/ae/e2/12aee2aa186a7b69a66563f138bba822.jpg
Completed Image ====> 5.12aee2aa186a7b69a66563f138bba822.jpg
Image URL: https://cdn.pixabay.com/photo/2018/04/11/23/05/frog-3312038__340.jpg
Completed Image ====> 6.frog-3312038__340.jpg
Image URL: 

At time of writing at least a few do, but since the Internet changes - it is possible your 5 won't. You can easily verify yourself, and (once you have working code) increase the number of images you pull to be more sure of getting a frog. Your goal is to validly run ResNet50 on the input images - don't worry about tuning or improving the model.

*Hint* - ResNet 50 doesn't just return "frog". The three labels it has for frogs are: `bullfrog, tree frog, tailed frog`

*Stretch goal* - also check for fish.

In [32]:
import numpy as np

from keras.applications.resnet50 import ResNet50
from keras.preprocessing import image
from keras.applications.resnet50 import preprocess_input, decode_predictions

def process_img_path(img_path):
  return image.load_img(img_path, target_size=(224, 224))

def class_image(img):
  x = image.img_to_array(img)
  x = np.expand_dims(x, axis=0)
  x = preprocess_input(x)
  model = ResNet50(weights='imagenet')
  features = model.predict(x)
  results = decode_predictions(features, top=3)[0]
  print(results)
  for entry in results:
    if entry[1] == 'frog':
      return entry[2]
  return 0.0

In [33]:
absolute_image_paths

({'animal pond': ['/Users/TomasFox/Downloads/downloads/animal pond/1.Pondanimals.GIF',
   '/Users/TomasFox/Downloads/downloads/animal pond/2.hqdefault.jpg',
   '/Users/TomasFox/Downloads/downloads/animal pond/3.water-animal-pond-wildlife-mammal-fish-eat-fauna-whiskers-vertebrate-otter-mink-marmot-sea-otter-mustelidae-1383482.jpg',
   '/Users/TomasFox/Downloads/downloads/animal pond/4.PKLS4116_inline.png',
   '/Users/TomasFox/Downloads/downloads/animal pond/5.12aee2aa186a7b69a66563f138bba822.jpg',
   '/Users/TomasFox/Downloads/downloads/animal pond/6.frog-3312038__340.jpg',
   '/Users/TomasFox/Downloads/downloads/animal pond/7.birds-in-a-pond-5986310798966784.jpg',
   '/Users/TomasFox/Downloads/downloads/animal pond/8.alligator-animal-on-pond.jpg',
   '/Users/TomasFox/Downloads/downloads/animal pond/9.frog-2243543_960_720.jpg',
   '/Users/TomasFox/Downloads/downloads/animal pond/10.PKLS4116.png',
   '/Users/TomasFox/Downloads/downloads/animal pond/11.goose-2650209_960_720.jpg',
   '/Use

In [34]:
animal_list = ['/Users/TomasFox/Downloads/downloads/animal pond/1.Pondanimals.GIF',
   '/Users/TomasFox/Downloads/downloads/animal pond/2.hqdefault.jpg',
   '/Users/TomasFox/Downloads/downloads/animal pond/3.water-animal-pond-wildlife-mammal-fish-eat-fauna-whiskers-vertebrate-otter-mink-marmot-sea-otter-mustelidae-1383482.jpg',
   '/Users/TomasFox/Downloads/downloads/animal pond/4.PKLS4116_inline.png',
   '/Users/TomasFox/Downloads/downloads/animal pond/5.12aee2aa186a7b69a66563f138bba822.jpg',
   '/Users/TomasFox/Downloads/downloads/animal pond/6.frog-3312038__340.jpg',
   '/Users/TomasFox/Downloads/downloads/animal pond/7.birds-in-a-pond-5986310798966784.jpg',
   '/Users/TomasFox/Downloads/downloads/animal pond/8.alligator-animal-on-pond.jpg',
   '/Users/TomasFox/Downloads/downloads/animal pond/9.frog-2243543_960_720.jpg',
   '/Users/TomasFox/Downloads/downloads/animal pond/10.PKLS4116.png',
   '/Users/TomasFox/Downloads/downloads/animal pond/11.goose-2650209_960_720.jpg',
   '/Users/TomasFox/Downloads/downloads/animal pond/12.screen480x480.jpeg',
   '/Users/TomasFox/Downloads/downloads/animal pond/13.87827228_XS.jpg',
   '/Users/TomasFox/Downloads/downloads/animal pond/14.416e0eff5efce95e87fae13b90d0b37a.jpg',
   '/Users/TomasFox/Downloads/downloads/animal pond/15.1801wildpond001b.jpg']

for i in animal_list:
    print(class_image(process_img_path(i)))
    

[('n03598930', 'jigsaw_puzzle', 0.86803204), ('n06359193', 'web_site', 0.06409986), ('n02834397', 'bib', 0.021264242)]
0.0
[('n01443537', 'goldfish', 0.8495882), ('n01631663', 'eft', 0.06760146), ('n02536864', 'coho', 0.03516317)]
0.0
[('n02442845', 'mink', 0.3097655), ('n02363005', 'beaver', 0.23398967), ('n02361337', 'marmot', 0.2079685)]
0.0
[('n04243546', 'slot', 0.8712437), ('n04476259', 'tray', 0.049936477), ('n03908618', 'pencil_box', 0.02307264)]
0.0
[('n02116738', 'African_hunting_dog', 0.59568644), ('n02117135', 'hyena', 0.1690415), ('n02105162', 'malinois', 0.090247944)]
0.0
[('n01737021', 'water_snake', 0.30730626), ('n01641577', 'bullfrog', 0.26061302), ('n04275548', 'spider_web', 0.1134344)]
0.0
[('n02009912', 'American_egret', 0.78224105), ('n02012849', 'crane', 0.14339268), ('n02009229', 'little_blue_heron', 0.021143341)]
0.0
[('n01698640', 'American_alligator', 0.96394145), ('n01697457', 'African_crocodile', 0.026759788), ('n01737021', 'water_snake', 0.0059646494)]
0.0

<a id="p3"></a>
## Part 3 - Generative Adverserial Networks (GANS)

Describe the difference between a discriminator and generator in a GAN in your own words.

__*Your Answer:*__ 

In a generative adverserial network, as the word 'adverserial' implies, both the discriminator and the generator algorithms are optimizing opposing loss functions.

The discriminator and the generator work within a double feedback loop model wherein the generator creates 'noise'; in other words, produces fake data, and the discriminator probabilistically decides, after receiving both 'real' output labels (the ground truth) and fake output labels from the generator, which outputs are true and which are fake. 

To summarize, The discriminator is in feedback loop with the ground truth and the generator's noise, as well as with the generator. Therefore, the discriminator 'learns' to better discriminate against fake and real data as the generative algorithm becomes 'smarter' as it receives more accurate feedback from the discriminator's probabiltistic classification outputs.

<a id="p4"></a>
## Part 4 - More...

Answer the following questions, with a target audience of a fellow Data Scientist:

- What do you consider your strongest area, as a Data Scientist?
- What area of Data Science would you most like to learn more about, and why?
- Where do you think Data Science will be in 5 years?
- What are the treats posed by AI to our society?
- How do you think we can counteract those threats? 
- Do you think achieving General Artifical Intelligence is ever possible?

A few sentences per answer is fine - only elaborate if time allows.

__*Your Answer:*__ 

1.I consider my strongest area as a data scientist to be applied statistics.

2.I would like to learn more about the math fundamentals underlying ML models. Also, learning more about the 'data retrieval' process: how best to find data and leverage our resources to filter through bad data. 

3.I think the data science industry will continue to grow in the next 5 years, especially with the rise of non-tradional education such as coding bootcamps, which are helping reduce the entry barrier into this multi-disciplinary industry. Moreover, I believe the industry as a whole will become more segmented, as the more 'vanilla' DS becomes more widely available and the more complex (neural nets) DS areas become even more esoteric in their applications and thus even more elusive to DS practioners. 

4.One threat AI poses stems from the gap that exists between AI advances and the publics' lack of awareness of the current pace of growth in the industry. It is impossible to democratically decide what measures should be allowed for the healthy growth of AI if the majority of the world is oblvious to its current state. 

5.I believe AI homogenous ethical standards should be put in place, in order to foster cooperation between nations.

6.I don't know if AGI is possible; neverthless, if we use historical science fiction novel predictions as a benchmark into what could be humanly achievable, GAI could win. Just as no one could have predicted the major shift from agrarian society to industrialism, led by the industrial revolution in the late 18th century, or the internet boom in the 2000s, it is equally difficult to pinpoint how fast AI will advance in the long-term. 

## Congratulations! 

Thank you for your hard work, and congratulations! You've learned a lot, and you should proudly call yourself a Data Scientist.


In [36]:
# from IPython.display import HTML

# HTML("""<iframe src="https://giphy.com/embed/26xivLqkv86uJzqWk" width="480" height="270" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/mumm-champagne-saber-26xivLqkv86uJzqWk">via GIPHY</a></p>""")