<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

# Major Neural Network Architectures Challenge
## *Data Science Unit 4 Sprint 3 Challenge*

In this sprint challenge, you'll explore some of the cutting edge of Data Science. This week we studied several famous neural network architectures: 
recurrent neural networks (RNNs), long short-term memory (LSTMs), convolutional neural networks (CNNs), and Generative Adversarial Networks (GANs). In this sprint challenge, you will revisit these models. Remember, we are testing your knowledge of these architectures, not your ability to fit a model with high accuracy. 

__*Caution:*__  these approaches can be pretty heavy computationally. All problems were designed so that you should be able to achieve results within at most 5-10 minutes of runtime on Colab or a comparable environment. If something is running longer, double-check your approach!

## Challenge Objectives
*You should be able to:*
* <a href="#p1">Part 1</a>: Train an RNN classification model
* <a href="#p2">Part 2</a>: Utilize a pre-trained CNN for object detection
* <a href="#p3">Part 3</a>: Describe the difference between a discriminator and generator in a GAN
* <a href="#p4">Part 4</a>: Describe yourself as a Data Scientist and elucidate your vision of AI

<a id="p1"></a>
## Part 1 - RNNs

Use an RNN to fit a multi-class classification model on Reuters news articles to distinguish the topics of articles. The data is already encoded properly for use in an RNN model. 

Your Tasks: 
- Use Keras to fit a predictive model, classifying news articles into topics. 
- Report your overall score and accuracy

For reference, the [Keras IMDB sentiment classification example](https://github.com/keras-team/keras/blob/master/examples/imdb_lstm.py) will be useful, as well as the RNN code we used in class.

__*Note:*__  Focus on getting a running model, not on maxing accuracy with extreme data size or epoch numbers. Only revisit and push accuracy if you get everything else done!

In [None]:
!pip install numpy
import numpy as np

In [1]:
from tensorflow.keras.datasets import reuters

(x_train, y_train), (x_test, y_test) = reuters.load_data(num_words=None,
                                                         skip_top=0,
                                                         maxlen=None,
                                                         test_split=0.2,
                                                         seed=723812,
                                                         start_char=1,
                                                         oov_char=2,
                                                         index_from=3)

In [2]:
# Demo of encoding

word_index = reuters.get_word_index(path="reuters_word_index.json")

print(f"Iran is encoded as {word_index['iran']} in the data")
print(f"London is encoded as {word_index['london']} in the data")
print("Words are encoded as numbers in our dataset.")

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/reuters_word_index.json
Iran is encoded as 779 in the data
London is encoded as 544 in the data
Words are encoded as numbers in our dataset.
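
To make the encoding concrete, here is a minimal sketch of turning an encoded article back into words. It assumes the `load_data` settings used above (`start_char=1`, `oov_char=2`, `index_from=3`), under which ids in the loaded arrays are shifted by 3 relative to `word_index`. The helper name `decode_sequence`, the marker strings, and the toy vocabulary are hypothetical and only for illustration.

```python
def decode_sequence(seq, word_index, index_from=3):
    """Map an encoded article back to words (sketch; reserved ids become markers)."""
    # Ids 0, 1, 2 are reserved for padding, start-of-sequence, and out-of-vocabulary
    reserved = {0: "<pad>", 1: "<start>", 2: "<oov>"}
    # Shift every vocabulary id by index_from to match the ids stored in the data
    reverse_index = {idx + index_from: word for word, idx in word_index.items()}
    return " ".join(reserved.get(i, reverse_index.get(i, "<unk>")) for i in seq)

# Toy vocabulary for illustration only; the real one comes from reuters.get_word_index()
toy_word_index = {"iran": 779, "london": 544}
decoded = decode_sequence([1, 782, 2, 547], toy_word_index)
print(decoded)
```

With the real data, the same call would be `decode_sequence(x_train[0], word_index)`.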


In [40]:
len(y_train)

8982

In [42]:
import pandas as pd
df = pd.DataFrame({'y_train':y_train})
one_hot = pd.get_dummies(df['y_train'])
y_train_dummies = one_hot.to_numpy()  # as_matrix() is deprecated and was removed in pandas 1.0



In [43]:
df = pd.DataFrame({'y_test':y_test})
one_hot = pd.get_dummies(df['y_test'])
y_test_dummies = one_hot.to_numpy()  # as_matrix() is deprecated and was removed in pandas 1.0



In [46]:
y_train = y_train_dummies
y_test = y_test_dummies
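
One caveat with the cells above: `pd.get_dummies` only creates columns for labels it actually sees, so the train and test matrices could end up with different widths if a topic is missing from one split. A minimal alternative sketch, assuming integer labels and the 46 topic classes used by the output layer later in this notebook, fixes the width explicitly. The helper name `one_hot` and the toy labels are hypothetical.

```python
import numpy as np

def one_hot(labels, num_classes):
    """One-hot encode integer labels into an (n_samples, num_classes) float array."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype="float32")[labels]

# Toy labels for illustration; the Reuters task has 46 topic classes
toy_train = one_hot([0, 3, 45], num_classes=46)
toy_test = one_hot([3, 3], num_classes=46)  # classes absent here still get a column
print(toy_train.shape, toy_test.shape)
```

Both arrays have 46 columns regardless of which labels appear in each split.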

In [None]:
from __future__ import print_function
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Embedding
from keras.layers import LSTM
from keras.datasets import imdb


max_features = 200000
# pad or truncate each article to this many tokens (pad_sequences keeps the last maxlen tokens by default)
maxlen = 80
batch_size = 32

print('Loading data...')
# (x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

print('Build model...')
model = Sequential()
model.add(Embedding(max_features, 80))
model.add(LSTM(80, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(46, activation='softmax'))

# try using different optimizers and different optimizer configs
# categorical_crossentropy matches the 46-way softmax output and one-hot labels
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

print('Train...')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=1,
          validation_data=(x_test, y_test))
score, acc = model.evaluate(x_test, y_test,
                            batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)
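
The padding step in the cell above is easy to misread, so here is a pure-Python sketch of what `pad_sequences` does to a single sequence with its default settings (`padding='pre'`, `truncating='pre'`): short articles are left-padded with zeros, and long articles keep only their last `maxlen` tokens. The helper name `pad_pre` is hypothetical, and it assumes `maxlen` is at least 1.

```python
def pad_pre(seq, maxlen, value=0):
    """Mimic pad_sequences defaults for one sequence: truncate and pad at the front."""
    truncated = list(seq)[-maxlen:]          # keep the LAST maxlen tokens
    padding = [value] * (maxlen - len(truncated))
    return padding + truncated               # zeros go in front of the tokens

short_example = pad_pre([1, 2, 3], maxlen=5)
long_example = pad_pre([1, 2, 3, 4, 5, 6], maxlen=4)
print(short_example)
print(long_example)
```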

Conclusion: the RNN runs and gives a decent improvement over a naive model. To *really* improve the model, more experimentation with hyperparameters would help. An RNN may well not be the best approach here, but it is at least a valid one.
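
One way to pin down what "naive model" means is a majority-class baseline: always predict the most frequent training topic and measure the resulting test accuracy. The sketch below assumes integer labels (for example, the labels before one-hot encoding, or `argmax` over the one-hot rows). The function name and the toy labels are hypothetical and are not the Reuters distribution.

```python
from collections import Counter

def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == most_common_label)
    return hits / len(test_labels)

# Toy integer labels for illustration only
toy_train_labels = [3, 3, 4, 3, 1]
toy_test_labels = [3, 4, 3, 1]
baseline = majority_baseline_accuracy(toy_train_labels, toy_test_labels)
print(baseline)
```

Comparing the model's test accuracy against this number gives the "improvement over a naive model" a concrete meaning.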

<a id="p2"></a>
## Part 2 - CNNs

### Find the Frog

Time to play "find the frog!" Use Keras and ResNet50 (pre-trained) to detect which of the following images contain frogs:

<img align="left" src="https://d3i6fh83elv35t.cloudfront.net/newshour/app/uploads/2017/03/GettyImages-654745934-1024x687.jpg" width=400>


In [None]:
!pip install google_images_download

In [60]:
from google_images_download import google_images_download

response = google_images_download.googleimagesdownload()
arguments = {"keywords": "animal pond", "limit":20, "print_urls": True}
absolute_image_paths = response.download(arguments)


Item no.: 1 --> Item name = animal pond
Evaluating...
Starting Download...
Image URL: https://www.enchantedlearning.com/pgifs/Pondanimals.GIF
Completed Image ====> 1.Pondanimals.GIF
Image URL: https://i.ytimg.com/vi/NCbu0TND9vE/hqdefault.jpg
Completed Image ====> 2.hqdefault.jpg
Image URL: https://pklifescience.com/staticfiles/articles/images/PKLS4116_inline.png
Completed Image ====> 3.PKLS4116_inline.png
Image URL: https://get.pxhere.com/photo/water-animal-pond-wildlife-mammal-fish-eat-fauna-whiskers-vertebrate-otter-mink-marmot-sea-otter-mustelidae-1383482.jpg
Completed Image ====> 4.water-animal-pond-wildlife-mammal-fish-eat-fauna-whiskers-vertebrate-otter-mink-marmot-sea-otter-mustelidae-1383482.jpg
Image URL: https://pklifescience.com/staticfiles/articles/images/PKLS4116.png
Completed Image ====> 5.PKLS4116.png
Image URL: https://i.pinimg.com/originals/57/5c/5b/575c5b5c441e27ff04eb50571ee30127.jpg
Completed Image ====> 6.575c5b5c441e27ff04eb50571ee30127.jpg
Image URL: https://pix

At the time of writing at least a few do, but since the Internet changes, it is possible your 5 won't. You can easily verify this yourself, and (once you have working code) increase the number of images you pull to be more sure of getting a frog. Your goal is to validly run ResNet50 on the input images; don't worry about tuning or improving the model.

*Hint* - ResNet50 doesn't just return "frog". The three labels it has for frogs are `bullfrog`, `tree_frog`, and `tailed_frog` (spelled with underscores in the output of `decode_predictions`).

*Stretch goal* - also check for fish.

In [61]:
absolute_image_paths

({'animal pond': ['C:\\Users\\Patrick\\Desktop\\Repos\\DS-Unit-4-Sprint-3-Deep-Learning\\downloads\\animal pond\\1.Pondanimals.GIF',
   'C:\\Users\\Patrick\\Desktop\\Repos\\DS-Unit-4-Sprint-3-Deep-Learning\\downloads\\animal pond\\2.hqdefault.jpg',
   'C:\\Users\\Patrick\\Desktop\\Repos\\DS-Unit-4-Sprint-3-Deep-Learning\\downloads\\animal pond\\3.PKLS4116_inline.png',
   'C:\\Users\\Patrick\\Desktop\\Repos\\DS-Unit-4-Sprint-3-Deep-Learning\\downloads\\animal pond\\4.water-animal-pond-wildlife-mammal-fish-eat-fauna-whiskers-vertebrate-otter-mink-marmot-sea-otter-mustelidae-1383482.jpg',
   'C:\\Users\\Patrick\\Desktop\\Repos\\DS-Unit-4-Sprint-3-Deep-Learning\\downloads\\animal pond\\5.PKLS4116.png',
   'C:\\Users\\Patrick\\Desktop\\Repos\\DS-Unit-4-Sprint-3-Deep-Learning\\downloads\\animal pond\\6.575c5b5c441e27ff04eb50571ee30127.jpg',
   'C:\\Users\\Patrick\\Desktop\\Repos\\DS-Unit-4-Sprint-3-Deep-Learning\\downloads\\animal pond\\7.alligator-animal-on-pond.jpg',
   'C:\\Users\\Patrick

In [62]:
images = absolute_image_paths[0].get('animal pond')

In [57]:
import numpy as np

In [55]:
from keras.applications.resnet50 import ResNet50
from keras.preprocessing import image
from keras.applications.resnet50 import preprocess_input, decode_predictions

# Load the pre-trained weights once instead of on every call
model = ResNet50(weights='imagenet')
# decode_predictions returns labels with underscores, e.g. 'tree_frog'
FROG_LABELS = ('bullfrog', 'tree_frog', 'tailed_frog')

def process_img_path(img_path):
  return image.load_img(img_path, target_size=(224, 224))

def img_contains_frog(img):
  x = image.img_to_array(img)
  x = np.expand_dims(x, axis=0)
  x = preprocess_input(x)
  features = model.predict(x)
  results = decode_predictions(features, top=3)[0]
  print(results)
  for entry in results:
    # membership test; comparing the label string to the whole tuple is never equal
    if entry[1] in FROG_LABELS:
      return entry[2]
  return 0.0

In [58]:
for img_path in images:
    img_contains_frog(process_img_path(img_path))

W0726 09:48:07.558556 12916 deprecation_wrapper.py:119] From C:\Users\Patrick\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

W0726 09:48:07.730624 12916 deprecation_wrapper.py:119] From C:\Users\Patrick\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py:3976: The name tf.nn.max_pool is deprecated. Please use tf.nn.max_pool2d instead.



[('n03598930', 'jigsaw_puzzle', 0.8680317), ('n06359193', 'web_site', 0.06410015), ('n02834397', 'bib', 0.021264354)]
[('n01443537', 'goldfish', 0.84959215), ('n01631663', 'eft', 0.06760177), ('n02536864', 'coho', 0.03516347)]
[('n04243546', 'slot', 0.87124425), ('n04476259', 'tray', 0.049936365), ('n03908618', 'pencil_box', 0.023072524)]
[('n02442845', 'mink', 0.30976573), ('n02363005', 'beaver', 0.2339894), ('n02361337', 'marmot', 0.20796867)]
[('n03485794', 'handkerchief', 0.88227314), ('n02834397', 'bib', 0.022680819), ('n03291819', 'envelope', 0.020095201)]
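
The decoded predictions above are lists of `(class_id, label, probability)` tuples, so checking for frogs (or, for the stretch goal, fish) is a plain membership test on the label. The sketch below works on that tuple format without needing the model. The helper name `best_match` is hypothetical, and `FISH_LABELS` is an assumption limited to the fish labels that happen to appear in the output above, not a complete list of ImageNet fish classes.

```python
FROG_LABELS = {"bullfrog", "tree_frog", "tailed_frog"}
FISH_LABELS = {"goldfish", "coho"}  # assumption: only the fish labels seen in the output above

def best_match(results, wanted):
    """Return the highest probability among predictions whose label is in `wanted`, else 0.0."""
    return max((float(prob) for _, label, prob in results if label in wanted), default=0.0)

# One row copied from the output above
sample = [('n01443537', 'goldfish', 0.84959215),
          ('n01631663', 'eft', 0.06760177),
          ('n02536864', 'coho', 0.03516347)]
print(best_match(sample, FROG_LABELS))
print(best_match(sample, FISH_LABELS))
```

For this row the frog check returns `0.0` and the fish check returns the goldfish probability.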


<a id="p3"></a>
## Part 3 - Generative Adversarial Networks (GANs)

Describe the difference between a discriminator and generator in a GAN in your own words.

The discriminator is the checker: its job is to reject anything that doesn't look like the target data. The generator creates data and tries to fool the discriminator.
By the end of training, the discriminator no longer lets subpar data through, and the generator produces data that looks convincing to the discriminator.
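
The opposing goals can be illustrated with the losses commonly used to train each network. This is a minimal numeric sketch, assuming the discriminator outputs a probability in (0, 1) that a sample is real, with binary cross-entropy for the discriminator and the non-saturating loss for the generator. The function names and the example scores are hypothetical, and no actual networks are trained here.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: real samples should score near 1, generated ones near 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Non-saturating generator loss: low when the discriminator scores fakes near 1."""
    return -math.log(d_fake)

# Hypothetical discriminator scores, for illustration only
fooled = generator_loss(0.9)   # discriminator believes the fake is real
caught = generator_loss(0.1)   # discriminator rejects the fake
print(fooled < caught)
print(discriminator_loss(0.9, 0.1) < discriminator_loss(0.5, 0.5))
```

The generator's loss drops when its samples fool the discriminator, while the discriminator's loss drops when it separates real from generated samples, which is the tension described above.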

<a id="p4"></a>
## Part 4 - More...

Answer the following questions, with a target audience of a fellow Data Scientist:

- What do you consider your strongest area, as a Data Scientist?
 - Feature engineering and data analysis. I'd like to improve my neural network (NN) knowledge in the future.
 
 
- What area of Data Science would you most like to learn more about, and why?
 - NNs. It is a complex topic and super interesting. NNs are very powerful, and in the future our problems are only going to get more complicated.
   
   
- Where do you think Data Science will be in 5 years?
 - I'm not sure. I expect AI to get better, and the things we don't yet have solid rules for will be fleshed out. GPUs will get better, so training will be faster; I'm looking forward to that.
 
 
- What are the threats posed by AI to our society?
 - AI has the potential to control what people see online. That is already happening, but it will get worse. The biggest threat at the moment is that AI will take over jobs,
   and the people whose jobs have been automated are going to have to do something else.
   
   
- How do you think we can counteract those threats? 
 - I'm not sure how to prevent AI from filtering what we see online. For job losses, we could create programs that help people whose jobs are automated transition into a different
   sector of work. The best thing would be free education and, if needed, providing necessities.
   
   
- Do you think achieving General Artificial Intelligence is ever possible?
 - Maybe, but I think it's more likely that we will form a symbiotic relationship with AI and all become cyborgs; at that point AGI won't be needed. Everyone's processing power will increase,
   which will be good for the world as a whole. We need to evolve with technology.

A few sentences per answer is fine - only elaborate if time allows.

## Congratulations! 

Thank you for your hard work, and congratulations! You've learned a lot, and you should proudly call yourself a Data Scientist.


In [59]:
from IPython.display import HTML

HTML("""<iframe src="https://giphy.com/embed/26xivLqkv86uJzqWk" width="480" height="270" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/mumm-champagne-saber-26xivLqkv86uJzqWk">via GIPHY</a></p>""")