<a href="https://colab.research.google.com/github/TobyChen320/DS-Unit-4-Sprint-3-Deep-Learning/blob/main/sprint%20challenge/Toby's_LS_DS_Unit_4_Sprint_Challenge_4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

# Major Neural Network Architectures Challenge
## *Data Science Unit 4 Sprint 3 Challenge*

In this sprint challenge, you'll explore some of the cutting edge of Data Science. This week we studied several famous neural network architectures: recurrent neural networks (RNNs), long short-term memory networks (LSTMs), convolutional neural networks (CNNs), and autoencoders. In this sprint challenge, you will revisit these models. Remember, we are testing your knowledge of these architectures, not your ability to fit a model with high accuracy.

__*Caution:*__  these approaches can be pretty heavy computationally. All problems were designed so that you should be able to achieve results within at most 5-10 minutes of runtime locally, on AWS SageMaker, on Colab or on a comparable environment. If something is running longer, double check your approach!

## Challenge Objectives
*You should be able to:*
* <a href="#p1">Part 1</a>: Train an LSTM classification model
* <a href="#p2">Part 2</a>: Utilize a pre-trained CNN for object detection
* <a href="#p3">Part 3</a>: Describe a use case for an autoencoder
* <a href="#p4">Part 4</a>: Describe yourself as a Data Scientist and elucidate your vision of AI

<a id="p1"></a>
## Part 1 - LSTMs

Use an LSTM to fit a multi-class classification model on Reuters news articles to distinguish the topics of the articles. The data is already encoded properly for use in an LSTM model.

Your Tasks: 
- Use Keras to fit a predictive model, classifying news articles into topics. 
- Report your overall score and accuracy

For reference, the [Keras IMDB sentiment classification example](https://github.com/keras-team/keras/blob/master/examples/imdb_lstm.py) will be useful, as well as the LSTM code we used in class.

__*Note:*__  Focus on getting a running model, not on maxing accuracy with extreme data size or epoch numbers. Only revisit and push accuracy if you get everything else done!

In [1]:
from tensorflow.keras.datasets import reuters

(X_train, y_train), (X_test, y_test) = reuters.load_data(num_words=None,
                                                         skip_top=0,
                                                         maxlen=None,
                                                         test_split=0.2,
                                                         seed=723812,
                                                         start_char=1,
                                                         oov_char=2,
                                                         index_from=3)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/reuters.npz


In [2]:
# Demo of encoding

word_index = reuters.get_word_index(path="reuters_word_index.json")

print(f"Iran is encoded as {word_index['iran']} in the data")
print(f"London is encoded as {word_index['london']} in the data")
print("Words are encoded as numbers in our dataset.")

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/reuters_word_index.json
Iran is encoded as 779 in the data
London is encoded as 544 in the data
Words are encoded as numbers in our dataset.
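
One caveat, assuming the standard Keras behavior for `load_data`: with `index_from=3`, every word index stored in `X_train` is shifted up by 3, because 0, 1 and 2 are reserved for padding, `start_char` and `oov_char`. The value printed from `word_index` is therefore the word's index in the vocabulary file, and the token that appears in the data would be that index plus 3. A minimal sketch with a toy vocabulary (the `toy_word_index` dictionary and the helper names are made up for illustration):

```python
INDEX_FROM = 3  # same value as the index_from argument used above
SPECIAL = {0: "<pad>", 1: "<start>", 2: "<oov>"}  # reserved token ids

# Toy stand-in for reuters.get_word_index(); real indices differ.
toy_word_index = {"the": 1, "oil": 2, "iran": 3}


def encode(words, word_index):
    """Turn words into token ids the way the dataset stores them."""
    tokens = [1]  # every article starts with start_char
    for word in words:
        if word in word_index:
            tokens.append(word_index[word] + INDEX_FROM)
        else:
            tokens.append(2)  # oov_char for unknown words
    return tokens


def decode(tokens, word_index):
    """Map token ids back to words, undoing the index_from shift."""
    reverse = {idx + INDEX_FROM: word for word, idx in word_index.items()}
    return [SPECIAL[t] if t in SPECIAL else reverse.get(t, "<oov>") for t in tokens]


tokens = encode(["iran", "oil", "opec"], toy_word_index)
print(tokens)                          # [1, 6, 5, 2]
print(decode(tokens, toy_word_index))  # ['<start>', 'iran', 'oil', '<oov>']
```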


In [3]:
from keras.preprocessing import sequence
# Do not change this line. You need the +1 for some reason. 
max_features = len(word_index.values()) + 1

# TODO - your code!
maxlen = 80
batch_size = 32
X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen)
print('x_train shape:', X_train.shape)
print('x_test shape:', X_test.shape)
print('y_train shape:', y_train.shape)
print('y_test shape:', y_test.shape)

x_train shape: (8982, 80)
x_test shape: (2246, 80)
y_train shape: (8982,)
y_test shape: (2246,)


In [4]:
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM

model = Sequential()
model.add(Embedding(max_features, 128))
model.add(LSTM(128))
model.add(Dense(1, activation='softmax'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.fit(X_train, y_train,
          batch_size=batch_size,
          epochs=5,
          validation_data=(X_test, y_test))

score, acc = model.evaluate(X_test, y_test,
                            batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Test score: -119.0472640991211
Test accuracy: 0.03962600231170654
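
A negative loss such as the test score recorded above cannot come from cross-entropy over valid class probabilities. It is what binary cross-entropy produces when the targets are integer class ids larger than 1. Two facts explain the numbers: a softmax over a single unit always outputs 1.0, and the Reuters labels are integers (the Keras dataset has 46 topics, labeled 0 to 45). With a constant prediction of 1.0, the reported accuracy is also consistent with simply counting the articles whose label happens to be 1. A multi-class head would normally use one softmax unit per class, e.g. `Dense(46, activation='softmax')`, together with `loss='sparse_categorical_crossentropy'`. The pure-Python sketch below only illustrates the arithmetic; the `1e-7` clipping value is an assumption about the backend epsilon, and the helper names are made up:

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def binary_crossentropy(y_true, p, eps=1e-7):
    """Binary cross-entropy for one example; p is clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


def sparse_categorical_crossentropy(label, probs, eps=1e-7):
    """Cross-entropy for one example with an integer class label."""
    return -math.log(max(probs[label], eps))


# A softmax over ONE unit is 1.0 no matter what the logit is.
single_unit = softmax([-3.2])[0]

# Binary cross-entropy with an integer label > 1 goes negative.
bce = binary_crossentropy(3, single_unit)

# With 46 output units, an untrained (uniform) model scores about log(46).
uniform = softmax([0.0] * 46)
scce = sparse_categorical_crossentropy(3, uniform)

print(single_unit, bce, scce)
```

The last value is a useful sanity check: a 46-class model that has learned nothing should start near a loss of `log(46)`, roughly 3.83, and improve from there.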


## Sequence Data Question
#### *Describe the `pad_sequences` method used on the training dataset. What does it do? Why do you need it?*

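`pad_sequences` makes every sequence the same length so they can be stacked into a single 2-D array and fed to the model in batches. With `maxlen=80`, sequences shorter than 80 tokens are padded with zeros and longer ones are truncated to 80 (by default both happen at the start of the sequence). Without `maxlen`, everything is padded to the length of the longest sequence.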
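
With an explicit `maxlen`, as in the call above (`maxlen=80`), every sequence ends up exactly that long: shorter ones are padded with zeros and longer ones are truncated, and by default both happen at the front. A pure-Python sketch of that default behavior (the helper `pad_to_length` is hypothetical, not the Keras implementation, and it assumes `maxlen >= 1`):

```python
def pad_to_length(sequences, maxlen, value=0):
    """Mimic pad_sequences defaults: pad and truncate at the start."""
    padded = []
    for seq in sequences:
        kept = list(seq)[-maxlen:]               # keep the LAST maxlen tokens
        filler = [value] * (maxlen - len(kept))  # zeros go in front
        padded.append(filler + kept)
    return padded


batch = pad_to_length([[5, 8], [1, 2, 3, 4, 5, 6]], maxlen=4)
print(batch)  # [[0, 0, 5, 8], [3, 4, 5, 6]]
```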

## RNNs versus LSTMs
#### *What are the primary motivations behind using Long Short-Term Memory (LSTM) cells over traditional Recurrent Neural Networks?*

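Plain RNNs suffer from vanishing (and exploding) gradients, so they tend to forget information from early time steps. An LSTM adds a gated cell state (input, forget, and output gates) that lets the network decide what to keep, update, or discard. It can still put weight on the newest data without losing the older information it needs.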
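
To make the "keep old information" idea concrete, here is a minimal sketch of one step of the standard LSTM update, reduced to scalars for readability. The gate weights in `GATES` are hand-picked for illustration (forget gate almost fully open, input gate almost closed), not learned, and the plain RNN step is included only for comparison:

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, gates):
    """One scalar LSTM step; gates maps a gate name to (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = gates[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    i = sigmoid(pre("input"))        # how much new information to write
    o = sigmoid(pre("output"))       # how much of the cell state to expose
    g = math.tanh(pre("candidate"))  # candidate value to write
    c = f * c_prev + i * g           # additive cell-state update
    h = o * math.tanh(c)
    return h, c


def rnn_step(x, h_prev, w_x=1.0, w_h=0.5):
    """One step of a plain tanh RNN."""
    return math.tanh(w_x * x + w_h * h_prev)


# Hypothetical weights: remember almost everything, write almost nothing.
GATES = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}

h_lstm, c = 0.0, 1.0  # cell state starts out holding the value 1.0
h_rnn = 1.0           # the plain RNN starts with the same "memory"
for _ in range(50):   # 50 time steps with no new input
    h_lstm, c = lstm_step(0.0, h_lstm, c, GATES)
    h_rnn = rnn_step(0.0, h_rnn)

print(c, h_rnn)
```

After 50 steps the cell state is still close to 1.0, while the plain RNN state has shrunk to essentially zero. The same multiplicative factor governs how gradients flow backwards through time, which is why the gated, additive update helps with long-range dependencies.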

## RNN / LSTM Use Cases
#### *Name and describe 3 use cases of LSTMs or RNNs and why they are suited to each use case*

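RNNs and LSTMs work best when the order of the inputs matters:
- Text data (e.g. topic or sentiment classification): the meaning of a word depends on the words before it, and the recurrent state carries that context forward.
- Speech data (e.g. speech recognition): audio arrives as a variable-length sequence of frames, which a recurrent model can process step by step.
- Sequence classification and prediction problems (e.g. time-series forecasting): the label or the next value depends on earlier observations, which the hidden state summarizes.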

<a id="p2"></a>
## Part 2 - CNNs

### Find the Frog

Time to play "find the frog!" Use Keras and [ResNet50v2](https://www.tensorflow.org/api_docs/python/tf/keras/applications/resnet_v2) (pre-trained) to detect which of the images in the `frog_images` subdirectory have a frog in them. Note: You will need to upload the images to Colab.

<img align="left" src="https://d3i6fh83elv35t.cloudfront.net/newshour/app/uploads/2017/03/GettyImages-654745934-1024x687.jpg" width=400>

The skimage function below will help you read all the frog images into memory at once. You should use the preprocessing functions that come with ResNetV2, and you should also resize the images using scikit-image.

In [16]:
# You are going to need to upload the file to Colab first; unless you decide to just mount it.
!unzip /content/frog_images.zip

Archive:  /content/frog_images.zip
  inflating: frog_images/cristiane-teston-bcnfJvEYm1Y-unsplash.jpg  
  inflating: frog_images/drew-brown-VBvoy5gofWg-unsplash.jpg  
  inflating: frog_images/ed-van-duijn-S1zA6AR50X8-unsplash.jpg  
  inflating: frog_images/elizabeth-explores-JZybccsrB-0-unsplash.jpg  
  inflating: frog_images/jacky-watt-92W5jPbOj48-unsplash.jpg  
  inflating: frog_images/jared-evans-VgRnolD7OIw-unsplash.jpg  
  inflating: frog_images/joel-henry-Rcvf6-n1gc8-unsplash.jpg  
  inflating: frog_images/marcus-neto-fH_DOdTt-pA-unsplash.jpg  
  inflating: frog_images/matthew-kosloski-sYkr-M78H6w-unsplash.jpg  
  inflating: frog_images/mche-lee-j-P8z4EOgyQ-unsplash.jpg  
  inflating: frog_images/priscilla-du-preez-oWJcgqjFb6I-unsplash.jpg  
  inflating: frog_images/saturday_sun-_q37Ca0Ll4o-unsplash.jpg  
  inflating: frog_images/serenity-mitchell-tUDSHkd6rYQ-unsplash.jpg  
  inflating: frog_images/yanna-zissiadou-SV-aMgliWNs-unsplash.jpg  
  inflating: frog_images/zdenek-machace

In [17]:
from skimage.io import imread_collection

images = imread_collection('./frog_images/*.jpg')

In [18]:
print(type(images))
print(type(images[0]), end="\n\n")

<class 'skimage.io.collection.ImageCollection'>
<class 'numpy.ndarray'>



Your goal is to validly run ResNet50v2 on the input images - don't worry about tuning or improving the model. Print out the predictions in any way you see fit. 

*Hint* - ResNet50v2 doesn't just return "frog". The three labels it has for frogs are: `bullfrog`, `tree_frog`, `tailed_frog`

*Stretch goals:* 
- Check for other things such as fish.
- Print out the image with its predicted label
- Wrap everything nicely in well-documented functions

In [None]:
# I know this is inefficient. I wasn't sure how to build the file paths to read all the JPGs at once, so I ran the images one at a time.

In [58]:
from tensorflow.keras.applications.resnet_v2 import ResNet50V2, decode_predictions, preprocess_input
from tensorflow.keras.preprocessing import image
import numpy as np
# TODO - your code!
def process_img_path(img_path):
  return image.load_img(img_path, target_size=(224, 224))

def img_contains_frog(img):
  x = image.img_to_array(img)
  x = np.expand_dims(x, axis=0)
  x = preprocess_input(x)
  model = ResNet50V2(weights='imagenet')
  features = model.predict(x)
  results = decode_predictions(features, top=1)[0]
  print(results)
  for entry in results:
    if entry[1] == 'frog':
      return entry[2]
  return 0.0

In [72]:
img_contains_frog(process_img_path('cristiane-teston-bcnfJvEYm1Y-unsplash.jpg'))

[('n03991062', 'pot', 0.43382066)]


0.0

In [73]:
img_contains_frog(process_img_path('drew-brown-VBvoy5gofWg-unsplash.jpg'))

[('n01641577', 'bullfrog', 0.9959907)]


0.0

In [74]:
img_contains_frog(process_img_path('ed-van-duijn-S1zA6AR50X8-unsplash.jpg'))

[('n02190166', 'fly', 0.9926561)]


0.0

In [75]:
img_contains_frog(process_img_path('elizabeth-explores-JZybccsrB-0-unsplash.jpg'))

[('n04258138', 'solar_dish', 0.26390132)]


0.0

In [78]:
img_contains_frog(process_img_path('jacky-watt-92W5jPbOj48-unsplash.jpg'))

[('n04476259', 'tray', 0.34925267)]


0.0

In [79]:
img_contains_frog(process_img_path('jared-evans-VgRnolD7OIw-unsplash.jpg'))

[('n01644900', 'tailed_frog', 0.6716922)]


0.0

In [80]:
img_contains_frog(process_img_path('joel-henry-Rcvf6-n1gc8-unsplash.jpg'))

[('n01644373', 'tree_frog', 0.95609444)]


0.0

In [81]:
img_contains_frog(process_img_path('marcus-neto-fH_DOdTt-pA-unsplash.jpg'))

[('n03991062', 'pot', 0.4787551)]


0.0

In [82]:
img_contains_frog(process_img_path('matthew-kosloski-sYkr-M78H6w-unsplash.jpg'))

[('n01641577', 'bullfrog', 0.72788036)]


0.0

In [83]:
img_contains_frog(process_img_path('mche-lee-j-P8z4EOgyQ-unsplash.jpg'))

[('n04033995', 'quilt', 0.27399585)]


0.0

In [84]:
img_contains_frog(process_img_path('priscilla-du-preez-oWJcgqjFb6I-unsplash.jpg'))

[('n12620546', 'hip', 0.38287023)]


0.0

In [85]:
img_contains_frog(process_img_path('saturday_sun-_q37Ca0Ll4o-unsplash.jpg'))

[('n01737021', 'water_snake', 0.30644095)]


0.0

In [86]:
img_contains_frog(process_img_path('serenity-mitchell-tUDSHkd6rYQ-unsplash.jpg'))

[('n11939491', 'daisy', 0.81456643)]


0.0

In [87]:
img_contains_frog(process_img_path('yanna-zissiadou-SV-aMgliWNs-unsplash.jpg'))

[('n01944390', 'snail', 0.8160971)]


0.0

In [88]:
img_contains_frog(process_img_path('zdenek-machacek-HYTwWSE5ztw-unsplash (1).jpg'))

[('n01644373', 'tree_frog', 0.9961675)]


0.0
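
In the recorded runs above, the top-1 label is `bullfrog`, `tree_frog` or `tailed_frog` for several images, yet every call returns 0.0. The function compares the label with the literal string `'frog'`, which is not one of the decoded class names. A minimal sketch of a membership check against the three frog labels from the hint, written against `decode_predictions`-style tuples so it runs without TensorFlow (the names `FROG_LABELS` and `frog_probability` are hypothetical; the sample tuples are copied from the outputs above):

```python
FROG_LABELS = {"bullfrog", "tree_frog", "tailed_frog"}


def frog_probability(decoded):
    """Total probability assigned to frog classes for one image.

    decoded: list of (wordnet_id, label, probability) tuples, i.e. one
    entry of what decode_predictions returns.
    """
    return sum((float(p) for _, label, p in decoded if label in FROG_LABELS), 0.0)


samples = {
    "drew-brown-VBvoy5gofWg-unsplash.jpg": [("n01641577", "bullfrog", 0.9959907)],
    "cristiane-teston-bcnfJvEYm1Y-unsplash.jpg": [("n03991062", "pot", 0.43382066)],
    "jared-evans-VgRnolD7OIw-unsplash.jpg": [("n01644900", "tailed_frog", 0.6716922)],
}

for name, decoded in samples.items():
    p = frog_probability(decoded)
    print(f"{name}: frog={p > 0}, p={p:.3f}")
```

For the batch case, `glob.glob('./frog_images/*.jpg')` yields the paths to loop over, and building `ResNet50V2(weights='imagenet')` once outside the function avoids reloading the weights on every call.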

<a id="p3"></a>
## Part 3 - Autoencoders

Describe a use case for an autoencoder given that an autoencoder tries to predict its own input. 

__*Your Answer:*__

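A typical use case is dimensionality reduction: the encoder compresses the input into a smaller latent representation, and the decoder is trained to reconstruct the original input from it as closely as possible. The learned bottleneck can then be used for compression, denoising, or anomaly detection, since inputs that reconstruct poorly are likely outliers.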
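
One concrete use of "predicting its own input" is anomaly detection: inputs that resemble the training data reconstruct well, and unusual inputs reconstruct poorly, so the reconstruction error can serve as an outlier score. The sketch below is only an illustration of that scoring step. It uses a linear encoder and decoder with hand-set weights (`DIRECTION` is a made-up stand-in for what a trained autoencoder would learn), compressing 2-D points to a single latent value:

```python
import math

# Hypothetical "learned" direction; a unit vector since 0.36 + 0.64 = 1.
DIRECTION = (0.6, 0.8)


def encode(point):
    """Compress a 2-D point to one latent value (projection onto DIRECTION)."""
    return point[0] * DIRECTION[0] + point[1] * DIRECTION[1]


def decode(z):
    """Expand the latent value back into 2-D."""
    return (z * DIRECTION[0], z * DIRECTION[1])


def reconstruction_error(point):
    """Euclidean distance between a point and its reconstruction."""
    rx, ry = decode(encode(point))
    return math.hypot(point[0] - rx, point[1] - ry)


typical_point = (3.0, 4.0)   # lies along the direction the model captures
unusual_point = (4.0, -3.0)  # perpendicular to it

print(reconstruction_error(typical_point))  # close to 0
print(reconstruction_error(unusual_point))  # close to 5
```

In a real autoencoder the encoder and decoder would be neural networks trained to minimize this error on normal data, and a threshold on the error would flag anomalies.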


<a id="p4"></a>
## Part 4 - More...

Answer the following questions, with a target audience of a fellow Data Scientist:

- What do you consider your strongest area, as a Data Scientist?
I would say my strongest area is debugging.
- What area of Data Science would you most like to learn more about, and why?
I would like to learn more about machine learning. I am very interested in optimizing models and building the best possible model for whatever prediction task I set it to.
- Where do you think Data Science will be in 5 years?
I will probably be working in a tech company somewhere.
- What are the threats posed by AI to our society?
The possibility of AI being abused is very real. AI makes it a lot easier for people to control others.
- How do you think we can counteract those threats?
I assume we will need policies that regulate how AI is used.
- Do you think achieving General Artificial Intelligence is ever possible?
Honestly, yes, I do think it's possible, though I think it is a long way down the line.

A few sentences per answer is fine - only elaborate if time allows.

## Congratulations! 

Thank you for your hard work, and congratulations! You've learned a lot, and you should proudly call yourself a Data Scientist.


In [89]:
from IPython.display import HTML

HTML("""<iframe src="https://giphy.com/embed/26xivLqkv86uJzqWk" width="480" height="270" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/mumm-champagne-saber-26xivLqkv86uJzqWk">via GIPHY</a></p>""")