<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

# Major Neural Network Architectures Challenge
## *Data Science Unit 4 Sprint 3 Challenge*

In this sprint challenge, you'll explore some of the cutting edge of Data Science. This week we studied several famous neural network architectures: 
recurrent neural networks (RNNs), long short-term memory networks (LSTMs), convolutional neural networks (CNNs), and autoencoders. In this sprint challenge, you will revisit these models. Remember, we are testing your knowledge of these architectures, not your ability to fit a model with high accuracy. 

__*Caution:*__  these approaches can be pretty heavy computationally. All problems were designed so that you should be able to achieve results within at most 5-10 minutes of runtime locally, on AWS SageMaker, on Colab or on a comparable environment. If something is running longer, double check your approach!

## Challenge Objectives
*You should be able to:*
* <a href="#p1">Part 1</a>: Train an LSTM classification model
* <a href="#p2">Part 2</a>: Utilize a pre-trained CNN for object detection
* <a href="#p3">Part 3</a>: Describe a use case for an autoencoder
* <a href="#p4">Part 4</a>: Describe yourself as a Data Scientist and elucidate your vision of AI

<a id="p1"></a>
## Part 1 - LSTMs

Use an LSTM to fit a multi-class classification model on Reuters news articles to distinguish topics of articles. The data is already encoded properly for use in an LSTM model. 

Your Tasks: 
- Use Keras to fit a predictive model, classifying news articles into topics. 
- Report your overall score and accuracy

For reference, the [Keras IMDB sentiment classification example](https://github.com/keras-team/keras/blob/master/examples/imdb_lstm.py) will be useful, as well as the LSTM code we used in class.

__*Note:*__  Focus on getting a running model, not on maxing accuracy with extreme data size or epoch numbers. Only revisit and push accuracy if you get everything else done!

In [18]:
import numpy as np
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, Dropout, LSTM, Activation

In [2]:
from tensorflow.keras.datasets import reuters
import pandas as pd
(X_train, y_train), (X_test, y_test) = reuters.load_data(num_words=None,
                                                         skip_top=0,
                                                         maxlen=None,
                                                         test_split=0.2,
                                                         seed=723812,
                                                         start_char=1,
                                                         oov_char=2,
                                                         index_from=3)

In [3]:
X_train.shape, y_train.shape, X_test.shape, y_test.shape

((8982,), (8982,), (2246,), (2246,))

In [4]:
np.unique(y_train)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45])

In [5]:
data = np.concatenate((X_train, X_test), axis=0)
targets = np.concatenate((y_train, y_test), axis=0)

In [6]:
print("Categories:", np.unique(targets))
print("Number of unique words:", len(np.unique(np.hstack(data))))

length = [len(i) for i in data]
print("Average Article length:", np.mean(length))
print("Standard Deviation:", round(np.std(length)))


Categories: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45]
Number of unique words: 30980
Average Article length: 145.96419665122906
Standard Deviation: 146.0


In [7]:
# Demo of encoding

word_index = reuters.get_word_index(path="reuters_word_index.json")

print(f"Iran is encoded as {word_index['iran']} in the data")
print(f"London is encoded as {word_index['london']} in the data")
print("Words are encoded as numbers in our dataset.")

Iran is encoded as 779 in the data
London is encoded as 544 in the data
Words are encoded as numbers in our dataset.


In [8]:
word_index

{'mdbl': 10996,
 'fawc': 16260,
 'degussa': 12089,
 'woods': 8803,
 'hanging': 13796,
 'localized': 20672,
 'sation': 20673,
 'chanthaburi': 20675,
 'refunding': 10997,
 'hermann': 8804,
 'passsengers': 20676,
 'stipulate': 20677,
 'heublein': 8352,
 'screaming': 20713,
 'tcby': 16261,
 'four': 185,
 'grains': 1642,
 'broiler': 20680,
 'wooden': 12090,
 'wednesday': 1220,
 'highveld': 13797,
 'duffour': 7593,
 '0053': 20681,
 'elections': 3914,
 '270': 2563,
 '271': 3551,
 '272': 5113,
 '273': 3552,
 '274': 3400,
 'rudman': 7975,
 '276': 3401,
 '277': 3478,
 '278': 3632,
 '279': 4309,
 'dormancy': 9381,
 'errors': 7247,
 'deferred': 3086,
 'sptnd': 20683,
 'cooking': 8805,
 'stratabit': 20684,
 'designing': 16262,
 'metalurgicos': 20685,
 'databank': 13798,
 '300er': 20686,
 'shocks': 20687,
 'nawg': 7972,
 'tnta': 20688,
 'perforations': 20689,
 'affiliates': 2891,
 '27p': 20690,
 'ching': 16263,
 'china': 595,
 'wagyu': 16264,
 'affiliated': 3189,
 'chino': 16265,
 'chinh': 16266,
 '

In [9]:
reverse_index = dict([(value, key) for (key, value) in word_index.items()]) 

In [10]:
reverse_index

{10996: 'mdbl',
 16260: 'fawc',
 12089: 'degussa',
 8803: 'woods',
 13796: 'hanging',
 20672: 'localized',
 20673: 'sation',
 20675: 'chanthaburi',
 10997: 'refunding',
 8804: 'hermann',
 20676: 'passsengers',
 20677: 'stipulate',
 8352: 'heublein',
 20713: 'screaming',
 16261: 'tcby',
 185: 'four',
 1642: 'grains',
 20680: 'broiler',
 12090: 'wooden',
 1220: 'wednesday',
 13797: 'highveld',
 7593: 'duffour',
 20681: '0053',
 3914: 'elections',
 2563: '270',
 3551: '271',
 5113: '272',
 3552: '273',
 3400: '274',
 7975: 'rudman',
 3401: '276',
 3478: '277',
 3632: '278',
 4309: '279',
 9381: 'dormancy',
 7247: 'errors',
 3086: 'deferred',
 20683: 'sptnd',
 8805: 'cooking',
 20684: 'stratabit',
 16262: 'designing',
 20685: 'metalurgicos',
 13798: 'databank',
 20686: '300er',
 20687: 'shocks',
 7972: 'nawg',
 20688: 'tnta',
 20689: 'perforations',
 2891: 'affiliates',
 20690: '27p',
 16263: 'ching',
 595: 'china',
 16264: 'wagyu',
 3189: 'affiliated',
 16265: 'chino',
 16266: 'chinh',
 2

In [11]:
def vectorize(sequences, dimension = 31000):
  results = np.zeros((len(sequences), dimension))
  for i, sequence in enumerate(sequences):
    results[i, sequence] = 1
  return results
  
data = vectorize(data)
targets = np.array(targets).astype("float32")

In [14]:
# Do not change this line. You need the +1 for some reason. 
max_features = len(word_index.values()) + 1

model = Sequential()
model.add(Embedding(max_features, 128))
model.add(LSTM(128))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss="binary_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])
print("Train...")
model.fit(data, targets, batch_size=32, epochs=3,
          validation_split = 0.20)


Train...
Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x7f8ff3c60780>

Overall score is about 4% and accuracy is about 5%.

In [None]:
# for x in X_train:
#   x=np.asarray(x).astype(np.float32)
# y_train=np.asarray(y_train).astype(np.float32)
# for x in X_test:
#   x=np.asarray(x).astype(np.float32)
# y_test=np.asarray(y_test).astype(np.float32)

In [None]:
# for each in y_train:
#   each= float(each) 
# for each in y_test:
#   each= float(each) 

In [15]:
type(y_test)

numpy.ndarray

In [16]:
y_test

array([18,  3,  3, ...,  3,  4,  3])

In [17]:
lstm_history = model.fit(X_train, y_train,
                        batch_size=32, 
                        epochs=5, 
                        validation_data=(X_test,y_test))

ValueError: ignored

The reference link is broken, the Stack Overflow and GitHub solutions to this issue do not work, and the claim that the data is already encoded properly for use in an LSTM model appears to be false. I spent more than half of the allotted time on this error, and converting all values to floats did not help. The documentation for `tensorflow.keras.datasets.reuters` says `y_train` and `y_test` should be 0/1 values, but they are integer labels from 0 to 45.
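
For comparison, here is a minimal sketch of one way the Reuters task could be set up so that `fit` accepts the data. It assumes the articles are padded to a fixed length with `pad_sequences`, the output layer has one softmax unit per topic, and the loss is `sparse_categorical_crossentropy`, which takes integer class labels directly. The helper names `build_lstm_classifier` and `run_reuters`, the vocabulary cap of 10,000 words, the sequence length of 200, and the layer sizes are illustrative assumptions, not values from the challenge.

```python
import numpy as np
from tensorflow.keras.layers import Dense, Embedding, LSTM
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.sequence import pad_sequences


def build_lstm_classifier(vocab_size, num_classes, embed_dim=64, units=64):
    """Embedding -> LSTM -> softmax over topics (sizes are illustrative)."""
    model = Sequential([
        Embedding(vocab_size, embed_dim),
        LSTM(units),
        Dense(num_classes, activation="softmax"),
    ])
    # Sparse loss: targets stay as integer class ids, no one-hot encoding needed.
    model.compile(loss="sparse_categorical_crossentropy",
                  optimizer="adam",
                  metrics=["accuracy"])
    return model


def run_reuters(vocab_size=10000, maxlen=200, epochs=2):
    """Download Reuters, pad the articles, train briefly, return [loss, accuracy]."""
    from tensorflow.keras.datasets import reuters

    (X_train, y_train), (X_test, y_test) = reuters.load_data(num_words=vocab_size)
    # Ragged lists of word ids -> rectangular (n_articles, maxlen) integer arrays.
    X_train = pad_sequences(X_train, maxlen=maxlen)
    X_test = pad_sequences(X_test, maxlen=maxlen)

    num_classes = int(max(y_train.max(), y_test.max())) + 1
    model = build_lstm_classifier(vocab_size, num_classes)
    model.fit(X_train, y_train, batch_size=64, epochs=epochs,
              validation_split=0.2)
    return model.evaluate(X_test, y_test, verbose=0)
```

Calling `run_reuters()` downloads the dataset and trains for a couple of epochs. Capping the vocabulary with `num_words` keeps every word id below the `Embedding` input size, which is why the sketch does not rely on `len(word_index) + 1`.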

## Sequence Data Question
#### *Describe the `pad_sequences` method used on the training dataset. What does it do? Why do you need it?*
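`pad_sequences` ensures that all sequences have the same length so they can be stacked into a single rectangular array, which the model needs for batching. By default it adds 0s to the beginning of sequences shorter than `maxlen` and truncates longer sequences from the beginning.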
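
To make the behavior concrete, here is a small NumPy re-implementation of the default "pre" padding and "pre" truncation that Keras' `pad_sequences` applies. The function name `pad_pre` and the toy `articles` list are hypothetical and only illustrate the idea; in practice the Keras utility would be used.

```python
import numpy as np


def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences with `value`; keep the last `maxlen` items of long ones."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for i, seq in enumerate(sequences):
        if len(seq) == 0:
            continue  # nothing to copy; the row stays all padding
        trimmed = list(seq)[-maxlen:]        # truncation drops items from the front
        out[i, -len(trimmed):] = trimmed     # padding ends up at the front
    return out


articles = [[5, 8], [4, 9, 2, 7, 3], [6]]
padded = pad_pre(articles, maxlen=4)
print(padded)
# [[0 0 5 8]
#  [9 2 7 3]
#  [0 0 0 6]]
```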


## RNNs versus LSTMs
#### *What are the primary motivations behind using Long Short-Term Memory (LSTM) cell units over traditional Recurrent Neural Networks?*

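Traditional RNNs suffer from vanishing or exploding gradients, which makes it hard for them to learn dependencies across many time steps. LSTM cells mitigate the vanishing-gradient problem by carrying a cell state forward through gated updates, so information from earlier time steps can be retained while stepping through the sequence.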
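
The vanishing-gradient effect can be illustrated with a single scalar recurrent unit. This is a sketch under the assumption that the unit computes `h_t = tanh(w * h_{t-1} + x_t)`; the function name, the weight 0.9, and the constant inputs are illustrative choices.

```python
import numpy as np


def rnn_gradient_through_time(w, inputs, h0=0.0):
    """Return |d h_T / d h_0| for the scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h, grad = h0, 1.0
    for x in inputs:
        h = np.tanh(w * h + x)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - h_t ** 2)
        grad *= w * (1.0 - h ** 2)
    return abs(grad)


inputs = np.full(50, 0.5)
short_run = rnn_gradient_through_time(w=0.9, inputs=inputs[:5])
long_run = rnn_gradient_through_time(w=0.9, inputs=inputs)
print(f"after 5 steps: {short_run:.3e}, after 50 steps: {long_run:.3e}")
```

Every step multiplies the gradient by a factor smaller than 1 here, so the signal from early time steps shrinks geometrically. In an LSTM the cell state is updated additively (`c_t = f_t * c_{t-1} + i_t * g_t`), so the corresponding factor is the forget gate `f_t`, which the network can learn to keep close to 1.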

## RNN / LSTM Use Cases
#### *Name and Describe 3 Use Cases of LSTMs or RNNs and why they are suited to that use case*

LSTMs and RNNs are best used on sequence data such as text prediction, financial market data, and time series such as weather forecasts. They are well suited to these types of data because what has happened previously in the sequence, and especially recently, can have a strong influence on what happens next. 
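
As a small illustration of how a time series is usually arranged for a recurrent model, the sketch below slices a series into overlapping windows of past values, each paired with the value that follows it. The helper name `make_windows` and the temperature values are made up for the example.

```python
import numpy as np


def make_windows(series, timesteps):
    """Turn a 1-D series into (samples, timesteps, 1) inputs and next-value targets."""
    series = np.asarray(series, dtype="float32")
    n = len(series) - timesteps
    X = np.stack([series[i:i + timesteps] for i in range(n)])
    y = series[timesteps:]
    return X[..., np.newaxis], y


temps = [12.0, 13.5, 15.0, 14.0, 13.0, 16.0]
X, y = make_windows(temps, timesteps=3)
print(X.shape, y.shape)  # (3, 3, 1) (3,)
```

Each row of `X` holds three consecutive readings and the matching entry of `y` is the reading that came next, which is the "recent past predicts the future" structure described above.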

<a id="p2"></a>
## Part 2 - CNNs

### Find the Frog

Time to play "find the frog!" Use Keras and [ResNet50v2](https://www.tensorflow.org/api_docs/python/tf/keras/applications/resnet_v2) (pre-trained) to detect which of the images within the `frog_images` subdirectory have a frog in them. Note: You will need to upload the images to Colab. 

<img align="left" src="https://d3i6fh83elv35t.cloudfront.net/newshour/app/uploads/2017/03/GettyImages-654745934-1024x687.jpg" width=400>

The skimage function below will help you read in all the frog images into memory at once. You should use the preprocessing functions that come with ResnetV2, and you should also resize the images using scikit-image.

Your goal is to validly run ResNet50v2 on the input images - don't worry about tuning or improving the model. Print out the predictions in any way you see fit. 

*Hint* - ResNet 50v2 doesn't just return "frog". The three labels it has for frogs are: `bullfrog, tree frog, tailed frog`

*Stretch goals:* 
- Check for other things such as fish.
- Print out the image with its predicted label
- Wrap everything nicely in well-documented functions

In [2]:
from tensorflow.keras.applications.resnet_v2 import ResNet50V2, decode_predictions, preprocess_input


In [3]:
from skimage.io import imread_collection

images = imread_collection('./frog_images/*.jpg')

In [4]:
print(type(images))
print(type(images[0]), end="\n\n")

<class 'skimage.io.collection.ImageCollection'>
<class 'numpy.ndarray'>



In [50]:
len(images),images[7].shape[0]

(15, 2642)

In [26]:
images.files[0]

'./frog_images/cristiane-teston-bcnfJvEYm1Y-unsplash.jpg'

In [46]:
import tensorflow as tf
def check(img, shape):
  model = tf.keras.applications.ResNet50V2(
      include_top=False, weights='imagenet', input_tensor=None,
      input_shape=shape, pooling=None, classes=1000,
      classifier_activation='softmax'
      )
  model.compile(loss="binary_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])
  return model.evaluate(img)

In [None]:
model.summary()

In [53]:
check(images[0], (len(images),images[0].shape[0],images[0].shape[1], images[0].shape[2]))

ValueError: ignored

Must be a Tuple of 3 integers... okay :

In [54]:
check(images[0], images[0].shape)



ValueError: ignored

*Throws hands up*

In [34]:
for i in range(len(images)):
  print(get_results(images[i]))



ValueError: ignored

In [55]:
for i in range(len(images)):
  print(get_results(images.files[i]))

TypeError: ignored

In [48]:
pro_img = []
for i in range(len(images)):
  pro_img.append(preprocess_input(images[i]))

In [49]:
pro_img[0].shape

(2137, 1710, 3)

In [25]:
pro_img.files

AttributeError: ignored

In [24]:
results = []
for i in range(len(pro_img)):
  print(model.predict(pro_img[i]))



ValueError: ignored

In [60]:
images[0].reshape(15,224,224,3)

ValueError: ignored

I give. I cannot get this to run on a single image. I will try to reshape the data to the standard, but so far that's the only hope I have.
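
For reference, a minimal sketch of how the pieces could fit together. It assumes every image is RGB, that each is resized to 224x224 with scikit-image and stacked into one batch, that the model keeps its classifier head (the default `include_top=True`), and that `predict` is used rather than `evaluate`. The helper names `has_frog`, `to_batch`, and `find_frogs` are hypothetical, and the frog class names are assumed to be spelled with underscores in the decoded labels.

```python
import numpy as np

# ImageNet class names for frogs, as assumed to appear in the decoded labels.
FROG_LABELS = {"bullfrog", "tree_frog", "tailed_frog"}


def has_frog(decoded, labels=FROG_LABELS):
    """True if any (wnid, name, score) tuple for one image names a frog class."""
    return any(name in labels for _, name, _ in decoded)


def to_batch(images, size=(224, 224)):
    """Resize RGB images to `size` and stack them into one (N, H, W, 3) array."""
    from skimage.transform import resize  # imported lazily: only needed here
    # preserve_range keeps pixels in 0-255, which preprocess_input expects.
    resized = [resize(img, size, preserve_range=True) for img in images]
    return np.stack(resized).astype("float32")


def find_frogs(pattern="./frog_images/*.jpg", top=3):
    """Map each image path to (is_frog, top predictions) using ResNet50V2."""
    from skimage.io import imread_collection
    from tensorflow.keras.applications.resnet_v2 import (
        ResNet50V2, decode_predictions, preprocess_input)

    images = imread_collection(pattern)
    batch = preprocess_input(to_batch(images))
    # Default include_top=True keeps the 1000-class ImageNet classifier head.
    model = ResNet50V2(weights="imagenet")
    decoded = decode_predictions(model.predict(batch), top=top)
    return {path: (has_frog(d), d) for path, d in zip(images.files, decoded)}
```

Calling `find_frogs()` downloads the pre-trained weights on first use and returns one entry per file. No `compile` step is needed because the model is only used for inference.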


<a id="p3"></a>
## Part 3 - Autoencoders

Describe a use case for an autoencoder given that an autoencoder tries to predict its own input. 

__*Your Answer:*__ Autoencoders can be used for denoising and for image compression or dimensionality reduction. This could be useful when compressing images for storage or transfer and decoding them later.
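
The compress-then-reconstruct idea can be sketched without training a network. The example below uses PCA, computed with an SVD, as a stand-in for a trained linear autoencoder (the best linear autoencoder spans the same subspace). The synthetic data, the code size of 2, and the names `encode` and `decode` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 200 samples in 8 dimensions that lie close to a 2-D plane.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 8))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 8))

mean = X.mean(axis=0)
# Rows of vt are the principal directions, strongest first.
_, _, vt = np.linalg.svd(X - mean, full_matrices=False)
components = vt[:2]  # shape (2, 8)


def encode(x):
    """Compress 8-D inputs to 2-D codes."""
    return (x - mean) @ components.T


def decode(code):
    """Reconstruct 8-D inputs from 2-D codes."""
    return code @ components + mean


codes = encode(X)
reconstruction = decode(codes)
mse = float(np.mean((X - reconstruction) ** 2))
print(codes.shape, f"reconstruction MSE: {mse:.6f}")
```

Only the 2-D codes need to be stored or transferred. A neural autoencoder follows the same pattern but learns nonlinear `encode` and `decode` functions by minimizing the reconstruction error.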


<a id="p4"></a>
## Part 4 - More...

Answer the following questions, with a target audience of a fellow Data Scientist:

- What do you consider your strongest area, as a Data Scientist?

Explaining things in layman's terms. I came into the field with only a couple of years of programming experience and a couple of decades of time away. I have learned to communicate well with people from many walks of life and can make data science relatable to people who have little to no understanding of the technical side.

- What area of Data Science would you most like to learn more about, and why?

Machine learning and AI. I believe we are headed into a world where AI is becoming more and more integral to everyday life and AGI may actually emerge. It will become more and more important for people with diverse backgrounds to be involved in the field as these changes begin to impact how our society functions.


- Where do you think Data Science will be in 5 years?

It's impossible to say for sure. I believe automation in driving, flying, railways, and even space flight will become more commonplace, and that most new standard appliances will become 'smart', giving us more time for the things we find interesting. I also believe complex machine learning models will help us understand and correct our consumption of resources, helping to make Earth's future better for the coming generations.


- What are the threats posed by AI to our society?

AI, while exciting and impressive, can also serve to disconnect people from the ways we are used to communicating and interacting. I don't necessarily see that as a negative; I am sure people were weirded out when radio and television came around, too. I see all movement as progress. There really is only a danger if people stop paying attention to the capabilities of AI, and I don't see that happening. In fact, more people are interested in advancements in technology now than ever. One threat I could envision is a separation from nature and the majesty of non-human creations.


- How do you think we can counteract those threats? 

I think it should remain just as important to preserve the natural world as it is to advance the man-made world. More recreational and arts related activities in schools could be a good start.



- Do you think achieving General Artificial Intelligence is ever possible?

Personally, no. I think decoding the entirety of what we consider an intelligent mind is still so far out of reach that it is far more likely that humans would find a reason to self-destruct first. I hope that isn't the case, but as impressed as I am with AI, I believe AGI is another whole complex set of problems to figure out. We don't even know what happens to our own consciousness when we sleep. Attempting to replicate human or any other consciousness in a thinking, learning, expanding form of artificial consciousness seems implausible, even with the major rate of advancement and the exponential growth rate of change and miniaturization... I am going to have to stick around until 2045 to see if Kurzweil was right though, but then I'm off to Mars.

A few sentences per answer is fine - only elaborate if time allows.

## Congratulations! 

Thank you for your hard work, and congratulations! You've learned a lot, and you should proudly call yourself a Data Scientist.


In [None]:
from IPython.display import HTML

HTML("""<iframe src="https://giphy.com/embed/26xivLqkv86uJzqWk" width="480" height="270" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/mumm-champagne-saber-26xivLqkv86uJzqWk">via GIPHY</a></p>""")