# Deep learning using neural networks.

#### Author: Bhavesh Patel
#### We will use images and their labels to train a classifier that predicts what each image shows.
#### We will use a neural network for deep learning. Deep features are especially interesting, so we will see how they affect prediction accuracy.


In [1]:
import graphlab

In [2]:
# Limit number of worker processes. This preserves system memory, which prevents hosted notebooks from crashing.
graphlab.set_runtime_config('GRAPHLAB_DEFAULT_NUM_PYLAMBDA_WORKERS', 4)

This non-commercial license of GraphLab Create for academic use is assigned to bhaveshhk8@gmail.com and will expire on October 17, 2017.


[INFO] graphlab.cython.cy_server: GraphLab Create v2.1 started. Logging: /tmp/graphlab_server_1481406886.log


In [5]:
# Now let's load the images from the CIFAR-10 dataset, reduced to four categories: cat, bird, automobile, dog.
# It is already split into a training set and a test set.

image_train = graphlab.SFrame('image_train_data/')
image_test = graphlab.SFrame('image_test_data/')

In [6]:
# Render Canvas output inline in this notebook.
graphlab.canvas.set_target('ipynb')

In [8]:
# let's view the data.
image_train.show()

In [9]:
# Before we build a model, let's see what the first three test images are.
image_test[0:3]['image'].show()

In [10]:
# They are a cat, a car, and a cat. Noted.
# To confirm, here are the labels.
image_test[0:3]['label']

dtype: str
Rows: 3
['cat', 'automobile', 'cat']

In [12]:
# Now let's train a classifier on the raw pixel values (image_array).

image_classifier_model = graphlab.logistic_classifier.create(image_train, target='label',
                                                            features=['image_array'])

PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.



In [13]:
# now use the model to predict.

image_classifier_model.predict(image_test[0:3])

dtype: str
Rows: 3
['bird', 'cat', 'bird']

In [14]:
# Well, that's horrible: none of the three predictions is right.  :(
# Let's evaluate on the full test set to find out.

image_classifier_model.evaluate(image_test)

{'accuracy': 0.48075, 'auc': 0.7235272916666664, 'confusion_matrix': Columns:
 	target_label	str
 	predicted_label	str
 	count	int
 
 Rows: 16
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |     bird     |       dog       |  198  |
 |     dog      |       cat       |  239  |
 |     bird     |    automobile   |  112  |
 |  automobile  |    automobile   |  607  |
 |     cat      |       dog       |  303  |
 |     dog      |       dog       |  431  |
 |     dog      |    automobile   |   88  |
 |     bird     |       bird      |  529  |
 |  automobile  |       bird      |  118  |
 |     bird     |       cat       |  161  |
 +--------------+-----------------+-------+
 [16 rows x 3 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns., 'f1_score': 0.4807160516374978, 'log_loss': 1.2065411828057908, 'precision': 0

In [15]:
# ok so only 48% accuracy!  Not good.
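
The overall accuracy hides how each class fares. The printed head of the confusion matrix happens to contain all four rows whose target is `bird`, so the recall for that class can be worked out by hand. A minimal sketch in plain Python; the variable names are illustrative and not part of GraphLab Create:

```python
# Rows copied from the printed confusion matrix where target_label == 'bird'.
# Keys are the predicted labels, values are the counts.
bird_rows = {
    'dog': 198,
    'automobile': 112,
    'bird': 529,
    'cat': 161,
}

total_birds = sum(bird_rows.values())                  # all bird images in the test set
bird_recall = bird_rows['bird'] / float(total_birds)   # share of birds labeled as birds

print(total_birds)   # 1000
print(bird_recall)   # 0.529
```

So roughly half of the birds are recognized, which is in line with the 48% overall accuracy.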

In [16]:
# now let's use deep features.  Borrow it!
# first load the model.
deep_learning_model = graphlab.load_model('http://s3.amazonaws.com/GraphLab-Datasets/deeplearning/imagenet_model_iter45')


In [17]:
# now let's extract the features for our data based on this model.
image_train['bp_deep_features'] = deep_learning_model.extract_features(image_train)

In [18]:
image_train.show()

In [20]:
# Extracting the features took a long time, but it finally finished.
# Let's use these deep features, which are borrowed from the other model.

deep_feature_model=graphlab.logistic_classifier.create(image_train,
                                                      features=['bp_deep_features'],
                                                      target='label')

PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.



In [21]:
deep_feature_model.predict(image_test[0:3])

dtype: str
Rows: 3
['cat', 'cat', 'cat']

In [22]:
# OK, an improvement on these three images.  Let's compare all three lists.
# Real values are: cat, automobile, cat
# Our raw-pixel model predicted: bird, cat, bird
# Our deep feature model predicted: cat, cat, cat -> not bad, but I was hoping for better!
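
The comparison can be made explicit by counting matches between each prediction list and the true labels. This is a small sketch in plain Python using only the three values printed in this notebook; `count_correct` is a hypothetical helper, not a GraphLab Create function:

```python
actual = ['cat', 'automobile', 'cat']        # labels of image_test[0:3]
raw_pixel_pred = ['bird', 'cat', 'bird']     # classifier trained on image_array
deep_feature_pred = ['cat', 'cat', 'cat']    # classifier trained on bp_deep_features


def count_correct(predicted, truth):
    """Return the number of positions where the prediction equals the true label."""
    return sum(1 for p, t in zip(predicted, truth) if p == t)


print(count_correct(raw_pixel_pred, actual))      # 0
print(count_correct(deep_feature_pred, actual))   # 2
```

Three images are too small a sample to rank the models: any model that always answers `cat` would also score 2 of 3 here.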


In [24]:
# now let's find accuracy of this model.

deep_feature_model.evaluate(image_test)

{'accuracy': 0.25, 'auc': 0.5, 'confusion_matrix': Columns:
 	target_label	str
 	predicted_label	str
 	count	int
 
 Rows: 4
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |     dog      |       cat       |  1000 |
 |     bird     |       cat       |  1000 |
 |  automobile  |       cat       |  1000 |
 |     cat      |       cat       |  1000 |
 +--------------+-----------------+-------+
 [4 rows x 3 columns], 'f1_score': 0.1, 'log_loss': 1.466788500595805, 'precision': 0.25, 'recall': 0.25, 'roc_curve': Columns:
 	threshold	float
 	fpr	float
 	tpr	float
 	p	int
 	n	int
 	class	int
 
 Rows: 400004
 
 Data:
 +-----------+-----+-----+------+------+-------+
 | threshold | fpr | tpr |  p   |  n   | class |
 +-----------+-----+-----+------+------+-------+
 |    0.0    | 1.0 | 1.0 | 1000 | 3000 |   0   |
 |   1e-05   | 1.0 | 1.0 | 1000 | 3000 |   0   |
 |   2e-05   | 1.0 | 1.0 | 1000 | 3000 |   0   

In [25]:
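# Only 25%.  That's chance level for four equally sized classes: the model predicts 'cat' for every image.
# I believe that's because my computer didn't continue to iterate during training.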
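
Because this confusion matrix is printed in full, the reported accuracy and F1 score can be reproduced by hand. The sketch below is plain Python, not GraphLab Create code. It assumes the reported `f1_score` is an unweighted mean over classes, with the F1 of a class that is never predicted correctly counted as 0; that assumption is not confirmed by the output, but it does reproduce the printed numbers.

```python
# (target_label, predicted_label) -> count, copied from the printed matrix.
confusion = {
    ('dog', 'cat'): 1000,
    ('bird', 'cat'): 1000,
    ('automobile', 'cat'): 1000,
    ('cat', 'cat'): 1000,
}
classes = ['automobile', 'bird', 'cat', 'dog']

total = sum(confusion.values())
correct = sum(n for (target, pred), n in confusion.items() if target == pred)
accuracy = correct / float(total)


def f1_for(label):
    """F1 score of one class; 0.0 when the class has no true positives (assumption)."""
    true_pos = confusion.get((label, label), 0)
    if true_pos == 0:
        return 0.0
    predicted = sum(n for (t, p), n in confusion.items() if p == label)
    actual = sum(n for (t, p), n in confusion.items() if t == label)
    precision = true_pos / float(predicted)
    recall = true_pos / float(actual)
    return 2 * precision * recall / (precision + recall)


macro_f1 = sum(f1_for(c) for c in classes) / len(classes)

print(accuracy)   # 0.25
print(macro_f1)   # 0.1
```

Only `cat` has a nonzero F1 (precision 0.25, recall 1.0, F1 0.4), so the mean over four classes is 0.1.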

In [None]:
# Let's use the precalculated 'deep_features' column that is already in the data and see what we get.

In [31]:
deep_feature_precalculated_model = graphlab.logistic_classifier.create(image_train,
                                                             features =['deep_features'],
                                                             target='label')

PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.



In [32]:
deep_feature_precalculated_model.predict(image_test[0:3])

dtype: str
Rows: 3
['cat', 'automobile', 'cat']

In [None]:
# Wow -> finally the model got all three right.

In [33]:
# let's see accuracy of this model.

In [35]:
deep_feature_precalculated_model.evaluate(image_test)

{'accuracy': 0.784, 'auc': 0.9384483749999979, 'confusion_matrix': Columns:
 	target_label	str
 	predicted_label	str
 	count	int
 
 Rows: 16
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |  automobile  |       cat       |   14  |
 |     bird     |       dog       |   58  |
 |     cat      |       bird      |   69  |
 |  automobile  |       dog       |   7   |
 |     cat      |    automobile   |   33  |
 |     dog      |       bird      |   44  |
 |     bird     |       cat       |  130  |
 |     dog      |    automobile   |   20  |
 |     dog      |       dog       |  716  |
 |     cat      |       dog       |  222  |
 +--------------+-----------------+-------+
 [16 rows x 3 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns., 'f1_score': 0.7841899124404468, 'log_loss': 0.6112328671477912, 'precision': 0.7

In [36]:
# wow 78% accuracy.  That's very good.  
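
Only 10 of the 16 confusion-matrix rows are printed, but some hidden cells can be inferred. The sketch below assumes the test set holds 1000 images per class, as the full 4-row matrix from the `deep_feature_model` evaluation suggests. It is plain Python with illustrative names, and the derived numbers are estimates under that assumption, not values reported by GraphLab Create.

```python
IMAGES_PER_CLASS = 1000  # assumption: taken from the 4-row confusion matrix printed earlier

# Visible rows from the printed head, grouped by target_label.
# Inner keys are predicted labels, values are counts.
visible = {
    'cat': {'bird': 69, 'automobile': 33, 'dog': 222},
    'dog': {'bird': 44, 'automobile': 20, 'dog': 716},
}

# Each target's four cells must sum to the class size, so the missing cell
# is whatever remains after subtracting the three visible ones.
cat_as_cat = IMAGES_PER_CLASS - sum(visible['cat'].values())
dog_as_cat = IMAGES_PER_CLASS - sum(visible['dog'].values())

print(cat_as_cat)   # 676
print(dog_as_cat)   # 220
```

Under that assumption, cats and dogs are confused with each other far more often than with birds or automobiles (222 cats labeled dog, about 220 dogs labeled cat).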

In [None]:
# To summarize:

# Model 1:  Raw pixel features (image_array), without deep features from another model.  Accuracy: 48%
# Model 2:  My own extracted deep features, but training did not go through all iterations.  Accuracy: 25% -> feeling bad.
# Model 3:  Precalculated deep features.  Accuracy: 78% -> which suggests we need a bigger computer to run more iterations.