# TensorFlow with GPU

This notebook provides an introduction to computing on a [GPU](https://cloud.google.com/gpu) in Colab. In this notebook you will connect to a GPU, and then run some basic TensorFlow operations on both the CPU and a GPU, observing the speedup provided by using the GPU.


## Enabling and testing the GPU

First, you'll need to enable GPUs for the notebook:

- Navigate to Edit→Notebook Settings
- Select GPU from the Hardware Accelerator drop-down

Next, we'll confirm that TensorFlow can connect to the GPU:

In [1]:
%tensorflow_version 2.x
import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

Found GPU at: /device:GPU:0


In [2]:
import pandas as pd

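# `on_bad_lines='skip'` (pandas >= 1.3) replaces `error_bad_lines=False`, which was removed in pandas 2.0.
data = pd.read_csv('abcnews-date-text.csv', on_bad_lines='skip')
# Copy the column so that adding 'index' does not trigger a SettingWithCopyWarning.
data_text = data[['headline_text']].copy()
data_text['index'] = data_text.index
documents = data_text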

In [3]:
print(len(documents))
print(documents[:5])

404720
                                       headline_text  index
0  aba decides against community broadcasting lic...      0
1     act fire witnesses must be aware of defamation      1
2     a g calls for infrastructure protection summit      2
3           air nz staff in aust strike for pay rise      3
4      air nz strike to affect australian travellers      4


In [25]:
import gensim
from gensim.utils import simple_preprocess
from gensim.parsing.preprocessing import STOPWORDS
from nltk.stem import WordNetLemmatizer, PorterStemmer
import numpy as np
np.random.seed(2018)  # seed NumPy's global random number generator
import nltk
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [26]:
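stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()  # create once instead of on every call

def lemmatize_stemming(text):
    # Lemmatize as a verb first, then reduce the result to its Porter stem.
    return stemmer.stem(lemmatizer.lemmatize(text, pos='v'))

def preprocess(text):
    # Tokenize, drop stop words and tokens of three characters or fewer, then normalize.
    result = []
    for token in gensim.utils.simple_preprocess(text):
        if token not in gensim.parsing.preprocessing.STOPWORDS and len(token) > 3:
            result.append(lemmatize_stemming(token))
    return result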

In [27]:
doc_sample = documents[documents['index'] == 4310].values[0][0]
print('original document: ')
print(doc_sample.split(' '))
print('\n\n tokenized and lemmatized document: ')
print(preprocess(doc_sample))

original document: 
['ratepayers', 'group', 'wants', 'compulsory', 'local', 'govt', 'voting']


 tokenized and lemmatized document: 
['ratepay', 'group', 'want', 'compulsori', 'local', 'govt', 'vote']
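
The filtering half of `preprocess` can be looked at on its own. The following is a minimal sketch, not gensim's implementation: `toy_tokenize`, `toy_filter`, and the tiny `TOY_STOPWORDS` set are hypothetical stand-ins for `simple_preprocess` and `STOPWORDS`, and the lemmatizing and stemming step is left out.

```python
import re

# Hypothetical, deliberately tiny stop-word list; gensim's STOPWORDS is far larger.
TOY_STOPWORDS = frozenset({"for", "the", "and", "with", "from", "this", "that"})

def toy_tokenize(text):
    """Lowercase the text and keep alphabetic runs (a rough stand-in for simple_preprocess)."""
    return re.findall(r"[a-z]+", text.lower())

def toy_filter(text, min_len=4):
    """Drop stop words and tokens shorter than min_len, mirroring `len(token) > 3`."""
    return [t for t in toy_tokenize(text)
            if t not in TOY_STOPWORDS and len(t) >= min_len]

print(toy_filter("air nz staff in aust strike for pay rise"))
# → ['staff', 'aust', 'strike', 'rise']
```

Short tokens such as `air`, `nz`, and `pay` fall to the length check alone, which is why abbreviations of four letters like `govt` and `aust` survive.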


In [28]:
processed_docs = documents['headline_text'].map(preprocess)
processed_docs[:10]

0               [decid, commun, broadcast, licenc]
1                               [wit, awar, defam]
2           [call, infrastructur, protect, summit]
3                      [staff, aust, strike, rise]
4             [strike, affect, australian, travel]
5               [ambiti, olsson, win, tripl, jump]
6           [antic, delight, record, break, barca]
7    [aussi, qualifi, stosur, wast, memphi, match]
8            [aust, address, secur, council, iraq]
9                         [australia, lock, timet]
Name: headline_text, dtype: object

In [29]:
dictionary = gensim.corpora.Dictionary(processed_docs)
count = 0
for k, v in dictionary.items():
    print(k, v)
    count += 1
    if count > 10:
        break

0 broadcast
1 commun
2 decid
3 licenc
4 awar
5 defam
6 wit
7 call
8 infrastructur
9 protect
10 summit


In [30]:
dictionary.filter_extremes(no_below=15, no_above=0.5, keep_n=100000)
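
The call above prunes the vocabulary by document frequency: tokens found in fewer than `no_below` documents or in more than a `no_above` fraction of documents are dropped, and at most `keep_n` of the most frequent survivors are kept. A rough pure-Python sketch of that idea follows; `toy_filter_extremes` is a hypothetical helper, not part of gensim, and it may differ from the library in rounding and tie-breaking.

```python
from collections import Counter

def toy_filter_extremes(docs, no_below=15, no_above=0.5, keep_n=100000):
    """Return the set of tokens that survive document-frequency filtering."""
    # Document frequency: the number of documents containing each token at least once.
    doc_freq = Counter(token for doc in docs for token in set(doc))
    max_docs = no_above * len(docs)
    kept = [tok for tok, df in doc_freq.items() if no_below <= df <= max_docs]
    # Keep only the keep_n most frequent survivors.
    kept.sort(key=lambda tok: doc_freq[tok], reverse=True)
    return set(kept[:keep_n])

# Toy corpus: 'govt' is too common and 'rare' is too rare for the thresholds below.
docs = [["govt", "plan"], ["govt", "water"], ["govt", "plan", "rare"], ["water"]]
print(sorted(toy_filter_extremes(docs, no_below=2, no_above=0.6)))
# → ['plan', 'water']
```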

In [31]:
bow_corpus = [dictionary.doc2bow(doc) for doc in processed_docs]
bow_corpus[4310]

[(159, 1), (233, 1), (282, 1), (570, 1), (815, 1), (3301, 1), (3302, 1)]

In [32]:
bow_doc_4310 = bow_corpus[4310]
for token_id, token_count in bow_doc_4310:
    print("Word {} (\"{}\") appears {} time.".format(token_id, dictionary[token_id], token_count))

Word 159 ("govt") appears 1 time.
Word 233 ("group") appears 1 time.
Word 282 ("vote") appears 1 time.
Word 570 ("local") appears 1 time.
Word 815 ("want") appears 1 time.
Word 3301 ("compulsori") appears 1 time.
Word 3302 ("ratepay") appears 1 time.
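
Conceptually, a bag-of-words vector is a list of `(token_id, count)` pairs for the tokens that are still in the vocabulary. The sketch below illustrates the idea with a hypothetical helper, `toy_doc2bow`, and a hand-written `token2id` mapping whose ids are borrowed from the output above; it is not how gensim implements `doc2bow`.

```python
from collections import Counter

def toy_doc2bow(tokens, token2id):
    """Count in-vocabulary tokens and return sorted (token_id, count) pairs."""
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())

# Hypothetical vocabulary for illustration only.
token2id = {"govt": 159, "group": 233, "vote": 282}
print(toy_doc2bow(["group", "govt", "vote", "govt", "unknown"], token2id))
# → [(159, 2), (233, 1), (282, 1)]
```

Tokens missing from the vocabulary, such as `unknown` here, are silently ignored, which is also what happens to words removed by `filter_extremes`.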


In [33]:
from pprint import pprint
from gensim import corpora, models

tfidf = models.TfidfModel(bow_corpus)
corpus_tfidf = tfidf[bow_corpus]
# Show the TF-IDF weights of the first document only.
for doc in corpus_tfidf:
    pprint(doc)
    break

[(0, 0.5988915696116912),
 (1, 0.38632666708232094),
 (2, 0.4843588629673915),
 (3, 0.5074219999310693)]
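
The squares of the four weights above sum to roughly 1, which is consistent with each document vector being scaled to unit length. As a sketch of one common TF-IDF weighting, assuming a raw term count multiplied by `log2(num_docs / doc_freq)` and followed by L2 normalization (believed to match gensim's defaults, but treat that as an assumption), `toy_tfidf` below is a hypothetical helper:

```python
import math

def toy_tfidf(bow, doc_freq, num_docs):
    """Weight (token_id, count) pairs by count * log2(num_docs / df), then L2-normalize."""
    weighted = [(tid, tf * math.log2(num_docs / doc_freq[tid])) for tid, tf in bow]
    norm = math.sqrt(sum(w * w for _, w in weighted))
    # A document whose weights are all zero has no direction, so return an empty vector.
    return [(tid, w / norm) for tid, w in weighted] if norm else []

# Hypothetical document frequencies for a corpus of 8 documents.
doc_freq = {0: 1, 1: 4}
vec = toy_tfidf([(0, 1), (1, 1)], doc_freq, num_docs=8)
print(vec)
```

The token that occurs in only one of the eight documents receives the larger weight, even though both tokens occur once in this document.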


In [34]:
lda_model = gensim.models.LdaMulticore(bow_corpus, num_topics=10, id2word=dictionary, passes=2, workers=2)

In [35]:
for idx, topic in lda_model.print_topics(-1):
    print('Topic: {} \nWords: {}'.format(idx, topic))

Topic: 0 
Words: 0.023*"power" + 0.022*"court" + 0.019*"tell" + 0.018*"driver" + 0.016*"trial" + 0.015*"accus" + 0.015*"appeal" + 0.014*"lose" + 0.012*"case" + 0.012*"studi"
Topic: 1 
Words: 0.042*"govt" + 0.028*"plan" + 0.027*"council" + 0.022*"water" + 0.018*"fund" + 0.016*"urg" + 0.015*"health" + 0.013*"group" + 0.011*"hospit" + 0.011*"indigen"
Topic: 2 
Words: 0.028*"hous" + 0.019*"elect" + 0.015*"time" + 0.015*"offer" + 0.011*"opposit" + 0.010*"promis" + 0.010*"drive" + 0.010*"poll" + 0.010*"say" + 0.009*"pledg"
Topic: 3 
Words: 0.070*"polic" + 0.034*"charg" + 0.023*"crash" + 0.023*"death" + 0.018*"miss" + 0.018*"murder" + 0.017*"jail" + 0.017*"sydney" + 0.017*"rudd" + 0.015*"investig"
Topic: 4 
Words: 0.029*"face" + 0.021*"open" + 0.018*"lead" + 0.017*"market" + 0.016*"olymp" + 0.013*"world" + 0.013*"final" + 0.012*"aussi" + 0.011*"win" + 0.011*"race"
Topic: 5 
Words: 0.021*"year" + 0.015*"get" + 0.015*"busi" + 0.013*"begin" + 0.011*"work" + 0.011*"green" + 0.010*"industri" + 0.0

In [36]:
lda_model_tfidf = gensim.models.LdaMulticore(corpus_tfidf, num_topics=10, id2word=dictionary, passes=2, workers=4)
for idx, topic in lda_model_tfidf.print_topics(-1):
    print('Topic: {} Word: {}'.format(idx, topic))

Topic: 0 Word: 0.019*"crash" + 0.010*"die" + 0.010*"fatal" + 0.008*"polic" + 0.007*"road" + 0.007*"truck" + 0.007*"driver" + 0.006*"accid" + 0.006*"nelson" + 0.005*"iemma"
Topic: 1 Word: 0.013*"price" + 0.010*"rise" + 0.006*"toll" + 0.006*"wind" + 0.006*"farm" + 0.006*"petrol" + 0.006*"rat" + 0.006*"takeov" + 0.005*"nurs" + 0.005*"bendigo"
Topic: 2 Word: 0.009*"climat" + 0.007*"water" + 0.006*"timor" + 0.006*"talk" + 0.005*"chang" + 0.005*"east" + 0.005*"alcohol" + 0.005*"farmer" + 0.005*"govt" + 0.005*"korea"
Topic: 3 Word: 0.030*"closer" + 0.017*"interview" + 0.005*"john" + 0.004*"bird" + 0.004*"pulp" + 0.004*"christma" + 0.004*"kangaroo" + 0.004*"human" + 0.004*"global" + 0.004*"titl"
Topic: 4 Word: 0.010*"beij" + 0.008*"cancer" + 0.008*"olymp" + 0.006*"bash" + 0.006*"arm" + 0.005*"medic" + 0.005*"hick" + 0.005*"final" + 0.004*"polic" + 0.004*"treatment"
Topic: 5 Word: 0.005*"aussi" + 0.005*"australia" + 0.005*"liber" + 0.005*"england" + 0.005*"world" + 0.005*"weather" + 0.005*"list

In [37]:
processed_docs[4310]

['ratepay', 'group', 'want', 'compulsori', 'local', 'govt', 'vote']

In [38]:
for index, score in sorted(lda_model[bow_corpus[4310]], key=lambda tup: -1*tup[1]):
    print("\nScore: {}\t \nTopic: {}".format(score, lda_model.print_topic(index, 10)))


Score: 0.8874948620796204	 
Topic: 0.042*"govt" + 0.028*"plan" + 0.027*"council" + 0.022*"water" + 0.018*"fund" + 0.016*"urg" + 0.015*"health" + 0.013*"group" + 0.011*"hospit" + 0.011*"indigen"

Score: 0.012502336874604225	 
Topic: 0.026*"talk" + 0.023*"claim" + 0.017*"coast" + 0.016*"gold" + 0.014*"head" + 0.012*"north" + 0.012*"mayor" + 0.012*"meet" + 0.012*"labor" + 0.010*"south"

Score: 0.012500844895839691	 
Topic: 0.034*"kill" + 0.018*"protest" + 0.017*"attack" + 0.017*"nation" + 0.016*"iraq" + 0.014*"china" + 0.013*"aust" + 0.013*"closer" + 0.013*"australia" + 0.012*"bomb"

Score: 0.012500814162194729	 
Topic: 0.016*"farmer" + 0.013*"climat" + 0.013*"bodi" + 0.012*"drought" + 0.012*"chang" + 0.011*"rain" + 0.011*"trade" + 0.010*"compani" + 0.010*"free" + 0.009*"cancer"

Score: 0.01250048354268074	 
Topic: 0.028*"hous" + 0.019*"elect" + 0.015*"time" + 0.015*"offer" + 0.011*"opposit" + 0.010*"promis" + 0.010*"drive" + 0.010*"poll" + 0.010*"say" + 0.009*"pledg"

Score: 0.012500403
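
The many near-identical scores of about 0.0125 above are plausibly the floor set by the prior rather than a sign of weak but real association. The arithmetic below is a rough sketch: it assumes the model kept a symmetric prior of `alpha = 1 / num_topics` (believed to be gensim's default) and treats inference as simple token counting, which simplifies what gensim actually does. Both helper names are hypothetical.

```python
def toy_topic_floor(num_topics, num_tokens):
    """Score of a topic credited with none of the document's tokens."""
    alpha = 1.0 / num_topics
    return alpha / (num_topics * alpha + num_tokens)

def toy_topic_score(num_topics, num_tokens, assigned):
    """Score of a topic credited with `assigned` of the document's tokens."""
    alpha = 1.0 / num_topics
    return (alpha + assigned) / (num_topics * alpha + num_tokens)

# Document 4310 has 7 in-vocabulary tokens and the model has 10 topics.
print(toy_topic_floor(10, 7))
print(toy_topic_score(10, 7, 7))
```

Under these assumptions the floor is 0.1 / 8 = 0.0125, and a topic credited with all seven tokens scores 7.1 / 8 = 0.8875, close to the top score of 0.8875 reported for `lda_model`.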

In [39]:
for index, score in sorted(lda_model_tfidf[bow_corpus[4310]], key=lambda tup: -1*tup[1]):
    print("\nScore: {}\t \nTopic: {}".format(score, lda_model_tfidf.print_topic(index, 10)))


Score: 0.6190384030342102	 
Topic: 0.010*"govt" + 0.009*"health" + 0.009*"fund" + 0.008*"council" + 0.008*"plan" + 0.006*"servic" + 0.006*"boost" + 0.006*"hospit" + 0.006*"urg" + 0.005*"indigen"

Score: 0.28094929456710815	 
Topic: 0.009*"climat" + 0.007*"water" + 0.006*"timor" + 0.006*"talk" + 0.005*"chang" + 0.005*"east" + 0.005*"alcohol" + 0.005*"farmer" + 0.005*"govt" + 0.005*"korea"

Score: 0.012504156678915024	 
Topic: 0.014*"rudd" + 0.009*"labor" + 0.009*"govt" + 0.008*"elect" + 0.007*"plan" + 0.006*"poll" + 0.005*"howard" + 0.005*"say" + 0.005*"chang" + 0.005*"govern"

Score: 0.012501808814704418	 
Topic: 0.013*"price" + 0.010*"rise" + 0.006*"toll" + 0.006*"wind" + 0.006*"farm" + 0.006*"petrol" + 0.006*"rat" + 0.006*"takeov" + 0.005*"nurs" + 0.005*"bendigo"

Score: 0.01250174455344677	 
Topic: 0.022*"charg" + 0.019*"court" + 0.014*"murder" + 0.012*"jail" + 0.011*"face" + 0.011*"assault" + 0.010*"polic" + 0.010*"accus" + 0.009*"guilti" + 0.009*"drug"

Score: 0.01250126305967569

In [43]:
unseen_document = 'China is alleged to spread the deadly corona virus'
bow_vector = dictionary.doc2bow(preprocess(unseen_document))
for index, score in sorted(lda_model[bow_vector], key=lambda tup: -1*tup[1]):
    print("Score: {}\t Topic: {}".format(score, lda_model.print_topic(index, 5)))

Score: 0.3500000238418579	 Topic: 0.016*"farmer" + 0.013*"climat" + 0.013*"bodi" + 0.012*"drought" + 0.012*"chang"
Score: 0.34948840737342834	 Topic: 0.034*"kill" + 0.018*"protest" + 0.017*"attack" + 0.017*"nation" + 0.016*"iraq"
Score: 0.18383482098579407	 Topic: 0.070*"polic" + 0.034*"charg" + 0.023*"crash" + 0.023*"death" + 0.018*"miss"
Score: 0.016675231978297234	 Topic: 0.033*"warn" + 0.022*"price" + 0.021*"drug" + 0.020*"fear" + 0.020*"worker"
Score: 0.016667375341057777	 Topic: 0.026*"talk" + 0.023*"claim" + 0.017*"coast" + 0.016*"gold" + 0.014*"head"
Score: 0.016667207702994347	 Topic: 0.028*"hous" + 0.019*"elect" + 0.015*"time" + 0.015*"offer" + 0.011*"opposit"
Score: 0.016666913405060768	 Topic: 0.029*"face" + 0.021*"open" + 0.018*"lead" + 0.017*"market" + 0.016*"olymp"
Score: 0.016666674986481667	 Topic: 0.023*"power" + 0.022*"court" + 0.019*"tell" + 0.018*"driver" + 0.016*"trial"
Score: 0.01666666753590107	 Topic: 0.042*"govt" + 0.028*"plan" + 0.027*"council" + 0.022*"water