# In-Class Exercise: Exploring Gutenberg Using Python
This exercise includes both collaborative and independent components. You will work primarily in your own Jupyter notebook, but you will collaborate on investigating a question of your own choosing.


 First, you will need to install some dependencies:

 
 - Install BSD-DB according to the instructions here:
 https://github.com/c-w/Gutenberg

 - Next, install a library for downloading texts from Project Gutenberg using pip. After selecting the appropriate shell for Anaconda, type the following into the terminal:
 
 ```bash
 pip install gutenberg
 ```
 
 - Finally, install TextBlob and necessary corpora:
 ```bash
 pip install -U textblob
 python -m textblob.download_corpora
 ```

In [15]:
# Let's begin by downloading a text from Project Gutenberg by its etext number (7947 here).
from gutenberg.acquire import load_etext
from gutenberg.cleanup import strip_headers
from textblob import TextBlob

# Download the text, strip the Project Gutenberg header and footer, and trim whitespace.
text = strip_headers(load_etext(7947)).strip()
blob = TextBlob(text)
# print(text)  # uncomment to print the full text
# Save the text to a local .txt file in this directory.
# The with-block closes the file automatically, even if the write fails.
with open('uboat.txt', 'w', encoding="utf-16", newline='\n') as source:
    source.write(text)
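
Because the file is written as UTF-16, it has to be read back with the same encoding. The following is a minimal sketch of that round trip using only the standard library. The helper names `save_text` and `read_text` and the sample string are hypothetical stand-ins, not part of the exercise.

```python
import tempfile
from pathlib import Path


def save_text(path, text, encoding="utf-16"):
    """Write text to path; the with-block closes the file even on error."""
    with open(path, "w", encoding=encoding, newline="\n") as handle:
        handle.write(text)


def read_text(path, encoding="utf-16"):
    """Read the file back using the same encoding it was written with."""
    with open(path, "r", encoding=encoding, newline="") as handle:
        return handle.read()


# Hypothetical sample standing in for the downloaded book text.
sample = "First line.\nSecond line with an accent: caf\u00e9.\n"

# Write to a temporary folder so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "sample.txt"
    save_text(target, sample)
    restored = read_text(target)

print(restored == sample)
```

In the notebook, the same pattern would apply to `uboat.txt`: open it with `encoding="utf-16"` when reading it later.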



In [16]:
type(text)


str

In [17]:
blob.noun_phrases   # returns a WordList of the noun phrases found in the text



In [18]:
for sentence in blob.sentences:
    print(sentence.sentiment.polarity)


2
-0.1125
0.13592592592592592
0.1
0.45
-0.6
0.125
0.0
0.35
0.0
0.0
0.0
0.16666666666666666
-0.15
0.0
0.0
0.0
0.0
0.6266666666666667
-0.25
0.2
-0.07999999999999999
0.32916666666666666
0.2
-0.049999999999999996
0.0
-0.25
0.0
0.0
0.0
0.0
0.0
1.0
0.08333333333333333
0.08333333333333333
-0.14583333333333334
0.0
0.1
0.1
0.0
0.09999999999999998
0.625
-0.08333333333333333
0.10714285714285714
0.056249999999999994
0.0
0.10166666666666666
0.16666666666666666
0.0
-0.0625
0.0
0.0
-0.1
0.48214285714285715
0.35
0.25
-0.046875
0.09458333333333334
-0.1125
0.0
0.0
0.0666666666666667
0.07083333333333333
-0.2
-0.2916666666666667
0.2035714285714286
0.0
-0.125
-0.001111111111111109
-0.25
0.11666666666666665
0.25
-0.20833333333333331
-0.033333333333333326
0.3666666666666667
1.0
0.2285714285714286
-0.2916666666666667
-0.07777777777777779
-0.08333333333333331
0.0
0.0
0.0
0.0
0.0
-0.4
0.0
0.0
-0.2
-0.07638888888888888
0.0
0.0
-0.20833333333333334
0.2857142857142857
0.25
-0.16666666666666666
-0.17500000000000002
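
A long column of scores is hard to read on its own, so it can help to summarize it. The sketch below assumes the polarity values have been collected into a plain list (in the notebook, something like `[s.sentiment.polarity for s in blob.sentences]`). The function name `summarize_polarities` is hypothetical, and the sample data is just the first few scores from the output above.

```python
from statistics import mean


def summarize_polarities(polarities):
    """Return the mean polarity and the indices of the extreme sentences."""
    if not polarities:
        raise ValueError("no polarity scores to summarize")
    positions = range(len(polarities))
    return {
        "mean": mean(polarities),
        # Index of the sentence with the lowest (most negative) score.
        "most_negative_index": min(positions, key=polarities.__getitem__),
        # Index of the sentence with the highest (most positive) score.
        "most_positive_index": max(positions, key=polarities.__getitem__),
    }


# Sample data: the first few scores from the output above.
sample_scores = [-0.1125, 0.13592592592592592, 0.1, 0.45, -0.6, 0.125, 0.0, 0.35]
summary = summarize_polarities(sample_scores)
print(summary)
```

The returned indices could then be used to look up the matching sentences, for example `blob.sentences[summary["most_negative_index"]]`.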

In [41]:
from operator import itemgetter

# Print every word with its count, sorted from least to most frequent.
d = blob.word_counts
for key, value in sorted(d.items(), key=itemgetter(1)):
    print(key, value)


rds 2
nordreich 2
thence 2
orkneys 2
chlorine 2
disposal 2
crash-dived 2
explosions 2
burning 2
implacable 2
powers 2
recording 2
ending 2
vessel 2
centimetres 2
outlines 2
random 2
signalman 2
trunnion 2
pedestal 2
juncture 2
preserved 2
reader 2
verge 2
insomnia 2
stormy 2
vowed 2
friends 2
lovers 2
reasonable 2
repaired 2
lifetime 2
possibly 2
englishmen 2
selfish 2
church 2
bullets 2
peril 2
agent 2
yourself 2
unworthy 2
priest 2
nursed 2
alexandrovitch 2
dreamed 2
season 2
warsaw 2
dreaming 2
brilliantly 2
christian 2
autumn 2
proclamations 2
promises 2
believed 2
westward 2
austrian 2
thunderstorm 2
wolf 2
staggering 2
dragging 2
flowed 2
minds 2
cellars 2
paralysed 2
terror 2
mob 2
pistol 2
dairy 2
yard 2
hay-loft 2
looting 2
shout 2
rifles 2
loft 2
courtyard 2
stayed 2
gate 2
helplessly 2
oblation 2
strengthened 2
telling 2
roumania 2
offensive 2
roused 2
grind 2
countries 2
shadow 2
agents 2
realization 2
vision 2
fatal 2
clung 2
prayers 2
tortures 2
forgive 2
paradise 2
lande
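
Printing every word produces a very long list. One possible alternative is to show only the most frequent words. The sketch below uses `collections.Counter` from the standard library. The `top_words` helper and the sample counts are hypothetical, and it assumes the word counts behave like an ordinary dictionary that maps each word to an integer.

```python
from collections import Counter


def top_words(word_counts, n=5, min_length=1):
    """Return the n most frequent words that have at least min_length characters."""
    filtered = {w: c for w, c in word_counts.items() if len(w) >= min_length}
    return Counter(filtered).most_common(n)


# Hypothetical counts standing in for blob.word_counts.
sample_counts = {"the": 120, "submarine": 14, "of": 80, "vessel": 2, "torpedo": 9}

print(top_words(sample_counts, n=3))
# Skipping short words is a crude way to filter out common function words.
print(top_words(sample_counts, n=2, min_length=4))
```

The `min_length` filter is only a rough substitute for a proper stop-word list, but it is often enough for a first look at a text.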

In [32]:
# Find the longest sentence in the work
max_len = 0  # avoid naming this "max", which would shadow Python's built-in max()
index = 0
for key, sentence in enumerate(blob.sentences):
    if len(sentence.words) > max_len:
        max_len = len(sentence.words)
        index = key


6


In [29]:
# Find the longest word in the work
max_len = 0  # avoid naming this "max", which would shadow Python's built-in max()
index = 0
for key, word in enumerate(blob.words):
    if len(word) > max_len:
        max_len = len(word)
        index = key
print(max_len)
print(blob.words[index])


21
lieutenant-commanders
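
The two loops above follow the same pattern, so they could be folded into one helper built on Python's `max()`. This is a minimal sketch with plain lists of strings standing in for `blob.words` and `blob.sentences`. The name `longest` and the sample data are hypothetical.

```python
def longest(items, size=len):
    """Return (index, item) for the first item with the greatest size."""
    if not items:
        raise ValueError("items is empty")
    # max() keeps the first maximal element, matching the ">" test in the loops.
    index = max(range(len(items)), key=lambda i: size(items[i]))
    return index, items[index]


# Longest word, measured in characters.
sample_words = ["crash-dived", "vessel", "lieutenant-commanders", "signalman"]
w_index, word = longest(sample_words)
print(w_index, word, len(word))

# Longest sentence, measured in words rather than characters.
sample_sentences = [
    "The gate stayed shut.",
    "A shout.",
    "Rifles cracked in the yard below us.",
]
s_index, sentence = longest(sample_sentences, size=lambda s: len(s.split()))
print(s_index, sentence)
```

With TextBlob objects, the sentence case would presumably pass `size=lambda s: len(s.words)` instead of splitting on whitespace.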


# Parts of Speech

Another method Montfort described is to use part-of-speech tags to count certain parts of speech. The example below uses a single sentence, but the same approach could be applied to a full manuscript.

In [33]:

pride = TextBlob('''It is a truth universally acknowledged, 
that a single man in possession of a good fortune, must be in 
want of a wife.''')


In [34]:
def adjs(blob):
    """Count the words tagged 'JJ' (adjective) in a TextBlob."""
    count = 0
    for word, tag in blob.tags:
        if tag == 'JJ':
            count += 1
    return count


In [40]:
adjs(pride)

2
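
The check `tag == 'JJ'` matches only base-form adjectives. In the Penn Treebank tag set, comparatives and superlatives carry the tags `JJR` and `JJS`, so a prefix match may be closer to what you want. The sketch below works on a plain list of `(word, tag)` pairs. The tags are hand-written for illustration, not real tagger output, and the function name `count_tags` is hypothetical.

```python
from collections import Counter


def count_tags(tagged_words, prefix):
    """Count (word, tag) pairs whose tag starts with the given prefix."""
    return sum(1 for _, tag in tagged_words if tag.startswith(prefix))


# Hand-written tags for an invented sentence; a real tagger may differ.
sample_tags = [
    ("the", "DT"), ("older", "JJR"), ("ship", "NN"), ("was", "VBD"),
    ("slow", "JJ"), ("but", "CC"), ("the", "DT"), ("newest", "JJS"),
    ("vessel", "NN"), ("sank", "VBD"),
]

exact_jj = sum(1 for _, tag in sample_tags if tag == "JJ")
all_adjectives = count_tags(sample_tags, "JJ")
print(exact_jj, all_adjectives)

# A Counter gives the totals for every tag at once.
tag_totals = Counter(tag for _, tag in sample_tags)
print(tag_totals.most_common(3))
```

The same functions should work on `pride.tags`, since it yields `(word, tag)` pairs in the loop above.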

# Creating Figures
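There are many ways to create figures. Below is one example of a table. You can save the figure to a file.

The code below requires the plotly package itself. If the import fails with `ModuleNotFoundError`, as in the output shown after the cell, install plotly first (for example, with `pip install plotly`).

To save a static image, you will also need to install orca using conda: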
```bash
conda install -c plotly plotly-orca
```

In [38]:
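import plotly.graph_objects as go

# Build a two-column table with a header row and four rows of scores.
fig = go.Figure(data=[go.Table(
    header=dict(values=['A Scores', 'B Scores']),
    cells=dict(values=[[100, 90, 80, 90], [95, 85, 75, 95]]),
)])
fig.show()
# Save a static image of the figure to the class folder.
fig.write_image("class/fig1.png")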

ModuleNotFoundError: No module named 'plotly'
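
Separately from the missing module, saving to `class/fig1.png` may fail if the `class` folder does not exist yet. The following is a minimal sketch, using only the standard library, of creating the folder before saving. The helper name `ensure_parent_dir` is hypothetical, and the sketch runs in a temporary directory so it leaves nothing behind.

```python
import os
import tempfile


def ensure_parent_dir(path):
    """Create the folder that will contain path, if it does not already exist."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as base:
    target = ensure_parent_dir(os.path.join(base, "class", "fig1.png"))
    # The folder now exists, even though no image has been written yet.
    folder_exists = os.path.isdir(os.path.dirname(target))

print(folder_exists)
```

In the notebook, this could be used as `fig.write_image(ensure_parent_dir("class/fig1.png"))`.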

We will work with other types of figures, graphs, and tables in Lab 2.

To turn in the assignment, follow the instructions in class_notes.ipynb.