# In-Class Exercise: Exploring Gutenberg Using Python
This exercise includes both collaborative and independent components. You will work primarily in your own Jupyter notebook, but will collaborate on investigating a question of your own choosing.


 First, you will need to install some dependencies:

 
 - Install BSD-DB according to the instructions here:
 https://github.com/c-w/Gutenberg

 - Next, we'll use pip to install a library for downloading texts from Project Gutenberg. After selecting the appropriate shell for Anaconda, type the following into the terminal:
 
 ```bash
 pip install gutenberg
 ```
 
 - Finally, install TextBlob and necessary corpora:
 ```bash
 pip install -U textblob
 python -m textblob.download_corpora
 ```
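
Before moving on, it can help to confirm that both packages are visible from the notebook's environment. The following is a minimal sketch using only the standard library. The helper name `check_installed` is hypothetical, and it only checks that the top-level modules can be found, not that the TextBlob corpora downloaded successfully.

```python
from importlib.util import find_spec


def check_installed(modules=("gutenberg", "textblob")):
    """Return a dict mapping each module name to True if Python can locate it."""
    return {name: find_spec(name) is not None for name in modules}


# Report which of the packages installed above can be found.
for name, ok in check_installed().items():
    print(f"{name}: {'found' if ok else 'missing'}")
```

If a package shows as missing, the notebook kernel is probably running in a different environment from the shell where `pip` was run.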

In [4]:
# Let's begin by downloading a text from Project Gutenberg. Etext 60482 is
# saved below as 'Steve Browns Bunyip and other Stories.txt'.
from gutenberg.acquire import load_etext
from gutenberg.cleanup import strip_headers
from textblob import TextBlob

text = strip_headers(load_etext(60482)).strip()
blob = TextBlob(text)
# print(text)  # uncomment to inspect the downloaded text
# Save the text to a local .txt file in this directory.
with open('Steve Browns Bunyip and other Stories.txt', 'w', encoding="utf-16", newline='\n') as source:
    source.write(text)

In [5]:
type(text)


str
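
Because the file above is written as UTF-16, it has to be read back with the same encoding. Here is a minimal sketch of that round trip. The helper name `round_trip`, the stand-in string, and the temporary directory are assumptions for illustration, so that the example runs without downloading anything.

```python
import tempfile
from pathlib import Path


def round_trip(text, encoding="utf-16"):
    """Write text to a temporary file and read it back with the same encoding."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sample.txt"
        # newline='\n' mirrors the open() call used earlier in the notebook.
        with open(path, "w", encoding=encoding, newline="\n") as f:
            f.write(text)
        with open(path, "r", encoding=encoding, newline="\n") as f:
            return f.read()


sample = "A stand-in for the downloaded text.\nSecond line."
print(round_trip(sample) == sample)
```

Opening a UTF-16 file with a different encoding, such as UTF-8, will typically raise a `UnicodeDecodeError` or produce unreadable text.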

In [6]:
blob.noun_phrases   # returns a WordList of the noun phrases found in the text

uid pitch', '’ ard', 'groping', 'tom', '’ s pipe', 'swallow mouthfuls', 'boat ’ s stern', 'unnatural fashion', 'commending', 'god', 'boat ’ s bottom', 'shoot downward', 'whilst thick spray', 'clamour adown', 'narrow chasm', 'headlong course', 'splinters fly', 'boat ’ s timbers', 'whilst masses', 'dank weeds', '‘ tell-tale ’ compass', 'audience— ‘', 'subterranean cataract', 'dull gurgle', 'cleft water', 'faint ray', 'black hole', 'deceptive echo', 'hope', 'underground river', 'utter despair', 'tumultuous waters', 'courage flown', 'was', 'earth ’ s bowels', 'vague remembrance', 'comic songs', 'shouting defiance', 'whether', 'strange journey', 'sober senses', 'her. ’', 'captain ’ s story', 'chief officer', 'superior ’ s eye', 'o ’ clock', '’ m', 'evening. ’', 'general verdict', 'miss hillier', 'horrid place', 'sequel. ’ ‘', 'quizzical smile', 'heavy weather', 'mr santley', '’ s eye', 'all. ’', 'ropes thrown', 'evening.= close-reefed', 'high sea', '_corona ’ s_ passengers', 'captain ’ s st

In [7]:
for sentence in blob.sentences:
    print(sentence.sentiment.polarity)


.5
0.0
0.0
0.03571428571428573
0.0
0.95
0.05
0.0
0.1
0.0
0.0
0.15863095238095237
0.25
0.25
0.0
0.125
0.0
-0.075
-0.6
-0.26
0.0
0.0861111111111111
0.0
-0.1625
-0.2
0.13636363636363635
0.13636363636363635
0.0
0.0
0.03125
0.0
0.5
0.0
-0.25
-0.0625
0.0
0.0
-0.12000000000000001
0.10485714285714287
-0.2
0.0
-0.0625
0.25
-0.125
-0.08
0.02500000000000001
0.0
0.06507936507936506
0.0
0.0
0.625
0.0
0.0
-0.06666666666666665
0.0
-0.4
0.0
0.21428571428571427
0.0
0.25
-0.05
0.0
0.0
1.0
0.09999999999999999
-0.4
0.0
-0.15000000000000002
0.0
0.25
-0.09999999999999996
-0.1
-0.04791666666666668
0.0
0.15
-0.3
-0.15
-0.25
-0.1
0.0
-0.125
0.14479166666666665
-0.12121212121212116
0.0
0.2113095238095238
0.0
0.056825396825396814
0.04305555555555555
0.025000000000000005
0.3111111111111111
-0.1
0.325
0.325
-0.05
0.0
0.0
0.25
-0.625
0.0
-0.05714285714285714
0.0
0.7
0.0
0.075
-0.3125
0.0
-0.125
0.005208333333333315
-0.05
0.1433862433862434
0.125
0.32
0.11649305555555556
0.04888888888888887
-0.05
0.25
-0.0625
-0.075
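
A long column of scores is hard to read at a glance, so one option is to summarize it. The sketch below uses only the standard library. The helper name `summarize` is hypothetical, and the list holds just a handful of values copied from the output above rather than the full set.

```python
from statistics import mean

# A few polarity values copied from the output above. In the notebook, the full
# list could be built with [s.sentiment.polarity for s in blob.sentences].
polarities = [0.0, 0.95, 0.05, -0.6, -0.26, 1.0, -0.625, 0.7]


def summarize(scores):
    """Return the mean score and the counts of positive, negative, and neutral scores."""
    return {
        "mean": mean(scores),
        "positive": sum(1 for s in scores if s > 0),
        "negative": sum(1 for s in scores if s < 0),
        "neutral": sum(1 for s in scores if s == 0),
    }


print(summarize(polarities))
```

Polarity ranges from -1.0 (most negative) to 1.0 (most positive), so a mean near zero suggests either neutral prose or positive and negative sentences that cancel out.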

In [23]:
from operator import itemgetter

# Print every word with its count, most frequent first.
d = blob.word_counts
for key, value in sorted(d.items(), key=itemgetter(1), reverse=True):
    print(key, value)


ercise 1
perverted 1
exists 1
precincts 1
reference 1
remote 1
herein 1
sere 1
leaf 1
allusion 1
episode 1
earlier 1
horse-boy 1
boundary-rider 1
aggrieved 1
monthly 1
dull—unutterably 1
dull—and 1
instalment 1
duller 1
heavier 1
mule 1
grindstone 1
diseases 1
animals 1
fraud 1
huts 1
cottages 1
194 1
men—dead 1
portraits 1
notices 1
parkes 1
humpalong 1
mayor 1
paragraphs 1
woodcuts 1
nelson 1
pompey 1
scipio 1
africanus 1
such-like 1
characters 1
delivery 1
confronted 1
signatures 1
specification 1
handwriting 1
palpable 1
forgery 1
copies 1
verandahs 1
stockyard 1
bail 1
pay—costs 1
forsworn 1
literature 1
shabby 1
indelible 1
pencils 1
agreements 1
signature 1
goatees 1
manager—a 1
ousted 1
cæsar— 1
damages 1
stockwhips 1
justifiable 1
homicide 1
betther 1
whups 1
doggin 1
thrash 1
book-fiends 1
collecting 1
station-yard 1
entitled 1
forty-seven 1
engravings 1
howl 1
execration 1
chancing 1
distanced 1
summoned 1
municipality 1
strewed 1
fellow-fiends 1
cute-looking 1
modestly 1
go
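
Printing every word produces a very long list, most of it words that occur only once. One alternative is to show only the most frequent words with `collections.Counter.most_common`. The sketch below uses a stand-in sentence and a hypothetical helper, `top_words`. In the notebook, `blob.word_counts` is a dictionary, so `Counter(blob.word_counts)` should be usable in place of `counts`.

```python
import re
from collections import Counter

# Stand-in text for illustration; not taken from the downloaded book.
sample = "the boat and the river and the weeds"
counts = Counter(re.findall(r"[a-z']+", sample.lower()))


def top_words(counts, n=2):
    """Return the n most frequent (word, count) pairs, most frequent first."""
    return counts.most_common(n)


print(top_words(counts))  # [('the', 3), ('and', 2)]
```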

In [10]:
# Find the longest sentence in the work
longest_len = 0  # avoid naming this `max`, which would shadow the built-in
index = 0
for key, sentence in enumerate(blob.sentences):
    if len(sentence.words) > longest_len:
        longest_len = len(sentence.words)
        index = key


In [11]:
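# Find the longest word in the work
longest_len = 0  # avoid naming this `max`, which would shadow the built-in
index = 0
for key, word in enumerate(blob.words):
    if len(word) > longest_len:
        longest_len = len(word)
        index = key
print(longest_len)
print(blob.words[index])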


52
specimen-sheet-monthly-delivery-collection-per-agent
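
The same search can be written more compactly with Python's built-in `max` and a `key` function. This is a sketch with stand-in data, not the book's text. Like the loops above, `max` returns the first item when several share the longest length.

```python
# Stand-in data for illustration. In the notebook, blob.words and
# [s.words for s in blob.sentences] would take the place of these lists.
words = ["underground", "river", "boundary-rider", "mule"]
sentences = [["A", "short", "one"], ["This", "one", "is", "a", "bit", "longer"]]

longest_word = max(words, key=len)          # compares items by their length
longest_sentence = max(sentences, key=len)  # length here is the word count

print(longest_word, len(longest_word))
print(" ".join(longest_sentence))
```

If an earlier cell has assigned a number to the name `max`, the built-in function is hidden and this call will fail. Running `del max` in the notebook restores it.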


# Parts of Speech

Another method Montfort described is to use part-of-speech tags to count particular parts of speech. The example below uses a single sentence, but the same approach could be applied to a full manuscript.

In [18]:

pride = TextBlob('''It is a truth universally acknowledged, 
that a single man in possession of a good fortune, must be in 
want of a wife.''')


In [19]:
def adjs(text_blob):
    """Count the words tagged 'JJ' (adjective) in a TextBlob."""
    count = 0
    for (word, tag) in text_blob.tags:
        if tag == 'JJ':
            count = count + 1
    return count


In [20]:
adjs(pride)

2
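
The function above counts only the tag `JJ`. In the Penn Treebank tag set, comparative and superlative adjectives carry the tags `JJR` and `JJS`, so a prefix match catches all three. The sketch below tallies tags with `collections.Counter`. The `(word, tag)` pairs are hand-written for illustration and the helper name `count_tags` is hypothetical. Real tags come from TextBlob's `.tags` property and may differ.

```python
from collections import Counter

# Hand-written (word, tag) pairs for illustration only, not TextBlob output.
tagged = [("a", "DT"), ("single", "JJ"), ("man", "NN"), ("good", "JJ"),
          ("fortune", "NN"), ("better", "JJR"), ("best", "JJS")]


def count_tags(pairs, prefix=""):
    """Count tags, keeping only those that start with prefix (all tags by default)."""
    return Counter(tag for _, tag in pairs if tag.startswith(prefix))


print(count_tags(tagged))                      # tally of every tag
print(sum(count_tags(tagged, "JJ").values()))  # JJ, JJR, and JJS together
```

In the notebook, passing `pride.tags` as `pairs` should give the same kind of tally for the sentence above.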

# Creating Figures
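There are many ways to create figures. Below is one example, a table built with Plotly, which you can save to a file.

Plotly itself must be installed (for example, with `pip install plotly`); otherwise the import below fails with a `ModuleNotFoundError`. To save a static image, you will also need to install orca using conda:
```bash
conda install -c plotly plotly-orca
```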

In [21]:
import plotly.graph_objects as go

fig = go.Figure(data=[go.Table(header=dict(values=['A Scores', 'B Scores']),
                 cells=dict(values=[[100, 90, 80, 90], [95, 85, 75, 95]]))
                     ])
fig.show()
fig.write_image("in-class/week-6/fig1.png")

ModuleNotFoundError: No module named 'plotly'
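
Once Plotly is installed, saving the image may still fail if the target folder does not exist. A minimal sketch of one way to guard against that, using only the standard library, follows. The helper name `ensure_parent_dir` is hypothetical, and the demonstration uses a temporary directory so that it does not touch your working folder.

```python
import tempfile
from pathlib import Path


def ensure_parent_dir(file_path):
    """Create the parent directory of file_path if needed and return it as a Path."""
    path = Path(file_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


# Demonstration inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    target = ensure_parent_dir(Path(tmp) / "in-class" / "week-6" / "fig1.png")
    print(target.parent.is_dir())
```

In the notebook, this could be combined with the cell above as `fig.write_image(str(ensure_parent_dir("in-class/week-6/fig1.png")))`.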

We will work with other types of figures, graphs, and tables in Lab 2.

To turn in the assignment, follow the instructions in class_notes.ipynb.