# Linguistic Analysis

Let's look at the different levels of linguistic analysis. First, we need some text. Let's take *Moby Dick* from Project Gutenberg, as a list of strings:

In [1]:
# import wget
# url = 'https://raw.githubusercontent.com/dirkhovy/NLPclass/master/data/moby_dick.txt'
# wget.download(url, 'moby_dick.txt')

'moby_dick (1).txt'

In [2]:
with open('moby_dick.txt', encoding='utf8') as f:
    documents = [line.strip() for line in f]
print(documents[1])

Call me Ishmael .


In [3]:
max([len(line) for line in documents])

2844

We will use the `spacy` library for a lot of the analyses. Here, we load it:

In [4]:
import spacy

nlp = spacy.load("en_core_web_sm")

### Usage:

We can now call `nlp()` as a function on any text. By default, it will perform a number of analyses:
- tokenization
- sentence splitting
- lemmatization
- part of speech tagging
- dependency parsing
- named entity recognition

To speed up processing, we can disable some of these analyses if we do not need them:
```
nlp = spacy.load('en_core_web_sm', disable=['tagger', 'parser', 'ner'])
```


The result is a `Doc` object: iterating over it yields the tokens, and `doc.sents` yields the sentences (which in turn contain tokens). Each token has a range of properties (see [here](https://spacy.io/api/token#attributes)). We will use a few of them in the following:

- `text`: the actual word
- `lemma_`: the dictionary entry of a word
- `pos_`: the part of speech
- `dep_`: dependency relation
- `is_punct`: check whether word is punctuation
- `is_stop`: check whether word is a stop word

## Tokenization
Before we do anything else, we need to split the text into individual tokens (words and punctuation marks).
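
To make the idea concrete, here is a minimal sketch of a regex-based tokenizer. It is not how spaCy tokenizes (spaCy uses language-specific rules and exceptions), and `simple_tokenize` is a hypothetical helper name; it only illustrates why splitting on whitespace alone is not enough.

```python
import re

def simple_tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+ matches a run of letters/digits; [^\w\s] matches one punctuation mark
    return re.findall(r"\w+|[^\w\s]", text)

# Whitespace splitting leaves the period attached to the last word
print("Call me Ishmael.".split())
# The regex version separates it into its own token
print(simple_tokenize("Call me Ishmael."))
```

Note that this toy version would split `--` into two tokens, which is one reason real tokenizers need more careful rules.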

In [5]:
tokens = [[token.text for token in nlp(sentence)] for sentence in documents[:100]]
tokens


[['Loomings', '.'],
 ['Call', 'me', 'Ishmael', '.'],
 ['Some',
  'years',
  'ago',
  '--',
  'never',
  'mind',
  'how',
  'long',
  'precisely',
  '--',
  'having',
  'little',
  'or',
  'no',
  'money',
  'in',
  'my',
  'purse',
  ',',
  'and',
  'nothing',
  'particular',
  'to',
  'interest',
  'me',
  'on',
  'shore',
  ',',
  'I',
  'thought',
  'I',
  'would',
  'sail',
  'about',
  'a',
  'little',
  'and',
  'see',
  'the',
  'watery',
  'part',
  'of',
  'the',
  'world',
  '.'],
 ['It',
  'is',
  'a',
  'way',
  'I',
  'have',
  'of',
  'driving',
  'off',
  'the',
  'spleen',
  'and',
  'regulating',
  'the',
  'circulation',
  '.'],
 ['Whenever',
  'I',
  'find',
  'myself',
  'growing',
  'grim',
  'about',
  'the',
  'mouth',
  ';',
  'whenever',
  'it',
  'is',
  'a',
  'damp',
  ',',
  'drizzly',
  'November',
  'in',
  'my',
  'soul',
  ';',
  'whenever',
  'I',
  'find',
  'myself',
  'involuntarily',
  'pausing',
  'before',
  'coffin',
  'warehouses',
  ',',
  'an

### Exercise

What are the longest and shortest sentences?

In [13]:
# your code here
import numpy as np

lengths = [len(sentence) for sentence in tokens]  # number of tokens per sentence
tokens[np.argmax(lengths)], tokens[np.argmin(lengths)]

(['Though',
  'I',
  'can',
  'not',
  'tell',
  'why',
  'it',
  'was',
  'exactly',
  'that',
  'those',
  'stage',
  'managers',
  ',',
  'the',
  'Fates',
  ',',
  'put',
  'me',
  'down',
  'for',
  'this',
  'shabby',
  'part',
  'of',
  'a',
  'whaling',
  'voyage',
  ',',
  'when',
  'others',
  'were',
  'set',
  'down',
  'for',
  'magnificent',
  'parts',
  'in',
  'high',
  'tragedies',
  ',',
  'and',
  'short',
  'and',
  'easy',
  'parts',
  'in',
  'genteel',
  'comedies',
  ',',
  'and',
  'jolly',
  'parts',
  'in',
  'farces',
  '--',
  'though',
  'I',
  'can',
  'not',
  'tell',
  'why',
  'this',
  'was',
  'exactly',
  ';',
  'yet',
  ',',
  'now',
  'that',
  'I',
  'recall',
  'all',
  'the',
  'circumstances',
  ',',
  'I',
  'think',
  'I',
  'can',
  'see',
  'a',
  'little',
  'into',
  'the',
  'springs',
  'and',
  'motives',
  'which',
  'being',
  'cunningly',
  'presented',
  'to',
  'me',
  'under',
  'various',
  'disguises',
  ',',
  'induced',
  'm
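
The same question can probably be answered without `numpy`, using the built-in `max` and `min` with `key=len`. The sketch below uses a small stand-in list (`toy_tokens`, a hypothetical name) rather than the full `tokens` list, so it runs on its own.

```python
# Toy stand-in for the `tokens` list built above: one list of tokens per sentence
toy_tokens = [
    ["Loomings", "."],
    ["Call", "me", "Ishmael", "."],
    ["There", "is", "nothing", "surprising", "in", "this", "."],
]

# key=len compares sentences by their number of tokens
longest = max(toy_tokens, key=len)
shortest = min(toy_tokens, key=len)

print(longest)
print(shortest)
```

With ties, `max` and `min` return the first matching sentence, which mirrors the behaviour of `np.argmax` and `np.argmin`.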

### Exercise

Collect counts over the tokens. What is the most frequent token?

In [56]:
from collections import Counter
# your code here
c = Counter()
for sentence in tokens:
    c.update(sentence)  # add the counts of every token in this sentence

In [57]:
c.most_common(10)

[(',', 157),
 ('the', 111),
 ('.', 78),
 ('of', 75),
 ('a', 67),
 ('and', 61),
 ('to', 49),
 ('in', 43),
 ('I', 41),
 ('is', 32)]
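
An alternative sketch, assuming the same nested list-of-lists layout: flatten the sentences with `itertools.chain.from_iterable` and hand the result to `Counter` in one step. `toy_tokens` is illustrative data, not the notebook's `tokens`.

```python
from collections import Counter
from itertools import chain

# Two toy sentences, already tokenized
toy_tokens = [["Call", "me", "Ishmael", "."], ["It", "is", "a", "way", "."]]

# chain.from_iterable yields every token of every sentence in order
counts = Counter(chain.from_iterable(toy_tokens))
print(counts.most_common(3))
```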

## Lemmatization
We want to get the dictionary form of each word, to reduce variation.

In [65]:
print(documents[7])

There is nothing surprising in this .


What do you expect its lemmatized version to look like?

In [66]:
# your code here
# word.lemma_ to lemmatize a word

[token.lemma_ for token in nlp(documents[7])]

['there', 'be', 'nothing', 'surprising', 'in', 'this', '.']
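
To see why lemmatization reduces variation, here is a toy sketch based on a lookup table. The table and the `toy_lemmatize` helper are hypothetical; spaCy's lemmatizer relies on far larger resources and on the part of speech of each word.

```python
# Hypothetical lookup table mapping inflected forms to their dictionary form
LEMMA_TABLE = {"is": "be", "was": "be", "were": "be", "whales": "whale"}

def toy_lemmatize(word):
    """Return the dictionary form if we know it, otherwise the word itself."""
    return LEMMA_TABLE.get(word, word)

words = ["is", "was", "were", "whale", "whales"]
toy_lemmas = [toy_lemmatize(w) for w in words]

# Five distinct surface forms collapse into two lemmas
print(len(set(words)), "forms ->", len(set(toy_lemmas)), "lemmas")
```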

Now we run it for all the sentences:

In [70]:
lemmas = [token.lemma_ for sentence in documents[:100] for token in nlp(sentence)]
lemmas[:5]  # show the first five lemmas

['looming', '.', 'call', 'I', 'Ishmael']

### Exercise
In older spaCy versions, the lemmas of all pronouns are collapsed into `-PRON-`; in the output above, `me` is instead lemmatized to `I`. Change the code to preserve the original word (as lower case).

In [71]:
# your code here
[token.lemma_ if token.lemma_ != "I" else token.lower_ for sentence in documents[:100] for token in nlp(sentence)]

['looming',
 '.',
 'call',
 'me',
 'Ishmael',
 '.',
 'some',
 'year',
 'ago',
 '--',
 'never',
 'mind',
 'how',
 'long',
 'precisely',
 '--',
 'have',
 'little',
 'or',
 'no',
 'money',
 'in',
 'my',
 'purse',
 ',',
 'and',
 'nothing',
 'particular',
 'to',
 'interest',
 'me',
 'on',
 'shore',
 ',',
 'i',
 'think',
 'i',
 'would',
 'sail',
 'about',
 'a',
 'little',
 'and',
 'see',
 'the',
 'watery',
 'part',
 'of',
 'the',
 'world',
 '.',
 'it',
 'be',
 'a',
 'way',
 'i',
 'have',
 'of',
 'drive',
 'off',
 'the',
 'spleen',
 'and',
 'regulate',
 'the',
 'circulation',
 '.',
 'whenever',
 'i',
 'find',
 'myself',
 'grow',
 'grim',
 'about',
 'the',
 'mouth',
 ';',
 'whenever',
 'it',
 'be',
 'a',
 'damp',
 ',',
 'drizzly',
 'November',
 'in',
 'my',
 'soul',
 ';',
 'whenever',
 'i',
 'find',
 'myself',
 'involuntarily',
 'pause',
 'before',
 'coffin',
 'warehouse',
 ',',
 'and',
 'bring',
 'up',
 'the',
 'rear',
 'of',
 'every',
 'funeral',
 'i',
 'meet',
 ';',
 'and',
 'especially',
 

## Stemming
A more aggressive way of removing variation is *stemming*. Let's look at our example again.

In [72]:
print(documents[7])

There is nothing surprising in this .


What do you expect the stemmed version to look like?

In [88]:
from nltk import SnowballStemmer

stemmer = SnowballStemmer('english')
# your code here
print([stemmer.stem(word.text) for word in nlp("surprise surprising surprised")]) # to stem a sentence
[word.lemma_ for word in nlp("surprise surprising surprised")] # Lemma is an entry in the vocabulary

['surpris', 'surpris', 'surpris']


['surprise', 'surprising', 'surprised']
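
The difference becomes clearer with a toy suffix stripper. This is only a sketch: the suffix list and the `toy_stem` helper are assumptions for illustration, and the real Snowball algorithm applies many more rules and conditions.

```python
# Assumption: a tiny, illustrative suffix list (checked in this order)
SUFFIXES = ("ing", "ed", "es", "s", "e")

def toy_stem(word):
    """Chop off the first matching suffix, without consulting any dictionary."""
    word = word.lower()
    for suffix in SUFFIXES:
        # keep at least three characters so very short words are left alone
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([toy_stem(w) for w in ["surprise", "surprising", "surprised"]])
```

Because the stemmer only cuts characters, the result (`surpris`) does not have to be a real word, whereas a lemma is always a dictionary entry.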

Now let's stem all our sentences:

In [89]:
stems = [[stemmer.stem(token) for token in sentence] for sentence in tokens]

### Exercise

Keep track of the most frequent word for each stem in `tokens`. Hint: use a nested `defaultdict`.

- How many word forms does the stem `hand` have?
- What is the most common word form for the stems `respect` and `whale`? What happened there?

In [126]:
from collections import defaultdict

words_per_stem = defaultdict(Counter)

for sentence in tokens:
    for word in sentence:
        stemmed_word = stemmer.stem(word)
        words_per_stem[stemmed_word].update([word.lower()])

words_per_stem["whale"]

Counter({'whaling': 4, 'whale': 2})
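
One possible way to answer the two questions from such a nested structure is sketched below. The `(stem, word)` pairs are hypothetical stand-ins for what the stemmer would produce, so the block runs without NLTK.

```python
from collections import Counter, defaultdict

# Hypothetical (stem, word) pairs standing in for stemmer output
pairs = [("whale", "whaling"), ("whale", "whaling"), ("whale", "whale"),
         ("hand", "hand"), ("hand", "hands"), ("hand", "handed")]

# Outer dict: stem -> Counter of the word forms that share this stem
forms_per_stem = defaultdict(Counter)
for stem, word in pairs:
    forms_per_stem[stem][word] += 1

# Number of distinct word forms for a stem
print(len(forms_per_stem["hand"]))
# Most frequent word form for a stem
print(forms_per_stem["whale"].most_common(1))
```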

## Parts of speech
We can extract the part of speech for every word with the `pos_` attribute.

List of POS tags:
https://universaldependencies.org/u/pos/


In [127]:
print(documents[7])

There is nothing surprising in this .


What are the POS tags of these words?

In [135]:
# your code here
# use token.pos_

for token in nlp(documents[7]):
    print(f"{token.text} : {token.pos_}")

There : PRON
is : VERB
nothing : PRON
surprising : ADJ
in : ADP
this : PRON
. : PUNCT


Let's apply this to all our documents:

In [137]:
pos = [[token.pos_ for token in nlp(sentence)] for sentence in documents[:100]]

### Exercise
Print out the words in the first 10 sentences, but remove all words that are not nouns, verbs, adjectives, adverbs, or proper nouns.

In [169]:
# your code here

for sentence in documents[:10]:
    doc = nlp(sentence)
    for word in doc:
        if word.pos_ in ["NOUN", "VERB", "ADJ", "ADV", "PROPN"]:
            print(word, word.pos_)

Loomings NOUN
Call VERB
Ishmael PROPN
years NOUN
ago ADV
never ADV
mind VERB
long ADV
precisely ADV
having VERB
little ADJ
money NOUN
purse NOUN
particular ADJ
interest VERB
shore NOUN
thought VERB
sail VERB
about ADV
little ADJ
see VERB
watery ADJ
part NOUN
world NOUN
way NOUN
have VERB
driving VERB
spleen NOUN
regulating VERB
circulation NOUN
find VERB
growing VERB
grim ADJ
mouth NOUN
damp NOUN
drizzly NOUN
November PROPN
soul NOUN
find VERB
involuntarily ADV
pausing VERB
coffin NOUN
warehouses NOUN
bringing VERB
rear NOUN
funeral NOUN
meet VERB
especially ADV
hypos NOUN
get VERB
upper ADJ
hand NOUN
requires VERB
strong ADJ
moral ADJ
principle NOUN
prevent VERB
deliberately ADV
stepping VERB
street NOUN
methodically ADV
knocking VERB
people NOUN
hats NOUN
then ADV
account VERB
high ADJ
time NOUN
get VERB
sea NOUN
as ADV
soon ADV
substitute NOUN
pistol NOUN
ball NOUN
philosophical ADJ
flourish NOUN
Cato PROPN
throws VERB
sword NOUN
quietly ADV
take VERB
ship NOUN
is VERB
surprising AD
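
The filtering step itself does not depend on spaCy. As a minimal sketch, the block below applies the same test to the `(word, tag)` pairs printed for `documents[7]` earlier, using a set for the allowed tags; `CONTENT_TAGS` is a name introduced here for illustration.

```python
# (word, tag) pairs copied from the tagger output for documents[7] above
tagged = [("There", "PRON"), ("is", "VERB"), ("nothing", "PRON"),
          ("surprising", "ADJ"), ("in", "ADP"), ("this", "PRON"), (".", "PUNCT")]

# A set gives constant-time membership tests
CONTENT_TAGS = {"NOUN", "VERB", "ADJ", "ADV", "PROPN"}

content_words = [word for word, tag in tagged if tag in CONTENT_TAGS]
print(content_words)
```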

## Named Entities
For each named entity (typically a noun phrase such as a person, place, or date), we can infer its semantic type.

In [140]:
print(documents[2])

Some years ago -- never mind how long precisely -- having little or no money in my purse , and nothing particular to interest me on shore , I thought I would sail about a little and see the watery part of the world .


What are the named entities (NEs) in this sentence?

In [156]:
# your code here
# note: this cell analyzes documents[9], not the documents[2] printed above
for ent in nlp(documents[9]).ents:
    print(ent.text, ent.label_)

Manhattoes ORG
Indian NORP


In [157]:
from spacy import displacy
displacy.render(nlp(documents[9]), style="ent", jupyter=True)

Let's apply this to all our documents:

In [173]:
entities = [[(entity.text, entity.label_) 
             for entity in nlp(sentence).ents]
            for sentence in documents[:50]]
entities
# nlp('John gave a book to Mary and Celia in Cardiff').ents

[[],
 [],
 [('Some years ago', 'DATE')],
 [],
 [('November', 'DATE')],
 [],
 [('Cato', 'ORG')],
 [],
 [],
 [('Manhattoes', 'ORG'), ('Indian', 'NORP')],
 [],
 [('a few hours', 'TIME')],
 [],
 [('afternoon', 'TIME')],
 [('Corlears Hook', 'PRODUCT')],
 [('thousands upon thousands', 'CARDINAL')],
 [('China', 'GPE')],
 [('week days', 'DATE')],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [('ten', 'CARDINAL')],
 [],
 [],
 [('American', 'NORP')],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [('June', 'DATE')],
 [('Niagara', 'PERSON'), ('your thousand miles', 'QUANTITY')],
 [('Tennessee', 'GPE'), ('two', 'CARDINAL'), ('Rockaway Beach', 'GPE')],
 [],
 [('first', 'ORDINAL'), ('first', 'ORDINAL')],
 [('Persians', 'NORP')]]
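
To get an overview of which entity types occur, one option is to count the labels across the nested list. The sketch below works on a few rows copied from the output above (`sample` is a stand-in for the full `entities` list).

```python
from collections import Counter

# A few rows copied from the `entities` output above (one list per sentence)
sample = [
    [],
    [("Some years ago", "DATE")],
    [("November", "DATE")],
    [("Manhattoes", "ORG"), ("Indian", "NORP")],
    [("Tennessee", "GPE"), ("two", "CARDINAL"), ("Rockaway Beach", "GPE")],
]

# Flatten the sentences and count only the label of each (text, label) pair
label_counts = Counter(label for sentence in sample for _, label in sentence)
print(label_counts)
```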

### Exercise
Who are the 5 most frequently named people in the first 500 sentences?

In [199]:
# your code here
counter = Counter(
    entity.text
    for sentence in documents[:500]
    for entity in nlp(sentence).ents
    if entity.label_ == 'PERSON'
)

counter.most_common(5)

[('Landlord', 7),
 ('Euroclydon', 5),
 ('Bulkington', 3),
 ('Lazarus', 2),
 ('Sal', 2)]

### Exercise

Use the text below to extract all entities. 
- Create tuples of `(lemma, NER type)`
- Collect counts over the tuples
- Look at the 10 most frequent tuples: how many of them are wrong? Why? Discuss with a neighbor.


In [209]:
text = """
Seville.
Summers in the flamboyant Andalucían capital often nudge 40C, but spring is a delight, with the parks in bloom and the scent of orange blossom and jasmine in the air. And in Semana Santa (Holy Week, 14-20 April) the streets come alive with floats and processions. There is also the raucous annual Feria de Abril – a week-long fiesta of parades, flamenco and partying long into the night (4-11 May; expect higher hotel prices if you visit then).
Seville is a romantic and energetic place, with sights aplenty, from the Unesco-listed cathedral – the largest Gothic cathedral in the world – to the beautiful Alcázar royal palace. But days here are best spent simply wandering the medieval streets of Santa Cruz and along the river to La Real Maestranza, Spain’s most spectacular bullring.
Seville is the birthplace of tapas and perfect for a foodie break – join a tapas tour (try devoursevillefoodtours.com), or stop at the countless bars for a glass of sherry with local jamón ibérico (check out Bar Las Teresas in Santa Cruz or historic Casa Morales in Constitución). Great food markets include the Feria, the oldest, and the wooden, futuristic-looking Metropol Parasol.
Nightlife is, unsurprisingly, late and lively. For flamenco, try one of the peñas, or flamenco social clubs – Torres Macarena on C/Torrijano, perhaps – with bars open across town until the early hours.
Book it: In an atmospheric 18th-century house, the Hospes Casa del Rey de Baeza is a lovely place to stay in lively Santa Cruz. Doubles from £133 room only, hospes.com
Trieste.
By April, temperatures are on the rise in Trieste and in the late 20s by May. It is far less touristy than the likes of Florence or Rome, and spring sees the city’s lovely restaurants and bars populated almost exclusively by locals.
A city with a proud coffee-drinking culture – Illy has its headquarters here – Trieste has many venerable cafes, including the dazzling mirror-walled Caffè degli Specchi on the Piazza Unità d’Italia – said to be Europe’s biggest seaside piazza – and the elegant Caffè San Marco, which has a good bookshop. James Joyce was a regular when he lived here between 1904-1915. You can learn all about him at the excellent museum, which also has a free, downloadable themed walk on its website (museojoycetrieste.it).
Above Trieste is a vast limestone plateau known as the carso (or karst). Travel up to Villa Opicina on the edge of the region by bus.
There are several trattorie, but for a real treat catch a cab to one of the 30 or so osmize – farm restaurants that sell their wines, cured meats, cheese, honey, fruit and veg; traditionally, they were open eight, 16 or 24 days per year (“osmi” means “eighth” in Slovene) but this now varies – check the app at osmize.com for details.
Book it: Stay at the palatial, seafront Savoia Excelsior Palace, Jan Morris’s pad in her book Trieste and the Meaning of Nowhere. Doubles from £127 room only, starhotelscollezione.com
Belgrade.
As Belgrade shrugs off the snow, cafe tables start colonising the pavements again. Not that Serbia’s capital hibernates during the winter, but spring brings a freshness worth savouring before summer’s 40C heat kicks in.
You feel it especially in Kalemegdan, the huge park and fortress hugging the confluence of the Danube and Sava rivers. Down below are wide riverside paths that offer superb cycling all the way to the attractive suburb of Zemun. Follow the Sava southwards to reach the river island of Ada Ciganlija – open all year round for walks and bike rides and usually warm enough in May for a swim.
Although barely a month goes by without a festival, the pace picks up during spring. The Belgrade Dance Festival attracts dance companies from around the world. Classical guitar gets its own spotlight during the Guitar Art Festivalfrom 12-17 March, and from 26-28 April you can join the Orthodox Easter festivities. The landmark Mikser House in Savamala may have closed, but its Mikser festival still celebrates the best in Balkan design (24-26 May).
With the long-awaited reopening of the National Museum of Serbia and the Museum of Contemporary Art, Belgrade has some cultural heavyweights to add to its dizzyingly varied restaurant scene. Head to the Dorćol district for cheap cocktails in Blaznavac’s psychedelic garden before a Balkan-Mediterranean dinner in cosy Tezga (Strahinjića Bana 82, on Facebook).
Book it: Set in a handsome 1929 villa in Dorćol, Smokvica has six stylish rooms as well as a restaurant with a courtyard garden. Doubles from €70 room only, smokvica.rs
Montpellier.
Montpellier combines easy elegance and a vibrant cultural scene with a youthful buzz – its university, founded in the 13th century, counts radical satirist Rabelais among its alumni and some 60,000 students live here.
The medieval centre is a maze made for wandering, with 16 leafy squares – in spring, all green and abuzz with alfresco cafe life. The vast, pedestrianised Place de la Comédie connects the old town with the striking new Antigone district, replete with modern, neoclassical-style buildings.
For a fine-art foray, head to Musée Fabre – one of the biggest in France – or nearby photography museum Pavillon Populaire. Montpellier boasts the oldest botanical garden in France, too, dating from 1593 and particularly beautiful in April and May. Independent boutiques, opera houses, markets laden with Languedoc produce (look for oysters and asparagus) and great dining options add to the appeal. Le Petit Jardin is a bistro and lovely garden restaurant near the old cathedral.
Beyond the city, discover vineyards like Mas de Daumas Gassac or hike up Pic St-Loup in the Cévennes foothills for views over the coast. The beach and charming seaside town of Palavas-les-Flots is just 10km away – a tram ride or easy cycle (rent bicycles from Ville et Vélo).
Book it: Hotel Le Guilhem is a 16th-century building in the historic centre with cathedral views from its terrace (where breakfast is served, weather permitting). Doubles from £83 room only, leguilhem.com
Berlin.
After a long, cold winter, Berliners waste no time in celebrating the return of the sun, with beer gardens and flea markets reopening all over the city. Try picturesque Cafe Am Neuen Seefor lakeside pizza and beer in the Tiergarten, and hang out with the hipsters along the canal at Berlin’s coolest flohmarkt, surrounded by cherry blossom trees, a charming place to browse and have a couple of beers.
For the best blossom, though, head to one of the trails along the line of the wall, the Mauerweg, near S-bahn Bornholmer Strasse, or Lichterfelde Süd, where the trees were gifted by the Japanese to celebrate German reunification in 1989.
Berlin is set for two big anniversaries this year: it’s 100 years since the Bauhaus was founded and 30 years since the wall came down, so check out events including exhibitions and dances themed around the former (see visitberlin.de for more information) or get a feel for the creative chaos of post-wall Berlin at the multimedia Nineties Berlin exhibition.
Boat tours along the river and canals are superb in spring but, generally, cycling is the best way to get around (many hotels and hostels have their own). And if urban life gets a bit much, do as the Berliners do and head out to one of the many city lakes, such as Schlachtensee or Wannsee, directly accessible by U and S-bahn.
Book it: Boutique Hotel Oderberger in a trendy, central location has doubles from £113, hotel-oderberger.berlin, or try indoor caravanning at Hüttenpalast, doubles from £61 room only, huettenpalast.de
"""

# your code here
counter = Counter([(ent.lemma_, ent.label_) for ent in nlp(text).ents])
counter.most_common(10)

[(('spring', 'DATE'), 4),
 (('one', 'CARDINAL'), 4),
 (('Santa Cruz', 'GPE'), 3),
 (('Seville', 'PERSON'), 2),
 (('April', 'DATE'), 2),
 (('Belgrade', 'GPE'), 2),
 (('Montpellier', 'PERSON'), 2),
 (('France', 'GPE'), 2),
 (('Berlin', 'GPE'), 2),
 (('Andalucían', 'PERSON'), 1)]
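
Grouping the tuples by label can make tagging errors easier to spot. This is a small sketch over the ten tuples shown above; `by_label` is a name introduced here.

```python
from collections import defaultdict

# (lemma, label) tuples copied from the counter output above
top = [("spring", "DATE"), ("one", "CARDINAL"), ("Santa Cruz", "GPE"),
       ("Seville", "PERSON"), ("April", "DATE"), ("Belgrade", "GPE"),
       ("Montpellier", "PERSON"), ("France", "GPE"), ("Berlin", "GPE"),
       ("Andalucían", "PERSON")]

# label -> list of lemmas that received this label
by_label = defaultdict(list)
for lemma, label in top:
    by_label[label].append(lemma)

for label, lemmas_for_label in by_label.items():
    print(label, lemmas_for_label)
```

Everything listed under `PERSON` here is a city or a regional adjective, which is a good starting point for the discussion.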

## Parsing
For each word, we can extract the word it is grammatically related to, plus the type of the relation.

In [210]:
print(documents[7])

There is nothing surprising in this .


What are the dependency relations in this sentence?

In [214]:
# your code here
[(word.text, word.head, word.dep_) for word in nlp(documents[7])]

[('There', is, 'expl'),
 ('is', is, 'ROOT'),
 ('nothing', is, 'attr'),
 ('surprising', nothing, 'amod'),
 ('in', nothing, 'prep'),
 ('this', in, 'pobj'),
 ('.', is, 'punct')]
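
These triples describe a tree. As a rough sketch, the block below rebuilds that tree from the output above and prints it with indentation. Keying on the word text only works because every word in this sentence is unique; with real data one would key on the token index instead.

```python
from collections import defaultdict

# (word, head, relation) triples copied from the parser output above
triples = [("There", "is", "expl"), ("is", "is", "ROOT"), ("nothing", "is", "attr"),
           ("surprising", "nothing", "amod"), ("in", "nothing", "prep"),
           ("this", "in", "pobj"), (".", "is", "punct")]

children = defaultdict(list)  # head word -> list of dependent words
root = None
for word, head, relation in triples:
    if relation == "ROOT":
        root = word  # the root is listed as its own head
    else:
        children[head].append(word)

def show(word, depth=0):
    """Print the subtree under `word`, indenting two spaces per level."""
    print("  " * depth + word)
    for child in children[word]:
        show(child, depth + 1)

show(root)
```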

Let's apply this to all our documents:

In [215]:
[[(c.text, c.head.text, c.dep_) for c in nlp(sentence)] 
 for sentence in documents[:100]]

[[('Loomings', 'Loomings', 'ROOT'), ('.', 'Loomings', 'punct')],
 [('Call', 'Call', 'ROOT'),
  ('me', 'Call', 'dobj'),
  ('Ishmael', 'Call', 'oprd'),
  ('.', 'Call', 'punct')],
 [('Some', 'years', 'det'),
  ('years', 'ago', 'npadvmod'),
  ('ago', 'mind', 'advmod'),
  ('--', 'mind', 'punct'),
  ('never', 'mind', 'neg'),
  ('mind', 'thought', 'ccomp'),
  ('how', 'long', 'advmod'),
  ('long', 'precisely', 'advmod'),
  ('precisely', 'mind', 'advmod'),
  ('--', 'mind', 'punct'),
  ('having', 'mind', 'advcl'),
  ('little', 'money', 'amod'),
  ('or', 'little', 'cc'),
  ('no', 'little', 'conj'),
  ('money', 'having', 'dobj'),
  ('in', 'money', 'prep'),
  ('my', 'purse', 'poss'),
  ('purse', 'in', 'pobj'),
  (',', 'having', 'punct'),
  ('and', 'having', 'cc'),
  ('nothing', 'having', 'conj'),
  ('particular', 'nothing', 'amod'),
  ('to', 'interest', 'aux'),
  ('interest', 'particular', 'xcomp'),
  ('me', 'interest', 'dobj'),
  ('on', 'interest', 'prep'),
  ('shore', 'on', 'pobj'),
  (',', 'thou

Instead of doing this on a word-by-word basis, we can do it for larger chunks, the noun phrases.

In [216]:
[[(c.text, c.root.head.text, c.root.dep_) for c in nlp(sentence).noun_chunks] 
 for sentence in documents[:100]]

[[('Loomings', 'Loomings', 'ROOT')],
 [('me', 'Call', 'dobj'), ('Ishmael', 'Call', 'oprd')],
 [('little or no money', 'having', 'dobj'),
  ('my purse', 'in', 'pobj'),
  ('me', 'interest', 'dobj'),
  ('shore', 'on', 'pobj'),
  ('I', 'thought', 'nsubj'),
  ('I', 'sail', 'nsubj'),
  ('the watery part', 'see', 'dobj'),
  ('the world', 'of', 'pobj')],
 [('It', 'is', 'nsubj'),
  ('a way', 'is', 'attr'),
  ('I', 'have', 'nsubj'),
  ('the spleen', 'driving', 'dobj'),
  ('the circulation', 'regulating', 'dobj')],
 [('I', 'find', 'nsubj'),
  ('myself', 'growing', 'nsubj'),
  ('the mouth', 'about', 'pobj'),
  ('it', 'is', 'nsubj'),
  ('a damp', 'is', 'attr'),
  ('my soul', 'in', 'pobj'),
  ('I', 'find', 'nsubj'),
  ('myself', 'pausing', 'nsubj'),
  ('coffin warehouses', 'before', 'pobj'),
  ('the rear', 'bringing', 'dobj'),
  ('every funeral', 'of', 'pobj'),
  ('I', 'meet', 'nsubj'),
  ('my hypos', 'get', 'nsubj'),
  ('such an upper hand', 'get', 'dobj'),
  ('me', 'of', 'pobj'),
  ('it', 'require

### Exercise
How does Melville describe nouns? Extract all the pairs related by `amod`.

In [220]:
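# your code here
# note: `text` is the travel article defined above, not Moby Dick;
# to analyze Melville, run this over the sentences in `documents` instead
[(word.text, word.head, word.dep_) for word in nlp(text) if word.dep_ == "amod"]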

[('flamboyant', capital, 'amod'),
 ('raucous', annual, 'amod'),
 ('long', fiesta, 'amod'),
 ('higher', prices, 'amod'),
 ('romantic', place, 'amod'),
 ('aplenty', sights, 'amod'),
 ('listed', cathedral, 'amod'),
 ('largest', cathedral, 'amod'),
 ('Gothic', cathedral, 'amod'),
 ('beautiful', palace, 'amod'),
 ('royal', palace, 'amod'),
 ('medieval', streets, 'amod'),
 ('spectacular', bullring, 'amod'),
 ('countless', bars, 'amod'),
 ('local', ibérico, 'amod'),
 ('historic', Morales, 'amod'),
 ('Great', markets, 'amod'),
 ('wooden', Parasol, 'amod'),
 ('futuristic', looking, 'amod'),
 ('looking', Parasol, 'amod'),
 ('social', clubs, 'amod'),
 ('open', bars, 'amod'),
 ('early', hours, 'amod'),
 ('atmospheric', house, 'amod'),
 ('18th', century, 'amod'),
 ('lovely', place, 'amod'),
 ('lively', Cruz, 'amod'),
 ('late', 20s, 'amod'),
 ('less', touristy, 'amod'),
 ('lovely', restaurants, 'amod'),
 ('proud', culture, 'amod'),
 ('many', cafes, 'amod'),
 ('venerable', cafes, 'amod'),
 ('dazzling
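
To read such pairs as descriptions, it can help to group the adjectives by the noun they modify. A minimal sketch over a few pairs copied from the output above (`adjectives_for` is a name introduced here):

```python
from collections import defaultdict

# (adjective, noun) pairs copied from the `amod` output above
pairs = [("listed", "cathedral"), ("largest", "cathedral"), ("Gothic", "cathedral"),
         ("beautiful", "palace"), ("royal", "palace"), ("lovely", "place")]

# noun -> list of adjectives that modify it
adjectives_for = defaultdict(list)
for adjective, noun in pairs:
    adjectives_for[noun].append(adjective)

print(adjectives_for["cathedral"])
print(adjectives_for["palace"])
```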

How does he describe men?

In [224]:
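# your code here
# note: this also searches the travel article in `text`, where no word is attached to "men", hence the empty result
[(word.text, word.head, word.dep_) for word in nlp(text) if word.head.text == "men"]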

[]

# Try it on the language that you prefer!

https://spacy.io/models
