# Basic Linguistic Features

This notebook introduces basic linguistic features used in NLP. It contains exercises on:
* Part-of-Speech tagging
* Named Entity Recognition

We will use spaCy for both tasks.

## Part-of-Speech Tagging (PoS)

#### Documentation
* https://spacy.io/api/annotation#pos-en

In [4]:
# Install spacy
#!pip install spacy

In [7]:
#!python -m spacy download en_core_web_md

In [1]:
from spacy import displacy
import en_core_web_md
import spacy

nlp = en_core_web_md.load()

In [5]:
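doc = nlp("This is a new sentence with a special meaning.")
# columns: text, lemma, coarse PoS, fine-grained tag, dependency label,
# word shape, is_alpha flag, is_stop flag
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_,
          token.shape_, token.is_alpha, token.is_stop)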

This this DET DT nsubj Xxxx True True
is be AUX VBZ ROOT xx True True
a a DET DT det x True True
new new ADJ JJ amod xxx True False
sentence sentence NOUN NN attr xxxx True False
with with ADP IN prep xxxx True True
a a DET DT det x True True
special special ADJ JJ amod xxxx True False
meaning meaning NOUN NN pobj xxxx True False
. . PUNCT . punct . False False
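
The table printed above can be summarized by counting how often each coarse-grained tag (`token.pos_`) occurs. The following is a minimal sketch that copies the `(text, pos_)` pairs from the output above into a plain list, so it runs without a loaded model; with a real `doc` one would presumably iterate over its tokens instead. The variable names are illustrative.

```python
from collections import Counter

# (token text, coarse PoS tag) pairs copied from the output printed above
tagged = [
    ("This", "DET"), ("is", "AUX"), ("a", "DET"), ("new", "ADJ"),
    ("sentence", "NOUN"), ("with", "ADP"), ("a", "DET"),
    ("special", "ADJ"), ("meaning", "NOUN"), (".", "PUNCT"),
]

# Count how many tokens carry each tag
pos_counts = Counter(pos for _, pos in tagged)

for pos, count in pos_counts.most_common():
    print(f"{pos:<6}{count}")
```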


In [6]:
displacy.render(doc, style="dep")

In [4]:
spacy.explain("attr")

'attribute'

### Exercise

Repeat the above exercise for German:
* download a German model
* create a *doc* from any German sentence
* render and display the dependency parse

#### Tools
Native Python + spaCy

#### Model
*de_core_news_sm*

#### Documentation
* https://spacy.io/api/token#attributes
* https://spacy.io/models/de#de_core_news_sm

In [13]:
# download the German model mentioned above
#!python -m spacy download de_core_news_sm

In [14]:
# import the German model
import de_core_news_sm

nlp = de_core_news_sm.load()

In [15]:
doc = nlp("Ich würde gerne essen")  # please change the input document if you want to

# print the linguistic attributes of each token in 'doc'
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_,
          token.shape_, token.is_alpha, token.is_stop)

Ich Ich PRON PPER sb Xxx True True
würde werden AUX VAFIN ROOT xxxx True True
gerne gerne ADV ADV mo xxxx True False
essen essen VERB VVINF oc xxxx True False
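
The `shape_` column in the output above abstracts a token's orthography: uppercase letters become `X`, lowercase letters `x`, digits `d`, and a run of the same symbol is cut off after four characters, which is why `würde` and `essen` both print as `xxxx`. Below is a rough pure-Python approximation of that rule, written for illustration only; `approx_shape` is a hypothetical helper, not spaCy's own function, and edge cases may differ.

```python
def approx_shape(text: str) -> str:
    """Approximate a word shape: X/x/d symbols, runs capped at four."""
    shape = []
    previous = ""
    run_length = 0
    for char in text:
        if char.isalpha():
            symbol = "X" if char.isupper() else "x"
        elif char.isdigit():
            symbol = "d"
        else:
            symbol = char  # punctuation and other characters are kept as-is
        if symbol == previous:
            run_length += 1
        else:
            run_length = 1
            previous = symbol
        if run_length <= 4:
            shape.append(symbol)
    return "".join(shape)


for word in ["Ich", "würde", "gerne", "essen"]:
    print(word, approx_shape(word))
```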


In [16]:
# render the dependencies
displacy.render(doc, style="dep")

## Named Entity Recognition (NER)

#### Documentation
* https://spacy.io/api/annotation#named-entities

In [17]:
from spacy import displacy
import en_core_web_md

nlp = en_core_web_md.load()

In [18]:
doc = nlp("When Sebastian Thrun started working on self-driving cars at Google in 2007, "
          "few people outside of the company took him seriously.")
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

Sebastian Thrun 5 20 PERSON
Google 61 67 ORG
2007 71 75 DATE
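
`start_char` and `end_char` are character offsets into the original text, with the end offset exclusive, so slicing the input string with them recovers each entity. The sketch below checks this against the offsets printed above without loading a model; the offsets and labels are copied from that output, and the variable names are illustrative.

```python
text = ("When Sebastian Thrun started working on self-driving cars at Google "
        "in 2007, few people outside of the company took him seriously.")

# (start_char, end_char, label) triples copied from the output above
entities = [(5, 20, "PERSON"), (61, 67, "ORG"), (71, 75, "DATE")]

# Slice the raw string with each pair of offsets
spans = [(text[start:end], label) for start, end, label in entities]

for span_text, label in spans:
    print(f"{label:<7}{span_text}")
```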


In [35]:
displacy.render(doc, style="ent")

### Exercise

Repeat the above exercise with a German language model. If you have already finished the PoS exercise, you can reuse the German model you downloaded and imported there.

#### Tools
Native Python + spaCy

#### Model
*de_core_news_sm*

#### Documentation
* https://spacy.io/api/annotation#named-entities

In [20]:
# load the German model (imported in the PoS exercise above)
nlp = de_core_news_sm.load()

In [21]:
doc = nlp("Nordkorea sprengt ein gemeinsames Verbindungsbüro und droht mit der\
           Besetzung der demilitarisierten Zone an der Grenze zu Südkorea.")  # change the document

# print the text, character offsets, and label of each named entity
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

Nordkorea 0 9 PER
Südkorea 132 140 LOC
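
The offset `132` for `Südkorea` is larger than the sentence itself suggests. A backslash continuation inside a string literal keeps the next line's leading spaces as part of the string, so those spaces are counted in `start_char`. The following sketch reproduces the effect with plain string operations; it assumes the continuation line in the cell above is indented by 11 spaces, and the variable names are illustrative.

```python
first = "Nordkorea sprengt ein gemeinsames Verbindungsbüro und droht mit der"
second = "Besetzung der demilitarisierten Zone an der Grenze zu Südkorea."

# What the backslash continuation produces: the indentation ends up in the text
# (11 spaces is an assumption read off the cell above).
with_indent = first + " " * 11 + second
# The same sentence with a single space at the line break.
single_space = first + " " + second

start_indent = with_indent.find("Südkorea")
start_single = single_space.find("Südkorea")

print(start_indent, start_indent + len("Südkorea"))
print(start_single, start_single + len("Südkorea"))
```

The first printed pair matches the offsets in the output above. Writing the sentence as adjacent string literals inside parentheses avoids the stray spaces.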


In [22]:
# render the named entities
displacy.render(doc, style="ent")