# Basic Linguistic Features

This notebook introduces basic linguistic features used in NLP. It contains exercises on:
* Part-of-Speech tagging
* Named Entity Recognition

We will use spaCy for both tasks.

## Part-of-Speech Tagging (PoS)

#### Documentation
* https://spacy.io/api/annotation#pos-en

In [None]:
# Install spacy
%pip install spacy

In [2]:
# Download the English model
# find more models at https://spacy.io/models
!python -m spacy download en_core_web_md

Collecting en-core-web-md==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.7.1/en_core_web_md-3.7.1-py3-none-any.whl (42.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.8/42.8 MB 588.7 kB/s eta 0:00:00
Installing collected packages: en-core-web-md
Successfully installed en-core-web-md-3.7.1
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_md')


In [3]:
import en_core_web_md
import spacy
from spacy import displacy

# Load the spaCy model into the variable 'nlp'
nlp = en_core_web_md.load()

In [7]:
doc = nlp("This is a new sentence with a special meaning. Now I'm on a new Laptop, P43s!")
for token in doc:
    print(
        token.text,      # original text of the token
        token.lemma_,    # base form (lemma) of the token
        token.pos_,      # coarse-grained part-of-speech tag
        token.tag_,      # fine-grained part-of-speech tag
        token.dep_,      # syntactic dependency relation
        token.shape_,    # word shape (capitalization, digits, punctuation)
        token.is_alpha,  # True if the token consists only of alphabetic characters
        token.is_stop,   # True if the token is a stop word
    )

This this PRON DT nsubj Xxxx True True
is be AUX VBZ ROOT xx True True
a a DET DT det x True True
new new ADJ JJ amod xxx True False
sentence sentence NOUN NN attr xxxx True False
with with ADP IN prep xxxx True True
a a DET DT det x True True
special special ADJ JJ amod xxxx True False
meaning meaning NOUN NN pobj xxxx True False
. . PUNCT . punct . False False
Now now ADV RB advmod Xxx True True
I I PRON PRP nsubj X True True
'm be AUX VBP ROOT 'x False True
on on ADP IN prep xx True True
a a DET DT det x True True
new new ADJ JJ amod xxx True False
Laptop Laptop PROPN NNP pobj Xxxxx True False
, , PUNCT , punct , False False
P43s p43s NOUN NN appos Xddx False False
! ! PUNCT . punct ! False False
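
The `shape_` column above abbreviates each token's orthography: uppercase letters appear as `X`, lowercase letters as `x`, digits as `d`, and other characters are kept unchanged, with long runs cut short (`sentence` becomes `xxxx`). A rough pure-Python sketch of that rule follows. The helper name `approx_shape` and the run limit of four are assumptions inferred from the output above, not spaCy's own code.

```python
def approx_shape(text: str, max_run: int = 4) -> str:
    """Approximate a spaCy-style word shape (hypothetical helper, not spaCy's API)."""
    shape = []
    last = ""  # previously emitted shape character
    run = 0    # length of the current run of identical shape characters
    for char in text:
        if char.isalpha():
            mapped = "X" if char.isupper() else "x"
        elif char.isdigit():
            mapped = "d"
        else:
            mapped = char  # punctuation and symbols are kept unchanged
        run = run + 1 if mapped == last else 1
        last = mapped
        # Assumption: runs longer than max_run are truncated
        if run <= max_run:
            shape.append(mapped)
    return "".join(shape)


for word in ["This", "sentence", "Laptop", "P43s", "'m"]:
    print(word, approx_shape(word))
```

For these five tokens the sketch yields the same shapes as the table above, which suggests the inferred rule is at least close for simple English words.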


In [8]:
spacy.explain("attr")

'attribute'

In [9]:
displacy.render(doc, style="dep")

## Named Entity Recognition (NER)

#### Documentation
* https://spacy.io/api/annotation#named-entities

In [10]:
# Reload the model (redundant if the cells above were already run)
from spacy import displacy
import en_core_web_md

nlp = en_core_web_md.load()

In [11]:
doc = nlp(
    "When Sebastian Thrun started working on self-driving cars at Google in 2007, "
    "few people outside of the company took him seriously."
)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

Sebastian Thrun 5 20 PERSON
Google 61 67 ORG
2007 71 75 DATE
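
The `start_char` and `end_char` values printed above are character offsets into the original text, with the end offset exclusive. A small check with plain string slicing, assuming the sentence is held in a single string and using the offsets copied from the output above (the variable names are illustrative):

```python
text = (
    "When Sebastian Thrun started working on self-driving cars at Google in 2007, "
    "few people outside of the company took him seriously."
)

# (start_char, end_char, label) triples copied from the NER output above
spans = [(5, 20, "PERSON"), (61, 67, "ORG"), (71, 75, "DATE")]

# Slice the raw text with each offset pair to recover the entity string
recovered = [(text[start:end], label) for start, end, label in spans]
for entity_text, label in recovered:
    print(entity_text, label)
```

With a spaCy `Doc`, slicing `doc.text[ent.start_char:ent.end_char]` should give the same string as `ent.text`.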


In [12]:
displacy.render(doc, style="ent")