# Basic Linguistic Features

This notebook introduces basic linguistic features used in NLP. It contains exercises on:
* Part-of-Speech tagging
* Named Entity Recognition

We will use spaCy for both tasks.

## Part-of-Speech Tagging (PoS)

#### Documentation
* https://spacy.io/api/annotation#pos-en

In [None]:
# Install spacy
%pip install spacy

In [None]:
# Download the medium-sized English model
# More models are listed at https://spacy.io/models
!python -m spacy download en_core_web_md

In [None]:
import en_core_web_md
import spacy
from spacy import displacy

# Load the spaCy model into the variable 'nlp'
nlp = en_core_web_md.load()

In [None]:
doc = nlp("This is a new sentence with a special meaning. Now I'm on a new Laptop, P43s!")
for token in doc:
    print(
        token.text,      # original text of the token
        token.lemma_,    # base form (lemma) of the token
        token.pos_,      # coarse-grained part-of-speech tag
        token.tag_,      # fine-grained part-of-speech tag
        token.dep_,      # syntactic dependency relation to the head token
        token.shape_,    # word shape (capitalization, punctuation, digits)
        token.is_alpha,  # True if the token consists only of alphabetic characters
        token.is_stop    # True if the token is a stop word
    )
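The attributes printed above can also be aggregated instead of read row by row. The following is a minimal sketch, not part of the original exercise: `count_pos` and `content_lemmas` are hypothetical helper names, and both assume only that each item exposes the `pos_`, `lemma_`, `is_alpha` and `is_stop` attributes used in the loop above.

```python
from collections import Counter


def count_pos(tokens):
    """Count how often each coarse-grained PoS tag occurs.

    Hypothetical helper: assumes every item has a `pos_` attribute,
    as spaCy tokens do.
    """
    return Counter(token.pos_ for token in tokens)


def content_lemmas(tokens):
    """Return the lemmas of alphabetic tokens that are not stop words.

    Hypothetical helper: relies on `lemma_`, `is_alpha` and `is_stop`.
    """
    return [
        token.lemma_
        for token in tokens
        if token.is_alpha and not token.is_stop
    ]
```

In the notebook, `count_pos(doc)` and `content_lemmas(doc)` can be called on the `doc` created above; the exact counts depend on the model's predictions.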

In [None]:
# Look up a short description of a tag or label, here the dependency label "attr"
spacy.explain("attr")

In [None]:
displacy.render(doc, style="dep")
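The dependency tree drawn by displaCy can also be read as plain text by pairing each token with its dependency label and its head. This is a small sketch under the assumption that each item exposes `text`, `dep_` and `head`, as spaCy tokens do; `dependency_triples` is a hypothetical name introduced here for illustration.

```python
def dependency_triples(tokens):
    """Return (token text, dependency label, head text) for each token.

    Hypothetical helper: assumes each item has `text`, `dep_` and `head`
    attributes. In spaCy, the root of a sentence is its own head.
    """
    return [(token.text, token.dep_, token.head.text) for token in tokens]
```

Calling `dependency_triples(doc)` in the notebook lists one triple per token, which can be easier to search or filter than the rendered diagram.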

## Named Entity Recognition (NER)

#### Documentation
* https://spacy.io/api/annotation#named-entities

In [None]:
# Same imports and model as in the PoS section; re-running them lets this section work on its own
from spacy import displacy
import en_core_web_md

nlp = en_core_web_md.load()

In [None]:
# Adjacent string literals are joined, so no stray indentation ends up inside the text
doc = nlp(
    "When Sebastian Thrun started working on self-driving cars at Google in 2007, "
    "few people outside of the company took him seriously."
)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
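When a text contains many entities, it can help to group them by label rather than print them one by one. The sketch below is illustrative only: `group_entities` is a hypothetical helper, and it assumes each entity exposes the `text` and `label_` attributes printed above.

```python
from collections import defaultdict


def group_entities(ents):
    """Group entity texts by their label.

    Hypothetical helper: assumes each item has `text` and `label_`
    attributes, as the spans in `doc.ents` do.
    """
    grouped = defaultdict(list)
    for ent in ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)
```

In the notebook, `group_entities(doc.ents)` returns a dictionary that maps each label to the entity texts found for it; which entities are found depends on the model.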

In [None]:
displacy.render(doc, style="ent")