# Introduction to spaCy

[spaCy](https://spacy.io) is a free, open-source library for advanced Natural Language Processing (NLP) in Python.

spaCy is designed specifically for production use and helps you build applications that process and “understand” large volumes of text. It can be used to build information extraction or natural language understanding systems, or to pre-process text for deep learning.

Its features include:

* Tokenization - Segmenting text into words, punctuation marks, etc.
* Part-of-speech (POS) Tagging - Assigning word types to tokens, like verb or noun
* Dependency Parsing - Assigning syntactic dependency labels that describe the relations between individual tokens, like subject or object
* Lemmatization - Assigning the base forms of words (for example, the lemma of "was" is "be")
* Named Entity Recognition (NER) - Labelling named "real-world" objects, like persons, companies or locations
* Similarity - Comparing words, text spans and documents to measure how similar they are to each other
* Text Classification - Assigning categories or labels to a whole document, or to parts of a document
* Rule-based Matching - Finding sequences of tokens based on their texts and linguistic annotations
* Training - Updating and improving a statistical model's predictions
* Support for 55+ languages
* Easy deep learning integration
* Built-in visualizers for syntax and NER
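The Similarity feature is built on word vectors: by default, spaCy's `similarity()` methods compute the cosine similarity of two vectors, and the vector of a `Doc` or `Span` defaults to the average of its token vectors. The plain-Python sketch below illustrates that idea under clear assumptions: `TOY_VECTORS` is a hypothetical, hand-picked three-dimensional vector table (real pipelines such as `en_core_web_lg` ship with much larger pretrained vectors), and the helper functions are illustrative, not spaCy's implementation.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical 3-dimensional word vectors, chosen by hand for illustration only.
TOY_VECTORS: Dict[str, List[float]] = {
    "apple": [0.9, 0.1, 0.0],
    "banana": [0.8, 0.2, 0.1],
    "company": [0.1, 0.9, 0.2],
    "startup": [0.2, 0.8, 0.3],
}
DIMS = len(next(iter(TOY_VECTORS.values())))


def average_vector(words: Sequence[str]) -> List[float]:
    """Average the vectors of known words; unknown words are skipped."""
    known = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not known:
        return [0.0] * DIMS
    return [sum(vec[i] for vec in known) / len(known) for i in range(DIMS)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity(text_a: str, text_b: str) -> float:
    """Compare two texts via the cosine of their averaged word vectors."""
    vec_a = average_vector(text_a.lower().split())
    vec_b = average_vector(text_b.lower().split())
    return cosine(vec_a, vec_b)


print(round(similarity("apple", "banana"), 3))
print(round(similarity("apple", "startup"), 3))
```

With these toy vectors, "apple" scores much closer to "banana" than to "startup". Averaging also means word order is ignored: "apple banana" and "banana apple" get identical vectors.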

The spaCy team also offers an excellent free interactive [course](https://course.spacy.io/) for trying out the library.


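```python
import spacy
from spacy import displacy
import pandas as pd

# Load the large English pipeline. It must be installed first, e.g. with:
#   python -m spacy download en_core_web_lg
nlp = spacy.load('en_core_web_lg')

# Running the pipeline over a string returns a processed Doc object
doc = nlp('Apple is looking to buy UK startup for $1 billion')
```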
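Calling `nlp(...)` first tokenizes the text and then passes the resulting document through each pipeline component in order (the loaded pipeline's component names are listed in `nlp.pipe_names`). The toy sketch below mimics that structure in plain Python. `ToyDoc`, `ToyPipeline`, `whitespace_tokenizer` and `toy_tagger` are hypothetical stand-ins, not spaCy's API, and the rule-based tagger only imitates what a trained component does.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ToyDoc:
    """Hypothetical stand-in for spaCy's Doc: the text plus per-token annotations."""
    text: str
    tokens: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)


Component = Callable[[ToyDoc], ToyDoc]


def whitespace_tokenizer(text: str) -> ToyDoc:
    # Much cruder than spaCy's tokenizer, which also splits off
    # prefixes and suffixes such as "$" and applies exception rules.
    return ToyDoc(text=text, tokens=text.split())


def toy_tagger(doc: ToyDoc) -> ToyDoc:
    # Hypothetical rule-based tagger standing in for a trained model.
    doc.tags = [
        "PROPN" if tok[0].isupper() else "NUM" if tok.isdigit() else "OTHER"
        for tok in doc.tokens
    ]
    return doc


class ToyPipeline:
    """Tokenize the text, then apply each named component in order."""

    def __init__(self, components: List[Tuple[str, Component]]):
        self.components = components

    @property
    def pipe_names(self) -> List[str]:
        return [name for name, _ in self.components]

    def __call__(self, text: str) -> ToyDoc:
        doc = whitespace_tokenizer(text)
        for _name, component in self.components:
            doc = component(doc)
        return doc


toy_nlp = ToyPipeline([("tagger", toy_tagger)])
toy_doc = toy_nlp("Apple is looking to buy UK startup for $1 billion")
print(list(zip(toy_doc.tokens, toy_doc.tags)))
```

The whitespace tokenizer keeps `$1` as a single token, whereas spaCy's tokenizer splits it into `$` and `1`. Handling such cases is one reason spaCy's tokenizer uses prefix, suffix and exception rules.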

```python
# Print each named entity and its label
for ent in doc.ents:
    print("{:<20}{:<20}".format(ent.text, ent.label_))
```

```
Apple               ORG
UK                  GPE
$1 billion          MONEY
```
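spaCy also stores entity annotations on the individual tokens using the IOB scheme: `token.ent_iob_` is `"B"` (begins an entity), `"I"` (inside an entity) or `"O"` (outside any entity), and `token.ent_type_` holds the label. The plain-Python sketch below shows how such token-level tags can be decoded back into entity spans. The combined tags such as `"B-ORG"` and the helper `iob_to_spans` are illustrative assumptions, written to match the entities printed above rather than read from spaCy.

```python
from typing import List, Optional, Tuple


def iob_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert IOB tags like "B-ORG", "I-ORG", "O" into (start, end, label) token spans.

    `end` is exclusive, like Python slices and spaCy's Span.end.
    """
    spans: List[Tuple[int, int, str]] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        if tag == "O":
            # Outside any entity: close the open span, if there is one.
            if start is not None:
                spans.append((start, i, label))
                start = label = None
            continue
        prefix, _, tag_label = tag.partition("-")
        # A "B" tag, or an "I" tag that does not continue the open entity, starts a new span.
        if prefix == "B" or start is None or tag_label != label:
            if start is not None:
                spans.append((start, i, label))
            start, label = i, tag_label
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


tokens = ["Apple", "is", "looking", "to", "buy", "UK", "startup", "for", "$", "1", "billion"]
# Hypothetical IOB tags, consistent with the entities printed above.
tags = ["B-ORG", "O", "O", "O", "O", "B-GPE", "O", "O", "B-MONEY", "I-MONEY", "I-MONEY"]

for start, end, label in iob_to_spans(tags):
    print(" ".join(tokens[start:end]), label)
```

Joining tokens with spaces prints `$ 1 billion` rather than `$1 billion`: spaCy's `Span.text` preserves the original whitespace, which this sketch does not track.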


```python
# Print each token with its coarse-grained part-of-speech tag (pos_)
# and its syntactic dependency label (dep_)
for token in doc:
    print("{:<12}{:<10}{:<10}".format(token.text, token.pos_, token.dep_))
```

```
Apple       PROPN     nsubj
is          AUX       aux
looking     VERB      ROOT
to          PART      aux
buy         VERB      xcomp
UK          PROPN     compound
startup     NOUN      dobj
for         ADP       prep
$           SYM       quantmod
1           NUM       compound
billion     NUM       pobj
```
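The `dep_` labels above only name each token's relation to its head. The parse itself is a tree in which every token points to exactly one head and the root points to itself; in spaCy, `token.head`, `token.children` and `token.subtree` expose this structure. The plain-Python sketch below recovers children and subtrees from a list of head indices. The `HEADS` values are an assumption chosen to be consistent with the labels above (for example, attaching `for` to `buy`), because the printed output does not show the heads.

```python
from typing import Dict, List

TOKENS = ["Apple", "is", "looking", "to", "buy", "UK", "startup", "for", "$", "1", "billion"]
# Hypothetical head index for each token; the root (looking) is its own head.
HEADS = [2, 2, 2, 4, 2, 6, 4, 4, 10, 10, 7]


def find_root(heads: List[int]) -> int:
    """Return the index of the token whose head is itself."""
    return next(i for i, h in enumerate(heads) if h == i)


def children(heads: List[int]) -> Dict[int, List[int]]:
    """Map each token index to the indices of its direct dependents."""
    kids: Dict[int, List[int]] = {i: [] for i in range(len(heads))}
    for i, h in enumerate(heads):
        if h != i:
            kids[h].append(i)
    return kids


def subtree(heads: List[int], i: int) -> List[int]:
    """Return token i and all of its descendants, in sentence order."""
    kids = children(heads)
    stack, seen = [i], []
    while stack:
        node = stack.pop()
        seen.append(node)
        stack.extend(kids[node])
    return sorted(seen)


root = find_root(HEADS)
print("root:", TOKENS[root])
print("subtree of 'for':", " ".join(TOKENS[j] for j in subtree(HEADS, 7)))
```

A subtree gives the whole phrase headed by a token, e.g. the object phrase `UK startup` under `startup`, which is often more useful than the single head word.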


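```python
# Visualize the dependency parse inline in a Jupyter notebook
# (outside a notebook, displacy.serve(doc, style="dep") starts a local web server instead)
displacy.render(doc, style="dep", jupyter=True)
```

```python
# Highlight the named entities in the text
displacy.render(doc, style="ent", jupyter=True)
```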

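```python
# Collect the fine-grained tag, coarse POS tag and lemma of each token,
# then transpose so that each token becomes a column
tags = [(word.text, word.tag_, word.pos_, word.lemma_) for word in doc]
pd.DataFrame(tags, columns=["text", "tag_", "pos_", "lemma_"]).T
```

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| **text** | Apple | is | looking | to | buy | UK | startup | for | $ | 1 | billion |
| **tag_** | NNP | VBZ | VBG | TO | VB | NNP | NN | IN | $ | CD | CD |
| **pos_** | PROPN | AUX | VERB | PART | VERB | PROPN | NOUN | ADP | SYM | NUM | NUM |
| **lemma_** | Apple | be | look | to | buy | UK | startup | for | $ | 1 | billion |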
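The lemmas show that lemmatization depends on the part of speech: `is` (AUX) becomes `be` and `looking` (VERB) becomes `look`, while proper nouns like `Apple` are left unchanged. spaCy's trained English pipelines (in v3) use a rule-based lemmatizer that combines POS-dependent rules with exception tables. The sketch below imitates that idea with a deliberately tiny, hypothetical rule set (`EXCEPTIONS`, `SUFFIX_RULES`); it is not spaCy's rule set.

```python
from typing import Dict, List, Tuple

# Hypothetical exception table: (lowercased word, POS) -> lemma.
EXCEPTIONS: Dict[Tuple[str, str], str] = {
    ("is", "AUX"): "be",
    ("was", "AUX"): "be",
}

# Hypothetical suffix rules per POS: (suffix, replacement), tried in order.
SUFFIX_RULES: Dict[str, List[Tuple[str, str]]] = {
    "VERB": [("ing", ""), ("ed", ""), ("s", "")],
    "NOUN": [("ies", "y"), ("s", "")],
}


def lemmatize(word: str, pos: str) -> str:
    """Return a lemma using exceptions first, then POS-specific suffix rules."""
    if pos == "PROPN":
        # Proper nouns are kept as written.
        return word
    lower = word.lower()
    if (lower, pos) in EXCEPTIONS:
        return EXCEPTIONS[(lower, pos)]
    for suffix, replacement in SUFFIX_RULES.get(pos, []):
        # Require a stem of at least 3 characters so short words such as "bus" are left alone.
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            return lower[: -len(suffix)] + replacement
    return lower


words = [("Apple", "PROPN"), ("is", "AUX"), ("looking", "VERB"), ("startups", "NOUN")]
print([lemmatize(w, p) for w, p in words])  # ['Apple', 'be', 'look', 'startup']
```

Naive suffix stripping breaks quickly: with these rules `making` becomes `mak`. This is why real rule-based lemmatizers rely on much larger rule sets and exception lists, and on accurate POS tags from earlier pipeline components.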
