# spaCy

spaCy is an open-source Python library for natural language processing. Its speed, breadth of features, and thorough documentation make it a strong choice for both industry and academia.

### install

```
pip3 install spacy
```

### import

In [1]:
import spacy

### data

In [2]:
x = spacy.load('en')

### languages

In [3]:
parser = spacy.en.English()

In [4]:
spacy.de.German()

<spacy.de.German at 0x11a5b8240>

In [5]:
spacy.fr.French()

<spacy.fr.French at 0x1825e81d0>

In [6]:
spacy.es.Spanish()

<spacy.es.Spanish at 0x10c5fcb38>

In [7]:
spacy.it.Italian()

<spacy.it.Italian at 0x10947f898>

In [8]:
spacy.pt.Portuguese()

<spacy.pt.Portuguese at 0x11bf87940>

In [9]:
spacy.nl.Dutch()

<spacy.nl.Dutch at 0x10947f978>

In [10]:
spacy.sv.Swedish()

<spacy.sv.Swedish at 0x11bfd4128>

In [11]:
spacy.fi.Finnish()

<spacy.fi.Finnish at 0x11bfd4ba8>

In [12]:
spacy.hu.Hungarian()

<spacy.hu.Hungarian at 0x11bfd4550>

In [13]:
spacy.bn.Bengali()

<spacy.bn.Bengali at 0x1829416d8>

In [14]:
spacy.he.Hebrew()

<spacy.he.Hebrew at 0x184656a58>

In [15]:
spacy.zh.Chinese()

<spacy.zh.Chinese at 0x184656a90>

### specific data

In [16]:
parser.vocab['NASA']
parser.vocab['apple']
parser.vocab['UNK']

<spacy.lexeme.Lexeme at 0x10d46ac18>

### loading parent doc

In [17]:
x = x("Hello, I like to program. My favorite language is Python.")

### parent doc language

In [18]:
x[0].lang_

'en'

### sentences

In [19]:
for i in x.sents:
    print(i)

Hello, I like to program.
My favorite language is Python.


### lower

In [189]:
x[0].orth_

'Hello'

In [190]:
x[0].lower_

'hello'

### prefix

In [191]:
x[0].prefix_

'H'

### suffix

In [192]:
x[0].suffix_

'llo'
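
The prefix and suffix are fixed-length slices of the token text. A minimal sketch of the idea in plain Python, assuming a prefix length of one character and a suffix length of three (chosen to match the outputs above); `lexical_slices` is a hypothetical helper, not part of spaCy:

```python
def lexical_slices(text, prefix_len=1, suffix_len=3):
    """Return the lowercase form, prefix and suffix of a token string.

    The default lengths are an assumption chosen to reproduce the
    outputs shown above ('H' and 'llo' for 'Hello').
    """
    return {
        "lower": text.lower(),
        "prefix": text[:prefix_len],
        "suffix": text[-suffix_len:],
    }


features = lexical_slices("Hello")
print(features)
```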

### shape

In [193]:
x[0].shape_

'Xxxxx'
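
The shape replaces each character with a class symbol, which lets words with the same orthographic pattern share a feature. The following is a rough approximation in plain Python, not spaCy's own code: it assumes uppercase letters map to `X`, lowercase to `x`, digits to `d`, and that runs of the same symbol are cut off after four. `word_shape` and `max_run` are illustrative names.

```python
def word_shape(text, max_run=4):
    """Approximate an orthographic word shape such as 'Xxxxx'."""
    shape = []
    last = ""
    run = 0
    for ch in text:
        if ch.isalpha():
            sym = "X" if ch.isupper() else "x"
        elif ch.isdigit():
            sym = "d"
        else:
            sym = ch  # punctuation and other symbols are kept as-is
        # Count how many times the same symbol has repeated in a row.
        run = run + 1 if sym == last else 1
        last = sym
        if run <= max_run:
            shape.append(sym)
    return "".join(shape)


for word in ["Hello", "NASA", "Python3", "programming"]:
    print(word, word_shape(word))
```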

### log probability

In [194]:
x[0].prob

-11.369197845458984
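
A log probability is easier to interpret once converted back to a plain probability. The sketch below assumes the value is a natural logarithm, which is not confirmed by this notebook; if a different base was used, the conversion would differ.

```python
import math

log_prob = -11.369197845458984  # value shown above for "Hello"

# Assumption: the estimate is a natural log, so exp() undoes it.
probability = math.exp(log_prob)

print(f"{probability:.2e}")
print(f"roughly 1 in {round(1 / probability):,} tokens")
```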

### sentiment

In [231]:
x.sentiment

0.0

### Brown cluster ID

In [195]:
x[0].cluster

1726

### vectors

In [196]:
king = x.vocab['king'].vector

### lemmatizing

In [20]:
for i in x:
    print(i,":",i.lemma_)
print(x[0].lemma_)

Hello : hello
, : ,
I : -PRON-
like : like
to : to
program : program
. : .
My : -PRON-
favorite : favorite
language : language
is : be
Python : python
. : .
hello


### parts of speech

In [198]:
for i in x:
    print(i,":",i.pos_)

Hello : INTJ
, : PUNCT
I : PRON
like : VERB
to : PART
program : VERB
. : PUNCT
My : ADJ
favorite : ADJ
language : NOUN
is : VERB
Python : PROPN
. : PUNCT


### entity types

In [207]:
x[0].ent_type

0

In [205]:
x[0].ent_iob_

'O'

`ent_iob_` is the string form of the integer code in `ent_iob`: <br>
0 = no tag is assigned. <br>
1 = `I` = inside an entity. <br>
2 = `O` = outside an entity. <br>
3 = `B` = begins an entity.
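
To show how these codes delimit entities, here is a small sketch in plain Python that groups tokens into spans from a list of integer IOB codes. The token list and codes are made up for illustration (not model output), and `entity_spans` is a hypothetical helper.

```python
IOB_CODES = {0: "", 1: "I", 2: "O", 3: "B"}


def entity_spans(tokens, iob_codes):
    """Group tokens into entity strings using integer IOB codes."""
    spans = []
    current = []
    for token, code in zip(tokens, iob_codes):
        tag = IOB_CODES[code]
        if tag == "B":
            # A new entity starts; close any entity still open.
            if current:
                spans.append(current)
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            # "O", no tag, or a stray "I": close any open entity.
            if current:
                spans.append(current)
            current = []
    if current:
        spans.append(current)
    return [" ".join(span) for span in spans]


tokens = ["I", "use", "Google", "Now", "and", "Python"]
codes = [2, 2, 3, 1, 2, 3]  # hypothetical tags for illustration
spans = entity_spans(tokens, codes)
print(spans)
```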

### dependency label

In [199]:
x[0].dep_

'intj'

### entities

In [201]:
for i in x.ents:
    print(i,i.label_)

Python PERSON


### noun phrases

In [21]:
for i in x.noun_chunks:
    print(i)

I
My favorite language
Python


### similarity

In [22]:
print(x[5],x[9])
x[5].similarity(x[9])

program language


0.3588502505070657
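
By default, spaCy's similarity score is the cosine similarity between the two word vectors. A minimal sketch of that computation in plain Python follows; the three-dimensional vectors are toy values invented for illustration (real word vectors have hundreds of dimensions), so the result will not match the score above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for an all-zero vector
    return dot / (norm_a * norm_b)


# Toy vectors, made up for illustration only.
program = [1.0, 2.0, 0.0]
language = [2.0, 1.0, 1.0]

score = cosine_similarity(program, language)
print(round(score, 4))
```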

### dependency trees

In [23]:
for i in x.sents:
    print(i.root)
    print(list(i.root.children))

like
[Hello, ,, I, program, .]
is
[language, Python, .]
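
A dependency parse can be viewed as a list of head pointers, one per token, where the root is the token that is its own head. The sketch below reproduces the first sentence's root and children from such a list in plain Python. The head indices are an assumption consistent with the output above (in particular, attaching `to` to `program` is a guess), and `find_root` and `children` are hypothetical helpers.

```python
tokens = ["Hello", ",", "I", "like", "to", "program", "."]

# heads[i] is the index of token i's head; the root points to itself.
# Assumed values, chosen to agree with the output shown above.
heads = [3, 3, 3, 3, 5, 3, 3]


def find_root(heads):
    """Return the index of the token that is its own head."""
    for i, head in enumerate(heads):
        if i == head:
            return i
    raise ValueError("no root found")


def children(heads, parent):
    """Return indices of tokens whose head is `parent`, in order."""
    return [i for i, head in enumerate(heads) if head == parent and i != parent]


root = find_root(heads)
root_children = [tokens[i] for i in children(heads, root)]
print(tokens[root])
print(root_children)
```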


### performance review

![spacy vs nltk image.png](attachment:image.png)

### matchers

In [24]:
x = spacy.load('en')
matcher = spacy.matcher.Matcher(x.vocab)

### entities

In [None]:
matcher.add_entity(
    "GoogleNow",
)

### patterns

In [None]:
from spacy.attrs import ORTH

In [None]:
matcher.has_entity("GoogleNow")

In [None]:
matcher.add_pattern(
    "GoogleNow", 
    [{ORTH: "Google"},
    {ORTH: "Now"}],
    label=None
)
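
A pattern is a list of per-token attribute constraints that must hold for consecutive tokens. To illustrate the idea only, here is a plain-Python sketch that scans a token list for such a pattern. It is not spaCy's matcher implementation: it uses string keys instead of the `spacy.attrs` constants, supports only `ORTH` and `LOWER`, and `find_matches` is a hypothetical helper.

```python
# Maps an attribute name to a function extracting it from a token string.
ATTR_GETTERS = {
    "ORTH": lambda token: token,
    "LOWER": lambda token: token.lower(),
}


def token_matches(token, spec):
    """True if the token satisfies every attribute constraint in spec."""
    return all(ATTR_GETTERS[attr](token) == value for attr, value in spec.items())


def find_matches(tokens, pattern):
    """Return (start, end) index pairs where the pattern matches."""
    if not pattern:
        return []
    n = len(pattern)
    matches = []
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        if all(token_matches(tok, spec) for tok, spec in zip(window, pattern)):
            matches.append((start, start + n))
    return matches


tokens = ["I", "asked", "Google", "Now", "about", "Google", "Maps"]
pattern = [{"ORTH": "Google"}, {"ORTH": "Now"}]
matches = find_matches(tokens, pattern)
print(matches)
```

Matching on `LOWER` instead of `ORTH` makes a pattern case-insensitive, for example `[{"LOWER": "google"}]`.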

In [27]:
spacy.attrs.ORTH
spacy.attrs.LOWER
spacy.attrs.POS
spacy.attrs.IS_ALPHA

1

### third party modules

sense2vec <br>
displaCy <br>
textacy <br>
spacyr <br>