## Installation and Setup

Installation is a two-step process. First, install spaCy using either conda or pip. Next, download the specific model you want, based on language.

### 1. From the command line or terminal:
> `conda install -c conda-forge spacy`
> <br>*or*<br>
> `pip install -U spacy`

> ### Alternatively, create and activate a virtual environment first, then run the install command above inside it:
> `conda create -n spacyenv python=3.6`<br>
> `conda activate spacyenv`

### 2. Next, also from the command line (admin rights or sudo are only needed if spaCy is installed system-wide rather than in a virtual environment):

> `python -m spacy download en_core_web_sm`

spaCy v3 no longer supports loading a model through the `en` shortcut, so use the full package name both here and in `spacy.load`.

> ### If successful, you should see a message like:

> **`Download and installation successful`**<br>
> `    You can now load the package via spacy.load('en_core_web_sm')`
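
Before loading the model in a notebook, it can help to confirm that both packages are importable from the active environment. The sketch below uses only the standard library; the helper name `is_installed` is hypothetical and not part of spaCy.

```python
import importlib.util


def is_installed(package_name):
    """Return True if `package_name` can be found on the current import path."""
    return importlib.util.find_spec(package_name) is not None


# Report on spaCy itself and on the English model package downloaded in step 2.
for name in ("spacy", "en_core_web_sm"):
    print(name, "->", "installed" if is_installed(name) else "missing")
```

If either line reports `missing`, the notebook kernel is probably running in a different environment from the one where the install commands were run.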


In [1]:
# Import spaCy and load the language library
import spacy

In [2]:
spacy.__version__

'3.2.4'

In [3]:
nlp = spacy.load('en_core_web_sm')

In [4]:
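# Create a Doc object
doc = nlp('Tesla is looking at buying U.S. startup for $6 million')
# Hindi sample text, roughly: "The name Dev means God, king, light, heavenly, cloud.
# The name Dev has special significance because it means God, king, light, heavenly."
# Note: the English pipeline loaded above is not trained on Hindi.
doc2 = nlp("आपको बता दें कि देव नाम का अर्थ भगवान, राजा, प्रकाश, स्वर्गीय, बादल होता है। देव नाम का खास महत्व है क्योंकि इसका मतलब भगवान, राजा, प्रकाश, स्वर्गीय")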

In [5]:
dir(doc)

['_',
 '__bytes__',
 '__class__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__pyx_vtable__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__unicode__',
 '_bulk_merge',
 '_context',
 '_get_array_attrs',
 '_realloc',
 '_vector',
 '_vector_norm',
 'cats',
 'char_span',
 'copy',
 'count_by',
 'doc',
 'ents',
 'extend_tensor',
 'from_array',
 'from_bytes',
 'from_dict',
 'from_disk',
 'from_docs',
 'get_extension',
 'get_lca_matrix',
 'has_annotation',
 'has_extension',
 'has_unknown_spaces',
 'has_vector',
 'is_nered',
 'is_parsed',
 'is_sentenced',
 'is_tagged',
 'lang',
 'lang_',
 'mem',
 'noun_chunks',
 'noun_chunks_iterator',
 'remove_extension',
 'retokenize',
 'sentiment',
 'sents',
 'set_ents',
 'set_

In [6]:
doc.text

'Tesla is looking at buying U.S. startup for $6 million'

In [7]:
for i in doc:
    print(i.text, i.pos, i.pos_, i.lang, i.lang_)

Tesla 96 PROPN 14626626061804382878 en
is 87 AUX 14626626061804382878 en
looking 100 VERB 14626626061804382878 en
at 85 ADP 14626626061804382878 en
buying 100 VERB 14626626061804382878 en
U.S. 96 PROPN 14626626061804382878 en
startup 100 VERB 14626626061804382878 en
for 85 ADP 14626626061804382878 en
$ 99 SYM 14626626061804382878 en
6 93 NUM 14626626061804382878 en
million 93 NUM 14626626061804382878 en


In [8]:
list(doc.sents)

[Tesla is looking at buying U.S. startup for $6 million]

In [9]:
# Print each token separately
# .pos:  part-of-speech ID (integer)
# .pos_: part-of-speech name
# .dep_: syntactic dependency label
for token in doc:
    print(token.text, token.pos, token.pos_, token.dep_)

Tesla 96 PROPN nsubj
is 87 AUX aux
looking 100 VERB ROOT
at 85 ADP prep
buying 100 VERB pcomp
U.S. 96 PROPN dobj
startup 100 VERB dep
for 85 ADP prep
$ 99 SYM quantmod
6 93 NUM compound
million 93 NUM pobj


In [23]:
nlp.pipeline

[('tok2vec', <spacy.pipeline.tok2vec.Tok2Vec at 0x1b9a643c9a8>),
 ('tagger', <spacy.pipeline.tagger.Tagger at 0x1b9a643cc48>),
 ('parser', <spacy.pipeline.dep_parser.DependencyParser at 0x1b9a6184898>),
 ('attribute_ruler',
  <spacy.pipeline.attributeruler.AttributeRuler at 0x1b9a6428b48>),
 ('lemmatizer', <spacy.lang.en.lemmatizer.EnglishLemmatizer at 0x1b9a6428988>),
 ('ner', <spacy.pipeline.ner.EntityRecognizer at 0x1b9a61849e8>)]

In [None]:
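# Stemming: strip suffixes with heuristic rules (the result need not be a dictionary word)
# happiness -> happi   (e.g. the Porter stemmer)
# loved -> love
# working -> work

# Lemmatization: map a word to its dictionary base form
# (spaCy provides lemmatization only; it does not include a stemmer)
# worked -> work
# mice -> mouse
# enjoying -> enjoy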
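
The difference between the two approaches can be sketched in plain Python. This is a toy illustration only, not spaCy's lemmatizer or any published stemming algorithm; `toy_stem`, `toy_lemmatize`, and the small lookup table are made up for the example.

```python
# Suffixes the toy stemmer is allowed to strip, checked in this order.
SUFFIXES = ("ing", "ed", "s")

# A tiny hand-written lemma dictionary (hypothetical data for the example).
LEMMA_TABLE = {
    "working": "work",
    "loved": "love",
    "mice": "mouse",
}


def toy_stem(word):
    """Chop off the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Look the word up in the dictionary; fall back to the word itself."""
    return LEMMA_TABLE.get(word, word)


for word in ("working", "loved", "mice"):
    print(word, "| stem:", toy_stem(word), "| lemma:", toy_lemmatize(word))
```

The stemmer turns `loved` into `lov`, which is not a real word, and leaves `mice` untouched, while the dictionary lookup returns `love` and `mouse`. That is the practical reason a lemmatizer needs vocabulary knowledge and a stemmer does not.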

In [None]:
nlp.pipe_names


## Tokenization

In [25]:
"hi my name     is dev".split()

['hi', 'my', 'name', 'is', 'dev']

In [31]:
doc2 = nlp("Tesla isn't         looking into startups anymore.")

for token in doc2:
    print(token.text, token.pos_, token.dep_)

Tesla PROPN nsubj
is AUX ROOT
n't PART neg
         SPACE attr
looking VERB advcl
into ADP prep
startups NOUN pobj
anymore ADV advmod
. PUNCT punct


In [27]:
a = "Tesla isn't   looking into startups anymore."

In [28]:
a.split()

['Tesla', "isn't", 'looking', 'into', 'startups', 'anymore.']
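
`str.split()` only cuts on whitespace, so `isn't` stays whole and the final period stays glued to `anymore.`. The following is a rough regex-based sketch of the extra splitting a tokenizer performs. It is an assumption-laden toy, not spaCy's tokenizer: `toy_tokenize` is a hypothetical helper, it handles only the `n't` contraction, and unlike spaCy it discards runs of whitespace instead of emitting a `SPACE` token.

```python
import re

# Order matters: try the contraction suffix first, then a word that is
# followed by "n't", then any other word, then a single punctuation mark.
TOKEN_RE = re.compile(r"n't|\w+(?=n't)|\w+|[^\w\s]")


def toy_tokenize(text):
    """Split text into words, the n't contraction, and punctuation marks."""
    return TOKEN_RE.findall(text)


text = "Tesla isn't   looking into startups anymore."
print(text.split())        # whitespace splitting only
print(toy_tokenize(text))  # contraction and punctuation split off
```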

In [None]:
doc2

In [29]:
doc2[0]

Tesla

In [30]:
type(doc2)

spacy.tokens.doc.Doc

___
## Part-of-Speech Tagging (POS)
For a full list of POS Tags visit https://spacy.io/api/annotation#pos-tagging

In [None]:
doc2[0].pos_


## Dependencies

For a full list of Syntactic Dependencies visit https://spacy.io/api/annotation#dependency-parsing


In [None]:
doc2[0].dep_

To see a human-readable description of a POS tag or dependency label, use `spacy.explain(tag)`:

In [None]:
spacy.explain('PROPN')

In [None]:
spacy.explain('nsubj')

___
## Additional Token Attributes


|Attribute|Description|Value for `doc2[0]`|
|:------|:------:|:------|
|`.text`|The original word text|`Tesla`|
|`.lemma_`|The base form of the word|`tesla`|
|`.pos_`|The simple part-of-speech tag|`PROPN`/`proper noun`|
|`.tag_`|The detailed part-of-speech tag|`NNP`/`noun, proper singular`|
|`.shape_`|The word shape (capitalization, punctuation, digits)|`Xxxxx`|
|`.is_alpha`|Does the token consist only of alphabetic characters?|`True`|
|`.is_stop`|Is the token part of a stop list, i.e. the most common words of the language?|`False`|

In [None]:
# Lemmas (the base form of the word):
print(doc2[4].text)
print(doc2[4].lemma_)

In [None]:
# Simple Parts-of-Speech & Detailed Tags:
print(doc2[4].pos_)
print(doc2[4].tag_ + ' / ' + spacy.explain(doc2[4].tag_))

In [None]:
# Word Shapes:
print(doc2[0].text + ': ' + doc2[0].shape_)
# doc[5] is the token 'U.S.' from the first Doc created above
print(doc[5].text + ': ' + doc[5].shape_)
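
To make the idea of a word shape concrete, here is a plain-Python sketch that approximates it: uppercase letters become `X`, lowercase letters become `x`, digits become `d`, other characters are kept, and runs of the same shape character are capped at four. The function name `word_shape` and the cap of four are assumptions chosen to match the `Xxxxx` value in the table above; treat it as an illustration, not as spaCy's own code.

```python
def word_shape(text, max_run=4):
    """Map each character to a shape symbol and cap repeated symbols at max_run."""
    shape = []
    last = ""
    run = 0
    for ch in text:
        if ch.isalpha():
            mapped = "X" if ch.isupper() else "x"
        elif ch.isdigit():
            mapped = "d"
        else:
            mapped = ch  # punctuation and symbols are kept as-is
        if mapped == last:
            run += 1
        else:
            last = mapped
            run = 1
        if run <= max_run:
            shape.append(mapped)
    return "".join(shape)


for word in ("Tesla", "U.S.", "$6", "startups"):
    print(word, "->", word_shape(word))
```

Capping the run length means long lowercase words such as `startups` collapse to `xxxx`, so the shape captures the pattern of a token without encoding its exact length.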

In [None]:
# Boolean Values:
print(doc2[0].is_alpha)
print(doc2[0].is_stop)


## Spans


In [None]:
doc3 = nlp('Although commonly attributed to John Lennon from his song "Beautiful Boy", \
the phrase "Life is what happens to us while we are making other plans" was written by \
cartoonist Allen Saunders and published in Reader\'s Digest in 1957, when Lennon was 17.')

In [None]:
life_quote = doc3[16:30]
print(life_quote)

In [None]:
type(life_quote)

In [None]:
type(doc3)


## Sentences


In [None]:
doc4 = nlp('This is the first sentence. This is another sentence. This is the last sentence.')

In [None]:
doc4[0]

In [None]:
for sent in doc4.sents:
    print(sent)

In [None]:
doc4[6]

In [None]:
doc4[6].is_sent_start

In [None]:
doc4[7]

In [None]:
doc4[7].is_sent_start
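
`is_sent_start` records, per token, whether that token opens a sentence. The plain-Python sketch below imitates the flag with a simple punctuation rule. It rests on two assumptions: the token list is hand-written to match how `doc4` is expected to be tokenized, and the hypothetical helper `sentence_starts` splits only on terminal punctuation, whereas the pipeline loaded above derives boundaries from its dependency parser.

```python
# Hand-written tokens for the doc4 text (assumed tokenization).
tokens = [
    "This", "is", "the", "first", "sentence", ".",
    "This", "is", "another", "sentence", ".",
    "This", "is", "the", "last", "sentence", ".",
]


def sentence_starts(tokens, terminators=(".", "!", "?")):
    """Return one boolean per token: True if the token begins a sentence."""
    starts = []
    at_start = True  # the first token always opens a sentence
    for tok in tokens:
        starts.append(at_start)
        # The token after a terminator opens the next sentence.
        at_start = tok in terminators
    return starts


starts = sentence_starts(tokens)
print(tokens[6], starts[6])  # first token of the second sentence
print(tokens[7], starts[7])  # a token in the middle of a sentence
```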