## <p style = 'text-align: center'>Introduction to spaCy</p>
---

In [38]:
import spacy

spaCy provides pretrained pipelines for many languages, including German and Chinese. The small pipelines for a few of them are:

- English : *en_core_web_sm*
- Spanish : *es_core_news_sm*
- German : *de_core_news_sm*
- French : *fr_core_news_sm*
- Dutch : *nl_core_news_sm*

In [None]:
nlp = spacy.load('en_core_web_sm')
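
`spacy.load` raises an `OSError` when the requested model package is not installed, which is a common stumbling block on a fresh environment. The following is a minimal sketch of a defensive load. The fallback to a blank English pipeline is an assumption made for illustration: a blank pipeline only tokenizes, so entities and noun chunks would not be available with it.

```python
import spacy

MODEL_NAME = "en_core_web_sm"

try:
    nlp = spacy.load(MODEL_NAME)
except OSError:
    # The model package is not installed. It can be installed with:
    #   python -m spacy download en_core_web_sm
    # Illustrative fallback (assumption): a tokenizer-only English pipeline.
    nlp = spacy.blank("en")

# Names of the components that run on each text, in order.
print(nlp.pipe_names)
```

With the trained model loaded, `nlp.pipe_names` lists the components the pipeline runs; with the blank fallback the list is empty.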

We create a new document by passing a string to the `nlp` object:

In [25]:
text = "Apple Inc. is based in Cupertino, California, and it was founded by Steve Jobs."
doc = nlp(text)

In spaCy, the `Doc` object represents a processed text and gives access to its linguistic annotations. Some commonly used attributes, properties, and methods of `Doc`:

**text**: The original text of the document.

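**ents**: A tuple of the named entities (`Span` objects) found in the text.

**sents**: An iterator over the sentences (`Span` objects) in the document.

**Tokens**: `Doc` has no `tokens` attribute. Iterate over the `Doc` itself (`for token in doc`) or index it (`doc[0]`) to get `Token` objects, each representing a word or punctuation mark in the text.

**noun_chunks**: An iterator over the base noun phrases (`Span` objects) in the text. It requires a dependency parse.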

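**vector**: The document's vector representation, by default the average of its token vectors. It is only meaningful when the pipeline includes word vectors, which the small (`_sm`) pipelines do not.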

**vector_norm**: The L2 norm of the document's vector.

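**is_parsed**: A Boolean value indicating whether the text has been syntactically parsed. In spaCy v3, use `doc.has_annotation("DEP")` instead.

**is_tagged**: A Boolean value indicating whether part-of-speech tagging has been performed. In spaCy v3, use `doc.has_annotation("TAG")` instead.

**is_nered**: A Boolean value indicating whether named entity recognition (NER) has been performed. In spaCy v3, use `doc.has_annotation("ENT_IOB")` instead.

**has_annotation()**: A method that takes an attribute name, such as `"DEP"` or `"TAG"`, and returns a Boolean indicating whether the document has that annotation.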

**user_data**: A dictionary where custom data can be stored.

**vocab**: The vocabulary of the language model used for tokenization and linguistic analysis.

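**lang**: The language of the document as an integer ID. The string code (for example `en`) is available as `lang_`.

**cats**: A dictionary mapping category labels to scores, filled in when a text classifier is part of the pipeline.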

**similarity()**: A method for computing the similarity between two documents.
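
A few of these attributes can be tried without a trained model. The sketch below assumes a blank English pipeline with the rule-based `sentencizer` component added, so it only has tokens and sentence boundaries; the names `nlp_blank` and `demo_doc` and the `user_data` key are hypothetical and chosen to avoid clashing with the notebook's own variables.

```python
import spacy

# Assumption: a blank English pipeline plus the rule-based sentencizer,
# so the example runs without downloading a trained model.
nlp_blank = spacy.blank("en")
nlp_blank.add_pipe("sentencizer")

text = "Apple Inc. is based in Cupertino, California, and it was founded by Steve Jobs."
demo_doc = nlp_blank(text)

# Tokens come from iterating over the Doc itself.
tokens = [token.text for token in demo_doc]
print(tokens)

# sents is a generator, so materialize it before counting or indexing.
sentences = list(demo_doc.sents)
print(len(sentences))

# has_annotation takes an attribute name and reports whether it is set.
print(demo_doc.has_annotation("SENT_START"))  # set by the sentencizer
print(demo_doc.has_annotation("DEP"))         # this pipeline has no parser

# user_data is a plain dictionary for custom metadata.
demo_doc.user_data["source"] = "notebook demo"  # hypothetical key and value
print(demo_doc.lang_, demo_doc.user_data)
```

Attributes that depend on trained components, such as `ents`, `noun_chunks`, and `vector`, need a pipeline that provides them, for example the `en_core_web_sm` model loaded earlier.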

In [43]:
# Each noun chunk is a Span; label_ is the phrase type (NP for noun phrase).
for chunk in doc.noun_chunks:
    print(f"{chunk.text} : {chunk.label_}")

Apple Inc. : NP
Cupertino : NP
California : NP
it : NP
Steve Jobs : NP


In [45]:
doc.ents

(Apple Inc., Cupertino, California, Steve Jobs)

In [36]:
for entity in doc.ents:
    print(f'{entity.text} : {entity.label_}')

Apple Inc. : ORG
Cupertino : GPE
California : GPE
Steve Jobs : PERSON
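
The entity labels are short codes, and `spacy.explain` can look up a brief description for each of them. A small sketch using the three labels from the output above:

```python
import spacy

# Labels taken from the entity output above.
for label in ("ORG", "GPE", "PERSON"):
    # spacy.explain returns a short human-readable description of a label.
    print(f"{label} : {spacy.explain(label)}")
```

The same function also works for part-of-speech tags and dependency labels, which helps when reading annotations from an unfamiliar pipeline.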
