<a href="https://colab.research.google.com/github/LonelyFriday/spacyBasic/blob/main/spacyBasic.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Overview

This notebook uses spaCy to:

*   Load an English language model.
*   Process a given text to analyze its structure and content.
*   Perform tokenization to break the text into tokens (words and punctuation).
*   Apply part-of-speech (POS) tagging to each token.
*   Run dependency parsing to understand the grammatical structure.
*   Identify named entities (like companies, locations, amounts) within the text.






In [None]:
import spacy

# Load the small English pipeline: tokenizer, tagger, parser, and NER
# (the "sm" model does not ship with static word vectors)
nlp = spacy.load("en_core_web_sm")

# Process a text
text = "Apple is looking at buying U.K. startup for $1 billion"
doc = nlp(text)

# Tokenization, POS tagging, and Dependency Parsing
print("Token\t\tLemma\t\tPOS\t\tTag\t\tDep\t\tShape\t\tis_alpha\tis_stop")
for token in doc:
    print(f"{token.text}\t\t{token.lemma_}\t\t{token.pos_}\t\t{token.tag_}\t\t{token.dep_}\t\t{token.shape_}\t\t{token.is_alpha}\t\t{token.is_stop}")

# Named Entity Recognition
print("\nNamed Entities:")
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)


Token		Lemma		POS		Tag		Dep		Shape		is_alpha	is_stop
Apple		Apple		PROPN		NNP		nsubj		Xxxxx		True		False
is		be		AUX		VBZ		aux		xx		True		True
looking		look		VERB		VBG		ROOT		xxxx		True		False
at		at		ADP		IN		prep		xx		True		True
buying		buy		VERB		VBG		pcomp		xxxx		True		False
U.K.		U.K.		PROPN		NNP		dobj		X.X.		False		False
startup		startup		NOUN		NN		dep		xxxx		True		False
for		for		ADP		IN		prep		xxx		True		True
$		$		SYM		$		quantmod		$		False		False
1		1		NUM		CD		compound		d		False		False
billion		billion		NUM		CD		pobj		xxxx		True		False

Named Entities:
Apple 0 5 ORG
U.K. 27 31 GPE
$1 billion 44 54 MONEY


#1. Loading a Language Model:

> `en_core_web_sm` is spaCy's small English pipeline, trained on web text. Load it by name:



In [None]:
import spacy
nlp = spacy.load("en_core_web_sm")
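
> `spacy.load` raises an `OSError` when the named pipeline is not installed. The sketch below wraps the call in a hypothetical helper, `load_pipeline` (not part of spaCy), that turns this error into an install hint. It assumes the pipeline would be installed with spaCy's `download` command.

```python
import spacy


def load_pipeline(name: str = "en_core_web_sm"):
    """Load a spaCy pipeline, or fail with an install hint (hypothetical helper)."""
    try:
        return spacy.load(name)
    except OSError as err:
        # spacy.load raises OSError when the pipeline package cannot be found.
        raise RuntimeError(
            f"Pipeline '{name}' is not installed. "
            f"Try: python -m spacy download {name}"
        ) from err
```

> In this notebook, `nlp = load_pipeline()` could stand in for the direct `spacy.load` call.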

#2. Processing Text:
> Calling `nlp` on a string runs every pipeline component and returns a `Doc` object. The `Doc` holds the tokens together with their annotations, which the following sections read.

In [None]:
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

#3. Tokenization:
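> Tokenization splits the text into tokens: words, punctuation, and symbols. In the output, `U.K.` stays a single token, while `$1` is split into `$` and `1`.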

In [None]:
for token in doc:
    print(token.text)

Apple
is
looking
at
buying
U.K.
startup
for
$
1
billion
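
> Tokenization is rule-based, so it can be tried without a trained model. The sketch below assumes a blank English pipeline (`spacy.blank("en")`), which contains only the tokenizer and should split this sentence the same way. It prints each token's index and character offset, then checks that the original text can be rebuilt from the tokens. The names `nlp_blank` and `doc_blank` are illustrative.

```python
import spacy

# A blank pipeline contains only the English tokenizer,
# so no trained model needs to be installed for this check.
nlp_blank = spacy.blank("en")
text = "Apple is looking at buying U.K. startup for $1 billion"
doc_blank = nlp_blank(text)

for token in doc_blank:
    # token.i is the token index; token.idx is its character offset in the text
    print(f"{token.i:>2} {token.idx:>3} {token.text!r:<10} ws={token.whitespace_!r}")

# Tokenization is non-destructive: joining text_with_ws restores the input.
restored = "".join(token.text_with_ws for token in doc_blank)
print(restored == text)
```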


#4. Part-of-Speech Tagging:
> spaCy also annotates each token with its part of speech; `token.pos_` holds the coarse-grained tag.

In [None]:
for token in doc:
    print(f'{token.text:<10}, {token.pos_:>10}')

Apple     ,      PROPN
is        ,        AUX
looking   ,       VERB
at        ,        ADP
buying    ,       VERB
U.K.      ,      PROPN
startup   ,       NOUN
for       ,        ADP
$         ,        SYM
1         ,        NUM
billion   ,        NUM
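
> The tag abbreviations can be hard to read at first. As a small sketch, `spacy.explain` looks up a short description for a label. The list below is copied by hand from the outputs in this notebook: the coarse tags from `token.pos_` and the fine-grained tags from `token.tag_` printed in the first cell.

```python
import spacy

# Labels copied from the outputs above: coarse POS tags (token.pos_)
# followed by fine-grained tags (token.tag_).
labels = ["PROPN", "AUX", "VERB", "ADP", "NOUN", "SYM", "NUM",
          "NNP", "VBZ", "VBG", "CD"]

for label in labels:
    # spacy.explain returns a short description, or None for unknown labels
    print(f"{label:<6} {spacy.explain(label)}")
```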


#5. Named Entity Recognition (NER):
> spaCy can recognize named entities in the text, such as organizations, geopolitical entities, and monetary amounts.

In [None]:
for ent in doc.ents:
    print(f'{ent.text:<10}, {ent.label_:>10}')

Apple     ,        ORG
U.K.      ,        GPE
$1 billion,      MONEY
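
> The first cell also prints `ent.start_char` and `ent.end_char` for each entity. These are character offsets into the original string, with an exclusive end. The sketch below uses plain Python and the offsets copied from that output to show that slicing the text recovers each entity.

```python
text = "Apple is looking at buying U.K. startup for $1 billion"

# (start_char, end_char, label) triples copied from the first cell's output
entities = [(0, 5, "ORG"), (27, 31, "GPE"), (44, 54, "MONEY")]

for start, end, label in entities:
    # end_char is exclusive, so a plain slice recovers the entity text
    print(f"{text[start:end]:<10}, {label:>10}")

spans = [text[start:end] for start, end, _ in entities]
```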


#6. Dependency Parsing:
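> The dependency parse links each token to its syntactic head (`token.head`) with a relation label (`token.dep_`). The root of the sentence is its own head.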

In [None]:
for token in doc:
    print(f'{token.text:<10}, {token.dep_:<10}, {token.head.text:>10}')

Apple     , nsubj     ,    looking
is        , aux       ,    looking
looking   , ROOT      ,    looking
at        , prep      ,    looking
buying    , pcomp     ,         at
U.K.      , dobj      ,     buying
startup   , dep       ,    looking
for       , prep      ,    startup
$         , quantmod  ,    billion
1         , compound  ,    billion
billion   , pobj      ,        for
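
> The head links form a tree rooted at `looking`. The sketch below rebuilds that tree in plain Python from the (token, dep, head) rows printed above and follows the head links from one token up to the root. Using token text as the key is an assumption that only works because every token in this sentence is unique. On a real `Doc`, spaCy exposes the same structure through `token.head`, `token.children`, and `token.ancestors`.

```python
# (token, dep, head) triples copied from the output above
arcs = [
    ("Apple", "nsubj", "looking"),
    ("is", "aux", "looking"),
    ("looking", "ROOT", "looking"),
    ("at", "prep", "looking"),
    ("buying", "pcomp", "at"),
    ("U.K.", "dobj", "buying"),
    ("startup", "dep", "looking"),
    ("for", "prep", "startup"),
    ("$", "quantmod", "billion"),
    ("1", "compound", "billion"),
    ("billion", "pobj", "for"),
]

# Token texts are unique in this sentence, so they can serve as keys.
head_of = {token: head for token, _, head in arcs}


def path_to_root(token):
    """Follow head links until reaching the token that is its own head."""
    path = [token]
    while head_of[path[-1]] != path[-1]:
        path.append(head_of[path[-1]])
    return path


print(" -> ".join(path_to_root("billion")))  # billion -> for -> startup -> looking

# Direct dependents of the root, excluding the root's self-link
children = [tok for tok, _, head in arcs if head == "looking" and tok != "looking"]
print(children)  # ['Apple', 'is', 'at', 'startup']
```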
