**spaCy Package:** spaCy is a free, open-source library for natural language processing (NLP) in Python, with its performance-critical parts written in Cython. It is designed to make it easy to build systems that process and analyze text.

In [None]:
# Install the spaCy package
!pip install spacy



In [None]:
import spacy
# Download the small English pipeline, then load it
!python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


**spacy.load("en_core_web_sm"):** Loads the small version of spaCy's English pipeline. This pipeline handles tokenization, part-of-speech tagging, named entity recognition, dependency parsing, and more.

In [None]:
doc = nlp("It is a shocking to find how many people do not believe they can learn, and how many more believe learning to be difficult.")

In [None]:
# Tokenization
for token in doc:
  print(token.text)

It
is
a
shocking
to
find
how
many
people
do
not
believe
they
can
learn
,
and
how
many
more
believe
learning
to
be
difficult
.


In [None]:
# Get the token at index 10 (the 11th token, because indexing starts at 0)
token = doc[10]
token

not

In [None]:
# Number of tokens in the doc (punctuation marks count as tokens)
len(doc)

26

In [None]:
# Slice the tokens at indices 18 through 25 (the end index 26 is exclusive)
span = doc[18:26]
span

many more believe learning to be difficult.
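The indexing and slicing steps above can be sketched in one place. This is a minimal illustration, assuming a blank English pipeline (`spacy.blank("en")`, tokenizer only) is enough, since indexing depends only on tokenization. The names `blank_nlp`, `sample`, `single`, `tail`, and `last` are made up for this sketch and do not appear in the cells above.

```python
import spacy
from spacy.tokens import Span, Token

# Assumption: a blank English pipeline (tokenizer only) is sufficient here,
# because indexing and slicing depend only on how the text is tokenized.
blank_nlp = spacy.blank("en")
sample = blank_nlp(
    "It is a shocking to find how many people do not believe they can learn, "
    "and how many more believe learning to be difficult."
)

single = sample[10]    # an integer index returns a Token
tail = sample[18:26]   # a slice returns a Span; the end index is exclusive
last = sample[-1]      # negative indices count from the end, as with lists

print(type(single).__name__, "|", single.text)
print(type(tail).__name__, "|", tail.text, "|", len(tail), "tokens")
print(type(last).__name__, "|", last.text)
```

A single index gives a `Token`, while a slice gives a `Span`, which is why the sliced result prints as a piece of text rather than as one word.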

In [None]:
# Print the index of each token
for token in doc:
  print(token.i, token.text)

0 It
1 is
2 a
3 shocking
4 to
5 find
6 how
7 many
8 people
9 do
10 not
11 believe
12 they
13 can
14 learn
15 ,
16 and
17 how
18 many
19 more
20 believe
21 learning
22 to
23 be
24 difficult
25 .


In [None]:
# Part-of-speech (POS) tag of each token
for token in doc:
  print(token.i, token.text, token.pos_)

0 It PRON
1 is AUX
2 a DET
3 shocking NOUN
4 to PART
5 find VERB
6 how SCONJ
7 many ADJ
8 people NOUN
9 do AUX
10 not PART
11 believe VERB
12 they PRON
13 can AUX
14 learn VERB
15 , PUNCT
16 and CCONJ
17 how SCONJ
18 many ADJ
19 more ADJ
20 believe VERB
21 learning VERB
22 to PART
23 be AUX
24 difficult ADJ
25 . PUNCT
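The tag abbreviations in the output above can be hard to read at first. As a small sketch, `spacy.explain` looks up a short description for a label in spaCy's built-in glossary. The list `pos_tags` and the dictionary `descriptions` are names chosen for this illustration only.

```python
import spacy

# Coarse-grained POS tags that appear in the output above
pos_tags = ["PRON", "AUX", "DET", "NOUN", "PART", "VERB",
            "SCONJ", "ADJ", "CCONJ", "PUNCT"]

# spacy.explain looks each label up in spaCy's glossary; no model is needed
descriptions = {tag: spacy.explain(tag) for tag in pos_tags}

for tag, description in descriptions.items():
    print(f"{tag:6} {description}")
```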


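**NER (Named Entity Recognition):** Finds named entities in the text, such as organizations, places, and monetary amounts.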

In [None]:
# Sample text
text =  "Apple is looking at buying U.K. startup for $1 billion."

# Process the text
doc = nlp(text)

# NER already ran inside nlp(text); print the entities it found
for ent in doc.ents:
    print(ent.text, ent.label_)


Apple ORG
U.K. GPE
$1 billion MONEY
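Each entity is a span, so it also records where it sits in the text. The sketch below prints character offsets and token indices for every entity. It assumes `en_core_web_sm` is installed. The fallback to a blank pipeline is an addition for this illustration, so the cell still runs without the model, and in that case no entities are printed. The names `ner_nlp`, `ner_text`, and `ner_doc` are hypothetical.

```python
import spacy

try:
    ner_nlp = spacy.load("en_core_web_sm")
except OSError:
    # Model not installed: fall back to a blank pipeline. It has no NER
    # component, so ner_doc.ents is empty and the loop prints nothing.
    ner_nlp = spacy.blank("en")

ner_text = "Apple is looking at buying U.K. startup for $1 billion."
ner_doc = ner_nlp(ner_text)

for ent in ner_doc.ents:
    # start_char / end_char are character offsets into the original string;
    # start / end are token indices into the Doc (end is exclusive)
    print(ent.text, ent.label_, ent.start_char, ent.end_char, ent.start, ent.end)
```

Slicing the original string with `ner_text[ent.start_char:ent.end_char]` returns the entity text again, which helps when highlighting entities in the source text.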


**Matcher:** The Matcher lets you find words and phrases using rules that describe their token attributes.

In [None]:
# The Matcher ships with spaCy, so no extra package needs to be installed
# (the "matchers" package on PyPI is an unrelated library)
from spacy.matcher import Matcher


In [None]:
import spacy
from spacy.matcher import Matcher

# Load the small English model
nlp = spacy.load("en_core_web_sm")

# Sample text
text = "Apple is looking at buying U.K. startup for $1 billion."
doc = nlp(text)

# Initialize the matcher with the shared vocab
matcher = Matcher(nlp.vocab)

# Define a pattern that matches the single token 'startup'
pattern = [{"TEXT": "startup"}]

# Add the pattern to the matcher
matcher.add("buy_PATTERN", [pattern])

# Apply the matcher to the doc
matches = matcher(doc)

# Print the match results
for match_id, start, end in matches:
    matched_span = doc[start:end]
    print(matched_span.text)


startup


In [None]:
# Exercise: pattern matching
doc = nlp("2018 FiFa World Cup: France won!")

In [None]:

# Match a digit token followed by "fifa" and "world" in any casing
pattern = [{'IS_DIGIT': True}, {'LOWER': 'fifa'}, {'LOWER': 'world'}]
matcher2 = Matcher(nlp.vocab)
matcher2.add('FiFa_pattern', [pattern])
matches = matcher2(doc)


In [None]:
for match_id,start,end in matches:
  matched_span = doc[start:end]
  print(matched_span.text)



2018 FiFa World
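The exercise pattern stops at "World". A slightly longer sketch, under the assumption that a blank English pipeline is enough (the pattern only uses the lexical attributes `IS_DIGIT` and `LOWER`), adds "cup" and shows how to turn the numeric `match_id` back into the rule name through `vocab.strings`. The names `match_nlp`, `name_matcher`, `texts`, `found`, and the rule name `FIFA_WORLD_CUP` are made up for this illustration.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: lexical attributes need only the tokenizer, so a blank
# English pipeline is used instead of the full en_core_web_sm model.
match_nlp = spacy.blank("en")
name_matcher = Matcher(match_nlp.vocab)

# One dict per token: a digit token, then "fifa", "world", "cup" in any casing
pattern = [
    {"IS_DIGIT": True},
    {"LOWER": "fifa"},
    {"LOWER": "world"},
    {"LOWER": "cup"},
]
name_matcher.add("FIFA_WORLD_CUP", [pattern])

texts = [
    "2018 FiFa World Cup: France won!",
    "The 2022 FIFA WORLD CUP was held in Qatar.",
]

found = []
for raw in texts:
    current = match_nlp(raw)
    for match_id, start, end in name_matcher(current):
        # match_id is a hash; vocab.strings maps it back to the rule name
        rule_name = match_nlp.vocab.strings[match_id]
        found.append((rule_name, start, end, current[start:end].text))

for row in found:
    print(row)
```

Because `LOWER` compares against the lowercased token text, the same rule matches both "FiFa World Cup" and "FIFA WORLD CUP".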
