
In [1]:
import nltk

In [2]:
text = "Ben relocated to Paris last year to pursue his passion. His currently enrolled in a comprehensive course on Natural Language Processing"
text

'Ben relocated to Paris last year to pursue his passion. His currently enrolled in a comprehensive course on Natural Language Processing'

### punkt
Punkt is NLTK's unsupervised sentence tokenizer. It learns abbreviations, collocations, and frequent sentence starters from raw, unlabeled text, which lets it handle many text types and languages, and it can be retrained on a custom corpus to improve accuracy for a specific domain. `word_tokenize` depends on it: the text is first split into sentences with Punkt, and each sentence is then split into words by a separate Treebank-style tokenizer.
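
To see the problem Punkt addresses, here is a minimal sketch of a naive splitter that breaks after every `.`, `!`, or `?` followed by whitespace. The function name `naive_sentence_split` and the sample sentence are made up for illustration and are not part of NLTK. The notebook's own text splits cleanly, but an abbreviation such as "Dr." is wrongly treated as a sentence end.

```python
import re
from typing import List


def naive_sentence_split(text: str) -> List[str]:
    """Split after '.', '!' or '?' when followed by whitespace (illustrative only)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


# The notebook's text: two sentences, split correctly.
text = ("Ben relocated to Paris last year to pursue his passion. "
        "His currently enrolled in a comprehensive course on Natural Language Processing")
print(naive_sentence_split(text))

# A hypothetical sample with an abbreviation: the naive rule over-splits at "Dr.".
sample = "Dr. Smith moved to Paris. He studies NLP."
print(naive_sentence_split(sample))
```

Punkt avoids this kind of over-splitting by estimating from data which period-terminated tokens are likely abbreviations, rather than relying on a fixed punctuation rule.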

In [3]:
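# Download the pre-trained Punkt models used by word_tokenize.
# Recent NLTK releases may look for the 'punkt_tab' resource instead;
# download that one if word_tokenize raises a LookupError.
nltk.download('punkt')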

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [4]:
from nltk.tokenize import word_tokenize
tokens = word_tokenize(text)
print(tokens)

['Ben', 'relocated', 'to', 'Paris', 'last', 'year', 'to', 'pursue', 'his', 'passion', '.', 'His', 'currently', 'enrolled', 'in', 'a', 'comprehensive', 'course', 'on', 'Natural', 'Language', 'Processing']


In [5]:
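# Stemming: strip suffixes by rule. Stems are lowercased and need not be
# dictionary words (e.g. 'Paris' -> 'pari', 'his' -> 'hi').
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()
stemmed_words = [stemmer.stem(token) for token in tokens]
print(stemmed_words)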

['ben', 'reloc', 'to', 'pari', 'last', 'year', 'to', 'pursu', 'hi', 'passion', '.', 'hi', 'current', 'enrol', 'in', 'a', 'comprehens', 'cours', 'on', 'natur', 'languag', 'process']
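
The stems `'pari'` and `'hi'` in the output above come from a plural-handling rule that treats a trailing "s" as a suffix, regardless of whether the word is a proper noun or a pronoun. The following is a toy sketch loosely modelled on the first step of the Porter algorithm. It is not NLTK's implementation, and the function name `strip_plural_s` is hypothetical.

```python
def strip_plural_s(word: str) -> str:
    """Toy plural-stripping rule (illustrative; the real Porter stemmer has many more steps)."""
    w = word.lower()          # PorterStemmer also lowercases by default
    if w.endswith("sses"):
        return w[:-2]         # caresses -> caress
    if w.endswith("ies"):
        return w[:-2]         # ponies -> poni
    if w.endswith("ss"):
        return w              # process -> process (unchanged by this rule)
    if w.endswith("s"):
        return w[:-1]         # paris -> pari, his -> hi
    return w


for token in ["Paris", "his", "passion", "caresses"]:
    print(token, "->", strip_plural_s(token))
```

Because the rule is purely orthographic, a stemmer is best used where conflating word forms matters more than readability, such as search indexing.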


In [6]:
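# Part-of-speech tagging
from nltk import pos_tag

# Recent NLTK releases may ask for 'averaged_perceptron_tagger_eng' instead
nltk.download('averaged_perceptron_tagger')

# pos_tag returns a list of (token, Penn Treebank tag) tuples
text_pos_tag = pos_tag(tokens)
print(text_pos_tag)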

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


[('Ben', 'NNP'), ('relocated', 'VBD'), ('to', 'TO'), ('Paris', 'NNP'), ('last', 'JJ'), ('year', 'NN'), ('to', 'TO'), ('pursue', 'VB'), ('his', 'PRP$'), ('passion', 'NN'), ('.', '.'), ('His', 'PRP$'), ('currently', 'RB'), ('enrolled', 'VBN'), ('in', 'IN'), ('a', 'DT'), ('comprehensive', 'JJ'), ('course', 'NN'), ('on', 'IN'), ('Natural', 'NNP'), ('Language', 'NNP'), ('Processing', 'NNP')]
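
Since the tagger returns plain `(token, tag)` tuples, the result can be filtered or grouped with ordinary Python. The sketch below copies the tagged output above into a literal (so it runs without NLTK) and groups tokens by tag. The variable names `tagged` and `by_tag` are chosen for illustration.

```python
from collections import defaultdict

# Tagged tokens copied from the pos_tag output above.
tagged = [('Ben', 'NNP'), ('relocated', 'VBD'), ('to', 'TO'), ('Paris', 'NNP'),
          ('last', 'JJ'), ('year', 'NN'), ('to', 'TO'), ('pursue', 'VB'),
          ('his', 'PRP$'), ('passion', 'NN'), ('.', '.'), ('His', 'PRP$'),
          ('currently', 'RB'), ('enrolled', 'VBN'), ('in', 'IN'), ('a', 'DT'),
          ('comprehensive', 'JJ'), ('course', 'NN'), ('on', 'IN'),
          ('Natural', 'NNP'), ('Language', 'NNP'), ('Processing', 'NNP')]

# Group tokens by their tag, preserving the order in which they appear.
by_tag = defaultdict(list)
for token, tag in tagged:
    by_tag[tag].append(token)

print("Proper nouns:", by_tag['NNP'])
print("Common nouns:", by_tag['NN'])
```

In the notebook itself, `text_pos_tag` can be used in place of the `tagged` literal.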


In [7]:
# Look up the meaning of a Penn Treebank tag, with example words
nltk.download('tagsets')

nltk.help.upenn_tagset("NNP")

NNP: noun, proper, singular
    Motown Venneboerger Czestochwa Ranzer Conchita Trumplane Christos
    Oceanside Escobar Kreisler Sawyer Cougar Yvette Ervin ODI Darryl CTCA
    Shannon A.K.C. Meltex Liverpool ...


[nltk_data] Downloading package tagsets to /root/nltk_data...
[nltk_data]   Package tagsets is already up-to-date!


In [8]:
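# Named Entity Recognition
# The chunker needs its own model plus the 'words' corpus
nltk.download('maxent_ne_chunker')
nltk.download('words')
# ne_chunk takes POS-tagged tokens and returns an nltk.Tree in which
# named entities are labelled subtrees (PERSON, GPE, ORGANIZATION, ...)
entities = nltk.ne_chunk(text_pos_tag)
print(entities)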

[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package maxent_ne_chunker is already up-to-date!
[nltk_data] Downloading package words to /root/nltk_data...
[nltk_data]   Package words is already up-to-date!


(S
  (PERSON Ben/NNP)
  relocated/VBD
  to/TO
  (GPE Paris/NNP)
  last/JJ
  year/NN
  to/TO
  pursue/VB
  his/PRP$
  passion/NN
  ./.
  His/PRP$
  currently/RB
  enrolled/VBN
  in/IN
  a/DT
  comprehensive/JJ
  course/NN
  on/IN
  (ORGANIZATION Natural/NNP Language/NNP)
  Processing/NNP)


In [9]:
import spacy

# Download the large English model (about 588 MB); needed once per environment
!python -m spacy download en_core_web_lg

Collecting en-core-web-lg==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.7.1/en_core_web_lg-3.7.1-py3-none-any.whl (587.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 587.7/587.7 MB 1.1 MB/s eta 0:00:00
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_lg')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


In [10]:
# Load the large English model, which includes word vectors
nlp = spacy.load('en_core_web_lg')

# Run each word through the pipeline; each call returns a Doc
word1 = nlp("king")
word2 = nlp("queen")
word3 = nlp("apple")

# Doc.similarity compares the documents' vectors (cosine similarity by default)
similarity1 = word1.similarity(word2)
similarity2 = word1.similarity(word3)

# Display the similarities
print(f"Similarity between 'king' and 'queen': {similarity1:.2f}")
print(f"Similarity between 'king' and 'apple': {similarity2:.2f}")

Similarity between 'king' and 'queen': 0.61
Similarity between 'king' and 'apple': 0.20
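
By default, spaCy's `similarity` is the cosine of the angle between two vectors, which is why related words such as "king" and "queen" score higher than "king" and "apple". The sketch below computes cosine similarity by hand. The three-dimensional vectors are hypothetical values invented for illustration, not real `en_core_web_lg` embeddings, and returning 0.0 for a zero vector is a choice made in this sketch.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption of this sketch: no direction, so no similarity
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, NOT real embeddings.
king = [0.9, 0.8, 0.1]
queen = [0.8, 0.9, 0.2]
apple = [0.1, 0.2, 0.9]

print(f"king vs queen: {cosine_similarity(king, queen):.2f}")
print(f"king vs apple: {cosine_similarity(king, apple):.2f}")
```

With real spaCy objects, the same function could be applied to `word1.vector` and `word2.vector` to cross-check the values printed above.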
