
# **Language Understanding**

NLTK can be used for Language Understanding, which is a subfield of Natural Language Processing (NLP) that focuses on enabling computers to understand human language.

NLTK provides a range of tools and resources for language understanding tasks such as part-of-speech tagging, named entity recognition, dependency parsing, and sentiment analysis.

For example, NLTK's pos_tag() function can be used to tag the parts of speech in a sentence, which is useful for understanding the grammatical structure of the text. NLTK's ne_chunk() function can be used for named entity recognition, which helps identify and extract entities such as people, organizations, and locations. NLTK also includes VADER, a lexicon- and rule-based sentiment analyzer that scores a text as positive, negative, or neutral.

NLTK is a useful toolkit for language understanding, but it is not always the best option for complex tasks or large-scale applications. In those cases, libraries built around trained statistical pipelines, such as spaCy, are often a better fit. NLTK is still a good starting point for learning and experimenting with language understanding techniques in NLP.

Here are some examples of how to use NLTK for Language Understanding:

**Part-of-speech (POS) tagging:** The pos_tag() function in NLTK can be used to tag the parts of speech in a sentence. For example:

In this example, we first downloaded the resources required for tokenization and POS tagging using **nltk.download()**. Then we defined a simple text string, tokenized it using **word_tokenize()**, and tagged the parts of speech using **pos_tag()**. The output is a list of tuples. Each tuple contains a token and its POS tag from the Penn Treebank tag set. For example, NNP marks a proper noun, VBD a past-tense verb, DT a determiner, JJ an adjective, and NN a singular noun.

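```python
import nltk
nltk.download('averaged_perceptron_tagger')
nltk.download('averaged_perceptron_tagger_eng')  # resource name used by NLTK 3.9+
nltk.download('punkt')
nltk.download('punkt_tab')                       # resource name used by NLTK 3.9+

text = "John saw the blue car."
tokens = nltk.word_tokenize(text)  # split the sentence into word tokens
pos_tags = nltk.pos_tag(tokens)    # tag each token with a Penn Treebank POS tag
print(pos_tags)
```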




```
[('John', 'NNP'), ('saw', 'VBD'), ('the', 'DT'), ('blue', 'JJ'), ('car', 'NN'), ('.', '.')]
```
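Once text is tagged, the tags can drive simple downstream steps. The sketch below illustrates this and is not part of NLTK. It copies the pos_tags list printed above as a literal. It then uses a hypothetical helper, **tokens_with_tag_prefix()**, to pull out nouns, verbs, and adjectives by Penn Treebank tag prefix (every noun tag starts with NN, every verb tag with VB).

```python
# Tagged output copied from the cell above (Penn Treebank tags).
pos_tags = [('John', 'NNP'), ('saw', 'VBD'), ('the', 'DT'),
            ('blue', 'JJ'), ('car', 'NN'), ('.', '.')]

def tokens_with_tag_prefix(tagged, prefix):
    """Hypothetical helper: return tokens whose POS tag starts with `prefix`."""
    return [token for token, tag in tagged if tag.startswith(prefix)]

nouns = tokens_with_tag_prefix(pos_tags, "NN")       # NN, NNS, NNP, NNPS
verbs = tokens_with_tag_prefix(pos_tags, "VB")       # VB, VBD, VBG, VBN, VBP, VBZ
adjectives = tokens_with_tag_prefix(pos_tags, "JJ")  # JJ, JJR, JJS
print(nouns, verbs, adjectives)
```

With a live tagger, you would pass the result of **nltk.pos_tag()** directly instead of the literal.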


**Named Entity Recognition (NER):** The ne_chunk() function in NLTK can be used for named entity recognition, which can help identify and extract entities such as people, organizations, and locations from the text. For example:

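In this example, we first downloaded the additional resources needed for named entity recognition using **nltk.download()**. Then we defined a simple text string, tokenized it, and tagged the parts of speech using **pos_tag()**. Finally, we applied **ne_chunk()** to the POS-tagged tokens to extract named entities. The output is an nltk Tree. Each recognized entity is a subtree labeled with its entity type (PERSON, GPE for geopolitical entity, and so on), and every leaf is a (token, POS tag) pair. Note that the chunker labeled "Barack" and "Obama" as two separate PERSON entities rather than one.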

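```python
import nltk
nltk.download('maxent_ne_chunker')
nltk.download('maxent_ne_chunker_tab')  # resource name used by NLTK 3.9+
nltk.download('words')
# The tokenizer and tagger resources were downloaded in the previous cell.

text = "Barack Obama was born in Hawaii."
tokens = nltk.word_tokenize(text)
pos_tags = nltk.pos_tag(tokens)
ne_tags = nltk.ne_chunk(pos_tags)  # group tagged tokens into labeled entity chunks
print(ne_tags)
```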




```
(S
  (PERSON Barack/NNP)
  (PERSON Obama/NNP)
  was/VBD
  born/VBN
  in/IN
  (GPE Hawaii/NNP)
  ./.)
```


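The chunker split "Barack Obama" into two PERSON chunks. A common post-processing step is to walk the tree and merge neighboring chunks that share a label. The sketch below rebuilds the printed tree by hand with **nltk.Tree**, so it runs without any downloads. **extract_entities()** is a hypothetical helper. Merging adjacent chunks is a heuristic assumption: it would wrongly join two distinct entities that happen to sit next to each other.

```python
from nltk import Tree

# Rebuild the chunk tree printed above; leaves are (token, POS tag) pairs.
tree = Tree("S", [
    Tree("PERSON", [("Barack", "NNP")]),
    Tree("PERSON", [("Obama", "NNP")]),
    ("was", "VBD"), ("born", "VBN"), ("in", "IN"),
    Tree("GPE", [("Hawaii", "NNP")]),
    (".", "."),
])

def extract_entities(chunked):
    """Hypothetical helper: return (text, label) pairs from an ne_chunk() tree,
    merging back-to-back chunks that carry the same label."""
    entities = []
    previous_label = None  # label of the immediately preceding chunk, if any
    for node in chunked:
        if isinstance(node, Tree):
            text = " ".join(token for token, tag in node.leaves())
            label = node.label()
            if label == previous_label:
                # Heuristic merge, e.g. PERSON(Barack) + PERSON(Obama).
                entities[-1] = (entities[-1][0] + " " + text, label)
            else:
                entities.append((text, label))
            previous_label = label
        else:
            previous_label = None  # a plain (token, tag) leaf breaks the run
    return entities

print(extract_entities(tree))
```

On live data, you would call **extract_entities(nltk.ne_chunk(pos_tags))** instead.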


**Sentiment analysis:** NLTK also includes VADER (Valence Aware Dictionary and sEntiment Reasoner), a lexicon- and rule-based sentiment analyzer that scores how positive, negative, or neutral a given text is. For example:

In this example, we downloaded the VADER lexicon using **nltk.download()**. Then we defined a simple text string and analyzed its sentiment with the **SentimentIntensityAnalyzer** class and its **polarity_scores()** method. The output is a dictionary of scores. The pos, neu, and neg values give the proportions of the text that are positive, neutral, and negative (together they sum to about 1). The compound value is a normalized overall sentiment score between -1 (most negative) and 1 (most positive).

```python
import nltk
nltk.download('vader_lexicon')

from nltk.sentiment import SentimentIntensityAnalyzer

text = "I love this movie! It's so funny and entertaining."
analyzer = SentimentIntensityAnalyzer()  # VADER: lexicon- and rule-based scorer
scores = analyzer.polarity_scores(text)  # dict with 'neg', 'neu', 'pos', 'compound'
print(scores)
```


```
{'neg': 0.0, 'neu': 0.305, 'pos': 0.695, 'compound': 0.9081}
```


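VADER returns scores, not a label. To turn the scores into a positive/negative/neutral classification, a common convention is to threshold the compound score at ±0.05; this cutoff is suggested in the VADER project's documentation. The sketch below assumes that cutoff. **vader_label()** is a hypothetical helper, and the threshold is worth tuning for your own data.

```python
def vader_label(scores, threshold=0.05):
    """Hypothetical helper: map a polarity_scores() dict to a coarse label
    using only the 'compound' score and a symmetric threshold."""
    compound = scores["compound"]
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Scores copied from the output above.
scores = {'neg': 0.0, 'neu': 0.305, 'pos': 0.695, 'compound': 0.9081}
print(vader_label(scores))
```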


---

# **Language Understanding with spaCy**

Python libraries such as spaCy provide advanced Language Understanding capabilities for Natural Language Processing (NLP) tasks. spaCy is an open-source library that provides a range of features for efficient and accurate language processing, including tokenization, part-of-speech tagging, named entity recognition, dependency parsing, and semantic similarity.

Here are some examples of how to use spaCy for Language Understanding:


**Tokenization:** spaCy's tokenizer splits text into individual tokens, the basic units of language processing. For example:

In this example, we first loaded spaCy's pre-trained small English pipeline, en_core_web_sm, using **spacy.load()**. If the pipeline is not installed yet, run **python -m spacy download en_core_web_sm** once. Then we defined a simple text string and created a Doc object by passing it to **nlp()**. Finally, we looped through each token in the Doc object and printed its text using **token.text**.

```python
import spacy

# Requires the pipeline to be installed first: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

text = "The quick brown fox jumped over the lazy dog."
doc = nlp(text)
for token in doc:
    print(token.text)
```




```
The
quick
brown
fox
jumped
over
the
lazy
dog
.
```
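Tokenization does not need the trained pipeline. As a sketch, **spacy.blank("en")** creates an English pipeline that contains only the rule-based tokenizer, so it runs without downloading en_core_web_sm. The example also shows two tokenizer behaviors. spaCy's English tokenizer exceptions split a contraction such as "don't" into two tokens. Each token also records its character offset in the original string in **token.idx**.

```python
import spacy

# A blank English pipeline has only the rule-based tokenizer (no trained components).
nlp_blank = spacy.blank("en")

doc = nlp_blank("I don't like it.")
print([token.text for token in doc])
# Each token keeps its character offset into the original string.
print([(token.text, token.idx) for token in doc])
```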


**Named Entity Recognition (NER):** spaCy's EntityRecognizer can be used for named entity recognition, which can help identify and extract entities such as people, organizations, and locations from the text. For example:

In this example, we first loaded spaCy's pre-trained English pipeline using **spacy.load()**. Then we defined a simple text string and created a Doc object using **nlp()**. Finally, we looped through the named entities in **doc.ents** and printed each entity's text and label using **ent.text** and **ent.label_**. Unlike NLTK's ne_chunk() above, spaCy returns "Barack Obama" as a single PERSON entity.

```python
import spacy

nlp = spacy.load("en_core_web_sm")

text = "Barack Obama was born in Hawaii."
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_)
```


```
Barack Obama PERSON
Hawaii GPE
```
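Entity labels such as GPE are abbreviations. spaCy ships a built-in glossary that **spacy.explain()** can look up, and it needs no trained pipeline. The sketch below copies the (text, label) pairs from the output above as a literal. The same function also explains POS tags and dependency labels such as nsubj.

```python
import spacy

# (text, label) pairs copied from the output above.
entities = [("Barack Obama", "PERSON"), ("Hawaii", "GPE")]

for text, label in entities:
    # spacy.explain() looks the label up in spaCy's glossary;
    # it returns None for labels the glossary does not contain.
    print(f"{text:<14}{label:<8}{spacy.explain(label)}")
```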


**Semantic similarity:** spaCy provides a **similarity()** method that estimates how similar two documents, spans, or tokens are in meaning, based on their vectors. For example:

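In this example, we first loaded spaCy's pre-trained English pipeline using **spacy.load()**. Then we created a Doc object for each of two simple text strings using **nlp()**. Finally, we compared them with the **similarity()** method. The score is the cosine similarity of the two documents' vectors; by default, a document's vector is the average of its token vectors. Higher values mean more similar, and 1 means the vectors point in exactly the same direction. Note that en_core_web_sm does not ship with word vectors. As a result, spaCy emits a warning that the result is based on the tagger, parser, and NER and may not give useful similarity judgements. For meaningful scores, use a pipeline that includes word vectors, such as en_core_web_md or en_core_web_lg.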

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # small pipeline: no static word vectors

doc1 = nlp("The cat sat on the mat.")
doc2 = nlp("The dog slept on the rug.")
similarity = doc1.similarity(doc2)  # cosine similarity of the two Doc vectors
print(similarity)
```


```
0.8641852424995727
```


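To make the score less of a black box: **Doc.similarity()** computes the cosine similarity between the two documents' vectors. The sketch below reproduces that formula with NumPy on small hypothetical vectors; real spaCy vectors have hundreds of dimensions. **cosine_similarity()** is our own helper, not a spaCy function. The formula ranges from -1 to 1, and the last comparison shows a negative score for vectors pointing in opposite directions.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(u) * np.linalg.norm(v)
    if norm == 0:
        return 0.0  # treat an all-zero vector as having no similarity
    return float(np.dot(u, v) / norm)

# Hypothetical 3-dimensional "document vectors" for illustration only.
doc_a = [0.2, 0.8, 0.1]
doc_b = [0.3, 0.7, 0.0]
doc_c = [-0.2, -0.8, -0.1]  # exact opposite direction of doc_a

print(cosine_similarity(doc_a, doc_b))  # close to 1: similar direction
print(cosine_similarity(doc_a, doc_c))  # close to -1: opposite direction
```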


**Dependency parsing:** Dependency parsing is the process of analyzing the grammatical structure of a sentence by identifying the relationships between words. spaCy's pre-trained English pipelines include a DependencyParser component, which runs automatically when you call **nlp()** on a text. For example:

In this example, we first loaded spaCy's pre-trained English pipeline using **spacy.load()**. Then we defined a simple text string and created a Doc object using **nlp()**. Finally, we looped through each token in the Doc object and printed five things: its text, its dependency label, its head word, the head word's POS tag, and its children. These come from the **token.text**, **token.dep_**, **token.head.text**, **token.head.pos_**, and **token.children** attributes. Because token.children is a generator, the code collects it into a list before printing.

This example shows how to extract the grammatical structure of a sentence using dependency parsing. The output includes each token's text and its dependency label (for example, nsubj for nominal subject and dobj for direct object). It also shows each token's head word (the token it depends on), the head word's POS tag, and its children (the tokens that depend on it). The main verb "ate" is labeled ROOT and is its own head.

```python
import spacy

nlp = spacy.load("en_core_web_sm")

text = "I ate pizza with a fork."
doc = nlp(text)

for token in doc:
    print(token.text, token.dep_, token.head.text, token.head.pos_,
          [child for child in token.children])
```


```
I nsubj ate VERB []
ate ROOT ate VERB [I, pizza, with, .]
pizza dobj ate VERB []
with prep ate VERB [fork]
a det fork NOUN []
fork pobj with ADP [a]
. punct ate VERB []
```
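A parse like this can be turned into simple relation triples by following the labels. The sketch below uses a hypothetical helper, **extract_relations()**, that collects (subject, verb, object) triples from nsubj/dobj children and (head, preposition, object) triples from prep/pobj links. So that it runs without a trained pipeline, it rebuilds the parse printed above by hand with the spaCy v3 **Doc** constructor, whose heads argument takes absolute head indices. It handles only the labels shown above; passives, clausal objects, and conjunctions are not covered.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

def extract_relations(doc):
    """Hypothetical helper: collect (subject, verb, object) and
    (head, preposition, object) triples from a dependency-parsed Doc."""
    svo, preps = [], []
    for token in doc:
        # A token with both an nsubj child and a dobj child yields SVO triples.
        subjects = [child for child in token.children if child.dep_ == "nsubj"]
        objects = [child for child in token.children if child.dep_ == "dobj"]
        for subj in subjects:
            for obj in objects:
                svo.append((subj.text, token.text, obj.text))
        # A preposition links its own head to its prepositional object.
        if token.dep_ == "prep":
            for child in token.children:
                if child.dep_ == "pobj":
                    preps.append((token.head.text, token.text, child.text))
    return svo, preps

# Rebuild the parse printed above without a trained pipeline.
# heads holds the absolute index of each token's head; ROOT points to itself.
words = ["I", "ate", "pizza", "with", "a", "fork", "."]
heads = [1, 1, 1, 1, 5, 3, 1]
deps = ["nsubj", "ROOT", "dobj", "prep", "det", "pobj", "punct"]
doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

svo, preps = extract_relations(doc)
print(svo)
print(preps)
```

With a loaded pipeline, you would call **extract_relations(nlp(text))** instead of building the Doc by hand.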
