# Part 1: Advanced Natural Language Processing (NLP)

Introduction to NLP:
- Natural Language Processing (NLP) is a field at the intersection of computer science, artificial intelligence, and linguistics. It concerns how machines understand, interpret, and manipulate human language. NLP enables computers to read text, hear speech, interpret it, measure sentiment, and determine which parts are important.

Core Concepts and Techniques:
- Tokenization: The process of breaking text into individual terms or symbols.
- Part-of-Speech Tagging: Identifies each word's part of speech (e.g., noun, verb, adjective) based on its definition and context.
- Named Entity Recognition (NER): Locates and classifies named entities mentioned in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.
- Dependency Parsing: Analyzes the grammatical structure of a sentence to establish relationships between "head" words and words which modify those heads.
- Lemmatization and Stemming: Techniques to reduce words to a base form. Lemmatization uses vocabulary and context (such as the part of speech) to return a word's dictionary form, or lemma, whereas stemming heuristically chops off word endings and can produce strings that are not real words.
- Word Embeddings: Representations of text in an n-dimensional space where words with similar meanings have similar representations. Popular methods include Word2Vec and GloVe.
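
To make three of the concepts above concrete, here is a minimal sketch that uses only the Python standard library. The regex tokenizer, the `naive_stem` suffix stripper, and the three-dimensional `toy_vectors` are all assumptions invented for illustration; they are not how NLTK, spaCy, Word2Vec, or GloVe work internally.

```python
import math
import re


def tokenize(text):
    """Split text into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def naive_stem(word):
    """Crude stemmer: strip one common suffix, with no dictionary lookup."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


tokens = tokenize("The parsers were parsing sentences.")
stems = [naive_stem(t.lower()) for t in tokens]
print(tokens)  # ['The', 'parsers', 'were', 'parsing', 'sentences', '.']
print(stems)   # ['the', 'parser', 'were', 'pars', 'sentenc', '.']

# Hypothetical 3-dimensional vectors, made up for this example only.
toy_vectors = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.3],
}
print(cosine_similarity(toy_vectors["cat"], toy_vectors["kitten"]))
print(cosine_similarity(toy_vectors["cat"], toy_vectors["car"]))
```

Note that the stems `pars` and `sentenc` are not words, which is the trade-off stemming makes; a lemmatizer would be expected to return `parse` and `sentence` instead. With these toy vectors, `cat` scores closer to `kitten` than to `car`, which is the property real embeddings aim for.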

Advanced Techniques:
- Sentiment Analysis: Determines the attitude or emotion of the writer, i.e., whether it's positive, negative, or neutral.
- Topic Modeling: Identifies the topics present in a text corpus, useful for document classification, summarization, and understanding the thematic structure of the text.

Python Libraries for NLP:
- `NLTK` (Natural Language Toolkit): A leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources.
- `spaCy`: An industrial-strength NLP library that offers robust tools for performing complex NLP tasks like parsing, tagging, NER, and more.

Applications of NLP:
- From chatbots and virtual assistants to sentiment analysis in social media monitoring, NLP technologies are a cornerstone of artificial intelligence, profoundly impacting industries by providing insights from vast amounts of unstructured data.

Challenges and Considerations:
- Ambiguity and Diversity: Natural language is inherently ambiguous and diverse. Contextual nuances, sarcasm, idioms, and cultural differences present significant challenges for NLP systems.
- Resource Intensiveness: Processing and understanding large volumes of text can be computationally intensive, requiring advanced algorithms and substantial hardware resources.



# Part 2: Follow Me

Install necessary packages using pip
pip install nltk
pip install spacy

In [1]:
#Install Packages
#!pip install nltk
#!pip install spacy

In [2]:
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
import spacy
from spacy import displacy

In [3]:
# Downloading necessary NLTK resources and spaCy model
nltk.download('vader_lexicon')
spacy.cli.download("en_core_web_sm")

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...

✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


In [4]:
# Load the small English spaCy pipeline: tokenizer, tagger, parser, and NER
# (en_core_web_sm does not ship with static word vectors)
nlp = spacy.load("en_core_web_sm")

In [5]:
# Sentiment Analysis with NLTK
# Initialize the VADER sentiment intensity analyzer
sia = SentimentIntensityAnalyzer()

In [6]:
# Example text for sentiment analysis
text = "NLTK is a leading platform for building Python programs to work with human language data."

In [7]:
# Obtain polarity scores for the text
polarity_scores = sia.polarity_scores(text)
print("Sentiment Analysis Results:")
for score in polarity_scores:
    print(f"{score}: {polarity_scores[score]}")

Sentiment Analysis Results:
neg: 0.0
neu: 1.0
pos: 0.0
compound: 0.0
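
The `neg`, `neu`, and `pos` values are proportions of the text, while `compound` is a single normalized score between -1 and 1. A minimal sketch of turning the compound score into a label follows; the helper name `label_sentiment` is hypothetical, and the ±0.05 cutoffs are the conventional VADER thresholds rather than anything set in this notebook, so treat them as an adjustable assumption.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Scores copied from the output above.
scores = {"neg": 0.0, "neu": 1.0, "pos": 0.0, "compound": 0.0}
print(label_sentiment(scores["compound"]))  # neutral
```

The example sentence is purely descriptive, so a neutral label is what one would expect here.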


In [8]:
# Entity and phrase extraction with spaCy (a starting point for topic analysis)
# Example text to analyze
doc = nlp("spaCy is an industrial-strength natural language processing library.")

In [9]:
# Print named entities found in the document
print("\nNamed Entities, Phrases, and Concepts:")
for ent in doc.ents:
    print(f"{ent.text} ({ent.label_})")


Named Entities, Phrases, and Concepts:


In [10]:
# Visualization of Dependency Parsing
# Render the dependency parse (drawn inline as SVG in Jupyter; elsewhere the markup is returned as a string)
print("\nDependency Parsing Visualization:")
displacy.render(doc, style='dep', options={'compact': True, 'bg': 'ghostwhite', 'color': '#000000'})


Dependency Parsing Visualization:


In [11]:
# Additional spaCy capabilities demonstration
# Extracting noun phrases for more detailed analysis
print("\nNoun Phrases:")
for chunk in doc.noun_chunks:  # avoid the name "np", which usually refers to NumPy
    print(chunk.text)


Noun Phrases:
an industrial-strength natural language processing library


In [12]:
# Advanced usage of spaCy for linguistic features
print("\nToken-level analysis (Lemma, POS, Tag, Dep, Shape):")
for token in doc:
    print(f"{token.text} ({token.lemma_}, {token.pos_}, {token.tag_}, {token.dep_}, {token.shape_})")


Token-level analysis (Lemma, POS, Tag, Dep, Shape):
spaCy (spacy, INTJ, UH, nsubj, xxxXx)
is (be, AUX, VBZ, ROOT, xx)
an (an, DET, DT, det, xx)
industrial (industrial, ADJ, JJ, amod, xxxx)
- (-, PUNCT, HYPH, punct, -)
strength (strength, NOUN, NN, nmod, xxxx)
natural (natural, ADJ, JJ, amod, xxxx)
language (language, NOUN, NN, compound, xxxx)
processing (processing, NOUN, NN, compound, xxxx)
library (library, NOUN, NN, attr, xxxx)
. (., PUNCT, ., punct, .)
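
The last column, `shape_`, encodes each token's orthographic pattern. The sketch below is an approximation inferred from the output above, not spaCy's own implementation: it assumes letters map to `X` or `x`, digits to `d`, other characters to themselves, and runs of the same symbol are capped at four. The function name `approx_shape` is hypothetical.

```python
def approx_shape(text, max_run=4):
    """Approximate a word-shape string: X/x for letters, d for digits."""
    shape = []
    last = ""
    run = 0
    for char in text:
        if char.isalpha():
            symbol = "X" if char.isupper() else "x"
        elif char.isdigit():
            symbol = "d"
        else:
            symbol = char  # punctuation and other characters are kept as-is
        if symbol == last:
            run += 1
        else:
            last = symbol
            run = 1
        # Only the first max_run symbols of a repeated run are kept.
        if run <= max_run:
            shape.append(symbol)
    return "".join(shape)


for word in ["spaCy", "is", "industrial", "-", "."]:
    print(word, approx_shape(word))
```

Run on the tokens above, this reproduces the shapes in the printed table, for example `xxxXx` for `spaCy` and the truncated `xxxx` for `industrial`.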


# Part 3: Apply Your Skills - Advanced NLP Tasks

In this part of the course, you will apply the skills you've learned from the previous section to perform advanced NLP tasks using NLTK and spaCy. This assignment encourages you to explore text analysis further by engaging in sentiment analysis and topic modeling with your own selected examples.

## Setup:
- Ensure you have NLTK and spaCy installed and configured with the necessary resources, as covered in Part 2.

## Sentiment Analysis with NLTK:
- Select a text of your choice, perhaps from a recent article, book, or a social media post.
- Utilize the VADER sentiment intensity analyzer from NLTK to determine the sentiment expressed in your selected text.
- Document the polarity scores and interpret what they imply about the text’s sentiment.

## Topic Modeling with spaCy:
- Choose a different text that offers potential for rich entity recognition and topic analysis.
- Use spaCy to identify named entities and analyze their roles within the text.
- Attempt to identify themes or topics prevalent in the text based on the entities and noun phrases extracted.
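
One simple way to surface candidate themes is to count how often each extracted phrase recurs. The sketch below uses only the standard library; `top_terms` is a hypothetical helper, and the `phrases` list is invented placeholder data standing in for the entity and noun-phrase strings you would collect from your own `doc`. Frequency counting is a rough heuristic, not a full topic model.

```python
from collections import Counter


def top_terms(phrases, n=3):
    """Return the n most frequent phrases, compared case-insensitively."""
    counts = Counter(p.strip().lower() for p in phrases if p.strip())
    return counts.most_common(n)


# Placeholder data; in the notebook this could instead be built from
# [ent.text for ent in doc.ents] + [chunk.text for chunk in doc.noun_chunks].
phrases = ["Python", "spaCy", "python", "the library", "spaCy", "Python"]
print(top_terms(phrases, n=2))  # [('python', 3), ('spacy', 2)]
```

Phrases that dominate the counts are reasonable starting points when describing what your chosen text is about.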

## Advanced Analysis:
- Explore further linguistic features using spaCy, such as dependency parsing and token analysis, to gain deeper insights into the grammatical structure and usage of words in your chosen text.
- Visualize the dependency parse of sentences in your text to better understand their syntactic structure.

## Instructions:
1. Choose two different texts for analysis: one for sentiment analysis with NLTK and another for topic modeling with spaCy.
2. Perform sentiment analysis on the first text, noting the overall sentiment and specific polarity scores.
3. Conduct topic modeling on the second text, identifying key entities and themes.
4. Use spaCy’s visualization tools to graphically represent the dependency parsing of your topic modeling text.
5. Compile your steps and insights into the Jupyter notebook and submit it as your completed assignment.