In [None]:
# This notebook demonstrates fundamental Natural Language Processing (NLP) tasks using popular Python libraries.
# Each exercise focuses on a specific NLP technique:
# 1.  **Tokenization** with NLTK
# 2.  **Named Entity Recognition (NER)** with SpaCy
# 3.  **Sentiment Analysis** with TextBlob
# 4.  **Text Summarization** with Sumy


### Explanation of NLP Concepts:

1.  **Tokenization** with NLTK: The process of breaking down a text into smaller units called tokens, which can be words, subwords, or characters. NLTK provides tools like `word_tokenize` for this task.

2.  **Named Entity Recognition (NER)** with SpaCy: A technique to identify named entities in text and classify them into predefined categories such as person names, organizations, locations, time expressions, quantities, and monetary values. SpaCy is a library widely used for this purpose.

3.  **Sentiment Analysis** with TextBlob: The computational study of opinions, sentiments, and emotions expressed in text. It determines the emotional tone behind a piece of text, often classifying it as positive, negative, or neutral. TextBlob offers a simple API for sentiment analysis.

4.  **Text Summarization** with Sumy: The process of condensing a longer text into a shorter, coherent, and fluent version while retaining the most important information and overall meaning of the original text. Sumy is a Python library that provides various summarization algorithms, like LSA.

## Import Libraries and Download Resources

In [14]:
import nltk
# Uncomment and run once per environment to fetch the tokenizer data:
# nltk.download('punkt')      # Punkt models used by NLTK's word tokenizer
# nltk.download('punkt_tab')  # Punkt data in the format recent NLTK releases load (e.g., also used by Sumy)
from nltk.tokenize import word_tokenize

import spacy

from textblob import TextBlob

## Install `sumy` library

In [85]:
# Install sumy if not already installed
%pip install sumy

from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer
nltk.download('stopwords') # Optional here: Sumy ships its own stop-word lists (see sumy.utils.get_stop_words)


[nltk_data] Error loading stopwords: <urlopen error [SSL:
[nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data]     unable to get local issuer certificate (_ssl.c:1000)>


False

## Exercise 1: Tokenization with NLTK

**Task**: Break down a given text into individual words or tokens.
**Library**: NLTK (Natural Language Toolkit) is a powerful library for working with human language data.

In [36]:
print("\n--- Exercise 1: Tokenization with NLTK ---")
text1 = "Natural Language Processing enables computers to understand human language.Hi!!!!"
tokens = word_tokenize(text1) # Uses NLTK's word_tokenize function to split the text
print(f"Original text: '{text1}'")
print(f"Tokens: {tokens}")


--- Exercise 1: Tokenization with NLTK ---
Original text: 'Natural Language Processing enables computers to understand human language.Hi!!!!'
Tokens: ['Natural', 'Language', 'Processing', 'enables', 'computers', 'to', 'understand', 'human', 'language.Hi', '!', '!', '!', '!']


### Challenge 1: Tokenization

**Task**: Experiment with different NLTK tokenizers. For example, try `wordpunct_tokenize` or `TreebankWordTokenizer` (after importing if necessary) and observe the differences in tokenization for the given text or a new sentence of your choice. Pay attention to how punctuation is handled.

**Hint**: You might need to import `nltk.tokenize.wordpunct_tokenize` or `nltk.tokenize.TreebankWordTokenizer`.

In [38]:
from nltk.tokenize import wordpunct_tokenize
tokens = wordpunct_tokenize(text1)
print(f"Original text: '{text1}'")
print(f"Tokens:{tokens}")


Original text: 'Natural Language Processing enables computers to understand human language.Hi!!!!'
Tokens:['Natural', 'Language', 'Processing', 'enables', 'computers', 'to', 'understand', 'human', 'language', '.', 'Hi', '!!!!']
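
The difference between the two outputs above comes down to where token boundaries are allowed. As a rough illustration using only the standard library, the sketch below compares plain whitespace splitting with a regular expression that alternates between runs of word characters and runs of punctuation. The pattern is an assumption chosen to mimic the `wordpunct_tokenize` result shown above; it is not a replacement for NLTK's tokenizers.

```python
import re

text = "Natural Language Processing enables computers to understand human language.Hi!!!!"

# Whitespace splitting keeps punctuation glued to the neighbouring word.
whitespace_tokens = text.split()

# Match either a run of word characters or a run of non-word, non-space characters.
WORDPUNCT_PATTERN = re.compile(r"\w+|[^\w\s]+")
regex_tokens = WORDPUNCT_PATTERN.findall(text)

print("Whitespace:", whitespace_tokens)
print("Regex:     ", regex_tokens)
```

Because the missing space after `language.` gives a whitespace splitter nothing to cut on, the whole tail `language.Hi!!!!` stays in one piece, while the regex separates it into `language`, `.`, `Hi`, and `!!!!`.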


In [55]:
from nltk.tokenize import TreebankWordTokenizer

text1 = "Natural Language Processing enables computers to understand human language."

# 1. Tokenize into words (TreebankWordTokenizer expects one sentence at a time)
tokenizer = TreebankWordTokenizer()
wordTokens = tokenizer.tokenize(text1)

# 2. Concatenate the tokens with no separator, which discards the word boundaries
noSpaceText = "".join(wordTokens)

# 3. Insert a space between every character (a character-level split, not a true detokenization)
detokenizedText = " ".join(noSpaceText)

print("Word tokens:", wordTokens)
print("No-space text:", noSpaceText)
print("Character-level detokenized:", detokenizedText)


Word tokens: ['Natural', 'Language', 'Processing', 'enables', 'computers', 'to', 'understand', 'human', 'language', '.']
No-space text: NaturalLanguageProcessingenablescomputerstounderstandhumanlanguage.
Character-level detokenized: N a t u r a l L a n g u a g e P r o c e s s i n g e n a b l e s c o m p u t e r s t o u n d e r s t a n d h u m a n l a n g u a g e .
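
The last line of output above spaces out individual characters, so the original word boundaries are gone. If the goal is to turn the word tokens back into readable text, one simple approach is to join them with spaces and then pull closing punctuation back onto the preceding word. The helper below is a hypothetical, deliberately naive sketch (the name `naive_detokenize` and the punctuation set are assumptions); NLTK also provides `TreebankWordDetokenizer` in `nltk.tokenize.treebank` for a more thorough treatment.

```python
import re


def naive_detokenize(tokens):
    """Join tokens with spaces, then reattach common closing punctuation."""
    text = " ".join(tokens)
    # Remove the space that the join placed before . , ! ? ; :
    return re.sub(r"\s+([.,!?;:])", r"\1", text)


# Word tokens copied from the TreebankWordTokenizer output above.
word_tokens = ['Natural', 'Language', 'Processing', 'enables', 'computers',
               'to', 'understand', 'human', 'language', '.']

restored = naive_detokenize(word_tokens)
print(restored)
```

This handles only the simple case shown here; quotes, brackets, and contractions would need extra rules.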


## Exercise 2: Named Entity Recognition with SpaCy

**Task**: Identify and classify named entities (like persons, organizations, locations) in text.
**Library**: SpaCy is an industrial-strength natural language processing library in Python.

In [60]:
print("\n--- Exercise 2: Named Entity Recognition with SpaCy ---")
# Load SpaCy model - ensure 'en_core_web_sm' is downloaded (you might need !python -m spacy download en_core_web_sm)
try:
    nlp = spacy.load("en_core_web_sm") # Loads a small English model for processing
except OSError:
    print("Downloading en_core_web_sm model for SpaCy...")
    from spacy.cli import download
    download("en_core_web_sm") # If model is not found, download it automatically
    nlp = spacy.load("en_core_web_sm")

text2 = "Google was founded by Larry Page and Sergey Brin while they were Ph.D. students at Stanford University."
doc = nlp(text2) # Process the text with the loaded SpaCy model
print(f"Original text: '{text2}'")
print("Named Entities:")
for ent in doc.ents: # Iterate through the detected entities
    print(f"  {ent.text:<20} {ent.label_}") # Print the entity text and its label (e.g., PERSON, ORG)


--- Exercise 2: Named Entity Recognition with SpaCy ---
Original text: 'Google was founded by Larry Page and Sergey Brin while they were Ph.D. students at Stanford University.'
Named Entities:
  Google               ORG
  Larry Page           PERSON
  Sergey Brin          PERSON
  Ph.D.                WORK_OF_ART
  Stanford University  ORG
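
When a text contains many entities, it is often easier to read them grouped by label than as a flat list. The sketch below uses only the standard library and works on `(text, label)` pairs; the pairs are copied by hand from the output above, and the helper name `group_entities_by_label` is hypothetical. With SpaCy available, the same pairs could be produced from `doc.ents` using the `ent.text` and `ent.label_` attributes already shown in the cell.

```python
from collections import defaultdict


def group_entities_by_label(pairs):
    """Collect entity texts under their label, preserving first-seen order."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)


# (text, label) pairs copied from the NER output above.
entity_pairs = [
    ("Google", "ORG"),
    ("Larry Page", "PERSON"),
    ("Sergey Brin", "PERSON"),
    ("Ph.D.", "WORK_OF_ART"),
    ("Stanford University", "ORG"),
]

grouped = group_entities_by_label(entity_pairs)
for label, texts in grouped.items():
    print(f"{label:<12} {texts}")
```

Grouping also makes questionable labels stand out, such as `Ph.D.` being tagged `WORK_OF_ART` by the small model.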


### Challenge 2: Named Entity Recognition

**Task**: Apply NER to a new sentence that contains different types of entities (e.g., dates, monetary values, products). Analyze the output and see if SpaCy correctly identifies and labels them. What happens if you use a sentence with less common entities?

**Example Sentence**: "Apple released the iPhone 15 in September 2023 for $799. The event took place in Cupertino, California."

In [65]:

print("\n--- Challenge 2: Named Entity Recognition with SpaCy ---")
# Load SpaCy model
try:
    nlp = spacy.load("en_core_web_sm")
except OSError:
    print("Downloading en_core_web_sm model for SpaCy...")
    from spacy.cli import download
    download("en_core_web_sm")
    nlp = spacy.load("en_core_web_sm")

# Challenge sentence
text_challenge = "Apple released the iPhone 15 in September 2023 for $799. The event took place in Cupertino, California."
doc = nlp(text_challenge)

print(f"Original text: '{text_challenge}'")
print("\nNamed Entities:")
for ent in doc.ents:
    print(f"{ent.text:<20} {ent.label_}")



--- Challenge 2: Named Entity Recognition with SpaCy ---
Original text: 'Apple released the iPhone 15 in September 2023 for $799. The event took place in Cupertino, California.'

Named Entities:
Apple                ORG
September 2023       DATE
799                  MONEY
Cupertino            GPE
California           GPE
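
In the output above, the `MONEY` entity is reported as `799` without the currency symbol, and `iPhone 15` is not labeled at all. A quick way to sanity-check statistical NER on well-structured values is a rule-based baseline. The sketch below is one such baseline for US dollar amounts only; the pattern is an assumption written for this example and will miss other currencies and formats.

```python
import re

# Dollar sign, digits with optional thousands separators, optional decimals.
MONEY_PATTERN = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?")

text = ("Apple released the iPhone 15 in September 2023 for $799. "
        "The event took place in Cupertino, California.")

amounts = MONEY_PATTERN.findall(text)
print(amounts)
```

Comparing the regex matches with the model's `MONEY` spans shows where the two disagree, which is a useful starting point when analyzing the challenge results.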


## Exercise 3: Sentiment Analysis with TextBlob

**Task**: Determine the emotional tone behind a piece of text, usually categorizing it as positive, negative, or neutral.
**Library**: TextBlob is a Python library for processing textual data that offers a simple API for common NLP tasks.

In [64]:
print("\n--- Exercise 3: Sentiment Analysis with TextBlob ---")
text3 = "I am extremely happy with the service provided."
blob = TextBlob(text3) # Create a TextBlob object from the text
sentiment = blob.sentiment # Access the sentiment property, which returns polarity and subjectivity
print(f"Original text: '{text3}'")
print(f"Sentiment: {sentiment}") # Polarity ranges from -1 (negative) to 1 (positive), Subjectivity from 0 (objective) to 1 (subjective)


--- Exercise 3: Sentiment Analysis with TextBlob ---
Original text: 'I am extremely happy with the service provided.'
Sentiment: Sentiment(polarity=0.8, subjectivity=1.0)
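
TextBlob reports polarity as a number, while the task is phrased in terms of positive, negative, or neutral. A small helper can bridge the two. The function name `label_polarity` and the cut-off of `0.1` are assumptions made for this sketch; a suitable threshold depends on the data and should be tuned rather than taken as given.

```python
def label_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1, 1] to a coarse sentiment label."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# 0.8 is the polarity reported in the output above.
print(label_polarity(0.8))
print(label_polarity(0.05))
print(label_polarity(-0.5))
```

With TextBlob, the score to pass in would be `blob.sentiment.polarity`.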


### Challenge 3: Sentiment Analysis

**Task**: Choose a short paragraph from a product review or a news article. Perform sentiment analysis using TextBlob. Analyze the `polarity` and `subjectivity` scores. How well does it align with your own understanding of the text's sentiment? Try sentences with sarcasm or nuanced language and observe the results.

In [69]:
from textblob import TextBlob
# Use a distinct variable name so this cell does not clash with text4 in Exercise 4
review_text = "The laptop has a beautiful display and runs very fast, but the battery life is disappointing and the fan is extremely loud. Overall, it's good but not great."
blob = TextBlob(review_text)
sentiment = blob.sentiment
print(f"Original text: '{review_text}'")
print(f"Sentiment: {sentiment}")


Original text: 'The laptop has a beautiful display and runs very fast, but the battery life is disappointing and the fan is extremely loud. Overall, it's good but not great.'
Sentiment: Sentiment(polarity=0.12999999999999998, subjectivity=0.6614285714285716)
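
The review mixes praise and complaints, yet the polarity above is only mildly positive. TextBlob's default analyzer is lexicon-based, and one way to build intuition for such a result is a toy model that averages per-word scores. The lexicon values below are invented for illustration and are not TextBlob's; the sketch also ignores negation and intensifiers, so it is not a reimplementation of TextBlob's algorithm.

```python
import re

# Hypothetical word scores, made up for this example.
TOY_LEXICON = {
    "beautiful": 0.8, "fast": 0.4, "good": 0.6, "great": 0.8,
    "disappointing": -0.6, "loud": -0.3,
}


def toy_polarity(text):
    """Average the scores of the words found in the toy lexicon."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


review = ("The laptop has a beautiful display and runs very fast, but the "
          "battery life is disappointing and the fan is extremely loud. "
          "Overall, it's good but not great.")

print(f"Praise only: {toy_polarity('beautiful display and runs very fast'):.2f}")
print(f"Full review: {toy_polarity(review):.2f}")
```

Positive and negative words pull the average in opposite directions, so a mixed review lands closer to zero than any single clause would. The toy model also counts `great` as positive even though the review says "not great", which is the kind of nuance worth probing in this challenge.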


## Exercise 4: Text Summarization with Sumy

**Task**: Condense a longer text into a shorter version while retaining the most important information.
**Library**: Sumy is a Python library for automatic summarization of text documents.

In [83]:
print("\n--- Exercise 4: Text Summarization with Sumy ---")
text4 = "Natural Language Processing (NLP) is a fascinating field at the intersection of computer science, artificial intelligence, and linguistics. It enables machines to understand, interpret, and generate human language, opening up a world of possibilities for applications ranging from chatbots and translation services to sentiment analysis and beyond. This field involves various techniques, including machine learning, deep learning, and rule-based methods, to process and analyze large amounts of text data. The goal of NLP is to bridge the communication gap between humans and computers, allowing for more natural and intuitive interactions. Its applications are constantly expanding, making it a critical area of research and development in today's technologically driven world."
parser = PlaintextParser.from_string(text4, Tokenizer("english")) # Parse the text using Sumy's PlaintextParser and English tokenizer
summarizer = LsaSummarizer() # Initialize an LSA (Latent Semantic Analysis) summarizer
sentence_count = 4  # Number of sentences to keep in the summary
summary = summarizer(parser.document, sentence_count)  # Summarize the document into sentence_count sentences
print(f"Original text (first 100 chars): '{text4[:100]}...' ")
print(f"Summary ({sentence_count} sentences):")
for sentence in summary:
    print(f"  - {sentence}") # Print each sentence of the generated summary



--- Exercise 4: Text Summarization with Sumy ---
Original text (first 100 chars): 'Natural Language Processing (NLP) is a fascinating field at the intersection of computer science, ar...' 
Summary (4 sentences):
  - Natural Language Processing (NLP) is a fascinating field at the intersection of computer science, artificial intelligence, and linguistics.
  - This field involves various techniques, including machine learning, deep learning, and rule-based methods, to process and analyze large amounts of text data.
  - The goal of NLP is to bridge the communication gap between humans and computers, allowing for more natural and intuitive interactions.
  - Its applications are constantly expanding, making it a critical area of research and development in today's technologically driven world.
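
The summary above is extractive: it selects existing sentences rather than writing new ones. To make that idea concrete without any external library, the sketch below scores each sentence by the average document-wide frequency of its words and keeps the top-scoring ones in their original order. This is a simple frequency heuristic, not LSA, and the helper names, the naive sentence splitter, and the lack of stop-word filtering are all simplifying assumptions.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize_by_frequency(text, sentence_count):
    """Return the highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    word_counts = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z]+", sentence.lower())
        return sum(word_counts[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, then restore original order for readability.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:sentence_count])
    return [sentences[i] for i in keep]


sample = "Cats sleep. Cats purr and cats sleep. Dogs bark."
summary = summarize_by_frequency(sample, 2)
for sentence in summary:
    print(f"  - {sentence}")
```

Without stop-word filtering, very common words can dominate the scores on real text, which is one reason dedicated summarizers such as those in Sumy use more elaborate models.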


### Challenge 4: Text Summarization

**Task**: Take a longer article or a portion of text (e.g., from Wikipedia) and apply Sumy's LSA summarizer. Experiment with different numbers of sentences for the summary (e.g., 3, 5). Compare the generated summaries to see which one best captures the essence of the original text while remaining concise. You can also explore other summarization algorithms available in Sumy if you're feeling adventurous (e.g., `LexRankSummarizer`).

In [86]:
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer
from sumy.summarizers.lex_rank import LexRankSummarizer

text5 = """
Game of Thrones (GoT), the popular HBO series that crept its way into the books of George R. R. Martin’s A Song of Ice and Fire, has managed to touch millions through its complex plot lines, strategic characterizations, and sweeping landscapes; on air from 2011 to 2019, eight seasons of the story had run through television sets and went beyond, being debated, creating fan theories, even academics discussing the themes, narratives, and legacies.
GoT is more than just a fantasy epic. It is the narrative of political intrigue, power struggle, moral ambiguities, and the complexities of human nature, all set in a sort of fictitious world- Westeros. The show did so with such success that it created such rich, immersive universe to live in, where magic and dragons and battles for the right to sit on thrones coexisted with very personal, intimate stories about love, betrayal, and redemption.
This report provides a comprehensive analysis on the television show Game of Thrones: origin, world-building, core themes, character development, and its cultural influence, and how it eventually ended in a highly disputed final season.
Origins: Page to Screen.
George R. R. Martin started writing his A Song of Ice and Fire series in 1996 with A Game of Thrones, the first installment in what he then imagined would be a seven-book saga. Drawing inspiration from the history of the Middle Ages, notably the Wars of the Roses, the Hundred Years’ War, and ad libitum political coups d’état throughout history, Martin envisioned an epic that was so grand in scope that he cheerfully avoided the more conventional fantasy trappings in order to infuse gritty realism, sophisticated political maneuvering, and scrupulously flawed, morally ambivalent characters into his work.
Martin’s reluctance to do the conventional fantasy things ignited both books and eventually the television series. While, within Tolkien’s The Lord of the Rings, an obvious line is clearly drawn between good and evil, Game of Thrones does not: heroes fall, and villains redeem.
Primetime broadcaster HBO’s betting the house on a dramatically stretched and expanded series with thousands of characters and convoluted plotlines by developing such an amazing, sprawling series as Game of Thrones was going to be a risk for those in control: David Benioff and D.B. Weiss. But the show took off after its launch in 2011 and remained faithful to the books for the first few seasons before outgrowing them, since Martin could not publish works fast enough.
The World-Building of Game of Thrones
One of the intrinsic features of GoT is world-building. Westeros, Essos, and every other land within the universe are highly imagined, with detailed histories, cultures, religions, and political systems. Every one of the Seven Kingdoms of Westeros, at that:
The North, led by the honorable Stark family, is cold and raw with these things in mind: duty and loyalty.
For instance, the Reach can be described as being fertile and thriving, while Dorne is a land of arid soil where the people there love to passionate and fiercely independent.
The Iron Islands can be said to be defined by the cruel and harsh culture of its people, which reflects the law of strength over weakness.
Yet another continent beyond the sea, Essos adds a new layer of depth to GoT. From cities known as free cities to the likes of Braavos, to lands and places from which very little is known, like Asshai and Qarth, Martin created depth beyond Westeros’s edge in a way that added substance to the lore in general.
At the core of this world are the Targaryens, a line with a history in Valyria, the land of the dragonlords. Their dragons are a slice of power and sorcery, long thought to be lost to time. Yet it is through Daenerys Targaryen that they come back into the world. These dragons throughout the series bring magical and mythological elements that are grounded in brutal political realities in the world.
Politics of Power and Intrigue
It’s all about power. GoT drove it through with nearly every character’s arc tied in their ambition: personal glory, survival, and even to hold the Iron Throne. The title of the series itself speaks of the struggle: the game of thrones is brutal, a saying familiarized in the meanwhile by Cersei Lannister’s famous line, “When you play the game of thrones, you win or you die.”.
Inspiring GoT are political machinations, mostly based on real-world history, with a specification that it mainly refers to feudal systems where power is concentrated in the hands of just a few noble houses. Important political players within the series include:
House Stark: Honour, loyalty, and integrity are best personified by the Starks of Winterfell, first and foremost through Eddard Stark, who ultimately loses his life at the end of Season 1 because of such strong morals in such a merciless society. Each of the Stark children starts out as victims of power, loss, and suffering but will eventually become the epitome of resilience.
The House Lannister, rich and sharp-witted and actually a bit cruel, is probably the most interesting House, from the mildest pieces of wisdom by the patriarch, Tywin Lannister-savagely pragmatic-for some classes of society-to his children, Cersei, Jaime, and Tyrion, who were fully involved in pursuing the attention of Westeros’ power circles. And Cersei is probably one of the most interesting lines, running from ascension to queen to tyranny and to eventual madness.
House Targaryen: The most important journey of the exiled princess Daenerys Targaryen represents one of the most classic journeys-one from exiled princess to conqueror and queen of Westeros. Daenery’s arc at the beginning so idealistic and hopeful ends in bleak shadow, culminating in her almost controversial ending in the final season.
While GoT increases its political intrigue from even amateur ambition, it probes deeper into an effort and loss of power, especially about matters related to betrayal, loyalty, manipulation, and war. Littlefinger and Varys are masters of manipulation-the double mainspring of the whole operation that orchestrates behind-the-scenes moves deciding on kingdoms. It also questions whose legitimacy to govern and what governance actually is.
Themes of Morality and Human Nature
The most beautiful thing about Game of Thrones has been its take on morality. This, obviously, is the brightest part, since, unlike in most typical fantasies, where good is characterized so and evil so odious, GoT illustrates a world where characters are highly flawed, and lines of morality get fuzzy constantly.
"""
parser = PlaintextParser.from_string(text5, Tokenizer("english"))
summarizer_lsa = LsaSummarizer()
summary_3 = summarizer_lsa(parser.document, 3)  # 3-sentence summary
summary_5 = summarizer_lsa(parser.document, 5)  # 5-sentence summary
print("\nLSA Summary (3 sentences):")
for sentence in summary_3:
    print(f"- {sentence}")
print("\nLSA Summary (5 sentences):")
for sentence in summary_5:
    print(f"- {sentence}")
summarizer_lex = LexRankSummarizer()
summary_lex = summarizer_lex(parser.document, 4)  # 4-sentence LexRank summary
print("\nLexRank Summary (4 sentences):")
for sentence in summary_lex:
    print(f"- {sentence}")



LSA Summary (3 sentences):
- Yet another continent beyond the sea, Essos adds a new layer of depth to GoT.
- GoT drove it through with nearly every character’s arc tied in their ambition: personal glory, survival, and even to hold the Iron Throne.
- This, obviously, is the brightest part, since, unlike in most typical fantasies, where good is characterized so and evil so odious, GoT illustrates a world where characters are highly flawed, and lines of morality get fuzzy constantly.

LSA Summary (5 sentences):
- This report provides a comprehensive analysis on the television show Game of Thrones: origin, world-building, core themes, character development, and its cultural influence, and how it eventually ended in a highly disputed final season.
- Yet another continent beyond the sea, Essos adds a new layer of depth to GoT.
- GoT drove it through with nearly every character’s arc tied in their ambition: personal glory, survival, and even to hold the Iron Throne.
- Inspiring GoT are polit