# Basic NLP Hands-On Exercise (Using NLTK)

This Jupyter notebook is a hands-on exercise in basic Natural Language Processing (NLP) with the NLTK library. The exercises cover tokenization and normalization, stemming and lemmatization, POS tagging, named entity recognition, and WordNet lookups. Follow the instructions for each task and complete the code cells.

**Note**: Make sure the required libraries are installed (`nltk` and `scikit-learn`, which is imported as `sklearn`). The setup cell below imports them and downloads the necessary NLTK resources; uncomment its `pip` line if the packages are missing.

In [1]:
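# Setup: install (if needed) and import required libraries
# Uncomment the next line to install; the PyPI package behind `import sklearn` is scikit-learn
#!pip install nltk scikit-learn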
import nltk
import re
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.corpus import stopwords, wordnet as wn
from nltk import pos_tag, ne_chunk
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

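# Download NLTK resources
# (recent NLTK releases may also ask for 'punkt_tab' and 'averaged_perceptron_tagger_eng';
# download those as well if a LookupError names them)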
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')
nltk.download('stopwords')
nltk.download('wordnet')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\ksab2\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\ksab2\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     C:\Users\ksab2\AppData\Roaming\nltk_data...
[nltk_data]   Package maxent_ne_chunker is already up-to-date!
[nltk_data] Downloading package words to
[nltk_data]     C:\Users\ksab2\AppData\Roaming\nltk_data...
[nltk_data]   Package words is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\ksab2\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\ksab2\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-

True

## Exercise 1: Tokenization and Normalization

**Task**: Split text into words and clean it.

**Instructions**:
1. Tokenize the text into words.
2. Convert to lowercase and remove punctuation.
3. Remove stop words (common words like 'the', 'is').
4. Print the results for each step.

In [7]:
text = "I love running and jumping! It's so much FUN."

# Step 1: Tokenization
nltk.download('punkt')
from nltk.tokenize import word_tokenize
print("Tokenized words:", word_tokenize(text))


# Step 2: Normalization (lowercasing only; punctuation is still present after this step)
norm_text = text.lower()  # Lowercase
print("Lowercase:", norm_text)



# Step 3: Stop-word Removal
nltk.download('stopwords')
from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))
filtered_text = [word for word in word_tokenize(norm_text) if word not in stop_words]
print("Filtered text:", filtered_text)



Tokenized words: ['I', 'love', 'running', 'and', 'jumping', '!', 'It', "'s", 'so', 'much', 'FUN', '.']
Lowercase: i love running and jumping! it's so much fun.
Filtered text: ['love', 'running', 'jumping', '!', "'s", 'much', 'fun', '.']
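
Step 2 of the instructions also asks for punctuation removal, which the cell above does not perform: the filtered output still contains `'!'`, `"'s"` and `'.'`. Below is a minimal sketch of two possible approaches using only the standard library. The `tokens` list is an assumption, typed in by hand to be consistent with the output above rather than produced by `word_tokenize`, and the names `no_punct` and `word_tokens` are illustrative.

```python
import string

text = "I love running and jumping! It's so much FUN."
norm_text = text.lower()

# Option A: delete every ASCII punctuation character before tokenizing.
# str.maketrans("", "", chars) builds a table that maps each listed char to None.
no_punct = norm_text.translate(str.maketrans("", "", string.punctuation))
print("Without punctuation:", no_punct)

# Option B: tokenize first, then keep only tokens that contain a letter or digit.
# Hand-written token list (assumption), mirroring the tokenizer output for norm_text.
tokens = ['i', 'love', 'running', 'and', 'jumping', '!',
          'it', "'s", 'so', 'much', 'fun', '.']
word_tokens = [tok for tok in tokens if any(ch.isalnum() for ch in tok)]
print("Word tokens:", word_tokens)
```

The two options behave differently on contractions: Option A turns `it's` into `its`, while Option B keeps the clitic `'s` as its own token because it contains a letter. Which one is preferable depends on what the later steps expect.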



## Exercise 2: Stemming and Lemmatization

**Task**: Reduce words to their base forms.

**Instructions**:
1. Use the filtered tokens from Exercise 1.
2. Apply stemming (chop off word endings).
3. Apply lemmatization (find dictionary form).
4. Print the stemmed and lemmatized tokens.

In [13]:
# Use filtered tokens from Exercise 1
tokens = filtered_text


# Step 1: Stemming
stemmer = PorterStemmer()
stemmed_tokens = [stemmer.stem(token) for token in tokens]
print("Stemmed tokens:", stemmed_tokens)

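# Step 2: Lemmatization
# pos="v" treats every token as a verb, so 'running' becomes 'run';
# with the default noun POS it would stay 'running'.
lemmatizer = WordNetLemmatizer()
lemmatized_tokens = [lemmatizer.lemmatize(token, pos="v") for token in tokens]
print("Lemmatized tokens:", lemmatized_tokens)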



Stemmed tokens: ['love', 'run', 'jump', '!', "'s", 'much', 'fun', '.']
Lemmatized tokens: ['love', 'run', 'jump', '!', "'s", 'much', 'fun', '.']
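
Passing `pos="v"` for every token works for this sentence but would mislemmatize nouns and adjectives in general. One common alternative is to derive the WordNet part of speech from each token's POS tag. The sketch below shows only the mapping step; `penn_to_wordnet` and `lemmatize_tagged` are hypothetical helper names, not NLTK functions, and the letters `'a'`, `'v'`, `'r'`, `'n'` are the values that `lemmatize` accepts for adjective, verb, adverb, and noun.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag (e.g. 'VBG') to a WordNet POS letter."""
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("R"):
        return "r"  # adverb
    return "n"      # nouns and everything else use the lemmatizer's default


def lemmatize_tagged(tagged_tokens, lemmatizer):
    """Lemmatize (word, tag) pairs with any object exposing lemmatize(word, pos=...)."""
    return [lemmatizer.lemmatize(word, pos=penn_to_wordnet(tag))
            for word, tag in tagged_tokens]


# Show how a few Penn Treebank tags are mapped.
for tag in ["VBG", "NN", "JJ", "RB", "."]:
    print(tag, "->", penn_to_wordnet(tag))
```

With the objects already defined in this notebook, the helper could be called as `lemmatize_tagged(pos_tag(tokens), lemmatizer)`.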


## Exercise 3: Part-of-Speech (POS) Tagging

**Task**: Label each word with its part of speech (noun, verb, adjective, and so on).

**Instructions**:
1. Tokenize the sentence.
2. Apply POS tagging to identify nouns, verbs, etc.
3. Print the POS tags.

In [15]:
sentence = "The quick fox runs fast."

# Step 1: Tokenization (split into sentences here; words are tokenized in Step 2)
sentences = sent_tokenize(sentence)
print("Tokenized sentences:", sentences)

# Step 2: POS Tagging
pos_tags = pos_tag(word_tokenize(sentence))
print("POS tags:", pos_tags)


Tokenized sentences: ['The quick fox runs fast.']
POS tags: [('The', 'DT'), ('quick', 'JJ'), ('fox', 'NN'), ('runs', 'VBZ'), ('fast', 'RB'), ('.', '.')]
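
The tags in this output are Penn Treebank abbreviations. As a reading aid, the sketch below attaches a plain-English name to each tag that appears above and then filters words by tag prefix. The `pos_tags` list is copied from the output so the example runs without the tagger; `TAG_NAMES` is a hand-written glossary covering only these six tags, not a complete tagset.

```python
# Output of the cell above, copied by hand so no NLTK resources are needed.
pos_tags = [('The', 'DT'), ('quick', 'JJ'), ('fox', 'NN'),
            ('runs', 'VBZ'), ('fast', 'RB'), ('.', '.')]

# Partial glossary (assumption: only the tags seen in this sentence).
TAG_NAMES = {
    'DT': 'determiner',
    'JJ': 'adjective',
    'NN': 'noun, singular or mass',
    'VBZ': 'verb, 3rd person singular present',
    'RB': 'adverb',
    '.': 'sentence-final punctuation',
}

for word, tag in pos_tags:
    print(f"{word:<6} {tag:<4} {TAG_NAMES.get(tag, 'unknown tag')}")

# All noun tags start with 'NN' and all verb tags with 'VB'.
nouns = [word for word, tag in pos_tags if tag.startswith('NN')]
verbs = [word for word, tag in pos_tags if tag.startswith('VB')]
print("Nouns:", nouns)
print("Verbs:", verbs)
```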


## Exercise 4: Named Entity Recognition

**Task**: Identify names of people, places, or organizations.

**Instructions**:
1. Tokenize the sentence and apply POS tagging.
2. Use NLTK's named entity chunker to find entities.
3. Print the named entities.

In [21]:
text = "Elon Musk founded Tesla in California."

# Step 1: Tokenization and POS Tagging
tokens = word_tokenize(text)
pos_tags = pos_tag(tokens)
print("POS tags:", pos_tags)


# Step 2: Named Entity Recognition
# ne_chunk in recent NLTK releases loads this resource rather than 'maxent_ne_chunker'
nltk.download('maxent_ne_chunker_tab')
named_entities = ne_chunk(pos_tags)
print("Named entities:", named_entities)




POS tags: [('Elon', 'NNP'), ('Musk', 'NNP'), ('founded', 'VBD'), ('Tesla', 'NNP'), ('in', 'IN'), ('California', 'NNP'), ('.', '.')]


[nltk_data] Downloading package maxent_ne_chunker_tab to
[nltk_data]     C:\Users\ksab2\AppData\Roaming\nltk_data...
[nltk_data]   Package maxent_ne_chunker_tab is already up-to-date!


Named entities: (S
  (PERSON Elon/NNP)
  (PERSON Musk/NNP)
  founded/VBD
  (PERSON Tesla/NNP)
  in/IN
  (GPE California/NNP)
  ./.)
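
`ne_chunk` returns an `nltk.Tree`, so printing it shows the whole parse rather than a list of entities. The sketch below shows one possible way to pull out `(text, label)` pairs. To stay runnable without the chunker models, it rebuilds the tree by hand from the output above (including the `PERSON` label the chunker assigned to `Tesla`); `extract_entities` is a hypothetical helper name.

```python
from nltk.tree import Tree

# Hand-built copy of the tree printed above (assumption: same shape as ne_chunk's result).
named_entities = Tree('S', [
    Tree('PERSON', [('Elon', 'NNP')]),
    Tree('PERSON', [('Musk', 'NNP')]),
    ('founded', 'VBD'),
    Tree('PERSON', [('Tesla', 'NNP')]),
    ('in', 'IN'),
    Tree('GPE', [('California', 'NNP')]),
    ('.', '.'),
])


def extract_entities(tree):
    """Return (entity text, label) for every chunk directly under the root."""
    entities = []
    for node in tree:
        # Plain (word, tag) tuples are ordinary tokens; Tree nodes are entity chunks.
        if isinstance(node, Tree):
            phrase = " ".join(word for word, _tag in node.leaves())
            entities.append((phrase, node.label()))
    return entities


entities = extract_entities(named_entities)
print("Entities:", entities)
```

In the notebook, the same function could be applied directly to the `named_entities` variable returned by `ne_chunk(pos_tags)`.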


## Exercise 5: WordNet Exploration

**Task**: Find synonyms for a word using WordNet.

**Instructions**:
1. Use WordNet to find synonyms for the word "happy".
2. Print the definition and synonyms.

In [19]:
# Explore synonyms for 'happy'


## Submission Instructions

1. Ensure all code cells run without errors.
2. Save the notebook as `nlp_basic_hands_on_nltk.ipynb`.
3. Submit the notebook file for evaluation.

**Evaluation Criteria**:
- Correctness of code implementation.
- Clarity of output for each task.
- Proper use of NLTK library.