<a href="https://colab.research.google.com/github/Simonmatharesh/nlp-labs/blob/main/lab3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Problem Statement
Implement NLP techniques in the domain of EdTech/Learning Tools to:

Work on two different non-English sentences related to the EdTech domain.

Construct five different variations of the same English sentence and generate parse trees for each (total of 10 parse trees).

Compare a sentence with and without converting to lowercase, to demonstrate the importance of case normalization in NLP.

Take user input and generate a parse tree for the entered sentence.

This task demonstrates how Natural Language Processing can be applied in educational platforms to enhance content understanding, personalization, and adaptability.



Objective
The objective of this program is to:

Apply core Natural Language Processing (NLP) concepts such as tokenization, case normalization, and parsing in the context of EdTech.

Show how NLP can help build intelligent learning tools that support adaptive content delivery, multilingual processing, and personalized feedback.

Demonstrate conceptual clarity, logical structure, and interactivity through a functional Python implementation suitable for Google Colab.

Align the code with evaluation rubrics focusing on domain relevance, program logic, complexity, and clarity of the implemented topic.

In [None]:
# Install NLTK and download the tokenizer, tagger, and chunker resources
!pip install nltk --quiet
import nltk

# Both the older and the newer resource names are requested so the cell
# works with older and newer NLTK releases.
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')  # not used by the cells below
nltk.download('words')              # not used by the cells below

# Only these three names are used in the rest of the notebook
from nltk import word_tokenize, pos_tag, Tree


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger_eng.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package maxent_ne_chunker is already up-to-date!
[nltk_data] Downloading package words to /root/nltk_data...
[nltk_data]   Package words is already up-to-date!


In [None]:
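# Non-English sentences in French and German
# French: "LearnWise is an adaptive learning platform."
frenchSentence = "LearnWise est une plateforme d'apprentissage adaptatif."
# German: "LearnWise is an adaptive learning tool for students."
germanSentence = "LearnWise ist ein adaptives Lernwerkzeug für Studenten."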

print("French Sentence:", frenchSentence)
print("German Sentence:", germanSentence)


French Sentence: LearnWise est une plateforme d'apprentissage adaptatif.
German Sentence: LearnWise ist ein adaptives Lernwerkzeug für Studenten.
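
The cell above only prints the two sentences. As a rough illustration of what tokenizing them involves, the sketch below uses a small regular-expression tokenizer. The helper `regex_tokenize` and its pattern are assumptions made for this example, not part of NLTK; NLTK's own `word_tokenize` accepts a `language` argument and may split these sentences differently.

```python
import re

# Hypothetical helper for illustration only (not NLTK's word_tokenize).
# \w is Unicode-aware in Python 3, so letters such as "ü" are matched.
# Words joined by a straight apostrophe (French elision) stay together;
# typographic apostrophes are not handled by this simple pattern.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)*|[^\w\s]")

def regex_tokenize(text):
    """Split text into word tokens and single punctuation marks."""
    return TOKEN_PATTERN.findall(text)

french_tokens = regex_tokenize("LearnWise est une plateforme d'apprentissage adaptatif.")
german_tokens = regex_tokenize("LearnWise ist ein adaptives Lernwerkzeug für Studenten.")

print("French tokens:", french_tokens)
print("German tokens:", german_tokens)
```

Keeping `d'apprentissage` as one token is a design choice of this sketch; a French-aware tokenizer might separate the elided article from the noun.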


In [None]:
# Sentences conveying the same idea
sentences = [
    "LearnWise adapts content for each learner.",
    "Each learner gets personalized content from LearnWise.",
    "LearnWise provides personalized learning experiences.",
    "The platform LearnWise tailors content to the learner.",
    "Learning content is personalized by LearnWise."
]
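
One way to check that the five variants really share a core vocabulary is to count how many sentences each word appears in. The sketch below is a standard-library approximation: the helper `content_words` is a hypothetical name, and it uses a plain lowercase letter match rather than NLTK tokenization.

```python
import re
from collections import Counter

# The five sentence variants from the cell above.
variants = [
    "LearnWise adapts content for each learner.",
    "Each learner gets personalized content from LearnWise.",
    "LearnWise provides personalized learning experiences.",
    "The platform LearnWise tailors content to the learner.",
    "Learning content is personalized by LearnWise.",
]

def content_words(sentence):
    """Return the set of lowercased alphabetic words in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))

# Document frequency: the number of variants in which each word occurs.
document_frequency = Counter(
    word for sentence in variants for word in content_words(sentence)
)

# Words that occur in every variant.
shared = sorted(w for w, n in document_frequency.items() if n == len(variants))

print("Words in all variants:", shared)
print("Most frequent words:", document_frequency.most_common(4))
```

Lowercasing matters here: without it, `Each` and `each` or `Learning` and `learning` would be counted as different words.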


In [None]:
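def simple_parse_tree(sentence):
    """Return a flat tree: an S root with one POS-tag node per token.

    This is a tagged-token view, not a full constituency parse.
    """
    tokens = word_tokenize(sentence)
    tags = pos_tag(tokens)
    return Tree('S', [Tree(tag, [word]) for word, tag in tags])

# Generate and display one tree per sentence variant
trees = []
for sentence in sentences:
    tree = simple_parse_tree(sentence)
    trees.append(tree)
    tree.pretty_print()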


                          S                  
     _____________________|________________   
   NNP     VBZ      NN    IN  DT     NN    . 
    |       |       |     |   |      |     |  
LearnWise adapts content for each learner  . 

                       S                                 
  _____________________|_______________________________   
 DT     NN   VBZ      VBN         NN    IN     NNP     . 
 |      |     |        |          |     |       |      |  
Each learner gets personalized content from LearnWise  . 

                        S                               
     ___________________|_____________________________   
   NNP      VBZ         JJ         NN        NNS      . 
    |        |          |          |          |       |  
LearnWise provides personalized learning experiences  . 

                                  S                       
  ________________________________|_____________________   
 DT    NN       NNP      NNS      NN    TO  DT    NN    . 
 |     |
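
The trees printed above are flat: every POS tag hangs directly under `S`. A minimal sketch of how one extra level of structure could be added is shown below. It groups consecutive determiner, adjective, and noun tags into `NP` chunks and renders the result as a bracketed string. The functions `chunk_noun_phrases` and `to_bracketed` and the tag set `NP_TAGS` are assumptions made for this illustration; NLTK's `RegexpParser` provides grammar-based chunking for real use.

```python
# Tagged tokens copied from the first tree in the output above.
tagged = [("LearnWise", "NNP"), ("adapts", "VBZ"), ("content", "NN"),
          ("for", "IN"), ("each", "DT"), ("learner", "NN"), (".", ".")]

# Tag prefixes treated as part of a noun phrase (a simplifying assumption).
NP_TAGS = ("DT", "JJ", "NN")

def chunk_noun_phrases(tagged_tokens):
    """Group consecutive determiner/adjective/noun tokens into NP chunks."""
    chunks, current = [], []
    for word, tag in tagged_tokens:
        if tag.startswith(NP_TAGS):
            current.append((word, tag))
        else:
            if current:  # close the open noun phrase
                chunks.append(("NP", current))
                current = []
            chunks.append((tag, [(word, tag)]))
    if current:  # sentence ended inside a noun phrase
        chunks.append(("NP", current))
    return chunks

def to_bracketed(chunks):
    """Render the chunk list as a bracketed tree string rooted at S."""
    parts = []
    for label, items in chunks:
        if label == "NP":
            inner = " ".join(f"({tag} {word})" for word, tag in items)
            parts.append(f"(NP {inner})")
        else:
            word, tag = items[0]
            parts.append(f"({tag} {word})")
    return "(S " + " ".join(parts) + ")"

bracketed = to_bracketed(chunk_noun_phrases(tagged))
print(bracketed)
```

This is still a shallow chunking, not a full constituency parse: it has no verb phrases or prepositional phrases, and it would wrongly merge adjacent noun phrases that are not separated by another tag.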

In [None]:
# The same sentence with its original casing and after lowercasing
text1 = "LearnWise helps Students Learn Better."
text2 = text1.lower()

tokens1 = word_tokenize(text1)
tokens2 = word_tokenize(text2)

# Sets make it easy to see which tokens match exactly
set1 = set(tokens1)
set2 = set(tokens2)

print("Tokens without lowercasing:", tokens1)
print("Tokens with lowercasing:", [token.lower() for token in tokens1])

print("\nToken Set Comparison:")
print("Without Lowercase:", set1)
print("With Lowercase:", set2)

# Only tokens that were already lowercase survive the intersection
print("\nCommon Tokens:", set1.intersection(set2))
print("Difference caused by casing:", set1.symmetric_difference(set2))


Tokens without lowercasing: ['LearnWise', 'helps', 'Students', 'Learn', 'Better', '.']
Tokens with lowercasing: ['learnwise', 'helps', 'students', 'learn', 'better', '.']

Token Set Comparison:
Without Lowercase: {'LearnWise', 'helps', 'Learn', '.', 'Better', 'Students'}
With Lowercase: {'helps', 'better', 'learnwise', '.', 'students', 'learn'}

Common Tokens: {'helps', '.'}
Difference caused by casing: {'better', 'LearnWise', 'learnwise', 'Students', 'students', 'Better', 'learn', 'Learn'}
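
The set comparison above can be condensed into a single number. The sketch below computes the Jaccard similarity (shared tokens divided by all distinct tokens) before and after lowercasing, using the token lists from the output. The function name `jaccard` is chosen for this example and is not an NLTK call.

```python
def jaccard(a, b):
    """Jaccard similarity of two token collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Token lists copied from the output above.
tokens_original = ['LearnWise', 'helps', 'Students', 'Learn', 'Better', '.']
tokens_lowered = ['learnwise', 'helps', 'students', 'learn', 'better', '.']

# Without normalization only 'helps' and '.' match: 2 shared out of 10 distinct.
raw_similarity = jaccard(tokens_original, tokens_lowered)

# After lowercasing the first list, the two token sets are identical.
normalized_similarity = jaccard([t.lower() for t in tokens_original], tokens_lowered)

print("Similarity without lowercasing:", raw_similarity)
print("Similarity with lowercasing:", normalized_similarity)
```

For a learning tool that matches student answers against reference text, this gap is the practical cost of skipping case normalization.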


In [None]:
# Read a sentence from the user and display its flat POS-tag tree
userInput = input("Enter an EdTech-related sentence: ")
parsedTree = simple_parse_tree(userInput)

print("\nParsed Tree for your sentence:")
parsedTree.pretty_print()


Enter an EdTech-related sentence: in 1945 there was world war and lot of jews died

Parsed Tree for your sentence:
                          S                       
  ________________________|____________________    
 IN  CD    EX  VBD   NN   NN  CC  NN  IN NNS  VBD 
 |   |     |    |    |    |   |   |   |   |    |   
 in 1945 there was world war and lot  of jews died
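
The input cell passes whatever the user types straight to the tagger, including an empty string. A small guard could re-prompt until a non-empty sentence is entered. The sketch below is one possible approach: `read_sentence`, its `reader` parameter, and `max_attempts` are hypothetical names, and the demonstration uses canned responses in place of real keyboard input.

```python
def read_sentence(prompt, reader=input, max_attempts=3):
    """Prompt until a non-empty sentence is entered or attempts run out."""
    for _ in range(max_attempts):
        sentence = reader(prompt).strip()
        if sentence:
            return sentence
        print("Please enter a non-empty sentence.")
    raise ValueError("No sentence entered.")

# Demonstration with canned responses: the first is blank, the second is valid.
responses = iter(["   ", "LearnWise adapts content for each learner."])
sentence = read_sentence(
    "Enter an EdTech-related sentence: ",
    reader=lambda _prompt: next(responses),
)
print("Accepted:", sentence)
```

In the notebook, calling `read_sentence("Enter an EdTech-related sentence: ")` with the default `reader` would use the built-in `input`, and the result could then be passed to `simple_parse_tree`.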

