### **Stemming and Lemmatization: Solutions**

- **Run this cell to import all necessary packages**

In [11]:
# Import the necessary libraries and create the processing objects
# For NLTK
import nltk
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()

# Download the NLTK data packages if needed ('all' is large; word_tokenize only needs the Punkt tokenizer models)
#nltk.download('all')

# For spaCy (install the model first with: python -m spacy download en_core_web_sm)
import spacy
nlp = spacy.load("en_core_web_sm")

**Exercise 1:** 
- Convert this list of words into their base forms using stemming and lemmatization, and observe the transformations
- Write a short note on the words whose base forms differ between stemming and lemmatization

In [12]:
# Using stemming in NLTK
lst_words = ['running', 'painting', 'walking', 'dressing', 'likely', 'children', 'whom', 'good', 'ate', 'fishing','rafting', 'ability', 'meeting']

for word in lst_words:
    print(f"{word} | {stemmer.stem(word)}")


running | run
painting | paint
walking | walk
dressing | dress
likely | like
children | children
whom | whom
good | good
ate | ate
fishing | fish
rafting | raft
ability | abil
meeting | meet


In [13]:
# Using lemmatization in spaCy
# ('rafting', 'ability' and 'meeting' are not included here; they are compared in the last cell)

doc = nlp("running painting walking dressing likely children whom good ate fishing")
for token in doc:
    print(token, " | ", token.lemma_)

running  |  run
painting  |  painting
walking  |  walking
dressing  |  dress
likely  |  likely
children  |  child
whom  |  whom
good  |  good
ate  |  eat
fishing  |  fish
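To double-check which words end up with different base forms, the two outputs can be compared programmatically. The sketch below copies the word/base-form pairs printed by the two cells above into plain dictionaries, so it runs without NLTK or spaCy; `differing_words` is a hypothetical helper name. In the notebook itself, the dictionaries could instead be built from `stemmer.stem(word)` and `token.lemma_`.

```python
# Pairs copied from the stemming and lemmatization outputs above
# (only the ten words that appear in both cells).
stems = {
    "running": "run", "painting": "paint", "walking": "walk",
    "dressing": "dress", "likely": "like", "children": "children",
    "whom": "whom", "good": "good", "ate": "ate", "fishing": "fish",
}
lemmas = {
    "running": "run", "painting": "painting", "walking": "walking",
    "dressing": "dress", "likely": "likely", "children": "child",
    "whom": "whom", "good": "good", "ate": "eat", "fishing": "fish",
}


def differing_words(stems: dict, lemmas: dict) -> list:
    """Return the words (in input order) whose stem and lemma differ."""
    return [word for word in lemmas if stems.get(word) != lemmas[word]]


for word in differing_words(stems, lemmas):
    print(f"{word}: stem={stems[word]!r}, lemma={lemmas[word]!r}")
```

Running it lists painting, walking, likely, children and ate; 'fishing' is reduced to 'fish' by both methods.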


**Observations**

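- Words whose base forms differ between stemming and lemmatization are:
    - painting
    - walking
    - likely
    - children
    - ate

- Stemming reaches the base form by stripping **suffixes** (-ing, -ly, etc.), so it reduces words such as 'painting', 'walking' and 'likely'. spaCy's lemmatizer leaves these unchanged here, most likely because it relies on the part-of-speech tag assigned in context: in a bare word list they are probably tagged as nouns or adjectives, whose lemma is the word itself. ('fishing' becomes 'fish' with both methods.)

- Lemmatization uses a **dictionary** (lexicon) when converting to the base form, so irregular forms such as 'children' → 'child' and 'ate' → 'eat' are handled correctly, while the stemmer leaves them unchanged. Stems are also not always real words (e.g. 'ability' → 'abil').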
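The contrast described in the two notes above can be illustrated with a deliberately tiny sketch. `toy_stem` and `toy_lemmatize` are hypothetical helpers, not NLTK or spaCy APIs. The first only strips a few suffixes (the real Porter algorithm applies several ordered rule steps with conditions on the remaining stem), and the second only looks words up in a hand-made table of irregular forms (a real lemmatizer uses a full lexicon plus the part of speech).

```python
# Hypothetical, simplified suffix list for illustration only.
SUFFIXES = ("ing", "ly", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix if at least 3 letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical mini lexicon of irregular forms.
IRREGULAR = {"children": "child", "ate": "eat"}


def toy_lemmatize(word: str) -> str:
    """Look the word up in the lexicon; unknown words are returned unchanged."""
    return IRREGULAR.get(word, word)


for word in ["painting", "likely", "children", "ate"]:
    print(f"{word:10} stem={toy_stem(word):10} lemma={toy_lemmatize(word)}")
```

Like the real tools, the suffix stripper handles 'painting' and 'likely' but not 'children' or 'ate', and the lookup does the opposite. The toy stemmer would also turn 'running' into 'runn'; Porter has an extra rule that removes the doubled consonant, which is why NLTK prints 'run'.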

**Exercise 2:**

- Convert the given text into its base form using both stemming and lemmatization

In [26]:
text = """Latha is very multi talented girl.She is good at many skills like dancing, running, singing, playing.She also likes eating Pav Bhagi.she has a 
habit of fishing and swimming too.Besides all this, she is a wonderful at cooking too.
"""

In [15]:
# Using stemming in NLTK

# Step 1: word tokenization
all_word_tokens = nltk.word_tokenize(text)


# Step 2: get the base form of each token with the stemmer and store the results
all_base_words = []

for token in all_word_tokens:
    base_form = stemmer.stem(token)
    all_base_words.append(base_form)


# Step 3: join all the words in the list into a single string using 'join()'
final_base_text = ' '.join(all_base_words)
print(final_base_text)

latha is veri multi talent girl.sh is good at mani skill like danc , run , sing , playing.sh also like eat pav bhagi . she ha a habit of fish and swim too.besid all thi , she is a wonder at cook too .
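Tokens such as 'girl.sh', 'playing.sh' and 'too.besid' appear because the input text has no space after several periods, so `word_tokenize` keeps 'girl.She' as a single token. One possible pre-cleaning step, assuming a simple regular expression is enough for this text, inserts a space whenever a period is directly followed by a letter; `fix_sentence_spacing` is a hypothetical helper name.

```python
import re


def fix_sentence_spacing(text: str) -> str:
    """Insert a space after a period that is immediately followed by a letter."""
    # The lookahead keeps the letter itself; digits (e.g. '3.5') are left alone.
    return re.sub(r"\.(?=[A-Za-z])", ". ", text)


sample = "Latha is very multi talented girl.She is good at many skills."
print(fix_sentence_spacing(sample))
```

In the notebook this would be applied as `nltk.word_tokenize(fix_sentence_spacing(text))`. The rule is naive: an abbreviation such as 'e.g.' would become 'e. g.'.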


In [22]:
# Using lemmatization in spaCy


# Step 1: create the Doc object for the given text
doc = nlp(text)
all_base_words = []

# Step 2: get the base form of each token using spaCy's 'lemma_'
for token in doc:
    base_word = token.lemma_
    all_base_words.append(base_word)

# Step 3: join all the words in the list into a single string using 'join()'
final_base_text = ' '.join(all_base_words)
print(final_base_text)

Latha be very multi talented girl . she be good at many skill like dancing , running , singing , play . she also like eat Pav Bhagi . she have a 
 habit of fishing and swim too . besides all this , she be a wonderful at cook too . 



In [18]:
# Replace the previous steps with a single line (list comprehension)
final_base_text = " ".join([token.lemma_ for token in nlp(text)])
print(final_base_text)

Latha be very multi talented girl . she be good at many skill like dancing , running , singing , play . she also like eat Pav Bhagi . she have a 
 habit of fishing and swim too . besides all this , she be a wonderful at cook too . 



In [23]:
# Full version: cleanup + removal of stop words and punctuation + lemmatization
final_text = " ".join([t.lemma_ for t in nlp(text) if not t.is_stop and not t.is_punct])
print(final_text)

Latha multi talented girl good skill like dancing running singing play like eat Pav Bhagi 
 habit fishing swim wonderful cook 
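The output above still breaks across two lines because the newline inside `text` becomes its own whitespace token, which is neither a stop word nor punctuation. spaCy tokens also expose an `is_space` flag, so the filter can exclude whitespace as well. The sketch below wraps the filter in a hypothetical `clean_lemmas` function. To keep it runnable without a downloaded model, it is demonstrated on a small stand-in `FakeToken` class (an assumption, not a spaCy type) that has the same four attributes; in the notebook you would call `clean_lemmas(nlp(text))` instead.

```python
from dataclasses import dataclass


@dataclass
class FakeToken:
    """Hypothetical stand-in exposing the spaCy Token attributes used below."""
    lemma_: str
    is_stop: bool = False
    is_punct: bool = False
    is_space: bool = False


def clean_lemmas(tokens) -> str:
    """Join the lemmas of tokens that are not stop words, punctuation or whitespace."""
    return " ".join(
        t.lemma_
        for t in tokens
        if not t.is_stop and not t.is_punct and not t.is_space
    )


# Tokens modeled on the fragment "Pav Bhagi . she have a \n habit" printed earlier.
tokens = [
    FakeToken("Pav"), FakeToken("Bhagi"), FakeToken(".", is_punct=True),
    FakeToken("she", is_stop=True), FakeToken("have", is_stop=True),
    FakeToken("a", is_stop=True), FakeToken("\n", is_space=True),
    FakeToken("habit"),
]
print(clean_lemmas(tokens))  # Pav Bhagi habit
```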



In [25]:
# Filter the text to keep only the base forms of verbs, excluding stop words and punctuation
final_verbs = " ".join([t.lemma_ for t in nlp(text) if t.pos_ == "VERB" and not t.is_stop and not t.is_punct])
print(final_verbs)

play like eat swim cook
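The verb filter generalizes to any set of part-of-speech tags. `lemmas_by_pos` below is a hypothetical helper that keeps the lemmas of non-stop-word tokens whose coarse `pos_` tag is in a given set. A stand-in `FakeToken` class (an assumption, not a spaCy type) replaces real tokens so the example runs without a model, and the tags assigned to the demo tokens are illustrative. With spaCy it would be called as, for example, `lemmas_by_pos(nlp(text), {"VERB", "NOUN"})`.

```python
from dataclasses import dataclass


@dataclass
class FakeToken:
    """Hypothetical stand-in exposing the spaCy Token attributes used below."""
    lemma_: str
    pos_: str
    is_stop: bool = False


def lemmas_by_pos(tokens, pos_tags) -> str:
    """Join the lemmas of non-stop-word tokens whose pos_ tag is in pos_tags."""
    return " ".join(t.lemma_ for t in tokens if t.pos_ in pos_tags and not t.is_stop)


# Illustrative tokens for "She also likes eating Pav Bhagi" (tags assumed).
tokens = [
    FakeToken("she", "PRON", is_stop=True),
    FakeToken("like", "VERB"),
    FakeToken("eat", "VERB"),
    FakeToken("Pav", "PROPN"),
    FakeToken("Bhagi", "PROPN"),
]
print(lemmas_by_pos(tokens, {"VERB"}))           # like eat
print(lemmas_by_pos(tokens, {"VERB", "PROPN"}))  # like eat Pav Bhagi
```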


In [9]:
# List of words to test
words = ['eating', 'eats', 'ate', 'adjustable', 'rafting', 'ability', 'meeting']

# NLTK test (stemming)
print("NLTK Stemming:")
for word in words:
    print(f"{word} -> {stemmer.stem(word)}")

print("\n" + "-"*30 + "\n")

# spaCy test (lemmatization)
print("spaCy Lemmatization:")
doc = nlp(" ".join(words))
for token in doc:
    print(f"{token.text} -> {token.lemma_}")

NLTK Stemming:
eating -> eat
eats -> eat
ate -> ate
adjustable -> adjust
rafting -> raft
ability -> abil
meeting -> meet

------------------------------

spaCy Lemmatization:
eating -> eat
eats -> eat
ate -> eat
adjustable -> adjustable
rafting -> raft
ability -> ability
meeting -> meeting
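A practical consequence of these results is how many distinct forms remain after normalization. The sketch below groups the seven test words by the base forms printed above (copied into dictionaries so it runs without NLTK or spaCy); `group_by_base` is a hypothetical helper name.

```python
from collections import defaultdict

# Base forms copied from the NLTK and spaCy outputs above.
stems = {"eating": "eat", "eats": "eat", "ate": "ate", "adjustable": "adjust",
         "rafting": "raft", "ability": "abil", "meeting": "meet"}
lemmas = {"eating": "eat", "eats": "eat", "ate": "eat", "adjustable": "adjustable",
          "rafting": "raft", "ability": "ability", "meeting": "meeting"}


def group_by_base(base_forms: dict) -> dict:
    """Map each base form to the list of original words that reduce to it."""
    groups = defaultdict(list)
    for word, base in base_forms.items():
        groups[base].append(word)
    return dict(groups)


print("Stemming:     ", group_by_base(stems))
print("Lemmatization:", group_by_base(lemmas))
```

Stemming leaves six groups because 'ate' stays apart from 'eat', while lemmatization merges 'eating', 'eats' and 'ate' and leaves five groups.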
