# Basic NLP Course

## Stemming and Lemmatization

Stemming and lemmatization are two fundamental text-normalization techniques in Natural Language Processing (NLP). Both reduce inflected words to a base or root form.

- **Stemming**: This technique applies fixed rules to strip affixes (mostly suffixes) from words. The resulting stem is not always a valid word. For example, the Porter stemmer turns "running" into "run" and "flies" into "fli".
- **Lemmatization**: Unlike stemming, lemmatization leverages knowledge of a language, such as vocabulary and morphological analysis, to derive the base or dictionary form of a word. For instance, "running" becomes "run" and "flies" becomes "fly".
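
To make the contrast between the two bullets concrete, here is a minimal pure-Python sketch. It is a toy illustration only: the suffix rules and the mini-dictionary (`SUFFIX_RULES`, `LEMMA_TABLE`, `toy_stem`, `toy_lemmatize`) are hypothetical names invented for this example, and they are neither the Porter algorithm nor spaCy's lemmatizer.

```python
# Toy illustration only: NOT the Porter algorithm and NOT spaCy's lemmatizer.

# Hypothetical suffix rules, tried in order: (suffix to strip, replacement)
SUFFIX_RULES = [("ies", "i"), ("ing", ""), ("s", "")]

# Hypothetical mini-dictionary mapping inflected forms to dictionary forms
LEMMA_TABLE = {"running": "run", "flies": "fly", "hazards": "hazard"}


def toy_stem(word: str) -> str:
    """Strip the first matching suffix without checking that the result is a word."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            stem = word[: len(word) - len(suffix)] + replacement
            # Undo consonant doubling left behind by "-ing", e.g. "runn" -> "run"
            if suffix == "ing" and len(stem) >= 2 and stem[-1] == stem[-2]:
                stem = stem[:-1]
            return stem
    return word


def toy_lemmatize(word: str) -> str:
    """Look the word up in the dictionary; fall back to the lowercased word."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)


for w in ["running", "flies", "hazards"]:
    print(f"{w}: stem={toy_stem(w)}, lemma={toy_lemmatize(w)}")
```

The rule-based function never consults a vocabulary, which is why it can produce a non-word such as "fli". The dictionary lookup can only return forms it knows about, which is why real lemmatizers rely on much larger linguistic resources.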

In [4]:
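import spacy
from nltk.stem import PorterStemmer

# Rule-based Porter stemmer from NLTK
stemmer = PorterStemmer()
# Small English spaCy pipeline; install it first with: python -m spacy download en_core_web_sm
spacy_nlp = spacy.load("en_core_web_sm")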

In [2]:
safety_reports = [
    "Wear appropriate personal protective equipment",
    "Organize periodic safety drills",
    "Examine machinery for potential hazards",
    "Deliver safety training to employees",
    "Keep emergency exits unobstructed",
    "Log and report workplace incidents",
    "Adhere to chemical handling guidelines",
    "Assess air quality in confined spaces",
    "Mark hazardous materials clearly",
    "Perform regular risk evaluations"
]

In [3]:
# Stem each whitespace-separated word; the Porter stemmer also lowercases its output
for report in safety_reports:
    for word in report.split():
        print(f"{word} --> {stemmer.stem(word)}")

Wear --> wear
appropriate --> appropri
personal --> person
protective --> protect
equipment --> equip
Organize --> organ
periodic --> period
safety --> safeti
drills --> drill
Examine --> examin
machinery --> machineri
for --> for
potential --> potenti
hazards --> hazard
Deliver --> deliv
safety --> safeti
training --> train
to --> to
employees --> employe
Keep --> keep
emergency --> emerg
exits --> exit
unobstructed --> unobstruct
Log --> log
and --> and
report --> report
workplace --> workplac
incidents --> incid
Adhere --> adher
to --> to
chemical --> chemic
handling --> handl
guidelines --> guidelin
Assess --> assess
air --> air
quality --> qualiti
in --> in
confined --> confin
spaces --> space
Mark --> mark
hazardous --> hazard
materials --> materi
clearly --> clearli
Perform --> perform
regular --> regular
risk --> risk
evaluations --> evalu
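
One consequence visible in the output above is that different words can collapse onto the same stem. The sketch below groups a few (word, stem) pairs copied by hand from that output; the helper name `group_by_stem` is hypothetical and the pairs are hardcoded so the example runs without NLTK.

```python
from collections import defaultdict

# (word, stem) pairs copied from the stemmer output above
stem_pairs = [
    ("hazards", "hazard"),
    ("hazardous", "hazard"),
    ("safety", "safeti"),
    ("training", "train"),
    ("drills", "drill"),
]


def group_by_stem(pairs):
    """Collect the words that share each stem, preserving input order."""
    groups = defaultdict(list)
    for word, stem in pairs:
        groups[stem].append(word)
    return dict(groups)


groups = group_by_stem(stem_pairs)
# Keep only stems reached from more than one distinct word
collisions = {stem: words for stem, words in groups.items() if len(words) > 1}
print(collisions)
```

Here the noun "hazards" and the adjective "hazardous" end up under the single stem "hazard". Whether that merging helps or hurts depends on the task.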


In [5]:
# Lemmatize each token with spaCy, reusing the pipeline loaded in the first cell
for report in safety_reports:
    doc = spacy_nlp(report)
    for token in doc:
        print(f"{token.text} --> {token.lemma_}")

Wear --> wear
appropriate --> appropriate
personal --> personal
protective --> protective
equipment --> equipment
Organize --> organize
periodic --> periodic
safety --> safety
drills --> drill
Examine --> examine
machinery --> machinery
for --> for
potential --> potential
hazards --> hazard
Deliver --> deliver
safety --> safety
training --> training
to --> to
employees --> employee
Keep --> keep
emergency --> emergency
exits --> exit
unobstructed --> unobstructed
Log --> log
and --> and
report --> report
workplace --> workplace
incidents --> incident
Adhere --> adhere
to --> to
chemical --> chemical
handling --> handling
guidelines --> guideline
Assess --> Assess
air --> air
quality --> quality
in --> in
confined --> confine
spaces --> space
Mark --> Mark
hazardous --> hazardous
materials --> material
clearly --> clearly
Perform --> perform
regular --> regular
risk --> risk
evaluations --> evaluation
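
To compare the two outputs directly, the following sketch lines up a handful of words with their Porter stem and spaCy lemma. The triples are copied by hand from the outputs above rather than recomputed, and `compare` is a hypothetical helper name, so the snippet runs without NLTK or spaCy installed.

```python
# (word, Porter stem, spaCy lemma) triples copied from the two outputs above
rows = [
    ("safety", "safeti", "safety"),
    ("Organize", "organ", "organize"),
    ("hazards", "hazard", "hazard"),
    ("hazardous", "hazard", "hazardous"),
    ("employees", "employe", "employee"),
    ("confined", "confin", "confine"),
    ("evaluations", "evalu", "evaluation"),
]


def compare(rows):
    """Return the rows where the stem and the lemma disagree."""
    return [(w, s, l) for w, s, l in rows if s != l]


differing = compare(rows)
for word, stem, lemma in differing:
    print(f"{word:<12} stem={stem:<8} lemma={lemma}")
```

In this sample only "hazards" yields the same result from both methods. In every other row the lemma is a dictionary word while the stem is a truncated form. Note also that the spaCy output above keeps the capital letter in "Assess" and "Mark", possibly because the model tagged those sentence-initial words as proper nouns; this is an assumption, not something the output itself confirms.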
