# POS Tagging

One of the core tasks in Natural Language Processing (NLP) is part-of-speech (POS) tagging: assigning each word in a text a grammatical category such as noun, verb, adjective, or adverb. These tags expose the structure of a phrase, which gives machines a better basis for analyzing the meaning of human language.
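
To make the task concrete, here is a toy sketch of what "assigning a category to each word" looks like as data. It is not how spaCy or Zemberek work internally: `TOY_LEXICON` and `toy_tag` are hypothetical names, and the lexicon is hand-written for one sentence only.

```python
# Hand-written toy lexicon: lowercase word -> coarse category (illustrative only).
TOY_LEXICON = {
    "a": "DET",
    "the": "DET",
    "barn": "NOUN",
    "owl": "NOUN",
    "is": "AUX",
    "best": "ADJ",
}


def toy_tag(sentence):
    """Tag whitespace-separated words by dictionary lookup; unknown words get 'X'."""
    words = sentence.rstrip(".!?").split()
    return [(word, TOY_LEXICON.get(word.lower(), "X")) for word in words]


for word, category in toy_tag("A barn owl is the best."):
    print(word, "=>", category)
```

A plain lookup like this cannot handle words whose category depends on context, which is why real taggers rely on statistical models or morphological analysis.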

## POS Tagging with spaCy


Below is a simple Python example that uses **spaCy** to tag the given sentences.

In [45]:
import spacy
nlp = spacy.load('en_core_web_sm')    # Loading model

In [46]:
sentences = [
    "The prevalence of discrimination across racial groups in contemporary America:",
    "Indeed, the implementation of certain policies is rooted in the assumption that discrimination and biases are, at least to some appreciable amount, present in modern society.",
    "What do you think was the main reason for these experiences?",
    "Some countries have recently adopted a similar course, which we welcome.",
    "Sometimes this information is available, but usually not.",
    "Make sure the shelter box has no nails or safety hazards.",
    "A barn owl is the best.",
    "A family of barn owls can eat many mice in a night!",
    "Okay next to my bed, I've got the radio alarm clock, and this has actually changed my life.",
    "It makes waking up in the morning and getting out of bed at 6:00 a.m. when it's pitch black outside so much easier when you're waking up really early."
]


for sentence in sentences:
    doc = nlp(sentence)
    for token in doc:
        # pos_ is the coarse-grained universal POS tag; tag_ is the fine-grained tag.
        print(token.text, " => ", token.pos_, " = ", spacy.explain(token.pos_), " | ", token.tag_, " => ", spacy.explain(token.tag_))


The  =>  DET  =  determiner  |  DT  =>  determiner
prevalence  =>  NOUN  =  noun  |  NN  =>  noun, singular or mass
of  =>  ADP  =  adposition  |  IN  =>  conjunction, subordinating or preposition
discrimination  =>  NOUN  =  noun  |  NN  =>  noun, singular or mass
across  =>  ADP  =  adposition  |  IN  =>  conjunction, subordinating or preposition
racial  =>  ADJ  =  adjective  |  JJ  =>  adjective (English), other noun-modifier (Chinese)
groups  =>  NOUN  =  noun  |  NNS  =>  noun, plural
in  =>  ADP  =  adposition  |  IN  =>  conjunction, subordinating or preposition
contemporary  =>  PROPN  =  proper noun  |  NNP  =>  noun, proper singular
America  =>  PROPN  =  proper noun  |  NNP  =>  noun, proper singular
:  =>  PUNCT  =  punctuation  |  :  =>  punctuation mark, colon or ellipsis
Indeed  =>  ADV  =  adverb  |  RB  =>  adverb
,  =>  PUNCT  =  punctuation  |  ,  =>  punctuation mark, comma
the  =>  DET  =  determiner  |  DT  =>  determiner
implementation  =>  NOUN  =  noun  |  NN 
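
The per-token printout gets long quickly, so a summary can help. The following is a minimal sketch, assuming the `(text, pos)` pairs have been collected into a list. The pairs are copied from the first sentence's output above, and `tagged` and `pos_counts` are illustrative names.

```python
from collections import Counter

# (token text, coarse POS) pairs copied from the first sentence's output above.
tagged = [
    ("The", "DET"), ("prevalence", "NOUN"), ("of", "ADP"),
    ("discrimination", "NOUN"), ("across", "ADP"), ("racial", "ADJ"),
    ("groups", "NOUN"), ("in", "ADP"), ("contemporary", "PROPN"),
    ("America", "PROPN"), (":", "PUNCT"),
]

# Count how often each coarse tag occurs and print the most frequent first.
pos_counts = Counter(pos for _, pos in tagged)
for pos, count in pos_counts.most_common():
    print(f"{pos:<6} {count}")
```

With a live spaCy `doc`, the same tally can be built with `Counter(token.pos_ for token in doc)`. A summary like this also makes questionable tags easier to spot: "contemporary" is tagged PROPN above although it functions as an adjective in this phrase.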

## POS Tagging with ZEMBEREK

Below is a simple Python example that uses **ZEMBEREK** to analyze the given Turkish sentences.

In [50]:
!pip install jpype1
!pip install zemberek-python

Collecting zemberek-python
  Downloading zemberek_python-0.2.3-py3-none-any.whl.metadata (2.7 kB)
Collecting antlr4-python3-runtime==4.8 (from zemberek-python)
  Downloading antlr4-python3-runtime-4.8.tar.gz (112 kB)
  Preparing metadata (setup.py) ... done
Downloading zemberek_python-0.2.3-py3-none-any.whl (95.1 MB)
Building wheels for collected packages: antlr4-python3-runtime
  Building wheel for antlr4-python3-runtime (setup.py) ... done
  Created wheel for antlr4-python3-runtime: filename=antlr4_python3_runtime-4.8-py3-none-any.whl size=141214 sha256=078fa1920013784e988d7055c420bbfe7fabf26b4e6360717413942a3b99a319
  Stored in directory: /root/.cache/pip/wheels/a7/20/bd/e1477d664f22d99989fd28ee1a43d6633dddb5cb9

In [51]:
import zemberek
from zemberek import TurkishMorphology, TurkishSentenceExtractor, TurkishTokenizer

In [56]:
tokenizer = TurkishTokenizer.DEFAULT
morphology = TurkishMorphology.create_with_defaults()


2024-12-14 08:43:54,852 - zemberek.morphology.turkish_morphology - INFO
Msg: TurkishMorphology instance initialized in 19.08867597579956



In [57]:
result = morphology.analyze("kahve")
print(result)

WordAnalysis{input='kahve', normalizedInput='kahve', analysisResults=[kahve:Noun] kahve:Noun+A3sg}


In [82]:
sentences = [
    "Koyun kurt ile gezerdi, fikir başka başka olmasa.",
    "Sıraya koyunca en önemlisini öne almak lazım geldi.",
    "Pakize bu son fikri fazla beğenmişti ve itiraf edeyim ki Pakize'nin zevki benim için bir çeşit miyar olmuştu.",
    "Doğduğuma pişman olacak kadar sıkıntı çektim.",
    "Kim ona yan bakarsa kemiklerini kırar, anasını ağlatırım.",
    "Onlara göre yaşlı yazarların anısal birikimi daha fazlaydı.",
    "Yaşamı her yönden yalnızlığa yaslanmış olan bu kadına tek çocuğun bile anlayış gösterdiğini sanmam.",
    "Polis olay yerine 4 dakika içinde ulaştı.",
    "Yüksekte tutulan bir taştaki gizli güç, taş bırakılınca mekanik bir güç durumunda ortaya çıkar.",
    "Bu hastanede doğmuşum.",
    "xxxxxxxxxxxxxxxxxxxxxx"
]

for sentence in sentences:
    for token in tokenizer.tokenize(sentence):
        analysis = morphology.analyze(token.normalized)
        if analysis.analysis_results:
            # Take the first candidate analysis; no sentence-level disambiguation is applied here.
            print(token.content, " => ", analysis.analysis_results[0])
        else:
            print(token.content, " => ", "No analysis")


Koyun  =>  [koymak:Verb] koy:Verb+Imp+un:A2pl
kurt  =>  [kurt:Adj] kurt:Adj
ile  =>  [ile:Postp, PCNom] ile:Postp
gezerdi  =>  [gezmek:Verb] gez:Verb+er:Aor+di:Past+A3sg
,  =>  [,:Punc] ,:Punc
fikir  =>  [fikir:Noun] fikir:Noun+A3sg
başka  =>  [başka:Adj] başka:Adj
başka  =>  [başka:Adj] başka:Adj
olmasa  =>  [olmak:Verb] ol:Verb+ma:Neg+sa:Desr+A3sg
.  =>  [.:Punc] .:Punc
Sıraya  =>  [sıra:Noun] sıra:Noun+A3sg+ya:Dat
koyunca  =>  [koymak:Verb] koy:Verb|unca:When→Adv
en  =>  [en:Adv] en:Adv
önemlisini  =>  [önem:Noun] önem:Noun+A3sg|li:With→Adj|Zero→Noun+A3sg+si:P3sg+ni:Acc
öne  =>  [önemek:Verb] öne:Verb+Imp+A2sg
almak  =>  [almak:Verb] al:Verb|mak:Inf1→Noun+A3sg
lazım  =>  [lazım:Adj] lazım:Adj
geldi  =>  [gelmek:Verb] gel:Verb+di:Past+A3sg
.  =>  [.:Punc] .:Punc
Pakize  =>  [Pakize:Noun, Prop] pakize:Noun+A3sg
bu  =>  [bu:Det] bu:Det
son  =>  [son:Adj] son:Adj
fikri  =>  [fikrî:Adj] fikri:Adj
fazla  =>  [fazla:Adv] fazla:Adv
beğenmişti  =>  [beğenmek:Verb] beğen:Verb+miş:Narr+ti:Past
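
Each printed analysis starts with a bracketed `[lemma:POS]` part, optionally followed by a secondary tag as in `[ile:Postp, PCNom]`. The sketch below pulls the lemma and POS out of such a string. It assumes only that the text has the shape printed above and does not use the library's own objects; `ANALYSIS_RE` and `lemma_and_pos` are hypothetical names.

```python
import re

# Matches the leading "[lemma:POS]" or "[lemma:POS, Secondary]" part of an analysis string.
ANALYSIS_RE = re.compile(
    r"^\[(?P<lemma>[^\]]+):(?P<pos>\w+)(?:,\s*(?P<secondary>\w+))?\]"
)


def lemma_and_pos(analysis_text):
    """Return (lemma, pos, secondary_pos) from an analysis string, or None if it does not match."""
    match = ANALYSIS_RE.match(analysis_text)
    if match is None:
        return None
    return match.group("lemma"), match.group("pos"), match.group("secondary")


# Analysis strings copied from the output above.
samples = [
    "[koymak:Verb] koy:Verb+Imp+un:A2pl",
    "[ile:Postp, PCNom] ile:Postp",
    "[fikir:Noun] fikir:Noun+A3sg",
    "[,:Punc] ,:Punc",
]
for text in samples:
    print(lemma_and_pos(text))
```

Because the loop above prints only the first candidate analysis, an extracted tag reflects that choice rather than the sentence context. For example, "Koyun" comes out as a Verb (imperative of "koymak"), although in this proverb it is the noun "sheep".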