<a href="https://colab.research.google.com/github/alex-smith-uwec/NLP_Spring2025/blob/main/UniversalDependencyParsing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Dependency parsing

Good background reading on dependency parsing: [Jurafsky/Martin Chapter 19](https://web.stanford.edu/~jurafsky/slp3/19.pdf)

[Universal Dependencies](https://universaldependencies.org/)

Universal Dependencies (UD) is a framework for consistent annotation of grammar (parts of speech, morphological features, and syntactic dependencies) across different human languages. UD is an open community effort with over 600 contributors producing over 200 treebanks in over 150 languages.
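
UD treebanks are distributed in the CoNLL-U format, a tab-separated text format with ten columns per token. As a minimal sketch of what that annotation looks like, the snippet below reads a hand-written fragment (an illustrative sentence with simplified features, not an excerpt from a released treebank) and lists each dependency arc. The helper names `parse_conllu` and `arcs` are hypothetical.

```python
# Names of the ten CoNLL-U columns, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

# Hand-written fragment for illustration only (features are simplified).
CONLLU = (
    "# text = Students learn calculus.\n"
    "1\tStudents\tstudent\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tlearn\tlearn\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tcalculus\tcalculus\tNOUN\t_\tNumber=Sing\t2\tobj\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)


def parse_conllu(text):
    """Return one dict per ordinary token line of a CoNLL-U sentence."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(FIELDS, line.split("\t")))
        if not row["id"].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        rows.append(row)
    return rows


def arcs(rows):
    """Return (dependent, relation, head) triples; HEAD 0 marks the root."""
    forms = {row["id"]: row["form"] for row in rows}
    forms["0"] = "ROOT"
    return [(row["form"], row["deprel"], forms[row["head"]]) for row in rows]


for dependent, relation, head in arcs(parse_conllu(CONLLU)):
    print(f"{relation}({head}, {dependent})")
```

Each printed line reads as relation(head, dependent), the same information that a dependency tree drawing encodes as a labeled arrow.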

[spaCy on Hugging Face](https://huggingface.co/spaces/spacy/pipeline-visualizer)

spaCy is *very good* at producing and visualizing dependency parse trees.

In [1]:
!pip install spacy -q

In [29]:
import spacy
from spacy import displacy


In [30]:

!python -m spacy download en_core_web_sm -q

# Download language models for other languages (English is done above)
!python -m spacy download fr_core_news_sm -q
!python -m spacy download es_core_news_sm -q
!python -m spacy download zh_core_web_sm -q
!python -m spacy download ja_core_news_sm -q

✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.
✔ Download and installation successful
You can now load the package via spacy.load('fr_core_news_sm')

In [12]:
# Load spaCy models for different languages
nlp_en = spacy.load("en_core_web_sm")
nlp_fr = spacy.load("fr_core_news_sm")
nlp_es = spacy.load("es_core_news_sm")
nlp_zh = spacy.load("zh_core_web_sm")
nlp_ja = spacy.load("ja_core_news_sm")

In [31]:

# Parse one example sentence per language
doc_en = nlp_en("Students will learn about natural logarithms in differential calculus.")
doc_fr = nlp_fr("Les élèves apprendront les logarithmes naturels en calcul différentiel.")
doc_es = nlp_es("Los estudiantes aprenderán sobre los logaritmos naturales en el cálculo diferencial.")
doc_zh = nlp_zh("學生將學習微積分中的自然對數。")
doc_ja = nlp_ja("学生は、微分幾何学で自然対数を学ぶことができる。")

In [34]:
options = {
    "compact": True,      # Makes the visualization more compact
    "color": "black",     # Optional: sets text color
    "distance": 100,      # Distance between words in px (displaCy defaults: 175, or 150 in compact mode)
    "font": "Arial",      # Adjusts font if needed
}

displacy.render(doc_en, style="dep", options=options)
displacy.render(doc_fr, style="dep", options=options)
displacy.render(doc_es, style="dep", options=options)
displacy.render(doc_zh, style="dep", options=options)
displacy.render(doc_ja, style="dep", options=options)
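
The arcs that displaCy draws can also be read directly from the tokens: each spaCy token carries its relation label in `dep_` and its governing token in `head`, and the root token is its own head. A minimal sketch follows; the helper name `dependency_rows` is hypothetical, and the two stand-in tokens are hand-built (not model output) so that the snippet runs without a model download.

```python
from types import SimpleNamespace


def dependency_rows(doc):
    """Return (token, relation, head) triples for every token in a parsed doc."""
    return [(token.text, token.dep_, token.head.text) for token in doc]


# Hand-built stand-in tokens (an assumption for illustration, not model output).
learn = SimpleNamespace(text="learn", dep_="ROOT")
learn.head = learn  # spaCy's root token is its own head
students = SimpleNamespace(text="Students", dep_="nsubj", head=learn)
stand_in_doc = [students, learn]

for text, relation, head in dependency_rows(stand_in_doc):
    print(f"{text:<10}{relation:<8}{head}")
```

In the notebook, `dependency_rows(doc_en)` applies the same function to the real English parse, and likewise for the other languages.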

[List of ISO 639 language codes](https://en.wikipedia.org/wiki/List_of_ISO_639_language_codes)

For example: en (English), zh (Chinese), fr (French).
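
spaCy's pretrained pipeline names begin with the language code, followed by the pipeline type, the genre of the training text, and the size. The sketch below indexes the five models downloaded above by that code; `language_code` and `MODEL_BY_LANG` are hypothetical names introduced here.

```python
# The five pipelines downloaded earlier in this notebook.
MODELS = [
    "en_core_web_sm",
    "fr_core_news_sm",
    "es_core_news_sm",
    "zh_core_web_sm",
    "ja_core_news_sm",
]


def language_code(model_name):
    """Return the leading language code of a spaCy pipeline name."""
    return model_name.split("_", 1)[0]


# Look up a pipeline name by its language code.
MODEL_BY_LANG = {language_code(name): name for name in MODELS}

print(MODEL_BY_LANG["zh"])  # zh_core_web_sm
```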

Translations can be obtained from the [Google Translate web interface](https://translate.google.com).

For fun, a Python library that interfaces with Google Translate is shown in the cells below.

In [36]:
!pip install googletrans==4.0.0-rc1 -q

  Preparing metadata (setup.py) ... done
  Building wheel for googletrans (setup.py) ... done
ERROR: pip's dependency resolver does not c

In [37]:
from googletrans import Translator

translator = Translator()

In [38]:
sentence = "Students will learn about natural logarithms in differential calculus."
result = translator.translate(sentence, dest='zh-CN')
print(result.text)

学生将了解差分微积分中的自然对数。
