# Visualizing Dependency Trees for EDUs

This notebook demonstrates how to visualize dependency trees for Elementary Discourse Units (EDUs) stored in the project's parsed CoNLL-U files. It uses the conllu library to read the files and spaCy's displaCy to render each EDU. Note that the EDU text is parsed again by the spaCy model, so the rendered tree shows spaCy's analysis rather than the parse stored in the file. The visualizations can be saved as SVG files, for example for inclusion in a thesis.

---

## 1. Setup: Install Required Libraries


In [1]:
import sys
# Use the kernel's own interpreter: a bare !python can resolve to another one and fail with "No module named spacy".
!{sys.executable} -m pip install spacy conllu
!{sys.executable} -m spacy download de_core_news_sm
!{sys.executable} -m spacy download ru_core_news_sm


---

## 2. Import Libraries

In [2]:
from pathlib import Path
from conllu import parse
import spacy
from spacy import displacy



---

## 3. Load spaCy Models

Load the German and Russian language models.

In [3]:
nlp_de = spacy.load("de_core_news_sm")  # German model
nlp_ru = spacy.load("ru_core_news_sm")  # Russian model

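If a model is not installed in the kernel's environment, this cell fails with `OSError: [E050] Can't find model 'de_core_news_sm'. It doesn't seem to be a Python package or a valid path to a data directory.` In that case, run the download commands from Section 1 again and then rerun this cell.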
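
Because spaCy models are installed as ordinary Python packages, their availability can be checked before `spacy.load` is called. The following is a minimal sketch using only the standard library; `missing_models` is a hypothetical helper name and not part of spaCy.

```python
from importlib.util import find_spec


def missing_models(model_names):
    """Return the names that cannot be found as importable top-level packages."""
    return [name for name in model_names if find_spec(name) is None]


# Report which of the two models used in this notebook are not installed.
missing = missing_models(["de_core_news_sm", "ru_core_news_sm"])
if missing:
    print("Models not installed in this environment:", ", ".join(missing))
```

An empty result means both packages are visible to the interpreter running the kernel, which is the one `spacy.load` uses.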

---

## 4. Define Utility Functions

These functions help to read CoNLL-U files and visualize or save dependency trees.

In [None]:
def load_edus_from_conllu(conllu_path):
    """
    Load all EDUs (as TokenList objects) from a CoNLL-U file.
    """
    data = Path(conllu_path).read_text(encoding="utf-8")
    return parse(data)

def visualize_edu(edu_text, nlp_model, jupyter=True):
    """
    Visualize the dependency tree of an EDU using a spaCy model and displaCy.
    Set jupyter=False to return SVG string.
    """
    doc = nlp_model(edu_text)
    if jupyter:
        displacy.render(doc, style="dep", jupyter=True)
    else:
        svg = displacy.render(doc, style="dep", jupyter=False)
        return svg

def save_edu_tree_svg(edu_text, nlp_model, output_path):
    """
    Generate and save the dependency tree visualization of an EDU as an SVG file.
    """
    svg = visualize_edu(edu_text, nlp_model, jupyter=False)
    Path(output_path).write_text(svg, encoding="utf-8")
    print(f"SVG saved: {output_path}")
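
The functions above parse the EDU text again with a spaCy model. To draw the dependency relations already stored in the CoNLL-U file instead, the tokens can be converted into the dictionary format that displaCy accepts in manual mode. The sketch below assumes the field names used by recent versions of conllu (`id`, `form`, `upos`, `head`, `deprel`); `conllu_to_displacy` is a hypothetical helper name.

```python
def conllu_to_displacy(tokens):
    """Convert CoNLL-U token dicts into displaCy's manual 'dep' input format."""
    # Keep regular words only; multiword ranges and empty nodes have non-integer ids.
    words = [tok for tok in tokens if isinstance(tok["id"], int)]
    # Map CoNLL-U ids (1-based) to 0-based positions in the rendered sentence.
    position = {tok["id"]: i for i, tok in enumerate(words)}

    arcs = []
    for tok in words:
        head = tok.get("head")
        if head not in position:
            continue  # root (head 0) or unspecified head: no arc to draw
        dep, gov = position[tok["id"]], position[head]
        if dep == gov:
            continue  # ignore malformed self-loops
        arcs.append({
            "start": min(dep, gov),
            "end": max(dep, gov),
            "label": tok.get("deprel") or "",
            # displaCy uses "left" when the dependent precedes its head.
            "dir": "left" if dep < gov else "right",
        })

    return {
        "words": [{"text": tok["form"], "tag": tok.get("upos") or ""} for tok in words],
        "arcs": arcs,
    }
```

With spaCy installed, the result can be rendered with `displacy.render(conllu_to_displacy(edu), style="dep", manual=True)`, where `edu` is one item returned by `load_edus_from_conllu`.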

---

## 5. Example: Visualize and Save Dependency Trees for Selected EDUs

Modify the `edu_files` list and select which EDU to visualize from each file.

In [None]:
# List your CoNLL-U files here (relative or absolute paths)
edu_files = [
    "parsed_results/maz-00001_parsed.conllu",
    "parsed_results/maz-00002_parsed.conllu",
    # Add more file paths as needed
]

# Loop through files and visualize/save the first EDU from each
for conllu_path in edu_files:
    edus = load_edus_from_conllu(conllu_path)
    if not edus:
        print(f"No EDUs found in {conllu_path}")
        continue
    edu = edus[0]  # You can choose a different index if needed
    edu_text = edu.metadata.get("text", "")
    if not edu_text:
        print(f"No text found for EDU in {conllu_path}")
        continue
    # Determine language from the file name only (adjust if you use a different pattern).
    # Matching against the full path would misfire on directory names such as "models".
    file_name = Path(conllu_path).name
    if "maz" in file_name or "de" in file_name:
        nlp_model = nlp_de
    elif "ru" in file_name:
        nlp_model = nlp_ru
    else:
        nlp_model = nlp_de  # Default to German if unsure
    print(f"Visualizing EDU from {conllu_path}:")
    visualize_edu(edu_text, nlp_model, jupyter=True)
    # Save the SVG file
    svg_path = Path(conllu_path).stem + "_dep.svg"
    save_edu_tree_svg(edu_text, nlp_model, svg_path)
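
The loop skips any EDU whose metadata has no `text` entry. One possible alternative is to rebuild the text from the token forms, honoring the `SpaceAfter=No` marker from the CoNLL-U MISC column. The sketch below is a simplification: it skips multiword-token ranges and uses the individual word forms, so contracted surface forms are not restored. `get_edu_text` is a hypothetical helper name.

```python
def get_edu_text(metadata, tokens):
    """Return the '# text' metadata if present, else rebuild the text from token forms."""
    text = metadata.get("text")
    if text:
        return text

    parts = []
    for tok in tokens:
        if not isinstance(tok["id"], int):
            continue  # skip multiword ranges and empty nodes
        parts.append(tok["form"])
        misc = tok.get("misc") or {}
        # Add a space unless the token is explicitly marked as having none after it.
        if misc.get("SpaceAfter") != "No":
            parts.append(" ")
    return "".join(parts).strip()
```

In the loop above this could replace the metadata lookup, for example `edu_text = get_edu_text(edu.metadata, edu)`.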