# Dependency Parsing of an English RST Example: EDUs and Whole Sentences

This notebook demonstrates how to:
- Extract Elementary Discourse Units (EDUs) from an .rs3 file (an XML format for Rhetorical Structure Theory, RST, annotations)
- Parse each EDU with spaCy's dependency parser
- Group EDUs into whole sentences and parse each sentence as a unit
- Visualize the dependency trees for both EDUs and whole sentences

All comments and code are in English. Run the code cells in order, since later cells depend on variables defined in earlier ones.

## 1. Install and Import Required Libraries

In [None]:
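# Install spaCy and the small English model into the kernel's environment
# (only needed once per environment)
import sys
%pip install spacy
!{sys.executable} -m spacy download en_core_web_sm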
import spacy
from spacy import displacy
import xml.etree.ElementTree as ET
from pathlib import Path

## 2. Load the English spaCy Model

In [None]:
nlp_en = spacy.load("en_core_web_sm")

## 3. Extract EDUs from the .rs3 File

In [None]:
# Path to your .rs3 file (update as needed)
rs3_path = "#"
# Parse the XML structure and extract EDU segments
root = ET.parse(rs3_path).getroot()
edus = []
for segment in root.findall(".//segment"):
    edu_text = segment.text.strip() if segment.text else ""
    if edu_text:
        edus.append(edu_text)
print(f"Total EDUs extracted: {len(edus)}")
for i, e in enumerate(edus[:5]):
    print(f"EDU {i+1}: {e}")
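
To make the extraction step easier to follow, here is a minimal sketch that runs on an inline string instead of a file. The `SAMPLE_RS3` fragment is hand-written for illustration and is not taken from any real corpus; `extract_edus` is a hypothetical helper name. Unlike the cell above, it also keeps each segment's `id` and collapses line breaks inside the segment text, which is an optional extra step.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written .rs3 fragment (assumption: real files follow
# the same <segment> layout but are usually much larger).
SAMPLE_RS3 = """<rst>
  <header>
    <relations>
      <rel name="elaboration" type="rst"/>
    </relations>
  </header>
  <body>
    <segment id="1" parent="4" relname="span">The committee met on Monday,</segment>
    <segment id="2" parent="1" relname="elaboration">  and it approved
      the budget.</segment>
    <segment id="3">   </segment>
    <group id="4" type="span"/>
  </body>
</rst>"""


def extract_edus(root):
    """Return (id, text) pairs for every non-empty <segment> element."""
    edus = []
    for segment in root.iter("segment"):
        # Collapse line breaks and repeated spaces inside the segment text.
        text = " ".join((segment.text or "").split())
        if text:
            edus.append((segment.get("id"), text))
    return edus


sample_edus = extract_edus(ET.fromstring(SAMPLE_RS3))
for edu_id, text in sample_edus:
    print(edu_id, text)
```

Keeping the `id` attribute may help later if the parses need to be linked back to nodes in the RST tree.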

## 4. Dependency Parsing for Each EDU

In [None]:
# Visualize the dependency tree for each EDU (in Jupyter only)
for idx, edu in enumerate(edus):
    print(f"EDU {idx+1}: {edu}")
    doc = nlp_en(edu)
    displacy.render(doc, style="dep", jupyter=True)
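
Outside Jupyter, or when there are many EDUs, a plain-text view of the parse can be easier to scan than rendered trees. The sketch below lists each token with its dependency label and head. The helper names `dependency_rows` and `print_dependency_table` are illustrative, not part of spaCy; they rely only on the `text`, `dep_` and `head` attributes of spaCy tokens.

```python
def dependency_rows(doc):
    """List (token, relation, head) triples for a parsed spaCy Doc.

    Relies only on the Token attributes text, dep_ and head.
    """
    return [(token.text, token.dep_, token.head.text) for token in doc]


def print_dependency_table(doc):
    """Print one aligned row per token: token, relation, head."""
    for text, dep, head in dependency_rows(doc):
        print(f"{text:<15}{dep:<12}{head}")


# Usage inside the notebook, once nlp_en and edus exist:
# print_dependency_table(nlp_en(edus[0]))
```

In spaCy the root token is its own head and carries the label `ROOT`, so it shows up as a row whose first and last columns match.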

## 5. Group EDUs into Full Sentences

In [None]:
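def group_edus_to_sentences(edus):
    """Join consecutive EDUs until one ends with sentence-final punctuation."""
    sentences = []
    buffer = []
    for edu in edus:
        buffer.append(edu)
        # Check for sentence-final punctuation, ignoring trailing quotes/brackets
        if edu.rstrip().rstrip("\"')]").endswith(('.', '!', '?')):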
            sentences.append(" ".join(buffer))
            buffer = []
    if buffer:
        sentences.append(" ".join(buffer))  # Add any remaining as a final sentence
    return sentences

sentences = group_edus_to_sentences(edus)
print(f"Total full sentences: {len(sentences)}")
for i, s in enumerate(sentences[:3]):
    print(f"Sentence {i+1}: {s}\n")
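
When comparing EDU parses with sentence parses, it can help to know which EDUs ended up in which sentence. The following sketch is a variant of the punctuation heuristic that returns EDU positions instead of joined strings. The name `group_edu_indices` and the `closers` parameter are assumptions made for this example; the variant also skips trailing quotes and brackets before testing for `.`, `!` or `?`.

```python
def group_edu_indices(edus, closers="\"')]"):
    """Group EDU positions into sentences with a punctuation heuristic."""
    groups, current = [], []
    for index, edu in enumerate(edus):
        current.append(index)
        # Ignore trailing quotes/brackets before testing for ., ! or ?
        if edu.rstrip().rstrip(closers).endswith((".", "!", "?")):
            groups.append(current)
            current = []
    if current:
        groups.append(current)  # leftover EDUs form a final group
    return groups


# Toy input written for this example
toy_edus = [
    "Although it was raining,",
    "the match went ahead.",
    'She said, "We had no choice."',
    "The crowd stayed",
]
print(group_edu_indices(toy_edus))  # [[0, 1], [2], [3]]
```

Like any punctuation-based rule, this will break early on abbreviations such as "Dr." at the end of an EDU, so the groups are worth spot-checking.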

## 6. Dependency Parsing for Whole Sentences

In [None]:
# Visualize the dependency tree for each full sentence (in Jupyter only)
for idx, sent in enumerate(sentences):
    print(f"Full Sentence {idx+1}: {sent}")
    doc = nlp_en(sent)
    displacy.render(doc, style="dep", jupyter=True)
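
A joined "sentence" is only as good as the grouping heuristic. One rough check is to count the tokens labelled `ROOT` in each parse: more than one root suggests the parser saw several sentences where the grouping produced one. The sketch below uses the hypothetical helper names `root_tokens` and `flag_multi_root`, and takes the parser as a parameter so it is not tied to a particular model.

```python
def root_tokens(doc):
    """Return the text of every token labelled ROOT in a parsed Doc."""
    return [token.text for token in doc if token.dep_ == "ROOT"]


def flag_multi_root(texts, parse):
    """Yield (index, roots) for texts whose parse has more than one root.

    `parse` is any callable that maps a string to a parsed Doc, e.g. nlp_en.
    """
    for index, text in enumerate(texts):
        roots = root_tokens(parse(text))
        if len(roots) > 1:
            yield index, roots


# Usage inside the notebook:
# for index, roots in flag_multi_root(sentences, nlp_en):
#     print(f"Sentence {index + 1} has several roots: {roots}")
```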

## 7. (Optional) Save Visualizations as SVG

In [None]:
output_dir = Path("english_parse_svgs")
output_dir.mkdir(exist_ok=True)
# Save for EDUs
for idx, edu in enumerate(edus):
    doc = nlp_en(edu)
    svg = displacy.render(doc, style="dep", jupyter=False)
    (output_dir / f"edu_{idx+1}_dep.svg").write_text(svg, encoding="utf-8")
# Save for full sentences
for idx, sent in enumerate(sentences):
    doc = nlp_en(sent)
    svg = displacy.render(doc, style="dep", jupyter=False)
    (output_dir / f"sentence_{idx+1}_dep.svg").write_text(svg, encoding="utf-8")
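
The two saving loops differ only in the input list and the filename prefix, so they could be folded into one helper. This is a sketch rather than a drop-in replacement: `save_svgs` and its parameters are hypothetical names, the renderer is passed in as a callable, and the filenames are zero-padded (for example `edu_01_dep.svg`), which differs from the naming used in the cell above.

```python
from pathlib import Path


def save_svgs(texts, render_svg, output_dir, prefix):
    """Write one SVG file per text and return the paths that were written.

    `render_svg` is any callable mapping a string to SVG markup.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    width = len(str(len(texts)))  # zero-pad so files sort in document order
    paths = []
    for number, text in enumerate(texts, start=1):
        path = output_dir / f"{prefix}_{number:0{width}d}_dep.svg"
        path.write_text(render_svg(text), encoding="utf-8")
        paths.append(path)
    return paths


# Usage inside the notebook:
# render = lambda text: displacy.render(nlp_en(text), style="dep", jupyter=False)
# save_svgs(edus, render, "english_parse_svgs", "edu")
# save_svgs(sentences, render, "english_parse_svgs", "sentence")
```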

---
*This notebook shows a complete workflow for dependency parsing of English RST data: from EDU extraction to visualization of dependency trees for both discourse units and whole sentences.*