# Dependency Parsing for English RST Example: EDUs and Whole Sentences

This notebook demonstrates how to:
- Extract Elementary Discourse Units (EDUs) from an .rs3 file (RST format)
- Parse each EDU using dependency parsing (spaCy)
- Group EDUs into whole sentences and parse them as full sentences
- Visualize the dependency trees for both EDUs and full sentences


## Requirements and Setup

**System Requirements:**
- Python 3.7+
- Internet connection (for downloading the spaCy model on first run)

**Auto-Installation:**
The installation cell in Section 1 installs spaCy and downloads the English model if either is missing, so no manual setup is normally needed.

**Manual Installation (if needed):**
```bash
pip install spacy
python -m spacy download en_core_web_sm
```

---

## 1. Install and Import Required Libraries

**First, run the installation cell below to automatically install all required packages:**

In [6]:
# Install required packages and download spaCy model
# Run this cell once to set up the environment
import subprocess
import sys

def install_package(package):
    """Install a package using pip"""
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])

# Install main packages
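try:
    import spacy
    print("✓ spaCy is already installed")
except ImportError:
    print("Installing spaCy...")
    install_package("spacy")
    import importlib
    importlib.invalidate_caches()  # let the import system see the newly installed package
    import spacy  # needed below for spacy.load
    print("✓ spaCy installed successfully")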

# Download and install English spaCy model
try:
    nlp_test = spacy.load("en_core_web_sm")
    print("✓ English spaCy model (en_core_web_sm) is already available")
except OSError:
    print("Downloading English spaCy model...")
    # Method 1: Try using spacy download command
    try:
        subprocess.check_call([sys.executable, "-m", "spacy", "download", "en_core_web_sm"])
        print("✓ English spaCy model installed successfully via spacy download")
    except subprocess.CalledProcessError:
        # Method 2: Install from direct link if method 1 fails
        print("Trying alternative installation method...")
        install_package("https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0.tar.gz")
        print("✓ English spaCy model installed successfully via direct download")

print("\nAll packages are ready to use!")

✓ spaCy is already installed
✓ English spaCy model (en_core_web_sm) is already available

All packages are ready to use!


**Now import the libraries:**

In [7]:
import spacy
from spacy import displacy
import xml.etree.ElementTree as ET
from pathlib import Path

In [8]:
import sys
print(sys.executable)


/Users/arturbegichev/University/bachelor_thesis/.venv/bin/python


## 2. Load the English spaCy Model

In [9]:
# Load the small English pipeline once and reuse it for every parse
nlp_en = spacy.load("en_core_web_sm")


## 3. Extract EDUs from the .rs3 File

In [10]:
# Path to your .rs3 file (update as needed)
rs3_path = "../en_example.rs3"
# Parse the XML structure and extract EDU segments
root = ET.parse(rs3_path).getroot()
edus = []
for segment in root.findall(".//segment"):
    edu_text = segment.text.strip() if segment.text else ""
    if edu_text:
        edus.append(edu_text)
print(f"Total EDUs extracted: {len(edus)}")
for i, e in enumerate(edus[:5]):
    print(f"EDU {i+1}: {e}")

Total EDUs extracted: 37
EDU 1: I, too, would like to thank Assistant Secretary-General for Political Affairs Oscar Fernandez-Taranco for his briefing on the developments in Ukraine and on the involvement of the United Nations.
EDU 2: I also welcome the presence of the Permanent Representative of Ukraine at this meeting.
EDU 3: Argentina continues to follow the situation with concern, in particular in the east of Ukraine.
EDU 4: Undoubtedly, the situation has deteriorated, with further tensions and violence,
EDU 5: and we cannot remain indifferent to that.
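
The loop above keeps only the segment text. An `.rs3` file typically also stores an `id`, `parent`, and `relname` attribute on each segment, which are needed to relate a parsed EDU back to the discourse tree. The following is a minimal sketch under that assumption; `SAMPLE_RS3` and `extract_edu_records` are hypothetical names, and the inline XML is an illustrative stand-in rather than the contents of `en_example.rs3`.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-segment document in the usual .rs3 layout (not taken from en_example.rs3).
SAMPLE_RS3 = """<rst>
  <header>
    <relations>
      <rel name="conjunction" type="multinuc"/>
    </relations>
  </header>
  <body>
    <segment id="1" parent="3" relname="conjunction">Undoubtedly, the situation has deteriorated,</segment>
    <segment id="2" parent="3" relname="conjunction">and we cannot remain indifferent to that.</segment>
    <group id="3" type="multinuc"/>
  </body>
</rst>"""

def extract_edu_records(root):
    """Collect id, parent, relation name, and text for every non-empty segment."""
    records = []
    for segment in root.iter("segment"):
        text = "".join(segment.itertext()).strip()
        if not text:
            continue  # skip empty segments, as the notebook cell does
        records.append({
            "id": segment.get("id"),
            "parent": segment.get("parent"),
            "relname": segment.get("relname"),
            "text": text,
        })
    return records

records = extract_edu_records(ET.fromstring(SAMPLE_RS3))
for record in records:
    print(record["id"], record["relname"], record["text"])
```

For the real file, the same function could be called on `ET.parse(rs3_path).getroot()`.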


## 4. Dependency Parsing for Each EDU

In [11]:
# Visualize the dependency tree for the first 5 EDUs (in Jupyter only)
for idx, edu in enumerate(edus[:5]):
    print(f"EDU {idx+1}: {edu}")
    doc = nlp_en(edu)
    displacy.render(doc, style="dep", jupyter=True)

# To render ALL EDUs instead, uncomment the loop below:
# for idx, edu in enumerate(edus):
#     print(f"EDU {idx+1}: {edu}")
#     doc = nlp_en(edu)
#     displacy.render(doc, style="dep", jupyter=True)


EDU 1: I, too, would like to thank Assistant Secretary-General for Political Affairs Oscar Fernandez-Taranco for his briefing on the developments in Ukraine and on the involvement of the United Nations.


EDU 2: I also welcome the presence of the Permanent Representative of Ukraine at this meeting.


EDU 3: Argentina continues to follow the situation with concern, in particular in the east of Ukraine.


EDU 4: Undoubtedly, the situation has deteriorated, with further tensions and violence,


EDU 5: and we cannot remain indifferent to that.

## 5. Group EDUs into Full Sentences

In [12]:
def group_edus_to_sentences(edus):
    """Concatenate consecutive EDUs until one ends with '.', '!' or '?'."""
    sentences = []
    buffer = []
    for edu in edus:
        buffer.append(edu)
        # Check for sentence-final punctuation
        if edu.strip().endswith(('.', '!', '?')):
            sentences.append(" ".join(buffer))
            buffer = []
    if buffer:
        sentences.append(" ".join(buffer))  # Add any remaining as a final sentence
    return sentences

sentences = group_edus_to_sentences(edus)
print(f"Total full sentences: {len(sentences)}")
for i, s in enumerate(sentences[:3]):
    print(f"Sentence {i+1}: {s}\n")

Total full sentences: 19
Sentence 1: I, too, would like to thank Assistant Secretary-General for Political Affairs Oscar Fernandez-Taranco for his briefing on the developments in Ukraine and on the involvement of the United Nations.

Sentence 2: I also welcome the presence of the Permanent Representative of Ukraine at this meeting.

Sentence 3: Argentina continues to follow the situation with concern, in particular in the east of Ukraine.
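
`group_edus_to_sentences` treats an EDU as sentence-final only when its last character is `.`, `!`, or `?`. An EDU that closes with a quotation mark or bracket after the punctuation would therefore be merged into the following sentence. One possible variant, sketched here with the hypothetical names `CLOSERS`, `ends_sentence`, and `group_edus_tolerant`, strips such closing characters before the check. The sample EDUs are invented for illustration.

```python
# Closing characters that may follow sentence-final punctuation
# (an assumed, non-exhaustive set).
CLOSERS = "\"')]}\u201d\u2019\u00bb"

def ends_sentence(edu):
    """Return True if the EDU ends a sentence, ignoring trailing quotes and brackets."""
    core = edu.strip().rstrip(CLOSERS)
    return core.endswith((".", "!", "?"))

def group_edus_tolerant(edus):
    """Group EDUs into sentences using the tolerant boundary check."""
    sentences, buffer = [], []
    for edu in edus:
        buffer.append(edu.strip())
        if ends_sentence(edu):
            sentences.append(" ".join(buffer))
            buffer = []
    if buffer:
        sentences.append(" ".join(buffer))  # keep any unterminated remainder
    return sentences

sample_edus = ['He said:', '"We must act."', 'Then he left.']
grouped = group_edus_tolerant(sample_edus)
print(grouped)
```

On text without trailing quotes or brackets, such as the first EDUs shown above, the check behaves like the original one.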



## 6. Dependency Parsing for Whole Sentences

In [13]:
for idx, sent in enumerate(sentences):
    print(f"Full Sentence {idx+1}: {sent}")
    doc = nlp_en(sent)
    displacy.render(doc, style="dep", jupyter=True)

Full Sentence 1: I, too, would like to thank Assistant Secretary-General for Political Affairs Oscar Fernandez-Taranco for his briefing on the developments in Ukraine and on the involvement of the United Nations.


Full Sentence 2: I also welcome the presence of the Permanent Representative of Ukraine at this meeting.


Full Sentence 3: Argentina continues to follow the situation with concern, in particular in the east of Ukraine.


Full Sentence 4: Undoubtedly, the situation has deteriorated, with further tensions and violence, and we cannot remain indifferent to that.


Full Sentence 5: We are deeply concerned that the tension and violence may continue and indeed worsen.


Full Sentence 6: We are also concerned about the consequences for Ukraine and the region.


Full Sentence 7: The delegation of Argentina reiterates that it is essential that we adhere to the principles that we subscribe to as Members of the United Nations.


Full Sentence 8: In particular, we must remember our commitment to non-interference in the internal affairs of States, whether militarily, politically or economically.


Full Sentence 9: In that respect, we believe that the action of any State or international organization must duly respect Ukraine's handling of its own affairs.


Full Sentence 10: As we have expressed on several occasions, the situation will not be resolved through any kind of unilateral action.


Full Sentence 11: It is crucial that the use of force be avoided.


Full Sentence 12: We join the Secretary-General today in calling on all parties to endeavour to bring calm to the situation.


Full Sentence 13: Maximal self-restraint must be shown, and constructive dialogue sought and established on an urgent basis in order to de-escalate the situation and address the parties' differences.


Full Sentence 14: In conclusion, Argentina will continue to promote dialogue and a peaceful settlement of the crisis.


Full Sentence 15: It is essential to step up efforts to create the conditions necessary for urgent dialogue to begin so that solutions can be found to the parties' differences and the interests of all minorities are taken into account.


Full Sentence 16: The multilateral meeting among the United States, the Russian Federation and the European Union that is planned for Thursday, 17 April will present an opportunity to do so.


Full Sentence 17: We hope that those main actors will be able to achieve the agreement necessary.


Full Sentence 18: We reiterate that the international community should concentrate its efforts on establishing dialogue.


Full Sentence 19: We stress that Ukrainians must have a democratic and peaceful outcome to the situation.
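
`displacy.render(..., jupyter=True)` only produces visible output inside a notebook. For scripts or logs, a parse can instead be listed as (token, relation, head) rows using the token attributes `text`, `dep_`, and `head`. The sketch below uses a hypothetical helper, `dependency_rows`, and hand-built stand-in tokens so that it runs without a model; the labels on the stand-ins are illustrative, not actual parser output.

```python
from types import SimpleNamespace

def dependency_rows(doc):
    """Return one (token, relation, head) tuple per token in the parsed document."""
    return [(token.text, token.dep_, token.head.text) for token in doc]

# Stand-in tokens that mimic the spaCy attributes used above (illustrative labels only).
root = SimpleNamespace(text="welcome", dep_="ROOT")
root.head = root  # in spaCy, the root token is its own head
subject = SimpleNamespace(text="I", dep_="nsubj", head=root)
adverb = SimpleNamespace(text="also", dep_="advmod", head=root)

rows = dependency_rows([subject, adverb, root])
for text, dep, head in rows:
    print(f"{text:<10}{dep:<10}{head}")
```

With the notebook's own objects, the call would be `dependency_rows(nlp_en(sentences[1]))`.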


## 7. (Optional) Save Visualizations as SVG

In [14]:
output_dir = Path("english_parse_svgs")
output_dir.mkdir(exist_ok=True)
# Save for EDUs
for idx, edu in enumerate(edus):
    doc = nlp_en(edu)
    svg = displacy.render(doc, style="dep", jupyter=False)
    (output_dir / f"edu_{idx+1}_dep.svg").write_text(svg, encoding="utf-8")
# Save for full sentences
for idx, sent in enumerate(sentences):
    doc = nlp_en(sent)
    svg = displacy.render(doc, style="dep", jupyter=False)
    (output_dir / f"sentence_{idx+1}_dep.svg").write_text(svg, encoding="utf-8")
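
With 37 EDUs, a name such as `edu_10_dep.svg` sorts before `edu_2_dep.svg` in a plain alphabetical listing. If document order matters when browsing the output folder, one option is to zero-pad the index. This is a sketch with a hypothetical helper, `svg_path`; it only builds paths and writes nothing.

```python
from pathlib import Path

def svg_path(output_dir, kind, index, total):
    """Build a zero-padded SVG path so that names sort in document order."""
    width = len(str(total))  # e.g. 2 digits for 37 items
    return Path(output_dir) / f"{kind}_{index:0{width}d}_dep.svg"

names = [svg_path("english_parse_svgs", "edu", i, 37).name for i in (1, 2, 10, 37)]
print(names)
print(sorted(names) == names)
```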

---
*This notebook covers the complete workflow for dependency parsing of an English RST-annotated text: from EDU extraction to visualizing dependency trees for both discourse units and whole sentences.*