In [2]:
# Testing the AddEdge Class in Stanford CoreNLP's Ssurgeon Library
# Stanford CoreNLP is a suite of natural language processing tools that provide functionalities such as tokenization, part-of-speech tagging, parsing, and more.
# Ssurgeon (Semantic Surgeon) is a component within Stanford CoreNLP that allows users to perform complex manipulations on the dependency graphs generated by the parser.
# AddEdge is a class within Ssurgeon designed to add a new grammatical relation (edge) between two nodes (words) in a dependency graph.

In [1]:
# Install Java
!apt-get update
!apt-get install -y openjdk-11-jdk-headless
!java -version

# Download Stanford CoreNLP 4.5.5
!wget https://nlp.stanford.edu/software/stanford-corenlp-4.5.5.zip
!unzip stanford-corenlp-4.5.5.zip

Hit:1 http://archive.ubuntu.com/ubuntu focal InRelease
Get:2 http://archive.ubuntu.com/ubuntu focal-updates InRelease [128 kB]        
Get:3 http://security.ubuntu.com/ubuntu focal-security InRelease [128 kB]      
Get:4 http://archive.ubuntu.com/ubuntu focal-backports InRelease [128 kB]      
Get:5 https://packages.cloud.google.com/apt gcsfuse-focal InRelease [1227 B]
Get:6 https://packages.cloud.google.com/apt cloud-sdk InRelease [1618 B]
Get:7 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [4069 kB]
Get:8 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [1274 kB]
Get:9 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [4090 kB]
Get:10 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [1566 kB]
Get:11 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [4532 kB]
Get:12 http://archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [4242 kB]
Get:13 https://packages.clou

In [3]:
!wget http://nlp.stanford.edu/software/stanford-corenlp-4.5.5-models-english.jar

--2024-10-17 15:07:39--  http://nlp.stanford.edu/software/stanford-corenlp-4.5.5-models-english.jar
Resolving nlp.stanford.edu (nlp.stanford.edu)... 171.64.67.140
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://nlp.stanford.edu/software/stanford-corenlp-4.5.5-models-english.jar [following]
--2024-10-17 15:07:39--  https://nlp.stanford.edu/software/stanford-corenlp-4.5.5-models-english.jar
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://downloads.cs.stanford.edu/nlp/software/stanford-corenlp-4.5.5-models-english.jar [following]
--2024-10-17 15:07:40--  https://downloads.cs.stanford.edu/nlp/software/stanford-corenlp-4.5.5-models-english.jar
Resolving downloads.cs.stanford.edu (downloads.cs.stanford.edu)... 171.64.64.22
Connecting to downloads.cs.stanford.edu (downloads.cs.stanford.edu)|

In [6]:
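# `!export` runs in a temporary subshell and does not persist between cells, so set the variable on the kernel with %env.
%env CLASSPATH=/kaggle/working/stanford-corenlp-4.5.5/*:/kaggle/working/stanford-corenlp-4.5.5-models-english.jar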
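
In [None]:
# The classpath entries can also be assembled in Python so that one string is reused for both javac and java.
# This is a minimal sketch: the helper name build_classpath and the variable CORENLP_DIR are hypothetical, and the paths are the ones used in this notebook.

```python
import os


def build_classpath(*entries):
    """Join classpath entries with the platform separator, skipping empty ones."""
    return os.pathsep.join(entry for entry in entries if entry)


# Paths as used elsewhere in this notebook (assumed to exist on the Kaggle machine).
CORENLP_DIR = "/kaggle/working/stanford-corenlp-4.5.5"
classpath = build_classpath(
    ".",                                   # the compiled test class
    CORENLP_DIR + "/*",                    # every jar in the unzipped distribution
    CORENLP_DIR + "-models-english.jar",   # the separately downloaded models jar
)
print(classpath)
```

# The resulting string can be passed as the value of -cp in a subprocess call.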


In [None]:
# Add a New Grammatical Relation (obj): Introduce an object relation between the verb "sleeps" and the noun "mat" in the sentence "The cat sleeps on the mat."
# Observe the Impact on the Dependency Graph: Examine how adding this new relation affects the existing grammatical relations, particularly the obl:on (oblique modifier) relation.


In [None]:
# Semgrex pattern:
# {word:sleeps}=verb: Matches the word "sleeps" and labels it as verb.
# >nsubj {word:cat}=subject: Indicates that "sleeps" has a nominal subject (nsubj) relation to "cat," which is labeled as subject.
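# >/(nmod|obl):.*/: Matches an outgoing edge whose relation name is nmod (nominal modifier) or obl (oblique modifier) followed by a colon and any subtype, such as obl:on. A bare nmod or obl without a subtype does not match.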
# ({word:mat}=object): Captures the word "mat" and labels it as object.
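
In [None]:
# To see which relation names the regular expression in the pattern accepts, it can be tried on a few labels in plain Python.
# This sketch only exercises the regex itself. It assumes the expression is applied to the whole relation name, and it does not call Semgrex. The helper name relation_matches is hypothetical.

```python
import re

# The relation regex from the Semgrex pattern, without the surrounding slashes.
RELATION_REGEX = re.compile(r"(nmod|obl):.*")


def relation_matches(name):
    """Return True if the whole relation name matches the regex."""
    return RELATION_REGEX.fullmatch(name) is not None


for name in ["obl:on", "nmod:of", "obl", "nsubj", "obj"]:
    print(name, relation_matches(name))
```

# Only the subtyped labels (obl:on, nmod:of) are accepted. The bare obl label lacks the colon, so it is rejected.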


In [7]:
java_code = """

// Import dependencies
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.semgraph.semgrex.*;
import edu.stanford.nlp.semgraph.semgrex.ssurgeon.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.util.*;
import java.util.*;

public class SsurgeonAddEdgeTest {
    public static void main(String[] args) {
        // Set up the NLP pipeline
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,parse");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        
        // Adding an edge 
        runTest(pipeline, "SsurgeonAddEdgeTest - New Node",
                "The cat sleeps on the mat.",
                // Semgrex Pattern
                "{word:sleeps}=verb >nsubj {word:cat}=subject >/(nmod|obl):.*/ ({word:mat}=object)",
                (pattern) -> {
                    AddEdge addObjEdge = AddEdge.createEngAddEdge("verb", "object", "obj");
                    pattern.addEdit(addObjEdge);
                    return pattern;
                });               
    }
                
    private static void runTest(StanfordCoreNLP pipeline, String testName, String text, String semgrexPattern, java.util.function.Function<SsurgeonPattern, SsurgeonPattern> patternModifier) {
        System.out.println("Testing " + testName + ":");
        SemanticGraph graph = annotateAndGetGraph(pipeline, text);
        System.out.println("Original graph:");
        System.out.println(graph.toFormattedString());

        try {
            SemgrexPattern semgrexPat = SemgrexPattern.compile(semgrexPattern);
            System.out.println("Semgrex pattern: " + semgrexPattern);

            SemgrexMatcher matcher = semgrexPat.matcher(graph);
            if (matcher.find()) {
                System.out.println("Pattern matched.");
                System.out.println("Matched nodes:");
                for (String nodeName : matcher.getNodeNames()) {
                    System.out.println("  " + nodeName + ": " + matcher.getNode(nodeName));
                }
                SsurgeonPattern surgeonPattern = new SsurgeonPattern(semgrexPat);
                surgeonPattern = patternModifier.apply(surgeonPattern);

                Collection<SemanticGraph> result = Ssurgeon.inst().exhaustFromPatterns(Collections.singletonList(surgeonPattern), graph);
                if (result != null && !result.isEmpty()) {
                    System.out.println("Graph after operations:");
                    System.out.println(result.iterator().next().toFormattedString());
                } else {
                    System.out.println("No changes were made to the graph.");
                }
            } else {
                System.out.println("No matches for the Semgrex pattern.");
                System.out.println("Graph nodes:");
                for (IndexedWord node : graph.vertexSet()) {
                    System.out.println("  " + node.word() + "/" + node.tag());
                }
            }
        } catch (Exception e) {
            System.out.println("Error during " + testName + " operations: " + e.getMessage());
            e.printStackTrace();
        }
        System.out.println();
    }

    private static SemanticGraph annotateAndGetGraph(StanfordCoreNLP pipeline, String text) {
        Annotation document = new Annotation(text);
        pipeline.annotate(document);
        List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
        return sentences.get(0).get(SemanticGraphCoreAnnotations.EnhancedPlusPlusDependenciesAnnotation.class);
    }
}
   
"""

import subprocess

with open('SsurgeonAddEdgeTest.java', 'w') as f:
    f.write(java_code)

# Compile the Java code
compile_command = ["javac", "-encoding", "UTF-8", "-cp", ".:/kaggle/working/stanford-corenlp-4.5.5/*", "SsurgeonAddEdgeTest.java"]
compile_result = subprocess.run(compile_command, capture_output=True, text=True)

if compile_result.returncode == 0:
    print("Compilation successful")
    
    # Run the Java program
    run_command = ["java", "-cp", ".:/kaggle/working/stanford-corenlp-4.5.5/*", "SsurgeonAddEdgeTest"]
    run_result = subprocess.run(run_command, capture_output=True, text=True)
    
    print("Program output:")
    print(run_result.stdout)
    
    if run_result.stderr:
        print("Errors or warnings:")
        print(run_result.stderr)
else:
    print("Compilation failed:")
    print(compile_result.stderr)

Compilation successful
Program output:
Testing SsurgeonAddEdgeTest - New Node:
Original graph:
[sleeps/VBZ
  nsubj>[cat/NN det>The/DT]
  obl:on>[mat/NN case>on/IN det>the/DT]
  punct>./.]
Semgrex pattern: {word:sleeps}=verb >nsubj {word:cat}=subject >/(nmod|obl):.*/ ({word:mat}=object)
Pattern matched.
Matched nodes:
  verb: sleeps/VBZ
  subject: cat/NN
  object: mat/NN
Graph after operations:
[sleeps/VBZ
  nsubj>[cat/NN det>The/DT]
  obj>[mat/NN case>on/IN det>the/DT]
  obl:on>
  punct>./.]


[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[main] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words-distsim.tagger ... done [1.2 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[main] INFO e

In [None]:
# Output

# Original Dependency Graph:
# sleeps/VBZ: The verb "sleeps" is the root of the sentence.
# nsubj>[cat/NN det>The/DT]: "cat" is the nominal subject of "sleeps," with "The" as its determiner.
# obl:on>[mat/NN case>on/IN det>the/DT]: "mat" is connected to "sleeps" via an oblique modifier introduced by the preposition "on," with "the" as its determiner.
# punct>./.: The period marks the end of the sentence.

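# Final Dependency Graph After AddEdge:
# Added obj Relation: "mat" is now directly connected to "sleeps" as its object.
# Empty obl:on> Relation: The original obl:on> label is still printed, but with no dependent after it. The formatted output alone does not show whether the edge lost its dependent or whether the printer simply did not repeat "mat" after already printing it under obj. Listing the edges of the result graph directly (for example with edgeListSorted()) would distinguish the two cases.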
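
In [None]:
# One possible reading of the empty obl:on> line is that a tree-style printer expands each node only once.
# The toy model below is an assumption for illustration, not CoreNLP's formatter. The function format_graph and the edge list are hypothetical and are modeled on the sentence used in this notebook.

```python
def format_graph(edges, root):
    """Print a labeled multigraph as an indented tree, expanding each node only once."""
    visited = set()
    lines = [root]

    def visit(node, depth):
        visited.add(node)
        outgoing = sorted((rel, dep) for gov, rel, dep in edges if gov == node)
        for rel, dep in outgoing:
            if dep in visited:
                # Dependent already printed under an earlier edge: show the label only.
                lines.append("  " * depth + rel + ">")
            else:
                lines.append("  " * depth + rel + ">" + dep)
                visit(dep, depth + 1)

    visit(root, 1)
    return "\n".join(lines)


edges = [
    ("sleeps", "nsubj", "cat"),
    ("cat", "det", "The"),
    ("sleeps", "obl:on", "mat"),
    ("mat", "case", "on"),
    ("mat", "det", "the"),
    ("sleeps", "punct", "."),
]

# Add a second edge between the same pair of nodes; the first edge is kept.
edges_after = edges + [("sleeps", "obj", "mat")]

print(format_graph(edges_after, "sleeps"))
print(("sleeps", "obl:on", "mat") in edges_after)
```

# In this toy model the obl:on edge still points to "mat" in the edge list, yet its line prints without a dependent because "mat" was already expanded under obj. Whether the same applies to the real graph has to be checked on the graph's edges.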

In [None]:
# Key Observations:
# Pattern Matching: The Semgrex pattern matches the intended nodes: "sleeps" (verb), "cat" (subject), and "mat" (object).
# Intended Function: The AddEdge class is supposed to add a new grammatical relation without altering existing ones unless necessary.
# Observed Behavior: AddEdge adds the obj relation between "sleeps" and "mat." In the printed graph, the existing obl:on> relation is then shown with no dependent, so it appears to no longer point to "mat."
# Repeated Passes: Ssurgeon processes the pattern multiple times (depths 1, 2, 3), but no further changes are made after the initial edit.
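
In [None]:
# The observations above can be checked mechanically against the two printed graphs.
# This is a small sketch that works only on the formatted text copied from the program output. It assumes each edge under the root is printed on its own line, as it is here, and the helper name top_level_relations is hypothetical.

```python
def top_level_relations(formatted):
    """Return (relation, has_dependent) for each edge line under the root."""
    relations = []
    for line in formatted.splitlines()[1:]:  # skip the root line
        label, sep, rest = line.strip().partition(">")
        if sep:
            relations.append((label, bool(rest.strip("[]"))))
    return relations


before = """[sleeps/VBZ
  nsubj>[cat/NN det>The/DT]
  obl:on>[mat/NN case>on/IN det>the/DT]
  punct>./.]"""

after = """[sleeps/VBZ
  nsubj>[cat/NN det>The/DT]
  obj>[mat/NN case>on/IN det>the/DT]
  obl:on>
  punct>./.]"""

before_rels = top_level_relations(before)
after_rels = top_level_relations(after)

# Relations that appear only after the edit, and relations printed with nothing after them.
added = [rel for rel, _ in after_rels if rel not in dict(before_rels)]
without_dependent = [rel for rel, has_dep in after_rels if not has_dep]

print("added:", added)
print("printed without a dependent:", without_dependent)
```

# This compares only what is printed. It reports that obj was added and that obl:on is printed without a dependent, and it does not inspect the underlying graph.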

In [None]:
# Inferences from the Output
# Edge Overwriting: Adding a new relation (obj) might conflict with the existing relation (obl:on), leading to the latter being emptied.
# Multiplicity of Relations: The dependency graph allows multiple relations between the same governor and dependent. However, the AddEdge operation seems to mishandle this by not preserving existing relations when a new one is added.
# Graph Consistency Maintenance: Ssurgeon may be attempting to maintain graph consistency by limiting the number of relations between nodes, inadvertently removing existing relations when new ones are added.
# Redundancy: Having both obj and obl:on relations between "sleeps" and "mat" could be redundant or semantically conflicting, especially if one is intended to replace the other.
# Incomplete Edits: The empty obl:on> suggests that the edit operation may not have fully accounted for all existing relations, leaving an incomplete or inconsistent dependency graph.

