<a href="https://colab.research.google.com/github/Supun1234/Thesis/blob/main/End2End.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [7]:
# ==============================================================================
# I. SETUP: INSTALL NECESSARY LIBRARIES
# ==============================================================================
!pip install transformers torch spacy pandas -q
!pip install spacy-transformers -q
!python -m spacy download en_core_web_trf -q

# ==============================================================================
# II. IMPORT LIBRARIES AND LOAD MODELS
# ==============================================================================
import spacy
import torch
import pandas as pd
from transformers import pipeline, AutoTokenizer, AutoModelForTokenClassification

# --- Hugging Face Model (dslim/bert-base-NER) ---
hf_tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER")
hf_model = AutoModelForTokenClassification.from_pretrained("dslim/bert-base-NER")
hf_ner_pipeline = pipeline("ner", model=hf_model, tokenizer=hf_tokenizer, aggregation_strategy="simple")

# --- spaCy Model (with Transformer Pipeline) ---
spacy_nlp = spacy.load("en_core_web_trf")

# ==============================================================================
# III. CORE FUNCTIONS OF THE REQUIREMENT EXTRACTION PIPELINE
# ==============================================================================

def preprocess_text(text):
    """Simple text preprocessing function."""
    return text.strip()

def extract_agr_from_huggingface(sentence):
    """Extracts an Actor-Goal-Rationale (AGR) triplet from a sentence using Hugging Face NER."""
    ner_results = hf_ner_pipeline(sentence)
    actor = {"text": None, "confidence": 0.0}
    goal = {"text": None, "confidence": 0.0}
    rationale = {"text": None, "confidence": 0.0}

    # Heuristic: First PER or ORG is the Actor
    for entity in ner_results:
        if entity['entity_group'] in ['PER', 'ORG']:
            actor["text"] = entity['word']
            actor["confidence"] = float(entity['score'])
            break

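    # Fallback for generic actors if NER fails.
    # Match whole words so that 'user' does not fire inside an unrelated longer word.
    if not actor["text"]:
        generic_actors = ['user', 'users', 'admin', 'customer', 'system', 'application']
        words = {w.strip('.,;:!?()"\'') for w in sentence.lower().split()}
        for act in generic_actors:
            if act in words:
                actor["text"] = act
                actor["confidence"] = 0.80
                break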

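    # Rationale & Goal extraction: split at the first keyword found.
    # Keywords are tried in list order, and the bare 'to' also matches
    # prepositional uses such as "access to files".
    rationale_keywords = ['so that', 'in order to', 'to', 'without']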
    text_to_split = sentence

    found_rationale = False
    for keyword in rationale_keywords:
        if f" {keyword} " in text_to_split:
            parts = text_to_split.split(f" {keyword} ", 1)
            goal["text"] = parts[0].strip()
            rationale["text"] = (keyword + " " + parts[1]).strip()
            goal["confidence"] = 0.90
            rationale["confidence"] = 0.90
            found_rationale = True
            break

    if not found_rationale:
        goal["text"] = text_to_split
        goal["confidence"] = 0.85

    return {"Actor": actor, "Goal": goal, "Rationale": rationale}

def extract_agr_from_spacy(sentence):
    """Extracts an AGR triplet using spaCy NER, falling back to the grammatical subject for the actor."""
    doc = spacy_nlp(sentence)
    actor = {"text": None, "confidence": 0.0}
    goal = {"text": None, "confidence": 0.0}
    rationale = {"text": None, "confidence": 0.0}

    # 1. Actor Extraction (NER with grammatical subject fallback)
    for ent in doc.ents:
        if ent.label_ in ["PERSON", "ORG"]:
            actor["text"] = ent.text
            actor["confidence"] = 0.95
            break
    if not actor["text"]:
        for token in doc:
            if "nsubj" in token.dep_:
                subject_phrase = ' '.join([t.text for t in token.subtree])
                actor["text"] = subject_phrase
                actor["confidence"] = 0.90
                break

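    # 2. Goal & Rationale Extraction (same keyword split as the Hugging Face
    #    extractor: keywords are tried in list order, matching is case-sensitive)
    rationale_keywords = ['so that', 'in order to', 'to', 'without']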
    rationale_start_index = -1
    for keyword in rationale_keywords:
        if f" {keyword} " in sentence:
            rationale_start_index = sentence.find(f" {keyword} ")
            break

    if rationale_start_index != -1:
        goal["text"] = sentence[:rationale_start_index].strip()
        rationale["text"] = sentence[rationale_start_index:].strip()
        goal["confidence"] = 0.95
        rationale["confidence"] = 0.95
    else:
        goal["text"] = sentence
        goal["confidence"] = 0.90

    # 3. Refine Goal text by removing the actor and a leading modal verb
    if actor["text"] and goal["text"] and actor["text"] in goal["text"]:
        actor_end_index = goal["text"].find(actor["text"]) + len(actor["text"])
        refined_goal_text = goal["text"][actor_end_index:].strip()
        modal_verbs = ["shall", "should", "must", "will", "can"]
        first_word = refined_goal_text.split(' ')[0] if refined_goal_text else ""
        if first_word.lower() in modal_verbs:
            # Drop only the leading modal, regardless of its capitalization
            refined_goal_text = refined_goal_text[len(first_word):].strip()
        goal["text"] = refined_goal_text

    return {"Actor": actor, "Goal": goal, "Rationale": rationale}

def merge_agr_triplets(hf_agr, spacy_agr):
    """Merges AGR triplets slot by slot, preferring the spaCy result when it has text."""
    def choose_best(slot_name):
        hf_slot, spacy_slot = hf_agr[slot_name], spacy_agr[slot_name]
        if spacy_slot["text"]:
            return spacy_slot
        if hf_slot["text"]:
            return hf_slot
        return spacy_slot  # both empty: return the empty spaCy slot

    return {name: choose_best(name) for name in ("Actor", "Goal", "Rationale")}

def evaluate_completeness_and_confidence(merged_agr):
    """Returns (completeness, confidence): the fraction of filled slots and the
    unweighted mean of the three slot confidences (empty slots count as 0)."""
    filled_slots = sum(1 for slot in merged_agr.values() if slot["text"])
    completeness = filled_slots / 3.0
    c_actor = merged_agr["Actor"]["confidence"] if merged_agr["Actor"]["text"] else 0
    c_goal = merged_agr["Goal"]["confidence"] if merged_agr["Goal"]["text"] else 0
    c_rationale = merged_agr["Rationale"]["confidence"] if merged_agr["Rationale"]["text"] else 0
    mean_confidence = (c_actor + c_goal + c_rationale) / 3.0
    return completeness, mean_confidence

# ==============================================================================
# IV. OUTPUT AND VISUALIZATION FUNCTIONS
# ==============================================================================

def display_results(final_agr, completeness, confidence):
    """Displays the final structured requirement in a table."""
    display_data = {
        'Actor': [final_agr["Actor"]["text"]],
        'Goal': [final_agr["Goal"]["text"]],
        'Rationale': [final_agr["Rationale"]["text"]],
        'Completeness': [f"{completeness:.2%}"],
        'Confidence': [f"{confidence:.2%}"]
    }
    df = pd.DataFrame(display_data)
    print("\n--- Structured Requirement ---")
    print(df.to_string(index=False))

def print_comma_separated_agr(final_agr):
    """Prints the extracted AGR components as a single comma-separated string."""
    actor_text = final_agr['Actor']['text'] or "None"
    goal_text = final_agr['Goal']['text'] or "None"
    rationale_text = final_agr['Rationale']['text'] or "None"

    # Create the comma-separated string
    agr_string = f"Actor: {actor_text}, Goal: {goal_text}, Rationale: {rationale_text}"

    print("\n--- Comma-Separated AGR Result ---")
    print(agr_string)

def print_semantic_graph(final_agr):
    """Prints a Cypher-like text representation of the semantic graph."""
    actor_node = f"({final_agr['Actor']['text'] or 'UnspecifiedActor'})"
    goal_node = f"({final_agr['Goal']['text'] or 'UnspecifiedGoal'})"
    cypher_string = f"{actor_node} -[:PERFORMS_GOAL]-> {goal_node}"
    if final_agr['Rationale']['text']:
        rationale_node = f"({final_agr['Rationale']['text']})"
        cypher_string += f" -[:WITH_CONSTRAINT_OR_PURPOSE]-> {rationale_node}"
    print("\n--- Semantic Graph (Cypher-like text) ---")
    print(cypher_string)

# ==============================================================================
# V. MAIN EXECUTION PIPELINE
# ==============================================================================

def main():
    """Main function to run the requirement extraction pipeline."""
    requirement_sentence = input("What is your requirement? ")
    processed_sentence = preprocess_text(requirement_sentence)
    print("\nRunning NER Model 1 (Hugging Face with expanded keywords)...")
    hf_agr = extract_agr_from_huggingface(processed_sentence)
    print("Running NER Model 2 (spaCy with dependency parsing)...")
    spacy_agr = extract_agr_from_spacy(processed_sentence)
    print("Merging results...")
    final_agr = merge_agr_triplets(hf_agr, spacy_agr)
    completeness, confidence = evaluate_completeness_and_confidence(final_agr)

    # --- ALL OUTPUTS ---
    display_results(final_agr, completeness, confidence) # 1. Table
    print_comma_separated_agr(final_agr)                 # 2. Comma-separated string
    print_semantic_graph(final_agr)                      # 3. Graph

# --- Run the main program ---
if __name__ == "__main__":
    main()

✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_trf')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


Some weights of the model checkpoint at dslim/bert-base-NER were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


What is your requirement? The system shall prevent unauthorized access to sensitive configuration files

Running NER Model 1 (Hugging Face with expanded keywords)...
Running NER Model 2 (spaCy with dependency parsing)...
Merging results...

--- Structured Requirement ---
     Actor                        Goal                        Rationale Completeness Confidence
The system prevent unauthorized access to sensitive configuration files      100.00%     93.33%

--- Comma-Separated AGR Result ---
Actor: The system, Goal: prevent unauthorized access, Rationale: to sensitive configuration files

--- Semantic Graph (Cypher-like text) ---
(The system) -[:PERFORMS_GOAL]-> (prevent unauthorized access) -[:WITH_CONSTRAINT_OR_PURPOSE]-> (to sensitive configuration files)
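
The Rationale in this run ("to sensitive configuration files") is a prepositional phrase attached to "access" rather than a purpose clause. It appears because the bare keyword `to` matches any ` to ` in the sentence. The following is a minimal, model-free sketch of the same split logic, so the behavior can be checked without loading spaCy or the Hugging Face pipeline. The function name `split_goal_rationale` is introduced here for illustration and is not part of the notebook.

```python
# Standalone re-creation of the keyword split used by both extractors.
# The keyword list and its order mirror the notebook's rationale_keywords.
RATIONALE_KEYWORDS = ['so that', 'in order to', 'to', 'without']

def split_goal_rationale(sentence):
    """Split at the first listed keyword that occurs in the sentence."""
    for keyword in RATIONALE_KEYWORDS:
        marker = f" {keyword} "
        if marker in sentence:
            goal, rest = sentence.split(marker, 1)
            return goal.strip(), f"{keyword} {rest}".strip()
    return sentence, None  # no keyword found: the whole sentence is the goal

goal, rationale = split_goal_rationale(
    "The system shall prevent unauthorized access to sensitive configuration files"
)
print(goal)       # The system shall prevent unauthorized access
print(rationale)  # to sensitive configuration files
```

The split happens before the actor and the modal "shall" are stripped, which is why the final Goal in the table is the shorter "prevent unauthorized access".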


In [8]:
if __name__ == "__main__":
    main()

What is your requirement? As an admin, I need to access the dashboard to monitor system performance in real time.

Running NER Model 1 (Hugging Face with expanded keywords)...
Running NER Model 2 (spaCy with dependency parsing)...
Merging results...

--- Structured Requirement ---
Actor Goal                                                           Rationale Completeness Confidence
    I need to access the dashboard to monitor system performance in real time.      100.00%     93.33%

--- Comma-Separated AGR Result ---
Actor: I, Goal: need, Rationale: to access the dashboard to monitor system performance in real time.

--- Semantic Graph (Cypher-like text) ---
(I) -[:PERFORMS_GOAL]-> (need) -[:WITH_CONSTRAINT_OR_PURPOSE]-> (to access the dashboard to monitor system performance in real time.)
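
The 93.33% confidence reported in both runs follows from constants hard-coded in the extractors rather than from model scores. Below is a minimal sketch of the arithmetic, assuming the actor came from the grammatical-subject fallback (0.90) and that the goal and rationale both took the keyword-split value (0.95), which is consistent with the figure shown. The variable names are introduced here for illustration.

```python
# Slot texts as reported in the run above.
slot_text = {
    "Actor": "I",
    "Goal": "need",
    "Rationale": "to access the dashboard to monitor system performance in real time.",
}
# Assumed per-slot constants from extract_agr_from_spacy:
# 0.90 for an actor found via the nsubj fallback,
# 0.95 for goal and rationale when a rationale keyword is found.
slot_confidence = {"Actor": 0.90, "Goal": 0.95, "Rationale": 0.95}

# Completeness: fraction of the three slots that have text.
completeness = sum(1 for text in slot_text.values() if text) / 3.0
# Confidence: unweighted mean of the three slot confidences.
confidence = sum(slot_confidence.values()) / 3.0

print(f"{completeness:.2%}")  # 100.00%
print(f"{confidence:.2%}")    # 93.33%
```

Because these constants do not depend on the sentence, this run scores the same as the first one even though "I" and "need" are weak Actor and Goal values for a user story that names "admin" as the role.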
