# Study Summaries Comparison

## Clustering

We organize both summaries into predefined themes.  

To compare the summaries, we define four themes for each one so that the content under a given theme covers roughly the same portion of the original study.

The generic themes that are easy to identify in both summaries are `Introduction`, `Methodology`, `Findings` and `Conclusion`.

For each summary we define a dictionary whose keys are these generic themes and whose values are the actual section titles used in the summary text.


In [86]:
# MySummary.txt themes
my_themes = {
    'Introduction': 'Introduction and Motivation',
    'Methodology' : 'Research Methodology',
    'Findings'    : 'Findings and Analysis',
    'Conclusion'  : 'Limitations and Future Work'}

# LLM_Summary.txt themes
llm_themes = {
    'Introduction': 'Introduction',
    'Methodology' : 'Methodology',
    'Findings'    : 'Key Findings',
    'Conclusion'  : 'Conclusion and Future Work'}

We define a function that splits a summary's text into sections based on its headings.  
The function returns a dictionary with the generic headings `Introduction`, `Methodology`, `Findings` and `Conclusion` as keys and the corresponding text as values.


In [104]:
import re

def extract_sections(text, themes):
    """
    Splits text into sections based on the provided theme headings.
    Returns a dictionary with theme as key and corresponding text as value.
    """
    sections = {}
    # get a list of the actual text headings
    text_headings = themes.values()
    # create a reverse mapping of actual text_headings -> generic headings
    generic_headings = {v:k for k,v in themes.items()}
    # Create a regex pattern that matches any of the text headings.
    # (Assuming that text headings appear at the beginning of a line)
    pattern = r'(?m)^(' + '|'.join(re.escape(heading) for heading in text_headings) + r')'
    
    # Find all matches and split text accordingly.
    splits = re.split(pattern, text)
    # re.split returns a list where headings are also part of the result.
    # The first element is any text before the first heading (if any).
    current_heading = None
    for segment in splits:
        segment = segment.strip()
        if segment in text_headings:
            current_heading = segment
            sections[generic_headings[current_heading]] = ""
        elif current_heading:
            sections[generic_headings[current_heading]] += segment + "\n"
    return sections
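
The split relies on a detail of `re.split`: when the pattern contains a capturing group, the matched headings are kept in the result list. The following is a minimal sketch of that behaviour on a hypothetical two-heading text (`sample_text` and `headings` are made up for illustration and do not come from either summary):

```python
import re

# Hypothetical text with two headings, used only for illustration.
sample_text = "Preamble\nIntroduction\nWhy it matters.\nMethodology\nHow it was done.\n"
headings = ["Introduction", "Methodology"]

# Same pattern shape as in extract_sections: headings anchored at line start.
pattern = r'(?m)^(' + '|'.join(re.escape(h) for h in headings) + r')'

# The alternation sits inside a capturing group, so re.split keeps the
# matched headings, alternating with the text found between them.
parts = re.split(pattern, sample_text)
print(parts)
```

Note that the pattern only requires a heading string to appear at the start of a line, so a body line that happens to begin with one of the headings would also trigger a split.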


Reading the summary files:

In [109]:
# Read the files
with open("MySummary.txt", "r", encoding="utf-8") as file:
    my_summary_text = file.read()

with open("LLM_Summary.txt", "r", encoding="utf-8") as file:
    llm_summary_text = file.read()


Extracting the sections for each summary.

In [110]:
# Extract sections for each summary
my_sections = extract_sections(my_summary_text, my_themes)
llm_sections = extract_sections(llm_summary_text, llm_themes)

Checking and validating the section split result:

In [111]:
# check the number of sections
print("my_sections length:", len(my_sections))
print("llm_sections length:", len(llm_sections))

my_sections length: 4
llm_sections length: 4
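
A length of 4 shows that four headings were matched, but not that each section actually contains text. A slightly stricter check could look like the sketch below; `find_split_problems` and `EXPECTED_THEMES` are hypothetical names introduced here, not part of the original notebook:

```python
# Generic themes used as dictionary keys throughout this notebook.
EXPECTED_THEMES = ("Introduction", "Methodology", "Findings", "Conclusion")

def find_split_problems(sections, expected=EXPECTED_THEMES):
    """Return a list of problems; an empty list means the split looks sane."""
    problems = []
    for theme in expected:
        if theme not in sections:
            problems.append(f"missing section: {theme}")
        elif not sections[theme].strip():
            problems.append(f"empty section: {theme}")
    return problems

# Hypothetical input: one section is empty and one heading was never found.
example = {"Introduction": "Some text", "Methodology": "  ", "Findings": "More text"}
print(find_split_problems(example))
```

Calling `find_split_problems(my_sections)` and `find_split_problems(llm_sections)` would then be expected to return empty lists if the split worked as intended.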


Visually inspecting the last section for both cases:

In [112]:
print(my_sections.keys())

dict_keys(['Introduction', 'Methodology', 'Findings', 'Conclusion'])


In [114]:
print(my_sections['Conclusion'])

While domain models can clearly highlight missing requirements, this study did not evaluate whether analysts effectively identify and correct those omissions in practice. Future research should include user studies to explore the practical effectiveness of domain models in supporting requirements validation.

Conclusion
This empirical study provides concrete evidence supporting domain models' value as effective tools for completeness checking in natural-language requirements specifications. By systematically highlighting omissions, particularly entirely missing requirements, domain models can significantly improve requirements quality, making them valuable components of requirements engineering practice.



In [115]:
print(llm_sections.keys())

dict_keys(['Introduction', 'Methodology', 'Findings', 'Conclusion'])


In [116]:
print(llm_sections['Conclusion'])

The study provides empirical evidence that domain models can help identify missing and under-specified requirements, though their effectiveness depends on how frequently concepts are referenced in the requirements. The results suggest that domain models should be complemented by other techniques for completeness checking. Future work should focus on user studies to evaluate whether analysts can effectively leverage domain models in practice.



## Diffing with Python's `difflib`

We use the built-in `difflib` module to compute the similarity ratio between corresponding sections.  
The function `compute_similarity_ratios` computes the similarity between the matching sections of the two summaries.

In [187]:
from difflib import SequenceMatcher

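def compute_similarity_ratios(sequences1, sequences2):
    """
    Computes the difflib similarity ratio for each theme.
    Returns a dictionary with theme as key and ratio (0.0 to 1.0) as value.
    """
    ratios = {}
    for theme in sequences1:
        # Strings are compared character by character
        ratio = SequenceMatcher(
            None,
            sequences1[theme],
            sequences2[theme]
        ).ratio()
        ratios[theme] = ratio
    return ratios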


Computing and displaying the similarity ratios between `my_sections` and `llm_sections`:

In [188]:
my_llm_similarity_ratios = compute_similarity_ratios(my_sections, llm_sections)
for theme in my_llm_similarity_ratios:
    print(f'{theme:12}: {my_llm_similarity_ratios[theme]:.2f}')

Introduction: 0.05
Methodology : 0.13
Findings    : 0.10
Conclusion  : 0.05


The `Introduction` and `Conclusion` sections reach a similarity ratio of only `5%`, while `Methodology` reaches `13%` and `Findings` `10%`.
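
`SequenceMatcher.ratio()` returns `2*M/T`, where `M` is the number of matched elements and `T` is the total number of elements in both sequences. When it is given strings, the elements are characters, which may partly explain the low scores for texts that paraphrase each other. The sketch below contrasts character-level and word-level comparison; `char_ratio`, `word_ratio` and the two sentences are hypothetical and only illustrate the idea:

```python
from difflib import SequenceMatcher

def char_ratio(a, b):
    """Similarity over characters, as when two strings are passed directly."""
    return SequenceMatcher(None, a, b).ratio()

def word_ratio(a, b):
    """Similarity over whitespace-separated words (an assumed alternative)."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()

# Hypothetical sentences, not taken from either summary.
s1 = "domain models help check completeness"
s2 = "domain models help detect omissions"

print(f"char-level: {char_ratio(s1, s2):.2f}")
print(f"word-level: {word_ratio(s1, s2):.2f}")
```

Both variants remain sensitive to ordering, so neither is a measure of shared meaning.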

## TF-IDF and Cosine Similarity

We convert each section into a vector representation using TF-IDF (via scikit-learn) 
and then calculate the cosine similarity between the two vectors.  
This method compares the sections by their shared vocabulary rather than by exact character sequences:

In [170]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def compute_cosine_similarities(sections1, sections2):
    cos_similarities = {}
    for theme in sections1:
        vectorizer = TfidfVectorizer()
        texts = [sections1[theme], sections2[theme]]
        tfidf_matrix = vectorizer.fit_transform(texts)
        # Use the imported cosine_similarity function from scikit-learn
        sim = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:2])
        cos_similarities[theme] = sim[0][0]
    return cos_similarities


In [176]:
my_llm_cosine_similarities = compute_cosine_similarities(my_sections, llm_sections)
for theme in my_llm_cosine_similarities:
    print(f'{theme:12}: {my_llm_cosine_similarities[theme]:.2f}')

Introduction: 0.65
Methodology : 0.59
Findings    : 0.65
Conclusion  : 0.47
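
To make the cosine step concrete, here is a simplified sketch that uses only the standard library. It is an assumption-laden stand-in, not what scikit-learn does internally: it works on raw term counts, uses its own tokenizer, and skips the IDF weighting. `term_counts` and `cosine` are hypothetical helper names:

```python
import math
import re
from collections import Counter

def term_counts(text):
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(counts_a, counts_b):
    """Cosine similarity between two term-count vectors."""
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)

# Hypothetical phrases sharing one of their two terms.
score = cosine(term_counts("domain models"), term_counts("domain knowledge"))
print(f"{score:.2f}")
```

Because only term frequencies matter, two texts that use the same words in a different order get the same score as identical texts.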


## Embedding-Based Comparison

In [183]:
from sentence_transformers import SentenceTransformer, util

def compute_semantic_similarities(sections1, sections2):
    # Load the model once instead of once per theme
    model = SentenceTransformer('all-MiniLM-L6-v2')
    sem_similarities = {}
    for theme in sections1:
        # Encode the sections passed as arguments, not the global variables
        embeddings = model.encode([sections1[theme], sections2[theme]], convert_to_tensor=True)
        # The result is a 1x1 tensor holding the cosine similarity
        sem_similarities[theme] = util.pytorch_cos_sim(embeddings[0], embeddings[1])
    return sem_similarities

Semantic similarity: 0.8764227628707886


In [184]:
my_llm_semantic_similarities = compute_semantic_similarities(my_sections, llm_sections)
for theme in my_llm_semantic_similarities:
    print(f'{theme:12}: {my_llm_semantic_similarities[theme][0][0]:.2f}')

Introduction: 0.86
Methodology : 0.88
Findings    : 0.92
Conclusion  : 0.88
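
The three metrics are easier to compare side by side. The sketch below copies the scores printed in the outputs above into dictionaries and formats them as one table; `format_score_table` is a hypothetical helper introduced only for this purpose:

```python
# Scores copied from the outputs reported in this notebook.
difflib_scores = {'Introduction': 0.05, 'Methodology': 0.13, 'Findings': 0.10, 'Conclusion': 0.05}
tfidf_scores = {'Introduction': 0.65, 'Methodology': 0.59, 'Findings': 0.65, 'Conclusion': 0.47}
embedding_scores = {'Introduction': 0.86, 'Methodology': 0.88, 'Findings': 0.92, 'Conclusion': 0.88}

def format_score_table(metrics):
    """Format {metric name: {theme: score}} as an aligned text table."""
    names = list(metrics)
    themes = list(next(iter(metrics.values())))
    rows = [f"{'Theme':12}" + "".join(f"{name:>12}" for name in names)]
    for theme in themes:
        cells = "".join(f"{metrics[name][theme]:>12.2f}" for name in names)
        rows.append(f"{theme:12}" + cells)
    return "\n".join(rows)

table = format_score_table({
    'difflib': difflib_scores,
    'TF-IDF': tfidf_scores,
    'Embedding': embedding_scores,
})
print(table)
```

For every theme the score rises from `difflib` to TF-IDF to embeddings, which would be consistent with the two summaries sharing meaning more than exact wording.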


## Keyword Extraction Using RAKE

In [197]:
from rake_nltk import Rake
# import nltk
# nltk.download('stopwords')
# nltk.download('punkt_tab')

def extract_keywords(text):
    # Initialize RAKE with NLTK's stopwords
    rake = Rake()
    rake.extract_keywords_from_text(text)
    # Ranked phrases come back highest score first; set() discards that order
    return set(rake.get_ranked_phrases())

# Example sections from each summary
my_intro = my_sections['Introduction']
llm_intro = llm_sections['Introduction']

# Extract keywords
my_keywords = extract_keywords(my_intro)
llm_keywords = extract_keywords(llm_intro)

# Compare the keyword sets
common_keywords = my_keywords.intersection(llm_keywords)
unique_to_my = my_keywords - llm_keywords
unique_to_llm = llm_keywords - my_keywords

print("Common Keywords:", common_keywords, '\n\n')
print("Unique to MySummary:", unique_to_my, '\n\n')
print("Unique to LLM_Summary:", unique_to_llm)


Common Keywords: {'completeness', 'shall', 'specified requirements', 'statements', 'checking', 'models', 'specification', 'domain models', 'requirements', 'uml class diagrams'} 


Unique to MySummary: {'external domain knowledge', 'domain model • abstract', 'using domain models', 'every important domain concept', '• models typically include domain concepts', 'feature', 'style', 'specification includes', 'system shall ...").', 'g .,', 'refers', 'external sources', 'explicitly capture domain concepts', 'ensuring', 'software requirements', 'included within', 'objects ).', 'absence', 'key concepts completeness • internal completeness', '— external completeness involves cross', 'checks', 'real', 'structured representations', 'appear', 'single function', 'implementation', 'thus', 'e', 'system design', 'constraints', 'structured', '• functional requirements', 'conditions', 'avoiding costly oversights', 'underlying hypothesis', 'specification contains', 'paper empirically investigates', 'pract