# Study Summaries Comparison

## Clustering

We are going to organize both summaries into predefined themes.  

To compare the summaries, we define 4 themes for each of them, chosen so that each theme covers roughly the same portion of the original study.

The generic themes that are easily identifiable in both summaries are `Introduction`, `Methodology`, `Findings` and `Conclusion`.

For each summary we define a dictionary with the aforementioned themes as keys; the values are the actual section titles used in the summary text.


In [86]:
# MySummary.txt themes
my_themes = {
    'Introduction': 'Introduction and Motivation',
    'Methodology' : 'Research Methodology',
    'Findings'    : 'Findings and Analysis',
    'Conclusion'  : 'Limitations and Future Work'}

# LLM_Summary.txt themes
llm_themes = {
    'Introduction': 'Introduction',
    'Methodology' : 'Methodology',
    'Findings'    : 'Key Findings',
    'Conclusion'  : 'Conclusion and Future Work'}
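
Because the comparison pairs sections by generic theme, it can help to confirm that both dictionaries use exactly the same keys. The following is a minimal sketch of such a check. The dictionaries are repeated so the snippet runs on its own, and `GENERIC_THEMES` is a name introduced here for illustration only.

```python
# Generic themes expected in both dictionaries (name introduced for this sketch).
GENERIC_THEMES = ['Introduction', 'Methodology', 'Findings', 'Conclusion']

my_themes = {
    'Introduction': 'Introduction and Motivation',
    'Methodology' : 'Research Methodology',
    'Findings'    : 'Findings and Analysis',
    'Conclusion'  : 'Limitations and Future Work'}

llm_themes = {
    'Introduction': 'Introduction',
    'Methodology' : 'Methodology',
    'Findings'    : 'Key Findings',
    'Conclusion'  : 'Conclusion and Future Work'}

# Report any generic theme that is missing or unexpected in either dictionary.
for name, themes in [('my_themes', my_themes), ('llm_themes', llm_themes)]:
    missing = set(GENERIC_THEMES) - set(themes)
    extra = set(themes) - set(GENERIC_THEMES)
    assert not missing and not extra, f"{name}: missing={missing}, extra={extra}"

# Dictionaries preserve insertion order, so this also compares the key order.
same_keys = list(my_themes) == list(llm_themes)
print("Theme keys match:", same_keys)
```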

We define a function that splits the summary text into sections based on the text headings.  
The function returns a sections dictionary with the generic headings `Introduction`, `Methodology`, `Findings` and `Conclusion` as keys and the corresponding text as values.


In [104]:
import re

def extract_sections(text, themes):
    """
    Splits text into sections based on the provided theme headings.
    Returns a dictionary with theme as key and corresponding text as value.
    """
    sections = {}
    # get a list of the actual text headings
    text_headings = themes.values()
    # create a reverse mapping of actual text_headings -> generic headings
    generic_headings = {v:k for k,v in themes.items()}
    # Create a regex pattern that matches any of the text headings.
    # (Assuming that headings appear at the beginning of a line)
    pattern = r'(?m)^(' + '|'.join(re.escape(heading) for heading in text_headings) + r')'
    
    # Find all matches and split text accordingly.
    splits = re.split(pattern, text)
    # re.split returns a list where headings are also part of the result.
    # The first element is any text before the first heading (if any).
    current_heading = None
    for segment in splits:
        segment = segment.strip()
        if segment in text_headings:
            current_heading = segment
            sections[generic_headings[current_heading]] = ""
        elif current_heading:
            sections[generic_headings[current_heading]] += segment + "\n"
    return sections
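
The loop in `extract_sections` relies on how `re.split` behaves when the pattern contains a capturing group: the matched headings are kept in the result list, interleaved with the text between them. The small sketch below shows this on a toy string. The text and headings are invented for illustration and are not taken from the summary files.

```python
import re

# Toy text and headings, invented for illustration only.
toy_text = "Preamble\nIntroduction\nSome intro text.\nKey Findings\nSome findings.\n"
toy_headings = ["Introduction", "Key Findings"]

# Same pattern construction as in extract_sections: a multiline-anchored
# alternation of the escaped headings, wrapped in one capturing group.
pattern = r'(?m)^(' + '|'.join(re.escape(h) for h in toy_headings) + r')'

# Because of the capturing group, the headings themselves appear in the result.
parts = re.split(pattern, toy_text)
print(parts)
# ['Preamble\n', 'Introduction', '\nSome intro text.\n', 'Key Findings', '\nSome findings.\n']
```

The first element is whatever precedes the first heading, which is why `extract_sections` ignores segments until `current_heading` has been set.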


Reading the summary files:

In [109]:
# Read the files
with open("MySummary.txt", "r", encoding="utf-8") as file:
    my_summary_text = file.read()

with open("LLM_Summary.txt", "r", encoding="utf-8") as file:
    llm_summary_text = file.read()


Extracting the sections for each summary.

In [110]:
# Extract sections for each summary
my_sections = extract_sections(my_summary_text, my_themes)
llm_sections = extract_sections(llm_summary_text, llm_themes)

Checking and validating the section split result:

In [111]:
# check the number of sections
print("my_sections length:", len(my_sections))
print("llm_sections length:", len(llm_sections))

my_sections length: 4
llm_sections length: 4
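
A length of 4 shows that four headings were found, but not that every section actually contains text. A slightly stricter check might look like the sketch below. `check_sections` is a hypothetical helper introduced here, and the dictionary it is applied to is toy data rather than the real split result.

```python
def check_sections(sections, expected_keys):
    """Return a list of problems found in a sections dictionary (hypothetical helper)."""
    problems = []
    for key in expected_keys:
        if key not in sections:
            problems.append(f"missing section: {key}")
        elif not sections[key].strip():
            problems.append(f"empty section: {key}")
    return problems

# Toy data for illustration: one empty section and one missing section.
toy_sections = {
    'Introduction': 'Some text.\n',
    'Methodology': '',
    'Findings': 'More text.\n',
}
expected = ['Introduction', 'Methodology', 'Findings', 'Conclusion']

problems = check_sections(toy_sections, expected)
print(problems)
# ['empty section: Methodology', 'missing section: Conclusion']
```

An empty list would indicate that every expected theme is present and non-empty.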


Visually inspecting the last section for both cases:

In [112]:
print(my_sections.keys())

dict_keys(['Introduction', 'Methodology', 'Findings', 'Conclusion'])


In [114]:
print(my_sections['Conclusion'])

While domain models can clearly highlight missing requirements, this study did not evaluate whether analysts effectively identify and correct those omissions in practice. Future research should include user studies to explore the practical effectiveness of domain models in supporting requirements validation.

Conclusion
This empirical study provides concrete evidence supporting domain models' value as effective tools for completeness checking in natural-language requirements specifications. By systematically highlighting omissions, particularly entirely missing requirements, domain models can significantly improve requirements quality, making them valuable components of requirements engineering practice.



In [115]:
print(llm_sections.keys())

dict_keys(['Introduction', 'Methodology', 'Findings', 'Conclusion'])


In [116]:
print(llm_sections['Conclusion'])

The study provides empirical evidence that domain models can help identify missing and under-specified requirements, though their effectiveness depends on how frequently concepts are referenced in the requirements. The results suggest that domain models should be complemented by other techniques for completeness checking. Future work should focus on user studies to evaluate whether analysts can effectively leverage domain models in practice.
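
Printing each section in full works for a quick look, but a compact preview of all sections can make the two summaries easier to scan side by side. The sketch below assumes a hypothetical helper, `preview_sections`, and runs on toy data; it is one possible way to do this, not part of the original analysis.

```python
def preview_sections(sections, width=60):
    """Return one line per section: key, word count, and the first `width` characters."""
    lines = []
    for key, text in sections.items():
        flat = ' '.join(text.split())  # collapse newlines and repeated spaces
        snippet = flat[:width] + ('...' if len(flat) > width else '')
        lines.append(f"{key} ({len(flat.split())} words): {snippet}")
    return lines

# Toy data for illustration only.
toy_sections = {
    'Introduction': 'A short intro.\n',
    'Conclusion': 'Final remarks go here.\n',
}

preview = preview_sections(toy_sections)
for line in preview:
    print(line)
# Introduction (3 words): A short intro.
# Conclusion (4 words): Final remarks go here.
```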

