### QA Generation

In this notebook, the Question-Answering (QA) Dataset for the annotation data [merged_ecsa_icsa.json](../../../../data/external/merged_ecsa_icsa.json) is created.

The creation was guided by a matrix consisting of two dimensions:
1. **Use Case**
    - **1**: The first use case reflects the current state-of-practice in scientific literature search. The researcher seeks additional details about the metadata of one or more papers. To find this information, the researcher provides the QA system with specific metadata information related to the papers he is interested in. In response, the QA system returns information on other metadata attributes of the papers rather than content information.
    - **2**: In the second use case, the researcher seeks information about the content of one or more papers. In this use case, the researcher provides the QA system with metadata information about the papers and asks a question about their contents. The QA system is then expected to extract content information related to specific papers that conform to the metadata constraints provided.
    - **3**: In the third use case, the researcher seeks information about metadata of one or more papers. In this use case, the researcher provides the QA system with content constraints about the papers and asks a question about the metadata of the paper. The QA system is then expected to extract metadata information related to the specific papers mentioned in the question.
    - **4**: In the fourth use case, the researcher seeks information about the content of one or more papers. In this use case, the researcher provides the QA system with content information about the paper, and asks a question about the content of the paper. The QA system is then expected to extract content information related to the specific papers mentioned in the question.
    - **5**: In the fifth use case, a researcher seeks information about the content of one or more papers. In this use case, the researcher provides the retriever with both metadata and content information about the papers and asks a question about the content of the paper. The retriever is then expected to extract content information related to the specific papers mentioned in the question.
    - **6**: In the sixth use case, the researcher seeks information about metadata of one or more papers. In this specific use case, the researcher provides the retriever with both metadata and content information about the papers, such as the name of an evaluation method and the year of publication, and asks a question about the metadata of the papers. The retriever is then expected to extract metadata information related to the specific papers mentioned in the question.

2. **Retrieval Operation Classification**
    - **Basic**: Classifies those questions where the retriever is required to just find one ore more facts in the Knowledge Graph and use those to provide the answer without further processing.
    - **Aggregation**: Classifies those questions where the retriever is required to quantitatively or qualitatively aggregate the information in the Knowledge Graph to answer the question.
    - **Comparative**: Classifies those questions where the retriever is required to compare two or more pieces of information in the Knowledge Graph to answer the question.
    - **Ranking**: Classifies those questions where the retriever is required to rank the information in the Knowledge Graph to answer the question.
    - **Counting**: Classifies those questions where the retriever is required to count the number of occurrences of a certain information in the Knowledge Graph to answer the question.
    - **Superlative**: Classifies those questions where the retriever is required to identify the most or least of a certain information in the Knowledge Graph to answer the question.
    - **Relationship**: Classifies questions where the retriever must identify any type of interconnection or reliance between pieces of information in the Knowledge Graph. Essentially, it captures all scenarios where one piece of data is influenced by, contingent upon, or systematically linked to another.
    - **Negation**: Classifies those questions where the retriever is required to negate the information in the Knowledge Graph to answer the question.

### How to Read this File

For each prepared question template (see [here](../templates.md)) we prepare the parameters for the clustering or subgraph construction strategies below. From each of the generated questions, we then selected those questions that we considered to already have a high quality and include no hallucination while conforming to the indended template, use case, and retrieval operation.

In [None]:
%load_ext autoreload
%autoreload 2

import pandas as pd
from sqa_system.knowledge_base.knowledge_graph.storage import KnowledgeGraphManager
from sqa_system.core.language_model.llm_provider import LLMProvider
from sqa_system.core.data.models import QAPair
from sqa_system.app.cli.cli_progress_handler import ProgressHandler
from sqa_system.core.config.models import LLMConfig, KnowledgeGraphConfig, EmbeddingConfig

# Initialize the Generators
from sqa_system.qa_generator.strategies import (
    FromTopicEntityGenerator, FromTopicEntityGeneratorOptions, GenerationOptions)
from sqa_system.qa_generator.strategies import (
    PaperComparisonGenerator, PaperComparisonGeneratorOptions)
from sqa_system.qa_generator.strategies.clustering_strategy.cluster_based_question_generator import (
    ClusterBasedQuestionGenerator, 
    ClusterGeneratorOptions, 
    AdditionalInformationRestriction,
    ClusterStrategyOptions
)


# Prepare the Progress Handler, which we are going to disable because of compatibility issues
# with Jupyter Notebooks
progress_handler = ProgressHandler()
progress_handler.disable()

# Prepare Knowledge Graph
kg_config = KnowledgeGraphConfig.from_dict({
    "additional_params": {
        "contribution_building_blocks": {
            "Classifications_2": [
                "paper_class",
                "research_level",
                "all_research_objects",
                "validity",
                "evidence"
            ]
        },
        "force_cache_update": True,
        "force_publication_update": False,
        "subgraph_root_entity_id": "R659055",
        "orkg_base_url": "https://sandbox.orkg.org"
    },
    "graph_type": "orkg",
    "dataset_config": {
        "name": "merged_ecsa.json_jsonpublicationloader_limit-1",
        "additional_params": {},
        "file_name": "merged_ecsa_icsa.json",
        "loader": "JsonPublicationLoader",
        "loader_limit": -1
    },
    "extraction_llm": {
        "name": "openai_gpt-4o-mini_tmp0.0_maxt-1",
        "additional_params": {},
        "endpoint": "OpenAI",
        "name_model": "gpt-4o-mini",
        "temperature": 0.0,
        "max_tokens": -1
    },
    "extraction_context_size": 4000,
    "chunk_repetitions": 2
})
graph = KnowledgeGraphManager().get_item(kg_config)

# Prepare the Research Field Topic Entity
research_field = graph.get_entity_by_id("R659055")

# Prepare Language Model
gpt_4o_mini_config = LLMConfig.from_dict({
    "endpoint": "OpenAI",
    "name_model": "gpt-4o-mini",
    "temperature": 0.0,
    "max_tokens": -1
})
gpt_4o_mini = LLMProvider().get_llm_adapter(gpt_4o_mini_config)

gpt_4o_config = LLMConfig.from_dict({
    "endpoint": "OpenAI",
    "name_model": "gpt-4o",
    "temperature": 0.0,
    "max_tokens": -1
})
gpt_4o = LLMProvider().get_llm_adapter(gpt_4o_config)

gpt_o3_mini_config = LLMConfig.from_dict({
    "endpoint": "OpenAI",
    "name_model": "o3-mini",
    "temperature": None,
    "max_tokens": -1,
    "reasoning_effort": "low"
})
gpt_o3_mini = LLMProvider().get_llm_adapter(gpt_o3_mini_config)

embedding_config = EmbeddingConfig.from_dict({
    "name": "openai_text-embedding-3-small",
    "additional_params": {},
    "endpoint": "OpenAI",
    "name_model": "text-embedding-3-small"
})

def print_qa_pairs(qa_pairs: list[QAPair]):
    if not qa_pairs:
        print("No QA pairs generated")
    for qa_pair in qa_pairs:
        print(f"Question: {qa_pair.question}")
        print(f"Answer: {qa_pair.golden_answer}")
        print(f"Golden Triples: {qa_pair.golden_triples}")
        print(f"Hops: {qa_pair.hops}")
        print(f"Topic Entity: {qa_pair.topic_entity_value}")
        df = pd.DataFrame([qa_pair.model_dump()])
        print(f"CSV: \n {df.to_csv(index=False)}")
        print("------------------") 

Rotating log file
[32m2025-04-09 11:38:20,509[0m - New session started
[32m2025-04-09 11:38:27,699[0m - Connected to the ORKG API.
[32m2025-04-09 11:40:34,787[0m - Caching ORKG subgraph
[32m2025-04-09 11:44:06,331[0m - Finished caching ORKG subgraph
[32m2025-04-09 11:44:06,592[0m - Saved cached subgraph to /home/marco/master_thesis_implementation/sqa-system/data/knowledge_base/knowledge_graphs/orkg/18cb3cd3bfeb394bb28319e053e03582.json


## Use Case 1

### Basic


In [None]:
qa_pairs = []
qa_strategy = FromTopicEntityGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    from_topic_entity_options=FromTopicEntityGeneratorOptions(
        topic_entity=research_field,
    ),     
    options=GenerationOptions(
        template_text="In which venue has the paper '[paper_title]' been published?",
        additional_requirements=[
            "The generated question should include the title of the paper.",
            "The context should only include the triple that contains the venue of the paper",
        ],
        validate_contexts=False,
        convert_path_to_text=False,
        classify_questions=False,
    )     
)
qa_pairs.extend(qa_strategy.generate())

qa_strategy = FromTopicEntityGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    from_topic_entity_options=FromTopicEntityGeneratorOptions(
        topic_entity=graph.get_random_publication(),
    ),     
    options=GenerationOptions(
        template_text="In which venue has the paper '[paper_title]' been published?",
        additional_requirements=[
            "The generated question should include the title of the paper.",
            "The context should only include the triple that contains the venue of the paper",
        ],
        validate_contexts=False,
        convert_path_to_text=False,
        classify_questions=False,
    )     
)
qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

Question: In which venue has the paper 'Availability-Driven Architectural Change Propagation Through Bidirectional Model Transformations Between UML and Petri Net Models' been published?
Answer: The paper 'Availability-Driven Architectural Change Propagation Through Bidirectional Model Transformations Between UML and Petri Net Models' has been published in the International Conference on Software Architecture (ICSA).
Golden Triples: ['(R872741:Availability-Driven Architectural Change Propagation Through Bidirectional Model Transformations Between UML and Petri Net Models, venue, R820814:International Conference on Software Architecture (ICSA))']
Hops: 2
Topic Entity: Software Architecture and Design
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_triples,is_generated_with,topic_entity_id,topic_entity_value,hops,based_on_template
b64e49ec-3abc-42b5-9012-60966bcadb3d,In which venue has the paper 'Availability-Driven Architectural Change Propagation Through Bidirecti

In [None]:
qa_pairs = []
for _ in range(2):
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=research_field
        ),     
        options=GenerationOptions(
            template_text="Who are the authors of the paper '[paper_title]'?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the authors of the paper.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_random_publication()
        ),     
        options=GenerationOptions(
            template_text="Who are the authors of the paper '[paper_title]'?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the authors of the paper.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

Question: Who are the authors of the paper 'Evaluating the Effectiveness of Multi-level Greedy Modularity Clustering for Software Architecture Recovery'?
Answer: The authors of the paper 'Evaluating the Effectiveness of Multi-level Greedy Modularity Clustering for Software Architecture Recovery' are Hasan Sözer and others.
Golden Triples: ['(R874518:Evaluating the Effectiveness of Multi-level Greedy Modularity Clustering for Software Architecture Recovery, authors, R874519:authors list)']
Hops: 2
Topic Entity: Software Architecture and Design
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_triples,is_generated_with,topic_entity_id,topic_entity_value,hops,based_on_template
861c3352-b72a-47f0-9394-640672ed6cc7,Who are the authors of the paper 'Evaluating the Effectiveness of Multi-level Greedy Modularity Clustering for Software Architecture Recovery'?,The authors of the paper 'Evaluating the Effectiveness of Multi-level Greedy Modularity Clustering for Software Arch

In [2]:
qa_pairs = []

qa_pairs.extend(ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What is the replication package link of the paper '[paper_title]'?",
            additional_requirements=[],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P168010",
        restriction_text="Replication Package Link",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False,
        skip_clusters_with_only_one_root=False
    )
).generate())

qa_strategy = FromTopicEntityGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    from_topic_entity_options=FromTopicEntityGeneratorOptions(
        topic_entity=graph.get_entity_by_id("R870141"), # We manually selected a publication that includes a replication package link
    ),     
    options=GenerationOptions(
        template_text="What is the replication package link of the paper '[paper_title]'?",
        additional_requirements=[
            "The generated question should include the title of the paper.",
            "The context should only include the triple of replication package link.",
        ],
        validate_contexts=False,
        convert_path_to_text=False,
        classify_questions=False,
    )     
)
qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

Question: What is the replication package link of the paper 'Trace Link Recovery for Software Architecture Documentation'?
Answer: The replication package link of the paper 'Trace Link Recovery for Software Architecture Documentation' is https://doi.org/10.5281/zenodo.4730621.
Golden Triples: ['(R870161:Evidence, Replication Package Link, L1523794:https://doi.org/10.5281/zenodo.4730621)']
Hops: 4
Topic Entity: Software Architecture and Design
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_triples,is_generated_with,topic_entity_id,topic_entity_value,hops,based_on_template
6a276751-1922-45d7-bbcb-963908c05fbf,What is the replication package link of the paper 'Trace Link Recovery for Software Architecture Documentation'?,The replication package link of the paper 'Trace Link Recovery for Software Architecture Documentation' is https://doi.org/10.5281/zenodo.4730621.,['10.1007/978-3-030-86044-8_7'],,"['(R870161:Evidence, Replication Package Link, L1523794:https://doi.

### Aggregation


In [3]:
qa_pairs = []
qa_pairs.extend(ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which publications have been published by the author [author name] in the year [publication year]?",
            additional_requirements=[
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate())

print_qa_pairs(qa_pairs)

[32m2025-03-13 07:28:33,763[0m - Reached the soft limit of 10.
Question: Which publications have been published by the author Jan Werf in the year 2017?
Answer: In the year 2017, Jan Werf published the paper titled 'Workload-Based Clustering of Coherent Feature Sets in Microservice Architectures'.
Golden Triples: ['(R874644:authors list, has list element, L1533039:Jan Werf)', '(R874643:Workload-Based Clustering of Coherent Feature Sets in Microservice Architectures, publication year, L1533042:2017)']
Hops: 3
Topic Entity: Software Architecture and Design
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_triples,is_generated_with,topic_entity_id,topic_entity_value,hops,based_on_template
e4eeb4de-a119-4f63-8748-afe951b5f9a2,Which publications have been published by the author Jan Werf in the year 2017?,"In the year 2017, Jan Werf published the paper titled 'Workload-Based Clustering of Coherent Feature Sets in Microservice Architectures'.","['10.1109/ICSA.2017.38', 

In [4]:
qa_pairs = []
qa_pairs.extend(ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which papers have the research level [research level]?",
            additional_requirements=[
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P162008",
        restriction_text="research level",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        use_predicate_as_value=True,
        restriction_value="True"
    )
).generate())

print_qa_pairs(qa_pairs)

[32m2025-04-09 11:57:54,404[0m - Found 2 inital clusters
[32m2025-04-09 11:57:54,404[0m - Using 2 clusters for generation
[32m2025-04-09 11:58:01,108[0m - The cluster has a golden triple size of 153 which is higher than the limit of 10.
Question: Which papers have the research level secondary research?
Answer: The papers with the research level secondary research include: 'Assessing Architecture Conformance to Coupling-Related Patterns and Practices in Microservices', 'Guidelines for Architecting Android Apps: A Mixed-Method Empirical Study', and 'On Interfaces to Support Agile Architecting in Automotive: An Exploratory Case Study'.
Golden Triples: ['(R869572:Research Level, secondary research, L1522540:True)', '(R874180:Research Level, secondary research, L1532059:True)', '(R868829:Research Level, secondary research, L1520953:True)']
Hops: 4
Topic Entity: Software Architecture and Design
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_triples,is_generated_w

### Counting

In [5]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many publications have been published by the author [author name]?",
            additional_requirements=[],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 07:41:47,496[0m - Reached the soft limit of 10.
Question: How many publications have been published by the author Johannes Grohmann?
Answer: Johannes Grohmann has published two papers, which are 'Integrating Statistical Response Time Models in Architectural Performance Models' and 'Incremental Calibration of Architectural Performance Models with Parametric Dependencies'.
Golden Triples: ['(R872195:authors list, has list element, L1527978:Johannes Grohmann)', '(R870060:authors list, has list element, L1523562:Johannes Grohmann)']
Hops: 3
Topic Entity: Software Architecture and Design
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_triples,is_generated_with,topic_entity_id,topic_entity_value,hops,based_on_template
4bbdeb87-29d5-4ea2-9703-9f91b27ee413,How many publications have been published by the author Johannes Grohmann?,"Johannes Grohmann has published two papers, which are 'Integrating Statistical Response Time Models in Architectural Performan

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many publications have the paper class with the name [paper class name]?",
            additional_requirements=[],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="paper class",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False,
        use_predicate_as_value=True,
        restriction_value="True"
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 01:39:36,857[0m - The cluster has a golden triple size of 78 which is higher than the limit of 10.
[32m2025-03-13 01:39:39,520[0m - The cluster has a golden triple size of 69 which is higher than the limit of 10.
[32m2025-03-13 01:39:39,520[0m - The cluster has a golden triple size of 49 which is higher than the limit of 10.
Question: How many publications have the paper class with the name 'personal experience paper'?
Answer: There are two publications that have the paper class with the name 'personal experience paper'. These publications are 'Data-Centric Communication and Containerization for Future Automotive Software Architectures' and 'Towards a Reference Architecture for Cloud-Based Plant Genotyping and Phenotyping Analysis Frameworks'.
Golden Triples: ['(R872664:Paper Class, personal experience paper, L1528944:True)', '(R872562:Paper Class, personal experience paper, L1528757:True)']
Hops: 4
Topic Entity: Software Architecture and Design
CSV: 
 uid,question

### Ranking

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which publications have the paper class [paper class] ranked by their publication year?",
            additional_requirements=[
                "The context should only include the triples that contain the paper class and publication year of the paper",
                "The answer should be a list of publication titles in chronological order",
                "Ensure that the list is ordered correctly based on the publication year in descending order"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="paper class",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False,
        use_predicate_as_value=True,
        restriction_value="True"
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 15:29:45,245[0m - GET request failed for https://sandbox.orkg.org/api/statements with params {'subject_id': 'R871545', 'page': 0, 'size': 100}. Attempt 1 of 10. Error: HTTPSConnectionPool(host='sandbox.orkg.org', port=443): Max retries exceeded with url: /api/statements?subject_id=R871545&page=0&size=100 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f428a8de8d0>: Failed to resolve 'sandbox.orkg.org' ([Errno -3] Temporary failure in name resolution)"))
[32m2025-03-13 15:33:52,367[0m - GET request failed for https://sandbox.orkg.org/api/statements with params {'object_id': 'R872451', 'page': 0, 'size': 100}. Attempt 1 of 10. Error: HTTPSConnectionPool(host='sandbox.orkg.org', port=443): Max retries exceeded with url: /api/statements?object_id=R872451&page=0&size=100 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f4289ef2c60>: Failed to establish a new connection: [Errno 111] Connection refused'))
[32m

In [6]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which publications have been published by the author [author name] ranked by their publication year?",
            additional_requirements=[
                "The context should only include the triples that contain the authors and publication year of the paper",
                "The answer should be a list of publication titles in chronological order",
                "Ensure that the list is ordered correctly based on the publication year in descending order"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 07:50:43,412[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 07:50:48,544[0m - Reached the soft limit of 10.
Question: Which publications have been published by the author Fahed Alkhabbas ranked by their publication year?
Answer: 1. A Goal-Driven Approach for Deploying Self-Adaptive IoT Systems (2020)
2. ECo-IoT: An Architectural Approach for Realizing Emergent Configurations in the Internet of Things (2018)
Golden Triples: ['(R868365:authors list, has list element, L1519934:Fahed Alkhabbas)', '(R868364:A Goal-Driven Approach for Deploying Self-Adaptive IoT Systems, publication year, L1519939:2020)', '(R870713:authors list, has list element, L1524940:Fahed Alkhabbas)', '(R870712:ECo-IoT: An Architectural Approach for Realizing Emergent Configurations in the Internet of Things, publication year, L1524943:2018)']
Hops: 3
Topic Entity: Software Architecture and Design
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chun

### Comparative

In [7]:
qa_pairs = []
for _ in range(2):
    qa_strategy = PaperComparisonGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        comparison_options=PaperComparisonGeneratorOptions(
            first_publication=graph.get_random_publication(),
            second_publication=graph.get_random_publication(),
            topic_entity=research_field
        ),     
        options=GenerationOptions(
            template_text="In what publication year has the publication '[paper title 1]' been published in comparison to the publication '[paper title 2]'?",
            additional_requirements=[
                "The context should only include the triples of the publication years of the papers.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

Question: In what publication year has the publication 'Continuous Architecture: Towards the Goldilocks Zone and Away from Vicious Circles' been published in comparison to the publication 'Designing Robust Software Systems through Parametric Markov Chain Synthesis'?
Answer: The publication 'Continuous Architecture: Towards the Goldilocks Zone and Away from Vicious Circles' was published in 2019, while 'Designing Robust Software Systems through Parametric Markov Chain Synthesis' was published in 2017.
Golden Triples: ['(R873775:Continuous Architecture: Towards the Goldilocks Zone and Away from Vicious Circles, publication year, L1531203:2019)', '(R868403:Designing Robust Software Systems through Parametric Markov Chain Synthesis, publication year, L1520024:2017)']
Hops: 2
Topic Entity: Software Architecture and Design
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_triples,is_generated_with,topic_entity_id,topic_entity_value,hops,based_on_template
ce7c8800-8719-4a5

In [None]:
qa_pairs = []
for _ in range(2):
    qa_strategy = PaperComparisonGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        comparison_options=PaperComparisonGeneratorOptions(
            first_publication=graph.get_random_publication(),
            second_publication=graph.get_random_publication(),
            topic_entity=research_field
        ),     
        options=GenerationOptions(
            template_text="What is the paper class of the paper '[paper title 1]' in comparison to the publication '[paper title 2]'?",
            additional_requirements=[
                "The context should only include the triples of the paper classes of the papers.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

[32m2025-03-13 02:00:03,680[0m - GET request failed for https://sandbox.orkg.org/api/statements with params {'object_id': 'R871530', 'page': 0, 'size': 100}. Attempt 1 of 10. Error: 500 Server Error:  for url: https://sandbox.orkg.org/api/statements?object_id=R871530&page=0&size=100
[32m2025-03-13 02:00:07,742[0m - GET request failed for https://sandbox.orkg.org/api/statements with params {'object_id': 'R871530', 'page': 0, 'size': 100}. Attempt 2 of 10. Error: 500 Server Error:  for url: https://sandbox.orkg.org/api/statements?object_id=R871530&page=0&size=100
[32m2025-03-13 02:00:18,073[0m - GET request failed for https://sandbox.orkg.org/api/statements with params {'object_id': 'R871530', 'page': 0, 'size': 100}. Attempt 3 of 10. Error: 500 Server Error:  for url: https://sandbox.orkg.org/api/statements?object_id=R871530&page=0&size=100
[32m2025-03-13 02:00:34,143[0m - GET request failed for https://sandbox.orkg.org/api/statements with params {'object_id': 'R871530', 'page':

### Superlative

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which author has published the most publications with the paper class [paper class name]?",
            additional_requirements=[
                "Include all triples in your context"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="authors",
                split_clusters=True,
            )
        ],
        only_use_cluster_with_most_triples=True
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="paper class",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        use_predicate_as_value=True,
        restriction_value="True"
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 20:00:12,673[0m - GET request failed for https://sandbox.orkg.org/api/statements with params {'subject_id': 'L1524648', 'page': 0, 'size': 100}. Attempt 1 of 10. Error: HTTPSConnectionPool(host='sandbox.orkg.org', port=443): Read timed out. (read timeout=15)
Question: Which author has published the most publications with the paper class evaluation research?
Answer: Paris Avgeriou has published the most publications with the paper class evaluation research. This is evidenced by her authorship in multiple papers categorized under this class, including 'An Exploratory Study on Architectural Knowledge in Issue Tracking Systems', 'The Evolution of Technical Debt in the Apache Ecosystem', 'System- and Software-level Architecting Harmonization Practices for Systems-of-Systems', 'Exploring Web Search Engines to Find Architectural Knowledge', and 'Architectural Assumptions and Their Management in Industry – An Exploratory Study'.
Golden Triples: ['(R869240:Paper Class, evaluati

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What is the paper class that the author [author name] has published the most?",
            additional_requirements=[
                "The context should only include the triples that contain the research object and the sub-property.",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="paper class"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

### Relationship

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What is the proportion of papers that have been published by the author [author name] per year?",
            additional_requirements=[
                "The context should only include the triples that contain the research object and the sub-property.",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-14 16:44:38,862[0m - Found 105 inital clusters
[32m2025-03-14 16:44:57,056[0m - Using 104 clusters for generation
[32m2025-03-14 16:44:57,056[0m - The cluster has a golden triple size of 4 which is lower than the minimum of 6.
[32m2025-03-14 16:44:57,056[0m - The cluster has a golden triple size of 4 which is lower than the minimum of 6.
[32m2025-03-14 16:44:57,057[0m - The cluster has a golden triple size of 4 which is lower than the minimum of 6.
[32m2025-03-14 16:45:01,446[0m - The cluster has a golden triple size of 4 which is lower than the minimum of 6.
[32m2025-03-14 16:45:01,447[0m - The cluster has a golden triple size of 4 which is lower than the minimum of 6.
[32m2025-03-14 16:45:01,447[0m - The cluster has a golden triple size of 4 which is lower than the minimum of 6.
[32m2025-03-14 16:45:01,447[0m - The cluster has a golden triple size of 4 which is lower than the minimum of 6.
[32m2025-03-14 16:45:15,154[0m - Skipping QA-Pair as not all go

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What is the proportion of paper classes that have been published by the author [author name] per year?",
            additional_requirements=[
                "The context should only include the triples that contain the research object and the sub-property.",
                "First list the amount of times the entity was used in the year",
                "Then give a final statement about the proportion",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year"
            ),
            AdditionalInformationRestriction(
                information_predicate="paper class",
                information_value_restriction="True"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-14 17:49:35,239[0m - Found 105 inital clusters
[32m2025-03-14 17:51:55,986[0m - Using 104 clusters for generation
[32m2025-03-14 17:52:01,019[0m - The cluster has a golden triple size of 11 which is higher than the limit of 10.
[32m2025-03-14 17:52:01,019[0m - The cluster has a golden triple size of 19 which is higher than the limit of 10.
[32m2025-03-14 17:52:01,019[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-14 17:52:01,019[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-14 17:52:28,651[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-14 17:52:47,921[0m - Reached the soft limit of 10.
Question: What is the proportion of paper classes that have been published by the author Mohammad Sharaf per year?
Answer: In the year 2017, Mohammad Sharaf published two papers: 'An Architecture Framework for Modelling and Simulation 

## Use Case 2

### Basic


In [8]:
qa_pairs = []
for _ in range(2):
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=research_field
        ),     
        options=GenerationOptions(
            template_text="Does the paper '[paper_title]' use tool support?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that contain information about the tool support of the paper.",
                "Do not use any IDs in your answer. Do not make any assumptions beyond the given triples.",
                "Just give a concise answer to the question.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_random_publication()
        ),     
        options=GenerationOptions(
            template_text="Does the paper '[paper_title]' use tool support?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that contain information about the tool support of the paper.",
                "Do not use any IDs in your answer. Do not make any assumptions beyond the given triples.",
                "Just give a concise answer to the question.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

Question: Does the paper 'Data-Driven Software Architecture for Analyzing Confidentiality' use tool support in the context of Data-Driven Software Architecture for Analyzing Confidentiality?
Answer: No, the paper does not use tool support as indicated by the context stating that tool support is not used.
Golden Triples: ['(R873675:Tool Support, used, L1530999:False)']
Hops: 5
Topic Entity: Software Architecture and Design
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_triples,is_generated_with,topic_entity_id,topic_entity_value,hops,based_on_template
b90770f5-fce7-4aeb-b264-b958fec510ba,Does the paper 'Data-Driven Software Architecture for Analyzing Confidentiality' use tool support in the context of Data-Driven Software Architecture for Analyzing Confidentiality?,"No, the paper does not use tool support as indicated by the context stating that tool support is not used.",['10.1109/ICSA.2019.00009'],,"['(R873675:Tool Support, used, L1530999:False)']",publication_s

In [None]:
qa_pairs = []
for _ in range(2):
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=research_field
        ),     
        options=GenerationOptions(
            template_text="What is the evaluation method used in the paper '[paper title]'?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that specifically mention the evaluation method of the paper.",
                "Do not use any IDs in your answer. Do not make any assumptions beyond the given triples.",
                "The answer should be similar to the following: 'The evaluation method used in the paper is [evaluation method]'.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_random_publication()
        ),     
        options=GenerationOptions(
            template_text="What is the evaluation method used in the paper '[paper title]'?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that specifically mention the evaluation method of the paper.",
                "Do not use any IDs in your answer. Do not make any assumptions beyond the given triples.",
                "The answer should be similar to the following: 'The evaluation method used in the paper is [evaluation method]'.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

Question: What is the evaluation method used in the paper 'REST vs GraphQL: A Controlled Experiment'?
Answer: The evaluation method used in the paper is Controlled Experiment.
Golden Triples: ['(R870894:Evaluation Method Entity, Name, L1525324:Controlled Experiment)']
Hops: 7
Topic Entity: Software Architecture and Design
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_triples,is_generated_with,topic_entity_id,topic_entity_value,hops,based_on_template
f4e3f510-e8e2-4e31-ba67-a6c8cbc8b7cb,What is the evaluation method used in the paper 'REST vs GraphQL: A Controlled Experiment'?,The evaluation method used in the paper is Controlled Experiment.,['10.1109/ICSA47634.2020.00016'],,"['(R870894:Evaluation Method Entity, Name, L1525324:Controlled Experiment)']",publication_subgraph_strategy,R659055,Software Architecture and Design,7,What is the evaluation method used in the paper '[paper title]'?

------------------
Question: What is the evaluation method used in the pape

In [None]:
qa_pairs = []
for _ in range(2):
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=research_field
        ),     
        options=GenerationOptions(
            template_text="Does the paper '[paper_title]' have a replication package?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that contain information about the replication package of the paper.",
                "Do not use any IDs in your answer. Do not make any assumptions beyond the given triples.",
                "Just give a concise answer to the question.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_random_publication()
        ),     
        options=GenerationOptions(
            template_text="Does the paper '[paper_title]' have a replication package?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that contain information about the replication package of the paper.",
                "Do not use any IDs in your answer. Do not make any assumptions beyond the given triples.",
                "Just give a concise answer to the question.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

Question: Does the paper 'Updating Service-Based Software Systems in Air-Gapped Environments' have a replication package for the study titled 'Updating Service-Based Software Systems in Air-Gapped Environments'?
Answer: No, the paper does not provide a replication package.
Golden Triples: ['(R868543:Evidence, Provides Replication Package, L1520325:False)']
Hops: 4
Topic Entity: Software Architecture and Design
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_triples,is_generated_with,topic_entity_id,topic_entity_value,hops,based_on_template
332acc3e-c20d-43bb-82d0-aa48ba242c86,Does the paper 'Updating Service-Based Software Systems in Air-Gapped Environments' have a replication package for the study titled 'Updating Service-Based Software Systems in Air-Gapped Environments'?,"No, the paper does not provide a replication package.",['10.1007/978-3-030-86044-8_10'],,"['(R868543:Evidence, Provides Replication Package, L1520325:False)']",publication_subgraph_strategy,R6

### Aggregation

In [None]:
qa_pairs = []
for _ in range(2):
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_o3_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=research_field,
            maximum_subgraph_size=100
        ),     
        options=GenerationOptions(
            template_text="What are the evaluation methods of the publication '[paper_title]'?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that contain the evaluation methods of the paper",
                "Evaluation Methods are encoded in the triples of the format (R870531:Evaluation Method Entity, Name, [Evaluation Method Name])",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_o3_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_entity_by_id("R871346"), # A publication with multiple evaluation methods
            maximum_subgraph_size=100
        ),     
        options=GenerationOptions(
            template_text="What are the evaluation methods of the publication '[paper_title]'?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that contain the evaluation methods of the paper",
                "Evaluation Methods are encoded in the triples of the format (R870531:Evaluation Method Entity, Name, [Evaluation Method Name])",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

Question: What are the evaluation methods of the publication 'Decentralized Architecture for Energy-Aware Service Assembly'?
Answer: The publication 'Decentralized Architecture for Energy-Aware Service Assembly' employs a Technical Experiment as its evaluation method, as explicitly stated in the evaluation method triple.
Golden Triples: ['(R869917:Evaluation Method Entity, Name, L1523271:Technical Experiment)']
Hops: 7
Topic Entity: Software Architecture and Design
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_triples,is_generated_with,topic_entity_id,topic_entity_value,hops,based_on_template
89ae12a3-5e03-4aa8-87f9-44c62ea88986,What are the evaluation methods of the publication 'Decentralized Architecture for Energy-Aware Service Assembly'?,"The publication 'Decentralized Architecture for Energy-Aware Service Assembly' employs a Technical Experiment as its evaluation method, as explicitly stated in the evaluation method triple.",['10.1007/978-3-030-58923-3_4'],

In [9]:
qa_pairs = []
for _ in range(2):
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_o3_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=research_field,
            maximum_subgraph_size=100
        ),     
        options=GenerationOptions(
            template_text="What are the threats to validity of the paper '[paper_title]'?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that contain information about the threads to validity of the paper.",
                "Only include the threats that have a 'True' boolean value",
                "You can only use triples that explicitly have 'Threat to Validity' in the object"
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_o3_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_random_publication(),
            maximum_subgraph_size=100
        ),     
        options=GenerationOptions(
            template_text="What are the threats to validity of the paper '[paper_title]'?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that contain information about the threads to validity of the paper.",
                "Only include the threats that have a 'True' boolean value",
                "You can only use triples that explicitly have 'Threat to Validity' in the object"
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

Question: What are the threats to validity of the paper 'Predicting the Performance of Privacy-Preserving Data Analytics Using Architecture Modelling and Simulation' from Predicting the Performance of Privacy-Preserving Data Analytics Using Architecture Modelling and Simulation?
Answer: The paper identifies two threats to validity with a true boolean value: external validity and internal validity.
Golden Triples: ['(R873384:Threat to Validity, external validity, L1530409:True)', '(R873384:Threat to Validity, internal validity, L1530408:True)']
Hops: 5
Topic Entity: Software Architecture and Design
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_triples,is_generated_with,topic_entity_id,topic_entity_value,hops,based_on_template
86aac29f-80ff-4ed4-a011-59f8cf7f311f,What are the threats to validity of the paper 'Predicting the Performance of Privacy-Preserving Data Analytics Using Architecture Modelling and Simulation' from Predicting the Performance of Privacy-Prese

### Counting


In [38]:
qa_pairs = []
for _ in range(4):
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_o3_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=research_field,
            maximum_subgraph_size=100
        ),     
        options=GenerationOptions(
            template_text="How many evaluation methods does the paper with the title '[paper title]' have?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "You can only use triples of the form: (Evaluation, Evaluation method, [method name])"
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_o3_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_random_publication(),
            maximum_subgraph_size=100
        ),     
        options=GenerationOptions(
            template_text="How many evaluation methods does the paper with the title '[paper title]' have?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "You can only use triples of the form: (Evaluation, Evaluation method, [method name])"
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

[32m2025-03-13 20:12:13,677[0m - GET request failed for https://sandbox.orkg.org/api/statements with params {'subject_id': 'R874344', 'page': 0, 'size': 100}. Attempt 1 of 10. Error: HTTPSConnectionPool(host='sandbox.orkg.org', port=443): Read timed out. (read timeout=15)
[32m2025-03-13 20:12:51,850[0m - GET request failed for https://sandbox.orkg.org/api/statements with params {'object_id': 'R874341', 'page': 0, 'size': 100}. Attempt 1 of 10. Error: HTTPSConnectionPool(host='sandbox.orkg.org', port=443): Read timed out. (read timeout=15)
Question: How many evaluation methods does the paper with the title "How Developers Discuss Architecture Smells? An Exploratory Study on Stack Overflow" have?
Answer: The paper includes one evaluation method: Evaluation Method List.
Golden Triples: ['(R868298:Evaluation, Evaluation method, R868299:Evaluation Method List)']
Hops: 5
Topic Entity: Software Architecture and Design
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_t

In [9]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many evaluation methods are used by the author [author name]?",
            additional_requirements=[
                "The context should only include the triples that contain the evaluation methods of the papers",
                "Evaluation Methods are encoded in the triples of the format (R870531:Evaluation Method Entity, Name, [Evaluation Method Name])",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 23:31:34,107[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 23:31:48,991[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 23:31:59,730[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 23:32:04,073[0m - The cluster has a golden triple size of 17 which is higher than the limit of 10.
[32m2025-03-13 23:32:17,081[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 23:32:32,268[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 23:32:38,655[0m - Reached the soft limit of 10.
Question: How many evaluation methods are used by the author Bradley Schmerl?
Answer: The author Bradley Schmerl uses two evaluation methods: Case Study and Technical Experiment.
Golden Triples: ['(R871858:authors list, has list element, L1527271:Bradley Schmerl)', '(R871867:Evaluation Me

In [None]:
qa_pairs = []
for _ in range(2):
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_o3_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=research_field,
            maximum_subgraph_size=100
        ),     
        options=GenerationOptions(
            template_text="How many threats to validity does the paper with the title '[paper title]' have?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that contain information about the threads to validity of the paper.",
                "Only include the threats that have a 'True' boolean value",
                "You can only use triples that explicitly have 'Threat to Validity' in the object"
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_o3_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_random_publication(),
            maximum_subgraph_size=100
        ),     
        options=GenerationOptions(
            template_text="How many threats to validity does the paper with the title '[paper title]' have?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that contain information about the threads to validity of the paper.",
                "Only include the threats that have a 'True' boolean value",
                "You can only use triples that explicitly have 'Threat to Validity' in the object"
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

Question: How many threats to validity does the paper titled 'PARAD Repository: On the Capitalization of the Performance Analysis Process for AADL Designs' have?
Answer: The paper has 0 threats to validity. Among all the triples that mention 'Threat to Validity', none of the corresponding boolean values are True.
Golden Triples: ['(R869148:Threat to Validity, confirmability, L1521629:False)', '(R869148:Threat to Validity, repeatability, L1521626:False)', '(R869148:Threat to Validity, internal validity, L1521627:False)', '(R869148:Threat to Validity, external validity, L1521628:False)', '(R869148:Threat to Validity, construct validity, L1521625:False)']
Hops: 5
Topic Entity: Software Architecture and Design
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_triples,is_generated_with,topic_entity_id,topic_entity_value,hops,based_on_template
12107bf8-106d-4dc3-a112-157b4d6cb579,How many threats to validity does the paper titled 'PARAD Repository: On the Capitalization o

### Ranking

In [12]:
qa_pairs = []
for _ in range(2):
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_o3_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=research_field
        ),     
        options=GenerationOptions(
            template_text="Which threads to validity does the publication '[paper title]' have, ranked in descending alphabetical order?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that contain information about the threads to validity of the paper.",
                "Only include the threats that have a 'True' boolean value",
                "You can only use triples that explicitly have 'Threat to Validity' in the object"
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_o3_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_random_publication()
        ),     
        options=GenerationOptions(
            template_text="Which threads to validity does the publication '[paper title]' have, ranked in descending alphabetical order?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that contain information about the threads to validity of the paper.",
                "Only include the threats that have a 'True' boolean value",
                "You can only use triples that explicitly have 'Threat to Validity' in the object"
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

Question: Which threads to validity does the publication 'A Quantitative Approach for the Assessment of Microservice Architecture Deployment Alternatives by Automated Performance Testing' have, ranked in descending alphabetical order?
Answer: The publication has the following thread to validity: confirmability.
Golden Triples: ['(R872355:Threat to Validity, confirmability, L1528354:True)']
Hops: 5
Topic Entity: Software Architecture and Design
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_triples,is_generated_with,topic_entity_id,topic_entity_value,hops,based_on_template
94845043-5196-4926-b89f-83570ddb63d6,"Which threads to validity does the publication 'A Quantitative Approach for the Assessment of Microservice Architecture Deployment Alternatives by Automated Performance Testing' have, ranked in descending alphabetical order?",The publication has the following thread to validity: confirmability.,['10.1007/978-3-030-00761-4_11'],,"['(R872355:Threat to Validity

In [None]:
qa_pairs = []
for _ in range(2):
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_o3_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=research_field
        ),     
        options=GenerationOptions(
            template_text="Which sub-properties does the publication '[paper title]' have, ranked in descending alphabetical order?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that contain information about the sub-properties of the paper.",
                "Your answer should list all sub-properties in descending alphabetical order.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_o3_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_entity_by_id("R871674"), # A paper that has multiple sub-properties
        ),     
        options=GenerationOptions(
            template_text="Which sub-properties does the publication '[paper title]' have, ranked in descending alphabetical order?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that contain information about the sub-properties of the paper.",
                "Your answer should list all sub-properties in descending alphabetical order.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

Question: Which sub-properties does the publication 'An Architecture-Driven Adaptation Approach for Big Data Cyber Security Analytics' have, ranked in descending alphabetical order?
Answer: The publication has the following sub-properties: Property.
Golden Triples: ['(R870025:Property, Sub-Property, R870026:Property)']
Hops: 6
Topic Entity: Software Architecture and Design
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_triples,is_generated_with,topic_entity_id,topic_entity_value,hops,based_on_template
16021a78-5dae-4162-890c-ce267541892f,"Which sub-properties does the publication 'An Architecture-Driven Adaptation Approach for Big Data Cyber Security Analytics' have, ranked in descending alphabetical order?",The publication has the following sub-properties: Property.,['10.1109/ICSA.2019.00013'],,"['(R870025:Property, Sub-Property, R870026:Property)']",publication_subgraph_strategy,R659055,Software Architecture and Design,6,"Which sub-properties does the publicati

### Superlative

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which sub-property has been used the most in the year [year] with papers of the paper class [paper class]?",
            additional_requirements=[
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Sub-Property"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="paper class",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        use_predicate_as_value=True,
        restriction_value="True",
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-16 12:39:35,299[0m - GET request failed for https://sandbox.orkg.org/api/statements with params {'subject_id': 'L1526445', 'page': 0, 'size': 100}. Attempt 1 of 10. Error: HTTPSConnectionPool(host='sandbox.orkg.org', port=443): Max retries exceeded with url: /api/statements?subject_id=L1526445&page=0&size=100 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7fd032e23e00>: Failed to resolve 'sandbox.orkg.org' ([Errno -3] Temporary failure in name resolution)"))
[32m2025-03-16 12:40:45,345[0m - Found 5 inital clusters
[32m2025-03-16 12:41:45,132[0m - Using 17 clusters for generation
[32m2025-03-16 12:41:45,133[0m - The cluster has a golden triple size of 48 which is higher than the limit of 10.
[32m2025-03-16 12:41:45,133[0m - The cluster has a golden triple size of 70 which is higher than the limit of 10.
[32m2025-03-16 12:41:45,133[0m - The cluster has a golden triple size of 23 which is higher than the limit of 10.
[32m2025-03-16

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which evaluation method has been used the most in the year [year] with papers of the paper class [paper class]?",
            additional_requirements=[
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Evaluation method"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="paper class",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        use_predicate_as_value=True,
        restriction_value="True",
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-16 12:43:56,577[0m - Found 5 inital clusters
[32m2025-03-16 12:45:00,652[0m - Using 17 clusters for generation
[32m2025-03-16 12:45:00,652[0m - The cluster has a golden triple size of 24 which is higher than the limit of 10.
[32m2025-03-16 12:45:00,653[0m - The cluster has a golden triple size of 40 which is higher than the limit of 10.
[32m2025-03-16 12:45:00,653[0m - The cluster has a golden triple size of 47 which is higher than the limit of 10.
[32m2025-03-16 12:45:00,653[0m - The cluster has a golden triple size of 59 which is higher than the limit of 10.
[32m2025-03-16 12:45:00,653[0m - The cluster has a golden triple size of 48 which is higher than the limit of 10.
[32m2025-03-16 12:45:00,653[0m - The cluster has a golden triple size of 41 which is higher than the limit of 10.
[32m2025-03-16 12:45:00,653[0m - The cluster has a golden triple size of 61 which is higher than the limit of 10.
[32m2025-03-16 12:45:00,653[0m - The cluster has a golden t

### Comparative

In [13]:
qa_pairs = []
for _ in range(2):
    qa_strategy = PaperComparisonGenerator(
        graph=graph,
        llm_adapter=gpt_o3_mini,
        comparison_options=PaperComparisonGeneratorOptions(
            first_publication=graph.get_random_publication(),
            second_publication=graph.get_random_publication(),
            topic_entity=research_field
        ),     
        options=GenerationOptions(
            template_text="What evaluation methods does the paper [paper_title_1] use compared to the paper [paper_title_2]?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples of the evaluation methods of the papers.",
                "Evaluation Methods are encoded in the triples of the format (R870531:Evaluation Method Entity, Name, [Evaluation Method Name])",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

Question: What evaluation methods does the paper 'Model-Based Analysis of Microservice Resiliency Patterns' use compared to the paper 'Supporting Architectural Decision Making on Data Management in Microservice Architectures'?
Answer: In the paper 'Model-Based Analysis of Microservice Resiliency Patterns', the evaluation method used is the Technical Experiment, as indicated in the evaluation method triple, while in the paper 'Supporting Architectural Decision Making on Data Management in Microservice Architectures', the evaluation method employed is Grounded Theory.
Golden Triples: ['(R869869:Evaluation Method Entity, Name, L1523172:Technical Experiment)', '(R872990:Evaluation Method Entity, Name, L1529620:Grounded Theory)']
Hops: 7
Topic Entity: Software Architecture and Design
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_triples,is_generated_with,topic_entity_id,topic_entity_value,hops,based_on_template
90c11fd9-663f-4a90-ab34-29b7e67499f0,What evaluation met

In [None]:
qa_pairs = []
for _ in range(2):
    qa_strategy = PaperComparisonGenerator(
        graph=graph,
        llm_adapter=gpt_o3_mini,
        comparison_options=PaperComparisonGeneratorOptions(
            first_publication=graph.get_random_publication(),
            second_publication=graph.get_random_publication(),
            topic_entity=research_field
        ),     
        options=GenerationOptions(
            template_text="What paper class does the paper [paper_title_1] have compared to the paper [paper_title_2]?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples of the paper class that are marked as being 'True'",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

Question: What paper class does the paper 'How Software Architects Focus Their Attention' have compared to the paper 'Architecture-Based Change Impact Analysis in Information Systems and Business Processes'?
Answer: Based on the provided contexts, 'How Software Architects Focus Their Attention' is classified as a validation research paper (Context ID 27), while 'Architecture-Based Change Impact Analysis in Information Systems and Business Processes' is classified as an evaluation research paper (Context ID 55).
Golden Triples: ['(R868500:Paper Class, validation research, L1520228:True)', '(R873434:Paper Class, evaluation research, L1530510:True)']
Hops: 4
Topic Entity: Software Architecture and Design
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_triples,is_generated_with,topic_entity_id,topic_entity_value,hops,based_on_template
75d04db2-69b3-4f2e-b966-cc2e9ce0bf7d,What paper class does the paper 'How Software Architects Focus Their Attention' have compared to t

### Relationship

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What is the proportion of the evaluation methods that have been published by the author [author name] per year?",
            additional_requirements=[
                "The context should only include the triples that contain the research object and the sub-property.",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year"
            ),
            AdditionalInformationRestriction(
                information_predicate="Evaluation method"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-14 16:08:01,243[0m - Found 105 inital clusters
[32m2025-03-14 16:09:00,459[0m - Using 104 clusters for generation
[32m2025-03-14 16:09:13,944[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-14 16:09:19,541[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-14 16:09:42,215[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-14 16:09:42,215[0m - The cluster has a golden triple size of 23 which is higher than the limit of 10.
[32m2025-03-14 16:09:42,216[0m - The cluster has a golden triple size of 18 which is higher than the limit of 10.
[32m2025-03-14 16:09:50,092[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-14 16:09:56,299[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-14 16:09:56,300[0m - The cluster has a golden triple size of 14 which is higher than

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What is the proportion of research objects that have been published by the author [author name] per year?",
            additional_requirements=[
                "The context should only include the triples that contain the research object and the sub-property.",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year"
            ),
            AdditionalInformationRestriction(
                information_predicate="Object"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-14 16:26:45,054[0m - Found 105 inital clusters
[32m2025-03-14 16:27:49,862[0m - Using 104 clusters for generation
[32m2025-03-14 16:27:54,004[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-14 16:28:00,377[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-14 16:28:00,378[0m - The cluster has a golden triple size of 11 which is higher than the limit of 10.
[32m2025-03-14 16:28:08,826[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-14 16:28:14,379[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-14 16:28:14,379[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-14 16:28:26,307[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-14 16:28:30,332[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[

## Use Case 3

### Basic

In [None]:
qa_pairs = []
for _ in range(1):
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_entity_by_id("R868403"), # We manually selected this publication as it is the only one that has robustness as a property
        ),     
        options=GenerationOptions(
            template_text="Which paper includes the evaluation sub-property robustness?",
            additional_requirements=[
                "leave the template as is and do not change it",
                "The context should only include the robustness sub-property triple. Therefore only one triple can be in the context.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)


Question: Which paper includes the evaluation sub-property robustness in the context of Designing Robust Software Systems through Parametric Markov Chain Synthesis?
Answer: The paper titled 'Designing Robust Software Systems through Parametric Markov Chain Synthesis' includes the evaluation sub-property robustness.
Golden Triples: ['(R868417:Property, Name, L1520053:Robustness)']
Hops: 6
Topic Entity: Designing Robust Software Systems through Parametric Markov Chain Synthesis
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_triples,is_generated_with,topic_entity_id,topic_entity_value,hops,based_on_template
45fd4cfa-0734-4805-b703-9108057fe185,Which paper includes the evaluation sub-property robustness in the context of Designing Robust Software Systems through Parametric Markov Chain Synthesis?,The paper titled 'Designing Robust Software Systems through Parametric Markov Chain Synthesis' includes the evaluation sub-property robustness.,['10.1109/ICSA.2017.16'],,"['

In [4]:
qa_pairs = []
for _ in range(1):
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_entity_by_id("R873171"), # We manually selected this publication as it is the only one that has Recovery as a property
        ),     
        options=GenerationOptions(
            template_text="Which paper includes the evaluation sub-property Recovery?",
            additional_requirements=[
                "leave the template as is and do not change it",
                "The context should only include the Recovery sub-property triple. Therefore only one triple can be in the context.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)


Question: Which paper includes the evaluation sub-property Recovery in the context of "Butterfly Space: An Architectural Approach for Investigating Performance Issues"?
Answer: The paper titled 'Butterfly Space: An Architectural Approach for Investigating Performance Issues' includes the evaluation sub-property Recovery.
Golden Triples: ['(R873184:Property, Name, L1530004:Recovery)']
Hops: 6
Topic Entity: Butterfly Space: An Architectural Approach for Investigating Performance Issues
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_triples,is_generated_with,topic_entity_id,topic_entity_value,hops,based_on_template
214b727e-0f2c-47c9-a05b-380312427489,"Which paper includes the evaluation sub-property Recovery in the context of ""Butterfly Space: An Architectural Approach for Investigating Performance Issues""?",The paper titled 'Butterfly Space: An Architectural Approach for Investigating Performance Issues' includes the evaluation sub-property Recovery.,['10.1109/I

In [3]:
qa_pairs = []
for _ in range(1):
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_entity_by_id("R872980"), # We manually selected this publication as it is the only one that has Limit of detection as a property
        ),     
        options=GenerationOptions(
            template_text="Which paper includes the evaluation sub-property Limit of detection?",
            additional_requirements=[
                "leave the template as is and do not change it",
                "The context should only include the Limit of detection sub-property triple. Therefore only one triple can be in the context.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)


Question: Which paper includes the evaluation sub-property Limit of detection in the context of Supporting Architectural Decision Making on Data Management in Microservice Architectures?
Answer: The paper titled 'Supporting Architectural Decision Making on Data Management in Microservice Architectures' includes the evaluation sub-property Limit of detection.
Golden Triples: ['(R872993:Property, Name, L1529623:Limit of detection)']
Hops: 6
Topic Entity: Supporting Architectural Decision Making on Data Management in Microservice Architectures
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_triples,is_generated_with,topic_entity_id,topic_entity_value,hops,based_on_template
6f38c463-5bcd-45c3-8bb7-c37ffef0f7fc,Which paper includes the evaluation sub-property Limit of detection in the context of Supporting Architectural Decision Making on Data Management in Microservice Architectures?,The paper titled 'Supporting Architectural Decision Making on Data Management in Micr

In [2]:
qa_pairs = []
for _ in range(1):
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_entity_by_id("R873116"), # We manually selected this publication as it is the only one that has verification as evaluation method
        ),     
        options=GenerationOptions(
            template_text="Which paper includes the evaluation method Verification?",
            additional_requirements=[
                "leave the template as is and do not change it",
                "The context should only include the Verification evaluation method triple. Therefore only one triple can be in the context.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)


Question: Which paper includes the evaluation method Verification in the context of Semantic Differencing for Message-Driven Component & Connector Architectures?
Answer: The evaluation method Verification is included in the paper titled 'Semantic Differencing for Message-Driven Component & Connector Architectures.'
Golden Triples: ['(R873132:Evaluation Method Entity, Name, L1529895:Verification)']
Hops: 6
Topic Entity: Semantic Differencing for Message-Driven Component & Connector Architectures
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_triples,is_generated_with,topic_entity_id,topic_entity_value,hops,based_on_template
2187663c-dd71-43a1-aa61-14fefbf9bc53,Which paper includes the evaluation method Verification in the context of Semantic Differencing for Message-Driven Component & Connector Architectures?,The evaluation method Verification is included in the paper titled 'Semantic Differencing for Message-Driven Component & Connector Architectures.',['10.1109/

### Aggregation


In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which publications have the evaluation method [method]?",
            additional_requirements=[
                "The context should only include the triples of the evaluation methods of the papers.",
                "Evaluation Methods are encoded in the triples of the format (R870531:Evaluation Method Entity, Name, [Evaluation Method Name])",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[

        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 02:22:57,549[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-13 02:24:11,055[0m - Reached the soft limit of 10.
Question: Which publications have the evaluation method Data Science?
Answer: The publications that have the evaluation method Data Science include: 1. FLRA: A Reference Architecture for Federated Learning Systems, 2. An Expert Recommendation System for Design Decision Making: Who Should be Involved in Making a Design Decision?, 3. The Evolution of Technical Debt in the Apache Ecosystem, 4. Architectural Decay as Predictor of Issue- and Change-Proneness, 5. An Empirical Study of Architectural Decay in Open-Source Software, 6. Architectural Security Weaknesses in Industrial Control Systems (ICS) an Empirical Study Based on Disclosed Software Vulnerabilities, and 7. Architectural Design Decisions for Systems Supporting Model-Based Analysis of Runtime Events: A Qualitative Multi-method Study.
Golden Triples: ['

In [15]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which publications have the evaluation sub-property [sub-property name]?",
            additional_requirements=[
                "The context should only include the triples of the sub-property of the papers.",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P162024",
        restriction_text="Sub-Property",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 08:29:21,759[0m - The cluster has a golden triple size of 11 which is higher than the limit of 10.
[32m2025-03-13 08:29:21,759[0m - The cluster has a golden triple size of 13 which is higher than the limit of 10.
[32m2025-03-13 08:29:57,333[0m - The cluster has a golden triple size of 70 which is higher than the limit of 10.
[32m2025-03-13 08:30:04,843[0m - The cluster has a golden triple size of 45 which is higher than the limit of 10.
[32m2025-03-13 08:30:04,844[0m - The cluster has a golden triple size of 28 which is higher than the limit of 10.
[32m2025-03-13 08:30:19,665[0m - The cluster has a golden triple size of 34 which is higher than the limit of 10.
[32m2025-03-13 08:30:19,666[0m - The cluster has a golden triple size of 30 which is higher than the limit of 10.
Question: Which publications have the evaluation sub-property Usability?
Answer: The publications that have the evaluation sub-property Usability are: 1. Continuous Integration Impediments

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="W [sub property name]?",
            additional_requirements=[
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Uses Input Data",
                information_value_restriction="True",
                information_value_predicate_restriction="available"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P162024",
        restriction_text="Sub-Property",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False,
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-15 14:22:56,861[0m - GET request failed for https://sandbox.orkg.org/api/statements with params {'subject_id': 'R872432', 'page': 0, 'size': 100}. Attempt 1 of 10. Error: HTTPSConnectionPool(host='sandbox.orkg.org', port=443): Max retries exceeded with url: /api/statements?subject_id=R872432&page=0&size=100 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7fdeb2b40320>: Failed to resolve 'sandbox.orkg.org' ([Errno -3] Temporary failure in name resolution)"))
[32m2025-03-15 14:26:34,785[0m - Found 15 inital clusters
[32m2025-03-15 14:29:55,381[0m - Using 12 clusters for generation
[32m2025-03-15 14:30:00,238[0m - The cluster has a golden triple size of 34 which is higher than the limit of 10.
[32m2025-03-15 14:30:14,275[0m - The cluster has a golden triple size of 38 which is higher than the limit of 10.
[32m2025-03-15 14:30:14,275[0m - The cluster has a golden triple size of 40 which is higher than the limit of 10.
[32m2025-03-15 

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which publications evaluate the research object [research object name] with the sub-property [sub-property name] and have input data available?",
            additional_requirements=[
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Sub-Property",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Uses Input Data",
                information_value_predicate_restriction="available",
                information_value_restriction="True",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-15 14:36:47,437[0m - Found 14 inital clusters
[32m2025-03-15 14:41:13,483[0m - Using 57 clusters for generation
[32m2025-03-15 14:41:22,771[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-15 14:41:22,771[0m - The cluster has a golden triple size of 24 which is higher than the limit of 10.
[32m2025-03-15 14:41:30,916[0m - The cluster has a golden triple size of 21 which is higher than the limit of 10.
[32m2025-03-15 14:41:30,916[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-15 14:41:37,895[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-15 14:41:52,266[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-15 14:41:52,266[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-15 14:42:31,332[0m - Skipping QA-Pair as not all golden 

### Counting


In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many publications have the evaluation sub-property [sub-property name]?",
            additional_requirements=[
                "The context should only include the triples of the sub-propertys of the papers.",
                "The answer should count the number of publications that have the sub-property.",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[

        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P162024",
        restriction_text="Sub-Property",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 02:34:12,077[0m - The cluster has a golden triple size of 45 which is higher than the limit of 10.
[32m2025-03-13 02:34:12,077[0m - The cluster has a golden triple size of 30 which is higher than the limit of 10.
[32m2025-03-13 02:34:37,581[0m - The cluster has a golden triple size of 11 which is higher than the limit of 10.
[32m2025-03-13 02:34:40,618[0m - The cluster has a golden triple size of 13 which is higher than the limit of 10.
[32m2025-03-13 02:34:51,967[0m - The cluster has a golden triple size of 34 which is higher than the limit of 10.
[32m2025-03-13 02:34:51,968[0m - The cluster has a golden triple size of 70 which is higher than the limit of 10.
[32m2025-03-13 02:34:51,968[0m - The cluster has a golden triple size of 28 which is higher than the limit of 10.
Question: How many publications have the evaluation sub-property Usability?
Answer: There are 9 publications that have the evaluation sub-property Usability.
Golden Triples: ['(R871688:Pro

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many publications have the evaluation method [method]?",
            additional_requirements=[
                "The context should only include the triples of the evaluation methods of the papers.",
                "The answer should count the number of publications that have the evaluation method",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[

        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 20:33:45,026[0m - GET request failed for https://sandbox.orkg.org/api/statements with params {'subject_id': 'R659055', 'page': 0, 'size': 100}. Attempt 1 of 10. Error: HTTPSConnectionPool(host='sandbox.orkg.org', port=443): Read timed out. (read timeout=15)
[32m2025-03-13 20:41:14,528[0m - Reached the soft limit of 10.
Question: How many publications have the evaluation method Field Experiment?
Answer: There are 3 publications that have the evaluation method Field Experiment. These publications are from the papers titled 'ACE: Easy Deployment of Field Optimization Experiments', 'An Architecture for Decentralized, Collaborative, and Autonomous Robots', and 'Decision Models for Microservices: Design Areas, Stakeholders, Use Cases, and Requirements'.
Golden Triples: ['(R869496:Evaluation Method Entity, Name, L1522369:Field Experiment)', '(R871533:Evaluation Method Entity, Name, L1526632:Field Experiment)', '(R869402:Evaluation Method Entity, Name, L1522173:Field Experim

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many papers, that discuss the [threat to validity] as a threat to validity, apply [evaluation method] as a evaluation method?",
            additional_requirements=[
                "The context needs to include all triples given to you.",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
                split_clusters=True
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P123037",
        restriction_text="Threats To Validity",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        use_predicate_as_value=True,
        restriction_value="True"
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-15 14:45:00,466[0m - Found 5 inital clusters
[32m2025-03-15 14:46:14,295[0m - Using 54 clusters for generation
[32m2025-03-15 14:46:14,295[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-15 14:46:14,296[0m - The cluster has a golden triple size of 16 which is higher than the limit of 10.
[32m2025-03-15 14:46:14,296[0m - The cluster has a golden triple size of 20 which is higher than the limit of 10.
[32m2025-03-15 14:46:28,767[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-15 14:46:35,413[0m - The cluster has a golden triple size of 16 which is higher than the limit of 10.
[32m2025-03-15 14:46:35,413[0m - The cluster has a golden triple size of 16 which is higher than the limit of 10.
[32m2025-03-15 14:46:47,174[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-15 14:46:47,175[0m - The cluster has a golden triple size of 28 whi

### Ranking

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which publications have the evaluation method [evaluation method name] ranked by the publication year?",
            additional_requirements=[
                "The context should only include the triples of the evaluation sub-properties of the papers.",
                "The answer should be a list of publication titles ranked by publication year",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 02:47:20,053[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-13 02:47:20,053[0m - The cluster has a golden triple size of 100 which is higher than the limit of 10.
[32m2025-03-13 02:47:20,053[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-13 02:47:20,054[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-13 02:47:20,054[0m - The cluster has a golden triple size of 20 which is higher than the limit of 10.
[32m2025-03-13 02:47:20,054[0m - The cluster has a golden triple size of 22 which is higher than the limit of 10.
[32m2025-03-13 02:47:26,753[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-13 02:47:26,753[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-13 02:47:26,753[0m - The cluster has a golden triple size

In [19]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which publications have the evaluation sub-property [evaluation sub-property name] ranked by the publication year?",
            additional_requirements=[
                "The context should only include the triples of the evaluation sub-properties of the papers.",
                "The answer should be a list of publication titles ranked by publication year",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P162024",
        restriction_text="Sub-Property",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 09:35:21,944[0m - The cluster has a golden triple size of 18 which is higher than the limit of 10.
[32m2025-03-13 09:35:21,945[0m - The cluster has a golden triple size of 60 which is higher than the limit of 10.
[32m2025-03-13 09:35:31,625[0m - The cluster has a golden triple size of 56 which is higher than the limit of 10.
[32m2025-03-13 09:35:31,626[0m - The cluster has a golden triple size of 90 which is higher than the limit of 10.
[32m2025-03-13 09:35:31,626[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-13 09:35:35,672[0m - The cluster has a golden triple size of 140 which is higher than the limit of 10.
[32m2025-03-13 09:35:39,855[0m - The cluster has a golden triple size of 68 which is higher than the limit of 10.
[32m2025-03-13 09:35:39,855[0m - The cluster has a golden triple size of 26 which is higher than the limit of 10.
[32m2025-03-13 09:35:48,308[0m - The cluster has a golden triple size

### Relationship

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_o3_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What is the proportion of papers that have the research object [research object name] per year?",
            additional_requirements=[
                "You must include all the triples given to you in the context",
                "First list the amount of times the entity was used in the year",
                "Then give a final statement about the proportion",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year"
            ),
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-14 18:56:23,768[0m - Found 14 inital clusters
[32m2025-03-14 18:57:12,191[0m - Using 13 clusters for generation
[32m2025-03-14 18:57:30,028[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-14 18:57:30,028[0m - The cluster has a golden triple size of 24 which is higher than the limit of 10.
[32m2025-03-14 18:57:30,028[0m - The cluster has a golden triple size of 22 which is higher than the limit of 10.
[32m2025-03-14 18:57:30,029[0m - The cluster has a golden triple size of 32 which is higher than the limit of 10.
[32m2025-03-14 18:57:30,029[0m - The cluster has a golden triple size of 34 which is higher than the limit of 10.
[32m2025-03-14 18:57:30,029[0m - The cluster has a golden triple size of 18 which is higher than the limit of 10.
[32m2025-03-14 18:57:30,029[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-14 18:57:30,029[0m - The cluster has a golden 

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What is the proportion of papers that have the evaluation method [evaluation method name] per year?",
            additional_requirements=[
                "The context should include all triples given to you",
                "First list the amount of times the entity was used in the year",
                "Then give a final statement about the proportion",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year"
            ),
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-14 18:32:08,313[0m - Found 14 inital clusters
[32m2025-03-14 18:33:12,223[0m - Using 13 clusters for generation
[32m2025-03-14 18:33:12,224[0m - The cluster has a golden triple size of 100 which is higher than the limit of 10.
[32m2025-03-14 18:33:12,224[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-14 18:33:12,224[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-14 18:33:12,225[0m - The cluster has a golden triple size of 96 which is higher than the limit of 10.
[32m2025-03-14 18:33:12,225[0m - The cluster has a golden triple size of 28 which is higher than the limit of 10.
[32m2025-03-14 18:33:17,204[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-14 18:33:17,204[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-14 18:33:17,205[0m - The cluster has a golden

### Comparative

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many papers with the evaluation method [evaluation method name] have their input data marked as [input type] compared to those with the input data marked as [input type]?",
            additional_requirements=[
                "The context should include all triples given to you.",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Uses Input Data",
                information_value_restriction="True",
                information_value_predicate_restriction=["available", "None"]
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-15 21:24:28,605[0m - Found 14 inital clusters
[32m2025-03-15 21:26:32,449[0m - Using 13 clusters for generation
[32m2025-03-15 21:26:49,874[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-15 21:26:49,874[0m - The cluster has a golden triple size of 54 which is higher than the limit of 10.
[32m2025-03-15 21:26:49,874[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-15 21:26:49,874[0m - The cluster has a golden triple size of 16 which is higher than the limit of 10.
[32m2025-03-15 21:26:58,941[0m - Skipping QA-Pair as the amount of golden triples is not equal to the amount of generated pairs.
[32m2025-03-15 21:26:58,941[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-15 21:27:14,702[0m - The cluster has a golden triple size of 58 which is higher than the limit of 10.
Question: How many papers with the evaluation

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_o3_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many papers have used the sub-property [sub-property name] compared to those that used the sub-property [sub-property name], with the research object [research object name]?",
            additional_requirements=[
                "The context should include all triples given to you.",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Sub-Property",
                information_value_restriction=["Portability", "Usability"]
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-16 14:16:04,362[0m - Found 14 inital clusters
[32m2025-03-16 14:18:06,835[0m - Using 9 clusters for generation
[32m2025-03-16 14:18:06,836[0m - The cluster has a golden triple size of 4 which is lower than the minimum of 6.
[32m2025-03-16 14:18:17,226[0m - The cluster has a golden triple size of 4 which is lower than the minimum of 6.
Question: How many papers have used the sub-property Usability compared to those that used the sub-property Portability, with the research object Reference Architecture?
Answer: In the provided contexts, the sub-property Usability was used in 1 paper (from 'Towards a Reference Architecture for Cloud-Based Plant Genotyping and Phenotyping Analysis Frameworks'), while the sub-property Portability was used in 2 papers (one from 'Towards a Reference Architecture for Cloud-Based Plant Genotyping and Phenotyping Analysis Frameworks' and one from 'Assessing Adaptability of Software Architectures for Cyber Physical Production Systems').
Golden

### Negation

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which papers use the evaluation method [evaluation method name], but do not have any input data available?",
            additional_requirements=[
                "You need to include all context ids in your output. For example, if there are 4 contexts your output has to be [0, 1, 2, 3]",
                "Evaluation methods are encoded as: (Evaluation Method Entity, Name, [Method Name])"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Uses Input Data",
                information_value_restriction="True",
                information_value_predicate_restriction="None",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-16 12:56:16,253[0m - Found 14 inital clusters
[32m2025-03-16 12:58:34,356[0m - Using 11 clusters for generation
[32m2025-03-16 12:58:46,541[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-16 12:58:55,253[0m - Skipping QA-Pair as the amount of golden triples is not equal to the amount of generated pairs.
[32m2025-03-16 12:59:01,802[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-16 12:59:18,792[0m - The cluster has a golden triple size of 24 which is higher than the limit of 10.
Question: Which papers use the evaluation method 'Evaluation method' but do not have any input data available?
Answer: The papers that use the evaluation method 'Evaluation method' but do not have any input data available are: 1) 'Guidelines for Architecting Android Apps: A Mixed-Method Empirical Study' (Context ID 0) and 2) 'Quantum Computing Platforms: Assessing the Impact on Quality Attr

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which papers use the research object [research object name], but do not have any input data available?",
            additional_requirements=[
                "The context should include all triples given to you.",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Uses Input Data",
                information_value_restriction="True",
                information_value_predicate_restriction="None",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-15 15:04:59,664[0m - Found 14 inital clusters
[32m2025-03-15 15:06:58,386[0m - Using 12 clusters for generation
[32m2025-03-15 15:07:04,328[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-15 15:07:11,127[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-15 15:07:19,277[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-15 15:07:26,934[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-15 15:07:33,117[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-15 15:07:38,560[0m - The cluster has a golden triple size of 24 which is higher than the limit of 10.
Question: Which papers use the research object EASIER, but do not have any input data available?
Answer: The paper 'EASIER: An Evolutionary Approach for Multi-objective Software ArchItecturE Refactoring' uses the res

### Superlative

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_o3_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which is the most used paper class for publications with the sub-property [sub property name]?",
            additional_requirements=[
                "The context should include all triples given to you",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="paper class",
                information_value_restriction="True",           
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P162024",
        restriction_text="Sub-Property",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-16 00:02:17,499[0m - Found 15 inital clusters
[32m2025-03-16 00:05:58,290[0m - Using 14 clusters for generation
[32m2025-03-16 00:05:58,290[0m - The cluster has a golden triple size of 74 which is higher than the limit of 10.
[32m2025-03-16 00:05:58,291[0m - The cluster has a golden triple size of 167 which is higher than the limit of 10.
[32m2025-03-16 00:05:58,291[0m - The cluster has a golden triple size of 116 which is higher than the limit of 10.
[32m2025-03-16 00:05:58,291[0m - The cluster has a golden triple size of 4 which is lower than the minimum of 6.
[32m2025-03-16 00:06:09,391[0m - The cluster has a golden triple size of 25 which is higher than the limit of 10.
[32m2025-03-16 00:06:09,392[0m - The cluster has a golden triple size of 16 which is higher than the limit of 10.
[32m2025-03-16 00:06:09,392[0m - The cluster has a golden triple size of 30 which is higher than the limit of 10.
[32m2025-03-16 00:06:25,718[0m - The cluster has a golden

In [7]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_o3_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which is the most used paper class for publications with the evaluation method [method name] and the research object [resarch object name]?",
            additional_requirements=[
                "The context should include all triples given to you",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Object",        
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="paper class",
                information_value_restriction="True",           
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-16 13:01:51,225[0m - Found 14 inital clusters
[32m2025-03-16 13:04:54,916[0m - Using 77 clusters for generation
[32m2025-03-16 13:04:54,916[0m - The cluster has a golden triple size of 3 which is lower than the minimum of 6.
[32m2025-03-16 13:05:04,656[0m - The cluster has a golden triple size of 15 which is higher than the limit of 10.
[32m2025-03-16 13:05:16,279[0m - The cluster has a golden triple size of 16 which is higher than the limit of 10.
[32m2025-03-16 13:05:16,279[0m - The cluster has a golden triple size of 3 which is lower than the minimum of 6.
[32m2025-03-16 13:05:16,279[0m - The cluster has a golden triple size of 26 which is higher than the limit of 10.
[32m2025-03-16 13:05:16,279[0m - The cluster has a golden triple size of 3 which is lower than the minimum of 6.
[32m2025-03-16 13:05:16,279[0m - The cluster has a golden triple size of 3 which is lower than the minimum of 6.
[32m2025-03-16 13:05:16,279[0m - The cluster has a golden trip

## Use Case 4

### Basic

In [None]:
# Here we want that the answer only contains one specific paper which is a special case in our data
# as it is the only one that is from the type Philosophical Papers. Therefore we use the 
# FromTopicEntityGenerator to generate the question
qa_pairs = []
qa_strategy = FromTopicEntityGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    from_topic_entity_options=FromTopicEntityGeneratorOptions(
        topic_entity=graph.get_entity_by_id("R873077"), # The id of the paper
        maximum_subgraph_size=100
    ), 
    options=GenerationOptions(
        template_text="Which research object is used with the paper class [paper class]?",
        additional_requirements=[
            "The context should only include the triple that contains the paper class and research object of the paper",
            "You can assume that it is the only paper with that specific conditions",
            "The context should contain two triples: (R873092:Paper Class, [the class name], True) and (Research Object Entity, Name, [research object name])",
        ],
        validate_contexts=False,
        convert_path_to_text=False,
        classify_questions=False,
    )     
)
qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

Question: Which research object is used with the paper class philosophical paper in "Architectural Design Decisions for Systems Supporting Model-Based Analysis of Runtime Events: A Qualitative Multi-method Study"?
Answer: The research object used with the paper class philosophical paper is Architecture Decision Making.
Golden Triples: ['(R873092:Paper Class, philosophical paper, L1529814:True)', '(R873091:Research Object Entity, Name, L1529809:Architecture Decision Making)']
Hops: 5
Topic Entity: Architectural Design Decisions for Systems Supporting Model-Based Analysis of Runtime Events: A Qualitative Multi-method Study
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_triples,is_generated_with,topic_entity_id,topic_entity_value,hops,based_on_template
24172dfa-095e-4798-803c-4c4ab294b94a,"Which research object is used with the paper class philosophical paper in ""Architectural Design Decisions for Systems Supporting Model-Based Analysis of Runtime Events: A Qualita

### Aggregation 

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which research objects are used in conjunction with the [evaluation method name] evaluation method?",
            additional_requirements=[
                "The context should only include the triples that contain the evaluation methods and research objects of the paper"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=(
            AdditionalInformationRestriction(
                information_predicate="Object",
            ),
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 02:59:06,302[0m - The cluster has a golden triple size of 28 which is higher than the limit of 10.
[32m2025-03-13 02:59:06,302[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-13 02:59:06,303[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-13 02:59:06,303[0m - The cluster has a golden triple size of 23 which is higher than the limit of 10.
[32m2025-03-13 02:59:06,303[0m - The cluster has a golden triple size of 97 which is higher than the limit of 10.
[32m2025-03-13 02:59:06,303[0m - The cluster has a golden triple size of 20 which is higher than the limit of 10.
[32m2025-03-13 02:59:06,303[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-13 02:59:06,303[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-13 02:59:14,444[0m - The cluster has a golden triple size 

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which evaluation methods are used in conjunction with the evaluation sub-property [evaluation sub property name]?",
            additional_requirements=[
                "The context should only include the triples that contain the evaluation sub-properties and research objects of the paper"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=(
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
            ),
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P162024",
        restriction_text="Sub-Property",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 22:20:28,601[0m - The cluster has a golden triple size of 18 which is higher than the limit of 10.
[32m2025-03-13 22:20:28,601[0m - The cluster has a golden triple size of 90 which is higher than the limit of 10.
[32m2025-03-13 22:20:39,784[0m - The cluster has a golden triple size of 21 which is higher than the limit of 10.
[32m2025-03-13 22:20:56,753[0m - The cluster has a golden triple size of 60 which is higher than the limit of 10.
[32m2025-03-13 22:20:56,754[0m - The cluster has a golden triple size of 60 which is higher than the limit of 10.
[32m2025-03-13 22:20:56,754[0m - The cluster has a golden triple size of 70 which is higher than the limit of 10.
[32m2025-03-13 22:21:01,677[0m - The cluster has a golden triple size of 26 which is higher than the limit of 10.
[32m2025-03-13 22:21:01,677[0m - The cluster has a golden triple size of 22 which is higher than the limit of 10.
[32m2025-03-13 22:21:01,678[0m - The cluster has a golden triple size 

### Counting 

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many evaluation methods are used in conjunction with the evaluation sub-property [evaluation sub property name]?",
            additional_requirements=[
                "The context should only include the triples that contain the evaluation methods and evaluation sub-properties of the paper",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=(
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
            ),
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P162024",
        restriction_text="Sub-Property",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 03:13:08,125[0m - The cluster has a golden triple size of 21 which is higher than the limit of 10.
[32m2025-03-13 03:13:08,125[0m - The cluster has a golden triple size of 26 which is higher than the limit of 10.
[32m2025-03-13 03:13:08,125[0m - The cluster has a golden triple size of 140 which is higher than the limit of 10.
[32m2025-03-13 03:13:08,126[0m - The cluster has a golden triple size of 70 which is higher than the limit of 10.
[32m2025-03-13 03:13:08,126[0m - The cluster has a golden triple size of 18 which is higher than the limit of 10.
[32m2025-03-13 03:13:08,126[0m - The cluster has a golden triple size of 22 which is higher than the limit of 10.
[32m2025-03-13 03:13:16,139[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 03:13:20,820[0m - The cluster has a golden triple size of 90 which is higher than the limit of 10.
[32m2025-03-13 03:13:20,820[0m - The cluster has a golden triple size of 60 whi

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_o3_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many evaluation methods are used in conjunction with the [evaluation sub property name] sub-property?",
            additional_requirements=[
                "You need to include all context ids in your output. For example, if there are 4 contexts your output has to be [0, 1, 2, 3]",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=(
            AdditionalInformationRestriction(
                information_predicate="Evaluation method"
            ),
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P162024",
        restriction_text="Sub-Property",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-16 13:54:17,665[0m - Found 15 inital clusters
[32m2025-03-16 13:55:37,927[0m - Using 14 clusters for generation
[32m2025-03-16 13:55:37,927[0m - The cluster has a golden triple size of 60 which is higher than the limit of 10.
[32m2025-03-16 13:55:37,927[0m - The cluster has a golden triple size of 90 which is higher than the limit of 10.
[32m2025-03-16 13:55:52,380[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-16 13:55:59,412[0m - The cluster has a golden triple size of 70 which is higher than the limit of 10.
[32m2025-03-16 13:55:59,412[0m - The cluster has a golden triple size of 26 which is higher than the limit of 10.
[32m2025-03-16 13:56:15,193[0m - The cluster has a golden triple size of 22 which is higher than the limit of 10.
[32m2025-03-16 13:56:15,193[0m - The cluster has a golden triple size of 140 which is higher than the limit of 10.
[32m2025-03-16 13:56:15,193[0m - The cluster has a golden

In [25]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many research objects are used in conjunction with the [evaluation sub property name] sub-property?",
            additional_requirements=[
                "The context should only include the triples that contain the evaluation sub-properties and research objects of the paper"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=(
            AdditionalInformationRestriction(
                information_predicate="Object",
            ),
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P162024",
        restriction_text="Sub-Property",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 13:08:17,443[0m - The cluster has a golden triple size of 18 which is higher than the limit of 10.
[32m2025-03-13 13:08:17,443[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-13 13:08:17,443[0m - The cluster has a golden triple size of 27 which is higher than the limit of 10.
[32m2025-03-13 13:08:24,245[0m - The cluster has a golden triple size of 141 which is higher than the limit of 10.
[32m2025-03-13 13:08:24,245[0m - The cluster has a golden triple size of 91 which is higher than the limit of 10.
[32m2025-03-13 13:08:24,245[0m - The cluster has a golden triple size of 22 which is higher than the limit of 10.
[32m2025-03-13 13:08:38,381[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 13:08:38,381[0m - The cluster has a golden triple size of 20 which is higher than the limit of 10.
[32m2025-03-13 13:08:49,406[0m - Skipping QA-Pair as not all golden triples are

### Ranking

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which evaluation methods are used in conjunction with the evaluation sub-property [evaluation sub property name] ranked in descending alphabetical order?",
            additional_requirements=[
                "The context should only include the triples that contain the evaluation methods and evaluation sub-properties of the paper",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=(
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
            ),
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P162024",
        restriction_text="Sub-Property",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 03:27:33,656[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-13 03:27:33,656[0m - The cluster has a golden triple size of 140 which is higher than the limit of 10.
[32m2025-03-13 03:27:33,656[0m - The cluster has a golden triple size of 90 which is higher than the limit of 10.
[32m2025-03-13 03:27:33,656[0m - The cluster has a golden triple size of 21 which is higher than the limit of 10.
[32m2025-03-13 03:27:46,468[0m - The cluster has a golden triple size of 70 which is higher than the limit of 10.
[32m2025-03-13 03:27:46,468[0m - The cluster has a golden triple size of 26 which is higher than the limit of 10.
[32m2025-03-13 03:27:46,469[0m - The cluster has a golden triple size of 60 which is higher than the limit of 10.
[32m2025-03-13 03:27:57,365[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 03:27:57,365[0m - The cluster has a golden triple size of 18 whi

In [26]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which research objects are used in conjunction with the [evaluation sub property name] sub-property ranked in descending alphabetical order?",
            additional_requirements=[
                "The context should only include the triples that contain the evaluation sub-properties and research objects of the paper"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=(
            AdditionalInformationRestriction(
                information_predicate="Object",
            ),
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P162024",
        restriction_text="Sub-Property",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 13:16:51,613[0m - GET request failed for https://sandbox.orkg.org/api/statements with params {'subject_id': 'R869072', 'page': 0, 'size': 100}. Attempt 1 of 10. Error: HTTPSConnectionPool(host='sandbox.orkg.org', port=443): Read timed out. (read timeout=15)
[32m2025-03-13 13:24:24,674[0m - The cluster has a golden triple size of 27 which is higher than the limit of 10.
[32m2025-03-13 13:24:24,675[0m - The cluster has a golden triple size of 20 which is higher than the limit of 10.
[32m2025-03-13 13:24:24,675[0m - The cluster has a golden triple size of 60 which is higher than the limit of 10.
[32m2025-03-13 13:24:24,675[0m - The cluster has a golden triple size of 91 which is higher than the limit of 10.
[32m2025-03-13 13:24:29,701[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-13 13:24:29,701[0m - The cluster has a golden triple size of 22 which is higher than the limit of 10.
[32m2025-03-13 13:24:29,701

### Negation

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which research objects have occurrences, where the evaluation guideline marked as 'false'?",
            additional_requirements=[
                "The context should only include the triples that contain the evaluation methods and research objects of the paper"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=(
            AdditionalInformationRestriction(
                information_predicate="Has Guideline",
                information_value_restriction="False"
            ),
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 03:37:10,491[0m - The cluster has a golden triple size of 22 which is higher than the limit of 10.
[32m2025-03-13 03:37:10,491[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-13 03:37:10,491[0m - The cluster has a golden triple size of 34 which is higher than the limit of 10.
[32m2025-03-13 03:37:10,491[0m - The cluster has a golden triple size of 22 which is higher than the limit of 10.
[32m2025-03-13 03:37:10,491[0m - The cluster has a golden triple size of 18 which is higher than the limit of 10.
[32m2025-03-13 03:37:10,492[0m - The cluster has a golden triple size of 60 which is higher than the limit of 10.
[32m2025-03-13 03:37:10,492[0m - The cluster has a golden triple size of 13 which is higher than the limit of 10.
[32m2025-03-13 03:37:10,492[0m - The cluster has a golden triple size of 52 which is higher than the limit of 10.
[32m2025-03-13 03:37:15,505[0m - Skipping QA-Pair as not all golden tr

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which research objects that have the evaluation method [evaluation method name] do not use evaluation guidelines?",
            additional_requirements=[
                "Include all context given to you"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=(
            AdditionalInformationRestriction(
                information_predicate="Has Guideline",
                information_value_restriction="False",
            ),
            AdditionalInformationRestriction(
                information_predicate="Object"
            ),
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-14 16:19:53,683[0m - Found 14 inital clusters
[32m2025-03-14 16:20:59,134[0m - Using 13 clusters for generation
[32m2025-03-14 16:21:05,084[0m - The cluster has a golden triple size of 27 which is higher than the limit of 10.
[32m2025-03-14 16:21:05,084[0m - The cluster has a golden triple size of 15 which is higher than the limit of 10.
[32m2025-03-14 16:21:05,085[0m - The cluster has a golden triple size of 139 which is higher than the limit of 10.
[32m2025-03-14 16:21:05,085[0m - The cluster has a golden triple size of 21 which is higher than the limit of 10.
[32m2025-03-14 16:21:05,085[0m - The cluster has a golden triple size of 15 which is higher than the limit of 10.
[32m2025-03-14 16:21:10,900[0m - The cluster has a golden triple size of 21 which is higher than the limit of 10.
[32m2025-03-14 16:21:10,900[0m - The cluster has a golden triple size of 15 which is higher than the limit of 10.
[32m2025-03-14 16:21:10,901[0m - The cluster has a golden

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which evaluation sub-properties are used in the research object [research object name] but do not use evaluation guidelines?",
            additional_requirements=[
                "Include all context given to you"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=(
            AdditionalInformationRestriction(
                information_predicate="Has Guideline",
                information_value_restriction="False",
            ),
            AdditionalInformationRestriction(
                information_predicate="Sub-Property"
            ),
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-14 16:23:05,736[0m - Found 14 inital clusters
[32m2025-03-14 16:24:06,013[0m - Using 13 clusters for generation
[32m2025-03-14 16:24:06,013[0m - The cluster has a golden triple size of 26 which is higher than the limit of 10.
[32m2025-03-14 16:24:06,013[0m - The cluster has a golden triple size of 36 which is higher than the limit of 10.
[32m2025-03-14 16:24:16,749[0m - The cluster has a golden triple size of 95 which is higher than the limit of 10.
[32m2025-03-14 16:24:23,215[0m - The cluster has a golden triple size of 31 which is higher than the limit of 10.
[32m2025-03-14 16:24:23,215[0m - The cluster has a golden triple size of 64 which is higher than the limit of 10.
[32m2025-03-14 16:24:23,215[0m - The cluster has a golden triple size of 34 which is higher than the limit of 10.
[32m2025-03-14 16:24:23,216[0m - The cluster has a golden triple size of 49 which is higher than the limit of 10.
[32m2025-03-14 16:24:23,216[0m - The cluster has a golden 

### Superlative

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which is the most used sub-property that is used with the research object [research object name]?",
            additional_requirements=[
                "The context should only include the triples that contain the evaluation methods and research objects of the paper"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=(
            AdditionalInformationRestriction(
                information_predicate="Sub-Property",
            ),
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P123038",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 03:45:52,692[0m - The cluster has a golden triple size of 54 which is higher than the limit of 10.
[32m2025-03-13 03:45:52,692[0m - The cluster has a golden triple size of 51 which is higher than the limit of 10.
[32m2025-03-13 03:45:52,693[0m - The cluster has a golden triple size of 29 which is higher than the limit of 10.
[32m2025-03-13 03:46:00,797[0m - The cluster has a golden triple size of 74 which is higher than the limit of 10.
[32m2025-03-13 03:46:00,798[0m - The cluster has a golden triple size of 22 which is higher than the limit of 10.
[32m2025-03-13 03:46:00,798[0m - The cluster has a golden triple size of 33 which is higher than the limit of 10.
[32m2025-03-13 03:46:11,012[0m - The cluster has a golden triple size of 47 which is higher than the limit of 10.
[32m2025-03-13 03:46:11,012[0m - The cluster has a golden triple size of 30 which is higher than the limit of 10.
[32m2025-03-13 03:46:11,012[0m - The cluster has a golden triple size 

In [28]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which is the most used evaluation method that is used with the research object [research object name]?",
            additional_requirements=[
                "The context should only include the triples that contain the evaluation methods and research objects of the paper"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=(
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
            ),
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P123038",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 13:40:34,561[0m - GET request failed for https://sandbox.orkg.org/api/statements with params {'subject_id': 'R871160', 'page': 0, 'size': 100}. Attempt 1 of 10. Error: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
[32m2025-03-13 13:40:53,582[0m - GET request failed for https://sandbox.orkg.org/api/statements with params {'subject_id': 'R871160', 'page': 0, 'size': 100}. Attempt 2 of 10. Error: HTTPSConnectionPool(host='sandbox.orkg.org', port=443): Max retries exceeded with url: /api/statements?subject_id=R871160&page=0&size=100 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fde2a59c3e0>, 'Connection to sandbox.orkg.org timed out. (connect timeout=15)'))
[32m2025-03-13 13:41:16,596[0m - GET request failed for https://sandbox.orkg.org/api/statements with params {'subject_id': 'R871160', 'page': 0, 'size': 100}. Attempt 3 of 10. Error: HTTPSConnectionPool(host='sandbox.orkg.org', port=443): Max retries

### Comparative

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How often is the sub-property [sub-property name] used in comparison to the sub-property [sub-property] with the research object [research object name]?",
            additional_requirements=[
                "You need to include all context ids in your output. For example, if there are 4 contexts your output has to be [0, 1, 2, 3]",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Sub-Property",
                information_value_restriction=["Context coverage", "Satisfaction"]
            ),
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        golden_triple_minimum=6
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-16 14:44:11,652[0m - Found 14 inital clusters
[32m2025-03-16 14:46:04,546[0m - Using 12 clusters for generation
[32m2025-03-16 14:46:18,988[0m - The cluster has a golden triple size of 20 which is higher than the limit of 10.
[32m2025-03-16 14:46:32,230[0m - The cluster has a golden triple size of 4 which is lower than the minimum of 6.
[32m2025-03-16 14:46:48,292[0m - The cluster has a golden triple size of 4 which is lower than the minimum of 6.
[32m2025-03-16 14:46:48,292[0m - The cluster has a golden triple size of 4 which is lower than the minimum of 6.
Question: How often is the sub-property Satisfaction used in comparison to the sub-property Context coverage with the research object name Microservice Architecture in Reality?
Answer: The sub-property Satisfaction is used in comparison to the sub-property Context coverage with the research object name Microservice Architecture in Reality as follows: Satisfaction is a property that is frequently assessed in 

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How often is the sub-property [sub-property name] used in comparison to the sub-property [sub-property] with the evaulation method [evaluation method name]?",
            additional_requirements=[
                "You need to include all context ids in your output. For example, if there are 4 contexts your output has to be [0, 1, 2, 3]",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Sub-Property",
                information_value_restriction=["Maintainability", "Usability"]
            ),
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        golden_triple_minimum=6
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-16 15:03:54,377[0m - Found 14 inital clusters
[32m2025-03-16 15:06:33,821[0m - Using 11 clusters for generation
[32m2025-03-16 15:07:18,914[0m - The cluster has a golden triple size of 4 which is lower than the minimum of 6.
[32m2025-03-16 15:07:18,914[0m - The cluster has a golden triple size of 15 which is higher than the limit of 10.
[32m2025-03-16 15:07:18,915[0m - The cluster has a golden triple size of 4 which is lower than the minimum of 6.
Question: How often is the sub-property Usability used in comparison to the sub-property Name with the evaluation method Controlled Experiment?
Answer: The sub-property Usability is used in comparison to the sub-property Name with the evaluation method Controlled Experiment in the context of both REST vs GraphQL and the understandability of semantic constraints for behavioral software architecture compliance. Specifically, the contexts indicate that Usability is a key property evaluated in controlled experiments to asses

### Relationship

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What is the proportion of the evaluation method [evaluation method name] that is applied per year?",
            additional_requirements=[
                "The context should include all triples given to you",
                "First list the amount of times the entity was used in the year",
                "Then give a final statement about the proportion",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-14 18:44:51,343[0m - Found 14 inital clusters
[32m2025-03-14 18:45:52,976[0m - Using 13 clusters for generation
[32m2025-03-14 18:45:52,977[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-14 18:45:52,977[0m - The cluster has a golden triple size of 96 which is higher than the limit of 10.
[32m2025-03-14 18:45:52,977[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-14 18:45:52,977[0m - The cluster has a golden triple size of 22 which is higher than the limit of 10.
[32m2025-03-14 18:45:52,978[0m - The cluster has a golden triple size of 20 which is higher than the limit of 10.
[32m2025-03-14 18:45:52,978[0m - The cluster has a golden triple size of 28 which is higher than the limit of 10.
[32m2025-03-14 18:46:02,206[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-14 18:46:02,207[0m - The cluster has a golden 

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What is the proportion of the research object [research object name] that is applied per year?",
            additional_requirements=[
                "The context should include all triples given to you",
                "Only generate one answer and question pair",
                "First list the amount of times the entity was used in the year",
                "Then give a final statement about the proportion",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-14 17:43:03,032[0m - Found 14 inital clusters
[32m2025-03-14 17:43:54,271[0m - Using 13 clusters for generation
[32m2025-03-14 17:43:54,272[0m - The cluster has a golden triple size of 52 which is higher than the limit of 10.
[32m2025-03-14 17:43:54,272[0m - The cluster has a golden triple size of 18 which is higher than the limit of 10.
[32m2025-03-14 17:44:01,758[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-14 17:44:01,758[0m - The cluster has a golden triple size of 32 which is higher than the limit of 10.
[32m2025-03-14 17:44:01,759[0m - The cluster has a golden triple size of 22 which is higher than the limit of 10.
[32m2025-03-14 17:44:01,759[0m - The cluster has a golden triple size of 62 which is higher than the limit of 10.
[32m2025-03-14 17:44:01,759[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-14 17:44:01,760[0m - The cluster has a golden 

## Use Case 5

### Basic

In [None]:
qa_pairs = []
for _ in range(3):
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=research_field
        ),     
        options=GenerationOptions(
            template_text="What is the evaluation sub-property used on the research object [research object name] in the publication '[paper title]'?",
            additional_requirements=[
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_random_publication()
        ),     
        options=GenerationOptions(
            template_text="What is the evaluation sub-property used on the research object [research object name] in the publication '[paper title]'?",
            additional_requirements=[
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

Question: What is the evaluation sub-property used on the research object Evaluation in the publication 'Enabling Continuous Software Engineering for Embedded Systems Architectures with Virtual Prototypes'?
Answer: The evaluation sub-property used on the research object Evaluation in the publication 'Enabling Continuous Software Engineering for Embedded Systems Architectures with Virtual Prototypes' is Functional Suitability.
Golden Triples: ['(R870608:Research Object, evaluation, R870609:Evaluation)', '(R870613:Property, Name, L1524726:Functional Suitability)']
Hops: 7
Topic Entity: Software Architecture and Design
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_triples,is_generated_with,topic_entity_id,topic_entity_value,hops,based_on_template
997f2c00-4d34-4801-81f6-aa3b52186e51,What is the evaluation sub-property used on the research object Evaluation in the publication 'Enabling Continuous Software Engineering for Embedded Systems Architectures with Virtual P

In [29]:
qa_pairs = []
for _ in range(3):
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=research_field
        ),     
        options=GenerationOptions(
            template_text="What is the evaluation method used on the research object [research object name] in the publication '[paper title]'?",
            additional_requirements=[
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_random_publication()
        ),     
        options=GenerationOptions(
            template_text="What is the evaluation method used on the research object [research object name] in the publication '[paper title]'?",
            additional_requirements=[
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

Question: What is the evaluation method used on the research object Architectural Aspects in the publication 'Synchronous Reconfiguration of Distributed Embedded Applications During Operation'?
Answer: The evaluation method used on the research object Architectural Aspects in the publication 'Synchronous Reconfiguration of Distributed Embedded Applications During Operation' is a Case Study.
Golden Triples: ['(R873270:Evaluation Method Entity, Name, L1530156:Case Study)', '(R873267:Research Object, evaluation, R873268:Evaluation)']
Hops: 7
Topic Entity: Software Architecture and Design
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_triples,is_generated_with,topic_entity_id,topic_entity_value,hops,based_on_template
94ee4b8a-00a2-406c-bf93-5f9c34238fea,What is the evaluation method used on the research object Architectural Aspects in the publication 'Synchronous Reconfiguration of Distributed Embedded Applications During Operation'?,The evaluation method used on the

### Aggregation

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What evaluation methods have been published by the author [author name] with the research object [research object name]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
                "The Evaluation methods are formatted as '('Evaluation Method Entity', Name, [Evaluation Method]) in the contexts given to you",
                "The answer you generate should list the evaluation methods"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Object",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 04:03:41,328[0m - Reached the soft limit of 10.
Question: What evaluation methods have been published by the author Stephan Seifermann with the research object Architecture Analysis Method?
Answer: The evaluation methods published by the author Stephan Seifermann with the research object Architecture Analysis Method include Technical Experiment and Case Study.
Golden Triples: ['(R873654:authors list, has list element, L1530947:Stephan Seifermann)', '(R873671:Research Object Entity, Name, L1530987:Architecture Analysis Method)', '(R873667:Evaluation Method Entity, Name, L1530983:Technical Experiment)', '(R870337:authors list, has list element, L1524164:Stephan Seifermann)', '(R870356:Research Object Entity, Name, L1524191:Architecture Analysis Method)', '(R870352:Evaluation Method Entity, Name, L1524187:Case Study)']
Hops: 7
Topic Entity: Software Architecture and Design
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_triples,is_generated_with,topi

In [30]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What evaluation methods have been published by the author [author name] with the sub-property [sub-property name]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
                "The Evaluation methods are formatted as '('Evaluation Method Entity', Name, [Evaluation Method]) in the contexts given to you",
                "The answer you generate should list the evaluation methods"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Sub-Property",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 13:53:07,698[0m - GET request failed for https://sandbox.orkg.org/api/statements with params {'subject_id': 'R871324', 'page': 0, 'size': 100}. Attempt 1 of 10. Error: HTTPSConnectionPool(host='sandbox.orkg.org', port=443): Read timed out. (read timeout=15)
[32m2025-03-13 13:55:54,956[0m - GET request failed for https://sandbox.orkg.org/api/statements with params {'subject_id': 'R870956', 'page': 0, 'size': 100}. Attempt 1 of 10. Error: HTTPSConnectionPool(host='sandbox.orkg.org', port=443): Read timed out. (read timeout=15)
[32m2025-03-13 14:02:19,884[0m - GET request failed for https://sandbox.orkg.org/api/statements with params {'subject_id': 'R869322', 'page': 0, 'size': 100}. Attempt 1 of 10. Error: HTTPSConnectionPool(host='sandbox.orkg.org', port=443): Max retries exceeded with url: /api/statements?subject_id=R869322&page=0&size=100 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7fde242c0800>: Failed to resolve 'sandbox.orkg.

### Counting

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many evaluation methods have been published by the author [author name] with the sub-property [sub-property name]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
                "The Evaluation methods are formatted as '('Evaluation Method Entity', Name, [Evaluation Method]) in the contexts given to you",
                "The answer you generate should count the number of evaluation methods"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Sub-Property",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 04:19:28,787[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 04:19:41,614[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 04:19:50,983[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 04:19:58,042[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 04:20:11,244[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 04:20:18,183[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 04:20:26,191[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 04:20:34,876[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 04:20:52,293[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 04:21:06,435[0m - Skipping QA

In [31]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many evaluation methods have been published by the author [author name] with the research object [research object name]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
                "The Evaluation methods are formatted as '('Evaluation Method Entity', Name, [Evaluation Method]) in the contexts given to you",
                "The answer you generate should count the number of evaluation methods"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Object",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 14:24:04,994[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 14:24:13,360[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 14:25:17,810[0m - Reached the soft limit of 10.
Question: How many evaluation methods have been published by the author Yutong Zhao with the research object Architecture Analysis Method?
Answer: There are two evaluation methods published by the author Yutong Zhao with the research object Architecture Analysis Method. These methods are identified in the contexts as 'Case Study' from both papers: 'Butterfly Space: An Architectural Approach for Investigating Performance Issues' and 'Constructing a Shared Infrastructure for Software Architecture Analysis and Maintenance'.
Golden Triples: ['(R873172:authors list, has list element, L1529962:Yutong Zhao)', '(R873188:Research Object Entity, Name, L1530008:Architecture Analysis Method)', '(R873181:Evaluation Method Enti

### Ranking

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What evaluation methods have been published by the author [author name] with the research object [research object name] ranked in descending alphabetical order?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
                "The Evaluation methods are formatted as '('Evaluation Method Entity', Name, [Evaluation Method]) in the contexts given to you",
                "The answer you generate should count the number of evaluation methods",
                "The answer should be a list of evaluation methods in descending alphabetical order"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Object",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 04:35:50,161[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 04:36:31,921[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 04:36:49,869[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 04:36:55,679[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 04:37:14,651[0m - Reached the soft limit of 10.
Question: What evaluation methods have been published by the author Xiwei Xu with the research object Architecture Decision Making ranked in descending alphabetical order?
Answer: The evaluation methods published by Xiwei Xu with the research object Architecture Decision Making are: Interview, Argumentation.
Golden Triples: ['(R869313:authors list, has list element, L1521973:Xiwei Xu)', '(R869327:Research Object Entity, Name, L1522007:Architecture Decision Making)', '(R869322:Evaluation Method Entity, Name

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What evaluation sub-properties have been published by the author [author name] with the evaluation method [evaluation method name] ranked in descending alphabetical order?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
                "The Evaluation methods are formatted as '('Evaluation Method Entity', Name, [Evaluation Method]) in the contexts given to you",
                "The answer you generate should list the evaluation methods",
                "The answer should be ranked in descending alphabetical order"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Sub-Property",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False,
        golden_triple_minimum=6
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 23:02:29,645[0m - GET request failed for https://sandbox.orkg.org/api/statements with params {'subject_id': 'R659055', 'page': 0, 'size': 100}. Attempt 1 of 10. Error: HTTPSConnectionPool(host='sandbox.orkg.org', port=443): Max retries exceeded with url: /api/statements?subject_id=R659055&page=0&size=100 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f3f65892c60>, 'Connection to sandbox.orkg.org timed out. (connect timeout=15)'))
[32m2025-03-13 23:04:28,634[0m - GET request failed for https://sandbox.orkg.org/api/statements with params {'subject_id': 'R874345', 'page': 0, 'size': 100}. Attempt 1 of 10. Error: HTTPSConnectionPool(host='sandbox.orkg.org', port=443): Max retries exceeded with url: /api/statements?subject_id=R874345&page=0&size=100 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f3f65d06810>, 'Connection to sandbox.orkg.org timed out. (connect timeout=15)'))
[32m2025-03-13 23:04:51,981[0m

### Comparative

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_o3_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many research objects with the name [research object name] have been published in 2018 in comparison to 2020?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
                "Research object are formatted as '('Research Object Entity', Name, [Research Object Name]) in the contexts given to you",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                information_value_restriction=["2018", "2020"]
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        golden_triple_minimum=6
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-15 17:03:19,976[0m - Found 14 inital clusters
[32m2025-03-15 17:04:16,831[0m - Using 13 clusters for generation
[32m2025-03-15 17:04:28,390[0m - The cluster has a golden triple size of 20 which is higher than the limit of 10.
[32m2025-03-15 17:04:50,397[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-15 17:05:00,898[0m - The cluster has a golden triple size of 22 which is higher than the limit of 10.
[32m2025-03-15 17:05:00,898[0m - The cluster has a golden triple size of 16 which is higher than the limit of 10.
[32m2025-03-15 17:05:00,898[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-15 17:05:10,695[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
Question: How many research objects with the name Architecture Description have been published in 2018 compared to those published in 2020?
Answer: Based on the available conte

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many evaluation methods with the name [evaluation method name] have been published in 2020 in comparison to 2021?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                information_value_restriction=["2020", "2021"],
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        golden_triple_minimum=6
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-15 17:54:58,344[0m - Found 14 inital clusters
[32m2025-03-15 17:56:13,821[0m - Using 12 clusters for generation
[32m2025-03-15 17:56:13,822[0m - The cluster has a golden triple size of 4 which is lower than the minimum of 6.
[32m2025-03-15 17:56:25,210[0m - Skipping QA-Pair as the amount of golden triples is not equal to the amount of generated pairs.
[32m2025-03-15 17:56:25,211[0m - The cluster has a golden triple size of 4 which is lower than the minimum of 6.
[32m2025-03-15 17:56:25,211[0m - The cluster has a golden triple size of 4 which is lower than the minimum of 6.
[32m2025-03-15 17:56:32,441[0m - The cluster has a golden triple size of 4 which is lower than the minimum of 6.
[32m2025-03-15 17:56:38,353[0m - Skipping QA-Pair as the amount of golden triples is not equal to the amount of generated pairs.
[32m2025-03-15 17:56:38,354[0m - The cluster has a golden triple size of 42 which is higher than the limit of 10.
[32m2025-03-15 17:56:38,354[0m -

### Negation

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What evaluation methods that have been published by the author [author name] have no evaluation guidelines?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
                "The Evaluation methods are formatted as '('Evaluation Method Entity', Name, [Evaluation Method]) in the contexts given to you",
                "The answer you generate should list the evaluation methods",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Has Guideline",
                information_value_restriction="False",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 14:53:25,682[0m - GET request failed for https://sandbox.orkg.org/api/statements with params {'subject_id': 'R869235', 'page': 0, 'size': 100}. Attempt 1 of 10. Error: HTTPSConnectionPool(host='sandbox.orkg.org', port=443): Read timed out. (read timeout=15)
[32m2025-03-13 14:55:18,360[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 14:55:23,628[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-13 14:55:44,411[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 14:56:39,891[0m - Reached the soft limit of 10.
Question: What evaluation methods that have been published by the author Manoj Bhat have no evaluation guidelines?
Answer: The evaluation methods published by the author Manoj Bhat that have no evaluation guidelines include Data Science and Case Study.
Golden Triples: ['(R873899:authors list, has list element, L1531467:Manoj Bhat)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What research objects, that have been published by the author [author name] have no evaluation method?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
                "The answer you generate should list the research objects that have no evaluation method",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
                information_value_restriction="False",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Object",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

Question: What research objects, that have been published by the author Uwe Zdun, have no evaluation method?
Answer: The research objects published by Uwe Zdun that have no evaluation method are: Architecture Optimization Method and Architectural Aspects.
Golden Triples: ['(R869265:authors list, has list element, L1521866:Uwe Zdun)', '(R869280:Evaluation, Evaluation method, L1521903:False)', '(R869282:Research Object Entity, Name, L1521906:Architecture Optimization Method)', '(R871897:authors list, has list element, L1527351:Uwe Zdun)', '(R871912:Evaluation, Evaluation method, L1527385:False)', '(R871914:Research Object Entity, Name, L1527388:Architectural Aspects)']
Hops: 6
Topic Entity: Software Architecture and Design
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_triples,is_generated_with,topic_entity_id,topic_entity_value,hops,based_on_template
2025048a-6b67-4553-bb99-8330bad2bd56,"What research objects, that have been published by the author Uwe Zdun, have 

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What research objects, that have been published by the author [author name] have their input data set to none?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
                "The answer you generate should list the research objects that have no evaluation method",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Uses Input Data",
                information_value_restriction="True",
                information_value_predicate_restriction="None"
            ),
            AdditionalInformationRestriction(
                information_predicate="Object",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-15 17:59:09,746[0m - Found 105 inital clusters
[32m2025-03-15 18:01:02,861[0m - Using 57 clusters for generation
[32m2025-03-15 18:01:58,279[0m - Reached the soft limit of 10.
Question: What research objects, that have been published by the author Patricia Lago, have their input data set to none?
Answer: The research objects published by Patricia Lago that have their input data set to none are 'Technical Debt' and 'Architecture Design Method'.
Golden Triples: ['(R872608:authors list, has list element, L1528843:Patricia Lago)', '(R875021:Input Data, None, L1533885:True)', '(R875018:Research Object Entity, Name, L1533875:Technical Debt)', '(R874175:authors list, has list element, L1532034:Patricia Lago)', '(R874187:Input Data, None, L1532074:True)', '(R874184:Research Object Entity, Name, L1532064:Architecture Design Method)']
Hops: 6
Topic Entity: Software Architecture and Design
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_triples,is_generated

### Superlative

In [None]:
qa_pairs = ClusterBasedQuestionGenerator( #RERUN
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which is the sub-property that appears the most for the research object [research object name] published by the author [author name]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Object",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Sub-Property"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        golden_triple_minimum=6
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 18:14:16,497[0m - GET request failed for https://sandbox.orkg.org/api/statements with params {'object_id': 'R869313', 'page': 0, 'size': 100}. Attempt 1 of 10. Error: HTTPSConnectionPool(host='sandbox.orkg.org', port=443): Read timed out. (read timeout=15)
[32m2025-03-13 18:19:45,215[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 18:19:53,836[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 18:19:59,728[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 18:19:59,728[0m - The cluster has a golden triple size of 11 which is higher than the limit of 10.
[32m2025-03-13 18:20:07,765[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-13 18:20:14,204[0m - The cluster has a golden triple size of 11 which is higher than the limit of 10.
[32m2025-03-13 18:20:14,204[0m - The cluster has a golden triple siz

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What evaluation method has been used the most in the paper class [paper class name] with the research object [research object name]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
                "The Evaluation methods are formatted as '('Evaluation Method Entity', Name, [Evaluation Method]) in the contexts given to you",
                "The answer you generate should count the number of evaluation methods",
                "The answer should be a list of evaluation methods in descending alphabetical order"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Object",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="paper class",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=20,
        golden_triple_limit=10,
        enable_caching=True,
        golden_triple_minimum=6,
        use_predicate_as_value=True,
        restriction_value="True"
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-14 01:31:18,542[0m - The cluster has a golden triple size of 26 which is higher than the limit of 10.
[32m2025-03-14 01:31:18,542[0m - The cluster has a golden triple size of 15 which is higher than the limit of 10.
[32m2025-03-14 01:31:18,542[0m - The cluster has a golden triple size of 41 which is higher than the limit of 10.
[32m2025-03-14 01:31:18,543[0m - The cluster has a golden triple size of 15 which is higher than the limit of 10.
[32m2025-03-14 01:31:26,508[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-14 01:31:26,509[0m - The cluster has a golden triple size of 18 which is higher than the limit of 10.
[32m2025-03-14 01:31:26,509[0m - The cluster has a golden triple size of 24 which is higher than the limit of 10.
[32m2025-03-14 01:31:26,509[0m - The cluster has a golden triple size of 49 which is higher than the limit of 10.
[32m2025-03-14 01:31:26,509[0m - The cluster has a golden triple size of 43 whic

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which is the most used evaluation method that is used with the research object [research object name] in the year [year]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
                "The Evaluation methods are formatted as '('Evaluation Method Entity', Name, [Evaluation Method]) in the contexts given to you",
                "The answer you generate should be a list of the evaluation methods"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=20,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-14 00:54:35,204[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-14 00:54:39,633[0m - The cluster has a golden triple size of 16 which is higher than the limit of 10.
[32m2025-03-14 00:54:53,905[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-14 00:55:10,428[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-14 00:55:10,428[0m - The cluster has a golden triple size of 13 which is higher than the limit of 10.
[32m2025-03-14 00:55:21,906[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-14 00:55:29,386[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-14 00:55:29,386[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-14 00:55:29,386[0m - The cluster has a golden triple size of 17 which is higher than the limit of 10.
[32m2

### Relationship

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What is the proportion of the research objects that are used in conjunction with the evaluation method [evaluation method name] between 2019 and 2021?",
            additional_requirements=[
                "The context should include all triples given to you",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                information_value_restriction=["2019", "2021"],
            ),
            AdditionalInformationRestriction(
                information_predicate="Object"
            ),
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-14 17:27:25,878[0m - Found 14 inital clusters
[32m2025-03-14 17:28:53,037[0m - Using 12 clusters for generation
[32m2025-03-14 17:29:01,829[0m - The cluster has a golden triple size of 26 which is higher than the limit of 10.
[32m2025-03-14 17:29:06,847[0m - The cluster has a golden triple size of 3 which is lower than the minimum of 6.
[32m2025-03-14 17:29:06,847[0m - The cluster has a golden triple size of 3 which is lower than the minimum of 6.
[32m2025-03-14 17:29:06,848[0m - The cluster has a golden triple size of 19 which is higher than the limit of 10.
[32m2025-03-14 17:29:06,848[0m - The cluster has a golden triple size of 17 which is higher than the limit of 10.
[32m2025-03-14 17:29:06,848[0m - The cluster has a golden triple size of 16 which is higher than the limit of 10.
[32m2025-03-14 17:29:06,848[0m - The cluster has a golden triple size of 61 which is higher than the limit of 10.
[32m2025-03-14 17:29:34,417[0m - Skipping QA-Pair as not all

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What is the proportion of the evaluation methods that are used in conjunction with the sub-property [sub property name] between 2017 and 2019?",
            additional_requirements=[
                "The context should include all triples given to you",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                information_value_restriction=["2017", "2018", "2019"],
            ),
            AdditionalInformationRestriction(
                information_predicate="Evaluation method"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P162024",
        restriction_text="Sub-Property",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-14 18:02:26,481[0m - Found 15 inital clusters
[32m2025-03-14 18:04:41,614[0m - Using 14 clusters for generation
[32m2025-03-14 18:04:41,615[0m - The cluster has a golden triple size of 18 which is higher than the limit of 10.
[32m2025-03-14 18:04:41,615[0m - The cluster has a golden triple size of 11 which is higher than the limit of 10.
[32m2025-03-14 18:04:41,615[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-14 18:04:41,615[0m - The cluster has a golden triple size of 18 which is higher than the limit of 10.
[32m2025-03-14 18:04:41,616[0m - The cluster has a golden triple size of 87 which is higher than the limit of 10.
[32m2025-03-14 18:04:48,019[0m - The cluster has a golden triple size of 17 which is higher than the limit of 10.
[32m2025-03-14 18:04:48,020[0m - The cluster has a golden triple size of 63 which is higher than the limit of 10.
[32m2025-03-14 18:04:48,020[0m - The cluster has a golden 

## Use Case 6

### Basic

In [None]:
# Here we want that the answer only contains one specific paper which is a special case in our data
# as it is the only one that is from the type Philosophical Papers. Therefore we use the 
# FromTopicEntityGenerator to generate the question
qa_pairs = []
qa_strategy = FromTopicEntityGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    from_topic_entity_options=FromTopicEntityGeneratorOptions(
        topic_entity=graph.get_entity_by_id("R873077") # The id of the paper
    ),     
    options=GenerationOptions(
        template_text="Which paper that includes the research object [research object name] has the paper class [paper class name]?",
        additional_requirements=[
            "The context should only include the triple that contains the paper class and research object of the paper",
            "You can assume that it is the only paper with that specific conditions"
        ],
        validate_contexts=False,
        convert_path_to_text=False,
        classify_questions=False,
    )     
)
qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

Question: Which paper that includes the research object Architecture Decision Making has the paper class philosophical paper and is titled "Architectural Design Decisions for Systems Supporting Model-Based Analysis of Runtime Events: A Qualitative Multi-method Study"?
Answer: The paper titled 'Architectural Design Decisions for Systems Supporting Model-Based Analysis of Runtime Events: A Qualitative Multi-method Study' includes the research object Architecture Decision Making and has the paper class philosophical paper.
Golden Triples: ['(R873092:Paper Class, philosophical paper, L1529814:True)']
Hops: 3
Topic Entity: Architectural Design Decisions for Systems Supporting Model-Based Analysis of Runtime Events: A Qualitative Multi-method Study
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_triples,is_generated_with,topic_entity_id,topic_entity_value,hops,based_on_template
14f1dbb3-809b-43dc-8f62-e7d6a1173f68,"Which paper that includes the research object Architect

In [35]:

qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which paper that includes the evaluation method [evaluation method name] is authored by [author name]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Evaluation method"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=2,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 15:11:35,746[0m - GET request failed for https://sandbox.orkg.org/api/statements with params {'subject_id': 'R872362', 'page': 0, 'size': 100}. Attempt 1 of 10. Error: HTTPSConnectionPool(host='sandbox.orkg.org', port=443): Read timed out. (read timeout=15)
[32m2025-03-13 15:19:26,528[0m - The cluster has a golden triple size of 10 which is higher than the limit of 2.
[32m2025-03-13 15:19:26,528[0m - The cluster has a golden triple size of 6 which is higher than the limit of 2.
[32m2025-03-13 15:19:26,528[0m - The cluster has a golden triple size of 11 which is higher than the limit of 2.
[32m2025-03-13 15:19:26,529[0m - The cluster has a golden triple size of 7 which is higher than the limit of 2.
[32m2025-03-13 15:19:26,529[0m - The cluster has a golden triple size of 4 which is higher than the limit of 2.
[32m2025-03-13 15:19:26,529[0m - The cluster has a golden triple size of 7 which is higher than the limit of 2.
[32m2025-03-13 15:19:26,529[0m - The 

### Aggregation

In [None]:
qa_pairs = ClusterBasedQuestionGenerator( #RERUN
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which papers used the evaluation method [evaluation method name] in the year [year]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                split_clusters=True
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=4,
        enable_caching=True
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 20:33:09,760[0m - The cluster has a golden triple size of 24 which is higher than the limit of 10.
[32m2025-03-13 20:33:27,044[0m - The cluster has a golden triple size of 16 which is higher than the limit of 10.
[32m2025-03-13 20:33:27,045[0m - The cluster has a golden triple size of 22 which is higher than the limit of 10.
[32m2025-03-13 20:33:27,045[0m - The cluster has a golden triple size of 26 which is higher than the limit of 10.
[32m2025-03-13 20:33:58,126[0m - The cluster has a golden triple size of 26 which is higher than the limit of 10.
[32m2025-03-13 20:34:03,793[0m - Reached the soft limit of 10.
Question: Which papers used the evaluation method Interview in the year 2021?
Answer: The papers that used the evaluation method Interview in the year 2021 are: 'State of the Practice in Application Programming Interfaces (APIs): A Case Study', 'A Decision Model for Choosing Patterns in Blockchain-Based Applications', 'Aligning Architecture with Busines

In [36]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which papers used [evaluation method] as a evaluation method in the year [year]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

KeyboardInterrupt: 

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which papers used [research object] as a research object in the year [year]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-14 01:04:03,813[0m - The cluster has a golden triple size of 18 which is higher than the limit of 10.
[32m2025-03-14 01:04:12,046[0m - The cluster has a golden triple size of 36 which is higher than the limit of 10.
[32m2025-03-14 01:04:12,046[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-14 01:04:12,046[0m - The cluster has a golden triple size of 62 which is higher than the limit of 10.
[32m2025-03-14 01:04:12,047[0m - The cluster has a golden triple size of 20 which is higher than the limit of 10.
[32m2025-03-14 01:04:12,047[0m - The cluster has a golden triple size of 34 which is higher than the limit of 10.
[32m2025-03-14 01:04:12,047[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-14 01:04:20,216[0m - The cluster has a golden triple size of 32 which is higher than the limit of 10.
[32m2025-03-14 01:04:20,217[0m - The cluster has a golden triple size 

### Counting

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many papers used [evaluation method] as a evaluation method in the year [year]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 05:32:49,486[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-13 05:32:49,487[0m - The cluster has a golden triple size of 96 which is higher than the limit of 10.
[32m2025-03-13 05:32:49,487[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-13 05:32:49,487[0m - The cluster has a golden triple size of 20 which is higher than the limit of 10.
[32m2025-03-13 05:32:49,487[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-13 05:32:58,423[0m - Reached the soft limit of 3.
Question: How many papers used 'Evaluation Method Entity' as an evaluation method in the year 2017?
Answer: One paper used 'Evaluation Method Entity' as an evaluation method in the year 2017. This paper is titled 'Experiments in Curation: Towards Machine-Assisted Construction of Software Architecture Knowledge Bases.'
Golden Triples: ['(R871597:Evaluatio

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many papers used [evaluation method] as a evaluation method in the year [year]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 05:32:49,486[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-13 05:32:49,487[0m - The cluster has a golden triple size of 96 which is higher than the limit of 10.
[32m2025-03-13 05:32:49,487[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-13 05:32:49,487[0m - The cluster has a golden triple size of 20 which is higher than the limit of 10.
[32m2025-03-13 05:32:49,487[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-13 05:32:58,423[0m - Reached the soft limit of 3.
Question: How many papers used 'Evaluation Method Entity' as an evaluation method in the year 2017?
Answer: One paper used 'Evaluation Method Entity' as an evaluation method in the year 2017. This paper is titled 'Experiments in Curation: Towards Machine-Assisted Construction of Software Architecture Knowledge Bases.'
Golden Triples: ['(R871597:Evaluatio

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many papers used [research object] as a research object in the year [year]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                split_clusters=True
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-14 00:22:40,682[0m - Reached the soft limit of 10.
Question: How many papers used Incremental Calibration of Architectural Performance Models with Parametric Dependencies as a research object in the year 2020?
Answer: Two papers used Incremental Calibration of Architectural Performance Models with Parametric Dependencies as a research object in the year 2020. Both papers are published in 2020 and focus on the same research object.
Golden Triples: ['(R870075:Research Object Entity, Name, L1523593:Architecture Evolution)', '(R870059:Incremental Calibration of Architectural Performance Models with Parametric Dependencies, publication year, L1523564:2020)']
Hops: 6
Topic Entity: Software Architecture and Design
CSV: 
 uid,question,golden_answer,source_ids,golden_doc_chunks,golden_triples,is_generated_with,topic_entity_id,topic_entity_value,hops,based_on_template
a8f14b41-44a8-4165-829f-7100cf7880a4,How many papers used Incremental Calibration of Architectural Performance Mode

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many papers used [evaluation method] as a evaluation method in the year [year]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                split_clusters=True
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-14 00:26:29,543[0m - The cluster has a golden triple size of 26 which is higher than the limit of 10.
[32m2025-03-14 00:26:29,544[0m - The cluster has a golden triple size of 26 which is higher than the limit of 10.
[32m2025-03-14 00:26:47,154[0m - The cluster has a golden triple size of 18 which is higher than the limit of 10.
[32m2025-03-14 00:27:04,428[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-14 00:27:17,353[0m - Reached the soft limit of 10.
Question: How many papers used the questionnaire as an evaluation method in the year 2018?
Answer: Two papers used the questionnaire as an evaluation method in the year 2018. These papers are 'Migrating Towards Microservice Architectures: An Industrial Survey' and 'Identifying and Prioritizing Architectural Debt Through Architectural Smells: A Case Study in a Large Software Company.'
Golden Triples: ['(R870999:Evaluation Method Entity, Name, L1525536:Questionnaire)', '(R870989

### Ranking

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which publications, ranked in descending order of their publication year, have [paper class] as their paper class and include the evaluation method [evaluation method name]?",
            additional_requirements=[
                "The answer should be list thhe publication titles in chronological order",
                "Ensure that the list is ordered correctly based on the publication year in descending order"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="publication year",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="paper class",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False,
        use_predicate_as_value=True,
        restriction_value="True"
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-14 00:17:55,915[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-14 00:17:55,915[0m - The cluster has a golden triple size of 84 which is higher than the limit of 10.
[32m2025-03-14 00:18:02,772[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-14 00:18:02,772[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-14 00:18:02,772[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-14 00:18:02,772[0m - The cluster has a golden triple size of 21 which is higher than the limit of 10.
[32m2025-03-14 00:18:02,772[0m - The cluster has a golden triple size of 72 which is higher than the limit of 10.
[32m2025-03-14 00:18:19,049[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-14 00:18:27,923[0m - Skipping QA-Pair as not all golden triples are 

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which publications, ranked in descending order of their publication year, have [author name] as an author have and have evaluation method [evaluation_method_name]?",
            additional_requirements=[
                "The context should only include the triples that contain the evaluation methods of the paper",
                "The answer should be a list of publication titles in chronological order",
                "Ensure that the list is ordered correctly based on the publication year in descending order"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
                split_clusters=True
            ),            
            AdditionalInformationRestriction(
                information_predicate="publication year",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-13 16:23:40,455[0m - GET request failed for https://sandbox.orkg.org/api/statements with params {'subject_id': 'R870352', 'page': 0, 'size': 100}. Attempt 1 of 10. Error: HTTPSConnectionPool(host='sandbox.orkg.org', port=443): Read timed out. (read timeout=15)
[32m2025-03-13 16:24:19,564[0m - GET request failed for https://sandbox.orkg.org/api/statements with params {'subject_id': 'R873139', 'page': 0, 'size': 100}. Attempt 1 of 10. Error: HTTPSConnectionPool(host='sandbox.orkg.org', port=443): Max retries exceeded with url: /api/statements?subject_id=R873139&page=0&size=100 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f428a995010>, 'Connection to sandbox.orkg.org timed out. (connect timeout=15)'))
[32m2025-03-13 16:24:39,384[0m - GET request failed for https://sandbox.orkg.org/api/statements with params {'subject_id': 'R873139', 'page': 0, 'size': 100}. Attempt 2 of 10. Error: HTTPSConnectionPool(host='sandbox.orkg.org', port=443): 

### Negation

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which papers with the research object [research object name] do not have input data available in the year [year]?",
            additional_requirements=[
                "You must ensure that all the triples given to you are included in the context"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Uses Input Data",
                information_value_restriction="True",
                information_value_predicate_restriction="None"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-15 22:05:57,573[0m - Found 14 inital clusters
[32m2025-03-15 22:07:38,622[0m - Using 26 clusters for generation
[32m2025-03-15 22:07:48,454[0m - Skipping QA-Pair as the amount of golden triples is not equal to the amount of generated pairs.
[32m2025-03-15 22:07:53,201[0m - Skipping QA-Pair as the amount of golden triples is not equal to the amount of generated pairs.
[32m2025-03-15 22:07:58,917[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-15 22:07:58,917[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-15 22:08:03,730[0m - Skipping QA-Pair as the amount of golden triples is not equal to the amount of generated pairs.
[32m2025-03-15 22:08:17,456[0m - Skipping QA-Pair as the amount of golden triples is not equal to the amount of generated pairs.
Question: Which papers with the research object Architecture Decision Making do not have input data available in the

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which papers with the evaluation method [evaluation method] do not have tool support available in the year [year]?",
            additional_requirements=[
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Uses Input Data",
                information_value_restriction="True",
                information_value_predicate_restriction="None"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-15 22:18:48,461[0m - Found 14 inital clusters
[32m2025-03-15 22:20:50,430[0m - Using 30 clusters for generation
[32m2025-03-15 22:20:57,769[0m - Skipping QA-Pair as the amount of golden triples is not equal to the amount of generated pairs.
[32m2025-03-15 22:21:12,685[0m - Skipping QA-Pair as the amount of golden triples is not equal to the amount of generated pairs.
[32m2025-03-15 22:21:19,154[0m - Skipping QA-Pair as the amount of golden triples is not equal to the amount of generated pairs.
[32m2025-03-15 22:21:19,155[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-15 22:21:28,269[0m - Skipping QA-Pair as the amount of golden triples is not equal to the amount of generated pairs.
Question: Which papers with the evaluation method 'Input Data' do not have tool support available in the year 2017?
Answer: The papers that do not have tool support available in the year 2017 with the evaluation method 'Input Data' 

### Comparative

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many papers have the research object [research object name] in the year 2020 in comparison to 2021?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                information_value_restriction=["2020", "2021"],
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        golden_triple_minimum=6
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-15 18:10:17,126[0m - Found 14 inital clusters
[32m2025-03-15 18:11:15,116[0m - Using 12 clusters for generation
[32m2025-03-15 18:11:15,116[0m - The cluster has a golden triple size of 4 which is lower than the minimum of 6.
[32m2025-03-15 18:11:20,404[0m - Skipping QA-Pair as the amount of golden triples is not equal to the amount of generated pairs.
[32m2025-03-15 18:11:20,404[0m - The cluster has a golden triple size of 18 which is higher than the limit of 10.
[32m2025-03-15 18:11:20,404[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-15 18:11:20,405[0m - The cluster has a golden triple size of 22 which is higher than the limit of 10.
[32m2025-03-15 18:11:20,405[0m - The cluster has a golden triple size of 4 which is lower than the minimum of 6.
[32m2025-03-15 18:11:25,333[0m - Skipping QA-Pair as the amount of golden triples is not equal to the amount of generated pairs.
[32m2025-03-15 18:11:31,988[0m

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many papers have the sub-property [sub property name] in the year 2017 in comparison to 2020?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                information_value_restriction=["2017", "2020"],
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P162024",
        restriction_text="Sub-Property",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        golden_triple_minimum=6
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-15 18:15:18,146[0m - Found 15 inital clusters
[32m2025-03-15 18:17:07,572[0m - Using 12 clusters for generation
[32m2025-03-15 18:17:07,572[0m - The cluster has a golden triple size of 52 which is higher than the limit of 10.
[32m2025-03-15 18:17:07,573[0m - The cluster has a golden triple size of 22 which is higher than the limit of 10.
[32m2025-03-15 18:17:07,573[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-15 18:17:15,362[0m - Skipping QA-Pair as the amount of golden triples is not equal to the amount of generated pairs.
[32m2025-03-15 18:17:22,142[0m - The cluster has a golden triple size of 4 which is lower than the minimum of 6.
[32m2025-03-15 18:17:22,142[0m - The cluster has a golden triple size of 4 which is lower than the minimum of 6.
[32m2025-03-15 18:17:29,714[0m - The cluster has a golden triple size of 24 which is higher than the limit of 10.
[32m2025-03-15 18:17:29,714[0m - The cluster 

### Superlative

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which paper class has the most papers with the evaluation method [method name] in the publication year [year]?",
            additional_requirements=[
                "You need to include all context ids in your output. For example, if there are 4 contexts your output has to be [0, 1, 2, 3]",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="paper class",
                information_value_restriction="True"               
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-16 15:16:04,097[0m - Found 14 inital clusters
[32m2025-03-16 15:18:36,112[0m - Using 49 clusters for generation
[32m2025-03-16 15:18:55,604[0m - The cluster has a golden triple size of 36 which is higher than the limit of 10.
[32m2025-03-16 15:18:55,604[0m - The cluster has a golden triple size of 3 which is lower than the minimum of 6.
[32m2025-03-16 15:18:55,604[0m - The cluster has a golden triple size of 4 which is lower than the minimum of 6.
[32m2025-03-16 15:18:55,604[0m - The cluster has a golden triple size of 3 which is lower than the minimum of 6.
[32m2025-03-16 15:18:55,605[0m - The cluster has a golden triple size of 44 which is higher than the limit of 10.
[32m2025-03-16 15:19:04,763[0m - The cluster has a golden triple size of 11 which is higher than the limit of 10.
[32m2025-03-16 15:19:04,763[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-16 15:20:04,944[0m - The cluster has a golden tri

In [29]:
qa_pairs = ClusterBasedQuestionGenerator(#RERUN
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which paper class has the most papers with the research object [object name] in the publication year [year]?",
            additional_requirements=[
                "You need to include all context ids in your output. For example, if there are 4 contexts your output has to be [0, 1, 2, 3]",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="paper class",
                information_value_restriction="True",            
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-16 15:09:16,056[0m - Found 14 inital clusters
[32m2025-03-16 15:11:30,644[0m - Using 57 clusters for generation
[32m2025-03-16 15:11:30,645[0m - The cluster has a golden triple size of 16 which is higher than the limit of 10.
[32m2025-03-16 15:11:45,374[0m - The cluster has a golden triple size of 11 which is higher than the limit of 10.
[32m2025-03-16 15:12:10,111[0m - The cluster has a golden triple size of 3 which is lower than the minimum of 6.
[32m2025-03-16 15:12:10,111[0m - The cluster has a golden triple size of 3 which is lower than the minimum of 6.
[32m2025-03-16 15:12:10,112[0m - The cluster has a golden triple size of 20 which is higher than the limit of 10.
[32m2025-03-16 15:12:10,112[0m - The cluster has a golden triple size of 3 which is lower than the minimum of 6.
[32m2025-03-16 15:12:10,112[0m - The cluster has a golden triple size of 17 which is higher than the limit of 10.
[32m2025-03-16 15:12:10,112[0m - The cluster has a golden tri

### Relationship

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What is the proportion of papers with the evaluation method [evaluation method name] between 2019 and 2021?",
            additional_requirements=[
                "The context should include all triples given to you",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                information_value_restriction=["2019", "2020", "2021"],
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-14 17:56:43,869[0m - Found 14 inital clusters
[32m2025-03-14 17:57:57,772[0m - Using 13 clusters for generation
[32m2025-03-14 17:57:57,772[0m - The cluster has a golden triple size of 2 which is lower than the minimum of 6.
[32m2025-03-14 17:57:57,773[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-14 17:58:11,731[0m - The cluster has a golden triple size of 12 which is higher than the limit of 10.
[32m2025-03-14 17:58:28,535[0m - The cluster has a golden triple size of 56 which is higher than the limit of 10.
[32m2025-03-14 17:58:33,364[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-14 17:58:33,365[0m - The cluster has a golden triple size of 60 which is higher than the limit of 10.
[32m2025-03-14 17:58:38,911[0m - Skipping QA-Pair as not all golden triples are in the generated pairs.
[32m2025-03-14 17:58:44,536[0m - Skipping QA-Pair as not all golden triples are

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What is the proportion of papers with the research object [research object name] between 2018 and 2020?",
            additional_requirements=[
                "The context should include all triples given to you",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                information_value_restriction=["2018", "2019", "2020"],
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

[32m2025-03-14 17:21:05,028[0m - Found 14 inital clusters
[32m2025-03-14 17:22:00,737[0m - Using 13 clusters for generation
[32m2025-03-14 17:22:00,738[0m - The cluster has a golden triple size of 16 which is higher than the limit of 10.
[32m2025-03-14 17:22:00,738[0m - The cluster has a golden triple size of 22 which is higher than the limit of 10.
[32m2025-03-14 17:22:00,738[0m - The cluster has a golden triple size of 2 which is lower than the minimum of 6.
[32m2025-03-14 17:22:08,483[0m - The cluster has a golden triple size of 24 which is higher than the limit of 10.
[32m2025-03-14 17:22:08,483[0m - The cluster has a golden triple size of 34 which is higher than the limit of 10.
[32m2025-03-14 17:22:08,483[0m - The cluster has a golden triple size of 16 which is higher than the limit of 10.
[32m2025-03-14 17:22:08,484[0m - The cluster has a golden triple size of 14 which is higher than the limit of 10.
[32m2025-03-14 17:22:08,484[0m - The cluster has a golden t