### QA Generation

This notebook creates the question-answering (QA) dataset for the annotation data [merged_ecsa_icsa.json](../../../../data/external/merged_ecsa_icsa.json).

The creation was guided by a matrix consisting of two dimensions:
1. **Use Case**
    - **1**: The first use case reflects the current state of practice in scientific literature search. The researcher seeks additional details about the metadata of one or more papers. To find this information, the researcher provides the QA system with specific metadata about the papers of interest. In response, the QA system returns other metadata attributes of those papers rather than content information.
    - **2**: In the second use case, the researcher seeks information about the content of one or more papers. In this use case, the researcher provides the QA system with metadata information about the papers and asks a question about their contents. The QA system is then expected to extract content information related to specific papers that conform to the metadata constraints provided.
    - **3**: In the third use case, the researcher seeks information about metadata of one or more papers. In this use case, the researcher provides the QA system with content constraints about the papers and asks a question about the metadata of the paper. The QA system is then expected to extract metadata information related to the specific papers mentioned in the question.
    - **4**: In the fourth use case, the researcher seeks information about the content of one or more papers. In this use case, the researcher provides the QA system with content information about the paper, and asks a question about the content of the paper. The QA system is then expected to extract content information related to the specific papers mentioned in the question.
    - **5**: In the fifth use case, a researcher seeks information about the content of one or more papers. In this use case, the researcher provides the retriever with both metadata and content information about the papers and asks a question about the content of the paper. The retriever is then expected to extract content information related to the specific papers mentioned in the question.
    - **6**: In the sixth use case, the researcher seeks information about metadata of one or more papers. In this specific use case, the researcher provides the retriever with both metadata and content information about the papers, such as the name of an evaluation method and the year of publication, and asks a question about the metadata of the papers. The retriever is then expected to extract metadata information related to the specific papers mentioned in the question.

2. **Retrieval Operation Classification**
    - **Basic**: Classifies those questions where the retriever is only required to find one or more facts in the Knowledge Graph and use them to provide the answer without further processing.
    - **Aggregation**: Classifies those questions where the retriever is required to quantitatively or qualitatively aggregate the information in the Knowledge Graph to answer the question.
    - **Comparative**: Classifies those questions where the retriever is required to compare two or more pieces of information in the Knowledge Graph to answer the question.
    - **Ranking**: Classifies those questions where the retriever is required to rank the information in the Knowledge Graph to answer the question.
    - **Counting**: Classifies those questions where the retriever is required to count the number of occurrences of a certain piece of information in the Knowledge Graph to answer the question.
    - **Superlative**: Classifies those questions where the retriever is required to identify the most or least of a certain piece of information in the Knowledge Graph to answer the question.
    - **Relationship**: Classifies questions where the retriever must identify any type of interconnection or reliance between pieces of information in the Knowledge Graph. Essentially, it captures all scenarios where one piece of data is influenced by, contingent upon, or systematically linked to another.
    - **Negation**: Classifies those questions where the retriever is required to negate the information in the Knowledge Graph to answer the question.
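
The two dimensions above can be read as a grid in which every question belongs to exactly one cell. The following is a minimal sketch of that grid, not part of `sqa_system`: the names `UseCase` and `RetrievalOperation` and the tuple encoding of each use case as (information given, information requested) are assumptions made for illustration only.

```python
from enum import Enum
from itertools import product


class UseCase(Enum):
    """Hypothetical encoding: (information given, information requested)."""
    UC1 = ("metadata", "metadata")
    UC2 = ("metadata", "content")
    UC3 = ("content", "metadata")
    UC4 = ("content", "content")
    UC5 = ("metadata and content", "content")
    UC6 = ("metadata and content", "metadata")


class RetrievalOperation(Enum):
    """The eight retrieval operation classes listed above."""
    BASIC = "basic"
    AGGREGATION = "aggregation"
    COMPARATIVE = "comparative"
    RANKING = "ranking"
    COUNTING = "counting"
    SUPERLATIVE = "superlative"
    RELATIONSHIP = "relationship"
    NEGATION = "negation"


# Every (use case, retrieval operation) combination is one cell of the matrix.
matrix_cells = list(product(UseCase, RetrievalOperation))

for use_case, operation in matrix_cells[:3]:
    given, requested = use_case.value
    print(f"{use_case.name} / {operation.value}: given {given}, asks for {requested}")
```

The sketch only enumerates the possible cells; it does not imply that the notebook generates questions for every combination.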

### How to Read this File

For each prepared question template (see [here](../templates.md)), we prepare the parameters for the clustering or subgraph construction strategies below. From the generated questions, we then selected those that we judged to be of high quality and free of hallucinations while conforming to the intended template, use case, and retrieval operation.
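
The selection described above was a manual judgment. As a rough illustration of what "conforming to the intended template" can mean, the sketch below turns a template with bracketed placeholders (such as `[paper_title]`) into a regular expression and checks a generated question against it. The helper names `placeholders`, `template_to_regex`, and `conforms` are hypothetical and not part of `sqa_system`. Such a strict check would reject legitimate paraphrases, so it could serve at most as a first-pass filter before manual review.

```python
import re

# Matches a bracketed placeholder such as "[paper_title]" or "[author name]".
PLACEHOLDER = re.compile(r"\[([^\]]+)\]")


def placeholders(template: str) -> list[str]:
    """Return the placeholder names that occur in a template."""
    return PLACEHOLDER.findall(template)


def template_to_regex(template: str) -> re.Pattern:
    """Build a regex that matches the template with any text in each placeholder."""
    # Splitting on a pattern with one capturing group alternates
    # literal text (even indices) and placeholder names (odd indices).
    parts = PLACEHOLDER.split(template)
    pattern = "".join(
        re.escape(part) if index % 2 == 0 else "(.+?)"
        for index, part in enumerate(parts)
    )
    return re.compile(pattern)


def conforms(question: str, template: str) -> bool:
    """Check whether a question is a literal instantiation of the template."""
    return template_to_regex(template).fullmatch(question) is not None


template = "In which venue has the paper '[paper_title]' been published?"
print(placeholders(template))
print(conforms("In which venue has the paper 'A Study of X' been published?", template))
print(conforms("Where was 'A Study of X' published?", template))
```

The paper title in the usage lines is made up for the example.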

In [None]:
%load_ext autoreload
%autoreload 2

import pandas as pd
from sqa_system.knowledge_base.knowledge_graph.storage import KnowledgeGraphManager
from sqa_system.core.language_model.llm_provider import LLMProvider
from sqa_system.core.data.models import QAPair
from sqa_system.app.cli.cli_progress_handler import ProgressHandler
from sqa_system.core.config.models import LLMConfig, KnowledgeGraphConfig, EmbeddingConfig

# Import the question generation strategies
from sqa_system.qa_generator.strategies import (
    FromTopicEntityGenerator, FromTopicEntityGeneratorOptions, GenerationOptions)
from sqa_system.qa_generator.strategies import (
    PaperComparisonGenerator, PaperComparisonGeneratorOptions)
from sqa_system.qa_generator.strategies.clustering_strategy.cluster_based_question_generator import (
    ClusterBasedQuestionGenerator, 
    ClusterGeneratorOptions, 
    AdditionalInformationRestriction,
    ClusterStrategyOptions
)


# Prepare the Progress Handler, which we are going to disable because of compatibility issues
# with Jupyter Notebooks
progress_handler = ProgressHandler()
progress_handler.disable()

# Prepare Knowledge Graph
kg_config = KnowledgeGraphConfig.from_dict({
    "additional_params": {
        "contribution_building_blocks": {
            "Classifications_2": [
                "paper_class",
                "research_level",
                "all_research_objects",
                "validity",
                "evidence"
            ]
        },
        "force_cache_update": True,
        "force_publication_update": False,
        "subgraph_root_entity_id": "R659055",
        "orkg_base_url": "https://sandbox.orkg.org"
    },
    "graph_type": "orkg",
    "dataset_config": {
        "name": "merged_ecsa.json_jsonpublicationloader_limit-1",
        "additional_params": {},
        "file_name": "merged_ecsa_icsa.json",
        "loader": "JsonPublicationLoader",
        "loader_limit": -1
    },
    "extraction_llm": {
        "name": "openai_gpt-4o-mini_tmp0.0_maxt-1",
        "additional_params": {},
        "endpoint": "OpenAI",
        "name_model": "gpt-4o-mini",
        "temperature": 0.0,
        "max_tokens": -1
    },
    "extraction_context_size": 4000,
    "chunk_repetitions": 2
})
graph = KnowledgeGraphManager().get_item(kg_config)

# Prepare the Research Field Topic Entity
research_field = graph.get_entity_by_id("R659055")

# Prepare Language Model
gpt_4o_mini_config = LLMConfig.from_dict({
    "endpoint": "OpenAI",
    "name_model": "gpt-4o-mini",
    "temperature": 0.0,
    "max_tokens": -1
})
gpt_4o_mini = LLMProvider().get_llm_adapter(gpt_4o_mini_config)

gpt_4o_config = LLMConfig.from_dict({
    "endpoint": "OpenAI",
    "name_model": "gpt-4o",
    "temperature": 0.0,
    "max_tokens": -1
})
gpt_4o = LLMProvider().get_llm_adapter(gpt_4o_config)

gpt_o3_mini_config = LLMConfig.from_dict({
    "endpoint": "OpenAI",
    "name_model": "o3-mini",
    "temperature": None,
    "max_tokens": -1,
    "reasoning_effort": "low"
})
gpt_o3_mini = LLMProvider().get_llm_adapter(gpt_o3_mini_config)

embedding_config = EmbeddingConfig.from_dict({
    "name": "openai_text-embedding-3-small",
    "additional_params": {},
    "endpoint": "OpenAI",
    "name_model": "text-embedding-3-small"
})

def print_qa_pairs(qa_pairs: list[QAPair]):
    """Print each QA pair with its answer, golden triples, hops, and a CSV row."""
    if not qa_pairs:
        print("No QA pairs generated")
        return
    for qa_pair in qa_pairs:
        print(f"Question: {qa_pair.question}")
        print(f"Answer: {qa_pair.golden_answer}")
        print(f"Golden Triples: {qa_pair.golden_triples}")
        print(f"Hops: {qa_pair.hops}")
        print(f"Topic Entity: {qa_pair.topic_entity_value}")
        df = pd.DataFrame([qa_pair.model_dump()])
        print(f"CSV: \n {df.to_csv(index=False)}")
        print("------------------") 

## Use Case 1

### Basic


In [None]:
qa_pairs = []
qa_strategy = FromTopicEntityGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    from_topic_entity_options=FromTopicEntityGeneratorOptions(
        topic_entity=research_field,
    ),     
    options=GenerationOptions(
        template_text="In which venue has the paper '[paper_title]' been published?",
        additional_requirements=[
            "The generated question should include the title of the paper.",
            "The context should only include the triple that contains the venue of the paper",
        ],
        validate_contexts=False,
        convert_path_to_text=False,
        classify_questions=False,
    )     
)
qa_pairs.extend(qa_strategy.generate())

qa_strategy = FromTopicEntityGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    from_topic_entity_options=FromTopicEntityGeneratorOptions(
        topic_entity=graph.get_random_publication(),
    ),     
    options=GenerationOptions(
        template_text="In which venue has the paper '[paper_title]' been published?",
        additional_requirements=[
            "The generated question should include the title of the paper.",
            "The context should only include the triple that contains the venue of the paper",
        ],
        validate_contexts=False,
        convert_path_to_text=False,
        classify_questions=False,
    )     
)
qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = []
for _ in range(2):
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=research_field
        ),     
        options=GenerationOptions(
            template_text="Who are the authors of the paper '[paper_title]'?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the authors of the paper.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_random_publication()
        ),     
        options=GenerationOptions(
            template_text="Who are the authors of the paper '[paper_title]'?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the authors of the paper.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = []

qa_pairs.extend(ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What is the replication package link of the paper '[paper_title]'?",
            additional_requirements=[],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P168010",
        restriction_text="Replication Package Link",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False,
        skip_clusters_with_only_one_root=False
    )
).generate())

qa_strategy = FromTopicEntityGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    from_topic_entity_options=FromTopicEntityGeneratorOptions(
        topic_entity=graph.get_entity_by_id("R870141"), # We manually selected a publication that includes a replication package link
    ),     
    options=GenerationOptions(
        template_text="What is the replication package link of the paper '[paper_title]'?",
        additional_requirements=[
            "The generated question should include the title of the paper.",
            "The context should only include the triple of replication package link.",
        ],
        validate_contexts=False,
        convert_path_to_text=False,
        classify_questions=False,
    )     
)
qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

### Aggregation


In [None]:
qa_pairs = []
qa_pairs.extend(ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which publications have been published by the author [author name] in the year [publication year]?",
            additional_requirements=[
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate())

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = []
qa_pairs.extend(ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which papers have the research level [research level]?",
            additional_requirements=[
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P162008",
        restriction_text="research level",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        use_predicate_as_value=True,
        restriction_value="True"
    )
).generate())

print_qa_pairs(qa_pairs)

### Counting

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many publications have been published by the author [author name]?",
            additional_requirements=[],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many publications have the paper class with the name [paper class name]?",
            additional_requirements=[],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="paper class",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False,
        use_predicate_as_value=True,
        restriction_value="True"
    )
).generate()

print_qa_pairs(qa_pairs)

### Ranking

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which publications have the paper class [paper class] ranked by their publication year?",
            additional_requirements=[
                "The context should only include the triples that contain the paper class and publication year of the paper",
                "The answer should be a list of publication titles in chronological order",
                "Ensure that the list is ordered correctly based on the publication year in descending order"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="paper class",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False,
        use_predicate_as_value=True,
        restriction_value="True"
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which publications have been published by the author [author name] ranked by their publication year?",
            additional_requirements=[
                "The context should only include the triples that contain the authors and publication year of the paper",
                "The answer should be a list of publication titles in chronological order",
                "Ensure that the list is ordered correctly based on the publication year in descending order"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

### Comparative

In [None]:
qa_pairs = []
for _ in range(2):
    qa_strategy = PaperComparisonGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        comparison_options=PaperComparisonGeneratorOptions(
            first_publication=graph.get_random_publication(),
            second_publication=graph.get_random_publication(),
            topic_entity=research_field
        ),     
        options=GenerationOptions(
            template_text="In what publication year has the publication '[paper title 1]' been published in comparison to the publication '[paper title 2]'?",
            additional_requirements=[
                "The context should only include the triples of the publication years of the papers.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = []
for _ in range(2):
    qa_strategy = PaperComparisonGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        comparison_options=PaperComparisonGeneratorOptions(
            first_publication=graph.get_random_publication(),
            second_publication=graph.get_random_publication(),
            topic_entity=research_field
        ),     
        options=GenerationOptions(
            template_text="What is the paper class of the paper '[paper title 1]' in comparison to the publication '[paper title 2]'?",
            additional_requirements=[
                "The context should only include the triples of the paper classes of the papers.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

### Superlative

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which author has published the most publications with the paper class [paper class name]?",
            additional_requirements=[
                "Include all triples in your context"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="authors",
                split_clusters=True,
            )
        ],
        only_use_cluster_with_most_triples=True
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="paper class",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        use_predicate_as_value=True,
        restriction_value="True"
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What is the paper class that the author [author name] has published the most?",
            additional_requirements=[
                "The context should only include the triples that contain the research object and the sub-property.",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="paper class"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

### Relationship

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What is the proportion of papers that have been published by the author [author name] per year?",
            additional_requirements=[
                "The context should only include the triples that contain the research object and the sub-property.",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What is the proportion of paper classes that have been published by the author [author name] per year?",
            additional_requirements=[
                "The context should only include the triples that contain the research object and the sub-property.",
                "First list the amount of times the entity was used in the year",
                "Then give a final statement about the proportion",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year"
            ),
            AdditionalInformationRestriction(
                information_predicate="paper class",
                information_value_restriction="True"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

## Use Case 2

### Basic


In [None]:
qa_pairs = []
for _ in range(2):
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=research_field
        ),     
        options=GenerationOptions(
            template_text="Does the paper '[paper_title]' use tool support?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that contain information about the tool support of the paper.",
                "Do not use any IDs in your answer. Do not make any assumptions beyond the given triples.",
                "Just give a concise answer to the question.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_random_publication()
        ),     
        options=GenerationOptions(
            template_text="Does the paper '[paper_title]' use tool support?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that contain information about the tool support of the paper.",
                "Do not use any IDs in your answer. Do not make any assumptions beyond the given triples.",
                "Just give a concise answer to the question.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = []
for _ in range(2):
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=research_field
        ),     
        options=GenerationOptions(
            template_text="What is the evaluation method used in the paper '[paper title]'?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that specifically mention the evaluation method of the paper.",
                "Do not use any IDs in your answer. Do not make any assumptions beyond the given triples.",
                "The answer should be similar to the following: 'The evaluation method used in the paper is [evaluation method]'.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_random_publication()
        ),     
        options=GenerationOptions(
            template_text="What is the evaluation method used in the paper '[paper title]'?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that specifically mention the evaluation method of the paper.",
                "Do not use any IDs in your answer. Do not make any assumptions beyond the given triples.",
                "The answer should be similar to the following: 'The evaluation method used in the paper is [evaluation method]'.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = []
for _ in range(2):
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=research_field
        ),     
        options=GenerationOptions(
            template_text="Does the paper '[paper_title]' have a replication package?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that contain information about the replication package of the paper.",
                "Do not use any IDs in your answer. Do not make any assumptions beyond the given triples.",
                "Just give a concise answer to the question.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_random_publication()
        ),     
        options=GenerationOptions(
            template_text="Does the paper '[paper_title]' have a replication package?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that contain information about the replication package of the paper.",
                "Do not use any IDs in your answer. Do not make any assumptions beyond the given triples.",
                "Just give a concise answer to the question.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

### Aggregation

In [None]:
qa_pairs = []
for _ in range(2):
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_o3_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=research_field,
            maximum_subgraph_size=100
        ),     
        options=GenerationOptions(
            template_text="What are the evaluation methods of the publication '[paper_title]'?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that contain the evaluation methods of the paper",
                "Evaluation Methods are encoded in the triples of the format (R870531:Evaluation Method Entity, Name, [Evaluation Method Name])",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_o3_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_entity_by_id("R871346"), # A publication with multiple evaluation methods
            maximum_subgraph_size=100
        ),     
        options=GenerationOptions(
            template_text="What are the evaluation methods of the publication '[paper_title]'?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that contain the evaluation methods of the paper",
                "Evaluation Methods are encoded in the triples of the format (R870531:Evaluation Method Entity, Name, [Evaluation Method Name])",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = []
for _ in range(2):
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_o3_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=research_field,
            maximum_subgraph_size=100
        ),     
        options=GenerationOptions(
            template_text="What are the threats to validity of the paper '[paper_title]'?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that contain information about the threats to validity of the paper.",
                "Only include the threats that have a 'True' boolean value",
                "You can only use triples that explicitly have 'Threat to Validity' in the object"
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_o3_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_random_publication(),
            maximum_subgraph_size=100
        ),     
        options=GenerationOptions(
            template_text="What are the threats to validity of the paper '[paper_title]'?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that contain information about the threats to validity of the paper.",
                "Only include the threats that have a 'True' boolean value",
                "You can only use triples that explicitly have 'Threat to Validity' in the object"
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

### Counting


In [None]:
qa_pairs = []
for _ in range(4):
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_o3_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=research_field,
            maximum_subgraph_size=100
        ),     
        options=GenerationOptions(
            template_text="How many evaluation methods does the paper with the title '[paper title]' have?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "You can only use triples of the form: (Evaluation, Evaluation method, [method name])"
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_o3_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_random_publication(),
            maximum_subgraph_size=100
        ),     
        options=GenerationOptions(
            template_text="How many evaluation methods does the paper with the title '[paper title]' have?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "You can only use triples of the form: (Evaluation, Evaluation method, [method name])"
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)
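
For a counting question like the one above, the expected answer can be checked against the golden triples by counting distinct method names. The following is a minimal sketch, assuming the triples follow the form `(Evaluation, Evaluation method, [method name])` named in the requirements. The `Triple` type, the helper name, and the sample values are hypothetical and not part of the generator library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical stand-in for a knowledge-graph triple."""
    subject: str
    predicate: str
    obj: str


def count_evaluation_methods(triples: list[Triple]) -> int:
    """Count distinct method names in (Evaluation, Evaluation method, name) triples."""
    methods = {
        t.obj
        for t in triples
        if t.subject == "Evaluation" and t.predicate == "Evaluation method"
    }
    return len(methods)


# Illustrative sample data, not taken from the real graph.
golden_triples = [
    Triple("Evaluation", "Evaluation method", "Case Study"),
    Triple("Evaluation", "Evaluation method", "Benchmark"),
    Triple("Evaluation", "Evaluation method", "Case Study"),  # duplicate, counted once
    Triple("Paper", "publication year", "2021"),  # other predicate, ignored
]
print(count_evaluation_methods(golden_triples))  # 2
```

Counting distinct names rather than raw triples avoids inflating the answer when the same method appears in more than one retrieved triple.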

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many evaluation methods are used by the author [author name]?",
            additional_requirements=[
                "The context should only include the triples that contain the evaluation methods of the papers",
                "Evaluation Methods are encoded in the triples of the format (R870531:Evaluation Method Entity, Name, [Evaluation Method Name])",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True
    )
).generate()

print_qa_pairs(qa_pairs)
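
The cell above sets `cluster_eps=0.1` with `cluster_metric="cosine"`. The sketch below shows what such a threshold means numerically, assuming `cluster_eps` is a maximum cosine distance between two embeddings in the same neighbourhood, as the parameter names suggest. How the library actually clusters is not shown in this notebook, so the helper functions and vectors here are purely illustrative.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def within_eps(a: list[float], b: list[float], eps: float = 0.1) -> bool:
    """Two embeddings count as neighbours if their cosine distance is at most eps."""
    return cosine_distance(a, b) <= eps


# Hypothetical 2-d embeddings; real embeddings have far more dimensions.
similar_a, similar_b = [1.0, 0.0], [1.0, 0.1]
unrelated = [0.0, 1.0]

print(within_eps(similar_a, similar_b))  # True
print(within_eps(similar_a, unrelated))  # False
```

Under this reading, a small `eps` such as 0.1 only groups values whose embeddings point in nearly the same direction, which suits grouping near-identical author names or labels.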

In [None]:
qa_pairs = []
for _ in range(2):
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_o3_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=research_field,
            maximum_subgraph_size=100
        ),     
        options=GenerationOptions(
            template_text="How many threats to validity does the paper with the title '[paper title]' have?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that contain information about the threats to validity of the paper.",
                "Only include the threats that have a 'True' boolean value",
                "You can only use triples that explicitly have 'Threat to Validity' in the object"
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_o3_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_random_publication(),
            maximum_subgraph_size=100
        ),     
        options=GenerationOptions(
            template_text="How many threats to validity does the paper with the title '[paper title]' have?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that contain information about the threats to validity of the paper.",
                "Only include the threats that have a 'True' boolean value",
                "You can only use triples that explicitly have 'Threat to Validity' in the object"
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

### Ranking

In [None]:
qa_pairs = []
for _ in range(2):
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_o3_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=research_field
        ),     
        options=GenerationOptions(
            template_text="Which threats to validity does the publication '[paper title]' have, ranked in descending alphabetical order?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that contain information about the threats to validity of the paper.",
                "Only include the threats that have a 'True' boolean value",
                "You can only use triples that explicitly have 'Threat to Validity' in the object"
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_o3_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_random_publication()
        ),     
        options=GenerationOptions(
            template_text="Which threats to validity does the publication '[paper title]' have, ranked in descending alphabetical order?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that contain information about the threats to validity of the paper.",
                "Only include the threats that have a 'True' boolean value",
                "You can only use triples that explicitly have 'Threat to Validity' in the object"
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = []
for _ in range(2):
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_o3_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=research_field
        ),     
        options=GenerationOptions(
            template_text="Which sub-properties does the publication '[paper title]' have, ranked in descending alphabetical order?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that contain information about the sub-properties of the paper.",
                "Your answer should list all sub-properties in descending alphabetical order.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_o3_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_entity_by_id("R871674"), # A paper that has multiple sub-properties
        ),     
        options=GenerationOptions(
            template_text="Which sub-properties does the publication '[paper title]' have, ranked in descending alphabetical order?",
            additional_requirements=[
                "The generated question should include the title of the paper.",
                "The context should only include the triples that contain information about the sub-properties of the paper.",
                "Your answer should list all sub-properties in descending alphabetical order.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)
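
The ranking questions above ask for values in descending alphabetical order. A generated answer can be sanity-checked with a small helper like the one below. This is a sketch only: the helper name is hypothetical, the sample threat names are illustrative, and comparing case-insensitively is an assumption about how the ordering should be interpreted.

```python
def rank_descending(names: list[str]) -> list[str]:
    """Sort names in descending alphabetical order, ignoring case."""
    return sorted(names, key=str.casefold, reverse=True)


# Illustrative values, not taken from the real graph.
threats = ["External Validity", "Construct Validity", "Internal Validity"]
print(rank_descending(threats))
# ['Internal Validity', 'External Validity', 'Construct Validity']
```

Comparing the list in a generated answer with `rank_descending` applied to the golden values is a cheap way to spot answers that list the right items in the wrong order.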

### Superlative

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which sub-property has been used the most in the year [year] with papers of the paper class [paper class]?",
            additional_requirements=[
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Sub-Property"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="paper class",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        use_predicate_as_value=True,
        restriction_value="True",
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which evaluation method has been used the most in the year [year] with papers of the paper class [paper class]?",
            additional_requirements=[
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Evaluation method"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="paper class",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        use_predicate_as_value=True,
        restriction_value="True",
    )
).generate()

print_qa_pairs(qa_pairs)
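
The superlative questions above ask which value occurs most often within a group defined by publication year and paper class. The sketch below shows one way to compute such a reference answer from flat records. The record layout `(year, paper_class, method)`, the helper name, and the sample data are hypothetical and do not come from the generator library.

```python
from collections import Counter, defaultdict


def most_used_per_group(
    records: list[tuple[int, str, str]],
) -> dict[tuple[int, str], str]:
    """Map each (year, paper_class) group to its most frequent method."""
    counts: dict[tuple[int, str], Counter] = defaultdict(Counter)
    for year, paper_class, method in records:
        counts[(year, paper_class)][method] += 1
    # most_common(1) returns the first-seen value when counts are tied.
    return {group: c.most_common(1)[0][0] for group, c in counts.items()}


# Illustrative sample data, not taken from the real graph.
records = [
    (2020, "Evaluation Research", "Case Study"),
    (2020, "Evaluation Research", "Benchmark"),
    (2020, "Evaluation Research", "Case Study"),
    (2021, "Proposal of Solution", "Benchmark"),
]
result = most_used_per_group(records)
print(result[(2020, "Evaluation Research")])  # Case Study
```

Ties are resolved by first occurrence here. A question whose group has tied values has no single correct superlative answer, so such groups may be worth filtering out before they are used as test items.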

### Comparative

In [None]:
qa_pairs = []
for _ in range(2):
    qa_strategy = PaperComparisonGenerator(
        graph=graph,
        llm_adapter=gpt_o3_mini,
        comparison_options=PaperComparisonGeneratorOptions(
            first_publication=graph.get_random_publication(),
            second_publication=graph.get_random_publication(),
            topic_entity=research_field
        ),     
        options=GenerationOptions(
            template_text="What evaluation methods does the paper [paper_title_1] use compared to the paper [paper_title_2]?",
            additional_requirements=[
                "The generated question should include the titles of both papers.",
                "The context should only include the triples of the evaluation methods of the papers.",
                "Evaluation Methods are encoded in the triples of the format (R870531:Evaluation Method Entity, Name, [Evaluation Method Name])",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = []
for _ in range(2):
    qa_strategy = PaperComparisonGenerator(
        graph=graph,
        llm_adapter=gpt_o3_mini,
        comparison_options=PaperComparisonGeneratorOptions(
            first_publication=graph.get_random_publication(),
            second_publication=graph.get_random_publication(),
            topic_entity=research_field
        ),     
        options=GenerationOptions(
            template_text="What paper class does the paper [paper_title_1] have compared to the paper [paper_title_2]?",
            additional_requirements=[
                "The generated question should include the titles of both papers.",
                "The context should only include the triples of the paper class that are marked as being 'True'",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)
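
A comparative answer essentially contrasts two sets of values, one per paper. The following sketch shows that comparison with plain set operations. The helper name and the sample method names are hypothetical, and the output structure is only one possible way to organise a reference answer.

```python
def compare_values(first: set[str], second: set[str]) -> dict[str, list[str]]:
    """Split the values of two papers into shared and paper-specific ones."""
    return {
        "shared": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }


# Illustrative values, not taken from the real graph.
paper_1_methods = {"Case Study", "Benchmark"}
paper_2_methods = {"Benchmark", "Interview"}
comparison = compare_values(paper_1_methods, paper_2_methods)
print(comparison)
# {'shared': ['Benchmark'], 'only_first': ['Case Study'], 'only_second': ['Interview']}
```

Because both publications are drawn with `graph.get_random_publication()`, the two sets can also be identical or disjoint, and a reference answer should cover those cases as well.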

### Relationship

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What is the proportion of the evaluation methods that have been published by the author [author name] per year?",
            additional_requirements=[
                "The context should only include the triples that contain the evaluation method and the publication year.",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year"
            ),
            AdditionalInformationRestriction(
                information_predicate="Evaluation method"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What is the proportion of research objects that have been published by the author [author name] per year?",
            additional_requirements=[
                "The context should only include the triples that contain the research object and the sub-property.",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year"
            ),
            AdditionalInformationRestriction(
                information_predicate="Object"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)
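
The relationship questions above ask for a proportion per year. One possible reading is the share of each value among all values an author published in a given year. The sketch below computes that share. The interpretation, the `(year, value)` record layout, the helper name, and the sample data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def shares_per_year(records: list[tuple[int, str]]) -> dict[int, dict[str, float]]:
    """For each year, return the fraction of records that carry each value."""
    per_year: dict[int, Counter] = defaultdict(Counter)
    for year, value in records:
        per_year[year][value] += 1
    shares: dict[int, dict[str, float]] = {}
    for year, counter in per_year.items():
        total = sum(counter.values())
        shares[year] = {value: count / total for value, count in counter.items()}
    return shares


# Illustrative records for a single author, not taken from the real graph.
author_records = [
    (2020, "Case Study"),
    (2020, "Benchmark"),
    (2020, "Case Study"),
    (2021, "Benchmark"),
]
shares = shares_per_year(author_records)
print(shares[2021])  # {'Benchmark': 1.0}
```

With `golden_triple_minimum=6`, each question is backed by at least six triples, which keeps such proportions from being computed over only one or two data points.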

## Use Case 3

### Basic

In [None]:
qa_pairs = []
for _ in range(1):
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_entity_by_id("R868403"), # We manually selected this publication as it is the only one that has robustness as a property
        ),     
        options=GenerationOptions(
            template_text="Which paper includes the evaluation sub-property robustness?",
            additional_requirements=[
                "Leave the template as is and do not change it.",
                "The context should only include the robustness sub-property triple. Therefore only one triple can be in the context.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)


In [None]:
qa_pairs = []
for _ in range(1):
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_entity_by_id("R873171"), # We manually selected this publication as it is the only one that has Recovery as a property
        ),     
        options=GenerationOptions(
            template_text="Which paper includes the evaluation sub-property Recovery?",
            additional_requirements=[
                "Leave the template as is and do not change it.",
                "The context should only include the Recovery sub-property triple. Therefore only one triple can be in the context.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)


In [None]:
qa_pairs = []
for _ in range(1):
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_entity_by_id("R872980"), # We manually selected this publication as it is the only one that has Limit of detection as a property
        ),     
        options=GenerationOptions(
            template_text="Which paper includes the evaluation sub-property Limit of detection?",
            additional_requirements=[
                "Leave the template as is and do not change it.",
                "The context should only include the Limit of detection sub-property triple. Therefore only one triple can be in the context.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)


In [None]:
qa_pairs = []
for _ in range(1):
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_entity_by_id("R873116"), # We manually selected this publication as it is the only one that has verification as evaluation method
        ),     
        options=GenerationOptions(
            template_text="Which paper includes the evaluation method Verification?",
            additional_requirements=[
                "Leave the template as is and do not change it.",
                "The context should only include the Verification evaluation method triple. Therefore only one triple can be in the context.",
            ],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)


### Aggregation


In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which publications have the evaluation method [method]?",
            additional_requirements=[
                "The context should only include the triples of the evaluation methods of the papers.",
                "Evaluation Methods are encoded in the triples of the format (R870531:Evaluation Method Entity, Name, [Evaluation Method Name])",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[

        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which publications have the evaluation sub-property [sub-property name]?",
            additional_requirements=[
                "The context should only include the triples of the sub-property of the papers.",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P162024",
        restriction_text="Sub-Property",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="W [sub property name]?",
            additional_requirements=[
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Uses Input Data",
                information_value_restriction="True",
                information_value_predicate_restriction="available"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P162024",
        restriction_text="Sub-Property",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False,
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which publications evaluate the research object [research object name] with the sub-property [sub-property name] and have input data available?",
            additional_requirements=[
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Sub-Property",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Uses Input Data",
                information_value_predicate_restriction="available",
                information_value_restriction="True",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

### Counting


In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many publications have the evaluation sub-property [sub-property name]?",
            additional_requirements=[
                "The context should only include the triples of the sub-properties of the papers.",
                "The answer should count the number of publications that have the sub-property.",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[

        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P162024",
        restriction_text="Sub-Property",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many publications have the evaluation method [method]?",
            additional_requirements=[
                "The context should only include the triples of the evaluation methods of the papers.",
                "The answer should count the number of publications that have the evaluation method",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[

        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many papers that discuss [threat to validity] as a threat to validity apply [evaluation method] as an evaluation method?",
            additional_requirements=[
                "The context needs to include all triples given to you.",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
                split_clusters=True
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P123037",
        restriction_text="Threats To Validity",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        use_predicate_as_value=True,
        restriction_value="True"
    )
).generate()

print_qa_pairs(qa_pairs)
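
The counting templates above expect an answer that states how many publications share a value. As a rough way to sanity-check such an answer, the sketch below counts distinct publications in a list of `(paper, predicate, value)` tuples. This tuple layout, the function `count_publications`, and the sample data are assumptions for illustration and do not reflect the triple format used by `graph`.

```python
def count_publications(triples, predicate, value):
    """Count distinct papers that have `value` for `predicate`."""
    # A set removes papers that appear in more than one matching triple.
    papers = {s for s, p, o in triples if p == predicate and o == value}
    return len(papers)


# Hypothetical sample data.
triples = [
    ("paper_a", "Evaluation method", "Case Study"),
    ("paper_a", "Evaluation method", "Case Study"),  # duplicate triple
    ("paper_b", "Evaluation method", "Case Study"),
    ("paper_c", "Evaluation method", "Benchmark"),
]

case_study_count = count_publications(triples, "Evaluation method", "Case Study")
print(case_study_count)
```

Counting distinct subjects rather than matching triples avoids inflating the number when the same paper contributes several triples to the context.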

### Ranking

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which publications have the evaluation method [evaluation method name] ranked by the publication year?",
            additional_requirements=[
                "The context should only include the triples of the evaluation methods of the papers.",
                "The answer should be a list of publication titles ranked by publication year",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which publications have the evaluation sub-property [evaluation sub-property name] ranked by the publication year?",
            additional_requirements=[
                "The context should only include the triples of the evaluation sub-properties of the papers.",
                "The answer should be a list of publication titles ranked by publication year",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P162024",
        restriction_text="Sub-Property",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

### Relationship

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_o3_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What is the proportion of papers that have the research object [research object name] per year?",
            additional_requirements=[
                "You must include all the triples given to you in the context",
                "First list the amount of times the entity was used in the year",
                "Then give a final statement about the proportion",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year"
            ),
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What is the proportion of papers that have the evaluation method [evaluation method name] per year?",
            additional_requirements=[
                "The context should include all triples given to you",
                "First list the amount of times the entity was used in the year",
                "Then give a final statement about the proportion",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year"
            ),
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)
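
The two relationship templates ask the model to first list how often an entity was used in each year and then state the proportion. One possible reading, sketched below, is the share of each year in the total number of uses. The function `usage_share_per_year` and the list of years are hypothetical and only illustrate the arithmetic.

```python
from collections import Counter


def usage_share_per_year(years):
    """Map each year to (count, share of all uses), in chronological order."""
    counts = Counter(years)
    total = sum(counts.values())
    return {year: (n, n / total) for year, n in sorted(counts.items())}


# Hypothetical publication years of papers that use one research object.
years = [2019, 2020, 2020, 2021]
shares = usage_share_per_year(years)

for year, (count, share) in shares.items():
    print(f"{year}: {count} paper(s), {share:.0%} of all uses")
```

If the intended proportion is relative to all papers published in a year instead, the denominator would have to come from the full set of publications rather than from the cluster alone.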

### Comparative

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many papers with the evaluation method [evaluation method name] have their input data marked as [input type] compared to those with the input data marked as [input type]?",
            additional_requirements=[
                "The context should include all triples given to you.",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Uses Input Data",
                information_value_restriction="True",
                information_value_predicate_restriction=["available", "None"]
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_o3_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many papers have used the sub-property [sub-property name] compared to those that used the sub-property [sub-property name], with the research object [research object name]?",
            additional_requirements=[
                "The context should include all triples given to you.",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Sub-Property",
                information_value_restriction=["Portability", "Usability"]
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
    )
).generate()

print_qa_pairs(qa_pairs)

### Negation

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which papers use the evaluation method [evaluation method name], but do not have any input data available?",
            additional_requirements=[
                "You need to include all context ids in your output. For example, if there are 4 contexts your output has to be [0, 1, 2, 3]",
                "Evaluation methods are encoded as: (Evaluation Method Entity, Name, [Method Name])"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Uses Input Data",
                information_value_restriction="True",
                information_value_predicate_restriction="None",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which papers use the research object [research object name], but do not have any input data available?",
            additional_requirements=[
                "The context should include all triples given to you.",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Uses Input Data",
                information_value_restriction="True",
                information_value_predicate_restriction="None",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
    )
).generate()

print_qa_pairs(qa_pairs)

### Superlative

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_o3_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which is the most used paper class for publications with the sub-property [sub property name]?",
            additional_requirements=[
                "The context should include all triples given to you",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="paper class",
                information_value_restriction="True",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P162024",
        restriction_text="Sub-Property",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_o3_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which is the most used paper class for publications with the evaluation method [method name] and the research object [research object name]?",
            additional_requirements=[
                "The context should include all triples given to you",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Object",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="paper class",
                information_value_restriction="True",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)
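
For the superlative templates, the expected answer is the value that occurs most often among the papers of a cluster. The sketch below shows one way to determine it, including the case of a tie. The function `most_used` and the class names other than "Philosophical Papers" are hypothetical examples.

```python
from collections import Counter


def most_used(values):
    """Return all values with the highest frequency, sorted alphabetically."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(v for v, n in counts.items() if n == top)


# Hypothetical paper classes collected for one cluster.
paper_classes = ["Evaluation Research", "Philosophical Papers", "Evaluation Research"]
print(most_used(paper_classes))
```

Returning a list instead of a single value makes ties explicit, which matters when a generated answer names only one of several equally frequent classes.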

## Use Case 4

### Basic

In [None]:
# The answer should contain exactly one specific paper, which is a special case in our data
# because it is the only paper of the type "Philosophical Papers". Therefore, we use the
# FromTopicEntityGenerator to generate the question.
qa_pairs = []
qa_strategy = FromTopicEntityGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    from_topic_entity_options=FromTopicEntityGeneratorOptions(
        topic_entity=graph.get_entity_by_id("R873077"),  # The ID of the paper
        maximum_subgraph_size=100
    ),
    options=GenerationOptions(
        template_text="Which research object is used with the paper class [paper class]?",
        additional_requirements=[
            "The context should only include the triples that contain the paper class and research object of the paper",
            "You can assume that it is the only paper with those specific conditions",
            "The context should contain two triples: (R873092:Paper Class, [the class name], True) and (Research Object Entity, Name, [research object name])",
        ],
        validate_contexts=False,
        convert_path_to_text=False,
        classify_questions=False,
    )
)
qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

### Aggregation

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which research objects are used in conjunction with the [evaluation method name] evaluation method?",
            additional_requirements=[
                "The context should only include the triples that contain the evaluation methods and research objects of the paper"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=(
            AdditionalInformationRestriction(
                information_predicate="Object",
            ),
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which evaluation methods are used in conjunction with the evaluation sub-property [evaluation sub property name]?",
            additional_requirements=[
                "The context should only include the triples that contain the evaluation sub-properties and evaluation methods of the paper"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=(
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
            ),
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P162024",
        restriction_text="Sub-Property",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

### Counting

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many evaluation methods are used in conjunction with the evaluation sub-property [evaluation sub property name]?",
            additional_requirements=[
                "The context should only include the triples that contain the evaluation methods and evaluation sub-properties of the paper",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=(
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
            ),
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P162024",
        restriction_text="Sub-Property",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_o3_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many evaluation methods are used in conjunction with the [evaluation sub property name] sub-property?",
            additional_requirements=[
                "You need to include all context ids in your output. For example, if there are 4 contexts your output has to be [0, 1, 2, 3]",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=(
            AdditionalInformationRestriction(
                information_predicate="Evaluation method"
            ),
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P162024",
        restriction_text="Sub-Property",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many research objects are used in conjunction with the [evaluation sub property name] sub-property?",
            additional_requirements=[
                "The context should only include the triples that contain the evaluation sub-properties and research objects of the paper"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=(
            AdditionalInformationRestriction(
                information_predicate="Object",
            ),
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P162024",
        restriction_text="Sub-Property",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

### Ranking

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which evaluation methods are used in conjunction with the evaluation sub-property [evaluation sub property name] ranked in descending alphabetical order?",
            additional_requirements=[
                "The context should only include the triples that contain the evaluation methods and evaluation sub-properties of the paper",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=(
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
            ),
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P162024",
        restriction_text="Sub-Property",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which research objects are used in conjunction with the [evaluation sub property name] sub-property ranked in descending alphabetical order?",
            additional_requirements=[
                "The context should only include the triples that contain the evaluation sub-properties and research objects of the paper"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=(
            AdditionalInformationRestriction(
                information_predicate="Object",
            ),
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P162024",
        restriction_text="Sub-Property",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)
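
Both ranking templates in this use case ask for a list in descending alphabetical order. A minimal sketch of how such a reference ordering could be produced is shown below, assuming that duplicates are dropped and that case should not affect the order. The function `rank_descending` and the sample names are hypothetical.

```python
def rank_descending(names):
    """Deduplicate names and sort them Z to A, ignoring case."""
    return sorted(set(names), key=str.casefold, reverse=True)


# Hypothetical evaluation method names found for one sub-property.
methods = ["benchmark", "Case Study", "Argumentation", "Case Study"]
ranked = rank_descending(methods)
print(ranked)
```

Using `str.casefold` as the sort key keeps lowercase entries such as "benchmark" from being sorted after all capitalized entries, which plain string comparison would do.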

### Negation

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which research objects have occurrences where the evaluation guideline is marked as 'false'?",
            additional_requirements=[
                "The context should only include the triples that contain the evaluation guidelines and research objects of the paper"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=(
            AdditionalInformationRestriction(
                information_predicate="Has Guideline",
                information_value_restriction="False"
            ),
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which research objects that have the evaluation method [evaluation method name] do not use evaluation guidelines?",
            additional_requirements=[
                "Include all context given to you"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=(
            AdditionalInformationRestriction(
                information_predicate="Has Guideline",
                information_value_restriction="False",
            ),
            AdditionalInformationRestriction(
                information_predicate="Object"
            ),
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which evaluation sub-properties are used in the research object [research object name] but do not use evaluation guidelines?",
            additional_requirements=[
                "Include all context given to you"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=(
            AdditionalInformationRestriction(
                information_predicate="Has Guideline",
                information_value_restriction="False",
            ),
            AdditionalInformationRestriction(
                information_predicate="Sub-Property"
            ),
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

### Superlative

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which is the most used sub-property that is used with the research object [research object name]?",
            additional_requirements=[
                "The context should only include the triples that contain the sub-properties and research objects of the paper"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=(
            AdditionalInformationRestriction(
                information_predicate="Sub-Property",
            ),
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P123038",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which is the most used evaluation method that is used with the research object [research object name]?",
            additional_requirements=[
                "The context should only include the triples that contain the evaluation methods and research objects of the paper"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=(
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
            ),
        )
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P123038",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)
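
As a sanity check for superlative questions, the expected "most used" value can be recomputed from the golden triples. This is a minimal sketch, assuming the triples are available as `(subject, predicate, object)` tuples. The helper `most_used` and the sample data are hypothetical and not part of the notebook's library.

```python
from collections import Counter


def most_used(triples, predicate):
    """Return (values, count) for the most frequent object(s) of `predicate`.

    Ties are reported together, sorted alphabetically.
    """
    counts = Counter(obj for _, pred, obj in triples if pred == predicate)
    if not counts:
        return [], 0
    top = max(counts.values())
    return sorted(value for value, n in counts.items() if n == top), top


# Hypothetical golden triples for one research-object cluster.
sample_triples = [
    ("paper-1", "Evaluation method", "Case Study"),
    ("paper-2", "Evaluation method", "Technical Experiment"),
    ("paper-3", "Evaluation method", "Case Study"),
    ("paper-3", "publication year", "2020"),
]

print(most_used(sample_triples, "Evaluation method"))
```

Returning all tied values makes it visible when a cluster has no unique superlative answer.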

### Comparative

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How often is the sub-property [sub-property name] used in comparison to the sub-property [sub-property] with the research object [research object name]?",
            additional_requirements=[
                "You need to include all context ids in your output. For example, if there are 4 contexts your output has to be [0, 1, 2, 3]",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Sub-Property",
                information_value_restriction=["Context coverage", "Satisfaction"]
            ),
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        golden_triple_minimum=6
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How often is the sub-property [sub-property name] used in comparison to the sub-property [sub-property] with the evaluation method [evaluation method name]?",
            additional_requirements=[
                "You need to include all context ids in your output. For example, if there are 4 contexts your output has to be [0, 1, 2, 3]",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Sub-Property",
                information_value_restriction=["Maintainability", "Usability"]
            ),
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        golden_triple_minimum=6
    )
).generate()

print_qa_pairs(qa_pairs)

### Relationship

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What is the proportion of the evaluation method [evaluation method name] that is applied per year?",
            additional_requirements=[
                "The context should include all triples given to you",
                "First list the number of times the entity was used in each year",
                "Then give a final statement about the proportion",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What is the proportion of the research object [research object name] that is applied per year?",
            additional_requirements=[
                "The context should include all triples given to you",
                "Only generate one answer and question pair",
                "First list the number of times the entity was used in each year",
                "Then give a final statement about the proportion",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)
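
The relationship questions ask for usage counts per year followed by a proportion. A minimal sketch of that arithmetic, assuming the publication years of one cluster have already been collected into a list. The helper `yearly_proportions` and the sample years are hypothetical.

```python
from collections import Counter


def yearly_proportions(years):
    """Map each year to (count, share of all occurrences)."""
    counts = Counter(years)
    total = sum(counts.values())
    return {year: (n, n / total) for year, n in sorted(counts.items())}


# Hypothetical publication years of the papers in one cluster.
sample_years = ["2019", "2020", "2020", "2021"]

for year, (count, share) in yearly_proportions(sample_years).items():
    print(f"{year}: used {count} time(s), {share:.0%} of all uses")
```

Such a check can be run against the generated answers, since the language model is asked to do this counting itself.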

## Use Case 5

### Basic

In [None]:
qa_pairs = []
for _ in range(3):
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=research_field
        ),     
        options=GenerationOptions(
            template_text="What is the evaluation sub-property used on the research object [research object name] in the publication '[paper title]'?",
            additional_requirements=[],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_random_publication()
        ),     
        options=GenerationOptions(
            template_text="What is the evaluation sub-property used on the research object [research object name] in the publication '[paper title]'?",
            additional_requirements=[],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = []
for _ in range(3):
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=research_field
        ),     
        options=GenerationOptions(
            template_text="What is the evaluation method used on the research object [research object name] in the publication '[paper title]'?",
            additional_requirements=[],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())
    
    qa_strategy = FromTopicEntityGenerator(
        graph=graph,
        llm_adapter=gpt_4o_mini,
        from_topic_entity_options=FromTopicEntityGeneratorOptions(
            topic_entity=graph.get_random_publication()
        ),     
        options=GenerationOptions(
            template_text="What is the evaluation method used on the research object [research object name] in the publication '[paper title]'?",
            additional_requirements=[],
            validate_contexts=False,
            convert_path_to_text=False,
            classify_questions=False,
        )     
    )
    qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)

### Aggregation

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What evaluation methods have been published by the author [author name] with the research object [research object name]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
                "The Evaluation methods are formatted as '('Evaluation Method Entity', Name, [Evaluation Method]) in the contexts given to you",
                "The answer you generate should list the evaluation methods"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Object",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What evaluation methods have been published by the author [author name] with the sub-property [sub-property name]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
                "The Evaluation methods are formatted as '('Evaluation Method Entity', Name, [Evaluation Method]) in the contexts given to you",
                "The answer you generate should list the evaluation methods"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Sub-Property",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

### Counting

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many evaluation methods have been published by the author [author name] with the sub-property [sub-property name]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
                "The Evaluation methods are formatted as '('Evaluation Method Entity', Name, [Evaluation Method]) in the contexts given to you",
                "The answer you generate should count the number of evaluation methods"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Sub-Property",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many evaluation methods have been published by the author [author name] with the research object [research object name]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
                "The Evaluation methods are formatted as '('Evaluation Method Entity', Name, [Evaluation Method]) in the contexts given to you",
                "The answer you generate should count the number of evaluation methods"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Object",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

### Ranking

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What evaluation methods have been published by the author [author name] with the research object [research object name] ranked in descending alphabetical order?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
                "The Evaluation methods are formatted as '('Evaluation Method Entity', Name, [Evaluation Method]) in the contexts given to you",
                "The answer you generate should list the evaluation methods",
                "The answer should be a list of evaluation methods in descending alphabetical order"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Object",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What evaluation sub-properties have been published by the author [author name] with the evaluation method [evaluation method name] ranked in descending alphabetical order?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
                "The Evaluation methods are formatted as '('Evaluation Method Entity', Name, [Evaluation Method]) in the contexts given to you",
                "The answer you generate should list the evaluation sub-properties",
                "The answer should be ranked in descending alphabetical order"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Sub-Property",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False,
        golden_triple_minimum=6
    )
).generate()

print_qa_pairs(qa_pairs)
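
The ranking questions expect a list in descending alphabetical order. A minimal sketch for verifying a generated answer, assuming the expected values are plain strings. The helper `rank_descending` and the sample values are hypothetical.

```python
def rank_descending(values):
    """Deduplicate exact matches and sort Z to A, ignoring letter case."""
    return sorted(set(values), key=str.casefold, reverse=True)


# Hypothetical evaluation methods found for one author and research object.
sample_methods = ["Case Study", "Technical Experiment", "Interview", "Case Study"]

print(rank_descending(sample_methods))
```

Sorting with `str.casefold` avoids uppercase names being ordered separately from lowercase ones.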

### Comparative

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_o3_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many research objects with the name [research object name] have been published in 2018 in comparison to 2020?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
                "Research objects are formatted as '('Research Object Entity', Name, [Research Object Name]) in the contexts given to you",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                information_value_restriction=["2018", "2020"]
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        golden_triple_minimum=6
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many evaluation methods with the name [evaluation method name] have been published in 2020 in comparison to 2021?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                information_value_restriction=["2020", "2021"],
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        golden_triple_minimum=6
    )
).generate()

print_qa_pairs(qa_pairs)

### Negation

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What evaluation methods that have been published by the author [author name] have no evaluation guidelines?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
                "The Evaluation methods are formatted as '('Evaluation Method Entity', Name, [Evaluation Method]) in the contexts given to you",
                "The answer you generate should list the evaluation methods",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Has Guideline",
                information_value_restriction="False",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What research objects that have been published by the author [author name] have no evaluation method?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
                "The answer you generate should list the research objects that have no evaluation method",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
                information_value_restriction="False",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Object",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)
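
Negation questions ask for entities that lack a property. A minimal sketch of that check, assuming the triples are available as `(subject, predicate, object)` tuples. The helper `subjects_without`, the identifiers, and the sample data are hypothetical, and the real graph may model a missing value differently (for example as an explicit `False` value).

```python
def subjects_without(triples, predicate):
    """Return the sorted subjects that have no triple with `predicate`."""
    all_subjects = {subj for subj, _, _ in triples}
    with_predicate = {subj for subj, pred, _ in triples if pred == predicate}
    return sorted(all_subjects - with_predicate)


# Hypothetical triples for the research objects of one author.
sample_triples = [
    ("object-1", "Name", "Architecture Description Language"),
    ("object-1", "Evaluation method", "Case Study"),
    ("object-2", "Name", "Reference Architecture"),
]

print(subjects_without(sample_triples, "Evaluation method"))
```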

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What research objects that have been published by the author [author name] have their input data set to none?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
                "The answer you generate should list the research objects that have their input data set to none",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Uses Input Data",
                information_value_restriction="True",
                information_value_predicate_restriction="None"
            ),
            AdditionalInformationRestriction(
                information_predicate="Object",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True
    )
).generate()

print_qa_pairs(qa_pairs)

### Superlative

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(  # RERUN
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which is the sub-property that appears the most for the research object [research object name] published by the author [author name]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Object",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Sub-Property"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        golden_triple_minimum=6
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What evaluation method has been used the most in the paper class [paper class name] with the research object [research object name]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
                "The Evaluation methods are formatted as '('Evaluation Method Entity', Name, [Evaluation Method]) in the contexts given to you",
                "The answer you generate should name the evaluation method that is used the most"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Object",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="paper class",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=20,
        golden_triple_limit=10,
        enable_caching=True,
        golden_triple_minimum=6,
        use_predicate_as_value=True,
        restriction_value="True"
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which is the most used evaluation method that is used with the research object [research object name] in the year [year]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
                "The Evaluation methods are formatted as '('Evaluation Method Entity', Name, [Evaluation Method]) in the contexts given to you",
                "The answer you generate should be a list of the evaluation methods"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=20,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

### Relationship

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What is the proportion of the research objects that are used in conjunction with the evaluation method [evaluation method name] between 2019 and 2021?",
            additional_requirements=[
                "The context should include all triples given to you",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                information_value_restriction=["2019", "2020", "2021"],
            ),
            AdditionalInformationRestriction(
                information_predicate="Object"
            ),
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What is the proportion of the evaluation methods that are used in conjunction with the sub-property [sub property name] between 2017 and 2019?",
            additional_requirements=[
                "The context should include all triples given to you",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                information_value_restriction=["2017", "2018", "2019"],
            ),
            AdditionalInformationRestriction(
                information_predicate="Evaluation method"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P162024",
        restriction_text="Sub-Property",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

## Use Case 6

### Basic

In [None]:
# The answer should contain exactly one specific paper. It is a special case in our data
# because it is the only paper of the type "Philosophical Papers". Therefore we use the
# FromTopicEntityGenerator to generate the question directly from that paper.
qa_pairs = []
qa_strategy = FromTopicEntityGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    from_topic_entity_options=FromTopicEntityGeneratorOptions(
        topic_entity=graph.get_entity_by_id("R873077")  # The id of the paper
    ),
    options=GenerationOptions(
        template_text="Which paper that includes the research object [research object name] has the paper class [paper class name]?",
        additional_requirements=[
            "The context should only include the triple that contains the paper class and research object of the paper",
            "You can assume that it is the only paper with those specific conditions"
        ],
        validate_contexts=False,
        convert_path_to_text=False,
        classify_questions=False,
    )
)
qa_pairs.extend(qa_strategy.generate())

print_qa_pairs(qa_pairs)
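
The cell above relies on exactly one paper satisfying both conditions. A quick way to check such an assumption before generating the question is sketched below, assuming the paper attributes are available as plain dictionaries. The data and the helper `papers_matching` are illustrative only and do not come from the graph API.

```python
# Hypothetical paper attributes; real values would come from the graph.
papers = {
    "paper-a": {"paper class": "class-1", "research object": "object-x"},
    "paper-b": {"paper class": "class-2", "research object": "object-x"},
    "paper-c": {"paper class": "class-1", "research object": "object-y"},
}


def papers_matching(papers, paper_class, research_object):
    """Return the sorted ids of all papers that satisfy both conditions."""
    return sorted(
        paper_id
        for paper_id, attrs in papers.items()
        if attrs["paper class"] == paper_class
        and attrs["research object"] == research_object
    )


matches = papers_matching(papers, "class-2", "object-x")
is_unique = len(matches) == 1  # the question is only well-posed if this holds
```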

In [None]:

qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which paper that includes the evaluation method [evaluation method name] is authored by [author name]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Evaluation method"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=2,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)
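
Every cluster-based cell sets `cluster_metric="cosine"` and `cluster_eps=0.1`. Assuming `cluster_eps` acts as a maximum distance between neighbouring embeddings (as in density-based clustering such as DBSCAN; the notebook does not confirm which algorithm is used), the sketch below shows what a cosine distance of 0.1 means for two toy vectors. The vectors and the helper `cosine_distance` are made up for illustration.

```python
import math


def cosine_distance(u, v):
    """Cosine distance: 1 minus the cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


EPS = 0.1  # same value as cluster_eps in the cells above

near = cosine_distance([1.0, 0.0], [0.95, 0.2])  # vectors with a small angle
far = cosine_distance([1.0, 0.0], [0.0, 1.0])    # orthogonal vectors

# Under the stated assumption, only pairs within EPS count as neighbours.
near_is_neighbour = near <= EPS
far_is_neighbour = far <= EPS
```

With a threshold this small, only labels whose embeddings point in almost the same direction would end up in the same cluster.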

### Aggregation

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(  # RERUN
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which papers used the evaluation method [evaluation method name] in the year [year]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                split_clusters=True
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=4,
        enable_caching=True
    )
).generate()

print_qa_pairs(qa_pairs)
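
The `publication year` restriction with `split_clusters=True` suggests that each cluster is divided by year, so that every question refers to a single `[year]`. The exact behaviour of `split_clusters` is not documented here, so the following is only a sketch of that assumed grouping, using hypothetical `(paper, evaluation method, year)` rows and a hypothetical helper `group_by_method_and_year`.

```python
from collections import defaultdict

# Hypothetical rows: (paper id, evaluation method, publication year).
rows = [
    ("paper-1", "method-a", "2019"),
    ("paper-2", "method-a", "2019"),
    ("paper-3", "method-a", "2020"),
    ("paper-4", "method-b", "2020"),
]


def group_by_method_and_year(rows):
    """Group paper ids by the (evaluation method, year) pair they belong to."""
    groups = defaultdict(list)
    for paper, method, year in rows:
        groups[(method, year)].append(paper)
    return dict(groups)


groups = group_by_method_and_year(rows)

# An aggregation question expects the list of papers in one group,
# a counting question expects the size of that list.
papers_2019 = groups[("method-a", "2019")]
count_2019 = len(papers_2019)
```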

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which papers used [evaluation method] as an evaluation method in the year [year]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which papers used [research object] as a research object in the year [year]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

### Counting

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many papers used [evaluation method] as an evaluation method in the year [year]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many papers used [evaluation method] as an evaluation method in the year [year]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many papers used [research object] as a research object in the year [year]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                split_clusters=True
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many papers used [evaluation method] as an evaluation method in the year [year]?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                split_clusters=True
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True
    )
).generate()

print_qa_pairs(qa_pairs)

### Ranking

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which publications, ranked in descending order of their publication year, have [paper class] as their paper class and include the evaluation method [evaluation method name]?",
            additional_requirements=[
                "The answer should be a list of the publication titles in chronological order",
                "Ensure that the list is ordered correctly based on the publication year in descending order"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="publication year",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="paper class",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False,
        use_predicate_as_value=True,
        restriction_value="True"
    )
).generate()

print_qa_pairs(qa_pairs)
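
For ranking questions, the expected order of the answer can be reproduced independently of the LLM. The sketch below assumes each golden publication is available as a small dictionary with a title and a year. Both the records and the helper `rank_by_year_desc` are hypothetical.

```python
# Hypothetical publications belonging to one (paper class, evaluation method) group.
publications = [
    {"title": "Title A", "year": 2018},
    {"title": "Title B", "year": 2021},
    {"title": "Title C", "year": 2019},
]


def rank_by_year_desc(publications):
    """Return the titles ordered from the newest to the oldest publication year."""
    ordered = sorted(publications, key=lambda p: p["year"], reverse=True)
    return [p["title"] for p in ordered]


ranked_titles = rank_by_year_desc(publications)
```

Because Python's `sorted` is stable, publications from the same year keep their input order, so a generated answer that differs only in the order of same-year titles should not be counted as wrong.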

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which publications, ranked in descending order of their publication year, have [author name] as an author and use the evaluation method [evaluation method name]?",
            additional_requirements=[
                "The context should only include the triples that contain the evaluation methods of the paper",
                "The answer should be a list of publication titles in chronological order",
                "Ensure that the list is ordered correctly based on the publication year in descending order"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="Evaluation method",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="publication year",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P154073",
        restriction_text="authors",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=False
    )
).generate()

print_qa_pairs(qa_pairs)

### Negation

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which papers with the research object [research object name] do not have input data available in the year [year]?",
            additional_requirements=[
                "You must ensure that all the triples given to you are included in the context"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Uses Input Data",
                information_value_restriction="True",
                information_value_predicate_restriction="None"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True
    )
).generate()

print_qa_pairs(qa_pairs)
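
A negation question selects the papers for which a property is absent. The sketch below illustrates that filter, assuming the input-data information has been reduced to a boolean per paper. The field name `uses_input_data`, the records, and the helper `papers_without_input_data` are assumptions for illustration and do not mirror how the graph stores this predicate.

```python
# Hypothetical records for one research object.
papers = [
    {"title": "Title A", "year": "2020", "uses_input_data": True},
    {"title": "Title B", "year": "2020", "uses_input_data": False},
    {"title": "Title C", "year": "2021", "uses_input_data": False},
]


def papers_without_input_data(papers, year):
    """Return the titles of papers from the given year that use no input data."""
    return [
        p["title"]
        for p in papers
        if p["year"] == year and not p["uses_input_data"]
    ]


without_input_2020 = papers_without_input_data(papers, "2020")
```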

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which papers with the evaluation method [evaluation method] do not have tool support available in the year [year]?",
            additional_requirements=[],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="Uses Input Data",
                information_value_restriction="True",
                information_value_predicate_restriction="None"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True
    )
).generate()

print_qa_pairs(qa_pairs)

### Comparative

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many papers have the research object [research object name] in the year 2020 in comparison to 2021?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                information_value_restriction=["2020", "2021"],
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        golden_triple_minimum=6
    )
).generate()

print_qa_pairs(qa_pairs)
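
A comparative answer boils down to two counts and their difference. The following minimal sketch assumes the publication years of all papers in one cluster have been collected into a list. The list and the helper `compare_years` are hypothetical.

```python
from collections import Counter

# Hypothetical publication years of the papers sharing one research object.
years = ["2020", "2021", "2021", "2020", "2021"]


def compare_years(years, first, second):
    """Return (count in first year, count in second year, second minus first)."""
    counts = Counter(years)  # missing years count as 0
    return counts[first], counts[second], counts[second] - counts[first]


count_2020, count_2021, difference = compare_years(years, "2020", "2021")
```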

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="How many papers have the sub-property [sub property name] in the year 2017 in comparison to 2020?",
            additional_requirements=[
                "Include all triples that are provided to you as contexts!"
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                information_value_restriction=["2017", "2020"],
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P162024",
        restriction_text="Sub-Property",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        enable_caching=True,
        golden_triple_minimum=6
    )
).generate()

print_qa_pairs(qa_pairs)

### Superlative

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which paper class has the most papers with the evaluation method [method name] in the publication year [year]?",
            additional_requirements=[
                "You need to include all context ids in your output. For example, if there are 4 contexts your output has to be [0, 1, 2, 3]",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="paper class",
                information_value_restriction="True"
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)
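
For superlative questions the expected answer is the most frequent value, and ties need explicit handling. The sketch below assumes the paper classes of all papers in one (evaluation method, year) group are given as a list. The values and the helper `most_common_classes` are illustrative only.

```python
from collections import Counter

# Hypothetical paper classes of the papers in one group.
paper_classes = ["class-1", "class-2", "class-1", "class-3"]


def most_common_classes(paper_classes):
    """Return all paper classes that share the highest count, sorted by name."""
    counts = Counter(paper_classes)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(cls for cls, n in counts.items() if n == top)


winners = most_common_classes(paper_classes)
```

Returning a list instead of a single value makes ties visible, which helps to spot groups where the question "which paper class has the most papers" has no unique answer.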

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(  # RERUN
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="Which paper class has the most papers with the research object [object name] in the publication year [year]?",
            additional_requirements=[
                "You need to include all context ids in your output. For example, if there are 4 contexts your output has to be [0, 1, 2, 3]",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                split_clusters=True
            ),
            AdditionalInformationRestriction(
                information_predicate="paper class",
                information_value_restriction="True",
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

### Relationship

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What is the proportion of papers with the evaluation method [evaluation method name] between 2019 and 2021?",
            additional_requirements=[
                "The context should include all triples given to you",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                information_value_restriction=["2019", "2020", "2021"],
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P59089",
        restriction_text="Evaluation method",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)

In [None]:
qa_pairs = ClusterBasedQuestionGenerator(
    graph=graph,
    llm_adapter=gpt_4o_mini,
    generator_options=ClusterGeneratorOptions(
        generation_options=GenerationOptions(
            template_text="What is the proportion of papers with the research object [research object name] between 2018 and 2020?",
            additional_requirements=[
                "The context should include all triples given to you",
            ],
            convert_path_to_text=False,
            validate_contexts=False,
            classify_questions=False,
        ),
        additional_restrictions=[
            AdditionalInformationRestriction(
                information_predicate="publication year",
                information_value_restriction=["2018", "2019", "2020"],
            )
        ]
    ),
    cluster_options=ClusterStrategyOptions(
        topic_entity=research_field,
        restriction_type="P47032",
        restriction_text="Object",
        cluster_eps=0.1,
        cluster_metric="cosine",
        cluster_emb_config=embedding_config,
        soft_limit_qa_pairs=10,
        golden_triple_limit=10,
        golden_triple_minimum=6,
        enable_caching=True,
        skip_clusters_with_only_one_root=False
    )
).generate()

print_qa_pairs(qa_pairs)