# CMEMQueryBuilder

The CMEM query builder generates a SPARQL query based on a given ontology and a natural language question.

In [1]:
%pip install cmem-cmempy llama-index python-dotenv

Note: you may need to restart the kernel to use updated packages.


Load the environment from a `.env` file.
Start by running `cp .env-template .env`, then edit `.env` to match your setup.

In [2]:
%load_ext dotenv
%dotenv
%reload_ext dotenv
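
Before continuing, it can help to confirm that the values from `.env` actually reached the kernel. The following is a minimal sketch using only the standard library. The variable names (`OPENAI_API_KEY`, `CMEM_BASE_URI`) are assumptions and `missing_env_vars` is a hypothetical helper, so adjust both to whatever your `.env-template` defines.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Assumed variable names; replace them with the keys from your .env-template.
REQUIRED_VARS = ("OPENAI_API_KEY", "CMEM_BASE_URI")


def missing_env_vars(
    required: tuple = REQUIRED_VARS, env: Optional[Mapping] = None
) -> list:
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

If the check reports missing names, edit `.env` and run the `%dotenv` cell again.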

Set up the LLM

In [3]:
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

model = "gpt-4o-mini"
llm = OpenAI(model=model)
Settings.llm = llm

Initialize the query builder with an ontology graph and a context (integration) graph.

In [4]:
from llama_index_cmem.utils.cmem_query_builder import CMEMQueryBuilder

ontology_graph = "http://ld.company.org/prod-vocab/"
context_graph = "http://ld.company.org/prod-inst/"

cmem_query_builder = CMEMQueryBuilder(
    ontology_graph=ontology_graph, context_graph=context_graph, llm=llm
)

## CMEMQuery

Now define your natural language question and let the query builder generate a SPARQL query.
The query builder returns a `CMEMQuery` object, which holds the LLM predictions and the SPARQL queries extracted from them. This lets you refine the SPARQL query as often as you need.
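
To illustrate what a "SPARQL extract" is, the sketch below pulls the first query out of a prediction, assuming the LLM wraps it in a code fence tagged `sparql`. It is not the library's implementation: `extract_sparql` and the sample prediction are hypothetical, and the fence marker is built from single backticks only to keep the example readable.

```python
import re
from typing import Optional

FENCE = "`" * 3  # three backticks

# Matches a fenced block tagged "sparql" and captures its body.
SPARQL_BLOCK = re.compile(
    FENCE + r"sparql\s*\n(.*?)" + FENCE, re.DOTALL | re.IGNORECASE
)


def extract_sparql(prediction: str) -> Optional[str]:
    """Return the body of the first sparql-fenced block, or None if absent."""
    match = SPARQL_BLOCK.search(prediction)
    return match.group(1).strip() if match else None


# Hypothetical prediction text: explanation, fenced query, more explanation.
prediction = (
    "Some explanation.\n\n"
    + FENCE + "sparql\n"
    + "SELECT ?s WHERE { ?s ?p ?o }\n"
    + FENCE + "\n\nMore explanation."
)
print(extract_sparql(prediction))
```

Keeping the full prediction next to the extracted query means the explanation stays available for review while the query itself can be executed or refined.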

In [5]:
from IPython.display import Markdown, display

from llama_index_cmem.utils.cmem_query import CMEMQuery


def generate(prompt: str) -> CMEMQuery:
    """Generate a CMEM query."""
    display(Markdown(f"## Prompt: _{prompt}_"))
    cmem_query = cmem_query_builder.generate_sparql(question=prompt)
    display(Markdown(f"### Prediction\n\n{cmem_query.get_last_prediction()!s}\n\n"))
    # Newline before the closing fence so the last query line is not merged with it.
    display(Markdown(f"### SPARQL\n\n```sparql\n{cmem_query.get_last_sparql()!s}\n```\n\n"))
    return cmem_query


question = "What is the product category where the most product managers are experts in?"
generated_cmem_query = generate(question)

## Prompt: _What is the product category where the most product managers are experts in?_

### Prediction

To answer the user question "What is the product category where the most product managers are experts in?" using the provided RDF ontology, we need to construct a SPARQL query that counts the number of product managers associated with each product category and then retrieves the category with the highest count.

### SPARQL Query

```sparql
PREFIX pv: <http://ld.company.org/prod-vocab/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?category (COUNT(?manager) AS ?managerCount)
WHERE {
    ?manager a pv:Manager .
    ?manager pv:areaOfExpertise ?category .
}
GROUP BY ?category
ORDER BY DESC(?managerCount)
LIMIT 1
```

### Explanation of the Query

1. **PREFIX Declarations**: We define the prefixes for the ontology we are using. `pv:` is the prefix for the product vocabulary.

2. **SELECT Clause**: We want to select the product category (`?category`) and the count of product managers (`COUNT(?manager) AS ?managerCount`).

3. **WHERE Clause**: 
   - We specify that `?manager` must be of type `pv:Manager`, which identifies them as product managers.
   - We then link each manager to their area of expertise using the property `pv:areaOfExpertise`, which connects the manager to a product category (`?category`).

4. **GROUP BY Clause**: We group the results by `?category` to aggregate the counts of managers for each category.

5. **ORDER BY Clause**: We order the results in descending order based on the count of managers (`?managerCount`), so that the category with the most managers appears first.

6. **LIMIT Clause**: We limit the results to just one entry, which will be the product category with the highest number of product managers.

### Result
This query will return the product category that has the most product managers as experts, along with the count of those managers.



### SPARQL

```sparql
PREFIX pv: <http://ld.company.org/prod-vocab/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?category (COUNT(?manager) AS ?managerCount)
WHERE {
    ?manager a pv:Manager .
    ?manager pv:areaOfExpertise ?category .
}
GROUP BY ?category
ORDER BY DESC(?managerCount)
LIMIT 1
```



### Refine query

Sometimes the generated SPARQL does not work as expected. In that case, the query builder can refine it.
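
A refinement does not necessarily change the query. One way to tell is to compare the last two entries of the SPARQL list (as returned by `get_sparql_list()` in the next cell) after collapsing whitespace. This is a rough sketch: `normalize_sparql` and `changed_by_refinement` are hypothetical helpers, and collapsing whitespace would also alter string literals, so treat the result as a hint rather than a semantic comparison.

```python
import re


def normalize_sparql(query: str) -> str:
    """Collapse whitespace runs so formatting differences are ignored."""
    return re.sub(r"\s+", " ", query).strip()


def changed_by_refinement(sparql_list: list) -> bool:
    """Return True if the last query differs from the one before it."""
    if len(sparql_list) < 2:
        return False  # nothing to compare against
    return normalize_sparql(sparql_list[-1]) != normalize_sparql(sparql_list[-2])


# Two queries that differ only in line breaks and spacing.
original = "SELECT ?s\nWHERE { ?s ?p ?o }"
reformatted = "SELECT ?s   WHERE { ?s ?p ?o }\n"
print(changed_by_refinement([original, reformatted]))
```

If the check reports no change, rephrasing the question is likely to help more than refining again with the same prompt.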

In [6]:
def refine(prompt: str) -> None:
    """Refine the SPARQL query generated above."""
    display(Markdown(f"## Prompt: _{prompt}_"))
    # Use the function argument rather than the global `question`.
    cmem_query_refined = cmem_query_builder.refine_sparql(
        question=prompt, cmem_query=generated_cmem_query
    )
    display(Markdown("### Prediction list\n\n"))
    for prediction in cmem_query_refined.get_prediction_list():
        # Blank line before the rule so it is not parsed as a setext heading.
        display(Markdown(f"{prediction!s}\n\n-----\n"))
    display(Markdown("### SPARQL list\n\n"))
    for sparql in cmem_query_refined.get_sparql_list():
        # Close the code fence before the separator.
        display(Markdown(f"```sparql\n{sparql!s}\n```\n\n-----\n"))


refine(question)

## Prompt: _What is the product category where the most product managers are experts in?_

### Prediction list



To answer the user question "What is the product category where the most product managers are experts in?" using the provided RDF ontology, we need to construct a SPARQL query that counts the number of product managers associated with each product category and then retrieves the category with the highest count.

### SPARQL Query

```sparql
PREFIX pv: <http://ld.company.org/prod-vocab/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?category (COUNT(?manager) AS ?managerCount)
WHERE {
    ?manager a pv:Manager .
    ?manager pv:areaOfExpertise ?category .
}
GROUP BY ?category
ORDER BY DESC(?managerCount)
LIMIT 1
```

### Explanation of the Query

1. **PREFIX Declarations**: We define the prefixes for the ontology we are using. `pv:` is the prefix for the product vocabulary.

2. **SELECT Clause**: We want to select the product category (`?category`) and the count of product managers (`COUNT(?manager) AS ?managerCount`).

3. **WHERE Clause**: 
   - We specify that `?manager` must be of type `pv:Manager`, which identifies them as product managers.
   - We then link each manager to their area of expertise using the property `pv:areaOfExpertise`, which connects the manager to a product category (`?category`).

4. **GROUP BY Clause**: We group the results by `?category` to aggregate the counts of managers for each category.

5. **ORDER BY Clause**: We order the results in descending order based on the count of managers (`?managerCount`), so that the category with the most managers appears first.

6. **LIMIT Clause**: We limit the results to just one entry, which will be the product category with the highest number of product managers.

### Result
This query will return the product category that has the most product managers as experts, along with the count of those managers.

-----


To refine the SPARQL query based on the user question "What is the product category where the most product managers are experts in?", we need to ensure that the query correctly counts the number of product managers associated with each product category and returns the category with the highest count. 

### Explanation of the Refinement:
1. **Focus on Product Managers**: The query should specifically target instances of `pv:Manager` and their associated expertise in `pv:ProductCategory`.
2. **Count Managers per Category**: We will count the number of managers for each category using `COUNT(?manager)`.
3. **Ordering and Limiting Results**: The results should be ordered by the count of managers in descending order, and we will limit the results to return only the top category.

### Refined SPARQL Query:
```sparql
PREFIX pv: <http://ld.company.org/prod-vocab/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?category (COUNT(?manager) AS ?managerCount)
WHERE {
    ?manager a pv:Manager .
    ?manager pv:areaOfExpertise ?category .
}
GROUP BY ?category
ORDER BY DESC(?managerCount)
LIMIT 1
```

### Explanation of the Query:
- **PREFIX Definitions**: We define the necessary prefixes for the vocabulary used in the query.
- **SELECT Clause**: We select the `?category` and the count of `?manager` as `?managerCount`.
- **WHERE Clause**: 
  - We specify that `?manager` must be of type `pv:Manager`.
  - We link each manager to their area of expertise using the property `pv:areaOfExpertise`, which points to the `?category`.
- **GROUP BY Clause**: This groups the results by `?category`, allowing us to count the number of managers per category.
- **ORDER BY Clause**: We order the results by the count of managers in descending order to get the category with the most managers at the top.
- **LIMIT Clause**: We limit the results to just one, which will be the category with the highest count of product managers.

This refined query should effectively answer the user's question by providing the product category with the most product managers as experts.

-----


### SPARQL list



```sparql
PREFIX pv: <http://ld.company.org/prod-vocab/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?category (COUNT(?manager) AS ?managerCount)
WHERE {
    ?manager a pv:Manager .
    ?manager pv:areaOfExpertise ?category .
}
GROUP BY ?category
ORDER BY DESC(?managerCount)
LIMIT 1
```

-----


```sparql
PREFIX pv: <http://ld.company.org/prod-vocab/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?category (COUNT(?manager) AS ?managerCount)
WHERE {
    ?manager a pv:Manager .
    ?manager pv:areaOfExpertise ?category .
}
GROUP BY ?category
ORDER BY DESC(?managerCount)
LIMIT 1
```

-----
