# CMEMQueryBuilder

The CMEM query builder generates a SPARQL query based on a given ontology and a natural language question.

In [8]:
%pip install cmem-cmempy llama-index python-dotenv

Note: you may need to restart the kernel to use updated packages.


Load the environment from the `.env` file.
Start by running `cp .env-template .env`, then edit the content of `.env` accordingly.

In [9]:
%load_ext dotenv
%dotenv

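Before continuing, it can help to confirm that the settings were actually picked up. The following is a minimal sketch, not part of the package: the helper `missing_env_vars` is hypothetical, and the variable names in `REQUIRED_VARS` are assumptions (`OPENAI_API_KEY` is read by the OpenAI client; `CMEM_BASE_URI` is assumed for cmempy). The authoritative list is whatever `.env-template` contains.

```python
import os
from typing import Iterable, List, Mapping

# Assumed variable names; check .env-template for the actual list.
REQUIRED_VARS = ("OPENAI_API_KEY", "CMEM_BASE_URI")


def missing_env_vars(names: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names that are unset or empty in the given environment mapping."""
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing settings in .env:", ", ".join(missing))
```

An empty `missing` list means every name in `REQUIRED_VARS` has a non-empty value in the current process environment.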


Set up the LLM

In [10]:
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

model = "gpt-4o-mini"
llm = OpenAI(model=model)
Settings.llm = llm

Initialize the query builder with an ontology graph and a context (integration) graph.

In [11]:
from llama_index_cmem.utils.cmem_query_builder import CMEMQueryBuilder

ontology_graph = "http://ld.company.org/prod-vocab/"
context_graph = "http://ld.company.org/prod-inst/"

cmem_query_builder = CMEMQueryBuilder(ontology_graph=ontology_graph, context_graph=context_graph, llm=llm)

## CMEMQuery

Now define your natural language question and let the query builder generate a SPARQL query.
The query builder returns a `CMEMQuery` object, which holds the LLM predictions and the SPARQL queries extracted from them. Because the object keeps this history, you can refine the SPARQL query as often as you need.

In [12]:
from IPython.display import display, Markdown
from llama_index_cmem.utils.cmem_query import CMEMQuery

def generate(prompt) -> CMEMQuery:
    display(Markdown(f"## Prompt: _{prompt}_"))
    cmem_query = cmem_query_builder.generate_sparql(question=prompt)
    display(Markdown(f"### Prediction\n\n{cmem_query.get_last_prediction()}\n\n"))
    # Put the closing fence on its own line so the query renders as a complete code block.
    display(Markdown(f"### SPARQL\n\n```sparql\n{cmem_query.get_last_sparql()}\n```\n\n"))
    return cmem_query

question = "What is the product category where the most product managers are experts in?"
cmem_query = generate(question)

## Prompt: _What is the product category where the most product managers are experts in?_

### Prediction

To answer the user question "What is the product category where the most product managers are experts in?", we need to construct a SPARQL query that counts the number of product managers associated with each product category and then retrieves the category with the highest count.

### SPARQL Query

```sparql
PREFIX pv: <http://ld.company.org/prod-vocab/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?category (COUNT(?manager) AS ?managerCount)
WHERE {
    ?manager a pv:Manager .
    ?manager pv:areaOfExpertise ?category .
}
GROUP BY ?category
ORDER BY DESC(?managerCount)
LIMIT 1
```

### Explanation of the Query

1. **PREFIX Definitions**: We define the necessary prefixes for our query. Here, `pv:` is used for the product vocabulary.

2. **SELECT Clause**: We select two variables:
   - `?category`: This will hold the product category.
   - `COUNT(?manager) AS ?managerCount`: This counts the number of product managers associated with each category.

3. **WHERE Clause**: This is where we specify the conditions for our query:
   - `?manager a pv:Manager`: This line filters the results to include only instances of `pv:Manager`, which represents product managers.
   - `?manager pv:areaOfExpertise ?category`: This line retrieves the product category that each manager is an expert in.

4. **GROUP BY Clause**: We group the results by `?category` to aggregate the counts of managers for each category.

5. **ORDER BY Clause**: We order the results in descending order based on the count of managers (`?managerCount`). This ensures that the category with the most managers appears first.

6. **LIMIT Clause**: We limit the results to 1, so we only get the category with the highest count of product managers.

### Conclusion

This query effectively retrieves the product category with the highest number of product managers who are experts in that category, thus answering the user's question.



### SPARQL

```sparql
PREFIX pv: <http://ld.company.org/prod-vocab/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?category (COUNT(?manager) AS ?managerCount)
WHERE {
    ?manager a pv:Manager .
    ?manager pv:areaOfExpertise ?category .
}
GROUP BY ?category
ORDER BY DESC(?managerCount)
LIMIT 1
```
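The output above shows the full prediction and, separately, the SPARQL pulled out of it. How `CMEMQuery` performs that extraction is not shown in this notebook; the following is only a sketch of one way to do it, assuming the prediction wraps the query in a fence tagged `sparql` as in the output above. The function name `extract_sparql` is hypothetical.

```python
import re
from typing import Optional

FENCE = "`" * 3  # built programmatically so this example nests cleanly in Markdown

# Match the body of the first fenced block tagged "sparql".
_SPARQL_BLOCK = re.compile(FENCE + r"sparql\s*\n(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def extract_sparql(prediction: str) -> Optional[str]:
    """Return the first fenced SPARQL query in an LLM answer, or None if absent."""
    match = _SPARQL_BLOCK.search(prediction)
    return match.group(1).strip() if match else None
```

Returning `None` when no fenced query is found lets the caller decide whether to retry or to ask for a refinement.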



### Refine query

Sometimes the generated SPARQL does not work as expected. In that case, the query builder can refine the previously generated query.

In [13]:
def refine(prompt, cmem_query) -> CMEMQuery:
    display(Markdown(f"## Prompt: _{prompt}_"))
    # Pass the prompt and the previous result explicitly instead of relying on globals.
    cmem_query_refined = cmem_query_builder.refine_sparql(question=prompt, cmem_query=cmem_query)
    display(Markdown("### Prediction list\n\n"))
    for prediction in cmem_query_refined.get_prediction_list():
        display(Markdown(f"{prediction}\n-----\n"))
    display(Markdown("### SPARQL list\n\n"))
    for sparql in cmem_query_refined.get_sparql_list():
        # Close the code fence so the separator renders outside the block.
        display(Markdown(f"```sparql\n{sparql}\n```\n-----\n"))
    return cmem_query_refined

cmem_query_refined = refine(question, cmem_query)

## Prompt: _What is the product category where the most product managers are experts in?_

### Prediction list



To answer the user question "What is the product category where the most product managers are experts in?", we need to construct a SPARQL query that counts the number of product managers associated with each product category and then retrieves the category with the highest count.

### SPARQL Query

```sparql
PREFIX pv: <http://ld.company.org/prod-vocab/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?category (COUNT(?manager) AS ?managerCount)
WHERE {
    ?manager a pv:Manager .
    ?manager pv:areaOfExpertise ?category .
}
GROUP BY ?category
ORDER BY DESC(?managerCount)
LIMIT 1
```

### Explanation of the Query

1. **PREFIX Definitions**: We define the necessary prefixes for our query. Here, `pv:` is used for the product vocabulary.

2. **SELECT Clause**: We select two variables:
   - `?category`: This will hold the product category.
   - `COUNT(?manager) AS ?managerCount`: This counts the number of product managers associated with each category.

3. **WHERE Clause**: This is where we specify the conditions for our query:
   - `?manager a pv:Manager`: This line filters the results to include only instances of `pv:Manager`, which represents product managers.
   - `?manager pv:areaOfExpertise ?category`: This line retrieves the product category that each manager is an expert in.

4. **GROUP BY Clause**: We group the results by `?category` to aggregate the counts of managers for each category.

5. **ORDER BY Clause**: We order the results in descending order based on the count of managers (`?managerCount`). This ensures that the category with the most managers appears first.

6. **LIMIT Clause**: We limit the results to 1, so we only get the category with the highest count of product managers.

### Conclusion

This query effectively retrieves the product category with the highest number of product managers who are experts in that category, thus answering the user's question.
-----


To refine the SPARQL query based on the user question "What is the product category where the most product managers are experts in?", we need to ensure that the query accurately counts the number of product managers associated with each product category and returns the category with the highest count.

### Explanation of the Refinement:
1. **Focus on Product Managers**: The query should specifically look for instances of `pv:Manager` that have an area of expertise in `pv:ProductCategory`.
2. **Count Managers**: We will count the number of distinct managers for each category.
3. **Ordering and Limiting**: The results should be ordered by the count of managers in descending order, and we will limit the results to return only the top category.

### Refined SPARQL Query:
```sparql
PREFIX pv: <http://ld.company.org/prod-vocab/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?category (COUNT(DISTINCT ?manager) AS ?managerCount)
WHERE {
    ?manager a pv:Manager .
    ?manager pv:areaOfExpertise ?category .
}
GROUP BY ?category
ORDER BY DESC(?managerCount)
LIMIT 1
```

### Explanation of the Query:
- **PREFIX Definitions**: We define the necessary prefixes for the vocabulary used in the query.
- **SELECT Clause**: We select the product category (`?category`) and the count of distinct managers (`COUNT(DISTINCT ?manager) AS ?managerCount`).
- **WHERE Clause**: 
  - We specify that `?manager` must be of type `pv:Manager`.
  - We link each manager to their area of expertise, which is a product category (`?manager pv:areaOfExpertise ?category`).
- **GROUP BY Clause**: We group the results by the product category to aggregate the counts of managers.
- **ORDER BY Clause**: We order the results by the count of managers in descending order to get the category with the most managers at the top.
- **LIMIT Clause**: We limit the results to only one entry, which will be the category with the highest count of product managers.

This refined query should effectively answer the user's question while adhering to the specified RDF ontology.
-----


### SPARQL list



```sparql
PREFIX pv: <http://ld.company.org/prod-vocab/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?category (COUNT(?manager) AS ?managerCount)
WHERE {
    ?manager a pv:Manager .
    ?manager pv:areaOfExpertise ?category .
}
GROUP BY ?category
ORDER BY DESC(?managerCount)
LIMIT 1
```
-----


```sparql
PREFIX pv: <http://ld.company.org/prod-vocab/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?category (COUNT(DISTINCT ?manager) AS ?managerCount)
WHERE {
    ?manager a pv:Manager .
    ?manager pv:areaOfExpertise ?category .
}
GROUP BY ?category
ORDER BY DESC(?managerCount)
LIMIT 1
```
-----
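The only change in the refined query is `COUNT(DISTINCT ?manager)`. For the two-triple pattern shown here, both queries should return the same count when run against a single graph, because an RDF graph is a set of triples and each manager/category pair is matched once. `DISTINCT` starts to matter when further patterns multiply the solution rows. The sketch below illustrates this in plain Python with made-up rows; the names and the extra product column are hypothetical and not taken from the example graph.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Made-up solution rows (manager, category, product), as they might look
# if the query also joined each manager to a product.
ROWS = [
    ("alice", "sensors", "p1"),
    ("alice", "sensors", "p2"),  # same manager and category, different product
    ("bob", "sensors", "p3"),
    ("carol", "cables", "p4"),
]


def managers_per_category(rows: Iterable[Tuple[str, str, str]], distinct: bool) -> Dict[str, int]:
    """Mimic COUNT(?manager) or COUNT(DISTINCT ?manager) grouped by category."""
    grouped = defaultdict(list)
    for manager, category, _product in rows:
        grouped[category].append(manager)
    return {c: len(set(m)) if distinct else len(m) for c, m in grouped.items()}


print(managers_per_category(ROWS, distinct=False))  # {'sensors': 3, 'cables': 1}
print(managers_per_category(ROWS, distinct=True))   # {'sensors': 2, 'cables': 1}
```

Without `distinct`, alice is counted twice for `sensors` because she appears in two rows; with it, each manager contributes once per category.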
