# CMEMQueryBuilder

The CMEM query builder uses an LLM to generate a SPARQL query from a given ontology and a natural language question.

In [None]:
%pip install cmem-cmempy llama-index python-dotenv

Load environment variables from a `.env` file.
Start with `cp .env-template .env`, then edit the values in `.env` for your setup (for example, your OpenAI API key and CMEM connection settings).

In [1]:
%load_ext dotenv
%dotenv
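If a variable is missing, the problem may only show up later as a connection or authentication error, so it can help to check right after loading. The sketch below assumes specific variable names: `OPENAI_API_KEY` (read by the OpenAI client) and the `CMEM_BASE_URI` / `OAUTH_*` names that cmempy commonly uses for a client-credentials login. Your `.env-template` may list different ones, so treat `REQUIRED_VARS` as an assumption and adjust it.

```python
import os
from collections.abc import Iterable, Mapping

# Assumed variable names; align them with your own .env-template.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "CMEM_BASE_URI",
    "OAUTH_GRANT_TYPE",
    "OAUTH_CLIENT_ID",
    "OAUTH_CLIENT_SECRET",
)


def missing_env_vars(names: Iterable[str], environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names that are unset or empty in `environ`."""
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```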

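Set up the LLM. This example uses OpenAI's `gpt-4o-mini` through LlamaIndex and registers it as the global default in `Settings`.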

In [2]:
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

model = "gpt-4o-mini"
llm = OpenAI(model=model)
Settings.llm = llm

Initialize the query builder with an ontology graph and a context (integration) graph.

In [3]:
from llama_index_cmem.utils.cmem_query_builder import CMEMQueryBuilder

ontology_graph = "http://ld.company.org/prod-vocab/"
context_graph = "http://ld.company.org/prod-inst/"

cmem_query_builder = CMEMQueryBuilder(ontology_graph=ontology_graph, context_graph=context_graph, llm=llm)

Now define your natural language question and let the query builder generate a SPARQL query.

In [4]:
question = "What is the product category where the most product managers are experts in?"

cmem_query = cmem_query_builder.generate_sparql(question=question)

## CMEMQuery

The query builder returns a `CMEMQuery` object that holds the LLM predictions and the SPARQL queries extracted from them. Because the object keeps this history, you can refine the SPARQL query as often as you need.
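To make "predictions plus SPARQL extracts" concrete, here is a minimal sketch of such a container. It is hypothetical: `QueryHistory` and `extract_sparql` are illustrative names, and the assumption that the SPARQL is taken from a fenced `sparql` code block is based only on the shape of the predictions printed in this notebook, not on the library's code. The getter names simply mirror the `CMEMQuery` calls used here.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Assumption: the query sits in a fenced sparql code block of the LLM answer.
SPARQL_FENCE = re.compile(r"`{3}sparql\s+(.*?)`{3}", re.DOTALL | re.IGNORECASE)


def extract_sparql(prediction: str) -> str | None:
    """Return the first fenced SPARQL block in an LLM answer, or None."""
    match = SPARQL_FENCE.search(prediction)
    return match.group(1).strip() if match else None


@dataclass
class QueryHistory:
    """Hypothetical stand-in for CMEMQuery: one entry per LLM round."""

    predictions: list[str] = field(default_factory=list)
    sparqls: list[str | None] = field(default_factory=list)

    def add_prediction(self, prediction: str) -> None:
        # Keep the raw answer and its extracted query side by side.
        self.predictions.append(prediction)
        self.sparqls.append(extract_sparql(prediction))

    def get_last_prediction(self) -> str | None:
        return self.predictions[-1] if self.predictions else None

    def get_last_sparql(self) -> str | None:
        return self.sparqls[-1] if self.sparqls else None
```

Keeping both lists in lockstep is what makes refinement cheap: a later round can look at every earlier answer and query, not just the latest one.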

In [5]:
print("Question:\n")
print(question)
print("\n")

print("Prediction:\n")
print(cmem_query.get_last_prediction())
print("\n")

print("SPARQL:\n")
print(cmem_query.get_last_sparql())
print("\n")

Question:

What is the product category where the most product managers are experts in?


Prediction:

To answer the user question "What is the product category where the most product managers are experts in?" using the provided RDF ontology, we need to construct a SPARQL query that counts the number of product managers associated with each product category and identifies the category with the highest count.

### SPARQL Query

```sparql
PREFIX pv: <http://ld.company.org/prod-vocab/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?category (COUNT(?manager) AS ?managerCount)
WHERE {
  ?manager a pv:Manager .
  ?manager pv:areaOfExpertise ?category .
}
GROUP BY ?category
ORDER BY DESC(?managerCount)
LIMIT 1
```

### Explanation of the Query

1. **PREFIX Definitions**: 
   - We define the prefixes for the ontology to simplify the URIs used in the query. `pv:` is used for the product vocabulary.

2. **SELECT Clause**: 
   - We select two variables: `?category`, which represents the
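The generated query groups the `(?manager, ?category)` bindings by category, counts the managers in each group, sorts the counts in descending order, and keeps only the first row. The same aggregation in plain Python, on hypothetical sample data (the manager and category names are made up), looks like this:

```python
from collections import Counter

# Hypothetical bindings that satisfy the WHERE clause: (manager, category).
expertise = [
    ("mgr1", "Sensors"),
    ("mgr2", "Sensors"),
    ("mgr2", "Cables"),
    ("mgr3", "Cables"),
    ("mgr4", "Sensors"),
]

# GROUP BY ?category with COUNT(?manager)
manager_count = Counter(category for _manager, category in expertise)

# ORDER BY DESC(?managerCount) LIMIT 1
top_category, top_count = manager_count.most_common(1)[0]
print(top_category, top_count)  # Sensors 3
```

One caveat carries over to the real query: if two categories have the same count, `LIMIT 1` returns only one of them, and SPARQL does not define which. Raising the limit to a few rows is a cheap way to spot ties.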

### Refine query

Sometimes the generated SPARQL query does not work as expected. In that case, use the query builder to refine it.
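If you have a way to check a query automatically, for example by running it against the context graph and checking that it returns rows, `refine_sparql` can be called in a loop. The sketch below is a hypothetical helper: `build_with_refinement`, the `is_valid` callback, and the `max_refinements` cap are assumptions for illustration; only `generate_sparql`, `refine_sparql`, and `get_last_sparql` come from this notebook.

```python
from typing import Any, Callable


def build_with_refinement(
    builder: Any,
    question: str,
    is_valid: Callable[[str], bool],
    max_refinements: int = 3,
) -> Any:
    """Generate a SPARQL query and refine it until `is_valid` accepts it.

    `builder` is expected to offer generate_sparql/refine_sparql as used in
    this notebook; `is_valid` is a hypothetical check supplied by the caller.
    """
    cmem_query = builder.generate_sparql(question=question)
    for _ in range(max_refinements):
        if is_valid(cmem_query.get_last_sparql()):
            break  # good enough, stop calling the LLM
        cmem_query = builder.refine_sparql(question=question, cmem_query=cmem_query)
    # Note: the result of the final refinement is not re-validated here.
    return cmem_query
```

Usage would look like `build_with_refinement(cmem_query_builder, question, is_valid=my_check)`, where `my_check` is your own (hypothetical) validation function. The cap keeps a stubborn question from triggering unbounded LLM calls; since the last refinement is not re-checked, validate the returned query before relying on it.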

In [6]:
cmem_query = cmem_query_builder.refine_sparql(question=question, cmem_query=cmem_query)

print("Question:\n")
print(question)
print("\n")

# Number of stored predictions and SPARQL extracts (one per generate/refine call)
print(len(cmem_query.get_prediction_list()))
print(len(cmem_query.get_sparql_list()))

print("Prediction list:\n")
for prediction in cmem_query.get_prediction_list():
    print(prediction)
    print("\n")
print("\n")

print("SPARQL list:\n")
for sparql in cmem_query.get_sparql_list():
    print(sparql)
    print("\n")
print("\n")

Question:

What is the product category where the most product managers are experts in?


2
2
Prediction list:

To answer the user question "What is the product category where the most product managers are experts in?" using the provided RDF ontology, we need to construct a SPARQL query that counts the number of product managers associated with each product category and identifies the category with the highest count.

### SPARQL Query

```sparql
PREFIX pv: <http://ld.company.org/prod-vocab/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?category (COUNT(?manager) AS ?managerCount)
WHERE {
  ?manager a pv:Manager .
  ?manager pv:areaOfExpertise ?category .
}
GROUP BY ?category
ORDER BY DESC(?managerCount)
LIMIT 1
```

### Explanation of the Query

1. **PREFIX Definitions**: 
   - We define the prefixes for the ontology to simplify the URIs used in the query. `pv:` is used for the product vocabulary.

2. **SELECT Clause**: 
   - We select two variables: `?category`, which repre