# CMEMQueryBuilder

The CMEM query builder generates a SPARQL query based on a given ontology and a natural language question.

```python
%pip install cmem-cmempy llama-index python-dotenv
```

You may need to restart the kernel to use the updated packages.


Load the environment from the `.env` file.
Start with `cp .env-template .env` and edit the contents of `.env` accordingly.

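```python
# Load the IPython extension shipped with python-dotenv
%load_ext dotenv
# Read the variables from .env into the process environment
%dotenv
```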
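It can help to fail fast if a required variable did not make it into the environment. The minimal sketch below assumes the notebook needs `OPENAI_API_KEY` (read by the llama-index `OpenAI` class) and `CMEM_BASE_URI` (assumed here as the CMEM connection setting used by cmempy). Both the variable list and the helper `missing_env_vars` are hypothetical; adjust the list to whatever your `.env-template` actually defines.

```python
import os

# Hypothetical list of variables this notebook expects; adjust to your .env-template.
REQUIRED_VARS = ["OPENAI_API_KEY", "CMEM_BASE_URI"]


def missing_env_vars(required, environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not environ.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```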

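Set up the LLM. The llama-index `OpenAI` class reads its API key from the `OPENAI_API_KEY` environment variable, which `.env` should provide.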

```python
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

model = "gpt-4o-mini"
llm = OpenAI(model=model)
# Register the model as the default LLM for llama-index components
Settings.llm = llm
```

Initialize the query builder with an ontology graph and a context (integration) graph.

```python
from llama_index_cmem.utils.cmem_query_builder import CMEMQueryBuilder

# Ontology (vocabulary) graph
ontology_graph = "http://ld.company.org/prod-vocab/"
# Context (integration) graph
context_graph = "http://ld.company.org/prod-inst/"

cmem_query_builder = CMEMQueryBuilder(
    ontology_graph=ontology_graph,
    context_graph=context_graph,
    llm=llm,
)
```

## CMEMQuery

Now define your natural language question and let the query builder generate a SPARQL query.
The query builder returns a `CMEMQuery` object that holds the LLM predictions and the SPARQL queries extracted from them. This lets you refine the SPARQL query as often as you need.
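To illustrate the idea of keeping predictions and extracted SPARQL side by side, here is a minimal sketch. It is not the actual `CMEMQuery` implementation: `extract_sparql` and `QueryHistory` are hypothetical names, and the extraction simply assumes the LLM wraps its query in a fenced `sparql` Markdown block, as the prediction shown further below does.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Matches a fenced ```sparql ... ``` block; tolerates a missing newline after the info string.
SPARQL_FENCE = re.compile(r"```sparql\s*(.*?)```", re.DOTALL | re.IGNORECASE)


def extract_sparql(prediction: str) -> Optional[str]:
    """Return the last fenced SPARQL block in an LLM answer, or None if there is none."""
    blocks = SPARQL_FENCE.findall(prediction)
    return blocks[-1].strip() if blocks else None


@dataclass
class QueryHistory:
    """Hypothetical stand-in for CMEMQuery: one entry per generation or refinement round."""
    predictions: List[str] = field(default_factory=list)
    sparql: List[Optional[str]] = field(default_factory=list)

    def add(self, prediction: str) -> None:
        # Keep the raw answer and the query extracted from it side by side.
        self.predictions.append(prediction)
        self.sparql.append(extract_sparql(prediction))

    def last_sparql(self) -> Optional[str]:
        # Query from the most recent round, or None if nothing was generated yet.
        return self.sparql[-1] if self.sparql else None
```

Each round appends a new entry, so `last_sparql()` reflects the latest refinement while earlier predictions stay available for comparison.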

```python
from IPython.display import display, Markdown

def generate(prompt):
    """Generate SPARQL for a natural language question and render the result."""
    display(Markdown(f"## Prompt: _{prompt}_"))
    cmem_query = cmem_query_builder.generate_sparql(question=prompt)
    # Full LLM answer, including its explanation
    display(Markdown(f"### Prediction\n\n{str(cmem_query.get_last_prediction())}\n\n"))
    # Extracted SPARQL; the newlines around the query keep the code fence valid
    display(Markdown(f"### SPARQL\n\n```sparql\n{str(cmem_query.get_last_sparql())}\n```\n\n"))

question = "List all services. Limit the results to 20 items."
generate(question)
```

## Prompt: _List all services. Limit the results to 20 items._

### Prediction

To answer the user question "List all services. Limit the results to 20 items," we can construct a SPARQL query that retrieves instances of the `pv:Service` class from the RDF graph defined by the provided ontology. 

Here’s the SPARQL query:

```sparql
PREFIX pv: <http://ld.company.org/prod-vocab/>

SELECT ?service ?name
WHERE {
  ?service a pv:Service .
  ?service pv:name ?name .
}
LIMIT 20
```

### Explanation of the Query:

1. **PREFIX Declaration**: 
   - We declare the prefix `pv:` to refer to the vocabulary defined in the ontology (`http://ld.company.org/prod-vocab/`). This allows us to use shorter URIs in our query.

2. **SELECT Clause**: 
   - We specify that we want to select two variables: `?service` (the service instance) and `?name` (the name of the service).

3. **WHERE Clause**: 
   - We define the conditions for our query:
     - `?service a pv:Service .`: This line filters the results to include only those resources that are of type `pv:Service`.
     - `?service pv:name ?name .`: This line retrieves the name of each service, linking the service instance to its name property.

4. **LIMIT Clause**: 
   - We limit the results to 20 items to comply with the user's request for a manageable number of results.

This query will return a list of up to 20 services along with their names from the specified RDF graph.



### SPARQL

```sparql
PREFIX pv: <http://ld.company.org/prod-vocab/>

SELECT ?service ?name
WHERE {
  ?service a pv:Service .
  ?service pv:name ?name .
}
LIMIT 20
```
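Because the SPARQL is written by an LLM, it can be worth checking it before sending it to the store, for example that the result cap requested in the question is actually present. The sketch below uses a hypothetical helper, `ensure_limit`, built on a plain regular expression rather than a SPARQL parser, so it only recognizes a `LIMIT` clause at the end of the query (optionally followed by `OFFSET`).

```python
import re

# Trailing LIMIT clause, optionally followed by OFFSET (a heuristic, not a SPARQL parser).
_TRAILING_LIMIT = re.compile(r"\bLIMIT\s+(\d+)(\s+OFFSET\s+\d+)?\s*$", re.IGNORECASE)


def ensure_limit(query: str, max_rows: int) -> str:
    """Return the query with a trailing LIMIT no larger than max_rows."""
    query = query.rstrip()
    match = _TRAILING_LIMIT.search(query)
    if match is None:
        # No trailing LIMIT found: append one.
        return f"{query}\nLIMIT {max_rows}"
    if int(match.group(1)) <= max_rows:
        # Existing limit is already within bounds.
        return query
    # Replace only the number, keeping any OFFSET that follows it.
    start, end = match.span(1)
    return query[:start] + str(max_rows) + query[end:]
```

In the notebook this could wrap the generated query, for example `ensure_limit(cmem_query.get_last_sparql(), 20)`, before executing it.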

