CAG (Concept-Augmented Generation)
--------------------------

-   CAG enhances text generation by integrating concepts (structured knowledge) instead of relying only on raw document retrieval. Where RAG fetches exact passages, CAG pulls conceptual relationships, ontologies, or structured knowledge graphs to enrich responses.

🛠 How CAG Works
----------------
Concept Retrieval → Fetches relevant concepts from structured sources (e.g., Knowledge Graphs like DBpedia, Wikidata).

Context Expansion → Maps related concepts and relationships.

Augmented Generation → Uses the retrieved concepts to generate responses with deeper reasoning.
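
The three steps above can be sketched end to end with a toy in-memory graph standing in for a real knowledge graph. Everything here is illustrative: `TOY_GRAPH`, `retrieve_concepts`, `expand_context`, and `build_prompt` are hypothetical names and data, and the resulting prompt would be passed to whichever generator model is in use.

```python
# Toy knowledge graph: entity -> directly related concepts.
# Hypothetical data for illustration; a real system would query Wikidata, DBpedia, etc.
TOY_GRAPH = {
    "Tesla Motors": ["electric vehicle", "Elon Musk"],
    "electric vehicle": ["lithium-ion battery", "sustainable energy"],
    "Elon Musk": ["SpaceX"],
}


def retrieve_concepts(entity: str) -> list[str]:
    """Step 1 (Concept Retrieval): fetch concepts directly linked to the entity."""
    return list(TOY_GRAPH.get(entity, []))


def expand_context(concepts: list[str], hops: int = 1) -> list[str]:
    """Step 2 (Context Expansion): follow relationships outward for `hops` steps."""
    seen = list(concepts)
    frontier = list(concepts)
    for _ in range(hops):
        next_frontier = []
        for concept in frontier:
            for neighbour in TOY_GRAPH.get(concept, []):
                if neighbour not in seen:
                    seen.append(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return seen


def build_prompt(question: str, concepts: list[str]) -> str:
    """Step 3 (Augmented Generation): fold the concepts into the generator prompt."""
    return f"Using the following concepts: {', '.join(concepts)}, answer: {question}"


concepts = expand_context(retrieve_concepts("Tesla Motors"))
print(build_prompt("What is Tesla Motors?", concepts))
```

Keeping the hop count small is a deliberate choice in this sketch: each extra hop adds more loosely related concepts to the prompt.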

Example:
---------
RAG: Finds a Wikipedia page about Tesla Motors.

CAG: Retrieves the concepts of electric vehicles, lithium-ion batteries, sustainable energy, and Elon Musk’s involvement, forming a richer response.

🔥 Why CAG?
------------
Ideal for domain-specific knowledge (e.g., legal, medical, financial AI).

Helps with explainability and fact-based generation.

More structured than RAG, which can help reduce hallucinations.

<div align="center">
  <img src="https://adasci.org/wp-content/uploads/2024/06/retrieval_LM-scaled.jpg" alt="Retrieval-augmented language model">
</div>

REQUIRED LIBRARIES
--------------------------

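-   SPARQLWrapper    : Queries SPARQL endpoints (knowledge graphs) to fetch concepts for the generator LLM
-   langchain_ollama : LangChain integration for Ollama, which runs the generator model that produces the final answer (requires a running Ollama server)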

PACKAGE INSTALLATION
--------------------------

-   pip install sparqlwrapper  
-   pip install langchain_ollama

In [7]:
from SPARQLWrapper import SPARQLWrapper, JSON

def get_concepts(entity: str, limit: int = 5):
    """Fetch up to `limit` related concepts from Wikidata."""
    sparql = SPARQLWrapper("https://query.wikidata.org/sparql")

    # Match entities by exact English label, then follow
    # P31 (instance of) or P279 (subclass of) to their concepts.
    # Label matching is ambiguous: every entity sharing the label is included.
    query = f"""
    SELECT ?conceptLabel WHERE {{
      ?entity rdfs:label "{entity}"@en.
      ?entity wdt:P279|wdt:P31 ?concept.
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
    }}
    """

    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    concepts = [result["conceptLabel"]["value"] for result in results["results"]["bindings"]]
    return concepts[:limit]  # Return at most `limit` concepts


In [8]:
print(get_concepts("Elon Musk"))


['human', 'literary work', 'Wikimedia disambiguation page', 'single']
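
The `get_concepts` function above interpolates the label straight into the query text, so a label containing a double quote or backslash would yield invalid SPARQL. Below is a minimal sketch of one way to guard against that. The helpers `escape_sparql_literal` and `build_concept_query` are hypothetical names introduced here and are not part of SPARQLWrapper.

```python
def escape_sparql_literal(text: str) -> str:
    """Escape characters that would break a double-quoted SPARQL string literal."""
    return (
        text.replace("\\", "\\\\")  # backslashes first, so later escapes are not doubled
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def build_concept_query(label: str, limit: int = 5) -> str:
    """Build the instance-of / subclass-of query for an English label."""
    safe_label = escape_sparql_literal(label)
    return f"""
    SELECT ?conceptLabel WHERE {{
      ?entity rdfs:label "{safe_label}"@en.
      ?entity wdt:P279|wdt:P31 ?concept.
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
    }}
    LIMIT {int(limit)}
    """


# The embedded quotes are escaped instead of terminating the literal early.
print(build_concept_query('Tesla "Roadster"'))
```

The returned string can be passed to `sparql.setQuery(...)` in place of the inline f-string.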


In [10]:
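from typing import Optional

from langchain_ollama.llms import OllamaLLM

def generate_response(query: str, entity: Optional[str] = None):
    """Use CAG to generate an answer based on concepts.

    `entity` is the knowledge-graph label to look up. It defaults to `query`,
    but a full question rarely matches a label exactly, so pass it explicitly.
    """
    concepts = get_concepts(entity or query)
    concept_text = ", ".join(concepts)

    # Only mention concepts in the prompt when the lookup returned some
    if concept_text:
        prompt = f"Using the following concepts: {concept_text}, generate a detailed answer to: {query}"
    else:
        prompt = f"Generate a detailed answer to: {query}"

    llm = OllamaLLM(model="llama3.2")  # needs a local Ollama server with this model pulled
    response = llm.invoke(prompt)

    return response


In [11]:
print(generate_response("What is Tesla?", entity="Tesla"))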


Tesla, Inc. is an American multinational corporation that specializes in the production of electric vehicles (EVs), clean energy solutions, and sustainable energy storage products.

**History**

Tesla was founded on July 1, 2003, by Martin Eberhard and Marc Tarpenning in Palo Alto, California. The company's first product was the Tesla Roadster, an all-electric sports car that was designed to showcase the potential of electric vehicles. In 2004, Elon Musk, who would later become the CEO of SpaceX and other companies, invested $6.3 million in Tesla and became the company's chairman of the board.

**Products**

Tesla is known for its line of luxury electric vehicles, including:

1. Model S: a full-size luxury sedan
2. Model X: a full-size luxury SUV
3. Model 3: a compact luxury sedan
4. Model Y: a compact luxury SUV
5. Cybertruck: an electric pickup truck

Tesla also offers energy storage products, such as the Powerwall and Powerpack, which provide residential and commercial customers wit

## 🌍 Major SPARQL Endpoints & Their Contexts

| **SPARQL Endpoint** | **Data Source** | **Contents** | **Endpoint URL** |
|---------------------|----------------|----------------------|-----------------|
| **Wikidata**       | Wikipedia, structured knowledge | Entities, relationships, concepts | [query.wikidata.org](https://query.wikidata.org/sparql) |
| **DBpedia**        | Wikipedia (structured version) | Structured infobox data from Wikipedia | [dbpedia.org/sparql](https://dbpedia.org/sparql) |
| **YAGO**           | Wikipedia, WordNet, GeoNames | Entity relationships, taxonomies | [yago-knowledge.org/sparql](https://yago-knowledge.org/sparql) |
| **WordNet RDF**    | WordNet | Lexical concepts, synonyms, meanings | [wordnet-rdf.princeton.edu](https://wordnet-rdf.princeton.edu/sparql) |
| **Europeana**      | European libraries, museums | Cultural heritage data | [sparql.europeana.eu](https://sparql.europeana.eu/) |
| **LinkedGeoData**  | OpenStreetMap | Geospatial data, locations | [linkedgeodata.org/sparql](http://linkedgeodata.org/sparql) |
| **BioPortal SPARQL** | Medical ontologies | Biomedical & clinical concepts | [sparql.bioontology.org](https://sparql.bioontology.org/) |
| **UniProt SPARQL** | Protein database | Protein sequences, gene data | [sparql.uniprot.org](https://sparql.uniprot.org/sparql) |
| **NASA SPARQL**    | Space data | Space missions, celestial bodies | [data.nasa.gov](https://data.nasa.gov/) |


🌍 Dynamically Switching SPARQL Endpoints
----------------------------------------
-   We can create a function that allows us to dynamically switch between different SPARQL endpoints based on the type of data we need.

In [18]:
from SPARQLWrapper import SPARQLWrapper, JSON

# List of available SPARQL endpoints
SPARQL_ENDPOINTS = {
    "wikidata": "https://query.wikidata.org/sparql",
    "dbpedia": "https://dbpedia.org/sparql",
    "yago": "https://yago-knowledge.org/sparql",
    "bioportal": "https://sparql.bioontology.org/",
    "linkedGeoData": "http://linkedgeodata.org/sparql"
}

def get_concepts(entity: str, source: str = "wikidata"):
    """Fetch related concepts from a dynamically selected SPARQL endpoint."""

    if source not in SPARQL_ENDPOINTS:
        raise ValueError(f"❌ Error: Unknown SPARQL source '{source}'. Choose from {list(SPARQL_ENDPOINTS.keys())}")

    # Different query formats for different sources
    if source == "wikidata":
        # Match by English label, then follow P31 (instance of) or P279 (subclass of)
        query = f"""
        SELECT ?conceptLabel WHERE {{
          ?entity rdfs:label "{entity}"@en.
          ?entity wdt:P279|wdt:P31 ?concept.
          SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
        }}
        LIMIT 5
        """

    elif source == "dbpedia":
        # DBpedia resource names use underscores instead of spaces
        resource = entity.replace(" ", "_")
        query = f"""
        SELECT ?concept WHERE {{
          dbr:{resource} dbo:wikiPageWikiLink ?concept .
        }}
        LIMIT 5
        """

    else:
        # Raise instead of returning a message string, so callers always get a list
        raise NotImplementedError(f"❌ No predefined query for {source} yet.")

    sparql = SPARQLWrapper(SPARQL_ENDPOINTS[source])
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    # Wikidata rows carry ?conceptLabel (a label); DBpedia rows carry ?concept (a URI)
    concepts = [result["conceptLabel"]["value"] if "conceptLabel" in result else result["concept"]["value"]
                for result in results["results"]["bindings"]]

    return concepts


In [19]:
# Get concepts from Wikidata
print("🔹 Wikidata Concepts:", get_concepts("Tesla", "wikidata"))

# Get concepts from DBpedia
print("🔹 DBpedia Concepts:", get_concepts("Tesla", "dbpedia"))




🔹 Wikidata Concepts: ['Wikimedia disambiguation page', 'microarchitecture', 'rock band', 'business', 'enterprise']
🔹 DBpedia Concepts: ['http://dbpedia.org/resource/Corral_Hollow', 'http://dbpedia.org/resource/Nikola_Tesla_(disambiguation)', 'http://dbpedia.org/resource/Tesla_coil', 'http://dbpedia.org/resource/2244_Tesla', 'http://dbpedia.org/resource/Nikola_Tesla']
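
The two outputs above are not directly comparable: Wikidata returns labels, including bookkeeping entries such as "Wikimedia disambiguation page", while DBpedia returns full resource URIs. A minimal sketch of a clean-up step follows. The names `NOISE_LABELS`, `uri_to_label`, and `clean_concepts` are hypothetical, and the noise list is an assumed, non-exhaustive example.

```python
from urllib.parse import unquote

# Labels that describe wiki bookkeeping rather than real-world concepts.
# Assumed, non-exhaustive list for illustration.
NOISE_LABELS = {"wikimedia disambiguation page"}


def uri_to_label(value: str) -> str:
    """Turn a resource URI such as .../resource/Tesla_coil into 'Tesla coil'."""
    if value.startswith(("http://", "https://")):
        local_name = value.rstrip("/").rsplit("/", 1)[-1]
        return unquote(local_name).replace("_", " ")
    return value


def clean_concepts(values: list[str]) -> list[str]:
    """Normalize URIs to labels, drop noise, and de-duplicate case-insensitively."""
    cleaned, seen = [], set()
    for value in values:
        label = uri_to_label(value).strip()
        key = label.lower()
        if not label or key in NOISE_LABELS or "(disambiguation)" in key or key in seen:
            continue
        seen.add(key)
        cleaned.append(label)
    return cleaned


raw = [
    "Wikimedia disambiguation page",
    "business",
    "http://dbpedia.org/resource/Tesla_coil",
    "http://dbpedia.org/resource/Nikola_Tesla_(disambiguation)",
]
print(clean_concepts(raw))  # ['business', 'Tesla coil']
```

Running a step like this before building the prompt keeps the concept list short and readable for the generator.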


🔗 Combining Multiple Datasets
--------------------------------
-   Instead of querying just one SPARQL endpoint, we can query multiple endpoints and merge the results.

In [20]:
def get_concepts_multi(entity: str, sources=("wikidata", "dbpedia")):  # tuple avoids a shared mutable default
    """Fetch related concepts from multiple SPARQL endpoints and merge results."""
    
    all_concepts = set()  # Use a set to avoid duplicates
    
    for source in sources:
        try:
            concepts = get_concepts(entity, source)
            if concepts:
                all_concepts.update(concepts)
        except Exception as e:
            print(f"⚠️ Error querying {source}: {str(e)}")

    return list(all_concepts)  # Convert set back to list


In [21]:
concepts = get_concepts_multi("Tesla", sources=["wikidata", "dbpedia"])
print("🔹 Combined Concepts:", concepts)


🔹 Combined Concepts: ['http://dbpedia.org/resource/Nikola_Tesla', 'http://dbpedia.org/resource/Corral_Hollow', 'rock band', 'business', 'http://dbpedia.org/resource/2244_Tesla', 'http://dbpedia.org/resource/Tesla_coil', 'enterprise', 'microarchitecture', 'Wikimedia disambiguation page', 'http://dbpedia.org/resource/Nikola_Tesla_(disambiguation)']
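
The merged list above no longer shows which endpoint contributed which concept. A variant that records provenance could look like the sketch below. The names `merge_with_provenance` and `fake_fetch` are hypothetical, and `fake_fetch` returns canned data so the example runs without network access.

```python
from collections import defaultdict
from typing import Callable, Iterable


def merge_with_provenance(
    entity: str,
    sources: Iterable[str],
    fetch: Callable[[str, str], list[str]],
) -> dict[str, list[str]]:
    """Map each concept to the sorted list of sources that returned it."""
    provenance = defaultdict(set)
    for source in sources:
        try:
            for concept in fetch(entity, source):
                provenance[concept].add(source)
        except Exception as exc:  # keep going if one endpoint fails
            print(f"Error querying {source}: {exc}")
    return {concept: sorted(found_in) for concept, found_in in provenance.items()}


def fake_fetch(entity: str, source: str) -> list[str]:
    """Stand-in for get_concepts with canned results (illustrative data only)."""
    canned = {
        "wikidata": ["business", "enterprise"],
        "dbpedia": ["business", "Tesla coil"],
    }
    return canned[source]


merged = merge_with_provenance("Tesla", ["wikidata", "dbpedia"], fake_fetch)
print(merged)
```

Passing the real `get_concepts` as `fetch` would query the live endpoints. Concepts reported by more than one source could then be ranked first when building the prompt.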


🚀 Why Is This Powerful?
---------------------------
-   ✅ Broader coverage: one knowledge graph can fill gaps left by another.
-   ✅ Less dependence on a single source's omissions or biases.
-   ✅ Handles diverse knowledge areas (medical, geography, etc.) by adding domain-specific endpoints.