Based on:
* https://js.langchain.com/v0.1/docs/modules/chains/popular/sqlite/
* https://js.langchain.com/v0.1/docs/integrations/toolkits/sql/

* https://python.langchain.com/docs/tutorials/sql_qa/

# Setup

In [1]:
import sqlite3
import langchain
from langchain_community.utilities import SQLDatabase

In [2]:
db_path = r"./POC-LangChain/chinook-database-master/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite"
db = SQLDatabase.from_uri(f"sqlite:///{db_path}")

In [3]:
print(db.dialect)
print(db.get_usable_table_names())
db.run("SELECT * FROM Artist LIMIT 10;")

sqlite
['Album', 'Artist', 'Customer', 'Employee', 'Genre', 'Invoice', 'InvoiceLine', 'MediaType', 'Playlist', 'PlaylistTrack', 'Track']


"[(1, 'AC/DC'), (2, 'Accept'), (3, 'Aerosmith'), (4, 'Alanis Morissette'), (5, 'Alice In Chains'), (6, 'Antônio Carlos Jobim'), (7, 'Apocalyptica'), (8, 'Audioslave'), (9, 'BackBeat'), (10, 'Billy Cobham')]"
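Note that `db.run` returns the rows as one string (the quoted output above), not as a list. The following is a minimal sketch of turning such a string back into tuples with the standard library. It assumes the string is a plain Python literal, as in the output shown, and that an empty result comes back as an empty string; `parse_rows` is a hypothetical helper, and values that are not Python literals (dates, decimals) would need different handling.

```python
import ast

# String copied (shortened) from the `db.run` output above.
raw = "[(1, 'AC/DC'), (2, 'Accept'), (3, 'Aerosmith')]"


def parse_rows(result: str) -> list:
    """Parse a stringified list of row tuples; return [] for an empty result."""
    if not result.strip():
        return []
    # literal_eval only accepts Python literals, so it does not execute code.
    return ast.literal_eval(result)


rows = parse_rows(raw)
artist_names = [name for _artist_id, name in rows]
print(artist_names)
```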

# Chains
A chain is a fixed sequence of steps that:
* converts the question into a SQL query;
* executes the query;
* uses the result to answer the original question.
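The three steps can be sketched without any LLM, to make the data flow visible. This is only an illustration under stated assumptions: the model is replaced by a hard-coded lookup, the database is a tiny in-memory table rather than Chinook, and all function names are hypothetical.

```python
import sqlite3


def write_query(question: str) -> str:
    """Step 1 stand-in: a real chain would ask an LLM to produce the SQL."""
    canned = {"How many artists are there?": "SELECT COUNT(*) FROM Artist;"}
    return canned[question]


def execute_query(conn: sqlite3.Connection, query: str) -> list:
    """Step 2: run the generated SQL and fetch all rows."""
    return conn.execute(query).fetchall()


def generate_answer(question: str, rows: list) -> str:
    """Step 3 stand-in: a real chain would let the LLM phrase the answer."""
    return f"{question} Result: {rows[0][0]}"


# Tiny in-memory table so the sketch runs on its own (not the Chinook file).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO Artist (Name) VALUES (?)",
    [("AC/DC",), ("Accept",), ("Aerosmith",)],
)

question = "How many artists are there?"
query = write_query(question)
rows = execute_query(conn, query)
answer = generate_answer(question, rows)
print(answer)
```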


## Application state

In [4]:
from typing_extensions import TypedDict

class State(TypedDict):
    question: str
    query: str
    result: str
    answer: str
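Each step reads from this state and returns only the keys it fills in, as `write_query` does with `{"query": ...}`. The sketch below assumes such partial updates are combined with a plain dictionary merge; how the updates are actually merged depends on the orchestration code, which is not shown in this notebook. `apply_update` is a hypothetical helper and the `result` value is a placeholder, not real data.

```python
from typing import TypedDict  # stdlib equivalent of the typing_extensions import


class State(TypedDict):
    question: str
    query: str
    result: str
    answer: str


def apply_update(state: State, update: dict) -> State:
    """Return a new state in which the keys from `update` are overwritten."""
    return {**state, **update}


state: State = {
    "question": "How many Employees are there?",
    "query": "",
    "result": "",
    "answer": "",
}
# Each step contributes only its own key.
state = apply_update(state, {"query": "SELECT COUNT(*) FROM Employee;"})
state = apply_update(state, {"result": "<rows returned by the database>"})
print(state)
```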

In [5]:
from langchain import hub

query_prompt_template = hub.pull("langchain-ai/sql-query-system-prompt")

assert len(query_prompt_template.messages) == 1
query_prompt_template.messages[0].pretty_print()




Given an input question, create a syntactically correct {dialect} query to run to help find the answer. Unless the user specifies in his question a specific number of examples they wish to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.

Never query for all the columns from a specific table, only ask for a the few relevant columns given the question.

Pay attention to use only the column names that you can see in the schema description. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.

Only use the following tables:
{table_info}

Question: {input}
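The template has four placeholders: `dialect`, `top_k`, `table_info` and `input`. To show what the model receives once they are filled, here is a minimal sketch using plain `str.format` on a shortened copy of the template. The schema string is a hypothetical stand-in for the output of `db.get_table_info()`.

```python
# Shortened copy of the prompt above; the real template is pulled from the hub.
template = (
    "Given an input question, create a syntactically correct {dialect} query "
    "to run to help find the answer. Always limit your query to at most "
    "{top_k} results.\n\n"
    "Only use the following tables:\n{table_info}\n\n"
    "Question: {input}"
)

filled = template.format(
    dialect="sqlite",
    top_k=10,
    # Hypothetical, abbreviated schema; the notebook uses db.get_table_info().
    table_info='CREATE TABLE "Employee" ("EmployeeId" INTEGER NOT NULL)',
    input="How many Employees are there?",
)
print(filled)
```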


In [6]:
!ollama list

NAME                     	ID          	SIZE  	MODIFIED     
qwen2.5-coder:7b         	2b0496514337	4.7 GB	8 hours ago 	
deepseek-r1:8b           	28f8fd6cdc67	4.9 GB	2 days ago  	
llama3.2:1b-instruct-q4_0	53f2745c8077	770 MB	3 months ago	
llama3.2:1b              	baf6a787fdff	1.3 GB	3 months ago	
llama3.1:8b              	42182419e950	4.7 GB	4 months ago	
mistral:instruct         	f974a74358d6	4.1 GB	4 months ago	


In [28]:
from typing_extensions import Annotated


class QueryOutput(TypedDict):
    """Generated SQL query."""

    query: Annotated[str, ..., "Syntactically valid SQL query."]


def write_query(state: State):
    """Generate SQL query to fetch information."""
    prompt = query_prompt_template.invoke(
        {
            "dialect": db.dialect,
            "top_k": 10,
            "table_info": db.get_table_info(),
            "input": state["question"],
        }
    )
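    # `llm` is not defined in the cells shown here; it is assumed to be a
    # chat model created in an earlier cell.
    structured_llm = llm.with_structured_output(QueryOutput)
    result = structured_llm.invoke(prompt)
    return {"query": result["query"]}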

In [29]:
write_query({"question": "How many Employees are there?"})

TypeError: 'NoneType' object is not subscriptable
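The traceback is consistent with `structured_llm.invoke(prompt)` returning `None`, so that `result["query"]` fails. One plausible cause, stated here as an assumption and not verified in this notebook, is that the local model did not return its output in the requested structure. A minimal sketch of a guard that turns this into a clearer error; `query_from_structured` is a hypothetical helper:

```python
def query_from_structured(result) -> dict:
    """Validate the structured-output result before indexing into it."""
    if not isinstance(result, dict) or not result.get("query"):
        raise ValueError(
            f"Model returned no structured query (got {result!r}); "
            "try another model or parse the raw text instead."
        )
    return {"query": result["query"].strip()}


print(query_from_structured({"query": " SELECT COUNT(*) FROM Employee; "}))

try:
    query_from_structured(None)
except ValueError as err:
    print(f"Handled: {err}")
```

Inside `write_query`, the last line would then become `return query_from_structured(result)`.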

In [33]:
from langchain.llms import HuggingFaceLLM
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from typing_extensions import TypedDict, Annotated

# Initialize the open-source model
llm = HuggingFaceLLM(model_name="llama3.1:8b")  # Replace with your model

# Define the output structure
class QueryOutput(TypedDict):
    """Generated SQL query."""
    query: Annotated[str, ..., "Syntactically valid SQL query."]

def write_query(state: State):
    """Generate SQL query to fetch information."""
    prompt = ChatPromptTemplate.from_template(
        "Given the following database information:\n"
        "Dialect: {dialect}\n"
        "Table Info:\n{table_info}\n\n"
        "Write a SQL query to answer the following question: {input}\n"
        'Return ONLY a JSON object with a single key "query" whose value is the SQL query.'
    )

    # The parser expects JSON, so the prompt above has to ask for JSON.
    output_parser = JsonOutputParser()

    chain = (
        RunnablePassthrough.assign(
            dialect=lambda x: db.dialect,
            table_info=lambda x: db.get_table_info()
        )
        | prompt 
        | llm 
        | output_parser
    )

    try:
        result = chain.invoke({"input": state["question"]})
        return {"query": result["query"]}
    except Exception as e:
        print(f"Error generating query: {e}")
        return {"query": ""}

# Test the function
print(write_query({"question": "How many Employees are there?"}))


ImportError: cannot import name 'HuggingFaceLLM' from 'langchain.llms' (c:\Users\flemm\AppData\Local\pypoetry\Cache\virtualenvs\langchain-lernen-GE3QnVly-py3.11\Lib\site-packages\langchain\llms\__init__.py)
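The import fails because `langchain.llms` does not export a class named `HuggingFaceLLM`. Since the models in the `ollama list` output are served by Ollama, one option is the `ChatOllama` class from the separate `langchain-ollama` package. The sketch below assumes that package is installed and an Ollama server is running locally. `make_ollama_llm`, `extract_sql` and the extra parameters of `write_query` are hypothetical. The model is passed in as an argument so the parsing logic can be tried with a stand-in object first.

```python
import re

FENCE = "`" * 3  # Markdown code fence, built this way to keep the source readable


def make_ollama_llm(model: str = "llama3.1:8b"):
    """Create a chat model backed by a local Ollama server."""
    # Imported lazily; assumes `pip install langchain-ollama` has been run.
    from langchain_ollama import ChatOllama

    return ChatOllama(model=model, temperature=0)


def extract_sql(text: str) -> str:
    """Strip an optional Markdown code fence and surrounding whitespace."""
    match = re.search(FENCE + r"(?:sql)?\s*(.*?)" + FENCE, text, re.DOTALL | re.IGNORECASE)
    return (match.group(1) if match else text).strip()


def write_query(state: dict, llm, dialect: str, table_info: str) -> dict:
    """Ask the model for a SQL query and return it as a state update."""
    prompt = (
        f"Dialect: {dialect}\n"
        f"Table info:\n{table_info}\n\n"
        f"Write a SQL query to answer the following question: {state['question']}\n"
        "Return ONLY the SQL query without any additional explanation."
    )
    response = llm.invoke(prompt)
    # Chat models return a message object with `.content`; plain LLMs return str.
    text = getattr(response, "content", response)
    return {"query": extract_sql(str(text))}


# Usage in the notebook (needs a running Ollama server):
# write_query({"question": "How many Employees are there?"},
#             make_ollama_llm(), db.dialect, db.get_table_info())
```

Asking for plain SQL and cleaning it up with a small helper avoids depending on the model emitting valid JSON, which smaller local models may not do reliably.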