# Metadata extraction to SQL database

In the previous example, we saw how to extract metadata from a lab notebook and store it in a data model. In this example, we will stream data from an LLM directly into a SQL database.

For the sake of demonstration, this example will not make use of existing datasets, but rather let our LLM provide the data. Specifically, we will ask the LLM to provide the relevant molecules for several biochemical pathways.

However, the same approach can just as easily be applied in a loop over existing datasets. Let's see how this works in practice.

In [1]:
from typing import Iterable

import rich
from instructor import OpenAISchema
from sqlmodel import select

from mdmodels import DataModel
from mdmodels.llm import query_openai
from mdmodels.sql import generate_sqlmodel, DatabaseConnector

In [3]:
# 1) Load the data model from the markdown file
dm = DataModel.from_markdown("model.md")

# 2) Generate the SQLModel classes
models = generate_sqlmodel(data_model=dm, base_classes=[OpenAISchema])

# 3) Create the database and tables
db = DatabaseConnector(database="")
db.create_tables(models)

In [6]:
# We need to open a session to interact with the database
with db as session:
    # Instruct the LLM to extract the molecules
    response = query_openai(
        response_model=Iterable[models.Molecule], # If we expect multiple molecules, we need to use Iterable
        query="Give me all molecules in the glycolysis, pentose phosphate, and citric acid cycle pathways.",
        pre_prompt="You are proficient in chemistry and biochemistry.",
        use_scaffold=False, # No need to provide thoughts here
    )

    # Stage all returned molecules for insertion
    session.add_all(response)

    # Commit the changes to the database
    session.commit()

    # Reset the session to clear the cache
    session.reset()

    # Verify that the data was correctly inserted
    results = session.exec(
        select(models.Molecule)
    ).all()

    rich.print(results)

## Conclusion

In this example, we have seen how to stream data from an LLM directly into a SQL database. While the example is simple, the concept extends readily to more complex datasets and analysis pipelines. All the tools introduced in our [Database Example](./examples/sql/basic/SQLDatabaseExample.ipynb) can be used on the resulting database.

Furthermore, the same approach can be applied in a loop over existing datasets.
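
To illustrate the looping idea, here is a minimal sketch under stated assumptions. It deliberately uses only the Python standard library so that it runs without an API key: `sqlite3` stands in for `DatabaseConnector`, and `extract_molecules` is a hypothetical placeholder for the `query_openai` call. The `Molecule` fields and the contents of `dataset` are invented for illustration and are not taken from `model.md`.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Molecule:
    # Hypothetical stand-in for the generated `models.Molecule` class
    name: str
    pathway: str


def extract_molecules(text: str, pathway: str) -> list[Molecule]:
    """Placeholder for the LLM call (assumption: replace with `query_openai`).

    Every non-empty, comma-separated token is treated as one molecule name.
    """
    return [
        Molecule(name=token.strip(), pathway=pathway)
        for token in text.split(",")
        if token.strip()
    ]


# Hypothetical existing dataset: one free-text record per pathway
dataset = {
    "glycolysis": "glucose, pyruvate, ATP",
    "citric acid cycle": "citrate, oxaloacetate",
}

# In-memory SQLite database used in place of `DatabaseConnector`
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE molecule (id INTEGER PRIMARY KEY, name TEXT, pathway TEXT)"
)

for pathway, text in dataset.items():
    molecules = extract_molecules(text, pathway)
    conn.executemany(
        "INSERT INTO molecule (name, pathway) VALUES (?, ?)",
        [(m.name, m.pathway) for m in molecules],
    )
    # Commit once per record so earlier results survive a later failure
    conn.commit()

# Read everything back in insertion order to verify the inserts
rows = conn.execute("SELECT name, pathway FROM molecule ORDER BY id").fetchall()
print(rows)
```

Committing inside the loop is a design choice: if one LLM call fails halfway through a large dataset, the records processed so far are already persisted.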