# Quickstart: Conceptual Search

This notebook provides a hands-on walkthrough of the **Conceptual Search** feature in the `intugle` library. You'll learn how to use natural language to generate, refine, and build a unified data product from a semantic model.

## 1. Setup

First, let's install the `intugle` library.

In [None]:
%pip install intugle -q

### Environment Variables

Conceptual Search relies on Large Language Models (LLMs) for tasks such as generating business glossaries and predicting links between tables, so you need to configure an LLM provider and API key before running this notebook. For this example, we'll use OpenAI. You will also need to set up Qdrant and provide an OpenAI API key. For detailed setup instructions, please refer to the [README.md](README.md) file.

You can configure the necessary services by setting the following environment variables:

*   `LLM_PROVIDER`: The LLM provider and model to use (e.g., `openai:gpt-3.5-turbo`). The format follows LangChain's conventions for initializing chat models. See the [LangChain chat model integrations](https://python.langchain.com/docs/integrations/chat/) for how to specify your model.
*   `API_KEY`: Your API key for the LLM provider. The exact name of the variable may vary from provider to provider (e.g., `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`).
*   `QDRANT_URL`: The URL of your Qdrant instance (e.g., `http://localhost:6333`).
*   `QDRANT_API_KEY`: Your API key for the Qdrant instance, if authorization is enabled.
*   `EMBEDDING_MODEL_NAME`: The embedding model to use. The format follows LangChain's conventions for initializing embedding models (e.g., `openai:ada`, `azure_openai:ada`).
*   `OPENAI_API_KEY`: Your OpenAI API key, required if you are using an OpenAI embedding model.
*   `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT`, `OPENAI_API_VERSION`: Your Azure OpenAI credentials, required if you are using an Azure OpenAI embedding model.
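
The `provider:model` convention used by `LLM_PROVIDER` and `EMBEDDING_MODEL_NAME` can be sanity-checked before any model is initialized. The following is a minimal sketch and not part of `intugle` or LangChain: the helper name `split_model_spec` is hypothetical, and it assumes only that the value holds a non-empty provider and model name separated by the first colon.

```python
def split_model_spec(spec: str) -> tuple[str, str]:
    """Split a 'provider:model' string into (provider, model).

    Hypothetical helper for illustration; not part of intugle or LangChain.
    """
    # partition() splits on the first colon only, so model names that
    # themselves contain colons stay intact.
    provider, sep, model = spec.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model', got {spec!r}")
    return provider, model


print(split_model_spec("openai:gpt-3.5-turbo"))  # ('openai', 'gpt-3.5-turbo')
```

A value such as `gpt-3.5-turbo` with no provider prefix raises a `ValueError` here, which surfaces a malformed setting before any API call is made.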

For the best results, we also recommend setting a **Tavily API key**, which allows the planning agent to perform web searches for better contextual understanding.

Here's an example of how to set these variables in your environment:

```bash
export LLM_PROVIDER="openai:gpt-3.5-turbo"
export OPENAI_API_KEY="your-openai-api-key"
```
Alternatively, you can set them from within the notebook:

In [None]:
import os

# LLM configuration
os.environ["LLM_PROVIDER"] = "openai:gpt-3.5-turbo"
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"  # Replace with your actual key

# Qdrant configuration
os.environ["QDRANT_URL"] = "http://localhost:6333"
os.environ["QDRANT_API_KEY"] = ""  # Set only if authorization is enabled

# Embedding model: OpenAI (reuses the OPENAI_API_KEY set above)
os.environ["EMBEDDING_MODEL_NAME"] = "openai:ada"

# Embedding model: Azure OpenAI. Uncomment these lines instead of the OpenAI
# line above; leaving both active would overwrite EMBEDDING_MODEL_NAME.
# os.environ["EMBEDDING_MODEL_NAME"] = "azure_openai:ada"
# os.environ["AZURE_OPENAI_API_KEY"] = "your-azure-openai-api-key"
# os.environ["AZURE_OPENAI_ENDPOINT"] = "your-azure-openai-endpoint"
# os.environ["OPENAI_API_VERSION"] = "your-openai-api-version"

# Tavily (optional, but recommended)
os.environ["TAVILY_API_KEY"] = "YOUR_TAVILY_API_KEY"
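
Because a missing or empty variable would otherwise only surface later as a provider error, it can help to check the environment before moving on. This is a minimal sketch, not an `intugle` feature: the helper `missing_env_vars` and the `REQUIRED` list are hypothetical, and the names in the list are taken from the variables described above for an OpenAI setup.

```python
import os


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Assumed set of variables for an OpenAI-based setup; adjust for your provider.
REQUIRED = ["LLM_PROVIDER", "OPENAI_API_KEY", "QDRANT_URL", "EMBEDDING_MODEL_NAME"]

missing = missing_env_vars(REQUIRED)
if missing:
    print("Missing or empty:", ", ".join(missing))
else:
    print("All required variables are set.")
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.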

### Download Sample Data

We'll use the healthcare dataset for this demonstration.

In [None]:
import os
import requests

# Name of the sample dataset folder in the Intugle/data-tools repository
raw_datasets = "healthcare"
api_url = f"https://api.github.com/repos/Intugle/data-tools/contents/sample_data/{raw_datasets}"
local_dir = f"sample_data/{raw_datasets}"
os.makedirs(local_dir, exist_ok=True)

# List the folder contents through the GitHub contents API
r = requests.get(api_url)
r.raise_for_status()

# Download every CSV file in the listing
for item in r.json():
    if item["name"].endswith(".csv"):
        print(f"Downloading {item['name']}...")
        file_data = requests.get(item["download_url"])
        file_data.raise_for_status()  # Fail early rather than save an error page
        with open(os.path.join(local_dir, item["name"]), "wb") as f:
            f.write(file_data.content)

print("All CSV files downloaded successfully.")
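
To confirm that the download produced usable files, the folder can be summarized with the standard library alone. The sketch below is an illustration rather than part of the original notebook: `summarize_csv_dir` is a hypothetical helper, and it assumes each file is UTF-8 encoded with a single header row.

```python
import csv
from pathlib import Path


def summarize_csv_dir(directory):
    """Map each CSV file name in `directory` to (column_count, row_count)."""
    summary = {}
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            header = next(reader, [])  # An empty file yields an empty header
            row_count = sum(1 for _ in reader)  # Data rows, header excluded
        summary[path.name] = (len(header), row_count)
    return summary


for name, (n_cols, n_rows) in summarize_csv_dir("sample_data/healthcare").items():
    print(f"{name}: {n_cols} columns, {n_rows} rows")
```

A file that reports zero columns or zero rows is worth re-downloading before building the semantic model.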

## 2. Build the Semantic Model

Conceptual Search operates on top of a semantic layer. Before we can use it, we need to build this layer using the `SemanticModel` class. This process involves profiling the data, predicting links between tables, and generating a business glossary.

In [None]:
def generate_config(table_name: str) -> dict:
    """Return the dataset config (CSV path and file type) for a sample table."""
    return {
        "path": f"./sample_data/healthcare/{table_name}.csv",
        "type": "csv",
    }


table_names = [
    "allergies",
    "careplans",
    "claims",
    "claims_transactions",
    "conditions",
    "devices",
    "encounters",
    "imaging_studies",
    "immunizations",
    "medications",
    "observations",
    "organizations",
    "patients",
    "payers",
    "payer_transitions",
    "procedures",
    "providers",
    "supplies",
]

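datasets = {table: generate_config(table) for table in table_names}

# Build the semantic layer: profiling, link prediction, and glossary generation.
# Assumption: this uses the SemanticModel(datasets, domain=...) constructor and
# build() method shown in the intugle README; verify against your installed version.
from intugle import SemanticModel

sm = SemanticModel(datasets, domain="Healthcare")
sm.build()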

## 3. Generate a Data Product Plan

Now that the semantic layer is built, we can use the `DataProduct` class to generate a plan from a natural language query. The `plan()` method kicks off the first stage of Conceptual Search.

In [None]:
from intugle import DataProduct

dp = DataProduct()

# Generate a plan from a natural language query
query = "patient 360 view"
data_product_plan = await dp.plan(query=query)
data_product_plan
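
The bare `await` above works because Jupyter runs each cell inside an event loop. In a plain Python script there is no running loop, so the coroutine has to be driven explicitly, for example with `asyncio.run`. The sketch below shows that pattern with a stand-in coroutine: `fake_plan` is hypothetical, only mimics an asynchronous call, and does not use `intugle`.

```python
import asyncio


async def fake_plan(query: str) -> dict:
    """Stand-in for an async planning call; returns a dummy result."""
    await asyncio.sleep(0)  # Yield to the event loop, as a network call would
    return {"query": query, "attributes": []}


async def main() -> dict:
    # In a real script, this is where `await dp.plan(query=...)` would go.
    return await fake_plan("patient 360 view")


if __name__ == "__main__":
    result = asyncio.run(main())
    print(result)
```

Keeping all `await` calls inside one `main()` coroutine means the same code can be pasted back into a notebook cell by replacing `asyncio.run(main())` with `await main()`.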

## 4. Modify the Plan

The generated plan is a starting point. You can programmatically modify it to refine the final output. Let's rename an attribute and disable another one that we don't need.

In [None]:
# Let's assume the plan included 'Patient Last Name' and we want to rename it
data_product_plan.rename_attribute('Patient Last Name', 'Family Name')

# Let's also assume it included 'Patient Birth Date' which we don't need
data_product_plan.disable_attribute('Patient Birth Date')

print("--- Modified Plan ---")
data_product_plan
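
To make the effect of these two calls concrete, here is a toy model of a plan. It is an illustrative stand-in, not `intugle`'s plan class: `ToyPlan` and `ToyAttribute` are hypothetical, and the sketch assumes that a renamed attribute keeps its place in the plan while a disabled attribute stays in the plan but is excluded from the output.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAttribute:
    name: str
    active: bool = True


@dataclass
class ToyPlan:
    """Illustrative stand-in for a data product plan; not intugle's class."""

    attributes: list = field(default_factory=list)

    def _find(self, name):
        for attr in self.attributes:
            if attr.name == name:
                return attr
        raise KeyError(f"no attribute named {name!r}")

    def rename_attribute(self, old, new):
        self._find(old).name = new

    def disable_attribute(self, name):
        # Assumed semantics: the attribute is kept but flagged inactive.
        self._find(name).active = False

    def active_names(self):
        return [a.name for a in self.attributes if a.active]


plan = ToyPlan([ToyAttribute("Patient Last Name"), ToyAttribute("Patient Birth Date")])
plan.rename_attribute("Patient Last Name", "Family Name")
plan.disable_attribute("Patient Birth Date")
print(plan.active_names())  # ['Family Name']
```

In this toy version, a misspelled attribute name raises a `KeyError` instead of being silently ignored; whether the real plan object behaves the same way is worth checking when you modify an actual plan.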

## 5. Build the Data Product

Once you're satisfied with the plan, you can proceed to the building stage. This will trigger the second AI agent to map the plan's attributes to physical columns and generate the final SQL query.

In [None]:
# Build the data product from the modified plan
data_product = await dp.build_from_plan(data_product_plan)

# Access the results as a pandas DataFrame
df = data_product.to_df()
print(df.head())

You can also inspect the final, generated SQL query.

In [None]:
print(data_product.sql_query)

## Conclusion

You have successfully used Conceptual Search to:
1.  Generate a data product plan from a natural language query.
2.  Review and modify the AI-generated plan.
3.  Build a unified data product without writing any SQL.