# Web Shop Orders — Databao example

This notebook shows how to use Databao to answer questions about the web shop orders dataset.



In [None]:
# Imports and DB connection
import duckdb

# Set up the database connection
DB_PATH = "data/web_shop.duckdb"
conn = duckdb.connect(DB_PATH, read_only=True)
print(f"Connected to DuckDB database: {DB_PATH}")

In [None]:
# Install Databao with pip (%pip installs into the kernel's own environment)
%pip install databao

In [None]:
import databao
from databao.configs.llm import LLMConfigDirectory

In [None]:
# Set up the LLM config for a CLOUD model
# (the commented-out LLMConfig lines in this cell also require importing LLMConfig)
# llm_config = LLMConfig(name="gpt-4.1-2025-04-14", temperature=0)

# Set up the LLM config for a LOCAL model
# To run a local model, install Ollama: https://ollama.com/download/

llm_config = LLMConfigDirectory.QWEN3_8B_OLLAMA
# llm_config = LLMConfig.from_yaml("configs/oss-20b-ollama.yaml")
# llm_config = LLMConfig(name="ollama:gpt-oss:20b", temperature=0)

In [None]:
# To use a cloud model, put your OpenAI API key in the OPENAI_API_KEY environment variable

%env OPENAI_API_KEY=

In [None]:
# Open a Databao session. A session manages all databases and DataFrames, as well as their context.
session = databao.open_session(name="demo", llm_config=llm_config)

# Add your database to the session
session.add_db(conn)

# You can also add DataFrames if you want, either loaded from a file or built inline:

# import pandas as pd
# df_test = pd.read_csv("source-data/webshop_customers.csv")
# df_test = pd.DataFrame({"month": pd.date_range("2017-01-01", periods=6, freq="MS").strftime("%Y-%m-%d"),
#                         "flag": [False, False, False, False, True, True]})
# session.add_df(df_test, context='Helper DF with flags for each month')

# `context` is an optional keyword argument for giving the model additional information about the data.
# It can be a string or a path to a file.

In [None]:
# Start a new thread. A thread is a single conversation, like a single chat in ChatGPT. It:
#    - Maintains its own message history (isolated from other threads).
#    - Materializes data and visualizations lazily, on demand, and caches results per thread.

thread = session.thread()

### 1) Split payments and installments: impact on AOV
Task: How do split payments relate to average order value? Report orders count, average order value, and late-delivery rate by installments bucket (1, 2-6, >6) and by split vs single payment.
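
To make the bucket definitions concrete before asking the model, here is a minimal plain-Python sketch on made-up rows. It is not what Databao generates. It assumes (none of this is confirmed by the dataset) that an order is "split" when it has more than one payment row, that the order value is the sum of its payment values, and that the installments count is the largest one across its payments. The names `payments`, `late`, and `installments_bucket` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy rows: (order_id, payment_installments, payment_value)
payments = [
    ("o1", 1, 50.0),
    ("o2", 3, 120.0),
    ("o3", 8, 400.0),
    ("o4", 1, 30.0),
    ("o4", 1, 20.0),  # o4 has two payment rows, so it counts as "split"
]
# Hypothetical flag per order: True if delivered after the estimated date
late = {"o1": False, "o2": True, "o3": False, "o4": False}


def installments_bucket(n: int) -> str:
    """Map an installment count to the buckets named in the task."""
    if n <= 1:
        return "1"
    return "2-6" if n <= 6 else ">6"


# Collapse payment rows to one record per order.
orders = defaultdict(lambda: {"value": 0.0, "installments": 0, "n_payments": 0})
for order_id, n_inst, value in payments:
    rec = orders[order_id]
    rec["value"] += value
    rec["installments"] = max(rec["installments"], n_inst)
    rec["n_payments"] += 1

# Group orders by (bucket, payment type) and compute the three metrics.
groups = defaultdict(list)
for order_id, rec in orders.items():
    kind = "split" if rec["n_payments"] > 1 else "single"
    groups[(installments_bucket(rec["installments"]), kind)].append(order_id)

summary = {
    key: {
        "orders": len(ids),
        "aov": sum(orders[i]["value"] for i in ids) / len(ids),
        "late_rate": sum(late[i] for i in ids) / len(ids),
    }
    for key, ids in groups.items()
}
print(summary)
```

If the SQL that Databao produces uses a different definition of "split" or of order value, its numbers will differ from a hand calculation like this one, so it is worth checking the generated code.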


In [None]:
thread.ask(
    "How do split payments relate to average order value? Report orders count, average order value,"
    " and late-delivery rate by installments bucket (1, 2-6, >6) and by split vs single payment."
)

In [None]:
df_first_task = thread.df()
df_first_task

In [None]:
thread.plot()

In [None]:
# You can also see the SQL code that was used to produce these results
print(thread.code())

### 2) Late delivery impact on review scores
Task: How does on-time vs late delivery affect review scores? Report the average and median review score and the number of reviewed orders for on‑time vs late deliveries (latest review per order by `review_answer_timestamp`).
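
The "latest review per order" rule is the easy part to get wrong, so here is a small plain-Python sketch of it on made-up rows. It assumes timestamps are ISO-formatted strings (which sort chronologically) and that a late/on-time flag per order is already available. The names `reviews` and `is_late` are hypothetical, and this is not the query Databao generates.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical toy rows: (order_id, review_score, review_answer_timestamp)
reviews = [
    ("o1", 2, "2018-01-05 10:00:00"),
    ("o1", 5, "2018-01-09 08:30:00"),  # the later answer wins for o1
    ("o2", 1, "2018-02-01 12:00:00"),
    ("o3", 4, "2018-02-03 09:00:00"),
]
# Hypothetical delivery status per order
is_late = {"o1": False, "o2": True, "o3": False}

# Keep only the latest review per order.
latest = {}
for order_id, score, answered_at in reviews:
    if order_id not in latest or answered_at > latest[order_id][1]:
        latest[order_id] = (score, answered_at)

# Collect the surviving scores by delivery status.
scores = defaultdict(list)
for order_id, (score, _) in latest.items():
    scores["late" if is_late[order_id] else "on-time"].append(score)

result = {
    status: {"avg": mean(vals), "median": median(vals), "reviewed_orders": len(vals)}
    for status, vals in scores.items()
}
print(result)
```

Without the deduplication step, order `o1` would contribute both of its scores and be counted twice.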


In [None]:
thread.ask(
    "How does on-time vs late delivery affect review scores?"
    " Report the average and median review score and the number of reviewed orders for on-time "
    "vs late deliveries (latest review per order by review_answer_timestamp)."
)

In [None]:
df_second_task = thread.df()
df_second_task

In [None]:
thread.plot()

### 3) Multi-item vs single-item orders: freight and cancellations
Task: How do single‑item vs multi‑item orders differ in total freight paid per order and cancellation rate? Report orders count, average total freight per order, and cancellation rate by item group (single vs multi).
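
As a reference for checking whatever answer you get, here is a minimal plain-Python sketch of the metrics on made-up rows. It assumes one row per order item with its own freight value and a status of `"canceled"` for cancelled orders. The names `items` and `status` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy rows: (order_id, freight_value), one row per item
items = [
    ("o1", 10.0),
    ("o2", 8.0), ("o2", 12.0),
    ("o3", 5.0),
    ("o4", 7.0), ("o4", 7.0), ("o4", 6.0),
]
# Hypothetical order status
status = {"o1": "delivered", "o2": "canceled", "o3": "delivered", "o4": "delivered"}

# Count items and total the freight per order.
per_order = defaultdict(lambda: {"items": 0, "freight": 0.0})
for order_id, freight in items:
    per_order[order_id]["items"] += 1
    per_order[order_id]["freight"] += freight

# Split orders into the two item groups.
item_groups = defaultdict(list)
for order_id, rec in per_order.items():
    item_groups["multi" if rec["items"] > 1 else "single"].append(order_id)

item_summary = {
    group: {
        "orders": len(ids),
        "avg_freight": sum(per_order[i]["freight"] for i in ids) / len(ids),
        "cancel_rate": sum(status[i] == "canceled" for i in ids) / len(ids),
    }
    for group, ids in item_groups.items()
}
print(item_summary)
```

Orders with no item rows at all never enter either group in this sketch, which may understate the cancellation rate if cancelled orders tend to lack items.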

In [None]:
# Try answering this one with Databao yourself!


### 4) Customer cohort LTV over time (by first order month)
Task: For each customer cohort, what is the monthly revenue and cumulative LTV over time? Include cohort size and months since cohort start; show cumulative LTV per month per cohort.
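
The cohort arithmetic can be sketched in plain Python on made-up rows. This sketch assumes a cohort is the month of a customer's first order and that cumulative LTV means cumulative cohort revenue divided by cohort size; the task text does not fix either definition. The names `cohort_orders` and `month_index` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy rows: (customer_id, order_month "YYYY-MM", revenue)
cohort_orders = [
    ("c1", "2017-01", 100.0),
    ("c2", "2017-01", 50.0),
    ("c1", "2017-03", 30.0),
    ("c3", "2017-02", 80.0),
]


def month_index(ym: str) -> int:
    """Convert 'YYYY-MM' to a running month number so months can be subtracted."""
    year, month = ym.split("-")
    return int(year) * 12 + int(month)


# Cohort = month of each customer's first order.
first_month = {}
for customer, ym, _ in cohort_orders:
    first_month[customer] = min(ym, first_month.get(customer, ym))

cohort_size = defaultdict(int)
for ym in first_month.values():
    cohort_size[ym] += 1

# Revenue per (cohort, months since cohort start).
revenue = defaultdict(float)
for customer, ym, amount in cohort_orders:
    cohort = first_month[customer]
    revenue[(cohort, month_index(ym) - month_index(cohort))] += amount

# Rows: (cohort, cohort size, months since start, monthly revenue, cumulative LTV).
ltv_rows = []
running = defaultdict(float)
for (cohort, offset), amount in sorted(revenue.items()):
    running[cohort] += amount
    ltv_rows.append((cohort, cohort_size[cohort], offset, amount, running[cohort] / cohort_size[cohort]))

for row in ltv_rows:
    print(row)
```

Months in which a cohort generated no revenue produce no row here; a full report would usually fill those gaps so the cumulative curve is continuous.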



### 5) Seller-to-customer lanes: time-in-transit by state pairs
Task: By seller_state → customer_state lanes with at least 20 delivered orders, what are the average and median days in transit? Also report the number of orders per lane and sort by average days in transit.
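
A minimal plain-Python sketch of the lane statistics on made-up rows follows. It assumes days in transit means the delivery date minus the shipping date, which the task does not spell out. The function `lane_stats` and its `min_orders` parameter are hypothetical names; the task's threshold of 20 is the default, lowered only so the toy data produces output.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median

# Hypothetical toy rows: (seller_state, customer_state, shipped, delivered)
deliveries = [
    ("SP", "RJ", date(2018, 1, 1), date(2018, 1, 5)),
    ("SP", "RJ", date(2018, 1, 2), date(2018, 1, 10)),
    ("SP", "BA", date(2018, 1, 3), date(2018, 1, 20)),
    ("MG", "SP", date(2018, 1, 4), date(2018, 1, 7)),
    ("MG", "SP", date(2018, 1, 6), date(2018, 1, 7)),
]


def lane_stats(rows, min_orders=20):
    """Return (lane, orders, average days, median days) for lanes with enough orders."""
    days = defaultdict(list)
    for seller_state, customer_state, shipped, delivered in rows:
        days[(seller_state, customer_state)].append((delivered - shipped).days)
    stats = [
        (lane, len(d), mean(d), median(d))
        for lane, d in days.items()
        if len(d) >= min_orders
    ]
    return sorted(stats, key=lambda row: row[2])  # sort by average days in transit


# The task uses min_orders=20; the toy data needs a lower threshold.
lanes = lane_stats(deliveries, min_orders=2)
for lane in lanes:
    print(lane)
```

Applying the order-count filter before sorting keeps rarely used lanes, whose averages rest on one or two deliveries, from dominating either end of the ranking.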


### 6) Monthly Order Volume and Average Order Value Trends
Task: How do order volumes and average order values change over time? Analyze monthly trends to identify any patterns or seasonality in order activity and spending. Plot the results.
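
The monthly aggregation behind this task can be sketched in plain Python on made-up rows. It assumes each order has a single date in `YYYY-MM-DD` form and a single total value. The name `monthly_orders` is hypothetical, and plotting is left out.

```python
from collections import defaultdict

# Hypothetical toy rows: (order_date "YYYY-MM-DD", order_value)
monthly_orders = [
    ("2017-11-03", 100.0),
    ("2017-11-24", 300.0),
    ("2017-12-01", 50.0),
    ("2018-01-15", 80.0),
    ("2018-01-20", 120.0),
]

totals = defaultdict(lambda: [0, 0.0])  # month -> [order count, revenue]
for order_date, value in monthly_orders:
    month = order_date[:7]  # keep the "YYYY-MM" part
    totals[month][0] += 1
    totals[month][1] += value

# One row per month in chronological order: (month, orders, average order value).
trend = [(month, n, revenue / n) for month, (n, revenue) in sorted(totals.items())]
for month, n, aov in trend:
    print(f"{month}  orders={n}  aov={aov:.2f}")
```

Seasonality only becomes visible once the series covers more than one year, so a trend over a shorter span should be read with caution.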

In [None]:
# Close the database connection
conn.close()
print("Database connection closed successfully!")