# Querying & Preparing Time-Series Data with the MOSTLY AI SDK
Using the AI Assistant to Query Connected Data Sources

Welcome to this tutorial!

🧠 What if you could enable **non-technical users to explore real data safely** using natural language queries?

---

💪 After this tutorial you'll be able to:

■ Select subsets of data before training a generator

■ Pre-aggregate or transform data before training

■ Connect your tables of choice to the MOSTLY AI Assistant

■ Empower non-technical users to explore real data safely


---

🧐 Which connectors work for this approach? MOSTLY AI supports running custom SQL queries on:

■ Cloud storage: AWS S3, Azure Blob Storage, Google Cloud Storage

■ Data warehouses: BigQuery, Snowflake, Databricks

■ Databases: MySQL, PostgreSQL, Oracle, MariaDB, Microsoft SQL Server, Apache Hive

---

Let's go!

## 🎯 Use Case Context
This dataset represents a **typical customer lifecycle**: a customer places one or more orders, each containing one or more items. It’s ideal for demonstrating **multi-sequence, time-series data generation** with MOSTLY AI.

Key columns:
- `customer_id` for identifying sequences
- `order_date` for time alignment

This setup simulates a customer behavior dataset that can be explored, queried, and synthesized.

In [None]:
# Install and import the MOSTLY AI SDK
!pip install mostlyai

from mostlyai.sdk import MostlyAI

In [None]:
# Authenticate with your API key (replace the placeholder with your actual key).
# If you don't have an API key yet, create one at https://app.mostly.ai/settings/api-keys
mostly = MostlyAI(
    api_key='your_api_key',
    base_url='https://app.mostly.ai'
)
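
Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. Below is a minimal sketch of one alternative, assuming you keep the key in an environment variable. The variable name `MOSTLY_API_KEY` and the helper `load_api_key` are illustrative choices for this sketch, not something this tutorial prescribes.

```python
import os


def load_api_key(env_var: str = "MOSTLY_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is missing.

    The variable name is an assumption for this sketch; use whatever name
    your environment defines.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return api_key


# Usage (uncomment once the variable is set):
# mostly = MostlyAI(api_key=load_api_key(), base_url="https://app.mostly.ai")
```

Failing early with a clear message tends to be easier to debug than an authentication error that only surfaces on the first API call.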

In [None]:
# List all connectors in your workspace
connectors = mostly.connectors.list()

# Print their names and IDs
for c in connectors:
    print(f"Name: {c.name}, ID: {c.id}")

In [None]:
# Fetch your connector of choice by ID (use an ID from the list printed above).
# In this case we use a Google Cloud Storage bucket.

connector_id = '3ca7cb19-744c-4292-8117-a6c7fb53c4bc'
connector = mostly.connectors.get(connector_id)
connector

In [None]:
# List all available locations in your connector

locations = connector.locations(prefix=None)
locations

In [None]:
# List the contents of the 'mostlyai-challenge-bucket/' directory

locations_in_bucket = connector.locations(prefix='mostlyai-challenge-bucket/')
locations_in_bucket

In [None]:
# Now let's load the data. Starting with the first table:

df_customers = connector.read_data(location='mostlyai-challenge-bucket/customers.csv')
df_customers.head()

In [None]:
# The second table:
df_order_items = connector.read_data(location='mostlyai-challenge-bucket/order_items.csv')
df_order_items.head()


In [None]:
# The third table:
df_orders = connector.read_data(location='mostlyai-challenge-bucket/orders.csv')
df_orders.head()

In [None]:
# Join orders with customers on customer_id
df_orders_customers = df_orders.merge(df_customers, on='customer_id', how='left')

# Join the result with order_items on order_id
df_full = df_orders_customers.merge(df_order_items, on='order_id', how='left')

df_full.head()
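
A left join silently duplicates rows when the right-hand key is not unique, and it leaves missing values where no match exists. The sketch below uses small invented frames (not the tutorial's files) to show how the `validate` and `indicator` arguments of pandas `merge` can surface both problems. Column names other than `customer_id`, `order_id`, and `order_date` are hypothetical.

```python
import pandas as pd

# Toy stand-ins for the three tables; all values are invented for illustration.
customers = pd.DataFrame({"customer_id": [1, 2], "segment": ["A", "B"]})
orders = pd.DataFrame(
    {
        "order_id": [10, 11, 12],
        "customer_id": [1, 1, 3],  # customer 3 has no row in `customers`
        "order_date": ["2024-01-05", "2024-02-01", "2024-01-20"],
    }
)
order_items = pd.DataFrame({"order_id": [10, 10, 11], "item": ["x", "y", "z"]})

# many_to_one: raises MergeError if customer_id repeats in `customers`.
# indicator=True adds a `_merge` column that marks unmatched rows as 'left_only'.
oc = orders.merge(
    customers, on="customer_id", how="left", validate="many_to_one", indicator=True
)
unmatched_orders = oc.loc[oc["_merge"] == "left_only", "order_id"].tolist()

# one_to_many: an order may have several items, but order_id must be unique on the left.
full = oc.drop(columns="_merge").merge(
    order_items, on="order_id", how="left", validate="one_to_many"
)

print("Orders without a matching customer:", unmatched_orders)
print(len(orders), "orders ->", len(full), "rows after joining items")
```

The row count grows after the second join because each order is repeated once per item; orders without items remain as a single row with a missing `item`.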

In [None]:
# Sort by customer and order date so the generator is trained on chronologically ordered sequences
df_full = df_full.sort_values(by=['customer_id', 'order_date'])
df_full.head()
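
Values read from CSV often arrive as strings, and sorting strings only matches chronological order for zero-padded ISO-style dates. Here is a small sketch with invented data, assuming `order_date` could be stored as text in a day/month/year layout; check the actual format of your own column before reusing the format string.

```python
import pandas as pd

# Invented sample: dates as day/month/year strings, deliberately out of order.
df = pd.DataFrame(
    {
        "customer_id": [1, 1, 1],
        "order_date": ["02/10/2023", "15/01/2024", "10/02/2023"],
    }
)

# Sorting the raw strings compares characters, not dates.
as_text = df.sort_values(["customer_id", "order_date"])["order_date"].tolist()

# Parse to datetime first; the format string is an assumption about this sample.
df["order_date"] = pd.to_datetime(df["order_date"], format="%d/%m/%Y")
as_dates = (
    df.sort_values(["customer_id", "order_date"])["order_date"]
    .dt.strftime("%Y-%m-%d")
    .tolist()
)

print("Sorted as text: ", as_text)
print("Sorted as dates:", as_dates)
```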

In [None]:
# Let's get the file ready for export!
df_full.to_csv('data.csv', index=False)
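
Before exporting, you may also want to subset or pre-aggregate the joined frame, for example to train on one row per order instead of one row per item. The following is a sketch on invented data; `quantity` and `price` are hypothetical column names that may not exist in your tables.

```python
import pandas as pd

# Invented item-level rows; `quantity` and `price` are hypothetical columns.
items = pd.DataFrame(
    {
        "customer_id": [1, 1, 1, 2],
        "order_id": [10, 10, 11, 12],
        "order_date": pd.to_datetime(
            ["2024-01-05", "2024-01-05", "2024-02-01", "2023-11-20"]
        ),
        "quantity": [1, 2, 1, 4],
        "price": [5.0, 2.5, 10.0, 1.0],
    }
)

# Subset: keep only orders placed on or after 1 January 2024.
recent = items[items["order_date"] >= pd.Timestamp("2024-01-01")]

# Pre-aggregate: collapse item rows into one row per order.
per_order = (
    recent.assign(line_total=recent["quantity"] * recent["price"])
    .groupby(["customer_id", "order_id", "order_date"], as_index=False)
    .agg(n_items=("quantity", "sum"), order_total=("line_total", "sum"))
    .sort_values(["customer_id", "order_date"])
)

print(per_order)
```

Aggregating to order level shortens each customer's sequence, which can matter when individual orders contain many items.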

🥳 Congratulations! You completed this tutorial.

## ✅ Summary
- Connected to a data source
- Loaded and joined customer, order, and order-item data
- Sorted it chronologically per customer and exported it for time-series synthetic data generation

👉 Try this workflow on your own data or explore the [MOSTLY AI SDK Docs](https://mostly.ai/docs)

## 🤖 Assistant Prompt Example
After uploading the dataset via the UI and linking it to the Assistant, try a query like:

> *"Show me the average number of items per order over the past 6 months."*

The Assistant translates this prompt into SQL automatically, so users can explore the dataset safely without writing queries themselves.
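
To sanity-check an answer from the Assistant, the same metric can be computed in pandas. The sketch below runs on invented data and assumes one row per order item. It counts "the past 6 months" back from a fixed reference date rather than today, so the result is reproducible.

```python
import pandas as pd

# Invented joined data: one row per order item.
df = pd.DataFrame(
    {
        "order_id": [1, 1, 2, 3, 3, 3, 4],
        "order_date": pd.to_datetime(
            [
                "2024-01-10", "2024-01-10", "2024-03-02",
                "2024-05-20", "2024-05-20", "2024-05-20", "2023-06-01",
            ]
        ),
    }
)

# Fixed reference date (an assumption; use pd.Timestamp.today() for live data).
reference = pd.Timestamp("2024-06-30")
cutoff = reference - pd.DateOffset(months=6)

# Keep recent rows, count items per order, then average those counts.
recent = df[df["order_date"] > cutoff]
items_per_order = recent.groupby("order_id").size()
avg_items = items_per_order.mean()

print(f"{avg_items:.2f} items per order across {len(items_per_order)} orders")
```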

# Next:

😱 Did you find that some of your tables were missing data? Try this next tutorial on Smart Imputation:
https://colab.research.google.com/github/mostly-ai/mostlyai/blob/main/docs/tutorials/smart-imputation/smart-imputation.ipynb

🧐 For more info on how to link your specific connector: https://mostly.ai/docs/connectors