# Real-time Data for LLMs
This notebook demonstrates how to retrieve **real-time features** from TurboML and insert them into **LLM prompts** for more **contextual** and **personalized** responses.

### Why Feature Platforms for LLMs?
- **Real-time personalization**: Many LLM tasks benefit from fresh data (e.g., user’s recent transactions, recent chat sentiment) to personalize outputs.
- **Low-latency data access**: Feature platforms are built for **real-time** lookups, enabling you to insert up-to-date stats at inference time.
- **Consistency**: Feature platforms unify offline and online features, ensuring the same definitions are used for analytics/batch and real-time inference.

When combined with LLMs, feature platforms enable a new paradigm of AI applications that can
reason about current data while leveraging the powerful capabilities of language models.

Set up the environment and install TurboML's SDK.

In [None]:
!pip install turboml-installer
import turboml_installer

turboml_installer.install_on_colab()

The kernel should now be restarted with TurboML's SDK installed.

## 1. Install and Import Required Libraries

Log in to your TurboML instance.

In [4]:
import pandas as pd
import turboml as tb

# Initialize the TurboML platform.
# BACKEND_URL and API_KEY must already hold your instance's URL and API key.
tb.init(backend_url=BACKEND_URL, api_key=API_KEY)

## 2. Initialize Feature Platform & Load Data
We’ll initialize a connection to our **TurboML** feature platform, then upload a transactions dataset.


We'll use TurboML’s **push-based ingestion** approach by uploading the pandas DataFrame via **`tb.OnlineDataset`**.  

In [5]:
# Load and prepare transaction data
try:
    transactions_df = tb.datasets.FraudDetectionDatasetFeatures().df
    transactions = tb.OnlineDataset.from_pd(
        id="transactions_prompt",
        key_field="transactionID",
        df=transactions_df,
        load_if_exists=True,
    )
except Exception:
    # Fallback from the original notebook: retry with only the dataset id.
    transactions = tb.OnlineDataset.from_pd(id="transactions_prompt")

INFO:turboml.common.dataloader:Uploading 201406 rows to dataset transactions_prompt
Progress: 100%|██████████| 201k/201k [00:05<00:00, 37.9krows/s]
INFO:turboml.common.dataloader:Upload complete. Waiting for server to process messages.


## 3. Feature Engineering
Feature stores are invaluable because they can create **aggregated** or **transformed** features on the fly, then materialize them for real-time serving.

Real-time features can also trigger or supplement an LLM-based pipeline: for example, take action if user sentiment is dropping, or investigate possible fraud if transaction volume is unusually high.
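
As a minimal sketch of the "trigger" idea, the snippet below decides whether a record should be escalated to an LLM-based review. It uses the feature names created later in this notebook; the thresholds and the `should_escalate` helper are hypothetical, chosen only for illustration, and are not part of TurboML.

```python
from typing import Dict

# Hypothetical thresholds, chosen only for illustration.
MAX_SUM_24H = 1000.0
MAX_COUNT_7D = 20


def should_escalate(features: Dict[str, float]) -> bool:
    """Return True when any real-time feature crosses its threshold."""
    return (
        features.get("my_sum_feat_24h", 0.0) > MAX_SUM_24H
        or features.get("my_count_feat_7d", 0) > MAX_COUNT_7D
    )


quiet = {"my_sum_feat_24h": 145.59, "my_count_feat_7d": 1}
busy = {"my_sum_feat_24h": 2500.0, "my_count_feat_7d": 3}

print(should_escalate(quiet))  # False
print(should_escalate(busy))   # True
```

A rule like this keeps LLM calls for the cases that look unusual, which can reduce cost and latency for routine traffic.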

### 3.1 Register Timestamp
First, we register our timestamp column so we can perform time-based aggregations.

In [6]:
transactions.feature_engineering.register_timestamp(
    column_name="timestamp",
    format_type="epoch_seconds"
)

### 3.2 Create Multiple Aggregations
We’ll create the following **time-windowed** features:
1. `my_sum_feat_24h`: Sum of transaction amounts in the last 24 hours (per account).
2. `my_avg_feat_48h`: Average transaction amount in the last 48 hours (per account).
3. `my_count_feat_7d`: Number of transactions in the last 7 days (per account).
4. `my_max_feat_7d`: Maximum transaction amount in the last 7 days (per account).

After creating them, we’ll **materialize** them so they’re available in the online store for real-time reads.
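
To make the meaning of these windowed features concrete, here is a plain-Python sketch of what a per-account SUM, AVG, COUNT, and MAX over a trailing window compute. It is only an illustration under assumptions: the `window_aggregates` helper and the sample rows are hypothetical, the window is taken as `(now - window, now]`, and TurboML's own computation and boundary handling may differ.

```python
from typing import Any, Dict, List, Optional


def window_aggregates(
    rows: List[Dict[str, Any]],
    account_id: str,
    now: float,
    window_seconds: float,
) -> Dict[str, Optional[float]]:
    """Aggregate transactionAmount for one account over (now - window, now]."""
    amounts = [
        r["transactionAmount"]
        for r in rows
        if r["accountID"] == account_id
        and now - window_seconds < r["timestamp"] <= now
    ]
    if not amounts:
        return {"sum": 0.0, "avg": None, "count": 0, "max": None}
    return {
        "sum": sum(amounts),
        "avg": sum(amounts) / len(amounts),
        "count": len(amounts),
        "max": max(amounts),
    }


# Hypothetical sample data; timestamps are epoch seconds.
rows = [
    {"accountID": "A1", "transactionAmount": 10.0, "timestamp": 1_000},
    {"accountID": "A1", "transactionAmount": 30.0, "timestamp": 50_000},
    {"accountID": "A2", "transactionAmount": 99.0, "timestamp": 60_000},
    {"accountID": "A1", "transactionAmount": 20.0, "timestamp": 90_000},
]
day = 24 * 60 * 60

result = window_aggregates(rows, "A1", now=90_000, window_seconds=day)
print(result)  # {'sum': 50.0, 'avg': 25.0, 'count': 2, 'max': 30.0}
```

The first `A1` transaction falls outside the 24-hour window, so only the last two contribute.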

In [7]:
# 1) Sum of transactionAmount over 24h
transactions.feature_engineering.create_aggregate_features(
    column_to_operate="transactionAmount",
    column_to_group="accountID",
    operation="SUM",
    new_feature_name="my_sum_feat_24h",
    timestamp_column="timestamp",
    window_duration=24,
    window_unit="hours"
)

# 2) Average transactionAmount over 48h
transactions.feature_engineering.create_aggregate_features(
    column_to_operate="transactionAmount",
    column_to_group="accountID",
    operation="AVG",
    new_feature_name="my_avg_feat_48h",
    timestamp_column="timestamp",
    window_duration=48,
    window_unit="hours"
)

# 3) Count transactions in the last 7 days
transactions.feature_engineering.create_aggregate_features(
    column_to_operate="transactionAmount",
    column_to_group="accountID",
    operation="COUNT",
    new_feature_name="my_count_feat_7d",
    timestamp_column="timestamp",
    window_duration=7,
    window_unit="days"
)

# 4) Max transaction in the last 7 days
transactions.feature_engineering.create_aggregate_features(
    column_to_operate="transactionAmount",
    column_to_group="accountID",
    operation="MAX",
    new_feature_name="my_max_feat_7d",
    timestamp_column="timestamp",
    window_duration=7,
    window_unit="days"
)

# Now materialize
features_to_materialize = [
    "my_sum_feat_24h", "my_avg_feat_48h", "my_count_feat_7d", "my_max_feat_7d"
]
transactions.feature_engineering.materialize_features(features_to_materialize)

## 4. Constructing Prompt Templates
We often build **prompt templates** that reference real-time or near real-time features. For instance, if we want an LLM to determine the *likelihood of fraud*, we might include the user’s `accountID`, `transactionAmount`, plus any aggregated features like `my_sum_feat_24h`.

Below we define a helper class `TurboMLPromptTemplate`. In its `get_prompts` method, it calls the TurboML SDK to fetch the relevant features for each row, then formats a prompt.
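
Because the prompt is filled with Python's `str.format`, a placeholder that has no matching column raises a `KeyError`. The sketch below shows one way to check a template against the available column names before formatting; `template_fields` and `missing_fields` are hypothetical helpers built on the standard library's `string.Formatter`, not part of the TurboML SDK.

```python
from string import Formatter
from typing import List, Set


def template_fields(template: str) -> Set[str]:
    """Return the placeholder names used in a str.format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def missing_fields(template: str, available: List[str]) -> Set[str]:
    """Return placeholders that have no matching column name."""
    return template_fields(template) - set(available)


template = "Account {accountID} spent {my_sum_feat_24h} in the past 24h."
columns = ["accountID", "transactionAmount", "timestamp"]

print(missing_fields(template, columns))  # {'my_sum_feat_24h'}
```

Running a check like this once, when the template is defined, surfaces typos in feature names before any prompt is sent to a model.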


In [8]:
from typing import List
from turboml.common.feature_engineering import retrieve_features

class TurboMLPromptTemplate:
    def __init__(self, template: str, dataset_id: str):
        self.dataset_id = dataset_id
        self.template = template

    def get_prompts(self, df: pd.DataFrame) -> List[str]:
        # This calls TurboML's retrieve_features to get real-time features
        prompt_data_list = retrieve_features(self.dataset_id, df).to_dict('records')

        prompts = []
        for prompt_data in prompt_data_list:
            # We do a standard python .format(**dict) replacement
            prompt = self.template.format(**prompt_data)
            prompts.append(prompt)
        return prompts

### Example Prompt Construction
Now we define our prompt with placeholders for `accountID`, `transactionAmount`, and `timestamp`. We can also incorporate a newly created feature such as `my_sum_feat_24h` to pass that aggregated information to the LLM.

**Example**: `Give the likelihood of fraud for account {accountID} for the transaction of amount {transactionAmount} performed at time {timestamp}. The sum of this account’s transactions in the past 24h is {my_sum_feat_24h}.`

Feel free to adjust the prompt to suit your scenario (sentiment analysis, dynamic recommendations, etc.).

### 4.1 Single-Feature Prompt
Our first prompt references one feature (`my_sum_feat_24h`).

In [9]:
template_str = (
    "Give the likelihood of fraud for account {accountID} "
    "for the transaction of amount {transactionAmount} performed at time {timestamp}. "
    "The sum of this account’s transactions in the past 24h is {my_sum_feat_24h}."
)

fraud_prompt = TurboMLPromptTemplate(
    template=template_str,
    dataset_id="transactions_prompt"
)

### 4.2 Multi-Feature Prompt
Here we reference **multiple** features: `my_sum_feat_24h`, `my_avg_feat_48h`, `my_count_feat_7d`, and `my_max_feat_7d`. This might be used for a more detailed risk analysis prompt.


In [10]:
template_str_2 = (
    "Based on the following data for account {accountID}:\n"
    "- sum of last 24 hours: {my_sum_feat_24h}\n"
    "- average transaction in last 48 hours: {my_avg_feat_48h}\n"
    "- total transactions in last 7 days: {my_count_feat_7d}\n"
    "- max transaction in last 7 days: {my_max_feat_7d}\n"
    "Please determine the risk rating for this transaction of {transactionAmount} at time {timestamp}."
)

risk_prompt = TurboMLPromptTemplate(
    template=template_str_2,
    dataset_id="transactions_prompt"
)

## 5. Testing the Prompts
We'll take the **last 5** rows of the dataset's preview DataFrame (`transactions.preview_df`), call `get_prompts(...)`, and inspect the resulting prompts. Note that in a production system, each query would typically come from a live user or an API endpoint, and the features would be retrieved in real time.
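
Feature values reach the template as raw numbers, so timestamps appear as epoch seconds and averages can carry long floating-point tails. As an optional, hedged sketch, a record could be tidied before it is passed to `.format(...)`. The `tidy_record` helper is hypothetical and assumes the `timestamp` field holds epoch seconds, as registered earlier.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def tidy_record(record: Dict[str, Any], decimals: int = 2) -> Dict[str, Any]:
    """Round floats and render the epoch-seconds timestamp as UTC ISO 8601."""
    tidy = dict(record)
    for key, value in record.items():
        if key == "timestamp":
            tidy[key] = datetime.fromtimestamp(value, tz=timezone.utc).isoformat()
        elif isinstance(value, float):
            tidy[key] = round(value, decimals)
    return tidy


record = {
    "accountID": "A844428178170104",
    "timestamp": 1380585391.0,
    "my_avg_feat_48h": 145.58999633789062,
}
tidy = tidy_record(record)
print(tidy["timestamp"])        # 2013-09-30T23:56:31+00:00
print(tidy["my_avg_feat_48h"])  # 145.59
```

Whether to reformat values this way depends on the model and task; some prompts may work better with the raw numbers.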

In [13]:
sample_df = transactions.preview_df[-5:]

In [14]:
print("--- Prompt #1 (Fraud prompt) ---")
prompts_fraud = fraud_prompt.get_prompts(df=sample_df)
for i, prompt in enumerate(prompts_fraud):
    print(f"\nPrompt {i+1}:\n{prompt}\n")

--- Prompt #1 (Fraud prompt) ---


INFO:turboml.common.internal:Starting to upload data... Total rows: 5
Progress: 100%|██████████| 1.00/1.00 [00:00<00:00, 1.10kchunk/s]
INFO:turboml.common.internal:Completed data upload.



Prompt 1:
Give the likelihood of fraud for account A844428178170104 for the transaction of amount 145.59 performed at time 1380585391.0. The sum of this account’s transactions in the past 24h is 145.59.


Prompt 2:
Give the likelihood of fraud for account A844428178164561 for the transaction of amount 84.79 performed at time 1380585437.0. The sum of this account’s transactions in the past 24h is 84.79.


Prompt 3:
Give the likelihood of fraud for account A844427182177392 for the transaction of amount 49.99 performed at time 1380585481.0. The sum of this account’s transactions in the past 24h is 49.99.


Prompt 4:
Give the likelihood of fraud for account A844427572488296 for the transaction of amount 148.74 performed at time 1380585504.0. The sum of this account’s transactions in the past 24h is 148.74.


Prompt 5:
Give the likelihood of fraud for account A985156974500548 for the transaction of amount 148.39 performed at time 1380585550.0. The sum of this account’s transactions in the 

In [15]:
print("\n--- Prompt #2 (Risk prompt) ---")
prompts_risk = risk_prompt.get_prompts(df=sample_df)
for i, prompt in enumerate(prompts_risk):
    print(f"\nPrompt {i+1}:\n{prompt}\n")


--- Prompt #2 (Risk prompt) ---


INFO:turboml.common.internal:Starting to upload data... Total rows: 5
Progress: 100%|██████████| 1.00/1.00 [00:00<00:00, 982chunk/s]
INFO:turboml.common.internal:Completed data upload.



Prompt 1:
Based on the following data for account A844428178170104:
- sum of last 24 hours: 145.59
- average transaction in last 48 hours: 145.58999633789062
- total transactions in last 7 days: 1
- max transaction in last 7 days: 145.59
Please determine the risk rating for this transaction of 145.59 at time 1380585391.0.


Prompt 2:
Based on the following data for account A844428178164561:
- sum of last 24 hours: 84.79
- average transaction in last 48 hours: 84.79000091552734
- total transactions in last 7 days: 1
- max transaction in last 7 days: 84.79
Please determine the risk rating for this transaction of 84.79 at time 1380585437.0.


Prompt 3:
Based on the following data for account A844427182177392:
- sum of last 24 hours: 49.99
- average transaction in last 48 hours: 49.9900016784668
- total transactions in last 7 days: 1
- max transaction in last 7 days: 49.99
Please determine the risk rating for this transaction of 49.99 at time 1380585481.0.


Prompt 4:
Based on the followi

## 6. (Optional) LLM Inference
We can now pass these prompts to an LLM endpoint (like OpenAI, Anthropic, Gemini or a local model) to get responses.

In [None]:
# %pip install --upgrade openai
# from openai import OpenAI

# # Set the API key and model name
# MODEL = "gpt-4o"
# client = OpenAI(api_key="YOUR_API_KEY")  # Replace with your API key

# # Make the API call using the first prompt from your prompts_fraud list
# completion = client.chat.completions.create(
#     model=MODEL,
#     messages=[
#         {
#             "role": "system", 
#             "content": "You are a financial assistant. Help me assess the risk of fraud for this transaction."
#         },
#         {
#             "role": "user", 
#             "content": prompts_fraud[0]
#         }
#     ],
#     max_tokens=150
# )
# print("Assistant:", completion.choices[0].message.content)
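
To send more than one prompt, it can help to assemble the request payloads separately from the network call. The sketch below builds one chat-style request per prompt without contacting any API; `build_messages` and `build_requests` are hypothetical helpers, and the model name and `max_tokens` value simply mirror the commented example above.

```python
from typing import Any, Dict, List

SYSTEM_PROMPT = (
    "You are a financial assistant. "
    "Help me assess the risk of fraud for this transaction."
)


def build_messages(prompt: str, system_prompt: str = SYSTEM_PROMPT) -> List[Dict[str, str]]:
    """Wrap one prompt in a system + user message pair."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
    ]


def build_requests(
    prompts: List[str], model: str = "gpt-4o", max_tokens: int = 150
) -> List[Dict[str, Any]]:
    """Build one request payload per prompt; no network call is made here."""
    return [
        {"model": model, "messages": build_messages(p), "max_tokens": max_tokens}
        for p in prompts
    ]


# Placeholder prompts standing in for the output of get_prompts(...).
example_prompts = ["First example prompt.", "Second example prompt."]
requests = build_requests(example_prompts)

print(len(requests))                          # 2
print(requests[0]["messages"][1]["content"])  # First example prompt.
```

With the OpenAI client from the commented cell, each payload could then be sent as `client.chat.completions.create(**request)`; other providers would need their own request shape.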

By integrating **TurboML** with your LLM workflow:
- You can easily insert **real-time** feature values into prompts.
- You can keep track of **complex** features without writing a lot of custom code.
- You can create more powerful and **context-aware** LLM-based applications.