# Analyzing CAN bus data with GPT-4 and DuckDB
This tutorial shows how to combine the Parquet file format, the in-process database DuckDB, and GPT-4 to query CAN bus data.

## Convert decoded MF4 to Parquet

In [23]:
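from asammdf import MDF

# Open the decoded MF4 file and flatten its channels into one pandas DataFrame
with MDF(r"../data/decoded.mf4") as mdf:
    df = mdf.to_dataframe()
    # Write the DataFrame to Parquet so that DuckDB can query the file directly
    df.to_parquet("vehicle_data.parquet", engine="pyarrow")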

## Connect with DuckDB

In [None]:
import duckdb

# Connect to an in-memory DuckDB instance
conn = duckdb.connect()

# Use DESCRIBE to list column details
query = "DESCRIBE SELECT * FROM 'vehicle_data.parquet';"

# Execute the query and get the result as a DataFrame
columns_df = conn.execute(query).fetchdf()

# Extract and print the column names
column_names = columns_df["column_name"].tolist()
print("Column Names:", column_names)

## Ask GPT-4 about the semantics of the data

### Sample query 1

Give GPT-4 the prompt below, and try asking for different kinds of information, such as velocity, speed, battery charge, or RPM.

I have a database with vehicle data. It contains the following channels with time series data:
Column Names: ['Service', 'Response', 'Length', 'S01PID', 'S01PID04_CalcEngineLoad', 'S01PID05_EngineCoolantTemp', 'S01PID0B_IntakeManiAbsPress', 'S01PID0C_EngineRPM', 'S01PID0D_VehicleSpeed', 'S01PID11_ThrottlePosition', 'S01PID2F_FuelTankLevel', 'S01PID33_AbsBaroPres', 'S01PID42_ControlModuleVolt', 'S01PID62_ActualEngTorqPct', 'timestamps']

I am looking for the following information:
engine revolutions per minute

If there is no channel with the desired information, then answer: "There is no channel with the desired information"
If there are one or more channels with the desired information, then answer with the channel names only.
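
The prompt above can also be assembled in code, so that the column list printed by DuckDB does not have to be pasted by hand. The following is a minimal sketch, not part of the original notebook: `build_channel_prompt`, `parse_channel_answer`, and `NO_MATCH` are hypothetical names introduced here for illustration, and the step that actually sends the prompt to GPT-4 is left out. The parser assumes the model replies with channel names separated by commas or newlines.

```python
# Hypothetical helpers, not part of the original notebook.
NO_MATCH = "There is no channel with the desired information"


def build_channel_prompt(column_names, wanted):
    """Fill the channel-lookup prompt template with column names and a request."""
    return (
        "I have a database with vehicle data. "
        "It contains the following channels with time series data:\n"
        f"Column Names: {column_names}\n\n"
        "I am looking for the following information:\n"
        f"{wanted}\n\n"
        "If there is no channel with the desired information, "
        f'then answer: "{NO_MATCH}"\n'
        "If there are one or more channels with the desired information, "
        "then answer with the channel names only."
    )


def parse_channel_answer(answer, column_names):
    """Return the known channel names mentioned in the model's answer."""
    if NO_MATCH in answer:
        return []
    # Split on commas and newlines, then drop quotes and whitespace per token
    tokens = answer.replace("\n", ",").split(",")
    cleaned = [t.strip().strip("'\"`") for t in tokens]
    # Keep only names that exist in the data, which filters out invented channels
    return [name for name in cleaned if name in column_names]


columns = ["S01PID0C_EngineRPM", "S01PID0D_VehicleSpeed", "timestamps"]
prompt = build_channel_prompt(columns, "engine revolutions per minute")
print(prompt)

# Example answers typed by hand, not real model output
print(parse_channel_answer("S01PID0C_EngineRPM", columns))
print(parse_channel_answer(NO_MATCH, columns))
```

Checking the answer against the real column list is a cheap safeguard: any name the model returns that is not in the Parquet file is dropped before it reaches a SQL query.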

## Sandbox to execute generated queries on DuckDB

In [14]:
# Paste query here
query = """
    PASTE QUERY HERE
"""

# Run the query and get the result as a DataFrame
df = conn.execute(query).fetchdf()

# Display the result
print(df)