Preparing Your Data
Manu Murugesan edited this page Mar 13, 2026
This guide explains how to convert raw accelerometer data into the HDF5 format expected by the tool.
The tool reads HDF5 files containing a table stored under the key `readings`, with these columns:
| Column | Type | Required |
|---|---|---|
| `timestamp` | datetime | Yes |
| `x` | float | Yes |
| `y` | float | Yes |
| `z` | float | Yes |
| `light` | float | No (ignored) |
| `button` | float | No (ignored) |
| `temperature` | float | No (ignored) |
Additional columns are silently ignored.
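To catch problems before writing a file, you can check a DataFrame against the table above. This is a minimal sketch: `check_readings_schema` is a hypothetical helper (not part of the tool), and it assumes "float" means any floating-point dtype and "datetime" means a `datetime64` column, time-zone-aware or not.

```python
import pandas as pd

# Required columns from the table above; optional/extra columns are not checked.
REQUIRED = {"timestamp": "datetime", "x": "float", "y": "float", "z": "float"}


def check_readings_schema(df: pd.DataFrame) -> list[str]:
    """Return a list of problems; an empty list means the required columns look OK."""
    problems = []
    for col, kind in REQUIRED.items():
        if col not in df.columns:
            problems.append(f"missing column: {col}")
        elif kind == "datetime" and not pd.api.types.is_datetime64_any_dtype(df[col]):
            problems.append(f"{col} should be datetime64, got {df[col].dtype}")
        elif kind == "float" and not pd.api.types.is_float_dtype(df[col]):
            problems.append(f"{col} should be float, got {df[col].dtype}")
    return problems


good = pd.DataFrame({
    "timestamp": pd.to_datetime(["2023-03-15 09:30:00", "2023-03-15 09:30:01"]),
    "x": [0.1, 0.2],
    "y": [0.0, 0.1],
    "z": [0.98, 1.01],
    "light": [12.0, 13.0],  # optional column; the tool ignores it
})
print(check_readings_schema(good))                       # []
print(check_readings_schema(good.drop(columns=["z"])))   # ['missing column: z']
```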
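Example conversion from a CSV file with pandas:

```python
import pandas as pd

# Load your raw data
df = pd.read_csv("raw_data.csv")

# Rename your columns to the names the tool expects (adjust the keys to match your CSV)
df = df.rename(columns={
    "time": "timestamp",
    "accel_x": "x",
    "accel_y": "y",
    "accel_z": "z",
})

# Ensure the required columns are present
missing = {"timestamp", "x", "y", "z"} - set(df.columns)
if missing:
    raise ValueError(f"missing required columns: {sorted(missing)}")

# Parse timestamps and store the axes as floats
df["timestamp"] = pd.to_datetime(df["timestamp"])
df[["x", "y", "z"]] = df[["x", "y", "z"]].astype("float64")

# Sort by time
df = df.sort_values("timestamp").reset_index(drop=True)

# Write HDF5 (the table format requires the PyTables package)
df.to_hdf(
    "output.h5",
    key="readings",
    format="table",
    data_columns=["timestamp"],
    complevel=9,
    complib="zlib",
)
```

Key points: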
- `format="table"` is required: the tool uses `where` clauses for time-range queries, which only work with the table format (not the fixed format).
- `data_columns=["timestamp"]` makes the timestamp column queryable and indexes it for fast queries.
- `complevel=9, complib="zlib"` is optional but recommended to reduce file size.
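To see why the table format and `data_columns` matter, the sketch below writes a small synthetic recording and reads back a half-second window with a `where` clause. The tool's own query code isn't shown on this page, so this only illustrates the pandas mechanism; like any table-format `to_hdf` call, it needs PyTables (`tables`) installed.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Synthetic 2-second recording at 85 Hz (1/85 s ≈ 11765 µs between samples)
n = 170
df = pd.DataFrame({
    "timestamp": pd.date_range("2023-03-15 09:30:00", periods=n, freq="11765us"),
    "x": np.zeros(n),
    "y": np.zeros(n),
    "z": np.ones(n),
})

start = pd.Timestamp("2023-03-15 09:30:00.5")
end = pd.Timestamp("2023-03-15 09:30:01")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.h5")
    # Table format + data_columns makes "timestamp" usable in where clauses
    df.to_hdf(path, key="readings", format="table", data_columns=["timestamp"])
    # Only rows with start <= timestamp < end are returned
    window = pd.read_hdf(
        path,
        "readings",
        where=f"(timestamp >= '{start}') & (timestamp < '{end}')",
    )

print(len(window), window["timestamp"].min(), window["timestamp"].max())
```

Writing the same frame with `format="fixed"` and passing `where=` to `read_hdf` fails, which is why the table format is required.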
Place your files in `visualize_accelerometry/data/readings/`. The project uses this naming convention:

`<subject_id>-<datetime>.h5`

For example: `900001-20230315093000.h5`

The tool doesn't enforce this convention (any `.h5` filename works), but following it keeps recordings organized.
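If you generate many files, a pair of small helpers keeps the names consistent. This is a sketch under an assumption: the `<datetime>` part is taken to be `YYYYMMDDhhmmss`, inferred from the example above, and `readings_path` / `parse_readings_name` are hypothetical names, not functions the tool provides.

```python
from __future__ import annotations

from datetime import datetime
from pathlib import Path

# Assumption: <datetime> is YYYYMMDDhhmmss, inferred from the example
# 900001-20230315093000.h5 (2023-03-15 09:30:00); the tool itself does not check it.
DATETIME_FORMAT = "%Y%m%d%H%M%S"
READINGS_DIR = Path("visualize_accelerometry/data/readings")


def readings_path(subject_id: str, start: datetime) -> Path:
    """Build <subject_id>-<datetime>.h5 inside the readings directory."""
    return READINGS_DIR / f"{subject_id}-{start.strftime(DATETIME_FORMAT)}.h5"


def parse_readings_name(path: str | Path) -> tuple[str, datetime]:
    """Split a conventionally named file into (subject_id, recording start)."""
    # rpartition keeps any hyphens that appear inside the subject ID
    subject_id, _, stamp = Path(path).stem.rpartition("-")
    return subject_id, datetime.strptime(stamp, DATETIME_FORMAT)


p = readings_path("900001", datetime(2023, 3, 15, 9, 30))
print(p.name)                     # 900001-20230315093000.h5
print(parse_readings_name(p)[0])  # 900001
```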
The tool is sampling-rate agnostic. The demo data uses 85 Hz, but any rate works. Higher rates produce more data points, which the tool handles by downsampling with LTTB (Largest-Triangle-Three-Buckets) to roughly 10,000 points for display.
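For intuition about what LTTB does, here is a minimal NumPy sketch of the standard algorithm. It is not the tool's implementation (the function name `lttb` and its signature are hypothetical): it keeps the first and last points, splits the rest into `n_out - 2` buckets, and from each bucket keeps the point that forms the largest triangle with the previously kept point and the average of the next bucket. Timestamps need to be converted to numbers (for example, seconds) before calling it.

```python
import numpy as np


def lttb(x, y, n_out: int) -> np.ndarray:
    """Return the indices of the n_out points kept by Largest-Triangle-Three-Buckets."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    if n_out >= n:
        return np.arange(n)  # nothing to downsample
    if n_out < 3:
        raise ValueError("n_out must be at least 3")

    keep = np.empty(n_out, dtype=np.int64)
    keep[0], keep[-1] = 0, n - 1  # first and last points are always kept
    a = 0  # index of the point kept from the previous bucket
    n_mid, n_buckets = n - 2, n_out - 2

    for i in range(n_buckets):
        # Current bucket: interior points [lo, hi), using integer math to avoid rounding gaps
        lo = 1 + (i * n_mid) // n_buckets
        hi = 1 + ((i + 1) * n_mid) // n_buckets
        # Third vertex: average of the next bucket, or the last point for the final bucket
        if i + 1 < n_buckets:
            nlo, nhi = hi, 1 + ((i + 2) * n_mid) // n_buckets
        else:
            nlo, nhi = n - 1, n
        cx, cy = x[nlo:nhi].mean(), y[nlo:nhi].mean()
        # Twice the triangle area for every candidate in the current bucket
        bx, by = x[lo:hi], y[lo:hi]
        area = np.abs((x[a] - cx) * (by - y[a]) - (x[a] - bx) * (cy - y[a]))
        a = lo + int(np.argmax(area))
        keep[i + 1] = a
    return keep


# Example: one hour of synthetic 85 Hz data reduced to 10,000 display points
rng = np.random.default_rng(0)
n = 85 * 60 * 60
t = np.arange(n) / 85.0  # seconds since start
z = np.sin(t) + 0.05 * rng.standard_normal(n)
keep = lttb(t, z, 10_000)
t_plot, z_plot = t[keep], z[keep]
```

Because every bucket contributes its most "visually significant" point, short spikes survive downsampling that plain decimation (taking every k-th sample) could skip.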
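If your sensor samples at irregular intervals, resample onto a regular 85 Hz grid:

```python
# Resample to 85 Hz with forward-fill (1/85 s ≈ 11765 microseconds).
# origin="start" anchors the grid at the first sample, so the first row isn't NaN.
df = df.set_index("timestamp")
df = df.resample("11765us", origin="start").ffill()
df = df.reset_index()
```

Forward-fill repeats the most recent reading at each grid point rather than interpolating between readings.

Quick check that the tool can read your file: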
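```python
import pandas as pd

# Read only the first five rows
df = pd.read_hdf("output.h5", "readings", start=0, stop=5)
print(df.columns.tolist())  # should include: timestamp, x, y, z
print(df.dtypes)            # timestamp should be datetime64

# Total row count, taken from the table metadata without loading all the data
with pd.HDFStore("output.h5", mode="r") as store:
    print(store.get_storer("readings").nrows)
```

Approximate row counts and compressed file sizes for typical recordings:

| Duration | Rate | Rows | File Size (compressed) |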
|---|---|---|---|
| 10 min | 85 Hz | ~51K | 5–10 MB |
| 1 hour | 85 Hz | ~306K | 30–60 MB |
| 24 hours | 85 Hz | ~7.3M | 700 MB–1.4 GB |
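The Rows column is simply duration × sampling rate; file sizes also depend on the data and the compression settings, so they can't be derived the same way. A tiny helper (`expected_rows` is a hypothetical name) reproduces the row counts for any rate:

```python
def expected_rows(duration_seconds: float, rate_hz: float) -> int:
    """Number of samples in a recording of the given length at a fixed rate."""
    return round(duration_seconds * rate_hz)


for label, seconds in [("10 min", 10 * 60), ("1 hour", 60 * 60), ("24 hours", 24 * 60 * 60)]:
    print(f"{label}: {expected_rows(seconds, 85):,} rows")
# 10 min: 51,000 rows
# 1 hour: 306,000 rows
# 24 hours: 7,344,000 rows
```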