Preparing Your Data

Manu Murugesan edited this page Mar 13, 2026 · 1 revision

This guide explains how to convert raw accelerometer data into the HDF5 format expected by the tool.

Required Schema

The tool reads HDF5 files with a table named readings containing these columns:

Column       Type      Required
timestamp    datetime  Yes
x            float     Yes
y            float     Yes
z            float     Yes
light        float     No (ignored)
button       float     No (ignored)
temperature  float     No (ignored)

Additional columns are silently ignored.
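
Before writing a file, it can help to check a DataFrame against the schema above. The following is a minimal sketch, assuming the column names and types listed in the table; `check_schema` is a hypothetical helper written for this guide, not part of the tool, and the tool itself may be more lenient about numeric types.

```python
import pandas as pd
from pandas.api import types as ptypes

REQUIRED_AXES = ("x", "y", "z")


def check_schema(df: pd.DataFrame) -> list[str]:
    """Return a list of schema problems; an empty list means the frame looks valid."""
    problems = []
    if "timestamp" not in df.columns:
        problems.append("missing column: timestamp")
    elif not ptypes.is_datetime64_any_dtype(df["timestamp"]):
        problems.append("timestamp is not a datetime column")
    for name in REQUIRED_AXES:
        if name not in df.columns:
            problems.append(f"missing column: {name}")
        elif not ptypes.is_float_dtype(df[name]):
            problems.append(f"{name} is not a float column")
    return problems


# Small illustrative frame (values are made up)
good = pd.DataFrame({
    "timestamp": pd.to_datetime(["2023-03-15 09:30:00", "2023-03-15 09:30:01"]),
    "x": [0.01, 0.02],
    "y": [0.98, 0.97],
    "z": [0.10, 0.11],
    "battery": [3.7, 3.7],  # extra column: not checked, mirroring "silently ignored"
})
print(check_schema(good))                      # empty list: no problems
print(check_schema(good.drop(columns=["z"])))  # reports the missing z column
```

Returning a list rather than raising lets you report every problem in one pass.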

Creating an HDF5 File from a CSV

import pandas as pd

# Load your raw data
df = pd.read_csv("raw_data.csv")

# Rename your columns to the required names
# (adjust the keys on the left to match your CSV header):
df = df.rename(columns={
    "time": "timestamp",
    "accel_x": "x",
    "accel_y": "y",
    "accel_z": "z",
})

# Parse timestamps
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Sort by time
df = df.sort_values("timestamp").reset_index(drop=True)

# Write HDF5 (pandas needs the PyTables package, installed as "tables", for this)
df.to_hdf(
    "output.h5",
    key="readings",
    format="table",
    data_columns=["timestamp"],
    complevel=9,
    complib="zlib",
)

Key points:

  • format="table" is required: the tool runs time-range queries with where clauses, which pandas supports only for the table format, not the default fixed format
  • data_columns=["timestamp"] stores timestamp as an indexed, queryable data column, so where clauses can filter on it quickly
  • complevel=9 and complib="zlib" are optional but recommended; together they enable zlib compression at its highest level
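
To illustrate the kind of time-range query that the table format makes possible, here is a hedged sketch. The tool's actual query is not shown in this guide, so `build_time_where` and `read_window` are hypothetical helpers; only `pd.read_hdf` and its `where` parameter are real pandas API.

```python
import pandas as pd


def build_time_where(start, end) -> str:
    """Build a where expression selecting start <= timestamp < end."""
    start_ts = pd.Timestamp(start)
    end_ts = pd.Timestamp(end)
    if end_ts <= start_ts:
        raise ValueError("end must be later than start")
    return (
        f"(timestamp >= '{start_ts.isoformat()}') & "
        f"(timestamp < '{end_ts.isoformat()}')"
    )


def read_window(path, start, end, key="readings") -> pd.DataFrame:
    """Read only the rows inside the time window (requires PyTables)."""
    return pd.read_hdf(path, key, where=build_time_where(start, end))


print(build_time_where("2023-03-15 09:30:00", "2023-03-15 09:31:00"))
```

With a file written as shown above, `read_window("output.h5", "2023-03-15 09:30:00", "2023-03-15 09:31:00")` would load one minute of data without reading the rest of the file. The same call fails on a fixed-format file, or if timestamp was not listed in data_columns.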

File Naming Convention

Files should be placed in visualize_accelerometry/data/readings/. The naming convention used by the project is:

<subject_id>-<datetime>.h5

For example: 900001-20230315093000.h5

The tool does not enforce this convention (any .h5 filename works), but following it keeps recordings organized.
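
If you generate many files, a small helper keeps the names consistent. This sketch assumes the datetime part is formatted as YYYYMMDDhhmmss, which is inferred from the example filename rather than documented; `build_filename` and `parse_filename` are hypothetical names.

```python
from datetime import datetime
from pathlib import Path

STAMP_FORMAT = "%Y%m%d%H%M%S"  # assumption inferred from 900001-20230315093000.h5


def build_filename(subject_id: str, start: datetime) -> str:
    """Return '<subject_id>-<datetime>.h5' for the given subject and time."""
    return f"{subject_id}-{start.strftime(STAMP_FORMAT)}.h5"


def parse_filename(name: str) -> tuple[str, datetime]:
    """Split a conforming filename back into subject ID and datetime."""
    stem = Path(name).stem
    subject_id, _, stamp = stem.rpartition("-")
    if not subject_id:
        raise ValueError(f"{name!r} does not match <subject_id>-<datetime>.h5")
    return subject_id, datetime.strptime(stamp, STAMP_FORMAT)


print(build_filename("900001", datetime(2023, 3, 15, 9, 30, 0)))
print(parse_filename("900001-20230315093000.h5"))
```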

Sampling Rate

The tool is sampling-rate agnostic. The demo data uses 85 Hz, but any rate works. Higher rates produce more data points, which the tool downsamples to ~10,000 points for display using LTTB (Largest-Triangle-Three-Buckets).
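
For readers unfamiliar with LTTB (Largest-Triangle-Three-Buckets), the sketch below shows the general idea: keep the first and last points, split the rest into buckets, and from each bucket keep the point that forms the largest triangle with the previously kept point and the average of the next bucket. This is a generic NumPy illustration, not the tool's own implementation; `lttb_indices` and the synthetic signal are assumptions made for the example.

```python
import numpy as np


def lttb_indices(x: np.ndarray, y: np.ndarray, n_out: int) -> np.ndarray:
    """Return the indices of the points LTTB keeps when reducing to n_out points."""
    n = len(x)
    if n_out < 3:
        raise ValueError("n_out must be at least 3")
    if n_out >= n:
        return np.arange(n)  # already small enough, keep everything

    every = (n - 2) / (n_out - 2)  # bucket width; first and last points are kept separately
    keep = np.empty(n_out, dtype=np.int64)
    keep[0] = 0
    a = 0  # index of the most recently kept point

    for i in range(n_out - 2):
        lo = int(np.floor(i * every)) + 1                    # current bucket start
        hi = int(np.floor((i + 1) * every)) + 1              # current bucket end (exclusive)
        nxt_hi = min(int(np.floor((i + 2) * every)) + 1, n)  # next bucket end (exclusive)

        # The average of the next bucket is the third triangle vertex
        avg_x = x[hi:nxt_hi].mean()
        avg_y = y[hi:nxt_hi].mean()

        # Twice the triangle area for every candidate in the current bucket
        area = np.abs(
            (x[a] - avg_x) * (y[lo:hi] - y[a])
            - (x[a] - x[lo:hi]) * (avg_y - y[a])
        )
        a = lo + int(np.argmax(area))
        keep[i + 1] = a

    keep[-1] = n - 1
    return keep


# Synthetic 85 Hz signal; x must be numeric (e.g. seconds), not datetime
t = np.arange(100_000) / 85.0
signal = np.sin(2 * np.pi * 0.5 * t)
idx = lttb_indices(t, signal, 10_000)
print(len(idx), idx[0], idx[-1])
```

Because the output size is fixed, the cost of drawing stays roughly constant regardless of the input sampling rate.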

Resampling to a Uniform Rate

If your sensor has irregular sampling intervals:

# Resample to 85 Hz with forward-fill
df = df.set_index("timestamp")
df = df.resample("11765us").ffill()  # 1/85 s ≈ 11,765 µs
df = df.dropna(subset=["x", "y", "z"])  # drop a leading grid point that precedes the first sample, if any
df = df.reset_index()
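
After resampling, it is worth confirming that the intervals really are uniform. The sketch below runs the same resampling on synthetic, irregularly sampled data (all values are made up for illustration) and checks the step size. The dropna call is a precaution: the resampling grid can start slightly before the first sample, which leaves that first row with nothing to forward-fill.

```python
import numpy as np
import pandas as pd

# Synthetic, irregularly sampled data (hypothetical values)
rng = np.random.default_rng(0)
gaps_us = rng.integers(9_000, 15_000, size=500)  # jittered intervals in microseconds
offsets = pd.to_timedelta(np.cumsum(gaps_us), unit="us")
df = pd.DataFrame({
    "timestamp": pd.Timestamp("2023-03-15 09:30:00") + offsets,
    "x": rng.normal(size=500),
    "y": rng.normal(size=500),
    "z": rng.normal(size=500),
})

resampled = (
    df.set_index("timestamp")
      .resample("11765us")
      .ffill()
      .dropna(subset=["x", "y", "z"])  # remove a leading row with no earlier sample, if any
      .reset_index()
)

# Every step between consecutive rows should now be identical
steps = resampled["timestamp"].diff().dropna()
print(steps.nunique())  # number of distinct step sizes
print(steps.iloc[0])    # the step size itself
```

Note that forward-fill repeats the last known sample across gaps longer than one step, so long dropouts show up as flat segments rather than missing data.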

Validating Your File

Quick check that the tool can read your file:

import pandas as pd

df = pd.read_hdf("output.h5", "readings", start=0, stop=5)
print(df.columns.tolist())  # Should include: timestamp, x, y, z
print(df.dtypes)            # timestamp should be datetime64
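# Count rows from the table metadata rather than loading the whole file
with pd.HDFStore("output.h5", mode="r") as store:
    print(store.get_storer("readings").nrows)  # Total rows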

Typical File Sizes

Duration  Rate   Rows   File Size (compressed)
10 min    85 Hz  ~51K   5–10 MB
1 hour    85 Hz  ~306K  30–60 MB
24 hours  85 Hz  ~7.3M  700 MB–1.4 GB
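
The row counts in the table are simply duration times sampling rate. A small sketch for estimating rows at other durations or rates (`expected_rows` is a hypothetical helper; file size depends on the data and compression, so it is not estimated here):

```python
def expected_rows(duration_s: float, rate_hz: float = 85.0) -> int:
    """Number of samples in a recording of the given length at a uniform rate."""
    return round(duration_s * rate_hz)


for label, seconds in [("10 min", 600), ("1 hour", 3_600), ("24 hours", 86_400)]:
    print(f"{label}: {expected_rows(seconds):,} rows")
```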
