Preparing Your Data

This guide explains how to convert raw accelerometer data into the HDF5 format expected by the tool.

Required Schema

The tool reads HDF5 files containing a table named `readings` with these columns:

Column	Type	Required
`timestamp`	datetime	Yes
`x`	float	Yes
`y`	float	Yes
`z`	float	Yes
`light`	float	No (ignored)
`button`	float	No (ignored)
`temperature`	float	No (ignored)

Additional columns are silently ignored.
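
A quick pre-flight check can catch schema problems before the file is written. The sketch below is illustrative only: `REQUIRED_COLUMNS` and `missing_required_columns` are hypothetical names, not part of the tool, and the check covers only the column names listed in the table above.

```python
import pandas as pd

# Required column names, taken from the schema table in this guide
REQUIRED_COLUMNS = ["timestamp", "x", "y", "z"]


def missing_required_columns(df: pd.DataFrame) -> list[str]:
    """Return the required columns that are absent from df (hypothetical helper)."""
    return [col for col in REQUIRED_COLUMNS if col not in df.columns]


# Example: a frame that still lacks the z axis
sample = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2023-03-15 09:30:00"]),
        "x": [0.01],
        "y": [-0.02],
        "light": [120.0],  # optional column, ignored by the tool
    }
)
print(missing_required_columns(sample))
```

An empty list means all four required columns are present; extra columns such as `light` do not affect the result.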

Creating an HDF5 File from a CSV

import pandas as pd

# Load your raw data
df = pd.read_csv("raw_data.csv")

# Ensure you have the required columns
# Rename as needed:
df = df.rename(columns={
    "time": "timestamp",
    "accel_x": "x",
    "accel_y": "y",
    "accel_z": "z",
})

# Parse timestamps
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Sort by time
df = df.sort_values("timestamp").reset_index(drop=True)

# Write HDF5
df.to_hdf(
    "output.h5",
    key="readings",
    format="table",
    data_columns=["timestamp"],
    complevel=9,
    complib="zlib",
)

Key points:

`format="table"` is required: the tool uses `where` clauses for time-range queries, which only work with table format (not fixed format).
`data_columns=["timestamp"]` indexes the timestamp column for fast queries.
`complevel=9` and `complib="zlib"` are optional but recommended for compression.
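
To illustrate why table format matters, here is a sketch of a time-range read using pandas' own `where` support. It shows the general mechanism only; the tool's actual query code is not shown in this guide. `time_range_where` and `read_time_range` are hypothetical helpers, and reading the file assumes PyTables is installed.

```python
import pandas as pd


def time_range_where(start, end) -> str:
    """Build a where clause selecting start <= timestamp < end (hypothetical helper)."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    return f"timestamp >= '{start}' & timestamp < '{end}'"


def read_time_range(path, start, end, key="readings") -> pd.DataFrame:
    """Read only the rows in [start, end) from a table-format HDF5 file."""
    # Works only if the file was written with format="table" and
    # "timestamp" was listed in data_columns.
    return pd.read_hdf(path, key=key, where=time_range_where(start, end))


print(time_range_where("2023-03-15 09:30", "2023-03-15 09:40"))
```

A call such as `read_time_range("output.h5", "2023-03-15 09:30", "2023-03-15 09:40")` would then load only that ten-minute window rather than the whole file.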

File Naming Convention

Files should be placed in visualize_accelerometry/data/readings/. The naming convention used by the project is:

<subject_id>-<datetime>.h5

For example: 900001-20230315093000.h5

The tool doesn't enforce this convention — any .h5 filename works — but it helps with organization.
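
If you generate many files, a small helper keeps the names consistent. This is a sketch under one assumption: the `<datetime>` part is formatted as `%Y%m%d%H%M%S`, which is inferred from the example above and not confirmed elsewhere in this guide. `build_filename` and `parse_filename` are hypothetical names.

```python
from datetime import datetime
from pathlib import Path

# Assumption: inferred from the example 900001-20230315093000.h5
DATETIME_FORMAT = "%Y%m%d%H%M%S"


def build_filename(subject_id: str, start: datetime) -> str:
    """Return '<subject_id>-<datetime>.h5' for a recording start time."""
    return f"{subject_id}-{start.strftime(DATETIME_FORMAT)}.h5"


def parse_filename(name: str) -> tuple[str, datetime]:
    """Split a file name back into (subject_id, start time)."""
    stem = Path(name).stem
    # Split on the last hyphen so subject IDs containing hyphens still work
    subject_id, _, stamp = stem.rpartition("-")
    return subject_id, datetime.strptime(stamp, DATETIME_FORMAT)


print(build_filename("900001", datetime(2023, 3, 15, 9, 30, 0)))
```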

Sampling Rate

The tool is sampling-rate agnostic. The demo uses 85 Hz, but any rate works. Higher rates produce more data points; for display, the tool downsamples them to ~10,000 points using LTTB (Largest-Triangle-Three-Buckets).
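
If you are unsure what rate your sensor actually recorded at, you can estimate it from the timestamps. The sketch below uses the median gap between consecutive samples, which is one reasonable choice among several; `estimate_rate_hz` is a hypothetical helper, not part of the tool.

```python
import pandas as pd


def estimate_rate_hz(timestamps) -> float:
    """Estimate the sampling rate in Hz from the median gap between samples."""
    ts = pd.to_datetime(pd.Series(timestamps)).sort_values()
    gaps = ts.diff().dropna().dt.total_seconds()
    return 1.0 / gaps.median()


# Example with evenly spaced timestamps, 11765 microseconds apart
stamps = pd.date_range("2023-03-15 09:30:00", periods=100, freq="11765us")
print(round(estimate_rate_hz(stamps), 2))
```

The median is less sensitive than the mean to occasional long gaps, such as a dropped packet or a paused recording.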

Resampling to a Uniform Rate

If your sensor has irregular sampling intervals:

# Resample to 85 Hz with forward-fill
df = df.set_index("timestamp")
df = df.resample("11765us").ffill()  # 1/85 sec ≈ 11765 microseconds
df = df.reset_index()
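
One detail to watch: with the default resampling origin, the first grid point can fall before your first sample, and forward-fill then has nothing to fill it with. The sketch below is one way to avoid that, by anchoring the grid at the first sample with `origin="start"` (available in pandas 1.1 and later). `resample_uniform` is a hypothetical helper and the data is synthetic.

```python
import numpy as np
import pandas as pd


def resample_uniform(df: pd.DataFrame, rate_hz: float = 85.0) -> pd.DataFrame:
    """Forward-fill irregular samples onto a fixed grid that starts at the first sample."""
    period_us = round(1_000_000 / rate_hz)  # 85 Hz -> 11765 microseconds
    return (
        df.drop_duplicates(subset="timestamp")  # forward-fill needs unique timestamps
        .sort_values("timestamp")
        .set_index("timestamp")
        .resample(f"{period_us}us", origin="start")
        .ffill()
        .reset_index()
    )


# Synthetic irregular data, for illustration only
rng = np.random.default_rng(0)
gaps = rng.uniform(0.008, 0.016, size=200)  # seconds between samples
stamps = pd.Timestamp("2023-03-15 09:30:00.004") + pd.to_timedelta(np.cumsum(gaps), unit="s")
raw = pd.DataFrame(
    {
        "timestamp": stamps,
        "x": rng.normal(size=200),
        "y": rng.normal(size=200),
        "z": rng.normal(size=200),
    }
)

uniform = resample_uniform(raw)
print(uniform["timestamp"].diff().dropna().unique())
```

Forward-fill repeats the last known reading, so long gaps in the raw data become flat segments in the output. Interpolation may suit some analyses better.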

Validating Your File

Quick check that the tool can read your file:

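import pandas as pd

# Read only the first five rows to inspect the schema
df = pd.read_hdf("output.h5", "readings", start=0, stop=5)
print(df.columns.tolist())  # Should include: timestamp, x, y, z
print(df.dtypes)            # timestamp should be datetime64

# Count rows from the table metadata instead of loading the whole file
with pd.HDFStore("output.h5", mode="r") as store:
    print(store.get_storer("readings").nrows)  # Total rows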
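
The same checks can be automated on a DataFrame, either before writing or after reading a few rows back. This is a sketch: `validate_readings` is a hypothetical helper, and the sort-order check is an assumption based on the sort step in the conversion example, not a documented requirement of the tool.

```python
import pandas as pd
from pandas.api import types as ptypes


def validate_readings(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = [f"missing column: {c}" for c in ("timestamp", "x", "y", "z") if c not in df.columns]
    if problems:
        return problems
    if not ptypes.is_datetime64_any_dtype(df["timestamp"]):
        problems.append("timestamp is not datetime64")
    for axis in ("x", "y", "z"):
        if not ptypes.is_float_dtype(df[axis]):
            problems.append(f"{axis} is not a float column")
    # Assumption: rows are expected in time order, as in the conversion example
    if not df["timestamp"].is_monotonic_increasing:
        problems.append("timestamps are not sorted")
    return problems


good = pd.DataFrame(
    {
        "timestamp": pd.date_range("2023-03-15 09:30:00", periods=3, freq="11765us"),
        "x": [0.0, 0.1, 0.2],
        "y": [0.0, 0.1, 0.2],
        "z": [1.0, 1.0, 1.0],
    }
)
print(validate_readings(good))
```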

Typical File Sizes

Duration	Rate	Rows	File Size (compressed)
10 min	85 Hz	~51K	5–10 MB
1 hour	85 Hz	~306K	30–60 MB
24 hours	85 Hz	~7.3M	700 MB–1.4 GB
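
The row counts in the table are the recording duration in seconds multiplied by the sampling rate. A minimal sketch for estimating rows at other durations or rates (`expected_rows` is a hypothetical helper):

```python
def expected_rows(duration_seconds: float, rate_hz: float) -> int:
    """Number of samples in a recording of the given length and rate."""
    return round(duration_seconds * rate_hz)


for label, seconds in [("10 min", 600), ("1 hour", 3600), ("24 hours", 86400)]:
    print(label, expected_rows(seconds, 85))
```

File size depends on compression and on how many optional columns you keep, so the sizes in the table are rough guides and not exact figures.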

