# STEDI Step Test — Curated Silver Dataset

This notebook curates raw STEDI sensor data into a clean Silver dataset by aligning continuous device messages with rapid step test windows on device ID and timestamp. Each sensor reading is labeled `step` if its timestamp falls inside a step test window for the same device, and `no_step` otherwise.


## Source Datasets

### device_messages_raw
This table contains continuous ultrasonic sensor readings sent by STEDI devices every few milliseconds. Each record includes a timestamp, device ID, sensor type, and a raw distance measurement stored as a string.

### rapid_step_tests_raw
This table contains step test sessions performed by users. Each record defines a start and stop time window during which real steps occurred for a given device.


In [0]:
%sql
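CREATE OR REPLACE TEMP VIEW device_messages_clean AS
SELECT
  date AS event_time,  -- reading timestamp, renamed for the window join
  device_id,
  sensor_type,
  -- distance is stored as a string; strip the 'cm' unit text, then cast to an integer
  CAST(REGEXP_REPLACE(distance, 'cm', '') AS INT) AS distance_cm,
  'device' AS source_label  -- constant tag marking rows that came from device messages
FROM workspace.bronze.device_messages_raw;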
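
The cast above assumes every `distance` value is a number followed by `cm`. As a minimal sketch of what happens otherwise, the query below runs the same expression over three made-up strings (hypothetical samples, not rows from `device_messages_raw`) and swaps `CAST` for `TRY_CAST`, assuming a runtime recent enough to support it. The two well-formed values come back as 12 and 7, and the malformed one comes back as NULL.

```sql
-- Hypothetical sample strings; the real values in device_messages_raw may differ.
SELECT
  raw_distance,
  -- TRY_CAST returns NULL instead of failing when the remaining text is not a number
  TRY_CAST(REGEXP_REPLACE(raw_distance, 'cm', '') AS INT) AS distance_cm
FROM VALUES ('12cm'), ('7cm'), ('n/a') AS t(raw_distance);
```

With a plain `CAST`, the malformed value either becomes NULL or raises an error, depending on whether ANSI mode (`spark.sql.ansi.enabled`) is on, so `TRY_CAST` may be the safer choice if the raw strings are not guaranteed to be clean.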


In [0]:
%sql
CREATE OR REPLACE TEMP VIEW step_tests_clean AS
SELECT
  device_id,
  start_time,
  stop_time
FROM workspace.bronze.rapid_step_tests_raw;
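
A window can only match readings when both of its bounds are present and `start_time` is not later than `stop_time`. The following check is a sketch that counts windows failing that condition. It assumes `start_time` and `stop_time` are stored in a type that compares chronologically, which the source does not state.

```sql
-- Count step test windows that can never match a reading in a BETWEEN join.
SELECT COUNT(*) AS unusable_windows
FROM step_tests_clean
WHERE start_time IS NULL
   OR stop_time IS NULL
   OR start_time > stop_time;
```

A nonzero count would mean some test sessions silently contribute no `step` labels.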


In [0]:
%sql
CREATE OR REPLACE TEMP VIEW final_df AS
SELECT
  d.event_time,
  d.device_id,
  d.sensor_type,
  d.distance_cm,
  d.source_label,
  s.start_time,
  s.stop_time,
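  -- s.device_id is non-NULL only when the join found a matching window
  CASE
    WHEN s.device_id IS NOT NULL THEN 'step'
    ELSE 'no_step'
  END AS step_label
FROM device_messages_clean d
-- LEFT JOIN keeps every reading, whether or not a window matches
LEFT JOIN step_tests_clean s
  ON d.device_id = s.device_id
 -- BETWEEN is inclusive on both ends
 AND d.event_time BETWEEN s.start_time AND s.stop_time;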
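
To make the labeling rule concrete, here is a small self-contained sketch of the same join over inline rows. The device IDs and timestamps are invented for illustration and do not come from the STEDI tables. The first `dev-1` reading falls inside the window and is labeled `step`. The second `dev-1` reading is outside it, and `dev-2` has no window at all, so both are labeled `no_step`.

```sql
-- Hypothetical sample rows, not taken from the STEDI tables.
WITH readings AS (
  SELECT * FROM VALUES
    ('dev-1', TIMESTAMP '2024-01-01 10:00:05'),
    ('dev-1', TIMESTAMP '2024-01-01 10:05:00'),
    ('dev-2', TIMESTAMP '2024-01-01 10:00:05')
  AS t(device_id, event_time)
),
windows AS (
  SELECT * FROM VALUES
    ('dev-1', TIMESTAMP '2024-01-01 10:00:00', TIMESTAMP '2024-01-01 10:00:30')
  AS t(device_id, start_time, stop_time)
)
SELECT
  r.device_id,
  r.event_time,
  -- Same rule as the notebook: a matched window means the reading is a step
  CASE WHEN w.device_id IS NOT NULL THEN 'step' ELSE 'no_step' END AS step_label
FROM readings r
LEFT JOIN windows w
  ON r.device_id = w.device_id
 AND r.event_time BETWEEN w.start_time AND w.stop_time
ORDER BY r.device_id, r.event_time;
```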


In [0]:
%sql
CREATE OR REPLACE TABLE workspace.silver.curated_step_sensor_data
USING DELTA
AS
SELECT * FROM final_df;


In [0]:
%sql
SELECT COUNT(*) AS row_count
FROM workspace.silver.curated_step_sensor_data;
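
Because the curated table is built with a `LEFT JOIN`, a reading is emitted once per matching window. If two step test windows for the same device overlap, that reading appears more than once. One way to detect this, sketched here, is to compare the curated row count with the raw device message count.

```sql
-- If no reading matched more than one window, the two counts are equal.
SELECT
  (SELECT COUNT(*) FROM workspace.bronze.device_messages_raw) AS source_rows,
  (SELECT COUNT(*) FROM workspace.silver.curated_step_sensor_data) AS curated_rows;
```

If `curated_rows` is larger than `source_rows`, the extra rows are duplicates introduced by overlapping windows.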


In [0]:
%sql
SELECT step_label, COUNT(*) AS count
FROM workspace.silver.curated_step_sensor_data
GROUP BY step_label;


In [0]:
%sql
SELECT *
FROM workspace.silver.curated_step_sensor_data
WHERE step_label NOT IN ('step', 'no_step')
   OR step_label IS NULL
LIMIT 50;
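
The query above confirms that every label is one of the two allowed values. A complementary check, sketched below, confirms that each label agrees with the window columns stored next to it. A `step` row should carry a window that contains its `event_time`, and a `no_step` row should carry no window. Given how the join is written, this count would be expected to be 0.

```sql
-- Count rows whose label contradicts the window columns on the same row.
SELECT COUNT(*) AS inconsistent_rows
FROM workspace.silver.curated_step_sensor_data
WHERE (step_label = 'step'
       AND (start_time IS NULL
            OR stop_time IS NULL
            OR event_time NOT BETWEEN start_time AND stop_time))
   OR (step_label = 'no_step' AND start_time IS NOT NULL);
```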


## Ethics Check

I keep the labeling process fair by assigning step labels strictly from recorded test time windows, rather than from assumptions about a user's behavior. I use device identifiers only for technical alignment, so they do not reveal a user's personal identity.

I make no medical or health claims with this dataset. It is intended solely for movement analysis and machine-learning experiments, not for diagnosis or treatment.


In [0]:
%sql
SELECT * FROM workspace.silver.curated_step_sensor_data LIMIT 20;
