## Step 1 — Bronze Table Exploration

### Table: `workspace.bronze.device_messages_raw`
This table stores raw sensor messages coming directly from STEDI devices. It includes the device identifier (`device_id`), the raw `distance` measurement (stored as a string), the sensor metadata (`sensor_type`), and event timing fields like `timestamp` and `date`. This table is essential for ML feature building because it contains the time-series signal (`distance`) that will be cleaned into numeric `distance_cm` and later labeled as step vs. no_step.

### Table: `workspace.bronze.rapid_step_tests_raw`
This table stores summaries of rapid step test sessions, including which customer performed the test (`customer`), the device used (`device_id`), and the test window boundaries (`start_time`, `stop_time`) plus step timing details (`step_points`, `total_steps`, `test_time`). It helps ML feature building by providing the ground-truth time windows that let us label sensor readings as “step” if a reading falls inside a test session window (and “no_step” otherwise).
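
A minimal exploration sketch for the two Bronze tables described above. It assumes only the table and column names mentioned in this step; the `LIMIT 10` sample size is an arbitrary choice, and each statement may need its own cell if you want to see every result.

```sql
-- List column names and types for each Bronze table
DESCRIBE TABLE workspace.bronze.device_messages_raw;
DESCRIBE TABLE workspace.bronze.rapid_step_tests_raw;

-- Preview a few raw sensor messages (distance is still a string here)
SELECT device_id, sensor_type, distance, timestamp, date
FROM workspace.bronze.device_messages_raw
LIMIT 10;

-- Preview a few step test sessions and their time windows
SELECT customer, device_id, start_time, stop_time, total_steps, test_time
FROM workspace.bronze.rapid_step_tests_raw
LIMIT 10;
```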


## Step 2 — Create Silver Table (labeled_step_test)

This Silver table contains cleaned sensor readings with a machine-learning label (`step_label`).
We convert the raw `distance` string into numeric centimeters (`distance_cm`) and label each
reading as "step" when its timestamp falls inside a rapid step test window
(`start_time` to `stop_time`, inclusive) for the same device; every other reading is labeled "no_step".
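
To illustrate the cleaning rule, the sketch below applies the same expression used in the load query to a few made-up input strings. The sample values are hypothetical; the actual format of `distance` in the Bronze table should be checked before relying on this.

```sql
-- Hypothetical raw values; only the cleaning expression comes from this notebook
SELECT
  raw_distance,
  TRY_CAST(REGEXP_REPLACE(raw_distance, '[^0-9.]', '') AS DOUBLE) AS distance_cm
FROM VALUES
  ('45cm'),      -- letters are stripped, leaving '45'
  ('12.5 cm'),   -- the space and letters are stripped, leaving '12.5'
  ('n/a')        -- nothing numeric remains, so the cast yields NULL
AS samples(raw_distance);
```

Because the pattern also removes a minus sign, a value such as '-3' would become 3; this is harmless only if distances are never negative.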


In [0]:
CREATE SCHEMA IF NOT EXISTS workspace.silver;

In [0]:
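-- OR REPLACE keeps the notebook re-runnable: the table is rebuilt empty before the INSERT below repopulates it
CREATE OR REPLACE TABLE workspace.silver.labeled_step_test (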

  timestamp LONG COMMENT
    'Exact time when the sensor reading was recorded (matches device_messages_raw.timestamp).',

  device_id STRING COMMENT
    'The device that produced the sensor reading. Used to link readings to step tests.',

  sensor_type STRING COMMENT
    'Type of sensor that produced the reading (e.g., ultrasonic).',

  distance_cm DOUBLE COMMENT
    'Clean numeric distance value in centimeters, converted from raw string distance.',

  source STRING COMMENT
    'Indicates original source of the data (device messages).',

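  start_time LONG COMMENT
    'Start time of the matching rapid step test window (NULL when the reading is outside every window).',

  stop_time LONG COMMENT
    'End time of the matching rapid step test window (NULL when the reading is outside every window).',

  testId STRING COMMENT
    'Generated identifier (device_id, start_time, stop_time) that groups readings from the same step test.',

  step_label STRING COMMENT
    'ML label: "step" if the reading falls inside a step test window for the same device, otherwise "no_step".'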

)
USING DELTA;


In [0]:
INSERT INTO workspace.silver.labeled_step_test
WITH dm_clean AS (
  SELECT
    device_id,
    timestamp AS ts_ms,
    sensor_type,
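    message_origin,
    -- Remove every character except digits and the decimal point; TRY_CAST returns NULL if the result is not a number
    TRY_CAST(REGEXP_REPLACE(CAST(distance AS STRING), '[^0-9.]', '') AS DOUBLE) AS distance_cm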
  FROM workspace.bronze.device_messages_raw
),
tests AS (
  SELECT
    device_id,
    start_time,
    stop_time,
    -- Build a testId from the device and window bounds so readings from the same test can be grouped
    CONCAT(device_id, '_', CAST(start_time AS STRING), '_', CAST(stop_time AS STRING)) AS testId
  FROM workspace.bronze.rapid_step_tests_raw
)
SELECT
  dm.ts_ms                               AS timestamp,
  dm.device_id                           AS device_id,
  dm.sensor_type                         AS sensor_type,
  dm.distance_cm                         AS distance_cm,
  dm.message_origin                      AS source,
  t.start_time                           AS start_time,
  t.stop_time                            AS stop_time,
  t.testId                               AS testId,
  CASE
    WHEN t.device_id IS NOT NULL THEN 'step'
    ELSE 'no_step'
  END                                    AS step_label
FROM dm_clean dm
LEFT JOIN tests t
  ON dm.device_id = t.device_id
 AND dm.ts_ms BETWEEN t.start_time AND t.stop_time;


In [0]:
-- Row count per label; row_count avoids using the SQL keyword ROWS as an alias
SELECT step_label, COUNT(*) AS row_count
FROM workspace.silver.labeled_step_test
GROUP BY step_label
ORDER BY row_count DESC;
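
Beyond the label counts, two follow-up checks may be worth running; both are sketches that assume only the Silver columns defined above. The first counts readings whose `distance` could not be parsed (they load as NULL). The second looks for readings that appear more than once among the "step" rows, which the `LEFT JOIN` would produce if test windows for the same device overlap.

```sql
-- Readings whose raw distance could not be converted to a number
SELECT COUNT(*) AS null_distance_rows
FROM workspace.silver.labeled_step_test
WHERE distance_cm IS NULL;

-- Readings that appear more than once among 'step' rows
-- (a count above 1 can mean overlapping test windows for the same device, or duplicate rows in Bronze)
SELECT device_id, timestamp, COUNT(*) AS copies
FROM workspace.silver.labeled_step_test
WHERE step_label = 'step'
GROUP BY device_id, timestamp
HAVING COUNT(*) > 1
ORDER BY copies DESC;
```

If the second query returns rows, the label counts overstate the number of distinct "step" readings, so it may be worth deduplicating before training.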

## Step 3 — Query Silver Table Schema (Unity Catalog)

This query retrieves the column definitions (name, data type, and comment) that Unity Catalog stores
for the Silver table in `information_schema.columns`. The result documents the schema and can be exported for submission.


In [0]:
SELECT
  column_name AS `Column Name`,
  data_type AS `Data Type`,
  comment AS `Comment`
FROM workspace.information_schema.columns
WHERE table_schema = 'silver'
  AND table_name = 'labeled_step_test'
ORDER BY ordinal_position;
