# Phase 1: Data Audit and Cleaning

This notebook runs the data audit and cleaning scripts, then summarizes the data quality and the transformations applied.

## 1. Data Audit

First, we run the data audit script to check for issues like missing timestamps, duplicates, and price spikes.

In [None]:
!python ../scripts/data_audit.py
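
For intuition, the three checks named above can be sketched in a few lines of pandas. This is an illustrative sketch only, not the contents of `data_audit.py`: the function name `audit_prices`, the column names `timestamp` and `close`, the 1-minute grid, and the 10% spike threshold are all assumptions made for the example.

```python
import pandas as pd


def audit_prices(df, ts_col="timestamp", price_col="close",
                 freq="1min", spike_threshold=0.10):
    """Count missing timestamps, duplicate timestamps, and price spikes.

    Column names, frequency, and threshold are hypothetical defaults.
    """
    ts = pd.to_datetime(df[ts_col])

    # Missing timestamps: points on the regular grid that never appear in the data.
    expected = pd.date_range(ts.min(), ts.max(), freq=freq)
    missing = expected.difference(pd.DatetimeIndex(ts))

    # Duplicates: every repeat of a timestamp after its first occurrence.
    duplicates = int(ts.duplicated().sum())

    # Spikes: absolute row-to-row relative change above the threshold.
    jumps = df[price_col].pct_change().abs()
    spikes = int((jumps > spike_threshold).sum())

    return {
        "missing_timestamps": int(len(missing)),
        "duplicate_timestamps": duplicates,
        "price_spikes": spikes,
    }


# Tiny made-up sample: one gap (00:02), one duplicate (00:01), one jump (101 -> 150).
sample = pd.DataFrame({
    "timestamp": [
        "2024-01-01 00:00", "2024-01-01 00:01", "2024-01-01 00:01",
        "2024-01-01 00:03", "2024-01-01 00:04",
    ],
    "close": [100.0, 101.0, 101.0, 150.0, 151.0],
})
result = audit_prices(sample)
print(result)
```

A row-to-row threshold is the simplest possible spike rule; the real script may use a different definition, such as a rolling z-score.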

The audit summary is saved in `artifacts/audit_summary.json`. Let's load it and see the results.

In [None]:
import json

# Load the summary written by data_audit.py
with open('../artifacts/audit_summary.json', 'r') as f:
    audit_summary = json.load(f)

# Pretty-print the summary for inspection
print(json.dumps(audit_summary, indent=4))

## 2. Data Cleaning and Imputation

Next, we run the cleaning and imputation script. This script handles any issues found in the audit and prepares the data for analysis.

In [None]:
!python ../scripts/clean_impute.py
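
To make the idea of cleaning and imputation concrete, here is one common approach for 1-minute bars: drop duplicate timestamps, reindex onto a complete minute grid, forward-fill the gaps, and flag the filled rows. This is a sketch under assumptions, not the logic of `clean_impute.py`: the function name `clean_and_impute`, the `timestamp` column, and the `is_imputed` flag column are hypothetical, and forward-filling is only one of several possible imputation strategies.

```python
import pandas as pd


def clean_and_impute(df, ts_col="timestamp", freq="1min"):
    """Deduplicate, fill timestamp gaps by forward-fill, and flag filled rows.

    `ts_col`, `freq`, and the `is_imputed` column are hypothetical choices.
    """
    df = df.copy()
    df[ts_col] = pd.to_datetime(df[ts_col])

    # Keep the first row for each timestamp and order the rows by time.
    df = (df.drop_duplicates(subset=ts_col, keep="first")
            .set_index(ts_col)
            .sort_index())

    # Build the complete grid and align the data to it; gaps become NaN rows.
    full_index = pd.date_range(df.index.min(), df.index.max(), freq=freq)
    out = df.reindex(full_index)
    out.index.name = ts_col

    # Flag rows that did not exist in the deduplicated input.
    out["is_imputed"] = ~out.index.isin(df.index)

    # Forward-fill every value column, leaving the flag untouched.
    value_cols = [c for c in out.columns if c != "is_imputed"]
    out[value_cols] = out[value_cols].ffill()

    return out.reset_index()


# Made-up input with a duplicate at 00:01 and a gap at 00:02.
raw = pd.DataFrame({
    "timestamp": [
        "2024-01-01 00:00", "2024-01-01 00:01",
        "2024-01-01 00:01", "2024-01-01 00:03",
    ],
    "close": [100.0, 101.0, 101.0, 103.0],
})
cleaned = clean_and_impute(raw)
print(cleaned)
```

Keeping an explicit flag column means later analysis can exclude or down-weight imputed rows instead of treating them as observed prices.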

The cleaned data is saved in `data/cleaned/BTCUSD_1min.cleaned.csv`. Let's look at the first few rows to see the new columns.

In [None]:
import pandas as pd

# Load the cleaned 1-minute data and preview the first five rows
df_cleaned = pd.read_csv('../data/cleaned/BTCUSD_1min.cleaned.csv')
df_cleaned.head()

The imputation log records the changes the cleaning script made. Since our initial data was clean, this log should be empty.

In [None]:
imputation_log = pd.read_csv('../artifacts/imputation_log.csv')
imputation_log
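
One caveat when a log is expected to be empty: if the file contains no header row at all (zero bytes), `pd.read_csv` raises `pandas.errors.EmptyDataError` rather than returning an empty frame. Whether `clean_impute.py` writes a header for an empty log is not something we know here, so the helper below is a defensive sketch; the name `load_imputation_log` is hypothetical.

```python
import os
import tempfile

import pandas as pd
from pandas.errors import EmptyDataError


def load_imputation_log(path):
    """Read a CSV log, returning an empty DataFrame if the file has no content."""
    try:
        return pd.read_csv(path)
    except EmptyDataError:
        # A zero-byte file has no columns to parse; treat it as "no changes logged".
        return pd.DataFrame()


# Demonstrate both cases with temporary files.
with tempfile.TemporaryDirectory() as tmp:
    # Case 1: a completely empty file.
    empty_path = os.path.join(tmp, "empty_log.csv")
    open(empty_path, "w").close()
    empty_log = load_imputation_log(empty_path)

    # Case 2: a header-only file (made-up column names).
    header_path = os.path.join(tmp, "header_only_log.csv")
    with open(header_path, "w") as f:
        f.write("timestamp,column,action\n")
    header_only_log = load_imputation_log(header_path)

print(empty_log.empty, header_only_log.empty)
```

In this notebook, the same helper could be called as `load_imputation_log('../artifacts/imputation_log.csv')`, and checking `.empty` on the result confirms that no imputations were recorded.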