## 1. Load and Explore Metadata Files (train.csv, test.csv, sample_submission.parquet)
First, we load the metadata files using pandas and inspect their contents. The training and test CSV files contain metadata for each ECG record, and the sample submission file shows the required output format for the competition. We will examine the number of records and the columns provided in each:

In [None]:
import pandas as pd

# Load metadata files
train_df = pd.read_csv("/kaggle/input/physionet-ecg-image-digitization/train.csv")
test_df = pd.read_csv("/kaggle/input/physionet-ecg-image-digitization/test.csv")
submission_df = pd.read_parquet("/kaggle/input/physionet-ecg-image-digitization/sample_submission.parquet")

# Print basic information
print(f"Training records: {len(train_df)}")
print(f"Test records: {len(test_df)}")
print("Train columns:", train_df.columns.tolist())
print("Test columns:", test_df.columns.tolist())
print("Sample submission columns:", submission_df.columns.tolist())

# Peek at the first few rows of each
print("\nTrain.csv sample:")
print(train_df.head(3))  # first 3 rows of train metadata
print("\nTest.csv sample:")
print(test_df.head(3))
print("\nSample_submission.parquet sample:")
print(submission_df.head(5))
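
Beyond record counts and column names, it can help to check data types and missing values before relying on a metadata column. The following is a minimal sketch of such a check. The helper name `summarize_metadata` and the toy values are hypothetical and only for illustration; the column names `id`, `fs`, and `sig_len` are the ones used later in this notebook.

```python
import pandas as pd

def summarize_metadata(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing-value count, unique-value count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "n_unique": df.nunique(dropna=True),
    })

# Toy frame standing in for train_df; these values are made up for illustration.
toy_df = pd.DataFrame({
    "id": [101, 102, 103],
    "fs": [500, 500, None],        # one missing sampling frequency
    "sig_len": [5000, 5000, 10000],
})

summary = summarize_metadata(toy_df)
print(summary)
```

On the real data, the same helper could be called as `summarize_metadata(train_df)` and `summarize_metadata(test_df)`.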

## 2. Load and Visualize Sample ECG Time-Series (12-Lead Signals)

Now, let’s load a few example ECG time-series from the training set and plot their 12-lead signals. Each training ECG has a corresponding CSV file (train/<id>/<id>.csv) containing the raw waveform data. This CSV has 12 columns, one per ECG lead (typically labeled I, II, III, aVR, aVL, aVF, V1, V2, V3, V4, V5, V6, the standard 12 leads). We will:
	•	Select a few sample record IDs from train_df.
	•	For each example ID, read its CSV file into a DataFrame.
	•	Print the sampling frequency and signal length from the metadata (to know the time span of the recording).
	•	Plot all 12 lead signals using matplotlib, arranging subplots for clarity.

In [None]:
import matplotlib.pyplot as plt

# Select a few example ECG IDs from the training set (e.g., first 2 records)
example_ids = train_df['id'].iloc[:2].tolist()

for ecg_id in example_ids:
    # Load the raw time-series data for this ECG ID
    filepath = f"/kaggle/input/physionet-ecg-image-digitization/train/{ecg_id}/{ecg_id}.csv"
    ecg_signal = pd.read_csv(filepath)
    
    # Get sampling frequency (fs) and signal length from the metadata row for this ID
    record = train_df['id'] == ecg_id  # boolean mask, computed once and reused
    fs = train_df.loc[record, 'fs'].iloc[0]
    sig_len = train_df.loc[record, 'sig_len'].iloc[0]
    duration = sig_len / fs if fs else 0
    print(f"\nECG ID: {ecg_id} -> Sampling frequency = {fs} Hz, Signal length = {sig_len} samples (~{duration:.1f} sec)")
    print("Lead columns:", list(ecg_signal.columns))
    
    # Plot the 12-lead ECG signals
    fig, axes = plt.subplots(nrows=6, ncols=2, figsize=(12, 8))
    axes = axes.ravel()  # flatten the 6x2 grid to a 1D array for easy iteration
    # zip stops at the 12 available axes, so extra columns (if any) cannot cause an IndexError
    for ax, lead in zip(axes, ecg_signal.columns):
        ax.plot(ecg_signal[lead], color='black')
        ax.set_title(lead)
        ax.set_xlabel("Sample index")
        ax.set_ylabel("Amplitude")
        # Light dashed grid to make amplitudes easier to read
        ax.grid(True, which='both', linestyle='--', linewidth=0.5)
    plt.tight_layout()
    plt.show()
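
The plots above use the sample index on the x-axis. Since the metadata provides the sampling frequency, the index can be converted to seconds by dividing by `fs`. Below is a minimal sketch of that conversion; the helper name `time_axis` and the example numbers (5000 samples at 500 Hz) are assumptions for illustration, not values taken from the dataset.

```python
import numpy as np

def time_axis(n_samples: int, fs: float) -> np.ndarray:
    """Return the time in seconds of each sample for a signal sampled at fs Hz."""
    if fs <= 0:
        raise ValueError("fs must be positive")
    return np.arange(n_samples) / fs

# Hypothetical example: 5000 samples at 500 Hz span just under 10 seconds.
t = time_axis(5000, 500.0)
print(f"first = {t[0]:.3f} s, last = {t[-1]:.3f} s, step = {t[1] - t[0]:.3f} s")
```

In the plotting loop, the x values could then be `time_axis(len(ecg_signal), fs)` with the x-label changed to "Time (s)". This assumes the number of rows in the CSV matches the recording length, which is worth verifying against `sig_len`.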

## 3. Load and Visualize Sample ECG Images

In addition to numeric signals, the dataset provides **ECG images** for each record. These are scans or renderings of the ECG paper printouts. Each training record’s folder contains one or more PNG images of the ECG (multiple images if the ECG spans more than one page), and each test record has at least one ECG image, since the task is to digitize these images. We can use Matplotlib to load and display the images and see how the ECGs look in image form.

*Example of a 12-lead ECG image from the dataset (synthetic data). The 12 leads (I, II, III, aVR, aVL, aVF, V1–V6) are arranged on a standard grid. Time runs along the horizontal axis and voltage along the vertical axis; at the common 25 mm/s paper speed and 10 mm/mV gain, each small (1 mm) grid box represents 40 ms horizontally and 0.1 mV vertically. The printout includes metadata (patient details, recording info) at the top, which in this example is synthetically generated or anonymized. This image illustrates the kind of input (paper ECG format) that needs to be converted back to digital signals.*
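
The grid scale described in the caption gives a simple way to turn pixel distances into physical units. The sketch below works through that arithmetic. It assumes the typical 40 ms and 0.1 mV per small box, square grid boxes, and a known grid spacing in pixels. The function name `pixels_to_physical` and the value of 8 pixels per box are hypothetical; in practice the spacing depends on image resolution and would have to be measured from each image.

```python
MS_PER_SMALL_BOX = 40.0  # horizontal extent of one small box (assumed typical scale)
MV_PER_SMALL_BOX = 0.1   # vertical extent of one small box (assumed typical scale)

def pixels_to_physical(dx_px: float, dy_px: float, px_per_small_box: float):
    """Convert pixel offsets to (milliseconds, millivolts) using the grid spacing."""
    if px_per_small_box <= 0:
        raise ValueError("px_per_small_box must be positive")
    boxes_x = dx_px / px_per_small_box  # number of small boxes horizontally
    boxes_y = dy_px / px_per_small_box  # number of small boxes vertically
    return boxes_x * MS_PER_SMALL_BOX, boxes_y * MV_PER_SMALL_BOX

# Hypothetical measurement: 200 px wide and 80 px tall, with 8 px per small box.
dt_ms, dv_mv = pixels_to_physical(dx_px=200, dy_px=80, px_per_small_box=8)
print(f"{dt_ms:.0f} ms, {dv_mv:.1f} mV")
```

Here 200 px is 25 small boxes (25 × 40 ms = 1000 ms) and 80 px is 10 small boxes (10 × 0.1 mV = 1.0 mV).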

In code, let’s read and display one training ECG image and one test ECG image:

In [None]:
# Choose an example train ID and test ID to visualize images
train_example_id = example_ids[0]            # use one of the earlier example IDs
test_example_id = test_df['id'].iloc[0]      # first test ID (for example)

# Load and display the first page of the train ECG image
train_img_path = f"/kaggle/input/physionet-ecg-image-digitization/train/{train_example_id}/{train_example_id}-0001.png"
train_img = plt.imread(train_img_path)
plt.figure(figsize=(6, 4))
plt.imshow(train_img, cmap='gray')  # cmap only affects single-channel images; it is ignored for RGB(A)
plt.title(f"Train ECG Image - ID: {train_example_id} (page 1)")
plt.axis('off')  # hide axis ticks
plt.show()

# Load and display the test ECG image
test_img_path = f"/kaggle/input/physionet-ecg-image-digitization/test/{test_example_id}.png"
test_img = plt.imread(test_img_path)
plt.figure(figsize=(6, 4))
plt.imshow(test_img, cmap='gray')  # cmap only affects single-channel images; it is ignored for RGB(A)
plt.title(f"Test ECG Image - ID: {test_example_id}")
plt.axis('off')
plt.show()
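
The cell above hard-codes the first page as `<id>-0001.png`. Because a record folder may hold more than one image, it can be safer to discover the PNG files rather than assume their names. Below is a minimal sketch using `pathlib`. The helper name `list_record_images` is hypothetical, and the demo builds a temporary folder with a made-up record ID (`12345`) and empty placeholder files so that it runs without the dataset.

```python
import tempfile
from pathlib import Path

def list_record_images(record_dir):
    """Return the PNG files in a record folder, sorted by file name."""
    return sorted(Path(record_dir).glob("*.png"))

# Demo on a temporary folder; the ID and file names are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    record_dir = Path(tmp) / "12345"
    record_dir.mkdir()
    for name in ("12345-0002.png", "12345-0001.png", "12345.csv"):
        (record_dir / name).touch()  # empty placeholder files
    demo_pages = [p.name for p in list_record_images(record_dir)]

print(demo_pages)  # ['12345-0001.png', '12345-0002.png']
```

On Kaggle, the same helper could be pointed at a training record folder such as `/kaggle/input/physionet-ecg-image-digitization/train/<id>`, and each returned path passed to `plt.imread`.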