# Importing External Data into TorchSig: Bring Your Own Data (BYOD) NumPy Example
This notebook shows how to import externally created data into TorchSig, using a simple dataset stored as NumPy `.npy` files with JSON and CSV metadata.

The example uses the provided `NPYReader`, a subclass of TorchSig's `FileReader`, to load a custom, externally created dataset as a `StaticTorchSigDataset`.

---

In [None]:
import numpy as np
import os, csv, json, math

# TorchSig
from torchsig.utils.file_handlers.npy import NPYReader
from torchsig.datasets.datasets import StaticTorchSigDataset
from torchsig.transforms.transforms import ComplexTo2D

## Step 1: External Data Generation (Creating Synthetic Data Outside the TorchSig Workflow)

If your data already exists somewhere, you can skip to Step 2.

We will write a sample dataset using NumPy's `.npy` format for signal data, CSV for per-sample metadata, and JSON for dataset-level metadata.

### External Synthetic Data and Metadata Generation

In [None]:
# configuration parameters
root = "datasets/byod_npy_example"  # data file top-level folder
seed = 1234567890  # rng seed

os.makedirs(root, exist_ok=True)  # directory for files

Below, we generate some signals (outside of TorchSig).

In [None]:
# Parameters
fs = 1_000_000  # 1 MHz sample-rate (fixed rate)
num_iq_samples = 1024  # I/Q samples per data element (fixed size)
dataset_size = 8  # number of total data elements in dataset
elements_per_file = 2  # number of data elements per .npy file
num_files = math.ceil(dataset_size / elements_per_file)

labels = ["BPSK", "QPSK", "Noise"]  # three arbitrary metadata class labels (strings)
modcod = [0, 1, 2]  # three arbitrary metadata integers
rng = np.random.default_rng(seed)  # random number generator

In [None]:
# Create some external data: non-TorchSig synthetic data along with limited metadata

signals_array = np.empty(
    (dataset_size, num_iq_samples), dtype=np.complex64
)  # during generation, store all data in memory
meta_rows = []  # during generation, store all metadata in memory

t = np.arange(num_iq_samples) / fs  # timesteps

# create dataset
for idx in range(dataset_size):
    label = rng.choice(labels)
    mc = rng.choice(modcod)

    if label == "BPSK":
        bits = rng.integers(0, 2, num_iq_samples)
        sig = (2 * bits - 1) + 0j
    elif label == "QPSK":
        bits = rng.integers(0, 4, num_iq_samples)
        table = {0: 1 + 1j, 1: 1 - 1j, 2: -1 + 1j, 3: -1 - 1j}
        sig = np.vectorize(table.get)(bits)
    else:  # white noise
        sig = (
            rng.normal(size=num_iq_samples) + 1j * rng.normal(size=num_iq_samples)
        ) * 0.1

    sig /= np.sqrt((np.abs(sig) ** 2).mean())  # normalize power for consistency
    signals_array[idx] = sig.astype(np.complex64)

    # add to metadata
    meta_rows.append(dict(index=idx, label=label, modcod=mc, sample_rate=fs))

# write dataset-level metadata to JSON
global_metadata = {
    "size": dataset_size,
    "num_iq_samples": num_iq_samples,
    "num_files": num_files,
    "elements_per_file": elements_per_file,
    "class_labels": labels,
    "sample_rate": fs,
}
with open(os.path.join(root, "info.json"), "w") as f:
    json.dump(global_metadata, f, indent=4)

# write data as multiple .npy files
for i in range(num_files):
    start = i * elements_per_file
    end = min(dataset_size, (i + 1) * elements_per_file)
    chunk = signals_array[start:end]  # slice the chunk for current file
    filename = os.path.join(root, f"data_{i}.npy")
    np.save(filename, chunk)  # save chunk to .npy


# write sample-specific metadata to CSV, one row per data element
# (writeheader() is not called, so the file has no header row)
with open(os.path.join(root, "metadata.csv"), "w", newline="") as csvfile:
    csv.DictWriter(csvfile, fieldnames=meta_rows[0].keys()).writerows(meta_rows)

print(f"Synthetic signals + metadata staged in {root}")
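
Because the CSV above is written with `writerows` only, the file carries no header row, so anything reading it back has to supply the column names itself. The following is a minimal sketch using only the standard library. An in-memory buffer stands in for `metadata.csv`, and the two rows are made-up illustrative values, not output from the generation cell.

```python
import csv
import io

# column order used when the rows were written (index, label, modcod, sample_rate)
fieldnames = ["index", "label", "modcod", "sample_rate"]

# in-memory stand-in for metadata.csv; the two rows are illustrative values only
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writerows(
    [
        dict(index=0, label="BPSK", modcod=2, sample_rate=1_000_000),
        dict(index=1, label="Noise", modcod=0, sample_rate=1_000_000),
    ]
)

# read back: pass fieldnames explicitly because there is no header line
buffer.seek(0)
rows = list(csv.DictReader(buffer, fieldnames=fieldnames))

# csv returns every field as a string, so convert numeric columns by hand
for row in rows:
    row["index"] = int(row["index"])
    row["modcod"] = int(row["modcod"])
    row["sample_rate"] = int(row["sample_rate"])

print(rows[1])
```

If you control the writer and your reader expects it, calling `writeheader()` before `writerows()` would make the file self-describing; whether a given reader tolerates a header row is something to check against that reader's source.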

## Step 2: FileReader
To let TorchSig read your data from disk, either use one of the provided `FileReader` examples or write your own `FileReader` subclass so that TorchSig knows how to handle your data. If you write your own, make sure its implementation calls `super()`. This example uses the provided `NPYReader` class.
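
To make concrete what a reader for this on-disk layout has to do, here is a minimal sketch in plain NumPy. It deliberately does not subclass `FileReader` and is not TorchSig's API: the class name `ChunkedNPYSketch` and its `read` method are hypothetical, and a real reader should follow the `FileReader` interface and the provided `NPYReader` source. The sketch only shows the bookkeeping, namely taking the dataset size from `info.json` and mapping a global index to a file and a row within it.

```python
import json
import os
import tempfile

import numpy as np


class ChunkedNPYSketch:
    """Hypothetical stand-alone reader; NOT a TorchSig FileReader subclass."""

    def __init__(self, root):
        self.root = root
        with open(os.path.join(root, "info.json")) as f:
            info = json.load(f)
        self.size = info["size"]
        self.elements_per_file = info["elements_per_file"]

    def __len__(self):
        return self.size

    def read(self, idx):
        if not 0 <= idx < self.size:
            raise IndexError(f"index {idx} out of range for size {self.size}")
        # global index -> (which file, which row inside that file)
        file_no, row = divmod(idx, self.elements_per_file)
        chunk = np.load(os.path.join(self.root, f"data_{file_no}.npy"))
        return chunk[row]


# tiny self-contained demo in a temporary folder (5 elements, 2 per file,
# so the last file holds a single element)
with tempfile.TemporaryDirectory() as tmp:
    data = (np.arange(5 * 4).reshape(5, 4) * (1 + 1j)).astype(np.complex64)
    for i in range(3):
        np.save(os.path.join(tmp, f"data_{i}.npy"), data[2 * i : 2 * (i + 1)])
    with open(os.path.join(tmp, "info.json"), "w") as f:
        json.dump({"size": 5, "elements_per_file": 2}, f)

    reader = ChunkedNPYSketch(tmp)
    n_elements = len(reader)
    last_matches = bool(np.array_equal(reader.read(4), data[4]))

print(n_elements, last_matches)
```

Loading a whole chunk to return one row keeps the sketch short; for large files, `np.load(..., mmap_mode="r")` avoids reading the full array into memory.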

## Step 3: StaticTorchSigDataset

Use `StaticTorchSigDataset` and a file handler to interface with the dataset.

In [None]:
root = "datasets/byod_npy_example"

custom_dataset = StaticTorchSigDataset(
    file_handler_class=NPYReader, root=root, target_labels=None
)
print(f"Dataset size: {len(custom_dataset)}")

sample = custom_dataset[4]
print(f"Data: {sample.data}")
print(sample)

In [None]:
# transforms can be applied, and target_labels selects the metadata returned as the label
root = "datasets/byod_npy_example"

custom_dataset_2 = StaticTorchSigDataset(
    file_handler_class=NPYReader,
    root=root,
    transforms=[ComplexTo2D()],  # transform complex data to 2D format
    target_labels=["modcod"],  # return custom label
)
print(f"Dataset size: {len(custom_dataset_2)}")

data, label = custom_dataset_2[4]
print(f"Data element shape: {data.shape}")
print(f"Label: {label}")
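
For intuition about the `ComplexTo2D` step, a complex I/Q vector can be represented as a real-valued array with separate real and imaginary channels. The sketch below does this in plain NumPy and assumes a channels-first `(2, N)` layout. It does not call TorchSig, so compare against the data element shape printed by the cell above to confirm the layout `ComplexTo2D` actually produces.

```python
import numpy as np

iq = np.array([1 + 2j, 3 - 4j, -5 + 0j], dtype=np.complex64)

# stack real and imaginary parts as two real-valued channels (assumed layout)
two_channel = np.stack([iq.real, iq.imag], axis=0)

# the original complex vector is recoverable from the two channels
restored = two_channel[0] + 1j * two_channel[1]

print(two_channel.shape, two_channel.dtype)
```

This kind of representation is commonly used because many neural network layers operate on real-valued tensors only.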