# Reproducibility and Seeding
This notebook shows how TorchSig handles random seeding to make experiments reproducible.

---

## Narrowband Metadata
As in the Narrowband Dataset example, to create a narrowband dataset you must first define its parameters in NarrowbandMetadata. This can be done either in code or in a YAML file; see `narrowband_example.yaml` for a sample YAML file.

In [None]:
# Define Variables

num_iq_samples_dataset = 4096  # 64^2
fft_size = 64
impairment_level = 0  # clean

In [None]:
from torchsig.datasets.dataset_metadata import NarrowbandMetadata

narrowband_metadata = NarrowbandMetadata(
    num_iq_samples_dataset=num_iq_samples_dataset,  # 64^2
    fft_size=fft_size,
    impairment_level=impairment_level,  # clean
)
print(narrowband_metadata)

## Iterable Dataset
The TorchsigIterableDataset class inherits from PyTorch's IterableDataset and generates synthetic samples at runtime instead of reading them from disk.
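To make "sampling at runtime" concrete, here is a minimal, hypothetical stand-in written with plain NumPy. `ToyIQIterable` is an illustrative name, not part of TorchSig, and it emits complex white noise rather than real modulated signals. It only mirrors the calling pattern used in this notebook, where `next(dataset)` returns a tuple whose element `[0]` is the IQ data.

```python
from typing import Optional

import numpy as np


class ToyIQIterable:
    """Hypothetical synthetic-data iterable (illustration only, not TorchSig's API).

    Each call to next() draws a fresh block of complex IQ samples at runtime.
    """

    def __init__(self, num_iq_samples: int, seed: Optional[int] = None):
        self.num_iq_samples = num_iq_samples
        self.rng = np.random.default_rng(seed)

    def __iter__(self):
        return self

    def __next__(self):
        # Complex baseband noise: independent Gaussian I and Q components.
        i = self.rng.standard_normal(self.num_iq_samples)
        q = self.rng.standard_normal(self.num_iq_samples)
        data = (i + 1j * q).astype(np.complex64)
        label = 0  # placeholder; a real dataset would return signal metadata here
        return data, label


dataset = ToyIQIterable(num_iq_samples=4096)
data, label = next(dataset)
print(data.shape, data.dtype)  # (4096,) complex64
```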

In [None]:
from torchsig.datasets.datasets import TorchsigIterableDataset

Here the dataset is created without a seed, so each print statement below produces a different random signal.

If you run this cell multiple times, or reload this notebook and run it again, it will not reproduce the same signals.
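The same effect can be seen with NumPy's own generators. This sketch assumes, by analogy only, that an unseeded object draws its initial state from operating-system entropy, which is what `np.random.default_rng()` does when called without a seed; it does not show how TorchSig chooses its internal seed.

```python
import numpy as np

# With no seed, default_rng() pulls fresh entropy from the operating system,
# so each generator (and each notebook run) starts from a different state.
rng_a = np.random.default_rng()
rng_b = np.random.default_rng()

first_run = rng_a.standard_normal(4)
second_run = rng_b.standard_normal(4)
print(np.array_equal(first_run, second_run))  # almost surely False
```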

In [None]:
narrowband_dataset = TorchsigIterableDataset(narrowband_metadata)
print(next(narrowband_dataset)[0])
print(next(narrowband_dataset)[0])
print(next(narrowband_dataset)[0])

In [None]:
narrowband_dataset = TorchsigIterableDataset(narrowband_metadata)
print(next(narrowband_dataset)[0])
print(next(narrowband_dataset)[0])
print(next(narrowband_dataset)[0])

## Seeding
All TorchSig Transforms, Datasets, DatasetMetadata objects, and DataLoaders are seedable objects.

This means they all have a .seed(N) method that sets their random seed. If no seed is given, a seedable object will choose its own seed and generate different random numbers every time you run your code.

If you want reproducible experiments, call .seed(N) with an integer N of your choosing. This ensures the same 'random' outcomes occur each time the code is executed.
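A minimal sketch of what a .seed(N) method can look like, assuming the object keeps a private `np.random.Generator` that is rebuilt whenever it is seeded. `SeedableSketch` is a hypothetical class written for illustration; TorchSig's actual implementation may differ.

```python
from typing import Optional

import numpy as np


class SeedableSketch:
    """Hypothetical seedable object (illustration only, not TorchSig's implementation)."""

    def __init__(self, seed: Optional[int] = None):
        self.seed(seed)

    def seed(self, seed: Optional[int] = None) -> None:
        # Rebuild the generator; seed=None falls back to fresh OS entropy.
        self.rng = np.random.default_rng(seed)

    def sample(self, n: int = 4) -> np.ndarray:
        return self.rng.standard_normal(n)


obj = SeedableSketch()
obj.seed(42)
run_1 = obj.sample()

obj.seed(42)  # re-seeding rewinds to the same starting state
run_2 = obj.sample()
print(np.array_equal(run_1, run_2))  # True
```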

You don't need to seed connected objects separately. If a dataset contains several transforms, seeding the dataset is enough to seed all of its transforms correctly.

In general, you will only need to call .seed() on the top level object you are using (typically either a dataset or a data loader).
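One way a parent object can pass its seed down to its children is sketched below. It assumes the parent derives one independent child seed per component with NumPy's `SeedSequence.spawn`; `ToyDataset` and `ToyTransform` are hypothetical names, and TorchSig may use a different derivation scheme internally.

```python
from typing import List, Optional

import numpy as np


class ToyTransform:
    """Hypothetical random transform: applies a random phase rotation."""

    def seed(self, seed_seq: np.random.SeedSequence) -> None:
        self.rng = np.random.default_rng(seed_seq)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        phase = self.rng.uniform(0, 2 * np.pi)
        return x * np.exp(1j * phase)


class ToyDataset:
    """Hypothetical parent object that forwards seeding to its transforms."""

    def __init__(self, transforms: List[ToyTransform]):
        self.transforms = transforms
        self.seed(None)

    def seed(self, seed: Optional[int]) -> None:
        root = np.random.SeedSequence(seed)
        # One child seed for the dataset itself, plus one per transform.
        children = root.spawn(1 + len(self.transforms))
        self.rng = np.random.default_rng(children[0])
        for transform, child in zip(self.transforms, children[1:]):
            transform.seed(child)

    def __next__(self) -> np.ndarray:
        x = self.rng.standard_normal(8) + 1j * self.rng.standard_normal(8)
        for transform in self.transforms:
            x = transform(x)
        return x


def first_sample(seed: int) -> np.ndarray:
    dataset = ToyDataset([ToyTransform(), ToyTransform()])
    dataset.seed(seed)  # only the top-level object is seeded explicitly
    return next(dataset)


print(np.array_equal(first_sample(42), first_sample(42)))  # True
```

Spawning child seeds, rather than passing the same integer to every component, keeps each component's random stream independent while the whole tree stays reproducible from a single number.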

NOTE: Calling numpy.random.seed will not seed TorchSig datasets; they should always be seeded explicitly if a seed is desired.
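The NumPy snippet below illustrates why a global seed does not reach objects that own their own generator: `np.random.seed` only resets NumPy's legacy global `RandomState`, while a `Generator` created by `np.random.default_rng()` keeps independent state. That TorchSig objects hold their own generators in this way is an assumption made for illustration, consistent with the note above.

```python
import numpy as np

# np.random.seed() only resets NumPy's legacy *global* RandomState.
np.random.seed(0)
global_1 = np.random.standard_normal(3)
private_1 = np.random.default_rng().standard_normal(3)  # independent Generator

np.random.seed(0)
global_2 = np.random.standard_normal(3)
private_2 = np.random.default_rng().standard_normal(3)

print(np.array_equal(global_1, global_2))    # True: global state was reset
print(np.array_equal(private_1, private_2))  # almost surely False: unaffected by np.random.seed
```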

### Seeding the Dataset
Here the same dataset from above is seeded, so this code will produce the same random signals every time it is run.

In [None]:
narrowband_dataset = TorchsigIterableDataset(narrowband_metadata)
narrowband_dataset.seed(42)
print(next(narrowband_dataset)[0])
print(next(narrowband_dataset)[0])
print(next(narrowband_dataset)[0])

In [None]:
narrowband_dataset = TorchsigIterableDataset(narrowband_metadata)
narrowband_dataset.seed(42)
print(next(narrowband_dataset)[0])
print(next(narrowband_dataset)[0])
print(next(narrowband_dataset)[0])

### Seeding a DataLoader
With a single worker thread/process, seeding the dataset alone is sufficient to reproduce results correctly.

Since DataLoaders typically use several worker threads/processes, each with its own copy of the dataset, we generally want each worker to seed its copy differently so that its randomly generated data does not duplicate that of the other workers.

This is unnecessary when loading static data from files, but it is needed when data is randomly generated on the fly.
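The problem and one possible fix can be simulated without real worker processes. This NumPy sketch assumes each worker holds a copy of the dataset with its own generator, and derives per-worker seeds with `SeedSequence.spawn`; it is not WorkerSeedingDataLoader's actual seed-derivation scheme, which is not described here. In plain PyTorch, this kind of per-worker setup is usually done in a `worker_init_fn`, which receives the worker id.

```python
import numpy as np

NUM_WORKERS = 2
BASE_SEED = 42

# Naive: every worker's copy of the dataset gets the same seed -> duplicate data.
naive = [np.random.default_rng(BASE_SEED).standard_normal(4) for _ in range(NUM_WORKERS)]

# Per-worker: derive an independent but reproducible child seed for each worker.
children = np.random.SeedSequence(BASE_SEED).spawn(NUM_WORKERS)
per_worker = [np.random.default_rng(child).standard_normal(4) for child in children]

print(np.array_equal(naive[0], naive[1]))            # True: workers duplicate each other
print(np.array_equal(per_worker[0], per_worker[1]))  # False: workers differ
```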

To address this issue, TorchSig provides a WorkerSeedingDataLoader, which seeds a TorchSig dataset differently in each worker.

NOTE: WorkerSeedingDataLoader uses its own worker init function and is not compatible with other custom worker init functions. The exact data generated also depends on the worker configuration, so it will not produce the same data with different worker counts.
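Why the worker count matters can be sketched with a small simulation. It assumes per-worker seeds spawned from one base seed, that each worker builds whole batches from its own generator, and that the loader collects batches from workers in turn (worker 0, 1, ..., 0, 1, ...), which is PyTorch's default ordering for iterable datasets. `simulate_loader` is a hypothetical helper, not a TorchSig function.

```python
import numpy as np


def simulate_loader(base_seed: int, num_workers: int, num_batches: int, batch_size: int = 4) -> np.ndarray:
    """Hypothetical simulation of per-worker seeding plus round-robin batch order."""
    children = np.random.SeedSequence(base_seed).spawn(num_workers)
    worker_rngs = [np.random.default_rng(child) for child in children]
    batches = []
    for i in range(num_batches):
        rng = worker_rngs[i % num_workers]  # batches are taken from workers in turn
        batches.append(rng.standard_normal(batch_size))
    return np.stack(batches)


two_a = simulate_loader(42, num_workers=2, num_batches=4)
two_b = simulate_loader(42, num_workers=2, num_batches=4)
three = simulate_loader(42, num_workers=3, num_batches=4)

print(np.array_equal(two_a, two_b))  # True: same seed, same worker count
print(np.array_equal(two_a, three))  # False: the worker count changed the stream
```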


In the code below, we create and seed a WorkerSeedingDataLoader for our dataset.

In [None]:
from torchsig.utils.data_loading import WorkerSeedingDataLoader

In [None]:
narrowband_dataset = TorchsigIterableDataset(narrowband_metadata)
dataloader = WorkerSeedingDataLoader(narrowband_dataset, batch_size=8, num_workers=2)
dataloader.seed(42)
print(next(iter(dataloader))[0])

Because we seed the DataLoader with the same seed value, and both DataLoaders use the same worker count, this code will produce the same batch of signals every time it is run.

In [None]:
narrowband_dataset = TorchsigIterableDataset(narrowband_metadata)
dataloader = WorkerSeedingDataLoader(narrowband_dataset, batch_size=8, num_workers=2)
dataloader.seed(42)
print(next(iter(dataloader))[0])