# Example 01 - Train Online Sig53
This notebook steps through an example of how to use `torchsig` to instantiate a `ModulationsDataset` containing 53 unique modulations and train an EfficientNet-B4 model on it. Examples are generated online, as they are requested, rather than read from a static dataset on disk. The notebook wraps the datasets in dataloaders, defines a PyTorch Lightning module, trains it with a checkpoint callback, and finishes by loading the best checkpoint.

----
### Import Libraries
First, import all the necessary public libraries as well as a few classes from the `torchsig` toolkit.

In [10]:
from torchsig.models.iq_models.efficientnet.efficientnet import efficientnet_b4
from torchsig.datasets.modulations import ModulationsDataset
from pytorch_lightning.callbacks import ModelCheckpoint
from torch.utils.data import DataLoader
from torchsig.datasets import conf
from functools import partial
import torchsig.transforms as ST
import pytorch_lightning as pl
import numpy as np
import random
import torch
import os

----
### Instantiate Modulations Dataset
Next, instantiate the `ModulationsDataset` by passing in the desired classes, a boolean specifying whether to use the class name or index as the label, the desired level of signal impairments/augmentations, the number of IQ samples per example, and the total number of samples. Note that the total number of samples is divided evenly among the classes in the class list (for example, `num_samples=5300` results in 100 samples of each of the 53 modulation classes). The `classes` parameter can be omitted if all classes are desired.
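
The even split is easy to check by hand. The sketch below is plain Python and does not call `torchsig`; the helper name `samples_per_class` is hypothetical, and the floor division is only an assumption about what would happen to a total that is not a multiple of the class count.

```python
def samples_per_class(num_samples: int, num_classes: int = 53) -> int:
    """Number of examples each class receives under an even split."""
    # Floor division is an assumption for totals that do not divide evenly.
    return num_samples // num_classes


print(samples_per_class(5300))  # 100
print(samples_per_class(106))   # 2
```

The second call uses the dataset length of 106 that this notebook reports further down, which works out to two examples per class.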

If all classes are included at `level=0` (clean signals), all signals will occupy roughly half of the returned signal bandwidth except for the FSK and MSK modulations. These two subfamilies do not contain any pulse shaping, and as such are returned at roughly 1/8th occupied bandwidth for the main lobe. At the higher impairment levels, there is a randomized low pass filter applied at the 8x oversampled rate to suppress the sidelobes prior to downsampling to roughly the same half bandwidth target as the remaining signals.

Within the OFDM family, there are 12 subclasses pertaining to the number of subcarriers present within the OFDM signal. These subcarriers are the powers of 2 from 64 to 2048 as well as the LTE specifications values of 72, 180, 300, 600, 900, and 1200. The DC subcarrier is randomly on or off throughout all subcarrier counts. The subcarrier modulations are divided into two categories: 

1. randomly select a single modulation from the list: `bpsk`, `qpsk`, `16qam`, `64qam`, `256qam`, and `1024qam` and modulate all subcarriers with the random selection; and 

2. randomly select a modulation from the same list for each subcarrier independently. 

The subcarrier modulations are not included in any of the labels for future classification tasks. In addition to these randomizations, the cyclic prefix ratio is randomly selected between the discrete values 1/8 and 1/4, and it is also not included in the labels at this time. As a final randomization for the OFDM signals, one of two sidelobe suppression techniques is selected with equal probability to smooth the discontinuities at the symbol boundaries: 1) apply a window, or 2) apply a low pass filter.
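
The OFDM randomizations described above can be summarized in a short sketch. It is an illustration only, not the `torchsig` implementation: the function name `sample_ofdm_config`, the dictionary keys, and the 50/50 probabilities for the modulation category and the DC subcarrier are all assumptions, since the text does not state how those two choices are weighted.

```python
import random

# Subcarrier counts named in the text: powers of two from 64 to 2048
# plus the LTE values, for 12 OFDM subclasses in total.
POWER_OF_TWO_COUNTS = [2 ** k for k in range(6, 12)]  # 64 ... 2048
LTE_COUNTS = [72, 180, 300, 600, 900, 1200]
SUBCARRIER_COUNTS = sorted(POWER_OF_TWO_COUNTS + LTE_COUNTS)

SUBCARRIER_MODS = ["bpsk", "qpsk", "16qam", "64qam", "256qam", "1024qam"]


def sample_ofdm_config(rng: random.Random) -> dict:
    """Draw one hypothetical set of OFDM parameters (illustration only)."""
    num_subcarriers = rng.choice(SUBCARRIER_COUNTS)
    if rng.random() < 0.5:  # assumed 50/50 split between the two categories
        # Category 1: one modulation shared by every subcarrier.
        mods = [rng.choice(SUBCARRIER_MODS)] * num_subcarriers
    else:
        # Category 2: an independent draw for each subcarrier.
        mods = [rng.choice(SUBCARRIER_MODS) for _ in range(num_subcarriers)]
    return {
        "num_subcarriers": num_subcarriers,
        "dc_subcarrier_on": rng.random() < 0.5,  # assumed probability
        "subcarrier_mods": mods,
        "cyclic_prefix_ratio": rng.choice([1 / 8, 1 / 4]),
        "sidelobe_suppression": rng.choice(["window", "lowpass_filter"]),
    }


config = sample_ofdm_config(random.Random(0))
print(len(SUBCARRIER_COUNTS))  # 12
print(config["num_subcarriers"], config["cyclic_prefix_ratio"])
```

None of the sampled fields other than the subcarrier count would appear in the class label, which matches the note above that the subcarrier modulations and cyclic prefix ratio are unlabeled.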

In [12]:
transform = ST.Compose([
    ST.RandomPhaseShift(phase_offset=(-1, 1)),
    ST.Normalize(norm=np.inf),
    ST.ComplexTo2D(),
])
target_transform = ST.DescToClassIndex(class_list=ModulationsDataset.default_classes)
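
To make the effect of this transform chain concrete, here is a NumPy-only sketch of what the three steps are assumed to do. The function names are hypothetical stand-ins rather than `torchsig` calls, and mapping `phase_offset=(-1, 1)` to a rotation drawn from [-π, π) is an assumption about the library's units.

```python
import numpy as np


def random_phase_shift(iq: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Rotate every sample by one random phase (assumed range [-pi, pi))."""
    phase = rng.uniform(-np.pi, np.pi)
    return iq * np.exp(1j * phase)


def normalize_inf(iq: np.ndarray) -> np.ndarray:
    """Scale so the largest sample magnitude is 1 (infinity norm)."""
    return iq / np.max(np.abs(iq))


def complex_to_2d(iq: np.ndarray) -> np.ndarray:
    """Stack real and imaginary parts into a real array of shape (2, N)."""
    return np.stack([iq.real, iq.imag], axis=0)


rng = np.random.default_rng(0)
iq = rng.standard_normal(4096) + 1j * rng.standard_normal(4096)
out = complex_to_2d(normalize_inf(random_phase_shift(iq, rng)))
print(out.shape)  # (2, 4096)
```

The two-row output is why the data shape printed later in this notebook is `(2, 4096)`: one row for the in-phase component and one for the quadrature component.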


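----
### Choose a Configuration
The configurations below are the ones used to generate the static Sig53 dataset. Using them is optional: the values they supply (`level`, `num_iq_samples`, `num_samples`, `eb_no`, and so on) can also be passed to `ModulationsDataset` directly.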

In [13]:
train_config = conf.Sig53CleanTrainQAConfig
# train_config = conf.Sig53CleanTrainConfig
val_config = conf.Sig53CleanValQAConfig
# val_config = conf.Sig53CleanValConfig

In [15]:
# Seed the dataset instantiation for reproducibility
pl.seed_everything(1234567891)

train_dataset = ModulationsDataset(
    classes=ModulationsDataset.default_classes,
    use_class_idx=train_config.use_class_idx,
    level=train_config.level,
    num_iq_samples=train_config.num_iq_samples,
    num_samples=train_config.num_samples,
    eb_no=train_config.eb_no,
    include_snr=False,
    transform=transform,
    target_transform=target_transform
)

val_dataset = ModulationsDataset(
    classes=ModulationsDataset.default_classes,
    use_class_idx=val_config.use_class_idx,
    level=val_config.level,
    num_iq_samples=val_config.num_iq_samples,
    num_samples=val_config.num_samples,
    eb_no=val_config.eb_no,
    include_snr=False,
    transform=transform,
    target_transform=target_transform
)

idx = np.random.randint(len(train_dataset))
data, modulation = train_dataset[idx]


print("Dataset length: {}".format(len(train_dataset)))
print("Number of classes: {}".format(len(ModulationsDataset.default_classes)))
print("Data shape: {}".format(data.shape))
print("Example modulation: {}".format(modulation))

Global seed set to 1234567891


Dataset length: 106
Number of classes: 53
Data shape: (2, 4096)
Example modulation: 44


In [16]:
def worker_init_fn(worker_id: int, seed: int):
    seed = seed + worker_id
    torch.manual_seed(seed)
    np.random.seed(seed)
    random.seed(seed)

In [17]:
# Create dataloaders
train_dataloader = DataLoader(
    dataset=train_dataset,
    batch_size=16,
    num_workers=8,
    worker_init_fn=partial(worker_init_fn, seed=train_config.seed),
    shuffle=True,
    drop_last=True,
)

val_dataloader = DataLoader(
    dataset=val_dataset,
    batch_size=16,
    num_workers=8,
    worker_init_fn=partial(worker_init_fn, seed=val_config.seed),
    shuffle=False,
    drop_last=True,
)
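
The `partial` wrapper matters because a `DataLoader` calls `worker_init_fn` with the worker id as its only argument, so the base seed has to be bound ahead of time. The sketch below reproduces the pattern with the standard library only; the seed value `1234` is an arbitrary placeholder, and the loop stands in for the worker processes.

```python
import random
from functools import partial


def worker_init_fn(worker_id: int, seed: int) -> None:
    # Offset the base seed so each worker gets its own random stream.
    random.seed(seed + worker_id)


init = partial(worker_init_fn, seed=1234)  # 1234 is an arbitrary example seed

draws = []
for worker_id in range(3):
    init(worker_id)  # a DataLoader worker would call this with only its id
    draws.append(random.random())

# Each worker draws a different first value, and re-seeding reproduces it.
init(0)
print(len(set(draws)), random.random() == draws[0])
```

Without the per-worker offset, every worker would start from the same seed and generate identical examples in parallel.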

In [18]:
model = efficientnet_b4(
    pretrained=True,
    path="efficientnet_b4.pt",
)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

In [19]:
class ExampleModel(pl.LightningModule):
    def __init__(self, model, data_loader, val_data_loader):
        super().__init__()
        self.mdl: torch.nn.Module = model
        self.data_loader: DataLoader = data_loader
        self.val_data_loader: DataLoader = val_data_loader

        # Hyperparameters
        self.lr = 0.001
        self.batch_size = data_loader.batch_size

    def forward(self, x):
        return self.mdl(x)

    def predict(self, x):
        with torch.no_grad():
            out = self.forward(x)
        return out

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)

    def train_dataloader(self):
        return self.data_loader

    def training_step(self, batch, batch_nb):
        x, y = batch
        y = torch.squeeze(y.to(torch.int64))
        loss = torch.nn.functional.cross_entropy(self(x.float()), y)
        return {"loss": loss}

    def val_dataloader(self):
        return self.val_data_loader

    def validation_step(self, batch, batch_nb):
        x, y = batch
        y = torch.squeeze(y.to(torch.int64))
        val_loss = torch.nn.functional.cross_entropy(self(x.float()), y)
        self.log("val_loss", val_loss, prog_bar=True)
        return {"val_loss": val_loss}

In [20]:
example_model = ExampleModel(model, train_dataloader, val_dataloader)

In [21]:
# Setup checkpoint callbacks
checkpoint_filename = "{}/checkpoints/checkpoint".format(os.getcwd())
checkpoint_callback = ModelCheckpoint(
    filename=checkpoint_filename,
    save_top_k=1,  # keep only the single best checkpoint (an int, not a bool)
    monitor="val_loss",
    mode="min",
)

# Create and fit trainer
epochs = 25
trainer = pl.Trainer(
    max_epochs=epochs, callbacks=checkpoint_callback, accelerator="gpu", devices=[0]
)
trainer.fit(example_model)
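
Conceptually, monitoring `val_loss` in `"min"` mode while keeping a single checkpoint amounts to remembering the epoch with the lowest validation loss. The sketch below shows only that selection rule; the helper `best_epoch` is hypothetical, and the loss values are made up for illustration rather than taken from this run.

```python
from typing import Sequence


def best_epoch(val_losses: Sequence[float]) -> int:
    """Index of the epoch that a min-mode, keep-one selection would retain."""
    return min(range(len(val_losses)), key=lambda epoch: val_losses[epoch])


history = [3.91, 3.40, 2.95, 3.10, 2.29, 2.41]  # made-up values, not real results
print(best_epoch(history))  # 4
```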

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name | Type         | Params
--------------------------------------
0 | mdl  | EfficientNet | 17.3 M
--------------------------------------
17.3 M    Trainable params
0         Non-trainable params
17.3 M    Total params
69.085    Total estimated model params size (MB)


                                                                           

Epoch 5: 100%|██████████| 6/6 [00:01<00:00,  3.27it/s, v_num=11, val_loss=2.290]
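
The `val_loss` in the progress bar is the batch-averaged cross-entropy computed in `validation_step`. A useful reference point is the loss of a model with no preference among the 53 classes, which the standard-library sketch below computes for a single example. The function `cross_entropy` here is a hand-written stand-in for illustration, not the PyTorch call.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one example: -log(softmax(logits)[target])."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum_exp - logits[target]


uniform_logits = [0.0] * 53  # equal score for every class
baseline = cross_entropy(uniform_logits, target=44)
print(round(baseline, 3))  # 3.97, i.e. ln(53)
```

The reported `val_loss=2.290` sits below this ln(53) ≈ 3.97 baseline, so the model has learned something from the small QA dataset, although the loss is still far from zero.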

In [None]:
# Load best checkpoint
checkpoint = torch.load(checkpoint_filename+".ckpt", map_location=lambda storage, loc: storage)
example_model.load_state_dict(checkpoint["state_dict"], strict=False)
example_model = example_model.eval()
example_model = example_model.cuda() if torch.cuda.is_available() else example_model