# Generating Dataset for Channel Estimation Training
The first step in any deep learning project is preparing the dataset. In this notebook we create an OFDM communication pipeline and capture the received grid together with the transmitted grid containing DMRS information. We assume a multi-layer MIMO configuration with 2 layers, 16 transmitter antennas, and 4 receiver antennas. In this configuration each slot can carry 4 code-blocks. Each dataset sample is a ``6 x 14 x 540`` tensor and the ground-truth labels are ``4 x 14 x 540`` tensors.
The following picture shows how each dataset sample is created. The communication of each slot generates 4 dataset samples (one for each receiver antenna). In this experiment we are using $N_L = 2$ (number of layers), $L=14$ (number of time symbols), $K=540$ (number of subcarriers, i.e. 45 resource blocks of 12 subcarriers), and $N_r=4$ (number of receiver antennas). With these values each sample has $2(N_L+1)=6$ planes and each label has $2N_L=4$ planes of size $L \times K$.

Please note that since we are using DMRS for channel estimation, the "channel" includes the effect of precoding.

![Data Generation Pipeline](ModelIO.png)
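
To make the note about precoding concrete, here is a minimal NumPy sketch of an "effective" channel and of one way it could be arranged into per-antenna labels. The array shapes, the random stand-in data, and the real/imaginary stacking are assumptions made for illustration; the actual label layout is defined by ``getLabels`` in ``ChEstUtils.py``.

```python
import numpy as np

rng = np.random.default_rng(0)
L, K, Nr, Nt, NL = 14, 540, 4, 16, 2    # Values used in this notebook

# Random stand-ins for the channel matrix and the precoder (assumed shapes)
H = rng.standard_normal((L, K, Nr, Nt)) + 1j * rng.standard_normal((L, K, Nr, Nt))
F = rng.standard_normal((K, Nt, NL)) + 1j * rng.standard_normal((K, Nt, NL))

# Effective channel seen through the DMRS: H·F for every symbol and subcarrier
Heff = H @ F[None, ...]                                     # L x K x Nr x NL

# Hypothetical label layout: one label per receiver antenna, with the real and
# imaginary parts of each layer stacked along the second axis
perRx = np.transpose(Heff, (2, 3, 0, 1))                    # Nr x NL x L x K
labels = np.concatenate([perRx.real, perRx.imag], axis=1)   # Nr x 2*NL x L x K

print(Heff.shape)     # (14, 540, 4, 2)
print(labels.shape)   # (4, 4, 14, 540)
```

Under these assumptions, one slot yields $N_r=4$ labels of size ``4 x 14 x 540``, which matches the label shape quoted above.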

So, let's get started by importing some modules from **NeoRadium**.

In [1]:
import numpy as np
import time, os

from neoradium import Carrier, PDSCH, CdlChannel, AntennaPanel, Grid, random, LdpcEncoder
from ChEstUtils import getRandomPilotInfo, getModelIn, getLabels

The ``makeDataset`` function below runs the pipeline and creates samples with 0 (DMRS only), 1, 2, or 3 successfully decoded code-blocks. Please refer to the ``ChEstUtils.py`` file for the implementation details of the functions ``getRandomPilotInfo``, ``getModelIn``, and ``getLabels``.

In [2]:
def makeDataset(numSlots, snrDbs, seed, pdsch, fileName=None, freqDomain=True):
    bwp = pdsch.bwp                       # The only bandwidth part in the carrier

    # We don't need to use channel coding in the pipeline during dataset generation. However, we need
    # to know the code-block sizes to be able to find the corresponding REs in the grid. The following 
    # code calculates the code-block sizes for each one of 4 code-blocks
    ldpcEncoder = LdpcEncoder(baseGraphNo=1, modulation=pdsch.modems[0].modulation,
                              txLayers=pdsch.numLayers, targetRate=490/1024)
    txGrid = pdsch.getGrid()                                    # Create a resource grid populated with DMRS
    numBits = pdsch.getBitSizes(txGrid)[0]                      # Number of bits available in the resource grid
    txBlockSize = pdsch.getTxBlockSize(ldpcEncoder.targetRate)  # Transport Block Size based on TS 38.214
    txBlock = random.bits(txBlockSize[0])                       # Create random binary data
    rateMatchedCBs = ldpcEncoder.getRateMatchedCodeBlocks(txBlock, numBits, concatCBs=False)
    cbSizes = [len(cb) for cb in rateMatchedCBs]                # Code-block sizes

    # Creating a random CDL channel matrix generator 
    chanGen = CdlChannel.getChanGen(numSlots, bwp,              # Number of channels and bandwidth part
                                    profiles="ABCDE",           # Randomly pick a CDL profile
                                    delaySpread=300,            # 300 ns
                                    ueSpeed=0.5,                # 0.5 m/s ≈ 6.7 Hz Doppler shift at 4 GHz
                                    carrierFreq=4e9,            # Carrier frequency
                                    txAntenna=AntennaPanel([2,4], polarization="x"),  # 16 TX antenna
                                    rxAntenna=AntennaPanel([1,2], polarization="x"),  # 4 RX antenna
                                    seed=seed)
    
    samples, labels = [], []                        # Initialize samples and labels
    t0 = time.time()                                # Start time for time estimation
    random.setSeed(seed)
    print(f"Making dataset for {numSlots:,} slots")
    for s, channelMatrix in enumerate(chanGen):
        txGrid = pdsch.getGrid()                    # Create a resource grid populated with DMRS
        numBits = pdsch.getBitSizes(txGrid)[0]      # Number of bits available in the resource grid
        txBits = random.bits(numBits)               # Create random binary data
        pdsch.populateGrid(txGrid, txBits)          # Map/modulate the data to the resource grid

        # Getting the Precoding Matrix, and precoding the resource grid
        precoder = pdsch.getPrecodingMatrix(channelMatrix)      # Get the precoder matrix from PDSCH object
        perfectChannel = channelMatrix @ precoder[None,...]     # ground truth channel with the effect of precoding
        precodedGrid = txGrid.precode(precoder)                 # Perform the precoding
        
        snrDb = snrDbs[s%len(snrDbs)]               # Get next SNR value

        if freqDomain:
            rxGrid = precodedGrid.applyChannel(channelMatrix)   # Apply the channel in frequency domain
            noisyRxGrid = rxGrid.addNoise(snrDb=snrDb)
        else:
            channel = chanGen.curChan                           # Get the channel model
            txWaveform = precodedGrid.ofdmModulate()            # OFDM Modulation
            maxDelay = channel.getMaxDelay()                    # Get the max. channel delay
            txWaveform = txWaveform.pad(maxDelay)               # Pad with zeros
            rxWaveform = channel.applyToSignal(txWaveform)      # Apply channel in time domain
            noisyRxWaveform = rxWaveform.addNoise(snrDb=snrDb, nFFT=bwp.nFFT)  # Add noise
            offset = channel.getTimingOffset()                  # Get timing info for synchronization
            syncedWaveform = noisyRxWaveform.sync(offset)       # Synchronization
            noisyRxGrid = syncedWaveform.ofdmDemodulate(bwp)    # OFDM demodulation
        
        # pilotIdx contains the indexes in the txGrid with known values. Known values could include DMRS
        # and a number of successfully decoded code-blocks (0..numCBs-1)
        pilotIdx = getRandomPilotInfo(pdsch, txGrid, cbSizes)
        newSamples = getModelIn(pilotIdx, txGrid, noisyRxGrid, 10000)   # rr x 2*(pp+1) x ll x kk
        newLabels = getLabels( perfectChannel )                         # rr x 2*pp x ll x kk

        samples += [ newSamples ]
        labels += [ newLabels ]

        dt = time.time()-t0                                     # Get the duration of time since the beginning
        percentDone = (s+1)*100/numSlots                        # Calculate the percentage of task done

        # Print messages about the progress
        remainTime = int(np.round(100*dt/percentDone-dt))       # Estimated remaining time
        print(f"  {int(percentDone)}% done in {int(np.round(dt)):,} Sec., Remaining time: {remainTime:,} Sec.",
              end='            \r')

    # n: Number of samples in the dataset, pp: Number of ports, ll: Number of OFDM symbols, kk: Number of subcarriers
    samples = np.concatenate(samples, axis=0)       # n x 2*(pp+1) x ll x kk
    labels = np.concatenate(labels, axis=0)         # n x 2*pp x ll x kk

    if fileName is not None:
        np.save(fileName, np.concatenate([samples,labels],axis=1))   # Save the dataset to the specified file
        print(f"\r  Done. ({dt:.2f} Sec.) Saved to \"{fileName}\".                       ")
    else:
        print(f"\r  Done. ({dt:.2f} Sec.)                                                ")

    return samples, labels
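
The loop above picks the SNR for slot ``s`` as ``snrDbs[s%len(snrDbs)]``, so the SNR values are reused in a round-robin fashion. The following sketch (plain NumPy, independent of NeoRadium) shows how evenly this spreads the values over a dataset, using the SNR range and the training-set size from the next cell.

```python
import numpy as np

snrDbs = np.arange(-15, -9, .5)      # 12 values: -15.0, -14.5, ..., -9.5
numSlots = 17500

# SNR used by each slot, following the same indexing as the loop in makeDataset
slotSnrs = snrDbs[np.arange(numSlots) % len(snrDbs)]
values, counts = np.unique(slotSnrs, return_counts=True)

print(len(snrDbs))                   # 12
print(counts.min(), counts.max())    # 1458 1459
```

Since every slot produces one sample per receiver antenna, the 4 samples of a slot share the same SNR.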

In [3]:
carrier = Carrier(numRbs=45, spacing=30)    # Create a carrier with 45 RBs and 30 kHz subcarrier spacing
bwp = carrier.curBwp                        # The only bandwidth part in the carrier

# Create a 2-layer PDSCH object with type-2 DMRS on symbols 2 and 11
pdsch = PDSCH(bwp, numLayers=2, nID=carrier.cellId, modulation="16QAM")
pdsch.setDMRS(configType=2, additionalPos=1)

dataPath = "/data/datasets/SelfRefine/"     # Replace with the location of your data files
os.makedirs(dataPath, exist_ok=True)        # Create data folder if it does not exist
   
snrDbs = np.arange(-15,-9,.5)               # Set the range of SNR values (in dB)

# Create training, validation, and test dataset files (About 30GB of disk space needed):
makeDataset(17500, snrDbs, seed=123, pdsch=pdsch, fileName=os.path.join(dataPath,"Train.npy"))
makeDataset(2500,  snrDbs, seed=456, pdsch=pdsch, fileName=os.path.join(dataPath,"Valid.npy"))
makeDataset(5000,  snrDbs, seed=789, pdsch=pdsch, fileName=os.path.join(dataPath,"Test.npy"));

Making dataset for 17,500 slots
  Done. (975.53 Sec.) Saved to "/data/datasets/SelfRefine/Train.npy".                       
Making dataset for 2,500 slots
  Done. (140.02 Sec.) Saved to "/data/datasets/SelfRefine/Valid.npy".                       
Making dataset for 5,000 slots
  Done. (279.61 Sec.) Saved to "/data/datasets/SelfRefine/Test.npy".                       
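
Each saved file stores samples and labels concatenated along the second axis, so they need to be split again when loading. Below is a minimal sketch of how this could be done. The helper ``splitDataset`` is hypothetical (it is not part of NeoRadium or ``ChEstUtils.py``), the demo file is a small stand-in with the same layout, and the size estimate assumes the arrays are stored as 32-bit floats.

```python
import os, tempfile
import numpy as np

def splitDataset(data, numLayers=2):
    # Hypothetical helper: the first 2*(numLayers+1) planes are the model inputs,
    # the remaining 2*numLayers planes are the labels
    numIn = 2 * (numLayers + 1)
    return data[:, :numIn], data[:, numIn:]

# Tiny stand-in file with the same layout (2 slots x 4 receiver antennas = 8 samples)
demo = np.zeros((8, 10, 14, 540), dtype=np.float32)
path = os.path.join(tempfile.mkdtemp(), "Demo.npy")
np.save(path, demo)

data = np.load(path, mmap_mode="r")      # Memory-map instead of reading the whole file
samples, labels = splitDataset(data)
print(samples.shape, labels.shape)       # (8, 6, 14, 540) (8, 4, 14, 540)

# Rough size estimate for all three files, assuming float32 storage
totalSlots = 17500 + 2500 + 5000
numBytes = totalSlots * 4 * 10 * 14 * 540 * 4
print(f"{numBytes/1e9:.1f} GB")          # 30.2 GB
```

Memory-mapping with ``mmap_mode="r"`` avoids loading a multi-gigabyte file into RAM at once; slices are read from disk only when they are accessed.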
