Philippe Joly 2025-05-25

# Data Size Overview

This notebook explores the size of the ALBATROS data and how much of it can be processed within the MIST machine architecture.

## MIST Machine Specifications
The specifications below come from the [MIST Docs](https://docs.scinet.utoronto.ca/index.php/Mist#Specifications).

Mist is a SciNet-SOSCIP joint GPU cluster consisting of:
- 54 IBM AC922 servers
- Each node has:
    - 32 IBM Power9 cores
    - 256 GB RAM
    - 4 NVIDIA V100-SXM2 32 GB GPUs
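
For the comparisons that follow, it helps to have the memory totals implied by this list. The sketch below only multiplies the figures quoted above; the variable names are illustrative and are not used elsewhere in the notebook.

```python
# Figures taken from the specification list above.
n_nodes = 54
gpus_per_node = 4
gpu_mem_gb = 32     # memory per V100 GPU
node_ram_gb = 256   # host RAM per node

gpu_mem_per_node_gb = gpus_per_node * gpu_mem_gb
gpu_mem_cluster_gb = n_nodes * gpu_mem_per_node_gb

print("GPU memory per node:", gpu_mem_per_node_gb, "GB")
print("GPU memory across the cluster:", gpu_mem_cluster_gb, "GB")
print("Host RAM / GPU memory per node:", node_ram_gb / gpu_mem_per_node_gb)
```

Each node therefore offers 128 GB of GPU memory in total, but any single array has to fit within one GPU's 32 GB unless the work is split across devices.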

In [2]:
import numpy as np
import matplotlib.pyplot as plt

## ALBATROS Data Size 

We first look at the size of the desired output data. Assuming the data volume is conserved through the transforms, the output size is a sufficient figure to compare against GPU RAM.

We can also look at exactly which input data and parameters are needed to produce the desired output.

In [3]:
with np.load("/project/s/sievers/mohanagr/xcorr_axion/xcorr_all_ant_4bit_1721361571_14336_64_15326_0_120.npz") as f:
    data = f['data']
    mask = f['mask']
    missing_fraction = f["missing_fraction"]
    chans = f['chans']

In [4]:
print("Data Shape:", data.shape)
print("Mask Shape:", mask.shape)
print("Missing Fraction Shape:", missing_fraction.shape)
print("Channels Shape", chans.shape)

Data Shape: (2, 2, 7680, 15326)
Mask Shape: (2, 2, 7680, 15326)
Missing Fraction Shape: (1, 15326)
Channels Shape (7680,)


In [25]:
print("Data Size:", data.nbytes*1e-9, "GB") # (4 bytes * 2 [complex]) * 4 [polarisations] * (120*64) [frequency] * 15326 [time]
print("Mask Size:", mask.nbytes*1e-6, "MB")
print("Missing Fraction Size:", missing_fraction.nbytes*1e-3, "kB")
print("Channels Size", chans.nbytes*1e-3, "kB\n")

print("Predicted File Size", (data.nbytes+mask.nbytes+missing_fraction.nbytes+chans.nbytes)*1e-9, "GB")
print("File Size (checked on NIAGARA) 4.237517668 GB")

Data Size: 3.76651776 GB
Mask Size: 470.81471999999997 MB
Missing Fraction Size: 122.608 kB
Channels Size 61.44 kB

Predicted File Size 4.2375165280000004 GB
File Size (checked on NIAGARA) 4.237517668 GB
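
The sizes printed above can be reproduced from the array shapes alone, which makes the per-element cost explicit. This is a minimal sketch: the dtypes are inferred from the reported byte counts (8 bytes per element for `data`, 1 for `mask`, 8 for the other two), not read from the file, and `chans` could equally be a 64-bit float.

```python
import numpy as np

# Shapes as printed above; dtypes are assumptions inferred from the sizes.
arrays = {
    "data": ((2, 2, 7680, 15326), np.complex64),
    "mask": ((2, 2, 7680, 15326), np.bool_),
    "missing_fraction": ((1, 15326), np.float64),
    "chans": ((7680,), np.int64),
}

def nbytes(shape, dtype):
    # Number of elements times bytes per element, without allocating the array.
    return int(np.prod(shape, dtype=np.int64)) * np.dtype(dtype).itemsize

sizes = {name: nbytes(shape, dtype) for name, (shape, dtype) in arrays.items()}
total = sum(sizes.values())
measured = 4237517668  # file size in bytes, as checked on NIAGARA

for name, size in sizes.items():
    print(name, size, "bytes")
print("Total:", total, "bytes")
print("File size minus total:", measured - total, "bytes")
```

The difference of about 1 kB between the file and the summed arrays is consistent with per-array headers and container overhead in an uncompressed `.npz`, though that was not checked directly here.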


In [30]:
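def get_size(band, resolution, time, avg_ws):
    # Estimated size in bytes of the data array: 8 bytes per complex value * 4 polarisations
    # * (band/resolution) frequency channels * (time*df_record/avg_ws) time samples.
    df_record = 125e6/2048
    return 8*4*band/resolution*(time*df_record/avg_ws)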
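
The formula can be sanity-checked against the array loaded earlier. In the sketch below, the arguments are placeholders chosen only so that `band/resolution` gives 7680 frequency channels and `time*df_record/avg_ws` gives 15326 time samples; they are not the actual observation parameters. The last step inverts the formula to estimate how many time samples fit in one GPU, assuming 32 GB means 32e9 bytes and that only the output array occupies memory.

```python
import math

DF_RECORD = 125e6 / 2048  # same constant as in get_size above

def get_size(band, resolution, time, avg_ws):
    # Bytes: 8 per complex value * 4 polarisations * channels * time samples.
    return 8 * 4 * band / resolution * (time * DF_RECORD / avg_ws)

n_freq = 7680    # frequency channels in the loaded file
n_time = 15326   # time samples in the loaded file

# Placeholder arguments that reproduce the channel and sample counts above.
size = get_size(band=n_freq, resolution=1, time=n_time / DF_RECORD, avg_ws=1)
print("Estimated data size:", size * 1e-9, "GB")

# Invert the formula: time samples that fit in one GPU at this channel count.
gpu_bytes = 32e9  # assumption: 32 GB taken as 32e9 bytes
max_time_samples = int(gpu_bytes // (8 * 4 * n_freq))
print("Time samples per 32 GB GPU:", max_time_samples)
```

With these inputs the estimate agrees with the 3.77 GB reported by `data.nbytes`, and a single GPU holds roughly 130,000 time samples at 7680 channels before any working memory for the transforms is accounted for.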

## Limitations