# WSU size of computing estimate numbers (Database)

The goal here is to put together numbers on the types of projects ALMA would process as part of the WSU. These numbers will feed into a total size-of-computing estimate.

Amanda Kepley (20220921)

In [67]:
import numpy as np
import astropy.units as u
from ast import literal_eval
from astropy import constants as const
from matplotlib import pyplot as plt, ticker as mticker
import re
import math
from astropy.table import Table, QTable, vstack, join
from importlib import reload

## TOC

* [Read in Data](#readin)
* [Generate WSU DB](#wsu_db)

## Read in massaged cycle 7 and 8 data <a id="readin"></a>

In [58]:
cycle7tab = Table.read('data/result_table_cycle7_with_calc_values_20220923.csv')
cycle8tab = Table.read('data/result_table_cycle8_with_calc_values_20220923.csv')

Information that we already have:
* FOV
* resolution
* mosaic or not
* frequency
* image linear size
* number of baselines (7m vs. 12m)
* n polarizations

Information that I need to calculate for WSU:
* Bandwidth (16GHz for all receivers)
* channel width (follow procedure in data memo)
* nchan (calculate from bandwidth and channel width)
* dump time (is this just a 7m vs. 12m dichotomy?). It looks like Crystal uses 3.024s for 12m and 10.08s for 7m in her spreadsheet.

The data rate memo calculated the number of channels two ways:
* finest spectral resolution across all spws for each cycle 7 project [the writeup says sbs, but I think they mean spws. Confirm with Crystal?]
* fixed (stepped) resolution:
  * > 10 km/s -> 10 km/s
  * 1-10 km/s -> 1 km/s
  * 0.1-1 km/s -> 0.1 km/s
  * < 0.1 km/s -> 0.1 km/s
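
The stepped scheme can be sketched as a small helper. This is a minimal sketch, not the code in `wsu_db`: the function names `stepped_velocity_resolution` and `nchan_for_bandwidth` are hypothetical, the handling of values exactly on a bin edge (1 and 10 km/s) is an assumption because the memo bins do not specify it, and the conversion to a channel width in frequency assumes the radio Doppler approximation (channel width = frequency x dv / c). Rounding the channel count up is also an assumption.

```python
import math

C_KMS = 299792.458  # speed of light in km/s


def stepped_velocity_resolution(dv_kms):
    """Map a velocity resolution (km/s) onto the stepped bins.

    Assumption: a value exactly on a bin edge goes to the coarser step.
    """
    if dv_kms > 10.0:
        return 10.0
    if dv_kms >= 1.0:
        return 1.0
    # Both the 0.1-1 km/s bin and the < 0.1 km/s bin map to 0.1 km/s.
    return 0.1


def nchan_for_bandwidth(dv_kms, freq_ghz, bandwidth_ghz=16.0):
    """Number of channels needed to cover the bandwidth at resolution dv_kms.

    Assumption: radio Doppler approximation, channel width = freq * dv / c,
    and the channel count is rounded up.
    """
    chan_width_ghz = freq_ghz * dv_kms / C_KMS
    return math.ceil(bandwidth_ghz / chan_width_ghz)


# Example: a project that asked for 3.2 km/s at 230 GHz
dv = stepped_velocity_resolution(3.2)        # 1.0 km/s
nchan = nchan_for_bandwidth(dv, 230.0)       # channels across 16 GHz
```

With the 16 GHz bandwidth listed above, the channel count scales inversely with both the stepped velocity resolution and the observing frequency.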

How am I going to get to a fraction of projects? Using the time estimate doesn't seem right because the time for spectral scans will be drastically shorter. What I can do is just count the number of MOUSes in each category and divide by the total.
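
Counting MOUSes per category and dividing by the total could look like the sketch below. The helper name `mous_fractions` and the toy input are hypothetical; in practice the input would be one entry per MOUS taken from whichever column defines the category.

```python
from collections import Counter


def mous_fractions(categories):
    """Return the fraction of MOUSes in each category.

    `categories` holds one label per MOUS (e.g. array or band).
    """
    counts = Counter(categories)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Toy example with one label per MOUS
arrays = ["12m", "12m", "7m", "12m"]
fractions = mous_fractions(arrays)
```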

Another question is what to do for projects with multiple targets. I could just do the same thing for all targets. That might make sense since you usually want the same spectral setup for all your sources, but you might have a different mosaic size. Can I take the largest? Or the average? That might be easiest.
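
Both options (largest or average image size per MOUS) are cheap to compute side by side. A minimal sketch, assuming the input is a list of `(mous, imsize)` pairs with one pair per target; the function name and the toy identifiers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def imsize_per_mous(rows):
    """Collapse per-target image sizes to one summary per MOUS."""
    grouped = defaultdict(list)
    for mous, imsize in rows:
        grouped[mous].append(imsize)
    return {
        mous: {"max": max(sizes), "mean": mean(sizes)}
        for mous, sizes in grouped.items()
    }


# Toy example: mous_a has two targets with different mosaic sizes
rows = [("mous_a", 1200), ("mous_a", 4800), ("mous_b", 900)]
summary = imsize_per_mous(rows)
```

Taking the maximum gives a conservative (largest) estimate of the imaging load, while the mean is closer to the typical target.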


## Put together WSU MOUS database <a id="wsu_db"></a>

In [59]:
import wsu_db

In [60]:
reload(wsu_db)

<module 'wsu_db' from '/Users/akepley/Dropbox/Support/naasc/WSU/big_cubes/wsu_db.py'>

In [61]:
result = wsu_db.create_database(cycle7tab)

In [62]:
result_c8 = wsu_db.create_database(cycle8tab)

In [63]:
# save the databases if desired
result.write('data/cycle7wsu_20221003.fits',overwrite=True)
result_c8.write('data/cycle8wsu_20221003.fits',overwrite=True)

In [64]:
result.columns

<TableColumns names=('mous','proposal_id','array','nant_typical','nant_array','nant_all','band','ntarget','target_name','s_fov','s_resolution','mosaic','imsize','pb','cell','npol','velocity_resolution_current','wsu_freq','wsu_bandwidth_initial','wsu_bandwidth_final','wsu_bandwidth_spw','wsu_specwidth_finest','wsu_chanavg_finest','wsu_specwidth_stepped','wsu_chanavg_stepped','wsu_specwidth_stepped2','wsu_chanavg_stepped2','tint','nbase_typical','nbase_array','nbase_all','wsu_nchan_final_finest','wsu_nchan_final_stepped','wsu_nchan_final_stepped2','wsu_nchan_spw_finest','wsu_nchan_spw_stepped','wsu_nchan_spw_stepped2','wsu_nchan_initial_finest','wsu_nchan_initial_stepped','wsu_nchan_initial_stepped2','vis_rate_typical_initial_finest','vis_rate_typical_initial_stepped','vis_rate_typical_initial_stepped2','vis_rate_array_initial_finest','vis_rate_array_initial_stepped','vis_rate_array_initial_stepped2','vis_rate_all_initial_finest','vis_rate_all_initial_stepped','vis_rate_all_initial_stepp

In [65]:
len(result)

11519

## Adding in calibration TOS information

This is needed to get the total number of visibilities and the data volume. It is also necessary to start refining the data rates.
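
A rough sketch of how a time on source could turn into a visibility count and a data volume, assuming one visibility per baseline, channel, and polarization product per integration. The function names, the example inputs, and the default of 8 bytes per visibility are assumptions for illustration, not values taken from `wsu_db` or the data rate memo.

```python
def total_visibilities(nbase, nchan, npol, time_on_source_s, tint_s):
    """Visibility count, assuming nbase * nchan * npol per integration."""
    n_integrations = time_on_source_s / tint_s
    return nbase * nchan * npol * n_integrations


def data_volume_gb(nvis, bytes_per_vis=8):
    """Data volume in GB; bytes_per_vis is an assumed placeholder value."""
    return nvis * bytes_per_vis / 1e9


# Illustrative values only: 45 baselines (10 antennas), 1000 channels,
# 2 polarization products, 60.48 s on source with the 10.08 s 7m dump time.
nvis = total_visibilities(nbase=45, nchan=1000, npol=2,
                          time_on_source_s=60.48, tint_s=10.08)
volume = data_volume_gb(nvis)
```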

In [22]:
import large_cubes
from importlib import reload

In [83]:
reload(large_cubes)
tos_db = large_cubes.calc_time_on_source('data/project_mous_band_array_eb_size___source_intent_inttime')

Intent not recognized: BANDPASS DIFFGAIN FLUX PHASE WVR
Intent not recognized: BANDPASS DIFFGAIN FLUX PHASE WVR
Intent not recognized: BANDPASS DIFFGAIN FLUX PHASE WVR
Intent not recognized: DIFFGAIN PHASE WVR
Intent not recognized: DIFFGAIN PHASE WVR
Intent not recognized: DIFFGAIN PHASE WVR
Intent not recognized: DIFFGAIN PHASE WVR
Intent not recognized: DIFFGAIN PHASE WVR
Intent not recognized: BANDPASS DIFFGAIN FLUX PHASE WVR
Intent not recognized: BANDPASS DIFFGAIN FLUX PHASE WVR
Intent not recognized: BANDPASS PHASE WVR
Intent not recognized: BANDPASS PHASE WVR
project_id list greater than 1. This shouldn't happen. MOUS: uid://A002/X445835/X6
made it to table creation


In [84]:
tos_db

proposal_id,mous,band,array,bp_time_s,flux_time_s,phase_time_s,pol_time_s,check_time_s,target_time_s,target_name,target_time_tot_s,ntarget
str14,str22,float64,str6,float64,float64,float64,float64,float64,float64,str35,float64,float64
2019.1.01326.S,uid://A001/X1465/X1002,3.0,7m,1209.6,0.0,362.88,0.0,0.0,60.48,Position_8,60.48,1.0
2019.1.01326.S,uid://A001/X1465/X1008,3.0,7m,604.8,0.0,120.96,0.0,0.0,10.08,Position_2,10.08,1.0
2019.1.01326.S,uid://A001/X1465/X100e,3.0,7m,604.8,0.0,120.96,0.0,0.0,10.08,Position_3,10.08,1.0
2019.1.01326.S,uid://A001/X1465/X1014,3.0,7m,604.8,0.0,120.96,0.0,0.0,10.08,Position_4,10.08,1.0
2019.1.01326.S,uid://A001/X1465/X101a,3.0,7m,604.8,0.0,120.96,0.0,0.0,10.08,Position_5,10.08,1.0
2019.1.01326.S,uid://A001/X1465/X1020,3.0,7m,604.8,0.0,120.96,0.0,0.0,10.08,Position_6,10.08,1.0
2019.1.01326.S,uid://A001/X1465/X1026,3.0,7m,604.8,0.0,120.96,0.0,0.0,10.08,Position_7,10.08,1.0
2019.1.01326.S,uid://A001/X1465/X102c,3.0,7m,604.8,0.0,120.96,0.0,0.0,10.08,Position_7,10.08,1.0
2019.1.01326.S,uid://A001/X1465/X1032,6.0,7m,604.8,0.0,423.36,0.0,0.0,20.16,Position_1-R,20.16,1.0
2019.1.01326.S,uid://A001/X1465/X1038,6.0,7m,604.8,0.0,423.36,0.0,0.0,20.16,Position_1-L,20.16,1.0


In [111]:
len(tos_db)

22430

In [113]:
tos_db.write('data/tos_db.ecsv',overwrite=True)

In [114]:
tos_db.columns

<TableColumns names=('proposal_id','mous','band','array','bp_time_s','flux_time_s','phase_time_s','pol_time_s','check_time_s','target_time_s','target_name','target_time_tot_s','ntarget')>

In [120]:
reload(wsu_db)

<module 'wsu_db' from '/Users/akepley/Dropbox/Support/naasc/WSU/big_cubes/wsu_db.py'>

In [121]:
result_tos = wsu_db.add_tos_to_db(result,tos_db)

In [122]:
result_c8_tos = wsu_db.add_tos_to_db(result_c8,tos_db)

In [123]:
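# overwrite=True so that re-running the cell does not fail on an existing file
result_tos.write('data/result_tos.ecsv', overwrite=True)
result_c8_tos.write('data/result_c8_tos.ecsv', overwrite=True)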

In [124]:
result_tos.columns

<TableColumns names=('mous','proposal_id','array','nant_typical','nant_array','nant_all','band','ntarget','target_name','s_fov','s_resolution','mosaic','imsize','pb','cell','npol','velocity_resolution_current','wsu_freq','wsu_bandwidth_initial','wsu_bandwidth_final','wsu_bandwidth_spw','wsu_specwidth_finest','wsu_chanavg_finest','wsu_specwidth_stepped','wsu_chanavg_stepped','wsu_specwidth_stepped2','wsu_chanavg_stepped2','tint','nbase_typical','nbase_array','nbase_all','wsu_nchan_final_finest','wsu_nchan_final_stepped','wsu_nchan_final_stepped2','wsu_nchan_spw_finest','wsu_nchan_spw_stepped','wsu_nchan_spw_stepped2','wsu_nchan_initial_finest','wsu_nchan_initial_stepped','wsu_nchan_initial_stepped2','vis_rate_typical_initial_finest','vis_rate_typical_initial_stepped','vis_rate_typical_initial_stepped2','vis_rate_array_initial_finest','vis_rate_array_initial_stepped','vis_rate_array_initial_stepped2','vis_rate_all_initial_finest','vis_rate_all_initial_stepped','vis_rate_all_initial_stepp

## Mosaic imsize investigation

In [92]:
idx = (result['mosaic'] == 'T') & (result['imsize'] > 5800)
result['mous','imsize','cell','s_fov','s_resolution','wsu_freq','pb','mosaic'][idx]

mous,imsize,cell,s_fov,s_resolution,wsu_freq,pb,mosaic
,,,deg,arcsec,GHz,,
str32,float64,float64,float64,float64,float64,float64,str32
uid://A001/X1471/X313,8092.5,0.006952670050637,0.0105627840853767,0.0347633502531854,223.75965364178407,26.03952023758145,T
uid://A001/X1471/X317,7715.0,0.007193344668357,0.0104404556643089,0.0359667233417854,227.56311759987636,25.5974979187608,T
uid://A001/X1471/X31b,5870.0,0.0063405378016522,0.0069944225676824,0.0317026890082614,338.9565787793444,17.175818907103793,T


In [None]:
0.01056278408537675 * 3600.0  # s_fov for X313: deg -> arcsec

Image pre-check values for 2019.1.00796.S, uid://A001/X1471/X317:

* beam = 0.0457 x 0.0404 arcsec
* cell = 0.0081 x 0.0081 arcsec

The unmitigated imsize calculated in the pipeline for X317 is [7776, 7776] according to SCG tests.

Eyeballing the spatial setup, it looks like there are 10-12 arcsec between pointings, and the plot says the primary beam is 26.0 arcsec.

The pipeline math is:

npts <= 3
* nxpix = int((1.65 * beam_radius_v + xspread) / cellx_v)

npts > 3
* nxpix = int((1.5 * beam_radius_v + xspread) / cellx_v)

We only have two pointings here.
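
The two branches of the pipeline math can be wrapped in one helper. This is a sketch that only transcribes the formula quoted in this section; the function name `mosaic_nxpix` is hypothetical, and all angular inputs are assumed to be in the same unit (arcsec here). The example uses the values eyeballed for X317: a 26.0 arcsec primary beam, roughly 10 arcsec between the two pointings, and a 0.0081 arcsec cell.

```python
def mosaic_nxpix(beam_radius_arcsec, xspread_arcsec, cell_arcsec, npts):
    """Mosaic image size along x, following the quoted pipeline formula.

    The beam factor is 1.65 for npts <= 3 and 1.5 for npts > 3.
    """
    factor = 1.65 if npts <= 3 else 1.5
    return int((factor * beam_radius_arcsec + xspread_arcsec) / cell_arcsec)


# Two-pointing case for X317 with eyeballed inputs
nxpix = mosaic_nxpix(26.0, 10.0, 0.0081, npts=2)
```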



In [None]:
(26.0 + 10.0)

In [None]:
(1.65 * 26.0 + 10.0)/0.0081

So my estimate is a little on the low end, but not crazy.

In [None]:
(0.01044*3600+25.6*0.70)/0.0072

## Imsize investigation

Something is odd with my image sizes. I'm using 2019.1.01463.S, uid://A001/X1465/Xc05, as my poster child.

For the unmitigated imaging done by the pipeline, the pipeline calculates the following values:
* beam: 0.0322" x 0.0211"
* cell: 0.0042" x 0.0042"
* imsize: [11250, 11250] pixels
* FOV: 47.25 arcsec

Now let's look at what I get from my calculations

In [None]:
# 2019.1.01463.S
idx = result['mous'] == 'uid://A001/X1465/Xc05'
result['mous','s_fov','s_resolution','imsize','wsu_nchan_final_stepped','wsu_nchan_final_finest','mosaic'][idx]

In [None]:
np.log10(237037.03703703705)

In [None]:
np.log10(32921.81069958848)

In [None]:
# the image size on the sky (s_fov, deg -> arcsec) is
0.007157768473981626 * 3600.00  # arcsec

In [None]:
# What's the estimated imsize (in arcsec) at this frequency,
# scaling 19.4 arcsec at 300 GHz as 1/frequency?
freq = 218.821  # GHz
19.4 * 300 / freq

This is comparable to the imsize calculated above.

In [None]:
# What pixel size does this imply for five pixels per beam?
0.024588/5.0

In [None]:
# What pixel size does this imply for six pixels per beam?
0.024588/6.0
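
The same arithmetic as a reusable sketch: the cell size follows from the resolution and a chosen number of pixels per beam, and the image size in pixels follows from the field of view divided by the cell. The function names are hypothetical, and taking imsize as FOV / cell is an assumption about how the estimate is built, not a statement of what `wsu_db` does.

```python
def cell_size(resolution_arcsec, pixels_per_beam=5.0):
    """Pixel size in arcsec for a given number of pixels across the beam."""
    return resolution_arcsec / pixels_per_beam


def imsize_pixels(fov_arcsec, cell_arcsec):
    """Image size in pixels, assuming imsize = FOV / cell."""
    return fov_arcsec / cell_arcsec


resolution = 0.024588               # arcsec, value used in the cells above
fov = 0.007157768473981626 * 3600   # s_fov converted from deg to arcsec

cell5 = cell_size(resolution, 5.0)
cell6 = cell_size(resolution, 6.0)
imsize5 = imsize_pixels(fov, cell5)
```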

What happens if I use the points_per_fov value??

In [None]:
idx2 = cycle7tab['member_ous_uid'] == 'uid://A001/X1465/Xc05'
cycle7tab['proposal_id','member_ous_uid','s_fov','s_resolution','points_per_fov','spw_nchan','is_mosaic'][idx2]

In [None]:
# imsize from points per fov value
np.sqrt(1100957.4775723005)*5.0

Matches imsize above.

So it looks like the FOV is the difference:

In [None]:
(47.25/25.76)*5250

Still an underestimate, but closer.

The pipeline calculates the primary beam as

primary_beam_size = \
            1.22 \
            * cqa.getvalue(cqa.convert(cqa.constants('c'), 'm/s')) \
            / ref_frequency \
            / smallest_diameter \
            * (180.0 * 3600.0 / math.pi)

In [None]:
1.22 * ((const.c.value /  218.821e9) / (12.0) )*(180*3600.0/math.pi)

Pipeline calculation is here:

beam_radius_v = primary_beam

beam_fwhp = 1.12 / 1.22 * beam_radius_v

nxpix = int(utils.round_half_up(1.1 * beam_fwhp * math.sqrt(-math.log(sfpblimit) / math.log(2.)) / cellx_v))
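
Chaining the two quoted pipeline snippets gives a single-field image size estimate in plain Python. This is a sketch under stated assumptions: the function names are hypothetical, `math.floor(x + 0.5)` stands in for the pipeline's `utils.round_half_up`, `sfpblimit=0.2` is an assumed default, and any later adjustment the pipeline makes to the image size is not modeled. The example uses the poster-child values from above (218.821 GHz, 12 m antennas, 0.0042 arcsec cell).

```python
import math

C_M_S = 299792458.0  # speed of light in m/s


def primary_beam_arcsec(freq_hz, diameter_m):
    """Primary beam size, 1.22 * lambda / D, converted to arcsec."""
    return 1.22 * C_M_S / freq_hz / diameter_m * (180.0 * 3600.0 / math.pi)


def single_field_nxpix(freq_hz, diameter_m, cell_arcsec, sfpblimit=0.2):
    """Single-field image size following the quoted pipeline formula."""
    beam_fwhp = 1.12 / 1.22 * primary_beam_arcsec(freq_hz, diameter_m)
    extent = 1.1 * beam_fwhp * math.sqrt(-math.log(sfpblimit) / math.log(2.0))
    # Stand-in for utils.round_half_up (assumption)
    return int(math.floor(extent / cell_arcsec + 0.5))


pb = primary_beam_arcsec(218.821e9, 12.0)
nxpix = single_field_nxpix(218.821e9, 12.0, 0.0042)
```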

In [None]:
(1.12/1.22)* 28.73

In [None]:
1.1 * 26.38 * math.sqrt(-math.log(0.2) / math.log(2.0))

Okay. This is the value I get above. 

What's the constant??

In [None]:
1.1* (1.12/1.22)*math.sqrt(-math.log(0.2) / math.log(2.0))

In [None]:
1.54*25.8

In [None]:
40.0/0.0040