# Estimation of compute needs for Orca-Sound MSDS 2026

November 24, 2025

## Estimate size of Orca-Sound S3 buckets

I tried to get the bucket sizes with the AWS CLI and the AWS console, but the buckets are too large for me to compute their total size that way. The following is an estimate of the size for each hydrophone location.

I estimate each size from how long the hydrophone has been online and the average bit rate of its .ts files in bytes per second: size = time online × bytes per second.

In [4]:
import time

First-online dates (month precision) were taken from inspecting https://open.quiltdata.com/b/audio-orcasound-net/tree/

In [7]:
bush_point = time.strptime("2018-10", "%Y-%m")
andrews_bay = time.strptime("2025-11", "%Y-%m")
mast_center = time.strptime("2023-08", "%Y-%m")
north_sjc = time.strptime("2023-06", "%Y-%m")
orcasound_lab = time.strptime("2018-10", "%Y-%m")
point_robertson = time.strptime("2023-09", "%Y-%m")
port_townsend = time.strptime("2019-09", "%Y-%m")


Duration calculation for each hydrophone

In [8]:
total_bush_point = (time.time() - time.mktime(bush_point))
total_andrews_bay = (time.time() - time.mktime(andrews_bay))
total_mast_center = (time.time() - time.mktime(mast_center))
total_north_sjc = (time.time() - time.mktime(north_sjc))
total_orcasound_lab = (time.time() - time.mktime(orcasound_lab))
total_point_robertson = (time.time() - time.mktime(point_robertson))
total_port_townsend = (time.time() - time.mktime(port_townsend))

Bytes per second, estimated from a typical .ts file size: about 180 kB for a 10-second file, i.e. 18,000 bytes per second.

In [16]:
byte_rate = 18000 # bytes per second
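The 18,000 B/s figure comes from one typical file size. A minimal sketch of the same arithmetic as a reusable helper, assuming (as above) that every .ts segment covers 10 seconds of audio. `byte_rate_from_segments` is a hypothetical name; the segment sizes would have to come from an S3 listing or a download log.

```python
from statistics import mean


def byte_rate_from_segments(segment_sizes_bytes, segment_seconds=10.0):
    """Average bytes per second for a stream of .ts segments.

    Assumes each segment covers `segment_seconds` of audio (10 s per the
    notes above). Segments that failed to download should be left out.
    """
    if not segment_sizes_bytes:
        raise ValueError("need at least one segment size")
    if segment_seconds <= 0:
        raise ValueError("segment_seconds must be positive")
    return mean(segment_sizes_bytes) / segment_seconds


if __name__ == "__main__":
    # Two illustrative 180 kB segments reproduce the 18,000 B/s used above.
    print(byte_rate_from_segments([180_000, 180_000]))
```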

Bucket size for the Bush Point hydrophone, which has about 7 years of data. Port Townsend and Orcasound Lab should be of similar size.

In [17]:
byte_rate * total_bush_point / 1e12  # terabytes (1 TB = 10**12 bytes)

≈ 4.06

In [18]:
bush_point_size = byte_rate * total_bush_point # bytes
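The cells above turn only Bush Point into a byte count. A sketch applying the same estimate to every hydrophone, assuming continuous streaming at 18,000 B/s since the first day of the first-online month and using a fixed reference date (November 24, 2025, the date of these notes) so results do not drift between runs. It reports decimal terabytes (10**12 bytes). `estimate_sizes_tb`, `FIRST_ONLINE`, and `AS_OF` are illustrative names; downtime is ignored, so any outage makes the real size smaller.

```python
from datetime import datetime, timezone

BYTE_RATE = 18_000  # bytes per second, the assumption used above

# First-online months, as read off the Quilt bucket browser.
FIRST_ONLINE = {
    "bush_point": "2018-10",
    "andrews_bay": "2025-11",
    "mast_center": "2023-08",
    "north_sjc": "2023-06",
    "orcasound_lab": "2018-10",
    "point_robertson": "2023-09",
    "port_townsend": "2019-09",
}

# Fixed "now" so the estimate is reproducible.
AS_OF = datetime(2025, 11, 24, tzinfo=timezone.utc)


def estimate_sizes_tb(first_online, byte_rate=BYTE_RATE, as_of=AS_OF):
    """Estimated bucket size in TB (10**12 bytes) for each hydrophone."""
    sizes = {}
    for name, month in first_online.items():
        start = datetime.strptime(month, "%Y-%m").replace(tzinfo=timezone.utc)
        seconds_online = (as_of - start).total_seconds()
        sizes[name] = byte_rate * seconds_online / 1e12
    return sizes


if __name__ == "__main__":
    sizes = estimate_sizes_tb(FIRST_ONLINE)
    for name, tb in sorted(sizes.items(), key=lambda kv: -kv[1]):
        print(f"{name:16s} {tb:6.2f} TB")
    print(f"{'total':16s} {sum(sizes.values()):6.2f} TB")
```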

## Measure RAM used for processing small subset of data

In [1]:
from orcasound_noise.pipeline.pipeline import NoiseAnalysisPipeline
from orcasound_noise.utils import Hydrophone
import datetime as dt
import pandas as pd



Set up the noise analysis pipeline for the Port Townsend hydrophone

In [2]:
if __name__ == '__main__':
    pipeline = NoiseAnalysisPipeline(Hydrophone.PORT_TOWNSEND,
                                     delta_f=1, bands=None,
                                     delta_t=60, mode='safe')



Measure RAM usage and elapsed time for generating the parquet files from the .ts files for a 10-minute subset of data

In [5]:
import psutil
import os

# Get the current process
process = psutil.Process(os.getpid())

# Get initial memory usage (Resident Set Size in MB)
initial_memory_mb = process.memory_info().rss / (1024 * 1024)
print(f"Initial RAM usage: {initial_memory_mb:.2f} MB")

start_time = time.perf_counter()

psd_path, broadband_path = pipeline.generate_parquet_file(dt.datetime(2023, 3, 22, 12, 0), 
                                                          dt.datetime(2023, 3, 22, 12, 10), 
                                                          upload_to_s3=False)

end_time = time.perf_counter()

# Get final memory usage
final_memory_mb = process.memory_info().rss / (1024 * 1024)
print(f"Final RAM usage: {final_memory_mb:.2f} MB")

# Calculate the difference (net change after the call, not peak usage)
memory_increase_mb = final_memory_mb - initial_memory_mb
print(f"RAM usage increase in this cell: {memory_increase_mb:.2f} MB")

# Calculate elapsed time
elapsed_time = end_time - start_time
print(f"Elapsed time: {elapsed_time:.2f} seconds")


Initial RAM usage: 324.39 MB
Found 7541 folders in all for hydrophone
0
1763971220
1679511600
Found 1 folders in date range
Downloading live178.ts
live178.ts: 188kB [00:00, 852kB/s]
[progress output for live179.ts through live238.ts condensed: each segment is 123 kB to 188 kB, except live225.ts, which fails:]
Skipping https://s3-us-west-2.amazonaws.com/audio-orcasound-net/rpi_port_townsend/hls/1679509823/live225.ts : error.
Final RAM usage: 458.02 MB
RAM usage increase in this cell: 133.62 MB
Elapsed time: 61.96 seconds
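The RSS difference above is the net growth left after the call, not the peak reached during it, so it can understate what each worker needs. A sketch that records the peak with the standard-library `tracemalloc` module instead; `peak_python_memory_mb` is a hypothetical helper. tracemalloc only sees allocations made through Python's memory allocators; memory held by child processes or native code that bypasses them is not counted, so treat it as a complement to the psutil numbers rather than a replacement. `reset_peak` needs Python 3.9 or later.

```python
import tracemalloc


def peak_python_memory_mb(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, peak_mb).

    peak_mb is the highest amount of memory traced by tracemalloc while
    fn ran, in MiB (1024 * 1024 bytes, like the RSS numbers above).
    """
    already_tracing = tracemalloc.is_tracing()
    if not already_tracing:
        tracemalloc.start()
    tracemalloc.reset_peak()  # measure the peak for this call only
    try:
        result = fn(*args, **kwargs)
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        if not already_tracing:
            tracemalloc.stop()
    return result, peak_bytes / (1024 * 1024)


if __name__ == "__main__":
    # Allocate ~50 MiB inside the measured call.
    data, peak_mb = peak_python_memory_mb(bytearray, 50 * 1024 * 1024)
    print(f"peak traced: {peak_mb:.1f} MiB for {len(data)} bytes")
```

In the notebook it could wrap the pipeline call directly, e.g. `(psd_path, broadband_path), peak_mb = peak_python_memory_mb(pipeline.generate_parquet_file, dt.datetime(2023, 3, 22, 12, 0), dt.datetime(2023, 3, 22, 12, 10), upload_to_s3=False)`.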


## Calculate cores needed for completion in given time

Rule of thumb for the number of parallel workers (from Claude):

`Required parallel units = (Total sequential time) / (Available time × efficiency factor)`

In [19]:
ten_minute_size = byte_rate * 10 * 60  # bytes in a 10-minute window at the assumed byte rate
processing_time_per_byte = elapsed_time / ten_minute_size  # seconds per byte
total_sequential_time = processing_time_per_byte * bush_point_size  # seconds
print(f"Estimated total sequential processing time for Bush Point: {total_sequential_time / 3600:.2f} hours")

Estimated total sequential processing time for Bush Point: 6472.79 hours


In [20]:
cores = total_sequential_time / ((10 * 24 * 60 * 60) * 0.8)  # 10 days of wall-clock time at 80% efficiency
print(f"Estimated cores needed to process Bush Point in 10 days: {cores:.2f}")

Estimated cores needed to process Bush Point in 10 days: 33.71
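The core count is the rule of thumb above with the available time set to 10 days. A small sketch wrapping it as a function and rounding up, since a fractional core cannot be provisioned; `required_parallel_units` is an illustrative name and the 0.8 efficiency is the same assumption used in the cell.

```python
import math

TEN_DAYS_S = 10 * 24 * 60 * 60  # wall-clock budget in seconds


def required_parallel_units(total_sequential_s, available_s, efficiency=0.8):
    """Required parallel units = total sequential time / (available time * efficiency)."""
    if available_s <= 0:
        raise ValueError("available_s must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return total_sequential_s / (available_s * efficiency)


if __name__ == "__main__":
    # 6472.79 hours is the Bush Point sequential estimate printed above.
    units = required_parallel_units(6472.79 * 3600, TEN_DAYS_S)
    print(f"{units:.2f} cores -> provision {math.ceil(units)}")
```

This prints `33.71 cores -> provision 34`.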


Assumed price: maybe 10 cents per core-hour (a rough guess, not a quoted rate).

In [21]:
cost = cores * (10*24) * 0.10  # cores × 240 hours × assumed $0.10 per core-hour
print(f"Estimated cost to process Bush Point in 10 days: ${cost:.2f}")

Estimated cost to process Bush Point in 10 days: $809.10
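Because the core count scales as 1 / (deadline), the deadline cancels out of cores × hours, so in this model cost ≈ (sequential hours / efficiency) × price. A sketch showing that, under the same assumptions (80% efficiency, a guessed $0.10 per core-hour), a 10-day and a 20-day deadline give the same estimate, and that rounding up to whole cores adds a little. `cost_usd` is a hypothetical helper.

```python
import math


def cost_usd(total_sequential_hours, deadline_hours, efficiency=0.8,
             price_per_core_hour=0.10, whole_cores=False):
    """Estimated cost of finishing the job within deadline_hours.

    cores = total_sequential_hours / (deadline_hours * efficiency); each core
    is billed for the whole deadline at price_per_core_hour (an assumed rate).
    """
    cores = total_sequential_hours / (deadline_hours * efficiency)
    if whole_cores:
        cores = math.ceil(cores)  # cannot rent a fraction of a core
    return cores * deadline_hours * price_per_core_hour


if __name__ == "__main__":
    bush_point_hours = 6472.79  # sequential estimate printed above
    for days in (10, 20):
        print(f"{days} days: ${cost_usd(bush_point_hours, days * 24):.2f}")
    print(f"10 days, whole cores: ${cost_usd(bush_point_hours, 240, whole_cores=True):.2f}")
```

It prints $809.10 for both deadlines and $816.00 when billing 34 whole cores for 10 days; a shorter deadline mainly changes how many cores run at once, not the total core-hours.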
