# 52: Try multithreading for downloads

In [2]:
from herbie.tools import fast_Herbie, fast_Herbie_download, fast_Herbie_xarray
import pandas as pd

In [3]:
# Get the F00-F06 forecasts for each of the runs initialized
# between 00z-06z on January 1, 2022 (7 runs x 7 lead times = 49 Herbie objects)
DATES = pd.date_range("2022-01-01 00:00", "2022-01-01 06:00", freq="1H")
fxx = range(0, 7)
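As a quick sanity check on how many Herbie objects this request implies, here is a minimal sketch that counts the (run time, lead time) pairs using only the standard library. The names `run_times`, `lead_times`, and `pairs` are illustrative and are not part of Herbie; the sketch assumes both endpoints of the 00z-06z range are included, as `pd.date_range` does by default.

```python
from datetime import datetime, timedelta
from itertools import product

# Hourly initialization times from 00z through 06z (both ends included).
start = datetime(2022, 1, 1, 0)
run_times = [start + timedelta(hours=h) for h in range(7)]

# Forecast lead times F00 through F06.
lead_times = range(0, 7)

# One (run time, lead time) pair per Herbie object that would be created.
pairs = list(product(run_times, lead_times))
print(len(run_times), len(pairs))  # 7 49
```

The pairs come out grouped by run time, with lead times increasing within each run, which matches the order of the listing shown for `fast_Herbie`.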

In [4]:
# Create list of Herbie objects for all dates and lead times requested.
HH = fast_Herbie(DATES=DATES, fxx=fxx)
HH

[[HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F00,
 [HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F01,
 [HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F02,
 [HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F03,
 [HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F04,
 [HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F05,
 [HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F06,
 [HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F00,
 [HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F01,
 [HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F02,
 [HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F03,
 [HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F04,
 [HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F05,
 [HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F06,
 [HRRR

In [5]:
# Download many GRIB2 files; subset the files for 2-m temperature
d = fast_Herbie_download(DATES=DATES, fxx=fxx, searchString="TMP:2 m")
d

{'passed': [[HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F00,
  [HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F01,
  [HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F02,
  [HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F03,
  [HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F04,
  [HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F05,
  [HRRR] model [sfc] product run at 2022-Jan-01 00:00 UTC F06,
  [HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F00,
  [HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F01,
  [HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F02,
  [HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F03,
  [HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F04,
  [HRRR] model [sfc] product run at 2022-Jan-01 01:00 UTC F05,
  [HRRR] model [sfc] product run at 2022-Jan-01 0

In [6]:
# Load many files into xarray; subset for 10-m u and v wind.
ds = fast_Herbie_xarray(DATES=DATES, fxx=fxx, searchString="(?:U|V)GRD:10 m")
ds

---

## Multithreading for Herbie Downloads: `H.download()`

In this simple test, downloading many GRIB2 files is slightly faster with multithreading.

On my Windows laptop on a home network:

|Files| MultiThreading (5 threads)| Sequential  | Percent Change |
|-----|---------------------------|-------------|----------------|
|  5  | 1 min 14 sec              | 1 min 25 sec|     -12.94%    |
| 15  | 3 min 27 sec              | 5 min  8 sec|     -32.79%    |


On an HPC Linux system:

|Files| MultiThreading (5 threads)| Sequential  | Percent Change |
|-----|---------------------------|-------------|----------------|
|  5  | 0 min 17 sec              | 0 min 13 sec|     +30%       |
| 15  | 0 min 32 sec              | 0 min 50 sec|     -36%       |
| 48  | 1 min 21 sec              | 2 min 59 sec|    -54.7%      |
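For reference, the "Percent Change" column can be reproduced from the two timing columns. The sketch below assumes the column is the multithreaded time relative to the sequential time, which is consistent with the values in the HPC table; `percent_change` and `to_seconds` are illustrative helper names, not part of Herbie.

```python
def to_seconds(minutes, seconds):
    """Convert a 'M min S sec' timing to seconds."""
    return minutes * 60 + seconds


def percent_change(threaded_sec, sequential_sec):
    """Change of the threaded time relative to the sequential time, in percent.

    Negative values mean multithreading was faster.
    """
    return (threaded_sec - sequential_sec) / sequential_sec * 100


# (files, threaded timing, sequential timing) copied from the HPC table.
hpc_rows = [
    (5, to_seconds(0, 17), to_seconds(0, 13)),
    (15, to_seconds(0, 32), to_seconds(0, 50)),
    (48, to_seconds(1, 21), to_seconds(2, 59)),
]

for n_files, threaded, sequential in hpc_rows:
    print(f"{n_files:>3} files: {percent_change(threaded, sequential):+.1f}%")
```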

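For a small handful of files (fewer than about 10), sequential download can be just as fast or faster: in the 5-file HPC test it beat multithreading, while on the laptop multithreading was only about 13% faster.

There is probably no performance gain for subsetting, because of the compute needed to evaluate the byte ranges. That case would possibly need multiprocessing instead.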
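To illustrate why threads help with I/O-bound work such as downloads, here is a minimal sketch that compares a sequential loop with a 5-thread pool. It does not call Herbie at all: `fake_download` is a hypothetical stand-in that sleeps instead of touching the network, and the 0.05 s delay is an arbitrary assumption, so the timings only show the general pattern, not real download performance.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(file_id, delay=0.05):
    """Hypothetical stand-in for a download: sleeps instead of doing network I/O."""
    time.sleep(delay)
    return file_id


def run_sequential(file_ids):
    """Process the files one after another."""
    return [fake_download(i) for i in file_ids]


def run_threaded(file_ids, max_workers=5):
    """Process the files with a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as exe:
        # Executor.map() yields results in the same order as the inputs.
        return list(exe.map(fake_download, file_ids))


def timed(func, *args):
    """Return (result, elapsed seconds) for one call of func."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


file_ids = list(range(15))
seq_result, seq_time = timed(run_sequential, file_ids)
thr_result, thr_time = timed(run_threaded, file_ids)
print(f"sequential: {seq_time:.2f} s, threaded: {thr_time:.2f} s")
```

Because the stand-in spends all of its time waiting, the threaded version finishes in roughly a fifth of the sequential time here. Work that is dominated by computation rather than waiting would not be expected to benefit in the same way.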

## Multithreading to create Herbie Objects: `Herbie()`
Does multithreading offer a speedup for creating Herbie objects (checking whether a file exists in an archive)?

Yes! Multithreading offers a great speedup for creating many Herbie objects.


On an HPC Linux system:

|Files| MultiThreading (5 threads)| Sequential     | Percent Change |
|-----|---------------------------|----------------|----------------|
| 48  | 0 min 1.67 sec            | 0 min  7.86 sec|    -78.8%      |
| 288 | 0 min 17 sec              | 1 min  26 sec  |    -80.2%      |

In [30]:
%%time
H = [
    Herbie(DATE, model="hrrr", product="sfc", fxx=fxx, verbose=False)
    for DATE in DATES
    for fxx in range(12)
]
len(H)

CPU times: user 4.69 s, sys: 394 ms, total: 5.08 s
Wall time: 1min 26s


288

In [31]:
len(H)

288

In [53]:
%%time

## Use multithreading to create Herbie objects
from concurrent.futures import ThreadPoolExecutor, as_completed, wait

with ThreadPoolExecutor(5) as exe:
    futures = [
        exe.submit(Herbie, DATE, model="hrrr", product="sfc", fxx=fxx, verbose=False)
        for DATE in DATES
        for fxx in range(12)
    ]

    # Return list of Herbie objects in order completed
    # data = [future.result() for future in as_completed(futures)]

    # Wait for every future to finish. wait() returns (done, not_done) as
    # unordered sets, so `data` below is NOT in the order submitted.
    futures, _ = wait(futures)
    data = [future.result() for future in futures]

CPU times: user 4.37 s, sys: 342 ms, total: 4.72 s
Wall time: 16.9 s
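A note on result ordering: `concurrent.futures.wait()` returns its `done` and `not_done` groups as sets, so iterating over them does not follow submission order. The sketch below, which uses a hypothetical `slow_identity` function in place of `Herbie`, shows one way to get results back in the order submitted (iterate over the original list of futures) and one way to get them in the order completed (`as_completed`).

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed, wait


def slow_identity(i):
    """Hypothetical task: wait a random, short time and return the input."""
    time.sleep(random.uniform(0.0, 0.02))
    return i


with ThreadPoolExecutor(5) as exe:
    # Keep the original list; its order is the submission order.
    futures = [exe.submit(slow_identity, i) for i in range(20)]

    # wait() blocks until all futures finish and returns two sets.
    done, not_done = wait(futures)

    # Iterating over the original list preserves submission order.
    in_submission_order = [f.result() for f in futures]

    # as_completed() yields futures as they finish (here all are already done).
    in_completion_order = [f.result() for f in as_completed(futures)]

print(in_submission_order)
```

`Executor.map()` is another option when submission order is all that is needed, since it yields results in input order.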


In [55]:
len(data), data

(288,
 [[HRRR] model [sfc] product run at 2021-Jan-01 08:00 UTC F06,
  [HRRR] model [sfc] product run at 2021-Jan-01 23:00 UTC F11,
  [HRRR] model [sfc] product run at 2021-Jan-01 18:00 UTC F04,
  [HRRR] model [sfc] product run at 2021-Jan-01 12:00 UTC F09,
  [HRRR] model [sfc] product run at 2021-Jan-01 14:00 UTC F02,
  [HRRR] model [sfc] product run at 2021-Jan-01 21:00 UTC F02,
  [HRRR] model [sfc] product run at 2021-Jan-01 04:00 UTC F01,
  [HRRR] model [sfc] product run at 2021-Jan-01 08:00 UTC F07,
  [HRRR] model [sfc] product run at 2021-Jan-01 18:00 UTC F05,
  [HRRR] model [sfc] product run at 2021-Jan-01 00:00 UTC F08,
  [HRRR] model [sfc] product run at 2021-Jan-01 12:00 UTC F10,
  [HRRR] model [sfc] product run at 2021-Jan-01 02:00 UTC F02,
  [HRRR] model [sfc] product run at 2021-Jan-01 05:00 UTC F08,
  [HRRR] model [sfc] product run at 2021-Jan-01 14:00