# Cached download with pansat

This notebook shows how to implement cached downloads of GOES satellite observations with pansat, so that files already present on disk are not downloaded again.

In [1]:
%load_ext autoreload
%autoreload 2

from pathlib import Path
from pansat.download.providers.goes_aws import GOESAWSProvider
from pansat.products.satellite.goes import goes_16_l1b_radiances_c01_conus, goes_16_l1b_radiances_c02_conus


PRODUCTS = [goes_16_l1b_radiances_c01_conus,
            goes_16_l1b_radiances_c02_conus]

def download_cached(start_time,
                    end_time,
                    no_cache=False):
    """
    Download all files from the GOES satellite products in PRODUCTS in a given
    time range but avoid redownloading files that are already present.
    
    Args:
        start_time: datetime object specifying the start of the time range
            for which to download files.
        end_time: datetime object specifying the end of the time range.
        no_cache: If this is set to True, it forces a re-download of files even
             if they are already present.
        
    Returns:
        List of pathlib.Path objects pointing to the available data files
        in the requested time range.
    """
    files = []
    for p in PRODUCTS:
        # Each product stores its files in its own default destination.
        dest = Path(p.default_destination)
        dest.mkdir(parents=True, exist_ok=True)

        provider = GOESAWSProvider(p)
        filenames = provider.get_files_in_range(start_time, end_time)
        for f in filenames:
            print(f)
            path = dest / f
            # Download only if the file is missing or a re-download is forced.
            if no_cache or not path.exists():
                print("Downloading")
                provider.download_file(f, path)
            files.append(path)
    return files
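
The caching logic above does not depend on pansat itself. The minimal sketch below reproduces the same pattern using only the standard library, so it runs without network access. The names `fetch_cached` and `fake_download` are hypothetical and not part of pansat; `fake_download` merely stands in for a provider's download call.

```python
import tempfile
from pathlib import Path


def fetch_cached(filenames, dest, download, no_cache=False):
    """Return local paths for filenames, calling download only when needed."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in filenames:
        path = dest / name
        # Skip the download when the file is already cached locally,
        # unless the caller explicitly forces a re-download.
        if no_cache or not path.exists():
            download(name, path)
        paths.append(path)
    return paths


calls = []


def fake_download(name, path):
    """Hypothetical stand-in for a provider: logs the call, writes a dummy file."""
    calls.append(name)
    path.write_text("dummy content")


with tempfile.TemporaryDirectory() as tmp:
    names = ["a.nc", "b.nc"]
    fetch_cached(names, tmp, fake_download)                 # downloads both files
    fetch_cached(names, tmp, fake_download)                 # served from the cache
    fetch_cached(names, tmp, fake_download, no_cache=True)  # forced re-download
    print(len(calls))  # 4: two initial downloads plus two forced ones
```

Passing the download function as an argument keeps the cache check separate from the transport, which also makes the behavior easy to verify with a stub like the one used here.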
    

# Example

The example below shows the benefit of the cache by requesting the files for an identical time
range twice and comparing the time each call takes.

In [2]:
from datetime import datetime
t_0 = datetime(2021, 2, 11, 10, 0)
t_1 = datetime(2021, 2, 11, 10, 5)

In [3]:
%time files = download_cached(t_0, t_1)

CPU times: user 1.24 s, sys: 30.5 ms, total: 1.27 s
Wall time: 3.62 s


In [4]:
%time files = download_cached(t_0, t_1)

CPU times: user 55.4 ms, sys: 137 µs, total: 55.5 ms
Wall time: 55.2 ms
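
To put the two measurements in proportion, the short calculation below divides the wall time of the first call by that of the second. The values are copied from the outputs above; the variable names are chosen only for this illustration, and the ratio will differ between runs and network conditions.

```python
# Wall times reported by %time above, converted to seconds.
first_call_s = 3.62      # first call: files were downloaded
second_call_s = 55.2e-3  # second call: files were already on disk

speedup = first_call_s / second_call_s
print(f"The cached call was about {speedup:.0f} times faster.")
```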
