In [1]:
import pprint
import datetime
import pyaurorax

aurorax = pyaurorax.PyAuroraX()

# Downloading data

PyAuroraX lets you download data for a given dataset and time frame, optionally filtered by site. A progress bar is shown by default; it can be disabled or customized using optional parameters. The output path of the downloaded data can be changed on the `pyaurorax.PyAuroraX()` object. We show an example of this in the "Changing the download location" section of this crib sheet.

To figure out the dataset name that we want to download data for, we can use the `aurorax.data.list_datasets()` function, or navigate to the [Dataset Descriptions](https://data.phys.ucalgary.ca/about_datasets) page and dive into a particular instrument array page.

Below we are going to download an hour of THEMIS ASI data from the Athabasca, AB, imager. We will use the `THEMIS_ASI_RAW` dataset name, and the `start`, `end`, and `site_uid` parameters to filter further.

In [2]:
# download an hour of THEMIS ASI data from Athabasca
dataset_name = "THEMIS_ASI_RAW"
start_dt = datetime.datetime(2021, 11, 4, 9, 0)
end_dt = datetime.datetime(2021, 11, 4, 9, 59)
site_uid = "atha"
r = aurorax.data.ucalgary.download(dataset_name, start_dt, end_dt, site_uid=site_uid)

Downloading THEMIS_ASI_RAW files:   0%|          | 0.00/128M [00:00<?, ?B/s]

In [3]:
# view information about the downloaded data
r.pretty_print()

FileListingResponse:
  count             : 60
  dataset           : Dataset(name=THEMIS_ASI_RAW, short_description='THEMIS All Sky Imagers 3-sec raw data', provider='UCalgary', level='L0', doi_details='None', ...)
  filenames         : [60 filenames]
  output_root_path  : /home/darrenc/pyaurorax_data/THEMIS_ASI_RAW
  total_bytes       : 0


In [4]:
# an example of downloading several minutes of data from all
# THEMIS ASI sites (no site_uid filtering)
dataset_name = "THEMIS_ASI_RAW"
start_dt = datetime.datetime(2021, 11, 4, 9, 0)
end_dt = datetime.datetime(2021, 11, 4, 9, 4)
_ = aurorax.data.ucalgary.download(dataset_name, start_dt, end_dt)

Downloading THEMIS_ASI_RAW files:   0%|          | 0.00/136M [00:00<?, ?B/s]

## Changing the download location

To change where data is downloaded to, set the `download_output_root_path` attribute on the `PyAuroraX` object that was created at the beginning of this crib sheet.

Note that the below code is commented out on purpose here since we just want to show how to do this, and not actually do it.

In [5]:
# NOTE: the path you set can be a regular string path (nice for Linux and Mac)
# or a pathlib Path() object (nice for Windows).

#------------------
# aurorax.download_output_root_path = "some path"
#
# import pathlib
# aurorax.download_output_root_path = pathlib.Path("some path")
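Building the path with `pathlib` keeps it portable across operating systems. The following is a minimal sketch, assuming a hypothetical helper named `build_download_root()` and an example directory layout; neither is part of PyAuroraX. The final assignment is left commented out so that nothing is actually changed.

```python
from pathlib import Path


def build_download_root(base_dir, *parts):
    """Join path components portably and return an absolute Path.

    Hypothetical helper for illustration; not part of PyAuroraX.
    """
    return Path(base_dir).expanduser().joinpath(*parts).absolute()


# example layout (an assumption): ./data/pyaurorax under the current directory
download_root = build_download_root("data", "pyaurorax")
print(download_root)

# aurorax.download_output_root_path = download_root
```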


## Customizing the progress bar

You also have some control over the progress bar: certain methods accept additional parameters that let you customize it.

For the `download()` method, the following are available to you:

- `progress_bar_disable`: Disable the progress bar
- `progress_bar_ncols`: Set the width of the progress bar
- `progress_bar_ascii`: Change the ASCII character used in the progress bar
- `progress_bar_desc`: Change the description prefix for the progress bar

Apart from `progress_bar_disable`, these parameters are passed straight through to the tqdm progress bar: `progress_bar_ncols` adjusts the width, `progress_bar_ascii` adjusts the appearance of the bar, and `progress_bar_desc` adjusts the description shown at the beginning of the bar. Further details can be found in the [tqdm documentation](https://tqdm.github.io/docs/tqdm/#tqdm-objects).
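One way to keep these options tidy is to collect them in a dictionary and unpack it into the `download()` call. The sketch below assumes a hypothetical helper named `progress_bar_kwargs()` (not part of PyAuroraX) and illustrative values; only the `progress_bar_*` parameter names come from the list above. The `download()` call itself is left commented out.

```python
def progress_bar_kwargs(disable=False, ncols=None, ascii_chars=None, desc=None):
    """Collect download() progress bar options, skipping any left unset.

    Hypothetical helper for illustration; not part of PyAuroraX.
    """
    kwargs = {"progress_bar_disable": disable}
    optional = {
        "progress_bar_ncols": ncols,
        "progress_bar_ascii": ascii_chars,
        "progress_bar_desc": desc,
    }
    # only include the options that were explicitly provided
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    return kwargs


# illustrative values: an 80-column bar drawn with '#' and a custom description
kwargs = progress_bar_kwargs(ncols=80, ascii_chars=" #", desc="Fetching THEMIS")
print(kwargs)

# r = aurorax.data.ucalgary.download(dataset_name, start_dt, end_dt, **kwargs)
```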

You can also change the progress bar style globally, using the `aurorax.progress_bar_backend` parameter.

Note that the below code is commented out on purpose here since we just want to show how to do this, and not actually do it.

In [6]:
# disable the progress bar in a download() call
# -----------------------------------------------
# r = aurorax.data.ucalgary.download(dataset_name, start_dt, end_dt, progress_bar_disable=True)

# globally set the progress bar style
# --------------------------------------
# aurorax.progress_bar_backend = "standard"
# aurorax.progress_bar_backend = "notebook"
# aurorax.progress_bar_backend = "auto"  # the default


# NOTE: Just a heads up, if you're working in Spyder, the tqdm progress bar PyAuroraX uses doesn't 
# get detected properly. So setting the progress bar to 'standard' is recommended in this circumstance.

# Reading data

Downloading data is only one part of the process. So that you don't have to download data repeatedly, downloading and reading are split into two separate functions: `download()` and `read()`.

The data reading routines are simple at the core. They take in a list of filenames on your computer, read in those files, and return the results back as an object. Be sure to pass in only one type of data at a time, otherwise the read routine will get rather confused!

The advantage of this is that the read function just needs filenames. You can download data to any storage medium, and use `glob`-like functions to gather the filenames yourself. This can be beneficial if you don't have an internet connection at the time, but have already downloaded data. Or, you can simply run the `download()` function repeatedly; it will not re-download data if you already have it, unless the `overwrite` parameter is enabled.
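As a rough illustration of gathering filenames yourself, the sketch below walks a directory tree with `pathlib` and returns a sorted list of matching files. The helper name `find_data_files()`, the directory layout, and the placeholder filenames are all assumptions made up for this example; real data files will be named differently.

```python
import tempfile
from pathlib import Path


def find_data_files(root_dir, pattern):
    """Return a sorted list of file paths (as strings) under root_dir matching pattern."""
    return sorted(str(p) for p in Path(root_dir).rglob(pattern) if p.is_file())


# self-contained demo: build a throwaway directory tree with placeholder files
# (the layout and filenames here are made up for illustration)
with tempfile.TemporaryDirectory() as tmp:
    site_dir = Path(tmp) / "2021" / "11" / "04" / "atha"
    site_dir.mkdir(parents=True)
    for name in ("b.dat", "a.dat", "notes.txt"):
        (site_dir / name).touch()

    filenames = find_data_files(tmp, "*.dat")
    print([Path(f).name for f in filenames])

# with real data, the list would be passed to read() along with the
# matching Dataset object, e.g.
# data = aurorax.data.ucalgary.read(dataset, filenames)
```

Sorting the list keeps the files in a predictable order, which for timestamped filenames usually means chronological order.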

There are two methods that can be used for reading data:

1) using the generic method
2) using a specific dataset read function

The generic method is the recommended way as it is simpler. However, if more control is wanted then you can use the specific read functions directly. The generic method simply uses the dataset name to determine which specific read function to use.


In [7]:
# we will show the generic method first, since it is the easiest way
#
# NOTE: we are reading the 1hr of data we downloaded earlier on, using 2
# parallel processes to improve performance
data = aurorax.data.ucalgary.read(r.dataset, r.filenames, n_parallel=2)

print(data)
print()
data.pretty_print()

Data(data=array([[[2540, 2602, 2635, ..., 2562, 2646, 2579],
        [2503, 2533, 2604, ..., 2556, 2622, 2519],
        [2600, 2537, 2600, ..., 2596, 2580, 2520],
        ...,
        [2557, 2567, 2589, ..., 2561, 2622, 2614],
        [2542, 2575, 2536, ..., 2502, 2540, 2597],
        [2589, 2550, 2568, ..., 2600, 2601, 2562]],

       [[2544, 2526, 2521, ..., 2574, 2569, 2545],
        [2584, 2566, 2662, ..., 2608, 2671, 2562],
        [2601, 2593, 2592, ..., 2591, 2562, 2531],
        ...,
        [2529, 2618, 2596, ..., 2575, 2624, 2680],
        [2574, 2504, 2624, ..., 2598, 2512, 2554],
        [2612, 2574, 2535, ..., 2548, 2532, 2532]],

       [[2572, 2568, 2552, ..., 2582, 2594, 2523],
        [2550, 2549, 2525, ..., 2558, 2612, 2528],
        [2519, 2591, 2555, ..., 2495, 2509, 2617],
        ...,
        [2590, 2521, 2587, ..., 2630, 2565, 2606],
        [2551, 2564, 2508, ..., 2505, 2561, 2528],
        [2611, 2537, 2574, ..., 2551, 2648, 2611]],

       ...,

       [[2564,

In [8]:
# Since we know we're reading in THEMIS raw data, we can also use the
# specific read routine. Use these specific read functions if you want
# more control than the simpler read() function.
data = aurorax.data.ucalgary.readers.read_themis(r.filenames, n_parallel=2, dataset=r.dataset)

print(data)
print()
data.pretty_print()

Data(data=array([[[2540, 2602, 2635, ..., 2562, 2646, 2579],
        [2503, 2533, 2604, ..., 2556, 2622, 2519],
        [2600, 2537, 2600, ..., 2596, 2580, 2520],
        ...,
        [2557, 2567, 2589, ..., 2561, 2622, 2614],
        [2542, 2575, 2536, ..., 2502, 2540, 2597],
        [2589, 2550, 2568, ..., 2600, 2601, 2562]],

       [[2544, 2526, 2521, ..., 2574, 2569, 2545],
        [2584, 2566, 2662, ..., 2608, 2671, 2562],
        [2601, 2593, 2592, ..., 2591, 2562, 2531],
        ...,
        [2529, 2618, 2596, ..., 2575, 2624, 2680],
        [2574, 2504, 2624, ..., 2598, 2512, 2554],
        [2612, 2574, 2535, ..., 2548, 2532, 2532]],

       [[2572, 2568, 2552, ..., 2582, 2594, 2523],
        [2550, 2549, 2525, ..., 2558, 2612, 2528],
        [2519, 2591, 2555, ..., 2495, 2509, 2617],
        ...,
        [2590, 2521, 2587, ..., 2630, 2565, 2606],
        [2551, 2564, 2508, ..., 2505, 2561, 2528],
        [2611, 2537, 2574, ..., 2551, 2648, 2611]],

       ...,

       [[2564,

# Managing downloaded data

Managing data is hard! For the All-sky Imager (ASI) data, the biggest concern to keep in mind is the available storage. ASI data is not small: THEMIS ASI alone is 120 TB as of December 2024!

To help with this, we have some utility functions at your fingertips. The `show_data_usage()` function can help show you how much data is on your computer in PyAuroraX's download output root path. Then `purge_download_output_root_path()` can delete all the data in that directory.

In [9]:
# to view the amount of data that is currently downloaded, do the following
aurorax.show_data_usage()

Dataset name               Size    
TREX_SPECT_PROCESSED_V1    530.8 MB
TREX_RGB_RAW_NOMINAL       479.9 MB
THEMIS_ASI_RAW             263.3 MB
TREX_RGB_SKYMAP_IDLSAV     42.4 MB 
THEMIS_ASI_SKYMAP_IDLSAV   17.7 MB 
TREX_SPECT_SKYMAP_IDLSAV   11.3 kB 

Total size: 1.3 GB


In [10]:
# alternatively, you can get the data usage information returned as a dictionary
data_usage_dict = aurorax.show_data_usage(return_dict=True)
pprint.pprint(data_usage_dict)

{'THEMIS_ASI_RAW': {'path_obj': PosixPath('/home/darrenc/pyaurorax_data/THEMIS_ASI_RAW'),
                    'size_bytes': 263332336,
                    'size_str': '263.3 MB'},
 'THEMIS_ASI_SKYMAP_IDLSAV': {'path_obj': PosixPath('/home/darrenc/pyaurorax_data/THEMIS_ASI_SKYMAP_IDLSAV'),
                              'size_bytes': 17696892,
                              'size_str': '17.7 MB'},
 'TREX_RGB_RAW_NOMINAL': {'path_obj': PosixPath('/home/darrenc/pyaurorax_data/TREX_RGB_RAW_NOMINAL'),
                          'size_bytes': 479942066,
                          'size_str': '479.9 MB'},
 'TREX_RGB_SKYMAP_IDLSAV': {'path_obj': PosixPath('/home/darrenc/pyaurorax_data/TREX_RGB_SKYMAP_IDLSAV'),
                            'size_bytes': 42440484,
                            'size_str': '42.4 MB'},
 'TREX_SPECT_PROCESSED_V1': {'path_obj': PosixPath('/home/darrenc/pyaurorax_data/TREX_SPECT_PROCESSED_V1'),
                             'size_bytes': 530808113,
                          

In [11]:
# to clean up all data we've downloaded, you can delete
# the data using a helper function, or manually delete
# it yourself
#
# delete all data
# aurorax.purge_download_output_root_path()

# delete data for single specific dataset
# aurorax.purge_download_output_root_path(dataset_name="THEMIS_ASI_RAW")


Note the above function calls are commented out on purpose. Uncomment as needed.
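The dictionary returned by `show_data_usage(return_dict=True)` can also help you decide what to purge. The sketch below is one possible approach, assuming a hypothetical helper named `summarize_usage()` (not part of PyAuroraX); the sample values are copied from the output above, keeping only the `size_bytes` field.

```python
def summarize_usage(usage, top_n=3):
    """Return total bytes and the top_n largest datasets as (name, bytes) pairs."""
    sizes = {name: info["size_bytes"] for name, info in usage.items()}
    largest = sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:top_n]
    return sum(sizes.values()), largest


# sample values copied from the output above (path_obj and size_str omitted)
data_usage_dict = {
    "THEMIS_ASI_RAW": {"size_bytes": 263332336},
    "THEMIS_ASI_SKYMAP_IDLSAV": {"size_bytes": 17696892},
    "TREX_RGB_RAW_NOMINAL": {"size_bytes": 479942066},
    "TREX_RGB_SKYMAP_IDLSAV": {"size_bytes": 42440484},
    "TREX_SPECT_PROCESSED_V1": {"size_bytes": 530808113},
}

total_bytes, largest = summarize_usage(data_usage_dict, top_n=2)
print(f"total: {total_bytes / 1e9:.2f} GB")
for name, size in largest:
    print(f"{name}: {size / 1e6:.1f} MB")

# the largest dataset could then be removed with, for example:
# aurorax.purge_download_output_root_path(dataset_name=largest[0][0])
```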