Update data-access.rst
lgarrison committed Sep 17, 2021
1 parent 148df82 commit fe3a8fe
Showing 1 changed file with 12 additions and 6 deletions.
18 changes: 12 additions & 6 deletions docs/data-access.rst
@@ -18,8 +18,8 @@ We are pleased to be able to offer two online portals to access AbacusSummit dat

Constellation, as a tape-backed portal, is appropriate for bulk transfers between supercomputer centers. NERSC, as a disk-backed portal, is appropriate for fetching narrow subsets of the data. The Constellation portal also contains the HighZ and ScaleFree simulations.

Full Data on Tape
~~~~~~~~~~~~~~~~~
OLCF Constellation: Full Data on Tape
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Oak Ridge Leadership Computing Facility's `Constellation <https://www.olcf.ornl.gov/olcf-resources/rd-project/constellation-doi-framework-and-portal/>`_ portal hosts the full 2 PB AbacusSummit data set on the magnetic-tape-backed `High Performance Storage System (HPSS) <https://www.olcf.ornl.gov/olcf-resources/data-visualization-resources/hpss/>`_. HPSS offers high throughput but has high access latency. To amortize the latency, we aggregate the simulation files with coarse granularity, so in most cases one must download many TB of data. For example, the halo catalogs for each simulation are packed into a single tarball per simulation, which is 6.6 TB for a ``base`` simulation.
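
To give a feel for what a coarse-grained download implies, the following is a minimal sketch that estimates the wall-clock time to move one 6.6 TB ``base`` halo tarball. The sustained throughput values are illustrative assumptions, not measured Constellation or HPSS rates, and the estimate ignores the tape staging latency entirely.

```python
def transfer_hours(size_tb: float, throughput_gbit_s: float) -> float:
    """Hours needed to move `size_tb` terabytes (1 TB = 1e12 bytes)
    at a sustained rate of `throughput_gbit_s` gigabits per second."""
    if size_tb < 0:
        raise ValueError("size_tb must be non-negative")
    if throughput_gbit_s <= 0:
        raise ValueError("throughput_gbit_s must be positive")
    size_bits = size_tb * 1e12 * 8          # terabytes -> bits
    seconds = size_bits / (throughput_gbit_s * 1e9)
    return seconds / 3600.0


# Size of the halo catalog tarball for one `base` simulation (from the text).
HALO_TARBALL_TB = 6.6

# Assumed sustained end-to-end rates in Gbit/s; real rates depend on both endpoints.
for rate in (1.0, 10.0):
    hours = transfer_hours(HALO_TARBALL_TB, rate)
    print(f"{HALO_TARBALL_TB} TB at {rate:g} Gbit/s: about {hours:.1f} hours")
```

Under these assumptions a single tarball takes roughly 14.7 hours at 1 Gbit/s and about 1.5 hours at 10 Gbit/s, before any time spent waiting for the data to be staged from tape.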

The primary DOI of AbacusSummit is ``10.13139/OLCF/1811689``. This is a persistent identifier that resolves to the access information at the following URL, where the AbacusSummit data may be browsed and downloaded via Globus: https://doi.ccs.ornl.gov/ui/doi/355
@@ -31,9 +31,13 @@ Note that it can take many hours before a transfer from Constellation begins if
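
As a small illustration of how a DOI is used as a persistent link, the sketch below builds a resolver URL from the DOI string. It assumes the general-purpose ``https://doi.org/`` resolver, which is not named in the text above; the helper name ``doi_to_url`` is hypothetical. Opening the resulting URL in a browser is expected to redirect to the landing page for the data set.

```python
from urllib.parse import quote

# Public DOI resolver (an assumption; the text only gives the landing-page URL).
DOI_RESOLVER = "https://doi.org/"

# Primary AbacusSummit DOI, as quoted in the text.
ABACUSSUMMIT_DOI = "10.13139/OLCF/1811689"


def doi_to_url(doi: str) -> str:
    """Return a resolver URL for a DOI of the form '10.<registrant>/<suffix>'."""
    doi = doi.strip()
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters, but keep the '/' separators.
    return DOI_RESOLVER + quote(doi, safe="/")


print(doi_to_url(ABACUSSUMMIT_DOI))
```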

The availability of Constellation depends on the status of HPSS, which undergoes regular downtime for maintenance. If the data is inaccessible, please check the status of HPSS on the following page: https://www.olcf.ornl.gov/for-users/center-status/

Subset of Data on Disk
~~~~~~~~~~~~~~~~~~~~~~
NERSC's `Community File System <https://docs.nersc.gov/filesystems/community/>`_ (CFS) hosts a 750 TB subset of the most important AbacusSummit data products (includes most products except for the 7% "B" particle subsample and the 100% time slice outputs). We will shortly be able to provide a Globus portal to this data.
NERSC: Subset of Data on Disk
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NERSC's `Community File System <https://docs.nersc.gov/filesystems/community/>`_ (CFS) hosts a 750 TB subset of the most important AbacusSummit data products. This subset includes most products, but not the 7% "B" particle subsample or the 100% time slice outputs. The portal to this data is here: https://abacussummit-portal.nersc.gov/

Using that portal, you can select the desired subset of simulations, data products, and redshifts, and initiate the transfer via Globus. See :ref:`Using Globus`.

Some data products (initial conditions, merger trees) are not yet exposed via the web interface of this portal, but they can still be manually accessed by browsing the directory tree via Globus.

The availability of the NERSC portal depends on the availability of CFS and the DTNs (data transfer nodes). If the data is inaccessible, please check the CFS and DTN status on the following page: https://www.nersc.gov/live-status/motd/

@@ -45,7 +49,9 @@ Note that most university and large computing centers have Globus endpoints alre

What data are available?
------------------------
The :doc:`data-products` page documents the data products.
The :doc:`data-products` page documents the data products. All products are available at the Constellation portal (including ScaleFree and HighZ). The NERSC portal offers most products, but not the 7% "B" particle subsample or the 100% time slice outputs.

Some data products (initial conditions, merger trees) are not yet exposed via the web interface of the NERSC portal, but they can still be manually accessed by browsing the directory tree via Globus.

Note that you will almost certainly need to use the utilities at
https://abacusutils.readthedocs.io/