# Reading On-The-Fly catalogs

For example, the catalog served at `https://vizcat.cds.unistra.fr/hats:n=1000000/gaia_edr3/`.

Reading such a catalog currently fails at the `_common_metadata` file. The `properties` and `partition_info.csv` files load successfully, but we also need the schema to initialize the catalog characteristics properly.

In this notebook, we walk through several attempts to read that file.

In [1]:
import hats

hats.__version__

ModuleNotFoundError: No module named 'hats'

In [4]:
import pyarrow as pa
import pyarrow.parquet as pq

pa.__version__

'18.0.0'

In [7]:
from upath import UPath

gaia_path = UPath("https://vizcat.cds.unistra.fr/hats:n=1000000/gaia_edr3/")


hats.read_hats(gaia_path)

<hats.catalog.catalog.Catalog at 0x7a3182623ef0>

## Reading with `pyarrow.parquet`

Here `pq.read_metadata` is handed the fsspec filesystem attached to the `UPath`, so fsspec performs the underlying read operations.

In [8]:
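# Note: this rebinds `gaia_path` from the catalog root to the metadata file,
# so the cells below refer to `_common_metadata`, not the catalog directory.
gaia_path = gaia_path / "dataset" / "_common_metadata"

# Despite the name, `parquet_file` would hold the file's metadata, not its data.
parquet_file = pq.read_metadata(gaia_path.path, filesystem=gaia_path.fs)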

ValueError: Cannot seek streaming HTTP file
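
The error is consistent with how Parquet is laid out: the file metadata sits in a footer at the end of the file, and the last 8 bytes hold a 4-byte little-endian footer length followed by the magic bytes `PAR1`. A reader therefore has to seek to the end first. The stdlib-only sketch below illustrates that access pattern on a synthetic byte string. `NonSeekableStream` and `read_footer_length` are hypothetical names for illustration, not part of pyarrow or fsspec, and the real readers do considerably more than this.

```python
import io
import struct

PARQUET_MAGIC = b"PAR1"


class NonSeekableStream(io.RawIOBase):
    """Hypothetical stand-in for a streaming HTTP file: readable, not seekable."""

    def __init__(self, payload: bytes):
        self._inner = io.BytesIO(payload)

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return False

    def readinto(self, buffer) -> int:
        return self._inner.readinto(buffer)


def read_footer_length(f) -> int:
    """Return the footer length stored in the last 8 bytes of a Parquet file."""
    f.seek(-8, io.SEEK_END)  # only works on a seekable file object
    trailer = f.read(8)
    if trailer[4:] != PARQUET_MAGIC:
        raise ValueError("not a Parquet file: missing trailing magic bytes")
    (footer_length,) = struct.unpack("<I", trailer[:4])
    return footer_length


# Synthetic "file": leading magic, a fake 5-byte footer, its length, trailing magic.
fake_footer = b"\x00" * 5
payload = (
    PARQUET_MAGIC + fake_footer + struct.pack("<I", len(fake_footer)) + PARQUET_MAGIC
)

# A seekable in-memory buffer can be read from the end.
seekable_result = read_footer_length(io.BytesIO(payload))

# The non-seekable stream fails at the seek, much like the HTTP file above.
try:
    read_footer_length(NonSeekableStream(payload))
    stream_error = None
except (OSError, ValueError) as exc:  # io.UnsupportedOperation subclasses both
    stream_error = exc
```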

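## Reading with `fsspec`

Opening the file through fsspec directly and reading it fully into memory gives pyarrow a seekable `io.BytesIO` buffer, which avoids the seek error above.
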
In [None]:
import io

with gaia_path.open(mode='rb') as f:
    metadata = pq.read_metadata(io.BytesIO(f.read()))
metadata

<pyarrow._parquet.FileMetaData object at 0x7c566e16ff10>
  created_by: CDS at2s-parquet version 0.1.0 (using parquet-rs). Please report any problem to francois-xavier.pineau@astro.unistra.fr.
  num_columns: 115
  num_rows: 0
  num_row_groups: 0
  format_version: 1.0
  serialized_size: 175512
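
The read-then-wrap pattern in the cell above can be pulled into a small helper. This is a minimal sketch, assuming the file is small enough to hold in memory (the `serialized_size` reported above is roughly 175 kB). `buffer_stream` is a hypothetical helper name, not an fsspec or pyarrow function.

```python
import io


def buffer_stream(stream, chunk_size: int = 1 << 20) -> io.BytesIO:
    """Copy a readable (possibly non-seekable) binary stream into a seekable buffer."""
    buffer = io.BytesIO()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes signals end of stream
            break
        buffer.write(chunk)
    buffer.seek(0)  # rewind so the consumer starts at the beginning
    return buffer


# Intended use in this notebook (assumes `gaia_path` and `pq` from the cells above):
#
#     with gaia_path.open(mode="rb") as f:
#         metadata = pq.read_metadata(buffer_stream(f))
```

Reading in chunks rather than with a single `f.read()` makes no difference for a file this small; it only keeps the helper usable if the stream returns short reads.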

In [None]:
gaia_path.stat()

UPathStatResult(st_mode=32768, st_ino=0, st_dev=0, st_nlink=0, st_uid=0, st_gid=0, st_size=0, st_atime=0, st_mtime=0, st_ctime=0, info={'name': 'https://vizcat.cds.unistra.fr/hats:n=1000000/gaia_edr3/dataset/_common_metadata', 'size': None, 'mimetype': 'application/vnd.apache.parquet', 'url': 'https://vizcat.cds.unistra.fr/hats:n=1000000/gaia_edr3/dataset/_common_metadata', 'type': 'file'})
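
The `'size': None` entry in the output above is a plausible explanation for the earlier failure: the server does not appear to report a content length, and, as far as we understand fsspec's HTTP backend, it falls back to a non-seekable streaming file when the size is unknown. A minimal sketch of a guard built on that assumption follows. `needs_buffering` is a hypothetical helper, and the `info` dict is a trimmed copy of the one printed above.

```python
def needs_buffering(info: dict) -> bool:
    """Return True when no size is reported, so random access is unlikely to work."""
    return info.get("size") is None


# Trimmed copy of the `info` dict from the stat output above.
info = {
    "name": "https://vizcat.cds.unistra.fr/hats:n=1000000/gaia_edr3/dataset/_common_metadata",
    "size": None,
    "mimetype": "application/vnd.apache.parquet",
    "type": "file",
}

buffering_required = needs_buffering(info)
```

In the notebook, the same check could be applied to `gaia_path.stat().info` before choosing between a direct `pq.read_metadata` call and the in-memory fallback.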