
ID search throws TypeError with latest universal_pathlib #1047

@jaladh-singhal


Bug report
catalog.id_search fails with a TypeError:

TypeError                                 Traceback (most recent call last)
Cell In[52], line 1
----> 1 ztf_lcs = ztf_lc_catalog.id_search(values={ztf_lc_idx_column:ztf_oids})
      2 ztf_lcs

File ~/work/irsa-tutorials/irsa-tutorials/.tox/py312-buildhtml/lib/python3.12/site-packages/lsdb/catalog/catalog.py:593, in Catalog.id_search(self, values, index_catalogs, fine)
    590     raise TypeError(f"Catalog index for field `{field}` is not of type `HCIndexCatalog`")
    592 field_indexes = {field_name: _get_index_catalog_for_field(field_name) for field_name in values.keys()}
--> 593 return self.search(IndexSearch(values,field_indexes,fine))

File ~/work/irsa-tutorials/irsa-tutorials/.tox/py312-buildhtml/lib/python3.12/site-packages/lsdb/catalog/catalog.py:607, in Catalog.search(self, search)
    595 def search(self, search: AbstractSearch):
    596     """Find rows by reusable search algorithm.
    597 
    598     Filters partitions in the catalog to those that match some rough criteria.
   (...)    605         A new Catalog containing the points filtered to those matching the search parameters.
    606     """
--> 607     cat = super().search(search)
    608     cat.margin = self.margin.search(search) if self.margin is not None else None
    609     return cat

File ~/work/irsa-tutorials/irsa-tutorials/.tox/py312-buildhtml/lib/python3.12/site-packages/lsdb/catalog/dataset/healpix_dataset.py:632, in HealpixDataset.search(self, search)
    626 if (
    627     self.hc_structure.catalog_info.total_rows > 0
    628     and self.hc_structure.catalog_base_dir is not None
    629     and self.hc_structure.original_schema is not None
    630 ):
    631     return self._reload_with_filter(search)
--> 632 filtered_hc_structure = search.filter_hc_catalog(self.hc_structure)
    633 ddf_partition_map, search_ndf = self._perform_search(filtered_hc_structure, search)
    634 return self._create_updated_dataset(
    635     ddf=search_ndf, ddf_pixel_map=ddf_partition_map, hc_structure=filtered_hc_structure
    636 )

File ~/work/irsa-tutorials/irsa-tutorials/.tox/py312-buildhtml/lib/python3.12/site-packages/lsdb/core/search/abstract_search.py:37, in AbstractSearch.filter_hc_catalog(self, hc_structure)
     35 def filter_hc_catalog(self, hc_structure: HCCatalogTypeVar) -> HCCatalogTypeVar:
     36     """Determine the target partitions for further filtering."""
---> 37     filtered_cat = self.perform_hc_catalog_filter(hc_structure)
     38     if not self.fine:
     39         # If running a coarse search, the coverage of the catalog will match the healpix pixels the
     40         # catalog is filtered to, not the finer filtered moc of the filtered catalog.
     41         filtered_cat.moc = filtered_cat.pixel_tree.to_moc()

File ~/work/irsa-tutorials/irsa-tutorials/.tox/py312-buildhtml/lib/python3.12/site-packages/lsdb/core/search/index_search.py:38, in IndexSearch.perform_hc_catalog_filter(self, hc_structure)
     36 for field_name, field_value in self.values.items():
     37     field_value = field_value if isinstance(field_value, list) else [field_value]
---> 38     pixels_for_field = set(self.index_catalogs[field_name].loc_partitions(field_value))
     39     all_pixels = all_pixels.intersection(pixels_for_field)
     40 return hc_structure.filter_from_pixel_list(list(all_pixels))

File ~/work/irsa-tutorials/irsa-tutorials/.tox/py312-buildhtml/lib/python3.12/site-packages/hats/catalog/index/index_catalog.py:27, in IndexCatalog.loc_partitions(self, ids)
     18 """Find the set of partitions in the primary catalog for the ids provided.
     19 
     20 Args:
   (...)     24     that may contain rows for the id values
     25 """
     26 metadata_file = paths.get_parquet_metadata_pointer(self.catalog_base_dir)
---> 27 dataset = pds.parquet_dataset(metadata_file,filesystem=metadata_file.fs)
     29 # There's a lot happening in a few pyarrow dataset methods:
     30 # We create a simple pyarrow expression that roughly corresponds to a SQL statement like
     31 #   WHERE id_column IN (<ids>)
   (...)     34 # (uint8 and uint64 aren't always friendly between pyarrow and the rest of python),
     35 # and offers easy iteration to create our HealpixPixel list.
     36 filtered = dataset.filter(pc.field(self.catalog_info.indexing_column).isin(ids)).to_table()

File ~/work/irsa-tutorials/irsa-tutorials/.tox/py312-buildhtml/lib/python3.12/site-packages/pyarrow/dataset.py:569, in parquet_dataset(metadata_path, schema, filesystem, format, partitioning, partition_base_dir)
    566 else:
    567     filesystem = _ensure_filesystem(filesystem)
--> 569 metadata_path = filesystem.normalize_path(_stringify_path(metadata_path))
    570 options = ParquetFactoryOptions(
    571     partition_base_dir=partition_base_dir,
    572     partitioning=_ensure_partitioning(partitioning)
    573 )
    575 factory = ParquetDatasetFactory(
    576     metadata_path, filesystem, format, options=options)

File ~/work/irsa-tutorials/irsa-tutorials/.tox/py312-buildhtml/lib/python3.12/site-packages/pyarrow/util.py:151, in _stringify_path(path)
    148 except AttributeError:
    149     pass
--> 151 raise TypeError("not a path-like object")

TypeError: not a path-like object

Caught this in CI at irsa-tutorials (job log)
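
The last two frames show where the error is raised. pyarrow's _stringify_path accepts either a str or an object whose __fspath__() works, and otherwise falls through to TypeError("not a path-like object"). The sketch below re-implements that check with the standard library only, so the failure mode can be seen without hats, pyarrow or S3. It is an illustration that mirrors the frames above, not pyarrow's actual code. PathWithFspath, PathWithoutFspath and the s3://example-bucket/... location are hypothetical stand-ins for whatever paths.get_parquet_metadata_pointer returns. The traceback only shows that the call reached the AttributeError branch, so the assumption here is that the object produced with universal_pathlib 0.3.0 no longer offers a usable __fspath__.

```python
import os


def stringify_path(path):
    """Illustrative copy of the check in the pyarrow/util.py frame above."""
    if isinstance(path, str):
        return os.path.expanduser(path)
    # os.PathLike objects expose __fspath__(); anything else falls through.
    try:
        return os.path.expanduser(path.__fspath__())
    except AttributeError:
        pass
    raise TypeError("not a path-like object")


class PathWithFspath:
    """Hypothetical path object that implements the os.PathLike protocol."""

    def __init__(self, raw):
        self._raw = raw

    def __fspath__(self):
        return self._raw


class PathWithoutFspath:
    """Hypothetical path object that supports str() but not __fspath__()."""

    def __init__(self, raw):
        self._raw = raw

    def __str__(self):
        return self._raw


if __name__ == "__main__":
    url = "s3://example-bucket/catalog/dataset/_metadata"  # hypothetical location
    print(stringify_path(PathWithFspath(url)))  # prints the URL unchanged
    try:
        stringify_path(PathWithoutFspath(url))
    except TypeError as err:
        print(f"TypeError: {err}")  # TypeError: not a path-like object
```

Note that having a sensible str() is not enough: the check never calls str() on non-string objects, which is why an otherwise valid remote path object can still be rejected.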

Environment Information

fsspec                     2025.9.0
hats                       0.6.5
lsdb                       0.6.5
pyarrow                    21.0.0
s3fs                       2025.9.0
universal_pathlib          0.3.0
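
A version table like the one above can be collected with importlib.metadata from the standard library. The sketch below is one way to do it; it assumes the distribution names match the ones listed here and reports "not installed" instead of failing when a package is missing. collect_versions is a hypothetical helper, not part of lsdb or hats.

```python
from importlib.metadata import PackageNotFoundError, version

# Distributions listed under "Environment Information" in this report.
PACKAGES = ["fsspec", "hats", "lsdb", "pyarrow", "s3fs", "universal_pathlib"]


def collect_versions(packages):
    """Return {distribution name: installed version or 'not installed'}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


if __name__ == "__main__":
    for name, ver in collect_versions(PACKAGES).items():
        # Left-align names so the output resembles the table above.
        print(f"{name:<27}{ver}")
```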

Solution
Pin universal_pathlib<0.3 until this typing issue is fixed. I did that in our CI, and it works fine with universal_pathlib v0.2.6 (the version in my local environment, where this notebook was working).
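
The pin itself is a one-line constraint (universal_pathlib<0.3) in whatever requirements or tox configuration the CI uses. If a notebook should also fail loudly when an incompatible version slips in, a small runtime check like the sketch below could run first. The function names are hypothetical, and the simplified version parsing is an assumption rather than a full PEP 440 comparison; packaging.version would be the more robust choice where it is available.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile

# Upper bound from the workaround above: universal_pathlib < 0.3.
UPATH_UPPER_BOUND = (0, 3)


def is_below_bound(ver, bound=UPATH_UPPER_BOUND):
    """Compare the leading numeric release parts of a version string.

    Simplified parser (assumption): reads only the first len(bound)
    dot-separated parts and keeps their leading digits, which is enough
    for versions such as "0.2.6" or "0.3.0".
    """
    parts = []
    for piece in ver.split(".")[: len(bound)]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits or 0))
    return tuple(parts) < bound


def check_universal_pathlib():
    """Raise if an installed universal_pathlib is outside the pinned range."""
    try:
        installed = version("universal_pathlib")
    except PackageNotFoundError:
        return None  # not installed, so there is nothing to check
    if not is_below_bound(installed):
        raise RuntimeError(
            f"universal_pathlib {installed} found; this workaround expects <0.3"
        )
    return installed
```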

Before submitting
Please check the following:

  • I have described the situation in which the bug arose, including what code was executed, and any applicable data others will need to reproduce the problem.
  • I have included information about my environment, including the version of this package (e.g. lsdb.__version__)
  • I have included available evidence of the unexpected behavior (including error messages, screenshots, and/or plots) as well as a description of what I expected instead.
  • If I have a solution in mind, I have provided an explanation and/or pseudocode and/or task list.

Metadata

Labels: bug (Something isn't working)
Type: No type
Project status: Done
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
