@dougbrn
Contributor

@dougbrn dougbrn commented Apr 30, 2025

Closes #722
Closes #730

Modifies generate_data to add ra/dec columns and an integer id column
Adds a generate_catalog one-liner
Allows use of search_filters, as in to_hats, to control the spatial region of the generated data. Currently supports BoxSearch (equivalent to just using ra_range and dec_range) and ConeSearch
Removes generate_parquet_file because it's not really relevant to LSDB
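
To illustrate what restricting generated data to a box or a cone means, here is a minimal standard-library sketch. It is not the code in this PR and does not call LSDB: the names sample_box, sample_cone, and angular_separation are hypothetical, and the rejection-sampling approach for the cone is an assumption chosen for clarity rather than speed.

```python
import math
import random


def _draw(rng, ra_range, sin_lo, sin_hi):
    """Draw one (ra, dec) point in degrees, uniform in area on the sphere."""
    ra = rng.uniform(ra_range[0], ra_range[1])
    # Sampling sin(dec) uniformly keeps the density uniform per unit area.
    s = max(-1.0, min(1.0, rng.uniform(sin_lo, sin_hi)))
    return ra, math.degrees(math.asin(s))


def sample_box(n, ra_range=(0.0, 360.0), dec_range=(-90.0, 90.0), seed=None):
    """Return n (id, ra, dec) tuples inside an ra/dec box (hypothetical helper)."""
    rng = random.Random(seed)
    sin_lo = math.sin(math.radians(dec_range[0]))
    sin_hi = math.sin(math.radians(dec_range[1]))
    return [(i, *_draw(rng, ra_range, sin_lo, sin_hi)) for i in range(n)]


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def sample_cone(n, ra, dec, radius_deg, seed=None):
    """Return n (id, ra, dec) tuples inside a cone (hypothetical helper).

    Candidates are drawn from the declination band that bounds the cone and
    rejected if they fall outside the cone radius.
    """
    rng = random.Random(seed)
    sin_lo = math.sin(math.radians(max(-90.0, dec - radius_deg)))
    sin_hi = math.sin(math.radians(min(90.0, dec + radius_deg)))
    rows = []
    while len(rows) < n:
        cand_ra, cand_dec = _draw(rng, (0.0, 360.0), sin_lo, sin_hi)
        if angular_separation(ra, dec, cand_ra, cand_dec) <= radius_deg:
            rows.append((len(rows), cand_ra, cand_dec))
    return rows


if __name__ == "__main__":
    box_rows = sample_box(5, ra_range=(10.0, 20.0), dec_range=(-5.0, 5.0), seed=1)
    cone_rows = sample_cone(5, ra=45.0, dec=30.0, radius_deg=5.0, seed=2)
    print(box_rows[0], cone_rows[0])
```

In this sketch the box case needs no rejection step, which matches the description above of BoxSearch as equivalent to passing ra_range and dec_range directly; the cone case is the one that needs an explicit distance check.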

These functions are currently found in the import structure as:

from lsdb.nested import generate_data, generate_catalog
# or
from lsdb.nested.datasets import generate_data, generate_catalog

Is there a better place for these?

@github-actions

github-actions bot commented Apr 30, 2025

| Before [4ea2e56] | After [caff746] | Ratio | Benchmark (Parameter) |
|---|---|---|---|
| 74.1±0.9ms | 75.6±0.6ms | 1.02 | benchmarks.time_kdtree_crossmatch |
| 23.4±0.3ms | 23.8±1ms | 1.02 | benchmarks.time_polygon_search |
| 11.8±0.2ms | 11.8±0.2ms | 1 | benchmarks.time_box_filter_on_partition |
| 6.59±0.02s | 6.56±0s | 1 | benchmarks.time_create_large_catalog |
| 975±5ms | 979±5ms | 1 | benchmarks.time_create_midsize_catalog |


@codecov

codecov bot commented Apr 30, 2025

Codecov Report

❌ Patch coverage is 98.46154% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 97.31%. Comparing base (f53cbe7) to head (17bb123).
⚠️ Report is 229 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/lsdb/nested/datasets/generation.py | 98.36% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #729      +/-   ##
==========================================
+ Coverage   97.20%   97.31%   +0.11%     
==========================================
  Files          53       53              
  Lines        2253     2312      +59     
==========================================
+ Hits         2190     2250      +60     
+ Misses         63       62       -1     


@dougbrn dougbrn changed the title from "WIP: LSDB.nested catalog generation" to "LSDB.nested catalog generation" May 2, 2025
@dougbrn dougbrn marked this pull request as ready for review May 2, 2025 16:57
@dougbrn
Contributor Author

dougbrn commented May 2, 2025

One failed test on the Windows side that seems unrelated...

@dougbrn dougbrn requested a review from delucchi-cmu May 2, 2025 17:01
@hombit hombit linked an issue May 2, 2025 that may be closed by this pull request
@dougbrn dougbrn requested a review from delucchi-cmu May 5, 2025 16:06
@dougbrn dougbrn merged commit a36ef29 into main May 7, 2025
11 of 12 checks passed
@dougbrn dougbrn deleted the generate_catalog branch May 7, 2025 15:34
@delucchi-cmu delucchi-cmu mentioned this pull request May 13, 2025
21 tasks

Development

Successfully merging this pull request may close these issues.

LSDB.from_dataframe loses Nested-Pandas dtypes on load
Make LSDB.nested dataset generation functions have radec columns
Simple simulated catalog.

4 participants