Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modifying astroquery/sdss package init file to use most current SDSS DR (as well as associated documentation) #2478

Merged
merged 3 commits into from
Sep 11, 2022

Conversation

jsobeck
Copy link
Contributor

@jsobeck jsobeck commented Aug 2, 2022

Hi, this PR closes #2365. The __init__.py file in the sdss package as well as some of the related documentation (in sdss.rst) has been slightly modified to reflect and use the latest SDSS data release (DR17). Some package tests have been run. Please let me know if I need to do anything further. Thank you.

@pep8speaks
Copy link

pep8speaks commented Aug 2, 2022

Hello @jsobeck! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-09-11 04:20:09 UTC

@codecov
Copy link

codecov bot commented Aug 2, 2022

Codecov Report

Merging #2478 (5f60c09) into main (5c1eefc) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main    #2478   +/-   ##
=======================================
  Coverage   63.08%   63.08%           
=======================================
  Files         133      133           
  Lines       17313    17313           
=======================================
  Hits        10922    10922           
  Misses       6391     6391           
Impacted Files Coverage Δ
astroquery/sdss/core.py 92.06% <ø> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

docs/sdss/sdss.rst Outdated Show resolved Hide resolved
contains new optical and infrared spectra from both Apache Point
Observatory and Las Campanas Observatory. Previously released
integral-field datacubes and maps, stellar library spectra, as well as
images, are also included in DR17. Users may select alternate DR's.
Copy link
Member

@eerovaher eerovaher Aug 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can a different data release be selected? Documentation is not helpful if it just says that something can be done, but neither describes nor provides examples how.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, first timer. In the core routine of the sdss package, the various query procedures allow for the data release parameter to be specified (with the default DR set initially). This has never been documented properly with examples (though seems straightforward for user specification). The easiest thing would be to remove the phrase now and then, take some time to generate an example. Sound alright?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that changing the data release is not documented is a shortcoming that should be fixed, but I would tend to agree that it would be beyond the scope of a pull request that simply updates the default. But perhaps it would be better to remove the entire section because only removing the last sentence might leave the wrong impression that DR17 is the only option.

Copy link
Contributor Author

@jsobeck jsobeck Aug 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chatted about this with @weaverba137. We believe that it is important to note what the default data release is and what it contains. This omission by astroquery is what lead to the initial issue request as users were confused as to why DR14 was employed by astroquery as opposed to the most recent DR from SDSS (DR17). The language can be massaged accordingly, but users do have some inherent responsibility to understand something about the database/dataset they are querying. If a user is employing SDSS data, there is a high probability that they understand that it has issued multiple DR's over the years.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the language @jsobeck is suggesting should remain, because it opens the door to astroquery eventually getting the functionality to access these other SDSS data sets.

More comments below.

@eerovaher
Copy link
Member

Updating the default data release needs an entry in the change log

I'll give you a few suggestions for any future pull requests you might open.

Write the pull associated issue number in the description of the pull request using one of these phrases so that the issue can be automatically linked to the pull request. Writing the issue number in the title is not helpful.

It would be much better if the pull request title would describe what its purpose is, not how it is implemented. In this case using the title of your first commit as the pull request title would have been much more useful.

Markdown interprets double underscores as bold font specifiers, which is why your pull request description contains init.py instead of __init__.py. You should use backticks to format code or filenames.

@jsobeck jsobeck changed the title PR for Issue #2365: Modifying astroquery/sdss package init file and associated documentation Closes #2365: Modifying astroquery/sdss package init file and associated documentation Aug 2, 2022
@jsobeck
Copy link
Contributor Author

jsobeck commented Aug 2, 2022

Got it, @eerovaher. Thanks for information. I was moving too quickly and should have been more careful/deliberate. Though I looked at the change log and knew it had to be modified, it was not clear to me that I should be the one to change it (as opposed to a "reviewer"; even after reading through, e.g., CONTRIBUTING.rst).

@eerovaher
Copy link
Member

The related issue must be mentioned in the pull request description, i.e. the opening post, not the title. You can see that GitHub has not linked the issue because you have used the correct expression in the wrong place.

Though I looked at the change log and knew it had to be modified, it was not clear to me that I should be the one to change it (as opposed to a "reviewer"; even after reading through, e.g., CONTRIBUTING.rst).

Make sure to mention this over at #2462

@bsipocz
Copy link
Member

bsipocz commented Aug 2, 2022

There is one remote data test that is failing and has to be fixed. Also, please squash the commits down to one. And I agree with @eerovaher that the docs should contain code blocks to be useful, mentioning what the default is doesn't really bring much to the table without showing how to change it after all the default is not hidden, but listed in the signature and is set by a standard astropy config system. I agree that it can be confusing to users, but then the confusion should be fixed by showing of how to use a different release, with an example.

@bsipocz bsipocz added the sdss label Aug 2, 2022
@weaverba137
Copy link
Member

I suspect that the broken remote data test may be fixed in #2477.

I also agree that an example with the data release set explicitly would be useful, and we can use such an example to highlight real changes that have happened to the SDSS data sets over the years. Two examples:

  1. Compare redshifts of a small set of objects over data releases.
  2. The photometric calibration was redone in DR13. This is not necessarily widely known, so one could compare fluxes/magnitudes for a small number of photometric objects between say, DR12 (before recalibration) and DR17.

@bsipocz
Copy link
Member

bsipocz commented Aug 2, 2022

The test is broken due to the change in the default dr, so definitely belongs to this PR. OTOH, I'm OK with you cherry-picking and squashing down the commits from this PR to yours, so Jennifer gets a commit credit but we're not juggling two PRs at the same time for the same module.

@jsobeck jsobeck changed the title Closes #2365: Modifying astroquery/sdss package init file and associated documentation Modifying astroquery/sdss package init file to use most current SDSS DR (as well as associated documentation) #2365 Aug 2, 2022
@jsobeck jsobeck changed the title Modifying astroquery/sdss package init file to use most current SDSS DR (as well as associated documentation) #2365 Modifying astroquery/sdss package init file to use most current SDSS DR (as well as associated documentation) Aug 2, 2022
@weaverba137
Copy link
Member

What about merging #2477 first, then rebasing this PR? I think that may be easier in the long run.

@jsobeck
Copy link
Contributor Author

jsobeck commented Aug 2, 2022

Agree to the suggestion from @weaverba137 to merge #2477 first (though my input is not the most crucial here).

@weaverba137
Copy link
Member

@bsipocz, could you say which specific remote data test is failing? I'd like to verify that it is in fact fixed or at least not also broken in #2477.

@bsipocz
Copy link
Member

bsipocz commented Aug 3, 2022

__________________________________________ TestSDSSRemote.test_large_crossid ___________________________________________

self = <astroquery.sdss.tests.test_sdss_remote.TestSDSSRemote object at 0x14eb85490>
large_results = <SkyCoord (ICRS): (ra, dec) in deg
    [(334.26483  ,  3.0572122 ), (334.35421  ,  2.9292896 ),
     (334.35896  ,  3....  ( 22.726164 , 20.667794  ), ( 22.655787 , 20.861832  ),
     ( 22.714435 , 20.469003  ), ( 22.713696 , 20.939534  )]>

    def test_large_crossid(self, large_results):
        # Regression test for #589
    
        results = sdss.SDSS.query_crossid(large_results)
>       assert len(results) == 894
E       assert 845 == 894
E        +  where 845 = len(<Table length=845>\n obj_id        objID               ra        ...       obj_id1        type \n bytes7        int64   ...557872910172 ... 1237679541360984660 GALAXY\nobj_999 1237679476942701227 22.7136962885291 ... 1237679476942701227 GALAXY)

astroquery/sdss/tests/test_sdss_remote.py:205: AssertionError

@weaverba137
Copy link
Member

Thank you, I will take a look at the details there.

@weaverba137
Copy link
Member

I've been able to reproduce this. The issue is that the input data large_results is not necessarily deterministic even within a given data release, and certainly not deterministic from one data release to another.

Ultimately, what is test_large_crossid testing? If it is simply to ensure that the queries can handle a ~ 1000-item data set, there should be a better way to generate the input data deterministically.

@bsipocz bsipocz added this to the v0.4.7 milestone Sep 11, 2022
Copy link
Member

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the default needs to be mentioned in the changelog. And the test is still failing.

So, I just go ahead, rebased this on top of the recently merged other PR, to be extra sure, and add the missing pieces for a quick merge.

@bsipocz
Copy link
Member

bsipocz commented Sep 11, 2022

Thank you @jsobeck!

@weaverba137
Copy link
Member

@bsipocz, I guess the resolution to the failed test was just to mark DR17 as default?

I'd still like to know the original purpose of test_large_crossid, if that information still exists?

@bsipocz
Copy link
Member

bsipocz commented Sep 13, 2022

Yes, the "fix" was to update to the result of DR17.

As for your second question, it was a regression reported in #589

@bsipocz
Copy link
Member

bsipocz commented Sep 13, 2022

(so I suppose we could change the test to not assert on the actual number, but that it runs and returns results, but I don't think it's a big deal to leave it as is)

@weaverba137
Copy link
Member

OK, thanx. I'll go back and read that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update SDSS to Prominently Feature DR17
5 participants