Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Spectra URLs #2214

Merged
merged 13 commits into from
Nov 19, 2021
Merged

Update Spectra URLs #2214

merged 13 commits into from
Nov 19, 2021

Conversation

weaverba137
Copy link
Member

This PR:

Copy link
Contributor

@keflavich keflavich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a remote test for this, perhaps?

@weaverba137
Copy link
Member Author

I'll see what I can do.

@codecov
Copy link

codecov bot commented Nov 16, 2021

Codecov Report

Merging #2214 (6d47ac7) into main (e440f2b) will increase coverage by 0.00%.
The diff coverage is 88.88%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #2214   +/-   ##
=======================================
  Coverage   61.80%   61.81%           
=======================================
  Files         129      129           
  Lines       16288    16295    +7     
=======================================
+ Hits        10067    10072    +5     
- Misses       6221     6223    +2     
Impacted Files Coverage Δ
astroquery/sdss/core.py 85.96% <87.50%> (-0.37%) ⬇️
astroquery/sdss/field_names.py 92.85% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e440f2b...6d47ac7. Read the comment docs.

@weaverba137
Copy link
Member Author

I've updated a few of the tests, but I don't have easy access to a Python 3.[789] environment at this moment, so I can't actually run a --remote-data test myself.

Copy link
Member

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need a changelog.

Running the remote tests returns errors for DRs: 8, 9, 11
It's great that those errors are recovered, but otoh the newly added test extension don't really test the fix, they are just as failing as before this patch.

However, the patch indeed fixes the example in the issue comment, #1731 (comment). What about adding that one as a regression test?

astroquery/sdss/tests/test_sdss_remote.py Outdated Show resolved Hide resolved
astroquery/sdss/tests/test_sdss.py Outdated Show resolved Hide resolved
@balashev
Copy link

balashev commented Nov 17, 2021

There is still a problem to load spectra from DR16 data release since it needs to add '/lite' of '/full' in the link address which is not supported by previous data releases.

Also, it seems that you do not need to explicitly provide {instrument} within the format of the link since 'sdss' is allowable by server. Otherwise to download dr16 spectra it should be 'eboss' instead of 'boss'!

@weaverba137
Copy link
Member Author

@balashev, this PR does account for the /lite versus /full directories in DR16. And the {instrument}variable is already removed from the URL.

@weaverba137
Copy link
Member Author

@bsipocz, can you show me an example of the failure error messages for DR8, 9? DR11 should not actually be included in the remote data test suite since it does not have any SkyServer support.

@weaverba137
Copy link
Member Author

@bsipocz, actually, I found a minor bug in the test_sdss_spectrum method for remote data. I think a completely new round of testing is in order.

@bsipocz
Copy link
Member

bsipocz commented Nov 17, 2021

@bsipocz, can you show me an example of the failure error messages for DR8, 9? DR11 should not actually be included in the remote data test suite since it does not have any SkyServer support.

It returned those obscured ValueErrors, I'll be able to dig up the actual server responses a bit later and will dump it in a gist.

@balashev
Copy link

@balashev, this PR does account for the /lite versus /full directories in DR16. And the {instrument}variable is already removed from the URL.

It still produce invalid link address for dr16. It seems it should be
linkstr = linkstr.replace('/spectra/', '/spectra/full/')
in line 589 in core.py

@bsipocz
Copy link
Member

bsipocz commented Nov 18, 2021

@weaverba137:
For DR8:

> /Users/bsipocz/munka/devel/astroquery/astroquery/sdss/core.py(863)_parse_result()
    861         if 'error_message' in io.BytesIO(response.content):
    862             raise RemoteServiceError(response.content)
--> 863         arr = np.atleast_1d(np.genfromtxt(io.BytesIO(response.content),
    864                                           names=True, dtype=None,
    865                                           delimiter=',', skip_header=1,

ipdb> response.content
b"ERROR\n\nSQL returned the following error message:\nInvalid column name 'instrument'.\nYour SQL command was:\nEXEC spExecuteSQL 'SELECT DISTINCT p.ra, p.dec, p.objid, p.run, p.rerun, p.camcol, p.field, s.z, s.plate, s.mjd, s.fiberID, s.specobjid, s.run2d, s.instrument FROM PhotoObjAll AS p JOIN SpecObjAll s ON p.objID = s.bestObjID WHERE ((p.ra between 2.02291 and 2.02402) and (p.dec between 14.8393 and 14.8404))  ', 500000,'skyserver.sdss.org','DSA001','172.23.250.206','public'\n"
ipdb> response.url
'http://skyserver.sdss.org/dr8/en/tools/search/x_sql.asp?cmd=SELECT+DISTINCT+p.ra%2C+p.dec%2C+p.objid%2C+p.run%2C+p.rerun%2C+p.camcol%2C+p.field%2C+s.z%2C+s.plate%2C+s.mjd%2C+s.fiberID%2C+s.specobjid%2C+s.run2d%2C+s.instrument+FROM+PhotoObjAll+AS+p++JOIN+SpecObjAll+s+ON+p.objID+%3D+s.bestObjID++WHERE+%28%28p.ra+between+2.02291+and+2.02402%29+and+%28p.dec+between+14.8393+and+14.8404%29%29&format=csv'
ipdb> 

For DR9 it actually passes now with these warnings:

In [10]: sdss.SDSS.query_region(coords, spectro=True, data_release=dr)
WARNING: Field info are not available for this data release [astroquery.sdss.field_names]
WARNING: Field info are not available for this data release [astroquery.sdss.field_names]

And thus as that None is passed in to the next command of get_spectra, it produces the error here:

b"ERROR\n\nSQL returned the following error message:\nIncorrect syntax near 'WHERE'.\nYour SQL command was:\nEXEC spExecuteSQL 'SELECT DISTINCT s.instrument, s.run2d, s.plate, s.mjd, s.fiberID FROM PhotoObjAll AS p JOIN SpecObjAll s ON p.objID = s.bestObjID WHERE   ', 500000,'skyserver.sdss.org','DSA001','172.23.250.206','public'\n"

[edit]: sorry, the test doesn't even get to this second command as it fails on the Table assert in the line above

self = <astroquery.sdss.tests.test_sdss_remote.TestSDSSRemote object at 0x14fabb6a0>, dr = 9

    @pytest.mark.parametrize("dr", dr_list)
    def test_sdss_spectrum(self, dr):
        xid = sdss.SDSS.query_region(self.coords, spectro=True, data_release=dr)
>       assert isinstance(xid, Table)
E       assert False
E        +  where False = isinstance(None, Table)

@bsipocz
Copy link
Member

bsipocz commented Nov 18, 2021

@balashev - It seems to be working for DR16:

In [1]: from astropy.coordinates import SkyCoord

In [2]: from astroquery import sdss

In [3]: coords = SkyCoord('0h8m05.63s +14d50m23.3s')

In [4]: dr=16

In [5]: xid=sdss.SDSS.query_region(coords, spectro=True, data_release=dr)
/Users/bsipocz/munka/devel/astroquery/astroquery/sdss/core.py:863: VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default.
  arr = np.atleast_1d(np.genfromtxt(io.BytesIO(response.content),

In [6]: xid
Out[6]: 
<Table length=1>
       ra              dec               objid         run  rerun camcol field     z      plate  mjd  fiberID     specobjid      run2d instrument
    float64          float64             int64        int64 int64 int64  int64  float64   int64 int64  int64        int64        int64   bytes4  
---------------- ---------------- ------------------- ----- ----- ------ ----- ---------- ----- ----- ------- ------------------ ----- ----------
2.02344596573482 14.8398237551311 1237652943176138868  1739   301      3   315 0.04559058   751 52251     160 845594848269461504    26       SDSS

In [7]: sdss.SDSS.get_spectra(matches=xid, data_release=dr)
Downloading https://data.sdss.org/sas/dr16/sdss/spectro/redux/26/spectra/0751/spec-0751-52251-0160.fits
|========================================================================================================================================================================| 604k/604k (100.00%)         0s
Out[7]: [[<astropy.io.fits.hdu.image.PrimaryHDU object at 0x142b21160>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x142d9eb50>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x142e7bc70>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x142e94640>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x142eaa2b0>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x142eb9ee0>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x144942b50>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x14495a7c0>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x144972430>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x144a060a0>]]

@balashev
Copy link

balashev commented Nov 18, 2021

@bsipocz:
Yes, it is ok, since your spectrum is from the early releases (plate 0751).

And it is not working for dr16 plates, e.g.:

SDSS.get_spectra(plate=9403, mjd=58018, fiberID=485, data_release=16)

@bsipocz bsipocz added this to the v0.4.5 milestone Nov 18, 2021
@Nestak2
Copy link

Nestak2 commented Nov 18, 2021

This branch doesn't resolve for me the issue with the wrong usage of \boss\ instead of \eboss\ in the URLs. This is the code I am running in a google colab notebook, I hope I am downloading the right branch:

!pip install git+https://github.com/weaverba137/astroquery.git

from astropy import coordinates as coords
from astroquery.sdss import SDSS

from astroquery.sdss import SDSS
query = "select top 10                        z, ra, dec, bestObjID                      from                        specObj                      where                        class = 'galaxy'                        and programname='eboss'"
res = SDSS.query_sql(query)
co = coords.SkyCoord(ra=res['ra'], dec=res['dec'], unit='deg')
table_spec = SDSS.get_spectra(co)

The very last line returns HTTP Error 404: Not Found. I can look up the URLs that are tried to be used by running .get_spectra_async instead of .get_spectra in the last line and then doing print(table_spec). The URLs are still in the wrong form of

https://data.sdss.org/sas/dr14/boss/spectro/redux/v5_10_0/spectra/7574/spec-7574-56945-0441.fits instead of
https://data.sdss.org/sas/dr14/eboss/spectro/redux/v5_10_0/spectra/7574/spec-7574-56945-0441.fits

Am I doing something wrong? Am I downloading the wrong branch/version/pull/patch?

@bsipocz
Copy link
Member

bsipocz commented Nov 18, 2021

@Nestak2 - you're installing the wrong branch. I would suggest to simply wait until this is merged and then install the dev version from pypi.

@weaverba137
Copy link
Member Author

@bsipocz, thank you for the test outputs, I'm able to reproduce now at least, and will dig deeper.

@weaverba137
Copy link
Member Author

@bsipocz, I think I found the problems for DR8 & 9.

  1. The default columns in field_names.py included 'instrument', which isn't defined for earlier data releases. Since 'instrument' isn't necessary to construct the URL, I removed it.
  2. Starting with DR10, the returned CSV has this header:
    #Table1
    ra,dec,objid,run,rerun,camcol,field,z,plate,mjd,fiberID,specobjid,run2d
    
    However, earlier releases do not have the #Table1 line.
  3. Now catch instances where run2d is parsed as an integer instead of a string. It should be a string, even if it "looks" like a pure integer.

@weaverba137
Copy link
Member Author

I'll fix the conflict with CHANGES.rst momentarily.

@bsipocz
Copy link
Member

bsipocz commented Nov 18, 2021

Could you also add the line from #2214 (comment) as a test given that slipped through originally?

@weaverba137
Copy link
Member Author

This line, right?

SDSS.get_spectra(plate=9403, mjd=58018, fiberID=485, data_release=16)

@bsipocz
Copy link
Member

bsipocz commented Nov 18, 2021

And there is one more (a new) failure:

astroquery/sdss/tests/test_sdss_remote.py ................F.xx..                                                                                                                                       [ 99%]
docs/sdss/sdss.rst s                                                                                                                                                                                   [100%]

================================================================================================== FAILURES ==================================================================================================
______________________________________________________________________________________ TestSDSSRemote.test_sdss_specobj ______________________________________________________________________________________

self = <astroquery.sdss.tests.test_sdss_remote.TestSDSSRemote object at 0x14c568700>

    def test_sdss_specobj(self):
        colnames = ['ra', 'dec', 'objid', 'run', 'rerun', 'camcol', 'field',
                    'z', 'plate', 'mjd', 'fiberID', 'specobjid', 'run2d',
                    'instrument']
        dtypes = [float, float, np.int64, int, int, int, int, float, int, int,
                  int, np.int64, int, bytes]
        data = [
            [46.8390680395307, 5.16972676625711, 1237670015125750016, 5714,
             301, 2, 185, -0.0006390358, 2340, 53733, 291, 2634685834112034816,
             26, 'SDSS'],
            [46.8705377929765, 5.42458826592292, 1237670015662621224, 5714,
             301, 3, 185, 0, 2340, 53733, 3, 2634606669274834944, 26, 'SDSS'],
            [46.8899751105478, 5.09432755808192, 1237670015125815346, 5714,
             301, 2, 186, -4.898809E-05, 2340, 53733, 287, 2634684734600407040,
             26, 'SDSS'],
            [46.8954031261838, 5.9739184644185, 1237670016199491831, 5714,
             301, 4, 185, 0, 2340, 53733, 329, 2634696279472498688, 26,
             'SDSS'],
            [46.9155836662379, 5.50671723824944, 1237670015662686398, 5714,
             301, 3, 186, 0, 2340, 53733, 420, 2634721293362030592, 26,
             'SDSS']]
        table = Table(data=[x for x in zip(*data)],
                      names=colnames, dtype=dtypes)
        xid = sdss.SDSS.query_specobj(plate=2340)
        assert isinstance(xid, Table)
        for row in table:
            i = np.nonzero(xid['specobjid'] == row['specobjid'])[0]
            assert len(i) == 1
            for j, c in enumerate(colnames):
                if dtypes[j] is float:
                    assert_allclose(xid[i][c], row[c])
                else:
>                   assert xid[i][c] == row[c]

astroquery/sdss/tests/test_sdss_remote.py:112: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/Users/bsipocz/.pyenv/versions/3.9.1/lib/python3.9/site-packages/astropy/table/table.py:1867: in __getitem__
    return self.columns[item]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <TableColumns names=('ra','dec','objid','run','rerun','camcol','field','z','plate','mjd','fiberID','specobjid','run2d')>, item = 'instrument'

    def __getitem__(self, item):
        """Get items from a TableColumns object.
        ::
    
          tc = TableColumns(cols=[Column(name='a'), Column(name='b'), Column(name='c')])
          tc['a']  # Column('a')
          tc[1] # Column('b')
          tc['a', 'b'] # <TableColumns names=('a', 'b')>
          tc[1:3] # <TableColumns names=('b', 'c')>
        """
        if isinstance(item, str):
>           return OrderedDict.__getitem__(self, item)
E           KeyError: 'instrument'

/Users/bsipocz/.pyenv/versions/3.9.1/lib/python3.9/site-packages/astropy/table/table.py:246: KeyError
========================================================================================== short test summary info ===========================================================================================
FAILED astroquery/sdss/tests/test_sdss_remote.py::TestSDSSRemote::test_sdss_specobj - KeyError: 'instrument'
======================================================================= 1 failed, 248 passed, 1 skipped, 2 xfailed in 60.83s (0:01:00) =======================================================================
$

@bsipocz
Copy link
Member

bsipocz commented Nov 18, 2021

(I wonder how usable the github VS code would be in cases like this, do you @pllim or @saimn know if anyone used it successfully to run remote tests to debug or it's just wishful thinking?)

@weaverba137
Copy link
Member Author

OK, I think I've got both of those in, please test.

@weaverba137
Copy link
Member Author

Oops, caught one more bug in one of the tests.

@pllim
Copy link
Member

pllim commented Nov 18, 2021

github VS code

I have only used it for simple editing. I never tried to run the test suite on it. Sorry. 😬

Copy link
Member

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All remotes pass now with previously failing examples included in the test suite.

@bsipocz
Copy link
Member

bsipocz commented Nov 19, 2021

Thanks @weaverba137!

@bsipocz bsipocz merged commit cbec82e into astropy:main Nov 19, 2021
@Nestak2
Copy link

Nestak2 commented Nov 19, 2021

@bsipocz Thanks for the previous answer! Is now the new version/branch/patch available for use? How do I "install the dev version from pypi". I find no information to that in google, can you provide me with a code snippet? Sorry, I am a simple astrophysics PhD student, that can code in Python, but has no idea about IT, github, version control etc.

What I can successfully do in my google colab notebook is:

!pip install git+https://github.com/astropy/astroquery.git

from astropy import coordinates as coords
from astroquery.sdss import SDSS

query = "select top 10                        z, ra, dec, bestObjID                      from                        specObj                      where                        class = 'galaxy'                        and programname='eboss'"
res = SDSS.query_sql(query)
co = coords.SkyCoord(ra=res['ra'], dec=res['dec'], unit='deg')
table_spec = SDSS.get_spectra(co)

And this runs well as long as I have select top 10. When I change it to select top 100 the error appears

ValueError: Some errors were detected !
    Line #12 (got 3 columns instead of 1)
    Line #32 (got 5 columns instead of 1)

Is the error fixable with "install the dev version from pypi" or is the error independent?

@pllim
Copy link
Member

pllim commented Nov 19, 2021

Regardless of PyPI pre-release or not, you can install the dev version like this:

pip install git+https://github.com/astropy/astroquery

@bsipocz
Copy link
Member

bsipocz commented Nov 19, 2021

It's now on pypi, too, so the a pip install with the --pre option should work, too.

@Nestak2
Copy link

Nestak2 commented Nov 22, 2021

@pllim @bsipocz Thanks, in my code I state that I have downloaded the version with
pip install git+https://github.com/astropy/astroquery, but I get a different error unrelated to a mismatch of the strings boss and eboss (see above). Any ideas why this is happening and how I can resolve it? Tnx

@pllim
Copy link
Member

pllim commented Nov 22, 2021

I am not familiar with this remote service, but you might want to look at the install log carefully to make sure you really installed the dev version that you think you are installing. If you are using a Python environment manager like conda, try it on a new environment.

@pllim
Copy link
Member

pllim commented Nov 22, 2021

p.s. Also make sure you clear the cache, just in case.

@Nestak2
Copy link

Nestak2 commented Nov 22, 2021

@pllim Thanks, I will try out your suggestions! The code from above is actually on a completely clean conda environment. Could you do me the favor and check if my code from above throws an error for you, too? It should be just as simple as copy-paste-run (and of course download the current version of the repo, if you haven't yet). Tnx

@keflavich
Copy link
Contributor

I can reproduce the error:

from astropy import coordinates as coords
from astroquery.sdss import SDSS

res = SDSS.query_sql("select top 101 z, ra, dec, bestObjID from specObj where class = 'galaxy' and programname='eboss'")
co = coords.SkyCoord(ra=res['ra'], dec=res['dec'], unit='deg')
table_spec = SDSS.get_spectra(co)

yields

Traceback (most recent call last):
  File "<ipython-input-8-818093dbfea0>", line 1, in <module>
    table_spec = SDSS.get_spectra(co)
  File "/home/adam/repos/astroquery/astroquery/sdss/core.py", line 616, in get_spectra
    readable_objs = self.get_spectra_async(coordinates=coordinates,
  File "/home/adam/repos/astroquery/astroquery/sdss/core.py", line 570, in get_spectra_async
    matches = self._parse_result(result)
  File "/home/adam/repos/astroquery/astroquery/sdss/core.py", line 867, in _parse_result
    arr = np.atleast_1d(np.genfromtxt(io.BytesIO(response.content),
  File "/home/adam/anaconda3/envs/python3.9/lib/python3.9/site-packages/numpy/lib/npyio.py", line 2124, in genfromtxt
    raise ValueError(errmsg)
ValueError: Some errors were detected !
    Line #12 (got 3 columns instead of 1)
    Line #32 (got 5 columns instead of 1)

(I used 101 and 100 to avoid risk of caching confusing me)

@pllim
Copy link
Member

pllim commented Nov 23, 2021

Sounds like you should open a new issue and ping SDSS developers.

@weaverba137 weaverba137 deleted the sdss-spectra-url branch March 2, 2022 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HTTP Error 404 For SDSS.get_spectra on BOSS spectra
6 participants