ENH: CADC: fix the ability to pass on Path objects as output_file #2541

Merged
merged 3 commits into from
Oct 4, 2022

Conversation

bsipocz
Member

@bsipocz bsipocz commented Sep 28, 2022

fixes #2540

@bsipocz bsipocz marked this pull request as ready for review September 28, 2022 02:42
@codecov

codecov bot commented Sep 28, 2022

Codecov Report

Merging #2541 (5ce572b) into main (4b4af55) will increase coverage by 0.37%.
The diff coverage is 75.00%.

@@            Coverage Diff             @@
##             main    #2541      +/-   ##
==========================================
+ Coverage   63.63%   64.01%   +0.37%     
==========================================
  Files         132      132              
  Lines       17092    16972     -120     
==========================================
- Hits        10876    10864      -12     
+ Misses       6216     6108     -108     
Impacted Files              Coverage Δ
astroquery/cadc/core.py     81.18% <75.00%> (+0.95%) ⬆️
astroquery/alma/utils.py    32.43% <0.00%> (+14.30%) ⬆️


@bsipocz
Member Author

bsipocz commented Sep 28, 2022

Unfortunately this (the test itself) doesn't seem to work with astropy<5.1.

While the relevant votable.utils.convert_to_writable_filelike function doesn't seem to have changed for a very long time, I didn't follow the call sequence. Instead I will simply skip the test for older versions (please do advise if you see a workaround, but I don't think it's worth spending too much time on it).

======================================================= FAILURES =======================================================
____________________________________________________ test_exec_sync ____________________________________________________

    @patch('astroquery.cadc.core.get_access_url',
           Mock(side_effect=lambda x, y=None: 'https://some.url'))
    def test_exec_sync():
        # save results in a file
        # create the VOTable result
        # example from http://docs.astropy.org/en/stable/io/votable/
        votable = VOTableFile()
        resource = Resource()
        votable.resources.append(resource)
        table = Table(votable)
        resource.tables.append(table)
        table.fields.extend([
            Field(votable, name="filename", datatype="char", arraysize="*"),
            Field(votable, name="matrix", datatype="double", arraysize="2x2")])
        table.create_arrays(2)
        table.array[0] = ('test1.xml', [[1, 0], [0, 1]])
        table.array[1] = ('test2.xml', [[0.5, 0.3], [0.2, 0.1]])
        buffer = BytesIO()
        votable.to_xml(buffer)
        cadc = Cadc(auth_session=requests.Session())
        response = Mock()
        response.to_table.return_value = table.to_table()
        cadc.cadctap.search = Mock(return_value=response)
        output_files = ['{}/test_vooutput.xml'.format(tempfile.tempdir),
                        Path(tempfile.tempdir, 'test_path_vooutput.xml')]
        for output_file in output_files:
>           cadc.exec_sync('some query', output_file=output_file)

astroquery/cadc/tests/test_cadctap.py:376: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
astroquery/cadc/core.py:615: in exec_sync
    result.write(fname, format=output_format, overwrite=True)
/Users/bsipocz/.pyenv/versions/3.9.1/lib/python3.9/site-packages/astropy/table/connect.py:127: in __call__
    registry.write(instance, *args, **kwargs)
/Users/bsipocz/.pyenv/versions/3.9.1/lib/python3.9/site-packages/astropy/io/registry.py:570: in write
    writer(data, *args, **kwargs)
/Users/bsipocz/.pyenv/versions/3.9.1/lib/python3.9/site-packages/astropy/io/votable/connect.py:173: in write_table_votable
    table_file.to_xml(output, tabledata_format=tabledata_format)
/Users/bsipocz/.pyenv/versions/3.9.1/lib/python3.9/site-packages/astropy/io/votable/tree.py:3662: in to_xml
    with util.convert_to_writable_filelike(
/Users/bsipocz/.pyenv/versions/3.9.1/lib/python3.9/contextlib.py:117: in __enter__
    return next(self.gen)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

fd = PosixPath('/var/folders/9s/070g0pd502q70k3gffpxv8km0000gq/T/test_path_vooutput.xml'), compressed = False

    @contextlib.contextmanager
    def convert_to_writable_filelike(fd, compressed=False):
        """
        Returns a writable file-like object suitable for streaming output.
    
        Parameters
        ----------
        fd : str or file-like
            May be:
    
                - a file path string, in which case it is opened, and the file
                  object is returned.
    
                - an object with a :meth:``write`` method, in which case that
                  object is returned.
    
        compressed : bool, optional
            If `True`, create a gzip-compressed file.  (Default is `False`).
    
        Returns
        -------
        fd : writable file-like
        """
        if isinstance(fd, str):
            if fd.endswith('.gz') or compressed:
                with gzip.GzipFile(fd, 'wb') as real_fd:
                    encoded_fd = io.TextIOWrapper(real_fd, encoding='utf8')
                    yield encoded_fd
                    encoded_fd.flush()
                    real_fd.flush()
                    return
            else:
                with open(fd, 'wt', encoding='utf8') as real_fd:
                    yield real_fd
                    return
        elif hasattr(fd, 'write'):
            assert callable(fd.write)
    
            if compressed:
                fd = gzip.GzipFile(fileobj=fd)
    
            # If we can't write Unicode strings, use a codecs.StreamWriter
            # object
            needs_wrapper = False
            try:
                fd.write('')
            except TypeError:
                needs_wrapper = True
    
            if not hasattr(fd, 'encoding') or fd.encoding is None:
                needs_wrapper = True
    
            if needs_wrapper:
                yield codecs.getwriter('utf-8')(fd)
                fd.flush()
            else:
                yield fd
                fd.flush()
    
            return
        else:
>           raise TypeError("Can not be coerced to writable file-like object")
E           TypeError: Can not be coerced to writable file-like object

/Users/bsipocz/.pyenv/versions/3.9.1/lib/python3.9/site-packages/astropy/io/votable/util.py:85: TypeError
------------------------------------------------- Captured stdout call -------------------------------------------------
      filename matrix [2,2]
     --------- ------------
     test1.xml   1.0 .. 1.0
     test2.xml   0.5 .. 0.1
=============================================== short test summary info ================================================
FAILED astroquery/cadc/tests/test_cadctap.py::test_exec_sync - TypeError: Can not be coerced to writable file-like ob...
======================================= 1 failed, 12 passed, 15 skipped in 3.33s =======================================
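The traceback shows why the Path case fails here: the dispatch in convert_to_writable_filelike only handles `str` and objects with a `write` method, and a `PosixPath` is neither. Below is a minimal sketch of that dispatch, assuming a simplified stand-in (`open_writable` and `try_write` are hypothetical names, not astropy functions), which also shows that coercing the path with `str()` on the caller side avoids the error:

```python
import contextlib
import tempfile
from pathlib import Path


@contextlib.contextmanager
def open_writable(fd):
    """Simplified stand-in for the str / file-like dispatch in the traceback."""
    if isinstance(fd, str):
        # A path string: open it for text writing.
        with open(fd, "wt", encoding="utf8") as real_fd:
            yield real_fd
    elif hasattr(fd, "write"):
        # An already open, writable object: use it directly.
        yield fd
        fd.flush()
    else:
        # pathlib.Path is not a str and has no write(), so it lands here.
        raise TypeError("Can not be coerced to writable file-like object")


def try_write(target):
    """Return True if writing succeeded, False if the dispatch rejected target."""
    try:
        with open_writable(target) as out:
            out.write("<VOTABLE/>")
        return True
    except TypeError:
        return False


with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir, "test_path_vooutput.xml")
    path_rejected = not try_write(path)   # Path object falls through to TypeError
    str_accepted = try_write(str(path))   # same location as a plain string works
    written = path.read_text(encoding="utf8")
```

The stand-in leaves out the gzip and encoding handling of the real function; only the type dispatch matters for this failure.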

@bsipocz bsipocz added this to the v0.4.7 milestone Sep 28, 2022
astroquery/cadc/core.py (outdated, resolved)
@@ -56,6 +56,9 @@ cadc

- Deprecated keywords and ``run_query`` method have been removed. [#2389]

- Fixed a bug to be able to pass longer than filename Path objects as
``output_file``. [#2541]


I don't know if this was a bug. The documentation for the method states that output_file is either a string or a file handler. Adding support for Path in addition to that is a great idea but it's a new feature.

Comment on lines 609 to 614
if isinstance(output_file, str):
fname = output_file
elif hasattr(output_file, 'name'):
fname = output_file.name
fname = str(output_file)


The second block is meant for file handlers. Maybe all that is needed is for the first if to be: if isinstance(output_file, (str, Path)): fname = str(output_file)

Member Author


ok, I can do that, it makes sense. Though it would also make sense to add a test for the file handler case.
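To make the suggestion above concrete, here is a minimal sketch of the filename resolution for the three input kinds discussed (string, Path, open file handler). `resolve_output_name` is a hypothetical helper for illustration only, not the code that was merged. It also shows why the `name` attribute should not be used for Path objects: `Path.name` is only the final path component, which matches the truncation described in the linked issue.

```python
import tempfile
from pathlib import Path


def resolve_output_name(output_file):
    """Hypothetical helper: return the filename to write to."""
    if isinstance(output_file, (str, Path)):
        # Strings and Path objects both describe a location; keep all of it.
        return str(output_file)
    if hasattr(output_file, "name"):
        # Open file handlers expose the path they were opened with as .name.
        return output_file.name
    raise TypeError("output_file must be a str, Path, or file handler")


with tempfile.TemporaryDirectory() as tmpdir:
    target = Path(tmpdir, "test_vooutput.xml")

    from_str = resolve_output_name(str(target))
    from_path = resolve_output_name(target)
    with open(str(target), "w", encoding="utf8") as handle:
        from_handle = resolve_output_name(handle)

    # Path.name drops the directory, which is what truncated the path before.
    truncated = target.name
```

Checking `(str, Path)` first matters: a Path also has a `name` attribute, so testing `hasattr(output_file, 'name')` before the type check would send it down the file-handler branch.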


@andamian andamian left a comment


Excellent! Thanks for the contribution!

@bsipocz
Member Author

bsipocz commented Oct 3, 2022

I'm not sure what is going on with the open file/etc issues on Windows, maybe @pllim has a suggestion for how to work around that failure?

@pllim
Member

pllim commented Oct 3, 2022

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process

I feel like we've seen this in astropy before but I don't remember if we fixed it or simply ignored the failure. I don't think we even run the open files check in the Windows job... 💭

@bsipocz
Member Author

bsipocz commented Oct 3, 2022

I don't run open files check, this just chokes on the warning itself. I'm just leaning towards catching and ignoring it for windows.

@pllim
Member

pllim commented Oct 3, 2022

Might be the same problem as astropy/astropy#7404 ?

@bsipocz
Member Author

bsipocz commented Oct 3, 2022

off topic but that is a very sad issue, full of the most amazing people most of whom already left.

@bsipocz bsipocz changed the title from "BUG: CADC: fix the ability to pass on Path objects as output_file" to "ENH: CADC: fix the ability to pass on Path objects as output_file" Oct 4, 2022
@bsipocz
Member Author

bsipocz commented Oct 4, 2022

OK, so windows is worked around. Given that there were no tests for the other OSes, I still consider that this PR leaves the code in a better shape. If anyone has an idea how to fix windows, PRs are welcome :)

@bsipocz bsipocz merged commit 0258503 into astropy:main Oct 4, 2022
@Vital-Fernandez

I am experiencing the issue in astropy/astropy#7404: while trying to combine multiple FITS files into one, the individual hduls are already closed at the point where I want to save the new file:

from pathlib import Path
from astropy.io import fits


def join_fits_logs(log_path_list, output_address, keep_individual_files=False):

    # Create new HDU for the combined file with a new PrimaryHDU
    hdulist = fits.HDUList([fits.PrimaryHDU()])

    # Iterate through the file paths, open each FITS file, and append the non-primary HDUs to hdulist
    missing_files = []
    for log_path in log_path_list:
        if log_path.is_file():
            with fits.open(log_path) as hdulist_i:

                # Remove primary
                if isinstance(hdulist_i[0], fits.PrimaryHDU):
                    hdulist_i.pop(0)

                # Combine list
                hdulist += hdulist_i.copy()

        else:
            missing_files.append(log_path)

    # Save to a combined file
    hdulist.writeto(output_address, overwrite=True, output_verify='ignore')
    hdulist.close()

    return

# Raw strings keep the Windows backslashes from being read as escape sequences
mask_log_list = [Path(r"..\..\sample_data\SHOC579_log_MASK-MASK_0.fits"),
                 Path(r"..\..\sample_data\SHOC579_log_MASK-MASK_1.fits"),
                 Path(r"..\..\sample_data\SHOC579_log_MASK-MASK_2.fits")]

join_fits_logs(mask_log_list, Path(r"..\..\sample_data\SHOC579_log_COMB.fits"))

I can make it work by creating new HDUs from the existing hdu.data and hdu.header (although I would swear the combined files are much larger than expected).

I wonder, @bsipocz, if you could share your workaround for Windows.

Thanks for any advice.

@bsipocz
Member Author

bsipocz commented Oct 2, 2023

@Vital-Fernandez - I would consider moving this discussion upstream to astropy as we don't have a solution or workaround for it here. We just simply skipped testing this on windows.
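For readers looking for the shape of such a skip, a minimal sketch follows. It uses the standard library's `unittest.skipIf` so that it is self-contained; in a pytest suite, `pytest.mark.skipif` with the same condition plays the same role. The test class, test name, and reason string are hypothetical and are not taken from the astroquery test suite.

```python
import sys
import tempfile
import unittest
from pathlib import Path

# True on Windows, where the PermissionError quoted above was observed.
ON_WINDOWS = sys.platform.startswith("win")


class OutputFileTest(unittest.TestCase):
    @unittest.skipIf(ON_WINDOWS, "file is reported as in use by another process")
    def test_write_to_path(self):
        # Hypothetical stand-in for the real test body: write to a Path target.
        with tempfile.TemporaryDirectory() as tmpdir:
            target = Path(tmpdir, "test_path_vooutput.xml")
            target.write_text("<VOTABLE/>", encoding="utf8")
            self.assertTrue(target.is_file())


# Run the single test programmatically and collect the outcome.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(OutputFileTest)
result = unittest.TestResult()
suite.run(result)
```

A skip like this only hides the failure on one platform; it does not address the underlying open-file problem.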

Successfully merging this pull request may close these issues.

BUG: cadc cuts down filepath when non string output_file is passed on
5 participants