downloading text files with utf-8 encoding #135

Closed
Schleissheimer-Stieglitz opened this issue Dec 13, 2019 · 8 comments
Labels
good first issue Help Wanted We will be glad if somebody proposes a solution via PR

Comments

@Schleissheimer-Stieglitz

Downloading a text file with utf-8 encoding from Artifactory doesn’t work correctly.
If I try it with an example file with the following content:

This is a test.
This file is encoded in utf-8.
Some special character: äöüß

the downloaded file contains some strange symbols instead of the content which it should contain.
Content of downloaded file:

�‹�      ��ÉÈ,V ¢D…’Ôâ�=^®��@ZfN*H45/9?%5E!3O¡´$M×�(�œŸ›ªP\�šœ™˜£�œ‘X”˜\’Zd¥pxÉám‡÷�žÏË� šbçhS     

As a workaround, I added this function to the class _ArtifactoryAccessor in artifactory.py

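def writeto(self, fd, out):
    # Stream the artifact at fd's URL into the binary file object out.
    url = str(fd)
    res = fd.session.get(url, stream=True, verify=True, cert=None)
    if res.status_code != 200:
        raise RuntimeError(res.status_code)
    # Write the body in small chunks instead of reading it all at once.
    for chunk in res.iter_content(chunk_size=256):
        if chunk:
            out.write(chunk)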

and this function to the class ArtifactoryPath.

def writeto(self, out):
    self._accessor.writeto(self, out)

Now I use

with open(dest, "wb") as out:
    path.writeto(out)

instead of

with path.open() as fd:
    with open(dest, "wb") as out:
        out.write(fd.read())

to download the file and it works fine for me.
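
The garbled output quoted above begins with characters that resemble a gzip header (bytes 0x1f 0x8b) rendered in a Windows code page, which would suggest the body was saved while still compressed. That reading is an assumption and is not confirmed in this thread. A minimal sketch for checking downloaded bytes, where `maybe_gunzip` is a hypothetical helper name and not part of dohq-artifactory:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def maybe_gunzip(data: bytes) -> bytes:
    """Return data decompressed if it starts with the gzip magic, else unchanged."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


# Round-trip the special characters from the example file.
original = "Some special character: äöüß\n".encode("utf-8")
compressed = gzip.compress(original)
print(maybe_gunzip(compressed).decode("utf-8"))
```

If the bytes written to disk pass this check and decompress to the expected text, the problem would be a missing decompression step, not a character-encoding fault.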

@fuzzmz
Contributor

fuzzmz commented Jan 6, 2020

I'm not able to reproduce this using dohq-artifactory==0.7.297 on top of Python 3.7.2 and Artifactory Version 6.14.0. The test was run on Windows 10.

The default download example, modified only to point to the test file, works fine.

from artifactory import ArtifactoryPath

path = ArtifactoryPath(
    "http://sampleaf/artifactory/testrepo-local/testfile.txt"
)

with path.open() as fd:
    with open("testfile.txt", "wb") as out:
        out.write(fd.read())

@Schleissheimer569, after uploading the file to Artifactory, if you download it through the browser does it maintain the correct encoding?

@Schleissheimer-Stieglitz
Author

If I download the file through the browser it works correctly.

I used the default download example, too.

I used dohq-artifactory==0.7.311, Python 3.7.4 and Artifactory Version 6.16.0 on Windows 10.

@eladavron

eladavron commented Feb 27, 2020

I had a different issue, which @Schleissheimer569's solution solved.
For me, on Python 3.8 (not 3.7 for some reason), some files would give this error when downloading:

  File "C:\Python38\lib\site-packages\urllib3\response.py", line 440, in read
    data = self._fp.read()
  File "C:\Python38\lib\http\client.py", line 467, in read
    s = self._safe_read(self.length)
  File "C:\Python38\lib\http\client.py", line 608, in _safe_read
    data = self.fp.read(amt)
  File "C:\Python38\lib\socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "C:\Python38\lib\ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "C:\Python38\lib\ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
OverflowError: Python int too large to convert to C long

Implementing @Schleissheimer569's workaround solved it.
It would be nice if there were a built-in option to download in chunks instead of reading the whole stream at once.
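
The traceback above ends in a single read() of the whole response body. One plausible reading, not confirmed in this thread, is that the requested length is larger than a C long can hold on Windows. A sketch of a bounded copy using the standard library's `shutil.copyfileobj`, assuming the source object supports `read(n)`; here `io.BytesIO` stands in for the remote file and `CHUNK_SIZE` is an arbitrary illustrative value:

```python
import io
import shutil

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary illustrative value

# Stand-in for the file-like object returned by path.open() (assumption).
source = io.BytesIO(b"x" * (3 * CHUNK_SIZE + 17))
dest = io.BytesIO()

# copyfileobj reads at most CHUNK_SIZE bytes per call, so no single
# read() ever asks for the full content length.
shutil.copyfileobj(source, dest, length=CHUNK_SIZE)

print(len(dest.getvalue()))
```

In real use, `dest` would be a file opened with mode "wb", as in the download examples earlier in the thread.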

@allburov
Member

allburov commented Mar 3, 2020

Probably, we could add these functions to our library, why not. It looks useful.

Could someone create a PR with these changes? I think we should also add the ability to customize chunk_size.
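
A rough sketch of what a configurable chunk_size could look like, based on the workaround from the first comment. The name `write_to`, the default of 8192, and the returned byte count are assumptions for illustration, not the library's API; `FakeSession` and `FakeResponse` are stand-ins so the example runs without a server, mimicking only the `get` and `iter_content` calls the workaround uses:

```python
import io


def write_to(session, url, out, chunk_size=8192):
    """Stream url into the binary file object out, chunk_size bytes at a time."""
    res = session.get(url, stream=True)
    if res.status_code != 200:
        raise RuntimeError(res.status_code)
    written = 0
    for chunk in res.iter_content(chunk_size=chunk_size):
        if chunk:  # skip empty keep-alive chunks
            written += out.write(chunk)
    return written


class FakeResponse:
    """Stand-in for a streamed HTTP response (illustration only)."""

    status_code = 200

    def __init__(self, body):
        self._body = body

    def iter_content(self, chunk_size=1):
        for start in range(0, len(self._body), chunk_size):
            yield self._body[start:start + chunk_size]


class FakeSession:
    """Stand-in for an HTTP session (illustration only)."""

    def __init__(self, body):
        self._body = body

    def get(self, url, stream=False):
        return FakeResponse(self._body)


body = "Some special character: äöüß".encode("utf-8")
out = io.BytesIO()
url = "http://sampleaf/artifactory/testrepo-local/testfile.txt"
written = write_to(FakeSession(body), url, out, chunk_size=4)
print(written, out.getvalue().decode("utf-8"))
```

Because the chunks are written as raw bytes, multi-byte UTF-8 characters split across chunk boundaries are reassembled intact in the output file.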

@allburov allburov added good first issue Help Wanted We will be glad if somebody proposes a solution via PR labels Nov 16, 2020
@andjelx
Contributor

andjelx commented Dec 15, 2020

I have the same issue as the topic starter.
I'm trying to download a YAML artifact and getting a strange set of symbols.

Python 3.7.9 // dohq-artifactory==0.7.468 // Artifactory from Jfrog Cloud

@andjelx
Contributor

andjelx commented Dec 15, 2020

What's more interesting: if I rename file.yaml to something like file.rpt, everything works.
It seems that Artifactory serves text/yaml files in some unexpected form.
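
One way to test that guess would be to compare the response headers for the two file names. A minimal sketch, assuming the headers are available as a mapping (as with `response.headers` in requests); `describe_body` is a hypothetical helper and the header values below are illustrative, not captured from a real Artifactory server:

```python
def describe_body(headers):
    """Summarize how a response body is labeled, ignoring header-name case."""
    lowered = {name.lower(): value for name, value in headers.items()}
    encoding = lowered.get("content-encoding", "").lower()
    return {
        "content_type": lowered.get("content-type"),
        "content_encoding": encoding or None,
        "compressed": "gzip" in encoding or "deflate" in encoding,
    }


# Illustrative header sets for a .yaml and a renamed .rpt download.
yaml_info = describe_body({"Content-Type": "text/yaml", "Content-Encoding": "gzip"})
rpt_info = describe_body({"Content-Type": "application/octet-stream"})
print(yaml_info)
print(rpt_info)
```

If only the .yaml response were marked as compressed, that would explain why renaming the file changes the result.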

@andjelx
Contributor

andjelx commented Dec 15, 2020

#204 @allburov @fuzzmz

@beliaev-maksim
Member

@allburov
Please close this issue; fixes are available in #204, and #242 could potentially also be used.

@allburov allburov closed this as completed Jul 7, 2021