Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update tests, and fix discovered bugs #68

Merged
merged 24 commits into from Sep 13, 2017
Merged

Update tests, and fix discovered bugs #68

merged 24 commits into from Sep 13, 2017

Conversation

althonos
Copy link
Member

@althonos althonos commented Aug 31, 2017

Hi Will, I updated the tests in fs.test with some missing checks, and discovered some new bugs I will fix gradually in this PR.

New tests

  1. FS.copy, FS.move, FS.copydir and FS.movedir should raise ResourceNotFound when the source is nowhere to be found
  2. FS.copy and FS.move should probably raise FileExpected when the source is a directory (not listed in the documentation, but makes sense I guess ?...)
  3. FS.movedir and FS.copydir should raise DirectoryExpected when the source is a file
  4. If a filesystem is advertising itself as case insensitive, check that's true
  5. If a filesystem is advertising itself as supporting unicode paths, check that's true
  6. Check that file-like objects returned by FS.openbin:
    • have a seek method that returns the new absolute position when called
    • have a write method that returns the number of bytes (in binary mode) / characters (in text mode) written
    • have a truncate method that returns the new length of the file

Discovered bugs

  • FTPFile.seek, FTPFile.truncate and FTPFile.write do not return anything fixed
  • FTPFS crashes on unicode paths (/földér) fixed, see below
  • MemoryFile.write and MemoryFile.truncate do not return anything fixed
  • many filesystem's move and copy do not raise FileExpected when given a directory fixed by fixing FS.copy and FS.move

About FTPFS

FTP servers supporting UTF-8 encoded paths display the feature UTF8. But the ftplib.FTP objects require to set the encoding before connecting to the server, or else the server and the client encoding will not be synchronised (client will send utf-8 paths but server will treat them as latin-1).

To fix this, the encoding is checked when accessing the FTPFS.ftp property for the first time. A mock connection is established to check for the UTF8 feature ; then, a proper connection is established, with utf-8 as the encoding if available, and latin-1 otherwise. Then, any new ftp connection will use the previous encoding, so the overhead is limited.

About test_ftpfs

I rewrote the class so that a new server is created once by class instead of before each test, which was a lot longer for nothing. I also made use of pyftpdlib.test.FTPd instead of externally calling the file and then task-killing the thread, which was a lot hackier.

I still think there is an issue with the timeout of FTPFS objects, since the test_connection_error tests take a really long time, longer than what they should take given the timeout.

@willmcgugan
Copy link
Member

Thanks, Martin.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.8%) to 99.208% when pulling 085c8a1 on althonos:master into f49c33a on PyFilesystem:master.

1 similar comment
@coveralls
Copy link

coveralls commented Sep 1, 2017

Coverage Status

Coverage decreased (-0.8%) to 99.208% when pulling 085c8a1 on althonos:master into f49c33a on PyFilesystem:master.

@coveralls
Copy link

coveralls commented Sep 3, 2017

Coverage Status

Coverage decreased (-0.8%) to 99.156% when pulling 3ee3fc1 on althonos:master into f49c33a on PyFilesystem:master.

@coveralls
Copy link

coveralls commented Sep 3, 2017

Coverage Status

Coverage decreased (-0.9%) to 99.131% when pulling 1a36612 on althonos:master into f49c33a on PyFilesystem:master.

@coveralls
Copy link

coveralls commented Sep 4, 2017

Coverage Status

Coverage decreased (-0.9%) to 99.127% when pulling e738c3c on althonos:master into f49c33a on PyFilesystem:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.05%) to 99.949% when pulling 72a0e6e on althonos:master into f49c33a on PyFilesystem:master.

2 similar comments
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.05%) to 99.949% when pulling 72a0e6e on althonos:master into f49c33a on PyFilesystem:master.

@coveralls
Copy link

coveralls commented Sep 4, 2017

Coverage Status

Coverage decreased (-0.05%) to 99.949% when pulling 72a0e6e on althonos:master into f49c33a on PyFilesystem:master.

@coveralls
Copy link

coveralls commented Sep 4, 2017

Coverage Status

Coverage remained the same at 100.0% when pulling 8d3b88f on althonos:master into f49c33a on PyFilesystem:master.

@althonos
Copy link
Member Author

althonos commented Sep 4, 2017

@willmcgugan : clear to merge on my side.

@willmcgugan
Copy link
Member

Great. Give me a day or two to pick through this. Apologies in advance for any pedantry.

@althonos
Copy link
Member Author

althonos commented Sep 4, 2017

No problem, I'll try to answer quickly any question you may have.

Copy link
Member

@willmcgugan willmcgugan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work. Some questions potential issues.

@@ -318,6 +318,8 @@ def copydir(self, src_path, dst_path, create=False):
with self._lock:
if not create and not self.exists(dst_path):
raise errors.ResourceNotFound(dst_path)
if not self.getinfo(src_path).is_dir:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would isdir not be better? If you don't need the info object, then the FS object can use a potentially faster codepath.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling getinfo will raise a ResourceNotFound as expected, whereas if you were to use isdir you would need one call to check for the existence of a resource, and a second call to check that the resource is a directory.

Since the base implementation uses getinfo as a base for isdir, isfile and exist, it will result in only one call to getinfo to do as such but two calls otherwise.

@@ -820,6 +822,8 @@ def move(self,

if not overwrite and self.exists(dst_path):
raise errors.DestinationExists(dst_path)
if self.getinfo(src_path).is_dir:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here...

fs/ftpfs.py Outdated
@@ -99,16 +107,19 @@ def __init__(self, ftpfs, path, mode):
self._read_conn = None
self._write_conn = None

def __length_hint__(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit puzzled by this. My understanding is that this should return an estimate of the number of results if you were to iterate over the object. But if you iterate over a file you get lines, so shouldn't this method return the number of lines, rather than the number of bytes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes indeed, that totally went above my head when I made it !

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries. Don't think there is any shortcut for calculating number of lines for ftp files. Its an idea for MemoryFS though!

fs/ftpfs.py Outdated
@@ -15,7 +15,7 @@
from ftplib import error_temp

from six import PY2
from six import text_type
from six import text_type, binary_type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't strictly need binary_type since bytes is the same on Python2.7 and Python3.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

six.binary_type in Python 2 is str, not bytes 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but on Python2.7

>>> str is bytes
True

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been living in a lie for the last three years.

fs/ftpfs.py Outdated
ftp = self.fs._open_ftp()
ftp.voidcmd(_encode('TYPE I'))
ftp = self.fs._open_ftp(self.fs.encoding)
ftp.voidcmd(str('TYPE I'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So str would be bytes on Python 2.7 and text type on Py3? Which is what the ftp library expects?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this is a summary from what I read during my researches:

ftplib expected binary (str) strings in Python 2, but was patched to simply decode text (str) strings in Python 3 and decode them as UTF-8. Because of this, there may be cases where a string is decode twice, or things like that. I'm not completely sure about it (the StackOverflow where people debated that stated the ftplib was a mess), but at least that provided good results with unicode path tests : when I did it another way, I'd start to have decoding/encoding errors in either Python 2 or Python 3.

fs/ftpfs.py Outdated
def writelines(self, lines):
self.write(b''.join(lines))

def truncate(self, size=None):
# Inefficient, but I don't know if truncate is possible with ftp
with self._lock:
if size is None:
size = self.tell()
size = size or self.tell()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would break if you did truncate(0). Would it not truncate at the current file position?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you're right, I'll change it to self.tell() if size is None else size.

fs/ftpfs.py Outdated
self._welcome = self._ftp.getwelcome()
return self._ftp

@property
def encoding(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it not be possible to detect and store the encoding in the ftp property? Just wondering the lazy detection of encoding would result in excessive requests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, but in both cases it only makes one additional call compared to the master implementation I guess. I could move it into the ftp property to encapsulate it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might simplify things if the encoding detection logic is moved in to the ftpproperty. If the encoding defaulted to latin-1 then you wouldn't need to pass it to open_ftp.

I'm also struggling a bit to understand why you need to open a new connection to detect the encoding. I think you should be able to call "FEAT" on first connection, and cache the encoding on the FTPFS. No need to make a new connection object.

Copy link
Member Author

@althonos althonos Sep 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the first point, I'll change that.

For the second point: you have to set the encoding prior to connecting to the server. If you connect to the server with the default (latin-1), then change to utf-8 after seing the UTF8 feature, the server will still think you're sending latin-1 encoded strings. So, to reset the server and use the proper encoding, you have to reconnect to the server with the new encoding.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've figured out something here. What was confusing me is that, I don't believe the encoding is ever sent to the ftp server. So the server is always going to send the same data regardless of what FTP.encoding is set to.

I've looked at the source and what appears to be happening is that its ftplib itself is using the encoding when it calls self.sock.makefile in the connect method. That creates a file-like object with the encoding when you connect.

So that would explain why you can't change the encoding after you have connected. But only due to a quite artificial limitation in ftplib. Otherwise you could just do a FEAT, and then set the encoding in the same connection.

It's really unfortunate.

I can think of a fudge, but its not very nice.

If the encoding is 'utf-8', you could re-encode the paths as 'latin-1' (effectively getting back the same bytes you received), then decode them as 'utf-8'. That should give you properly decode paths.

Another possibility is to do this if you detect 'utf-8' encoding:

ftp.file = ftp.sock.makefile('r', encoding='utf-8')

Both are hacks, so I'm not sure they are worth it. Otherwise we may be stuck with the single additional request to get the encoding...

def features(self):
"""Get features dict from FTP server."""
if self._features is None:
try:
response = self.ftp.sendcmd(_encode("FEAT"))
response = _decode(self.ftp.sendcmd("FEAT"), "ascii")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a possibility of a unicode decode error here at all? I see the original did something similar, but it might be worth handling encoding problems for paranoia's sake.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RFC does not say anything about encoding, but I believe features will only contain ASCII characters, so I guess there's no point over-interpreting the server response.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be tempted to handle the potential edge case where a response doesn't return 7-bit ascii. Maybe it won't ever happen, but there is a potential UnicodeDecodeError lurkng here.

How about _decode has unicode error handling set to replace? That way a flakey ftp server couldn't break this code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, if a server can't even get the features list right, I'm sure the code will break elsewhere at some point 😁 But I'll change the code accordingly.

pasw='1234'

@classmethod
def setUpClass(cls):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does seem simpler than launching numerous ftp servers! Is it faster?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes ! Running tests.test_ftpfs takes 6.5s on my machine :)

Copy link
Member Author

@althonos althonos Sep 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(except for test_connection_error which sometimes takes longer, sometimes not, so I left the slow attribute for this method only)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that a socket error? Maybe socket.settimeout

@coveralls
Copy link

Coverage Status

Coverage remained the same at 100.0% when pulling 5ed51f8 on althonos:master into f49c33a on PyFilesystem:master.

2 similar comments
@coveralls
Copy link

Coverage Status

Coverage remained the same at 100.0% when pulling 5ed51f8 on althonos:master into f49c33a on PyFilesystem:master.

@coveralls
Copy link

coveralls commented Sep 10, 2017

Coverage Status

Coverage remained the same at 100.0% when pulling 5ed51f8 on althonos:master into f49c33a on PyFilesystem:master.

@coveralls
Copy link

coveralls commented Sep 10, 2017

Coverage Status

Coverage remained the same at 100.0% when pulling 357f334 on althonos:master into f49c33a on PyFilesystem:master.

@althonos
Copy link
Member Author

Any other change you want me to do ?

@willmcgugan
Copy link
Member

Looks good. Will have a quick pass later and merge.

Curious re your thoughts on #68 (diff) Looks like we don't strictly need 2 connections, to determine the encoding but the ftplib api doesn't allow for it.

Copy link
Member

@willmcgugan willmcgugan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@willmcgugan willmcgugan merged commit 00eda5e into PyFilesystem:master Sep 13, 2017
@willmcgugan
Copy link
Member

Hmm. A few test fails on my Mac, but TravisCI isn't complaining.

@althonos
Copy link
Member Author

everything worked on my ArchLinux 😕
Thanks for the merge ! Now's time for me to work the extensions compatible 😉

@willmcgugan
Copy link
Member

I'll document the test fails on the issues. Could use your feed back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants