Update tests, and fix discovered bugs #68

althonos · 2017-08-31T20:30:57Z

Hi Will, I updated the tests in fs.test with some missing checks, and discovered some new bugs I will fix gradually in this PR.

New tests

FS.copy, FS.move, FS.copydir and FS.movedir should raise ResourceNotFound when the source is nowhere to be found
FS.copy and FS.move should probably raise FileExpected when the source is a directory (not listed in the documentation, but makes sense I guess ?...)
FS.movedir and FS.copydir should raise DirectoryExpected when the source is a file
If a filesystem is advertising itself as case insensitive, check that's true
If a filesystem is advertising itself as supporting unicode paths, check that's true
Check that file-like objects returned by FS.openbin:
- have a seek method that returns the new absolute position when called
- have a write method that returns the number of bytes (in binary mode) / characters (in text mode) written
- have a truncate method that returns the new length of the file

Discovered bugs

~~FTPFile.seek, FTPFile.truncate and FTPFile.write do not return anything~~ fixed
~~FTPFS crashes on unicode paths (/földér)~~ fixed, see below
~~MemoryFile.write and MemoryFile.truncate do not return anything~~ fixed
~~many filesystem's move and copy do not raise FileExpected when given a directory~~ fixed by fixing FS.copy and FS.move

About FTPFS

FTP servers supporting UTF-8 encoded paths display the feature UTF8. But the ftplib.FTP objects require to set the encoding before connecting to the server, or else the server and the client encoding will not be synchronised (client will send utf-8 paths but server will treat them as latin-1).

To fix this, the encoding is checked when accessing the FTPFS.ftp property for the first time. A mock connection is established to check for the UTF8 feature ; then, a proper connection is established, with utf-8 as the encoding if available, and latin-1 otherwise. Then, any new ftp connection will use the previous encoding, so the overhead is limited.

About test_ftpfs

I rewrote the class so that a new server is created once by class instead of before each test, which was a lot longer for nothing. I also made use of pyftpdlib.test.FTPd instead of externally calling the file and then task-killing the thread, which was a lot hackier.

I still think there is an issue with the timeout of FTPFS objects, since the test_connection_error tests take a really long time, longer than what they should take given the timeout.

willmcgugan · 2017-09-01T13:53:40Z

Thanks, Martin.

coveralls · 2017-09-01T22:56:38Z

Coverage decreased (-0.8%) to 99.208% when pulling 085c8a1 on althonos:master into f49c33a on PyFilesystem:master.

coveralls · 2017-09-01T22:56:38Z

Coverage decreased (-0.8%) to 99.208% when pulling 085c8a1 on althonos:master into f49c33a on PyFilesystem:master.

coveralls · 2017-09-03T21:20:01Z

Coverage decreased (-0.8%) to 99.156% when pulling 3ee3fc1 on althonos:master into f49c33a on PyFilesystem:master.

coveralls · 2017-09-03T21:33:29Z

Coverage decreased (-0.9%) to 99.131% when pulling 1a36612 on althonos:master into f49c33a on PyFilesystem:master.

coveralls · 2017-09-04T08:56:27Z

Coverage decreased (-0.9%) to 99.127% when pulling e738c3c on althonos:master into f49c33a on PyFilesystem:master.

coveralls · 2017-09-04T09:48:52Z

Coverage decreased (-0.05%) to 99.949% when pulling 72a0e6e on althonos:master into f49c33a on PyFilesystem:master.

coveralls · 2017-09-04T09:48:53Z

Coverage decreased (-0.05%) to 99.949% when pulling 72a0e6e on althonos:master into f49c33a on PyFilesystem:master.

coveralls · 2017-09-04T09:48:53Z

Coverage decreased (-0.05%) to 99.949% when pulling 72a0e6e on althonos:master into f49c33a on PyFilesystem:master.

coveralls · 2017-09-04T14:25:47Z

Coverage remained the same at 100.0% when pulling 8d3b88f on althonos:master into f49c33a on PyFilesystem:master.

althonos · 2017-09-04T14:34:00Z

@willmcgugan : clear to merge on my side.

willmcgugan · 2017-09-04T14:46:04Z

Great. Give me a day or two to pick through this. Apologies in advance for any pedantry.

althonos · 2017-09-04T14:47:36Z

No problem, I'll try to answer quickly any question you may have.

willmcgugan

Great work. Some questions potential issues.

willmcgugan · 2017-09-09T12:44:26Z

fs/base.py

@@ -318,6 +318,8 @@ def copydir(self, src_path, dst_path, create=False):
        with self._lock:
            if not create and not self.exists(dst_path):
                raise errors.ResourceNotFound(dst_path)
+            if not self.getinfo(src_path).is_dir:


Would isdir not be better? If you don't need the info object, then the FS object can use a potentially faster codepath.

Calling getinfo will raise a ResourceNotFound as expected, whereas if you were to use isdir you would need one call to check for the existence of a resource, and a second call to check that the resource is a directory.

Since the base implementation uses getinfo as a base for isdir, isfile and exist, it will result in only one call to getinfo to do as such but two calls otherwise.

willmcgugan · 2017-09-09T12:44:42Z

fs/base.py

@@ -820,6 +822,8 @@ def move(self,

        if not overwrite and self.exists(dst_path):
            raise errors.DestinationExists(dst_path)
+        if self.getinfo(src_path).is_dir:


Ditto here...

willmcgugan · 2017-09-09T12:52:35Z

fs/ftpfs.py

@@ -99,16 +107,19 @@ def __init__(self, ftpfs, path, mode):
        self._read_conn = None
        self._write_conn = None

+    def __length_hint__(self):


Bit puzzled by this. My understanding is that this should return an estimate of the number of results if you were to iterate over the object. But if you iterate over a file you get lines, so shouldn't this method return the number of lines, rather than the number of bytes.

Yes indeed, that totally went above my head when I made it !

No worries. Don't think there is any shortcut for calculating number of lines for ftp files. Its an idea for MemoryFS though!

willmcgugan · 2017-09-09T12:54:30Z

fs/ftpfs.py

@@ -15,7 +15,7 @@
 from ftplib import error_temp

 from six import PY2
-from six import text_type
+from six import text_type, binary_type


I think we don't strictly need binary_type since bytes is the same on Python2.7 and Python3.

six.binary_type in Python 2 is str, not bytes 😄

Yes, but on Python2.7

>>> str is bytes True

I've been living in a lie for the last three years.

willmcgugan · 2017-09-09T13:00:11Z

fs/ftpfs.py

-        ftp = self.fs._open_ftp()
-        ftp.voidcmd(_encode('TYPE I'))
+        ftp = self.fs._open_ftp(self.fs.encoding)
+        ftp.voidcmd(str('TYPE I'))


So str would be bytes on Python 2.7 and text type on Py3? Which is what the ftp library expects?

So, this is a summary from what I read during my researches:

ftplib expected binary (str) strings in Python 2, but was patched to simply decode text (str) strings in Python 3 and decode them as UTF-8. Because of this, there may be cases where a string is decode twice, or things like that. I'm not completely sure about it (the StackOverflow where people debated that stated the ftplib was a mess), but at least that provided good results with unicode path tests : when I did it another way, I'd start to have decoding/encoding errors in either Python 2 or Python 3.

willmcgugan · 2017-09-09T13:05:56Z

fs/ftpfs.py

    def writelines(self, lines):
        self.write(b''.join(lines))

    def truncate(self, size=None):
        # Inefficient, but I don't know if truncate is possible with ftp
        with self._lock:
-            if size is None:
-                size = self.tell()
+            size = size or self.tell()


I think this would break if you did truncate(0). Would it not truncate at the current file position?

Yes you're right, I'll change it to self.tell() if size is None else size.

willmcgugan · 2017-09-09T13:08:43Z

fs/ftpfs.py

            self._welcome = self._ftp.getwelcome()
        return self._ftp

    @property
+    def encoding(self):


Would it not be possible to detect and store the encoding in the ftp property? Just wondering the lazy detection of encoding would result in excessive requests.

I think so, but in both cases it only makes one additional call compared to the master implementation I guess. I could move it into the ftp property to encapsulate it.

I think it might simplify things if the encoding detection logic is moved in to the ftpproperty. If the encoding defaulted to latin-1 then you wouldn't need to pass it to open_ftp.

I'm also struggling a bit to understand why you need to open a new connection to detect the encoding. I think you should be able to call "FEAT" on first connection, and cache the encoding on the FTPFS. No need to make a new connection object.

I agree with the first point, I'll change that.

For the second point: you have to set the encoding prior to connecting to the server. If you connect to the server with the default (latin-1), then change to utf-8 after seing the UTF8 feature, the server will still think you're sending latin-1 encoded strings. So, to reset the server and use the proper encoding, you have to reconnect to the server with the new encoding.

I think I've figured out something here. What was confusing me is that, I don't believe the encoding is ever sent to the ftp server. So the server is always going to send the same data regardless of what FTP.encoding is set to.

I've looked at the source and what appears to be happening is that its ftplib itself is using the encoding when it calls self.sock.makefile in the connect method. That creates a file-like object with the encoding when you connect.

So that would explain why you can't change the encoding after you have connected. But only due to a quite artificial limitation in ftplib. Otherwise you could just do a FEAT, and then set the encoding in the same connection.

It's really unfortunate.

I can think of a fudge, but its not very nice.

If the encoding is 'utf-8', you could re-encode the paths as 'latin-1' (effectively getting back the same bytes you received), then decode them as 'utf-8'. That should give you properly decode paths.

Another possibility is to do this if you detect 'utf-8' encoding:

ftp.file = ftp.sock.makefile('r', encoding='utf-8')

Both are hacks, so I'm not sure they are worth it. Otherwise we may be stuck with the single additional request to get the encoding...

willmcgugan · 2017-09-09T13:12:58Z

fs/ftpfs.py

    def features(self):
        """Get features dict from FTP server."""
        if self._features is None:
            try:
-                response = self.ftp.sendcmd(_encode("FEAT"))
+                response = _decode(self.ftp.sendcmd("FEAT"), "ascii")


Is there a possibility of a unicode decode error here at all? I see the original did something similar, but it might be worth handling encoding problems for paranoia's sake.

The RFC does not say anything about encoding, but I believe features will only contain ASCII characters, so I guess there's no point over-interpreting the server response.

I'd be tempted to handle the potential edge case where a response doesn't return 7-bit ascii. Maybe it won't ever happen, but there is a potential UnicodeDecodeError lurkng here.

How about _decode has unicode error handling set to replace? That way a flakey ftp server couldn't break this code.

Well, if a server can't even get the features list right, I'm sure the code will break elsewhere at some point 😁 But I'll change the code accordingly.

willmcgugan · 2017-09-09T13:19:42Z

tests/test_ftpfs.py

+    pasw='1234'
+
+    @classmethod
+    def setUpClass(cls):


This does seem simpler than launching numerous ftp servers! Is it faster?

Yes ! Running tests.test_ftpfs takes 6.5s on my machine :)

(except for test_connection_error which sometimes takes longer, sometimes not, so I left the slow attribute for this method only)

Is that a socket error? Maybe socket.settimeout

coveralls · 2017-09-10T16:28:51Z

Coverage remained the same at 100.0% when pulling 5ed51f8 on althonos:master into f49c33a on PyFilesystem:master.

coveralls · 2017-09-10T16:28:51Z

Coverage remained the same at 100.0% when pulling 5ed51f8 on althonos:master into f49c33a on PyFilesystem:master.

coveralls · 2017-09-10T16:28:51Z

Coverage remained the same at 100.0% when pulling 5ed51f8 on althonos:master into f49c33a on PyFilesystem:master.

coveralls · 2017-09-10T16:36:42Z

Coverage remained the same at 100.0% when pulling 357f334 on althonos:master into f49c33a on PyFilesystem:master.

althonos · 2017-09-13T09:02:08Z

Any other change you want me to do ?

willmcgugan · 2017-09-13T12:47:07Z

Looks good. Will have a quick pass later and merge.

Curious re your thoughts on #68 (diff) Looks like we don't strictly need 2 connections, to determine the encoding but the ftplib api doesn't allow for it.

willmcgugan

Thanks!

willmcgugan · 2017-09-13T17:40:33Z

Hmm. A few test fails on my Mac, but TravisCI isn't complaining.

althonos · 2017-09-13T17:47:52Z

everything worked on my ArchLinux 😕
Thanks for the merge ! Now's time for me to work the extensions compatible 😉

willmcgugan · 2017-09-14T08:53:06Z

I'll document the test fails on the issues. Could use your feed back.

Martin Larralde added 6 commits August 31, 2017 20:54

Check that 'seek' method of 'openbin' objects returns the new position

33b0be2

Check that FS.copy and FS.copydir raise the right errors

1cb9bdb

Check that FS.move and FS.movedir raise the right errors

509a780

Add tests for filesystems supporting unicode paths

7a17ad3

Add tests to check case sensitivity

1ce5c3e

Fix FTPFile methods that didn't return a proper value

0e678ff

Martin Larralde added 6 commits September 1, 2017 18:22

Make FS.move raise FileExpected is the src_path is a folder

bcfbabf

Fix FTPFile.truncate not returning the new size

84700ac

Fix MemoryFile.truncate and MemoryFile.write not returning anything

341085e

Fix MemoryFile.truncate moving the cursor when given a size

06a2411

Fix FS.copydir not raising DirectoryExpected and ResourceNotFound

8acd0d6

Check that FTP server supports UTF-8 before connecting to it for real

085c8a1

Martin Larralde added 2 commits September 3, 2017 21:45

Check that syspath works fine with an UTF-8 filename

d6b4b3f

Rewrite TestFTPFS to use pyftpdlib.test.FTPd

3ee3fc1

Fix ftpfs._encode not using utf-8 in Python 2.7

1a36612

Martin Larralde added 2 commits September 4, 2017 10:53

Bump required pyftpdlib to 1.5.2

46003ff

Make FTPFS properly encode/decode the FTP server response

e738c3c

Fix nose mistaking ThreadedTestFTPd for a TestCase

72a0e6e

Make sure ftpfs._encode/_decode use the proper encoding

8d3b88f

willmcgugan requested changes Sep 9, 2017

View reviewed changes

Martin Larralde added 4 commits September 10, 2017 18:06

Remove potential bug in FTPFile.truncate

a9074bc

Remove binary_type import from six

43e341b

Add server encoding detection inside of FTPFS.ftp

2f36daa

Use 'replace' error handling to decode server in FTPFS

5ed51f8

Remove FTPFile.__lenght_hint__

357f334

willmcgugan approved these changes Sep 13, 2017

View reviewed changes

willmcgugan merged commit 00eda5e into PyFilesystem:master Sep 13, 2017

Update tests, and fix discovered bugs #68

Update tests, and fix discovered bugs #68

Conversation

althonos commented Aug 31, 2017 • edited

New tests

Discovered bugs

About FTPFS

About test_ftpfs

willmcgugan commented Sep 1, 2017

coveralls commented Sep 1, 2017

coveralls commented Sep 1, 2017 • edited

coveralls commented Sep 3, 2017 • edited

coveralls commented Sep 3, 2017 • edited

coveralls commented Sep 4, 2017 • edited

coveralls commented Sep 4, 2017

coveralls commented Sep 4, 2017

coveralls commented Sep 4, 2017 • edited

coveralls commented Sep 4, 2017 • edited

althonos commented Sep 4, 2017

willmcgugan commented Sep 4, 2017

althonos commented Sep 4, 2017

willmcgugan left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

althonos Sep 10, 2017 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

althonos Sep 10, 2017 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coveralls commented Sep 10, 2017

coveralls commented Sep 10, 2017

coveralls commented Sep 10, 2017 • edited

coveralls commented Sep 10, 2017 • edited

althonos commented Sep 13, 2017

willmcgugan commented Sep 13, 2017

willmcgugan left a comment

Choose a reason for hiding this comment

willmcgugan commented Sep 13, 2017

althonos commented Sep 13, 2017

willmcgugan commented Sep 14, 2017

althonos commented Aug 31, 2017 •

edited

coveralls commented Sep 1, 2017 •

edited

coveralls commented Sep 3, 2017 •

edited

coveralls commented Sep 3, 2017 •

edited

coveralls commented Sep 4, 2017 •

edited

coveralls commented Sep 4, 2017 •

edited

coveralls commented Sep 4, 2017 •

edited

althonos Sep 10, 2017 •

edited

althonos Sep 10, 2017 •

edited

coveralls commented Sep 10, 2017 •

edited

coveralls commented Sep 10, 2017 •

edited