Update tests, and fix discovered bugs #68
Thanks, Martin.
@willmcgugan: clear to merge on my side.
Great. Give me a day or two to pick through this. Apologies in advance for any pedantry.
No problem, I'll try to quickly answer any questions you may have.
Great work. Some questions and potential issues.
@@ -318,6 +318,8 @@ def copydir(self, src_path, dst_path, create=False):
    with self._lock:
        if not create and not self.exists(dst_path):
            raise errors.ResourceNotFound(dst_path)
        if not self.getinfo(src_path).is_dir:
Would `isdir` not be better? If you don't need the info object, then the FS object can use a potentially faster codepath.
Calling `getinfo` will raise a `ResourceNotFound` as expected, whereas if you were to use `isdir`, you would need one call to check for the existence of the resource and a second call to check that the resource is a directory.
Since the base implementation uses `getinfo` as the basis for `isdir`, `isfile` and `exists`, this way results in only one call to `getinfo`, but two calls otherwise.
@@ -820,6 +822,8 @@ def move(self,
        if not overwrite and self.exists(dst_path):
            raise errors.DestinationExists(dst_path)
        if self.getinfo(src_path).is_dir:
Ditto here...
fs/ftpfs.py
@@ -99,16 +107,19 @@ def __init__(self, ftpfs, path, mode):
        self._read_conn = None
        self._write_conn = None

    def __length_hint__(self):
Bit puzzled by this. My understanding is that this should return an estimate of the number of results you would get if you iterated over the object. But if you iterate over a file you get lines, so shouldn't this method return the number of lines rather than the number of bytes?
Yes indeed, that totally went over my head when I wrote it!
No worries. I don't think there is any shortcut for calculating the number of lines in FTP files. It's an idea for MemoryFS though!
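For reference, `__length_hint__` (PEP 424) is what `operator.length_hint` falls back to when an object has no `__len__`. A minimal sketch of a line-counting hint for a memory-backed file, assuming a `BytesIO`-based class; `LinesFile` is hypothetical and is not MemoryFS's actual file class:

```python
import io
import operator


class LinesFile(io.BytesIO):
    """Hypothetical in-memory file whose length hint counts remaining lines."""

    def __length_hint__(self):
        # Iterating a binary file yields lines split on b"\n", so count the
        # lines left between the current position and the end of the buffer.
        remaining = self.getvalue()[self.tell():]
        hint = remaining.count(b"\n")
        if remaining and not remaining.endswith(b"\n"):
            hint += 1  # a trailing line without a newline is still yielded
        return hint


f = LinesFile(b"first\nsecond\nthird")
print(operator.length_hint(f))  # 3 lines, while the buffer holds 18 bytes
```

Because the whole content is already in memory, counting newlines is cheap here; for an FTP file the same hint would require downloading the data first.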
fs/ftpfs.py
@@ -15,7 +15,7 @@
from ftplib import error_temp

from six import PY2
-from six import text_type
+from six import text_type, binary_type
I think we don't strictly need `binary_type`, since `bytes` is the same on Python 2.7 and Python 3.
`six.binary_type` in Python 2 is `str`, not `bytes` 😄
Yes, but on Python 2.7:
>>> str is bytes
True
I've been living in a lie for the last three years.
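The point can be checked without six installed by mirroring how six defines `binary_type` (`str` on Python 2, `bytes` on Python 3); a small sketch:

```python
import sys

PY2 = sys.version_info[0] == 2

# six defines binary_type this way: `str` on Python 2, `bytes` on Python 3.
binary_type = str if PY2 else bytes

# Since Python 2.6, the builtin `bytes` is just an alias of `str` on Python 2,
# so on either major version binary_type is the very same object as `bytes`.
print(binary_type is bytes)  # True on both 2.7 and 3.x
```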
fs/ftpfs.py
-ftp = self.fs._open_ftp()
-ftp.voidcmd(_encode('TYPE I'))
+ftp = self.fs._open_ftp(self.fs.encoding)
+ftp.voidcmd(str('TYPE I'))
So `str` would be bytes on Python 2.7 and the text type on Python 3? Which is what the ftp library expects?
So, this is a summary of what I read during my research:
`ftplib` expected binary (`str`) strings in Python 2, but was patched to take text (`str`) strings in Python 3 and encode/decode them itself, using its `encoding` attribute. Because of this, there may be cases where a string is decoded twice, or things like that. I'm not completely sure about it (on the StackOverflow thread where people debated this, they stated that `ftplib` was a mess), but at least this approach gave good results with the unicode path tests: when I did it another way, I'd start getting decoding/encoding errors in either Python 2 or Python 3.
fs/ftpfs.py
    def writelines(self, lines):
        self.write(b''.join(lines))

    def truncate(self, size=None):
        # Inefficient, but I don't know if truncate is possible with ftp
        with self._lock:
-           if size is None:
-               size = self.tell()
+           size = size or self.tell()
I think this would break if you did `truncate(0)`. Would it not truncate at the current file position instead?
Yes, you're right. I'll change it to `self.tell() if size is None else size`.
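A minimal illustration of the `truncate(0)` pitfall. The two helper functions are hypothetical, and `io.BytesIO` is used only as a reference for the standard `truncate` semantics the fix aims for:

```python
import io


def resolve_size_buggy(size, position):
    # `0 or position` evaluates to `position`, so truncate(0) would silently
    # truncate at the current position instead of emptying the file.
    return size or position


def resolve_size_fixed(size, position):
    # Only fall back to the current position when no size was given at all.
    return position if size is None else size


# Standard file objects behave like the fixed version:
buf = io.BytesIO(b"hello")
buf.seek(2)
new_size = buf.truncate(0)  # 0, not 2
```

The general rule is that `x or default` is only safe when every falsy value of `x` should trigger the default; for sizes and offsets, `0` is a legitimate value, so the explicit `is None` test is needed.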
fs/ftpfs.py
        self._welcome = self._ftp.getwelcome()
        return self._ftp

    @property
    def encoding(self):
Would it not be possible to detect and store the encoding in the `ftp` property? Just wondering whether the lazy detection of the encoding would result in excessive requests.
I think so, but in both cases it only makes one additional call compared to the `master` implementation, I guess. I could move it into the `ftp` property to encapsulate it.
I think it might simplify things if the encoding detection logic were moved into the `ftp` property. If the encoding defaulted to `latin-1`, then you wouldn't need to pass it to `_open_ftp`.
I'm also struggling a bit to understand why you need to open a new connection to detect the encoding. I think you should be able to call "FEAT" on the first connection and cache the encoding on the FTPFS. No need to make a new connection object.
I agree with the first point; I'll change that.
For the second point: you have to set the encoding prior to connecting to the server. If you connect to the server with the default (latin-1), then change to utf-8 after seeing the `UTF8` feature, the server will still think you're sending latin-1 encoded strings. So, to reset the server and use the proper encoding, you have to reconnect to the server with the new encoding.
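A sketch of the probe-then-reconnect approach described here. `choose_encoding`, `open_ftp` and `connect_with_detected_encoding` are hypothetical helpers, not the FTPFS code; the sketch assumes Python 3's `ftplib`, where `connect()` wraps the control socket in a text file using `FTP.encoding`, so the attribute has to be set before connecting:

```python
import ftplib


def choose_encoding(feature_names):
    """Return the client encoding for a collection of FEAT feature names."""
    # RFC 2640 servers advertise UTF8; otherwise fall back to latin-1
    # (ftplib's default encoding before Python 3.9).
    return "utf-8" if "UTF8" in feature_names else "latin-1"


def open_ftp(host, port, encoding, timeout=10):
    """Open an ftplib control connection that decodes replies with `encoding`."""
    ftp = ftplib.FTP()
    # FTP.connect() wraps the socket with sock.makefile('r', encoding=...),
    # so replies are decoded with whatever encoding is set at connect time.
    ftp.encoding = encoding
    ftp.connect(host, port, timeout=timeout)
    return ftp


def connect_with_detected_encoding(host, port, timeout=10):
    """Probe FEAT on a throwaway connection, then reconnect with the right encoding."""
    probe = open_ftp(host, port, "latin-1", timeout)
    try:
        try:
            reply = probe.sendcmd("FEAT")
        except ftplib.error_perm:  # server does not implement FEAT
            reply = ""
    finally:
        probe.close()
    # Feature lines in a FEAT reply start with a single space (RFC 2389).
    names = {line.split()[0].upper() for line in reply.splitlines()
             if line.startswith(" ") and line.strip()}
    return open_ftp(host, port, choose_encoding(names), timeout)
```

The cost is one extra connection per `FTPFS` instance; later connections can reuse the detected encoding.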
I think I've figured something out here. What was confusing me is that I don't believe the encoding is ever sent to the ftp server, so the server is always going to send the same data regardless of what `FTP.encoding` is set to.
I've looked at the source, and what appears to be happening is that it's `ftplib` itself that uses the encoding when it calls `self.sock.makefile` in the `connect` method. That creates a file-like object with the encoding when you connect.
So that would explain why you can't change the encoding after you have connected, but only due to a quite artificial limitation in `ftplib`. Otherwise you could just do a FEAT and then set the encoding in the same connection.
It's really unfortunate.
I can think of a fudge, but it's not very nice.
If the encoding is 'utf-8', you could re-encode the paths as 'latin-1' (effectively getting back the same bytes you received), then decode them as 'utf-8'. That should give you properly decoded paths.
Another possibility is to do this if you detect 'utf-8' encoding:
ftp.file = ftp.sock.makefile('r', encoding='utf-8')
Both are hacks, so I'm not sure they are worth it. Otherwise we may be stuck with the single additional request to get the encoding...
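A sketch of the first fudge, assuming the connection was opened with ftplib's latin-1 encoding while the server actually speaks UTF-8. `from_wire` and `to_wire` are hypothetical helper names. The round trip is lossless because latin-1 maps each byte 0-255 to exactly one code point:

```python
def from_wire(path):
    """Fix a path that ftplib decoded as latin-1 but the server sent as UTF-8."""
    # Encoding back to latin-1 recovers the exact bytes the server sent.
    return path.encode("latin-1").decode("utf-8")


def to_wire(path):
    """Prepare a text path so ftplib's latin-1 encoding emits UTF-8 bytes."""
    return path.encode("utf-8").decode("latin-1")


server_bytes = "/földér".encode("utf-8")      # bytes a UTF-8 server sends
as_received = server_bytes.decode("latin-1")  # what a latin-1 ftplib connection returns
fixed = from_wire(as_received)
```

One reason this is a hack: if a server that did not advertise `UTF8` sends genuine latin-1 names, `from_wire` can raise `UnicodeDecodeError`, so it should only be applied once UTF-8 has been detected.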
    def features(self):
        """Get features dict from FTP server."""
        if self._features is None:
            try:
-               response = self.ftp.sendcmd(_encode("FEAT"))
+               response = _decode(self.ftp.sendcmd("FEAT"), "ascii")
Is there a possibility of a unicode decode error here at all? I see the original did something similar, but it might be worth handling encoding problems for paranoia's sake.
The RFC does not say anything about encoding, but I believe features will only contain ASCII characters, so I guess there's no point over-interpreting the server response.
I'd be tempted to handle the potential edge case where a response isn't 7-bit ASCII. Maybe it won't ever happen, but there is a potential `UnicodeDecodeError` lurking here.
How about giving `_decode` unicode error handling set to `replace`? That way a flaky ftp server couldn't break this code.
Well, if a server can't even get the features list right, I'm sure the code will break elsewhere at some point 😁 But I'll change the code accordingly.
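A sketch of the `replace` approach. `decode_response` and `parse_features` are hypothetical helpers, not the PR's `_decode`. The parsing follows the RFC 2389 reply layout, where each feature line between the `211-` and `211 End` lines starts with a single space. With `errors="replace"`, a stray non-ASCII byte becomes U+FFFD instead of raising `UnicodeDecodeError`:

```python
def decode_response(data, encoding="ascii"):
    """Decode a server reply without ever raising on malformed bytes."""
    if isinstance(data, bytes):
        return data.decode(encoding, errors="replace")
    return data  # already text (Python 3's ftplib returns str)


def parse_features(response):
    """Parse a FEAT reply (RFC 2389) into {FEATURE_NAME: parameters}."""
    features = {}
    for line in decode_response(response).splitlines():
        # Feature lines are indented by one space; the 211 status lines are not.
        if line.startswith(" "):
            name, _, params = line.strip().partition(" ")
            features[name.upper()] = params  # feature names are case-insensitive
    return features
```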
    pasw='1234'

    @classmethod
    def setUpClass(cls):
This does seem simpler than launching numerous ftp servers! Is it faster?
Yes! Running `tests.test_ftpfs` takes 6.5s on my machine :)
(except for `test_connection_error`, which sometimes takes longer and sometimes not, so I left the `slow` attribute on that method only)
Is that a socket error? Maybe `socket.settimeout`?
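One way to check whether a timeout is honoured is to time a connection attempt directly. In Python 3, `ftplib.FTP.connect` accepts a `timeout` argument and passes it on to `socket.create_connection`. The sketch below (helper names are hypothetical) connects to a local port with nothing listening, which should fail quickly with a refused connection; an unreachable host would instead wait up to `timeout` seconds:

```python
import ftplib
import socket
import time


def closed_local_port():
    """Return a localhost port that (very likely) has no listener."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # let the OS pick a free port, then release it
        return s.getsockname()[1]


def timed_connect(host, port, timeout):
    """Try an FTP connection; return (exception_or_None, elapsed_seconds)."""
    ftp = ftplib.FTP()
    start = time.monotonic()
    try:
        ftp.connect(host, port, timeout=timeout)
    except OSError as err:  # socket.timeout and ConnectionRefusedError are OSErrors
        return err, time.monotonic() - start
    ftp.close()
    return None, time.monotonic() - start
```

If `test_connection_error` takes much longer than its configured timeout, comparing it against a measurement like this can show whether the timeout actually reaches the socket.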
Any other changes you want me to make?
Looks good. Will have a quick pass later and merge. Curious re your thoughts on #68 (diff). Looks like we don't strictly need 2 connections to determine the encoding, but the
Thanks!
Hmm. A few tests fail on my Mac, but TravisCI isn't complaining.
Everything worked on my ArchLinux 😕
I'll document the test failures in the issues. Could use your feedback.
Hi Will, I updated the tests in `fs.test` with some missing checks, and discovered some new bugs I will fix gradually in this PR.

New tests
- `FS.copy`, `FS.move`, `FS.copydir` and `FS.movedir` should raise `ResourceNotFound` when the source is nowhere to be found
- `FS.copy` and `FS.move` should probably raise `FileExpected` when the source is a directory (not listed in the documentation, but it makes sense, I guess?)
- `FS.movedir` and `FS.copydir` should raise `DirectoryExpected` when the source is a file
- `FS.openbin`:
  - `seek` method that returns the new absolute position when called
  - `write` method that returns the number of bytes (in binary mode) / characters (in text mode) written
  - `truncate` method that returns the new length of the file

Discovered bugs
- `FTPFile.seek`, `FTPFile.truncate` and `FTPFile.write` do not return anything (fixed)
- `FTPFS` crashes on unicode paths (`/földér`) (fixed, see below)
- `MemoryFile.write` and `MemoryFile.truncate` do not return anything (fixed)
- many filesystems' `move` and `copy` do not raise `FileExpected` when given a directory (fixed by fixing `FS.copy` and `FS.move`)
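The return values the new `FS.openbin` tests expect match the standard Python `io` conventions. A small sketch using `io.BytesIO` and `io.StringIO` as reference objects; the helper name is hypothetical and this is not the `fs.test` code:

```python
import io


def file_api_results(f, data):
    """Return the (write, seek-to-end, truncate(1)) results for a fresh file."""
    written = f.write(data)         # bytes (binary) or characters (text) written
    end = f.seek(0, io.SEEK_END)    # seek() returns the new absolute position
    new_size = f.truncate(1)        # truncate() returns the new size
    return written, end, new_size


binary = file_api_results(io.BytesIO(), b"hello")
text = file_api_results(io.StringIO(), "földér")  # counted in characters
```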
About FTPFS
FTP servers supporting UTF-8 encoded paths advertise the `UTF8` feature. But `ftplib.FTP` objects require the encoding to be set before connecting to the server, or else the server and client encodings will not be synchronised (the client will send UTF-8 paths but the server will treat them as latin-1).
To fix this, the encoding is checked when the `FTPFS.ftp` property is accessed for the first time. A temporary connection is established to check for the `UTF8` feature; then a proper connection is established, with UTF-8 as the encoding if available, and latin-1 otherwise. Any new FTP connection then reuses that encoding, so the overhead is limited.

About test_ftpfs
I rewrote the class so that a server is created once per class instead of before each test, which took a lot longer for nothing. I also made use of `pyftpdlib.test.FTPd` instead of externally calling the file and then task-killing the thread, which was a lot hackier.
I still think there is an issue with the timeout of `FTPFS` objects, since the `test_connection_error` tests take a really long time, longer than they should given the timeout.