Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encode "/" in filenames #7

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

larsks
Copy link

@larsks larsks commented May 20, 2023

Replace "/" in recipe names with "%2F". Without this change a recipe whose
name contains "/" cannot be saved to the filesystem.

Closes #6

Replace "/" in recipe names with "%2F". Without this change a recipe whose
name contains "/" cannot be saved to the filesystem.

Closes coddingtonbear#6
@@ -74,6 +74,10 @@ def calculate_hash(self) -> str:
def update_hash(self):
self.hash = self.calculate_hash()

@property
def safe_name(self):
return self.name.replace("/", "%2F")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of having a safe_name property! But, I wonder if maybe we should instead take a different path. Could we consider just stripping these filename-invalid characters?

There are a few reasons I think that might be a preferable approach:

  1. The filename isn't all that important -- it's just there to make it somewhat easier to tell which recipe is going to be in whichever file you open.
  2. I don't know that %2F is all that meaningful to anybody. I've been programming across tons of projects for the last 20 years or so, and although I'd recognize %2F as being the normal hex-based URL-encoding scheme, my first thought would be, maybe "well, I guess, ampersand?", and I'd have to look up what %2F means specifically to discover that it was actually /.
  3. There are other characters that can exist in a recipe name but are also filename-unsafe -- we should probably handle those, too, no? Unfortunately, the list of fillename-safe characters per-OS varies; so we might need to pick a pretty conservative set.

Alternatively, if you did convince me that url-encoding is the right path (and, well, you could probably do that?), I really think we should use the built-in facilities for performing that conversion: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote -- of course, you'll probably want to mark certain unambiguously filename-safe characters as "safe" using the safe keyword argument (maybe just ?).

Copy link
Author

@larsks larsks Jun 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think standard URL-encoding characters like %2F and %3A are all that uncommon, especially to developers, but I agree that we should probably use the existing urllib tooling for this.

How about:

import urllib.parse as urlparse

...

@property
def safe_name(self):
    return urlparse.quote(self.name, safe='')

I think that will result in names that are safe for filenames across Linux, MacOS, and Windows, since it will encode all of / and : and \ (and non-printable characters like \n and \r and \t, etc).

Copy link
Author

@larsks larsks Jun 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB: Using the quote method will render something like "비빔밥" to "%EB%B9%84%EB%B9%94%EB%B0%A5"...which isn't really any better than just hashing the recipe names and using the hex digest, but at least it's not going to result in an invalid filename.

We should probably also limit the length of the string returned by safe_name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"download-recipes" fails with "FileNotFoundError"
2 participants