Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow quoted source: metatag, allow source: metatag when editing posts #2679

Merged
merged 1 commit into from Sep 27, 2016

Conversation

Type-kun
Copy link
Collaborator

This should take care of #2664.

@r888888888 requesting code review. I don't think it would break anything, it simply extracts source:"..." verbatim from tag string before splitting the rest on spaces. I have two concerns about other code, though:

  • What's the purpose of .gsub(/[%,]/, "") in Tag.scan_tags? I put source extraction before it, since I figured that source can contain commas and percents.
  • source:none will set post source to nil. source:"" and source: will set it to empty string. Should something be done about that?

@r888888888
Copy link
Collaborator

I'm honestly not a fan of this. Quotes aren't used in this manner anywhere else so why would this specific use case with tagging the source enable it? To give an example of elsewhere where you might expect to be able to do this: tagging the pool with the title, favorites, favgroups. This also applies to search. Can you search for source using the quotes? You also get into issues like, should escaped quotes be parsed?

To answer your specific questions, % is stripped out because it has a special meaning in SQL LIKE queries. Commas are stripped out because they are used as a delimiter in some places.

Null/empty string sources are already accounted for so probably nothing needs to be done about that.

cc @evazion

@Type-kun
Copy link
Collaborator Author

Yeah, I can see that inconsistence is annoying, but sources are a bit of special case. Everything besides source internally replaces spaces with underscores, so it never was a problem in other cases. This actually was requested multiple times, see #2178, #1548. That said, why not extend the solution and allow quotes in the cases you mentioned, too?

This also applies to search. Can you search for source using the quotes?

Yes, the changes made to scan_query and parse_query allow searching using quoted string, it actually turned out to be as simple as that. I'm not sure whether it's because of Ruby's DRY or Danbooru code being properly structured, but it left me quite impressed :3

should escaped quotes be parsed?

I left that for the future, as spaces are used way more often than quotes, both for search and for specifying the source, and unlike space, " can be replaced with ' when tagging, if necessary. However, thinking about it further, allowing quotes as \" is ridiculously simple: source:".*?[^\\]". All that's left is to replace \" to " before actual usage.

@evazion
Copy link
Member

evazion commented Sep 25, 2016

Quotes aren't used in this manner anywhere else so why would this specific use case with tagging the source enable it? To give an example of elsewhere where you might expect to be able to do this: tagging the pool with the title, favorites, favgroups. This also applies to search. Can you search for source using the quotes? You also get into issues like, should escaped quotes be parsed?

Why not allow it? Not especially useful for current metatags, but imagine being able to search stuff like comment:"translation bump" translation_request or note:"just as planned" parody or commentary:"blah blah blah" commentary_request. Those can be done via forms in other pages, but that's only necessary because the queries can't be expressed as tag searches.

I do think it's pretty hacky though trying to do this with regexes. IMO in a perfect world tag searches would be parsed by a real parser instead of a bunch of regexes.

source:none will set post source to nil. source:"" and source: will set it to empty string. Should something be done about that?

IMO text fields should be enforced as NOT NULL in the db whenever possible. Confusion over fields being nil or the empty string was the root cause of #2665 and #2648.

However, thinking about it further, allowing quotes as " is ridiculously simple: source:".*?[^]". All that's left is to replace " to " before actual usage.

That lets you escape " with \, but what if you want to escape \" itself? (Probably not needed in practice, but still).

@Type-kun
Copy link
Collaborator Author

That lets you escape " with \ , but what if you want to escape \" itself? (Probably not needed in practice, but still).

\\"? It'll become \" after replacement.

I do think it's pretty hacky though trying to do this with regexes. IMO in a perfect world tag searches would be parsed by a real parser instead of a bunch of regexes.

Well, Ruby has a decent native regexp support, but I don't remember it having built-in lexers. Either way, it's easily done with two or three regexps, why bother with parser? Allowing other metatags to have quoted values will not require new regexp, only amending an existing one: /(source|pool|favgroup):".*?"/, like that. Unquoting and handling escaping quotes will be moved to function if this gets green-lit, either way.

@DennouNeko
Copy link

DennouNeko commented Sep 26, 2016

Technically, escaped " should be replaced with \", \ should be replaced with \\, so escaping \" should produce \\\" as a result. When unescaping, once \ is reached, unless it's a special escape sequence, like \r, \n, \t, the next character should be copied as-is. At least that's how pretty any programming language handles these things.

@Type-kun
Copy link
Collaborator Author

Programming languages do that for programmers' sake, to write escape sequences instead of charcodes. It might have been necessary if we had to escape anything besides ", but we don't even need to escape \ itself, it's not considered a special character in this case. Similar case would be single quote ' in Oracle or Postgresql strings - it's the only character that is escaped, because a string itself is denoted by single quotes, and all \ characters are treated literally.

@DennouNeko
Copy link

DennouNeko commented Sep 26, 2016

Well, if you want to escape " with \, you'll want to escape \ itself, otherwise you won't be able to tell if it's beginning of escape sequence, or just a character. Just saying ;)
Edit:
My bad, I've re-read it after getting home. I see your point.

@Type-kun
Copy link
Collaborator Author

Nope, I'll be able to tell - if it's followed by ", it is an escape sequence, otherwise it is not :3

We are not processing the string as character sequence anyway. Unescaping is done by calling replace on \" to ". If you want to enter literal \", you go with \\", like I mentioned earlier: \" is replaced to ", and leftmost \ is simply left alone. Escaping \ in our case is unnecessary and produces more clutter. Either way, we're somewhat diverging from the topic here, let's move to dmail if you're on Danbooru and want to continue :3

@Type-kun
Copy link
Collaborator Author

Re: escaping quotes and backslashes, @DennouNeko raised a rare yet valid case where quoted string ends with \. so src_string\ will get entered as source:"src_string\" and parsing will fail. Sadly, escaping backslashes as well as quotes will make this overly complicated - I can't conjure an easy regexp to account for that from the top of my head. Or rather, we probably can use source:"(?:\\\\|\\"|[^"\\])*?" - see http://rubular.com/r/de4jnCeCr2 - but I think default reaction to seeing this expression in code later would probably be "what", which is not a good sign. It also doesn't match if a single \ is entered...

I see some ways out:

  • ignore the possibility of failure, but leave a note somewhere for an off-chance we somehow stumble on this years later
  • don't bother with escaping at all - don't allow double quotes in a quoted string, period
  • go SQL way - escape double quotes as double-double-quotes - you'd have to enter "" instead of ". It simplifies regexp to source:"(?:""|[^"])*" - http://rubular.com/r/hikXVOKuY8 - but is probably somewhat counterintuitive to users

Despite being lazy, ignoring the possible error might be a better solution.

@r888888888
Copy link
Collaborator

I'll merge this for now. Additional ideas can be implemented in future PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants