Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add additional validation to post creates/updates #3352

Closed
5 tasks done
BrokenEagle opened this issue Nov 4, 2017 · 10 comments
Closed
5 tasks done

Add additional validation to post creates/updates #3352

BrokenEagle opened this issue Nov 4, 2017 · 10 comments
Labels

Comments

@BrokenEagle
Copy link
Collaborator

BrokenEagle commented Nov 4, 2017

I've been prototyping some additional tag input validations with a userscript in topic #1234 (GitHub code link). I don't believe there are any additional considerations at this point, so it's probably safe to move forward with this.

Basically on tag inputs for uploads, post edits, and quick post edits, a validation is run on all added/removed tags to check for different error conditions.

  • 1. Tag add validation - Checks for tags that are empty, and that will be created/populated
  • 2. Tag remove validation - Checks hierarchy relationships that will cause a tag removal to be readded
  • 3. Rating validation - Quicker and less destructive way to check that the rating exists
  • 4. First party source validation - Sources from Pixiv, Nicoseiga, Nijie et al should always have an artist tag. An exception to this should be for official company sources as indicated by the presence of the official_art tag.
  • 5. Copyright tag validation - Posts should either have a copyright tag, the original tag, copyright_request

Preedit and postedit tag calculation (psuedocode):

preedittags = tags that already exist on post (for uploads this will be the null set)
postedittags = tags that exist at form submission
postedittags = transformdowncase(postedittags) #Normalize tags
postedittags = filtermetatags(postedittags)

1. Tag add validator

This is more of a warning condition, as it could potentially indicate a mispelling or mistagging. Adding this check would reduce the vast number of mistags that get accidentally added. One overide on the tag add validation would be if the tag is prepended with one of the tagcategory tags, e.g. general:, gen:, as this is seen as more of a deliberate action.

Added tags calculation (psuedocode):

addedtags = filtercategorytags(postedittags)  #The overide as mentioned before
addedtags = filternegativetags(addedtags)
addedtags = addedtags - preedittags  #Set difference
negativetags = getnegativetags(postedittags)  #Filters nonnegative and removes minus "-" prepend
addedtags = addedtags - negativetags  #Allows users to correct an error without having to hunt for it

For the validation, it checks all tag adds using the /tags controller with the search[hide_empty]=true option to find tags that don't exist or have a post_count of 0. Additionally, it checks for any existing aliases using the /tag_aliases controller with search[antecedent_name]=tagname and also search[active]=true.

On detection, it lists all of the added tags that will be created/populated.

2. Tag remove validator

This is more of an informational condition, as the user may think they have removed a tag when actually an existing tag implication may re-add that tag back. Adding this check would assist with tag gardening by assisting the user to remove all implications for a tag removal.

Removed tags calculation (psuedocode):

postedittags = transformcategorytags(postedittags)  #Removes any category prepends
removedtags_1 = preedittags - postedittags  #Tags removed via deletion
negativetags = getnegativetags(postedittags)  
removedtags_2 = postedittags & negativetags  #Set intersection
reomvedtags = removedtags_1 + removedtags_2  #Set union/concat
finaltags = postedittags - removedtags

For the validation, it checks all preedit tags using the /tag_implications controller with search[consequent_name]=tagname and also search[active]=true. For each tag in the removedtags set, all of its tag relations are calculated recursively. A bad remove is detected if one of those tag relations is in the set of finaltags.

On detection, it lists all of the removed tags and the hierarchy relationships that prevent the remove.

3. Rating validator

This validation already exists for Danbooru, however having it done with Javascript is nice because it's much quicker. It also doesn't cause a page reload, which causes the post preview to disappear on the uploads page. This latter item is more of an annoyance, as there have been times when I remembered more tags to add after submitting the form, however the post preview was no longer there as an aid after it errored out because of the validation.

Final

This was prototyped client-side with Javascript, but it may be better to do more of the work server-side since all of the alias, implication, and tag queries are pretty much done in batches. Additionally, a lot of data was being stored client-side to reduce the penalty for network calls, but that started conflicting with Danbooru's autocomplete (ref. issue #3335). The primary storage method was migrated to IndexedDB to alleviate this, however this only works on the more modern browser versions that support IndexedDB (Wikipedia: IndexedDB), as older versions that don't support it fall back to using localStorage which would again cause conflict with autocomplete. Additionally, the use of the async and await keywords and also promises would have to be rewritten for those browsers that don't support them.

What do others think though? I'd be willing to try my hand at implementing this, but would like to know which way would be the preferred method.

Edit:

  • (2017-11-08) After speaking with evazion and a few others on Discord, I added a first party source validation. Basically, there should be no excuse for not adding an artist tag for a first party source like Pixiv, NicoSeiga, Nijie, etc. We have a thread in the forum to ask for naming help, and as a fallback generalized qualifiers like pixiv12222122 could be used. This assists with collating posts from the same artist and site. Collecting the posts after the fact can be challenging, especially since artists can delete their accounts.
  • (2017-11-09) Another idea brought up was validating the presence of copyright tags. Basically, all posts should have have a copyright tag, the original tag, or copyright_request
  • (2017-11-13) Updated 5. to include official art as an exception to the copyright validation
  • (2017-11-25) Made a mistake with official_art. It adds an exception to artist validation, since it may be difficult if not impossible to determine the artist for official art.
@r888888888
Copy link
Collaborator

IME the red highlighting on the post count regularly catches typos for me. It would be nice to catch these client-side but the current text area field isn't too amenable to that. Curious how you were thinking of handling those validations. The current flow with submitting, noticing the error, fixing and resubmitting doesn't seem that bad to me. A lot of the assets will be cached at that point.

For removing implied tags, IME this doesn't happen often because usually the relationship is obvious (at least with copyrights and subtypes that reuse the name). If you do a reltag search on the tag the implied ones will be highly correlated.

For ratings, again it's basically a question of moving it from server-side to client-side. You could do some interesting stuff with guessing the rating as tags are added. We could pull some stats about popular tags that are highly correlated to a specific rating. But I'm not sure how much of a problem this is currently.

Overall, I think it's a lot of extra JS for not much benefit. I've been getting concerned with the JS getting bloated lately and will probably take steps to better modularize them, so that at least the post pages are heavily optimized. Unfortunately any addition to editing posts means they'll have to be loaded on posts/show.

@Type-kun
Copy link
Collaborator

Type-kun commented Nov 7, 2017

I agree that submit-warn-resubmit flow is nice; I think there are some other checks that could use that kind of flow without requiring page refresh.

If JS bloating is the concern, this particular task could be handled almost entirely server-side. The client would just post "before" and "after" tag strings as parameters, and get the results of all checks as JSON object. In fact, I think that tag checks are better handled server-side anyway, since there's much more data available and it's easily accessible. The load should be comparable to saving the post, and the function wouldn't be called on mass updates or tag scripts, so I don't think it would put much strain on the server. Besides, this would allow to significantly improve or expand the checks, if necessary.

As for implications re-applying tags, I must confess that I sometimes had to scratch my head to figure out why the tag refuses to remove. What's worse is that you don't always even see that the tag is not removed, so it's a nifty feature to have.

@BrokenEagle
Copy link
Collaborator Author

BrokenEagle commented Nov 9, 2017

I liken the current setup for handling errant mispelled/mistagged items to brake lights on a vehicle. They work great if people are paying attention... not so much if people aren't. It's more of a passive warning system. Plus, it only warns on the first instance of a mispelling/mistagging. What I suggested in the OP is more of an active warning system. Going back to the brake analogy, it would be equivalent to automated braking. It catches errors even if you aren't paying attention.

Also, even though I'm a pretty experienced uploader by now, I still make mistakes as seen on my upload report. Before I started using my userscript, I would even occasionally have spelling/mistagging errors that had to be corrected by others. Going by the tags page on Danbooru, there are close to ~13K general tags with only a single post. At this point, it would be a Herculean task to go through and validate all of those manually, and that's not taking into account items that might have been misspelled/mistagged the same way more than once. These mistagged posts are lost from the correctly-spelled tag search until they get corrected.

As far as the Rating Validator goes, I didn't mean that it would check that the image rating is appropriate. I meant that it would check that you had entered any rating whatsoever. The current workflow is to submit, have the page error out after more than a second, have to go back and add a rating, and then resubmit. With Javascript, this check could instead be done almost immediately, checking that any of the rating radio buttons have been selected. This is important as I miss adding the rating a lot, since I always add it last to avoid uploading a poorly tagged post.

Getting to what Type-kun said, I agree that it would make sense to do most if not all of the validations server-side, as it has all of the information and doesn't suffer from network latency in its calculations. Also, any additional delay these checks cause would be negligible compared to how long an upload usually takes, generally on the order of several seconds.

@evazion
Copy link
Member

evazion commented Nov 9, 2017

#3357 is a proof-of-concept showing how the first two checks can be done server-side with no extra Javascript. I only implemented the first point (checking for empty tags) for now.

@Type-kun
Copy link
Collaborator

Type-kun commented Nov 9, 2017

The downside of fully server-side checks is that they require full upload processing before warning user, and the warning can be accidentally ignored. Having user consciously put a mark in "yes, I saw that warning,it's ok" checkbox before uploading would be better.

If we define server-side checks as JSON POST endpoints and have Javascript post the info there and retrieve the answer, it would only take a few JS lines to implement.

@BrokenEagle
Copy link
Collaborator Author

Yeah, some of the server-side checks would work better if they were done before submitting the upload such as the creation of a new tags and non-removals, as users may completely miss the warning even though it is more prominently displayed. Even if they don't miss it, we're basically forcing the workflow for users to be open post, open post edit box, submit changes, wait for page reload, notice error, open post edit box again, submit changes again. Using the JS method as Type-kun described, we'd be cutting out two of those steps.

I think the artist validations would work better after the post is created though, as those are more of a cleanup action than a mistake, as we don't necessarily want to force a user to create an artist tag or entry before they're allowed to upload a picture.

@evazion
Copy link
Member

evazion commented Nov 9, 2017

For most checks we need to normalize the tags, apply negated tags, handle metatags, apply aliases, apply implications, etc, before we can analyze the tags. We get this for free if the checks happen as a side effect of saving the post. Doing the checks elsewhere would mean duplicating all the Post#normalize_tags logic, which I think would be error-prone.

As far as detecting typos, I think that's a more general problem than the other validations. I think typos would be better handled by a) making autocomplete autocorrect typos (see 1, 2, 3), and/or b) having a Did you mean? section in the related tags listing mistyped tags along with corrections, c) giving Did you mean? suggestions for normal searches too. IMO this would be a better workflow than either requiring upfront confirmation or giving an after-the-fact warning, plus it would improve search usability in general.

IMO failing to remove implied tags is rare enough that an after-the-fact warning isn't so bad. Also consider tag scripts and quick edits. Having to call a separate validation endpoint would make it a little more difficult to do the checks in those contexts, versus returning a warning message as a side-effect of the PUT /posts/$id.json call.

@BrokenEagle
Copy link
Collaborator Author

I have to agree that evazion makes some good points, and now that I think about it, a graduated response might be more what is called for instead of going full bore right off the bat. Therefore, we could start out small by trying things using the after post save validations as in #3357, and see how that does and if anything needs to be moved to earlier in the post submittal process.

r888888888 added a commit that referenced this issue Nov 21, 2017
Moved input existence validations to client (#3352)
r888888888 added a commit that referenced this issue Dec 13, 2017
Post editing: add warning when creating new tags (#3352)
@stale
Copy link

stale bot commented Jul 30, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jul 30, 2019
@BrokenEagle
Copy link
Collaborator Author

This has already been fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants