[META] User Flag Functionality #848
Comments
Outstanding action items:
Hey @annatuma, I've updated the tickets with the required work for the backend.
We're going to assign this to everyone on the team for review, in this order. Once you're done reviewing, please assign it to the next person in the queue:
(Deleted my question; I read the DMCA page, saw that there's a Google form, and understand the desired DMCA flow now)
On the NSFW front, I'm not aware of any source we're pulling from whose content we know to be mature.
I'm quite concerned about potential abuse of the flagging system; flagging is a common vector for censorship attacks on the internet at the moment. I'd like to make sure we have the data needed to detect such an attack and to prevent abusive flagging. On the other hand, preventing abuse of the flagging system could entail collecting data that identifies abusive flaggers, and I am quite uncomfortable with keeping any data on users. We should make sure we don't collect any data without good reason and fair warning to the user.

One option might be to store an anonymized IP address in the table (with a large, bold warning on the submission form that the data will be stored). The alternative I had in mind was to ask for an email address and send a one-time link to the 'real' form to the address they submit (thereby validating the email). We could then store an anonymized version of the validated email address.
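For concreteness, the two anonymization ideas floated here could look roughly like the following sketch. The function names and the choice of `/24` and `/48` prefixes are assumptions for illustration, not part of the spec:

```python
import hashlib
import ipaddress

def anonymize_ip(raw_ip: str) -> str:
    """Erase the host portion of an address (last octet for IPv4,
    everything past the /48 prefix for IPv6)."""
    ip = ipaddress.ip_address(raw_ip)
    prefix = 24 if ip.version == 4 else 48
    network = ipaddress.ip_network(f"{raw_ip}/{prefix}", strict=False)
    return str(network.network_address)

def anonymize_email(email: str, salt: str) -> str:
    """Store a salted hash so repeat flaggers can be matched
    without keeping the readable address."""
    return hashlib.sha256((salt + email.lower()).encode()).hexdigest()
```

Either way, the stored value can be compared across reports to spot coordinated abuse without retaining the raw identifier.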
To my understanding, the scope of this (meta) issue is limited to getting info into the DB at the service layer. @annatuma Is that correct? If so, do we want further issues to propagate that info back into the data layer, or should it stay in the service DB only? If it stays in the service DB only, we won't be able to use the meta_data field for holding the NSFW tag, and we'd need some other plan there.
LGTM. @annatuma the frontend work can start without the API work having to be complete. The API spec is enough for the work to start.
I think we need to standardize on a single terminology here. The reporting calls it "Adult", the backend flag is "NSFW", and the filter is "Mature". This is going to cause confusion later. We should call it one thing everywhere.
We're building API endpoints for this. Why do we want to run it every 12 hours instead of immediately? cc @aldenstpage Additionally, I share @mathemancer's concerns about trying to identify and minimize abuse of the reporting functionality. We should address this in the design. I like the idea of storing anonymized IP addresses, but we need to update the mockup to clearly let people know that we're storing their anonymized IP address.
@annatuma assigning back to you for review of all the comments and potential updates to the spec.
I'd prefer engineering input on what makes sense here. What's the use case for storing this in the data layer? Is there a reason the service layer doesn't suffice? The service layer is clearly enough for the frontend, but if there are reasons this should go back to the data layer, we should evaluate them. @mathemancer @aldenstpage @kgodey please weigh in.
Fine by me. Updating it to "Mature" everywhere.
Immediately would be better, the 12 hour run came out of our earlier conversations about this feature. That said, whatever @aldenstpage says is feasible here (12 hours versus immediate) is fine by me.
I'll sync with @sarahpearson about the appropriate way to do this.
@aldenstpage please update the API spec to say "mature" instead of "nsfw" per @annatuma's update above. |
I envisioned this as getting propagated back to the data layer through Kafka as described here; in my mind you would handle these events by setting the appropriate flag in the

An alternative design is to keep all logic related to mature content and DMCAs in the API layer, in separate tables. We could perform mature content detection during ingestion (or, in the future, a proper ETL pipeline). However, if you are able to detect that something is marked mature upstream (e.g. you find a way to get the NSFW flag from the Flickr API), you should still mark it in the
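As an illustration of the event-handling idea, the consumer could apply a mature-content event to the matching catalog record along these lines. The event shape and field names here are assumptions, not the actual Kafka schema, and the real pipeline would read from a Kafka topic rather than a local dict:

```python
# Hypothetical handler for a "marked mature" event coming off the
# moderation pipeline; `catalog` stands in for the data-layer store.

def handle_mature_event(event: dict, catalog: dict) -> None:
    """Set the mature flag on the catalog record named by the event."""
    record = catalog.get(event["identifier"])
    if record is not None and event.get("action") == "mark_mature":
        record.setdefault("meta_data", {})["mature"] = True

catalog = {"abc-123": {"title": "Some image", "meta_data": {}}}
handle_mature_event({"identifier": "abc-123", "action": "mark_mature"}, catalog)
```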
We temporarily store IPs on our servers for rate limiting purposes and store them with the last octet erased in our server logs for up to 3 months; we should have the technical means to catch abuse already. Data retention practices are described in our privacy policy.
There's no technical reason for this to take 12 hours, particularly if we are able to avoid having to track NSFW status in both the catalog and API layers.
Long term, I'd want the flags to be in the data layer, since that would be part of any pipeline where we, say, used ML to figure out if other unflagged pictures might be mature content. Ideally, any metadata that was just about displaying the image would stay in the service layer, and any metadata that was 'inherent' to the image would end up in the data layer eventually. I think whether the image is considered 'Mature' or not falls under the latter. But, we don't have a specific use for the 'Mature' flag in the data layer at the moment, so maybe I'm just falling prey to YAGNI fallacies.
Your last point is a good one; we can always extract the data later for training. I think it is also debatable whether matureness is inherent since it is highly contextual to culture and personal bias; we're taking a "stance" that an image is mature based on what we think our audience wants, and that's probably going to be a moving target. I'll stick with keeping it internal to the API layer now and we can bridge that to the catalog later if the need arises. |
@brenoferreira we have the text to add to the first screen of the user reporting form regarding collection of IP addresses: "For security purposes, CC collects and retains anonymized IP addresses of those who complete and submit this form." @panchovm is updating the mockup right now. Thanks to @sarahpearson for review and language.
The mockup is now updated with this text |
Everything is live in production and looks good. We have a couple of minor follow-up issues (as is to be expected). |
This is a meta ticket, for a feature that requires work across the frontend, catalog, and API.
Takedown API Endpoint Create endpoint for copyright takedowns cccatalog-api#294
Report API endpoint Create an endpoint for accepting reports on images cccatalog-api#474
API Filter Expose mature content flag data in API cccatalog-api#339
Django Interface Create a Django admin interface for handling pending mature reports and DMCA requests cccatalog-api#473
Process User Reported Content Queue [META] User Flag Functionality #848
Design of UI in Figma Convert User Flag UI from Moqups to Figma #851
Frontend UI Allow users to report problematic content #425
Frontend Filter Build filter for NSFW/mature content #435
Overview & Concept
We’d like to introduce a feature to CC Search to allow users to report problematic content quickly and easily. This is important for several reasons:
This feature will require an addition to the CC Search UI for users to interact with, as well as backend support for recording user reports and taking action on them. It is important to note that nothing happens automatically to content when a user reports it. Any decision to hide content, remove it from CC Search, or mark it as mature in the Catalog will be made by CC Staff.
Frontend functionality
Reporting UI
Initial Wireframes
Figma Mockup to base flow on
We will add a link to the top right corner of the Single Results Page, with a Flag icon, and the text “Report”.
Clicking on “Report” pops out a modal, which contains the following:
What is wrong with this {content type:image}?
These are the frontend interactions that happen for each of these selections:
“{content type:Image} infringes copyright.”
In a new tab, load the DMCA form
In modal, show Thank You message
“{content type:Image} is Mature Content.”
In modal, show Thank You message
“Other.”
New screen loads in modal, with:
Header: “Please describe the issue for us”.
Large text input field.
[Submit Issue] button
After user hits [Submit Issue], show Thank You message
Note: {content type} is replaced with the content type of the work on the individual result page where the button renders.
Ticket: #425
Filter
We want to add a filter, at the very bottom of the filter list, called “Search Settings”. It should have one option for users to select, called “Show Mature Content”.
Ticket: #435
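The expected filter behavior on the API side can be sketched as follows: mature results are excluded by default and only included when the user enables "Show Mature Content". Function and field names here are illustrative, not the actual API implementation:

```python
# Minimal sketch of the default-off mature filter; a result dict with
# a truthy "mature" field is hidden unless the user opts in.

def filter_results(results: list, show_mature: bool) -> list:
    if show_mature:
        return results
    return [r for r in results if not r.get("mature", False)]
```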
Backend process
Recording User Actions
These are the backend interactions that happen for each of the frontend user selections:
“{content type:Image} infringes copyright.”
Add to table with “DMCA flag” and outclick indicator
Currently, Legal takes actions based on response to DMCA form handled via email.
“{content type:Image} is Mature Content.”
Add to table with “Mature” flag
“Other” and text submission:
Add to table with issue description in field.
All reported content is stored in a table as soon as it is reported by a user. Two things happen when new content is added to the table:
Table Fields
This is the information we should store in the table:
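As an illustration of the kind of information discussed in this thread (the reported work, the reason, free-text description for "Other", review status, action taken, and the anonymized IP), a report row might look like the following sketch. All field names are assumptions; the actual schema belongs to the linked implementation tickets:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class ContentReport:
    identifier: str                     # identifier of the reported work
    reason: str                         # "dmca", "mature", or "other"
    description: Optional[str] = None   # free text for "Other" reports
    reviewed: bool = False              # "no" until staff take an action
    action_taken: Optional[str] = None  # "mark_mature", "deindex", "no_action"
    anonymized_ip: Optional[str] = None
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
```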
Table Interface Options
After content is reviewed, we’d like to continue to store it in the table, so we have a record of all user reports. However, any content already reviewed and therefore marked with “no” should by default not be shown (i.e. be filtered out) from the Django admin interface. A backend user of the interface should be able to select it for inclusion, should they need to review prior actions.
The following actions should be available for each row in the table:
Mark Mature
De-Index Content
Do Nothing
Clicking "Do Nothing" only changes the status of the record from "no" (not yet reviewed) to "yes" (reviewed).
After any available action is taken, the status of the record is updated from “no” (not yet reviewed) to “yes” (reviewed).
Data Processing
As indicated in the Table Interface Options, data processing needs to take place based on the actions taken.
We need to write a script, set to run every 12 hours, that looks for rows where the status of the record has been set to “yes” since the script last ran, and where the action taken in the table is either “Mark Mature” or “De-Index Content”.
Any record with the action “Mark Mature” should:
Have “Mature” added as metadata to the content record in the Catalog
Any record with the action “De-Index Content” should:
Use the existing endpoint (cc-archive/cccatalog-api#294) for copyright takedown.
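The processing pass described above can be sketched as follows. This is a minimal illustration: `mark_mature_in_catalog` and `takedown` are hypothetical stand-ins for the real Catalog update and the takedown endpoint, and tracking already-processed rows by id is one possible way to implement "since the script last ran":

```python
# Select reviewed reports not yet processed whose action needs follow-up,
# then dispatch each to the matching handler.

ACTIONABLE = ("mark_mature", "deindex")

def pending_actions(reports: list, processed_ids: set) -> list:
    return [
        r for r in reports
        if r["reviewed"]
        and r["id"] not in processed_ids
        and r["action_taken"] in ACTIONABLE
    ]

def process(reports, processed_ids, mark_mature_in_catalog, takedown):
    for report in pending_actions(reports, processed_ids):
        if report["action_taken"] == "mark_mature":
            mark_mature_in_catalog(report["identifier"])
        else:
            takedown(report["identifier"])
        processed_ids.add(report["id"])
```

"Do Nothing" rows are skipped entirely, since they require no downstream change.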
The feature spec, including the Internal Process and Future Iterations, is available to CC Staff here.