
[META] User Flag Functionality #848

Closed · 8 tasks done
annatuma opened this issue Apr 2, 2020 · 19 comments


annatuma commented Apr 2, 2020

This is a meta ticket, for a feature that requires work across the frontend, catalog, and API.

Overview & Concept

We’d like to introduce a feature to CC Search to allow users to report problematic content quickly and easily. This is important for several reasons:

  • Quicker identification of potential DMCA issues
  • Making use of user reporting to build a list of NSFW images
  • Ability to automatically hide NSFW images (with an additional filter to switch them on)

This feature will require an addition to the CC Search UI for users to interact with, as well as backend support for recording user reports and taking action on those. It is important to note that nothing happens automatically to content when it is reported by a user. Any decision to hide or remove content from CC Search or mark it as NSFW in the Catalog will be made by CC Staff.

Frontend functionality

Reporting UI

Initial Wireframes
Figma Mockup to base flow on

We will add a link to the top right corner of the Single Results Page, with a Flag icon and the text “Report”.

Clicking on “Report” pops out a modal, which contains the following:

What is wrong with this {content type:image}?

  • “{content type:Image} infringes copyright.”
  • “{content type:Image} is Mature Content.”
  • “Other.”

These are the frontend interactions that happen for each of these selections (a request sketch follows at the end of this section):

“{content type:Image} infringes copyright.”
In a new tab, load the DMCA form
In modal, show Thank You message

“{content type:Image} is Mature Content.”
In modal, show Thank You message

“Other.”
New screen loads in modal, with:
Header: “Please describe the issue for us”.
Large text input field.
[Submit Issue] button
After user hits [Submit Issue], show Thank You message

Note: wherever curly brackets appear, we’ll display the {content type} of the work on the individual result page where the button renders.
Ticket: #425
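
For illustration, everything the modal collects could reduce to a single POST to a reporting endpoint. A minimal sketch follows, assuming a hypothetical API root, endpoint path, and payload shape; the real contract will be defined by the backend tickets:

```python
import requests

# Hypothetical API root -- a placeholder, not the final contract.
API_ROOT = "https://api.example.org/v1"

def submit_report(identifier: str, reason: str, description: str = "") -> dict:
    """Send one user report for the work on the current result page."""
    payload = {
        "identifier": identifier,    # CC Search unique ID of the work
        "reason": reason,            # "dmca" | "mature" | "other"
        "description": description,  # free text, only meaningful for "other"
    }
    # The endpoint path is an assumption for this sketch.
    response = requests.post(f"{API_ROOT}/images/{identifier}/report", json=payload)
    response.raise_for_status()
    return response.json()
```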

Filter

We want to add a filter, at the very bottom of the filter list, called “Search Settings”. It should have one option for users to select, called “Show Mature Content”.
Ticket: #435

Backend process

Recording User Actions

These are the backend interactions that happen for each of the frontend user selections:

“{content type:Image} infringes copyright.”
Add to table with “DMCA flag” and outclick indicator
Currently, Legal takes action based on responses to the DMCA form, which are handled via email.

“{content type:Image} is Mature Content.”
Add to table with “Mature” flag

“Other” and text submission:
Add to table, with the issue description in a text field.

All reported content is stored in a table as soon as it is reported by a user. Two things happen when new content is added to the table:

  • An email alert is sent to Product and Legal
  • Information about the report can be viewed and acted on through a Django admin interface.

Table Fields

This is the information we should store in the table (a model sketch follows the list):

  1. CC Search unique ID
  2. CC Search record URL
  3. Date and time
  4. Report Type: dmca (if we include this), illegal, mature, other
  5. Report Description (applies only to “other” reports)
  6. Reviewed: yes/no
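
As a rough sketch, the fields above could map to a Django model like the following, with a post_save signal covering the email alert described earlier. Field names and types are illustrative assumptions, not the final schema, and mail_admins presumes Product and Legal are in the ADMINS setting:

```python
from django.core.mail import mail_admins
from django.db import models
from django.db.models.signals import post_save
from django.dispatch import receiver

class ContentReport(models.Model):
    """Illustrative model for the user-report table described above."""

    REPORT_TYPES = [
        ("dmca", "dmca"),
        ("illegal", "illegal"),
        ("mature", "mature"),
        ("other", "other"),
    ]

    identifier = models.UUIDField()                       # 1. CC Search unique ID
    record_url = models.URLField()                        # 2. CC Search record URL
    created_at = models.DateTimeField(auto_now_add=True)  # 3. Date and time
    report_type = models.CharField(max_length=8, choices=REPORT_TYPES)  # 4. Report Type
    description = models.TextField(blank=True)            # 5. Only for "other" reports
    reviewed = models.BooleanField(default=False)         # 6. Reviewed: yes/no

@receiver(post_save, sender=ContentReport)
def alert_on_new_report(sender, instance, created, **kwargs):
    # Email alert on every new report; assumes Product and Legal are
    # listed in the ADMINS setting.
    if created:
        mail_admins(
            subject=f"New {instance.report_type} report",
            message=f"{instance.record_url}\n\n{instance.description}",
        )
```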

Table Interface Options

After content is reviewed, we’d like to continue to store it in the table, so we have a record of all user reports. However, any content already reviewed, and therefore marked “yes”, should by default be filtered out of the Django admin interface. A backend user of the interface should be able to select it for inclusion, should they need to review prior actions.

The following actions should be available for each row in the table (a sketch of the admin interface follows this list):

Mark Mature

  • The metadata for the record gets updated to indicate the content is mature
  • Note: by default, we will hide Mature content in the UI, but we will introduce a filter for users to purposefully show it

De-Index Content

  • The piece of content is permanently de-indexed in the API
  • Note: this does not delete the record from the Catalog, but it introduces a flag to ensure that it no longer shows in search results

Do Nothing
Clicking this only changes the status of the record from “no” (not yet reviewed) to “yes” (reviewed).

After any available action is taken, the status of the record is updated from “no” (not yet reviewed) to “yes” (reviewed).
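
A sketch of how the Django admin could implement both the default filtering and the three actions; ContentReport is the illustrative model above, and the action bodies are placeholders for the real catalog and API calls:

```python
from django.contrib import admin

# ContentReport is the illustrative model sketched under "Table Fields".

class ReviewedFilter(admin.SimpleListFilter):
    """Show only unreviewed reports by default, but let staff opt in to
    seeing previously reviewed ones."""

    title = "reviewed"
    parameter_name = "reviewed"

    def lookups(self, request, model_admin):
        return [("no", "no (default)"), ("yes", "yes"), ("all", "all")]

    def queryset(self, request, queryset):
        value = self.value() or "no"  # hide reviewed rows by default
        if value == "all":
            return queryset
        return queryset.filter(reviewed=(value == "yes"))

@admin.register(ContentReport)
class ContentReportAdmin(admin.ModelAdmin):
    list_display = ("identifier", "report_type", "created_at", "reviewed")
    list_filter = (ReviewedFilter, "report_type")
    actions = ("mark_mature", "deindex_content", "do_nothing")

    def mark_mature(self, request, queryset):
        # Placeholder: the real handler would also flag the work as
        # mature before marking the report reviewed.
        queryset.update(reviewed=True)

    def deindex_content(self, request, queryset):
        # Placeholder: the real handler would also call the takedown
        # endpoint (cc-archive/cccatalog-api#294).
        queryset.update(reviewed=True)

    def do_nothing(self, request, queryset):
        queryset.update(reviewed=True)
```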

Data Processing

As indicated in the Table Interface Options, data processing needs to take place based on the actions taken.

We need to write a script, set to run every 12 hours, that looks for rows where the status of the record has been set to “yes” since the script last ran, and where the action taken in the table is either “Mark Mature” or “De-Index Content”.

Any record with the action “Mark Mature” should:
Have “Mature” added as metadata to the content record in the Catalog

Any record with the action “De-Index Content” should:
Use the existing endpoint (cc-archive/cccatalog-api#294) for copyright takedown.
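
A sketch of the periodic job, assuming the report table also records which action was taken and when it was reviewed (an `action` and a `reviewed_at` field, beyond the minimal model above); `add_mature_metadata` and `deindex_record` are hypothetical stand-ins for the catalog update and the takedown endpoint:

```python
# Sketch of the 12-hour job, e.g. invoked from cron or a Django
# management command. Assumes the ContentReport model sketched earlier,
# extended with `action` and `reviewed_at` fields.

def process_reviewed_reports(last_run):
    """Apply side effects for reports reviewed since the last run."""
    pending = ContentReport.objects.filter(
        reviewed=True,
        reviewed_at__gte=last_run,
        action__in=("mark_mature", "deindex"),
    )
    for report in pending:
        if report.action == "mark_mature":
            # Hypothetical helper: adds "Mature" to the record's
            # metadata in the Catalog.
            add_mature_metadata(report.identifier)
        else:
            # Hypothetical helper: wraps the existing takedown endpoint
            # (cc-archive/cccatalog-api#294).
            deindex_record(report.identifier)
```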

The feature spec, including the Internal Process and Future Iterations, is available to CC Staff here.

kgodey added this to Pending Review in Backlog Apr 2, 2020
annatuma moved this from Pending Review to Next Sprint in Backlog Apr 2, 2020

annatuma commented Apr 2, 2020

Outstanding action items:

  1. @aldenstpage we need a ticket (or tickets) for the backend processes here. Please review the spec, create ticket(s), and update the description to include those.

  2. @kgodey @brenoferreira does the backend work need to be complete before any work can be done on the UI? If so, please add "blocked" labels to the frontend tickets.

  3. @mathemancer can you confirm that we are indeed pulling in mature/NSFW metadata from some sources, and these can be referenced when @aldenstpage works on Expose mature content flag data in API cccatalog-api#339?


aldenstpage commented Apr 3, 2020

Hey @annatuma, I've updated the tickets with the required work for the backend. Could you take a look at cc-archive/cccatalog-api#474 and fill out the list of potential reasons users will be able to report images, for the NSFW action interface? All of the details I need are here, thanks!


kgodey commented Apr 3, 2020

We're going to assign this to everyone on the team for review, in this order. Once you're done reviewing it, please assign it to the next person in the queue:
@aldenstpage
@mathemancer
@brenoferreira
@kgodey

kgodey added this to Ready for Development in Active Sprint via automation Apr 3, 2020
kgodey removed this from Next Sprint in Backlog Apr 3, 2020
@aldenstpage

(Deleted my question; I read the DMCA page and see that there's a Google form, and understand the desired DMCA flow now)

@mathemancer

On the NSFW front, I'm not aware of any source where we're pulling content that we know is mature.


mathemancer commented Apr 14, 2020

I'm quite concerned about potential abuse of the flagging system. This is a common vector for censorship attacks on the internet at the moment. I'd like to make sure we have the necessary data to detect such an attack, and to prevent abusive flagging.

On the other hand, preventing abuse of the flagging system could entail collecting data to identify abusive flaggers, and I am quite uncomfortable with keeping any data on users. I think we should make sure that we don't collect any data without good reason, and fair warning to the user.

One option might be storing an anonymized IP address in the table as well (with a large, bold warning on the submission form that the data will be stored).

The alternate idea I had for that was to ask for an email, and send a one-time link to the 'real' form to the email they submit (thereby validating the email). We could then store an anonymized version of the validated email address.
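
For illustration, the anonymization could be as simple as truncating the address to a network prefix and hashing it with a salt; this is just a sketch, and the prefix lengths and hash choice are one option among many:

```python
import hashlib
import ipaddress

def anonymize_ip(raw_ip: str, salt: str) -> str:
    """Truncate to a network prefix, then hash, so repeat reporters can
    be grouped without retaining the full address (illustrative only)."""
    ip = ipaddress.ip_address(raw_ip)
    prefix = 24 if ip.version == 4 else 48
    network = ipaddress.ip_network(f"{raw_ip}/{prefix}", strict=False)
    truncated = str(network.network_address)
    return hashlib.sha256(f"{salt}:{truncated}".encode()).hexdigest()
```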

@mathemancer

Any record with the action “Mark NSFW” should:
Have “NSFW” added as metadata to the content record in the Catalog

Any record with the action “De-Index Content” should:
Use the existing endpoint (cc-archive/cccatalog-api#294) for copyright takedown.

To my understanding, the scope of this (meta) issue is limited to getting info into the DB at the service layer. @annatuma Is that correct? If so, do we want further issues to propagate that info back into the data layer, or should it stay in the service DB only? If it stays in the service DB only, we won't be able to use the meta_data field for holding the NSFW tag. We'd need some other plan there.


brenoferreira commented Apr 21, 2020

LGTM.

@annatuma the frontend work can start without the API work having to be complete. The API spec is enough for the work to start.


kgodey commented Apr 21, 2020

“{content type:Image} is Adult Content.”

I think we need to standardize on a single terminology here. The reporting calls it "Adult", the backend flag is "NSFW", and the filter is "Mature". This is going to cause confusion later. We should call it one thing everywhere.

We need to write a script, set to run every 12 hours, that looks for rows where the status of the record has been set to “yes” since the script last ran, and where the action taken in the table is either “Mark NSFW” or “De-Index Content”.

We're building API endpoints for this. Why do we want to run it every 12 hours instead of immediately? cc @aldenstpage

Additionally, I share @mathemancer's concerns about trying to identify and minimize abuse of the reporting functionality. We should address this in the design. I like the idea of storing anonymized IP addresses, but we need to update the mockup to clearly let people know that we're storing their anonymized IP address.

kgodey assigned annatuma and unassigned kgodey Apr 21, 2020

kgodey commented Apr 21, 2020

@annatuma assigning back to you for review of all the comments and potential updates to the spec.

@annatuma

Any record with the action “Mark NSFW” should:
Have “NSFW” added as metadata to the content record in the Catalog
Any record with the action “De-Index Content” should:
Use the existing endpoint (creativecommons/cccatalog-api#294) for copyright takedown.

To my understanding, the scope of this (meta) issue is limited to getting info into the DB at the service layer. @annatuma Is that correct? If so, do we want further issues to propagate that info back into the data layer, or should it stay in the service DB only? If it stays in the service DB only, we won't be able to use the meta_data field for holding the NSFW tag. We'd need some other plan there.

I'd prefer engineering input on what makes sense here. What's the use case for storing this in the data layer? Is there a reason the service layer doesn't suffice? Service layer is clearly enough for the frontend, but if there are reasons this should go back to the data layer we should evaluate those. @mathemancer @aldenstpage @kgodey please weigh in.

@annatuma

“{content type:Image} is Adult Content.”

I think we need to standardize on a single terminology here. The reporting calls it "Adult", the backend flag is "NSFW", and the filter is "Mature". This is going to cause confusion later. We should call it one thing everywhere.

Fine by me. Updating it to "Mature" everywhere.

We need to write a script, set to run every 12 hours, that looks for rows where the status of the record has been set to “yes” since the script last ran, and where the action taken in the table is either “Mark NSFW” or “De-Index Content”.

We're building API endpoints for this. Why do we want to run it every 12 hours instead of immediately? cc @aldenstpage

Immediately would be better; the 12-hour run came out of our earlier conversations about this feature. That said, whatever @aldenstpage says is feasible here (12 hours versus immediate) is fine by me.

Additionally, I share @mathemancer's concerns about trying to identify and minimize abuse of the reporting functionality. We should address this in the design. I like the idea of storing anonymized IP addresses, but we need to update the mockup to clearly let people know that we're storing their anonymized IP address.

I'll sync with @sarahpearson about the appropriate way to do this.


kgodey commented Apr 22, 2020

@aldenstpage please update the API spec to say "mature" instead of "nsfw" per @annatuma's update above.


aldenstpage commented Apr 22, 2020

To my understanding, the scope of this (meta) issue is limited to getting info into the DB at the service layer. @annatuma Is that correct? If so, do we want further issues to propagate that info back into the data layer, or should it stay in the service DB only? If it stays in the service DB only, we won't be able to use the meta_data field for holding the NSFW tag. We'd need some other plan there.

I envisioned this as getting propagated back to the data layer through Kafka as described here; in my mind you would handle these events by setting the appropriate flag in the meta_data field but you can represent it however you want.

An alternative design is to keep all logic related to mature content and DMCAs in the API layer in separate tables. We could perform mature content detection during ingestion (or in the future, a proper ETL pipeline). However, if you are able to detect that something is marked mature upstream (e.g. you find a way to get the NSFW flag from the Flickr API), you should still mark it in the meta_data field (or wherever else) so we can flag those images during ingestion. This would remove all responsibility for handling DMCAs and flagging from the data layer. I'll take this approach instead if you don't have any objections.

Additionally, I share @mathemancer's concerns about trying to identify and minimize abuse of the reporting functionality. We should address this in the design. I like the idea of storing anonymized IP addresses, but we need to update the mockup to clearly let people know that we're storing their anonymized IP address.

We temporarily store IPs on our servers for rate limiting purposes and store them with the last octet erased in our server logs for up to 3 months; we should have the technical means to catch abuse already. Data retention practices are described in our privacy policy.

We're building API endpoints for this. Why do we want to run it every 12 hours instead of immediately? cc @aldenstpage

There's no technical reason for this to take 12 hours, particularly if we are able to avoid having to track NSFW status in both the catalog and API layers.

@mathemancer

Long term, I'd want the flags to be in the data layer, since that would be part of any pipeline where we, say, used ML to figure out if other unflagged pictures might be mature content. Ideally, any metadata that was just about displaying the image would stay in the service layer, and any metadata that was 'inherent' to the image would end up in the data layer eventually. I think whether the image is considered 'Mature' or not falls under the latter.

But, we don't have a specific use for the 'Mature' flag in the data layer at the moment, so maybe I'm just falling prey to YAGNI fallacies.

@aldenstpage

Your last point is a good one; we can always extract the data later for training. I think it is also debatable whether matureness is inherent, since it is highly contextual to culture and personal bias; we're taking a "stance" that an image is mature based on what we think our audience wants, and that's probably going to be a moving target. I'll stick with keeping it internal to the API layer for now, and we can bridge that to the catalog later if the need arises.

@annatuma

@brenoferreira we have the text to add to the first screen of the user reporting form regarding collection of IP addresses:

"For security purposes, CC collects and retains anonymized IP addresses of those who complete and submit this form."

@panchovm is updating the mockup right now. Thanks to @sarahpearson for review and language.

@fcoveram

The mockup is now updated with this text.

kgodey moved this from Ready for Development to In Progress in Active Sprint Apr 30, 2020
Active Sprint automation moved this from In Progress to Done May 12, 2020
@annatuma

Everything is live in production and looks good.

We have a couple of minor follow-up issues (as is to be expected).
