
[META] User Flag Functionality #848

Closed · 8 tasks done
annatuma opened this issue Apr 2, 2020 · 19 comments


annatuma commented Apr 2, 2020

This is a meta ticket, for a feature that requires work across the frontend, catalog, and API.

Overview & Concept

We’d like to introduce a feature to CC Search to allow users to report problematic content quickly and easily. This is important for several reasons:

  • Quicker identification of potential DMCA issues
  • Making use of user reporting to build a list of NSFW images
  • Ability to automatically hide NSFW images (with an additional filter to switch them on)

This feature will require an addition to the CC Search UI for users to interact with, as well as backend support for recording user reports and taking action on those. It is important to note that nothing happens automatically to content when it is reported by a user. Any decision to hide or remove content from CC Search or mark it as NSFW in the Catalog will be made by CC Staff.

Frontend functionality

Reporting UI

Initial Wireframes
Figma Mockup to base flow on

We will add a link to the top right corner of the Single Results Page, with a Flag icon and the text “Report”.

Clicking on “Report” pops out a modal, which contains the following:

What is wrong with this {content type:image}?

  • “{content type:Image} infringes copyright.”
  • “{content type:Image} is Mature Content.”
  • “Other.”

These are the frontend interactions that happen for each of these selections (a request sketch follows at the end of this section):

“{content type:Image} infringes copyright.”
In a new tab, load the DMCA form
In modal, show Thank You message

“{content type:Image} is Mature Content.”
In modal, show Thank You message

“Other.”
New screen loads in modal, with:
Header: “Please describe the issue for us”.
Large text input field.
[Submit Issue] button
After user hits [Submit Issue], show Thank You message

Note: wherever curly brackets appear, we’ll display the {content type} of the work on the individual result page where the button renders.
Ticket: #425
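
For illustration, everything the modal collects could reduce to a single POST to a reporting endpoint. A minimal sketch follows, assuming a hypothetical API root, endpoint path, and payload shape; the real contract will be defined by the backend tickets:

```python
import requests

# Hypothetical API root -- a placeholder, not the final contract.
API_ROOT = "https://api.example.org/v1"

def submit_report(identifier: str, reason: str, description: str = "") -> dict:
    """Send one user report for the work on the current result page."""
    payload = {
        "identifier": identifier,    # CC Search unique ID of the work
        "reason": reason,            # "dmca" | "mature" | "other"
        "description": description,  # free text, only meaningful for "other"
    }
    # The endpoint path is an assumption for this sketch.
    response = requests.post(f"{API_ROOT}/images/{identifier}/report", json=payload)
    response.raise_for_status()
    return response.json()
```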

Filter

We want to add a filter, at the very bottom of the filter list, called “Search Settings”. It should have one option for users to select, called “Show Mature Content”.
Ticket: #435

Backend process

Recording User Actions

These are the backend interactions that happen for each of the frontend user selections:

“{content type:Image} infringes copyright.”
Add to table with “DMCA flag” and outclick indicator
Currently, Legal takes action based on responses to the DMCA form, which are handled via email.

“{content type:Image} is Mature Content.”
Add to table with “Mature” flag

“Other” and text submission:
Add to table, with the issue description in a text field.

All reported content is stored in a table as soon as it is reported by a user. Two things happen when new content is added to the table:

  • An email alert is sent to Product and Legal
  • Information about the report can be viewed and acted on through a Django admin interface.

Table Fields

This is the information we should store in the table (a model sketch follows the list):

  1. CC Search unique ID
  2. CC Search record URL
  3. Date and time
  4. Report Type: dmca (if we include this), illegal, mature, other
  5. Report Description (applies only to “other” reports)
  6. Reviewed: yes/no
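
As a rough sketch, the fields above could map to a Django model like the following, with a post_save signal covering the email alert described earlier. Field names and types are illustrative assumptions, not the final schema, and mail_admins presumes Product and Legal are in the ADMINS setting:

```python
from django.core.mail import mail_admins
from django.db import models
from django.db.models.signals import post_save
from django.dispatch import receiver

class ContentReport(models.Model):
    """Illustrative model for the user-report table described above."""

    REPORT_TYPES = [
        ("dmca", "dmca"),
        ("illegal", "illegal"),
        ("mature", "mature"),
        ("other", "other"),
    ]

    identifier = models.UUIDField()                       # 1. CC Search unique ID
    record_url = models.URLField()                        # 2. CC Search record URL
    created_at = models.DateTimeField(auto_now_add=True)  # 3. Date and time
    report_type = models.CharField(max_length=8, choices=REPORT_TYPES)  # 4. Report Type
    description = models.TextField(blank=True)            # 5. Only for "other" reports
    reviewed = models.BooleanField(default=False)         # 6. Reviewed: yes/no

@receiver(post_save, sender=ContentReport)
def alert_on_new_report(sender, instance, created, **kwargs):
    # Email alert on every new report; assumes Product and Legal are
    # listed in the ADMINS setting.
    if created:
        mail_admins(
            subject=f"New {instance.report_type} report",
            message=f"{instance.record_url}\n\n{instance.description}",
        )
```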

Table Interface Options

After content is reviewed, we’d like to continue to store it in the table, so we have a record of all user reports. However, any content already reviewed, and therefore marked “yes”, should by default be filtered out of the Django admin interface. A backend user of the interface should be able to select it for inclusion, should they need to review prior actions.

The following actions should be available for each row in the table (a sketch of the admin interface follows this list):

Mark Mature

  • The metadata for the record gets updated to indicate the content is mature
  • Note: by default, we will hide Mature content in the UI, but we will introduce a filter for users to purposefully show it

De-Index Content

  • The piece of content is permanently de-indexed in the API
  • Note: this does not delete the record from the Catalog, but it introduces a flag to ensure that it no longer shows in search results

Do Nothing
Clicking this only changes the status of the record from “no” (not yet reviewed) to “yes” (reviewed).

After any available action is taken, the status of the record is updated from “no” (not yet reviewed) to “yes” (reviewed).
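
A sketch of how the Django admin could implement both the default filtering and the three actions; ContentReport is the illustrative model above, and the action bodies are placeholders for the real catalog and API calls:

```python
from django.contrib import admin

# ContentReport is the illustrative model sketched under "Table Fields".

class ReviewedFilter(admin.SimpleListFilter):
    """Show only unreviewed reports by default, but let staff opt in to
    seeing previously reviewed ones."""

    title = "reviewed"
    parameter_name = "reviewed"

    def lookups(self, request, model_admin):
        return [("no", "no (default)"), ("yes", "yes"), ("all", "all")]

    def queryset(self, request, queryset):
        value = self.value() or "no"  # hide reviewed rows by default
        if value == "all":
            return queryset
        return queryset.filter(reviewed=(value == "yes"))

@admin.register(ContentReport)
class ContentReportAdmin(admin.ModelAdmin):
    list_display = ("identifier", "report_type", "created_at", "reviewed")
    list_filter = (ReviewedFilter, "report_type")
    actions = ("mark_mature", "deindex_content", "do_nothing")

    def mark_mature(self, request, queryset):
        # Placeholder: the real handler would also flag the work as
        # mature before marking the report reviewed.
        queryset.update(reviewed=True)

    def deindex_content(self, request, queryset):
        # Placeholder: the real handler would also call the takedown
        # endpoint (cc-archive/cccatalog-api#294).
        queryset.update(reviewed=True)

    def do_nothing(self, request, queryset):
        queryset.update(reviewed=True)
```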

Data Processing

As indicated in the Table Interface Options, data processing needs to take place based on the actions taken.

We need to write a script, set to run every 12 hours, that looks for rows where the status of the record has been set to “yes” since the script last ran, and where the action taken in the table is either “Mark Mature” or “De-Index Content”.

Any record with the action “Mark Mature” should:
Have “Mature” added as metadata to the content record in the Catalog

Any record with the action “De-Index Content” should:
Use the existing endpoint (cc-archive/cccatalog-api#294) for copyright takedown.
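
A sketch of the periodic job, assuming the report table also records which action was taken and when it was reviewed (an `action` and a `reviewed_at` field, beyond the minimal model above); `add_mature_metadata` and `deindex_record` are hypothetical stand-ins for the catalog update and the takedown endpoint:

```python
# Sketch of the 12-hour job, e.g. invoked from cron or a Django
# management command. Assumes the ContentReport model sketched earlier,
# extended with `action` and `reviewed_at` fields.

def process_reviewed_reports(last_run):
    """Apply side effects for reports reviewed since the last run."""
    pending = ContentReport.objects.filter(
        reviewed=True,
        reviewed_at__gte=last_run,
        action__in=("mark_mature", "deindex"),
    )
    for report in pending:
        if report.action == "mark_mature":
            # Hypothetical helper: adds "Mature" to the record's
            # metadata in the Catalog.
            add_mature_metadata(report.identifier)
        else:
            # Hypothetical helper: wraps the existing takedown endpoint
            # (cc-archive/cccatalog-api#294).
            deindex_record(report.identifier)
```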

The feature spec, including the Internal Process and Future Iterations, is available to CC Staff here.

kgodey added this to Pending Review in Backlog Apr 2, 2020
annatuma moved this from Pending Review to Next Sprint in Backlog Apr 2, 2020

annatuma commented Apr 2, 2020

Outstanding action items:

  1. @aldenstpage we need a ticket (or tickets) for the backend processes here. Please review the spec, create ticket(s), and update the description to include those.

  2. @kgodey @brenoferreira does the backend work need to be complete before any work can be done on the UI? If so, please add "blocked" labels to the frontend tickets.

  3. @mathemancer can you confirm that we are indeed pulling in mature/NSFW metadata from some sources, and these can be referenced when @aldenstpage works on Expose mature content flag data in API cccatalog-api#339?


aldenstpage commented Apr 3, 2020

Hey @annatuma, I've updated the tickets with the required work for the backend. Could you take a look at cc-archive/cccatalog-api#474 and fill out the list of potential reasons users will be able to report images, for the NSFW action interface? All of the details I need are here, thanks!


kgodey commented Apr 3, 2020

We're going to assign this to everyone on the team for review, in this order. Once you're done reviewing it, please assign it to the next person in the queue:
@aldenstpage
@mathemancer
@brenoferreira
@kgodey

kgodey added this to Ready for Development in Active Sprint via automation Apr 3, 2020
kgodey removed this from Next Sprint in Backlog Apr 3, 2020
@aldenstpage

(Deleted my question; I read the DMCA page and see that there's a Google form, and understand the desired DMCA flow now)

@mathemancer

On the NSFW front, I'm not aware of any source where we're pulling content that we know is mature.


mathemancer commented Apr 14, 2020

I'm quite concerned about potential abuse of the flagging system. This is a common vector for censorship attacks on the internet at the moment. I'd like to make sure we have the necessary data to detect such an attack, and to prevent abusive flagging.

On the other hand, preventing abuse of the flagging system could entail collecting data to identify abusive flaggers, and I am quite uncomfortable with keeping any data on users. I think we should make sure that we don't collect any data without good reason, and fair warning to the user.

One option might be storing an anonymized IP address in the table as well (with a large, bold warning on the submission form that the data will be stored).

The alternate idea I had for that was to ask for an email, and send a one-time link to the 'real' form to the email they submit (thereby validating the email). We could then store an anonymized version of the validated email address.
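
For illustration, the anonymization could be as simple as truncating the address to a network prefix and hashing it with a salt; this is just a sketch, and the prefix lengths and hash choice are one option among many:

```python
import hashlib
import ipaddress

def anonymize_ip(raw_ip: str, salt: str) -> str:
    """Truncate to a network prefix, then hash, so repeat reporters can
    be grouped without retaining the full address (illustrative only)."""
    ip = ipaddress.ip_address(raw_ip)
    prefix = 24 if ip.version == 4 else 48
    network = ipaddress.ip_network(f"{raw_ip}/{prefix}", strict=False)
    truncated = str(network.network_address)
    return hashlib.sha256(f"{salt}:{truncated}".encode()).hexdigest()
```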

@mathemancer

Any record with the action “Mark NSFW” should:
Have “NSFW” added as metadata to the content record in the Catalog

Any record with the action “De-Index Content” should:
Use the existing endpoint (cc-archive/cccatalog-api#294) for copyright takedown.

To my understanding, the scope of this (meta) issue is limited to getting info into the DB at the service layer. @annatuma Is that correct? If so, do we want further issues to propagate that info back into the data layer, or should it stay in the service DB only? If it stays in the service DB only, we won't be able to use the meta_data field for holding the NSFW tag. We'd need some other plan there.


brenoferreira commented Apr 21, 2020

LGTM.

@annatuma the frontend work can start without the API work having to be complete. The API spec is enough for the work to start.


kgodey commented Apr 21, 2020

“{content type:Image} is Adult Content.”

I think we need to standardize on a single terminology here. The reporting calls it "Adult", the backend flag is "NSFW", and the filter is "Mature". This is going to cause confusion later. We should call it one thing everywhere.

We need to write a script, set to run every 12 hours, that looks for rows where the status of the record has been set to “yes” since the script last ran, and where the action taken in the table is either “Mark NSFW” or “De-Index Content”.

We're building API endpoints for this. Why do we want to run it every 12 hours instead of immediately? cc @aldenstpage

Additionally, I share @mathemancer's concerns about trying to identify and minimize abuse of the reporting functionality. We should address this in the design. I like the idea of storing anonymized IP addresses, but we need to update the mockup to clearly let people know that we're storing their anonymized IP address.

kgodey assigned annatuma and unassigned kgodey Apr 21, 2020

kgodey commented Apr 21, 2020

@annatuma assigning back to you for review of all the comments and potential updates to the spec.

@annatuma

Any record with the action “Mark NSFW” should:
Have “NSFW” added as metadata to the content record in the Catalog
Any record with the action “De-Index Content” should:
Use the existing endpoint (creativecommons/cccatalog-api#294) for copyright takedown.

To my understanding, the scope of this (meta) issue is limited to getting info into the DB at the service layer. @annatuma Is that correct? If so, do we want further issues to propagate that info back into the data layer, or should it stay in the service DB only? If it stays in the service DB only, we won't be able to use the meta_data field for holding the NSFW tag. We'd need some other plan there.

I'd prefer engineering input on what makes sense here. What's the use case for storing this in the data layer? Is there a reason the service layer doesn't suffice? Service layer is clearly enough for the frontend, but if there are reasons this should go back to the data layer we should evaluate those. @mathemancer @aldenstpage @kgodey please weigh in.

@annatuma

“{content type:Image} is Adult Content.”

I think we need to standardize on a single terminology here. The reporting calls it "Adult", the backend flag is "NSFW", and the filter is "Mature". This is going to cause confusion later. We should call it one thing everywhere.

Fine by me. Updating it to "Mature" everywhere.

We need to write a script, set to run every 12 hours, that looks for rows where the status of the record has been set to “yes” since the script last ran, and where the action taken in the table is either “Mark NSFW” or “De-Index Content”.

We're building API endpoints for this. Why do we want to run it every 12 hours instead of immediately? cc @aldenstpage

Immediately would be better; the 12-hour run came out of our earlier conversations about this feature. That said, whatever @aldenstpage says is feasible here (12 hours versus immediate) is fine by me.

Additionally, I share @mathemancer's concerns about trying to identify and minimize abuse of the reporting functionality. We should address this in the design. I like the idea of storing anonymized IP addresses, but we need to update the mockup to clearly let people know that we're storing their anonymized IP address.

I'll sync with @sarahpearson about the appropriate way to do this.


kgodey commented Apr 22, 2020

@aldenstpage please update the API spec to say "mature" instead of "nsfw" per @annatuma's update above.


aldenstpage commented Apr 22, 2020

To my understanding, the scope of this (meta) issue is limited to getting info into the DB at the service layer. @annatuma Is that correct? If so, do we want further issues to propagate that info back into the data layer, or should it stay in the service DB only? If it stays in the service DB only, we won't be able to use the meta_data field for holding the NSFW tag. We'd need some other plan there.

I envisioned this as getting propagated back to the data layer through Kafka as described here; in my mind you would handle these events by setting the appropriate flag in the meta_data field but you can represent it however you want.

An alternative design is to keep all logic related to mature content and DMCAs in the API layer in separate tables. We could perform mature content detection during ingestion (or in the future, a proper ETL pipeline). However, if you are able to detect that something is marked mature upstream (e.g. you find a way to get the NSFW flag from the Flickr API), you should still mark it in the meta_data field (or wherever else) so we can flag those images during ingestion. This would remove all responsibility for handling DMCAs and flagging from the data layer. I'll take this approach instead if you don't have any objections.

Additionally, I share @mathemancer's concerns about trying to identify and minimize abuse of the reporting functionality. We should address this in the design. I like the idea of storing anonymized IP addresses, but we need to update the mockup to clearly let people know that we're storing their anonymized IP address.

We temporarily store IPs on our servers for rate limiting purposes and store them with the last octet erased in our server logs for up to 3 months; we should have the technical means to catch abuse already. Data retention practices are described in our privacy policy.

We're building API endpoints for this. Why do we want to run it every 12 hours instead of immediately? cc @aldenstpage

There's no technical reason for this to take 12 hours, particularly if we are able to avoid having to track NSFW status in both the catalog and API layers.

@mathemancer

Long term, I'd want the flags to be in the data layer, since that would be part of any pipeline where we, say, used ML to figure out if other unflagged pictures might be mature content. Ideally, any metadata that was just about displaying the image would stay in the service layer, and any metadata that was 'inherent' to the image would end up in the data layer eventually. I think whether the image is considered 'Mature' or not falls under the latter.

But, we don't have a specific use for the 'Mature' flag in the data layer at the moment, so maybe I'm just falling prey to YAGNI fallacies.

@aldenstpage

Your last point is a good one; we can always extract the data later for training. I think it is also debatable whether matureness is inherent, since it is highly contextual to culture and personal bias; we're taking a "stance" that an image is mature based on what we think our audience wants, and that's probably going to be a moving target. I'll stick with keeping it internal to the API layer for now, and we can bridge that to the catalog later if the need arises.

@annatuma

@brenoferreira we have the text to add to the first screen of the user reporting form regarding collection of IP addresses:

"For security purposes, CC collects and retains anonymized IP addresses of those who complete and submit this form."

@panchovm is updating the mockup right now. Thanks to @sarahpearson for review and language.

@fcoveram

The mockup is now updated with this text.

kgodey moved this from Ready for Development to In Progress in Active Sprint Apr 30, 2020
Active Sprint automation moved this from In Progress to Done May 12, 2020
@annatuma

Everything is live in production and looks good.

We have a couple of minor follow-up issues (as is to be expected).
