Labeling & moderation updates [DRAFT] #1057

Merged: 27 commits merged Aug 4, 2023


pfrazee
Collaborator

pfrazee commented Jul 25, 2023

Blocked on bluesky-social/atproto#1366

  • Switch to new labeling SDK (Labeling & moderation updates to SDK atproto#1366)
  • Improved moderation UIs including explainer modals
  • Implement rendering of blocked and deleted quote posts
  • Fix incorrect handling of moderation action on quoted posts
  • Place replies which have moderation states at the bottom of the thread
  • New moderation behaviors:
    • Post and profile "alerts"
    • Profile information (display name, bio) blurring
    • Show moderation state on profile listings (e.g. muted, blocked, content warnings)
  • Fixes to filtering
    • Do not filter users from mutelists, search results, etc (this often creates confusion)
    • Do filter users from discovery (suggested follows)
    • Replies and quotes to muted and blocked content should now be filtered
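The filtering rules above depend on where a user or post appears. A minimal sketch of that decision, assuming a simplified model: the context names (`"mutelist"`, `"search"`, `"discovery"`) and the `UserModState`, `PostRef`, and `Post` shapes are hypothetical and are not the labeling SDK's actual API.

```typescript
// Hypothetical listing contexts in which a user can appear.
type ListContext = "mutelist" | "search" | "discovery";

// Hypothetical per-user moderation state, as seen by the viewer.
interface UserModState {
  muted: boolean;
  blocked: boolean;
}

// Hypothetical reference to the post that a reply or quote points at.
interface PostRef {
  authorMuted: boolean;
  authorBlocked: boolean;
}

// Hypothetical post model: a post may reply to and/or quote another post.
interface Post {
  replyTo?: PostRef;
  quoted?: PostRef;
}

// Mutelists and search results keep muted/blocked users visible (their state
// is shown instead); discovery surfaces such as suggested follows drop them.
function shouldFilterUser(ctx: ListContext, state: UserModState): boolean {
  if (ctx === "discovery") {
    return state.muted || state.blocked;
  }
  return false;
}

// Replies and quotes whose target comes from a muted or blocked author are filtered.
function shouldFilterPost(post: Post): boolean {
  const targets = [post.replyTo, post.quoted].filter(
    (t): t is PostRef => t !== undefined,
  );
  return targets.some((t) => t.authorMuted || t.authorBlocked);
}
```

Keeping the context as an explicit parameter makes the "filter in discovery, not in search" distinction a single branch rather than something spread across call sites.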

Future tasks:

  • Update moderation config to use labeling SDK
[Screenshots: four CleanShot captures dated 2023-07-28]

pfrazee force-pushed the paul/app-751-labeling-sdk branch from aea3f54 to 3bb4171 on July 28, 2023 03:23
@jvbreen1

How are you planning to handle labeling accounts as "pornography"? Will that be self-determined by those people, or applied automatically by an algorithm?

@schwegler

I don't understand why flagging a whole account as porn makes sense vs just the content? And I assume this means those whole accounts will poof out of the default iOS app by default?

And how would an account get unflagged?

@csmithchicago

Why label an entire account as pornography? Most sex worker accounts post things that are completely unrelated to their jobs and many non-pornographic accounts have explicit posts from time to time.

@Lyse369

Lyse369 commented Jul 28, 2023

Please make it so that individual posts can be labeled. Our SWs have more to offer than nudity. I also want to have their humor, their thoughts, and their interactions.

@sweetbeex

I will also urge you not to label whole accounts as pornography, as I have many times before. This, in addition to placing replies at the bottom, is becoming a form of shadowbanning and will likely lead to more targeting and other harms. A huge benefit of the platform is that posts are weighed individually, not as an aggregate for the account. With account-level shadowbans, there are few reasons for many people to use this platform over others. I would also urge you to speak with a variety of industry professionals about these things, such as Jessica Stoya, who has offered before.

@makeready

makeready commented Jul 28, 2023

the ability to self-flag nsfw posts would solve the same problem without having to label entire accounts. please consider how this feature will be used and abused

@arielsilvestri

In the history of account labeling, marking whole accounts as "pornography" has been greatly weaponized against sex workers and trans folks. I highly recommend removing that tag.

@schnauzerface

I think labeling entire accounts misses the nuance of the content users might be sharing. Blanket solutions like this don't work.

@pfrazee
Collaborator Author

pfrazee commented Jul 28, 2023

Please do not read into the usage of the "Pornography" label in these screenshots -- it's used as an example of some specific behaviors and is not meant to communicate our plans to use the label in that context.

I do understand the concerns about the potential for moderation tools to suppress valid activity on the network. I shouldn't have used that label in the PR screenshots, and I apologize for creating concerns in doing so. What I was intending to do was demonstrate a particular set of generic behaviors; I was not attempting to communicate anything about our policies.


A bit more context:

The labeling system is part of the overall moderation system, which is still being finished. Some aspects are still incomplete, such as processes for appeals and tools for self-labeling. This update specifically improves the implementation of labels, blocks, and mutes. It does not represent the final labeling system or the full moderation system; it is one piece of the overall system. You can read about these particular systems in the following (WIP) documents:

  • Labels Reference. This is not final and has some entries I'm pretty unsure of. IMPORTANTLY, the set of labels the system supports is wider than the set we actually apply, which I'll expand upon below.
  • Post Moderation Behaviors. This shows the effects of different moderation actions on posts in different contexts.
  • Profile Moderation Behaviors. This shows the effects of different moderation actions on profiles in different contexts.

Now let me clarify the question about the Pornography label. If you look in the Moderation Behaviors documents above, you'll see that the effects differ depending on where a label is applied. There are 3 main label targets being developed here (the account itself, the profile record, and a post record), and each has a diverse set of effects depending on the context. For instance:

  1. Placing a label that "blurs" content (for instance intolerant) on a profile will cause the avatar and the bio information to be hidden
  2. Placing a label that "blurs" media (for instance pornography) on a profile will only cause the avatar to be hidden
  3. Placing a label that "alerts" (for instance scam) on a profile will only cause a warning to show

There is a wide combination of behaviors here, and all of them are important capabilities for the system to have. Again, what I was trying to do was demonstrate the second behavior.
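The three examples above can be read as a lookup from a label's behavior to its effect on a profile record. A minimal sketch, assuming the behavior names `"blur"`, `"blur-media"`, and `"alert"` and a hypothetical `ProfileEffect` shape; the label-to-behavior mapping covers only the three examples and is illustrative, not the SDK's definitions.

```typescript
// Hypothetical behavior names for how a label affects what it is attached to.
type LabelBehavior = "blur" | "blur-media" | "alert";

// What the UI does when a label sits on a profile record.
interface ProfileEffect {
  blurAvatar: boolean;
  blurProfileInfo: boolean; // display name and bio
  showAlert: boolean;
}

// On a profile record, "blur" hides the avatar and profile info, "blur-media"
// hides only the avatar (the image attached to the record), and "alert" only warns.
const PROFILE_EFFECTS: Record<LabelBehavior, ProfileEffect> = {
  blur: { blurAvatar: true, blurProfileInfo: true, showAlert: false },
  "blur-media": { blurAvatar: true, blurProfileInfo: false, showAlert: false },
  alert: { blurAvatar: false, blurProfileInfo: false, showAlert: true },
};

// Illustrative label-to-behavior mapping for the three examples only.
const LABEL_BEHAVIORS = new Map<string, LabelBehavior>([
  ["intolerant", "blur"],
  ["pornography", "blur-media"],
  ["scam", "alert"],
]);

// Returns the profile effect for a label, or undefined for unknown labels.
function profileEffectFor(label: string): ProfileEffect | undefined {
  const behavior = LABEL_BEHAVIORS.get(label);
  return behavior === undefined ? undefined : PROFILE_EFFECTS[behavior];
}
```

Separating "which behavior a label has" from "what a behavior does on a given target" is what lets the same label produce different effects on posts, profiles, and accounts.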


I have more to say but I'm going to do so in a followup so as to not leave people waiting.

@arielsilvestri

@pfrazee What protections will there be for users to not be targeted with this sort of labelling structure? It's my understanding that the nature of AT protocol allows for a high degree of customization, building tools on top of existing infrastructure. How can you ensure this doesn't happen?

@pqcfox

pqcfox commented Jul 28, 2023

I would second that it's not the specific labels but the feature itself that is inherently abusable.

It seems like all of the proposed use cases would be better handled either (a) via account deletion for TOS violations or (b) by users simply seeing the account and bio and then deciding for themselves whether they want to follow/interact.

Most of the stuff like filtering porn could, as far as I can tell, be solved much more cleanly by just allowing users to self-tag posts.

@PurposeUnknown

@watsonkm core issue is where does the platform draw the line regarding proactively labeling content vs putting the onus on users to do so; I briefly go into detail here: https://bsky.app/profile/purposeunknown.xyz/post/3k3m42bqyja2i

@schwegler

@pfrazee Here is an account that was hidden due to nudity in the PFP (understandable). The user had no clue. Had to tell them myself cause (in a funny turn) all the text posts were hitting the Lewds feed and it's still like he doesn't exist if you don't opt into NSFW even after changing stuff.

From your response, is this refinement something that would alleviate this kind of situation? If so, can't you see how these flags on the grounds of "pornography" are incredibly worrying if we already have an account that's now stuck in a flagged state?

This is the account: https://bsky.app/profile/0b1nr4y.bsky.social

@pfrazee
Collaborator Author

pfrazee commented Jul 28, 2023

To be clear: it's an active dialogue to get these systems right, and then an active dialogue to create the policies for using these systems. It's going to take some iteration and active feedback. If you want to laugh, you should check out the insane matrix of effects in the post moderation doc. This thing is pretty complex, and I'm not going to claim I got it all right from the get go.

I want to address some specific questions:

  • Are the labels in the atproto repo the final set? No, they are not. They are drawn from the original proposal document shared a while back, but the labels will be an evolving system. I have some pretty specific concerns about the ones present in there; see for instance this commit where I dropped the CSAM label in favor of using other systems due to concerns about highlighting the illegal content. I have a similar concern about the nonconsensual-nudity label. The label set and behaviors are a work in progress.
  • Should labels ever be placed on entire accounts instead of individual records? Generally we should avoid it. A case where we would do that is systemic behavior like spam. A case where that would be going too far is porn. At the moment the label definitions don't specify which targets are valid for a label -- the thinking was that moderation teams would make the choice at their discretion -- but I can see a case being made for codifying the allowed targets in the label definitions.
  • Which of the labels does Bluesky plan to use, and when? I'm talking with the T&S team to get their current policy table for this. Will follow up.
  • Some of the labels seem pretty subjective, what's that about? If you look at the list of labels in the other PR you'll notice some that are pretty subjective, such as the Curation group. These are oriented toward some speculative design-thinking about community driven labeling, where you have a higher trust relationship with the people doing the labeling. I still haven't fully wrapped my head around how that will work, so I'm pretty ambivalent about having included them in the PR. It might be wise to cut those out until we're confident they can be applied in a trustworthy fashion, but I was trying to think ahead.
  • How does a label get appealed? Right now, you'll be contacted by our moderation team by email if an action is taken and you can appeal by emailing them. I'm actively working on the protocols for this in the federated context.
  • Is self-labeling coming? Yes, it's driving me crazy that it's taken so long to get done. I much prefer self-labeling.
  • What protections will there be for users to not be targeted with this sort of labelling structure? (from arielsilvestri) This is another protocol design choice I'm actively working on, because I'm also concerned about how labeling and user-lists could be abused in a system with no mechanisms for accountability. For the past 2 weeks, I have been heads-down on this code during the day and working on design docs for accountability and anti-abuse at night.
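The answer about account-level labels floats codifying the allowed targets in the label definitions. A minimal sketch of what that could look like, assuming a hypothetical `allowedTargets` field (the current definitions don't specify targets). The two example entries follow the policy described in this thread: spam may be placed on an account, porn may not.

```typescript
// The three label targets discussed in this PR.
type LabelTarget = "account" | "profile" | "post";

// Hypothetical label definition with the allowed targets written down.
interface LabelDefinition {
  id: string;
  allowedTargets: readonly LabelTarget[];
}

// Example entries: spam is systemic, so it may be placed on the account;
// porn is restricted to individual records (posts and the profile record).
const DEFINITIONS: readonly LabelDefinition[] = [
  { id: "spam", allowedTargets: ["account", "post"] },
  { id: "porn", allowedTargets: ["profile", "post"] },
];

const BY_ID = new Map(DEFINITIONS.map((d) => [d.id, d] as const));

// A moderation tool could call this before applying a label.
// Unknown labels are rejected rather than allowed by default.
function canApplyLabel(labelId: string, target: LabelTarget): boolean {
  const def = BY_ID.get(labelId);
  return def !== undefined && def.allowedTargets.includes(target);
}
```

Rejecting unknown labels by default means a new label must state its targets before it can be used anywhere, which moves the "account vs. record" decision from moderator discretion into the definition itself.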

Again, I apologize for how these PRs came off. I'm honestly not used to this much focused external attention on PRs. These came together after 2 weeks of pretty extensive heads-down work, and I was racing to the finish line last night at around 1am.

@pfrazee
Collaborator Author

pfrazee commented Jul 28, 2023

> @pfrazee Here is an account that was hidden due to nudity in the PFP (understandable). The user had no clue. Had to tell them myself cause (in a funny turn) all the text posts were hitting the Lewds feed and it's still like he doesn't exist if you don't opt into NSFW even after changing stuff.
>
> From your response, is this refinement something that would alleviate this kind of situation? If so, can't you see how these flags on the grounds of "pornography" are incredibly worrying if we already have an account that's now stuck in a flagged state?
>
> This is the account: https://bsky.app/profile/0b1nr4y.bsky.social

Yes! That is a case which we are specifically attempting to fix with this update. Because the label targets are now much more refined, it will be possible to apply the nudity label to the profile record. The nudity label uses the blur-media behavior, which on profiles means it will only cause the PFP to get blurred.

@annab6382

Paul thanks for that clarification, but do you have time to change the example images used to not mention pornography? x

@sweetbeex

sweetbeex commented Jul 28, 2023

> > @pfrazee Here is an account that was hidden due to nudity in the PFP (understandable). The user had no clue. Had to tell them myself cause (in a funny turn) all the text posts were hitting the Lewds feed and it's still like he doesn't exist if you don't opt into NSFW even after changing stuff.
> >
> > From your response, is this refinement something that would alleviate this kind of situation? If so, can't you see how these flags on the grounds of "pornography" are incredibly worrying if we already have an account that's now stuck in a flagged state?
> >
> > This is the account: https://bsky.app/profile/0b1nr4y.bsky.social
>
> Yes! That is a case which we are specifically attempting to fix with this update. Because the label targets are now much more refined, it will be possible to apply the nudity label to the profile record. The nudity label uses the blur-media behavior, which on profiles means it will only cause the PFP to get blurred.

It needs to be clear to new users that not using a SFW profile pic results in actions like this. Additionally, some sort of yes/no "is this okay" scan on the pic before publishing would be beneficial.

@pfrazee
Collaborator Author

pfrazee commented Jul 28, 2023

The T&S team gave me this list as the current labeling policy. I will describe their effects according to the PRs as they are right now.

  • nudity
    • Definition: Nudity which is not sexual, or that is primarily artistic in nature.
    • Applied to:
      • posts - blurs the images
      • profiles - blurs the PFP
  • porn
    • Definition: Images of full-frontal nudity (genitalia) in any sexualized context, or explicit sexual activity (meaning contact with genitalia or breasts) even if partially covered.
    • Applied to:
      • posts - blurs the images
      • profiles - blurs the PFP
  • sexual
    • Definition: Content that does not meet the level of "pornography", but is still sexual. Some common examples have been selfies and hornyposting with underwear on.
    • Applied to:
      • posts - blurs the images
      • profiles - blurs the PFP
  • gore
    • Definition: Intended for shocking images, typically involving blood or visible wounds
    • Applied to:
      • posts - blurs the images
      • profiles - blurs the avatar
  • nsfl
    • Definition: "Not Suitable For Life." This includes graphic images like goatse.
    • Applied to:
      • posts - blurs the images
      • profiles - blurs the avatar
  • torture
    • Definition: Depictions of torture of a human or animal
    • Applied to:
      • posts - blurs the post
      • profiles - blurs the avatar, display name, and bio
  • self-harm
    • Definition: A visual depiction (photo or figurative) of cutting, suicide, or similar. Used for images only.
    • Applied to:
      • posts - blurs the images
      • profiles - blurs the avatar
  • impersonation
    • Definition: Accounts which falsely assert some identity
    • Applied to:
      • account - places an alert on the profile, the avatar, and their posts
  • spam
    • Definition: Repeat, low-quality messages which are clearly not designed to add to a conversation or space
    • Applied to:
      • account - places an alert on the profile, blurs the avatar, blurs their posts
      • posts - blurs the post
  • intolerant-*
    • Definition: A set of labels applied to hateful or intolerant content; the individual labels include race, gender, sexual orientation, religion, and a catchall.
    • Applied to:
      • posts - blurs the post
      • profiles - blurs the avatar, display name, and bio
  • icon-intolerant
    • Definition: Visual imagery associated with a hate group, such as the KKK or Nazi, in any context (supportive, critical, documentary, etc)
    • Applied to:
      • posts - blurs the images
      • profiles - blurs the avatar
  • !warn
    • Definition: A general "content warning" used as an emergency fallback for situations that aren't covered by anything else.
    • Applied to:
      • posts - blurs the post
      • profiles - blurs the avatar, display name, and bio
      • account - blurs the avatar, display name, bio, and posts
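The policy list above can be encoded as plain data, which makes the per-target effects easy to check. A minimal sketch, assuming hypothetical effect names that paraphrase the list; the concrete `intolerant-*` label ids (e.g. `intolerant-race`) are also assumed, so the sketch matches that family by prefix.

```typescript
type Target = "account" | "profile" | "post";

// Hypothetical effect names paraphrasing the policy list above.
type Effect =
  | "blur-images"
  | "blur-post"
  | "blur-posts" // account-level: blur all of the account's posts
  | "blur-avatar"
  | "blur-display-name"
  | "blur-bio"
  | "alert-profile"
  | "alert-avatar"
  | "alert-posts";

type PolicyEntry = Partial<Record<Target, readonly Effect[]>>;

// Shared shapes: media labels blur images/avatar; content labels blur more.
const MEDIA: PolicyEntry = { post: ["blur-images"], profile: ["blur-avatar"] };
const CONTENT: PolicyEntry = {
  post: ["blur-post"],
  profile: ["blur-avatar", "blur-display-name", "blur-bio"],
};

const POLICY = new Map<string, PolicyEntry>([
  ["nudity", MEDIA],
  ["porn", MEDIA],
  ["sexual", MEDIA],
  ["gore", MEDIA],
  ["nsfl", MEDIA],
  ["torture", CONTENT],
  ["self-harm", MEDIA],
  ["impersonation", { account: ["alert-profile", "alert-avatar", "alert-posts"] }],
  ["spam", { account: ["alert-profile", "blur-avatar", "blur-posts"], post: ["blur-post"] }],
  ["intolerant-*", CONTENT],
  ["icon-intolerant", MEDIA],
  ["!warn", { ...CONTENT, account: ["blur-avatar", "blur-display-name", "blur-bio", "blur-posts"] }],
]);

// Exact ids first, then wildcard families such as "intolerant-*".
function lookup(label: string): PolicyEntry | undefined {
  const exact = POLICY.get(label);
  if (exact !== undefined) return exact;
  const wildcard = Array.from(POLICY.entries()).find(
    ([key]) => key.endsWith("*") && label.startsWith(key.slice(0, -1)),
  );
  return wildcard?.[1];
}

// Effects of a label on a given target; empty if the policy does not use it there.
function effectsFor(label: string, target: Target): readonly Effect[] {
  return lookup(label)?.[target] ?? [];
}
```

An empty result for a target (for example `porn` on an account) mirrors the list: that combination is simply not part of the current policy.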

@pfrazee
Collaborator Author

pfrazee commented Jul 28, 2023

> Paul thanks for that clarification, but do you have time to change the example images used to not mention pornography? x

I don't have time atm to replace them so I just removed them for now.

@antoniocali

nsfl label? Didn't you mean NSFW?

@WizardShotTheFood

> nsfl label? Didn't you mean NSFW?

He specifically cited Goatse as an example of NSFL. And while you're right that it's not safe for work, very few people are going to be sticking their hands in their pants as a result of an image like that.

@ds84182

ds84182 commented Jul 28, 2023

@antoniocali nsfl looks to be related to gore but without blood. guessing something like an X-ray of a severely mangled hand would fit there. grotesque is clearer to me imo.

@sweetbeex

notes:

  1. self flagging nsfw content will not solve this issue since the proposals say that the most extreme label will always be adopted

  2. I do not understand why a nude pfp would cause all media posts to be labeled instead of just the pfp itself being blurred. A NSFW profile photo does not affect everything else an account does, unlike a pfp that implies impersonation affecting all actions by an account

@pfrazee
Collaborator Author

pfrazee commented Aug 1, 2023

> I do not understand why a nude pfp would cause all media posts to be labeled instead of just the pfp itself being blurred. A NSFW profile photo does not affect everything else an account does, unlike a pfp that implies impersonation affecting all actions by an account

I think you may be responding to something which is currently happening to a user. That is not the desired or intended behavior, and part of this PR is to address that.

The correct behavior, as this PR is intended to establish, is to label the profile record for nudity. Since nudity's effect is to blur media, the only intended result is to blur the PFP. (The logic of this being that the PFP is an image record attached to the profile record.)

@sweetbeex

> > I do not understand why a nude pfp would cause all media posts to be labeled instead of just the pfp itself being blurred. A NSFW profile photo does not affect everything else an account does, unlike a pfp that implies impersonation affecting all actions by an account
>
> I think you may be responding to something which is currently happening to a user. That is not the desired or intended behavior, and part of this PR is to address that.
>
> The correct behavior, as this PR is intended to establish, is to label the profile record for nudity. Since nudity's effect is to blur media, the only intended result is to blur the PFP. (The logic of this being that the PFP is an image record attached to the profile record.)

Ty Paul, I'll pass this forward

@schwegler

> > I do not understand why a nude pfp would cause all media posts to be labeled instead of just the pfp itself being blurred. A NSFW profile photo does not affect everything else an account does, unlike a pfp that implies impersonation affecting all actions by an account
>
> I think you may be responding to something which is currently happening to a user. That is not the desired or intended behavior, and part of this PR is to address that.
>
> The correct behavior, as this PR is intended to establish, is to label the profile record for nudity. Since nudity's effect is to blur media, the only intended result is to blur the PFP. (The logic of this being that the PFP is an image record attached to the profile record.)

Thanks. I had assumed as much from your later comments and also tried to clarify for people freaking out on BSky tonight that this is, on the whole, a current problem that this PR actually fixes somewhat. <3

pfrazee force-pushed the paul/app-751-labeling-sdk branch from d123042 to 763d983 on August 3, 2023 17:33
pfrazee marked this pull request as ready for review on August 3, 2023 18:10

estrattonbailey (Member) left a comment

Code LGTM, will pull down and do some testing too

uri: (record.embed.record as ComAtprotoRepoStrongRef.Main).uri,
},
}
console.log(post.embed)
Member

Log?

Collaborator Author

Good catch, thanks. Removed.

| PostThreadItemModel
| AppBskyFeedDefs.NotFoundPost
| AppBskyFeedDefs.BlockedPost
function sortThread(item: MaybeThreadItem) {
Member

This is real neato
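The `sortThread` function in the review above is presumably related to "place replies which have moderation states at the bottom of the thread" from the PR description. The PR's own implementation isn't shown here, so this is only a minimal sketch under assumptions: a hypothetical `ThreadNode` shape with a boolean `hasModeration` flag standing in for the real moderation state, and a hypothetical function name `sortRepliesModeratedLast`.

```typescript
// Hypothetical thread node; the real models (PostThreadItemModel, NotFoundPost,
// BlockedPost) carry much more information.
interface ThreadNode {
  uri: string;
  hasModeration: boolean; // true if any label, mute, or block applies
  replies: ThreadNode[];
}

// Recursively moves replies with a moderation state below the others.
// Array.prototype.sort is stable (ES2019+), so relative order within each
// group is preserved. The input tree is not mutated: map() builds new arrays.
function sortRepliesModeratedLast(node: ThreadNode): ThreadNode {
  const replies = node.replies
    .map(sortRepliesModeratedLast)
    .sort((a, b) => Number(a.hasModeration) - Number(b.hasModeration));
  return { ...node, replies };
}
```

Using a stable sort on a single boolean key keeps the existing reply order inside both groups, so moderated replies sink without otherwise reshuffling the thread.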

pfrazee merged commit b154d3e into main on Aug 4, 2023
pfrazee deleted the paul/app-751-labeling-sdk branch on August 4, 2023 05:08