Labeling & moderation updates [DRAFT] #1057

pfrazee · 2023-07-25T16:14:14Z

Blocked on bluesky-social/atproto#1366

Future tasks:

Update moderation config to use labeling SDK

jvbreen1 · 2023-07-28T18:18:27Z

How are you planning on handling labeling accounts as "pornography"? Will that be self-determined by those people, or is it applied automatically based on an algorithm?

schwegler · 2023-07-28T18:29:05Z

I don't understand why flagging a whole account as porn makes sense vs just the content? And I assume this means those whole accounts will poof out of the default iOS app by default?

And how would an account get unflagged?

csmithchicago · 2023-07-28T18:38:20Z

Why label an entire account as pornography? Most sex worker accounts post things that are completely unrelated to their jobs and many non-pornographic accounts have explicit posts from time to time.

Lyse369 · 2023-07-28T18:40:42Z

Please make it so that individual posts can be labeled. Our SWs have more to offer than nudity. I also want to have their humor, their thoughts, and their interactions.

sweetbeex · 2023-07-28T18:44:11Z

I will also urge you not to label whole accounts as pornography, as I have many times before. This in addition to replies at bottom is becoming a level of shadowbanning and will likely lead to more targeting and other harms. A huge benefit of the platform is that posts are weighed individually, not as an aggregate for the account. With account level shadowbans there are few reasons for many to use this platform over others. I would also urge you to speak with a variety of industry professionals about these things such as Jessica Stoya who has offered before.

makeready · 2023-07-28T18:45:42Z

the ability to self-flag nsfw posts would solve the same problem without having to label entire accounts. please consider how this feature will be used and abused

arielsilvestri · 2023-07-28T18:48:00Z

In the history of account labeling, marking whole accounts as "pornography" has been greatly weaponized against sex workers and trans folks. I highly recommend removing that tag.

schnauzerface · 2023-07-28T18:50:35Z

I think labeling entire accounts misses the nuance of the content users might be sharing. Blanket solutions like this don't work.

pfrazee · 2023-07-28T18:51:27Z

Please do not read into the usage of the "Pornography" label in these screenshots -- it's used as an example of some specific behaviors and is not meant to communicate our plans to use the label in that context.

I do understand the concerns about the potential for moderation tools to suppress valid activity on the network. I shouldn't have used that label in the PR screenshots, and I apologize for creating concerns in doing so. What I was intending to do was demonstrate a particular set of generic behaviors; I was not attempting to communicate anything about our policies.

A bit more context:

The labeling system is one part of the overall moderation system, which is still being finished. Some aspects, such as the appeals process and tools for self-labeling, are still incomplete. This update specifically improves the implementation of labels, blocks, and mutes. It does not represent the final labeling system or the full moderation system, but is rather a piece of the overall system. You can read about these particular systems in the following (WIP) documents:

- Labels Reference. This is not final and has some entries I'm pretty unsure of. IMPORTANTLY the set of labels which the system supports is wider than the ones we do apply, which I'll expand upon below.
- Post Moderation Behaviors. This shows the effects of different moderation actions on posts in different contexts.
- Profile Moderation Behaviors. This shows the effects of different moderation actions on profiles in different contexts.

Now let me clarify the question about the Pornography label. If you look in the Moderation Behaviors documents above, you'll see that a label has different effects depending on where it is applied. There are 3 main label targets being developed here (the account itself, the profile record, and a post record), and each produces a different set of effects depending on the context. For instance:

- Placing a label that "blurs" content (for instance intolerant) on a profile will cause the avatar and the bio information to be hidden
- Placing a label that "blurs" media (for instance pornography) on a profile will only cause the avatar to be hidden
- Placing a label that "alerts" (for instance scam) on a profile will only cause a warning to show

There is a wide combination of behaviors here, and all of them are important capabilities for the system to have. Again, what I was trying to do was demonstrate the second behavior.
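The three profile cases listed above can be sketched as a small lookup. This is only an illustration: `LabelBehavior`, `ProfileView`, and `profileEffect` are hypothetical names, not the labeling SDK's API, and whether a "blur" label also shows a warning is not stated in this thread, so that field is an assumption.

```typescript
// Hypothetical types for illustration; not taken from @atproto/api.
type LabelBehavior = "blur" | "blur-media" | "alert";

interface ProfileView {
  hideAvatar: boolean;
  hideBio: boolean;
  showWarning: boolean;
}

// Effect of one label placed on a profile record, following the three cases
// described in the comment above.
function profileEffect(behavior: LabelBehavior): ProfileView {
  switch (behavior) {
    case "blur":
      // Avatar and bio are hidden. Assumption: no separate warning is shown.
      return { hideAvatar: true, hideBio: true, showWarning: false };
    case "blur-media":
      // Only the avatar is hidden.
      return { hideAvatar: true, hideBio: false, showWarning: false };
    case "alert":
      // Only a warning is shown; nothing is hidden.
      return { hideAvatar: false, hideBio: false, showWarning: true };
  }
}
```

Under this sketch, `profileEffect("blur-media")` hides the avatar and leaves the bio visible, which is the second behavior the screenshots were meant to demonstrate.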

I have more to say but I'm going to do so in a followup so as to not leave people waiting.

arielsilvestri · 2023-07-28T18:58:11Z

@pfrazee What protections will there be for users to not be targeted with this sort of labelling structure? It's my understanding that the nature of AT protocol allows for a high degree of customization, building tools on top of existing infrastructure. How can you ensure this doesn't happen?

pqcfox · 2023-07-28T19:03:33Z

I would second that it's not the actual specific labels, but the feature itself here which is inherently abusable.

It seems like all of the proposed use cases would be better handled either (a) via account deletion for TOS violation or (b) by users simply seeing the account and bio and then deciding themselves whether they want to follow/interact.

Most of the stuff like filtering porn could, as far as I can tell, be solved much more cleanly by just allowing users to self-tag posts.

PurposeUnknown · 2023-07-28T19:08:14Z

@watsonkm core issue is where does the platform draw the line regarding proactively labeling content vs putting the onus on users to do so; I briefly go into detail here: https://bsky.app/profile/purposeunknown.xyz/post/3k3m42bqyja2i

schwegler · 2023-07-28T19:23:11Z

@pfrazee Here is an account that was hidden due to nudity in the PFP (understandable). The user had no clue. Had to tell them myself cause (in a funny turn) all the text posts were hitting the Lewds feed and it's still like he doesn't exist if you don't opt into NSFW even after changing stuff.

From your response, is this refinement something that would alleviate this kind of situation? If so, can't you see how these flags on the grounds of "pornography" is incredibly worrying if we already have an account that's now stuck in a flagged state?

This is the account: https://bsky.app/profile/0b1nr4y.bsky.social

pfrazee · 2023-07-28T19:26:24Z

To be clear: it's an active dialogue to get these systems right, and then an active dialogue to create the policies for using these systems. It's going to take some iteration and active feedback. If you want to laugh, you should check out the insane matrix of effects in the post moderation doc. This thing is pretty complex, and I'm not going to claim I got it all right from the get go.

I want to address some specific questions:

- Are the labels in the atproto repo the final set? No, they are not. They are drawn from the original proposal document shared a while back, but the labels will be an evolving system. I have some pretty specific concerns about the ones present in there; see for instance this commit where I dropped the CSAM label in favor of using other systems due to concerns about highlighting the illegal content. I have a similar concern about the nonconsenual-nudity label. The label set and behaviors are a work in progress.
- Should labels ever be placed on entire accounts instead of individual records? Generally we should avoid it. A case where we would do that is systemic behavior like spam. A case where that would be going too far is porn. At the moment the label definitions don't specify which targets are valid for a label -- the thinking was that moderation teams would make the choice at their discretion -- but I can see a case being made for codifying the allowed targets in the label definitions.
- Which of the labels does Bluesky plan to use, and when? I'm talking with the T&S team to get their current policy table for this. Will follow up.
- Some of the labels seem pretty subjective, what's that about? If you look at the list of labels in the other PR you'll notice some that are pretty subjective, such as the Curation group. These are oriented toward some speculative design-thinking about community driven labeling, where you have a higher trust relationship with the people doing the labeling. I still haven't fully wrapped my head around how that will work, so I'm pretty ambivalent about having included them in the PR. It might be wise to cut those out until we're confident they can be applied in a trustworthy fashion, but I was trying to think ahead.
- How does a label get appealed? Right now, you'll be contacted by our moderation team by email if an action is taken and you can appeal by emailing them. I'm actively working on the protocols for this in the federated context.
- Is self-labeling coming? Yes, it's driving me crazy that it's taken so long to get done. I much prefer self-labeling.
- What protections will there be for users to not be targeted with this sort of labelling structure? (from arielsilvestri) This is another protocol design choice I'm actively working on because I'm also actively concerned about how labeling and user-lists could be abused in a system with no mechanisms for accountability. For the past 2 weeks, I have been heads-down on this code during the day, and working on design docs for accountability and anti-abuse at night.
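As a rough sketch of what "codifying the allowed targets in the label definitions" could look like: the `allowedTargets` field and the `canApply` helper below are hypothetical and do not exist in the current label definitions. The two sample entries only mirror the examples given above (spam may reasonably target an account, porn should not).

```typescript
// Hypothetical shape; the real label definitions have no such field today.
type LabelTarget = "account" | "profile" | "post";

interface LabelDefinition {
  id: string;
  allowedTargets: readonly LabelTarget[];
}

// Assumed sample entries for illustration only.
const definitions: readonly LabelDefinition[] = [
  { id: "spam", allowedTargets: ["account", "post"] },
  { id: "porn", allowedTargets: ["profile", "post"] },
];

// Returns true only when the label is known and the target is allowed for it.
function canApply(labelId: string, target: LabelTarget): boolean {
  return definitions.some(
    (def) => def.id === labelId && def.allowedTargets.indexOf(target) !== -1,
  );
}
```

With a check like this, a moderation tool could reject an account-level porn label before it is ever emitted, rather than relying on moderator discretion alone.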

Again, I apologize for how these PRs came off. I'm honestly not used to this much focused external attention on PRs. These came together after 2 weeks of pretty extensive heads-down work, and I was racing to the finish line last night at around 1am.

pfrazee · 2023-07-28T19:27:57Z

> @pfrazee Here is an account that was hidden due to nudity in the PFP (understandable). The user had no clue. Had to tell them myself cause (in a funny turn) all the text posts were hitting the Lewds feed and it's still like he doesn't exist if you don't opt into NSFW even after changing stuff.
>
> From your response, is this refinement something that would alleviate this kind of situation? If so, can't you see how these flags on the grounds of "pornography" is incredibly worrying if we already have an account that's now stuck in a flagged state?
>
> This is the account: https://bsky.app/profile/0b1nr4y.bsky.social

Yes! That is a case which we are specifically attempting to fix with this update. Because the label targets are now much more refined, it will be possible to apply the nudity label to the profile record. The nudity label uses the blur-media behavior, which on profiles means it will only cause the PFP to get blurred.

annab6382 · 2023-07-28T19:31:36Z

Paul thanks for that clarification, but do you have time to change the example images used to not mention pornography? x

sweetbeex · 2023-07-28T19:43:22Z

> > @pfrazee Here is an account that was hidden due to nudity in the PFP (understandable). The user had no clue. Had to tell them myself cause (in a funny turn) all the text posts were hitting the Lewds feed and it's still like he doesn't exist if you don't opt into NSFW even after changing stuff.
> >
> > From your response, is this refinement something that would alleviate this kind of situation? If so, can't you see how these flags on the grounds of "pornography" is incredibly worrying if we already have an account that's now stuck in a flagged state?
> >
> > This is the account: https://bsky.app/profile/0b1nr4y.bsky.social
>
> Yes! That is a case which we are specifically attempting to fix with this update. Because the label targets are now much more refined, it will be possible to apply the nudity label to the profile record. The nudity label uses the blur-media behavior, which on profiles means it will only cause the PFP to get blurred.

it needs to be clear to new users that not using a SFW profile pic results in actions like this. additionally some sort of yes/no this is okay or not scan on the pic before publishing would be beneficial

pfrazee · 2023-07-28T19:46:26Z

The T&S team gave me this list as the current labeling policy. I will describe their effects according to the PRs as they are right now.

nudity
- Definition: Nudity which is not sexual, or that is primarily artistic in nature.
- Applied to:
  - posts - blurs the images
  - profiles - blurs the PFP
porn
- Definition: Images of full-frontal nudity (genitalia) in any sexualized context, or explicit sexual activity (meaning contact with genitalia or breasts) even if partially covered.
- Applied to:
  - posts - blurs the images
  - profiles - blurs the PFP
sexual
- Definition: Content that does not meet the level of "pornography", but is still sexual. Some common examples have been selfies and hornyposting with underwear on.
- Applied to:
  - posts - blurs the images
  - profiles - blurs the PFP
gore
- Definition: Intended for shocking images, typically involving blood or visible wounds
- Applied to:
  - posts - blurs the images
  - profiles - blurs the avatar
nsfl
- Definition: "Not Suitable For Life." This includes graphic images like goatse.
- Applied to:
  - posts - blurs the images
  - profiles - blurs the avatar
torture
- Definition: Depictions of torture of a human or animal
- Applied to:
  - posts - blurs the post
  - profiles - blurs the avatar, display name, and bio
self-harm
- Definition: A visual depiction (photo or figurative) of cutting, suicide, or similar. Used for images only.
- Applied to:
  - posts - blurs the images
  - profiles - blurs the avatar
impersonation
- Definition: Accounts which falsely assert some identity
- Applied to:
  - account - places an alert on the profile, the avatar, and their posts
spam
- Definition: Repeat, low-quality messages which are clearly not designed to add to a conversation or space
- Applied to:
  - account - places an alert on the profile, blurs the avatar, blurs their posts
  - posts - blurs the post
intolerant-*
- Definition: A set of labels applied to hateful or intolerant content; the individual labels include race, gender, sexual orientation, religion, and a catchall.
- Applied to:
  - posts - blurs the post
  - profiles - blurs the avatar, display name, and bio
icon-intolerant
- Definition: Visual imagery associated with a hate group, such as the KKK or Nazi, in any context (supportive, critical, documentary, etc)
- Applied to:
  - posts - blurs the images
  - profiles - blurs the avatar
!warn
- Definition: A general "content warning" used as an emergency fallback to handling situations that aren't covered by anything else.
- Applied to:
  - posts - blurs the post
  - profiles - blurs the avatar, display name, and bio
  - account - blurs the avatar, display name, bio, and posts
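For readers who prefer data to prose, part of the policy list above can be written as a lookup table. This is a sketch only: the `policy` object, the singular target keys, and `effectFor` are hypothetical and are not how the app or `@atproto/api` stores the policy. Only four of the labels are included, with their effects copied from the list.

```typescript
// Hypothetical encoding of a subset of the policy list above.
type Target = "account" | "profile" | "post";
type Policy = Record<string, Partial<Record<Target, string>>>;

const policy: Policy = {
  nudity: {
    post: "blurs the images",
    profile: "blurs the PFP",
  },
  impersonation: {
    account: "places an alert on the profile, the avatar, and their posts",
  },
  spam: {
    account: "places an alert on the profile, blurs the avatar, blurs their posts",
    post: "blurs the post",
  },
  "!warn": {
    post: "blurs the post",
    profile: "blurs the avatar, display name, and bio",
    account: "blurs the avatar, display name, bio, and posts",
  },
};

// Returns the listed effect, or undefined when the policy does not apply
// that label to that target (or the label is not in this subset).
function effectFor(label: string, target: Target): string | undefined {
  return policy[label]?.[target];
}
```

For example, `effectFor("nudity", "profile")` gives "blurs the PFP", while `effectFor("nudity", "account")` is `undefined` because the list has no account-level entry for nudity.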

pfrazee · 2023-07-28T19:47:16Z

> Paul thanks for that clarification, but do you have time to change the example images used to not mention pornography? x

I don't have time atm to replace them so I just removed them for now.

antoniocali · 2023-07-28T21:06:43Z

nsfl label? Didn't you mean NSFW?

WizardShotTheFood · 2023-07-28T21:51:07Z

> nsfl label? Didn't you mean NSFW?

He specifically cited Goatse as an example of NSFL. And while you're right that it's not safe for work, very few people are going to be sticking their hands in their pants as a result of an image like that.

ds84182 · 2023-07-28T21:53:52Z

@antoniocali nsfl looks to be related to gore but without blood. guessing something like an X-ray of a severely mangled hand would fit there. grotesque is clearer to me imo.

sweetbeex · 2023-08-01T02:28:37Z

notes:

- self flagging nsfw content will not solve this issue since the proposals say that the most extreme label will always be adopted
- I do not understand why a nude pfp would cause all media posts to be labeled instead of just the pfp itself being blurred. A NSFW profile photo does not affect everything else an account does, unlike a pfp that implies impersonation affecting all actions by an account

pfrazee · 2023-08-01T02:35:28Z

> I do not understand why a nude pfp would cause all media posts to be labeled instead of just the pfp itself being blurred. A NSFW profile photo does not affect everything else an account does, unlike a pfp that implies impersonation affecting all actions by an account

I think you may be responding to something which is currently happening to a user. That is not the desired or intended behavior, and part of this PR is to address that.

The correct behavior, as this PR is intended to establish, is to label the profile record for nudity. Since nudity's effect is to blur media, the only intended result is to blur the PFP. (The logic of this being that the PFP is an image record attached to the profile record.)

sweetbeex · 2023-08-01T02:48:49Z

> > I do not understand why a nude pfp would cause all media posts to be labeled instead of just the pfp itself being blurred. A NSFW profile photo does not affect everything else an account does, unlike a pfp that implies impersonation affecting all actions by an account
>
> I think you may be responding to something which is currently happening to a user. That is not the desired or intended behavior, and part of this PR is to address that.
>
> The correct behavior, as this PR is intended to establish, is to label the profile record for nudity. Since nudity's effect is to blur media, the only intended result is to blur the PFP. (The logic of this being that the PFP is an image record attached to the profile record.)

Ty Paul, I'll pass this forward

schwegler · 2023-08-01T02:55:41Z

> > I do not understand why a nude pfp would cause all media posts to be labeled instead of just the pfp itself being blurred. A NSFW profile photo does not affect everything else an account does, unlike a pfp that implies impersonation affecting all actions by an account
>
> I think you may be responding to something which is currently happening to a user. That is not the desired or intended behavior, and part of this PR is to address that.
>
> The correct behavior, as this PR is intended to establish, is to label the profile record for nudity. Since nudity's effect is to blur media, the only intended result is to blur the PFP. (The logic of this being that the PFP is an image record attached to the profile record.)

Thanks. I had assumed as much from your later comments and also tried to clarify for people freaking out on BSky tonight. That this is, on the whole, a current problem that this actually fixes somewhat. <3

estrattonbailey

Code LGTM, will pull down and do some testing too

estrattonbailey · 2023-08-03T20:35:08Z

src/lib/api/hack-add-deleted-embed.ts

+        uri: (record.embed.record as ComAtprotoRepoStrongRef.Main).uri,
+      },
+    }
+    console.log(post.embed)


Good catch, thanks. Removed.

estrattonbailey · 2023-08-03T20:40:45Z

src/state/models/content/post-thread.ts

+  | PostThreadItemModel
+  | AppBskyFeedDefs.NotFoundPost
+  | AppBskyFeedDefs.BlockedPost
+function sortThread(item: MaybeThreadItem) {


This is real neato

pfrazee force-pushed the paul/app-751-labeling-sdk branch from aea3f54 to 3bb4171 Compare July 28, 2023 03:23

intrnl mentioned this pull request Jul 30, 2023

Self Tagging Sensitive Content bluesky-social/atproto#1403

Closed

pfrazee added 2 commits August 3, 2023 10:32

First pass moving to the new labeling sdk (it compiles)

bfc8725

Correct behaviors around interpreting label moderation

52c7705

pfrazee added 20 commits August 3, 2023 10:32

Improve moderation state rendering

8b503f8

Improve hiders and alerts

c3066f2

Improve handling of mutes

5909b1a

Improve profile warnings

b94659b

Add profile blurring to profile header

f0c2b73

Add blocks to test cases

b4154be

Render labels on profile cards, do not filter

9884974

Filter profiles from suggestions using moderation

feab42a

Apply profile blurring to ProfileCard

0a71083

Handle blocked and deleted quote posts

b000ef0

Temporarily translate content filtering settings to new labels

085d13e

Fix types

3ecc3c0

Tune ContentHider & PostHider click targets

d4574db

Put a warning on profilecard label pills

f3580b6

Fix screenhider learnmore link on mobile

b4bc5d0

Enforce no-override on user avatar

9559ab9

Dont enumerate profile blur-media labels in alerts

aae350f

Fixes to muted posts (esp quotes of muted users)

dec92b5

Fixes to account/profile warnings

099c51f

Bump @atproto/api@0.5.0

763d983

pfrazee force-pushed the paul/app-751-labeling-sdk branch from d123042 to 763d983 Compare August 3, 2023 17:33

Bump @atproto/api@0.5.1

549b5c2

pfrazee marked this pull request as ready for review August 3, 2023 18:10

pfrazee added 2 commits August 3, 2023 11:18

Fix tests

78442f7

1.43

48cdfb3

estrattonbailey reviewed Aug 3, 2023

pfrazee added 2 commits August 3, 2023 16:46

Remove log

e4defed

Bump @atproto/api@0.5.2

ce45dae

pfrazee merged commit b154d3e into main Aug 4, 2023

pfrazee deleted the paul/app-751-labeling-sdk branch August 4, 2023 05:08
