
Project Proposal: Detecting, filtering, and blurring results that include sensitive terms #873

Merged
merged 17 commits into main from kickoff/detecting-sensitive-textual-content on Mar 17, 2023

Conversation

sarayourfriend
Contributor

@sarayourfriend sarayourfriend commented Mar 9, 2023

Project proposal for #377

Due date

Review for this proposal should aim to be completed by 24 March 2023, roughly two weeks from when this PR was undrafted.

Notes

I've added a project-proposal.md template file, based on the approach I took here, to assist future project proposal writers in getting started. The sections are based on the requirements for project proposals described in the project process documentation.

Reviewers: Please make note of the two outstanding questions. I've made recommendations for each, but I would like particular consideration from y'all on these two points, whether you agree or disagree with my recommendations.

I pinged @krysal and @AetherUnbound semi-randomly as the reviewers, but if y'all aren't able to review this proposal over the next two weeks, please let me know and I'll find other folks to do so.

Checklist

  • My pull request has a descriptive title (not a vague title like
    Update index.md).
  • My pull request targets the default branch of the repository (main) or
    a parent feature branch.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • [N/A] I added or updated tests for the changes I made (if applicable).
  • [N/A] I added or updated documentation (if applicable).
  • [N/A] I tried running the project locally and verified that there are no visible
    errors.


@openverse-bot openverse-bot added this to In progress in Openverse PRs Mar 9, 2023
@openverse-bot openverse-bot added the 🚦 status: awaiting triage Has not been triaged & therefore, not ready for work label Mar 9, 2023

github-actions bot commented Mar 9, 2023

Full-stack documentation: Ready

https://WordPress.github.io/openverse/_preview/873

Please note that GitHub Pages takes a little time to deploy newly pushed code. If the links above don't work or you see old versions, wait 5 minutes and try again.

You can check the GitHub pages deployment action list to see the current status of the deployments.

@sarayourfriend sarayourfriend force-pushed the kickoff/detecting-sensitive-textual-content branch from 9bfdff4 to 55ab235 on March 9, 2023 05:39
@sarayourfriend sarayourfriend force-pushed the kickoff/detecting-sensitive-textual-content branch from 55ab235 to 9b7a9eb on March 9, 2023 05:45
@sarayourfriend sarayourfriend marked this pull request as ready for review March 9, 2023 05:49
@sarayourfriend sarayourfriend requested a review from a team as a code owner March 9, 2023 05:49
@openverse-bot openverse-bot moved this from In progress to Needs review in Openverse PRs Mar 9, 2023
@sarayourfriend sarayourfriend added 🟨 priority: medium Not blocking but should be addressed soon 📄 aspect: text Concerns the textual material in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users and removed 🚦 status: awaiting triage Has not been triaged & therefore, not ready for work labels Mar 9, 2023
@krysal krysal added 🟧 priority: high Stalls work on the project or its dependents 🧱 stack: api Related to the Django API and removed 🟨 priority: medium Not blocking but should be addressed soon labels Mar 11, 2023
quickly. I'm not totally convinced it'd be _that_ much faster though. Like I
said above, BlurHash looks complicated, but really only in comparison to how
simple the CSS approach is. Anything would look complicated when compared to
just adding a CSS filter style to something.
Contributor

What if we had a single generic blurred image instead of trying to blur the one that needs it? This way, the user would not be able to discern what's behind the blur, but is it really necessary?

Contributor Author

This way, the user would not be able to discern what's behind the blur

I think it's desirable to be able to sort of "guess" at what's behind the blur. For example, if you notice a lot of red, maybe the image includes a large amount of blood or something you'd rather avoid seeing, even though some sensitive results might be of interest for your query.

On the other hand, there are issues with this assumption as well, as in the case of this Mastodon issue: mastodon/mastodon#10729

Our context might be different from Mastodon, though we do include Flickr, which is social media, and may have the same issues the Mastodon users are describing in that issue.

If we used a single generic blurred image, how would we choose it? We decided against using anything associated with Openverse (like the logo) to avoid implicitly associating Openverse with sensitive content. Are there other reasonable options that wouldn't potentially mislead people? Maybe a blank box or some other solid colour would be appropriate 🤔

Member

Minor point: I can also see having variety on the search results page, when we use these as placeholders, being much more visually appealing than the same uniform placeholder over every image.

Contributor Author

Oh, I'll also note that the issue is slightly different from Mastodon in that on Mastodon images are often marked with author-composed "content warnings". We won't have those, at least not for the time being (until we implement some hypothetical categorisation of sensitive terms, for example). In our case, the blurred image would be an even more valuable signal for someone deciding whether a result might be relevant to their query.


FWIW, Pixabay places a black overlay with the text "Adult Content" (more or less like a Mastodon content warning) that toggles off when a user selects the option to remove safe search filters. It does not provide any hint of the image behind it.

Contributor Author

That's an interesting alternative, thanks cogdog. We could do something similar with "Sensitive Content" for the text if folks felt it was a better approach.

If this feels important to further discuss, I would like to suggest tabling the discussion and having a separate proposal written to identify the particular approach. If we do that, I would change this proposal to merely reference "obscuring" images, without specifying how they are obscured (whether blurred or hidden entirely or something else).

I suggest this because the API implementation plan could go forward without needing to have a decision made on how to obscure the images, whereas this discussion only really blocks the frontend implementation plan.

determination is made in Elasticsearch and benefits from Elasticsearch's full
text search, then we may need to change the single result endpoint to pull the
result in question from Elasticsearch rather than Postgres, leveraging the
`_source` property on hits. This could be a big, complex change, though and
Contributor

I am not sure we have access to the _source property on hits. @dhruvkb, could you confirm?
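(For reference only, and not an answer about our own client configuration: in a raw Elasticsearch search response, each hit normally carries the indexed document under `_source` unless the index mapping or the query disables it. A minimal sketch, with a hypothetical host, index, and field names:)

```python
from elasticsearch import Elasticsearch

# Hypothetical host, index, and field names, purely for illustration.
es = Elasticsearch("http://localhost:9200")
response = es.search(index="image", query={"match_all": {}}, size=1)

for hit in response["hits"]["hits"]:
    document = hit["_source"]  # absent if _source is disabled for the index/query
    print(document.get("identifier"), document.get("title"))
```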


sarayourfriend and others added 3 commits March 14, 2023 17:39
…osal_detecting_sensitive_textual_content.md

Co-authored-by: Zack Krida <zackkrida@pm.me>
…osal_detecting_sensitive_textual_content.md

Co-authored-by: Zack Krida <zackkrida@pm.me>
@sarayourfriend
Contributor Author

I've updated the proposal to recommend implementing CSS filtering first. I did this after testing the solution locally (by blurring all images) and visiting the site on the lowest-spec hardware I have access to; it doesn't appear to have any adverse effects.

Contributor

@AetherUnbound AetherUnbound left a comment

Thank you for the proposal and template! I've included a few questions, as well as answers to your outstanding questions from my own perspective 😄

Comment on lines +173 to +174
To simply phrase the question to reviewers: **What is your preference for
storing the "include sensitive results" option and why?**.
Contributor

I also agree that this is a reasonable balance! From my own experience with other settings on other applications I know that if I had to re-enable a setting every time I used the search, I probably wouldn't use the setting at all.

Comment on lines +196 to +199
I am almost entirely closed to the option of storing the option in the query
params. It gives no discernible benefit in my estimation but comes with risks
that are counter-productive to the goals of this project. If there is a strong
argument to be made in favour of this approach, however, please share it.
Contributor

I agree, especially given the case you mentioned.

Comment on lines +354 to +357
For reviewers: **What are your feelings on this issue and why?**. Do you think it
is worth investing time up front in BlurHash, LQIP, or another solution; or is
it fine to push that off to a later date as a potential future improvement to
the feature?
Contributor

I sincerely appreciate the effort you took to explore various options. While reading the proposal (before getting to your recommendation on the approach), my perspective was that we should go with the CSS-blur option initially. I like your idea of introducing a move to BlurHash down the line as its own proposal; the other applications of the hash seem fascinating and may be something we'd like to explore as well.

In short, I agree that we should move forward with CSS blurring at this time, and this document can be used as a jumping off point for a move to BlurHash down the line.

Comment on lines +262 to +269
- Requires conditionally calculating the hash on the API side in the thumbnail
endpoint
- This would be done by using a new query parameter on the thumbnail endpoint
to request the blurred hash
- Can have a performance impact on the thumbnail endpoint
- Introduces a delay between when the image is unblurred and when it is displayed
  due to needing to download the actual thumbnail
- Could have different performance issues if decoding the hash is a heavy
Contributor

Much of this section discusses the thumbnail endpoint - do we use thumbnails for the single result page or do we load the whole image? If it's the latter, since we're not serving the images themselves, would we need to switch to using the thumbnail for sensitive results?

Contributor

Per @zackkrida: It appears that we hotlink the upstream image in the single result case. I imagine that would impact how we handle navigating to the single result page for sensitive content and what the blurring process is for that, is that correct?

Contributor Author

@sarayourfriend sarayourfriend Mar 14, 2023

You're right, there's some nuance that needs to be teased out here.

The frontend does use the thumbnail on the single results page if navigating to the result from the search page. This allows us to display the result (albeit at lower quality) until we have the full image ready to display. It prevents some weird pop-in behaviours from happening. In that case, we're already using the thumbnail and so we could delay downloading the full image until it is unblurred.

In the case where someone lands directly on the single results page (without navigating from the search page), we do not use the thumbnail. We'd need to have the frontend decide to use the thumbnail instead of the full image initially for sensitive results and then only load the full image once it's been "unblurred".

This is mostly moot with the CSS filter approach though, as in that case we can always just apply the filter client side, regardless of where the image is coming from. We would only need to consider these nuances if we went with an API-side blurring approach, BlurHash, LQIP, etc.

Contributor

Ah, okay that's a great point! I didn't realize that we could punt on these nuances with the CSS blur approach. That's even further argument for it at this time 🙂

Member

@krysal krysal left a comment

This is a great proposal; it would be incredible to (re)add this feature to Openverse! I agree with most of what is written. My outstanding questions are related to the API implementation, which I know is still in planning, but I'd like to discuss it at a high level. When I talked to Zack about this initiative, I thought of it more as work on the Catalog.

From what I read, the classification of the sensitive content will happen at query time, either in Python or ES. I can see this as a first step to expedite the detection of the sensitive media, given that, presumably, it will take us a significant amount of time to classify all our existing content. Now, this makes me wonder:

  • will these discoveries be saved to the database? I can see having this info stored being useful for analytical purposes
  • is it a temporary solution (the API part) until we implement a sort of classifier DAG or add the checks to the MediaStores classes?

I'm intrigued by the ES solution since I saw your proof-of-concept PR; I hadn't thought of it before. I'm worried it may put too much burden on the cluster, but I'm eager to see it working for us.

Comment on lines +354 to +357
For reviewers: **What are your feelings on this issue and why?**. Do you think it
is worth investing time up front in BlurHash, LQIP, or another solution; or is
it fine to push that off to a later date as a potential future improvement to
the feature?
Member

The CSS solution sounds good enough (I didn't even know there were other ways, TIL!), and I'm fine putting off a more advanced way of blurring the results. Indeed, I don't have strong opinions on this end, so I'll leave that to the rest of the reviewers.

Comment on lines +60 to +67
#### Designation of results with sensitive terms

Results that include sensitive terms in their textual content are so denoted in
the API results in a new result field named `sensitivity`. Sensitivity should be
an array that should be empty for results where `mature = false` and the textual
content does not contain any sensitive terms. For results marked as `mature`,
the array should include a `"mature"` string. For results with sensitive terms
in their textual content, the array should include a `"sensitive_text"` string.
Member

This is speaking of the Response serializers, which sounds great to me. However, do you know how the manually reported content will fit into this process? For example, will sensitive terms be included as tags? Do we need to add more columns? Will the api_mature<media> tables be deprecated?

Contributor Author

I hadn't considered the mature reports and had to look at the code to learn how they work again, specifically how the information gets incorporated into search. Forgive me if you already know this, but it was new to me and is necessary context for understanding my answer, so I'm including it for my sake and for others who might not be familiar.

When the mature report is created, the .save method updates the document in ES to set mature=True and forces a refresh of the index.
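In rough terms (a simplified sketch, not the actual Openverse code; the index name, document id, and client setup here are assumptions), that update amounts to something like:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed client setup

def confirm_mature_report(document_id: str, index: str = "image") -> None:
    # Flip the `mature` flag on the live document and refresh so the change
    # is visible to searches right away.
    es.update(index=index, id=document_id, doc={"mature": True}, refresh=True)
```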

There are a couple details here:

  1. This will automatically work with the proposed solution: the reports just get folded in with the rest of the documents already labelled as "mature". These documents would have the "mature" string in the sensitivity array.
  2. I believe this is slightly broken in that this information is not preserved between data refreshes. As far as I can tell, there is no mechanism to re-apply the setting of mature=true for results that have confirmed reports during data refresh. Once a new index is created, the mature setting on those documents will go back to false. I tried digging for something that would prevent this, but I can't find it. If you take a look at the code and confirm my suspicion, I'll open an issue for us to investigate how to fix this. I suspect we would need to use a backport strategy similar to the others under discussion to get the confirmed reports from the API to the catalogue. We could also use Elasticsearch ingestion pipelines to flip the mature flag on documents as they're being ingested, based on the mature report tables in the API database. Yet another alternative is to iterate through the mature report table after indexing and set the flag on the documents in a batch update (sketched roughly just below this list). All are probably complex solutions to implement, with their own pitfalls that would need to be considered.
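A rough sketch of that last, batch-update alternative (hypothetical names throughout; not a proposed implementation, just the general shape it could take):

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

def reapply_mature_flags(es: Elasticsearch, index: str, confirmed_ids: list[str]) -> None:
    # One partial-update action per document with a confirmed mature report.
    actions = (
        {"_op_type": "update", "_index": index, "_id": doc_id, "doc": {"mature": True}}
        for doc_id in confirmed_ids
    )
    bulk(es, actions)
```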

So that being said, if the mature content reporting process didn't have that data refresh issue, I think it would work perfectly without modification with this proposal. If we fix that issue, then it will work with the proposal, but there's nothing this proposal needs to do to handle that bug. It is a separate bug and doesn't impede the sensitive terms detection discussed here. We would keep the mature tables around, provided we don't decide a different way of handling them is necessary in the course of fixing the bug described.

will sensitive terms be included as tags? Do we need to add more columns?

Do you mean tags on the Elasticsearch documents? I don't believe it's necessary, and it would make search more complicated, as we'd have to somehow prevent those sensitive terms from affecting the overall score of the document. Each tag would add an instance of the term to the document's overall textual content.

As for adding new columns, I don't believe it's necessary. Right now we're not classifying the particular terms into categories, so the only token that would exist in the sensitivity array is sensitive_text.
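Purely as an illustration of the shape of the field (a hypothetical Django REST Framework sketch, not the planned implementation; `mature` and `has_sensitive_text` are assumed attributes):

```python
from rest_framework import serializers

class MediaResultSerializer(serializers.Serializer):
    # Hypothetical sketch: field and attribute names are illustrative only.
    sensitivity = serializers.SerializerMethodField()

    def get_sensitivity(self, obj) -> list[str]:
        sensitivity = []
        if getattr(obj, "mature", False):
            sensitivity.append("mature")
        if getattr(obj, "has_sensitive_text", False):
            sensitivity.append("sensitive_text")
        return sensitivity
```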

I guess it is worth clarifying explicitly that documents labelled "mature" are separate entirely (at least conceptually) from those we find have potentially sensitive textual content.

I hope this answers your questions. Please do follow up if there's anything else I can clarify.

Contributor

Great call about considering the data refresh pieces of this! I looked at the code, and it appears that when building the index we do already consider deleted and mature records:

exists_in_deleted_table = SQL(exists_in_table).format(
    table=Identifier(f"api_deleted{model}"),
    identifier=Identifier(table, "identifier"),
    name=Identifier("deleted"),
)
exists_in_mature_table = SQL(exists_in_table).format(
    table=Identifier(f"api_mature{model}"),
    identifier=Identifier(table, "identifier"),
    name=Identifier("mature"),
)
return exists_in_deleted_table, exists_in_mature_table

That function is called in the update & reindex steps, and I believe the mature value returned there is applied to the index. So if the record exists in api_mature{media}, it will have mature=True in the resultant index:

deleted, mature = get_existence_queries(model_name, table_name)
query = SQL(
    "SELECT *, {deleted}, {mature} "

I was unsure about this myself because my initial assumption mirrored your own, Sara; I'm glad that @dhruvkb ensured that we handled this appropriately during the refresh 😄

sarayourfriend and others added 5 commits March 15, 2023 09:46
Co-authored-by: Zack Krida <zackkrida@pm.me>
…osal_detecting_sensitive_textual_content.md

Co-authored-by: Krystle Salazar <krystle.salazar@automattic.com>
Co-authored-by: Krystle Salazar <krystle.salazar@automattic.com>
@sarayourfriend
Contributor Author

Thanks for the feedback, Krystle. I think it's fine to discuss the broad strokes of the API implementation to make sure we've got the right idea in mind as far as how everything will wire up, and especially to make sure we're planning to implement things in the right places. As you mentioned, there is an alternative approach that could see this implemented partially at the catalogue level.

From what I read, the classification of the sensitive content will happen at query time, either in Python or ES.

This is mostly true, but I want to clarify that the initial filtering will always happen in ES, as a separate re-indexing step during data refresh. At least, that's the solution I found previously that avoids us needing to include the exclusion of sensitive terms in every single query that doesn't want them (the majority of queries).
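For illustration only (hypothetical index names, fields, and term list; the real details belong to the API implementation plan), that re-indexing step could use Elasticsearch's reindex API with a query that excludes documents matching any sensitive term:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed client setup

SENSITIVE_TERMS = ["example-term-a", "example-term-b"]  # placeholder list

# Copy only documents that match none of the sensitive terms into a
# hypothetical filtered index used by default searches.
es.reindex(
    body={
        "source": {
            "index": "image",
            "query": {
                "bool": {
                    "must_not": [
                        {
                            "multi_match": {
                                "query": term,
                                "fields": ["title", "description", "tags.name"],
                            }
                        }
                        for term in SENSITIVE_TERMS
                    ]
                }
            },
        },
        "dest": {"index": "image-filtered"},
    },
    wait_for_completion=True,
)
```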

That being said, there's the secondary issue of how to determine which results, in a set that we know might include sensitive results, are the sensitive ones. That's the part that is still up in the air and could happen in either ES or Python. I hope to know more about which approach we should take after getting answers back from folks more expert in Elasticsearch than me. I have one viable solution for doing it in Elasticsearch that I think would work very well and have little to no performance impact for us. The viability of that solution still needs to be verified with folks who know ES better than I do.

it will take us a significant amount of time to classify all our existing content

This is also my assumption. If we had the recrawler working and ready to go, we might be able to implement the classification work there and have it be part of the data before it is ingested into Elasticsearch. That would allow us to avoid the re-indexing with every data refresh. However, I don't believe we yet know how long the recrawling process will take. If there's more up-to-date information about this, then it's worth looking into whether the process takes a reasonable amount of time to run.

will these discoveries be saved to the database? I can see having this info stored being useful for analytical purposes

It would be great to have this data saved somewhere, both for analytical purposes and to save us from needing to redo the reindex into the filtered index at every data refresh (if we're able to save it in the catalogue). The query to find results that match the sensitive terms would not, theoretically, be a significant burden on our cluster, but I should test this during downtime with the production cluster to make sure. If that's the case, then we can at least easily query for all documents that include sensitive content and then do whatever we want with those results, within the limitations of Elasticsearch's pagination.

More comprehensive analysis that wouldn't be limited by ES's pagination would require saving the data elsewhere, hopefully backported to the catalogue or managed by the recrawler.

is it a temporary solution (the API part) until we implement a sort of classifier DAG or add the checks to the MediaStores classes?

Definitely! If there are any ways for us to get this information before documents are indexed, that is 100% preferable to this solution. As above, my understanding is that this isn't viable at the moment. I'm happy to explicitly label the querying approach a stop-gap until we're able to implement something less ad-hoc in the future.

Thanks again for the review and feedback!

@sarayourfriend sarayourfriend force-pushed the kickoff/detecting-sensitive-textual-content branch from 8b7bf53 to b1b562e on March 14, 2023 23:39
@sarayourfriend
Copy link
Contributor Author

Thanks @krysal and @AetherUnbound for the initial reviews 🙏 It sounds like everyone is on the same page as far as implementing blurring first with the CSS filter.

I wanted to highlight for both of y'all, as reviewers, this discussion that @obulat started: #873 (comment)

Do either of y'all have concerns over specifically blurring over other obfuscation methods like using a placeholder or hiding the image behind a solid colour with explanatory text? If so, do you feel it's necessary to hash out those particulars in this proposal or is it something you'd be comfortable handling as a separate proposal to be written and approved before the frontend implementation plan is written?

cc @panchovm as I believe the designs would need to keep this in mind as well. Have you got any opinion on whether we need to more carefully consider other obfuscation methods?

@fcoveram
Contributor

The proposal is very clear. Thanks for putting all the info and testing the possible blur implementations.

I agree with how Mastodon approached the feature, but for this project, I have no concerns over the solution we are taking. Regarding the blur effect, I am ok with any path we take as all three solutions are in line with the design idea I have discussed.

Contributor

@AetherUnbound AetherUnbound left a comment

Similarly, I (personally) think that the blurring approach is more appealing than using a placeholder image. As others stated, it gives variety to our search results page and gives the user some indication of what the content is. The placeholder would not give any context, and until we develop more detailed ways to provide users with context before they decide to view an image marked as sensitive media, this would mean that images anywhere along the gradient of "sensitive" could be behind every placeholder. I'd personally rather have a sense of what the image looks like before viewing it than have no clue whatsoever (especially since we also don't have the user-generated content warnings that Mastodon has).

Thanks for the clarifications on everything Sara, I think this proposal looks great! We can refine the Elasticsearch implementation details based on what we hear back from the Elasticsearch experts.

Openverse PRs automation moved this from Needs review to Reviewer approved Mar 15, 2023
Member

@krysal krysal left a comment

Thanks, Sara! Everything is clear with your thoughtful answers, so I feel confident approving this proposal ✅

Edit: About the blurring vs other obfuscation methods, I slightly prefer the blurring for the same reasons that Madison and Zack mention; nothing more to add.

@sarayourfriend sarayourfriend merged commit 2d65aa0 into main Mar 17, 2023
32 checks passed
Openverse PRs automation moved this from Reviewer approved to Merged! Mar 17, 2023
@sarayourfriend sarayourfriend deleted the kickoff/detecting-sensitive-textual-content branch March 17, 2023 01:07
@obulat obulat added the 🧭 project: proposal A proposal for a project label May 18, 2023
Labels
📄 aspect: text Concerns the textual material in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🟧 priority: high Stalls work on the project or its dependents 🧭 project: proposal A proposal for a project 🧱 stack: api Related to the Django API
Projects
Status: Accepted
Openverse PRs: Merged!

8 participants