Add a blocklist for URLs. #4515

flamingo-cant-draw · 2024-03-07T06:05:04Z

Implements a blocklist for URLs, as discussed in #4514.

It's currently very basic and fragile, but it should do as an initial implementation.

crates/utils/src/utils/markdown/mod.rs

src/api_routes_http.rs

sunaurus · 2024-03-07T22:16:18Z

Awesome work, code looks nice, and the feature itself is super useful!

Nutomic · 2024-03-08T10:21:48Z

Like I mentioned in the issue, I don't think it's necessary to add new endpoints for this. Instead, the URL blocks can be updated with a parameter in EditSite, and then written to the DB table here. For reading, it can be added to GetSiteResponse.

crates/utils/src/utils/validation.rs

flamingo-cant-draw · 2024-03-09T15:31:37Z

> Like I mentioned in the issue, I don't think it's necessary to add new endpoints for this.

Ah, my bad, I misunderstood what you meant.

> For reading, it can be added to GetSiteResponse.

I don't personally think this is a good idea. A blocklist is the kind of thing that can grow quite large over time, so a scenario where most of the data in a /site request is the array of URLs isn't unrealistic. But maybe that's just me being paranoid.

crates/api_common/src/utils.rs

crates/db_schema/src/impls/local_site_url_blocklist.rs

crates/api_common/src/utils.rs

dessalines

Thanks, this is a great addition.

The main thing I see missing is that this blocklist needs to be applied to all markdown bodies as well, not just URL fields (otherwise people could just spam the same links in private messages, post bodies, and comment bodies).

We already have a process_markdown function in crates/api_common/src/utils.rs, so I suggest adding a parameter to that function, similar to the slur_regex already provided, and doing the check there too.
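To make the suggestion concrete, here is a minimal sketch of such a check as a standalone function, not a change to the real process_markdown. The name check_markdown_links, the Result<(), &str> return type, and the plain case-insensitive substring matching are all assumptions for illustration; judging by its commit history ("Use regex for URL matching.", "Escape regex chars in URLs."), the PR itself builds a regex from the escaped blocklist entries instead.

```rust
/// Hypothetical helper (not Lemmy's API): reject a markdown body if it
/// mentions any blocked URL. Returns the first matching blocklist entry so
/// the caller can report which one was hit.
///
/// Assumption of this sketch: matching is a case-insensitive substring
/// search over the raw text; links are not parsed out of the markdown.
pub fn check_markdown_links<'a>(text: &str, blocked_urls: &'a [String]) -> Result<(), &'a str> {
    let haystack = text.to_lowercase();
    for blocked in blocked_urls {
        // Skip empty entries: an empty needle would match every body.
        if blocked.is_empty() {
            continue;
        }
        if haystack.contains(&blocked.to_lowercase()) {
            return Err(blocked.as_str());
        }
    }
    Ok(())
}
```

Because it is a substring search, blocking evil.com also catches https://EVIL.com/some/path inside a markdown link, but it would equally flag a longer host such as notevil.com; a real implementation would likely want stricter boundaries around each entry.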

migrations/2024-03-06-201637_url_blocklist/up.sql

dessalines · 2024-03-11T15:57:05Z

crates/api_common/src/site.rs

@@ -268,6 +269,8 @@ pub struct EditSite {
  pub allowed_instances: Option<Vec<String>>,
  /// A list of blocked instances.
  pub blocked_instances: Option<Vec<String>>,
+  /// A list of blocked URLs
+  pub blocked_urls: Option<Vec<String>>,


Thanks, keeping to this standard is best for now. In the future we could potentially move all these lists to a more atomic method of updating.
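A small sketch of what following this standard means for blocked_urls, using an in-memory list instead of the local_site_url_blocklist table: None leaves the stored list alone, while Some(list) replaces it wholesale, so sending an empty list clears it (as the API test later does with editSiteForm.blocked_urls = []). The UrlBlocklist type and apply_edit method are hypothetical, and the sort/dedup step is an extra of this sketch, not something the PR is known to do.

```rust
/// Hypothetical in-memory stand-in for the local_site_url_blocklist table.
#[derive(Debug, Default)]
pub struct UrlBlocklist {
    urls: Vec<String>,
}

impl UrlBlocklist {
    /// Apply the optional `blocked_urls` field of an EditSite-style request.
    /// - `None`: the field was not sent, so the stored list is left untouched.
    /// - `Some(list)`: the whole stored list is replaced (an empty list clears it).
    pub fn apply_edit(&mut self, blocked_urls: Option<Vec<String>>) {
        if let Some(mut new_urls) = blocked_urls {
            // Extra step for this sketch only: keep the stored list sorted and unique.
            new_urls.sort();
            new_urls.dedup();
            self.urls = new_urls;
        }
    }

    /// Read access, e.g. for building a site response.
    pub fn urls(&self) -> &[String] {
        &self.urls
    }
}
```

A more atomic scheme, as mentioned above, could instead expose separate add and remove operations for individual entries.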

crates/api_common/src/utils.rs

crates/utils/src/utils/validation.rs

Nutomic · 2024-03-12T11:11:18Z

crates/utils/src/utils/validation.rs

+          Url::parse(&format!("https://{}", url))?
+        } else {
+          Err(e)?
+        }


You can probably rewrite this as:

Err(ParseError::RelativeUrlWithoutBase) => Url::parse(&format!("https://{}", url))?,
e => e
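Assuming the match is on Url::parse(url), the two arms as written have different types: the first evaluates to a Url (because of the inner ?), while e => e passes the whole Result through. One way to make them line up is to drop the inner ? so both arms yield a Result, and apply ? once to the match as a whole. The sketch below shows that shape without depending on the url crate: ParseError and parse are hypothetical stand-ins for url::ParseError and Url::parse (the real Url::parse does return RelativeUrlWithoutBase for scheme-less input such as example.com).

```rust
/// Stand-in for `url::ParseError`; only the variant used below is modelled.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    RelativeUrlWithoutBase,
    Other,
}

/// Stand-in for `url::Url::parse` so the sketch compiles without the url crate:
/// input with no "scheme://" prefix is treated as a relative URL.
pub fn parse(input: &str) -> Result<String, ParseError> {
    match input.split_once("://") {
        Some((scheme, rest)) if !scheme.is_empty() && !rest.is_empty() => Ok(input.to_string()),
        Some(_) => Err(ParseError::Other),
        None => Err(ParseError::RelativeUrlWithoutBase),
    }
}

/// Retry with an `https://` prefix only when the input had no scheme.
/// Both arms evaluate to `Result<String, ParseError>`, so the types agree.
pub fn parse_with_default_scheme(url: &str) -> Result<String, ParseError> {
    match parse(url) {
        Err(ParseError::RelativeUrlWithoutBase) => parse(&format!("https://{}", url)),
        other => other,
    }
}
```

In a caller that needs the parsed value directly, the ? goes after the closing brace of the match, e.g. `let url = match ... { ... }?;`.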

Nutomic

Looks good. The only other thing worth doing is adding a test for this in api_tests, to ensure that the API works correctly. This requires adding the API changes to https://github.com/LemmyNet/lemmy-js-client

dessalines

Sweet, thx for this one! I'll let @Nutomic decide if that last change is necessary, or if we can merge as-is. IMO an API test isn't necessary here, as this isn't federation-related, and it has plenty of local tests.

Nutomic · 2024-03-12T15:41:32Z

The API test in this case is to ensure that the API endpoints work as expected (and don't get broken by some random change in the future). But yes, it's not mandatory.


flamingo-cant-draw · 2024-03-13T05:18:14Z

Added the API test. I don't know if there's a better way to get the regex to update than just using delay to wait a minute.

Nutomic · 2024-03-13T14:51:09Z

You can set a shorter caching interval in debug mode like this: https://github.com/LemmyNet/lemmy/blob/main/crates/federate/src/lib.rs#L21-L24
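A rough sketch of the idea, under assumptions: the linked federate code picks a shorter interval in debug mode, and here that switch is written with cfg!(debug_assertions), which may not match the linked lines exactly. The 1-second and 60-second values are illustrative (60 s roughly matches the one-minute wait mentioned above), and TtlCache is a hypothetical stand-in for whatever cache type Lemmy actually uses for the blocklist regex.

```rust
use std::time::{Duration, Instant};

/// Cache lifetime for the compiled blocklist (illustrative values):
/// short in debug builds so API tests see edits quickly, longer otherwise.
pub fn blocklist_cache_ttl() -> Duration {
    if cfg!(debug_assertions) {
        Duration::from_secs(1)
    } else {
        Duration::from_secs(60)
    }
}

/// Hypothetical single-value cache that refetches once its entry is older than `ttl`.
pub struct TtlCache<T> {
    ttl: Duration,
    entry: Option<(T, Instant)>,
}

impl<T: Clone> TtlCache<T> {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entry: None }
    }

    /// Return the cached value, calling `fetch` only if it is missing or expired.
    pub fn get_or_refresh(&mut self, fetch: impl FnOnce() -> T) -> T {
        let expired = match &self.entry {
            Some((_, fetched_at)) => fetched_at.elapsed() >= self.ttl,
            None => true,
        };
        if expired {
            self.entry = Some((fetch(), Instant::now()));
        }
        // The entry is always populated at this point.
        self.entry
            .as_ref()
            .map(|(value, _)| value.clone())
            .expect("entry was just filled")
    }
}
```

With a cache built from blocklist_cache_ttl(), a test run in a debug build would only need to wait about a second after editing the blocklist instead of a full minute.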

crates/db_schema/src/source/local_site_url_blocklist.rs

dessalines · 2024-03-13T16:41:33Z

@flamingo-cant-draw A few minutes after it's deployed, you can add lemmy-js-client: 0.19.4-alpha.8 to api_tests/package.json.

SleeplessOne1917 · 2024-03-13T20:55:56Z

crates/db_schema/src/source/local_site_url_blocklist.rs

+pub struct LocalSiteUrlBlocklist {
+  pub id: i32,
+  pub url: String,
+  pub published: DateTime<Utc>,
+  pub updated: Option<DateTime<Utc>>,
+}
+
+#[derive(Default, Clone)]
+#[cfg_attr(feature = "full", derive(Insertable, AsChangeset))]
+#[cfg_attr(feature = "full", diesel(table_name = local_site_url_blocklist))]
+pub struct LocalSiteUrlBlocklistForm {
+  pub url: String,
+  pub updated: Option<DateTime<Utc>>,
+}


These structs seem to represent individual URLs. If that's the case, the "List" part of these struct names doesn't make much sense.

Fair point, but this is the same naming scheme as FederationBlockList and FederationAllowList.

I'd also prefer not to use List or collection-type names in SQL tables, which really should use singular names for what each row is. But flamingo did the right thing by sticking to the standard set by FederationBlockList, since they're really similar.

In the future, we could maybe rename all of these at the same time.


If/when we get around to this, there are also some inconsistencies with how API request and response structs are named that can be confusing.

Open up an issue for that one if you would, so we don't forget.

crates/utils/src/utils/validation.rs

Nutomic · 2024-03-14T10:45:51Z

api_tests/src/post.spec.ts

+
+  // Later tests need this to be empty
+  editSiteForm.blocked_urls = [];
+  await epsilon.editSite(editSiteForm);


I think the random test failures happen because the blocklist is still being cached for a few seconds after this. I would fix it by blocking a different domain such as https://evil.com, and passing an explicit link to createPost().

Tests seem to be passing now, so it was probably just an issue with my runner.

Or it's due to the URL blocklist and will fail again later. Better to resolve it now.

I changed the URL, but I'm still confused why "Sticky a post" was failing given it doesn't use epsilon.

crates/api_common/src/utils.rs

Nutomic · 2024-03-15T10:49:30Z

Thanks for the PR, well done!

flamingo-cant-draw requested review from dessalines, phiresky and Nutomic as code owners March 7, 2024 06:05

flamingo-cant-draw marked this pull request as draft March 7, 2024 06:42

Nothing4You reviewed Mar 7, 2024

crates/utils/src/utils/markdown/mod.rs (outdated)

flamingo-cant-draw marked this pull request as ready for review March 7, 2024 18:11

sunaurus reviewed Mar 7, 2024

src/api_routes_http.rs (outdated)

Nutomic reviewed Mar 8, 2024

crates/utils/src/utils/validation.rs (outdated)

flamingo-cant-draw added 7 commits March 10, 2024 05:44

Add a blocklist for URLs.

74c31e0

Fix SQL format

1e5ad45

Make clippy happy.

edcc40c

Use regex for URL matching.

7c470dd

Escape regex chars in URLs.

3cd8a0a

Use post for modification.

0a1e8d4

Make URL block regex static and remove API routes.

62b6a29

flamingo-cant-draw force-pushed the url-blocklist branch from cdd22a6 to 62b6a29 March 10, 2024 05:51

flamingo-cant-draw requested a review from dullbananas as a code owner March 10, 2024 05:51

dullbananas requested changes Mar 10, 2024

crates/api_common/src/utils.rs (outdated)

crates/db_schema/src/impls/local_site_url_blocklist.rs (outdated)

crates/api_common/src/utils.rs (outdated)

dessalines reviewed Mar 11, 2024

flamingo-cant-draw added 6 commits March 12, 2024 00:39

Add date fields to table and use transaction.

fde7712

Use Cache for blocklist.

d4b7938

Rename check_links + move list to parameters of process_markdown.

51684b6

SQL format.

9e1a7f2

Format, again.

7d43413

Remove println.

c38c23a

Nutomic reviewed Mar 12, 2024

Nutomic approved these changes Mar 12, 2024

dullbananas approved these changes Mar 12, 2024

dessalines approved these changes Mar 12, 2024

Merge branch 'main' of https://github.com/LemmyNet/lemmy into url-blocklist

18f14f6

flamingo-cant-draw mentioned this pull request Mar 13, 2024

Add URL blocklist LemmyNet/lemmy-js-client#245

Merged

Add API test.

f60ce90

Set a shorter lifetime for regex in debug mode.

f249b5f

dessalines approved these changes Mar 13, 2024

dessalines requested changes Mar 13, 2024

crates/db_schema/src/source/local_site_url_blocklist.rs

Add missing macro.

d6cbd1b

dessalines approved these changes Mar 13, 2024

flamingo-cant-draw added 3 commits March 13, 2024 16:52

Update lemmy-js-client

b71d04f

Update api_test/pnpm-lock.yaml

6869ef5

Don't break other tests

ab6a1c6

SleeplessOne1917 reviewed Mar 13, 2024

crates/utils/src/utils/validation.rs (outdated)

Nutomic reviewed Mar 14, 2024

crates/api_common/src/utils.rs (outdated)

flamingo-cant-draw and others added 3 commits March 14, 2024 15:30

Use different URL for test

caee651

Merge branch 'main' into url-blocklist

4a0e2e5

Merge branch 'main' into url-blocklist

89fdfa5

Nutomic enabled auto-merge (squash) March 15, 2024 10:49

Nutomic merged commit 19a1a07 into LemmyNet:main Mar 15, 2024
1 check passed

flamingo-cant-draw deleted the url-blocklist branch May 17, 2024 02:04
