Add a blocklist for URLs. #4515

Merged: 23 commits, Mar 15, 2024

flamingo-cant-draw
Contributor

Implements a blocklist for URLs, as discussed in #4514.

It's currently very basic and fragile, but it should do as an initial implementation.

@flamingo-cant-draw flamingo-cant-draw marked this pull request as ready for review March 7, 2024 18:11
@sunaurus
Collaborator

sunaurus commented Mar 7, 2024

Awesome work, code looks nice, and the feature itself is super useful!

@Nutomic
Member

Nutomic commented Mar 8, 2024

Like I mentioned in the issue, I don't think it's necessary to add new endpoints for this. Instead, the URL blocks can be updated with a parameter in EditSite, and then written to the db table here. For reading, it can be added to GetSiteResponse.

@flamingo-cant-draw
Contributor Author

Like I mentioned in the issue, I don't think it's necessary to add new endpoints for this.

Ah, my bad, I misunderstood what you meant.

For reading it can be added to GetSiteResponse.

I don't personally think this is a good idea. A blocklist is the kind of thing that can grow quite big over time, so a scenario where most of the data in a /site request is the array of URLs isn't unrealistic. But maybe that's just me being paranoid.

Member

@dessalines dessalines left a comment

Thanks, this is a great addition.

The main thing I see missing is that this blocklist needs to be processed for all markdown bodies as well, not just URL fields (otherwise people could just spam the same links in private messages, post bodies, and comment bodies).

We already have a process_markdown function in crates/api_common/src/utils.rs, so I suggest adding a parameter to that function similar to the slur_regex already provided, and doing the check there also.
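
To make the suggestion concrete, here is a minimal sketch of what a blocklist check on markdown bodies could look like. It is not the code from this PR: `check_urls_are_not_blocked` and `process_markdown_sketch` are hypothetical names, and plain case-insensitive substring matching stands in for whatever regex the real check uses.

```rust
/// Hypothetical helper: returns an error if `text` mentions any blocked URL.
/// Plain substring matching is an assumption made for this sketch.
pub fn check_urls_are_not_blocked(text: &str, blocklist: &[String]) -> Result<(), String> {
    let haystack = text.to_lowercase();
    for blocked in blocklist {
        let needle = blocked.trim().to_lowercase();
        // Skip empty entries so a blank row cannot block every body.
        if !needle.is_empty() && haystack.contains(&needle) {
            return Err(format!("blocked_url: {blocked}"));
        }
    }
    Ok(())
}

/// Hypothetical stand-in for `process_markdown`: the blocklist is passed in
/// as an extra parameter and checked before the body is accepted.
pub fn process_markdown_sketch(text: &str, blocklist: &[String]) -> Result<String, String> {
    check_urls_are_not_blocked(text, blocklist)?;
    Ok(text.to_string())
}
```

Because the check takes the body text rather than a URL field, the same call would cover private messages, post bodies, and comment bodies.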

@@ -268,6 +269,8 @@ pub struct EditSite {
pub allowed_instances: Option<Vec<String>>,
/// A list of blocked instances.
pub blocked_instances: Option<Vec<String>>,
/// A list of blocked URLs
pub blocked_urls: Option<Vec<String>>,
Member

Thanks, keeping to this standard is best for now. In the future we could potentially move all these lists to a more atomic method of updating.
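
For readers unfamiliar with the pattern, a rough sketch of how an `Option<Vec<String>>` field is typically interpreted on an edit request. It assumes `None` means "leave the list alone" and `Some(list)` means "replace the whole list"; `BlocklistStore` and `apply_edit` are hypothetical in-memory stand-ins, not the Diesel code in this PR.

```rust
use std::collections::BTreeSet;

/// Hypothetical in-memory stand-in for the blocklist table.
#[derive(Debug, Default, PartialEq)]
pub struct BlocklistStore {
    urls: BTreeSet<String>,
}

impl BlocklistStore {
    /// Assumed semantics: `None` leaves the stored list untouched,
    /// `Some(list)` replaces it wholesale (an empty list clears it).
    /// Returns the number of stored URLs after the update.
    pub fn apply_edit(&mut self, blocked_urls: Option<Vec<String>>) -> usize {
        if let Some(list) = blocked_urls {
            self.urls = list
                .into_iter()
                .map(|u| u.trim().to_string())
                .filter(|u| !u.is_empty())
                .collect();
        }
        self.urls.len()
    }
}
```

Replacing the whole list on every edit is simple but not atomic per entry, which is the limitation the comment above alludes to.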

Url::parse(&format!("https://{}", url))?
} else {
Err(e)?
}
Member

You can probably rewrite this as:

Err(ParseError::RelativeUrlWithoutBase) => Url::parse(&format!("https://{}", url))?,
e => e
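
A self-contained sketch of that match shape, for illustration only. The `ParseError` enum and `parse` function below are toy stand-ins for `url::ParseError` and `Url::parse` so the example compiles without the `url` crate, and `parse_with_default_scheme` is a hypothetical name. Both arms return a `Result` here, so the `?` from the suggestion is dropped inside the match.

```rust
/// Stand-in for `url::ParseError`; only the variants this sketch needs.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    RelativeUrlWithoutBase,
    EmptyHost,
}

/// Toy parser standing in for `Url::parse`: it only checks that a scheme
/// separator is present and that something follows it.
pub fn parse(input: &str) -> Result<String, ParseError> {
    match input.split_once("://") {
        None => Err(ParseError::RelativeUrlWithoutBase),
        Some((_, rest)) if rest.is_empty() => Err(ParseError::EmptyHost),
        Some(_) => Ok(input.to_string()),
    }
}

/// Retry with an `https://` prefix only when the scheme is missing;
/// every other outcome (success or a different error) passes through.
pub fn parse_with_default_scheme(input: &str) -> Result<String, ParseError> {
    match parse(input) {
        Err(ParseError::RelativeUrlWithoutBase) => parse(&format!("https://{input}")),
        other => other,
    }
}
```

Matching on the specific error variant keeps the fallback from masking unrelated parse failures.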

Member

@Nutomic Nutomic left a comment

Looks good. The only other thing worth doing is to add a test for this in api_tests, to ensure that the API works correctly. This requires adding the API changes to https://github.com/LemmyNet/lemmy-js-client

Member

@dessalines dessalines left a comment

Sweet, thx for this one! I'll let @Nutomic decide if that last change is necessary, or if we can merge as-is. IMO an API test isn't necessary here, as this isn't federation related, and it has plenty of local tests.

@Nutomic
Member

Nutomic commented Mar 12, 2024

The API test in this case is to ensure that the API endpoints work as expected (and don't get broken by some random change in the future). But yes, it's not mandatory.

@flamingo-cant-draw
Contributor Author

Added the API test. I don't know if there's a better way of getting the regex to update than just using delay to wait a minute.

@Nutomic
Member

Nutomic commented Mar 13, 2024

You can set a shorter caching interval in debug mode like this: https://github.com/LemmyNet/lemmy/blob/main/crates/federate/src/lib.rs#L21-L24
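
A minimal sketch of that idea, assuming the cache lifetime is chosen by build profile. The function name `blocklist_cache_duration` and both durations are placeholders picked for illustration, not values taken from the linked file.

```rust
use std::time::Duration;

/// Hypothetical cache lifetime for the compiled blocklist: short in debug
/// builds so tests see edits quickly, longer in release builds.
pub const fn blocklist_cache_duration() -> Duration {
    if cfg!(debug_assertions) {
        // Placeholder value for debug builds.
        Duration::from_secs(1)
    } else {
        // Placeholder value for release builds.
        Duration::from_secs(60)
    }
}
```

With something like this in place, an API test would only need to wait slightly longer than the debug interval instead of a full minute.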

@dessalines
Member

@flamingo-cant-draw A few minutes after it's deployed, you can add lemmy-js-client: 0.19.4-alpha.8 to api_tests/package.json.

Comment on lines +15 to +28
pub struct LocalSiteUrlBlocklist {
pub id: i32,
pub url: String,
pub published: DateTime<Utc>,
pub updated: Option<DateTime<Utc>>,
}

#[derive(Default, Clone)]
#[cfg_attr(feature = "full", derive(Insertable, AsChangeset))]
#[cfg_attr(feature = "full", diesel(table_name = local_site_url_blocklist))]
pub struct LocalSiteUrlBlocklistForm {
pub url: String,
pub updated: Option<DateTime<Utc>>,
}
Member

These structs seem to represent individual URLs. If that's the case, the "List" part of these struct names doesn't make much sense.

Contributor Author

Fair point, but this is the same naming scheme as FederationBlockList and FederationAllowList.

Member

I'd also prefer not to use List or collection-type names in sql tables, which really should use singular names for what each row is, but flamingo did the right thing by sticking to the standard set by FederationBlockList, since they're really similar.

In the future, we could maybe rename all of these at the same time.

Member

In the future, we could maybe rename all of these at the same time.

If/when we get around to this, there are also some inconsistencies with how API request and response structs are named that can be confusing.

Member

Open up an issue for that one if you would, so we don't forget.


// Later tests need this to be empty
editSiteForm.blocked_urls = [];
await epsilon.editSite(editSiteForm);
Member

I think the random test failures happen because the blocklist is still being cached for a few seconds after this. I would fix it by blocking a different domain such as https://evil.com, and passing an explicit link to createPost().

Member

Tests seem to be passing now, so it was probably just an issue with my runner.

Member

Or it's due to the URL blocklist and will fail again later. Better to resolve it now.

Contributor Author

I changed the URL, but I'm still confused why "Sticky a post" was failing given it doesn't use epsilon.

@Nutomic Nutomic enabled auto-merge (squash) March 15, 2024 10:49
@Nutomic
Member

Nutomic commented Mar 15, 2024

Thanks for the PR, well done!

@Nutomic Nutomic merged commit 19a1a07 into LemmyNet:main Mar 15, 2024
1 check passed
@flamingo-cant-draw flamingo-cant-draw deleted the url-blocklist branch May 17, 2024 02:04
7 participants