-
-
Notifications
You must be signed in to change notification settings - Fork 866
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a blocklist for URLs. #4515
Conversation
Awesome work, code looks nice, and the feature itself is super useful! |
Like I mentioned in the issue, I dont think its necessary to add new endpoints for this. Instead the url blocks can be updated with a parameter in EditSite, and then written to the db table here. For reading it can be added to GetSiteResponse. |
Ah, my bad, I misunderstood what you meant.
I don't personally think this a good idea. A blocklist is the kind of thing that can grow to be quite big overtime, so a scenario where most of the data in a |
cdd22a6
to
62b6a29
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is a great addition.
The main thing I see missing, is that this blocklist needs to be processed for all markdown bodies as well, not just URL fields (otherwise people could just spam the same links in private messages, post bodies, and comment bodies).
We already have a process_markdown
function in crates/api_common/src/utils.rs
, so I suggest adding a parameter to that function similar to the slur_regex
already provided, and doing the check there also.
@@ -268,6 +269,8 @@ pub struct EditSite { | |||
pub allowed_instances: Option<Vec<String>>, | |||
/// A list of blocked instances. | |||
pub blocked_instances: Option<Vec<String>>, | |||
/// A list of blocked URLs | |||
pub blocked_urls: Option<Vec<String>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, keeping to this standard is best for now. In the future we could potentially move all these lists to a more atomic method of updating.
crates/utils/src/utils/validation.rs
Outdated
Url::parse(&format!("https://{}", url))? | ||
} else { | ||
Err(e)? | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can probably rewrite this as:
Err(ParseError::RelativeUrlWithoutBase) => Url::parse(&format!("https://{}", url))?,
e => e
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. The only other thing worth doing is add a test for this in api_tests, to ensure that the api works correctly. This requires adding the api changes to https://github.com/LemmyNet/lemmy-js-client
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet, thx for this one! I'll let @Nutomic decide if that last change is necessary, or if we can merge as-is. IMO an API test isn't necessary here, as this isn't federation related, and it has plenty of local tests.
The API test in this case is to ensure that the API endpoints work as expected (and dont get broken by some random change in the future). But yes its not mandatory. |
Added the API test, I don't know if there's a better way getting the regex to update than just using |
You can set a shorter caching interval in debug mode like this: https://github.com/LemmyNet/lemmy/blob/main/crates/federate/src/lib.rs#L21-L24 |
@flamingo-cant-draw In a few minutes after its deployed, you can add |
pub struct LocalSiteUrlBlocklist { | ||
pub id: i32, | ||
pub url: String, | ||
pub published: DateTime<Utc>, | ||
pub updated: Option<DateTime<Utc>>, | ||
} | ||
|
||
#[derive(Default, Clone)] | ||
#[cfg_attr(feature = "full", derive(Insertable, AsChangeset))] | ||
#[cfg_attr(feature = "full", diesel(table_name = local_site_url_blocklist))] | ||
pub struct LocalSiteUrlBlocklistForm { | ||
pub url: String, | ||
pub updated: Option<DateTime<Utc>>, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These structs seem to represent individual URLs. If that's the case, the "List" part of these struct names don't make much sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point, but this is the same naming scheme as FederationBlockList
and FederationAllowList
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd also prefer not to use List
or collection-type names in sql tables, which really should use singular names for what each row is, but flamingo did the right thing by sticking to the standard set by FederationBlockList
, since they're really similar.
In the future, we could maybe rename all of these at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future, we could maybe rename all of these at the same time.
If/when we get around to this, there are also some inconsistencies with how API request and response structs are named that can be confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Open up an issue for that one if you would, so we don't forget.
|
||
// Later tests need this to be empty | ||
editSiteForm.blocked_urls = []; | ||
await epsilon.editSite(editSiteForm); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the random test failures happen because the blocklist is still being cached for a few seconds after this. I would fix it by blocking a different domain such as https://evil.com
, and passing an explicit link to createPost()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests seem to be passing now, so it was probably just an issue with my runner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or its due to the url blocklist and will fail again later. Better to resolve it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the URL, but I'm still confused why "Sticky a post" was failing given it doesn't use epsilon
.
Thanks for the PR, well done! |
Implements a blocklist for URLs, as discussed in #4514.
It's currently very basic and fragile, but it should do as an initial implementation.