Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Regex filtering for black and whitelists #183

Open
Udinanon opened this issue Mar 2, 2023 · 6 comments
Open

Add Regex filtering for black and whitelists #183

Udinanon opened this issue Mar 2, 2023 · 6 comments
Labels
enhancement New feature or request

Comments

@Udinanon
Copy link

Udinanon commented Mar 2, 2023

Description of the problem

I want to filter files being downloaded based on the URLs in detail, to avoid downloading unwanted content such as videos
My university hosts a lot of content by itself, so I can't just do a domain filtering, i need to filter the URLs

Solution

Regex would be a very flexible tool to do it and would expand the capabilities of the tool while using a very common technology

I want to add a download_domains_blacklist_regex to the options, i think that's all in config.py
Then a small edit to the function is_filtered_external_domain in task.py, checking the full URLs against this Regex list, and then filtering by hostname using the old version of the blacklist
I might have to also add the option properties in the types.py dataclass

This should be all the needed steps to add the option correctly, although some of this is based on notes from a previous version of the library
I'll likely start working on a PR in the next few days, please tell me if anything here is incorrect or inaccurate

Alternatives

Perhaps the new filtering could cover the previous domain method, but that would break backwards compatibility unnecessarily
Other text schemes are possible, but Regex is extremely popular and well implemented in Python

@Udinanon Udinanon added the enhancement New feature or request label Mar 2, 2023
@Udinanon Udinanon changed the title Add Regex filtering for balck and whitelists Add Regex filtering for black and whitelists Mar 2, 2023
@C0D3D3V
Copy link
Owner

C0D3D3V commented Mar 2, 2023

The idea sounds fine to me, and was requested multiple times before. I think a lot of people will like it. And you are welcome to implement it :)

But please tell me more what kind of content you want to filter and how it is listed in your moodle?
E.g. how does the URL pattern looks like you want to filter?

This filter is definitely good, but I'm not sure if it will do what you want :D
for example:

if location['module_modname'] in ['kalvidres', 'helixmedia', 'lti']:

There is no filter for LTI Modules, like kalvidres, helixmedia, opencast ... these will not be filtered by these patterns because they are "cookie mods" (maybe someday someone implements them as normal mod) and handled here:

await self.external_download_url(add_token=False, delete_if_successful=True, use_cookies=True)

But anyway this regex filter is a good idea, and more filters for other parts of moodle-dl will come eventually.

@Udinanon
Copy link
Author

Udinanon commented Mar 4, 2023

Well, what i wanted to filer was exactly kalvidres content , so this is a bit of a bummer
Perhaps we could also filter requests going inside external_download_url directly, applying the same blacklist/whitelist?

@C0D3D3V
Copy link
Owner

C0D3D3V commented Mar 4, 2023

We could also add a regex filter to the end of filter_courses in moodle_service.py
I do not really like the idea but we can do it. You can do that instead of adding it to the external filter only. So all urls will be filtered.

@C0D3D3V
Copy link
Owner

C0D3D3V commented Mar 4, 2023

We will also somday add filters for modules, so that kalvidres module can also be filtered

@C0D3D3V
Copy link
Owner

C0D3D3V commented Mar 9, 2023

If you add a filter to filter_courses in moodle_service.py you can filter for
r'https://your.moodle.com/mod/kalvidres/view.php\?.*' that should do the trick. Adding a filter option only for kalvidres module will take some more time, because I want to thing about a good solution, that we can apply to all mods

@C0D3D3V
Copy link
Owner

C0D3D3V commented Apr 5, 2023

In principle related to #135

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants