-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Sciebo downloads #117
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I currently have no Mooble room with sciebo available so I am not really able to test this. Feedback from anyone else would be appreciated though in generall I am in favor of improving the Sciebo scraping
@@ -24,6 +24,7 @@ install_requires = | |||
yt-dlp>=2021.12.27 | |||
pdfkit>=0.6.0 | |||
tqdm>=4.0.0 | |||
lxml>=5.0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is lxml needed for? (I know it can be used by bs4 but I do not know why it is necessary)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the response of /public.php/webdav is a xml-file which describes the structure of the sharing link we found.
To be able to parse xml we need to install lxml so that bs4 can use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As that XML is highly structured it should be sufficient to take the XML parser from the stdlib
sciebo_links = list( | ||
set(re.findall("https://rwth-aachen.sciebo.de/s/[a-zA-Z0-9-]+", text)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just move the set()
to the for-loop and drop the list, it has no effect
Co-authored-by: Nils K <24257556+septatrix@users.noreply.github.com>
solves #110