Auto update the PEI dataset by downloading pdf and extracting data via Python #36

Open
mc51 opened this issue Jan 21, 2022 · 1 comment

mc51 commented Jan 21, 2022

First off: this is an important and useful app. Thanks for the effort!
I also had the idea of making it easier to compare tests, because I was frustrated by how hard it was to find and use the data from the Paul-Ehrlich-Institut (PEI). I've built an interactive website, https://corona.pw, that displays the original PEI data in a more convenient way. (It's also on GitHub.)

The slightly tricky part was automatically extracting the data from the .pdf file. It's a bit hacky, but as long as the structure doesn't change too much, it should keep working in the future. I guess that would be a good addition to this tool? You can check the Python code for that in the repo.
I'm a bit busy right now, which is why I've created an issue instead of a PR. But if I find some time, I'll gladly open a PR.
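
For readers who have not opened the repo, here is a rough sketch of what table extraction from a PDF can look like in Python. It is not the code from the repo mentioned above: the choice of the third-party `pdfplumber` library, the function names, and the cleanup rules are all assumptions made for illustration, and the real PEI layout may need extra handling.

```python
import re
from typing import Iterable, List, Optional


def clean_cell(cell: Optional[str]) -> str:
    """Collapse line breaks and repeated whitespace inside a single table cell."""
    if cell is None:
        return ""
    return re.sub(r"\s+", " ", cell).strip()


def clean_rows(rows: Iterable[Iterable[Optional[str]]]) -> List[List[str]]:
    """Normalize every cell and drop rows that are completely empty."""
    cleaned = []
    for row in rows:
        cells = [clean_cell(cell) for cell in row]
        if any(cells):
            cleaned.append(cells)
    return cleaned


def extract_rows(pdf_path: str) -> List[List[str]]:
    """Collect the rows of every table found in the PDF, page by page."""
    # Assumption: pdfplumber is installed. It is imported lazily so that the
    # helpers above can be used and tested without it.
    import pdfplumber

    rows: List[List[str]] = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                rows.extend(clean_rows(table))
    return rows
```

Keeping the cleanup separate from the PDF access means the fragile part (how the tables are detected) can be swapped out if the document structure changes, while the normalization stays the same.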

Marcono1234 commented Feb 7, 2022

It looks like this data is also available as an Excel spreadsheet on https://www.pei.de/DE/newsroom/dossier/coronavirus/coronavirus-inhalt.html?nn=169730&cms_pos=8:
("Excel-Tabellen: Vergleichende Evaluierung der Sensitivität von SARS-CoV-2 Antigenschnelltests (Selbsttests + Schnelltests)"; direct download link)

It is a bit odd that they provide the download links on two separate pages, and that the other page only contains the PDF download link, with no links between the two pages. The Excel file might also not have existed for long: at the time of writing, its URL has v=8, whereas the PDF's URL has v=77 (assuming this is a version number; its value seems to be ignored for HTTP requests, though).
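
If the `v` value really is a version number (which is only a guess, as noted above), an auto-update script could compare it against the last value it saw to decide whether to download the file again. A minimal sketch using only the standard library; the function name and the example URLs are made up for illustration and are not the real PEI download links:

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def url_version(url: str) -> Optional[int]:
    """Return the integer value of the `v` query parameter, or None if absent or non-numeric."""
    values = parse_qs(urlsplit(url).query).get("v")
    if not values:
        return None
    try:
        return int(values[0])
    except ValueError:
        return None


# Hypothetical URLs, for illustration only.
xlsx_url = "https://example.org/downloads/evaluation.xlsx?v=8"
pdf_url = "https://example.org/downloads/evaluation.pdf?v=77"

print(url_version(xlsx_url), url_version(pdf_url))
```

Since the server seems to ignore the value, the number would have to be read from the link on the download page rather than guessed by incrementing it.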

Edit: Looks like the maintainers are aware of it as well and have created src/data/xlsx2all.py to parse that Excel file.
