Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bundesanzeiger: allow fetching multiple pages for one company #143

Merged
merged 3 commits into from
Jun 8, 2024

Conversation

jfhr
Copy link
Contributor

@jfhr jfhr commented Apr 27, 2024

The current Bundesanzeiger implementation only fetches one page of results with up to 20 reports, but sometimes it might be interesting to get older reports as well.

This adds a named parameter page_limit to Bundesanzeiger.get_reports. The default value is 1, which preserves the current behavior of fetching only one page. If a higher value is set, the client will search the returned HTML for a "next page" link, and keep generating reports until page_limits pages have been parsed or there is no "next page" link anymore. float('inf') can be passed to fetch all available pages.

This commit adds a unit test for the method to find the "next page" link and another to test that it actually generates more than 20 reports.

This also encodes the company name in the URL so that search terms like "Saxony Minerals & Exploration - SME AG" work correctly.

This adds a named parameter page_limit to Bundesanzeiger.get_reports. The default value is 1, which preserves the current behavior of fetching only one page. If a higher value is set, the client will search the returned HTML for a "next page" link, and keep generating reports until page_limits pages have been parsed or there is no "next page" link anymore. float('inf') can be passed to fetch all available pages.

This commit adds a unit test for the method to find the "next page" link and another to test that it actually generates more than 20 reports.
@wirthual
Copy link
Member

wirthual commented May 3, 2024

Hi,

thanks for the contribution. Can you black-format your code changes?

How do you execute your tests? It seems the execution in the CI does not find response.html. Probably because the working dir is different.

Maybe we can use importlib to find the file independent of the execution path?

This resolves the path to response.html in test_get_next_page_link relative to __file__ (which should be the local path to the python test file). This should hopefully work on the CI server.

I also ran black ./src and black ./tests
@jfhr
Copy link
Contributor Author

jfhr commented May 11, 2024

Hey, thanks for the feedback. I ran

black ./src
black ./tests

I also used os.path.dirname(__file__) in the test code to get the correct path to response.html. As far as I can tell the tests work now, but some tests in other packages are failing.

I'm not sure how importlib would differ from __file__ (not really a python developer).

@wirthual
Copy link
Member

wirthual commented Jun 8, 2024

In some exotic cases, __file__ might not be populated. But in a test thats fine. Thank you for the changes.

@wirthual wirthual merged commit 3178e4b into bundesAPI:main Jun 8, 2024
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants