Bundesanzeiger: query of a string starting with a number returns an error #108

Open
andmbg opened this issue Jun 23, 2023 · 5 comments

Comments

@andmbg

andmbg commented Jun 23, 2023

How to reproduce:

from deutschland.bundesanzeiger import Bundesanzeiger
ba = Bundesanzeiger()
data = ba.get_reports('4steps systems')

Expected result:

  • The result dict is assigned to data.

What I got instead:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[4], line 3
      1 from deutschland.bundesanzeiger import Bundesanzeiger
      2 ba = Bundesanzeiger()
----> 3 data = ba.get_reports('4steps systems')

File ~/miniconda3/envs/uregister/lib/python3.8/site-packages/deutschland/bundesanzeiger/bundesanzeiger.py:186, in Bundesanzeiger.get_reports(self, company_name)
    182 # perform the search
    183 response = self.session.get(
    184     f"https://www.bundesanzeiger.de/pub/de/start?0-2.-top%7Econtent%7Epanel-left%7Ecard-form=&fulltext={company_name}&area_select=&search_button=Suchen"
    185 )
--> 186 return self.__generate_result(response.text)

File ~/miniconda3/envs/uregister/lib/python3.8/site-packages/deutschland/bundesanzeiger/bundesanzeiger.py:120, in Bundesanzeiger.__generate_result(self, content)
    118 """iterate trough all results and try to fetch single reports"""
    119 result = {}
--> 120 for element in self.__find_all_entries_on_page(content):
    121     get_element_response = self.session.get(element.content_url)
    123     if self.__is_captcha_needed(get_element_response.text):

File ~/miniconda3/envs/uregister/lib/python3.8/site-packages/deutschland/bundesanzeiger/bundesanzeiger.py:90, in Bundesanzeiger.__find_all_entries_on_page(self, page_content)
     88 soup = BeautifulSoup(page_content, "html.parser")
     89 wrapper = soup.find("div", {"class": "result_container"})
---> 90 rows = wrapper.find_all("div", {"class": "row"})
     91 for row in rows:
     92     info_element = row.find("div", {"class": "info"})

AttributeError: 'NoneType' object has no attribute 'find_all'
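The traceback shows the proximate cause: soup.find("div", {"class": "result_container"}) returns None when the HTML that came back contains no such div, and the next line then calls find_all on None. The following is a minimal sketch of a more defensive version of that parsing step. It assumes bs4 (BeautifulSoup) is installed, and find_result_rows is a hypothetical helper, not part of the deutschland package. Instead of the bare AttributeError, it raises a descriptive error:

```python
from bs4 import BeautifulSoup


def find_result_rows(page_content: str) -> list:
    """Return the result rows of a search results page.

    Hypothetical helper mirroring the parsing step in the traceback above;
    it is not part of the deutschland package.
    """
    soup = BeautifulSoup(page_content, "html.parser")
    wrapper = soup.find("div", {"class": "result_container"})
    if wrapper is None:
        # soup.find returns None when no matching element exists; fail with
        # a clear message instead of calling find_all on None.
        raise RuntimeError(
            "search response has no 'result_container' div; "
            "the server may have returned an unexpected page"
        )
    return wrapper.find_all("div", {"class": "row"})
```

For example, a page containing a result_container div with two row divs yields a list of two elements, while an empty page raises RuntimeError. Raising keeps the failure visible. Returning an empty list instead would make a broken session indistinguishable from a search with no hits.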

I tried other leading numerals as well as non-numeric first characters, and the error follows the pattern described.

My environment: Ubuntu 22.04, Python 3.8.17

@wirthual
Member

Hi, thank you for reporting.

Do you get the same error if the string does not start with a number?

@andmbg
Author

andmbg commented Jun 28, 2023

If I assign a Bundesanzeiger() object to a variable, calling its get_reports('string') method keeps returning results successfully, as long as the string does not start with a number. Once I call it with a string that starts with a number, the same object also raises the error on every subsequent call, even with letter-first strings. From then on the object can be regarded as "broken". A new Bundesanzeiger instance works again on strings that do not start with a number. Update: a minus sign as the first character works normally.
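Since a fresh instance works again, one possible workaround until the cause is found is to create a new Bundesanzeiger object for every query. The sketch below is minimal and relies only on the Bundesanzeiger() constructor and the get_reports method used above. get_reports_fresh and its client_factory parameter are hypothetical names; the factory argument exists only so the helper can be exercised without network access. The sketch does not make digit-leading queries succeed. It only prevents one failed query from breaking later ones:

```python
from typing import Any, Callable, Optional


def get_reports_fresh(company_name: str,
                      client_factory: Optional[Callable[[], Any]] = None) -> dict:
    """Run one query on a newly created client object.

    Hypothetical workaround: no client is reused across queries, so an
    object left in the "broken" state by one query cannot affect the next.
    """
    if client_factory is None:
        # Imported lazily so the helper can be tested with a fake factory.
        from deutschland.bundesanzeiger import Bundesanzeiger
        client_factory = Bundesanzeiger
    client = client_factory()
    return client.get_reports(company_name)
```

With this helper, a get_reports_fresh('dfki') call made after a failed get_reports_fresh('4steps systems') runs on a new object, which, per the observation above, should succeed. The cost is a new HTTP session per query.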

@wirthual
Member

wirthual commented Jul 6, 2023

Interesting observation, good job!

Do you have an idea where this could originate from?

Here is a refactored version; it might be worth giving it a shot to see if it works: #87

@timtensor

Hi, is there a change in the output? I have been trying to run the API, but I get the following output when printing data.keys(). I am running it in a Google Colab environment.
Output
dict_keys(['5ba681c9115aaecdafc8c38bb108c3db', '4b2f77bf02e816bb8499faf15caefedd', '2634689c43cb89a694dcdbb4cab78e02', '292ca3b5f4810955aca539406eeb76cf', 'c61cd2984dae315fafddb9734ebe62f6', '2abad8c6b58e34b6d30f338fe26ec2f2', '67ce34620f61384c3cf57c0e27e2b6df', 'd745f647588351fbe83e28da65805f22', 'dc4e92d299d36b81f5b9396b0c086835', '48447cfb08c917d023b1eca9c5b2600a', '0321a5615eecb0eb209e755c467b9355', '86a6887e3dcfcde4e74137f90e1c418c', 'e2ffcd430071329d49e7c6cdc41321db', '0593e7ae90f18e1729d582019682aceb', 'ea23a1ab5abbe17edf5863b60821d6e6', '15c2d9e390f07feeac1dd8d7ca3c35b8', '0faf2776804538ca0ccfba9eae40aed4', '832955731452a656644742ff77f67631', '7a0874b3ef79b15f7a88c14a21e13c3c', '07d3bf28953ce5354e0bf502b1593d69'])

Expected output
# dict_keys(['Jahresabschluss zum Geschäftsjahr vom 01.01.2020 bis zum 31.12.2020', 'Konzernabschluss zum Geschäftsjahr vom 01.01.2020 bis zum 31.12.2020\nErgänzung der Veröffentlichung vom 04.06.2021',

@wirthual
Member

Hi Tim,

Regarding the keys in the dictionary: in a more recent version, the output changed to use hashes instead of names as keys, because several entries can share the same name. To access the names you used to see as keys in the previous version, use:

from deutschland.bundesanzeiger import Bundesanzeiger

ba = Bundesanzeiger()
data = ba.get_reports('dfki')
# Keys are hashes; the human-readable report name is stored under "name".
for report_hash in data:
    print(data[report_hash]["name"])

In this example, this prints:

Jahresabschluss zum Geschäftsjahr vom 01.01.2021 bis zum 31.12.2021
Jahresabschluss zum Geschäftsjahr vom 01.01.2020 bis zum 31.12.2020
Jahresabschluss zum Geschäftsjahr vom 01.01.2019 bis zum 31.12.2019
Corporate Governance Bericht 2020
Corporate Governance Bericht 2019
Corporate Governance Bericht 2019
Einladung zur ordentlichen Hauptversammlung
Honorarkonsularische Vertretung von Frankreich in Aachen
Corporate Governance Bericht 2018
Corporate Governance Bericht 2018
Corporate Governance Bericht 2017
Einladung zur ordentlichen Hauptversammlung
Honorarkonsularische Vertretung von Frankreich in Aachen
Jahresbericht zum 30. Juni 2008
Hauptversammlung

As you can see, some entry names are duplicated, which is why a different key was necessary.
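If looking reports up by name is still convenient, one option is to group the hash keys by report name, so duplicate names stay visible instead of overwriting each other. The following is a minimal sketch, assuming only that each value in the returned dict has a "name" entry, as in the example above. group_by_name is a hypothetical helper, not part of the library:

```python
from collections import defaultdict
from typing import Dict, List


def group_by_name(reports: Dict[str, dict]) -> Dict[str, List[str]]:
    """Map each report name to the list of hash keys that carry that name."""
    by_name: Dict[str, List[str]] = defaultdict(list)
    for report_hash, report in reports.items():
        by_name[report["name"]].append(report_hash)
    return dict(by_name)
```

Applied to the result of ba.get_reports('dfki'), any name whose list has more than one hash, such as "Corporate Governance Bericht 2019" in the output above, identifies the duplicated entries.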

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants