Missing apps are not skipped #63

Open

Holtder opened this issue Oct 28, 2019 · 1 comment

Holtder commented Oct 28, 2019

Describe the bug
In rare situations, an app is listed as a result by the search function even though it has (temporarily) been removed from the Play Store. When the detailed=True argument is used, the package throws an error as soon as the missing app is scraped, because it tries to access the app's actual store page.

To Reproduce
Steps to reproduce the behavior:

 >>> import play_scraper
 >>> print(play_scraper.search('CAUTI', gl='nl', detailed=True, page=6))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "(blahblah)/.env/lib/python3.6/site-packages/play_scraper/api.py", line 79, in search
    return s.search(query, page, detailed)
  File "(blahblah)/.env/lib/python3.6/site-packages/play_scraper/scraper.py", line 224, in search
    apps = self._parse_multiple_apps(response)
  File "(blahblah)/.env/lib/python3.6/site-packages/play_scraper/scraper.py", line 71, in _parse_multiple_apps
    return multi_futures_app_request(app_ids, params=self.params)
  File "(blahblah)/.env/lib/python3.6/site-packages/play_scraper/utils.py", line 531, in multi_futures_app_request
    result = response.result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "(blahblah)/.env/lib/python3.6/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "(blahblah)/.env/lib/python3.6/site-packages/requests/sessions.py", line 653, in send
    r = dispatch_hook('response', hooks, r, **kwargs)
  File "(blahblah)/.env/lib/python3.6/site-packages/requests/hooks.py", line 31, in dispatch_hook
    _hook_data = hook(hook_data, **kwargs)
  File "(blahblah)/.env/lib/python3.6/site-packages/play_scraper/utils.py", line 504, in parse_app_details_response_hook
    details = parse_app_details(soup)
  File "(blahblah)/.env/lib/python3.6/site-packages/play_scraper/utils.py", line 239, in parse_app_details
    title = soup.select_one('h1[itemprop="name"] span').text
AttributeError: 'NoneType' object has no attribute 'text'
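
For context, BeautifulSoup's select_one returns None when a selector matches nothing, so the .text access on the missing title element is what raises here. A minimal sketch of a guard at that spot follows; the early return is my assumption of one possible fix, not the library's actual code:

    # Sketch of a guard inside play_scraper.utils.parse_app_details (assumed fix, not current library code)
    title_tag = soup.select_one('h1[itemprop="name"] span')
    if title_tag is None:
        # The details page is missing or has an unexpected layout; signal this
        # instead of crashing so the caller can skip the app.
        return None
    title = title_tag.text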

In the original use case (a function that iterated over the pages using Celery), the following error was thrown as well:

[2019-10-28 10:48:41,362: ERROR/ForkPoolWorker-1] Error occurred fetching uk.incrediblesoftware.mpcmachine.demo: 404 Client Error: Not Found for url: https://play.google.com/store/apps/details?id=uk.incrediblesoftware.mpcmachine.demo&hl=en&gl=nl&q=CAUTI&c=apps

Based on this, I tried to open the actual Play Store page for uk.incrediblesoftware.mpcmachine.demo, which, as expected, returns an HTTP 404 error.

Expected behavior
I would expect the package to log the 404 error, skip the missing app, and still return the remaining results. I can catch the error in my own code to prevent a crash, but that way an entire page of apps is still excluded from the results.
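
As a workaround on the caller's side, something along these lines should give that behaviour: run the search without detailed=True and fetch the details per app id, skipping the ones that fail. The exception types caught and the 'app_id' key are assumptions based on the traceback and the basic search output; treat this as a sketch, not part of the library.

    import play_scraper
    from requests.exceptions import HTTPError

    def search_detailed_skip_missing(query, **search_kwargs):
        """Search without detailed=True, fetch details per app, and skip apps that fail."""
        results = []
        for app in play_scraper.search(query, **search_kwargs):  # basic results only
            try:
                results.append(play_scraper.details(app['app_id']))
            except (HTTPError, AttributeError) as exc:
                # App page is gone (404) or could not be parsed; report it and move on.
                print("Skipping {}: {}".format(app['app_id'], exc))
        return results

    # Hypothetical usage mirroring the report above:
    # apps = search_detailed_skip_missing('CAUTI', gl='nl', page=6)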

Desktop (please complete the following information):

  • OS: Windows 10, running WSL Ubuntu 18.04
  • Python version: 3.6.8
  • play_scraper version: 0.6.0
Holtder added the bug label Oct 28, 2019

Holtder commented Nov 5, 2019

It should be noted that the app I mentioned is back online again, so apparently some apps just go missing from time to time.
