
pagingControls Error #14

Open
wangrunzu opened this issue May 31, 2019 · 10 comments
@wangrunzu

wangrunzu commented May 31, 2019

I got the following error about the paging control when I tried to scrape the data.

python.exe main.py --headless --url "https://www.glassdoor.com/Reviews/Walmart-Reviews-E715.htm" --limit 100 -f test.csv

2019-05-31 15:06:49,643 INFO 377 :main.py(17796) - Configuring browser

DevTools listening on ws://127.0.0.1:50831/devtools/browser/8c7890e8-fe24-41f7-b77f-d22dae3f6c3e
2019-05-31 15:06:51,700 INFO 419 :main.py(17796) - Scraping up to 100 reviews.
2019-05-31 15:06:51,717 INFO 358 :main.py(17796) - Signing in to ******@ou.edu
2019-05-31 15:06:55,478 INFO 339 :main.py(17796) - Navigating to company reviews
2019-05-31 15:07:08,137 INFO 286 :main.py(17796) - Extracting reviews from page 1
2019-05-31 15:07:08,200 INFO 291 :main.py(17796) - Found 10 reviews on page 1
2019-05-31 15:07:08,677 INFO 297 :main.py(17796) - Scraped data for "The Best in Retail"(Thu May 30 2019 20:24:44 GMT-0500 (Central Daylight Time))
2019-05-31 15:07:09,171 INFO 297 :main.py(17796) - Scraped data for "Walmart needs to bring worker dignity back into focus"(Wed May 29 2019 18:04:43 GMT-0500 (Central Daylight Time))
2019-05-31 15:07:09,673 INFO 297 :main.py(17796) - Scraped data for "Great for college students"(Thu May 30 2019 12:25:57 GMT-0500 (Central Daylight Time))
2019-05-31 15:07:10,042 INFO 297 :main.py(17796) - Scraped data for "Retail"(Thu May 30 2019 17:09:02 GMT-0500 (Central Daylight Time))
2019-05-31 15:07:10,497 INFO 297 :main.py(17796) - Scraped data for "walmart"(Mon May 27 2019 17:17:41 GMT-0500 (Central Daylight Time))
2019-05-31 15:07:10,966 INFO 297 :main.py(17796) - Scraped data for "Maintenance is well taken care of"(Tue May 28 2019 08:32:17 GMT-0500
(Central Daylight Time))
2019-05-31 15:07:11,437 INFO 297 :main.py(17796) - Scraped data for "It was the best job that I had to be honest"(Wed May 29 2019 20:29:39 GMT-0500 (Central Daylight Time))
2019-05-31 15:07:11,896 INFO 297 :main.py(17796) - Scraped data for "Great"(Wed May 29 2019 20:36:02 GMT-0500 (Central Daylight Time))
2019-05-31 15:07:12,281 INFO 297 :main.py(17796) - Scraped data for "floater pharmacist"(Wed May 29 2019 21:10:58 GMT-0500 (Central Daylight Time))
2019-05-31 15:07:12,708 INFO 297 :main.py(17796) - Scraped data for "cashier"(Wed May 29 2019 23:11:49 GMT-0500 (Central Daylight Time))
Traceback (most recent call last):
  File "main.py", line 461, in
    main()
  File "main.py", line 446, in main
    while more_pages() and
  File "main.py", line 314, in more_pages
    paging_control = browser.find_element_by_class_name('pagingControls')
  File "C:\Users\wang0040\AppData\Local\Continuum\miniconda3\envs\Default\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 564, in find_element_by_class_name
    return self.find_element(by=By.CLASS_NAME, value=name)
  File "C:\Users\wang0040\AppData\Local\Continuum\miniconda3\envs\Default\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
    'value': value})['value']
  File "C:\Users\wang0040\AppData\Local\Continuum\miniconda3\envs\Default\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\wang0040\AppData\Local\Continuum\miniconda3\envs\Default\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"class name","selector":"pagingControls"}
  (Session info: headless chrome=74.0.3729.169)
  (Driver info: chromedriver=74.0.3729.6 (255758eccf3d244491b8a1317aa76e1ce10d57e9-refs/branch-heads/3729@{#29}),platform=Windows NT 6.1.7601 SP1 x86_64)

I also got the NoSuchElementException error from #8, but worked around it by commenting out the scrape_years part. I do not think that change caused the issue above, but I am not sure.

@jhatamyar

I am suddenly getting the exact same exception using chromedriver 73.0.3683.6 on Mac OS X 10.13.6. The code was working perfectly a few weeks ago. I am looking into get_current_page(), as I'm curious whether find_element by class name or XPath might be the problem, but I am a total beginner with Selenium. Hoping the author can help.

@MatthewChatham
Owner

Thanks folks, I may have time to look at this in the coming week. But if you're able to figure it out and make a PR to fix, I'll merge it!

@guoruijiao

I'm seeing the exact same error as above. It would be great if this could be resolved.

@heraldnithesh

Hi, is this resolved?

@batordavid

Replacing some lines of code helped me.

Original (3 places in the code):
paging_control = browser.find_element_by_class_name('pagingControls')
Updated:
paging_control = browser.find_element_by_css_selector('.eiReviews__EIReviewsPageContainerStyles__pagination.noTabover.mt')

Original (2 places in the code):
next_ = paging_control.find_element_by_class_name('next')
Updated:
next_ = paging_control.find_element_by_class_name('pagination__PaginationStyle__next')

@tsp2123

tsp2123 commented Jul 28, 2019

Hey, does anyone else have an issue where they fix the paging_control selectors but the code breaks later on? I'm trying to scrape around 30k reviews, and the code keeps breaking for me at around page 176. I used the following for paging_control:

```python
def more_pages():
    paging_control = browser.find_element_by_css_selector('.eiReviews__EIReviewsPageContainerStyles__pagination.noTabover.mt')
    next_ = paging_control.find_element_by_class_name('pagination__PaginationStyle__next')
    try:
        next_.find_element_by_tag_name('a')
        return True
    except selenium.common.exceptions.NoSuchElementException:
        return False

def go_to_next_page():
    logger.info(f'Going to page {page[0] + 1}')
    paging_control = browser.find_element_by_class_name('pagination__PaginationStyle__pagination')
    next_ = paging_control.find_element_by_class_name(
        'pagination__PaginationStyle__next').find_element_by_tag_name('a')
    browser.get(next_.get_attribute('href'))
    time.sleep(1)
    page[0] = page[0] + 1
```

I'm messing around with both to see what works, but my code keeps breaking before it gets even a quarter of the way through the scraping. Does anyone have a workaround?

@carlotorniai

Hi all, I've tried both suggestions and the code still breaks.
Any clue?
Traceback below:
Traceback (most recent call last):
  File "main.py", line 483, in
    main()
  File "main.py", line 468, in main
    while more_pages() and
  File "main.py", line 315, in more_pages
    paging_control = browser.find_element_by_css_selector('.eiReviews__EIReviewsPageContainerStyles__pagination.noTabover.mt')
  File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 598, in find_element_by_css_selector
    return self.find_element(by=By.CSS_SELECTOR, value=css_selector)
  File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 978, in find_element
    'value': value})['value']
  File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".eiReviews__EIReviewsPageContainerStyles__pagination.noTabover.mt"}
  (Session info: headless chrome=79.0.3945.130)

@carlotorniai

Getting the latest code from MuhammadMehran's pull request fixed the issue.

@EdiLacic123

@carlotorniai Could you post the code by any chance? I have been trying to fix the same issue as well. Thanks

@carlotorniai

@EdiLacic123 just grab main.py, test.py, and schema.py from this pull request: https://github.com/MatthewChatham/glassdoor-review-scraper/pull/37/files
