Scraping comments doesn't work correctly #14

Open
shaklev opened this issue Apr 29, 2020 · 2 comments
Labels
bug Something isn't working

Comments

@shaklev

shaklev commented Apr 29, 2020

Looking at the following code segment:

        cmmBtn = browser.find_elements_by_xpath('//a[@class="_3hg- _42ft"]')
        for btn in cmmBtn:
            try:
                btn.click()
            except:
                pass
        time.sleep(1)
        moreCmm= browser.find_elements_by_xpath('//a[@class="_4sxc _42ft"]')
        for moreCmmBtn in moreCmm:
            try:
                moreCmmBtn.click()
            except:
                pass
        moreComments = browser.find_elements_by_xpath('//a[@class="_6w8_"]')

The line cmmBtn = browser.find_elements_by_xpath('//a[@class="_3hg- _42ft"]') collects every "X comments" button, and the for loop that follows clicks all of them. If a post already lists a few comments on page load, before the "X comments" button is clicked (see the picture), then clicking the button with class _3hg- _42ft hides all the comments of that post.

An additional check is needed: if an element with class _4sxc _42ft already exists within the post div (meaning the "view more comments" button is shown), the _3hg- _42ft button doesn't need to be clicked.

Image

brutalsavage self-assigned this Apr 29, 2020
brutalsavage added the bug (Something isn't working) label Apr 29, 2020
@MatteoSerafino

Is this bug fixed?

@shaklev
Author

shaklev commented Sep 28, 2020

> Is this bug fixed?

> An additional check is needed: if an element with class _4sxc _42ft already exists within the post div (meaning the "view more comments" button is shown), the _3hg- _42ft button doesn't need to be clicked.

I wrote this as a simple solution back then. You can test it (the principle should be the same now).
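
No code for this check is posted in the thread, so here is a minimal sketch of the idea, not the repository's actual fix. It assumes the caller already has one element per post div (the thread does not say how to locate those, so `posts` is a hypothetical input), and the names `expand_comments`, `TOGGLE_COMMENTS` and `VIEW_MORE` are made up for illustration. The two class names are the ones quoted above and may no longer match the current page markup.

```python
# Same string value as selenium.webdriver.common.by.By.XPATH.
XPATH = "xpath"

# Relative XPaths (note the leading "."), so the search stays inside one post.
TOGGLE_COMMENTS = './/a[@class="_3hg- _42ft"]'  # the "X comments" button
VIEW_MORE = './/a[@class="_4sxc _42ft"]'        # the "view more comments" button


def expand_comments(posts):
    """Click the "X comments" button only on posts whose comments look collapsed.

    `posts` is an iterable of post elements (hypothetical: locating the post
    div is left to the caller). Returns the number of buttons clicked.
    """
    clicked = 0
    for post in posts:
        # A "view more comments" button means comments are already listed,
        # so clicking the "X comments" button would hide them. Skip the post.
        if post.find_elements(XPATH, VIEW_MORE):
            continue
        for btn in post.find_elements(XPATH, TOGGLE_COMMENTS):
            try:
                btn.click()
                clicked += 1
            except Exception:
                # Best effort, as in the original loop: the element may be
                # stale or covered by another element.
                pass
    return clicked
```

The function only relies on each post exposing `find_elements(by, value)` and each button exposing `click()`, which Selenium's `WebElement` provides. One limitation of this check, as proposed: a post whose comments are all visible and that therefore shows no "view more comments" button would still have its "X comments" button clicked.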
