Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add playwright scraper #26

Merged
merged 38 commits into from
Aug 19, 2022
Merged

feat: add playwright scraper #26

merged 38 commits into from
Aug 19, 2022

Conversation

B4nan
Copy link
Member

@B4nan B4nan commented Aug 8, 2022

No description provided.

@B4nan
Copy link
Member Author

B4nan commented Aug 8, 2022

cc @barjin, maybe you can reuse something from this

@AndreyBykov AndreyBykov marked this pull request as ready for review August 17, 2022 15:27
@AndreyBykov AndreyBykov requested review from barjin and removed request for barjin August 17, 2022 15:27
@AndreyBykov
Copy link
Contributor

@B4nan could not add you as a reviewer 🤷‍♂️

Updated the readme (as well as quite a few other things), tried to build on the platform - seems like all is good. There are some rough things to be polished in scraper tools, but seem to be working fine.

Using big playwright image and runs both chromium/firefox. Webkit has some issues with fingerprints for some reason. I guess if it's gonna be slow - in the future we could create Playwright scraper+chromium and separate with firefox one.

@AndreyBykov
Copy link
Contributor

btw - I put some links to Apify SDK with sdk.apify.com so that we would not need to return to it, hopefully, I used the correct links (but I'll check it of course).

Copy link
Contributor

@barjin barjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you, great job :)

@AndreyBykov
Copy link
Contributor

@B4nan so, apparently I fucked up package-lock on merging changes from master here: bff9f9e
Then I guess I re-generated it here 3716dc3 (not sure really, as tests were failing because of the version first, then because of the badly merged changes from master, I guess I just removed and npm installed everything again) and it screwed things up completely.

I compared the changes from this branch with master and tried to install everything/run the tests locally - it was fine when I noticed that the only difference is lock file and there's something off with it.

So yeah, I thought I am going crazy 🤯🤯🤯

But finally all green.

@B4nan
Copy link
Member Author

B4nan commented Aug 19, 2022

The thing with reinstalling I noticed some time ago - if you want to wipe node modules, lock file and reinstall, you also need to wipe the node modules folder inside packages/*. Dont ask me why, you know my stand on how dumb NPM is :D Just the fact that the yesterdays weirdness was resolved by doing yarn instead and checking its error/warning logs is self explanatory :D

Copy link
Member Author

@B4nan B4nan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gave it just a very bried look, i'd say lets merge and push to the platform as beta to test it properly there

@AndreyBykov
Copy link
Contributor

As for the previous comment (node modules wiping etc) - that would actually explain it - I was kinda thinking even how I ended up with that new lock file - as when I tried to wipe it and npm install fresh - it was still different - so yeah - must have left some node_modules in packages

@AndreyBykov AndreyBykov merged commit 2dcd50d into master Aug 19, 2022
@AndreyBykov AndreyBykov deleted the pw-scraper branch August 19, 2022 12:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants