Puppeteer Scraper readme and input schema improvements #122

Open
mnmkng opened this issue Mar 17, 2021 · 1 comment

Comments

@mnmkng
Member

mnmkng commented Mar 17, 2021

Web Scraper and Cheerio Scraper already have READMEs and input schemas of sufficient quality, but those of Puppeteer Scraper are lacking. The structure and format should be exactly the same as the existing ones, but the contents will differ: only slightly in some places, substantially in others.

Whoever attempts this should:

  • read the tutorials for all the scrapers, to understand how they work
  • make sure they understand the differences between the Web/Cheerio scrapers and Puppeteer Scraper (read this for the difference between Web Scraper and Puppeteer Scraper)
  • reuse as much as possible from the existing READMEs; there is no need to reinvent the wheel. However, make sure that the differences are not missed or obscured. Ideally, the README would point out the differences where appropriate
  • while writing, run the scraper regularly with different inputs to see what the inputs actually do and how they work
  • update the description fields in INPUT_SCHEMA.json as well; see the input schemas of Web/Cheerio for inspiration. The descriptions are shown in the scraper UI as tooltips, so they need to look good there.
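
As a rough aid for the last point, here is a minimal sketch that flags input fields whose description is missing or very short. It assumes the fields sit under a top-level `properties` object with a `description` key each. The function name, the length threshold and the sample schema are all made up for illustration and are not taken from the real Puppeteer Scraper schema.

```python
import json
from typing import List

# Hypothetical threshold: descriptions shorter than this are flagged as too terse.
MIN_DESCRIPTION_LENGTH = 20


def find_weak_descriptions(schema: dict, min_length: int = MIN_DESCRIPTION_LENGTH) -> List[str]:
    """Return the names of input fields whose description is missing or very short."""
    weak = []
    # Assumption: input fields are listed under a top-level "properties" object.
    for name, field in schema.get("properties", {}).items():
        description = (field.get("description") or "").strip()
        if len(description) < min_length:
            weak.append(name)
    return weak


# Made-up schema fragment for illustration only.
example_schema = json.loads("""
{
  "title": "Example input",
  "type": "object",
  "properties": {
    "startUrls": {
      "title": "Start URLs",
      "type": "array",
      "description": "URLs of the pages where the scraper starts crawling."
    },
    "pageFunction": {
      "title": "Page function",
      "type": "string",
      "description": "TODO"
    },
    "proxyConfiguration": {
      "title": "Proxy configuration",
      "type": "object"
    }
  }
}
""")

print(find_weak_descriptions(example_schema))
```

To check the real file, load it with `json.load(open("INPUT_SCHEMA.json"))` instead of the inline sample. A check like this only catches empty or placeholder text; how a tooltip actually looks still has to be verified in the scraper UI.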
@mnmkng
Member Author

mnmkng commented Mar 31, 2021

To see the changes on the Apify platform, one must first build the scraper. To do that:

  1. Create a new actor on the Apify platform
  2. Go to the Source tab
  3. Under Source code, select Type: Git repository
  4. Add the Git URL of your fork. Example for this repository: https://github.com/apifytech/actor-crawler.git#master:puppeteer-scraper. Your URL will be similar, but it will point to your fork.
  5. Click Save and then Build now.
  6. The Developer Console below should refresh with the new changes. If not, refresh your browser window.
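
The URL in step 4 appears to follow the pattern `<repository URL>#<branch>:<directory>`. That pattern is inferred from the single example above and is not a confirmed specification. Under that assumption, a small helper (hypothetical name `build_git_source_url`) could compose the URL for a fork like this:

```python
def build_git_source_url(repo_url: str, branch: str, directory: str) -> str:
    """Compose a Git source URL of the form <repo_url>#<branch>:<directory>.

    Assumption: the format is inferred from the example URL in step 4.
    """
    return f"{repo_url}#{branch}:{directory}"


# Reproduces the example URL from step 4.
print(build_git_source_url(
    "https://github.com/apifytech/actor-crawler.git",
    "master",
    "puppeteer-scraper",
))
```

For a fork, only the repository URL (and possibly the branch name) should need to change.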

@mnmkng mnmkng changed the title Puppeteer Scraper tutorial and input schema improvements Puppeteer Scraper readme and input schema improvements Apr 16, 2021