From 8a41fbd8ee2d794fb92b563e679a639e3aa53d88 Mon Sep 17 00:00:00 2001
From: Vratislav Bartonicek <vratislav@vbartonicek.cz>
Date: Tue, 1 Oct 2019 14:05:39 +0200
Subject: [PATCH 1/3] Scraping category - content

---
 docs/scraping/cheerio_scraper.md          | 364 ++++++++++++++-
 docs/scraping/index.md                    |  38 +-
 docs/scraping/introduction.md             | 294 +++++++++++++
 docs/scraping/legacy_phantomjs_crawler.md |   6 +-
 docs/scraping/puppeteer_scraper.md        | 511 +++++++++++++++++++++-
 docs/scraping/web_scraper.md              | 401 ++++++++++++++++-
 6 files changed, 1594 insertions(+), 20 deletions(-)
 create mode 100644 docs/scraping/introduction.md
diff --git a/docs/scraping/cheerio_scraper.md b/docs/scraping/cheerio_scraper.md
index 301d55f291..1125be6e2b 100644
--- a/docs/scraping/cheerio_scraper.md
+++ b/docs/scraping/cheerio_scraper.md
@@ -2,12 +2,366 @@
 title: Cheerio Scraper
 ---
 
-## [](#cheerio-scraper)Cheerio Scraper
+# [](#scraping-with-cheerio-scraper)Scraping with Cheerio Scraper
 
-Cheerio Scraper is a ready-made solution for crawling the web using plain HTTP requests to retrieve HTML pages and then parsing and inspecting the HTML using the Cheerio library. It's blazing fast.
+This scraping tutorial goes into the nitty-gritty details of extracting data from `https://apify.com/store` using the `apify/cheerio-scraper`. If you arrived here from the [Getting started with Apify scrapers](https://apify.com/docs/scraping/tutorial/introduction) tutorial, great! You are ready to continue where we left off. If you haven't seen the Getting started tutorial yet, check it out first. It will help you learn about Apify and scraping in general and set you up for this tutorial, which builds on the topics and code examples discussed there.
 
-Cheerio is a server-side version of the popular jQuery library that does not run in the browser, but instead constructs a DOM out of a HTML string and then provides the user with API to work with that DOM.
+## [](#getting-to-know-our-tools)Getting to know our tools
 
-Cheerio Scraper is ideal for scraping websites that do not rely on client-side JavaScript to serve their content. It can be as much as 20 times faster than using a full browser solution such as Puppeteer.
+In the [Getting started with Apify scrapers](https://apify.com/docs/scraping/tutorial/introduction) tutorial, we've confirmed that the scraper works as expected, so now it's time to add more data to the results.
 
-[Visit the Cheerio Scraper tutorial to get started!](./scraping/tutorial/cheerio-scraper)
+To do that, we'll be using the [`Cheerio`](https://github.com/cheeriojs/cheerio) library. This may not sound familiar, so let me try again. Does the [`jQuery` library](https://jquery.com/) ring a bell? If it does, you're in luck, because `Cheerio` is just `jQuery` that doesn't need an actual browser to run. Everything else is the same. All the functions you already know are there and even the familiar `$` is used. If you still have no idea what either of those are, don't worry. We'll walk you through using them step by step.
+
+> To learn more about `Cheerio`, see [the docs on their GitHub page](https://github.com/cheeriojs/cheerio).
+
+Now that that's out of the way, let's open one of the actor detail pages in the Store, for example the [`apify/web-scraper`](https://apify.com/apify/web-scraper) page, and use our DevTools-Fu to scrape some data.
+
+> If you're wondering why we're using `apify/web-scraper` as an example instead of `cheerio-scraper`, it's only because we didn't want to triple the number of screenshots we needed to make. Lazy developers!
+
+## [](#quick-recap)Quick recap
+
+Before we start, let's do a quick recap of the data we chose to scrape:
+
+1.  **URL** - The URL that goes directly to the actor's detail page.
+2.  **Unique identifier** - Such as `apify/web-scraper`.
+3.  **Title** - The title visible in the actor's detail page.
+4.  **Description** - The actor's description.
+5.  **Last run date** - When the actor was last run.
+6.  **Number of runs** - How many times the actor was run.
+
+![data to scrape](https://apifyusercontent.com/7274765d35b9a7c781e5bcc705a3dbdcf3c308ec/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f7363726170696e672d70726163746963652e6a7067 "Overview of data to be scraped.")
+
+We've already scraped numbers 1 and 2 in the [Getting started with Apify scrapers](https://apify.com/docs/scraping/tutorial/introduction) tutorial, so let's get to the next one on the list: the title.
+
+### [](#title)Title
+
+![actor title](https://apifyusercontent.com/5274e02a1c45ed96a7d8c0147ac6e3d99f883ed0/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f7469746c652e6a7067 "Finding actor title in DevTools.")
+
+By using the element selector tool, we find out that the title is there under an `<h1>` tag, as titles should be. Maybe surprisingly, we find that there are actually two `<h1>` tags on the detail page. This should get us thinking. Is there any parent element that includes our `<h1>` tag, but not the other ones? Yes, there is! There is a `<header>` element that we can use to select only the heading we're interested in.
+
+> Remember that you can press CTRL+F (CMD+F) in the Elements tab of DevTools to open the search bar where you can quickly search for elements using their selectors. And always make sure to use the DevTools to verify your scraping process and assumptions. It's faster than changing the crawler code all the time.
+
+To get the title we just need to find it using a `header h1` selector, which selects all `<h1>` elements that have a `<header>` ancestor. And as we already know, there's only one.
+
+    // Using Cheerio.
+    return {
+        title: $('header h1').text(),
+    };
+
+### [](#description)Description
+
+Getting the actor's description is a little more involved, but still pretty straightforward. We can't simply search for a `<p>` tag, because there are a lot of them in the page. We need to narrow our search down a little. Using the DevTools, we find that the actor description is nested within the `<header>` element too, same as the title. Sadly, we're still left with two `<p>` tags. To finally select only the description, we choose the `<p>` tag that has a `class` that starts with `Text__Paragraph`.
+
+![actor description selector](https://apifyusercontent.com/28dee1e51c6ac3e8ec67f0eb953b4a71c775f217/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f6465736372697074696f6e2e6a7067 "Finding actor description in DevTools.")
+
+    return {
+        title: $('header h1').text(),
+        description: $('header p[class^=Text__Paragraph]').text(),
+    };
+
+### [](#last-run-date)Last run date
+
+The DevTools tell us that the `lastRunDate` can be found in the second of the two `<time>` elements in the page.
+
+![actor last run date selector](https://apifyusercontent.com/6fe3f03692a7dc3acc35be74b3b8baacb98d7ac3/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f6c6173742d72756e2d646174652e6a7067 "Finding actor last run date in DevTools.")
+
+    return {
+        title: $('header h1').text(),
+        description: $('header p[class^=Text__Paragraph]').text(),
+        lastRunDate: new Date(
+            Number(
+                $('time')
+                    .eq(1)
+                    .attr('datetime'),
+            ),
+        ),
+    };
+
+It might look a little too complex at first glance, but let me walk you through it. We find all the `<time>` elements. There are two, so we grab the second one using the `.eq(1)` call (it's zero-indexed) and then we read its `datetime` attribute, because that's where a Unix timestamp is stored as a `string`.
+
+But we would much rather see a readable date in our results, not a Unix timestamp, so we need to convert it. Unfortunately, the `new Date()` constructor does not treat a numeric `string` as a timestamp (it tries to parse it as a date string instead), so we cast the `string` to a `number` using the `Number()` function before actually calling `new Date()`. Phew!
+
+### [](#run-count)Run count
+
+And so we're finishing up with the `runCount`. There's no specific element like `<time>`, so we need to create a complex selector and then do a transformation on the result.
+
+    return {
+        title: $('header h1').text(),
+        description: $('header p[class^=Text__Paragraph]').text(),
+        lastRunDate: new Date(
+            Number(
+                $('time')
+                    .eq(1)
+                    .attr('datetime'),
+            ),
+        ),
+        runCount: Number(
+            $('ul.stats li:nth-of-type(3)')
+                .text()
+                .match(/\d+/)[0],
+        ),
+    };
+
+The `ul.stats li:nth-of-type(3)` selector looks complicated, but it only says that we're looking for a `<ul class="stats ...">` element and, within that element, for the third `<li>` element. We grab its text, but we're only interested in the number of runs. So we parse the number out using a regular expression, but its type is still a `string`, so we finally convert the result to a `number` by wrapping it with a `Number()` call.
+
+### [](#wrapping-it-up)Wrapping it up
+
+And there we have it! All the data we needed in a single object. For the sake of completeness, let's add the properties we parsed from the URL earlier and we're good to go.
+
+    const { url } = request;
+
+    // ...
+
+    const uniqueIdentifier = url.split('/').slice(-2).join('/');
+
+    return {
+        url,
+        uniqueIdentifier,
+        title: $('header h1').text(),
+        description: $('header p[class^=Text__Paragraph]').text(),
+        lastRunDate: new Date(
+            Number(
+                $('time')
+                    .eq(1)
+                    .attr('datetime'),
+            ),
+        ),
+        runCount: Number(
+            $('ul.stats li:nth-of-type(3)')
+                .text()
+                .match(/\d+/)[0],
+        ),
+    };
+
+All we need to do now is add this to our `pageFunction`:
+
+    async function pageFunction(context) {
+        const { request, log, skipLinks, $ } = context; // $ is Cheerio
+        if (request.userData.label === 'START') {
+            log.info('Store opened!');
+            // Do some stuff later.
+        }
+        if (request.userData.label === 'DETAIL') {
+            const { url } = request;
+            log.info(`Scraping ${url}`);
+            await skipLinks();
+
+            // Do some scraping.
+            const uniqueIdentifier = url.split('/').slice(-2).join('/');
+
+            return {
+                url,
+                uniqueIdentifier,
+                title: $('header h1').text(),
+                description: $('header p[class^=Text__Paragraph]').text(),
+                lastRunDate: new Date(
+                    Number(
+                        $('time')
+                            .eq(1)
+                            .attr('datetime'),
+                    ),
+                ),
+                runCount: Number(
+                    $('ul.stats li:nth-of-type(3)')
+                        .text()
+                        .match(/\d+/)[0],
+                ),
+            };
+        }
+    }
+
+### [](#test-run-3)Test run 3
+
+As always, try hitting that **Save & Run** button and visit the Dataset preview of clean items. You should see a nice table of all the attributes correctly scraped. You nailed it!
+
+## [](#pagination)Pagination
+
+Pagination is just a term that represents "going to the next page of results". You may have noticed that we did not actually scrape all the actors, just the first page of results. That's because to load the rest of the actors, one needs to click the orange **Show more** button at the very bottom of the list. This is pagination.
+
+> This is a typical JavaScript pagination, sometimes called infinite scroll. Other pages may use links that take you to the next page. If you encounter those, just make a Pseudo URL for those links and they will be automatically enqueued to the request queue. Use a label to let the scraper know what kind of URL it's processing.
+
+If you paid close attention, you may now see a problem. How do we click a button in the page when we're working with Cheerio? We don't have a browser to do it and we only have the HTML of the page to work with. So the simple answer is that we can't click a button. Does that mean that we cannot get the data at all? Usually not, but it requires some clever DevTools-Fu.
+
+### [](#analyzing-the-page)Analyzing the page
+
+While with `apify/web-scraper` and `apify/puppeteer-scraper`, we could get away with simply clicking a button, with `apify/cheerio-scraper` we need to dig a little deeper into the page's architecture. For this, we will use the Network tab of the Chrome DevTools.
+
+> It's a very powerful tool with a lot of features, so if you're not familiar with it, please see this tutorial: [https://developers.google.com/web/tools/chrome-devtools/network/](https://developers.google.com/web/tools/chrome-devtools/network/) which explains everything much better than we ever could.
+
+We want to know what happens when we click the **Show more** button, so we open the DevTools Network tab and clear it. Then we click the Show more button and wait for incoming requests to appear in the list.
+
+![inspect-network](https://apifyusercontent.com/2b51728bb8363c8ac71d8bab191c938fa3a5ddc9/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f696e73706563742d6e6574776f726b2e6a7067 "Inspecting network in DevTools.")
+
+Now, this is interesting. It seems that we've only received two images after clicking the button and no additional data. This means that the data about actors must already be available in the page and the Show more button only displays it. This is good news.
+
+### [](#finding-the-actors)Finding the actors
+
+Now that we know the information we seek is already in the page, we just need to find it. The first actor in the store is `apify/web-scraper`, so let's try using the search tool in the Elements tab to find some reference to it. The first few hits do not provide any interesting information, but in the end, we find our goldmine. There is a `<script>` tag with the ID `__NEXT_DATA__` that seems to hold a lot of information about `apify/web-scraper`. In DevTools, you can right-click an element and click **Store as global variable** to make this element available in the Console.
+
+![find-data](https://apifyusercontent.com/7b5b800b5544349cd486c3cca2c61240a043e38e/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f66696e642d646174612e6a7067 "Finding the hidden actor data.")
+
+A `temp1` variable is now added to your console. We're mostly interested in its contents and we can get that using the `temp1.textContent` property. You can see that it's a rather large JSON string. How do we know? The `type` attribute of the `<script>` element says `application/json`. But working with a string would be very cumbersome, so we need to parse it.
+
+    const data = JSON.parse(temp1.textContent);
+
+After entering the above command into the console, we can inspect the `data` variable and see that all the information we need is there, in the `data.props.pageProps.items` array. Great!
+
+![inspect-data](https://apifyusercontent.com/e121274d88789fc535f4389ad1e36e61155fec23/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f696e73706563742d646174612e6a7067 "Inspecting the hidden actor data.")
+
+> It's obvious that all the information we set out to scrape is available in this one data object, so you might already be wondering: can I just make one request to the store to get this JSON, parse it and be done with it? Yes you can! And that's the power of clever page analysis.
+
+### [](#using-the-data-to-enqueue-all-actor-details)Using the data to enqueue all actor details
+
+We don't really need to go to all the actor details now, but for the sake of practice, let's imagine we only found actor names such as `cheerio-scraper` and their owners, such as `apify` in the data. We will use this information to construct URLs that will take us to the actor detail pages and enqueue those URLs into the request queue.
+
+    // We're not in DevTools anymore, so we use Cheerio to get the data.
+    const dataJson = $('#__NEXT_DATA__').text();
+    const data = JSON.parse(dataJson);
+
+    for (const item of data.props.pageProps.items) {
+        const { name, username } = item;
+        const actorDetailUrl = `https://apify.com/${username}/${name}`;
+        await context.enqueueRequest({
+            url: actorDetailUrl,
+            userData: {
+                label: 'DETAIL', // Don't forget the label.
+            }
+        });
+    }
+
+We iterate through the items we found, build actor detail URLs from the available properties and then enqueue those URLs into the request queue. We need to specify the label too, otherwise our page function wouldn't know how to route those requests.
+
+> If you're wondering how we know the structure of the URL, see the [Getting started with Apify Scrapers](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/build/intro-scraper-tutorial) tutorial again.
+
+### [](#plugging-it-into-the--code-pagefunction--code-)Plugging it into the `pageFunction`
+
+We've got the general algorithm ready, so all that's left is to integrate it into our earlier `pageFunction`. Remember the `// Do some stuff later` comment? Let's replace it.
+
+    async function pageFunction(context) {
+        const { request, log, skipLinks, $ } = context;
+        if (request.userData.label === 'START') {
+            log.info('Store opened!');
+
+            const dataJson = $('#__NEXT_DATA__').text();
+            const data = JSON.parse(dataJson);
+
+            for (const item of data.props.pageProps.items) {
+                const { name, username } = item;
+                const actorDetailUrl = `https://apify.com/${username}/${name}`;
+                await context.enqueueRequest({
+                    url: actorDetailUrl,
+                    userData: {
+                        label: 'DETAIL',
+                    }
+                });
+            }
+        }
+        if (request.userData.label === 'DETAIL') {
+            const { url } = request;
+            log.info(`Scraping ${url}`);
+            await skipLinks();
+
+            // Do some scraping.
+            const uniqueIdentifier = url.split('/').slice(-2).join('/');
+
+            return {
+                url,
+                uniqueIdentifier,
+                title: $('header h1').text(),
+                description: $('header p[class^=Text__Paragraph]').text(),
+                lastRunDate: new Date(
+                    Number(
+                        $('time')
+                            .eq(1)
+                            .attr('datetime'),
+                    ),
+                ),
+                runCount: Number(
+                    $('ul.stats li:nth-of-type(3)')
+                        .text()
+                        .match(/\d+/)[0],
+                ),
+            };
+        }
+    }
+
+That's it! You can now remove the **Max pages per run** limit, **Save & Run** your task and watch the scraper scrape all of the actors' data. After it succeeds, open the Dataset again and see the clean items. You should have a table of all the actors' details in front of you. If you do, great job! You've successfully scraped the Apify Store. And if not, no worries. Just go through the code examples again, it's probably just a typo.
+
+> There's an important caveat. The way we implemented pagination here is in no way a generic system that you can easily use with other websites. Cheerio is fast (and that means it's cheap), but it's not easy. Sometimes there's just no way to get all results with Cheerio only and other times it takes hours of research. Keep this in mind when choosing the right scraper for your job. But don't get discouraged. Oftentimes, the only thing you will ever need is to define a correct Pseudo URL. So do your research first before giving up on Cheerio Scraper.
+
+## [](#downloading-the-scraped-data)Downloading the scraped data
+
+You already know the DATASET tab of the run console since this is where we've always previewed our data. Notice that at the bottom, there is a table with multiple data formats, such as JSON, CSV or an Excel sheet, and to the right, there are options to download the scraping results in any of those formats. Go ahead and try it.
+
+> If you prefer working with an API, you can find an example in the API tab of the run console: **Get dataset items**.
+
+### [](#items-and-clean-items)Items and Clean items
+
+There are two types of data available for download: Items and Clean items. The Items will always include a record for each `pageFunction` invocation, even if you did not return any results. The record also includes hidden fields such as `#debug`, where you can find various information that can help you with debugging your scrapers.
+
+Clean items, on the other hand, include only the data you returned from the `pageFunction`. If you're only interested in the data you scraped, this format is what you will be using most of the time.
+
+## [](#bonus--making-your-code-neater)Bonus: Making your code neater
+
+You may have noticed that the `pageFunction` gets quite bulky. To make better sense of your code and have an easier time maintaining or extending your task, feel free to define other functions inside the `pageFunction` that encapsulate all the different logic. You can, for example, define a function for each of the different pages:
+
+    async function pageFunction(context) {
+        switch (context.request.userData.label) {
+            case 'START': return handleStart(context);
+            case 'DETAIL': return handleDetail(context);
+        }
+
+        async function handleStart({ log, $ }) {
+            log.info('Store opened!');
+
+            const dataJson = $('#__NEXT_DATA__').text();
+            const data = JSON.parse(dataJson);
+
+            for (const item of data.props.pageProps.items) {
+                const { name, username } = item;
+                const actorDetailUrl = `https://apify.com/${username}/${name}`;
+                await context.enqueueRequest({
+                    url: actorDetailUrl,
+                    userData: {
+                        label: 'DETAIL',
+                    }
+                });
+            }
+        }
+
+        async function handleDetail({ request, log, skipLinks, $ }) {
+            const { url } = request;
+            log.info(`Scraping ${url}`);
+            await skipLinks();
+
+            // Do some scraping.
+            const uniqueIdentifier = url.split('/').slice(-2).join('/');
+
+            return {
+                url,
+                uniqueIdentifier,
+                title: $('header h1').text(),
+                description: $('header p[class^=Text__Paragraph]').text(),
+                lastRunDate: new Date(
+                    Number(
+                        $('time')
+                            .eq(1)
+                            .attr('datetime'),
+                    ),
+                ),
+                runCount: Number(
+                    $('ul.stats li:nth-of-type(3)')
+                        .text()
+                        .match(/\d+/)[0],
+                ),
+            };
+        }
+    }
+
+> If you're confused by the functions being declared below the code that calls them, it's called hoisting and it's a feature of JavaScript. It helps you put what matters on top, if you so desire.
+
+## [](#final-word)Final word
+
+Thank you for reading this whole tutorial! Really! It's important to us that our users have the best information available to them so that they can use Apify easily and effectively. We're glad that you made it all the way here and congratulations on creating your first scraping task. We hope that you liked the tutorial and if there's anything you'd like to ask, [do it on Stack Overflow](https://stackoverflow.com/questions/tagged/apify)!
+
+Finally, `apify/cheerio-scraper` is just an actor and writing your own actors is a breeze with the [Apify SDK](https://sdk.apify.com). It's a bit more complex and involved than writing a simple `pageFunction`, but it allows you to fine-tune all the details of your scraper to your liking. Perhaps some other time, when you're in the mood for yet another tutorial, visit the [Getting Started](https://sdk.apify.com/docs/guides/gettingstarted). We think you'd like it!
diff --git a/docs/scraping/index.md b/docs/scraping/index.md
index ef8ab5eb27..6f4c5f45cc 100644
--- a/docs/scraping/index.md
+++ b/docs/scraping/index.md
@@ -4,8 +4,40 @@ title: Scraping
 
 # [](./scraping)Scraping with Apify
 
-Scraping and crawling the web can be difficult and time consuming without the right tools. That's why Apify provides ready-made solutions to crawl and scrape any website. They are based on our [Actor](/actors) product and the [Apify SDK](https://sdk.apify.com).
+Scraping and crawling the web can be difficult and time consuming without the right tools. That's why Apify provides ready-made solutions to crawl and scrape any website. They are based on our [Actor](https://apify.com/actors) product and the [Apify SDK](https://sdk.apify.com).
 
-Don't let the number of options confuse you. Unless you're really sure that you need to use a specific tool, just go ahead and use the [Web Scraper](#web-scraper). It is the easiest to pick up and can handle almost anything. Look at [Puppeteer Scraper](#puppeteer-scraper) or [Cheerio Scraper](#cheerio-scraper) only after you know your target websites well and need to optimize your scraper.
+Don't let the number of options confuse you. Unless you're really sure that you need to use a specific tool, just go ahead and use the [Web Scraper]({{@link scraping/web_scraper.md}}). It is the easiest to pick up and can handle almost anything. Look at [Puppeteer Scraper]({{@link scraping/puppeteer_scraper.md}}) or [Cheerio Scraper]({{@link scraping/cheerio_scraper.md}}) only after you know your target websites well and need to optimize your scraper.
 
-[Visit the Scraper introduction tutorial to get started!](./scraping/tutorial/introduction)
+[Visit the Scraper introduction tutorial to get started!]({{@link scraping/introduction.md}})
+
+## [](#web-scraper)Web Scraper
+
+Web Scraper is a ready-made solution for scraping the web using the Chrome browser. It takes away all the work necessary to set up a browser for crawling, controls the browser automatically and produces machine readable results in several common formats.
+
+Underneath, it uses the Puppeteer library to control the browser, but you don't need to worry about that. Using a simple web UI and a little basic JavaScript, you can tweak it to serve almost any scraping need.
+
+[Visit the Web Scraper tutorial to get started!]({{@link scraping/web_scraper.md}})
+
+## [](#cheerio-scraper)Cheerio Scraper
+
+Cheerio Scraper is a ready-made solution for crawling the web using plain HTTP requests to retrieve HTML pages and then parsing and inspecting the HTML using the Cheerio library. It's blazing fast.
+
+Cheerio is a server-side version of the popular jQuery library that does not run in the browser, but instead constructs a DOM out of an HTML string and then provides the user with an API to work with that DOM.
+
+Cheerio Scraper is ideal for scraping websites that do not rely on client-side JavaScript to serve their content. It can be as much as 20 times faster than using a full browser solution such as Puppeteer.
+
+[Visit the Cheerio Scraper tutorial to get started!]({{@link scraping/cheerio_scraper.md}})
+
+## [](#puppeteer-scraper)Puppeteer Scraper
+
+Puppeteer Scraper is the most powerful scraper tool in our arsenal (aside from developing your own actors). It uses the Puppeteer library to programmatically control a headless Chrome browser and it can make it do almost anything. If using the Web Scraper does not cut it, Puppeteer Scraper is what you need.
+
+Puppeteer is a Node.js library, so knowledge of Node.js and its paradigms is expected when working with the Puppeteer Scraper.
+
+[Visit the Puppeteer Scraper tutorial to get started!]({{@link scraping/puppeteer_scraper.md}})
+
+## [](#phantomjs-crawler)Legacy PhantomJS Crawler
+
+Legacy PhantomJS Crawler is an actor compatible with the original Apify Crawler that you may have known. It supports the same input and produces the same output. But it uses legacy technology, and if you're starting a new project, we recommend using our other solutions that run on the Apify Actor platform and use Chrome as the browser instead, such as [Web Scraper]({{@link scraping/web_scraper.md}}) above.
+
+[Visit Legacy PhantomJS Crawler in store.](https://apify.com/apify/legacy-phantomjs-crawler)
diff --git a/docs/scraping/introduction.md b/docs/scraping/introduction.md
new file mode 100644
index 0000000000..c9d9ccfe1d
--- /dev/null
+++ b/docs/scraping/introduction.md
@@ -0,0 +1,294 @@
+---
+title: Introduction
+---
+
+# [](#getting-started-with-apify-scrapers)Getting started with Apify scrapers
+
+Welcome to the getting started tutorial, which will walk you through creating your first scraping task step by step. You will learn how to set up all the different configuration options, code a `pageFunction` and finally download the scraped data as an Excel sheet, or in another format, such as JSON or CSV. But first, let's give you a brief introduction to Apify.
+
+## [](#what-is-an-apify-scraper)What is an Apify scraper
+
+It doesn't matter whether you arrived here from `apify/web-scraper`, `apify/puppeteer-scraper` or `apify/cheerio-scraper`. All of them are **actors** and for now, let's just think of an **actor** as an application that you can use with your own configuration. `apify/web-scraper` is therefore an application called `web-scraper`, built by `apify`, that you can configure to scrape any webpage. We call these configurations **tasks**.
+
+> If you need help choosing the right scraper, see this [great knowledge base article](https://kb.apify.com/tutorials-getting-started/choosing-the-right-scraper). And if you just want to learn more about actors in general, you can read our [actors page](https://apify.com/actors) or [browse the documentation]({{@link actor/index.md}}).
+
+You can create 10 different **tasks** for 10 different websites, with very different options, but there will always be just one **actor**, the `apify/*-scraper` you chose. This is the essence of tasks. They are nothing but **saved configurations** of the actor that you can run easily and repeatedly.
+
+## [](#trying-it-out)Trying it out
+
+Depending on how you arrived at this tutorial, you may already have your first task created for the scraper of your choice. If not, the easiest way is to go to [Apify tasks](https://my.apify.com/tasks) and click the **Create a new task** button. This will present you with a list of actors to choose from. Once you select one of the actors, it will take you to its task configuration page.
+
+> This tutorial covers the use of **Web**, **Cheerio** and **Puppeteer** scrapers, but a lot of the information here can be used with all actors.
+
+![actor-selection](https://apifyusercontent.com/8dbaeafb7e45277d68a2011447cc28d21e5be3cc/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f6163746f722d73656c656374696f6e2e6a7067 "Selecting the best actor")
+
+### [](#running-a-task)Running a task
+
+You are now in the INPUT tab of the task configuration. Before we delve into the details, let's just see how the example works. There are already some values pre-configured in the INPUT. It says that the task should visit `https://apify.com` and all its subpages, such as `https://apify.com/contact`, and scrape some data using the provided `pageFunction`, specifically the `<title>` of the page and its URL.
+
+Scroll down a bit and set the `Max pages per run` option to `10`. This tells your task to finish after 10 pages have been visited. We don't need to crawl the whole domain just to see that it works.
+
+> It also helps with keeping your compute unit (CU) consumption low. Just to get an idea, the free plan includes 10 CUs and this run will consume about 0.04 CU, so you can run it 250 times a month for free. If you accidentally go over the limit, no worries, we won't charge you for it. You just won't be able to run more tasks that month.
+
+Now click **Save & Run**! _(either at the very bottom or in the top-right corner of your screen)_
+
+### [](#the-run-detail)The run detail
+
+After clicking **Save & Run**, the window will change to the run detail. Here, you will see the Log of the run. If it seems that nothing is happening, don't worry, it takes a few seconds for the run to fully boot up. In under a minute, you should have the 10 pages scraped. You will know that the run successfully completed when the `RUNNING` card in the top-left corner changes to `SUCCEEDED`.
+
+> Feel free to browse through the various new tabs: LOG, INFO, INPUT and others, but for the sake of brevity, we will not explain all their features in this tutorial.
+
+Now that the run has `SUCCEEDED`, click on the rightmost card labeled **Clean items** to see the results of the scrape. This takes you to the DATASET tab, where you can display or download the results in various formats. For now, just click the blue **Preview data** button. Voila, the scraped data.
+
+![run detail](https://apifyusercontent.com/44d8fb566bd35bec26dc9725cfa168795338acff/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f7468652d72756e2d64657461696c2e6a7067 "Viewing results in the run detail.")
+
+Good job! We've run our first task and got some results. Let's learn how to change the default configuration to scrape something more interesting than just the page's `<title>`.
+
+## [](#creating-your-own-task)Creating your own task
+
+Before we jump into the scraping itself, let's just have a quick look at the user interface that's available to us.
+
+### [](#input)INPUT
+
+The INPUT tab is where we started and it's the place where you create your scraping configuration. The creator of the actor prepares the INPUT form so that you can easily tell the actor what to do. Feel free to check the tooltips of the various options to get a better idea of what they do. To display the tooltip, just click the name of any of the input fields.
+
+> We will not go through all the available INPUT options in this tutorial. See the actor's README under the ACTOR INFO tab for detailed information.
+
+### [](#settings)SETTINGS
+
+In the settings tab, you can set various options that are common to all tasks and not directly related to the scraping itself. Unless you've already changed the task's name, it's `my-task`, so why not try changing it to `my-first-scraper` and clicking save. Below are the Build, Timeout and Memory options. Let's keep them at default settings for now. Just remember that if you see a big red `TIMED-OUT` after running your task, you might want to come back here and increase the timeout.
+
+> Timeouts are there to prevent tasks from running forever. Always set a reasonable timeout to prevent a rogue task from eating up all your compute units.
+
+### [](#actor-info)ACTOR INFO
+
+Since tasks are just configurations for actors, this tab shows you all the information about the underlying actor, the Apify scraper of your choice. You can see the available versions and their READMEs and it's always a good idea to read an actor's README first before creating a task for it.
+
+### [](#webhooks)WEBHOOKS
+
+Webhooks are a feature that helps keep you aware of what's happening with your tasks. You can set them up to inform you when a task starts, finishes, fails and so on, or you can even use them to run more tasks, depending on the outcome of the original one. You can find the [documentation on webhooks here]({{@link webhooks/index.md}}).
+
+### [](#runs)RUNS
+
+You can find all the task runs and their detail pages here. Every time you start a task, it will appear here in the list. All runs of your task including their results will be stored here for the data retention period, [which you can find under your plan](https://apify.com/pricing).
+
+### [](#api)API
+
+The API tab gives you a quick overview of all the available API calls, if you would like to use your task programmatically. It also includes links to detailed API documentation. You can even try it out immediately using the **TEST** button.
+
+## [](#scraping-theory)Scraping theory
+
+Since this is a tutorial, we'll be scraping our own website. A great candidate for some scraping practice is the [Apify Store](https://apify.com/store). It's a page that uses modern web technologies and displays a lot of different items in various categories, just like an online store, a typical scraping target, would.
+
+### [](#the-goal)The goal
+
+We want to create a scraper that scrapes all the actors (i.e. not crawlers, our legacy product) in the store and collects the following attributes for each actor:
+
+1.  **URL** - The URL that goes directly to the actor's detail page.
+2.  **Unique identifier** - Such as `apify/web-scraper`.
+3.  **Title** - The title visible in the actor's detail page.
+4.  **Description** - The actor's description.
+5.  **Last run date** - When the actor was last run.
+6.  **Number of runs** - How many times the actor was run.
+
+Some of this information may be scraped directly from the listing pages, but for the rest, we will need to visit all detail pages of all the actors.
+
+### [](#the-start-url)The Start URL
+
+Let's start with something simple. In the INPUT tab of the task we have, we'll change the Start URL from `https://apify.com`. This will tell the scraper to start by opening a different URL. You can add more Start URLs or even use a file with a list of thousands of them, but in this case, we'll be good with just one.
+
+How do we choose the new Start URL? The goal is to scrape all actors in the store and the store is available at [https://apify.com/store](https://apify.com/store) so we choose this URL as our Start URL.
+
+    https://apify.com/store
+
+We also need to somehow distinguish the Start URL from all the other URLs that the scraper will add later. To do this, click the green **Details** icon in the Start URL form and see the **User data** input. Here you can add any information you'll need during the scrape in a JSON format. For now, just add a label to the Start URL.
+
+    {
+      "label": "START"
+    }
+
+![start url input](https://apifyusercontent.com/90c552bd3267b500260d27eba7c1b5792b10c370/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f7468652d73746172742d75726c2e6a7067 "Adding new Start URL.")
+
+### [](#crawling-the-website-with-pseudo-urls)Crawling the website with Pseudo URLs
+
+What is a Pseudo URL? Let me explain. Before we can start scraping the actor details, we need to find all the links to the details. If the links follow a set structure, we can use a certain pattern to describe this structure. And that's what a Pseudo URL is. A pattern that describes a URL structure. By setting a Pseudo URL, all links that follow the given structure will automatically be added to the crawling queue.
+
+Let's see an example. To find the pattern, open some of the actor details in the store. You'll find that the URLs are always structured the same:
+
+    https://apify.com/{OWNER}/{NAME}
+
+Where only the `OWNER` and `NAME` change. We can leverage this in a Pseudo URL.
+
+#### [](#making-a-pseudo-url)Making a Pseudo URL
+
+If you'd like to learn more about Pseudo URLs, [visit a quick tutorial in our docs](https://sdk.apify.com/docs/guides/gettingstarted#introduction-to-pseudo-urls), but for now, let's keep it simple. Pseudo URLs are really just URLs with some variable parts in them. Those variable parts are represented by [regular expressions](https://regexone.com/) enclosed in brackets `[]`.
+
+So, working with our actor details example, we could produce a Pseudo URL like this:
+
+    https://apify.com/[.+]/[.+]
+
+This Pseudo URL will match all actor detail pages, such as:
+
+    https://apify.com/apify/web-scraper
+
+But it will not match pages we're not interested in, such as:
+
+    https://apify.com/contact
+
+Let's use the above Pseudo URL in our task. We should also add a label as we did with our Start URL. This label will be added to all pages that were enqueued into the request queue using the given Pseudo URL.
+
+    {
+      "label": "DETAIL"
+    }
+
+![pseudo url input](https://apifyusercontent.com/8d4802058a4f68753345a4097b292a82d38dab83/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f6d616b696e672d612d70736575646f2d75726c2e6a7067 "Adding new Pseudo URL.")
+
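+To make the matching rules concrete, here is a minimal sketch of how a Pseudo URL could be translated into an ordinary regular expression. The `pseudoUrlToRegExp` helper is hypothetical and written only for illustration. The scrapers' actual implementation may differ in details such as case sensitivity or the handling of trailing slashes.
+
+```javascript
+// Hypothetical helper: a simplified stand-in for Pseudo URL matching,
+// not the scrapers' real implementation.
+function pseudoUrlToRegExp(pseudoUrl) {
+    // Split the Pseudo URL into literal parts and bracketed [regex] parts.
+    const parts = pseudoUrl.split(/(\[[^\]]*\])/);
+    let pattern = '';
+    for (const part of parts) {
+        if (part.startsWith('[') && part.endsWith(']')) {
+            // Bracketed part: use the regular expression inside as-is.
+            pattern += part.slice(1, -1);
+        } else {
+            // Literal part: escape characters that are special in a regex.
+            pattern += part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
+        }
+    }
+    // Anchor the pattern so that the whole URL has to match.
+    return new RegExp(`^${pattern}$`);
+}
+
+const detailPattern = pseudoUrlToRegExp('https://apify.com/[.+]/[.+]');
+console.log(detailPattern.test('https://apify.com/apify/web-scraper')); // true
+console.log(detailPattern.test('https://apify.com/contact')); // false
+```
+
+The second URL fails because it has only one path segment, while the pattern requires two segments separated by a slash.
+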
+### [](#filtering-with-a-link-selector)Filtering with a link selector
+
+Pseudo URLs are just one part of your URL matching arsenal. The other one is the **Link selector**, which you can find right under the Pseudo URLs input field. It's a CSS selector and its purpose is to select the HTML elements where the scraper should look for URLs. And by looking for URLs we mean finding the elements' `href` attributes. For example, to enqueue URLs from `<div class="my-class" href=...>` tags, you would enter `'div.my-class'`.
+
+What's the connection to Pseudo URLs? Well, first, all the URLs found in the elements that match the link selector are collected. Then, Pseudo URLs are used to filter those URLs and enqueue only the ones that match the Pseudo URL structure. Simple.
+
+To scrape all the actors in the store, we should use the Link selector to further filter the links that our Pseudo URL matches. For example, we're not interested in the following URL:
+
+    https://apify.com/docs/actor
+
+Even though it matches our Pseudo URL, it's not a link to an actor, but a link to documentation. To prevent links like those from being visited, we should specify a Link selector that filters them out. For now, let us just tell you that the Link selector you're looking for is:
+
+    div.item > a
+
+Save it as your Link selector. If you're wondering how we figured this out, just follow along with the tutorial. By the time we finish, you'll know why we used this selector too.
+
+### [](#test-run)Test run
+
+We've added some configuration, so it's time to test it. Just run the task, keeping the **Max pages per run** set to `10` and **Page function** the same. You should see in the log that the scraper first visits the Start URL and then several of the actor details, matching the Pseudo URL.
+
+### [](#the--code-pagefunction--code-)The `pageFunction`
+
+The Page function is a JavaScript function that gets executed for each page the scraper visits. To figure out how to create the `pageFunction`, you must first inspect the page's structure to get an idea of its inner workings. The best tools for that are the developer tools built into browsers, DevTools for short.
+
+#### [](#using-devtools)Using DevTools
+
+Open the [store page](https://apify.com/store) in the Chrome browser (or use any other browser, just note that the DevTools may differ slightly) and open the DevTools, either by right-clicking on the page and selecting `Inspect` or by pressing `F12`.
+
+The DevTools window will pop up, and display a lot of, perhaps unfamiliar, information. Don't worry about that too much and just open the Elements tab (the one with the page's HTML). The Elements tab allows you to browse the structure of the page and search within it using the search tool. You can open the search tool by pressing `CTRL+F` or `CMD+F`. Try typing `<title>` into the search bar.
+
+You'll see that the Elements tab jumps to the first `<title>` element of the current page and that the title is `Store`. It's always good practice to do your research using the DevTools before writing the `pageFunction` and running your task.
+
+![devtools](https://apifyusercontent.com/b6ba4c89bd65705450f1fee33c37198186b175fc/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f7573696e672d646576746f6f6c732e6a7067 "Finding title element in DevTools.")
+
+> For the sake of brevity, we won't go into the details of using the DevTools in this tutorial. If you're just starting out with DevTools, this [Google tutorial](https://developers.google.com/web/tools/chrome-devtools/) is a good place to begin.
+
+#### [](#understanding--code-context--code-)Understanding `context`
+
+The `pageFunction` has access to global variables, such as `window` or `document`, which are provided by the browser, but also to `context`, which is the single argument of the `pageFunction`. `context` carries a lot of useful information and helpful functions. A full reference can be found in the actor's README in the ACTOR INFO tab.
+
+#### [](#new--code-pagefunction--code--boilerplate)New `pageFunction` boilerplate
+
+We know that we'll visit two kinds of pages, the list page (Start URL) and the detail pages (enqueued using the Pseudo URL). We want to enqueue links on the list page and scrape data on the detail page.
+
+    async function pageFunction(context) {
+        const { request, log, skipLinks } = context;
+        if (request.userData.label === 'START') {
+            log.info('Store opened!');
+            // Do some stuff later.
+        }
+        if (request.userData.label === 'DETAIL') {
+            log.info(`Scraping ${request.url}`);
+            await skipLinks();
+            // Do some scraping.
+            return {
+                // Scraped data.
+            }
+        }
+    }
+
+This may seem like a lot of new things, but it's all connected to our earlier configuration.
+
+#### [](#-code-context-request--code-)`context.request`
+
+The `request` is an instance of the [`Request`](https://sdk.apify.com/docs/api/request) class and holds information about the currently processed page, such as its `url`. Each `request` also has the `request.userData` property of type `Object`. While configuring the Start URL and the Pseudo URL, we set a `label` to it. We're now using it in the `pageFunction` to distinguish between the store page and the detail pages.
+
+#### [](#-code-context-skiplinks----code-)`context.skipLinks()`
+
+When a Pseudo URL is set, the scraper attempts to enqueue matching links on all pages it visits. `skipLinks()` is used to tell the scraper that we don't want this to happen on the current page.
+
+#### [](#-code-context-log--code-)`context.log`
+
+`log` is used for printing messages to the console. You may be tempted to use `console.log()`, but this will not work, unless you turn on the **Browser log** option. `log.info()` should be used for general messages, but you can also use `log.debug()` for messages that will only be shown when you turn on the **Debug log** option. [See the docs for more info](https://sdk.apify.com/docs/api/log).
+
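+The three `context` members described above can be mimicked with hand-written stand-ins, which is handy for experimenting with the branching logic outside the platform. The sketch below is only an illustration: `createFakeContext` is a hypothetical helper, the fake `log` and `skipLinks` merely record that they were called, and the `DETAIL` branch returns the URL in place of the "Scraped data" placeholder.
+
+```javascript
+// The boilerplate pageFunction, returning the URL as sample data.
+async function pageFunction(context) {
+    const { request, log, skipLinks } = context;
+    if (request.userData.label === 'START') {
+        log.info('Store opened!');
+    }
+    if (request.userData.label === 'DETAIL') {
+        log.info(`Scraping ${request.url}`);
+        await skipLinks();
+        return { url: request.url };
+    }
+}
+
+// Hypothetical stand-in for the real context object (local experiments only).
+function createFakeContext(url, label) {
+    const messages = [];
+    return {
+        messages,
+        request: { url, userData: { label } },
+        log: {
+            info: (message) => messages.push(`INFO ${message}`),
+            debug: (message) => messages.push(`DEBUG ${message}`),
+        },
+        // The real skipLinks() changes scraper behavior; this one only records the call.
+        skipLinks: async () => { messages.push('links skipped'); },
+    };
+}
+
+async function main() {
+    const context = createFakeContext('https://apify.com/apify/web-scraper', 'DETAIL');
+    const result = await pageFunction(context);
+    console.log(context.messages);
+    console.log(result);
+}
+
+main();
+```
+
+Calling `createFakeContext` with the `START` label instead exercises the other branch, which logs a message and returns nothing.
+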
+#### [](#return-value-of-the--code-pagefunction--code-)Return value of the `pageFunction`
+
+The `pageFunction` may only return nothing, `null`, `Object` or `Object[]`. If an `Object` is returned, it will be saved as a single result. Returning an `Array` of `Objects` will save each item in the array as a result.
+
+The scraping results are saved in the Dataset (one of the tabs in the run console, as you may remember). It behaves like a table. Each item is a row in the table and its properties are its columns. Returning the following `Object`:
+
+    {
+        url: 'https://apify.com',
+        title: 'Web Scraping, Data Extraction and Automation - Apify'
+    }
+
+Will produce the following table:
+
+|title|url|
+|--- |--- |
+|Web Scraping, Data Extraction and Automation - Apify|[https://apify.com](https://apify.com)|
+
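+The rules for return values can be summed up in a few lines of code. This is a sketch of the behavior described above, not the scrapers' actual code, and `toDatasetRows` is a hypothetical name.
+
+```javascript
+// Hypothetical helper: maps a pageFunction return value to dataset rows.
+function toDatasetRows(returnValue) {
+    // Returning nothing (undefined) or null saves no results.
+    if (returnValue === undefined || returnValue === null) return [];
+    // An array saves each of its items as a separate row,
+    // a single object is saved as a single row.
+    return Array.isArray(returnValue) ? returnValue : [returnValue];
+}
+
+console.log(toDatasetRows(null).length); // 0
+console.log(toDatasetRows({ url: 'https://apify.com' }).length); // 1
+console.log(toDatasetRows([{ id: 1 }, { id: 2 }]).length); // 2
+```
+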
+### [](#scraper-lifecycle)Scraper Lifecycle
+
+Now that we're familiar with all the pieces in the puzzle, we'll quickly take a look at the scraper lifecycle, or in other words, what the scraper actually does when it scrapes. It's quite straightforward.
+
+The scraper
+
+1.  visits the first **Start URL** and waits for the page to load.
+2.  executes the `pageFunction`.
+3.  finds all the elements matching the **Link selector** and extracts their `href` attributes (URLs).
+4.  uses the **Pseudo URLs** to filter the extracted URLs and throws away those that don't match.
+5.  enqueues the matching URLs to the end of the crawling queue.
+6.  closes the page and selects a new URL to visit, either from the Start URLs if there are any left, or from the beginning of the crawling queue.
+
+> When you're not using the request queue, the scraper just repeats steps 1 and 2. You would not use the request queue when you already know all the URLs you want to visit. For example, when you have a pre-existing list of a thousand URLs that you uploaded as a text file. Or when scraping just a single URL.
+
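+The lifecycle steps above can be sketched as a short simulation. Everything in it is made up for illustration: `fakeSite` stands in for the web and for the Link selector's findings, the regular expression stands in for the Pseudo URL `https://apify.com/[.+]/[.+]`, and skipping URLs that were already seen is an assumption of this sketch rather than something stated above.
+
+```javascript
+// A tiny in-memory "website": page URL -> links found by the Link selector.
+const fakeSite = {
+    'https://apify.com/store': [
+        'https://apify.com/apify/web-scraper',
+        'https://apify.com/apify/cheerio-scraper',
+        'https://apify.com/contact',
+    ],
+    'https://apify.com/apify/web-scraper': ['https://apify.com/store'],
+    'https://apify.com/apify/cheerio-scraper': [],
+};
+
+// Stand-in for the Pseudo URL https://apify.com/[.+]/[.+]
+const pseudoUrl = /^https:\/\/apify\.com\/.+\/.+$/;
+
+function crawl(startUrls, maxPages) {
+    const startQueue = [...startUrls];
+    const crawlingQueue = [];
+    const seen = new Set(startUrls);
+    const visited = [];
+    while (visited.length < maxPages && (startQueue.length > 0 || crawlingQueue.length > 0)) {
+        // Steps 6 and 1: take a Start URL if any are left, else the head of the queue.
+        const url = startQueue.length > 0 ? startQueue.shift() : crawlingQueue.shift();
+        // Step 2: this is where the pageFunction would run.
+        visited.push(url);
+        // Step 3: collect the links found on the page.
+        const links = fakeSite[url] || [];
+        for (const link of links) {
+            // Steps 4 and 5: keep only matching URLs and add them to the end of the queue.
+            if (pseudoUrl.test(link) && !seen.has(link)) {
+                seen.add(link);
+                crawlingQueue.push(link);
+            }
+        }
+    }
+    return visited;
+}
+
+console.log(crawl(['https://apify.com/store'], 10));
+```
+
+The store page is visited first and the two actor detail pages follow, while `https://apify.com/contact` is thrown away because it does not match the pattern.
+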
+## [](#scraping-basics)Scraping basics
+
+We've covered all the concepts that we need to understand to successfully scrape the data in our goal, so let's get to it and start with something really simple. We will only output data that are already available to us in the page's URL. Remember from [our goal](#the-goal) that we also want to include the **URL** and a **Unique identifier** in our results. To get those, we just need the `request.url` because it is the URL and includes the Unique identifier.
+
+    const { url } = request;
+    const uniqueIdentifier = url.split('/').slice(-2).join('/');
+
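+To see what the `split`, `slice` and `join` chain does, here it is applied to a concrete URL. The second function is a hypothetical, more defensive variant that is not part of the tutorial's code. It assumes you might also meet URLs with a trailing slash or a query string.
+
+```javascript
+// The same expression as above, wrapped in a function for easy testing.
+function getUniqueIdentifier(url) {
+    // 'https://apify.com/apify/web-scraper'.split('/') gives
+    // ['https:', '', 'apify.com', 'apify', 'web-scraper'];
+    // the last two items are the owner and the actor name.
+    return url.split('/').slice(-2).join('/');
+}
+
+console.log(getUniqueIdentifier('https://apify.com/apify/web-scraper')); // apify/web-scraper
+
+// Hypothetical variant: parse the URL first and read only its path.
+function getUniqueIdentifierFromPath(url) {
+    const { pathname } = new URL(url);
+    // filter(Boolean) drops the empty strings around leading and trailing slashes.
+    return pathname.split('/').filter(Boolean).slice(0, 2).join('/');
+}
+
+console.log(getUniqueIdentifierFromPath('https://apify.com/apify/web-scraper/?tab=info')); // apify/web-scraper
+```
+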
+### [](#test-run-2)Test run 2
+
+We'll add our first data to the `pageFunction` and carry out a test run to see that everything works as expected.
+
+    async function pageFunction(context) {
+        const { request, log, skipLinks } = context;
+        if (request.userData.label === 'START') {
+            log.info('Store opened!');
+            // Do some stuff later.
+        }
+        if (request.userData.label === 'DETAIL') {
+            const { url } = request;
+            log.info(`Scraping ${url}`);
+            await skipLinks();
+
+            // Do some scraping.
+            const uniqueIdentifier = url.split('/').slice(-2).join('/');
+
+            return {
+                url,
+                uniqueIdentifier,
+            }
+        }
+    }
+
+Now **Save & Run** the task and once it finishes, check the results by going to the Dataset, either by clicking the **Clean items** card, or by going to the **DATASET** tab. Click **Preview data** again (and check Clean data, if unchecked). You should see the URLs and Unique identifiers scraped. Great job!
+
+## [](#choosing-sides)Choosing sides
+
+Up until now, everything has been the same for all the Apify scrapers. Whether you're using `apify/web-scraper`, `apify/puppeteer-scraper` or `apify/cheerio-scraper`, what you've learned now will always be the same. This is great if you ever need to switch scrapers, because there's no need to learn everything from scratch.
+
+There are differences in the code we use in the `pageFunction` though. Often subtle, sometimes large. In the next part of the tutorial, we'll focus on specific implementation details of the individual scrapers. So it's time to choose sides. But don't worry, at Apify, no side is the dark side.
+
+*   [Continue to Web Scraper tutorial]({{@link scraping/web_scraper.md}})
+*   [Continue to Cheerio Scraper tutorial]({{@link scraping/cheerio_scraper.md}})
+*   [Continue to Puppeteer Scraper tutorial]({{@link scraping/puppeteer_scraper.md}})
diff --git a/docs/scraping/legacy_phantomjs_crawler.md b/docs/scraping/legacy_phantomjs_crawler.md
index 737c639f00..7c55e21d57 100644
--- a/docs/scraping/legacy_phantomjs_crawler.md
+++ b/docs/scraping/legacy_phantomjs_crawler.md
@@ -2,8 +2,6 @@
 title: Legacy PhantomJS Crawler
 ---
 
-## [](#phantomjs-crawler)Legacy PhantomJS Crawler
+# [](#legacy-phantomjs-crawler)Legacy PhantomJS Crawler
 
-Legacy PhantomJS Crawler is the actor compatible with an original Apify Crawler that you may have known. It supports the same input and produces the same output. But it uses legacy technology and if you're starting a new project, we recommend using our other solutions that run on the Apify Actor platform and use Chrome as the browser instead, such as [Web Scraper](#web-scraper) above.
-
-[Visit Legacy PhantomJS Crawler in store.](/apify/legacy-phantomjs-crawler)
+[Visit Legacy PhantomJS Crawler in store.](https://apify.com/apify/legacy-phantomjs-crawler)
diff --git a/docs/scraping/puppeteer_scraper.md b/docs/scraping/puppeteer_scraper.md
index d0417b2ab9..6aadfccd3e 100644
--- a/docs/scraping/puppeteer_scraper.md
+++ b/docs/scraping/puppeteer_scraper.md
@@ -2,10 +2,513 @@
 title: Puppeteer Scraper
 ---
 
-## [](#puppeteer-scraper)Puppeteer Scraper
+# [](#scraping-with-puppeteer-scraper)Scraping with Puppeteer Scraper
 
-Puppeteer Scraper is the most powerful scraper tool in our arsenal (aside from developing your own actors). It uses the Puppeteer library to programmatically control a headless Chrome browser and it can make it do almost anything. If using the Web Scraper does not cut it, Puppeteer Scraper is what you need.
+This scraping tutorial will go into the nitty-gritty details of extracting data from `https://apify.com/store` using the `apify/puppeteer-scraper`. If you arrived here from the [Getting started with Apify scrapers](https://apify.com/docs/scraping/tutorial/introduction) tutorial, great! You are ready to continue where we left off. If you haven't seen the Getting started tutorial yet, check it out. It will help you learn about Apify and scraping in general and set you up for this tutorial, because this one builds on topics and code examples discussed there.
 
-Puppeteer is a Node.js library, so knowledge of Node.js and its paradigms is expected when working with the Puppeteer Scraper.
+## [](#getting-to-know-our-tools)Getting to know our tools
 
-[Visit the Puppeteer Scraper tutorial to get started!](./scraping/tutorial/puppeteer-scraper)
+In the [Getting started with Apify scrapers](https://apify.com/docs/scraping/tutorial/introduction) tutorial, we've confirmed that the scraper works as expected, so now it's time to add more data to the results.
+
+To do that, we'll be using the [`Puppeteer` library](https://github.com/GoogleChrome/puppeteer). Puppeteer is a browser automation library that allows you to control a browser using JavaScript. That is, simulate a real human sitting in front of a computer, using a mouse and a keyboard. It gives you almost unlimited possibilities, but you need to learn quite a lot before you'll be able to use all of its features. We'll walk you through some of the basics of Puppeteer, so that you can start using it for some of the most typical scraping tasks, but if you really want to master it, you'll need to visit its [documentation](https://pptr.dev/) and really dive deep into its intricacies.
+
+> The purpose of Puppeteer Scraper is to remove some of the difficulty faced when using Puppeteer by wrapping it in a nice, manageable UI. It provides almost all of its features in a format that is much easier to grasp when first trying to scrape using Puppeteer.
+
+### [](#web-scraper-differences)Web Scraper differences
+
+At first glance, it may seem like Web Scraper and Puppeteer Scraper are almost the same. Well, they are. In fact, Web Scraper uses Puppeteer underneath. The difference is the amount of control they give you. Where Web Scraper only gives you access to in-browser JavaScript and the `pageFunction` is executed in the browser context, Puppeteer Scraper's `pageFunction` is executed in Node.js context, giving you much more freedom to bend the browser to your will. You're the puppeteer and the browser is your puppet. It's also much easier to work with external APIs, databases or the [Apify SDK](https://sdk.apify.com) in the Node.js context. The tradeoff is simple. Power vs simplicity. Web Scraper is simple, Puppeteer Scraper is powerful (and the [Apify SDK](https://sdk.apify.com) is super-powerful).
+
+> Simply put, Web Scraper `pageFunction` is just a single [page.evaluate()](https://pptr.dev/#?product=Puppeteer&show=api-pageevaluatepagefunction-args) call.
+
+Now that's out of the way, let's open one of the actor detail pages in the Store, for example the [`apify/web-scraper`](https://apify.com/apify/web-scraper) page and use our DevTools-Fu to scrape some data.
+
+> If you're wondering why we're using `apify/web-scraper` as an example instead of `puppeteer-scraper`, it's only because we didn't want to triple the number of screenshots we needed to make. Lazy developers!
+
+## [](#quick-recap)Quick recap
+
+Before we start, let's do a quick recap of the data we chose to scrape:
+
+1.  **URL** - The URL that goes directly to the actor's detail page.
+2.  **Unique identifier** - Such as `apify/web-scraper`.
+3.  **Title** - The title visible in the actor's detail page.
+4.  **Description** - The actor's description.
+5.  **Last run date** - When the actor was last run.
+6.  **Number of runs** - How many times the actor was run.
+
+![data to scrape](https://apifyusercontent.com/7274765d35b9a7c781e5bcc705a3dbdcf3c308ec/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f7363726170696e672d70726163746963652e6a7067 "Overview of data to be scraped.")
+
+We've already scraped numbers 1 and 2 in the [Getting started with Apify scrapers](https://apify.com/docs/scraping/tutorial/introduction) tutorial, so let's get to the next one on the list: Title.
+
+### [](#title)Title
+
+![actor title](https://apifyusercontent.com/5274e02a1c45ed96a7d8c0147ac6e3d99f883ed0/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f7469746c652e6a7067 "Finding actor title in DevTools.")
+
+By using the element selector tool, we find out that the title is there under an `<h1>` tag, as titles should be. Maybe surprisingly, we find that there are actually two `<h1>` tags on the detail page. This should get us thinking. Is there any parent element that includes our `<h1>` tag, but not the other one? Yes, there is! There is a `<header>` element that we can use to select only the heading we're interested in.
+
+> Remember that you can press CTRL+F (CMD+F) in the Elements tab of DevTools to open the search bar where you can quickly search for elements using their selectors. And always make sure to use the DevTools to verify your scraping process and assumptions. It's faster than changing the crawler code all the time.
+
+To get the title we just need to find it using a `header h1` selector, which selects all `<h1>` elements that have a `<header>` ancestor. And as we already know, there's only one.
+
+    // Using Puppeteer
+    const title = await page.$eval('header h1', (el => el.textContent));
+
+    return {
+        title,
+    }
+
+The [`page.$eval`](https://pptr.dev/#?product=Puppeteer&show=api-elementhandleevalselector-pagefunction-args-1) function allows you to run a function in the browser, with the selected element as the first argument. Here we use it to extract the text content of an `h1` element that's in the page. The return value of the function is automatically passed back to the Node.js context, so we receive an actual `string` with the element's text.
+
+### [](#description)Description
+
+Getting the actor's description is a little more involved, but still pretty straightforward. We can't simply search for a `<p>` tag, because there are a lot of them in the page. We need to narrow our search down a little. Using the DevTools we find that the actor description is nested within the `<header>` element too, same as the title. Sadly, we're still left with two `<p>` tags. To finally select only the description, we choose the `<p>` tag that has a `class` that starts with `Text__Paragraph`.
+
+![actor description selector](https://apifyusercontent.com/28dee1e51c6ac3e8ec67f0eb953b4a71c775f217/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f6465736372697074696f6e2e6a7067 "Finding actor description in DevTools.")
+
+    const title = await page.$eval('header h1', (el => el.textContent));
+    const description = await page.$eval('header p[class^=Text__Paragraph]', (el => el.textContent));
+
+    return {
+        title,
+        description
+    };
+
+### [](#last-run-date)Last run date
+
+The DevTools tell us that the `lastRunDate` can be found in the second of the two `<time>` elements in the page.
+
+![actor last run date selector](https://apifyusercontent.com/6fe3f03692a7dc3acc35be74b3b8baacb98d7ac3/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f6c6173742d72756e2d646174652e6a7067 "Finding actor last run date in DevTools.")
+
+    const title = await page.$eval('header h1', (el => el.textContent));
+    const description = await page.$eval('header p[class^=Text__Paragraph]', (el => el.textContent));
+
+    const lastRunTimestamp = await page.$$eval('time', (els) => els[1].getAttribute('datetime'));
+    const lastRunDate = new Date(Number(lastRunTimestamp));
+
+    return {
+        title,
+        description,
+        lastRunDate,
+    };
+
+Similarly to `page.$eval`, the [`page.$$eval`](https://pptr.dev/#?product=Puppeteer&show=api-elementhandleevalselector-pagefunction-args) function runs a function in the browser, only this time, it does not provide you with a single `Element` as the function's argument, but rather with an `Array` of `Elements`. Once again, the return value of the function will be passed back to the Node.js context.
+
+It might look a little too complex at first glance, but let me walk you through it. We find all the `<time>` elements. There are two, so we grab the second one using `els[1]` (arrays are zero-indexed) and then we read its `datetime` attribute, because that's where a unix timestamp is stored as a `string`.
+
+But we would much rather see a readable date in our results, not a unix timestamp, so we need to convert it. Unfortunately, the `new Date()` constructor does not treat a numeric `string` as a timestamp (it tries to parse it as a date string instead), so we cast the `string` to a `number` using the `Number()` function before actually calling `new Date()`. Phew!
+
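+The conversion is easy to try in isolation. In the sketch below, the timestamp value is made up for illustration and is assumed to be in milliseconds, as the code above implies.
+
+```javascript
+// Hypothetical value of a datetime attribute: a unix timestamp
+// in milliseconds, stored as a string.
+const lastRunTimestamp = '1569931539000';
+
+// Passed as a string, the value is parsed as date text, not as a timestamp.
+// In engines such as V8 this typically produces an invalid date.
+const parsedAsText = new Date(lastRunTimestamp);
+console.log(Number.isNaN(parsedAsText.getTime()));
+
+// Cast to a number first, the value is read as milliseconds since the epoch.
+const lastRunDate = new Date(Number(lastRunTimestamp));
+console.log(lastRunDate.toISOString()); // 2019-10-01T12:05:39.000Z
+```
+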
+### [](#run-count)Run count
+
+And so we're finishing up with the `runCount`. There's no specific element like `<time>`, so we need to create a complex selector and then do a transformation on the result.
+
+    const title = await page.$eval('header h1', (el => el.textContent));
+    const description = await page.$eval('header p[class^=Text__Paragraph]', (el => el.textContent));
+
+    const lastRunTimestamp = await page.$$eval('time', (els) => els[1].getAttribute('datetime'));
+    const lastRunDate = new Date(Number(lastRunTimestamp));
+
+    const runCountText = await page.$eval('ul.stats li:nth-of-type(3)', (el => el.textContent));
+    const runCount = Number(runCountText.match(/\d+/)[0]);
+
+    return {
+        title,
+        description,
+        lastRunDate,
+        runCount,
+    };
+
+The `ul.stats li:nth-of-type(3)` selector looks complicated, but all it says is that we're looking for a `<ul class="stats ...">` element and, within that element, for the third `<li>` element. We grab its text, but we're only interested in the number of runs. So we parse the number out using a regular expression, but its type is still a `string`, so we finally convert the result to a `number` by wrapping it with a `Number()` call.
+
+### [](#wrapping-it-up)Wrapping it up
+
+And there we have it! All the data we needed in a single object. For the sake of completeness, let's add the properties we parsed from the URL earlier and we're good to go.
+
+    const { url } = request;
+
+    // ...
+
+    const uniqueIdentifier = url.split('/').slice(-2).join('/');
+
+    const title = await page.$eval('header h1', (el => el.textContent));
+    const description = await page.$eval('header p[class^=Text__Paragraph]', (el => el.textContent));
+
+    const lastRunTimestamp = await page.$$eval('time', (els) => els[1].getAttribute('datetime'));
+    const lastRunDate = new Date(Number(lastRunTimestamp));
+
+    const runCountText = await page.$eval('ul.stats li:nth-of-type(3)', (el => el.textContent));
+    const runCount = Number(runCountText.match(/\d+/)[0]);
+
+    return {
+        url,
+        uniqueIdentifier,
+        title,
+        description,
+        lastRunDate,
+        runCount,
+    };
+
+All we need to do now is add this to our `pageFunction`:
+
+    async function pageFunction(context) {
+        const { request, log, skipLinks, page } = context; // page is Puppeteer's page
+
+        if (request.userData.label === 'START') {
+            log.info('Store opened!');
+            // Do some stuff later.
+        }
+        if (request.userData.label === 'DETAIL') {
+            const { url } = request;
+            log.info(`Scraping ${url}`);
+            await skipLinks();
+
+            // Do some scraping.
+            const uniqueIdentifier = url.split('/').slice(-2).join('/');
+
+            // Get attributes in parallel to speed up the process.
+            const titleP = page.$eval('header h1', (el => el.textContent));
+            const descriptionP = page.$eval('header p[class^=Text__Paragraph]', (el => el.textContent));
+            const lastRunTimestampP = page.$$eval('time', (els) => els[1].getAttribute('datetime'));
+            const runCountTextP = page.$eval('ul.stats li:nth-of-type(3)', (el => el.textContent));
+
+            const [title, description, lastRunTimestamp, runCountText] = await Promise.all([titleP, descriptionP, lastRunTimestampP, runCountTextP]);
+
+            const lastRunDate = new Date(Number(lastRunTimestamp));
+            const runCount = Number(runCountText.match(/\d+/)[0]);
+
+            return {
+                url,
+                uniqueIdentifier,
+                title,
+                description,
+                lastRunDate,
+                runCount,
+            };
+        }
+    }
+
+> You have probably noticed that we changed up the code a little bit. This is because the back and forth communication between Node.js and the browser takes some time and slows down the scraper. To limit the effect of this, we changed all the functions to start at the same time and only wait for all of them to finish at the end. This is called concurrency or parallelism. Unless the functions need to be executed in a specific order, it's often a good idea to run them concurrently to speed things up.
+
+### [](#test-run-3)Test run 3
+
+As always, try hitting that **Save & Run** button and visit the Dataset preview of clean items. You should see a nice table of all the attributes correctly scraped. You nailed it!
+
+## [](#pagination)Pagination
+
+Pagination is just a term that represents "going to the next page of results". You may have noticed that we did not actually scrape all the actors, just the first page of results. That's because to load the rest of the actors, one needs to click the orange **Show more** button at the very bottom of the list. This is pagination.
+
+> This is a typical JavaScript pagination, sometimes called infinite scroll. Other pages may just use links that take you to the next page. If you encounter those, just make a Pseudo URL for those links and they will be automatically enqueued to the request queue. Use a label to let the scraper know what kind of URL it's processing.
+
+### [](#waiting-for-dynamic-content)Waiting for dynamic content
+
+Before we talk about paginating, we need to have a quick look at dynamic content. Since the Apify Store is a JavaScript application (as many, if not most modern websites are), the button might not exist in the page when the scraper runs the `pageFunction`.
+
+How is this possible? Because the scraper only waits for the page to load its HTML before executing the `pageFunction`. If there's additional JavaScript that modifies the DOM afterwards, the `pageFunction` may execute before this JavaScript has had time to run.
+
+At first, you may think that the scraper is broken, but it just cannot wait for all the JavaScript in the page to finish executing. For a lot of pages, there's always some JavaScript executing or some network requests being made. It would never stop waiting. It is therefore up to you, the programmer, to wait for the elements you need. Fortunately, we have an easy solution.
+
+#### [](#the--code-context-page-waitfor----code--function)The `context.page.waitFor()` function
+
+`waitFor()` is a function that's available on the Puppeteer `page` object that's in turn available on the `context` argument of the `pageFunction` (as you already know from previous chapters). It helps you with, well, waiting for stuff. It accepts either a number of milliseconds to wait, a selector to await in the page, or a function to execute. It will stop waiting once the time elapses, the selector appears or the provided function returns `true`.
+
+> See [`page.waitFor()`](https://pptr.dev/#?product=Puppeteer&show=api-pagewaitforselectororfunctionortimeout-options-args) in the Puppeteer documentation.
+
+    await page.waitFor(2000); // Waits for 2 seconds.
+    await page.waitFor('#my-id'); // Waits until an element with id "my-id" appears in the page.
+    await page.waitFor(() => !!window.myObject); // Waits until a "myObject" variable appears on the window object.
+
+The selector may never be found and the function might never return `true`, so the `page.waitFor()` function also has a timeout. The default is `30` seconds. You can override it by providing an options object as the second parameter, with a `timeout` property.
+
+    await page.waitFor('.bad-class', { timeout: 5000 });
+
+With those tools, you should be able to handle any dynamic content the website throws at you.
+
+### [](#how-to-paginate)How to paginate
+
+With the theory out of the way, this should be pretty easy. The algorithm is a loop:
+
+1.  Wait for the **Show more** button.
+2.  Click it.
+3.  Is there another **Show more** button?
+    *   Yes? Repeat the above. (loop)
+    *   No? We're done. We have all the actors.
+
+#### [](#waiting-for-the-button)Waiting for the button
+
+Before we can wait for the button, we need to know its unique selector. A quick look in the DevTools tells us that the button's class is some weird randomly generated string, but fortunately, there's an enclosing `<div>` with a class of `show-more`. Great! Our unique selector:
+
+    div.show-more > button
+
+> Don't forget to confirm our assumption in the DevTools finder tool (CTRL/CMD + F).
+
+![waiting for the button](https://apifyusercontent.com/fbf97b35b4cb63cb5438c84dfc255a3b765ed176/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f77616974696e672d666f722d7468652d627574746f6e2e6a7067 "Finding show more button in DevTools.")
+
+Now that we know what to wait for, we just plug it into the `waitFor()` function.
+
+    await page.waitFor('div.show-more > button');
+
+#### [](#clicking-the-button)Clicking the button
+
+We have a unique selector for the button and we know that it's already rendered in the page. Clicking it is a piece of cake. We'll use the Puppeteer `page` again to issue the click. Puppeteer will actually simulate moving the mouse over the element and making a left mouse click on it.
+
+    await page.click('div.show-more > button');
+
+This will show the next page of actors.
+
+#### [](#repeating-the-process)Repeating the process
+
+We've shown two function calls, but how do we make this work together in the `pageFunction`?
+
+    async function pageFunction(context) {
+
+    // ...
+
+    let timeout; // undefined
+    const buttonSelector = 'div.show-more > button';
+    while (true) {
+        log.info('Waiting for the "Show more" button.');
+        try {
+            await page.waitFor(buttonSelector, { timeout }); // Default timeout first time.
+            timeout = 2000; // 2 sec timeout after the first.
+        } catch (err) {
+            // Ignore the timeout error.
+            log.info('Could not find the "Show more" button, we\'ve reached the end.');
+            break;
+        }
+        log.info('Clicking the "Show more" button.');
+        await page.click(buttonSelector);
+    }
+
+    // ...
+
+    }
+
+We want to run this until the `waitFor()` function throws, so that's why we use a `while(true)` loop. We're also not interested in the error, because we're expecting it, so we just ignore it and print a log message instead.
+
+You might be wondering what's up with the `timeout`. Well, for the first page load, we want to wait longer, so that all the page's JavaScript has had a chance to execute. For the other iterations, the JavaScript is already loaded and we're just waiting for the page to re-render, so waiting for `2` seconds is enough to confirm that the button is not there. We don't want to stall the scraper for `30` seconds just to make sure that there's no button.
+
+### [](#plugging-it-into-the--code-pagefunction--code-)Plugging it into the `pageFunction`
+
+We've got the general algorithm ready, so all that's left is to integrate it into our earlier `pageFunction`. Remember the `// Do some stuff later` comment? Let's replace it.
+
+    async function pageFunction(context) {
+        const { request, log, skipLinks, page } = context;
+        if (request.userData.label === 'START') {
+            log.info('Store opened!');
+            let timeout; // undefined
+            const buttonSelector = 'div.show-more > button';
+            while (true) {
+                log.info('Waiting for the "Show more" button.');
+                try {
+                    await page.waitFor(buttonSelector, { timeout }); // Default timeout first time.
+                    timeout = 2000; // 2 sec timeout after the first.
+                } catch (err) {
+                    // Ignore the timeout error.
+                    log.info('Could not find the "Show more" button, we\'ve reached the end.');
+                    break;
+                }
+                log.info('Clicking the "Show more" button.');
+                await page.click(buttonSelector);
+            }
+        }
+
+        if (request.userData.label === 'DETAIL') {
+            const { url } = request;
+            log.info(`Scraping ${url}`);
+            await skipLinks();
+
+            // Do some scraping.
+            const uniqueIdentifier = url.split('/').slice(-2).join('/');
+
+            // Get attributes in parallel to speed up the process.
+            const titleP = page.$eval('header h1', (el => el.textContent));
+            const descriptionP = page.$eval('header p[class^=Text__Paragraph]', (el => el.textContent));
+            const lastRunTimestampP = page.$$eval('time', (els) => els[1].getAttribute('datetime'));
+            const runCountTextP = page.$eval('ul.stats li:nth-of-type(3)', (el => el.textContent));
+
+            const [title, description, lastRunTimestamp, runCountText] = await Promise.all([titleP, descriptionP, lastRunTimestampP, runCountTextP]);
+
+            const lastRunDate = new Date(Number(lastRunTimestamp));
+            const runCount = Number(runCountText.match(/\d+/)[0]);
+
+            return {
+                url,
+                uniqueIdentifier,
+                title,
+                description,
+                lastRunDate,
+                runCount,
+            };
+        }
+    }
+
+That's it! You can now remove the **Max pages per run** limit, **Save & Run** your task and watch the scraper paginate through all the actors and then scrape all of their data. After it succeeds, open the Dataset again and see the clean items. You should have a table of all the actors' details in front of you. If you do, great job! You've successfully scraped the Apify Store. And if not, no worries, just go through the code examples again. It's probably just a typo.
+
+![final results](https://apifyusercontent.com/7efc451548c50f3495439b673b9298d9d8ec4f1b/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f706c756767696e672d69742d696e746f2d7468652d7061676566756e6374696f6e2e6a7067 "Final results.")
+
+## [](#downloading-the-scraped-data)Downloading the scraped data
+
+You already know the DATASET tab of the run console since this is where we've always previewed our data. Notice that at the bottom, there is a table with multiple data formats, such as JSON, CSV or an Excel sheet, and to the right, there are options to download the scraping results in any of those formats. Go ahead and try it.
+
+> If you prefer working with an API, you can find an example in the API tab of the run console: **Get dataset items**.
+
+### [](#items-and-clean-items)Items and Clean items
+
+There are two types of data available for download: Items and Clean items. The Items will always include a record for each `pageFunction` invocation, even if you did not return any results. The record also includes hidden fields such as `#debug`, where you can find various information that can help you with debugging your scrapers.
+
+Clean items, on the other hand, include only the data you returned from the `pageFunction`. If you're only interested in the data you scraped, this format is what you will be using most of the time.
+
+## [](#bonus--making-your-code-neater)Bonus: Making your code neater
+
+You may have noticed that the `pageFunction` gets quite bulky. To make better sense of your code and have an easier time maintaining or extending your task, feel free to define other functions inside the `pageFunction` that encapsulate all the different logic. You can, for example, define a function for each of the different pages:
+
+    async function pageFunction(context) {
+        switch (context.request.userData.label) {
+            case 'START': return handleStart(context);
+            case 'DETAIL': return handleDetail(context);
+        }
+
+        async function handleStart({ log, page }) {
+            log.info('Store opened!');
+            let timeout; // undefined
+            const buttonSelector = 'div.show-more > button';
+            while (true) {
+                log.info('Waiting for the "Show more" button.');
+                try {
+                    await page.waitFor(buttonSelector, { timeout }); // Default timeout first time.
+                    timeout = 2000; // 2 sec timeout after the first.
+                } catch (err) {
+                    // Ignore the timeout error.
+                    log.info('Could not find the "Show more" button, we\'ve reached the end.');
+                    break;
+                }
+                log.info('Clicking the "Show more" button.');
+                await page.click(buttonSelector);
+            }
+        }
+
+        async function handleDetail({ request, log, skipLinks, page }) {
+            const { url } = request;
+            log.info(`Scraping ${url}`);
+            await skipLinks();
+
+            // Do some scraping.
+            const uniqueIdentifier = url.split('/').slice(-2).join('/');
+
+            // Get attributes in parallel to speed up the process.
+            const titleP = page.$eval('header h1', (el => el.textContent));
+            const descriptionP = page.$eval('header p[class^=Text__Paragraph]', (el => el.textContent));
+            const lastRunTimestampP = page.$$eval('time', (els) => els[1].getAttribute('datetime'));
+            const runCountTextP = page.$eval('ul.stats li:nth-of-type(3)', (el => el.textContent));
+
+            const [title, description, lastRunTimestamp, runCountText] = await Promise.all([titleP, descriptionP, lastRunTimestampP, runCountTextP]);
+
+            const lastRunDate = new Date(Number(lastRunTimestamp));
+            const runCount = Number(runCountText.match(/\d+/)[0]);
+
+            return {
+                url,
+                uniqueIdentifier,
+                title,
+                description,
+                lastRunDate,
+                runCount,
+            };
+        }
+    }
+
+> If you're confused by the functions being declared below the code that calls them, it's called hoisting and it's a feature of JavaScript. It helps you put what matters on top, if you so desire.
+
+## [](#bonus-2--using-jquery-with-puppeteer-scraper)Bonus 2: Using jQuery with Puppeteer Scraper
+
+If you're familiar with the [`jQuery` library](https://jquery.com/), you may have looked at the scraping code and thought that it's unnecessarily complicated. That's probably up to everyone to decide on their own, but the good news is, you can easily use `jQuery` with Puppeteer Scraper too.
+
+### [](#injecting-jquery)Injecting jQuery
+
+To be able to use jQuery, we first need to introduce it to the browser. Fortunately, we have a helper function to do just that: [`Apify.utils.puppeteer.injectJQuery`](https://sdk.apify.com/docs/api/puppeteer#puppeteer.injectJQuery).
+
+> Just a friendly warning. Injecting `jQuery` into a page may break the page itself, if it expects a specific version of `jQuery` to be available and you override it with an incompatible one. So, be careful.
+
+You can either call this function directly in your `pageFunction`, or you can set up `jQuery` injection in the **Pre goto function** in the INPUT UI.
+
+    async function pageFunction(context) {
+        const { Apify, page } = context;
+        await Apify.utils.puppeteer.injectJQuery(page);
+
+        // your code ...
+    }
+
+    async function preGotoFunction({ page, Apify }) {
+        await Apify.utils.puppeteer.injectJQuery(page);
+    }
+
+The two approaches are almost equivalent in effect, but not quite. In some cases, you may see performance differences, or one might work while the other does not, depending on the target website.
+
+Let's try refactoring the Bonus 1 version of the `pageFunction` to use `jQuery`.
+
+    async function pageFunction(context) {
+        switch (context.request.userData.label) {
+            case 'START': return handleStart(context);
+            case 'DETAIL': return handleDetail(context);
+        }
+
+        async function handleStart({ log, page }) {
+            log.info('Store opened!');
+            let timeout; // undefined
+            const buttonSelector = 'div.show-more > button';
+            while (true) {
+                log.info('Waiting for the "Show more" button.');
+                try {
+                    await page.waitFor(buttonSelector, { timeout });
+                    timeout = 2000;
+                } catch (err) {
+                    log.info('Could not find the "Show more" button, we\'ve reached the end.');
+                    break;
+                }
+                log.info('Clicking the "Show more" button.');
+                await page.click(buttonSelector);
+            }
+        }
+
+        async function handleDetail({ request, log, skipLinks, page, Apify }) { // <-------- Destructure Apify.
+            await Apify.utils.puppeteer.injectJQuery(page); // <-------- Inject jQuery.
+
+            const { url } = request;
+            log.info(`Scraping ${url}`);
+            await skipLinks();
+
+            // Do some scraping.
+            const uniqueIdentifier = url.split('/').slice(-2).join('/');
+
+            const results = await page.evaluate(() => { // <-------- Use jQuery only inside page.evaluate (inside browser).
+                return {
+                    title: $('header h1').text(),
+                    description: $('header p[class^=Text__Paragraph]').text(),
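+                    lastRunTimestamp: Number( // <-------- Return a number; a Date may not survive serialization to Node.js.
+                        $('time')
+                            .eq(1)
+                            .attr('datetime'),
+                    ),
+                    runCount: Number(
+                        $('ul.stats li:nth-of-type(3)')
+                            .text()
+                            .match(/\d+/)[0],
+                    ),
+                };
+            });
+
+            const { lastRunTimestamp, ...rest } = results;
+            return {
+                url,
+                uniqueIdentifier,
+                ...rest, // <-------- Add results from browser to output.
+                lastRunDate: new Date(lastRunTimestamp), // <-------- Build the Date in Node.js.
+            };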
+        }
+    }
+
+> There's an important takeaway from the example code. You can only use jQuery in the browser scope, even though you're injecting it outside of the browser. We're using the [`page.evaluate()`](https://pptr.dev/#?product=Puppeteer&show=api-pageevaluatepagefunction-args) function to run the script in the context of the browser and the return value is passed back to Node.js. Keep this in mind.
+
+## [](#final-word)Final word
+
+Thank you for reading this whole tutorial! Really! It's important to us that our users have the best information available to them so that they can use Apify easily and effectively. We're glad that you made it all the way here and congratulations on creating your first scraping task. We hope that you liked the tutorial and if there's anything you'd like to ask, [do it on Stack Overflow](https://stackoverflow.com/questions/tagged/apify)!
+
+Finally, `apify/puppeteer-scraper` is just an actor and writing your own actors is a breeze with the [Apify SDK](https://sdk.apify.com). It's a bit more complex and involved than writing a simple `pageFunction`, but it allows you to fine-tune all the details of your scraper to your liking. Perhaps some other time, when you're in the mood for yet another tutorial, visit the [Getting Started](https://sdk.apify.com/docs/guides/gettingstarted). We think you'd like it!
diff --git a/docs/scraping/web_scraper.md b/docs/scraping/web_scraper.md
index 4f1f286fd8..f966ee478e 100644
--- a/docs/scraping/web_scraper.md
+++ b/docs/scraping/web_scraper.md
@@ -2,10 +2,403 @@
 title: Web Scraper
 ---
 
-## [](#web-scraper)Web Scraper
+# [](#scraping-with-web-scraper)Scraping with Web Scraper
 
-Web Scraper is a ready-made solution for scraping the web using the Chrome browser. It takes away all the work necessary to set up a browser for crawling, controls the browser automatically and produces machine readable results in several common formats.
+This scraping tutorial goes into the nitty-gritty details of extracting data from `https://apify.com/store` using the `apify/web-scraper`. If you arrived here from the [Getting started with Apify scrapers](https://apify.com/docs/scraping/tutorial/introduction) tutorial, great! You are ready to continue where we left off. If you haven't seen the Getting started tutorial yet, check it out. It will help you learn about Apify and scraping in general and set you up for this tutorial, because this one builds on topics and code examples discussed there.
 
-Underneath, it uses the Puppeteer library to control the browser, but you don't need to worry about that. Using a simple web UI and a little of basic JavaScript, you can tweak it to serve almost any scraping need.
+## [](#getting-to-know-our-tools)Getting to know our tools
 
-[Visit the Web Scraper tutorial to get started!](./scraping/tutorial/web-scraper)
+In the [Getting started with Apify scrapers](https://apify.com/docs/scraping/tutorial/introduction) tutorial, we've confirmed that the scraper works as expected, so now it's time to add more data to the results.
+
+To do that, we'll be using the [`jQuery` library](https://jquery.com/), because it provides some nice tools and a lot of people familiar with JavaScript already know how to use it.
+
+> If you're not familiar with `jQuery`, you can find good information [in the docs](https://api.jquery.com/) and if you just don't want to use it, that's okay. Everything can be done using pure JavaScript too.
+
+To add `jQuery`, all we need to do is turn on **Inject jQuery** under INPUT **Options**. This will add a `context.jQuery` function that you can use.
+
+Now that's out of the way, let's open one of the actor detail pages in the Store, for example the [`apify/web-scraper`](https://apify.com/apify/web-scraper) page and use our DevTools-Fu to scrape some data.
+
+## [](#quick-recap)Quick recap
+
+Before we start, let's do a quick recap of the data we chose to scrape:
+
+1.  **URL** - The URL that goes directly to the actor's detail page.
+2.  **Unique identifier** - Such as `apify/web-scraper`.
+3.  **Title** - The title visible in the actor's detail page.
+4.  **Description** - The actor's description.
+5.  **Last run date** - When the actor was last run.
+6.  **Number of runs** - How many times the actor was run.
+
+![data to scrape](https://apifyusercontent.com/7274765d35b9a7c781e5bcc705a3dbdcf3c308ec/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f7363726170696e672d70726163746963652e6a7067 "Overview of data to be scraped.")
+
+We've already scraped numbers 1 and 2 in the [Getting started with Apify scrapers](https://apify.com/docs/scraping/tutorial/introduction) tutorial, so let's get to the next one on the list: Title.
+
+### [](#title)Title
+
+![actor title](https://apifyusercontent.com/5274e02a1c45ed96a7d8c0147ac6e3d99f883ed0/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f7469746c652e6a7067 "Finding actor title in DevTools.")
+
+By using the element selector tool, we find out that the title is there under an `<h1>` tag, as titles should be. Maybe surprisingly, we find that there are actually two `<h1>` tags on the detail page. This should get us thinking. Is there any parent element that includes our `<h1>` tag, but not the other ones? Yes, there is! There is a `<header>` element that we can use to select only the heading we're interested in.
+
+> Remember that you can press CTRL+F (CMD+F) in the Elements tab of DevTools to open the search bar where you can quickly search for elements using their selectors. And always make sure to use the DevTools to verify your scraping process and assumptions. It's faster than changing the crawler code all the time.
+
+To get the title we just need to find it using a `header h1` selector, which selects all `<h1>` elements that have a `<header>` ancestor. And as we already know, there's only one.
+
+    // Using jQuery.
+    return {
+        title: $('header h1').text(),
+    };
+
+### [](#description)Description
+
+Getting the actor's description is a little more involved, but still pretty straightforward. We can't simply search for a `<p>` tag, because there are a lot of them in the page. We need to narrow our search down a little. Using the DevTools we find that the actor description is nested within the `<header>` element too, same as the title. Sadly, we're still left with two `<p>` tags. To finally select only the description, we choose the `<p>` tag that has a `class` that starts with `Text__Paragraph`.
+
+![actor description selector](https://apifyusercontent.com/28dee1e51c6ac3e8ec67f0eb953b4a71c775f217/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f6465736372697074696f6e2e6a7067 "Finding actor description in DevTools.")
+
+    return {
+        title: $('header h1').text(),
+        description: $('header p[class^=Text__Paragraph]').text(),
+    };
+
+### [](#last-run-date)Last run date
+
+The DevTools tell us that the `lastRunDate` can be found in the second of the two `<time>` elements in the page.
+
+![actor last run date selector](https://apifyusercontent.com/6fe3f03692a7dc3acc35be74b3b8baacb98d7ac3/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f6c6173742d72756e2d646174652e6a7067 "Finding actor last run date in DevTools.")
+
+    return {
+        title: $('header h1').text(),
+        description: $('header p[class^=Text__Paragraph]').text(),
+        lastRunDate: new Date(
+            Number(
+                $('time')
+                    .eq(1)
+                    .attr('datetime'),
+            ),
+        ),
+    };
+
+It might look a little too complex at first glance, but let's walk through it. We find all the `<time>` elements. There are two, so we grab the second one using the `.eq(1)` call (it's zero-indexed) and then we read its `datetime` attribute, because that's where a Unix timestamp is stored as a `string`.
+
+But we would much rather see a readable date in our results, not a Unix timestamp, so we need to convert it. Unfortunately, the `new Date()` constructor would try to parse a `string` as a date string instead of treating it as a timestamp, so we cast the `string` to a `number` using the `Number()` function before actually calling `new Date()`. Phew!
+
+### [](#run-count)Run count
+
+And so we're finishing up with the `runCount`. There's no specific element like `<time>`, so we need to create a complex selector and then do a transformation on the result.
+
+    return {
+        title: $('header h1').text(),
+        description: $('header p[class^=Text__Paragraph]').text(),
+        lastRunDate: new Date(
+            Number(
+                $('time')
+                    .eq(1)
+                    .attr('datetime'),
+            ),
+        ),
+        runCount: Number(
+            $('ul.stats li:nth-of-type(3)')
+                .text()
+                .match(/\d+/)[0],
+        ),
+    };
+
+The `ul.stats li:nth-of-type(3)` selector looks complicated, but all it says is that we're looking for a `<ul class="stats ...">` element and, within that element, for the third `<li>` element. We grab its text, but we're only interested in the number of runs. So we parse the number out using a regular expression, but its type is still a `string`, so we finally convert the result to a `number` by wrapping it with a `Number()` call.
+
+### [](#wrapping-it-up)Wrapping it up
+
+And there we have it! All the data we needed in a single object. For the sake of completeness, let's add the properties we parsed from the URL earlier and we're good to go.
+
+    const { url } = request;
+
+    // ...
+
+    const uniqueIdentifier = url.split('/').slice(-2).join('/');
+
+    return {
+        url,
+        uniqueIdentifier,
+        title: $('header h1').text(),
+        description: $('header p[class^=Text__Paragraph]').text(),
+        lastRunDate: new Date(
+            Number(
+                $('time')
+                    .eq(1)
+                    .attr('datetime'),
+            ),
+        ),
+        runCount: Number(
+            $('ul.stats li:nth-of-type(3)')
+                .text()
+                .match(/\d+/)[0],
+        ),
+    };
+
+All we need to do now is add this to our `pageFunction`:
+
+    async function pageFunction(context) {
+        const { request, log, skipLinks, jQuery: $ } = context; // use jQuery as $
+
+        if (request.userData.label === 'START') {
+            log.info('Store opened!');
+            // Do some stuff later.
+        }
+        if (request.userData.label === 'DETAIL') {
+            const { url } = request;
+            log.info(`Scraping ${url}`);
+            await skipLinks();
+
+            // Do some scraping.
+            const uniqueIdentifier = url.split('/').slice(-2).join('/');
+
+            return {
+                url,
+                uniqueIdentifier,
+                title: $('header h1').text(),
+                description: $('header p[class^=Text__Paragraph]').text(),
+                lastRunDate: new Date(
+                    Number(
+                        $('time')
+                            .eq(1)
+                            .attr('datetime'),
+                    ),
+                ),
+                runCount: Number(
+                    $('ul.stats li:nth-of-type(3)')
+                        .text()
+                        .match(/\d+/)[0],
+                ),
+            };
+        }
+    }
+
+### [](#test-run-3)Test run 3
+
+As always, try hitting that **Save & Run** button and visit the Dataset preview of clean items. You should see a nice table of all the attributes correctly scraped. You nailed it!
+
+## [](#pagination)Pagination
+
+Pagination is just a term that represents "going to the next page of results". You may have noticed that we did not actually scrape all the actors, just the first page of results. That's because to load the rest of the actors, one needs to click the orange **Show more** button at the very bottom of the list. This is pagination.
+
+> This is a typical JavaScript pagination, sometimes called infinite scroll. Other pages may just use links that take you to the next page. If you encounter those, just make a Pseudo URL for those links and they will be automatically enqueued to the request queue. Use a label to let the scraper know what kind of URL it's processing.
+
+### [](#waiting-for-dynamic-content)Waiting for dynamic content
+
+Before we talk about paginating, we need to have a quick look at dynamic content. Since the Apify Store is a JavaScript application (as many, if not most modern websites are), the button might not exist in the page when the scraper runs the `pageFunction`.
+
+How is this possible? Because the scraper only waits for the page to load its HTML before executing the `pageFunction`. If there's additional JavaScript that modifies the DOM afterwards, the `pageFunction` may execute before this JavaScript has had time to run.
+
+At first, you may think that the scraper is broken, but it just cannot wait for all the JavaScript in the page to finish executing. For a lot of pages, there's always some JavaScript executing or some network requests being made. It would never stop waiting. It is therefore up to you, the programmer, to wait for the elements you need. Fortunately, we have an easy solution.
+
+#### [](#the--code-context-waitfor----code--function)The `context.waitFor()` function
+
+`waitFor()` is a function that's available on the `context` object passed to the `pageFunction` and helps you with, well, waiting for stuff. It accepts either a number of milliseconds to wait, a selector to await in the page, or a function to execute. It will stop waiting once the time elapses, the selector appears or the provided function returns `true`.
+
+    await waitFor(2000); // Waits for 2 seconds.
+    await waitFor('#my-id'); // Waits until an element with id "my-id" appears in the page.
+    await waitFor(() => !!window.myObject); // Waits until a "myObject" variable appears on the window object.
+
+The selector may never be found and the function might never return `true`, so the `waitFor()` function also has a timeout. The default is `20` seconds. You can override it by providing an options object as the second parameter, with a `timeoutMillis` property.
+
+    await waitFor('.bad-class', { timeoutMillis: 5000 });
+
+With those tools, you should be able to handle any dynamic content the website throws at you.
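+
+To build some intuition for what happens behind a call like `waitFor()`, here is a simplified sketch of a polling helper. It is only an illustration: the `waitForCondition()` name, the `pollIntervalMillis` option and the polling strategy are our own assumptions, not the scraper's actual implementation, and it covers only the function variant, not the milliseconds or selector variants.
+
+```javascript
+// Hypothetical helper for illustration only, not the real waitFor().
+async function waitForCondition(predicate, { timeoutMillis = 20000, pollIntervalMillis = 50 } = {}) {
+    const deadline = Date.now() + timeoutMillis;
+    while (true) {
+        // Stop waiting as soon as the predicate returns a truthy value.
+        if (await predicate()) return;
+        // Give up once the timeout has elapsed.
+        if (Date.now() >= deadline) {
+            throw new Error(`Condition not met within ${timeoutMillis} ms.`);
+        }
+        // Sleep for a short while before checking again.
+        await new Promise((resolve) => setTimeout(resolve, pollIntervalMillis));
+    }
+}
+
+// Usage: the first call resolves after roughly 100 ms, the second one rejects after roughly 200 ms.
+const startedAt = Date.now();
+waitForCondition(() => Date.now() - startedAt > 100)
+    .then(() => console.log('Condition met.'));
+waitForCondition(() => false, { timeoutMillis: 200 })
+    .catch((err) => console.log(err.message));
+```
+
+The important design point is the timeout: without it, a condition that never becomes true would keep the loop running forever.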
+
+### [](#how-to-paginate)How to paginate
+
+With the theory out of the way, this should be pretty easy. The algorithm is a loop:
+
+1.  Wait for the **Show more** button.
+2.  Click it.
+3.  Is there another **Show more** button?
+    *   Yes? Repeat the above. (loop)
+    *   No? We're done. We have all the actors.
+
+#### [](#waiting-for-the-button)Waiting for the button
+
+Before we can wait for the button, we need to know its unique selector. A quick look in the DevTools tells us that the button's class is some weird randomly generated string, but fortunately, there's an enclosing `<div>` with a class of `show-more`. Great! Our unique selector:
+
+    div.show-more > button
+
+> Don't forget to confirm our assumption in the DevTools finder tool (CTRL/CMD + F).
+
+![waiting for the button](https://apifyusercontent.com/fbf97b35b4cb63cb5438c84dfc255a3b765ed176/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f77616974696e672d666f722d7468652d627574746f6e2e6a7067 "Finding show more button in DevTools.")
+
+Now that we know what to wait for, we just plug it into the `waitFor()` function.
+
+    await waitFor('div.show-more > button');
+
+#### [](#clicking-the-button)Clicking the button
+
+We have a unique selector for the button and we know that it's already rendered in the page. Clicking it is a piece of cake. We'll use `jQuery` again, but feel free to use plain JavaScript, it works the same.
+
+    $('div.show-more > button').click();
+
+This will show the next page of actors.
+
+#### [](#repeating-the-process)Repeating the process
+
+We've shown two function calls, but how do we make this work together in the `pageFunction`?
+
+    async function pageFunction(context) {
+
+    // ...
+
+    let timeoutMillis; // undefined
+    const buttonSelector = 'div.show-more > button';
+    while (true) {
+        log.info('Waiting for the "Show more" button.');
+        try {
+            await waitFor(buttonSelector, { timeoutMillis }); // Default timeout first time.
+            timeoutMillis = 2000; // 2 sec timeout after the first.
+        } catch (err) {
+            // Ignore the timeout error.
+            log.info('Could not find the "Show more" button, we\'ve reached the end.');
+            break;
+        }
+        log.info('Clicking the "Show more" button.');
+        $(buttonSelector).click();
+    }
+
+    // ...
+
+    }
+
+We want to run this until the `waitFor()` function throws, so that's why we use a `while(true)` loop. We're also not interested in the error, because we're expecting it, so we just ignore it and print a log message instead.
+
+You might be wondering what's up with the `timeoutMillis`. Well, for the first page load, we want to wait longer, so that all the page's JavaScript has had a chance to execute. For the other iterations, the JavaScript is already loaded and we're just waiting for the page to re-render, so waiting for `2` seconds is enough to confirm that the button is not there. We don't want to stall the scraper for `20` seconds just to make sure that there's no button.
+
+### [](#plugging-it-into-the--code-pagefunction--code-)Plugging it into the `pageFunction`
+
+We've got the general algorithm ready, so all that's left is to integrate it into our earlier `pageFunction`. Remember the `// Do some stuff later` comment? Let's replace it. And don't forget to destructure the `waitFor()` function on the first line.
+
+    async function pageFunction(context) {
+        const { request, log, skipLinks, jQuery: $, waitFor } = context;
+        if (request.userData.label === 'START') {
+            log.info('Store opened!');
+            let timeoutMillis; // undefined
+            const buttonSelector = 'div.show-more > button';
+            while (true) {
+                log.info('Waiting for the "Show more" button.');
+                try {
+                    await waitFor(buttonSelector, { timeoutMillis }); // Default timeout first time.
+                    timeoutMillis = 2000; // 2 sec timeout after the first.
+                } catch (err) {
+                    // Ignore the timeout error.
+                    log.info('Could not find the "Show more" button, we\'ve reached the end.');
+                    break;
+                }
+                log.info('Clicking the "Show more" button.');
+                $(buttonSelector).click();
+            }
+
+        }
+        if (request.userData.label === 'DETAIL') {
+            const { url } = request;
+            log.info(`Scraping ${url}`);
+            await skipLinks();
+
+            // Do some scraping.
+            const uniqueIdentifier = url.split('/').slice(-2).join('/');
+
+            return {
+                url,
+                uniqueIdentifier,
+                title: $('header h1').text(),
+                description: $('header p[class^=Text__Paragraph]').text(),
+                lastRunDate: new Date(
+                    Number(
+                        $('time')
+                            .eq(1)
+                            .attr('datetime'),
+                    ),
+                ),
+                runCount: Number(
+                    $('ul.stats li:nth-of-type(3)')
+                        .text()
+                        .match(/\d+/)[0],
+                ),
+            };
+        }
+    }
+
+That's it! You can now remove the **Max pages per run** limit, **Save & Run** your task and watch the scraper paginate through all the actors and then scrape all of their data. After it succeeds, open the Dataset again and see the clean items. You should have a table of all the actors' details in front of you. If you do, great job! You've successfully scraped the Apify Store. And if not, no worries, just go through the code examples again. It's probably just a typo.
+
+![final results](https://apifyusercontent.com/7efc451548c50f3495439b673b9298d9d8ec4f1b/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f706c756767696e672d69742d696e746f2d7468652d7061676566756e6374696f6e2e6a7067 "Final results.")
+
+## [](#downloading-the-scraped-data)Downloading the scraped data
+
+You already know the DATASET tab of the run console since this is where we've always previewed our data. Notice that at the bottom, there is a table with multiple data formats, such as JSON, CSV or an Excel sheet, and to the right, there are options to download the scraping results in any of those formats. Go ahead and try it.
+
+> If you prefer working with an API, you can find an example in the API tab of the run console: **Get dataset items**.
+
+### [](#items-and-clean-items)Items and Clean items
+
+There are two types of data available for download: Items and Clean items. The Items will always include a record for each `pageFunction` invocation, even if you did not return any results. The record also includes hidden fields such as `#debug`, where you can find various information that can help you with debugging your scrapers.
+
+Clean items, on the other hand, include only the data you returned from the `pageFunction`. If you're only interested in the data you scraped, this format is what you will be using most of the time.
+
+## [](#bonus--making-your-code-neater)Bonus: Making your code neater
+
+You may have noticed that the `pageFunction` gets quite bulky. To make better sense of your code and have an easier time maintaining or extending your task, feel free to define other functions inside the `pageFunction` that encapsulate all the different logic. You can, for example, define a function for each of the different pages:
+
+    async function pageFunction(context) {
+        switch (context.request.userData.label) {
+            case 'START': return handleStart(context);
+            case 'DETAIL': return handleDetail(context);
+        }
+
+        async function handleStart({ log, waitFor, jQuery: $ }) {
+            log.info('Store opened!');
+            let timeoutMillis; // undefined
+            const buttonSelector = 'div.show-more > button';
+            while (true) {
+                log.info('Waiting for the "Show more" button.');
+                try {
+                    await waitFor(buttonSelector, { timeoutMillis }); // Default timeout first time.
+                    timeoutMillis = 2000; // 2 sec timeout after the first.
+                } catch (err) {
+                    // Ignore the timeout error.
+                    log.info('Could not find the "Show more" button, we\'ve reached the end.');
+                    break;
+                }
+                log.info('Clicking the "Show more" button.');
+                $(buttonSelector).click();
+            }
+        }
+
+        async function handleDetail({ request, log, skipLinks, jQuery: $ }) {
+            const { url } = request;
+            log.info(`Scraping ${url}`);
+            await skipLinks();
+
+            // Do some scraping.
+            const uniqueIdentifier = url.split('/').slice(-2).join('/');
+
+            return {
+                url,
+                uniqueIdentifier,
+                title: $('header h1').text(),
+                description: $('header p[class^=Text__Paragraph]').text(),
+                lastRunDate: new Date(
+                    Number(
+                        $('time')
+                            .eq(1)
+                            .attr('datetime'),
+                    ),
+                ),
+                runCount: Number(
+                    $('ul.stats li:nth-of-type(3)')
+                        .text()
+                        .match(/\d+/)[0],
+                ),
+            };
+        }
+    }
+
+> If you're confused by the functions being declared below the code that calls them, it's called hoisting and it's a feature of JavaScript. It helps you put what matters on top, if you so desire.
+
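+Hoisting is easy to verify in isolation. In this minimal sketch, the `describeLabel()` function is a made-up example and not part of the scraper. It is called before its declaration appears in the code:
+
+```javascript
+// The call comes first...
+const message = describeLabel('DETAIL');
+console.log(message); // Handling DETAIL
+
+// ...and the declaration follows. Function declarations are hoisted to the top of their scope.
+function describeLabel(label) {
+    return `Handling ${label}`;
+}
+```
+
+This works for function declarations only. A function assigned to a `const` or `let` variable cannot be called before the line that defines it.
+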
+## [](#final-word)Final word
+
+Thank you for reading this whole tutorial! Really! It's important to us that our users have the best information available to them so that they can use Apify easily and effectively. We're glad that you made it all the way here and congratulations on creating your first scraping task. We hope that you liked the tutorial and if there's anything you'd like to ask, [do it on Stack Overflow](https://stackoverflow.com/questions/tagged/apify)!
+
+Finally, `apify/web-scraper` is just an actor and writing your own actors is a breeze with the [Apify SDK](https://sdk.apify.com). It's a bit more complex and involved than writing a simple `pageFunction`, but it allows you to fine-tune all the details of your scraper to your liking. Perhaps some other time, when you're in the mood for yet another tutorial, visit the [Getting Started](https://sdk.apify.com/docs/guides/gettingstarted). We think you'd like it!

From f86e2d4a37527ca1beb7addf4595fff8537a2328 Mon Sep 17 00:00:00 2001
From: Vratislav Bartonicek <vratislav@vbartonicek.cz>
Date: Tue, 1 Oct 2019 14:12:35 +0200
Subject: [PATCH 2/3] Link fix

---
 docs/actor/build.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/actor/build.md b/docs/actor/build.md
index 412ceca167..5da138a59b 100644
--- a/docs/actor/build.md
+++ b/docs/actor/build.md
@@ -4,7 +4,7 @@ title: Build
 
 ## [](#build)Build
 
-Before the actor can be run, it first needs to be built. The build effectively creates a snapshot of a specific version of the actor's settings such as the [Source code]({{@link actor/source_code.md}}) and [Environment variables]({{@link actor/run.md#run-env-vars}}), and creates a Docker image that contains everything the actor needs for its run, including necessary NPM packages, web browsers, etc.
+Before the actor can be run, it first needs to be built. The build effectively creates a snapshot of a specific version of the actor's settings such as the [Source code]({{@link actor/source_code.md}}) and [Environment variables]({{@link actor/run.md}}#run-env-vars), and creates a Docker image that contains everything the actor needs for its run, including necessary NPM packages, web browsers, etc.
 
 Each build is assigned a unique build number of the form `MAJOR.MINOR.BUILD` (e.g. `1.2.345`), where `MAJOR.MINOR` corresponds to the actor version number (see [Versions](#versions)) and `BUILD` is an automatically-incremented number starting at `1`.
 

From cb8c389322dff5b2f256a989d82dd5ea4964ef19 Mon Sep 17 00:00:00 2001
From: Vratislav Bartonicek <vratislav@vbartonicek.cz>
Date: Thu, 3 Oct 2019 15:55:50 +0200
Subject: [PATCH 3/3] Img links fixed

---
 docs/scraping/cheerio_scraper.md   | 14 +++++++-------
 docs/scraping/introduction.md      | 10 +++++-----
 docs/scraping/puppeteer_scraper.md | 13 ++++++-------
 docs/scraping/web_scraper.md       | 12 ++++++------
 4 files changed, 24 insertions(+), 25 deletions(-)

diff --git a/docs/scraping/cheerio_scraper.md b/docs/scraping/cheerio_scraper.md
index 1125be6e2b..3f088de31e 100644
--- a/docs/scraping/cheerio_scraper.md
+++ b/docs/scraping/cheerio_scraper.md
@@ -29,13 +29,13 @@ Before we start, let's do a quick recap of the data we chose to scrape:
 5.  **Last run date**- When the actor was last run.
 6.  **Number of runs** - How many times the actor was run.
 
-![data to scrape](https://apifyusercontent.com/7274765d35b9a7c781e5bcc705a3dbdcf3c308ec/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f7363726170696e672d70726163746963652e6a7067 "Overview of data to be scraped.")
+![data to scrape](../img/scraping-practice.jpg "Overview of data to be scraped.")
 
 We've already scraped number 1 and 2 in the [Getting started with Apify scrapers](https://apify.com/docs/scraping/tutorial/introduction) tutorial, so let's get to the next one on the list: Title
 
 ### [](#title)Title
 
-![actor title](https://apifyusercontent.com/5274e02a1c45ed96a7d8c0147ac6e3d99f883ed0/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f7469746c652e6a7067 "Finding actor title in DevTools.")
+![actor title](../img/title.jpg "Finding actor title in DevTools.")
 
 By using the element selector tool, we find out that the title is there under an `<h1>` tag, as titles should be. Maybe surprisingly, we find that there are actually two `<h1>` tags on the detail page. This should get us thinking. Is there any parent element that includes our `<h1>` tag, but not the other ones? Yes, there is! There is a `<header>` element that we can use to select only the heading we're interested in.
 
@@ -52,7 +52,7 @@ To get the title we just need to find it using a `header h1` selector, which sel
 
 Getting the actor's description is a little more involved, but still pretty straightforward. We can't just simply search for a `<p>` tag, because there's a lot of them in the page. We need to narrow our search down a little. Using the DevTools we find that the actor description is nested within the `<header>` element too, same as the title. Sadly, we're still left with two `<p>` tags. To finally select only the description, we choose the `<p>` tag that has a `class` that starts with `Text__Paragraph`.
 
-![actor description selector](https://apifyusercontent.com/28dee1e51c6ac3e8ec67f0eb953b4a71c775f217/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f6465736372697074696f6e2e6a7067 "Finding actor description in DevTools.")
+![actor description selector](../img/description.jpg "Finding actor description in DevTools.")
 
     return {
         title: $('header h1').text(),
@@ -63,7 +63,7 @@ Getting the actor's description is a little more involved, but still pretty stra
 
 The DevTools tell us that the `lastRunDate` can be found in the second of the two `<time>` elements in the page.
 
-![actor last run date selector](https://apifyusercontent.com/6fe3f03692a7dc3acc35be74b3b8baacb98d7ac3/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f6c6173742d72756e2d646174652e6a7067 "Finding actor last run date in DevTools.")
+![actor last run date selector](../img/last-run-date.jpg "Finding actor last run date in DevTools.")
 
     return {
         title: $('header h1').text(),
@@ -190,7 +190,7 @@ While with `apify/web-scraper` and `apify/puppeteer-scraper`, we could get away
 
 We want to know what happens when we click the **Show more** button, so we open the DevTools Network tab and clear it. Then we click the Show more button and wait for incoming requests to appear in the list.
 
-![inspect-network](https://apifyusercontent.com/2b51728bb8363c8ac71d8bab191c938fa3a5ddc9/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f696e73706563742d6e6574776f726b2e6a7067 "Inspecting network in DevTools.")
+![inspect-network](../img/inspect-network.jpg "Inspecting network in DevTools.")
 
 Now, this is interesting. It seems that we've only received two images after clicking the button and no additional data. This means that the data about actors must already be available in the page and the Show more button only displays it. This is good news.
 
@@ -198,7 +198,7 @@ Now, this is interesting. It seems that we've only received two images after cli
 
 Now that we know the information we seek is already in the page, we just need to find it. The first actor in the store is `apify/web-scraper` so let's try using the search tool in the Elements tab to find some reference to it. The first few hits do not provide any interesting information, but in the end, we find our goldmine. There is a `<script>` tag, with the ID `__NEXT_DATA__` that seems to hold a lot of information about `apify/web-scraper`. In DevTools, you can right click an element and click **Store as global variable** to make this element available in the Console.
 
-![find-data](https://apifyusercontent.com/7b5b800b5544349cd486c3cca2c61240a043e38e/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f66696e642d646174612e6a7067 "Finding the hidden actor data.")
+![find-data](../img/find-data.jpg "Finding the hidden actor data.")
 
 A `temp1` variable is now added to your console. We're mostly interested in its contents and we can get that using the `temp1.textContent` property. You can see that it's a rather large JSON string. How do we know? The `type` attribute of the `<script>` element says `application/json`. But working with a string would be very cumbersome, so we need to parse it.
 
@@ -206,7 +206,7 @@ A `temp1` variable is now added to your console. We're mostly interested in its
 
 After entering the above command into the console, we can inspect the `data` variable and see that all the information we need is there, in the `data.props.pageProps.items` array. Great!
 
-![inspect-data](https://apifyusercontent.com/e121274d88789fc535f4389ad1e36e61155fec23/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f696e73706563742d646174612e6a7067 "Inspecting the hidden actor data.")
+![inspect-data](../img/inspect-data.jpg "Inspecting the hidden actor data.")
 
 > It's obvious that all the information we set to scrape is available in this one data object, so you might already be wondering, can I just make one request to the store to get this JSON and then parse it out and be done with it in a single request? Yes you can! And that's the power of clever page analysis.
 
diff --git a/docs/scraping/introduction.md b/docs/scraping/introduction.md
index c9d9ccfe1d..b5c495044e 100644
--- a/docs/scraping/introduction.md
+++ b/docs/scraping/introduction.md
@@ -20,7 +20,7 @@ Depending on how you arrived at this tutorial, you may already have your first t
 
 > This tutorial covers the use of **Web**, **Cheerio** and **Puppeteer** scrapers, but a lot of the information here can be used with all actors.
 
-![actor-selection](https://apifyusercontent.com/8dbaeafb7e45277d68a2011447cc28d21e5be3cc/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f6163746f722d73656c656374696f6e2e6a7067 "Selecting the best actor")
+![actor-selection](../img/actor-selection.jpg "Selecting the best actor")
 
 ### [](#running-a-task)Running a task
 
@@ -40,7 +40,7 @@ After clicking **Save & Run**, the window will change to the run detail. Here, y
 
 Now that the run has `SUCCEEDED`, click on the rightmost card labeled **Clean items** to see the results of the scrape. This takes you to the DATASET tab, where you can display or download the results in various formats. For now, just click the blue **Preview data** button. Voila, the scraped data.
 
-![run detail](https://apifyusercontent.com/44d8fb566bd35bec26dc9725cfa168795338acff/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f7468652d72756e2d64657461696c2e6a7067 "Viewing results in the run detail.")
+![run detail](../img/the-run-detail.jpg "Viewing results in the run detail.")
 
 Good job! We've run our first task and got some results. Let's learn how to change the default configuration to scrape something more interesting than just the page's `<title>`.
 
@@ -107,7 +107,7 @@ We also need to somehow distinguish the Start URL from all the other URLs that t
       "label": "START"
     }
 
-![start url input](https://apifyusercontent.com/90c552bd3267b500260d27eba7c1b5792b10c370/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f7468652d73746172742d75726c2e6a7067 "Adding new Start URL.")
+![start url input](../img/the-start-url.jpg "Adding new Start URL.")
 
 ### [](#crawling-the-website-with-pseudo-urls)Crawling the website with Pseudo URLs
 
@@ -141,7 +141,7 @@ Let's use the above Pseudo URL in our task. We should also add a label as we did
       "label": "DETAIL"
     }
 
-![pseudo url input](https://apifyusercontent.com/8d4802058a4f68753345a4097b292a82d38dab83/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f6d616b696e672d612d70736575646f2d75726c2e6a7067 "Adding new Pseudo URL.")
+![pseudo url input](../img/making-a-pseudo-url.jpg "Adding new Pseudo URL.")
 
 ### [](#filtering-with-a-link-selector)Filtering with a link selector
 
@@ -175,7 +175,7 @@ The DevTools window will pop up, and display a lot of, perhaps unfamiliar, infor
 
 You'll see that the Element tab jumps to the first `<title>` element of the current page and that the title is `Store`. It's always good practice to do your research using the DevTools before writing the `pageFunction` and running your task.
 
-![devtools](https://apifyusercontent.com/b6ba4c89bd65705450f1fee33c37198186b175fc/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f7573696e672d646576746f6f6c732e6a7067 "Finding title element in DevTools.")
+![devtools](../img/using-devtools.jpg "Finding title element in DevTools.")
 
 > For the sake of brevity, we won't go into the details of using the DevTools in this tutorial. If you're just starting out with DevTools, this [Google tutorial](https://developers.google.com/web/tools/chrome-devtools/) is a good place to begin.
 
diff --git a/docs/scraping/puppeteer_scraper.md b/docs/scraping/puppeteer_scraper.md
index 6aadfccd3e..2f6a6ff80e 100644
--- a/docs/scraping/puppeteer_scraper.md
+++ b/docs/scraping/puppeteer_scraper.md
@@ -35,13 +35,13 @@ Before we start, let's do a quick recap of the data we chose to scrape:
 5.  **Last run date**- When the actor was last run.
 6.  **Number of runs** - How many times the actor was run.
 
-![data to scrape](https://apifyusercontent.com/7274765d35b9a7c781e5bcc705a3dbdcf3c308ec/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f7363726170696e672d70726163746963652e6a7067 "Overview of data to be scraped.")
+![data to scrape](../img/scraping-practice.jpg "Overview of data to be scraped.")
 
 We've already scraped number 1 and 2 in the [Getting started with Apify scrapers](https://apify.com/docs/scraping/tutorial/introduction) tutorial, so let's get to the next one on the list: Title
 
 ### [](#title)Title
 
-![actor title](https://apifyusercontent.com/5274e02a1c45ed96a7d8c0147ac6e3d99f883ed0/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f7469746c652e6a7067 "Finding actor title in DevTools.")
+![actor title](../img/title.jpg "Finding actor title in DevTools.")
 
 By using the element selector tool, we find out that the title is there under an `<h1>` tag, as titles should be. Maybe surprisingly, we find that there are actually two `<h1>` tags on the detail page. This should get us thinking. Is there any parent element that includes our `<h1>` tag, but not the other ones? Yes, there is! There is a `<header>` element that we can use to select only the heading we're interested in.
 
@@ -62,8 +62,7 @@ The [`page.$eval`](https://pptr.dev/#?product=Puppeteer&show=api-elementhandleev
 
 Getting the actor's description is a little more involved, but still pretty straightforward. We can't just simply search for a `<p>` tag, because there's a lot of them in the page. We need to narrow our search down a little. Using the DevTools we find that the actor description is nested within the `<header>` element too, same as the title. Sadly, we're still left with two `<p>` tags. To finally select only the description, we choose the `<p>` tag that has a `class` that starts with `Text__Paragraph`.
 
-![actor description selector](https://apifyusercontent.com/28dee1e51c6ac3e8ec67f0eb953b4a71c775f217/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f6465736372697074696f6e2e6a7067 "Finding actor description in DevTools.")
-
+![actor description selector](../img/description.jpg "Finding actor description in DevTools.")
     const title = await page.$eval('header h1', (el => el.textContent));
     const description = await page.$eval('header p[class^=Text__Paragraph]', (el => el.textContent));
 
@@ -76,7 +75,7 @@ Getting the actor's description is a little more involved, but still pretty stra
 
 The DevTools tell us that the `lastRunDate` can be found in the second of the two `<time>` elements in the page.
 
-![actor last run date selector](https://apifyusercontent.com/6fe3f03692a7dc3acc35be74b3b8baacb98d7ac3/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f6c6173742d72756e2d646174652e6a7067 "Finding actor last run date in DevTools.")
+![actor last run date selector](../img/last-run-date.jpg "Finding actor last run date in DevTools.")
 
     const title = await page.$eval('header h1', (el => el.textContent));
     const description = await page.$eval('header p[class^=Text__Paragraph]', (el => el.textContent));
@@ -239,7 +238,7 @@ Before we can wait for the button, we need to know its unique selector. A quick
 
 > Don't forget to confirm our assumption in the DevTools finder tool (CTRL/CMD + F).
 
-![waiting for the button](https://apifyusercontent.com/fbf97b35b4cb63cb5438c84dfc255a3b765ed176/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f77616974696e672d666f722d7468652d627574746f6e2e6a7067 "Finding show more button in DevTools.")
+![waiting for the button](../img/waiting-for-the-button.jpg "Finding show more button in DevTools.")
 
 Now that we know what to wait for, we just plug it into the `waitFor()` function.
 
@@ -342,7 +341,7 @@ We've got the general algorithm ready, so all that's left is to integrate it int
 
 That's it! You can now remove the **Max pages per run** limit, **Save & Run** your task and watch the scraper paginate through all the actors and then scrape all of their data. After it succeeds, open the Dataset again and see the clean items. You should have a table of all the actor's details in front of you. If you do, great job! You've successfully scraped the Apify Store. And if not, no worries, just go through the code examples again, it's probably just some typo.
 
-![final results](https://apifyusercontent.com/7efc451548c50f3495439b673b9298d9d8ec4f1b/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f706c756767696e672d69742d696e746f2d7468652d7061676566756e6374696f6e2e6a7067 "Final results.")
+![final results](../img/plugging-it-into-the-pagefunction.jpg "Final results.")
 
 ## [](#downloading-the-scraped-data)Downloading the scraped data
 
diff --git a/docs/scraping/web_scraper.md b/docs/scraping/web_scraper.md
index f966ee478e..57e06d1fb0 100644
--- a/docs/scraping/web_scraper.md
+++ b/docs/scraping/web_scraper.md
@@ -29,13 +29,13 @@ Before we start, let's do a quick recap of the data we chose to scrape:
 5.  **Last run date**- When the actor was last run.
 6.  **Number of runs** - How many times the actor was run.
 
-![data to scrape](https://apifyusercontent.com/7274765d35b9a7c781e5bcc705a3dbdcf3c308ec/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f7363726170696e672d70726163746963652e6a7067 "Overview of data to be scraped.")
+![data to scrape](../img/scraping-practice.jpg "Overview of data to be scraped.")
 
 We've already scraped number 1 and 2 in the [Getting started with Apify scrapers](https://apify.com/docs/scraping/tutorial/introduction) tutorial, so let's get to the next one on the list: Title
 
 ### [](#title)Title
 
-![actor title](https://apifyusercontent.com/5274e02a1c45ed96a7d8c0147ac6e3d99f883ed0/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f7469746c652e6a7067 "Finding actor title in DevTools.")
+![actor title](../img/title.jpg "Finding actor title in DevTools.")
 
 By using the element selector tool, we find out that the title is there under an `<h1>` tag, as titles should be. Maybe surprisingly, we find that there are actually two `<h1>` tags on the detail page. This should get us thinking. Is there any parent element that includes our `<h1>` tag, but not the other ones? Yes, there is! There is a `<header>` element that we can use to select only the heading we're interested in.
 
@@ -52,7 +52,7 @@ To get the title we just need to find it using a `header h1` selector, which sel
 
 Getting the actor's description is a little more involved, but still pretty straightforward. We can't just simply search for a `<p>` tag, because there's a lot of them in the page. We need to narrow our search down a little. Using the DevTools we find that the actor description is nested within the `<header>` element too, same as the title. Sadly, we're still left with two `<p>` tags. To finally select only the description, we choose the `<p>` tag that has a `class` that starts with `Text__Paragraph`.
 
-![actor description selector](https://apifyusercontent.com/28dee1e51c6ac3e8ec67f0eb953b4a71c775f217/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f6465736372697074696f6e2e6a7067 "Finding actor description in DevTools.")
+![actor description selector](../img/description.jpg "Finding actor description in DevTools.")
 
     return {
         title: $('header h1').text(),
@@ -63,7 +63,7 @@ Getting the actor's description is a little more involved, but still pretty stra
 
 The DevTools tell us that the `lastRunDate` can be found in the second of the two `<time>` elements in the page.
 
-![actor last run date selector](https://apifyusercontent.com/6fe3f03692a7dc3acc35be74b3b8baacb98d7ac3/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f6c6173742d72756e2d646174652e6a7067 "Finding actor last run date in DevTools.")
+![actor last run date selector](../img/last-run-date.jpg "Finding actor last run date in DevTools.")
 
     return {
         title: $('header h1').text(),
@@ -221,7 +221,7 @@ Before we can wait for the button, we need to know its unique selector. A quick
 
 > Don't forget to confirm our assumption in the DevTools finder tool (CTRL/CMD + F).
 
-![waiting for the button](https://apifyusercontent.com/fbf97b35b4cb63cb5438c84dfc255a3b765ed176/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f77616974696e672d666f722d7468652d627574746f6e2e6a7067 "Finding show more button in DevTools.")
+![waiting for the button](../img/waiting-for-the-button.jpg "Finding show more button in DevTools.")
 
 Now that we know what to wait for, we just plug it into the `waitFor()` function.
 
@@ -323,7 +323,7 @@ We've got the general algorithm ready, so all that's left is to integrate it int
 
 That's it! You can now remove the **Max pages per run** limit, **Save & Run** your task and watch the scraper paginate through all the actors and then scrape all of their data. After it succeeds, open the Dataset again and see the clean items. You should have a table of all the actor's details in front of you. If you do, great job! You've successfully scraped the Apify Store. And if not, no worries, just go through the code examples again, it's probably just some typo.
 
-![final results](https://apifyusercontent.com/7efc451548c50f3495439b673b9298d9d8ec4f1b/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6170696679746563682f6163746f722d736372617065722f6d61737465722f646f63732f6275696c642f2e2e2f696d672f706c756767696e672d69742d696e746f2d7468652d7061676566756e6374696f6e2e6a7067 "Final results.")
+![final results](../img/plugging-it-into-the-pagefunction.jpg "Final results.")
 
 ## [](#downloading-the-scraped-data)Downloading the scraped data