diff --git a/repomix-output.xml b/repomix-output.xml new file mode 100644 index 0000000000..04cfcaa9ab --- /dev/null +++ b/repomix-output.xml @@ -0,0 +1,28987 @@ +This file is a merged representation of the entire codebase, combined into a single document by Repomix. + + +This section contains a summary of this file. + + +This file contains a packed representation of the entire repository's contents. +It is designed to be easily consumable by AI systems for analysis, code review, +or other automated processes. + + + +The content is organized as follows: +1. This summary section +2. Repository information +3. Directory structure +4. Repository files, each consisting of: + - File path as an attribute + - Full contents of the file + + + +- This file should be treated as read-only. Any changes should be made to the + original repository files, not this packed version. +- When processing this file, use the file path to distinguish + between different files in the repository. +- Be aware that this file may contain sensitive information. Handle it with + the same level of security as you would the original repository. + + + +- Some files may have been excluded based on .gitignore rules and Repomix's configuration +- Binary files are not included in this packed representation. Please refer to the Repository Structure section for a complete list of file paths, including binary files +- Files matching patterns in .gitignore are excluded +- Files matching default ignore patterns are excluded +- Files are sorted by Git change count (files with more changes are at the bottom) + + + + + + + + + +glossary/ + concepts/ + css_selectors.md + dynamic_pages.md + html_elements.md + http_cookies.md + http_headers.md + index.md + querying_css_selectors.md + robot_process_automation.md + tools/ + apify_cli.md + edit_this_cookie.md + index.md + insomnia.md + modheader.md + postman.md + proxyman.md + quick_javascript_switcher.md + switchyomega.md + user_agent_switcher.md + glossary.md +platform/ + deploying_your_code/ + deploying.md + docker_file.md + index.md + input_schema.md + inputs_outputs.md + output_schema.md + expert_scraping_with_apify/ + solutions/ + handling_migrations.md + index.md + integrating_webhooks.md + managing_source.md + rotating_proxies.md + saving_stats.md + using_api_and_client.md + using_storage_creating_tasks.md + actors_webhooks.md + apify_api_and_client.md + bypassing_anti_scraping.md + index.md + managing_source_code.md + migrations_maintaining_state.md + saving_useful_stats.md + tasks_and_storage.md + get_most_of_actors/ + actor_basics/ + _category_.yaml + actor_description_seo_description.md + actors-and-emojis.md + how-to-create-actor-readme.md + importance-of-actor-url.md + name-your-actor.md + interact_with_users/ + _category_.yaml + emails_to_actor_users.md + issues_tab.md + your_store_bio.md + product_optimization/ + _category_.yaml + actor_bundles.md + how_to_create_a_great_input_schema.md + promote_your_actor/ + _category_.yaml + blogs_and_blog_resources.md + parasite_seo.md + product_hunt.md + seo.md + social_media.md + video_tutorials.md + webinars.md + store_basics/ + _category_.yaml + actor_success_stories.md + how_actor_monetization_works.md + how_store_works.md + how_to_build_actors.md + ideas_page_and_its_use.md + index.md + monetizing_your_actor.md + getting_started/ + actors.md + apify_api.md + apify_client.md + creating_actors.md + index.md + inputs_outputs.md + apify_platform.md + running_a_web_server.md +tutorials/ + api/ + index.md + retry_failed_requests.md + 
run_actor_and_retrieve_data_via_api.md + apify_scrapers/ + cheerio_scraper.md + getting_started.md + index.md + puppeteer_scraper.md + web_scraper.md + node_js/ + add_external_libraries_web_scraper.md + analyzing_pages_and_fixing_errors.md + apify_free_google_serp_api.md + avoid_eacces_error_in_actor_builds.md + block_requests_puppeteer.md + caching_responses_in_puppeteer.js + caching_responses_in_puppeteer.md + choosing_the_right_scraper.md + dealing_with_dynamic_pages.js + dealing_with_dynamic_pages.md + debugging_web_scraper.md + filter_blocked_requests_using_sessions.md + handle_blocked_requests_puppeteer.md + how_to_fix_target_closed.md + how_to_save_screenshots_puppeteer.md + index.md + js_in_html.md + multiple-runs-scrape.md + optimizing_scrapers.md + processing_multiple_pages_web_scraper.md + request_labels_in_apify_actors.md + scraping_from_sitemaps.js + scraping_from_sitemaps.md + scraping_shadow_doms.md + scraping_urls_list_from_google_sheets.md + submitting_form_with_file_attachment.md + submitting_forms_on_aspx_pages.md + using_proxy_to_intercept_requests_puppeteer.md + waiting_for_dynamic_content.md + when_to_use_puppeteer_scraper.md + php/ + index.md + using_apify_from_php.md + python/ + index.md + process_data_using_python.md + scrape_data_python.md + tutorials/ + index.md +webscraping/ + advanced_web_scraping/ + crawling/ + crawling-sitemaps.md + crawling-with-search.md + sitemaps-vs-search.md + index.md + tips_and_tricks_robustness.md + anti_scraping/ + mitigation/ + cloudflare_challenge.md + generating_fingerprints.md + index.md + using_proxies.md + techniques/ + browser_challenges.md + captchas.md + fingerprinting.md + firewalls.md + geolocation.md + index.md + rate_limiting.md + index.md + api_scraping/ + general_api_scraping/ + cookies_headers_tokens.md + handling_pagination.md + index.md + locating_and_learning.md + graphql_scraping/ + custom_queries.md + index.md + introspection.md + modifying_variables.md + index.md + puppeteer_playwright/ + common_use_cases/ + downloading_files.md + index.md + logging_into_a_website.md + paginating_through_results.md + scraping_iframes.md + submitting_a_form_with_a_file_attachment.md + executing_scripts/ + extracting_data.md + index.md + injecting_code.md + page/ + index.md + interacting_with_a_page.md + page_methods.md + waiting.md + browser_contexts.md + browser.md + index.md + proxies.md + reading_intercepting_requests.md + scraping_basics_javascript/ + challenge/ + index.md + initializing_and_setting_up.md + modularity.md + scraping_amazon.md + crawling/ + exporting_data.md + filtering_links.md + finding_links.js + finding_links.md + first_crawl.md + headless_browser.md + index.md + pro_scraping.md + recap_extraction_basics.md + relative_urls.md + scraping_the_data.md + data_extraction/ + browser_devtools.md + computer_preparation.md + devtools_continued.md + index.md + node_continued.md + node_js_scraper.md + project_setup.md + save_to_csv.md + using_devtools.md + best_practices.md + index.md + introduction.md + scraping_basics_python/ + _exercises.mdx + 01_devtools_inspecting.md + 02_devtools_locating_elements.md + 03_devtools_extracting_data.md + 04_downloading_html.md + 05_parsing_html.md + 06_locating_elements.md + 07_extracting_data.md + 08_saving_data.md + 09_getting_links.md + 10_crawling.md + 11_scraping_variants.md + 12_framework.md + 13_platform.md + index.md + typescript/ + enums.md + index.md + installation.md + interfaces.md + mini_project.md + type_aliases.md + unknown_and_type_assertions.md + 
using_types_continued.md + using_types.md + watch_mode_and_tsconfig.md +homepage_content.json +index.mdx +sidebars.js + + + +This section contains the contents of the repository's files. + + +--- +title: CSS selectors +description: Learn about CSS selectors. What they are, their types, why they are important for web scraping and how to use them in browser Console with JavaScript. +sidebar_position: 8.4 +slug: /concepts/css-selectors +--- + +CSS selectors are patterns used to select [HTML elements](./html_elements.md) on a web page. They are used in combination with CSS styles to change the appearance of web pages, and also in JavaScript to access and manipulate the elements on a web page. + +> Querying of CSS selectors with JavaScript is done using [query selector functions](./querying_css_selectors.md). + +## Common types of CSS selectors + +Some of the most common types of CSS selectors are: + +### Element selector + +This is used to select elements by their tag name. For example, to select all `
<p>` elements, you would use the `p` selector.

```js
const paragraphs = document.querySelectorAll('p');
```

### Class selector

This is used to select elements by their class attribute. For example, to select all elements with the class of `highlight`, you would use the `.highlight` selector.

```js
const highlightedElements = document.querySelectorAll('.highlight');
```

### ID selector

This is used to select an element by its `id` attribute. For example, to select an element with the id of `header`, you would use the `#header` selector.

```js
const header = document.querySelector('#header');
```

### Attribute selector

This is used to select elements based on the value of an attribute. For example, to select all elements with the attribute `data-custom` whose value is `yes`, you would use the `[data-custom="yes"]` selector.

```js
const customElements = document.querySelectorAll('[data-custom="yes"]');
```

### Chaining selectors

You can also chain multiple selectors together to select elements more precisely. For example, to select a `
<p>` element that also has the class `highlight`, you would use the `p.highlight` selector.

```js
const highlightedParagraph = document.querySelector('p.highlight');
```

## CSS selectors in web scraping

CSS selectors are important for web scraping because they allow you to target specific elements on a web page and extract their data. When scraping a web page, you typically want to extract specific pieces of information from the page, such as text, images, or links. CSS selectors allow you to locate these elements on the page, so you can extract the data that you need.

For example, if you wanted to scrape a list of all the titles of blog posts on a website, you could use a CSS selector to select all the elements that contain the title text. Once you have selected these elements, you can extract the text from them and use it for your scraping project.

Additionally, when web scraping, it is important to understand the structure of the website, and CSS selectors can help you navigate it. With them, you can select specific elements and their children, siblings, or parent elements. This allows you to extract data that is nested within other elements, or to navigate through the page structure to find the data you need.

## Resources

- Find all the available CSS selectors and their syntax on the [MDN CSS Selectors page](https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors).


---
title: Dynamic pages
description: Understand what makes a page dynamic, and how a page being dynamic might change your approach when writing a scraper for it.
sidebar_position: 8.3
slug: /concepts/dynamic-pages
---

# Dynamic pages and single-page applications (SPAs) {#dynamic-pages}

**Understand what makes a page dynamic, and how a page being dynamic might change your approach when writing a scraper for it.**

---

Oftentimes, web pages load additional information dynamically, long after their main body is loaded in the browser. A subset of dynamic pages takes this approach further and loads all of its content dynamically. This style of building websites is called a single-page application (SPA), and it's widespread thanks to popular JavaScript libraries such as [React](https://react.dev/) or [Vue](https://vuejs.org/).

As you progress in your scraping journey, you'll quickly realize that different websites load their content and populate their pages with data in different ways. Some pages are rendered entirely on the server, some retrieve the data dynamically, and some use a combination of both those methods.

## How page loading works {#about-page-loading}

The process of loading a page involves three main events, each with a designated name:

1. `DOMContentLoaded` - The initial HTML document is loaded, which contains the HTML as it was rendered on the website's server. It also includes all of the JavaScript which will be run in the next step.
2. `load` - The page's JavaScript is executed.
3. `networkidle` - Network [XHR/Fetch requests](https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest) are sent and loaded, and data from these requests is populated onto the page. Many websites load essential data this way. These requests might be sent upon certain page events as well (not just the first load), such as scrolling or clicking.

Now that we have a solid understanding of the different stages of page loading, and the order they happen in, we can fully understand what a dynamic page is.
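For example, here's a minimal sketch of how a scraper can explicitly wait for each of these stages. It assumes [Playwright](https://playwright.dev/), one of the browser automation libraries covered later in the academy, where the event names above map directly to load states:

```js
import { chromium } from 'playwright';

const browser = await chromium.launch();
const page = await browser.newPage();

// 1. Wait only for the initial HTML document to be loaded.
await page.goto('https://example.com', { waitUntil: 'domcontentloaded' });

// 2. Wait until the page's JavaScript has been executed (the "load" event).
await page.waitForLoadState('load');

// 3. Wait until no network requests have fired for 500 ms, meaning any
//    dynamically requested XHR/Fetch data has (most likely) arrived.
await page.waitForLoadState('networkidle');

await browser.close();
```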
+ +## What is dynamic content {#what-is-dynamic-content} + +Dynamic content is any content that is rendered **after** the `DOMContentLoaded` event, which means any content loaded by JavaScript during the `load` event, or after any network XHR/Fetch requests have been made. + +Sometimes, it can be quite obvious when content is dynamically being rendered. For example, take a look at this gif: + + + + +![Image](https://blog.apify.com/content/images/2022/02/dynamicLoading-1--1--2.gif) + +Here, it's very clear that new content is being generated. As we scroll down the Twitter feed, we can see the scroll bar jumping back up, signifying that more elements have been created using JavaScript. + +Other times, it's less obvious though. Content can appear to be static (non-dynamic) when it is not, or even sometimes the other way around. + + + +--- +title: HTML elements +description: Learn about HTML elements. What they are, their types and how to work with them in a browser environment using JavaScript. +sidebar_position: 8.6 +slug: /concepts/html-elements +--- + +An HTML element is a building block of an HTML document. It is used to represent a piece of content on a web page, such as text, images, or videos. Each element is defined by a tag, which is a set of characters enclosed in angle brackets, such as `
<p>`, `<img>`, or `<div>`. For example, here's a paragraph element:

```html
<p>This is a paragraph of text.</p>
```

You can also add **attributes** to an element to provide additional information or to control how the element behaves. For example, the `src` attribute is used to specify the source of an image, like this:

```html
<img src="image.jpg" alt="A description of the image">
```

In JavaScript, you can use the **DOM** (Document Object Model) to interact with elements on a web page. For example, you can use the [`querySelector()` method](./querying_css_selectors.md) to select an element by its [CSS selector](./css_selectors.md), like this:

```js
const myElement = document.querySelector('#myId');
```

You can also use the `getElementById()` method to select an element by its `id`, like this:

```js
const myElement = document.getElementById('myId');
```

You can also use the `getElementsByTagName()` method to select all elements of a certain type, like this:

```js
const myElements = document.getElementsByTagName('p');
```

Once you have selected an element, you can use JavaScript to change its content, style, or behavior, as shown in the sketch below.

In summary, an HTML element is a building block of a web page. It is defined by a **tag** with **attributes**, which provide additional information or control how the element behaves. You can use the **DOM** (Document Object Model) to interact with elements on a web page.
+ + +--- +title: HTTP cookies +description: Learn a bit about what cookies are, and how they are utilized in scrapers to appear logged-in, view specific data, or even avoid blocking. +sidebar_position: 8.2 +slug: /concepts/http-cookies +--- + +# HTTP cookies {#cookies} + +**Learn a bit about what cookies are, and how they are utilized in scrapers to appear logged-in, view specific data, or even avoid blocking.** + +--- + +HTTP cookies are small pieces of data sent by the server to the user's web browser, which are typically stored by the browser and used to send later requests to the same server. Cookies are usually represented as a string (if used together with a plain HTTP request) and sent with the request under the **Cookie** [header](./http_headers.md). + +## Most common uses of cookies in crawlers {#uses-in-crawlers} + +1. To make the website show data to you as if you were a logged-in user. +2. To make the website show location-specific data (works for websites where you could set a zip code or country directly on the page, but unfortunately doesn't work for some location-based ads). +3. To make the website less suspicious of the crawler and let the crawler's traffic blend in with regular user traffic. + +For local testing, we recommend using the [**EditThisCookie**](https://chrome.google.com/webstore/detail/fngmhnnpilhplaeedifhccceomclgfbg) Chrome extension. + + + +--- +title: HTTP headers +description: Understand what HTTP headers are, what they're used for, and three of the biggest differences between HTTP/1.1 and HTTP/2 headers. +sidebar_position: 8.1 +slug: /concepts/http-headers +--- + +# HTTP headers {#headers} + +**Understand what HTTP headers are, what they're used for, and three of the biggest differences between HTTP/1.1 and HTTP/2 headers.** + +--- + +[HTTP headers](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers) let the client and the server pass additional information with an HTTP request or response. Headers are represented by an object where the keys are header names. Headers can also contain certain authentication tokens. + +In general, there are 4 different paths you'll find yourself on when scraping a website and dealing with headers: + +## No headers {#no-headers} + +For some websites, you won't need to worry about modifying headers at all, as there are no checks or verifications in place. + +## Some default headers required {#needs-default-headers} + +Some websites will require certain default browser headers to work properly, such as **User-Agent** (though, this header is becoming more obsolete, as there are more sophisticated ways to detect and block a suspicious user). + +Another example of such a "default" header is **Referer**. Some e-commerce websites might share the same platform, and data is loaded through XMLHttpRequests to that platform, which would not know which data to return without knowing which exact website is requesting it. + +## Custom headers required {#needs-custom-headers} + +A custom header is a non-standard HTTP header used for a specific website. For example, an imaginary website of **cool-stuff.com** might have a header with the name **X_Cool_Stuff_Token** which is required for every single request to a product page. + +Dealing with cases like these usually isn't difficult, but can sometimes be tedious. + +## Very specific headers required {#needs-specific-headers} + +The most challenging websites to scrape are the ones that require a full set of site-specific headers to be included with the request. 
For example, not only would they potentially require proper **User-Agent** and **Referer** headers mentioned above, but also **Accept**, **Accept-Language**, **Accept-Encoding**, etc. with specific values. + +Another big one to mention is the **Cookie** header. We cover this in more detail within the [cookies](./http_cookies.md) lesson. + +You could use Chrome DevTools to inspect request headers, and [Insomnia](../tools/insomnia.md) or [Postman](../tools/postman.md) to test how the website behaves with or without specific headers. + +## HTTP/1.1 vs HTTP/2 headers {#http1-vs-http2} + +HTTP/1.1 and HTTP/2 headers have several differences. Here are the three key differences that you should be aware of: + +1. HTTP/2 headers do not include status messages. They only contain status codes. +2. Certain headers are no longer used in HTTP/2 (such as **Connection** along with a few others related to it like **Keep-Alive**). In HTTP/2, connection-specific headers are prohibited. While some browsers will ignore them, Safari and other Webkit-based browsers will outright reject any response that contains them. Easy to do by accident, and a big problem. +3. While HTTP/1.1 headers are case-insensitive and could be sent by the browsers with capitalized letters (e.g. **Accept-Encoding**, **Cache-Control**, **User-Agent**), HTTP/2 headers must be lower-cased (e.g. **accept-encoding**, **cache-control**, **user-agent**). + +> To learn more about the difference between HTTP/1.1 and HTTP/2 headers, check out [this](https://httptoolkit.com/blog/translating-http-2-into-http-1/) article + + + +--- +title: Concepts +description: Learn about some common yet tricky concepts and terms that are used frequently within the academy, as well as in the world of scraper development. +sidebar_position: 18 +category: glossary +slug: /concepts +--- + +# Concepts 🤔 {#concepts} + +**Learn about some common yet tricky concepts and terms that are used frequently within the academy, as well as in the world of scraper development.** + +--- + +You'll see some terms and concepts frequently repeated throughout various courses in the academy. Many of these concepts are common, and even fundamental in the scraping world, which makes it necessary to explain them to our course-takers; however it would be inconvenient for our readers to explain these terms each time they appear in a lesson. + +Because of this slight dilemma, and because there are no outside resources which compile all of these concepts into an educational and digestible form, we've decided to do just that. Welcome to the **Concepts** section of the Apify Academy's **Glossary**! + +> It's important to note that there is no specific order to these concepts. All of them range in their relevance and importance to your every day scraping endeavors. + + + +--- +title: Querying elements +description: Learn how to query DOM elements using CSS selectors with the document.querySelector() and document.querySelectorAll() functions. +sidebar_position: 8.5 +slug: /concepts/querying-css-selectors +--- + +`document.querySelector()` and `document.querySelectorAll()` are JavaScript functions that allow you to select elements on a web page using [CSS selectors](./css_selectors.md). + +`document.querySelector()` is used to select the first element that matches the provided [CSS selector](./css_selectors.md). It returns the first matching element or null if no matching element is found. 
+ +Here's an example of how you can use it: + +```js +const firstButton = document.querySelector('button'); +``` + +This will select the first button element on the page and store it in the variable **firstButton**. + +`document.querySelectorAll()` is used to select all elements that match the provided CSS selector. It returns a `NodeList` (a collection of elements) that can be accessed and manipulated like an array. + +Here's an example of how you can use it: + +```js +const buttons = document.querySelectorAll('button'); +``` + +This will select all button elements on the page and store them in the variable "buttons". + +Both functions can be used to access and manipulate the elements in the web page. Here's an example on how you can use it to extract the text of all buttons. + +```js +const buttons = document.querySelectorAll('button'); +const buttonTexts = buttons.forEach((button) => button.textContent); +``` + +It's important to note that when using `querySelectorAll()` in a browser environment, it returns a live `NodeList`, which means that if the DOM changes, the NodeList will also change. + + + +--- +title: Robotic process automation +description: Learn the basics of robotic process automation. Make your processes on the web and other software more efficient by automating repetitive tasks. +sidebar_position: 8.7 +slug: /concepts/robotic-process-automation +--- + +# What is robotic process automation (RPA)? {#what-is-robotic-process-automation-rpa} + +**Learn the basics of robotic process automation. Make your processes on the web and other software more efficient by automating repetitive tasks.** + +--- + +RPA allows you to create software (also known as **bots**), which can imitate your digital actions. You can program bots to perform repetitive tasks faster, more reliably and more accurately than humans. Plus, they can do these tasks all day, every day. + +## What can I use RPA for? {#what-can-i-use-rpa-for} + +You can [use](https://apify.com/use-cases/rpa) RPA to automate any repetitive task you perform using software. The tasks can range from [analyzing content](https://apify.com/jakubbalada/content-checker) to monitoring web pages for changes (such as changes in your competitors' pricing). + +Other use cases for RPA include filling forms or [uploading files](https://apify.com/lukaskrivka/google-sheets) while you get on with more important tasks. And it's not just simple tasks you can automate. How about [processing your invoices](https://apify.com/katerinahronik/toggl-invoice-download) or posting content across several marketing channels at once? + +## How does RPA work? {#how-does-rpa-work} + +In a traditional automation workflow, you + +1. Break a repetitive process down into [manageable chunks](https://kissflow.com/workflow/workflow-automation/an-8-step-checklist-to-get-your-workflow-ready-for-automation/), e.g. open website => log into website => click button "X" => download section "Y", etc. +2. Program a bot that does each of those chunks. +3. Execute the chunks of code in the right order (or in parallel). + +With the advance of [machine learning](https://en.wikipedia.org/wiki/Machine_learning), it is becoming possible to [record](https://www.nice.com/info/rpa-guide/process-recorder-function-in-rpa/) your workflows and analyze which can be automated. However, this technology is still not perfected and at times can even be less practical than the manual process. + +## Is RPA the same as web scraping? 
{#is-rpa-the-same-as-web-scraping}

While [web scraping](../../webscraping/scraping_basics_javascript/index.md) is a kind of RPA, it focuses on extracting structured data. RPA covers the other tasks you can automate in a browser - everything except extracting information.

## Additional resources {#additional-resources}

An easy-to-follow [video](https://www.youtube.com/watch?v=9URSbTOE4YI) on what RPA is.

To learn about RPA in plain English, check out [this](https://enterprisersproject.com/article/2019/5/rpa-robotic-process-automation-how-explain) article.

[This](https://www.cio.com/article/227908/what-is-rpa-robotic-process-automation-explained.html) article explains what RPA is and discusses both its advantages and disadvantages.

You might also like to check out this article on [12 Steps to Automate Workflows](https://quandarycg.com/automating-workflows/).


---
title: The Apify CLI
description: Learn about, install, and log into the Apify CLI - your best friend for interacting with the Apify platform via your terminal.
sidebar_position: 9.1
slug: /tools/apify-cli
---

# The Apify CLI {#the-apify-cli}

**Learn about, install, and log into the Apify CLI - your best friend for interacting with the Apify platform via your terminal.**

---

The [Apify CLI](/cli) helps you create, develop, build and run Apify Actors, and manage the Apify cloud platform from any computer. It can be used to automatically generate the boilerplate for different types of projects, initialize projects, remotely call Actors on the platform, and run your own projects.

## Installing {#installing}

To install the Apify CLI, you'll first need npm, which comes preinstalled with Node.js. If you haven't yet installed Node, learn how to do that [here](../../webscraping/scraping_basics_javascript/data_extraction/computer_preparation.md). Additionally, make sure you've got an Apify account, as you will need to log in to the CLI to gain access to its full potential.

Open up a terminal instance and run the following command:

```shell
npm i -g apify-cli
```

This will install the CLI via npm.

## Logging in {#logging-in}

After the CLI has finished installing, navigate to the [Apify Console](https://console.apify.com?asrc=developers_portal) and click on **Settings**. Then, within your account settings, click **Integrations**. The page should look like this:

![Integrations tab on the Apify platform](./images/settings-integrations.jpg)

> We've censored out the **User ID** in the image because it is private information which should not be shared with anyone who is not trusted. The same goes for your **Personal API Token**.

Copy the **Personal API Token** and return to your terminal, entering this command:

```shell
apify login -t YOUR_TOKEN_HERE
```

If you see a log which looks like this, you're in!

```text
Success: You are logged in to Apify as YOUR_USERNAME!
```


---
title: EditThisCookie
description: Learn how to add, delete, and modify different cookies in your browser for testing purposes using the EditThisCookie Chrome extension.
sidebar_position: 9.7
slug: /tools/edit-this-cookie
---

# What's EditThisCookie? {#what-is-it}

**Learn how to add, delete, and modify different cookies in your browser for testing purposes using the EditThisCookie Chrome extension.**

---

**EditThisCookie** is a Chrome extension to manage your browser's cookies.
It can be added through the [Chrome Web Store](https://chrome.google.com/webstore/category/extensions). After adding it to Chrome, you'll see a button with a delicious cookie icon next to any other Chrome extensions you might have installed. Clicking on it will open a pop-up window with a list of all saved cookies associated with the currently opened page domain. + +![EditThisCookie popup](./images/edit-this-cookie-popup.png) + +## Functionalities {#functions} + +At the top of the popup, there is a row of buttons. From left to right, here is an explanation for each one: + +### Delete all cookies + +Clicking this button will remove all cookies associated with the current domain. For example, if you're logged into your Apify account and delete all the cookies, the website will ask you to log in again. + +### Reset + +A refresh button. + +### Add a new cookie + +Manually add a new cookie for the current domain. + +### Import cookies + +Allows you to add cookies in bulk. For example, if you have saved some cookies inside your crawler, or someone provided you with some cookies for the purpose of testing a certain website in your browser, they can be imported and automatically applied with this button. + +### Export cookies + +Copies an array of cookies associated with the current domain to the clipboard. The cookies can then be later inspected, added to your crawler, or imported by someone else using EditThisCookie. + +### Search + +Allows you to filter through cookies by name. + +### Options + +Will open a new browser tab with a bunch of EditThisCookie options. The options page allows you to tweak a few settings such as changing the export format, but you will most likely never need to change anything there. + +![EditThisCookie options](./images/edit-this-cookie-options.png) + + + +--- +title: Tools +description: Discover a variety of tools that can be used to enhance the scraper development process, or even unlock doors to new scraping possibilities. +sidebar_position: 17 +category: glossary +slug: /tools +--- + +# Tools 🔧 {#tools} + +**Discover a variety of tools that can be used to enhance the scraper development process, or even unlock doors to new scraping possibilities.** + +--- + +Here at Apify, we've found many tools, some quite popular and well-known and some niche, which can aid any developer in their scraper development process. We've compiled some of our favorite developer tools into this short section. Each tool featured here serves a specific purpose, if not multiple purposes, which are directly relevant to Web Scraping and Web Automation. + +In any lesson in the academy where a tool which was not already discussed in the course is being used, a short lesson about the tool will be featured in the **Tools** section right here in the Apify Academy's **Glossary** and referenced with a link within the lesson. + + + +--- +title: Insomnia +description: Learn about Insomnia, a valuable tool for testing requests and proxies when building scalable web scrapers. +sidebar_position: 9.2 +slug: /tools/insomnia +--- + +# What is Insomnia {#what-is-insomnia} + +**Learn about Insomnia, a valuable tool for testing requests and proxies when building scalable web scrapers.** + +--- + +Despite its name, the [Insomnia](https://insomnia.rest/download) desktop application has absolutely nothing to do with having a lack of sleep. Rather, it is a tool to build and test APIs. 
If you've already read about [Postman](./postman.md), you already know what Insomnia can be used for, as they both practically do the same exact things. +While Insomnia shares similarities with Postman, such as the ability to send requests with specific headers, cookies, and payloads, it has a few notable differences. One key difference is Insomnia's feature to display the entire request timeline. + +Insomnia can be downloaded from its [official website](https://insomnia.rest/download), and its features can be read about in the [official documentation](https://docs.insomnia.rest/). + +## The Insomnia interface {#insomnia-interface} + +After opening the app, you'll first need to create a new request. After creating the request, you'll see an interface that looks like this: + +![Insomnia interface](./images/insomnia-interface.jpg) + +Let's break down the main sections: + +### List of requests + +You can configure multiple requests with a custom payload, headers, cookies, parameters, etc. They are automatically saved in the list of requests until deleted. + +### Address bar + +The place where you select the type of request to send (**GET**, **POST**, **PUT**, **DELETE**, etc.), specify the URI of the request and send the request with the **Send** button. + +### Request options + +Here, you can add a request payload, specify authorization parameters, add query parameters, and attach headers to the request. + +### Response + +Where the response body is displayed after the request has been sent. Like in Postman, the request can be viewed in preview mode, pretty-printed, or in its raw form. This section also has the **Headers** and **Cookies** tabs, which respectively show the request headers and cookies. + +## Request timeline {#request-timeline} + +The one feature of Insomnia that separates it from Postman is the **Timeline**. + +![Request timeline](./images/insomnia-timeline.jpg) + +This feature allows you to see information about the request that is not present in the response body. + +## Using proxies in Insomnia {#using-proxies} + +In order to use a proxy, you need to specify the proxy's parameters in Insomnia's preferences. In preferences, scroll down to the **HTTP Network Proxy** section under the **General** tab and specify the full proxy URL there: + +![Configuring a proxy](./images/insomnia-proxy.png) + +## Managing the cookies cache {#managing-cookies-cache} + +Insomnia keeps the cookies for the requests you have already sent before. This might result in you receiving a different response within your scraper from what you're receiving in Insomnia, as a necessary cookie is not present in the request sent by the scraper. To check whether or not some cookies associated with a certain request have been cached, click on the **Cookies** button at the top of the list of requests: + +![Click on the "Cookies" button](./images/insomnia-cookies.png) + +This will bring up the **Manage cookies** window, where all cached cookies can be viewed, edited, or deleted. + +![The "Manage Cookies" tab](./images/insomnia-manage-cookies.jpg) + +## Postman or Insomnia {#postman-or-insomnia} + +The application you choose to use is completely up to your personal preference, and will not affect your development workflow. If viewing timelines of the requests you send is important to you, then you should go with Insomnia; however, if that doesn't matter, choose the one that has the most intuitive interface for you. 
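Once a proxy checks out in Insomnia, you'll usually want to plug the same proxy into your scraper. Here's a minimal sketch of doing that, assuming the [got-scraping](https://github.com/apify/got-scraping) package; the proxy URL below is a placeholder for the one you verified in Insomnia's **HTTP Network Proxy** settings:

```js
import { gotScraping } from 'got-scraping';

// Placeholder proxy URL - substitute the credentials you tested in Insomnia.
const response = await gotScraping({
    url: 'https://example.com',
    proxyUrl: 'http://user:password@proxy.example.com:8000',
});

console.log(response.statusCode);
```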
+ + + +--- +title: ModHeader +description: Discover a super useful Chrome extension called ModHeader, which allows you to modify your browser's HTTP request headers. +sidebar_position: 9.5 +slug: /tools/modheader +--- + +# What is ModHeader? {#what-is-modheader} + +**Discover a super useful Chrome extension called ModHeader, which allows you to modify your browser's HTTP request headers.** + +--- + +If you read about [Postman](./postman.md), you might remember that you can use it to modify request headers before sending a request. This is great, but the main problem is that Postman can only make static requests - meaning, it is unable to load JavaScript or any [dynamic content](../concepts/dynamic_pages.md). + +[ModHeader](https://chrome.google.com/webstore/detail/idgpnmonknjnojddfkpgkljpfnnfcklj) is a Chrome extension which can be used to modify the HTTP headers of the requests you make with your browser. This means that, for example, if your scraper using a headless browser Puppeteer is being blocked due to an improper **User-Agent** header, you can use ModHeader to test the target website and quickly solve the issue. + +## The ModHeader interface {#interface} + +After you install the ModHeader extension, you should see it pinned in Chrome's task bar. When you click it, you'll see an interface like this pop up: + +![Modheader's interface](./images/modheader.jpg) + +Here, you can add headers, remove headers, and even save multiple collections of headers that you can toggle between (which are called **Profiles** within the extension itself). + +## Use cases {#use-cases} + +When scraping dynamic websites, sometimes some specific headers are required to access certain pages. The most popularly required headers are generally `User-Agent` and `referer`. ModHeader, and other tools like it, make it easy to test requests to these websites right in your browser before writing logic for your scraper. + + + +--- +title: Postman +description: Learn about Postman, a valuable tool for testing requests and proxies when building scalable web scrapers. +sidebar_position: 9.3 +slug: /tools/postman +--- + +# What is Postman? {#what-is-postman} + +**Learn about Postman, a valuable tool for testing requests and proxies when building scalable web scrapers.** + +--- + +[Postman](https://www.postman.com/) is a powerful collaboration platform for API development and testing. For scraping use-cases, it's mainly used to test requests and proxies (such as checking the response body of a raw request, without loading any additional resources such as JavaScript or CSS). This tool can do much more than that, but we will not be discussing all of its capabilities here. Postman allows us to test requests with cookies, headers, and payloads so that we can be entirely sure what the response looks like for a request URL we plan to eventually use in a scraper. + +The desktop app can be downloaded from its [official download page](https://www.postman.com/downloads/), or the web app can be used with a signup - no download required. If this is your first time working with a tool like Postman, we recommend checking out their [Getting Started guide](https://learning.postman.com/docs/introduction/overview/). 
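Once a request behaves the way you expect in Postman, carrying it over into scraper code is straightforward. Here's a rough sketch using the built-in `fetch` of Node.js 18+; every URL, header, and cookie value below is a placeholder for the values you verified in Postman:

```js
// Placeholder request mirroring one tested in Postman.
const response = await fetch('https://example.com/api/data', {
    headers: {
        'User-Agent': 'Mozilla/5.0 ...',
        'Referer': 'https://example.com/',
        'Cookie': 'session=abc123',
    },
});

console.log(response.status);
console.log(await response.text());
```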
## Understanding the interface {#understanding-the-interface}

![A basic outline of Postman's interface](./images/postman-interface.png)

The following four sections are essential to get familiar with Postman:

### Tabs

Multiple test endpoints/requests can be opened at one time, each of which will be held within its own tab.

### Address bar

The section in which you select the type of request to send, the URL of the request, and of course, send the request with the **Send Request** button.

### Request options

This is a very useful section where you can view and edit structured query parameters, as well as specify any authorization parameters, headers, or payloads.

### Response

After sending a request, the response's body will be found here, along with its cookies and headers. The response body can be viewed in various formats - **Pretty-Print**, **Raw**, or **Preview**.

## Using and testing proxies {#using-proxies}

In order to use a proxy, the proxy's server and configuration must be provided in the **Proxy** tab in Postman settings.

![Proxy configuration in Postman settings](./images/postman-proxy.png)

After configuring a proxy, the next request sent will attempt to use it. To switch off the proxy, its details don't need to be deleted - just un-tick the **Add a custom proxy configuration** option in settings to disable it.

## Managing the cookies cache {#managing-cookies}

Postman keeps a cache of the cookies from all previous responses of a certain domain, which can be a blessing, but also a curse. Sometimes, you might notice that a request is going through just fine with Postman, but that your scraper is being blocked.

More often than not in these cases, the reason is that the endpoint being reached requires a valid `cookie` header to be present when sending the request, and because of Postman's cache, it is sending a valid cookie within each request's headers, while your scraper is not. Another reason this may happen is that you are sending Postman requests without a proxy (using your local IP address), while your scraper is using a proxy that could potentially be getting blocked.

To check whether any cookies associated with a certain request are cached in Postman, click on the **Cookies** button in any opened request tab:

![Button to view the cached cookies](./images/postman-cookies-button.png)

Clicking on this button opens a **MANAGE COOKIES** window, where a list of all cached cookies per domain can be seen. If we had been previously sending multiple requests to **https://github.com/apify**, within this window we would be able to find cached cookies associated with github.com. Cookies can also be edited (to update some specific values), or deleted (to send a "clean" request without any cached data) here.

![Managing cookies in Postman with the "MANAGE COOKIES" window](./images/postman-manage-cookies.png)

### Some alternatives to Postman {#alternatives}

- [Hoppscotch](https://hoppscotch.io/)
- [Insomnia](./insomnia.md)
- [Testfully](https://testfully.io/)


---
title: Proxyman
description: Learn about Proxyman, a tool for viewing all network requests that are coming through your system. Filter by response type, by a keyword, or by application.
sidebar_position: 9.4
slug: /tools/proxyman
---

# What's Proxyman? {#what-is-proxyman}

**Learn about Proxyman, a tool for viewing all network requests that are coming through your system.
Filter by response type, by a keyword, or by application.** + +--- + +Though the name sounds very similar to [Postman](./postman.md), [**Proxyman**](https://proxyman.io/) is used for a different purpose. Rather than for manually sending and analyzing the responses of requests, Proxyman is a tool for macOS that allows you to view and analyze the HTTP/HTTPS requests that are going through your device. This is done by routing all of your requests through a proxy, which intercepts them and allows you to view data about them. Because it's just a proxy, the HTTP/HTTPS requests going through iOS devices, Android devices, and even iOS simulators can also be viewed with Proxyman. + +If you've already gone through the [**Locating and learning** lesson](../../webscraping/api_scraping/general_api_scraping/locating_and_learning.md) in the **API scraping** section, you can think of Proxyman as an advanced Network Tab, where you can see requests that you sometimes can't see in regular browser DevTools. + +## The basics {#the-basics} + +Though the application offers a whole lot of advanced features, there are only a few main features you'll be utilizing when using Proxyman for scraper development purposes. Let's open up Proxyman and take a look at some of the basic features: + +### Apps + +The **Apps** tab allows you to both view all of the applications on your machine which are sending requests, as well as filter requests based on application. + +![Apps tab in Proxyman](./images/proxyman-apps-tab.png) + +### Results + +Let's open up Safari and visit **apify.com**, then check back in Proxyman to see all of the requests Safari has made when visiting the website. + +![Results in Proxyman](./images/proxyman-results.jpg) + +We can see all of the requests related to us visiting **apify.com**. Then, by clicking a request, we can see a whole lot of information about it. The most important information for you, however, will usually be the request and response **headers** and **body**. + +![View a request](./images/proxyman-view-request.jpg) + +### Filtering + +Sometimes, there can be hundreds (or even thousands) of requests that appear in the list. Rather than spending your time rooting through all of them, you can use the plethora of filtering methods that Proxyman offers to find exactly what you are looking for. + +![Filter requests with the filter options](./images/proxyman-filter.png) + +## Alternatives {#alternatives} + +Since Proxyman is only available for macOS, it's only appropriate to list some alternatives to it that are accessible to our Windows and Linux friends: + +- [Burp Suite](https://portswigger.net/burp) +- [Charles Proxy](https://www.charlesproxy.com/documentation/installation/) +- [Fiddler](https://www.telerik.com/fiddler) + + + +--- +title: Quick JavaScript Switcher +description: Discover a handy tool for disabling JavaScript on a certain page to determine how it should be scraped. Great for detecting SPAs. +sidebar_position: 9.9 +slug: /tools/quick-javascript-switcher +--- + +# Quick JavaScript Switcher + +**Discover a handy tool for disabling JavaScript on a certain page to determine how it should be scraped. Great for detecting SPAs.** + +--- + +**Quick JavaScript Switcher** is a Chrome extension that allows you to switch on/off the JavaScript for the current page with one click. It can be added to your browser via the [Chrome Web Store](https://chrome.google.com/webstore/category/extensions). 
After adding it to Chrome, you'll see its respective button next to any other Chrome extensions you might have installed. + +If JavaScript is enabled - clicking the button will switch it off and reload the page. The next click will re-enable JavaScript and refresh the page. This extension is useful for checking whether a certain website will work without JavaScript (and thus could be parsed without using a browser with a plain HTTP request) or not. + +![JavaScript toggled on (enabled)](./images/js-on.png) + +![JavaScript toggled off (disabled)](./images/js-off.png) + + + +--- +title: SwitchyOmega +description: Discover SwitchyOmega, a Chrome extension to manage and switch between proxies, which is extremely useful when testing proxies for a scraper. +sidebar_position: 9.6 +slug: /tools/switchyomega +--- + +# What is SwitchyOmega? {#what-is-switchyomega} + +**Discover SwitchyOmega, a Chrome extension to manage and switch between proxies, which is extremely useful when testing proxies for a scraper.** + +--- + +SwitchyOmega is a Chrome extension for managing and switching between proxies which can be added in the [Chrome Webstore](https://chrome.google.com/webstore/detail/padekgcemlokbadohgkifijomclgjgif). + +After adding it to Chrome, you can see the SwitchyOmega icon somewhere amongst all your other Chrome extension icons. Clicking on it will display a menu, where you can select various different connection profiles, as well as open the extension's options. + +![The SwitchyOmega interface](./images/switchyomega.png) + +## Options {#options} + +The options page has the following: + +- General settings/interface settings (which you can keep to their default values). +- A list of proxy profiles (separate profiles can be added for different proxy groups, or for different countries for the residential proxy group, etc). +- The **New profile** button +- The main section, which shows the selected settings sub-section or selected proxy profile connection settings. + +![SwitchyOmega options page](./images/switchyomega-options.png) + +## Adding a new proxy {#adding-a-new-proxy} + +After clicking on **New profile**, you'll be greeted with a **New profile** popup, where you can give the profile a name and select the type of profile you'd like to create. To add a proxy profile, select the respective option and click **Create**. + +![Adding a proxy profile](./images/switchyomega-proxy-profile.png) + +Then, you need to fill in the proxy settings: + +![Adding proxy settings](./images/switchyomega-proxy-settings.png) + +If the proxy requires authentication, click on the lock icon and fill in the details within the popup. + +![Authenticating a proxy](./images/switchyomega-auth.png) + +Don't forget to click on **Apply changes** within the left-hand side menu under **Actions**! + +## Selecting proxy profiles {#selecting-profiles} + +And that's it! All of your proxy profiles will appear in the menu. When one is chosen, the page you are currently on will be reloaded using the selected proxy profile. + +![SwitchyOmega menu](./images/switchyomega-menu.png) + + + +--- +title: User-Agent Switcher +description: Learn how to switch your User-Agent header to different values in order to monitor how a certain site responds to the changes. 
+sidebar_position: 9.8 +slug: /tools/user-agent-switcher +--- + +# User-Agent Switcher + +**Learn how to switch your User-Agent header to different values in order to monitor how a certain site responds to the changes.** + +--- + +**User-Agent Switcher** is a Chrome extension that allows you to quickly change your **User-Agent** and see how a certain website would behave with different user agents. After adding it to Chrome, you'll see a **Chrome UA Spoofer** button in the extension icons area. Clicking on it will open up a list of various **User-Agent** groups. + +![User-Agent Switcher groups](./images/user-agent-switcher-groups.png) + +Clicking on a group will display a list of possible User-Agents to set. + +![Default available Internet Explorer agents](./images/user-agent-switcher-agents.png) + +After setting the **User-Agent**, the page will be refreshed. + +## Configuration + +The extension configuration page allows you to edit the **User-Agent** list in case you want to add a specific User-Agent that isn't already provided. You can find some other options, but most likely you will never need to modify those. + +![User-Agent Switcher configuration page](./images/user-agent-switcher-config.png) + + + +--- +title: Why a glossary? +description: Browse important web scraping concepts, tools and topics in succinct articles explaining common web development terms in a web scraping and automation context. +sidebar_position: 16 +category: glossary +slug: /glossary +--- + +# Why a glossary? {#why-a-glossary} + +**Browse important web scraping concepts, tools and topics in succinct articles explaining common web development terms in a web scraping and automation context.** + +--- + +Web scraping comes with a lot of terms that are specific to the area. Some of them are tools and libraries, like [Playwright](../webscraping/puppeteer_playwright/index.md) or Insomnia. Others are general topics that have a special place in web scraping, like headless browsers or browser fingerprints. And some topics are related to all web development, but play a special role in web scraping, such as HTTP headers and cookies. + +When writing the academy, we very early on realized that we needed a place to reference these terms, but quickly found out that the usual tutorials and guides available all over the web weren't the most ideal. The explanations were too broad and generic and did not fit the web scraping context. With the **Apify Academy** glossary, we aim to provide you with short articles and lessons that provide the necessary web scraping context for specific terms, then link to other parts of the web for further in-depth reading. + + + +--- +title: Deploying +description: Push local code to the platform, or create a new Actor on the console and integrate it with a Git repo to optionally automatically rebuild any new changes. +sidebar_position: 5 +slug: /deploying-your-code/deploying +--- + +# Deploying {#deploying} + +**Push local code to the platform, or create a new Actor on the console and integrate it with a Git repo to optionally automatically rebuild any new changes.** + +--- + +Once you've **actorified** your code, there are two ways to deploy it to the Apify platform. You can either push the code directly from your local machine onto the platform, or you can create a blank Actor in the web interface, and then integrate its source code with a GitHub repository. 
+ +## With a Git repository {#with-git-repository} + +Before we deploy our project onto the Apify platform, let's ensure that we've pushed the changes we made in the last 3 lessons into our remote GitHub repository. + +> The benefit of using this method is that any time you push to the Git repo, the code on the platform is also updated and the Actor is automatically rebuilt. Also, you don't have to use a GitHub repository - you can use GitLab or any other service you'd like. + +### Creating the Actor + +Before anything can be integrated, we've gotta create a new Actor. Let's head over to our [Apify Console](https://console.apify.com?asrc=developers_portal), navigate to the **Development** subsection and click on the **Develop new** button, then select the **Empty** template. + +![Create new button](../getting_started/images/develop-new-actor.png) + +### Changing source code location {#change-source-code} + +In the **Source** tab on the new Actor's page, we'll click the dropdown menu under **Source code** and select **Git repository**. By default, this is set to **Web IDE**. + +![Select source code location](../expert_scraping_with_apify/images/select-source-location.png) + +Now we'll paste the link to our GitHub repository into the **Git URL** text field and click **Save**. + +### Adding the webhook to the repository {#adding-repo-webhook} + +The final step is to click on **API** in the top right corner of our Actor's page: + +![API button](../expert_scraping_with_apify/images/api-button.jpg) + +And scroll through all of the links until we find the **Build Actor** API endpoint. Now we'll copy this endpoint's URL, head back over to our GitHub repository and navigate to **Settings > Webhooks > Add webhook**. The final thing to do is to paste the URL and save the webhook. + +![Adding a webhook to your GitHub repo](../../../platform/actors/development/deployment/images/ci-github-integration.png) + +That's it! The Actor should now pull its source code from the repo and automatically build. + +## Without a GitHub repository (using the Apify CLI) {#with-apify-cli} + +> If you don't yet have the Apify CLI, learn how to install it and log in by following along with [this brief lesson](../../glossary/tools/apify_cli.md) about it. + +If you're logged in to the Apify CLI, the `apify push` command can be used to push the code straight onto the Apify platform from your local machine (no GitHub repository required), where it will automatically be built for you. Prior to running this command, make sure that you have an **.actor/actor.json** file at the root of the project. If you don't already have one, you can use `apify init .` to automatically generate one for you. + +One important thing to note is that you can use a `.gitignore` file to exclude files from being pushed. When you use `apify push` without a `.gitignore`, the full folder contents will be pushed, meaning that even the **storage** and **node_modules** will be pushed. These files are unnecessary to push, as they are both generated on the platform. + +> The `apify push` command should only really be used for quickly pushing and testing Actors on the platform during development. If you are ready to make your Actor public, use a Git repository instead, as you will reap the benefits of using Git and others will be able to contribute to the project. + +## Deployed! {#deployed} + +Great! Once you've pushed your Actor to the platform, you should see it in the list of Actors under the **Actors** tab. 
If you used `apify push`, you'll have access to the **multifile editor** (discussed [here](../getting_started/creating_actors.md)).

![Deployed Actor on the Apify platform](./images/actor-page.jpg)

The next step is to test your Actor and experiment with the vast amount of features the platform has to offer.

## Wrap up {#next}

That's it! In this short section, you've learned how to take your code written in any programming language and turn it into a usable Actor that can run on the Apify platform! The next step is to start looking into the [paid Actors](/platform/actors/publishing) program, which allows you to monetize your work.


---
title: Dockerfile
description: Understand how to write a Dockerfile (Docker image blueprint) for your project so that it can be run within a Docker container on the Apify platform.
sidebar_position: 4
slug: /deploying-your-code/docker-file
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Dockerfile {#dockerfile}

**Understand how to write a Dockerfile (Docker image blueprint) for your project so that it can be run within a Docker container on the Apify platform.**

---

The **Dockerfile** is a file which gives the Apify platform (or Docker, more specifically) instructions on how to create an environment for your code to run in. Every Actor must have a Dockerfile, as Actors run in Docker containers.

> Actors on the platform are always run in Docker containers; however, they can also be run in local Docker containers. This is not common practice though, as it requires more setup and a deeper understanding of Docker. For testing, it's best to run the Actor on the local OS (this requires you to have the underlying runtime installed, such as Node.js, Python, Rust, Go, etc.).

## Base images {#base-images}

If your project doesn't already contain a Dockerfile, don't worry! Apify offers [many base images](/sdk/js/docs/guides/docker-images) that are optimized for building and running Actors on the platform, which can be found [here](https://hub.docker.com/u/apify). When using a language for which Apify doesn't provide a base image, [Docker Hub](https://hub.docker.com/) provides a ton of free Docker images for most use-cases, upon which you can create your own images.

> Tip: You can see all of Apify's Docker images [on DockerHub](https://hub.docker.com/u/apify).

At the base level, each Docker image contains a base operating system and usually also a programming language runtime (such as Node.js or Python). You can also find images with preinstalled libraries or install them yourself during the build step.

Once you find the base image you need, you can add it as the initial `FROM` statement:

```Dockerfile
FROM apify/actor-node:16
```

> For syntax highlighting in your Dockerfiles, download the [**Docker** VSCode extension](https://code.visualstudio.com/docs/containers/overview#_installation).

## Writing the file {#writing-the-file}

The rest of the Dockerfile is about copying the source code from the local filesystem into the container's filesystem, installing libraries, and setting the `CMD` instruction (which, if not set, falls back to the one defined by the parent image).

> If you are not using a base image from Apify, then you should specify how to launch the source code of your Actor with the `CMD` instruction.
+
+Here's the Dockerfile for our Node.js example project's Actor:
+
+
+
+
+```Dockerfile
+# First, specify the base Docker image.
+FROM apify/actor-node:16
+
+# Second, copy just package.json and package-lock.json since they are the only files
+# that affect npm install in the next step
+COPY package*.json ./
+
+# Install npm packages, skip optional and development dependencies to keep the
+# image small. Avoid logging too much and print the dependency tree for debugging
+RUN npm --quiet set progress=false \
+    && npm install --only=prod --no-optional \
+    && echo "Installed npm packages:" \
+    && (npm list --all || true) \
+    && echo "Node.js version:" \
+    && node --version \
+    && echo "npm version:" \
+    && npm --version
+
+# Next, copy the remaining files and directories with the source code.
+# Since we do this after npm install, quick build will be really fast
+# for simple source file changes.
+COPY . ./
+
+```
+
+
+
+
+```Dockerfile
+# First, specify the base Docker image.
+# You can also use any other image from Docker Hub.
+FROM apify/actor-python:3.9
+
+# Second, copy just requirements.txt into the Actor image,
+# since it should be the only file that affects "pip install" in the next step,
+# in order to speed up the build
+COPY requirements.txt ./
+
+# Install the packages specified in requirements.txt,
+# Print the installed Python version, pip version
+# and all installed packages with their versions for debugging
+RUN echo "Python version:" \
+    && python --version \
+    && echo "Pip version:" \
+    && pip --version \
+    && echo "Installing dependencies from requirements.txt:" \
+    && pip install -r requirements.txt \
+    && echo "All installed Python packages:" \
+    && pip freeze
+
+# Next, copy the remaining files and directories with the source code.
+# Since we do this after installing the dependencies, quick build will be really fast
+# for most source file changes.
+COPY . ./
+
+# Specify how to launch the source code of your Actor.
+# By default, the main.py file is run
+CMD python3 main.py
+
+```
+
+
+
+
+## Examples {#examples}
+
+The examples above show how to deploy Actors written in Node.js or Python, but you can use any language. For inspiration, here are a few examples for other languages: Go, Rust, Julia.
+
+
+
+
+```Dockerfile
+FROM golang:1.17.1-alpine
+
+WORKDIR /app
+COPY . .
+
+RUN go mod download
+
+RUN go build -o /example-actor
+CMD ["/example-actor"]
+
+```
+
+
+
+
+```Dockerfile
+# Image with prebuilt Rust. We use the newest 1.* version
+# https://hub.docker.com/_/rust
+FROM rust:1
+
+# We copy only the package setup so we can cache building all dependencies
+COPY Cargo* ./
+
+# We need to have a dummy main.rs file to be able to build
+RUN mkdir src && echo "fn main() {}" > src/main.rs
+
+# Build dependencies only
+# Since we do this before copying the rest of the files,
+# the dependencies will be cached by Docker, allowing fast
+# build times for new code changes
+RUN cargo build --release
+
+# Delete the dummy main.rs
+RUN rm -rf src
+
+# Copy the rest of the files
+COPY . ./
+
+# Build the source files
+RUN cargo build --release
+
+CMD ["./target/release/actor-example"]
+
+```
+
+
+
+
+```Dockerfile
+FROM julia:1.7.1-alpine
+
+WORKDIR /app
+COPY . .
+
+RUN julia install.jl
+
+CMD ["julia", "main.jl"]
+
+```
+
+
+
+
+## Next up {#next}
+
+In the [next lesson](./deploying.md), we'll push our code directly to the Apify platform, or create and integrate a new Actor on the Apify platform with our project's GitHub repository.
+
+
+
+---
+title: Deploying your code
+description: In this course, learn how to take an existing project of yours and deploy it to the Apify platform as an Actor.
+sidebar_position: 9
+category: apify platform
+slug: /deploying-your-code
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+# Deploying your code to Apify {#deploying}
+
+**In this course, learn how to take an existing project of yours and deploy it to the Apify platform as an Actor.**
+
+---
+
+This section will discuss how to use your newfound knowledge of the Apify platform and Actors from the [**Getting started**](../getting_started/index.md) section to deploy your existing project's code to the Apify platform as an Actor.
+Any program running in a Docker container can become an Apify Actor.
+
+![The deployment workflow](../../images/deployment-workflow.png)
+
+Apify provides detailed guidance on how to deploy Node.js and Python programs as Actors, but beyond that, you're not limited in which programming language you choose for your scraper.
+
+![Supported languages](../../images/supported-languages.jpg)
+
+Here are a few examples of Actors in other languages:
+
+- [Rust Actor](https://apify.com/lukaskrivka/rust-actor-example)
+- [Go Actor](https://apify.com/jirimoravcik/go-actor-example)
+- [Julia Actor](https://apify.com/jirimoravcik/julia-actor-example)
+
+## The "actorification" workflow {#workflow}
+
+Follow these four main steps to turn a piece of code into an Actor:
+
+1. Handle [accepting inputs and writing outputs](./inputs_outputs.md).
+2. Create an [input schema](./input_schema.md) **(optional)**.
+3. Add a [Dockerfile](./docker_file.md).
+4. [Deploy](./deploying.md) to the Apify platform!
+
+## Our example project
+
+For this section, we'll be turning this example project into an Actor:
+
+
+
+
+```js
+// index.js
+const addAllNumbers = (...nums) => nums.reduce((total, curr) => total + curr, 0);
+
+console.log(addAllNumbers(1, 2, 3, 4)); // -> 10
+```
+
+
+
+
+```py
+# index.py
+def add_all_numbers(nums):
+    total = 0
+
+    for num in nums:
+        total += num
+
+    return total
+
+print(add_all_numbers([1, 2, 3, 4])) # -> 10
+
+```
+
+
+
+
+> For all lessons in this section, we'll have examples for both Node.js and Python so that you can follow along in either language.
+
+
+
+## Next up {#next}
+
+In the [next lesson](./inputs_outputs.md), we'll learn how to accept input into our Actor, as well as deliver output.
+
+
+
+---
+title: Input schema
+description: Learn how to generate a user interface on the platform for your Actor's input with a single file - the INPUT_SCHEMA.json file.
+sidebar_position: 2
+slug: /deploying-your-code/input-schema
+---
+
+# Input schema {#input-schema}
+
+**Learn how to generate a user interface on the platform for your Actor's input with a single file - the INPUT_SCHEMA.json file.**
+
+---
+
+Though writing an [input schema](/platform/actors/development/actor-definition/input-schema) for an Actor is not a required step, it is a highly recommended one. The Apify platform will read the **INPUT_SCHEMA.json** file within the root of your project and generate a user interface for entering input into your Actor, which makes it significantly easier for non-developers (and even developers) to configure and understand the inputs your Actor can receive. Because of this, we'll be writing an input schema for our example Actor.
+ +> Without an input schema, the users of our Actor will have to provide the input in JSON format, which can be problematic for those who are not familiar with JSON. + +## Schema title & description {#title-and-description} + +In the root of our project, we'll create a file named **INPUT_SCHEMA.json** and start writing the first part of the schema. + +```json +{ + "title": "Adding Actor input", + "description": "Add all values in list of numbers with an arbitrary length.", + "type": "object", + "schemaVersion": 1 +} +``` + +The **title** and **description** describe what the input schema is for, and a bit about what the Actor itself does. + +## Properties {#properties} + +In order to define all of the properties our Actor is expecting, we must include them within an object with a key of **properties**. + +```json +{ + "title": "Adding Actor input", + "description": "Add all values in list of numbers with an arbitrary length.", + "type": "object", + "schemaVersion": 1, + "properties": { + "numbers": { + "title": "Number list", + "description": "The list of numbers to add up." + } + } +} +``` + +Each property's key corresponds to the name we're expecting within our code, while the **title** and **description** are what the user will see when configuring input on the platform. + +## Property types & editor types {#property-types} + +Within our new **numbers** property, there are two more fields we must specify. Firstly, we must let the platform know that we're expecting an array of numbers with the **type** field. Then, we should also instruct Apify on which UI component to render for this input property. In our case, we have an array of numbers, which means we should use the **json** editor type that we discovered in the ["array" section](/platform/actors/development/actor-definition/input-schema/specification/v1#array) of the input schema documentation. We could also use **stringList**, but then we'd have to parse out the numbers from the strings. + +```json +{ + "title": "Adding Actor input", + "description": "Add all values in list of numbers with an arbitrary length.", + "type": "object", + "schemaVersion": 1, + "properties": { + "numbers": { + "title": "Number list", + "description": "The list of numbers to add up.", + "type": "array", + "editor": "json" + } + } +} +``` + +## Required fields {#required-fields} + +The great thing about building an input schema is that it will automatically validate your inputs based on their type, maximum value, minimum value, etc. Sometimes, you want to ensure that the user will always provide input for certain fields, as they are crucial to the Actor's run. This can be done by using the **required** field and passing in the names of the fields you'd like to require. + +```json +{ + "title": "Adding Actor input", + "description": "Add all values in list of numbers with an arbitrary length.", + "type": "object", + "schemaVersion": 1, + "properties": { + "numbers": { + "title": "Number list", + "description": "The list of numbers to add up.", + "type": "array", + "editor": "json" + } + }, + "required": ["numbers"] +} +``` + +For our case, we've made the **numbers** field required, as it is crucial to our Actor's run. 
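+
+One optional nicety - not required for our Actor - is prefilling the field with example values via the **prefill** property from the input schema specification (you'll also see it used in schemas later in this course), so users see a working input right away. A sketch of just the **numbers** property with a hypothetical prefill value:
+
+```json
+{
+    "numbers": {
+        "title": "Number list",
+        "description": "The list of numbers to add up.",
+        "type": "array",
+        "editor": "json",
+        "prefill": [1, 2, 3, 4]
+    }
+}
+```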
+
+## Final thoughts {#final-thoughts}
+
+Here is what the input schema we wrote will render on the platform:
+
+![Rendered UI from input schema](./images/rendered-ui.png)
+
+Later on, we'll be building more complex input schemas, as well as discussing how to write quality input schemas that allow the user to understand the Actor and not become overwhelmed.
+
+You're not expected to memorize all of the fields that properties can take or the different editor types available, which is why it's always good to reference the [input schema documentation](/platform/actors/development/actor-definition/input-schema) when writing a schema.
+
+## Next up {#next}
+
+In the [next lesson](/platform/actors/development/actor-definition/dataset-schema), we'll learn how to generate an appealing Overview table to display our Actor's results in real time, so users can get immediate feedback about the data being extracted.
+
+
+
+---
+title: Inputs & outputs
+description: Learn to accept input into your Actor, do something with it, and then return output. Actors can be written in any language, so this concept is language agnostic.
+sidebar_position: 1
+slug: /deploying-your-code/inputs-outputs
+---
+
+# Inputs & outputs {#inputs-outputs}
+
+**Learn to accept input into your Actor, do something with it, and then return output. Actors can be written in any language, so this concept is language agnostic.**
+
+---
+
+Most projects expect some sort of input to run on, and most produce some sort of output once they've finished running. Apify provides a convenient way to handle inputs and deliver outputs.
+
+An important thing to understand regarding inputs and outputs is that they are read/written differently depending on where the Actor is running:
+
+- If your Actor is running locally, the inputs/outputs are usually provided in the filesystem, and environment variables are injected either by you, the developer, or by the Apify CLI by running the project with the `apify run` command.
+
+- While running in a Docker container on the platform, environment variables are automatically injected, and inputs & outputs are provided and modified using Apify's REST API.
+
+## A bit about storage {#about-storage}
+
+You can read/write your inputs/outputs to the [key-value store](/platform/storage/key-value-store) or to the [dataset](/platform/storage/dataset). The key-value store can be used to store any sort of unorganized/unrelated data in any format, while the data pushed to a dataset typically resembles a table with columns (fields) and rows (items). Each Actor's run is allocated both a default dataset and a default key-value store.
+
+When running locally, these storages are accessible through the **storage** folder within your project's root directory, while on the platform they are accessible via Apify's API.
+
+## Accepting input {#accepting-input}
+
+There are multiple ways to accept input into your project; which one you choose depends on the language your project is written in. If you are using Node.js for your repo's code, you can use the [`apify`](https://www.npmjs.com/package/apify) package. Otherwise, you can use the environment variables that Apify automatically sets up for you to write utility functions which read the Actor's input and return it.
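+
+A few of those environment variables will come up repeatedly. Here's a minimal Node.js sketch of the ones also used in this lesson's Python examples below:
+
+```js
+// '1' only when the Actor is running on the Apify platform.
+console.log(process.env.APIFY_IS_AT_HOME);
+
+// IDs of the run's default storages, used when calling the API directly.
+console.log(process.env.APIFY_DEFAULT_KEY_VALUE_STORE_ID);
+console.log(process.env.APIFY_DEFAULT_DATASET_ID);
+```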
+
+### Accepting input with the Apify SDK
+
+Since we're using Node.js, let's install the `apify` package by running the following command:
+
+```shell
+npm install apify
+```
+
+Now, let's import `Actor` from `apify` and use the `Actor.getInput()` function to grab our input.
+
+```js
+// index.js
+import { Actor } from 'apify';
+
+// We must initialize and exit the Actor. The rest of our code
+// goes in between these two.
+await Actor.init();
+
+const input = await Actor.getInput();
+console.log(input);
+
+await Actor.exit();
+```
+
+If we run this right now, we'll see **null** in our terminal - this is because we never provided any test input, which should live in the default key-value store. The `Actor.getInput()` function has detected that there is no **storage** folder and generated one for us.
+
+![Default key-value store filepath](./images/filepath.jpg)
+
+We'll now add an **INPUT.json** file within **storage/key_value_stores/default** to match what we're expecting in our code.
+
+```json
+{
+    "numbers": [5, 5, 5, 5]
+}
+```
+
+Then we can add our example project code from earlier. It will grab the input and use it to generate a solution which is logged to the console.
+
+```js
+// index.js
+import { Actor } from 'apify';
+
+await Actor.init();
+
+const { numbers } = await Actor.getInput();
+
+const addAllNumbers = (...nums) => nums.reduce((total, curr) => total + curr, 0);
+
+const solution = addAllNumbers(...numbers);
+
+console.log(solution);
+
+await Actor.exit();
+```
+
+Cool! When we run `node index.js`, we see **20**.
+
+### Accepting input without the Apify SDK
+
+Alternatively, when writing in a language other than JavaScript, we can create our own `get_input()` function which utilizes the Apify API when the Actor is running on the platform. For this example, we are using the [Apify Client](../getting_started/apify_client.md) for Python to access the API.
+
+```py
+# index.py
+from apify_client import ApifyClient
+from os import environ
+import json
+
+client = ApifyClient(token='YOUR_TOKEN')
+
+# If being run on the platform, the "APIFY_IS_AT_HOME" environment variable
+# will be "1". Otherwise, it will be undefined/None
+def is_on_apify():
+    return 'APIFY_IS_AT_HOME' in environ
+
+# Get the input
+def get_input():
+    if not is_on_apify():
+        with open('./apify_storage/key_value_stores/default/INPUT.json') as actor_input:
+            return json.load(actor_input)
+
+    kv_store = client.key_value_store(environ.get('APIFY_DEFAULT_KEY_VALUE_STORE_ID'))
+    return kv_store.get_record('INPUT')['value']
+
+def add_all_numbers(nums):
+    total = 0
+
+    for num in nums:
+        total += num
+
+    return total
+
+actor_input = get_input()['numbers']
+
+solution = add_all_numbers(actor_input)
+
+print(solution)
+```
+
+> For a better understanding of the API endpoints for reading and modifying key-value stores, check the [official API reference](/api/v2#/reference/key-value-stores).
+
+## Writing output {#writing-output}
+
+Similarly to reading input, you can write the Actor's output either by using the Apify SDK in Node.js or by manually writing a utility function to do so.
+
+### Writing output with the Apify SDK
+
+In the SDK, we can write to the dataset with the `Actor.pushData()` function. Let's go ahead and write the solution of the `addAllNumbers()` function to the default dataset using this function:
+
+```js
+// index.js
+
+// This is our example project code from earlier.
+// We will use the Apify input as its input.
+
+import { Actor } from 'apify';
+
+await Actor.init();
+
+const { numbers } = await Actor.getInput();
+
+const addAllNumbers = (...nums) => nums.reduce((total, curr) => total + curr, 0);
+
+const solution = addAllNumbers(...numbers);
+
+// And save its output to the default dataset
+await Actor.pushData({ solution });
+
+await Actor.exit();
+```
+
+### Writing output without the Apify SDK
+
+Just as with the custom `get_input()` utility function, you can write a custom `set_output()` function as well if you cannot use the Apify SDK.
+
+> You can read and write your output anywhere; however, it is standard practice to use a folder named **storage**.
+
+```py
+# index.py
+from apify_client import ApifyClient
+from os import environ
+import json
+
+client = ApifyClient(token='YOUR_TOKEN')
+
+def is_on_apify():
+    return 'APIFY_IS_AT_HOME' in environ
+
+def get_input():
+    if not is_on_apify():
+        with open('./apify_storage/key_value_stores/default/INPUT.json') as actor_input:
+            return json.load(actor_input)
+
+    kv_store = client.key_value_store(environ.get('APIFY_DEFAULT_KEY_VALUE_STORE_ID'))
+    return kv_store.get_record('INPUT')['value']
+
+# Push the solution to the dataset
+def set_output(data):
+    if not is_on_apify():
+        with open('./apify_storage/datasets/default/solution.json', 'w') as output:
+            return output.write(json.dumps(data, indent=2))
+
+    dataset = client.dataset(environ.get('APIFY_DEFAULT_DATASET_ID'))
+    # push_items accepts a dict or a list of dicts to append to the dataset
+    dataset.push_items([data])
+
+def add_all_numbers(nums):
+    total = 0
+
+    for num in nums:
+        total += num
+
+    return total
+
+actor_input = get_input()['numbers']
+
+solution = add_all_numbers(actor_input)
+
+set_output({ 'solution': solution })
+```
+
+## Testing locally {#testing-locally}
+
+Since we've changed our code quite a bit by wrapping it in the Apify SDK to accept inputs and return outputs, we should definitely test it locally before worrying about pushing it to the Apify platform.
+
+After running our script, there should be a single item in the default dataset that looks like this:
+
+```json
+{
+    "solution": 20
+}
+```
+
+## Next up {#next}
+
+That's it! We've now added all of the files and code necessary to convert our software into an Actor. In the [next lesson](./input_schema.md), we'll be learning how to generate a user interface for our Actor's input so that users don't have to provide the input in raw JSON format.
+
+
+
+---
+title: Dataset schema
+description: Learn how to generate an appealing Overview table interface to preview your Actor results in real time on the Apify platform.
+sidebar_position: 3
+slug: /deploying-your-code/dataset-schema
+---
+
+# Dataset schema
+
+**Learn how to generate an appealing Overview table interface to preview your Actor results in real time on the Apify platform.**
+
+---
+
+The dataset schema generates an interface that enables users to instantly preview their Actor results in real time.
+
+![Dataset Schema](../../../platform/actors/development/actor_definition/images/output-schema-example.png)
+
+In this quick tutorial, you will learn how to set up an output tab for your own Actor.
+
+## Implementation
+
+Firstly, create a `.actor` folder in the root of your Actor's source code. Then, create an `actor.json` file in this folder, so that you end up with `.actor/actor.json`.
+
+![.actor/actor.json](./images/actor-json-example.webp)
+
+Next, copy-paste the following template code into your `actor.json` file.
+
+```json
+{
+    "actorSpecification": 1,
+    "name": "___ENTER_ACTOR_NAME____",
+    "title": "___ENTER_ACTOR_TITLE____",
+    "version": "1.0.0",
+    "storages": {
+        "dataset": {
+            "actorSpecification": 1,
+            "views": {
+                "overview": {
+                    "title": "Overview",
+                    "transformation": {
+                        "fields": [
+                            "___EXAMPLE_NUMERIC_FIELD___",
+                            "___EXAMPLE_PICTURE_URL_FIELD___",
+                            "___EXAMPLE_LINK_URL_FIELD___",
+                            "___EXAMPLE_TEXT_FIELD___",
+                            "___EXAMPLE_BOOLEAN_FIELD___"
+                        ]
+                    },
+                    "display": {
+                        "component": "table",
+                        "properties": {
+                            "___EXAMPLE_NUMERIC_FIELD___": {
+                                "label": "ID",
+                                "format": "number"
+                            },
+                            "___EXAMPLE_PICTURE_URL_FIELD___": {
+                                "format": "image"
+                            },
+                            "___EXAMPLE_LINK_URL_FIELD___": {
+                                "label": "Clickable link",
+                                "format": "link"
+                            }
+                        }
+                    }
+                }
+            }
+        }
+    }
+}
+```
+
+To configure the dataset schema, replace the fields in the template with the fields relevant to your Actor.
+
+For reference, you can use the [Zappos Scraper source code](https://github.com/PerVillalva/zappos-scraper-actor/blob/main/.actor/actor.json) as an example of how the final implementation of the output tab should look in a live Actor.
+
+```json
+{
+    "actorSpecification": 1,
+    "name": "zappos-scraper",
+    "title": "Zappos Scraper",
+    "description": "",
+    "version": "1.0.0",
+    "storages": {
+        "dataset": {
+            "actorSpecification": 1,
+            "title": "Zappos.com Dataset",
+            "description": "",
+            "views": {
+                "products": {
+                    "title": "Overview",
+                    "description": "It can take about one minute until the first results are available.",
+                    "transformation": {
+                        "fields": [
+                            "imgUrl",
+                            "brand",
+                            "name",
+                            "SKU",
+                            "inStock",
+                            "onSale",
+                            "price",
+                            "url"
+                        ]
+                    },
+                    "display": {
+                        "component": "table",
+                        "properties": {
+                            "imgUrl": {
+                                "label": "Product image",
+                                "format": "image"
+                            },
+                            "url": {
+                                "label": "Link",
+                                "format": "link"
+                            },
+                            "brand": {
+                                "format": "text"
+                            },
+                            "name": {
+                                "format": "text"
+                            },
+                            "SKU": {
+                                "format": "text"
+                            },
+                            "inStock": {
+                                "format": "boolean"
+                            },
+                            "onSale": {
+                                "format": "boolean"
+                            },
+                            "price": {
+                                "format": "text"
+                            }
+                        }
+                    }
+                }
+            }
+        }
+    }
+}
+```
+
+Note that the fields specified in the dataset schema should match the object keys of your resulting dataset.
+
+Also, if your desired label has the same name as the defined object key, then you don't need to specify a label name. The schema will, by default, show a capitalized version of the key and even split camel case into separate words and capitalize all of them.
+
+The matching object for the Zappos Scraper shown in the example above will look something like this:
+
+```js
+const results = {
+    url: request.loadedUrl,
+    imgUrl: $('#stage button[data-media="image"] img[itemprop="image"]').attr('src'),
+    brand: $('span[itemprop="brand"]').text().trim(),
+    name: $('meta[itemprop="name"]').attr('content'),
+    SKU: $('*[itemprop~="sku"]').text().trim(),
+    inStock: !request.url.includes('oosRedirected=true'),
+    onSale: !$('div[itemprop="offers"]').text().includes('OFF'),
+    price: $('span[itemprop="price"]').text(),
+};
+```
+
+## Final result {#final-result}
+
+Great! Now that everything is set up, it's time to run the Actor and admire its brand-new output tab.
+
+> Need some extra guidance? Visit the [dataset schema documentation](/platform/actors/development/actor-definition/dataset-schema) for more detailed information about how to implement this feature.
+
+A few seconds after running the Actor, you should see its results displayed in the `Overview` table.
+
+![Output table overview](./images/output-schema-final-example.webp)
+
+## Next up {#next}
+
+In the [next lesson](./docker_file.md), we'll learn about a very important file that is required for our project to run on the Apify platform - the Dockerfile.
+
+
+
+---
+title: V - Handling migrations
+description: Get real-world experience of maintaining a stateful object stored in memory, which will be persisted through migrations and even graceful aborts.
+sidebar_position: 5
+slug: /expert-scraping-with-apify/solutions/handling-migrations
+---
+
+# Handling migrations {#handling-migrations}
+
+**Get real-world experience of maintaining a stateful object stored in memory, which will be persisted through migrations and even graceful aborts.**
+
+---
+
+Let's first head into our **demo-actor** and create a new file named **asinTracker.js** in the **src** folder. Within this file, we are going to build a utility class which will allow us to store, modify, persist, and log our tracked ASIN data.
+
+Here's the skeleton of our class:
+
+```js
+// asinTracker.js
+class ASINTracker {
+    constructor() {
+        this.state = {};
+
+        // Log the state to the console every ten
+        // seconds
+        setInterval(() => console.log(this.state), 10000);
+    }
+
+    // Add an offer to the ASIN's offer count
+    // If the ASIN doesn't exist yet, initialize its count to 0
+    incrementASIN(asin) {
+        if (this.state[asin] === undefined) {
+            this.state[asin] = 0;
+            return;
+        }
+
+        this.state[asin] += 1;
+    }
+}
+
+// It is only a utility class, so we will immediately
+// create an instance of it and export that. We only
+// need one instance for our use case.
+export default new ASINTracker();
+```
+
+Multiple techniques exist for storing data in memory; however, this is the most modular way, as all state-persistence and modification logic will be held in this file.
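+
+To make that initialize-to-zero behavior concrete, here's a quick sketch of how the singleton behaves (the ASIN below is a made-up placeholder):
+
+```js
+import tracker from './asinTracker';
+
+// The first call for an ASIN only initializes its count to 0;
+// each subsequent call adds 1.
+tracker.incrementASIN('B000000000'); // state -> { B000000000: 0 }
+tracker.incrementASIN('B000000000'); // state -> { B000000000: 1 }
+tracker.incrementASIN('B000000000'); // state -> { B000000000: 2 }
+```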
+
+Here is our updated **routes.js** file, which is now utilizing this utility class to track the number of offers for each product ASIN:
+
+```js
+// routes.js
+import { createCheerioRouter } from '@crawlee/cheerio';
+import { BASE_URL, OFFERS_URL, labels } from './constants';
+import tracker from './asinTracker';
+import { dataset } from './main.js';
+
+export const router = createCheerioRouter();
+
+router.addHandler(labels.START, async ({ $, crawler, request }) => {
+    const { keyword } = request.userData;
+
+    const products = $('div > div[data-asin]:not([data-asin=""])');
+
+    for (const product of products) {
+        const element = $(product);
+        const titleElement = $(element.find('.a-text-normal[href]'));
+
+        const url = `${BASE_URL}${titleElement.attr('href')}`;
+
+        // For each product, add it to the ASIN tracker
+        // and initialize its collected offers count to 0
+        tracker.incrementASIN(element.attr('data-asin'));
+
+        await crawler.addRequests([{
+            url,
+            label: labels.PRODUCT,
+            userData: {
+                data: {
+                    title: titleElement.first().text().trim(),
+                    asin: element.attr('data-asin'),
+                    itemUrl: url,
+                    keyword,
+                },
+            },
+        }]);
+    }
+});
+
+router.addHandler(labels.PRODUCT, async ({ $, crawler, request }) => {
+    const { data } = request.userData;
+
+    const element = $('div#productDescription');
+
+    await crawler.addRequests([{
+        url: OFFERS_URL(data.asin),
+        label: labels.OFFERS,
+        userData: {
+            data: {
+                ...data,
+                description: element.text().trim(),
+            },
+        },
+    }]);
+});
+
+router.addHandler(labels.OFFERS, async ({ $, request }) => {
+    const { data } = request.userData;
+
+    const { asin } = data;
+
+    for (const offer of $('#aod-offer')) {
+        // For each offer, add 1 to the ASIN's
+        // offer count
+        tracker.incrementASIN(asin);
+
+        const element = $(offer);
+
+        await dataset.pushData({
+            ...data,
+            sellerName: element.find('div[id*="soldBy"] a[aria-label]').text().trim(),
+            offer: element.find('.a-price .a-offscreen').text().trim(),
+        });
+    }
+});
+```
+
+## Persisting state {#persisting-state}
+
+The **persistState** event is automatically fired (by default) every 60 seconds by the Apify SDK while the Actor is running, and is also fired when the **migrating** event occurs.
+
+In order to persist our ASIN tracker object, let's use the `Actor.on` function to listen for the **persistState** event and store it in the key-value store each time it is emitted.
+
+```js
+// asinTracker.js
+import { Actor } from 'apify';
+// We've updated our constants.js file to include the name
+// of this new key in the key-value store
+import { ASIN_TRACKER } from './constants';
+
+class ASINTracker {
+    constructor() {
+        this.state = {};
+
+        Actor.on('persistState', async () => {
+            await Actor.setValue(ASIN_TRACKER, this.state);
+        });
+
+        setInterval(() => console.log(this.state), 10000);
+    }
+
+    incrementASIN(asin) {
+        if (this.state[asin] === undefined) {
+            this.state[asin] = 0;
+            return;
+        }
+
+        this.state[asin] += 1;
+    }
+}
+
+export default new ASINTracker();
+```
+
+## Handling resurrections {#handling-resurrections}
+
+Great! Now our state will be persisted every 60 seconds in the key-value store. However, we're not done. Let's say that the Actor migrates and is resurrected. We never actually update the `state` variable of our `ASINTracker` class with the state stored in the key-value store, so as our code currently stands, we still don't support state-persistence on migrations.
+
+In order to fix this, let's create a method called `initialize` which will be called at the very beginning of the Actor's run, and which will check the key-value store for a previous state under the key **ASIN-TRACKER**. If a previous state does live there, then it will update the class's `state` variable with the value read from the key-value store:
+
+```js
+// asinTracker.js
+import { Actor } from 'apify';
+import { ASIN_TRACKER } from './constants';
+
+class ASINTracker {
+    constructor() {
+        this.state = {};
+
+        Actor.on('persistState', async () => {
+            await Actor.setValue(ASIN_TRACKER, this.state);
+        });
+
+        setInterval(() => console.log(this.state), 10000);
+    }
+
+    async initialize() {
+        // Read the data from the key-value store. If it
+        // doesn't exist, it will be undefined
+        const data = await Actor.getValue(ASIN_TRACKER);
+
+        // If the data does exist, replace the current state
+        // (initialized as an empty object) with the data
+        if (data) this.state = data;
+    }
+
+    incrementASIN(asin) {
+        if (this.state[asin] === undefined) {
+            this.state[asin] = 0;
+            return;
+        }
+
+        this.state[asin] += 1;
+    }
+}
+
+export default new ASINTracker();
+```
+
+We'll now call this method at the top level of the **main.js** file to ensure it runs as soon as the Actor starts up (right after `Actor.init()`):
+
+```js
+// main.js
+
+// ...
+import tracker from './asinTracker';
+
+// The Actor.init() function should be executed before
+// the tracker's initialization
+await Actor.init();
+
+await tracker.initialize();
+// ...
+```
+
+That's everything! Now, even if the Actor migrates (or is gracefully aborted and then resurrected), this `state` object will always be persisted.
+
+## Quiz answers 📝 {#quiz-answers}
+
+**Q: Actors have an option in the Settings tab to Restart on error. Would you use this feature for regular Actors? When would you use this feature?**
+
+**A:** It's not advisable to use this option by default. If an Actor fails, there's usually a reason, which needs to be thought through first - meaning that the failure edge case should be handled properly when the Actor is resurrected, and the state should be persisted beforehand.
+
+**Q: Migrations happen randomly, but by [aborting gracefully](/platform/actors/running/runs-and-builds#aborting-runs), you can simulate a similar situation. Try this out on the platform and observe what happens. What changes occur, and what remains the same for the restarted Actor's run?**
+
+**A:** After aborting or throwing an error mid-process, the run picks up right where it left off upon resurrection.
+
+**Q: Why don't you (usually) need to add any special migration handling code for a standard crawling/scraping Actor? Are there any features in Crawlee or Apify SDK that handle this under the hood?**
+
+**A:** Because the Apify SDK handles all of the migration handling code for us. If you want to add custom migration-handling code, you can use `Actor.events` to listen for the `migrating` or `persistState` events and save the current state in the key-value store (or elsewhere).
+
+**Q: How can you intercept the migration event? How much time do you have after this event happens and before the Actor migrates?**
+
+**A:** By using the `Actor.on` function. You have a maximum of a few seconds before shutdown after the `migrating` event has been fired.
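+
+For illustration, here's a minimal sketch of intercepting the event - the `state` object and the **STATE** key below are placeholders for whatever your Actor actually tracks:
+
+```js
+import { Actor } from 'apify';
+
+await Actor.init();
+
+const state = { itemsScraped: 0 }; // hypothetical run state
+
+// Persist the state the moment the platform announces a migration -
+// only a few seconds remain before the process is shut down.
+Actor.on('migrating', async () => {
+    await Actor.setValue('STATE', state);
+});
+```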
+
+**Q: When would you persist data to the default key-value store instead of to a named key-value store?**
+
+**A:** Persisting data to the default key-value store would help when handling an Actor's run state or when storing metadata about the run (such as results, miscellaneous files, or logs). Using a named key-value store allows you to persist data at the account level and work with it across multiple Actor runs.
+
+## Wrap up {#wrap-up}
+
+In this activity, we learned how to persist custom values on an interval, as well as after Actor migrations, by using the `persistState` event and the key-value store. With this knowledge, you can safely increase your Actor's performance by storing data in variables and then pushing it to the dataset periodically or at the end of the Actor's run, as opposed to pushing data immediately after it's been collected.
+
+One important thing to note is that this workflow can be used to replace the usage of `userData` to pass data between requests, as it allows for the creation of a "global store" which all requests have access to at any time.
+
+
+
+---
+title: Solutions
+description: View all of the solutions for all of the activities and tasks of this course. Please try to complete each task on your own before reading the solution!
+sidebar_position: 6.7
+slug: /expert-scraping-with-apify/solutions
+---
+
+# Solutions
+
+**View all of the solutions for all of the activities and tasks of this course. Please try to complete each task on your own before reading the solution!**
+
+---
+
+The final section of each lesson in this course will be a task which you, as the course-taker, are expected to complete before moving on to the next lesson. Completing and understanding each task plays an important role in your ability to continue through the course.
+
+If you ever get stuck, or if you feel like your solution could be more optimal, you can always refer to the **Solutions** section of the course. Each solution will have all of the code and explanations needed to understand it.
+
+**Please** try to do each task **on your own** prior to checking out the solution!
+
+
+
+---
+title: I - Integrating webhooks
+description: Learn how to integrate webhooks into your Actors. Webhooks are a super powerful tool, and can be used to do almost anything!
+sidebar_position: 1
+slug: /expert-scraping-with-apify/solutions/integrating-webhooks
+---
+
+# Integrating webhooks {#integrating-webhooks}
+
+**Learn how to integrate webhooks into your Actors. Webhooks are a super powerful tool, and can be used to do almost anything!**
+
+---
+
+In this lesson, we'll be writing a new Actor and integrating it with our beloved Amazon scraping Actor. First, we'll navigate to the same directory where our **demo-actor** folder lives, and run `apify create filter-actor` _(once again, you can name the Actor whatever you want, but for this lesson, we'll be calling the new Actor **filter-actor**)_. When prompted for which type of boilerplate to start out with, select **Empty**.
+
+![Selecting an empty template to start with](./images/select-empty.jpg)
+
+Cool! Now, we're ready to get started.
+
+## Building the new Actor {#building-the-new-actor}
+
+First of all, we should clear out any of the boilerplate code within **main.js** to get a clean slate:
+
+```js
+// main.js
+import { Actor } from 'apify';
+
+await Actor.init();
+
+// ...
+
+await Actor.exit();
+```
+
+We'll be passing the ID of the Amazon Actor's default dataset along to the new Actor, so we can expect that as an input:
+
+```js
+const { datasetId } = await Actor.getInput();
+const dataset = await Actor.openDataset(datasetId);
+// ...
+```
+
+> Tip: You will need to use the `forceCloud` option - `Actor.openDataset(datasetId, { forceCloud: true });` - to open a dataset from platform storage while running the Actor locally.
+
+Next, we'll grab hold of the dataset's items with the `dataset.getData()` function:
+
+```js
+const { items } = await dataset.getData();
+```
+
+While several methods can achieve the goal output of this Actor, using [`Array.reduce()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/reduce) is the most concise approach:
+
+```js
+const filtered = items.reduce((acc, curr) => {
+    // Grab the price of the item matching our current
+    // item's ASIN in the map. If it doesn't exist, set
+    // "prevPrice" to null
+    const prevPrice = acc?.[curr.asin] ? +acc[curr.asin].offer.slice(1) : null;
+
+    // Grab the price of our current offer
+    const price = +curr.offer.slice(1);
+
+    // If the item doesn't yet exist in the map, add it.
+    // Or, if the current offer's price is less than the
+    // saved one, replace the saved one
+    if (!acc[curr.asin] || prevPrice > price) acc[curr.asin] = curr;
+
+    // Return the map
+    return acc;
+}, {});
+```
+
+The result should be an array, so we can take the map we just created and push an array of its values to the Actor's default dataset:
+
+```js
+await Actor.pushData(Object.values(filtered));
+```
+
+Our final code looks like this:
+
+```js
+import { Actor } from 'apify';
+
+await Actor.init();
+
+const { datasetId } = await Actor.getInput();
+const dataset = await Actor.openDataset(datasetId);
+
+const { items } = await dataset.getData();
+
+const filtered = items.reduce((acc, curr) => {
+    const prevPrice = acc?.[curr.asin] ? +acc[curr.asin].offer.slice(1) : null;
+    const price = +curr.offer.slice(1);
+
+    if (!acc[curr.asin] || prevPrice > price) acc[curr.asin] = curr;
+
+    return acc;
+}, {});
+
+await Actor.pushData(Object.values(filtered));
+
+await Actor.exit();
+```
+
+Cool! But **wait**, don't forget to configure the **INPUT_SCHEMA.json** file as well! This step isn't strictly necessary, as we'll be calling the Actor through Apify's API within a webhook, but it's still good to get into the habit of writing quality input schemas that describe the input values your Actors are expecting.
+
+```json
+{
+    "title": "Amazon Filter Actor",
+    "type": "object",
+    "schemaVersion": 1,
+    "properties": {
+        "datasetId": {
+            "title": "Dataset ID",
+            "type": "string",
+            "description": "Enter the ID of the dataset.",
+            "editor": "textfield"
+        }
+    },
+    "required": ["datasetId"]
+}
+```
+
+Now we're done, and we can push it up to the Apify platform with the `apify push` command.
+
+## Setting up the webhook {#setting-up-the-webhook}
+
+Since we'll be calling the Actor via the [Apify API](/academy/api/run-actor-and-retrieve-data-via-api), we'll need to grab hold of the ID of the Actor we just created and pushed to the platform. The ID is always accessible through the **Settings** page of the Actor.
+
+![Actor ID in Actor settings](./images/actor-settings.jpg)
+
+With this `actorId` and our `token`, which is retrievable through **Settings > Integrations** on the Apify Console, we can construct a link which will call the Actor:
+
+```text
+https://api.apify.com/v2/acts/Yk1bieximsduYDydP/runs?token=YOUR_TOKEN_HERE
+```
+
+We can also use our username and the name of the Actor like this:
+
+```text
+https://api.apify.com/v2/acts/USERNAME~filter-actor/runs?token=YOUR_TOKEN_HERE
+```
+
+Whichever one you choose is entirely up to your preference.
+
+Next, within the Amazon scraping Actor, we will click the **Integrations** tab and choose **Webhook**, then fill out the details to look like this:
+
+![Configuring a webhook](./images/adding-webhook.jpg)
+
+We have chosen to run the webhook once the Actor has succeeded, which means that its default dataset will surely be populated. Since the filtering Actor is expecting the default dataset ID of the Amazon Actor, we use the `resource` variable to grab hold of the `defaultDatasetId`.
+
+Click **Save**, then run the Amazon **demo-actor** again.
+
+## Making sure it worked {#checking-the-webhook}
+
+If everything worked, then at the end of the **demo-actor**'s run, we should see this within the **Integrations** tab:
+
+![Webhook succeeded](./images/webhook-succeeded.png)
+
+Additionally, we should be able to see that our **filter-actor** was run, and have access to its dataset:
+
+![Dataset preview](./images/dataset-preview.png)
+
+## Quiz answers 📝 {#quiz-answers}
+
+**Q: How do you allocate more CPU for an Actor's run?**
+
+**A:** On the platform, more memory can be allocated in the Actor's input configuration, and the default allocated CPU can be changed in the Actor's **Settings** tab. When running locally, you can use the **APIFY_MEMORY_MBYTES** environment variable to set the allocated memory, which in turn determines the CPU share - 4 GB is equal to 1 CPU core on the Apify platform.
+
+**Q: Within itself, can you get the exact time that an Actor was started?**
+
+**A:** Yes. The time the Actor was started can be retrieved through the `startedAt` property from the `Actor.getEnv()` function, or directly from `process.env.APIFY_STARTED_AT`.
+
+**Q: What are the types of default storages connected to an Actor's run?**
+
+**A:** Every Actor's run is given a default key-value store and a default dataset. The default key-value store holds the `INPUT` and `OUTPUT` keys by default. A default request queue is also created for the run.
+
+**Q: Can you change the allocated memory of an Actor while it's running?**
+
+**A:** Not while it's running. You'd need to abort the run and start a new one. However, there is an option to gracefully ("soft") abort an Actor and then resurrect the run with a different memory configuration.
+
+**Q: How can you run an Actor with Puppeteer on the Apify platform with headless mode set to `false`?**
+
+**A:** This can be done by using the `actor-node-puppeteer-chrome` Docker image and making sure that `launchContext.launchOptions.headless` in `PuppeteerCrawlerOptions` is set to `false`.
+
+## Wrap up {#wrap-up}
+
+See that?! Integrating webhooks is a piece of cake on the Apify platform! You'll soon discover that the platform abstracts away a lot of complex things and allows you to focus on what's most important - developing and releasing Actors.
+
+
+
+---
+title: II - Managing source
+description: View in-depth answers for all three of the quiz questions that were provided in the corresponding lesson about managing source code.
+sidebar_position: 2
+slug: /expert-scraping-with-apify/solutions/managing-source
+---
+
+# Managing source
+
+**View in-depth answers for all three of the quiz questions that were provided in the corresponding lesson about managing source code.**
+
+---
+
+In the lesson corresponding to this solution, we discussed an extremely important topic: source code management. Though we solved the task right in the lesson, we've still included the quiz answers here.
+
+## Quiz answers {#quiz-answers}
+
+**Q: Do you have to rebuild an Actor each time the source code is changed?**
+
+**A:** Yes. It needs to be built into an image, saved in a registry, and then run in a container.
+
+**Q: In Git, what is the difference between pushing changes and making a pull request?**
+
+**A:** Pushing uploads the commits from your local branch to the corresponding remote branch. Changes are usually pushed to a branch parallel to the one you eventually want to merge them into.
+
+When creating a pull request, the code is meant to be reviewed, or at least pass all the test suites, before being merged into the target branch.
+
+**Q: Based on your knowledge and experience, is the `apify push` command worth using (in your opinion)?**
+
+**A:** The `apify push` command can sometimes be useful when testing ideas; however, it is generally better to use the GitHub integration rather than pushing directly to the platform.
+
+
+
+---
+title: VI - Rotating proxies/sessions
+description: Learn firsthand how to rotate proxies and sessions in order to avoid the majority of the most common anti-scraping protections.
+sidebar_position: 6
+slug: /expert-scraping-with-apify/solutions/rotating-proxies
+---
+
+# Rotating proxies/sessions {#rotating-proxy-sessions}
+
+**Learn firsthand how to rotate proxies and sessions in order to avoid the majority of the most common anti-scraping protections.**
+
+---
+
+If you take a look at our current code for the Amazon scraping Actor, you might notice this snippet:
+
+```js
+const proxyConfiguration = await Actor.createProxyConfiguration({
+    groups: ['RESIDENTIAL'],
+});
+```
+
+We didn't provide much explanation for this initially, as it was not directly relevant to the lesson at hand. When you [create a **ProxyConfiguration**](../../../webscraping/anti_scraping/mitigation/using_proxies.md) and pass it to a crawler, Crawlee will make the crawler automatically rotate through the proxies. This entire time, we've been using the **RESIDENTIAL** proxy group to avoid being blocked by Amazon.
+
+> Go ahead and try commenting out the proxy configuration code and then running the scraper. What happens?
+
+In order to rotate sessions, we must utilize the [**SessionPool**](https://crawlee.dev/api/core/class/SessionPool), which we've already been using by setting the **useSessionPool** option in our crawler's configuration to **true**. The SessionPool advances the concept of proxy rotation by tying proxies to user-like sessions and rotating those instead. In addition to a proxy, each user-like session has cookies attached to it (and potentially a browser fingerprint as well).
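+
+To see how sessions tie to proxies, here's a small sketch using Crawlee's `proxyConfiguration.newUrl()` function (the session IDs below are arbitrary placeholders):
+
+```js
+import { Actor } from 'apify';
+
+await Actor.init();
+
+const proxyConfiguration = await Actor.createProxyConfiguration({
+    groups: ['RESIDENTIAL'],
+});
+
+// The same session ID resolves to the same proxy URL, so a user-like
+// session keeps its IP address until the session is retired.
+console.log(await proxyConfiguration.newUrl('session_1'));
+console.log(await proxyConfiguration.newUrl('session_1')); // identical URL
+console.log(await proxyConfiguration.newUrl('session_2')); // a different one
+
+await Actor.exit();
+```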
+
+## Configuring SessionPool {#configuring-session-pool}
+
+Let's go ahead and add a **sessionPoolOptions** key to our crawler's configuration so that we can modify the default settings:
+
+```js
+const crawler = new CheerioCrawler({
+    requestList,
+    requestQueue,
+    proxyConfiguration,
+    useSessionPool: true,
+    // This is where our session pool
+    // configuration lives
+    sessionPoolOptions: {
+        // We can add options for each
+        // session created by the session
+        // pool here
+        sessionOptions: {
+
+        },
+    },
+    maxConcurrency: 50,
+    // ...
+});
+```
+
+Now, we'll use the **maxUsageCount** key to force each session to be thrown away after 5 uses and **maxErrorScore** to trash a session once it receives an error.
+
+```js
+const crawler = new CheerioCrawler({
+    requestList,
+    requestQueue,
+    proxyConfiguration,
+    useSessionPool: true,
+    sessionPoolOptions: {
+        sessionOptions: {
+            maxUsageCount: 5,
+            maxErrorScore: 1,
+        },
+    },
+    maxConcurrency: 50,
+    // ...
+});
+```
+
+And that's it! We've successfully configured the session pool to match the task's requirements.
+
+## Limiting proxy location {#limiting-proxy-location}
+
+The final requirement was to use proxies only from the US. Back in our **ProxyConfiguration**, we need to add the **countryCode** key and set it to **US**:
+
+```js
+const proxyConfiguration = await Actor.createProxyConfiguration({
+    groups: ['RESIDENTIAL'],
+    countryCode: 'US',
+});
+```
+
+## Quiz answers {#quiz-answers}
+
+**Q: What are the different types of proxies that Apify proxy offers? What are the main differences between them?**
+
+**A:** Datacenter, residential, and Google SERP proxies, each with sub-groups. Datacenter proxies are fast and cheap but have a higher chance of being blocked on certain sites in comparison to residential proxies, which are IP addresses located in homes and offices around the world. Google SERP proxies are specifically for Google.
+
+**Q: Which proxy groups do users get on the free plan? Can they access the proxy from their computer?**
+
+**A:** All users have access to the **BUYPROXIES94952**, **GOOGLE_SERP**, and **RESIDENTIAL** groups. Free users cannot access the proxy from outside the Apify platform (paying users can).
+
+**Q: How can you prevent an error from occurring if one of the proxy groups that a user has is removed? What are the best practices for these scenarios?**
+
+**A:** By making the proxy the scraper uses configurable by the user through the Actor's input. That way, they can switch proxies if the Actor stops working due to proxy-related issues. It can also be done by using the **AUTO** proxy instead of specific groups.
+
+**Q: Does it make sense to rotate proxies when you are logged into a website?**
+
+**A:** No, because most websites tie an IP address to a session. If you start making requests with cookies used with a different IP address, the website might see it as unusual activity and either block the scraper or automatically log out.
+
+**Q: Construct a proxy URL that will select proxies only from the US.**
+
+**A:** `http://country-US:@proxy.apify.com:8000`
+
+**Q: What do you need to do to rotate a proxy (one proxy usually has one IP)? How does this differ for CheerioCrawler and PuppeteerCrawler?**
+
+**A:** Making a new request with the proxy endpoint above will automatically rotate it. Sessions can also be used to automatically do this. While proxy rotation is fairly straightforward for Cheerio, it's more complex in Puppeteer, as you have to retire the browser each time a new proxy is rotated in.
The SessionPool will automatically retire a browser when a session is retired. Sessions can be manually retired with `session.retire()`.
+
+**Q: Name a few different ways a website can prevent you from scraping it.**
+
+**A:** IP detection and rate-limiting, browser/fingerprint detection, user behavior tracking, etc.
+
+## Wrap up {#wrap-up}
+
+In this solution, you learned one of the most important concepts in web scraping - proxy/session rotation. With your newfound knowledge of the SessionPool, you'll be (practically) unstoppable!
+
+
+
+---
+title: VII - Saving run stats
+description: Implement the saving of general statistics about an Actor's run, as well as adding request-specific statistics to dataset items.
+sidebar_position: 7
+slug: /expert-scraping-with-apify/solutions/saving-stats
+---
+
+# Saving run stats {#saving-stats}
+
+**Implement the saving of general statistics about an Actor's run, as well as adding request-specific statistics to dataset items.**
+
+---
+
+The code in this solution will be similar to what we already did in the **Handling migrations** solution; however, we'll be storing and logging different data. First, let's create a new file called **Stats.js** and write a utility class for storing our run stats:
+
+```js
+import { Actor } from 'apify';
+
+class Stats {
+    constructor() {
+        this.state = {
+            errors: {},
+            totalSaved: 0,
+        };
+    }
+
+    async initialize() {
+        const data = await Actor.getValue('STATS');
+
+        if (data) this.state = data;
+
+        Actor.on('persistState', async () => {
+            await Actor.setValue('STATS', this.state);
+        });
+
+        setInterval(() => console.log(this.state), 10000);
+    }
+
+    addError(url, errorMessage) {
+        if (!this.state.errors?.[url]) this.state.errors[url] = [];
+        this.state.errors[url].push(errorMessage);
+    }
+
+    success() {
+        this.state.totalSaved += 1;
+    }
+}
+
+export default new Stats();
+```
+
+Cool, very similar to the **ASINTracker** class we wrote earlier. We'll now import **Stats** into our **main.js** file and initialize it along with the ASIN tracker:
+
+```js
+// ...
+import Stats from './Stats.js';
+
+await Actor.init();
+await asinTracker.initialize();
+await Stats.initialize();
+// ...
+```
+
+## Tracking errors {#tracking-errors}
+
+In order to keep track of errors, we must write a new function within the crawler's configuration called **errorHandler**. This function receives the crawling context (including the **Request** object, as well as information about the session and proxy which were used for the request), along with the **Error** which occurred.
+
+```js
+const crawler = new CheerioCrawler({
+    proxyConfiguration,
+    useSessionPool: true,
+    sessionPoolOptions: {
+        persistStateKey: 'AMAZON-SESSIONS',
+        sessionOptions: {
+            maxUsageCount: 5,
+            maxErrorScore: 1,
+        },
+    },
+    maxConcurrency: 50,
+    requestHandler: router,
+    // Handle all failed requests
+    errorHandler: async ({ request }, error) => {
+        // Add an error for this url to our error tracker
+        Stats.addError(request.url, error?.message);
+    },
+});
+```
+
+## Tracking total saved {#tracking-total-saved}
+
+Now, we'll increment our **totalSaved** count for every offer added to the dataset.
+
+```js
+router.addHandler(labels.OFFERS, async ({ $, request }) => {
+    const { data } = request.userData;
+
+    const { asin } = data;
+
+    for (const offer of $('#aod-offer')) {
+        tracker.incrementASIN(asin);
+        // Add 1 to totalSaved for every offer
+        Stats.success();
+
+        const element = $(offer);
+
+        await dataset.pushData({
+            ...data,
+            sellerName: element.find('div[id*="soldBy"] a[aria-label]').text().trim(),
+            offer: element.find('.a-price .a-offscreen').text().trim(),
+        });
+    }
+});
+```
+
+## Saving stats with dataset items {#saving-stats-with-dataset-items}
+
+Still in the **OFFERS** handler, we need to add a few extra keys to the items which are pushed to the dataset. Luckily, all of the data required by the task is accessible in the context object.
+
+```js
+router.addHandler(labels.OFFERS, async ({ $, request, crawler }) => {
+    const { data } = request.userData;
+
+    const { asin } = data;
+
+    for (const offer of $('#aod-offer')) {
+        tracker.incrementASIN(asin);
+        // Add 1 to totalSaved for every offer
+        Stats.success();
+
+        const element = $(offer);
+
+        await dataset.pushData({
+            ...data,
+            sellerName: element.find('div[id*="soldBy"] a[aria-label]').text().trim(),
+            offer: element.find('.a-price .a-offscreen').text().trim(),
+            // Store the handledAt date or current date if that is undefined
+            dateHandled: request.handledAt || new Date().toISOString(),
+            // Access the number of retries on the request object
+            numberOfRetries: request.retryCount,
+            // Grab the number of pending requests from the crawler's request queue
+            currentPendingRequests: (await crawler.requestQueue.getInfo()).pendingRequestCount,
+        });
+    }
+});
+```
+
+## Quiz answers {#quiz-answers}
+
+**Q: Why might you want to store statistics about an Actor's run (or a specific request)?**
+
+**A:** If certain types of requests are error-prone, you might want to save stats about the run to look at them later to either eliminate or better handle the errors. Things like **dateHandled** can be generally useful information.
+
+**Q: In our Amazon scraper, we are trying to store the number of retries of a request once its data is pushed to the dataset. Where would you get this information? Where would you store it?**
+
+**A:** This information is available directly on the request object under the property **retryCount**, and can be stored on each dataset item as it is pushed.
+
+**Q: What is the difference between the `failedRequestHandler` and `errorHandler`?**
+
+**A:** `failedRequestHandler` runs after a request has failed and reached its `maxRetries` count. `errorHandler` runs on every failure and retry.
+
+
+
+---
+title: IV - Using the Apify API & JavaScript client
+description: Learn how to interact with the Apify API directly through the well-documented RESTful routes, or by using the proprietary Apify JavaScript client.
+sidebar_position: 4
+slug: /expert-scraping-with-apify/solutions/using-api-and-client
+---
+
+# Using the Apify API & JavaScript client {#using-api-and-client}
+
+**Learn how to interact with the Apify API directly through the well-documented RESTful routes, or by using the proprietary Apify JavaScript client.**
+
+---
+
+Since we need to create another Actor, we'll once again use the `apify create` command and start from an empty template.
+
+![Selecting an empty template to start with](./images/select-empty.jpg)
+
+This time, let's call our project **actor-caller**.
+
+Let's also set up some boilerplate, grabbing our inputs and creating a constant variable for the task:
+
+```js
+import { Actor } from 'apify';
+import axios from 'axios';
+
+await Actor.init();
+
+const { useClient, memory, fields, maxItems } = await Actor.getInput();
+
+const TASK = 'YOUR_USERNAME~demo-actor-task';
+
+// our future code will go here
+
+await Actor.exit();
+```
+
+## Calling a task via JavaScript client {#calling-a-task-via-client}
+
+When using the `apify-client` package, you can create a new client instance by using `new ApifyClient()`. Within the Apify SDK, however, it is not necessary to even install the `apify-client` package, as the `Actor.newClient()` function is available for use.
+
+We'll start by creating a function called `withClient()` and creating a new client, then calling the task (the task input stays empty here, while the memory is passed as a run option):
+
+```js
+const withClient = async () => {
+    const client = Actor.newClient();
+    const task = client.task(TASK);
+
+    const { id } = await task.call(undefined, { memory });
+};
+```
+
+After the task has run, we'll grab hold of its dataset, then attempt to download the items, plugging in our `maxItems` and `fields` inputs. Then, once the data has been downloaded, we'll push it to the default key-value store under a key named **OUTPUT.csv**.
+
+```js
+const withClient = async () => {
+    const client = Actor.newClient();
+    const task = client.task(TASK);
+
+    const { id } = await task.call(undefined, { memory });
+
+    const dataset = client.run(id).dataset();
+
+    const items = await dataset.downloadItems('csv', {
+        limit: maxItems,
+        fields,
+    });
+
+    // If the content type is anything other than JSON, it must
+    // be specified within the third options parameter
+    return Actor.setValue('OUTPUT', items, { contentType: 'text/csv' });
+};
+```
+
+## Calling a task via API {#calling-a-task-via-api}
+
+First, we'll create a function named `withAPI` (right under the `withClient()` function) and declare a new variable which represents the API endpoint to run our task:
+
+```js
+const withAPI = async () => {
+    const uri = `https://api.apify.com/v2/actor-tasks/${TASK}/run-sync-get-dataset-items?`;
+};
+```
+
+To add the query parameters to the URL, we could create a super long string literal, plugging in all of our input values; however, there is a much better way: [`URLSearchParams`](https://nodejs.org/api/url.html#new-urlsearchparams). By using `URLSearchParams`, we can add the query parameters in an object:
+
+```js
+const withAPI = async () => {
+    const uri = `https://api.apify.com/v2/actor-tasks/${TASK}/run-sync-get-dataset-items?`;
+    const url = new URL(uri);
+
+    url.search = new URLSearchParams({
+        memory,
+        format: 'csv',
+        limit: maxItems,
+        fields: fields.join(','),
+        token: process.env.APIFY_TOKEN,
+    });
+};
+```
+
+Finally, let's make a `POST` request to our endpoint. You can use any library you want, but in this example, we'll use [`axios`](https://www.npmjs.com/package/axios). Don't forget to run `npm install axios` if you're going to use this package too!
+
+```js
+const withAPI = async () => {
+    const uri = `https://api.apify.com/v2/actor-tasks/${TASK}/run-sync-get-dataset-items?`;
+    const url = new URL(uri);
+
+    url.search = new URLSearchParams({
+        memory,
+        format: 'csv',
+        limit: maxItems,
+        fields: fields.join(','),
+        token: process.env.APIFY_TOKEN,
+    });
+
+    const { data } = await axios.post(url.toString());
+
+    return Actor.setValue('OUTPUT', data, { contentType: 'text/csv' });
+};
+```
+
+## Finalizing the Actor {#finalizing-the-actor}
+
+Now, since we've written both of these functions, all we have to do is write a conditional statement based on the boolean value of `useClient`:
+
+```js
+if (useClient) await withClient();
+else await withAPI();
+```
+
+And before we push to the platform, let's not forget to write an input schema in the **INPUT_SCHEMA.json** file:
+
+```json
+{
+    "title": "Actor Caller",
+    "type": "object",
+    "schemaVersion": 1,
+    "properties": {
+        "memory": {
+            "title": "Memory",
+            "type": "integer",
+            "description": "Select memory in megabytes.",
+            "default": 4096,
+            "maximum": 32768,
+            "unit": "MB"
+        },
+        "useClient": {
+            "title": "Use client?",
+            "type": "boolean",
+            "description": "Specifies whether the Apify JS client or the pure Apify API should be used.",
+            "default": true
+        },
+        "fields": {
+            "title": "Fields",
+            "type": "array",
+            "description": "Enter the dataset fields to export to CSV.",
+            "prefill": ["title", "url", "price"],
+            "editor": "stringList"
+        },
+        "maxItems": {
+            "title": "Max items",
+            "type": "integer",
+            "description": "Fill in the maximum number of items to export.",
+            "default": 10
+        }
+    },
+    "required": ["useClient", "memory", "fields", "maxItems"]
+}
+```
+
+## Final code {#final-code}
+
+To ensure we're on the same page, here is what the final code looks like:
+
+```js
+import { Actor } from 'apify';
+import axios from 'axios';
+
+await Actor.init();
+
+const { useClient, memory, fields, maxItems } = await Actor.getInput();
+
+const TASK = 'YOUR_USERNAME~demo-actor-task';
+
+const withClient = async () => {
+    const client = Actor.newClient();
+    const task = client.task(TASK);
+
+    // Run options such as memory belong in the second parameter
+    const { id } = await task.call(undefined, { memory });
+
+    const dataset = client.run(id).dataset();
+
+    const items = await dataset.downloadItems('csv', {
+        limit: maxItems,
+        fields,
+    });
+
+    return Actor.setValue('OUTPUT', items, { contentType: 'text/csv' });
+};
+
+const withAPI = async () => {
+    const uri = `https://api.apify.com/v2/actor-tasks/${TASK}/run-sync-get-dataset-items?`;
+    const url = new URL(uri);
+
+    url.search = new URLSearchParams({
+        memory,
+        format: 'csv',
+        limit: maxItems,
+        fields: fields.join(','),
+        token: process.env.APIFY_TOKEN,
+    });
+
+    const { data } = await axios.post(url.toString());
+
+    return Actor.setValue('OUTPUT', data, { contentType: 'text/csv' });
+};
+
+if (useClient) {
+    await withClient();
+} else {
+    await withAPI();
+}
+
+await Actor.exit();
+```
+
+## Quiz answers 📝 {#quiz-answers}
+
+**Q: What is the relationship between the Apify API and Apify client? Are there any significant differences?**
+
+**A:** The Apify client mimics the Apify API, so there aren't any major differences. The client is super handy, as it manages the API calls for you (parsing, error handling, retries, etc.) and even adds convenience functions.
+
+The one main difference is that the Apify client automatically uses [**exponential backoff**](/api/client/js/docs#retries-with-exponential-backoff) to deal with errors, as the sketch below illustrates.
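+
+For instance, here's a minimal sketch of what tuning that retry behavior might look like with the standalone [`apify-client`](https://www.npmjs.com/package/apify-client) package. Both options are optional, and the values below are purely illustrative, not recommendations:
+
+```js
+import { ApifyClient } from 'apify-client';
+
+const client = new ApifyClient({
+    token: process.env.APIFY_TOKEN,
+    // Maximum number of retries for a failed API call
+    maxRetries: 8,
+    // Base delay; the wait grows exponentially with each retried attempt
+    minDelayBetweenRetriesMillis: 500,
+});
+```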
+
+**Q: How do you pass input when running an Actor or task via API?**
+
+**A:** The input should be passed into the **body** of the request when running an Actor or task via API.
+
+**Q: Do you need to install the `apify-client` npm package when already using the `apify` package?**
+
+**A:** No. The Apify client is available right in the SDK with the `Actor.newClient()` function.
+
+## Wrap up {#wrap-up}
+
+That's it! Now, if you want to go above and beyond, you should create a GitHub repository for this Actor, integrate it with a new one on the Apify platform, and test if it works there as well (with multiple input configurations).
+
+
+
+---
+title: III - Using storage & creating tasks
+description: Get quiz answers and explanations for the lesson about using storage and creating tasks on the Apify platform.
+sidebar_position: 3
+slug: /expert-scraping-with-apify/solutions/using-storage-creating-tasks
+---
+
+# Using storage & creating tasks {#using-storage-creating-tasks}
+
+## Quiz answers 📝 {#quiz-answers}
+
+**Q: What is the relationship between Actors and tasks?**
+
+**A:** Tasks are pre-configured runs of Actors: an Actor's configuration can be saved as a task so that the Actor doesn't have to be configured manually every single time.
+
+**Q: What are the differences between default (unnamed) and named storage? Which one would you use for everyday usage?**
+
+**A:** Unnamed storage is persisted for only 7 days, while named storage is persisted indefinitely. For everyday usage, it is best to use default unnamed storages unless the data should explicitly be persisted for more than 7 days.
+
+> With named storages, it's easier to verify that you're using the correct store, as they can be referred to by name rather than by an ID.
+
+**Q: What is data retention, and how does it work for all types of storages (default and named)?**
+
+**A:** Default/unnamed storages expire after 7 days unless otherwise specified. Named storages are retained indefinitely.
+
+## Wrap up {#wrap-up}
+
+You've learned how to use the different storage options available on Apify, the two different types of storage, as well as how to create tasks for Actors.
+
+
+
+---
+title: I - Webhooks & advanced Actor overview
+description: Learn more advanced details about Actors, how they work, and the default configurations they can take. Also, learn how to integrate your Actor with webhooks.
+sidebar_position: 6.1
+slug: /expert-scraping-with-apify/actors-webhooks
+---
+
+# Webhooks & advanced Actor overview {#webhooks-and-advanced-actors}
+
+**Learn more advanced details about Actors, how they work, and the default configurations they can take. Also, learn how to integrate your Actor with webhooks.**
+
+---
+
+Thus far, you've run Actors on the platform and written an Actor of your own, which you published to the platform yourself using the Apify CLI; therefore, it's fair to say that you are becoming more familiar and comfortable with the concept of **Actors**. Within this lesson, we'll take a more in-depth look at Actors and what they can do.
+
+## Advanced Actor overview {#advanced-actors}
+
+In this course, we'll be working out of the Amazon scraper project from the **Web scraping for beginners** course. If you haven't already built that project, you can do it in three short lessons [here](../../webscraping/scraping_basics_javascript/challenge/index.md). We've made a few small modifications to the project with the Apify SDK, but 99% of the code is still the same.
+
+Take another look at the files within your Amazon scraper project. You'll notice that there is a **Dockerfile**. Every single Actor has a Dockerfile, which defines the Actor's **image** and tells Docker how to spin up a container on the Apify platform that can successfully run the Actor's code. In this sense, the Apify platform is a serverless environment that runs many such Docker containers. For a deeper understanding of Actor Dockerfiles, refer to the [Apify Actor Dockerfile docs](/sdk/js/docs/guides/docker-images#example-dockerfile).
+
+## Webhooks {#webhooks}
+
+Webhooks are a powerful tool that can be used for just about anything. You can set up actions to be taken when an Actor reaches a certain state (started, failed, succeeded, etc.). These actions usually take the form of an API call (generally a POST request).
+
+## Learning 🧠 {#learning}
+
+Prior to moving forward, please read over these resources:
+
+- Read about [running Actors, handling Actor inputs, memory and CPU](/platform/actors/running).
+- Learn about [Actor webhooks](/platform/integrations/webhooks), which we will implement in the next lesson.
+- Learn [how to run Actors](/academy/api/run-actor-and-retrieve-data-via-api) using Apify's REST API.
+
+## Knowledge check 📝 {#quiz}
+
+1. How do you allocate more CPU for an Actor's run?
+2. From within its own run, can you get the exact time that an Actor was started?
+3. What are the types of default storages connected to an Actor's run?
+4. Can you change the allocated memory of an Actor while it's running?
+5. How can you run an Actor with Puppeteer on the Apify platform with headless mode set to `false`?
+
+## Our task {#our-task}
+
+In this task, we'll be building on top of what we already created in the [Web scraping for beginners](/academy/web-scraping-for-beginners/challenge) course's final challenge, so keep those files safe!
+
+Once our Amazon Actor has completed its run, we will, rather than sending an email to ourselves, call an Actor through a webhook. The Actor called will be a new Actor that we will create together, which will take the dataset ID as input, then filter through all of the results and return only the cheapest one for each product. All of the results of the Actor will be pushed to its default dataset.
+
+[**Solution**](./solutions/integrating_webhooks.md)
+
+## Next up {#next}
+
+This course's [next lesson](./managing_source_code.md) is brief, but discusses a very important topic: managing your code and storing it in a safe place.
+
+
+
+---
+title: IV - Apify API & client
+description: Gain an in-depth understanding of the two main ways of programmatically interacting with the Apify platform - through the API, and through a client.
+sidebar_position: 6.4
+slug: /expert-scraping-with-apify/apify-api-and-client
+---
+
+# Apify API & client {#api-and-client}
+
+**Gain an in-depth understanding of the two main ways of programmatically interacting with the Apify platform - through the API, and through a client.**
+
+---
+
+You can use one of the two main ways to programmatically interact with the Apify platform: by directly using [Apify's RESTful API](/api/v2), or by using the [JavaScript](/api/client/js) and [Python](/api/client/python) API clients. In the next two lessons, we'll be focusing on the API and the JavaScript client.
+
+> Apify's API and JavaScript API client allow us to do anything a regular user can do when interacting with the platform's web interface, only programmatically.
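+
+To get a feel for the difference, here's a hedged sketch of the same operation - fetching a dataset's items - done both ways. The dataset ID is a placeholder, and the snippet assumes the `axios` and `apify-client` packages are installed:
+
+```js
+import axios from 'axios';
+import { ApifyClient } from 'apify-client';
+
+const token = process.env.APIFY_TOKEN;
+const datasetId = 'YOUR_DATASET_ID'; // placeholder - any dataset you own
+
+// 1. Directly through the RESTful API
+const { data } = await axios.get(
+    `https://api.apify.com/v2/datasets/${datasetId}/items?token=${token}`,
+);
+console.log(data);
+
+// 2. Through the JavaScript client, which wraps the same endpoint
+const client = new ApifyClient({ token });
+const { items } = await client.dataset(datasetId).listItems();
+console.log(items);
+```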
+
+## Learning 🧠 {#learning}
+
+- Scroll through the [Apify API docs](/api/v2) (there's a whole lot there, so you're not expected to memorize everything).
+- Read about the Apify client in [Apify's docs](/api/client/js). It can also be seen on [GitHub](https://github.com/apify/apify-client-js) and [npm](https://www.npmjs.com/package/apify-client).
+- Learn about the [`Actor.newClient()`](/sdk/js/reference/class/Actor#newClient) function in the Apify SDK.
+- Skim through [this article](https://help.apify.com/en/articles/2868670-how-to-pass-data-from-web-scraper-to-another-actor) about API integration (this article is old but still relevant).
+
+## Knowledge check 📝 {#quiz}
+
+1. What is the relationship between the Apify API and the Apify client? Are there any significant differences?
+2. How do you pass input when running an Actor or task via API?
+3. Do you need to install the `apify-client` npm package when already using the `apify` package?
+
+## Our task
+
+We'll be creating another new Actor, which will have two jobs:
+
+1. Programmatically call the task for the Amazon Actor.
+2. Export its results into CSV format under a new key called **OUTPUT.csv** in the default key-value store.
+
+Though it's a bit unintuitive, this is a perfect activity for learning how to use both the Apify API and the Apify JavaScript client.
+
+The new Actor should take the following input values, which will be mapped to parameters in the API calls:
+
+```json
+{
+    // How much memory to allocate to the Amazon Actor
+    // Must be a power of 2
+    "memory": 4096,
+
+    // Whether to use the JavaScript client to make the
+    // call, or to use the API
+    "useClient": false,
+
+    // The fields in each item to return back. All other
+    // fields should be omitted
+    "fields": ["title", "itemUrl", "offer"],
+
+    // The maximum number of items to return back
+    "maxItems": 10
+}
+```
+
+[**Solution**](./solutions/using_api_and_client.md)
+
+## Next up {#next}
+
+[Lesson VI](./migrations_maintaining_state.md) will teach us everything we need to know about migrations and how to handle them properly to avoid losing any state, thereby increasing the reliability of our `demo-actor` Amazon scraper.
+
+
+
+---
+title: VI - Bypassing anti-scraping methods
+description: Learn about bypassing anti-scraping methods using proxies and proxy/session rotation together with Crawlee and the Apify SDK.
+sidebar_position: 6.6
+slug: /expert-scraping-with-apify/bypassing-anti-scraping
+---
+
+# Bypassing anti-scraping methods {#bypassing-anti-scraping-methods}
+
+**Learn about bypassing anti-scraping methods using proxies and proxy/session rotation together with Crawlee and the Apify SDK.**
+
+---
+
+Effectively bypassing anti-scraping software is one of the most crucial, but also one of the most difficult skills to master. The different types of [anti-scraping protections](../../webscraping/anti_scraping/index.md) can vary a lot across the web. Some websites aren't even protected at all, some require only moderate IP rotation, and some cannot be scraped without using advanced techniques and workarounds. Additionally, because the web is evolving, anti-scraping techniques are also evolving and becoming more advanced.
+
+It is generally quite difficult to recognize the anti-scraping protections a page may have when first inspecting it, so it is important to thoroughly investigate a site prior to writing any lines of code, as anti-scraping measures can significantly change your approach as well as complicate the development process of an Actor.
As your skills expand, you will be able to spot anti-scraping measures quicker, and better evaluate the complexity of a new project. + +You might have already noticed that we've been using the **RESIDENTIAL** proxy group in the `proxyConfiguration` within our Amazon scraping Actor. But what does that mean? This is a proxy group from [Apify Proxy](https://apify.com/proxy) which has been preventing us from being blocked by Amazon this entire time. We'll be learning more about proxies and Apify Proxy in this lesson. + +## Learning 🧠 {#learning} + +- Skim [this page](https://apify.com/proxy) for a general idea of Apify Proxy. +- Give the [proxy documentation](/platform/proxy) a solid readover (feel free to skip most of the examples). +- Check out the [anti-scraping guide](../../webscraping/anti_scraping/index.md). +- Gain a solid understanding of the [SessionPool](https://crawlee.dev/api/core/class/SessionPool). +- Look at a few Actors on the [Apify store](https://apify.com/store). How are they utilizing proxies? + +## Knowledge check 📝 {#quiz} + +1. What are the different types of proxies that Apify proxy offers? What are the main differences between them? +2. Which proxy groups do users get on the free plan? Can they access the proxy from their computer? +3. How can you prevent an error from occurring if one of the proxy groups that a user has is removed? What are the best practices for these scenarios? +4. Does it make sense to rotate proxies when you are logged into a website? +5. Construct a proxy URL that will select proxies **only from the US**. +6. What do you need to do to rotate a proxy (one proxy usually has one IP)? How does this differ for CheerioCrawler and PuppeteerCrawler? +7. Name a few different ways how a website can prevent you from scraping it. + +## Our task + +This time, we're going to build a trivial proxy-session manager for our Amazon scraping Actor. A session should be used a maximum of 5 times before being rotated; however, if a request fails, the IP should be rotated immediately. + +Additionally, the proxies used by our scraper should now only be from the US. + +[**Solution**](./solutions/rotating_proxies.md) + +## Next up {#next} + +Up [next](./saving_useful_stats.md), we'll be learning about how to save useful stats about our run, which becomes more and more useful as a project scales. + + + +--- +title: Expert scraping with Apify +description: After learning the basics of Actors and Apify, learn to develop pro-level scrapers on the Apify platform with this advanced course. +sidebar_position: 12 +category: apify platform +slug: /expert-scraping-with-apify +--- + +# Expert scraping with Apify {#expert-scraping} + +**After learning the basics of Actors and Apify, learn to develop pro-level scrapers on the Apify platform with this advanced course.** + +--- + +This course will teach you the nitty gritty of what it takes to build pro-level scrapers with Apify. We recommend that you've at least looked through all of the other courses in the academy prior to taking this one. + +## Preparations {#preparations} + +Before developing a pro-level Apify scraper, there are some important things you should have at least a bit of knowledge about (knowing the basics of each is enough to continue through this section), as well as some things that you should have installed on your system. 
+
+> If you've already gone through the [Web scraping for beginners course](../../webscraping/scraping_basics_javascript/index.md) and the first courses of the [Apify platform category](../apify_platform.md), you will be more than well equipped to continue on with the lessons in this course.
+
+
+
+### Crawlee, Apify SDK, and the Apify CLI {#crawlee-apify-sdk-and-cli}
+
+If you're feeling ambitious, you don't need to have any prior experience with Crawlee to get started with this course; however, at least 5–10 minutes of exposure is recommended. If you haven't yet tried out Crawlee, you can refer to [this lesson](../../webscraping/scraping_basics_javascript/crawling/pro_scraping.md) in the **Web scraping for beginners** course (and ideally follow along). To familiarize yourself with the Apify SDK, you can refer to the [Apify Platform](../apify_platform.md) category.
+
+The Apify CLI will play a core role in the running and testing of the Actor you will build, so if you haven't gotten it installed already, please refer to [this short lesson](../../glossary/tools/apify_cli.md).
+
+### Git {#git}
+
+In one of the later lessons, we'll be learning how to integrate our Actor on the Apify platform with a GitHub repository. For this, you'll need to understand at least the basics of [Git](https://git-scm.com/docs). Here's a [great tutorial](https://product.hubspot.com/blog/git-and-github-tutorial-for-beginners) to help you get started with Git.
+
+### Docker {#docker}
+
+Docker is a massive topic on its own, but don't be worried! We only expect you to know and understand the very basics of it, which you can learn from [this short article](https://docs.docker.com/guides/docker-overview/) (a 10-minute read).
+
+### The basics of Actors {#actor-basics}
+
+Part of this course will be learning more in-depth about Actors; however, some basic knowledge is already assumed. If you haven't yet gone through the [Actors](../getting_started/actors.md) lesson of the **Apify platform** course, it's highly recommended to at least give it a glance before moving forward.
+
+## First up {#first}
+
+[First up](./actors_webhooks.md), we'll be learning in-depth about integrating Actors with each other using webhooks.
+
+> Each lesson will have a short _(and optional)_ quiz that you can take at home to test your skills and knowledge related to the lesson's content. Some questions have straight factual answers, but some others can have varying opinionated answers.
+
+
+
+---
+title: II - Managing source code
+description: Learn how to manage your Actor's source code more efficiently by integrating it with a GitHub repository. This is standard on the Apify platform.
+sidebar_position: 6.2
+slug: /expert-scraping-with-apify/managing-source-code
+---
+
+# Managing source code {#managing-source-code}
+
+**Learn how to manage your Actor's source code more efficiently by integrating it with a GitHub repository. This is standard on the Apify platform.**
+
+---
+
+In this brief lesson, we'll discuss how to better manage an Actor's source code. Up 'til now, you've been developing your scripts locally, and then pushing the code directly to the Actor on the Apify platform; however, there is a much more optimal (and standard) way.
+
+## Learning 🧠 {#learning}
+
+Thus far, every time we've updated our code on the Apify platform, we've used the `apify push` CLI command; however, this can be problematic for a few reasons - mainly because if someone else wants to modify or maintain your code, they don't have access to it, as it lives only on your local machine.
+
+If you're not yet familiar with Git, please get familiar with it through the [Git documentation](https://git-scm.com/docs), then take a quick moment to read about [GitHub integration](/platform/integrations/github) in the Apify docs.
+
+Also, try to explore the **Multifile editor** in one of the Actors you developed in the previous lessons before moving forward.
+
+## Knowledge check 📝 {#quiz}
+
+1. Do you have to rebuild an Actor each time the source code is changed?
+2. In Git, what is the difference between **pushing** changes and making a **pull request**?
+3. Based on your knowledge and experience, is the `apify push` command worth using (in your opinion)?
+
+[**Answers**](./solutions/managing_source.md)
+
+## Our task {#our-task}
+
+First, we must initialize a GitHub repository (you can use GitLab if you like, but this lesson's examples will be using GitHub). Then, after pushing our main Amazon Actor's code to the repo, we must switch its source code to use the content of the GitHub repository instead.
+
+## Integrating GitHub source code {#integrating-github}
+
+First, let's create a repository. This can be done [in a number of ways](https://kbroman.org/github_tutorial/pages/init.html), but in this lesson, we'll do it by creating the remote repository on GitHub's website:
+
+![Create a new GitHub repo](./images/github-new-repo.png)
+
+Then, we'll run the commands it tells us in our terminal (while within the **demo-actor** directory) to initialize the repository locally, and then push all of the files to the remote one.
+
+After you've created your repo, navigate on the Apify platform to the Actor we called **demo-actor**. In the **Source** tab, click the dropdown menu under **Source code** and select **Git repository**. By default, this is set to **Web IDE**, which is what we've been using so far.
+
+![Select source code location](./images/select-source-location.png)
+
+Then, go ahead and paste the link to your repository into the **Git URL** text field and click **Save**.
+
+The final step is to click on **API** in the top right corner of your Actor's page:
+
+![API button](./images/api-button.jpg)
+
+And scroll through all of the links until you find the **Build Actor** API endpoint. Copy this endpoint's URL, then head back over to your GitHub repository and navigate to **Settings > Webhooks > Add webhook**. The final thing to do is to paste the URL and save the webhook.
+
+![Adding a webhook to your GitHub repo](../../../platform/actors/development/deployment/images/ci-github-integration.png)
+
+And you're done! 🎉
+
+## Quick chat about code management {#code-management}
+
+This was a bit of overhead, but the good news is that you don't ever have to configure this stuff again for this Actor. Now, every time the content of your **main**/**master** branch changes, the Actor on the Apify platform will rebuild based on the newest code.
+
+Think of it as combining two steps into one! Normally, you'd have to do a `git push` from your terminal in order to get the newest code onto GitHub, then run `apify push` to push it to the platform.
+
+It's also important to know that GitHub/GitLab repository integration is standard practice.
As projects grow and the number of contributors and maintainers increases, it only makes sense to have a GitHub repository integrated with the project's Actor. For the remainder of this course, all Actors created will be integrated with a GitHub repository.
+
+## Next up {#next}
+
+[Next up](./tasks_and_storage.md), you'll learn about the different ways to store scraped data, as well as how to utilize a cool feature to run pre-configured Actors.
+
+
+
+---
+title: V - Migrations & maintaining state
+description: Learn about what Actor migrations are and how to handle them properly so that the state is not lost and runs can safely be resurrected.
+sidebar_position: 6.5
+slug: /expert-scraping-with-apify/migrations-maintaining-state
+---
+
+# Migrations & maintaining state {#migrations-maintaining-state}
+
+**Learn about what Actor migrations are and how to handle them properly so that the state is not lost and runs can safely be resurrected.**
+
+---
+
+We already know that Actors are Docker containers that can be run on any server. This means that they can be allocated anywhere there is space available, making them very efficient. Unfortunately, there is one big caveat: Actors move - a lot. When an Actor moves, it is called a **migration**.
+
+On migration, the process inside of an Actor is completely restarted and everything in its memory is wiped, meaning that any values stored within variables or classes are lost.
+
+When a migration happens, you want to do a so-called "state transition", which means saving any data you care about so the Actor can continue right where it left off before the migration.
+
+## Learning 🧠 {#learning}
+
+Read this [article](/platform/actors/development/builds-and-runs/state-persistence) on migrations and dealing with state transitions.
+
+Before moving forward, read about Actor [events](/sdk/js/docs/upgrading/upgrading-to-v3#events) and how to listen for them.
+
+## Knowledge check 📝 {#quiz}
+
+1. Actors have an option in the **Settings** tab to **Restart on error**. Would you use this feature for regular Actors? When would you use this feature?
+2. Migrations happen randomly, but by [aborting **gracefully**](/platform/actors/running/runs-and-builds#aborting-runs), you can simulate a similar situation. Try this out on the platform and observe what happens. What changes occur, and what remains the same for the restarted Actor's run?
+3. Why don't you (usually) need to add any special migration handling code for a standard crawling/scraping Actor? Are there any features in the Crawlee/Apify SDK that handle this under the hood?
+4. How can you intercept the migration event? How much time do you have after this event happens and before the Actor migrates?
+5. When would you persist data to the default key-value store instead of to a named key-value store?
+
+## Our task
+
+Once again returning to our Amazon **demo-actor**, let's say that we need to store an object in memory (as a variable) containing all of the scraped ASINs as keys and the number of offers scraped from each ASIN as values. The object should follow this format:
+
+```json
+{
+    "B079ZJ1BPR": 3,
+    "B07D4R4258": 21
+}
+```
+
+Every 10 seconds, we should log the most up-to-date version of this object to the console. Additionally, the object should be able to survive Actor migrations, which means that even if the Actor were to migrate, its data would not be lost upon resurrection.
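+
+Before peeking at the solution, here's a minimal sketch of the persistence pattern involved, using the Apify SDK's events. The **ASIN-TRACKER** key name is an arbitrary choice for illustration:
+
+```js
+import { Actor } from 'apify';
+
+await Actor.init();
+
+// Restore the object if the run was resurrected after a migration
+const tracker = (await Actor.getValue('ASIN-TRACKER')) ?? {};
+
+const persistTracker = async () => Actor.setValue('ASIN-TRACKER', tracker);
+
+// Save the object right before the platform migrates (or aborts) the Actor
+Actor.on('migrating', persistTracker);
+Actor.on('aborting', persistTracker);
+
+// Log the most up-to-date version every 10 seconds
+setInterval(() => console.info(tracker), 10_000);
+```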
+
+[**Solution**](./solutions/handling_migrations.md)
+
+## Next up {#next}
+
+You might have already noticed that we've been using the **RESIDENTIAL** proxy group in the `proxyConfiguration` within our Amazon scraping Actor. But what does that mean? Learn why we've used this group, about proxies, and about avoiding anti-scraping measures in the [next lesson](./bypassing_anti_scraping.md).
+
+
+
+---
+title: VII - Saving useful run statistics
+description: Understand how to save statistics about an Actor's run, what types of statistics you can save, and why you might want to save them for a large-scale scraper.
+sidebar_position: 6.7
+slug: /expert-scraping-with-apify/saving-useful-stats
+---
+
+# Saving useful run statistics {#savings-useful-run-statistics}
+
+**Understand how to save statistics about an Actor's run, what types of statistics you can save, and why you might want to save them for a large-scale scraper.**
+
+---
+
+Using Crawlee and the Apify SDK, we are now able to collect and format data coming directly from websites and save it into a key-value store or dataset. This is great, but sometimes we want to store some extra data about the run itself, or about each request. We might want to store some extra general run information separately from our results, or potentially include statistics about each request within its corresponding dataset item.
+
+The types of values that are saved are totally up to you, but the most common are error scores, the total number of saved items, the number of request retries, the number of captchas hit, etc. Storing these values is not always necessary, but they can be valuable when debugging and maintaining an Actor. As your projects scale, this will become more and more useful and important.
+
+## Learning 🧠 {#learning}
+
+Before moving on, give these valuable resources a quick lookover:
+
+- Refamiliarize yourself with the various data available on the [Request object](https://crawlee.dev/api/core/class/Request).
+- Learn about the [`failedRequestHandler` function](https://crawlee.dev/api/browser-crawler/interface/BrowserCrawlerOptions#failedRequestHandler).
+- Understand how to use the [`errorHandler`](https://crawlee.dev/api/browser-crawler/interface/BrowserCrawlerOptions#errorHandler) function to handle request failures.
+- Ensure you are comfortable using [key-value stores](/sdk/js/docs/guides/result-storage#key-value-store) and [datasets](/sdk/js/docs/guides/result-storage#dataset), and understand the differences between the two storage types.
+
+## Knowledge check 📝 {#quiz}
+
+1. Why might you want to store statistics about an Actor's run (or a specific request)?
+2. In our Amazon scraper, we are trying to store the number of retries of a request once its data is pushed to the dataset. Where would you get this information? Where would you store it?
+3. What is the difference between the `failedRequestHandler` and `errorHandler`? (The sketch below gives a hint.)
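+
+To make the distinction between the two handlers concrete, here's a minimal hedged sketch of where each one fits into a crawler's configuration (the logging is purely illustrative):
+
+```js
+import { CheerioCrawler } from 'crawlee';
+
+const crawler = new CheerioCrawler({
+    requestHandler: async ({ request }) => {
+        // ... scraping logic lives here ...
+    },
+    // Runs on every error, right before the request is retried
+    errorHandler: async ({ request }, error) => {
+        console.warn(`Retrying ${request.url}: ${error.message}`);
+    },
+    // Runs only once a request has exhausted all of its retries
+    failedRequestHandler: async ({ request }, error) => {
+        console.error(`Giving up on ${request.url}: ${error.message}`);
+    },
+});
+```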
+
+## Our task
+
+In our Amazon Actor, each dataset result must now have the following extra keys:
+
+```json
+{
+    "dateHandled": "date-here", // the date + time at which the request was handled
+    "numberOfRetries": 4, // the number of retries of the request before running successfully
+    "currentPendingRequests": 24 // the current number of requests left pending in the request queue
+}
+```
+
+Also, an object including these values should be persisted in the key-value store during the run and logged to the console every 10 seconds:
+
+```json
+{
+    "errors": { // all of the errors for every request path
+        "some-site.com/products/123": [
+            "error1",
+            "error2"
+        ]
+    },
+    "totalSaved": 43 // total number of saved items throughout the entire run
+}
+```
+
+[**Solution**](./solutions/saving_stats.md)
+
+## Wrap up
+
+Wow, you've learned a whole lot in this course, so give yourself the pat on the back that you deserve! If you were able to follow along with this course, that means that you're officially an **Apify pro**, and that you're equipped with all of the knowledge and tools you need to build awesome, scalable web scrapers, either for your own personal projects or for the Apify platform.
+
+Congratulations! 🎉
+
+
+
+---
+title: III - Tasks & storage
+description: Understand how to save the configurations for Actors with Actor tasks. Also, learn about storage and the different types Apify offers.
+sidebar_position: 6.3
+slug: /expert-scraping-with-apify/tasks-and-storage
+---
+
+# Tasks & storage {#tasks-and-storage}
+
+**Understand how to save the configurations for Actors with Actor tasks. Also, learn about storage and the different types Apify offers.**
+
+---
+
+Tasks and storage are very different things; however, they are also tied together in many ways. **Tasks** run Actors, Actors return data, and data is stored in different types of **Storages**.
+
+## Tasks {#tasks}
+
+Tasks are a very useful feature that allows us to save pre-configured inputs for Actors. This means that, rather than configuring the Actor every time or saving screenshots of various different Actor configurations, you can store the configurations right in your Apify account and run the Actor with them at will.
+
+## Storage {#storage}
+
+Storage allows us to save persistent data for further processing. As you'll learn, there are two main storage options on the Apify platform, as well as two main storage types (**named** and **unnamed**) with one big difference between them.
+
+## Learning 🧠 {#learning}
+
+- Check out [the docs about Actor tasks](/platform/actors/running/tasks).
+- Read about the [two main storage options](/platform/storage/dataset) on the Apify platform.
+- Understand the [crucial differences between named and unnamed storages](/platform/storage/usage#named-and-unnamed-storages).
+- Learn about the [`Dataset`](/sdk/js/reference/class/Dataset) and [`KeyValueStore`](/sdk/js/reference/class/KeyValueStore) objects in the Apify SDK.
+
+## Knowledge check 📝 {#quiz}
+
+1. What is the relationship between Actors and tasks?
+2. What are the differences between default (unnamed) and named storage? Which one would you use for everyday usage? (The sketch after this list shows both in code.)
+3. What is data retention, and how does it work for all types of storages (default and named)?
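+
+Here's a brief sketch of how the two storage types differ in code when using the Apify SDK - the storage names below are arbitrary examples:
+
+```js
+import { Actor } from 'apify';
+
+await Actor.init();
+
+// Default (unnamed) storages - tied to this run, expire after the retention period
+const defaultDataset = await Actor.openDataset();
+const defaultStore = await Actor.openKeyValueStore();
+
+// Named storages - persisted indefinitely and easier to recognize at a glance
+const namedDataset = await Actor.openDataset('amazon-offers');
+const namedStore = await Actor.openKeyValueStore('amazon-run-stats');
+
+await Actor.exit();
+```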
+ +[**Solution**](./solutions/using_storage_creating_tasks.md) + +## Next up {#next} + +The [next lesson](./apify_api_and_client.md) is very exciting, as it will unlock the ability to seamlessly integrate your Apify Actors into your own external projects and applications with the Apify API. + + + +label: Actor basics +position: 2 + + + +--- +title: Actor description & SEO description +description: Learn about Actor description and meta description. Where to set them and best practices for both content and length. +sidebar_position: 3 +category: apify platform +slug: /actor-marketing-playbook/actor-basics/actor-description +--- + +Learn about Actor description and meta description. Where to set them and best practices for both content and length. + +--- + +## What is an Actor description? + +First impressions are important, especially when it comes to tools. Actor descriptions are the first connection potential users have with your Actor. You can set two kinds of descriptions: _regular description_ (in Apify Store) and _SEO description_ (on Google search), along with their respective names: regular name and SEO name. + +:::tip + +You can change descriptions and names as many times as you want. + +::: + +## Regular description vs. SEO description + +| | Actor description & name | SEO description & name | +|---|---|---| +| Name length | 40-50 characters | 40-50 characters | +| Description length | 300 characters | 145-155 characters | +| Visibility | Visible on Store | Visible on Google | + +### Description & Actor name + +Actor description is what users see on the Actor's web page in Apify Store, along with the Actor's name and URL. When creating an Actor description, a “warm” visitor experience is prioritized (more on that later). + +![actor name & description](images/actor-description-name.png) + +Actor description is also present in Apify Console and across Apify Store. + +![actor description in store](images/actor-description-store.png) + +### SEO description & SEO name + +Actor SEO description is a tool description visible on Google. It is shorter and SEO-optimized (keywords matter here). When creating the SEO description, a “cold” visitor experience is prioritized. + +![seo description](images/seo_description.png) + +Usually the way the potential user interacts with both these descriptions goes like this: SEO first, regular description second. Is there any benefit in them being different? + +### Is there any benefit in the description and meta description being different? + +Different descriptions give you a chance to target different stages of user acquisition. And make sure the acquisition takes place. + +_SEO description (and SEO name)_ is targeting a “cold” potential user who knows nothing about your tool yet and just came across it on Google search. They’re searching to solve a problem or use case. The goal of the meta description is to convince that visitor to click on your tool's page among other similar search results on Google. While it's shorter, SEO description is also the space to search-engine-optimize your language to the max to attract the most matching search intent. + +_Description (and name)_ is targeting a “warm” potential user who is already curious about your tool. They have clicked on the tool's page and have a few seconds to understand how complex the tool is and what it can do for them. Here you can forget SEO optimization and speak directly to the user. 
The regular description also has a longer character limit, which means you can expand on your Actor’s features.
+
+Learn more about search intent here: [SEO](/academy/actor-marketing-playbook/promote-your-actor/seo)
+
+## Where can Actor descriptions be set?
+
+Both descriptions can be found and edited in the rightmost **Publication** tab → **Display information**. This has to be done separately for each Actor.
+
+:::note
+
+Setting the SEO description and SEO name is optional. If not set, the description will just be duplicated.
+
+:::
+
+![changing seo name](images/changing__SEO_name.png)
+
+![changing actor name and seo name](images/changing_Actor_name_and_SEO_name.png)
+
+Actor description specifically can also be quick-edited in this pop-up on the Actor's page in Apify Console. Open the **Actor's page**, then click on **…** in the top right corner, and choose ✎ **Edit name or description**. Then set the URL in the **Unique name** ✎ field and click **Save**.
+
+![changing actor description](images/change_Actor_description.png)
+
+## Tips and recommendations on how to write descriptions
+
+When writing a description, less is more. You only have a few seconds to capture attention and communicate what your Actor can do. To make the most of that time, follow these guidelines used by Apify (these apply to both types of descriptions):
+
+### Use variations and experiment 🔄
+
+- _SEO name vs. regular name_:
+    - name: Airbnb Scraper
+    - SEO name: Airbnb Data Scraper
+- _Keywords on the web page_:
+Include variations, e.g. Airbnb API, Airbnb data, Airbnb data scraper, Airbnb rentals, Airbnb listings + - No-code scraping tool to extract Airbnb data: host info, prices, dates, location, and reviews. + - Scrape Airbnb listings without official Airbnb API! +- _Scraping/automation process variations_:
+Use terms, e.g. crawl, crawler, scraping tool, finder, scraper, data extraction tool, extract data, get data
+    - Scrape XYZ data, scraped data, data scraper, data crawler.
+
+### Choose how to start your sentences 📝
+
+- _Noun-first (descriptive)_:
+    - Data extraction tool to extract Airbnb data: host info, prices, dates, location, and reviews.
+- _Imperative-first (motivating)_:
+    - Try a free web scraping tool to extract Airbnb data: host info, prices, dates, location, and reviews.
+
+
+### Keep it short and SEO-focused ✂️
+
+- _Be concise and direct_: clearly state what your Actor does. Avoid unnecessary fluff and boilerplate text.
+    - ✅ Scrapes job listings from Indeed and gathers...
+    - ❌ This Actor scrapes job listings from Indeed in order to gather...
+- _Optimize for search engines_: include popular keywords related to your Actor’s functionality that users might search for.
+    - ✅ This Indeed scraper helps you collect job data efficiently. Use the tool to gather...
+    - ❌ This tool will search through job listings on Indeed and offers you...
+
+
+### List the data your Actor works with 📝
+
+- Data extraction tool to extract Airbnb data: host info, prices, dates, location, and reviews.
+- Get hashtags, usernames, mentions, URLs, comments, images, likes, locations without the official Instagram API.
+
+### Use keywords or the language of the target website 🗣️
+
+- Extract data from hundreds of Airbnb home rentals in seconds.
+- Extract data from chosen TikToks. Just add a TikTok URL and get TikTok video and profile data: URLs, numbers of shares, followers, hashtags, hearts, video, and music metadata.
+- Scrape Booking with this hotels scraper and get data about accommodation on Booking.com.
+
+### Highlight your strong suits 🌟
+
+- Ease of use, no coding, user-friendly:
+    - Easy scraping tool to extract Airbnb data.
+- Fast and scalable:
+    - Scrape whole cities or extract data from hundreds of Airbnb rentals in seconds.
+- Free (only if the $5 free credits can cover a trial run):
+    - Try a free scraping tool to extract Airbnb data: host info, prices, dates, location, and reviews.
+    - Extract host information, locations, availability, stars, reviews, images, and host/guest details for free.
+- Available platform features (various formats, API, integrations, scheduling):
+    - Export scraped data in formats like HTML, JSON, and Excel.
+- Additional tips:
+    - Avoid ending lists with etc.
+    - Consider adding relevant emojis for visual appeal.
+
+### Break it down 🔠
+
+Descriptions typically fit into 2-3 sentences. Don't try to jam everything into one.
+
+Examples:
+
+1. Scrape whole cities or extract data from hundreds of Airbnb rentals in seconds.
+1. Extract host information, addresses, locations, prices, availability, stars, reviews, images, and host/guest details.
+1. Export scraped data, run the scraper via API, schedule and monitor runs, or integrate with other tools.
+
+## FAQ
+
+#### Can the Actor's meta description and description be the same?
+
+Yes, they can, as long as the shared text fits the shorter SEO length (under 150 characters). But they can also be different - there's no harm in that.
+
+#### How different can description and meta description be?
+
+They can be vastly different and target different angles of your Actor. You can experiment by setting up different SEO descriptions for a period of time and seeing if the click-through rate rises.
+
+#### I set a custom SEO description but Google doesn't show it
+
+Sometimes Google picks up a part of the README as the SEO description. It's heavily dependent on the search query. What you see on Google might therefore look different from the SEO description you set. It's all a part of how Google customizes search results.
+
+ + +--- +title: Actors & emojis +description: Discover how emojis can boost your Actors by grabbing attention, simplifying navigation, and enhancing clarity. Improve user experience and engagement on Apify Store. +sidebar_position: 5 +category: apify platform +slug: /actor-marketing-playbook/actor-basics/actors-and-emojis +--- + +Using emojis in Actors is a science on its own. Learn how emojis enhance the user experience in Actors by grabbing attention, simplifying navigation, and making information clearer. + +## On the use of emojis in Actors + +We started using emojis in Actors for several reasons. First, tech today often uses emojis to make things look more user-friendly. Second, people don’t read as much as we’d like. You only have a few seconds to grab their attention, and text alone can feel overwhelming. Third, we don’t have many opportunities or space to explain things about Actors, and we want to avoid users needing to open extra tabs or pages. Clarity should come instantly, so we turned to emojis. + +When evaluating a new tool, those first 5 seconds are critical. That’s why we use emojis extensively with our Actors. They’re part of the Actor SEO title and description to help the tool stand out in Google search results, although Google doesn't always display them. In READMEs, they serve as shortcuts to different sections and help users quickly understand the type of data they’ll get. In complex input schemas, we rely on emojis to guide users and help them navigate the tool more efficiently. + +## Emoji science + +Believe it or not, there’s a science to emoji usage. When we use emojis in Actors and related content, we tap into the brain's iconic and working memory. Iconic memory holds information for less than a second - this is unconscious processing, where attributes like color, size, and location are instantly recognized. This part is where emojis guide the person's attention in the sea of text. They signify that something important is here. Emojis help with that immediate first impression and create a sense of clarity. + +After that, the brain shifts to working memory, where it combines information into visual chunks. Since we can only hold about 3-4 chunks at once, emojis help reinforce key points, thus reducing cognitive load. Consistent emoji use across the Actor ecosystem ensures users can quickly connect information without getting overwhelmed. + +As an example of this whole process, first, the user notices the emojis used in the field titles (pre-attentive processing). They learn to associate the emojis with those titles (attentive processing). Later, when they encounter the same emojis in a README section, they’ll make the connection, making it easier to navigate without drowning in a sea of text. + +## Caveats to emojis + +1. Don't overuse them, and don’t rely on emojis for critical information. Emojis should support the text, not replace key explanations or instructions. They're a crutch for concise copywriting, not a universal solution. +2. Use them consistently. Choose one and stick with it across all content: descriptions, parts of input schema, mentions in README, blog posts, etc. +3. Some emojis have multiple meanings, so choose the safest one. It could be general internet knowledge or cultural differences, so make sure the ones you choose won’t confuse or offend users in other markets. +4. Some emojis don’t render well on Windows or older devices. Try to choose ones that display correctly on Mac, Windows, and mobile platforms. 
Besides, emoji-heavy content can be harder for screen readers and accessibility tools to interpret. Make sure the information is still clear without the emojis.
+5. It's okay not to use them.
+
+
+
+---
+title: How to create an Actor README
+description: Learn how to write a comprehensive README to help users better navigate, understand and run public Actors in Apify Store.
+sidebar_position: 3
+category: apify platform
+slug: /actor-marketing-playbook/actor-basics/how-to-create-an-actor-readme
+---
+
+**Learn how to write a comprehensive README to help users better navigate, understand and run public Actors in Apify Store.**
+
+---
+
+## What's a README in the Apify sense?
+
+At Apify, when we talk about a README, we don’t mean a guide mainly aimed at developers that explains what a project is, how to set it up, or how to contribute to it. At least, not in its traditional sense.
+
+You could argue our notion of README is closer to this [one described on GitHub](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-readmes):
+
+README files typically include information on:
+
+- What the project does
+- Why the project is useful
+- How users can get started with the project
+- Where users can get help with your project
+
+We mean all of this and even more. At Apify, when we talk about READMEs, we refer to the public Actor detail page on Apify Store. Specifically, its first tab. The README exists in the same form both on the web and in Console. So what is it for?
+
+Before we dive in, a little disclaimer: you don't need your Apify README to fulfill all its purposes. Technically, you could even publish an Actor with just a single word in the README. But you'd be missing out if you did that.
+
+Your Actor’s README has at least four functions:
+
+1. _SEO_ - If your README is well-structured and includes important keywords — both in headings and across the text — it has a high chance of being noticed and promoted by Google. Organic search brings the most motivated type of potential users. If you win this game, you've won most of the SEO game.
+2. _First impression_ - Your README is one of the first points of contact with a potential user. If you come across as convincing, clear, and reassuring, it could be the factor that makes a user try your Actor for their task.
+3. _Extended instructions_ - The README is also the space that explains specific complex input settings: for example, special input formatting, anything coding-related, or extended functionality. Of course, you could put all of that in a blog post as well, but the README should be their first point of contact.
+4. _Support_ - Your users come back to the README when they face issues. So use it as a space to link to tutorials for when they run into trouble, describe common troubleshooting techniques, share tricks, or warn them about known bugs.
+
+## README elements theory
+
+These are the most important elements of the README. This structure is also not to be followed to a “t”. Of course, what you want to say to your potential users and how you want to promote your Actor will differ case by case. These are just the most common practices we have for our Actor READMEs. Note that the headings are written with SEO in mind, which is why you see certain keywords repeated over and over.
+
+Aim for sections 1–6 below and try to include at least 300 words. You can move the sections around to some extent if it makes sense, e.g.
3 might come after 6. Consider using emojis as bullet points or otherwise trying to break up the text.
+
+### Intro and features
+
+What is [Actor]?
+
+- Explain in two or three sentences what the Actor does and the easiest way to try it. Mention briefly what kind of data it can extract and any other tangible goal the tool can achieve. Describe the input in one sentence. Highlight the most important words in bold.
+
+What can this [Actor] do?
+
+- List the main features of this tool. List multiple ways of input if applicable. List platform advantages. If it's a bundle, mention the steps that the Actor will do for you, mention specific obstacles this tool is able to overcome, and say upfront how many results you can get for free.
+
+:::tip Remember the Apify platform!
+
+Your Actor + the Apify platform. They come as a package. Don't forget to flaunt all the advantages that the platform gives to your solution.
+
+:::
+
+Imagine if there was a solution that is identical to yours but without the platform advantages such as monitoring, access to API, scheduling, possibility of integrations, proxy rotation. Now, if that tool suddenly gained all those advantages, it would surely make a selling point out of it. This is how you should be thinking about your tool — as a solution boosted by the Apify platform. Don't ever forget that advantage.
+
+What data can [Actor] extract?
+
+What data can you extract from [target website]?
+
+- Create a table that represents the main data points that the Actor can extract. You don't have to list every single one, just list the most understandable and relatable ones.
+
+Depending on the complexity of your Actor, you might include one or all three of these sections. It will also depend on what your Actor does. If your Actor has simple input but does a lot of steps for the user under the hood (like a bundle would), you might like to include the "What can this Actor do?" section. If your Actor extracts data, it makes sense to include a section with a table.
+
+### Tutorial section
+
+This could be a simple step-by-step list or a paragraph with a link to a tutorial on a blog.
+
+A step-by-step section is reassuring for the user, and it can be a section optimized for Google.
+
+How do I use [Actor] to scrape website data?
+
+### Pricing
+
+How much will it cost to scrape [target site]?
+
+How much will scraping [target site] cost?
+
+Is scraping [target site] free?
+
+How much does it cost to extract [target site] data?
+
+Web scraping can be very unpredictable because there are a lot of elements involved in order for the process to be successful: the complexity of the website, proxies, cookies, etc. This is why it's important to set the pricing and scraping volume expectations for your users.
+
+You might think the top part of the Actor detail page already indicates pricing. But this paragraph can still be useful. First of all, cost-related questions can show up on Google if they are SEO-optimized. Second, you can use this space to inform and reassure the user about the pricing, give more details about it, or entice them with the promise of very scalable scraping.
+
+- If it's a consumption pricing model (only consumed CUs), you can use this space to set expectations and explain what it means to pay for Compute Units. Similarly, if it's a rental Actor, you can also use this paragraph to set expectations. Talk about the average amount of data that can be scraped for a given price. Make it easy for users to imagine how much they will pay for a given dataset.
This will also make it easier for them to compare your solution with others on the market price-wise and value-wise.
+- If it's price per result, you can extrapolate how many results a user can get on a free plan and also entice them with a larger plan and how many thousands of results they can get with that.
+- If it's a bundle that consists of a couple of Actors that are priced differently, you can use this section to talk about the difference between all the Actors involved and how that will affect the final price of a run.
+
+In any case, on top of setting expectations and reassuring users, this paragraph can get into Google. If somebody is Googling "How much does it cost to scrape [website]", they might come across this part of your README and it will lead them from Google search directly to your Actor's detail page. So you don't want to miss that opportunity.
+
+![readme example](images/readme.png)
+
+### Input and output examples
+
+This is what people click on the most in the table of contents of the README. After they are done scrolling through the first part of the README, users are interested in how difficult the input is, what it looks like, and what kind of information they can expect.
+
+**Input**: often a screenshot of the input schema. This is also a way for people to see the platform even before they create an account.
+
+**Output**: can be shown as a screenshot if your output schema looks like something you would want to promote to users. You can also just include a JSON example containing a few objects. Even better if there's continuity between the input example and output example.
+
+If your datasets come out too complex and you want to save your users some scrolling, you can also show multiple output examples: one for reviews, one for contact details, one for ads, etc.
+
+### Other Actors
+
+Don't forget to promote your other Actors. While our system for Actor recommendation works (you can see related Actors at the bottom of the README), it only works within the same category or for Actors with similar names. It won't recommend a completely different Actor from the same creator. So make sure to interconnect your work by taking the initiative yourself. You can mention your other Actors in a list or as a table.
+
+### FAQ, disclaimers, and support
+
+The FAQ is a section where you can keep all the secondary questions that might still come up.
+
+Here are just a few things we usually push to the FAQ section.
+
+- disclaimers and legality
+- comparison table between your Actor and similar solutions
+- information about the official API and how the scraper is a stand-in for it (SEO)
+- questions brought up by the users
+- tips on how best to use the Actor
+- troubleshooting and mentioning known bugs
+- mentioning the Issues tab and highlighting that you're open to feedback and collecting it
+- mentioning being open to creating a custom solution based on the current one and showing a way to contact you
+- interlinking
+- mentioning the possibility of transferring data using an API — API tab
+- possibility for integrations
+- use cases for the data scraped, success stories exemplifying the use of data
+
+## Format of the README
+
+### Markdown
+
+The README has to be written in Markdown. The most important elements are H2 and H3 headings, links to pages, links to images, and tables. For specific formatting, you can try using basic HTML. That will also work. CSS won’t.
+
+### HTML use
+
+You can mix HTML with Markdown interchangeably.
Either will display in the Actor README on the Apify platform. That gives you more freedom to use HTML when needed. Remember, don't try CSS.
+
+### Tone of the README
+
+Apify Store has many Actors in its stock, and it's only growing. The advantage of an Actor is that it can be anything, as versatile or complex as needed - from a single URL type of input to complex features that give the user customized control over the input parameters. There are Actors that are intended for users who aren't familiar with coding and don't have any experience with it. Ideally, the README should reflect the level of skill needed to use the Actor.
+
+The tone of the README should make it immediately obvious who the tool is aimed at. If your tool's input includes glob patterns or requires looking for selectors, that should be immediately visible from the README. Before the user even tries the tool. Trying to simplify this information using simple words with ChatGPT can be misleading to the user. You will attract the wrong audience, and they will end up churning or asking you too many questions.
+
+And vice versa. If your target audience is people with little to no coding skills, who just prefer point-and-click solutions, this should be visible from the README. Speak in regular terms, and avoid code blocks or complex information at the beginning unless it's absolutely necessary. This means that, when people land on your Actor detail page, they will have their expectations set from the get-go.
+
+### Length of a README
+
+When working on improving a README, we regularly look at heatmaps that show us where our website visitors spend most of their time. From our experience, most first-time visitors don't scroll past the first 25% of a README. That means that the first quarter of the README is where you want to focus most of your attention if you're trying to persuade the page visitor to try your Actor.
+
+From the point of view of acquisition, the first few sections should make it immediately obvious what the tool is about, how hard it is to use, and who it is created for. This is why, in Apify's READMEs, you can see our first few paragraphs are built in such a way as to explain these things and reassure the visitors that anyone can use these tools.
+
+From the point of view of retention, it doesn't mean you can't have long or complex READMEs or not care for the information beyond the 25% mark. Since the README is also intended to be used as a backup when something goes wrong or the user needs more guidance, your users will come back to it multiple times.
+
+### Images and videos
+
+As for using screenshots and GIFs, put them in some sort of image hosting. Your own GitHub repository would be best because you have full control over it. Name the images with SEO in mind and try to keep them compressed but of good enough quality. You don't want an image or GIF to take too long to load.
+
+One trick is not only to add images but also to make them clickable. For some reason, people like clicking on images - at least they try to, judging by the heatmaps. You can lead the screenshot clicks towards a signup page, which is possible with Markdown.
+
+If your screenshot seems too big or occupies too much space, you can make it smaller by using HTML.
+
+To embed a YouTube video, all you have to do is include its URL. No further formatting is needed; the thumbnail will render itself on the README page.
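+
+For example, here's a short Markdown sketch of both tricks - a clickable screenshot that leads to the sign-up page, plus a YouTube URL on its own line. The image URL and video ID are placeholders:
+
+```markdown
+[![Actor input screenshot](https://raw.githubusercontent.com/your-account/your-actor/main/images/input.png)](https://console.apify.com/sign-up)
+
+https://www.youtube.com/watch?v=YOUR_VIDEO_ID
+```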
+ +:::tip Try Carbon for code + +If you want to add snippets of code anywhere in your README, you can use [Carbon](https://github.com/carbon-app/carbon). + +::: + +If you need quick Markdown guidance, check out [https://www.markdownguide.org/cheat-sheet/](https://www.markdownguide.org/cheat-sheet/). + + +## README and SEO + +Your README is your landing page. + +If there were only one thing to remember about READMEs on Apify Store, it would be this. A README on Apify Store is not just dry instructions on how to use your Actor. It has much more potential than that. + +In the eyes of Google, your Actor's detail page, aka the README, is a full-fledged landing page containing all the most important information to be found and understood by users. + +Of course, that all only counts if your README is both well formatted and rich in keywords. We'll talk about that part later on. + +What makes a good README? + +A good README has to balance what you want your page visitors to know, what your users will turn to when they run into trouble, and what Google registers when it's indexing pages and deciding which ones deserve to rank higher. + +### Table of contents + +The H1 of your page is the Actor name, so you don't have to set that up. Don't add more H1s. README headings should be H2 or H3. H2 headings will make up the table of contents on the right. So if you don't want the table to be too crowded, keep the H2s to the basics and push all the longer phrases and questions to H3s. H3s will stay hidden in the accordion in the default state until the visitor hovers their cursor over it. H4 headings can also be included, of course, but they won't show up as a part of the table of contents. + +### Keyword opportunities + +Do SEO research for keywords and see how they can fit organically into the text. Prioritize the H2s and H3s, then the regular text. Add new keyword-heavy paragraphs if you see an opportunity. + +The easiest sections to include keywords in are, for example: + +- API, as in Instagram API +- data, as in extract Instagram data +- Python, as in extract data in Python +- scrape, as in how to scrape X +- scraping, as in scraping X + +Now, could every H2 just say exactly what it is about, without SEO? Of course. You don't have to optimize your H2s and H3s, and you're free to call them simply Features, How it works, Pricing, Support, etc., or not even have many H2s at all and keep it all as one page. + +However, the H2s and H3s are what sometimes get into Google Search results. If you're familiar with the People Also Ask section, that's the best place to match your H2s. They can also get highlighted in the Sitelinks of Google Search results. + +Any part of your README can make it onto Google pages: the intro sentence describing what your Actor is about, a video, a random question. Each one can become a good candidate for those prime Google pages. That's why it's important to structure and write your README with SEO in mind. + +### Importance of including a video + +If your page has a video, it has a better chance of ranking higher in Google. + +## README and input schema + +The README should serve as a fallback for your users if something isn't immediately obvious in the input schema. There's also only so much space in the input schema and the tooltips, so naturally, if you want to provide more details about something, e.g. input, formatting, or expectations, you should put it in the README and refer to it from the relevant place in the input schema.
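+
+For example, a field description in the input schema can link straight to the relevant README section. This is a sketch of one field from a schema's `properties` object; the field name and URL are illustrative:
+
+```json
+{
+  "startUrls": {
+    "title": "Start URLs",
+    "type": "array",
+    "editor": "requestListSources",
+    "description": "Accepted URL formats are explained in <a href='https://apify.com/your-name/your-actor#input' target='_blank'>the README</a>."
+  }
+}
+```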
+ +Learn about [How to create a great input schema](/academy/actor-marketing-playbook/product-optimization/how-to-create-a-great-input-schema) + +## Readme elements template + +1. What does (Actor name) do? + - in 1–2 sentences describe what the Actor does and what it does not do + - consider adding keywords like API, e.g. Instagram API + - always have a link to the target website in this section +2. Why use (Actor name)? or Why scrape (target site)? + - how it can be beneficial for the user + - business use cases + - link to a success story, a business use case, or a blog post +3. How to scrape (target site) + - link to "How to…" blogs, if one exists (or suggest one if it doesn't) + - add a video tutorial or GIF from an ideal Actor run + +:::tip Embedding YouTube videos + +For better user experience, Apify Console automatically renders every YouTube URL as an embedded video player. Simply add a separate line with the URL of your YouTube video. + +::: + +- Consider adding a short numbered tutorial, as Google will sometimes pick these up as rich snippets. Remember that this might be in search results, so you can repeat the name of the Actor and give a link, e.g. + +4. Is it legal to scrape (target site)? + - This can be used as boilerplate text for the legal section, but you should use your own judgment and also customize it with the site name. + + > Our scrapers are ethical and do not extract any private user data, such as email addresses, gender, or location. They only extract what the user has chosen to share publicly. We therefore believe that our scrapers, when used for ethical purposes by Apify users, are safe. However, you should be aware that your results could contain personal data. Personal data is protected by the GDPR in the European Union and by other regulations around the world. You should not scrape personal data unless you have a legitimate reason to do so. If you're unsure whether your reason is legitimate, consult your lawyers. You can also read our blog post on the legality of web scraping. + > +5. Input + - Each Actor detail page has an input tab, so you just need to refer to that. If you like, you can add a screenshot showing the user what the input fields will look like. + - This is an example of how to refer to the input tab: + + > Twitter Scraper has the following input options. Click on the input tab for more information. + > +6. Output + - Mention “You can download the dataset extracted by (Actor name) in various formats such as JSON, HTML, CSV, or Excel.” + - Add a simplified JSON dataset example, like here: https://apify.com/compass/crawler-google-places#output-example +7. Tips or Advanced options section + - Share any tips on how to best run the Actor, such as how to limit compute unit usage, get more accurate results, or improve speed. + +If you want some general tips on how to make a GitHub README that stands out, check out these guides. Not everything in there will be suitable for an Apify Actor README, so you should cherry-pick what you like and use your imagination. + +## Resources + +[Build a Stunning README For Your GitHub Profile](https://towardsdatascience.com/build-a-stunning-readme-for-your-github-profile-9b80434fe5d7) + +[How to Create a Beautiful README for Your GitHub Profile](https://yushi95.medium.com/how-to-create-a-beautiful-readme-for-your-github-profile-36957caa711c) + + + +--- +title: Importance of Actor URL +description: Learn how to set your Actor’s URL (technical name) and name effectively when creating it on Apify.
Follow best practices to optimize your Actor’s web presence and ensure it stands out on Apify Store. +sidebar_position: 2 +category: apify platform +slug: /actor-marketing-playbook/actor-basics/importance-of-actor-url +--- + +**The Actor URL (or technical name, as we call it) is the page URL of the Actor shown on the web. When you're creating an Actor, you can set the URL yourself along with the Actor name. Here are best practices on how to do it well.** + +![actor url example](images/what-is-actor-url.png) + +--- + +## Why is the Actor URL so important? + +The Actor URL plays a crucial role in SEO. Google doesn't just read the Actor's name or README; it also analyzes the URL. The _URL is one of the first signals to Google about the content of your page_: whether it's a product listing, a tool, a blog post, a landing page for a specific offering, or something else entirely. Therefore, it's important to know how to use this shorthand to your advantage and clearly communicate to Google what your page offers. + +:::tip Choose the URL carefully + +This part of the manual is only applicable to new Actors. _Once set, existing Actor URLs shouldn't change_. + +::: + +## How to choose a URL + +The right naming can propel or hinder the success of the Actor on Google Search. Just as naming your Actor is important, so is choosing its URL. The only difference is, once set, the URL is intended to be permanent (more on this [later](/academy/actor-marketing-playbook/actor-basics/importance-of-actor-url)). What's the formula for the best Actor URL? + +### Brainstorming + +What does your Actor do? Does it scrape, find, extract, automate, connect? Think of these when you are looking for a name. You might already have a code name in mind, but it’s essential to ensure it stands out and is distinct from similar names—both on Google and on Apify Store. + +### Matching URL and name + +The easiest way is to make sure the Actor name and the technical name match. As in TikTok Scraper (tiktok-scraper) or Facebook Data Extractor (facebook-data-extractor). But they can also be different. + +### SEO + +The name should reflect not only what the Actor does (or what website it targets), but also what words people use when they search for it. This is why it's also important to do SEO research to see which keywords work best for the topic. Ideally, the URL should include a keyword that has low complexity (low competition) but high traffic (high demand). + +Learn more about SEO research and the best tools for it here: [SEO](/academy/actor-marketing-playbook/promote-your-actor/seo) + +### Inspiration in Apify Store + +Explore Store URLs of similar Actors. But avoid naming your Actor too similarly to what already exists, for these two reasons: + +1. There’s evidence that new URLs that are similar to existing ones can have drastically different levels of success. The first URL might thrive while a similar one published later struggles to gain traction. For example, _onedev/pentagon-scraper_ was published first and has almost 100x the traction of _justanotherdev/pentagon-scraper_. It will be very hard for the latter to beat the former. The reason for this is that Google operates on a “first come, first served” basis, and once that's set, it is very hard to make Google change its ways and pay attention to new pages with a similar name. +2. As Apify Store is growing, it's important to differentiate yourself from the competition. A different URL is just one more way to do that.
If a person is doing research on Apify Store, they will be less likely to get confused between two tools with the same name. + +### Length of URL + +Ideally, keep it under four words. As in _Facebook Data Extractor_ (_facebook-data-extractor_), not (_facebook-data-meta-online-extractor-light_). If the name is long and you're trying to match it with your URL, keep only the most essential words for the URL. + +### Variations + +It can be a long-tail keyword with the tool type in it: scraper, finder, extractor. But you can also consider keywords that include terms like API, data, and even variations of the website name. Check out what keywords competitors outside of Apify Store are using for similar tools. + +### Nouns and adjectives + +One last tip on this topic is to _avoid adjectives and verbs_. Your page is about a tool, so keep it to nouns. Anything regarding what the tool does (scrape, automate, import) and what it's like (fast, light, best) can be expressed in the Actor's name, not the Actor's URL. Adding an adjective or verb like that either does nothing for SEO or might even damage the SEO chances of the page. + +## Why you shouldn’t change your Actor URL + +:::tip Don't change the URL + +There's only one rule about the Actor URL: don't change it. The Actor's name, however, can be changed without any problems. + +::: + +Once set, the page URL should not be changed, for two important reasons: + +- Google dislikes changes to URLs. Once your Actor has built up keyword associations and familiarity with Google, regaining that standing after a URL change can be challenging. You will have to start from scratch. +- Current integrations will break for your Actor's users. Keeping the URL stable is essential for maintaining functionality. + +If you absolutely have to change the URL, you will have to communicate that fact to your users. + +💡 Learn more about the easiest ways to communicate with your users: [Emails to Actor users](/academy/actor-marketing-playbook/interact-with-users/emails-to-actor-users) + +## How and where to set the Actor URL + +In Apify Console, open the **Actor's page**, then click on **…** in the top right corner, and choose ✎ **Edit name or description**. Then set the URL in the **Unique name** ✎ field and click **Save**. + +![set actor url in console](images/how-and-where-to-set-the-actor-url-console.png) + +![set the actor url](images/how-and-where-to-set-the-actor-url.png) + + +## FAQ + +#### Can the Actor URL be different from the Actor name? + +Yes. While they can be the same, they don’t have to be. For the best user experience, keeping them identical is recommended, but you can experiment with the Actor's name. Just avoid changing the Actor URL. + +#### Can I change a very fresh Actor URL? + +Yes, but act quickly. It takes Google a few days to start recognizing your page. For this reason, if you really have to, _it is best to change the Actor's URL in the first few days_, before you build a steady user base and rapport with Google. + +#### How long does it take Google to pick up on the new URL? + +Google reindexes Apify web pages almost every day. It might take anywhere from 3 to 7 days for it to pick up a new URL. Or it might happen within a day. + +#### Can I use a technical name identical to another Actor's? + +Yes, you can. But it will most likely lower your chances of being noticed by Google. + +#### Does changing my Apify account name affect the Actor URL? + +Yes. If you're changing from _justanotherdev/pentagon-scraper_ to _dev/pentagon-scraper_, it counts as a new page.
Essentially, the consequences are the same as after changing the technical name of the Actor. + + + +--- +title: Name your Actor +description: Learn Apify’s standards for naming Actors and how to choose the right name for your scraping and automation tools and maximize visibility on Apify Store. +sidebar_position: 1 +category: apify platform +slug: /actor-marketing-playbook/actor-basics/name-your-actor +--- + +**Apify's standards for Actor naming. Learn how to choose the right name for scraping and automation Actors and how to optimize your Actor for search engines.** + +--- + +Naming your Actor can be tricky, especially after you’ve worked hard on it. To help people find your Actor and make it stand out, we’ve set some naming guidelines. These will help your Actor rank better on Google and keep things consistent on [Apify Store](https://apify.com/store). + +Ideally, you should choose a name that clearly shows what your Actor does and includes keywords people might use to search for it. + +## Parts of Actor naming + +Your Actor's name consists of four parts: the actual name, SEO name, URL, and GitHub repository name. + +- Actor name (name shown in Apify Store), e.g. _Booking Scraper_. + - Actor SEO name (name shown on Google Search, optional), e.g. _Booking.com Hotel Data Scraper_. + - If the SEO name is not set, the Actor name will be the default name shown on Google. +- Actor URL (technical name), e.g. _booking-scraper_. + - More on it on the [Importance of Actor URL](/academy/actor-marketing-playbook/actor-basics/importance-of-actor-url) page. +- GitHub repository name (best to keep it similar to the other ones, for convenience), e.g. _actor-booking-scraper_. + +## Actor name + +The Actor name provides a human-readable name. The name is the most important real estate from an SEO standpoint. It should exactly match the most likely search query that potential users of your Actor will use. At the same time, it should give your Actor a clear name for people who will use it every day. + +:::tip + +Your Actor's name should be _40–50 characters_ long. You can change your Actor name freely in Apify Console. + +::: + +### Actor name vs. SEO name + +There's an option to step away from your Actor's name for the sake of search engine optimization — the Actor SEO name. The Actor name and Actor SEO name serve different purposes: + +- _Actor name_: this is the name visible in Apify Store and Console. It should be easy for users to understand and quickly show what your Actor does. It’s about attracting users who browse the Store. + + ![actor name example](images/actor-name.png) + +- _Actor SEO name_: this is the name that appears in search engine results. It should include keywords people might search for to find your Actor. It’s about improving visibility on search engines and encouraging users to click on your link. + + ![actor seo name example](images/actor-seo-name.png) + +For example: + +- _Actor name_: YouTube Scraper +- _Actor SEO name_: YouTube data extraction tool for video analysis + +Here, the SEO name uses extra keywords to help people find it through search engines, while the Actor name is simpler and easier for users to understand and find on Apify Store. + +💡 When creating the SEO name, focus on using relevant keywords that potential users might search for. It should still match what your Actor does. More about SEO name and description: [Actor description and SEO description] + +### Actor name vs. technical name
+ +The Actor name and technical name (or URL) have different uses: + +- _Actor name_: this is the name users see on Apify Store and Console. It’s designed to be user-friendly and should make the Actor's purpose clear to anyone browsing or searching for it. +- _Technical name_: this is a simplified, URL-friendly version used in technical contexts like API calls and scripts. This name should be concise and easily readable. Once set, it should not be changed, as that can affect existing integrations and cause broken links. + +For example: + +- _Actor name_: Google Search Scraper +- _Technical name_: google-search-scraper + +The Actor name is user-friendly and descriptive, while the technical name is a clean, URL-compatible version. Note that the technical name does not include spaces or special characters to ensure it functions properly in technical contexts. + +:::important + +This is important for SEO! Once set, the technical name should not be changed. Make sure you finalize this name early in development. More on why here: [Importance of Actor URL](/academy/actor-marketing-playbook/actor-basics/importance-of-actor-url) + +::: + +## Best practices for naming + +### Brainstorming + +What does your Actor do? Does it scrape, find, extract, automate, connect, or upload? When choosing a name, ensure it stands out and is distinct from similar names both on Google and on Apify Store. + +- _Use nouns and variations_: use nouns like “scraper”, “extractor”, “downloader”, “checker”, or “API” to describe what your Actor does. You can also include terms like API, data, or variations of the website name. +- _Include key features_: mention unique features or benefits to highlight what sets your Actor apart. +- _Check for uniqueness_: ensure your name isn’t too similar to existing Actors to avoid confusion and help with SEO. + +### Match name and URL + +The simplest approach is to make all names match. For example, TikTok Ads Scraper (tiktok-ads-scraper) or Facebook Data Extractor (facebook-data-extractor). However, variations are acceptable. + +### Name length + +Keep the name concise, ideally fewer than four words. For instance, Facebook Data Extractor is preferable to Facebook Meta Data Extractor Light. + +### Check Apify Store for inspiration + +Look at the names of similar Actors on Apify Store, but avoid naming your Actor too similarly. By choosing a unique name, you can stand out from the competition. This will also reduce confusion and help users easily distinguish your Actor. + +### Keep SEO in mind + +Even though you can set a different variation for the SEO name specifically, consider doing a bit of research when setting the regular name as well. The name should reflect what the Actor does and the keywords people use when searching for it. If the keywords you find sound too robotic, save them for the SEO name. But if they sound like something you'd search for, they're good candidates for a name. + +You can also check the keywords competitors use for similar tools outside Apify Store. + +### Occasionally experiment + +You can test and refine your SEO assumptions by occasionally changing the SEO name. This allows you to track how changes to names affect search rankings and user engagement. Changing the regular name is not forbidden but still less desirable, since it can confuse your existing users and also affect SEO. + +## Naming examples + +### Scraping Actors + +✅: + +- Technical name (Actor's name in the [Apify Console](https://console.apify.com/)): `${domain}-scraper`, e.g. youtube-scraper. +- Actor name: `${Domain} Scraper`, e.g. YouTube Scraper.
+- Name of the GitHub repository: `actor-${domain}-scraper`, e.g. actor-youtube-scraper. + +❌: + +- Technical name: `the-scraper-of-${domain}`, e.g. the-scraper-of-youtube. +- Actor name: `The Scraper of ${Domain}`, e.g. The Scraper of YouTube. +- GitHub repository: `actor-the-scraper-of-${domain}`, e.g. actor-the-scraper-of-youtube. + +If your Actor only caters to a specific service on a domain (and you don't plan on extending it), add the service to the Actor's name. + +For example: + +- Technical name: `${domain}-${service}-scraper`, e.g. google-search-scraper. +- Actor name: `${Domain} ${Service} Scraper`, e.g. [Google Search Scraper](https://apify.com/apify/google-search-scraper). +- GitHub repository: `actor-${domain}-${service}-scraper`, e.g. actor-google-search-scraper. + +### Non-scraping Actors + +Naming for non-scraping Actors is more liberal. Being creative and considering SEO and user experience is a good place to start. Think about what your users will type into a search engine when looking for your Actor. What is your Actor's function? + +Below are examples for the [Google Sheets](https://apify.com/lukaskrivka/google-sheets) Actor. + +✅: + +- Technical name: google-sheets. +- Actor name: Google Sheets Import & Export. +- GitHub repository: actor-google-sheets. + +❌: + +- Technical name: import-to-and-export-from-google-sheets. +- Actor name: Actor for Importing to and Exporting from Google Sheets. +- GitHub repository: actor-for-import-and-export-google-sheets. + +:::warning Renaming your Actor + +You may rename your Actor freely, except when it comes to the Actor URL. Remember to read [Importance of Actor URL](/academy/actor-marketing-playbook/actor-basics/importance-of-actor-url) to find out why! + +::: + + + +label: Interact with users +position: 4 + + + +--- +title: Emails to Actor users +description: Email communication is a key tool to keep users engaged and satisfied. Learn when and how to email your users effectively to build loyalty and strengthen relationships with this practical guide. +sidebar_position: 1 +category: apify platform +slug: /actor-marketing-playbook/interact-with-users/emails-to-actor-users +--- + +**Getting users is one thing, but keeping them is another. While emailing your users might not seem like a typical marketing task, any seasoned marketer will tell you it’s essential. It’s much easier to keep your current users happy and engaged than to find new ones. This guide will help you understand when and how to email your users effectively.** + +--- + +## Whom and where to email + +You can email the audience of a specific Actor directly from Apify Console. Go to **Actors > Emails > Compose new +**. From there, select the Actor whose users you want to email, write a subject line, and craft your message. An automatic signature will be added to the end of your email. + +## How to write a good email + +Emails can include text, formatting, images, GIFs, and links. Here are four main rules for crafting effective emails: + +1. Don’t email users without a clear purpose. +2. Keep your message concise and friendly. +3. Make the subject line direct and to the point. Consider adding an emoji to give users a hint about the email’s content. +4. Use formatting to your advantage. Console emails support Markdown, so use bold, italics, and lists to highlight important details. + +Additional tips: + +- Show, don’t tell — use screenshots with arrows to illustrate your points. +- If you’re asking users to take action, include a direct link to what you're referring to.
+- Provide alternatives if it suits the situation. +- Always send a preview to yourself before sending the email to all your users. + +## When to email users + +Our general policy is to avoid spamming users with unnecessary emails. We contact them only if there's a valid reason. Here’s a list of common good reasons to contact the users of an Actor: + +### 1. Introducing a new feature of the Actor + +A new filter, faster scraping, changes in the input or output schema, a new integration, etc. + +>✉️ 🏙️ Introducing Deep city search for Tripadvisor scrapers +> +>Hi, +> +>Tired of Tripadvisor's 3000 hotels-per-search limit? We've got your back. Say hello to our latest baked-in feature: Deep city search. Now, to get all results from a country-wide search, you just need to set Max search results above 3000 and watch the magic happen. +> +>A bit of context: while Tripadvisor never limited the search for restaurants or attractions, hotel search was a different case; it always capped at 3000. Our smart search is designed to overcome that limit by including every city within your chosen location. We scrape hotels from each one, ensuring no hidden gems slip through the cracks. This feature is available for [Tripadvisor Scraper](https://console.apify.com/actors/dbEyMBriog95Fv8CW/console) and [Tripadvisor Hotels Scraper](https://console.apify.com/actors/qx7G70MC4WBE273SM/console). +> +>So get ready for an unbeatable hotel-hunting experience. Give it a spin, and let us know what you think! + +Introduce and explain the features, add a screenshot of a feature if it will show in the input schema, and ask for feedback. + +### 2. Actor adapting to the changes of the website it scrapes + +A common situation in web scraping that's out of your control. + +>✉️ 📣 Output changes for Facebook Ads Scraper +> +>Hi, +> +>We've got some news regarding your favorite Actor – [Facebook Ads Scraper](https://console.apify.com/actors/JJghSZmShuco4j9gJ/console). Recently, Facebook Ads have changed their data format. To keep our Actor running smoothly, we'll be adapting to these changes by slightly tweaking the Actor Output. Don't worry; it's a breeze! Some of the output data might just appear under new titles. +> +>This change will take place on October 10; please make sure to remap your integrations accordingly. +> +>Need a hand or have questions? Our support team is just one friendly message away. + +Inform users about the reason for the changes and how they impact them and the Actor + give them a date when the change takes effect. + +### 3. Actor changing its payment model (from rental to pay-per-result, for example) + +Email 1 (before the change, warning about deprecation). + +>✉️ 🛎 Changes to Booking Scraper +> +>Hi, +> +>We’ve got news regarding the Booking scraper you have been using. This change will happen in two steps: +> +>1. On September 22, we will deprecate it, i.e., new users will not be able to find it in Store. You will still be able to use it though. +>2. At the end of October, we will unpublish this Actor, and from that point on, you will not be able to use it anymore. +> +>Please use this time to change your integrations to our new [Booking Scraper](https://apify.com/voyager/booking-scraper). +> +>That’s it! If you have any questions or need more information, don’t hesitate to reach out. + +Warn the users about the deprecation and future unpublishing + add extra information about related Actors if applicable + give them steps and the date when the change takes effect.
+ +Email 2 (after the change, warning about unpublishing) + +>✉️ **📢 Deprecated Booking Scraper will stop working as announced 📢** +> +>Hi, +> +>Just a heads-up: today, the deprecated [Booking Scraper](https://console.apify.com/actors/5T5NTHWpvetjeRo3i/console) you have been using will be completely unpublished as announced, and you will not be able to use it anymore. +> +>If you want to continue to scrape Booking.com, make sure to switch to the [latest Actor version](https://apify.com/voyager/booking-scraper). +> +>For any assistance or questions, don't hesitate to reach out to our support team. + +Remind users to switch to the Actor with a new model. + +### 4. After a major issue + +Actor downtime, performance issues, Actor directly influenced by platform hiccups. + +>✉️ **🛠️ Update on Google Maps Scraper: fixed and ready to go** +> +>Hi, +> +>We've got a quick update on the Google Maps Scraper for you. If you've been running the Actor this week, you might have noticed some hiccups — scraping was failing for certain places, causing retries and overall slowness. +> +>We apologize for any inconvenience this may have caused you. The **good news is those performance issues are now resolved**. So feel free to resurrect any affected runs using the "latest" build; it should work like a charm now. +> +>Need a hand or have questions? Feel free to reply to this email. + +Apologize to users and/or let them know you're working on it or that everything is fixed now. This approach helps maintain trust and reassures users that you're addressing the situation. + +:::tip + +It might be an obvious tip, but if you're not great at emails, just write a short draft and ask ChatGPT to polish it. Play with the style until you find the one that suits you. You can even create templates for each situation. If ChatGPT is being too wordy, you can ask it to write at a 9th- or 10th-grade level, and it will use simpler words and sentences. + +::: + +## Emails vs. newsletters + +While sending an email is usually a quick way to address immediate needs or support for your users, newsletters can be a great way to keep everyone in the loop on a regular basis. Instead of reaching out every time something small happens, newsletters let you bundle updates together. + +Unless it's urgent, it’s better to wait until you have 2 or 3 pieces of news and share them all at once. Even if those updates span different Actors, it’s perfectly fine to send one newsletter to all relevant users. + +Here are a few things you can include in your newsletter: + +- updates or new features for your Actors or Actor-to-Actor integrations +- an invitation to a live webinar or tutorial session +- asking your users to upvote your Actor, leave a review, or give it a star +- a quick feedback request after introducing new features +- spotlighting a helpful blog post or guide you wrote or found +- sharing success stories or use cases from other users +- announcing a promotion or a limited-time discount +- links to your latest YouTube videos or tutorials + +Newsletters are a great way to keep your users engaged without overwhelming them. Plus, it's an opportunity to build a more personal connection by showing them you’re actively working to improve the tools they rely on. + +## Emailing a separate user + +There may be times when you need to reach out to a specific user — whether it’s to address a unique situation, ask a question that doesn’t fit the public forum of the **Issues tab**, or explore a collaboration opportunity.
While there isn’t a quick way to do this through Apify Console just yet, you can ensure users can contact you by **adding your email or other contact info to your Store bio**. This makes it easy for them to reach out directly. + +✍🏻 Learn best practices on how to use your Store bio to connect with your users: [Your Store bio](/academy/actor-marketing-playbook/interact-with-users/your-store-bio). + + + +--- +title: Issues tab +description: Learn how the Issues tab can help you improve your Actor, engage with users, and build a reliable, user-friendly solution. +sidebar_position: 2 +category: apify platform +slug: /actor-marketing-playbook/interact-with-users/issues-tab +--- + +**Once you publish your Actor in Apify Store, it opens the door to new users, feedback, and… issue reports. Users can create issues and add comments after trying your Actor. But why is this space so important?** + +--- + +## What is the Issues tab? + +The Issues tab is a dedicated section on your Actor’s page where signed-in users can report problems, share feedback, ask questions, and have conversations with you. You can manage each issue thread individually, and the whole thread is visible to everyone. The tab is divided into three categories: **Open**, **Closed**, and **All**, and it shows how long ago each response was posted. While only signed-in users can post and reply, all visitors can see the interactions, giving your page a transparent and welcoming vibe. + +:::note Keep active + +On the web, your average 🕑 **Response time** is calculated and shown in your Actor Metrics. The purpose of this metric is to make it easy for potential users to see how active you are and how well-maintained the Actor is. + +::: + +You can view all the issues related to your Actors by going to **Actors** > [**Issues**](https://console.apify.com/actors?tab=issues) in Apify Console. Users can get automatic updates on their reported issues or subscribe to issues they are interested in, so they stay informed about any responses. When users report an issue, they’re encouraged to share their run, which helps you get the full context and solve the problem more efficiently. Note that shared runs aren’t visible on the public Actor page. + +## What is the Issues tab for? + +The tab is a series of conversations between you and your users. There are existing systems like GitHub for that. So why create a separate system like the Issues tab? Since the Issues tab exists both in private space (Console) and public space (the Actor's page on the web), it can fulfill two different sets of purposes. + +### Issues tab in Apify Console + +Originally, the Issues tab was only available in Apify Console, and its main goals were: + +- Convenience: a single space to hold the communication between you and your users. +- Unity and efficiency: make sure multiple users don't submit the same issue through multiple channels or multiple times. +- Transparency: make sure users have their issues addressed publicly and professionally. You can’t delete issues, only close them, so there's a clear record of what's been resolved and how. +- Quality of service and innovation: make sure the Actor gets fixed and continuously improved, and users get the quality scraping services they pay for. + +### Issues tab on the web + +Now that the Issues tab is public and on the web, it also serves other goals: + +- Credibility: new users can check how active and reliable you are by looking at the issues and your average 🕑 **Response time** even before trying your Actor.
It also sets expectations for when users can expect a response from you. +- Collaboration: developers can learn from each other’s support styles, which motivates everyone to maintain good interactions and keep up good quality work. +- SEO boost: every issue now generates its own URL, potentially driving more keyword traffic to your Actor's page. + +## Example of a well-managed Issues tab + +Check out how the team behind the **Apollo.io leads scraper** manages their [Issues tab](https://apify.com/curious_coder/apollo-io-scraper/issues/open) for a great example of professional responses and quick problem-solving. + +Note that this Actor is a rental, so users expect a high-quality service. + +![issues tab example](images/issues-tab-example.png) + +:::warning + +Once your Actor is public, you’re required to have an Issues tab. + +::: + +## SEO for the Issues tab + +Yes, you read that right! The public Issues tab can boost your search engine visibility. Each issue now has its own URL, which means every report could help your Actor rank for relevant keywords. + +When we made the tab public, we took inspiration from StackOverflow’s SEO strategy. Even though StackOverflow started as a Q&A forum, its strong SEO has been key to its success. Similarly, your Actor’s Issues tab can help bring in more traffic, with each question and answer potentially generating more visibility. This makes it easier for users to find solutions quickly. + +## Tips for handling Actor issues + +1. _Don’t stay silent_ + + Respond quickly, even if it’s just a short note. If an issue takes weeks to resolve, keep the user in the loop. A quick update prevents frustration and shows the user (and others following it) that you’re actively working on solving the issue. + +2. _Encourage search to avoid duplication_ + + Save time by encouraging users to search for existing issues before submitting new ones. If a similar issue exists, they can follow that thread for updates instead of creating a new one. + +3. _Encourage reporters to be specific_ + + The more context, the better! Ask users to share details about their run, which helps you diagnose issues faster. If needed, remind them that runs are shared privately, so sensitive data won’t be exposed. + +4. _Use screenshots and links_ + + The same goes for your side. Screenshots and links to specific runs make your answers much clearer. It’s easier to walk the user through a solution if they can see what you’re referencing. + +5. _Structure issue reporting_ + + As you get more experienced, you’ll notice common types of issues: bugs, feature requests, questions, reports, misc. This way, you can prioritize and respond faster based on the category. + +6. _Have ready answers for common categories_ + + Once you recognize recurring types of issues, have pre-prepared responses. For example, if it’s a bug report, you might already have a troubleshooting guide you can link to, or if it’s a feature request, you can figure out the development timeline. + +7. _Be polite and precise_ + + Politeness goes a long way! Make sure your responses are respectful and straight to the point. It helps to keep things professional, even if the issue seems minor. + + +Further reading: [Best practices for using GitHub Issues](https://rewind.com/blog/best-practices-for-using-github-issues/) + + + +--- +title: Your Store bio +description: Your Apify Store bio is all about helping you promote your tools & skills.
+sidebar_position: 3 +category: apify platform +slug: /actor-marketing-playbook/interact-with-users/your-store-bio +--- + +## Your Apify Store bio and Store “README” + +To help our community showcase their talents and projects, we introduced public profile pages for developers. On a dedicated page, you can showcase your contact info, a summary of important Actor metrics (like total users, response time, and success rates), and all of your public Actors. We took inspiration from freelance platforms. + +This space is all about helping you shine and promote your tools and skills. Here’s how you can use it to your advantage: + +- Share your contact email, website, GitHub, X (Twitter), LinkedIn, or Discord handles. +- Summarize what you’ve been doing in Apify Store, your main skills, big achievements, and any relevant experience. +- Offer more ways for people to connect with you, such as links for booking a meeting, discounts, a subscription option for your email newsletter, or your YouTube channel or blog. + - You can even add a Linktree to keep things neat. +- Highlight your other tools on different platforms. +- Get creative by adding banners and GIFs to give your profile some personality. + +Everything is neatly available under a single URL, making it easy to share. + +Need some inspiration? Check out examples of how others are using their Store bio and README. You can set yours up by heading to **Settings > Account > Profile**. + +- [https://apify.com/anchor](https://apify.com/anchor) +- [https://apify.com/jupri](https://apify.com/jupri) +- [https://apify.com/apidojo](https://apify.com/apidojo) +- [https://apify.com/curious_coder](https://apify.com/curious_coder) +- [https://apify.com/epctex](https://apify.com/epctex) +- [https://apify.com/microworlds](https://apify.com/microworlds) + + + +label: Product optimization +position: 5 + + + +--- +title: Actor bundles +description: Learn what an Actor bundle is, explore existing examples, and discover how to promote them. +sidebar_position: 2 +category: apify platform +slug: /actor-marketing-playbook/product-optimization/actor-bundles +--- + +**Learn what an Actor bundle is, explore existing examples, and discover how to promote them.** + +--- + +## What is an Actor bundle? + +If an Actor is an example of web automation software, what is an Actor bundle? An Actor bundle is basically a chain of multiple Actors unified by a common use case. Bundles can include both scrapers and automation tools, and they are usually designed to achieve an overarching goal related to scraping or automation. + +The concept of an Actor bundle originated from frequent customer requests for comprehensive tools. For example, someone would ask for a Twitter scraper that also performs additional tasks, or for a way to find all profiles of the same public figure across multiple social media platforms without needing to use each platform separately. + +For example, consider a bundle that scrapes company reviews from multiple platforms, such as Glassdoor, LinkedIn, and Indeed. Typically, you would need to use several different scrapers and then consolidate the results. But this bundle would do it all in one run, once provided with the name of the company. Or consider a bundle that scrapes all posts and comments of a given profile, and then produces a sentiment score for each scraped comment. + +The main advantage of an Actor bundle is its ease of use.
The user inputs a keyword or a URL, and the Actor triggers all the necessary Actors sequentially to achieve the desired result. The user is not expected to use each Actor separately and then process and filter the results themselves. + +### Examples of bundles + +🔍 [Social Media Finder](https://apify.com/tri_angle/social-media-finder) searches for profiles on 13 social media sites when provided with just a (nick)name. + +🍝 [Restaurant Review Aggregator](https://apify.com/tri_angle/restaurant-review-aggregator) gets restaurant reviews from Google Maps, DoorDash, Uber Eats, Yelp, Tripadvisor, and Facebook in one place. + +🤔 [Social Media Sentiment Analysis Tool](https://apify.com/tri_angle/social-media-sentiment-analysis-tool) not only collects comments from Facebook, Instagram, and TikTok but also performs sentiment analysis on them. It unites post scrapers, comment scrapers, and a text analysis tool. + +🦾 [Website Content Crawler + Pinecone bundle](https://apify.com/tri_angle/wcc-pinecone-integration) scrapes a website and stores the data in a Pinecone database to build and improve your own AI chatbot assistant. + +🤖 [Pinecone GPT Chatbot](https://apify.com/tri_angle/pinecone-gpt-chatbot) combines OpenAI's GPT models with Pinecone's vector database, which simplifies creating a GPT chatbot. + +As you can see, they vary in complexity and range. + +--- + +## Caveats + +### Pricing model + +Since bundles are still relatively experimental, profitability is not guaranteed and will depend heavily on the complexity of the bundle. + +However, if you have a solid idea for a bundle, don’t hesitate to reach out. Prepare your case, write to our support team, and we’ll help determine if it’s worth it. + +### Specifics of bundle promotion + +First of all, when playing with the idea of creating a bundle, always check the keyword potential. Sometimes, there are true keyword gems just waiting to be discovered, with high search volume and little competition. + +However, bundles may face the challenge of being "top-of-the-funnel" solutions. People might not search for them directly because they don't have a specific keyword in mind. For instance, someone is more likely to search for an Instagram comment scraper than to imagine a bundle that scrapes comments from 10 different platforms, including Instagram. + +Additionally, Google tends to favor tools with rather focused descriptions. If your tool offers multiple functions, it can send mixed signals that may conflict with each other rather than accumulate. + +Sometimes, even though a bundle can be a very innovative tool product-wise, it can be hard to market from an SEO perspective and match the search intent. + +In such cases, you may need to try different marketing and promotion strategies. Once you’ve exhausted every angle of SEO research, be prepared to explore non-organic marketing channels like Product Hunt, email campaigns, community engagement, Reddit, other social media, your existing customer base, word-of-mouth promotion, etc. + +Remember, bundles originated as customized solutions for specific use cases; they were not primarily designed to be easily found. + +This is also an opportunity to tell a story rather than just presenting a tool. Consider writing a blog post about how you created this tool, recording a video, or hosting a live webinar. If you go this route, it’s important to emphasize how the tool was created and what a technical feat it represents. + +That said, don’t abandon SEO entirely.
You can still capture some SEO value by referencing the bundle in the READMEs of the individual Actors that comprise it. For example, if a bundle collects reviews from multiple platforms, potential users are likely to search for review scrapers for each specific platform—Google Maps reviews scraper, Tripadvisor reviews scraper, Booking reviews scraper, etc. These keywords may not lead directly to your review scraping bundle, but they can guide users to the individual scrapers, where you can then present the bundle as a more comprehensive solution. + +--- + +## Resources + +Learn more about Actor bundles: https://blog.apify.com/apify-power-actors/ + + + +--- +title: How to create a great input schema +description: Optimizing your input schema. Learn to design and refine your input schema with best practices for a better user experience. +sidebar_position: 1 +category: apify platform +slug: /actor-marketing-playbook/product-optimization/how-to-create-a-great-input-schema +--- + +Optimizing your input schema. Learn to design and refine your input schema with best practices for a better user experience. + +--- + +## What is an input schema? + +So you've succeeded: your user has 1. found your Actor on Google, 2. explored the Actor's landing page, 3. decided to try it, and 4. created an Apify account. Now they’re on your Actor's page in Apify Console. The SEO fight is over. What’s next? + +Your user is finally one-on-one with your Actor — specifically, its input schema. This is the moment when they try your Actor and decide whether to stick with it. The input schema is your representative here, and you want it to work in your favor. + +Technically, the input schema is a `JSON` object with various field types supported by the Apify platform, designed to simplify the use of the Actor. Based on the input schema you define, the Apify platform automatically generates a _user interface_ for your Actor. + +Of course, you can create an Actor without setting up an elaborate input schema. If your Actor is designed for users who don't need a good interface (e.g. they’ll use a JSON object and call it via API), you can skip this guide. But most users engage with Actors in Manual mode, aka the Actor interface. So, if your Actor is complex or you’re targeting regular users who need an intuitive interface, it's essential to consider their experience. + +In this article, _we’ll refer to the input schema as the user interface_ of your Actor and focus exclusively on it. + +:::tip Understand input schemas + +To fully understand the recommendations in this blog post, you’ll first need to familiarize yourself with the [technical aspects of the input schema](https://docs.apify.com/platform/actors/development/actor-definition/input-schema). This context is essential to make good use of the insights shared here. + +::: + +## The importance of a good input schema + +Facing the Apify platform for the first time can feel intimidating. You only have a few seconds before a user assesses how easy your Actor is to use. + +If something goes wrong or is unclear with the input, an ideal user will first turn to the tooltips in the input schema. Next, they might check the README or tutorials, and finally, they’ll reach out to you through the **Issues** tab. However, many users won’t go through all these steps — they may simply get overwhelmed and abandon the tool altogether. + +A well-designed input schema is all about managing user expectations, reducing cognitive load, and preventing frustration.
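+
+For a sense of what we're working with, here's a minimal sketch of an input schema (all field names and values are illustrative; see the docs linked above for the full specification). It previews several elements discussed below: a prefill, a toggle, and a section:
+
+```json
+{
+  "title": "Input schema for Example Scraper",
+  "type": "object",
+  "schemaVersion": 1,
+  "properties": {
+    "startUrls": {
+      "title": "Start URLs",
+      "type": "array",
+      "editor": "requestListSources",
+      "description": "Add the URLs you want to scrape.",
+      "prefill": [{ "url": "https://example.com" }]
+    },
+    "maxResults": {
+      "title": "Max results",
+      "type": "integer",
+      "description": "Limit the number of results so the default run stays fast and cheap.",
+      "prefill": 10
+    },
+    "skipClosedPlaces": {
+      "title": "Skip closed places",
+      "type": "boolean",
+      "description": "Filter out places marked as closed.",
+      "default": false,
+      "sectionCaption": "Filters"
+    }
+  },
+  "required": ["startUrls"]
+}
+```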
+Ideally, a good input schema, as your first line of interaction, should: + +- Make the tool as easy to use as possible +- Reduce the user’s cognitive load and make them feel confident about using and paying for it +- Give users enough information and control to figure things out on their own +- Save you time on support by providing clear guidance +- Prevent incorrect or harmful tool usage, like overcharges or scraping personal information by default + +### Reasons to rework an input schema + +- Your Actor is complex and has many input fields +- Your Actor offers multiple ways to set up input (by URL, search, profile, etc.) +- You’re adding new features to your Actor +- Certain uses of the Actor have caveats that need to be communicated immediately +- Users frequently ask questions about specific fields + +👀 The input schema can be formatted using basic HTML. + +## Most important elements of the input schema + +You can see the full list of elements and their technical characteristics in the [Docs](https://docs.apify.com/academy/deploying-your-code/input-schema): titles, tooltips, toggles, prefills, etc. That's not what this guide is about. It's not enough to just create an input schema; you should ideally aim to place and word its elements to the user's advantage: to alleviate the user's cognitive load and make getting to know and using your tool as smooth as possible. + +Unfortunately, when it comes to UX, there's only so much you can achieve armed with HTML alone. So here are the best elements to focus on, along with some best practices for using them effectively: + +- **`description` at the top** + - As the first thing users see, the description needs to provide crucial information and a sense of reassurance if things go wrong. Key points to mention: the easiest way to try the Actor, links to a guide, and any disclaimers or other similar Actors to try. + + ![Input schema description example](images/description-sshot.png) + + - Descriptions can include multiple paragraphs. If you're adding a link, it’s best to use the `target="_blank"` attribute so your user doesn’t lose the original Actor page when clicking. +- **`title` of the field (regular bold text)** + - This is the default way to name a field. + - Keep it brief. The user’s flow should be 1. title → 2. tooltip → 3. link in the tooltip. Ideally, the title alone should provide enough clarity. However, avoid overloading the title with too much information. Instead, make the title as concise as possible, expand details in the tooltip, and include a link in the tooltip for full instructions. + + ![Input schema input example](images/title-sshot.png) + +- **`prefill`, the default input** + - This is your chance to show rather than tell. + - Keep the **prefilled number** low. Set it to 0 if it's irrelevant for a default run. + - Make the **prefilled text** example simple and easy to remember. + - If your Actor accepts various URL formats, add a few different **prefilled URLs** to show that possibility. + - Use the **prefilled date** format that the user is expected to follow. This way, they can learn the correct format without needing to check the tooltip. + - There’s also a type of field that looks like a prefill but isn’t — usually a `default` field. It’s not counted as actual input but serves as a mock input to show users what to type or paste. It is gray and disappears after clicking on it. Use this to your advantage. +- **toggle** + - The toggle is a boolean field. A boolean field represents a yes/no choice.
+ - So how would you word this toggle: **Skip closed places** or **Scrape open places only**? And should the toggle be enabled or disabled by default? + + ![Input schema toggle example](images/toggle-sshot.png) + + - You have to consider this when you're choosing how to word the toggle button and which choice to set up as the default. If you're making this more complex than needed (e.g. by using negation as the ‘yes’ choice), you're increasing your user's cognitive load. You might also cause them to receive far less, or far more, data than they need from a default run. + - In our example, we assume the default user wants to scrape all places but still have the option to filter out closed ones. However, they have to make that choice consciously, so we keep the toggle disabled by default. If the toggle were enabled by default, users might not notice it, leading them to think the tool isn't working properly when it returns fewer results than expected. +- **sections or `sectionCaption` (BIG bold text) and `sectionDescription`** + - A section looks like a wrapped toggle list. + + ![Input schema sections example](images/sections-sshot.png) + + - It is useful to section off non-default ways of input or extra features. If your tool is complex, don't leave all fields in the first section. Just group them by topic and section them off (see the screenshot above ⬆️). + - You can add a description to every section. Use `sectionDescription` only if you need to provide extra information about the section (see the screenshot below ⬇️). + - Sometimes `sectionDescription` is used as a space for disclaimers, so the user is informed of the risks from the outset instead of having to click on the tooltip. + + ![Input schema section description example](images/section-description-sshot.png) + +- tooltips or the `description` of the title + - To see the tooltip's text, the user needs to click on the `?` icon. + - This is your space to explain the title and what's going to happen in that field: any terminology, referrals to other fields of the tool, examples that don't fit the prefill, or caveats can be detailed here. Using HTML, you can add links, line breaks, code, and other regular formatting here. Use this space to add links to relevant guides, video tutorials, screenshots, issues, or README parts if needed. + - Wording in titles vs. tooltips: titles are usually nouns. They have a neutral tone and simply state what content the field accepts (**Usernames**). + - Tooltips to those titles are usually verbs in the imperative that tell the user what to do (_Add, enter, use_). + - This division is not set in stone, but the reason the tooltip uses an imperative verb is that, if the user is clicking on the tooltip, we assume they are looking for clarification or instructions on what to do. + + ![Input schema tooltips example](images/tooltips-sshot.png) + +- emojis (visual component) + - Use them to attract attention or as visual shortcuts. Use emojis consistently to invoke the user's iconic memory. The visual language should match across the whole input schema (and README) so the user can understand what section or field is referred to without reading the whole title. + - Don't overload the schema with emojis. They attract attention, so you need to use them sparingly. + +:::tip + +Read more on the use of emojis: [Actors and emojis] + +::: + +## Example of an improved input schema + +1. A well-used `description` space.
The description briefly introduces the possible scraping options, the visual language (sections represented by emojis), the easiest way to try the tool, and a link to a tutorial in case of issues. The description isn't too long, uses varied formatting, and looks reassuring. +2. The main section is introduced and visually separated from the rest. This is the space for the user to try the first run before they discover the other options. +3. The title says right away that this field refers to multiple other fields, not only the first section. +4. The `prefill` is a small number (so if users run the tool with default settings, it doesn't take too long and isn't expensive for them) and uses the language of the target website (not results or posts, but _videos_). +5. The tooltip expands with more details and refers to the other sections it applies to using matching emojis. +6. Section names are short. Sections are grouped by content type. +7. The more technical parameters lack emojis. They are formatted this way to attract less attention and to visually inform the user that this section is the most optional to set. +8. The visual language is unified across the whole input schema. Emojis are used as a shortcut for the user to understand what section or field is referred to without actually reading the whole title. + +![Input schema example](images/improved-input-schema-example.png) + +### Example of a worse input schema + +The version above was the improved input schema. Here's what this tool's input schema looked like before: + +1. A brief and dry description, with little value for the user, easy to miss. Most likely, the user already knows this info because what this Actor does is described in the Actor SEO description, description, and README. +2. The field title is wordy and reads a bit techie: it uses terminology that's not the most accurate for the target website (_posts_) and limiting terms (_max_). The field applies to scraping by hashtags (the field above) and by profile (the section below). That's an easy detail to miss. +3. The prefilled number is too high. If the user runs the Actor with default settings, they might spend a lot of money, and it will take some time. Users often just leave if an Actor takes a long time to complete on the first try. +4. The tooltip simply reiterates what is said in the title. This could have been avoided if the language of the title weren't so complex. +5. Merging two possible input types into one (profiles and URLs) can cause confusion. It's verbose and reminds the user about an unrelated field (hashtags). +6. This section refers to profiles but is separate. The user has to make an extra effort to scrape profiles: they have to move across three sections (Max posts from section 1, the Profiles input from section 2, and the Date sorting filters from section 3). +7. The proxy and browser section invites users to explore it even though it's not needed for a default run. It's more technical to set up and can give the impression that you need to know how to configure it for the tool to work. + +![Input schema example](images/worse-input-schema.png) + +## Best practices + +1. Keep it short. Don’t rely too much on text; most users prefer to read as little as possible. +2. Use formatting to your advantage (bold, italic, underline), links, and breaks to highlight key points. +3. Use specific terminology (e.g., posts, images, tweets) from the target website instead of generic terms like "results" or "pages." +4. Group related items for clarity and ease of use. +5.
5. Use emojis as shortcuts and visual anchors to guide attention.
6. Avoid technical jargon — keep the language simple.
7. Minimize cognitive load wherever possible.

## Signs and tools for improving input schema

- _User feedback_. If users are asking obvious things, complaining, or consistently making silly mistakes with input, take notes. Feedback from users can help you understand their experience and identify areas for improvement.
- _High churn rates_. If your users are trying your tool but quickly abandoning it, this is a sign they are having difficulties with your schema.
- _Input Schema Viewer_. Write your base schema in any code editor, then copy the file and paste it into the [**Input Schema Viewer**](https://console.apify.com/actors/UHTe5Bcb4OUEkeahZ/source). This tool helps you visualize your input schema before you add it to your Actor and build it. Seeing right away how your edits look in Apify Console will make editing the fields in code easier.

## Resources

- Basics of input schema: [https://docs.apify.com/academy/deploying-your-code/input-schema](https://docs.apify.com/academy/deploying-your-code/input-schema)
- Specifications of input schema: [https://docs.apify.com/platform/actors/development/actor-definition/input-schema](https://docs.apify.com/platform/actors/development/actor-definition/input-schema)



label: Promote your Actor
position: 3



---
title: Blogs and blog resources
description: Blogs are still a powerful way to promote your Actors and build authority. By sharing expertise, engaging users, and driving organic traffic, blogging remains a key strategy to complement social media, SEO, and other platforms in growing your audience.
sidebar_position: 5
category: apify platform
slug: /actor-marketing-playbook/promote-your-actor/blogs-and-blog-resources
---

**Blogs remain a powerful tool for promoting your Actors and establishing authority in the field. With social media, SEO, and other platforms, you might wonder if blogging is still relevant. The answer is a big yes. Writing blog posts can help you engage your users, share expertise, and drive organic traffic to your Actor.**

## Why blogs still matter

1. SEO. Blog posts are great for boosting your Actor’s search engine ranking. Well-written content with relevant keywords can attract users searching for web scraping or automation solutions. For example, a blog post about “how to scrape social media profiles” could drive people to your Actor who might not otherwise find it on Google.
2. Establishing authority. When you write thoughtful, well-researched blog posts, you position yourself as an expert in your niche. This builds trust and makes it more likely users will adopt your Actors.
3. Long-form content. Blogs give you the space to explain the value of your Actor in depth. This is especially useful for complex tools that need more context than what can fit into a README or product description.
4. Driving traffic. Blog posts can be shared across social media, linked in webinars, and included in your Actor’s README. This creates multiple avenues for potential users to discover your Actor.

## Good topics for blog posts

1. Problem-solving guides. Write about the specific problems your Actor solves. For example, if you’ve created an Actor that scrapes e-commerce reviews, write a post titled "How to automate e-commerce review scraping in 5 minutes". Focus on the pain points your tool alleviates.
2. Actor use cases.
Show real-world examples of how your Actor can be applied. These can be case studies or hypothetical scenarios like "Using web scraping to track competitor pricing."
3. Tutorials and step-by-step guides. Tutorials showing how to use your Actor or similar tools are always helpful. Step-by-step guides make it easier for beginners to start using your Actor with minimal hassle.
4. Trends. If you’ve noticed emerging trends in web scraping or automation, write about them. Tie your Actor into these trends to highlight its relevance.
5. Feature announcements or updates. Have you recently added new features to your Actor? Write a blog post explaining how these features work and what makes them valuable.

🪄 These days, blog posts always need to be written with SEO in mind. Yeah, it's annoying to use keywords, but think of it this way: even the most interesting customer story with amazing programming insights won't have the impact you want if nobody can find it. Do try to optimize your posts with relevant keywords and phrases — across text, structure, and even images — to ensure they reach your target audience.

---

## Factors to consider when writing a blog

1. Audience. Know your target audience. Are they developers, small business owners, or data analysts? Tailor your writing to match their technical level and needs.
2. SEO. Incorporate relevant keywords naturally throughout your post. Don’t overstuff your content, but make sure it ranks for search queries like "web scraping tools", "automation solutions", or "how to scrape LinkedIn profiles". Remember to include keywords in H2 and H3 headings.
3. Clarity and simplicity. Avoid jargon, especially if your target audience includes non-technical users. Use simple language to explain how your Actor works and why it’s beneficial.
4. Visuals. Include screenshots, GIFs, or even videos to demonstrate your Actor’s functionality. Visual content makes your blog more engaging and easier to follow.
5. Call to action (CTA). Always end your blog with a clear CTA. Whether it’s "try our Actor today" or "download the demo", guide your readers to the next step.
6. Engage with comments. If readers leave comments or questions, engage with them. Answer their queries and use the feedback to improve both your blog and Actor.

---

## Best places to publish blogs

There are a variety of platforms where you can publish your blog posts to reach the right audience:

1. [Dev.to](http://dev.to/): It's a developer-friendly platform where technical content gets a lot of visibility, and a great place to publish how-to guides, tutorials, and technical breakdowns of your Actor.
2. Medium: Allows you to reach a broader, less technical audience. It’s also good for writing about general topics like automation trends or how to improve data scraping practices.
3. ScrapeDiary: Run by Apify, [scrapediary.com](http://scrapediary.com) is a blog specifically geared toward Apify community devs and web scraping topics. Publishing here is a great way to reach users already interested in scraping and automation. Contact us if you want to publish a blog post there.
4. Personal blogs or company websites. If you have your own blog or a company site, post there. It’s the most direct way to control your content and engage your established audience.

---

## Not-so-obvious SEO tips for blog posts

Everybody knows you should include keywords wherever it looks natural.
Some people know the structure of the blog post should be hierarchical and follow an H1 - H2 - H3 - H4 structure with only one possible H1. Here are some unobvious SEO tips for writing a blog post that can help boost its visibility and ranking potential: + +### 1. Keep URL length concise and strategic + +Optimal length. Keep your URL short and descriptive. URLs between 50-60 characters perform best, so aim for 3-4 words. Avoid unnecessary words like "and", "of", or long prepositions. + +Include keywords. Ensure your primary keyword is naturally integrated into the URL. This signals relevance to both users and search engines. + +Avoid dates. Don’t include dates or numbers in the URL to keep the content evergreen, as dates can make the post seem outdated over time. + +### 2. Feature a video at the top of the post + +Engagement boost. Videos significantly increase the time users spend on a page, positively influencing SEO rankings. Blog posts with videos in them generally do better SEO-wise. + +Thumbnail optimization. Use an optimized thumbnail with a clear title and engaging image to increase click-through rates. + +### 3. Alt text for images with a keyword focus + +Descriptive alt text. Include a short, descriptive alt text for every image with one or two keywords where it makes sense. This also improves accessibility. + +Optimize file names. Name your images with SEO-friendly keywords before uploading (e.g., "web-scraping-tools.png" rather than "IMG12345_screenshot1.png"). This helps search engines understand the content of your images. + +File format and size. Use web-optimized formats like WebP or compressed JPEGs/PNGs to ensure fast page loading, which is a key SEO factor. + +Lazy loading images. Use lazy loading to only load images when the user scrolls to them, reducing initial page load times, which can help your SEO ranking. + +### 4. Interlinking for better user experience and SEO + +Internal links. Use contextual links to other relevant blog posts or product pages on your site. This not only helps with SEO but also keeps users engaged longer on your site, reducing bounce rates. + +Anchor text. When linking internally, use keyword-rich anchor text that describes what users will find on the linked page. + +Content depth. By interlinking, you can show Google that your site has a strong internal structure and is a hub of related, authoritative content. + +### 5. Target the 'People Also Ask' section of Google results with an FAQ + +Answer common questions. Including an FAQ section that answers questions people search for can help you rank in the "People Also Ask" section of Google. Research questions that come up in this feature related to your topic and address them in your content. + +Provide clear, concise answers to the FAQs, typically between 40-60 words, since these match the format used in "People Also Ask". + +Don't bother using FAQ schema. Google doesn't react to those anymore unless you’re a .gov or .edu domain. + +### 6. Optimize for readability and structure + +Short paragraphs and subheadings. Make your blog post easy to scan by using short paragraphs and meaningful subheadings that contain keywords. + +Bullet points and lists. Include bullet points and numbered lists to break up content and make it more digestible. Search engines prioritize well-structured content. + +Readability tools. Use tools like Hemingway Editor or Grammarly to improve readability. Content that is easy to read tends to rank higher, as it keeps readers engaged. 
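
To tie the image tips above together, here's roughly what a single blog image might look like in HTML, with an SEO-friendly file name, short keyword-focused alt text, and lazy loading (the file name and alt text are made-up examples):

```html
<!-- Descriptive WebP file name, short alt text with a keyword, lazy loading -->
<img
  src="/images/web-scraping-tools-comparison.webp"
  alt="Comparison of web scraping tools"
  loading="lazy"
/>
```

Most blogging platforms generate this markup for you, but it's worth checking that your alt text and file names survive the upload.
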
+ +## Referring to blogs in your Actor’s ecosystem + +To drive traffic to your blog and keep users engaged, reference your blog posts across various touchpoints: + +1. README. Add links to your blog posts in your Actor’s README. If you’ve written a tutorial or feature guide, include it under a "Further reading" section. +2. Input schema. Use your input schema to link to blog posts. For instance, if a certain field in your Actor has complex configurations, add a link to a blog post that explains how to use it. +3. YouTube videos. If you’ve created tutorial videos about your Actor, link them in your blog and vice versa. Cross-promoting these assets will increase your overall engagement. +4. Webinars and live streams. Mention your blog posts during webinars, especially if you’re covering a topic that’s closely related. Include the links in follow-up emails after the event. +5. Social media. Share your blog posts on Twitter, LinkedIn, or other social platforms. Include snippets or key takeaways to entice users to click through. + +🔄 Remember, you can always turn your blog into a video and vice versa. You can also use parts of blog posts for social media promotion. + +## Additional tips for blog success + +1. Consistency. Regular posting helps build an audience and makes sure you keep at it. Try to stick to a consistent schedule, whether it’s weekly, bi-weekly, or monthly. As Woody Allen said, “80 percent of success is showing up”. +2. Guest blogging. Reach out to other blogs or platforms like [Dev.to](http://dev.to/) for guest blogging opportunities. This helps you tap into new audiences. +3. Repurpose content. Once you’ve written a blog post, repurpose it. Turn it into a YouTube video, break it down into social media posts, or use it as the base for a webinar. +4. Monitor performance. Use analytics to track how your blog is performing. Are people reading it? Is it driving traffic to your Actor? What keywords is it ranking for? Who are your competitors? Use this data to refine your content strategy. + + + +--- +title: Parasite SEO +description: Explore parasite SEO, a unique strategy that leverages third-party sites to boost rankings and drive traffic to your tools. +sidebar_position: 3 +category: apify platform +slug: /actor-marketing-playbook/promote-your-actor/parasite-seo +--- + +**Do you want to attract more users to your Actors? Consider parasite SEO, a non-conventional method of ranking that leverages third-party sites.** + +--- + +Here’s a full definition, from Authority Hackers: + +> Parasite SEO involves publishing a quality piece of content on an established, high-authority external site to rank on search engines. This gives you the benefit of the host’s high traffic, boosting your chances for leads and successful conversions. These high DR websites have a lot of authority and trust in the eyes of Google +> + +As you can see, you’re leveraging the existing authority of a third-party site where you can publish content promoting your Actors, and the content should rank better and faster as you publish it on an established site. + +You can do parasite SEO for free, but you can also pay for guest posts on high-authority sites to post your articles promoting the Actors. + +Let’s keep things simple and practical for this guide, so you can start immediately. We will cover only the free options, which should give you enough exposure to get started. 
If you want to learn more, we recommend the following reading about parasite SEO:

- [Authority Hackers](https://www.authorityhacker.com/parasite-seo/)
- [Ahrefs](https://ahrefs.com/blog/parasite-seo/)

In this guide, we will cover the following sites that you can use for parasite SEO for free:

- Medium
- LinkedIn
- Reddit
- Quora

## Medium

You probably know [Medium](https://medium.com/). But you might not know that Google quite likes Medium, and you have a good chance of ranking high in Google with articles you publish there.

1. You need a Medium account. It’s free and easy to create.
2. Now, you need to do keyword research. Go to [Ahrefs Free Keyword Generator](https://ahrefs.com/keyword-generator/?country=us), enter your main keyword (e.g. Airbnb scraper), and check which keyword has the highest search volume.
3. Search for that keyword in Google. Use incognito mode and a US VPN if you can. Analyze the results and check what type of content you need to create. Is it a how-to guide on how to create an Airbnb scraper? Or is it a list of the best Airbnb scrapers? Or perhaps it’s a review or just a landing page.
4. Now, you should have a good idea of the article you have to write. Write the article and try to mimic the structure of the first results.
5. Once you’re done with the article, don’t forget to include a few calls to action linking to your Actor on Apify Store. Don’t be too pushy, but mention all the benefits of your Actor.
6. Publish the article. Make sure your title and URL have the main keyword and that the main keyword is also in the first paragraph of the article. Also, try to use relevant tags for your Actor.

## LinkedIn Pulse

LinkedIn Pulse is similar to Medium, so we won’t go into too much detail. The entire process is the same as with Medium; only the way you publish the article differs.

[Here is a full guide](https://www.linkedin.com/pulse/how-publish-content-linkedin-pulse-hamza-sarfraz/) for publishing your article on LinkedIn Pulse.

## Reddit

1. You need a Reddit account that you can use to comment in relevant subreddits.
2. Go to Google and perform this search: `site:reddit.com <keyword>`, where you replace `<keyword>` with the main topic of your Actor.
3. Now, make a list of the relevant Reddit threads that Google gives you. For an Airbnb scraper, this might be a good thread: [Has anybody have an latest Airbnb scraper code?](https://www.reddit.com/r/webscraping/comments/m650ol/has_anybody_have_an_latest_airbnb_scraper_code/)
4. To prioritize threads from the list, you can check the traffic they get from Google in [Ahrefs Traffic Checker](https://ahrefs.com/traffic-checker). Just paste the URL, and the tool will give you a traffic estimate. You can use this number to prioritize your list. If the volume exceeds 10, the thread usually has some traffic potential.
5. Now, the last step is to craft a helpful comment that also promotes your Actor. Try to do that subtly. People on Reddit usually don’t like people who promote their stuff, but you should be fine if you’re being genuinely helpful.

## Quora

Quora is similar to Reddit, so again we won’t go into too much detail. The entire process is the same. You just have to use a different search phrase in Google, which is `site:quora.com <keyword>`.



---
title: Product Hunt
description: Boost your Actor’s visibility by launching it on Product Hunt, a top platform for tech innovations. Attract early adopters, developers, and businesses while showcasing your tool’s value through visuals or demos.
+sidebar_position: 4 +category: apify platform +slug: /actor-marketing-playbook/promote-your-actor/product-hunt +--- + +Product Hunt is one of the best platforms for introducing new tools, especially in the tech community. It attracts a crowd of early adopters, startup enthusiasts, and developers eager to discover the latest innovations. Even [Apify itself](https://www.producthunt.com/products/apify) was on PH. + +If you're looking to build awareness and generate short-term traffic, Product Hunt can be a powerful tool in your marketing strategy. It's a chance to attract a wide audience, including developers, startups, and businesses looking for automation. If your Actor solves a common problem, automates a tedious process, or enhances productivity, it's a perfect candidate for Product Hunt. + +Product Hunt is also great for tools with a strong visual component or demo potential. If you can show the value of your Actor in action, you’re more likely to grab attention and drive engagement. + +--- + +## How to promote your Actor on Product Hunt + +### Create a compelling launch + +Launching your Actor on Product Hunt requires thoughtful planning. Start by creating a product page that clearly explains what your Actor does and why it’s valuable. You’ll need: + +- _A catchy tagline_. Keep it short and to the point. Think of something that captures your Actor's essence in just a few words. +- _Eye-catching visuals_. Screenshots, GIFs, or short videos that demonstrate your Actor in action are essential. Show users what they’ll get, how it works, and why it’s awesome. +- _Concise description_. Write a brief description of what your Actor does, who it’s for, and the problem it solves. Use plain language to appeal to a wide audience, even if they aren’t developers. +- _Demo video_. A short video that shows how your Actor works in a real-life scenario will resonate with potential users. + +Once your page is set up, you’ll need to choose the right day to launch. Product Hunt is most active on weekdays, with Tuesday and Wednesday being the most popular launch days. Avoid launching on weekends or holidays when traffic is lower. + +### Build momentum before launch + +Start building awareness before your launch day. This is where your social media channels and community engagement come into play. Share teasers about your upcoming Product Hunt launch on Twitter (X), Discord, LinkedIn, and even StackOverflow, where other developers might take an interest. Highlight key features or the problems your Actor solves. + +If you have a mailing list, give your subscribers a heads-up about your launch date. Encourage them to visit Product Hunt and support your launch by upvoting and commenting. This pre-launch activity helps create early momentum on launch day. + +### Timing your launch + +The timing of your Product Hunt launch matters a lot. Since Product Hunt operates on a daily ranking system, getting in early gives your product more time to gain votes. Aim to launch between 12:01 AM and 2:00 AM PST, as this will give your product a full day to collect upvotes. + +Once you’ve launched, be ready to engage with the community throughout the day. Respond to comments, answer questions, and thank users for their support. Product Hunt users appreciate creators who are active and communicative, and this can help drive more visibility for your Actor. + +### Engage with your audience + +The first few hours after your launch are crucial for gaining traction. 
Engage with users who comment on your product page, answer any questions, and address any concerns they might have. The more interaction you generate, the more likely you are to climb the daily rankings.

Be transparent and friendly in your responses. If users point out potential improvements or bugs, acknowledge them and make a commitment to improve your Actor. Product Hunt users are often open to giving feedback, and this can help you iterate on your product quickly.

If possible, have team members or collaborators available to help respond to comments. The more responsive and helpful you are, the better the overall experience will be for users checking out your Actor.

:::tip Leverage Apify

You can also give a shoutout to Apify; this way, your Actor will also be shown to the Apify community on Product Hunt: [https://www.producthunt.com/stories/introducing-shoutouts](https://www.producthunt.com/stories/introducing-shoutouts)

:::

## Expectations and results

Launching on Product Hunt can provide a massive spike in short-term traffic and visibility. However, it’s important to manage your expectations. Not every launch will result in hundreds of upvotes or immediate sales. Here’s what you can realistically expect:

- _Short-term traffic boost_. Your Actor might see a surge in visitors, especially on the day of the launch. If your Actor resonates with users, this traffic may extend for a few more days.
- _Potential long-term benefits_. While the short-term traffic is exciting, the long-term value lies in the relationships you build with early users. Some of them may convert into paying customers or become advocates for your Actor.
- _SEO boost_. Product Hunt is a high-authority site with a [domain rating](https://help.ahrefs.com/en/articles/1409408-what-is-domain-rating-dr) of 91. Having your product listed can provide an SEO boost and help your Actor's page rank higher in search engines.
- _User feedback_. Product Hunt is a great place to gather feedback. Users may point out bugs, request features, or suggest improvements.

## Tricks for a successful launch

1. _Leverage your network_. Ask friends, colleagues, and early users to support your launch. Ask the Apify community. Ask your users. Encourage them to upvote, comment, and share your product on social media.
2. _Prepare for feedback_. Product Hunt users can be critical, but this is an opportunity to gather valuable insights. Be open to suggestions and use them to improve your Actor.
3. _Use a consistent brand voice_. Make sure your messaging is consistent across all platforms when you're responding to comments and promoting your launch on social media.
4. _Offer a special launch deal_. Incentivize users to try your Actor by offering a discount or exclusive access for Product Hunt users. This can drive early adoption and build momentum.

## Caveats to Product Hunt promotion

- _Not every Actor is a good fit_. Product Hunt is best for tools with broad appeal or innovative features. If your Actor is highly specialized or niche, it may not perform as well.
- _High competition_. Product Hunt is a popular platform, and your Actor will be competing with many other launches. A strong marketing strategy is essential to stand out.
- _Short-term focus_. While the traffic spike is great, Product Hunt tends to focus on short-term visibility. To maintain long-term growth, you’ll need to continue promoting your Actor through other channels.
+ + + +--- +title: SEO +description: Learn how to optimize your content to rank higher on search engines like Google and Bing, attract more users, and drive long-term traffic - all for free. +sidebar_position: 1 +category: apify platform +slug: /actor-marketing-playbook/promote-your-actor/seo +--- + +SEO means optimizing your content to rank high for your target queries in search engines such as Google, Bing, etc. SEO is a great way to get more users for your Actors. It’s also free, and it can bring you traffic for years. This guide will give you a simple framework to rank better for your targeted queries. + +## Search intent + +Matching the search intent of potential users is super important when creating your Actor's README. The information you include should directly address the problems or needs that led users to search for a solution like yours. For example: + +- _User goals_: What are users trying to accomplish? +- _Pain points_: What challenges are they facing? +- _Specific use cases_: How might they use your Actor? + +Make sure your README demonstrates how your Actor aligns with the search intent. This alignment helps users quickly recognize your Actor's value and helps Google understand your Actor and rank you better. + +_Example:_ + +Let’s say you want to create a “YouTube Hashtag Scraper” Actor. After you search YouTube HashTag Scraper, you see that most people searching for it want to extract hashtags from YouTube videos, not download videos using a certain hashtag. + +## Keyword research + +Keyword research is a very important part of your SEO success. Without that, you won’t know which keywords you should target with your Actor, and you might be leaving traffic on the table by not targeting all the angles or targeting the wrong one. + +We will do keyword research with free tools, but if you want to take this seriously, we highly recommend [Ahrefs](https://ahrefs.com/). + +### Google autocomplete suggestions + +Start by typing your Actor's main function or purpose into Google. As you type, Google will suggest popular search terms. These suggestions are based on common user queries and can provide insight into what your potential users are searching for. + +_Example:_ + +Let's say you've created an Actor for scraping product reviews. Type "product review scraper" into Google and note the suggestions: + +- product review scraper free +- product review scraper amazon +- product review scraper python +- product review scraper api + +These suggestions reveal potential features or use cases to highlight in your README. + +### Alphabet soup method + +This technique is similar to the previous one, but it involves adding each letter of the alphabet after your main keyword to discover more specific and long-tail keywords. + +_Example_: + +Continue with "product review scraper" and add each letter of the alphabet: + +- product review scraper a (autocomplete might suggest "api") +- product review scraper b (might suggest "best") +- product review scraper c (might suggest "chrome extension") + +...and so on through the alphabet. + +### People Also Ask + +Search for your Actor's main function or purpose on Google. Scroll down to find the "People Also Ask" section, which contains related questions. + +_Example_: + +For a "product review scraper" Actor: + +- How do I scrape product reviews? +- Is it legal to scrape product reviews? +- What is the best tool for scraping reviews? +- How can I automate product review collection? + +Now, you can expand the “People Also Ask” questions. 
Click on each question to reveal the answer and generate more related questions you can use in your README.

### Google Keyword Planner

Another way to collect more keywords is to use the official Google Keyword Planner. Go to [Google Keyword Planner](https://ads.google.com/home/tools/keyword-planner/) and open the tool. You need a Google Ads account, so just create one for free if you don’t have one already.

After you’re in the tool, click on “Discover new keywords”, make sure you’re in the “Start with keywords” tab, enter your Actor's main function or purpose, and then select the United States as the region and English as the language. Click “Get results” to see keywords related to your Actor.

Write them down.

### Ahrefs Keyword Generator

Go to [Ahrefs Keyword Generator](https://ahrefs.com/keyword-generator), enter your Actor's main function or purpose, and click “Find keywords.” You should see a list of keywords related to your Actor.

Write them down.

## What to do with the keywords

First, remove any duplicates that you might have on your list. You can use an online tool [like this one](https://dedupelist.com/) for that.

After that, we need to get search volumes for your keywords. Put all your keywords in a spreadsheet, with one column being the keyword and the second one being the search volume.

Go to the [Keyword Tool](https://backlinko.com/tools/keyword), enter each keyword, and write down its search volume. You will also see other related keywords, so you might as well write them down if you don’t have them on your list yet.

At the end, you should have a list of keywords together with their search volumes that you can use to prioritize them, name your Actor, choose its URL, and so on.

### Headings

If it makes sense, consider using the keywords that have the biggest search volume and are the most relevant to your Actor as H2 headings in your README.

Put the most relevant keyword at the beginning of the heading when possible. Also, remember to use a clear hierarchy: the main features are H2, sub-features are H3, etc.

### Content

When putting keywords in your Actor’s README, it's important to maintain a natural, informative tone. Your primary goal should be to create valuable, easily understandable content for your users.

Aim to use your most important keyword in the first paragraph of your README. This helps both search engines and users quickly understand what your Actor does. But avoid forcing keywords where they don't fit naturally.

In the rest of your content, use the keywords you gathered earlier where they make sense, and include them naturally in your README.

If there are relevant questions in your keyword list, you can always cover them within an “FAQ” section of your Actor.

Remember that while including keywords is important, always prioritize readability and user experience. Your content should flow naturally and provide real value to the reader.

## Learn more about SEO

If you want to learn more about SEO, these two free courses will get you started:

- [SEO Course for Beginners](https://ahrefs.com/academy/seo-training-course) by Ahrefs
- [SEO Courses](https://www.semrush.com/academy/courses/seo/) by Semrush

The [Ahrefs YouTube channel](https://www.youtube.com/@AhrefsCom/featured) is also a great resource. You can start with [this video](https://www.youtube.com/watch?v=xsVTqzratPs).
+ + + +--- +title: Social media +description: Leverage social media to connect with users and grow your Actor’s audience. Learn how to showcase features, engage with users, and avoid common pitfalls. +sidebar_position: 2 +category: apify platform +slug: /actor-marketing-playbook/promote-your-actor/social-media +--- + +**Social media is a powerful way to connect with your Actor users and potential users. Whether your tool focuses on web scraping or automation, social platforms can help you showcase its features, answer user questions, and grow your audience. This guide will show you how to use social media effectively, what to share, and how to avoid common mistakes along the way.** + +Now, before we start listing social media platforms, it might be important to acknowledge something. + +Developers are notorious for not using social media that much. Or they use social media exclusively in the context of their own interests: that won’t find them new users, but rather colleagues or collaborators. + +That's a good start, and maybe it's enough. A developer that can also “do” social media is a unicorn. These are super rare. And if you want to really promote your Actor, you'll need to become that unicorn. Before we start, you need to understand the benefits of this activity. + +--- + +## Why be active on social media + +Engaging with your users on social media offers a lot of benefits beyond just promoting your Actor. Let’s look at some of the main reasons why being active online can be a game-changer for your Actor’s success: + +1. Social platforms make it easy to gather real-time feedback and also provide support in real-time. You can quickly learn what users love, what they struggle with, and what features they’d like to see. This can guide your Actor’s future development. It also allows you to build trust and credibility with your audience. +2. Shot in the dark: social media exposes your Actor to new users who might not find you through search engines alone. A shared post or retweet can dramatically expand your reach, helping you grow your user base. +3. Consistent activity on social platforms creates more backlinks to your Actor’s page, which can improve its search engine ranking and drive organic traffic. + +## Where to engage: Choosing the right platforms + +Choosing the right platforms is key to reaching your target audience. Here's a breakdown of the best places for developers to promote their web scraping and automation tools: + +- _Discord_: We started with an easy one. Create a community around your Actor to engage with users directly. Offering quick support and discussing the features of your Actor in a real-time chat setting can lead to deeper user engagement. + + :::tip Use Apify's Discord + + You can also promote your tools through [Apify's Discord](https://discord.com/invite/crawlee-apify-801163717915574323). + + ::: + +- _Twitter (X)_: Good for short updates, feature announcements, and quick interactions with users. The tech community on Twitter is very active, which makes it a great spot for sharing tips and getting noticed. +- _Reddit_: In theory, subreddits like r/webscraping, r/automation, and r/programming allow you to share expertise, engage in discussions, and present your Actor as a solution. However, in reality, you have to be quite careful with promotion there. Be very mindful of subreddit rules to avoid spamming or over-promoting. For Reddit, personal stories on how you built the tool + a roadblock you might be facing right now are the safest formula. 
If a tool is already finished and perfected, it will be treated as promotional content. But if you're asking for advice - now that's a community activity. +- _TikTok_: Might not be an obvious choice, but that’s where most young people spend time. They discuss a myriad of topics, laugh at the newest memes, and create trends that take weeks to get to Reels and Shorts. If you want to create educational, fun, short video content (and be among the first to talk about web scraping), this is your place for experiments and taking algorithm guesses. +- _YouTube_: Ideal for tutorials and demos. A visual walk-through of how to use your Actor can attract users who prefer watching videos to reading tutorials or READMEs. It's also good for Shorts and short, funny content. +- _StackOverflow_: While not a traditional social media platform, StackOverflow is a great space to answer technical questions and demonstrate your expertise. Offering help related to web scraping or automation can build credibility, and you can subtly mention your Actor if it directly solves the issue (as long as it adheres to community guidelines). +- _LinkedIn_: If your Actor solves problems for professionals or automates business tasks, LinkedIn is the place to explain how your tool provides value to an industry or business. + +--- + +## Best practices for promoting your Actor on social media + +Now that you know where to engage and why it’s important, here are some best practices to help you make the most of social media: + +1. _Offer value beyond promotion_: If you look around, you'll see that the golden rule of social media these days is to educate and entertain. Focus on sharing useful information related to your Actor. Post tips on automation, web scraping techniques, or industry insights that can help your audience. When you do promote your Actor, users will see it as part of a valuable exchange, not just an ad. Besides, constantly posting promotional content turns anybody off. +2. _Post consistently_: The most important rule for social media is to show up. Whether it’s a weekly post about new features or daily tips for using your Actor more effectively, maintaining a regular posting schedule keeps your audience connected. +3. _Visuals matter_: Screenshots, GIFs, and short videos can explain more than text ever could. Show users how your Actor works, the results it scrapes, or how automation saves time. +4. _Widen your reach_: Web scraping is a niche topic. Find ways to talk about it more widely. If you stumble upon ways to relate it to wider topics: news, science, research, even politics and art, use it. Or you can go more technical and talk about various libraries and languages you can use to build it. +5. _Use relevant hashtags_: Hashtags like #webscraping, #automation, #programming, and #IT help you reach a wider audience on platforms like Twitter and TikTok. Stick to a few relevant hashtags per post to avoid clutter. +6. _Engage actively_: Social media is a two-way street. Reply to comments, thank users for sharing your content, create stitches, and answer questions. Building relationships with your users helps foster loyalty and builds a sense of community around your Actor. +7. _Use polls and Q&As_: Interactive content like polls or Q&A sessions can drive engagement. Ask users what features they’d like to see next or run a live Q&A to answer questions about using your Actor. These tools encourage participation and provide valuable insights. +8. _Collaborate with other creators_. 
## Caveats to social media engagement

1. _Over-promotion_: Constantly pushing your Actor without offering value can turn users away. Balance your promotional content with educational posts, interesting links, or insights into the development process. Users are more likely to engage when they feel like they’re learning something, rather than just being sold to.
2. _Handling negative feedback_: Social media is a public forum, and not all feedback will be positive. Be prepared to address user concerns or criticism professionally. Responding kindly (or funnily) to criticism shows you’re committed to improving your tool and addressing users' needs.
3. _Managing multiple platforms_: Social media management can be time-consuming, especially if you’re active on multiple platforms. Focus on one or two platforms that matter most to your audience instead of spreading yourself too thin.
4. _Algorithm changes_: Social media platforms often tweak their algorithms, which can impact your content’s visibility. Stay updated on these changes, and adjust your strategy accordingly. If a post doesn’t perform well, experiment with different formats (videos, visuals, polls) to see what resonates with your audience.
5. _Privacy and compliance_: It's very important to be mindful of sharing user data or results, especially if your Actor handles sensitive information. Make sure your posts comply with privacy laws and don’t inadvertently expose any personal data.

## For inspiration

It's sometimes hard to think of a good reason to scream into the void that is social media. Here are 23 scenarios where you might use social media to promote your Actor or your work:

1. _Funny interaction with a user_: Share a humorous tweet or post about a quirky question or feedback from a user that highlights your Actor’s unique features.
2. _Roadblock story_: Post about a challenging bug you encountered while developing your Actor and how you solved it, including a screenshot or snippet of code.
3. _Success story_: Share a post detailing how a user’s feedback led to a new feature in your Actor and thank them for their suggestion.
4. _Tutorial video_: Create and share a short video demonstrating how to use a specific feature of your Actor effectively.
5. _Before-and-after example_: Post a visual comparison showing the impact of your Actor’s automation on a task or process.
6. _Feature announcement_: Announce a new feature or update in your Actor with a brief description and a call-to-action for users to try it out.
7. _User testimonial_: Share a positive review or testimonial from a user who benefited from your Actor, including their quote and a link to your tool.
8. _Live Q&A_: Host a live Q&A session on a platform like Twitter or Reddit, answering questions about your Actor and its capabilities.
9. _Behind-the-scenes look_: Post a behind-the-scenes photo or video of your development process or team working on your Actor.
10. _Debugging tip_: Share a tip or trick related to debugging or troubleshooting common issues with web scraping or automation.
11. _Integration highlight_: Post about how your Actor integrates with other popular tools or platforms, showcasing its versatility. Don't forget to tag them.
12. _Case study_: Share a case study or success story showing how a business or individual used your Actor to achieve specific results.
13. _Commentary on a news piece_: Offer your perspective on a recent news story related to technology, scraping, or automation.
If possible, explain how it relates to your Actor.
14. _User-generated content_: Share content created by your users, such as screenshots or examples of how they’re using your Actor.
15. _Memes_: Post a relevant meme about the challenges of web scraping or automation.
16. _Milestone celebration_: Announce and celebrate reaching a milestone, such as a certain number of users or downloads for your Actor.
17. _Quick tip_: Share a short, useful tip or hack related to using your Actor more efficiently.
18. _Throwback post_: Share a throwback post about the early development stages of your Actor, including any challenges or milestones you achieved.
19. _Collaboration announcement_: Announce a new collaboration with another developer or tool, explaining how it enhances your Actor’s functionality.
20. _Community shout-out_: Give a shout-out to a user or community member who has been particularly supportive or helpful.
21. _Demo invitation_: Invite your followers to a live demo or webinar where you’ll showcase your Actor and answer questions.
22. _Feedback request_: Ask your audience for feedback on a recent update or feature release, and encourage them to share their thoughts.
23. _Book or resource recommendation_: Share a recommendation for a book or resource that helped you in developing your Actor, and explain its relevance.



---
title: Video tutorials
description: Use video tutorials to demonstrate features, offer tutorials, and connect with users in real time, building trust and driving interest in your tools.
sidebar_position: 6
category: apify platform
slug: /actor-marketing-playbook/promote-your-actor/video-tutorials
---

**Videos and live streams are powerful tools for connecting with users and potential users, especially when promoting your Actors. You can use them to demonstrate functionality, provide tutorials, or engage with your audience in real time.**

---

## Why videos and live streams matter

1. _Visual engagement_. Videos allow you to show rather than just tell. Demonstrating how your Actor works or solving a problem in real time makes the content more engaging and easier to understand. For complex tools, visual explanations can be much more effective than text alone.
2. _Enhanced communication_. Live streams offer a unique opportunity for direct interaction. You can answer questions, address concerns, and gather immediate feedback from your audience, creating a more dynamic and personal connection.
3. _Increased reach_. Platforms like YouTube and TikTok have massive user bases, giving you access to a broad audience. Videos can also be shared across various social media channels, extending your reach even further.

Learn more about the rules of live streams in our next section: [Webinars](/academy/actor-marketing-playbook/promote-your-actor/webinars)

## Optimizing videos for SEO

1. _Keywords and titles_. Use relevant keywords in your video titles and descriptions. For instance, if your Actor is a web scraping tool, include terms like “web scraping tutorial” or “how to use web scraping tools” to help users find your content.
2. _Engaging thumbnails_. Create eye-catching thumbnails that accurately represent the content of your video. Thumbnails are often the first thing users see, so make sure they are visually appealing and relevant.
3. _Transcriptions and captions_. Adding transcripts and captions to your videos improves accessibility and can enhance SEO.
They allow search engines to index your content more effectively and help users who prefer reading or have hearing impairments. + +## YouTube vs. TikTok + +1. _YouTube_. YouTube is an excellent platform for longer, detailed videos. Create a channel dedicated to your Actors and regularly upload content such as tutorials, feature walkthroughs, and industry insights. Utilize YouTube’s SEO features by optimizing video descriptions, tags, and titles with relevant keywords. Engage with your audience through comments and encourage them to subscribe for updates. Collaborating with other YouTubers or influencers in the tech space can also help grow your channel. +2. _TikTok_. TikTok is ideal for short, engaging videos. Use it to share quick tips, demo snippets, or behind-the-scenes content about your Actors. The platform’s algorithm favors high engagement, so create catchy content that encourages viewers to interact. Use trending hashtags and participate in challenges relevant to your niche to increase visibility. Consistency is key, so post regularly and monitor which types of content resonate most with your audience. + +## Growing your channels + +1. _Regular content_. Consistently upload content to keep your audience engaged and attract new viewers. Create a content calendar to plan and maintain a regular posting schedule. +2. _Cross-promotion_. Share your videos across your social media channels, blogs, and newsletters. This cross-promotion helps drive traffic to your videos and increases your reach. +3. _Engage with your audience_. Respond to comments and feedback on your videos. Engaging with viewers builds a community around your content and encourages ongoing interaction. +4. _Analyze performance_. Use analytics tools provided by YouTube and TikTok to track the performance of your videos. Monitor metrics like watch time, engagement rates, and viewer demographics to refine your content strategy. + +--- + +## Where to mention videos across your Actor ecosystem + +1. _README_: include links to your videos in your Actor’s README file. For example, if you have a tutorial video, mention it in a "How to scrape X" or "Resources" section to guide users. +2. _Input schema_: if your Actor’s input schema includes complex fields, link to a video that explains how to configure these fields. This can be especially helpful for users who prefer visual guides. +3. _Social media_: share your videos on platforms like Twitter, LinkedIn, and Facebook. Use engaging snippets or highlights to attract users to watch the full video. +4. _Blog posts_: embed videos in your blog posts for a richer user experience. If you write a tutorial or feature update, include a video to provide additional context. +5. _Webinars and live streams_: mention your videos during webinars or live streams. If you’re covering a topic related to a video you’ve posted, refer to it as a supplemental resource. + + + +--- +title: Webinars +description: Webinars and live streams are powerful tools to showcase your Actor’s features. Learn how to plan, host, and maximize the impact of your webinar. +sidebar_position: 7 +category: apify platform +slug: /actor-marketing-playbook/promote-your-actor/webinars +--- + +Webinars and live streams are a fantastic way to connect with your audience, showcase your Actor's capabilities, and gather feedback from users. Though the term webinar might sound outdated these days, the concept of a live video tutorial is alive and well in the world of marketing and promotion. 
Whether you're introducing a new feature, answering questions, or walking through a common use case, a live event can create more personal engagement, boost user trust, and open the door for valuable two-way communication.

But how do you get started? Here's a friendly guide on where to host, how to prepare, and what to do before, during, and after your webinar.

---

## Why host a live stream?

Here are a few reasons why live streams are ideal for promoting your Actor:

- _Demo_. You can show your Actor in action and highlight its most powerful features. You can tell a story about how you built it. You can also show how your Actor interacts with other tools and platforms and what its best uses are. A live demo lets users see immediately how your tool solves their problems.
- _Building trust and rapport_. Interacting directly with your users builds trust and rapport. Even just showing up with your face and voice is a chance to let your users meet you and get a feel for the team behind the Actor.
- _Live Q&A_. Users often have questions that can be hard to fully address in documentation, README, or tutorials. A live session allows for Q&A, so you can explain complex features and demonstrate how to overcome common issues.
- _Tutorial or training_. If you don't have time for complex graphics, this is an easy replacement for a video tutorial until you do. Remember that some platforms (YouTube) give you the option of publishing the webinar after it's over, so you can reuse it later in other content or as a standalone guide. Also, if you’ve noticed users struggling with particular features, a webinar is a great way to teach them directly.

Webinars help build a community around your Actor and turn one-time users into loyal advocates.

## Where to host your webinar or live stream

It all goes back to where you have, or would like to have, your audience and whether you want the webinar to stay available on the web later.

1. Social media:
    1. _YouTube_: ideal for reaching a broad audience. It’s free and easy to set up. You can also make recordings available for future viewing.
    2. _TikTok_: likewise ideal for reaching a broad audience, free and easy to set up. However, the live video will disappear once the broadcast has ended. TikTok does allow you to save your livestreams, and while you won't be able to republish them to the platform (we assume your live stream will be longer than 10 minutes), you can re-upload them elsewhere later.
    3. _Twitch_: Known for gaming, Twitch has become a space for tech demos, coding live streams, and webinars. If your target audience enjoys an interactive and casual format, Twitch might be a good fit.
    4. _LinkedIn_: If your audience is more professional, LinkedIn Live could be a good fit to present your Actor. Once a stream is complete, it will remain on the feed of your LinkedIn Page or profile as a video that was ‘previously recorded live’.
    5. _Facebook_: Not recommended.
2. General platforms:
    1. _Zoom_ or _Google Meet_: More personal, these are great for smaller webinars where you might want closer interaction. They also give you control over who attends.

Pick a platform where your users are most likely to hang out. If your audience is primarily tech-savvy, YouTube or Twitch could work. If your Actor serves businesses, LinkedIn might be the best spot.

## Webinar/live stream prep

### Promote your webinar and get your users

If you have an email list of users or potential users, send an email blast with a friendly invite.
Include details about what you’ll cover and how they can benefit from attending.

- Social media promotion on Twitter (X), LinkedIn, or other platforms. Highlight what people will learn and any special features you’ll be demonstrating. Do it a few times: two weeks before the webinar, one week before, a day before, and on the day of the event. Don't forget to announce it on Apify’s Discord. These are places where your potential audience is likely hanging out. Let them know you’re hosting an event and what they can expect.
- Use every piece of real estate on Apify Store and Actor pages. Add a banner or notification to your Actor’s page (top of the README): this can be a great way to notify people who are already looking at your Actor. A simple “join us for a live demo on DATE” message works well. Add something like that to your Store bio and its README. Mention it in the description at the top of your Actor's input schema.

:::tip Use UTM tags

When creating a link to the webinar, you can add different UTM tags for each place where you will insert the link. That way, you can later learn which space brought the most webinar sign-ups (see the example links at the end of this section).

:::

- Collaborate with other developers. If you can team up with someone in the Apify community, you’ll double your reach. Cross-promotion can bring in users from both sides.

---

### Plan the content

Think carefully about what you’ll cover. Focus on what’s most relevant for your audience:

- _Decide on your content_. What will you cover? A demo? A deep dive into Actor configurations? Create a flow and timeline to keep yourself organized.
- _Prepare visuals_. Slides, product demos, and examples are helpful to explain complex ideas clearly.
- _Feature highlights_. Demonstrate the key features of your Actor. Walk users through common use cases and be ready to show live examples.
- _Input schema_. If your Actor has a complex input schema, spend time explaining how to use it effectively. Highlight tips that will save users time and frustration. You can incorporate your knowledge from the issues tab.
- _Q&A session_. Leave time for questions at the end. Make sure to keep this flexible, as it’s often where users will engage the most.

Don't forget to add an intro with an agenda and an outro with your contact details.

:::tip Consider timezones

When deciding when to run the webinar, focus on the timezone of the majority of your users.

:::

### Prepare technically

Test your setup before going live. Here’s what to focus on:

- _Stable internet connection_. This one’s obvious but essential. Test your stream quality ahead of time.
- _Test the Actor live_. If you're demoing your Actor, ensure it works smoothly. Avoid running scripts that take too long or have potential bugs during the live session.
- _Audio quality_. People are far more likely to tolerate a blurry video than bad audio. Use a good-quality microphone to ensure you’re heard clearly.
- _Screen sharing_. If you’re doing a live demo, make sure you know how to seamlessly switch between windows and share your screen effectively.
- _Backup plan_. Have a backup plan in case something goes wrong. This could be as simple as a recorded version of your presentation to share if things go south during the live session.
- _Make it interactive_. Consider using polls or a live Q&A session to keep the audience engaged. Maybe have a support person assisting with that side of things while you're speaking.
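
For illustration, here's what the same sign-up link might look like with different UTM tags for each placement (the domain and tag values below are made up; `utm_source`, `utm_medium`, and `utm_campaign` are the standard analytics parameters):

```text
# README banner
https://example.com/webinar?utm_source=apify-readme&utm_medium=banner&utm_campaign=actor-webinar

# Input schema description
https://example.com/webinar?utm_source=input-schema&utm_medium=link&utm_campaign=actor-webinar

# Discord announcement
https://example.com/webinar?utm_source=discord&utm_medium=social&utm_campaign=actor-webinar
```

Your analytics tool can then group sign-ups by `utm_source`, showing which placement performed best.
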
## Best practices during the live stream

When the time comes, here’s how to make the most of your webinar or live stream:

- _Start with an introduction_. Begin with a brief introduction of yourself, the Actor you’re showcasing, and what attendees can expect to learn. This sets expectations and gives context. It's also best if you have a slide that shows the agenda.
- _Try to stay on time_. Stick to the agenda. Users appreciate when events run on schedule.
- _Show a live demo_. Walk through a live demo of your Actor. Show it solving a problem from start to finish.
- _Explain as you go_. Be mindful that some people might be unfamiliar with technical terms or processes. Try to explain things simply and offer helpful tips as you demonstrate, but don't go off on a tangent.
- _Invite questions and engage your audience_. Encourage users to ask questions throughout the session. This creates a more conversational tone and helps you address their concerns in real time. You can also ask a simple question or run a poll to get the chat going. Try to direct the Q&A into one place so you don't have to switch tabs. Throughout the presentation, pause for questions and make sure you're addressing any confusion in real time.
- _Wrap up with a clear call to action_. Whether it’s to try your Actor, leave a review, or sign up for a future live stream, finish with a clear CTA. Let them know the next step to take.

These practices work for a simple tutorial walkthrough, and if you run a code-along session, they apply there as well.

## After the live session

Once your live session wraps up, there are still ways to benefit from it:

- _Make it public and share the recording_. Not everyone who wanted to attend will have been able to make it. Send a recording to all attendees whose emails you have and make it publicly available on your channels (emails, README, social media, etc.). Upload the recorded session to YouTube and your Actor’s documentation. If it's on YouTube, you can also ask Apify's video team to add it to their Community playlist. Make it easy for people to revisit the content or share it with others.
- _Follow up with attendees, thank them, and ask for feedback_. Send a follow-up email thanking people for attending. Include a link to the recording, additional resources, and ways to get in touch if they have more questions. Share any special offers or discount codes if relevant. If you don’t have the attendees' emails, include a link in your newsletter and publish it on your channels. Ask for feedback on what they liked and what could be improved. This can guide your next webinar or help fine-tune your Actor.
- _Answer lingering questions_. If any questions didn’t get answered live, take the time to address them in the follow-up email.
- _Create a blog post or article_. Summarize the key points of your webinar in a written format. This can boost your SEO and help users find answers in the future.
- _Review your performance_. Analyze the data from your webinar, if available. How many people attended? Which platform brought the most sign-ups? How many questions did you receive? Were there any technical difficulties? This helps refine your approach for future events.
- _Share snippets from the webinar or interesting takeaways on social media_. Encourage people to watch the recording and let them know when you’ll be hosting another event.
+ + + +label: Store basics +position: 1 + + + +--- +title: Actor success stories +description: Learn about developers who successfully make passive income from their Actors. +sidebar_position: 5 +category: apify platform +slug: /actor-marketing-playbook/store-basics/actor-success-stories +--- + +_Web scraping freelance financial freedom with microworlds._ + +Discover how Caleb David, founder of `microworlds`, achieved financial freedom through freelance web scraping. His journey showcases how mastering the craft with tools like Crawlee and creating a Twitter scraper transformed his career. See the full story [here](https://blog.apify.com/web-scraping-freelance-financial-freedom/) and learn from his success. + +https://apify.com/microworlds + +_Web scraping for freelance success – insights from Tugkan._ + +In this success story, our first community dev Tugkan shares how his journey into freelancing via Apify changed his life. Learn about his process, challenges, and how his paid Actors have brought him financial rewards and freedom. Check out his story [here](https://apify.com/success-stories/paid-actor-journey-apify-freelancer-tugkan) for inspiration. + +https://apify.com/epctex + + +Interested in sharing your story? Reach out to our marketing team at [marketing@apify.com](mailto:marketing@apify.com) for a case study to showcase your journey. + + + +--- +title: How Actor monetization works +description: Discover how to share your tools and explore monetization options to earn from your automation expertise. +sidebar_position: 3 +category: apify platform +slug: /actor-marketing-playbook/store-basics/how-actor-monetization-works +--- + +**You can turn your web scrapers into a source of income by publishing them on Apify Store. Learn how it's done and what monetization options you have.** + +--- + +## Monetizing your Actor + +Monetizing your Actor on the Apify platform involves several key steps: + +1. _Development_: create and refine your Actor. +2. _Testing_: ensure your Actor works reliably. +3. _Publication & monetization_: publish your Actor and set up its monetization model. +4. _Promotion_: attract users to your Actor. + +--- + +## Monetization models + +### Rental pricing model + +![rental model example](images/rental-model.png) + +- _How it works_: you offer a free trial period and set a monthly fee. Users on Apify paid plans can continue using the Actor after the trial. You earn 80% of the monthly rental fees. +- _Example_: you set a 7-day free trial and $30/month rental. If 3 users start using your Actor: + - 1st user on a paid plan pays $30 after the trial (you earn $24). + - 2nd user starts their trial but pays next month. + - 3rd user on a free plan finishes the trial without upgrading to a paid plan and can’t use the Actor further. + +Learn more about the rental pricing model in our [documentation](/platform/actors/publishing/monetize#rental-pricing-model). + +### Pay-per-result pricing model + +![pay per result model example](images/ppr-model.png) + +- _How it works_: you charge users based on the number of results your Actor generates. You earn 80% of the revenue minus platform usage costs. +- _Profit calculation_: `profit = (0.8 * revenue) - platform usage costs` +- _Cost breakdown_: + - Compute unit: $0.4 per CU + - Residential proxies: $13 per GB + - SERPs proxy: $3 per 1,000 SERPs + - Data transfer (external): $0.20 per GB + - Dataset storage: $1 per 1,000 GB-hours +- _Example_: you set a price of $1 per 1,000 results. 
Two users generate 50,000 and 20,000 results, paying $50 and $20, respectively. If the platform usage costs are $5 and $2, your profit is _0.8 * 70 - 7 = $49_.
+
+Learn more about the pay-per-result pricing model in our [documentation](/platform/actors/publishing/monetize#pay-per-result-pricing-model).
+
+### Pay-per-event pricing model
+
+![pay per event model example](images/ppe-model.png)
+
+- _How it works_: you charge users based on specific events triggered programmatically by your Actor's code. You earn 80% of the revenue minus platform usage costs.
+- _Profit calculation_: `profit = (0.8 * revenue) - platform usage costs`
+- _Event cost example_: you set the following events for your Actor:
+  - `Actor start per 1 GB of memory` at $0.005
+  - `Pages scraped` at $0.002
+  - `Page opened with residential proxy` at $0.002 - this is on top of `Pages scraped`
+  - `Page opened with a browser` at $0.002 - this is on top of `Pages scraped`
+- _Example_:
+  - User A:
+    - Started the Actor 10 times = $0.05
+    - Scraped 1,000 pages = $2.00
+    - 500 of those were scraped using a residential proxy = $1.00
+    - 300 of those were scraped using a browser = $0.60
+    - This comes to $3.65 of total revenue
+  - User B:
+    - Started the Actor 5 times = $0.025
+    - Scraped 500 pages = $1.00
+    - 200 of those were scraped using a residential proxy = $0.40
+    - 100 of those were scraped using a browser = $0.20
+    - This comes to $1.625 of total revenue
+  - If platform usage costs are $0.365 for user A and $0.162 for user B, your profit is _0.8 * (3.65 + 1.625) - (0.365 + 0.162) = $3.69_
+
+Learn more about the pay-per-event pricing model in our [documentation](/platform/actors/publishing/monetize#pay-per-event-pricing-model).
+
+## Setting up monetization
+
+1. _Go to your Actor page_: navigate to the **Publication** tab and open the **Monetization** section.
+2. _Fill in billing details_: set up your payment details for payouts.
+3. _Choose your pricing model_: use the monetization wizard to select your model and set fees.
+
+### Changing monetization
+
+Adjustments to monetization settings take 14 days to take effect and can be made once per month.
+
+### Tracking and promotion
+
+- _Track profit_: review payout invoices and statistics in Apify Console (**Monitoring** tab).
+- _Promote your Actor_: optimize your Actor’s description for SEO, share it on social media, and consider creating tutorials or articles to attract users.
+
+## Marketing tips for defining the price for your Actor
+
+It's up to you to set the pricing, of course. It can be as high or low as you wish; you can even make your Actor free. But if you're aiming for a successful, popular Actor, here are a few directions:
+
+### Do market research outside Apify Store
+
+The easiest way to understand your tool's value is to look around. Are there similar tools on the market? What do they offer, and how much do they charge? What added value does your tool provide compared to theirs? What features can your tool borrow from theirs in the future?
+
+Try competitor tools yourself (to assess the value and the quality they provide), check their SEO (to see how much traffic they get), and note ballpark figures. Think about what your Actor can do that competitors might be missing.
+
+Also, remember that your Actor is a package deal with the Apify platform, so all the platform's features automatically transfer onto your Actor and its value. Scheduling, monitoring runs, ways of exporting data, proxies, and integrations can all add value to your Actor (on top of its own functionalities).
Be sure to factor this into your tool's value proposition and communicate it to potential users.
+
+### Do research in Apify Store
+
+Apify Store is like any other marketplace, so take a look at your competition there. Are you the first in your lane, or are there other similar tools? What makes yours stand out? Remember, your README is your first impression — communicate your tool's benefits clearly and offer something unique. Competing with other developers is great, but collaborations can drive even better results 😉
+
+Learn more about what makes a good README here: [How to create an Actor README](/academy/actor-marketing-playbook/actor-basics/how-to-create-an-actor-readme)
+
+### Rental, pay-per-result (PPR), or pay-per-event (PPE)
+
+Rental pricing is technically easier: you set the rental fee, and the user covers their CU usage, so all you have to define is how much you want to charge the users. With pay-per-result, you’ll need to include both CU usage and your margin, so you have to calculate how much the average run will cost the user and then define how much you want to charge on top.
+
+To figure out the average cost per run for users, just run a few test runs and look at the statistics in the Actor [**Analytics**](https://console.apify.com/actors?tab=analytics) tab.
+
+From an average user's perspective, pay-per-result is often easier to grasp — $25 for a thousand pages, $5 for a thousand videos, $1 for a thousand images, etc. It gives users a clearer idea of what they’re paying for and allows them to estimate faster. But rental pricing has its fans, too — if your tool provides high value, users will come.
+
+Pay-per-event (PPE) lets you define pricing for individual events. You can charge for specific events directly from your Actor by calling our PPE charging API (see the sketch at the end of this section). The most common events will likely be Actor start, dataset item, external API calls, etc. PPE is great for users who want to optimize their costs and who value transparency. PPE is also a fairer pricing model for integration and AI-driven use cases, where dataset-based pricing doesn’t make sense.
+
+### Adapt when needed
+
+Don’t be afraid to experiment with pricing, especially at the start. You can monitor your results in the dashboard and adjust if necessary.
+
+Keep an eye on SEO as well. If you monitor the volume of the keywords your Actor is targeting, as well as how well your Actor's page is ranking for those keywords, you can estimate the number of people who actually end up trying your tool (aka the conversion rate). If your keywords are getting volume but conversions are lower than expected, it might point to a few issues. It could be due to your pricing, a verbose README, or a complex input. If users are bouncing right away, it makes sense to check your pricing against your closest competitors to see where adjustments might help.
+
+### Summary & a basic plan
+
+Pick a pricing model, run some tests, and calculate your preliminary costs (**Analytics** tab in Console).
+
+Then check your costs against similar solutions in the Store and on the market (try Google search or other marketplaces), and set a price that gives you some margin.
+
+It’s also normal to adjust pricing as you get more demand. For context, most prices on Apify Store range between $1 and $10 per 1,000 results.
+
+Example of useful pricing estimates from the **Analytics** tab:
+
+![example of pricing estimates in analytics tab](images/analytisc-example.png)
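+
+If you go the PPE route, the charging itself happens inside your Actor's code. Here's a minimal sketch of what that can look like with the JavaScript Actor SDK, assuming an event named `page-scraped` has been defined in your Actor's monetization setup (the event name here is purely illustrative):
+
+```js
+import { Actor } from 'apify';
+
+await Actor.init();
+
+// ...scrape a page here...
+
+// Charge the user for one occurrence of the 'page-scraped' event.
+// The event name must match one defined in your PPE pricing configuration.
+await Actor.charge({ eventName: 'page-scraped' });
+
+await Actor.exit();
+```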
+
+:::tip Use emails!
+
+📫 Don't forget to set up an email sequence to warn and remind your users about pricing changes. Learn more about emailing your users here: [Emails to Actor users]
+
+:::
+
+## Resources
+
+- Learn about [incentives behind monetization](https://apify.com/partners/actor-developers)
+- Detailed guide to [setting up monetization models](https://docs.apify.com/academy/actor-marketing-playbook/monetizing-your-actor)
+- Guide to [publishing Actors](https://docs.apify.com/platform/actors/publishing)
+- Watch our webinar on how to [build, publish and monetize Actors](https://www.youtube.com/watch?v=4nxStxC1BJM)
+- Read a blog post from our CEO on the [reasoning behind monetizing Actors](https://blog.apify.com/make-regular-passive-income-developing-web-automation-actors-b0392278d085/)
+- Learn about the [Creator plan](https://apify.com/pricing/creator-plan), which allows you to create and freely test your own Actors for $1
+
+---
+title: How Apify Store works
+description: Learn how to create and publish your own Actor, and join a growing community of innovators in scraping and web automation.
+sidebar_position: 1
+category: apify platform
+slug: /actor-marketing-playbook/store-basics/how-store-works
+---
+
+**Out of the 3,000+ Actors on the [Apify Store](https://apify.com/store) marketplace, hundreds were created by developers just like you. Let's get acquainted with the concept of Apify Store and what it takes to publish an Actor there.**
+
+---
+
+## What are Actors (and why they're called that)?
+
+[Actors](https://apify.com/actors) are serverless cloud applications that run on the Apify platform, capable of performing various computing tasks on the web, such as crawling websites or sending automated emails. They are developed by independent developers all over the world, and _you can be one of them_.
+
+The term "Actor" is used because, like human actors, these programs follow a script. This naming convention unifies both web scraping and web automation solutions, including AI agents, under a single term. Actors can range in complexity and function, targeting different websites or performing multiple tasks, which makes the umbrella term very useful.
+
+## What is Apify Store?
+
+[Apify Store](https://apify.com/store) is a public library of Actors that is constantly growing and evolving. It's basically the publicly visible (and searchable) part of the Apify platform. Of the 3,000+ Actors currently available, most are created and maintained by the community. Actors that consistently perform well remain on Apify Store, while those reported as malfunctioning or under maintenance are eventually removed. This keeps the tools in our ecosystem reliable, effective, and competitive.
+
+### Types of Actors
+
+- _Web scraping Actors_: for instance, [Twitter (X) Scraper](https://apify.com/apidojo/twitter-user-scraper) extracts data from Twitter.
+- _Automation Actors_: for example, [Content Checker](https://apify.com/jakubbalada/content-checker) monitors website content for changes and emails you once a change occurs.
+- _Bundles_: chains of multiple Actors united by a common data point or target website. For example, [Restaurant Review Aggregator](https://apify.com/tri_angle/restaurant-review-aggregator) can scrape reviews from six platforms at once.
+
+Learn more about bundles here: [Actor bundles](/academy/actor-marketing-playbook/product-optimization/actor-bundles)
+
+## Public and private Actors
+
+Actors on Apify Store can be public or private:
+
+- _Private Actors_: these are only accessible to you in Apify Console. You can use them without exposing them to the web. However, you can still share the results they produce.
+- _Public Actors_: these are available to everyone on Apify Store. You can choose to make them free or set a price. By publishing your web scrapers and automation solutions, you can attract users and generate income.
+
+## How Actor monetization works (briefly)
+
+You can monetize your Actors using four different pricing models:
+
+- _Pay for usage_: charge based on how much the Actor is used.
+- _Pay per result_: the price is based on the number of results produced, with the first few free.
+- _Pay per event_: the price is based on specific events triggered by the Actor.
+- _Monthly billing_: set a fixed monthly rental rate for using the Actor.
+
+For detailed information on which pricing model might work for your Actor, refer to [How Actor monetization works](/academy/actor-marketing-playbook/store-basics/how-actor-monetization-works).
+
+## Actor ownership on Store
+
+Actors are either created and maintained by Apify or by members of the community:
+
+- _Maintained by Apify_: created and supported by the Apify team.
+- _Maintained by Community_: created and managed by independent developers from the community.
+
+To see who maintains an Actor, check the upper-right corner of the Actor's page.
+
+When it comes to managing Actors on Apify, it’s important that every potential community developer understands the differences between Apify-maintained and Community-maintained Actors. Here’s what you need to know to navigate the platform effectively and ensure your work stands out.
+
+### Community-maintained Actors
+
+✨ _Features and functionality_: offers a broader range of use cases and features, often tailored to specific needs. Great for exploring unique or niche applications.
+
+🧑‍💻 _Ownership_: created and maintained by independent developers like you.
+
+🛠 _Maintenance_: you’re responsible for all updates, bug fixes, and ongoing maintenance. Apify hosts your Actor but does not manage its code.
+
+👷‍♀️ _Reliability and testing_: it’s up to you to ensure your Actor’s reliability and performance.
+
+☝️ _Support and issues_: Apify does not provide direct support for Community-maintained Actors. You must manage issues through the Issues tab, where you handle user queries and problems yourself.
+
+✍️ _Documentation_: you’re responsible for creating and maintaining documentation for your Actor. Make sure it’s clear and helpful for users.
+
+:::tip Test your Actor!
+
+For the best results, make sure your Actor is well-documented and thoroughly tested. Engage with users through the Issues tab to address any problems promptly. By maintaining high standards and being proactive, you’ll enhance your Actor’s reputation and usability in Apify Store.
+
+:::
+
+## Importance of Actor testing and reliability
+
+It's essential to test your Actors and make sure they work as intended. Apify runs its own checks, and you should do the same on your side.
+
+Apify runs automated tests daily to ensure all Actors on Apify Store are functional and reliable. These tests check _if an Actor can successfully run with its default input within 5 minutes_.
If an Actor fails for three consecutive days, it’s labeled as under maintenance, and the developer is notified. Continuous failures for another 28 days lead to deprecation.
+
+To restore an Actor's health, developers should fix and rebuild it. The testing system will automatically recognize the changes within 24 hours. If your Actor requires longer run times or authentication, contact support to explain why it should be excluded from the tests. For more control, you can implement your own tests using the Actor Testing tool available on Apify Store.
+
+### Actor metrics and reliability score
+
+On the right panel of each Actor on Store, you can see a list of Actor metrics.
+
+Actor metrics such as the number of monthly users, star ratings, success rates, response times, creation dates, and recent modifications collectively offer insights into an Actor's reliability. Basically, they serve as a _shorthand for potential users to assess your Actor's reliability_ before even trying it out.
+
+A high number of monthly users indicates widespread trust and effective performance, while a high star rating reflects user satisfaction. A success rate nearing 100% demonstrates consistent performance. Short response times show a commitment to addressing issues promptly. A recent creation date suggests modern features and ongoing development, while recent modifications point to active maintenance and continuous improvements. Together, these metrics provide a comprehensive view of an Actor’s reliability and quality.
+
+### Reporting issues in Actors
+
+Each Actor has an **Issues** tab in Apify Console and on the web. Here, users can open an issue (ticket) and engage in discussions with the Actor's creator, platform admins, and other users. The tab is ideal for asking questions, requesting new features, or providing feedback.
+
+Since the **Issues** tab is public, the level of activity — or lack thereof — can be observed by potential users and may serve as an indicator of the Actor's reliability. A well-maintained Issues tab with prompt responses suggests an active and dependable Actor.
+
+Learn more about how to handle the [Issues tab](/academy/actor-marketing-playbook/interact-with-users/issues-tab).
+
+## Resources
+
+- Best practices on setting up [testing for your Actor](https://docs.apify.com/platform/actors/publishing/test)
+- What are Apify-maintained and [Community-maintained Actors](https://help.apify.com/en/articles/6999799-what-are-apify-maintained-and-community-maintained-actors)? On ownership, maintenance, features, and support
+- Step-by-step guide on how to [publish your Actor](https://docs.apify.com/platform/actors/publishing)
+- Watch our webinar on how to [build, publish and monetize Actors](https://www.youtube.com/watch?v=4nxStxC1BJM)
+- Detailed [guide on pricing models](https://docs.apify.com/platform/actors/running/actors-in-store) for Actors in Store
+
+---
+title: How to build Actors
+description: Learn how to create web scrapers and automation tools on Apify. Use universal scrapers for quick setup, code templates for a head start, or SDKs and libraries for full control.
+sidebar_position: 2
+category: apify platform
+slug: /actor-marketing-playbook/store-basics/how-to-build-actors
+---
+
+At Apify, we try to make building web scraping and automation straightforward.
You can customize our universal scrapers with JavaScript for quick tweaks, use our code templates for a rapid setup in JavaScript, TypeScript, or Python, or build from scratch with our JavaScript and Python SDKs or our Crawlee libraries for Node.js and Python for ultimate flexibility and control. This guide offers a quick overview of our tools to help you find the right fit for your needs.
+
+## Three ways to build Actors
+
+1. [Our universal scrapers](https://apify.com/store/scrapers/universal-web-scrapers) — customize our boilerplate tools to your needs with a bit of JavaScript and setup.
+2. [Our code templates](https://apify.com/templates) for web scraping projects — for a quick project setup to save you development time (includes JavaScript, TypeScript, and Python templates).
+3. Open-source libraries and SDKs
+   1. [JavaScript SDK](https://docs.apify.com/sdk/js/) & [Python SDK](https://docs.apify.com/sdk/python/) — for creating your own solution from scratch on the Apify platform using our free development kits. Involves more coding but offers infinite flexibility.
+   2. [Crawlee](https://crawlee.dev/) and [Crawlee for Python](https://crawlee.dev/python) — for creating your own solutions from scratch using our free web automation libraries. Involves even more coding but offers infinite flexibility. There’s also no need to host these on the platform.
+
+## Universal scrapers & what are they for
+
+[Universal scrapers](https://apify.com/scrapers/universal-web-scrapers) were built to provide an intuitive UI plus configuration that will help you start extracting data as quickly as possible. Usually, you just provide a [simple JavaScript function](https://docs.apify.com/tutorials/apify-scrapers/getting-started#the-page-function) and set up one or two parameters, and you're good to go.
+
+Since scraping and automation come in various forms, we decided to build not just one, but _six_ scrapers. This way, you can always pick the right tool for the job. Let's take a look at each tool and its advantages and disadvantages.
+
+| Scraper | Technology | Advantages | Disadvantages | Best for |
+| --- | --- | --- | --- | --- |
+| 🌐 Web Scraper | Headless Chrome browser | Simple, fully JavaScript-rendered pages | Executes only client-side JavaScript | Websites with heavy client-side JavaScript |
+| 👐 Puppeteer Scraper | Headless Chrome browser | Powerful Puppeteer functions, executes both server-side and client-side JavaScript | More complex | Advanced scraping with client/server-side JS |
+| 🎭 Playwright Scraper | Cross-browser support with the Playwright library | Cross-browser support, executes both server-side and client-side JavaScript | More complex | Cross-browser scraping with advanced features |
+| 🍩 Cheerio Scraper | HTTP requests + Cheerio parser (jQuery-like, for servers) | Simple, fast, cost-effective | Pages may not be fully rendered (lacks JavaScript rendering), executes only server-side JavaScript | High-speed, cost-effective scraping |
+| ⚠️ JSDOM Scraper | JSDOM library (browser-like DOM API) | Handles client-side JavaScript, faster than full-browser solutions, ideal for light scripting | Not for heavy dynamic JavaScript, executes server-side code only, depends on pre-installed NPM modules | Speedy scraping with light client-side JS |
+| 🍲 BeautifulSoup Scraper | Python-based, HTTP requests + BeautifulSoup parser | Python-based, supports recursive crawling and URL lists | No full-featured web browser, not suitable for dynamic JavaScript-rendered pages | Python users needing simple, recursive crawling |
+
+### How do I choose the right universal web scraper to start with?
+
+🎯 Decision points:
+
+- Use 🌐 [Web Scraper](https://apify.com/apify/web-scraper) if you need simplicity with full browser capabilities and client-side JavaScript rendering.
+- Use 🍩 [Cheerio Scraper](https://apify.com/apify/cheerio-scraper) for fast, cost-effective scraping of static pages with simple server-side JavaScript execution.
+- Use 🎭 [Playwright Scraper](https://apify.com/apify/playwright-scraper) when cross-browser compatibility is crucial.
+- Use 👐 [Puppeteer Scraper](https://apify.com/apify/puppeteer-scraper) for advanced, powerful scraping where you need both client-side and server-side JavaScript handling.
+- Use ⚠️ [JSDOM Scraper](https://apify.com/apify/jsdom-scraper) for lightweight, speedy scraping with minimal client-side JavaScript requirements.
+- Use 🍲 [BeautifulSoup Scraper](https://apify.com/apify/beautifulsoup-scraper) for Python-based scraping, especially with recursive crawling and processing URL lists.
+
+To make it easier, here's a short questionnaire that guides you on selecting the best scraper based on your specific use case:
+
+<details>
+<summary>Questionnaire</summary>
+
+1. Is the website content rendered with a lot of client-side JavaScript?
+   - Yes:
+     - Do you need full browser capabilities?
+       - Yes: use Web Scraper or Playwright Scraper
+       - No, but I still want advanced features: use Puppeteer Scraper
+   - No:
+     - Do you prioritize speed and cost-effectiveness?
+       - Yes: use Cheerio Scraper
+       - No: use JSDOM Scraper
+2. Do you need cross-browser support for scraping?
+   - Yes: use Playwright Scraper
+   - No: continue to the next step.
+3. Is your preferred scripting language Python?
+   - Yes: use BeautifulSoup Scraper
+   - No: continue to the next step.
+4. Are you dealing with static pages or lightweight client-side JavaScript?
+   - Static pages: use Cheerio Scraper or BeautifulSoup Scraper
+   - Light client-side JavaScript:
+     - Do you want a balance between speed and client-side JavaScript handling?
+       - Yes: use JSDOM Scraper
+       - No: use Web Scraper or Puppeteer Scraper
+5. Do you need to support recursive crawling or process lists of URLs?
+   - Yes, and I prefer Python: use BeautifulSoup Scraper
+   - Yes, and I prefer JavaScript: use Web Scraper or Cheerio Scraper
+   - No: choose based on the other criteria above.
+
+This should help you navigate through the options and choose the right scraper based on the website’s complexity, your scripting language preference, and your need for speed or advanced features.
+
+</details>
+
+📚 Resources:
+
+- How to use [Web Scraper](https://www.youtube.com/watch?v=5kcaHAuGxmY) to scrape any website
+- How to use [Beautiful Soup](https://www.youtube.com/watch?v=1KqLLuIW6MA) to scrape the web
+- Learn about our $1/month [Creator plan](https://apify.com/pricing/creator-plan) that encourages devs to build Actors based on universal scrapers
+
+## Web scraping code templates
+
+Similar to our universal scrapers, our [code templates](https://apify.com/templates) also provide a quick start for developing web scrapers, automation scripts, and testing tools. Built on popular libraries like BeautifulSoup for Python or Playwright for JavaScript, they save time on setup, allowing you to focus on customization. Though they require more coding than universal scrapers, they're ideal for those who want a flexible foundation while still needing room to tailor their solutions.
+
+| Code template | Supported libraries | Purpose | Pros | Cons |
+| --- | --- | --- | --- | --- |
+| 🐍 Python | Requests, BeautifulSoup, Scrapy, Selenium, Playwright | Creating scrapers, automation, and testing tools | Simplifies setup, supports major Python libraries | Requires more manual coding (than universal scrapers), may be restrictive for complex tasks |
+| ☕️ JavaScript | Playwright, Selenium, Cheerio, Cypress, LangChain | Creating scrapers, automation, and testing tools | Eases development with pre-set configurations, flexibility with JavaScript and TypeScript | Requires more manual coding (than universal scrapers), may be restrictive for tasks needing full control |
+
+📚 Resources:
+
+- [How to build a scraper](https://www.youtube.com/watch?v=u-i-Korzf8w) using a web scraper template.
+
+## Toolkits and libraries
+
+### Apify JavaScript and Python SDKs
+
+The [Apify SDKs](https://docs.apify.com/sdk/js/) are designed for developers who want to interact directly with the Apify platform. They let you perform tasks like saving data in Apify datasets, running Apify Actors, and accessing the key-value store. Ideal for those who are familiar with [Node.js](https://docs.apify.com/sdk/js/) or [Python](https://docs.apify.com/sdk/python/), the SDKs provide the tools needed to develop software specifically for the Apify platform, offering complete freedom and flexibility within their respective ecosystems.
+
+- _Best for_: interacting with the Apify platform (e.g., saving data, running Actors, etc.)
+- _Pros_: full control over platform-specific operations, integrates seamlessly with Apify services
+- _Cons_: requires writing boilerplate code, higher complexity with more room for errors
+
+### Crawlee
+
+[Crawlee](https://crawlee.dev/) (for both Node.js and [Python](https://crawlee.dev/python)) is a powerful web scraping library that focuses on tasks like extracting data from web pages, automating browser interactions, and managing complex scraping workflows. Unlike the Apify SDK, Crawlee does not require the Apify platform and can be used independently for web scraping tasks. It handles complex operations like concurrency management, auto-scaling, and request queuing, allowing you to concentrate on the actual scraping tasks.
+
+- _Best for_: web scraping and automation (e.g., scraping paragraphs, automating clicks)
+- _Pros_: full flexibility in web scraping tasks, does not require the Apify platform, leverages the JavaScript ecosystem
+- _Cons_: requires more setup and coding, higher chance of mistakes with complex operations
+
+### Combining Apify SDK and Crawlee
+
+While these tools are distinct, they can be combined. For example, you can use Crawlee to scrape data from a page and then use the Apify SDK to save that data in an Apify dataset. This integration allows developers to make use of the strengths of both tools while working within the Apify ecosystem.
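+
+As a rough sketch of how such a combination can look inside a single Actor (an illustrative example, not the only way to wire the two together):
+
+```js
+import { Actor } from 'apify';
+import { CheerioCrawler } from 'crawlee';
+
+await Actor.init();
+
+const crawler = new CheerioCrawler({
+    async requestHandler({ request, $ }) {
+        // Crawlee handles the crawling and parsing...
+        const title = $('title').text();
+        // ...and the Apify SDK saves the result to the run's default dataset.
+        await Actor.pushData({ url: request.url, title });
+    },
+});
+
+await crawler.run(['https://crawlee.dev']);
+await Actor.exit();
+```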
+
+📚 Resources:
+
+- Introduction to [Crawlee](https://www.youtube.com/watch?v=g1Ll9OlFwEQ)
+- Crawlee [blog](https://crawlee.dev/blog)
+- Webinar on scraping with [Crawlee 101](https://www.youtube.com/watch?v=iAk1mb3v5iI): how to create scrapers in JavaScript and TypeScript
+- Step-by-step video guide: [building an Amazon Scraper](https://www.youtube.com/watch?v=yTRHomGg9uQ) in Node.js with Crawlee
+- Webinar on how to use [Crawlee Python](https://www.youtube.com/watch?v=ip8Ii0eLfRY)
+- Introduction to Apify's [Python SDK](https://www.youtube.com/watch?v=C8DmvJQS3jk)
+
+## Code templates vs. universal scrapers vs. libraries
+
+Basically, the choice here depends on how much flexibility you need and how much coding you're willing to do. More flexibility → more coding.
+
+[Universal scrapers](https://apify.com/scrapers/universal-web-scrapers) are simple to set up but are less flexible and configurable. Our [libraries](https://crawlee.dev/), on the other hand, enable the development of a standard [Node.js](https://nodejs.org/) or Python application, so be prepared to write a little more code. The reward for that is almost infinite flexibility.
+
+[Code templates](https://apify.com/templates) are sort of a middle ground between scrapers and libraries. But since they are built on libraries, they still sit closer to the more-coding end of the spectrum: they only give you starter code to begin with. Please take this into account when choosing the way to build your scraper, and if in doubt — just ask us, and we'll help you out.
+
+## Switching sides: how to transfer an existing solution from another platform
+
+You can also take advantage of the Apify platform's features without having to modify your existing scraping or automation solutions.
+
+### Integrating Scrapy spiders
+
+The Apify platform fully supports Scrapy spiders. By [deploying your existing Scrapy code to Apify](https://apify.com/run-scrapy-in-cloud), you can take advantage of features like scheduling, monitoring, scaling, and API access, all without needing to modify your original spider. This process is made easy with the [Apify CLI](https://docs.apify.com/cli/), which allows you to convert your Scrapy spider into an Apify Actor with just a few commands (see the sketch below). Once deployed, your spider can run in the cloud, offering a reliable and scalable solution for your web scraping needs.
+
+Additionally, you can monetize your spiders by [publishing them as Actors](https://apify.com/partners/actor-developers) on Apify Store, potentially earning passive income from your work while benefiting from the platform’s extensive features.
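+
+A rough sketch of that flow, assuming you have the Apify CLI installed and are inside an existing Scrapy project (the exact prompts may differ between CLI versions):
+
+```shell
+# Wrap the Scrapy project as an Actor (the CLI should detect scrapy.cfg)
+apify init
+
+# Build and deploy the new Actor to the Apify platform
+apify push
+```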
+
+### ScrapingBee, ScrapingAnt, ScraperAPI
+
+To make the transition from these platforms easier, we've also created [SuperScraper API](https://apify.com/apify/super-scraper-api). SuperScraper API is an open-source REST API for scraping websites: you pass it a URL and receive the rendered HTML content in return. This service functions as a cost-effective alternative to other scraping services like ScrapingBee, ScrapingAnt, and ScraperAPI. It supports dynamic content rendering with a headless browser, can use various proxies to avoid blocking, and offers features such as capturing screenshots of web pages. Thanks to its scalable nature, it is ideal for large-scale scraping tasks.
+
+To use SuperScraper API, you deploy it with an Apify API token and access it via HTTP requests. The API supports multiple parameters for fine-tuning your scraping tasks, including options for rendering JavaScript, waiting for specific elements, and handling cookies and proxies. It also allows for custom data extraction rules and JavaScript execution on the scraped pages. Pricing is based on actual usage, which can be cheaper or more expensive than competitors, depending on the configuration.
+
+📚 Resources:
+
+- [How to integrate Scrapy projects](https://docs.apify.com/cli/docs/integrating-scrapy)
+- Scrapy monitoring: how to [manage your Scrapy spider on Apify](https://blog.apify.com/scrapy-monitoring-spidermon/)
+- Run ScrapingBee, ScraperAPI, and ScrapingAnt on Apify — [SuperScraper API Tutorial](https://www.youtube.com/watch?v=YKs-I-2K1Rg)
+
+## General resources
+
+- Creating your Actor: [Actor sources](https://docs.apify.com/academy/getting-started/creating-actors)
+- Use it, build it or buy it? [Choosing the right solution on Apify](https://help.apify.com/en/articles/3024655-choosing-the-right-solution)
+- How to programmatically retrieve data with the [Apify API](https://www.youtube.com/watch?v=ViYYDHSBAKM&t=0s)
+- Improved way to [build your scrapers from a Git repo](https://www.youtube.com/watch?v=8QJetr-BYdQ)
+- Webinar on [how to build and monetize Actors](https://www.youtube.com/watch?v=4nxStxC1BJM) on Apify Store
+- 6 things you should know before buying or [building a web scraper](https://blog.apify.com/6-things-to-know-about-web-scraping/)
+- For a comprehensive guide on creating your first Actor, visit the [Apify Academy](https://docs.apify.com/academy).
+ + +--- +title: Ideas page and its use +description: Learn where you can draw inspiration for your Actors. +sidebar_position: 4 +category: apify platform +slug: /actor-marketing-playbook/store-basics/ideas-page +--- + +So you want to build an Actor and publish it on Apify Store. Where should you start? How can you make people want to use it? + +To generate new Actor ideas, you can draw from your experience. You can also use SEO tools to discover relevant search terms and explore sites related to web scraping, automation, or integrations. But for direct inspiration straight from Apify, check out our Actor [Ideas page](https://apify.com/ideas) to see what data extraction tools are trending in the Apify community. Let's see how you can both use and contribute to this valuable resource. + +--- + +## What's the Ideas page? + +The [Ideas page](https://apify.com/ideas) is where users can submit and explore potential projects for Actors, including scrapers, integrations, and automations. It serves as a collaborative space for proposing new tool ideas and finding inspiration for building and developing web scraping and automation solutions. + +## How you, as a developer, can use the Ideas page + +Got an innovative Actor idea or unsure what to build next? The Apify Ideas page is your go-to destination for submitting, developing, and claiming Actor concepts. If you're a developer ready to build an Actor using the Apify Ideas page, here’s how you can get involved: + +1. _Browse the Ideas page_
+ Check out the [Ideas page](https://apify.com/ideas) to find ideas that interest you. Look for ideas that align with your skills and the kind of Actor you want to build. +2. _Select an idea_
+ Once you’ve found a promising idea, review the details and requirements provided. If you see an idea you want to develop, make sure to check its current status. If it’s marked as **Open to develop**, you’re good to go. +3. _Develop your Actor_
+ Start building your Actor based on the idea. You don’t need to notify Apify about your development process. Focus on creating a functional and well-documented tool. +4. _Prepare for launch_
+ Once your Actor is ready, ensure it meets all quality standards and has a comprehensive README. This documentation should include installation instructions, usage details, and any other relevant information. +5. _Publish your Actor_
+ Deploy your Actor on Apify Store. Make sure it’s live and accessible for users. +6. _Claim your idea_
+ After your Actor is published, email [ideas@apify.com](mailto:ideas@apify.com) with the URL of your Actor and the original idea. This will allow us to tag the idea as Completed and link it to your new Actor, giving you credit and visibility. +7. _Monitor and optimize_
+ Make sure to monitor your Actor’s performance and user feedback. Use this information to make improvements and keep your Actor up to date. + +By following these steps, you’ll be able to contribute to the community while also gaining recognition for your work. + +## Criteria for claiming an idea + +To claim an idea, ensure that: + +1. Your Actor is functional. +2. Your README contains relevant information. +3. Your Actor closely aligns with the original idea. + +## Giving back to the Ideas page + +The Ideas page at Apify offers a variety of concepts for scrapers, integrations, and automations, and is a great place to find inspiration or solutions. It’s also a platform where you can contribute your own ideas to drive innovation and growth in our community. + +1. _Submit your Ideas_
+ Got a great Actor concept? Share it with us through the [Ideas form](https://apify.typeform.com/to/BNON8poB#source=ideas). Provide clear details about what your tool should do and how it should work. +2. _Engage with the community_
+ Upvote ideas you find intriguing. The more support an idea receives, the more likely it is to catch a developer’s eye and move forward. +3. _Don’t forget to claim your idea_
+   Once your Actor is up and running, claim your idea by emailing [ideas@apify.com](mailto:ideas@apify.com) with your Actor's URL and the original idea. We’ll mark your idea as **Completed** and link it to your Actor - a signal to other developers that this tool already exists on Apify Store.
+
+## Multiple developers for one idea
+
+No problem! Apify Store can host multiple Actors with similar functions. However, we go by the “first come, first served” rule, so the first developer to claim an idea will receive the **Completed** tag and a link from the Ideas page.
+
+Remember that Apify Store is just like any other marketplace. We believe that competition helps developers thrive and improve their code, especially when there are similar scrapers on the horizon! You can still build the Actor, but try to be imaginative when it comes to its set of features.
+ + +--- +title: Actor marketing playbook +description: Learn how to optimize and monetize your Actors on Apify Store by sharing them with other platform users. +sidebar_position: 10 +category: apify platform +slug: /actor-marketing-playbook +--- + +**Learn how to optimize and monetize your Actors on Apify Store by sharing them with other platform users.** + +--- + +import Card from '@site/src/components/Card'; +import CardGrid from '@site/src/components/CardGrid'; + +[Apify Store](https://apify.com/store) is a marketplace featuring thousands of ready-made automation tools called Actors. As a developer, you can publish your own Actors and generate revenue through our [monetization program](https://apify.com/partners/actor-developers). + +To help you succeed, we've created a comprehensive Actor marketing playbook. You'll learn how to: + +- Optimize your Actor's visibility on Apify Store +- Create compelling descriptions and documentation +- Build your developer brand +- Promote your work to potential customers +- Analyze performance metrics +- Engage with the Apify community + +## Apify Store basics + + + + + + + + + +## Actor basics + + + + + + + + + +## Promoting your Actor + + + + + + + + + + + +## Interacting with users + + + + + + + +## Product optimization + + + + + +
+Ready to grow your presence on the Apify platform? Check out our guide to [publishing your first Actor](/platform/actors/publishing). +
+ + +--- +title: Monetizing your Actor +description: Learn how you can monetize your web scraping and automation projects by publishing Actors to users in Apify Store. +sidebar_position: 5 +slug: /get-most-of-actors/monetizing-your-actor +unlisted: true +--- + +**Learn how you can monetize your web scraping and automation projects by publishing Actors to users in Apify Store.** + +--- + +When you publish your Actor on the Apify platform, you have the option to make it a _Paid Actor_ and earn revenue from users who benefit from your tool. You can choose between two pricing models: + +- Rental +- Pay-per-result + +## Rental pricing model + +With the rental model, you can specify a free trial period and a monthly rental price. After the trial, users with an [Apify paid plan](https://apify.com/pricing) can continue using your Actor by paying the monthly fee. You can receive 80% of the total rental fees collected each month. + +
+<details>
+<summary>Example - rental pricing model</summary>
+
+You make your Actor rental with a 7-day free trial and then $30/month. During the first calendar month, three users start to use your Actor:
+
+1. The first user, on an Apify paid plan, starts the free trial on the 15th
+2. The second user, on an Apify paid plan, starts the free trial on the 25th
+3. The third user, on an Apify free plan, starts the free trial on the 20th
+
+The first user pays their first rent seven days after starting the free trial, i.e., on the 22nd. The second user only starts paying the rent next month. The third user is on an Apify free plan, so after the free trial ends on the 27th, they are not charged and cannot use the Actor further until they get a paid plan. Your profit is computed only from the first user. They were charged $30, so 80% of this goes to you, i.e., _0.8 * 30 = $24_.
+</details>
+
+## Pay-per-result pricing model
+
+In this model, you set a price per 1,000 results. Users are charged based on the number of results your Actor produces. Your profit is calculated as 80% of the revenue minus the platform usage costs. The formula is:
+
+`(0.8 * revenue) - costs = profit`
+
+### Pay-per-result unit pricing for cost computation
+
+| Service | Unit price |
+|:--------------------------------|:---------------------------|
+| Compute unit | **$0.4** / CU |
+| Residential proxies | **$13** / GB |
+| SERPs proxy | **$3** / 1,000 SERPs |
+| Data transfer - external | **$0.20** / GB |
+| Data transfer - internal | **$0.05** / GB |
+| Dataset - timed storage | **$1.00** / 1,000 GB-hours |
+| Dataset - reads | **$0.0004** / 1,000 reads |
+| Dataset - writes | **$0.005** / 1,000 writes |
+| Key-value store - timed storage | **$1.00** / 1,000 GB-hours |
+| Key-value store - reads | **$0.005** / 1,000 reads |
+| Key-value store - writes | **$0.05** / 1,000 writes |
+| Key-value store - lists | **$0.05** / 1,000 lists |
+| Request queue - timed storage | **$4.00** / 1,000 GB-hours |
+| Request queue - reads | **$0.004** / 1,000 reads |
+| Request queue - writes | **$0.02** / 1,000 writes |
+
+Only revenue & costs for Apify customers on paid plans are taken into consideration when computing your profit. Users on free plans are not reflected there, although you can see statistics about the potential revenue from users currently on free plans in Actor Insights in the Apify Console.
+
+:::note What are Gigabyte-hours?
+
+Gigabyte-hours (GB-hours) are a unit of measurement used to quantify data storage and processing capacity over time. To calculate GB-hours, multiply the amount of data in gigabytes by the number of hours it's stored or processed.
+
+For example, if you host 50 GB of data for 30 days:
+
+- Convert days to hours: _30 * 24 = 720_
+- Multiply data size by hours: _50 * 720 = 36,000_
+
+This means that storing 50 GB of data for 30 days results in 36,000 GB-hours.
+:::
+
+Read more about Actors in the Store and the different pricing models from the perspective of your users in the [Store documentation](https://docs.apify.com/platform/actors/running/actors-in-store).
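+
+As a quick sanity check of the formula and the GB-hours math above, here's a tiny illustrative snippet (the numbers are made up, not a pricing recommendation):
+
+```js
+// profit = (0.8 * revenue) - costs
+const profit = (revenue, costs) => 0.8 * revenue - costs;
+
+// E.g., $100 of revenue with $15 of platform usage costs:
+console.log(profit(100, 15)); // 65
+
+// GB-hours, as in the note above: 50 GB stored for 30 days
+const gbHours = 50 * 30 * 24; // 36,000 GB-hours
+// Dataset timed storage at $1.00 / 1,000 GB-hours:
+console.log(gbHours * (1.0 / 1000)); // $36
+```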
+
+<details>
+<summary>Example - pay-per-result pricing model</summary>
+
+You make your Actor pay-per-result and set the price at $1/1,000 results. During the first month, two users on Apify paid plans use your Actor to get 50,000 and 20,000 results, costing them $50 and $20, respectively. Let's say the underlying platform usage for the first user is $5 and for the second $2. A third user, this time on an Apify free plan, uses the Actor to get 5,000 results, with underlying platform usage of $0.5.
+
+Your profit is computed only from the first two users, since they are on Apify paid plans. The revenue from the first user is $50 and from the second $20, i.e., the total revenue is $70. The total underlying cost is _$5 + $2 = $7_. Since your profit is 80% of the revenue minus the cost, it comes to _0.8 * 70 - 7 = $49_.
+</details>
+
+### Best practices for pay-per-result Actors
+
+To ensure profitable operation:
+
+- Set memory limits in your [`actor.json`](https://docs.apify.com/platform/actors/development/actor-definition/actor-json) file to control platform usage costs
+- Implement the `ACTOR_MAX_PAID_DATASET_ITEMS` check to prevent excess result generation (see the sketch below)
+- Test your Actor with various result volumes to determine optimal pricing
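+
+Here's a minimal sketch of what that check can look like in a JavaScript Actor. It assumes the platform passes the cap via the `ACTOR_MAX_PAID_DATASET_ITEMS` environment variable mentioned above; the scraping loop itself is a placeholder:
+
+```js
+import { Actor } from 'apify';
+
+await Actor.init();
+
+// Results pushed beyond this cap won't be paid for, so stop early.
+const maxPaidItems = Number(process.env.ACTOR_MAX_PAID_DATASET_ITEMS) || Infinity;
+
+const scrapedItems = [{ id: 1 }, { id: 2 }]; // placeholder for your real scraping loop
+let pushed = 0;
+
+for (const item of scrapedItems) {
+    if (pushed >= maxPaidItems) break; // don't generate results nobody pays for
+    await Actor.pushData(item);
+    pushed += 1;
+}
+
+await Actor.exit();
+```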
+
+## Setting up monetization
+
+Navigate to your [Actor page](https://console.apify.com/actors?tab=my) in Apify Console, choose the Actor that you want to monetize, and select the **Publication** tab.
+![Monetization section](./images/monetization-section.png)
+Open the **Monetization** section and complete your billing and payment details.
+![Set up monetization](./images/monetize_actor_set_up_monetization.png)
+Follow the monetization wizard to configure your pricing model.
+![Monetization wizard](./images/monetization_wizard.png)
+
+### Changing monetization
+
+You can change the monetization setting of your Actor by using the same wizard as for the setup, in the **Monetization** section of your Actor's **Publication** tab. Any changes made to an already published Actor will take _14 days_ to come into effect, so that the users of your Actor have time to prepare.
+
+:::important Frequency of monetization adjustments
+
+Be aware that you can change the monetization setting of each Actor only once per month. For further information & guidelines, please refer to our [Terms & Conditions](https://apify.com/store-terms-and-conditions).
+
+:::
+
+## Payouts & analytics
+
+Payout invoices are generated automatically on the 14th of each month. Review your invoice in the **Settings > Payout** section within one week. If it's not approved by the 20th, the system will auto-approve it on the 21st.
+
+Track your Actor's performance through:
+
+- The payout section for financial records
+- Actor analytics for usage statistics
+
+  ![Actor analytics](./images/actor_analytics.png)
+
+- Individual Actor Insights for detailed performance metrics
+
+  ![Actor insights](./images/actor-insights.png)
+
+## Promoting your Actor
+
+Create SEO-optimized descriptions and README files to improve search engine visibility. Share your Actor on multiple channels:
+
+- Post on Reddit, Quora, and social media platforms
+- Create tutorial videos demonstrating key features
+- Publish articles about your Actor on relevant websites
+- Consider creating a product showcase on platforms like Product Hunt
+
+Remember to tag Apify in your social media posts for additional exposure. Effective promotion can significantly impact your Actor's success, often making the difference between Actors with many paid users and those with few to none.
+
+Learn more about promoting your Actor from [Apify's Marketing Playbook](https://apify.notion.site/3fdc9fd4c8164649a2024c9ca7a2d0da?v=6d262c0b026d49bfa45771cd71f8c9ab).
+
+---
+title: Actors
+description: What is an Actor? How do we create them? Learn the basics of what Actors are, how they work, and try out an Actor yourself right on the Apify platform!
+sidebar_position: 1
+slug: /getting-started/actors
+---
+
+# Actors {#actors}
+
+**What is an Actor? How do we create them? Learn the basics of what Actors are, how they work, and try out an Actor yourself right on the Apify platform!**
+
+---
+
+After you've followed the **Getting started** lesson, you're almost ready to start creating some Actors! But before we get into that, let's discuss what an Actor is, and a bit about how they work.
+
+## What's an Actor? {#what-is-an-actor}
+
+When you deploy your script to the Apify platform, it is then called an **Actor**, which is a [serverless microservice](https://www.datadoghq.com/knowledge-center/serverless-architecture/serverless-microservices/#:~:text=Serverless%20microservices%20are%20cloud-based,suited%20for%20microservice-based%20architectures.) that accepts an input and produces an output. Actors can run for a few seconds, hours, or even indefinitely. An Actor can perform anything from a basic action such as filling out a web form or sending an email, to complex operations such as crawling an entire website and removing duplicates from a large dataset.
+
+Once an Actor has been pushed to the Apify platform, it can be shared with the world through [Apify Store](https://apify.com/store), and even monetized after going public.
+
+> Though the majority of Actors currently on the Apify platform are scrapers, crawlers, or automation software, Actors are not limited to scraping. They can be any program running in a Docker container.
+
+## Actors on the Apify platform {#actors-on-platform}
+
+For a quick-and-dirty understanding of what a published Actor looks like, and how it works, let's run an SEO audit of **apify.com** using the [SEO audit Actor](https://apify.com/misceres/seo-audit-tool).
+
+On the front page of the Actor, click the green **Try for free** button. If you're logged into the Apify account which you created during the [**Getting started**](./index.md) lesson, you'll be taken to Apify Console and greeted with a page that looks like this:
+
+![Actor configuration](./images/seo-actor-config.png)
+
+This is where we can provide input to the Actor. The defaults here are just fine, so we'll leave it as is and click the green **Start** button to run it. While the Actor is running, you'll see it log some information about itself.
+
+![Actor logs](./images/actor-logs.jpg)
+
+After the Actor has completed its run (you'll know this when you see **SEO audit for apify.com finished.** in the logs), the results of the run can be viewed by clicking the **Results** tab, then subsequently the **View in another tab** option under **Export**.
+
+## The "Actors" tab {#actors-tab}
+
+While still on the platform, click on the tab with the **< >** icon which says **Actors**. This tab is your one-stop shop for seeing which Actors you've used recently, and which ones you've developed yourself. You will be frequently using this tab when developing and testing on the Apify platform.
+
+![The "Actors" tab on the Apify platform](./images/actors-tab.jpg)
+
+Now that you know the basics of what Actors are and how to use them, it's time to develop **an Actor of your own**!
+
+## Next up {#next}
+
+Get ready, because in the [next lesson](./creating_actors.md), you'll be writing your very own Actor!
+
+---
+title: Apify API
+description: Learn how to use the Apify API to programmatically call your Actors, retrieve data stored on the platform, view Actor logs, and more!
+sidebar_position: 4
+slug: /getting-started/apify-api
+---
+
+# The Apify API {#the-apify-api}
+
+**Learn how to use the Apify API to programmatically call your Actors, retrieve data stored on the platform, view Actor logs, and more!**
+
+---
+
+[Apify's API](/api/v2#/reference) is your ticket to the Apify platform without even needing to access the [Apify Console](https://console.apify.com?asrc=developers_portal) web interface. The API is organized around RESTful HTTP endpoints.
+
+In this lesson, we'll be learning how to use the Apify API to call an Actor and view its results. We'll be using the Actor we created in the previous lesson, so if you haven't already gotten that one set up, go ahead and do that before moving forward if you'd like to follow along.
+
+## Finding your endpoint {#finding-your-endpoint}
+
+Within one of your Actors on the [Apify Console](https://console.apify.com?asrc=developers_portal) (we'll use the **adding-actor** from the previous lesson), click on the **API** button in the top right-hand corner:
+
+![The "API" button on an Actor's page on the Apify Console](./images/api-tab.jpg)
+
+You should see a long list of API endpoints that you can copy and paste elsewhere, or even test right within the **API** modal. Go ahead and copy the endpoint labeled **Run Actor synchronously and get dataset items**. It should look something like this:
+
+```text
+https://api.apify.com/v2/acts/YOUR_USERNAME~adding-actor/run-sync?token=YOUR_TOKEN
+```
+
+> In this lesson, we'll only be focusing on this one endpoint, as it is the most popularly used one; however, don't let this limit your curiosity! Take a look at the other endpoints in the **API** window to learn about everything you can do to your Actor programmatically.
+
+Now, let's move over to our favorite HTTP client (in this lesson we'll use [Insomnia](../../glossary/tools/insomnia.md) to prepare and send the request).
+
+## Providing input {#providing-input}
+
+Our **adding-actor** takes in two input values (`num1` and `num2`). When using the Actor on the platform, you provide these fields either through the UI generated by the **INPUT_SCHEMA.json**, or directly in JSON format. When providing input while making an API call to run an Actor, the input must be provided in the **body** of the POST request as a JSON object.
+
+![Providing input](./images/provide-input.jpg)
+
+## Parameters {#parameters}
+
+Let's say we want to run our **adding-actor** via API and view its results in CSV format at the end. We'll achieve this by passing the **format** parameter with a value of **csv** to change the output format:
+
+```text
+https://api.apify.com/v2/acts/YOUR_USERNAME~adding-actor/run-sync-get-dataset-items?token=YOUR_TOKEN_HERE&format=csv
+```
+
+Additional parameters can be passed to this endpoint. You can learn about them [here](/api/v2#/reference/actors/run-actor-synchronously-and-get-dataset-items/run-actor-synchronously-with-input-and-get-dataset-items).
+
+> Network components can record visited URLs, so it's more secure to send the token as an HTTP header, not as a parameter. The header should look like `Authorization: Bearer YOUR_TOKEN`. Popular HTTP clients, such as [Postman](../../glossary/tools/postman.md) or [Insomnia](../../glossary/tools/insomnia.md), provide a convenient way to configure the Authorization header for all your API requests.
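+
+For illustration, here's what that same call might look like with the token sent as a header instead of a URL parameter (a sketch using Node.js 18+'s built-in `fetch`; the username and token are placeholders):
+
+```js
+const response = await fetch(
+    'https://api.apify.com/v2/acts/YOUR_USERNAME~adding-actor/run-sync-get-dataset-items?format=csv',
+    {
+        method: 'POST',
+        headers: {
+            'Content-Type': 'application/json',
+            // Token sent as a header rather than in the URL
+            'Authorization': 'Bearer YOUR_TOKEN',
+        },
+        body: JSON.stringify({ num1: 1, num2: 8 }),
+    },
+);
+
+console.log(await response.text()); // the dataset items in CSV format
+```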
+
+## Sending the request {#sending-the-request}
+
+If you're not using an HTTP client, you can send the request through your terminal with this command:
+
+```curl
+curl -d '{"num1":1, "num2":8}' -H "Content-Type: application/json" -X POST "https://api.apify.com/v2/acts/YOUR_USERNAME~adding-actor/run-sync-get-dataset-items?token=YOUR_TOKEN_HERE&format=csv"
+```
+
+Here's the response we got:
+
+![API response](./images/api-csv-response.png)
+
+And there it is! The Actor was run with our inputs of **num1** and **num2**, then the dataset results were returned back to us in CSV format.
+
+## Apify API's many features {#api-many-features}
+
+What we've done in this lesson only scratches the surface of what the Apify API can do. Right from Insomnia, or from any HTTP client, you can [manage datasets](/api/v2#/reference/datasets/dataset/get-dataset) and [key-value stores](/api/v2#/reference/key-value-stores/key-collection/get-dataset), [add to request queues](/api/v2#/reference/request-queues/queue-collection/add-request), [update Actors](/api/v2#/reference/actors/actor-object/update-actor), and much more! Basically, whatever you can do on the platform's web interface, you can also do through the API.
+
+## Next up {#next}
+
+[Next up](./apify_client.md), we'll be learning about how to use Apify's JavaScript and Python clients to interact with the API right within our code.
+
+---
+title: Apify client
+description: Interact with the Apify API in your code by using the apify-client package, which is available for both JavaScript and Python.
+sidebar_position: 5
+slug: /getting-started/apify-client
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+# Apify client {#apify-client}
+
+**Interact with the Apify API in your code by using the apify-client package, which is available for both JavaScript and Python.**
+
+---
+
+Now that you've gotten your toes wet with interacting with the Apify API through raw HTTP requests, you're ready to become familiar with the **Apify client**, which is a package available for both JavaScript and Python that allows you to interact with the API in your code without explicitly needing to make any GET or POST requests.
+
+This lesson will provide code examples for both Node.js and Python, so regardless of the language you are using, you can follow along!
+
+## Examples {#examples}
+
+You can access `apify-client` examples on the Console Actor detail page. Click the **API** button and then the **API Client** dropdown button.
+
+![API button](./images/api-button.png)
+
+## Installing and importing {#installing-and-importing}
+
+If you are going to use the client in Node.js, use this command within one of your projects to install the package through npm:
+
+```shell
+npm install apify-client
+```
+
+In Python, you can install it from PyPI with this command:
+
+```shell
+pip install apify-client
+```
+
+After installing the package, let's make a file named **client** and import the Apify client like so:
+
+```js
+// client.js
+import { ApifyClient } from 'apify-client';
+```
+
+```py
+# client.py
+from apify_client import ApifyClient
+```
+
+## Running an Actor {#running-an-actor}
+
+In the last lesson, we ran the **adding-actor** and retrieved its dataset items. That's exactly what we're going to do now; however, by using the Apify client instead.

Before we can use the client, though, we must create a new instance of the `ApifyClient` class and pass it our API token from the [**Integrations** page](https://console.apify.com/account?tab=integrations&asrc=developers_portal) on the Apify Console:

```js
const client = new ApifyClient({
    token: 'YOUR_TOKEN',
});
```

```py
client = ApifyClient(token='YOUR_TOKEN')
```

> If you are planning on publishing your code to a public GitHub/GitLab repository or anywhere else online, be sure to set your API token as an environment variable, and never hardcode it directly into your script.

Now that we've got our instance, we can point to an Actor using the [`client.actor()`](/api/client/js/reference/class/ApifyClient#actor) function, then call the Actor with some input using the [`.call()`](/api/client/js/reference/class/ApifyClient#actor) function, whose first parameter is the input for the Actor.

```js
const run = await client.actor('YOUR_USERNAME/adding-actor').call({
    num1: 4,
    num2: 2,
});
```

```py
run = client.actor('YOUR_USERNAME/adding-actor').call(run_input={
    'num1': 4,
    'num2': 2
})
```

> Learn more about the `.call()` function [here](/api/client/js/reference/class/ApifyClient#actor).

## Downloading dataset items {#downloading-dataset-items}

Once an Actor's run has completed, it will return a **run info** object that looks something like this:

![Run info object](./images/run-info.jpg)

The `run` variable we created in the last section points to the **run info** object of the run we created with the `.call()` function, which means that through this variable, we can access the run's `defaultDatasetId`. This ID can then be passed into the `client.dataset()` function.

```js
const dataset = client.dataset(run.defaultDatasetId);
```

```py
dataset = client.dataset(run['defaultDatasetId'])
```

Finally, we can download the items in the dataset by using the **list items** function, then log them to the console.

```js
const { items } = await dataset.listItems();

console.log(items);
```

```py
items = dataset.list_items().items

print(items)
```

The final code for running the Actor and fetching its dataset items looks like this:

```js
// client.js
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({
    token: 'YOUR_TOKEN',
});

const run = await client.actor('YOUR_USERNAME/adding-actor').call({
    num1: 4,
    num2: 2,
});

const dataset = client.dataset(run.defaultDatasetId);

const { items } = await dataset.listItems();

console.log(items);
```

```py
# client.py
from apify_client import ApifyClient

client = ApifyClient(token='YOUR_TOKEN')

run = client.actor('YOUR_USERNAME/adding-actor').call(run_input={
    'num1': 4,
    'num2': 2
})

dataset = client.dataset(run['defaultDatasetId'])

items = dataset.list_items().items

print(items)
```

## Updating an Actor {#updating-actor}

If you check the **Settings** tab within your **adding-actor**, you'll notice that the default memory being allocated to the Actor is **2048 MB**. This is overkill considering that the Actor is only adding two numbers together; **256 MB** would be much more reasonable. Also, we can safely say that the run should never take more than 20 seconds (even this is a generous number), so the default timeout of 3600 seconds is also overkill.
+ +Let's change these two Actor settings via the Apify client using the [`actor.update()`](/api/client/js/reference/class/ActorClient#update) function. This function will call the **update Actor** endpoint, which can take `defaultRunOptions` as an input property. You can find the shape of the `defaultRunOptions` in the [API documentation](/api/v2#/reference/actors/actor-object/update-actor). Perfect! + +First, we'll create a pointer to our Actor, similar to before (except this time, we won't be using `.call()` at the end): + + + + +```js +const actor = client.actor('YOUR_USERNAME/adding-actor'); +``` + + + + +```py +actor = client.actor('YOUR_USERNAME/adding-actor') + +``` + + + + +Then, we'll call the `.update()` method on the `actor` variable we created and pass in our new **default run options**: + + + + +```js +await actor.update({ + defaultRunOptions: { + build: 'latest', + memoryMbytes: 256, + timeoutSecs: 20, + }, +}); +``` + + + + +```py +actor.update(default_run_build='latest', default_run_memory_mbytes=256, default_run_timeout_secs=20) + +``` + + + + +After running the code, go back to the **Settings** page of **adding-actor**. If your default options now look like this, then it worked!: + +![New run defaults](./images/new-defaults.jpg) + +## Overview {#overview} + +You can do so much more with the Apify client than running Actors, updating Actors, and downloading dataset items. The purpose of this lesson was to get you comfortable using the client in your own projects, as it's the absolute best developer tool for integrating the Apify platform with an external system. + +For a more in-depth understanding of the Apify API client, give these a quick lookover: + +- [API client for JavaScript](/api/client/js) +- [API client for Python](/api/client/python) + +## Next up {#next} + +Now that you're familiar and a bit more comfortable with the Apify platform, you're ready to start deploying your code to Apify! In the [next section](../deploying_your_code/index.md), you'll learn how to take any project written in any programming language and turn it into an Actor. + + + +--- +title: Creating Actors +description: Build and run your very first Actor directly in Apify Console from a template. This lesson provides hands-on experience with building and running Actors. +sidebar_position: 2 +slug: /getting-started/creating-actors +--- + +# Creating Actors {#creating-actors} + +**This lesson offers hands-on experience in building and running Actors in Apify Console using a template. By the end of it, you will be able to build and run your first Actor using an Actor template.** + +--- + +You can create an Actor in several ways. You can create one from your own source code hosted in a Git repository or in your local machine, for example. But in this tutorial, we'll focus on the easiest method: selecting an Actor code template. We don't need to install any special software, and everything can be done directly in Apify Console using an Apify account. + +## Choose the source {#choose-the-source} + +Once you're in Apify Console, go to [Development](https://console.apify.com/actors/development/my-actors), and click on the **Develop new** button in the top right-hand corner. + +![Develop an Actor button](./images/develop-new-actor.png) + +You'll be presented with a page featuring two ways to get started with a new Actor. + +1. Creating an Actor from existing source code (using Git providers or pushing the code from your local machine using Apify CLI) +2. 
Creating an Actor from a code template

| Existing source code | Code templates |
|:---------------------------------------------------------------------------------:|:--------------------------------------------------------------------------------:|
| ![Create an Actor from source code](./images/create-actor-from-source-code.png) | ![Create an Actor from code templates](./images/create-actor-from-templates.png) |

## Creating Actor from existing source code {#existing-source-code}

If you already have your code hosted by a Git provider, you can use it to create an Actor by linking the repository. If you use GitHub, you can use our [GitHub integration](/platform/integrations/github) to create an Actor from your public or private repository. You can also use GitLab, Bitbucket, or other Git providers or external repositories.

![Create an Actor from Git repository](./images/create-actor-git.png)

You can also push your existing code from your local machine using the [Apify CLI](/cli). This is useful when you develop your code locally and then want to push it to the Apify Console to run it as an Actor in the cloud. For this option, you'll need the [Apify CLI installed](/cli/docs/installation) on your machine. By clicking on the **Push your code using the Apify command-line interface (CLI)** button, you will be presented with instructions on how to push your code to the Apify Console.

![Push your code using the Apify CLI](./images/create-actor-cli.png)

## Creating Actor from code template {#code-template}

Python, JavaScript, and TypeScript have several template options that you can use.

> You can select one from the list on this page, or you can browse all the templates in the template library by clicking on the **View all templates** button in the right corner.

For example, let's choose the **Start with JavaScript** template and click on the template card.

![JavaScript template card](./images/create-actor-template-javascript-card.png)

You will end up on a template detail page where you can see all the important information about the template: its description, included features, used technologies, and the use case of this template. More importantly, there is a code preview and also instructions for how the code works.

![JavaScript template detail page](./images/create-actor-template-detail-page.png)

### Using the template in the Web IDE {#web-ide}

By clicking the **Use this template** button, you will create the Actor in Apify Console and be moved to the **Code** tab with the [Web IDE](/platform/actors/development/quick-start/web-ide), where you can see the code of the template and start editing it.

> The Web IDE is a great tool for developing your Actor directly in Apify Console without the need to install or use any other software.

![Web IDE](./images/create-actor-web-ide.png)

### Using the template locally {#local}

If you want to use the template locally, you can again use our [Apify CLI](/cli) to download the template to your local machine.

> Creating an Actor from a template locally is a great option if you want to develop your code using your local environment and IDE and then push the final solution back to the Apify Console.

When you click on the **Use locally** button, you'll be presented with instructions on how to create an Actor from this template in your local environment.

With the Apify CLI installed, you can run the following commands in your terminal:

```shell
apify create my-actor -t getting_started_node
```

```shell
cd my-actor
apify run
```

![Use the template locally](./images/create-actor-template-locally.png)

## Start with scraping single page {#scraping-single-page}

This template is a great starting point for web scraping, as it extracts data from a single website. It uses [Axios](https://axios-http.com/docs/intro) for downloading the page content and [Cheerio](https://cheerio.js.org/) for parsing the downloaded HTML.

Let's see what's inside the **Start with JavaScript** template. The main logic of the template lives in the `src/main.js` file.

```js
// Apify SDK - toolkit for building Apify Actors (Read more at https://docs.apify.com/sdk/js/).
import { Actor } from 'apify';
// Axios - Promise based HTTP client for the browser and node.js (Read more at https://axios-http.com/docs/intro).
import axios from 'axios';
// Cheerio - The fast, flexible & elegant library for parsing and manipulating HTML and XML (Read more at https://cheerio.js.org/).
import * as cheerio from 'cheerio';

// The init() call configures the Actor for its environment. It's recommended to start every Actor with an init().
await Actor.init();

// Structure of input is defined in input_schema.json
const input = await Actor.getInput();
const { url } = input;

// Fetch the HTML content of the page.
const response = await axios.get(url);

// Parse the downloaded HTML with Cheerio to enable data extraction.
const $ = cheerio.load(response.data);

// Extract all headings from the page (tag name and text).
const headings = [];
$('h1, h2, h3, h4, h5, h6').each((i, element) => {
    const headingObject = {
        level: $(element).prop('tagName').toLowerCase(),
        text: $(element).text(),
    };
    console.log('Extracted heading', headingObject);
    headings.push(headingObject);
});

// Save headings to Dataset - a table-like storage.
await Actor.pushData(headings);

// Gracefully exit the Actor process. It's recommended to quit all Actors with an exit().
await Actor.exit();
```

The Actor takes the `url` from the input and then:

1. Sends a request to the URL.
2. Downloads the page's HTML content.
3. Extracts headings (H1 - H6) from the page.
4. Stores the extracted data.

The extracted data is stored in the [Dataset](/platform/storage/dataset), where you can preview it and download it. We'll show how to do that later in the [Run the Actor](#run-the-actor) section.

> Feel free to play around with the code and add some more features to it. For example, you can extract all the links from the page, extract all the images, or completely change the logic of this template. Keep in mind that this template uses the [input schema](/academy/deploying-your-code/input-schema) defined in the `.actor/input_schema.json` file and linked from `.actor/actor.json`. If you want to change the input schema, you need to change it in those files as well. Learn more about Actor input and output [on the next page](/academy/getting-started/inputs-outputs).

## Build the Actor 🧱 {#build-an-actor}

In order to run the Actor, you need to [build](/platform/actors/development/builds-and-runs/builds) it first. Click on the **Build** button at the bottom of the page or the **Build now** button right under the code editor.
+ +![Build the Actor](./images/build-actor.png) + +After you've clicked the **Build** button, it'll take around 5–10 seconds to complete the build. You'll know it's finished when you see a green **Start** button. + +![Start button](./images/start.png) + +## Fill the input {#fill-input} + +And now we are ready to run the Actor. But before we do that, let's give the Actor some input by going to the `Input` tab. + +The input tab is where you can provide the Actor with some meaningful input. In this case, we'll be providing the Actor with a URL to scrape. For now, we'll use the prefilled value of [Apify website](https://apify.com/) (`https://apify.com/`). + +You can change the website you want to extract the data from by changing the URL in the input field. + +![Input tab](./images/actor-input-tab.png) + +## Run the Actor {#run-the-actor} + +Once you have provided the Actor with some URL you want to extract the data from, click **Start** button and wait a few seconds. You should see the Actor run logs in the **Last run** tab. + +![Actor run logs](./images/actor-run.png) + +After the Actor finishes, you can preview or download the extracted data by clicking on the **Export X results** button. + +![Export results](./images/actor-run-dataset.png) + +And that's it! You've just created your first Actor and extracted data from a website 🎉. + +## Getting stuck? Check out the tips 💡 {#get-help-with-tips} + +If you ever get stuck, you can always click on the **Tips** button in the top right corner of the page. It will show you a list of tips that are relevant to the Actor development. + +![Tips](./images/actor-tips.png) + +## Next up {#next} + +We've created an Actor, but how can we give it more complex inputs and make it do stuff based on these inputs? This is exactly what we'll be discussing in the [next lesson](./inputs_outputs.md)'s activity. + + + +--- +title: Getting started +description: Get started with the Apify platform by creating an account and learning about the Apify Console, which is where all Apify Actors are born! +sidebar_position: 8 +category: apify platform +slug: /getting-started +--- + +# Getting started {#getting-started} + +**Get started with the Apify platform by creating an account and learning about the Apify Console, which is where all Apify Actors are born!** + +--- + +Your gateway to the Apify platform is your Apify account. The great thing about creating an account is that we support integration with both Google and GitHub, which takes only about 30 seconds! + +1. Create your account on the [sign up](https://console.apify.com/sign-up?asrc=developers_portal) page. +2. Check your email, you should have a verification email with a link. Click it! +3. Done! 👍 + +## Getting to know the platform {#getting-to-know-the-platform} + +Now that you have an account, you have access to the [Apify Console](https://console.apify.com?asrc=developers_portal), which is a wonderful place where you utilize all of the features the platform has to offer, as well as manage and test your own projects. + +## Next up {#next} + +In our next lesson, we'll learn about something super exciting - **Actors**. Actors are the living and breathing core of the Apify platform and are an extremely powerful concept. What are you waiting for? Let's jump [right into the next lesson](./actors.md)! + + + +--- +title: Inputs & outputs +description: Create an Actor from scratch which takes an input, processes that input, and then outputs a result that can be used elsewhere. 
sidebar_position: 3
slug: /getting-started/inputs-outputs
---

# Inputs & outputs {#inputs-outputs}

**Create an Actor from scratch which takes an input, processes that input, and then outputs a result that can be used elsewhere.**

---

Actors, like any other programs, take inputs and generate outputs. The Apify platform has a way to specify what inputs an Actor expects, and a way to temporarily or permanently store its results.

In this lesson, we'll demonstrate inputs and outputs by building an Actor which takes two numbers as input, adds them up, and then outputs the result.

## Accept input into an Actor {#accept-input}

Let's first create another new Actor using the same template as before. Feel free to refer to the [previous lesson](./creating_actors.md) for a refresher on how to do this.

Replace all of the code in **main.js** with this code snippet:

```js
import { Actor } from 'apify';

await Actor.init();

// Grab our numbers from the input
const { num1, num2 } = await Actor.getInput();

// Calculate the solution
const solution = num1 + num2;

// Push the solution to the dataset
await Actor.pushData({ solution });

await Actor.exit();
```

Then, replace everything in **INPUT_SCHEMA.json** with this:

> This step isn't necessary, as the Actor will still be able to take input in JSON format without it; however, we are providing the content for this Actor's input schema in this lesson, as it will give the Apify platform a blueprint off of which it can generate a nice UI for your inputs, as well as validate their values.

```json
{
    "title": "Number adder",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "num1": {
            "title": "1st Number",
            "type": "integer",
            "description": "First number.",
            "editor": "number"
        },
        "num2": {
            "title": "2nd Number",
            "type": "integer",
            "description": "Second number.",
            "editor": "number"
        }
    },
    "required": ["num1", "num2"]
}
```

> If you're interested in learning more about how the code works, and what the **INPUT_SCHEMA.json** means, read about [inputs](/sdk/js/docs/examples/accept-user-input) and [adding data to a dataset](/sdk/js/docs/examples/add-data-to-dataset) in the Apify SDK documentation, and refer to the [input schema docs](/platform/actors/development/actor-definition/input-schema/specification/v1#integer).

Finally, **Save** and **Build** the Actor just as you did in the previous lesson.

## Configuring an Actor with inputs {#configuring}

If you scroll down a bit, you'll find the **Developer console** located under the multifile editor. By default, after running a build, the **Last build** tab will be selected, where you can see all of the logs related to building the Actor. Inputs can be configured within the **Input** tab.

![Configuring inputs](./images/configure-inputs.jpg)

Enter any two numbers you'd like, then press **Start**. The Actor's run should complete almost immediately.

## View Actor results {#view-results}

Since we've pushed the result into the default dataset, both the data and some info about it can be viewed by clicking this box, which will take you to the results tab:

![Result box](./images/result-box.png)

On the results tab, there are a whole lot of options for which format to view/download the data in. Keep the default of **JSON** selected, and click on **Preview**.

![Dataset preview](./images/dataset-preview.png)

There's our solution! Did it work for you as well?
Now, we can download the data right from the results tab to be used elsewhere, or even programmatically retrieve it by using [Apify's API](/api/v2) (we'll be discussing how to do this in the next lesson).

It's important to note that the default dataset of the Actor, which we pushed our solution to, will be retained for 7 days. If we wanted the data to be retained indefinitely, we'd have to use a named dataset. For more information about named vs. unnamed storages, read a bit about [data retention on the Apify platform](/platform/storage/usage#data-retention).

## Next up {#next}

In the [next lesson](./apify_api.md)'s fun activity, you'll learn how to call the Actor we created in this lesson programmatically using one of Apify's most powerful tools - the Apify API.

---
title: Introduction to Apify platform
description: Learn all about the Apify platform, all of the tools it offers, and how it can improve your overall development experience.
sidebar_position: 7
category: apify platform
slug: /apify-platform
---

# Introduction to the Apify platform {#about-the-platform}

**Learn all about the Apify platform, all of the tools it offers, and how it can improve your overall development experience.**

---

The [Apify platform](https://apify.com) was built to serve large-scale and high-performance web scraping and automation needs. It provides easy access to compute instances ([Actors](./getting_started/actors.md)), convenient request and result storages, proxies, scheduling, webhooks and more - all accessible through the **Console** web interface, [Apify's API](/api/v2), or our [JavaScript](/api/client/js) and [Python](/api/client/python) API clients.

## Category outline {#this-category}

In this category, you'll learn how to become an Apify platform developer from the ground up. From creating your first account to developing Actors, this is your one-stop shop for understanding how the platform works, and how to work with it.

## First up {#first}

We'll start off this category light, by showing you how to create an Apify account and get everything ready for development with the platform. [Let's go!](./getting_started/index.md)

---
title: Running a web server on the Apify platform
description: A web server running in an Actor can act as a communication channel with the outside world. Learn how to set one up with Node.js.
sidebar_position: 11
category: apify platform
slug: /running-a-web-server
---

# Running a web server on the Apify platform

**A web server running in an Actor can act as a communication channel with the outside world. Learn how to set one up with Node.js.**

---

Sometimes, an Actor needs a channel for communication with other systems (or humans). This channel might be used to receive commands, to provide info about progress, or both. To implement this, we will run an HTTP web server inside the Actor that will provide:

- An API to receive commands.
- An HTML page displaying output data.

Running a web server in an Actor is a piece of cake! Each Actor run is available at a unique URL (container URL) which always takes the form `https://CONTAINER-KEY.runs.apify.net`. This URL is available in the [**Actor run** object](/api/v2#/reference/actor-runs/run-object-and-its-storages/get-run) returned by the Apify API, as well as in the Apify Console.

If you start a web server on the port defined by the **APIFY_CONTAINER_PORT** environment variable (the default value is **4321**), the container URL becomes available and gets displayed in the **Live View** tab in the Actor run console.

For more details, see [the documentation](/platform/actors/development/programming-interface/container-web-server).

## Building the Actor {#building-the-actor}

Let's try to build the following Actor:

- The Actor will provide an API to receive URLs to be processed.
- For each URL, the Actor will create a screenshot.
- The screenshot will be stored in the key-value store.
- The Actor will provide a web page displaying thumbnails linked to screenshots and an HTML form to submit new URLs.

To achieve this, we will use the following technologies:

- The [Express.js](https://expressjs.com) framework to create the server.
- [Puppeteer](https://pptr.dev) to grab screenshots.
- The [Apify SDK](/sdk/js) to access Apify storages to store the screenshots.

Our server needs two paths:

- `/` - The index path will display a page with a form to submit a new URL and the thumbnails of processed URLs.
- `/add-url` - Will provide an API to add new URLs using an HTTP POST request.

First, we'll import `express` and create an Express.js app. Then, we'll add some middleware that will allow us to receive form submissions.

```js
import { Actor } from 'apify';
import express from 'express';

await Actor.init();

const app = express();

app.use(express.json());
app.use(express.urlencoded({ extended: true }));
```

Now we need to read the following environment variables:

- **APIFY_CONTAINER_PORT** contains the port number where we must start the server.
- **APIFY_CONTAINER_URL** contains the URL under which we can access the container.
- **APIFY_DEFAULT_KEY_VALUE_STORE_ID** is the ID of the default key-value store of this Actor where we can store screenshots.

```js
const {
    APIFY_CONTAINER_PORT,
    APIFY_CONTAINER_URL,
    APIFY_DEFAULT_KEY_VALUE_STORE_ID,
} = process.env;
```

Next, we'll create an array of the processed URLs where the **n**th URL has its screenshot stored under the key **n**.jpg in the key-value store.

```js
const processedUrls = [];
```

After that, the index route is ready to be defined.

```js
app.get('/', (req, res) => {
    let listItems = '';

    // For each of the processed URLs
    processedUrls.forEach((url, index) => {
        const imageUrl = `https://api.apify.com/v2/key-value-stores/${APIFY_DEFAULT_KEY_VALUE_STORE_ID}/records/${index}.jpg`;

        // Display the screenshots below the form
        listItems += `
<li>
            <a href="${imageUrl}" target="_blank">
                <img src="${imageUrl}" width="300px" />
                <br />
                ${url}
            </a>
        </li>`;
    });

    const pageHtml = `<html>
<head><title>Example</title></head>
<body>
    <form method="POST" action="${APIFY_CONTAINER_URL}/add-url">
        URL: <input type="text" name="url" placeholder="https://example.com" />
        <input type="submit" value="Add" />
    </form>
    <hr />
    <ul>${listItems}</ul>
</body>
</html>`;

    res.send(pageHtml);
});
```

And then a second path that receives the new URL submitted using the HTML form; after the URL is processed, it redirects the user back to the root path.

```js
import { launchPuppeteer } from 'crawlee';

app.post('/add-url', async (req, res) => {
    const { url } = req.body;
    console.log(`Got new URL: ${url}`);

    // Start a Chrome browser and open a new page ...
    const browser = await launchPuppeteer();
    const page = await browser.newPage();

    // ... go to our URL and grab a screenshot ...
    await page.goto(url);
    const screenshot = await page.screenshot({ type: 'jpeg' });

    // ... close the browser ...
    await page.close();
    await browser.close();

    // ... save the screenshot to the key-value store and add the URL to processedUrls.
    await Actor.setValue(`${processedUrls.length}.jpg`, screenshot, { contentType: 'image/jpeg' });
    processedUrls.push(url);

    res.redirect('/');
});
```

And finally, we need to start the web server.

```js
// Start the web server!
app.listen(APIFY_CONTAINER_PORT, () => {
    console.log(`Application is listening at URL ${APIFY_CONTAINER_URL}.`);
});
```

### Final code {#final-code}

```js
import { Actor } from 'apify';
import { launchPuppeteer } from 'crawlee';
import express from 'express';

await Actor.init();

const app = express();

app.use(express.json());
app.use(express.urlencoded({ extended: true }));

const {
    APIFY_CONTAINER_PORT,
    APIFY_CONTAINER_URL,
    APIFY_DEFAULT_KEY_VALUE_STORE_ID,
} = process.env;

const processedUrls = [];

app.get('/', (req, res) => {
    let listItems = '';

    // For each of the processed URLs
    processedUrls.forEach((url, index) => {
        const imageUrl = `https://api.apify.com/v2/key-value-stores/${APIFY_DEFAULT_KEY_VALUE_STORE_ID}/records/${index}.jpg`;

        // Display the screenshots below the form
        listItems += `
<li>
            <a href="${imageUrl}" target="_blank">
                <img src="${imageUrl}" width="300px" />
                <br />
                ${url}
            </a>
        </li>`;
    });

    const pageHtml = `<html>
<head><title>Example</title></head>
<body>
    <form method="POST" action="${APIFY_CONTAINER_URL}/add-url">
        URL: <input type="text" name="url" placeholder="https://example.com" />
        <input type="submit" value="Add" />
    </form>
    <hr />
    <ul>${listItems}</ul>
</body>
</html>`;

    res.send(pageHtml);
});

app.post('/add-url', async (req, res) => {
    const { url } = req.body;
    console.log(`Got new URL: ${url}`);

    // Start a Chrome browser and open a new page ...
    const browser = await launchPuppeteer();
    const page = await browser.newPage();

    // ... go to our URL and grab a screenshot ...
    await page.goto(url);
    const screenshot = await page.screenshot({ type: 'jpeg' });

    // ... close the browser ...
    await page.close();
    await browser.close();

    // ... save the screenshot to the key-value store and add the URL to processedUrls.
    await Actor.setValue(`${processedUrls.length}.jpg`, screenshot, { contentType: 'image/jpeg' });
    processedUrls.push(url);

    res.redirect('/');
});

app.listen(APIFY_CONTAINER_PORT, () => {
    console.log(`Application is listening at URL ${APIFY_CONTAINER_URL}.`);
});
```

When we deploy and run this Actor on the Apify platform, we can open the **Live View** tab in the Actor run console and submit a URL to the Actor through the form. After the URL is successfully submitted, it appears in the Actor log.

With that, we're done! And our application works like a charm :)

The complete code of this Actor is available [here](https://apify.com/apify/example-web-server). You can run it there or copy it to your account.
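
By the way, you don't need the HTML form to exercise the `/add-url` API. Since the server accepts form-encoded bodies, you can submit a URL straight from your terminal. A sketch, assuming your run's container URL:

```shell
curl -d "url=https://example.com" -X POST "https://CONTAINER-KEY.runs.apify.net/add-url"
```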

---
title: API tutorials
description: A collection of various tutorials explaining how to interact with the Apify platform programmatically using its API.
sidebar_position: 20
category: tutorials
slug: /api
---

# API Tutorials 💻📚

**A collection of various tutorials explaining how to interact with the Apify platform programmatically using its API.**

---

This section explains how you can run [Apify Actors](/platform/actors) using Apify's [API](/api/v2), retrieve their results, and integrate them into your own product and workflows. You can do this using a raw HTTP client, or you can benefit from using one of our API clients for:

- [JavaScript](/api/client/js/)
- [Python](/api/client/python)

---
title: How to retry failed requests
description: Learn how to resurrect your run by retrying only failed requests
sidebar_position: 6
slug: /api/retry-failed-requests
---

**Learn how to re-scrape only failed requests in your run.**

---

Requests of a scraper can fail for many reasons. The most common causes are different page layouts or proxy blocking issues ([check here for how to effectively analyze errors](https://docs.apify.com/academy/node-js/analyzing-pages-and-fixing-errors)). Both [Apify](https://apify.com) and [Crawlee](https://crawlee.dev/) allow you to restart your scraper run from the point where it ended, but there is no native functionality to re-scrape only the failed requests. Usually, you also want to first analyze the problem, update the code, and build it before trying again.

If you attempt to restart an already finished run, it will likely finish immediately, because all the requests in the [request queue](https://crawlee.dev/docs/guides/request-storage) are marked as handled. You need to update the failed requests in the queue to be marked as pending again.

An additional complication is that the [Request](https://crawlee.dev/api/core/class/Request) object doesn't have anything like an `isFailed` property. We have to approximate it using other fields. Fortunately, we can use the `errorMessages` and `retryCount` properties to identify failed requests. Unless the user has explicitly overridden these properties, we can identify failed requests as those with more `errorMessages` than their `retryCount`. That happens because the last error, which doesn't cause a retry anymore, is still added to `errorMessages`.

A simplified code example can look like this:

```ts
// The code for a Crawlee-only project is similar, but it uses a different API.
import { Actor } from 'apify';

const REQUEST_QUEUE_ID = 'pFCvCasdvsyvyZdfD'; // Replace with your valid request queue ID

const allRequests = [];
let exclusiveStartId = null;

// List all requests from the queue. We have to do it in a loop because the request queue listing is paginated.
for (;;) {
    const { items: requests } = await Actor.apifyClient
        .requestQueue(REQUEST_QUEUE_ID)
        .listRequests({ exclusiveStartId, limit: 1000 });
    allRequests.push(...requests);

    // If we didn't get the full 1,000 requests, we have all of them and can finish the loop
    if (requests.length < 1000) {
        break;
    }

    // Otherwise, we need to set the exclusiveStartId to the last request ID to get the next batch
    exclusiveStartId = requests[requests.length - 1].id;
}

console.log(`Loaded ${allRequests.length} requests from the queue`);

// Now we filter out the failed requests
const failedRequests = allRequests.filter((request) => (request.errorMessages?.length || 0) > (request.retryCount || 0));

// We need to update them one by one to restore their pristine state
for (const request of failedRequests) {
    request.retryCount = 0;
    request.errorMessages = [];
    // This tells the request queue to handle it again
    request.handledAt = null;
    await Actor.apifyClient.requestQueue(REQUEST_QUEUE_ID).updateRequest(request);
}

// And now we can resurrect our scraper again; it will only process the failed requests.
```

## Resurrect automatically with a free public Actor {#resurrect-automatically-with-a-free-public-actor}

Fortunately, you don't need to implement this code into your workflow. [Apify Store](https://apify.com/store) provides the [Rebirth Failed Requests](https://apify.com/lukaskrivka/rebirth-failed-requests) Actor (which is [open-source](https://github.com/metalwarrior665/rebirth-failed-requests)) that does this and more. The Actor can automatically scan multiple runs of your Actors based on filters like `date started`. It can also automatically resurrect the runs after renewing the failed requests. That means you can finish your scrape in a final successful state with a single click of the Run button.

---
title: Run Actor and retrieve data via API
description: Learn how to run an Actor/task via the Apify API, wait for the job to finish, and retrieve its output data. Your key to integrating Actors with your projects.
sidebar_position: 6
slug: /api/run-actor-and-retrieve-data-via-api
---

**Learn how to run an Actor/task via the Apify API, wait for the job to finish, and retrieve its output data. Your key to integrating Actors with your projects.**

---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

The most popular way of [integrating](https://help.apify.com/en/collections/1669769-integrations) the Apify platform with an external project or application is by programmatically running an [Actor](/platform/actors) or [task](/platform/actors/running/tasks), waiting for it to complete its run, then collecting its data and using it within the project. Follow this tutorial to get an idea of how to approach this; it isn't as complicated as it sounds!

> Remember to check out our [API documentation](/api/v2) with examples in different languages and a live API console. We also recommend testing the API with a desktop client like [Postman](https://www.postman.com/) or [Insomnia](https://insomnia.rest).

Apify API offers two ways of interacting with it:

- [Synchronously](#synchronous-flow)
- [Asynchronously](#asynchronous-flow)

If the Actor being run via API takes 5 minutes or less to complete a typical run, it should be called **synchronously**. Otherwise (if a typical run takes longer than 5 minutes), it should be called **asynchronously**.

## Run an Actor or task {#run-an-actor-or-task}

> If you are unsure about the differences between an Actor and a task, you can read about them in the [tasks](/platform/actors/running/tasks) documentation. In brief, tasks are pre-configured inputs for Actors.

The API endpoints and usage (for both sync and async) for [Actors](/api/v2#tag/ActorsRun-collection/operation/act_runs_post) and [tasks](/api/v2#/reference/actor-tasks/run-collection/run-task) are essentially the same.

To run, or **call**, an Actor/task, you will need a few things:

- The name or ID of the Actor/task. The name looks like `username~actorName` or `username~taskName`. The ID can be retrieved on the **Settings** page of the Actor/task.

- Your [API token](/platform/integrations), which you can find on the **Integrations** page in [Apify Console](https://console.apify.com/account?tab=integrations) (do not share it with anyone!).

- Possibly an input, which is passed in JSON format as the request's **body**.

- Some other optional settings, if you'd like to change the default values (such as allocated memory or the build).

The URL of the [POST request](https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods/POST) to run an Actor looks like this:

```cURL
https://api.apify.com/v2/acts/ACTOR_NAME_OR_ID/runs?token=YOUR_TOKEN
```

For tasks, we can switch the path from **acts** to **actor-tasks** and keep the rest the same:

```cURL
https://api.apify.com/v2/actor-tasks/TASK_NAME_OR_ID/runs?token=YOUR_TOKEN
```

If we send a correct POST request to one of these endpoints, the Actor or task will start just as if we had pressed the **Start** button on the Actor's page in the [Apify Console](https://console.apify.com).

### Additional settings {#additional-settings}

We can also add settings for the Actor (which will override the default settings) as additional query parameters. For example, if we wanted to change how much memory the Actor's run should be allocated and which build to run, we could add the `memory` and `build` parameters separated by `&`.

```cURL
https://api.apify.com/v2/acts/ACTOR_NAME_OR_ID/runs?token=YOUR_TOKEN&memory=8192&build=beta
```

This works in almost exactly the same way for both Actors and tasks; however, for tasks, there is no reason to specify a [`build`](/platform/actors/development/builds-and-runs/builds) parameter, as a task is already tied to one specific Actor build, which cannot be changed with query parameters.

### Input JSON {#input-json}

Most Actors would not be much use if input could not be passed into them to change their behavior. Additionally, even though tasks already have specified input configurations, it is handy to be able to overwrite task inputs through the **body** of the POST request.

> The input can technically be any [JSON object](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON), and will vary depending on the Actor being run. Ensure that you are familiar with the Actor's input schema while writing the body of the request.
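
For illustration, a minimal request body might override only a couple of fields and leave everything else at its defaults. The field names below are hypothetical; the real ones come from the target Actor's input schema:

```json
{
    "startUrls": [{ "url": "https://example.com" }],
    "maxPagesPerCrawl": 10
}
```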
+ +Good Actors have reasonable defaults for most input fields, so if you want to run one of the major Actors from [Apify Store](https://apify.com/store), you usually do not need to provide all possible fields. + +Via API, let's quickly try to run [Web Scraper](https://apify.com/apify/web-scraper), which is the most popular Actor on the Apify Store at the moment. The full input with all possible fields is [pretty long and ugly](https://apify.com/apify/web-scraper?section=example-run), so we will not show it here. Because it has default values for most fields, we can provide a JSON input containing only the fields we'd like to customize. We will send a POST request to the endpoint below and add the JSON as the **body** of the request: + +```cURL +https://api.apify.com/v2/acts/apify~web-scraper/runs?token=YOUR_TOKEN +``` + +Here is how it looks in [Postman](https://www.postman.com/): + +![Run an Actor via API in Postman](./images/run-actor-postman.png) + +If we press **Send**, it will immediately return some info about the run. The `status` will be either `READY` (which means that it is waiting to be allocated on a server) or `RUNNING` (99% of cases). + +![Actor run info in Postman](./images/run-info-postman.png) + +We will later use this **run info** JSON to retrieve the run's output data. This info about the run can also be retrieved with another call to the [**Get run**](https://apify.com/docs/api/v2#/reference/actors/run-object/get-run) endpoint. + +## JavaScript and Python client {#javascript-and-python-client} + +If you are using JavaScript or Python, we highly recommend using the Apify API client ([JavaScript](https://docs.apify.com/api/client/js/), [Python](https://docs.apify.com/api/client/python/)) instead of the raw HTTP API. The client implements smart polling and exponential backoff, which makes calling Actors and getting results efficient. + +You can skip most of this tutorial by following this code example that calls Google Search Results Scraper and logs its results: + + + + +```js +import { ApifyClient } from 'apify-client'; + +const client = new ApifyClient({ token: 'YOUR_API_TOKEN' }); + +const input = { queries: 'Food in NYC' }; + +// Run the Actor and wait for it to finish +// .call method waits infinitely long using smart polling +// Get back the run API object +const run = await client.actor('apify/google-search-scraper').call(input); + +// Fetch and print Actor results from the run's dataset (if any) +const { items } = await client.dataset(run.defaultDatasetId).listItems(); +items.forEach((item) => { + console.dir(item); +}); +``` + + + + +```py +from apify_client import ApifyClient +client = ApifyClient(token='YOUR_API_TOKEN') + +run_input = { + "queries": "Food in NYC", +} + +# Run the Actor and wait for it to finish +# .call method waits infinitely long using smart polling +# Get back the run API object +run = client.actor("apify/google-search-scraper").call(run_input=run_input) + +# Fetch and print Actor results from the run's dataset (if there are any) +for item in client.dataset(run["defaultDatasetId"]).iterate_items(): + print(item) +``` + + + + +By using our client, you don't need to worry about choosing between synchronous or asynchronous flow. But if you don't want your code to wait during `.call` (potentially for hours), continue reading below about how to implement webhooks. 
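
If you don't want `.call()` to block indefinitely, both clients also let you cap how long they wait. Here is a sketch using the JavaScript client's `waitSecs` option (verify the exact option name against the client reference for your client version):

```js
// Wait at most 120 seconds; if the run is still in progress after that,
// the returned run object will still have the status 'RUNNING'.
const run = await client.actor('apify/google-search-scraper').call(input, { waitSecs: 120 });
```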

## Synchronous flow {#synchronous-flow}

If each of your runs lasts less than 5 minutes, you can use a single [synchronous endpoint](https://usergrid.apache.org/docs/introduction/async-vs-sync.html#synchronous). When running **synchronously**, the connection will be held for _up to_ 5 minutes.

If your synchronous run exceeds the 5-minute time limit, the response will be a run object containing information about the run and the status of `RUNNING`. If that happens, you need to restart the run [asynchronously](#asynchronous-flow) and [wait for the run to finish](#wait-for-the-run-to-finish).

### Synchronous runs with dataset output {#synchronous-runs-with-dataset-output}

Most Actor runs will store their data in the default [dataset](/platform/storage/dataset). The Apify API provides **run-sync-get-dataset-items** endpoints for [Actors](/api/v2#/reference/actors/run-actor-synchronously-and-get-dataset-items/run-actor-synchronously-with-input-and-get-dataset-items) and [tasks](/api/v2#/reference/actor-tasks/run-task-synchronously-and-get-dataset-items/run-task-synchronously-and-get-dataset-items-(post)), which allow you to run an Actor and receive the items from the default dataset once the run has finished.

Here is a Node.js example of calling a task via the API and logging the dataset items to the console:

```js
// Use your favorite HTTP client
import got from 'got';

// Specify your API token
// (find it at https://console.apify.com/account#/integrations)
const myToken = '';

// Start apify/google-search-scraper Actor
// and pass some queries into the JSON body
const response = await got({
    url: `https://api.apify.com/v2/acts/apify~google-search-scraper/run-sync-get-dataset-items?token=${myToken}`,
    method: 'POST',
    json: {
        queries: 'web scraping\nweb crawling',
    },
    responseType: 'json',
});

const items = response.body;

// Log each non-promoted search result for both queries
items.forEach((item) => {
    const { nonPromotedSearchResults } = item;
    nonPromotedSearchResults.forEach((result) => {
        const { title, url, description } = result;
        console.log(`${title}: ${url} --- ${description}`);
    });
});
```

### Synchronous runs with key-value store output {#synchronous-runs-with-key-value-store-output}

[Key-value stores](/platform/storage/key-value-store) are useful for storing files like images, HTML snapshots, or JSON data. The Apify API provides **run-sync** endpoints for [Actors](/api/v2#/reference/actors/run-actor-synchronously/with-input) and [tasks](/api/v2#/reference/actor-tasks/run-task-synchronously/run-task-synchronously), which allow you to run a specific task and receive the output. By default, they return the `OUTPUT` record from the default key-value store.

> For more detailed information, check the [API reference](/api/v2#/reference/actors/run-actor-synchronously-and-get-dataset-items/run-actor-synchronously-with-input-and-get-dataset-items).

## Asynchronous flow {#asynchronous-flow}

For runs longer than 5 minutes, the process consists of three steps:

- [Run the Actor or task](#run-an-actor-or-task)
- [Wait for the run to finish](#wait-for-the-run-to-finish)
- [Collect the data](#collect-the-data)

### Wait for the run to finish {#wait-for-the-run-to-finish}

There may be cases where we need to run the Actor and go away. But in any kind of integration, we are usually interested in its output. We have three basic options for how to wait for the Actor or task to finish.

- [`waitForFinish` parameter](#waitforfinish-parameter)
- [Webhooks](#webhooks)
- [Polling](#polling)

#### `waitForFinish` parameter {#waitforfinish-parameter}

This solution is quite similar to the synchronous flow. To make the POST request wait, add the `waitForFinish` parameter. It can have a value from `0` to `60`, which is the maximum time in seconds to wait (the max value for `waitForFinish` is 1 minute). Knowing this, we can extend the example URL like this:

```cURL
https://api.apify.com/v2/acts/apify~web-scraper/runs?token=YOUR_TOKEN&waitForFinish=60
```

You can also use the `waitForFinish` parameter with the [**GET Run** endpoint](/api/v2#/reference/actors/run-object/get-run) to implement a smarter [polling](#polling) system.

Once again, the final response will be the **run info object**; however, now its status should be `SUCCEEDED` or `FAILED`. If the run exceeds the `waitForFinish` duration, the status will still be `RUNNING`.

#### Webhooks {#webhooks}

If you have a server, [webhooks](/platform/integrations/webhooks) are the most elegant and flexible solution for integrations with Apify. You can set up a webhook for any Actor or task, and that webhook will send a POST request to your server after an [event](/platform/integrations/webhooks/events) has occurred.

Usually, this event is a successfully finished run, but you can also set a different webhook for failed runs, etc.

![Webhook example](./images/webhook.png)

The webhook will send you a pretty complicated [JSON object](/platform/integrations/webhooks/actions), but usually, you would only be interested in the `resource` object within the response, which is like the **run info** JSON from the previous sections. We can leave the payload template as is for our example, since it is all we need.

Once your server receives this request from the webhook, you know that the event happened, and you can ask for the complete data.

> Don't forget to respond to the webhook with a **200** status code! Otherwise, it will ping you again.

#### Polling {#polling}

What if you don't have a server, and the run you'd like to do is much too long to use a synchronous call? In cases like these, periodic **polling** of the run's status is the solution.

When we run the Actor with the [usual API call](#run-an-actor-or-task) shown above, we will get back a response with the **run info** object. From this JSON object, we can then extract the ID of the Actor run that we just started from the `id` field. Then, we can set an interval that will poll the Apify API (let's say every 5 seconds) by calling the [**Get run**](https://apify.com/docs/api/v2#/reference/actors/run-object/get-run) endpoint to retrieve the run's status.

Replace the `RUN_ID` in the following URL with the ID you extracted earlier:

```cURL
https://api.apify.com/v2/acts/ACTOR_NAME_OR_ID/runs/RUN_ID
```

Once a status of `SUCCEEDED` or `FAILED` has been received, we know the run has finished and can cancel the interval and finally [collect the data](#collect-the-data).

### Collecting the data {#collect-the-data}

Unless you used the [synchronous call](#synchronous-flow) mentioned above, you will have to make one additional request to the API to retrieve the data.

The **run info** JSON also contains the IDs of the default [dataset](/platform/storage/dataset) and [key-value store](/platform/storage/key-value-store) that are allocated separately for each run, which is usually everything you need.
The fields are called `defaultDatasetId` and `defaultKeyValueStoreId`. + +#### Retrieving a dataset {#retrieve-a-dataset} + +> If you are scraping products, or any list of items with similar fields, the [dataset](/platform/storage/dataset) should be your storage of choice. Don't forget though, that dataset items are immutable. This means that you can only add to the dataset, and not change the content that is already inside it. + +To retrieve the data from a dataset, send a GET request to the [**Get items**](/api/v2#/reference/datasets/item-collection/get-items) endpoint and pass the `defaultDatasetId` into the URL. For a GET request to the default dataset, no token is needed. + +```cURL +https://api.apify.com/v2/datasets/DATASET_ID/items +``` + +By default, it will return the data in JSON format with some metadata. The actual data are in the `items` array. + +You can use plenty of additional parameters, to learn more about them, visit our API reference [documentation](/api/v2#/reference/datasets/item-collection/get-items). We will only mention that you can pass a `format` parameter that transforms the response into popular formats like CSV, XML, Excel, RSS, etc. + +The items are paginated, which means you can ask only for a subset of the data. Specify this using the `limit` and `offset` parameters. This endpoint has a limit of 250,000 items that it can return per request. To retrieve more, you will need to send more requests incrementing the `offset` parameter. + +```cURL +https://api.apify.com/v2/datasets/DATASET_ID/items?format=csv&offset=250000 +``` + +#### Retrieving a key-value store {#retrieve-a-key-value-store} + +> [Key-value stores](/platform/storage/key-value-store) are mainly useful if you have a single output or any kind of files that cannot be [stringified](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify) (such as images or PDFs). + +When you want to retrieve something from a key-value store, the `defaultKeyValueStoreId` is _not_ enough. You also need to know the name (or **key**) of the record you want to retrieve. + +If you have a single output JSON, the convention is to return it as a record named `OUTPUT` to the default key-value store. To retrieve the record's content, call the [**Get record**](/api/v2#/reference/key-value-stores/record/get-record) endpoint. + +```cURL +https://api.apify.com/v2/key-value-stores/STORE_ID/records/RECORD_KEY +``` + +If you don't know the keys (names) of the records in advance, you can retrieve just the keys with the [**List keys**](https://apify.com/docs/api/v2#/reference/key-value-stores/key-collection/get-list-of-keys) endpoint. + +Keep in mind that you can get a maximum of 1000 keys per request, so you will need to paginate over the keys using the `exclusiveStartKey` parameter if you have more than 1000 keys. To do this, after each call, take the last record key and provide it as the `exclusiveStartKey` parameter. You can do this until you get 0 keys back. + +```cURL +https://api.apify.com/v2/key-value-stores/STORE_ID/keys?exclusiveStartKey=myLastRecordKey +``` + + + +--- +title: Scraping with Cheerio Scraper +menuTitle: Cheerio Scraper +description: Learn how to scrape a website using Apify's Cheerio Scraper. Build an Actor's page function, extract information from a web page and download your data. 
+externalSourceUrl: https://raw.githubusercontent.com/apify/actor-scraper/master/docs/build/cheerio-scraper-tutorial.md +sidebar_position: 3 +slug: /apify-scrapers/cheerio-scraper +--- + +[//]: # (TODO: Should be updated) + +# + +This scraping tutorial will go into the nitty gritty details of extracting data from **https://apify.com/store** +using **Cheerio Scraper** ([apify/cheerio-scraper](https://apify.com/apify/cheerio-scraper)). If you arrived here from the [Getting started with Apify scrapers](/academy/apify-scrapers/getting-started), +tutorial, great! You are ready to continue where we left off. If you haven't seen the Getting started yet, +check it out, it will help you learn about Apify and scraping in general and set you up for this tutorial, +because this one builds on topics and code examples discussed there. + +## Getting to know our tools + +In the [Getting started with Apify scrapers](/academy/apify-scrapers/getting-started) tutorial, we've confirmed that the scraper works as expected, +so now it's time to add more data to the results. + +To do that, we'll be using the [Cheerio](https://github.com/cheeriojs/cheerio) library. This may not sound familiar, +so let's try again. Does [jQuery](https://jquery.com/) ring a bell? If it does you're in luck, +because Cheerio is like jQuery that doesn't need an actual browser to run. Everything else is the same. +All the functions you already know are there and even the familiar `$` is used. If you still have no idea what either +of those are, don't worry. We'll walk you through using them step by step. + +> [Check out the Cheerio docs](https://github.com/cheeriojs/cheerio) to learn more about it. + +Now that's out of the way, let's open one of the Actor detail pages in the Store, for example the +**Web Scraper** ([apify/web-scraper](https://apify.com/apify/web-scraper)) page, and use our DevTools-Fu to scrape some data. + +> If you're wondering why we're using Web Scraper as an example instead of Cheerio Scraper, +it's only because we didn't want to triple the number of screenshots we needed to make. Lazy developers! + +## Building our Page function + +Before we start, let's do a quick recap of the data we chose to scrape: + + 1. **URL** - The URL that goes directly to the Actor's detail page. + 2. **Unique identifier** - Such as **apify/web-scraper**. + 3. **Title** - The title visible in the Actor's detail page. + 4. **Description** - The Actor's description. + 5. **Last modification date** - When the Actor was last modified. + 6. **Number of runs** - How many times the Actor was run. + +![$1](https://raw.githubusercontent.com/apify/actor-scraper/master/docs/img/scraping-practice.webp) + +We've already scraped numbers 1 and 2 in the [Getting started with Apify scrapers](/academy/apify-scrapers/getting-started) +tutorial, so let's get to the next one on the list: title. + +### Title + +![$1](https://raw.githubusercontent.com/apify/actor-scraper/master/docs/img/title.webp) + +By using the element selector tool, we find out that the title is there under an `
<h1>` tag, as titles should be.
Maybe surprisingly, we find that there are actually two `<h1>` tags on the detail page. This should get us thinking.
Is there any parent element that includes our `<h1>` tag, but not the other ones? Yes, there is! A `<header>`
element that we can use to select only the heading we're interested in.

> Remember that you can press CTRL+F (CMD+F) in the Elements tab of DevTools to open the search bar where you can quickly search for elements using
> their selectors. And always make sure to use the DevTools to verify your scraping process and assumptions. It's faster than changing the crawler
> code all the time.

To get the title we need to find it using a `header h1` selector, which selects all `<h1>` elements that have a `<header>` ancestor.
And as we already know, there's only one.

```js
// Using Cheerio.
async function pageFunction(context) {
    const { $ } = context;
    // ... rest of your code can come here
    return {
        title: $('header h1').text(),
    };
}
```

### Description

Getting the Actor's description is a little more involved, but still pretty straightforward. We cannot search for a `
<span>` tag, because there's a lot of them in the page. We need to narrow our search down a little. Using the DevTools we find that the Actor description is nested within
the `<header>` element too, same as the title. Moreover, the actual description is nested inside a `<span>` tag with a class `actor-description`.

![Actor description](https://raw.githubusercontent.com/apify/actor-scraper/master/docs/img/description.webp)

```js
async function pageFunction(context) {
    const { $ } = context;
    // ... rest of your code can come here
    return {
        title: $('header h1').text(),
        description: $('header span.actor-description').text(),
    };
}
```

### Modified date

The DevTools tell us that the `modifiedDate` can be found in a `