From e18f2277afae449eba25b565e6886dec59120766 Mon Sep 17 00:00:00 2001 From: Honza Javorek Date: Wed, 30 Jul 2025 11:59:00 +0200 Subject: [PATCH 1/6] feat: update the platform lesson to be about JS --- .../13_platform.md | 407 +++++++----------- .../scraping_basics_python/13_platform.md | 2 +- 2 files changed, 151 insertions(+), 258 deletions(-) diff --git a/sources/academy/webscraping/scraping_basics_javascript2/13_platform.md b/sources/academy/webscraping/scraping_basics_javascript2/13_platform.md index 475f36a172..8d9b8b27f3 100644 --- a/sources/academy/webscraping/scraping_basics_javascript2/13_platform.md +++ b/sources/academy/webscraping/scraping_basics_javascript2/13_platform.md @@ -35,9 +35,16 @@ Apify serves both as an infrastructure where to privately deploy and run own scr ## Getting access from the command line -To control the platform from our machine and send the code of our program there, we'll need the Apify CLI. On macOS, we can install the CLI using [Homebrew](https://brew.sh), otherwise we'll first need [Node.js](https://nodejs.org/en/download). +To control the platform from our machine and send the code of our program there, we'll need the Apify CLI. The [Apify CLI installation guide](https://docs.apify.com/cli/docs/installation) suggests we can install it with `npm` as a global package: -After following the [Apify CLI installation guide](https://docs.apify.com/cli/docs/installation), we'll verify that we installed the tool by printing its version: +```text +$ npm -g install apify-cli + +added 440 packages in 2s +... +``` + +We better verify that we installed the tool by printing its version: ```text $ apify --version @@ -52,191 +59,98 @@ $ apify login Success: You are logged in to Apify as user1234! ``` -## Starting a real-world project - -Until now, we've kept our scrapers simple, each with just a single Python module like `main.py`, and we've added dependencies only by installing them with `pip` inside a virtual environment. +## Turning our program to an Actor -If we sent our code to a friend, they wouldn't know what to install to avoid import errors. The same goes for deploying to a cloud platform. +Every program that runs on the Apify platform first needs to be packaged as a so-called [Actor](https://apify.com/actors)—a standardized container with designated places for input and output. -To share our project, we need to package it. The best way is following the official [Python Packaging User Guide](https://packaging.python.org/), but for this course, we'll take a shortcut with the Apify CLI. +Many [Actor templates](https://apify.com/templates/categories/javascript) simplify the setup for new projects. We'll skip those, as we're about to package an existing program. -In our terminal, let's change to a directory where we usually start new projects. Then, we'll run the following command: +Inside the project directory we'll run the `apify init` command followed by a name we want to give to the Actor: ```text -apify create warehouse-watchdog --template=python-crawlee-beautifulsoup -``` - -It will create a new subdirectory called `warehouse-watchdog` for the new project, containing all the necessary files: - -```text -Info: Python version 0.0.0 detected. -Info: Creating a virtual environment in ... -... -Success: Actor 'warehouse-watchdog' was created. To run it, run "cd warehouse-watchdog" and "apify run". -Info: To run your code in the cloud, run "apify push" and deploy your code to Apify Console. 
-Info: To install additional Python packages, you need to activate the virtual environment in the ".venv" folder in the actor directory. -``` - -## Adjusting the template - -Inside the `warehouse-watchdog` directory, we should see a `src` subdirectory containing several Python files, including `main.py`. This is a sample Beautiful Soup scraper provided by the template. - -The file contains a single asynchronous function, `main()`. At the beginning, it handles [input](https://docs.apify.com/platform/actors/running/input-and-output#input), then passes that input to a small crawler built on top of the Crawlee framework. - -Every program that runs on the Apify platform first needs to be packaged as a so-called [Actor](https://apify.com/actors)—a standardized container with designated places for input and output. Crawlee scrapers automatically connect their default dataset to the Actor output, but input must be handled explicitly in the code. - -![The expected file structure](./images/actor-file-structure.webp) - -We'll now adjust the template so that it runs our program for watching prices. As the first step, we'll create a new empty file, `crawler.py`, inside the `warehouse-watchdog/src` directory. Then, we'll fill this file with final, unchanged code from the previous lesson: - -```py title=warehouse-watchdog/src/crawler.py -import asyncio -from decimal import Decimal -from crawlee.crawlers import BeautifulSoupCrawler - -async def main(): - crawler = BeautifulSoupCrawler() - - @crawler.router.default_handler - async def handle_listing(context): - context.log.info("Looking for product detail pages") - await context.enqueue_links(selector=".product-list a.product-item__title", label="DETAIL") - - @crawler.router.handler("DETAIL") - async def handle_detail(context): - context.log.info(f"Product detail page: {context.request.url}") - price_text = ( - context.soup - .select_one(".product-form__info-content .price") - .contents[-1] - .strip() - .replace("$", "") - .replace(",", "") - ) - item = { - "url": context.request.url, - "title": context.soup.select_one(".product-meta__title").text.strip(), - "vendor": context.soup.select_one(".product-meta__vendor").text.strip(), - "price": Decimal(price_text), - "variant_name": None, - } - if variants := context.soup.select(".product-form__option.no-js option"): - for variant in variants: - context.log.info("Saving a product variant") - await context.push_data(item | parse_variant(variant)) - else: - context.log.info("Saving a product") - await context.push_data(item) - - await crawler.run(["https://warehouse-theme-metal.myshopify.com/collections/sales"]) - - crawler.log.info("Exporting data") - await crawler.export_data_json(path='dataset.json', ensure_ascii=False, indent=2) - await crawler.export_data_csv(path='dataset.csv') - -def parse_variant(variant): - text = variant.text.strip() - name, price_text = text.split(" - ") - price = Decimal( - price_text - .replace("$", "") - .replace(",", "") - ) - return {"variant_name": name, "price": price} - -if __name__ == '__main__': - asyncio.run(main()) +$ apify init warehouse-watchdog +Success: The Actor has been initialized in the current directory. ``` -Now, let's replace the contents of `warehouse-watchdog/src/main.py` with this: +The command creates an `.actor` directory with `actor.json` file inside. This file serves as the configuration of the Actor. 
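
The exact contents depend on the CLI version, but the generated configuration should look roughly like this, with the `name` matching what we passed to `apify init`:

```json title=".actor/actor.json"
{
    "actorSpecification": 1,
    "name": "warehouse-watchdog",
    "version": "0.0",
    "buildTag": "latest",
    "environmentVariables": {}
}
```

We don't need to edit the file now. Later in this lesson, we'll add an `input` property to it when setting up proxies.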
-```py title=warehouse-watchdog/src/main.py -from apify import Actor -from .crawler import main as crawl +:::tip Hidden dot files -async def main(): - async with Actor: - await crawl() -``` +On some systems, `.actor` might be hidden in the directory listing because it starts with a dot. Use your editor's built-in file explorer to locate it. -We import our scraper as a function and await the result inside the Actor block. Unlike the sample scraper, the one we made in the previous lesson doesn't expect any input data, so we can omit the code that handles that part. +::: -Next, we'll change to the `warehouse-watchdog` directory in our terminal and verify that everything works locally before deploying the project to the cloud: +We'll also need a few changes to our code. First, let's add the `apify` package, which is the [Apify SDK](https://docs.apify.com/sdk/js/): ```text -$ apify run -Run: /Users/course/Projects/warehouse-watchdog/.venv/bin/python3 -m src -[apify] INFO Initializing Actor... -[apify] INFO System info ({"apify_sdk_version": "0.0.0", "apify_client_version": "0.0.0", "crawlee_version": "0.0.0", "python_version": "0.0.0", "os": "xyz"}) -[BeautifulSoupCrawler] INFO Current request statistics: -┌───────────────────────────────┬──────────┐ -│ requests_finished │ 0 │ -│ requests_failed │ 0 │ -│ retry_histogram │ [0] │ -│ request_avg_failed_duration │ None │ -│ request_avg_finished_duration │ None │ -│ requests_finished_per_minute │ 0 │ -│ requests_failed_per_minute │ 0 │ -│ request_total_duration │ 0.0 │ -│ requests_total │ 0 │ -│ crawler_runtime │ 0.016736 │ -└───────────────────────────────┴──────────┘ -[crawlee._autoscaling.autoscaled_pool] INFO current_concurrency = 0; desired_concurrency = 2; cpu = 0; mem = 0; event_loop = 0.0; client_info = 0.0 -[BeautifulSoupCrawler] INFO Looking for product detail pages -[BeautifulSoupCrawler] INFO Product detail page: https://warehouse-theme-metal.myshopify.com/products/jbl-flip-4-waterproof-portable-bluetooth-speaker -[BeautifulSoupCrawler] INFO Saving a product variant -[BeautifulSoupCrawler] INFO Saving a product variant +$ npm install apify --save + +added 123 packages, and audited 123 packages in 0s ... ``` -## Updating the Actor configuration - -The Actor configuration from the template tells the platform to expect input, so we need to update that before running our scraper in the cloud. +Now we'll modify the program so that before it starts, it configures the Actor environment, and after it ends, it gracefully exits the Actor process: -Inside `warehouse-watchdog`, there's a directory called `.actor`. Within it, we'll edit the `input_schema.json` file, which looks like this by default: +```js title="index.js" +import { CheerioCrawler } from 'crawlee'; +// highlight-next-line +import { Actor } from 'apify'; -```json title=warehouse-watchdog/src/.actor/input_schema.json -{ - "title": "Python Crawlee BeautifulSoup Scraper", - "type": "object", - "schemaVersion": 1, - "properties": { - "start_urls": { - "title": "Start URLs", - "type": "array", - "description": "URLs to start with", - "prefill": [ - { "url": "https://apify.com" } - ], - "editor": "requestListSources" - } - }, - "required": ["start_urls"] +function parseVariant($option) { + ... } -``` -:::tip Hidden dot files +// highlight-next-line +await Actor.init(); -On some systems, `.actor` might be hidden in the directory listing because it starts with a dot. Use your editor's built-in file explorer to locate it. +const crawler = new CheerioCrawler({ + ... 
+}); -::: +await crawler.run(['https://warehouse-theme-metal.myshopify.com/collections/sales']); +crawler.log.info('Exporting data'); +await crawler.exportData('dataset.json'); +await crawler.exportData('dataset.csv'); -We'll remove the expected properties and the list of required ones. After our changes, the file should look like this: +// highlight-next-line +await Actor.exit(); +``` -```json title=warehouse-watchdog/src/.actor/input_schema.json +Finally, let's tell others how to start the project. This is not specific to Actors. JavaScript projects usually include this so people and tools like Apify know how to run them. We will add a `start` script to `package.json`: + +```json title="package.json" { - "title": "Python Crawlee BeautifulSoup Scraper", - "type": "object", - "schemaVersion": 1, - "properties": {} + "name": "academy-example", + "version": "1.0.0", + ... + "scripts": { + // highlight-next-line + "start": "node index.js", + "test": "echo \"Error: no test specified\" && exit 1" + }, + "dependencies": { + ... + } } ``` -:::danger Trailing commas in JSON +That's it! Before deploying the project to the cloud, let's verify that everything works locally: -Make sure there's no trailing comma after `{}`, or the file won't be valid JSON. +```text +$ apify run +Run: npm run start -::: +> academy-example@1.0.0 start +> node index.js + +INFO System info {"apifyVersion":"0.0.0","apifyClientVersion":"0.0.0","crawleeVersion":"0.0.0","osType":"Darwin","nodeVersion":"v0.0.0"} +INFO CheerioCrawler: Starting the crawler. +INFO CheerioCrawler: Looking for product detail pages +INFO CheerioCrawler: Product detail page: https://warehouse-theme-metal.myshopify.com/products/jbl-flip-4-waterproof-portable-bluetooth-speaker +INFO CheerioCrawler: Saving a product variant +INFO CheerioCrawler: Saving a product variant +... +``` ## Deploying the scraper @@ -263,7 +177,7 @@ When the run finishes, the interface will turn green. On the **Output** tab, we :::info Accessing data -We don't need to click buttons to download the data. It's possible to retrieve it also using Apify's API, the `apify datasets` CLI command, or the Python SDK. Learn more in the [Dataset docs](https://docs.apify.com/platform/storage/dataset). +We don't need to click buttons to download the data. It's possible to retrieve it also using Apify's API, the `apify datasets` CLI command, or the JavaScript SDK. Learn more in the [Dataset docs](https://docs.apify.com/platform/storage/dataset). ::: @@ -279,103 +193,95 @@ From now on, the Actor will execute daily. We can inspect each run, view logs, c If monitoring shows that our scraper frequently fails to reach the Warehouse Shop website, it's likely being blocked. To avoid this, we can [configure proxies](https://docs.apify.com/platform/proxy) so our requests come from different locations, reducing the chances of detection and blocking. -Proxy configuration is a type of Actor input, so let's start by reintroducing the necessary code. We'll update `warehouse-watchdog/src/main.py` like this: - -```py title=warehouse-watchdog/src/main.py -from apify import Actor -from .crawler import main as crawl +Proxy configuration is a type of [Actor input](https://docs.apify.com/platform/actors/running/input-and-output#input). Crawlee scrapers automatically connect their default dataset to the Actor output, but input must be handled manually. 
Inside the `.actor` directory we'll create a new file, `inputSchema.json`, with the following content: -async def main(): - async with Actor: - input_data = await Actor.get_input() +```json title=".actor/inputSchema.json" +{ + "title": "Crawlee Cheerio Scraper", + "type": "object", + "schemaVersion": 1, + "properties": { + "proxyConfig": { + "title": "Proxy config", + "description": "Proxy configuration", + "type": "object", + "editor": "proxy", + "prefill": { + "useApifyProxy": true, + "apifyProxyGroups": [] + }, + "default": { + "useApifyProxy": true, + "apifyProxyGroups": [] + } + } + } +} +``` - if actor_proxy_input := input_data.get("proxyConfig"): - proxy_config = await Actor.create_proxy_configuration(actor_proxy_input=actor_proxy_input) - else: - proxy_config = None +Now let's connect this file to the actor configuration. In `actor.json`, we'll add one more line: - await crawl(proxy_config) +```json title=".actor/actor.json" +{ + "actorSpecification": 1, + "name": "warehouse-watchdog", + "version": "0.0", + "buildTag": "latest", + "environmentVariables": {}, + // highlight-next-line + "input": "./inputSchema.json" +} ``` -Next, we'll add `proxy_config` as an optional parameter in `warehouse-watchdog/src/crawler.py`. Thanks to the built-in integration between Apify and Crawlee, we only need to pass it to `BeautifulSoupCrawler()`, and the class will handle the rest: +:::danger Trailing commas in JSON -```py title=warehouse-watchdog/src/crawler.py -import asyncio -from decimal import Decimal -from crawlee.crawlers import BeautifulSoupCrawler +Make sure there's no trailing comma after the line, or the file won't be valid JSON. -# highlight-next-line -async def main(proxy_config = None): - # highlight-next-line - crawler = BeautifulSoupCrawler(proxy_configuration=proxy_config) - # highlight-next-line - crawler.log.info(f"Using proxy: {'yes' if proxy_config else 'no'}") +::: - @crawler.router.default_handler - async def handle_listing(context): - context.log.info("Looking for product detail pages") - await context.enqueue_links(selector=".product-list a.product-item__title", label="DETAIL") +That tells the platform our Actor expects proxy configuration on input. We'll also update the `index.js`. Thanks to the built-in integration between Apify and Crawlee, we can pass the proxy configuration as-is to the `CheerioCrawler`: +```js +... +await Actor.init(); +// highlight-next-line +const proxyConfiguration = await Actor.createProxyConfiguration(); + +const crawler = new CheerioCrawler({ + // highlight-next-line + proxyConfiguration, + async requestHandler({ $, request, enqueueLinks, pushData, log }) { ... -``` - -Finally, we'll modify the Actor configuration in `warehouse-watchdog/src/.actor/input_schema.json` to include the `proxyConfig` input parameter: + }, +}); -```json title=warehouse-watchdog/src/.actor/input_schema.json -{ - "title": "Python Crawlee BeautifulSoup Scraper", - "type": "object", - "schemaVersion": 1, - "properties": { - "proxyConfig": { - "title": "Proxy config", - "description": "Proxy configuration", - "type": "object", - "editor": "proxy", - "prefill": { - "useApifyProxy": true, - "apifyProxyGroups": [] - }, - "default": { - "useApifyProxy": true, - "apifyProxyGroups": [] - } - } - } -} +// highlight-next-line +crawler.log.info(`Using proxy: ${proxyConfiguration ? 'yes' : 'no'}`); +await crawler.run(['https://warehouse-theme-metal.myshopify.com/collections/sales']); +... ``` To verify everything works, we'll run the scraper locally. 
We'll use the `apify run` command again, but this time with the `--purge` option to ensure we're not reusing data from a previous run: ```text $ apify run --purge -Info: All default local stores were purged. -Run: /Users/course/Projects/warehouse-watchdog/.venv/bin/python3 -m src -[apify] INFO Initializing Actor... -[apify] INFO System info ({"apify_sdk_version": "0.0.0", "apify_client_version": "0.0.0", "crawlee_version": "0.0.0", "python_version": "0.0.0", "os": "xyz"}) -[BeautifulSoupCrawler] INFO Using proxy: no -[BeautifulSoupCrawler] INFO Current request statistics: -┌───────────────────────────────┬──────────┐ -│ requests_finished │ 0 │ -│ requests_failed │ 0 │ -│ retry_histogram │ [0] │ -│ request_avg_failed_duration │ None │ -│ request_avg_finished_duration │ None │ -│ requests_finished_per_minute │ 0 │ -│ requests_failed_per_minute │ 0 │ -│ request_total_duration │ 0.0 │ -│ requests_total │ 0 │ -│ crawler_runtime │ 0.014976 │ -└───────────────────────────────┴──────────┘ -[crawlee._autoscaling.autoscaled_pool] INFO current_concurrency = 0; desired_concurrency = 2; cpu = 0; mem = 0; event_loop = 0.0; client_info = 0.0 -[BeautifulSoupCrawler] INFO Looking for product detail pages -[BeautifulSoupCrawler] INFO Product detail page: https://warehouse-theme-metal.myshopify.com/products/jbl-flip-4-waterproof-portable-bluetooth-speaker -[BeautifulSoupCrawler] INFO Saving a product variant -[BeautifulSoupCrawler] INFO Saving a product variant +Run: npm run start + +> academy-example@1.0.0 start +> node index.js + +INFO System info {"apifyVersion":"0.0.0","apifyClientVersion":"0.0.0","crawleeVersion":"0.0.0","osType":"Darwin","nodeVersion":"v0.0.0"} +WARN ProxyConfiguration: The "Proxy external access" feature is not enabled for your account. Please upgrade your plan or contact support@apify.com +INFO CheerioCrawler: Using proxy: no +INFO CheerioCrawler: Starting the crawler. +INFO CheerioCrawler: Looking for product detail pages +INFO CheerioCrawler: Product detail page: https://warehouse-theme-metal.myshopify.com/products/denon-ah-c720-in-ear-headphones +INFO CheerioCrawler: Saving a product variant +INFO CheerioCrawler: Saving a product variant ... ``` -In the logs, we should see `Using proxy: no`, because local runs don't include proxy settings. All requests will be made from our own location, just as before. Now, let's update the cloud version of our scraper with `apify push`: +In the logs, we should see `Using proxy: no`, because local runs don't include proxy settings. A warning informs us that it's a paid feature we don't have enabled, so all requests will be made from our own location, just as before. Now, let's update the cloud version of our scraper with `apify push`: ```text $ apify push @@ -394,30 +300,17 @@ Back in the Apify console, we'll go to the **Source** screen and switch to the * We'll leave it as is and click **Start**. This time, the logs should show `Using proxy: yes`, as the scraper uses proxies provided by the platform: ```text -(timestamp) ACTOR: Pulling Docker image of build o6vHvr5KwA1sGNxP0 from repository. +(timestamp) ACTOR: Pulling Docker image of build o6vHvr5KwA1sGNxP0 from registry. (timestamp) ACTOR: Creating Docker container. (timestamp) ACTOR: Starting Docker container. -(timestamp) [apify] INFO Initializing Actor... 
-(timestamp) [apify] INFO System info ({"apify_sdk_version": "0.0.0", "apify_client_version": "0.0.0", "crawlee_version": "0.0.0", "python_version": "0.0.0", "os": "xyz"}) -(timestamp) [BeautifulSoupCrawler] INFO Using proxy: yes -(timestamp) [BeautifulSoupCrawler] INFO Current request statistics: -(timestamp) ┌───────────────────────────────┬──────────┐ -(timestamp) │ requests_finished │ 0 │ -(timestamp) │ requests_failed │ 0 │ -(timestamp) │ retry_histogram │ [0] │ -(timestamp) │ request_avg_failed_duration │ None │ -(timestamp) │ request_avg_finished_duration │ None │ -(timestamp) │ requests_finished_per_minute │ 0 │ -(timestamp) │ requests_failed_per_minute │ 0 │ -(timestamp) │ request_total_duration │ 0.0 │ -(timestamp) │ requests_total │ 0 │ -(timestamp) │ crawler_runtime │ 0.036449 │ -(timestamp) └───────────────────────────────┴──────────┘ -(timestamp) [crawlee._autoscaling.autoscaled_pool] INFO current_concurrency = 0; desired_concurrency = 2; cpu = 0; mem = 0; event_loop = 0.0; client_info = 0.0 -(timestamp) [crawlee.storages._request_queue] INFO The queue still contains requests locked by another client -(timestamp) [BeautifulSoupCrawler] INFO Looking for product detail pages -(timestamp) [BeautifulSoupCrawler] INFO Product detail page: https://warehouse-theme-metal.myshopify.com/products/jbl-flip-4-waterproof-portable-bluetooth-speaker -(timestamp) [BeautifulSoupCrawler] INFO Saving a product variant +(timestamp) INFO System info {"apifyVersion":"0.0.0","apifyClientVersion":"0.0.0","crawleeVersion":"0.0.0","osType":"Darwin","nodeVersion":"v0.0.0"} +(timestamp) INFO CheerioCrawler: Using proxy: yes +(timestamp) INFO CheerioCrawler: Starting the crawler. +(timestamp) INFO CheerioCrawler: Looking for product detail pages +(timestamp) INFO CheerioCrawler: Product detail page: https://warehouse-theme-metal.myshopify.com/products/sony-ps-hx500-hi-res-usb-turntable +(timestamp) INFO CheerioCrawler: Saving a product +(timestamp) INFO CheerioCrawler: Product detail page: https://warehouse-theme-metal.myshopify.com/products/klipsch-r-120sw-powerful-detailed-home-speaker-set-of-1 +(timestamp) INFO CheerioCrawler: Saving a product ... ``` diff --git a/sources/academy/webscraping/scraping_basics_python/13_platform.md b/sources/academy/webscraping/scraping_basics_python/13_platform.md index 44fedf4cb3..8298dd4b82 100644 --- a/sources/academy/webscraping/scraping_basics_python/13_platform.md +++ b/sources/academy/webscraping/scraping_basics_python/13_platform.md @@ -322,7 +322,7 @@ Finally, we'll modify the Actor configuration in `warehouse-watchdog/src/.actor/ ```json title=warehouse-watchdog/src/.actor/input_schema.json { - "title": "Python Crawlee BeautifulSoup Scraper", + "title": "Crawlee BeautifulSoup Scraper", "type": "object", "schemaVersion": 1, "properties": { From e1884c5e440bc5e5e4b55a1c04f09e6da8e0c4be Mon Sep 17 00:00:00 2001 From: Honza Javorek Date: Wed, 30 Jul 2025 12:16:13 +0200 Subject: [PATCH 2/6] fix: remove hard tabs --- .../scraping_basics_javascript2/13_platform.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/sources/academy/webscraping/scraping_basics_javascript2/13_platform.md b/sources/academy/webscraping/scraping_basics_javascript2/13_platform.md index 8d9b8b27f3..e4405a47da 100644 --- a/sources/academy/webscraping/scraping_basics_javascript2/13_platform.md +++ b/sources/academy/webscraping/scraping_basics_javascript2/13_platform.md @@ -223,11 +223,11 @@ Now let's connect this file to the actor configuration. 
In `actor.json`, we'll a ```json title=".actor/actor.json" { - "actorSpecification": 1, - "name": "warehouse-watchdog", - "version": "0.0", - "buildTag": "latest", - "environmentVariables": {}, + "actorSpecification": 1, + "name": "warehouse-watchdog", + "version": "0.0", + "buildTag": "latest", + "environmentVariables": {}, // highlight-next-line "input": "./inputSchema.json" } From 4b5d02766057afcf6d32e359a70df0100d7f6b4b Mon Sep 17 00:00:00 2001 From: Honza Javorek Date: Tue, 2 Sep 2025 10:47:58 +0200 Subject: [PATCH 3/6] style: literally put a dot there MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Michał Olender <92638966+TC-MO@users.noreply.github.com> --- .../webscraping/scraping_basics_javascript2/13_platform.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sources/academy/webscraping/scraping_basics_javascript2/13_platform.md b/sources/academy/webscraping/scraping_basics_javascript2/13_platform.md index e4405a47da..b86e3c18b2 100644 --- a/sources/academy/webscraping/scraping_basics_javascript2/13_platform.md +++ b/sources/academy/webscraping/scraping_basics_javascript2/13_platform.md @@ -76,7 +76,7 @@ The command creates an `.actor` directory with `actor.json` file inside. This fi :::tip Hidden dot files -On some systems, `.actor` might be hidden in the directory listing because it starts with a dot. Use your editor's built-in file explorer to locate it. +On some systems, `.actor` might be hidden in the directory listing because it starts with a `.`. Use your editor's built-in file explorer to locate it. ::: From e592f90d83cdb6d639131c3fec6080afabdfa067 Mon Sep 17 00:00:00 2001 From: Honza Javorek Date: Tue, 2 Sep 2025 10:50:45 +0200 Subject: [PATCH 4/6] fix: use different link to promote the idea of Actors --- .../webscraping/scraping_basics_javascript2/13_platform.md | 2 +- .../academy/webscraping/scraping_basics_python/13_platform.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/sources/academy/webscraping/scraping_basics_javascript2/13_platform.md b/sources/academy/webscraping/scraping_basics_javascript2/13_platform.md index b86e3c18b2..5de5af894a 100644 --- a/sources/academy/webscraping/scraping_basics_javascript2/13_platform.md +++ b/sources/academy/webscraping/scraping_basics_javascript2/13_platform.md @@ -61,7 +61,7 @@ Success: You are logged in to Apify as user1234! ## Turning our program to an Actor -Every program that runs on the Apify platform first needs to be packaged as a so-called [Actor](https://apify.com/actors)—a standardized container with designated places for input and output. +Every program that runs on the Apify platform first needs to be packaged as a so-called [Actor](https://docs.apify.com/platform/actors)—a standardized container with designated places for input and output. Many [Actor templates](https://apify.com/templates/categories/javascript) simplify the setup for new projects. We'll skip those, as we're about to package an existing program. diff --git a/sources/academy/webscraping/scraping_basics_python/13_platform.md b/sources/academy/webscraping/scraping_basics_python/13_platform.md index 8298dd4b82..50445ec03a 100644 --- a/sources/academy/webscraping/scraping_basics_python/13_platform.md +++ b/sources/academy/webscraping/scraping_basics_python/13_platform.md @@ -82,7 +82,7 @@ Inside the `warehouse-watchdog` directory, we should see a `src` subdirectory co The file contains a single asynchronous function, `main()`. 
At the beginning, it handles [input](https://docs.apify.com/platform/actors/running/input-and-output#input), then passes that input to a small crawler built on top of the Crawlee framework. -Every program that runs on the Apify platform first needs to be packaged as a so-called [Actor](https://apify.com/actors)—a standardized container with designated places for input and output. Crawlee scrapers automatically connect their default dataset to the Actor output, but input must be handled explicitly in the code. +Every program that runs on the Apify platform first needs to be packaged as a so-called [Actor](https://docs.apify.com/platform/actors)—a standardized container with designated places for input and output. Crawlee scrapers automatically connect their default dataset to the Actor output, but input must be handled explicitly in the code. ![The expected file structure](./images/actor-file-structure.webp) From 0c6b83e84395ea9702798f27c603b78273f49066 Mon Sep 17 00:00:00 2001 From: Honza Javorek Date: Wed, 3 Sep 2025 09:26:46 +0200 Subject: [PATCH 5/6] fix: improve the 'Hidden dot files' admonition --- .../webscraping/scraping_basics_javascript2/13_platform.md | 5 ++++- .../webscraping/scraping_basics_python/13_platform.md | 5 ++++- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/sources/academy/webscraping/scraping_basics_javascript2/13_platform.md b/sources/academy/webscraping/scraping_basics_javascript2/13_platform.md index 5de5af894a..26ace812d7 100644 --- a/sources/academy/webscraping/scraping_basics_javascript2/13_platform.md +++ b/sources/academy/webscraping/scraping_basics_javascript2/13_platform.md @@ -76,7 +76,10 @@ The command creates an `.actor` directory with `actor.json` file inside. This fi :::tip Hidden dot files -On some systems, `.actor` might be hidden in the directory listing because it starts with a `.`. Use your editor's built-in file explorer to locate it. +Files and folders that start with a dot (like `.actor`) may be hidden by default. To see them: + +1. In your operating system's file explorer, look for a setting like **Show hidden files**. +1. Many editors or IDEs can show hidden files as well. For example, the file explorer in **VS Code** shows them by default. ::: diff --git a/sources/academy/webscraping/scraping_basics_python/13_platform.md b/sources/academy/webscraping/scraping_basics_python/13_platform.md index 50445ec03a..7790853b5b 100644 --- a/sources/academy/webscraping/scraping_basics_python/13_platform.md +++ b/sources/academy/webscraping/scraping_basics_python/13_platform.md @@ -217,7 +217,10 @@ Inside `warehouse-watchdog`, there's a directory called `.actor`. Within it, we' :::tip Hidden dot files -On some systems, `.actor` might be hidden in the directory listing because it starts with a dot. Use your editor's built-in file explorer to locate it. +Files and folders that start with a dot (like `.actor`) may be hidden by default. To see them: + +1. In your operating system's file explorer, look for a setting like **Show hidden files**. +1. Many editors or IDEs can show hidden files as well. For example, the file explorer in **VS Code** shows them by default. 
::: From d80829aa3b72787417af9ad53fec3634ef58e414 Mon Sep 17 00:00:00 2001 From: Honza Javorek Date: Wed, 3 Sep 2025 09:56:04 +0200 Subject: [PATCH 6/6] fix: admonition markup --- .../webscraping/scraping_basics_javascript2/13_platform.md | 4 ++-- .../academy/webscraping/scraping_basics_python/13_platform.md | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/sources/academy/webscraping/scraping_basics_javascript2/13_platform.md b/sources/academy/webscraping/scraping_basics_javascript2/13_platform.md index 26ace812d7..8169ba2d9d 100644 --- a/sources/academy/webscraping/scraping_basics_javascript2/13_platform.md +++ b/sources/academy/webscraping/scraping_basics_javascript2/13_platform.md @@ -78,8 +78,8 @@ The command creates an `.actor` directory with `actor.json` file inside. This fi Files and folders that start with a dot (like `.actor`) may be hidden by default. To see them: -1. In your operating system's file explorer, look for a setting like **Show hidden files**. -1. Many editors or IDEs can show hidden files as well. For example, the file explorer in **VS Code** shows them by default. +- In your operating system's file explorer, look for a setting like **Show hidden files**. +- Many editors or IDEs can show hidden files as well. For example, the file explorer in VS Code shows them by default. ::: diff --git a/sources/academy/webscraping/scraping_basics_python/13_platform.md b/sources/academy/webscraping/scraping_basics_python/13_platform.md index 7790853b5b..23f042a048 100644 --- a/sources/academy/webscraping/scraping_basics_python/13_platform.md +++ b/sources/academy/webscraping/scraping_basics_python/13_platform.md @@ -219,8 +219,8 @@ Inside `warehouse-watchdog`, there's a directory called `.actor`. Within it, we' Files and folders that start with a dot (like `.actor`) may be hidden by default. To see them: -1. In your operating system's file explorer, look for a setting like **Show hidden files**. -1. Many editors or IDEs can show hidden files as well. For example, the file explorer in **VS Code** shows them by default. +- In your operating system's file explorer, look for a setting like **Show hidden files**. +- Many editors or IDEs can show hidden files as well. For example, the file explorer in VS Code shows them by default. :::