Commit e54ba89

fix(academy): typos, updates and clarifications (#1218)
- fix typos, mainly excess `**`
- update google accept cookies
- update google search element selection
- logical correction
2 parents ec5b323 + e17d550 commit e54ba89

File tree

10 files changed: +29 -27 lines changed

sources/academy/platform/expert_scraping_with_apify/actors_webhooks.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ slug: /expert-scraping-with-apify/actors-webhooks

# Webhooks & advanced Actor overview {#webhooks-and-advanced-actors}

-**Learn more advanced details about Actors, how they work, and the default configurations they can take. **Also**,** learn how** to integrate your Actor with webhooks.**
+**Learn more advanced details about Actors, how they work, and the default configurations they can take. Also, learn how to integrate your Actor with webhooks.**

---

sources/academy/platform/expert_scraping_with_apify/solutions/integrating_webhooks.md

Lines changed: 4 additions & 2 deletions
@@ -40,6 +40,8 @@ const dataset = await Actor.openDataset(datasetId);
// ...
```

+> Tip: You will need to use the `forceCloud` option - `Actor.openDataset(<name/id>, { forceCloud: true });` - to open a dataset from platform storage while running the Actor locally.
+
Next, we'll grab hold of the dataset's items with the `dataset.getData()` function:

```js

@@ -141,7 +143,7 @@ https://api.apify.com/v2/acts/USERNAME~filter-actor/runs?token=YOUR_TOKEN_HERE

Whichever one you choose is totally up to your preference.

-Next, within the Actor, we will click the **Integrations** tab and choose **Webhook**, then fill out the details to look like this:
+Next, within the Amazon scraping Actor, we will click the **Integrations** tab and choose **Webhook**, then fill out the details to look like this:

![Configuring a webhook](./images/adding-webhook.jpg)

@@ -163,7 +165,7 @@ Additionally, we should be able to see that our **filter-actor** was run, and ha

**Q: How do you allocate more CPU for an Actor's run?**

-**A:** On the platform, more memory can be allocated in the Actor's input configuration, and the default allocated CPU can be changed in the Actor's **Settings** tab. When running locally, you can use the **APIFY_MEMORY_MBYTES**** environment variable to set the allocated CPU. 4GB is equal to 1 CPU core on the Apify platform.
+**A:** On the platform, more memory can be allocated in the Actor's input configuration, and the default allocated CPU can be changed in the Actor's **Settings** tab. When running locally, you can use the **APIFY_MEMORY_MBYTES** environment variable to set the allocated CPU. 4GB is equal to 1 CPU core on the Apify platform.

**Q: Within itself, can you get the exact time that an Actor was started?**
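When the webhook fires, the filter Actor receives a POST payload describing the finished run. A minimal sketch of pulling the dataset ID out of that payload in plain JavaScript, assuming Apify's default webhook payload template (where `resource` describes the run); the IDs below are hypothetical placeholders:

```javascript
// Hypothetical example of the default webhook payload shape: the
// `resource` object describes the Actor run that triggered the webhook.
const examplePayload = {
    eventType: 'ACTOR.RUN.SUCCEEDED',
    resource: {
        id: 'hypothetical-run-id',
        defaultDatasetId: 'hypothetical-dataset-id',
    },
};

// The dataset holding the scraped offers is the run's default dataset.
function getDatasetId(payload) {
    return payload?.resource?.defaultDatasetId ?? null;
}

console.log(getDatasetId(examplePayload)); // → "hypothetical-dataset-id"
```

The extracted ID is what you would then pass to `Actor.openDataset(datasetId)` as shown in the lesson.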

sources/academy/platform/expert_scraping_with_apify/solutions/rotating_proxies.md

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ const crawler = new CheerioCrawler({
});
```

-Now, we'll use the **maxUsageCount** key to force each session to be thrown away after 5 uses and **maxErrorScore**** to trash a session once it receives an error.
+Now, we'll use the **maxUsageCount** key to force each session to be thrown away after 5 uses and **maxErrorScore** to trash a session once it receives an error.

```js
const crawler = new CheerioCrawler({
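The two retirement rules can be illustrated with a toy model in plain JavaScript. This is not Crawlee's internal implementation, just a sketch of the logic the two options express: a session is discarded after `maxUsageCount` uses, or once its error score reaches `maxErrorScore`:

```javascript
// Toy model (NOT Crawlee's implementation) of session retirement:
// usable only while under both the usage cap and the error-score cap.
function isSessionUsable(session, { maxUsageCount = 5, maxErrorScore = 1 } = {}) {
    return session.usageCount < maxUsageCount && session.errorScore < maxErrorScore;
}

const fresh = { usageCount: 0, errorScore: 0 };
const wornOut = { usageCount: 5, errorScore: 0 }; // used 5 times
const errored = { usageCount: 1, errorScore: 1 }; // received an error

console.log(isSessionUsable(fresh)); // true
console.log(isSessionUsable(wornOut)); // false
console.log(isSessionUsable(errored)); // false
```

In the real crawler, retiring a session also means its bound proxy stops being used, which is what rotates the crawler onto a new IP.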

sources/academy/platform/expert_scraping_with_apify/solutions/saving_stats.md

Lines changed: 2 additions & 2 deletions
@@ -63,7 +63,7 @@ await Stats.initialize();

## Tracking errors {#tracking-errors}

-In order to keep track of errors, we must write a new function within the crawler's configuration called **failedRequestHandler**. Passed into this function is an object containing an **Error** object for the error which occurred and the **Request** object, as well as information about the session and proxy which were used for the request.
+In order to keep track of errors, we must write a new function within the crawler's configuration called **errorHandler**. Passed into this function is an object containing an **Error** object for the error which occurred and the **Request** object, as well as information about the session and proxy which were used for the request.

```js
const crawler = new CheerioCrawler({

@@ -79,7 +79,7 @@ const crawler = new CheerioCrawler({
    maxConcurrency: 50,
    requestHandler: router,
    // Handle all failed requests
-    failedRequestHandler: async ({ error, request }) => {
+    errorHandler: async ({ error, request }) => {
        // Add an error for this url to our error tracker
        Stats.addError(request.url, error?.message);
    },

sources/academy/platform/expert_scraping_with_apify/solutions/using_storage_creating_tasks.md

Lines changed: 1 addition & 1 deletion
@@ -67,7 +67,7 @@ That's it! Now, our Actor will push its data to a dataset named **amazon-offers-

We now want to store the cheapest item in the default key-value store under a key named **CHEAPEST-ITEM**. The most efficient and practical way of doing this is by filtering through all of the newly named dataset's items and pushing the cheapest one to the store.

-Let's add the following code to the bottom of the Actor after **Crawl** finished** is logged to the console:
+Let's add the following code to the bottom of the Actor after **Crawl finished** is logged to the console:

```js
// ...
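The filtering step itself reduces to a single pass over the dataset's items. A sketch of that logic in plain JavaScript, assuming each item carries a numeric `price` field (the field name and sample data are hypothetical):

```javascript
// Scan all items and keep the one with the lowest price.
// Assumes a numeric `price` field on every item.
function findCheapest(items) {
    return items.reduce((cheapest, item) => (item.price < cheapest.price ? item : cheapest));
}

const items = [
    { title: 'Offer A', price: 19.99 },
    { title: 'Offer B', price: 9.99 },
    { title: 'Offer C', price: 14.5 },
];

console.log(findCheapest(items).title); // → "Offer B"
```

In the Actor, the result of this pass is what gets written to the key-value store under **CHEAPEST-ITEM** via `Actor.setValue()`.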

sources/academy/platform/expert_scraping_with_apify/tasks_and_storage.md

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ Once again, we'll be adding onto our main Amazon-scraping Actor in this activity

We have decided that we want to retain the data scraped by the Actor for a long period of time, so instead of pushing to the default dataset, we will be pushing to a named dataset. Additionally, we want to save the absolute cheapest item found by the scraper into the default key-value store under a key named **CHEAPEST-ITEM**.

-Finally, we'll create a task for the Actor that saves the configuration with the **keyword** set to **google pixel****.
+Finally, we'll create a task for the Actor that saves the configuration with the **keyword** set to **google pixel**.

[**Solution**](./solutions/using_storage_creating_tasks.md)

sources/academy/webscraping/puppeteer_playwright/page/interacting_with_a_page.md

Lines changed: 11 additions & 11 deletions
@@ -36,15 +36,15 @@ Let's first focus on the first 3 steps listed above. By using `page.click()` and
<TabItem value="Playwright" label="Playwright">

```js
-// Click the "I agree" button
+// Click the "Accept all" button
await page.click('button:has-text("Accept all")');
```

</TabItem>
<TabItem value="Puppeteer" label="Puppeteer">

```js
-// Click the "I agree" button
+// Click the "Accept all" button
await page.click('button + button');
```

@@ -53,15 +53,15 @@ await page.click('button + button');

With `page.click()`, Puppeteer and Playwright actually drag the mouse and click, allowing the bot to act more human-like. This is different from programmatically clicking with `Element.click()` in vanilla client-side JavaScript.

-Notice that in the Playwright example, we are using a different selector than in the Puppeteer example. This is because Playwright supports [many custom CSS selectors](https://playwright.dev/docs/other-locators#css-elements-matching-one-of-the-conditions), such as the **has-text** pseudo class. As a rule of thumb, using text selectors is much more preferable to using regular selectors, as they are much less likely to break. If Google makes the sibling above the **I agree** button a `<div>` element instead of a `<button>` element, our `button + button` selector will break. However, the button will always have the text **I agree**; therefore, `button:has-text("I agree")` is more reliable.
+Notice that in the Playwright example, we are using a different selector than in the Puppeteer example. This is because Playwright supports [many custom CSS selectors](https://playwright.dev/docs/other-locators#css-elements-matching-one-of-the-conditions), such as the **has-text** pseudo class. As a rule of thumb, using text selectors is far preferable to using regular selectors, as they are much less likely to break. If Google makes the sibling above the **Accept all** button a `<div>` element instead of a `<button>` element, our `button + button` selector will break. However, the button will always have the text **Accept all**; therefore, `button:has-text("Accept all")` is more reliable.

> If you're not already familiar with CSS selectors and how to find them, we recommend referring to [this lesson](../../scraping_basics_javascript/data_extraction/using_devtools.md) in the **Web scraping for beginners** course.

-Then, we can type some text into an input field with `page.type()`; passing a CSS selector as the first, and the string to input as the second parameter:
+Then, we can type some text into a `<textarea>` input field with `page.type()`, passing a CSS selector as the first parameter and the string to input as the second:

```js
// Type the query into the search box
-await page.type('input[title="Search"]', 'hello world');
+await page.type('textarea[title]', 'hello world');
```

Finally, we can press a single key by accessing the `keyboard` property of `page` and calling the `press()` function on it:

@@ -85,11 +85,11 @@ const page = await browser.newPage();

await page.goto('https://www.google.com/');

-// Click the "I agree" button
+// Click the "Accept all" button
await page.click('button:has-text("Accept all")');

// Type the query into the search box
-await page.type('textarea[title="Search"]', 'hello world');
+await page.type('textarea[title]', 'hello world');

// Press enter
await page.keyboard.press('Enter');

@@ -110,11 +110,11 @@ const page = await browser.newPage();

await page.goto('https://www.google.com/');

-// Click the "I agree" button
+// Click the "Accept all" button
await page.click('button + button');

// Type the query into the search box
-await page.type('textarea[title="Search"]', 'hello world');
+await page.type('textarea[title]', 'hello world');

// Press enter
await page.keyboard.press('Enter');

@@ -146,7 +146,7 @@ await page.goto('https://www.google.com/');

await page.click('button:has-text("Accept all")');

-await page.type('textarea[title="Search"]', 'hello world');
+await page.type('textarea[title]', 'hello world');

await page.keyboard.press('Enter');

@@ -172,7 +172,7 @@ await page.goto('https://www.google.com/');

await page.click('button + button');

-await page.type('textarea[title="Search"]', 'hello world');
+await page.type('textarea[title]', 'hello world');

await page.keyboard.press('Enter');

sources/academy/webscraping/puppeteer_playwright/page/page_methods.md

Lines changed: 3 additions & 3 deletions
@@ -63,10 +63,10 @@ const page = await browser.newPage();
await page.goto('https://google.com');

// Agree to the cookies policy
-await page.click('button:has-text("I agree")');
+await page.click('button:has-text("Accept all")');

// Type the query and visit the results page
-await page.type('input[title="Search"]', 'hello world');
+await page.type('textarea[title]', 'hello world');
await page.keyboard.press('Enter');

// Click on the first result

@@ -99,7 +99,7 @@ await page.goto('https://google.com');
await page.click('button + button');

// Type the query and visit the results page
-await page.type('input[title="Search"]', 'hello world');
+await page.type('textarea[title]', 'hello world');
await page.keyboard.press('Enter');

// Wait for the first result to appear on the page,

sources/academy/webscraping/puppeteer_playwright/page/waiting.md

Lines changed: 4 additions & 4 deletions
@@ -39,7 +39,7 @@ await page.goto('https://www.google.com/');

await page.click('button + button');

-await page.type('input[title="Search"]', 'hello world');
+await page.type('textarea[title]', 'hello world');
await page.keyboard.press('Enter');

// Wait for the element to be present on the page prior to clicking it

@@ -104,10 +104,10 @@ const page = await browser.newPage();
await page.goto('https://google.com');

// Agree to the cookies policy
-await page.click('button:has-text("I agree")');
+await page.click('button:has-text("Accept all")');

// Type the query and visit the results page
-await page.type('input[title="Search"]', 'hello world');
+await page.type('textarea[title]', 'hello world');
await page.keyboard.press('Enter');

// Click on the first result

@@ -139,7 +139,7 @@ await page.goto('https://google.com');
await page.click('button + button');

// Type the query and visit the results page
-await page.type('input[title="Search"]', 'hello world');
+await page.type('textarea[title]', 'hello world');
await page.keyboard.press('Enter');

// Wait for the first result to appear on the page,

sources/academy/webscraping/typescript/mini_project.md

Lines changed: 1 addition & 1 deletion
@@ -366,7 +366,7 @@ async function scrape(input: UserInput) {
}
```

-Now, we can access `result[0].images` on the return value of `scrape` if **removeImages** was false without any compiler errors being thrown. But, if we switch **removeImages** to false, TypeScript will yell at us.
+Now, we can access `result[0].images` on the return value of `scrape` if **removeImages** was false without any compiler errors being thrown. But, if we switch **removeImages** to true, TypeScript will yell at us.

![No more error](./images/no-more-error.png)
