Merged
@@ -23,20 +23,19 @@ const response = await fetch(url);
if (response.ok) {
const html = await response.text();
const $ = cheerio.load(html);
-// highlight-next-line
-$(".product-item").each((i, element) => {
-// highlight-next-line
+// highlight-start
+for (const element of $(".product-item").toArray()) {
console.log($(element).text());
-// highlight-next-line
-});
+}
+// highlight-end
} else {
throw new Error(`HTTP ${response.status}`);
}
```

-We're using [`each()`](https://cheerio.js.org/docs/api/classes/Cheerio#each) to loop over the items in the Cheerio container. It calls the given function for each of the elements, with two arguments. The first is an index (0, 1, 2…), and the second is the element being processed.
+Calling [`toArray()`](https://cheerio.js.org/docs/api/classes/Cheerio#toarray) converts the Cheerio selection to a standard JavaScript array. We can then loop over that array and process each selected element.
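
The old `.each()` callback also received each element's index as its first argument. If a loop still needs that position, one option is `Array.prototype.entries()` on the array returned by `toArray()`. The sketch below is a plain-JavaScript illustration, not code from the lesson: the `elements` array of simple objects is a hypothetical stand-in for real Cheerio elements, so it runs without Cheerio installed.

```javascript
// Hypothetical stand-in for $(".product-item").toArray(): a plain array.
// Real Cheerio elements would be wrapped with $(element) before calling .text().
const elements = [
  { text: "Product A" },
  { text: "Product B" },
  { text: "Product C" },
];

// A plain for...of loop visits each element in order; no index is involved.
const titles = [];
for (const element of elements) {
  titles.push(element.text);
}

// When the position is still needed, entries() yields [index, element] pairs,
// similar to the (i, element) arguments that .each() used to pass.
const numbered = [];
for (const [i, element] of elements.entries()) {
  numbered.push(`${i}: ${element.text}`);
}

console.log(titles);   // [ 'Product A', 'Product B', 'Product C' ]
console.log(numbered); // [ '0: Product A', '1: Product B', '2: Product C' ]
```

With real Cheerio elements, the loop body would wrap each `element` with `$()` before calling `.text()`, as the lesson's code does.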

-Cheerio requires us to wrap the element with `$()` again before we can work with it further, and then we call `.text()`. If we run the code, it… well, it definitely prints _something_…
+Cheerio requires us to wrap each element with `$()` again before we can work with it further, and then we call `.text()`. If we run the code, it… well, it definitely prints _something_…

```text
$ node index.js
@@ -79,7 +78,7 @@ if (response.ok) {
const html = await response.text();
const $ = cheerio.load(html);

-$(".product-item").each((i, element) => {
+for (const element of $(".product-item").toArray()) {
const $productItem = $(element);

const $title = $productItem.find(".product-item__title");
@@ -89,7 +88,7 @@ if (response.ok) {
const price = $price.text();

console.log(`${title} | ${price}`);
-});
+}
} else {
throw new Error(`HTTP ${response.status}`);
}
@@ -175,7 +174,7 @@ if (response.ok) {
const html = await response.text();
const $ = cheerio.load(html);

-$(".product-item").each((i, element) => {
+for (const element of $(".product-item").toArray()) {
const $productItem = $(element);

const $title = $productItem.find(".product-item__title");
@@ -186,7 +185,7 @@ if (response.ok) {
const price = $price.text();

console.log(`${title} | ${price}`);
-});
+}
} else {
throw new Error(`HTTP ${response.status}`);
}
@@ -248,11 +247,11 @@ Djibouti
const html = await response.text();
const $ = cheerio.load(html);

-$(".wikitable").each((i, tableElement) => {
+for (const tableElement of $(".wikitable").toArray()) {
const $table = $(tableElement);
const $rows = $table.find("tr");

-$rows.each((j, rowElement) => {
+for (const rowElement of $rows.toArray()) {
const $row = $(rowElement);
const $cells = $row.find("td");

@@ -261,12 +260,11 @@ Djibouti
const $link = $thirdColumn.find("a").first();
console.log($link.text());
}
-});
-});
+}
+}
} else {
throw new Error(`HTTP ${response.status}`);
}
-
```

Because some rows contain [table headers](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/th), we skip processing a row if `$row.find("td")` doesn't find any [table data](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/td) cells.
@@ -293,11 +291,11 @@ Simplify the code from previous exercise. Use a single for loop and a single CSS
const html = await response.text();
const $ = cheerio.load(html);

-$(".wikitable tr td:nth-child(3)").each((i, element) => {
+for (const element of $(".wikitable tr td:nth-child(3)").toArray()) {
const $nameCell = $(element);
const $link = $nameCell.find("a").first();
console.log($link.text());
-});
+}
} else {
throw new Error(`HTTP ${response.status}`);
}
@@ -335,9 +333,9 @@ Max Verstappen wins Canadian Grand Prix: F1 – as it happened
const html = await response.text();
const $ = cheerio.load(html);

-$("#maincontent ul li h3").each((i, element) => {
+for (const element of $("#maincontent ul li h3").toArray()) {
console.log($(element).text());
-});
+}
} else {
throw new Error(`HTTP ${response.status}`);
}
@@ -70,7 +70,7 @@ if (response.ok) {
const html = await response.text();
const $ = cheerio.load(html);

-$(".product-item").each((i, element) => {
+for (const element of $(".product-item").toArray()) {
const $productItem = $(element);

const $title = $productItem.find(".product-item__title");
@@ -87,7 +87,7 @@ if (response.ok) {
}

console.log(`${title} | ${priceRange.minPrice} | ${priceRange.price}`);
-});
+}
} else {
throw new Error(`HTTP ${response.status}`);
}
@@ -177,7 +177,7 @@ if (response.ok) {
const html = await response.text();
const $ = cheerio.load(html);

-$(".product-item").each((i, element) => {
+for (const element of $(".product-item").toArray()) {
const $productItem = $(element);

const $title = $productItem.find(".product-item__title");
@@ -200,7 +200,7 @@ if (response.ok) {
}

console.log(`${title} | ${priceRange.minPrice} | ${priceRange.price}`);
-});
+}
} else {
throw new Error(`HTTP ${response.status}`);
}
@@ -258,7 +258,7 @@ Denon AH-C720 In-Ear Headphones | 236
const html = await response.text();
const $ = cheerio.load(html);

-$(".product-item").each((i, element) => {
+for (const element of $(".product-item").toArray()) {
const $productItem = $(element);

const title = $productItem.find(".product-item__title");
@@ -268,7 +268,7 @@ Denon AH-C720 In-Ear Headphones | 236
const unitsCount = parseUnitsText(unitsText);

console.log(`${title} | ${unitsCount}`);
-});
+}
} else {
throw new Error(`HTTP ${response.status}`);
}
@@ -307,7 +307,7 @@ Simplify the code from previous exercise. Use [regular expressions](https://deve
const html = await response.text();
const $ = cheerio.load(html);

-$(".product-item").each((i, element) => {
+for (const element of $(".product-item").toArray()) {
const $productItem = $(element);

const $title = $productItem.find(".product-item__title");
@@ -317,7 +317,7 @@ Simplify the code from previous exercise. Use [regular expressions](https://deve
const unitsCount = parseUnitsText(unitsText);

console.log(`${title} | ${unitsCount}`);
-});
+}
} else {
throw new Error(`HTTP ${response.status}`);
}
@@ -369,7 +369,7 @@ Hints:
const html = await response.text();
const $ = cheerio.load(html);

-$("#maincontent ul li").each((i, element) => {
+for (const element of $("#maincontent ul li").toArray()) {
const $article = $(element);

const title = $article
@@ -383,7 +383,7 @@ Hints:
const date = new Date(dateText);

console.log(`${title} | ${date.toDateString()}`);
-});
+}
} else {
throw new Error(`HTTP ${response.status}`);
}
@@ -38,7 +38,7 @@ if (response.ok) {
const $ = cheerio.load(html);

// highlight-next-line
-const $items = $(".product-item").map((i, element) => {
+const data = $(".product-item").toArray().map(element => {
const $productItem = $(element);

const $title = $productItem.find(".product-item__title");
@@ -64,15 +64,13 @@ if (response.ok) {
return { title, ...priceRange };
});
// highlight-next-line
-const data = $items.get();
-// highlight-next-line
console.log(data);
} else {
throw new Error(`HTTP ${response.status}`);
}
```

-Instead of printing each line, we now return the data for each product as a JavaScript object. We've replaced `.each()` with [`.map()`](https://cheerio.js.org/docs/api/classes/Cheerio#map-3), which also iterates over the selection but, in addition, collects all the results and returns them as a Cheerio collection. We then convert it into a standard JavaScript array by calling [`.get()`](https://cheerio.js.org/docs/api/classes/Cheerio#call-signature-32). Near the end of the program, we print the entire array.
+Instead of printing each line, we now return the data for each product as a JavaScript object. We've replaced the `for` loop with [`.map()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/map), which also iterates over the array but, in addition, collects the value returned for each element into a new array. Near the end of the program, we print this entire array.
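
As a minimal sketch of that pattern outside Cheerio, the snippet below maps a hypothetical `elements` array to plain objects, using a hypothetical `parsePriceRange()` helper in place of the lesson's own price parsing. It shows that `Array.prototype.map()` returns an ordinary array, so no extra conversion step is needed afterwards.

```javascript
// Hypothetical stand-ins for the product elements returned by toArray().
const elements = [
  { title: "Product A", priceText: "$10.00" },
  { title: "Product B", priceText: "$4.50" },
];

// Hypothetical helper: keeps only digits and the decimal point, then converts.
function parsePriceRange(priceText) {
  const price = Number(priceText.replace(/[^0-9.]/g, ""));
  return { minPrice: price, price };
}

// map() calls the callback once per element and returns a new array
// containing each return value, in the same order as the input.
const data = elements.map(element => {
  const title = element.title;
  const priceRange = parsePriceRange(element.priceText);
  return { title, ...priceRange };
});

console.log(Array.isArray(data)); // true
console.log(data);
```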

:::tip Advanced syntax

@@ -43,7 +43,7 @@ if (response.ok) {
const html = await response.text();
const $ = cheerio.load(html);

-const $items = $(".product-item").map((i, element) => {
+const data = $(".product-item").toArray().map(element => {
const $productItem = $(element);

const $title = $productItem.find(".product-item__title");
@@ -67,7 +67,6 @@ if (response.ok) {

return { title, ...priceRange };
});
-const data = $items.get();

const jsonData = JSON.stringify(data);
await writeFile('products.json', jsonData);
@@ -190,12 +189,11 @@ async function exportCSV(data) {
const listingURL = "https://warehouse-theme-metal.myshopify.com/collections/sales"
const $ = await download(listingURL);

-const $items = $(".product-item").map((i, element) => {
+const data = $(".product-item").toArray().map(element => {
const $productItem = $(element);
const item = parseProduct($productItem);
return item;
});
-const data = $items.get();

await writeFile('products.json', exportJSON(data));
await writeFile('products.csv', await exportCSV(data));
@@ -286,13 +284,12 @@ Now we'll pass the base URL to the function in the main body of our program:
const listingURL = "https://warehouse-theme-metal.myshopify.com/collections/sales"
const $ = await download(listingURL);

-const $items = $(".product-item").map((i, element) => {
+const data = $(".product-item").toArray().map(element => {
const $productItem = $(element);
// highlight-next-line
const item = parseProduct($productItem, listingURL);
return item;
});
-const data = $items.get();
```

When we run the scraper now, we should see full URLs in our exports:
@@ -353,12 +350,12 @@ https://en.wikipedia.org/wiki/Botswana
const html = await response.text();
const $ = cheerio.load(html);

-$(".wikitable tr td:nth-child(3)").each((i, element) => {
+for (const element of $(".wikitable tr td:nth-child(3)").toArray()) {
const nameCell = $(element);
const link = nameCell.find("a").first();
const url = new URL(link.attr("href"), listingURL).href;
console.log(url);
-});
+}
} else {
throw new Error(`HTTP ${response.status}`);
}
@@ -397,11 +394,11 @@ https://www.theguardian.com/sport/article/2024/sep/02/max-verstappen-damns-his-u
const html = await response.text();
const $ = cheerio.load(html);

-$("#maincontent ul li").each((i, element) => {
+for (const element of $("#maincontent ul li").toArray()) {
const link = $(element).find("a").first();
const url = new URL(link.attr("href"), listingURL).href;
console.log(url);
-});
+}
} else {
throw new Error(`HTTP ${response.status}`);
}
@@ -67,13 +67,12 @@ async function exportCSV(data) {
const listingURL = "https://warehouse-theme-metal.myshopify.com/collections/sales"
const $ = await download(listingURL);

-const $items = $(".product-item").map((i, element) => {
+const data = $(".product-item").toArray().map(element => {
const $productItem = $(element);
// highlight-next-line
const item = parseProduct($productItem, listingURL);
return item;
});
-const data = $items.get();

await writeFile('products.json', exportJSON(data));
await writeFile('products.csv', await exportCSV(data));
@@ -131,20 +130,20 @@ But where do we put this line in our program?

In the `.map()` loop, we're already going through all the products. Let's expand it to include downloading the product detail page, parsing it, extracting the vendor's name, and adding it to the item object.

-First, we need to make the loop asynchronous so that we can use `await download()` for each product. We'll add the `async` keyword to the inner function and rename the collection to `$promises`, since it will now store promises that resolve to items rather than the items themselves. We'll still convert the collection to a standard JavaScript array, but this time we'll pass it to `await Promise.all()` to resolve all the promises and retrieve the actual items.
+First, we need to make the loop asynchronous so that we can use `await download()` for each product. We'll add the `async` keyword to the inner function and rename the variable to `promises`, since `.map()` will now return an array of promises that resolve to items rather than the items themselves. We'll pass that array to `await Promise.all()` to resolve all the promises and retrieve the actual items.
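
To see why this works, here is a plain-JavaScript sketch that replaces the lesson's `download()` with a hypothetical `fakeDownload()` built on `setTimeout()`, and uses made-up `example.com` URLs. Calling an async function returns a promise right away, so `.map()` yields an array of pending promises; `Promise.all()` then resolves to their results in the original order, regardless of which one finishes first.

```javascript
// Hypothetical stand-in for download(): resolves with fake HTML after delayMs.
function fakeDownload(url, delayMs) {
  return new Promise(resolve => {
    setTimeout(() => resolve(`<html>${url}</html>`), delayMs);
  });
}

// Hypothetical product URLs; the first one deliberately takes longer.
const productURLs = [
  { url: "https://example.com/products/a", delayMs: 30 },
  { url: "https://example.com/products/b", delayMs: 10 },
];

// Each call to the async callback returns a promise immediately, so map()
// produces an array of pending promises and every fake download starts now.
const promises = productURLs.map(async ({ url, delayMs }) => {
  const html = await fakeDownload(url, delayMs);
  return { url, size: html.length };
});

// Promise.all() waits for every promise and keeps results in input order,
// even though the second download finishes first.
const dataPromise = Promise.all(promises);
dataPromise.then(data => console.log(data));
```

One consequence of this design is that all the downloads begin at roughly the same time instead of one after another.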

```js
const listingURL = "https://warehouse-theme-metal.myshopify.com/collections/sales"
const $ = await download(listingURL);

// highlight-next-line
-const $promises = $(".product-item").map(async (i, element) => {
+const promises = $(".product-item").toArray().map(async element => {
const $productItem = $(element);
const item = parseProduct($productItem, listingURL);
return item;
});
// highlight-next-line
-const data = await Promise.all($promises.get());
+const data = await Promise.all(promises);
```

The program behaves the same as before, but now the code is prepared to make HTTP requests from within the inner function. Let's do it:
@@ -153,7 +152,7 @@ The program behaves the same as before, but now the code is prepared to make HTT
const listingURL = "https://warehouse-theme-metal.myshopify.com/collections/sales"
const $ = await download(listingURL);

-const $promises = $(".product-item").map(async (i, element) => {
+const promises = $(".product-item").toArray().map(async element => {
const $productItem = $(element);
const item = parseProduct($productItem, listingURL);

@@ -248,7 +247,8 @@ Hint: Locating cells in tables is sometimes easier if you know how to [filter](h
const listingURL = "https://en.wikipedia.org/wiki/List_of_sovereign_states_and_dependent_territories_in_Africa";
const $ = await download(listingURL);

-const $promises = $(".wikitable tr td:nth-child(3)").map(async (i, element) => {
+const $cells = $(".wikitable tr td:nth-child(3)");
+const promises = $cells.toArray().map(async element => {
const $nameCell = $(element);
const $link = $nameCell.find("a").first();
const countryURL = new URL($link.attr("href"), listingURL).href;
@@ -266,7 +266,7 @@ Hint: Locating cells in tables is sometimes easier if you know how to [filter](h

console.log(`${countryURL} ${callingCode || null}`);
});
-await Promise.all($promises.get());
+await Promise.all(promises);
```

</details>
@@ -314,7 +314,7 @@ Hints:
const listingURL = "https://www.theguardian.com/sport/formulaone";
const $ = await download(listingURL);

-const $promises = $("#maincontent ul li").map(async (i, element) => {
+const promises = $("#maincontent ul li").toArray().map(async element => {
const $item = $(element);
const $link = $item.find("a").first();
const authorURL = new URL($link.attr("href"), listingURL).href;
@@ -327,7 +327,7 @@ Hints:

console.log(`${author || address || null}: ${title}`);
});
-await Promise.all($promises.get());
+await Promise.all(promises);
```

</details>