Open
26 commits
fb9e3bc
wip
honzajavorek Nov 18, 2025
819462b
feat: keep exercises as separate files, include them to Markdown
honzajavorek Nov 21, 2025
10a0743
chore: implement testing of JavaScript exercises
honzajavorek Nov 24, 2025
6d50689
refactor: use shorter names
honzajavorek Nov 24, 2025
401cc92
chore: add GitHub Action to run tests automatically
honzajavorek Nov 24, 2025
e296be1
chore: ouch, wrong branch
honzajavorek Nov 24, 2025
e673e9f
chore: one does not simply npm install
honzajavorek Nov 24, 2025
846602c
style: make linter happy
honzajavorek Nov 24, 2025
a674604
chore: make sure there is no schedule until we merge this, add explan…
honzajavorek Nov 24, 2025
a72090d
refactor: simplify the tests
honzajavorek Nov 24, 2025
db19ee5
docs: document lychee and academy testing
honzajavorek Nov 24, 2025
fc05b0d
refactor: make exercises testable
honzajavorek Nov 25, 2025
6936de9
fix: avoid the yes option, fix crawlee installation, improve readabil…
honzajavorek Nov 25, 2025
cd5deaa
chore: make the tests more meaningful
honzajavorek Nov 25, 2025
857d3a3
chore: improve the JS test suite
honzajavorek Nov 25, 2025
9901553
style: make the code linter happy
honzajavorek Nov 25, 2025
c056906
style: condense and fix the solutions markup
honzajavorek Nov 25, 2025
ddb9a1b
chore: setup and teardown for Python
honzajavorek Nov 25, 2025
87b813b
chore: fix the JS test suite not to rely on npx --package
honzajavorek Nov 25, 2025
cdc8023
fix: improve the Python test suite and fix solutions using Crawlee (s…
honzajavorek Nov 25, 2025
d006397
style: fix markup
honzajavorek Nov 25, 2025
dfe9b28
chore: fix typo
honzajavorek Nov 25, 2025
b324b20
chore: enable only as a cron
honzajavorek Nov 25, 2025
d4e0643
chore: run monthly
honzajavorek Nov 25, 2025
118a550
style: fix markup
honzajavorek Nov 25, 2025
8c875d1
fix: address bugbot comments
honzajavorek Nov 25, 2025
31 changes: 31 additions & 0 deletions .github/workflows/test-academy.yml
@@ -0,0 +1,31 @@
name: Test Academy

on:
  schedule:
    - cron: "0 3 1 * *" # at 3am UTC on 1st day of month
  workflow_dispatch: # allows running this workflow manually from the Actions tab

jobs:
  test-exercises:
    name: Test Academy Exercises
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Source code
        uses: actions/checkout@v6

      - name: Setup Node.js
        uses: actions/setup-node@v6
        with:
          cache: npm
          cache-dependency-path: package-lock.json

      - name: Setup Python
        uses: astral-sh/setup-uv@v7

      - name: Install Bats
        run: |
          corepack enable
          npm install --only=dev

      - name: Test
        run: npm run test:academy
4 changes: 4 additions & 0 deletions .gitignore
@@ -28,3 +28,7 @@ codegen/*/generated/
codegen/*/go.sum
.github/styles/Microsoft
.github/styles/write-good
sources/academy/**/exercises/storage
sources/academy/**/exercises/node_modules
sources/academy/**/exercises/package*.json
sources/academy/**/exercises/dataset.json
6 changes: 6 additions & 0 deletions CONTRIBUTING.md
@@ -335,6 +335,12 @@ Add languages by adding new folders at the appropriate path level.
- Run `vale sync` to download styles
- Configure exceptions in `accepts.txt`

### Testing
Collaborator Author


@TC-MO Can you take a look at this README change, please? Does it make sense this way?


- **Broken links**: A [periodic GitHub Action](.github/workflows/lychee.yml) checks for broken links using [lychee](https://lychee.cli.rs/). If the Action fails, we fix the issues manually.

- **Academy exercises**: At the end of each lesson in the academy courses, there are exercises that target real-world websites. Each exercise includes a solution, stored as a separate file containing executable code. These files are included in the docs using the `!!raw-loader` syntax. Each course has a [Bats](https://bats-core.readthedocs.io/) test file named `test.bats`. The tests run each solution as a standalone program and verify that it produces output matching the expected results. A [periodic GitHub Action](.github/workflows/test-academy.yml) runs all these tests using `npm run test:academy`. If the Action fails, we rework the exercises.
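
As a rough illustration of the check described in the Academy exercises item, the sketch below runs a solution file as a standalone Node.js program and compares its output against expected lines. It is written in JavaScript for illustration only: the tests in this PR are Bats files (`test.bats`), and the helper names `runSolution` and `outputContainsAll` are hypothetical, not part of the repository.

```javascript
// Hypothetical helper: run a solution file as a standalone Node.js program
// and return whatever it printed to standard output.
async function runSolution(path) {
  // Dynamic import keeps the sketch usable from both ES modules and CommonJS.
  const { execFileSync } = await import('node:child_process');
  // execFileSync throws if the program exits with a non-zero status.
  return execFileSync(process.execPath, [path], { encoding: 'utf8' });
}

// Hypothetical helper: check that every expected line appears in the output.
function outputContainsAll(output, expectedLines) {
  const lines = output.split('\n').map((line) => line.trim());
  return expectedLines.every((expected) => lines.includes(expected));
}
```

A test could then assert `outputContainsAll(await runSolution(path), expectedLines)` for each solution file, where `path` and `expectedLines` are supplied per exercise.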

## Pull request process

1. Follow [Conventional Commits](https://www.conventionalcommits.org/)
11 changes: 11 additions & 0 deletions package-lock.json

Some generated files are not rendered by default.

4 changes: 3 additions & 1 deletion package.json
@@ -40,6 +40,7 @@
"lint:md:fix": "markdownlint '**/*.md' --fix",
"lint:code": "eslint .",
"lint:code:fix": "eslint . --fix",
"test:academy": "bats --print-output-on-failure -r .",
"postinstall": "patch-package",
"postbuild": "node ./scripts/joinLlmsFiles.mjs && node ./scripts/indentLlmsFile.mjs"
},
@@ -48,6 +49,7 @@
"@apify/tsconfig": "^0.1.0",
"@types/react": "^19.0.0",
"babel-plugin-styled-components": "^2.1.4",
"bats": "^1.13.0",
"cross-env": "^10.0.0",
"eslint": "^9.32.0",
"eslint-plugin-react": "^7.37.5",
@@ -61,8 +63,8 @@
"typescript-eslint": "^8.38.0"
},
"dependencies": {
"@apify/ui-library": "^1.97.2",
"@apify/ui-icons": "^1.19.0",
"@apify/ui-library": "^1.97.2",
Collaborator Author

@honzajavorek honzajavorek Nov 25, 2025


Not my change, npm re-ordered this on its own 👀

"@docusaurus/core": "^3.8.1",
"@docusaurus/faster": "^3.8.1",
"@docusaurus/plugin-client-redirects": "^3.8.1",
@@ -5,8 +5,10 @@ description: Lesson about building a Node.js application for watching prices. Us
slug: /scraping-basics-javascript/downloading-html
---

import CodeBlock from '@theme/CodeBlock';
import LegacyJsCourseAdmonition from '@site/src/components/LegacyJsCourseAdmonition';
import Exercises from '../scraping_basics/_exercises.mdx';
import LegoExercise from '!!raw-loader!roa-loader!./exercises/lego.mjs';

<LegacyJsCourseAdmonition />

@@ -184,28 +186,17 @@ Letting our program visibly crash on error is enough for our purposes. Now, let'

<Exercises />

### Scrape AliExpress
### Scrape LEGO
Collaborator Author


This is the only one I fixed. After this one, I decided that fixing the exercises should happen in separate PRs, not in this one: #2113


Download HTML of a product listing page, but this time from a real world e-commerce website. For example this page with AliExpress search results:
Download HTML of a product listing page, but this time from a real world e-commerce website. For example this page with LEGO search results:

```text
https://www.aliexpress.com/w/wholesale-darth-vader.html
https://www.lego.com/en-us/themes/star-wars
```

<details>
<summary>Solution</summary>

```js
const url = "https://www.aliexpress.com/w/wholesale-darth-vader.html";
const response = await fetch(url);

if (response.ok) {
  console.log(await response.text());
} else {
  throw new Error(`HTTP ${response.status}`);
}
```

<CodeBlock language="js">{LegoExercise.code}</CodeBlock>
</details>
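
The solution now lives in `exercises/lego.mjs`, which this diff does not display. Below is a minimal sketch of what such a file might look like, adapted from the removed AliExpress solution with only the URL swapped. The `downloadHtml` wrapper and its `fetchImpl` parameter are assumptions added here so that the logic can be exercised without network access; the real file may be structured differently.

```javascript
// Sketch only: the real exercises/lego.mjs is not shown in this diff.
const url = 'https://www.lego.com/en-us/themes/star-wars';

// Hypothetical wrapper; fetchImpl defaults to the global fetch (Node.js 18+).
async function downloadHtml(targetUrl, fetchImpl = fetch) {
  const response = await fetchImpl(targetUrl);
  if (!response.ok) {
    // Crash visibly on HTTP errors, as the removed solution did.
    throw new Error(`HTTP ${response.status}`);
  }
  return response.text();
}
```

Calling `console.log(await downloadHtml(url))` at the top level of an ES module would print the page's HTML, which matches the behavior of the removed inline solution.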

### Save downloaded HTML as a file
@@ -5,8 +5,11 @@ description: Lesson about building a Node.js application for watching prices. Us
slug: /scraping-basics-javascript/parsing-html
---

import CodeBlock from '@theme/CodeBlock';
import LegacyJsCourseAdmonition from '@site/src/components/LegacyJsCourseAdmonition';
import Exercises from '../scraping_basics/_exercises.mdx';
import F1AcademyTeamsExercise from '!!raw-loader!roa-loader!./exercises/f1academy_teams.mjs';
import F1AcademyDriversExercise from '!!raw-loader!roa-loader!./exercises/f1academy_drivers.mjs';

<LegacyJsCourseAdmonition />

@@ -183,22 +186,7 @@ https://www.f1academy.com/Racing-Series/Teams

<details>
<summary>Solution</summary>

```js
import * as cheerio from 'cheerio';

const url = "https://www.f1academy.com/Racing-Series/Teams";
const response = await fetch(url);

if (response.ok) {
  const html = await response.text();
  const $ = cheerio.load(html);
  console.log($(".teams-driver-item").length);
} else {
  throw new Error(`HTTP ${response.status}`);
}
```

<CodeBlock language="js">{F1AcademyTeamsExercise.code}</CodeBlock>
</details>

### Scrape F1 Academy drivers
@@ -207,20 +195,5 @@ Use the same URL as in the previous exercise, but this time print a total count

<details>
<summary>Solution</summary>

```js
import * as cheerio from 'cheerio';

const url = "https://www.f1academy.com/Racing-Series/Teams";
const response = await fetch(url);

if (response.ok) {
  const html = await response.text();
  const $ = cheerio.load(html);
  console.log($(".driver").length);
} else {
  throw new Error(`HTTP ${response.status}`);
}
```

<CodeBlock language="js">{F1AcademyDriversExercise.code}</CodeBlock>
</details>
@@ -5,8 +5,12 @@ description: Lesson about building a Node.js application for watching prices. Us
slug: /scraping-basics-javascript/locating-elements
---

import CodeBlock from '@theme/CodeBlock';
import LegacyJsCourseAdmonition from '@site/src/components/LegacyJsCourseAdmonition';
import Exercises from '../scraping_basics/_exercises.mdx';
import WikipediaCountriesExercise from '!!raw-loader!roa-loader!./exercises/wikipedia_countries.mjs';
import WikipediaCountriesSingleSelectorExercise from '!!raw-loader!roa-loader!./exercises/wikipedia_countries_single_selector.mjs';
import GuardianF1TitlesExercise from '!!raw-loader!roa-loader!./exercises/guardian_f1_titles.mjs';

<LegacyJsCourseAdmonition />

@@ -238,36 +242,7 @@ Djibouti

<details>
<summary>Solution</summary>

```js
import * as cheerio from 'cheerio';

const url = "https://en.wikipedia.org/wiki/List_of_sovereign_states_and_dependent_territories_in_Africa";
const response = await fetch(url);

if (response.ok) {
  const html = await response.text();
  const $ = cheerio.load(html);

  for (const tableElement of $(".wikitable").toArray()) {
    const $table = $(tableElement);
    const $rows = $table.find("tr");

    for (const rowElement of $rows.toArray()) {
      const $row = $(rowElement);
      const $cells = $row.find("td");

      if ($cells.length > 0) {
        const $thirdColumn = $($cells[2]);
        const $link = $thirdColumn.find("a").first();
        console.log($link.text());
      }
    }
  }
} else {
  throw new Error(`HTTP ${response.status}`);
}
```
<CodeBlock language="js">{WikipediaCountriesExercise.code}</CodeBlock>

Because some rows contain [table headers](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/th), we skip processing a row if `$row.find("td")` doesn't find any [table data](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/td) cells.

@@ -288,27 +263,7 @@ You may want to check out the following pages:

<details>
<summary>Solution</summary>

```js
import * as cheerio from 'cheerio';

const url = "https://en.wikipedia.org/wiki/List_of_sovereign_states_and_dependent_territories_in_Africa";
const response = await fetch(url);

if (response.ok) {
  const html = await response.text();
  const $ = cheerio.load(html);

  for (const element of $(".wikitable tr td:nth-child(3)").toArray()) {
    const $nameCell = $(element);
    const $link = $nameCell.find("a").first();
    console.log($link.text());
  }
} else {
  throw new Error(`HTTP ${response.status}`);
}
```

<CodeBlock language="js">{WikipediaCountriesSingleSelectorExercise.code}</CodeBlock>
</details>

### Scrape F1 news
@@ -330,23 +285,5 @@ Max Verstappen wins Canadian Grand Prix: F1 – as it happened

<details>
<summary>Solution</summary>

```js
import * as cheerio from 'cheerio';

const url = "https://www.theguardian.com/sport/formulaone";
const response = await fetch(url);

if (response.ok) {
  const html = await response.text();
  const $ = cheerio.load(html);

  for (const element of $("#maincontent ul li h3").toArray()) {
    console.log($(element).text());
  }
} else {
  throw new Error(`HTTP ${response.status}`);
}
```

<CodeBlock language="js">{GuardianF1TitlesExercise.code}</CodeBlock>
</details>