Proposal on future structure of courses #1015

Open
honzajavorek opened this issue May 21, 2024 · 9 comments
Labels: t-academy (Issues related to Web Scraping and Apify academies.)

Comments

@honzajavorek
Collaborator

honzajavorek commented May 21, 2024

This is the course structure I propose we gravitate towards. For now it is rough and will get more detailed over time; splitting, merging, renaming, etc. are expected as part of the evolution.

flowchart TB
	subgraph start["Getting started"]
		direction LR
		beginner_js(Introduction to scraping<br>for JavaScript developers)
		beginner_py(Introduction to scraping<br>for Python developers)
		beginner_js ~~~ beginner_py
	end

	subgraph advanced["Learning advanced techniques"]
		direction LR
		browsers(Advanced scraping<br>with browsers)
		apis(Advanced scraping<br>with APIs)
		anti(Avoiding<br>anti-scraping protections)
		browsers ~~~ apis
	end

	subgraph simplify["Making life easier"]
		direction LR
		frameworks(Using frameworks<br>to simplify scraping)
		platforms(Using platforms<br>to simplify scraping)
		frameworks ~~~ platforms
	end

	start-->advanced
	start-->simplify
	advanced-->simplify

This issue is an elaboration of what I earlier described internally with the following words:

My hunch is there could be Web Scraping Basics in JS, Web Scraping Basics in Python, Web Scraping with Browsers, Web Scraping of APIs, etc. If we connect these courses into one learning path and call it a Web Scraping Zero to Hero Learning Path™, it can easily also have a landing page and some marketing content, so the actual number of courses doesn't concern me that much.

Include one or two courses on how to start with Apify, but not more. Something like Web Scraping with Apify, to complete the learning path. Maybe even something more inconspicuous, such as Getting Productive with Web Scraping Platforms, where we'd teach people why and how to use e.g. proxies, and only then mention that Apify has awesome proxies, using our platform as the hands-on example for the student.

In the Getting Productive with Web Scraping Platforms course we'd teach people how platforms in general can help them avoid a lot of heavy lifting. In the lessons, we would lay out the problems, explain the solutions, and then show, hands-on, how Apify can be used as a solution. That is the best Apify advertisement: it shows the advantages in contrast to manual solutions.

honzajavorek added the t-academy label May 21, 2024
@B4nan
Member

B4nan commented May 21, 2024

Some time ago I found this, not sure if you saw that already:

https://diataxis.fr/

@honzajavorek
Collaborator Author

Yes, I'm a fan of diataxis. A single course should consist of lessons, and each lesson can take the diataxis approach, as proposed in evildmp/diataxis-documentation-framework#130. Also, the current "tutorials" are clearly How-to guides as defined by diataxis, and I want to keep them as such, but that's outside the scope of the course flow above.

@mnmkng
Member

mnmkng commented May 21, 2024

What's in the "Using frameworks to simplify scraping" part? Do you plan to move all Crawlee related content in there, or is that something even more advanced?

@honzajavorek
Collaborator Author

@mnmkng You made me think! The idea when drawing the chart was that we explain basic concepts, and then in a separate course we show people they can use frameworks (Crawlee, Scrapy) to achieve the same and more, but in a simpler way.

But your question brings me to a better approach 💡 All the courses should start with simple tools, but lead people to using frameworks in the end, demonstrating why they're useful on the way. The same could be done with platforms.

Maybe there will be some topics left which could form a separate "Using frameworks/platforms to simplify scraping" course, maybe not. But these two shouldn't be separate courses; they should be layers each course culminates in.

@honzajavorek
Collaborator Author

honzajavorek commented May 21, 2024

Hierarchy of courses:

flowchart TB
    subgraph start["Getting started"]
        direction LR
        beginner_js(Web scraping basics<br>for JavaScript devs)
        beginner_py(Web scraping basics<br>for Python devs)
        beginner_js ~~~ beginner_py
    end

    subgraph advanced["Learning advanced techniques"]
        direction LR
        browsers(Web scraping with browsers)
        apis(Web scraping with APIs)
        anti(Navigating anti-scraping protections)
        browsers ~~~ apis
    end

    start --> advanced

Structure of a single course

flowchart TB
    subgraph advanced["Course"]
        direction TB
        home(State requirements,<br>promises, motivation)-->basic(Teach basics<br>with basic tools)-->framework(Use framework<br>to simplify code or<br>allow advanced goal)-->platform(Use platform<br>to simplify code or <br>allow advanced goal)
    end

@honzajavorek
Collaborator Author

I changed the names of the courses in the chart above to:

  • Web scraping basics for XYZ devs
  • Web scraping with browsers
  • Web scraping with APIs
  • Navigating anti-scraping protections

honzajavorek self-assigned this May 22, 2024
@metalwarrior665
Member

Looks good to me. Browser scraping and API scraping can, in a way, be set against each other on a typical pros & cons page.

Historically, I wanted to have some super-pro course, something like "High scale scraping" with things like recursive pagination, reverse engineering JS, etc. Basically a final stage of the journey. Unfortunately, I failed to deploy it in a meaningful form (this crazy PR still exists). I think we can easily add that later if we find someone to write it.

@honzajavorek
Collaborator Author

Yup, there should definitely be a page which clearly explains where browsers are the best fit and where APIs are the best fit (which will obviously lean towards recommending anything other than browsers if possible). Ideally a page which can be shared between those two courses.

Regarding super-pro techniques, I wonder if it's a field which allows for a step-by-step course, or if it's more a set of scenarios you only search for once you bump into them, looking for a canned solution to that particular problem. In that case it might make sense to have it as a collection of how-tos.

But that's something we can figure out later. I didn't know about the PR, so it's good you mentioned it. I'll keep it in mind.

@mnmkng
Member

mnmkng commented May 23, 2024

Btw, note that the most popular topic in web scraping, and the most sought-after guides nowadays, are all about bypassing anti-scraping protections. Primarily Cloudflare, but also CAPTCHAs and other annoying blocks. So we should definitely keep in mind that expanding the anti-blocking section is one of the priorities.

4 participants