# Let's go scraping!

## Setup

This example requires a few additional libraries. You can install them using pip:

```sh
pip install -r requirements.txt
```

## Running the example

```sh
redun run workflow.py main
```

By default, this will scrape web pages starting from https://www.python.org/ to a depth of 2 link traversals. Every HTML file encountered is stored in `crawl/`. Word frequencies are then computed across all pages, and a CSV of the word counts is written to `computed/word_counts.txt`.
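For orientation, here is a minimal sketch of how the crawl and word-count steps could be written as redun tasks. It is not the example's actual code: the task names, the use of `requests`, and the hashed file naming are all assumptions for illustration.

```python
import os
from collections import Counter

import requests
from redun import File, task

redun_namespace = "scraping_sketch"


@task()
def fetch_page(url: str) -> File:
    """Download one page and store its HTML under crawl/."""
    os.makedirs("crawl", exist_ok=True)
    html = requests.get(url).text
    out = File(f"crawl/{abs(hash(url))}.html")
    with out.open("w") as f:
        f.write(html)
    return out


@task()
def count_words(pages: list) -> File:
    """Tally word frequencies across all fetched pages into a CSV."""
    counts = Counter()
    for page in pages:
        with page.open() as f:
            counts.update(f.read().split())
    os.makedirs("computed", exist_ok=True)
    out = File("computed/word_counts.txt")
    with out.open("w") as f:
        for word, n in counts.most_common():
            f.write(f"{word},{n}\n")
    return out
```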

Lastly, an HTML report summarizing the scraping and analysis is generated at `reports/report.html`. The report is rendered from a jinja2 template stored in `templates/report.html`.
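The rendering step itself boils down to loading the template and filling it with the analysis results. A minimal jinja2 sketch, with made-up context variables (`num_pages`, `word_counts`), looks like this:

```python
import os

from jinja2 import Template

# Load the template and render it with illustrative placeholder data;
# the real template variables may differ.
with open("templates/report.html") as f:
    template = Template(f.read())

html = template.render(num_pages=10, word_counts=[("python", 120)])

os.makedirs("reports", exist_ok=True)
with open("reports/report.html", "w") as f:
    f.write(html)
```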

## Exercises for the reader

Feel free to try other URLs and scraping depths using the task arguments:

```sh
redun run workflow.py main --url URL --depth DEPTH
```
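redun derives these command-line flags directly from the parameters of the task being run, so `--url` and `--depth` correspond to keyword arguments on `main()`. A sketch of the signature (defaults taken from the description above, body omitted):

```python
from redun import task


@task()
def main(url: str = "https://www.python.org/", depth: int = 2):
    # redun automatically exposes these parameters as the --url and
    # --depth CLI options shown above.
    ...
```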

Also feel free to alter the report template `templates/report.html`. It is passed to the task `make_report()` as a `File` argument, so you should get automatic reactivity to changes in the template when you rerun the workflow.
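In other words, the template participates in redun's provenance tracking like any other data input. A hedged sketch of what `make_report()` could look like (its second argument and the template variables are assumptions):

```python
from jinja2 import Template
from redun import File, task


@task()
def make_report(template: File, word_counts: File) -> File:
    # Because `template` is a redun File, editing templates/report.html
    # changes its hash, invalidates the cached result, and triggers a
    # re-render on the next run.
    html = Template(template.read()).render(word_counts=word_counts.read())
    out = File("reports/report.html")
    out.write(html)
    return out
```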