
aurelien-clu/temporal-scraper

Example of a Python Temporal.io scraper

Getting Started

Pre-requisites

Setup

# skip this step if Python 3.9 is already installed (with or without pyenv)
pyenv install 3.9.10

# replace the path with the one of your own Python 3.9 installation
poetry env use ~/.pyenv/versions/3.9.10/bin/python3.9

# install packages
poetry install

Run

# terminal 1
temporal server start-dev

# terminal 2
python src/run_worker.py
# you could start more workers in additional terminals, but one is enough here

# terminal 3
mkdir -p data
python src/run_workflow.py --url=https://news.yahoo.com --output-dir=data
# terminal 3 output
INFO    | starting: CrawlUrl(id='b049[...]', url='https://news.yahoo.com')
SUCCESS | Output(url='https://news.yahoo.com', title='Yahoo News [...]', nb_links=86, path='data/b049[...].json')
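The log lines above show the shape of the data flowing through the workflow: a `CrawlUrl(id, url)` input and an `Output(url, title, nb_links, path)` result saved as `<output-dir>/<id>.json`. The following is a hypothetical, standard-library-only sketch of that step, not the repository's actual code. `parse_args`, `new_crawl`, `crawl_html` and `_PageParser` are made-up names, the random hex id is an assumption, links are assumed to be counted as `<a>` tags with an `href`, and the HTML is passed in as a string (fetching it is left out).

```python
import argparse
import json
import uuid
from dataclasses import asdict, dataclass
from html.parser import HTMLParser
from pathlib import Path
from typing import List, Optional, Tuple


@dataclass
class CrawlUrl:
    id: str
    url: str


@dataclass
class Output:
    url: str
    title: str
    nb_links: int
    path: str


class _PageParser(HTMLParser):
    """Hypothetical parser: collects <title> text and counts <a> tags with an href."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.nb_links = 0
        self._in_title = False

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "title":
            self._in_title = True
        elif tag == "a" and any(name == "href" for name, _ in attrs):
            self.nb_links += 1

    def handle_endtag(self, tag: str) -> None:
        if tag == "title":
            self._in_title = False

    def handle_data(self, data: str) -> None:
        if self._in_title:
            self.title += data


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the --url / --output-dir flags shown above (hypothetical parser)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--url", required=True)
    parser.add_argument("--output-dir", required=True)  # exposed as args.output_dir
    return parser.parse_args(argv)


def new_crawl(url: str) -> CrawlUrl:
    """Build a crawl request; a random hex id is an assumption, not the repo's scheme."""
    return CrawlUrl(id=uuid.uuid4().hex, url=url)


def crawl_html(crawl: CrawlUrl, html: str, output_dir: str) -> Output:
    """Extract the title and link count, then save the result as <output_dir>/<id>.json."""
    page = _PageParser()
    page.feed(html)
    page.close()  # flush any buffered text
    path = Path(output_dir) / f"{crawl.id}.json"
    output = Output(
        url=crawl.url,
        title=page.title.strip(),
        nb_links=page.nb_links,
        path=str(path),
    )
    path.write_text(json.dumps(asdict(output)))
    return output
```

In the real project, fetching and parsing presumably run inside a Temporal activity so the worker can retry them. Here they are plain functions so they can be tried without a server.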

Open 127.0.0.1:8233/namespaces/default/workflows to see the Temporal web UI.

You can see your recent workflow executions:

[screenshot: recent workflows]

Selecting a workflow shows a summary at the top:

[screenshot: workflow summary]

Finally, you can see all events related to the workflow execution:

[screenshot: workflow event history]

This includes inputs, outputs, retries, exceptions, etc.
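The retries recorded in the event history are spaced according to the activity's retry policy. As a rough illustration, assume Temporal's documented default policy applies and this project does not override it. That policy waits 1 second before the first retry, uses a backoff coefficient of 2.0, caps the interval at 100 times the initial interval, and sets no attempt limit. The wait before each retry then grows as sketched below. `retry_delays` is a hypothetical helper that only reproduces the arithmetic. It is not part of the Temporal SDK.

```python
from datetime import timedelta
from typing import List, Optional


def retry_delays(
    retries: int,
    initial_interval: timedelta = timedelta(seconds=1),
    backoff_coefficient: float = 2.0,
    maximum_interval: Optional[timedelta] = None,
) -> List[timedelta]:
    """Return the wait before each of the first `retries` retries (hypothetical helper)."""
    if maximum_interval is None:
        # Assumed default cap: 100 x the initial interval.
        maximum_interval = initial_interval * 100
    delays = []
    for n in range(retries):
        # The n-th retry (0-based) waits initial * coefficient**n, capped at the maximum.
        delay = initial_interval * (backoff_coefficient ** n)
        delays.append(min(delay, maximum_interval))
    return delays


print([d.total_seconds() for d in retry_delays(9)])
# → [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 100.0, 100.0]
```

With an unlimited attempt count, a persistently failing activity keeps retrying every 100 seconds once the cap is reached, and each attempt remains visible in the event history.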