Python Web Scraper

A Python example script for web page scraping and data collection.

This example scrapes a job-listing website to find jobs in Vancouver, BC. By default, the search term is "Python".


  • A Python script utility for web scraping (scrapper/)
    • Read the content of a remote URL
    • Save cached responses for the same search term
    • Parse Job posting entities to get structured information
    • Save CSV and JSON data to be loaded into a database or used by a web/mobile application
  • A Django web interface to show dynamic search results
    • A job-site clone for searching job terms
    • Uses the scrapper/ utility as a back-end search engine
    • Displays results using a web interface created with React.js and Materialize.css
  • Heroku integration to publish the application automatically in a cloud environment
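
The "read a remote URL" and "save cached responses for the same search term" steps listed above can be sketched as follows. This is a minimal illustration, not the project's actual code: the names `cache_path` and `fetch_cached`, the `cache/` folder, and the hashing scheme are all assumptions, and it uses only the standard library (`urllib`) where the project itself relies on `requests`.

```python
import hashlib
from pathlib import Path
from urllib.request import urlopen


def cache_path(term, cache_dir):
    """Map a search term to a stable file name inside the cache folder."""
    # Normalise the term so "Python" and " python " share one cache entry.
    key = term.strip().lower().encode("utf-8")
    return cache_dir / (hashlib.sha256(key).hexdigest()[:16] + ".html")


def fetch_cached(term, url, cache_dir=Path("cache"), fetch=None):
    """Return the page for `term`, downloading it only on a cache miss."""
    path = cache_path(term, cache_dir)
    if path.exists():
        return path.read_text(encoding="utf-8")

    if fetch is None:
        # Default downloader (hypothetical; swap in requests.get if preferred).
        def fetch(target):
            with urlopen(target, timeout=10) as resp:
                return resp.read().decode("utf-8", errors="replace")

    html = fetch(url)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(html, encoding="utf-8")
    return html
```

Passing the downloader in as the `fetch` argument keeps the caching logic testable without network access.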


Install the required Python libraries

pip install bs4
pip install requests
pip install pandas
pip install lxml
pip install Django
pip install gunicorn
pip install django_heroku

Or install them all at once from the requirements file:

pip install -r requirements.txt

Run this command to test the web scraper script:

python [searchTerm]

The results will be stored in [searchTerm].csv and [searchTerm].json files in the ./data/ folder.
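
As a rough sketch of how one search could produce both output files, the function below writes a list of job records to `[searchTerm].csv` and `[searchTerm].json`. The name `save_results`, the `FIELDS` list (taken from the keys in the example .json file), and the use of the standard `csv` and `json` modules are assumptions for illustration; the project itself lists pandas among its packages and may write the files differently.

```python
import csv
import json
from pathlib import Path

# Column order assumed from the keys shown in the example .json file.
FIELDS = ["id", "text", "link", "comp", "loc", "desc", "pub"]


def save_results(term, jobs, data_dir=Path("data")):
    """Write the job records to <term>.csv and <term>.json in data_dir."""
    data_dir.mkdir(parents=True, exist_ok=True)
    csv_path = data_dir / f"{term}.csv"
    json_path = data_dir / f"{term}.json"

    # CSV: one header row, then one row per job posting.
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(jobs)

    # JSON: the same records as a list of objects.
    with json_path.open("w", encoding="utf-8") as fh:
        json.dump(jobs, fh, indent=4, ensure_ascii=False)

    return csv_path, json_path
```

This sketch uses the search term directly as a file name, so a term containing path separators would need sanitising first.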

Example .json file:

[
    {
        "id": "p_ed9aac79a2cdb122",
        "text": "Python CI CD Dev",
        "link": "\/rc\/clk?jk=ed9aac79a2cdb122&fccid=8395f0911ec08bde&vjs=3",
        "comp": "Example Company",
        "loc": "Vancouver, BC",
        "desc": "Experience with python and JavaScript Technologies. Python developers are encouraged to apply....",
        "pub": "Just posted"
    },
    {
        "id": "p_6e27c3b05a273a2e",
        "text": "Python Software Engineer",
        "link": "\/rc\/clk?jk=6e27c3b05a273a2e&fccid=55e67e4c356ab20b&vjs=3",
        "comp": "Big Company",
        "loc": "Richmond, BC",
        "desc": "Our stack include Python, Django, C#, .NET, JavaScript and Postgres ....",
        "pub": "1 day ago"
    }
]

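The "link" values in the example are site-relative paths, so a consumer of the .json file would need to resolve them against the job site's address before using them. A minimal sketch, assuming the file holds a list of objects and using a placeholder `BASE_URL` (the real site address is not stated here; `with_absolute_links` and `load_jobs` are hypothetical helper names):

```python
import json
from urllib.parse import urljoin

# Placeholder base URL; replace it with the address of the scraped site.
BASE_URL = "https://example.com"


def with_absolute_links(records, base_url=BASE_URL):
    """Return copies of the records whose "link" field is an absolute URL."""
    return [{**rec, "link": urljoin(base_url, rec["link"])} for rec in records]


def load_jobs(json_text, base_url=BASE_URL):
    """Parse the JSON text of a results file and resolve its links."""
    # json.loads turns the escaped "\/" sequences back into plain "/".
    return with_absolute_links(json.loads(json_text), base_url)
```

The original records are left unchanged; each returned record is a shallow copy with only the "link" field rewritten.
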
Web interface

You can run the following command to start the Django web interface:

python runserver

A web server will start on port 8000 (http://localhost:8000).

In this web interface, you can enter a search term and view the matching results.


Python packages used

  • bs4 - BeautifulSoup4 HTML document navigator
  • requests - Requests HTTP library
  • pandas - Pandas Data analysis library
  • lxml - lxml XML/HTML document parser
  • Django - Django Python Web framework

Front-end components

  • React.js - JavaScript library for building user interfaces
  • Materialize.css - Responsive front-end framework based on Material Design