- Project Owner: @dark-teal-coder
- First Published Date: 2022-03-02
- Title: Washington Post Web Scraper
- Difficulty:
- Beginner
- Intermediate
- Advanced
- Scale:
- Small
- Medium
- Big
The project uses Python to scrape newspaper article content from Washington Post. The article used here is "87 percent of websites are tracking you. This new tool will let you run a creepiness check" and the scraped items are the newspaper article title, author, date and body. The original idea is taken from "Web scraper to get news article content" by DevProjects @codementor.
- Text Editor or Integrated Development Environment (IDE)
- Python 3
- You can install Python 3 from python.org.
- Python Package Installer/Manager
pip
- If you installed Python from python.org, you should already have
pip
. If it is not installed, you can use the commandpy -m ensurepip --default-pip
to bootstrap it from the standard library. If you are using Linux, you will have to install the package manager separately. You can find out more about thepip
tool here.
- If you installed Python from python.org, you should already have
- Command-line interface (CLI)
- You can install the open-source PowerShell on Windows, Linux and macOS if you do not have or want to use a pre-installed CLI on your local machine.
Check if you have Python installed using the command python --version
, or simply, python version
, in the CLI. Git-clone the project repository from Github to the local machine. Use the command py -m pip install package_name
to install the necessary Python libraries. Check out pip documentation to learn more about pip install
. Check the top part of the .py
script file for the list of libraries required. For example, you may need requests
and beautifulsoup4
libraries if you see the following lines in the top part of the script file:
import requests
from bs4 import BeautifulSoup
If pip
fails to locate the relevant packages, you may find it at Python Package Index (PyPI). Use python file_name.py
to run the script in a CLI. Or, use an IDE, such as VS Code, to run the script. There will usually be a [Run] button in the top right corner of the opened script file.
- 87 percent of websites are tracking you. This new tool will let you run a creepiness check.
- XPath Tutorial
- Beautiful Soup: Build a Web Scraper With Python
1st Completion Date: Oct 23, 2022