Washington Post Web Scraper

Metadata

Project Owner: @dark-teal-coder
First Published Date: 2022-03-02

Project

Title: Washington Post Web Scraper
Difficulty:
- Beginner
- Intermediate
- Advanced
Scale:
- Small
- Medium
- Big

Repository Description

The project uses Python to scrape newspaper article content from The Washington Post. The article used here is "87 percent of websites are tracking you. This new tool will let you run a creepiness check", and the scraped items are the article's title, author, date and body. The original idea is taken from "Web scraper to get news article content" by DevProjects @codementor.
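The scraping step described above can be sketched roughly as follows. This is a minimal illustration, not the code in scrape_washingtonpost.py: the function names fetch_html and parse_article are hypothetical, and the tag and class names used to locate the title, author, date and body (h1, a class named "author", time, and p inside article) are assumptions that will almost certainly need adjusting to the real page markup.

```python
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its HTML text (hypothetical helper)."""
    # Imported here so that parse_article can be tried offline
    # without the requests library installed.
    import requests

    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    response.raise_for_status()  # Raise an error for 4xx/5xx responses.
    return response.text


def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.get_text(" ", strip=True) if tag is not None else None


def parse_article(html):
    """Extract title, author, date and body from article HTML.

    The selectors below are assumptions for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")

    # Fall back to the whole document if there is no <article> element.
    article = soup.find("article")
    if article is None:
        article = soup

    paragraphs = [p.get_text(" ", strip=True) for p in article.find_all("p")]

    return {
        "title": text_or_none(soup.find("h1")),
        "author": text_or_none(soup.find(class_="author")),
        "date": text_or_none(soup.find("time")),
        "body": "\n".join(p for p in paragraphs if p),
    }


if __name__ == "__main__":
    # A tiny made-up page, so the sketch runs without network access.
    sample_html = (
        "<html><body><article>"
        "<h1>Sample headline</h1>"
        "<span class='author'>Jane Doe</span>"
        "<time>March 2, 2022</time>"
        "<p>First paragraph.</p><p>Second paragraph.</p>"
        "</article></body></html>"
    )
    for field, value in parse_article(sample_html).items():
        print(f"{field}: {value}")
```

To try it against a live page, pass the result of fetch_html(url) to parse_article. Keeping the download and the parsing in separate functions makes it possible to test the parsing on saved HTML.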

Installation

Tools

Text Editor or Integrated Development Environment (IDE)
- You can download the famous text editor Notepad++.
- Or, you can download the popular IDE Visual Studio Code (VS Code).
Python 3
- You can install Python 3 from python.org.
Python Package Installer/Manager pip
- If you installed Python from python.org, you should already have pip. If it is not installed, you can use the command py -m ensurepip --default-pip to bootstrap it from the standard library (py is the Windows launcher; on Linux or macOS, use python3 in its place). On many Linux distributions, pip is packaged separately from Python, so you may have to install it through the system package manager. You can find out more about the tool in the pip documentation.
Command-line interface (CLI)
- You can install the open-source PowerShell on Windows, Linux and macOS if you do not have or want to use a pre-installed CLI on your local machine.

Description

Check if you have Python installed using the command python --version, or the short form python -V, in the CLI. Clone the project repository from GitHub to your local machine. Use the command py -m pip install package_name to install the necessary Python libraries. Check out the pip documentation to learn more about pip install. Check the top part of the .py script file for the list of libraries required. For example, you may need the requests and beautifulsoup4 libraries if you see the following lines in the top part of the script file:

import requests
from bs4 import BeautifulSoup

If pip fails to locate the relevant packages, you may find them on the Python Package Index (PyPI). Use python file_name.py to run the script in a CLI (for this repository, python scrape_washingtonpost.py). Or use an IDE, such as VS Code, to run the script. There will usually be a [Run] button in the top right corner of the opened script file.
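Before running the script, it can help to check from Python itself whether the required libraries are importable. The sketch below is one possible way to do that using only the standard library. The function name missing_packages and the REQUIRED mapping are hypothetical. The mapping is needed because the import name (bs4) differs from the name you give to pip (beautifulsoup4).

```python
import importlib.util
import sys

# Import name -> name to pass to "pip install" (illustrative mapping).
REQUIRED = {
    "requests": "requests",
    "bs4": "beautifulsoup4",
}


def missing_packages(required):
    """Return the pip names of the modules that cannot be found."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    missing = missing_packages(REQUIRED)
    if missing:
        # Print one install command per missing package.
        for pip_name in missing:
            print(f"Missing: run  py -m pip install {pip_name}")
    else:
        print("All required libraries are installed.")
```

Only the lookup is performed here; nothing is imported or installed, so the check is safe to run repeatedly.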
