A simple Python web scraper that extracts article titles from a given webpage using requests and BeautifulSoup.
- Fetches HTML content from a specified URL.
- Parses and extracts article titles.
- Saves extracted titles to a text file (`output.txt`).
- Prints a preview of the fetched HTML for debugging.
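The features above can be sketched as a small pipeline. This is a minimal illustration, not the repository's actual `main.py`. Only the name `parse_html` comes from this README. The helpers `fetch_html`, `save_titles` and `run`, the `URL` placeholder, the assumption that titles live in `h2` tags, and the 500-character preview length are all hypothetical.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical settings: the real script's URL and title tag may differ.
URL = "https://example.com/blog"
TITLE_TAG = "h2"


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download the page and return its HTML as text."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return response.text


def parse_html(html: str, tag: str = TITLE_TAG) -> list[str]:
    """Return the stripped text of every `tag` element, skipping empty ones."""
    soup = BeautifulSoup(html, "html.parser")
    titles = (element.get_text(strip=True) for element in soup.find_all(tag))
    return [title for title in titles if title]


def save_titles(titles: list[str], path: str = "output.txt") -> None:
    """Write one title per line to `path`."""
    with open(path, "w", encoding="utf-8") as f:
        for title in titles:
            f.write(title + "\n")


def run(url: str = URL) -> list[str]:
    """Fetch, preview, parse and save: the four steps listed above."""
    html = fetch_html(url)
    print(html[:500])  # preview of the fetched HTML for debugging
    titles = parse_html(html)
    save_titles(titles)
    return titles
```

Calling `run()` makes a real HTTP request, so point `URL` at the page you want to scrape first. Using `raise_for_status()` means an HTTP error stops the script instead of quietly producing an empty `output.txt`.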
Ensure you have Python installed on your system. Then, install the required dependencies:

```bash
pip install requests beautifulsoup4
```

- Clone this repository:

  ```bash
  git clone https://github.com/your-username/web_scrapper.git
  cd web_scrapper
  ```

- Run the script:

  ```bash
  python main.py
  ```
- Extracted titles will be saved in `output.txt`.
- If no articles are found, inspect the HTML structure in the printed preview and modify the `parse_html` function accordingly.
- For JavaScript-rendered pages, consider using `selenium` instead of `requests` to fetch the page, since `requests` only downloads the initial HTML and does not run any JavaScript.
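One way to adapt `parse_html` when nothing is found is to try several CSS selectors in order and keep the first one that matches. The selectors below are hypothetical examples. Replace them with ones that fit the structure you see in the printed HTML preview. `soup.select` accepts CSS selectors (BeautifulSoup relies on the soupsieve package for this).

```python
from bs4 import BeautifulSoup

# Hypothetical candidate selectors, tried in order. Adjust them to match
# the markup of the page you are scraping.
CANDIDATE_SELECTORS = ("article h2", "h2.post-title", "h2 a", "h3")


def parse_html(html: str, selectors: tuple[str, ...] = CANDIDATE_SELECTORS) -> list[str]:
    """Return titles from the first CSS selector that yields non-empty text."""
    soup = BeautifulSoup(html, "html.parser")
    for selector in selectors:
        titles = [element.get_text(strip=True) for element in soup.select(selector)]
        titles = [title for title in titles if title]
        if titles:
            return titles
    return []  # nothing matched: inspect the HTML and add a selector
```

Returning an empty list instead of raising keeps the rest of the script running, so you can still look at the HTML preview and decide which selector to add.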
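For JavaScript-rendered pages, a minimal Selenium 4 sketch might load the page in a headless browser and return the HTML after its scripts have run. It assumes `selenium` is installed (`pip install selenium`) and Chrome is available. The function name `fetch_rendered_html` is hypothetical, and so is the idea that waiting for an `h2` element signals that the titles have loaded.

```python
def fetch_rendered_html(url: str, wait_for: str = "h2", timeout: float = 10.0) -> str:
    """Load `url` in headless Chrome and return the HTML after scripts have run."""
    # Imported inside the function so selenium stays an optional dependency.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run without a visible browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until an element matching the (hypothetical) selector exists,
        # or raise TimeoutException after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_for))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if loading fails
```

The returned string can be passed straight to `parse_html`, so only the fetching step changes while BeautifulSoup still handles the parsing.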
This project is open-source under the MIT License.
Feel free to fork this repository and submit pull requests to improve functionality!