HTML Scraper for Website Analysis

Description

This Python script is an HTML scraper for website analysis. Given a URL, it extracts the page's HTML, CSS, potential API endpoints, metadata, text content, and links, then performs a basic structural analysis and a keyword frequency analysis.

Features

Extracts HTML, CSS, and potential API endpoints
Retrieves metadata (title, description, keywords)
Extracts text content and links (both internal and external)
Analyzes basic website structure
Performs simple keyword frequency analysis
Saves all extracted data to a CSV file
Provides key findings, recommendations, and an actionable plan based on the analysis
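
As an illustration of the metadata and link extraction listed above, here is a minimal sketch. It is not taken from main.py: it uses the standard library's html.parser as a stand-in for BeautifulSoup so that it runs without extra installs, and the names PageParser and analyze_page, the sample HTML, and the rule "same host means internal" are all assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageParser(HTMLParser):
    """Collects the title, named <meta> tags, and link targets (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.hrefs = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def analyze_page(html, base_url):
    """Return metadata plus internal and external links for one page."""
    parser = PageParser()
    parser.feed(html)
    base_host = urlparse(base_url).netloc
    internal, external = [], []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar
        # Assumption: a link is "internal" when its host matches the base URL's host.
        (internal if parts.netloc == base_host else external).append(absolute)
    return {
        "title": parser.title.strip(),
        "description": parser.meta.get("description", ""),
        "keywords": parser.meta.get("keywords", ""),
        "internal_links": internal,
        "external_links": external,
    }


# Made-up sample page, used instead of a live request.
sample_html = (
    "<html><head><title>Example Shop</title>"
    '<meta name="description" content="A sample page">'
    '<meta name="keywords" content="shop, example"></head>'
    '<body><a href="/about">About</a>'
    '<a href="https://other.example.org/">Partner</a>'
    '<a href="mailto:hi@example.com">Mail</a></body></html>'
)
result = analyze_page(sample_html, "https://example.com/")
print(result)
```

With a live site, the HTML string would typically come from requests.get(url).text rather than a hard-coded sample.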

Requirements

Python 3.6+
requests library
beautifulsoup4 library

Installation

Ensure you have Python 3.6 or higher installed on your system.
Install the required libraries: pip install requests beautifulsoup4

Usage

Run the script: python main.py
When prompted, enter the URL of the website you want to analyze.
The script will scrape the website and save the results to a file named website_analysis.csv in the same directory.
The script also prints key findings, recommendations, and an actionable plan to the console.

Output

The script generates two types of output:

A CSV file (website_analysis.csv) containing all extracted data.
Console output with key findings, recommendations, and an actionable plan.

CSV File Structure

The CSV file contains the following information:

HTML content
CSS content
Detected API endpoints
Metadata (title, description, keywords)
Page content
Internal and external links
Website structure analysis
Top keywords
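
The exact column layout of website_analysis.csv is not documented here, so the following is only a sketch of one plausible layout: one row per field, with list values joined into a single cell. The function name write_analysis_csv, the field names, and the sample values are assumptions, not the script's actual schema.

```python
import csv
import io

# Hypothetical analysis result; the keys mirror the list above,
# not necessarily the columns main.py writes.
analysis = {
    "html": "<html><body>Hello</body></html>",
    "css": "body { margin: 0; }",
    "api_endpoints": ["https://example.com/api/items"],
    "title": "Example Shop",
    "internal_links": ["https://example.com/about", "https://example.com/contact"],
    "external_links": ["https://other.example.org/"],
    "top_keywords": ["shop", "example"],
}


def write_analysis_csv(analysis, stream):
    """Write one (field, value) row per entry; lists are joined with '; '."""
    writer = csv.writer(stream)
    writer.writerow(["field", "value"])
    for field, value in analysis.items():
        if isinstance(value, (list, tuple)):
            value = "; ".join(str(item) for item in value)
        writer.writerow([field, value])


# Write to an in-memory buffer here so the example has no side effects.
buffer = io.StringIO()
write_analysis_csv(analysis, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file instead, pass a handle opened with open("website_analysis.csv", "w", newline="", encoding="utf-8"); the newline="" argument is what the csv module expects so that it controls line endings itself.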

Console Output

The console output provides:

A summary of key findings from the analysis
Recommendations based on the findings
An actionable plan for website improvement

Limitations

The script performs basic scraping and analysis. For more complex websites, additional customization may be required.
The keyword analysis is based on simple frequency counting and may not account for context or importance.
The script does not handle JavaScript-rendered content.
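
To make the keyword limitation concrete, here is a minimal sketch of frequency counting with collections.Counter. It is an illustration rather than the script's own code; the stop-word list, the minimum word length, and the function name top_keywords are assumptions.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stop-word list for the example.
STOP_WORDS = {"the", "a", "and", "of", "to", "in", "is", "for", "with"}


def top_keywords(text, n=5):
    """Return the n most frequent words, ignoring stop words and very short words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return counts.most_common(n)


sample_text = (
    "Python scraping with Python: the scraper parses pages, "
    "and the scraper counts words."
)
keywords = top_keywords(sample_text, 2)
print(keywords)
```

In this sample, "scraper" and "scraping" are counted as unrelated words, and a word in a heading weighs the same as one in a footer, which is the kind of context that plain frequency counting misses.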

Legal and Ethical Considerations

Ensure you have the right to scrape and analyze the target website. Always review and comply with the website's robots.txt file and terms of service.
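
One way to check robots.txt rules before scraping is the standard library's urllib.robotparser. The sketch below is not part of the script as described; the helper name is_allowed and the sample rules are assumptions, and the rules are parsed from a string so the example needs no network access.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up rules for illustration.
sample_robots = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(sample_robots, "https://example.com/index.html"))
print(is_allowed(sample_robots, "https://example.com/private/data.html"))
```

Against a live site, RobotFileParser can load the file itself via set_url() followed by read(). Passing a robots.txt check does not replace reading the site's terms of service.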

Contributing

Contributions, issues, and feature requests are welcome. Check the issues page if you want to contribute.

License

MIT License

Catefishh/HTML-Scraper
