This Python script is an HTML scraper designed for website analysis. It extracts HTML, CSS, potential API endpoints, metadata, text content, and links from a given website, then performs basic structural analysis and keyword frequency analysis.

## Features
- Extracts HTML, CSS, and potential API endpoints
- Retrieves metadata (title, description, keywords)
- Extracts text content and links (both internal and external)
- Analyzes basic website structure
- Performs simple keyword frequency analysis
- Saves all extracted data to a CSV file
- Provides key findings, recommendations, and an actionable plan based on the analysis
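As an illustration of the extraction approach, here is a minimal sketch of the metadata and link extraction using `requests` and `BeautifulSoup`. The function name and return format are illustrative, not the script's actual interface:

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse

def extract_basics(url):
    """Fetch a page and pull out metadata and links (illustrative sketch)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    # Metadata: <title> plus the description/keywords <meta> tags
    title = soup.title.string.strip() if soup.title and soup.title.string else ""
    description = soup.find("meta", attrs={"name": "description"})
    keywords = soup.find("meta", attrs={"name": "keywords"})

    # Links: resolve relative URLs, then classify by hostname
    base_host = urlparse(url).netloc
    internal, external = [], []
    for anchor in soup.find_all("a", href=True):
        href = urljoin(url, anchor["href"])
        (internal if urlparse(href).netloc == base_host else external).append(href)

    return {
        "title": title,
        "description": description.get("content", "") if description else "",
        "keywords": keywords.get("content", "") if keywords else "",
        "internal_links": internal,
        "external_links": external,
    }
```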
## Requirements

- Python 3.6+
- `requests` library
- `beautifulsoup4` library
## Installation

- Ensure you have Python 3.6 or higher installed on your system.
- Install the required libraries:
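  ```bash
  pip install requests beautifulsoup4
  ```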
## Usage

- Run the script (the filename shown below is a placeholder; use the actual name of your copy):
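  ```bash
  # "website_scraper.py" is a placeholder for the actual script file
  python website_scraper.py
  ```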
- When prompted, enter the URL of the website you want to analyze.
- The script will scrape the website and save the results to a file named `website_analysis.csv` in the same directory.
- A summary of key findings, recommendations, and an actionable plan is printed to the console.
## Output

The script generates two types of output:

- A CSV file (`website_analysis.csv`) containing all extracted data.
- Console output with key findings, recommendations, and an actionable plan.
The CSV file contains the following information:
- HTML content
- CSS content
- Detected API endpoints
- Metadata (title, description, keywords)
- Page content
- Internal and external links
- Website structure analysis
- Top keywords
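Here is a minimal sketch of how such a file might be written with Python's built-in `csv` module; the label/value layout is an assumption rather than the script's actual schema:

```python
import csv

def save_to_csv(data, filename="website_analysis.csv"):
    """Write analysis results as label/value rows (layout is an assumption)."""
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Category", "Value"])
        for category, value in data.items():
            # Join list-valued fields (e.g. links) into a single cell
            if isinstance(value, list):
                value = "; ".join(map(str, value))
            writer.writerow([category, value])
```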
The console output provides:
- A summary of key findings from the analysis
- Recommendations based on the findings
- An actionable plan for website improvement
## Limitations

- The script performs basic scraping and analysis; more complex websites may require additional customization.
- The keyword analysis is based on simple frequency counting (see the sketch below) and may not account for context or importance.
- The script does not handle JavaScript-rendered content, since `requests` only retrieves the static HTML.
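Frequency counting of this kind can be sketched with `collections.Counter`; the token pattern and stop-word list here are illustrative assumptions:

```python
import re
from collections import Counter

def top_keywords(text, n=10):
    """Count word frequencies; context and importance are ignored."""
    # Tokenize on lowercase word characters
    words = re.findall(r"[a-z0-9']+", text.lower())
    # A tiny illustrative stop-word list; a real script would use a larger one
    stop_words = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "for"}
    counts = Counter(w for w in words if w not in stop_words and len(w) > 2)
    return counts.most_common(n)
```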
## Legal Disclaimer

Ensure you have the right to scrape and analyze the target website. Always review and comply with the website's robots.txt file and terms of service.
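Python's built-in `urllib.robotparser` offers a quick way to check robots.txt before scraping; the wildcard user agent below is an assumption:

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

def is_allowed(url, user_agent="*"):
    """Return True if the site's robots.txt permits fetching the URL."""
    parser = RobotFileParser()
    parser.set_url(urljoin(url, "/robots.txt"))
    parser.read()
    return parser.can_fetch(user_agent, url)
```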
## Contributing

Contributions, issues, and feature requests are welcome. Feel free to check the [issues page] if you want to contribute.