HTTParser

Overview

HTTParser is an open-source Python library designed for parsing web content using various HTTP methods. It allows for both static and dynamic content extraction, making it a versatile tool for web scraping and data retrieval tasks.

It suits web scraping, API testing, and other applications that need to fetch and parse HTTP responses. Its modular design makes it straightforward to extend or modify for specific needs or additional content types.

Key Features

  • Supports GET and POST methods.
  • Handles multiple response formats: JSON, HTML, and JavaScript-rendered content.
  • Customizable request headers, parameters, and payload.
  • Option to parse dynamic content using Selenium WebDriver.
  • Simple and intuitive interface for making HTTP requests.

Prerequisites

  • Python 3.x

Dependencies

The following Python packages are required:

  • requests: For making HTTP requests.
  • beautifulsoup4: For parsing HTML content.

The following Python packages are optional:

  • selenium: For loading dynamic (JavaScript-rendered) content.
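
Before running the examples, it can help to confirm which of these packages are importable in the current environment. The following is a minimal sketch using only the standard library; the helper name `check_dependencies` is hypothetical and not part of HTTParser. Note that beautifulsoup4 is imported under the module name `bs4`.

```python
from importlib.util import find_spec

# Package name -> import name (beautifulsoup4 is imported as "bs4").
REQUIRED = {"requests": "requests", "beautifulsoup4": "bs4"}
OPTIONAL = {"selenium": "selenium"}


def check_dependencies(packages):
    """Hypothetical helper: map each package name to True if it can be imported."""
    return {name: find_spec(module) is not None for name, module in packages.items()}


if __name__ == "__main__":
    for name, ok in check_dependencies(REQUIRED).items():
        print(f"{name}: {'installed' if ok else 'MISSING (required)'}")
    for name, ok in check_dependencies(OPTIONAL).items():
        print(f"{name}: {'installed' if ok else 'not installed (optional)'}")
```

A missing optional package only matters if you plan to use the "js" response format.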

Installation

To install HTTParser, clone the repository and install dependencies:

git clone https://github.com/RMNCLDYO/HTTParser.git
cd HTTParser
pip install -r requirements.txt

Available Variables

  • url: URL of the page to be parsed. (required)
  • method: HTTP method, options: "get" or "post". (required)
  • response_format: Response format, options: "js", "json", or "html". (required)
  • headers: Custom HTTP headers, format: { "header_name": "header_value" }. (optional)
  • params: URL parameters, format: { "param_name": "param_value" }. (optional)
  • payload: Data payload for POST requests, format: { "payload_name": "payload_value" }. (optional)
  • browser_path: Path to the web browser, used for JavaScript rendering. (optional)
  • chromedriver_path: Path to ChromeDriver, used for JavaScript rendering. (optional)
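
As an illustration of how these variables fit together, the sketch below checks the documented options before a request is built. `build_request_kwargs` is a hypothetical helper, not part of HTTParser, and the `https://httpbin.org/get` URL, header, and parameter values are only examples. The final call is left as a comment because it needs HTTParser installed and network access.

```python
ALLOWED_METHODS = {"get", "post"}
ALLOWED_FORMATS = {"js", "json", "html"}
OPTIONAL_NAMES = {"headers", "params", "payload", "browser_path", "chromedriver_path"}


def build_request_kwargs(url, method, response_format, **optional):
    """Hypothetical helper: validate the documented options and drop unset values."""
    if method not in ALLOWED_METHODS:
        raise ValueError(f"method must be one of {sorted(ALLOWED_METHODS)}")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"response_format must be one of {sorted(ALLOWED_FORMATS)}")
    unknown = set(optional) - OPTIONAL_NAMES
    if unknown:
        raise TypeError(f"unknown variables: {sorted(unknown)}")
    kwargs = {"url": url, "method": method, "response_format": response_format}
    # Keep only the optional variables that were actually provided.
    kwargs.update({key: value for key, value in optional.items() if value is not None})
    return kwargs


kwargs = build_request_kwargs(
    url="https://httpbin.org/get",
    method="get",
    response_format="json",
    headers={"User-Agent": "HTTParser-example"},
    params={"q": "test"},
)
print(kwargs)

# With HTTParser installed, the arguments can be passed straight through:
# from httparser import HTTParser
# print(HTTParser(**kwargs).response())
```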

Usage

HTML Usage

GET Method

from httparser import HTTParser

request = HTTParser(
    url="https://httpbin.org/html",
    method="get",
    response_format="html"
)

response = request.response()
print(response)

JSON Usage

GET Method

from httparser import HTTParser

request = HTTParser(
    url="https://httpbin.org/json",
    method="get",
    response_format="json"
)

response = request.response()
print(response)

POST Method

from httparser import HTTParser

request = HTTParser(
    url="https://httpbin.org/anything",
    method="post",
    response_format="json",
    payload={"HTTParser":"Example Payload"}
)

response = request.response()
print(response)

Dynamic (JS) Usage

GET Method

from httparser import HTTParser

request = HTTParser(
    url="https://httpbin.org/delay/3",
    method="get",
    response_format="js",
    browser_path="/path/to/browser",
    chromedriver_path="/path/to/chromedriver"
)

response = request.response()
print(response)
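
The `/path/to/browser` and `/path/to/chromedriver` placeholders above need real locations on your machine. One way to find them is to search the PATH with `shutil.which`, as in this minimal sketch. The executable names listed are assumptions that vary by platform and browser (on macOS and Windows the browser is often not on the PATH at all), and `locate_js_paths` is a hypothetical helper, not part of HTTParser.

```python
import shutil

# Assumed executable names; adjust these for your platform and browser.
BROWSER_CANDIDATES = ("google-chrome", "chromium", "chromium-browser", "brave-browser")
DRIVER_CANDIDATES = ("chromedriver",)


def first_on_path(candidates):
    """Return the full path of the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def locate_js_paths():
    """Hypothetical helper: build the two path arguments, or None if either is missing."""
    browser = first_on_path(BROWSER_CANDIDATES)
    driver = first_on_path(DRIVER_CANDIDATES)
    if browser is None or driver is None:
        return None
    return {"browser_path": browser, "chromedriver_path": driver}


paths = locate_js_paths()
if paths is None:
    print("Browser or ChromeDriver not found on PATH; pass the paths explicitly.")
else:
    print(paths)
```

If both paths are found, the resulting dictionary can be unpacked into the `HTTParser(...)` call alongside `url`, `method`, and `response_format`.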

Dynamic Content Rendering with JavaScript (optional)

Installation

pip install selenium

Setting Up ChromeDriver and a Browser

To ensure HTTParser works effectively, especially for content that requires JavaScript rendering, you'll need to download and set up ChromeDriver and a compatible browser.

Choosing a Compatible Browser

While ChromeDriver is designed for Chrome, you can also use it with other Chromium-based browsers. Here are some options:

  • Google Chrome
  • Brave Browser
  • Opera Browser

Visit Supported WebDrivers to explore other Chromium-based browsers.

Downloading ChromeDriver

  1. Visit the ChromeDriver Downloads page.
  2. Choose the ChromeDriver version that matches your browser's version. To check your browser version, navigate to 'Help > About' in your browser.
  3. Download the appropriate ChromeDriver for your operating system (Windows, Mac, or Linux).
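
The version check in step 2 can be sketched as a small comparison. This example assumes that agreement on the major version (the first number) is sufficient, which is a common rule of thumb rather than a guarantee; consult the ChromeDriver documentation for the exact requirement for your release. The function names and version strings are illustrative.

```python
def major_version(version):
    """Return the leading integer of a dotted version string such as '120.0.6099.109'."""
    return int(version.strip().split(".")[0])


def versions_compatible(browser_version, driver_version):
    """Treat the pair as compatible when the major versions are equal (an assumption)."""
    return major_version(browser_version) == major_version(driver_version)


print(versions_compatible("120.0.6099.109", "120.0.6099.71"))   # True
print(versions_compatible("120.0.6099.109", "119.0.6045.105"))  # False
```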

Installing ChromeDriver

Follow the detailed instructions on the ChromeDriver Getting Started page for your specific operating system.

Error Handling

HTTParser logs errors to Error.log. Check this file for error details.
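
To inspect the most recent entries without opening the whole file, something like the following sketch may help. It assumes Error.log is a plain-text file in the current working directory; the log's line format is not documented here, so the lines are printed as-is. `tail_log` is a hypothetical helper, not part of HTTParser.

```python
from collections import deque
from pathlib import Path


def tail_log(path="Error.log", lines=10):
    """Return up to the last `lines` lines of the log, or an empty list if it is missing."""
    log_file = Path(path)
    if not log_file.is_file():
        return []
    with log_file.open(encoding="utf-8", errors="replace") as handle:
        # deque with maxlen keeps only the final `lines` entries while reading.
        return [line.rstrip("\n") for line in deque(handle, maxlen=lines)]


for entry in tail_log():
    print(entry)
```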

Contributing

Contributions are welcome!

Please refer to CONTRIBUTING.md for detailed guidelines on how to contribute to this project.

Reporting Issues

Encountered a bug? We'd love to hear about it. Please follow these steps to report any issues:

  1. Check if the issue has already been reported.
  2. Use the Bug Report template to create a detailed report.
  3. Submit the report here.

Your report will help us make the project better for everyone.

Feature Requests

Got an idea for a new feature? Feel free to suggest it. Here's how:

  1. Check if the feature has already been suggested or implemented.
  2. Use the Feature Request template to create a detailed request.
  3. Submit the request here.

Your suggestions for improvements are always welcome.

Versioning and Changelog

Stay up-to-date with the latest changes and improvements in each version:

  • CHANGELOG.md provides detailed descriptions of each release.

Security

Your security is important to us. If you discover a security vulnerability, please follow our responsible disclosure guidelines found in SECURITY.md. Please do not disclose a vulnerability publicly until it has been reported and addressed.

License

Licensed under the MIT License. See LICENSE for details.
