HTTParser is an open-source Python library for fetching and parsing web content over HTTP GET and POST requests. It extracts both static and dynamic (JavaScript-rendered) content, which makes it suited to web scraping and data retrieval tasks.
This tool is valuable for anyone working with web scraping, API testing, or any application requiring advanced HTTP response handling and parsing. Its modular design allows for easy extension or modification to suit specific needs or handle various web content types.
- Supports GET and POST methods.
- Handles multiple response formats: JSON, HTML, JavaScript.
- Customizable request headers, parameters, and payload.
- Option to parse dynamic content using Selenium WebDriver.
- Simple and intuitive interface for making HTTP requests.
Python 3.x
The following Python packages are required:
- requests: For making HTTP requests.
- beautifulsoup4: Library for parsing results.
The following Python packages are optional:
- selenium: Library for loading dynamic content.
To install HTTParser, clone the repository and install dependencies:
git clone https://github.com/RMNCLDYO/HTTParser.git
cd HTTParser
pip install -r requirements.txt
- url: URL of the page to be parsed. (REQUIRED)
- method: HTTP method, options: "get" or "post". (REQUIRED)
- response_format: Response format, options: "js", "json", or "html". (REQUIRED)
- headers: Custom HTTP headers, format: { "header_name": "header_value" }. (OPTIONAL)
- params: URL parameters, format: { "param_name": "param_value" }. (OPTIONAL)
- payload: Data payload for POST requests, format: { "payload_name": "payload_value" }. (OPTIONAL)
- browser_path: Path to the web browser, used for JavaScript rendering. (OPTIONAL)
- chromedriver_path: Path to ChromeDriver, used for JavaScript rendering. (OPTIONAL)
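To make the roles of headers, params, and payload concrete, here is a minimal sketch of how arguments like these typically map onto an HTTP request. It uses only the standard library and builds the request without sending it. The `build_request` helper is hypothetical and is not part of HTTParser; the form-encoded payload is an assumption, since how HTTParser encodes the body is not stated here.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(url, method, headers=None, params=None, payload=None):
    """Build (but do not send) a request from HTTParser-style arguments."""
    method = method.lower()
    if method not in ("get", "post"):
        raise ValueError(f"unsupported method: {method!r}")
    # URL parameters are appended to the URL as a query string.
    if params:
        separator = "&" if "?" in url else "?"
        url = f"{url}{separator}{urlencode(params)}"
    # Assumption: the payload is sent form-encoded and only applies to POST.
    data = None
    if payload and method == "post":
        data = urlencode(payload).encode("utf-8")
    return Request(url, data=data, headers=headers or {}, method=method.upper())


req = build_request(
    url="https://httpbin.org/anything",
    method="post",
    headers={"User-Agent": "HTTParser-example"},
    params={"page": "1"},
    payload={"HTTParser": "Example Payload"},
)
print(req.get_method(), req.full_url)
print(req.data)
```

The header value and the `page` parameter are illustrative only. The point is that params end up in the URL, while the payload travels in the request body.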
GET Method (HTML response)
from httparser import HTTParser

# Fetch the page with GET and parse the response as HTML.
request = HTTParser(
    url="https://httpbin.org/html",
    method="get",
    response_format="html"
)
response = request.response()
print(response)
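Once the HTML is retrieved, a common next step is to pull specific elements out of it. The sketch below extracts `<h1>` text using the standard library's `html.parser` rather than beautifulsoup4, purely to stay dependency-free. It assumes you have the markup as a plain string; how HTTParser exposes the parsed HTML is not specified here. `HeadingCollector`, `extract_headings`, and the sample markup are all illustrative.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect the text of every <h1> element."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        # Only accumulate text that appears inside an open <h1>.
        if self._in_h1:
            self.headings[-1] += data


def extract_headings(markup):
    collector = HeadingCollector()
    collector.feed(markup)
    collector.close()
    return [heading.strip() for heading in collector.headings]


# Illustrative markup, not the actual httpbin.org/html response.
sample_html = "<html><body><h1>Example Heading</h1><p>Body text.</p></body></html>"
print(extract_headings(sample_html))
```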
GET Method (JSON response)
from httparser import HTTParser

# Fetch the resource with GET and parse the response as JSON.
request = HTTParser(
    url="https://httpbin.org/json",
    method="get",
    response_format="json"
)
response = request.response()
print(response)
POST Method (JSON response)
from httparser import HTTParser

# Send the payload with POST and parse the response as JSON.
request = HTTParser(
    url="https://httpbin.org/anything",
    method="post",
    response_format="json",
    payload={"HTTParser": "Example Payload"}
)
response = request.response()
print(response)
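The httpbin `/anything` endpoint echoes the request back, which makes it handy for checking that a payload arrived. The sketch below reads the echoed payload from such a response. It assumes the response is either a decoded dict or a raw JSON string, since the return type of `response()` is not stated here. `echoed_payload` is a hypothetical helper, and the sample string is a trimmed illustration of httpbin's response shape, not real output.

```python
import json


def echoed_payload(response):
    """Return the payload echoed by an httpbin-style response, or {}."""
    # Accept either an already-decoded dict or a raw JSON string.
    data = json.loads(response) if isinstance(response, str) else response
    # httpbin reports form-encoded bodies under "form" and JSON bodies under "json".
    return data.get("form") or data.get("json") or {}


sample = '{"form": {"HTTParser": "Example Payload"}, "json": null, "method": "POST"}'
print(echoed_payload(sample))
```

Checking both keys avoids depending on whether the body was sent form-encoded or as JSON.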
GET Method (JavaScript-rendered content)
from httparser import HTTParser

# Replace the placeholder paths with the locations of your browser and ChromeDriver.
request = HTTParser(
    url="https://httpbin.org/delay/3",
    method="get",
    response_format="js",
    browser_path="/path/to/browser",
    chromedriver_path="/path/to/chromedriver"
)
response = request.response()
print(response)
pip install selenium
To ensure HTTParser works effectively, especially for content that requires JavaScript rendering, you'll need to download and set up ChromeDriver and a compatible Chromium-based browser.
While ChromeDriver is designed for Chrome, you can also use it with other Chromium-based browsers. Here are some options:
- Google Chrome
- Brave Browser
- Opera Browser
Visit Supported WebDrivers to explore other Chromium-based browsers.
- Visit ChromeDriver Downloads to download the latest ChromeDriver.
- Choose the version that matches your browser's version. To check your browser version, navigate to 'Help > About' in your browser.
- Download the appropriate ChromeDriver for your operating system (Windows, Mac, or Linux).
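The version-matching step above can be checked programmatically. The sketch below compares the major version numbers of two version strings, such as the one shown on the browser's About page and the output of `chromedriver --version`. It assumes that agreement on the major version is the relevant check. The helper names and the version strings are illustrative.

```python
import re


def major_version(version_text):
    """Return the major version number found in a version string."""
    match = re.search(r"(\d+)\.\d+", version_text)
    if match is None:
        raise ValueError(f"no version number in {version_text!r}")
    return int(match.group(1))


def versions_match(browser_version, driver_version):
    """True when browser and driver share the same major version."""
    return major_version(browser_version) == major_version(driver_version)


# Illustrative version strings; substitute the ones reported on your machine.
print(versions_match("Google Chrome 120.0.6099.109", "ChromeDriver 120.0.6099.71"))
```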
Follow the detailed instructions on the ChromeDriver Getting Started page for your specific operating system.
HTTParser logs errors in Error.log. Check this file for error details.
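For a quick look at recent errors, a small helper can print the last few lines of the log. This is a sketch that assumes Error.log is a plain-text file in the current working directory; neither its location nor its line format is specified here. `tail_log` is a hypothetical helper, not part of HTTParser.

```python
from collections import deque
from pathlib import Path


def tail_log(path="Error.log", max_lines=20):
    """Return up to max_lines trailing lines of the log, or [] if it is missing."""
    log_file = Path(path)
    if not log_file.exists():
        return []
    with log_file.open(encoding="utf-8", errors="replace") as handle:
        # deque with maxlen keeps only the most recent lines while reading.
        return [line.rstrip("\n") for line in deque(handle, maxlen=max_lines)]


for line in tail_log():
    print(line)
```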
Contributions are welcome!
Please refer to CONTRIBUTING.md for detailed guidelines on how to contribute to this project.
Encountered a bug? We'd love to hear about it. Please follow these steps to report any issues:
- Check if the issue has already been reported.
- Use the Bug Report template to create a detailed report.
- Submit the report here.
Your report will help us make the project better for everyone.
Got an idea for a new feature? Feel free to suggest it. Here's how:
- Check if the feature has already been suggested or implemented.
- Use the Feature Request template to create a detailed request.
- Submit the request here.
Your suggestions for improvements are always welcome.
Stay up-to-date with the latest changes and improvements in each version:
- CHANGELOG.md provides detailed descriptions of each release.
Your security is important to us. If you discover a security vulnerability, please follow our responsible disclosure guidelines found in SECURITY.md. Please refrain from disclosing any vulnerabilities publicly until said vulnerability has been reported and addressed.
Licensed under the MIT License. See LICENSE for details.