Den K's Web Module contains a set of scripts for web operations: downloading files, fetching page content (static and via Playwright), URL parsing, and SSL-aware HTTP requests.
To get a local copy up and running, follow these simple steps:
- Install Python.
- Install the library using pip:
pip install dkwebmod
Core module for web operations:
- `download`: download a file from a URL to a local directory, with SSL fallback (certifi, then system CA), progress output, and overwrite control.
- `download_and_extract_file`: download an archive and extract it in one step.
- `get_page_bytes`: fetch raw page content as bytes, with optional user-agent spoofing.
- `get_page_content`: fetch page content using `urllib` (static pages) or Playwright (dynamic/JS pages), with output as HTML, text, PDF, PNG, or JPEG.
- `is_status_ok`: check whether an HTTP status code is 200.
- `get_filename_from_url`: extract the filename from a URL.
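The SSL fallback described for `download` can be sketched with the standard library alone. This is a minimal sketch under assumptions, not the module's implementation: `download_sketch`, `default_ssl_contexts`, `is_ssl_failure`, and the `contexts`/`opener` parameters are hypothetical names, progress output is omitted, and `certifi` is treated as optional. It tries each SSL context in order, moves on only when the failure is SSL-related, and skips the download when the file already exists and `overwrite` is false.

```python
import ssl
import urllib.error
import urllib.request
from pathlib import Path
from urllib.parse import unquote, urlparse


def default_ssl_contexts():
    """Contexts to try in order: certifi's CA bundle (if installed), then the system CA store."""
    contexts = []
    try:
        import certifi  # optional third-party CA bundle
        contexts.append(ssl.create_default_context(cafile=certifi.where()))
    except ImportError:
        pass  # certifi missing: fall through to system CAs only
    contexts.append(ssl.create_default_context())
    return contexts


def is_ssl_failure(exc):
    """urlopen usually wraps certificate errors in URLError, so check both forms."""
    if isinstance(exc, ssl.SSLError):
        return True
    return isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, ssl.SSLError)


def download_sketch(url, target_dir, overwrite=False, contexts=None,
                    opener=urllib.request.urlopen):
    """Hypothetical: save url into target_dir, moving to the next SSL context on SSL errors."""
    name = Path(unquote(urlparse(url).path)).name or "index.html"  # assumed default name
    dest = Path(target_dir) / name
    if dest.exists() and not overwrite:
        return dest  # overwrite control: keep the existing file untouched
    if contexts is None:
        contexts = default_ssl_contexts()
    last_error = None
    for ctx in contexts:
        try:
            with opener(url, context=ctx) as resp:
                data = resp.read()
        except OSError as exc:
            if not is_ssl_failure(exc):
                raise  # DNS, HTTP, timeout errors etc. are not retried
            last_error = exc
            continue
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(data)
        return dest
    raise last_error or RuntimeError("no SSL context available")
```

The `opener` parameter exists only so the logic can be exercised without network access; in normal use the default `urllib.request.urlopen` is used.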
Wrapper around the GitHub API for downloading repositories, releases, and querying commits:
`GitHubWrapper` class: initialize it with user/repo or a repo URL, then use:
- `download_and_extract_branch`: download and extract a branch (or a specific path within it).
- `download_file` / `download_directory`: download individual files or entire directories from a repo.
- `download_latest_release` / `download_and_extract_latest_release`: download the latest release asset matching a glob pattern.
- `get_latest_release_json` / `get_latest_release_version` / `get_latest_release_url`: query release metadata.
- `get_releases_json`: list releases with optional pattern filtering and pagination.
- `get_latest_commit` / `get_latest_commit_message`: retrieve the latest commit data or message for a branch/path.
- `list_files`: list files in the repo (with glob pattern and recursive options).
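A wrapper like this typically resolves owner and repo from a URL, builds GitHub REST API endpoints, and filters release assets by a glob. The sketch below shows that pattern with the standard library; it is an illustration, not `GitHubWrapper`'s code. The helper names are hypothetical. The endpoints (`/repos/{owner}/{repo}/releases/latest`, `/repos/{owner}/{repo}/commits` with `sha`, `path`, `per_page`), the `Bearer` token header, and the `assets[].name` / `browser_download_url` / `commit.message` fields are from the public GitHub API. Glob matching here uses `fnmatch.fnmatchcase`, which is an assumption about how patterns are interpreted.

```python
import fnmatch
from urllib.parse import urlencode, urlparse

API = "https://api.github.com"


def parse_repo_url(repo_url):
    """Return (owner, repo) from e.g. https://github.com/user/repo or .../repo.git."""
    parts = [p for p in urlparse(repo_url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"not a repository URL: {repo_url!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[:-4]
    return owner, repo


def api_headers(pat=None):
    """Headers for GitHub API calls; a personal access token is optional."""
    headers = {"Accept": "application/vnd.github+json"}
    if pat:
        headers["Authorization"] = f"Bearer {pat}"
    return headers


def latest_release_endpoint(owner, repo):
    return f"{API}/repos/{owner}/{repo}/releases/latest"


def latest_commit_endpoint(owner, repo, branch, path=None):
    """Commits list limited to one entry: the newest commit on branch (optionally touching path)."""
    query = {"sha": branch, "per_page": 1}
    if path:
        query["path"] = path
    return f"{API}/repos/{owner}/{repo}/commits?{urlencode(query)}"


def pick_asset_url(release_json, pattern):
    """Download URL of the first release asset whose name matches the glob pattern, else None."""
    for asset in release_json.get("assets", []):
        if fnmatch.fnmatchcase(asset["name"], pattern):
            return asset["browser_download_url"]
    return None


def latest_commit_message(commits_json):
    """Message of the first commit in a /commits response, or None if the list is empty."""
    return commits_json[0]["commit"]["message"] if commits_json else None
```

Fetching the JSON itself (for example with `urllib.request.urlopen` and `json.load`) is left out so the helpers stay testable offline.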
The githubw module can be executed directly:
python -m dkwebmod.githubw -u https://github.com/user/repo -b main [options]

| Flag | Description |
|---|---|
| `-u`, `--repo_url` | Repository URL (required) |
| `-b`, `--branch` | Branch name (required) |
| `-p`, `--path` | Path to a file/folder inside the repo |
| `-t`, `--target_directory` | Local directory to download to |
| `--pat` | Personal access token |
| `-glcm` | Print the latest commit message |
| `-glcj` | Print the latest commit JSON |
| `-db` | Download the branch (or only the path, if `-p` is set) |
Examples:
:: Get the latest commit message for a specific path
python -m dkwebmod.githubw -u https://github.com/user/repo -b main -p src/config.json -glcm
:: Download a branch to a local directory
python -m dkwebmod.githubw -u https://github.com/user/repo -b main -t C:\Downloads\repo -db
:: Download only a specific folder from the branch
python -m dkwebmod.githubw -u https://github.com/user/repo -b main -p docs -t C:\Downloads\docs -db

URL parsing and validation utilities:
- `url_parser`: parse a URL into its components (scheme, netloc, path, directories, queries, file).
- `is_valid_url`: check whether a string is a valid URL.
- `find_urls_in_text`: extract all URLs from a block of text.
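To show what "components" means here, the following is a stdlib sketch of the same three ideas. It is not the module's code. `url_parts`, `looks_like_valid_url`, and `urls_in_text` are hypothetical names, and the returned dictionary only mirrors the component names listed above; the real return type may differ. Two heuristics are assumptions: the last path segment counts as a file only if it contains a dot, and a URL is "valid" if it has an http/https/ftp scheme and a host.

```python
import re
from urllib.parse import parse_qs, urlparse


def url_parts(url):
    """Split a URL into scheme, netloc, path, directories, queries, and file."""
    p = urlparse(url)
    segments = [s for s in p.path.split("/") if s]
    # Assumption: a trailing segment with a dot (and no trailing slash) is a file name.
    has_file = bool(segments) and not p.path.endswith("/") and "." in segments[-1]
    return {
        "scheme": p.scheme,
        "netloc": p.netloc,
        "path": p.path,
        "directories": segments[:-1] if has_file else segments,
        "queries": parse_qs(p.query),  # values are lists: ?y=2&y=3 -> {"y": ["2", "3"]}
        "file": segments[-1] if has_file else None,
    }


def looks_like_valid_url(text):
    """Heuristic validity check: known scheme plus a non-empty host."""
    p = urlparse(text)
    return p.scheme in ("http", "https", "ftp") and bool(p.netloc)


URL_RE = re.compile(r"https?://[^\s<>\"']+")


def urls_in_text(text):
    """Find http(s) URLs, dropping trailing punctuation picked up from the sentence."""
    return [m.rstrip(".,;:!?)") for m in URL_RE.findall(text)]
```

Stripping a trailing `)` keeps URLs inside parentheses clean, at the cost of truncating the rare URL that genuinely ends with one.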
A dictionary of common browser user-agent strings for use with web requests.
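A user-agent string is sent as the `User-Agent` request header, which is presumably how the optional spoofing in `get_page_bytes` uses this dictionary. The sketch below shows that with `urllib.request.Request`; `USER_AGENTS`, the `"firefox_windows"` key, and `build_request` are hypothetical stand-ins, and the Firefox string is only an example value.

```python
import urllib.request

# Hypothetical stand-in for the module's dictionary; actual names and strings may differ.
USER_AGENTS = {
    "firefox_windows": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) "
        "Gecko/20100101 Firefox/128.0"
    ),
}


def build_request(url, agent_key=None):
    """Return a urllib Request, adding a browser User-Agent when agent_key is given."""
    headers = {}
    if agent_key is not None:
        headers["User-Agent"] = USER_AGENTS[agent_key]
    return urllib.request.Request(url, headers=headers)


# Usage (performs a network request, so it is left commented out):
# with urllib.request.urlopen(build_request("https://example.com", "firefox_windows")) as resp:
#     html_bytes = resp.read()
```

Note that `urllib` normalizes stored header names with `str.capitalize()`, so the header is looked up afterwards as `"User-agent"`.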
This module was built in the early stages of the project, mainly for reference, and is not actively used or maintained. However, if you're interested in Playwright, it contains some useful examples of browser automation, element interaction, waiting strategies, and more.
If you still want to use it, you will need to install the following dependencies:
- `beautifulsoup4`: HTML parsing library.
- `playwright`: Python bindings for Playwright.
- `pillow`: Python imaging library.
- Playwright browsers: the actual browser binaries used by Playwright.
pip install beautifulsoup4==4.14.3
pip install playwright==1.56.0
pip install pillow==12.2.0
playwright install

Note: You can use newer versions of these packages, but they have not been tested with this project.
Distributed under the MIT License. See LICENSE.txt for more information.