denis-kras/dkwebmod

Den K's Web Module - dkwebmod

About The Project

Den K's Web Module contains a set of scripts for web operations: downloading files, fetching page content (static and via Playwright), URL parsing, and SSL-aware HTTP requests.

Getting Started

To get a local copy up and running, follow these steps.

Installation

  1. Install Python.
  2. Install the library using pip:
    pip install dkwebmod

Modules

web

Core module for web operations:

  • download — download a file from a URL to a local directory, with SSL fallback (certifi then system CA), progress output, and overwrite control.
  • download_and_extract_file — download an archive and extract it in one step.
  • get_page_bytes — fetch raw page content as bytes, with optional user-agent spoofing.
  • get_page_content — fetch page content using urllib (static pages) or Playwright (dynamic/JS pages), with output as HTML, text, PDF, PNG, or JPEG.
  • is_status_ok — check whether an HTTP status code is 200.
  • get_filename_from_url — extract the filename from a URL.

githubw — GitHub Wrapper

Wrapper around the GitHub API for downloading repositories, releases, and querying commits:

  • GitHubWrapper class — initialize with user/repo or a repo URL, then:
    • download_and_extract_branch — download and extract a branch (or a specific path within it).
    • download_file / download_directory — download individual files or entire directories from a repo.
    • download_latest_release / download_and_extract_latest_release — download the latest release asset matching a glob pattern.
    • get_latest_release_json / get_latest_release_version / get_latest_release_url — query release metadata.
    • get_releases_json — list releases with optional pattern filtering and pagination.
    • get_latest_commit / get_latest_commit_message — retrieve the latest commit data or message for a branch/path.
    • list_files — list files in the repo (with glob pattern and recursive options).
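
To make the release lookup concrete, the sketch below shows how a repo URL can be split into user and repo, how the GitHub REST endpoint for the latest release is formed, and how an asset can be selected by glob pattern from the returned JSON. The function names are hypothetical and are not the `GitHubWrapper` methods; only the endpoint path and the `assets`, `name`, and `browser_download_url` fields come from the GitHub REST API.

```python
import fnmatch
import re

API_ROOT = "https://api.github.com"


def split_repo_url(repo_url):
    """Extract (user, repo) from a github.com repository URL."""
    match = re.match(r"https?://github\.com/([^/]+)/([^/#?]+)", repo_url)
    if match is None:
        raise ValueError(f"Not a GitHub repository URL: {repo_url}")
    user, repo = match.groups()
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return user, repo


def latest_release_endpoint(user, repo):
    """Build the REST API URL that returns the latest release as JSON."""
    return f"{API_ROOT}/repos/{user}/{repo}/releases/latest"


def pick_asset_url(release, pattern):
    """Return the download URL of the first asset whose name matches the glob."""
    for asset in release.get("assets", []):
        if fnmatch.fnmatchcase(asset["name"], pattern):
            return asset["browser_download_url"]
    return None
```

Given the parsed JSON of a release, `pick_asset_url(release, "*win64*.zip")` returns the first matching asset URL, or `None` when nothing matches.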

Running from the command line

The githubw module can be executed directly:

python -m dkwebmod.githubw -u https://github.com/user/repo -b main [options]

Flag                      Description
-u, --repo_url            Repository URL (required)
-b, --branch              Branch name (required)
-p, --path                Path to a file/folder inside the repo
-t, --target_directory    Local directory to download to
--pat                     Personal access token
-glcm                     Print the latest commit message
-glcj                     Print the latest commit JSON
-db                       Download the branch (or path if -p is set)

Examples:

:: Get the latest commit message for a specific path
python -m dkwebmod.githubw -u https://github.com/user/repo -b main -p src/config.json -glcm

:: Download a branch to a local directory
python -m dkwebmod.githubw -u https://github.com/user/repo -b main -t C:\Downloads\repo -db

:: Download only a specific folder from the branch
python -m dkwebmod.githubw -u https://github.com/user/repo -b main -p docs -t C:\Downloads\docs -db
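
For readers who want to see how such a flag set fits together, here is one way to declare it with `argparse`. This is a hypothetical reconstruction from the flag list above, not the module's actual parser. Defaults, help texts, and the function name `build_parser` are assumptions.

```python
import argparse


def build_parser():
    """Declare the documented flags; defaults and help texts are assumptions."""
    parser = argparse.ArgumentParser(prog="python -m dkwebmod.githubw")
    parser.add_argument("-u", "--repo_url", required=True, help="Repository URL")
    parser.add_argument("-b", "--branch", required=True, help="Branch name")
    parser.add_argument("-p", "--path", default=None,
                        help="Path to a file/folder inside the repo")
    parser.add_argument("-t", "--target_directory", default=None,
                        help="Local directory to download to")
    parser.add_argument("--pat", default=None, help="Personal access token")
    # Single-dash, multi-letter switches are matched by exact option string.
    parser.add_argument("-glcm", action="store_true",
                        help="Print the latest commit message")
    parser.add_argument("-glcj", action="store_true",
                        help="Print the latest commit JSON")
    parser.add_argument("-db", action="store_true",
                        help="Download the branch (or path if -p is set)")
    return parser
```

Parsing `["-u", "https://github.com/user/repo", "-b", "main", "-p", "docs", "-db"]` with this parser sets `db` to true and `path` to `docs`, which corresponds to the third example above.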

urls

URL parsing and validation utilities:

  • url_parser — parse a URL into its components (scheme, netloc, path, directories, queries, file).
  • is_valid_url — check whether a string is a valid URL.
  • find_urls_in_text — extract all URLs from a block of text.
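
The three utilities above can be approximated with `urllib.parse` and a regular expression. The sketch below is an assumption about plausible behavior, not the module's code: the names `parse_url`, `looks_like_url`, and `find_urls` are hypothetical, a last path segment counts as a file only when it contains a dot, and only http and https URLs are recognized.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Matches http(s) URLs up to the next whitespace, quote, or angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def parse_url(url):
    """Split a URL into scheme, netloc, path, directories, queries, and file."""
    parts = urlsplit(url)
    segments = [segment for segment in parts.path.split("/") if segment]
    # Treat the last segment as a file only if it has an extension-like dot.
    file_name = segments.pop() if segments and "." in segments[-1] else ""
    return {
        "scheme": parts.scheme,
        "netloc": parts.netloc,
        "path": parts.path,
        "directories": segments,
        "queries": parse_qs(parts.query),
        "file": file_name,
    }


def looks_like_url(text):
    """Accept strings that have an http(s) scheme and a non-empty host part."""
    try:
        parts = urlsplit(text)
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def find_urls(text):
    """Return all http(s) URLs in the text, trimming trailing punctuation."""
    return [match.rstrip(".,;:!?)") for match in URL_PATTERN.findall(text)]
```

Note that `parse_qs` returns each query value as a list, because a key may appear more than once in a query string.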

user_agents

A dictionary of common browser user-agent strings for use with web requests.
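
A dictionary like this is typically used to set the `User-Agent` header on a request. The sketch below uses a hypothetical `USER_AGENTS` mapping with one illustrative entry. The actual keys and strings in dkwebmod may differ, and `build_request` is not part of the library.

```python
import urllib.request

# Illustrative entry only; the real dictionary's keys and values are not shown here.
USER_AGENTS = {
    "firefox_linux": (
        "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"
    ),
}


def build_request(url, agent_key):
    """Create a request object that carries the chosen user-agent header."""
    headers = {"User-Agent": USER_AGENTS[agent_key]}
    return urllib.request.Request(url, headers=headers)
```

The returned object can be passed to `urllib.request.urlopen`; nothing is sent over the network until then.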

Playwright Wrapper Module - playwrightw

This module was built in the early stages of the project, mainly for reference, and is not actively used or maintained. If you are interested in Playwright, it still contains useful examples of browser automation, element interaction, waiting strategies, and more.

If you still want to use it, install the following dependencies:

  • beautifulsoup4 — HTML parsing library.
  • playwright — Python bindings for Playwright.
  • Playwright browsers — the actual browser binaries used by Playwright.

The commands below install these at the tested versions, along with pillow (an imaging library):

pip install beautifulsoup4==4.14.3
pip install playwright==1.56.0
pip install pillow==12.2.0
playwright install

Note: You can use newer versions of these modules, but they were not tested with this project.

License

Distributed under the MIT License. See LICENSE.txt for more information.

History

History.md
