URLcache2db

A lightweight Python library that fetches URL content and stores it in a local SQLite database for a specified amount of time. If a URL is requested again within the cache expiration window, the library returns the cached content instead of making a new HTTP request.

Features

  • Avoid redundant HTTP requests by caching responses locally.
  • Easy integration with standard SQLite.
  • Supports sending custom headers and data with requests, differentiating cache entries automatically based on request payload.
  • Cleans up expired cache entries automatically.
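
Under the hood, a cache like this can be backed by a single SQLite table keyed by request and stamped with a fetch time. The following is a minimal sketch of that idea; the table and column names here are assumptions, not URLcache2db's actual schema:

```python
import sqlite3
import time

# Hypothetical schema sketch -- the library's real table layout may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS cache (
        key        TEXT PRIMARY KEY,  -- derived from URL + headers + data
        content    BLOB,              -- raw response body
        fetched_at REAL               -- UNIX timestamp of the fetch
    )
    """
)

# Store an entry, then decide whether it is still fresh.
conn.execute(
    "INSERT OR REPLACE INTO cache (key, content, fetched_at) VALUES (?, ?, ?)",
    ("example-key", b"<html>hello</html>", time.time()),
)
conn.commit()

expiration_seconds = 3600
row = conn.execute(
    "SELECT content, fetched_at FROM cache WHERE key = ?", ("example-key",)
).fetchone()
is_fresh = row is not None and (time.time() - row[1]) < expiration_seconds
```

A hit returns the stored body; a miss or stale entry means a fresh HTTP request.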

Requirements

  • Python 3.x
  • requests

Installation

You can install the dependencies via the included requirements.txt:

pip install -r requirements.txt

Basic Usage

from urlcache import URLCache

# Initialize to cache for 1 hour (3600 seconds) in "my_cache.db"
cache = URLCache(db_path="my_cache.db", expiration_seconds=3600)

# Fetch the content. Will make an HTTP request on the first call.
content = cache.get("https://httpbin.org/get")

# Call it again. If it happens within an hour, it skips the HTTP request
# and returns the cached content from SQLite immediately.
cached_content = cache.get("https://httpbin.org/get")

Advanced Usage (Headers & Data)

The cache key is derived from the URL, headers, and data together, so changing the headers or payload produces a separate cache entry.

from urlcache import URLCache

cache = URLCache(expiration_seconds=600)

custom_headers = {
    "Authorization": "Bearer 1234567890"
}

payload_data = {
    "search": "python",
    "limit": 10
}

# First time: makes the GET request
content = cache.get("https://httpbin.org/get", headers=custom_headers, data=payload_data)

# Second time within 10 minutes: uses the cached version
cached_content = cache.get("https://httpbin.org/get", headers=custom_headers, data=payload_data)
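
One common way to build such a key is to hash a canonical serialization of the URL, headers, and data. The helper below is a sketch of that approach, not necessarily how URLCache constructs its keys:

```python
import hashlib
import json

def make_cache_key(url, headers=None, data=None):
    """Hypothetical cache-key derivation: hash the URL together with the
    headers and payload so every distinct combination caches separately."""
    canonical = json.dumps(
        {"url": url, "headers": headers or {}, "data": data or {}},
        sort_keys=True,  # stable ordering -> stable hash
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Same URL, different headers -> different keys, hence separate cache entries.
k1 = make_cache_key("https://httpbin.org/get")
k2 = make_cache_key("https://httpbin.org/get",
                    headers={"Authorization": "Bearer 1234567890"})
```

Sorting the keys before hashing ensures that logically identical requests map to the same entry regardless of dictionary ordering.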

SSL Verification & Custom User-Agent

You can bypass SSL certificate verification for self-signed or expired certificates with verify=False, and set a custom User-Agent string:

from urlcache import URLCache

cache = URLCache()

# Fetch content bypassing SSL verification and setting a specific User-Agent
content = cache.get(
    "https://expired.badssl.com/", 
    user_agent="MyCustomApp/2.0", 
    verify=False
)

Maintenance

You can periodically remove expired entries from your database using:

cache.clear_expired()
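
In SQL terms, expiring cleanup amounts to deleting rows whose fetch time falls outside the expiration window. The snippet below sketches that logic against the hypothetical schema used earlier; clear_expired() likely issues a similar DELETE, though the exact column names are assumptions:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache (key TEXT PRIMARY KEY, fetched_at REAL)")

now = time.time()
conn.execute("INSERT INTO cache VALUES (?, ?)", ("fresh", now))          # just fetched
conn.execute("INSERT INTO cache VALUES (?, ?)", ("stale", now - 7200))   # 2 hours old

# Drop every row older than the expiration window (1 hour here).
expiration_seconds = 3600
conn.execute("DELETE FROM cache WHERE fetched_at < ?", (now - expiration_seconds,))
conn.commit()

remaining = [row[0] for row in conn.execute("SELECT key FROM cache")]
```

Running this periodically (for example from a cron job or a background timer) keeps the database file from growing without bound.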
