Releases: bart-turczynski/rurl

rurl 1.2.0

20 Jun 10:41
6b999fa

Dependencies

  • punycoder (used for IDNA/Punycode encoding and decoding) is now on CRAN.
    DESCRIPTION requires punycoder (>= 1.0.0).

Behavior changes

  • The package-wide default for case_handling is now "lower_host" (was
    "keep" for safe_parse_url(), safe_parse_urls(), get_clean_url(), and
    the get_*() accessors, and "lower" for get_path()). This is the
    RFC 3986 §6.2.2.1 normalization: the case-insensitive scheme and host fold to
    lowercase while the case-sensitive path is preserved. With the previous
    defaults, hosts such as WWW.Example.COM and www.example.com did not fold
    to one identity, and get_path() silently lowercased paths (two pages that
    differ only by path casing collapsed to one). Pass case_handling = "keep"
    to restore the previous reconstruction, or "lower" to lowercase the whole
    URL including the path. (RURL-lzepdnmm)
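
The folding rule described above can be illustrated with a few lines of base R. This is a minimal sketch, not rurl's implementation: `fold_scheme_and_host()` is a hypothetical helper, and it assumes the URL carries no userinfo (`user:pass@`), which is case-sensitive and would need separate handling.

```r
# Hypothetical helper (not part of rurl): lowercase the scheme and host,
# leave the path, query, and fragment exactly as given.
# Assumption: no userinfo component is present in the authority.
fold_scheme_and_host <- function(url) {
  # Groups: 1 = optional "scheme://" or "//", 2 = "scheme:",
  #         3 = authority (up to the first /, ?, or #), 4 = the rest.
  pattern <- "^(([A-Za-z][A-Za-z0-9+.-]*:)?//)?([^/?#]*)(.*)$"
  matches <- regmatches(url, regexec(pattern, url, perl = TRUE))
  vapply(matches, function(parts) {
    if (length(parts) < 5 || anyNA(parts)) return(NA_character_)
    paste0(tolower(parts[2]), tolower(parts[4]), parts[5])
  }, character(1))
}

# Two spellings of the same host fold to one identity...
a <- fold_scheme_and_host("HTTP://WWW.Example.COM/Docs/Page")
b <- fold_scheme_and_host("http://www.example.com/Docs/Page")
# ...while paths that differ only by case stay distinct.
c1 <- fold_scheme_and_host("https://example.com/About")
c2 <- fold_scheme_and_host("https://example.com/about")
```

In rurl itself this behavior is selected with case_handling = "lower_host"; the sketch only shows which parts of the URL that setting is meant to touch.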

Available on CRAN: https://CRAN.R-project.org/package=rurl

v1

16 Feb 21:23

rurl v1 Release Notes

This is the first stable GitHub release of rurl, published as v1.
It reflects package version 1.0.0 as declared in DESCRIPTION.

What rurl Provides

rurl is a vectorized R toolkit for URL parsing, normalization, extraction, permutation, and matching.

Core capabilities:

  • Safe parsing with safe_parse_url() and safe_parse_urls()
  • URL normalization with configurable handling for:
    • protocol
    • www
    • case
    • trailing slash
    • index pages
    • path normalization
    • scheme-relative URLs
    • host encoding (IDNA/Unicode)
    • path encoding
  • URL component accessors (get_* helpers)
  • URL permutation with permute_url()
  • Dataset joins based on canonical or permuted URLs:
    • canonical_join()
    • permutation_join()
  • Built-in memoization caches with rurl_clear_caches()

Included in This Release

This v1 release includes the current 1.0.0 functionality, including:

  • Flexible normalization controls (case_handling, trailing_slash_handling, and related options)
  • URL permutation generation for robust matching workflows
  • Canonical and permutation-based URL joins
  • Improved handling of malformed schemes and schemeless URLs with ports
  • Safer parsing fallbacks and improved IPv6 parsing reliability

Installation

# install.packages("remotes")
remotes::install_github("bart-turczynski/rurl")

Notes

  • This release tag is v1.
  • The R package version for this release is 1.0.0.
  • For full historical package changes, see NEWS.md.

rurl 0.3.0

01 Jun 09:21

🚀 rurl - Release Notes (Version 0.3.0)

This release adds powerful URL cleaning features, improves parsing flexibility, and introduces utilities for comparing and joining datasets using URL permutations.


✨ New Features

  • URL Case Handling
    safe_parse_url() and get_clean_url() now support a case_handling parameter ("lower", "upper", or "keep"), allowing control over output casing.

  • Trailing Slash Control
    New trailing_slash_handling parameter lets users preserve, strip, or ignore trailing slashes for cleaner and more consistent URLs.

  • URL Permutation Utility
    Added permute_url() to generate standardized variants of a URL (altering scheme, www prefix, and trailing slash). Useful for deduplication, comparison, and joins across inconsistent URL formats.

  • Permutation-Based Joins
    Introduced permutation_join() to join two datasets by matching across all URL variants, helping align reports or datasets where URLs appear in differing forms.
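
To make the idea of URL permutations concrete, here is a rough base R sketch that expands one host and path into every combination of scheme, www prefix, and trailing slash. `sketch_permutations()` is a hypothetical name used for illustration; the variants that permute_url() actually returns, and their order, may differ.

```r
# Hypothetical helper (not rurl's permute_url()): build all combinations of
# scheme, www prefix, and trailing slash for a single host/path pair.
sketch_permutations <- function(host, path = "") {
  bare_host <- sub("^www\\.", "", tolower(host))  # drop any existing www.
  bare_path <- sub("/+$", "", path)               # drop any trailing slash
  grid <- expand.grid(
    scheme = c("http://", "https://"),
    www    = c("", "www."),
    slash  = c("", "/"),
    stringsAsFactors = FALSE
  )
  paste0(grid$scheme, grid$www, bare_host, bare_path, grid$slash)
}

variants <- sketch_permutations("WWW.Example.com", "/docs/")
# Two URLs can be treated as "the same page" when their variant sets overlap.
same_page <- "http://example.com/docs" %in% variants
```

A permutation-based join follows from this: expand the key column of one dataset into its variants, then match the other dataset's URLs against that expanded set.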


🛠️ Enhancements

  • Non-Standard Scheme Handling
    safe_parse_url() now better handles malformed schemes like htp:// when protocol_handling is configured, with improved status reporting.

🐛 Bug Fixes

  • Schemeless URLs with Ports
    Fixed incorrect NA returns for URLs like example.com:8080/path.

  • Parsing Stability
    Reinforced fallback behavior when curl::curl_parse_url() fails, ensuring safe returns without downstream errors.

rurl v0.2.0

01 May 19:31

This release adds robust support for internationalized domain names (IDNs), improves punycode handling, and ensures accurate extraction of TLDs and registered domains.

✅ Highlights

  • Accurate TLD extraction for both ASCII and Unicode domains
  • Graceful fallback when urltools is unavailable
  • NFC normalization with stringi
  • 100% test coverage with edge cases and punycode validation
  • Improved internal helpers and clearer test diagnostics

🧪 Passes all tests: 103/103

v0.1.3 — Remove psl dependency

01 May 08:31

🚀 v0.1.3 — Remove psl Dependency, Add Internal Public Suffix List

This release removes the dependency on the psl package and replaces it with an internal, CRAN-compliant solution using a locally cached copy of the Public Suffix List.

🔄 Changes

  • ✅ Replaced psl::apex_domain() with an internal .get_registered_domain() helper.
  • ✅ Added a data-raw/update_psl.R script to refresh the PSL during development.
  • ✅ Stored psl_clean as internal data in sysdata.rda.
  • ✅ Updated get_domain() to rely on the internal suffix logic.
  • ✅ Added new unit tests for edge cases (e.g., wildcard and exception rules).
  • ✅ Achieved 100% test coverage (covr::package_coverage()).
  • ✅ Removed psl from DESCRIPTION and all related references.
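
For readers unfamiliar with wildcard and exception rules, the following base R sketch shows how a registered domain can be derived from a small suffix list. It is an illustration under stated assumptions, not the package's internal .get_registered_domain(): the function name, the tiny `rules` vector, and the single-domain (non-vectorized) interface are all hypothetical.

```r
# Hypothetical sketch of Public Suffix List matching for one domain.
# rules: character vector of suffix rules, e.g. "co.uk", "*.ck", "!www.ck".
registered_domain_sketch <- function(domain, rules) {
  labels <- strsplit(tolower(domain), ".", fixed = TRUE)[[1]]
  n <- length(labels)
  suffix_len <- 1L  # implicit default rule "*": the last label is the suffix
  for (k in seq_len(n)) {
    tail_labels <- labels[(n - k + 1):n]
    candidate <- paste(tail_labels, collapse = ".")
    wildcard  <- paste(c("*", tail_labels[-1]), collapse = ".")
    if (paste0("!", candidate) %in% rules) {
      # Exception rule wins: the suffix is the candidate minus its first label.
      suffix_len <- k - 1L
      break
    }
    if (candidate %in% rules || wildcard %in% rules) {
      suffix_len <- max(suffix_len, k)  # keep the longest matching rule
    }
  }
  # A name that is itself a public suffix has no registered domain.
  if (n <= suffix_len) return(NA_character_)
  paste(labels[(n - suffix_len):n], collapse = ".")
}

rules <- c("com", "uk", "co.uk", "*.ck", "!www.ck")
registered_domain_sketch("shop.example.co.uk", rules)  # plain multi-label rule
registered_domain_sketch("foo.bar.ck", rules)          # wildcard rule
registered_domain_sketch("www.ck", rules)              # exception rule
```

The registered domain is the public suffix plus one more label, which is why the exception rule for www.ck makes that name registrable even though *.ck would otherwise mark it as a suffix.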

📦 Package Check

  • devtools::check() passes with 0 errors, 0 warnings, and 1 note (timestamp-related).
  • Fully CRAN-compliant and ready for publishing.

Reached 100% coverage, R CMD check with no errors.

01 May 00:35

Release v0.1.2: Improvements, Bug Fixes, and Enhanced Documentation

  • Bug Fixes: Fixed incorrect handling of edge cases in URL parsing, such as URLs without a scheme or with incomplete domain information.
  • Code Enhancements: Refined the handling of protocols (HTTP, HTTPS, FTP, etc.) to give more consistent results and clearer feedback on invalid or incomplete URLs.
  • Test Coverage: Added test cases that raise coverage above 80% and exercise the package across more use cases.
  • Documentation Updates: Improved function documentation and examples so the package is easier to understand and use. Fixed the @usage and @param sections in the Rd files so that every function is fully documented.
  • License File Added: Included the LICENSE file and updated metadata for proper licensing information.

This update stabilizes the package and improves its robustness, documentation, and testability.

rurl 0.1.1

30 Apr 20:45
2b6fe6c

Highlights

  • ✅ Strict scheme whitelist: http, https, ftp, ftps
  • 🚫 Rejects mailto:, file:, s3:, malformed protocols
  • 🎯 Correctly handles ftps:// as ok-ftp
  • 🧪 Expanded test coverage for edge cases
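
A scheme whitelist of this kind can be sketched in base R as follows. `has_allowed_scheme()` is a hypothetical helper written for illustration; it returns only TRUE or FALSE and does not reproduce rurl's status strings (such as ok-ftp) or its treatment of schemeless input.

```r
# Schemes named in the highlights above.
allowed_schemes <- c("http", "https", "ftp", "ftps")

# Hypothetical helper (not part of rurl): TRUE when the URL starts with a
# whitelisted scheme, FALSE otherwise (including when no scheme is present).
has_allowed_scheme <- function(url) {
  looks_like_scheme <- grepl("^[A-Za-z][A-Za-z0-9+.-]*:", url)
  scheme <- ifelse(looks_like_scheme, tolower(sub(":.*$", "", url)), NA_character_)
  !is.na(scheme) & scheme %in% allowed_schemes
}

checks <- has_allowed_scheme(c(
  "https://example.com",
  "FTPS://files.example.com",
  "mailto:someone@example.com",
  "file:///tmp/report.csv",
  "s3://bucket/key",
  "htp://example.com"
))
```

Only the first two inputs pass; the malformed htp:// scheme is rejected for the same reason as mailto:, file:, and s3:, because it is not on the list.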

rurl 0.1.0

30 Apr 19:41
2b6fe6c

  • Vectorized all accessors
  • Added clean URL handling
  • Built foundational API for future use