
rurl 0.3.0


Released by @bart-turczynski on 01 Jun at 09:21

🚀 rurl - Release Notes (Version 0.3.0)

This release adds URL cleaning options for case and trailing slashes, makes parsing more tolerant of malformed and schemeless input, and introduces utilities for comparing and joining datasets using URL permutations.


✨ New Features

  • URL Case Handling
    safe_parse_url() and get_clean_url() now support a case_handling parameter ("lower", "upper", or "keep"), allowing control over output casing.

  • Trailing Slash Control
    A new trailing_slash_handling parameter lets users preserve, strip, or ignore trailing slashes, so that URLs differing only by a final / can be made consistent.

  • URL Permutation Utility
    Added permute_url() to generate standardized variants of a URL (altering scheme, www prefix, and trailing slash). Useful for deduplication, comparison, and joins across inconsistent URL formats.

  • Permutation-Based Joins
    Introduced permutation_join() to join two datasets by matching across all URL variants, helping align reports or datasets where URLs appear in differing forms.
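
The options above can be illustrated with a minimal Python sketch. It is not rurl's implementation or API, since rurl is an R package. The functions clean_url, permute, and permutation_join are hypothetical stand-ins for get_clean_url(), permute_url(), and permutation_join(). The "keep"/"strip" values for trailing slashes are assumptions, the "ignore" mode is not modeled, and rows are plain dicts rather than data frames.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Row = Dict[str, object]  # hypothetical row type: a dict with a "url" field


def clean_url(url: str, case: str = "keep", trailing_slash: str = "keep") -> str:
    """Hypothetical analogue of case_handling and trailing_slash_handling."""
    if case == "lower":
        url = url.lower()  # note: also lowercases the path, which may be case-sensitive
    elif case == "upper":
        url = url.upper()
    elif case != "keep":
        raise ValueError(f"unknown case option: {case!r}")
    if trailing_slash == "strip":
        url = url.rstrip("/")
    elif trailing_slash != "keep":
        raise ValueError(f"unknown trailing_slash option: {trailing_slash!r}")
    return url


def permute(url: str) -> Set[str]:
    """Return every scheme x www x trailing-slash variant of a URL (8 in total)."""
    rest = url.split("://", 1)[-1].rstrip("/")  # drop any scheme and trailing slashes
    if rest.lower().startswith("www."):
        rest = rest[4:]  # drop the www prefix; both forms are re-added below
    return {
        f"{scheme}://{www}{rest}{slash}"
        for scheme, www, slash in product(("http", "https"), ("", "www."), ("", "/"))
    }


def permutation_join(left: List[Row], right: List[Row], key: str = "url") -> List[Tuple[Row, Row]]:
    """Inner-join rows whose URLs share at least one permutation."""
    right_variants = [(row, permute(str(row[key]))) for row in right]
    pairs: List[Tuple[Row, Row]] = []
    for lrow in left:
        lvariants = permute(str(lrow[key]))
        pairs.extend((lrow, rrow) for rrow, rvariants in right_variants if lvariants & rvariants)
    return pairs


left = [{"url": "http://example.com/a", "clicks": 3}]
right = [{"url": "https://www.example.com/a/", "rank": 1},
         {"url": "https://example.com/b", "rank": 2}]
print(permutation_join(left, right))  # pairs the left row with the rank-1 row only
```

Comparing permutation sets pair by pair is quadratic. For large tables, indexing one side in a dict keyed by each variant would avoid that cost.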


🛠️ Enhancements

  • Non-Standard Scheme Handling
    safe_parse_url() now handles malformed schemes such as htp:// more reliably when protocol_handling is configured, and its status reporting has been improved.
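
As a rough illustration of what repairing a malformed scheme can involve, here is a small Python sketch. It does not reproduce rurl's protocol_handling logic. The function repair_scheme, the set of known schemes, the https fallback, and the status labels are all hypothetical.

```python
import re
from typing import Tuple

KNOWN_SCHEMES = {"http", "https"}  # assumption: only web schemes count as valid
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*)://")


def repair_scheme(url: str, fallback: str = "https") -> Tuple[str, str]:
    """Return (url, status); the status labels are hypothetical."""
    match = SCHEME_RE.match(url)
    if match is None:
        return f"{fallback}://{url}", "scheme_added"  # no scheme present at all
    if match.group(1).lower() in KNOWN_SCHEMES:
        return url, "ok"  # recognized scheme, URL left unchanged
    # unrecognized scheme such as "htp": swap it for the fallback
    return f"{fallback}://{url[match.end():]}", "scheme_replaced"


print(repair_scheme("htp://example.com/a"))  # ('https://example.com/a', 'scheme_replaced')
```

Returning a status alongside the URL lets callers distinguish untouched input from repaired input, in the spirit of the status reporting mentioned above.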

πŸ› Bug Fixes

  • Schemeless URLs with Ports
    Fixed incorrect NA returns for URLs like example.com:8080/path.

  • Parsing Stability
    Strengthened the fallback used when curl::curl_parse_url() fails, so parsing returns a safe result instead of triggering errors downstream.
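
Both fixes can be illustrated with a short Python sketch, though it is not rurl's code. rurl is an R package built on curl::curl_parse_url(), whereas this sketch uses Python's urllib.parse.urlsplit. The helper safe_split, the Parsed tuple, the http default scheme, and the host:port regex are hypothetical. The sketch prepends a scheme when a schemeless URL starts with host:port, and it returns None instead of raising when parsing fails.

```python
import re
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class Parsed(NamedTuple):
    host: Optional[str]
    port: Optional[int]
    path: str


# host:port at the start of a URL with no "://" (hypothetical heuristic)
HOST_PORT_RE = re.compile(r"^[A-Za-z0-9.-]+:\d+(?:[/?#]|$)")


def safe_split(url: str, default_scheme: str = "http") -> Optional[Parsed]:
    """Parse a URL, returning None instead of raising on failure."""
    if "://" not in url and HOST_PORT_RE.match(url):
        # Without a scheme, "example.com:8080" can be misread as scheme "example.com"
        url = f"{default_scheme}://{url}"
    try:
        parts = urlsplit(url)
        # .port raises ValueError for out-of-range or non-numeric ports
        return Parsed(parts.hostname, parts.port, parts.path)
    except ValueError:
        return None  # fallback: a safe sentinel instead of an exception downstream


print(safe_split("example.com:8080/path"))  # Parsed(host='example.com', port=8080, path='/path')
print(safe_split("http://[::1"))            # None (unbalanced IPv6 bracket)
```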