Releases: bart-turczynski/rurl
rurl 1.2.0
Dependencies
`punycoder` (used for IDNA/Punycode encoding and decoding) is now on CRAN.
`DESCRIPTION` requires `punycoder (>= 1.0.0)`.
Behavior changes
- The package-wide default for `case_handling` is now `"lower_host"` (was
  `"keep"` for `safe_parse_url()`, `safe_parse_urls()`, `get_clean_url()`, and
  the `get_*()` accessors, and `"lower"` for `get_path()`). This is the
  RFC 3986 §6.2.2.1 normalization: the case-insensitive scheme and host fold to
  lowercase while the case-sensitive path is preserved. With the previous
  defaults, hosts such as `WWW.Example.COM` and `www.example.com` did not fold
  to one identity, and `get_path()` silently lowercased paths (two pages that
  differ only by path casing collapsed to one). Pass `case_handling = "keep"`
  to restore the previous reconstruction, or `"lower"` to lowercase the whole
  URL including the path. (RURL-lzepdnmm)
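
To make the `"lower_host"` behavior concrete, here is a minimal base-R sketch of the RFC 3986 §6.2.2.1 idea. It is not rurl's implementation: `lower_host()` is a hypothetical helper written only for this illustration, and it lowercases the whole authority (including any userinfo), which a real normalizer would treat more carefully.

```r
# Hypothetical helper, NOT part of rurl: fold scheme and authority to
# lowercase and leave the path, query, and fragment untouched.
lower_host <- function(url) {
  pattern <- "^([A-Za-z][A-Za-z0-9+.-]*://)([^/?#]*)(.*)$"
  matches <- regmatches(url, regexec(pattern, url))
  vapply(seq_along(url), function(i) {
    parts <- matches[[i]]
    # No scheme://authority prefix: return the input unchanged.
    if (length(parts) == 0L) return(url[[i]])
    paste0(tolower(parts[2]), tolower(parts[3]), parts[4])
  }, character(1))
}

lower_host("HTTP://WWW.Example.COM/About/Team")
# "http://www.example.com/About/Team"
```

The host variants now collapse to one identity, while `/About/Team` and `/about/team` remain distinct pages.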
Available on CRAN: https://CRAN.R-project.org/package=rurl
v1
rurl v1 Release Notes
This is the first stable GitHub release of rurl, published as v1.
It reflects package version 1.0.0 as declared in DESCRIPTION.
What rurl Provides
rurl is a vectorized R toolkit for URL parsing, normalization, extraction, permutation, and matching.
Core capabilities:
- Safe parsing with `safe_parse_url()` and `safe_parse_urls()`
- URL normalization with configurable handling for:
  - protocol
  - `www`
  - case
  - trailing slash
  - index pages
  - path normalization
  - scheme-relative URLs
  - host encoding (IDNA/Unicode)
  - path encoding
- URL component accessors (`get_*` helpers)
- URL permutation with `permute_url()`
- Dataset joins based on canonical or permuted URLs:
  - `canonical_join()`
  - `permutation_join()`
- Built-in memoization caches with `rurl_clear_caches()`
Included in This Release
This v1 release includes the current 1.0.0 functionality, including:
- Flexible normalization controls (`case_handling`, `trailing_slash_handling`, and related options)
- URL permutation generation for robust matching workflows
- Canonical and permutation-based URL joins
- Improved handling of malformed schemes and schemeless URLs with ports
- Safer parsing fallbacks and improved IPv6 parsing reliability
Installation
# install.packages("remotes")
remotes::install_github("bart-turczynski/rurl")
Notes
- This release tag is `v1`.
- The R package version for this release is `1.0.0`.
- For full historical package changes, see `NEWS.md`.
rurl 0.3.0
🚀 rurl - Release Notes (Version 0.3.0)
This release adds powerful URL cleaning features, improves parsing flexibility, and introduces utilities for comparing and joining datasets using URL permutations.
✨ New Features
- URL Case Handling
  `safe_parse_url()` and `get_clean_url()` now support a `case_handling` parameter (`"lower"`, `"upper"`, or `"keep"`), allowing control over output casing.
- Trailing Slash Control
  New `trailing_slash_handling` parameter lets users preserve, strip, or ignore trailing slashes for cleaner and more consistent URLs.
- URL Permutation Utility
  Added `permute_url()` to generate standardized variants of a URL (altering scheme, `www` prefix, and trailing slash). Useful for deduplication, comparison, and joins across inconsistent URL formats.
- Permutation-Based Joins
  Introduced `permutation_join()` to join two datasets by matching across all URL variants, helping align reports or datasets where URLs appear in differing forms.
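
The permutation idea can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the code behind `permute_url()`: `permute_sketch()` is a hypothetical helper that takes a single URL, ignores query strings and fragments, and only varies the three dimensions named above (scheme, `www` prefix, trailing slash).

```r
# Hypothetical helper, NOT rurl's permute_url(): build the 2 x 2 x 2 variants
# of one URL by toggling scheme, "www." prefix, and trailing slash.
permute_sketch <- function(url) {
  bare <- sub("^[A-Za-z][A-Za-z0-9+.-]*://", "", url)    # drop the scheme
  bare <- sub("^www\\.", "", bare, ignore.case = TRUE)   # drop a leading www.
  bare <- sub("/+$", "", bare)                           # drop trailing slashes
  grid <- expand.grid(
    slash  = c("", "/"),
    www    = c("", "www."),
    scheme = c("http://", "https://"),
    stringsAsFactors = FALSE
  )
  paste0(grid$scheme, grid$www, bare, grid$slash)
}

variants <- permute_sketch("https://www.example.com/page/")
length(variants)                            # 8
"http://example.com/page" %in% variants     # TRUE
```

Matching on membership in such a variant set is what lets two datasets line up even when one records `http://example.com/page` and the other `https://www.example.com/page/`.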
🛠️ Enhancements
- Non-Standard Scheme Handling
  `safe_parse_url()` now better handles malformed schemes like `htp://` when `protocol_handling` is configured, with improved status reporting.
🐛 Bug Fixes
- Schemeless URLs with Ports
  Fixed incorrect `NA` returns for URLs like `example.com:8080/path`.
- Parsing Stability
  Reinforced fallback behavior when `curl::curl_parse_url()` fails, ensuring safe returns without downstream errors.
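
Schemeless URLs with ports are awkward because `example.com:` is itself syntactically valid as an RFC 3986 scheme, so a generic parser can read the host as a scheme. One common workaround, shown here as a sketch and not as rurl's actual fix, is to prepend a default scheme whenever the input lacks a `scheme://` prefix. `add_default_scheme()` is a hypothetical helper; it would also prefix inputs such as `mailto:` addresses, which a real implementation needs to rule out first.

```r
# Hypothetical helper, NOT rurl's implementation: give schemeless inputs a
# default scheme so that "host:port/path" is parsed as an authority.
add_default_scheme <- function(url, scheme = "http") {
  has_authority <- grepl("^[A-Za-z][A-Za-z0-9+.-]*://", url)
  ifelse(has_authority, url, paste0(scheme, "://", url))
}

add_default_scheme(c("example.com:8080/path", "https://example.com/"))
# "http://example.com:8080/path" "https://example.com/"
```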
rurl v0.2.0
This release adds robust support for internationalized domain names (IDNs), improves punycode handling, and ensures accurate extraction of TLDs and registered domains.
✅ Highlights
- Accurate TLD extraction for both ASCII and Unicode domains
- Graceful fallback when `urltools` is unavailable
- NFC normalization with `stringi`
- 100% test coverage with edge cases and punycode validation
- Improved internal helpers and clearer test diagnostics
🧪 Passes all tests: 103/103
v0.1.3 — Remove psl dependency
🚀 v0.1.3 — Remove psl Dependency, Add Internal Public Suffix List
This release removes the dependency on the psl package and replaces it with an internal, CRAN-compliant solution using a locally cached copy of the Public Suffix List.
🔄 Changes
- ✅ Replaced `psl::apex_domain()` with an internal `.get_registered_domain()` helper.
- ✅ Added a `data-raw/update_psl.R` script to refresh the PSL during development.
- ✅ Stored `psl_clean` as internal data in `sysdata.rda`.
- ✅ Updated `get_domain()` to rely on the internal suffix logic.
- ✅ Added new unit tests for edge cases (e.g., wildcard and exception rules).
- ✅ Achieved 100% test coverage (`covr::package_coverage()`).
- ✅ Removed `psl` from `DESCRIPTION` and all related references.
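
For readers unfamiliar with how Public Suffix List matching handles wildcard and exception rules, the sketch below shows the general algorithm on a tiny hard-coded rule set. It is an assumption-laden illustration, not the internal `.get_registered_domain()` helper: `registered_domain_sketch()` and its five-rule list are hypothetical, and the real list contains thousands of entries.

```r
# Hypothetical helper, NOT rurl's internal code. Rules follow PSL syntax:
# plain suffixes, "*." wildcards, and "!" exceptions.
registered_domain_sketch <- function(host,
                                     rules = c("com", "uk", "co.uk", "*.ck", "!www.ck")) {
  labels <- strsplit(tolower(host), ".", fixed = TRUE)[[1]]
  n <- length(labels)
  suffix_len <- 1L  # implicit default rule: the last label is the suffix
  for (k in seq_len(n)) {
    cand <- paste(labels[(n - k + 1):n], collapse = ".")
    wild <- if (k > 1) paste(c("*", labels[(n - k + 2):n]), collapse = ".") else "*"
    if (paste0("!", cand) %in% rules) {
      # Exception rules win: the suffix is the rule minus its leftmost label.
      suffix_len <- k - 1L
      break
    }
    if (cand %in% rules || wild %in% rules) suffix_len <- max(suffix_len, k)
  }
  if (n <= suffix_len) return(NA_character_)  # the host is itself a public suffix
  # Registered domain = public suffix plus one more label.
  paste(labels[(n - suffix_len):n], collapse = ".")
}

registered_domain_sketch("www.example.co.uk")  # "example.co.uk"
registered_domain_sketch("foo.bar.ck")         # "foo.bar.ck" (wildcard *.ck)
registered_domain_sketch("www.ck")             # "www.ck" (exception !www.ck)
registered_domain_sketch("co.uk")              # NA
```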
📦 Package Check
- `devtools::check()` passes with 0 errors, 0 warnings, and 1 note (timestamp-related).
- Fully CRAN-compliant and ready for publishing.
Reached 100% coverage; R CMD check reports no errors.
Release v0.1.2: Improvements, Bug Fixes, and Enhanced Documentation
- Bug Fixes: Resolved issues with incorrect handling of edge cases in URL parsing (e.g., handling of URLs without a scheme or with incomplete domain information).
- Code Enhancements: Refined the handling of protocols (HTTP, HTTPS, FTP, etc.) to improve consistency and user feedback on invalid or incomplete URLs.
- Test Coverage: Improved test coverage and stability with additional test cases, achieving over 80% coverage, ensuring the package's functionality across various use cases.
- Documentation Updates: Improved function documentation and examples, making it easier for users to understand and utilize the library. Fixed `@usage` and `@param` sections in Rd files to ensure full documentation.
- License File Added: Included the LICENSE file and updated metadata for proper licensing information.
This update stabilizes the package and improves its robustness, documentation, and testability.
rurl 0.1.1
Highlights
- ✅ Strict scheme whitelist: http, https, ftp, ftps
- 🚫 Rejects mailto:, file:, s3:, malformed protocols
- 🎯 Correctly handles `ftps://` as `ok-ftp`
- 🧪 Expanded test coverage for edge cases
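
A scheme whitelist of this kind can be sketched in base R as follows. This is only an illustration of the check described above: `scheme_ok()` and `allowed_schemes` are hypothetical names, and the sketch returns a plain logical rather than rurl's status values.

```r
# Hypothetical helper, NOT rurl's implementation: accept a URL only when it
# carries an explicit scheme that is on the whitelist.
allowed_schemes <- c("http", "https", "ftp", "ftps")

scheme_ok <- function(url) {
  has_scheme <- grepl("^[A-Za-z][A-Za-z0-9+.-]*:", url)
  scheme <- tolower(sub("^([A-Za-z][A-Za-z0-9+.-]*):.*$", "\\1", url))
  has_scheme & scheme %in% allowed_schemes
}

scheme_ok(c("https://example.com", "ftps://files.example.com",
            "mailto:a@example.com", "s3://bucket/key"))
# TRUE TRUE FALSE FALSE
```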
rurl 0.1.0
- Vectorized all accessors
- Added clean URL handling
- Built foundational API for future use