matchpointR turns the public pages of the Women's Tennis Association (https://www.wtatennis.com) into tidy data frames. It ships helpers for player biographies, career highlights, full match histories and live rankings.
Dynamic content (matches, rankings) is rendered through a headless
Chrome session via chromote,
so JavaScript-generated sections are fully captured before parsing. Where
the WTA site exposes structured schema.org JSON-LD data, matchpointR
reads from that in preference to CSS selectors for resilience against
site redesigns.
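
To illustrate why the JSON-LD block is the more stable source, the sketch below pulls a schema.org block out of a small inline HTML string. It calls rvest and jsonlite directly and is not matchpointR's internal code; the HTML fragment, its field values and the `.player-name` class are made up for the example.

```r
library(rvest)
library(jsonlite)

# Made-up page fragment; a real WTA page will differ in fields and layout.
html <- '<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Person",
 "name": "Example Player", "nationality": "Exampleland"}
</script>
</head><body><h1 class="player-name">Example Player</h1></body></html>'

page <- read_html(html)

# Read the structured block rather than relying on a CSS class such as
# ".player-name", which may change when the site is redesigned.
json_txt <- html_text(html_element(page, "script[type='application/ld+json']"))
person <- fromJSON(json_txt)

person$name
person$nationality
```

The JSON keys follow the schema.org vocabulary, so they tend to change less often than class names tied to page styling.
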
You can install the development version of matchpointR from
GitHub with:

# install.packages("pak")
pak::pak("Angnar-97/matchpointR")

Or, once it lands on CRAN:

install.packages("matchpointR")

You will also need a working Chrome/Chromium install for the dynamic
scrapers, which is managed automatically by chromote.

library(matchpointR)
# Build a canonical player URL from id + slug
url <- wta_player_url(320301, "katerina-siniakova")
# Player biography (name, nationality, headshot, flag, ...)
wta_get_player_basics(url, download_images = FALSE)
# Career highlights (ranks, titles, prize money)
wta_get_player_overview(url)
# Full match history (walks the "Show more" button)
matches_url <- wta_player_url(320301, "katerina-siniakova", "matches")
wta_get_player_matches(matches_url)
# Live rankings
wta_get_rankings("singles", top = 50)

matchpointR reads publicly accessible pages of
https://www.wtatennis.com using a single headless browser session per
call. Before using this package — especially at scale — you are
responsible for checking and complying with the WTA website's
Terms and Conditions.
The package does not bypass paywalls, authentication walls or any
technical access controls, does not interact with user accounts, and
exposes no bulk-download helpers. Please scrape considerately: the
chromote session hits the site as a real browser, so iterating over
many players or rankings days back-to-back is equivalent to a human
browsing the site rapidly. Add explicit Sys.sleep() between calls in
loops.
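
One way to space out calls is a small wrapper that pauses between iterations. `polite_map()` below is a hypothetical helper, not part of matchpointR, and the 5-second default is an arbitrary assumption rather than a limit published by the WTA.

```r
# Hypothetical helper: apply `fetch` to each input, pausing between calls.
polite_map <- function(inputs, fetch, delay = 5) {
  results <- vector("list", length(inputs))
  for (i in seq_along(inputs)) {
    results[[i]] <- fetch(inputs[[i]])
    # Pause after every call except the last one.
    if (i < length(inputs)) Sys.sleep(delay)
  }
  results
}

# Usage (not run here, since it contacts the live site):
# urls <- c(wta_player_url(320301, "katerina-siniakova"))
# overviews <- polite_map(urls, wta_get_player_overview, delay = 5)
```

Passing the scraper as an argument keeps the pause logic in one place, so the same wrapper can be reused for player pages and rankings.
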
- HTML selectors may drift as the site is redesigned. Where possible
  matchpointR reads from the page's schema.org JSON-LD block for
  stability. File an issue on GitHub if a function stops returning data.
- Functions return everything as character to stay faithful to the
  rendered page; cast to numeric or date in a follow-up step.
- Tour-wide statistics leaderboards are not yet implemented; tracked in issue #1.
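
The follow-up casting step mentioned above might look like the sketch below. The data frame is a made-up stand-in: the column names, values and date format are assumptions, so check the actual output of each function before adapting it.

```r
# Made-up stand-in for a returned data frame; real column names may differ.
rankings <- data.frame(
  rank   = c("1", "2", "3"),
  points = c("10,000", "8,500", "7,250"),
  date   = c("2024-01-01", "2024-01-01", "2024-01-01"),
  stringsAsFactors = FALSE
)

rankings$rank <- as.integer(rankings$rank)
# Strip thousands separators before converting to numeric.
rankings$points <- as.numeric(gsub(",", "", rankings$points, fixed = TRUE))
# The ISO date format is an assumption; adjust `format` to the page's style.
rankings$date <- as.Date(rankings$date, format = "%Y-%m-%d")

str(rankings)
```
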
Alejandro Navas González (Angnar).
