TidierTuesday.jl is a Julia package that ports the functionality of the tidytuesdayR package (available on CRAN) to Julia. It provides functions for listing, downloading, and loading the TidyTuesday datasets hosted on GitHub.
- Direct dataset loading: Load datasets directly as DataFrames with automatic caching
- Get the most recent Tuesday date: Useful for aligning with TidyTuesday releases
- List available datasets: Discover available TidyTuesday datasets across years
- Download datasets: Retrieve individual files or complete datasets
- Display dataset README: Open the dataset's README in your web browser
- Check GitHub API rate limits: Monitor your GitHub API usage
- Configurable caching: Control where and how datasets are cached
You can install TidierTuesday.jl using Julia's package manager. From the Julia REPL:
using Pkg
Pkg.add("TidierTuesday")
Or in pkg mode (press ] in the REPL):
pkg> add TidierTuesday
Once you have installed the package, you can start using it:
using TidierTuesday
The main function for loading datasets is tt_load. It returns a NamedTuple of DataFrames:
# Load by date
data = tt_load("2024-04-16")
# Access datasets like:
data.dataset1
data.dataset2
# Or load by year and week
data = tt_load(2024, 16) # 16th week of 2024
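Because tt_load returns a NamedTuple, the usual NamedTuple operations apply to its result. The sketch below uses a stand-in NamedTuple with hypothetical names (dataset1, dataset2) and plain vectors in place of DataFrames, so it runs without downloading anything. It only illustrates the access patterns, not the package's actual output.

```julia
# Stand-in for the value returned by tt_load; the names and contents are
# hypothetical, and plain vectors take the place of DataFrames.
data = (dataset1 = [1, 2, 3], dataset2 = ["a", "b"])

# List the dataset names (a tuple of Symbols)
dataset_names = keys(data)

# Access a dataset by field name or by Symbol
first_dataset = data.dataset1
second_dataset = data[:dataset2]

# Check whether a dataset exists before using it
has_third = haskey(data, :dataset3)

# Iterate over name/value pairs
for (name, value) in pairs(data)
    println(name, " has ", length(value), " entries")
end
```

Checking with haskey first is a reasonable habit here, since the dataset names differ from week to week.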
By default, datasets are cached to avoid repeated downloads. You can disable caching with:
data = tt_load("2024-04-16", use_cache=false)
The default cache location is ~/.tidytuesday/cache. You can configure it in several ways:
- Environment Variable: Set the TIDYTUESDAY_CACHE_DIR environment variable:
    # In your shell
    export TIDYTUESDAY_CACHE_DIR="/path/to/cache"
- Runtime Configuration: Use the set_cache_dir function:
    # Set cache to a project-specific directory
    set_cache_dir(joinpath(pwd(), ".tidytuesday", "cache"))
    # Set cache to a custom location
    set_cache_dir("/path/to/cache")
- Check Current Cache Location:
    cache_path = get_cache_dir()
    println("Using cache at: $cache_path")
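To make the lookup concrete, here is a minimal sketch of how a cache directory could be resolved from the environment variable with a fallback to the default path. The helpers resolve_cache_dir and ensure_cache_dir are hypothetical and not part of TidierTuesday.jl. How the package itself orders the environment variable against set_cache_dir is not stated here, so the precedence shown is an assumption.

```julia
# Hypothetical helper, not part of TidierTuesday.jl: resolve a cache
# directory from the environment, falling back to the documented default.
function resolve_cache_dir()
    default_dir = joinpath(homedir(), ".tidytuesday", "cache")
    return get(ENV, "TIDYTUESDAY_CACHE_DIR", default_dir)
end

# Hypothetical helper: create the directory if it does not exist yet.
# mkpath does nothing when the directory is already present.
function ensure_cache_dir(dir::AbstractString)
    mkpath(dir)
    return dir
end
```

Calling ensure_cache_dir(resolve_cache_dir()) would then give a directory that is safe to write into.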
- Last Tuesday
  - Description: Get the most recent Tuesday date relative to today's date or an optionally provided date
  - Usage:
    using Dates
    last_tuesday = get_last_tuesday() # Most recent Tuesday relative to today
    last_tuesday = get_last_tuesday(Date(2025, 3, 10)) # Most recent Tuesday relative to the given date
- List Available Datasets
  - Description: Lists all available TidyTuesday datasets, optionally filtered by year
  - Usage:
    # List all datasets across all years
    all_datasets = list_datasets()
    # List datasets for a specific year
    year_datasets = list_datasets(2025)
- Download Specific File
  - Description: Downloads a specified file from a TidyTuesday dataset by date
  - Usage:
    download_file("2025-03-10", "data.csv")
- Download Dataset Files
  - Description: Downloads all or selected files from a TidyTuesday dataset by date
  - Usage:
    download_dataset("2025-03-10") # Downloads all files
    download_dataset("2025-03-10", ["data.csv", "summary.json"]) # Downloads specific files
- Display Dataset README
  - Description: Opens the README for a TidyTuesday dataset in your default web browser
  - Usage:
    show_readme("2025-03-10")
- Check GitHub Rate Limit
  - Description: Checks the remaining GitHub API rate limit
  - Usage:
    check_rate_limit()
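As an illustration of the date logic described for get_last_tuesday, the sketch below finds the most recent Tuesday on or before a given date using only the Dates standard library. The function most_recent_tuesday is hypothetical and is not the package's implementation, which may handle edge cases (such as time zones or Mondays) differently.

```julia
using Dates

# Hypothetical helper, not the package's implementation: return the most
# recent Tuesday on or before the given date.
function most_recent_tuesday(date::Date = today())
    # same=true returns the date itself when it already falls on a Tuesday
    return toprev(date, Dates.Tuesday; same = true)
end

monday = Date(2025, 3, 10)             # a Monday
tuesday = most_recent_tuesday(monday)  # Date(2025, 3, 4)
```

Passing same = true matters here: without it, a date that is already a Tuesday would be moved back a full week.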
Here's a complete example of how to discover and analyze TidyTuesday data:
using TidierTuesday
using DataFrames
using Plots
# 1. Find the most recent Tuesday date
tuesday = get_last_tuesday()
println("Most recent Tuesday: ", tuesday)
# 2. Load the dataset directly as DataFrames
data = tt_load(tuesday)
# 3. Access and analyze the datasets
for (name, df) in pairs(data)
    println("\nDataset: $name")
    println(describe(df))
    # Create a simple visualization if appropriate
    if ncol(df) >= 2
        # display() is needed for the plot to be shown from inside a loop
        display(plot(df[!, 1], df[!, 2],
            title="TidyTuesday Data Visualization - $name",
            xlabel=names(df)[1],
            ylabel=names(df)[2],
            seriestype=:scatter))
    end
end
If you prefer to download and work with files directly:
using TidierTuesday
using DataFrames
using CSV
# Download a specific file
download_file("2024-04-16", "data.csv")
# Read and analyze the data
df = CSV.read("data.csv", DataFrame)
describe(df)
You can also read data directly from GitHub into memory, without saving a file to disk:
using CSV, DataFrames, HTTP
# Read a CSV file directly from GitHub
url = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-03-18/palmtrees.csv"
df = CSV.read(HTTP.get(url).body, DataFrame)
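If you read several files this way, it can help to build the URL from a date and a file name instead of typing it out. The sketch below does that with a hypothetical helper, raw_file_url, which is not part of TidierTuesday.jl. The path layout (data/<year>/<date>/<file>) is inferred from the example URL above and is an assumption about the repository.

```julia
using Dates

# Hypothetical helper, not part of TidierTuesday.jl: build the raw GitHub
# URL for one file of a TidyTuesday dataset. The layout is inferred from
# the example URL above and is an assumption about the repository.
function raw_file_url(date::Date, filename::AbstractString)
    base = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data"
    return join([base, string(year(date)), string(date), filename], "/")
end

url = raw_file_url(Date(2025, 3, 18), "palmtrees.csv")
```

The resulting string can be passed to HTTP.get in the same way as the hand-written URL.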
- Julia 1.6 or higher
- HTTP.jl
- JSON3.jl
- DataFrames.jl
- CSV.jl
- Dates (stdlib)
This project is licensed under the MIT License - see the LICENSE file for details.