TidierTuesday.jl

TidierTuesday.jl is a Julia package that ports the functionality of the tidytuesdayR R package (available on CRAN) to Julia. It provides functions for finding, downloading, and loading TidyTuesday datasets hosted on GitHub.

Features

- Direct dataset loading: Load datasets directly as DataFrames with automatic caching
- Get the most recent Tuesday date: Useful for aligning with TidyTuesday releases
- List available datasets: Discover available TidyTuesday datasets across years
- Download datasets: Retrieve individual files or complete datasets
- Display dataset README: Open the dataset's README in your web browser
- Check GitHub API rate limits: Monitor your GitHub API usage
- Configurable caching: Control where and how datasets are cached

Installation

You can install TidierTuesday.jl using Julia's package manager. From the Julia REPL:

using Pkg
Pkg.add("TidierTuesday")

Or in pkg mode (press ] in the REPL):

pkg> add TidierTuesday

Usage

Once you have installed the package, you can start using it:

using TidierTuesday

Loading Datasets

The main function for loading datasets is tt_load. It returns a NamedTuple of DataFrames:

# Load by date
data = tt_load("2024-04-16")
# Access each dataset by name (dataset1 and dataset2 are placeholders):
data.dataset1
data.dataset2

# Or load by year and week
data = tt_load(2024, 16)  # 16th week of 2024
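Since the field names depend on the files in each week's release, it can help to inspect what a load returned. The sketch below uses a hand-built NamedTuple as a stand-in for the result of `tt_load`; the names `palmtrees` and `regions` and their contents are made up for illustration. It relies only on Base functions that work on any NamedTuple.

```julia
# Stand-in for the value returned by tt_load; the real fields would be DataFrames.
data = (palmtrees = [1, 2, 3], regions = ["a", "b"])

# List the dataset names that were loaded
dataset_names = keys(data)

# Access a dataset by field name or by Symbol
first_dataset = data.palmtrees
same_dataset = data[:palmtrees]

# Iterate over every dataset together with its name
for (name, dataset) in pairs(data)
    println(name, ": ", length(dataset), " elements")
end
```

The same `keys` and `pairs` calls should apply unchanged to a NamedTuple of DataFrames.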

By default, datasets are cached to avoid repeated downloads. You can disable caching with:

data = tt_load("2024-04-16", use_cache=false)
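To make the effect of `use_cache` concrete, here is a minimal sketch of a download-once cache. It is not the package's implementation: `cached_fetch` and `fake_download` are hypothetical names, and the "download" is a local function so the example runs without network access.

```julia
# Hypothetical helper illustrating a download-once cache; not the package's actual code.
# `fetch` is any zero-argument function that produces the file contents as a String.
function cached_fetch(fetch::Function, cache_dir::AbstractString, filename::AbstractString;
                      use_cache::Bool=true)
    path = joinpath(cache_dir, filename)
    # Cache hit: reuse the stored copy instead of calling fetch again
    if use_cache && isfile(path)
        return read(path, String)
    end
    contents = fetch()
    if use_cache
        mkpath(cache_dir)        # create the cache directory on first use
        write(path, contents)
    end
    return contents
end

# Usage with a fake "download" that counts how often it runs
calls = Ref(0)
fake_download() = (calls[] += 1; "x,y\n1,2\n")

dir = mktempdir()
first_read = cached_fetch(fake_download, dir, "data.csv")
second_read = cached_fetch(fake_download, dir, "data.csv")   # served from the cache
uncached = cached_fetch(fake_download, dir, "data.csv"; use_cache=false)
```

In this sketch the second call never reaches `fake_download`, while the call with `use_cache=false` always does.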

Cache Configuration

By default, datasets are cached in ~/.tidytuesday/cache. You can configure the cache location in several ways:

Environment Variable: Set the TIDYTUESDAY_CACHE_DIR environment variable:
```
# In your shell
export TIDYTUESDAY_CACHE_DIR="/path/to/cache"
```

Runtime Configuration: Use the set_cache_dir function:

# Set cache to a project-specific directory
set_cache_dir(joinpath(pwd(), ".tidytuesday", "cache"))

# Set cache to a custom location
set_cache_dir("/path/to/cache")

Check Current Cache Location:

cache_path = get_cache_dir()
println("Using cache at: $cache_path")
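The three configuration sources above can be combined in more than one way. The sketch below shows one plausible precedence: a runtime override first, then the environment variable, then the default. This ordering is an assumption, and `resolve_cache_dir` is a hypothetical function rather than part of the package.

```julia
# Hypothetical sketch of cache-directory resolution; the precedence shown
# (runtime override, then environment variable, then default) is an assumption.
function resolve_cache_dir(; override::Union{Nothing,AbstractString}=nothing,
                           env::AbstractDict=ENV)
    # A directory chosen at runtime wins over everything else
    override !== nothing && return String(override)
    # Otherwise use the environment variable, falling back to the documented default
    default_dir = joinpath(homedir(), ".tidytuesday", "cache")
    return get(env, "TIDYTUESDAY_CACHE_DIR", default_dir)
end

# Passing a plain Dict instead of ENV keeps the example independent of your shell
from_env = resolve_cache_dir(env=Dict("TIDYTUESDAY_CACHE_DIR" => "/path/to/cache"))
from_default = resolve_cache_dir(env=Dict{String,String}())
from_override = resolve_cache_dir(override="/project/cache",
                                  env=Dict("TIDYTUESDAY_CACHE_DIR" => "/path/to/cache"))
```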

Basic Functions

Last Tuesday

Description: Get the most recent Tuesday date relative to today's date or an optionally provided date
Usage:

using Dates  # provides the Date type

last_tuesday = get_last_tuesday()  # Most recent Tuesday relative to today
last_tuesday = get_last_tuesday(Date(2025, 3, 10))  # Most recent Tuesday relative to the given date
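For readers curious about the date arithmetic, the description above ("the most recent Tuesday relative to a date") can be sketched with the Dates standard library alone. `most_recent_tuesday` is a hypothetical re-implementation for illustration; `get_last_tuesday` may differ in detail, for example in how it treats a date that is already a Tuesday.

```julia
using Dates

# Hypothetical re-implementation for illustration; get_last_tuesday may differ in detail.
function most_recent_tuesday(d::Date=today())
    # dayofweek: Monday = 1, Tuesday = 2, ..., Sunday = 7
    days_since_tuesday = mod(dayofweek(d) - 2, 7)
    return d - Day(days_since_tuesday)
end

monday_input = most_recent_tuesday(Date(2025, 3, 10))   # a Monday, so the result is Date(2025, 3, 4)
tuesday_input = most_recent_tuesday(Date(2025, 3, 18))  # already a Tuesday, so the date is unchanged
```

The standard library's `Dates.toprev(d, Dates.Tuesday; same=true)` expresses the same adjustment in a single call.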

List Available Datasets

Description: Lists all available TidyTuesday datasets, optionally filtered by year
Usage:

# List all datasets across all years
all_datasets = list_datasets()

# List datasets for a specific year
year_datasets = list_datasets(2025)

Download Specific File

- Description: Downloads a specified file from a TidyTuesday dataset by date
- Usage:
```
download_file("2025-03-10", "data.csv")
```

Download Dataset Files

Description: Downloads all or selected files from a TidyTuesday dataset by date
Usage:

download_dataset("2025-03-10")  # Downloads all files
download_dataset("2025-03-10", ["data.csv", "summary.json"])  # Downloads specific files

Display Dataset README

- Description: Opens the README for a TidyTuesday dataset in your default web browser
- Usage:
```
show_readme("2025-03-10")
```

Check GitHub Rate Limit

- Description: Checks the remaining GitHub API rate limit
- Usage:
```
check_rate_limit()
```

Example Workflows

Basic Workflow

Here's a complete example of how to discover and analyze TidyTuesday data:

using TidierTuesday
using DataFrames
using Plots

# 1. Find the most recent Tuesday date
tuesday = get_last_tuesday()
println("Most recent Tuesday: ", tuesday)

# 2. Load the dataset directly as DataFrames
data = tt_load(tuesday)

# 3. Access and analyze the datasets
for (name, df) in pairs(data)
    println("\nDataset: $name")
    println(describe(df))
    
    # Scatter the first two columns; display() is needed to show a plot created inside a loop
    if ncol(df) >= 2
        display(plot(df[!, 1], df[!, 2],
            title="TidyTuesday Data Visualization - $name",
            xlabel=names(df)[1],
            ylabel=names(df)[2],
            seriestype=:scatter))
    end
end

Manual Download Workflow

If you prefer to download and work with files directly:

using TidierTuesday
using DataFrames
using CSV

# Download a specific file
download_file("2024-04-16", "data.csv")

# Read and analyze the data
df = CSV.read("data.csv", DataFrame)
describe(df)

Direct Data Loading

You can also read data directly from GitHub without saving a file to disk:

using CSV, DataFrames, HTTP

# Read a CSV file directly from GitHub
url = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-03-18/palmtrees.csv"
df = CSV.read(HTTP.get(url).body, DataFrame)
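The example URL follows a `data/<year>/<date>/<file>` layout, so URLs for other weeks can be assembled the same way. The helper below is a sketch under that assumption: `tidytuesday_raw_url` is a hypothetical name, and the layout is inferred from the single URL shown above rather than from a documented guarantee.

```julia
using Dates

# Hypothetical helper; the path layout is inferred from the example URL above.
function tidytuesday_raw_url(date::Date, filename::AbstractString)
    base = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data"
    day = Dates.format(date, "yyyy-mm-dd")
    return string(base, "/", year(date), "/", day, "/", filename)
end

url = tidytuesday_raw_url(Date(2025, 3, 18), "palmtrees.csv")
```

The resulting string can be passed to `HTTP.get` exactly as in the snippet above.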

Dependencies

- Julia 1.6 or higher
- HTTP.jl
- JSON3.jl
- DataFrames.jl
- CSV.jl
- Dates (stdlib)

License

This project is licensed under the MIT License - see the LICENSE file for details.
