Merged
5 changes: 5 additions & 0 deletions CMakeLists.txt
@@ -1,5 +1,6 @@
cmake_minimum_required(VERSION 3.5)


set(CORROSION_VERBOSE_OUTPUT ON)

# We need C++17 for std::filesystem on all platforms
@@ -143,6 +144,10 @@ message(STATUS "OS_ARCH: ${OS_ARCH} (orig='${_GAGGLE_ORIG_OS_ARCH}')")
 message(STATUS "DUCKDB_PLATFORM: ${DUCKDB_PLATFORM}")
 message(STATUS "Rust_CARGO_TARGET: ${Rust_CARGO_TARGET}")

+
+# ==============================================================================
+# Corrosion (Rust integration)
+# ==============================================================================
 include(FetchContent)
 FetchContent_Declare(
 Corrosion
10 changes: 5 additions & 5 deletions README.md
@@ -30,7 +30,7 @@ updates, etc.
 This workflow can quickly become complex, especially when working with multiple datasets or when datasets are updated
 frequently.
 Gaggle tries to help simplify this process by hiding the complexity and letting you work with datasets directly inside
-an analytical database like DuckDB that can handle fast queries.
+DuckDB that allow you to run fast analytical queries on the data.

 In essence, Gaggle makes DuckDB into a SQL-enabled frontend for Kaggle datasets.

@@ -39,9 +39,9 @@ In essence, Gaggle makes DuckDB into a SQL-enabled frontend for Kaggle datasets.
 - Provides a simple API to interact with Kaggle datasets from DuckDB
 - Allows you to search, download, and read datasets from Kaggle
 - Supports datasets that contain CSV, Parquet, JSON, and XLSX files
-- Configurable and has built-in caching of downloaded datasets
-- Thread-safe, fast, and has a low memory footprint
 - Supports dataset updates and versioning
+- Configurable and has built-in caching support to avoid re-downloading
+- Thread-safe, fast, and has a low memory footprint

 See the [ROADMAP.md](ROADMAP.md) for the list of implemented and planned features.

@@ -103,7 +103,7 @@ select *
 from gaggle_ls('habedi/flickr-8k-dataset-clean') limit 5;

 -- Read a Parquet file from local cache using a prepared statement
--- (DuckDB doesn't support subquery in function arguments, so we use a prepared statement)
+-- (DuckDB doesn't allow the use of subqueries in function arguments, so we use a prepared statement)
 prepare rp as select * from read_parquet(?) limit 10;
 execute rp(gaggle_file_path('habedi/flickr-8k-dataset-clean', 'flickr8k.parquet'));

@@ -118,7 +118,7 @@ select gaggle_cache_info();
 select gaggle_is_current('habedi/flickr-8k-dataset-clean');
 ```

-[![Simple Demo 1](https://asciinema.org/a/745806.svg)](https://asciinema.org/a/745806)
+[![Simple Demo 1](https://asciinema.org/a/do6g8xv1G5tkRc4e3bExbNYwZ.svg)](https://asciinema.org/a/do6g8xv1G5tkRc4e3bExbNYwZ)

 ---

20 changes: 9 additions & 11 deletions ROADMAP.md
@@ -72,21 +72,19 @@ It outlines features to be implemented and their current status.
 ### 6. Documentation and Distribution

 * **Documentation**
-* [x] API reference in README.md.
-* [x] Usage examples (see `docs/examples/`).
-* [ ] Tutorial documentation.
-* [ ] FAQ section.
-* [ ] Troubleshooting guide.
+* [x] API reference (see `docs/README.md`).
+* [x] Usage examples (see the files in `docs/examples/`).
+* [x] Other documentation files like the list of errors (check out `docs/` directory).
 * **Testing**
-* [x] Unit tests for core modules (Rust).
-* [x] SQL integration tests (DuckDB shell).
-* [x] End-to-end integration tests with mocked HTTP (basic coverage).
+* [x] Unit tests for core (Rust) modules.
+* [x] SQL integration tests (run in DuckDB shell).
+* [x] End-to-end integration tests with mocked HTTP.
 * [ ] Performance benchmarks.
 * **Distribution**
-* [ ] Pre-compiled extension binaries for Linux, macOS, and Windows.
-* [ ] Submission to the DuckDB Community Extensions repository.
+* [x] Built binaries for Linux, macOS, and Windows; AMD64 and ARM64.
+* [x] Submission to the DuckDB's community extensions repository.

 ### 7. Observability

 * **Logging**
-* [x] Structured logging via `tracing` with `GAGGLE_LOG_LEVEL`.
+* [x] Structured logging (configurable via `GAGGLE_LOG_LEVEL` environment variable).