A list of personal contributions to the Kaggle community.
Herculaneum papyri: the kind of challenge you can tackle on the Kaggle platform.
-
Paris flood dataset. A tabular dataset that provides 125+ years of daily maximum water level measurements (1900–February 2026) from hydrometric stations monitoring the Seine River at Paris Austerlitz. The data was collected via the official Hub'Eau API and formatted as a clean, chronologically ordered CSV file of 53,332 records with minimal gaps and no artificial interpolation.
- Related project: Fluctuat nec mergitur
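A claim like "minimal gaps" on a daily series can be checked with a few lines of standard-library Python. The sketch below is only an illustration: the column names `date` and `max_level_m` and the three sample rows are hypothetical, not taken from the actual CSV, and a multi-station file would need the check run per station.

```python
import csv
import io
from datetime import date, timedelta


def find_gaps(rows, date_field="date"):
    """Return (first_missing_day, last_missing_day) pairs for a daily series."""
    # Deduplicate and sort the days so the check does not depend on file order.
    days = sorted({date.fromisoformat(row[date_field]) for row in rows})
    gaps = []
    for prev, cur in zip(days, days[1:]):
        if (cur - prev).days > 1:
            gaps.append((prev + timedelta(days=1), cur - timedelta(days=1)))
    return gaps


# Made-up sample rows (hypothetical column names, not real measurements).
sample = io.StringIO(
    "date,max_level_m\n"
    "1910-01-20,3.80\n"
    "1910-01-21,4.20\n"
    "1910-01-24,6.10\n"
)
rows = list(csv.DictReader(sample))
gaps = find_gaps(rows)
for start, end in gaps:
    print(f"missing from {start} to {end}")
```

Reporting the gaps instead of filling them matches the dataset's stated choice of no artificial interpolation.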
-
Fallout: New Vegas, the dataset. A narrative corpus of text entries, mostly dialogue mined from the game, each tagged with speaker and other metadata. With 59,415 entries, this dataset enables NLP and game design researchers to unlock their post-apocalyptic storytelling potential and other advanced language modeling perks.
- Related project: Fallout New Vegas papyrus
-
French rare words lexicon: a CSV of 9,351 uncommon French words annotated with gender, lemma, phonological transcription, frequency index, and dictionary definitions for NLP, lexicography, and language‑learning research.
- Related project: dailyword
-
Toulouse public libraries loans dataset: a CSV that contains annual circulation statistics and bibliographic metadata for print, music and movie items from the Toulouse public library collection. It is designed for analyzing lending trends, collection management, patron behavior, and media usage.
- Related project: Toulouse biblio chronicle
-
Paris flood dataset, the designer notebook: an exploratory data analysis of the Paris flood dataset that favors creative visual design to produce striking charts. This is a follow-up to the dataswag submission to the Hackaviz competition.
-
Fallout: New Vegas data exploration notebook: an NLP-tailored playground that explores and transforms raw game text through systematic exploratory data analysis and preprocessing.
-
This EDA guide walks through an analysis of the Toulouse public library loans dataset. It includes basic statistics, outlier detection, and column‑level analytics for fields like title, author, media type, and audience. It also includes summary reports about popular authors, media types and audience trends plus an interactive Jupyter widget for exploratory visualization.
-
A hands‑on exploration of the rare French words dataset, presenting a curated set of computational‑linguistics methods for data prep, feature‑level EDA (word form, gender, frequency, definitions), lexical and semantic analyses (affixes, n‑grams), and interactive visualizations. It is intended as a practical toolkit rather than an exhaustive study. The project is designed to be reproducible and extensible, and it invites user experimentation and feedback.
-
Kaggle Digit Recognizer: playing with some underused deep learning techniques, ObsCure MNIST ranked 7th (out of 1000+ participants) with 99.94% accuracy.
-
Predicting road accidents: a chance to learn more about residuals, ensembling, and blending models. That's what we call a very close competition: ranked 328/4082.
Ahh.. xkcd
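For readers unfamiliar with the blending and residuals mentioned for the road accidents competition, here is a minimal sketch of the general idea: a weighted average of several models' predictions, plus the residuals of the blend. It is a generic illustration, not the approach used in the actual submission; the function names, weights, and numbers are all made up.

```python
def blend(predictions, weights):
    """Weighted average of several models' predictions, row by row."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, row)) / total
        for row in zip(*predictions)  # one tuple of model outputs per sample
    ]


def residuals(y_true, y_pred):
    """Per-sample errors: what the blend still fails to explain."""
    return [t - p for t, p in zip(y_true, y_pred)]


# Made-up predictions from two hypothetical models on three samples.
model_a = [0.2, 0.5, 0.9]
model_b = [0.4, 0.3, 0.7]
y_true = [0.3, 0.5, 0.8]

blended = blend([model_a, model_b], weights=[0.5, 0.5])
errors = residuals(y_true, blended)
print(blended)
print(errors)
```

In practice the weights are usually tuned on held-out predictions rather than fixed by hand.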