
brooks-code/kaggle

Kaggle

A list of personal contributions to the Kaggle community.

Banner Image
Herculaneum papyri: the kind of challenge you can tackle on the Kaggle platform.


Note

Kaggle datasets and notebooks are also archived in this repository.

Datasets

  • Paris flood dataset. A tabular dataset that provides 125+ years of daily maximum water level measurements (1900–February 2026) from hydrometric stations monitoring the Seine River at Paris Austerlitz. The data was collected via the official Hub'Eau API and is formatted as a clean, chronologically ordered CSV file of 53,332 records with minimal gaps and no artificial interpolation.

  • Fallout: New Vegas, the dataset. A narrative corpus of text entries, mostly dialogue mined from the game, tagged with speaker and other metadata. With 59,415 entries, it lets NLP and game design researchers unlock their post-apocalyptic storytelling potential and other advanced language modeling perks.

  • French rare words lexicon: a CSV of 9,351 uncommon French words annotated with gender, lemma, phonological transcription, frequency index, and dictionary definitions for NLP, lexicography, and language‑learning research.

  • Toulouse public libraries loans dataset: a CSV that contains annual circulation statistics and bibliographic metadata for print, music and movie items from the Toulouse public library collection. It is designed for analyzing lending trends, collection management, patron behavior, and media usage.
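As a rough illustration of what "minimal gaps" means for a daily series such as the Paris flood dataset, the sketch below lists the missing days in a chronologically ordered list of dates. The function name `find_gaps` and the sample dates are hypothetical and are not taken from the dataset; no CSV column names are assumed.

```python
from datetime import date, timedelta

def find_gaps(days):
    """Return (first_missing, last_missing) ranges in a sorted list of dates."""
    gaps = []
    for prev, curr in zip(days, days[1:]):
        # More than one day between consecutive records means missing days.
        if (curr - prev).days > 1:
            gaps.append((prev + timedelta(days=1), curr - timedelta(days=1)))
    return gaps

# Hypothetical sample, for illustration only (not real measurements).
sample = [date(1910, 1, 20), date(1910, 1, 21), date(1910, 1, 24), date(1910, 1, 25)]
gaps = find_gaps(sample)
```

Reporting the gaps rather than filling them keeps the series free of interpolated values, which matches how the dataset is described.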

Notebooks

  • Paris flood dataset, the designer notebook: an exploratory data analysis of the Paris flood dataset that puts visualization and creative design first to achieve visually striking results. It is a follow-up to the dataswag submission to the Hackaviz competition.

  • Fallout: New Vegas data exploration notebook: an NLP-tailored playground that explores and transforms raw game text through systematic exploratory data analysis and preprocessing.

  • This EDA guide walks through an analysis of the Toulouse public library loans dataset. It includes basic statistics, outlier detection, and column‑level analytics for fields like title, author, media type, and audience. It also includes summary reports about popular authors, media types and audience trends plus an interactive Jupyter widget for exploratory visualization.

  • A hands‑on exploration of the unusual/rare French words dataset presenting a curated set of computational‑linguistics methods for data prep, feature‑level EDA (word form, gender, frequency, definitions), lexical and semantic analyses (affixes, n‑grams), and interactive visualizations. It is intended as a practical toolkit rather than an exhaustive study. The project is designed to be reproducible and extensible, and it invites user experimentation and feedback.
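The affix and n‑gram analyses mentioned for the rare French words notebook can be sketched in a few lines. This is a minimal illustration and not the notebook's actual code: the helper names `char_ngrams` and `suffix_counts` and the word list are hypothetical.

```python
from collections import Counter

def char_ngrams(word, n):
    """Return all contiguous character n-grams of a word."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]

def suffix_counts(words, length):
    """Count word endings of a given length, skipping words that are too short."""
    return Counter(w[-length:] for w in words if len(w) > length)

# Hypothetical word list, for illustration only (not taken from the lexicon).
words = ["nonobstant", "derechef", "céans", "naguère", "jadis"]
bigrams = Counter(g for w in words for g in char_ngrams(w, 2))
```

`bigrams.most_common(5)` and `suffix_counts(words, 2)` then give a quick view of the most frequent character pairs and word endings.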

Competitions

  • Kaggle Digit Recognizer: Playing with some underused deep learning techniques, ObsCure MNIST ranked 7th (out of 1,000+ participants) with 99.94% accuracy.

  • Predicting road accidents: A chance to learn more about residuals, ensembling and blending models. It was a very close competition: ranked 328/4082.
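For readers new to the terms, blending usually means combining several models' predictions with a weighted average, and residuals are the differences between observed and predicted values. The sketch below shows both under those assumptions; it is not the approach used in the competition entry, and the function names and numbers are made up for illustration.

```python
def blend(predictions, weights):
    """Weighted average of several models' predictions, row by row."""
    if len(predictions) != len(weights):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, row)) / total
        for row in zip(*predictions)
    ]

def residuals(y_true, y_pred):
    """Residuals: observed value minus predicted value."""
    return [t - p for t, p in zip(y_true, y_pred)]

# Made-up numbers for illustration only (not competition data).
model_a = [0.25, 0.5, 0.75]
model_b = [0.75, 0.0, 0.25]
blended = blend([model_a, model_b], weights=[3, 1])
```

Inspecting `residuals(y_true, blended)` on a validation set is one common way to judge whether a given weighting helps.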


Banner Image
Ahh.. xkcd