Stars
A FiveThirtyEight/The Marshall Project effort to collect comprehensive data on police misconduct settlements from 2010-19.
NYC Subway Turnstile Data
An ongoing list of pandas quirks
Parse and analyze the data that the sleep tracking app Pillow exports.
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Transportation planning and traffic simulation software for creating cities friendlier to walking, biking, and public transit
A repository of data on coronavirus cases and deaths in the U.S.
Data and methodology for the Big Mac index
Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive lea…
Below are some simple methods for exiting vim.
MusicBrainz Spotify integration hack for SF Music Hack Day 2014
A booklet on machine learning systems design with exercises. NOT the repo for the book "Designing Machine Learning Systems"
Python package + CLI to generate wordclouds of Twitter tweets.
Library to scrape and clean web pages to create massive datasets.
Reusable JavaScript library for creating sketchy/hand-drawn styled charts in the browser.
Datasets, tools, and benchmarks for representation learning of code.
Python library for Multi-Armed Bandits
An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
Library of contextual bandits algorithms
A library of extension and helper modules for Python's data analysis and machine learning libraries.
Statistics for each published edition of Data Is Plural.
A game theoretic approach to explain the output of any machine learning model.
Tool for producing high-quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
Build Accelerated Mobile Page versions of your Jekyll posts
Populate a database with NBA shot data
Code for the paper "Language Models are Unsupervised Multitask Learners"