Python module to impute missing values using state-of-the-art machine learning algorithms.
impyte

Python module to impute missing values by prediction using machine learning algorithms.

Documentation

Full documentation is available on ReadTheDocs or in docs/_build/html/index.html. The symlink documentation.html in the root directory points to this file.

For additional tutorials and usage scenarios, see the tutorials directory, which contains a static version of the tutorial as well as an interactive Jupyter notebook.

Value Imputation

Missing values are a fundamental problem for anyone working with data. There are several ways to deal with missing information, ranging from dropping incomplete data points to estimating a value from the other values in the same column (e.g. the average or median). A more recent approach predicts the missing value with a machine-learning algorithm. This module offers a lightweight Python solution that calculates missing information based on the underlying relationships between data points.
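The three strategies mentioned above can be compared on a toy data frame. This is only a minimal sketch using pandas and NumPy: the column names and data are made up, and a least-squares line stands in for the machine-learning estimator. It does not show how impyte itself is implemented.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: y is exactly 2 * x, and the last y value is missing.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 4.0, 6.0, 8.0, np.nan],
})

# Strategy 1: drop every row that contains a missing value.
dropped = df.dropna()

# Strategy 2: fill the gap with a column statistic (here the median of y).
median_filled = df.fillna({"y": df["y"].median()})

# Strategy 3: predict the missing value from the other column.
# A straight-line fit is used as a simple stand-in for a learned model.
known = df[df["y"].notna()]
slope, intercept = np.polyfit(known["x"], known["y"], deg=1)

predicted = df.copy()
missing = predicted["y"].isna()
predicted.loc[missing, "y"] = slope * predicted.loc[missing, "x"] + intercept

print(median_filled)
print(predicted)
```

On this data the median fill ignores the relationship between x and y, while the prediction follows it. That difference is the motivation for imputing by prediction.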

Requirements

Files

The most important files are listed below, each with a one-line summary:

  • docs/
    • _build/html/index.html - static documentation
  • impyte/
    • impyte.py - contains main classes
  • requirements.txt - requirements file, install dependencies with pip install -r requirements.txt
  • tests/
    • testing.ipynb - interactive testing notebook
    • testing.html - HTML version of the Jupyter notebook
    • test_impyte.py - automated pytest script
  • tools/ - contains scripts for development (e.g. fake data generation)
  • tutorials/
    • tutorials.ipynb - notebook with common tutorial tasks
    • tutorials.html - static HTML version of the notebook

Functions

impyte focuses on two main goals:

  1. Easy-to-interpret visualization of missing-value patterns
  2. Easy imputation of missing values
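To illustrate what a missing-value pattern (nan-pattern) is, the sketch below counts the distinct combinations of missing columns in a small, made-up data frame using plain pandas. It is an assumption-laden illustration of the concept, not impyte's own code or output format; the column names and the helper columns count and n_missing are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in columns "a" and "b".
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan, 5.0],
    "b": [1.0, 2.0, np.nan, np.nan, 5.0],
    "c": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# True where a value is missing, False otherwise.
mask = df.isna()
columns = list(mask.columns)

# Each distinct row of the mask is one pattern; count how often it occurs.
patterns = mask.groupby(columns).size().reset_index(name="count")

# Number of missing columns per pattern: 0 = complete row,
# 1 = single-nan pattern, 2 or more = multi-nan pattern.
patterns["n_missing"] = patterns[columns].sum(axis=1)

print(patterns)
```

Separating patterns by the number of missing columns mirrors the distinction between single-nan and multi-nan imputation in the usage example.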

Usage

import pandas as pd
from impyte import impyte  # impyte/impyte.py contains the main classes

# load a data set that contains missing values
df = pd.read_csv("missing_values.csv")
imp = impyte.Impyter(df)

# show the nan-patterns of the data in one data frame
imp.pattern()

# impute all single-nan patterns using a random forest
imp.impute(estimator='rf')

# impute all nan-patterns, including those with multiple nans
imp.impute(estimator='rf', multi_nans=True)

# apply r2 and f1 thresholds
imp.impute(estimator='rf', threshold={"r2": .7, "f1_macro": .7})

Limits and Notes

The current version is a work in progress. If you discover any errors or bugs, don't hesitate to reach out!