missingdata

missing data visualization and imputation

Goals

To provide an easy-to-use yet thorough assessment of missing values in one's dataset:

in addition to the blackholes plot below,

show the variable-to-variable and subject-to-subject co-missingness, and

quantify the type of missingness, etc.
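To make the co-missingness goal concrete, here is a minimal sketch in plain pandas. It is not part of the missingdata API; the toy table and the names var_co_missing and sub_co_missing are made up for illustration, and it assumes rows are subjects and columns are variables.

```python
import numpy as np
import pandas as pd

# Toy data: rows are subjects, columns are variables (hypothetical names).
df = pd.DataFrame(
    {
        "age": [34, np.nan, 51, np.nan],
        "bmi": [22.1, np.nan, np.nan, 27.5],
        "iq": [101, 98, 110, 95],
    },
    index=["s1", "s2", "s3", "s4"],
)

# Missingness mask as integers: 1 where a value is missing, 0 otherwise.
mask = df.isna().astype(int)

# Variable-to-variable co-missingness: entry (a, b) counts the subjects
# for which both variable a and variable b are missing.
var_co_missing = mask.T.dot(mask)

# Subject-to-subject co-missingness: entry (i, j) counts the variables
# that are missing for both subject i and subject j.
sub_co_missing = mask.dot(mask.T)

print(var_co_missing)
print(sub_co_missing)
```

The diagonal of each matrix holds the plain missing count per variable or per subject, so the off-diagonal entries are the ones that reveal values going missing together.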

Note

To manage data with missing values more easily, I strongly recommend moving away from CSV files and managing your data in self-contained, flexible data structures like pyradigm, as your data, as well as your needs, will only get bigger and more complicated, e.g. with mixed types, missing values and a large number of groups.

These would be great contributions if you have time.

Features

visualization
imputation (coming!)
other handling

blackholes plot

State

The software is in beta and under active development, so please update it regularly and often.

Contributions are most welcome, especially bug reports and usability improvements.

Installation

pip install -U missingdata

We encourage you to update often, especially when you run into any issues.
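Before reporting an issue, it can help to confirm which version is actually installed. The following is a small sketch using only the Python standard library (Python 3.8 or later); the helper name installed_version is hypothetical and not part of missingdata.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints the installed version, or None if the package is not installed.
print(installed_version("missingdata"))
```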

Usage

Take a look at the help text before diving in, using the following code:

from missingdata import blackholes
help(blackholes)

I encourage you to read the text for each parameter carefully to understand the behaviour of this plotting mechanism.

Note

If you don't see any labels (for rows or columns) when you try the blackholes plot for the first time, it may be because the total effective number of rows/cols being displayed, after applying filter_spec_*, exceeded a preset number (60/80), and the labels were removed to keep them from being occluded or becoming illegible. You can use the parameter freq_thresh_show_labels to bring the effective number of rows/cols being displayed down to a smaller number, or pass show_all_labels=True to force the display of labels. If the number of subjects or variables is large, you may want to increase figsize (width or height) to minimize occlusion and improve label readability.

Also, the chosen defaults may not work for you, so I strongly encourage you to control as many parameters as needed to customize the plot to your liking. If a feature you need is not currently supported, send a PR with improvements, or open an issue. Thanks.

Let's say you have all the data in a pandas DataFrame, where subject IDs are in a 'sub_ids' column and variable names are in a 'var_names' column, and they belong to groups identified by sub_class and var_group. You can then use the following code to produce the blackholes plot:

blackholes(data_frame,
           label_rows_with='sub_ids', label_cols_with='var_names',
           group_rows_by=sub_class, group_cols_by=var_group)

If you are interested in seeing only the subjects/variables with the least amount of missing data, you can control the window of missing-data percentage with filter_spec_samples and/or filter_spec_variables by passing a tuple of two floats, e.g. (0, 0.1), which filters away those with more than 10% missing data.

blackholes(data_frame,
           label_rows_with='sub_ids', label_cols_with='var_names',
           filter_spec_samples=(0, 0.1))
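To illustrate what such a window means, here is a rough sketch of the same idea in plain pandas. This is not how blackholes implements filter_spec_samples; the function name rows_within_missing_window is hypothetical, and treating both bounds as inclusive is an assumption.

```python
import numpy as np
import pandas as pd


def rows_within_missing_window(df, window):
    """Keep rows whose fraction of missing values lies within [low, high]."""
    low, high = window
    # Fraction of missing values per row (sample), between 0 and 1.
    frac_missing = df.isna().mean(axis=1)
    # Assumption: both ends of the window are inclusive.
    keep = (frac_missing >= low) & (frac_missing <= high)
    return df.loc[keep]


# Toy data: 3 subjects by 4 variables (hypothetical names).
data = pd.DataFrame(
    np.arange(12, dtype=float).reshape(3, 4),
    index=["s1", "s2", "s3"],
    columns=["a", "b", "c", "d"],
)
data.loc["s2", "b"] = np.nan              # s2: 25% missing
data.loc["s3", ["a", "b", "c"]] = np.nan  # s3: 75% missing

kept = rows_within_missing_window(data, (0, 0.1))
print(kept.index.tolist())
```

The same computation with axis=0 gives the per-variable fraction, which is the quantity a window on variables would act on.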

The other parameters for the function are self-explanatory.

Please open an issue if you find something confusing, or have feedback to improve, or identify a bug. Thanks.

Citation

If you find this package useful, I'd greatly appreciate it if you cite it via:

Pradeep Reddy Raamana, (2019), "missingdata python library for visualization and handling of missing values" (Version v0.1). Zenodo. http://doi.org/10.5281/zenodo.3352336 DOI: 10.5281/zenodo.3352336
