InstaEDA

Quick and easy way to clean data and build exploratory data analysis plots.

The idea grew out of the data projects we have been building in the UBC MDS program, where the same handful of tasks recur whenever we first explore a data set. This package takes a raw data set and handles some of that data cleansing and plotting with a minimal amount of code.

This package currently supports pandas DataFrames as input; we may assess other input types to support in the future. Most functions work on the numerical columns of the DataFrame; some also support string (dtype: object) columns.

The main components of this package are:

Data Checking
- Plot basic information for input data: takes the input data, a plot title, and a list of configurations to pass to the theme, and returns an Altair object with summary metrics. These include memory usage and a basic description of the input data: the distribution of discrete columns, continuous columns, all-missing columns, complete rows, and missing observations.
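The summary metrics listed above can be approximated with plain pandas. The following is a minimal sketch, not the package's implementation: `intro_metrics` is a hypothetical helper, and treating numeric columns as "continuous" and all other columns as "discrete" is an assumption made for illustration.

```python
import pandas as pd


def intro_metrics(df):
    """Hypothetical helper: summary counts of the kind an intro plot could show."""
    # Assumption: numeric dtypes count as continuous, everything else as discrete.
    numeric_cols = df.select_dtypes(include="number").columns
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "continuous_columns": len(numeric_cols),
        "discrete_columns": df.shape[1] - len(numeric_cols),
        # Columns in which every value is missing.
        "all_missing_columns": int(df.isna().all().sum()),
        # Rows with no missing value in any column.
        "complete_rows": int(df.notna().all(axis=1).sum()),
        # Total number of missing cells.
        "missing_observations": int(df.isna().sum().sum()),
        "memory_usage_bytes": int(df.memory_usage(deep=True).sum()),
    }


example_df = pd.DataFrame(
    {
        "x": [1.0, None, 3.0],
        "y": ["a", "b", None],
        "z": [None, None, None],
    }
)
metrics = intro_metrics(example_df)
print(metrics)
```

A dictionary like this could then be turned into a long-form table and handed to a charting library for display.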
Data Cleansing
- Custom imputation of missing values in a data frame: fills missing values using the mean, median, constant, or most_frequent strategy. A divide-and-fill feature splits the DataFrame into parts, then applies the chosen imputation to each part separately. There is also a parameter to randomly shuffle the DataFrame.
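To make the divide-and-fill idea concrete, here is a minimal pandas-only sketch. It is an illustration under stated assumptions rather than the package's own code: `fill_by_parts` and its parameters (`n_parts`, `strategy`, `fill_value`, `random_state`) are hypothetical names, the parts are assumed to be contiguous row chunks, and the shuffle is assumed to happen before splitting.

```python
import numpy as np
import pandas as pd

STRATEGIES = {"mean", "median", "constant", "most_frequent"}


def fill_by_parts(df, n_parts=2, strategy="mean", fill_value=0, random_state=None):
    """Hypothetical helper: split rows into parts and impute each part on its own."""
    if strategy not in STRATEGIES:
        raise ValueError(f"strategy must be one of {sorted(STRATEGIES)}")
    if random_state is not None:
        # Assumption: shuffle first, so each part is a random subset of rows.
        df = df.sample(frac=1, random_state=random_state)
    # Assign each row (by position) to one of n_parts contiguous chunks.
    part_ids = np.arange(len(df)) * n_parts // max(len(df), 1)
    filled_parts = []
    for _, part in df.groupby(part_ids):
        part = part.copy()
        for col in part.columns:
            if strategy == "constant":
                value = fill_value
            elif strategy == "most_frequent":
                modes = part[col].mode()
                value = modes.iloc[0] if not modes.empty else None
            elif pd.api.types.is_numeric_dtype(part[col]):
                value = part[col].mean() if strategy == "mean" else part[col].median()
            else:
                continue  # mean and median do not apply to non-numeric columns
            if value is not None and not pd.isna(value):
                part[col] = part[col].fillna(value)
        filled_parts.append(part)
    return pd.concat(filled_parts)


raw = pd.DataFrame(
    {
        "x": [1.0, np.nan, 3.0, 10.0, np.nan, 30.0],
        "label": ["a", "a", None, "b", None, "b"],
    }
)
filled = fill_by_parts(raw, n_parts=2, strategy="mean")
print(filled)
```

In this example the first three rows and the last three rows are imputed independently, so each missing `x` takes the mean of its own part rather than the mean of the whole column.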
Exploratory Visualization
- Numerical correlation plot: takes a data frame, selects the numerical columns, and outputs a correlation plot object. The user can optionally pass a subset of columns to compare.
- Basic distribution plots by data type: takes a data frame and, based on the parameters, returns histograms, bar charts, or other chart types depending on each column's data type.
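As a rough picture of the data that sits behind a numerical correlation plot, the sketch below selects the numeric columns, computes pairwise correlations, and reshapes them into the long form that heatmap-style charts typically consume. `numeric_corr_long` and the column names `var_1`, `var_2`, and `corr` are hypothetical and chosen for illustration; the package's own function may work differently.

```python
import pandas as pd


def numeric_corr_long(df, columns=None, method="pearson"):
    """Hypothetical helper: pairwise correlations of numeric columns in long form."""
    # Keep only numeric columns; string columns are ignored.
    numeric = df.select_dtypes(include="number")
    if columns is not None:
        # Optional subset of columns to compare.
        numeric = numeric[[c for c in columns if c in numeric.columns]]
    corr = numeric.corr(method=method)
    # One row per (var_1, var_2) pair.
    return (
        corr.rename_axis("var_1")
        .reset_index()
        .melt(id_vars="var_1", var_name="var_2", value_name="corr")
    )


measurements = pd.DataFrame(
    {
        "a": [1, 2, 3, 4],
        "b": [2, 4, 6, 8],
        "c": [4, 3, 2, 1],
        "name": ["w", "x", "y", "z"],
    }
)
corr_long = numeric_corr_long(measurements)
print(corr_long)
```

Passing `columns=["a", "c"]` would restrict the comparison to those two columns.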

There are a myriad of packages that provide similar functionality in the Python ecosystem. A few of the more popular packages include:

Installation

$ pip install -i https://test.pypi.org/simple/ instaeda

Dependencies

python = "^3.8"
pandas = "^1.2.3"
palmerpenguins = "^0.1.4"
altair = "^4.1.0"
numpy = "^1.20.1"
vega-datasets = "^0.9.0"
scikit-learn = "^0.24.1"

Usage

from palmerpenguins import load_penguins
from instaeda import instaeda

penguin_df = load_penguins()

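# plot_intro: summary metrics for the input data
instaeda.plot_intro(penguin_df)

# plot_corr: correlation plot of the numerical columns
instaeda.plot_corr(penguin_df)

# divide_and_fill: impute missing values
instaeda.divide_and_fill(penguin_df)

# plot_basic_distributions: returns plots keyed by column name
dict_plots = instaeda.plot_basic_distributions(penguin_df)
dict_plots['bill_length_mm']
dict_plots['species']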

Documentation

The official documentation is hosted on Read the Docs: https://instaeda.readthedocs.io/en/latest/

Contributors

We welcome and recognize all contributions. You can see a list of current contributors at the bottom of CONTRIBUTING.rst.

Credits

This package was created with Cookiecutter and the UBC-MDS/cookiecutter-ubc-mds project template, modified from the pyOpenSci/cookiecutter-pyopensci project template and the audreyr/cookiecutter-pypackage.
