peperoncino: A library for easy data processing for pandas

Install

$ pip install peperoncino

How to use

Processing DataFrame

import peperoncino as pp

pipeline = pp.Pipeline(
    # query data
    pp.Query("bar <= 3"),
    # assign new feature
    pp.Assign(hoge="foo * bar"),
    # generate combination feature
    pp.Combinations(["foo", "baz"], ["*", "/"]),
    # target encoding
    pp.TargetEncoding(["baz"], "y", ref=0),
    # select features
    pp.Select(
        ["hoge", "*_foo_baz", "TARGET_ENC_baz_BY_y", "y"],
        lackable_cols=["y"],
    )
)

# execute the processing
train_df, val_df, test_df = \
    pipeline.process([train_df, val_df, test_df])
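
For orientation, here is a rough plain-pandas picture of the first two steps of the pipeline. It assumes that `pp.Query` filters rows the way `DataFrame.query` does and that `pp.Assign` evaluates its formula the way `DataFrame.eval` does; both are assumptions drawn from the names and arguments above, and the sample data is made up.

```python
import pandas as pd

# Made-up sample data with the column names used in the pipeline above.
train_df = pd.DataFrame(
    {
        "foo": [1, 2, 3, 4],
        "bar": [1, 2, 3, 4],
        "baz": [2, 2, 4, 4],
        "y": [0, 1, 0, 1],
    }
)

# Assumed stand-in for pp.Query("bar <= 3"): keep rows matching the condition.
queried = train_df.query("bar <= 3")

# Assumed stand-in for pp.Assign(hoge="foo * bar"): add a column from a formula.
assigned = queried.assign(hoge=queried.eval("foo * bar"))

print(assigned)
```

The pipeline version applies the same steps to every data frame in the list in one call.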

Predefined processings

name	description
`ApplyColumn`	Apply a function to a column.
`AsCategory`	Assign `category` dtype to columns.
`Assign`	Assign a feature by a formula.
`Combinations`	Create combination features.
`DropColumns`	Drop columns.
`DropDuplicates`	Drop duplicate rows.
`Pipeline`	Chain processings.
`Query`	Query rows by a given condition.
`RenameColumns`	Rename columns.
`Select`	Select columns.
`StatsEncoding`	Encode columns by statistical values of another column.
`TargetEncoding`	Target Encoding with smoothing.
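
The `TargetEncoding` entry mentions smoothing. As a general illustration of what smoothed target encoding means, here is a minimal plain-pandas sketch. The function name `smoothed_target_encoding`, the weight `m`, and the formula itself are assumptions chosen for illustration; peperoncino may compute its encoding differently.

```python
import pandas as pd


def smoothed_target_encoding(
    ref_df: pd.DataFrame, col: str, target: str, m: float = 1.0
) -> pd.Series:
    """Map each category of `col` to a target mean shrunk toward the global mean.

    Hypothetical helper: `m` acts like a number of pseudo-observations that
    carry the global mean, so rare categories are pulled toward it.
    """
    global_mean = ref_df[target].mean()
    stats = ref_df.groupby(col)[target].agg(["mean", "count"])
    return (stats["count"] * stats["mean"] + m * global_mean) / (stats["count"] + m)


# Made-up reference data.
train_df = pd.DataFrame({"baz": ["a", "a", "b"], "y": [1.0, 0.0, 1.0]})

encoding = smoothed_target_encoding(train_df, "baz", "y", m=1.0)
train_df["baz_enc"] = train_df["baz"].map(encoding)
print(train_df)
```

A larger `m` moves every category closer to the global mean; `m=0` gives the plain per-category mean.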

Define processing

All processings are subclasses of pp.BaseProcessing.
All you need to do is define the _process(self, dfs: List[pd.DataFrame]) -> List[pd.DataFrame] method.

from typing import List
import pandas as pd

class ExampleProcessing(pp.BaseProcessing):
    def _process(self, dfs: List[pd.DataFrame]) -> List[pd.DataFrame]:
        return [df + 1 for df in dfs]

If your processing handles each data frame independently of the others, use pp.SeparatedProcessing.

class ExampleProcessing(pp.SeparatedProcessing):
    def sep_process(self, df: pd.DataFrame) -> pd.DataFrame:
        return df * 2

If you need to merge all dataframes and then apply your processing, use pp.MergedProcessing.

class ExampleProcessing(pp.MergedProcessing):
    def simul_process(self, df: pd.DataFrame) -> pd.DataFrame:
        return df.assign(col1_mean=df['col1'].mean())
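
To make the difference between the separated and merged styles concrete, here is a minimal sketch of how such base classes could work. The classes `SketchBase`, `SketchSeparated`, and `SketchMerged` are hypothetical stand-ins written for this example; they are not peperoncino's implementation, and the way the merged result is split back by row position is an assumption.

```python
from typing import List

import pandas as pd


class SketchBase:
    """Hypothetical stand-in for pp.BaseProcessing (not the library's code)."""

    def process(self, dfs: List[pd.DataFrame]) -> List[pd.DataFrame]:
        return self._process(dfs)

    def _process(self, dfs: List[pd.DataFrame]) -> List[pd.DataFrame]:
        raise NotImplementedError


class SketchSeparated(SketchBase):
    """Applies sep_process to each data frame on its own."""

    def _process(self, dfs: List[pd.DataFrame]) -> List[pd.DataFrame]:
        return [self.sep_process(df) for df in dfs]

    def sep_process(self, df: pd.DataFrame) -> pd.DataFrame:
        raise NotImplementedError


class SketchMerged(SketchBase):
    """Concatenates all frames, applies simul_process once, then splits back."""

    def _process(self, dfs: List[pd.DataFrame]) -> List[pd.DataFrame]:
        merged = pd.concat(dfs, ignore_index=True)
        out = self.simul_process(merged)
        # Assumes simul_process keeps the number and order of rows.
        results, start = [], 0
        for df in dfs:
            part = out.iloc[start:start + len(df)].copy()
            part.index = df.index  # restore each frame's original index
            results.append(part)
            start += len(df)
        return results

    def simul_process(self, df: pd.DataFrame) -> pd.DataFrame:
        raise NotImplementedError


class Double(SketchSeparated):
    def sep_process(self, df: pd.DataFrame) -> pd.DataFrame:
        return df * 2


class Col1Mean(SketchMerged):
    def simul_process(self, df: pd.DataFrame) -> pd.DataFrame:
        return df.assign(col1_mean=df["col1"].mean())


# Made-up sample data.
train_df = pd.DataFrame({"col1": [1.0, 2.0]})
test_df = pd.DataFrame({"col1": [3.0]})

doubled_train, doubled_test = Double().process([train_df, test_df])
mean_train, mean_test = Col1Mean().process([train_df, test_df])
print(mean_train)
print(mean_test)
```

In this sketch the mean of `col1` is computed over the rows of both frames together, which is the point of the merged style: statistics are shared across train and test rather than computed per frame.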
