add: deep transformer module (#24)
chrislemke committed Jan 4, 2023
1 parent 0306fc5 commit fd266cf
Showing 27 changed files with 3,144 additions and 268 deletions.
11 changes: 9 additions & 2 deletions .github/workflows/build-docs.yml
Original file line number Diff line number Diff line change
@@ -11,9 +11,16 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
- name: Install poetry
run: |
curl -O -sSL https://install.python-poetry.org/install-poetry.py
python install-poetry.py -y --version 1.3.1
echo "PATH=${HOME}/.poetry/bin:${PATH}" >> $GITHUB_ENV
rm install-poetry.py
- uses: actions/setup-python@v4
with:
python-version: "3.10"
- run: pip install poetry==1.3.1
cache: "poetry"
- run: poetry env use python3.10
- run: poetry install --with docs
- run: poetry run mkdocs gh-deploy --force --clean --verbose
11 changes: 9 additions & 2 deletions .github/workflows/code-cov.yml
@@ -8,13 +8,20 @@ jobs:
steps:
- name: Checkout
uses: actions/checkout@v2
- name: Install poetry
run: |
curl -O -sSL https://install.python-poetry.org/install-poetry.py
python install-poetry.py -y --version 1.3.1
echo "PATH=${HOME}/.poetry/bin:${PATH}" >> $GITHUB_ENV
rm install-poetry.py
- name: Set up Python 3.10
uses: actions/setup-python@v2
- uses: actions/setup-python@v4
with:
python-version: "3.10"
cache: "poetry"
- name: Install dependencies and project
run: |
python -m pip install poetry==1.3.1
poetry env use python3.10
poetry install --with test
- name: Run tests and collect coverage
run: poetry run pytest --cov src/sk_transformers --cov-report term-missing --cov-report xml
11 changes: 9 additions & 2 deletions .github/workflows/deploy-package.yml
@@ -12,9 +12,16 @@ jobs:
environment: deploy-package
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
- name: Install poetry
run: |
curl -O -sSL https://install.python-poetry.org/install-poetry.py
python install-poetry.py -y --version 1.3.1
echo "PATH=${HOME}/.poetry/bin:${PATH}" >> $GITHUB_ENV
rm install-poetry.py
- uses: actions/setup-python@v4
with:
python-version: "3.10"
- run: pip install poetry==1.3.1
cache: "poetry"
- run: poetry env use python3.10
- run: poetry config pypi-token.pypi ${{ secrets.PYPI_TOKEN }}
- run: poetry publish --build
4 changes: 2 additions & 2 deletions .github/workflows/pr-title.yml
@@ -11,7 +11,7 @@ jobs:
steps:
- uses: deepakputhraya/action-pr-title@master
with:
regex: "^(build:|ci:|docs:|feat:|fix:|perf:|refactor:|revert:|style:|test:|security:).{12,30}$"
regex: "^(add:|build:|ci:|docs:|feat:|fix:|bug:|perf:|refactor:|revert:|style:|test:|security:).{12,60}$"
min_length: 10
max_length: 20
max_length: 60
github_token: "${{ secrets.GITHUB_TOKEN }}"
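
To illustrate what the widened pattern accepts, the sketch below checks a few titles against the old and new `regex` values from this hunk. It uses Python's `re` module, whereas the action evaluates the pattern itself; for a pattern this simple the results should agree, but that is an assumption. The helper name `title_matches` and every sample title other than this commit's own are made up for illustration.

```python
import re

# Patterns copied from the pr-title.yml hunk above.
OLD_PATTERN = r"^(build:|ci:|docs:|feat:|fix:|perf:|refactor:|revert:|style:|test:|security:).{12,30}$"
NEW_PATTERN = r"^(add:|build:|ci:|docs:|feat:|fix:|bug:|perf:|refactor:|revert:|style:|test:|security:).{12,60}$"


def title_matches(pattern: str, title: str) -> bool:
    """Return True if the whole title satisfies the pattern (hypothetical helper)."""
    return re.match(pattern, title) is not None


if __name__ == "__main__":
    samples = [
        "add: deep transformer module (#24)",  # this commit's title
        "bug: handle empty dataframe input",   # hypothetical title
        "fix: typo",                           # hypothetical, body shorter than 12 characters
    ]
    for title in samples:
        old_ok = title_matches(OLD_PATTERN, title)
        new_ok = title_matches(NEW_PATTERN, title)
        print(f"{title!r}: old={old_ok}, new={new_ok}")
```

The new `add:` and `bug:` prefixes are only accepted by the new pattern, and the body after the prefix may now be up to 60 characters long instead of 30.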
28 changes: 28 additions & 0 deletions .github/workflows/pre-commit.yml
@@ -0,0 +1,28 @@
name: PreCommit

on:
pull_request:
branches: [develop]

jobs:
pre-commit:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Install poetry
run: |
curl -O -sSL https://install.python-poetry.org/install-poetry.py
python install-poetry.py -y --version 1.3.1
echo "PATH=${HOME}/.poetry/bin:${PATH}" >> $GITHUB_ENV
rm install-poetry.py
- uses: actions/setup-python@v4
with:
python-version: "3.10"
cache: "poetry"
- name: Install dependencies and project
run: |
poetry env use python3.10
poetry install
- uses: pre-commit/action@v3.0.0
env:
SKIP: poetry-lock,poetry-export
2 changes: 1 addition & 1 deletion .github/workflows/release.yml
@@ -14,4 +14,4 @@ jobs:
prerelease: true
default-branch: main
pull-request-header: ":robot: I have created a release *beep* *boop*. This was predictable."
changelog-types: '[{"type":"add","section":"Features","hidden":false},{"type":"feat","section":"Features","hidden":false},{"type":"fix","section":"Bug Fixes","hidden":false},{"type":"chore","section":"Miscellaneous","hidden":true},{"type":"test","section":"Tests","hidden":false},{"type":"ci","section":"CI/CD","hidden":false},{"type":"refactor","section":"Maintenance","hidden":false},{"type":"perf","section":"Maintenance","hidden":false},{"type":"revert","section":"Maintenance","hidden":false},{"type":"docs","section":"Documentation","hidden":false},{"type":"security","section":"Security","hidden":false}]'
changelog-types: '[{"type":"merge","section":"Miscellaneous","hidden":true},{"type":"resolve","section":"Miscellaneous","hidden":true},{"type":"add","section":"Features","hidden":false},{"type":"feat","section":"Features","hidden":false},{"type":"fix","section":"Bug Fixes","hidden":false},{"type":"bug","section":"Bug Fixes","hidden":false},{"type":"chore","section":"Miscellaneous","hidden":true},{"type":"test","section":"Tests","hidden":false},{"type":"ci","section":"CI/CD","hidden":false},{"type":"refactor","section":"Maintenance","hidden":false},{"type":"perf","section":"Maintenance","hidden":false},{"type":"revert","section":"Maintenance","hidden":false},{"type":"docs","section":"Documentation","hidden":false},{"type":"security","section":"Security","hidden":false}]'
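
The one-line JSON value is hard to read in a diff. As a rough aid, the sketch below parses the new `changelog-types` string with Python's `json` module and groups the visible commit types by changelog section. The function name `visible_sections` is hypothetical, and the grouping only illustrates the configuration; it is not how the release action itself processes the value.

```python
import json
from collections import defaultdict

# New `changelog-types` value, copied from the release.yml hunk above.
CHANGELOG_TYPES = (
    '[{"type":"merge","section":"Miscellaneous","hidden":true},'
    '{"type":"resolve","section":"Miscellaneous","hidden":true},'
    '{"type":"add","section":"Features","hidden":false},'
    '{"type":"feat","section":"Features","hidden":false},'
    '{"type":"fix","section":"Bug Fixes","hidden":false},'
    '{"type":"bug","section":"Bug Fixes","hidden":false},'
    '{"type":"chore","section":"Miscellaneous","hidden":true},'
    '{"type":"test","section":"Tests","hidden":false},'
    '{"type":"ci","section":"CI/CD","hidden":false},'
    '{"type":"refactor","section":"Maintenance","hidden":false},'
    '{"type":"perf","section":"Maintenance","hidden":false},'
    '{"type":"revert","section":"Maintenance","hidden":false},'
    '{"type":"docs","section":"Documentation","hidden":false},'
    '{"type":"security","section":"Security","hidden":false}]'
)


def visible_sections(raw: str) -> dict:
    """Group commit types by section, skipping entries marked hidden."""
    sections = defaultdict(list)
    for entry in json.loads(raw):
        if not entry["hidden"]:
            sections[entry["section"]].append(entry["type"])
    return dict(sections)


if __name__ == "__main__":
    for section, types in visible_sections(CHANGELOG_TYPES).items():
        print(f"{section}: {', '.join(types)}")
```

In this configuration `merge`, `resolve`, and `chore` are hidden, so no "Miscellaneous" section would appear in the grouped result.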
15 changes: 13 additions & 2 deletions .github/workflows/testing.yml
@@ -9,17 +9,28 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.8", "3.9", "3.10", "3.11"]
python-version: ["3.8", "3.9", "3.10"]

steps:
- uses: actions/checkout@v3
- name: Install poetry
run: |
curl -O -sSL https://install.python-poetry.org/install-poetry.py
python install-poetry.py -y --version 1.3.1
echo "PATH=${HOME}/.poetry/bin:${PATH}" >> $GITHUB_ENV
rm install-poetry.py
- uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
cache: "poetry"
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
cache: "poetry"
- name: Install dependencies and project
run: |
python -m pip install poetry==1.3.1
poetry env use ${{ matrix.python-version }}
poetry install --with test
- name: Check with isort
run: |
9 changes: 7 additions & 2 deletions .pre-commit-config.yaml
@@ -1,3 +1,6 @@
default_language_version:
python: python3.10

repos:
- repo: https://github.com/compilerla/conventional-pre-commit
rev: v2.1.1
@@ -16,10 +19,12 @@ repos:
"test",
"security",
"perf",
"resolve",
"merge",
]

- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.3.0
rev: v4.4.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
@@ -75,7 +80,7 @@ repos:
- "-r"

- repo: https://github.com/python-poetry/poetry
rev: 1.3.1
rev: 1.3.0
hooks:
- id: poetry-check
- id: poetry-lock
10 changes: 10 additions & 0 deletions SECURITY.md
@@ -0,0 +1,10 @@
@@ -0,0 +1,9 @@
# Security Policy

## Supported Versions

Patches will be released for the latest major version.

## Reporting a Vulnerability

Please report (suspected) security vulnerabilities to [chris@syhbl.mozmail.com](mailto:chris@syhbl.mozmail.com). If the issue is confirmed, we will release a patch as soon as possible depending on the complexity.
1 change: 1 addition & 0 deletions docs/API-reference/transformer/deep_transformer.md
@@ -0,0 +1 @@
::: sk_transformers.deep_transformer
22 changes: 22 additions & 0 deletions docs/README.md
@@ -43,6 +43,28 @@ With [Poetry](https://python-poetry.org/):
poetry install
```

## Available transformers
| Module | Transformer | Description |
| ------ | ----------- | ----------- |
|[`Datetime transformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/datetime_transformer/)|[`DurationCalculatorTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/datetime_transformer/#sk_transformers.datetime_transformer.DurationCalculatorTransformer)|Calculates the duration between two given dates.|
|[`Deep transformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/deep_transformer/)|[`ToVecTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/deep_transformer/#sk_transformers.deep_transformer.ToVecTransformer)|This transformer trains an [FT-Transformer](https://paperswithcode.com/method/ft-transformer) using the [pytorch-widedeep package](https://github.com/jrzaurin/pytorch-widedeep) and extracts the embeddings from its embedding layer.|
|[`Encoder transformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/encoder_transformer/)|[`MeanEncoderTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/encoder_transformer/#sk_transformers.encoder_transformer.MeanEncoderTransformer)|Scikit-learn API for the [feature-engine MeanEncoder](https://feature-engine.readthedocs.io/en/latest/api_doc/encoding/MeanEncoder.html).|
|[`Generic transformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/)|[`AggregateTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/#sk_transformers.generic_transformer.AggregateTransformer)|This transformer uses the Pandas `groupby` and `aggregate` methods to apply a function to a column grouped by another column.|
|[`Generic transformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/)|[`ColumnDropperTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/#sk_transformers.generic_transformer.ColumnDropperTransformer)|Drops columns from a dataframe using Pandas drop method.|
|[`Generic transformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/)|[`DtypeTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/#sk_transformers.generic_transformer.DtypeTransformer)|Transformer that converts a column to a different dtype.|
|[`Generic transformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/)|[`FunctionsTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/#sk_transformers.generic_transformer.FunctionsTransformer)|This transformer is a plain wrapper around the [sklearn.preprocessing.FunctionTransformer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.FunctionTransformer.html).|
|[`Generic transformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/)|[`MapTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/#sk_transformers.generic_transformer.MapTransformer)|This transformer iterates over all columns in the `features` list and applies the given callback to the column. For this it uses the `pandas.Series.map` method.|
|[`Generic transformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/)|[`NaNTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/#sk_transformers.generic_transformer.NaNTransformer)|Replace NaN values with a specified value. Internally Pandas fillna method is used.|
|[`Generic transformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/)|[`QueryTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/#sk_transformers.generic_transformer.QueryTransformer)|Applies a list of queries to a dataframe. If it operates on a dataset used for supervised learning this transformer should be applied on the dataframe containing `X` and `y`.|
|[`Generic transformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/)|[`ValueIndicatorTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/#sk_transformers.generic_transformer.ValueIndicatorTransformer)|Adds a column to a dataframe indicating if a value is equal to a specified value.|
|[`Generic transformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/)|[`ValueReplacerTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/#sk_transformers.generic_transformer.ValueReplacerTransformer)|Uses Pandas replace method to replace values in a column.|
|[`Number transformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/number_transformer/)|[`MathExpressionTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/number_transformer/#sk_transformers.number_transformer.MathExpressionTransformer)|Applies an operation to a column and a given value or column. The operation can be any operation from the `numpy` or `operator` package.|
|[`String transformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/string_transformer/)|[`EmailTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/string_transformer/#sk_transformers.string_transformer.EmailTransformer)|Transforms an email address into multiple features.|
|[`String transformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/string_transformer/)|[`IPAddressEncoderTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/string_transformer/#sk_transformers.string_transformer.IPAddressEncoderTransformer)|Encodes IPv4 and IPv6 string addresses to a float representation.|
|[`String transformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/string_transformer/)|[`PhoneTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/string_transformer/#sk_transformers.string_transformer.PhoneTransformer)|Transforms a phone number into multiple features.|
|[`String transformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/string_transformer/)|[`StringSimilarityTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/string_transformer/#sk_transformers.string_transformer.StringSimilarityTransformer)|Calculates the similarity between two strings using the `gestalt pattern matching` algorithm from the `SequenceMatcher` class.|
|[`String transformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/string_transformer/)|[`StringSlicerTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/string_transformer/#sk_transformers.string_transformer.StringSlicerTransformer)|Slices all entries of specified string features using the slice() function.|
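
For readers unfamiliar with the gestalt pattern matching mentioned for `StringSimilarityTransformer`, the sketch below shows the underlying standard-library building block, `difflib.SequenceMatcher`. It is not the transformer's implementation, which is not shown in this diff; the helper name `similarity` and the sample strings are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(left: str, right: str) -> float:
    """Return the SequenceMatcher ratio of two strings, a value in [0.0, 1.0].

    Hypothetical helper: it only wraps the standard-library class named in the
    table and makes no claim about how the transformer calls it.
    """
    return SequenceMatcher(None, left, right).ratio()


if __name__ == "__main__":
    # Identical strings give the maximum score, unrelated strings the minimum.
    print(similarity("transformer", "transformer"))
    print(similarity("abc", "xyz"))
    # Near-duplicates land somewhere in between.
    print(similarity("apple", "appel"))
```

A ratio close to 1.0 indicates near-identical strings, which is what makes such a score usable as a numeric feature.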

## Usage
Let's assume you want to use some method from NumPy's mathematical functions to sum up the values of column `foo` and column `bar`. You could
use the [`MathExpressionTransformer`](https://chrislemke.github.io/sk-transformers/number_transformer-reference/#sk-transformers.transformer.number_transformer.MathExpressionTransformer).
Binary file added docs/assets/images/robot.png
2 changes: 1 addition & 1 deletion docs/stylesheets/extra.css
@@ -1,5 +1,5 @@
:root {
--md-primary-fg-color: #fc7805;
--md-primary-fg-color: #5b90b9;
--md-primary-fg-color--light: #F501FE;
--md-primary-fg-color--dark: #3820AA;
}
6 changes: 3 additions & 3 deletions mkdocs.yml
@@ -26,8 +26,8 @@ theme:
repo: fontawesome/brands/github
name: material
language: en
favicon: assets/images/icon.png
logo: assets/images/icon.png
favicon: assets/images/robot.png
logo: assets/images/robot.png
features:
- navigation.tracking
- navigation.top
@@ -52,7 +52,7 @@ plugins:
- mkdocstrings:
handlers:
python:
rendering:
options:
show_root_heading: false

extra_javascript: