Merged
14 changes: 9 additions & 5 deletions .github/workflows/api-docs.yml
@@ -1,15 +1,19 @@
---
name: API Docs
env:
- DEFAULT_KHIOPS_PYTHON_TUTORIAL_REVISION: main
+ DEFAULT_KHIOPS_PYTHON_TUTORIAL_REVISION: 11.0.0.0-b.0
+ DEFAULT_KHIOPS_SAMPLES_REVISION: 11.0.0
on:
workflow_dispatch:
inputs:
khiops-python-tutorial-revision:
- default: main
+ default: 11.0.0.0-b.0
description: khiops-python-tutorial repo revision
+ khiops-samples-revision:
+ default: 11.0.0
+ description: khiops-samples repo revision
image-tag:
- default: 11.0.0-a.0.0
+ default: 11.0.0-b.0.0
description: Development Docker Image Tag
pull_request:
paths:
Expand Down Expand Up @@ -41,7 +45,7 @@ jobs:
# because the `env` context is only accessible at the step level;
# hence, it is hard-coded
image: |-
- ghcr.io/khiopsml/khiops-python/khiopspydev-ubuntu22.04:${{ inputs.image-tag || '11.0.0-a.0.0' }}
+ ghcr.io/khiopsml/khiops-python/khiopspydev-ubuntu22.04:${{ inputs.image-tag || '11.0.0-b.0.0' }}
# Use the 'runner' user (1001) from github so checkout actions work properly
# https://github.com/actions/runner/issues/2033#issuecomment-1598547465
options: --user 1001
@@ -56,7 +60,7 @@ jobs:
run: |
# Install package itself to install the samples datasets
pip3 install .
- kh-download-datasets --force-overwrite
+ kh-download-datasets --force-overwrite --version ${{ inputs.khiops-samples-revision || env.DEFAULT_KHIOPS_SAMPLES_REVISION }}
kh-status

# Install the doc python requirements
4 changes: 2 additions & 2 deletions .github/workflows/conda.yml
@@ -3,13 +3,13 @@ name: Conda Package
env:
# Note: The default Khiops version must never be an alpha release as they are
# ephemeral. To test alpha versions run the workflow manually.
- DEFAULT_KHIOPS_CORE_VERSION: 11.0.0a.0
+ DEFAULT_KHIOPS_CORE_VERSION: 11.0.0b.0
DEFAULT_SAMPLES_VERSION: 11.0.0
on:
workflow_dispatch:
inputs:
khiops-core-version:
- default: 11.0.0a.0
+ default: 11.0.0b.0
description: khiops-core version for testing
khiops-samples-version:
default: 11.0.0
4 changes: 2 additions & 2 deletions .github/workflows/dev-docker.yml
@@ -1,7 +1,7 @@
---
name: Dev Docker
env:
- DEFAULT_KHIOPS_REVISION: 11.0.0-a.0
+ DEFAULT_KHIOPS_REVISION: 11.0.0-b.0
DEFAULT_IMAGE_INCREMENT: 0
DEFAULT_SERVER_REVISION: main
DEFAULT_PYTHON_VERSIONS: 3.8 3.9 3.10 3.11 3.12 3.13
@@ -14,7 +14,7 @@ on:
inputs:
khiops-revision:
type: string
- default: 11.0.0-a.0
+ default: 11.0.0-b.0
description: Khiops Revision
image-increment:
type: number
4 changes: 2 additions & 2 deletions .github/workflows/pip.yml
@@ -9,7 +9,7 @@ on:
default: 11.0.0
description: khiops-samples repo revision
image-tag:
- default: 11.0.0-a.0.0
+ default: 11.0.0-b.0.0
description: Development Docker Image Tag
pull_request:
paths:
Expand Down Expand Up @@ -64,7 +64,7 @@ jobs:
# because the `env` context is only accessible at the step level;
# hence, it is hard-coded
image: |-
- ghcr.io/khiopsml/khiops-python/khiopspydev-${{ matrix.container }}:${{ inputs.image-tag || '11.0.0-a.0.0' }}
+ ghcr.io/khiopsml/khiops-python/khiopspydev-${{ matrix.container }}:${{ inputs.image-tag || '11.0.0-b.0.0' }}
steps:
- name: Set parameters as env
run: |
10 changes: 5 additions & 5 deletions .github/workflows/tests.yml
@@ -2,18 +2,18 @@
name: Tests
env:
DEFAULT_SAMPLES_REVISION: 11.0.0
- DEFAULT_KHIOPS_DESKTOP_REVISION: 11.0.0-a.0
+ DEFAULT_KHIOPS_DESKTOP_REVISION: 11.0.0-b.0
on:
workflow_dispatch:
inputs:
samples-revision:
default: 11.0.0
description: Git Tag/Branch/Commit for the khiops-samples Repo
image-tag:
- default: 11.0.0-a.0.0
+ default: 11.0.0-b.0.0
description: Development Docker Image Tag
khiops-desktop-revision:
- default: 11.0.0-a.0
+ default: 11.0.0-b.0
description: Khiops Windows Desktop Application Version
run-expensive-tests:
type: boolean
Expand Down Expand Up @@ -43,7 +43,7 @@ jobs:
# because the `env` context is only accessible at the step level;
# hence, it is hard-coded
image: |-
- ghcr.io/khiopsml/khiops-python/khiopspydev-ubuntu22.04:${{ inputs.image-tag || '11.0.0-a.0.0' }}
+ ghcr.io/khiopsml/khiops-python/khiopspydev-ubuntu22.04:${{ inputs.image-tag || '11.0.0-b.0.0' }}
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
Expand Down Expand Up @@ -315,7 +315,7 @@ jobs:
# because the `env` context is only accessible at the step level;
# hence, it is hard-coded
image: |-
- ghcr.io/khiopsml/khiops-python/khiopspydev-${{ matrix.container }}:${{ inputs.image-tag || '11.0.0-a.0.0' }}
+ ghcr.io/khiopsml/khiops-python/khiopspydev-${{ matrix.container }}:${{ inputs.image-tag || '11.0.0-b.0.0' }}
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
64 changes: 63 additions & 1 deletion CHANGELOG.md
@@ -6,11 +6,73 @@
- Example: 10.2.1.4 is the 5th version that supports khiops 10.2.1.
- Internals: Changes in *Internals* sections are unlikely to be of interest for data scientists.
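
The numbering scheme described above can be illustrated with a minimal Python sketch that splits a khiops-python version string into the Khiops version it supports and its own increment. The helper name `split_khiops_python_version` is hypothetical and not part of the package, and treating everything after the first `-` as an opaque pre-release suffix is an assumption made only for this sketch.

```python
def split_khiops_python_version(version):
    """Split 'X.Y.Z.N[-suffix]' into (khiops_version, increment, suffix).

    Hypothetical helper for illustration; not part of khiops-python.
    """
    # Assumption: anything after the first '-' is a pre-release suffix
    release, _, suffix = version.partition("-")
    parts = release.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got '{release}'")

    # The first three fields name the supported Khiops version,
    # the fourth is the zero-based khiops-python increment
    khiops_version = ".".join(parts[:3])
    increment = int(parts[3])
    return khiops_version, increment, suffix or None


# '10.2.1.4' has increment 4, i.e. the 5th version supporting Khiops 10.2.1
print(split_khiops_python_version("10.2.1.4"))
print(split_khiops_python_version("11.0.0.0-b.0"))
```

The increment is zero-based, which is why `10.2.1.4` is the 5th version for Khiops 10.2.1.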

- ## Unreleased
+ ## 11.0.0.0-b.0 - 2025-07-10

### Added
- (`core`) API support for predictor interpretation and reinforcement.
- (`core`) API support for instance-variable coclustering model training.
- (`core`) Support for text types in prediction and coclustering models.
- (`core`) Analysis and coclustering report JSON serialization support.
- (`sklearn`) Automatic removal of newline characters in strings on Pandas
dataframe columns. This is to ensure the proper working of the Khiops engine.

### Changed
- (`core`) Syntax for additional data tables specification, which uses the data
paths.
- (`core`) API specification of the results path: full paths to report files are
now used instead of result directories.
- (`sklearn`) Specification of the hierarchical multi-table schemata, which now
uses data paths as in the Core API.
- (`general`) Various other changes and updates for Khiops 11.0.0-b.0
compatibility.

### Deprecated
- (`core`) The results directory parameter of the Core API functions. The full
path to the reports must now be specified instead.
- (`core`) The "``"-based secondary table path specification. The "/"-based data
paths must now be used instead.
- (`sklearn`) The specification syntax for hierarchical multi-table datasets.
The "/"-based data paths must now be used instead, as in the Core API.

### Removed
- (`general`) All functions, attributes and features that had been deprecated in
the 10.3.2.0 version.

## 10.3.2.0 - 2025-07-03

### Fixed
- (`sklearn`) Documentation display for the `train_test_split_dataset` sklearn
helper function.

## 10.3.1.0 - 2025-04-16

### Added
- (`sklearn`) Support for boolean and float targets in `KhiopsClassifier`.

### Fixed
- (`sklearn`) Crash when there were no informative trees in predictors.

### Deprecated
- (`core`) The `build_multi_table_dictionary_domain` helper function.

## 10.3.0.0 - 2025-02-10

### Fixed
- (`core`) Dictionary file `.json` extension check in the `khiops.dictionary.read_dictionary_file`
function.

### Changed
- (`sklearn`) The `train_test_split_dataset` helper has been moved from `khiops.utils` to
`khiops.sklearn`.
- (`sklearn`) The `transform_pairs` parameter of the `KhiopsEncoder` sklearn estimator has been
renamed to `transform_type_pairs`.

### Removed
- (`sklearn`) The `is_fitted_` estimator attribute. The Scikit-learn `check_is_fitted` function
can be used to test the fitted state of the estimators.
- (`sklearn`) The `n_pairs` parameter of the `KhiopsRegressor` sklearn estimator. It was never
supported.

## 10.2.4.0 - 2024-12-19

### Added
65 changes: 24 additions & 41 deletions doc/multi_table_primer.rst
@@ -76,40 +76,31 @@ feature object ``X``. Specifically, instead of a `pandas.DataFrame`, ``X`` must
specifies the dataset schema in the following way::

X = {
- "main_table": <name of the main table>,
- "tables" : {
- <name of the main table>: (<dataframe of the main table>, <key of the main table>),
- <name of table 1>: (<dataframe of table 1>, <key of table 1>),
- <name of table 2>: (<dataframe of table 2>, <key of table 2>),
+ "main_table": (<dataframe of the main table>, <key of the main table>),
+ "additional_data_tables" : {
+ <data path to table 1>: (
+ <dataframe of table 1>, [<key of table 1>], <optional entity flag>
+ ),
+ <data path to table 2>: (
+ <dataframe of table 2>, [<key of table 2>], <optional entity flag>
+ ),
...
}
- "relations" : [
- (<name of the main table>, <name of a different table>, <entity flag>),
- (<name of another table>, <name of yet another table>, <entity flag>),
- ...
- ],
}

The three fields of this dictionary are:

- - ``main_table``: The name of the main table.
- - ``tables``: A dictionary indexed by the tables' names. Each table is associated to a 2-tuple
- containing the following fields:
+ - ``main_table``: a 2-tuple containing the following fields:
+ - The `pandas.DataFrame` object of the main table.
+ - The key columns' names: A list of strings.
+ .
+ - ``additional_data_tables``: A dictionary indexed by the data paths to the secondary
+ tables. Each data path is associated to a 2-tuple containing the following fields:

- - The `pandas.DataFrame` object of the table.
- - The key columns' names : Either a list of strings or a single string.
-
- - ``relations``: An optional field containing a list of tuples describing the relations between
- tables. The first two values (Strings) of each tuple correspond to names of both the parent and the child table
- involved in the relation. A third value (Boolean) can be optionally added to the tuple to indicate if the relation is
- either ``1:n`` or ``1:1`` (entity). For example, If the tuple ``(table1, table2, True)`` is contained in this
- field, it means that:
-
- - ``table1`` and ``table2`` are in a ``1:1`` relationship
- - The key of ``table1`` is contained in that of ``table2`` (ie. keys are hierarchical)
-
- If the ``relations`` field is not present then Khiops Python assumes that the tables are in a *star*
- schema.
+ - The `pandas.DataFrame` object of the secondary table.
+ - The key columns' names : A list of strings.
+ - optionally, a flag which indicates if the secondary table is in
+ a ``1:1`` relationship to its parent table.

.. note::

Expand Down Expand Up @@ -138,9 +129,8 @@ We build the input ``X`` as follows::
accidents_df = pd.read_csv(f"{kh.get_samples_dir()}/AccidentsSummary/Accidents.txt", sep="\t")
vehicles_df = pd.read_csv(f"{kh.get_samples_dir()}/AccidentsSummary/Vehicles.txt", sep="\t")
X = {
- "main_table" : "Accident",
- "tables": {
- "Accident": (accidents_df.drop("Gravity", axis=1), "AccidentId"),
+ "main_table" : (accidents_df.drop("Gravity", axis=1), ["AccidentId"]),
+ "additional_data_tables": {
"Vehicle": (vehicles_df, ["AccidentId", "VehicleId"])
}
}
Expand Down Expand Up @@ -170,19 +160,12 @@ We build the input ``X`` as follows::
places_df = pd.read_csv(f"{kh.get_samples_dir()}/Accidents/Places.txt", sep="\t")

X = {
- "main_table": "Accidents",
- "tables": {
- "Accidents": (accidents_df.drop("Gravity", axis=1), "AccidentId"),
+ "main_table": (accidents_df.drop("Gravity", axis=1), ["AccidentId"]),
+ "additional_data_tables": {
"Vehicles": (vehicles_df, ["AccidentId", "VehicleId"]),
- "Users": (users_df, ["AccidentId", "VehicleId"]),
- "Places": (places_df, "AccidentId"),
-
+ "Vehicles/Users": (users_df, ["AccidentId", "VehicleId"]),
+ "Places": (places_df, ["AccidentId"], True),
},
- "relations": [
- ("Accidents", "Vehicles"),
- ("Vehicles", "Users"),
- ("Accidents", "Places", True),
- ],
}

Both datasets can be found in the Khiops samples directory.
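
To make the change of format concrete, here is a minimal sketch of how a specification written in the removed ``tables``/``relations`` style could be rewritten in the ``main_table``/``additional_data_tables`` style. The function `convert_legacy_spec` is hypothetical and is not provided by khiops-python. It assumes that the relations form a tree rooted at the main table and that data paths are built by joining table names with "/", as in the examples above. Plain strings stand in for the `pandas.DataFrame` objects so that the sketch runs without pandas.

```python
def convert_legacy_spec(legacy_spec):
    """Rewrite a 'tables'/'relations' spec as a data-path-based spec.

    Hypothetical helper for illustration; not part of khiops-python.
    Assumes the relations form a tree rooted at the main table.
    """
    main_name = legacy_spec["main_table"]
    tables = legacy_spec["tables"]

    # Without "relations" the old format assumed a star schema
    relations = legacy_spec.get(
        "relations",
        [(main_name, name) for name in tables if name != main_name],
    )

    # Map each child table to its parent and its optional entity (1:1) flag
    parents = {}
    for relation in relations:
        parent, child = relation[0], relation[1]
        is_entity = relation[2] if len(relation) > 2 else False
        parents[child] = (parent, is_entity)

    def as_key_list(key):
        # The new format always uses a list of key column names
        return [key] if isinstance(key, str) else list(key)

    def data_path(name):
        # Walk up to the main table, then join the names with "/"
        fragments = []
        while name != main_name:
            fragments.append(name)
            name = parents[name][0]
        return "/".join(reversed(fragments))

    additional = {}
    for name, (table, key) in tables.items():
        if name == main_name:
            continue
        entry = (table, as_key_list(key))
        if parents[name][1]:
            entry += (True,)
        additional[data_path(name)] = entry

    main_table, main_key = tables[main_name]
    return {
        "main_table": (main_table, as_key_list(main_key)),
        "additional_data_tables": additional,
    }
```

Applied to the old form of the ``Accidents`` specification, this sketch produces the data paths ``Vehicles``, ``Vehicles/Users`` and ``Places``, with the entity flag kept only on ``Places``.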
12 changes: 12 additions & 0 deletions khiops/sklearn/dataset.py
@@ -177,6 +177,18 @@ def _check_multitable_spec(ds_spec):


def table_name_of_path(table_path):
+ """Returns the table name as the last fragment of the table data path
+
+ Parameters
+ ----------
+ table_path: str
+ Data path of the table, in the format "path/to/table".
+
+ Returns
+ -------
+ str
+ The name of the table.
+ """
return table_path.split("/")[-1]
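
As a quick illustration of the documented behaviour, the function body from the diff above can be exercised on the data paths used in the primer. The function is copied here only so that the snippet is self-contained; the example paths come from the ``Accidents`` specification.

```python
def table_name_of_path(table_path):
    """Returns the table name as the last fragment of the table data path"""
    return table_path.split("/")[-1]


# A nested data path yields its last fragment
print(table_name_of_path("Vehicles/Users"))  # Users

# A path without "/" is already a table name
print(table_name_of_path("Places"))  # Places
```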


2 changes: 1 addition & 1 deletion packaging/conda/meta.yaml
@@ -26,7 +26,7 @@ requirements:
- python
run:
- python
- - conda-forge/label/rc::khiops-core =11.0.0a.0
+ - khiops-core =11.0.0b.0
- pandas >=0.25.3
- scikit-learn >=0.22.2
run_constrained: