
Merge pull request #65 from ealcobaca/minor-updates
Minor updates
FelSiq committed Dec 17, 2019
2 parents 804fd69 + bd2564a commit 76ebbfe
Showing 4 changed files with 36 additions and 16 deletions.
26 changes: 16 additions & 10 deletions README.md
@@ -10,14 +10,20 @@ Extracts meta-features from datasets to support the design of recommendation sys
 
 ## Measures
 
-In MtL, meta-features are designed to extract general properties able to characterize datasets. The meta-feature values should provide relevant evidences about the performance of algorithms, allowing the design of MtL-based recommendation systems. Thus, these measures must be able to predict, with a low computational cost, the performance of the algorithms under evaluation. In this package, the meta-feature measures are divided into six groups:
+In MtL, meta-features are designed to extract general properties able to characterize datasets. The meta-feature values should provide relevant evidences about the performance of algorithms, allowing the design of MtL-based recommendation systems. Thus, these measures must be able to predict, with a low computational cost, the performance of the algorithms under evaluation. In this package, the meta-feature measures are divided into 11 groups:
 
-* **General**: General information related to the dataset, also known as simple measures, such as number of instances, attributes and classes.
-* **Statistical**: Standard statistical measures to describe the numerical properties of a distribution of data.
-* **Information-theoretic**: Particularly appropriate to describe discrete (categorical) attributes and their relationship with the classes.
-* **Model-based**: Measures designed to extract characteristics like the depth, the shape and size of a Decision Tree (DT) model induced from a dataset.
-* **Landmarking**: Represents the performance of simple and efficient learning algorithms. Include the subsampling and relative strategies to decrease the computation cost and enrich the relations between these meta-features (relative and subsampling landmarking are also available).
-* **Clustering:** Clustering measures extract information about dataset based on external validation indexes.
+- **General**: General information related to the dataset, also known as simple measures, such as the number of instances, attributes and classes.
+- **Statistical**: Standard statistical measures to describe the numerical properties of data distribution.
+- **Information-theoretic**: Particularly appropriate to describe discrete (categorical) attributes and their relationship with the classes.
+- **Model-based**: Measures designed to extract characteristics from simple machine learning models.
+- **Landmarking**: Performance of simple and efficient learning algorithms.
+- **Relative Landmarking**: Relative performance of simple and efficient learning algorithms.
+- **Subsampling Landmarking**: Performance of simple and efficient learning algorithms from a subsample of the dataset.
+- **Clustering**: Clustering measures extract information about dataset based on external validation indexes.
+- **Concept**: Estimate the variability of class labels among examples and the examples density.
+- **Itemset**: Compute the correlation between binary attributes.
+- **Complexity**: Estimate the difficulty in separating the data points into their expected classes.
 
 ## Dependencies
 
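As an aside to the group list in the hunk above, the simplest of these groups, "general", can be sketched by hand. This is an illustrative approximation only, not pymfe's implementation; the function and measure names below are made up for the example.

```python
# Illustrative sketch: a few "general" meta-features computed by hand.
# NOT pymfe's implementation; function and key names are hypothetical.
import numpy as np

def general_metafeatures(X, y):
    """Return a handful of simple (general) dataset measures."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    n_inst, n_attr = X.shape
    n_class = np.unique(y).size
    return {
        "nr_inst": n_inst,                # number of instances
        "nr_attr": n_attr,                # number of attributes
        "nr_class": n_class,              # number of classes
        "attr_to_inst": n_attr / n_inst,  # dimensionality ratio
        # mean relative frequency of each class label
        "freq_class_mean": float(np.mean(np.bincount(y) / y.size)),
    }

# Tiny toy dataset: 4 instances, 2 attributes, 2 balanced classes.
X = [[5.1, 3.5], [4.9, 3.0], [6.2, 3.4], [5.9, 3.0]]
y = [0, 0, 1, 1]
print(general_metafeatures(X, y))
```

Real meta-feature extractors compute many more measures per group, but the pattern is the same: each measure is a cheap function of `X` and `y`.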
@@ -59,7 +65,7 @@ data = load_iris()
 y = data.target
 X = data.data
 
-# Extract all measures
+# Extract default measures
 mfe = MFE()
 mfe.fit(X, y)
 ft = mfe.extract()
@@ -75,13 +81,13 @@ print(ft)
 
 Several measures return more than one value. To aggregate the returned values, summarization function can be used. This method can compute `min`, `max`, `mean`, `median`, `kurtosis`, `standard deviation`, among others. The default methods are the `mean` and the `sd`. Next, it is possible to see an example of the use of this method:
 
 ```python
-## Extract all measures using min, median and max
+## Extract default measures using min, median and max
 mfe = MFE(summary=["min", "median", "max"])
 mfe.fit(X, y)
 ft = mfe.extract()
 print(ft)
 
-## Extract all measures using quantile
+## Extract default measures using quantile
 mfe = MFE(summary=["quantiles"])
 mfe.fit(X, y)
 ft = mfe.extract()
 ```
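The summarization step described in the hunk above can be sketched independently of pymfe: a summary function reduces a multi-valued measure (for example, one value per attribute) to a few numbers. A minimal sketch, assuming numpy; the helper name is made up and this is not pymfe's internal code.

```python
# Illustrative sketch of summarization: reduce a multi-valued measure
# (e.g. one mean per attribute) with a set of summary functions.
# Hypothetical helper; not pymfe's internal implementation.
import numpy as np

def summarize(values, summary=("mean", "sd")):
    """Apply each named summary function to a list of measure values."""
    funcs = {
        "mean": np.mean,
        "sd": lambda v: np.std(v, ddof=1),  # sample standard deviation
        "min": np.min,
        "median": np.median,
        "max": np.max,
    }
    return {name: float(funcs[name](values)) for name in summary}

# One value per attribute (a "multi-valued" meta-feature):
attr_means = [5.84, 3.06, 3.76, 1.20]
print(summarize(attr_means, summary=("min", "median", "max")))
```

Passing a different `summary` tuple, as `MFE(summary=[...])` does, simply swaps which reducers are applied to each multi-valued measure.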
16 changes: 12 additions & 4 deletions examples/01_introductory_examples/plot_pymfe_default.py
@@ -12,7 +12,8 @@
 #
 # The standard way to extract meta-features is using the MFE class.
 # The parameters are the dataset and the group of measures to be extracted.
-# By default, the method extract all the measures. For instance:
+# By default, the method extracts general, info-theory, statistical,
+# model-based and landmarking measures. For instance:
 
 from sklearn.datasets import load_iris
 from pymfe.mfe import MFE
@@ -23,7 +24,7 @@
 X = data.data
 
 ###############################################################################
-# Extracting all measures
+# Extracting default measures
 mfe = MFE()
 mfe.fit(X, y)
 ft = mfe.extract()
@@ -37,6 +38,13 @@
 print("\n".join("{:50} {:30}".format(x, y) for x, y in zip(ft[0], ft[1])))
 
 
+###############################################################################
+# Extracting all measures
+mfe = MFE(groups="all")
+mfe.fit(X, y)
+ft = mfe.extract()
+print("\n".join("{:50} {:30}".format(x, y) for x, y in zip(ft[0], ft[1])))
+
 ###############################################################################
 # Changing summarization function
 # -------------------------------
@@ -48,14 +56,14 @@
 #
 
 ###############################################################################
-# Compute all measures using min, median and max
+# Compute default measures using min, median and max
 mfe = MFE(summary=["min", "median", "max"])
 mfe.fit(X, y)
 ft = mfe.extract()
 print("\n".join("{:50} {:30}".format(x, y) for x, y in zip(ft[0], ft[1])))

 ###############################################################################
-# Compute all measures using quantile
+# Compute default measures using quantile
 mfe = MFE(summary=["quantiles"])
 mfe.fit(X, y)
 ft = mfe.extract()
2 changes: 1 addition & 1 deletion examples/README.txt
@@ -11,7 +11,7 @@ Measures
 
 In MtL, meta-features are designed to extract general properties able to characterize datasets. The meta-feature values should provide relevant evidences about the performance of algorithms, allowing the design of MtL-based recommendation systems. Thus, these measures must be able to predict, with a low computational cost, the performance of the algorithms under evaluation. In this package, the meta-feature measures are divided into 11 groups:
 
-- **Simple**: General information related to the dataset, also known as simple measures, such as the number of instances, attributes and classes.
+- **General**: General information related to the dataset, also known as simple measures, such as the number of instances, attributes and classes.
 - **Statistical**: Standard statistical measures to describe the numerical properties of data distribution.
 - **Information-theoretic**: Particularly appropriate to describe discrete (categorical) attributes and their relationship with the classes.
 - **Model-based**: Measures designed to extract characteristics from simple machine learning models.
8 changes: 7 additions & 1 deletion pymfe/mfe.py
Expand Up @@ -40,7 +40,7 @@ class MFE:
groups_alias = [('default', _internal.DEFAULT_GROUP)]

def __init__(self,
groups: t.Union[str, t.Iterable[str]] = "all",
groups: t.Union[str, t.Iterable[str]] = "default",
features: t.Union[str, t.Iterable[str]] = "all",
summary: t.Union[str, t.Iterable[str]] = ("mean", "sd"),
measure_time: t.Optional[str] = None,
@@ -65,6 +65,12 @@ def __init__(self,
             desired group of metafeatures for extraction. Use the method
             ``valid_groups`` to get a list of all available groups.
             Setting with ``all`` enables all available groups.
+            Setting with ``default`` enables ``general``, ``info-theory``,
+            ``statistical``, ``model-based`` and ``landmarking``. It is the
+            default value.
+            The value provided by the argument ``wildcard`` can be used to
+            select all metafeature groups rapidly.
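The ``default`` and ``all`` aliases documented in the docstring hunk above can be illustrated with a stand-alone sketch of how such a groups argument might be expanded. The helper and the group-name tuples below are hypothetical stand-ins, not pymfe's actual resolution code or constants.

```python
# Illustrative sketch of resolving a "groups" argument, where the alias
# "default" expands to a fixed subset and "all" to every group.
# Hypothetical helper and constants; not pymfe's actual implementation.
ALL_GROUPS = (
    "general", "statistical", "info-theory", "model-based", "landmarking",
    "relative", "subsampling", "clustering", "concept", "itemset", "complexity",
)
DEFAULT_GROUP = (
    "general", "statistical", "info-theory", "model-based", "landmarking",
)

def resolve_groups(groups):
    """Expand a groups argument (a string or an iterable of group names)."""
    if isinstance(groups, str):
        groups = (groups,)
    resolved = []
    for name in groups:
        if name == "all":
            resolved.extend(ALL_GROUPS)
        elif name == "default":
            resolved.extend(DEFAULT_GROUP)
        else:
            resolved.append(name)
    # Drop duplicates while preserving first-seen order.
    return tuple(dict.fromkeys(resolved))

print(resolve_groups("default"))
print(resolve_groups(["default", "clustering"]))
```

Changing the constructor default from ``"all"`` to ``"default"``, as this commit does, only changes which alias is expanded when the caller passes nothing.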
