
Merge pull request #65 from ealcobaca/minor-updates
Minor updates
FelSiq committed Dec 17, 2019
2 parents 804fd69 + bd2564a commit 76ebbfe
Showing 4 changed files with 36 additions and 16 deletions.
26 changes: 16 additions & 10 deletions README.md
@@ -10,14 +10,20 @@ Extracts meta-features from datasets to support the design of recommendation sys
 
 ## Measures
 
-In MtL, meta-features are designed to extract general properties able to characterize datasets. The meta-feature values should provide relevant evidences about the performance of algorithms, allowing the design of MtL-based recommendation systems. Thus, these measures must be able to predict, with a low computational cost, the performance of the algorithms under evaluation. In this package, the meta-feature measures are divided into six groups:
+In MtL, meta-features are designed to extract general properties able to characterize datasets. The meta-feature values should provide relevant evidences about the performance of algorithms, allowing the design of MtL-based recommendation systems. Thus, these measures must be able to predict, with a low computational cost, the performance of the algorithms under evaluation. In this package, the meta-feature measures are divided into 11 groups:
 
-* **General**: General information related to the dataset, also known as simple measures, such as number of instances, attributes and classes.
-* **Statistical**: Standard statistical measures to describe the numerical properties of a distribution of data.
-* **Information-theoretic**: Particularly appropriate to describe discrete (categorical) attributes and their relationship with the classes.
-* **Model-based**: Measures designed to extract characteristics like the depth, the shape and size of a Decision Tree (DT) model induced from a dataset.
-* **Landmarking**: Represents the performance of simple and efficient learning algorithms. Include the subsampling and relative strategies to decrease the computation cost and enrich the relations between these meta-features (relative and subsampling landmarking are also available).
-* **Clustering:** Clustering measures extract information about dataset based on external validation indexes.
+- **General**: General information related to the dataset, also known as simple measures, such as the number of instances, attributes and classes.
+- **Statistical**: Standard statistical measures to describe the numerical properties of data distribution.
+- **Information-theoretic**: Particularly appropriate to describe discrete (categorical) attributes and their relationship with the classes.
+- **Model-based**: Measures designed to extract characteristics from simple machine learning models.
+- **Landmarking**: Performance of simple and efficient learning algorithms.
+- **Relative Landmarking**: Relative performance of simple and efficient learning algorithms.
+- **Subsampling Landmarking**: Performance of simple and efficient learning algorithms from a subsample of the dataset.
+- **Clustering**: Clustering measures extract information about dataset based on external validation indexes.
+- **Concept**: Estimate the variability of class labels among examples and the examples density.
+- **Itemset**: Compute the correlation between binary attributes.
+- **Complexity**: Estimate the difficulty in separating the data points into their expected classes.
 
 ## Dependencies
 
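As an aside to the group list in the hunk above, the simplest of these groups, "general", can be sketched by hand. This is an illustrative approximation only, not pymfe's implementation; the function and measure names below are made up for the example.

```python
# Illustrative sketch: a few "general" meta-features computed by hand.
# NOT pymfe's implementation; function and key names are hypothetical.
import numpy as np

def general_metafeatures(X, y):
    """Return a handful of simple (general) dataset measures."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    n_inst, n_attr = X.shape
    n_class = np.unique(y).size
    return {
        "nr_inst": n_inst,                # number of instances
        "nr_attr": n_attr,                # number of attributes
        "nr_class": n_class,              # number of classes
        "attr_to_inst": n_attr / n_inst,  # dimensionality ratio
        # mean relative frequency of each class label
        "freq_class_mean": float(np.mean(np.bincount(y) / y.size)),
    }

# Tiny toy dataset: 4 instances, 2 attributes, 2 balanced classes.
X = [[5.1, 3.5], [4.9, 3.0], [6.2, 3.4], [5.9, 3.0]]
y = [0, 0, 1, 1]
print(general_metafeatures(X, y))
```

Real meta-feature extractors compute many more measures per group, but the pattern is the same: each measure is a cheap function of `X` and `y`.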
@@ -59,7 +65,7 @@ data = load_iris()
 y = data.target
 X = data.data
 
-# Extract all measures
+# Extract default measures
 mfe = MFE()
 mfe.fit(X, y)
 ft = mfe.extract()
@@ -75,13 +81,13 @@ print(ft)
 
 Several measures return more than one value. To aggregate the returned values, summarization function can be used. This method can compute `min`, `max`, `mean`, `median`, `kurtosis`, `standard deviation`, among others. The default methods are the `mean` and the `sd`. Next, it is possible to see an example of the use of this method:
 
 ```python
-## Extract all measures using min, median and max
+## Extract default measures using min, median and max
 mfe = MFE(summary=["min", "median", "max"])
 mfe.fit(X, y)
 ft = mfe.extract()
 print(ft)
 
-## Extract all measures using quantile
+## Extract default measures using quantile
 mfe = MFE(summary=["quantiles"])
 mfe.fit(X, y)
 ft = mfe.extract()
 ```
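The summarization step described in the hunk above can be sketched independently of pymfe: a summary function reduces a multi-valued measure (for example, one value per attribute) to a few numbers. A minimal sketch, assuming numpy; the helper name is made up and this is not pymfe's internal code.

```python
# Illustrative sketch of summarization: reduce a multi-valued measure
# (e.g. one mean per attribute) with a set of summary functions.
# Hypothetical helper; not pymfe's internal implementation.
import numpy as np

def summarize(values, summary=("mean", "sd")):
    """Apply each named summary function to a list of measure values."""
    funcs = {
        "mean": np.mean,
        "sd": lambda v: np.std(v, ddof=1),  # sample standard deviation
        "min": np.min,
        "median": np.median,
        "max": np.max,
    }
    return {name: float(funcs[name](values)) for name in summary}

# One value per attribute (a "multi-valued" meta-feature):
attr_means = [5.84, 3.06, 3.76, 1.20]
print(summarize(attr_means, summary=("min", "median", "max")))
```

Passing a different `summary` tuple, as `MFE(summary=[...])` does, simply swaps which reducers are applied to each multi-valued measure.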
16 changes: 12 additions & 4 deletions examples/01_introductory_examples/plot_pymfe_default.py
@@ -12,7 +12,8 @@
 #
 # The standard way to extract meta-features is using the MFE class.
 # The parameters are the dataset and the group of measures to be extracted.
-# By default, the method extract all the measures. For instance:
+# By default, the method extracts general, info-theory, statistical,
+# model-based and landmarking measures. For instance:
 
 from sklearn.datasets import load_iris
 from pymfe.mfe import MFE
@@ -23,7 +24,7 @@
 X = data.data
 
 ###############################################################################
-# Extracting all measures
+# Extracting default measures
 mfe = MFE()
 mfe.fit(X, y)
 ft = mfe.extract()
@@ -37,6 +38,13 @@
 print("\n".join("{:50} {:30}".format(x, y) for x, y in zip(ft[0], ft[1])))
 
 
+###############################################################################
+# Extracting all measures
+mfe = MFE(groups="all")
+mfe.fit(X, y)
+ft = mfe.extract()
+print("\n".join("{:50} {:30}".format(x, y) for x, y in zip(ft[0], ft[1])))
+
 ###############################################################################
 # Changing summarization function
 # -------------------------------
@@ -48,14 +56,14 @@
 #
 
 ###############################################################################
-# Compute all measures using min, median and max
+# Compute default measures using min, median and max
 mfe = MFE(summary=["min", "median", "max"])
 mfe.fit(X, y)
 ft = mfe.extract()
 print("\n".join("{:50} {:30}".format(x, y) for x, y in zip(ft[0], ft[1])))

 ###############################################################################
-# Compute all measures using quantile
+# Compute default measures using quantile
 mfe = MFE(summary=["quantiles"])
 mfe.fit(X, y)
 ft = mfe.extract()
2 changes: 1 addition & 1 deletion examples/README.txt
@@ -11,7 +11,7 @@ Measures
 
 In MtL, meta-features are designed to extract general properties able to characterize datasets. The meta-feature values should provide relevant evidences about the performance of algorithms, allowing the design of MtL-based recommendation systems. Thus, these measures must be able to predict, with a low computational cost, the performance of the algorithms under evaluation. In this package, the meta-feature measures are divided into 11 groups:
 
-- **Simple**: General information related to the dataset, also known as simple measures, such as the number of instances, attributes and classes.
+- **General**: General information related to the dataset, also known as simple measures, such as the number of instances, attributes and classes.
 - **Statistical**: Standard statistical measures to describe the numerical properties of data distribution.
 - **Information-theoretic**: Particularly appropriate to describe discrete (categorical) attributes and their relationship with the classes.
 - **Model-based**: Measures designed to extract characteristics from simple machine learning models.
8 changes: 7 additions & 1 deletion pymfe/mfe.py
Expand Up @@ -40,7 +40,7 @@ class MFE:
groups_alias = [('default', _internal.DEFAULT_GROUP)]

def __init__(self,
groups: t.Union[str, t.Iterable[str]] = "all",
groups: t.Union[str, t.Iterable[str]] = "default",
features: t.Union[str, t.Iterable[str]] = "all",
summary: t.Union[str, t.Iterable[str]] = ("mean", "sd"),
measure_time: t.Optional[str] = None,
@@ -65,6 +65,12 @@ def __init__(self,
             desired group of metafeatures for extraction. Use the method
             ``valid_groups`` to get a list of all available groups.
             Setting with ``all`` enables all available groups.
+            Setting with ``default`` enables ``general``, ``info-theory``,
+            ``statistical``, ``model-based`` and ``landmarking``. It is the
+            default value.
+            The value provided by the argument ``wildcard`` can be used to
+            select all metafeature groups rapidly.
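The ``default`` and ``all`` aliases documented in the docstring hunk above can be illustrated with a stand-alone sketch of how such a groups argument might be expanded. The helper and the group-name tuples below are hypothetical stand-ins, not pymfe's actual resolution code or constants.

```python
# Illustrative sketch of resolving a "groups" argument, where the alias
# "default" expands to a fixed subset and "all" to every group.
# Hypothetical helper and constants; not pymfe's actual implementation.
ALL_GROUPS = (
    "general", "statistical", "info-theory", "model-based", "landmarking",
    "relative", "subsampling", "clustering", "concept", "itemset", "complexity",
)
DEFAULT_GROUP = (
    "general", "statistical", "info-theory", "model-based", "landmarking",
)

def resolve_groups(groups):
    """Expand a groups argument (a string or an iterable of group names)."""
    if isinstance(groups, str):
        groups = (groups,)
    resolved = []
    for name in groups:
        if name == "all":
            resolved.extend(ALL_GROUPS)
        elif name == "default":
            resolved.extend(DEFAULT_GROUP)
        else:
            resolved.append(name)
    # Drop duplicates while preserving first-seen order.
    return tuple(dict.fromkeys(resolved))

print(resolve_groups("default"))
print(resolve_groups(["default", "clustering"]))
```

Changing the constructor default from ``"all"`` to ``"default"``, as this commit does, only changes which alias is expanded when the caller passes nothing.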
