Merge pull request #147 from teaearlgraycold/new_classifiers

New classifiers

Showing 16 changed files with 884 additions and 95 deletions.
...rces/documentation/pipeline_operators/models/classifiers/ensemble/ExtraTrees.md (47 additions, 0 deletions)
# Extra Trees Classifier
* * *

Fits an extra-trees classifier.

## Dependencies
sklearn.ensemble.ExtraTreesClassifier

Parameters
----------
input_df: pandas.DataFrame {n_samples, n_features+['class', 'group', 'guess']}
    Input DataFrame for fitting the classifier
criterion: int
    Integer that is used to select from the list of valid criteria,
    either 'gini' or 'entropy'
max_features: int
    The number of features to consider when looking for the best split

Returns
-------
input_df: pandas.DataFrame {n_samples, n_features+['guess', 'group', 'class', 'SyntheticFeature']}
    Returns a modified input DataFrame with the guess column updated according to the classifier's predictions.
    Also adds the classifier's predictions as a 'SyntheticFeature' column.

Example Exported Code
---------------------

```python
import numpy as np
import pandas as pd
from sklearn.cross_validation import train_test_split
from sklearn.ensemble import ExtraTreesClassifier

# NOTE: Make sure that the class is labeled 'class' in the data file
tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR')
training_indices, testing_indices = train_test_split(tpot_data.index, stratify=tpot_data['class'].values, train_size=0.75, test_size=0.25)

result1 = tpot_data.copy()

# Perform classification with an extra-trees classifier
etc1 = ExtraTreesClassifier(criterion="entropy", max_features=5, n_estimators=500, random_state=42)
etc1.fit(result1.loc[training_indices].drop('class', axis=1).values, result1.loc[training_indices, 'class'].values)

result1['etc1-classification'] = etc1.predict(result1.drop('class', axis=1).values)
```
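The `criterion` parameter above is an integer, while scikit-learn expects one of the strings `'gini'` or `'entropy'`. The exact mapping used by the operator is not shown in this doc; a minimal sketch of one plausible reduction (the helper name is illustrative, not TPOT code) is wrapping the integer into the list of valid criteria with a modulus:

```python
# Hypothetical helper: map an unbounded integer parameter onto
# scikit-learn's valid split criteria by wrapping with a modulus.
VALID_CRITERIA = ['gini', 'entropy']

def select_criterion(criterion):
    """Return the criterion string selected by an arbitrary integer."""
    return VALID_CRITERIA[criterion % len(VALID_CRITERIA)]
```

Under this assumption, `select_criterion(0)` yields `'gini'` and any odd integer yields `'entropy'`, so evolved pipelines can carry any integer without producing an invalid parameter.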
...on/pipeline_operators/models/classifiers/ensemble/GradientBoostingClassifier.md (45 additions, 0 deletions)
# Gradient Boosting Classifier
* * *

Fits a Gradient Boosting classifier.

## Dependencies
sklearn.ensemble.GradientBoostingClassifier

Parameters
----------
input_df: pandas.DataFrame {n_samples, n_features+['class', 'group', 'guess']}
    Input DataFrame for fitting the classifier
learning_rate: float
    Learning rate shrinks the contribution of each tree by learning_rate
max_depth: int
    Maximum depth of the individual regression estimators

Returns
-------
input_df: pandas.DataFrame {n_samples, n_features+['guess', 'group', 'class', 'SyntheticFeature']}
    Returns a modified input DataFrame with the guess column updated according to the classifier's predictions.
    Also adds the classifier's predictions as a 'SyntheticFeature' column.

Example Exported Code
---------------------

```python
import numpy as np
import pandas as pd
from sklearn.cross_validation import train_test_split
from sklearn.ensemble import GradientBoostingClassifier

# NOTE: Make sure that the class is labeled 'class' in the data file
tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR')
training_indices, testing_indices = train_test_split(tpot_data.index, stratify=tpot_data['class'].values, train_size=0.75, test_size=0.25)

result1 = tpot_data.copy()

# Perform classification with a gradient boosting classifier
gbc1 = GradientBoostingClassifier(learning_rate=1.0, max_depth=3, n_estimators=500, random_state=42)
gbc1.fit(result1.loc[training_indices].drop('class', axis=1).values, result1.loc[training_indices, 'class'].values)

result1['gbc1-classification'] = gbc1.predict(result1.drop('class', axis=1).values)
```
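The phrase "shrinks the contribution of each tree by learning_rate" can be made concrete with a tiny sketch (the helper and numbers are illustrative, not scikit-learn internals): the ensemble's raw decision score is the initial score plus each stage's output scaled by the learning rate.

```python
# Hypothetical sketch of how a boosted ensemble accumulates its score:
# every stage's raw output is multiplied by learning_rate before summing.
def boosted_score(init_score, stage_outputs, learning_rate):
    """Accumulate shrunken stage contributions onto an initial score."""
    return init_score + sum(learning_rate * s for s in stage_outputs)
```

With `learning_rate=1.0` (as in the exported code above) each tree contributes at full strength; smaller values damp each tree and typically require more estimators.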
...ntation/pipeline_operators/models/classifiers/linear_model/PassiveAggressive.md (47 additions, 0 deletions)
# Passive Aggressive Classifier
* * *

Fits a Passive Aggressive classifier.

## Dependencies
sklearn.linear_model.PassiveAggressiveClassifier

Parameters
----------
input_df: pandas.DataFrame {n_samples, n_features+['class', 'group', 'guess']}
    Input DataFrame for fitting the classifier
C: float
    Maximum step size (regularization)
loss: int
    Integer that is used to select from the list of valid loss functions,
    either 'hinge' or 'squared_hinge'

Returns
-------
input_df: pandas.DataFrame {n_samples, n_features+['guess', 'group', 'class', 'SyntheticFeature']}
    Returns a modified input DataFrame with the guess column updated according to the classifier's predictions.
    Also adds the classifier's predictions as a 'SyntheticFeature' column.

Example Exported Code
---------------------

```python
import numpy as np
import pandas as pd
from sklearn.cross_validation import train_test_split
from sklearn.linear_model import PassiveAggressiveClassifier

# NOTE: Make sure that the class is labeled 'class' in the data file
tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR')
training_indices, testing_indices = train_test_split(tpot_data.index, stratify=tpot_data['class'].values, train_size=0.75, test_size=0.25)

result1 = tpot_data.copy()

# Perform classification with a passive aggressive classifier
pagr1 = PassiveAggressiveClassifier(C=1.0, loss="hinge", random_state=42)
pagr1.fit(result1.loc[training_indices].drop('class', axis=1).values, result1.loc[training_indices, 'class'].values)

result1['pagr1-classification'] = pagr1.predict(result1.drop('class', axis=1).values)
```
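For intuition about why the algorithm is called "passive aggressive": it stays passive on examples it already classifies with margin, and updates aggressively (capped by `C`) on margin violations. A minimal sketch of the classic PA-I update for a single example, written with plain lists rather than scikit-learn's internals:

```python
# Illustrative PA-I update for one training example (w: weights, x: features,
# y: label in {-1, +1}). Not scikit-learn code; a sketch of the update rule.
def pa_update(w, x, y, C=1.0):
    """Nudge w just enough to fix a hinge-loss violation, capped by C."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - margin)          # hinge loss; 0 => stay passive
    norm_sq = sum(xi * xi for xi in x) or 1.0
    tau = min(C, loss / norm_sq)           # aggressive step, capped by C
    return [wi + tau * y * xi for wi, xi in zip(w, x)]
```

Starting from zero weights, one misclassified example moves the weights onto it; presenting the same example again produces no further change, since the hinge loss is already zero.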
.../documentation/pipeline_operators/models/classifiers/naive_bayes/BernoulliNB.md (49 additions, 0 deletions)
# BernoulliNB Classifier
* * *

Fits a Naive Bayes classifier for multivariate Bernoulli models.

## Dependencies
sklearn.naive_bayes.BernoulliNB

Parameters
----------
input_df: pandas.DataFrame {n_samples, n_features+['class', 'group', 'guess']}
    Input DataFrame for fitting the classifier
alpha: float
    Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing).
binarize: float
    Threshold for binarizing (mapping to booleans) of sample features.
fit_prior: int
    Whether to learn class prior probabilities or not. If false, a uniform prior will be used.
    Reduced to a boolean with modulus.

Returns
-------
input_df: pandas.DataFrame {n_samples, n_features+['guess', 'group', 'class', 'SyntheticFeature']}
    Returns a modified input DataFrame with the guess column updated according to the classifier's predictions.
    Also adds the classifier's predictions as a 'SyntheticFeature' column.

Example Exported Code
---------------------

```python
import numpy as np
import pandas as pd
from sklearn.cross_validation import train_test_split
from sklearn.naive_bayes import BernoulliNB

# NOTE: Make sure that the class is labeled 'class' in the data file
tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR')
training_indices, testing_indices = train_test_split(tpot_data.index, stratify=tpot_data['class'].values, train_size=0.75, test_size=0.25)

result1 = tpot_data.copy()

# Perform classification with a Bernoulli naive Bayes classifier
bnb1 = BernoulliNB(alpha=0.01, binarize=1.0, fit_prior=False)
bnb1.fit(result1.loc[training_indices].drop('class', axis=1).values, result1.loc[training_indices, 'class'].values)

result1['bnb1-classification'] = bnb1.predict(result1.drop('class', axis=1).values)
```
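The note "Reduced to a boolean with modulus" describes how the integer `fit_prior` parameter becomes the boolean that `BernoulliNB` expects. The doc does not show the exact expression, so the helper below is a minimal sketch assuming modulo-2 reduction:

```python
# Hypothetical reduction of an arbitrary integer fit_prior parameter to
# the boolean BernoulliNB expects: odd => True, even => False.
def reduce_fit_prior(fit_prior):
    """Collapse an unbounded integer to a boolean via modulus."""
    return bool(fit_prior % 2)
```

Under this assumption any even integer disables learned class priors (uniform prior) and any odd integer enables them.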
...s/documentation/pipeline_operators/models/classifiers/naive_bayes/GaussianNB.md (42 additions, 0 deletions)
# GaussianNB Classifier
* * *

Fits a Gaussian Naive Bayes classifier.

## Dependencies
sklearn.naive_bayes.GaussianNB

Parameters
----------
input_df: pandas.DataFrame {n_samples, n_features+['class', 'group', 'guess']}
    Input DataFrame for fitting the classifier

Returns
-------
input_df: pandas.DataFrame {n_samples, n_features+['guess', 'group', 'class', 'SyntheticFeature']}
    Returns a modified input DataFrame with the guess column updated according to the classifier's predictions.
    Also adds the classifier's predictions as a 'SyntheticFeature' column.

Example Exported Code
---------------------

```python
import numpy as np
import pandas as pd
from sklearn.cross_validation import train_test_split
from sklearn.naive_bayes import GaussianNB

# NOTE: Make sure that the class is labeled 'class' in the data file
tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR')
training_indices, testing_indices = train_test_split(tpot_data.index, stratify=tpot_data['class'].values, train_size=0.75, test_size=0.25)

result1 = tpot_data.copy()

# Perform classification with a Gaussian naive Bayes classifier
gnb1 = GaussianNB()
gnb1.fit(result1.loc[training_indices].drop('class', axis=1).values, result1.loc[training_indices, 'class'].values)

result1['gnb1-classification'] = gnb1.predict(result1.drop('class', axis=1).values)
```
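GaussianNB takes no tunable parameters because its only assumption is baked in: each feature's likelihood within a class is a Gaussian, with mean and variance estimated from the training data. As a rough sketch of that class-conditional density (illustrative helper, not scikit-learn code):

```python
import math

# Class-conditional likelihood assumed by Gaussian Naive Bayes:
# P(x | class) is a normal density with the class's estimated mean/variance.
def gaussian_pdf(x, mean, var):
    """Normal density of a single feature value given class statistics."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
```

Prediction multiplies these per-feature likelihoods with the class prior and picks the class with the largest product.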
...ocumentation/pipeline_operators/models/classifiers/naive_bayes/MultinomialNB.md (47 additions, 0 deletions)
# MultinomialNB Classifier
* * *

Fits a Naive Bayes classifier for multinomial models.

## Dependencies
sklearn.naive_bayes.MultinomialNB

Parameters
----------
input_df: pandas.DataFrame {n_samples, n_features+['class', 'group', 'guess']}
    Input DataFrame for fitting the classifier
alpha: float
    Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing).
fit_prior: int
    Whether to learn class prior probabilities or not. If false, a uniform prior will be used.
    Reduced to a boolean with modulus.

Returns
-------
input_df: pandas.DataFrame {n_samples, n_features+['guess', 'group', 'class', 'SyntheticFeature']}
    Returns a modified input DataFrame with the guess column updated according to the classifier's predictions.
    Also adds the classifier's predictions as a 'SyntheticFeature' column.

Example Exported Code
---------------------

```python
import numpy as np
import pandas as pd
from sklearn.cross_validation import train_test_split
from sklearn.naive_bayes import MultinomialNB

# NOTE: Make sure that the class is labeled 'class' in the data file
tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR')
training_indices, testing_indices = train_test_split(tpot_data.index, stratify=tpot_data['class'].values, train_size=0.75, test_size=0.25)

result1 = tpot_data.copy()

# Perform classification with a multinomial naive Bayes classifier
mnb1 = MultinomialNB(alpha=1.0, fit_prior=True)
mnb1.fit(result1.loc[training_indices].drop('class', axis=1).values, result1.loc[training_indices, 'class'].values)

result1['mnb1-classification'] = mnb1.predict(result1.drop('class', axis=1).values)
```
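What the additive smoothing parameter `alpha` actually does can be seen in a one-line sketch of a smoothed feature probability (illustrative helper, not scikit-learn internals): `alpha` is added to every feature count so that unseen features never get probability zero.

```python
# Illustrative additive (Laplace/Lidstone) smoothing of one feature's
# conditional probability: count of the feature in a class, total feature
# count for that class, and the number of distinct features.
def smoothed_prob(count, total, n_features, alpha=1.0):
    """Smoothed estimate of P(feature | class)."""
    return (count + alpha) / (total + alpha * n_features)
```

With `alpha=1.0` (the exported default above) a feature never seen in a class still gets a small nonzero probability; with `alpha=0` its probability is exactly zero, which can veto an entire class.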