
Commit

- Restored the DatabaseConfiguration.getDBnameSeparator() method.
- Changed the RandomGenerator.getThreadLocalRandomUnseeded() to ensure we get different random numbers across threads.
- CETR no longer receives a dbName parameter, as it does not store anything on disk.
- Removed underscores from all temporary names in the framework.
- When close() is called on a Trainer that has not been loaded or saved, its knowledgeBase is deleted to remove any temporary files.
- The models of a specific dbName are stored in a directory structure.
- Created another level of abstraction for file-based Database Connectors and Configurations.
- Renamed Folder to Directory in comments, methods, vars and config files.
- Empty parent directories of the algorithm output are automatically cleaned up.
datumbox committed Dec 22, 2016
1 parent 2e387cf commit 6ed0170
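The commit message does not show how RandomGenerator.getThreadLocalRandomUnseeded() was changed. As a rough illustration of the stated goal (different random numbers in each thread), here is a minimal sketch, assuming a `ThreadLocal<Random>` whose per-thread instances are built with the no-argument `java.util.Random` constructor. The class `ThreadLocalRandomSketch` and its helper `instanceFromOtherThread()` are hypothetical and not part of the framework.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for RandomGenerator.getThreadLocalRandomUnseeded(); not the framework's code.
final class ThreadLocalRandomSketch {

    // Each thread lazily receives its own Random. The no-argument Random constructor
    // picks a seed that is very likely to differ from every other instance, so
    // threads do not replay one shared sequence.
    private static final ThreadLocal<Random> UNSEEDED = ThreadLocal.withInitial(Random::new);

    private ThreadLocalRandomSketch() {
    }

    static Random getThreadLocalRandomUnseeded() {
        return UNSEEDED.get();
    }

    // Starts a worker thread and returns the Random instance that thread received.
    static Random instanceFromOtherThread() {
        AtomicReference<Random> seen = new AtomicReference<>();
        Thread worker = new Thread(() -> seen.set(getThreadLocalRandomUnseeded()));
        worker.start();
        try {
            worker.join(); // join() makes the worker's write visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return seen.get();
    }
}
```

Initialising every per-thread `Random` from one shared, fixed seed would make all threads replay the same sequence, which is one way such a method can miss this goal. Where reproducibility is not needed, `java.util.concurrent.ThreadLocalRandom.current()` offers comparable per-thread behaviour without a custom holder.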
Showing 37 changed files with 384 additions and 457 deletions.
16 changes: 12 additions & 4 deletions CHANGELOG.md
@@ -1,17 +1,17 @@
 CHANGELOG
 =========
 
-Version 0.8.0-SNAPSHOT - Build 20161221
+Version 0.8.0-SNAPSHOT - Build 20161222
 ---------------------------------------
 
 - Improved Validation:
 - Removed the ValidationMetrics from the Algorithms. Now it is a separate object called Metrics.
 - Removed the kFold validation from Algorithms. Now we offer a new validator mechanism.
 - A single KnowledgeBase implementation is now used.
 - Removed the unnecessary n & d model parameters from all models.
-- Random unseeded filenames are now produced using RandomGenerator.getThreadLocalRandomUnseeded().
+- Random unseeded filenames are now produced using RandomGenerator.getRandomUnseeded().
 - Removing the need to call KnowledgeBase.init() in any predict/transform method.
-- Improved DatabaseConnector: existsObject method, InMemory now stores objects independently, MapDB stores all files in folder.
+- Improved DatabaseConnector: existsObject method, InMemory now stores objects independently, MapDB stores all files in directory.
 - The training parameters are now provided on the constructor of the algorithms not with a setter.
 - TextClassifier inherits from Modeler.
 - Removed all unnecessary passing of class objects from Stepwise Regression, Wrappers and Ensemble learning classes.
@@ -21,8 +21,16 @@ Version 0.8.0-SNAPSHOT - Build 20161221
 - Created a TrainableBundle to keep track of the Trainables of Modeler, AbstractBoostingBagging and StepwiseRegression.
 - Removed automatic save after fit, now save() must be called.
 - AbstractTrainer no longer stores a local copy of dbName. The save method accepts a dbName.
-- The DatabaseConfiguration.getDBnameSeparator() method was removed.
 - The rename() is created in DatabaseConnectors and it's used by KnowledgeBase to saveAs the models.
+- Restored the DatabaseConfiguration.getDBnameSeparator() method.
+- Changed the RandomGenerator.getThreadLocalRandomUnseeded() to ensure we get different random numbers across threads.
+- CETR no longer receives a dbName parameter, as it does not store anything on disk.
+- Removed underscores from all temporary names in the framework.
+- When close() is called on a Trainer that has not been loaded or saved, its knowledgeBase is deleted to remove any temporary files.
+- The models of a specific dbName are stored in a directory structure.
+- Created another level of abstraction for file-based Database Connectors and Configurations.
+- Renamed Folder to Directory in comments, methods, vars and config files.
+- Empty parent directories of the algorithm output are automatically cleaned up.
 
 Version 0.7.1-SNAPSHOT - Build 20161217
 ---------------------------------------
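The changelog entry on close() describes a lifecycle rule: storage belonging to a Trainer that was never loaded or saved is temporary and is removed when the Trainer is closed. A minimal sketch of that rule, assuming the knowledge base lives in a single directory on disk; `TemporaryStorageSketch`, `markSaved()` and `writeEntry()` are hypothetical names, not the framework's API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of the close() rule; names are illustrative, not the framework's API.
final class TemporaryStorageSketch implements AutoCloseable {

    private final Path directory;
    private boolean persisted; // true once the data was loaded from, or saved to, storage

    TemporaryStorageSketch(Path directory, boolean loadedFromStorage) {
        this.directory = directory;
        this.persisted = loadedFromStorage;
    }

    // A fresh, never-saved knowledge base backed by a new temporary directory.
    static TemporaryStorageSketch createTemporary() {
        try {
            return new TemporaryStorageSketch(Files.createTempDirectory("kb"), false);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    Path directory() {
        return directory;
    }

    void writeEntry(String name, String content) {
        try {
            Files.write(directory.resolve(name), content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void markSaved() {
        persisted = true;
    }

    @Override
    public void close() {
        // Never loaded or saved: everything on disk is temporary, so remove it.
        if (!persisted) {
            deleteRecursively(directory);
        }
    }

    static void deleteRecursively(Path root) {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse order visits children before their parent directories.
            List<Path> ordered = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            for (Path path : ordered) {
                Files.delete(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```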
LICENSE: file mode changed 100755 → 100644 (no content changes)
3 changes: 1 addition & 2 deletions TODO.txt
@@ -1,8 +1,7 @@
 CODE IMPROVEMENTS
 =================
 
-- Can we add all the files of a model in a single folder.
 - Can we make the two constructors of the Trainers to call a common constructor to eliminate duplicate code?
 - Save and load method for Dataset.
 
 - Support of better Transformers (Zscore, decouple boolean transforming from numeric etc).
 - Write a ShuffleSplitValidator class similar to KFold. Perhaps we need a single Validator class and separate Splitters.
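Two changelog entries (models of a dbName stored in a directory structure, and automatic cleanup of empty parent directories) supersede the TODO item removed in this hunk. The cleanup half can be sketched with `java.nio.file`, assuming a layout such as `<root>/<dbName>/<component>` and a storage root that must never be deleted. `EmptyParentCleanupSketch` and `deleteEmptyParents` are hypothetical names; the framework's own implementation is not part of this diff.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of cleaning up empty parent directories; not the framework's code.
final class EmptyParentCleanupSketch {

    private EmptyParentCleanupSketch() {
    }

    // Deletes 'start' and then each of its ancestors while they are empty,
    // stopping at the first non-empty directory or when 'root' is reached.
    // 'root' itself is never deleted, and nothing outside 'root' is touched.
    static void deleteEmptyParents(Path start, Path root) {
        Path stop = root.toAbsolutePath().normalize();
        Path current = start.toAbsolutePath().normalize();
        try {
            while (current != null
                    && current.startsWith(stop)
                    && !current.equals(stop)
                    && Files.isDirectory(current)
                    && isEmpty(current)) {
                Files.delete(current);
                current = current.getParent();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static boolean isEmpty(Path directory) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(directory)) {
            return !entries.iterator().hasNext();
        }
    }
}
```

Bounding the walk by an explicit root keeps the cleanup from climbing into directories the framework does not own.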
@@ -39,7 +39,7 @@ public class Modeler extends AbstractTrainer<Modeler.ModelParameters, Modeler.Tr
private static final String FS_KEY = "fs";
private static final String ML_KEY = "ml";

-private TrainableBundle bundle = new TrainableBundle();
+private final TrainableBundle bundle = new TrainableBundle();

/**
* It contains all the Model Parameters which are learned during the training.
@@ -240,9 +240,11 @@ protected void _fit(Dataframe trainingData) {
@Override
public void save(String dbName) {
initBundle();
-String knowledgeBaseName = createKnowledgeBaseName(dbName);
-bundle.save(knowledgeBaseName);
super.save(dbName);
+
+String separator = knowledgeBase.getConf().getDbConfig().getDBnameSeparator();
+String knowledgeBaseName = createKnowledgeBaseName(dbName, separator);
+bundle.save(knowledgeBaseName, separator);
}

/** {@inheritDoc} */
@@ -269,13 +271,14 @@ private void initBundle() {
TrainingParameters trainingParameters = knowledgeBase.getTrainingParameters();
Configuration conf = knowledgeBase.getConf();
String dbName = knowledgeBase.getDbc().getDatabaseName();
+String separator = conf.getDbConfig().getDBnameSeparator();

if(!bundle.containsKey(DT_KEY)) {
AbstractTransformer.AbstractTrainingParameters dtParams = trainingParameters.getDataTransformerTrainingParameters();

AbstractTransformer dataTransformer = null;
if(dtParams != null) {
-dataTransformer = MLBuilder.load(dtParams.getTClass(), dbName + "_" + DT_KEY, conf);
+dataTransformer = MLBuilder.load(dtParams.getTClass(), dbName + separator + DT_KEY, conf);
}
bundle.put(DT_KEY, dataTransformer);
}
@@ -285,15 +288,15 @@ private void initBundle() {

AbstractFeatureSelector featureSelector = null;
if(fsParams != null) {
-featureSelector = MLBuilder.load(fsParams.getTClass(), dbName + "_" + FS_KEY, conf);
+featureSelector = MLBuilder.load(fsParams.getTClass(), dbName + separator + FS_KEY, conf);
}
bundle.put(FS_KEY, featureSelector);
}

if(!bundle.containsKey(ML_KEY)) {
AbstractModeler.AbstractTrainingParameters mlParams = trainingParameters.getModelerTrainingParameters();

-bundle.put(ML_KEY, MLBuilder.load(mlParams.getTClass(), dbName + "_" + ML_KEY, conf));
+bundle.put(ML_KEY, MLBuilder.load(mlParams.getTClass(), dbName + separator + ML_KEY, conf));
}
}

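In the Modeler diff, initBundle() no longer hard-codes "_" but builds each component's storage name as `dbName + separator + KEY`, with the separator read from `getDBnameSeparator()`. The sketch below isolates that naming step; `ComponentNameSketch` is hypothetical, and the separators used in the example ("_" and "-") are illustrative values, not the framework's defaults.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the naming scheme used by initBundle(); not the framework's code.
final class ComponentNameSketch {

    private ComponentNameSketch() {
    }

    // Joins the parent name, the configured separator and a component key.
    static String componentName(String dbName, String separator, String key) {
        return dbName + separator + key;
    }

    // Maps every key to its derived storage name, preserving the keys' order.
    static Map<String, String> componentNames(String dbName, String separator, List<String> keys) {
        Map<String, String> names = new LinkedHashMap<>();
        for (String key : keys) {
            names.put(key, componentName(dbName, separator, key));
        }
        return names;
    }
}
```

Reading the separator from configuration in one place means a different database configuration can change the naming scheme without touching every call site that loads a component.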
@@ -115,19 +115,16 @@ public void setSmoothingAverageRadius(int smoothingAverageRadius) {
this.smoothingAverageRadius = smoothingAverageRadius;
}
}

-private final String dbName;

private final Configuration conf;

/**
* Constructor for the CETR class. It accepts as arguments the name of the
* database where the temporary results are stored and the Database Configuration.
*
* @param dbName
*
* @param conf
*/
-public CETR(String dbName, Configuration conf) {
-this.dbName = dbName;
+public CETR(Configuration conf) {
this.conf = conf;
}

@@ -44,8 +44,6 @@ public void testExtract() {

Configuration conf = Configuration.getConfiguration();

-String dbName = this.getClass().getSimpleName();
-
String text;
try {
List<String> lines = Files.readAllLines(Paths.get(this.getClass().getClassLoader().getResource("datasets/example.com.html").toURI()), StandardCharsets.UTF_8);
@@ -64,7 +62,7 @@ public void testExtract() {
parameters.setNumberOfClusters(2);
parameters.setAlphaWindowSizeFor2DModel(3);
parameters.setSmoothingAverageRadius(2);
-CETR instance = new CETR(dbName, conf);
+CETR instance = new CETR(conf);
String expResult = "This domain is established to be used for illustrative examples in documents. You may use this domain in examples without prior coordination or asking for permission.";
String result = instance.extract(text, parameters);
assertEquals(expResult, result);
