Multiple instances of the same feature extractor with different configurations should be possible #39

Closed
daxenberger opened this issue Jun 9, 2015 · 66 comments

Comments

@daxenberger
Member

Originally reported on Google Code with ID 39

For example, we might want two "interesting words" FEs with two different word lists instead of having
to merge the lists.

This is currently not possible because both instances read their parameters from the global parameter space
and would therefore both receive the same values.
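To make the limitation concrete, here is a minimal Java sketch, assuming a hypothetical global parameter map and a hypothetical `InterestingWordsExtractor` (neither is DKPro TC API). Because both instances look up the same key in one shared namespace, a second word list can only overwrite the first, and both instances end up with identical configuration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not DKPro TC code: all parameters live in one
// global namespace keyed only by parameter name.
class GlobalParamClash {
    static final Map<String, Object> GLOBAL_PARAMS = new HashMap<>();

    // Hypothetical extractor that reads its word list from the global space.
    static final class InterestingWordsExtractor {
        final List<String> words;

        @SuppressWarnings("unchecked")
        InterestingWordsExtractor() {
            // Every instance resolves the same key, so every instance
            // receives the same value.
            this.words = (List<String>) GLOBAL_PARAMS.get("interestingWordsList");
        }
    }

    static List<List<String>> demo() {
        GLOBAL_PARAMS.put("interestingWordsList", List.of("apple", "banana"));
        // The second put overwrites the first; there is no way to address
        // one particular extractor instance.
        GLOBAL_PARAMS.put("interestingWordsList", List.of("car", "train"));
        InterestingWordsExtractor first = new InterestingWordsExtractor();
        InterestingWordsExtractor second = new InterestingWordsExtractor();
        return List.of(first.words, second.words);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [[car, train], [car, train]]
    }
}
```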

Reported by torsten.zesch on 2013-08-07 16:01:42

@daxenberger
Member Author

To make this work, we would have to allow configuring dedicated pairs of a FeatureExtractor
and its configuration.
All configuration for FeatureExtractors (and, likewise, MetaCollectors) should be handled
uniformly. See also issue 36.
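One way to read "dedicated pairs" is sketched below in Java: each registered extractor carries its own parameter map, so two instances of the same class can be configured differently. `ExtractorSpec` and `InterestingWordsExtractor` are hypothetical names for illustration, not the design that was eventually implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each feature extractor is registered together with
// its own parameter map instead of reading from a global namespace.
class PerInstanceConfig {
    // Pairs an extractor class with the configuration of one instance.
    static final class ExtractorSpec {
        final Class<?> extractorClass;
        final Map<String, Object> params = new LinkedHashMap<>();

        ExtractorSpec(Class<?> extractorClass, Object... keyValues) {
            if (keyValues.length % 2 != 0) {
                throw new IllegalArgumentException("parameters must be key/value pairs");
            }
            this.extractorClass = extractorClass;
            for (int i = 0; i < keyValues.length; i += 2) {
                params.put((String) keyValues[i], keyValues[i + 1]);
            }
        }
    }

    // Hypothetical extractor whose word list comes only from its own spec.
    static final class InterestingWordsExtractor {
        final List<String> words;

        @SuppressWarnings("unchecked")
        InterestingWordsExtractor(Map<String, Object> params) {
            this.words = (List<String>) params.get("interestingWordsList");
        }
    }

    static List<List<String>> demo() {
        List<ExtractorSpec> featureSet = new ArrayList<>();
        featureSet.add(new ExtractorSpec(InterestingWordsExtractor.class,
                "interestingWordsList", List.of("apple", "banana")));
        featureSet.add(new ExtractorSpec(InterestingWordsExtractor.class,
                "interestingWordsList", List.of("car", "train")));

        List<List<String>> result = new ArrayList<>();
        for (ExtractorSpec spec : featureSet) {
            if (spec.extractorClass == InterestingWordsExtractor.class) {
                // Each instance is built only from the parameters paired with it.
                result.add(new InterestingWordsExtractor(spec.params).words);
            }
        }
        return result;
    }
}
```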

Reported by daxenberger.j on 2013-08-07 16:33:05

@daxenberger
Member Author

I am no longer sure that this is actually possible in our current architecture,
so I am closing the issue.

Reported by torsten.zesch on 2014-03-19 08:46:38

  • Status changed: WontFix

@daxenberger
Member Author

I came across the same problem today, and I think it would be very useful if you could
call a FeatureExtractor twice (or more often) from the same experiment using different
parameters (one parameter being the name of the feature). This would help make FeatureExtractors
more generic and keep the specific part in the experiment. In my case, I wanted to
use different resources, such as files (as in Torsten's example) or similarity measures,
but apply the same operations to them.
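The use case can be sketched as follows, assuming a hypothetical `SimilarityFeatureExtractor` whose similarity measure and feature name are both supplied per instance; the two toy measures are stand-ins for real similarity resources. The generic operation lives in the extractor, while the distinct names keep the two instances' features from overwriting each other.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical sketch: the operation is generic, while the resource
// (here a similarity function) and the feature name come from the experiment.
class NamedFeatureInstances {
    static final class SimilarityFeatureExtractor {
        final String featureName;
        final BiFunction<String, String, Double> measure;

        SimilarityFeatureExtractor(String featureName,
                BiFunction<String, String, Double> measure) {
            this.featureName = featureName;
            this.measure = measure;
        }

        // Emits a single feature named after this instance.
        Map.Entry<String, Double> extract(String a, String b) {
            return Map.entry(featureName, measure.apply(a, b));
        }
    }

    static Map<String, Double> demo(String a, String b) {
        // Toy measure 1: exact string match.
        SimilarityFeatureExtractor exact = new SimilarityFeatureExtractor(
                "sim_exact", (x, y) -> x.equals(y) ? 1.0 : 0.0);
        // Toy measure 2: ratio of the shorter to the longer length.
        SimilarityFeatureExtractor length = new SimilarityFeatureExtractor(
                "sim_length", (x, y) -> (double) Math.min(x.length(), y.length())
                        / Math.max(1, Math.max(x.length(), y.length())));

        Map<String, Double> features = new LinkedHashMap<>();
        for (SimilarityFeatureExtractor fe : List.of(exact, length)) {
            Map.Entry<String, Double> f = fe.extract(a, b);
            // Distinct names mean the two instances never collide.
            features.put(f.getKey(), f.getValue());
        }
        return features;
    }
}
```

For example, `demo("cat", "cart")` yields `sim_exact=0.0` and `sim_length=0.75`.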

Reported by lisa.beinborn@gmx.de on 2014-05-27 14:46:11

@daxenberger
Member Author

As mentioned above, this would require major changes in the core of the framework.
We'll keep the issue on the agenda, but it will definitely not be resolved before
the next release.

Reported by daxenberger.j on 2014-05-27 14:52:45

  • Status changed: Accepted

@daxenberger
Member Author

Reported by daxenberger.j on 2014-06-04 12:34:23

  • Labels added: Milestone-Release0.7.0

@daxenberger
Member Author

Reported by daxenberger.j on 2014-08-29 10:03:18

  • Labels added: Milestone-Release0.8.0
  • Labels removed: Milestone-Release0.7.0

@daxenberger
Member Author

Reported by daxenberger.j on 2014-10-29 17:24:00

@daxenberger
Member Author

This issue was updated by revision r1277.

- Hackathon intermediate results

Reported by richard.eckart on 2014-12-07 19:45:46

@daxenberger
Member Author

Reported by daxenberger.j on 2014-12-11 15:28:26

  • Status changed: Started

@daxenberger
Member Author

Reported by daxenberger.j on 2014-12-11 15:30:30

@daxenberger
Member Author

This issue was updated by revision r1447.


adapt ContextMetaCollector to new meta collector structure

Reported by daxenberger.j on 2015-04-10 13:59:34

@daxenberger
Member Author

This issue was updated by revision r1452.


some fixes to make the demo work

Reported by daxenberger.j on 2015-04-10 16:16:16

@daxenberger
Member Author

This issue was updated by revision r1458.


corrections in ValidityCheckTask

Reported by daxenberger.j on 2015-04-11 07:10:23

@daxenberger
Member Author

This issue was updated by revision r1459.


updating proof-of-concept demo (currently broken)

Reported by daxenberger.j on 2015-04-11 07:26:15

@daxenberger
Member Author

Remaining TODOs:

- demo is broken (topK for all FE instances is created in initialize())
- names of FEs are not used in DISCRIMINATORS.txt

Reported by daxenberger.j on 2015-04-11 07:28:53

@Horsmann
Member

Horsmann commented Nov 4, 2015

How far along is this issue? Is it usable yet?

@Horsmann
Member

@daxenberger ping :)

@daxenberger
Member Author

We have implemented a prototype of this, but have not merged it into the trunk yet.
If you want to play with it, have a look at the code in the issue-39 branch, in particular at the examples/single/document/BasicTwentyNewsgroupsDemo class.

Horsmann added a commit that referenced this issue Jul 31, 2016
@Horsmann
Member

@reckart I have some import resolution errors in a demo case that uses an ExternalResourceDescription as parameter for a feature extractor.

Any idea what causes this exception?

2016-07-31 16:33:20 DEBUG DefaultLoggingService:43 - [Evaluation-TwentyNewsgroupsTrainTest-b7e16165-572b-11e6-a228-e98b49b76be2] Problem stack trace:
org.dkpro.lab.storage.UnresolvedImportException: 
 -Unable to resolve import of task [org.dkpro.tc.core.task.ExtractFeaturesTask-Test-TwentyNewsgroupsTrainTest] pointing to [task-latest://org.dkpro.tc.core.task.ExtractFeaturesTask-Train-TwentyNewsgroupsTrainTest/output]; nested exception is org.dkpro.lab.storage.TaskContextNotFoundException: Task [org.dkpro.tc.core.task.ExtractFeaturesTask-Train-TwentyNewsgroupsTrainTest] has never been executed.
 -Unable to resolve import of task [org.dkpro.tc.weka.task.WekaTestTask-TwentyNewsgroupsTrainTest] pointing to [task-latest://org.dkpro.tc.core.task.ExtractFeaturesTask-Test-TwentyNewsgroupsTrainTest/output]; nested exception is org.dkpro.lab.storage.TaskContextNotFoundException: Task [org.dkpro.tc.core.task.ExtractFeaturesTask-Test-TwentyNewsgroupsTrainTest] has never been executed.; nested exception is org.dkpro.lab.storage.UnresolvedImportException: Unable to resolve import of task [org.dkpro.tc.core.task.ExtractFeaturesTask-Test-TwentyNewsgroupsTrainTest] pointing to [task-latest://org.dkpro.tc.core.task.ExtractFeaturesTask-Train-TwentyNewsgroupsTrainTest/output]; nested exception is org.dkpro.lab.storage.TaskContextNotFoundException: Task [org.dkpro.tc.core.task.ExtractFeaturesTask-Train-TwentyNewsgroupsTrainTest] has never been executed.
    at org.dkpro.lab.engine.impl.BatchTaskEngine.executeConfiguration(BatchTaskEngine.java:261)
    at org.dkpro.lab.engine.impl.BatchTaskEngine.run(BatchTaskEngine.java:133)
    at org.dkpro.lab.engine.impl.DefaultTaskExecutionService.run(DefaultTaskExecutionService.java:52)
    at org.dkpro.lab.Lab.run(Lab.java:107)
    at org.dkpro.tc.examples.single.pair.WekaExternalResourceDemo.runTrainTest(WekaExternalResourceDemo.java:134)
    at org.dkpro.tc.examples.single.pair.WekaExternalResourceDemo.main(WekaExternalResourceDemo.java:80)
Caused by: org.dkpro.lab.storage.UnresolvedImportException: Unable to resolve import of task [org.dkpro.tc.core.task.ExtractFeaturesTask-Test-TwentyNewsgroupsTrainTest] pointing to [task-latest://org.dkpro.tc.core.task.ExtractFeaturesTask-Train-TwentyNewsgroupsTrainTest/output]; nested exception is org.dkpro.lab.storage.TaskContextNotFoundException: Task [org.dkpro.tc.core.task.ExtractFeaturesTask-Train-TwentyNewsgroupsTrainTest] has never been executed.
    at org.dkpro.lab.engine.impl.BatchTaskEngine$ScopedTaskContext.resolve(BatchTaskEngine.java:521)
    at org.dkpro.lab.engine.impl.DefaultTaskContextFactory.resolveImports(DefaultTaskContextFactory.java:141)
    at org.dkpro.lab.engine.impl.DefaultTaskContextFactory.createContext(DefaultTaskContextFactory.java:97)
    at org.dkpro.lab.uima.engine.simple.SimpleExecutionEngine.run(SimpleExecutionEngine.java:77)
    at org.dkpro.lab.engine.impl.BatchTaskEngine.runNewExecution(BatchTaskEngine.java:329)
    at org.dkpro.lab.engine.impl.BatchTaskEngine.executeConfiguration(BatchTaskEngine.java:234)
    ... 5 more
Caused by: org.dkpro.lab.storage.TaskContextNotFoundException: Task [org.dkpro.tc.core.task.ExtractFeaturesTask-Train-TwentyNewsgroupsTrainTest] has never been executed.
    at org.dkpro.lab.engine.impl.ImportUtil.createContextNotFoundException(ImportUtil.java:124)
    at org.dkpro.lab.engine.impl.BatchTaskEngine.getLatestExecution(BatchTaskEngine.java:306)
    at org.dkpro.lab.engine.impl.BatchTaskEngine.access$000(BatchTaskEngine.java:64)
    at org.dkpro.lab.engine.impl.BatchTaskEngine$ScopedTaskContext.resolve(BatchTaskEngine.java:518)
    ... 10 more
2016-07-31 16:33:20 INFO  DefaultLoggingService:33 - [Evaluation-TwentyNewsgroupsTrainTest-b7e16165-572b-11e6-a228-e98b49b76be2] Shut down task

Horsmann added a commit that referenced this issue Jul 31, 2016
@reckart
Member

reckart commented Jul 31, 2016

@Horsmann try increasing the logging level so that lab prints the actual discriminator values that it compares and whether they match or not.

Horsmann added a commit that referenced this issue Jul 31, 2016
@Horsmann
Member

@reckart

2016-07-31 17:37:15 DEBUG DataBinder:446 - DataBinder requires binding of required fields [weightingTf,normalization]
2016-07-31 17:37:15 DEBUG BeanUtils:456 - No property editor [dkpro.similarity.algorithms.lexical.string.CosineSimilarity$WeightingModeTfEditor] found for type dkpro.similarity.algorithms.lexical.string.CosineSimilarity$WeightingModeTf according to 'Editor' suffix convention
2016-07-31 17:37:15 DEBUG BeanUtils:456 - No property editor [dkpro.similarity.algorithms.lexical.string.CosineSimilarity$NormalizationModeEditor] found for type dkpro.similarity.algorithms.lexical.string.CosineSimilarity$NormalizationMode according to 'Editor' suffix convention
2016-07-31 17:37:15 DEBUG DataBinder:446 - DataBinder requires binding of required fields [size]
2016-07-31 17:37:15 DEBUG DataBinder:446 - DataBinder requires binding of required fields [segmentFeaturePath,featureExtractorName]
2016-07-31 17:37:15 DEBUG DataBinder:446 - DataBinder requires binding of required fields [outputDirectory,addInstanceId,featureFilters,dataWriterClass,learningMode,featureMode,developerMode,applyWeighting,isTesting,featureStoreClass]
2016-07-31 17:37:15 INFO  DefaultLoggingService:33 - [ExtractFeaturesTask-Train-TwentyNewsgroupsTrainTest-a84a5c0b-5734-11e6-9ffd-819de4a89919] Initialized task [org.dkpro.tc.core.task.ExtractFeaturesTask-Train-TwentyNewsgroupsTrainTest]
2016-07-31 17:37:15 DEBUG ExtractFeaturesTask:414 - Found Discriminator [org.dkpro.tc.core.task.ExtractFeaturesTask|featureFilters]: []
2016-07-31 17:37:15 DEBUG ExtractFeaturesTask:414 - Found Discriminator [org.dkpro.tc.core.task.ExtractFeaturesTask|filesRoot]: null
2016-07-31 17:37:15 DEBUG ExtractFeaturesTask:414 - Found Discriminator [org.dkpro.tc.core.task.ExtractFeaturesTask|files_training]: null
2016-07-31 17:37:15 DEBUG ExtractFeaturesTask:414 - Found Discriminator [org.dkpro.tc.core.task.ExtractFeaturesTask|files_validation]: null
2016-07-31 17:37:15 DEBUG ExtractFeaturesTask:414 - Found Discriminator [org.dkpro.tc.core.task.ExtractFeaturesTask|learningMode]: singleLabel
2016-07-31 17:37:15 DEBUG ExtractFeaturesTask:414 - Found Discriminator [org.dkpro.tc.core.task.ExtractFeaturesTask|featureMode]: pair
2016-07-31 17:37:15 DEBUG ExtractFeaturesTask:414 - Found Discriminator [org.dkpro.tc.core.task.ExtractFeaturesTask|featureStore]: null
2016-07-31 17:37:15 DEBUG ExtractFeaturesTask:414 - Found Discriminator [org.dkpro.tc.core.task.ExtractFeaturesTask|developerMode]: false
2016-07-31 17:37:15 DEBUG ExtractFeaturesTask:414 - Found Discriminator [org.dkpro.tc.core.task.ExtractFeaturesTask|applyWeighting]: false
2016-07-31 17:37:15 DEBUG ExtractFeaturesTask:414 - Found Discriminator [org.dkpro.tc.core.task.ExtractFeaturesTask|featureSet]: [org.dkpro.tc.features.pair.similarity.SimilarityPairFeatureExtractor| textSimilarityResource, org.apache.uima.fit.internal.ExtendedExternalResourceDescription_impl: 
description = NULL
implementationName = NULL
name = dkpro.similarity.algorithms.lexical.uima.string.CosineSimilarityResource-0
------cut------
2016-07-31 17:37:15 INFO  DefaultLoggingService:33 - [ExtractFeaturesTask-Train-TwentyNewsgroupsTrainTest-a84a5c0b-5734-11e6-9ffd-819de4a89919] Progress de.tudarmstadt.ukp.dkpro.core.io.bincas.BinaryCasReader 5/5 file (src/main/resources/data/twentynewsgroups/bydate-train/alt.atheism/49960.txt src/main/resources/data/twentynewsgroups/bydate-train/comp.graphics/37913.txt)
2016-07-31 17:37:16 INFO  DefaultLoggingService:33 - [ExtractFeaturesTask-Train-TwentyNewsgroupsTrainTest-a84a5c0b-5734-11e6-9ffd-819de4a89919] Completing task [org.dkpro.tc.core.task.ExtractFeaturesTask-Train-TwentyNewsgroupsTrainTest]
2016-07-31 17:37:16 INFO  DefaultLoggingService:33 - [ExtractFeaturesTask-Train-TwentyNewsgroupsTrainTest-a84a5c0b-5734-11e6-9ffd-819de4a89919] Running reports for task [org.dkpro.tc.core.task.ExtractFeaturesTask-Train-TwentyNewsgroupsTrainTest]
2016-07-31 17:37:16 INFO  DefaultLoggingService:33 - [ExtractFeaturesTask-Train-TwentyNewsgroupsTrainTest-a84a5c0b-5734-11e6-9ffd-819de4a89919] Starting report [org.dkpro.lab.uima.reporting.UimaDescriptorsReport] (1/1)
2016-07-31 17:37:16 DEBUG FileSystemStorageService:240 - Storing to: target/results/WekaExternalResourceDemo/org.dkpro.lab/repository/ExtractFeaturesTask-Train-TwentyNewsgroupsTrainTest-a84a5c0b-5734-11e6-9ffd-819de4a89919/AnalysisEngineDescription.xml.html
2016-07-31 17:37:16 DEBUG FileSystemStorageService:240 - Storing to: target/results/WekaExternalResourceDemo/org.dkpro.lab/repository/ExtractFeaturesTask-Train-TwentyNewsgroupsTrainTest-a84a5c0b-5734-11e6-9ffd-819de4a89919/CollectionReaderDescription.xml.html
2016-07-31 17:37:16 INFO  DefaultLoggingService:33 - [ExtractFeaturesTask-Train-TwentyNewsgroupsTrainTest-a84a5c0b-5734-11e6-9ffd-819de4a89919] Report complete [org.dkpro.lab.uima.reporting.UimaDescriptorsReport] (1/1)
2016-07-31 17:37:16 DEBUG FileSystemStorageService:240 - Storing to: target/results/WekaExternalResourceDemo/org.dkpro.lab/repository/ExtractFeaturesTask-Train-TwentyNewsgroupsTrainTest-a84a5c0b-5734-11e6-9ffd-819de4a89919/METADATA.txt
2016-07-31 17:37:16 INFO  DefaultLoggingService:33 - [ExtractFeaturesTask-Train-TwentyNewsgroupsTrainTest-a84a5c0b-5734-11e6-9ffd-819de4a89919] Completed task [org.dkpro.tc.core.task.ExtractFeaturesTask-Train-TwentyNewsgroupsTrainTest]
2016-07-31 17:37:16 INFO  DefaultLoggingService:33 - [ExtractFeaturesTask-Train-TwentyNewsgroupsTrainTest-a84a5c0b-5734-11e6-9ffd-819de4a89919] Shut down task
2016-07-31 17:37:16 DEBUG ImportUtil:68 - No value match: [org.dkpro.tc.core.task.MetaInfoTask|featureSet] [^[org.dkpro.tc.features.pair.similarity.SimilarityPairFeatureExtractor| textSimilarityResource, org.apache.uima.fit.internal.ExtendedExternalResourceDescription_impl: 
description = NULL
implementationName = NULL
name = dkpro.similarity.algorithms.lexical.uima.string.CosineSimilarityResource-0
resourceSpecifier = org.apache.uima.resource.impl.CustomResourceSpecifier_impl: 
parameters = Array{0: org.apache.uima.resource.impl.Parameter_impl: 

The toString() of the ExternalResourceDescription that is passed as a parameter makes the debug output extremely verbose.

I think this line says that it does not find the feature:

2016-07-31 17:37:16 INFO  DefaultLoggingService:33 - [ExtractFeaturesTask-Train-TwentyNewsgroupsTrainTest-a84a5c0b-5734-11e6-9ffd-819de4a89919] Shut down task
2016-07-31 17:37:16 DEBUG ImportUtil:68 - No value match: [org.dkpro.tc.core.task.MetaInfoTask|featureSet] [^[org.dkpro.tc.features.pair.similarity.SimilarityPairFeatureExtractor| textSimilarityResource, org.apache.uima.fit.internal.ExtendedExternalResourceDescription_impl: 
--------cut-----

@reckart
Member

reckart commented Jul 31, 2016

Since all the features are now discriminable (right?), the discriminators should no longer contain the full specification text, no?

@Horsmann
Member

Well, the discriminator also uses the parameters to build a full feature description rather than just the name of the feature class, e.g. LuceneNgram, ngramMin, 1, ngramMax, 3.
If a parameter is an ExternalResourceDescription, it simply calls toString() on it. This is why I get the whole resource text yet again.
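A simplified Java sketch of that behavior, assuming a discriminator is just the extractor's simple class name followed by its parameters joined with commas (the real DKPro TC format may differ). `NGramExtractor` and `VerboseResource` are hypothetical stand-ins; the latter mimics a resource description whose `toString()` spans several lines, which then ends up verbatim inside the discriminator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a discriminator value built from the extractor
// class plus its parameter list, rendering each value via toString().
class DiscriminatorString {
    // Hypothetical feature extractor class, used only for its name.
    static final class NGramExtractor {
    }

    // Hypothetical stand-in for a resource description whose toString()
    // dumps its full, multi-line specification.
    static final class VerboseResource {
        final String name;

        VerboseResource(String name) {
            this.name = name;
        }

        @Override
        public String toString() {
            return "description = NULL\nimplementationName = NULL\nname = " + name
                    + "\nresourceSpecifier = <nested parameters>";
        }
    }

    static String discriminator(Class<?> extractor, Object... params) {
        List<String> parts = new ArrayList<>();
        parts.add(extractor.getSimpleName());
        for (Object p : params) {
            // Plain toString(): compact for numbers and strings, verbose for
            // objects such as resource descriptions.
            parts.add(String.valueOf(p));
        }
        return String.join(", ", parts);
    }
}
```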

@reckart
Member

reckart commented Jul 31, 2016

Ok, that makes it verbose ;)

Maybe you don't keep a strict ordering when writing out your discriminator values? Keep in mind that discriminators are compared based on their string values. If you have any non-determinism (e.g. using Sets or HashMaps, which don't guarantee an order), you can get such effects.
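The ordering effect can be reproduced deterministically with the Java sketch below: the same parameters inserted in different orders render to different strings, whereas copying them into a `TreeMap` first yields one canonical string. The `key=value` rendering is an assumption for illustration, not DKPro Lab's actual discriminator format.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Discriminators are compared as strings, so equal configurations must
// render identically. This sketch contrasts an order-sensitive rendering
// with a canonical (key-sorted) one.
class CanonicalDiscriminator {
    // Renders entries in the map's own iteration order.
    static String render(Map<String, Object> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Object> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    // Copies into a TreeMap so keys are always rendered in sorted order.
    static String renderCanonical(Map<String, Object> params) {
        return render(new TreeMap<>(params));
    }

    // Builds an insertion-ordered map from alternating keys and values.
    static Map<String, Object> ordered(Object... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put((String) kv[i], kv[i + 1]);
        }
        return m;
    }
}
```

With `a = ordered("ngramMin", 1, "ngramMax", 3)` and `b = ordered("ngramMax", 3, "ngramMin", 1)`, `render` gives two different strings, while `renderCanonical` gives `ngramMax=3, ngramMin=1` for both.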

Horsmann added a commit that referenced this issue Jul 31, 2016
@Horsmann
Member

OK. When generating the toString() value, I now intercept the ExternalResourceDescription, parse it, and return only its name. This solves the problem; those endlessly long texts seem to be quite a problem. Now it works.
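A sketch of that fix, under stated assumptions: `NamedResource` is a hypothetical stand-in for UIMA's `ExternalResourceDescription` (which does expose a `getName()` accessor), and instead of parsing the `toString()` output as described above, this version reads the name directly. The comma-joined discriminator format is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: when rendering a discriminator, resource descriptions
// are reduced to their name instead of their full toString() output.
class CompactDiscriminator {
    // Hypothetical stand-in for ExternalResourceDescription; not the UIMA API.
    interface NamedResource {
        String getName();
    }

    static final class VerboseResource implements NamedResource {
        private final String name;

        VerboseResource(String name) {
            this.name = name;
        }

        @Override
        public String getName() {
            return name;
        }

        @Override
        public String toString() {
            return "description = NULL\nimplementationName = NULL\nname = " + name;
        }
    }

    static String renderValue(Object value) {
        // Resources contribute only their name; everything else uses toString().
        if (value instanceof NamedResource) {
            return ((NamedResource) value).getName();
        }
        return String.valueOf(value);
    }

    static String discriminator(String extractorName, Object... params) {
        List<String> parts = new ArrayList<>();
        parts.add(extractorName);
        for (Object p : params) {
            parts.add(renderValue(p));
        }
        return String.join(", ", parts);
    }
}
```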

Horsmann added a commit that referenced this issue Jul 31, 2016
Horsmann added a commit that referenced this issue Aug 1, 2016
Horsmann added a commit that referenced this issue Aug 2, 2016
Horsmann added a commit that referenced this issue Aug 31, 2016

4 participants