Enso

Enso is a tool intended to provide a standard interface for benchmarking embedding and transfer learning methods on natural language processing tasks.

Installation

Enso is compatible with Python 3.4+.

You can install enso via pip:

pip install enso

or directly via setup.py:

git clone git@github.com:IndicoDataSolutions/Enso.git
cd Enso
python setup.py install

Download the included datasets by running:

python -m enso.download

Documentation

Complete API documentation is available at enso.readthedocs.io.

Usage and Workflow

Although there are other effective approaches to applying transfer learning to natural language processing, Enso is built on the assumption that the transfer learning approach adheres to the workflow below; a sketch of a full run follows the list. This workflow is designed to replicate a scenario where a pool of unlabeled data is available, and labelers with subject-matter expertise have a limited amount of time to provide labels for a subset of that data.

  • All examples in the dataset are "featurized" via a pre-trained source model (python -m enso.featurize)
  • Re-represented data is separated into train and test sets
  • A fixed number of examples from the train set is selected to use as training data via the selected sampling strategy
  • The training data subset is optionally over- or under-sampled to account for variation in class balance
  • A target model is trained using the featurized training examples as inputs (python -m enso.experiment)
  • The target model is benchmarked on all featurized test examples
  • The process is repeated for all combinations of featurizers, dataset sizes, target model architectures, etc.
  • Results are visualized and manually inspected (python -m enso.visualize)
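
Taken together, a full benchmark run chains these stages. The sequence below is a minimal sketch, assuming the datasets have already been downloaded and that the featurizers, sampling strategies, and target models to benchmark have been configured as described in the documentation:

python -m enso.featurize    # featurize every example with each pre-trained source model
python -m enso.experiment   # train and benchmark target models on the featurized data
python -m enso.visualize    # plot the aggregated results for manual inspection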

For detailed API documentation, refer to enso.readthedocs.io.

Contributions in the form of pull requests or issues are welcome!

A sample result visualization is included below:

Enso Results Visualization
