Add sklearn example #7

jcklie · 2019-07-09T13:54:17Z

Is your feature request related to a problem? Please describe.
The API of this tool should be compatible with sklearn. It would be nice to document how to use these together.

Describe the solution you'd like
Add an example using e.g. cross validation, parameter grid search or pipelining.

AMarkard · 2019-10-23T13:32:56Z

Implementing the sklearn API for the neural networks turned out to be more difficult than expected, because `fit` requires X and y separately rather than the torchtext.data.Iterator that is currently in use.
skorch provides a nice solution: it wraps the PyTorch network, and its SliceDataset solves the Iterator issue. So I wrote a complete wrapper class that uses skorch to wrap the neural networks and adapts them to the sklearn API as well as to the project structure.
After that, the next issue came up. Because the sentences are averaged inside the neural network, the data must not be padded, and this causes problems with sklearn. In addition, the data types that are used are not supported by sklearn.
As a collaborator of the skorch project states, the problem lies in our data structure and the way sklearn handles the data:
"Getting pytorch Datasets to work with GridSearchCV is not trivially possible. The problem is that eventually, the Dataset leaves the skorch domain and is handled directly by sklearn. sklearn only works with a couple of data types (ndarray, scipy sparse, pandas DataFrame), so you will encounter an error sooner or later." (skorch-dev/skorch#212)
To conclude, using sklearn would require completely restructuring the data handling.
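
For readers following along, here is a minimal stdlib-only sketch of the two obstacles described above. It is not the project's code: the helper names `split_batches` and `pad`, the toy batches, and the zero pad value are all hypothetical. It shows how an iterator of (inputs, labels) batches can be flattened into separate X and y, why variable-length sentences do not form the rectangular data that sklearn expects, and how naive padding would change a per-sentence average.

```python
def split_batches(batches):
    """Flatten an iterator of (inputs, labels) batches into separate X and y lists."""
    X, y = [], []
    for inputs, labels in batches:
        X.extend(inputs)
        y.extend(labels)
    return X, y


def pad(sequences, pad_value=0.0):
    """Right-pad every sequence to the longest length so all rows have equal width."""
    width = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_value] * (width - len(seq)) for seq in sequences]


def mean(values):
    """Plain arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical toy data: each input is a variable-length list of token values,
# each label is a frame name. Real data would hold embedding vectors instead.
batches = [
    ([[1.0, 3.0], [2.0, 4.0, 6.0]], ["frame_a", "frame_b"]),
    ([[5.0]], ["frame_a"]),
]

X, y = split_batches(iter(batches))

# Rows of different lengths cannot be stacked into one 2-D array.
is_rectangular = len({len(row) for row in X}) == 1

# Padding makes the rows rectangular, but averaging over the padded rows
# no longer gives the same result as averaging over the real tokens.
X_padded = pad(X)
means_unpadded = [mean(row) for row in X]
means_padded = [mean(row) for row in X_padded]

print(is_rectangular)
print(means_unpadded)
print(means_padded)
```

In this toy case only the longest sentence keeps its average after padding, which is the conflict between "no padding inside the network" and "rectangular input for sklearn" in miniature.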

jcklie · 2019-10-28T16:50:16Z

What happens if you replace torchtext with flair for the embeddings and just use PyTorch datasets?

AMarkard · 2019-10-30T13:40:36Z

Flair sadly comes with other drawbacks regarding our system.
Flair seems to be very slow as well, at least for such large amounts of data. Nvidia Apex does NOT yield the improvement needed. After some investigation, the bottleneck seems to lie in the structure itself: our system needs pairs of (embedded) sentences and frames, but Flair requires wrapping the text in "Sentence" objects. The usage therefore has to be as follows: sentence -> Flair's Sentence object -> embedding -> taking the embeddings out of the Sentence object -> dropping the object. (Compare old repo #25)

jcklie added the documentation and enhancement labels Jul 9, 2019

jcklie added this to the Ready for pypi milestone Jul 9, 2019
