Is your feature request related to a problem? Please describe.
The API of this tool should be compatible with sklearn. It would be nice to document how to use these together.
Describe the solution you'd like
Add an example using e.g. cross validation, parameter grid search or pipelining.
Implementing the sklearn API for the neural networks turned out to be more difficult than expected, because `fit` requires X and y separately rather than the torchtext.data.Iterator that is currently in use.
skorch provides a nice solution here: it wraps the PyTorch network, and its SliceDataset solves the Iterator issue. So I wrote a complete wrapper class that uses skorch to wrap the neural networks and adapts them to the sklearn API as well as to the project structure.
But then the next issue came up. Averaging the sentence inside the neural network requires the data to be unpadded, and that causes problems with sklearn. The data types we use are also not supported by sklearn.
As a collaborator of the skorch project states, the problem lies in our data structure and the way sklearn handles data:
"Getting pytorch Datasets to work with GridSearchCV is not trivially possible. The problem is that eventually, the Dataset leaves the skorch domain and is handled directly by sklearn. sklearn only works with a couple of data types (ndarray, scipy sparse, pandas DataFrame), so you will encounter an error sooner or later." (skorch-dev/skorch#212)
To conclude: in order to use sklearn, the data handling needs to be completely restructured.
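One possible direction for that restructuring, as a rough sketch only: pad the variable-length sentences into a plain ndarray (which sklearn can slice and split) and carry the true lengths along, so the averaging can ignore the padding. The helper names `pad_sequences` and `masked_mean`, the toy embedding table, and the choice of 0 as padding index are assumptions made for illustration; none of this is code from the project, and NumPy stands in for the actual PyTorch network.

```python
import numpy as np


def pad_sequences(seqs, pad_value=0):
    """Pad lists of token ids into one 2-D ndarray and return the true lengths.

    Hypothetical helper; assumes at least one sequence is given.
    """
    lengths = np.array([len(s) for s in seqs], dtype=np.int64)
    padded = np.full((len(seqs), int(lengths.max())), pad_value, dtype=np.int64)
    for i, s in enumerate(seqs):
        padded[i, :len(s)] = s
    return padded, lengths


def masked_mean(embedded, lengths):
    """Average token vectors per sentence, ignoring padded positions.

    embedded: array of shape (batch, max_len, dim)
    lengths:  array of shape (batch,) with the unpadded sentence lengths
    """
    max_len = embedded.shape[1]
    # True where a position holds a real token, False where it is padding.
    mask = np.arange(max_len)[None, :] < lengths[:, None]
    summed = (embedded * mask[:, :, None]).sum(axis=1)
    # Guard against empty sentences to avoid division by zero.
    return summed / np.maximum(lengths, 1)[:, None]


# Toy data: two sentences of different length, as token ids.
sentences = [[1, 2, 3], [4, 5]]
# Toy embedding table with 6 rows of dimension 2 (row i is [2i, 2i + 1]).
embedding_table = np.arange(12, dtype=float).reshape(6, 2)

padded, lengths = pad_sequences(sentences)
embedded = embedding_table[padded]            # shape (2, 3, 2)
sentence_vectors = masked_mean(embedded, lengths)
```

With this layout `padded` is an ordinary ndarray, so in principle it could be handed to sklearn utilities that index or split X. The lengths would then either have to be recomputed from the padding index inside the network or stored as an extra column; which of the two fits the project better is an open question.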
Flair sadly comes with other drawbacks regarding our system.
Flair also seems to be very slow, at least for such large amounts of data. Nvidia Apex does NOT yield the improvement needed. After some investigation, the bottleneck seems to lie in the structure itself: our system needs pairs of (embedded) sentences and frames, but Flair requires each sentence to be wrapped in a "Sentence" object. The usage therefore has to be as follows: sentence -> Flair's Sentence object -> embedding -> taking the embeddings out of the Sentence object -> dropping the object. (Compare old repo #25)