It looks like each download comes with everything zipped, so I would create 4 user-facing functions. Let's prefix them with `embedding_`, so we get `embedding_glove6b()`, `embedding_glove42b()`, etc.
Ok, that sounds good. The downloads will be separate, but then I'll put a parameter in the dataset_ function to just load the appropriate sub-dataset (for 6b and 27b). I should have a PR for this within the next couple hours, depending on what other distractions come up.
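To make the sub-dataset parameter concrete, here is a minimal sketch of how a `dimensions` argument could select one file from the 6B archive. The helper name `glove6b_file()` is hypothetical and not part of textdata. The set of dimensions and the `glove.6B.<n>d.txt` naming are assumptions based on the contents of the Stanford zip.

```r
# Hypothetical helper (not a textdata function): map a requested embedding
# dimension to the file name inside the glove.6B archive.
# Assumes the archive contains glove.6B.50d.txt, glove.6B.100d.txt, etc.
glove6b_file <- function(dimensions = 50) {
  valid <- c(50, 100, 200, 300)
  if (length(dimensions) != 1 || !dimensions %in% valid) {
    stop(
      "`dimensions` must be one of: ", paste(valid, collapse = ", "),
      call. = FALSE
    )
  }
  paste0("glove.6B.", dimensions, "d.txt")
}

# Pick the 100-dimensional vectors
glove6b_file(100)
```

A user-facing `embedding_glove6b(dimensions = 100)` could then read only that file after the shared download step, so the 6b and 27b archives each need just one download.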
I'd like to add the GloVe pre-trained word vectors, for use in tidymodels/textrecipes#20
The datasets are available here: https://nlp.stanford.edu/projects/glove/
There are 4 downloads, which break down like this:
The first one is all I'm directly in need of right now, but it feels worthwhile to work out a standard for all of them while I'm at it.
I don't want to make the functions too complicated to understand, but it feels like maybe it should be one set of textdata functions (`download_glove`, `process_glove`, `dataset_glove`), with arguments about the specifics (something like `dataset_glove({normal stuff plus}, token_set, dimensions)`).

Let me know what you think and I can knock this out (I'm doing it anyway for personal/work use, so formalizing it won't be a lot of extra work).
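For reference, a rough sketch of what the processing step might look like, assuming the GloVe text format of one token per line followed by space-separated numeric values. `parse_glove_lines()` and the `d1`, `d2`, ... column names are made up for illustration. This is not how textdata actually implements `process_glove`.

```r
# Hypothetical parser (illustration only): turn raw GloVe text lines into a
# data frame with one row per token and one column per dimension.
# Assumes every line has the same number of space-separated values.
parse_glove_lines <- function(lines) {
  parts <- strsplit(lines, " ", fixed = TRUE)
  n_dims <- length(parts[[1]]) - 1

  # First field is the token, the rest are the vector components
  tokens <- vapply(parts, function(p) p[[1]], character(1))
  values <- vapply(parts, function(p) as.numeric(p[-1]), numeric(n_dims))

  # vapply() returns dimensions x tokens here; force a tokens x dimensions matrix
  values <- matrix(values, nrow = n_dims)
  values <- t(values)
  colnames(values) <- paste0("d", seq_len(n_dims))

  data.frame(token = tokens, values, stringsAsFactors = FALSE)
}

# Toy input with 3 dimensions instead of the real 50 to 300
parse_glove_lines(c("the 0.1 -0.2 0.3", "cat 1 2 3"))
```

For the real files, a faster reader would likely be needed, since the larger downloads hold millions of tokens.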