Abstract datasets intro base-classes #19
Force-pushed from f9d0166 to cf19a3a.
@ncmeade @vaibhavad This appears to work now. Please review when you have time.
I tried running

    pip3 install --no-index --find-links $HOME/python_wheels \
    'numpy>=1.19.0' 'tqdm>=4.53.0' 'torch>=1.7.0' 'pytorch-lightning>=1.0.0' \
    'spacy>=2.2.0' $HOME/python_wheels/en_core_web_sm-2.2.0.tar.gz 'torchtext>=0.6.0' \
    'scikit-learn>=0.23.0' 'nltk>=3.5' 'gensim>=3.8.0' 'pandas>=1.1.0'

and I get the requirements error:
It looks like newer versions of [...]

We can fix this issue on Beluga by either constraining [...] or [...]:

    pip3 download --no-deps 'fsspec[http]>=0.8.1' 'idna-ssl>=1.0'

I can't remember, but is there a reason why we don't pre-download all of the dependencies for [...]?
Constrain it to [...]
Some of the dependencies are already available via the shared wheel house. If we download all the dependencies, then I don't think it will use the shared wheel house. The shared wheel house is preferred because it is optimized for the CPU architecture, etc.
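For reference, the two pip invocations discussed above can be sketched as small Python helpers that only build the argument lists. The function names and the `python_wheels` directory are illustrative assumptions, not part of this repository; the pip flags (`--no-index`, `--find-links`, `--no-deps`, `--dest`) are standard pip options.

```python
import shlex


def download_command(missing, wheel_dir):
    """Build a `pip3 download` call for packages missing from the wheel house."""
    # --no-deps fetches only the named packages, so their dependencies can
    # still be resolved from the shared wheel house at install time.
    return ["pip3", "download", "--no-deps", "--dest", wheel_dir, *missing]


def offline_install_command(requirements, wheel_dir):
    """Build a `pip3 install` call that never contacts the package index."""
    # --no-index disables the index; --find-links adds the local directory
    # of pre-downloaded wheels as an additional source.
    return ["pip3", "install", "--no-index", "--find-links", wheel_dir, *requirements]


# Hypothetical usage with the two packages named in the comment above.
cmd = download_command(["fsspec[http]>=0.8.1", "idna-ssl>=1.0"], "python_wheels")
print(shlex.join(cmd))
```

Keeping the pre-downloaded set small is the point of the sketch: anything not listed explicitly is left for the shared wheel house to provide.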
Minor and unrelated to this PR: I think we should also change the constraint on [...]. This is only an issue when running the code on my local machine, as the latest version of [...]
@ncmeade Yes, it should just be [...]
Thanks for the fixes, the changes look good to me.
This looks good to merge to me. I ran the dataset pre-processing in a clean environment and trained baseline models for SST, IMDB, and Babi as a sanity check and everything looked reasonable.
I need the datasets to be subclasses for TorchScript support, and we will need this anyway if we add more datasets. Unfortunately, the cache will have to be rebuilt, because this changes some of the .pkl file formats and filenames.

Work in progress: Everything should work. I ran a few iterations locally and rebuilt the cache. I need to rerun experiments on Compute Canada to make sure it works completely.
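As a rough illustration of the refactoring described here, the sketch below shows one way datasets could share an abstract base class with a pickle cache. All names (`Dataset`, `ToyDataset`, `cache_version`, `_build`, `load`) are hypothetical and not taken from this PR. The cache filename carries a format version, which is one possible way to make sure an old-format `.pkl` file is never read after the format changes. It does not address TorchScript itself.

```python
import pickle
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class Dataset(ABC):
    """Hypothetical base class; not the project's actual API."""

    name: str           # set by each concrete subclass
    cache_version = 2   # assumption: bumped whenever the pickled format changes

    def __init__(self, cache_dir):
        self._cache_dir = Path(cache_dir)

    @abstractmethod
    def _build(self):
        """Preprocess the raw data and return a picklable object."""

    def _cache_path(self):
        # The version is part of the filename, so caches written in an
        # older format are ignored and rebuilt rather than misread.
        return self._cache_dir / f"{self.name}.v{self.cache_version}.pkl"

    def load(self):
        path = self._cache_path()
        if path.exists():
            with path.open("rb") as fp:
                return pickle.load(fp)
        data = self._build()
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("wb") as fp:
            pickle.dump(data, fp)
        return data


class ToyDataset(Dataset):
    """Stand-in for a real dataset such as SST or IMDB."""

    name = "toy"

    def _build(self):
        return [("a good movie", 1), ("a bad movie", 0)]


with tempfile.TemporaryDirectory() as tmp:
    dataset = ToyDataset(tmp)
    first = dataset.load()    # builds the data and writes toy.v2.pkl
    second = dataset.load()   # reads the cached file instead of rebuilding
    print(first == second)
```

Adding another dataset then only requires a new subclass that sets `name` and implements `_build`; the caching logic stays in one place.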