Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make cache dir configurable #67

Conversation

dhpollack
Copy link
Contributor

I would like to make the default cache dir configurable with an environmental variable. This is a simple PR to allow one to do that with the variable DACY_CACHE_DIR.

@KennethEnevoldsen
Copy link
Collaborator

Thanks @dhpollack, I will take a look the coming week

Copy link
Collaborator

@KennethEnevoldsen KennethEnevoldsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for the PR. Will merge it assuming all tests pass.

@dhpollack
Copy link
Contributor Author

@KennethEnevoldsen I am not really sure why this is failing CI. I don't have a windows machine so I can't really test it.

@KennethEnevoldsen
Copy link
Collaborator

@dhpollack I don't think it is your fix. It seems to me that there is a problem with downloading some of the sentiment models thus it is a problem with the dependencies or a server being down when the tests were run (i.e. there is a good chance that running these again or updating a link somewhere will fix the problem). Regardless, you don't need to fix anything I will have to take a closer look at this, it will likely be sometime this week. I Will let you know if there is a conflict with the current codebase.

@dhpollack
Copy link
Contributor Author

@dhpollack I don't think it is your fix. It seems to me that there is a problem with downloading some of the sentiment models thus it is a problem with the dependencies or a server being down when the tests were run (i.e. there is a good chance that running these again or updating a link somewhere will fix the problem). Regardless, you don't need to fix anything I will have to take a closer look at this, it will likely be sometime this week. I Will let you know if there is a conflict with the current codebase.

Got it. Yea, my colleague is having an issue with a bad SSL certificate today so maybe that is related. Thanks for the quick response.

@KennethEnevoldsen
Copy link
Collaborator

@dhpollack your colleague's problem might be fixable by setting open_unverified_connection=True. The default is False for security reasons. You can for example check the documentation of the emotion model for more on this.

@dhpollack
Copy link
Contributor Author

@dhpollack your colleague's problem might be fixable by setting open_unverified_connection=True. The default is False for security reasons. You can for example check the documentation of the emotion model for more on this.

Thanks, that option did actually solve the issue for us although if you are not using that on the CI, I would think it is something that should be fixed. Do you have an idea when the CI will work again and when we could merge this PR? Currently, I am using a bit of a hack with soft links to simulate this but it's really not a pretty solution.

@KennethEnevoldsen
Copy link
Collaborator

Extended this PR in another branch and it is thus moved to PR #70. The is a couple of good things I got to add here:

  1. The cache dir is now configurable! 🎉
  2. I have removed DaNLP as a dependency and now download their models directly from the Huggingface model hub where they were recently added. This has multiple benefits. For your colleague, it means that the open_unverified_connection argument is no longer necessary.
  3. I removed the readability module, if you or your colleagues use that I highly recommend the textdescriptives package developed @HLasse, and I

Turns out that the bug originated in the new version of spacy-transformers, I have added an issue on this #71, for now, I have now restricted to version numbers of packages to avoid breaking changes.

@dhpollack
Copy link
Contributor Author

@KennethEnevoldsen awesome work! Yea, we noticed the issue with the dependencies as well and pinned them ourselves. Also thanks for the help.

Just as a mini-fyi, instead of closing my PR and then opening another with a different target branch, you can click the "Edit" button at the PR and change the target merge branch. This has the advantage of keeping this discussion linked to the merge commit in the main branch and more importantly let's me have the glory of having a commit in the premiere Danish language NLP library. 😄

@KennethEnevoldsen
Copy link
Collaborator

The edits were a bit more extensive than what would have been possible in an edit. Had to go through a whole debugging session to figure out if the #71 issue was fixable.

However, credit is (extremely) important in open-science communities, therefore my dev branch used a PR (#69 ) from your branch to ensure that you are indeed listed as a contributor to DaCy 🎉:

Screenshot 2021-11-05 at 15 38 25

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants