Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support for unigram sentencepiece model #51

Merged
merged 36 commits into from
Jul 20, 2020

Conversation

tejasvaidhyadev
Copy link
Member

@tejasvaidhyadev tejasvaidhyadev commented Jul 4, 2020

As discussed in #44
I am adding support for the sentencepiece unigram model with DataDeps to download pre-trained vocab file with their log-likelihood
for more detail refer to the blog post here
Roadmap

  • DataDeps for Downloading Vocab or .protofile
  • unigram encoder algorithm implementation in Julia
  • Test and doc

@tejasvaidhyadev
Copy link
Member Author

  • Added Datadeps dependency for downloading vocabulary file
  • For now, I am hosting the vocabulary file on git.

@tejasvaidhyadev
Copy link
Member Author

Hi @Ayushk4,
should I add docs in Readme?

@Ayushk4
Copy link
Member

Ayushk4 commented Jul 13, 2020

Travis CI fails on Julia 1.0, due to new DataDeps as dependency - requires updating Project.toml.

@Ayushk4
Copy link
Member

Ayushk4 commented Jul 13, 2020

@oxinabox Should we drop support for 0.7 tests now in Appveyor?

@Ayushk4
Copy link
Member

Ayushk4 commented Jul 13, 2020

Since DataDeps have been newly added, Travis and Appveyor also need to be updated to allow DataDeps downloads
Check how to set variables for CI here -
https://github.com/JuliaText/CorpusLoaders.jl/blob/master/appveyor.yml#L2
https://github.com/JuliaText/CorpusLoaders.jl/blob/master/.travis.yml#L7

@oxinabox
Copy link
Member

@oxinabox Should we drop support for 0.7 tests now in Appveyor?

Yes

Copy link
Member

@Ayushk4 Ayushk4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please look into setting DATADEPS variable for Travis and Appveyor CI. Also remove 0.7 Julia tests for appveyor.

tejasvaidhyadev and others added 5 commits July 14, 2020 05:57
Co-authored-by: Ayush Kaushal <ayushk4@gmail.com>
Co-authored-by: Ayush Kaushal <ayushk4@gmail.com>
@tejasvaidhyadev tejasvaidhyadev requested a review from Ayushk4 July 14, 2020 01:29
tejasvaidhyadev and others added 5 commits July 16, 2020 19:37
Co-authored-by: Lyndon White <oxinabox@ucc.asn.au>
Co-authored-by: Lyndon White <oxinabox@ucc.asn.au>
Co-authored-by: Lyndon White <oxinabox@ucc.asn.au>
@tejasvaidhyadev tejasvaidhyadev requested a review from oxinabox July 16, 2020 21:20
@tejasvaidhyadev
Copy link
Member Author

Thank you.
The suggested changes have been made and I hope PR is ready for further review

Copy link
Member

@oxinabox oxinabox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool just a few last things this this should be good to go.

@Ayushk4 can merge once happy with it.

Copy link
Member

@Ayushk4 Ayushk4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@tejasvaidhyadev tejasvaidhyadev changed the title Adding support for unigram sentencepiece model [WIP] Adding support for unigram sentencepiece model Jul 20, 2020
@Ayushk4 Ayushk4 merged commit 09d20a0 into JuliaText:master Jul 20, 2020
@tejasvaidhyadev tejasvaidhyadev deleted the stats branch September 5, 2020 22:20
@tejasvaidhyadev tejasvaidhyadev restored the stats branch September 5, 2020 22:20
@tejasvaidhyadev tejasvaidhyadev deleted the stats branch September 5, 2020 22:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants