2022-01-25-word2vec_wiki_1000_fr #6818
|
@luca-martial |
|
@maziyarpanahi yes seems strange, but the outputs looked good on sample text when I tried it. This is what I did to save it, do you see any red flags? |
Thanks @luca-martial It seems fine if the file |
|
@maziyarpanahi No, the zip file on my machine is 4.0k, so I was just wondering whether the WordEmbeddings annotator was responsible for compressing the original binary so much. The unzipped version is 24.0k.
|
OK, so the Python make_archive part might be the problem. If the directory you saved the model |
|
So what I was trying to say is that before archiving, the results of: |
I can't say for sure why, but it definitely failed. It should be the exact same size. This is how it is done in Scala; you didn't set storageRef, but that shouldn't be a problem here:
val embeddings = new WordEmbeddings()
  .setStoragePath("src/test/resources/random_embeddings_dim4.txt", ReadAs.TEXT)
  .setStorageRef("glove_4d")
  .setDimension(4)
  .setInputCols("document", "token")
  .setOutputCol("embeddings")
My guess is that the .bin file might not be compatible.
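The size check discussed above (compare the saved model directory with what ends up in the archive) could be sketched in Python roughly as follows. This is only an illustration using the standard library, assuming the model was saved as a directory on the local filesystem. The helper names `dir_size`, `archive_model_dir` and `verify_archive` are hypothetical and not part of Spark NLP.

```python
import os
import shutil
import zipfile


def dir_size(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def archive_model_dir(model_dir, zip_base):
    """Zip the contents of model_dir into zip_base + '.zip'; return the zip path."""
    # An empty directory is a hint that the model was never actually written.
    if dir_size(model_dir) == 0:
        raise ValueError(f"{model_dir} contains no data; nothing to archive")
    # root_dir is the directory whose contents go into the archive.
    return shutil.make_archive(zip_base, "zip", root_dir=model_dir)


def verify_archive(model_dir, zip_path):
    """True if the uncompressed sizes recorded in the zip add up to the directory size."""
    with zipfile.ZipFile(zip_path) as zf:
        unpacked = sum(info.file_size for info in zf.infolist())
    return unpacked == dir_size(model_dir)
```

Write the zip outside `model_dir` (pass a `zip_base` in a different directory), otherwise the archive may end up including itself. If `verify_archive` returns False, the archive does not hold the same data as the saved directory.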
|
@maziyarpanahi The models were perfectly compatible; what Luca was saying is that they were working in Spark NLP but not being saved properly. The problem here was saving the model before fitting. After creating a pipeline, fitting and saving the model with |
|
Oh sorry, I missed the |
|
thanks guys, I'll close this PR and reupload now |
No description provided.