Pretrained dumps as download #4

malteos · 2019-04-25T12:42:03Z

Thanks for the amazing work!

Can you provide pretrained dumps as download? That would make it much easier to test everything locally without the training step.

wetneb · 2019-04-25T12:53:29Z

Yes! I thought I had done that, but now I remember that I got stuck on GitHub's file size limit. I will see if I can upload them to Zenodo.

wetneb · 2019-04-25T13:04:30Z

I have added the PageRank vector and the bag of words model here: https://github.com/wetneb/opentapioca/releases/tag/v0.1.0
Those are the most expensive parts to train. Indexing Wikidata in Solr should only take a few hours (if you restrict the item types, as the sample profiles do).
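Below is a minimal sketch for fetching these files from the command line. It assumes GitHub's usual release-asset URL layout (`/releases/download/<tag>/<asset>`). The asset file names are not listed in this thread, so the script takes them as arguments. The script and its function names are illustrative and are not part of OpenTapioca.

```shell
#!/usr/bin/env bash
# Sketch: download pretrained OpenTapioca files from a GitHub release.
# ASSUMPTION: asset file names are passed on the command line; check the
# release page for the real names before running this.
set -euo pipefail

REPO="wetneb/opentapioca"   # repository of the release linked above
TAG="${TAG:-v0.1.0}"        # release tag linked above; override via env

# GitHub serves release assets at
# https://github.com/<owner>/<repo>/releases/download/<tag>/<asset>
release_asset_url() {
  printf 'https://github.com/%s/releases/download/%s/%s\n' "$REPO" "$TAG" "$1"
}

download_assets() {
  local name
  for name in "$@"; do
    # -f: fail on HTTP errors, -L: follow the redirect to the storage host
    curl -fL -o "$name" "$(release_asset_url "$name")"
  done
}

# Download only when file names were given.
if [ "$#" -gt 0 ]; then
  download_assets "$@"
fi
```

For example, `bash fetch_release.sh <asset-name>` downloads one file into the current directory. Here `fetch_release.sh` is a hypothetical name for the script above. Set `TAG` if an asset is attached to a different release.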

LiuYuLOL · 2020-04-15T05:10:22Z

Hey, is it possible to provide the pre-trained classifier(s)? As far as I know, even with the two provided models I still need a dataset-specific classifier.

I'm just interested in using it on plain text, e.g. news articles or Wikipedia pages. Could you provide one classifier, e.g. the one used by the current web API? @wetneb (sorry for tagging you, but I wasn't sure whether you would get a notification from a plain comment.)

wetneb · 2020-04-15T06:43:50Z

Sure, I have added one to the release page.

LiuYuLOL · 2020-04-15T07:23:30Z

> Sure, I have added one to the release page.

Great! Thanks for the help. The model passed the tests and is now running in a web server at localhost:8457.
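Below is a minimal sketch for sending plain text to such a local server with curl. The `/api/annotate` endpoint and its `query` parameter are assumed to match the public OpenTapioca web API. This thread does not confirm them, so adjust both if your server differs. `OPENTAPIOCA_URL` and `annotate` are illustrative names.

```shell
#!/usr/bin/env bash
# Sketch: query a locally running OpenTapioca web server.
# ASSUMPTIONS: the server listens on localhost:8457 (as reported above) and
# exposes /api/annotate, which reads the text from a "query" parameter.
OPENTAPIOCA_URL="${OPENTAPIOCA_URL:-http://localhost:8457}"

annotate() {
  # -G sends the --data-urlencode value as a URL-encoded GET query string;
  # -s hides the progress meter, -f makes HTTP errors return non-zero.
  curl -sfG --data-urlencode "query=$1" "$OPENTAPIOCA_URL/api/annotate"
}
```

For example, `annotate "Douglas Adams was born in Cambridge."` prints the raw response body from the server, which is presumably JSON.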

eracle · 2022-08-09T18:38:52Z

Thanks, I will include them in the Docker build process! Are there any other files that could speed it up?

At the moment the build runs the following steps, which I would like to replace with downloadable artifacts:

Training BOW
Preprocessing
Creating Wikidata graph
Compiling Wikidata graph
Computing Pagerank
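One way to wire downloadable artifacts into such a build is to make each expensive step depend on whether its output file is missing. The sketch below assumes the prebuilt files are copied or downloaded into the image first. The helper name, file names and commands in it are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch for a Docker build script: run an expensive step only when its
# output file is missing, e.g. because no prebuilt copy was downloaded.
build_if_missing() {
  local artifact="$1"
  shift
  if [ -s "$artifact" ]; then
    # -s: the file exists and is non-empty, so reuse it
    echo "skip: $artifact already present"
  else
    # Otherwise run the remaining arguments as the build command
    "$@"
  fi
}
```

For example, `build_if_missing pagerank.npy ./compute_pagerank.sh` would skip a (hypothetical) PageRank script whenever a prebuilt `pagerank.npy` is already present.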

senisioi · 2022-10-24T09:48:19Z

@wetneb do you remember which sklearn version you used to pickle those models?
Loading sample_classifier.pkl fails with sklearn==1.1.2.

eracle · 2022-10-24T12:16:12Z

If you check my branch, the working version is pinned there:

https://github.com/eracle/opentapioca

For reference, it is scikit-learn==0.20.3.
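Before loading the released pickles, you can check the scikit-learn version up front instead of hitting an unpickling error later. Below is a minimal sketch that assumes the version pinned above. The `PYTHON` variable and the function name are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: fail early when the active Python environment does not provide the
# scikit-learn version pinned above. PYTHON may name another interpreter
# (default: python3).
REQUIRED_SKLEARN="0.20.3"

check_sklearn_version() {
  local found
  # The assignment's exit status is that of the command substitution.
  if ! found="$("${PYTHON:-python3}" -c 'import sklearn; print(sklearn.__version__)' 2>/dev/null)"; then
    echo "scikit-learn is not importable with ${PYTHON:-python3}" >&2
    return 1
  fi
  if [ "$found" != "$REQUIRED_SKLEARN" ]; then
    echo "found scikit-learn $found, expected $REQUIRED_SKLEARN" >&2
    echo "try: pip install scikit-learn==$REQUIRED_SKLEARN" >&2
    return 1
  fi
  echo "scikit-learn $found OK"
}
```

For example: `python3 -m venv .venv && . .venv/bin/activate && pip install scikit-learn==0.20.3 && check_sklearn_version`. Because 0.20.3 is a release from 2019, it may need an older Python interpreter than the one your system provides.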

ziodave · 2023-01-14T16:09:00Z

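You can convert the sample classifier file for the latest sklearn (1.2) by running a couple of binary search/replace commands with bbe. bbe writes to standard output unless `-o` is given, so both substitutions go into one call that writes a new file:

```shell
bbe -e "s/sklearn.preprocessing.data/sklearn.preprocessing/g" \
    -e "s/sklearn.svm.classes/sklearn.svm/g" \
    -o sample_classifier_new.pkl sample_classifier.pkl
```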
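After patching, a quick check that the old module paths are really gone can save you a confusing unpickling error. Below is a minimal sketch that uses `grep -a` (treat binary data as text) and `-F` (fixed strings, so the dots are literal). The helper name is hypothetical.

A caveat applies to this kind of in-place rename with strings of different lengths. It is only safe when the pickle stores class references as newline-terminated text, which is the case for pickle protocols 0 to 3. Protocol 4 and later store them as length-prefixed strings, so a byte-level edit like this would corrupt the file.

```shell
#!/usr/bin/env bash
# Sketch: report whether a pickle still references the module paths that
# the bbe substitutions above are meant to rename.
OLD_PATHS=(sklearn.preprocessing.data sklearn.svm.classes)

check_old_paths() {
  local file="$1" path status=0
  for path in "${OLD_PATHS[@]}"; do
    # -a: search binary data as text, -q: quiet, -F: literal string
    if grep -aqF "$path" "$file"; then
      echo "still references $path"
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_old_paths sample_classifier_new.pkl` prints nothing and returns 0 when both renames were applied.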

banyous · 2024-04-03T00:37:50Z

Hi all,

It seems that the pretrained files in this release are no longer compatible with recent versions of scikit-learn (>1.3), and the workaround @ziodave suggested doesn't work for me.

Is there a plan, or any possibility, to publish updated pretrained files that are compatible with recent versions of scikit-learn?

Thanks!

wetneb · 2024-04-03T06:37:00Z

I'm not working on this project at the moment, but I would be happy to publish models if someone could train them :)

ziodave mentioned this issue on Jan 14, 2023 in "opentapioca long-due contributions" #53 (open).