Pretrained dumps as download #4

Open
malteos opened this issue Apr 25, 2019 · 11 comments

@malteos

malteos commented Apr 25, 2019

Thanks for the amazing work!

Can you provide pretrained dumps as download? That would make it much easier to test everything locally without the training step.

@wetneb
Member

wetneb commented Apr 25, 2019

Yes! I thought I had done that, but now I remember that I got stuck on GitHub's file size limit. I will see if I can upload the files to Zenodo.

@wetneb
Member

wetneb commented Apr 25, 2019

I have added the PageRank vector and the bag of words model here: https://github.com/wetneb/opentapioca/releases/tag/v0.1.0
Those are the most expensive parts to train - indexing Wikidata in Solr should only take a few hours (if you restrict the types of the items as the sample profiles do).

@LiuYuLOL

LiuYuLOL commented Apr 15, 2020

Hey, is it possible to provide the pre-trained classifier(s)? As far as I know, even with the two provided models I still need a dataset-specific classifier.

I'm just interested in using it on plain text, e.g. news articles or Wikipedia pages. Could you provide one classifier, e.g. the one used in the current web API? @wetneb (sorry for the mention, but I'm not sure whether you get a notification if I just write a comment.)

@wetneb
Member

wetneb commented Apr 15, 2020

Sure, I have added one to the release page.

@LiuYuLOL

LiuYuLOL commented Apr 15, 2020

> Sure, I have added one to the release page.

Great! Thanks for the help. The model passed the test and is now running as a web server at localhost:8457.

@eracle
Collaborator

eracle commented Aug 9, 2022

Thanks, I will include them in the Docker build process! Are there any other files that could speed it up?

At the moment I have the following steps that I would like to replace with a downloadable artefact:

  • Training BOW
  • Preprocessing
  • Creating Wikidata graph
  • Compiling Wikidata graph
  • Computing PageRank

@senisioi

@wetneb do you remember which sklearn version you used to pickle those models?
Loading sample_classifier.pkl is not possible with sklearn==1.1.2
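
One way to see which module paths a pickle such as sample_classifier.pkl refers to, without importing scikit-learn at all, is to walk its opcodes with the standard library's pickletools. The sketch below is only a diagnostic: the helper name `referenced_globals` is made up for illustration, and its handling of `STACK_GLOBAL` is a heuristic that can miss names fetched from the pickle memo.

```python
import pickletools


def referenced_globals(data):
    """Return the set of (module, name) pairs a pickle byte string refers to.

    Nothing is unpickled or imported; the opcodes are only inspected.
    """
    found = set()
    recent_strings = []  # string arguments seen so far, used for STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Protocols 0-3: pickletools reports the argument as "module name".
            module, name = arg.split(" ", 1)
            found.add((module, name))
        elif opcode.name == "STACK_GLOBAL":
            # Protocol 4+: module and name were pushed as the two preceding
            # strings (heuristic; memoized names are not resolved here).
            if len(recent_strings) >= 2:
                found.add((recent_strings[-2], recent_strings[-1]))
        elif isinstance(arg, str):
            recent_strings.append(arg)
    return found
```

Assuming the file sits in the current directory, `sorted(referenced_globals(open("sample_classifier.pkl", "rb").read()))` would list the classes it expects, which should show whether old paths such as `sklearn.svm.classes` are the ones failing to import.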

@eracle
Collaborator

eracle commented Oct 24, 2022

If you check my branch, the working version is pinned there:

https://github.com/eracle/opentapioca

By the way, it is scikit-learn==0.20.3.
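
For anyone scripting around this pin, a small guard can fail fast before an unpickling attempt rather than surfacing a confusing import error later. This is a minimal sketch, not part of OpenTapioca: the function names are hypothetical, and the only value taken from the thread is the 0.20.3 pin.

```python
from importlib import metadata

# Version quoted in this thread as the one the sample classifier works with.
PINNED_SKLEARN = "0.20.3"


def sklearn_matches_pin(installed, pinned=PINNED_SKLEARN):
    """Return True when the installed version string equals the pinned one."""
    return installed.strip() == pinned


def installed_sklearn_version():
    """Return the installed scikit-learn version, or None if it is absent."""
    try:
        return metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        return None
```

A caller could check `installed_sklearn_version()` and refuse to load the pickle when it is `None` or does not satisfy `sklearn_matches_pin`.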

@ziodave
Contributor

ziodave commented Jan 14, 2023

You can convert the sample classifier file to work with the latest sklearn (1.2) by running a couple of bbe (binary search/replace) commands:

  1. bbe -e "s/sklearn.preprocessing.data/sklearn.preprocessing/g"
  2. bbe -e "s/sklearn.svm.classes/sklearn.svm/g" sample_classifier.pkl
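
The same two renames can also be applied at load time in Python by overriding `pickle.Unpickler.find_class`, which leaves the file's bytes untouched (a byte-level replace that changes a name's length can corrupt pickles written with protocol 4 or later, where names carry a length prefix). This is a sketch under assumptions: the class and function names are made up for illustration, and it only handles moved modules. It does not fix attributes that changed between scikit-learn versions, so loading may still fail on newer releases.

```python
import io
import pickle

# Old module path -> new module path, taken from the two bbe substitutions above.
MODULE_RENAMES = {
    "sklearn.preprocessing.data": "sklearn.preprocessing",
    "sklearn.svm.classes": "sklearn.svm",
}


class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that redirects lookups for modules that have moved."""

    def __init__(self, file, renames=None):
        super().__init__(file)
        self._renames = MODULE_RENAMES if renames is None else renames

    def find_class(self, module, name):
        # Swap in the new module path if this one is known to have moved.
        return super().find_class(self._renames.get(module, module), name)


def load_with_renames(data, renames=None):
    """Unpickle a byte string, applying the module renames on the fly."""
    return RenamingUnpickler(io.BytesIO(data), renames).load()
```

With the file in the current directory, usage would look like `RenamingUnpickler(open("sample_classifier.pkl", "rb")).load()`. As with any pickle, only load files from a source you trust.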

@banyous

banyous commented Apr 3, 2024

Hi all,

It seems that the pretrained files in this release are no longer compatible with recent versions of scikit-learn (>1.3), and the solution @ziodave suggested doesn't work for me.

Is there a plan, or any possibility, to publish updated pretrained files that are compatible with recent versions of scikit-learn?

Thanks!

@wetneb
Member

wetneb commented Apr 3, 2024

I'm not working on this project at the moment but I would be happy to publish models if someone could train them :)
