Replicate 20 newsgroup classification w/ scikit-learn + Anserini #1120

Closed
lintool opened this issue Apr 26, 2020 · 3 comments

Comments

@lintool
Member

lintool commented Apr 26, 2020

Let's try to replicate the scikit-learn 20 newsgroups classification tutorial using Anserini:
https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html

That is, use Anserini to extract the tf-idf vectors that feed the classifiers in scikit-learn.
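To make the target concrete, here is a minimal sketch of the kind of tf-idf vectors a classifier would consume. It is plain Python, not Anserini code: the function name `tfidf_vectors` and the toy documents are made up for illustration, and the weighting follows the smoothed idf plus L2 normalization that scikit-learn's `TfidfTransformer` uses by default. The weighting Anserini actually produces may differ.

```python
import math
from collections import Counter


def tfidf_vectors(tokenized_docs):
    """Return one {term: weight} dict per document, L2-normalized.

    Hypothetical helper for illustration; not part of Anserini or scikit-learn.
    """
    n_docs = len(tokenized_docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    # Smoothed idf: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}

    vectors = []
    for doc in tokenized_docs:
        tf = Counter(doc)
        raw = {t: count * idf[t] for t, count in tf.items()}
        # L2-normalize so that every non-empty document has unit length.
        norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0
        vectors.append({t: w / norm for t, w in raw.items()})
    return vectors


# Toy, already-tokenized documents (made up for this example).
docs = [
    ["hockey", "game", "team"],
    ["space", "orbit", "launch"],
    ["hockey", "team", "season"],
]
vectors = tfidf_vectors(docs)
```

Per-document dictionaries like these could then be turned into a sparse feature matrix with scikit-learn's `DictVectorizer` and passed to any of the classifiers used in the tutorial.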

I can guide whoever is interested in taking this on step by step. The first step is to write a Collection class that indexes 20 newsgroups in its original raw format.
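As a rough illustration of what reading the raw format involves: each file in the raw 20 newsgroups distribution is a Usenet post, with header lines, a blank line, and then the body. The sketch below parses one such post with Python's standard `email` module. It is not the Anserini Collection class (which would be written in Java); the sample message, the function name `parse_newsgroup_post`, and the output field names are assumptions made for this example.

```python
from email import message_from_string

# A made-up post in the header / blank line / body layout of the raw files.
RAW_MESSAGE = """\
From: someone@example.com (Example Poster)
Subject: Re: Playoff predictions
Newsgroups: rec.sport.hockey
Lines: 2

I think the home team wins in six.
See you at the rink.
"""


def parse_newsgroup_post(raw_text, doc_id):
    """Split one raw post into a small document record.

    Hypothetical helper; the field names are illustrative only.
    """
    msg = message_from_string(raw_text)
    return {
        "id": doc_id,
        "subject": msg.get("Subject", ""),
        "sender": msg.get("From", ""),
        # For a non-multipart message, the payload is the body text.
        "contents": msg.get_payload(),
    }


doc = parse_newsgroup_post(RAW_MESSAGE, doc_id="rec.sport.hockey/0001")
```

Real posts may use legacy character encodings or malformed headers, so an actual implementation would likely need more defensive reading than this sketch shows.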

@yuki617
Member

yuki617 commented Apr 30, 2020

Currently working on this.

@yuki617
Member

yuki617 commented May 11, 2020

Here is the link to the Jupyter notebook that replicates the 20 newsgroups classification: https://github.com/yuki617/anserini/blob/tfidf/20newgroup_replication.ipynb

@lintool
Member Author

lintool commented May 11, 2020

Closing this issue and moving over to castorini/pyserini#99.

The Anserini side has been completed; the rest is to be done on the Pyserini end.

@lintool lintool closed this as completed May 11, 2020