
Address benchmark inconsistencies in Annoy tutorial #1105 #1113

Merged — 4 commits, Jan 29, 2017

Conversation

@droudy (Contributor) commented Jan 29, 2017

Issue #1105

Uses the average query time of 1000 random queries instead of a single query, and includes a "dry run" before timing begins. Also fixes a discrepancy where a comment says the vector for "army" is being retrieved when the word is actually "science". Benchmarks were run on a 2.4 GHz 4-core i7 processor.
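The averaging approach described above can be sketched as follows. This is a stdlib-only illustration, not the notebook's actual code; `query_fn` stands in for whichever similarity lookup (gensim or Annoy) is being timed:

```python
import random
import time

def average_query_time(query_fn, vocab, n_queries=1000):
    """Average wall-clock seconds per call of query_fn over random words."""
    words = [random.choice(vocab) for _ in range(n_queries)]
    query_fn(words[0])  # dry run: warm caches and any lazy initialization
    start = time.perf_counter()
    for w in words:
        query_fn(w)
    return (time.perf_counter() - start) / n_queries
```

Timing both backends with the same word sample and dividing the two averages yields the speedup figure quoted below.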

Gensim: 0.007451029
Annoy: 0.002149934

Annoy is 3.46570127269 times faster on average over 1000 random queries
@piskvorky (Owner):
The focus and emphasis on such a level of precision is misleading (and unnecessary).

Also, please mention the other factors that affect this number, like index size etc., so people don't go away thinking "Annoy is ~3.5x faster than gensim", whereas in reality it can be anything between 1x and infinity.

@droudy (Contributor, Author):

@piskvorky Should I round to fewer decimal places or leave the exact figure out completely?

@piskvorky (Owner) commented Jan 29, 2017:

I'd say round to fewer decimal places, plus include a fat disclaimer that this number is by no means "constant" :)

It's completely incidental to this dataset, BLAS setup, Annoy parameters, etc. The algorithms have fundamentally different complexity characteristics.

('terrorism,', 0.6300898194313049)
('creditors', 0.6264415979385376)
('signature', 0.5921074748039246)
('"dangerously', 0.5920691192150116)
@piskvorky (Owner) commented Jan 29, 2017:

This looks like bad preprocessing. Any reason not to simply use utils.simple_preprocess?
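For context, `gensim.utils.simple_preprocess` lowercases text and keeps only alphabetic tokens within a length range (defaults: 2–15 characters), which would strip the stray punctuation seen in tokens like `"dangerously` above. A rough stdlib-only approximation of that behavior (the real function lives in `gensim.utils` and can also strip accents):

```python
import re

def simple_preprocess_approx(doc, min_len=2, max_len=15):
    """Rough stand-in for gensim.utils.simple_preprocess:
    lowercase, keep alphabetic tokens of moderate length."""
    tokens = re.findall(r"[a-z]+", doc.lower())
    return [t for t in tokens if min_len <= len(t) <= max_len]
```

With this kind of tokenization, an input like `'"dangerously close'` yields clean tokens rather than a quote-prefixed one.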

@tmylk tmylk merged commit 6ece162 into piskvorky:develop Jan 29, 2017
@piskvorky (Owner) commented Jan 30, 2017

This doesn't look right -- I still see `"dangerously` as a token in the notebook, which should never happen with simple_preprocess.

EDIT: disregard, github was showing me only partial changes. Thanks for the fixes 👍
