Fixed documentation.

1 parent 0c1ad40 commit 0188ef2126d888e866e6e7fbbd55bb784976f949 @alexksikes committed Oct 18, 2012
@@ -4,7 +4,7 @@ For example, for the query with the two animated movies, ["Lilo & Stitch" and "U
This module also adds the novel ability to combine full text queries with items. For example a query can be a combination of items and full text search keywords. In this case the results match the keywords and are re-ranked by similary to the queried items.
-It is important to note that Bayesian Sets does not care about how the actual [feature][3] engineering. As an example SimSearch implements a simple [bag of words][4] model. However any other feature binary features are possible. In this case you will need to create the index directly. The index is a set of files in a .xco and .yco format (more in the [tutorial][6]) that represents the presence of a feature value in a given item. So as long as you can create these files, SimSearch can read them and perform the matching.
+It is important to note that Bayesian Sets does not care about the actual [feature][3] engineering. For example SimSearch implements a simple [bag of words][4] model. However other feature types are possible as long as they can be binarized. In this case you will need to create the index directly. The index is a set of files in a .xco and .yco format (more in the [tutorial][6]) that represents the presence of a feature value in a given item. So as long as you can create these files, SimSearch can read them and perform the matching.
SimSearch has been [tested][5] on datasets with millions of documents and hundreds of thousands of features. Future plans include distributed search and real time indexing. For more information, feel free please to follow the [tutorial][6].
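Note on the hunk above: the README says any feature type works as long as it can be binarized. A minimal sketch of what binarizing a bag-of-words model might look like follows. The function name `binarize_bag_of_words` and the in-memory layout (one list of feature ids per item) are hypothetical illustrations, not SimSearch's API, and the sketch does not produce the .xco/.yco files, whose format is described in the tutorial.

```python
def binarize_bag_of_words(docs):
    """Hypothetical helper: record which words are present in each document.

    Returns (vocab, rows) where vocab maps word -> feature id and rows[i]
    is the sorted list of feature ids present in document i. Word counts
    are discarded on purpose: a binary feature only says present or absent.
    """
    vocab = {}
    rows = []
    for text in docs:
        present = set()
        for word in text.lower().split():
            # Assign the next free feature id the first time a word is seen.
            feature_id = vocab.setdefault(word, len(vocab))
            present.add(feature_id)
        rows.append(sorted(present))
    return vocab, rows


vocab, rows = binarize_bag_of_words(["lilo stitch lilo", "up stitch"])
```

Here "lilo" appears twice in the first document but contributes a single feature, which is the sense in which the features are binary.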
@@ -132,7 +132,7 @@ def _SetupSphinxClient(self, item_ids, log_scores):
self.sphinx_setup(self.wrap_cl)
def _AddStats(self, sphinx_results, item_ids):
- scores = self._GetDetailedScores(sphinx_results['ids'], item_ids)
+ scores = self._GetDetailedScores([match['id'] for match in sphinx_results['matches']], item_ids)
for scores, match in zip(scores, sphinx_results['matches']):
match['attrs']['@sim_scores'] = scores
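Note on the hunk above: the fix reads document ids from each entry of `sphinx_results['matches']` rather than from a top-level `'ids'` key. A minimal standalone sketch of that extraction follows, assuming a result dictionary shaped like the one in the diff (a `'matches'` list whose entries carry `'id'` and `'attrs'`). The helper names `matched_ids` and `attach_scores` and the sample scores are hypothetical, not part of SimSearch.

```python
def matched_ids(sphinx_results):
    """Collect the document id of every match, preserving result order."""
    return [match['id'] for match in sphinx_results['matches']]


def attach_scores(sphinx_results, all_scores):
    """Store one score entry per match under the '@sim_scores' attribute.

    A distinct loop variable (match_scores) is used so the list being
    iterated is not shadowed inside the loop.
    """
    for match_scores, match in zip(all_scores, sphinx_results['matches']):
        match['attrs']['@sim_scores'] = match_scores


# Made-up result with two matches, for illustration only.
results = {
    'matches': [
        {'id': 11, 'weight': 1, 'attrs': {}},
        {'id': 42, 'weight': 1, 'attrs': {}},
    ]
}
ids = matched_ids(results)
attach_scores(results, [[0.5], [0.25]])  # hypothetical scores
```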
@@ -160,13 +160,13 @@ First you need to install [Sphinx][2] and [fSphinx][3].
After you have installed Sphinx, let it index data (assuming Sphinx indexer is in /user/local/sphinx/):
- /usr/local/sphinx/bin/indexer -c ./config/indexer.conf --all
+ /usr/local/sphinx/bin/indexer -c ./config/sphinx_indexer.conf --all
And now let searchd serve the index:
- /usr/local/sphinx/bin/searchd -c ./config/indexer.conf
+ /usr/local/sphinx/bin/searchd -c ./config/sphinx_indexer.conf
-Note that the "indexer.conf" must have an attribute called "log_scores_attr" set to 1 and declared as a float.
+Note that the "sphinx_indexer.conf" must have an attribute called "log_scores_attr" set to 1 and declared as a float.
# log_score_attr must be set to 1
sql_query = \
File renamed without changes.
