Fixed documentation.
alexksikes committed Oct 18, 2012
1 parent 0c1ad40 commit 0188ef2
Showing 4 changed files with 5 additions and 5 deletions.
README.md: 2 changes (1 addition, 1 deletion)
@@ -4,7 +4,7 @@ For example, for the query with the two animated movies, ["Lilo & Stitch" and "U

This module also adds the novel ability to combine full text queries with items. For example a query can be a combination of items and full text search keywords. In this case the results match the keywords and are re-ranked by similary to the queried items.

-It is important to note that Bayesian Sets does not care about how the actual [feature][3] engineering. As an example SimSearch implements a simple [bag of words][4] model. However any other feature binary features are possible. In this case you will need to create the index directly. The index is a set of files in a .xco and .yco format (more in the [tutorial][6]) that represents the presence of a feature value in a given item. So as long as you can create these files, SimSearch can read them and perform the matching.
+It is important to note that Bayesian Sets does not care about the actual [feature][3] engineering. For example SimSearch implements a simple [bag of words][4] model. However other feature types are possible as long as they can be binarized. In this case you will need to create the index directly. The index is a set of files in a .xco and .yco format (more in the [tutorial][6]) that represents the presence of a feature value in a given item. So as long as you can create these files, SimSearch can read them and perform the matching.

SimSearch has been [tested][5] on datasets with millions of documents and hundreds of thousands of features. Future plans include distributed search and real time indexing. For more information, feel free please to follow the [tutorial][6].
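
The README text says Bayesian Sets only needs binary features. As a rough illustration (not SimSearch's own implementation, which scores through Sphinx), here is a pure-Python sketch of the log score from the original Bayesian Sets paper, computed over a hypothetical in-memory index: a dict mapping each item id to the set of feature ids it contains. This stand-in is an assumption and is not the .xco/.yco format; `bayesian_sets_scores` and the prior strength `kappa` (with `alpha_j = kappa * m_j`, `beta_j = kappa * (1 - m_j)`, `m_j` the fraction of items having feature j) are illustrative names.

```python
import math
from collections import Counter


def bayesian_sets_scores(items, query_ids, kappa=2.0):
    """Return {item_id: log Bayesian Sets score} for sparse binary items.

    items: hypothetical stand-in for the index, mapping item id -> set of
           feature ids present in that item (NOT the .xco/.yco format).
    query_ids: ids of the queried items.
    kappa: prior strength; alpha_j = kappa * m_j, beta_j = kappa * (1 - m_j).
    """
    n_items = len(items)
    # Number of items in the whole collection that contain each feature.
    df = Counter(f for feats in items.values() for f in feats)
    # Number of queried items that contain each feature.
    n_query = len(query_ids)
    s = Counter(f for i in query_ids for f in items[i])

    const = 0.0
    q = {}
    for f, count in df.items():
        if count == n_items:
            # A feature present in every item adds 0 to every score in the
            # limit m_j -> 1 (and beta_j would be 0), so it is skipped.
            continue
        m = count / n_items
        alpha, beta = kappa * m, kappa * (1.0 - m)
        alpha_t = alpha + s[f]             # alpha_j + sum of x_ij over the query
        beta_t = beta + n_query - s[f]     # beta_j + N - sum of x_ij
        const += (math.log(alpha + beta) - math.log(alpha + beta + n_query)
                  + math.log(beta_t) - math.log(beta))
        q[f] = (math.log(alpha_t) - math.log(alpha)
                - math.log(beta_t) + math.log(beta))

    # Sparse dot product: only the features an item actually has contribute.
    return {i: const + sum(q.get(f, 0.0) for f in feats)
            for i, feats in items.items()}


items = {"a": {"x", "y"}, "b": {"x", "y"}, "c": {"x"}, "d": {"z", "w"}}
scores = bayesian_sets_scores(items, ["a", "b"])
print(sorted(scores, key=scores.get, reverse=True))
```

Each item's score is a constant plus the sum of `q_j` over the features it contains, so scoring stays a sparse dot product whatever produced the binary features.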

simsearch/simsphinx.py: 2 changes (1 addition, 1 deletion)
@@ -132,7 +132,7 @@ def _SetupSphinxClient(self, item_ids, log_scores):
self.sphinx_setup(self.wrap_cl)

     def _AddStats(self, sphinx_results, item_ids):
-        scores = self._GetDetailedScores(sphinx_results['ids'], item_ids)
+        scores = self._GetDetailedScores([match['id'] for match in sphinx_results['matches']], item_ids)
         for scores, match in zip(scores, sphinx_results['matches']):
             match['attrs']['@sim_scores'] = scores
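
The patched `_AddStats` takes document ids from `sphinx_results['matches']` instead of `sphinx_results['ids']`. Assuming results shaped like those of the standard Sphinx Python client (`sphinxapi.py`), where each entry of `'matches'` is a dict with `'id'`, `'weight'` and `'attrs'` keys, a standalone sketch of the same logic might look like this. `attach_sim_scores` and the injected `get_detailed_scores` callable are hypothetical stand-ins for the method and for `self._GetDetailedScores`, which is assumed to return one entry per id in input order.

```python
def attach_sim_scores(sphinx_results, get_detailed_scores, item_ids):
    """Attach per-match detailed scores, mirroring the patched _AddStats.

    sphinx_results: dict assumed to be shaped like a sphinxapi.py result,
        with 'matches' a list of {'id': ..., 'weight': ..., 'attrs': {...}}.
    get_detailed_scores: hypothetical stand-in for self._GetDetailedScores;
        assumed to return one score entry per document id, in the same order.
    """
    matches = sphinx_results['matches']
    # Take the ids from the matches themselves so ids and matches stay aligned.
    match_ids = [match['id'] for match in matches]
    scores = get_detailed_scores(match_ids, item_ids)
    for score, match in zip(scores, matches):
        match['attrs']['@sim_scores'] = score
    return sphinx_results


results = {'matches': [{'id': 7, 'weight': 1, 'attrs': {}},
                       {'id': 3, 'weight': 1, 'attrs': {}}]}
fake_scores = lambda ids, items: [('detail', i, tuple(items)) for i in ids]
attach_sim_scores(results, fake_scores, [42])
```

Deriving the ids from the same list that is later zipped keeps each score paired with its own match; the sketch also uses a separate loop variable (`score`) so the list being iterated is not rebound.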

tutorial/README.md: 6 changes (3 additions, 3 deletions)
@@ -160,13 +160,13 @@ First you need to install [Sphinx][2] and [fSphinx][3].

After you have installed Sphinx, let it index data (assuming Sphinx indexer is in /user/local/sphinx/):

-/usr/local/sphinx/bin/indexer -c ./config/indexer.conf --all
+/usr/local/sphinx/bin/indexer -c ./config/sphinx_indexer.conf --all

And now let searchd serve the index:

-/usr/local/sphinx/bin/searchd -c ./config/indexer.conf
+/usr/local/sphinx/bin/searchd -c ./config/sphinx_indexer.conf

-Note that the "indexer.conf" must have an attribute called "log_scores_attr" set to 1 and declared as a float.
+Note that the "sphinx_indexer.conf" must have an attribute called "log_scores_attr" set to 1 and declared as a float.

# log_score_attr must be set to 1
sql_query = \
File renamed without changes.
