This function builds on the functions in this repository and the Go functions in aih/bills.
Assumptions:
Each 'document' consists of an array of strings. The document has a unique id and each item in the array is also uniquely identified (either by an id or its ordinal position in the array).
The length of the array may vary from document to document.
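As a rough illustration of these assumptions, the sketch below models a document as an identified list of items. The names `Item`, `Document`, `item_id`, and `item_keys` are hypothetical and do not come from this repository or from aih/bills; the fallback to ordinal position is one possible reading of "either by an id or its ordinal position".

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Item:
    text: str
    item_id: Optional[str] = None  # hypothetical: explicit id, when one exists


@dataclass
class Document:
    doc_id: str  # unique across the corpus
    items: List[Item] = field(default_factory=list)

    def item_keys(self) -> List[Tuple[str, str]]:
        """Return a unique (doc_id, item key) pair for every item.

        Falls back to the ordinal position when an item has no explicit id.
        """
        return [
            (self.doc_id, item.item_id if item.item_id is not None else str(pos))
            for pos, item in enumerate(self.items)
        ]


# Two documents of different lengths, as the assumptions allow.
corpus = [
    Document("doc-a", [Item("Short title."), Item("Definitions.", item_id="s2")]),
    Document("doc-b", [Item("Short title.")]),
]
```

Keying every item by a `(doc_id, item key)` pair keeps item-level results traceable to their document regardless of how long each array is.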
The generic similarity functions would:
1. Calculate a vocabulary of n-grams from the total corpus of documents (an array of documents).
2. Vectorize the documents so that each document can be stored as a (sparse) array whose length equals the size of the vocabulary.
3. Store the vectorized matrix of all documents (MOD) in a pickle file (or eventually in PostgreSQL).
4. Calculate the similarity between each item of each array and all other items in the MOD.
5. Apply an item threshold to find similar items for each item in a document.
6. Apply a document threshold to find similar documents.
7. Return the results of steps 5 and 6 in a model form that can be stored in a database (item-to-item and document-to-document similarity).
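The steps above could be sketched roughly as follows, using only the Python standard library. Everything here is an assumption made for illustration: the function names, word bigrams as the n-grams, cosine similarity as the measure, both threshold values, and the document score (the share of items that have at least one match in the other document). The issue does not specify any of these, and storage (pickle or PostgreSQL) is left out.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

ItemKey = Tuple[str, int]  # (document id, ordinal position of the item)
Corpus = Dict[str, List[str]]  # document id -> array of strings


def ngrams(text: str, n: int = 2) -> List[str]:
    """Word n-grams of a string (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vocabulary(corpus: Corpus, n: int = 2) -> Dict[str, int]:
    """Map every n-gram found in the corpus to a column index."""
    grams = sorted({g for items in corpus.values() for t in items for g in ngrams(t, n)})
    return {g: i for i, g in enumerate(grams)}


def vectorize(text: str, vocab: Dict[str, int], n: int = 2) -> Dict[int, int]:
    """Sparse count vector stored as {column index: count}."""
    counts = Counter(ngrams(text, n))
    return {vocab[g]: c for g, c in counts.items() if g in vocab}


def cosine(a: Dict[int, int], b: Dict[int, int]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_items(corpus: Corpus, item_threshold: float = 0.5,
                  n: int = 2) -> Dict[Tuple[ItemKey, ItemKey], float]:
    """Compare every item with all other items and keep pairs at or above the threshold."""
    vocab = build_vocabulary(corpus, n)
    # The "matrix of all documents": one sparse row per item.
    mod = {(doc_id, pos): vectorize(text, vocab, n)
           for doc_id, items in corpus.items() for pos, text in enumerate(items)}
    keys = sorted(mod)
    result = {}
    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            score = cosine(mod[a], mod[b])
            if score >= item_threshold:
                result[(a, b)] = score
    return result


def similar_documents(corpus: Corpus, item_pairs: Dict[Tuple[ItemKey, ItemKey], float],
                      doc_threshold: float = 0.5) -> Dict[Tuple[str, str], float]:
    """Score each document pair by the share of its items that have a match."""
    matched: Dict[Tuple[str, str], Tuple[set, set]] = {}
    for a, b in item_pairs:
        if a[0] == b[0]:
            continue  # matches inside one document say nothing about document pairs
        left, right = matched.setdefault((a[0], b[0]), (set(), set()))
        left.add(a[1])
        right.add(b[1])
    result = {}
    for (doc_a, doc_b), (items_a, items_b) in matched.items():
        score = (len(items_a) + len(items_b)) / (len(corpus[doc_a]) + len(corpus[doc_b]))
        if score >= doc_threshold:
            result[(doc_a, doc_b)] = score
    return result


corpus = {
    "bill-1": ["the secretary shall submit a report",
               "this act may be cited as the example act"],
    "bill-2": ["the secretary shall submit a report to congress",
               "funds are authorized to be appropriated"],
}
item_pairs = similar_items(corpus)
doc_pairs = similar_documents(corpus, item_pairs)
```

Both results are plain dictionaries keyed by pairs, so each entry maps directly onto a row in an item-to-item or document-to-document table. Comparing all pairs is quadratic in the number of items; a real corpus would likely need a sparse matrix library or an index instead.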