Greedily search for the word combination (at most MAX_LENGTH_PHRASE words) with the minimum distance from the centroid.
Vectorize each candidate combination as it is formed.
Find the word that is semantically closest to the newly formed phrase or word.
For proper nouns, check whether the word is present in the dictionary; otherwise, fingerspell it.
NOTE: Do not count stop words.
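The greedy search above could be sketched roughly as follows. This is only an illustration under assumptions: `greedy_split`, `toy_vectorize`, the tiny `STOP_WORDS` set, the Euclidean distance, and the 2-D toy vectors are all hypothetical stand-ins, not the project's vectorizer or cluster data. Stop words are allowed inside a phrase but are not counted toward the length limit.

```python
import math

STOP_WORDS = {"a", "an", "the", "of", "to", "is"}  # illustrative subset only


def distance(u, v):
    """Euclidean distance; 0 means a perfect match."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def greedy_split(tokens, vectorize, centroids, max_len):
    """Split tokens into phrases, each minimising distance to its nearest centroid."""
    phrases = []
    i = 0
    while i < len(tokens):
        best_end, best_dist = i + 1, float("inf")
        counted = 0  # non-stop words in the current candidate
        for end in range(i + 1, len(tokens) + 1):
            if tokens[end - 1].lower() not in STOP_WORDS:
                counted += 1
            if counted > max_len:
                break
            vec = vectorize(tokens[i:end])  # vectorize each combination
            d = min(distance(vec, c) for c in centroids)
            if d < best_dist:
                best_end, best_dist = end, d
        phrases.append((tokens[i:best_end], best_dist))
        i = best_end
    return phrases


# Toy data: hand-made 2-D vectors standing in for a real embedding model.
TOY_VECTORS = {("good", "morning"): (1.0, 0.0), ("friend",): (0.0, 1.0)}


def toy_vectorize(words):
    return TOY_VECTORS.get(tuple(w.lower() for w in words), (9.0, 9.0))


phrases = greedy_split(
    ["good", "morning", "friend"],
    toy_vectorize,
    centroids=[(1.0, 0.0), (0.0, 1.0)],
    max_len=2,
)
print(phrases)
```

In this toy run, "good morning" is kept as one phrase because its vector sits exactly on a centroid, while "good" alone is far from both.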
API Design Overview:
class SimSearch: properties -> vectorizer, clusterDB, dictVectDB, MAX_LENGTH_PHRASE (from config)
simSearch.splitTokens(sentence): Iterates over each consecutive word combination and keeps track of the minimum distance from the centroid. Stops when MAX_LENGTH_PHRASE or a proper noun is reached, whichever comes first.
simSearch.tagPOS(word): Helper function for POS tagging.
simSearch.queryVideoClip(token): Queries the DB to obtain the video clip for a token.
simSearch.fingerspell(character_token): Helper function that fingerspells character_token as mp4 clips.
simSearch.query(sentence): Stitches everything together and returns the list of video links, in order.
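A minimal skeleton of how `query` might stitch the other methods together is shown here. It is a sketch under loud assumptions: `clip_db` and `letter_clips` are hypothetical in-memory dictionaries standing in for the real databases, `splitTokens` is replaced by a plain whitespace split, and `tagPOS` uses a capitalisation heuristic instead of a real POS tagger. Only the method names come from the list above.

```python
class SimSearch:
    """Skeleton of the listed API; storage and tagging are stand-ins."""

    def __init__(self, clip_db, letter_clips):
        self.clip_db = clip_db            # hypothetical: token -> video link
        self.letter_clips = letter_clips  # hypothetical: character -> video link

    def splitTokens(self, sentence):
        # Placeholder: whitespace split, not the centroid-based greedy search.
        return sentence.split()

    def tagPOS(self, word):
        # Placeholder heuristic: capitalised words are treated as proper nouns.
        return "PROPN" if word[:1].isupper() else "OTHER"

    def queryVideoClip(self, token):
        # Returns the clip link, or None when the token has no entry.
        return self.clip_db.get(token.lower())

    def fingerspell(self, character_token):
        # One clip per character that has a letter clip available.
        return [
            self.letter_clips[ch]
            for ch in character_token.lower()
            if ch in self.letter_clips
        ]

    def query(self, sentence):
        links = []
        for token in self.splitTokens(sentence):
            clip = self.queryVideoClip(token)
            if clip is not None:
                links.append(clip)
            elif self.tagPOS(token) == "PROPN":
                # Proper noun missing from the dictionary: fingerspell it.
                links.extend(self.fingerspell(token))
            # Other unmatched tokens are skipped in this sketch; the design
            # above would map them to their semantically closest word.
        return links


search = SimSearch(
    clip_db={"hello": "clips/hello.mp4"},
    letter_clips={"a": "letters/a.mp4", "n": "letters/n.mp4"},
)
links = search.query("hello Ann")
print(links)
```

The returned list preserves sentence order, so the caller can play or concatenate the clips directly.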
NOTE: Can be made faster by creating parallel jobs that work on slices of length MAX_LENGTH_PHRASE. If a job stops before MAX_LENGTH_PHRASE, the remainder can be kept in a Queue with its position index in the token list.
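One possible reading of this note, sketched with the standard library: each slice is processed in its own job, and whatever a job does not consume goes into a Queue together with its position index. The names `parallel_split`, `process_window`, and `choose_length` (a callable that says how many leading tokens the greedy search keeps) are hypothetical. A limitation of fixed slices is that a phrase can never span a slice boundary.

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue

MAX_LENGTH_PHRASE = 3  # would come from config


def process_window(start, window, choose_length):
    """Return (start index, chosen phrase, leftover start index, leftover tokens)."""
    n = max(1, choose_length(window))  # always consume at least one token
    return start, window[:n], start + n, window[n:]


def parallel_split(tokens, choose_length, max_len=MAX_LENGTH_PHRASE):
    windows = [(i, tokens[i:i + max_len]) for i in range(0, len(tokens), max_len)]
    leftovers = Queue()
    phrases = {}  # position index -> phrase

    with ThreadPoolExecutor() as pool:
        results = pool.map(
            lambda w: process_window(w[0], w[1], choose_length), windows
        )
        for start, phrase, rest_start, rest in results:
            phrases[start] = phrase
            if rest:
                # Job stopped early: queue the remainder with its position.
                leftovers.put((rest_start, rest))

    # Drain the queue; each leftover may itself leave a shorter remainder.
    while not leftovers.empty():
        start, window = leftovers.get()
        _, phrase, rest_start, rest = process_window(start, window, choose_length)
        phrases[start] = phrase
        if rest:
            leftovers.put((rest_start, rest))

    # The position index restores the original token order.
    return [phrases[k] for k in sorted(phrases)]


def toy_choose_length(window):
    # Toy rule: keep "good morning" together, otherwise take one token.
    return 2 if window[:2] == ["good", "morning"] else 1


result = parallel_split(
    ["good", "morning", "my", "dear", "friend"], toy_choose_length
)
print(result)
```

Threads are used here only for brevity; if vectorization is CPU-bound, a process pool may be the better fit.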
NOTE: The similarity metric used in the tasks is a distance, so 0 means a perfect match.