Add support for a NoSQL data storage #22

Closed
AddictedCS opened this issue Oct 5, 2013 · 2 comments

@AddictedCS
Owner

The current backend, MSSQL, has its limits in terms of performance and scalability. Storing a large number of songs requires a data store that scales horizontally with ease and performs well.

After studying a large number of NoSQL solutions, I would recommend MongoDB as the winner. Redis would be a perfect fit for a caching layer, but it seems to perform worse than other NoSQL database systems when the stored set exceeds the RAM available on a dedicated machine, and the number of songs and fingerprints will definitely exceed RAM capacity. CouchDB seems very similar to Mongo. Because we don't really need master-master replication, as the data will mostly be inserted (with no frequent updates), I would go for Mongo here as well. Other database systems have also been analyzed, but mainly these three were considered good candidates.

  • Use aggregation capabilities to perform the main query on HashTable - HashBin
  • Add a compound index on HashTable, HashBin to boost performance
  • Map data models to the same structure as in MSSQL (connect entities via references)
  • Store Album Title as part of Track to minimize lookups for Album. If the album is not known, do not store "untitled"
  • Sharding (if required) will be performed over the HashTables (25 elements fit perfectly)
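
The points above could be sketched roughly as follows. This is only an illustration under assumptions, not the project's actual schema: the field names (`track_id`, `hash_table`, `hash_bin`, `album`), the helper functions, and the `min_matches` threshold are all hypothetical. The pipeline and index keys are built as plain Python structures in the form MongoDB drivers accept, and `run_in_memory` is a pure-Python stand-in so the idea can be tried without a database.

```python
from collections import Counter

# Compound index keys on (hash_table, hash_bin), as (field, direction) pairs;
# 1 means ascending. Field names are assumptions, not taken from the project.
HASH_INDEX_KEYS = [("hash_table", 1), ("hash_bin", 1)]


def make_track(track_id, title, artist, album=None):
    """Track document with the album title embedded.

    When the album is unknown the field is left out entirely instead of
    storing a placeholder value.
    """
    doc = {"_id": track_id, "title": title, "artist": artist}
    if album:
        doc["album"] = album
    return doc


def make_hash_docs(track_id, hash_bins):
    """One document per (hash_table, hash_bin) pair, referencing the track by id."""
    return [
        {"track_id": track_id, "hash_table": table, "hash_bin": hash_bin}
        for table, hash_bin in enumerate(hash_bins)
    ]


def build_pipeline(query_bins, min_matches):
    """Aggregation pipeline: match table/bin pairs, then count hits per track."""
    return [
        {"$match": {"$or": [
            {"hash_table": table, "hash_bin": hash_bin}
            for table, hash_bin in enumerate(query_bins)
        ]}},
        {"$group": {"_id": "$track_id", "matches": {"$sum": 1}}},
        {"$match": {"matches": {"$gte": min_matches}}},
        {"$sort": {"matches": -1}},
    ]


def run_in_memory(hash_docs, query_bins, min_matches):
    """Pure-Python emulation of the pipeline above, for illustration only."""
    wanted = set(enumerate(query_bins))
    counts = Counter(
        doc["track_id"]
        for doc in hash_docs
        if (doc["hash_table"], doc["hash_bin"]) in wanted
    )
    hits = [(track_id, n) for track_id, n in counts.items() if n >= min_matches]
    return sorted(hits, key=lambda item: -item[1])
```

With a driver such as pymongo, `HASH_INDEX_KEYS` would presumably be passed to `create_index` and the result of `build_pipeline` to `aggregate` on the hash collection. If sharding turns out to be needed, `hash_table` would be the natural candidate for the shard key in this layout.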
@ghost ghost assigned AddictedCS Oct 5, 2013
@AddictedCS
Owner Author

Cassandra with Hadoop has been chosen as the winner for this particular issue. Implementation is in progress. Vagrant scripts have been written to start a Cassandra cluster.

@AddictedCS
Owner Author

MongoDB has been added, though its performance has to be reviewed. Another issue has been created for this.
