Re-visiting the architecture of DC #554

Open
arya-hemanshu opened this issue Mar 27, 2018 · 1 comment

@arya-hemanshu
Contributor

Description

Consider a table of size 6×8, with 6 subjects (rows) and 8 attributes (columns). The current implementation saves every subject-attribute combination separately, so even a table this small takes 48 operations instead of 6, although the subject is common to all 8 attributes of a single record. This slows down saving records to the database considerably when the dataset is large. For example, to save an Excel sheet with 100,000 rows and 50 attributes, DC runs 5 million operations instead of 100 thousand.
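The arithmetic above can be checked with a small sketch. The function names are hypothetical and only count operations; they are not taken from the DC codebase.

```python
def per_cell_operations(rows: int, attributes: int) -> int:
    """One save per (subject, attribute) pair, as described in the issue."""
    return rows * attributes


def per_record_operations(rows: int, attributes: int) -> int:
    """One save per subject, carrying all of its attributes together."""
    return rows


# The two table sizes mentioned in the description.
for rows, attributes in [(6, 8), (100_000, 50)]:
    print(
        f"{rows} rows x {attributes} attributes: "
        f"{per_cell_operations(rows, attributes)} per-cell operations vs "
        f"{per_record_operations(rows, attributes)} per-record operations"
    )
```

The ratio between the two counts is always the number of attributes, so wider tables suffer the most.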

Error log

NA

@arya-hemanshu
Contributor Author

arya-hemanshu commented Apr 10, 2018

While working on this issue, I took the approach of generating fixed_value and timed_value tables at runtime for every datasource.

Benefits:

  • As explained in the issue, it avoids saving the same subject once per attribute, which is what inflates the number of transactions.
  • Every datasource gets one table for timed_values and one for fixed_values, which allows Java to access these tables from multiple threads.
  • Corrupting one datasource will not affect the other datasources.

Disadvantages:

  • Keeping track of all the tables that get generated for every datasource.
  • A lot of code refactoring.
  • Managing communication between different tables at runtime.
  • If the link of a datasource changes, the randomly generated names of its timed and fixed values tables change too, which could create conflicts.
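To make the trade-off concrete, here is a minimal sketch of per-datasource tables created at runtime, using SQLite from the Python standard library. Everything in it is an assumption for illustration: the naming scheme (a hash of the datasource link), the column layout, the `datasource_tables` registry and the example URLs are hypothetical and are not DC's actual schema.

```python
import hashlib
import sqlite3
from typing import Tuple


def table_names(datasource_url: str) -> Tuple[str, str]:
    """Derive table names from the datasource link (hypothetical scheme)."""
    suffix = hashlib.sha1(datasource_url.encode("utf-8")).hexdigest()[:8]
    return f"fixed_value_{suffix}", f"timed_value_{suffix}"


def create_datasource_tables(conn: sqlite3.Connection, datasource_url: str) -> Tuple[str, str]:
    """Create the two tables for one datasource and record them in a registry."""
    fixed, timed = table_names(datasource_url)
    # Table names are built only from a fixed prefix and hex digits,
    # so interpolating them into the statement is safe here.
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{fixed}" '
        "(subject_id TEXT, attribute TEXT, value TEXT)"
    )
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{timed}" '
        "(subject_id TEXT, attribute TEXT, timestamp TEXT, value REAL)"
    )
    # The registry is the bookkeeping listed under disadvantages.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS datasource_tables "
        "(url TEXT PRIMARY KEY, fixed_table TEXT, timed_table TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO datasource_tables VALUES (?, ?, ?)",
        (datasource_url, fixed, timed),
    )
    return fixed, timed


conn = sqlite3.connect(":memory:")
old_tables = create_datasource_tables(conn, "http://example.com/data.csv")
# The same dataset published under a new link gets different table names,
# leaving the old tables orphaned unless the registry is migrated.
new_tables = create_datasource_tables(conn, "http://example.com/v2/data.csv")
registered = conn.execute("SELECT COUNT(*) FROM datasource_tables").fetchone()[0]
```

With this scheme the registry ends up with two entries for what is logically one datasource, which is the naming conflict described in the last bullet.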

Another approach we could take is to use a NoSQL database, which would allow the same database structure that DC currently has and would also address the issues listed above.

Because it is a schema-less approach, we can reduce the number of operations, as one row could have 10 columns while another could have 100.
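A rough sketch of the idea, with a plain Python dict standing in for a document store. The field names, subject identifiers and values are made up for illustration, and no particular NoSQL product or client API is assumed.

```python
from typing import Any, Dict, List


def rows_to_documents(header: List[str], rows: List[List[Any]]) -> List[Dict[str, Any]]:
    """Turn each table row into one document; empty cells are simply omitted."""
    subject_key, *attribute_names = header
    documents = []
    for row in rows:
        subject, *values = row
        attributes = {
            name: value
            for name, value in zip(attribute_names, values)
            if value is not None
        }
        documents.append({subject_key: subject, "attributes": attributes})
    return documents


header = ["subject", "population", "area", "median_age"]
rows = [
    ["subject-1", 1465, 0.13, 43],
    ["subject-2", 1436, None, None],  # a sparser record
]

documents = rows_to_documents(header, rows)

# Stand-in for a document collection: one write per subject,
# regardless of how many attributes the subject carries.
collection: Dict[str, Dict[str, Any]] = {}
write_count = 0
for document in documents:
    collection[document["subject"]] = document
    write_count += 1
```

Each subject costs one write, and documents in the same collection are free to carry different numbers of attributes.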

There would be no need to generate tables at runtime.

However, it would require rewriting the backend from scratch, but it would be a more robust approach.
