DataFrameInternal - Using OrderedCollection over Array2D #105

Open
AtharvaKhare opened this issue Jun 20, 2019 · 1 comment

Comments

@AtharvaKhare
Contributor

commented Jun 20, 2019

DataFrameInternal currently uses Array2D (previously it used Matrix, #44).

Is there a specific reason, such as speed or functionality, for choosing Array2D?

Currently, whenever a row is added or removed, the entire DataFrameInternal gets re-created. This becomes a problem for large data. For example, reading a CSV file with thousands of rows calls addRow once per row, and DataFrameInternal is re-created on every such call.
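
To make the cost concrete, here is a rough Python model of the two strategies (an illustration only, not the Pharo code; the function names are hypothetical). It assumes that re-creating the table means copying every existing cell into the new one, and it ignores the occasional resizing that a growable collection does internally.

```python
# Illustrative Python model, not Pharo code. Function names are hypothetical.

def copies_when_rebuilding(num_rows, num_cols):
    """Cells copied if every added row re-creates the table and copies the old one."""
    copied = 0
    for existing_rows in range(num_rows):
        # Before the next row is added, all existing rows are copied over.
        copied += existing_rows * num_cols
    return copied


def writes_when_appending(num_rows, num_cols):
    """Cells written if every row is appended in place (internal resizing ignored)."""
    return num_rows * num_cols


print(copies_when_rebuilding(1000, 5))  # 2497500
print(writes_when_appending(1000, 5))   # 5000
```

Under these assumptions the rebuild cost grows quadratically with the number of rows, while appending grows linearly.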

I think OrderedCollection would be a better fit, since it lets us add elements at arbitrary indices. Are there any downsides to using OrderedCollection?

@AtharvaKhare

Contributor Author

commented Jun 21, 2019

The way I was thinking of implementing this is column-oriented: each column is an OrderedCollection, and contents is itself an OrderedCollection that holds these columns.

To access a row, fetch its index, then iterate through contents, fetching the index-th element of every column.
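
A minimal sketch of that layout, written in Python purely for illustration (the real implementation would be in Pharo with OrderedCollection). The class name `ColumnTable` and its methods are hypothetical, Python lists stand in for OrderedCollection, and indices are 0-based here whereas Pharo collections are 1-based.

```python
class ColumnTable:
    """Hypothetical column-oriented store: a list of columns, each itself a list."""

    def __init__(self, num_cols):
        # One growable column per table column.
        self.contents = [[] for _ in range(num_cols)]

    def add_row(self, row):
        # Appends one value to each column; nothing is re-created.
        if len(row) != len(self.contents):
            raise ValueError("row length must match the number of columns")
        for column, value in zip(self.contents, row):
            column.append(value)

    def row_at(self, index):
        # Gathers the index-th element of every column.
        return [column[index] for column in self.contents]

    def remove_row_at(self, index):
        # Deletes the index-th element from every column.
        for column in self.contents:
            del column[index]


table = ColumnTable(2)
table.add_row([1, "a"])
table.add_row([2, "b"])
print(table.row_at(1))  # [2, 'b']
```

Row access touches every column, which is the part that might turn out slower than with Array2D; adding a row only appends to each column.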

I will have to do detailed performance profiling to measure any slowdown in fetching rows and the speed-up in adding rows.
