DataFrameInternal - Using OrderedCollection over Array2D #105

AtharvaKhare opened this issue Jun 20, 2019 · 1 comment

AtharvaKhare commented Jun 20, 2019

DataFrameInternal currently uses Array2D (previously it used Matrix, see #44).

Is there a specific reason, such as speed or functionality, for choosing Array2D?

Currently, adding or removing a row re-creates the entire data frame. This becomes a problem for large data: for example, reading a CSV file with thousands of rows calls addRow once per row, and DataFrameInternal is re-created on every one of those calls.
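To make that cost concrete, here is a language-neutral sketch in Python (not Pharo). `RecreatingTable`, `GrowableTable` and `load` are hypothetical names, and the re-creating class is only an assumed model of the behaviour described above, not the actual DataFrameInternal code. It counts how many cells are copied when every addRow rebuilds the whole structure: the k-th call copies all k existing rows, so n calls copy m * n * (n - 1) / 2 cells for m columns, while an appending store never copies its existing rows.

```python
class RecreatingTable:
    """Hypothetical fixed-size 2D store: every add_row allocates a new grid
    one row larger and copies all existing cells into it."""

    def __init__(self, num_columns):
        self.num_columns = num_columns
        self.rows = []          # replaced wholesale on every add_row
        self.cells_copied = 0   # counts cell copies caused by re-creation

    def add_row(self, row):
        assert len(row) == self.num_columns
        new_grid = []
        for old_row in self.rows:
            new_grid.append(list(old_row))      # copy every existing row
            self.cells_copied += self.num_columns
        new_grid.append(list(row))
        self.rows = new_grid


class GrowableTable:
    """Hypothetical growable store: add_row appends in place, so this class
    never copies existing rows (list append is amortized O(1))."""

    def __init__(self, num_columns):
        self.num_columns = num_columns
        self.rows = []
        self.cells_copied = 0   # kept only for comparison; stays 0

    def add_row(self, row):
        assert len(row) == self.num_columns
        self.rows.append(list(row))


def load(table, n_rows):
    """Simulate reading a CSV file: one add_row call per line."""
    for i in range(n_rows):
        table.add_row([i] * table.num_columns)
    return table


recreating = load(RecreatingTable(3), 1000)
growable = load(GrowableTable(3), 1000)
print(recreating.cells_copied)  # 1498500
print(growable.cells_copied)    # 0
```

With 1000 rows and 3 columns the re-creating version copies 1,498,500 cells, and that number grows quadratically with the row count, which matches the slowdown seen when loading large CSV files.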

I think OrderedCollection would be a better fit, since it lets us add elements at arbitrary indices. Are there any downsides to using OrderedCollection?

AtharvaKhare (Contributor, Author) commented Jun 21, 2019

I am thinking of implementing this with column-oriented OrderedCollections, where contents is itself an OrderedCollection holding these columns.

To access a row, take its index and iterate through contents, fetching the element at that index from every column.
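The column-oriented layout described above could look roughly like this, again sketched in Python with plain lists standing in for OrderedCollections. `ColumnStore` and its method names are hypothetical and do not correspond to an existing DataFrame API. Appending a row touches each column once, inserting at an arbitrary index inserts one element per column, and `row_at` rebuilds a row by taking the index-th element of every column in `contents`.

```python
class ColumnStore:
    """Hypothetical column-oriented store: `contents` is a list of columns,
    each column a list of values (standing in for OrderedCollections)."""

    def __init__(self, num_columns):
        self.contents = [[] for _ in range(num_columns)]

    @property
    def num_rows(self):
        return len(self.contents[0]) if self.contents else 0

    def _check(self, row):
        if len(row) != len(self.contents):
            raise ValueError("row length must match the number of columns")

    def add_row(self, row):
        # Append one value to each column; existing data is not re-created.
        self._check(row)
        for column, value in zip(self.contents, row):
            column.append(value)

    def insert_row(self, index, row):
        # Insert at an arbitrary index: each column shifts its tail by one.
        self._check(row)
        for column, value in zip(self.contents, row):
            column.insert(index, value)

    def remove_row(self, index):
        # Remove the index-th element from every column.
        for column in self.contents:
            del column[index]

    def row_at(self, index):
        # Iterate through contents, taking the index-th element of each column.
        return [column[index] for column in self.contents]


store = ColumnStore(2)
store.add_row([1, "a"])
store.add_row([3, "c"])
store.insert_row(1, [2, "b"])
print(store.row_at(1))   # [2, 'b']
print(store.contents)    # [[1, 2, 3], ['a', 'b', 'c']]
```

One trade-off worth profiling: in this list-based sketch `row_at` does one lookup per column and allocates a new row on every call, and inserting in the middle of a column still shifts every element after the insertion point, so arbitrary-index insertion is cheaper than re-creating the table but not free.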

I will need detailed performance profiling to measure any slowdown in fetching rows and the speed-up in adding rows.
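A minimal shape for those two measurements, assuming a Python micro-benchmark is only a rough proxy: the real comparison would have to be run in Pharo against the actual DataFrameInternal. All function names and default sizes here are arbitrary choices for illustration. The harness times fetching one row from a row-major layout versus rebuilding it from columns, and times appending rows to per-column lists.

```python
import timeit


def build_row_major(n_rows, n_cols):
    # One list per row, like a row-oriented 2D store.
    return [[r * n_cols + c for c in range(n_cols)] for r in range(n_rows)]


def build_column_major(n_rows, n_cols):
    # One list per column, holding the same values as build_row_major.
    return [[r * n_cols + c for r in range(n_rows)] for c in range(n_cols)]


def fetch_row_row_major(rows, index):
    return rows[index]


def fetch_row_column_major(columns, index):
    # Rebuild the row by taking the index-th element of every column.
    return [column[index] for column in columns]


def time_column_appends(n_rows, n_cols, repeat=3):
    # Best-of-`repeat` time to append n_rows rows to empty per-column lists.
    def run():
        columns = [[] for _ in range(n_cols)]
        for r in range(n_rows):
            for column in columns:
                column.append(r)
    return min(timeit.repeat(run, number=1, repeat=repeat))


def benchmark(n_rows=10_000, n_cols=20, fetches=10_000, repeat=5):
    rows = build_row_major(n_rows, n_cols)
    columns = build_column_major(n_rows, n_cols)
    index = n_rows // 2
    row_fetch = min(timeit.repeat(lambda: fetch_row_row_major(rows, index),
                                  number=fetches, repeat=repeat))
    column_fetch = min(timeit.repeat(lambda: fetch_row_column_major(columns, index),
                                     number=fetches, repeat=repeat))
    return {
        "row_major_fetch_s": row_fetch,
        "column_major_fetch_s": column_fetch,
        "column_append_s": time_column_appends(n_rows, n_cols),
    }


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name}: {seconds:.4f}")
```

Taking the minimum over several repeats reduces noise from other processes; the absolute numbers matter less than how the fetch and append times scale as the row and column counts grow.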


olekscode added this to To do in DataFrame on Jul 26, 2021
olekscode added this to the v3.0 milestone on Jul 26, 2021
Linked pull requests: None yet
2 participants