Overload toDataFrame for basic types to avoid surprising results #314

Merged 2 commits into master on Mar 24, 2023

koperagen
Collaborator

Without overloads, Iterable.toDataFrame produces an unexpected result:

val string = listOf("aaa", "aa", null)
string.toDataFrame()

=>
   length
0       3
1       2
2    null

I think it would be more reasonable to create a dataframe with a single column for all basic types (numbers, strings, chars, booleans) and run the reflection scan for everything else.
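To make the proposed rule concrete, here is a minimal, library-free sketch. It is not the PR's code: `isBasic`, `scanFields`, `toFrame` and the `Person` class are hypothetical names, a plain `Map` stands in for a dataframe, and Java field reflection stands in for the library's property scan.

```kotlin
import java.lang.reflect.Modifier

// "Basic" types as listed in the proposal: numbers, strings, chars, booleans.
fun isBasic(value: Any?): Boolean =
    value is Number || value is String || value is Char || value is Boolean

// Simplified stand-in for the reflection scan: one column per instance field.
// Assumes all non-null rows share the class of the first non-null row.
fun scanFields(rows: List<Any?>): Map<String, List<Any?>> {
    val type = rows.firstOrNull { it != null }?.javaClass ?: return emptyMap()
    return type.declaredFields
        .filter { !Modifier.isStatic(it.modifiers) && !it.isSynthetic }
        .associate { field ->
            field.isAccessible = true
            field.name to rows.map { row -> row?.let { field.get(it) } }
        }
}

// Hypothetical conversion: column name -> column values.
// Basic values (and nulls) go into a single "value" column;
// everything else is scanned field by field.
fun toFrame(rows: List<Any?>): Map<String, List<Any?>> =
    if (rows.all { it == null || isBasic(it) }) mapOf("value" to rows)
    else scanFields(rows)

// Hypothetical row type used only to exercise the reflection branch.
data class Person(val name: String, val age: Int)
```

With this sketch, `toFrame(listOf("aaa", "aa", null))` yields one `value` column holding the strings themselves rather than a `length` column, while `toFrame(listOf(Person("Ann", 30)))` yields `name` and `age` columns.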

@koperagen koperagen marked this pull request as ready for review March 20, 2023 22:28
Collaborator

@Jolanrensen Jolanrensen left a comment

I agree we need to take care of primitives. A list of strings gives me a dataframe of string lengths without the new overloads haha.
However, do all of these need to be overloads for toDataFrame? Or can these just be moved to createDataFrameImpl?

@koperagen
Collaborator Author

> I agree we need to take care of primitives. A list of strings gives me a dataframe of string lengths without the new overloads haha. However, do all of these need to be overloads for toDataFrame? Or can these just be moved to createDataFrameImpl?

I don't think so. At the very least, we'll lose the ability to return this typed dataframe.
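A small sketch of why the overload matters for the return type, under stated assumptions: `TypedFrame`, `ValueRow` and `toTypedFrame` are hypothetical stand-ins (loosely modeled on the `ValueProperty<T>` schema in the diff), not the library's API. A runtime check inside a generic implementation can build the same columns, but only a separate overload can declare the more specific schema in its signature.

```kotlin
// Hypothetical typed wrapper; T records the row schema at compile time.
class TypedFrame<T>(val columns: Map<String, List<Any?>>)

// Hypothetical single-column schema, loosely modeled on ValueProperty<T>.
interface ValueRow<T> {
    val value: T
}

// Accessor that exists only when the schema is statically known to be ValueRow<T>.
val <T> TypedFrame<ValueRow<T>>.value: List<T>
    @Suppress("UNCHECKED_CAST")
    get() = columns.getValue("value") as List<T>

// Overload picked at compile time for string elements, so the signature
// can promise the single-column schema.
@JvmName("toTypedFrameString")
fun Iterable<String?>.toTypedFrame(): TypedFrame<ValueRow<String?>> =
    TypedFrame(mapOf("value" to toList()))

// Generic fallback with a runtime check (the "move it into the impl" option).
// It builds the same single column for basic values, but its declared
// return type is still TypedFrame<T>, so the typed accessor is unavailable.
fun <T> Iterable<T>.toTypedFrame(): TypedFrame<T> {
    val rows = toList()
    val basic = rows.all {
        it == null || it is Number || it is String || it is Char || it is Boolean
    }
    return TypedFrame(mapOf((if (basic) "value" else "element") to rows))
}
```

Here `listOf("aaa", "aa", null).toTypedFrame().value` compiles and has type `List<String?>`, whereas `listOf(1, 2).toTypedFrame()` is only a `TypedFrame<Int>`: the data is in a `value` column, but the compiler does not know it.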

@Jolanrensen
Collaborator

> I don't think so. At the very least, we'll lose the ability to return this typed dataframe.

Of course! You're right :)

Collaborator

@Jolanrensen Jolanrensen left a comment

The rest looks good!
Btw, maybe we should also add toDataFrame overloads for (primitive) arrays, since they aren't Iterables.
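For context, Kotlin arrays such as `IntArray` and `Array<T>` do not implement `Iterable`, so an `Iterable` extension is not callable on them directly. One possible shape for such overloads, sketched with a hypothetical `toSingleColumn` helper rather than the real `toDataFrame`, is to delegate through the standard `asIterable()`:

```kotlin
// Hypothetical Iterable-based conversion standing in for toDataFrame.
fun <T> Iterable<T>.toSingleColumn(): Map<String, List<T>> =
    mapOf("value" to toList())

// Possible array overloads: wrap the array as an Iterable and delegate.
fun IntArray.toSingleColumn(): Map<String, List<Int>> =
    asIterable().toSingleColumn()

fun <T> Array<T>.toSingleColumn(): Map<String, List<T>> =
    asIterable().toSingleColumn()
```

The other primitive array types (`LongArray`, `DoubleArray`, and so on) would each need their own overload in this approach, which is part of the cost to weigh.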

}.cast()

@DataSchema
public interface ValueProperty<T> {
Collaborator

Could maybe be moved to a separate file, similar to KeyValueProperty :)

Collaborator Author

I'd prefer to keep it here; I don't think it has much value on its own.

Collaborator

I thought the same about KeyValueProperty, but who knows, we cannot know everything a user might want :) And a value-only DataSchema seems to me exactly the sort of thing people might want to use (or create themselves).

@koperagen
Collaborator Author

> The rest looks good! Btw, maybe we should also add overloads for (primitive) arrays toDataFrame, since they aren't iterables

Not sure, so let's return to it later c: I'll merge this

@koperagen koperagen merged commit b8b913c into master Mar 24, 2023