Added some name repairing strategies #386

Merged Jun 6, 2023 (5 commits)

zaleslaw
Collaborator

Fixed #342

@zaleslaw zaleslaw marked this pull request as ready for review May 31, 2023 12:55
@zaleslaw zaleslaw requested a review from Jolanrensen June 5, 2023 09:38
@zaleslaw
Collaborator Author

zaleslaw commented Jun 5, 2023

I should note that this solution covers only one IO source, to address the user's need. It would be a good idea to make it generic and move it to the API module later so that other IO sources can use it, because there is no guarantee that they properly handle, for example, empty names.

* This strategy defines how repeated column names will be handled
* during the creation of a new dataframe from the IO sources.
*/
public enum class NameRepairStrategy {
Collaborator

It's unclear that it targets duplicate names just by looking at the enum class name. Maybe call it DuplicateNameStrategy?

Collaborator Author

Many dataframe frameworks call this name repairing, so it's better to keep that naming to reduce the learning curve.

Collaborator

I think "repairing" insinuates a name is "broken", e.g. unsupported characters etc. This is definitely different IMO. Can you give an example from another library?

Collaborator Author

Collaborator

I see! Well, if in the future we also add "universal", which fixes name syntax too, I'm fine with the name. Then it won't be just about uniqueness

/** No actions, keep as is. */
NO,

/** Check the uniqueness of the column names without any actions. */
Collaborator

I assume this would then throw an exception?

Collaborator Author

The proposed solution doesn't do that. The exception is thrown in a completely different place, which is why I can't guarantee that it happens (although for now it does).
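
To make the point about uniqueness checking concrete, here is a minimal sketch of a separate validation step that rejects duplicate column names. The helper `requireUniqueNames` is hypothetical and is not part of this pull request; where the library actually throws is not shown in this thread, so this only illustrates the idea of failing on duplicates outside the name-repair function.

```kotlin
// Hypothetical helper for illustration only; not taken from the pull request.
// Throws IllegalArgumentException if any column name occurs more than once.
fun requireUniqueNames(names: List<String>) {
    // Count occurrences of each name and keep only those seen more than once.
    val duplicates = names.groupingBy { it }.eachCount().filterValues { it > 1 }.keys
    require(duplicates.isEmpty()) { "Duplicate column names: $duplicates" }
}

fun main() {
    // Unique names pass silently.
    requireUniqueNames(listOf("id", "value"))

    // A repeated name is rejected with a descriptive message.
    val failure = runCatching { requireUniqueNames(listOf("id", "value", "value")) }
    println(failure.exceptionOrNull()?.message) // Duplicate column names: [value]
}
```

Keeping the check in its own function would let a "check uniqueness" strategy fail deterministically, rather than relying on an exception raised later during dataframe construction.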

* when the functionality will be enabled for all IO sources.
*/
private fun repairNameIfRequired(nameFromCell: String, columnNameCounters: MutableMap<String, Int>, nameRepairStrategy: NameRepairStrategy): String {
return when(nameRepairStrategy) {
Collaborator

linting

Collaborator Author

Which linting problem do you mean? I ran the linter on my machine and on TC, and it reported no problems.

Collaborator

There's no space after when, same with the ifs a bit below. No clue why the linter is not picking those up.

Collaborator Author

OK, I'll double-check, thank you.
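
For readers following this thread, the sketch below shows one way a function with the signature under review could behave, formatted with the space after `when` that the reviewer asked for. It is an illustration under assumptions, not the merged code: only `NO` appears in the diff above, so the constants `CHECK_UNIQUE` and `MAKE_UNIQUE` and the numeric-suffix scheme are assumed here.

```kotlin
// Assumed mirror of the enum under review. Only NO is visible in the diff;
// CHECK_UNIQUE and MAKE_UNIQUE are hypothetical names used for illustration.
enum class NameRepairStrategy { NO, CHECK_UNIQUE, MAKE_UNIQUE }

// Returns the name to use for a column read from a header cell.
// columnNameCounters tracks how many times each original name has been seen.
fun repairNameIfRequired(
    nameFromCell: String,
    columnNameCounters: MutableMap<String, Int>,
    nameRepairStrategy: NameRepairStrategy,
): String {
    return when (nameRepairStrategy) {
        // Neither strategy changes the name; any uniqueness check happens elsewhere.
        NameRepairStrategy.NO, NameRepairStrategy.CHECK_UNIQUE -> nameFromCell
        NameRepairStrategy.MAKE_UNIQUE -> {
            val seen = columnNameCounters[nameFromCell] ?: 0
            columnNameCounters[nameFromCell] = seen + 1
            // Assumed scheme: the first occurrence is kept, later ones get a numeric suffix.
            if (seen == 0) nameFromCell else "$nameFromCell$seen"
        }
    }
}

fun main() {
    val counters = mutableMapOf<String, Int>()
    val header = listOf("id", "value", "value", "value")
    val repaired = header.map { repairNameIfRequired(it, counters, NameRepairStrategy.MAKE_UNIQUE) }
    println(repaired) // [id, value, value1, value2]
}
```

Note that this simple suffix scheme can still collide with a column that is literally named `value1`; a production version would need to check generated names against names already in use.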

@zaleslaw
Collaborator Author

zaleslaw commented Jun 6, 2023

Thanks for the review, good points @Jolanrensen

@zaleslaw zaleslaw merged commit 4875411 into Kotlin:master Jun 6, 2023
1 check passed
Linked issue: Disambiguate duplicated column names when reading data