Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Basic support for non-string names. #1784

Merged
merged 8 commits into from
Sep 26, 2020
Merged

Conversation

ueshin
Copy link
Collaborator

@ueshin ueshin commented Sep 22, 2020

Currently names in Koalas, e.g., df.columns, df.colums.names, df.index.names, need to be string or tuple of string, but it should allow other data types which are supported by Spark.

before:

>>> kdf = ks.DataFrame([[1, 'x'], [2, 'y'], [3, 'z']])
>>> kdf.columns
Index(['0', '1'], dtype='object')

after:

>>> kdf = ks.DataFrame([[1, 'x'], [2, 'y'], [3, 'z']])
>>> kdf.columns
Int64Index([0, 1], dtype='int64')

@ueshin ueshin marked this pull request as ready for review September 24, 2020 01:34
Copy link
Contributor

@itholic itholic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks fine, but I wonder maybe if it wouldn't matter to abandon some of the features we could have supported before ??

databricks/koalas/tests/test_series.py Outdated Show resolved Hide resolved
databricks/koalas/tests/test_series.py Outdated Show resolved Hide resolved
@@ -1066,7 +1064,7 @@ def drop(self, labels):
>>> index.drop([1])
Int64Index([2, 3], dtype='int64')
"""
if not isinstance(labels, (tuple, list)):
if not is_list_like(labels): # TODO: tuple?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe we don't need to consider tuple here because Index cannot have tuple value in Koalas anyway ??

>>> ks.Index([(1, 2), 3])  # Cannot have a tuple for Index as mixed type
Traceback (most recent call last):
...
pyarrow.lib.ArrowInvalid: Could not convert (1, 2) with type tuple: did not recognize Python value type when inferring an Arrow data type

>>> ks.Index([(1, 2), (3, 4)])  # This is not gonna be an Index but MultiIndex
MultiIndex([(1, 2),
            (3, 4)],
           )

Oh, but if we had any plan for supporting tuple for values, I think It's okay to remain the TODO as it is.

databricks/koalas/indexes.py Outdated Show resolved Hide resolved
@ueshin
Copy link
Collaborator Author

ueshin commented Sep 25, 2020

Let me revert some commits. Maybe we should discuss input checks separately.

@ueshin ueshin changed the title Support non-string names. Basic support for non-string names. Sep 25, 2020
Copy link
Contributor

@itholic itholic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good enough in terms of Basic support !

Let's discuss the details separately.

@ueshin
Copy link
Collaborator Author

ueshin commented Sep 26, 2020

Thanks! I'd merge this now. I'll submit PRs for each input check.

@ueshin ueshin merged commit 043978a into databricks:master Sep 26, 2020
@ueshin ueshin deleted the non-string_names branch September 26, 2020 00:43
@ueshin ueshin mentioned this pull request Oct 15, 2020
HyukjinKwon pushed a commit that referenced this pull request Oct 16, 2020
Fixes type annotations.
After ##1784, those should accept non-string tuples.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants