Add 'index_col' parameter to DataFrame.to_spark #906

Merged
4 commits merged into databricks:master on Oct 9, 2019

HyukjinKwon
Member

This PR adds an index_col parameter to DataFrame.to_spark. It relates to #901.

After this PR, a round trip between a Koalas DataFrame and a Spark DataFrame is straightforward, because the index survives as a named column:

>>> import databricks.koalas as ks
>>> kdf = ks.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})
>>> sdf = kdf.to_spark(index_col="index").filter("a == 2")  # PySpark API
>>> kdf = sdf.to_koalas(index_col="index")
>>> kdf
       a  b  c
index
1      2  5  8

databricks/koalas/internal.py
# A function to turn given numbers to Spark columns that represent Koalas index.
SPARK_INDEX_NAME_FORMAT = "__index_level_{}__".format
# A pattern to check if the name of a Spark column is a Koalas index name or not.
SPARK_INDEX_NAME_PATTERN = re.compile(r"__index_level_[0-9]+__")
Member Author


These changes are actually not related to this PR.
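
For reference, the two constants quoted above can be exercised on their own. The following is only a minimal sketch: is_index_column is a hypothetical helper that is not part of Koalas, and using fullmatch (rather than match or search) is an assumption about how strict the check should be.

```python
import re

# Copied from the snippet under review.
SPARK_INDEX_NAME_FORMAT = "__index_level_{}__".format
SPARK_INDEX_NAME_PATTERN = re.compile(r"__index_level_[0-9]+__")


def is_index_column(name: str) -> bool:
    """Hypothetical helper: True only if `name` is exactly a default index column name."""
    # fullmatch is an assumption; the pattern itself carries no anchors.
    return SPARK_INDEX_NAME_PATTERN.fullmatch(name) is not None


# Build the default column names for a two-level index.
names = [SPARK_INDEX_NAME_FORMAT(level) for level in range(2)]
print(names)  # ['__index_level_0__', '__index_level_1__']

# Generated names are recognized; ordinary or malformed names are not.
print([is_index_column(n) for n in names + ["a", "__index_level_x__"]])
# [True, True, False, False]
```

A user-chosen name such as "index" passed through index_col does not match this pattern, so it would be treated as an ordinary column name by a check like this one.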

@softagram-bot

Softagram Impact Report for pull/906 (head commit: 29dab9e)


@codecov-io

codecov-io commented Oct 8, 2019

Codecov Report

Merging #906 into master will decrease coverage by 0.02%.
The diff coverage is 87.75%.


@@            Coverage Diff             @@
##           master     #906      +/-   ##
==========================================
- Coverage    94.3%   94.27%   -0.03%     
==========================================
  Files          34       34              
  Lines        6213     6239      +26     
==========================================
+ Hits         5859     5882      +23     
- Misses        354      357       +3

Impacted Files                         Coverage Δ
databricks/koalas/series.py            95.29% <100%> (-0.28%) ⬇️
databricks/koalas/internal.py          95.92% <80%> (+0.05%) ⬆️
databricks/koalas/groupby.py           90.84% <80%> (ø) ⬆️
databricks/koalas/frame.py             95.94% <92.85%> (-0.02%) ⬇️
databricks/koalas/missing/series.py    100% <0%> (ø) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9e38e54...29dab9e.

Collaborator

@ueshin ueshin left a comment


LGTM.

@ueshin
Collaborator

ueshin commented Oct 9, 2019

Thanks! Merging.

@ueshin ueshin merged commit a540d8b into databricks:master Oct 9, 2019
@HyukjinKwon
Member Author

Thanks!!
