Convert df to pyspark DataFrame if it is koalas before writing #321

Merged
dbeatty10 merged 7 commits into main from dbeatty/koalas-df on Sep 27, 2022

@dbeatty10 dbeatty10 commented Sep 24, 2022

resolves #320

Description

If the DataFrame to materialize as a table has type databricks.koalas.frame.DataFrame, then convert it to a pyspark.sql.dataframe.DataFrame.
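
A minimal sketch of the check described above, not the code merged in this PR. The helper name `to_spark_if_koalas` is hypothetical. The sketch matches the class by its fully qualified name, so `databricks.koalas` does not need to be importable, and it assumes the Koalas `DataFrame.to_spark()` method performs the conversion.

```python
def to_spark_if_koalas(df):
    """Return a Spark DataFrame if `df` is a Koalas DataFrame, else `df` unchanged.

    Hypothetical helper for illustration only.
    """
    cls = type(df)
    qualified_name = f"{cls.__module__}.{cls.__qualname__}"

    # Compare against the type named in the PR description.
    if qualified_name == "databricks.koalas.frame.DataFrame":
        # Assumption: Koalas DataFrames expose `to_spark()`, which returns
        # a pyspark.sql.dataframe.DataFrame.
        return df.to_spark()

    # Any other object (for example, an existing PySpark DataFrame) passes through.
    return df
```

Matching on the qualified name avoids an import error where Koalas is not installed. An `isinstance` check would work equally well when the import is guaranteed to succeed.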

@cla-bot cla-bot bot added the cla:yes label Sep 24, 2022
@github-actions

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the dbt-bigquery contributing guide.

dbeatty10 commented Sep 24, 2022

I only tested this manually, and it worked.

I only tested on Spark 3.2.2. Ideally, this would also be tested on Spark 3.1.

Here was the dbt Python model used for testing:

import databricks.koalas as ks


def model(dbt, session):
    dbt.config(
        materialized="table",
    )

    df = ks.DataFrame(
        {
            'City': ['Buenos Aires', 'Brasilia', 'Santiago', 'Bogota', 'Caracas'],
            'Country': ['Argentina', 'Brazil', 'Chile', 'Colombia', 'Venezuela'],
            'Latitude': [-34.58, -15.78, -33.45, 4.60, 10.48],
            'Longitude': [-58.66, -47.91, -70.66, -74.08, -66.86],
        }
    )

    return df

I used the following code to report the Spark version (which came from here):

import pyspark
from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder.master("local[1]") \
                    .appName('SparkByExamples.com') \
                    .getOrCreate()

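# Observed message: "PySpark Version 3.2.2 and 3.2.2"
msg = f"PySpark Version {spark.version} and {spark.sparkContext.version}"
# Raise deliberately so that the version message is reported in the error output
raise Exception(msg)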

@dbeatty10 dbeatty10 merged commit 8d0c3bb into main Sep 27, 2022
@dbeatty10 dbeatty10 deleted the dbeatty/koalas-df branch September 27, 2022 12:00
Successfully merging this pull request may close these issues.

[CT-1236] [Feature] Convert df to pyspark DataFrame if it is koalas before writing