[Python][DuckSpark] Add Column and a bunch of DataFrame methods #8990

Merged
Mytherin merged 75 commits into duckdb:main on Sep 20, 2023

Tishj (Contributor) commented Sep 19, 2023

This PR continues where #8165 left off.

Added DataFrame methods:

join
sort
withColumn
withColumnRenamed
transform
filter
select
columns
alias
drop
groupBy
union
unionByName
dropDuplicates
distinct
count

Functions:

col
upper
when
struct
lit
regexp_replace
array_contains
avg
sum
max
mean
min
count
transform

Group methods:

count
mean
avg
max
min
sum
agg

Misc:

__contains__ for DuckDBPyRelation has been added:

import duckdb

rel = duckdb.sql("select 42 as my_col")
'my_col' in rel
# True
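
For readers unfamiliar with the protocol behind this, the `in` operator dispatches to an object's `__contains__` method. The sketch below is a pure-Python illustration only: `ToyRelation` is a hypothetical stand-in, not the actual `DuckDBPyRelation` implementation, and it assumes membership is a plain column-name lookup.

```python
class ToyRelation:
    """Hypothetical stand-in for a relation; not the DuckDB implementation."""

    def __init__(self, columns):
        # Keep the column names in their original order.
        self.columns = list(columns)

    def __contains__(self, name):
        # `name in rel` calls this method; here it is a simple lookup
        # in the list of column names (an assumption for illustration).
        return name in self.columns


rel = ToyRelation(["my_col", "other_col"])
has_my_col = "my_col" in rel
has_missing = "missing" in rel
print(has_my_col, has_missing)
```

With a method like this in place, a membership test reads the same as checking the column list directly, without spelling out `rel.columns`.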

Tishj added 30 commits July 5, 2023 14:44
Tishj added 27 commits August 9, 2023 14:06
…plicit conversion to ConstantExpression was added | implicit sort order is gone for certain operations, add 'sort(..)' to test to make them deterministic again
…spark API, we want to be entirely compatible with spark
Tishj (Contributor, Author) commented Sep 19, 2023

Sorry for dropping this massive PR 😅
CI finally passed so I couldn't help myself

I do think it would be good to have this in the release, just because it fleshes out the DuckSpark implementation.
Without these additions I don't know how much value we can get in terms of feedback

@Mytherin Mytherin merged commit e9d463f into duckdb:main Sep 20, 2023
47 of 48 checks passed
Mytherin (Collaborator) commented

Thanks!
