-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework of dfs functions to recursively (changing return type of recurse-able functions) #363
Conversation
…zed versions on which transformers like "recursively" can be called.
topDfs -> topmostChildren top -> roots
# Conflicts: # examples/jupyter-notebooks/titanic/Titanic.ipynb
# Conflicts: # core/generated-sources/src/test/kotlin/org/jetbrains/kotlinx/dataframe/samples/api/Modify.kt # core/generated-sources/src/test/kotlin/org/jetbrains/kotlinx/dataframe/samples/api/TestBase.kt # core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/samples/api/Modify.kt # core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/samples/api/TestBase.kt # docs/StardustDocs/topics/ColumnSelectors.md
core/generated-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/DataColumn.kt
Show resolved
Hide resolved
...generated-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/ColumnsSelectionDsl.kt
Outdated
Show resolved
Hide resolved
...generated-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/ColumnsSelectionDsl.kt
Outdated
Show resolved
Hide resolved
...generated-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/ColumnsSelectionDsl.kt
Show resolved
Hide resolved
...generated-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/ColumnsSelectionDsl.kt
Show resolved
Hide resolved
while (node != null) { | ||
node.data = null | ||
node = node.parent | ||
} | ||
} | ||
val dfs = fullTree.topDfs { it.data != null } | ||
return dfs.map { it.data!!.addPath(it.pathFromRoot()) } | ||
val subtrees = fullTree.topmostChildren { it.data != null } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
topmostChildren?
topDFS is a bad name, but topmostChildren leads me to the mist again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what would you suggest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As my doc explains it takes the tree and returns first the children, closest to the root of the tree, where the predicate holds. It doesn't matter whether the children of those nodes also hold, those are ignored. So since we now have allChildren { something == true }
, I thought topmostChildren { something == true }
. It can also be viewed as subtree-roots where the predicate holds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@koperagen Any suggestions? :)
core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/impl/columns/constructors.kt
Show resolved
Hide resolved
core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/impl/columns/tree/Utils.kt
Show resolved
Hide resolved
core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/html.kt
Outdated
Show resolved
Hide resolved
core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/ColumnsSelectionDsl.kt
Show resolved
Hide resolved
@@ -37,7 +37,7 @@ public fun <T, C> DataFrame<T>.replace(columns: Iterable<ColumnReference<C>>): R | |||
|
|||
public fun <T> DataFrame<T>.replaceAll( | |||
vararg valuePairs: Pair<Any?, Any?>, | |||
columns: ColumnsSelector<T, *> = { allDfs() }, | |||
columns: ColumnsSelector<T, *> = { all().recursively(includeGroups = false) }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why has the default value for includeGroups changed? I would argue that one of the main usages of dfsOf
is to convert / replace columns with some type and groups are not needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it correct that it's because of unification of overloads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, by default groups were included in dfs {}
, but excluded in allDfs()
if I recall correctly. I had to choose a sensible default. But if you use colsOf<Int>().rec()
, then includeGroups irrelevant anyways, since Int
is not a Group.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this is interesting and a bit confusing. I think this parameter only exists for allDfs. For normal dfs it would be dfs { !it.isColumnGroup() }
. Now it's also present for rec
, recursively
. What if you remove it from them and leave only for allRec with the same default value (true)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.rec()
can not only be called on all()
, you can also call first { }.rec(includeGroups = false)
, so I think keeping it in rec/recursively too would be best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we have a function that can give all valueColumns right? If we did we could have valueCols {}.recursively()
or valueColsRecursively()
but that might get a bit long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plus it's not just ColumnGroups that get included right? also FrameColumns and ValueColumns. So, just all of them. I think that's a good default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@koperagen Actually I do want to add valueCols {}
and frameCols {}
since we already have groups {}
as well. Probably in another PR though. Then you can call valueCols().rec()
if you want. Sounds okay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be a nice addition, not 100% related to this problem though. For this allRecursively function with previous default, something like "allLeafs" might fit better. I think it was named like that before and renamed later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also take it in another direction eventually, similar to colsOf<Type>()
have colsOf(vararg ColumnKind) {}.rec()
, that might be clearer compared to users having to figure out what a "leaf" means.
...erated-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/impl/columns/tree/TreeNode.kt
Show resolved
Hide resolved
# Conflicts: # core/generated-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/convert.kt # core/generated-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/describe.kt # core/generated-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/inferType.kt # core/generated-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/reorder.kt # core/generated-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/columns/ColumnWithPath.kt # core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/describe.kt # core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/inferType.kt # core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/reorder.kt # core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/columns/ColumnWithPath.kt
Another thing that i noticed now is that it's hard to tell what rec does for |
# Conflicts: # docs/StardustDocs/topics/ColumnSelectors.md # gradle/libs.versions.toml
Yes, it works the same. You can read it like a sentence: "Select the first column of type Int recursively", which sort of makes sense. Basically: |
# Conflicts: # core/generated-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/rename.kt # core/generated-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/impl/api/flatten.kt # core/generated-sources/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/move.kt # core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/rename.kt # core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/impl/api/flatten.kt # core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/move.kt
Same as #361 but using a different tactic.
Largest changes:
Streamlined behavior of
all
,cols
,groups
etc. such that if it's called onSingleColumn<*>
containing a singleColumnGroup
(such asDataFrame
), the operation will run on the children of thatColumnGroup
ColumnSet<*>
, it will simply run on the columns inside.This is enforced using the new
allInternal()
function combined withisSingleColumnWithGroup
. I think this simply enforces expected behavior, since all tests work with it (and break upon a slight modification of this function) and more importantly, the calls seem to make sense. Let me know what you think.Introducing
TransformableColumnSet
andTransformableSingleColumn
. A wrapper around the original ones that provide a way totransformResolve
as well as the normalresolve
.transformResolve
takes aColumnSetTransformer
that can get executed before resolving theColumnSet
orSingleColumn
. AColumnSetTransformer
contains implementations for bothtransform
andtransformSingle
, similar to the functions inUtils.kt
. Currently, the only implementation of this isRecursivelyTransformer
, which is configured and created in theTransformableColumnSet<>.recursively()
calls.Not all
ColumnSet
s are transformable. Functions that allow reconfiguration with aColumnSetTransformer
need to explicitly state that in their return type. In principle, any function that usestransform {}
can automatically opt-in to theTransformableColumnSet
return-type if that makes sense for them. Currently, I added the ability tocols {}
,all()
,groups {}
,children {}
,filter {}
,nameContains()
,endsWith()
,startsWith()
,except()
, andwithoutNulls()
.I added
TransformableSingleColumn
tofirst {}
,last {}
, andsingle {}
(which usessingleWithTransformerImpl
to pass on the transformer to the right place).Of course, if other functions are applicable to modification calls like
.recursively()
I'd gladly hear it..recursively() / .rec()
is the replacement ofdfs {}
,dfsOf<>{}
, andallDfs()
. because the name was confusing. I put deprecations in place and replaced all calls in the library to recursively. Some things to note:dfs {}
, called on either aColumnSet
orSingleColumn
would exclude the top-level of columns but still include column groups in the end-result, so a proper replacement of that call would becols {}.recursively()
if called on aSingleColumn
andcols {}.recursively(includeTopLevel = false)
if called on a normalColumnSet
. Of courserec()
can be used as shorthand.dfsOf<>()
, is replaced similarly to eithercolsOf<>().rec()
orcolsOf<>().rec(includeTopLevel = false)
.allDfs()
defaultedincludeGroups
tofalse
, so be careful, since this is nowtrue
. It can be replaced either withall().recursively()
orallRecursively()
(orallRec()
), again taking into accountincludeTopLevel
andincludeGroups
.Other smaller changes:
dfs
todepthFirstSearch
wheredepth first search
was actually meantTreeNode.dfs {}
->TreeNode.allChildren {}
(+ other dfs functions)TreeNode.topDfs {}
->TreeNode.topmostChildren {}
(+ other topDfs functions)Iterable<ColumnWithPath<*>>.dfs()
->flattenRecursively()
ColumnPath.something
overloads inColumnsSelectionDsl
since they are handled bySingleColumn<*>
ColumnSet
andSingleColumn