Updated Basic Usage of Manipulation Functions #3360
Open
nathanrboyer wants to merge 33 commits into JuliaData:main from nathanrboyer:nb/manipulation_function_basics
Changes from 18 commits
Commits (33, all by nathanrboyer):
5785c72 Initial commit
f5cc15f assumes requested method is added
7c3db8a Typo: missing :z
27d7e32 added subset(df, source_column_selector)
be5fa9e Added italics
b1b3bab Moved note to main text
da6607d Added error example and removed ° symbol
0c47d10 Moved Additional Resources to the end and cleaned
6f5dfc5 Capitalized Boolean
e47cbdf Merge branch 'main' into nb/manipulation_function_basics
886d998 Removed extra space character
cabd73f Change function broadcasting to avoid old language
2e9d2af Made consistent with current proposal #3361
b0777b1 Change α to apple and make consistent with #3380
a111ef8 First round review corrections
ce55607 Move to its own section
46363d9 Add new file to make and index
cd4c539 Rewrite Basics.md conclusion
d70af83 Review Edits Round 2
e43346e Merge branch 'main' into nb/manipulation_function_basics
6377441 Fix reference?
6e7ed84 maybe fix documenter?
d2d3de8 make h function require broadcasting
0bdfc44 Fix existing typos in basics.md
72d87d2 Move back to basics.md and add comparison
043605a Add more comments
26b503e Add link to @with macro
679f65f Delete redundant expression
79a1171 Clean up new section and delete with reference
621f253 Merge branch 'main' into nb/manipulation_function_basics
7614fc3 Merge branch 'main' into nb/manipulation_function_basics
82d935c Fix admonitions
d9864ba Fix manipulation function reference
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
````diff
@@ -1565,40 +1565,30 @@ julia> german[Not(5), r"S"]
 984 rows omitted
 ```

-## Basic Usage of Transformation Functions
+## Basic Usage of Manipulation Functions

-In DataFrames.jl we have five functions that we can be used to perform
-transformations of columns of a data frame:
+In DataFrames.jl there are seven functions
+which can be used to perform operations on data frame columns:

-- `combine`: creates a new data frame populated with columns that are results of
-transformation applied to the source data frame columns, potentially combining
+- `combine`: creates a new data frame populated with columns that result from
+operations applied to the source data frame columns, potentially combining
 its rows;
 - `select`: creates a new data frame that has the same number of rows as the
-source data frame populated with columns that are results of transformations
+source data frame populated with columns that result from operations
 applied to the source data frame columns;
 - `select!`: the same as `select` but updates the passed data frame in place;
 - `transform`: the same as `select` but keeps the columns that were already
 present in the data frame (note though that these columns can be potentially
 modified by the transformation passed to `transform`);
 - `transform!`: the same as `transform` but updates the passed data frame in
 place.
+- `subset`: creates a new data frame populated with the same columns
+as the source data frame, but with only the rows where the passed operations are true;
+- `subset!`: the same as `subset` but updates the passed data frame in place;

-The fundamental ways to specify a transformation are:
-
-- `source_column => transformation => target_column_name`; In this scenario the
-`source_column` is passed as an argument to `transformation` function and
-stored in `target_column_name` column.
-- `source_column => transformation`; In this scenario we apply the
-transformation function to `source_column` and the target column names is
-automatically generated.
-- `source_column => target_column_name` renames the `source_column` to
-`target_column_name`.
-- `source_column` just keep the source column as is in the result without any
-transformation;
-
-These rules are typically called transformation mini-language.
-
-Let us move to the examples of application of these rules
+These functions and their methods are explained in more detail in the section
+[Data Frame Manipulation Functions](@ref).
+In this section, we will move straight to examples using the German dataset.

 ```jldoctest dataframe
 julia> using Statistics
````

Review comment on the added line "These functions and their methods are explained in more detail in the section": Maybe add that it is not only in more detail, but also with a more "slow paced" and informal approach :).
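To make the seven functions in the list concrete, here is a minimal sketch on a tiny data frame whose values can be checked by hand (the review thread argues for exactly this kind of small example). It assumes DataFrames.jl 1.x, where `subset` and `subset!` exist, and uses `mean` from the Statistics standard library; `df`, `df2`, and the columns `id`, `x`, `y` are hypothetical and are not taken from the German dataset or from this PR.

```julia
using DataFrames
using Statistics  # provides `mean`

# Hypothetical toy data frame, small enough to check every value by hand
df = DataFrame(id = [1, 2, 3, 4], x = [10, 20, 30, 40], y = [1.0, 2.0, 3.0, 4.0])

# combine: the result need not keep the rows of `df`; `mean` reduces
# column :x to a single value, so the result has one row
c = combine(df, :x => mean => :x_mean)

# select: same number of rows as `df`, holding only the requested columns
s = select(df, :id, :x => (v -> v .* 2) => :x2)

# transform: like select, but all existing columns of `df` are kept
t = transform(df, :y => (v -> v .+ 1) => :y1)

# subset: same columns as `df`, only the rows where the condition is true;
# the function receives the whole column and must return a vector of Bools
sub = subset(df, :x => (v -> v .> 15))

# The `!` variants modify their argument in place, so work on a copy
df2 = copy(df)
select!(df2, :id, :x)                        # drop column :y from df2
transform!(df2, :x => (v -> v .- 10) => :x)  # overwrite :x in df2
subset!(df2, :id => (v -> isodd.(v)))        # keep rows with odd :id

# `df` itself is left unchanged by all of the calls above
```

The anonymous functions use explicit broadcasting (`.*`, `.>`) because each function receives a whole column as a vector; wrapping a function in `ByRow` is the alternative when it should see one row at a time.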
````diff
@@ -2161,8 +2151,9 @@ julia> select(german, :Age, :Job, [:Age, :Job] => (+) => :res)
 985 rows omitted
 ```

-In the examples given in this introductory tutorial we did not cover all
-options of the transformation mini-language. More advanced examples, in particular
-showing how to pass or produce multiple columns using the `AsTable` operation
-(which you might have seen in some DataFrames.jl demos) are given in the later
-sections of the manual.
+This concludes the introductory examples of data frame manipulations.
+See later sections of the manual,
+particularly [Data Frame Manipulation Functions](@ref),
+for additional explanations and functionality,
+including how to broadcast operation functions and operation pairs
+and how to pass or produce multiple columns using `AsTable`.
````
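The diff removes the explicit list of mini-language forms (`source_column => transformation => target_column_name` and the rest) and the new conclusion points to broadcasting operation pairs and `AsTable`. The sketch below shows one call per form, assuming DataFrames.jl 1.x; the data frame `df` and every column name in it are hypothetical. `combine` is used for the reductions so that each result plainly has a single row.

```julia
using DataFrames

# Hypothetical toy data frame for illustration
df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])

# source_column => function => target_column_name
r1 = combine(df, :a => sum => :a_sum)

# source_column => function (target name is generated, here "a_sum")
r2 = combine(df, :a => sum)

# source_column => target_column_name (rename)
r3 = select(df, :a => :alpha)

# source_column alone (kept as is)
r4 = select(df, :b)

# ByRow: apply the function to each row; multiple source columns are
# passed as positional arguments
r5 = transform(df, [:a, :b] => ByRow((x, y) -> x * y) => :ab)

# Broadcasting operation pairs with .=> builds one pair per column:
# [:a => f => :a10, :b => f => :b10]
r6 = select(df, [:a, :b] .=> (v -> v .* 10) .=> [:a10, :b10])

# AsTable: pass several columns together; with ByRow each row arrives
# as a NamedTuple
r7 = transform(df, AsTable([:a, :b]) => ByRow(nt -> nt.a + nt.b) => :total)
```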
A larger question - maybe create a separate page for this tutorial?
I think that would be better. Maybe "Manipulation Functions" under "User Guide" before "Split-apply-combine"?
As commented below - I would put it as a "top level" section with a name something along the lines of "A gentle introduction to manipulation functions" (so that we clearly signal that this material is less formal than the rest of the manual).
What would you think of making the new Top Level section something like "Beginner's Guide" or "User's Guide for Beginners" and then placing "Manipulation Functions" at a second level under that? I'm not volunteering to rewrite the entire User's Guide, but it could leave room for others to add similar "gentle" content to the documentation. It would also make the sidebar look cleaner by splitting up the current long name.
Makes sense. The other section that could go there is https://dataframes.juliadata.org/stable/man/basics/ as it has the same objective.
Oh, but Basics contains the existing "Basic Usage of Manipulation Functions". I don't know how to differentiate it from this new section if they live next to each other.
I initially intended to just clarify some topics within that section, but now the scope has grown.
I can maybe overwrite that section if I add these topics that I don't currently cover:
The other sections under Basics use the German dataset, but I think it is easier to understand what is going on with smaller data frames where you know all the data values.
Yes - I think it is OK just to expand that section (especially since it is already top-level now)
Agreed. Just please use different variable names from those already used there, so that using different data frames does not lead to confusion.
Thank you! (sorry for so many comments, but - unfortunately - writing documentation is hard)
Done (sort of). I did not use new variable names, though. I had been overwriting the definition of `df` and continued to do so. My data frames are so small and frequent that coming up with a new name each time would be a pain.