Updated Basic Usage of Manipulation Functions #3360

Open: wants to merge 33 commits into base: main
Changes from 18 commits
33 commits:
5785c72
Initial commit
nathanrboyer Jul 18, 2023
f5cc15f
assumes requested method is added
nathanrboyer Jul 19, 2023
7c3db8a
Typo: missing :z
nathanrboyer Jul 19, 2023
27d7e32
added subset(df, source_column_selector)
nathanrboyer Jul 25, 2023
be5fa9e
Added italics
nathanrboyer Jul 25, 2023
b1b3bab
Moved note to main text
nathanrboyer Aug 2, 2023
da6607d
Added error example and removed ° symbol
nathanrboyer Aug 2, 2023
0c47d10
Moved Additional Resources to the end and cleaned
nathanrboyer Aug 17, 2023
6f5dfc5
Capitalized Boolean
nathanrboyer Aug 17, 2023
e47cbdf
Merge branch 'main' into nb/manipulation_function_basics
nathanrboyer Aug 17, 2023
886d998
Removed extra space character
nathanrboyer Aug 17, 2023
cabd73f
Change function broadcasting to avoid old language
nathanrboyer Sep 18, 2023
2e9d2af
Made consistent with current proposal #3361
nathanrboyer Sep 18, 2023
b0777b1
Change α to apple and make consistent with #3380
nathanrboyer Sep 21, 2023
a111ef8
First round review corrections
nathanrboyer Sep 28, 2023
ce55607
Move to its own section
nathanrboyer Sep 29, 2023
46363d9
Add new file to make and index
nathanrboyer Sep 29, 2023
cd4c539
Rewrite Basics.md conclusion
nathanrboyer Sep 29, 2023
d70af83
Review Edits Round 2
nathanrboyer Oct 2, 2023
e43346e
Merge branch 'main' into nb/manipulation_function_basics
nathanrboyer Oct 2, 2023
6377441
Fix reference?
nathanrboyer Oct 2, 2023
6e7ed84
maybe fix documenter?
nathanrboyer Oct 2, 2023
d2d3de8
make h function require broadcasting
nathanrboyer Oct 5, 2023
0bdfc44
Fix existing typos in basics.md
nathanrboyer Oct 12, 2023
72d87d2
Move back to basics.md and add comparison
nathanrboyer Oct 13, 2023
043605a
Add more comments
nathanrboyer Oct 13, 2023
26b503e
Add link to @with macro
nathanrboyer Oct 13, 2023
679f65f
Delete redundant expression
nathanrboyer Oct 15, 2023
79a1171
Clean up new section and delete with reference
nathanrboyer Oct 16, 2023
621f253
Merge branch 'main' into nb/manipulation_function_basics
nathanrboyer Oct 17, 2023
7614fc3
Merge branch 'main' into nb/manipulation_function_basics
nathanrboyer May 20, 2024
82d935c
Fix admonitions
nathanrboyer May 30, 2024
d9864ba
Fix manipulation function reference
nathanrboyer May 30, 2024
1 change: 1 addition & 0 deletions docs/make.jl
@@ -26,6 +26,7 @@ makedocs(
 "Working with DataFrames" => "man/working_with_dataframes.md",
 "Importing and Exporting Data (I/O)" => "man/importing_and_exporting.md",
 "Joins" => "man/joins.md",
+"Data Frame Manipulation Functions" => "man/manipulation_functions.md",
 "Split-apply-combine" => "man/split_apply_combine.md",
 "Reshaping" => "man/reshaping_and_pivoting.md",
 "Sorting" => "man/sorting.md",
3 changes: 2 additions & 1 deletion docs/src/index.md
@@ -218,6 +218,7 @@ page](https://github.com/JuliaData/DataFrames.jl/releases).
 Pages = ["man/basics.md",
 "man/getting_started.md",
 "man/joins.md",
+"man/manipulation_functions.md",
 "man/split_apply_combine.md",
 "man/reshaping_and_pivoting.md",
 "man/sorting.md",
@@ -277,7 +278,7 @@ missing please kindly report an issue
during which it is deprecated. The situations where such a breaking change
might be allowed are (still such breaking changes will be avoided if
possible):

* the affected functionality was previously clearly identified in the
documentation as being subject to changes (for example in DataFrames.jl 1.4
release propagation rules of `:note`-style metadata are documented as such);
45 changes: 18 additions & 27 deletions docs/src/man/basics.md
@@ -1565,40 +1565,30 @@ julia> german[Not(5), r"S"]
984 rows omitted
```

-## Basic Usage of Transformation Functions
+## Basic Usage of Manipulation Functions
Member

A larger question - maybe create a separate page for this tutorial?

Contributor Author

I think that would be better. Maybe "Manipulation Functions" under "User Guide" before "Split-apply-combine"?

Member

As commented below - I would put it at the "top level" with a name along the lines of "A gentle introduction to manipulation functions" (so that we clearly signal that this material is less formal than the rest of the manual).

Contributor Author

What would you think of making the new Top Level section something like "Beginner's Guide" or "User's Guide for Beginners" and then placing "Manipulation Functions" at a second level under that? I'm not volunteering to rewrite the entire User's Guide, but it could leave room for others to add similar "gentle" content to the documentation. It would also make the sidebar look cleaner by splitting up the current long name.

Member

Makes sense. The other section that could go there is https://dataframes.juliadata.org/stable/man/basics/ as it has the same objective.

Contributor Author

Oh, but Basics contains the existing "Basic Usage of Manipulation Functions". I don't know how to differentiate it from this new section if they live next to each other.

I initially intended to just clarify some topics within that section, but now the scope has grown.

I can maybe overwrite that section if I add these topics that I don't currently cover:

  • "Note that this time we use string column selectors because some of the column names have spaces in them."
  • "The benefit of select or combine over indexing is that it is easier to get the union of several column selectors."
  • "It is important to note that select always returns a data frame, even if a single column selected as opposed to indexing syntax."
  • "By default select copies columns of a passed source data frame. In order to avoid copying, pass the copycols=false keyword argument."

The other sections under Basics use the German dataset, but I think it is easier to understand what is going on with smaller data frames where you know all the data values.
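
For concreteness, the four topics listed above could be illustrated on a small data frame. This is only a sketch, assuming a DataFrames.jl 1.x release; the data frame `df`, its column names, and its values are made up for illustration and are not taken from the PR.

```julia
using DataFrames

# Hypothetical data; one column name contains a space on purpose.
df = DataFrame("Sex" => ["male", "female", "male"],
               "Credit amount" => [1169, 5951, 2096],
               "Age" => [67, 22, 49])

# 1. String column selectors work for names that contain spaces.
amount = select(df, "Credit amount")

# 2. select makes it easy to take the union of several column selectors:
#    the regex picks every column whose name contains "S", and "Age" is added.
sel = select(df, r"S", "Age")

# 3. select always returns a data frame, even for a single column,
#    whereas indexing with a single column returns a vector.
one_col = select(df, :Age)   # 3×1 DataFrame
vec_col = df[:, :Age]        # plain vector (a copy)

# 4. select copies columns by default; copycols=false reuses them.
copied = select(df, :Age)
shared = select(df, :Age; copycols=false)
```

Here `shared.Age === df.Age` holds while `copied.Age === df.Age` does not, which is the copying behavior the last quoted point describes.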

Member

> I can maybe overwrite that section

Yes - I think it is OK just to expand that section (especially since it is already top-level).

> I think it is easier to understand what is going on with smaller data frames

Agreed. Just please use different variable names than those already used there, so that using different data frames does not lead to confusion.

Thank you! (sorry for so many comments, but - unfortunately - writing documentation is hard)

Contributor Author

Done (sort of). I did not use new variable names, though. I had been overwriting the definition of df and continued to do so. My data frames are so small and frequent that coming up with a new name each time would be a pain.


-In DataFrames.jl we have five functions that we can be used to perform
-transformations of columns of a data frame:
+In DataFrames.jl there are seven functions
+which can be used to perform operations on data frame columns:

-- `combine`: creates a new data frame populated with columns that are results of
-transformation applied to the source data frame columns, potentially combining
+- `combine`: creates a new data frame populated with columns that result from
+operations applied to the source data frame columns, potentially combining
 its rows;
 - `select`: creates a new data frame that has the same number of rows as the
-source data frame populated with columns that are results of transformations
+source data frame populated with columns that result from operations
 applied to the source data frame columns;
 - `select!`: the same as `select` but updates the passed data frame in place;
 - `transform`: the same as `select` but keeps the columns that were already
 present in the data frame (note though that these columns can be potentially
 modified by the transformation passed to `transform`);
 - `transform!`: the same as `transform` but updates the passed data frame in
 place.
+- `subset`: creates a new data frame populated with the same columns
+as the source data frame, but with only the rows where the passed operations are true;
+- `subset!`: the same as `subset` but updates the passed data frame in place;

-The fundamental ways to specify a transformation are:
-
-- `source_column => transformation => target_column_name`; In this scenario the
-`source_column` is passed as an argument to `transformation` function and
-stored in `target_column_name` column.
-- `source_column => transformation`; In this scenario we apply the
-transformation function to `source_column` and the target column names is
-automatically generated.
-- `source_column => target_column_name` renames the `source_column` to
-`target_column_name`.
-- `source_column` just keep the source column as is in the result without any
-transformation;
-
-These rules are typically called transformation mini-language.
-
-Let us move to the examples of application of these rules
+These functions and their methods are explained in more detail in the section
Member

Maybe add that it is not only more detailed, but also takes a more "slow-paced" and informal approach :).

+[Data Frame Manipulation Functions](@ref).
+In this section, we will move straight to examples using the German dataset.

```jldoctest dataframe
julia> using Statistics
@@ -2161,8 +2151,9 @@ julia> select(german, :Age, :Job, [:Age, :Job] => (+) => :res)
985 rows omitted
```

-In the examples given in this introductory tutorial we did not cover all
-options of the transformation mini-language. More advanced examples, in particular
-showing how to pass or produce multiple columns using the `AsTable` operation
-(which you might have seen in some DataFrames.jl demos) are given in the later
-sections of the manual.
+This concludes the introductory examples of data frame manipulations.
+See later sections of the manual,
+particularly [Data Frame Manipulation Functions](@ref),
+for additional explanations and functionality,
+including how to broadcast operation functions and operation pairs
+and how to pass or produce multiple columns using `AsTable`.
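
As a rough illustration of what the functions and operation pairs discussed in this diff look like in practice, here is a minimal sketch on a tiny data frame. It assumes a DataFrames.jl 1.x release; the data frame `df`, its columns `a` and `b`, and all target column names are hypothetical and do not come from the PR. It uses only the long-standing `source => function => target` pair form, not any method proposed in this PR.

```julia
using DataFrames
using Statistics

# Hypothetical data frame, small enough to check every value by hand.
df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])

# combine: the result may have fewer rows than the source.
a_mean = combine(df, :a => mean => :a_mean)

# select: same number of rows; here two source columns are passed to `+`.
res = select(df, [:a, :b] => (+) => :res)

# transform: like select, but keeps the existing columns.
squared = transform(df, :a => ByRow(x -> x^2) => :a_sq)

# subset: keeps only the rows where the operation returns true.
big = subset(df, :a => ByRow(>(1)))

# Broadcasting `=>` over a vector of columns builds one operation pair per column.
means = combine(df, [:a, :b] .=> mean)

# AsTable as a source: each row is passed to the function as a NamedTuple.
total = select(df, AsTable([:a, :b]) => ByRow(sum) => :total)

# AsTable as a target: a NamedTuple result is expanded into several columns.
expanded = select(df, :a => ByRow(x -> (half = x / 2, double = 2x)) => AsTable)
```

With the automatic naming rule, `means` has columns `a_mean` and `b_mean`, and `expanded` has columns `half` and `double` taken from the field names of the returned `NamedTuple`.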