
Add dplyr tutorial port, clean up docs so documenter is happy #279

Merged (25 commits), Aug 4, 2021


pdeffebach
Collaborator

No description provided.


## What is DataFramesMeta.jl?

DataFramesMeta.jl is a Julia package to transform and summarize tabular data. It provides a more convenient syntax to work with DataFrames from [DataFrames.jl](https://github.com/JuliaData/DataFrames.jl). For a deeper explanation of DataFramesMeta.jl, see the [documentation](https://github.com/JuliaData/DataFramesMeta.jl).
Member

Maybe add that this is a DSL: the syntax is more convenient, at the cost of not being valid Julia code.

On the other hand, DataFramesMeta.jl concepts try to mirror DataFrames.jl concepts (which I think is important for learning and using both).

Collaborator Author

Should be clearer now.
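To illustrate the point about the DSL mirroring DataFrames.jl, here is a minimal sketch on a hypothetical toy data frame `df` (assuming DataFrames.jl and DataFramesMeta.jl are installed; behavior checked against recent releases, not necessarily the version at the time of this PR). The macro form and the plain DataFrames.jl `source => function => destination` form should produce the same result.

```julia
using DataFrames, DataFramesMeta

# Hypothetical toy data, not from the tutorial
df = DataFrame(x = [1, 2, 3])

# DataFramesMeta.jl macro syntax: a DSL, not ordinary Julia function-call syntax
a = @transform df :y = :x .+ 1

# Equivalent DataFrames.jl call using source => function => destination
b = transform(df, :x => (x -> x .+ 1) => :y)
```

The macro is essentially a more compact way to write the same `transform` call, which is why the verbs map closely onto DataFrames.jl functions.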

docs/src/dplyr.md Outdated

Like dplyr, the DataFramesMeta.jl package contains a set of macros (or "verbs") that perform common data manipulation operations such as filtering for rows, selecting specific columns, re-ordering rows, adding new columns and summarizing data.

In addition, DataFramesMeta.jl contains a useful operation `@combine` to perform another common task which is the "split-apply-combine" concept. We will discuss that in a little bit.
Member

This sentence is not clear to me and seems more detailed than the previous one, especially since the previous one already says "summarizing data".
Also, if you keep it, maybe link to "split-apply-combine" so readers know what we mean (not all of them might know it).

Collaborator Author

Hopefully it is clearer now.
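For readers unfamiliar with "split-apply-combine", a minimal sketch using `groupby` and `@combine` (the data frame and the column names `g`, `x`, `mean_x` are hypothetical; assumes DataFrames.jl, DataFramesMeta.jl, and the `Statistics` standard library):

```julia
using DataFrames, DataFramesMeta, Statistics

# Hypothetical toy data: a grouping column and a value column
df = DataFrame(g = ["a", "a", "b"], x = [1.0, 3.0, 10.0])

# Split: one group per distinct value of :g
gd = groupby(df, :g)

# Apply and combine: compute one summary value per group,
# collected into a new data frame with one row per group
res = @combine gd :mean_x = mean(:x)

# Sort by group so the row order does not depend on groupby internals
sort!(res, :g)
```

Sorting at the end is a defensive choice here, so the result does not rely on the group order that `groupby` happens to produce.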


# Important DataFramesMeta.jl Verbs To Remember

dplyr verbs | Description
Member

why do you call them dplyr verbs?

Collaborator Author

The base tutorial this came from uses the term "verb". I think the author likes the term because it sounds less technical than "function".

Member

I am OK with "verb"; I am not clear why you use the term "dplyr", since these seem to be DataFramesMeta.jl verbs.

Collaborator Author

Oh that was a typo, sorry.

`@combine` | summarise values
`groupby` | allows for group operations in the "split-apply-combine" concept

DataFramesMeta.jl also provides `@rselect`, `@rsubset`, `@rorderby`, and `@rtransform` for operations which act row-wise. We will explore the distinction between column-wise and row-wise transformations later in this tutorial.
Member

Maybe use the term "whole-column" rather than "column-wise"? Alan Edelman was confused by "col-wise" (it sounds as if one operation works vertically and the other horizontally, which is not the case).

Collaborator Author

good point. Hopefully the language is clearer.
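A minimal sketch of the whole-column versus row-wise distinction, on a hypothetical one-column data frame (assumes DataFrames.jl and DataFramesMeta.jl are installed). In the whole-column macro, `:x` stands for the entire vector, so the comparison has to broadcast; in the row-wise `r` variant, `:x` is a single value per row.

```julia
using DataFrames, DataFramesMeta

# Hypothetical toy data
df = DataFrame(x = [1, 2, 3])

# Whole-column: :x is the whole vector, so the comparison is broadcast with .>
whole = @subset df :x .> 1

# Row-wise: :x is one element at a time, so a plain > is used
byrow = @rsubset df :x > 1
```

Both calls keep the same rows; the difference is only in what `:x` refers to inside the expression.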

```julia
sleepData = @select msleep :name :sleep_total
```

To select all the columns *except* a specific column, use the `Not` function for inverse selection. We preface the `Not` with `$` because it does not reference a column directly as a `Symbol`.
Member

The explanation of `$` is not clear: the reader does not know what would happen if the `$` were skipped.

Collaborator Author

Fixing this. But we should merge a PR special-casing Not, Between, Regex, and r"..." so we don't have to worry about this.
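A minimal sketch of inverse selection with `Not` escaped by `$`, as the tutorial text describes (the data frame is a hypothetical stand-in for `msleep`; assumes DataFrames.jl and DataFramesMeta.jl are installed). The `$` tells the macro to pass `Not(:name)` to `select` unchanged instead of parsing it as a column expression; newer releases may also accept `Not` without `$`, per the special-casing suggested above.

```julia
using DataFrames, DataFramesMeta

# Hypothetical stand-in for the msleep data
df = DataFrame(name = ["a", "b"], sleep_total = [12.1, 17.0], genus = ["g1", "g2"])

# $ passes the selector Not(:name) through to DataFrames.select as-is
res = @select df $(Not(:name))
```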

log.txt Outdated
@@ -0,0 +1,11 @@
Doctests: DataFramesMeta: Test Failed at /home/peterwd/.julia/packages/Documenter/oBZFM/src/Documenter.jl:870
Member

I would not put this log file in the PR

src/DataFramesMeta.jl Outdated
@pdeffebach
Collaborator Author

Thanks for the review! Should be much improved now.

```julia
@select msleep $varnames
```

Similarly, to select the first column, use the syntax `$1`.
Member

is $ required here?

Collaborator Author

yes.

Right now, the parsing for selecting columns is exactly the same as for anonymous functions. Since `@transform df :y = :x .+ 1` would be ambiguous if we allowed `1` to be a column selector inside the anonymous function, we need the same rule when doing `@select`.

Not ideal, though. We can change this before 1.0.

Member

I guess it is :).

Member

I think $1 makes sense - it just probably should be well explained somewhere.
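A minimal sketch of the ambiguity described above, on a hypothetical data frame (assumes DataFrames.jl and DataFramesMeta.jl are installed; `col` is an illustrative variable name). Inside the macro, a bare `1` is a literal number, so `$` is what marks a value as a column reference, whether it is a variable holding a column name or an integer position.

```julia
using DataFrames, DataFramesMeta

# Hypothetical toy data
df = DataFrame(x = [1, 2, 3], z = [4, 5, 6])

# A bare 1 is a number: this adds one to every element of :x
a = @transform df :y = :x .+ 1

# $ marks a value as a column reference; here col holds the name :x
col = :x
b = @transform df :y = $col .+ 1

# In @select, $1 refers to the first column by position
c = @select df $1
```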

pdeffebach and others added 2 commits August 4, 2021 09:48
@pdeffebach
Collaborator Author

Thanks!

@pdeffebach pdeffebach merged commit 73bb4c4 into JuliaData:master Aug 4, 2021
@pdeffebach pdeffebach deleted the dplyr_port branch August 4, 2021 14:05

2 participants