@groupby on a function of a field #27

Open
zygmuntszpak opened this issue Aug 27, 2018 · 2 comments

@zygmuntszpak

I would like to be able to group on the result of applying a function to one of the columns.
For example, suppose that I have a column :DATETIME which stores the year/month/day/h/m/s.
In some queries I might want to group on the DATE only, whereas in other queries I might want to group
on the TIME.

Hence, I would like to write something like this:

@apply t begin
    @groupby Dates.Date.(:DATETIME) {length = length(_)}
end

Is this kind of operation already supported, and I am just using the wrong syntax? As a workaround, I could always use @transform to add columns that explicitly split DATETIME into DATE and TIME, but I was wondering whether there is another solution.
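For comparison, here is a minimal plain-Julia sketch of the idea, independent of JuliaDB: the grouping key is computed by a function applied to each DATETIME value, so switching between grouping by date and grouping by time only means swapping the key function. It uses only the Dates standard library; the sample vector `datetimes` and the helper `countby` are hypothetical, for illustration only.

```julia
using Dates

# Hypothetical sample data standing in for the :DATETIME column.
datetimes = [
    DateTime(2018, 8, 27, 9, 0, 0),
    DateTime(2018, 8, 27, 17, 30, 0),
    DateTime(2018, 8, 28, 9, 0, 0),
]

# Hypothetical helper: count how many values map to each key under `keyfun`.
function countby(keyfun, xs)
    counts = Dict{Any,Int}()
    for x in xs
        k = keyfun(x)                      # grouping key computed from the value
        counts[k] = get(counts, k, 0) + 1  # increment the count for that key
    end
    return counts
end

bydate = countby(Date, datetimes)  # keys are Date values (time of day dropped)
bytime = countby(Time, datetimes)  # keys are Time values (calendar date dropped)
```

Here `bydate` has two keys (2018-08-27 with 2 rows, 2018-08-28 with 1 row) and `bytime` has two keys (09:00 with 2 rows, 17:30 with 1 row), with no extra columns added to the data.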

@piever
Collaborator

piever commented Aug 27, 2018

I think that I should simply allow the syntax you're using (actually, I think it would have to be Dates.Date(:DATETIME) without the dot, since the expression is already applied element-wise). JuliaDB supports using a selection in a groupby function, and in JuliaDBMeta you can use the @=> macro to build the selection more easily:

julia> using JuliaDBMeta

julia> iris = loadtable(Pkg.dir("JuliaDBMeta", "test", "tables", "iris.csv"));

help?> @=>
  @=>(expr...)

  Create a selector based on expressions expr. Symbols are used to select columns and infer an
  appropriate anonymous function. In this context, _ refers to the whole row. To use actual symbols,
  escape them with ^, as in ^(:a). Use cols(c) to refer to field c where c is a variable that
  evaluates to a symbol. c must be available in the scope where the macro is called.

     Examples
    ==========

  julia> t = table(@NT(a = [1,2,3], b = [4,5,6]));
  
  julia> select(t, @=>(:a, :a + :b))
  Table with 3 rows, 2 columns:
  a  a + b
  ────────
  1  5
  2  7
  3  9

julia> select(iris, @=>(:Species == "setosa"))
150-element Array{Bool,1}:
  true
  true
  true
  true
  true
  true
  true
  true
  true
  true
  ⋮
 false
 false
 false
 false
 false
 false
 false
 false
 false

julia> @groupby iris @=>(:Species=="setosa") {length = length(_)}
Table with 2 rows, 2 columns:
Species == "setosa"  length
───────────────────────────
false                100
true                 50

Note that the @=> macro is not specific to JuliaDBMeta's macros: you can use it with plain JuliaDB functions as well:

julia> groupby(length, iris, @=>(:Species=="setosa"))
Table with 2 rows, 2 columns:
Species == "setosa"  length
───────────────────────────
false                100
true                 50
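As a rough plain-Julia analogue (not JuliaDB's actual implementation), the selector can be thought of as a function of each row, and grouping counts the rows that share each value of that function. The `rows` vector of named tuples below is hypothetical sample data, not the real iris table, and `issetosa` is a hypothetical stand-in for `@=>(:Species == "setosa")`.

```julia
# Hypothetical rows standing in for a few lines of iris.csv.
rows = [
    (Species = "setosa", SepalLength = 5.1),
    (Species = "versicolor", SepalLength = 7.0),
    (Species = "virginica", SepalLength = 6.3),
    (Species = "setosa", SepalLength = 4.9),
]

# Key function playing the role of the selector @=>(:Species == "setosa").
issetosa(row) = row.Species == "setosa"

# Count rows per key value.
counts = Dict{Bool,Int}()
for row in rows
    k = issetosa(row)
    counts[k] = get(counts, k, 0) + 1
end

# Sort by key so `false` comes before `true`, like the grouped table above.
grouped = sort(collect(counts); by = first)
```

With this sample data, `grouped` is `[false => 2, true => 2]`.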

@zygmuntszpak
Author

Thank you very much for the clarification, and for this great package. For anyone skimming this thread, the following solves my example:

@apply t begin
    @groupby @=>(Dates.Date(:DATETIME)) {length = length(_)}
end
