# DataFrames and Query

GitHub: https://github.com/JuliaData/DataFrames.jl

Documentation: http://juliadata.github.io/DataFrames.jl/stable/

Great compact DataFrames tutorial: https://github.com/bkamins/Julia-DataFrames-Tutorial

In [2]:
using DataFrames

In [3]:
df = DataFrame(rand(4,4), :auto)  # :auto generates the column names x1, x2, ... (required in DataFrames 1.x)

| Row | x1 | x2 | x3 | x4 |
|-----|----|----|----|----|
| 1 | 0.152351 | 0.291341 | 0.437165 | 0.191181 |
| 2 | 0.840498 | 0.344257 | 0.470642 | 0.25419 |
| 3 | 0.225158 | 0.45405 | 0.899181 | 0.979446 |
| 4 | 0.120921 | 0.62493 | 0.271342 | 0.840114 |


In [6]:
d = Dict(:x => [1,2,3], :y => ["A","B","C"])
df = DataFrame(d)

| Row | x | y |
|-----|---|---|
| 1 | 1 | A |
| 2 | 2 | B |
| 3 | 3 | C |


In [14]:
df = DataFrame(name=["John", "Sally", "Roger"], age=[54, 34, 79], children=[0, 2, 4])

| Row | name | age | children |
|-----|------|-----|----------|
| 1 | John | 54 | 0 |
| 2 | Sally | 34 | 2 |
| 3 | Roger | 79 | 4 |


In [15]:
df.name  # column by name; the older df[:name] form is no longer supported in DataFrames 1.x

3-element Array{String,1}:
 "John" 
 "Sally"
 "Roger"

In [16]:
df[!, 1]  # column by position; the older df[1] form is no longer supported in DataFrames 1.x

3-element Array{String,1}:
 "John" 
 "Sally"
 "Roger"
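
The column lookups above have a few equivalent spellings in current DataFrames.jl. The following is a minimal sketch, assuming DataFrames 1.x, of how `df.name`, `df[!, :name]` and `df[:, :name]` differ. The variable names (`stored`, `same`, `copied`) are only illustrative.

```julia
using DataFrames

df = DataFrame(name=["John", "Sally", "Roger"], age=[54, 34, 79], children=[0, 2, 4])

stored = df.name         # the vector held inside the DataFrame (no copy)
same   = df[!, :name]    # equivalent to df.name
copied = df[:, :name]    # a fresh copy of the column

stored === same          # true: both are the stored column
copied === stored        # false: independent copy

copied[1] = "Johnny"     # modifies only the copy
df.name[1]               # still "John"
```

Using `!` avoids an allocation but lets later mutations reach the DataFrame; `:` is the safer choice when the result will be modified.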

In [17]:
df[1,2]

54

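In [23]:
df[2,:]  # this cell was run after the sort! in In [18], so row 2 is John

| Row | name | age | children |
|-----|------|-----|----------|
| 1 | John | 54 | 0 |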


In [18]:
sort!(df, :age)

| Row | name | age | children |
|-----|------|-----|----------|
| 1 | Sally | 34 | 2 |
| 2 | John | 54 | 0 |
| 3 | Roger | 79 | 4 |
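
`sort!` reorders the DataFrame in place, which is why later cells see the new row order. As a hedged sketch (assuming DataFrames 1.x), the non-mutating `sort` and the `rev` keyword can be used when the original order should be kept. The name `oldest_first` is just for illustration.

```julia
using DataFrames

df = DataFrame(name=["John", "Sally", "Roger"], age=[54, 34, 79], children=[0, 2, 4])

# sort returns a new DataFrame and leaves df untouched
oldest_first = sort(df, :age, rev=true)
oldest_first.name    # Roger, John, Sally
df.name              # John, Sally, Roger (unchanged)

# sort! mutates df itself
sort!(df, :children, rev=true)
df.name              # Roger, Sally, John
```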


## Getting information about a DataFrame

In [20]:
names(df)  # DataFrames 1.x returns Strings here; propertynames(df) gives the Symbols shown below

3-element Array{Symbol,1}:
 :name    
 :age     
 :children

In [21]:
nrow(df)

3

In [19]:
describe(df)

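| Row | variable | mean | min | median | max | nunique | nmissing | eltype |
|-----|----------|------|-----|--------|-----|---------|----------|--------|
| 1 | name | | John | | Sally | 3 | | String |
| 2 | age | 55.6667 | 34 | 54.0 | 79 | | | Int64 |
| 3 | children | 2.0 | 0 | 2.0 | 4 | | | Int64 |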


## Extracting parts of a DataFrame

In [25]:
df[df.children .> 1, :]  # keep the rows where children > 1, with all columns

| Row | name | age | children |
|-----|------|-----|----------|
| 1 | Sally | 34 | 2 |
| 2 | Roger | 79 | 4 |
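
Indexing with a Boolean mask is one way to select rows. A minimal sketch, assuming DataFrames 1.x, of two built-in alternatives (`filter` and `subset`) that should give the same result on this data. The result names are illustrative.

```julia
using DataFrames

df = DataFrame(name=["John", "Sally", "Roger"], age=[54, 34, 79], children=[0, 2, 4])

# Boolean mask over the rows
by_mask = df[df.children .> 1, :]

# filter applies the predicate to each value of :children
by_filter = filter(:children => c -> c > 1, df)

# subset works on whole columns; ByRow lifts the predicate to each row
by_subset = subset(df, :children => ByRow(c -> c > 1))

by_mask == by_filter == by_subset   # true
```

All three return a new DataFrame. `filter!` and `subset!` are the in-place counterparts.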


In [31]:
# for more complex queries
using Query

q = @from i in df begin
    @where i.children > 1
    @select i # select everything
    @collect DataFrame # output type
end

| Row | name | age | children |
|-----|------|-----|----------|
| 1 | Sally | 34 | 2 |
| 2 | Roger | 79 | 4 |


In [32]:
typeof(q)

DataFrame
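
The query above selects whole rows with `@select i`. As a hedged sketch following the pattern in the Query.jl README, `@select` can also project a subset of columns with the `{...}` syntax. It assumes the Query.jl and DataFrames.jl packages are installed, and the name `older` is only illustrative.

```julia
using DataFrames, Query

df = DataFrame(name=["John", "Sally", "Roger"], age=[54, 34, 79], children=[0, 2, 4])

older = @from i in df begin
    @where i.age > 50                 # keep rows with age above 50
    @select {i.name, i.children}      # project two columns only
    @collect DataFrame                # materialize as a DataFrame
end
```

Here `older` should contain the `name` and `children` columns for John and Roger.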