# Introduction to DataFrames
**[Bogumił Kamiński](http://bogumilkaminski.pl/about/), Dec 18, 2017**

A brief introduction to the basic usage of `DataFrames`, tested under `DataFrames` master on 2017-12-05.
I will try to keep it up to date as the package evolves.

In [1]:
using DataFrames

## Extras: selected functionality from selected packages

### FreqTables: creating categorical tables

In [2]:
using FreqTables
df = DataFrame(a=rand('a':'d', 1000), b=rand(["x", "y", "z"], 1000))
ft = freqtable(df, :a, :b)

4×3 Named Array{Int64,2}
a ╲ b │   x    y    z
──────┼──────────────
'a'   │  76  106   84
'b'   │  71   85   88
'c'   │  64   67   98
'd'   │  93   68  100

In [3]:
ft[1,1], ft['b', "z"] # you can index the result using numbers or names

(76, 88)
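
To make explicit what each cell of such a table holds, the same kind of count can be reproduced by hand. This is only a sketch in plain Julia on a small made-up data set rather than the random data above. The names `avals`, `bvals` and `counts` are hypothetical, and this is not how `freqtable` is implemented.

```julia
# Hand-rolled cross tabulation of two equally long vectors (illustrative data).
avals = ['a', 'b', 'a', 'c', 'b', 'a']
bvals = ["x", "y", "x", "z", "y", "z"]

# Map each (a, b) pair to the number of times it occurs.
counts = Dict{Tuple{Char,String},Int}()
for pair in zip(avals, bvals)
    counts[pair] = get(counts, pair, 0) + 1
end

# Look a cell up by its names; pairs that never occur count as zero.
n_ax = counts[('a', "x")]
n_cx = get(counts, ('c', "x"), 0)
```

The named lookup `counts[('a', "x")]` plays the role of `ft['a', "x"]`. The positional form `ft[1,1]` has no counterpart in a `Dict`, which is one reason a named array is the more convenient container.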

### DataFramesMeta

In [4]:
using DataFramesMeta
df = DataFrame(x=1:8, y='a':'h', z=repeat([true,false], outer=4))

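| Row | x | y   | z     |
|-----|---|-----|-------|
| 1   | 1 | 'a' | true  |
| 2   | 2 | 'b' | false |
| 3   | 3 | 'c' | true  |
| 4   | 4 | 'd' | false |
| 5   | 5 | 'e' | true  |
| 6   | 6 | 'f' | false |
| 7   | 7 | 'g' | true  |
| 8   | 8 | 'h' | false |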


In [5]:
@with(df, :x+:z) # evaluate an expression using the columns of a DataFrame

8-element Array{Int64,1}:
 2
 2
 4
 4
 6
 6
 8
 8

In [6]:
@with df begin # you can define code blocks
    a = :x[:z]
    b = :x[.!:z]
    :y + [a; b]
end

8-element Array{Char,1}:
 'b'
 'e'
 'h'
 'k'
 'g'
 'j'
 'm'
 'p'

In [7]:
a # @with creates a hard scope, so variables do not leak out

LoadError: UndefVarError: a not defined
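
One way to think about this scoping: according to the DataFramesMeta documentation, `@with` wraps its body in a function whose arguments are the referenced columns. The sketch below is a rough hand-written counterpart of the `@with` block, not the actual macro expansion. It assumes a recent DataFrames release in which columns are accessed as `df.x` (on the 2017 master used here that would be `df[:x]`), and `body` is a hypothetical name.

```julia
using DataFrames

df = DataFrame(x=1:8, y='a':'h', z=repeat([true, false], outer=4))

# Roughly what the @with block does: the columns become function arguments.
# (Assumption: recent DataFrames, where df.x returns the column vector.)
function body(x, y, z)
    a = x[z]            # values of x in rows where z is true
    b = x[.!z]          # values of x in rows where z is false
    return y .+ [a; b]  # shift each Char by the matching integer
end

result = body(df.x, df.y, df.z)

# a and b were local to the function, so no global a exists afterwards.
a_leaked = @isdefined(a)
```

Because `a` and `b` only exist inside the function body, referring to `a` afterwards fails in the same way as in the cell above.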

In [8]:
df2 = DataFrame(a = [:a, :b, :c])
@with(df2, :a .== ^(:a)) # to work with a raw Symbol instead of a column, escape it with ^()

3-element BitArray{1}:
  true
 false
 false

In [9]:
df2 = DataFrame(x=1:3, y=4:6, z=7:9)
@with(df2, _I_(2:3)) # _I_(expression) is translated to df2[expression]

| Row | y | z |
|-----|---|---|
| 1   | 4 | 7 |
| 2   | 5 | 8 |
| 3   | 6 | 9 |
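
The indexing that `_I_` stands for can also be written directly. One assumption to flag: recent DataFrames releases require both a row and a column index, so the single-index form `df2[2:3]` from the 2017 API is written as `df2[:, 2:3]` in this sketch.

```julia
using DataFrames

df2 = DataFrame(x=1:3, y=4:6, z=7:9)

# Select all rows and the second and third columns
# (assumption: recent DataFrames, which needs the explicit row index).
sub = df2[:, 2:3]
```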


In [10]:
@where(df, :x .< 4, :z .== true) # very useful macro for filtering

| Row | x | y   | z    |
|-----|---|-----|------|
| 1   | 1 | 'a' | true |
| 2   | 3 | 'c' | true |
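
The conditions passed to `@where` are combined with a logical AND: only rows satisfying all of them are kept. A sketch of the same filter with plain indexing, assuming a recent DataFrames release (`df.x` column access and `df[rows, :]` indexing); `keep` and `filtered` are hypothetical names.

```julia
using DataFrames

df = DataFrame(x=1:8, y='a':'h', z=repeat([true, false], outer=4))

# Combine the conditions elementwise. The parentheses matter because
# .& binds more tightly than the comparison.
keep = (df.x .< 4) .& df.z

# Keep the selected rows and all columns.
filtered = df[keep, :]
```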


In [11]:
@select(df, :x, y = 2*:x, z=:y) # create a new DataFrame based on the old one

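| Row | x | y  | z   |
|-----|---|----|-----|
| 1   | 1 | 2  | 'a' |
| 2   | 2 | 4  | 'b' |
| 3   | 3 | 6  | 'c' |
| 4   | 4 | 8  | 'd' |
| 5   | 5 | 10 | 'e' |
| 6   | 6 | 12 | 'f' |
| 7   | 7 | 14 | 'g' |
| 8   | 8 | 16 | 'h' |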
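
Note that the new `z` is built from the old `y`, and the new `y` from the old `x`: every right-hand side refers to the original columns. A sketch of the same result using only the `DataFrame` constructor, assuming the `df.x` column syntax of recent DataFrames releases; `selected` is a hypothetical name.

```julia
using DataFrames

df = DataFrame(x=1:8, y='a':'h', z=repeat([true, false], outer=4))

# Build the result column by column from the original frame;
# the original z column is simply not carried over.
selected = DataFrame(x = df.x, y = 2 .* df.x, z = df.y)
```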


In [12]:
@transform(df, a=1, x = 2*:x, y=:x) # create a new DataFrame adding columns based on the old one

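| Row | x  | y | z     | a |
|-----|----|---|-------|---|
| 1   | 2  | 1 | true  | 1 |
| 2   | 4  | 2 | false | 1 |
| 3   | 6  | 3 | true  | 1 |
| 4   | 8  | 4 | false | 1 |
| 5   | 10 | 5 | true  | 1 |
| 6   | 12 | 6 | false | 1 |
| 7   | 14 | 7 | true  | 1 |
| 8   | 16 | 8 | false | 1 |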


In [13]:
@transform(df, a=1, b=:a) # columns are looked up in the old DataFrame, where :a does not exist

LoadError: KeyError: key :a not found
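
Since every right-hand side is evaluated against the original frame, a column that depends on another new column has to be added in a second step. The sketch below shows that two-step idea with plain DataFrames operations rather than the macro. It assumes a recent DataFrames release (`df.a = ...` to add a column), and `step1` and `step2` are hypothetical names.

```julia
using DataFrames

df = DataFrame(x=1:8, y='a':'h', z=repeat([true, false], outer=4))

# Step 1: add the constant column to a copy, leaving df itself untouched.
step1 = copy(df)
step1.a = fill(1, nrow(df))

# Step 2: :a now exists in step1, so a column derived from it can be added.
step2 = copy(step1)
step2.b = copy(step2.a)
```

With the macro, the corresponding approach would be two consecutive `@transform` calls, the second one applied to the result of the first.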

WIP: @by, grouping, sorting