# Introduction to DataFrames
**[Bogumił Kamiński](http://bogumilkaminski.pl/about/), Dec 10, 2017**

A brief introduction to basic usage of `DataFrames`. Tested under `DataFrames` master on 2017-12-05.
I will try to keep it up to date as the package evolves.

In [1]:
using DataFrames

## Possible pitfalls

### Know what is copied when creating a `DataFrame`

In [2]:
x = DataFrame(rand(3, 5))

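| Row | x1       | x2        | x3       | x4        | x5        |
|-----|----------|-----------|----------|-----------|-----------|
| 1   | 0.779853 | 0.0269166 | 0.422229 | 0.616003  | 0.311457  |
| 2   | 0.596416 | 0.188376  | 0.52324  | 0.135808  | 0.0991473 |
| 3   | 0.247002 | 0.920612  | 0.687747 | 0.0768951 | 0.783757  |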


In [3]:
y = DataFrame(x)
x === y # no copying performed

true

In [4]:
y = copy(x)
x === y # not the same object

false

In [5]:
all(x[i] === y[i] for i in 1:ncol(x)) # but the columns are the same objects

true
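
If you need a copy whose columns can be changed without touching the original, one option is `deepcopy` from Julia Base. Below is a minimal sketch, assuming a small illustrative data frame (the column names `a` and `b` are made up for this example):

```julia
using DataFrames

# Hypothetical example data, not taken from the cells above
x = DataFrame(a=[1, 2, 3], b=[4.0, 5.0, 6.0])

y = deepcopy(x)   # copies the DataFrame and every column vector it holds
y[1, :a] = 100    # mutate the copy only

x[1, :a]          # 1, the original is untouched
```

The price is an extra allocation for every column, so this is only worth doing when you really intend to mutate the copy.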

### Do not modify the parent of `GroupedDataFrame`

In [6]:
x = DataFrame(id=repeat([1,2], outer=3), x=1:6)
g = groupby(x, :id)

DataFrames.GroupedDataFrame  2 groups with keys: Symbol[:id]
First Group:
3×2 DataFrames.SubDataFrame{Array{Int64,1}}
│ Row │ id │ x │
├─────┼────┼───┤
│ 1   │ 1  │ 1 │
│ 2   │ 1  │ 3 │
│ 3   │ 1  │ 5 │
⋮
Last Group:
3×2 DataFrames.SubDataFrame{Array{Int64,1}}
│ Row │ id │ x │
├─────┼────┼───┤
│ 1   │ 2  │ 2 │
│ 2   │ 2  │ 4 │
│ 3   │ 2  │ 6 │

In [7]:
x[1:3, 1]=[2,2,2]
g # g is only a view of x, so it is wrong now: the first group contains rows with id 2

DataFrames.GroupedDataFrame  2 groups with keys: Symbol[:id]
First Group:
3×2 DataFrames.SubDataFrame{Array{Int64,1}}
│ Row │ id │ x │
├─────┼────┼───┤
│ 1   │ 2  │ 1 │
│ 2   │ 2  │ 3 │
│ 3   │ 1  │ 5 │
⋮
Last Group:
3×2 DataFrames.SubDataFrame{Array{Int64,1}}
│ Row │ id │ x │
├─────┼────┼───┤
│ 1   │ 2  │ 2 │
│ 2   │ 2  │ 4 │
│ 3   │ 2  │ 6 │
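
One way to stay safe is to finish modifying the parent before calling `groupby`, or simply to call `groupby` again after the change. A minimal sketch, reusing the same data as above:

```julia
using DataFrames

x = DataFrame(id=repeat([1, 2], outer=3), x=1:6)

x[1:3, 1] = [2, 2, 2]   # modify the parent first
g = groupby(x, :id)     # then build (or rebuild) the grouping

# Row counts per group, sorted so the check does not depend on group order
sizes = sort([size(sdf, 1) for sdf in g])   # [1, 5]
```

After regrouping, every group is consistent with the current contents of `x`: one row has `id` equal to 1 and five rows have `id` equal to 2.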

### Remember that you can filter columns of a `DataFrame` using booleans

In [8]:
srand(1)
x = DataFrame(rand(5, 5))

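| Row | x1         | x2       | x3       | x4        | x5        |
|-----|------------|----------|----------|-----------|-----------|
| 1   | 0.236033   | 0.210968 | 0.555751 | 0.209472  | 0.0769509 |
| 2   | 0.346517   | 0.951916 | 0.437108 | 0.251379  | 0.640396  |
| 3   | 0.312707   | 0.999905 | 0.424718 | 0.0203749 | 0.873544  |
| 4   | 0.00790928 | 0.251662 | 0.773223 | 0.287702  | 0.278582  |
| 5   | 0.488613   | 0.986666 | 0.28119  | 0.859512  | 0.751313  |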


In [9]:
x[x[:x1] .< 0.25] # oops: a single boolean index selects columns, not rows (the mask has 5 entries, one per row, and x happens to have 5 columns)

| Row | x1         | x4        |
|-----|------------|-----------|
| 1   | 0.236033   | 0.209472  |
| 2   | 0.346517   | 0.251379  |
| 3   | 0.312707   | 0.0203749 |
| 4   | 0.00790928 | 0.287702  |
| 5   | 0.488613   | 0.859512  |


In [10]:
x[x[:x1] .< 0.25, :] # probably this is what we wanted

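| Row | x1         | x2       | x3       | x4       | x5        |
|-----|------------|----------|----------|----------|-----------|
| 1   | 0.236033   | 0.210968 | 0.555751 | 0.209472 | 0.0769509 |
| 2   | 0.00790928 | 0.251662 | 0.773223 | 0.287702 | 0.278582  |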
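
To make the intent harder to get wrong, it may help to name the mask and always pass both a row index and a column index. A minimal sketch on a small deterministic data frame (the column names `a` and `b` are made up for this example):

```julia
using DataFrames

# Hypothetical example data with 3 rows and 2 columns
x = DataFrame(a=[0.1, 0.5, 0.2], b=[1, 2, 3])

mask = x[:, :a] .< 0.25   # Bool vector with one entry per row
rows = x[mask, :]         # the trailing `:` makes this a row selection

size(rows)                # (2, 2): two matching rows, both columns kept
```

Because this data frame has a different number of rows and columns, the mask cannot be mistaken for a valid column selector.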
