# Introduction to DataFrames
**[Bogumił Kamiński](http://bogumilkaminski.pl/about/), Dec 2, 2017**

A brief introduction to the basic usage of `DataFrames`. Tested under version `0.11`.
I will try to keep it up to date as the package evolves.

In [1]:
using DataFrames # load package

## Constructors

In [2]:
DataFrame() # empty DataFrame

In [3]:
DataFrame(A=1:3, B=rand(3), C=randstring.([3,3,3])) # keyword arguments

|   | A | B        | C   |
|---|---|----------|-----|
| 1 | 1 | 0.130975 | LaD |
| 2 | 2 | 0.540133 | RDe |
| 3 | 3 | 0.949558 | oN9 |


In [4]:
x = Dict("A" => [1,2], "B" => [true, false], "C" => ['a', 'b'])
DataFrame(x) # from a dictionary; columns are sorted by name

|   | A | B     | C   |
|---|---|-------|-----|
| 1 | 1 | true  | 'a' |
| 2 | 2 | false | 'b' |


In [5]:
DataFrame(:A => [1,2], :B => [true, false], :C => ['a', 'b']) # from pairs

|   | A | B     | C   |
|---|---|-------|-----|
| 1 | 1 | true  | 'a' |
| 2 | 2 | false | 'b' |


In [6]:
DataFrame([rand(3) for i in 1:3]) # from vector of vectors

|   | x1       | x2       | x3        |
|---|----------|----------|-----------|
| 1 | 0.466684 | 0.129099 | 0.0595383 |
| 2 | 0.487316 | 0.577872 | 0.742604  |
| 3 | 0.703949 | 0.324203 | 0.646678  |


In [7]:
DataFrame(rand(3)) # edge case: a vector of atoms gives one row with one column per element

|   | x1       | x2       | x3       |
|---|----------|----------|----------|
| 1 | 0.711074 | 0.165517 | 0.517115 |


In [8]:
DataFrame(rand(3), [:A, :B, :C]) # pass second argument to give column names

|   | A        | B        | C        |
|---|----------|----------|----------|
| 1 | 0.281381 | 0.474771 | 0.195761 |


In [9]:
DataFrame(rand(3,4)) # from matrix

|   | x1       | x2       | x3        | x4        |
|---|----------|----------|-----------|-----------|
| 1 | 0.617417 | 0.349705 | 0.372344  | 0.336678  |
| 2 | 0.464459 | 0.185916 | 0.495102  | 0.342605  |
| 3 | 0.484482 | 0.342165 | 0.0344403 | 0.0393361 |


In [10]:
DataFrame([Int, Float64, Any], [:A, :B, :C], 1) # pass column types, names and number of rows
# column C holds missing because Any >: Missing; columns A and B hold uninitialized (arbitrary) values

|   | A         | B           | C       |
|---|-----------|-------------|---------|
| 1 | 435792112 | 2.1531e-315 | missing |
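
The comment in the cell above relies on Julia's supertype operator `>:`. The following is a minimal sketch of that check, assuming Julia 1.0 or later (where `Missing` is defined in `Base`). The variable names are illustrative and do not come from the original notebook.

```julia
# Assumption: Julia 1.0 or later, where `Missing` is defined in Base.
# `T >: Missing` asks whether a container with element type T can store `missing`.
any_ok   = Any >: Missing                   # Any is a supertype of every type
int_ok   = Int >: Missing                   # a plain Int element type cannot hold missing
union_ok = Union{Int, Missing} >: Missing   # a Union that includes Missing can

println((any_ok, int_ok, union_ok))         # prints (true, false, true)
```

This is consistent with the output above, where only the `Any` column shows `missing`.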


In [11]:
DataFrame([Int, Float64, String], [:A, :B, :C], 1)
# the DataFrame is created correctly, but the String value is #undef, so Jupyter fails to print it

UndefRefError: access to undefined reference

In [12]:
DataFrame([Int, Float64, String], [:A, :B, :C], 0) # columns are created, but there are no rows

|   | A | B | C |
|---|---|---|---|


In [13]:
DataFrame(Int, 3, 5) # a quick way to create a homogeneous DataFrame (values are uninitialized)

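|   | x1        | x2        | x3        | x4        | x5        |
|---|-----------|-----------|-----------|-----------|-----------|
| 1 | 423316176 | 118872752 | 118872752 | 111204552 | 176622384 |
| 2 | 117575344 | 117575344 | 117575344 | 110284120 | 117604560 |
| 3 | 423280368 | 423280368 | 423280368 | 110253960 | 117604624 |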


In [14]:
DataFrame([Int, Float64], 4) # similar, but with columns of different types

|   | x1        | x2           |
|---|-----------|--------------|
| 1 | 118261360 | 5.4467e-316  |
| 2 | 117573520 | 5.44672e-316 |
| 3 | 117573520 | 5.44732e-316 |
| 4 | 111883576 | 6.75368e-316 |


In [15]:
x = DataFrame(A = [1, 2], B = [1.0, missing], C = ["a", "b"], D = [1, "a"])
convert(Array, x) # convert DataFrame to Matrix

2×4 Array{Any,2}:
 1  1.0       "a"  1   
 2   missing  "b"   "a"

In [16]:
y = DataFrame(x) # no change: returns the same object
z = copy(x) # shallow copy: a new DataFrame object
(x === y), (x === z), isequal(x, z)

(true, false, true)
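
The tuple above distinguishes identity (`===`) from equality (`isequal`). As a rough analogy using only Base Julia, the sketch below shows how a shallow copy creates a new container that still shares its elements. The names `cols`, `same`, and `shallow` are illustrative, and a vector of vectors merely stands in for a data frame's columns.

```julia
# Analogy only: a plain vector of vectors stands in for a DataFrame's columns.
cols = [[1, 2], [3, 4]]
same = cols            # a second name for the same object
shallow = copy(cols)   # new outer vector, same inner vectors

println((cols === same, cols === shallow, isequal(cols, shallow)))  # prints (true, false, true)
println(shallow[1] === cols[1])   # prints true: the inner vectors are shared

shallow[1][1] = 100    # mutating a shared inner vector...
println(cols[1][1])    # ...is visible through the original: prints 100
```

Whether `copy` on a `DataFrame` shares its column vectors in this way may depend on the package version. The comment in the cell above describes version `0.11`.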