# Introduction to DataFrames

In [1]:
using DataFrames # load package

## Getting basic information about a data frame

Let's start by creating a `DataFrame` object, `x`, so that we can learn how to get information on that data frame.

In [2]:
x = DataFrame(A = [1, 2], B = [1.0, missing], C = ["a", "b"])

| Row | A | B | C |
|-----|---|---|---|
| | Int64 | Float64⍰ | String |
| 1 | 1 | 1.0 | a |
| 2 | 2 | missing | b |


The standard `size` function returns the dimensions of the `DataFrame`,

In [3]:
size(x), size(x, 1), size(x, 2)

((2, 3), 2, 3)

as do `nrow` and `ncol`, which are named after their R counterparts.

In [4]:
nrow(x), ncol(x)

(2, 3)

`describe` gives basic summary statistics of data in your `DataFrame`.

In [5]:
describe(x)

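| Row | variable | mean | min | median | max | nunique | nmissing | eltype |
|-----|----------|------|-----|--------|-----|---------|----------|--------|
| | Symbol | Union… | Any | Union… | Any | Union… | Union… | Type |
| 1 | A | 1.5 | 1 | 1.5 | 2 | | | Int64 |
| 2 | B | 1.0 | 1.0 | 1.0 | 1.0 | | 1 | Union{Missing, Float64} |
| 3 | C | | a | | b | 2 | | String |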
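When only a few statistics are needed, `describe` can be restricted to them. The following is a minimal sketch, assuming DataFrames.jl 1.x, where the requested statistics are passed as positional `Symbol` arguments; `df` and `stats` are hypothetical names introduced for illustration.

```julia
using DataFrames

# Same data as `x` above, rebuilt under a hypothetical name
df = DataFrame(A = [1, 2], B = [1.0, missing], C = ["a", "b"])

# Request only the mean and the number of missing values for each column
stats = describe(df, :mean, :nmissing)

# Missing values are skipped when the mean of B is computed
@assert stats.mean[2] == 1.0
@assert stats.nmissing[2] == 1
```

The result is itself a `DataFrame`, so its columns can be inspected like any other.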


`names` will return the names of all columns,

In [6]:
names(x)

3-element Array{Symbol,1}:
 :A
 :B
 :C

and `eltypes` returns their types.

In [7]:
eltypes(x)

3-element Array{Type,1}:
 Int64                  
 Union{Missing, Float64}
 String                 
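The output above comes from an older release of DataFrames.jl. The sketch below assumes the 1.x series, where `names` returns strings rather than symbols and `eltypes` is no longer available; it uses `propertynames` and `eltype` broadcast over `eachcol` as equivalents. The name `df` is hypothetical.

```julia
using DataFrames

# Same data as `x` above, rebuilt under a hypothetical name
df = DataFrame(A = [1, 2], B = [1.0, missing], C = ["a", "b"])

col_strings = names(df)           # column names as strings (1.x behaviour)
col_symbols = propertynames(df)   # column names as symbols
col_types = eltype.(eachcol(df))  # element type of every column

@assert col_symbols == [:A, :B, :C]
```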

Here we create a larger `DataFrame`

In [8]:
y = DataFrame(rand(1:10, 1000, 10));

and then we can use `first` to peek into its top rows

In [9]:
first(y, 5)

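| Row | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 |
|-----|----|----|----|----|----|----|----|----|----|-----|
| | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 |
| 1 | 3 | 2 | 2 | 6 | 9 | 3 | 7 | 3 | 2 | 1 |
| 2 | 2 | 10 | 8 | 6 | 10 | 1 | 5 | 9 | 2 | 5 |
| 3 | 10 | 2 | 2 | 2 | 9 | 4 | 7 | 9 | 5 | 3 |
| 4 | 5 | 1 | 5 | 2 | 8 | 7 | 9 | 1 | 9 | 2 |
| 5 | 9 | 9 | 7 | 7 | 1 | 7 | 2 | 4 | 6 | 9 |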


and `last` to see its bottom rows.

In [10]:
last(y, 3)

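| Row | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 |
|-----|----|----|----|----|----|----|----|----|----|-----|
| | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 |
| 1 | 7 | 4 | 9 | 7 | 9 | 2 | 8 | 4 | 5 | 6 |
| 2 | 5 | 2 | 5 | 4 | 8 | 8 | 8 | 4 | 8 | 2 |
| 3 | 10 | 9 | 8 | 9 | 3 | 10 | 9 | 1 | 5 | 3 |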
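In DataFrames.jl 1.x, building a data frame from a matrix requires either explicit column names or the `:auto` argument. A minimal sketch under that assumption, with `m`, `df`, `top`, and `bottom` as hypothetical names:

```julia
using DataFrames

m = rand(1:10, 1000, 10)   # 1000×10 matrix of random integers between 1 and 10
df = DataFrame(m, :auto)   # :auto generates the column names x1, x2, ..., x10

top = first(df, 5)     # first five rows
bottom = last(df, 3)   # last three rows

@assert size(top) == (5, 10)
@assert size(bottom) == (3, 10)
```

Because the data are random, the values differ from run to run; only the shapes are fixed.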


### Most elementary get and set operations

Given the `DataFrame`, `x`, here are three ways to grab one of its columns as a `Vector`:

In [11]:
x[:, :A], x[:, 1], x.A

([1, 2], [1, 2], [1, 2])
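The three forms return equal vectors, but they differ in whether they copy the data. The sketch below assumes DataFrames.jl 1.x, where `df[:, :A]` returns a fresh copy while `df.A` and `df[!, :A]` return the stored column itself; `df` and the variable names are hypothetical.

```julia
using DataFrames

# Same data as `x` above, rebuilt under a hypothetical name
df = DataFrame(A = [1, 2], B = [1.0, missing], C = ["a", "b"])

copied = df[:, :A]       # a new vector with the same values
stored = df.A            # the column held by the data frame, no copy
also_stored = df[!, :A]  # same column again, also without copying

@assert copied == stored
@assert copied !== stored
@assert stored === also_stored

# Mutating the copy leaves the data frame untouched
copied[1] = 100
@assert df.A == [1, 2]
```

Use the copying form when the vector will be modified independently of the data frame.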

To grab one row, we can index with a row number and a colon. In recent versions of DataFrames.jl this returns a `DataFrameRow` rather than a `DataFrame`.

In [12]:
x[1, :]

| Row | A | B | C |
|-----|---|---|---|
| | Int64 | Float64⍰ | String |
| 1 | 1 | 1.0 | a |
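The difference between a row view and a one-row data frame can be sketched as follows, assuming DataFrames.jl 1.x. Indexing with a single row number yields a `DataFrameRow` that refers to the parent, whereas a one-element range yields a separate `DataFrame`. The names `df`, `row_view`, and `row_frame` are hypothetical.

```julia
using DataFrames

# Same data as `x` above, rebuilt under a hypothetical name
df = DataFrame(A = [1, 2], B = [1.0, missing], C = ["a", "b"])

row_view = df[1, :]      # DataFrameRow: a view of row 1
row_frame = df[1:1, :]   # DataFrame with a single row (a copy)

@assert row_view isa DataFrameRow
@assert row_frame isa DataFrame

# Writing through the view changes the parent data frame
row_view.A = 10
@assert df[1, :A] == 10
@assert row_frame[1, :A] == 1   # the copy is unaffected
```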


We can grab a single cell with the same syntax used to index an element of an array.

In [13]:
x[1, 1]

1

A range of cells can be assigned a scalar using broadcasting,

In [14]:
x[1:2, 1:2] .= 1
x

| Row | A | B | C |
|-----|---|---|---|
| | Int64 | Float64⍰ | String |
| 1 | 1 | 1.0 | a |
| 2 | 1 | 1.0 | b |


a vector whose length equals the number of assigned rows,

In [15]:
x[1:2, 1:2] .= [3,4]
x

| Row | A | B | C |
|-----|---|---|---|
| | Int64 | Float64⍰ | String |
| 1 | 3 | 3.0 | a |
| 2 | 4 | 4.0 | b |


or another data frame of matching size.

In [16]:
x[1:2, 1:2] = DataFrame([5 6; 7 8], [:A, :B])
x

| Row | A | B | C |
|-----|---|---|---|
| | Int64 | Float64⍰ | String |
| 1 | 5 | 6.0 | a |
| 2 | 7 | 8.0 | b |
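These assignments write into the existing columns, so each assigned value must be convertible to the element type of its column. That is why the integers above appear as `6.0` and `8.0` in column `B`. A minimal sketch of this constraint, assuming DataFrames.jl 1.x and a hypothetical `df`:

```julia
using DataFrames

# Same data as the original `x`, rebuilt under a hypothetical name
df = DataFrame(A = [1, 2], B = [1.0, missing], C = ["a", "b"])

# The Int 1 is converted to 1.0 when stored in column B
df[1:2, 1:2] .= 1
@assert df.A == [1, 1]
@assert df.B == [1.0, 1.0]

# 1.5 cannot be stored in the Int column A, so the in-place assignment fails
failed = try
    df[1:2, :A] .= 1.5
    false
catch err
    err isa InexactError
end
@assert failed
```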
