# Introduction to DataFrames
**[Bogumił Kamiński](http://bogumilkaminski.pl/about/), Apr 21, 2018**

In [1]:
using DataFrames # load package

## Getting basic information about a data frame

Let's start by creating a `DataFrame` object, `x`, so that we can learn how to get information on that data frame.

In [2]:
x = DataFrame(A = [1, 2], B = [1.0, missing], C = ["a", "b"])

| Row | A | B | C |
|-----|---|---|---|
| 1 | 1 | 1.0 | a |
| 2 | 2 | missing | b |


The standard `size` function works to get dimensions of the `DataFrame`,

In [3]:
size(x), size(x, 1), size(x, 2)

((2, 3), 2, 3)

as do `nrow` and `ncol`, familiar from R; `length` gives the number of columns.

In [4]:
nrow(x), ncol(x), length(x)

(2, 3, 3)
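As a small sanity check, the sketch below confirms that `size` agrees with `nrow` and `ncol`. The variable names `df` and `dims` are illustrative only and are not part of the original notebook.

```julia
using DataFrames

# Same example frame as above: two rows, three columns.
df = DataFrame(A = [1, 2], B = [1.0, missing], C = ["a", "b"])

# size returns (rows, columns); nrow and ncol return each dimension separately.
dims = size(df)
println(dims == (nrow(df), ncol(df)))
```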

`describe` gives basic summary statistics of data in your `DataFrame`.

In [5]:
describe(x)

A
Summary Stats:
Mean:           1.500000
Minimum:        1.000000
1st Quartile:   1.250000
Median:         1.500000
3rd Quartile:   1.750000
Maximum:        2.000000
Length:         2
Type:           Int64

B
Summary Stats:
Mean:           1.000000
Minimum:        1.000000
1st Quartile:   1.000000
Median:         1.000000
3rd Quartile:   1.000000
Maximum:        1.000000
Length:         2
Type:           Union{Float64, Missings.Missing}
Number Missing: 1
% Missing:      50.000000

C
Summary Stats:
Length:         2
Type:           String
Number Unique:  2



Use `showcols` to get information about the columns stored in a `DataFrame`.

In [6]:
showcols(x)

2×3 DataFrames.DataFrame
│ Col # │ Name │ Eltype                           │ Missing │ Values          │
├───────┼──────┼──────────────────────────────────┼─────────┼─────────────────┤
│ 1     │ A    │ Int64                            │ 0       │ 1  …  2         │
│ 2     │ B    │ Union{Float64, Missings.Missing} │ 1       │ 1.0  …  missing │
│ 3     │ C    │ String                           │ 0       │ a  …  b         │

`names` will return the names of all columns,

In [7]:
names(x)

3-element Array{Symbol,1}:
 :A
 :B
 :C

and `eltypes` returns their types.

In [8]:
eltypes(x)

3-element Array{Type,1}:
 Int64                           
 Union{Float64, Missings.Missing}
 String                          
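The element types and missing counts reported above can also be collected by hand, one column at a time. This is only a sketch that relies on plain indexing plus `eltype`, `count`, and `ismissing`; the names `df`, `col_types`, and `n_missing` are made up for illustration.

```julia
using DataFrames

df = DataFrame(A = [1, 2], B = [1.0, missing], C = ["a", "b"])

# Element type of each column, taken one column at a time.
col_types = [eltype(df[:, i]) for i in 1:ncol(df)]

# Number of missing values in each column.
n_missing = [count(ismissing, df[:, i]) for i in 1:ncol(df)]

println(col_types)
println(n_missing)
```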

Here we create a larger `DataFrame`

In [9]:
y = DataFrame(rand(1:10, 1000, 10));

and then we can use `head` to peek into its top rows

In [10]:
head(y)

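| Row | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 |
|-----|----|----|----|----|----|----|----|----|----|-----|
| 1 | 8 | 6 | 1 | 2 | 7 | 10 | 5 | 1 | 5 | 10 |
| 2 | 8 | 9 | 6 | 6 | 10 | 4 | 9 | 3 | 10 | 9 |
| 3 | 5 | 1 | 4 | 3 | 10 | 5 | 1 | 10 | 5 | 9 |
| 4 | 2 | 9 | 2 | 2 | 5 | 7 | 7 | 9 | 9 | 5 |
| 5 | 4 | 8 | 4 | 10 | 8 | 5 | 1 | 2 | 1 | 10 |
| 6 | 8 | 6 | 6 | 8 | 3 | 3 | 3 | 6 | 8 | 6 |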


and `tail` to see its bottom rows.

In [11]:
tail(y, 3)

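| Row | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 |
|-----|----|----|----|----|----|----|----|----|----|-----|
| 1 | 1 | 10 | 5 | 7 | 8 | 6 | 1 | 2 | 3 | 6 |
| 2 | 1 | 1 | 2 | 7 | 9 | 7 | 3 | 3 | 3 | 3 |
| 3 | 4 | 6 | 1 | 2 | 1 | 1 | 4 | 7 | 9 | 4 |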
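Peeking at the first or last rows can also be written with ordinary row-range indexing. The sketch below uses a small deterministic frame instead of random data so the result is easy to check; `df`, `top`, and `bottom` are illustrative names, not part of the notebook.

```julia
using DataFrames

# Ten rows: an id column and its squares.
df = DataFrame(id = collect(1:10), sq = collect(1:10) .^ 2)

# First three rows, selected by an explicit row range.
top = df[1:3, :]

# Last three rows, with the range computed from the row count.
bottom = df[nrow(df)-2:nrow(df), :]

println(top)
println(bottom)
```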


### Most elementary get and set operations

Given the `DataFrame`, `x`, here are three ways to grab one of its columns as a `Vector`:

In [12]:
x[1], x[:A], x[:, 1]

([1, 2], [1, 2], [1, 2])

To grab one row as a DataFrame, we can index as follows.

In [13]:
x[1, :]

| Row | A | B | C |
|-----|---|---|---|
| 1 | 1 | 1.0 | a |


We can grab a single cell or element using the same syntax that grabs an element of an array.

In [14]:
x[1, 1]

1
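To make the parallel with arrays concrete, here is a minimal sketch that indexes a matrix and a data frame holding the same values with identical `[row, column]` syntax. The names `m` and `df` are illustrative.

```julia
using DataFrames

# A 2×2 matrix and a data frame holding the same values.
m = [1 2; 3 4]
df = DataFrame(A = [1, 3], B = [2, 4])

# Row 2, column 1 in both containers.
from_matrix = m[2, 1]
from_frame = df[2, 1]

println(from_matrix == from_frame)
```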

Assignment can be done over ranges of cells, either to a scalar,

In [15]:
x[1:2, 1:2] = 1
x

| Row | A | B | C |
|-----|---|---|---|
| 1 | 1 | 1.0 | a |
| 2 | 1 | 1.0 | b |


to a vector of length equal to the number of assigned rows,

In [16]:
x[1:2, 1:2] = [1,2]
x

| Row | A | B | C |
|-----|---|---|---|
| 1 | 1 | 1.0 | a |
| 2 | 2 | 2.0 | b |


or to another data frame of matching size.

In [17]:
x[1:2, 1:2] = DataFrame([5 6; 7 8])
x

| Row | A | B | C |
|-----|---|---|---|
| 1 | 5 | 6.0 | a |
| 2 | 7 | 8.0 | b |
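The simplest set operation is assignment to a single cell, which mirrors the single-cell read shown earlier. The sketch below also illustrates that an assigned value is converted to the column's element type, which is why integers stored in column `B` appear as floats in the outputs above. The frame `df` is a fresh illustrative example, not the notebook's `x`.

```julia
using DataFrames

df = DataFrame(A = [1, 2], B = [1.0, 2.0])

# Overwrite one cell in the integer column.
df[1, 1] = 10

# Assign an integer into the Float64 column; it is stored as 3.0.
df[2, 2] = 3

println(df)
```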
