In [1]:
# create a simple array of floating point numbers
# this array CANNOT store or indicate a 'missing' value NA
x = [1.1, 2.2, 3.3, 4.4, 5.5, 6.6]

6-element Array{Float64,1}:
 1.1
 2.2
 3.3
 4.4
 5.5
 6.6

The "1" in the type indicates that the array is one-dimensional, which means the array is "flat": neither a row nor a column vector!
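The dimensionality can also be checked directly. The following is a minimal sketch using only Base functions (`ndims`, `size`) and the `Vector` alias; it reuses the array `x` defined above.

```julia
# Sketch: inspect the dimensionality of a one-dimensional array
x = [1.1, 2.2, 3.3, 4.4, 5.5, 6.6]

println(ndims(x))                 # number of dimensions of x
println(size(x))                  # a one-element tuple for a 1-D array
println(x isa Vector{Float64})    # Vector{T} is an alias for Array{T,1}
```

A 1-D array has a single entry in its `size` tuple, whereas the row matrix `v` created later has two.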

In [3]:
a = ["foo", "bar", 10]

3-element Array{Any,1}:
   "foo"
   "bar"
 10     

In [14]:
ndims(a)

1

In [15]:
# Vector and Matrix are types that are just aliases to one-
# and two-dimensional arrays

# v is a 1×2 matrix (a single row); spaces separate columns
v = [1.1 2.2]
# m is a 2×3 matrix; semicolons separate rows
m = [1.2 2.2 3.3; 4.4 5.5 6.6]
println("v: ", v, "\t type: ", typeof(v))
println("m: ", m, "\t type: ", typeof(m))

v: [1.1 2.2]	 type: Array{Float64,2}
m: [1.2 2.2 3.3; 4.4 5.5 6.6]	 type: Array{Float64,2}


In [22]:
println(v)
println(v')

[1.1 2.2]
[1.1; 2.2]


In [17]:
m

2×3 Array{Float64,2}:
 1.2  2.2  3.3
 4.4  5.5  6.6

In [20]:
v * m

1×3 Array{Float64,2}:
 11.0  14.52  18.15
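The product works because the inner dimensions agree: a 1×2 matrix times a 2×3 matrix gives a 1×3 matrix. As a rough sketch of that rule (Base Julia only, same `v` and `m` as above), the reverse order is attempted inside a `try` block to show that mismatched inner dimensions raise an error:

```julia
# Sketch: shape rule for matrix multiplication
v = [1.1 2.2]                       # 1×2
m = [1.2 2.2 3.3; 4.4 5.5 6.6]      # 2×3

p = v * m                           # inner dimensions (2 and 2) match
println(size(v), " * ", size(m), " -> ", size(p))

# m * v has inner dimensions 3 and 1, so the product is not defined
try
    m * v
catch err
    println("m * v failed with: ", typeof(err))
end
```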

In [33]:
using DataArrays

In [34]:
dvector = data([10, 20, 30, 40, 50])

5-element DataArrays.DataArray{Int64,1}:
 10
 20
 30
 40
 50

In [35]:
dmatrix = data([10 20 30; 40 50 60])

2×3 DataArrays.DataArray{Int64,2}:
 10  20  30
 40  50  60

In [39]:
dmatrix[1,1] = missing

missing

In [40]:
dmatrix

2×3 DataArrays.DataArray{Int64,2}:
   missing  20  30
 40         50  60

In [41]:
typeof(missing)

Missings.Missing
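For comparison, here is a minimal sketch of the same idea without DataArrays. It assumes Julia 1.0 or later, where `missing` is part of Base and an ordinary array can hold missing values if its element type is `Union{Missing, T}`. The name `dm` is only illustrative.

```julia
# Assumption: Julia 1.0+, where `missing` lives in Base
dm = Union{Missing, Int}[10 20 30; 40 50 60]   # element type allows missing
dm[1, 1] = missing

println(ismissing(dm[1, 1]))      # test a single cell
println(sum(dm))                  # missing propagates through the sum
println(sum(skipmissing(dm)))     # skip missing entries explicitly
```

Reductions propagate `missing` unless the missing entries are skipped, which is why `skipmissing` is used for the second sum.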

The data frame is arguably the most important and most commonly used data type in statistical computing, whether
in R (data.frame) or Python (Pandas). This is because most real-world data comes in a tabular or
spreadsheet-like format.

Such tabular data can't be represented as a DataArray, because different columns can hold different types of data; it is therefore NOT a matrix!

All columns must be of the same length.

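A DataFrame represents tabular data as a set of named columns. Columns were originally DataArrays; in the session shown here they are plain Arrays, as the typeof calls on df[:Name], df[:Count], and df[:OS] report.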
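The two constraints above (one type per column, equal column lengths) can be illustrated without any package. This is only a sketch using a Base named tuple of vectors; it is not how DataFrames is implemented, and the names `columns`, `same_length`, and `mixed` are made up for the example.

```julia
# Sketch: a column-oriented table as a named tuple of vectors (illustrative only)
columns = (Name  = ["Jeff", "Joe", "Kathy"],
           Count = [1, 2, 3],
           OS    = ["Linux", "Windows", "Mac"])

# Each column keeps its own element type
for (name, col) in pairs(columns)
    println(name, ": ", eltype(col), ", length ", length(col))
end

# All columns must have the same length
same_length = all(length(col) == length(columns.Name) for col in columns)
println(same_length)

# Forcing mixed columns into one matrix loses the per-column types
mixed = hcat(columns.Name, columns.Count)
println(eltype(mixed))
```

Keeping columns separate preserves each column's element type, which is what a single matrix cannot do.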

In [42]:
using DataFrames

In [48]:
# tabular data structure
df = DataFrame(Name = ["Jeff", "Joe", "Kathy"], Count = [1, 2, 3], OS = ["Linux", "Windows", "Mac"])

 Row  Name   Count  OS
 1    Jeff   1      Linux
 2    Joe    2      Windows
 3    Kathy  3      Mac


In [49]:
typeof(df)

DataFrames.DataFrame

In [50]:
typeof(df[:Name])

Array{String,1}

In [51]:
typeof(df[:Count])

Array{Int64,1}

In [52]:
typeof(df[:OS])

Array{String,1}

In [57]:
size(df)

(3, 3)

In [61]:
rename!(df, :OS => :Computer)

 Row  Name   Count  Computer
 1    Jeff   1      Linux
 2    Joe    2      Windows
 3    Kathy  3      Mac


In [62]:
describe(df)

Name
Summary Stats:
Length:         3
Type:           String
Number Unique:  3

Count
Summary Stats:
Mean:           2.000000
Minimum:        1.000000
1st Quartile:   1.500000
Median:         2.000000
3rd Quartile:   2.500000
Maximum:        3.000000
Length:         3
Type:           Int64

Computer
Summary Stats:
Length:         3
Type:           String
Number Unique:  3



In [63]:
display(df)

 Row  Name   Count  Computer
 1    Jeff   1      Linux
 2    Joe    2      Windows
 3    Kathy  3      Mac
