# <span style="color:#2c061f"> Macro 318: Tutorial #2 </span>  

<br>

## <span style="color:#374045"> Math, Stats and Data with Julia </span>


#### <span style="color:#374045"> Lecturer: </span> <span style="color:#d89216"> <br> Dawie van Lill (dvanlill@sun.ac.za) </span>

# Introduction

In this tutorial we will work with data in Julia. The primary package that we will be using is `DataFrames.jl`. If you were working in Python, the equivalent package would be `pandas`. We will also explore some basic math topics and see how they can be coded up using Julia.  

In [12]:
import Pkg

In [21]:
Pkg.add("CategoricalArrays")
Pkg.add("CSV")
Pkg.add("DataFrames")
Pkg.add("DataFramesMeta")
Pkg.add("GLM")
Pkg.add("Random")
Pkg.add("RDatasets")
Pkg.add("Statistics")

   Resolving package versions...
  No Changes to `~/.julia/environments/v1.7/Project.toml`
  No Changes to `~/.julia/environments/v1.7/Manifest.toml`
   Resolving package versions...
    Updating `~/.julia/environments/v1.7/Project.toml`
  [336ed68f] + CSV v0.9.11
  No Changes to `~/.julia/environments/v1.7/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.7/Project.toml`
  No Changes to `~/.julia/environments/v1.7/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.7/Project.toml`
  No Changes to `~/.julia/environments/v1.7/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.7/Project.toml`

In [14]:
using CategoricalArrays
using CSV
using DataFrames
using DataFramesMeta
using GLM
using Random
using RDatasets
using Statistics

# Working with data

We will start our discussion with how to work with data in Julia. We will cover some basic statistics and then in the last section move on to some fundamental ideas in mathematics (mostly related to calculus). 

The primary package for working with data in Julia is `DataFrames.jl`. For a comprehensive tutorial series on this package I would recommend Bogumił Kamiński's [Introduction to DataFrames](https://github.com/bkamins/Julia-DataFrames-Tutorial).

# DataFrames basics

In this section we discuss basic principles from the DataFrames package. For the first topic we look at how to construct and access DataFrames. 

## Constructors

The easiest thing to construct is an empty DataFrame. 

In [16]:
DataFrame() # empty DataFrame

You can also pass the columns as keyword arguments. Notice that each column gets its own element type, and that a scalar such as `"Hello"` is repeated for every row.

In [20]:
DataFrame(A = 2:5, B = randn(4), C = "Hello")

| Row | A | B | C |
|---|---|---|---|
| | Int64 | Float64 | String |
| 1 | 2 | -0.653416 | Hello |
| 2 | 3 | -0.519066 | Hello |
| 3 | 4 | -2.09998 | Hello |
| 4 | 5 | 0.909537 | Hello |

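To see how the keyword constructor treated each argument, one can inspect the resulting columns. The sketch below rebuilds the same DataFrame and checks its shape, column types and the repeated scalar. The names `df`, `col_types`, `a_values` and `c_values` are illustrative and not part of the original tutorial.

```julia
using DataFrames

# Same constructor call as above; `df` is an illustrative name
df = DataFrame(A = 2:5, B = randn(4), C = "Hello")

# Element type of each column, in column order
col_types = [eltype(col) for col in eachcol(df)]

# Values generated by the range 2:5
a_values = df.A

# The scalar "Hello" appears once per row
c_values = df.C

println(size(df))     # (rows, columns)
println(col_types)
println(c_values)
```

Scalars are only repeated when at least one other column fixes the number of rows, which is why `C = "Hello"` works here.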

A common way to construct a DataFrame is from existing arrays, with one array per column.

In [23]:
commodities = ["crude", "gas", "gold", "silver"]
last_price = [4.2, 11.3, 12.1, missing] # notice that the last value is missing

DataFrame(commod = commodities, price = last_price) # give names to columns

| Row | commod | price |
|---|---|---|
| | String | Float64? |
| 1 | crude | 4.2 |
| 2 | gas | 11.3 |
| 3 | gold | 12.1 |
| 4 | silver | missing |

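The `Float64?` column type in the output is shorthand for `Union{Missing, Float64}`. The following sketch, which assumes the same commodity data and uses the illustrative names `df`, `naive_mean`, `clean_mean` and `complete`, shows how a `missing` entry affects a summary statistic and two common ways of dealing with it.

```julia
using DataFrames
using Statistics

commodities = ["crude", "gas", "gold", "silver"]
last_price = [4.2, 11.3, 12.1, missing]
df = DataFrame(commod = commodities, price = last_price)

# The column type allows both Float64 values and missing
price_type = eltype(df.price)

# A single missing entry makes the plain mean missing as well
naive_mean = mean(df.price)

# skipmissing iterates over the non-missing entries only
clean_mean = mean(skipmissing(df.price))

# dropmissing returns a new DataFrame without the incomplete rows
complete = dropmissing(df)

println(price_type)
println(naive_mean)
println(clean_mean)
println(nrow(complete))
```

Whether to skip or drop missing values depends on the analysis; `skipmissing` leaves the DataFrame untouched, while `dropmissing` discards whole rows.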

One could also use an array comprehension to generate the columns. Here the first argument is a vector of column vectors and the second argument gives the column names:

In [24]:
DataFrame([rand(3) for i in 1:3], [:x1, :x2, :x3]) # see how we named the columns

| Row | x1 | x2 | x3 |
|---|---|---|---|
| | Float64 | Float64 | Float64 |
| 1 | 0.905493 | 0.453158 | 0.743097 |
| 2 | 0.683788 | 0.19378 | 0.511231 |
| 3 | 0.53654 | 0.230837 | 0.00504412 |

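A variation on the comprehension approach is to build a vector of `name => column` pairs, so that each name is generated alongside its column. The sketch below also fixes the random seed so the random columns are reproducible. The seed value `318` and the names `cols`, `df` and `df_again` are arbitrary choices for illustration.

```julia
using DataFrames
using Random

Random.seed!(318)  # arbitrary seed, chosen only for illustration

# Each element pairs a column name (:x1, :x2, :x3) with a random column
cols = [Symbol("x", i) => rand(3) for i in 1:3]
df = DataFrame(cols)

# Resetting the seed reproduces exactly the same random columns
Random.seed!(318)
df_again = DataFrame([Symbol("x", i) => rand(3) for i in 1:3])

println(names(df))
println(df == df_again)
```

Setting a seed is useful in a tutorial setting, since everyone running the notebook then sees the same numbers.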

You can also create a DataFrame from a matrix:

In [27]:
x = DataFrame(rand(3, 3), :auto) # automatically assign column names

| Row | x1 | x2 | x3 |
|---|---|---|---|
| | Float64 | Float64 | Float64 |
| 1 | 0.72952 | 0.353773 | 0.669856 |
| 2 | 0.703105 | 0.151893 | 0.567652 |
| 3 | 0.955099 | 0.695527 | 0.109053 |


Incidentally, you can convert the DataFrame back into a matrix if you wish:

In [28]:
Matrix(x)

3×3 Matrix{Float64}:
 0.72952   0.353773  0.669856
 0.703105  0.151893  0.567652
 0.955099  0.695527  0.109053

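Since this section is also about accessing DataFrames, here is a minimal sketch of the most common indexing forms, applied to a DataFrame built from a matrix as above. The variable names (`m`, `col_nocopy`, `col_copy`, `first_row`, `cell`, `back`) are illustrative.

```julia
using DataFrames

m = rand(3, 3)
x = DataFrame(m, :auto)   # columns are named x1, x2, x3

# `!` returns the stored column itself (no copy), same as x.x1
col_nocopy = x[!, :x1]

# `:` returns a fresh copy of the column
col_copy = x[:, :x1]

# A single row, and a single cell by row number and column name
first_row = x[1, :]
cell = x[2, :x3]

# Converting back recovers the original matrix values
back = Matrix(x)

println(col_nocopy === x.x1)
println(back == m)
```

The distinction between `x[!, :x1]` and `x[:, :x1]` matters when you modify the result: changes to the former affect the DataFrame, while changes to the copy do not.
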
## Importing data

## Structuring data

# Data analysis