# JuliaR

### On DataFrames and R (dplyr)-like functionality in Julia

<img src="meme.jpg" alt="Drawing" style="width: 500px;"/>


---

## Package management 

### To download

```Pkg.add``` $\equiv$ ```install.packages```

In [1]:
import Pkg # load Pkg package into namespace
Pkg.add("DataFrames") # download and install the package (from the General registry)

Updating registry at `~/.julia/registries/General`
Updating git-repo `https://github.com/JuliaRegistries/General.git`
Resolving package versions...
No Changes to `~/.julia/environments/v1.5/Project.toml`
No Changes to `~/.julia/environments/v1.5/Manifest.toml`


### To load 

Either

In [2]:
import DataFrames 

Brings the module ```DataFrames``` into the namespace; its functions/variables/structs are then accessed with the ```DataFrames.``` prefix.

e.g. ```DataFrames.nrow()``` calls the ```nrow()``` function from ```DataFrames```;

without the prefix, ```nrow()``` throws an ```UndefVarError```.

In [None]:
nrow

or

In [None]:
using DataFrames

Adds the exported functions/variables/structs from ```DataFrames``` to the namespace.

e.g. ```nrow()``` now calls the ```nrow()``` function from ```DataFrames``` without a prefix.

In [None]:
nrow
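The difference between `import` and `using` can be seen without installing anything. The sketch below uses a small hypothetical module `Toy` (not a real package) as a stand-in for `DataFrames`: one function is exported, one is not. The leading dot in `.Toy` refers to a module defined in the current session rather than an installed package.

```julia
# Hypothetical module standing in for a package such as DataFrames
module Toy
export double          # exported: usable without a prefix after `using`
double(x) = 2x
halve(x) = x / 2       # not exported: always needs the `Toy.` prefix
end

import .Toy            # like `import DataFrames`: only the name `Toy` is added
Toy.double(3)          # 6

using .Toy             # like `using DataFrames`: exported names are added too
double(3)              # 6, no prefix needed
Toy.halve(3)           # 1.5, non-exported names still need the prefix
```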

---

## Getting help

1) In the REPL, type ```?``` followed by the thing you want help with

In [None]:
? + 

2) Google it. 

3) Post in the Slack

---

## Syntax 

- No ```<-``` assignment operator (and no R-style ```assign()``` or ```get()```), only ```=```

- In R the ```.``` has two uses

    1) we can use ```.``` within a variable name

    e.g. ```a.long.variable.name <- 1```

    2) it is used to define methods for an S3 class

    e.g. if ```myclass``` is an S3 class in R,

    ```
    > aFun <- function(x, ...) UseMethod("aFun")
    > x <- c(2,3)
    > class(x) <- "myclass"
    > aFun.myclass <- function(x,...) 2*x + 1 
    > aFun(x)
    [1] 5 7
    attr(,"class")
    [1] "myclass"
    ```
    the generic ```aFun(x)``` dispatches to ```aFun.myclass(x)```.

    In Julia the ```.``` has two main uses 

    1) to access fields within an object/tuple (much like ```$``` in R)

    e.g.

In [None]:
a = (b=1,c=2)
a.c

    2) to broadcast a function over an array (this may be my favourite feature)

    e.g.

In [None]:
aFun = function(x) exp(x) end
A = [1 2; 3 4]
aFun.(A) ## elementwise exponential

- vectors

    In R ```c(1,2,3,4)```

    In Julia

In [None]:
[1;2;3;4] # same as [1, 2, 3, 4]; note that [1 2 3 4] (spaces) is a 1×4 matrix

- Heaps of others (see "Noteworthy differences from R" in the Julia manual)
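A minimal sketch of Julia's array literals next to R's `c()`, since commas, semicolons and spaces give different shapes. The variable names are illustrative only.

```julia
v = [1, 2, 3, 4]    # Vector{Int64}: the closest analogue of c(1,2,3,4)
w = [1; 2; 3; 4]    # vertical concatenation: also a 4-element Vector
r = [1 2 3 4]       # spaces build a 1×4 Matrix, not a vector

v == w              # true
size(r)             # (1, 4)
v .^ 2              # elementwise square, like c(1,2,3,4)^2 in R
v[1]                # 1: indexing is 1-based, as in R
```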

---

## Loading data 

In [None]:
import CSV 

Either 

```CSV.read("income.csv", DataFrame)```

or 

In [None]:
incomeCSV = CSV.File("income.csv") # a CSV.File object
income = DataFrame(incomeCSV)

which is equivalent to 

In [None]:
income = CSV.File("income.csv") |> DataFrame # piping!!

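There is also the ```DelimitedFiles``` package (```readdlm```/```writedlm```), which reads delimited files into a plain matrix rather than a ```DataFrame```.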
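If `income.csv` is not at hand, `CSV.read` can also parse an in-memory buffer, which is handy for quick experiments. This is a sketch: the column names and values below are hypothetical stand-ins for `income.csv`, not its real contents.

```julia
using CSV, DataFrames

# Hypothetical data standing in for income.csv (columns are an assumption)
csvText = """
Year,Sex,AHE
1992,male,11.2
1992,female,9.5
"""

# CSV.read accepts an IO object (here an IOBuffer) as well as a file name
df = CSV.read(IOBuffer(csvText), DataFrame)
```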

---

## Basic operations 

Some functions are used *exactly* as in R:

In [None]:
nrow(income)

In [None]:
ncol(income)

In [None]:
names(income)
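The same calls on a small hypothetical DataFrame (made-up values, same column names as `income`), plus `size`, which plays the role of R's `dim()`:

```julia
using DataFrames

# Hypothetical data standing in for `income`
df = DataFrame(Year = [1992, 2012, 2012],
               Sex  = ["male", "female", "male"],
               AHE  = [11.2, 18.1, 19.0])

nrow(df)    # 3, like nrow() in R
ncol(df)    # 3, like ncol() in R
names(df)   # the column names (returned as strings in recent DataFrames versions)
size(df)    # (3, 3), like dim() in R
```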

---

## Indexing ```DataFrames```

There are many options (too many?).

Either 

In [None]:
income.Sex # most similar to income$Sex in R
income."Sex"

Or index with square brackets
```
income[rows, columns]
```

The rows argument can be
```
: # get a COPY of all the rows
! # get a VIEW of all the rows (no copy is made: the DataFrame's own columns are used)
```
Arrays of row indices are also allowed (e.g. ```1:2:100```, or a Boolean mask such as ```income[income.AHE .> 10, :]```) but not recommended.

The columns argument can be any of 
```
:Sex # the syntax for a Symbol type
"Sex" 
3
```
and can be passed as-is to return a ```Vector```,

or wrapped in an array to return a ```DataFrame```, e.g.
```
[:Sex]
["Sex"]
[3]
```

In [None]:
## most similar to income$Sex in R
income.Sex 
income."Sex"
# returns a vector

## return a COPY of the column(s)
# as a vector
income[:,:Sex] 
income[:,"Sex"]
income[:,3]
# as a DataFrame
income[:,[:Sex]]
income[:,["Sex"]]
income[:,[3]]

## return a VIEW of the column(s) (no copy is made)
# as a vector
income[!,:Sex] 
income[!,"Sex"]
income[!,3]
# as a DataFrame
income[!,[:Sex]] # this is my personal preference
income[!,["Sex"]]
income[!,[3]]


We can also pass arrays of column indices

```income[!,[:Sex, :Year]]```

or invert selection with 

```income[!,Not(:Sex)]```
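A self-contained sketch of these indexing forms on a tiny hypothetical DataFrame (made-up values), showing which ones return a `Vector` and which return a `DataFrame`:

```julia
using DataFrames

# Hypothetical data standing in for `income`
df = DataFrame(Year = [1992, 2012], Sex = ["male", "female"], AHE = [11.2, 18.1])

df[!, :Sex]              # a Vector{String}
df[!, [:Sex]]            # a 2×1 DataFrame
df[!, [:Sex, :Year]]     # a 2×2 DataFrame, columns in the requested order
df[!, Not(:Sex)]         # every column except Sex
df[df.AHE .> 12, :]      # Boolean row mask: only rows with AHE above 12
```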


#### What is the difference between a *copy* and a *view*?

In [None]:
aCopy = income[:,[:Sex]]
aCopy[1,:Sex] = "hello"
first(aCopy,5)

In [None]:
first(income,5)

In [None]:
aView = income[!,[:Sex]]
aView[1,:Sex] = "hello"
first(aView,5)

In [None]:
first(income,5)

***```income``` has now changed!!!!***

Be careful.
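The same effect in a minimal, self-contained form, on a hypothetical one-column DataFrame. The `===` check confirms that `!` hands back the very vector stored inside the DataFrame, while `:` hands back an independent copy.

```julia
using DataFrames

df = DataFrame(x = [1, 2, 3])

colCopy = df[:, :x]    # `:` copies the column
colCopy[1] = 100
df.x[1]                # still 1: the copy is independent

colSame = df[!, :x]    # `!` returns the column stored in df itself
colSame[1] = 100
df.x[1]                # now 100: df was modified through colSame

colSame === df.x       # true: the same object, not merely equal values
```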

---

## Piping 

Pass the object on the left to the function on the right.

e.g. in R we use the `%>%` infix operator from `magrittr` (also re-exported by `dplyr`)
```
> library(magrittr)
> aFun <- function(x) x^2
> b <- 2 
> b %>% aFun
[1] 4
```

In Julia it's `|>`

In [None]:
fun(x) = x^2
b = 2 
b |> fun  

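Piping can be broadcast too, with `.|>`

In [None]:
A = [1 2; 3 4]
A .|> fun # applies fun to each element: [1 4; 9 16]

In [None]:
A |> fun # applies fun to the whole matrix: A^2 is the matrix product A*A = [7 10; 15 22]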

If this is your jam, see packages such as `Pipe.jl` or `Chain.jl` for more powerful piping syntax.
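One limitation worth knowing: `|>` passes exactly one argument. A common workaround, sketched below with base Julia only, is to wrap multi-argument calls in an anonymous function; function composition with `∘` is another way to build a pipeline. The function names here are illustrative.

```julia
square(x) = x^2

# each anonymous function is parenthesised so it covers only one pipeline stage
[1, 2, 3] |> (v -> square.(v)) |> sum       # 14
[1, 2, 3, 4, 5] |> (v -> filter(isodd, v))  # [1, 3, 5]

# ∘ composes functions right-to-left: square each element, then sum
sumOfSquares = sum ∘ (v -> square.(v))
sumOfSquares([1, 2, 3])                     # 14
```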

## A likeness with ```dplyr```

The ```DataFrames``` package provides much of the same data-manipulation functionality as ```dplyr```.



R functions and their Julia equivalents:

 - In R ```rename()``` $\equiv$ in Julia ```rename()``` - rename columns.
 
 - In R ```filter()``` $\equiv$ in Julia ```filter()``` - picks cases based on their values.
 
 - In R ```select()``` $\equiv$ in Julia ```select()``` - picks variables based on their names.
 
 - In R ```mutate()``` $\equiv$ in Julia ```transform()``` - adds new variables that are functions of existing variables.

 - In R ```summarise()``` $\equiv$ in Julia ```combine()``` - reduces multiple values down to a single summary.

 - In R ```arrange()``` $\equiv$ in Julia ```sort()``` - changes the ordering of the rows.
 
 - In R ```group_by()``` $\equiv$ in Julia ```groupby()``` - returns a ```GroupedDataFrame``` object.
 

```rename!(), filter!(), select!(), transform!(), sort!()``` also exist to manipulate DataFrames in place.
 
Here, the common column-specification syntax is one of
```
:ColumnName => :NewName
:ColumnName => function => :NewName
:ColumnName => function
```
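A minimal sketch of the three forms on a hypothetical one-column DataFrame. Note that the function receives the whole column (a vector), hence `2 .* x`; wrapping it in `ByRow` applies it element by element instead. With no new name given, DataFrames names the result `<column>_<function>`.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3])

select(df, :a => :b)                                  # :ColumnName => :NewName
select(df, :a => (x -> 2 .* x) => :double)            # :ColumnName => function => :NewName
select(df, :a => cumsum)                              # :ColumnName => function, auto-named a_cumsum
transform(df, :a => ByRow(x -> x + 1) => :aPlusOne)   # ByRow: function applied row by row
```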

Examples:



In [None]:
incomeCopy = copy(income);
head(x) = first(x,5)
rename!(incomeCopy, :AHE => :Income) |> head # ! to operate on incomeCopy in place

In [None]:
filter!(:Income => (x-> x>20), incomeCopy) |> head # note the argument order: the predicate comes first, as in Base Julia's filter(f, collection)
# the non-mutating filter() also has an optional keyword argument view::Bool to return a view instead of a copy

In [None]:
select!(incomeCopy, Not(:Year)) |> head 

In [None]:
incomeAdjust(income,sex) = (isequal.(sex,"male")*0.87 + isequal.(sex,"female")) .* income
transform!(incomeCopy,[:Income,:Sex] => incomeAdjust => :AdjustedIncome) |> head 

In [None]:
incomeBySex = groupby(income, :Sex) 

In [None]:
keys(incomeBySex)

In [None]:
incomeBySex[(Sex="male",)] |> head 

In [None]:
incomeBySex[1] |> head 

In [None]:
incomeBySexYear = groupby(income, [:Sex, :Year])
keys(incomeBySexYear)

In [None]:
incomeBySexYear[(Sex="male",Year=1992)] |> head 

In [None]:
describe(income)

In [None]:
sort!(income, :AHE, rev = true) |> head 
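`combine` (the `summarise()` equivalent) is listed above but not yet used. Here is a sketch of a full dplyr-style chain on hypothetical data with the same column names as `income.csv` (the values are made up), written step by step so each stage matches one dplyr verb.

```julia
using DataFrames, Statistics

# Hypothetical data standing in for income.csv
df = DataFrame(Year = [1992, 1992, 2012, 2012],
               Sex  = ["male", "female", "male", "female"],
               AHE  = [20.0, 18.0, 26.0, 22.0])

# dplyr: df %>% filter(AHE > 18) %>% group_by(Sex) %>%
#        summarise(meanAHE = mean(AHE)) %>% arrange(desc(meanAHE))
highEarners  = filter(:AHE => (x -> x > 18), df)           # filter()
bySex        = groupby(highEarners, :Sex)                  # group_by()
summaryBySex = combine(bySex, :AHE => mean => :meanAHE)    # summarise()
sort!(summaryBySex, :meanAHE, rev = true)                  # arrange(desc())
```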

---

## Functions 

Anonymous functions 

In [None]:
x -> 2*x + 1 

One-line ("assignment form") function

In [None]:
f1(x) = 2*x + 1

Other function definition 

In [None]:
f2 = function(x)
    2*x + 1 
end 

In [None]:
function f3(x)
    2*x + 1
end

Can also use the `return` keyword (like `return()` in R)

In [None]:
function f4(x) 
    return 2*x + 1
end 

Optional keyword (named) arguments

In [None]:
function f5(;x=1)
    return 2*x + 1
end

In [None]:
f5()

In [None]:
f5(1) # MethodError: x is a keyword argument, so it must be passed by name

In [None]:
f5(x=1)
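Julia distinguishes optional *positional* arguments from *keyword* arguments (those after the `;`), which is why `f5(1)` fails while `f5(x=1)` works. A short sketch with two hypothetical functions:

```julia
# Optional positional argument: may be passed by position
g1(x = 1) = 2*x + 1

# Keyword argument (after the semicolon): must be passed by name
g2(; x = 1) = 2*x + 1

g1()       # 3
g1(5)      # 11
g2()       # 3
g2(x = 5)  # 11
# g2(5) throws a MethodError, just like f5(1)
```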

Typing and multiple dispatch are far too expansive topics to cover here. 

You can do most things without them to begin with. 

---

## Linear Models 

In [None]:
using GLM

In [None]:
lm1 = lm(@formula(log(AHE) ~ Year), income)

In [None]:
exp.(predict(lm1, DataFrame(Year=[2021], Sex=["female"]))) # new data as a one-row DataFrame; exp undoes the log

In [None]:
using Plots 

see also `StatsPlots`

In [None]:
scatter(fitted(lm1), residuals(lm1), xlabel = "fitted", ylabel = "residuals", legend = false)

In [None]:
lm2 = lm(@formula(log(AHE) ~ Sex + Year), income, contrasts = Dict(:Year => DummyCoding()))
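To check what `lm` computes, here is a sketch on a small hypothetical dataset (not `income.csv`): the GLM coefficients should match the ordinary least-squares solution from Julia's backslash operator applied to the design matrix `[1 x]`.

```julia
using DataFrames, GLM

# Hypothetical toy data
df = DataFrame(x = [1.0, 2.0, 3.0, 4.0, 5.0],
               y = [3.1, 4.9, 7.2, 8.8, 11.1])

m = lm(@formula(y ~ x), df)    # like lm(y ~ x, data = df) in R
coef(m)                        # [intercept, slope]

# The same estimates via a least-squares solve with the design matrix [1 x]
X = [ones(nrow(df)) df.x]
X \ df.y

predict(m, DataFrame(x = [6.0]))   # prediction for a new x
```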