# Julia workshop - first example

Welcome!

The purpose of this first example is to introduce you to Jupyter Notebooks and to let you interact with some moderately complex Julia code. This code may appear mysterious and confusing now, but by the end of the workshop, we will have covered enough Julia so that almost all of this code will make sense.

> *Note:* This example is inspired by a teaching approach described by [Mine Çetinkaya-Rundel](https://twitter.com/minebocek) in a lecture she gave at Harvard called *Let Them Eat Cake (First!)*. You can watch a video here: https://www.youtube.com/watch?v=RsVOrpXAPXo

## Jupyter notebooks

This is an example of a [Jupyter Notebook](https://jupyter.org/). A Jupyter notebook consists of cells, which come in three types:

* Code cells
* Markdown cells
* Raw cells

The next cell is a Code cell that contains Julia code.

You can execute a Code cell by selecting it and then either clicking the "Run" button above or pressing *Shift-Enter*. The result of running the code will appear below.

Try it!

In [None]:
2 + 2

Modify the code and run it again. Don't be afraid to experiment.

The next few cells load Julia libraries we'll need for this first example. When you're ready, please execute each of those cells.

Julia uses *just-in-time compilation* so libraries can take a few seconds to load. The larger the library, the longer loading and compiling take. The good news is that Julia code is very fast after it has been compiled.
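
If you are curious, the effect of just-in-time compilation can be seen with a small timing experiment. This is only a sketch: `sum_of_squares` is a made-up function for illustration, and the exact timings depend on your machine. The first call is usually noticeably slower because it includes compilation.

```julia
# Hypothetical function, used only for this illustration.
sum_of_squares(n) = sum(i^2 for i in 1:n)

# @elapsed returns the number of seconds an expression took to run.
first_call  = @elapsed sum_of_squares(1_000)   # includes compilation time
second_call = @elapsed sum_of_squares(1_000)   # reuses the compiled code

println("first call:  ", first_call, " seconds")
println("second call: ", second_call, " seconds")
```

You can paste this into a new Code cell and run it. Running the cell a second time redefines the function, so the compilation cost shows up again.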

In [None]:
using CSV

In [None]:
using DataFrames

In [None]:
using Query

In [None]:
using Gadfly

## Example from Our World in Data

This example uses data from [Our World in Data](https://ourworldindata.org/) and illustrates how household incomes have changed over time for people at different points in income distributions for various countries.

You can read about this in detail at [How are the incomes of the rich changing relative to the incomes of the poor?](https://ourworldindata.org/income-inequality#how-are-the-incomes-of-the-rich-changing-relative-to-the-incomes-of-the-poor)

Execute the next cell to read the data from a CSV file.

In [None]:
household_income_raw = CSV.read("./data/real-disposable-household-income-indexed.csv")

## Data wrangling

Next we will do some data wrangling to get this data into a format that is easier to plot.

The code probably won't make complete sense to you now, but by the end of the workshop it should be much clearer. For now, just execute the cell.

You may get a warning related to some code in a Julia library. That's OK and won't affect the example.

Notice how the structure of the dataset changes.

In [None]:
household_income = household_income_raw |>
    @rename(:Entity => :Country) |>
    DataFrame |>
    (df -> stack(df, r"\d+")) |>
    (df -> categorical(df, [:Country, :Code])) |>
    @rename(:variable => :Decile, :value => :Index) |>
    DataFrame
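
The key step in the cell above is `stack`, which turns columns into rows (a "wide" table becomes a "long" one). Here is a minimal sketch of that idea on a tiny table. The data and the column names `D1` and `D10` are invented for illustration and are not taken from the workshop dataset.

```julia
using DataFrames

# Toy data, invented for this illustration.
wide = DataFrame(Country = ["A", "B"],
                 Year    = [2000, 2000],
                 D1      = [100.0, 104.0],
                 D10     = [100.0, 121.0])

# Move the two decile columns into rows:
# one row per combination of country, year, and decile.
long = stack(wide, [:D1, :D10])

println(size(wide))  # (rows, columns) before stacking
println(size(long))  # (rows, columns) after stacking
long
```

The stacked table has a `variable` column holding the old column names and a `value` column holding the numbers, which is why the workshop code renames them to `Decile` and `Index`.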

## Data visualization

Now we're ready to make some plots. Execute the next cell to plot data from the United Kingdom.

Again, you may see a warning. This time the code will take longer to run. We'll talk about why in the workshop.

We'll also discuss what the plot means.

In [None]:
set_default_plot_size(800px, 600px)

household_income |>
    @filter(_.Country == "United Kingdom") |>
    DataFrame |>
    (df -> plot(df, x=:Year,
                    y=:Index,
                    color=:Decile,
                    layer(Geom.line, Geom.point),
                    layer(yintercept=[100], Geom.hline(color=["black"], style=[:dot])),
                    Guide.title("Growth of Real Disposable Household Income by Decile")))

Now we'll compare two countries. Execute the next cell to compare the United Kingdom and the United States.

This time the code should run faster.

In [None]:
countries = ["United Kingdom", "United States"]
household_income |>
    @filter(_.Country in countries) |>
    DataFrame |>
    (df -> plot(df, x=:Year,
                    y=:Index,
                    color=:Decile,
                    xgroup=:Country,
                    Geom.subplot_grid(layer(Geom.line, Geom.point),
                                      layer(DataFrame(yint=fill(100, length(countries)), Country=countries),
                                            xgroup=:Country, yintercept=:yint,
                                            Geom.hline(color="black", style=:dot))),
                    Guide.title("Growth of Real Disposable Household Income by Decile")))

Now it's your turn to do some coding. Modify the example above to compare other countries.

If you want to see a list of countries in the dataset, you can run `show(unique(household_income.Country))` in a new code cell.
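
When you change the list of countries, two details matter: the filter keeps rows whose country is `in` your list, and the reference-line layer needs one value of 100 per selected country. The sketch below shows both pieces in plain Julia. The country names are a hypothetical selection, so check them against the list printed by the command above.

```julia
# Hypothetical selection: replace these with names from the dataset.
countries = ["France", "Germany", "Sweden"]

# `in` tests membership; this is what the @filter line relies on.
println("Germany" in countries)   # true
println("Japan" in countries)     # false

# One reference value of 100 per selected country, whatever the list length.
yint = fill(100, length(countries))
println(yint)
```

Using `fill(100, length(countries))` for the `yint` column keeps it the same length as `countries` for any number of countries.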