# Getting Started with Julia in Colab/Jupyter
You can run this notebook either in Google Colab or with Jupyter on your own machine.

## Running on Google Colab
1. Work on a copy of this notebook: _File_ > _Save a copy in Drive_ (you will need a Google account). Alternatively, you can download the notebook using _File_ > _Download .ipynb_, then upload it to [Colab](https://colab.research.google.com/).
2. Execute the following cell (click on it and press Ctrl+Enter) to install Julia, IJulia (the Jupyter kernel for Julia) and other packages. You can update `JULIA_VERSION` and the other parameters, if you know what you're doing. Installation takes 2-3 minutes.
3. Reload this page (press Ctrl+R, or ⌘+R, or the F5 key) and continue to the _Checking the Installation_ section.

* _Note_: If your Colab Runtime gets reset (e.g., due to inactivity), repeat steps 2 and 3.

In [None]:
%%shell
set -e

#---------------------------------------------------#
JULIA_VERSION="1.6.0" # any version ≥ 0.7.0
JULIA_PACKAGES="IJulia DataFrames Pipe CSV"
JULIA_PACKAGES_IF_GPU="CUDA"
JULIA_NUM_THREADS=4
#---------------------------------------------------#

if [ -n "$COLAB_GPU" ] && [ -z "$(which julia)" ]; then
  # Install Julia
  JULIA_VER=`cut -d '.' -f -2 <<< "$JULIA_VERSION"`
  echo "Installing Julia $JULIA_VERSION on the current Colab Runtime..."
  BASE_URL="https://julialang-s3.julialang.org/bin/linux/x64"
  URL="$BASE_URL/$JULIA_VER/julia-$JULIA_VERSION-linux-x86_64.tar.gz"
  wget -nv "$URL" -O /tmp/julia.tar.gz # -nv means "not verbose"
  tar -x -f /tmp/julia.tar.gz -C /usr/local --strip-components 1
  rm /tmp/julia.tar.gz

  # Install Packages
  if [ "$COLAB_GPU" = "1" ]; then
      JULIA_PACKAGES="$JULIA_PACKAGES $JULIA_PACKAGES_IF_GPU"
  fi
  for PKG in $JULIA_PACKAGES; do
    echo "Installing Julia package $PKG..."
    julia -e 'using Pkg; pkg"add '$PKG'; precompile;"' &> /dev/null
  done

  # Install kernel and rename it to "julia"
  echo "Installing IJulia kernel..."
  julia -e 'using IJulia; IJulia.installkernel("julia", env=Dict(
      "JULIA_NUM_THREADS"=>"'"$JULIA_NUM_THREADS"'"))'
  KERNEL_DIR=`julia -e "using IJulia; print(IJulia.kerneldir())"`
  KERNEL_NAME=`ls -d "$KERNEL_DIR"/julia*`
  mv -f "$KERNEL_NAME" "$KERNEL_DIR"/julia

  echo ''
  echo "Successfully installed `julia -v`!"
  echo "Please reload this page (press Ctrl+R, ⌘+R, or the F5 key) then"
  echo "jump to the 'Checking the Installation' section."
fi

Installing Julia 1.6.0 on the current Colab Runtime...
2022-01-11 10:38:28 URL:https://storage.googleapis.com/julialang2/bin/linux/x64/1.6/julia-1.6.0-linux-x86_64.tar.gz [112838927/112838927] -> "/tmp/julia.tar.gz" [1]
Installing Julia package IJulia...
Installing Julia package DataFrames...
Installing Julia package Pipe...
Installing Julia package CSV...


## Checking the Installation
The `versioninfo()` function should print your Julia version and some other info about the system (if you ever ask for help or file an issue about Julia, you should always provide this information).

In [1]:
versioninfo()

Julia Version 1.6.0
Commit f9720dc2eb (2021-03-24 12:55 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, broadwell)
Environment:
  JULIA_NUM_THREADS = 4
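
The same details can also be read programmatically, which is handy when a notebook should check its own environment. The following is a minimal sketch using only Base; the `v"1.6"` threshold is an assumption taken from the version shown above, not a requirement stated by this notebook.

```julia
# Check the running Julia version and thread count from code.
println("Julia version: ", VERSION)
println("Threads available: ", Threads.nthreads())

# VERSION is a VersionNumber, so it can be compared with a version literal.
if VERSION < v"1.6"
    @warn "This notebook was run on Julia 1.6.0; older versions may behave differently."
end
```

If `Threads.nthreads()` prints 1 on Colab, the kernel was probably not started with the `JULIA_NUM_THREADS` setting from the install cell.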


# Step 1. Import the necessary libraries

In [2]:
using DataFrames
using CSV
using Pipe
using Statistics

# Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/02_Filtering_%26_Sorting/Euro12/Euro_2012_stats_TEAM.csv).


In [3]:
URL = "https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/02_Filtering_%26_Sorting/Euro12/Euro_2012_stats_TEAM.csv"
download(URL, "euro_stats.csv")

"euro_stats.csv"

# Step 3. Assign it to a variable called euro12.


In [4]:
euro12 = CSV.read("euro_stats.csv", DataFrame)

Row,Team,Goals,Shots on target,Shots off target,Shooting Accuracy
,String31,Int64,Int64,Int64,String7
1,Croatia,4,13,12,51.9%
2,Czech Republic,4,13,18,41.9%
3,Denmark,4,10,10,50.0%
4,England,5,11,18,50.0%
5,France,3,22,24,37.9%
6,Germany,10,32,32,47.8%
7,Greece,5,8,18,30.7%
8,Italy,6,34,45,43.0%
9,Netherlands,2,12,36,25.0%
10,Poland,2,15,23,39.4%
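
To see how `CSV.read` infers column names and types without downloading anything, the sketch below parses a small in-memory sample. The three rows are copied from the output above; the names `csv_text` and `sample` are only illustrative.

```julia
using CSV, DataFrames

# A three-row sample with the same layout as the first columns of the file.
csv_text = """
Team,Goals,Shooting Accuracy
Croatia,4,51.9%
Denmark,4,50.0%
Germany,10,47.8%
"""

# CSV.read accepts any IO source, so an IOBuffer can stand in for a file.
sample = CSV.read(IOBuffer(csv_text), DataFrame)

println(names(sample))          # column names, including the one with a space
println(eltype(sample.Goals))   # integer columns are parsed as integers
println(size(sample))           # (rows, columns)
```

Percentage values such as `51.9%` are read as strings, which matches the `String7` column type shown in the output above.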


# Step 4. Select only the Goals column.

In [5]:
euro12.Goals

16-element Vector{Int64}:
  4
  4
  4
  5
  3
 10
  5
  6
  2
  2
  6
  1
  5
 12
  5
  2
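
`euro12.Goals` returns the column vector stored inside the data frame, not a copy. The sketch below, on a hypothetical three-row table `df`, contrasts this with `df[:, :Goals]`, which copies.

```julia
using DataFrames

# Hypothetical toy table standing in for euro12.
df = DataFrame(Team = ["Croatia", "Germany", "Spain"], Goals = [4, 10, 12])

goals_ref  = df.Goals        # same as df[!, :Goals]: the stored vector, no copy
goals_copy = df[:, :Goals]   # a freshly allocated copy of the column

goals_copy[1] = 99           # leaves df untouched
goals_ref[2]  = 11           # modifies df in place

println(df.Goals)
```

Use the copying form when the extracted vector will be mutated and the original table must stay unchanged.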

# Step 5. How many teams participated in Euro 2012?


In [6]:
nrow(euro12)

16

# Step 6. What is the number of columns in the dataset?


In [7]:
n_cols = ncol(euro12)

35
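
Both dimensions can also be obtained in one call with `size`, and `names` lists the column names. A minimal sketch on a hypothetical two-row table:

```julia
using DataFrames

# Hypothetical toy table; the real euro12 has 16 rows and 35 columns.
df = DataFrame(Team = ["Croatia", "Germany"], Goals = [4, 10], Shots = [25, 64])

n_rows, n_cols = size(df)    # (number of rows, number of columns)
println("rows = ", n_rows, ", columns = ", n_cols)

# nrow and ncol return the same two numbers individually.
println(nrow(df) == n_rows && ncol(df) == n_cols)

# names returns the column names as a vector of strings.
println(names(df))
```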


# Step 7. View only the columns Team, Yellow Cards and Red Cards and assign them to a dataframe called discipline

In [8]:
discipline = euro12[!, ["Team", "Yellow Cards", "Red Cards"]]

Row,Team,Yellow Cards,Red Cards
,String31,Int64,Int64
1,Croatia,9,0
2,Czech Republic,7,0
3,Denmark,4,0
4,England,5,0
5,France,6,0
6,Germany,4,0
7,Greece,9,1
8,Italy,16,0
9,Netherlands,5,0
10,Poland,7,1
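
Note that indexing with `!` and several columns builds a new data frame that shares its column vectors with the original, while `:` and `select` copy them by default. The sketch below compares a few equivalent selections on a hypothetical two-row table; the values are taken from the Greece and Italy rows shown in this notebook.

```julia
using DataFrames

# Column names containing spaces must be given as strings (or Symbol("...")).
df = DataFrame("Team" => ["Greece", "Italy"],
               "Yellow Cards" => [9, 16],
               "Red Cards" => [1, 0],
               "Goals" => [5, 6])

cols = ["Team", "Yellow Cards", "Red Cards"]

by_index  = df[:, cols]               # copies the selected columns
by_select = select(df, cols)          # same result, function form
by_not    = select(df, Not("Goals"))  # select by exclusion instead

println(by_index == by_select == by_not)
```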


# Step 8. Sort the teams by Red Cards, then by Yellow Cards

In [9]:
sort(discipline, [order("Red Cards",rev=true), "Yellow Cards"])

Row,Team,Yellow Cards,Red Cards
,String31,Int64,Int64
1,Republic of Ireland,6,1
2,Poland,7,1
3,Greece,9,1
4,Denmark,4,0
5,Germany,4,0
6,England,5,0
7,Netherlands,5,0
8,Ukraine,5,0
9,France,6,0
10,Russia,6,0


In [10]:
? sort

search: sort sort! sortperm sortperm! sortslices insorted Cshort issorted



```
sort(v; alg::Algorithm=defalg(v), lt=isless, by=identity, rev::Bool=false, order::Ordering=Forward)
```

Variant of [`sort!`](@ref) that returns a sorted copy of `v` leaving `v` itself unmodified.

# Examples

```jldoctest
julia> v = [3, 1, 2];

julia> sort(v)
3-element Vector{Int64}:
 1
 2
 3

julia> v
3-element Vector{Int64}:
 3
 1
 2
```

---

```
sort(A; dims::Integer, alg::Algorithm=DEFAULT_UNSTABLE, lt=isless, by=identity, rev::Bool=false, order::Ordering=Forward)
```

Sort a multidimensional array `A` along the given dimension. See [`sort!`](@ref) for a description of possible keyword arguments.

To sort slices of an array, refer to [`sortslices`](@ref).

# Examples

```jldoctest
julia> A = [4 3; 1 2]
2×2 Matrix{Int64}:
 4  3
 1  2

julia> sort(A, dims = 1)
2×2 Matrix{Int64}:
 1  2
 4  3

julia> sort(A, dims = 2)
2×2 Matrix{Int64}:
 3  4
 1  2
```

---

```
sort(df::AbstractDataFrame, cols=All();
     alg::Union{Algorithm, Nothing}=nothing,
     lt::Union{Function, AbstractVector{<:Function}}=isless,
     by::Union{Function, AbstractVector{<:Function}}=identity,
     rev::Union{Bool, AbstractVector{Bool}}=false,
     order::Union{Ordering, AbstractVector{<:Ordering}}=Forward,
     view::Bool=false)
```

Return a data frame containing the rows in `df` sorted by column(s) `cols`. Sorting on multiple columns is done lexicographically.

`cols` can be any column selector (`Symbol`, string or integer; `:`, `Cols`, `All`, `Between`, `Not`, a regular expression, or a vector of `Symbol`s, strings or integers). If `cols` selects no columns, sort `df` on all columns (this behaviour is deprecated and will change in future versions).

If `rev` is `true`, reverse sorting is performed. To enable reverse sorting only for some columns, pass `order(c, rev=true)` in `cols`, with `c` the corresponding column index (see example below).

The `by` keyword allows providing a function that will be applied to each cell before comparison; the `lt` keyword allows providing a custom "less than" function. If both `by` and `lt` are specified, the `lt` function is applied to the result of the `by` function.

All the keyword arguments can be either a single value, which is applied to all columns, or a vector of length equal to the number of columns that the operation is performed on. In such a case each entry is used for the column in the corresponding position in `cols`.

If `alg` is `nothing` (the default), the most appropriate algorithm is chosen automatically among `TimSort`, `MergeSort` and `RadixSort` depending on the type of the sorting columns and on the number of rows in `df`.

If `view=false` a freshly allocated `DataFrame` is returned. If `view=true` then a `SubDataFrame` view into `df` is returned.

# Examples

```jldoctest
julia> df = DataFrame(x=[3, 1, 2, 1], y=["b", "c", "a", "b"])
4×2 DataFrame
 Row │ x      y
     │ Int64  String
─────┼───────────────
   1 │     3  b
   2 │     1  c
   3 │     2  a
   4 │     1  b

julia> sort(df, :x)
4×2 DataFrame
 Row │ x      y
     │ Int64  String
─────┼───────────────
   1 │     1  c
   2 │     1  b
   3 │     2  a
   4 │     3  b

julia> sort(df, [:x, :y])
4×2 DataFrame
 Row │ x      y
     │ Int64  String
─────┼───────────────
   1 │     1  b
   2 │     1  c
   3 │     2  a
   4 │     3  b

julia> sort(df, [:x, :y], rev=true)
4×2 DataFrame
 Row │ x      y
     │ Int64  String
─────┼───────────────
   1 │     3  b
   2 │     2  a
   3 │     1  c
   4 │     1  b

julia> sort(df, [:x, order(:y, rev=true)])
4×2 DataFrame
 Row │ x      y
     │ Int64  String
─────┼───────────────
   1 │     1  c
   2 │     1  b
   3 │     2  a
   4 │     3  b
```
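
Applied to the exercise, the per-column ordering described in the help text can be sketched as follows. The table `cards` is a hypothetical four-row excerpt (values taken from the discipline output above), and the vector form of `rev` is used as the documented alternative to `order`.

```julia
using DataFrames

cards = DataFrame("Team" => ["Greece", "Denmark", "Poland", "Italy"],
                  "Yellow Cards" => [9, 4, 7, 16],
                  "Red Cards" => [1, 0, 1, 0])

# Red Cards descending, ties broken by Yellow Cards ascending.
with_order = sort(cards, [order("Red Cards", rev = true), "Yellow Cards"])

# Same ordering expressed with one `rev` entry per sorting column.
with_rev = sort(cards, ["Red Cards", "Yellow Cards"], rev = [true, false])

println(with_order.Team)
```

`sort` returns a new data frame; `sort!` takes the same arguments and reorders the table in place.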


# Step 9. Calculate the mean number of Yellow Cards per team


In [11]:
mean(discipline[!,"Yellow Cards"])

7.4375
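
When the result should stay in a data frame, for example to compute several summaries at once, `combine` is an alternative to calling `mean` on the column directly. A minimal sketch on a hypothetical four-row table; the output column name `"Mean Yellow Cards"` is an arbitrary choice.

```julia
using DataFrames, Statistics

cards = DataFrame("Team" => ["Greece", "Denmark", "Poland", "Italy"],
                  "Yellow Cards" => [9, 4, 7, 16])

# Direct call on the column vector.
avg = mean(cards."Yellow Cards")

# Same computation, returned as a one-row data frame.
summary_df = combine(cards, "Yellow Cards" => mean => "Mean Yellow Cards")

println(avg)
println(summary_df)
```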

# Step 10. Filter teams that scored more than 6 goals

In [12]:
euro12[euro12.Goals .> 6,:]

Row,Team,Goals,Shots on target,Shots off target,Shooting Accuracy,% Goals-to-shots
,String31,Int64,Int64,Int64,String7,String7
1,Germany,10,32,32,47.8%,15.6%
2,Spain,12,42,33,55.9%,16.0%
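
The boolean-mask indexing used here has two function-style equivalents in DataFrames.jl, `filter` and `subset`. The sketch below shows all three on a hypothetical four-row table and assumes DataFrames 1.0 or later (for `subset`).

```julia
using DataFrames

df = DataFrame(Team = ["Germany", "Greece", "Spain", "Italy"],
               Goals = [10, 5, 12, 6])

# 1. Boolean mask: broadcast the comparison over the column.
by_mask = df[df.Goals .> 6, :]

# 2. filter: the predicate >(6) is applied to each value of :Goals.
by_filter = filter(:Goals => >(6), df)

# 3. subset: works on whole columns, so ByRow wraps the predicate.
by_subset = subset(df, :Goals => ByRow(>(6)))

println(by_mask.Team)
```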


# Step 11. Select the teams that start with G

In [13]:
euro12[startswith.(euro12[!,:Team], "G"), :]

Row,Team,Goals,Shots on target,Shots off target,Shooting Accuracy,% Goals-to-shots
,String31,Int64,Int64,Int64,String7,String7
1,Germany,10,32,32,47.8%,15.6%
2,Greece,5,8,18,30.7%,19.2%
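
The same selection can be written with `filter` and the one-argument form of `startswith`, which returns a predicate. This is a sketch on a hypothetical four-row table and assumes Julia 1.5 or later, where `startswith(prefix)` is available.

```julia
using DataFrames

df = DataFrame(Team = ["Germany", "Greece", "Spain", "Italy"],
               Goals = [10, 5, 12, 6])

# Broadcast form, as in the cell above.
by_mask = df[startswith.(df.Team, "G"), :]

# Predicate form: startswith("G") is a function that tests one string.
by_filter = filter(:Team => startswith("G"), df)

println(by_filter.Team)
```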


# Step 12. Select the first 7 columns

In [14]:
euro12[!, 1:7]

Row,Team,Goals,Shots on target,Shots off target,Shooting Accuracy
,String31,Int64,Int64,Int64,String7
1,Croatia,4,13,12,51.9%
2,Czech Republic,4,13,18,41.9%
3,Denmark,4,10,10,50.0%
4,England,5,11,18,50.0%
5,France,3,22,24,37.9%
6,Germany,10,32,32,47.8%
7,Greece,5,8,18,30.7%
8,Italy,6,34,45,43.0%
9,Netherlands,2,12,36,25.0%
10,Poland,2,15,23,39.4%


# Step 13. Select all columns except the last 3.

In [15]:
euro12[!, 1:end-3]

Row,Team,Goals,Shots on target,Shots off target,Shooting Accuracy
,String31,Int64,Int64,Int64,String7
1,Croatia,4,13,12,51.9%
2,Czech Republic,4,13,18,41.9%
3,Denmark,4,10,10,50.0%
4,England,5,11,18,50.0%
5,France,3,22,24,37.9%
6,Germany,10,32,32,47.8%
7,Greece,5,8,18,30.7%
8,Italy,6,34,45,43.0%
9,Netherlands,2,12,36,25.0%
10,Poland,2,15,23,39.4%
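
Positional ranges are one of several column selectors. The sketch below shows the range form next to `Not` and `Between` on a hypothetical five-column table with columns `a` to `e`.

```julia
using DataFrames

df = DataFrame(a = 1:2, b = 3:4, c = 5:6, d = 7:8, e = 9:10)

first_three   = df[:, 1:3]                          # first 3 columns by position
drop_last_two = df[:, 1:end-2]                      # everything except the last 2
drop_with_not = df[:, Not(ncol(df)-1:ncol(df))]     # same, by excluding positions
a_through_c   = select(df, Between(:a, :c))         # inclusive range by name

println(names(drop_last_two))
```

Name-based selectors such as `Between` keep working if columns are later reordered or inserted, whereas positional ranges do not.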


# Step 14. Present only the Shooting Accuracy for England, Italy and Russia


In [42]:
country_eq(x) = in(x, ["England", "Italy", "Russia"])

country_eq (generic function with 1 method)

In [45]:
filter(:Team => country_eq, euro12)[!, ["Team", "Shooting Accuracy"]]

Row,Team,Shooting Accuracy
,String31,String7
1,England,50.0%
2,Italy,43.0%
3,Russia,22.5%
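
A helper function is not strictly needed for a membership test: the one-argument form `in(collection)` already returns a predicate. The sketch below uses a hypothetical four-row table with values taken from this notebook's outputs; `wanted` is an illustrative name.

```julia
using DataFrames

df = DataFrame("Team" => ["England", "Germany", "Italy", "Russia"],
               "Shooting Accuracy" => ["50.0%", "47.8%", "43.0%", "22.5%"])

wanted = ["England", "Italy", "Russia"]

# in(wanted) tests whether a single team name is contained in `wanted`.
selected = filter(:Team => in(wanted), df)

# Keep only the two columns of interest.
result = selected[:, ["Team", "Shooting Accuracy"]]

println(result)
```

For long lists of names, passing `Set(wanted)` instead of the vector makes each membership test a hash lookup rather than a linear scan.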
