# Getting Started with Julia in Colab/Jupyter
You can run this notebook either in Google Colab or with Jupyter on your own machine.

## Running on Google Colab
1. Work on a copy of this notebook: _File_ > _Save a copy in Drive_ (you will need a Google account). Alternatively, you can download the notebook using _File_ > _Download .ipynb_, then upload it to [Colab](https://colab.research.google.com/).
2. Execute the following cell (click on it and press Ctrl+Enter) to install Julia, IJulia (the Jupyter kernel for Julia) and other packages. You can update `JULIA_VERSION` and the other parameters, if you know what you're doing. Installation takes 2-3 minutes.
3. Reload this page (press Ctrl+R, or ⌘+R, or the F5 key) and continue to the _Checking the Installation_ section.

* _Note_: If your Colab Runtime gets reset (e.g., due to inactivity), repeat steps 2 and 3.

In [None]:
%%shell
set -e

#---------------------------------------------------#
JULIA_VERSION="1.6.0" # any version ≥ 0.7.0
JULIA_PACKAGES="IJulia DataFrames CSV Pipe"
JULIA_PACKAGES_IF_GPU="CUDA"
JULIA_NUM_THREADS=4
#---------------------------------------------------#

if [ -n "$COLAB_GPU" ] && [ -z "$(which julia)" ]; then
  # Install Julia
  JULIA_VER=`cut -d '.' -f -2 <<< "$JULIA_VERSION"`
  echo "Installing Julia $JULIA_VERSION on the current Colab Runtime..."
  BASE_URL="https://julialang-s3.julialang.org/bin/linux/x64"
  URL="$BASE_URL/$JULIA_VER/julia-$JULIA_VERSION-linux-x86_64.tar.gz"
  wget -nv $URL -O /tmp/julia.tar.gz # -nv means "not verbose"
  tar -x -f /tmp/julia.tar.gz -C /usr/local --strip-components 1
  rm /tmp/julia.tar.gz

  # Install Packages
  if [ "$COLAB_GPU" = "1" ]; then
      JULIA_PACKAGES="$JULIA_PACKAGES $JULIA_PACKAGES_IF_GPU"
  fi
  for PKG in `echo $JULIA_PACKAGES`; do
    echo "Installing Julia package $PKG..."
    julia -e 'using Pkg; pkg"add '$PKG'; precompile;"' &> /dev/null
  done

  # Install kernel and rename it to "julia"
  echo "Installing IJulia kernel..."
  julia -e 'using IJulia; IJulia.installkernel("julia", env=Dict(
      "JULIA_NUM_THREADS"=>"'"$JULIA_NUM_THREADS"'"))'
  KERNEL_DIR=`julia -e "using IJulia; print(IJulia.kerneldir())"`
  KERNEL_NAME=`ls -d "$KERNEL_DIR"/julia*`
  mv -f $KERNEL_NAME "$KERNEL_DIR"/julia  

  echo ''
  echo "Successfully installed `julia -v`!"
  echo "Please reload this page (press Ctrl+R, ⌘+R, or the F5 key) then"
  echo "jump to the 'Checking the Installation' section."
fi

Installing Julia 1.6.0 on the current Colab Runtime...
2022-02-06 11:45:02 URL:https://storage.googleapis.com/julialang2/bin/linux/x64/1.6/julia-1.6.0-linux-x86_64.tar.gz [112838927/112838927] -> "/tmp/julia.tar.gz" [1]
Installing Julia package IJulia...
Installing Julia package DataFrames...


## Checking the Installation
The `versioninfo()` function should print your Julia version and some other details about the system. Always include this output when you ask for help or file an issue about Julia.

In [1]:
versioninfo()

Julia Version 1.6.0
Commit f9720dc2eb (2021-03-24 12:55 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, broadwell)
Environment:
  JULIA_NUM_THREADS = 4
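
The same information can also be checked programmatically. The sketch below uses only Base Julia; the minimum version `v"1.6.0"` is an assumption taken from the `JULIA_VERSION` setting in the install cell, not a hard requirement of this notebook.

```julia
# Programmatic checks that complement versioninfo()
println("Julia version: ", VERSION)                 # VERSION is a VersionNumber
println("Threads available: ", Threads.nthreads())  # reflects JULIA_NUM_THREADS

# Assumed minimum version, used here only for illustration
min_version = v"1.6.0"
if VERSION < min_version
    @warn "This notebook was written for Julia $min_version or later" VERSION
end
```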


# Imports

In [2]:
using DataFrames
using CSV
using Pipe

# Step 2. Import the datasets [cars1](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv) and [cars2](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv).


In [3]:
download("https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv", "cars1.csv")
download("https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv", "cars2.csv")

"cars2.csv"

# Step 3. Assign each to a variable called cars1 and cars2

In [None]:
cars1 = CSV.read("cars1.csv", DataFrame)
cars2 = CSV.read("cars2.csv", DataFrame)
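
One plausible reason a CSV file produces extra, entirely missing columns (as `cars1` does in the output that follows) is that every row ends with trailing delimiters. The sketch below reads a small in-memory sample to illustrate this. The sample text is hypothetical and is not the real `cars1.csv`; the names of the auto-generated columns are whatever CSV.jl chooses.

```julia
using CSV, DataFrames

# Hypothetical miniature of a file whose rows all end with two trailing commas
raw = """
mpg,cylinders,car,,
18.0,8,chevrolet chevelle malibu,,
15.0,8,buick skylark 320,,
"""

# CSV.read accepts an IO source as well as a file name
demo = CSV.read(IOBuffer(raw), DataFrame)

println(names(demo))              # column names, including any auto-generated ones
println(eltype.(eachcol(demo)))   # element type of each column
```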

In [11]:
show(cars1,allcols=true)

198×14 DataFrame
 Row │ mpg      cylinders  displacement  horsepower  weight  acceleration  model  origin  car                                Column10  Column11  Column12  Column13  Column14
     │ Float64  Int64      Int64         String3     Int64   Float64       Int64  Int64   String                             Missing   Missing   Missing   Missing   Missing
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │    18.0          8           307  130           3504          12.0     70       1  chevrolet chevelle malibu           missing   missing   missing   missing

In [13]:
show(cars2,allcols=true)

200×9 DataFrame
 Row │ mpg      cylinders  displacement  horsepower  weight  acceleration  model  origin  car
     │ Float64  Int64      Int64         String3     Int64   Float64       Int64  Int64   String
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │    33.0          4            91  53            1795          17.4     76       3  honda civic
   2 │    20.0          6           225  100           3651          17.7     76       1  dodge aspen se
   3 │    18.0          6           250  78            3574          21.0     76       1  ford granada ghia
   4 │    18.5          6           250  110           3645          16.2     76       1  pontiac ventura sj
   5 │  

# Step 4. Oops, it seems our first dataset has some unnamed blank columns. Fix cars1.


In [17]:
show(cars1[!, Between(:mpg, :car)],allcols=true)

198×9 DataFrame
 Row │ mpg      cylinders  displacement  horsepower  weight  acceleration  model  origin  car
     │ Float64  Int64      Int64         String3     Int64   Float64       Int64  Int64   String
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │    18.0          8           307  130           3504          12.0     70       1  chevrolet chevelle malibu
   2 │    15.0          8           350  165           3693          11.5     70       1  buick skylark 320
   3 │    18.0          8           318  150           3436          11.0     70       1  plymouth satellite
   4 │    16.0          8           304  150           3433          12.0     70       1  amc rebel

In [18]:
# This syntax also works if you want to start from the first column and, for some reason, don't know its name
cars1 = cars1[!, Between(begin, :car)]

Row,mpg,cylinders,displacement,horsepower,weight,acceleration,model,origin
Type,Float64,Int64,Int64,String3,Int64,Float64,Int64,Int64
1,18.0,8,307,130,3504,12.0,70,1
2,15.0,8,350,165,3693,11.5,70,1
3,18.0,8,318,150,3436,11.0,70,1
4,16.0,8,304,150,3433,12.0,70,1
5,17.0,8,302,140,3449,10.5,70,1
6,15.0,8,429,198,4341,10.0,70,1
7,14.0,8,454,220,4354,9.0,70,1
8,14.0,8,440,215,4312,8.5,70,1
9,14.0,8,455,225,4425,10.0,70,1
10,15.0,8,390,190,3850,8.5,70,1
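
Selecting with `Between` relies on knowing where the blank columns start. An alternative, sketched below under the assumption that the unwanted columns contain nothing but `missing`, is to keep only the columns with at least one real value. The small `df` is a hypothetical stand-in for `cars1`.

```julia
using DataFrames

# Hypothetical stand-in for cars1 with two blank trailing columns
df = DataFrame(mpg = [18.0, 15.0],
               car = ["chevrolet chevelle malibu", "buick skylark 320"],
               Column3 = [missing, missing],
               Column4 = [missing, missing])

# Boolean mask: true for columns that contain at least one non-missing value
keep = [!all(ismissing, col) for col in eachcol(df)]
cleaned = df[!, keep]

println(names(cleaned))
```

This approach does not depend on column names or positions, but it would also drop a legitimate column that happens to be empty.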


# Step 5. What is the number of observations in each dataset?


In [19]:
size(cars1)

(198, 9)

In [20]:
size(cars2)

(200, 9)
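
`size` returns a `(rows, columns)` tuple, so the number of observations is its first element. As a minimal sketch on a hypothetical small table, the tuple can be destructured, or `nrow` and `ncol` can be used to get each dimension on its own.

```julia
using DataFrames

# Hypothetical small table standing in for cars1
df = DataFrame(mpg = [18.0, 15.0, 18.0], cylinders = [8, 8, 8])

n_obs, n_vars = size(df)   # destructure the (rows, columns) tuple
println("observations: ", n_obs, ", variables: ", n_vars)

# nrow and ncol return the two dimensions individually
println(nrow(df), " ", ncol(df))
```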

# Step 6. Join cars1 and cars2 into a single DataFrame called cars


In [21]:
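# append! adds the rows of cars2 to cars1 in place and returns cars1,
# so cars and cars1 refer to the same DataFrame afterwards
cars = append!(cars1, cars2)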

Row,mpg,cylinders,displacement,horsepower,weight,acceleration,model,origin
Type,Float64,Int64,Int64,String3,Int64,Float64,Int64,Int64
1,18.0,8,307,130,3504,12.0,70,1
2,15.0,8,350,165,3693,11.5,70,1
3,18.0,8,318,150,3436,11.0,70,1
4,16.0,8,304,150,3433,12.0,70,1
5,17.0,8,302,140,3449,10.5,70,1
6,15.0,8,429,198,4341,10.0,70,1
7,14.0,8,454,220,4354,9.0,70,1
8,14.0,8,440,215,4312,8.5,70,1
9,14.0,8,455,225,4425,10.0,70,1
10,15.0,8,390,190,3850,8.5,70,1
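
Note that `append!` modifies its first argument. If the original tables should stay untouched, `vcat` is a non-mutating alternative. The sketch below contrasts the two on hypothetical two-column tables.

```julia
using DataFrames

a = DataFrame(mpg = [18.0, 15.0], car = ["amc rebel", "buick skylark 320"])
b = DataFrame(mpg = [33.0], car = ["honda civic"])

# vcat allocates a new DataFrame; a and b keep their original rows
combined = vcat(a, b)
println(size(combined), " ", size(a))

# append! adds the rows of b to a in place and returns a
append!(a, b)
println(size(a))
```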


# Step 7. Oops, there is a column missing, called owners. Create a random number Series from 15,000 to 73,000.


In [25]:
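# Note: these values are evenly spaced between 15,000 and 73,000 and rounded up; they are not random
owners = [ceil(i) for i in range(15000, 73000, length = 398)]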

398-element Vector{Float64}:
 15000.0
 15147.0
 15293.0
 15439.0
 15585.0
 15731.0
 15877.0
 16023.0
 16169.0
 16315.0
 16461.0
 16608.0
 16754.0
     ⋮
 71393.0
 71540.0
 71686.0
 71832.0
 71978.0
 72124.0
 72270.0
 72416.0
 72562.0
 72708.0
 72854.0
 73000.0
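
The cell above builds an evenly spaced sequence rather than random values. If truly random numbers are wanted, one option is to draw integers uniformly from the range with `rand`. This is a minimal sketch: the name `owners_random` and the seed `42` are arbitrary choices made for illustration.

```julia
using Random

# Seeded generator so the sketch is reproducible; the seed value is arbitrary
rng = MersenneTwister(42)

n = 398  # number of rows after combining the two datasets (198 + 200)
owners_random = rand(rng, 15_000:73_000, n)  # uniform random integers, bounds inclusive

println(length(owners_random), " values between ",
        minimum(owners_random), " and ", maximum(owners_random))
```

Drawing from an integer range yields a `Vector{Int}`, which avoids the `Float64` values produced by `ceil` on a floating-point range.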

In [26]:
cars1.owners = owners

398-element Vector{Float64}:
 15000.0
 15147.0
 15293.0
 15439.0
 15585.0
 15731.0
 15877.0
 16023.0
 16169.0
 16315.0
 16461.0
 16608.0
 16754.0
     ⋮
 71393.0
 71540.0
 71686.0
 71832.0
 71978.0
 72124.0
 72270.0
 72416.0
 72562.0
 72708.0
 72854.0
 73000.0

In [27]:
show(cars1,allcols=true)

398×10 DataFrame
 Row │ mpg      cylinders  displacement  horsepower  weight  acceleration  model  origin  car                                owners
     │ Float64  Int64      Int64         String3     Int64   Float64       Int64  Int64   String                             Float64
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │    18.0          8           307  130           3504          12.0     70       1  chevrolet chevelle malibu          15000.0
   2 │    15.0          8           350  165           3693          11.5     70       1  buick skylark 320                  15147.0
   3 │    18.0          8           318  150           3436          11.0     70       1  plymouth satellite             