# Getting Started with Julia in Colab/Jupyter
You can run this notebook either in Google Colab or with Jupyter on your own machine.

## Running on Google Colab
1. Work on a copy of this notebook: _File_ > _Save a copy in Drive_ (you will need a Google account). Alternatively, you can download the notebook using _File_ > _Download .ipynb_, then upload it to [Colab](https://colab.research.google.com/).
2. Execute the following cell (click on it and press Ctrl+Enter) to install Julia, IJulia (the Jupyter kernel for Julia) and other packages. You can update `JULIA_VERSION` and the other parameters, if you know what you're doing. Installation takes 2-3 minutes.
3. Reload this page (press Ctrl+R, or ⌘+R, or the F5 key) and continue to the _Checking the Installation_ section.

* _Note_: If your Colab Runtime gets reset (e.g., due to inactivity), repeat steps 2 and 3.

In [None]:
%%shell
set -e

#---------------------------------------------------#
JULIA_VERSION="1.6.0" # any version ≥ 0.7.0
JULIA_PACKAGES="IJulia DataFrames CSV Pipe"
JULIA_PACKAGES_IF_GPU="CUDA"
JULIA_NUM_THREADS=4
#---------------------------------------------------#

if [ -n "$COLAB_GPU" ] && [ -z "$(which julia)" ]; then
  # Install Julia
  JULIA_VER=`cut -d '.' -f -2 <<< "$JULIA_VERSION"`
  echo "Installing Julia $JULIA_VERSION on the current Colab Runtime..."
  BASE_URL="https://julialang-s3.julialang.org/bin/linux/x64"
  URL="$BASE_URL/$JULIA_VER/julia-$JULIA_VERSION-linux-x86_64.tar.gz"
  wget -nv $URL -O /tmp/julia.tar.gz # -nv means "not verbose"
  tar -x -f /tmp/julia.tar.gz -C /usr/local --strip-components 1
  rm /tmp/julia.tar.gz

  # Install Packages
  if [ "$COLAB_GPU" = "1" ]; then
      JULIA_PACKAGES="$JULIA_PACKAGES $JULIA_PACKAGES_IF_GPU"
  fi
  for PKG in `echo $JULIA_PACKAGES`; do
    echo "Installing Julia package $PKG..."
    julia -e 'using Pkg; pkg"add '$PKG'; precompile;"' &> /dev/null
  done

  # Install kernel and rename it to "julia"
  echo "Installing IJulia kernel..."
  julia -e 'using IJulia; IJulia.installkernel("julia", env=Dict(
      "JULIA_NUM_THREADS"=>"'"$JULIA_NUM_THREADS"'"))'
  KERNEL_DIR=`julia -e "using IJulia; print(IJulia.kerneldir())"`
  KERNEL_NAME=`ls -d "$KERNEL_DIR"/julia*`
  mv -f $KERNEL_NAME "$KERNEL_DIR"/julia  

  echo ''
  echo "Successfully installed `julia -v`!"
  echo "Please reload this page (press Ctrl+R, ⌘+R, or the F5 key) then"
  echo "jump to the 'Checking the Installation' section."
fi

Installing Julia 1.6.0 on the current Colab Runtime...
2022-01-04 12:47:32 URL:https://storage.googleapis.com/julialang2/bin/linux/x64/1.6/julia-1.6.0-linux-x86_64.tar.gz [112838927/112838927] -> "/tmp/julia.tar.gz" [1]
Installing Julia package IJulia...
Installing Julia package DataFrames...
Installing Julia package CSV...
Installing Julia package Pipe...


## Checking the Installation
The `versioninfo()` function should print your Julia version and some other information about the system. If you ever ask for help or file an issue about Julia, always include this output.

In [1]:
versioninfo()

Julia Version 1.6.0
Commit f9720dc2eb (2021-03-24 12:55 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, broadwell)
Environment:
  JULIA_NUM_THREADS = 4
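
The installer cell registers the kernel with `JULIA_NUM_THREADS` set, so a quick sanity check is to ask the running session how many threads it actually has. The snippet below is a minimal sketch that uses only Base Julia; on a kernel installed as above it would be expected to report the value chosen in the installer, but that depends on the runtime.

```julia
# Number of threads available to this Julia session
n = Threads.nthreads()

# The environment variable is only set if the kernel (or shell) exported it
requested = get(ENV, "JULIA_NUM_THREADS", "unset")

println("Julia ", VERSION, " is running with ", n, " thread(s); ",
        "JULIA_NUM_THREADS = ", requested)
```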


# Step 1. Import the necessary libraries

In [2]:
using CSV
using DataFrames
using Pipe
using Statistics

# Step 2. Import the [chipotle dataset](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv).

In [3]:
URL = "https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv"
download(URL, "chipotle.tsv")

"chipotle.tsv"

# Step 3. Assign it to a variable called chipo.

In [4]:
chipo = CSV.read("chipotle.tsv", DataFrame, delim = "\t")

Row,order_id,quantity,item_name,choice_description
,Int64,Int64,String,String
1,1,1,Chips and Fresh Tomato Salsa,
2,1,1,Izze,[Clementine]
3,1,1,Nantucket Nectar,[Apple]
4,1,1,Chips and Tomatillo-Green Chili Salsa,
5,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans, Rice, Cheese, Sour Cream]]"
6,3,1,Chicken Bowl,"[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sour Cream, Guacamole, Lettuce]]"
7,3,1,Side of Chips,
8,4,1,Steak Burrito,"[Tomatillo Red Chili Salsa, [Fajita Vegetables, Black Beans, Pinto Beans, Cheese, Sour Cream, Guacamole, Lettuce]]"
9,4,1,Steak Soft Tacos,"[Tomatillo Green Chili Salsa, [Pinto Beans, Cheese, Sour Cream, Lettuce]]"
10,5,1,Steak Burrito,"[Fresh Tomato Salsa, [Rice, Black Beans, Pinto Beans, Cheese, Sour Cream, Lettuce]]"
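
To see what the `delim` keyword does without downloading anything, the same call can be pointed at an in-memory buffer. This is a sketch on a hypothetical three-row sample laid out like a tab-separated file; the `"NULL"` placeholder and the `missingstring` setting are assumptions made for the sample, not a statement about the real dataset.

```julia
using CSV, DataFrames

# Hypothetical sample: tab-separated, with "NULL" standing in for a missing value
sample = "order_id\tquantity\titem_name\tchoice_description\titem_price\n" *
         "1\t1\tIzze\t[Clementine]\t\$3.39\n" *
         "1\t1\tSide of Chips\tNULL\t\$1.69\n" *
         "2\t2\tChicken Bowl\t[Rice]\t\$16.98\n"

# delim sets the column separator; missingstring says which text is read as `missing`
df = CSV.read(IOBuffer(sample), DataFrame; delim = '\t', missingstring = "NULL")
```

Column types are inferred from the data, so the price column stays a string here because of the leading `$`.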


# Step 4. See the first 10 entries

In [5]:
first(chipo, 10)

Row,order_id,quantity,item_name,choice_description
,Int64,Int64,String,String
1,1,1,Chips and Fresh Tomato Salsa,
2,1,1,Izze,[Clementine]
3,1,1,Nantucket Nectar,[Apple]
4,1,1,Chips and Tomatillo-Green Chili Salsa,
5,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans, Rice, Cheese, Sour Cream]]"
6,3,1,Chicken Bowl,"[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sour Cream, Guacamole, Lettuce]]"
7,3,1,Side of Chips,
8,4,1,Steak Burrito,"[Tomatillo Red Chili Salsa, [Fajita Vegetables, Black Beans, Pinto Beans, Cheese, Sour Cream, Guacamole, Lettuce]]"
9,4,1,Steak Soft Tacos,"[Tomatillo Green Chili Salsa, [Pinto Beans, Cheese, Sour Cream, Lettuce]]"
10,5,1,Steak Burrito,"[Fresh Tomato Salsa, [Rice, Black Beans, Pinto Beans, Cheese, Sour Cream, Lettuce]]"


# Step 5. What is the number of observations in the dataset?

In [6]:
nrow(chipo)

4622

# Step 6. What is the number of columns in the dataset?

In [7]:
ncol(chipo)

5
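
Both dimensions can also be read in one call with `size`, which returns a `(rows, columns)` tuple. A minimal sketch on a small hypothetical table:

```julia
using DataFrames

# Hypothetical toy table with 3 rows and 2 columns
df = DataFrame(order_id = [1, 1, 2], quantity = [1, 2, 1])

rows, cols = size(df)      # destructure the (rows, columns) tuple
only_rows  = size(df, 1)   # same value as nrow(df)
only_cols  = size(df, 2)   # same value as ncol(df)
```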

# Step 7. Print the names of all the columns.

In [8]:
names(chipo)

5-element Vector{String}:
 "order_id"
 "quantity"
 "item_name"
 "choice_description"
 "item_price"

In [9]:
gchipo = groupby(chipo, :item_name)

Row,order_id,quantity,item_name,choice_description,item_price
,Int64,Int64,String,String,String7
1,1,1,Chips and Fresh Tomato Salsa,,$2.39
2,13,1,Chips and Fresh Tomato Salsa,,$2.39
3,25,1,Chips and Fresh Tomato Salsa,,$2.39
4,39,1,Chips and Fresh Tomato Salsa,,$2.95
5,82,1,Chips and Fresh Tomato Salsa,,$2.95
6,86,1,Chips and Fresh Tomato Salsa,,$2.95
7,104,1,Chips and Fresh Tomato Salsa,,$2.95
8,115,1,Chips and Fresh Tomato Salsa,,$2.39
9,116,1,Chips and Fresh Tomato Salsa,,$2.95
10,131,1,Chips and Fresh Tomato Salsa,,$2.39

Row,order_id,quantity,item_name,choice_description
,Int64,Int64,String,String
1,1500,1,Carnitas Salad,"[[Fresh Tomato Salsa (Mild), Roasted Chili Corn Salsa (Medium)], [Black Beans, Rice, Cheese, Sour Cream]]"


In [10]:
combine(gchipo,nrow)

Row,item_name,nrow
,String,Int64
1,Chips and Fresh Tomato Salsa,110
2,Izze,20
3,Nantucket Nectar,27
4,Chips and Tomatillo-Green Chili Salsa,31
5,Chicken Bowl,726
6,Side of Chips,101
7,Steak Burrito,368
8,Steak Soft Tacos,55
9,Chips and Guacamole,479
10,Chicken Crispy Tacos,47
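
The two cells above split the table into one sub-table per item and then count the rows in each. The same split-and-count pattern is easier to follow on a small hypothetical table; the column name `n_rows` is chosen here only for illustration.

```julia
using DataFrames

# Hypothetical toy orders
orders = DataFrame(item_name = ["Izze", "Chicken Bowl", "Izze", "Chicken Bowl", "Chips"],
                   quantity  = [1, 2, 1, 1, 3])

g = groupby(orders, :item_name)        # GroupedDataFrame: one sub-table per item
counts = combine(g, nrow => :n_rows)   # one output row per group, with its row count

# Turn the result into a lookup table for convenience
count_by_item = Dict(counts.item_name .=> counts.n_rows)
```

Note that `nrow` counts rows (order lines), not the quantities ordered.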


# Step 9. Which was the most-ordered item?

In [11]:
chipo_groupby = @pipe chipo |> groupby(_, :item_name) |> combine(_, :quantity => sum=>:total_quantity)
sort(chipo_groupby, :total_quantity, rev = true)

Row,item_name,total_quantity
,String,Int64
1,Chicken Bowl,761
2,Chicken Burrito,591
3,Chips and Guacamole,506
4,Steak Burrito,386
5,Canned Soft Drink,351
6,Chips,230
7,Steak Bowl,221
8,Bottled Water,211
9,Chips and Fresh Tomato Salsa,130
10,Canned Soda,126


# Step 10. For the most-ordered item, how many items were ordered?
This is the same computation as in Step 9: the first row of the sorted table gives the total quantity, 761 for Chicken Bowl.
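
If only the top item and its total are needed, the first row of the sorted result can be pulled out directly. A sketch on a hypothetical toy table, using the same `groupby`, `combine` and `sort` calls as in the step above:

```julia
using DataFrames

# Hypothetical toy orders
orders = DataFrame(item_name = ["Izze", "Chicken Bowl", "Izze", "Chicken Bowl", "Chips"],
                   quantity  = [1, 2, 1, 3, 1])

totals = combine(groupby(orders, :item_name), :quantity => sum => :total_quantity)

# first(df) returns the first row as a DataFrameRow
top = first(sort(totals, :total_quantity, rev = true))

top_item  = top.item_name
top_total = top.total_quantity
```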

# Step 11. What was the most-ordered item in the choice_description column?

In [12]:
chipo

Row,order_id,quantity,item_name,choice_description
,Int64,Int64,String,String
1,1,1,Chips and Fresh Tomato Salsa,
2,1,1,Izze,[Clementine]
3,1,1,Nantucket Nectar,[Apple]
4,1,1,Chips and Tomatillo-Green Chili Salsa,
5,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans, Rice, Cheese, Sour Cream]]"
6,3,1,Chicken Bowl,"[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sour Cream, Guacamole, Lettuce]]"
7,3,1,Side of Chips,
8,4,1,Steak Burrito,"[Tomatillo Red Chili Salsa, [Fajita Vegetables, Black Beans, Pinto Beans, Cheese, Sour Cream, Guacamole, Lettuce]]"
9,4,1,Steak Soft Tacos,"[Tomatillo Green Chili Salsa, [Pinto Beans, Cheese, Sour Cream, Lettuce]]"
10,5,1,Steak Burrito,"[Fresh Tomato Salsa, [Rice, Black Beans, Pinto Beans, Cheese, Sour Cream, Lettuce]]"


In [13]:
choice_description_groupby = @pipe chipo |> groupby(_, :choice_description) |> combine(_, :quantity => sum => :total_quantity)
sort(choice_description_groupby, :total_quantity, rev = true)

Row,choice_description
,String
1,
2,[Diet Coke]
3,[Coke]
4,[Sprite]
5,"[Fresh Tomato Salsa, [Rice, Black Beans, Cheese, Sour Cream, Lettuce]]"
6,"[Fresh Tomato Salsa, [Rice, Black Beans, Cheese, Sour Cream]]"
7,"[Fresh Tomato Salsa, [Rice, Black Beans, Cheese, Sour Cream, Guacamole, Lettuce]]"
8,[Lemonade]
9,"[Fresh Tomato Salsa (Mild), [Pinto Beans, Rice, Cheese, Sour Cream]]"
10,[Coca Cola]
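
The first row of the result has an empty choice description, which corresponds to items ordered without any options. One way to keep such rows from topping the ranking is to drop them before grouping. The sketch below assumes the empty entries are stored as `missing`, which may not match how the real file was parsed; if they hold a placeholder string instead, a filter on that value would be needed.

```julia
using DataFrames

# Hypothetical toy data; `missing` marks items ordered without options
orders = DataFrame(choice_description = ["[Diet Coke]", missing, "[Coke]", "[Diet Coke]", missing],
                   quantity           = [1, 4, 1, 2, 3])

# Keep only rows that actually have a choice description
with_choice = dropmissing(orders, :choice_description)

totals = combine(groupby(with_choice, :choice_description),
                 :quantity => sum => :total_quantity)
sort!(totals, :total_quantity, rev = true)
```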


# Step 12. How many items were ordered in total?

In [14]:
sum(chipo[!,:quantity])

4972

# Step 13. Turn the item price into a float

In [15]:
transform!(chipo, :item_price => ByRow(x -> parse(Float64, x[2:end])) => :item_price)

Row,order_id,quantity,item_name,choice_description
,Int64,Int64,String,String
1,1,1,Chips and Fresh Tomato Salsa,
2,1,1,Izze,[Clementine]
3,1,1,Nantucket Nectar,[Apple]
4,1,1,Chips and Tomatillo-Green Chili Salsa,
5,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans, Rice, Cheese, Sour Cream]]"
6,3,1,Chicken Bowl,"[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sour Cream, Guacamole, Lettuce]]"
7,3,1,Side of Chips,
8,4,1,Steak Burrito,"[Tomatillo Red Chili Salsa, [Fajita Vegetables, Black Beans, Pinto Beans, Cheese, Sour Cream, Guacamole, Lettuce]]"
9,4,1,Steak Soft Tacos,"[Tomatillo Green Chili Salsa, [Pinto Beans, Cheese, Sour Cream, Lettuce]]"
10,5,1,Steak Burrito,"[Fresh Tomato Salsa, [Rice, Black Beans, Pinto Beans, Cheese, Sour Cream, Lettuce]]"


In [16]:
chipo.item_price

4622-element PooledArrays.PooledVector{Float64, UInt32, SentinelArrays.ChainedVector{UInt32, Vector{UInt32}}}:
  2.39
  3.39
  3.39
  2.39
 16.98
 10.98
  1.69
 11.75
  9.25
  9.25
  4.45
  8.75
  8.75
  ⋮
 11.75
 11.25
  9.25
  2.15
  1.5
  8.75
  4.45
 11.75
 11.75
 11.25
  8.75
  8.75
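
The slice `x[2:end]` works because every price starts with a single `$` character. A slightly more defensive variant removes the symbol by value and trims surrounding whitespace before parsing. The helper name `parse_price` and the sample strings are hypothetical.

```julia
# Hypothetical helper: remove "$", trim whitespace, then parse as Float64
parse_price(s) = parse(Float64, strip(replace(s, "\$" => "")))

# Hypothetical sample prices, some with trailing whitespace
prices = ["\$2.39 ", "\$16.98 ", "\$1.69"]

parsed = parse_price.(prices)   # broadcast the helper over the vector
```

With a DataFrame, the same helper could be passed to `ByRow` in place of the anonymous function used above.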

# Step 14. How much was the revenue for the period in the dataset?

In [17]:
sum(chipo.quantity .* chipo.item_price)

39237.02

# Step 15. How many orders were made in the period?

In [18]:
order_counts = nrow(combine(groupby(chipo, :order_id), nrow => :count_per_order))

1834
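
Counting groups via `combine` works, but the number of distinct values in a column can also be obtained more directly. A sketch on a hypothetical toy table showing two equivalent alternatives:

```julia
using DataFrames

# Hypothetical toy table: three distinct orders spread over six rows
orders = DataFrame(order_id  = [1, 1, 2, 3, 3, 3],
                   item_name = ["Izze", "Chips", "Izze", "Chips", "Chicken Bowl", "Izze"])

n_orders = length(unique(orders.order_id))       # distinct values in the column
n_groups = length(groupby(orders, :order_id))    # number of groups in the GroupedDataFrame
```

The same pattern applies to any other column whose distinct values need to be counted.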

# Step 16. What is the average revenue amount per order?

In [31]:
chipo[!, :revenue] = chipo.item_price .* chipo.quantity
grouped_by_order_id = combine(groupby(chipo, :order_id),:revenue=>sum)
mean(grouped_by_order_id.revenue_sum)

21.39423118865868
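
As a cross-check, the average per order should equal total revenue divided by the number of orders; with the figures above, 39237.02 / 1834 ≈ 21.394. The sketch below reproduces both routes on a small hypothetical table where the arithmetic is easy to follow by hand.

```julia
using DataFrames, Statistics

# Hypothetical toy orders: order 1 totals 1*2.0 + 2*3.0 = 8.0, order 2 totals 10.0
orders = DataFrame(order_id   = [1, 1, 2],
                   quantity   = [1, 2, 1],
                   item_price = [2.0, 3.0, 10.0])

orders.revenue = orders.quantity .* orders.item_price

# Route 1: sum revenue within each order, then average the order totals
per_order = combine(groupby(orders, :order_id), :revenue => sum => :order_total)
avg_per_order = mean(per_order.order_total)

# Route 2: total revenue divided by the number of distinct orders
avg_check = sum(orders.revenue) / length(unique(orders.order_id))
```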

# Step 17. How many different items are sold?

In [33]:
num_items = nrow(combine(groupby(chipo, :item_name), nrow => :count_item_name))

50