# Ex2 - Getting and Knowing your Data

This time we are going to pull data directly from the internet.
Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.

### Step 1. Import the necessary libraries

In [None]:
using DotEnv
using Pkg

DotEnv.load!()
path = ENV["ENV_PATH"]
Pkg.activate(path)
    
using CSV
using DataFrames
using Downloads
using Statistics

### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv). 

### Step 3. Assign it to a variable called chipo.

In [2]:
file = Downloads.download("https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv")
chipo = CSV.read(file, DataFrame, delim='\t');

### Step 4. See the first 10 entries

In [3]:
first(chipo, 10)

| Row | order_id | quantity | item_name | choice_description | item_price |
|---|---|---|---|---|---|
| | Int64 | Int64 | String | String | String7 |
| 1 | 1 | 1 | Chips and Fresh Tomato Salsa | | $2.39 |
| 2 | 1 | 1 | Izze | [Clementine] | $3.39 |
| 3 | 1 | 1 | Nantucket Nectar | [Apple] | $3.39 |
| 4 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa | | $2.39 |
| 5 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans, Rice, Cheese, Sour Cream]] | $16.98 |
| 6 | 3 | 1 | Chicken Bowl | [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sour Cream, Guacamole, Lettuce]] | $10.98 |
| 7 | 3 | 1 | Side of Chips | | $1.69 |
| 8 | 4 | 1 | Steak Burrito | [Tomatillo Red Chili Salsa, [Fajita Vegetables, Black Beans, Pinto Beans, Cheese, Sour Cream, Guacamole, Lettuce]] | $11.75 |
| 9 | 4 | 1 | Steak Soft Tacos | [Tomatillo Green Chili Salsa, [Pinto Beans, Cheese, Sour Cream, Lettuce]] | $9.25 |
| 10 | 5 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Black Beans, Pinto Beans, Cheese, Sour Cream, Lettuce]] | $9.25 |


### Step 5. What is the number of observations in the dataset?

In [4]:
n_observation = size(chipo, 1)
@show n_observation;

n_observation = 4622


### Step 6. What is the number of columns in the dataset?

In [5]:
n_columns = size(chipo, 2)
@show n_columns;

n_columns = 5


### Step 7. Print the name of all the columns.

In [6]:
column_names = names(chipo)
@show column_names

column_names = ["order_id", "quantity", "item_name", "choice_description", "item_price"]


5-element Vector{String}:
 "order_id"
 "quantity"
 "item_name"
 "choice_description"
 "item_price"

### Step 8. Which was the most-ordered item? 

In [7]:
most_ordered_item = combine(
    groupby(
        chipo, :item_name
    ), :quantity => sum => :total_ordered
)

most_ordered_item = sort(most_ordered_item, :total_ordered, rev=true)[1, :]

| Row | item_name | total_ordered |
|---|---|---|
| | String | Int64 |
| 1 | Chicken Bowl | 761 |
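
To make it clearer what the `groupby` + `combine(:quantity => sum)` pipeline computes, here is a minimal sketch of the same aggregation in plain Julia. The `orders` vector is made-up illustration data, not rows from the dataset, and the variable names are assumptions of this sketch.

```julia
# Hypothetical (item_name, quantity) pairs; not taken from chipotle.tsv.
orders = [("Chicken Bowl", 2), ("Izze", 1), ("Chicken Bowl", 1), ("Side of Chips", 1)]

# Sum the quantities per item, mirroring groupby(:item_name) + sum(:quantity).
totals = Dict{String, Int}()
for (item, qty) in orders
    totals[item] = get(totals, item, 0) + qty
end

# findmax on a Dict returns (largest value, key of that value).
top_quantity, top_item = findmax(totals)
@show top_item top_quantity;
```

The DataFrames version in the cell above does the same thing per group and then sorts, which also gives the full ranking rather than only the top entry.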


### Step 9. For the most-ordered item, how many items were ordered?

In [8]:
chicken_bowl_total_ordered = sum(chipo[chipo[!, :item_name] .== "Chicken Bowl", "quantity"])
@show chicken_bowl_total_ordered;

chicken_bowl_total_ordered = 761


### Step 10. What was the most ordered item in the choice_description column?

In [9]:
most_ordered_cd = sort(
    combine(
        groupby(
            chipo, :choice_description
        ), :quantity => sum => "total_ordered"
    ), :total_ordered, rev=true
)

first(most_ordered_cd, 3)

| Row | choice_description | total_ordered |
|---|---|---|
| | String | Int64 |
| 1 | | 1382 |
| 2 | [Diet Coke] | 159 |
| 3 | [Coke] | 143 |


### Step 11. How many items were ordered in total?

In [10]:
total_items_ordered = sum(chipo[!, :quantity])
@show total_items_ordered;

total_items_ordered = 4972


### Step 12. Turn the item price into a float

#### Step 12.a. Check the item price type

In [11]:
typeof(chipo[!, "item_price"])

PooledVector{String7, UInt32, Vector{UInt32}} (alias for PooledArrays.PooledArray{String7, UInt32, 1, Array{UInt32, 1}})

#### Step 12.b. Change the type of item price to Float64

In [12]:
# Drop the '$' sign, then any surrounding spaces, then parse the rest as Float64
chipo[!, "item_price"] = strip.(chipo[!, "item_price"], '$')
chipo[!, "item_price"] = strip.(chipo[!, "item_price"], ' ')
chipo[!, "item_price"] = parse.(Float64, chipo[!, "item_price"]);
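
The conversion can be tried on a few literal values without downloading the dataset. This is a minimal sketch, assuming the raw strings look like `$2.39` and may carry a trailing space (which is what the second `strip` call guards against). The `raw_prices` vector is made up for illustration.

```julia
# Hypothetical sample mimicking the raw item_price strings.
raw_prices = ["\$2.39 ", "\$16.98 ", "\$10.98 "]

# strip accepts a collection of characters, so '$' and ' ' can be removed in one pass.
# Ref(...) keeps the character list from being broadcast element by element.
clean_prices = strip.(raw_prices, Ref(['$', ' ']))

# Parse each cleaned substring as a Float64.
prices = parse.(Float64, clean_prices)
@show prices;
```

Passing both characters at once is only a stylistic alternative; the two-step version in the cell above gives the same result on this kind of input.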

#### Step 12.c. Check the item price type

In [13]:
typeof(chipo[!, "item_price"])

Vector{Float64} (alias for Array{Float64, 1})

### Step 13. How much was the revenue for the period in the dataset?

In [14]:
total_revenue = sum(chipo[!, :quantity] .* chipo[!, :item_price])
@show total_revenue;

total_revenue = 39237.02


### Step 14. How many orders were made in the period?

In [15]:
# One row per order_id; the number of rows is the number of orders
orders = combine(
    groupby(
        chipo, :order_id
    ), nrow => :n_line_items
)

total_orders = nrow(orders)

1834
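
Since every line item repeats its `order_id`, counting groups is the same as counting distinct IDs. A minimal sketch of that shortcut, using the `length(unique(...))` pattern that Step 16 applies to `item_name`; the `order_ids` vector is hypothetical sample data:

```julia
# Hypothetical order_id column: three line items in order 1, one in order 2, two in order 3.
order_ids = [1, 1, 1, 2, 3, 3]

# Each distinct ID is one order, regardless of how many line items it has.
n_orders = length(unique(order_ids))
@show n_orders;
```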

### Step 15. What is the average revenue amount per order?

In [56]:
# Compute revenue per line item, sum it per order, then average across orders
using Statistics

chipo[!, "revenue"] = chipo[!, :quantity] .* chipo[!, :item_price]
grouped = combine(
    groupby(
        chipo, :order_id
    ), "revenue" => sum => :revenue_sum
)
average_revenue = mean(grouped[!, :revenue_sum])
@show average_revenue;

average_revenue = 21.39423118865868
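
The same calculation can be sketched on a handful of rows to see the two aggregation levels: first a sum per order, then a mean over orders. The `line_items` data below is hypothetical and the names are assumptions of this sketch, not part of the exercise.

```julia
using Statistics

# Hypothetical line items as (order_id, quantity, item_price).
line_items = [(1, 1, 2.5), (1, 2, 3.0), (2, 1, 10.0), (3, 2, 4.25)]

# Level 1: total revenue per order.
revenue_per_order = Dict{Int, Float64}()
for (order_id, qty, price) in line_items
    revenue_per_order[order_id] = get(revenue_per_order, order_id, 0.0) + qty * price
end

# Level 2: mean of the per-order totals.
average_revenue = mean(values(revenue_per_order))

# Cross-check: total revenue divided by the number of distinct orders.
cross_check = sum(q * p for (_, q, p) in line_items) / length(revenue_per_order)
@show average_revenue cross_check;
```

The cross-check also applies to the notebook's own numbers: dividing the total revenue from Step 13 by the order count from Step 14 should give the same average.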


### Step 16. How many different items are sold?

In [46]:
n_different_items_sold = length(unique(chipo[!, "item_name"]))
@show n_different_items_sold;

n_different_items_sold = 50
