# Data Engineering - Lab 1


## Intro to table of contents

In this lab you'll be introduced to the Google Colab environment. To start this notebook, go to "Runtime" and choose "Run All". This connects Colab to a virtual machine and runs the code needed to set up your database.

You'll notice that notebooks are set up with a navigation bar on the left side of your screen. Hover over its icons to see which options are available, then click "Table of Contents" to see the major sections of this notebook.

## Database setups (do not modify)

In [1]:
# setups including getting the database

!pip -q install --upgrade ipython
!pip -q install --upgrade ipython-sql

!wget -O northwind.db https://github.com/matthewpecsok/data_engineering/raw/main/data/northwind.db

import sqlite3
con = sqlite3.connect("northwind.db")

%load_ext sql
%sql sqlite:///northwind.db

%config SqlMagic.autopandas = True
%config SqlMagic.feedback = False
%config SqlMagic.displaycon = False

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires ipython==7.34.0, but you have ipython 8.26.0 which is incompatible.
--2024-08-28 17:05:48--  https://github.com/matthewpecsok/data_engineering/raw/main/data/northwind.db
Resolving github.com (github.com)... 140.82.113.4
Connecting to github.com (github.com)|140.82.113.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/matthewpecsok/data_engineering/main/data/northwind.db [following]
--2024-08-28 17:05:48--  https://raw.githubusercontent.com/matthewpecsok/data_enginee

## Mount Google Drive and Export Your Work

In [None]:
from google.colab import drive
drive.mount('/content/drive')

## ERD of the Northwind Database


![ERD](https://github.com/matthewpecsok/data_engineering/blob/main/data/Northwind_ERD.png?raw=true)


Using %sql runs a single-line SQL statement.
Using %%sql runs a multi-line SQL statement, which is easier to read for long queries.

In [None]:
%sql select * from products limit 4

In [None]:
%%sql

select
count(1) as product_count
from
products
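
The %sql and %%sql magics are a notebook convenience. As a rough sketch of what they wrap, the same kind of query can be run with Python's built-in sqlite3 module. The in-memory database, the name demo_con, and the three sample rows are assumptions for illustration; in the lab you would connect to northwind.db instead.

```python
import sqlite3

# Hypothetical stand-in for northwind.db: an in-memory database with a tiny products table.
demo_con = sqlite3.connect(":memory:")
demo_con.execute(
    "create table products (ProductID integer primary key, ProductName text, UnitPrice real)"
)
demo_con.executemany(
    "insert into products (ProductName, UnitPrice) values (?, ?)",
    [("Chai", 18.0), ("Chang", 19.0), ("Aniseed Syrup", 10.0)],
)

# One-line query, comparable to: %sql select * from products limit 2
first_rows = demo_con.execute("select * from products limit 2").fetchall()

# Multi-line query, comparable to a %%sql cell.
product_count = demo_con.execute(
    """
    select
        count(1) as product_count
    from
        products
    """
).fetchone()[0]

print(first_rows)
print(product_count)  # 3
```

Unlike the magics with autopandas enabled, sqlite3 returns plain tuples rather than a DataFrame.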

## Lab Question 1

In [None]:
%%sql
select count(1) as unitprice_gt_18
from Products where UnitPrice > 18

## Lab Question 2

In [None]:
%%sql
select
    c.CompanyName,
    count(o.OrderID) as order_count
from
    Orders o
join
    Customers c on o.CustomerID = c.CustomerID
group by
    c.CompanyName
order by
    order_count desc
limit 5

## Lab Question 3

In [None]:
%%sql
select
    od.OrderID,
    ca.CategoryName
from
    Products p
join
    "Order Details" od on od.ProductID = p.ProductID
join
    Categories ca on ca.CategoryID = p.CategoryID
order by
    ca.CategoryID
limit 5

## Lab Question 4



In [None]:
%%sql
select
    COUNT(OrderID) AS RegionCount,
    ShipRegion AS ShipRegion
from
    Orders
group by
    ShipRegion
having
    COUNT(OrderID) > 100
order by
    RegionCount desc
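
The query above filters with having rather than where because the condition applies to each group's count, which only exists after aggregation. A minimal sketch of that behavior, assuming a hypothetical in-memory orders table with six made-up rows and a threshold of 1 instead of 100:

```python
import sqlite3

# Hypothetical sample data; the real lab uses the Orders table in northwind.db.
demo_con = sqlite3.connect(":memory:")
demo_con.execute("create table orders (OrderID integer primary key, ShipRegion text)")
demo_con.executemany(
    "insert into orders (ShipRegion) values (?)",
    [("WA",), ("WA",), ("WA",), ("OR",), ("OR",), ("ID",)],
)

# having is evaluated per group, after count() has been computed.
region_counts = demo_con.execute(
    """
    select ShipRegion, count(OrderID) as RegionCount
    from orders
    group by ShipRegion
    having count(OrderID) > 1
    order by RegionCount desc
    """
).fetchall()

print(region_counts)  # [('WA', 3), ('OR', 2)]
```

The 'ID' group is dropped because its count of 1 does not pass the having condition.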

## Lab Question 5


In [None]:
%%sql
select
    max(UnitPrice) as max_unit_price,
    min(UnitPrice) as min_unit_price,
    round(avg(UnitPrice), 2) as average_unit_price
from
    Products

## Lab Question 6

In [None]:
%%sql
select
    (od.UnitPrice * od.Quantity) as original_total,
    (od.UnitPrice * od.Quantity) - (od.UnitPrice * od.Quantity * od.Discount) as discounted_price,
    (od.UnitPrice * od.Quantity * od.Discount) as discount_amount,
    (od.Discount) as discount_percentage,
    c.CompanyName
from
    "Order Details" od
join
    Orders o on od.OrderID = o.OrderID
join
    Customers c on o.CustomerID = c.CustomerID
where
    c.CompanyName = 'Ernst Handel'
    and (od.UnitPrice * od.Quantity) > 4000;
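
The three computed columns in this query follow simple arithmetic. The sketch below works through it in Python, assuming Discount is stored as a fraction (for example 0.25 for 25%); the function name line_totals and the input values are made up for illustration.

```python
def line_totals(unit_price, quantity, discount):
    """Return (original_total, discount_amount, discounted_price) for one order line."""
    original_total = unit_price * quantity
    discount_amount = original_total * discount
    discounted_price = original_total - discount_amount
    return original_total, discount_amount, discounted_price


# Hypothetical order line: 5 units at 10.0 each with a 25% discount.
totals = line_totals(10.0, 5, 0.25)
print(totals)  # (50.0, 12.5, 37.5)
```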

## Lab Question 7

In [None]:
%%sql
select
    strftime('%Y', OrderDate) AS order_year,
    COUNT(OrderID) AS order_count
from
    Orders
group by
    order_year
order by
    order_year;
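
SQLite's strftime only understands date strings in the formats it recognizes, such as ISO 8601 (YYYY-MM-DD). The sketch below illustrates the year grouping on a hypothetical in-memory orders table with three made-up dates, and shows what happens with a string SQLite cannot parse.

```python
import sqlite3

# Hypothetical sample data with ISO 8601 date strings.
demo_con = sqlite3.connect(":memory:")
demo_con.execute("create table orders (OrderID integer primary key, OrderDate text)")
demo_con.executemany(
    "insert into orders (OrderDate) values (?)",
    [("2016-07-04",), ("2016-12-30",), ("2017-01-02",)],
)

orders_per_year = demo_con.execute(
    """
    select strftime('%Y', OrderDate) as order_year, count(OrderID) as order_count
    from orders
    group by order_year
    order by order_year
    """
).fetchall()
print(orders_per_year)  # [('2016', 2), ('2017', 1)]

# A string that is not in a recognized format yields NULL (None in Python).
unparsed_year = demo_con.execute("select strftime('%Y', '07/04/2016')").fetchone()[0]
print(unparsed_year)  # None
```

Note that strftime returns the year as text, so order_year sorts as a string.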

## Lab Question 8

In [None]:
%%sql
select
    e.EmployeeID,
    e.LastName,
    e.FirstName,
    count(o.OrderID) as order_count
from
    Employees e
join
    Orders o ON e.EmployeeID = o.EmployeeID
group by
    e.EmployeeID, e.LastName, e.FirstName
order by
    order_count desc
limit 5

## Lab Question 9

In [None]:
%%sql
select
    COUNT(*)
from
    sqlite_master
where
    type = 'table';

## Lab Question 10

In [None]:
%%sql
select count(*) as view_count
from sqlite_master
where type = 'view'
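
The sqlite_master table is SQLite's schema catalog, with one row per table, view, index, or trigger. The sketch below inspects it on a hypothetical in-memory database (the products table and cheap_products view are made up). It also shows that internal tables such as sqlite_sequence, which SQLite creates for autoincrement columns, are listed with type 'table' and so are included in a plain count.

```python
import sqlite3

# Hypothetical schema: one autoincrement table and one view.
demo_con = sqlite3.connect(":memory:")
demo_con.execute(
    "create table products (ProductID integer primary key autoincrement, UnitPrice real)"
)
demo_con.execute(
    "create view cheap_products as select * from products where UnitPrice < 10"
)

# List every catalog entry by type and name.
catalog = demo_con.execute(
    "select type, name from sqlite_master order by type, name"
).fetchall()
print(catalog)
# [('table', 'products'), ('table', 'sqlite_sequence'), ('view', 'cheap_products')]

# Count only user-defined tables by skipping names reserved for SQLite internals.
user_table_count = demo_con.execute(
    "select count(*) from sqlite_master where type = 'table' and name not like 'sqlite_%'"
).fetchone()[0]
print(user_table_count)  # 1
```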

## Lab Question 11

In [None]:
%%sql
SELECT
    COUNT(*) AS left_count
FROM
    Customers c
LEFT JOIN
    Orders o ON c.CustomerID = o.CustomerID;

## Lab Question 12

In [None]:
%%sql
SELECT
    COUNT(*) AS inner_count
FROM
    Customers c
INNER JOIN
    Orders o ON c.CustomerID = o.CustomerID;
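
The left join and inner join counts differ only by the customers that have no matching orders: a left join keeps each such customer as a single row with NULL order columns, while an inner join drops it. A minimal sketch, assuming hypothetical in-memory customers and orders tables in which the customer 'NOORD' is made up and has no orders:

```python
import sqlite3

# Hypothetical sample data: three customers, one of them without any orders.
demo_con = sqlite3.connect(":memory:")
demo_con.executescript(
    """
    create table customers (CustomerID text primary key);
    create table orders (OrderID integer primary key, CustomerID text);
    insert into customers values ('ALFKI'), ('ANATR'), ('NOORD');
    insert into orders (CustomerID) values ('ALFKI'), ('ALFKI'), ('ANATR');
    """
)

left_count = demo_con.execute(
    "select count(*) from customers c left join orders o on c.CustomerID = o.CustomerID"
).fetchone()[0]

inner_count = demo_con.execute(
    "select count(*) from customers c inner join orders o on c.CustomerID = o.CustomerID"
).fetchone()[0]

# Customers with no orders account for the difference between the two counts.
customers_without_orders = demo_con.execute(
    """
    select count(*)
    from customers c
    left join orders o on c.CustomerID = o.CustomerID
    where o.OrderID is null
    """
).fetchone()[0]

print(left_count, inner_count, customers_without_orders)  # 4 3 1
```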

## Lab Question 13

In [None]:
%sql create table CustomerFavoriteOrders (CustomerID_id integer, Orders_id integer, Customers_score real)

## Lab Question 14

In [None]:
%sql insert into CustomerFavoriteOrders values (1,1,10)
%sql insert into CustomerFavoriteOrders values (1,2,7.5)
%sql insert into CustomerFavoriteOrders values (1,3,7.5)
%sql insert into CustomerFavoriteOrders values (1,5,3.5)
%sql insert into CustomerFavoriteOrders values (1,9,5.5)

## Lab Question 15

In [None]:
%sql select * from CustomerFavoriteOrders
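
Some of the scores inserted in Lab Question 14 are fractional (for example 7.5). SQLite uses type affinity rather than strict column types, so the declared type influences how a value is stored but does not reject values of another type. The sketch below, which uses a hypothetical in-memory scores table, shows how the same values are stored in an integer column and in a real column.

```python
import sqlite3

# Hypothetical table with one integer-affinity and one real-affinity column.
demo_con = sqlite3.connect(":memory:")
demo_con.execute("create table scores (as_integer integer, as_real real)")
demo_con.executemany("insert into scores values (?, ?)", [(10, 10), (7.5, 7.5)])

# typeof() reports the storage class actually used for each stored value.
stored = demo_con.execute(
    "select as_integer, typeof(as_integer), as_real, typeof(as_real) from scores"
).fetchall()
print(stored)
# [(10, 'integer', 10.0, 'real'), (7.5, 'real', 7.5, 'real')]
```

The integer column keeps 7.5 as a real because it cannot be converted to an integer without loss, while the real column converts 10 to 10.0.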

Add additional sections as needed to complete the lab.

In [None]:
# Replace A1_Vu_Nguyen.ipynb below with your own notebook file name.
# Make sure your Google Drive is mounted before running this cell.

!cp "/content/drive/MyDrive/Colab Notebooks/A1_Vu_Nguyen.ipynb" ./
!jupyter nbconvert --to html "A1_Vu_Nguyen.ipynb"