# Discussion 06 Notebook

This notebook is an accompaniment to the associated discussion worksheet handout.

# Section I Pivoting and Unpivoting

## Question 1

_Pivoting._ Write a pandas expression that reshapes the following table so that its columns are the three
metal types in the `CType` column and its values are taken from the `USD` column. Also write out the
resulting table.

In [1]:
import pandas as pd
data_q1 = {
    'Item': ['Item0', 'Item0', 'Item1', 'Item1'],
    'CType': ['Gold', 'Bronze', 'Gold', 'Silver'],
    'USD': ['1$', '2$', '3$', '4$'],
    'EUR': ['1€', '2€', '3€', '4€']
}
df_q1  = pd.DataFrame(data_q1)
df_q1

Unnamed: 0,Item,CType,USD,EUR
0,Item0,Gold,1$,1€
1,Item0,Bronze,2$,2€
2,Item1,Gold,3$,3€
3,Item1,Silver,4$,4€


In [2]:
# Write your code here

q1_result = ...
q1_result

Ellipsis
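
As a rough illustration of pivoting in general, the sketch below reshapes a small, hypothetical table (the `store`, `fruit`, and `price` columns are invented for this example and are not the worksheet's data) with `DataFrame.pivot`. Combinations that do not occur in the long table show up as `NaN` in the wide one.

```python
import pandas as pd

# Hypothetical toy data (not the worksheet's table): one row per (store, fruit) pair.
toy_long = pd.DataFrame({
    "store": ["S0", "S0", "S1", "S1"],
    "fruit": ["apple", "pear", "apple", "plum"],
    "price": [1.0, 2.0, 3.0, 4.0],
})

# Each distinct value of `fruit` becomes a column; cells are filled from `price`.
# (store, fruit) pairs that never occur are left as NaN.
toy_wide = toy_long.pivot(index="store", columns="fruit", values="price")
print(toy_wide)
```

Note that `pivot` raises an error if an (index, column) pair appears more than once; `pivot_table` with an aggregation function is the usual alternative in that case.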

## Question 2

_Unpivoting._ Write a pandas expression that reshapes the following table from a _wide_ shape into a _long_ shape, using the `student` and `school` columns as identifiers and the remaining subject columns as variables.
Also write out the resulting table. Hint: use `df.melt()`.

In [3]:
data_q2 = {
    'student': ['Andy', 'Bernie', 'Cindy', 'Deb'],
    'school': ['Z', 'Y', 'Z', 'Y'],
    'english': [90, 100, 60, 150],
    'math': [60, 5, 99, 4],
    'physics': [80, 4, 100, 5]
}

df_q2 = pd.DataFrame(data_q2)
df_q2

Unnamed: 0,student,school,english,math,physics
0,Andy,Z,90,60,80
1,Bernie,Y,100,5,4
2,Cindy,Z,60,99,100
3,Deb,Y,150,4,5


In [4]:
# Write your code here
q2_result = ...
q2_result

Ellipsis
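
To show what unpivoting looks like in general, here is a minimal sketch on a hypothetical table (the `city`, `country`, `jan`, and `jul` columns, and the output names `month` and `temp_c`, are assumptions made up for this example). The identifier columns are repeated on every output row, and each remaining column contributes one row per original record.

```python
import pandas as pd

# Hypothetical toy data (not the worksheet's table): one column per month.
toy_wide = pd.DataFrame({
    "city": ["Oslo", "Lima"],
    "country": ["NO", "PE"],
    "jan": [-3, 23],
    "jul": [17, 16],
})

# id_vars stay as columns; every other column is stacked into
# a (variable, value) pair, here named `month` and `temp_c`.
toy_long = toy_wide.melt(
    id_vars=["city", "country"],
    var_name="month",
    value_name="temp_c",
)
print(toy_long)
```

A wide table with `n` rows and `k` non-identifier columns melts into `n * k` rows, which is a quick sanity check on the result.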

# Section II Window Functions

## Load in the IMDB Performance database

This is a variation of the IMDB database with keys defined. Note that this is a fairly large database, so if you run the lines below, please remember to delete the `imdb_perf_lecture` database afterwards to save space on your limited PostgreSQL server.

If you didn't load it in with a previous lecture, load in the `imdb_perf_lecture` database.

In [5]:
!ln -sf ../../proj/proj1/data .
!unzip -u data/imdbdb.zip -d data/

Archive:  data/imdbdb.zip


In [6]:
!psql postgresql://jovyan@127.0.0.1:5432/imdb -c 'SELECT pg_terminate_backend(pg_stat_activity.pid) FROM pg_stat_activity WHERE datname = current_database()  AND pid <> pg_backend_pid();'
!psql postgresql://jovyan@127.0.0.1:5432/postgres -c 'DROP DATABASE IF EXISTS imdb'
!psql postgresql://jovyan@127.0.0.1:5432/postgres -c 'CREATE DATABASE imdb'
!psql postgresql://jovyan@127.0.0.1:5432/imdb -f data/imdbdb.sql -q

 pg_terminate_backend 
----------------------
(0 rows)

DROP DATABASE
CREATE DATABASE
 set_config 
------------
 
(1 row)



In [7]:
%reload_ext sql

In [8]:
%sql postgresql://jovyan@127.0.0.1:5432/imdb

## Question 3
In Project 1, we worked with the `movie_sample` table to explore gross earnings for each movie and stored
the results in a view called `movie_gross`. Let's explore how to leverage window functions to compute
the average earnings per genre across movies. Following the view definition, write a `SELECT` query that returns the rows for the movie "Mr. & Mrs. Smith", ordered alphabetically by genre.

Create a new view called `movie_avg_genre` with the following columns: `movie_id`, `title`, `gross`, `genre`, `avg_gross_genre`.
- `gross`: The gross earnings of the given movie
- `avg_gross_genre`: The average gross earnings of movies with the same genre as the given movie. If a movie belongs to multiple genres, it should appear in multiple rows, with each row corresponding to a different genre.

Hint: Check out the movie genre info type, with `info_type_id = 3`, in `movie_info_sample`.

In [9]:
%%sql
-- Check out the movie genre info type -- 
SELECT * 
FROM movie_info_sample
WHERE info_type_id = 3
ORDER BY RANDOM()
LIMIT 5;

id,movie_id,info_type_id,info
6662599,2314214,3,Music
5781304,1740842,3,Thriller
6929777,2466888,3,Action
5743387,1716717,3,Comedy
6209983,2018552,3,Short


In [10]:
%%sql
-- Check out movie_sample -- 
SELECT * 
FROM movie_sample
ORDER BY RANDOM()
LIMIT 5;

id,title,production_year
2421468,The Trip,2012
2340826,Thaskara Lahala,2010
2371539,The Garden,2010
2472411,Val dood!,2009
2421957,The Turing Enigma,2011


In [11]:
%%sql
-- How we created movie_gross in Project 1, Question 3a --
DROP VIEW IF EXISTS movie_gross;
CREATE VIEW movie_gross AS

WITH cleaned as (
    SELECT
        CAST(REGEXP_REPLACE(SUBSTRING(info, '[0-9,]+'), ',', '', 'g') AS float) AS gross,
        movie_id,
        title
    FROM
        movie_info_sample,
        movie_sample
    WHERE
        movie_id = movie_sample.id AND
        info_type_id = 107 AND
        info LIKE '%(USA)%' and info LIKE '$%'
)

SELECT
    max(gross) AS gross,
    movie_id,
    title
FROM
    cleaned
GROUP BY
    movie_id,
    title
ORDER BY gross DESC;

SELECT * FROM movie_gross
LIMIT 5;

gross,movie_id,title
760507625.0,1704289,Avatar
658672302.0,2438179,Titanic
623357910.0,2346436,The Avengers
534858444.0,2360583,The Dark Knight
460935665.0,2310522,Star Wars


In [12]:
%%sql
DROP VIEW IF EXISTS movie_avg_genre;
CREATE VIEW movie_avg_genre AS 
-- Write your code here --
...

SELECT * FROM movie_avg_genre
WHERE title = 'Mr. & Mrs. Smith'
ORDER BY genre;

movie_id,title,gross,genre,average_genre
2132092,Mr. & Mrs. Smith,186336103.0,Action,42123826.131625965
2132092,Mr. & Mrs. Smith,186336103.0,Comedy,21583843.81801513
2132092,Mr. & Mrs. Smith,186336103.0,Romance,18470817.08139977
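
The general shape of a per-group average computed with a window function can be sketched as follows. This is only an illustration: it uses Python's built-in `sqlite3` module with an in-memory database instead of the PostgreSQL server, and the tables `toy_gross` and `toy_genre` and their rows are hypothetical, not the IMDB data. It assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# In-memory SQLite database with hypothetical toy tables (not the IMDB schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE toy_gross (movie_id INTEGER, title TEXT, gross REAL);
    CREATE TABLE toy_genre (movie_id INTEGER, genre TEXT);
    INSERT INTO toy_gross VALUES (1, 'A', 100.0), (2, 'B', 300.0), (3, 'C', 50.0);
    INSERT INTO toy_genre VALUES (1, 'Action'), (1, 'Comedy'), (2, 'Action'), (3, 'Comedy');
""")

# AVG(...) OVER (PARTITION BY genre) keeps one output row per (movie, genre)
# pair and attaches the genre-wide average to each row, unlike GROUP BY,
# which would collapse each genre to a single row.
rows = conn.execute("""
    SELECT g.movie_id, g.title, g.gross, t.genre,
           AVG(g.gross) OVER (PARTITION BY t.genre) AS avg_gross_genre
    FROM toy_gross AS g
    JOIN toy_genre AS t ON g.movie_id = t.movie_id
    ORDER BY g.movie_id, t.genre
""").fetchall()

for row in rows:
    print(row)
conn.close()
```

Movie `A` belongs to two genres in the toy data, so it appears in two rows, each carrying the average for a different genre.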


# Section IV [Optional] 

Consider the following table:

In [13]:
data_iv = {
    "Year": [2019, 2019, 2020, 2020, 2020, 2021, 2022, 2022],
    "College": ["L&S", "CNR", "Haas", "COE", "COC", "CNR", "L&S", "COE"],
    "Student Name": ["Lakshya", "Shadaj", "Cathy", "Evelyn", "Luke", "Shreya", "Audrey", "Alan"]
}
df_iv = pd.DataFrame(data_iv)
df_iv

Unnamed: 0,Year,College,Student Name
0,2019,L&S,Lakshya
1,2019,CNR,Shadaj
2,2020,Haas,Cathy
3,2020,COE,Evelyn
4,2020,COC,Luke
5,2021,CNR,Shreya
6,2022,L&S,Audrey
7,2022,COE,Alan


## Question 9
Transform the above relation into a matrix, using `College` as columns and `Year` as rows.

* Transforming __Matrix into Relation__ using __melt()__
* Transforming __Relation into Matrix__ using __pivot_table()__ / __pivot()__
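
The two directions listed above can be sketched as a round trip on a small, hypothetical relation (the `team`, `season`, and `captain` columns are invented for this example and are not the worksheet's table). The sketch uses `pivot()` because each (row, column) pair occurs at most once here; with duplicates, `pivot_table()` with an aggregation function would be needed instead.

```python
import pandas as pd

# Hypothetical toy relation: one row per (season, team) fact.
relation = pd.DataFrame({
    "team": ["red", "red", "blue"],
    "season": [1, 2, 1],
    "captain": ["Ana", "Bo", "Cy"],
})

# Relation -> matrix: seasons become rows, teams become columns.
matrix = relation.pivot(index="season", columns="team", values="captain")
print(matrix)

# Matrix -> relation: melt the team columns back into rows, then drop
# the empty cells that the matrix form introduced.
back = (
    matrix.reset_index()
    .melt(id_vars=["season"], var_name="team", value_name="captain")
    .dropna(subset=["captain"])
    .reset_index(drop=True)
)
print(back)
```

The matrix form contains a missing cell for every (season, team) combination absent from the relation, which is why the `dropna` step is needed to recover the original facts.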

In [15]:
# Write your code here
pivot_table_q9 = ...
pivot_table_q9

Ellipsis