# More Practice with Movies

In this assignment I introduce a third table in the movie database, the release_date table.  Thus in the postgresql database we now have three tables:

* moviecast
* title
* release_date

The tables have the following schema:

```
 Table "public.title"
 Column |  Type  | Modifiers 
--------+--------+-----------
 index  | bigint | 
 title  | text   | 
 year   | bigint | 


  Table "public.moviecast"
  Column   |       Type       | Modifiers 
-----------+------------------+-----------
 index     | bigint           | 
 title     | text             | 
 year      | bigint           | 
 name      | text             | 
 type      | text             | 
 character | text             | 
 n         | double precision | 



 Table "public.release_date"
 Column  |  Type  | Modifiers 
---------+--------+-----------
 index   | bigint | 
 title   | text   | 
 year    | bigint | 
 country | text   | 
 date    | date   | 
 month   | bigint |
 day     | bigint |
 dow     | bigint |
```


### Hints

* Warning, some of these queries can create very large cartesian products!   You may want to use query to help reduce the size of both relations.
* The ``njoin`` operator is more efficient with memory than cartesian product.  You should prefer njoin where you can.


In [1]:
from reframe import Relation
import warnings
warnings.filterwarnings('ignore')
#from sols2 import *

moviecast = Relation('/home/faculty/millbr02/pub/cast.csv',sep=',')
title = Relation('/home/faculty/millbr02/pub/titles.csv',sep=',')
release_date = Relation('/home/faculty/millbr02/pub/release_dates.csv',sep=',')

In [2]:
%load_ext sql


In [3]:
%sql postgresql://osmaah02:@localhost/movies

'Connected: osmaah02@movies'

## 1.  How many movies were released in each country in the year 2014?

Your result should be a table that contains the columns country and year

In [4]:
release_date.query("year == 2014").groupby("country").count("year").rename("count_year", "year")

Unnamed: 0,country,year
0,Afghanistan,3
1,Albania,30
2,Algeria,3
3,Andorra,1
4,Angola,4
5,Antigua and Barbuda,1
6,Argentina,216
7,Armenia,16
8,Aruba,26
9,Australia,230


In [5]:
%%sql

SELECT country, COUNT(year) FROM release_date
WHERE year = 2014 GROUP BY country;

152 rows affected.


country,count
Costa Rica,19
Cambodia,79
Turkey,316
Cyprus,57
Samoa,1
Slovenia,113
Vietnam,143
Kuwait,346
Jamaica,24
Antigua and Barbuda,1


## 2.  Show all of the countries in alphabetical order that released a movie starring clint eastwood after 1975

Your answer should be a single column with the names of the countries.  The country names should not be duplicated.

In [6]:
#n = 1 for a star

In [7]:
#release_date.njoin(moviecast.query("name == 'Clint Eastwood' and n == 1 and year > 1975")).\
#project(["country"]).sort("country")
release_date.njoin(moviecast.query("name == 'Brad Pitt' and n == 1 and year < 2000")).\
project(["country"]).sort("country")

Unnamed: 0,country
12,Argentina
14,Australia
100,Austria
80,Belgium
5,Brazil
168,Bulgaria
49,Canada
23,Czech Republic
6,Denmark
124,Estonia


In [8]:
%%sql

SELECT DISTINCT country FROM release_date NATURAL JOIN moviecast 
WHERE name = 'Clint Eastwood' AND n = 1 AND year > 1975 ORDER BY country;

65 rows affected.


country
Argentina
Australia
Austria
Belgium
Bolivia
Brazil
Bulgaria
Canada
Chile
Colombia


## 3.  Show the name of the lead actor/actress and title of the movie where the movie was released on a sunday in the USA during 2014.

In [9]:
#sunday = 0

In [10]:
moviecast.query("n == 1").njoin(release_date.query("year == 2015 and country == 'USA' and dow == 0")).\
project(["name", "title"]).sort(["title", "name"])

Unnamed: 0,name,title
27,Krystal Pixie Adams,$kumbagz
26,Revon Yousif,2101
6,Nicholas Ryan Gibbs,A Horse Called Bear
9,Josh Hamilton,Amok
25,Reid Warner,Area 51
35,Numa Perrier,Beautiful Destroyer
17,Jeff (VI) Newton,Becoming Jody II: The Coveting
3,Andrew (II) Cheney,Beyond the Mask
37,Lois Robbins,Blowtorch
15,Richard (V) Moss,Breaking Bad Clayton's Diary


In [11]:
%%sql

SELECT DISTINCT name, title FROM moviecast NATURAL JOIN release_date 
WHERE n = 1 AND year = 2014 AND dow = 0 AND country = 'USA'
ORDER BY name, title;

42 rows affected.


name,title
April Hollingsworth,Prosper
Ashley (VII) James,Rebound (III)
Barry Corbin,Mountain Top
Brandon Jacobs,The Grievance Group
Cabrina Collesides,The Grievance Group
Cameron Bender,Find Me
Christopher (VI) Hunt,Right to Believe
Daniel (II) Gilchrist,Ghost in da hood
Dean Cain,Holiday Miracle
Donnie Yen,Xi you ji: Da nao tian gong


## 4. Show the top 10 actors/actresses that have starred in the most movies released in the USA in the christmas season since (and including) the year 2010

Lets define the Christmas season as the months of November and December.

In [12]:
release_date.query("(month==6 | month==7 | month==8) & country=='Germany' & year>2000").\
njoin(moviecast.query("n==1")).\
groupby(['name']).count("title").sort(['count_title','name'],ascending=False).\
project(['name','count_title']).head(10)

Unnamed: 0,name,count_title
184,Dan Castellaneta,21
399,John Cleese,17
206,Denis Lavant,11
579,Michael Herbig,9
540,Mark Wahlberg,6
109,Brad Pitt,6
7,Adam Sandler,6
847,Will Smith,5
735,Sandra Bullock,5
723,Ryan Reynolds,5


In [13]:
%%sql

SELECT name FROM moviecast NATURAL JOIN release_date 
WHERE n = 1 AND month >= 6 AND month <= 8 AND year >= 2010 AND country = 'Germany'
GROUP BY name ORDER BY COUNT(title) DESC LIMIT 10;

10 rows affected.


name
John Cleese
Denis Lavant
Mark Wahlberg
Adam Sandler
Henry Cavill
Ryan Gosling
Sylvester Stallone
Paul Rudd
Ryan Reynolds
Dermot Magennis


## 5.  Show the title of all of the movies that were released in Germany before they were released in the USA in the year 2014.

In [14]:
#CARTESIAN PRODUCT TO BE USED HERE

In [15]:
release_date.query("(country == 'Germany' or country == 'USA') and year == 2014").\
cartesian_product(release_date.query("(country == 'Germany' or country == 'USA') and year == 2014")).\
query("country_x == 'Germany' and title_x == title_y and date_x < date_y").rename("title_x", "title").\
project(["title"])

Unnamed: 0,title
42726,300: Rise of an Empire
83742,A Little Chaos
99123,A Million Ways to Die in the West
357182,Big Game
370854,Billy Elliot the Musical Live
396489,Blended
430669,Boyhood
488775,Captain America: The Winter Soldier
540045,Clouds of Sils Maria
685310,Der 7bte Zwerg


In [16]:
%%sql

SELECT A.title
FROM release_date A, release_date B
WHERE A.country = 'Germany' 
  AND B.country = 'USA' 
  AND A.date < B.date 
  AND A.title = B.title 
  AND A.year = 2014 
  AND B.year = 2014;

59 rows affected.


title
Neighbors
Night at the Museum: Secret of the Tomb
Northmen - A Viking Saga
Paddington
Paranormal Activity: The Marked Ones
Phoenix (II)
Praia do Futuro
Rio 2
RoboCop
Serena


In [17]:
%%sql


SELECT name FROM moviecast natural join
        (SELECT title FROM release_date WHERE year = 2010
            EXCEPT
        SELECT title FROM release_date WHERE country = 'USA') AS A

WHERE n > 1
LIMIT 10

10 rows affected.


name
Domenico Cavallo
Peppe Cavallo
Santo Cavallo
Isidoro Chiera
Iolanda Manno
Cesare Ritorito
Bruno Timpano
Nazareno Timpano
Artemio Vallone
Geoff Bell
