# Python and PostgreSQL project part IV: 
# Using SQLAlchemy to implement SELECT queries on imported CSV data.

# Overview:
## This notebook implements various SELECT queries, such as finding the average IGN score for each release year in the database, the average IGN score for each game genre, etc.

# What is the average IGN score for the entire database/dataset?

## I.e., run a SELECT query that returns the sample average of all scores in the dataset. The equivalent SQL is: <<< SELECT AVG(score) FROM ign_ratings;

In [79]:
#import SQLAlchemy library (written for the SQLAlchemy 1.4/2.0 API)
from sqlalchemy import create_engine
from sqlalchemy import Table, Column, String, MetaData, Float, Integer

from sqlalchemy import select, func

#re-specify the password, host, and other info needed to access the database
#(SQLAlchemy 1.4+ requires the "postgresql://" scheme; "postgres://" is rejected)
db_string = "postgresql://postgres:*******@localhost:5433/goods"
db_co = create_engine(db_string)

#create the MetaData first, because the Table below is registered on it
meta = MetaData()

#specify the table so query can be done on it
ign_ratings = Table('ign_ratings', meta,
                    Column('ID', Integer, primary_key=True),
                    Column('score_phrase', String),
                    Column('title', String),
                    Column('url', String),
                    Column('platform', String),
                    Column('score', Float),
                    Column('genre', String),
                    Column('release_year', Integer),
                    Column('release_month', Integer),
                    Column('release_day', Integer),
                    Column('editors_choice_y', Integer),
                    )

with db_co.connect() as conn:
    # do SELECT statement, and print out its output
    select_statement = select(func.avg(ign_ratings.c.score))
    result_set = conn.execute(select_statement)
    for r in result_set:
        print(r)

(6.95037585910651,)


# Analysis of query: 
### The sample average IGN score is 6.95. Scores are out of 10, so this average is neither particularly high nor particularly low. Also, a score of 7 on IGN (as on many other critic review sites) is generally considered fair or mediocre, so a sample average of about 7 seems sensible.

# What are the average IGN scores for each release year in the dataset/database?

# I.e., do a SELECT query to find the average IGN score for each release year (1996-2016), ordered consecutively by release year:

In [42]:
#import SQLAlchemy library (written for the SQLAlchemy 1.4/2.0 API)
from sqlalchemy import create_engine
from sqlalchemy import Table, Column, String, MetaData, Float, Integer

from sqlalchemy import select, func

#re-specify the password, host, and other info needed to access the database
db_string = "postgresql://postgres:*******@localhost:5433/goods"
db_co = create_engine(db_string)

#create the MetaData first, because the Table below is registered on it
meta = MetaData()

#specify the table so query can be done on it
ign_ratings = Table('ign_ratings', meta,
                    Column('ID', Integer, primary_key=True),
                    Column('score_phrase', String),
                    Column('title', String),
                    Column('url', String),
                    Column('platform', String),
                    Column('score', Float),
                    Column('genre', String),
                    Column('release_year', Integer),
                    Column('release_month', Integer),
                    Column('release_day', Integer),
                    Column('editors_choice_y', Integer),
                    )

with db_co.connect() as conn:
    # average score per release year, listed in year order
    select_statement = (
        select(ign_ratings.c.release_year, func.avg(ign_ratings.c.score))
        .group_by(ign_ratings.c.release_year)
        .order_by(ign_ratings.c.release_year)
    )
    result_set = conn.execute(select_statement)
    for r in result_set:
        print(r)

(1996, 6.16174496644295)
(1997, 6.56634146341464)
(1998, 6.9306784660767)
(1999, 6.98348387096775)
(2000, 6.69358851674641)
(2001, 7.12049418604651)
(2002, 6.97852564102564)
(2003, 7.19824561403509)
(2004, 7.17071213640923)
(2005, 7.22242647058822)
(2006, 6.7328642384106)
(2007, 6.63254658385092)
(2008, 6.4481462140992)
(2009, 6.84036751630112)
(2010, 6.93286867204695)
(2011, 7.05032618825722)
(2012, 7.4183266932271)
(2013, 7.46048850574714)
(2014, 7.46759776536314)
(2015, 7.65945205479453)
(2016, 7.57345132743363)


# Analysis of SELECT query results:

## Average video game ratings on IGN have fluctuated somewhat over the 20-year period of 1996-2016. However, the most recent 5-year period, 2012-2016 (inclusive), has averaged higher than any other 5-year period in the database (the second-highest non-overlapping 5-year period is 2001-2005).

## Every year from 2012 to 2016 had an average score above 7.4, well above the overall sample average of 6.95; the best year overall in terms of average IGN score was 2015, with 2016 not far behind it.

## By contrast, average IGN scores were lowest during the 5-year (inclusive) period of 1996-2000. Not once did average IGN scores exceed 7, and 1996 is the lowest-rated year by a moderate margin.
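## The 5-year comparison above can be checked directly against the yearly averages printed by the query. The sketch below is plain Python and rests on one stated assumption: it takes the unweighted mean of the yearly averages in each window, because the query output does not include per-year game counts, so a per-game weighting cannot be recovered from these numbers alone. It then finds the best window overall and the best window that ends before 2012 (i.e., does not overlap 2012-2016).

```python
# Yearly average IGN scores for 1996-2016, copied from the query output above
yearly_avg = dict(zip(range(1996, 2017), [
    6.16174496644295, 6.56634146341464, 6.9306784660767, 6.98348387096775,
    6.69358851674641, 7.12049418604651, 6.97852564102564, 7.19824561403509,
    7.17071213640923, 7.22242647058822, 6.7328642384106, 6.63254658385092,
    6.4481462140992, 6.84036751630112, 6.93286867204695, 7.05032618825722,
    7.4183266932271, 7.46048850574714, 7.46759776536314, 7.65945205479453,
    7.57345132743363,
]))


def five_year_means(avgs):
    """Unweighted mean of the yearly averages for each inclusive 5-year window."""
    return {
        (start, start + 4): sum(avgs[start + k] for k in range(5)) / 5
        for start in sorted(avgs)
        if start + 4 in avgs
    }


windows = five_year_means(yearly_avg)
best = max(windows, key=windows.get)

# Restrict to windows that end before 2012, so they do not overlap 2012-2016
earlier = {w: m for w, m in windows.items() if w[1] < 2012}
runner_up = max(earlier, key=earlier.get)

print(best, round(windows[best], 2))            # (2012, 2016) 7.52
print(runner_up, round(earlier[runner_up], 2))  # (2001, 2005) 7.14
```

## The "non-overlapping" restriction matters: overlapping windows such as 2009-2013 and 2011-2015 also edge out 2001-2005. A per-game average for a year range could instead be computed in SQL with a filter such as WHERE release_year BETWEEN 2012 AND 2016.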

# What are the average IGN scores for each genre in the database? How many games are there for each genre in the database?

# For the next SELECT query, find the average IGN score for each genre (i.e., GROUP BY genre and average the scores within each group), and perform 2 additional operations:

## a.) order the results by genre in alphabetical order; and b.) count the number of games from each genre (to get a better sense of context).

# This query is equivalent to the following PostgreSQL query/code:

## <<< SELECT genre, AVG(score), COUNT(genre) FROM ign_ratings GROUP BY genre ORDER BY genre;

### This will show the average IGN score/rating for each genre, and count the number of games for each genre.
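## To make the GROUP BY behaviour concrete, here is a minimal sketch that runs the same SQL against a throwaway in-memory SQLite database using Python's built-in sqlite3 module. The rows are hypothetical placeholders, not IGN data. The point it shows is that 'Action' and 'Action, RPG' come out as separate groups, because GROUP BY compares the whole genre string.

```python
import sqlite3

# Hypothetical sample rows (title, genre, score); NOT taken from the IGN dataset
rows = [
    ("Game A", "Action", 7.0),
    ("Game B", "Action", 8.0),
    ("Game C", "Action, RPG", 9.0),
    ("Game D", "Adventure", 6.5),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ign_ratings (title TEXT, genre TEXT, score REAL)")
conn.executemany("INSERT INTO ign_ratings VALUES (?, ?, ?)", rows)

# One output row per distinct genre string: its mean score and its row count
query = """
    SELECT genre, AVG(score), COUNT(genre)
    FROM ign_ratings
    GROUP BY genre
    ORDER BY genre
"""
result = conn.execute(query).fetchall()
conn.close()

for r in result:
    print(r)
# ('Action', 7.5, 2)
# ('Action, RPG', 9.0, 1)
# ('Adventure', 6.5, 1)
```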

In [32]:
#import SQLAlchemy library (written for the SQLAlchemy 1.4/2.0 API)
from sqlalchemy import create_engine
from sqlalchemy import Table, Column, String, MetaData, Float, Integer

from sqlalchemy import select, func

#re-specify the password, host, and other info needed to access the database
db_string = "postgresql://postgres:*******@localhost:5433/goods"
db_co = create_engine(db_string)

#create the MetaData first, because the Table below is registered on it
meta = MetaData()

#specify the table so query can be done on it
ign_ratings = Table('ign_ratings', meta,
                    Column('ID', Integer, primary_key=True),
                    Column('score_phrase', String),
                    Column('title', String),
                    Column('url', String),
                    Column('platform', String),
                    Column('score', Float),
                    Column('genre', String),
                    Column('release_year', Integer),
                    Column('release_month', Integer),
                    Column('release_day', Integer),
                    Column('editors_choice_y', Integer),
                    )

with db_co.connect() as conn:
    # average score and number of games per genre, in alphabetical order
    select_statement = (
        select(ign_ratings.c.genre,
               func.avg(ign_ratings.c.score),
               func.count(ign_ratings.c.genre))
        .group_by(ign_ratings.c.genre)
        .order_by(ign_ratings.c.genre)
    )
    result_set = conn.execute(select_statement)
    for r in result_set:
        print(r)

('Action', 6.62667895707138, 3797)
('Action, Adventure', 7.3718954248366, 765)
('Action, Compilation', 7.02921348314607, 89)
('Action, Editor', 7.5, 1)
('Action, Platformer', 6.06666666666667, 3)
('Action, Puzzle', 6.0, 1)
('Action, RPG', 7.46484848484848, 330)
('Action, Simulation', 7.059375, 32)
('Action, Strategy', 7.6, 1)
('Adult, Card', 6.05, 2)
('Adventure', 6.86485106382979, 1175)
('Adventure, Adult', 4.1, 1)
('Adventure, Adventure', 6.68, 5)
('Adventure, Compilation', 7.6, 11)
('Adventure, Episodic', 8.9, 4)
('Adventure, Platformer', 8.6, 1)
('Adventure, RPG', 7.83333333333333, 3)
('Baseball', 4.5, 1)
('Battle', 6.2125, 32)
('Board', 6.62586206896552, 116)
('Board, Compilation', 6.4, 7)
('Card', 6.70740740740741, 108)
('Card, Battle', 7.15740740740741, 54)
('Card, Compilation', 6.66666666666667, 3)
('Card, RPG', 7.88888888888889, 9)
('Casino', 5.12903225806452, 31)
('Compilation', 7.1, 54)
('Compilation, Compilation', 9.5, 1)
('Compilation, RPG', 8.2, 2)
('Educational', 5.955, ...)
...

# Analysis of results:

## Notice that relatively few genres have average IGN scores above 8, and in the output shown above only 'Compilation, Compilation' (9.5) exceeds 9, based on a single game. Most genres average a little above 6 or 7, and several average below 4.

## In terms of the number of games per genre, Action is by far the largest, with over 3,700 games in the dataset (not even counting the sub-genres within Action, such as 'Action, Adventure'). The next-largest genres are Sports and Shooter, with over 1,500 games each (even more once their sub-genres are included). Adventure games are also well represented, with over 1,000 games, and related combinations such as 'Action, Adventure' (765 games) are common as well.
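## The sub-genre point can be quantified by rolling each genre string up to its first listed genre. The sketch below is plain Python over the Action and Adventure rows printed by the GROUP BY query above; it recovers each group's score total as average × count and then forms a count-weighted average. It only captures combinations where Action or Adventure is listed *first*; in PostgreSQL the same roll-up could likely be done in SQL by grouping on split_part(genre, ',', 1), which has not been run here.

```python
from collections import defaultdict

# (genre, average score, game count) rows copied from the GROUP BY output above
rows = [
    ("Action", 6.62667895707138, 3797),
    ("Action, Adventure", 7.3718954248366, 765),
    ("Action, Compilation", 7.02921348314607, 89),
    ("Action, Editor", 7.5, 1),
    ("Action, Platformer", 6.06666666666667, 3),
    ("Action, Puzzle", 6.0, 1),
    ("Action, RPG", 7.46484848484848, 330),
    ("Action, Simulation", 7.059375, 32),
    ("Action, Strategy", 7.6, 1),
    ("Adventure", 6.86485106382979, 1175),
    ("Adventure, Adult", 4.1, 1),
    ("Adventure, Adventure", 6.68, 5),
    ("Adventure, Compilation", 7.6, 11),
    ("Adventure, Episodic", 8.9, 4),
    ("Adventure, Platformer", 8.6, 1),
    ("Adventure, RPG", 7.83333333333333, 3),
]

counts = defaultdict(int)
score_totals = defaultdict(float)
for genre, avg, n in rows:
    lead = genre.split(",")[0].strip()  # "Action, RPG" -> "Action"
    counts[lead] += n
    score_totals[lead] += avg * n       # sum of scores recovered from avg * count

for lead in sorted(counts):
    # count-weighted average across the lead genre and its sub-genres
    print(lead, counts[lead], round(score_totals[lead] / counts[lead], 2))
```

## With its sub-genres included, Action grows from 3,797 to 5,019 games and Adventure from 1,175 to 1,200.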

# Which games received IGN editors' choice awards? Do the average IGN scores for these games differ substantially from the overall sample average?

# Select all games' titles, genres, and release years for games that received IGN editors' choice awards:

## I.e., the PostgreSQL SELECT query would be: <<< SELECT title, genre, release_year FROM ign_ratings WHERE editors_choice_y = 1;

In [24]:
#import SQLAlchemy library (written for the SQLAlchemy 1.4/2.0 API)
from sqlalchemy import create_engine
from sqlalchemy import Table, Column, String, MetaData, Float, Integer

from sqlalchemy import select

#re-specify the password, host, and other info needed to access the database
db_string = "postgresql://postgres:*******@localhost:5433/goods"
db_co = create_engine(db_string)

#create the MetaData first, because the Table below is registered on it
meta = MetaData()

#specify the table so query can be done on it
ign_ratings = Table('ign_ratings', meta,
                    Column('ID', Integer, primary_key=True),
                    Column('score_phrase', String),
                    Column('title', String),
                    Column('url', String),
                    Column('platform', String),
                    Column('score', Float),
                    Column('genre', String),
                    Column('release_year', Integer),
                    Column('release_month', Integer),
                    Column('release_day', Integer),
                    Column('editors_choice_y', Integer),
                    )

with db_co.connect() as conn:
    # title, genre, and release year of every editors' choice game
    select_statement = (
        select(ign_ratings.c.title, ign_ratings.c.genre, ign_ratings.c.release_year)
        .where(ign_ratings.c.editors_choice_y == 1)
    )
    result_set = conn.execute(select_statement)
    for r in result_set:
        print(r)

('LittleBigPlanet PS Vita', 'Platformer', 2012)
('LittleBigPlanet PS Vita -- Marvel Super Hero Edition', 'Platformer', 2012)
('Guild Wars 2', 'RPG', 2012)
('Mark of the Ninja', 'Action, Adventure', 2012)
('Mark of the Ninja', 'Action, Adventure', 2012)
('Dark Souls (Prepare to Die Edition)', 'Action, RPG', 2012)
('Bastion', 'Action, RPG', 2012)
('The Walking Dead: The Game -- Episode 3: Long Road Ahead', 'Adventure', 2012)
('World of Warcraft: Mists of Pandaria', 'RPG', 2012)
('Pokemon White Version 2', 'RPG', 2012)
('Pokemon Black Version 2', 'RPG', 2012)
('The Walking Dead: The Game -- Episode 3: Long Road Ahead', 'Adventure', 2012)
('The Walking Dead: The Game -- Episode 3: Long Road Ahead', 'Adventure', 2012)
('The Walking Dead: The Game -- Episode 3: Long Road Ahead', 'Adventure', 2012)
('Rock Band Blitz', 'Music', 2012)
('Bad Piggies', 'Action', 2012)
('NBA 2K13', 'Sports', 2012)
('The World Ends with You: Solo Remix', 'RPG', 2012)
('The World Ends with You: Solo Remix', 'RPG', ...)
...
('Zombie Infection', 'Action', 2009)
("Street Fighter IV (Collector's Edition)", 'Fighting', 2009)
("Street Fighter IV (Collector's Edition)", 'Fighting', 2009)
('Warhammer 40,000: Dawn of War II', 'Strategy', 2009)
('Life Force', 'Shooter', 2009)
('Sins of a Solar Empire: Entrenchment', 'Strategy', 2009)
('MLB 09: The Show', 'Sports', 2009)
("Resident Evil 5 (Collector's Edition)", 'Action', 2009)
('MadWorld', 'Action', 2009)
('Halo Wars (Limited Edition)', 'Strategy', 2009)
("Resident Evil 5 (Collector's Edition)", 'Action', 2009)
('Empire: Total War (Special Forces Edition)', 'Strategy', 2009)
('Resident Evil 5', 'Action', 2009)
('Resident Evil 5', 'Action', 2009)
('Ogre Battle: The March of the Black Queen', 'Strategy, RPG', 2009)
('Ogre Battle: The March of the Black Queen', 'Strategy', 2009)
('Henry Hatsworth in the Puzzling Adventure', 'Adventure', 2009)
('The Oregon Trail', 'Adventure', 2009)
('Valkyrie Profile: Covenant of the Plume', 'RPG', 2009)
('Resistance: Retribution', ...)
...

# Analysis of query:

## Although this query does not return scores, the score-based queries below show that many editors' choice games scored 9 or higher (the RPG results, for instance, include two perfect 10.0 scores, for Pokemon Red Version and Pokemon Blue Version). However, quite a few also hover around a good, but less than outstanding, IGN score of 8 to 8.5.

## However, we would also want to know how many games in the dataset received an IGN editors' choice award, and what the average IGN score of these games is. Are games with editors' choice awards rated much higher than the sample as a whole?
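## The listing above repeats several titles (for example, 'The Walking Dead: The Game -- Episode 3: Long Road Ahead' appears four times), most likely because the same game was reviewed on several platforms and platform is not part of the SELECT. That matters when counting award winners: COUNT(*) counts review rows, while COUNT(DISTINCT title) counts unique titles. A minimal sketch with sqlite3 and hypothetical rows; in SQLAlchemy the distinct count should be expressible as func.count(ign_ratings.c.title.distinct()), though that has not been run against this database.

```python
import sqlite3

# Hypothetical rows: "Game A" reviewed on two platforms (NOT real IGN data)
rows = [
    ("Game A", "PC", 9.0, 1),
    ("Game A", "Xbox 360", 9.0, 1),
    ("Game B", "PC", 8.5, 1),
    ("Game C", "PC", 6.0, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ign_ratings "
    "(title TEXT, platform TEXT, score REAL, editors_choice_y INTEGER)"
)
conn.executemany("INSERT INTO ign_ratings VALUES (?, ?, ?, ?)", rows)

# COUNT(*) counts award-winning review rows; COUNT(DISTINCT title) counts unique titles
review_rows, unique_titles = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT title) FROM ign_ratings WHERE editors_choice_y = 1"
).fetchone()
conn.close()

print(review_rows, unique_titles)  # 3 2
```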

# How many games in the entire database received an IGN editors' choice award?

In [51]:
#import SQLAlchemy library (written for the SQLAlchemy 1.4/2.0 API)
from sqlalchemy import create_engine
from sqlalchemy import Table, Column, String, MetaData, Float, Integer

from sqlalchemy import select, func

#re-specify the password, host, and other info needed to access the database
db_string = "postgresql://postgres:*******@localhost:5433/goods"
db_co = create_engine(db_string)

#create the MetaData first, because the Table below is registered on it
meta = MetaData()

#specify the table so query can be done on it
ign_ratings = Table('ign_ratings', meta,
                    Column('ID', Integer, primary_key=True),
                    Column('score_phrase', String),
                    Column('title', String),
                    Column('url', String),
                    Column('platform', String),
                    Column('score', Float),
                    Column('genre', String),
                    Column('release_year', Integer),
                    Column('release_month', Integer),
                    Column('release_day', Integer),
                    Column('editors_choice_y', Integer),
                    )

with db_co.connect() as conn:
    # number of rows flagged as editors' choice
    select_statement = (
        select(func.count(ign_ratings.c.editors_choice_y))
        .where(ign_ratings.c.editors_choice_y == 1)
    )
    result_set = conn.execute(select_statement)
    for r in result_set:
        print(r)

(3517,)


## Answer: 3,517 games in the dataset (roughly 19%) received an IGN editors' choice award. Therefore, IGN is somewhat selective in giving out these awards.
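## The "roughly 19%" share divides the award count by the total number of rows, which the query above does not return. The hedged sketch below shows one way a single query could return the award count, the total, and the share together, using sqlite3 and hypothetical rows. PostgreSQL also accepts COUNT(*) FILTER (WHERE editors_choice_y = 1) as an alternative to the CASE expression.

```python
import sqlite3

# Hypothetical (score, editors_choice_y) rows; NOT taken from the IGN dataset
rows = [(9.0, 1), (8.5, 1), (6.0, 0), (7.0, 0), (5.5, 0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ign_ratings (score REAL, editors_choice_y INTEGER)")
conn.executemany("INSERT INTO ign_ratings VALUES (?, ?)", rows)

# Award count, total rows, and award share computed in one pass over the table
awarded, total, share = conn.execute(
    """
    SELECT SUM(CASE WHEN editors_choice_y = 1 THEN 1 ELSE 0 END),
           COUNT(*),
           AVG(CASE WHEN editors_choice_y = 1 THEN 1.0 ELSE 0.0 END)
    FROM ign_ratings
    """
).fetchone()
conn.close()

print(awarded, total, share)  # 2 5 0.4
```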

# However, to what extent do the average IGN scores for these award recipients differ from the dataset as a whole? How do these average IGN scores differ when grouped by genre and year of release, respectively?

# What is the average IGN rating for the games that received an IGN editors' choice award?

In [36]:
#import SQLAlchemy library (written for the SQLAlchemy 1.4/2.0 API)
from sqlalchemy import create_engine
from sqlalchemy import Table, Column, String, MetaData, Float, Integer

from sqlalchemy import select, func

#re-specify the password, host, and other info needed to access the database
db_string = "postgresql://postgres:*******@localhost:5433/goods"
db_co = create_engine(db_string)

#create the MetaData first, because the Table below is registered on it
meta = MetaData()

#specify the table so query can be done on it
ign_ratings = Table('ign_ratings', meta,
                    Column('ID', Integer, primary_key=True),
                    Column('score_phrase', String),
                    Column('title', String),
                    Column('url', String),
                    Column('platform', String),
                    Column('score', Float),
                    Column('genre', String),
                    Column('release_year', Integer),
                    Column('release_month', Integer),
                    Column('release_day', Integer),
                    Column('editors_choice_y', Integer),
                    )

with db_co.connect() as conn:
    # average score of editors' choice games only
    select_statement = (
        select(func.avg(ign_ratings.c.score))
        .where(ign_ratings.c.editors_choice_y == 1)
    )
    result_set = conn.execute(select_statement)
    for r in result_set:
        print(r)

(8.86858117713961,)


# How do games with editors' choice awards differ from the sample as a whole in terms of genre and release date?

# Select the titles, scores, and release years of RPG games that received IGN editors' choice awards:

## I.e., the PostgreSQL SELECT query would be: <<< SELECT title, score, release_year FROM ign_ratings WHERE editors_choice_y = 1 AND genre = 'RPG';

In [27]:
#import SQLAlchemy library (written for the SQLAlchemy 1.4/2.0 API)
from sqlalchemy import create_engine
from sqlalchemy import Table, Column, String, MetaData, Float, Integer

from sqlalchemy import and_, select

#re-specify the password, host, and other info needed to access the database
db_string = "postgresql://postgres:*******@localhost:5433/goods"
db_co = create_engine(db_string)

#create the MetaData first, because the Table below is registered on it
meta = MetaData()

#specify the table so query can be done on it
ign_ratings = Table('ign_ratings', meta,
                    Column('ID', Integer, primary_key=True),
                    Column('score_phrase', String),
                    Column('title', String),
                    Column('url', String),
                    Column('platform', String),
                    Column('score', Float),
                    Column('genre', String),
                    Column('release_year', Integer),
                    Column('release_month', Integer),
                    Column('release_day', Integer),
                    Column('editors_choice_y', Integer),
                    )

with db_co.connect() as conn:
    # editors' choice games whose genre is exactly 'RPG'
    select_statement = (
        select(ign_ratings.c.title, ign_ratings.c.score, ign_ratings.c.release_year)
        .where(and_(ign_ratings.c.editors_choice_y == 1,
                    ign_ratings.c.genre == 'RPG'))
    )
    result_set = conn.execute(select_statement)
    for r in result_set:
        print(r)

('Guild Wars 2', 9.0, 2012)
('World of Warcraft: Mists of Pandaria', 8.7, 2012)
('Pokemon White Version 2', 9.6, 2012)
('Pokemon Black Version 2', 9.6, 2012)
('The World Ends with You: Solo Remix', 9.5, 2012)
('The World Ends with You: Solo Remix', 9.5, 2012)
('Ni no Kuni: Wrath of the White Witch', 9.4, 2013)
('Persona 4 Golden', 9.3, 2012)
('Mass Effect', 9.0, 2012)
('Final Fantasy VII', 9.5, 1997)
('Xenogears', 9.5, 1998)
('Might and Magic VI: The Mandate of Heaven', 9.0, 1998)
('Fallout 2', 8.9, 1998)
("Baldur's Gate", 9.4, 1999)
('EverQuest', 8.4, 1999)
('Pokemon Blue Version', 10.0, 1999)
('Pokemon Red Version', 10.0, 1999)
('Darkstone', 9.0, 1999)
('Final Fantasy VIII', 9.0, 1999)
('System Shock 2', 9.0, 1999)
('Final Fantasy Anthology', 9.0, 1999)
('Suikoden II', 9.0, 1999)
('Grandia', 9.0, 1999)
('Harvest Moon 64', 8.2, 1999)
('Planescape: Torment', 9.2, 1999)
('Dragon Warrior Monsters', 9.0, 2000)
('Front Mission 3', 8.8, 2000)
('Vagrant Story', 9.6, 2000)
('EverQuest: Ruins ...', ...)
...

## Notice that the average score of all games that received an IGN editors' choice award (8.87) is nearly 2 full points higher than the overall sample average of 6.95 (a gap of about 1.92 points). Therefore, these awards go to games that, at least on average, are rated as being of higher quality.
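## The 6.95 sample average includes the editors' choice games themselves, so a cleaner comparison splits the table into award and non-award groups in a single query. A minimal sketch using Python's built-in sqlite3 module and hypothetical rows; against the real table, the SQLAlchemy equivalent would be roughly select(ign_ratings.c.editors_choice_y, func.avg(ign_ratings.c.score), func.count()).group_by(ign_ratings.c.editors_choice_y), which has not been run here.

```python
import sqlite3

# Hypothetical (score, editors_choice_y) rows; NOT taken from the IGN dataset
rows = [(9.0, 1), (8.5, 1), (6.0, 0), (7.0, 0), (5.5, 0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ign_ratings (score REAL, editors_choice_y INTEGER)")
conn.executemany("INSERT INTO ign_ratings VALUES (?, ?)", rows)

# One row per award status: average score and number of rows in that group
groups = {
    flag: (avg_score, n)
    for flag, avg_score, n in conn.execute(
        """
        SELECT editors_choice_y, AVG(score), COUNT(*)
        FROM ign_ratings
        GROUP BY editors_choice_y
        """
    )
}
conn.close()

print(groups[1])                             # (8.75, 2)
print(round(groups[0][0], 2), groups[0][1])  # 6.17 3
```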

# What is the average IGN score of editors' choice games for each release year in the database (i.e., 1996-2016)?

## Implement a SELECT AVG(score) ... WHERE ... GROUP BY release_year query to compute these averages:

In [31]:
#import SQLAlchemy library (written for the SQLAlchemy 1.4/2.0 API)
from sqlalchemy import create_engine
from sqlalchemy import Table, Column, String, MetaData, Float, Integer

from sqlalchemy import select, func

#re-specify the password, host, and other info needed to access the database
db_string = "postgresql://postgres:*******@localhost:5433/goods"
db_co = create_engine(db_string)

#create the MetaData first, because the Table below is registered on it
meta = MetaData()

#specify the table so query can be done on it
ign_ratings = Table('ign_ratings', meta,
                    Column('ID', Integer, primary_key=True),
                    Column('score_phrase', String),
                    Column('title', String),
                    Column('url', String),
                    Column('platform', String),
                    Column('score', Float),
                    Column('genre', String),
                    Column('release_year', Integer),
                    Column('release_month', Integer),
                    Column('release_day', Integer),
                    Column('editors_choice_y', Integer),
                    )

with db_co.connect() as conn:
    # average score of editors' choice games per release year, in year order
    select_statement = (
        select(ign_ratings.c.release_year, func.avg(ign_ratings.c.score))
        .where(ign_ratings.c.editors_choice_y == 1)
        .group_by(ign_ratings.c.release_year)
        .order_by(ign_ratings.c.release_year)
    )
    result_set = conn.execute(select_statement)
    for r in result_set:
        print(r)

(1996, 9.1125)
(1997, 8.80625)
(1998, 8.95116279069767)
(1999, 9.003)
(2000, 9.00059880239522)
(2001, 8.98208092485549)
(2002, 8.89122807017544)
(2003, 8.87041198501873)
(2004, 8.76062992125984)
(2005, 8.7962962962963)
(2006, 8.69640718562874)
(2007, 8.71939655172415)
(2008, 8.81428571428571)
(2009, 8.83665480427046)
(2010, 8.72925925925926)
(2011, 8.75882352941177)
(2012, 8.96619047619047)
(2013, 9.15144927536231)
(2014, 9.21443298969072)
(2015, 9.09014084507042)
(2016, 9.36888888888889)


# Analysis of query:

## As expected, these results differ quite a bit from the overall average IGN scores for each year. While the trend over time is somewhat similar, the average scores of editors' choice games were actually quite high in the late 1990s. This suggests that, while games as a whole from the late 1990s were rated lower than games from most years of the 2000s, the highest-rated games from the late 1990s were genuinely top-notch.

## On the other hand, several of the highest yearly averages for editors' choice games occurred during 2012-2016, again suggesting that games from this period have been rated more highly, whether judged as a whole or only among the highest-rated games.
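## The comparison between the two yearly series can be made explicit by subtracting the overall yearly average from the editors' choice yearly average. The sketch below is plain Python, not a database query; it uses only the numbers printed by the two release-year queries in this notebook and treats those printed averages as exact.

```python
# Average IGN score per release year, 1996-2016, copied from the two queries above
years = list(range(1996, 2017))
overall = [
    6.16174496644295, 6.56634146341464, 6.9306784660767, 6.98348387096775,
    6.69358851674641, 7.12049418604651, 6.97852564102564, 7.19824561403509,
    7.17071213640923, 7.22242647058822, 6.7328642384106, 6.63254658385092,
    6.4481462140992, 6.84036751630112, 6.93286867204695, 7.05032618825722,
    7.4183266932271, 7.46048850574714, 7.46759776536314, 7.65945205479453,
    7.57345132743363,
]
editors_choice = [
    9.1125, 8.80625, 8.95116279069767, 9.003,
    9.00059880239522, 8.98208092485549, 8.89122807017544, 8.87041198501873,
    8.76062992125984, 8.7962962962963, 8.69640718562874, 8.71939655172415,
    8.81428571428571, 8.83665480427046, 8.72925925925926, 8.75882352941177,
    8.96619047619047, 9.15144927536231, 9.21443298969072, 9.09014084507042,
    9.36888888888889,
]

# Gap between editors' choice games and all games, per year
gap = {y: ec_avg - all_avg for y, ec_avg, all_avg in zip(years, editors_choice, overall)}

widest = max(gap, key=gap.get)
narrowest = min(gap, key=gap.get)

# Three years with the highest editors' choice average
ec_by_year = dict(zip(years, editors_choice))
top3_ec_years = sorted(years, key=ec_by_year.get, reverse=True)[:3]

print(widest, round(gap[widest], 2))        # 1996 2.95
print(narrowest, round(gap[narrowest], 2))  # 2015 1.43
print(top3_ec_years)                        # [2016, 2014, 2013]
```

## The gap is widest in 1996 and narrowest in 2015, and the three best editors' choice years all fall within 2012-2016, which fits the reading above: late-1990s award winners stood far above a lower-rated field, while in 2012-2016 the field as a whole also scored higher.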