# Lab 1 - Creating the SQL Tables

In this lab, use `sqlalchemy` to create, populate, and query a table from the baseball database, as well as for the `super_hero_powers.csv` table.  

In [1]:
import pandas as pd
artwork = pd.read_csv("./data/Artworks.csv")

## Part 1 - Baseball Managers

In this part of the lab, you will walk through the process of creating a manager table from [Lahman’s Baseball Database](http://www.seanlahman.com/baseball-archive/statistics/)

## Task 1 - Download, unzip, rename 

1. Download the baseball database linked above (save to desktop)
2. Unzip the file and rename to `baseball`
3. Load the `core/Managers.csv` file into a pandas `DataFrame` using `read_csv`
4. Inspect the `column` names and `dtypes`

In [2]:
managers = pd.read_csv("./databases/baseball/core/Managers.csv")

In [3]:
managers.head()

Unnamed: 0,playerID,yearID,teamID,lgID,inseason,G,W,L,rank,plyrMgr
0,wrighha01,1871,BS1,,1,31,20,10,3.0,Y
1,woodji01,1871,CH1,,1,28,19,9,2.0,Y
2,paborch01,1871,CL1,,1,29,10,19,8.0,Y
3,lennobi01,1871,FW1,,1,14,5,9,8.0,Y
4,deaneha01,1871,FW1,,2,5,2,3,8.0,Y


In [4]:
managers.columns

Index(['playerID', 'yearID', 'teamID', 'lgID', 'inseason', 'G', 'W', 'L',
       'rank', 'plyrMgr'],
      dtype='object')

In [5]:
managers.dtypes

playerID     object
yearID        int64
teamID       object
lgID         object
inseason      int64
G             int64
W             int64
L             int64
rank        float64
plyrMgr      object
dtype: object

#### Task 2 - Create a `sqlalchemy` types `dict`

In [6]:
from sqlalchemy import Integer, String
sql_types = {'playerID':String,
      'playerMgr':String,
      'teamID':String,
      'lgID':String,
      'yearID':Integer,
      'inseason':Integer,
      'G':Integer,
      'W':Integer,
      'L':Integer,
      'rank':Integer}

#### Task 4 - Create an `engine` and `schema`

In [7]:
!rm databases/baseball.db

'rm' is not recognized as an internal or external command,
operable program or batch file.


In [8]:
from sqlalchemy import create_engine
mang_eng = create_engine("sqlite:///databases/baseball.db")
mang_eng.echo = True
schema = pd.io.sql.get_schema(managers, 'manager', keys='playerID', con=mang_eng, dtype=sql_types)
print(schema)


CREATE TABLE manager (
	"playerID" VARCHAR NOT NULL, 
	"yearID" INTEGER, 
	"teamID" VARCHAR, 
	"lgID" VARCHAR, 
	inseason INTEGER, 
	"G" INTEGER, 
	"W" INTEGER, 
	"L" INTEGER, 
	rank INTEGER, 
	"plyrMgr" TEXT, 
	CONSTRAINT manager_pk PRIMARY KEY ("playerID")
)




#### Execute the 'schema'

In [9]:
mang_eng.execute(schema)

2019-01-26 21:24:01,294 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
2019-01-26 21:24:01,295 INFO sqlalchemy.engine.base.Engine ()
2019-01-26 21:24:01,298 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
2019-01-26 21:24:01,300 INFO sqlalchemy.engine.base.Engine ()
2019-01-26 21:24:01,303 INFO sqlalchemy.engine.base.Engine 
CREATE TABLE manager (
	"playerID" VARCHAR NOT NULL, 
	"yearID" INTEGER, 
	"teamID" VARCHAR, 
	"lgID" VARCHAR, 
	inseason INTEGER, 
	"G" INTEGER, 
	"W" INTEGER, 
	"L" INTEGER, 
	rank INTEGER, 
	"plyrMgr" TEXT, 
	CONSTRAINT manager_pk PRIMARY KEY ("playerID")
)


2019-01-26 21:24:01,305 INFO sqlalchemy.engine.base.Engine ()
2019-01-26 21:24:01,308 INFO sqlalchemy.engine.base.Engine ROLLBACK


OperationalError: (sqlite3.OperationalError) table manager already exists [SQL: '\nCREATE TABLE manager (\n\t"playerID" VARCHAR NOT NULL, \n\t"yearID" INTEGER, \n\t"teamID" VARCHAR, \n\t"lgID" VARCHAR, \n\tinseason INTEGER, \n\t"G" INTEGER, \n\t"W" INTEGER, \n\t"L" INTEGER, \n\trank INTEGER, \n\t"plyrMgr" TEXT, \n\tCONSTRAINT manager_pk PRIMARY KEY ("playerID")\n)\n\n']

#### Task 5 - Use `to_sql` with `if_exists='append'` to insert the data

In [10]:
managers.to_sql('manager',
               con=mang_eng,
               dtype=sql_types,
               index=False,
               if_exists='append')

2019-01-26 21:24:03,746 INFO sqlalchemy.engine.base.Engine PRAGMA table_info("manager")
2019-01-26 21:24:03,748 INFO sqlalchemy.engine.base.Engine ()
2019-01-26 21:24:03,754 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2019-01-26 21:24:03,803 INFO sqlalchemy.engine.base.Engine INSERT INTO manager ("playerID", "yearID", "teamID", "lgID", inseason, "G", "W", "L", rank, "plyrMgr") VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
2019-01-26 21:24:03,805 INFO sqlalchemy.engine.base.Engine (('wrighha01', 1871, 'BS1', None, 1, 31, 20, 10, 3.0, 'Y'), ('woodji01', 1871, 'CH1', None, 1, 28, 19, 9, 2.0, 'Y'), ('paborch01', 1871, 'CL1', None, 1, 29, 10, 19, 8.0, 'Y'), ('lennobi01', 1871, 'FW1', None, 1, 14, 5, 9, 8.0, 'Y'), ('deaneha01', 1871, 'FW1', None, 2, 5, 2, 3, 8.0, 'Y'), ('fergubo01', 1871, 'NY2', None, 1, 33, 16, 17, 5.0, 'Y'), ('mcbridi01', 1871, 'PH1', None, 1, 28, 21, 7, 1.0, 'Y'), ('hastisc01', 1871, 'RC1', None, 1, 25, 4, 21, 9.0, 'Y')  ... displaying 10 of 3469 total bound parameter set

IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: manager.playerID [SQL: 'INSERT INTO manager ("playerID", "yearID", "teamID", "lgID", inseason, "G", "W", "L", rank, "plyrMgr") VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)'] [parameters: (('wrighha01', 1871, 'BS1', None, 1, 31, 20, 10, 3.0, 'Y'), ('woodji01', 1871, 'CH1', None, 1, 28, 19, 9, 2.0, 'Y'), ('paborch01', 1871, 'CL1', None, 1, 29, 10, 19, 8.0, 'Y'), ('lennobi01', 1871, 'FW1', None, 1, 14, 5, 9, 8.0, 'Y'), ('deaneha01', 1871, 'FW1', None, 2, 5, 2, 3, 8.0, 'Y'), ('fergubo01', 1871, 'NY2', None, 1, 33, 16, 17, 5.0, 'Y'), ('mcbridi01', 1871, 'PH1', None, 1, 28, 21, 7, 1.0, 'Y'), ('hastisc01', 1871, 'RC1', None, 1, 25, 4, 21, 9.0, 'Y')  ... displaying 10 of 3469 total bound parameter sets ...  ('bakerdu01', 2017, 'WAS', 'NL', 1, 160, 95, 65, 1.0, 'N'), ('speiech01', 2017, 'WAS', 'NL', 2, 2, 2, 0, 1.0, 'N'))]

#### Task 6 - Query the table to make sure it all worked

In [11]:
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy import select

In [18]:
Base = automap_base()
Base.prepare(mang_eng, reflect=True)
Table = Base.classes.example_table
#Table2 - Base.metadata.tables['example_table']

2019-01-26 21:26:29,524 INFO sqlalchemy.engine.base.Engine SELECT name FROM sqlite_master WHERE type='table' ORDER BY name
2019-01-26 21:26:29,526 INFO sqlalchemy.engine.base.Engine ()
2019-01-26 21:26:29,531 INFO sqlalchemy.engine.base.Engine PRAGMA table_info("manager")
2019-01-26 21:26:29,532 INFO sqlalchemy.engine.base.Engine ()
2019-01-26 21:26:29,536 INFO sqlalchemy.engine.base.Engine SELECT sql FROM  (SELECT * FROM sqlite_master UNION ALL   SELECT * FROM sqlite_temp_master) WHERE name = 'manager' AND type = 'table'
2019-01-26 21:26:29,537 INFO sqlalchemy.engine.base.Engine ()
2019-01-26 21:26:29,542 INFO sqlalchemy.engine.base.Engine PRAGMA foreign_key_list("manager")
2019-01-26 21:26:29,543 INFO sqlalchemy.engine.base.Engine ()
2019-01-26 21:26:29,545 INFO sqlalchemy.engine.base.Engine SELECT sql FROM  (SELECT * FROM sqlite_master UNION ALL   SELECT * FROM sqlite_temp_master) WHERE name = 'manager' AND type = 'table'
2019-01-26 21:26:29,547 INFO sqlalchemy.engine.base.Engine ()

AttributeError: managers

In [13]:
Session = sessionmaker(bind=mang_eng)
session = Session()

In [17]:
stmt = select('*').select_from(Table)
result = session.execute(stmt).fetchall()

ArgumentError: FROM expression expected

In [None]:
from more_sqlalchemy import result_dicts
result >> result_dicts

## Part 2 - Super Hero Powers

Now make a database and table for the super hero powers.

## Problem 1
    
**Task:** One the `super_hero_powers.csv` and verify that the contents of the columns are all Boolean.  In this problem, you need to

1. Create a `dict` that defines the `pandas` column type
2. Read the file in using a `pd.read_csv`.
3. Clean up all the column labels.
    
**Be sure to write clean code!**


## Problem 2
    
Now define an `sqlalchemy` table for these data using `pandas` `to_sql` dataframe method.  You can use the `sqlalchemy.String` and `sqlalchemy.Boolean` columns type, which are [documented here](https://docs.sqlalchemy.org/en/latest/core/type_basics.html)

## Problem 3
    
Now you need to make a new `engine`, `inspect` your database, and make a `session` to query the database.

## Problem 4
    
Perform `sqlalchemy` queries to answer each of the following questions.

1. How many heroes have both Super Strength and Super Speed?
2. How many heroes have names that start with the word *Black*
3. Are heroes with Agility more likely to have Stealth?
4. What fraction of all heroes that can fly also have Super Strength?
5. Consider heroes that have names that contain `"girl"`, `"boy"`, `"woman"`, or `"man"`.  Compute the following ratio

$$\frac{N(\text{boy or man})}{N(\text{girl or woman}}$$

**Hint:** You will need to use some combination of `where`, `group_by`, and `count` for each part.

## Problem 5

Tell me another cool fact about the super powers.