# Lab 1 - Creating the SQL Tables

In this lab, you will use `sqlalchemy` to create, populate, and query tables from the baseball database, and then do the same for a table built from `super_hero_powers.csv`.

In [1]:
import pandas as pd
import sqlalchemy
sqlalchemy.__version__

'1.2.7'

In [2]:
pd.__version__

'0.24.0'

## Part 1 - Baseball Managers

In this part of the lab, you will walk through the process of creating a manager table from [Lahman’s Baseball Database](http://www.seanlahman.com/baseball-archive/statistics/).

#### Task 1 - Download, unzip, rename

1. Download the baseball database linked above (save to desktop)
2. Unzip the file and rename to `baseball`
3. Load the `core/Managers.csv` file into a pandas `DataFrame` using `read_csv`
4. Inspect the column names (`columns`) and the `dtypes`

In [3]:
import pandas as pd
managers = pd.read_csv('~/Desktop/baseball/core/Managers.csv')
managers.head()

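|   | playerID | yearID | teamID | lgID | inseason | G | W | L | rank | plyrMgr |
|---|----------|--------|--------|------|----------|---|---|---|------|---------|
| 0 | wrighha01 | 1871 | BS1 | NaN | 1 | 31 | 20 | 10 | 3.0 | Y |
| 1 | woodji01 | 1871 | CH1 | NaN | 1 | 28 | 19 | 9 | 2.0 | Y |
| 2 | paborch01 | 1871 | CL1 | NaN | 1 | 29 | 10 | 19 | 8.0 | Y |
| 3 | lennobi01 | 1871 | FW1 | NaN | 1 | 14 | 5 | 9 | 8.0 | Y |
| 4 | deaneha01 | 1871 | FW1 | NaN | 2 | 5 | 2 | 3 | 8.0 | Y |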


In [4]:
managers.columns

Index(['playerID', 'yearID', 'teamID', 'lgID', 'inseason', 'G', 'W', 'L',
       'rank', 'plyrMgr'],
      dtype='object')

**Question:** Is there a candidate for a primary key?

In [5]:
[(col, managers[col].is_unique) for col in managers]

[('playerID', False),
 ('yearID', False),
 ('teamID', False),
 ('lgID', False),
 ('inseason', False),
 ('G', False),
 ('W', False),
 ('L', False),
 ('rank', False),
 ('plyrMgr', False)]
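
No single column is unique, but a combination of columns might still be. The sketch below shows one way to test a composite key with `DataFrame.duplicated`. The `toy` frame and the helper name `is_composite_key` are made up for illustration; they are not part of the lab data.

```python
import pandas as pd

# Toy stand-in for the managers frame (hypothetical rows, not real data)
toy = pd.DataFrame({
    "playerID": ["a01", "a01", "b01", "b01"],
    "yearID": [1901, 1902, 1901, 1901],
    "inseason": [1, 1, 1, 2],
})

def is_composite_key(df, cols):
    """Return True when the combination of `cols` identifies every row uniquely."""
    return not df.duplicated(subset=cols).any()

# Two columns are not enough here: ("b01", 1901) appears twice
two_col = is_composite_key(toy, ["playerID", "yearID"])
# Adding a third column separates the repeated pair
three_col = is_composite_key(toy, ["playerID", "yearID", "inseason"])
print(two_col, three_col)
```

Even when a composite key exists, a single integer key is often simpler to work with, which is the route taken next.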

**Solution:** No single column is unique, so add the `index` as an actual column and use it as the key.

In [6]:
from dfply import mutate
managers = (managers >>
            mutate(id = managers.index))

In [7]:
managers.id.is_unique

True
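
If `dfply` is not installed, the same column can be added with plain pandas. This is a minimal sketch on a made-up `toy` frame, using `DataFrame.assign` in place of `mutate`.

```python
import pandas as pd

# Toy stand-in for the managers frame (hypothetical values)
toy = pd.DataFrame({"playerID": ["a01", "b01", "c01"],
                    "yearID": [1901, 1901, 1902]})

# assign returns a new frame with the extra column; the original is left untouched
toy_with_id = toy.assign(id=toy.index)

print(toy_with_id["id"].is_unique)
```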

In [8]:
managers.columns

Index(['playerID', 'yearID', 'teamID', 'lgID', 'inseason', 'G', 'W', 'L',
       'rank', 'plyrMgr', 'id'],
      dtype='object')

In [9]:
managers.dtypes

playerID     object
yearID        int64
teamID       object
lgID         object
inseason      int64
G             int64
W             int64
L             int64
rank        float64
plyrMgr      object
id            int64
dtype: object

In [10]:
managers.shape

(3469, 11)

In [11]:
managers.head()

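|   | playerID | yearID | teamID | lgID | inseason | G | W | L | rank | plyrMgr | id |
|---|----------|--------|--------|------|----------|---|---|---|------|---------|----|
| 0 | wrighha01 | 1871 | BS1 | NaN | 1 | 31 | 20 | 10 | 3.0 | Y | 0 |
| 1 | woodji01 | 1871 | CH1 | NaN | 1 | 28 | 19 | 9 | 2.0 | Y | 1 |
| 2 | paborch01 | 1871 | CL1 | NaN | 1 | 29 | 10 | 19 | 8.0 | Y | 2 |
| 3 | lennobi01 | 1871 | FW1 | NaN | 1 | 14 | 5 | 9 | 8.0 | Y | 3 |
| 4 | deaneha01 | 1871 | FW1 | NaN | 2 | 5 | 2 | 3 | 8.0 | Y | 4 |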


#### Task 2 - Create a `sqlalchemy` types `dict`

In [12]:
from sqlalchemy import String, Integer
sql_types = {'id':Integer,
             'playerID':String, 
             'plyrMgr':String,
             'teamID':String, 
             'lgID':String, 
             'yearID':Integer, 
             'inseason':Integer, 
             'G':Integer, 
             'W':Integer, 
             'L':Integer,
             'rank':Integer}
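
For wider tables, typing the `dict` by hand gets tedious. The sketch below shows one possible way to build a first draft from the pandas `dtypes`. The helper name `infer_sql_types` and the `toy` frame are hypothetical. Note that it maps `float64` to `Float`, whereas the hand-written `dict` above deliberately stores `rank` as `Integer`, so review the draft before using it.

```python
import pandas as pd
from pandas.api import types as ptypes
from sqlalchemy import Boolean, Float, Integer, String

def infer_sql_types(df):
    """Map each column's pandas dtype to a SQLAlchemy type (rough first draft)."""
    sql_types = {}
    for col, dtype in df.dtypes.items():
        if ptypes.is_bool_dtype(dtype):
            sql_types[col] = Boolean
        elif ptypes.is_integer_dtype(dtype):
            sql_types[col] = Integer
        elif ptypes.is_float_dtype(dtype):
            sql_types[col] = Float
        else:
            # Anything else (for example text columns) falls back to String
            sql_types[col] = String
    return sql_types

# Hypothetical one-row frame covering the four cases
toy = pd.DataFrame({"playerID": ["a01"], "yearID": [1901],
                    "rank": [3.0], "active": [True]})
draft = infer_sql_types(toy)
print(draft)
```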

#### Task 4 - Create an `engine` and `schema`

In [11]:
!rm databases/baseball.db

rm: cannot remove 'databases/baseball.db': No such file or directory


In [13]:
from sqlalchemy import create_engine
mang_eng = create_engine("sqlite:///databases/baseball.db")
mang_eng.echo = True
schema = pd.io.sql.get_schema(managers, 'manager', keys='id', con=mang_eng, dtype=sql_types)
print(schema)


CREATE TABLE manager (
	"playerID" VARCHAR, 
	"yearID" INTEGER, 
	"teamID" VARCHAR, 
	"lgID" VARCHAR, 
	inseason INTEGER, 
	"G" INTEGER, 
	"W" INTEGER, 
	"L" INTEGER, 
	rank INTEGER, 
	"plyrMgr" VARCHAR, 
	id INTEGER NOT NULL, 
	CONSTRAINT manager_pk PRIMARY KEY (id)
)




#### Execute the `schema`

In [14]:
mang_eng.execute(schema)

2019-01-30 12:30:48,988 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
2019-01-30 12:30:48,989 INFO sqlalchemy.engine.base.Engine ()
2019-01-30 12:30:48,992 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
2019-01-30 12:30:48,993 INFO sqlalchemy.engine.base.Engine ()
2019-01-30 12:30:48,995 INFO sqlalchemy.engine.base.Engine 
CREATE TABLE manager (
	"playerID" VARCHAR, 
	"yearID" INTEGER, 
	"teamID" VARCHAR, 
	"lgID" VARCHAR, 
	inseason INTEGER, 
	"G" INTEGER, 
	"W" INTEGER, 
	"L" INTEGER, 
	rank INTEGER, 
	"plyrMgr" VARCHAR, 
	id INTEGER NOT NULL, 
	CONSTRAINT manager_pk PRIMARY KEY (id)
)


2019-01-30 12:30:48,996 INFO sqlalchemy.engine.base.Engine ()
2019-01-30 12:30:49,017 INFO sqlalchemy.engine.base.Engine COMMIT


<sqlalchemy.engine.result.ResultProxy at 0x1fb52b18630>

In [15]:
managers.shape

(3469, 11)

#### Task 5 - Use `to_sql` with `if_exists='append'` to insert the data

In [16]:
managers.to_sql('manager', 
                con=mang_eng, 
                dtype=sql_types, 
                index=False,
                if_exists='append')

2019-01-30 12:30:54,238 INFO sqlalchemy.engine.base.Engine PRAGMA table_info("manager")
2019-01-30 12:30:54,239 INFO sqlalchemy.engine.base.Engine ()
2019-01-30 12:30:54,246 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2019-01-30 12:30:54,377 INFO sqlalchemy.engine.base.Engine INSERT INTO manager ("playerID", "yearID", "teamID", "lgID", inseason, "G", "W", "L", rank, "plyrMgr", id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
2019-01-30 12:30:54,379 INFO sqlalchemy.engine.base.Engine (('wrighha01', 1871, 'BS1', None, 1, 31, 20, 10, 3.0, 'Y', 0), ('woodji01', 1871, 'CH1', None, 1, 28, 19, 9, 2.0, 'Y', 1), ('paborch01', 1871, 'CL1', None, 1, 29, 10, 19, 8.0, 'Y', 2), ('lennobi01', 1871, 'FW1', None, 1, 14, 5, 9, 8.0, 'Y', 3), ('deaneha01', 1871, 'FW1', None, 2, 5, 2, 3, 8.0, 'Y', 4), ('fergubo01', 1871, 'NY2', None, 1, 33, 16, 17, 5.0, 'Y', 5), ('mcbridi01', 1871, 'PH1', None, 1, 28, 21, 7, 1.0, 'Y', 6), ('hastisc01', 1871, 'RC1', None, 1, 25, 4, 21, 9.0, 'Y', 7)  ... displaying 10 of
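
The reason for `if_exists='append'` is that the rows go into the table that already exists, so the primary key defined in the schema is kept. The sketch below illustrates this on a tiny made-up table. To stay self-contained it uses an in-memory database through the standard-library `sqlite3` module rather than the lab's `sqlalchemy` engine, which is an assumption of this example.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")

# Create the table first, with an explicit primary key (shortened toy schema)
conn.execute(
    "CREATE TABLE manager ("
    "playerID VARCHAR, "
    "id INTEGER NOT NULL, "
    "CONSTRAINT manager_pk PRIMARY KEY (id))"
)

# Hypothetical rows standing in for the managers frame
toy = pd.DataFrame({"playerID": ["a01", "b01"], "id": [0, 1]})
toy.to_sql("manager", con=conn, index=False, if_exists="append")

# Check that every row arrived
n_rows = conn.execute("SELECT COUNT(*) FROM manager").fetchone()[0]

# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
pk_cols = [row[1] for row in conn.execute("PRAGMA table_info(manager)") if row[5] > 0]
print(n_rows, pk_cols)
```

Comparing the row count against `managers.shape[0]` is a quick sanity check after the real insert.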

#### Task 6 - Query the table to make sure it all worked

In [17]:
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy import select

mang_eng2 = create_engine("sqlite:///databases/baseball.db") 
Session = sessionmaker(mang_eng)
session = Session()

In [18]:
Base = automap_base()
Base.prepare(mang_eng2, reflect=True)
Manager = Base.classes.manager

In [19]:
from more_sqlalchemy import result_dicts
stmt = select('*').select_from(Manager)
session.execute(stmt).fetchmany(5) >> result_dicts

2019-01-30 12:31:00,422 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2019-01-30 12:31:00,424 INFO sqlalchemy.engine.base.Engine SELECT * 
FROM manager
2019-01-30 12:31:00,427 INFO sqlalchemy.engine.base.Engine ()


[{'playerID': 'wrighha01',
  'yearID': 1871,
  'teamID': 'BS1',
  'lgID': None,
  'inseason': 1,
  'G': 31,
  'W': 20,
  'L': 10,
  'rank': 3,
  'plyrMgr': 'Y',
  'id': 0},
 {'playerID': 'woodji01',
  'yearID': 1871,
  'teamID': 'CH1',
  'lgID': None,
  'inseason': 1,
  'G': 28,
  'W': 19,
  'L': 9,
  'rank': 2,
  'plyrMgr': 'Y',
  'id': 1},
 {'playerID': 'paborch01',
  'yearID': 1871,
  'teamID': 'CL1',
  'lgID': None,
  'inseason': 1,
  'G': 29,
  'W': 10,
  'L': 19,
  'rank': 8,
  'plyrMgr': 'Y',
  'id': 2},
 {'playerID': 'lennobi01',
  'yearID': 1871,
  'teamID': 'FW1',
  'lgID': None,
  'inseason': 1,
  'G': 14,
  'W': 5,
  'L': 9,
  'rank': 8,
  'plyrMgr': 'Y',
  'id': 3},
 {'playerID': 'deaneha01',
  'yearID': 1871,
  'teamID': 'FW1',
  'lgID': None,
  'inseason': 2,
  'G': 5,
  'W': 2,
  'L': 3,
  'rank': 8,
  'plyrMgr': 'Y',
  'id': 4}]
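
The `result_dicts` helper comes from the course's `more_sqlalchemy` module, which is not shown here. If that module is unavailable, a stand-in along the following lines should give similar output. The function name `rows_as_dicts` is hypothetical, and this is not the course implementation. The example runs against a small made-up table in an in-memory database.

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")  # in-memory database for the sketch

# Build a tiny toy table (hypothetical rows)
with engine.begin() as conn:
    conn.execute(text('CREATE TABLE manager (id INTEGER PRIMARY KEY, "playerID" VARCHAR)'))
    conn.execute(text("INSERT INTO manager (id, \"playerID\") VALUES (0, 'a01'), (1, 'b01')"))

def rows_as_dicts(result, n):
    """Fetch up to n rows and pair each value with its column name."""
    keys = list(result.keys())
    return [dict(zip(keys, row)) for row in result.fetchmany(n)]

with engine.connect() as conn:
    result = conn.execute(text("SELECT * FROM manager ORDER BY id"))
    records = rows_as_dicts(result, 5)

print(records)
```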

## Part 2 - Awards for Managers

Now add a table for the data in the `AwardsManagers.csv` file.

In [20]:
Awards = pd.read_csv('~/Desktop/baseball/core/AwardsManagers.csv')
Awards.head()

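|   | playerID | awardID | yearID | lgID | tie | notes |
|---|----------|---------|--------|------|-----|-------|
| 0 | larusto01 | BBWAA Manager of the Year | 1983 | AL | NaN | NaN |
| 1 | lasorto01 | BBWAA Manager of the Year | 1983 | NL | NaN | NaN |
| 2 | andersp01 | BBWAA Manager of the Year | 1984 | AL | NaN | NaN |
| 3 | freyji99 | BBWAA Manager of the Year | 1984 | NL | NaN | NaN |
| 4 | coxbo01 | BBWAA Manager of the Year | 1985 | AL | NaN | NaN |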


In [21]:
Awards.columns

Index(['playerID', 'awardID', 'yearID', 'lgID', 'tie', 'notes'], dtype='object')

In [22]:
[(col, Awards[col].is_unique) for col in Awards]

[('playerID', False),
 ('awardID', False),
 ('yearID', False),
 ('lgID', False),
 ('tie', False),
 ('notes', False)]

In [23]:
Awards.shape

(179, 6)

In [24]:
Awards.head()

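|   | playerID | awardID | yearID | lgID | tie | notes |
|---|----------|---------|--------|------|-----|-------|
| 0 | larusto01 | BBWAA Manager of the Year | 1983 | AL | NaN | NaN |
| 1 | lasorto01 | BBWAA Manager of the Year | 1983 | NL | NaN | NaN |
| 2 | andersp01 | BBWAA Manager of the Year | 1984 | AL | NaN | NaN |
| 3 | freyji99 | BBWAA Manager of the Year | 1984 | NL | NaN | NaN |
| 4 | coxbo01 | BBWAA Manager of the Year | 1985 | AL | NaN | NaN |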


In [25]:
from dfply import mutate
Awards = (Awards >>
            mutate(id = Awards.index))

In [26]:
Awards.id.is_unique

True

In [27]:
Awards.dtypes

playerID    object
awardID     object
yearID       int64
lgID        object
tie         object
notes       object
id           int64
dtype: object

In [28]:
from sqlalchemy import String, Integer
sql_types = {'id':Integer,
             'playerID':String, 
             'awardID':String, 
             'lgID':String, 
             'yearID':Integer, 
             'tie':String, 
             'notes':String}

In [29]:
from sqlalchemy import create_engine
schema = pd.io.sql.get_schema(Awards, 'awards', keys='id', con=mang_eng, dtype=sql_types)
print(schema)


CREATE TABLE awards (
	"playerID" VARCHAR, 
	"awardID" VARCHAR, 
	"yearID" INTEGER, 
	"lgID" VARCHAR, 
	tie VARCHAR, 
	notes VARCHAR, 
	id INTEGER NOT NULL, 
	CONSTRAINT awards_pk PRIMARY KEY (id)
)




In [30]:
mang_eng.execute(schema)

2019-01-30 12:31:41,935 INFO sqlalchemy.engine.base.Engine 
CREATE TABLE awards (
	"playerID" VARCHAR, 
	"awardID" VARCHAR, 
	"yearID" INTEGER, 
	"lgID" VARCHAR, 
	tie VARCHAR, 
	notes VARCHAR, 
	id INTEGER NOT NULL, 
	CONSTRAINT awards_pk PRIMARY KEY (id)
)


2019-01-30 12:31:41,936 INFO sqlalchemy.engine.base.Engine ()
2019-01-30 12:31:41,958 INFO sqlalchemy.engine.base.Engine COMMIT


<sqlalchemy.engine.result.ResultProxy at 0x1fb52b184e0>

In [31]:
Awards.shape

(179, 7)

In [32]:
Awards.to_sql('awards', 
                con=mang_eng, 
                dtype=sql_types, 
                index=False,
                if_exists='append')

2019-01-30 12:36:57,858 INFO sqlalchemy.engine.base.Engine PRAGMA table_info("awards")
2019-01-30 12:36:57,860 INFO sqlalchemy.engine.base.Engine ()
2019-01-30 12:36:57,863 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2019-01-30 12:36:57,868 INFO sqlalchemy.engine.base.Engine INSERT INTO awards ("playerID", "awardID", "yearID", "lgID", tie, notes, id) VALUES (?, ?, ?, ?, ?, ?, ?)
2019-01-30 12:36:57,869 INFO sqlalchemy.engine.base.Engine (('larusto01', 'BBWAA Manager of the Year', 1983, 'AL', None, None, 0), ('lasorto01', 'BBWAA Manager of the Year', 1983, 'NL', None, None, 1), ('andersp01', 'BBWAA Manager of the Year', 1984, 'AL', None, None, 2), ('freyji99', 'BBWAA Manager of the Year', 1984, 'NL', None, None, 3), ('coxbo01', 'BBWAA Manager of the Year', 1985, 'AL', None, None, 4), ('herzowh01', 'BBWAA Manager of the Year', 1985, 'NL', None, None, 5), ('mcnamjo99', 'BBWAA Manager of the Year', 1986, 'AL', None, None, 6), ('lanieha01', 'BBWAA Manager of the Year', 1986, 'NL', N

In [33]:
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy import select

mang_eng2 = create_engine("sqlite:///databases/baseball.db") 
Session = sessionmaker(mang_eng)
session = Session()

In [35]:
Base = automap_base()
Base.prepare(mang_eng2, reflect=True)
Award = Base.classes.awards

In [36]:
from more_sqlalchemy import result_dicts
stmt = select('*').select_from(Award)
session.execute(stmt).fetchmany(5) >> result_dicts

2019-01-30 12:38:11,054 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2019-01-30 12:38:11,056 INFO sqlalchemy.engine.base.Engine SELECT * 
FROM awards
2019-01-30 12:38:11,057 INFO sqlalchemy.engine.base.Engine ()


[{'playerID': 'larusto01',
  'awardID': 'BBWAA Manager of the Year',
  'yearID': 1983,
  'lgID': 'AL',
  'tie': None,
  'notes': None,
  'id': 0},
 {'playerID': 'lasorto01',
  'awardID': 'BBWAA Manager of the Year',
  'yearID': 1983,
  'lgID': 'NL',
  'tie': None,
  'notes': None,
  'id': 1},
 {'playerID': 'andersp01',
  'awardID': 'BBWAA Manager of the Year',
  'yearID': 1984,
  'lgID': 'AL',
  'tie': None,
  'notes': None,
  'id': 2},
 {'playerID': 'freyji99',
  'awardID': 'BBWAA Manager of the Year',
  'yearID': 1984,
  'lgID': 'NL',
  'tie': None,
  'notes': None,
  'id': 3},
 {'playerID': 'coxbo01',
  'awardID': 'BBWAA Manager of the Year',
  'yearID': 1985,
  'lgID': 'AL',
  'tie': None,
  'notes': None,
  'id': 4}]

## Part 3 - Super Hero Powers

Now make a database and table for the super hero powers.

## Problem 1
    
**Task:** Open the `super_hero_powers.csv` file and verify that the contents of the columns are all Boolean.  In this problem, you need to:

1. Create a `dict` that defines the `pandas` type of each column.
2. Read the file in using `pd.read_csv`.
3. Clean up all the column labels.
    
**Be sure to write clean code!**
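
As a starting point, the three steps can be sketched on a tiny inline CSV. The column names and rows below are made up, and the assumption that the name column is called `hero_names` should be checked against the real file.

```python
import io
import pandas as pd

# Hypothetical miniature of the file; the real CSV has many more columns
csv_text = """hero_names,Super Strength,Flight
Alpha,True,False
Beta,False,True
"""

# Step 1: read only the header, then build the dtype dict from it
header = pd.read_csv(io.StringIO(csv_text), nrows=0).columns
col_types = {col: (str if col == "hero_names" else bool) for col in header}

# Step 2: read the data with the explicit types
powers = pd.read_csv(io.StringIO(csv_text), dtype=col_types)

# Step 3: clean the labels (lowercase, underscores instead of spaces)
powers.columns = [col.strip().lower().replace(" ", "_") for col in powers.columns]

print(powers.dtypes)
```

Reading with `dtype=bool` raises an error if a column holds anything other than Boolean values, which doubles as the verification the task asks for.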


## Problem 2
    
Now define a `sqlalchemy` table for these data using the `pandas` `DataFrame.to_sql` method.  You can use the `sqlalchemy.String` and `sqlalchemy.Boolean` column types, which are [documented here](https://docs.sqlalchemy.org/en/latest/core/type_basics.html).
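
The pattern mirrors Part 1, with `Boolean` in the types `dict`. Here is a minimal sketch on a made-up two-column frame and an in-memory database. The table name `powers` and the column names are assumptions for illustration only.

```python
import pandas as pd
from sqlalchemy import Boolean, String, create_engine, inspect

engine = create_engine("sqlite://")  # in-memory database for the sketch

# Hypothetical miniature of the cleaned powers frame
toy = pd.DataFrame({"hero_names": ["Alpha", "Beta"],
                    "flight": [True, False]})

sql_types = {"hero_names": String, "flight": Boolean}
toy.to_sql("powers", con=engine, dtype=sql_types, index=False)

# Read the column types back to confirm what was created
column_types = {col["name"]: str(col["type"])
                for col in inspect(engine).get_columns("powers")}
print(column_types)
```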

## Problem 3
    
Now you need to make a new `engine`, `inspect` your database, and make a `session` to query the database.
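
The three pieces can be sketched as follows, assuming an in-memory database with a made-up `powers` table in place of the file you created in Problem 2. With your own database, only the URL passed to `create_engine` should need to change.

```python
from sqlalchemy import create_engine, inspect, text
from sqlalchemy.orm import sessionmaker

engine = create_engine("sqlite://")  # replace with your own database URL

# Toy table so that there is something to inspect (hypothetical schema)
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE powers (hero_names VARCHAR, flight BOOLEAN)"))
    conn.execute(text("INSERT INTO powers VALUES ('Alpha', 1), ('Beta', 0)"))

# Inspect: list the tables and the columns of one table
inspector = inspect(engine)
table_names = inspector.get_table_names()
column_names = [col["name"] for col in inspector.get_columns("powers")]

# Session: bind it to the engine and run a quick query
Session = sessionmaker(bind=engine)
session = Session()
n_heroes = session.execute(text("SELECT COUNT(*) FROM powers")).scalar()
session.close()

print(table_names, column_names, n_heroes)
```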

## Problem 4
    
Perform `sqlalchemy` queries to answer each of the following questions.

1. How many heroes have both Super Strength and Super Speed?
2. How many heroes have names that start with the word *Black*?
3. Are heroes with Agility more likely to have Stealth?
4. What fraction of all heroes that can fly also have Super Strength?
5. Consider heroes that have names that contain `"girl"`, `"boy"`, `"woman"`, or `"man"`.  Compute the following ratio:

$$\frac{N(\text{boy or man})}{N(\text{girl or woman})}$$

**Hint:** You will need to use some combination of `where`, `group_by`, and `count` for each part.
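
To illustrate the hint without giving away the answers, the sketch below runs the three kinds of query on a made-up three-row table. It uses `session.query` with `filter` (the ORM counterpart of `where`), `group_by`, and `func.count`. The table, its columns, and its rows are all hypothetical.

```python
from sqlalchemy import Boolean, Column, MetaData, String, Table, create_engine, func
from sqlalchemy.orm import sessionmaker

engine = create_engine("sqlite://")  # in-memory database for the sketch
metadata = MetaData()

# Hypothetical toy table; your real columns come from the cleaned CSV
heroes = Table(
    "heroes", metadata,
    Column("name", String, primary_key=True),
    Column("agility", Boolean),
    Column("stealth", Boolean),
)
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(heroes.insert(), [
        {"name": "Black Alpha", "agility": True, "stealth": True},
        {"name": "Beta", "agility": True, "stealth": False},
        {"name": "Black Gamma", "agility": False, "stealth": False},
    ])

session = sessionmaker(bind=engine)()

# Count rows that satisfy two Boolean conditions at once
n_both = (session.query(func.count())
          .select_from(heroes)
          .filter(heroes.c.agility.is_(True), heroes.c.stealth.is_(True))
          .scalar())

# Count rows whose name starts with a given prefix
n_black = (session.query(func.count())
           .select_from(heroes)
           .filter(heroes.c.name.like("Black%"))
           .scalar())

# Count rows per value of a column
rows = (session.query(heroes.c.agility, func.count())
        .group_by(heroes.c.agility)
        .all())
by_agility = {bool(flag): count for flag, count in rows}

session.close()
print(n_both, n_black, by_agility)
```

Ratios and fractions can then be computed in Python from two such counts.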

## Problem 5

Tell me another cool fact about the super powers.