# SQLAlchemy Basics

SQLAlchemy provides a common interface to many different relational databases, such as MySQL, Oracle, PostgreSQL, etc.

It consists of two main pieces:

* the Core (relational model)
* the ORM (user data model: data models and classes created by the developer).

To connect to a database, we need to instantiate an engine using SQLAlchemy's `create_engine` function, passing it a connection string that specifies the database type and its location (path). The engine provides the common interface to the database.

In [1]:
from sqlalchemy import create_engine

# connection string --> 'dialect:///path/to/database' (SQLite form, relative path)
engine = create_engine('sqlite:///../data/sqlalchemy/census.sqlite')
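To see how a connection string breaks down into its parts, the sketch below parses the SQLite URL used above with SQLAlchemy's `make_url` helper, along with a hypothetical PostgreSQL URL (user `scott`, password `tiger`, database `census` are made-up values) in the general `dialect+driver://user:password@host:port/database` form. Parsing a URL does not connect to anything or require the driver to be installed.

```python
from sqlalchemy.engine.url import make_url

# Parse the SQLite connection string used in this notebook.
sqlite_url = make_url("sqlite:///../data/sqlalchemy/census.sqlite")
print(sqlite_url.drivername)  # sqlite
print(sqlite_url.database)    # ../data/sqlalchemy/census.sqlite

# A hypothetical server-based URL: dialect+driver://user:password@host:port/database
pg_url = make_url("postgresql+psycopg2://scott:tiger@localhost:5432/census")
print(pg_url.get_backend_name())  # postgresql
print(pg_url.get_driver_name())   # psycopg2
print(pg_url.host, pg_url.port, pg_url.database)  # localhost 5432 census
```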

Once we have an engine, we can use it to open a connection and, for example, list the tables in the database.

In [4]:
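connection = engine.connect()
# list the table names (deprecated in SQLAlchemy 1.4 in favour of inspect(engine).get_table_names())
engine.table_names()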

['census', 'state_fact']
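On SQLAlchemy 1.4 and later, table names are usually listed through an `Inspector` obtained with `inspect()`. The sketch below assumes SQLAlchemy 1.4+ and uses a throwaway in-memory SQLite database with two hypothetical tables standing in for `census` and `state_fact`, so it runs without the notebook's data file.

```python
from sqlalchemy import create_engine, inspect, text

# In-memory SQLite database: nothing is written to disk.
engine = create_engine("sqlite://")

with engine.begin() as conn:
    # Hypothetical tables standing in for the notebook's census database.
    conn.execute(text("CREATE TABLE census (state VARCHAR(30), age INTEGER)"))
    conn.execute(text("CREATE TABLE state_fact (name VARCHAR(256))"))

    # The Inspector reads schema information through the connection.
    table_names = inspect(conn).get_table_names()

print(table_names)  # ['census', 'state_fact']
```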

SQLAlchemy can automatically load tables from a database using *reflection*: reading the database's schema and building the metadata from that information. This makes it possible to work with existing databases that were not created through SQLAlchemy.

To perform reflection, you need to import the `Table` object from the SQLAlchemy package. Then, use the `Table` object to read the table from the engine and autoload the columns. 

To autoload the columns with the engine, you have to specify the keyword arguments `autoload=True` and `autoload_with=engine` to `Table()`.

In this example, `Table` is called with four arguments:

* 1st, the name of the table as a string
* 2nd, a `MetaData` object: a container that holds information about the database's tables, such as their columns, types and constraints.
* 3rd and 4th, the `autoload` and `autoload_with` keyword arguments.

In [14]:
from sqlalchemy import Table, MetaData

metadata = MetaData()
census = Table('census', metadata, autoload=True, autoload_with=engine)

# display census table metadata
repr(census)

"Table('census', MetaData(bind=None), Column('state', VARCHAR(length=30), table=<census>), Column('sex', VARCHAR(length=1), table=<census>), Column('age', INTEGER(), table=<census>), Column('pop2000', INTEGER(), table=<census>), Column('pop2008', INTEGER(), table=<census>), schema=None)"
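On SQLAlchemy 1.4 and later, `autoload=True` is deprecated (and removed in 2.0): passing `autoload_with` an engine or connection is enough to trigger reflection. The sketch below assumes SQLAlchemy 1.4+ and reflects a hypothetical in-memory `census` table created with the same five columns as the notebook's table.

```python
from sqlalchemy import MetaData, Table, create_engine, text

engine = create_engine("sqlite://")

with engine.begin() as conn:
    # Hypothetical table mirroring the census columns shown above.
    conn.execute(text(
        "CREATE TABLE census (state VARCHAR(30), sex VARCHAR(1), age INTEGER, "
        "pop2000 INTEGER, pop2008 INTEGER)"
    ))

    metadata = MetaData()
    # autoload_with alone tells SQLAlchemy 1.4+ to reflect the columns.
    census = Table("census", metadata, autoload_with=conn)

print(census.columns.keys())  # ['state', 'sex', 'age', 'pop2000', 'pop2008']
```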

We can access the table's columns through the `columns` attribute, and retrieve a list of column names using its `keys()` method.

In [16]:
census.columns.keys()

['state', 'sex', 'age', 'pop2000', 'pop2008']

We can use the `metadata` container to find out more details about the reflected table, such as its columns and their types. Table objects are stored in the `metadata.tables` dictionary, so the metadata of the census table is available as `metadata.tables['census']` (similar to the output of `repr()` above).

In [17]:
metadata.tables

immutabledict({'census': Table('census', MetaData(bind=None), Column('state', VARCHAR(length=30), table=<census>), Column('sex', VARCHAR(length=1), table=<census>), Column('age', INTEGER(), table=<census>), Column('pop2000', INTEGER(), table=<census>), Column('pop2008', INTEGER(), table=<census>), schema=None)})

In [18]:
metadata.tables['census']

Table('census', MetaData(bind=None), Column('state', VARCHAR(length=30), table=<census>), Column('sex', VARCHAR(length=1), table=<census>), Column('age', INTEGER(), table=<census>), Column('pop2000', INTEGER(), table=<census>), Column('pop2008', INTEGER(), table=<census>), schema=None)

We can see that the `census` table has five columns (`state`, `sex`, `age`, `pop2000` and `pop2008`) and their datatypes.
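Rather than reading the `repr()` string, the column names and types can be read directly from the `Column` objects. The sketch below assumes SQLAlchemy 1.4+ and a hypothetical in-memory copy of the `census` schema; it also checks that `metadata.tables['census']` is the very same `Table` object that was reflected.

```python
from sqlalchemy import MetaData, Table, create_engine, text

engine = create_engine("sqlite://")

with engine.begin() as conn:
    # Hypothetical schema matching the reflected census table.
    conn.execute(text(
        "CREATE TABLE census (state VARCHAR(30), sex VARCHAR(1), age INTEGER, "
        "pop2000 INTEGER, pop2008 INTEGER)"
    ))
    metadata = MetaData()
    census = Table("census", metadata, autoload_with=conn)

# metadata.tables maps each table name to the same Table object.
print(metadata.tables["census"] is census)  # True

# Each Column carries its reflected name and SQL type.
column_types = {column.name: str(column.type) for column in census.columns}
print(column_types)
```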

## Querying the database

We can execute SQL queries with the connection's `execute` method, which takes the SQL query string as an argument and returns a `ResultProxy` object that can be used in a variety of ways to retrieve the data.

In [31]:
result_proxy = connection.execute("SELECT * FROM census WHERE state = 'Florida'")
result_proxy

<sqlalchemy.engine.result.ResultProxy at 0x7f467a8f5710>
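SQLAlchemy 2.0 no longer accepts a plain string in `execute`; raw SQL is wrapped in `text()`, which also supports bound parameters such as `:state` so values are not pasted into the SQL string by hand. This sketch assumes SQLAlchemy 1.4+ and uses an in-memory table with a few made-up rows, not the real census data.

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")

with engine.begin() as conn:
    conn.execute(text("CREATE TABLE census (state VARCHAR(30), age INTEGER, pop2008 INTEGER)"))
    # Hypothetical sample rows inserted with one executemany-style call.
    conn.execute(
        text("INSERT INTO census VALUES (:state, :age, :pop)"),
        [
            {"state": "Florida", "age": 0, "pop": 100},
            {"state": "Florida", "age": 1, "pop": 110},
            {"state": "Illinois", "age": 0, "pop": 90},
        ],
    )

    # :state is a bound parameter; its value is passed separately.
    result = conn.execute(
        text("SELECT * FROM census WHERE state = :state"), {"state": "Florida"}
    )
    rows = result.fetchall()

print(len(rows))  # 2
```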

We can retrieve the result set using the `fetchall()` method.

In [32]:
result_set = result_proxy.fetchall()

# fetchall() returns a list of rows (172 for Florida); show the first five
result_set[:5]

[('Florida', 'M', 0, 96891, 118845),
 ('Florida', 'M', 1, 96241, 118562),
 ('Florida', 'M', 2, 95962, 117764),
 ('Florida', 'M', 3, 97571, 115442),
 ('Florida', 'M', 4, 98921, 113414)]
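Besides `fetchall()`, a result can be consumed incrementally with `fetchone()` and `fetchmany(n)`, which is useful when the full result set is too large to hold in memory. Each call continues from where the previous one stopped. The sketch assumes SQLAlchemy 1.4+ and a hypothetical five-row table.

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")

with engine.begin() as conn:
    conn.execute(text("CREATE TABLE census (age INTEGER)"))
    # Hypothetical rows with ages 0..4.
    conn.execute(text("INSERT INTO census (age) VALUES (:age)"),
                 [{"age": a} for a in range(5)])

    result = conn.execute(text("SELECT age FROM census ORDER BY age"))
    first = result.fetchone()       # the next single row (None once exhausted)
    next_two = result.fetchmany(2)  # up to two more rows
    remaining = result.fetchall()   # every row that is left

print(first.age, [r.age for r in next_two], [r.age for r in remaining])  # 0 [1, 2] [3, 4]
```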

Instead of writing raw SQL strings (whose syntax can vary between database backends), we can build queries with the `select` function, which takes a list of the tables or columns to select as its sole argument.

In [34]:
from sqlalchemy import select

query = select([census])
print(query)

SELECT census.state, census.sex, census.age, census.pop2000, census.pop2008 
FROM census


In [35]:
result_set = connection.execute(query).fetchall()
len(result_set)

8772
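Calling `len()` on `fetchall()` transfers every row to Python just to count them. A lighter alternative is to let the database count with `func.count()`. The sketch assumes SQLAlchemy 1.4+ and a hypothetical in-memory table with four made-up rows.

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, func, insert, select)

metadata = MetaData()
census = Table("census", metadata, Column("state", String(30)), Column("age", Integer))

engine = create_engine("sqlite://")
with engine.begin() as conn:
    metadata.create_all(conn)
    # Hypothetical rows: three for Florida, one for Illinois.
    conn.execute(
        insert(census),
        [{"state": "Florida", "age": a} for a in range(3)] + [{"state": "Illinois", "age": 0}],
    )

    # COUNT(*) is computed by the database; scalar() returns the single value.
    total = conn.execute(select(func.count()).select_from(census)).scalar()
    florida = conn.execute(
        select(func.count()).select_from(census).where(census.c.state == "Florida")
    ).scalar()

print(total, florida)  # 4 3
```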

We can retrieve individual rows (or slices of rows) by indexing the result list. Within an individual row, column values can be retrieved by column name.

In [37]:
result_set[0]['state']

'Illinois'

In [45]:
result_set[10]['pop2008']

86565
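Looking a value up with `row['state']` works in the 1.x release used here, but SQLAlchemy 2.0 rows behave like named tuples: values are read by attribute, by position, or by name through `row._mapping`. The sketch assumes SQLAlchemy 1.4+ and a single made-up row.

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE census (state VARCHAR(30), pop2008 INTEGER)"))
    # Hypothetical row, not taken from the real census data.
    conn.execute(text("INSERT INTO census VALUES ('Illinois', 1000)"))
    row = conn.execute(text("SELECT state, pop2008 FROM census")).fetchall()[0]

print(row.state)                # attribute access: Illinois
print(row._mapping["pop2008"])  # mapping-style access by column name: 1000
print(row[0])                   # positional access: Illinois
```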