# Interacting with Databases

In many applications, data rarely comes from text files, since they are a fairly inefficient way to store large amounts of data. SQL-based relational databases (such as SQL Server, PostgreSQL, and MySQL) are in wide use, and many alternative non-SQL (so-called NoSQL) databases have become quite popular. The choice of database usually depends on the performance, data integrity, and scalability needs of an application.

Loading data from SQL into a DataFrame is fairly straightforward, and pandas has some functions to simplify the process. As an example, I’ll create an in-memory SQLite database with Python’s built-in sqlite3 driver:

In [1]:
import sqlite3

In [14]:
query = """
CREATE TABLE tble
(a VARCHAR(20), b VARCHAR(20),
 c REAL, d INTEGER
);"""

In [15]:
con = sqlite3.connect(':memory:')
con.execute(query)
con.commit()

Then, insert a few rows of data:

In [16]:
data = [('Atlanta', 'Georgia', 1.25, 6),
        ('Tallahassee', 'Florida', 2.6, 3),
        ('Sacramento', 'California', 1.7, 5)]


stmt = "INSERT INTO tble VALUES(?, ?, ?, ?)"

In [17]:
con.executemany(stmt, data)
con.commit()

Most Python SQL drivers (PyODBC, psycopg2, MySQLdb, pymssql, etc.) return a list of tuples when selecting data from a table:

In [19]:
cursor = con.execute('select * from tble')

In [20]:
rows = cursor.fetchall()

rows

[('Atlanta', 'Georgia', 1.25, 6),
 ('Tallahassee', 'Florida', 2.6, 3),
 ('Sacramento', 'California', 1.7, 5)]

You can pass the list of tuples to the DataFrame constructor, but you also need the column names, contained in the cursor’s description attribute:

In [None]:
cursor.description

(('a', None, None, None, None, None, None),
 ('b', None, None, None, None, None, None),
 ('c', None, None, None, None, None, None),
 ('d', None, None, None, None, None, None))
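One way to assemble the DataFrame by hand is sketched here, assuming pandas is imported as pd. The first element of each tuple in cursor.description is the column name, so the names can be collected and passed as the columns argument. The table setup is repeated so the snippet runs on its own, and the names columns and frame are illustrative only:

```python
import sqlite3

import pandas as pd

# Rebuild the example table so this snippet is self-contained.
con = sqlite3.connect(':memory:')
con.execute("""
CREATE TABLE tble
(a VARCHAR(20), b VARCHAR(20),
 c REAL, d INTEGER
);""")
con.executemany("INSERT INTO tble VALUES(?, ?, ?, ?)",
                [('Atlanta', 'Georgia', 1.25, 6),
                 ('Tallahassee', 'Florida', 2.6, 3),
                 ('Sacramento', 'California', 1.7, 5)])
con.commit()

cursor = con.execute('select * from tble')
rows = cursor.fetchall()

# Each entry in cursor.description is a 7-tuple; item 0 is the column name.
columns = [col[0] for col in cursor.description]

# Combine the list of row tuples with the recovered column names.
frame = pd.DataFrame(rows, columns=columns)
print(frame)
```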

This is quite a bit of munging that you’d rather not repeat each time you query the database. pandas has a read_sql_query function in its pandas.io.sql module (also available as pandas.read_sql_query) that simplifies the process. Just pass the select statement and the connection object:

In [21]:
import pandas.io.sql as sql

In [25]:
sql.read_sql_query('select * from tble', con)

             a           b     c  d
0      Atlanta     Georgia  1.25  6
1  Tallahassee     Florida  2.60  3
2   Sacramento  California  1.70  5
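Queries often need values supplied at run time. The following is a minimal sketch, assuming the same tble table as above and the sqlite3 driver's ? placeholder style. read_sql_query accepts a params argument that is handed to the driver, so values do not have to be formatted into the SQL string by hand. The threshold value and the name result are illustrative:

```python
import sqlite3

import pandas as pd

# Rebuild the example table so this snippet is self-contained.
con = sqlite3.connect(':memory:')
con.execute("""
CREATE TABLE tble
(a VARCHAR(20), b VARCHAR(20),
 c REAL, d INTEGER
);""")
con.executemany("INSERT INTO tble VALUES(?, ?, ?, ?)",
                [('Atlanta', 'Georgia', 1.25, 6),
                 ('Tallahassee', 'Florida', 2.6, 3),
                 ('Sacramento', 'California', 1.7, 5)])
con.commit()

# The ? placeholder is filled in by the driver from params,
# which avoids building the SQL text with string formatting.
result = pd.read_sql_query(
    'select * from tble where d > ? order by a',
    con,
    params=(4,),
)
print(result)
```

Placeholder syntax differs between drivers (for example, psycopg2 uses %s rather than ?), so the query text has to match the driver behind the connection.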