In [1]:
import sqlite3

from prettytable import PrettyTable

The [sqlite3](https://docs.python.org/2/library/sqlite3.html) library has been part of the Python Standard Library since version 2.5. It wraps SQLite, which _is a C library that provides a lightweight disk-based database that doesn’t require a separate server process and allows accessing the database using a nonstandard variant of the SQL query language._

We'll also be using the [prettytable](http://code.google.com/p/prettytable/wiki/Tutorial) library to print out the results nicely. 

It was already present in my Linux Mint version of PyCharm, but if needed, [installation instructions should be here.](http://code.google.com/p/prettytable/wiki/Installation)

---

Ok, so let's get started! 

First create a [Connection](https://docs.python.org/2/library/sqlite3.html#sqlite3.Connection) object that represents the database.

In [2]:
connection = sqlite3.connect('reuters.db')

Once we have a [Connection](https://docs.python.org/2/library/sqlite3.html#sqlite3.Connection), we can create a [Cursor](https://docs.python.org/2/library/sqlite3.html#sqlite3.Cursor) object, which will execute SQL statements/queries on the database with its [.execute()](https://docs.python.org/2/library/sqlite3.html#sqlite3.Cursor.execute) method.

In [3]:
cursor = connection.cursor()

In [4]:
query = '''
SELECT *
FROM frequency as f
WHERE term = 'net'
AND count = 5
'''

In [5]:
results_generator = cursor.execute(query)
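
As a small aside, `.execute()` also accepts a second argument with values to fill in for `?` placeholders, which saves editing the query string by hand every time the term or count changes. Here is a minimal sketch of that, assuming a throwaway in-memory database. The `demo_` names and the three rows are made up for illustration and are not part of `reuters.db`; only the docid/term/count column layout is borrowed from this post.

```python
import sqlite3

# Throwaway in-memory database (an assumption for this sketch, not reuters.db).
demo_connection = sqlite3.connect(':memory:')
demo_cursor = demo_connection.cursor()
demo_cursor.execute('CREATE TABLE frequency (docid TEXT, term TEXT, count INTEGER)')
demo_cursor.executemany(
    'INSERT INTO frequency VALUES (?, ?, ?)',
    [('doc_a', 'net', 5), ('doc_b', 'net', 6), ('doc_c', 'profit', 5)],
)

# Each ? is filled in, in order, from the tuple passed as the second argument.
parameterized_query = '''
SELECT *
FROM frequency
WHERE term = ?
AND count = ?
'''
matching_rows = demo_cursor.execute(parameterized_query, ('net', 5)).fetchall()
print(matching_rows)

demo_connection.close()
```

Only the made-up `doc_a` row matches both conditions here, so that is the single row that comes back.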

---

Executing our query with the Cursor object returns the Cursor itself, which is an iterator: much like a [generator](https://www.jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-generators-explained/), it yields each new result without smashing everything into memory at once.

It has a [.next()](http://anandology.com/python-practice-book/iterators.html) method if we want to get the next result (on Python 3, call `next(results_generator)` instead). We can run this until all results are exhausted, or just iterate through with a for loop.

In [6]:
results_generator.next()

(u'12616_txt_earn', u'net', 5)

Let's execute our query again real quick, since we 'yielded the first result out of our object' in our previous example.

In [7]:
results_generator = cursor.execute(query)
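
To make the "used up" behaviour concrete, here is a rough sketch on a throwaway in-memory database. The `demo_` names and rows are invented for illustration. It uses the built-in `next()` function, which should work on both Python 2.6+ and Python 3.

```python
import sqlite3

# Throwaway in-memory database with made-up rows (not reuters.db).
demo_connection = sqlite3.connect(':memory:')
demo_cursor = demo_connection.cursor()
demo_cursor.execute('CREATE TABLE frequency (docid TEXT, term TEXT, count INTEGER)')
demo_cursor.executemany(
    'INSERT INTO frequency VALUES (?, ?, ?)',
    [('doc_a', 'net', 5), ('doc_b', 'net', 5), ('doc_c', 'net', 5)],
)

demo_results = demo_cursor.execute('SELECT docid FROM frequency ORDER BY docid')

# execute() hands back the very same cursor object it was called on.
same_object = demo_results is demo_cursor

# Pull a single row by hand.
first_row = next(demo_results)

# A for loop picks up where next() left off, so the first row is not repeated.
remaining_rows = [row for row in demo_results]

# Once exhausted, nothing more comes out until execute() is called again.
leftover_rows = list(demo_results)

print(same_object)
print(first_row)
print(remaining_rows)
print(leftover_rows)
```

That is why the query has to be executed again before building the table: any row already pulled out with `next()` would otherwise be missing from it.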

---

## [(Pretty) Printing](https://code.google.com/p/prettytable/wiki/Tutorial) out our results table.

Column names are stored in the description attribute of the sqlite3 cursor object.

It is a sequence of 7-tuples, one per column, with the first position containing the column name.

In [8]:
colnames = [colname[0] for colname in results_generator.description]
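
If you want to peek at what `description` looks like, this sketch runs a query against an empty, made-up in-memory table. The `demo_` names and the `AS n` alias are assumptions added for illustration. The remaining six slots of each tuple are there for compatibility with the Python DB API, and sqlite3 leaves them as `None`.

```python
import sqlite3

# Empty throwaway table; description is filled in even when no rows come back.
demo_connection = sqlite3.connect(':memory:')
demo_cursor = demo_connection.cursor()
demo_cursor.execute('CREATE TABLE frequency (docid TEXT, term TEXT, count INTEGER)')

# Aliasing a column changes the name reported in description.
demo_cursor.execute('SELECT docid, count AS n FROM frequency')
demo_description = demo_cursor.description

# First slot of each 7-tuple is the column name.
demo_colnames = [column[0] for column in demo_description]

print(demo_description)
print(demo_colnames)
```

Note that the names come from the query, not the table, so an aliased column shows up under its alias.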

In [9]:
# Instantiate a PrettyTable object with our column names.
table_output = PrettyTable(colnames)

# Sets padding between columns. Default is 1.
table_output.padding_width = 1

# Add each row to the table for printing. Only useful if table fits in memory.
for row in results_generator:
    table_output.add_row(row)

In [10]:
print table_output

+----------------+------+-------+
|     docid      | term | count |
+----------------+------+-------+
| 12616_txt_earn | net  |   5   |
| 1314_txt_earn  | net  |   5   |
| 1438_txt_earn  | net  |   5   |
| 1602_txt_earn  | net  |   5   |
| 16848_txt_earn | net  |   5   |
| 20291_txt_earn | net  |   5   |
| 20324_txt_earn | net  |   5   |
| 20728_txt_earn | net  |   5   |
| 21412_txt_earn | net  |   5   |
| 2253_txt_earn  | net  |   5   |
| 2789_txt_earn  | net  |   5   |
| 4067_txt_earn  | net  |   5   |
| 4307_txt_earn  | net  |   5   |
| 4914_txt_earn  | net  |   5   |
| 5543_txt_earn  | net  |   5   |
| 5925_txt_earn  | net  |   5   |
| 6466_txt_earn  | net  |   5   |
|  696_txt_earn  | net  |   5   |
| 8979_txt_earn  | net  |   5   |
| 9502_txt_earn  | net  |   5   |
| 9818_txt_earn  | net  |   5   |
| 9839_txt_earn  | net  |   5   |
| 9937_txt_earn  | net  |   5   |
+----------------+------+-------+


--- 

Noticing that the only parameters that are changing are the database name and query, we can just wrap it in a quick function like any good ~~lazy~~ programmer would.

In [11]:
def query_db(database_name, query):
    '''Opens a connection to a SQLite database using the sqlite3 Python library,
        and runs a provided query on the database.

    Args:
        database_name (string): String containing the name of the database to
         be opened.

        query (string): SQL query to be executed on the database

    Returns:
        tabled_results (PrettyTable object): Results in a table to be printed.
    '''
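    connection = sqlite3.connect(database_name)
    try:
        cursor = connection.cursor()
        results_generator = cursor.execute(query)
        # pprint_results reads every row, so it is safe to close afterwards.
        tabled_results = pprint_results(results_generator)
    finally:
        connection.close()
    return tabled_results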
    
def pprint_results(results_generator):
    ''' Formats results from a sqlite3 query into a nicer format to be printed.

    Adds column names from the SQL database, and ASCII separators.

    Note: Puts all results into memory to construct the table.

    Args:
        results_generator (sqlite3.Cursor object): Cursor object containing the
            results after executing a query on it.

    Returns:
        table_output (PrettyTable object): A PrettyTable object containing the
         results of the SQL query in a nicely formatted container.
    '''

    colnames = [colname[0] for colname in results_generator.description]

    table_output = PrettyTable(colnames)
    table_output.padding_width = 1

    for row in results_generator:
        table_output.add_row(row)

    return table_output

In [12]:
database_name = 'reuters.db'

In [13]:
query = '''
SELECT *
FROM frequency
WHERE term = 'net'
AND count = 5
'''

In [14]:
print query_db(database_name, query)

+----------------+------+-------+
|     docid      | term | count |
+----------------+------+-------+
| 12616_txt_earn | net  |   5   |
| 1314_txt_earn  | net  |   5   |
| 1438_txt_earn  | net  |   5   |
| 1602_txt_earn  | net  |   5   |
| 16848_txt_earn | net  |   5   |
| 20291_txt_earn | net  |   5   |
| 20324_txt_earn | net  |   5   |
| 20728_txt_earn | net  |   5   |
| 21412_txt_earn | net  |   5   |
| 2253_txt_earn  | net  |   5   |
| 2789_txt_earn  | net  |   5   |
| 4067_txt_earn  | net  |   5   |
| 4307_txt_earn  | net  |   5   |
| 4914_txt_earn  | net  |   5   |
| 5543_txt_earn  | net  |   5   |
| 5925_txt_earn  | net  |   5   |
| 6466_txt_earn  | net  |   5   |
|  696_txt_earn  | net  |   5   |
| 8979_txt_earn  | net  |   5   |
| 9502_txt_earn  | net  |   5   |
| 9818_txt_earn  | net  |   5   |
| 9839_txt_earn  | net  |   5   |
| 9937_txt_earn  | net  |   5   |
+----------------+------+-------+


In [15]:
new_query = '''
SELECT *
FROM frequency
WHERE term = 'net'
AND count = 6
'''

In [16]:
print query_db(database_name, new_query)

+-----------------+------+-------+
|      docid      | term | count |
+-----------------+------+-------+
|  1011_txt_earn  | net  |   6   |
| 17199_txt_crude | net  |   6   |
|  20393_txt_earn | net  |   6   |
|  21248_txt_earn | net  |   6   |
|  21260_txt_earn | net  |   6   |
|  21386_txt_earn | net  |   6   |
|  4788_txt_earn  | net  |   6   |
|  8999_txt_earn  | net  |   6   |
+-----------------+------+-------+


---

Quick aside regarding the instructions:



---
_Many questions ask you to count the number of records returned by a query. Perhaps the easiest way to count the number of records returned by a query Q is to write Q as a subquery:_

```
SELECT count(*) FROM (
  SELECT ...
) x;
```

_(In SQLite, the alias "x" is not required, but in other dialects of SQL it is. So we've included it here.)_

---

If you're like me and at first said "what?", hopefully this helps!

#### Nesting SQL queries working example

In [17]:
# Following the same instruction syntax using our previous query

nested_query = '''
SELECT count(*)
FROM (SELECT *
      FROM frequency
      WHERE term = 'net'
      AND count = 6) x;
'''

In [18]:
print query_db(database_name, nested_query)

+----------+
| count(*) |
+----------+
|    8     |
+----------+


And of course, since this is Python, we could just make a template string out of the count query string, and drop our original queries into the right place with [Python's .format() method](https://docs.python.org/2/library/stdtypes.html#str.format) on strings.

In [19]:
nested_query_template = '''
SELECT count(*)
FROM ({0}) x;
'''.format(query)

print query_db(database_name, nested_query_template)

+----------+
| count(*) |
+----------+
|    23    |
+----------+


In [20]:
nested_query_template = '''
SELECT count(*)
FROM ({0}) x;
'''.format(new_query)

print query_db(database_name, nested_query_template)

+----------+
| count(*) |
+----------+
|    8     |
+----------+


But a function might be better there too. ;)
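
Here is one way such a function might look. This is only a sketch: `count_rows` and `COUNT_TEMPLATE` are hypothetical names, the demo rows are made up, and it takes an open connection rather than a database name so it can be tried on an in-memory database. It also strips a trailing semicolon, since a query ending in `;` would otherwise break once nested inside the parentheses.

```python
import sqlite3

# Same shape as the count template used above, kept as a real template this time.
COUNT_TEMPLATE = '''
SELECT count(*)
FROM ({0}) x;
'''


def count_rows(connection, query):
    '''Counts the rows a query would return by nesting it as a subquery.

    Only meant for query text you wrote yourself: .format() pastes the
    string straight into the SQL.
    '''
    # Drop surrounding whitespace and any trailing semicolon so it nests cleanly.
    inner_query = query.strip().rstrip(';')
    cursor = connection.cursor()
    cursor.execute(COUNT_TEMPLATE.format(inner_query))
    # count(*) always comes back as a single row with a single column.
    return cursor.fetchone()[0]


# Throwaway in-memory database with made-up rows (not reuters.db).
demo_connection = sqlite3.connect(':memory:')
demo_connection.execute('CREATE TABLE frequency (docid TEXT, term TEXT, count INTEGER)')
demo_connection.executemany(
    'INSERT INTO frequency VALUES (?, ?, ?)',
    [('doc_a', 'net', 5), ('doc_b', 'net', 6), ('doc_c', 'profit', 5)],
)

demo_count = count_rows(demo_connection, "SELECT * FROM frequency WHERE term = 'net';")
print(demo_count)
```

Returning a plain integer instead of a PrettyTable is a design choice for this sketch; it makes the count easy to reuse in other code.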

---

That's it! That should be enough for some basic SQL querying. :)

Disclaimer: This probably only works well for valid SQL queries. 

---

Author: Fernando Hernandez

First Draft