# `DocTable` Basic Overview

A `DocTable` instance is a reference to a single database table and can be used to [insert, delete](examples/doctable_insert_delete.html), [select](examples/doctable_select.html), or [update](examples/doctable_update.html) the rows of that table. The table schema is typically defined by a [`dataclass`](https://realpython.com/python-data-classes/) that inherits from `DocTableSchema` and represents a single row; instances of that dataclass can then be used to insert or update rows. See the [full documentation](https://devincornell.github.io/doctable/ref/doctable.DocTable.html) for a complete list of class methods.

In [1]:
import random
random.seed(0)
import pandas as pd
import numpy as np
from dataclasses import dataclass

import sys
sys.path.append('..')
import doctable

## Schema Definitions
First we create a schema definition and use it to instantiate the `DocTable`. By default, the `DocTable` connects to an in-memory database, so the examples here do not rely on a database file. Printing the object shows the number of columns, the database engine, and the table name.

In [2]:
@dataclass
class Record(doctable.DocTableSchema):
    id: int = doctable.IDCol()
    name: str = doctable.Col(nullable=False)
    age: int = None
    is_old: bool = None

db = doctable.DocTable(target=':memory:', schema=Record)
print(db)

<DocTable (4 cols)::sqlite:///:memory::_documents_>
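For orientation, the schema above corresponds roughly to the table definition below. This is a minimal sketch using Python's built-in `sqlite3` module rather than doctable itself. The exact DDL that doctable emits is an assumption here; the column types are taken from the `schema_table()` output shown near the end of this document.

```python
import sqlite3

# Assumed DDL: an illustration of the Record schema, not doctable's actual output.
conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE _documents_ (
        id INTEGER PRIMARY KEY,   -- doctable.IDCol()
        name VARCHAR NOT NULL,    -- doctable.Col(nullable=False)
        age INTEGER,              -- nullable by default
        is_old BOOLEAN            -- nullable by default
    )
""")

# PRAGMA table_info yields (cid, name, type, notnull, default, pk) per column
columns = conn.execute("PRAGMA table_info(_documents_)").fetchall()
column_names = [col[1] for col in columns]
print(column_names)
```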


Access table columns as objects by subscripting the `DocTable` directly.

In [3]:
db['id']

Column('id', Integer(), table=<_documents_>, primary_key=True, nullable=False)

As we'll show later, these column objects define operators that can be used to construct complex queries and SQL functions.

In [4]:
db['id'] > 3

<sqlalchemy.sql.elements.BinaryExpression object at 0x7fb969fbf5e0>
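The comparison above returns an expression object rather than a boolean because the column class overloads Python's comparison operators. The sketch below illustrates that mechanism with two hypothetical classes, `ToyColumn` and `ToyExpression`. They are not sqlalchemy's or doctable's classes, only a stand-in for the idea.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ToyExpression:
    """Hypothetical stand-in for a deferred comparison such as id > 3."""
    column: str
    operator: str
    value: Any

    def to_sql(self) -> tuple:
        # Render a parameterized SQL fragment plus its bound value
        return f"{self.column} {self.operator} ?", (self.value,)


class ToyColumn:
    """Hypothetical stand-in for a column object."""

    def __init__(self, name: str):
        self.name = name

    # Each operator returns an expression object instead of evaluating anything
    def __gt__(self, other):
        return ToyExpression(self.name, '>', other)

    def __lt__(self, other):
        return ToyExpression(self.name, '<', other)

    def __eq__(self, other):
        return ToyExpression(self.name, '=', other)


expr = ToyColumn('id') > 3
print(expr.to_sql())
```

Because nothing is evaluated when the operator is applied, the expression can be handed to a query builder and turned into SQL later.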

## Inserting Rows
We use the `.insert()` method to insert a row. Here each row is passed as a `Record` instance; the commented-out line shows the same row expressed as a dictionary of column name -> value entries.

In [5]:
for i in range(5):
    age = random.random() # number in [0, 1)
    is_old = age > 0.5
    #row = {'name':'user_'+str(i), 'age':age, 'is_old':is_old}
    record = Record(name='user_'+str(i), age=age, is_old=is_old)
    db.insert(record)
print(db)

<DocTable (4 cols)::sqlite:///:memory::_documents_>
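Conceptually, inserting a dataclass instance amounts to turning its fields into column/value pairs. The sketch below shows that idea with `dataclasses.asdict` and the built-in `sqlite3` module. `PlainRecord`, `insert_record`, and the `records` table are hypothetical names, and this is not a description of how doctable is implemented internally.

```python
import sqlite3
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class PlainRecord:
    # Hypothetical plain dataclass mirroring the Record schema above
    name: str
    age: Optional[float] = None
    is_old: Optional[bool] = None


def insert_record(conn: sqlite3.Connection, record: PlainRecord) -> None:
    """Insert one dataclass instance using named placeholders."""
    row = asdict(record)  # {'name': ..., 'age': ..., 'is_old': ...}
    columns = ', '.join(row)
    placeholders = ', '.join(':' + key for key in row)
    conn.execute(f"INSERT INTO records ({columns}) VALUES ({placeholders})", row)


conn = sqlite3.connect(':memory:')
conn.execute(
    "CREATE TABLE records ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, age REAL, is_old BOOLEAN)"
)
insert_record(conn, PlainRecord(name='user_0', age=0.25, is_old=False))

stored = conn.execute("SELECT id, name, age, is_old FROM records").fetchall()
print(stored)
```

The `id` column is left out of the insert so that the database assigns it, which matches the auto-assigned ids visible in the select output later in this document.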


## Select Statements
Now we show how to select data from the table. Use the `.count()` method to check the number of rows. It also accepts a column conditional to count only the rows that satisfy a given criterion.

In [6]:
db.count(), db.count(db['is_old']==True)

(5, 3)

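Use the `.select()` method with no arguments to retrieve all rows of the table. You can also pass a single column name or a list of column names to restrict the result to those columns. Selecting a single column returns a flat list of values, while selecting a list of columns returns `Record` objects in which the unselected fields are `None`.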

In [7]:
db.select()

[Record(id=1, name='user_0', age=0.8444218515250481, is_old=True),
 Record(id=2, name='user_1', age=0.7579544029403025, is_old=True),
 Record(id=3, name='user_2', age=0.420571580830845, is_old=False),
 Record(id=4, name='user_3', age=0.25891675029296335, is_old=False),
 Record(id=5, name='user_4', age=0.5112747213686085, is_old=True)]

In [8]:
db.select('name')

['user_0', 'user_1', 'user_2', 'user_3', 'user_4']

In [9]:
db.select(['id','name'])

[Record(id=1, name='user_0', age=None, is_old=None),
 Record(id=2, name='user_1', age=None, is_old=None),
 Record(id=3, name='user_2', age=None, is_old=None),
 Record(id=4, name='user_3', age=None, is_old=None),
 Record(id=5, name='user_4', age=None, is_old=None)]

In [10]:
db.select(db['age'].sum)

[2.7931393069577677]

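The SQL functions `SUM()` and `COUNT()` are mapped to the `.sum` and `.count` attributes of columns. In the next example, `as_dataclass=False` returns the result as a plain row tuple rather than a `Record`.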

In [12]:
db.select([db['age'].sum,db['age'].count], as_dataclass=False)

[(2.7931393069577677, 5)]
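In plain SQL, the query above corresponds to `SELECT SUM(age), COUNT(age)`. The sketch below runs that statement with the built-in `sqlite3` module on the five ages generated in the insert step; the table name `records` is an arbitrary choice for the illustration.

```python
import sqlite3

# The five ages produced by random.seed(0) in the insert step
ages = [0.8444218515250481, 0.7579544029403025, 0.420571580830845,
        0.25891675029296335, 0.5112747213686085]

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, age REAL)")
conn.executemany("INSERT INTO records (age) VALUES (?)", [(a,) for a in ages])

# Both aggregates are computed by the database in a single pass
total, n = conn.execute("SELECT SUM(age), COUNT(age) FROM records").fetchone()
print(total, n)
```

Note that in SQL, `COUNT(column)` counts only non-NULL values of that column, whereas `COUNT(*)` counts rows.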

Alternatively, to see the results as a pandas DataFrame, use `.select_df()`.

In [14]:
db.select_df()

|   | id | name | age | is_old |
|---|----|------|-----|--------|
| 0 | 1 | user_0 | 0.844422 | True |
| 1 | 2 | user_1 | 0.757954 | True |
| 2 | 3 | user_2 | 0.420572 | False |
| 3 | 4 | user_3 | 0.258917 | False |
| 4 | 5 | user_4 | 0.511275 | True |


Now we can select specific rows using the `where` argument of the `.select()` method.

In [15]:
db.select(where=db['is_old']==True)

[Record(id=1, name='user_0', age=0.8444218515250481, is_old=True),
 Record(id=2, name='user_1', age=0.7579544029403025, is_old=True),
 Record(id=5, name='user_4', age=0.5112747213686085, is_old=True)]

In [16]:
db.select(where=db['id']==3)

[Record(id=3, name='user_2', age=0.420571580830845, is_old=False)]
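The `where` argument plays the role of a SQL `WHERE` clause, both for `.select()` and for the conditional `.count()` shown earlier. A rough `sqlite3` equivalent is sketched below, with a hypothetical `records` table and rounded ages; the SQL that doctable actually generates may differ.

```python
import sqlite3

rows = [('user_0', 0.84, True), ('user_1', 0.76, True), ('user_2', 0.42, False),
        ('user_3', 0.26, False), ('user_4', 0.51, True)]

conn = sqlite3.connect(':memory:')
conn.execute(
    "CREATE TABLE records ("
    "id INTEGER PRIMARY KEY, name TEXT, age REAL, is_old BOOLEAN)"
)
conn.executemany("INSERT INTO records (name, age, is_old) VALUES (?, ?, ?)", rows)

# Rough equivalent of db.select('name', where=db['is_old']==True)
old_names = [name for (name,) in conn.execute(
    "SELECT name FROM records WHERE is_old = ? ORDER BY id", (True,))]

# Rough equivalent of db.count(db['is_old']==True)
(n_old,) = conn.execute(
    "SELECT COUNT(*) FROM records WHERE is_old = ?", (True,)).fetchone()

print(old_names, n_old)
```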

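We can update rows in a similar way, using the `where` argument. New values can be literals or expressions on columns, as in the second example, which omits `where` and multiplies the `age` of every row by 100.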

In [17]:
db.update({'name':'smartypants'}, where=db['id']==3)
db.select()

[Record(id=1, name='user_0', age=0.8444218515250481, is_old=True),
 Record(id=2, name='user_1', age=0.7579544029403025, is_old=True),
 Record(id=3, name='smartypants', age=0.420571580830845, is_old=False),
 Record(id=4, name='user_3', age=0.25891675029296335, is_old=False),
 Record(id=5, name='user_4', age=0.5112747213686085, is_old=True)]

In [18]:
db.update({'age':db['age']*100})
db.select()

[Record(id=1, name='user_0', age=84.4421851525048, is_old=True),
 Record(id=2, name='user_1', age=75.79544029403024, is_old=True),
 Record(id=3, name='smartypants', age=42.0571580830845, is_old=False),
 Record(id=4, name='user_3', age=25.891675029296334, is_old=False),
 Record(id=5, name='user_4', age=51.12747213686085, is_old=True)]
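The two updates above map onto two forms of the SQL `UPDATE` statement: one that sets a literal value on the rows matching a condition, and one that computes the new value from the existing column for every row. A sketch with `sqlite3`, using a hypothetical `records` table and round ages chosen so the arithmetic is exact:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT, age REAL)")
conn.executemany("INSERT INTO records (name, age) VALUES (?, ?)",
                 [('user_0', 0.5), ('user_1', 0.25), ('user_2', 0.75)])

# Targeted update: a literal value, restricted by a WHERE clause
conn.execute("UPDATE records SET name = ? WHERE id = ?", ('smartypants', 3))

# Whole-table update: the new value is an expression evaluated per row
conn.execute("UPDATE records SET age = age * 100")

updated = conn.execute("SELECT id, name, age FROM records ORDER BY id").fetchall()
print(updated)
```

Because the multiplication happens inside the database, no rows need to be read into Python first.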

We can delete rows using the `.delete()` method.

In [19]:
db.delete(where=db['id']==3)
db.select()

[Record(id=1, name='user_0', age=84.4421851525048, is_old=True),
 Record(id=2, name='user_1', age=75.79544029403024, is_old=True),
 Record(id=4, name='user_3', age=25.891675029296334, is_old=False),
 Record(id=5, name='user_4', age=51.12747213686085, is_old=True)]
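For comparison, the same deletion in plain SQL is a `DELETE ... WHERE` statement. The sketch below uses `sqlite3` with a hypothetical `records` table and also reads the cursor's `rowcount` to confirm how many rows were removed.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO records (name) VALUES (?)",
                 [(f'user_{i}',) for i in range(5)])

# Delete the single row whose id is 3
cursor = conn.execute("DELETE FROM records WHERE id = ?", (3,))
deleted = cursor.rowcount

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM records ORDER BY id")]
print(deleted, remaining_ids)
```

The ids of the remaining rows are unchanged, which is why the output above jumps from id 2 to id 4.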

## Notes on DB Interface
`DocTable` allows you to access columns through direct subscripting and relies on sqlalchemy column objects to do most of the work of constructing queries. Here are a few notes on their use. For more demonstrations, see the example in examples/dt2_select.ipynb.

In [20]:
# subscript is used to access underlying sqlalchemy column reference (without querying data)
db['id']

Column('id', Integer(), table=<_documents_>, primary_key=True, nullable=False)

In [21]:
# conditionals are applied directly to the column objects (as we'll see with "where" clause)
db['id'] < 3

<sqlalchemy.sql.elements.BinaryExpression object at 0x7fb945684df0>

In [22]:
# can also access using .col() method
db.col('id')

Column('id', Integer(), table=<_documents_>, primary_key=True, nullable=False)

In [23]:
# to access all column objects (only useful for working directly with sql info)
db.columns

<sqlalchemy.sql.base.ImmutableColumnCollection at 0x7fb9457bd340>

In [24]:
# to access more detailed schema information
db.schema_table()

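|   | name | type | nullable | default | autoincrement | primary_key |
|---|------|------|----------|---------|---------------|-------------|
| 0 | id | INTEGER | False |  | auto | 1 |
| 1 | name | VARCHAR | False |  | auto | 0 |
| 2 | age | INTEGER | True |  | auto | 0 |
| 3 | is_old | BOOLEAN | True |  | auto | 0 |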


In [25]:
# If needed, you can also access the sqlalchemy table object using the .table property.
db.table

Table('_documents_', MetaData(bind=Engine(sqlite:///:memory:)), Column('id', Integer(), table=<_documents_>, primary_key=True, nullable=False), Column('name', String(), table=<_documents_>, nullable=False), Column('age', Integer(), table=<_documents_>), Column('is_old', Boolean(), table=<_documents_>), schema=None)

In [26]:
# the count method is also an easy way to count rows in the database
db.count()

4

In [27]:
# printing the table shows the column count, the database engine, and the table name
print(db)

<DocTable (4 cols)::sqlite:///:memory::_documents_>
