## SQL - Major SQL commands
### BIOINF 575 - Fall 2021

##### RESOURCES
https://sqlite.org/index.html    
https://www.sqlite.org/fullsql.html  
https://docs.python.org/3/library/sqlite3.html  
https://www.sqlite.org/lang_aggfunc.html  
https://www.sqlitetutorial.net/sqlite-create-table/    
https://www.sqlite.org/syntaxdiagrams.html    
https://www.tutorialspoint.com/python_network_programming/python_databases_and_sql.htm  
https://www.tutorialspoint.com/python/python_database_access.htm  
https://www.python-course.eu/sql_python.php  
https://www.sqlalchemy.org/library.html#reference    
https://docs.sqlalchemy.org/en/13/orm/  
https://docs.sqlalchemy.org/en/14/orm/tutorial.html#version-check  
https://towardsdatascience.com/sql-in-python-for-beginners-b9a4f9293ecf  
https://database.guide/2-sample-databases-sqlite/
https://www.sqlitetutorial.net/sqlite-sample-database/

#### What is a database? 

* A database is an organized collection of data (stored in files)
* It provides a way to store and retrieve that information
* A relational database is structured to recognize relations between the data elements


*  A collection of data

        * Dictionary
            {"EGFR":6.8, "MYC": 4.5, "WNT1":11.7}

        * Tab-separated text file, or pd.DataFrame


| GeneID  | GeneSymbol  | ExpressionValue  |
|---------|-------------|------------------|
| 7471    | WNT1        |             11.7 |
| 4609    | MYC         |              4.5 |
| 1956    | EGFR        |              6.8 |
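The same three rows can also be held in a relational table. The following is a minimal sketch, assuming an in-memory SQLite database and a hypothetical table named `gene_expression`; neither is part of the course data, and the names `demo_conn` and `demo_cur` are chosen only for this illustration.

```python
import sqlite3

# In-memory database: nothing is written to disk (assumption for this sketch)
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()

# Hypothetical table mirroring the small table above
demo_cur.execute("""
CREATE TABLE gene_expression (
    gene_id          INTEGER,
    gene_symbol      TEXT,
    expression_value REAL
);
""")

rows = [(7471, "WNT1", 11.7), (4609, "MYC", 4.5), (1956, "EGFR", 6.8)]
demo_cur.executemany("INSERT INTO gene_expression VALUES (?, ?, ?);", rows)
demo_conn.commit()

# Dictionary-style lookup: the value stored for one gene symbol
demo_cur.execute(
    "SELECT expression_value FROM gene_expression WHERE gene_symbol = ?;",
    ("MYC",),
)
myc_value = demo_cur.fetchone()[0]
print(myc_value)

demo_conn.close()
```

Unlike the dictionary, the table can hold several attributes per gene and can be filtered on any of them.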


Entity-Relationship Diagram - shows the relations between tables in a relational database
- tables are connected by fields (columns) that are common - called keys


https://dcm.uhcl.edu/yue/courses/csci4333/current/notes/model/ERDiagram.html


<img src = https://dcm.uhcl.edu/yue/courses/csci4333/current/notes/model/ER1_1.png width = 450 />

Relationship diagram as shown by a Relational Database Management System (RDBMS)

<img src = https://dcm.uhcl.edu/yue/courses/csci4333/current/notes/model/Relationship_1.png width = 600 />



More examples:

https://www.sqlitetutorial.net/wp-content/uploads/2015/11/sqlite-sample-database-color.jpg      
https://upload.wikimedia.org/wikipedia/commons/b/b8/Sql_hospital.png      
https://www.researchgate.net/profile/Adam_Richards3/publication/282134102/figure/fig3/AS:289128232046602@1445944950296/Database-entity-diagram-Data-collected-from-NCBI-the-Gene-Ontology-and-UniProt-are.png      




#### Relational Database Management Systems (RDBMS)
* Software programs such as Oracle, MySQL, SQL Server, Db2, PostgreSQL, SQLite 
* They handle the data storage, indexing, logging, tracking and security (access)  
* They have a very fine-grained way of granting permissions to users at the level of commands that may be used
    * Create a database
    * Create a table
    * Update or insert data
    * View certain tables ... and many more   
* An important part of learning databases is understanding the type of data stored in each column.  
* Likewise, when we get to the database design section, it is critically important to know what type of data you will be modeling and storing (and, in traditional systems, roughly how much) 
* Exactly which types are available depends on the database system



#### Why use databases and Relational Database Management Systems?
* Easy, efficient, secure, collaborative management of data that maintains data integrity

#### What is the Structured Query Language (SQL) ?
* SQL is the standard language for relational database management systems
* SQL is used to communicate with a database

#### Why SQLite?
SQLite is a C library that provides a lightweight disk-based database that doesn’t require a separate server process and allows accessing the database using a nonstandard variant of the SQL query language. Some applications can use SQLite for internal data storage. 
* SQLite is often the technology of choice for small applications, particularly those of embedded systems and devices like phones and tablets, smart appliances, and instruments.
* It’s also possible to prototype an application using SQLite and then port the code to a larger database such as PostgreSQL or Oracle.

#### Integrated Development Environment (IDE)

There are many tools:
https://dbeaver.io - recommended, please install it for the next session

<img src = https://github.com/dbeaver/dbeaver/wiki/images/ug/SQL-Editor.png width = 600/>

https://github.com/dbeaver/dbeaver/wiki/SQL-Editor

Other IDE tools:    
https://www.adminer.org    
https://razorsql.com
https://sqlitebrowser.org    
https://towardsdatascience.com/10-best-sql-editor-tools-in-the-market-126acd64ba06    
https://learnsql.com/blog/best-sql-editor/     


#### sqlite3
The sqlite3 module in the Python standard library provides an SQL interface for communicating with SQLite databases.<br>
https://docs.python.org/3/library/sqlite3.html

Once you have a `Connection`, you can create a `Cursor` object and call its `execute()` method to perform SQL commands.

A `Cursor` object represents a database cursor, which is used to manage the context of a fetch/retrieval operation: it holds the result of the most recent query and hands out its rows on request.
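The connection, cursor, execute and fetch sequence can be sketched as follows. This is a minimal illustration that assumes an in-memory database (`":memory:"`) so that it runs without any file; the variable names are hypothetical.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")   # Connection object
demo_cur = demo_conn.cursor()             # Cursor created from the connection

# execute() sends one SQL statement; the cursor then holds the result
demo_cur.execute("SELECT 1 + 1 AS total;")
first_row = demo_cur.fetchone()           # one row, returned as a tuple
print(first_row)

demo_cur.close()
demo_conn.close()
```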

#### SQLite uses a greatly simplified set of data types:
* NULL - a missing value
* INTEGER - signed whole numbers
* REAL - floating-point numbers
* TEXT - text of any length
    * SQLite has no dedicated date type; dates are commonly held as text
* BLOB - binary large objects
    * Such as images
#### The data

https://datacarpentry.org/python-ecology-lesson/

Download the database file `portal_mammals.sqlite` from:   
https://datacarpentry.org/python-ecology-lesson/setup.html      
"     
The data we will be using is a time-series for a small mammal community in southern Arizona.    
This is part of a project studying the effects of rodents and ants on the plant community that has been running for almost 40 years.    
The rodents are sampled on a series of 24 plots, with different experimental manipulations controlling which rodents are allowed to access which plots.    
"   
https://datacarpentry.org/sql-ecology-lesson/00-sql-introduction/index.html


In [11]:
# bring into the environment the  functionality that 
# can be used to connect to a database

from sqlite3 import connect

'''    
    Establish a connection to the database.
    If no file exists at the given path, this statement creates one, and it will be an empty database.
    The file was provided in this case, so the statement should just establish the connection.
'''

# create a connection object to be able to access the database

connection = connect('portal_mammals.sqlite')

# create a cursor to interact with the database and 
# send commands to the database 

cursor = connection.cursor()


#### The cursor and the connection should be closed when we no longer need to run queries.
```python
cursor.close()
connection.close()
```

We are still using them so we keep them open for now.     
After we close them, we should run the above cell again to reestablish the access to the database.
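One way to make sure that both objects get closed, even when a query fails, is `contextlib.closing` from the standard library. This is a sketch of one possible pattern, not the approach used in the rest of this notebook; it assumes an in-memory database.

```python
import sqlite3
from contextlib import closing

# closing() calls .close() when the with-block ends, even if an error occurs
with closing(sqlite3.connect(":memory:")) as demo_conn:
    with closing(demo_conn.cursor()) as demo_cur:
        demo_cur.execute("SELECT 'still open';")
        message = demo_cur.fetchone()[0]

# After the block the connection is closed; using it again raises an error
try:
    demo_conn.execute("SELECT 1;")
    closed = False
except sqlite3.ProgrammingError:
    closed = True

print(message, closed)
```

Note that `with connection:` on its own manages the transaction (commit or rollback) and does not close the connection.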

In [12]:
cursor.close()
connection.close()

In [13]:
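# connect to a different file; if test.sqlite does not exist, SQLite creates it as an empty database
connection = connect('test.sqlite')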
cursor = connection.cursor()

In [14]:
# what is the type of the connection object?

type(connection)


sqlite3.Connection

In [15]:
# what can the connection object do? 
# show only the methods and attributes whose names do not start with __

[elem for elem in dir(connection) if not elem.startswith("__")]

['DataError',
 'DatabaseError',
 'Error',
 'IntegrityError',
 'InterfaceError',
 'InternalError',
 'NotSupportedError',
 'OperationalError',
 'ProgrammingError',
 'backup',
 'close',
 'commit',
 'create_aggregate',
 'create_collation',
 'create_function',
 'cursor',
 'enable_load_extension',
 'execute',
 'executemany',
 'executescript',
 'in_transaction',
 'interrupt',
 'isolation_level',
 'iterdump',
 'load_extension',
 'rollback',
 'row_factory',
 'set_authorizer',
 'set_progress_handler',
 'set_trace_callback',
 'text_factory',
 'total_changes']

In [16]:
# what is the type of the cursor object?
type(cursor)


sqlite3.Cursor

In [17]:
# what can the cursor object do? 
# show only the methods and attributes whose names do not start with __

[element for element in dir(cursor) if not element.startswith("__")]

['arraysize',
 'close',
 'connection',
 'description',
 'execute',
 'executemany',
 'executescript',
 'fetchall',
 'fetchmany',
 'fetchone',
 'lastrowid',
 'row_factory',
 'rowcount',
 'setinputsizes',
 'setoutputsize']

#### Major SQL commands: SELECT, INSERT, DELETE, UPDATE
#### SELECT - Retrieves data from one or more tables and doesn’t change the data at all 

<img src = http://www.tutorialscan.com/wp-content/uploads/2018/04/sql_select.jpg width = 300 />

http://www.tutorialscan.com/wp-content/uploads/2018/04/sql_select.jpg

A more complex diagram with all the options is available at:
https://www.sqlite.org/syntaxdiagrams.html#select-stmt

* SELECT is followed by * (means all columns), or by the comma separated names of the columns of data you wish to return
    * Columns are returned left to right in the order listed. 
    * '*' returns ALL columns, in the order in which they are defined in the table
* FROM is the table source or sources (comma separated)
* WHERE (optional) is the predicate clause: conditions for the query
    * The condition is evaluated for each row, and only the rows for which it is true are returned
    * This clause almost always includes Column-Value pairs.
    * Omitting the WHERE clause returns ALL the records in that table.
    * Note: by default, text comparison with = is case sensitive (LIKE is not, for ASCII letters)
* GROUP BY (optional) groups rows by a column and creates summary data for other columns
* HAVING (optional) restricts which groups are returned
    * it is normally written after a GROUP BY clause and can use aggregate results
* ORDER BY (optional) indicates a sort order for the output data 
    * without ORDER BY the row order is not guaranteed; in practice it often follows row_id, which can be very non-intuitive  
    * ASCending or DESCending can be appended to change the sort order.  (ASC is default)
* LIMIT (optional) reduces the number of rows retrieved to the number provided after this clause
* When several of these clauses are used, they are written in the order listed here
* In most SQL clients, the ";" indicates the end of a statement and requests execution
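The clauses described above can be combined in a single statement. The sketch below assumes a small, hypothetical `sightings` table in an in-memory database (not the course database), so that the effect of each clause is easy to follow.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE sightings (species_id TEXT, year INTEGER, weight REAL);")
demo_cur.executemany(
    "INSERT INTO sightings VALUES (?, ?, ?);",
    [("DM", 1990, 40.0), ("DM", 1990, 44.0), ("DM", 1991, 42.0),
     ("DO", 1990, 50.0), ("DO", 1990, 54.0), ("PF", 1990, 8.0)],
)

demo_sql = """
SELECT species_id, count(*) AS n, avg(weight) AS mean_weight  -- columns to return
FROM sightings                                                 -- source table
WHERE year = 1990                                              -- filter rows
GROUP BY species_id                                            -- one group per species
HAVING n >= 2                                                  -- filter groups
ORDER BY mean_weight DESC                                      -- sort the output
LIMIT 5;                                                       -- cap the row count
"""
demo_cur.execute(demo_sql)
summary = demo_cur.fetchall()
print(summary)

demo_conn.close()
```

The `PF` group is dropped by HAVING because it has a single row in 1990, and the 1991 `DM` row is dropped earlier by WHERE.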


In [18]:
# In every SQLite database, there is a special table: sqlite_master
# sqlite_master -  describes the contents of the database

# the SQL command can be run as is when using an Integrated Development Environment 
# e.g. DBeaver

sql = '''SELECT type, name FROM sqlite_master LIMIT 5;'''
cursor.execute(sql)

<sqlite3.Cursor at 0x10ac279d0>

In [19]:
list(cursor)

[]

In [21]:
cursor.close()
connection.close()

In [22]:
connection = connect('portal_mammals.sqlite')

# create a cursor to interact with the database and 
# send commands to the database 

cursor = connection.cursor()

In [23]:
# look at the help of the fetchall method for the cursor 
help(cursor.fetchall)

Help on built-in function fetchall:

fetchall(...) method of sqlite3.Cursor instance
    Fetches all rows from the resultset.



In [24]:
sql = '''
SELECT name,type FROM sqlite_master;
'''
cursor.execute(sql)
cursor.fetchall()

[('surveys', 'table'),
 ('species', 'table'),
 ('plots', 'table'),
 ('sqlite_sequence', 'table'),
 ('species_survey', 'view')]

In [25]:
sql = '''
SELECT name,type FROM sqlite_master;
'''
cursor.execute(sql)

# use a for loop to get the rows in the result one by one
for row in cursor: 
    print(row)

('surveys', 'table')
('species', 'table')
('plots', 'table')
('sqlite_sequence', 'table')
('species_survey', 'view')


In [26]:
sql = '''
SELECT name,type FROM sqlite_master;
'''
cursor.execute(sql)

# use fetchone to retrieve just one row from the results
cursor.fetchone()

('surveys', 'table')

In [27]:
# See the result header

cursor.description

(('name', None, None, None, None, None, None),
 ('type', None, None, None, None, None, None))

In [4]:
def get_header(cursor):
    '''
    Makes a tab delimited header row from the cursor description.
    Arguments:
        cursor: a cursor after a select query
    Returns:
        string: A string consisting of the column names separated by tabs, no new line
    '''
    return '\t'.join([row[0] for row in cursor.description])


In [29]:
get_header(cursor)

'name\ttype'

In [31]:
print(get_header(cursor))

name	type


#### Different ways to retrieve results - observe the different data structures displayed

In [None]:
# we already have the query in the sql variable

sql

In [35]:
sql = '''
SELECT name
FROM sqlite_master 
WHERE name = 'surveys'; -- condition that allows the selection of specific rows
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

name
surveys


In [37]:
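# See the result
# the output shown below is for the species query, so sql is set explicitly here

sql = '''
SELECT *
FROM species LIMIT 20;
'''
cursor.execute(sql)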
print("Iterate through the cursor:")
for row in cursor: 
    print(row)
    
print()

cursor.execute(sql)
print("Use the Cursor fetchall() method:")
cursor.fetchall()

Iterate through the cursor:
('AB', 'Amphispiza', 'bilineata', 'Bird')
('AH', 'Ammospermophilus', 'harrisi', 'Rodent')
('AS', 'Ammodramus', 'savannarum', 'Bird')
('BA', 'Baiomys', 'taylori', 'Rodent')
('CB', 'Campylorhynchus', 'brunneicapillus', 'Bird')
('CM', 'Calamospiza', 'melanocorys', 'Bird')
('CQ', 'Callipepla', 'squamata', 'Bird')
('CS', 'Crotalus', 'scutalatus', 'Reptile')
('CT', 'Cnemidophorus', 'tigris', 'Reptile')
('CU', 'Cnemidophorus', 'uniparens', 'Reptile')
('CV', 'Crotalus', 'viridis', 'Reptile')
('DM', 'Dipodomys', 'merriami', 'Rodent')
('DO', 'Dipodomys', 'ordii', 'Rodent')
('DS', 'Dipodomys', 'spectabilis', 'Rodent')
('DX', 'Dipodomys', 'sp.', 'Rodent')
('EO', 'Eumeces', 'obsoletus', 'Reptile')
('GS', 'Gambelia', 'silus', 'Reptile')
('NL', 'Neotoma', 'albigula', 'Rodent')
('NX', 'Neotoma', 'sp.', 'Rodent')
('OL', 'Onychomys', 'leucogaster', 'Rodent')

Use the Cursor fetchall() method:


[('AB', 'Amphispiza', 'bilineata', 'Bird'),
 ('AH', 'Ammospermophilus', 'harrisi', 'Rodent'),
 ('AS', 'Ammodramus', 'savannarum', 'Bird'),
 ('BA', 'Baiomys', 'taylori', 'Rodent'),
 ('CB', 'Campylorhynchus', 'brunneicapillus', 'Bird'),
 ('CM', 'Calamospiza', 'melanocorys', 'Bird'),
 ('CQ', 'Callipepla', 'squamata', 'Bird'),
 ('CS', 'Crotalus', 'scutalatus', 'Reptile'),
 ('CT', 'Cnemidophorus', 'tigris', 'Reptile'),
 ('CU', 'Cnemidophorus', 'uniparens', 'Reptile'),
 ('CV', 'Crotalus', 'viridis', 'Reptile'),
 ('DM', 'Dipodomys', 'merriami', 'Rodent'),
 ('DO', 'Dipodomys', 'ordii', 'Rodent'),
 ('DS', 'Dipodomys', 'spectabilis', 'Rodent'),
 ('DX', 'Dipodomys', 'sp.', 'Rodent'),
 ('EO', 'Eumeces', 'obsoletus', 'Reptile'),
 ('GS', 'Gambelia', 'silus', 'Reptile'),
 ('NL', 'Neotoma', 'albigula', 'Rodent'),
 ('NX', 'Neotoma', 'sp.', 'Rodent'),
 ('OL', 'Onychomys', 'leucogaster', 'Rodent')]

In [5]:
# note that if you have a large result 
# this function will try to make a very large string from it
# so it is recommended for results with less than 10 rows and 10 columns
# for other cases use the for loop to go through the rows in the result 

def get_results(cursor):
    '''
    Makes a tab delimited table from the cursor results.
    Arguments:
        cursor: a cursor after a select query
    Returns:
        string: A string consisting of the result rows separated by new lines, with the values in each row separated by tabs
    ''' 
    res = list()
    for row in cursor.fetchall():        
        res.append('\t'.join(list(map(str,row))))
    return "\n".join(res)
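For larger results, one alternative is to produce the formatted rows one at a time instead of building a single large string. The generator below is a hypothetical helper (`iter_results` is not part of the original notebook), shown here with a small in-memory table.

```python
import sqlite3

def iter_results(cursor):
    '''
    Yields one tab-delimited string per result row.
    Arguments:
        cursor: a cursor after a select query
    '''
    for row in cursor:                      # rows are fetched as they are requested
        yield '\t'.join(map(str, row))

demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE numbers (n INTEGER, square INTEGER);")
demo_cur.executemany(
    "INSERT INTO numbers VALUES (?, ?);",
    [(i, i * i) for i in range(1, 4)],
)

demo_cur.execute("SELECT n, square FROM numbers ORDER BY n;")
lines = list(iter_results(demo_cur))
for line in lines:
    print(line)

demo_conn.close()
```

In real use the rows would be printed or written to a file inside the loop, so that the full result never has to sit in memory at once.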

In [33]:
# use the functions we created for a nice display of the results

cursor.execute(sql)

print(get_header(cursor))
print(get_results(cursor))

name	type
surveys	table
species	table
plots	table
sqlite_sequence	table
species_survey	view


In [34]:
# WHERE clause example (-- denotes comment)
# more examples with comments later
# in the WHERE clause we put conditions to filter the data

sql = '''
SELECT name
FROM sqlite_master 
WHERE type = 'table'; -- condition that allows the selection of specific rows
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

name
surveys
species
plots
sqlite_sequence


In [36]:
# Selects all columns (*) of the species table 
# retrieves only 20 rows due to the LIMIT clause
# The first column is the species id

sql = '''
SELECT *
FROM species LIMIT 20;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

species_id	genus	species	taxa
AB	Amphispiza	bilineata	Bird
AH	Ammospermophilus	harrisi	Rodent
AS	Ammodramus	savannarum	Bird
BA	Baiomys	taylori	Rodent
CB	Campylorhynchus	brunneicapillus	Bird
CM	Calamospiza	melanocorys	Bird
CQ	Callipepla	squamata	Bird
CS	Crotalus	scutalatus	Reptile
CT	Cnemidophorus	tigris	Reptile
CU	Cnemidophorus	uniparens	Reptile
CV	Crotalus	viridis	Reptile
DM	Dipodomys	merriami	Rodent
DO	Dipodomys	ordii	Rodent
DS	Dipodomys	spectabilis	Rodent
DX	Dipodomys	sp.	Rodent
EO	Eumeces	obsoletus	Reptile
GS	Gambelia	silus	Reptile
NL	Neotoma	albigula	Rodent
NX	Neotoma	sp.	Rodent
OL	Onychomys	leucogaster	Rodent


Aliasing column names makes them easier to understand: add a new name for a column right after the column name, optionally preceded by the AS keyword. Put the alias in double quotes if it contains spaces.
- e.g.: column_name AS "Alias name"
- e.g.: column_name "Alias name"
- e.g.: column_name Alias_name

In [38]:
# example of aliasing columns

sql = '''
SELECT species_id ID, species as "Species Name"
FROM species LIMIT 5;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

ID	Species Name
AB	bilineata
AH	harrisi
AS	savannarum
BA	taylori
CB	brunneicapillus


In [39]:
# we can use aggregate functions to summarize data
# COUNT returns a single number, which is the count of all rows in the table

sql = '''
SELECT count(*) FROM species;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

count(*)
54


In [40]:
sql = '''
SELECT count(species_id) AS "Number of species" 
FROM species;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

Number of species
54


In [42]:
# select first 50 rows from the surveys table - all columns

sql = '''
SELECT *
FROM surveys
LIMIT 50;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

record_id	month	day	year	plot_id	species_id	sex	hindfoot_length	weight
1	7	16	1977	2	NL	M	32.0	None
2	7	16	1977	3	NL	M	33.0	None
3	7	16	1977	2	DM	F	37.0	None
4	7	16	1977	7	DM	M	36.0	None
5	7	16	1977	3	DM	M	35.0	None
6	7	16	1977	1	PF	M	14.0	None
7	7	16	1977	2	PE	F	None	None
8	7	16	1977	1	DM	M	37.0	None
9	7	16	1977	1	DM	F	34.0	None
10	7	16	1977	6	PF	F	20.0	None
11	7	16	1977	5	DS	F	53.0	None
12	7	16	1977	7	DM	M	38.0	None
13	7	16	1977	3	DM	M	35.0	None
14	7	16	1977	8	DM	None	None	None
15	7	16	1977	6	DM	F	36.0	None
16	7	16	1977	4	DM	F	36.0	None
17	7	16	1977	3	DS	F	48.0	None
18	7	16	1977	2	PP	M	22.0	None
19	7	16	1977	4	PF	None	None	None
20	7	17	1977	11	DS	F	48.0	None
21	7	17	1977	14	DM	F	34.0	None
22	7	17	1977	15	NL	F	31.0	None
23	7	17	1977	13	DM	M	36.0	None
24	7	17	1977	13	SH	M	21.0	None
25	7	17	1977	9	DM	M	35.0	None
26	7	17	1977	15	DM	M	31.0	None
27	7	17	1977	15	DM	M	36.0	None
28	7	17	1977	11	DM	M	38.0	None
29	7	17	1977	11	PP	M	None	None
30	7	17	1977	10	DS	F	52.0	None
31	7	17	1977	15	DM	F	3

In [43]:
sql = '''
SELECT * 
FROM surveys
LIMIT 20;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

record_id	month	day	year	plot_id	species_id	sex	hindfoot_length	weight
1	7	16	1977	2	NL	M	32.0	None
2	7	16	1977	3	NL	M	33.0	None
3	7	16	1977	2	DM	F	37.0	None
4	7	16	1977	7	DM	M	36.0	None
5	7	16	1977	3	DM	M	35.0	None
6	7	16	1977	1	PF	M	14.0	None
7	7	16	1977	2	PE	F	None	None
8	7	16	1977	1	DM	M	37.0	None
9	7	16	1977	1	DM	F	34.0	None
10	7	16	1977	6	PF	F	20.0	None
11	7	16	1977	5	DS	F	53.0	None
12	7	16	1977	7	DM	M	38.0	None
13	7	16	1977	3	DM	M	35.0	None
14	7	16	1977	8	DM	None	None	None
15	7	16	1977	6	DM	F	36.0	None
16	7	16	1977	4	DM	F	36.0	None
17	7	16	1977	3	DS	F	48.0	None
18	7	16	1977	2	PP	M	22.0	None
19	7	16	1977	4	PF	None	None	None
20	7	17	1977	11	DS	F	48.0	None


In [44]:
# DISTINCT removes duplicated rows from the result

sql = '''
SELECT month, year FROM surveys LIMIT 5;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

print()

sql = '''
SELECT DISTINCT month, year FROM surveys LIMIT 5;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

month	year
7	1977
7	1977
7	1977
7	1977
7	1977

month	year
7	1977
8	1977
9	1977
10	1977
11	1977


In [39]:
# Other aggregate functions are available for numerical columns
# https://www.sqlite.org/lang_aggfunc.html
# comments are added using -- in front of the text to comment or using /* comment */ 

sql = '''
select MAX(DISTINCT plot_id) MAX_PID FROM surveys; -- comment LIMIT 5; /* comment */
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))



MAX_PID
24


In [47]:
# count the number of rows in surveys


sql = '''
SELECT count(*)
FROM surveys
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))


count(*)
35549


In [50]:
# count the number of distinct years in surveys


sql = '''
SELECT count(DISTINCT year) "No of Years"
FROM surveys
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))


No of Years
26


In [52]:
# count the distinct plots in surveys


sql = '''
SELECT count(DISTINCT plot_id)
FROM surveys
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

count(DISTINCT plot_id)
24


In [54]:
# count the distinct  taxa in species

sql = '''
SELECT count(DISTINCT taxa)
FROM species
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

count(DISTINCT taxa)
4


#### WHERE clause operators
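https://www.sqlite.org/lang_expr.html

| Operator            | Meaning                                                    |
|---------------------|------------------------------------------------------------|
| `<>`, `!=`          | inequality                                                 |
| `<`                 | less than                                                  |
| `<=`                | less than or equal                                         |
| `=`                 | equal                                                      |
| `>`                 | greater than                                               |
| `>=`                | greater than or equal                                      |
| `BETWEEN v1 AND v2` | tests that a value lies in a given range (limits included) |
| `EXISTS`            | tests for the existence of rows matching a query           |
| `IN`                | tests if a value falls within a given set or query result  |
| `IS [ NOT ] NULL`   | is or is not null                                          |
| `[ NOT ] LIKE`      | tests whether a value matches (or does not match) a pattern |

% is the wildcard in SQL, used in conjunction with LIKE: it matches any sequence of zero or more characters (_ matches exactly one character)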
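A few of these operators are sketched below on a small, hypothetical `animals` table in an in-memory database. The helper `genera()` exists only for this illustration; it formats fixed strings into the statement, which should never be done with user input.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE animals (genus TEXT, weight REAL);")
demo_cur.executemany(
    "INSERT INTO animals VALUES (?, ?);",
    [("Dipodomys", 42.0), ("Peromyscus", 21.0), ("Perognathus", 8.0), ("Neotoma", None)],
)

def genera(where_clause):
    '''Return the genus values matching a fixed WHERE clause (illustration only).'''
    demo_cur.execute(f"SELECT genus FROM animals WHERE {where_clause} ORDER BY genus;")
    return [row[0] for row in demo_cur.fetchall()]

between_hits = genera("weight BETWEEN 8 AND 21")       # both limits are included
in_hits = genera("genus IN ('Neotoma', 'Dipodomys')")  # membership in a set
null_hits = genera("weight IS NULL")                   # '= NULL' would match nothing
like_hits = genera("genus LIKE 'pero%'")               # % matches any run of characters
underscore_hits = genera("genus LIKE '_eotoma'")       # _ matches exactly one character

print(between_hits, in_hits, null_hits, like_hits, underscore_hits)
demo_conn.close()
```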



........................................................................

Merriam's kangaroo rat (Dipodomys merriami - DM)

<img src = https://upload.wikimedia.org/wikipedia/commons/thumb/a/a4/Merriam%27s_Kangaroo_Rat.jpg/440px-Merriam%27s_Kangaroo_Rat.jpg width = 200 />

https://en.wikipedia.org/wiki/Merriam%27s_kangaroo_rat      
........................................................................

In [55]:
# we can put multiple conditions in the  where clause 
# combine conditions with AND, OR
# let's see the DM rats that have a weight < 40 and were seen on a specific plot in 1990

sql = '''
SELECT * FROM surveys 
WHERE plot_id = 11 AND 
      species_id = 'DM' AND 
      weight < 40 AND 
      year = 1990;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

record_id	month	day	year	plot_id	species_id	sex	hindfoot_length	weight
17221	2	25	1990	11	DM	F	37.0	39.0
17550	5	24	1990	11	DM	F	34.0	23.0
17649	6	22	1990	11	DM	M	33.0	20.0
17730	7	22	1990	11	DM	F	36.0	33.0
17808	8	17	1990	11	DM	F	37.0	35.0
17816	8	17	1990	11	DM	F	35.0	23.0
17874	9	25	1990	11	DM	F	None	39.0
17943	10	16	1990	11	DM	F	37.0	39.0
18051	11	11	1990	11	DM	F	37.0	38.0
18156	12	16	1990	11	DM	F	34.0	37.0
18182	12	16	1990	11	DM	F	37.0	37.0


In [None]:
sql = '''
SELECT * FROM surveys 
WHERE species_id IN ('DS', 'DM', 'PF') AND
      plot_id IN (10, 11) AND
      year IN (1980, 1983) AND 
      month IN (1,3) AND
      sex = 'M';
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))
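The values used in conditions like these often come from Python variables. Rather than pasting them into the SQL string, the `sqlite3` module accepts placeholders (`?` or `:name`) and takes the values as a separate argument. The sketch below assumes a hypothetical `sightings` table in an in-memory database.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE sightings (species_id TEXT, plot_id INTEGER, year INTEGER);")
demo_cur.executemany(
    "INSERT INTO sightings VALUES (?, ?, ?);",
    [("DM", 10, 1980), ("DS", 11, 1983), ("PF", 12, 1980), ("DM", 11, 1990)],
)

# Positional placeholders: the values are passed separately as a tuple
demo_cur.execute(
    "SELECT species_id, plot_id, year FROM sightings WHERE species_id = ? AND year = ?;",
    ("DM", 1980),
)
positional = demo_cur.fetchall()

# Named placeholders: the values are passed as a dictionary
demo_cur.execute("SELECT count(*) FROM sightings WHERE plot_id = :plot;", {"plot": 11})
named = demo_cur.fetchone()[0]

# IN needs one placeholder per value
wanted = ("DS", "PF")
marks = ", ".join("?" for _ in wanted)
demo_cur.execute(
    f"SELECT species_id FROM sightings WHERE species_id IN ({marks}) ORDER BY species_id;",
    wanted,
)
in_list = [row[0] for row in demo_cur.fetchall()]

print(positional, named, in_list)
demo_conn.close()
```

Only the placeholder marks are formatted into the statement; the values themselves never become part of the SQL text.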

In [59]:
sql = '''
SELECT * 
FROM species 
WHERE species_id IN ('AH', 'PE', 'DM')
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

species_id	genus	species	taxa
AH	Ammospermophilus	harrisi	Rodent
DM	Dipodomys	merriami	Rodent
PE	Peromyscus	eremicus	Rodent


In [56]:
# between allows us to check if a value is between 2 limits
sql = '''
SELECT * FROM surveys 
WHERE species_id = 'DM' AND 
      weight BETWEEN 30 AND 40 AND
      plot_id BETWEEN 6 AND 8 AND
      year BETWEEN 1989 AND 1990;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

record_id	month	day	year	plot_id	species_id	sex	hindfoot_length	weight
15469	1	11	1989	8	DM	M	36.0	39.0
16161	6	4	1989	8	DM	M	36.0	34.0
16201	6	4	1989	8	DM	F	50.0	38.0
16306	7	4	1989	8	DM	F	None	32.0
16414	7	30	1989	8	DM	F	36.0	36.0
16545	10	8	1989	6	DM	M	36.0	40.0
16673	11	5	1989	8	DM	F	35.0	39.0
16942	1	7	1990	8	DM	F	36.0	38.0
17055	1	30	1990	8	DM	F	37.0	39.0
17079	1	30	1990	8	DM	F	35.0	35.0
17114	1	30	1990	8	DM	F	37.0	36.0
17197	2	25	1990	8	DM	F	36.0	38.0
17229	2	25	1990	8	DM	F	35.0	40.0
17369	3	30	1990	8	DM	F	36.0	39.0
17441	4	26	1990	8	DM	F	37.0	39.0
17442	4	26	1990	8	DM	F	37.0	40.0
17516	5	24	1990	8	DM	M	36.0	32.0
17529	5	24	1990	8	DM	M	35.0	30.0
17610	6	22	1990	8	DM	M	36.0	37.0
17851	9	25	1990	8	DM	F	None	39.0
17937	10	16	1990	8	DM	F	35.0	40.0
18026	11	11	1990	8	DM	F	36.0	35.0
18029	11	11	1990	8	DM	M	37.0	40.0
18040	11	11	1990	8	DM	M	36.0	35.0
18063	11	11	1990	8	DM	M	36.0	38.0
18065	11	11	1990	8	DM	F	36.0	40.0
18129	12	16	1990	8	DM	F	36.0	40.0
18153	12	16	1990	8	DM	M	37.0	36.0
1

In [57]:
# LIKE allows you to look for patterns

sql = '''
SELECT * 
FROM species 
WHERE genus LIKE '%per%'; -- not case sensitive
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

species_id	genus	species	taxa
AH	Ammospermophilus	harrisi	Rodent
PE	Peromyscus	eremicus	Rodent
PF	Perognathus	flavus	Rodent
PH	Perognathus	hispidus	Rodent
PL	Peromyscus	leucopus	Rodent
PM	Peromyscus	maniculatus	Rodent
SS	Spermophilus	spilosoma	Rodent
ST	Spermophilus	tereticaudus	Rodent


In [None]:
# Retrieve rows from surveys where the species_id is DM 
# and has been seen in 1990 or 1995 in June-August on plot 1

# put the most restrictive condition first if possible
sql = '''
SELECT * 
FROM surveys 
WHERE species_id = 'DM' AND
      year IN (1990,1995) AND
      month BETWEEN 6 AND 8 AND
      plot_id = 1
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))



In [60]:
# ORDER BY (optional) - sorts the output data by one or more columns, each with its own sort order: ASC or DESC

sql = '''
SELECT *
FROM surveys 
WHERE month = 6 AND year = 1983 AND sex = 'F' AND plot_id BETWEEN 10 AND 20
ORDER BY plot_id  ASC, species_id DESC
--LIMIT 20;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

record_id	month	day	year	plot_id	species_id	sex	hindfoot_length	weight
8016	6	18	1983	11	OT	F	20.0	30.0
8046	6	18	1983	11	DS	F	52.0	128.0
8010	6	18	1983	11	DO	F	36.0	46.0
7938	6	17	1983	12	PP	F	22.0	20.0
7970	6	17	1983	12	OT	F	20.0	15.0
7936	6	17	1983	12	DO	F	34.0	42.0
7962	6	17	1983	12	DO	F	33.0	39.0
7957	6	17	1983	12	DM	F	36.0	44.0
7980	6	17	1983	12	DM	F	31.0	16.0
7981	6	17	1983	13	DM	F	34.0	49.0
7989	6	17	1983	13	DM	F	35.0	46.0
7953	6	17	1983	14	DO	F	31.0	19.0
7972	6	17	1983	14	DO	F	33.0	15.0
7976	6	17	1983	14	DO	F	33.0	46.0
8031	6	18	1983	15	OL	F	20.0	45.0
8033	6	18	1983	15	NL	F	29.0	117.0
7934	6	17	1983	17	DM	F	33.0	38.0
7968	6	17	1983	18	OT	F	20.0	22.0
7959	6	17	1983	18	NL	F	32.0	162.0
7977	6	17	1983	18	DS	F	47.0	64.0
7983	6	17	1983	19	RM	F	16.0	15.0
7986	6	17	1983	19	PF	F	16.0	7.0
7937	6	17	1983	20	NL	F	31.0	None
7967	6	17	1983	20	NL	F	32.0	150.0
7939	6	17	1983	20	DS	F	50.0	111.0
7947	6	17	1983	20	DO	F	35.0	41.0
7971	6	17	1983	20	DM	F	36.0	None


The PRAGMA statement is an SQL extension specific to SQLite. It is used to modify the operation of the SQLite library or to query the SQLite library for internal (non-table) data.
https://www.sqlite.org/pragma.html

____
The code below shows how to use a PRAGMA statement to get the schema of a table (the columns and the column information)

In [6]:
from sqlite3 import connect
connection = connect("portal_mammals.sqlite")
cursor = connection.cursor()

# also run the get_header() and get_results() function definitions

In [7]:
# get the information about the data table
# other PRAGMA statements can provide other types of information

sql = 'PRAGMA table_info("species")'
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

cid	name	type	notnull	dflt_value	pk
0	species_id	TEXT	0	None	0
1	genus	TEXT	0	None	0
2	species	TEXT	0	None	0
3	taxa	TEXT	0	None	0


In [8]:
sql = '''SELECT * FROM pragma_table_info("surveys")  '''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

cid	name	type	notnull	dflt_value	pk
0	record_id	BIGINT	0	None	0
1	month	BIGINT	0	None	0
2	day	BIGINT	0	None	0
3	year	BIGINT	0	None	0
4	plot_id	BIGINT	0	None	0
5	species_id	TEXT	0	None	0
6	sex	TEXT	0	None	0
7	hindfoot_length	FLOAT	0	None	0
8	weight	FLOAT	0	None	0
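The table-valued form `pragma_table_info()` used above can be wrapped in a small function that returns only the column names. This is a sketch: `column_names` is a hypothetical helper, the `plots` table is created in an in-memory database only for the demonstration, and the approach assumes an SQLite version that supports table-valued PRAGMA functions (as the previous cell does).

```python
import sqlite3

def column_names(cursor, table_name):
    '''
    Returns the column names of a table, in definition order.
    (Hypothetical helper for illustration; not part of the original notebook.)
    '''
    # the table name is passed as a bound parameter to the table-valued function
    cursor.execute(
        "SELECT name FROM pragma_table_info(?) ORDER BY cid;",
        (table_name,),
    )
    return [row[0] for row in cursor.fetchall()]

demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE plots (plot_id INTEGER PRIMARY KEY, plot_type TEXT);")

plot_columns = column_names(demo_cur, "plots")
print(plot_columns)

demo_conn.close()
```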


___

In [10]:
# SUB-QUERY - we can have a query in a query

sql = '''
SELECT *
FROM species
WHERE species_id IN
    (SELECT DISTINCT species_id 
    FROM surveys
    WHERE year = 1997 AND month = 4); 
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

species_id	genus	species	taxa
DM	Dipodomys	merriami	Rodent
DO	Dipodomys	ordii	Rodent
OL	Onychomys	leucogaster	Rodent
OT	Onychomys	torridus	Rodent
PB	Chaetodipus	baileyi	Rodent
PE	Peromyscus	eremicus	Rodent
PF	Perognathus	flavus	Rodent
PM	Peromyscus	maniculatus	Rodent
PP	Chaetodipus	penicillatus	Rodent
RF	Reithrodontomys	fulvescens	Rodent
RM	Reithrodontomys	megalotis	Rodent


In [9]:
sql = '''
SELECT DISTINCT species_id 
    FROM surveys
    WHERE year = 1997 AND month = 4; 
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

species_id
DM
PP
RM
PF
PM
OT
PB
PE
DO
OL
RF


In [11]:
# GROUP BY groups by a column and creates summary data for a different column
# count entries for each species ID - how many sightings we have for each species

sql = '''
SELECT species_id ID, count(*) AS record_no
FROM surveys 
GROUP BY species_id 
ORDER BY record_no DESC;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

ID	record_no
DM	10596
PP	3123
DO	3027
PB	2891
RM	2609
DS	2504
OT	2249
PF	1597
PE	1299
NL	1252
OL	1006
PM	899
None	763
AH	437
AB	303
SS	248
SH	147
SA	75
RF	75
CB	50
BA	46
SO	43
SF	43
DX	40
PC	39
PL	36
PH	32
CQ	16
CM	13
OX	12
UR	10
PI	9
UP	8
RO	8
PG	8
PX	6
SU	5
PU	5
US	4
UL	4
ZL	2
RX	2
AS	2
ST	1
SC	1
CV	1
CU	1
CT	1
CS	1


In [12]:
# specify column in aggregate function and alias the name of the columns
# count sightings for each plot

sql = '''
SELECT plot_id as "Plot No", count(record_id) as "Record Number" 
FROM surveys 
GROUP BY plot_id;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

Plot No	Record Number
1	1995
2	2194
3	1828
4	1969
5	1194
6	1582
7	816
8	1891
9	1936
10	469
11	1918
12	2365
13	1538
14	1885
15	1069
16	646
17	2039
18	1445
19	1189
20	1390
21	1173
22	1399
23	571
24	1048


In [13]:
# HAVING filters the groups produced by GROUP BY, usually on an aggregate result
# it is normally written after a GROUP BY clause

sql = '''
SELECT plot_id as "Plot No", count(record_id) counts 
FROM surveys 
GROUP BY plot_id
HAVING counts < 1000;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

Plot No	counts
7	816
10	469
16	646
23	571


In [14]:
# HAVING filters the groups produced by GROUP BY, usually on an aggregate result
# it is normally written after a GROUP BY clause

# filtering by the aggregate result does not work in the WHERE clause:
# WHERE is applied to the individual rows before they are grouped,
# so this cell raises an error on purpose


sql = '''
SELECT plot_id as "Plot No", count(record_id) counts 
FROM surveys 
WHERE counts < 1000
GROUP BY plot_id;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

OperationalError: misuse of aggregate: count()
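The difference between the two clauses can be sketched on a small, hypothetical `sightings` table in an in-memory database: WHERE removes rows before grouping, HAVING removes groups after the aggregate has been computed. The exact wording of the error message may vary between SQLite versions, so it is captured rather than assumed.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE sightings (plot_id INTEGER, year INTEGER);")
demo_cur.executemany(
    "INSERT INTO sightings VALUES (?, ?);",
    [(1, 1990), (1, 1990), (1, 1991), (2, 1990), (3, 1990), (3, 1990)],
)

# WHERE keeps only the 1990 rows; HAVING keeps only plots with 2 or more of them
demo_cur.execute("""
SELECT plot_id, count(*) AS counts
FROM sightings
WHERE year = 1990
GROUP BY plot_id
HAVING counts >= 2
ORDER BY plot_id;
""")
kept_groups = demo_cur.fetchall()

# An aggregate inside WHERE is rejected
try:
    demo_cur.execute("SELECT plot_id FROM sightings WHERE count(*) >= 2 GROUP BY plot_id;")
    error_message = None
except sqlite3.OperationalError as err:
    error_message = str(err)

print(kept_groups)
print(error_message)
demo_conn.close()
```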

In [20]:
# Select month, year combinations with more than 300 records

# HAVING filters the groups produced by GROUP BY, usually on an aggregate result
# it is normally written after a GROUP BY clause

sql = '''
SELECT month, year, count(*) rec_no
FROM surveys 
GROUP BY month, year
HAVING rec_no > 300;

'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))


month	year	rec_no
4	1981	342
7	1997	542
10	1982	306
12	2002	361


#### A `PRIMARY KEY` is a very important concept to understand.  
* It is the designation for a column, or a set of columns, of a table.
* It is recommended to be a serial value and not something related to the business needs of the data in the table.

* A primary key is used to uniquely identify a row of data; combined with a column name, it uniquely locates a data entry
* A primary key by definition must be `UNIQUE` and `NOT NULL` 
* The primary key of a table should therefore be a (sequential) non-repeating, non-null value  
* Primary keys are generally identified at the time of table creation  
* A common method for generating a primary key is to declare the column as `INTEGER PRIMARY KEY`, optionally with `AUTOINCREMENT`; SQLite then assigns the next value when a row is inserted without an explicit key
* Primary keys can be a composite of 2 or more columns that together uniquely identify the data in the table
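
A minimal sketch of these points, using an in-memory database. The `gene` and `yearly_count` tables are hypothetical and exist only for this illustration. It shows an automatically assigned integer key, the rejection of a repeated key value, and a composite key declared as a table constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table: id is a serial key that SQLite assigns when it is omitted
cur.execute("CREATE TABLE gene (id INTEGER PRIMARY KEY, symbol TEXT NOT NULL)")
cur.executemany("INSERT INTO gene (symbol) VALUES (?)", [("EGFR",), ("MYC",)])
ids = [row[0] for row in cur.execute("SELECT id FROM gene ORDER BY id")]
print(ids)

# Reusing an existing key value violates the uniqueness of the primary key
try:
    cur.execute("INSERT INTO gene (id, symbol) VALUES (1, 'WNT1')")
    duplicate_rejected = False
except sqlite3.IntegrityError as err:
    duplicate_rejected = True
    print("rejected:", err)

# Composite key: the pair (species_id, year) must be unique, not each column alone
cur.execute("""
CREATE TABLE yearly_count (
    species_id TEXT NOT NULL,
    year INTEGER NOT NULL,
    n INTEGER,
    PRIMARY KEY (species_id, year)
)""")
cur.executemany("INSERT INTO yearly_count VALUES (?, ?, ?)",
                [("DM", 2000, 5), ("DM", 2001, 7)])
n_yearly = cur.execute("SELECT count(*) FROM yearly_count").fetchone()[0]
conn.close()
```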



#### A `FOREIGN KEY` is a column(s) that points to the `PRIMARY KEY` of another table 

* The purpose of the foreign key is to ensure referential integrity of the data. 
In other words, only values that are supposed to appear in the database are permitted.<br>
Only values that exist in the referenced `PRIMARY KEY` column are allowed in the `FOREIGN KEY` column.<br>
Example: a `gene` table has the `PRIMARY KEY` `gene_id`. A table that associates GO terms with genes stores `gene_id` as a `FOREIGN KEY`, so a GO term can only be linked to a gene that exists in the `gene` table.

Foreign keys are also the underpinning of how tables are joined and how relationships are portrayed in the database.
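
One SQLite detail is worth a sketch: foreign key constraints are only checked when `PRAGMA foreign_keys = ON` has been issued for the connection. The example below follows the gene / GO term idea with a hypothetical `gene_go` table and placeholder term names. It is an illustration under those assumptions, not the schema of a real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is on for the connection
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema following the gene / GO term example
conn.execute("CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, symbol TEXT)")
conn.execute("""
CREATE TABLE gene_go (
    id INTEGER PRIMARY KEY,
    gene_id INTEGER NOT NULL,
    go_term TEXT NOT NULL,            -- placeholder term names, not real GO ids
    FOREIGN KEY (gene_id) REFERENCES gene (gene_id)
)""")
conn.execute("INSERT INTO gene VALUES (1956, 'EGFR')")
conn.commit()

# gene_id 1956 exists in gene, so this row is accepted
conn.execute("INSERT INTO gene_go (gene_id, go_term) VALUES (1956, 'TERM_A')")

# gene_id 9999 does not exist in gene, so the insert is refused
try:
    conn.execute("INSERT INTO gene_go (gene_id, go_term) VALUES (9999, 'TERM_B')")
    orphan_rejected = False
except sqlite3.IntegrityError as err:
    orphan_rejected = True
    print("rejected:", err)

n_links = conn.execute("SELECT count(*) FROM gene_go").fetchone()[0]
conn.close()
```

Without the pragma, the second insert would be accepted silently, which is why a value such as "TEST" can be inserted into a foreign key column on a default connection.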


#### JOIN tables

* Multiple tables contain different data that we want to retrieve with a single query
* In order to assemble data as part of a query, a JOIN between tables is needed
* This is a very common practice, since it’s rare for all the data you want to be in a single table


* INNER JOIN - returns only those rows where there is matching content in BOTH tables (this is the default when JOIN is used)
* OUTER JOIN - also returns rows that have no match in the other table, with the missing columns filled with NULL (a LEFT OUTER JOIN keeps every row of the left table)
* SELF JOIN - joins a table to itself (through aliasing), to compare data internal to the table

```sql
SELECT ... FROM table1 [INNER] JOIN table2 ON conditional_expression
```
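
To make the difference between the join types concrete, here is a small sketch with two hypothetical in-memory tables that borrow the `surveys` / `species` column names but contain made-up rows. One survey record has no species, so it disappears from the `INNER JOIN` and survives the `LEFT OUTER JOIN` with a NULL genus.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE species (species_id TEXT, genus TEXT)")
conn.execute("CREATE TABLE surveys (record_id INTEGER, species_id TEXT)")
conn.executemany("INSERT INTO species VALUES (?, ?)",
                 [("OT", "Onychomys"), ("SH", "Sigmodon")])
conn.executemany("INSERT INTO surveys VALUES (?, ?)",
                 [(1, "OT"), (2, "SH"), (3, None)])  # record 3 has no species

# INNER JOIN: only records with a matching species row
inner_rows = conn.execute("""
SELECT sv.record_id, sp.genus
FROM surveys AS sv
INNER JOIN species AS sp ON sv.species_id = sp.species_id
ORDER BY sv.record_id
""").fetchall()

# LEFT OUTER JOIN: every survey record, genus is NULL (None) when nothing matches
left_rows = conn.execute("""
SELECT sv.record_id, sp.genus
FROM surveys AS sv
LEFT OUTER JOIN species AS sp ON sv.species_id = sp.species_id
ORDER BY sv.record_id
""").fetchall()

print(inner_rows)
print(left_rows)
conn.close()
```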


In [21]:
sql = '''
SELECT *
FROM surveys AS sv
INNER JOIN species AS sp
      ON sv.species_id = sp.species_id
WHERE plot_id = 6 AND 
      year = 2001 AND 
      sex = 'F' AND 
      genus LIKE '%on%';
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

record_id	month	day	year	plot_id	species_id	sex	hindfoot_length	weight	species_id	genus	species	taxa
31762	1	22	2001	6	OT	F	20.0	26.0	OT	Onychomys	torridus	Rodent
32002	4	22	2001	6	SH	F	29.0	132.0	SH	Sigmodon	hispidus	Rodent
32099	5	27	2001	6	SH	F	30.0	86.0	SH	Sigmodon	hispidus	Rodent
32101	5	27	2001	6	SH	F	29.0	123.0	SH	Sigmodon	hispidus	Rodent
32104	5	27	2001	6	OT	F	20.0	34.0	OT	Onychomys	torridus	Rodent
32106	5	27	2001	6	OT	F	20.0	14.0	OT	Onychomys	torridus	Rodent
32940	10	14	2001	6	OT	F	20.0	27.0	OT	Onychomys	torridus	Rodent
33154	11	18	2001	6	OT	F	20.0	27.0	OT	Onychomys	torridus	Rodent


In [22]:
sql = '''
SELECT record_id, year, month, sp.species_id, taxa
FROM surveys AS sv
INNER JOIN species AS sp
      ON sv.species_id = sp.species_id
WHERE plot_id = 6 AND 
      year = 2001 AND 
      sex = 'F' AND 
      genus LIKE '%on%';
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

record_id	year	month	species_id	taxa
31762	2001	1	OT	Rodent
32002	2001	4	SH	Rodent
32099	2001	5	SH	Rodent
32101	2001	5	SH	Rodent
32104	2001	5	OT	Rodent
32106	2001	5	OT	Rodent
32940	2001	10	OT	Rodent
33154	2001	11	OT	Rodent


#### See the create table statement

In [23]:
# sql column in the sqlite_master table

sql = '''
SELECT sql
FROM sqlite_master 
WHERE type = 'table' AND name = 'species'
LIMIT 2;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

sql
CREATE TABLE species (
	species_id TEXT, 
	genus TEXT, 
	species TEXT, 
	taxa TEXT
)


### CREATE TABLE  - statement
https://www.sqlitetutorial.net/sqlite-create-table/

```sql
CREATE TABLE [IF NOT EXISTS] [schema_name].table_name (
    column_1 data_type PRIMARY KEY,
    column_2 data_type NOT NULL,
    column_3 data_type DEFAULT 0,
    table_constraints
) [WITHOUT ROWID];
```

In this syntax:

* First, specify the name of the table that you want to create after the CREATE TABLE keywords. The name of the table cannot start with sqlite_ because it is reserved for the internal use of SQLite.
* Second, use the `IF NOT EXISTS` option to create the table only if it does not already exist. Attempting to create a table that already exists without the `IF NOT EXISTS` option results in an error.
* Third, optionally specify the schema_name to which the new table belongs. The schema can be the main database, temp database or any attached database.
* Fourth, specify the column list of the table. Each column has a name, a data type, and optional column constraints. SQLite supports `PRIMARY KEY`, `UNIQUE`, `NOT NULL`, and `CHECK` column constraints.
* Fifth, specify the table constraints such as `PRIMARY KEY`, `FOREIGN KEY`, `UNIQUE`, and `CHECK` constraints.
* Finally, optionally use the `WITHOUT ROWID` option. By default, a row in a table has an implicit column, which is referred to as the `rowid`, `oid` or `_rowid_` column. The rowid column stores a 64-bit signed integer key that uniquely identifies the row inside the table. If you don’t want SQLite to create the rowid column, specify the `WITHOUT ROWID` option; such a table must declare a `PRIMARY KEY`. A table that contains the rowid column is known as a rowid table. Note that the `WITHOUT ROWID` option is only available in SQLite 3.8.2 or later.

https://www.sqlite.org/syntaxdiagrams.html#create-table-stmt

<img src = "https://www.sqlite.org/images/syntax/create-table-stmt.gif" width="800"/>

Each value stored in an SQLite database (or manipulated by the database engine) has one of the following storage classes:
https://www.sqlite.org/datatype3.html
* `NULL`. The value is a NULL value.
* `INTEGER`. The value is a signed integer, stored in 1, 2, 3, 4, 6, or 8 bytes depending on the magnitude of the value.
* `REAL`. The value is a floating point value, stored as an 8-byte IEEE floating point number.
* `TEXT`. The value is a text string, stored using the database encoding (UTF-8, UTF-16BE or UTF-16LE).
* `BLOB`. The value is a blob of data, stored exactly as it was input.
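
The storage class of any value can be inspected with SQLite's built-in `typeof()` function. The short sketch below uses an in-memory database with arbitrary literal values, and then passes Python values as parameters to show how they map onto the same five classes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Storage classes of SQL literals
row = conn.execute(
    "SELECT typeof(NULL), typeof(42), typeof(4.5), typeof('EGFR'), typeof(x'00ff')"
).fetchone()
print(row)

# Python None, int, float, str and bytes map onto the same five classes
py_row = conn.execute(
    "SELECT typeof(?), typeof(?), typeof(?), typeof(?), typeof(?)",
    (None, 42, 4.5, "EGFR", b"\x00\xff"),
).fetchone()
print(py_row)
conn.close()
```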


The `sqlite_master` table has the following create statement: 
```sql
CREATE TABLE sqlite_master ( type TEXT, name TEXT, tbl_name TEXT, rootpage INTEGER, sql TEXT );
```

##### The `connection` object methods can be used to save or revert/reset the changes after a command that makes changes to the database
##### `COMMIT` - save the changes 
##### `ROLLBACK` - revert the changes 
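
A minimal sketch of the two methods, assuming a throwaway in-memory database and a hypothetical `demo` table: an insert followed by `rollback()` leaves no trace, while an insert followed by `commit()` is kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE demo (value TEXT)")  # hypothetical table for illustration
conn.commit()

# The insert is pending until commit; rollback discards it
cur.execute("INSERT INTO demo VALUES ('discard me')")
conn.rollback()
after_rollback = cur.execute("SELECT count(*) FROM demo").fetchone()[0]

# Once committed, a later rollback has nothing left to undo
cur.execute("INSERT INTO demo VALUES ('keep me')")
conn.commit()
conn.rollback()
after_commit = cur.execute("SELECT count(*) FROM demo").fetchone()[0]

print(after_rollback, after_commit)
conn.close()
```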


In [24]:
# sqlite_master has no row describing itself,
# so this query returns no rows

sql = '''
SELECT sql
FROM sqlite_master 
WHERE name = 'sqlite_master';
'''
cursor.execute(sql)

# fetchall returns a list of tuples; 
# here the list is empty, so asking for 
# its first element raises an IndexError

print(cursor.fetchall()[0][0])

IndexError: list index out of range

In [25]:
# select the sql statement for the surveys table

sql = '''
SELECT sql
FROM sqlite_master 
WHERE name = 'surveys';
'''
cursor.execute(sql)

# fetchall returns a list of tuples, 
# we want the first element in the list 
# and then the first element in the tuple

print(cursor.fetchall()[0][0])

CREATE TABLE surveys (
	record_id BIGINT, 
	month BIGINT, 
	day BIGINT, 
	year BIGINT, 
	plot_id BIGINT, 
	species_id TEXT, 
	sex TEXT, 
	hindfoot_length FLOAT, 
	weight FLOAT
)


#### We create the table `survey_summary` with the columns: `id`, `species_id`, `year`, and `count`

In [30]:
# To remove a table use the drop command

sql='''
DROP TABLE IF EXISTS survey_summary;
'''
try:
    cursor.execute(sql)
except connection.DatabaseError:
    print("Dropping the survey_summary table resulted in a database error!")
    connection.rollback()
    raise
else:
    connection.commit()
finally:
    print("done!")


done!


In [31]:
# Write and run a create table statement for the survey_summary table

sql='''
CREATE TABLE IF NOT EXISTS survey_summary (
      id INTEGER PRIMARY KEY AUTOINCREMENT,
      species_id TEXT NOT NULL,                     -- REFERENCES  species_id in species table
      year BIGINT NOT NULL,                      -- year
      count BIGINT NOT NULL,                   -- count information for  each species and year
      FOREIGN KEY (species_id) REFERENCES  species  (species_id)
    );
'''
try:
    cursor.execute(sql)
except connection.DatabaseError:
    print("Creating the survey_summary table resulted in a database error!")
    connection.rollback()
    raise
else:
    connection.commit()
finally:
    print("done!")
    
    

done!


##### Similar error handling, as seen above, can be used when executing any statement that changes the database.
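
As an alternative sketch of the same idea, a `sqlite3` connection can be used as a context manager: the `with` block commits if it finishes normally and rolls back if an exception is raised. The `demo` table is hypothetical and the database is in memory. Note that the context manager does not close the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, value TEXT NOT NULL)")
conn.commit()

# Successful block: committed automatically on exit
with conn:
    conn.execute("INSERT INTO demo (value) VALUES ('ok')")

# Failing block: the second insert violates NOT NULL,
# so both inserts of this block are rolled back
try:
    with conn:
        conn.execute("INSERT INTO demo (value) VALUES ('lost')")
        conn.execute("INSERT INTO demo (value) VALUES (NULL)")
except sqlite3.IntegrityError as err:
    print("rolled back:", err)

values = [row[0] for row in conn.execute("SELECT value FROM demo")]
print(values)
conn.close()
```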

##### Check if the new table appears in the `sqlite_master` table 

In [32]:
sql = '''
SELECT name
FROM sqlite_master 
WHERE name LIKE 'su%'
LIMIT 4;
'''
cursor.execute(sql)
print(cursor.fetchall())

[('surveys',), ('survey_summary',)]


  
<br><br> 
The `sqlite_sequence` table is created and initialized automatically whenever a regular table is created if it has a column with the `AUTOINCREMENT` option set.<br>
https://www.sqlite.org/autoinc.html


##### Check if the `sqlite_sequence` table appears in the `sqlite_master` table 

In [33]:
sql = '''
SELECT name
FROM sqlite_master;
'''
cursor.execute(sql)
print(cursor.fetchall())

[('surveys',), ('species',), ('plots',), ('sqlite_sequence',), ('species_survey',), ('survey_summary',)]


In [34]:
# the sqlite_sequence records keep track of the autoincrement columns from different tables
# by storing the current value of each autoincrement counter
# if no data has been added to a table yet, it has no record in the sqlite_sequence table

sql = '''
SELECT *
FROM sqlite_sequence;
'''
cursor.execute(sql)
print(cursor.fetchall())

[]
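
The behaviour described in the comments above can be sketched in isolation. The snippet recreates a simplified `survey_summary` table (without the foreign key) in an in-memory database, so it does not touch the course database: `sqlite_sequence` is empty until the first insert, and `cursor.lastrowid` returns the generated `id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Simplified copy of survey_summary, for illustration only
cur.execute("""
CREATE TABLE survey_summary (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    species_id TEXT NOT NULL,
    year INTEGER NOT NULL,
    count INTEGER NOT NULL
)""")

# No row has been inserted yet, so there is no counter record
before = cur.execute("SELECT * FROM sqlite_sequence").fetchall()

cur.execute("INSERT INTO survey_summary (species_id, year, count) VALUES (?, ?, ?)",
            ("DM", 1960, 0))
new_id = cur.lastrowid   # id generated for the row just inserted
conn.commit()

# The counter for the table now holds the last id handed out
after = cur.execute("SELECT * FROM sqlite_sequence").fetchall()
print(before, new_id, after)
conn.close()
```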


### INDEXING

Indexes are lookup tables, like the index of a book.
They are usually created for columns that have unique or mostly distinct values, and provide a way to quickly search 
those values.<br>
Indexing creates a copy of the indexed columns together with a link to the location of the rest of the row.<br> 
The index data is stored in a data structure that allows for fast searching, 
e.g. a balanced tree, in which every leaf is at most n nodes away from the root (SQLite uses B-trees).<br>
Queries that filter, join, or sort on an indexed column can use the index instead of scanning the whole table.


* One important function in Relational Databases is to be able to create indexes on columns in tables  
* These indexes are pre-calculated and stored in the database 
* Indexes should be created on columns that are used in queries and joins   
* They can greatly speed up queries, at the cost of extra storage and somewhat slower inserts, updates, and deletes

To create an index use the following command:

```sql
CREATE INDEX indexName ON tableName (columnName)
```
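
Whether a query actually uses an index can be checked with `EXPLAIN QUERY PLAN`. The sketch below uses a simplified in-memory `survey_summary` table and a hypothetical helper named `plan`. The exact wording of the plan text differs between SQLite versions, so the example prints it rather than relying on a specific string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE survey_summary (id INTEGER PRIMARY KEY, species_id TEXT, year INTEGER)"
)

def plan(connection, query):
    """Hypothetical helper: return the detail text of EXPLAIN QUERY PLAN."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    # the last column of each row holds the human-readable description
    return " | ".join(row[-1] for row in rows)

query = "SELECT * FROM survey_summary WHERE species_id = 'DM'"

plan_before = plan(conn, query)   # no index yet: the whole table is scanned
conn.execute("CREATE INDEX survey_summary_idx ON survey_summary (species_id)")
plan_after = plan(conn, query)    # the index on species_id can now be searched

print(plan_before)
print(plan_after)
conn.close()
```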

In [35]:
sql = '''
CREATE INDEX survey_summary_idx 
ON survey_summary (species_id)
'''
cursor.execute(sql)
connection.commit()


##### Check if the new index appears in the `sqlite_master` table 

In [36]:
sql = '''
SELECT name, sql
FROM sqlite_master 
WHERE type = 'index';
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

name	sql
survey_summary_idx	CREATE INDEX survey_summary_idx 
ON survey_summary (species_id)



#### Remove the index

In [37]:
sql = '''
DROP INDEX survey_summary_idx 
'''
cursor.execute(sql)
connection.commit()


##### Check if the index was removed from the `sqlite_master` table 

In [38]:
sql = '''
SELECT name, sql
FROM sqlite_master 
WHERE type = 'index';
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

name	sql



In [40]:
cursor.close()
connection.close()

### INSERT - statement

Makes changes to the database table<br>
Adds new data to a table (if the constraints are met)<br>
Constraint examples: 
* The values in a column, or group of columns, designated as the Primary Key must be unique
* A value inserted in a column that has a Foreign Key constraint must exist in the column that it refers to

```sql
INSERT INTO <tablename> (<column1>, <column2>, <column3>) VALUES (value1, value2, value3);
```

##### One simple INSERT command adds 1 row of data at a time into an existing table  

##### Connection object allows us to:
* ##### COMMIT - save the changes 
* ##### ROLLBACK - reverts/discards the changes

<br>

##### Let's see what is in the table (it should be nothing):

In [None]:
sql = '''
SELECT *
FROM survey_summary;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

<br>

##### Let's try an insert:
```sql
INSERT INTO <tablename> (<column1>, <column2>, <column3>) VALUES (value1, value2, value3);
```

In [None]:
values_list = ["DM",1960, 0]

sql = '''
INSERT INTO survey_summary (species_id, year, count) 
VALUES (?,?,?);
'''
cursor.execute(sql,values_list)
connection.commit()

In [None]:
# lastrowid holds the row id of the last row inserted through this cursor
# here it is the automatically generated id of the new survey_summary row

id_value = cursor.lastrowid
id_value

In [None]:
# the record in sqlite_sequence keeps track of the autoincrement counter
sql = '''
SELECT *
FROM sqlite_sequence;
'''
cursor.execute(sql)
print(cursor.fetchall())

<br>


##### We have a row in the table!!! And the `id` was automatically generated.

In [None]:
sql = '''
SELECT *
FROM survey_summary ;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

#### You can have a Python "table" structure (list of lists) of insert values and get them all inserted in one command, each sublist having the correct number of values.


In [None]:
values_tbl = [["TEST",1950,2], ["TEST",1955,1], ["TEST",1974,2]]

sql = '''
INSERT INTO survey_summary (species_id, year, count) 
VALUES (?,?,?);
'''
cursor.executemany(sql,values_tbl)
connection.commit()


In [None]:
sql = '''
SELECT *
FROM survey_summary ;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

#### UPDATE - statement - changes the table rows



MODIFIES DATA (already in a table)  in all rows matching the WHERE clause 

```sql
UPDATE table_name 
SET column1 = value1, column2 = value2...., columnN = valueN
WHERE [condition];
```

An update is often meant for a single row, but the WHERE clause can match, and therefore update, multiple rows <br>
(whether you intended to or not !!!!). Without a WHERE clause, every row in the table is updated.

The following statement sets `count` to 20 for all rows whose `species_id` is "TEST" 

In [None]:
# the following statement updates multiple rows
# before an update, a select on the same table 
# with the same where clause should be run 
# to see the records that will be updated

sql = '''
UPDATE survey_summary
SET count = 20 
WHERE species_id = 'TEST';
'''
cursor.execute(sql)
connection.commit()

In [None]:
sql = '''
SELECT *
FROM survey_summary ;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))
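
The advice to run a `SELECT` with the same `WHERE` clause before an `UPDATE` can be sketched as follows. The example works on a simplified in-memory copy of `survey_summary`. Comparing the preview with `cursor.rowcount` before committing is one possible safeguard, offered here as an illustration rather than a required pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
CREATE TABLE survey_summary (
    id INTEGER PRIMARY KEY, species_id TEXT, year INTEGER, count INTEGER
)""")
cur.executemany(
    "INSERT INTO survey_summary (species_id, year, count) VALUES (?, ?, ?)",
    [("DM", 1960, 0), ("TEST", 1950, 2), ("TEST", 1955, 1)],
)
conn.commit()

where = "species_id = ?"      # the same condition is used for preview and update
params = ("TEST",)

# 1. preview the rows that the update would touch
to_change = cur.execute(
    f"SELECT id, count FROM survey_summary WHERE {where}", params
).fetchall()

# 2. run the update and compare the number of modified rows with the preview
cur.execute(f"UPDATE survey_summary SET count = 20 WHERE {where}", params)
changed = cur.rowcount
if changed == len(to_change):
    conn.commit()
else:
    conn.rollback()

final = cur.execute(
    "SELECT species_id, count FROM survey_summary ORDER BY id"
).fetchall()
print(to_change, changed, final)
conn.close()
```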

#### DELETE - statement - deletes table rows

* MAKES CHANGES TO THE DATA
* Deletion happens at the row level: whole rows are removed (to clear a single value, use `UPDATE` instead) 

```sql
DELETE FROM <tablename> WHERE <column> = <value>
```

* The WHERE predicate is the same as for the SELECT statement, that is, it determines which rows will be deleted  



In [None]:
# before a delete statement, a select on the same table 
# with the same where clause should be run to see the records 
# that will be removed

sql = '''
DELETE FROM survey_summary 
WHERE year < 1960;
'''
cursor.execute(sql)
connection.commit()


In [None]:
sql = '''
SELECT *
FROM survey_summary ;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

```sql
DELETE FROM <tablename>; 
```

* This deletes all rows of data from a table.
* Preserves table structure (table still exists)
* Optimized for speed in SQLite, no row-by-row execution.
* The table is still listed in `sqlite_master`, so `IF EXISTS` / `IF NOT EXISTS` checks still see it
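
A small sketch of the difference between emptying a table and removing it, on a simplified in-memory `survey_summary` table. The `table_exists` helper is hypothetical and simply looks the name up in `sqlite_master`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE survey_summary (id INTEGER PRIMARY KEY, species_id TEXT)")
cur.executemany("INSERT INTO survey_summary (species_id) VALUES (?)",
                [("DM",), ("PP",)])
conn.commit()

def table_exists(cursor, name):
    """Hypothetical helper: is there a table with this name in sqlite_master?"""
    cursor.execute(
        "SELECT count(*) FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    )
    return cursor.fetchone()[0] == 1

# DELETE without WHERE removes every row but keeps the table definition
cur.execute("DELETE FROM survey_summary")
conn.commit()
rows_left = cur.execute("SELECT count(*) FROM survey_summary").fetchone()[0]
exists_after_delete = table_exists(cur, "survey_summary")

# DROP TABLE removes the definition as well
cur.execute("DROP TABLE survey_summary")
conn.commit()
exists_after_drop = table_exists(cur, "survey_summary")

print(rows_left, exists_after_delete, exists_after_drop)
conn.close()
```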


In [27]:
# Delete all data from the table - but keep the table 

sql = '''
DELETE FROM survey_summary;
'''
cursor.execute(sql)
connection.commit()


OperationalError: no such table: survey_summary

In [None]:
sql = '''
SELECT *
FROM survey_summary ;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

<br>

#### `DROP TABLE` - statement - removes a table (permanently)

In [None]:
sql = '''
DROP TABLE IF EXISTS survey_summary;
'''
cursor.execute(sql)
connection.commit()

In [None]:
sql = '''
SELECT name AS "TABLE NAME"
FROM sqlite_master;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

In [None]:
# we can create the summary with a select

sql = '''

SELECT species_id, year, count(record_id)
FROM surveys 
WHERE species_id <> "None"
GROUP BY species_id, year;'''
cursor.execute(sql)
#cursor.fetchall()

#### VIEW in a database

* A view is a virtual table which can be created from a query on existing tables
* Views are created to give a more human readable version of the normalized data / tables
* http://www.sqlitetutorial.net/sqlite-create-view/
* An SQLite view is read only

```sql
CREATE [TEMP] VIEW [IF NOT EXISTS] view_name(column-name-list) AS    
select-statement;
```
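
Two properties listed above, that a view is virtual and that it is read only, can be sketched on a hypothetical in-memory `surveys` table with made-up rows. The view name `yearly_counts` is an assumption made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE surveys (record_id INTEGER, species_id TEXT, year INTEGER)")
cur.executemany("INSERT INTO surveys VALUES (?, ?, ?)",
                [(1, "DM", 2000), (2, "DM", 2000), (3, "PP", 2001)])
cur.execute("""
CREATE VIEW IF NOT EXISTS yearly_counts (species_id, year, counts) AS
SELECT species_id, year, count(record_id)
FROM surveys
GROUP BY species_id, year
""")
conn.commit()
first = cur.execute("SELECT * FROM yearly_counts ORDER BY species_id").fetchall()

# The view stores the query, not the data: new rows show up automatically
cur.execute("INSERT INTO surveys VALUES (4, 'PP', 2001)")
conn.commit()
second = cur.execute("SELECT * FROM yearly_counts ORDER BY species_id").fetchall()

# Writing to the view itself is refused
try:
    cur.execute("INSERT INTO yearly_counts VALUES ('XX', 1999, 1)")
    view_writable = True
except sqlite3.DatabaseError as err:
    view_writable = False
    print("refused:", err)

print(first, second)
conn.close()
```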

In [None]:
# per-species, per-year survey counts stored as a view for easy access
sql = '''
CREATE VIEW IF NOT EXISTS species_survey(species_id, species, year, counts) AS
SELECT sv.species_id, species, year, count(record_id)
FROM surveys AS sv
INNER JOIN species AS sp
ON sv.species_id = sp.species_id
WHERE sv.species_id <> "None"
GROUP BY sv.species_id,species, year


'''
cursor.execute(sql)
connection.commit()

In [None]:
# first 10 rows of the species_survey view
sql = '''
SELECT *
FROM species_survey
LIMIT 10;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

```sql
DROP VIEW [IF EXISTS] view_name;
```

In [None]:
# remove view from the database

sql = '''
DROP VIEW IF EXISTS species_survey;
'''
cursor.execute(sql)
connection.commit()

In [None]:
# And close()

cursor.close()
connection.close()

#### To remove the database, delete the .sqlite file.