# Practice Suggestions

## Suggestion 1  (challenging)

Create and query a database representing a research programme. In a research programme, individual researchers are involved in one or more projects (many to many), and each project may use one or more individual organisms (one to many). The tables will be:

Researcher -- Project -- Organisms

* The Researcher table has information like name, role (PI, Co-PI, technician)
* The Project table has information like project ID, funding agency
* The Organisms table has information like ear-tag number, species

Create three CSV files with a few lines of data, where: a researcher is involved in more than one project, and different projects include the same researcher; AND a project includes multiple organisms from the Organisms table, but each organism is used in only one project.

Build the database, and the code to fill it and query it.
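
One possible shape for this schema is sketched below. It is a minimal sketch, not the intended solution: it uses Python's built-in `sqlite3` with an in-memory database as a stand-in for a MySQL server, and the table name `ResearcherProject`, the column names, and all data rows are hypothetical. The many-to-many relation gets its own link table, while the one-to-many relation is a plain `project_id` column on the organism table.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

conn.executescript("""
CREATE TABLE Researcher (id INTEGER PRIMARY KEY, name TEXT NOT NULL, role TEXT NOT NULL);
CREATE TABLE Project (id INTEGER PRIMARY KEY, funding_agency TEXT NOT NULL);
-- one to many: each organism row points at exactly one project
CREATE TABLE Organism (
    ear_tag_number INTEGER PRIMARY KEY,
    species TEXT NOT NULL,
    project_id INTEGER NOT NULL REFERENCES Project(id)
);
-- many to many: one row per (researcher, project) pair
CREATE TABLE ResearcherProject (
    researcher_id INTEGER NOT NULL REFERENCES Researcher(id),
    project_id INTEGER NOT NULL REFERENCES Project(id),
    PRIMARY KEY (researcher_id, project_id)
);
""")

# Hypothetical data: Ana works on two projects, project 1 has two researchers.
conn.executemany("INSERT INTO Researcher VALUES (?, ?, ?)",
                 [(1, "Ana", "PI"), (2, "Ben", "technician")])
conn.executemany("INSERT INTO Project VALUES (?, ?)",
                 [(1, "AgencyA"), (2, "AgencyB")])
conn.executemany("INSERT INTO Organism VALUES (?, ?, ?)",
                 [(101, "mouse", 1), (102, "rat", 1), (103, "rat", 2)])
conn.executemany("INSERT INTO ResearcherProject VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1)])
conn.commit()

# Count the projects each researcher is involved in.
projects_per_researcher = conn.execute("""
    SELECT Researcher.name, COUNT(*)
    FROM Researcher
    JOIN ResearcherProject ON Researcher.id = ResearcherProject.researcher_id
    GROUP BY Researcher.id
    ORDER BY Researcher.id
""").fetchall()
print(projects_per_researcher)
```

Because `ear_tag_number` is the primary key of `Organism` and each row holds a single `project_id`, an organism cannot be assigned to two projects, which is the constraint the exercise asks for.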


## Suggestion 2 (easier)


Create and query a database representing a simple laboratory information management system (LIMS). In a LIMS, Projects involve multiple Organisms (one to many), and each Organism may provide several Samples (one to many).

The tables will be:

Project -- Organisms -- Samples

* The Project table has information like project ID, funding agency
* The Organisms table has information like ear-tag number, species
* The Samples table has information like tissue (blood, kidney, lung, etc.), collection date, freezer-location

Create three CSV files with a few lines of data.

Build the database, and the code to fill it and query it.
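
A compact way to approach this exercise is sketched below, under several assumptions: `sqlite3` (in memory) replaces the MySQL server, the CSV contents are held in strings rather than files so the example is self-contained, and the helper `load_csv`, the column names, and every data value are hypothetical. Each child table carries the key of its parent, so a chain of two joins walks from a sample back to its project.

```python
import csv
import io
import sqlite3

# Hypothetical CSV contents; in practice these would be three files on disk.
CSV_DATA = {
    "Project": "id,funding_agency\n1,AgencyA\n2,AgencyB\n",
    "Organism": "ear_tag_number,species,project_id\n10,mouse,1\n11,rat,1\n12,rat,2\n",
    "Sample": "id,tissue,collection_date,freezer_location,ear_tag_number\n"
              "1,blood,2020-01-15,F1-A3,10\n"
              "2,kidney,2020-01-15,F1-A4,10\n"
              "3,lung,2020-02-01,F2-B1,12\n",
}

SCHEMA = """
CREATE TABLE Project (id INTEGER PRIMARY KEY, funding_agency TEXT NOT NULL);
CREATE TABLE Organism (ear_tag_number INTEGER PRIMARY KEY, species TEXT NOT NULL,
                       project_id INTEGER NOT NULL REFERENCES Project(id));
CREATE TABLE Sample (id INTEGER PRIMARY KEY, tissue TEXT NOT NULL,
                     collection_date TEXT NOT NULL, freezer_location TEXT NOT NULL,
                     ear_tag_number INTEGER NOT NULL REFERENCES Organism(ear_tag_number));
"""


def load_csv(conn, table, text):
    """Insert every CSV row into `table`, using the header as column names."""
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames
    placeholders = ",".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({','.join(columns)}) VALUES ({placeholders})"
    conn.executemany(sql, ([row[c] for c in columns] for row in reader))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for table, text in CSV_DATA.items():  # parents are loaded before children
    load_csv(conn, table, text)
conn.commit()

# Walk Sample -> Organism -> Project with one ON clause per JOIN.
samples = conn.execute("""
    SELECT Project.funding_agency, Organism.species, Sample.tissue
    FROM Sample
    JOIN Organism ON Sample.ear_tag_number = Organism.ear_tag_number
    JOIN Project ON Organism.project_id = Project.id
    ORDER BY Sample.id
""").fetchall()
print(samples)
```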


## Suggestion 1 (challenging)

In [123]:
### %load_ext sql
#%config SqlMagic.autocommit=False
# connect to the server that was started earlier
%sql mysql+pymysql://root:root@127.0.0.1:3306/mysql


def create_table(filename, header, rows): #rows must be a list
    with open(filename, "w") as table: #create the file (closed automatically on exit)
        table.write(header + "\n")  #write the header
        for line in rows:  #add all the rows
            table.write(line + "\n")


def check_content(filename):
    with open(filename, "r") as checkcontent:
        print(checkcontent.read())  # print the content of the file

    
def is_number(s): #check whether a string can be parsed as a number
    try:
        float(s)
        return True #the string can be converted to a float (e.g. "3", "2.5", "1e3")
    except ValueError:
        return False    #the string cannot be converted to a float
    
    
    
    
def create_table_mysql(database, table, columns):   #database must be already created

    #transform the list column into a readable sql command (argument_1)

    elements = ','.join(columns) #transform the list into a string in which the different elements are separated by ","
    elements = "("+elements+")" #add brackets at the beginning and the end of the string
    argument_1 = "CREATE TABLE "+table+elements #create table command
    argument_2 = "DESCRIBE "+table #describe table command
   
    
    
    import pymysql.cursors #bring the MySQL client into Python

    # Connect to the database
    connection = pymysql.connect(host='localhost',
                                 user='root',
                                 password='root',
                                 db=database, #use the appropriate database
                                 charset='utf8mb4',  # note utf8... this is important for unusual characters!
                                 cursorclass=pymysql.cursors.DictCursor)

    connection.autocommit(False)  # autocommit is a method in pymysql; with it off, changes wait for commit()

    try:  
        with connection.cursor() as cursor:
            sql = argument_1
            cursor.execute(sql)
            sql = argument_2
            cursor.execute(sql)
            results = cursor.fetchall()
            print(results)
            connection.commit() #commit changes
            
    finally: 
        print("")
        connection.close()


    
    
    

def load_table_mysql(database, table, filename):   #filename is the file containing the table

    import csv

    header = []
    content = []
    
    with open(filename) as csvfile: #read filename
        rows = csv.DictReader(csvfile, delimiter=",", quotechar='"') #create a dictionary whose keys are imported from the table header
        
        for row in rows: #iterate through every row
            for key in row.keys():
                a = is_number(row[key])  #is_number returns False if the string cannot be parsed as a number
                if not a:
                    row[key] = "\""+row[key]+"\"" #add "" around the string if it is not a number
                header.append(key)  #add the key to the list header
                content.append(row[key])  #add the value to the list content
        
            #when the loop ends the full row is contained in the "content" list and the header is contained in the "header" list
            
            header_string = ','.join(header)  #convert the list into a string
            content_string = ','.join(content) #convert the list into a string
            argument = "INSERT INTO "+table+"("+header_string+")"+" VALUES "+"("+content_string+")"  #write a sql readable command
            #print (argument)
            
            #header and content must be emptied before the next iteration (that corresponds to the next row)            
            header = []  
            content = []
            
            import pymysql.cursors #bring the MySQL client into Python

            # Connect to the database
            connection = pymysql.connect(host='localhost',
                                         user='root',
                                         password='root',
                                         db=database, #use the appropriate database
                                         charset='utf8mb4',  # note utf8... this is important for unusual characters!
                                         cursorclass=pymysql.cursors.DictCursor)

            connection.autocommit(False)  # autocommit is a method in pymysql; with it off, changes wait for commit()

            try: 
                with connection.cursor() as cursor:
                    sql = argument
                    cursor.execute(sql)       
                    connection.commit()
            
            finally: 
                print("")
                connection.close()    
        


        
        
        
        
#create database


%sql create database suggested_practice1;
#%sql show databases       
    
    
    

#create the csv files    
    
table = "Researcher.csv"
first_line = "id,name,role"
content = ["1,Pedro,PI", "2,María,PI", "3,Juan,technician", "4,Julia,technician"]

create_table(table,first_line,content)

check_content(table)



table = "Project.csv"
first_line = "id,funding_agency"
content = ["1,FA1", "2,FA2", "3,FA1"]

create_table(table,first_line,content)

check_content(table)



table = "Organism.csv"
first_line = "ear_tag_number,species,project_id"
content = ["1,rat,1", "2,mouse,1", "3,rat,2", "4,rat,2", "5,rabbit,3"]

create_table(table,first_line,content)

check_content(table)



table = "Link.csv" #link table is used to relate people to projects
first_line = "project_id,role,person_id"
content = ["1,PI,1", "1,technician,3", "2,PI,2", "2,technician,4", "3,PI,1", "3,technician,4"]

create_table(table,first_line,content)

check_content(table)



#load the csv files into sql


database_name = "suggested_practice1"
table_name = "Researcher"
columns_info = ["id INTEGER NOT NULL PRIMARY KEY", "name VARCHAR(20) NOT NULL", "role VARCHAR(20) NOT NULL"]


create_table_mysql(database_name,table_name,columns_info)    

filename = "Researcher.csv"

load_table_mysql(database_name, table_name, filename)






database_name = "suggested_practice1"
table_name = "Project"
columns_info = ["id INTEGER NOT NULL PRIMARY KEY", "funding_agency VARCHAR(20) NOT NULL"]


create_table_mysql(database_name,table_name,columns_info)    

filename = "Project.csv"

load_table_mysql(database_name, table_name, filename)





database_name = "suggested_practice1"
table_name = "Organism"
columns_info = ["ear_tag_number INTEGER NOT NULL PRIMARY KEY", "species VARCHAR(20) NOT NULL", "project_id INTEGER NOT NULL"]


create_table_mysql(database_name,table_name,columns_info)    

filename = "Organism.csv"

load_table_mysql(database_name, table_name, filename)





database_name = "suggested_practice1"
table_name = "Link"
columns_info = ["project_id INTEGER NOT NULL", "role VARCHAR(20) NOT NULL", "person_id INTEGER NOT NULL"]


create_table_mysql(database_name,table_name,columns_info)    

filename = "Link.csv"

load_table_mysql(database_name, table_name, filename)








0 rows affected.
 * mysql+pymysql://root:***@127.0.0.1:3306/mysql
1 rows affected.
id,name,role
1,Pedro,PI
2,María,PI
3,Juan,technician
4,Julia,technician

id,funding_agency
1,FA1
2,FA2
3,FA1

ear_tag_number,species,project_id
1,rat,1
2,mouse,1
3,rat,2
4,rat,2
5,rabbit,3

project_id,role,person_id
1,PI,1
1,technician,3
2,PI,2
2,technician,4
3,PI,1
3,technician,4

[{'Field': 'id', 'Type': 'int(11)', 'Null': 'NO', 'Key': 'PRI', 'Default': None, 'Extra': ''}, {'Field': 'name', 'Type': 'varchar(20)', 'Null': 'NO', 'Key': '', 'Default': None, 'Extra': ''}, {'Field': 'role', 'Type': 'varchar(20)', 'Null': 'NO', 'Key': '', 'Default': None, 'Extra': ''}]





[{'Field': 'id', 'Type': 'int(11)', 'Null': 'NO', 'Key': 'PRI', 'Default': None, 'Extra': ''}, {'Field': 'funding_agency', 'Type': 'varchar(20)', 'Null': 'NO', 'Key': '', 'Default': None, 'Extra': ''}]




[{'Field': 'ear_tag_number', 'Type': 'int(11)', 'Null': 'NO', 'Key': 'PRI', 'Default': None, 'Extra': ''}, {'Field': 'species', 'Type'

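The loader above builds each `INSERT` by concatenating strings and quoting anything that `is_number` rejects. That breaks on values containing a double quote, and a text value such as `nan` is treated as a number and left unquoted. An alternative is to let the driver do the quoting through placeholders. The sketch below shows the idea with the standard-library `sqlite3` module as a stand-in (an assumption, so that it runs without a server); the two rows are hypothetical. SQLite uses `?` as its placeholder, while `pymysql` uses `%s`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Researcher (id INTEGER PRIMARY KEY, name TEXT NOT NULL, role TEXT NOT NULL)"
)

# Hypothetical rows, shaped like csv.DictReader output (all values are strings).
rows = [
    {"id": "1", "name": 'Pedro "Pete"', "role": "PI"},  # embedded double quotes
    {"id": "2", "name": "nan", "role": "technician"},   # parses as a float, but is text here
]

columns = list(rows[0])
sql = "INSERT INTO Researcher ({}) VALUES ({})".format(
    ",".join(columns),               # column names still come from the CSV header
    ",".join("?" * len(columns)),    # one placeholder per column
)

# One connection and one executemany call cover every row.
conn.executemany(sql, [tuple(row[c] for c in columns) for row in rows])
conn.commit()

stored = conn.execute("SELECT id, name, role FROM Researcher ORDER BY id").fetchall()
print(stored)
```

Placeholders only cover values, so the table and column names are still interpolated into the statement and should come from a trusted source.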
In [125]:
#%sql show databases
%sql use suggested_practice1;
#%sql drop database suggested_practice1;
%sql show tables;
#%sql describe Researcher
#%sql SELECT * FROM Researcher
#%sql drop table Researcher

#%sql INSERT INTO Researcher (id, name, role, project_id) VALUES (1, "Pedro", "PI", "1 2 3");
#%sql SELECT * FROM Researcher



 * mysql+pymysql://root:***@127.0.0.1:3306/mysql
0 rows affected.
 * mysql+pymysql://root:***@127.0.0.1:3306/mysql
4 rows affected.


Tables_in_suggested_practice1
Link
Organism
Project
Researcher


In [129]:
%sql SELECT * FROM Organism JOIN Project ON \
     Organism.project_id = Project.id;

 * mysql+pymysql://root:***@127.0.0.1:3306/mysql
5 rows affected.


ear_tag_number,species,project_id,id,funding_agency
1,rat,1,1,FA1
2,mouse,1,1,FA1
3,rat,2,2,FA2
4,rat,2,2,FA2
5,rabbit,3,3,FA1


In [131]:
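# The last condition must compare Link.project_id with Project.id.
# Written as Project_id (underscore instead of dot), the name resolves to
# Link.project_id, so the condition is always true and every Link row is paired
# with every Project: 6 x 3 = 18 rows, which is the output recorded below.
%sql SELECT * FROM Researcher JOIN Link JOIN Project ON \
     Researcher.id = Link.person_id AND \
     Link.project_id = Project.id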

 * mysql+pymysql://root:***@127.0.0.1:3306/mysql
18 rows affected.


id,name,role,project_id,role_1,person_id,id_1,funding_agency
1,Pedro,PI,1,PI,1,1,FA1
1,Pedro,PI,1,PI,1,2,FA2
1,Pedro,PI,1,PI,1,3,FA1
3,Juan,technician,1,technician,3,1,FA1
3,Juan,technician,1,technician,3,2,FA2
3,Juan,technician,1,technician,3,3,FA1
2,María,PI,2,PI,2,1,FA1
2,María,PI,2,PI,2,2,FA2
2,María,PI,2,PI,2,3,FA1
4,Julia,technician,2,technician,4,1,FA1
