# Problem 1: Controls
Write a Python script that proves that the lines of data in Germplasm.tsv, and LocusGene are in the same sequence, based on the AGI Locus Code (ATxGxxxxxx). (hint: This will help you decide how to load the data into the database)

# Answers:
First of all, it is necessary to import the csv library to be able to use its functions, with the command **"import csv"**. With this, we can handle the input files ("LocusGene.tsv" and "Germplasm.tsv").

The first two variables we declared ("sameorder" and "mismatch") are used to create the output. The former serves as a **counter of the line that is being checked in each iteration of the loops** and the latter serves as a **counter of the times two lines are not the same in both documents**.

Next, we have to open the files in *read mode* --> using open("file", "r"). Then, using the csv.DictReader function, we load the data from both files into their respective variables ("locus_data" and "germplasm_data"). Since these are ".tsv" files, the delimiter is set to a tab ("\t"). Even though there are no quotation marks in the documents, I set the quote character (quotechar) to double quotation marks (") in case it is needed in the future.

Then, all we have to do is a ***for*** loop. The nested for loop structure in this code works because each DictReader reads its file sequentially and keeps its position: once a line has been read, the next iteration continues from the end of that line instead of starting over. With each iteration of the loops, the next lines are read and the position subsequently moves to the end of those lines. Also, since we introduced a ***"break"*** in the nested loop, only one iteration of the inner loop is run for each iteration of the outer loop, resulting in the intended behaviour: **comparing the first line in the LocusGene file with the first line in the Germplasm file, the second with the second and so on.**

As for what the loops do themselves, they simply **print a string stating the line number that is being compared, show the comparison *per se* and show the result of the comparison thanks to the if/else block**. When I say that the lines from both documents are being compared, I mean that the "Locus" column from both documents is being compared. This is done in the **"if row1["Locus"] == row2["Locus"]"** line. That is because the **DictReader function outputs a variant of a dictionary created from separating each column of the document**. Its keys are the header line's text for each column (if present, otherwise they can be declared in the DictReader statement) and the values are the values of each row corresponding to each column (in this case, for example, the first element of the dictionary would be {"Locus" : "AT1G01040"}, the second {"Gene" : "DCL1"} and so on until the first line is recorded. Then the loop would iterate over the second line and fill in the values in that row). If the lines are not the same, the **"mismatch"** counter is incremented. At the end of each iteration the **"sameorder"** counter is incremented as well to show the next lines to be checked. The "sameorder" variable is set to 1 at the beginning because, even though Python counts from 0, it is easier for a human to understand if the count begins at 1.

Finally, the final results are created. If the "mismatch" variable's value is 0, that means that **all comparisons showed that the lines were the same, so both documents are in the same order.**
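
The pairwise comparison described above can also be sketched without nested loops. The snippet below is a minimal illustration, not the script used in this notebook: it pairs the rows with `itertools.zip_longest`, so a difference in file length also counts as a mismatch. The function name `count_mismatches` and the two in-memory sample files are assumptions made for the example; only the "Locus" column name comes from the data files.

```python
import csv
import io
from itertools import zip_longest


def count_mismatches(file1, file2, key="Locus"):
    """Count the row positions where the two TSV files disagree on `key`."""
    reader1 = csv.DictReader(file1, delimiter="\t")
    reader2 = csv.DictReader(file2, delimiter="\t")
    mismatch = 0
    # zip_longest pairs row 1 with row 1, row 2 with row 2, and so on;
    # a missing row on either side is reported as None.
    for row1, row2 in zip_longest(reader1, reader2):
        if row1 is None or row2 is None or row1[key] != row2[key]:
            mismatch += 1
    return mismatch


# Hypothetical sample data standing in for LocusGene.tsv and Germplasm.tsv
locus_sample = io.StringIO("Locus\tGene\nAT1G01040\tDCL1\nAT1G01060\tLHY\n")
germplasm_sample = io.StringIO("Locus\tgermplasm\nAT1G01040\tCS3828\nAT1G01060\tlhy-101\n")
print(count_mismatches(locus_sample, germplasm_sample))
```

With the real files, the two `io.StringIO` objects would be replaced by the opened "LocusGene.tsv" and "Germplasm.tsv" handles.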

In [1]:
import csv #First of all, we need to import the csv library to be able to use its functions

#These variables are used in creating the output of the file.
sameorder = 1
mismatch = 0

#Next, we need to open both files in read mode and read their contents with the csv.DictReader function
germplasm_file =  open ("Germplasm.tsv", "r")
germplasm_data = csv.DictReader(germplasm_file, delimiter = "\t", quotechar = '"')
locus_file = open("LocusGene.tsv", "r")
locus_data = csv.DictReader(locus_file, delimiter = "\t", quotechar = '"')
#print(germplasm_data)
#print (locus_data)

#The nested for loops allow us to easily compare both files.
for row1 in locus_data:   
    for row2 in germplasm_data:
        print("Now checking line number: {}".format(sameorder))
        print(row1["Locus"] + " vs. " + row2["Locus"])
        if row1["Locus"] == row2["Locus"]:
            print("Result: both are the same\n")
        else:
            print("Result: both are not the same\n")
            mismatch += 1
        sameorder +=1    
        break
        
#The final results are created here.        
print ("FINAL RESULTS:")                
if mismatch == 0:
    print("All lines are in the same order!")
else:
    print("Warning! {} lines are not in the same order".format(mismatch))



Now checking line number: 1
AT1G01040 vs. AT1G01040
Result: both are the same

Now checking line number: 2
AT1G01060 vs. AT1G01060
Result: both are the same

Now checking line number: 3
AT1G01140 vs. AT1G01140
Result: both are the same

Now checking line number: 4
AT1G01220 vs. AT1G01220
Result: both are the same

Now checking line number: 5
AT2G03720 vs. AT2G03720
Result: both are the same

Now checking line number: 6
AT2G03800 vs. AT2G03800
Result: both are the same

Now checking line number: 7
AT2G04240 vs. AT2G04240
Result: both are the same

Now checking line number: 8
AT2G05210 vs. AT2G05210
Result: both are the same

Now checking line number: 9
AT3G02130 vs. AT3G02130
Result: both are the same

Now checking line number: 10
AT3G02140 vs. AT3G02140
Result: both are the same

Now checking line number: 11
AT3G02230 vs. AT3G02230
Result: both are the same

Now checking line number: 12
AT3G02260 vs. AT3G02260
Result: both are the same

Now checking line number: 13
AT3G02310 vs. AT3G02

# Problem 2: Design and create the database

* It should have two tables - one for each of the two data files.
* The two tables should be linked in a 1:1 relationship
* you may use either sqlMagic or pymysql to build the database


# Answers:
In this particular case, I finally decided to use **sqlMagic**, because (from my point of view) its output is much easier to read.

First, I had to run the code that is used to connect to MySQL in Docker. All these previous steps are explained in the code box. It is important to keep them in mind, because otherwise we would not be able to move on to the following ones (errors of different kinds can occur).

Then I had to create a database for this exam, called **"genetics"**, and inside of that database I created two tables (**"germplasm"** and **"locus_gene"**). Since **the "Locus" column is shared between both input files (Germplasm.tsv and LocusGene.tsv)** and locus codes shouldn't appear more than once, I decided to use them as the ***primary keys*** for their respective tables. Being the primary key means that **the values in the locus column must be unique in the entire table**. It also means that **this will be the column used to link both tables in this case**. This will be no problem since, as was shown in the last problem, **the AGI Locus Codes in both input files are in the same sequence.**
The values contained in this column ("Locus") must be **VARCHAR**, variable characters **up to length 15** (even though all codes in the input file are 9 characters long, we are simply ensuring that a longer AGI code would still fit). The "pubmed" column is to be filled with INTEGERs (as well as the "protein_length" column in the locus_gene table). The "NOT NULL" constraint ensures that every column must contain information (basically, no value can be null). Both tables have different columns but a similar configuration.
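
The tables described above share the locus primary key, but the link between them is only implied by the matching values. One possible way to make the database enforce it is a foreign key on germplasm.locus, sketched below. This is an optional extension, not what the notebook does, and it runs against an in-memory SQLite database (Python's `sqlite3` module) purely as a stand-in for the MySQL server; the locus 'AT9G99999' and the helper `try_insert_orphan` are hypothetical.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for the MySQL server
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("""CREATE TABLE locus_gene(
    locus VARCHAR(15) PRIMARY KEY NOT NULL,
    gene VARCHAR(30) NOT NULL,
    protein_length INTEGER NOT NULL)""")
# germplasm.locus is both the primary key and a reference to locus_gene.locus,
# so each locus_gene row can be matched by at most one germplasm row.
conn.execute("""CREATE TABLE germplasm(
    locus VARCHAR(15) PRIMARY KEY NOT NULL,
    germplasm VARCHAR(30) NOT NULL,
    phenotype VARCHAR(2000) NOT NULL,
    pubmed INTEGER NOT NULL,
    FOREIGN KEY (locus) REFERENCES locus_gene(locus))""")

conn.execute("INSERT INTO locus_gene VALUES ('AT1G01040', 'DCL1', 332)")
conn.execute("INSERT INTO germplasm VALUES "
             "('AT1G01040', 'CS3828', 'Increased abundance of miRNA precursors.', 17369351)")


def try_insert_orphan(connection):
    """Return True if a germplasm row with an unknown locus is rejected."""
    try:
        connection.execute("INSERT INTO germplasm VALUES ('AT9G99999', 'x', 'y', 1)")
    except sqlite3.IntegrityError:
        return True
    return False


orphan_rejected = try_insert_orphan(conn)
print(orphan_rejected)
```

With such a constraint, locus_gene would have to be filled before germplasm, since every germplasm row must point to an existing locus.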

In [4]:
##Previous steps --> just to check in the terminal that we have the docker images needed:
#docker images
#docker start course-mysql
#mysql -h 127.0.0.1 -P 3306 --protocol=tcp -u root -p  ## password: "root"

#These commands enable us to connect to MySQL in Docker
%load_ext sql
%config SqlMagic.autocommit=False
%sql mysql+pymysql://root:root@127.0.0.1:3306/mysql          
#%sql show databases

#Now I will create my own database, which will contain the two tables of interest
%sql CREATE DATABASE genetics
%sql USE genetics
#%sql show databases


#%sql DROP TABLE germplasm; #Just in case we need to delete the table and start over.
#%sql DROP TABLE locus_gene; #Just in case we need to delete the table and start over.

#These commands create the new table "germplasm"
%sql CREATE TABLE germplasm(locus VARCHAR(15) PRIMARY KEY NOT NULL, germplasm VARCHAR(30) NOT NULL, phenotype VARCHAR(2000) NOT NULL, pubmed INTEGER NOT NULL)
%sql DESCRIBE germplasm    #Just checking that the table has been created properly

#These commands create the new table "locus_gene"
%sql CREATE TABLE locus_gene(locus VARCHAR(15) PRIMARY KEY NOT NULL, gene VARCHAR(30) NOT NULL, protein_length INTEGER NOT NULL);
%sql DESCRIBE locus_gene   #Just checking that the table has been created properly

#%sql SELECT * FROM germplasm   #Another way of checking that the table has been created properly
#%sql SELECT * FROM locus_gene  #Another way of checking that the table has been created properly
%sql show tables

The sql extension is already loaded. To reload it, use:
  %reload_ext sql
 * mysql+pymysql://root:***@127.0.0.1:3306/mysql
(pymysql.err.ProgrammingError) (1007, "Can't create database 'genetics'; database exists")
[SQL: CREATE DATABASE genetics]
(Background on this error at: http://sqlalche.me/e/f405)
 * mysql+pymysql://root:***@127.0.0.1:3306/mysql
0 rows affected.
 * mysql+pymysql://root:***@127.0.0.1:3306/mysql
0 rows affected.
0 rows affected.
 * mysql+pymysql://root:***@127.0.0.1:3306/mysql
0 rows affected.
0 rows affected.
 * mysql+pymysql://root:***@127.0.0.1:3306/mysql
0 rows affected.
 * mysql+pymysql://root:***@127.0.0.1:3306/mysql
4 rows affected.
 * mysql+pymysql://root:***@127.0.0.1:3306/mysql
0 rows affected.
 * mysql+pymysql://root:***@127.0.0.1:3306/mysql
3 rows affected.
 * mysql+pymysql://root:***@127.0.0.1:3306/mysql
2 rows affected.


Tables_in_genetics
germplasm
locus_gene


In [5]:
%sql DESCRIBE germplasm

 * mysql+pymysql://root:***@127.0.0.1:3306/mysql
4 rows affected.


Field,Type,Null,Key,Default,Extra
locus,varchar(15),NO,PRI,,
germplasm,varchar(30),NO,,,
phenotype,varchar(2000),NO,,,
pubmed,int(11),NO,,,


In [6]:
%sql DESCRIBE locus_gene

 * mysql+pymysql://root:***@127.0.0.1:3306/mysql
3 rows affected.


Field,Type,Null,Key,Default,Extra
locus,varchar(15),NO,PRI,,
gene,varchar(30),NO,,,
protein_length,int(11),NO,,,


# Problem 3: Fill the database
Using pymysql, create a Python script that reads the data from these files, and fills the database. There are a variety of strategies to accomplish this. I will give all strategies equal credit - do whichever one you are most confident with.

# Answers:
As previously stated, right now the pointer in both files is at the end. Before doing anything else with them, it is necessary to ***reset the pointers' position to the beginning of the first data line***. We can do this by moving each pointer to the beginning of the file with seek(0) and then reading the first line (the header line) with readline(), which skips it and leaves the pointer at the desired position.

Once again, we have to iterate over both files. I did this with a for loop for each one. Both of them are essentially the same. As I said before, "row" is, in this case, a variant of a dictionary that contains the name of each column as keys and the values corresponding to each entry for those columns as values. In the SQL commands I insert into each of the database's columns its corresponding value by accessing the values in the row dictionary and placing them in the correct position. The **.format** method allows me to introduce "placeholders" in the SQL command ***({})*** and then pass the corresponding Python variables to each placeholder in order of appearance.
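
Building the SQL text with .format works for this data set, but a phenotype containing a single quote would break the statement. A hedged alternative is to let the driver fill in the values: pymysql cursors accept `%s` placeholders plus a separate sequence of values in `cursor.execute` and `cursor.executemany`. The sketch below only prepares the statement and its parameters, so it runs without a database; `LOCUS_SQL`, `locus_params` and the one-line sample file are names invented for the example.

```python
import csv
import io

# %s is the placeholder style used by pymysql; the values are passed separately
LOCUS_SQL = "INSERT INTO locus_gene (locus, gene, protein_length) VALUES (%s, %s, %s)"


def locus_params(rows):
    """Turn DictReader rows into tuples matching the placeholders in LOCUS_SQL."""
    return [(row["Locus"], row["Gene"], int(row["ProteinLength"])) for row in rows]


# Hypothetical one-row sample standing in for LocusGene.tsv
sample = io.StringIO("Locus\tGene\tProteinLength\nAT1G01040\tDCL1\t332\n")
params = locus_params(csv.DictReader(sample, delimiter="\t"))
print(params)

# With an open pymysql connection (not created here), the insert could be:
#     with connection.cursor() as cursor:
#         cursor.executemany(LOCUS_SQL, params)
#     connection.commit()
```

The same pattern would apply to the germplasm table, with four placeholders instead of three.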

In [7]:
import pymysql.cursors   #This command imports the pymysql.cursors library

#Set the pointer to the beginning of the first line of data (skipping the header line)
germplasm_file.seek(0)
germplasm_file.readline()
locus_file.seek(0)
locus_file.readline()

#Establish a connection with the database.
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='root',
                             db='genetics',
                             charset='utf8mb4',  
                             cursorclass=pymysql.cursors.DictCursor)

#insert the data
try:
    with connection.cursor() as cursor:
        for row in locus_data:
            sql = """INSERT INTO locus_gene (locus, gene, protein_length)
            VALUES ('{}', '{}', {});""".format(row["Locus"], row["Gene"], row["ProteinLength"])
            cursor.execute(sql) 
        for row in germplasm_data:
            sql = """INSERT INTO germplasm (locus, germplasm, phenotype, pubmed)
            VALUES ('{}', '{}', '{}', {});""".format(row["Locus"], row["germplasm"], row["phenotype"], row["pubmed"])
            cursor.execute(sql) ### Maybe this script does not comply with the DRY principle, but I couldn't think of anything better
    connection.commit() #this commits the changes to the database.
finally:
    print("")
    connection.close() #Here I'm closing the connection to the database.
    locus_file.close() #Here I'm closing the connection to the "LocusGene.tsv" file.
    germplasm_file.close() #Here I'm closing the connection to the "Germplasm.tsv" file.




In [14]:
%sql SELECT * FROM germplasm #Just checking that the data was properly inserted

 * mysql+pymysql://root:***@127.0.0.1:3306/mysql
32 rows affected.


locus,germplasm,phenotype,pubmed
AT1G01040,CS3828,Increased abundance of miRNA precursors.,17369351
AT1G01060,lhy-101,"The mutant plants are hypersensitive to both FRc and Rc light treatments in hypocotyl elongation and exhibits a small reciprocal enlargement in cotyledon area, albeit not statistically significant.",16891401
AT1G01140,SALK_058629,hypersensitive to low potassium media,17486125
AT1G01220,SALK_012400C,"fkgp-1 mutants have about 40 times more L-fucose than wild type Arabidopsis plants, but the levels of other monosaccharides do not appear to differ significantly in the mutants. No obvious phenotypic abnormalities were observed in the fkgp-1 mutants, nor were any differences in the sugar composition of cell wall polysaccharides detected.",18199744
AT2G03720,SALK_042433,Multiple straight hairs,16367956
AT2G03800,gek1-1,Ethanol hypersensitivity.,15215505
AT2G04240,xerico,Resistant to exogenous ABA. Seeds contained lower amounts of endogenous ABA than wildtype.,17933900
AT2G05210,pot1-1,No visible phenotype.,17627276
AT3G02130,rpk2-2,The homozygous progeny is indistinguishable from wild-type plants during vegetative growth but showed several morphological alterations after bolting. These plants displayed enhanced inflorescence branching and formed three times as many siliques and flowers as did wild-type plants.,17419837
AT3G02140,afp4-1,Decreased germination on high concentrations of glucose and sorbitol.,18484180


In [15]:
%sql SELECT * FROM locus_gene #Just checking that the data was properly inserted

 * mysql+pymysql://root:***@127.0.0.1:3306/mysql
32 rows affected.


locus,gene,protein_length
AT1G01040,DCL1,332
AT1G01060,LHY,290
AT1G01140,CIPK9,223
AT1G01220,FKGP,190
AT2G03720,MRH6,189
AT2G03800,GEK1,196
AT2G04240,XERICO,256
AT2G05210,POT1A,221
AT3G02130,RPK2,284
AT3G02140,TMAC2,300


# Problem 4: Create reports, written to a file
1. Create a report that shows the full, joined, content of the two database tables (including a header line)

2. Create a joined report that only includes the Genes SKOR and MAA3

3. Create a report that counts the number of entries for each Chromosome (AT1Gxxxxxx to AT5Gxxxxxxx)

4. Create a report that shows the average protein length for the genes on each Chromosome (AT1Gxxxxxx to AT5Gxxxxxxx)

When creating reports 2 and 3, remember the "Don't Repeat Yourself" rule! 

All reports should be written to **the same file**.  You may name the file anything you wish.

# Answers:

For ***Report 1***, as before, it is necessary to import the library we need, this time with the command **"import pymysql.cursors"**, and also to **establish a connection with the database** (with connection = pymysql.connect([...])). After that, we use a **try/finally** block to create the desired report from our database. **These three steps are common to all 4 Reports.** The next step is similar in each report, with some differences in each case. In this report, after connecting to the recently created database "genetics", we retrieve the different columns of the tables with **"SELECT"**, select the tables with **"FROM"** and indicate with **"WHERE"** that the column "locus" must be the same in both tables. All of this must be inside the **"try"** block. With cursor.execute() and cursor.fetchall() we finally gather all the previously stated columns from the database into a single new output --> **results1**. This holds the full, joined content of the two database tables; the column names, which become the header line, are the keys of each dictionary. The last thing to do (in the **"finally"** block) is to close the connection with the database (connection.close()), so that it is released even if the query fails --> **THIS HAS TO BE DONE IN ALL THE REPORTS**.
**As we can see, both files were in the same order. The first three columns correspond to the 'LocusGene.tsv' file and the last three correspond to the file 'Germplasm.tsv' (the shared locus column appears only once).**

For ***Report 2***, we just have to add the requested operation ("Create a joined report that only includes the Genes SKOR and MAA3") to the query previously created in Report 1. I did this by defining the extra condition for Report 2 inside the "with" block: sql2 = '''AND gene in ('SKOR', 'MAA3')'''. Appending it to the query from Report 1 --> **cursor.execute(sql1 + sql2)** means that **only the records for the genes SKOR and MAA3 are shown**.
Final result: **Joined tables filtered by Genes 'SKOR' and 'MAA3'.**

For ***Report 3***, we need to retrieve the information separately for each chromosome. It can be done this way: "for i in range(1, 6):" --> there are 5 possible values: [1, 2, 3, 4, 5]. Each value is inserted into this query: "sql3 = f"SELECT COUNT(*) AS NumberOfEntriesChr{i} FROM germplasm WHERE locus LIKE 'AT{i}G%'"", which counts the records in the table "germplasm" whose locus matches the pattern **'AT{i}G%'**. What we are interested in inside this pattern is the **third character** (represented as "i"), which gives the chromosome number (as we already know, between 1 and 5) of each record. With the "for" loop, we run this query once for each chromosome.
Final result: **As we can see, there are 4 entries for chromosome 1, 4 entries for chromosome 2, 9 entries for chromosome 3, 8 entries for chromosome 4 and 7 entries for chromosome 5.**

For the ***Report 4***, the procedure is quite similar to the one carried out in report 3. The difference here is that the table needed is "locus_gene", and instead of counting the different records in each chromosome, **we need to retrieve the average protein length for the genes on each Chromosome**. This can be done with this code: sql4 = f"SELECT AVG(protein_length) AS avgLengthChr{i} FROM locus_gene WHERE locus LIKE 'AT{i}G%'".
**As we can see, chromosome 1 has an average protein length of 258.75, chromosome 2 has an average protein length of 215.5, chromosome 3 has an average protein length of 252, chromosome 4 has an average protein length of 277.5 and finally chromosome 5 has an average protein length of 271.2857.**
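
Reports 3 and 4 run one query per chromosome. As a possible alternative, both figures can come from a single query that groups the rows by the third character of the locus code. The sketch below assumes the `SUBSTR` function and grouping by a column alias, and it is run against an in-memory SQLite database as a stand-in for the MySQL server; the four rows are taken from the locus_gene output shown earlier, and the variable names are invented for the example.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for the MySQL server
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locus_gene("
             "locus VARCHAR(15) PRIMARY KEY, gene VARCHAR(30), protein_length INTEGER)")
conn.executemany("INSERT INTO locus_gene VALUES (?, ?, ?)", [
    ("AT1G01040", "DCL1", 332),
    ("AT1G01060", "LHY", 290),
    ("AT2G03720", "MRH6", 189),
    ("AT2G03800", "GEK1", 196),
])

# The third character of the AGI code is the chromosome number
sql = """SELECT SUBSTR(locus, 3, 1) AS chromosome,
       COUNT(*) AS entries,
       AVG(protein_length) AS avg_length
FROM locus_gene
GROUP BY chromosome
ORDER BY chromosome"""

per_chromosome = conn.execute(sql).fetchall()
for chromosome, entries, avg_length in per_chromosome:
    print(chromosome, entries, avg_length)
```

Every row of the result then has the same columns, which also makes the report easier to write with a single header line.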


Finally, the last thing to do is ***to write all the reports to one single file.***
Here, the main command needed is the **write function**. We need to create and name the file that will contain all our reports. In our case it is --> **'Reports_Sergio.tsv'**.

The main issue is handling all four reports. As we have previously seen, **Reports 1 and 2 are quite similar, and the same goes for Reports 3 and 4**. Thus, I decided to iterate over the first two reports with one "for" loop and over the other two with another "for" loop (note that in the first "for" the numbering starts at 1 --> first report; and in the second "for" it starts at 3 --> third report). The structure of these two loops is quite similar; the main differences are the **results of the reports themselves** --> [results1 and results2 in the first "for"] and [results3 and results4 in the second "for"], and the **start value** mentioned above.

The last (but not least) thing that we must keep in mind is to set the writemode to the ***append flag --> 'a'*** at the end of each iteration of the two "for" loops. **This allows us to reopen the file and add new information to it without destroying the existing information.**

**It is also important to check that all reports have been written properly in the file that we just created** (view last code).
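
As a hedged alternative to switching between the 'w' and 'a' modes, the file can be opened once and every report written inside the same `with` block. The sketch below assumes that all rows of a report share the same keys, which holds for results1 and results2 but not for results3 and results4 as built in this notebook (their key changes with each chromosome). The function `write_reports`, the sample data and the temporary output path are invented for the example.

```python
import csv
import os
import tempfile


def write_reports(path, reports):
    """Write each report (a list of dicts) to `path` as a titled TSV section."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for number, rows in enumerate(reports, start=1):
            handle.write(f"Report{number}\n")
            if rows:
                # Assumes every row in a report has the same keys
                writer.writerow(rows[0].keys())   # header line
                for row in rows:
                    writer.writerow(row.values())
            handle.write("\n")  # blank line between reports


# Hypothetical results in the shape returned by a DictCursor
sample_reports = [
    [{"locus": "AT1G01040", "gene": "DCL1"}],
    [{"chromosome": "1", "entries": 4}],
]
out_path = os.path.join(tempfile.mkdtemp(), "reports_example.tsv")
write_reports(out_path, sample_reports)
with open(out_path) as handle:
    written = handle.read()
print(written)
```

Because the file is opened only once, there is no write mode to keep track of between reports.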

# Report 1

In [28]:
import pymysql.cursors #This command imports the pymysql.cursors library

#Establish a connection with the database.
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='root',
                             db='genetics',
                             charset='utf8mb4',  
                             cursorclass=pymysql.cursors.DictCursor)

try:
    with connection.cursor() as cursor:
        sql1 = '''
                SELECT germplasm.locus, gene, protein_length, germplasm, phenotype, pubmed
                FROM germplasm, locus_gene 
                WHERE germplasm.locus = locus_gene.locus
              '''
        cursor.execute(sql1)
        results1 = cursor.fetchall()
finally:
    connection.close()

In [29]:
results1

[{'locus': 'AT1G01040',
  'gene': 'DCL1',
  'protein_length': 332,
  'germplasm': 'CS3828',
  'phenotype': 'Increased abundance of miRNA precursors.',
  'pubmed': 17369351},
 {'locus': 'AT1G01060',
  'gene': 'LHY',
  'protein_length': 290,
  'germplasm': 'lhy-101',
  'phenotype': 'The mutant plants are hypersensitive to both FRc and Rc light treatments in hypocotyl elongation and exhibits a small reciprocal enlargement in cotyledon area, albeit not statistically significant.',
  'pubmed': 16891401},
 {'locus': 'AT1G01140',
  'gene': 'CIPK9',
  'protein_length': 223,
  'germplasm': 'SALK_058629',
  'phenotype': 'hypersensitive to low potassium media',
  'pubmed': 17486125},
 {'locus': 'AT1G01220',
  'gene': 'FKGP',
  'protein_length': 190,
  'germplasm': 'SALK_012400C',
  'phenotype': 'fkgp-1 mutants have about 40 times more L-fucose than wild type Arabidopsis plants, but the levels of other monosaccharides do not appear to differ significantly in the mutants. No obvious phenotypic abno

# Report 2

In [31]:
import pymysql.cursors #This command imports the pymysql.cursors library

#Establish a connection with the database.
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='root',
                             db='genetics',
                             charset='utf8mb4',  
                             cursorclass=pymysql.cursors.DictCursor)

try:
    with connection.cursor() as cursor:
        sql2 = '''
                AND gene in ('SKOR', 'MAA3')
              '''
        cursor.execute(sql1 + sql2)
        results2 = cursor.fetchall()
finally:
    connection.close()

In [32]:
results2

[{'locus': 'AT3G02850',
  'gene': 'SKOR',
  'protein_length': 234,
  'germplasm': 'CS3816',
  'phenotype': 'The skor-1 mutant is sensitive to toxic cations in addition to K+ depletion.',
  'pubmed': 17568770},
 {'locus': 'AT4G15570',
  'gene': 'MAA3',
  'protein_length': 294,
  'germplasm': 'maa3',
  'phenotype': 'Homozygotes are not recovered. Female gametophyte development is delayed and asynchronous. During fertilization, fusion of polar nuclei does not occur. Polar nuclei nucloeli are smaller than WT.',
  'pubmed': 18772186}]

# Report 3

In [33]:
import pymysql.cursors #This command imports the pymysql.cursors library

#Establish a connection with the database.
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='root',
                             db='genetics',
                             charset='utf8mb4',  
                             cursorclass=pymysql.cursors.DictCursor)

try:
    results3 = []
    with connection.cursor() as cursor:
        for i in range(1, 6):
            sql3 = f"SELECT COUNT(*) AS NumberOfEntriesChr{i} FROM germplasm WHERE locus LIKE 'AT{i}G%'"
            cursor.execute(sql3)
            results3.extend(cursor.fetchall())
finally:
    connection.close()

In [34]:
results3

[{'NumberOfEntriesChr1': 4},
 {'NumberOfEntriesChr2': 4},
 {'NumberOfEntriesChr3': 9},
 {'NumberOfEntriesChr4': 8},
 {'NumberOfEntriesChr5': 7}]

# Report 4

In [35]:
import pymysql.cursors #This command imports the pymysql.cursors library

#Establish a connection with the database.
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='root',
                             db='genetics',
                             charset='utf8mb4',  
                             cursorclass=pymysql.cursors.DictCursor)

try:
    results4 = []
    with connection.cursor() as cursor:
        for i in range(1, 6):
            sql4 = f"SELECT AVG(protein_length) AS avgLengthChr{i} FROM locus_gene WHERE locus LIKE 'AT{i}G%'"
            cursor.execute(sql4)
            results4.extend(cursor.fetchall())
finally:
    connection.close()

In [36]:
results4

[{'avgLengthChr1': Decimal('258.7500')},
 {'avgLengthChr2': Decimal('215.5000')},
 {'avgLengthChr3': Decimal('252.0000')},
 {'avgLengthChr4': Decimal('277.5000')},
 {'avgLengthChr5': Decimal('271.2857')}]

# WRITING REPORTS

In [79]:
writemode = 'w'
for i, results in enumerate([results1, results2], start=1):
    with open('Reports_Sergio.tsv', writemode) as file:
        file.write(f'Report{i}\n')
        file.write('\t'.join([str(k) for k in results[0].keys()]) + '\n')
        for result in results:
            file.write('\t'.join([str(v) for v in result.values()]) + '\n')
        file.write('\n\n')
    writemode = 'a'
    
for i, results in enumerate([results3, results4], start=3):
    with open('Reports_Sergio.tsv', writemode) as file:
        file.write(f'Report{i}\n')
        for result in results:
            file.write('\t'.join([str(k)+'. '+str(v)  for k,v in result.items()]) + '\n')
        file.write('\n\n')
    writemode = 'a'

In [80]:
#Just checking that all reports have been written properly in the file that we just created
with open("Reports_Sergio.tsv", "r") as reportsfile:  #the with block closes the file afterwards
    print(reportsfile.read())

Report1
locus	gene	protein_length	germplasm	phenotype	pubmed
AT1G01040	DCL1	332	CS3828	Increased abundance of miRNA precursors.	17369351
AT1G01060	LHY	290	lhy-101	The mutant plants are hypersensitive to both FRc and Rc light treatments in hypocotyl elongation and exhibits a small reciprocal enlargement in cotyledon area, albeit not statistically significant.	16891401
AT1G01140	CIPK9	223	SALK_058629	hypersensitive to low potassium media	17486125
AT1G01220	FKGP	190	SALK_012400C	fkgp-1 mutants have about 40 times more L-fucose than wild type Arabidopsis plants, but the levels of other monosaccharides do not appear to differ significantly in the mutants. No obvious phenotypic abnormalities were observed in the fkgp-1 mutants, nor were any differences in the sugar composition of cell wall polysaccharides detected.	18199744
AT2G03720	MRH6	189	SALK_042433	Multiple straight hairs	16367956
AT2G03800	GEK1	196	gek1-1	Ethanol hypersensitivity.	15215505
AT2G04240	XERICO	256	xerico	Resistant to exog