# Analysis of homologous proteins: *Arabidopsis thaliana* transcription factors associated with the induction of somatic embryogenesis, searched in *Ricinus communis*
### Giany Angulo
Biology - Universidad EAFIT

# Abstract

<div class=text-justify>This work searches the proteome of *Ricinus communis* for homologs of 17 proteins associated with the regulation of somatic embryogenesis in the model plant *Arabidopsis thaliana*. To do so, a BLASTp search is run and the resulting matches are analyzed. 

# Introduction

<div class=text-justify>
The world currently faces an alarming food crisis. The FAO (Food and Agriculture Organization of the United Nations) estimated that in 2017, 1 in 9 people suffered from undernourishment. That amounts to 821 million undernourished people, the most affected being children with stunted growth and other conditions associated with weight loss [1].<br />
<div class=text-justify>
The causes are many: armed conflicts, government policies, rapid population growth, climate change, and antimicrobial resistance in crops, among others [1]. Against this backdrop of world hunger, crops are needed that are affordable, resilient, and highly nutritious. Affordable so that everyone can have access to food, which requires lowering production costs. Resilient because climate change has increased, and will keep increasing, extreme events such as droughts and floods, all of which affects the availability and stability of the agricultural food that is the only source of nourishment for many people [1]. Crops must also resist the pathogens that ruin harvests and reduce their capacity to supply the population.<br />
<div class=text-justify>
For this reason, biotechnology offers the alternative of generating crops with higher yields that resist drought and their natural pathogens, and that produce larger amounts of the compounds that give them their nutritional value. Biotechnology also enables better mass propagation of plant material through in vitro propagation, which helps homogenize yield standards, among other advantages. The two preferred methods for in vitro propagation of plant material are organogenesis and somatic embryogenesis. In organogenesis, phytohormones are used to induce a new individual from an explant [2]. This makes it possible to transform the explant and generate individuals with an acquired resistance or improvement. The drawback is that transformations carried out through organogenesis often yield chimeric organisms in which some cells are transformed and others are not, because the new plant does not originate from a single transformed cell.<br />
<div class=text-justify>
In somatic embryogenesis, by contrast, a somatic cell is returned to an embryonic state [3], which could allow new individuals to be generated from a single transformed cell. The problem here is that inducing a somatic cell to an embryonic state requires activating a specific gene expression pattern that is generally unknown in the crop of interest. The good news is that this pattern has already been largely elucidated in model plants such as *Arabidopsis thaliana* [4], which suggests that the homologous genes can be found in other plants in order to learn their specific requirements during somatic embryogenesis.<br />
<div class=text-justify>
In this work, 17 transcription factors [4] involved in embryogenesis, seed maturation, and germination in *Arabidopsis thaliana* were compared against the proteome of *Ricinus communis* (castor bean, known locally as higuerilla) to find homologies that point to similarities in both sequence and function in the second species. This is the first step toward elucidating the pattern of gene expression and regulation that could induce somatic embryogenesis in *Ricinus communis*. 


# Methodology

The 17 genes analyzed were the following:
   1. ANT
   2. AIL1
   3. PLT1
   4. PLT2
   5. AIL5
   6. AIL6
   7. PLT7
   8. BBM
   9. LEC1
   10. LEC2
   11. L1L
   12. ABI3
   13. FUS3
   14. AGL15
   15. PKL
   16. VAL
   17. NF-YA9
   
First, the FASTA file for each *A. thaliana* protein was downloaded from UniProt. The *R. communis* proteome was downloaded from the NCBI Taxonomy database. The makeblastdb application was then used to create a BLAST database from the FASTA files obtained, and that database was searched with BLASTp using the DIAMOND program. One difficulty arose when generating the output, which was written in text format. The Biopython tools for parsing BLAST output that were considered only support XML, so the output was instead parsed by hand as a list of lines. 
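
One way around the text-output problem described above is to request a tabular format. The following is a minimal sketch, assuming the search is rerun with DIAMOND's `--outfmt 6` option (BLAST-style tab-separated output with the 12 default columns). The `parse_tabular` helper and the sample rows are hypothetical illustrations with placeholder IDs and values. They are not part of the original analysis.

```python
import csv
import io

# Default column names of BLAST tabular format 6 (12 columns).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def parse_tabular(handle):
    """Yield one dict per hit from a tab-separated format-6 file handle."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        yield hit

# Hypothetical sample rows: placeholder IDs and values, not real results.
sample = io.StringIO(
    "queryA\tsubject1\t51.9\t233\t107\t2\t10\t240\t5\t235\t5.9e-54\t198.7\n"
    "queryB\tsubject2\t71.5\t123\t32\t1\t1\t120\t3\t125\t3.3e-45\t170.6\n"
)
hits = list(parse_tabular(sample))
best = min(hits, key=lambda h: h["evalue"])  # lowest e-value = strongest hit
print(best["qseqid"], best["sseqid"], best["evalue"])
```

With one row per hit, filtering by e-value or grouping by subject becomes a matter of ordinary list and dict operations.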

# Results

In [138]:
from Bio import SeqIO

def uniprot_accession(fasta_path):
    # UniProt FASTA IDs look like "sp|Q38914|ANT_ARATH"; the accession is the
    # second "|"-separated field, whatever its length (6 or 10 characters).
    # As in the original loops, the last record of the file wins.
    accession = None
    for seq_record in SeqIO.parse(fasta_path, "fasta"):
        accession = seq_record.id.split("|")[1]
    return accession

ANT = uniprot_accession("ANT.fasta")
AIL1 = uniprot_accession("AIL1.fasta")
AIL5 = uniprot_accession("AIL5.fasta")
AIL6 = uniprot_accession("AIL6.fasta")
PLT1 = uniprot_accession("PLT1.fasta")
PLT2 = uniprot_accession("PLT2.fasta")
PLT7 = uniprot_accession("PLT7.fasta")
BBM = uniprot_accession("BBM.fasta")
LEC1 = uniprot_accession("LEC1.fasta")
LEC2 = uniprot_accession("LEC2.fasta")
L1L = uniprot_accession("L1L.fasta")
ABI3 = uniprot_accession("ABI3.fasta")
FUS3 = uniprot_accession("FUS3.fasta")
AGL15 = uniprot_accession("AGL15.fasta")
PKL = uniprot_accession("PKL.fasta")
VAL = uniprot_accession("VAL.fasta")
NF_YA9 = uniprot_accession("NF-YA9.fasta")

In [139]:
print(AIL5)

A0A178U889


## BLAST analysis

In [89]:
# First, the BLAST output (BLASTP_output) is read into a string, and that string is split into lines.
with open('BLASTP_output', 'r') as f2:
    data = f2.read()
data_string=str(data)
blast_output=data_string.split("\n")
blast_output

In [90]:
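# Next, all matches are collected so that queries without hits are discarded.
# For each subject header (a line starting with ">"), the 4 lines before it and the 4 after it are kept.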
def lista_de_matches(x):
    i=0
    query=[]
    while i < (len(x)-4):
        if len(x[i])!=0:
            if x[i][0] == ">":
                query= query + x[i-4:i+5]
        i=i+1
    return query
matches=lista_de_matches(blast_output)
matches

['Query= NP_001310615.1 B3 domain-containing transcription factor LEC2 [Ricinus communis]',
 '',
 'Length=399',
 '',
 '>spQ1PFR7LEC2_ARATH_B3_domain_containing_transcription_factor_LEC2_OSArabidopsis_thaliana_OX3702_GNLEC2_PE2_SV1',
 'Length=363',
 '',
 ' Score = 198.7 bits (504),  Expect = 5.9e-54',
 ' Identities = 121/233 (51%), Positives = 153/233 (65%), Gaps = 5/233 (2%)',
 'Query= NP_001310634.1 B3 domain-containing transcription factor ABI3 [Ricinus communis]',
 '',
 'Length=767',
 '',
 '>spQ01593ABI3_ARATH_B3_domain_containing_transcription_factor_ABI3_OSArabidopsis_thaliana_OX3702_GNABI3_PE1_SV1',
 'Length=720',
 '',
 ' Score = 170.6 bits (431),  Expect = 3.3e-45',
 ' Identities = 88/123 (71%), Positives = 94/123 (76%), Gaps = 3/123 (2%)',
 'Query= NP_001310645.1 AP2-like ethylene-responsive transcription factor At1g16060 [Ricinus communis]',
 '',
 'Length=327',
 '',
 '>trA0A178U889A0A178U889_ARATH_AINTEGUMENTA_like_5_OSArabidopsis_thaliana_OX3702_GNAIL5_PE4_SV1',
 'Length=440'
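
The flat list above can be regrouped into one record per alignment. The following is a minimal sketch, assuming every match occupies exactly nine consecutive lines, as in the output shown (query line, blank, query length, blank, subject header, subject length, blank, score line, identities line). The `parse_matches` helper is a hypothetical name. The sample reuses the first two matches printed above.

```python
import re

def parse_matches(matches):
    """Group the flat 9-line chunks into one dict per alignment."""
    records = []
    for start in range(0, len(matches) - 8, 9):
        chunk = matches[start:start + 9]
        expect = re.search(r"Expect = (\S+)", chunk[7])
        identities = re.search(r"Identities = (\d+)/(\d+)", chunk[8])
        records.append({
            "query_id": chunk[0].split()[1],       # e.g. NP_001310615.1
            "subject": chunk[4].lstrip(">"),       # subject header without ">"
            "evalue": float(expect.group(1)),
            "identity": int(identities.group(1)) / int(identities.group(2)),
        })
    return records

# First two matches, copied from the output above.
sample_matches = [
    'Query= NP_001310615.1 B3 domain-containing transcription factor LEC2 [Ricinus communis]',
    '',
    'Length=399',
    '',
    '>spQ1PFR7LEC2_ARATH_B3_domain_containing_transcription_factor_LEC2_OSArabidopsis_thaliana_OX3702_GNLEC2_PE2_SV1',
    'Length=363',
    '',
    ' Score = 198.7 bits (504),  Expect = 5.9e-54',
    ' Identities = 121/233 (51%), Positives = 153/233 (65%), Gaps = 5/233 (2%)',
    'Query= NP_001310634.1 B3 domain-containing transcription factor ABI3 [Ricinus communis]',
    '',
    'Length=767',
    '',
    '>spQ01593ABI3_ARATH_B3_domain_containing_transcription_factor_ABI3_OSArabidopsis_thaliana_OX3702_GNABI3_PE1_SV1',
    'Length=720',
    '',
    ' Score = 170.6 bits (431),  Expect = 3.3e-45',
    ' Identities = 88/123 (71%), Positives = 94/123 (76%), Gaps = 3/123 (2%)',
]
records = parse_matches(sample_matches)
for record in records:
    print(record["query_id"], record["evalue"], round(record["identity"], 2))
```

Storing the e-value as a float, rather than as part of a text line, makes it possible to sort or filter the hits numerically.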

## Analysis of the proteins with homologs

### ANT

In [92]:
# This function searches the list of matches for a given protein name.
# It returns the query line, which carries the ID of the castor bean protein that matched, and the line holding the score and e-value of the alignment.
# If several subject headers contain the name, the last one in the list is returned.
def query_evalue(lista, proteina):
    i=0
    b=len(proteina)
    prot=""
    e_value=[]
    query=[]
    while i < len(lista):
        if len(lista[i])!=0:
            if lista[i][0] == ">":
                j=1
                while j< len(lista[i]):
                    if lista[i][j:j+b] == proteina:
                        prot= lista[i][j:j+b]
                        e_value= lista[i+3]
                        query = lista[i-4]
                    j=j+1
        i=i+1
    return query,e_value

query_evalue(matches, "ANT")

('Query= XP_025015758.1 LOW QUALITY PROTEIN: floral homeotic protein APETALA 2 [Ricinus communis]',
 ' Score = 175.3 bits (443),  Expect = 8.4e-47')
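
Because `query_evalue` matches the name as a substring anywhere in the subject header and keeps the last occurrence, a short name such as "ANT" or "VAL" may match unrelated text, and the hit returned is not necessarily the strongest. The following is a minimal sketch of a stricter lookup, assuming the gene name sits between `_GN` and `_PE` in every subject header, as it does in the headers shown earlier. The `best_hit` helper is a hypothetical name. The sample reuses two matches from the output above.

```python
import re

def best_hit(matches, gene):
    """Return (query_line, evalue) for the lowest-e-value hit whose subject
    header has `gene` as its GN field, or None when there is no such hit."""
    best = None
    for i, line in enumerate(matches):
        if not line.startswith(">"):
            continue
        gn = re.search(r"_GN(.+?)_PE\d", line)
        if gn is None or gn.group(1) != gene:
            continue  # exact gene-name comparison, not a substring test
        evalue = float(re.search(r"Expect = (\S+)", matches[i + 3]).group(1))
        if best is None or evalue < best[1]:
            best = (matches[i - 4], evalue)
    return best

# Two matches copied from the BLAST analysis output.
sample_matches = [
    'Query= NP_001310615.1 B3 domain-containing transcription factor LEC2 [Ricinus communis]',
    '',
    'Length=399',
    '',
    '>spQ1PFR7LEC2_ARATH_B3_domain_containing_transcription_factor_LEC2_OSArabidopsis_thaliana_OX3702_GNLEC2_PE2_SV1',
    'Length=363',
    '',
    ' Score = 198.7 bits (504),  Expect = 5.9e-54',
    ' Identities = 121/233 (51%), Positives = 153/233 (65%), Gaps = 5/233 (2%)',
    'Query= NP_001310634.1 B3 domain-containing transcription factor ABI3 [Ricinus communis]',
    '',
    'Length=767',
    '',
    '>spQ01593ABI3_ARATH_B3_domain_containing_transcription_factor_ABI3_OSArabidopsis_thaliana_OX3702_GNABI3_PE1_SV1',
    'Length=720',
    '',
    ' Score = 170.6 bits (431),  Expect = 3.3e-45',
    ' Identities = 88/123 (71%), Positives = 94/123 (76%), Gaps = 3/123 (2%)',
]
print(best_hit(sample_matches, "LEC2"))
print(best_hit(sample_matches, "LEC"))  # a partial name no longer matches
```

Returning `None` for a missing gene also separates "no hit" from an actual result more clearly than a pair of empty lists.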

In [95]:
# With the protein ID known, the following code queries Entrez
# to check how well described the protein is and whether it is annotated with the same function as the protein in the model organism.
from Bio import Entrez
from Bio import SeqIO
f_t = ['XP_025015758.1'] 
Entrez.email = "gangulo@eafit.edu.co"

for prot in f_t:
    print()
    print(prot)
    proteina = Entrez.efetch(db="protein", id=prot, retmode="xml")
    proteinaXML = Entrez.read(proteina)[0]
    print("LOCUS:"+proteinaXML["GBSeq_locus"])
    print("SOURCE:"+proteinaXML["GBSeq_source"])
    info = proteinaXML["GBSeq_comment"].split("[")
    newl = []
    for e in info:
        newl.append("["+e)

for ne in newl:
    print(ne)


XP_025015758.1
LOCUS:XP_025015758
SOURCE:Ricinus communis (castor bean)
[MODEL REFSEQ: This record is predicted by automated computational analysis. This record is derived from a genomic sequence (NW_002995406.1) annotated using gene prediction method: Gnomon.~Also see:~ Documentation of NCBI's Annotation Process~; ; ##Genome-Annotation-Data-START## ; Annotation Provider :: NCBI ; Annotation Status :: Full annotation ; Annotation Name :: Ricinus communis Annotation Release 101 ; Annotation Version :: 101 ; Annotation Pipeline :: NCBI eukaryotic genome annotation pipeline ; Annotation Software Version :: 8.0 ; Annotation Method :: Best-placed RefSeq; Gnomon ; Features Annotated :: Gene; mRNA; CDS; ncRNA ; ##Genome-Annotation-Data-END##; ; ##RefSeq-Attributes-START## ; frameshifts :: corrected 1 indel ; ##RefSeq-Attributes-END##; COMPLETENESS: full length.


In [107]:
from Bio import Entrez
from Bio import SeqIO
f_t = [ANT] 
Entrez.email = "gangulo@eafit.edu.co"

for prot in f_t:
    print()
    print(prot)
    proteina = Entrez.efetch(db="protein", id=prot, retmode="xml")
    proteinaXML = Entrez.read(proteina)[0]
    print("LOCUS:"+proteinaXML["GBSeq_locus"])
    print("SOURCE:"+proteinaXML["GBSeq_source"])
    info = proteinaXML["GBSeq_comment"].split("[")
    newl = []
    for e in info:
        newl.append("["+e)

for ne in newl:
    print(ne)


Q38914
LOCUS:ANT_ARATH
SOURCE:Arabidopsis thaliana (thale cress)
[On or before Nov 22, 2005 this sequence version replaced gi:75319672, gi:75220707.; 
[FUNCTION] Transcription activator that recognizes and binds to the DNA consensus sequence 5'-CAC
[AG]N
[AT]TNCCNANG-3'. Required for the initiation and growth of ovules integumenta, and for the development of female gametophyte. Plays a critical role in the development of gynoecium marginal tissues (e.g. stigma, style and septa), and in the fusion of carpels and of medial ridges leading to ovule primordia. Also involved in organs initiation and development, including floral organs. Maintains the meristematic competence of cells and consequently sustains expression of cell cycle regulators during organogenesis, thus controlling the final size of each organ by controlling their cell number. Regulates INO autoinduction and expression pattern. As ANT promotes petal cell identity and mediates down-regulation of AG in flower whorl 2, it func
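
In the output above, splitting the comment on every "[" also breaks the DNA consensus sequence, whose degenerate positions are written in brackets. The following is a minimal sketch of an alternative, assuming section labels are uppercase words in brackets followed by a space, as in `[FUNCTION] `. The `split_comment` helper and the `PREAMBLE` key are hypothetical names. The sample string is a shortened excerpt of the ANT comment printed above.

```python
import re

def split_comment(comment):
    """Split a GBSeq_comment string into {SECTION: text} pairs.

    Only brackets holding uppercase letters and spaces, followed by
    whitespace, count as section labels, so motifs like [AG]N are untouched.
    """
    parts = re.split(r"\[([A-Z][A-Z ]+)\]\s", comment)
    # parts = [preamble, label1, text1, label2, text2, ...]
    sections = {}
    preamble = parts[0].strip(" ;")
    if preamble:
        sections["PREAMBLE"] = preamble
    for label, text in zip(parts[1::2], parts[2::2]):
        sections[label] = text.strip(" ;")
    return sections

# Shortened excerpt of the ANT comment shown above.
comment = (
    "On or before Nov 22, 2005 this sequence version replaced "
    "gi:75319672, gi:75220707.; "
    "[FUNCTION] Transcription activator that recognizes and binds to the "
    "DNA consensus sequence 5'-CAC[AG]N[AT]TNCCNANG-3'."
)
sections = split_comment(comment)
for label, text in sections.items():
    print(label + ": " + text)
```

A bracketed motif that happens to be followed by a space would still be mistaken for a label, so this is a heuristic and not a full parser.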

### AIL1

In [119]:
# query_evalue is the function defined in the ANT section; it is reused here unchanged.
query_evalue(matches, "AIL1")

('Query= XP_025013972.1 floral homeotic protein APETALA 2 isoform X4 [Ricinus communis]',
 ' Score = 175.3 bits (443),  Expect = 7.5e-47')

In [121]:
from Bio import Entrez
from Bio import SeqIO
f_t = ["XP_025013972.1"] 
Entrez.email = "gangulo@eafit.edu.co"

for prot in f_t:
    print()
    print(prot)
    proteina = Entrez.efetch(db="protein", id=prot, retmode="xml")
    proteinaXML = Entrez.read(proteina)[0]
    print("LOCUS:"+proteinaXML["GBSeq_locus"])
    print("SOURCE:"+proteinaXML["GBSeq_source"])
    info = proteinaXML["GBSeq_comment"].split("[")
    newl = []
    for e in info:
        newl.append("["+e)

for ne in newl:
    print(ne)


XP_025013972.1
LOCUS:XP_025013972
SOURCE:Ricinus communis (castor bean)
[MODEL REFSEQ: This record is predicted by automated computational analysis. This record is derived from a genomic sequence (NW_002994429.1) annotated using gene prediction method: Gnomon.~Also see:~ Documentation of NCBI's Annotation Process~; ; ##Genome-Annotation-Data-START## ; Annotation Provider :: NCBI ; Annotation Status :: Full annotation ; Annotation Name :: Ricinus communis Annotation Release 101 ; Annotation Version :: 101 ; Annotation Pipeline :: NCBI eukaryotic genome annotation pipeline ; Annotation Software Version :: 8.0 ; Annotation Method :: Best-placed RefSeq; Gnomon ; Features Annotated :: Gene; mRNA; CDS; ncRNA ; ##Genome-Annotation-Data-END##; COMPLETENESS: full length.


In [122]:
from Bio import Entrez
from Bio import SeqIO
f_t = [AIL1] 
Entrez.email = "gangulo@eafit.edu.co"

for prot in f_t:
    print()
    print(prot)
    proteina = Entrez.efetch(db="protein", id=prot, retmode="xml")
    proteinaXML = Entrez.read(proteina)[0]
    print("LOCUS:"+proteinaXML["GBSeq_locus"])
    print("SOURCE:"+proteinaXML["GBSeq_source"])
    info = proteinaXML["GBSeq_comment"].split("[")
    newl = []
    for e in info:
        newl.append("["+e)

for ne in newl:
    print(ne)


Q1PFE1
LOCUS:AIL1_ARATH
SOURCE:Arabidopsis thaliana (thale cress)
[On or before Jun 16, 2007 this sequence version replaced gi:75260422, gi:75262280.; 
[FUNCTION] Probably acts as a transcriptional activator. Binds to the GCC-box pathogenesis-related promoter element. May be involved in the regulation of gene expression by stress factors and by components of stress signal transduction pathways (By similarity). {ECO:0000250}.; 
[SUBCELLULAR LOCATION] Nucleus {ECO:0000305}.; 
[TISSUE SPECIFICITY] Expressed in roots, seedlings, inflorescence, and siliques. Also detected at low levels in leaves. {ECO:0000269|PubMed:15988559}.; 
[SIMILARITY] Belongs to the AP2/ERF transcription factor family. AP2 subfamily. {ECO:0000305}.; 
[SEQUENCE CAUTION] Sequence=AAG51860.1; Type=Erroneous gene model prediction; Evidence={ECO:0000305}; Sequence=ABK28464.1; Type=Erroneous termination; Positions=416; Note=Translated as stop.; Evidence={ECO:0000305}.


### PLT1

In [124]:
# query_evalue is the function defined in the ANT section; it is reused here unchanged.
query_evalue(matches, "PLT1")

('Query= XP_002524409.1 AP2/ERF and B3 domain-containing transcription factor RAV1 [Ricinus communis]',
 ' Score = 56.6 bits (135),  Expect = 3.3e-11')

In [125]:
from Bio import Entrez
from Bio import SeqIO
f_t = ["XP_002524409.1"] 
Entrez.email = "gangulo@eafit.edu.co"

for prot in f_t:
    print()
    print(prot)
    proteina = Entrez.efetch(db="protein", id=prot, retmode="xml")
    proteinaXML = Entrez.read(proteina)[0]
    print("LOCUS:"+proteinaXML["GBSeq_locus"])
    print("SOURCE:"+proteinaXML["GBSeq_source"])
    info = proteinaXML["GBSeq_comment"].split("[")
    newl = []
    for e in info:
        newl.append("["+e)

for ne in newl:
    print(ne)


XP_002524409.1
LOCUS:XP_002524409
SOURCE:Ricinus communis (castor bean)
[MODEL REFSEQ: This record is predicted by automated computational analysis. This record is derived from a genomic sequence (NW_002994440.1) annotated using gene prediction method: Gnomon, supported by EST evidence.~Also see:~ Documentation of NCBI's Annotation Process~; ; ##Genome-Annotation-Data-START## ; Annotation Provider :: NCBI ; Annotation Status :: Full annotation ; Annotation Name :: Ricinus communis Annotation Release 101 ; Annotation Version :: 101 ; Annotation Pipeline :: NCBI eukaryotic genome annotation pipeline ; Annotation Software Version :: 8.0 ; Annotation Method :: Best-placed RefSeq; Gnomon ; Features Annotated :: Gene; mRNA; CDS; ncRNA ; ##Genome-Annotation-Data-END##; COMPLETENESS: full length.


In [123]:
from Bio import Entrez
from Bio import SeqIO
f_t = [PLT1] 
Entrez.email = "gangulo@eafit.edu.co"

for prot in f_t:
    print()
    print(prot)
    proteina = Entrez.efetch(db="protein", id=prot, retmode="xml")
    proteinaXML = Entrez.read(proteina)[0]
    print("LOCUS:"+proteinaXML["GBSeq_locus"])
    print("SOURCE:"+proteinaXML["GBSeq_source"])
    info = proteinaXML["GBSeq_comment"].split("[")
    newl = []
    for e in info:
        newl.append("["+e)

for ne in newl:
    print(ne)


Q5YGP8
LOCUS:PLET1_ARATH
SOURCE:Arabidopsis thaliana (thale cress)
[On or before Aug 31, 2007 this sequence version replaced gi:75223334, gi:75274047.; 
[FUNCTION] Probably acts as a transcriptional activator. Binds to the GCC-box pathogenesis-related promoter element. May be involved in the regulation of gene expression by stress factors and by components of stress signal transduction pathways (By similarity). Master regulator of basal/root fate. Essential for root quiescent center (QC) and columella specification, stem cell activity, as well as for establishment of the stem cell niche during embryogenesis. Modulates the root polar auxin transport by regulating the distribution of PIN genes. Essential role in respecifying pattern and polarity in damaged roots. Direct target of the transcriptional corepressor TPL. Expression levels and patterns regulated post-transcriptionally by root meristem growth factors (RGFs). {ECO:0000250, ECO:0000269|PubMed:15454085, ECO:0000269|PubMed:1563540

### PLT2

In [127]:
# query_evalue is the function defined in the ANT section; it is reused here unchanged.
query_evalue(matches, "PLT2")

('Query= XP_015582262.1 ethylene-responsive transcription factor 5-like [Ricinus communis]',
 ' Score = 42.0 bits (97),  Expect = 6.1e-07')

In [128]:
from Bio import Entrez
from Bio import SeqIO
f_t = ['XP_015582262.1'] 
Entrez.email = "gangulo@eafit.edu.co"

for prot in f_t:
    print()
    print(prot)
    proteina = Entrez.efetch(db="protein", id=prot, retmode="xml")
    proteinaXML = Entrez.read(proteina)[0]
    print("LOCUS:"+proteinaXML["GBSeq_locus"])
    print("SOURCE:"+proteinaXML["GBSeq_source"])
    info = proteinaXML["GBSeq_comment"].split("[")
    newl = []
    for e in info:
        newl.append("["+e)

for ne in newl:
    print(ne)


XP_015582262.1
LOCUS:XP_015582262
SOURCE:Ricinus communis (castor bean)
[MODEL REFSEQ: This record is predicted by automated computational analysis. This record is derived from a genomic sequence (NW_002994786.1) annotated using gene prediction method: Gnomon, supported by EST evidence.~Also see:~ Documentation of NCBI's Annotation Process~; ; ##Genome-Annotation-Data-START## ; Annotation Provider :: NCBI ; Annotation Status :: Full annotation ; Annotation Name :: Ricinus communis Annotation Release 101 ; Annotation Version :: 101 ; Annotation Pipeline :: NCBI eukaryotic genome annotation pipeline ; Annotation Software Version :: 8.0 ; Annotation Method :: Best-placed RefSeq; Gnomon ; Features Annotated :: Gene; mRNA; CDS; ncRNA ; ##Genome-Annotation-Data-END##; COMPLETENESS: full length.


In [129]:
from Bio import Entrez
from Bio import SeqIO
f_t = [PLT2] 
Entrez.email = "gangulo@eafit.edu.co"

for prot in f_t:
    print()
    print(prot)
    proteina = Entrez.efetch(db="protein", id=prot, retmode="xml")
    proteinaXML = Entrez.read(proteina)[0]
    print("LOCUS:"+proteinaXML["GBSeq_locus"])
    print("SOURCE:"+proteinaXML["GBSeq_source"])
    info = proteinaXML["GBSeq_comment"].split("[")
    newl = []
    for e in info:
        newl.append("["+e)

for ne in newl:
    print(ne)


Q5YGP7
LOCUS:PLET2_ARATH
SOURCE:Arabidopsis thaliana (thale cress)
[On or before Aug 31, 2007 this sequence version replaced gi:75223335, gi:75266698.; 
[FUNCTION] Probably acts as a transcriptional activator. Binds to the GCC-box pathogenesis-related promoter element. May be involved in the regulation of gene expression by stress factors and by components of stress signal transduction pathways (By similarity). Master regulator of basal/root fate. Essential for root quiescent center (QC) and columella specification, stem cell activity, as well as for establishment of the stem cell niche during embryogenesis. Modulates the root polar auxin transport by regulating the distribution of PIN genes. Essential role in respecifying pattern and polarity in damaged roots. Direct target of the transcriptional corepressor TPL. Expression levels and patterns regulated post-transcriptionally by root meristem growth factors (RGFs). {ECO:0000250, ECO:0000269|PubMed:15454085, ECO:0000269|PubMed:1563540

### AIL5

In [130]:
# query_evalue is the function defined in the ANT section; it is reused here unchanged.
query_evalue(matches, "AIL5")

('Query= XP_025013917.1 ethylene-responsive transcription factor ERF109-like [Ricinus communis]',
 ' Score = 55.8 bits (133),  Expect = 2.7e-11')

In [132]:
from Bio import Entrez
from Bio import SeqIO
f_t = ['XP_025013917.1'] 
Entrez.email = "gangulo@eafit.edu.co"

for prot in f_t:
    print()
    print(prot)
    proteina = Entrez.efetch(db="protein", id=prot, retmode="xml")
    proteinaXML = Entrez.read(proteina)[0]
    print("LOCUS:"+proteinaXML["GBSeq_locus"])
    print("SOURCE:"+proteinaXML["GBSeq_source"])
    info = proteinaXML["GBSeq_comment"].split("[")
    newl = []
    for e in info:
        newl.append("["+e)

for ne in newl:
    print(ne)


XP_025013917.1
LOCUS:XP_025013917
SOURCE:Ricinus communis (castor bean)
[MODEL REFSEQ: This record is predicted by automated computational analysis. This record is derived from a genomic sequence (NW_002994423.1) annotated using gene prediction method: Gnomon.~Also see:~ Documentation of NCBI's Annotation Process~; ; ##Genome-Annotation-Data-START## ; Annotation Provider :: NCBI ; Annotation Status :: Full annotation ; Annotation Name :: Ricinus communis Annotation Release 101 ; Annotation Version :: 101 ; Annotation Pipeline :: NCBI eukaryotic genome annotation pipeline ; Annotation Software Version :: 8.0 ; Annotation Method :: Best-placed RefSeq; Gnomon ; Features Annotated :: Gene; mRNA; CDS; ncRNA ; ##Genome-Annotation-Data-END##; ; ##RefSeq-Attributes-START## ; ab initio :: 1% of CDS bases ; ##RefSeq-Attributes-END##; COMPLETENESS: full length.


In [141]:
# The A. thaliana protein was not found in the database used.

### AIL6

In [142]:
# query_evalue is the function defined in the ANT section; it is reused here unchanged.
query_evalue(matches, "AIL6")

('Query= XP_025014755.1 ethylene-responsive transcription factor ABI4 isoform X2 [Ricinus communis]',
 ' Score = 43.1 bits (100),  Expect = 3.8e-07')

In [143]:
from Bio import Entrez
from Bio import SeqIO
f_t = ['XP_025014755.1'] 
Entrez.email = "gangulo@eafit.edu.co"

for prot in f_t:
    print()
    print(prot)
    proteina = Entrez.efetch(db="protein", id=prot, retmode="xml")
    proteinaXML = Entrez.read(proteina)[0]
    print("LOCUS:"+proteinaXML["GBSeq_locus"])
    print("SOURCE:"+proteinaXML["GBSeq_source"])
    info = proteinaXML["GBSeq_comment"].split("[")
    newl = []
    for e in info:
        newl.append("["+e)

for ne in newl:
    print(ne)


XP_025014755.1
LOCUS:XP_025014755
SOURCE:Ricinus communis (castor bean)
[MODEL REFSEQ: This record is predicted by automated computational analysis. This record is derived from a genomic sequence (NW_002994593.1) annotated using gene prediction method: Gnomon.~Also see:~ Documentation of NCBI's Annotation Process~; ; ##Genome-Annotation-Data-START## ; Annotation Provider :: NCBI ; Annotation Status :: Full annotation ; Annotation Name :: Ricinus communis Annotation Release 101 ; Annotation Version :: 101 ; Annotation Pipeline :: NCBI eukaryotic genome annotation pipeline ; Annotation Software Version :: 8.0 ; Annotation Method :: Best-placed RefSeq; Gnomon ; Features Annotated :: Gene; mRNA; CDS; ncRNA ; ##Genome-Annotation-Data-END##; COMPLETENESS: full length.


In [171]:
# The A. thaliana protein was not found in the database used.

### PLT7

In [146]:
# query_evalue is the function defined in the ANT section; it is reused here unchanged.
query_evalue(matches, "PLT7")

([], [])

### BBM

In [147]:
# query_evalue is the function defined in the ANT section; it is reused here unchanged.
query_evalue(matches, "BBM")

('Query= XP_015573687.1 AP2-like ethylene-responsive transcription factor At1g79700 [Ricinus communis]',
 ' Score = 256.9 bits (655),  Expect = 1.7e-71')

In [148]:
from Bio import Entrez
from Bio import SeqIO
f_t = ['XP_015573687.1'] 
Entrez.email = "gangulo@eafit.edu.co"

for prot in f_t:
    print()
    print(prot)
    proteina = Entrez.efetch(db="protein", id=prot, retmode="xml")
    proteinaXML = Entrez.read(proteina)[0]
    print("LOCUS:"+proteinaXML["GBSeq_locus"])
    print("SOURCE:"+proteinaXML["GBSeq_source"])
    info = proteinaXML["GBSeq_comment"].split("[")
    newl = []
    for e in info:
        newl.append("["+e)

for ne in newl:
    print(ne)


XP_015573687.1
LOCUS:XP_015573687
SOURCE:Ricinus communis (castor bean)
[MODEL REFSEQ: This record is predicted by automated computational analysis. This record is derived from a genomic sequence (NW_002994321.1) annotated using gene prediction method: Gnomon.~Also see:~ Documentation of NCBI's Annotation Process~; ; ##Genome-Annotation-Data-START## ; Annotation Provider :: NCBI ; Annotation Status :: Full annotation ; Annotation Name :: Ricinus communis Annotation Release 101 ; Annotation Version :: 101 ; Annotation Pipeline :: NCBI eukaryotic genome annotation pipeline ; Annotation Software Version :: 8.0 ; Annotation Method :: Best-placed RefSeq; Gnomon ; Features Annotated :: Gene; mRNA; CDS; ncRNA ; ##Genome-Annotation-Data-END##; COMPLETENESS: full length.


In [149]:
from Bio import Entrez
from Bio import SeqIO
f_t = [BBM] 
Entrez.email = "gangulo@eafit.edu.co"

for prot in f_t:
    print()
    print(prot)
    proteina = Entrez.efetch(db="protein", id=prot, retmode="xml")
    proteinaXML = Entrez.read(proteina)[0]
    print("LOCUS:"+proteinaXML["GBSeq_locus"])
    print("SOURCE:"+proteinaXML["GBSeq_source"])
    info = proteinaXML["GBSeq_comment"].split("[")
    newl = []
    for e in info:
        newl.append("["+e)

for ne in newl:
    print(ne)


Q6PQQ4
LOCUS:BBM_ARATH
SOURCE:Arabidopsis thaliana (thale cress)
[On or before Apr 11, 2006 this sequence version replaced gi:75246547, gi:75263840, gi:11358596.; 
[FUNCTION] Transcription factor that promotes cell proliferation, differentiation and morphogenesis, especially during embryogenesis. {ECO:0000269|PubMed:12172019}.; 
[SUBCELLULAR LOCATION] Nucleus {ECO:0000305}.; 
[TISSUE SPECIFICITY] Mostly expressed in developing seeds. Also expressed in roots, seedlings, and siliques, and, at low levels, in leaves. {ECO:0000269|PubMed:12172019, ECO:0000269|PubMed:15988559}.; 
[DEVELOPMENTAL STAGE] Expressed in embryo throughout embryogenesis. Also present in free nuclear endosperm, but disappears once endosperm cellularization begins. {ECO:0000269|PubMed:12172019}.; 
[MISCELLANEOUS] Was named 'Baby boom' because overexpressing transgenic plants exhibit several spontaneous somatic embryos.; 
[SIMILARITY] Belongs to the AP2/ERF transcription factor family. AP2 subfamily. {ECO:0000305}.


### LEC1

In [150]:
# query_evalue is the function defined in the ANT section; it is reused here unchanged.
query_evalue(matches, "LEC1")

([], [])

### LEC2

In [151]:
# query_evalue is the function defined in the ANT section; it is reused here unchanged.
query_evalue(matches, "LEC2")

('Query= NP_001310615.1 B3 domain-containing transcription factor LEC2 [Ricinus communis]',
 ' Score = 198.7 bits (504),  Expect = 5.9e-54')

In [152]:
from Bio import Entrez
from Bio import SeqIO
f_t = ['NP_001310615.1'] 
Entrez.email = "gangulo@eafit.edu.co"

for prot in f_t:
    print()
    print(prot)
    proteina = Entrez.efetch(db="protein", id=prot, retmode="xml")
    proteinaXML = Entrez.read(proteina)[0]
    print("LOCUS:"+proteinaXML["GBSeq_locus"])
    print("SOURCE:"+proteinaXML["GBSeq_source"])
    info = proteinaXML["GBSeq_comment"].split("[")
    newl = []
    for e in info:
        newl.append("["+e)

for ne in newl:
    print(ne)


NP_001310615.1
LOCUS:NP_001310615
SOURCE:Ricinus communis (castor bean)
[PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from KC146386.1.; On Apr 22, 2016 this sequence version replaced XP_015573736.1.; ; ##Evidence-Data-START## ; Transcript exon combination :: KC146386.1 
[ECO:0000332] ; RNAseq introns :: single sample supports all introns SAMEA1034164, SAMEA1034165 
[ECO:0000348] ; ##Evidence-Data-END##


In [153]:
from Bio import Entrez
from Bio import SeqIO
f_t = [LEC2] 
Entrez.email = "gangulo@eafit.edu.co"

for prot in f_t:
    print()
    print(prot)
    proteina = Entrez.efetch(db="protein", id=prot, retmode="xml")
    proteinaXML = Entrez.read(proteina)[0]
    print("LOCUS:"+proteinaXML["GBSeq_locus"])
    print("SOURCE:"+proteinaXML["GBSeq_source"])
    info = proteinaXML["GBSeq_comment"].split("[")
    newl = []
    for e in info:
        newl.append("["+e)

for ne in newl:
    print(ne)


Q1PFR7
LOCUS:LEC2_ARATH
SOURCE:Arabidopsis thaliana (thale cress)
[On or before May 26, 2009 this sequence version replaced gi:75163189, gi:75173393.; 
[FUNCTION] Transcription regulator that plays a central role in embryo development. Required for the maintenance of suspensor morphology, specification of cotyledon identity, progression through the maturation phase and suppression of premature germination. Ectopic expression is sufficient to promote somatic embryogenesis. {ECO:0000269|PubMed:11573014, ECO:0000269|PubMed:16492731, ECO:0000269|PubMed:18287041}.; 
[SUBCELLULAR LOCATION] Nucleus {ECO:0000305}.; 
[DEVELOPMENTAL STAGE] Expressed during embryo development. {ECO:0000269|PubMed:11573014}.; 
[DISRUPTION PHENOTYPE] Pigmented seeds. Distorted seedlings with elongated hypocotyl and curled cotyledons. Presence of trichomes and accumulation of anthocyanins on cotyledons. Unusual pattern of storage product accumulation in seedlings. {ECO:0000269|PubMed:12244265}.; 
[SEQUENCE CAUTION]

### L1L

In [154]:
query_evalue(matches, "L1L")

([], [])

### ABI3

In [155]:
query_evalue(matches, "ABI3")

('Query= NP_001310634.1 B3 domain-containing transcription factor ABI3 [Ricinus communis]',
 ' Score = 170.6 bits (431),  Expect = 3.3e-45')

In [156]:
from Bio import Entrez
from Bio import SeqIO
f_t = ['NP_001310634.1'] 
Entrez.email = "gangulo@eafit.edu.co"

for prot in f_t:
    print()
    print(prot)
    proteina = Entrez.efetch(db="protein", id=prot, retmode="xml")
    proteinaXML = Entrez.read(proteina)[0]
    print("LOCUS:"+proteinaXML["GBSeq_locus"])
    print("SOURCE:"+proteinaXML["GBSeq_source"])
    info = proteinaXML["GBSeq_comment"].split("[")
    newl = []
    for e in info:
        newl.append("["+e)

for ne in newl:
    print(ne)


NP_001310634.1
LOCUS:NP_001310634
SOURCE:Ricinus communis (castor bean)
[PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from AB774162.1.; ; ##Evidence-Data-START## ; Transcript exon combination :: AB774162.1 
[ECO:0000332] ; RNAseq introns :: mixed/partial sample support SAMEA1034165, SAMEA1034166 
[ECO:0000350] ; ##Evidence-Data-END##


In [157]:
from Bio import Entrez
from Bio import SeqIO
f_t = [ABI3] 
Entrez.email = "gangulo@eafit.edu.co"

for prot in f_t:
    print()
    print(prot)
    proteina = Entrez.efetch(db="protein", id=prot, retmode="xml")
    proteinaXML = Entrez.read(proteina)[0]
    print("LOCUS:"+proteinaXML["GBSeq_locus"])
    print("SOURCE:"+proteinaXML["GBSeq_source"])
    info = proteinaXML["GBSeq_comment"].split("[")
    newl = []
    for e in info:
        newl.append("["+e)

for ne in newl:
    print(ne)


Q01593
LOCUS:ABI3_ARATH
SOURCE:Arabidopsis thaliana (thale cress)
[On or before May 26, 2009 this sequence version replaced gi:75097935, gi:122178491, gi:75218455, gi:75218562, gi:320551.; 
[FUNCTION] Participates in abscisic acid-regulated gene expression during seed development. Regulates the transcription of SGR1 and SGR2 that are involved in leaf and embryo degreening. {ECO:0000269|PubMed:19531597, ECO:0000269|PubMed:24043799}.; 
[SUBUNIT] Interacts (via C-terminus) with SPK1, SCAR3, ABI5, APRR1, AIP2, AIP3 and AIP4. Binds to BZIP10 and BZIP25 and forms complexes made of ABI3, BZIP53 and BZIP25 or BZIP10. {ECO:0000269|PubMed:10743655, ECO:0000269|PubMed:11489176, ECO:0000269|PubMed:12657652, ECO:0000269|PubMed:15998807, ECO:0000269|PubMed:17267444, ECO:0000269|PubMed:19531597}.; 
[INTERACTION] Q8RXD3:AIP2; NbExp=3; IntAct=EBI-1578892, EBI-2312425; Q9M4B5:AIP3; NbExp=2; IntAct=EBI-1578892, EBI-2130809.; 
[SUBCELLULAR LOCATION] Nucleus. Cytoplasm. Note=Predominantly found in the nuc

### FUS3

In [158]:
query_evalue(matches, "FUS3")

('Query= XP_015570693.1 B3 domain-containing transcription factor FUS3 isoform X2 [Ricinus communis]',
 ' Score = 213.8 bits (543),  Expect = 1.4e-58')

In [159]:
from Bio import Entrez
from Bio import SeqIO
f_t = ['XP_015570693.1'] 
Entrez.email = "gangulo@eafit.edu.co"

for prot in f_t:
    print()
    print(prot)
    proteina = Entrez.efetch(db="protein", id=prot, retmode="xml")
    proteinaXML = Entrez.read(proteina)[0]
    print("LOCUS:"+proteinaXML["GBSeq_locus"])
    print("SOURCE:"+proteinaXML["GBSeq_source"])
    info = proteinaXML["GBSeq_comment"].split("[")
    newl = []
    for e in info:
        newl.append("["+e)

for ne in newl:
    print(ne)


XP_015570693.1
LOCUS:XP_015570693
SOURCE:Ricinus communis (castor bean)
[MODEL REFSEQ: This record is predicted by automated computational analysis. This record is derived from a genomic sequence (NW_002994282.1) annotated using gene prediction method: Gnomon.~Also see:~ Documentation of NCBI's Annotation Process~; ; ##Genome-Annotation-Data-START## ; Annotation Provider :: NCBI ; Annotation Status :: Full annotation ; Annotation Name :: Ricinus communis Annotation Release 101 ; Annotation Version :: 101 ; Annotation Pipeline :: NCBI eukaryotic genome annotation pipeline ; Annotation Software Version :: 8.0 ; Annotation Method :: Best-placed RefSeq; Gnomon ; Features Annotated :: Gene; mRNA; CDS; ncRNA ; ##Genome-Annotation-Data-END##; COMPLETENESS: full length.


In [160]:
from Bio import Entrez
from Bio import SeqIO
f_t = [FUS3] 
Entrez.email = "gangulo@eafit.edu.co"

for prot in f_t:
    print()
    print(prot)
    proteina = Entrez.efetch(db="protein", id=prot, retmode="xml")
    proteinaXML = Entrez.read(proteina)[0]
    print("LOCUS:"+proteinaXML["GBSeq_locus"])
    print("SOURCE:"+proteinaXML["GBSeq_source"])
    info = proteinaXML["GBSeq_comment"].split("[")
    newl = []
    for e in info:
        newl.append("["+e)

for ne in newl:
    print(ne)


Q9LW31
LOCUS:FUS3_ARATH
SOURCE:Arabidopsis thaliana (thale cress)
[On or before May 26, 2009 this sequence version replaced gi:75100671, gi:75136697, gi:75201299, gi:75274430.; 
[FUNCTION] Transcription regulator involved in gene regulation during late embryogenesis. Its expression to the epidermis is sufficient to control foliar organ identity by regulating positively the synthesis abscisic acid (ABA) and negatively gibberellin production. Negatively regulates TTG1 in the embryo. Positively regulates the abundance of the ABI3 protein in the seed. Cooperates with KIN10 to regulate developmental phase transitions and lateral organ development and act both as positive regulators of abscisic acid (ABA) signaling during germination (PubMed:22026387, PubMed:22902692). {ECO:0000269|PubMed:14675433, ECO:0000269|PubMed:15363412, ECO:0000269|PubMed:22026387, ECO:0000269|PubMed:22902692}.; ACTIVITY REGULATION: Phosphorylation by KIN10 is required to positively regulates embryogenesis, seed yiel

### AGL15

In [161]:
query_evalue(matches, "AGL15")

('Query= XP_025015594.1 LOW QUALITY PROTEIN: agamous-like MADS-box protein AGL18 [Ricinus communis]',
 ' Score = 165.2 bits (417),  Expect = 4.8e-44')

In [162]:
from Bio import Entrez
from Bio import SeqIO
f_t = ['XP_025015594.1'] 
Entrez.email = "gangulo@eafit.edu.co"

for prot in f_t:
    print()
    print(prot)
    proteina = Entrez.efetch(db="protein", id=prot, retmode="xml")
    proteinaXML = Entrez.read(proteina)[0]
    print("LOCUS:"+proteinaXML["GBSeq_locus"])
    print("SOURCE:"+proteinaXML["GBSeq_source"])
    info = proteinaXML["GBSeq_comment"].split("[")
    newl = []
    for e in info:
        newl.append("["+e)

for ne in newl:
    print(ne)


XP_025015594.1
LOCUS:XP_025015594
SOURCE:Ricinus communis (castor bean)
[MODEL REFSEQ: This record is predicted by automated computational analysis. This record is derived from a genomic sequence (NW_002995023.1) annotated using gene prediction method: Gnomon.~Also see:~ Documentation of NCBI's Annotation Process~; ; ##Genome-Annotation-Data-START## ; Annotation Provider :: NCBI ; Annotation Status :: Full annotation ; Annotation Name :: Ricinus communis Annotation Release 101 ; Annotation Version :: 101 ; Annotation Pipeline :: NCBI eukaryotic genome annotation pipeline ; Annotation Software Version :: 8.0 ; Annotation Method :: Best-placed RefSeq; Gnomon ; Features Annotated :: Gene; mRNA; CDS; ncRNA ; ##Genome-Annotation-Data-END##; ; ##RefSeq-Attributes-START## ; internal stop codons :: corrected 1 genomic stop codon ; ##RefSeq-Attributes-END##; COMPLETENESS: full length.


In [163]:
from Bio import Entrez
from Bio import SeqIO
f_t = [AGL15] 
Entrez.email = "gangulo@eafit.edu.co"

for prot in f_t:
    print()
    print(prot)
    proteina = Entrez.efetch(db="protein", id=prot, retmode="xml")
    proteinaXML = Entrez.read(proteina)[0]
    print("LOCUS:"+proteinaXML["GBSeq_locus"])
    print("SOURCE:"+proteinaXML["GBSeq_source"])
    info = proteinaXML["GBSeq_comment"].split("[")
    newl = []
    for e in info:
        newl.append("["+e)

for ne in newl:
    print(ne)


Q38847
LOCUS:AGL15_ARATH
SOURCE:Arabidopsis thaliana (thale cress)
[On Apr 14, 2006 this sequence version replaced gi:2129535.; 
[FUNCTION] Transcription factor involved in the negative regulation of flowering, probably through the photoperiodic pathway. Acts as both an activator and a repressor of transcription. Binds DNA in a sequence-specific manner in large CArG motif 5'-CC (A/T)8 GG-3'. Participates probably in the regulation of programs active during the early stages of embryo development. Prevents premature perianth senescence and abscission, fruits development and seed desiccation. Stimulates the expression of at least DTA4, LEC2, FUS3, ABI3, AT4G38680/CSP2 and GRP2B/CSP4. Can enhance somatic embryo development in vitro. {ECO:0000269|PubMed:10318690, ECO:0000269|PubMed:10662856, ECO:0000269|PubMed:12226488, ECO:0000269|PubMed:12743119, ECO:0000269|PubMed:14615187, ECO:0000269|PubMed:15084721, ECO:0000269|PubMed:15686521, ECO:0000269|PubMed:17521410, ECO:0000269|PubMed:18305206

### PKL 

In [164]:
query_evalue(matches, "PKL")

('Query= XP_025015659.1 LOW QUALITY PROTEIN: DNA helicase INO80 [Ricinus communis]',
 ' Score = 237.7 bits (605),  Expect = 4.2e-65')

In [165]:
from Bio import Entrez
from Bio import SeqIO
f_t = ['XP_025015659.1'] 
Entrez.email = "gangulo@eafit.edu.co"

for prot in f_t:
    print()
    print(prot)
    proteina = Entrez.efetch(db="protein", id=prot, retmode="xml")
    proteinaXML = Entrez.read(proteina)[0]
    print("LOCUS:"+proteinaXML["GBSeq_locus"])
    print("SOURCE:"+proteinaXML["GBSeq_source"])
    info = proteinaXML["GBSeq_comment"].split("[")
    newl = []
    for e in info:
        newl.append("["+e)

for ne in newl:
    print(ne)


XP_025015659.1
LOCUS:XP_025015659
SOURCE:Ricinus communis (castor bean)
[MODEL REFSEQ: This record is predicted by automated computational analysis. This record is derived from a genomic sequence (NW_002995114.1) annotated using gene prediction method: Gnomon, supported by EST evidence.~Also see:~ Documentation of NCBI's Annotation Process~; ; ##Genome-Annotation-Data-START## ; Annotation Provider :: NCBI ; Annotation Status :: Full annotation ; Annotation Name :: Ricinus communis Annotation Release 101 ; Annotation Version :: 101 ; Annotation Pipeline :: NCBI eukaryotic genome annotation pipeline ; Annotation Software Version :: 8.0 ; Annotation Method :: Best-placed RefSeq; Gnomon ; Features Annotated :: Gene; mRNA; CDS; ncRNA ; ##Genome-Annotation-Data-END##; ; ##RefSeq-Attributes-START## ; frameshifts :: corrected 1 indel ; ##RefSeq-Attributes-END##; COMPLETENESS: full length.


In [166]:
from Bio import Entrez
from Bio import SeqIO
f_t = [PKL] 
Entrez.email = "gangulo@eafit.edu.co"

for prot in f_t:
    print()
    print(prot)
    proteina = Entrez.efetch(db="protein", id=prot, retmode="xml")
    proteinaXML = Entrez.read(proteina)[0]
    print("LOCUS:"+proteinaXML["GBSeq_locus"])
    print("SOURCE:"+proteinaXML["GBSeq_source"])
    info = proteinaXML["GBSeq_comment"].split("[")
    newl = []
    for e in info:
        newl.append("["+e)

for ne in newl:
    print(ne)


Q9S775
LOCUS:PKL_ARATH
SOURCE:Arabidopsis thaliana (thale cress)
[On or before Jun 15, 2007 this sequence version replaced gi:122242445, gi:25453560.; 
[FUNCTION] Chromatin remodeling factor that represses the expression of embryonic trait genes (such as NFYB9/LEC1) upon and after seed germination and thus enables the developmental switch to post-germinative growth. Silences some MADS-box proteins such as PHE1 and PHE2. Plays a role during carpel differentiation. Regulates late processes in cytokinin signaling. {ECO:0000269|PubMed:10535738, ECO:0000269|PubMed:10570159, ECO:0000269|PubMed:16359393, ECO:0000269|PubMed:21357580}.; 
[SUBUNIT] Interacts with TAF12B. {ECO:0000269|PubMed:21357580}.; 
[SUBCELLULAR LOCATION] Nucleus {ECO:0000255|PROSITE-ProRule:PRU00768, ECO:0000269|PubMed:16359393, ECO:0000269|PubMed:19245862}.; 
[TISSUE SPECIFICITY] Mostly expressed in tissue undergoing significant differentiation (meristems and primordia) such as young seedlings, influorescent tissue and yo

### VAL

In [167]:
query_evalue(matches, "VAL")

('Query= XP_025013310.1 LOW QUALITY PROTEIN: histone-lysine N-methyltransferase ASHH2 [Ricinus communis]',
 ' Score = 40.8 bits (94),  Expect = 9.1e-06')

In [168]:
from Bio import Entrez
from Bio import SeqIO
f_t = ['XP_025013310.1'] 
Entrez.email = "gangulo@eafit.edu.co"

for prot in f_t:
    print()
    print(prot)
    proteina = Entrez.efetch(db="protein", id=prot, retmode="xml")
    proteinaXML = Entrez.read(proteina)[0]
    print("LOCUS:"+proteinaXML["GBSeq_locus"])
    print("SOURCE:"+proteinaXML["GBSeq_source"])
    info = proteinaXML["GBSeq_comment"].split("[")
    newl = []
    for e in info:
        newl.append("["+e)

for ne in newl:
    print(ne)


XP_025013310.1
LOCUS:XP_025013310
SOURCE:Ricinus communis (castor bean)
[MODEL REFSEQ: This record is predicted by automated computational analysis. This record is derived from a genomic sequence (NW_002994357.1) annotated using gene prediction method: Gnomon.~Also see:~ Documentation of NCBI's Annotation Process~; ; ##Genome-Annotation-Data-START## ; Annotation Provider :: NCBI ; Annotation Status :: Full annotation ; Annotation Name :: Ricinus communis Annotation Release 101 ; Annotation Version :: 101 ; Annotation Pipeline :: NCBI eukaryotic genome annotation pipeline ; Annotation Software Version :: 8.0 ; Annotation Method :: Best-placed RefSeq; Gnomon ; Features Annotated :: Gene; mRNA; CDS; ncRNA ; ##Genome-Annotation-Data-END##; ; ##RefSeq-Attributes-START## ; ab initio :: 6% of CDS bases ; internal stop codons :: corrected 3 genomic stop codon ; ##RefSeq-Attributes-END##; COMPLETENESS: full length.


In [169]:
from Bio import Entrez
from Bio import SeqIO
f_t = [VAL] 
Entrez.email = "gangulo@eafit.edu.co"

for prot in f_t:
    print()
    print(prot)
    proteina = Entrez.efetch(db="protein", id=prot, retmode="xml")
    proteinaXML = Entrez.read(proteina)[0]
    print("LOCUS:"+proteinaXML["GBSeq_locus"])
    print("SOURCE:"+proteinaXML["GBSeq_source"])
    info = proteinaXML["GBSeq_comment"].split("[")
    newl = []
    for e in info:
        newl.append("["+e)

for ne in newl:
    print(ne)


Q8W4L5
LOCUS:VAL1_ARATH
SOURCE:Arabidopsis thaliana (thale cress)
[On May 26, 2009 this sequence version replaced gi:75097004.; 
[FUNCTION] Transcriptional repressor of gene expression involved in embryonic pathways, such as LEC1, ABI3, and FUS3. Repressor of the sugar-inducible genes involved in the seed maturation program in seedlings. Plays an essential role in regulating the transition from seed maturation to seedling growth. Functionally redundant with VAL2/HSL1. {ECO:0000269|PubMed:15894743, ECO:0000269|PubMed:17158584, ECO:0000269|PubMed:17267611}.; 
[SUBUNIT] Interacts with SNL1. {ECO:0000269|PubMed:19962994}.; 
[SUBCELLULAR LOCATION] Nucleus {ECO:0000255|PROSITE-ProRule:PRU00326, ECO:0000269|PubMed:15894743}.; 
[TISSUE SPECIFICITY] Expressed in flowers and at lower levels in roots, stems and leaves. {ECO:0000269|PubMed:15894743}.; 
[SEQUENCE CAUTION] Sequence=AAB63089.1; Type=Erroneous gene model prediction; Evidence={ECO:0000305}.


### NF-YA9

In [170]:
query_evalue(matches, "NF-YA9")

([], [])
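
Each `query_evalue` call in this section returns a pair of text lines. To compare the hits side by side, the sketch below extracts the accession and the E-value from those lines with a regular expression and sorts the hits from strongest to weakest. The helper name `resumir_hit` and the dictionary `hits` are illustrative assumptions. The strings are copied from the outputs shown above, and the three proteins that returned `([], [])` (LEC1, L1L and NF-YA9) are left out.

```python
import re

# (query line, score line) pairs copied from the query_evalue outputs above.
hits = {
    "LEC2": ("Query= NP_001310615.1 B3 domain-containing transcription factor LEC2 [Ricinus communis]",
             " Score = 198.7 bits (504),  Expect = 5.9e-54"),
    "ABI3": ("Query= NP_001310634.1 B3 domain-containing transcription factor ABI3 [Ricinus communis]",
             " Score = 170.6 bits (431),  Expect = 3.3e-45"),
    "FUS3": ("Query= XP_015570693.1 B3 domain-containing transcription factor FUS3 isoform X2 [Ricinus communis]",
             " Score = 213.8 bits (543),  Expect = 1.4e-58"),
    "AGL15": ("Query= XP_025015594.1 LOW QUALITY PROTEIN: agamous-like MADS-box protein AGL18 [Ricinus communis]",
              " Score = 165.2 bits (417),  Expect = 4.8e-44"),
    "PKL": ("Query= XP_025015659.1 LOW QUALITY PROTEIN: DNA helicase INO80 [Ricinus communis]",
            " Score = 237.7 bits (605),  Expect = 4.2e-65"),
    "VAL": ("Query= XP_025013310.1 LOW QUALITY PROTEIN: histone-lysine N-methyltransferase ASHH2 [Ricinus communis]",
            " Score = 40.8 bits (94),  Expect = 9.1e-06"),
}

def resumir_hit(query, score_line):
    """Return (accession, E-value as float) from the two report lines."""
    # The accession is the second whitespace-separated token after "Query=".
    accession = query.split()[1]
    # The E-value is the token that follows "Expect =".
    expect = re.search(r"Expect\s*=\s*(\S+)", score_line).group(1)
    return accession, float(expect)

resumen = {nombre: resumir_hit(*lineas) for nombre, lineas in hits.items()}

# Smaller E-values indicate stronger hits, so sort in ascending order.
orden = sorted(resumen, key=lambda nombre: resumen[nombre][1])
for nombre in orden:
    acc, e = resumen[nombre]
    print(f"{nombre:6s} {acc:16s} {e:.1e}")
```

Sorted this way, the weakest hit in this section is the one returned for VAL (E-value 9.1e-06), many orders of magnitude above the others.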

# Analysis and conclusions

<div class=text-justify>Of the 17 proteins analyzed, 14 had homologs in the castor bean (*R. communis*) proteome with low E-values: ANT, AIL1, AIL5, AIL6, PLT1, PLT2, PLT7, BBM, LEC2, ABI3, FUS3, AGL15, PKL and VAL. Most of these proteins are characterized in *A. thaliana* as regulators of embryogenesis, but the same is not true for castor bean, as can be seen by comparing the corresponding records in the NCBI database. Even so, the results invite further work to evaluate whether the castor bean homologs identified here have functions similar to those of their *A. thaliana* counterparts, and whether they follow similar patterns of regulation.<br />
    
<div class=text-justify>In particular, the gene encoding LEC2 was previously identified in castor bean, also through a BLAST search using the known *A. thaliana* sequence [5]. That study found that both proteins share a highly conserved central B3 domain and noted that this feature, together with similar regulation patterns, could suggest that the homologs have similar functions in seed development in both plant species. Much remains to be done in identifying homologous genes that could clarify the requirements for inducing somatic embryogenesis *in vitro*.
    

# References

1. FAO, FIDA, OMS, PMA, & UNICEF. (2017). El estado de la seguridad alimentaria y la nutrición en el mundo 2017. Fomentando la resiliencia en aras de la paz y la seguridad alimentaria. Roma, FAO.
2. Pernisová, M., Klíma, P., Horák, J., Válková, M., Malbeck, J., Souček, P., ... & Za, E. (2009). Cytokinins modulate auxin-induced organogenesis in plants via regulation of the auxin efflux. Proceedings of the National Academy of Sciences, 106(9), 3609-3614.
3. Us-Camas, R., Rivera-Solís, G., Duarte-Aké, F., & De-la-Pena, C. (2014). In vitro culture: an epigenetic challenge for plants. Plant Cell, Tissue and Organ Culture (PCTOC), 118(2), 187-201.
4. Horstman, A., Li, M., Heidmann, I., Weemen, M., Chen, B., Muino, J. M., ... & Boutilier, K. (2017). The BABY BOOM transcription factor activates the LEC1-ABI3-FUS3-LEC2 network to induce somatic embryogenesis. Plant physiology, pp-00232.
5. Kim, H. U., Jung, S. J., Lee, K. R., Kim, E. H., Lee, S. M., Roh, K. H., & Kim, J. B. (2014). Ectopic overexpression of castor bean LEAFY COTYLEDON2 (LEC2) in Arabidopsis triggers the expression of genes that encode regulators of seed maturation and oil body proteins in vegetative tissues. FEBS open bio, 4, 25-32.