<a href="https://colab.research.google.com/github/AlexanderPico/retrondb-notebooks/blob/main/making-queries.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Making Queries
Examples of querying the retron database using the `retrondb` package.

### Connecting to the database
If you are not sure how to do this, review `getting-started.ipynb`.

In [2]:
import retrondb as rdb
dbr = rdb.connect_retronDB('sandbox') #this database is for demos and tutorials; it is not the actual database


Success: Connected to sandbox with 95 retrons



### Query by node ID
If you know the node IDs you are looking for, then this is the way to retrieve them.

In [29]:
# Simple node ID lookup. Only works for one node ID at a time.
rdb.get_retron(dbr,"1")

Unnamed: 0,_id,node,ncrna,ensemble prediction,rtdna (sequencing values),rt/cladea,retron (sub)b,msr/msd familiyc,rt-dna production,bacterial editing,mammalian editing group
0,62f1b857959907d83045e6a1,1,AATAATCTTACGCGGATAGAAATGTAATTATCGGTTGTTAGGAGAT...,,,1,I-A,IA/IIA1,57,25,1


To retrieve multiple retrons by node ID, pass a query dictionary that uses one of these comparison operators:
 * `$in` : in a list of values
 * `$nin` : not in a list of values
 * `$eq` : equal to a value
 * `$ne` : not equal to a value

In [30]:
node_list = ["1","28","64"]
rdb.get_retrons_by(dbr, "node", {"$in":node_list})

Unnamed: 0,_id,node,ncrna,ensemble prediction,rtdna (sequencing values),rt/cladea,retron (sub)b,msr/msd familiyc,rt-dna production,bacterial editing,mammalian editing group
0,62f1b857959907d83045e6a1,1,AATAATCTTACGCGGATAGAAATGTAATTATCGGTTGTTAGGAGAT...,,,1.0,I-A,IA/IIA1,57,25,1
1,62f1b857959907d83045e6a2,28,ACATACGGGGCGGGAACGCGGAATTGGACAACGTTATTTGACGTAC...,,,1.0,I-A,IA/IIA1,93,32,2
2,62f1b857959907d83045e6a3,64,CTTACAGACGGGCTGCCTAGGGGTCAACTGGACATAAGATCGGGGC...,((((((((.........((((((((((......)))).)))))).(...,0.00003;0.00003;0.00003;0.00006;0.00006;0.0000...,,,,26,78,2
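
To make the operator semantics concrete without a database connection, here is a minimal plain-Python sketch. It follows MongoDB's operator names, where not-equal is spelled `$ne`. The `records` list and the `matches` and `select` helpers are hypothetical stand-ins for illustration only; they are not part of `retrondb`, and the real filtering happens on the database side.

```python
# Plain-Python sketch of the four comparison operators (no database needed).
# `records`, `matches`, and `select` are hypothetical; they are not retrondb APIs.

records = [
    {"node": "1", "retron (sub)b": "I-A"},
    {"node": "28", "retron (sub)b": "I-A"},
    {"node": "64", "retron (sub)b": None},
]


def matches(value, condition):
    """Return True if `value` satisfies a single-operator condition dict."""
    op, operand = next(iter(condition.items()))
    if op == "$in":
        return value in operand
    if op == "$nin":
        return value not in operand
    if op == "$eq":
        return value == operand
    if op == "$ne":
        return value != operand
    raise ValueError(f"unsupported operator: {op}")


def select(rows, prop, condition):
    """Keep the rows whose `prop` value satisfies `condition`."""
    return [row for row in rows if matches(row.get(prop), condition)]


# Collect the node IDs that each query keeps.
in_nodes = [r["node"] for r in select(records, "node", {"$in": ["1", "64"]})]
nin_nodes = [r["node"] for r in select(records, "node", {"$nin": ["1", "64"]})]
eq_nodes = [r["node"] for r in select(records, "retron (sub)b", {"$eq": "I-A"})]
ne_nodes = [r["node"] for r in select(records, "retron (sub)b", {"$ne": "I-A"})]
```

Note that node IDs are compared as strings here, matching the quoted IDs used in the query above.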


### Query by any property
Alternatively, you can use the same syntax to query on any property in the database.

In [31]:
rdb.get_retrons_by(dbr, "retron (sub)b", {"$eq":"I-A"})

Unnamed: 0,_id,node,ncrna,ensemble prediction,rtdna (sequencing values),rt/cladea,retron (sub)b,msr/msd familiyc,rt-dna production,bacterial editing,mammalian editing group
0,62f1b857959907d83045e6a1,1,AATAATCTTACGCGGATAGAAATGTAATTATCGGTTGTTAGGAGAT...,,,1,I-A,IA/IIA1,57,25,1
1,62f1b857959907d83045e6a2,28,ACATACGGGGCGGGAACGCGGAATTGGACAACGTTATTTGACGTAC...,,,1,I-A,IA/IIA1,93,32,2


### Format query results
By default, results are returned as a pandas `DataFrame`. You can also request `"json"`, `"dict"`, or `"raw"` (a pymongo cursor) by passing the format as the fourth argument.

In [32]:
# Run the same query in each output format: raw, JSON, dict, and DataFrame (the default)
ret_raw=rdb.get_retrons_by(dbr, "retron (sub)b", {"$eq":"I-A"},"raw")
ret_json=rdb.get_retrons_by(dbr, "retron (sub)b", {"$eq":"I-A"},"json")
ret_dict=rdb.get_retrons_by(dbr, "retron (sub)b", {"$eq":"I-A"},"dict")
ret_df=rdb.get_retrons_by(dbr, "retron (sub)b", {"$eq":"I-A"})
print("Raw pymongo object: ")
print(ret_raw)
print("\nJSON String: ")
print(ret_json)
print("\nPython Dictionary: ")
print(ret_dict)
print("\nPandas DataFrame: ")
ret_df

Raw pymongo object: 
<pymongo.cursor.Cursor object at 0x7fc42989b400>

JSON String: 
[{"_id": {"$oid": "62f1b857959907d83045e6a1"}, "node": "1", "ncrna": "AATAATCTTACGCGGATAGAAATGTAATTATCGGTTGTTAGGAGATACAAGATGCTTCACCAACTTGTTTGGAGTAAGTGCTGTATCACGAAGTGCGCCGATCGATGTAGTCGGAAAAGCATAGGAAGAGAACCGGATTTGTAAATTCTTTTTAGTCTGCCACCGAACGGGAATATCGCGAACTTCTACAGGACAATGCCTAGCACGCTCTTGGAGGTAAGGCTCACAAAGGCTGTGGTTCTGAGGGCGCGGAGTATACCGGGAAACTCTACCCGGCA", "ensemble prediction": null, "rtdna (sequencing values)": null, "rt/cladea": "1", "retron (sub)b": "I-A", "msr/msd familiyc": "IA/IIA1", "rt-dna production": "57", "bacterial editing": "25", "mammalian editing group": "1"}, {"_id": {"$oid": "62f1b857959907d83045e6a2"}, "node": "28", "ncrna": "ACATACGGGGCGGGAACGCGGAATTGGACAACGTTATTTGACGTACCCTGCAGGGGAATTGTTCTTATGGGTGTGTATCGCGGCCCCGAACGGATAACCCCCGGGTTGTAGGTTCATAGCAGCCAAACGGTGTTCCGGATGTCCCATACACTGCTA", "ensemble prediction": null, "rtdna (sequencing values)": null, "rt/cladea": "1", "retron (sub)b": "I-A", "msr/

Unnamed: 0,_id,node,ncrna,ensemble prediction,rtdna (sequencing values),rt/cladea,retron (sub)b,msr/msd familiyc,rt-dna production,bacterial editing,mammalian editing group
0,62f1b857959907d83045e6a1,1,AATAATCTTACGCGGATAGAAATGTAATTATCGGTTGTTAGGAGAT...,,,1,I-A,IA/IIA1,57,25,1
1,62f1b857959907d83045e6a2,28,ACATACGGGGCGGGAACGCGGAATTGGACAACGTTATTTGACGTAC...,,,1,I-A,IA/IIA1,93,32,2
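
If you work with the JSON output outside of pandas, a small amount of post-processing is often useful. The sketch below uses only the standard library and assumes the JSON has the shape printed above: a list of objects whose `_id` is wrapped as `{"$oid": ...}` and whose numeric fields arrive as strings. The `ret_json` string here is a hypothetical, shortened stand-in (most columns are omitted), and `flatten_id` is an illustrative helper, not a `retrondb` function.

```python
import json

# Hypothetical, shortened stand-in for the string returned with the "json" format.
ret_json = """[
  {"_id": {"$oid": "62f1b857959907d83045e6a1"}, "node": "1",
   "retron (sub)b": "I-A", "rt-dna production": "57"},
  {"_id": {"$oid": "62f1b857959907d83045e6a2"}, "node": "28",
   "retron (sub)b": "I-A", "rt-dna production": "93"}
]"""

# Parse the JSON string into a list of Python dictionaries.
records = json.loads(ret_json)


def flatten_id(record):
    """Return a copy of `record` with the {"$oid": ...} wrapper replaced by the ID string."""
    flat = dict(record)
    flat["_id"] = record["_id"]["$oid"]
    return flat


flat_records = [flatten_id(r) for r in records]

# Index the records by node ID for quick lookup.
by_node = {r["node"]: r for r in flat_records}

# Numeric fields are strings in the JSON, so convert them before doing arithmetic.
production = {node: int(r["rt-dna production"]) for node, r in by_node.items()}
```

Keying the records by `node` mirrors the node ID lookups shown earlier, and converting to `int` avoids accidental string comparisons when ranking retrons by a numeric column.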
