<a href="https://colab.research.google.com/github/AlexanderPico/retrondb-notebooks/blob/main/getting-started.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Getting Started
Welcome to your first Python notebook for the Retron Database. This notebook will help you establish a connection to the database and teach you how to perform basic transactions.

### Basic installation
If you haven't already, install the following packages, restart the kernel (Kernel > Restart), and proceed to the next section.

In [None]:
!pip install pymongo
!pip install dnspython
!pip install python-dotenv
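
After restarting the kernel, you may want to confirm that the installs worked. The sketch below is not part of `retrondb`; it only uses the standard library to check whether each package's module can be found. Note that `dnspython` is imported as `dns` and `python-dotenv` as `dotenv`. The helper name `missing_packages` is made up for this example.

```python
from importlib.util import find_spec

# pip package name -> module name used in `import` statements
REQUIRED = {
    "pymongo": "pymongo",
    "dnspython": "dns",
    "python-dotenv": "dotenv",
}

def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if find_spec(module) is None]

missing = missing_packages()
if missing:
    print("Still missing:", ", ".join(missing))
else:
    print("All required packages are installed.")
```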

### Connecting to the database
A username and password are required to access the database. A team member should provide these either as a .env file that you place in the same folder as this notebook, or as secret values that you enter manually when prompted below.

_Note: we are connecting to the "sandbox" database in this notebook, so you don't have to worry about messing up the real data :). In non-tutorial notebooks, the default parameter is used to connect to "retronDB."_
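
For readers curious how a ".env file or prompt" lookup can work, here is a minimal sketch. It is an assumption about the general pattern, not the actual code inside `retrondb`. The helper `get_secret` and the variable names `RETRONDB_USER` and `RETRONDB_PASSWORD` are placeholders; the real keys in your .env file may differ.

```python
import os
from getpass import getpass

try:
    # python-dotenv copies KEY=value pairs from a .env file into os.environ
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass  # without python-dotenv, only existing environment variables are used

def get_secret(name, prompt=getpass):
    """Return the environment variable `name`, prompting if it is unset."""
    value = os.environ.get(name)
    if not value:
        # getpass hides the typed value so it does not appear in the notebook
        value = prompt(f"Enter {name}: ")
    return value
```

Calling `get_secret("RETRONDB_USER")` would return the value from the .env file if one was loaded, and otherwise ask for it interactively.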

In [1]:
import retrondb as rdb
db_retrons = rdb.connect_retronDB('sandbox') #this database is for demos and tutorials; it is not the actual database


Success: Connected to sandbox with 96 retrons



### Retrieving retrons from the database 

In [13]:
# Get all retrons in database as a Pandas DataFrame
rdb.get_all_retrons(db_retrons)

Unnamed: 0,_id,node,ncRNA,ensemble prediction,RTDNA (sequencing values),RT/Cladea,Retron (sub)b,msr/msd familiyc,RT-DNA production,Bacterial Editing,Mammalian Editing Group
0,62f01cb5959907d83045df37,1,GAGTAACAGTCGGTTAGCTTCCTTCATGGGCCAGTCATGGCGAGTT...,,,1,I-A,IA/IIA1,57,25,1
39,62f01cb5959907d83045df5e,1000,GATGGTTTATGGGCAGATTTTTGGTCGCGTTTCGCGACCTGTGAAA...,,,,,,55,65,2
40,62f01cb5959907d83045df5f,1036,ACGCATGTAGGCAGATTTGTTGGTTGTGAATCGCAACCAGTGGCCT...,(((((((((((((......(((((((((...)))))))))..(((....,0.002000257;0.002000257;0.002016815;0.00201681...,,,,84,74,1
41,62f01cb5959907d83045df60,1100,TTGTGCACGGTTAGATTTTTTACTCATGTTTCGTGAGTATTGACAA...,,,,,,100,85,1
42,62f01cb5959907d83045df61,1122,ACGGGGTCTCCCCCTCGCATACGCGCTGTAGAAACTGCGGCGTAAG...,,,,,,49,34,4
...,...,...,...,...,...,...,...,...,...,...,...
34,62f01cb5959907d83045df59,855,TGACCTGGGGTTCCCCCCGGAACCTTAGCTACTTTGGTTTCGCTTG...,,,,,,91,71,4
35,62f01cb5959907d83045df5a,888,GAACTCCGTAGCAACTCGTCACAGAACGCCCGATTTTATCAATGTA...,,,,,,87,79,3
36,62f01cb5959907d83045df5b,915,AGGCTTGCGGATAGATTCTGCTATTGTGTTTCGCGATAGTTGATAC...,,,,,,17,85,4
37,62f01cb5959907d83045df5c,939,GCTTACAGGCAGATTTATTATCGTGTTTCGCGATATATGATCCAAA...,,,,,,29,18,1
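
The `RTDNA (sequencing values)` column appears to hold numbers joined by semicolons in a single string. If you need them as numbers, something like the following sketch could help. It assumes the full (untruncated) cell is a semicolon-delimited string and that empty cells come back as non-string values such as NaN; `parse_sequencing_values` is a hypothetical helper, not a `retrondb` function.

```python
def parse_sequencing_values(raw):
    """Split a semicolon-delimited string of numbers into a list of floats."""
    if not isinstance(raw, str):
        return []  # empty DataFrame cells are usually NaN, not strings
    return [float(part) for part in raw.split(";") if part.strip()]

values = parse_sequencing_values("0.002000257;0.002000257;0.002016815")
print(len(values), max(values))
```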


### Add a new retron to the database
You can manually add records one at a time as dictionaries (below), or in batches from CSV files (see `importing-retrons.ipynb`).

In [7]:
retron0 = {
        "node":"0",
        "ncRNA":"ACTATAAACGCACAGAACCAGACGCATGGCTGAGATGTCTATTATGTGCGAGGGAACCCAATCTTCCTGCACCAGCTAGACGTTACGCGCCGGCCGCAGCGTGAACCTACGAACCATATAAGAGTGCAAAACCAATGAACCCTTACCCTAAGATACCCGTGATCTTTTCAAAAGCACACCTAATTACCTATACTAAAATCACTTCCC",
        "ensemble prediction":"((((((((.........((((((((((......)))).)))))).(((.((((...)))).)))..(((((....))))).......((((((((((((((((((((((.((((((((.....)))))))).)))))))))))))))))))))).....))))))))",
        "RT-DNA production":"94",
        "Bacterial Editing":"93",
        "Mammalian Editing Group":"1"
        }
rdb.add_retron(db_retrons, retron0)

Added retron to the database.


Unnamed: 0,_id,node,ncRNA,ensemble prediction,RT-DNA production,Bacterial Editing,Mammalian Editing Group
0,62f046a8f6b57a67a7fefa83,0,ACTATAAACGCACAGAACCAGACGCATGGCTGAGATGTCTATTATG...,((((((((.........((((((((((......)))).)))))).(...,94,93,1
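
Before adding a record by hand, it can be worth checking it for simple mistakes. The sketch below is one possible pre-check, not something `rdb.add_retron()` is known to do. It assumes that `node` is required, that `ncRNA` should contain only nucleotide letters, and that `ensemble prediction` is a dot-bracket string whose parentheses should balance. `validate_retron` is a made-up name.

```python
def validate_retron(record):
    """Return a list of problems found in a retron record (empty if none)."""
    problems = []
    if not record.get("node"):
        problems.append("missing 'node'")

    # Flag any character that is not a DNA/RNA nucleotide letter
    bad = set(record.get("ncRNA", "").upper()) - set("ACGTU")
    if bad:
        problems.append(f"unexpected characters in ncRNA: {sorted(bad)}")

    # Track nesting depth; it must never go negative and must end at zero
    depth = 0
    for ch in record.get("ensemble prediction", ""):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                break
    if depth != 0:
        problems.append("unbalanced parentheses in ensemble prediction")
    return problems
```

An empty list means none of these checks failed, so the record could then be passed to `rdb.add_retron()`.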


### Updating retron records
You can also update or add new properties to a retron already in the database. See `updating-retrons.ipynb` for more examples.

In [8]:
retron0update = {
        "node":"0",
        "Mammalian Editing Group":"2"
        }
rdb.update_retron(db_retrons, retron0update)

Updated retron in the database.


Unnamed: 0,_id,node,ncRNA,ensemble prediction,RT-DNA production,Bacterial Editing,Mammalian Editing Group
0,62f046a8f6b57a67a7fefa83,0,ACTATAAACGCACAGAACCAGACGCATGGCTGAGATGTCTATTATG...,((((((((.........((((((((((......)))).)))))).(...,94,93,2
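
In the update dictionary above, `node` identifies the record and the remaining keys are the values to change. As a rough illustration of how such a dictionary could map onto a MongoDB update, consider the sketch below. This is an assumption about the general approach, not the actual implementation of `rdb.update_retron()`; `build_update` is a hypothetical helper.

```python
def build_update(record, key="node"):
    """Split a record into a MongoDB filter and a `$set` update document."""
    if key not in record:
        raise ValueError(f"record must include '{key}' to identify the retron")
    query = {key: record[key]}
    changes = {k: v for k, v in record.items() if k != key}
    return query, {"$set": changes}

query, update = build_update({"node": "0", "Mammalian Editing Group": "2"})
print(query)
print(update)
# With a pymongo collection, this pair could be passed as
# collection.update_one(query, update)
```

Because `$set` only touches the listed fields, properties that are not in the dictionary (such as `ncRNA`) stay unchanged, which matches the output shown above.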


### Query retrons by properties 
You can retrieve retrons from the database by node ID, using `get_retron()`, or by any property, using `get_retrons_by()`. See `making-queries.ipynb` for more examples.

In [9]:
rdb.get_retrons_by(db_retrons,'Mammalian Editing Group', '2')

Unnamed: 0,_id,node,ncRNA,ensemble prediction,RTDNA (sequencing values),RT/Cladea,Retron (sub)b,msr/msd familiyc,RT-DNA production,Bacterial Editing,Mammalian Editing Group
0,62f01cb5959907d83045df38,28,TAGTTTGTCTTTTAGCGAATGAGGCATTTATGCCTAGTCGGGTGTT...,,,1.0,I-A,IA/IIA1,93,32,2
1,62f01cb5959907d83045df39,64,GCTCTTTAGCGTTTTATGGATTTACCACCTGATTGGTCAAATCTAG...,((((((((.........((((((((((......)))).)))))).(...,0.00003;0.00003;0.00003;0.00006;0.00006;0.0000...,,,,26,78,2
2,62f01cb5959907d83045df3a,116,ACTCTTTAGCGTTAGGCTTTGATTTATAGCCTTGTCGAGCGTTTCG...,((((((((.....(((((.........)))))((((..(((...))...,0.04947541;0.050365995;0.051252061;0.055244473...,,,,2,38,2
3,62f01cb5959907d83045df44,400,TTGGACATCAGTCATTCGCTCAGATTCATGAGAGAGTTAGACCCTA...,,,,,,40,25,2
4,62f01cb5959907d83045df48,486,TTTATAAAGTAACTTTGCGATAAGCTAAATTTTTCTTCATTGCATT...,,,,,,51,3,2
5,62f01cb5959907d83045df4c,613,TGATTGTAACACAGGAGAAGAAGATAAAAAATTTGGCAAAATGGAT...,,,,,,18,49,2
6,62f01cb5959907d83045df4e,749,GCTTCTTCTTCGATAGAAGCTGGAGGGCTCAAATGAGCTGACGCAT...,,,,,,38,68,2
7,62f01cb5959907d83045df4f,783,GAGAAGCTGATCAGCCCATGGTGAAGTTCAGGGCTACTTATGCTAG...,,,,,,95,51,2
8,62f01cb5959907d83045df55,816,AAGAGAACAACTAGAATGAGGTGATTCACCTCCTTGTTTAACGGCA...,,,,,,54,71,2
9,62f01cb5959907d83045df57,842,GGTAGTGGCGTTCACGAGGGTGTGTATCATACCCATTTGTGAAGGT...,,,,,,1,57,2
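
Note that the query above passes the value as the string `'2'`, matching how the records in this notebook were added. If values are stored as strings (an assumption based on those examples), an exact-match query with the integer `2` would find nothing. The toy sketch below shows the effect with plain dictionaries; `filter_by` and the three sample records are illustrative only and do not touch the database.

```python
def filter_by(records, prop, value):
    """Return the records whose `prop` equals `value` exactly."""
    return [r for r in records if r.get(prop) == value]

# Illustrative records only; property values are strings
records = [
    {"node": "28", "Mammalian Editing Group": "2"},
    {"node": "1", "Mammalian Editing Group": "1"},
    {"node": "64", "Mammalian Editing Group": "2"},
]

as_string = filter_by(records, "Mammalian Editing Group", "2")
as_integer = filter_by(records, "Mammalian Editing Group", 2)
print(len(as_string), len(as_integer))
```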


### Remove retrons
Similarly, you can remove retrons either by their node ID or by property matches.

In [10]:
rdb.remove_retron(db_retrons, "0")

Removed a retron from the database.


Unnamed: 0,_id,node,ncRNA,ensemble prediction,RT-DNA production,Bacterial Editing,Mammalian Editing Group
0,62f046a8f6b57a67a7fefa83,0,ACTATAAACGCACAGAACCAGACGCATGGCTGAGATGTCTATTATG...,((((((((.........((((((((((......)))).)))))).(...,94,93,2
