<a href="https://colab.research.google.com/github/AlexanderPico/retrondb-notebooks/blob/main/importing-retrons.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Importing Retrons
This notebook shows how to import new retron data from CSV files into the retron database. These operations are distinct from modifying data associated with existing retron records (see `updating-retrons.ipynb`). Note the parameters that guard against database corruption, and change their default settings with caution.

### Connecting to the database
If you are not sure how to do this, review `getting-started.ipynb`.

In [1]:
import retrondb as rdb
dbr = rdb.connect_retronDB('sandbox') #this database is for demos and tutorials; it is not the actual database


Success: Connected to sandbox with 101 retrons



### Add retrons to an existing database
As long as the retron properties you are importing already exist in the database, you can leave the `new_property` parameter set to `False`, which protects against accepting typos and unconventional property names.

In [3]:
rdb.add_retrons_by_csv(dbr, "demo_files/import-data-New.csv")

Added retrons to the database.


Unnamed: 0,_id,node,ncrna,ensemble prediction,rtdna (sequencing values),rt/cladea,retron (sub)b,msr/msd familiyc,rt-dna production,bacterial editing,mammalian editing group
0,62f1bc0411a81641511e467e,4,TCCACAAAGATCGACGAAGCGCCTGGACCTGTAATAGTCGGTACCA...,(((((((((((((......(((((((((...)))))))))..(((....,0.002000257;0.002000257;0.002016815;0.00201681...,,,,84,74,1
1,62f1bc0411a81641511e467f,5,AGGAGTGGCACGTGGCCAGTTACGCATTGAAAGCCATACCCTAAAA...,,,,,,100,85,1
2,62f1bc0411a81641511e4680,6,GAACTCTCAAGTATTATCATCCCCGCTGCGACATCGTGGAGTAGGC...,,,,,,49,34,4


Note: the returned DataFrame includes all data associated with the imported retrons.

If the retrons you are importing include new properties (i.e., properties that no other retron in the current database has), then you will need to set `new_property=True`. 

First, let's see what happens if you don't change this setting...

In [4]:
rdb.add_retrons_by_csv(dbr, "demo_files/import-data-NewProp.csv")

UnrecognizedPropertyError: Failed to add or update retron. 
One or more unrecognized properties detected:

	new group

Double check your property names or consider setting new_property=True

Note: A custom `UnrecognizedPropertyError` is raised, listing the new properties detected in the import data. You can then inspect these property names and ask:
 * Did I make a typo?
 * Do I not want to import this column?
 * Do I want to force the database to accept a new property?
 
If you choose the last option, then here is the correct statement:

In [5]:
rdb.add_retrons_by_csv(dbr, "demo_files/import-data-NewProp.csv", new_property=True)

Added retrons to the database.


Unnamed: 0,_id,node,ncrna,ensemble prediction,rtdna (sequencing values),rt/cladea,retron (sub)b,msr/msd familiyc,rt-dna production,bacterial editing,mammalian editing group,new group
0,62f1bc0b11a81641511e4681,7,TCCACAAAGATCGACGAAGCGCCTGGACCTGTAATAGTCGGTACCA...,(((((((((((((......(((((((((...)))))))))..(((....,0.002000257;0.002000257;0.002016815;0.00201681...,,,,84,74,1,A
1,62f1bc0b11a81641511e4682,8,AGGAGTGGCACGTGGCCAGTTACGCATTGAAAGCCATACCCTAAAA...,,,,,,100,85,1,A
2,62f1bc0b11a81641511e4683,9,GAACTCTCAAGTATTATCATCCCCGCTGCGACATCGTGGAGTAGGC...,,,,,,49,34,4,B


Note: Scroll to the right to see the new property. Subsequent imports containing "new group" will no longer require `new_property=True`.
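The three questions above (typo, unwanted column, or genuinely new property) can be partly answered before running an import. The following is a minimal sketch using only the Python standard library, not part of `retrondb`. The `KNOWN_PROPERTIES` list and the `check_header` helper are hypothetical; the property names are copied from the output tables in this notebook, and in practice the list would need to come from the database itself.

```python
import csv
import difflib
import io

# Hypothetical list of property names already in the database
# (copied from the output tables in this notebook for illustration).
KNOWN_PROPERTIES = [
    "node", "ncrna", "ensemble prediction", "rt-dna production",
    "bacterial editing", "mammalian editing group",
]

def check_header(csv_text, known=KNOWN_PROPERTIES):
    """Map each unrecognized column name to a list of similar known names."""
    header = next(csv.reader(io.StringIO(csv_text)))
    report = {}
    for col in header:
        name = col.strip()
        if name and name not in known:
            # An empty suggestion list means nothing similar was found.
            report[name] = difflib.get_close_matches(name, known, n=3, cutoff=0.8)
    return report

# Made-up CSV content: one likely typo and one genuinely new column.
sample = "node,ncrna,bacterial editng,new group\n10,ACGT,50,A\n"
report = check_header(sample)
for name, suggestions in report.items():
    if suggestions:
        print(f"{name!r}: possible typo of {suggestions}")
    else:
        print(f"{name!r}: no close match; drop it or import with new_property=True")
```

A column with a close match is probably a typo to fix in the CSV. A column with no match is either one to drop or a candidate for `new_property=True`.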

Another safeguard is the requirement that each retron have a `node` property containing a unique integer (stored as a str in the database). This requirement protects against duplicate records and against adding, updating, or removing the wrong retrons.

Let's see what happens if you try to import a new retron with a pre-existing node ID...
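The `node` requirement can also be checked on the CSV itself before importing. The sketch below is an illustration using the standard library only. The `validate_nodes` helper is hypothetical, and the assumption that node IDs are non-negative integers is mine, not something `retrondb` is known to enforce.

```python
import csv
import io
from collections import Counter

def validate_nodes(csv_text):
    """Return (values that are not non-negative integers, values repeated in the file)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    nodes = [row["node"].strip() for row in reader]
    non_integer = [n for n in nodes if not n.isdigit()]
    repeated = sorted(n for n, count in Counter(nodes).items() if count > 1)
    return non_integer, repeated

# Made-up CSV content: "11a" is not an integer and "10" appears twice.
sample = "node,ncrna\n10,ACGT\n11a,GGCA\n10,TTAG\n"
non_integer, repeated = validate_nodes(sample)
print("Not integers:", non_integer)
print("Repeated within file:", repeated)
```

This only checks the file against itself; conflicts with node IDs already in the database are a separate matter.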

In [6]:
rdb.add_retrons_by_csv(dbr, "demo_files/import-data-NewProp.csv")

DuplicateKeyError: "7" A retron with this same node ID already exists in the database. Either change the node ID or consider using update_retrons_by_csv().



Note: A custom `DuplicateKeyError` is raised with advice about how to fix it. For example, perhaps you intended to update existing retrons rather than add new ones, in which case check out `updating-retrons.ipynb`.

Also note: CSV rows are added one at a time in the order provided. If some rows are successfully added before a duplicate node ID is encountered, those records remain in the database (they are not rolled back) and are returned as a DataFrame.
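Because a partial import is not undone, it can help to predict where an import would stop. The following sketch simulates the row-by-row behavior described above, assuming the import halts at the first conflicting node ID. The `split_at_first_duplicate` helper and the `existing` set are hypothetical; how to obtain the current node IDs from the database is not covered here.

```python
import csv
import io

def split_at_first_duplicate(csv_text, existing_nodes):
    """Simulate a row-by-row import that stops at the first duplicate node ID.

    Returns (node IDs that would be added, the conflicting node ID or None).
    """
    seen = set(existing_nodes)
    added = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        node = row["node"].strip()
        if node in seen:
            return added, node  # rows before this one would already be stored
        seen.add(node)
        added.append(node)
    return added, None

# Made-up data: node IDs already in the database, stored as strings.
existing = {"7", "8", "9"}
sample = "node,ncrna\n10,ACGT\n8,GGCA\n11,TTAG\n"
added, conflict = split_at_first_duplicate(sample, existing)
print("Would be added before the error:", added)
print("Conflicting node ID:", conflict)
```

If the check reports a conflict, fixing the CSV first avoids leaving the database with only some of the rows imported.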

### Cleanup
This chunk will remove all the retrons added to the sandbox above, so the sandbox is ready for the next user.

In [7]:
rdb.remove_retrons_by(dbr, "node", {"$in":["4","5","6","7","8","9"]})

Removed retrons from the database.


Unnamed: 0,_id,node,ncrna,ensemble prediction,rtdna (sequencing values),rt/cladea,retron (sub)b,msr/msd familiyc,rt-dna production,bacterial editing,mammalian editing group,new group
0,62f1bc0411a81641511e467e,4,TCCACAAAGATCGACGAAGCGCCTGGACCTGTAATAGTCGGTACCA...,(((((((((((((......(((((((((...)))))))))..(((....,0.002000257;0.002000257;0.002016815;0.00201681...,,,,84,74,1,
1,62f1bc0411a81641511e467f,5,AGGAGTGGCACGTGGCCAGTTACGCATTGAAAGCCATACCCTAAAA...,,,,,,100,85,1,
2,62f1bc0411a81641511e4680,6,GAACTCTCAAGTATTATCATCCCCGCTGCGACATCGTGGAGTAGGC...,,,,,,49,34,4,
3,62f1bc0b11a81641511e4681,7,TCCACAAAGATCGACGAAGCGCCTGGACCTGTAATAGTCGGTACCA...,(((((((((((((......(((((((((...)))))))))..(((....,0.002000257;0.002000257;0.002016815;0.00201681...,,,,84,74,1,A
4,62f1bc0b11a81641511e4682,8,AGGAGTGGCACGTGGCCAGTTACGCATTGAAAGCCATACCCTAAAA...,,,,,,100,85,1,A
5,62f1bc0b11a81641511e4683,9,GAACTCTCAAGTATTATCATCCCCGCTGCGACATCGTGGAGTAGGC...,,,,,,49,34,4,B
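The filter passed to `remove_retrons_by` above resembles a MongoDB-style `$in` operator; that reading is an assumption based on its syntax. Since node IDs are stored as strings, the values in the list must be strings too. The sketch below builds the same filter from a range and uses a toy `matches` function (hypothetical, not part of `retrondb`) to show why an integer would not match.

```python
# Node IDs are stored as strings, so convert each integer before building the filter.
node_ids = [str(n) for n in range(4, 10)]
query = {"$in": node_ids}

def matches(value, query):
    """Toy stand-in for `$in` matching: true if value equals one of the listed items."""
    return value in query["$in"]

print(query)
print(matches("7", query))  # string node ID, as stored in the database
print(matches(7, query))    # integer does not equal the string "7"
```

The resulting `query` has the same content as the literal dictionary used in the cleanup cell.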
