# Writing data to files and databases

Review [`Reading data from files`](Reading_data_from_files.ipynb) and [`Practice functions`](Practice_functions.ipynb) before starting this notebook.

## Writing files

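It's just as easy to write a text file in Python as it is to read one. Here we open the file in append mode (`'a'`), which adds to the end of the file and creates it if it doesn't exist yet: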

In [41]:
with open('myfile.txt', 'a') as f:
    f.write('This is a new line!\n')
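
The difference between the `'a'` (append) and `'w'` (write) modes is easy to trip over, so here is a minimal sketch of it. The file name `modes.txt` is made up for illustration, and the file lives in a temporary directory so nothing in your working folder is touched.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are affected.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'modes.txt')  # hypothetical file name

    # 'w' creates the file, or empties it if it already exists.
    with open(path, 'w') as f:
        f.write('first\n')

    # 'a' keeps the existing contents and adds to the end.
    with open(path, 'a') as f:
        f.write('second\n')

    with open(path) as f:
        after_append = f.read()

    # Opening with 'w' again discards both earlier lines.
    with open(path, 'w') as f:
        f.write('third\n')

    with open(path) as f:
        after_write = f.read()

print(repr(after_append))  # 'first\nsecond\n'
print(repr(after_write))   # 'third\n'
```

So running an append cell several times keeps growing the file, whereas a `'w'` cell always starts from scratch.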

We can use Python to define the file names and to write the contents of the file.

In [10]:
for n in range(10):
    with open(f'myfile{n}.txt', 'w') as f:
        f.write(f'This is file number {n}')

## Get some data to write to a file

Here's the bare bones tops-reading function from `Reading_data_from_files.ipynb`.

In [2]:
def read_tops(fname):
    with open(fname, 'r') as f:
        lines = f.readlines()
    tops = {}
    for line in lines:
        if line.startswith('#'):
            continue
        top, depth = line.split(',')
        top = top.title()
        tops[top] = float(depth)
    return tops

In [3]:
tops = read_tops('../data/L-30_tops.txt')

In [4]:
tops

{'Wyandot Fm': 867.156,
 'Dawson Canyon Fm': 984.50402,
 'Logan Canyon Fm': 1136.904,
 'Upper Missisauga Fm': 2251.2529,
 'Lower Missisauga Fm': 3190.6464,
 'Abenaki Fm': 3404.3112,
 'Mid Baccaro': 3485.0832,
 'Lower Baccaro': 3964.5337,
 'Base O-Marker': 2469.207,
 'Td': 4268.0,
 'Pay_Sand_1-Rft': 2478.0,
 'Pay_Sand_2': 2499.0,
 'Pay_Sand_3': 2543.0,
 'Pay_Sand_4': 2637.0,
 'Sand_5': 2699.0,
 'Sand_6': 2795.0,
 'Sand_7': 2835.0}

### Exercise

Can you write a function to write the cleaned `tops` dictionary to a text file?

In [5]:
# This doesn't work
with open('tops.txt', 'w') as f:
    f.write(tops)

TypeError: write() argument must be str, not dict

In [9]:
# This runs, but the result is Python's repr of the dict (single quotes), not valid JSON
with open('tops.txt', 'w') as f:
    f.write(str(tops))
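
To see why the `str()` output is awkward to read back, here is a small sketch using two entries from `tops`. It only illustrates the quoting problem: `json.loads` rejects the single-quoted text, while `ast.literal_eval` can parse it because it is a Python literal.

```python
import ast
import json

tops_subset = {'Wyandot Fm': 867.156, 'Td': 4268.0}  # two entries from the dict above

text = str(tops_subset)

# json.loads rejects the single-quoted keys that str() produces.
try:
    json.loads(text)
    parsed_as_json = True
except json.JSONDecodeError:
    parsed_as_json = False

# ast.literal_eval safely evaluates Python literals, so it can read the text back.
recovered = ast.literal_eval(text)

print(parsed_as_json)            # False
print(recovered == tops_subset)  # True
```

The file is therefore only readable by Python, which is one reason to prefer CSV or JSON.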

In [18]:
# Without turning to JSON (yet), the safest format would be CSV, but it's
# fiddly to write by hand. pandas makes it easy.
import pandas as pd
df = pd.DataFrame.from_dict(tops, orient='index')
df.to_csv('../data/tops.csv', header=False)
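
For a simple two-column table like this one, the standard library's `csv` module may also be enough. The following is a sketch under that assumption, using two entries from `tops` and a temporary file rather than the `../data` path used above.

```python
import csv
import os
import tempfile

tops_subset = {'Wyandot Fm': 867.156, 'Dawson Canyon Fm': 984.50402}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'tops.csv')

    # newline='' lets the csv module control line endings itself.
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerows(tops_subset.items())  # one (name, depth) row per top

    # Read the rows back, converting the depth text to float.
    with open(path, newline='') as f:
        recovered = {name: float(depth) for name, depth in csv.reader(f)}

print(recovered == tops_subset)  # True
```

The `csv` module takes care of quoting, so a formation name containing a comma would still round-trip.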

## JSON

There are lots of ways to save data to a file (pickling, for example). A good one to know about is [JavaScript Object Notation, or JSON](https://en.wikipedia.org/wiki/JSON): it is widely used in web applications, and it is how we interact with many NoSQL data stores.

In [8]:
import json

with open('tops.txt', 'w') as f:
    f.write(json.dumps(tops, indent=4))

In [29]:
json?

In [11]:
with open('tops.txt', 'w') as f:
    json.dump(tops, f, indent=4)

## Reading the files we wrote

In [24]:
with open('tops.txt', 'r') as f:
    tops_f = f.read()

In [25]:
tops_f

'{\n    "Wyandot Fm": 867.156,\n    "Dawson Canyon Fm": 984.50402,\n    "Logan Canyon Fm": 1136.904,\n    "Upper Missisauga Fm": 2251.2529,\n    "Lower Missisauga Fm": 3190.6464,\n    "Abenaki Fm": 3404.3112,\n    "Mid Baccaro": 3485.0832,\n    "Lower Baccaro": 3964.5337,\n    "Base O-Marker": 2469.207,\n    "Td": 4268.0,\n    "Pay_Sand_1-Rft": 2478.0,\n    "Pay_Sand_2": 2499.0,\n    "Pay_Sand_3": 2543.0,\n    "Pay_Sand_4": 2637.0,\n    "Sand_5": 2699.0,\n    "Sand_6": 2795.0,\n    "Sand_7": 2835.0\n}'

In [12]:
with open('tops.txt', 'r') as f:
    tops_f = json.load(f)

In [13]:
tops_f

{'Abenaki Fm': 3404.3112,
 'Base O-Marker': 2469.207,
 'Dawson Canyon Fm': 984.50402,
 'Logan Canyon Fm': 1136.904,
 'Lower Baccaro': 3964.5337,
 'Lower Missisauga Fm': 3190.6464,
 'Mid Baccaro': 3485.0832,
 'Pay_Sand_1-Rft': 2478.0,
 'Pay_Sand_2': 2499.0,
 'Pay_Sand_3': 2543.0,
 'Pay_Sand_4': 2637.0,
 'Sand_5': 2699.0,
 'Sand_6': 2795.0,
 'Sand_7': 2835.0,
 'Td': 4268.0,
 'Upper Missisauga Fm': 2251.2529,
 'Wyandot Fm': 867.156}

We got our tops back!
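
One caveat is worth knowing before relying on the round trip. The sketch below uses the string-based `json.dumps` and `json.loads`, which mirror the file-based `json.dump` and `json.load`, and a made-up integer-keyed dictionary to show that JSON object keys always come back as strings.

```python
import json

tops_subset = {'Wyandot Fm': 867.156, 'Td': 4268.0}

# dumps/loads work on strings; dump/load work on open file objects.
text = json.dumps(tops_subset)
round_trip = json.loads(text)
print(round_trip == tops_subset)  # True

# Caveat: JSON object keys are always strings, so non-string keys change type.
by_rank = {1: 'Wyandot Fm', 2: 'Dawson Canyon Fm'}  # hypothetical integer-keyed dict
restored = json.loads(json.dumps(by_rank))
print(restored)  # {'1': 'Wyandot Fm', '2': 'Dawson Canyon Fm'}
```

Our `tops` dictionary has string keys and float values, so it survives unchanged.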

## Writing to and reading from a SQL database

In [12]:
!rm -f tops.db

In [16]:
import sqlite3

with sqlite3.connect('tops.db') as conn:

    cur = conn.cursor()
    cur.execute("CREATE TABLE strat(formation TEXT, depth DECIMAL, age INT)")

    # Use ? placeholders rather than string formatting, so that names
    # containing quotes are handled safely. We don't know the ages, so store 0.
    for name, depth in tops.items():
        cur.execute("INSERT INTO strat VALUES (?, ?, ?)", (name, depth, 0))

In [14]:
# CREATE TABLE fails if the table already exists.
# To avoid the error, use CREATE TABLE IF NOT EXISTS.
# If you need to delete the table at some point:
# with sqlite3.connect('tops.db') as conn:
    
#     cur = conn.cursor()    
#     cur.execute("DROP TABLE IF EXISTS strat")
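
When inserting values, `sqlite3` supports `?` placeholders, which leave the quoting to the driver. Here is a minimal sketch combining them with `CREATE TABLE IF NOT EXISTS`. It uses an in-memory database, and the second formation name is made up to include an apostrophe.

```python
import sqlite3

rows = [
    ("Wyandot Fm", 867.156, 0),
    ("O'Marker", 2469.207, 0),  # hypothetical name containing an apostrophe
]

conn = sqlite3.connect(':memory:')  # throwaway database, nothing written to disk
cur = conn.cursor()

create = "CREATE TABLE IF NOT EXISTS strat(formation TEXT, depth DECIMAL, age INT)"
cur.execute(create)
cur.execute(create)  # running it again is harmless thanks to IF NOT EXISTS

# executemany binds each tuple to the ? placeholders in turn.
cur.executemany("INSERT INTO strat VALUES (?, ?, ?)", rows)
conn.commit()

stored = cur.execute(
    "SELECT formation, depth, age FROM strat ORDER BY rowid"
).fetchall()
conn.close()

print(stored == rows)  # True
```

Building the same statement with string formatting would produce invalid SQL for the apostrophe, and more generally leaves the door open to SQL injection.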

In [17]:
with sqlite3.connect('tops.db') as conn:    
    
    cur = conn.cursor()    
    cur.execute("SELECT * FROM strat")

    rows = cur.fetchall()

    for row in rows:
        print(row)

('Wyandot Fm', 867.156, 0)
('Dawson Canyon Fm', 984.50402, 0)
('Logan Canyon Fm', 1136.904, 0)
('Upper Missisauga Fm', 2251.2529, 0)
('Lower Missisauga Fm', 3190.6464, 0)
('Abenaki Fm', 3404.3112, 0)
('Mid Baccaro', 3485.0832, 0)
('Lower Baccaro', 3964.5337, 0)
('Base O-Marker', 2469.207, 0)
('Td', 4268, 0)
('Pay_Sand_1-Rft', 2478, 0)
('Pay_Sand_2', 2499, 0)
('Pay_Sand_3', 2543, 0)
('Pay_Sand_4', 2637, 0)
('Sand_5', 2699, 0)
('Sand_6', 2795, 0)
('Sand_7', 2835, 0)
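
`SELECT *` returns everything, but most queries filter. Here is a sketch of a filtered query using an in-memory copy of three of the rows. Setting `row_factory` to `sqlite3.Row` is optional and just lets us index columns by name.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row  # rows can then be indexed by column name
conn.execute("CREATE TABLE strat(formation TEXT, depth DECIMAL, age INT)")
conn.executemany(
    "INSERT INTO strat VALUES (?, ?, ?)",
    [('Wyandot Fm', 867.156, 0), ('Abenaki Fm', 3404.3112, 0), ('Td', 4268.0, 0)],
)

# Bind the threshold with a placeholder instead of formatting it into the SQL.
min_depth = 3000
cur = conn.execute(
    "SELECT formation, depth FROM strat WHERE depth > ? ORDER BY depth",
    (min_depth,),
)
deep = [(row['formation'], row['depth']) for row in cur]
conn.close()

print([name for name, _ in deep])  # ['Abenaki Fm', 'Td']
```

Whole-number depths come back as integers (`4268` rather than `4268.0`, as in the output above). This appears to be because the declared `DECIMAL` type gives the column NUMERIC affinity in SQLite, which stores such values as integers.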


## Reading from the database to `pandas`

In [18]:
import pandas as pd

conn = sqlite3.connect('tops.db')

df = pd.read_sql("SELECT * FROM strat", conn)

In [19]:
df

| | formation | depth | age |
|---|---|---|---|
| 0 | Wyandot Fm | 867.156 | 0 |
| 1 | Dawson Canyon Fm | 984.50402 | 0 |
| 2 | Logan Canyon Fm | 1136.904 | 0 |
| 3 | Upper Missisauga Fm | 2251.2529 | 0 |
| 4 | Lower Missisauga Fm | 3190.6464 | 0 |
| 5 | Abenaki Fm | 3404.3112 | 0 |
| 6 | Mid Baccaro | 3485.0832 | 0 |
| 7 | Lower Baccaro | 3964.5337 | 0 |
| 8 | Base O-Marker | 2469.207 | 0 |
| 9 | Td | 4268.0 | 0 |


## Writing to and reading from a NoSQL key-value store

We'll use [TinyDB](https://github.com/msiemens/tinydb), but there are lots of options out there. You can install TinyDB with `conda`.

In-memory stores, like Redis and memcached, work in a similar way.

In [26]:
tops = [{'name': k, 'depth': v} for k, v in tops.items()]

In [28]:
from tinydb import TinyDB, Query

db = TinyDB('db.json')

for doc in tops:
    db.insert(doc)

In [29]:
db

<TinyDB tables=['_default'], tables_count=1, default_table_documents_count=17, all_tables_documents_count=['_default=17']>

In [35]:
top = Query()

db.search(top.name == 'Abenaki Fm')

[{'name': 'Abenaki Fm', 'depth': 3404.3112}]

In [36]:
db.search(top.depth > 3000)

[{'name': 'Lower Missisauga Fm', 'depth': 3190.6464},
 {'name': 'Abenaki Fm', 'depth': 3404.3112},
 {'name': 'Mid Baccaro', 'depth': 3485.0832},
 {'name': 'Lower Baccaro', 'depth': 3964.5337},
 {'name': 'Td', 'depth': 4268.0}]

In [40]:
db.search((top.name == 'Abenaki Fm') & (top.depth < 3000))

[]
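
Conceptually, each search keeps the documents that satisfy a condition. The sketch below mimics that with plain Python lists and a hypothetical `search` helper. It is not how TinyDB is implemented, just a way to read the queries above.

```python
docs = [
    {'name': 'Abenaki Fm', 'depth': 3404.3112},
    {'name': 'Td', 'depth': 4268.0},
    {'name': 'Wyandot Fm', 'depth': 867.156},
]

def search(documents, predicate):
    """Return the documents for which predicate(doc) is true (hypothetical helper)."""
    return [doc for doc in documents if predicate(doc)]

# Like db.search(top.depth > 3000)
deep = search(docs, lambda d: d['depth'] > 3000)

# Like db.search((top.name == 'Abenaki Fm') & (top.depth < 3000))
both = search(docs, lambda d: d['name'] == 'Abenaki Fm' and d['depth'] < 3000)

print([d['name'] for d in deep])  # ['Abenaki Fm', 'Td']
print(both)                       # []
```

The combined query returns nothing because the Abenaki Fm lies deeper than 3000, so no document satisfies both conditions.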

In [25]:
# Delete everything
# (In TinyDB 4 and later, this method is called drop_tables().)
db.purge_tables()