# Chapter 5 - Files and Data Persistence


## Working with files and directories
### Opening Files
    file = open(File)

In [13]:
fh = open('Assets/Files/Merchant of Venice.txt', "rt")
c = 0

for line in fh.readlines():
    print(line.strip())
    
    # Stop early: the file is too long to print in full.
    if c==10:
        print('    ** Rest omitted, file is too long **')
        break
    c+=1

ACT I

SCENE I. Venice. A street.

Enter ANTONIO, SALARINO, and SALANIO
ANTONIO
In sooth, I know not why I am so sad:
It wearies me; you say it wearies you;
But how I caught it, found it, or came by it,
What stuff 'tis made of, whereof it is born,
I am to learn;
    ** Rest omitted, file is too long **


It is important to close a file once you are done with it; otherwise the handle on it may not be released.

In [14]:
fh = open('Assets/Files/Merchant of Venice.txt', "rt")
c = 0

for line in fh.readlines():
    print(line.strip())
    
    # Stop early: the file is too long to print in full.
    if c==10:
        print('    ** Rest omitted, file is too long **')
        break
    c+=1

# Release the file handle explicitly.
fh.close()

ACT I

SCENE I. Venice. A street.

Enter ANTONIO, SALARINO, and SALANIO
ANTONIO
In sooth, I know not why I am so sad:
It wearies me; you say it wearies you;
But how I caught it, found it, or came by it,
What stuff 'tis made of, whereof it is born,
I am to learn;
    ** Rest omitted, file is too long **


Wrapping the read in `try`/`finally` guarantees that the file is closed even if an error occurs while reading:

In [15]:
# Open the file before the try block: if open() itself fails,
# there is no handle to close in the finally clause.
fh = open('Assets/Files/Merchant of Venice.txt', "rt")
try:
    c = 0

    for line in fh:
        print(line.strip())

        # Stop early: the file is too long to print in full.
        if c==10:
            print('    ** Rest omitted, file is too long **')
            break
        c+=1
        
finally:
    fh.close()

ACT I

SCENE I. Venice. A street.

Enter ANTONIO, SALARINO, and SALANIO
ANTONIO
In sooth, I know not why I am so sad:
It wearies me; you say it wearies you;
But how I caught it, found it, or came by it,
What stuff 'tis made of, whereof it is born,
I am to learn;
    ** Rest omitted, file is too long **


### Using a context manager to open files

In [18]:
c=0
with open('Assets/Files/Merchant of Venice.txt') as fh:
    for line in fh:
        print(line.rstrip())
        
        # Stop early: the file is too long to print in full.
        if c==10:
            print('    ** Rest omitted, file is too long **')
            break
        c+=1

ACT I

SCENE I. Venice. A street.

Enter ANTONIO, SALARINO, and SALANIO
ANTONIO
In sooth, I know not why I am so sad:
It wearies me; you say it wearies you;
But how I caught it, found it, or came by it,
What stuff 'tis made of, whereof it is born,
I am to learn;
    ** Rest omitted, file is too long **
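
To see what the context manager does with the file handle, here is a minimal sketch. It writes a small throwaway file (the file name `sample.txt` and its contents are made up for the illustration) and checks the `closed` attribute inside and after the `with` block.

```python
import os
import tempfile

# Work in a temporary directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.txt')  # hypothetical file name
    with open(path, 'w') as fw:
        fw.write('line 1\nline 2\n')

    with open(path) as fh:
        closed_inside = fh.closed            # False: still open inside the block
        first_line = fh.readline().rstrip()

    closed_after = fh.closed                 # True: closed automatically on exit

print(closed_inside, closed_after, first_line)
```

The file is closed as soon as the block exits, whether it exits normally or through an exception, so no explicit `fh.close()` is needed.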


### Reading and Writing to a file

In [22]:
with open('Assets/Files/Merchant of Venice.txt') as f:
    lines = [line.rstrip() for line in f]

with open('Assets/Files/[Copy - Modified 1] Merchant of Venice.txt', 'w') as fr:
    fr.write('\n'.join(lines))

### Reading and Writing in Binary Mode
To read or write raw bytes, open the file in binary mode by adding `b` to the mode string.

In [3]:
with open('Assets/Files/example.bin', 'wb') as fw:
    fw.write(b'This is binary data...')

# Read the file back in binary mode: read() returns a bytes object.
with open('Assets/Files/example.bin', 'rb') as f:
    print(f.read())

b'This is binary data...'


### Protecting against overwriting an existing file
With the `x` (exclusive creation) flag, the file is opened for writing only if it doesn't already exist; otherwise a `FileExistsError` is raised.

In [7]:
with open('Assets/Files/write_x.txt', 'x') as fw:
    fw.write('Writing Line 1' )
with open('Assets/Files/write_x.txt', 'x') as fw:    # Error
    fw.write('Writing Line 2' )

FileExistsError: [Errno 17] File exists: 'Assets/Files/write_x.txt'
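
In a script, the error above would usually be caught rather than left to stop the program. The following is a minimal sketch of that idea; the helper name `write_new` is hypothetical, and a temporary directory stands in for `Assets/Files`.

```python
import os
import tempfile

def write_new(path, text):
    """Write text to path only if the file does not exist yet.

    Returns True if the file was created, False if it was already there.
    """
    try:
        with open(path, 'x') as fw:
            fw.write(text)
        return True
    except FileExistsError:
        return False

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'write_x.txt')
    first = write_new(target, 'Writing Line 1')    # file is created
    second = write_new(target, 'Writing Line 2')   # file exists, nothing written
    with open(target) as f:
        content = f.read()

print(first, second, content)
```

The second call leaves the original content untouched, which is exactly the protection the `x` flag provides.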

### Checking for file and directory existence

In [9]:
import os

filename = 'Assets/Files/Merchant of Venice.txt'

path = os.path.dirname(os.path.abspath(filename))

print(os.path.isfile(filename))
print(os.path.isdir(path))
print(path)
print(os.path.abspath(filename))

True
True
/Users/harshchaturvedi/OneDrive/Source/Projects/Study/Getting Started with Python/Assets/Files
/Users/harshchaturvedi/OneDrive/Source/Projects/Study/Getting Started with Python/Assets/Files/Merchant of Venice.txt


### Manipulating files and directories

In [13]:
from collections import Counter
from string import ascii_letters

chars = ascii_letters + ' '

def sanitize(s, chars):
    return ''.join(c for c in s if c in chars)

def reverse(s):
    return s[::-1]

with open('Assets/Files/Merchant of Venice.txt') as stream:
    lines = [line.rstrip() for line in stream]
    
with open('Assets/Files/ecineV fo tnahcreM.txt', 'w') as stream:
    stream.write('\n'.join(reverse(line) for line in lines))
    
lines = [sanitize(line, chars) for line in lines]
whole = ' '.join(lines)
cnt = Counter(whole.lower().split())
print(cnt.most_common(3))

[('the', 831), ('i', 658), ('and', 610)]


In [16]:
import shutil
import os

BASE_PATH = 'ops_example'
os.mkdir(BASE_PATH)

path_b = os.path.join(BASE_PATH, 'A', 'B')
path_c = os.path.join(BASE_PATH, 'A', 'C')
path_d = os.path.join(BASE_PATH, 'A', 'D')

os.makedirs(path_b)
os.makedirs(path_c)

for filename in ('ex1.txt', 'ex2.txt', 'ex3.txt'):
    with open(os.path.join(path_b, filename), 'w') as stream:
        stream.write(f'Some content here in {filename}\n')

shutil.move(path_b, path_d)

shutil.move(
    os.path.join(path_d, 'ex1.txt'),
    os.path.join(path_d, 'ex1d.txt')
)

'ops_example/A/D/ex1d.txt'

### Manipulating pathnames

In [17]:
import os

filename = 'Assets/Files/Merchant of Venice.txt'
path = os.path.abspath(filename)

print(path)
print(os.path.basename(path))
print(os.path.dirname(path))
print(os.path.splitext(path))
print(os.path.split(path))

readme_path = os.path.join(
    os.path.dirname(path), '..', '..', 'README.rst'
)

print(readme_path)
print(os.path.normpath(readme_path))

/Users/harshchaturvedi/OneDrive/Source/Projects/Study/Getting Started with Python/Assets/Files/Merchant of Venice.txt
Merchant of Venice.txt
/Users/harshchaturvedi/OneDrive/Source/Projects/Study/Getting Started with Python/Assets/Files
('/Users/harshchaturvedi/OneDrive/Source/Projects/Study/Getting Started with Python/Assets/Files/Merchant of Venice', '.txt')
('/Users/harshchaturvedi/OneDrive/Source/Projects/Study/Getting Started with Python/Assets/Files', 'Merchant of Venice.txt')
/Users/harshchaturvedi/OneDrive/Source/Projects/Study/Getting Started with Python/Assets/Files/../../README.rst
/Users/harshchaturvedi/OneDrive/Source/Projects/Study/Getting Started with Python/README.rst
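
The same pieces of a pathname can also be read with the standard library's `pathlib` module, which this chapter does not otherwise use. The sketch below is only an illustration of that alternative: the path is made up, and `PurePosixPath` is chosen so that the result does not depend on the operating system.

```python
from pathlib import PurePosixPath

# A made-up POSIX path, used only for illustration.
p = PurePosixPath('/home/user/Assets/Files/Merchant of Venice.txt')

name = p.name                       # counterpart of os.path.basename
parent = str(p.parent)              # counterpart of os.path.dirname
stem, suffix = p.stem, p.suffix     # similar to os.path.splitext on the name
readme = str(p.parent / 'README.rst')   # the / operator joins path parts

print(name, parent, stem, suffix, readme, sep='\n')
```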


### Temporary files and directories

In [19]:
import os
from tempfile import NamedTemporaryFile, TemporaryDirectory

with TemporaryDirectory(dir='Assets/Files') as td:
    print('Temp directory:', td)
    with NamedTemporaryFile(dir=td) as t:
        name = t.name
        print(os.path.abspath(name))


Temp directory: Assets/Files/tmpkpbloxc7
/Users/harshchaturvedi/OneDrive/Source/Projects/Study/Getting Started with Python/Assets/Files/tmpkpbloxc7/tmpy87p53bg
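
What makes these objects "temporary" is that they are removed when their context manager exits. A small sketch that checks this, using the system default temporary location instead of `Assets/Files`:

```python
import os
from tempfile import NamedTemporaryFile, TemporaryDirectory

with TemporaryDirectory() as td:
    with NamedTemporaryFile(dir=td) as t:
        name = t.name
        file_inside = os.path.exists(name)   # the file exists while it is open
    file_after = os.path.exists(name)        # deleted when the file is closed
    dir_inside = os.path.isdir(td)           # the directory still exists here
dir_after = os.path.isdir(td)                # removed when the block exits

print(file_inside, file_after, dir_inside, dir_after)
```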


### Directory Content

In [21]:
import os

with os.scandir('.') as it:
    for entry in it:
        print(
            entry.name, entry.path,
            'File' if entry.is_file() else 'Folder'
        )

.DS_Store ./.DS_Store File
Chapter 3 - Iterating and Making Decisions.ipynb ./Chapter 3 - Iterating and Making Decisions.ipynb File
cache ./cache Folder
Archive ./Archive Folder
Chapter 1 - A Gentle Introduction to Python.ipynb ./Chapter 1 - A Gentle Introduction to Python.ipynb File
Chapter 2 - Build In Datatypes.ipynb ./Chapter 2 - Build In Datatypes.ipynb File
Chapter 4 - Functions - The Building Blocks of Code.ipynb ./Chapter 4 - Functions - The Building Blocks of Code.ipynb File
.ipynb_checkpoints ./.ipynb_checkpoints Folder
Assets ./Assets Folder
Chapter 5 - Files and Data Persistance.ipynb ./Chapter 5 - Files and Data Persistance.ipynb File


### Files and Directory compression

#### Working with ZIPs

In [None]:
from zipfile import ZipFile

with ZipFile('example.zip', 'w') as zp:
    zp.write('content1.txt')
    zp.write('content2.txt')
    zp.write('subfolder/content3.txt')
    zp.write('subfolder/content4.txt')
    
with ZipFile('example.zip') as zp:
    zp.extract('content1.txt', 'extract_zip')
    zp.extract('subfolder/content3.txt', 'extract_zip')
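
The cell above assumes that the `content*.txt` files already exist in the working directory. As a self-contained sketch of the same round trip, the example below first creates two small files in a temporary directory (their names and contents are made up), archives them, lists the archive with `namelist()`, and extracts one member.

```python
import os
import tempfile
from zipfile import ZipFile

members = ('content1.txt', 'subfolder/content3.txt')

with tempfile.TemporaryDirectory() as tmp:
    # Create the files to archive.
    os.makedirs(os.path.join(tmp, 'subfolder'))
    for rel in members:
        with open(os.path.join(tmp, rel), 'w') as f:
            f.write(f'Content of {rel}\n')

    archive = os.path.join(tmp, 'example.zip')
    with ZipFile(archive, 'w') as zp:
        for rel in members:
            # arcname is the path stored inside the archive.
            zp.write(os.path.join(tmp, rel), arcname=rel)

    with ZipFile(archive) as zp:
        names = zp.namelist()
        zp.extract('subfolder/content3.txt', os.path.join(tmp, 'extract_zip'))

    extracted = os.path.join(tmp, 'extract_zip', 'subfolder', 'content3.txt')
    with open(extracted) as f:
        text = f.read()

print(names)
print(text)
```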

## Data Interchange Formats
The most commonly used data interchange formats are JSON, XML, and YAML. The Python standard library includes the `json` and `xml` modules, and PyPI hosts packages for working with YAML.

### Working with JSON
JSON is based on two structures:
- Collection of name/value pairs, mapping to dictionaries.
- Ordered list of values, mapping to lists.
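
The two structures above can be seen directly by decoding a small JSON document. The JSON text in this sketch is made up for the illustration.

```python
import json

text = '{"name": "Ada", "languages": ["Python", "C"], "active": true, "manager": null}'
obj = json.loads(text)

print(type(obj).__name__)                # dict: a JSON object (name/value pairs)
print(type(obj['languages']).__name__)   # list: a JSON array (ordered values)
print(obj['active'], obj['manager'])     # JSON true/null become True/None
```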

In [28]:
import sys
import json

data = {
    'big_number' : 2**6969,
    'max_float' : sys.float_info.max,
    'a_list' : [2,3,5,7]
}

json_data = json.dumps(data)

data_out = json.loads(json_data)
assert data==data_out
print(data==data_out)

True


JSON syntax is quite similar to Python's dictionary and list literals.

In [29]:
import json

info = {
    'full_name' : 'Sherlock Holmes',
    'address' : {
        'street' : '221B Baker St',
        'zip' : 'NW1 6XE',
        'city' : 'london',
        'country' : 'UK',
    }
}

print(json.dumps(info, indent=2, sort_keys=True))

{
  "address": {
    "city": "london",
    "country": "UK",
    "street": "221B Baker St",
    "zip": "NW1 6XE"
  },
  "full_name": "Sherlock Holmes"
}


Note that, unlike Python, JSON doesn't allow a trailing comma after the last item.

In [31]:
import json

data_in = {
    'a_tuple' : (1,2,3,4,5),
}

json_data = json.dumps(data_in)
print(json_data)
data_out = json.loads(json_data)
print(data_out)

{"a_tuple": [1, 2, 3, 4, 5]}
{'a_tuple': [1, 2, 3, 4, 5]}


JSON has no tuple type, so `json.dumps` serializes a tuple as a JSON array, which `json.loads` reads back as a list; the fact that the value was originally a tuple is lost. The same kind of information loss happens with other Python datatypes that have no direct JSON equivalent.
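
Dictionary keys are another case where a round trip through JSON changes the data: JSON object names are always strings, so non-string keys such as integers come back as strings. A short sketch with made-up data:

```python
import json

data_in = {1: 'one', 2: 'two'}   # integer keys
data_out = json.loads(json.dumps(data_in))

print(data_out)              # {'1': 'one', '2': 'two'}
print(data_in == data_out)   # False
```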

#### Custom Encoding and Decoding with JSON
The `cls` parameter of `json.dumps` specifies a custom encoder, and the `object_hook` parameter of `json.loads` specifies a function that is called on every decoded JSON object.

In [33]:
import json
class ComplexEncoder(json.JSONEncoder):
    def default(self, obj):
        # Encode complex numbers as a tagged dictionary.
        if isinstance(obj, complex):
            return {
                '_meta' : '_complex',
                'num' : [obj.real, obj.imag],
            }
        # Let the base class raise TypeError for unsupported types.
        return json.JSONEncoder.default(self, obj)

data = {
    'an_int' : 42,
    'a_float' : 3.14159265,
    'a_complex' :  3+4j
}

json_data = json.dumps(data, cls=ComplexEncoder)
print(json_data)

def object_hook(obj):
    # Rebuild a complex number from its tagged dictionary.
    try:
        if obj['_meta'] == '_complex':
            return complex(*obj['num'])
    except (KeyError, TypeError):
        pass
    # Any other object is returned unchanged.
    return obj

data_out = json.loads(json_data, object_hook=object_hook)
print(data_out)

{"an_int": 42, "a_float": 3.14159265, "a_complex": {"_meta": "_complex", "num": [3.0, 4.0]}}
{'an_int': 42, 'a_float': 3.14159265, 'a_complex': (3+4j)}


## IO, streams and requests

### Using an in-memory stream
`io.StringIO` is an in-memory stream for text IO.

In [36]:
import io

stream = io.StringIO()
stream.write('Subscribe to PewDiePie.\n')      # One Way of writing to stream
print('Saving people, hunting things, the family business...', file=stream)    # Another way to write to stream

contents = stream.getvalue()
print(contents)

stream.close()

Subscribe to PewDiePie.
Saving people, hunting things, the family business...



In [37]:
with io.StringIO() as stream:
    stream.write('You hide your drugs in a Lupus textbook?\n')
    print("It's never lupus.", file=stream)
    contents = stream.getvalue()
    print(contents)

You hide your drugs in a Lupus textbook?
It's never lupus.



Memory is much faster than disk, so an in-memory stream can be a good choice for holding small amounts of temporary data.
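
Because `io.StringIO` behaves like a text file, it can be handed to any function that expects a file object. As a small sketch (the `record` data is made up), here `json.dump` writes into the stream and `json.load` reads it back, without touching the disk:

```python
import io
import json

record = {'title': 'in-memory', 'values': [2, 3, 5]}

with io.StringIO() as stream:
    json.dump(record, stream)   # write as if to a file
    stream.seek(0)              # rewind to the start before reading
    loaded = json.load(stream)

print(loaded == record)
```

The call to `seek(0)` matters: after writing, the stream position is at the end, so reading without rewinding would find nothing to parse.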

### Making HTTP Requests
#### GET Request
A GET request is made when getting data from a server.

In [1]:
import requests

urls = {
    'get' : 'https://httpbin.org/get?title=learn+python+programming',
    'headers' : 'https://httpbin.org/headers',
    'ip' : 'https://httpbin.org/ip',
    'now' : 'https://now.httpbin.org/',
    'user-agent':'https://httpbin.org/user-agent',
    'UUID':'https://httpbin.org/uuid',
}

def get_content(title,url):
    resp = requests.get(url)
    print(f'Response for {title}')
    print(resp.json())

for title, url in urls.items():
    get_content(title, url)
    print('-'*40)

Response for get
{'args': {'title': 'learn python programming'}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.24.0', 'X-Amzn-Trace-Id': 'Root=1-5f7a7c6f-4c671e2f2b48ab67502692e4'}, 'origin': '49.36.173.60', 'url': 'https://httpbin.org/get?title=learn+python+programming'}
----------------------------------------
Response for headers
{'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.24.0', 'X-Amzn-Trace-Id': 'Root=1-5f7a7c70-3143adda547e624126274bc8'}}
----------------------------------------
Response for ip
{'origin': '49.36.173.60'}
----------------------------------------


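The run stopped here: the host `now.httpbin.org` could not be resolved, so `requests.get` raised a `ConnectionError` and the remaining URLs were never requested.

ConnectionError: HTTPSConnectionPool(host='now.httpbin.org', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f940450a940>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))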
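
A failure like the `ConnectionError` shown here stops the whole loop. One way to keep going is to catch the error for each request. The following is a minimal sketch, not part of the original notebook: the helper name `fetch_json` and the 5-second timeout are assumptions chosen for the illustration.

```python
import requests

def fetch_json(url, timeout=5):
    """Return the decoded JSON body, or None if the request fails."""
    try:
        resp = requests.get(url, timeout=timeout)
        resp.raise_for_status()   # turn HTTP 4xx/5xx responses into exceptions
        return resp.json()
    except (requests.exceptions.RequestException, ValueError) as err:
        # RequestException covers connection problems, timeouts and bad URLs;
        # ValueError covers a body that is not valid JSON.
        print(f'Request to {url!r} failed: {err}')
        return None
```

Calling `fetch_json(url)` for each entry of `urls` would then print a message for the unreachable host and continue with the remaining ones.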

#### POST Request
A POST Request is made when sending data to the server.

In [41]:
import requests

url = 'https://httpbin.org/post'
data = dict(title = 'Coffee is better than Tea')

resp = requests.post(url, data=data)
print('Response for POST')
print(resp.json())

Response for POST
{'args': {}, 'data': '', 'files': {}, 'form': {'title': 'Coffee is better than Tea'}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Content-Length': '31', 'Content-Type': 'application/x-www-form-urlencoded', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.22.0', 'X-Amzn-Trace-Id': 'Root=1-5f48474a-19ddff69265faa1a09f43a29'}, 'json': None, 'origin': '49.36.169.58', 'url': 'https://httpbin.org/post'}


## Persisting data on disk
We will be using `pickle`, `shelve`, and SQLAlchemy, the most widely adopted ORM library in the Python ecosystem.

### Serialising Data with `pickle`
The `pickle` module offers tools to convert Python objects into byte streams and vice versa.    
> Even though there is a partial overlap in the APIs that `pickle` and `json` expose, the two are quite different. JSON is a human-readable, language-independent text format with a limited set of datatypes. Pickle is a Python-specific binary format that is not human-readable and supports a very large number of Python datatypes.    

WARNING: `pickle` poses a security threat: *unpickling* data from an untrusted source can execute arbitrary code. Only unpickle data you trust.

In [47]:
import pickle
from dataclasses import dataclass

@dataclass
class Person:
    first_name: str
    last_name: str
    id: int
    
    def greet(self):
        print(f'I am {self.first_name} {self.last_name}',
             f'and my ID is {self.id}'
        )
        
people = [
    Person('Obi-Wan', 'Kenobi', 123),
    Person('Anakin', 'Skywalker', 456),
]

# save data to binary format to a file
with open('data.pickle', 'wb') as stream:
    pickle.dump(people, stream)
    
# load data from file
with open('data.pickle', 'rb') as stream:
    peeps = pickle.load(stream)
    
for person in peeps:
    person.greet()

I am Obi-Wan Kenobi and my ID is 123
I am Anakin Skywalker and my ID is 456
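
To make the security warning above concrete, here is a harmless sketch of why unpickling untrusted data is risky. The class name `Sneaky` is made up; its `__reduce__` method tells `pickle` which callable to invoke when the payload is loaded. Here that callable is just `print`, but it could be any function.

```python
import pickle

class Sneaky:
    def __reduce__(self):
        # On load, pickle calls print(...) instead of rebuilding a Sneaky object.
        return (print, ('This code ran during unpickling!',))

payload = pickle.dumps(Sneaky())

# Loading the bytes executes the call; the "object" we get back is
# simply the return value of print(), which is None.
result = pickle.loads(payload)
print(result)
```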


### Saving data with `shelve`
A shelf is a persistent, dictionary-like object. Unlike in a `dbm` database, its values can be anything that can be pickled. Albeit useful, the `shelve` module is rarely used in practice.

In [53]:
import shelve

class Person:
    def __init__(self, name, id):
        self.name = name
        self.id = id
    
with shelve.open('shelf1.shelve') as db:
    db['s'] = Person('Sam Winchester', 123)
    db['d'] = Person('Dean Winchester', 456)
    db['c'] = Person('Castiel', 789)
    db['a_list'] = [2,3,5]
    db['delete_me'] = 'delet dis'
    
    print(list(db.keys()))
    del db['delete_me']
    
    print(list(db.keys()))
    
    print('delete_me' in db)
    print('c' in db)
    
    a_list = db['a_list']
    a_list.append(7)
    db['a_list'] = a_list
    print(db['a_list'])

['d', 'a_list', 'delete_me', 's', 'c']
['d', 'a_list', 's', 'c']
False
True
[2, 3, 5, 7]


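By default, mutating a value retrieved from the shelf (for example, appending to a list) is not persisted; the modified value has to be assigned back to its key, as above. With `writeback=True`, every entry accessed is cached in memory and written back when the shelf is synced or closed, so in-place changes are saved. It is disabled by default because the cache can consume a lot of memory and slow down closing the shelf.

In [56]:
with shelve.open('shelf2.shelve', writeback=True) as db: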
    db['a_list'] = [11,13,17]
    db['a_list'].append(19)
    print(db['a_list'])

[11, 13, 17, 19]
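
The effect of `writeback` is easiest to see side by side. In this sketch (the shelf name `demo_shelf` and the key `nums` are made up, and a temporary directory keeps the example self-contained), the same in-place `append` is tried with and without `writeback=True`.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo_shelf')   # hypothetical shelf name

    with shelve.open(path) as db:
        db['nums'] = [1, 2]
        db['nums'].append(3)             # mutates a temporary copy only
        without_writeback = db['nums']   # still [1, 2]

    with shelve.open(path, writeback=True) as db:
        db['nums'].append(3)             # cached, written back on close

    with shelve.open(path) as db:
        with_writeback = db['nums']      # [1, 2, 3]

print(without_writeback, with_writeback)
```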


### Saving data to database: SQLAlchemy
A **Relational Database** is a database that stores data following the **Relational Model**. In this model, data is stored in one or more tables. Each table has rows (aka _records_ or _tuples_), each representing an entry in the table, and columns (aka _attributes_), each representing an attribute of the records. Each record is identified through a unique key, more commonly known as the **Primary Key**, which is made up of one or more columns of the table.    
The model is called **Relational** because relationships can be established between tables.    
For example: given a table `Users` with attributes `id`, `username`, `password`, `name`, and `surname`, a second table `PhoneNumbers` can be added, and a relationship between the two establishes which phone number belongs to which user.

In order to query a relational database, we need a special language. The main standard is **SQL (Structured Query Language)**, born out of **relational algebra**. 
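
To make the `Users`/`PhoneNumbers` example and the role of SQL concrete, here is a minimal sketch using the standard library's `sqlite3` module with an in-memory database. The table layout is simplified and all rows are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE users (
        id       INTEGER PRIMARY KEY,
        username TEXT NOT NULL
    );
    CREATE TABLE phone_numbers (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),  -- the relationship
        number  TEXT NOT NULL
    );
    INSERT INTO users VALUES (1, 'sherlock'), (2, 'watson');
    INSERT INTO phone_numbers VALUES
        (1, 1, '555-0100'), (2, 1, '555-0101'), (3, 2, '555-0199');
""")

# Join the two tables to find which number belongs to which user.
rows = conn.execute("""
    SELECT u.username, p.number
    FROM users AS u
    JOIN phone_numbers AS p ON p.user_id = u.id
    ORDER BY u.username, p.number
""").fetchall()
conn.close()

print(rows)
```

The `user_id` column of `phone_numbers` refers to the primary key of `users`; that reference is what lets the `JOIN` pair each number with its owner.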

Each database comes with its own flavor of SQL. Most respect the standard up to a point, but none respects it fully. This can be quite painful, and SQL queries become complicated very quickly. Therefore, computer scientists created code that maps objects of a particular language to tables of a relational database; such tools are called Object-Relational Mappers (ORMs).

SQLAlchemy is the most popular Python ORM.

#### Example of SQLAlchemy

First, import the required names, then create an engine and declare the models:

In [59]:
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import(
    Column,
    Integer,
    String,
    ForeignKey,
    create_engine
)

from sqlalchemy.orm import relationship

In [62]:
engine = create_engine('sqlite:///:memory:')
Base = declarative_base()

class Person(Base):
    __tablename__ = 'person'
    
    id = Column(Integer, primary_key=True)
    name = Column(String)
    age = Column(Integer)
    
    addresses = relationship(
        'Address', back_populates='person', 
        order_by='Address.email', 
        cascade='all, delete-orphan'
    )
    
    def __repr__(self):
        return f'{self.name}(id={self.id})'

class Address(Base):
    __tablename__ = 'address'
    id = Column(Integer, primary_key=True)
    email = Column(String)
    person_id = Column(ForeignKey('person.id'))
    person = relationship('Person', back_populates='addresses')
    
    def __str__(self):
        return self.email
    __repr__ = __str__
    
Base.metadata.create_all(engine)
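
The notebook stops after creating the tables. As a hedged sketch of how such models might then be used, the self-contained example below redefines trimmed-down versions of `Person` and `Address`, stores one person with two addresses through a session, and reads them back. The names and email addresses are made up, and the `try`/`except` import is an assumption to cover both newer and older SQLAlchemy releases.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import relationship, sessionmaker

try:  # SQLAlchemy 1.4 and later
    from sqlalchemy.orm import declarative_base
except ImportError:  # older releases
    from sqlalchemy.ext.declarative import declarative_base

engine = create_engine('sqlite:///:memory:')
Base = declarative_base()

class Person(Base):
    __tablename__ = 'person'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    # The attribute name must match back_populates on Address.person.
    addresses = relationship(
        'Address', back_populates='person',
        order_by='Address.email', cascade='all, delete-orphan'
    )

class Address(Base):
    __tablename__ = 'address'
    id = Column(Integer, primary_key=True)
    email = Column(String)
    person_id = Column(ForeignKey('person.id'))
    person = relationship('Person', back_populates='addresses')

Base.metadata.create_all(engine)

Session = sessionmaker(bind=engine)
session = Session()

anakin = Person(name='Anakin Skywalker')
anakin.addresses = [
    Address(email='darth@example.com'),
    Address(email='ani@example.com'),
]
session.add(anakin)   # the cascade also saves the addresses
session.commit()

found = session.query(Person).filter_by(name='Anakin Skywalker').one()
emails = [a.email for a in found.addresses]   # sorted by order_by
session.close()

print(emails)
```

No SQL is written by hand here: the session translates the object operations into `INSERT` and `SELECT` statements for the underlying SQLite database.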

## Summary
- Files and directories
- Compression
- Networks and streams
- The JSON data-interchange format
- Data persistence with pickle and shelve, from the standard library
- Data persistence with SQLAlchemy