# 1 Automating File Processing

## 1.1  File Reading
This notebook section explores how Python can be used to create, read, and write files on disk.

### 1.1.1  Basics: File and Its Properties
A file has two basic properties: a `name` and a `path`. The path points to the location of the file on the computer, and the name identifies the file at that location.
We use the function `os.path.join()` to build a path string from its components:

```
import os
os.path.join('Quant', 'doc')
```

Note that macOS and Linux use `/` as the path delimiter, while Windows uses `\`.
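The platform difference can be checked directly. The following is a minimal sketch, assuming a hypothetical file name `notes.txt`; it shows that `os.path.join()` inserts whatever separator `os.sep` reports for the current platform.

```python
import os

# os.sep is the separator os.path.join() uses on the current platform:
# '/' on macOS and Linux, '\\' on Windows.
joined = os.path.join('Quant', 'doc', 'notes.txt')  # 'notes.txt' is a hypothetical name
print(os.sep)
print(joined)

# Splitting on os.sep recovers the original components on any platform.
parts = joined.split(os.sep)
print(parts)
```

Building paths this way, rather than concatenating strings with a hard-coded `/`, keeps the same code working on every platform.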

### 1.1.2 Current Working Directory
The current working directory is the directory against which relative paths are resolved. `os.getcwd()` returns it, and `os.chdir()` changes it.

In [8]:
import os
os.chdir('/Users/ConquerV/QUANT/OfficeAutomation')
os.getcwd()

'/Users/ConquerV/QUANT/RSM'
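Because `os.chdir()` affects every relative path used afterwards, it can help to remember the original directory and switch back when done. The sketch below assumes a throwaway directory created with `tempfile`, so it does not depend on any folder from this notebook.

```python
import os
import tempfile

original_dir = os.getcwd()               # remember where we started

with tempfile.TemporaryDirectory() as tmp_dir:
    os.chdir(tmp_dir)                    # switch to the temporary directory
    inside = os.getcwd()
    # Compare resolved paths: os.getcwd() may report a symlink-free form.
    same_dir = os.path.realpath(inside) == os.path.realpath(tmp_dir)
    os.chdir(original_dir)               # switch back before the directory is removed

restored = os.getcwd()
print(same_dir)
print(restored == original_dir)
```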

### 1.1.3 Path Manipulation
An `absolute path` always begins at the root of the file system (on Windows this includes the drive, such as `C:\`). A `relative path` is interpreted relative to the current working directory.

In [10]:
os.path.abspath('.')
os.path.isabs('.')
os.path.isabs(os.path.abspath('.'))

True
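To make the distinction concrete, here is a small sketch that uses a hypothetical relative path, `data/report.txt`. It illustrates that `os.path.abspath()` resolves a relative path against the current working directory.

```python
import os

relative = os.path.join('data', 'report.txt')   # hypothetical relative path
absolute = os.path.abspath(relative)            # resolved against os.getcwd()

print(os.path.isabs(relative))   # False
print(os.path.isabs(absolute))   # True

# abspath() is equivalent to joining with the working directory and normalizing.
expected = os.path.normpath(os.path.join(os.getcwd(), relative))
print(absolute == expected)
```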

`os.path.relpath(path, start)` returns the relative path from `start` to `path`. The default value of `start` is the current working directory.

In [13]:
os.path.relpath('/Users/ConquerV/QUANT/OfficeAutomation', 'Users')

'../../OfficeAutomation'
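When `start` is itself a relative path, as `'Users'` is in the cell above, it is first resolved against the current working directory, so the result depends on where the notebook is running. As a sketch, passing two absolute paths gives a result that does not depend on the working directory; the `RSM` folder is used here purely for illustration.

```python
import os

# Build both paths from the root so the example reads the same on any platform.
target = os.path.join(os.sep, 'Users', 'ConquerV', 'QUANT', 'OfficeAutomation')
start = os.path.join(os.sep, 'Users', 'ConquerV', 'QUANT', 'RSM')

# One step up from RSM, then down into OfficeAutomation.
rel = os.path.relpath(target, start)
print(rel)
```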

In [19]:
path = '/Users/ConquerV/QUANT/OfficeAutomation'
os.path.dirname(path)

'/Users/ConquerV/QUANT'
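`os.path.dirname()` returns everything before the last separator. Its counterpart, `os.path.basename()`, returns what comes after it, and `os.path.split()` returns both at once. A short sketch, assuming the `hello.txt` file used later in this notebook:

```python
import os

path = '/Users/ConquerV/QUANT/OfficeAutomation/hello.txt'

directory = os.path.dirname(path)    # everything before the last separator
filename = os.path.basename(path)    # everything after the last separator
print(directory)
print(filename)

# os.path.split() returns the same two pieces as a tuple.
print(os.path.split(path) == (directory, filename))
```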

Python provides `os.path.exists(path)` to check whether a path exists.

`os.path.isfile(path)` checks that the path exists and that it refers to a file.

`os.path.isdir(path)` checks that the path exists and that it refers to a directory.
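The three checks can be compared side by side. This sketch assumes a temporary directory and the hypothetical file names `example.txt` and `missing.txt`; each tuple holds the results of `exists`, `isfile`, and `isdir`, in that order.

```python
import os
import tempfile

def describe(path):
    """Return (exists, isfile, isdir) for a path."""
    return (os.path.exists(path), os.path.isfile(path), os.path.isdir(path))

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, 'example.txt')
    with open(file_path, 'w') as f:
        f.write('sample')
    missing_path = os.path.join(tmp_dir, 'missing.txt')   # never created

    checks = {
        'dir': describe(tmp_dir),
        'file': describe(file_path),
        'missing': describe(missing_path),
    }

print(checks)
```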

### 1.1.4 File and Directory Manipulation
`os.makedirs()` creates a directory, including any missing intermediate directories. `os.path.getsize()` returns the size of a path in bytes, and `os.listdir()` returns the names of the entries in a directory.

In [17]:
import os
os.makedirs('Sample')
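Running `os.makedirs()` a second time on an existing folder raises `FileExistsError`, which is easy to hit when re-running a notebook cell. The sketch below works inside a temporary directory with hypothetical folder names, and shows the `exist_ok=True` argument that avoids the error.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    nested = os.path.join(tmp_dir, 'Sample', 'reports', 'drafts')  # hypothetical names
    os.makedirs(nested)                  # creates all missing intermediate folders
    os.makedirs(nested, exist_ok=True)   # no error when the folder already exists

    try:
        os.makedirs(nested)              # without exist_ok, this call fails
        raised = False
    except FileExistsError:
        raised = True

    created = os.path.isdir(nested)

print(created, raised)
```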

In [20]:
os.path.getsize(path)
os.listdir(path)

['.DS_Store',
 '.ipynb_checkpoints',
 'Task1_Automating File and Email Processing.ipynb']

In [21]:
totalSize = 0
for filename in os.listdir(path):
    totalSize += os.path.getsize(os.path.join(path, filename))
print(totalSize)

11398
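The loop above also adds the sizes reported for subdirectories such as `.ipynb_checkpoints`, which is the size of the directory entry rather than of its contents. One possible variation, sketched here with a hypothetical helper named `total_file_size`, counts regular files only.

```python
import os
import tempfile

def total_file_size(folder):
    """Sum the sizes, in bytes, of the regular files directly inside folder."""
    total = 0
    for name in os.listdir(folder):
        full_path = os.path.join(folder, name)
        if os.path.isfile(full_path):          # skip subdirectories
            total += os.path.getsize(full_path)
    return total

with tempfile.TemporaryDirectory() as tmp_dir:
    with open(os.path.join(tmp_dir, 'a.txt'), 'w') as f:
        f.write('12345')                       # 5 bytes
    os.makedirs(os.path.join(tmp_dir, 'sub'))  # ignored by the helper
    size = total_file_size(tmp_dir)

print(size)  # 5
```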


### 1.1.5 File Opening, Reading, and Writing


In [25]:
helloFile = open(os.path.join(path, 'hello.txt'))
print(helloFile)

<_io.TextIOWrapper name='/Users/ConquerV/QUANT/OfficeAutomation/hello.txt' mode='r' encoding='UTF-8'>


Once we have the file object, we can read its content.

In [27]:
helloContent = helloFile.read()  # the whole file as one string
helloFile.close()
# Reopen the file and read it as a list of lines
sonnetFile = open(os.path.join(path, 'hello.txt'))
sonnetFile.readlines()

['{\\rtf1\\ansi\\ansicpg936\\cocoartf1404\\cocoasubrtf470\n',
 '{\\fonttbl\\f0\\fswiss\\fcharset0 Helvetica;}\n',
 '{\\colortbl;\\red255\\green255\\blue255;}\n',
 '\\margl1440\\margr1440\\vieww10800\\viewh8400\\viewkind0\n',
 '\\pard\\tx566\\tx1133\\tx1700\\tx2267\\tx2834\\tx3401\\tx3968\\tx4535\\tx5102\\tx5669\\tx6236\\tx6803\\pardirnatural\\partightenfactor0\n',
 '\n',
 '\\f0\\fs24 \\cf0 Hello World}']
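The cells above leave their file objects open. A common alternative is the `with` statement, which closes the file automatically when the block ends. The sketch below writes its own small text file in a temporary directory, so its contents are an assumption and not the `hello.txt` shown above; it contrasts `read()` with `readlines()`.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, 'hello.txt')
    with open(file_path, 'w', encoding='utf-8') as f:
        f.write('Hello World\nSecond line\n')

    with open(file_path, encoding='utf-8') as f:
        whole = f.read()      # the entire file as one string
        rest = f.read()       # '' because the position is now at the end

    with open(file_path, encoding='utf-8') as f:
        lines = f.readlines() # one string per line, newlines included

    closed = f.closed         # True: the with block closed the file

print(repr(whole))
print(lines)
```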

Note that a file can be opened in different modes, such as `r` (read, the default), `w` (write, which overwrites any existing content), and `a` (append).

In [32]:
helloFile = open(os.path.join(path, 'hello.txt'), 'w')
helloFile.write('Python is the best language! \n')

# Close the file to complete writing to the file
helloFile.close()

# Open the file for reading
hFile = open(os.path.join(path, 'hello.txt'))
content = hFile.read()
hFile.close()
print(content)

Python is the best language! 
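The difference between `w` and `a` is easiest to see by writing to the same file several times. A minimal sketch, assuming a hypothetical `log.txt` in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, 'log.txt')   # hypothetical file name

    with open(file_path, 'w') as f:   # 'w' creates the file
        f.write('first\n')
    with open(file_path, 'w') as f:   # 'w' again discards the previous content
        f.write('second\n')
    with open(file_path, 'a') as f:   # 'a' keeps the content and adds to the end
        f.write('third\n')

    with open(file_path) as f:        # default mode is 'r'
        content = f.read()

print(repr(content))  # 'second\nthird\n'
```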



### 1.1.6 Shelve Module
Using `shelve`, we can store Python variables in a binary shelf file.
This way, the program can restore variable data from disk.

In [34]:
import shelve
shelfFile = shelve.open('mydata')
cats = ['British Short', 'Ragdoll', 'Angora']
shelfFile['cats'] = cats
shelfFile.close()

Note that a shelf provides `keys()` and `values()` methods, as a dictionary does. Their return values are list-like rather than true lists, so pass them to `list()` to obtain the data as a list.

In [35]:
shelfFile = shelve.open('mydata')
list(shelfFile.keys())
list(shelfFile.values())
shelfFile.close()
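A shelf can also be opened in a `with` statement, which closes it automatically. The following sketch stores the list in a temporary directory (an assumption, to avoid leaving files behind) and then reopens the shelf to restore it.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    shelf_path = os.path.join(tmp_dir, 'mydata')

    # Store a variable; the shelf is closed when the block ends.
    with shelve.open(shelf_path) as shelf:
        shelf['cats'] = ['British Short', 'Ragdoll', 'Angora']

    # Reopen the shelf later and restore the data.
    with shelve.open(shelf_path) as shelf:
        keys = list(shelf.keys())
        restored = shelf['cats']

print(keys)
print(restored)
```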

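The `pprint.pformat()` function returns a variable's pretty-printed representation as a string. Writing that string to a `.py` file saves the variable in a form that can later be imported as a module.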

In [44]:
import pprint
cats = ['British Short', 'Ragdoll', 'Angora', 'Mouse']
pprint.pformat(cats)
fileO = open('myCats.py', 'w')
fileO.write('cats = '+pprint.pformat(cats)+'\n')
fileO.close()
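To check that the saved file really restores the variable, it can be loaded back. In a notebook you would typically run `import myCats` and read `myCats.cats`. The sketch below instead uses `runpy.run_path()` on a copy written to a temporary directory, an assumption made so that the example does not depend on the import path.

```python
import os
import pprint
import runpy
import tempfile

cats = ['British Short', 'Ragdoll', 'Angora', 'Mouse']

with tempfile.TemporaryDirectory() as tmp_dir:
    module_path = os.path.join(tmp_dir, 'myCats.py')
    with open(module_path, 'w') as f:
        f.write('cats = ' + pprint.pformat(cats) + '\n')

    # Execute the file and collect the names it defines.
    namespace = runpy.run_path(module_path)

loaded = namespace['cats']
print(loaded == cats)
```

Unlike a shelf, the resulting `.py` file is plain text and can be read or edited by hand, but this approach only works for values whose printed form is valid Python.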

__Practice__
1. Write a program that creates 35 unique surveys, each with 50 multiple-choice questions in random order. Each question needs 1 correct answer and 3 random wrong answers. Each survey's questions and answers need to be written to its own txt file, giving 35 files in total.

In [48]:
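import random
import shelve

# Reuses the 4-item `cats` list defined above as the pool of answers
for i in range(35):
    shelfFile = shelve.open('Survey #'+str(i))
    for j in range(50):
        random.shuffle(cats)
        # After shuffling, the first item is the correct answer; the other 3 are wrong
        answers = {'Correct':cats[0], 'Wrong':cats[1:]}
        shelfFile['Q' +str(j)] = answers
    shelfFile.close()  # flush this survey to disk before opening the next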
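The cell above stores each survey in a shelf, whereas the exercise asks for txt files. One possible text-file version is sketched below. The question bank, the helper name `write_surveys`, and the file names are all hypothetical placeholders, and the output goes to a temporary directory. Independent shuffles make the 35 surveys almost certainly, though not provably, unique.

```python
import os
import random
import tempfile

# Hypothetical question bank: 50 placeholder questions and their correct answers.
bank = {f'Placeholder question {n}?': f'Answer {n}' for n in range(1, 51)}

def write_surveys(folder, bank, count=35):
    """Write `count` shuffled surveys, each with an answer key, as text files."""
    all_answers = list(bank.values())
    for i in range(1, count + 1):
        questions = list(bank)
        random.shuffle(questions)                 # new question order per survey
        key_lines = []
        with open(os.path.join(folder, f'survey_{i}.txt'), 'w') as out:
            for number, question in enumerate(questions, start=1):
                correct = bank[question]
                # Pick 3 wrong answers from the other questions' answers.
                wrong = random.sample([a for a in all_answers if a != correct], 3)
                options = wrong + [correct]
                random.shuffle(options)           # hide the correct option's position
                out.write(f'{number}. {question}\n')
                for letter, option in zip('ABCD', options):
                    out.write(f'    {letter}. {option}\n')
                correct_letter = 'ABCD'[options.index(correct)]
                key_lines.append(f'{number}. {correct_letter}')
            out.write('\nAnswer key\n' + '\n'.join(key_lines) + '\n')

with tempfile.TemporaryDirectory() as tmp_dir:
    write_surveys(tmp_dir, bank)
    file_count = len(os.listdir(tmp_dir))
    with open(os.path.join(tmp_dir, 'survey_1.txt')) as f:
        first_survey = f.read()

print(file_count)  # 35
```

Keeping the answer key at the end of each file satisfies the one-file-per-survey requirement. Writing the key to a separate file would be an equally reasonable design.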

### 1.2.1 Shutil Module