# File Handling
- Reading text files
    - Opening text file
    - ```read()``` function
    - ```readlines()``` function
    - ```readline()``` function
    - Parsing data
    - Encoding data
- Writing text files
    - Creating text file
    - Writing files
    - Appending elements

## 1. Reading text files
- Text files (.txt) can be opened with Python's built-in open() function
    - When reading a file, set *mode* to 'r'
    - In general, **utf-8** encoding is used

<br>
```open(file_name, mode, encoding = 'utf-8')```

<br>    
- There are three options for reading an opened file. Which function is preferable depends largely on the situation.
    - read()
    - readline()
    - readlines()
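
The three options can be compared side by side. The following is a minimal sketch, assuming a small sample file whose name (`sample.txt`) and contents are made up for illustration; it is written to a temporary folder so that no existing file is touched.

```python
import os
import tempfile

# Build a small sample file so the example is self-contained.
# The file name and its contents are assumptions for illustration only.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('first line\nsecond line\nthird line\n')

with open(path, 'r', encoding='utf-8') as f:
    whole_text = f.read()        # one string holding the entire file

with open(path, 'r', encoding='utf-8') as f:
    first_line = f.readline()    # one string holding a single line

with open(path, 'r', encoding='utf-8') as f:
    all_lines = f.readlines()    # list of strings, one element per line

print(repr(whole_text))
print(repr(first_line))
print(all_lines)
```

Printing with `repr()` makes the newline characters (`'\n'`) visible, which helps when comparing the three results.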

### Opening text file

In [None]:
# opening text file
file = open('text.txt', 'r', encoding = 'utf-8')
print(file)
file.close()      # it is desirable to close file after usage
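
Closing a file by hand is easy to forget. A common alternative is the `with` statement, which closes the file automatically when the indented block ends. The sketch below assumes a throwaway `text.txt` created in a temporary folder, so its contents are made up for illustration.

```python
import os
import tempfile

# Create a throwaway file; the path and contents are illustrative assumptions.
path = os.path.join(tempfile.mkdtemp(), 'text.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('hello\n')

with open(path, 'r', encoding='utf-8') as file:
    inside = file.closed      # False: the file is still open inside the block
    content = file.read()
outside = file.closed         # True: the file was closed when the block ended

print(inside, outside)
```

The file is closed even if an error occurs inside the block, which is why this form is often preferred over a separate `close()` call.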

### ```read()``` function
- ```read()``` returns the entire contents of the text file as a single string

In [None]:
file = open('text.txt', 'r', encoding = 'utf-8')
data = file.read()
print(type(data))
file.close()      # close the file once the contents have been read

In [None]:
print(data)

### ```readlines()``` function
- ```readlines()``` reads the text file line by line and returns the lines as a list
- Each element of the resulting list is one line of the text file, including its trailing newline character

In [None]:
file = open('text.txt', 'r', encoding = 'utf-8')
data = file.readlines()
print(type(data))
file.close()

In [None]:
print(data)

In [None]:
### below code block is almost equivalent to above
# can you explain why? 
# what is different between two? 
# what does split() function do here?
file = open('text.txt', 'r', encoding = 'utf-8')
data = file.read().split('\n')
file.close()
print(data)
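
One way to explore the questions in the comments above is to run both approaches on the same file and compare the results. This is a minimal sketch, assuming a three-line sample file (made up for illustration) that ends with a newline; `splitlines()` is added as a third variant for comparison.

```python
import os
import tempfile

# Sample file for illustration; the name and contents are assumptions.
path = os.path.join(tempfile.mkdtemp(), 'text.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('a\nb\nc\n')

with open(path, 'r', encoding='utf-8') as f:
    via_readlines = f.readlines()     # keeps the '\n' at the end of each line

with open(path, 'r', encoding='utf-8') as f:
    text = f.read()

via_split = text.split('\n')          # drops '\n', but leaves a trailing '' element
via_splitlines = text.splitlines()    # drops '\n' and leaves no trailing ''

print(via_readlines)
print(via_split)
print(via_splitlines)
```

The extra empty string from `split('\n')` appears only when the file ends with a newline, so it is worth checking the last element before parsing further.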

### ```readline()``` function
- ```readline()``` reads a single line of the text file (the first line, when called right after opening) and returns it as a single string

In [None]:
file = open('text.txt', 'r', encoding = 'utf-8')
data = file.readline()
print(type(data))
file.close()

In [None]:
print(data)
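
Each call to `readline()` continues from where the previous call stopped, so repeated calls walk through the file. The sketch below assumes a two-line sample file made up for illustration, and also shows that looping over the file object yields one line at a time.

```python
import os
import tempfile

# Two-line sample file; the name and contents are illustrative assumptions.
path = os.path.join(tempfile.mkdtemp(), 'text.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('first\nsecond\n')

with open(path, 'r', encoding='utf-8') as f:
    line_1 = f.readline()    # first line
    line_2 = f.readline()    # second line
    line_3 = f.readline()    # empty string: the end of the file was reached

with open(path, 'r', encoding='utf-8') as f:
    # Iterating over the file object reads it line by line.
    stripped = [line.rstrip('\n') for line in f]

print(repr(line_1), repr(line_2), repr(line_3))
print(stripped)
```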

### Parsing data
- In most cases, datasets consist of multiple columns.
- In other words, each line has to be parsed.
- One way to handle such a dataset is to split it line by line and store the result in a 2-D list (i.e., matrix format)
- For instance, consider the grades dataset provided

| Name | Score  | Grade   |
|------|--------|---------|
|Jane  | 97     | A+      |
|Johnny  | 80     | B+     |
|Lisa  | 60     | C-     |
|Mike  | 95     | A0      |

In [None]:
file = open('grades.txt', 'r', encoding = 'utf-8')
data = file.read().split('\n')
print(len(data))
print(data)
file.close()        # you can see that each line(row) is stored in each element in list

In [None]:
# build a new list so that the original list "data" is left unchanged
data_parsed = []
for i in range(len(data)):
    data_parsed.append(data[i].split(','))
print(data_parsed)        # you can see that each line(row) is parsed and stored!!
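
The same parsing step can also be written as a list comprehension. The sketch below assumes that `grades.txt` is comma-delimited with one student per line and a newline at the end; the string `text` stands in for what `file.read()` would return under that assumption.

```python
# Stand-in for file.read(); the exact file contents are an assumption.
text = 'Jane,97,A+\nJohnny,80,B+\nLisa,60,C-\nMike,95,A0\n'

lines = text.split('\n')    # the last element is '' because text ends with '\n'

# Split every non-empty line on ',' and collect the rows in a new 2-D list.
rows = [line.split(',') for line in lines if line]

for row in rows:
    print(row)
```

Filtering with `if line` skips the empty string left over from the final newline, which would otherwise become a row with a single empty field.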

### Encoding data
- In light of ```data_parsed``` above, we have another potential problem.
    - Scores (97, 80, 60, 95) should be interpreted as integers or floats, but they are strings now
- So, we have to *encode* the data, i.e., convert each value to an appropriate type.
    - Convert the scores into integers or floats

In [None]:
data_encoded = [row[:] for row in data_parsed]   # copy each row so that data_parsed keeps its strings
for i in range(len(data_encoded)):
    data_encoded[i][1] = int(data_encoded[i][1])   # use type casting to convert score into integer
print(data_encoded)

In [None]:
for row in data_encoded:
    print(row)
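
Once the scores are integers, they can be used in arithmetic, which is the point of the encoding step. A minimal sketch, assuming the parsed rows look like the grades table above (the list is typed in by hand here rather than read from a file):

```python
# Parsed rows as they would look after splitting; typed in by hand for illustration.
data_parsed = [['Jane', '97', 'A+'], ['Johnny', '80', 'B+'],
               ['Lisa', '60', 'C-'], ['Mike', '95', 'A0']]

# Build a new list in which the score column is converted to int.
data_encoded = [[name, int(score), grade] for name, score, grade in data_parsed]

scores = [row[1] for row in data_encoded]
average = sum(scores) / len(scores)    # arithmetic works only after conversion

print(data_encoded)
print(average)
```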

### Exercise 5-1.
- Read the pitcher stats from the ```pitcher_stats.txt``` file
- Parse & encode the data as above
    - Name should be a string, W & L integers, and ERA a float
    - Note that the text file is delimited by '/'

| Name | W  | L   |  ERA|
|------|--------|---------|--------|
|Kershaw  | 18     | 4      |2.31 |
|Jansen  | 5     | 0     |1.32|
|Wood  | 16    | 3     |2.72|
|Hill  | 12     | 8      |3.32|

In [1]:
## Your answer
file = open('pitcher_stats.txt', 'r', encoding = 'utf-8')
data = file.read().split('\n')
for i in range(len(data)):
    data[i] = data[i].split('/')
file.close()
print(data)

### shorter answer
with open('pitcher_stats.txt', 'r', encoding = 'utf-8') as f:
    data = [x.split('/') for x in f.read().split('\n')]
print(data)

for i in range(len(data)):
    data[i][1] = int(data[i][1])
    data[i][2] = int(data[i][2])
    data[i][3] = float(data[i][3])

print(data)

[['Kershaw', '18', '4', '2.31'], ['Jansen', '5', '0', '1.32'], ['Wood', '16', '3', '2.72'], ['Hill', '12', '8', '3.32']]
[['Kershaw', '18', '4', '2.31'], ['Jansen', '5', '0', '1.32'], ['Wood', '16', '3', '2.72'], ['Hill', '12', '8', '3.32']]
[['Kershaw', 18, 4, 2.31], ['Jansen', 5, 0, 1.32], ['Wood', 16, 3, 2.72], ['Hill', 12, 8, 3.32]]


## 2. Writing text files
- Text files (.txt) are also opened for writing with Python's built-in open() function
    - When writing a file, set *mode* to 'w' or 'a'
    - In general, **utf-8** encoding is used

<br>
```open(file_name, mode, encoding = 'utf-8')```

<br>    
- There are two modes of writing
    - write ('w')
    - append ('a')

### Creating text file
- If *mode* is set to 'w', a new text file is created. If a file with the same name already exists, its contents are erased.

In [2]:
file = open('new_file.txt', 'w', encoding = 'utf-8')
print(file)
file.close()

<_io.TextIOWrapper name='new_file.txt' mode='w' encoding='utf-8'>
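
Because 'w' mode empties an existing file, it can silently destroy data. The sketch below demonstrates this on a throwaway file in a temporary folder (the file name is an assumption), and also shows the 'x' mode, which refuses to open a file that already exists.

```python
import os
import tempfile

# Throwaway file in a temporary folder; the name is an illustrative assumption.
path = os.path.join(tempfile.mkdtemp(), 'new_file.txt')

with open(path, 'w', encoding='utf-8') as f:
    f.write('old contents\n')

# Opening the same file again in 'w' mode erases what was there.
with open(path, 'w', encoding='utf-8') as f:
    pass

with open(path, 'r', encoding='utf-8') as f:
    after = f.read()

# 'x' mode creates a file only if it does not exist yet.
try:
    with open(path, 'x', encoding='utf-8') as f:
        f.write('this line is never written\n')
    refused = False
except FileExistsError:
    refused = True

print(repr(after), refused)
```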


### Writing files
- Use the ```write()``` function to add contents to the file

In [3]:
# writing several strings one after another (no newline characters)
file = open('new_file_2.txt', 'w', encoding = 'utf-8')
file.write('To be, or not to be - that is the question:')
file.write("Whether 'tis nobler in the mind to suffer")
file.write('The slings and arrows of outrageous fortune')
file.write('Or to take arms against a sea of troubles')
file.write('And by opposing end them')
file.close()

In [4]:
## one problem is that write() does not add line breaks by itself
# writing strings line by line (with '\n' at the end of each line)
file = open('new_file_3.txt', 'w', encoding = 'utf-8')
file.write('To be, or not to be - that is the question: \n')
file.write("Whether 'tis nobler in the mind to suffer \n")
file.write('The slings and arrows of outrageous fortune \n')
file.write('Or to take arms against a sea of troubles \n')
file.write('And by opposing end them')
file.close()

In [5]:
# writing contents in list
alist = ['to', 'be', 'or', 'not', 'to', 'be']
file = open('new_file_4.txt', 'w', encoding = 'utf-8')
for word in alist:
    file.write(word + '\n')
file.close()

In [None]:
# writing contents in list
# separating each word with a space
alist = ['to', 'be', 'or', 'not', 'to', 'be']
file = open('new_file_5.txt', 'w', encoding = 'utf-8')
for word in alist:
    file.write(word + ' ')
file.close()

In [6]:
# writing int/floats
alist = [2, 3, 7, 4, 5]
file = open('new_file_6.txt', 'w', encoding = 'utf-8')
for number in alist:
    file.write(str(number) + '\n')
file.close()
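
The loops above can also be replaced by `str.join()`, which builds the whole text first and writes it in one call. This is a sketch of that alternative, not a correction of the cells above; the file names are assumptions and the files are written to a temporary folder.

```python
import os
import tempfile

folder = tempfile.mkdtemp()    # temporary folder so that no real file is overwritten
words_path = os.path.join(folder, 'words.txt')
numbers_path = os.path.join(folder, 'numbers.txt')

words = ['to', 'be', 'or', 'not', 'to', 'be']
numbers = [2, 3, 7, 4, 5]

with open(words_path, 'w', encoding='utf-8') as f:
    # join() puts the separator only between elements, so no trailing space is left
    f.write(' '.join(words) + '\n')

with open(numbers_path, 'w', encoding='utf-8') as f:
    # numbers have to be converted to strings before joining
    f.write('\n'.join(str(n) for n in numbers) + '\n')

with open(words_path, 'r', encoding='utf-8') as f:
    words_text = f.read()
with open(numbers_path, 'r', encoding='utf-8') as f:
    numbers_text = f.read()

print(repr(words_text))
print(repr(numbers_text))
```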

### Appending elements
- Sometimes, one might just want to "add" some elements to an existing text file
- In this case, set the opening mode to ```'a'```

In [7]:
blist = [1, 6, 8, 9, 10]
file = open('new_file_6.txt', 'a', encoding = 'utf-8')
for number in blist:
    file.write(str(number) + '\n')
file.close()
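
To confirm that 'a' mode keeps the existing lines and adds the new ones at the end, the file can be read back after both steps. A minimal sketch using the same two lists as above, written to a temporary file whose name is an assumption:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'numbers.txt')    # illustrative file name

# Step 1: create the file with 'w' mode.
with open(path, 'w', encoding='utf-8') as f:
    for number in [2, 3, 7, 4, 5]:
        f.write(str(number) + '\n')

# Step 2: add more lines with 'a' mode; the existing lines are kept.
with open(path, 'a', encoding='utf-8') as f:
    for number in [1, 6, 8, 9, 10]:
        f.write(str(number) + '\n')

# Read everything back and convert each line to an integer.
with open(path, 'r', encoding='utf-8') as f:
    numbers = [int(line) for line in f]

print(numbers)
```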

### Exercise 5-2.
- Write & save the data parsed & encoded in Exercise 5-1 into a new text file ```pitcher_stats_new.txt```
- The new file should look the same as the original dataset

In [9]:
## Your answer
with open('pitcher_stats.txt', 'r', encoding = 'utf-8') as f:
    data = [x.split('/') for x in f.read().split('\n')]

for i in range(len(data)):
    print(data[i])
    data[i] = '/'.join(data[i])     # join all elements in list with '/' between each element
    print(data[i])
    
with open('pitcher_stats_new.txt', 'w', encoding = 'utf-8') as f:
    for row in data:
        f.write(row + '\n')

['Kershaw', '18', '4', '2.31']
Kershaw/18/4/2.31
['Jansen', '5', '0', '1.32']
Jansen/5/0/1.32
['Wood', '16', '3', '2.72']
Wood/16/3/2.72
['Hill', '12', '8', '3.32']
Hill/12/8/3.32
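
The answer above joins values that are still strings. If the rows have already been encoded (integers and floats, as in Exercise 5-1), `join()` raises a `TypeError` unless each value is converted back with `str()`. A sketch of that variant, with the encoded rows typed in by hand and the file written to a temporary folder:

```python
import os
import tempfile

# Encoded rows as produced in Exercise 5-1; typed in by hand for illustration.
data = [['Kershaw', 18, 4, 2.31], ['Jansen', 5, 0, 1.32],
        ['Wood', 16, 3, 2.72], ['Hill', 12, 8, 3.32]]

path = os.path.join(tempfile.mkdtemp(), 'pitcher_stats_new.txt')

with open(path, 'w', encoding='utf-8') as f:
    for row in data:
        # convert every value to a string before joining with '/'
        f.write('/'.join(str(value) for value in row) + '\n')

with open(path, 'r', encoding='utf-8') as f:
    written = f.read()

print(written)
```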


### Exercise 5-3.
- Parse & save the string below into a new text file ```zen_of_python.txt```
- Split the string by **'.'** and save each sentence as one line in the text file

```"Beautiful is better than ugly. Explicit is better than implicit. Simple is better than complex. Complex is better than complicated. Flat is better than nested. Sparse is better than dense. Readability counts. Special cases aren't special enough to break the rules. Although practicality beats purity. Errors should never pass silently. "```

In [9]:
## Your answer
data = "Beautiful is better than ugly. Explicit is better than implicit. Simple is better than complex. Complex is better than complicated. Flat is better than nested. Sparse is better than dense. Readability counts. Special cases aren't special enough to break the rules. Although practicality beats purity. Errors should never pass silently."
data = data.split('. ')

with open('zen_of_python.txt', 'w', encoding = 'utf-8') as f:
    for sentence in data:
        f.write(sentence + '\n')

### Exercise 5-4.
- Open ```zen_of_python.txt``` and append the contents of the list below, saving each element of the list as one line

```zen = ["Unless explicitly silenced.", "In the face of ambiguity, refuse the temptation to guess.", "There should be one-- and preferably only one --obvious way to do it.", "Although that way may not be obvious at first unless you're Dutch. Now is better than never.",  "Although never is often better than *right* now.", "If the implementation is hard to explain, it's a bad idea.", "If the implementation is easy to explain, it may be a good idea."]```

In [10]:
## Your answer
zen = ["Unless explicitly silenced.", "In the face of ambiguity, refuse the temptation to guess.", "There should be one-- and preferably only one --obvious way to do it.", "Although that way may not be obvious at first unless you're Dutch. Now is better than never.",  "Although never is often better than *right* now.", "If the implementation is hard to explain, it's a bad idea.", "If the implementation is easy to explain, it may be a good idea."]

with open('zen_of_python.txt', 'a', encoding = 'utf-8') as f:
    for sentence in zen:
        f.write(sentence + '\n')