## Reading from files and writing to files
### BIOINF 575 - Fall 2022

In [None]:
amino_acids = {'Ser': 'Serine', 'Lys': 'Lysine', 'Ala': 'Alanine', 'Leu': 'Leucine'}

To add a new dictionary element, index the dictionary with the new key and assign the value, or use the `update` method.
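
A minimal sketch of both approaches, reusing the `amino_acids` dictionary from the cell above. The added entries (`Gly`, `Val`, `Pro`) are illustrative choices and not part of the original notes.

```python
amino_acids = {'Ser': 'Serine', 'Lys': 'Lysine', 'Ala': 'Alanine', 'Leu': 'Leucine'}

# 1. index with the new key and assign the value
amino_acids['Gly'] = 'Glycine'

# 2. update() adds one or more key-value pairs taken from another dictionary
amino_acids.update({'Val': 'Valine', 'Pro': 'Proline'})

print(amino_acids)
```

If a key already exists, both approaches overwrite its value instead of adding a second entry.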

In [None]:
# reading user input: the input() function
# reads one line of text and returns it as a string (nothing is evaluated)
res = input()
while res != "STOP":
    print("input", res)
    res = input()

### Files and file access from Python

####  A file is a named location on disk to store related information 
#### It is used to permanently store data in a non-volatile memory (e.g. hard disk)
#### https://www.programiz.com/python-programming/file-operation <br><br>

#### open – builtin function to open a file for reading or writing<br>
    
```python
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
  
fileObj = open(fileName, 'r') # open file for reading ('r+' also allows writing)
fileObj = open(fileName, 'w') # open file for writing ('w+' also allows reading)
fileObj = open(fileName, 'a') # open file for appending ('a+' also allows reading)
```

| Parameter | Condition | Description                                                       |
|-----------|-----------|-------------------------------------------------------------------|
| file      | Required  | The pathname (absolute or relative to the current working directory) of the file to be opened                                      |
| mode      | Optional  | Specifies the mode you want to open the file in                   |
| buffering | Optional  | Sets the buffering policy - how much to read/write from/to the disk at once                                        |
| encoding  | Optional  | Specifies encoding                                                |
| errors    | Optional  | Specifies different error handling scheme                         |
| newline   | Optional  | Controls how universal newlines mode works                        |
| closefd   | Optional  | If False, keeps the underlying file descriptor open when the file is closed (only allowed when `file` is a file descriptor) |
| opener    | Optional  | A custom opener used for low-level I/O operations                 |

#### Mode parameter options

| Character | Meaning                                                         |
|-----------|-----------------------------------------------------------------|
| 'r'       | open for reading (default)                                      |
| 'w'       | open for writing, truncating the file first                     |
| 'x'       | open for exclusive creation, failing if the file already exists |
| 'a'       | open for writing, appending to the end of file if it exists     |
| 'b'       | binary mode                                                     |
| 't'       | text mode (default)                                             |
| '+'       | open for updating (reading and writing)                         |

The default mode is 'r' (open for reading text, a synonym of 'rt'). Modes 'w+' and 'w+b' open and truncate the file. Modes 'r+' and 'r+b' open the file with no truncation.
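
The following sketch shows how the `'w'`, `'a'`, `'r'`, and `'x'` modes behave on the same file. It works in a temporary directory so that no existing file is touched; the file name `modes_demo.txt` is a made-up example.

```python
import os
import tempfile

# work in a temporary directory so no real files are touched
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical file name

    with open(path, "w") as f:      # 'w' creates the file (or truncates it)
        f.write("first\n")

    with open(path, "a") as f:      # 'a' keeps existing content and appends
        f.write("second\n")

    with open(path, "r") as f:      # 'r' reads; the file must already exist
        after_append = f.read()

    with open(path, "w") as f:      # 'w' again: previous content is discarded
        f.write("third\n")

    with open(path) as f:           # mode defaults to 'r'
        after_truncate = f.read()

    try:
        open(path, "x")             # 'x' refuses to overwrite an existing file
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True

print(repr(after_append))
print(repr(after_truncate))
print(exclusive_failed)
```

The `with` statement used here closes each file automatically at the end of its block.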

https://docs.python.org/3/library/functions.html#open

https://www.learnbyexample.org/python-open-function/



* ##### The `file` parameter must be a string or reference to one
    * typically, that would be the file name (if the file is in the current directory) or the file path (if the file is somewhere else in the file system)

* ##### The file object is iterable by line/buffer
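
As a small illustration of iterating over a file object, the sketch below writes three short lines to a temporary file and then loops over the file line by line. The file name and the sequence-like content are invented for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "lines_demo.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("ATGC\nGGTA\nTTAC\n")

    lengths = []
    with open(path) as f:
        for line in f:                      # one line per iteration, newline included
            lengths.append(len(line.rstrip("\n")))

print(lengths)
```

Only one line is held in memory at a time, which is why this pattern also suits large files.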


In [None]:
# buffering default is -1 so it takes the system buffer size

import io
print(io.DEFAULT_BUFFER_SIZE)

#### What is encoding?

From: https://www.w3.org/International/questions/qa-what-is-encoding



Basically, you can visualise it by assuming that all characters are stored in computers using a special code, like the ciphers used in espionage. 
A character encoding provides a key to unlock (i.e. crack) the code. 
It is a set of mappings between the bytes in the computer and the characters in the character set. 
Without the key, the data cannot be understood - all or part of the data looks just like strange sequences of characters.   

Data that should look like this:     
<img src = https://www.w3.org/International/questions/qa-what-is-encoding-data/mojibake1.gif />

Looks like this:        
<img src = https://www.w3.org/International/questions/qa-what-is-encoding-data/mojibake2.gif />
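
The same effect can be reproduced in Python. This is a minimal sketch with an arbitrary example string: the text is encoded as UTF-8 and then decoded once with the right encoding and once with the wrong one (Latin-1).

```python
text = "Zürich café"                    # arbitrary example text
encoded = text.encode("utf-8")          # text -> bytes

right_key = encoded.decode("utf-8")     # same encoding: original text is recovered
wrong_key = encoded.decode("latin-1")   # different encoding: garbled characters

print(right_key)
print(wrong_key)
```

Latin-1 maps every byte to one character, so decoding does not fail; it silently produces the wrong characters.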

===============

The `codecs` module defines base classes for standard Python codecs (encoders and decoders) and provides access to the internal Python codec registry, which manages the codec and error handling lookup process. Most standard codecs are text encodings, which encode text to bytes (and decode bytes to text), but there are also codecs provided that encode text to text, and bytes to bytes. Custom codecs may encode and decode between arbitrary types, but some module features are restricted to be used specifically with text encodings or with codecs that encode to bytes.
https://docs.python.org/3/library/codecs.html

In [None]:
# to explore more information about the codecs module 
# you can run the following commands

# import codecs
# help(codecs)
# dir(codecs)

##### ASCII code table
https://www.ascii-code.com

#### Python code to demonstrate str.encode() and bytes.decode() - adapted from:
https://www.geeksforgeeks.org/python-strings-decode-method/

In [None]:
# Python code to demonstrate
# encode() and decode()

# initializing string
text_to_encode = "Symbol " + chr(166) + " empty set: " + chr(216)
print("Text to encode:")
print(text_to_encode)

# encoding string
encoded_text = text_to_encode.encode(encoding='utf8')

# printing the encoded string
print("\nThe encoded text is:")
print(encoded_text)

# printing the original decoded string
print("\nThe decoded text is:")
print(encoded_text.decode('utf8', 'strict'))
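
The second argument of `decode` (`'strict'` in the cell above) selects the error handler, which is the same idea as the `errors` parameter of `open`. The sketch below compares three standard handlers on a byte string that is deliberately not valid UTF-8; the bytes are an invented example.

```python
raw = b"caf\xe9"   # 'café' encoded as Latin-1; this is not valid UTF-8

try:
    raw.decode("utf8", "strict")            # 'strict' raises on invalid bytes
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

replaced = raw.decode("utf8", "replace")    # invalid bytes become U+FFFD
ignored = raw.decode("utf8", "ignore")      # invalid bytes are dropped

print(strict_failed, repr(replaced), repr(ignored))
```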



____
### Now let's open a file

```python
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
  
fileObj = open(fileName, 'r') # open file for reading ('r+' also allows writing)
fileObj = open(fileName, 'w') # open file for writing ('w+' also allows reading)
fileObj = open(fileName, 'a') # open file for appending ('a+' also allows reading)
```

In [None]:
# open a file 
# in read mode (the default), a missing file raises FileNotFoundError

test_file = open("test.txt")

In [None]:
# open a file 
# if it is write mode and the file does not exist it will be created

test_file = open("test.txt", mode = "w")

In [None]:
# check out the type

type(test_file)

In [None]:
# display the object 

test_file

In [None]:
# seems to be a lazy-loading object - let's try to list it
# this raises io.UnsupportedOperation because the file is open for writing only

# listing only works when the file is opened for reading, but
# do not try this for very large files:
# each line will be an item in the list

list(test_file)

In [None]:
# check out what this object can do
# display only methods without _ from the dir list

for elem in dir(test_file):
    if "_" not in elem:
        print(elem)

#### Write text to the file - use the method write

```python
help(test_file.write)
```

```
Help on built-in function write:

write(text, /) method of _io.TextIOWrapper instance
    Write string to stream.
    Returns the number of characters written (which is always equal to
    the length of the string).
```

In [None]:
# Writes text as is, if you want to write on a new line you have to write "\n"

test_file.write("Hello! ")

In [None]:
# make sure to close the file when done
# once closed - you have to open it again to read from it or write to it

test_file.close()

In [None]:
# the file is closed, so this raises ValueError: I/O operation on closed file
test_file.write("Writing some text.\n")

<b>Open for append and Write</b>

In [None]:
# open file and write lines into a file 
# buffering = 1 will write to the disk as soon as it finds a newline character

test_file = open("test.txt", mode = "a", buffering = 1)
test_file.write("Writing some text.\n")


In [None]:
test_file.write("Writing another line (unfinished).")


In [None]:
test_file.write("Still writing on the second line.\nAnd starting the third. ")

In [None]:
test_file.write("DONE on the third line.")

<b>Close file</b>

In [None]:
# close() flushes any buffered text to disk and releases the file
help(test_file.close)
test_file.close()


<b>Open file and then Read file content using the `read` method</b>
* by default all content is read (size = -1)
* if set to a number other than -1, the number assigned to size is how many characters will be read
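
A short sketch of the `size` argument, using a temporary file with invented content. Each call continues from where the previous one stopped.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "read_demo.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("Hello! Writing some text.\n")

    with open(path) as f:
        first = f.read(6)    # the first 6 characters
        second = f.read(8)   # the next 8 characters, starting at position 6
        rest = f.read()      # size = -1: everything that is left

print(repr(first), repr(second), repr(rest))
```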

In [None]:
help(test_file.read)

In [None]:
# open file and read file contents

# do not try this for very large files 
# all file content will be set in a string

test_file = open("test.txt", "r")
res = test_file.read()
print(res)
test_file.close()



<b>Read lines - use the `readlines` method to read all lines in a list</b>

In [None]:
# do not try this for very large files 
# each line will be an item in the list

test_file = open("test.txt", "r")
res = test_file.readlines()
print(res)
test_file.close()

In [None]:
# same can be achieved with the list function

# do not try this for very large files 
# each line will be an item in the list

test_file = open("test.txt", "r")
list(test_file)
test_file.close()

<b> The reading will continue from where it left off.</b>


In [None]:
test_file = open("test.txt", "r")
test_file.read() # this reads all
test_file.read() # nothing left to read

<b>Go to a position - use the `seek` method - and start reading from there</b>

In [None]:
# seek
test_file.seek(10)
res = test_file.readline()
print(res)

<b>Return current position - use the `tell` method</b>

In [None]:
# tell - check current position
print(test_file.tell())


In [None]:
# read 4 more characters
res = test_file.read(4)
print("Characters read:", res)

# check current position again
test_file.tell()

In [None]:
# remember to close the file
test_file.close()

### Context manager

<b>with: gives code a context</b>

The `with` statement is most useful when paired with resources that must be released after use, such as open files or database connections.

In [None]:
# Without with
test_file = open("test.txt",'r')
print(test_file.read())
test_file.close()

In [None]:
# With with :)
with open("test.txt",'r') as test_file:
    print(test_file.read())

The file is opened and processed. <br>
As soon as you exit the with statement, <b>the file is closed automatically</b>.
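
The automatic closing can be checked through the file object's `closed` attribute. The sketch below uses a temporary file with a made-up name and also covers the case where the block is left because of an error.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "with_demo.txt")  # hypothetical file name

    with open(path, "w") as demo_file:
        demo_file.write("some text\n")
        open_inside = not demo_file.closed   # still open inside the block

    closed_after = demo_file.closed          # closed once the block ends

    # the file is also closed when the block exits because of an error
    try:
        with open(path) as demo_file:
            raise ValueError("simulated failure")
    except ValueError:
        closed_after_error = demo_file.closed

print(open_inside, closed_after, closed_after_error)
```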

### <font color = "red">Exercise</font>:   

Open a file `exercise1.txt` and add 10 lines in a for loop.   
Each line should contain: the text Line and then a space and then the number (the number for the first line would be 1).    

`exercise1.txt` should look like:      
```
Line 1       
Line 2       
Line 3       
Line 4       
Line 5       
Line 6       
Line 7       
Line 8       
Line 9      
Line 10
```

### <font color = "red">Exercise</font>:   

Open the file `exercise1.txt`, read each line, extract its number, and replace it with `number` * 2 - 1. 
For each line, write "Line " and the new `number` on a line in a new file `exercise2.txt`, then add a new line containing the text "---".

`exercise2.txt` should look like:      
```
Line 1     
---       
Line 3      
---      
Line 5      
---      
Line 7      
---      
Line 9      
---      
Line 11      
---      
Line 13      
---      
Line 15      
---      
Line 17      
---      
Line 19      
---      
```

#### Using a function to process file lines

In [None]:
# parsing files
def parse_line(line):
    return line.strip().split(" ")

with open("test.txt",'r') as test_file:
    line = test_file.readline()
    while line != "":
        print(parse_line(line))
        line = test_file.readline()
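
Because a file object can be iterated line by line, the same parsing can also be written with a `for` loop or a list comprehension. This is one possible sketch; it uses a temporary file with invented content so that it runs on its own.

```python
import os
import tempfile

def parse_line(line):
    # remove surrounding whitespace, then split on single spaces
    return line.strip().split(" ")

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "parse_demo.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("Ser Serine\nLys Lysine\n")

    with open(path) as f:
        parsed = [parse_line(line) for line in f]   # one parsed list per line

print(parsed)
```

The loop ends on its own at the end of the file, so no explicit check for an empty string is needed.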

https://www.tutorialspoint.com/python/python_files_io.htm

https://www.tutorialspoint.com/python/file_methods.htm