<a href="https://colab.research.google.com/github/doi-shigeo/KMITL-CE-Programming2/blob/main/Programming2_Week02.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# File Input/Output

You use many kinds of files through the operating system every day: text files, Office documents, archives, databases, and so on.

Once you create a file or database, it is generally preserved in the file system even if you turn off your PC, until you intentionally delete it.

## Simple File Input/Output 

One of the easiest ways to store data is to use a file. You can see files through Explorer (on Windows). A file has a name (which generally includes an extension) and a byte sequence. An operating system identifies the file type by the extension and by a part of the byte sequence (called the "magic number"; we'll explain it in Reverse Engineering).
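As a small illustration of the magic number idea, the sketch below checks whether a file starts with the 8-byte PNG signature, regardless of its extension. The helper name `looks_like_png` and the demo file names are made up for this example, and it uses `open()`, which is explained in the rest of this section.

```python
import os
import tempfile

# The 8-byte signature that every PNG file starts with.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def looks_like_png(path):
    """Return True if the file at `path` starts with the PNG signature."""
    fp = open(path, "rb")
    header = fp.read(len(PNG_SIGNATURE))
    fp.close()
    return header == PNG_SIGNATURE

# Create two small demo files in a temporary directory (hypothetical names).
demo_dir = tempfile.mkdtemp()
png_bytes_path = os.path.join(demo_dir, "picture.txt")  # .txt name, PNG bytes
text_bytes_path = os.path.join(demo_dir, "notes.png")   # .png name, plain text

fp = open(png_bytes_path, "wb")
fp.write(PNG_SIGNATURE + b"rest of the data")
fp.close()

fp = open(text_bytes_path, "wb")
fp.write(b"just some text")
fp.close()

print(looks_like_png(png_bytes_path))   # the content decides, not the name
print(looks_like_png(text_bytes_path))
```

The extension is only a hint; the first bytes of the content give a second, independent hint.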

You will often edit text files. Below, you will learn how to access a file from Python. Accessing (reading or writing) a file takes three steps.

1. Open a file
2. Read/Write (Access) the file
3. Close the file

### Open a file

```
fileobj = open(filename, mode)
```

`fileobj` is the "file object" returned by the `open()` function. You use `fileobj` to access the file.

`filename` is the name of the file, given as a string.

`mode` consists of two letters.
The first letter can take the following values:
- `r`: Read mode. If the file does not exist, an exception is thrown (exceptions will be explained later).
- `w`: Write mode. If the file `filename` already exists, it is overwritten.
- `x`: Write mode. It only works if the file `filename` doesn't exist yet.
- `a`: Append mode. Data is appended at the end of `filename`.
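The four values of the first letter can be tried out with a short sketch. It works in a temporary directory so that it does not touch your own files; the file name `mode_demo.txt` is made up, and the `try`/`except` lines are only there to show which exception each case raises.

```python
import os
import tempfile

demo_dir = tempfile.mkdtemp()
path = os.path.join(demo_dir, "mode_demo.txt")  # hypothetical file name

# 'r' on a missing file raises FileNotFoundError.
try:
    open(path, "rt")
    missing_raises = False
except FileNotFoundError:
    missing_raises = True

# 'x' creates the file because it does not exist yet.
fp = open(path, "xt")
fp.write("first\n")
fp.close()

# A second 'x' on the same name raises FileExistsError.
try:
    open(path, "xt")
    exists_raises = False
except FileExistsError:
    exists_raises = True

# 'a' adds to the end of the existing content.
fp = open(path, "at")
fp.write("second\n")
fp.close()

fp = open(path, "rt")
after_append = fp.read()
fp.close()

# 'w' discards the old content before writing.
fp = open(path, "wt")
fp.write("replaced\n")
fp.close()

fp = open(path, "rt")
after_write = fp.read()
fp.close()

print(missing_raises, exists_raises)
print(repr(after_append))
print(repr(after_write))
```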

The second letter specifies the type of data. If you omit it, text mode is used.
Roughly speaking, "text" in the narrow (ASCII) sense means that the MSB (Most Significant Bit) of every byte is 0, in other words every code is between 0 and 127 in decimal, while "binary" means that the MSB can be 0 or 1.
- `b`: Binary Mode
- `t`: Text Mode

The difference between "Binary mode" and "Text mode" is that
binary mode returns the data as raw bytes, while text mode decodes the data with a specified encoding (the default depends on the platform; it is UTF-8 on Google Colab).
When you print binary data (a `bytes` object), you see the prefix `b'`. When you print text (a `str`), there is no prefix. This lets you tell whether the data is binary or text.
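The difference can be seen by reading the same bytes in both modes. This is a minimal sketch: the file name is made up, and the encoding is passed explicitly so that the result does not depend on the platform default.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mode_compare.txt")  # hypothetical name

# Write known bytes first: "café" plus a newline, encoded as UTF-8.
fp = open(path, "wb")
fp.write("café\n".encode("utf-8"))
fp.close()

# Binary mode returns the raw bytes.
fp = open(path, "rb")
as_bytes = fp.read()
fp.close()

# Text mode decodes the bytes into a string.
fp = open(path, "rt", encoding="utf-8")
as_text = fp.read()
fp.close()

print(type(as_bytes), as_bytes)
print(type(as_text), repr(as_text))
print(len(as_bytes), len(as_text))  # "é" takes two bytes but is one character
```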

Once you open a file, you must close it before the end of the program.
You should use the `with` statement (explained later) if possible.
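As a preview, the sketch below compares an explicit `close()` with a `with` block, using the `closed` attribute of the file object to check the state. The file name is made up for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "close_demo.txt")  # hypothetical name

# Explicit close: the file object reports whether it is closed.
fp = open(path, "wt")
fp.write("hello\n")
closed_before = fp.closed
fp.close()
closed_after = fp.closed

# `with` closes the file automatically when the block ends.
with open(path, "rt") as fp2:
    content = fp2.read()
closed_by_with = fp2.closed

print(closed_before, closed_after, closed_by_with)
print(repr(content))
```

With `with`, the file is closed even if an error occurs inside the block, which is why it is usually preferred.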







### Access the file

#### Read a file

You can use the following methods of the file object to read a file. They work in both binary mode and text mode:
- `read()`: Read all the data of the file.
- `read(size)`: Read at most `size` bytes (binary mode) or characters (text mode) of the file.
- `readline()`: Read one line of the file. Each line ends with `'\n'`.
- `readlines()`: Read all the lines of the file and return them as a list, one element per line.
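The four calls listed above can be compared on a small file. This sketch writes its own three-line demo file (the name is made up), so it runs without uploading anything.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "read_demo.txt")  # hypothetical name

fp = open(path, "wt")
fp.write("alpha\nbeta\ngamma\n")
fp.close()

# read(): the whole file as one string.
fp = open(path, "rt")
everything = fp.read()
fp.close()

# read(size), then readline(): readline() continues from the current position.
fp = open(path, "rt")
first_three = fp.read(3)
rest_of_line = fp.readline()
fp.close()

# readlines(): a list with one string per line, each keeping its '\n'.
fp = open(path, "rt")
lines = fp.readlines()
fp.close()

print(repr(everything))
print(repr(first_three), repr(rest_of_line))
print(lines)
```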

Below is a program that reads and prints the file named "rfc0001.txt".
Before running the program, you must upload "rfc0001.txt" to the Google Colab session.
The uploaded file will be deleted when you close Google Colab.

1) Click the "Explorer" icon in the leftmost pane (the 4th icon from the top).
2) Click the upload icon near "Files".
3) A file chooser (Explorer on Windows) opens. Select a file to upload and click "Open".

Example 1: The difference between "Binary mode" and "Text mode". Run the two programs below and compare their output.

In [None]:
# Binary Mode
fp = open("rfc0001.txt", "rb") # Open the file in read and binary mode
line = fp.readline()
print(line)
fp.close()


b'\n'


In [None]:
# Text Mode
fp = open("rfc0001.txt", "rt") # Open the file in read and text mode
line = fp.readline()
print(line)
fp.close()






Note that the binary-mode output has the prefix `b`, which marks it as a `bytes` object; the text-mode output has no prefix.

Example 2: Reading a whole file.


Read the whole data at once (the same lines also work in binary mode).
This program is simple but works well in most cases.

In [None]:
fp = open("rfc0001.txt", "rt")
txt = fp.read()
fp.close()
print(txt)

Read the whole data of a text file as binary.
You must know the encoding of the file and decode the data accordingly.

In [8]:
fp = open("rfc0001.txt", "rb")
data = fp.read() # `data` is a bytes object (the name `bin` would hide the built-in function bin())
txt = data.decode('UTF-8') # 'UTF-8' can be omitted because the default encoding of decode() is UTF-8.
print(txt)
fp.close()

Hello, nice to meet youMy name is Anonymous


Read the data in fixed-size chunks and join the chunks into the whole data at the end.
Reading chunk by chunk is useful in environments with limited memory when each chunk is processed as it arrives.

In [None]:
buf = ''
chunk_size = 256
fp = open("rfc0001.txt", "rt")
while True:
  chunk = fp.read(chunk_size)
  if not chunk: # if chunk is equal to '' (empty string), then 'not chunk' will be true.
    break
  buf += chunk
fp.close()
print(buf)
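When memory really is limited, each chunk can be processed and then dropped, without building up `buf`. The sketch below only counts bytes and chunks as a stand-in for real processing; the file name and the 1000-byte content are made up for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "chunk_demo.bin")  # hypothetical name

# Create a 1000-byte demo file.
fp = open(path, "wb")
fp.write(b"x" * 1000)
fp.close()

chunk_size = 256
total_bytes = 0
chunk_count = 0

fp = open(path, "rb")
while True:
    chunk = fp.read(chunk_size)
    if not chunk:  # b'' (empty bytes) means the end of the file
        break
    # Process the chunk here; only one chunk is held in memory at a time.
    total_bytes += len(chunk)
    chunk_count += 1
fp.close()

print(total_bytes, chunk_count)
```

The last chunk is shorter than `chunk_size` because `read(size)` returns at most `size` bytes.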

Read the data line by line and join the lines into the whole data at the end.
Be careful when you mix `read(size)` and `readline()` on the same file object: each call continues from where the previous one stopped.

In [None]:
buf = ''
fp = open("rfc0001.txt", "rt")
while True:
  line = fp.readline()
  if not line: # if line is equal to '' (empty string), then 'not line' will be true.
    break
  buf += line
fp.close()
print(buf)
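A file object opened in text mode can also be used directly in a `for` loop, which yields one line per iteration. The sketch below is an alternative to the `while True` / `readline()` loop above, not a replacement for it; the demo file and its content are made up.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines_demo.txt")  # hypothetical name

fp = open(path, "wt")
fp.write("one\ntwo\nthree")  # the last line has no trailing newline
fp.close()

line_count = 0
longest = ""

fp = open(path, "rt")
for line in fp:  # each `line` still includes its '\n', if it has one
    line_count += 1
    stripped = line.rstrip("\n")
    if len(stripped) > len(longest):
        longest = stripped
fp.close()

print(line_count, longest)
```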

#### Write a file

To write to a file, you must open it before writing.
When you have finished using the file, you must close it.

The following example opens the file "output.txt" for writing, either in write mode or in append mode (use one of the two lines).
```
fp = open("output.txt", "wt") # output.txt is overwritten
fp = open("output.txt", "at") # output.txt is open for appending
```

To write text or binary data into a file, use `write(text_or_binary)`. `write()` does not add a newline, so if you want one you must put `'\n'` explicitly.
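A short sketch of this behaviour: two `write()` calls in a row produce one joined line unless `'\n'` is written explicitly. In text mode, `write()` also returns the number of characters written. The file name is made up for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "write_demo.txt")  # hypothetical name

fp = open(path, "wt")
n1 = fp.write("Hello")    # no newline is added automatically
n2 = fp.write("World\n")  # the newline is written only because it is in the string
fp.close()

fp = open(path, "rt")
content = fp.read()
fp.close()

print(n1, n2)
print(repr(content))
```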



Example: Write strings into the text files "something.txt", "something_with_newline.txt" and "something_binary.txt".

You can confirm the new files by clicking the folder icon in the leftmost pane.

(1) Write a string to a file

In [5]:
str_1 = "Hello, nice to meet you"
str_2 = "My name is Anonymous"

# When you want to output a string, you should open the file in text mode.
fp = open("something.txt", "wt")
fp.write(str_1)
fp.write(str_2)
fp.close()


(2) Write a string followed by a newline by adding '\n' at the end of the line

In [None]:
str_1 = "Hello, nice to meet you"
str_2 = "My name is Anonymous"

fp2 = open("something_with_newline.txt", "wt")
fp2.write(str_1 + '\n')
fp2.write(str_2)
fp2.close()

(3) Write a string in binary mode (you need to convert the string to **bytes**, which is the binary data type).
The default encoding of `encode()` in Python 3 is "UTF-8".

In [7]:
str_1 = "Hello, nice to meet you"
str_2 = "My name is Anonymous"

fp3 = open("something_binary.txt", "wb") # open with binary mode
#fp3.write(str_1 + '\n') # If you remove the # at the beginning, this line raises an error. When a file is opened in binary mode, the argument of write() must be `bytes`.
fp3.write(str_1.encode('UTF-8')) # To write in binary mode, you must convert the string to bytes, here by encoding it as UTF-8 explicitly
fp3.write(str_2.encode()) # encode() is the same as encode('UTF-8')
fp3.close()
#fp3.write(str_2)
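The error mentioned in the commented-out line above can be observed safely by catching it. This is a minimal sketch with a made-up file name; it shows that a file opened in binary mode rejects a `str` with `TypeError` and accepts the encoded bytes.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "binary_demo.bin")  # hypothetical name

fp = open(path, "wb")

# Writing a str to a binary-mode file raises TypeError.
try:
    fp.write("Hello\n")
    str_rejected = False
except TypeError:
    str_rejected = True

# Writing bytes works; write() returns the number of bytes written.
written = fp.write("Hello\n".encode("utf-8"))
fp.close()

fp = open(path, "rb")
data = fp.read()
fp.close()

print(str_rejected, written)
print(data, repr(data.decode("utf-8")))
```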

### Practice: Encoding Conversion

There are several text encodings for the Thai language. As far as I know, "TIS-620", "ISO-8859-11", and "UTF-8" are used to represent Thai text on a computer.

Write a program that converts text from UTF-8 to ISO-8859-11. You can assume that the input file is encoded in UTF-8.

This program does a small part of what the `iconv` utility does. https://en.wikipedia.org/wiki/Iconv
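As a hint for what "converting the encoding" means, the sketch below encodes the same two Thai letters in both encodings and compares the bytes. It works on a string only and leaves the file handling to you. The codec name `"iso8859_11"` is Python's name for ISO-8859-11; the sample text is made up.

```python
text = "\u0e01\u0e02"  # the Thai letters KO KAI and KHO KHAI

utf8_bytes = text.encode("utf-8")        # three bytes per Thai letter
thai_bytes = text.encode("iso8859_11")   # one byte per Thai letter

print(utf8_bytes, len(utf8_bytes))
print(thai_bytes, len(thai_bytes))

# Decoding with the matching codec gives back the same string.
round_trip = thai_bytes.decode("iso8859_11")
print(round_trip == text)
```

The string itself has no encoding; the encoding only matters at the moment the string is turned into bytes or back.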

In [10]:
# When you want to read a string, you should open the file in text mode.
fp = open("text_UTF-8.txt", "rt") # You can change the filename

# Fill in the code line(s) to read text_UTF-8.txt as UTF-8 encoded text.

fp.close()

# When you want to write in an encoding other than UTF-8, one way is to open the output file in binary mode.
fp = open("text_ISO-8859-11.txt", "wb")
# Fill in the code line(s) to write the text in the string in ISO-8859-11 encoding.
fp.close()