# File I/O & Iterator 
***

## Introduction:  
  
In this notebook you will learn **[file I/O (input and output)](#fileIO)** and **[iterators and generators](#Iter)**.

***  
<span id="fileIO">  
# Topic-1. File Input and Output
## Table of Contents
* 1.1 [Background Knowledge](#BK)  
* 1.2 [Preparation](#Pre)  
* 1.3 Examples  
    * 1.3.1 [Example-1 open(), close(), read()](#E1)  
    * 1.3.2 [Example-2 readline(), readlines()](#E2)  
    * 1.3.3 [Example-3 with open() as file](#E3)  
    * 1.3.4 [Example-4 write()](#E4)  
* 1.4 Exercise  
    * 1.4.1 [Exercise-1 Make a copy of a file by Read and Write](#EX1)  
    * 1.4.2 [Exercise-2 Comparing Different Copying Strategies](#EX2)  
    * 1.4.3 [Exercise-3 Comparing 2 Files](#EX3)
</span>  
***
<span id="BK"> 
## 1.1 Background Knowledge  
</span>  

We already know how to use `input` and `print` to interact with the keyboard and the screen. It is just as important to be able to read from and write to files.

File I/O in Python is built on the following function and file methods:

1. `open()`
2. `close()`
3. `read()`
4. `readline()`
5. `readlines()`
6. `write()`
***
  
<span id="Pre">  
## 1.2 Preparation (Example Using java.txt)  
</span>  
You can find a file named `java.txt` in the folder. We will use it first to demonstrate file I/O.  
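If `java.txt` is missing, you can recreate it. The following minimal sketch writes the text shown in the outputs of the examples below, keeping its original wording; it assumes those outputs show the complete file (the `readlines()` output further down suggests the last line has no trailing newline).

```python
import os

# Text copied from the outputs shown in this notebook (assumed to be the complete file)
JAVA_TXT = (
    "Java has significant advantages over other languages and environments that make it suitable for just about any programming task.\n"
    "The advantages of Python are as follows:\n"
    "1. Java is easy to learn.\n"
    "2. Java was designed to be easy to use and is therefore easy to write, compile, debug, and learn than other programming languages.\n"
    "3. Java is object-oriented. This allows you to create modular programs and reusable code."
)

if not os.path.exists("java.txt"):   # never overwrite an existing copy
    with open("java.txt", "w") as f:
        f.write(JAVA_TXT)
```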
***
<span id="E1">  
</span>  
  
<span id="review">  
## 1.3.1 Example-1  
</span>  
<span id="read">
</span>
In this example, we will learn `open()`, `close()` and `read()`.  
* `open()` takes **two commonly used arguments**: the first is the *file name* (remember to include the file extension, e.g. `.txt`), and the second is the *mode*:  
<span id="diffe"> 
![openmode](./openmode.png)
        (picture retrieved from www.programiz.com)  
</span>
* `file.close()` closes the file and releases the resources associated with it.  
* `file.read()` reads the content of the file into memory and returns it as a string. Its optional argument is the number of **characters** (not words) to read; if it is omitted, everything up to the end of the file is read.

In [1]:
filename = "java.txt"
file = open(filename, "r")

s = file.read()
print(s)
print(type(s))

file.close()

Java has significant advantages over other languages and environments that make it suitable for just about any programming task.
The advantages of Python are as follows:
1. Java is easy to learn.
2. Java was designed to be easy to use and is therefore easy to write, compile, debug, and learn than other programming languages.
3. Java is object-oriented. This allows you to create modular programs and reusable code.
<class 'str'>


In [36]:
filename = "java.txt"
with open(filename, "r") as file:
    s = file.read(10)
    print(s)
    print(len(s))
    
    s = file.read(10)
    print(s)
    print(len(s))

Java has s
10
ignificant
10
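The second `read(10)` continues where the first one stopped, because the file object remembers its current position. A minimal sketch on a throwaway file (the name `read_position_demo.txt` is only for illustration) shows that `seek(0)` moves the position back to the start:

```python
import os
import tempfile

# Hypothetical demo file, so the sketch does not depend on java.txt
path = os.path.join(tempfile.gettempdir(), "read_position_demo.txt")
with open(path, "w") as f:
    f.write("Java has significant advantages")

with open(path, "r") as file:
    first = file.read(10)    # 'Java has s'
    second = file.read(10)   # continues from the current position: 'ignificant'
    file.seek(0)             # move back to the beginning of the file
    again = file.read(10)    # 'Java has s' once more
    rest = file.read()       # everything after the current position
print(first, second, again, rest, sep="|")
```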


***  
<span id="E2">  
</span>  
<span id="while">  
## 1.3.2 Example-2  
</span>  
In Example-1 we used `read()` to load the whole file into memory. In Example-2 we try two other methods: `readline()` and `readlines()`.  
* `readline()` reads one line per call and returns it as a *string*, including its trailing newline `\n`. At the end of the file it returns an empty string `''`.  
* `readlines()` reads the whole file and returns a *list* whose elements are the lines of the file (each still ending with `\n`, except possibly the last one).

In [2]:
filename = "java.txt"
file = open(filename, "r")

s = file.readline()   
while s: # an empty string (end of file) is falsy, so the loop stops after the last line
    print(s)
    print("-"*30)
    s = file.readline()
else:
    print(s, type(s), len(s)) 
file.close()

Java has significant advantages over other languages and environments that make it suitable for just about any programming task.

------------------------------
The advantages of Python are as follows:

------------------------------
1. Java is easy to learn.

------------------------------
2. Java was designed to be easy to use and is therefore easy to write, compile, debug, and learn than other programming languages.

------------------------------
3. Java is object-oriented. This allows you to create modular programs and reusable code.
------------------------------
 <class 'str'> 0
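The blank lines in the output appear because each line returned by `readline()` still ends with `\n`, and `print()` adds another newline. A common alternative, sketched here on a hypothetical demo file, iterates over the file object directly and strips the newline:

```python
import os
import tempfile

# Hypothetical demo file in the temporary directory
path = os.path.join(tempfile.gettempdir(), "lines_demo.txt")
with open(path, "w") as f:
    f.write("first line\nsecond line\nthird line")

stripped = []
with open(path, "r") as file:
    for line in file:                        # a file object yields one line per iteration
        stripped.append(line.rstrip("\n"))   # drop the newline so print() adds no blank line

for s in stripped:
    print(s)
```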


In [17]:
filename = "java.txt"
file = open(filename, "r")

s = file.readlines()      
print(s)
print(type(s))

file.close()


['Java has significant advantages over other languages and environments that make it suitable for just about any programming task.\n', 'The advantages of Python are as follows:\n', '1. Java is easy to learn.\n', '2. Java was designed to be easy to use and is therefore easy to write, compile, debug, and learn than other programming languages.\n', '3. Java is object-oriented. This allows you to create modular programs and reusable code.']
<class 'list'>
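If the trailing `\n` characters are not wanted, `str.splitlines()` splits the text without keeping them. A short comparison on a hypothetical demo file:

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "splitlines_demo.txt")
with open(path, "w") as f:
    f.write("a\nb\nc")

with open(path, "r") as file:
    with_newlines = file.readlines()             # ['a\n', 'b\n', 'c']
with open(path, "r") as file:
    without_newlines = file.read().splitlines()  # ['a', 'b', 'c']

print(with_newlines)
print(without_newlines)
```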


***  
<span id="E3">  
## 1.3.3 Example-3  
</span> 
We often use `with open(...) as file:` instead of calling `open()` and `close()` ourselves, because it is safer. If an error occurs between `open()` and `close()`, the program stops before `close()` is reached and the file is left open. For example:  
![f](./fault2.png)  
Here the error stops the program before the file is closed, which can leak resources or leave written data unflushed. With `with`, the file is closed automatically when the block ends, even if an exception is raised inside it.
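A minimal sketch of this guarantee: an error raised inside the `with` block (simulated here with a `ValueError`) still leaves the file closed. The second half shows roughly what `with` does for us, written by hand with `try`/`finally`. The file name is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "with_demo.txt")
with open(path, "w") as f:
    f.write("demo")

try:
    with open(path, "r") as file:
        data = file.read()
        raise ValueError("simulated bug")   # stands in for any error inside the block
except ValueError:
    pass

print(file.closed)   # True: leaving the with-block closed the file despite the error

# Roughly the same behavior written with try/finally
file2 = open(path, "r")
try:
    data = file2.read()
finally:
    file2.close()    # runs whether or not read() raised
print(file2.closed)  # True
```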

In [24]:
filename = "java.txt"
with open(filename, "r") as file:
    s = file.read()
    print(s)
    print(type(s))

Java has significant advantages over other languages and environments that make it suitable for just about any programming task.
The advantages of Python are as follows:
1. Java is easy to learn.
2. Java was designed to be easy to use and is therefore easy to write, compile, debug, and learn than other programming languages.
3. Java is object-oriented. This allows you to create modular programs and reusable code.
<class 'str'>


***  
<span id="E4">  
## 1.3.4 Example-4  
</span> 
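To write to a file, open it with mode `"w"` or `"a"` and call `write()`.  
**Remember** [the difference](#diffe) between `"w"` and `"a"`: `"w"` creates the file or erases its existing content, while `"a"` keeps the existing content and appends to the end.  
Also note that `write()` takes **only one** argument, a string, and does not add a newline for you. To write two or more strings in one call, concatenate them:  
`file.write("hello" + "world!")`  
rather than `file.write("hello", "world!")`, which raises a `TypeError`.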

In [4]:
filename = "java2.txt"
with open(filename, "w") as file:     
    file.write("hello world!"+"hello world!\n")   
    file.write("hello world!")

### Result:   
>A new file named `java2.txt` is created with the following content:
![java21](./java21.png) 

In [5]:
filename = "java2.txt"
with open(filename, "a") as file:   
    file.write("hello!")

### Result:   
>The string `hello!` is appended to the end of `java2.txt`:
![java22](./java22.png) 
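To check the effect of both modes without opening the files by hand, this sketch repeats Example-4 on a hypothetical file `java2_demo.txt` in the temporary directory and reads the result back. It also shows that `write()` returns the number of characters written.

```python
import os
import tempfile

# Hypothetical file name, so the real java2.txt is left untouched
path = os.path.join(tempfile.gettempdir(), "java2_demo.txt")

with open(path, "w") as file:   # "w": create the file, or erase it if it already exists
    n = file.write("hello world!" + "hello world!\n")   # returns the number of characters written
    file.write("hello world!")

with open(path, "a") as file:   # "a": keep the existing content and write at the end
    file.write("hello!")

with open(path, "r") as file:
    content = file.read()
print(n)              # 25
print(repr(content))  # 'hello world!hello world!\nhello world!hello!'

with open(path, "w") as file:   # opening with "w" again discards the old content
    file.write("new")
with open(path, "r") as file:
    after_w = file.read()
print(after_w)        # new
```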

***  
<span id="EX1">  
</span>  
<span id="E1"> 
## 1.4.1 Exercise-1: Make a copy of a file by Read and Write
</span>
Define a function `copy_file(input_filename, output_filename)` that copies `input_filename` to `output_filename`. Implement it with each of the following 3 strategies.

1. Read all data from `input_filename` into a string, then write it to `output_filename`.
2. Use a loop: read one character from `input_filename`, write it to `output_filename`, and repeat until the whole file has been copied.
3. Similar to method 2, but read a `batch` of characters at a time, where `batch` is a parameter of the function (for example, `batch=1024`).
  
**Sample code here:**  
```python
def copy_file1(input_filename, output_filename):
    with open(input_filename, "r") as file1:
        with open(output_filename, "w") as file2:
            pass  # replace with code that reads from file1 and writes to file2
copy_file1("java.txt", "java1.txt")
```  
**Expected output:** `java1.txt` is created with the same content as `java.txt`:  
![java1](./java1.png)  
**Review [Example-1](#review) if you need a reminder.**  
***
## 1.4.1(1)-Solution

In [6]:
def copy_file1(input_filename, output_filename):
    with open(input_filename, "r") as file1:
        with open(output_filename, "w") as file2:
            s = file1.read()       
            file2.write(s)
copy_file1("java.txt", "java1.txt")

## 1.4.1(2)-Solution  
Because we read one character at a time and do not know the length of the file in advance, a `while` loop that stops when `read(1)` returns an empty string is a natural choice. You can review the `while` loop used in file I/O [here](#while).

In [15]:
def copy_file2(input_filename, output_filename):
    with open(input_filename, "r") as file1:
        with open(output_filename, "w") as file2:
            s = file1.read(1)
            while s:
                file2.write(s)    # only one character is held in memory at a time
                s = file1.read(1)
copy_file2("java.txt", "java2.txt")

## 1.4.1(3)-Solution  
The idea is the same as in solution (2); only the number of characters passed to `read()` changes.

In [18]:
def copy_file3(input_filename, output_filename, batch = 1024):
    with open(input_filename, "r") as file1:
        with open(output_filename, "w") as file2:
            s = file1.read(batch)
            while s:
                file2.write(s)
                s = file1.read(batch)
copy_file3("java.txt", "java3.txt")
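The standard library already offers this chunked loop: `shutil.copyfileobj(fsrc, fdst, length)` repeatedly reads up to `length` characters from `fsrc` and writes them to `fdst`. A sketch of strategy 3 built on it (the name `copy_file4` and the demo files are hypothetical):

```python
import os
import shutil
import tempfile

def copy_file4(input_filename, output_filename, batch=1024):
    # Same idea as copy_file3: copyfileobj reads and writes `batch` characters at a time
    with open(input_filename, "r") as file1, open(output_filename, "w") as file2:
        shutil.copyfileobj(file1, file2, batch)

# Hypothetical demo files in the temporary directory
src = os.path.join(tempfile.gettempdir(), "copy_src_demo.txt")
dst = os.path.join(tempfile.gettempdir(), "copy_dst_demo.txt")
with open(src, "w") as f:
    f.write("abcdefghijklmnopqrstuvwxyz" * 100)

copy_file4(src, dst)
with open(src) as a, open(dst) as b:
    same = a.read() == b.read()
print(same)  # True
```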

***
<span id="EX2">  
## 1.4.2 Exercise-2: Comparing Different Copying Strategies  
</span>
**Think about which strategy is better in which situations.**  
1. **Preparation:**  
Use the `generate_large_file` function below to generate a large file, then copy it with each of the 3 strategies.  
2. **Comparison:**  
Compare how long each `copy_file` function you defined in [Exercise-1](#EX1) takes to copy this large file.  
**Sample code:**  
```python
import time
t1 = time.time()
copy_file1("large.txt", "large1.txt")
t2 = time.time()
print(t2 - t1)
```   

## 1.4.2-Preparation

In [24]:
def generate_large_file(
    filename = "large.txt", 
    size = 1000*1024*10 # number of alphabet repetitions: 26 characters each, about 266 MB in total
):
    a = "".join(chr(i) for i in range(97, 123))  # "abcdefghijklmnopqrstuvwxyz"
    with open(filename, 'w') as f:
        for i in range(size): 
            f.write(a)
generate_large_file()

## 1.4.2-Solution  


In [25]:
import time

t1 = time.time()

copy_file1("large.txt", "large1.txt")

t2 = time.time()
print(t2 - t1)


3.6432602405548096


In [16]:
import time

t1 = time.time()

copy_file2("large.txt", "large2.txt")

t2 = time.time()
print(t2 - t1)

112.5853660106659


In [19]:
import time

t1 = time.time()

copy_file3("large.txt", "large3.txt", 1024)

t2 = time.time()
print(t2 - t1)

2.4308226108551025


In [20]:
import time

t1 = time.time()

copy_file3("large.txt", "large4.txt", 2048)

t2 = time.time()
print(t2 - t1)

2.3467276096343994


In [21]:
import time

t1 = time.time()

copy_file3("large.txt", "large5.txt", 4096)

t2 = time.time()
print(t2 - t1)

2.28888201713562
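The timing cells above repeat the same pattern. A small helper (the names `time_copy` and `copy_in_batches` are hypothetical) removes the repetition; it uses `time.perf_counter()`, a high-resolution clock intended for measuring short durations. Timings vary between runs and machines, so a single measurement should be read with care.

```python
import os
import tempfile
import time

def time_copy(copy_func, *args):
    """Run copy_func(*args) once and return the elapsed time in seconds."""
    t1 = time.perf_counter()
    copy_func(*args)
    t2 = time.perf_counter()
    return t2 - t1

# Stand-in for copy_file3 so this cell runs on its own
def copy_in_batches(input_filename, output_filename, batch=1024):
    with open(input_filename, "r") as file1, open(output_filename, "w") as file2:
        s = file1.read(batch)
        while s:
            file2.write(s)
            s = file1.read(batch)

# A small hypothetical test file instead of large.txt
src = os.path.join(tempfile.gettempdir(), "timing_src_demo.txt")
with open(src, "w") as f:
    f.write("abcdefghijklmnopqrstuvwxyz" * 10000)

timings = {}
for batch in (1, 1024, 4096):
    dst = os.path.join(tempfile.gettempdir(), "timing_dst_%d.txt" % batch)
    timings[batch] = time_copy(copy_in_batches, src, dst, batch)
    print(batch, timings[batch])
```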


## 1.4.2 Summary of Exercise-2  
**Reading in Chunks Can Improve I/O Efficiency**
1. As the timings show, reading and writing data in **chunks** performs much better than copying one character at a time (about 2.3 to 2.4 s with batches versus about 113 s with single characters). The reason is that every `read()` call has a fixed **startup** cost on top of a cost per character, which we can model as $f(x) = c_0 + c_1x$ for a call that reads $x$ characters. Reading in batches pays the startup cost $c_0$ far less often.  
2. **However**, we cannot always use a single `read()` to load the whole file into memory: when the file is very large, it may not fit into the available memory. Strategy 3 avoids this because it only ever holds one batch in memory, and in these measurements it was even faster than strategy 1 (about 2.3 to 2.4 s versus 3.6 s).  
3. Moreover, a good **batch size** depends on the hardware. Memory pages and disk blocks are usually a multiple of 1 KB (that is, $n \times 1024$ bytes), so batch sizes such as 1024, 2048 or 4096 are common choices. If you are interested, try different batch sizes and run more experiments!  
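The cost model $f(x) = c_0 + c_1x$ from item 1 can be worked through for a whole file. Copying $n$ characters with batch size $x$ needs about $n/x$ calls to `read()`, so under this simplified model the total cost of reading is roughly

$$
T(x) \approx \frac{n}{x}\left(c_0 + c_1 x\right) = \frac{n\,c_0}{x} + c_1 n .
$$

The term $c_1 n$ is the same for every batch size, while the startup term $n c_0 / x$ shrinks as $x$ grows. For `large.txt`, $n = 26 \times 10\,240\,000 = 266\,240\,000$ characters, so $x = 1$ means $266\,240\,000$ calls to `read()` and $x = 1024$ means only $260\,000$. Once $x$ is in the thousands the startup term is already small, which fits the small differences between the 1024, 2048 and 4096 timings. The constants $c_0$ and $c_1$ are not measured here, so this is only a rough model.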
***

<span id="EX3">  
## 1.4.3 Exercise-3: Comparing 2 Files  
</span>

Write a function `compare(file1, file2)` that finds the first position at which the two files differ. It should return that position, or -1 if the two files are identical.

1. Write a function `find_difference(batch1, batch2)` that compares two batches.  
2. Write the function `compare(file1, file2)` with the help of `find_difference(batch1, batch2)`.  

**Review the function `read()` [here](#read) if you need a reminder.**

## 1.4.3-Solution
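1. Each time, read `batch` characters from file1 and from file2; call them `a` and `b`.  
2. If `a == b`, read the next batch. If both files are exhausted, they are identical.  
3. If `a != b`, find the first position where they differ. If one batch is just a shorter prefix of the other (one file ended earlier), the difference is at the end of the shorter batch.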

In [27]:
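def find_difference(batch1, batch2):
    for i in range(min(len(batch1), len(batch2))):
        if batch1[i] != batch2[i]:
            return i
    # no mismatch in the common part: the batches still differ if one is shorter
    if len(batch1) != len(batch2):
        return min(len(batch1), len(batch2))
    return -1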

In [29]:
# Compare 2 files

file1 = "java1.txt"
file2 = "large.txt"

def compare(file1, file2, batch_size = 10):
    with open(file1, "r") as f:
        with open(file2, "r") as g:
            cnt = 0  # number of batches already found equal

            s1 = f.read(batch_size)
            s2 = g.read(batch_size)

            while s1 or s2:    # stop only when both files are exhausted
                if s1 != s2:   # compare the current batch before reading the next one
                    return cnt * batch_size + find_difference(s1, s2)
                cnt += 1
                s1 = f.read(batch_size)
                s2 = g.read(batch_size)
    return -1

compare(file1, file2)
# an output of 0 means the files already differ at the first character

0
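An alternative sketch (not the method of this exercise) connects Exercise-3 with the iterators of Topic-2. The two-argument form `iter(callable, sentinel)` calls `f.read(batch_size)` until it returns `''`, and `itertools.zip_longest` pairs the batches, padding the shorter file with `''` so that a length difference is detected. The name `compare_iter` is hypothetical; like `compare`, it assumes every batch except the last one is full.

```python
import itertools

def compare_iter(file1, file2, batch_size=10):
    """Return the first position where the files differ, or -1 if they are equal."""
    with open(file1, "r") as f, open(file2, "r") as g:
        # iter(callable, sentinel): keep calling read() until it returns ''
        batches1 = iter(lambda: f.read(batch_size), "")
        batches2 = iter(lambda: g.read(batch_size), "")
        # pad the shorter file with '' so a missing batch counts as a difference
        pairs = itertools.zip_longest(batches1, batches2, fillvalue="")
        for cnt, (s1, s2) in enumerate(pairs):
            if s1 != s2:
                for i, (c1, c2) in enumerate(zip(s1, s2)):
                    if c1 != c2:
                        return cnt * batch_size + i
                # no mismatching character: one batch is a prefix of the other
                return cnt * batch_size + min(len(s1), len(s2))
    return -1
```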

***
<span id="Iter">  
# Topic-2. Iterators and Generators  
</span>  
## Table of Contents  
* 2.1 [Introduction](#introduction)  
* 2.2 Examples
    * 2.2.1 [Example-1 Syntax of a generator](#2E1)  
    * 2.2.2 [Example-2 Convert Between Iterator and List](#2E2)  
    * 2.2.3 [Example-3 Implement function iter](#2E3)  
* 2.3 Exercise  
    * 2.3.1 [Exercise-1: Implement function `range_`](#2EX1)  
    * 2.3.2 [Exercise-2: Implement function `list_`](#2EX2)  
*** 
<span id="introduction">   
## 2.1 Introduction  
</span>
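A *generator* is a special kind of function: it uses `yield` to hand out its results one "stage" at a time, and `return` (or simply reaching the end of its body) to stop. Calling a generator function gives a generator object, which is an *iterator*: an object that produces its values one by one when `next()` is called on it. The greatest benefit is memory: values are produced on demand instead of being stored all at once.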

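Lazy objects like these are everywhere in Python. `range` is a frequently used example: strictly speaking, a `range` object is an *iterable* rather than an iterator (calling `iter()` on it returns an iterator), but like an iterator it does not store all of its numbers in memory.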
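A minimal sketch of the difference between an iterable and an iterator, using `range`:

```python
r = range(3)

try:
    next(r)                          # a range is iterable, but not an iterator
    is_iterator = True
except TypeError:
    is_iterator = False
print(is_iterator)                   # False

it = iter(r)                         # iter() returns an iterator over the range
print(next(it), next(it), next(it))  # 0 1 2
print(len(r), r[1], 2 in r)          # 3 1 True: a range also supports len(), indexing and `in`
```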

`sys.getsizeof()` returns the size of an object in bytes (only the object itself, not the objects it refers to). Let us compare a list with a `range`!

In [100]:
import sys
li = list(range(100))
sys.getsizeof(li), sys.getsizeof(range(100))

(1008, 48)

In [99]:
import sys
li = list(range(10000))
sys.getsizeof(li), sys.getsizeof(range(10000))

(90112, 48)

The `range` object stays at 48 bytes however many numbers it represents, while the list grows with its length. So how can we turn a list into an iterator, and how can we write iterators of our own?   
***

<span id="2E1">  
## 2.2.1 Example-1  
</span>
**Syntax of a generator**  
A function that contains `yield` is a simple generator. Calling it does not run its body yet; it returns a generator object. Each call to `next()` on that object runs the body up to the next `yield` and returns the yielded value.  

  
```python
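def function():
    yield "result1"   # value returned by the first next()
    yield "result2"   # value returned by the second next()
    return            # ends the generator; another next() raises StopIteration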
```

In [101]:
def iterator():
    yield 1
    yield 2
    yield 3
    return

i = iterator()

print(i)
print(next(i))
print(next(i))
print(next(i))
# a fourth next(i) would raise StopIteration, because the generator has finished

<generator object iterator at 0x000001DC30A772C8>
1
2
3
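Instead of stopping by hand before the generator runs out, we can handle the end explicitly. A minimal sketch with the same three-value generator shows the `StopIteration` exception, the default argument of `next()`, and a `for` loop, which handles the end for us:

```python
def iterator():
    yield 1
    yield 2
    yield 3

i = iterator()
print(next(i), next(i), next(i))    # 1 2 3

try:
    next(i)                         # the generator has nothing left to yield
except StopIteration:
    print("StopIteration")

leftover = next(i, "done")          # with a default value, next() returns it instead of raising
print(leftover)                     # done

# A for loop calls next() for us and stops quietly at StopIteration
for value in iterator():
    print(value)
```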


***
<span id="2E2">   
## 2.2.2 Example-2  
</span>  
**Convert Between Iterator and List**  
  
We can use **`iter()`** to get an iterator over a list, and **`list()`** to collect the values of an iterator (or any iterable, such as `range`) into a list.

In [107]:
li = [1, 2, 3]
iterator = range(10)


new_iterator = iter(li)    # list -> iterator
new_li = list(iterator)    # range -> list: [0, 1, ..., 9]

for i in new_iterator:
    print(i)

1
2
3
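One consequence worth knowing: an iterator can be consumed only once, while the list it came from can be iterated again and again. A minimal sketch:

```python
li = [1, 2, 3]
new_iterator = iter(li)

first_pass = list(new_iterator)    # [1, 2, 3]
second_pass = list(new_iterator)   # []: the iterator is already exhausted
print(first_pass, second_pass)

# The list itself can be iterated again; each iter(li) call creates a fresh iterator
print(list(iter(li)), list(iter(li)))  # [1, 2, 3] [1, 2, 3]
```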


***
<span id="2E3">  
## 2.2.3 Example-3  
</span>  
**Implement function iter**

Can we implement our own version of `iter`, called `iter_`, that produces an iterator?  
A generator function with a `for` loop is enough to achieve this.  

We can check the result with:  
```python
for i in iterator:
    print(i)
```  

In [30]:
def iter_(li):
    for i in li:
        yield i
li = [1, 2, 3]
new_iterator = iter_(li)
print(new_iterator)

for i in new_iterator:
    print(i)

<generator object iter_ at 0x000001F3FEBF9A48>
1
2
3
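Python also has a shorthand for a generator that simply passes on every value of another iterable: `yield from`. A sketch of `iter_` written with it (same behavior as the `for` loop version above):

```python
def iter_(li):
    yield from li   # hands out every element of li, one per next() call

new_iterator = iter_([1, 2, 3])
for i in new_iterator:
    print(i)
```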


***
<span id="2EX1">  
## 2.3.1 Exercise-1: Implement function `range_`  
</span>

Remember the three ways to call `range`?

1. `range(stop)`
2. `range(start, stop)`
3. `range(start, stop, step)`

Try to write a generator function `range_` that behaves like `range`.

## 2.3.1-Solution  
1. First, define the parameters `start`, `stop` and `step` inside `range_()` and initialize them with their default values.   
2. Then assign values to them according to the arguments the user passed.   
3. Finally, check whether `step` is positive or negative, and use a `while` loop to yield the values.

In [31]:
def range_(*parameters):
    start = 0
    stop = None
    step = 1
    
    if not 1 <= len(parameters) <= 3:
        raise TypeError("range_ expected 1 to 3 arguments, got %d" % len(parameters))
    
    if len(parameters) == 1:
        stop = parameters[0]
    elif len(parameters) == 2:
        start, stop = parameters
    elif len(parameters) == 3:
        start, stop, step = parameters
        
    i = start
    
    if step == 0:
        raise ValueError("range_() arg 3 must not be zero")
    
    if step > 0:
        while i < stop:
            yield i
            i += step
    elif step < 0:
        while i > stop:
            yield i
            i += step

In [33]:
print(type(range_(100,10,-10)))
list(range_(100,10,-10))

<class 'generator'>


[100, 90, 80, 70, 60, 50, 40, 30, 20]

***
<span id="2EX2">  
## 2.3.2 Exercise-2: Implement function `list_`  
</span>

`list` does roughly the opposite of `iter`: it collects the values produced by an iterator into a list. Try to write a function `list_` that converts an iterator into a `list`.

## 2.3.2-Solution  
Use a `for` loop that appends every value to an empty list.

In [123]:
def list_(iterator):
    li = []
    for i in iterator:
        li.append(i)
    return li

In [124]:
list_(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

***
***
+ Author: Guochao Xie, Peiran Qin  
+ Editor: Peiran Qin  
+ Last modified: 2020-05-31