# Session 3: File Input/Output

## File I/O: Basic

* Data on a computer is usually stored in **files**
* From the view of the operating system, a file is just a sequence of
    bits that is given a name
* What data is stored in a file and how exactly it is stored in a file
    is defined in a **file format**
* The file format defines what the bits mean to the program that is reading/writing
    the file
* ***Note:*** The **file extension** (e.g. whether the name of a file ends
    in **.txt** or **.doc**) does not determine the file format (it is
    just part of the name) -- but it makes sense to name files according to their format
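
To make the note about extensions concrete, here is a minimal sketch. The file name `notes.doc` and the temporary directory are assumptions made up for this illustration. Plain text is written to a file whose name ends in `.doc`, and it can still be read back as text, because the name does not change the bytes stored inside.

```python
import os
import tempfile

# Hypothetical file name: the ".doc" ending is only part of the name.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "notes.doc")

with open(path, "w", encoding="utf-8") as f:
    f.write("plain text inside\n")

# Reading the raw bytes shows what is really stored in the file.
with open(path, "rb") as f:
    raw = f.read()

# Interpreting those bytes as text works fine, despite the extension.
text = raw.decode("utf-8")
print(raw)
print(text)
```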

## File I/O: Writing to a Text File

* A very common and useful file format is one where the sequence of bits
    is interpreted as a sequence of characters
* This conversion is performed with respect to a character set
    (such as ASCII or UTF-8, but let's not worry about that here...)
* In Python, such **text files** can be manipulated very easily, by
    reading/writing their contents to/from strings
* Using the `open()` function one can obtain a reference to a
    **file** object that provides methods for reading and writing (e.g.
    `read()` and `write()`)

## File I/O: Text Files

### File I/O: Writing to a text file:

Opening a text file for writing.
For details, see the following:
https://wikidocs.net/26

In [309]:
import os
path = "C:\\Users\\user\\Dropbox\\Class\\2020_2\\programming\\week2"
os.chdir(path) 


In [310]:
f = open('my_first_file.txt', 'w')  # modes: 'w' erases the file and writes, 'r' is read only, 'a' appends to the existing content
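
The difference between the three modes can be sketched as follows. The scratch file `modes_demo.txt` is a hypothetical name, placed in a temporary directory so that nothing in the working directory is touched.

```python
import os
import tempfile

# Hypothetical scratch file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

f = open(path, "w")    # "w": create the file, or erase it if it exists
f.write("first\n")
f.close()

f = open(path, "a")    # "a": keep the content and add to the end
f.write("second\n")
f.close()

f = open(path, "r")    # "r": read only
after_append = f.read()
f.close()

f = open(path, "w")    # "w" again: the earlier content is erased
f.write("third\n")
f.close()

f = open(path, "r")
after_rewrite = f.read()
f.close()

print(repr(after_append))
print(repr(after_rewrite))
```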


In [311]:
f.write('Hello world!\n')
f.write("Hello There")

11

In [312]:
f.close()

We can now read this file again:

In [4]:
f = open('my_first_file.txt', 'r')  # 'r': open the file for reading

In [314]:
line = f.readline() # reads one line at a time
print(line)

Hello world!



In [315]:
line2 = f.readline() # the next line
print(line2)

Hello There


In [316]:

line3 = f.readline()  # no lines are left, so this returns an empty string
print(line3)
f.close()  # close the file once reading is finished
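
The third `readline()` call prints only an empty line because the file has just two lines. A minimal sketch of that behaviour, assuming a scratch file created only for this illustration: at the end of the file `readline()` returns the empty string `''`, which makes a convenient stopping condition for a loop.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "two_lines.txt")  # hypothetical file
f = open(path, "w")
f.write("Hello world!\nHello There")
f.close()

f = open(path, "r")
lines_read = []
while True:
    line = f.readline()
    if line == "":        # empty string: nothing left to read
        break
    lines_read.append(line)
f.close()

print(lines_read)
```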




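`write()` can be called multiple times to write more data. The text to write is often built with string formatting, so here is a short reminder of `format()` and the `%` operator first: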

In [1]:
print("I am {0} teaching this course. I am {2} kind {1}.".format("Jaeyoun", "Man", "very") )

I am Jaeyoun teaching this course. I am very kind Man.


In [2]:
print("I am {0}, and I {1} music".format("Jaeyoun", "hate"))
print("I am %s" %"Jaeyoun")
print("I am %d" %12)

I am Jaeyoun, and I hate music
I am Jaeyoun
I am 12


In [3]:
f = open("animals.txt", "w")
for animal in ["Animal\tFood","Sloth\tLeaves", "Chicken\tCorn", "Ant_eater\tAnts", "Penguin\tFish", "Armadillo\tIce_cream\n"]:
    print("%s\n" % animal)
    f.write("%s\n" % animal)
f.close()

Animal	Food

Sloth	Leaves

Chicken	Corn

Ant_eater	Ants

Penguin	Fish

Armadillo	Ice_cream




In [6]:
f=open("animals.txt", "r")
line = f.readline()
print(line)
line2 = f.readline()
print(line2)
line3 = f.readline()
print(line3)
f.close()


Animal	Food

Sloth	Leaves

Chicken	Corn



## File I/O: Reading from a Text File:

#### Reading the content of a text file using the `readlines()` function:

The `readlines()` method reads an entire text file into a list of strings, where each list entry corresponds to one line in the file (including its trailing `\n`).

In [7]:
f = open("animals.txt", "r")

In [8]:
lines = f.readlines()

In [9]:
print(lines)


['Animal\tFood\n', 'Sloth\tLeaves\n', 'Chicken\tCorn\n', 'Ant_eater\tAnts\n', 'Penguin\tFish\n', 'Armadillo\tIce_cream\n', '\n']


Use of the `strip` methods: `lstrip()` removes characters from the left end of a string, `rstrip()` from the right end.

In [10]:
a="    a b c d"+"b"
print(a)
print(a.lstrip(" "))


    a b c db
a b c db


In [325]:
a="abcd    "   
a+"b"

'abcd    b'

In [11]:
a="abcd    "    
b=a.rstrip(" ")+"b"
print(b)

abcdb


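With no argument, `lstrip()`, `rstrip()` and `strip()` remove all leading and/or trailing whitespace (spaces, tabs and `\n`):

In [12]:
s = " A B C D "   # avoid the name `str`, which would shadow the built-in type
print(s)
print(s.lstrip())        # leading whitespace removed
print(s.rstrip() + "E")  # trailing whitespace removed
print(s.strip() + "E")   # both ends stripped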

 A B C D 
A B C D 
 A B C DE
A B C DE


The `print()` function appends a `\n` automatically. If the `\n` already present at the end of each line is not removed with `rstrip()` first, we end up with empty lines in the output!
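
The effect can be seen in a small sketch that uses a hard-coded list in place of the lines read from the file (the entries are copied from the `readlines()` output shown earlier).

```python
lines = ['Animal\tFood\n', 'Sloth\tLeaves\n', 'Chicken\tCorn\n']

print("Without rstrip():")
for line in lines:
    print(line)            # the line's own "\n" plus the one added by print()

print("With rstrip():")
for line in lines:
    print(line.rstrip())   # trailing "\n" removed first, so no empty lines

stripped = [line.rstrip() for line in lines]
```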

#### Reading the content of a text file line by line:

Because processing each line in a file is such a common operation,
  Python provides the following simple syntax

In [328]:
f = open("animals.txt", "r")
a=list(f)
print(a)

['Animal\tFood\n', 'Sloth\tLeaves\n', 'Chicken\tCorn\n', 'Ant_eater\tAnts\n', 'Penguin\tFish\n', 'Armadillo\tIce_cream\n', '\n']


In [329]:
f = open("animals.txt", "r")
lines = f.readlines()
print(lines)

['Animal\tFood\n', 'Sloth\tLeaves\n', 'Chicken\tCorn\n', 'Ant_eater\tAnts\n', 'Penguin\tFish\n', 'Armadillo\tIce_cream\n', '\n']


In [330]:
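f.close()  # the handle above was already consumed by readlines()
f = open("animals.txt", "r")
for line in f:  # iterate over the file object itself, one line at a time
    print(line.rstrip())
f.close()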

Animal	Food
Sloth	Leaves
Chicken	Corn
Ant_eater	Ants
Penguin	Fish
Armadillo	Ice_cream



Iterating over the file object itself (`for line in f:`) goes through the file line by line instead of reading in the whole content at the beginning, as `list(f)` and `readlines()` do!

#### And because Python makes your life easy, here is an even shorter version:

In [331]:
# Loose analogy in R: print(aaa$bbb)  -->>  with(aaa, print(bbb))
with open("animals.txt", "r") as infile10:  # like infile10 = open("animals.txt", "r"), but the file is closed automatically
    for line in infile10:
        print(line.rstrip())

Animal	Food
Sloth	Leaves
Chicken	Corn
Ant_eater	Ants
Penguin	Fish
Armadillo	Ice_cream



Using `with` removes the need to call `close()` on your file object: the file is closed automatically when the `with` block ends!
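
A small sketch of what `with` does for us, using the `closed` attribute of file objects. The file `with_demo.txt` is a hypothetical scratch file in a temporary directory.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")  # hypothetical file

with open(path, "w") as outfile:
    outfile.write("some text\n")
    closed_inside = outfile.closed    # still open inside the block

closed_after = outfile.closed         # closed once the block has ended

# The file is closed even when the block is left because of an error.
try:
    with open(path, "r") as infile:
        raise ValueError("something went wrong while processing")
except ValueError:
    closed_after_error = infile.closed

print(closed_inside, closed_after, closed_after_error)
```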

## File I/O: Transforming a File:

* When working with data provided by other programs (and/or other
    people), it is often necessary to convert data from one format to another


The file that we wrote contains columns separated by tabs; what if we
    need commas? --> This is hard for Jaeyoun, so it is left as an exercise for the students.

In [336]:
with open("animals.txt", "r") as infile:
    for line in infile:
        print(line, line.split(), ",".join(line.split())  )

Animal	Food
 ['Animal', 'Food'] Animal,Food
Sloth	Leaves
 ['Sloth', 'Leaves'] Sloth,Leaves
Chicken	Corn
 ['Chicken', 'Corn'] Chicken,Corn
Ant_eater	Ants
 ['Ant_eater', 'Ants'] Ant_eater,Ants
Penguin	Fish
 ['Penguin', 'Fish'] Penguin,Fish
Armadillo	Ice_cream
 ['Armadillo', 'Ice_cream'] Armadillo,Ice_cream

 [] 


In [13]:
with open("animals.txt", "r") as infile:
    with open("animals.csv", "w") as outfile:
        for line in infile:
            outfile.write(",".join(line.split()))  # tabs -> commas
            outfile.write('\n')

Let's check that everything worked...

In [14]:
with open("animals.csv", "r") as infile:
    for line in infile:
        print(line.rstrip())

Animal,Food
Sloth,Leaves
Chicken,Corn
Ant_eater,Ants
Penguin,Fish
Armadillo,Ice_cream



Looking good!
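
One caveat, offered as a hedged aside: `line.split()` splits on any whitespace, so a field that itself contains a space (say a hypothetical entry `Polar bear`) would be broken into two columns. A sketch of a stricter variant, assuming the input really is tab-separated, splits on `\t` only and skips empty lines. The helper name `tabs_to_commas` is made up for this illustration.

```python
def tabs_to_commas(line):
    """Convert one tab-separated line to a comma-separated line."""
    fields = line.rstrip("\n").split("\t")   # split on tabs only
    return ",".join(fields)

# "Polar bear" is a made-up entry used only to show the difference.
sample = ["Animal\tFood\n", "Polar bear\tFish\n", "\n"]

converted = [tabs_to_commas(line) for line in sample if line.strip()]
naive = [",".join(line.split()) for line in sample if line.strip()]

print(converted)
print(naive)
```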

### Example 1.
1> Open animals.csv, treating the first line as the header.

hint: Read https://thispointer.com/python-read-a-csv-file-line-by-line-with-or-without-header/

2> Read the animals.csv file that you have made and write it out again with an ADDITIONAL header row, `Animal` and `Food`.

hint: Use the csv package to do this. Read https://jdhao.github.io/2018/05/13/read-write-csv-file-with-header/



In [15]:
with open("animals.csv", "r") as infile:
    header = next(infile)  # read the first line separately as the header
    for line in infile:
        print(line.rstrip())

header

Sloth,Leaves
Chicken,Corn
Ant_eater,Ants
Penguin,Fish
Armadillo,Ice_cream



'Animal,Food\n'

In [17]:
import csv

In [18]:
with open("animals.csv", "r", newline='') as infile:
    reader = csv.reader(infile, delimiter=",")
    header = next(reader)  # the first row is the header
    rows = list(reader)    # the remaining rows
    for row in rows:
        print(row)


['Sloth', 'Leaves']
['Chicken', 'Corn']
['Ant_eater', 'Ants']
['Penguin', 'Fish']
['Armadillo', 'Ice_cream']
[]


In [19]:
infile = open("animals.csv", "r")
ff = infile.readlines()
print(ff)
infile.close()

['Animal,Food\n', 'Sloth,Leaves\n', 'Chicken,Corn\n', 'Ant_eater,Ants\n', 'Penguin,Fish\n', 'Armadillo,Ice_cream\n', '\n']


In [20]:
with open("animals.csv", "r", newline='') as infile, open("animals2.csv", "w", newline='') as outfile:
    rows = list(csv.reader(infile))
    print(rows)
    writer = csv.writer(outfile, delimiter=',')
    writer.writerow(['Animal', 'Food'])  # the additional header row
    for row in rows:
        print(row)
        writer.writerow(row)

[['Animal', 'Food'], ['Sloth', 'Leaves'], ['Chicken', 'Corn'], ['Ant_eater', 'Ants'], ['Penguin', 'Fish'], ['Armadillo', 'Ice_cream'], []]
['Animal', 'Food']
['Sloth', 'Leaves']
['Chicken', 'Corn']
['Ant_eater', 'Ants']
['Penguin', 'Fish']
['Armadillo', 'Ice_cream']
[]
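
As a possible alternative for the first part of the exercise, `csv.DictReader` from the standard library treats the first row as the header and returns each following row as a dictionary keyed by the column names. The sketch below reads from an in-memory string, an assumption made so that it runs without `animals.csv` being present.

```python
import csv
import io

# Stand-in for the contents of animals.csv.
csv_text = "Animal,Food\nSloth,Leaves\nChicken,Corn\n"

reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

print(reader.fieldnames)
for row in rows:
    print(row["Animal"], "eats", row["Food"])
```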



## File I/O Pickling: (similar to .RData in R)

* Text files are convenient when data needs to be exchanged with other
    programs
* However, getting the data in/out of text files can be tedious
* If we know we only need the data within Python, there is a very easy
    way to write arbitrary Python data structures to compact binary files
* This is generally referred to as **serialization**, but in
    Python-lingo it's called **pickling**
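* The **pickle** module provides two functions, `dump()` and `load()`, that
    allow writing and reading most Python objects (the faster **cPickle**
    variant was a separate module in Python 2; Python 3's `pickle` uses the fast implementation automatically when it is available)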

In [21]:
#from pickle import dump, load
import pickle

In [22]:
l = ["a", "list", "with", "stuff", [42, 23, 3.14], True]
print(l)
len(l)

['a', 'list', 'with', 'stuff', [42, 23, 3.14], True]


6

In [23]:
with open("my_list.pkl", "wb") as f: #write wb
    pickle.dump(l, f)

In [24]:
with open("my_list.pkl", "rb") as f: #read rb
    l2 = pickle.load(f)
l2

['a', 'list', 'with', 'stuff', [42, 23, 3.14], True]
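
The same round trip works for nested structures. The sketch below uses a made-up dictionary and a scratch file in a temporary directory. It also shows that `load()` returns a new object with equal content. Note that pickle files should only be loaded from sources you trust, since loading can execute arbitrary code.

```python
import os
import pickle
import tempfile

# Hypothetical data: a dictionary holding a list and a tuple.
data = {"animals": ["Sloth", "Chicken"], "point": (1, 2.5), "ok": True}

path = os.path.join(tempfile.mkdtemp(), "data.pkl")
with open(path, "wb") as f:      # pickle files are binary: "wb"
    pickle.dump(data, f)

with open(path, "rb") as f:      # ...and "rb" for reading
    restored = pickle.load(f)

same_content = restored == data  # equal content
same_object = restored is data   # but a new, independent object
print(same_content, same_object)
```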

## File I/O Checking for Existence: (everything below is a free exercise!)

* Sometimes a program needs to check whether a file exists
* The `os.path` module provides the `exists()` function

In [25]:
from os.path import exists

In [248]:
if exists("C:\\Users\\user\\Dropbox\\Class\\2020_2\\programming\\week2\\animals.csv"):
    print("File exists!")
else:
    print("File not found!")

File exists!


In general, the `os` and `os.path` modules provide functions for working with the file system. Don't try to reinvent the wheel - most things exist already in the Python standard library!
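
A few of these functions in action, as a minimal sketch. The folder and file names are hypothetical and live in a temporary directory. `os.path.join()` builds the path with the separator of the current operating system, which avoids hard-coding backslashes.

```python
import os
import tempfile

base = tempfile.mkdtemp()                          # scratch directory for the demo
path = os.path.join(base, "week2", "animals.csv")  # hypothetical location

exists_before = os.path.exists(path)               # nothing there yet

os.makedirs(os.path.dirname(path))                 # create the "week2" folder
with open(path, "w") as f:
    f.write("Animal,Food\n")

exists_now = os.path.exists(path)
is_file = os.path.isfile(path)
is_dir = os.path.isdir(os.path.dirname(path))
name = os.path.basename(path)
stem, ext = os.path.splitext(name)

print(exists_before, exists_now, is_file, is_dir, name, stem, ext)
```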

In [26]:
#Print the current working directory
import os
print(os.getcwd())

C:\Users\esthe\python_env


## File I/O: Reading from the Web:

* In Python, there are several other objects that behave just like text
    files
* One particularly useful one provides file-like access to resources on
    the web: the `urlopen()` function in the `urllib.request` module (called `urllib2` in Python 2)
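
What "behaves just like a file" means can be sketched without any network access by using `io.StringIO` from the standard library, an in-memory object that offers the same reading methods as a text file. The sample text is made up for the illustration.

```python
import io

# An in-memory "file": no disk or network access involved.
fake_file = io.StringIO("first line\nsecond line\n")

first = fake_file.readline()                  # same method as on a real text file
rest = [line.rstrip() for line in fake_file]  # iteration works as well

print(repr(first))
print(rest)
```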

In [9]:
from urllib.request import urlopen

In [5]:
URL = "http://www.gutenberg.org/cache/epub/28885/pg28885.txt"

In [16]:
if not exists("alice.txt"):  # download only if the file is not there yet
    with urlopen(URL) as f:
        with open("alice.txt", "wb") as outfile:  # "wb": the response arrives as bytes
            outfile.write(f.read())

In [17]:
print(''.join(open("alice.txt").readlines()[970:975]))

middle of one! There ought to be a book written about me, that there
ought! And when I grow up, I'll write one--but I'm grown up now," she
added in a sorrowful tone; "at least there's no room to grow up any more
_here_."




In [27]:
with open("alice.txt", "r") as infile: # rb
    book = infile.readlines()
    print(''.join(book[1000:1005]))

hand, and made a snatch in the air. She did not get hold of anything,
but she heard a little shriek and a fall, and a crash of broken glass,
from which she concluded that it was just possible it had fallen into a
cucumber-frame, or something of the sort.




## File I/O Multiple Files:

The `glob` module provides an easy way to find all files with certain names (e.g. all files with names that end in `.txt`).

In [28]:
import glob

In [29]:
text_files = glob.glob("*.txt")

In [30]:
for t in text_files:
    print(t)

alice.txt
animals.txt
library_data.txt
my_first_file.txt
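
A typical use is to process every matching file in a loop. The sketch below counts the lines in each `.txt` file of a temporary folder. The three files are hypothetical and are created only for this illustration.

```python
import glob
import os
import tempfile

folder = tempfile.mkdtemp()
# Hypothetical files created only for this illustration.
for name, content in [("a.txt", "1\n2\n"), ("b.txt", "1\n"), ("c.csv", "x,y\n")]:
    with open(os.path.join(folder, name), "w") as f:
        f.write(content)

line_counts = {}
for path in sorted(glob.glob(os.path.join(folder, "*.txt"))):
    with open(path, "r") as infile:
        line_counts[os.path.basename(path)] = len(infile.readlines())

print(line_counts)
```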


## File I/O Terminal streams:

* The terminal input/output streams can also be accessed like files using the `stdin` and `stdout` objects from the `sys` module

In [31]:
import sys

In [32]:
sys.stdout.write("Another way to print!\n")

Another way to print!
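
Because `sys.stdout` is a file-like object, it can be passed to `print()` through its `file` argument, and it can be swapped temporarily for another file-like object. The following is a minimal sketch using `contextlib.redirect_stdout` and `io.StringIO` from the standard library; the messages are made up.

```python
import io
import sys
from contextlib import redirect_stdout

print("Written with print()", file=sys.stdout)  # stdout is the default target
sys.stderr.write("Messages for the user can go to stderr\n")

# Temporarily replace stdout with an in-memory file-like object.
buffer = io.StringIO()
with redirect_stdout(buffer):
    sys.stdout.write("captured\n")
    print("also captured")

captured = buffer.getvalue()
print(repr(captured))
```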


In [33]:
aaa = "a aa aa b b"

In [34]:
aaa.split()

['a', 'aa', 'aa', 'b', 'b']