# Working with Files in Python

So far we have only explored the interpreter and some native data structures. Now we can start to actually work with data.

Here we will talk about IO operations. Let's start with the *I* in *IO*: the input.

Let's first create a mock file with 4 lines: a header and 3 lines of "data". We will do this in the terminal (outside Python):

In [2]:
!printf "Column1,Column2,Column3\n1,2,3\none,two,three\nuno,dos,tres\n" > afile.txt
!cat afile.txt

Column1,Column2,Column3
1,2,3
one,two,three
uno,dos,tres


Now let's get into Python and try to read this file:

In [1]:
handle = open('afile.txt')
print(handle)

<_io.TextIOWrapper name='afile.txt' mode='r' encoding='UTF-8'>


In [2]:
text = handle.read()
print(text)

Column1,Column2,Column3
1,2,3
one,two,three
uno,dos,tres



In [3]:
...
handle.close()

Show the type of the variable `text`, and point out the difference between `handle` and what `read` returns. Also show that `read` exhausts the handle.
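One way to demonstrate these points is sketched below. It is only an illustration: it writes its own copy of the mock file into a temporary directory (the names `tmpdir` and `path` are assumptions made for this example) so that it runs on its own.

```python
import os
import tempfile

# Hypothetical scratch copy of the mock file, so the example is self-contained.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "afile.txt")
with open(path, "w") as wh:
    wh.write("Column1,Column2,Column3\n1,2,3\none,two,three\nuno,dos,tres\n")

handle = open(path)
handle_type = type(handle).__name__  # the handle is a file object, not the contents
text = handle.read()                 # read() returns the whole contents as one string
text_type = type(text).__name__
second_read = handle.read()          # the handle is already at the end of the file
handle.close()

print(handle_type)        # TextIOWrapper
print(text_type)          # str
print(repr(second_read))  # ''
```

The second call to `read` returns an empty string because the handle keeps track of its position in the file, and the first call moved that position to the end.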

### Context manager
You should always close a file when you are done with it. Since that is easy to forget, let a context manager do it for you:

In [None]:
with open('afile.txt') as handle:
    text = handle.read()
print(text)

In [None]:
...

Show that a closed handle cannot be read.
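A minimal sketch of this behaviour follows. It assumes a scratch file in a temporary directory (`tmpdir` and `path` are illustrative names) rather than the `afile.txt` created earlier.

```python
import os
import tempfile

# Hypothetical scratch file, so the example does not depend on earlier cells.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "afile.txt")
with open(path, "w") as wh:
    wh.write("Column1,Column2,Column3\n1,2,3\n")

with open(path) as handle:
    text = handle.read()

# Leaving the with block closed the file, but the name handle still exists.
print(handle.closed)  # True

try:
    handle.read()
except ValueError as err:
    error_message = str(err)
    print("Reading failed:", error_message)
```

The variable `text` is still usable after the block, because it holds a copy of the contents and not a reference to the open file.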

<center>
<img src="quiz.png"  width="820" height="700" align="center"/>
</center>

### Reading by line
If your file is too big to fit in memory, or if you only want a few lines, the `read` function might be overkill. Let's imagine that we only want the header of the file (the first line):

In [11]:
with open('afile.txt') as handle:
    header = handle.readline()
print(header)

Column1,Column2,Column3
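As a small sketch of what `readline` returns, the example below reads two lines in a row and splits the header into column names. The scratch file and the names `path`, `first_row` and `columns` are assumptions made for illustration.

```python
import os
import tempfile

# Hypothetical scratch copy of the mock file.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "afile.txt")
with open(path, "w") as wh:
    wh.write("Column1,Column2,Column3\n1,2,3\none,two,three\nuno,dos,tres\n")

with open(path) as handle:
    header = handle.readline()     # first call returns the first line
    first_row = handle.readline()  # each further call returns the next line

# The returned string keeps its trailing newline; strip it before splitting.
columns = header.rstrip("\n").split(",")

print(repr(header))     # 'Column1,Column2,Column3\n'
print(columns)          # ['Column1', 'Column2', 'Column3']
print(repr(first_row))  # '1,2,3\n'
```

The trailing newline kept by `readline` is also why `print(header)` shows an extra blank line in the output.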



#### What about reading the full file one line at a time?

In [13]:
with open('afile.txt') as handle:
    for line_num, line in enumerate(handle):
        print("Line {} is {}".format(line_num, line))

Line 0 is Column1,Column2,Column3

Line 1 is 1,2,3

Line 2 is one,two,three

Line 3 is uno,dos,tres
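The blank lines in this output appear because every `line` still ends with its own newline and `print` adds another one. A possible variation, sketched here on a scratch file (the names `path` and `messages` are illustrative assumptions), strips the newline and starts counting at 1:

```python
import os
import tempfile

# Hypothetical scratch copy of the mock file.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "afile.txt")
with open(path, "w") as wh:
    wh.write("Column1,Column2,Column3\n1,2,3\none,two,three\nuno,dos,tres\n")

messages = []
with open(path) as handle:
    # start=1 makes the counter match how people usually number lines
    for line_num, line in enumerate(handle, start=1):
        message = "Line {} is {}".format(line_num, line.rstrip("\n"))
        messages.append(message)
        print(message)
```

Iterating over the handle reads one line at a time, so the whole file never has to be held in memory at once.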



In [None]:
...

<center>
<img src="quiz.png"  width="820" height="700" align="center"/>
</center>

### Writing to files
Before we move on to processing, let's see how we can write to a file. Let's make a copy of `afile.txt`:

In [5]:
with open('afile.txt', 'r') as rh, open('afile2.txt', 'w') as wh:
    for line in rh:
        wh.write(line)

In [None]:
...

#### Modes of opening file

As you saw, the function `open` can open a file in different modes:
1. Read mode (`r`): This is the default, and opens a file to read from it
2. Write mode (`w`): This will create **(or recreate)** a file to write to
3. Append mode (`a`): This will open a file to write, but will append to the end of it instead of recreating it

These are the basic modes. They can be combined with the modifiers `b`, to read/write in binary format, and `+`, to open the file for updating (reading and writing). These two modifiers are a bit more advanced, and we will not cover them.

What does appending look like, and why would you want to use it? Say `afile.txt` was created either upstream or by a completely different program, and you want to add a line to it:

In [None]:
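new_line = "ichi,ni,san\n"  # end with a newline so any later write starts on its own line
with open('afile2.txt', 'w') as wh, open('afile.txt', 'a') as ah:
    wh.write(new_line)  # 'w' recreates afile2.txt, so it now holds only this line
    ah.write(new_line)  # 'a' keeps afile.txt and adds this line at the end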
...
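To make the difference between the two modes visible, here is a sketch that applies both to identical scratch files and then reads them back. The file names and the variables `write_path`, `append_path`, `written` and `appended` are hypothetical and exist only for this illustration.

```python
import os
import tempfile

original = "Column1,Column2,Column3\n1,2,3\none,two,three\nuno,dos,tres\n"

# Two hypothetical scratch files that start with the same contents.
tmpdir = tempfile.mkdtemp()
write_path = os.path.join(tmpdir, "written.txt")
append_path = os.path.join(tmpdir, "appended.txt")
for p in (write_path, append_path):
    with open(p, "w") as fh:
        fh.write(original)

new_line = "ichi,ni,san\n"
with open(write_path, "w") as wh, open(append_path, "a") as ah:
    wh.write(new_line)  # previous contents are discarded
    ah.write(new_line)  # previous contents are kept

with open(write_path) as fh:
    written = fh.read()
with open(append_path) as fh:
    appended = fh.read()

print(len(written.splitlines()))   # 1
print(len(appended.splitlines()))  # 5
```

Opening an existing file with `w` empties it as soon as it is opened, even before anything is written, which is why append mode is the safer choice for files you did not create yourself.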

<center>
<img src="quiz.png"  width="820" height="700" align="center"/>
</center>

### Processing data from files
Say you want to write a program that reads a comma-delimited file containing the numbers 1-3 in English, Spanish or Japanese, and writes a new **TAB**-delimited file in which every value is converted to its numerical form:

In [None]:
translate = dict(one=1, two=2, three=3,
                 uno=1, dos=2, tres=3, 
                 ichi=1, ni=2, san=3)
with open('afile.txt', 'r') as infile, open('output.txt', 'w') as outfile:
    for line in infile:
        ...
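One possible way to fill in the loop is sketched below. It is not the only solution: it assumes that any value missing from `translate` (the header names, or values that are already digits) should be copied unchanged, and it works on scratch files whose names (`in_path`, `out_path`) are made up for the example.

```python
import os
import tempfile

# Hypothetical scratch input with the same contents as the mock file.
tmpdir = tempfile.mkdtemp()
in_path = os.path.join(tmpdir, "afile.txt")
out_path = os.path.join(tmpdir, "output.txt")
with open(in_path, "w") as fh:
    fh.write("Column1,Column2,Column3\n1,2,3\none,two,three\nuno,dos,tres\n")

translate = dict(one=1, two=2, three=3,
                 uno=1, dos=2, tres=3,
                 ichi=1, ni=2, san=3)

with open(in_path, "r") as infile, open(out_path, "w") as outfile:
    for line in infile:
        fields = line.rstrip("\n").split(",")
        converted = []
        for field in fields:
            # Assumption: values that are not in the dictionary are kept as they are.
            converted.append(str(translate.get(field, field)))
        outfile.write("\t".join(converted) + "\n")

with open(out_path) as fh:
    result = fh.read()
print(result)
```

Using `translate.get(field, field)` avoids a `KeyError` on the header line; a stricter program could raise an error for unknown words instead.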

Now let's say that we instead want to load the file into a list of lists:

In [None]:
...

Introduce a list comprehension here.
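A sketch of how this could look, first with an explicit loop and then with the equivalent list comprehension. The scratch file and the names `path`, `rows_loop` and `rows` are assumptions made for the example.

```python
import os
import tempfile

# Hypothetical scratch copy of the mock file.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "afile.txt")
with open(path, "w") as wh:
    wh.write("Column1,Column2,Column3\n1,2,3\none,two,three\nuno,dos,tres\n")

# Version with an explicit loop
rows_loop = []
with open(path) as handle:
    for line in handle:
        rows_loop.append(line.rstrip("\n").split(","))

# Same result with a list comprehension: [expression for item in iterable]
with open(path) as handle:
    rows = [line.rstrip("\n").split(",") for line in handle]

print(rows[0])  # ['Column1', 'Column2', 'Column3']
print(rows[3])  # ['uno', 'dos', 'tres']
```

The comprehension builds the same list in a single expression, so it tends to read best when the loop body is one short transformation like this one.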

### Pre-existing data structures
CSV files are extremely common, and there are packages that make reading and writing them easier. You need to import them first, though...

In [8]:
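import csv
# newline='' lets the csv module handle line endings itself
with open('afile.txt', newline='') as csvfile, open('afile3.txt', 'w', newline='') as outcsv:
    reader = csv.reader(csvfile, delimiter=',', quotechar='"')
    writer = csv.writer(outcsv, delimiter=',')
    for row in reader:
        print(row)            # each row is a list of strings
        writer.writerow(row)  # pass the list itself, not a list wrapping it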
        


['Column1', 'Column2', 'Column3']
['1', '2', '3']
['one', 'two', 'three']
['uno', 'dos', 'tres']


In [None]:
import pandas as pd
fi = pd.read_csv('afile.txt', sep=',', header=None)  # header=None keeps the first line as a data row
print(fi)
fi.to_csv('afile4.txt', sep=',', header=False, index=False)  # save without column labels or row index

<center>
<img src="quiz.png"  width="820" height="700" align="center"/>
</center>

### Installing packages
Packages that do not ship with Python can be installed from **the Python Package Index** (PyPI) with `pip`:

```bash
pip install <package>
```

In [9]:
!python3 -m pip install pandas

DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.


### Packages of note
1. pandas: Python Data Analysis Library
2. NumPy: the fundamental package for scientific computing with Python
3. SciPy: Scientific Library for Python
4. scikit-learn: Python's main machine learning library

...

## Base problem

In [None]:
...