# Welcome to the Dark Art of Coding:
## Introduction to Python
Reading and writing to files

<img src='../images/dark_art_logo.600px.png' width='300' style="float:right">

# Objectives

* Opening and closing files
* Reading .txt files and basic .csv files


# File handling
---

In [2]:
# We start off by opening the file using the open()
# function and assigning a label as a filehandle

fin = open('folder/carroll.txt')

In [5]:
# In IPython, type "fin." and press Tab to browse the methods on the file handle

# File handles
A file handle is the object Python uses to refer to an open file and to read from or write to it. File handles provide access to the following, and more:

* A 'cursor' or 'pointer' used to read and write from the file
* The ability to iterate over the file
* The ability to read from the file in various ways
* The current location of the cursor
* The ability to move the pointer
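
The features listed above can be sketched in a few lines. This is only an illustration: the file `sample.txt` and its two lines are made up and written to a temporary folder so that the example runs on its own.

```python
import os
import tempfile

# Hypothetical sample file, created only so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
fout = open(path, 'w')
fout.write('first line\nsecond line\n')
fout.close()

fin = open(path)

start = fin.tell()          # the current location of the cursor (0 at the start)
first = fin.readline()      # reading moves the cursor forward
after_first = fin.tell()    # the cursor now sits just past the first line
fin.seek(0)                 # move the cursor back to the beginning
lines = [line for line in fin]   # iterate over the file, one line at a time

fin.close()

print(start, repr(first), after_first)
print(lines)
```

Each of these methods is covered in more detail later in the lesson.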


## Reading


In [6]:
# One method to read in data is using the read() method.

text = fin.read()
print(text)

The Walrus and The Carpenter

Lewis Carroll

(from Through the Looking-Glass and What Alice Found There, 1872)

The sun was shining on the sea,
Shining with all his might:
He did his very best to make
The billows smooth and bright--
And this was odd, because it was
The middle of the night.

The moon was shining sulkily,
Because she thought the sun
Had got no business to be there
After the day was done--
"It's very rude of him," she said,
"To come and spoil the fun!"

The sea was wet as wet could be,
The sands were dry as dry.
You could not see a cloud, because
No cloud was in the sky:
No birds were flying overhead--
There were no birds to fly.

The Walrus and the Carpenter
Were walking close at hand;
They wept like anything to see
Such quantities of sand:
"If this were only cleared away,"
They said, "it would be grand!"

"If seven maids with seven mops
Swept it for half a year.
Do you suppose," the Walrus said,
"That they could get it clear?"
"I doubt it," said the Carpenter,
And shed 

In [7]:
# When you finish interacting with a file, it is important that
#     you close() the file.
# I liken it to putting your toys away
#     when you are done with them.

fin.close()

In [9]:
fin.read()

ValueError: I/O operation on closed file.
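
An alternative to calling `close()` yourself is Python's `with` statement, which closes the file as soon as the indented block ends. The sketch below is not part of the original lesson flow; the file name and its content are made up so the example runs on its own.

```python
import os
import tempfile

# Hypothetical sample file, created only so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as fout:
    fout.write('The sun was shining on the sea,\n')

# The file is open inside the indented block...
with open(path) as fin:
    text = fin.read()

# ...and closed automatically once the block ends.
print(fin.closed)

# Reading from the closed file raises the same ValueError shown above.
try:
    fin.read()
except ValueError as err:
    message = str(err)
    print('ValueError:', message)
```

Either approach works; the important part is that the file ends up closed.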

In [11]:


fout = open('folder/output.txt', 'w')      # NOTE the 'w' to open the file for writing purposes
fout.write('''Batman:
The Dark Knight
Returns''')
fout.close()

# Navigate to the folder in your file explorer and confirm:
#     * the file exists
#     * the content is present

## Where does Python write to?

**Short answer**: where you tell it to

**Longer answer**: 
    
* Python literally writes where you tell it
* Understanding directory structures on the command line is critical
* The **easiest solution for beginners** is to put the script and data in the same folder
* OR, like these scripts, put the data in an adjacent folder
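
To see where a relative path such as `folder/output.txt` actually points, you can ask Python for its current working directory. A small sketch using the standard `os` module (it only builds the path, it does not create the file):

```python
import os

# The current working directory is where relative paths start from.
cwd = os.getcwd()

# 'folder/output.txt' is a relative path: Python resolves it against cwd.
relative = os.path.join('folder', 'output.txt')
absolute = os.path.abspath(relative)

print('Working directory:', cwd)
print('Relative path:    ', relative)
print('Resolves to:      ', absolute)
```

If the last line does not show the folder you expected, the script was started from a different directory than you thought.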

# Experience Points!
---

In the **IPython interpreter** do each of the following:

Task | Sample Object(s)
:---|---
Assign the label `filein` to the results of the `open()` function for this file | `names.txt`
Assign the label `content` to the results of the `read()` function | `.read()`
Print the `content` | `print()`
Close `filein` when you are done | `.close()`

In [None]:
filein = open('folder/names.txt')
content = filein.read()
print(content)
filein.close()

<img src='../images/green_sticky.300px.png' width='200' style='float:left'>

In [14]:
# It is not uncommon to chain functions together when all you really need
#     is the text.

text = open('folder/carroll.txt').read()
print(text)
# text

The Walrus and The Carpenter

Lewis Carroll

(from Through the Looking-Glass and What Alice Found There, 1872)

The sun was shining on the sea,
Shining with all his might:
He did his very best to make
The billows smooth and bright--
And this was odd, because it was
The middle of the night.

The moon was shining sulkily,
Because she thought the sun
Had got no business to be there
After the day was done--
"It's very rude of him," she said,
"To come and spoil the fun!"

The sea was wet as wet could be,
The sands were dry as dry.
You could not see a cloud, because
No cloud was in the sky:
No birds were flying overhead--
There were no birds to fly.

The Walrus and the Carpenter
Were walking close at hand;
They wept like anything to see
Such quantities of sand:
"If this were only cleared away,"
They said, "it would be grand!"

"If seven maids with seven mops
Swept it for half a year.
Do you suppose," the Walrus said,
"That they could get it clear?"
"I doubt it," said the Carpenter,
And shed 

In [15]:
# What happens when we attempt to open a file that doesn't exist?

file = open('notHere.txt')

FileNotFoundError: [Errno 2] No such file or directory: 'notHere.txt'

In [16]:
# We can use try/except to do things we think might bring up errors without stopping the program

try:
    file = open('nothere.txt')
except FileNotFoundError:
    print('FILE NOT FOUND')

FILE NOT FOUND
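
The same idea can be wrapped in a small helper that catches only `FileNotFoundError`, so that other, unexpected errors still surface. This is a sketch: the function name `read_text_or_default` and the default value are invented for illustration.

```python
import os
import tempfile


def read_text_or_default(path, default=''):
    """Return the text stored at path, or default if the file is missing."""
    try:
        fin = open(path)
    except FileNotFoundError:
        print('FILE NOT FOUND:', path)
        return default
    text = fin.read()
    fin.close()
    return text


# A path inside a brand-new temporary folder, so it cannot exist yet.
missing = os.path.join(tempfile.mkdtemp(), 'nothere.txt')

result = read_text_or_default(missing, default='(no data)')
print(result)
```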


# There are several primary means of reading in data:
---
* `read()`: reads the file in as a single string
* `readline()`: reads in one line at a time
* `readlines()`: reads in all lines, as separate strings in a list
* `for line in <filehandle>:` iterates over each line, one at a time

We have seen `read()` in action. Let's look at the others.
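
As a quick side-by-side comparison, here are all four approaches applied to the same tiny file. The file `three_lines.txt` is made up and written to a temporary folder so the sketch runs on its own.

```python
import os
import tempfile

# Hypothetical three-line file, created only for this comparison.
path = os.path.join(tempfile.mkdtemp(), 'three_lines.txt')
fout = open(path, 'w')
fout.write('one\ntwo\nthree\n')
fout.close()

fin = open(path)
as_string = fin.read()         # the whole file as one string
fin.close()

fin = open(path)
first_line = fin.readline()    # just the first line
fin.close()

fin = open(path)
as_list = fin.readlines()      # every line, as a list of strings
fin.close()

fin = open(path)
looped = []
for line in fin:               # one line per pass through the loop
    looped.append(line)
fin.close()

print(repr(as_string))
print(repr(first_line))
print(as_list)
print(looped)
```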

## `.readline()`

In [3]:
# .readline() reads in a single line.

data = open('folder/log_file.header.csv')
line = data.readline()

# print(line)
line

'name,email,from_ip,to_ip,datetime,latitude,longitude,payload\n'

In [1]:
!cat folder/log_file.header.csv

name,email,from_ip,to_ip,datetime,latitude,longitude,payload
barry allen,ballen@jleague.org,246.167.32.21,253.36.206.4,2016-02-08T21:44:22,49.55854,8.87819,32171
barbara gordon,bgordon@jleague.org,253.36.207.192,198.240.252.129,2016-02-07T21:44:28,48.14013,9.07396,34285
kyle rayner,krayner@jleague.org,208.66.182.10,102.230.226.99,2016-02-06T21:44:56,45.28336,10.38742,40287
dinah lance,dlance@jleague.org,246.167.32.76,7.36.164.133,2016-02-06T21:45:51,45.83448,8.70891,688291
arthur curry,acurry@jleague.org,253.36.207.215,7.36.164.0,2016-02-06T21:47:02,49.44709,8.05527,126609
kara zor-el,kzor-el@jleague.org,208.66.183.214,253.36.207.215,2016-02-06T21:47:26,46.22157,10.07309,862129
kara zor-el,kzor-el@jleague.org,208.66.183.214,198.240.252.173,2016-02-06T21:48:03,45.76911,10.33047,648640
hal jordan,hjordan@jleague.org,253.36.207.148,208.66.182.184,2016-02-06T21:49:36,47.99098,7.80398,496563
kara zor-el,kzor-el@jleague.org,253.36.207.148,26.28.209.95,2016-02-06T21:50:41,48.03181,10

In [4]:
# Repeating .readline() will read in another line

next_line = data.readline()

print(next_line)

barry allen,ballen@jleague.org,246.167.32.21,253.36.206.4,2016-02-08T21:44:22,49.55854,8.87819,32171



In [None]:
# Personally, I use .readline() most frequently, when reading in 
#     headers from files

# This allows us to get column headers AND/OR simply get the 
#     header row out of the way.
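
One way to put the header row to work is to split it into column names and pair them with the values on each data row. The sketch below uses a cut-down, made-up version of the log file with only three columns. Splitting on commas is fine for basic files like this one; it does not handle quoted fields that contain commas, which is what Python's `csv` module is for.

```python
import os
import tempfile

# Hypothetical, shortened version of the log file (three columns only).
path = os.path.join(tempfile.mkdtemp(), 'mini_log.csv')
fout = open(path, 'w')
fout.write('name,email,payload\n')
fout.write('barry allen,ballen@jleague.org,32171\n')
fout.close()

data = open(path)

# Read the header row and turn it into a list of column names.
header = data.readline().rstrip('\n')
columns = header.split(',')

# The remaining lines are data rows; pair each value with its column name.
for line in data:
    values = line.rstrip('\n').split(',')
    record = dict(zip(columns, values))
    print(record['name'], record['payload'])

data.close()
```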

## `.readlines()`

In [5]:
# .readlines() reads in all the lines AND stores the data as a 
#     list of strings

data = open('folder/log_file.header.csv')
list_of_lines = data.readlines()

print(list_of_lines)

['name,email,from_ip,to_ip,datetime,latitude,longitude,payload\n', 'barry allen,ballen@jleague.org,246.167.32.21,253.36.206.4,2016-02-08T21:44:22,49.55854,8.87819,32171\n', 'barbara gordon,bgordon@jleague.org,253.36.207.192,198.240.252.129,2016-02-07T21:44:28,48.14013,9.07396,34285\n', 'kyle rayner,krayner@jleague.org,208.66.182.10,102.230.226.99,2016-02-06T21:44:56,45.28336,10.38742,40287\n', 'dinah lance,dlance@jleague.org,246.167.32.76,7.36.164.133,2016-02-06T21:45:51,45.83448,8.70891,688291\n', 'arthur curry,acurry@jleague.org,253.36.207.215,7.36.164.0,2016-02-06T21:47:02,49.44709,8.05527,126609\n', 'kara zor-el,kzor-el@jleague.org,208.66.183.214,253.36.207.215,2016-02-06T21:47:26,46.22157,10.07309,862129\n', 'kara zor-el,kzor-el@jleague.org,208.66.183.214,198.240.252.173,2016-02-06T21:48:03,45.76911,10.33047,648640\n', 'hal jordan,hjordan@jleague.org,253.36.207.148,208.66.182.184,2016-02-06T21:49:36,47.99098,7.80398,496563\n', 'kara zor-el,kzor-el@jleague.org,253.36.207.148,26.28.

**NOTE**: each string includes the newline character at the end of the text string.

## `for line in <filehandle>:`

In [6]:
# One of the most useful approaches for handling lines in files
#     is using for loops.

data = open('folder/log_file.header.csv')

for line in data:
    print(line)

name,email,from_ip,to_ip,datetime,latitude,longitude,payload

barry allen,ballen@jleague.org,246.167.32.21,253.36.206.4,2016-02-08T21:44:22,49.55854,8.87819,32171

barbara gordon,bgordon@jleague.org,253.36.207.192,198.240.252.129,2016-02-07T21:44:28,48.14013,9.07396,34285

kyle rayner,krayner@jleague.org,208.66.182.10,102.230.226.99,2016-02-06T21:44:56,45.28336,10.38742,40287

dinah lance,dlance@jleague.org,246.167.32.76,7.36.164.133,2016-02-06T21:45:51,45.83448,8.70891,688291

arthur curry,acurry@jleague.org,253.36.207.215,7.36.164.0,2016-02-06T21:47:02,49.44709,8.05527,126609

kara zor-el,kzor-el@jleague.org,208.66.183.214,253.36.207.215,2016-02-06T21:47:26,46.22157,10.07309,862129

kara zor-el,kzor-el@jleague.org,208.66.183.214,198.240.252.173,2016-02-06T21:48:03,45.76911,10.33047,648640

hal jordan,hjordan@jleague.org,253.36.207.148,208.66.182.184,2016-02-06T21:49:36,47.99098,7.80398,496563

kara zor-el,kzor-el@jleague.org,253.36.207.148,26.28.209.95,2016-02-06T21:50:41,48.03181,10

In [7]:
# A sample of how this could be used...

data = open('folder/log_file.header.csv')

header = data.readline()

for line in data:
    if 'kara' in line:
        print(line)
    else:
        print('N/A')

N/A
N/A
N/A
N/A
N/A
kara zor-el,kzor-el@jleague.org,208.66.183.214,253.36.207.215,2016-02-06T21:47:26,46.22157,10.07309,862129

kara zor-el,kzor-el@jleague.org,208.66.183.214,198.240.252.173,2016-02-06T21:48:03,45.76911,10.33047,648640

N/A
kara zor-el,kzor-el@jleague.org,253.36.207.148,26.28.209.95,2016-02-06T21:50:41,48.03181,10.01841,800746

N/A


**NOTE**: each line still ends with its newline character, and `print()` adds one of its own. That is why a blank line follows every data row printed above.

# Experience Points!
---

In your **text editor** create a simple script called:

```bash
my_files_01.py
```

Execute your script in the **IPython interpreter** using the command:

```bash
run my_files_01.py
```

I suggest that you run your script right after adding each feature, so that you test it incrementally.

Task | Sample Object(s)
:---|---
Assign the label `my_csv` to the results of the `open()` function for this file | `log_file_1000.csv`
Use a `for` loop to read in the text | `for line in <filehandle>`
Print only lines that have this ip address: `220.211.18.31` on them  | `print()`
Close `my_csv` when you are done | `.close()`

<img src='../images/green_sticky.300px.png' width='200' style='float:left'>

In [8]:
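# Let's look at our previous file again...

for line in data:
    print(line)
    
# When executing this code, nothing happened!
#     The earlier loop already read to the end of the file,
#     so there is nothing left for this loop to read.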

## Do-overs and more...

In [9]:
# One way to do a do-over is to simply reread the file from scratch

data = open('folder/log_file.header.csv')

for line in data:
    print(line)


name,email,from_ip,to_ip,datetime,latitude,longitude,payload

barry allen,ballen@jleague.org,246.167.32.21,253.36.206.4,2016-02-08T21:44:22,49.55854,8.87819,32171

barbara gordon,bgordon@jleague.org,253.36.207.192,198.240.252.129,2016-02-07T21:44:28,48.14013,9.07396,34285

kyle rayner,krayner@jleague.org,208.66.182.10,102.230.226.99,2016-02-06T21:44:56,45.28336,10.38742,40287

dinah lance,dlance@jleague.org,246.167.32.76,7.36.164.133,2016-02-06T21:45:51,45.83448,8.70891,688291

arthur curry,acurry@jleague.org,253.36.207.215,7.36.164.0,2016-02-06T21:47:02,49.44709,8.05527,126609

kara zor-el,kzor-el@jleague.org,208.66.183.214,253.36.207.215,2016-02-06T21:47:26,46.22157,10.07309,862129

kara zor-el,kzor-el@jleague.org,208.66.183.214,198.240.252.173,2016-02-06T21:48:03,45.76911,10.33047,648640

hal jordan,hjordan@jleague.org,253.36.207.148,208.66.182.184,2016-02-06T21:49:36,47.99098,7.80398,496563

kara zor-el,kzor-el@jleague.org,253.36.207.148,26.28.209.95,2016-02-06T21:50:41,48.03181,10

In [10]:
# Try a new file...

data = open('folder/bytes.txt')

# This time, let's read one byte, instead of the whole line.
#     (In text mode, read(1) returns one character.)

byte = data.read(1)
print('First byte:', byte)

First byte: a


In [11]:
# The file handle maintains the pointer and 
#     knows where we left off in the file.

twobytes = data.read(2)
print('Two bytes: ', twobytes)

Two bytes:  bb


In [12]:
# The next three bytes:

threebytes = data.read(3)
print('Three bytes:', threebytes)

Three bytes: ccc


In [13]:
# The next four bytes

fourbytes = data.read(4)
print('Four bytes:', fourbytes)

Four bytes: dddd


In [14]:
# .readline() doesn't necessarily start at the beginning of 
#     the line... it picks up where it left off and goes to 
#     the end of the current line.

print('Remainder: ', data.readline())

Remainder:  efghi



In [15]:
# The next time we call .readline(), it carries on
#     as we expect.

print('Readline:  ', data.readline())

Readline:   Line 2



In [16]:
# The file handle pointer switches seamlessly between
#     using readline() or other reading mechanisms and
#     for loops

for line in data:
    print('For loop:', line)


For loop: Line 3

For loop: Line 4

For loop: Line 5

For loop: Line 6

For loop: Line 7

For loop: Line 8

For loop: Line 9

For loop: Last line


In [17]:
# Since the file handle uses a pointer, we don't have
#     to reread the file ... we can just move the pointer
#     using .seek(). Calling data.seek(0) rewinds to the
#     very beginning; here we jump to position 7 instead.

data.seek(7)

# Read just one byte (the byte at position 7)
byte = data.read(1)
print('Byte at position 7:', byte)

# getting fancy
print('Two bytes:'.rjust(19), data.read(2))

Byte at position 7: d
         Two bytes: dd


In [18]:
# But where are we in the file?
#     .tell() will let you know what byte you are about to 
#     read.

print(data.tell())

print(data.read(3))

print(data.tell())

10
efg
13
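
Putting `.seek()` to work for a do-over: once a file has been read to the end, another read returns nothing, and `seek(0)` rewinds the pointer so the file can be read again without reopening it. The two-line file below is made up for the sketch.

```python
import os
import tempfile

# Hypothetical two-line file, created only so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'rewind.txt')
fout = open(path, 'w')
fout.write('abbcccddddefghi\nLine 2\n')
fout.close()

data = open(path)

first_pass = data.readlines()    # reads everything; the pointer is now at the end
second_pass = data.readlines()   # nothing left to read, so this is an empty list

data.seek(0)                     # rewind to the very beginning
third_pass = data.readlines()    # the whole file again

data.close()

print(len(first_pass), len(second_pass), len(third_pass))
```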


# Let's do some work!
---

In [None]:
data = open('folder/names.txt')

lineNum = 0

for line in data:
    lineNum += 1
    if line.startswith('S'):
        print(lineNum, line)

# newline characters

Files typically have more than one line.

There's a special character used to indicate a newline in Python: `'\n'`

This character can be tricky. It shows up at the end of every line when we read in data, whether we read line by line or all at once.

The easiest way to get rid of it is with the `.rstrip()` method.
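
Note that `.rstrip()` with no arguments removes *all* trailing whitespace, not just the newline. The sketch below compares it with two close relatives, using `repr()` so the whitespace is visible. The sample string is made up.

```python
raw = '  my string of text  \n'

newline_only = raw.rstrip('\n')   # removes only the trailing newline
right_side = raw.rstrip()         # removes all trailing whitespace
both_sides = raw.strip()          # removes whitespace from both ends

print(repr(raw))
print(repr(newline_only))
print(repr(right_side))
print(repr(both_sides))
```

For most line-by-line work the difference does not matter, but it can if trailing spaces or leading indentation carry meaning in your data.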

In [None]:
# As an example...

print('my string of text\n'.rstrip())

In [None]:
data = open('folder/names.txt')

lineNum = 0

for line in data:
    lineNum += 1
    if line.startswith('S'):
        cleanline = line.rstrip() # Let's get rid of that pesky newline
        print(lineNum, cleanline)

# OK, so maybe real work
---



In [None]:
data = open('folder/nums.txt')

for line in data:
    line = line.strip()
    num = int(line)
    if num > 90:
        print(num, num * 2)

# Writing to files
---

In [None]:
# A sample of writing to files: don't forget the 'w'

fout = open('folder/buffer.txt', 'w')

In [None]:
for number in range(200000):
    
    # WARNING: .write() only takes strings    

    output = str(number)
    fout.write(output)
    
print('done')    

In [None]:
# This step is optional, but important
#     IF you need to leave the file open
#     and want to flush the buffer in memory to disk

fout.flush()
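
To see what flushing does, you can compare the file's size on disk before and after the call. This is a sketch with a made-up file; the size before the flush is typically 0 for a small write like this one, but that depends on the buffering in use, so treat that number as an observation rather than a guarantee.

```python
import os
import tempfile

# Hypothetical output file in a temporary folder.
path = os.path.join(tempfile.mkdtemp(), 'buffer_demo.txt')

fout = open(path, 'w')
fout.write('12345')

# The text may still be sitting in memory, not yet on disk.
size_before = os.path.getsize(path)

fout.flush()

# After flushing, all five characters have been handed to the operating system.
size_after = os.path.getsize(path)

fout.close()

print('Before flush:', size_before)
print('After flush: ', size_after)
```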

In [None]:
fout.close()

In [None]:
fileout = open('folder/numbers.txt', 'w')

for number in range(10):
    
    # WARNING: .write() only takes strings    
    # NOTE: .write() does NOT include a '\n' (newline)
    #     by default, you must add one on.

    output = str(number) + '\n'
    fileout.write(output)
    
print('done') 
fileout.close()
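
A quick way to check what was written is to read the file back. The sketch below writes the same ten numbers to a temporary file (the location is an assumption, chosen so the example runs anywhere), then reads them back, converting each line to an integer and adding them up.

```python
import os
import tempfile

# Hypothetical location for the output file.
path = os.path.join(tempfile.mkdtemp(), 'numbers.txt')

# Write the numbers 0 through 9, one per line.
fileout = open(path, 'w')
for number in range(10):
    fileout.write(str(number) + '\n')
fileout.close()

# Read them back: strip the newline, convert to int, and keep a running total.
filein = open(path)
total = 0
for line in filein:
    total += int(line.strip())
filein.close()

print(total)
```

Without the `'\n'` on each write, the file would hold the single line `0123456789` and the loop would see just one number.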

# Experience Points!

In your **text editor** create a simple script called:

```bash
my_files_02.py
```

Execute your script in the **IPython interpreter** using the command:

```bash
run my_files_02.py
```

I suggest that you run your script right after adding each feature, so that you test it incrementally.

Task | Sample Object(s)
:---|---
Assign the label `my_output` to the results of the `open()` function for this file: use the `w` flag | `results.txt`
Start a `while True` loop|
Assign a label, `output`, to the result of an `input()` function| `Name one of your favorite foods? `
Check whether `output` is equal to the string: `exit` |
IF NOT, `.write()` the content of `output` to the file|
IF EQUAL, `break` out of the `while` loop|
When the loop finishes, `.close()` the file.|


# Experience Points!

In your **text editor** create a simple script called:

```bash
my_files_03.py
```

Execute your script in the **IPython interpreter** using the command:

```bash
run my_files_03.py
```

I suggest that you run your script right after adding each feature, so that you test it incrementally.

Task | Sample Object(s)
:---|---
Using the techniques learned in this lesson:|
1. open the file `log_file_1000.csv`|
2. examine the content and count the number of lines in the file|
3. print the one line and line number where `SELINA` is capitalized|

<img src='../images/green_sticky.300px.png' width='200' style='float:left'>