# Welcome to the Dark Art of Coding:
## Introduction to Python
Reading and writing to files

<img src='../images/dark_art_logo.600px.png' width='300' style="float:right">

# Objectives

* Opening and closing files
* Reading .txt files and basic .csv files


# File handling
---

In [3]:
# We start off by opening the file using the open()
# function and assigning a label as a filehandle

fin = open('folder/carroll.txt')

# File handles
A file handle is the object Python uses to refer to an open file; you read from and write to the file through it. File handles provide access to the following, and more:

* A 'cursor' or 'pointer' used to read from and write to the file
* The ability to iterate over the file
* The ability to read from the file in various ways
* The current location of the cursor
* The ability to move the pointer
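To see a few of these in action, here is a small sketch. It does not use the course's `folder/` files: it creates its own throwaway file (the name `sample.txt` is made up) in a temporary folder, then inspects the handle.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical sample file so the example is self-contained
    path = os.path.join(tmp_dir, 'sample.txt')
    with open(path, 'w') as setup:
        setup.write('first line\nsecond line\n')

    fin = open(path)
    print(fin.mode)       # prints r: no mode was given, so it opened for reading
    print(fin.tell())     # prints 0: the cursor starts at the beginning
    first = fin.readline()           # reading moves the cursor forward
    rest = [line for line in fin]    # iterate over whatever is left
    print(fin.closed)     # prints False: still open
    fin.close()
    print(fin.closed)     # prints True
```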


## Reading


In [5]:
# One method to read in data is using the read() method.

text = fin.read()
print(text)


In [4]:
# When you finish interacting with a file, it is important that
#     you close() the file.
# I liken it to putting your toys away when you are done with them.

fin.close()
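Python also offers the `with` statement, which calls `.close()` for you when the indented block ends, even if an error happens inside it. A minimal sketch, using a made-up file `toys.txt` in a temporary folder:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'toys.txt')   # hypothetical file name

    with open(path, 'w') as fout:
        fout.write('blocks\ntrains\n')
    # Leaving the indented block closed the file for us
    print(fout.closed)   # prints True

    with open(path) as fin:
        toys = fin.read()
    print(toys)
```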

In [12]:
# Writes a new text file in 06/folder/
out = open('folder/okay.txt', 'w')
out.write('''Let's make
a file with text
Thanks, Python''')
out.close()

In [None]:


fout = open('folder/output.txt', 'w')      # NOTE the 'w' to open the file for writing purposes
fout.write('''Batman:
The Dark Knight
Returns''')
fout.close()

# Navigate to the folder in your file explorer and confirm:
#     * the file exists
#     * the content is present

## Where does Python write to?

**Short answer**: where you tell it to

**Longer answer**: 
    
* Python literally writes where you tell it
* Understanding directory structures on the command line is critical
* The **easiest solution for beginners** is to put the script and data in the same folder
* OR, like these scripts, put the data in an adjacent folder
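A quick way to check where a relative path such as `'folder/okay.txt'` will land: Python resolves it against the *current working directory* (the folder you launched Python or IPython from), which is not necessarily the folder the script lives in. This sketch only prints paths; it does not create any files.

```python
import os

cwd = os.getcwd()                       # the current working directory
print('Working directory:', cwd)

relative = os.path.join('folder', 'okay.txt')
full_path = os.path.abspath(relative)   # the relative path joined onto cwd
print('Python would write to:', full_path)
```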

# Experience Points!
---

On the **IPython interpreter** do each of the following:

Task | Sample Object(s)
:---|---
Assign the label `filein` to the results of the `open()` function for this file | `names.txt`
Assign the label `content` to the results of the `read()` method | `.read()`
Print the `content` | `print()`
Close `filein` when you are done | `.close()`

<img src='../images/green_sticky.300px.png' width='200' style='float:left'>

In [None]:
# It is not uncommon to chain the call to open() and .read()
#     together when all you really need is the text.
# NOTE: this never calls .close() on the file handle.

text = open('folder/carroll.txt').read()
# print(text)
text

In [18]:
filein = open('folder/chalmersclass.txt', 'r')
content = filein.read()
print(content)
filein.close()


Okay


In [None]:
# What happens when we attempt to open a file that doesn't exist?

file = open('notHere.txt')

In [None]:
# We can use try/except to attempt things that might raise errors,
#     without stopping the program.

# The script keeps running instead of crashing.
try:
    file = open('nothere.txt')
except FileNotFoundError:
    print('FILE NOT FOUND')
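Catching the specific `FileNotFoundError` (rather than a bare `except:`) keeps other problems, such as a permissions error, visible. Here is a sketch wrapped in a hypothetical helper, `read_text_or_none()`, which is not part of Python; it assumes `notHere.txt` does not exist in the working directory.

```python
def read_text_or_none(path):
    """Hypothetical helper: return the file's text, or None if it is missing."""
    try:
        with open(path) as fin:
            return fin.read()
    except FileNotFoundError as err:
        # err.filename holds the path that could not be opened
        print('FILE NOT FOUND:', err.filename)
        return None

result = read_text_or_none('notHere.txt')
```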

# There are several primary means of reading in data:
---
* `read()` reads the file in as a single string
* `readline()` reads in one line at a time
* `readlines()` reads in all lines, as separate strings in a list
* `for line in <filehandle>:` iterates over each line, one at a time

We have seen `read()` in action; let's look at the others.

## `.readline()`

In [None]:
# .readline() reads in a single line.

data = open('folder/log_file.header.csv')
line = data.readline()

print(line)

In [None]:
# Repeating .readline() will read in another line

next_line = data.readline()

print(next_line)
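Once `.readline()` runs out of lines it returns an empty string `''`, which gives a simple way to stop a loop. A blank line inside the file comes back as `'\n'`, not `''`, so it does not end the loop early. In this sketch `io.StringIO` stands in for an open file (it supports the same `.readline()` method); the CSV text is made up.

```python
import io

# A made-up stand-in for an open file, including one blank line
fake_file = io.StringIO('name,email\n\nkara,kzor-el@jleague.org\n')

lines = []
while True:
    line = fake_file.readline()
    if line == '':          # '' only happens at the end of the file
        break
    lines.append(line)

print(lines)
```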

In [None]:
# Personally, I use .readline() most frequently when reading in
#     headers from files.

# This allows us to get column headers AND/OR simply get the 
#     header row out of the way.
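For example, the header row can be turned into a list of column names before looping over the data rows. The column names below are hypothetical (they are not the real header of `log_file.header.csv`), and `io.StringIO` stands in for the open file. Splitting on `','` works for simple files like this one; fields that contain quoted commas need Python's `csv` module instead.

```python
import io

fake_csv = io.StringIO('name,email,ip\n'
                       'kara zor-el,kzor-el@jleague.org,102.86.56.213\n'
                       'bruce wayne,bwayne@jleague.org,220.211.18.31\n')

header = fake_csv.readline()               # consume the header row
columns = header.rstrip('\n').split(',')
print(columns)

rows = []
for line in fake_csv:                      # the loop starts AFTER the header
    values = line.rstrip('\n').split(',')
    rows.append(dict(zip(columns, values)))

print(rows[0]['name'])
```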

## `.readlines()`

In [None]:
# .readlines() reads in all the lines AND stores the data as a 
#     list of strings

data = open('folder/log_file.header.csv')
list_of_lines = data.readlines()

print(list_of_lines)

**NOTE**: each string includes the newline character at the end of the text string.
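If you don't want those newline characters, two common options are stripping each string, or reading the whole text and using `str.splitlines()`. A sketch with made-up text, using `io.StringIO` as a stand-in file:

```python
import io

text = 'line one\nline two\n'

with_newlines = io.StringIO(text).readlines()
print(with_newlines)    # ['line one\n', 'line two\n']

# Option 1: strip the newline from each string in the list
stripped = [line.rstrip('\n') for line in io.StringIO(text).readlines()]

# Option 2: read everything, then split it into lines
split_up = io.StringIO(text).read().splitlines()

print(stripped)         # ['line one', 'line two']
print(split_up)         # ['line one', 'line two']
```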

## `for line in <filehandle>:`

In [None]:
# One of the most useful approaches for handling lines in files
#     is using for loops.

data = open('folder/log_file.header.csv')

for line in data:
    print(line)

In [None]:
# A sample of how this could be used...

data = open('folder/log_file.header.csv')

header = data.readline()

for line in data:
    if 'kara' in line:
        print(line)
    else:
        print('N/A')

**NOTE**: each `line` still ends with a newline character, and `print()` adds another, which is why the output is double-spaced.

# Experience Points!
---

In your **text editor** create a simple script called:

```bash
my_files_01.py
```

Execute your script in the **IPython interpreter** using the command:

```bash
run my_files_01.py
```

I suggest that you run your script right away each time you add a feature, so you test it incrementally.

Task | Sample Object(s)
:---|---
Assign the label `my_csv` to the results of the `open()` function for this file | `log_file_1000.csv`
Use a `for` loop to read in the text | `for line in <filehandle>`
Print only lines that have this ip address: `220.211.18.31` on them  | `print()`
Close `my_csv` when you are done | `.close()`

<img src='../images/green_sticky.300px.png' width='200' style='float:left'>

In [20]:
my_csv = open('folder/log_file_1000.csv')

for line in my_csv:
    # Print only the lines that contain the target IP address
    if '220.211.18.31' in line:
        print(line)

my_csv.close()

bruce wayne,bwayne@jleague.org,220.211.18.31,190.214.22.94,2016-02-05T21:52:27,49.73236,11.51376,390646

john jones,jjones@jleague.org,220.211.18.31,155.130.121.215,2016-02-05T21:58:20,47.89841,11.66265,530368

bruce wayne,bwayne@jleague.org,102.86.56.213,220.211.18.31,2016-01-30T22:22:58,49.42008,11.76816,146059

kara zor-el,kzor-el@jleague.org,102.86.56.213,220.211.18.31,2016-01-30T22:23:55,49.49383,11.89127,379981

dick grayson,dgrayson@jleague.org,106.152.115.130,220.211.18.31,2016-01-29T22:26:49,48.17151,10.57354,426194

...

In [None]:
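# Let's look at our previous file again...

for line in data:
    print(line)

# When executing this code, nothing happened! The file handle's
#     pointer is already at the end of the file, so there is
#     nothing left to read.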

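The same thing can be reproduced in a couple of lines: once a loop has consumed a file handle, a second loop over it finds nothing. `io.StringIO` stands in for the file here, with made-up contents.

```python
import io

fake_file = io.StringIO('a\nb\n')   # stands in for an open file

first_pass = [line for line in fake_file]
second_pass = [line for line in fake_file]   # the pointer is already at the end

print(first_pass)    # ['a\n', 'b\n']
print(second_pass)   # []
```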
## Do-overs and more...

In [None]:
# One way to do a do-over is to simply reread (reopen) the file from scratch

data = open('folder/log_file.header.csv')

for line in data:
    print(line)


In [21]:
# Try a new file...

data = open('folder/bytes.txt')

# this time, let's read one character instead of the whole line
#     (for plain ASCII text like this, one character is one byte)

byte = data.read(1)
print('First byte:', byte)

First byte: a


In [None]:
# The file handle maintains the pointer and 
#     knows where we left off in the file.

twobytes = data.read(2)
print('Two bytes: ', twobytes)

In [None]:
# The next three bytes:

threebytes = data.read(3)
print('Three bytes:', threebytes)

In [22]:
# The next four bytes

fourbytes = data.read(4)
print('Four bytes:', fourbytes)

Four bytes: bbcc


In [23]:
# .readline() doesn't necessarily start at the beginning of
#     the line... it picks up where it left off and returns
#     the rest of the current line.

print('Remainder: ', data.readline())

Remainder:  cddddefghi



In [24]:
# The next time we call .readline(), it carries on
#     as we expect.

print('Readline:  ', data.readline())

Readline:   Line 2



In [None]:
# The file handle pointer switches seamlessly between
#     using readline() or other reading mechanisms and
#     for loops

for line in data:
    print('For loop:', line)
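The same hand-off between `.read()`, `.readline()` and a `for` loop can be tried without `bytes.txt`; the text below is made up, and `io.StringIO` stands in for the open file.

```python
import io

fake_file = io.StringIO('abcdef\nLine 2\nLine 3\n')

start = fake_file.read(2)                  # the first two characters
rest_of_line = fake_file.readline()        # the rest of line 1
remaining = [line for line in fake_file]   # the loop picks up at line 2

print(repr(start), repr(rest_of_line), remaining)
```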


In [None]:
# Since the file handle uses a pointer, we don't have
#     to reopen the file ... we can just rewind it
#     using .seek()
# For files opened in text mode, pass .seek() either 0 or a
#     position that .tell() returned earlier; other offsets
#     are not guaranteed to land where you expect.

data.seek(0)

# Read just one byte (the first byte in the file)
byte = data.read(1)
print('Back at the beginning:', byte)

# getting fancy: .rjust(22) lines this label up with the one above
print('Two bytes:'.rjust(22), data.read(2))

In [None]:
# But where are we in the file?
#     .tell() reports the current position (for plain ASCII
#     text, the offset of the byte you are about to read).

print(data.tell())

print(data.read(3))

print(data.tell())
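Putting `.tell()` and `.seek()` together in a self-contained sketch: it writes a made-up one-line ASCII file to a temporary folder, remembers a position with `.tell()`, rewinds with `.seek(0)`, and later jumps back to the remembered spot. Giving `.seek()` only `0` or a value from `.tell()` is the safe pattern for files opened in text mode.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'letters.txt')   # hypothetical file name
    with open(path, 'w', encoding='ascii') as fout:
        fout.write('abcdefghij')                  # one line, plain ASCII

    with open(path, encoding='ascii') as data:
        first_three = data.read(3)    # 'abc'
        position = data.tell()        # remember where we are
        data.seek(0)                  # rewind to the start
        again = data.read(1)          # 'a'
        data.seek(position)           # jump back to the remembered spot
        resumed = data.read(2)        # 'de'
```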

# Let's do some work!
---

In [None]:
data = open('folder/names.txt')

lineNum = 0

for line in data:
    lineNum += 1
    if line.startswith('S'):
        print(lineNum, line)
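Python's built-in `enumerate()` can do the counting for you, so there is no separate `lineNum` to keep in step. A sketch with made-up names, using `io.StringIO` in place of `folder/names.txt`:

```python
import io

fake_names = io.StringIO('Selina\nBruce\nSam\n')   # made-up contents

matches = []
for line_num, line in enumerate(fake_names, start=1):   # counting starts at 1
    if line.startswith('S'):
        matches.append((line_num, line.rstrip('\n')))
        print(line_num, line.rstrip('\n'))
```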

# newline characters

Files typically have more than one line

There's a special character used to indicate a newline in Python: `'\n'`

This character can be tricky. It shows up at the end of every line (except, sometimes, the last one) when we read in data, whether line by line or all at once.

The easiest way to get rid of it is with the `.rstrip()` method.

In [None]:
# As an example...

print('my string of text\n'.rstrip())
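`.rstrip()` with no argument removes *all* trailing whitespace (spaces and tabs too), not just the newline. If trailing spaces matter, pass `'\n'` explicitly; `.strip()` cleans both ends. A small comparison on a made-up string:

```python
raw = '  my string of text \n'

only_right = raw.rstrip()         # all trailing whitespace removed
only_newline = raw.rstrip('\n')   # just the newline removed
both_ends = raw.strip()           # whitespace removed from both ends

print(repr(only_right))     # '  my string of text'
print(repr(only_newline))   # '  my string of text '
print(repr(both_ends))      # 'my string of text'
```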

In [None]:
data = open('folder/names.txt')

lineNum = 0

for line in data:
    lineNum += 1
    if line.startswith('S'):
        cleanline = line.rstrip() # Let's get rid of that pesky newline
        print(lineNum, cleanline)

# OK, so maybe real work
---



In [None]:
data = open('folder/nums.txt')

for line in data:
    line = line.strip()     # remove the newline (and any surrounding spaces)
    num = int(line)         # convert the text to an integer
    if num > 90:
        print(num, num * 2)
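`int()` raises a `ValueError` on an empty string, so a blank line (a stray empty line at the end, for example) would stop the loop above. We don't know whether `nums.txt` contains any; this sketch, with made-up numbers in an `io.StringIO` stand-in, shows one way to skip them.

```python
import io

fake_nums = io.StringIO('95\n\n42\n101\n')   # note the blank second line

results = []
for line in fake_nums:
    line = line.strip()
    if not line:            # skip empty lines instead of crashing
        continue
    num = int(line)
    if num > 90:
        results.append((num, num * 2))
        print(num, num * 2)
```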

# Writing to files
---

In [None]:
# A sample of writing to files: don't forget the 'w'
#     ('w' creates the file, or erases it first if it already exists)

fout = open('folder/buffer.txt', 'w')

In [None]:
for number in range(200000):
    
    # WARNING: .write() only takes strings    

    output = str(number)
    fout.write(output)
    
print('done')    

In [None]:
# This step is optional. Use .flush() IF you need to leave
#     the file open but want the data buffered in memory
#     written out to the file now.

fout.flush()

In [None]:
fout.close()
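A reminder of what `'w'` does to a file that already exists: it empties the file before writing. This sketch uses a made-up file in a temporary folder, writes it twice, and reads back only the second write.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'buffer.txt')   # hypothetical location

    with open(path, 'w') as fout:
        fout.write('first run')

    with open(path, 'w') as fout:   # 'w' empties the existing file first
        fout.write('second run')

    with open(path) as fin:
        content = fin.read()

print(content)   # second run
```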

In [None]:
fileout = open('folder/numbers.txt', 'w')

for number in range(10):
    
    # WARNING: .write() only takes strings    
    # NOTE: .write() does NOT add a '\n' (newline)
    #     for you; you must add one yourself.

    output = str(number) + '\n'
    fileout.write(output)
    
print('done') 
fileout.close()
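As an alternative, `print()` accepts a `file=` argument; it converts its arguments to strings and adds the `'\n'` for you. A sketch writing to a made-up `numbers.txt` in a temporary folder:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'numbers.txt')   # hypothetical location

    with open(path, 'w') as fileout:
        for number in range(3):
            # print() converts number to a string and adds the newline
            print(number, file=fileout)

    with open(path) as fin:
        lines = fin.readlines()

print(lines)   # ['0\n', '1\n', '2\n']
```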

# Experience Points!
---

In your **text editor** create a simple script called:

```bash
my_files_02.py
```

Execute your script in the **IPython interpreter** using the command:

```bash
run my_files_02.py
```

I suggest that you run your script right away each time you add a feature, so you test it incrementally.

Task | Sample Object(s)
:---|---
Assign the label `my_output` to the results of the `open()` function for this file: use the `'w'` mode | `results.txt`
Start a `while True` loop|
Assign a label, `output`, to the result of an `input()` function| `Name one of your favorite foods? `
Check whether `output` is equal to the string: `exit` |
IF NOT, `.write()` the content of `output` to the file (add a `'\n'` so each answer lands on its own line)|
IF EQUAL, `break` out of the `while` loop|
When the loop finishes, `.close()` the file.|


# Experience Points!
---

In your **text editor** create a simple script called:

```bash
my_files_03.py
```

Execute your script in the **IPython interpreter** using the command:

```bash
run my_files_03.py
```

I suggest that you run your script right away each time you add a feature, so you test it incrementally.

Task | Sample Object(s)
:---|---
Using the techniques learned in this lesson:|
1. open the file `log_file_1000.csv`|
2. examine the content and count the number of lines in the file|
3. print the one line, and its line number, where `SELINA` appears in all capitals|

<img src='../images/green_sticky.300px.png' width='200' style='float:left'>