# Welcome to the Dark Art of Coding:
## Introduction to Python
Reading and writing to files

<img src='../universal_images/dark_art_logo.600px.png' width='300' style="float:right">

# Objectives

* Opening and closing files
* Reading .txt files and basic .csv files


# File handling
---

In [None]:
# We start off by opening the file using the open()
# function and assigning a label as a filehandle

fin = open('../universal_datasets/carroll.txt')

In [None]:
# How do you find out which methods exist for filehandles?
# Use tab-completion

fin.

# File handles
These are what Python uses to refer to files and the methods associated with files. File handles provide access to the following, and more:

* A **cursor** or **pointer** used to read from and write to the file
* The ability to iterate over the file
* The ability to read from the file in various ways
* The current location of the cursor
* The ability to move the pointer
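
To make the list above concrete, here is a minimal sketch that exercises each capability once. It assumes a small, made-up file (`sample.txt`, created in a temporary folder purely for illustration) rather than one of the course datasets.

```python
import os
import tempfile

# Hypothetical sample file, created only for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')

fout = open(path, 'w')
fout.write('alpha\nbeta\ngamma\n')
fout.close()

fin = open(path)

start = fin.tell()               # current location of the cursor (0 at the start)
first_line = fin.readline()      # read one line; the cursor moves past it
fin.seek(0)                      # move the cursor back to the beginning
lines = [line for line in fin]   # iterate over the file, line by line

fin.close()

print(start)
print(repr(first_line))
print(lines)
```

Each of these methods is covered in more detail later in the lesson.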


## Reading in data


In [None]:
# One method to read in data is using the read() method.

text = fin.read()
print(text)

In [None]:
# When you finish interacting with a file, it is important that
#     you close() the file.
# I liken it to putting your toys away when you are done with them.

fin.close()

In [None]:
# Once a file is closed, trying to read from it raises a ValueError

fin.read()

## Writing out data

In [None]:
# The process for opening a file for writing is very similar
#     to opening a file for reading... with one exception...

fout = open('../universal_datasets/output.txt', 'w')      # NOTE the 'w' to open the
                                                          #     file for writing purposes

In [None]:
# To write to the file, we use the .write() method
# The .write() method takes a string as an argument
#     and writes that string to the filesystem.

fout.write('''Batman:
The Dark Knight
Returns''')

# In this case, since we use a multi-line string, Python will
#     write the entire string, newline characters and all


In [None]:
# Of course, when we finish with our file, we call the 
#     .close() method.

fout.close()

# Now, you can navigate to the ../universal_datasets folder in your file explorer and confirm:
#     * the file exists
#     * the content is present

## Where does Python write to?

**Short answer**: where you tell it to

**Longer answer**: 
    
* Python literally writes where you tell it
* Understanding directory structures on the command line is critical
* The **easiest solution for beginners** is to put the script and data in the same folder
* OR, like these scripts, put the data in an adjacent folder
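
If you are ever unsure where a relative path will end up, one option is to ask Python directly. The sketch below only prints paths and does not create any files; the relative path is the one used earlier in this lesson, and the result depends on the folder you launched Jupyter or your script from.

```python
import os

# The relative path used earlier in this lesson (nothing is written here).
relative_path = os.path.join('..', 'universal_datasets', 'output.txt')

# Relative paths are resolved against the current working directory.
working_directory = os.getcwd()
resolved = os.path.abspath(relative_path)

print('Working directory:', working_directory)
print('Would resolve to: ', resolved)
```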

# Experience Points!
---

In **Jupyter** do each of the following:

Task | Sample Object(s)
:---|---
Assign the label `filein` to the results of the `open()` function for this file | `names.txt`
Assign the label `content` to the results of the `read()` function | `.read()`
Print the `content` | `print()`
Close `filein` when you are done | `.close()`

<img src='../universal_images/green_sticky.300px.png' width='200' style='float:left'>

In [None]:
# Possible Solution

filein = open('../universal_datasets/names.txt')
content = filein.read()
print(content)
filein.close()

# Chaining functions during a reading operation
---

In [None]:
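# It is not uncommon to chain functions together when all you really need
#     is the text.
# NOTE: no label is assigned to the file handle here, so we have no
#     way to call .close() on it ourselves.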

text = open('../universal_datasets/carroll.txt').read()
print(text)

# Compare this to the previous approach:

# fin = open('../universal_datasets/carroll.txt')
# text = fin.read()
# print(text)

In [None]:
# What happens when we attempt to open a file that doesn't exist?

file = open('../universal_datasets/not_here.txt')

In [None]:
# We can use try/except to attempt things that might raise errors without stopping the program

try:
    file = open('../universal_datasets/not_here.txt')
except FileNotFoundError:
    print('FILE NOT FOUND')

# Four primary means of reading in data:
---

* `read()`: reads the file in as a single string
* `readline()`: reads in one line at a time
* `readlines()`: reads in all lines, as separate strings in a list
* `for line in <filehandle>:`: iterates over each line, one at a time
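
As a quick side-by-side comparison, the sketch below runs all four approaches against the same file. The three-line file is made up and written to a temporary folder just for this example; the file is reopened before each approach so that every read starts from the beginning.

```python
import os
import tempfile

# Hypothetical three-line file, created only for this comparison.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
fout = open(path, 'w')
fout.write('one\ntwo\nthree\n')
fout.close()

fin = open(path)
as_string = fin.read()        # the whole file as a single string
fin.close()

fin = open(path)
one_line = fin.readline()     # only the first line
fin.close()

fin = open(path)
as_list = fin.readlines()     # every line, as a list of strings
fin.close()

fin = open(path)
looped = []
for line in fin:              # one line per pass through the loop
    looped.append(line)
fin.close()

print(repr(as_string))
print(repr(one_line))
print(as_list)
print(looped == as_list)
```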

We have already seen **`read()`** in action. Let's look at some of the others.

Take a moment to look at the data for the file `../universal_datasets/log_file.header.txt` in a text editor

## `.readline()`

In [None]:
# .readline() reads in a single line.

data = open('../universal_datasets/log_file.header.txt')
line = data.readline()

print(line)

In [None]:
# For clarity's sake, be aware that there is a difference between
#     how an evaluated string displays compared to a printed string

line

In [None]:
# Repeating .readline() will read in another line

next_line = data.readline()

print(next_line)

In [None]:
# Personally, my most frequent use of .readline() is when
#     reading headers from files

# This allows us to parse column headers AND/OR simply get the 
#     header row out of the way.

## `.readlines()`

In [None]:
# .readlines() reads in all the lines AND stores the data as a 
#     list of strings

data = open('../universal_datasets/log_file.header.txt')
list_of_lines = data.readlines()

print(list_of_lines)

**NOTE**: each string includes the newline character at the end of the text string.

## `for line in <filehandle>:`

In [None]:
# One of the most useful approaches for handling lines in files
#     is using for loops.

data = open('../universal_datasets/log_file.header.txt')

for line in data:
    print(line)

In [None]:
# A sample of how this could be used...

data = open('../universal_datasets/log_file.header.txt')

header = data.readline()

for line in data:
    if 'kara' in line:
        print(line)
    else:
        print('N/A')

**NOTE**: each string includes the newline character at the end of the text string.
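
Because each line already ends with a newline and `print()` adds one of its own, the loops above show a blank line after every printed line. One way to avoid that, sketched here with made-up strings standing in for lines read from a file, is the `end` parameter of `print()`.

```python
# Made-up lines standing in for lines read from a file.
lines = ['header\n', 'kara,10.0.0.1\n', 'selina,10.0.0.2\n']

for line in lines:
    print(line)            # the line's own '\n' plus the newline print() adds

for line in lines:
    print(line, end='')    # end='' stops print() from adding a second newline
```

Removing the newline from the string itself is covered later in this lesson.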

# Experience Points!
---

In your **text editor** create a simple script called:

```bash
my_files_01.py
```

Execute your script in **Jupyter** using the command:

```bash
run my_files_01.py
```

I suggest that, as you add each feature to your script, you run it right away to test it incrementally.

Task | Sample Object(s)
:---|---
Assign the label `my_csv` to the results of the `open()` function for this file | `log_file_1000.csv`
| NOTE: the above file is in the directory called `../universal_datasets`
Use a `for` loop to read in the text | `for line in <filehandle>`
Print only lines that have this ip address: `220.211.18.31` on them  | `print()`
Close `my_csv` when you are done | `.close()`

<img src='../universal_images/green_sticky.300px.png' width='200' style='float:left'>

In [None]:
# Possible solution

my_csv = open('../universal_datasets/log_file_1000.csv')

for line in my_csv:
    if '220.211.18.31' in line:
        print(line)

my_csv.close()        

# do overs and more...
---

In [None]:
# Let's look at our previous data file again...

for line in data:
    print(line)
    
# When executing this code, nothing happened!

In [None]:
# One way to do a do-over is to simply reopen the file and start from scratch

data = open('../universal_datasets/log_file.header.txt')

In [None]:
for line in data:
    print(line)

# reading less than a full line
---

In [None]:
# Try a new file...

data = open('../universal_datasets/bytes.txt')

# this time, let's read one byte, instead of the whole line

byte = data.read(1)
print('First byte:', byte)

In [None]:
# The file handle maintains the pointer and 
#     knows where we left off in the file.

twobytes = data.read(2)
print('Two bytes: ', twobytes)

In [None]:
# The next three bytes:

threebytes = data.read(3)
print('Three bytes:', threebytes)

In [None]:
# The next four bytes

fourbytes = data.read(4)
print('Four bytes:', fourbytes)

In [None]:
# .readline() doesn't necessarily start at the beginning of 
#     the line... it picks up where it left off and goes to 
#     the end of the current line.

print('Remainder: ', data.readline())

In [None]:
# The next time we call .readline(), it carries on
#     as we expect.

print('Readline:  ', data.readline())

In [None]:
# The file handle pointer switches seamlessly between
#     using readline() or other reading mechanisms and
#     for loops

for line in data:
    print('For loop:', line)
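
The cells above talk about bytes, but a file opened in the default text mode hands back characters. The two counts only match for plain ASCII text. The sketch below illustrates the difference with a made-up file containing one accented character; the explicit `encoding='utf-8'` and the `'rb'` (read binary) mode are assumptions added for this example and are not used elsewhere in the lesson.

```python
import os
import tempfile

# Hypothetical file with one non-ASCII character, created for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'chars.txt')
fout = open(path, 'w', encoding='utf-8')
fout.write('\u00e91234')                   # '\u00e9' is an accented e
fout.close()

text_file = open(path, encoding='utf-8')   # text mode: read(n) counts characters
one_char = text_file.read(1)
text_file.close()

byte_file = open(path, 'rb')               # binary mode: read(n) counts bytes
one_byte = byte_file.read(1)
byte_file.close()

print(repr(one_char))    # the whole accented character
print(repr(one_byte))    # only the first of its two UTF-8 bytes
```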


# moving your pointer to a specific location
---

In [None]:
# Since the file handle uses a pointer, we don't have
#     to reread the file ... we can just rewind the file
#     using .seek()

data.seek(0)

# Read just one byte (the first byte in the file)
byte = data.read(1)
print('Back at the beginning:', byte)

# getting fancy
print('Two bytes:'.rjust(22), data.read(2))

In [None]:
# But where are we in the file?
#     .tell() will let you know what byte you are about to 
#     read.

print(data.tell())

print(data.read(3))

print(data.tell())

# Let's do some work!
---

In [None]:
data = open('../universal_datasets/names.txt')

lineNum = 0                # This is a counter ... can we do better?

for line in data:
    lineNum += 1
    if line.startswith('S'):
        print(lineNum, line)

# Pesky newline characters
---

Files typically have more than one line

There's a special character used to indicate a newline in Python: `'\n'`

This character can be tricky. It shows up at the end of every line whether we read in data:

* line by line
* all at once

The easiest way to get rid of it is with the `.rstrip()` method

In [None]:
# As an example... without stripping...

print('my string of text\n')
print('line two')

In [None]:
# As an example... with stripping

print('my string of text\n'.rstrip())
print('line two')

In [None]:
'selina' == 'selina\n'.rstrip()
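
It can help to see exactly what `.rstrip()` removes compared with its close relatives. The following sketch uses a made-up string with extra spaces on both sides.

```python
line = '  selina \n'

right_stripped = line.rstrip()        # removes ALL trailing whitespace
newline_only = line.rstrip('\n')      # removes only trailing newlines
left_stripped = line.lstrip()         # removes leading whitespace only
both_sides = line.strip()             # removes whitespace from both ends

print(repr(right_stripped))    # '  selina'
print(repr(newline_only))      # '  selina '
print(repr(left_stripped))     # 'selina \n'
print(repr(both_sides))        # 'selina'
```

If trailing spaces matter in your data, `.rstrip('\n')` is the more conservative choice.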

In [None]:
# Let's parse the names.txt file for any row that
#     starts with 'S' (such as Selina)

data = open('../universal_datasets/names.txt')

for index, line in enumerate(data, 1):
    if line.startswith('S'):
        cleanline = line.rstrip() # Let's get rid of that pesky newline
        print(index, cleanline)

# OK, so maybe real work
---



In [None]:
isinstance('95', (str, int))

In [None]:
data = open('../universal_datasets/nums.txt')

for line in data:
    line = line.strip()
    num = int(line)
    if num > 90:
        print("{}\t{}".format(num, num/100))

# Writing to files
---

In [None]:
# A sample of writing to files: don't forget the 'w'

fout = open('../universal_datasets/buffer.txt', 'w')

In [None]:
for number in range(200000):
    
    # WARNING: .write() only takes strings 
    #     but range only produces integers, so we have to convert on the fly

    output = str(number)
    fout.write(output)
    
print('done')    

In [None]:
# This step is optional, but important
#     IF you need to leave the file open,
#     but want to flush the buffer in memory

fout.flush()

In [None]:
fout.close()
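
To see the buffer at work, one rough approach is to check the size of the file on disk before and after calling `.flush()`. This sketch assumes a made-up file in a temporary folder; the size before flushing is typically 0 for a small write like this, but that detail can vary, so treat it as an observation rather than a guarantee.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'buffer_demo.txt')

fout = open(path, 'w')
fout.write('hello')

size_before = os.path.getsize(path)   # often 0: the text may still be in the buffer
fout.flush()
size_after = os.path.getsize(path)    # the five characters have now been written out

fout.close()

print('Before flush:', size_before)
print('After flush: ', size_after)
```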

In [None]:
fileout = open('../universal_datasets/numbers.txt', 'w')

for number in range(10):
    
    # WARNING: .write() only takes strings    
    # NOTE: .write() does NOT include a '\n' (newline)
    #     by default, you must add one on.

    output = str(number) + '\n'
    fileout.write(output)
    
print('done') 
fileout.close()
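
To compare the two write loops above, the sketch below writes the same numbers with and without a trailing `'\n'` and reads both files back. The file names and the temporary folder are made up for this example.

```python
import os
import tempfile

folder = tempfile.mkdtemp()    # hypothetical scratch folder for this sketch
path_plain = os.path.join(folder, 'no_newlines.txt')
path_lines = os.path.join(folder, 'with_newlines.txt')

plain = open(path_plain, 'w')
lines = open(path_lines, 'w')

for number in range(5):
    plain.write(str(number))           # no newline: everything runs together
    lines.write(str(number) + '\n')    # one number per line

plain.close()
lines.close()

fin = open(path_plain)
run_together = fin.read()
fin.close()

fin = open(path_lines)
one_per_line = fin.read()
fin.close()

print(repr(run_together))    # '01234'
print(repr(one_per_line))    # '0\n1\n2\n3\n4\n'
```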

# Experience Points!

In your **text editor** create a simple script called:

```bash
my_files_02.py
```

Execute your script in **Jupyter** using the command:

```bash
run my_files_02.py
```

I suggest that, as you add each feature to your script, you run it right away to test it incrementally.

Task | Sample Object(s)
:---|---
Assign the label `my_output` to the results of the `open()` function for this file: use the `w` flag | `results.txt`
Start a `while True` loop|
Assign a label, `output`, to the result of an `input()` function| `Name one of your favorite foods? `
Check whether `output` is equal to the string: `exit` |
IF NOT, `.write()` the content of `output` to the file|
IF EQUAL, `break` out of the `while` loop|
When the loop finishes, `.close()` the file.|


In [None]:
# Possible solution

my_output = open('../universal_datasets/results.txt', 'w')

while True:
    output = input('Name one of your favorite foods? ')
    if output == 'exit':
        break
    else:    
        my_output.write(output + '\n')

my_output.close()    

# Experience Points!

In your **text editor** create a simple script called:

```bash
my_files_03.py
```

Execute your script in **Jupyter** using the command:

```bash
run my_files_03.py
```

I suggest that, as you add each feature to your script, you run it right away to test it incrementally.

Task | Sample Object(s)
:---|---
1. open the file `log_file_1000.csv`|
2. examine the file contents|
3. one line has 'SELINA' capitalized.|
4. `print()` that line and the associated line number of the line where `SELINA` is capitalized|

<img src='../universal_images/green_sticky.300px.png' width='200' style='float:left'>

In [None]:
# Possible solution

fin = open('../universal_datasets/log_file_1000.csv')

count = 0
for line in fin:
    count += 1
    if 'SELINA' in line:
        print('Corrupt line:', count, line)

print('Total lines:', count)

fin.close()