## Introduction to Files

* Regardless of whether a file is specific to a particular program or not, all files are created, read, written, and deleted by computer programs.

### What is a file?

* A file is just a sequence of data. 
* Files can either be stored in text or binary form.
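
To make the text/binary distinction concrete, the sketch below writes a short string to a scratch file and reads it back in both modes. The file name `demo.txt` and the temporary directory are assumptions made for illustration; they are not part of the course material.

```python
import os
import tempfile

# Hypothetical scratch file, created in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Write text; the encoding is stated explicitly so the bytes are predictable
with open(path, "wt", encoding="utf-8", newline="") as outfile:
    outfile.write("h\u00e9llo\n")

# Text mode decodes the stored bytes into a str
with open(path, "rt", encoding="utf-8") as textfile:
    as_text = textfile.read()

# Binary mode returns the raw bytes, undecoded
with open(path, "rb") as binfile:
    as_bytes = binfile.read()

print(type(as_text), len(as_text))    # <class 'str'> 6
print(type(as_bytes), len(as_bytes))  # <class 'bytes'> 7
```

The lengths differ because the accented character is one character in text form but two bytes in UTF-8.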

### File Operations
* Conceptually, computer programs can perform four operations on files: Open, Read, Write, and Close.

1. Open : The program must have the proper permissions to access the file.

2. Read : For text files, this involves getting a certain number of characters as a string.

3. Write : For text files, this involves passing a string to the file object. As with reading files, in almost all cases, each write to a file places the new text right after the text that was written by the previous write operation. It is also possible to specify where to write into the file (instead of writing the file sequentially), but the program must explicitly do that. 

4. Close :  After the program is done reading or writing the file, it must close the open file object. Operating systems typically limit the number of files that can be open at any given time, since open files consume resources in memory.
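
The remark under Write about choosing where to write can be sketched with the file object's seek() method. This is a minimal illustration, assuming a hypothetical scratch file `seek_demo.bin`; binary mode is used because arbitrary byte offsets are only well defined there.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "seek_demo.bin")

with open(path, "wb") as outfile:
    outfile.write(b"0123456789")

# "r+b" opens an existing file for reading and writing without erasing it
with open(path, "r+b") as rwfile:
    rwfile.seek(3)           # move to byte offset 3 from the start
    rwfile.write(b"abc")     # overwrites bytes 3, 4 and 5 in place
    rwfile.seek(0)           # go back to the beginning before reading
    contents = rwfile.read()

print(contents)  # b'012abc6789'
```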


#### Open

In [None]:
# The name of the file to open
filename = "starwars.txt"

# The mode in which to open it (read, text)
mode = "rt"

# Actually open the file
openfile = open(filename, mode)

#### Read

In [None]:

openfile = open("gonewiththewind.txt", "rt")

filedata = openfile.read()
print(filedata)

#### Write

In [None]:
openfile = open("mymoviescript.txt", "wt")

openfile.write('I wish I had an idea for a movie...\n')

#### Close

In [None]:
openfile = open("emptyfile.txt")

openfile.close()

* open() takes two arguments, a filename and a mode. The mode is optional and defaults to "rt".
* r = read, t = text, b = binary
* Make sure you close the file once you are done with it.
* In text mode, open() returns an object of type TextIOWrapper.
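
One common way to make sure a file is always closed is Python's `with` statement, which closes the file when the block ends. The sketch below assumes a hypothetical scratch file `with_demo.txt` created in a temporary directory.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")

with open(path, "wt") as outfile:
    outfile.write("closed automatically\n")

# Leaving the with block closes the file, even if an exception was raised
print(outfile.closed)     # True

with open(path, "rt") as infile:
    data = infile.read()
    print(infile.closed)  # False: still inside the block

print(infile.closed)      # True
```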


In [None]:
"""
Reading files.
"""

print("Opening Files")
print("=============")

# Open takes a filename and a mode
openfile = open("the_idiot.txt", "rt")

# Modes for reading:
#  r - read (default)
#  t - text (default)
#  b - binary

print(type(openfile))
print(openfile)

# Must close file after opening it
openfile.close()

print("")
print("Errors")
print("======")

# nofile = open("nosuchfile.txt")

print("")
print("Reading")
print("=======")

datafile = open("the_idiot.txt", "rt")

data = datafile.read()

print("type:", type(data))
print("length:", len(data))
print("")
print(data)

datafile.close()
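
The commented-out `open("nosuchfile.txt")` call in the cell above raises FileNotFoundError if the file does not exist. A minimal sketch of handling that case follows; the helper name `read_or_none` is hypothetical and not part of the course code.

```python
def read_or_none(filename):
    """Return the text of filename, or None if the file does not exist."""
    try:
        with open(filename, "rt") as datafile:
            return datafile.read()
    except FileNotFoundError:
        # Raised by open() when the path cannot be found
        return None


# Prints None when there is no such file in the working directory
print(read_or_none("nosuchfile.txt"))
```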

### Reading Files using Iteration

* File objects have a method called readlines() that breaks the contents of the file into lines.
* It returns a list of strings, one per line, each still ending with its newline character.
* There are two equivalent ways of iterating over the lines of a file: calling readlines(), or iterating over the file object itself. The second way is more efficient for large files because it does not build a list holding every line at once.

In [None]:
"""
Iterating over files.
"""

# Using readlines()
#  readlines creates a list of strings
#  that you can iterate over

datafile1 = open("the_idiot.txt", "rt")

for line in datafile1.readlines():
    print(line)

datafile1.close()

print("")
print("================================")
print("")

# Direct iteration
#  This is faster for large files,
#  as no list is created

datafile2 = open("the_idiot.txt", "rt")

for line in datafile2:
    print(line)

datafile2.close()
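
Each line produced by either loop above still ends with "\n", so print(line) shows a blank line after every line of the file. A hedged sketch of one way to avoid that, using a hypothetical scratch file `lines_demo.txt`:

```python
import os
import tempfile

# Hypothetical scratch file with three short lines
path = os.path.join(tempfile.mkdtemp(), "lines_demo.txt")
with open(path, "wt") as outfile:
    outfile.write("first\nsecond\nthird\n")

stripped = []
with open(path, "rt") as datafile:
    # enumerate() numbers the lines as they are read, starting at 1
    for number, line in enumerate(datafile, start=1):
        clean = line.rstrip("\n")   # drop the trailing newline only
        stripped.append(clean)
        print(number, clean)

print(stripped)  # ['first', 'second', 'third']
```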



### Writing Files

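* Be aware of the difference between the modes 'w' and 'a': 'w' erases any existing contents before writing, while 'a' appends to the end of the file.
* You can write strings with write() (a single string) or writelines() (a sequence of strings). Neither adds newline characters for you.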

In [None]:
"""
Writing Files.
"""

print("Opening Files")
print("=============")

openfile = open("output.txt", "wt")

# Modes for writing:
#  w - write (erases the file first)
#  a - write (appends to the end of the file)
#  t - text (default)
#  b - binary
#  + - open for read and write

print(type(openfile))
print(openfile)

# Always close files
openfile.close()

print("")
print("Writing")
print("=======")

def checkfile(filename):
    """
    Read and print the contents of the file named filename.
    """
    datafile = open(filename, "rt")
    data = datafile.read()
    datafile.close()
    print(data)

# Write
outputfile = open("output.txt", "wt")
outputfile.writelines(["first line\n", "second line\n"])
outputfile.write("third line\nfourth line\n")
outputfile.close()

print("Initial file contents:")
checkfile("output.txt")


# Overwrite
outputfile2 = open("output.txt", "wt")
outputfile2.write("overwriting contents\n")
outputfile2.close()

print("Overwritten file contents:")
checkfile("output.txt")


# Append
outputfile2 = open("output.txt", "at")
outputfile2.write("appending to contents\n")
outputfile2.close()

print("Appended file contents:")
checkfile("output.txt")
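
Note that writelines() does not insert line separators, which is why the strings passed to it above already end in "\n". The following sketch shows the difference, assuming a hypothetical scratch file `writelines_demo.txt`:

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "writelines_demo.txt")
lines = ["alpha", "beta", "gamma"]

# Without newlines, the strings are written back to back
with open(path, "wt") as outfile:
    outfile.writelines(lines)
with open(path, "rt") as infile:
    run_together = infile.read()

# Adding "\n" to each string puts every item on its own line
with open(path, "wt") as outfile:
    outfile.writelines(line + "\n" for line in lines)
with open(path, "rt") as infile:
    separated = infile.read()

print(repr(run_together))  # 'alphabetagamma'
print(repr(separated))     # 'alpha\nbeta\ngamma\n'
```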

## Understanding File Systems and Paths
* When working with directories, there are several common terms that we will use.
* The parent of a directory C is simply the directory that directly contains C.
* If P is the parent directory of another directory C, then C is referred to as a sub-directory of P. (C is also sometimes referred to as a child directory of P.)
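
The parent/child relationship can be read directly off a path string. The sketch below uses `posixpath`, the POSIX flavour of `os.path`, so that the output is the same on every operating system; the directory name is a made-up example.

```python
import posixpath

child = "/home/user/projects/paths"   # hypothetical directory C

parent = posixpath.dirname(child)     # the directory P that contains C
name = posixpath.basename(child)      # the name of C inside P

print(parent)  # /home/user/projects
print(name)    # paths
```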

### Absolute paths
* In Python, the location of a file is specified as a string known as a path.
* On Windows, an absolute path is a string whose first two characters (for example, "C:") denote which drive contains the directory. The rest of the path consists of directory names separated by backslashes "\\".
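
As a small illustration of that structure, the sketch below takes a Windows-style absolute path apart. It uses `ntpath`, the Windows flavour of `os.path`, which can be imported on any operating system; the path itself is hypothetical.

```python
import ntpath

# Raw string, so the backslashes are not treated as escape characters
path = r"C:\Users\example\notes\todo.txt"   # hypothetical path

drive, rest = ntpath.splitdrive(path)

print(drive)               # C:
print(rest)                # \Users\example\notes\todo.txt
print(ntpath.isabs(path))  # True
print(rest.split("\\"))    # ['', 'Users', 'example', 'notes', 'todo.txt']
```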

### Relative paths
* By default, Python maintains a working directory that it uses as the base location for all relative paths. In GUI-based IDEs such as Thonny, IDLE or Atom, the working directory is usually the directory that contains the Python source code being run.
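
A minimal sketch of how a relative path is resolved against the working directory. The file `child/child_file.txt` is only an example name and does not need to exist for this code to run.

```python
import os

working_dir = os.getcwd()

# A relative path: it does not say where "child" itself is located
relative = os.path.join("child", "child_file.txt")

# abspath() resolves the relative path against the working directory
resolved = os.path.abspath(relative)

print(working_dir)
print(resolved)
print(os.path.isabs(relative), os.path.isabs(resolved))  # False True
```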

### Using the OS module to manipulate paths

* For many applications, your Python code should work reliably regardless of the operating system and its file system. In that case, consider using the functions from the os module described below to write OS-independent Python code for manipulating paths. (Remember to 'import os' before using them.)
* os.getcwd() = return the path to the current working directory for your Python code.
* os.path.abspath(file_name) = return the absolute path to the specified file.
* os.path.join(path, dir1, dir2, ..., file_name) = return the path to the file file_name that lies in the sequence of nested directories dir1, dir2, ... located at path. The result is absolute only if path is absolute.
* os.pardir = a constant string (not a function) that refers to the parent directory in a relative path. For most systems, this string is "..".

## Working with File Paths



In [None]:
"""
Examples of paths used in Python
Expects current_file.txt in same directory as this code
Expects parent_file.txt in parent directory of this code
Expects child_file.txt in sub-directory child
"""

def echo_file(file_name):
    """
    Open a file, read its contents, and echo to console
    """
    my_file = open(file_name, 'r')
    my_file_text = my_file.read()
    print(my_file_text)
    my_file.close()                         # close the file, Joe!



def run_absolute_path_examples():
    """
    Some simple examples of absolute paths
    """

    # Examples using absolute paths on Windows - Use raw strings to handle backslash
    current_abs_path = r"C:\Users\jwarren\Dropbox\Python Scripting\course 2\week4\paths\current_file.txt"
    child_abs_path = r"C:\Users\jwarren\Dropbox\Python Scripting\course 2\week4\paths\child\child_file.txt"
    parent_abs_path = r"C:\Users\jwarren\Dropbox\Python Scripting\course 2\week4\parent_file.txt"
    echo_file(current_abs_path)
    echo_file(child_abs_path)
    echo_file(parent_abs_path)
    print()

run_absolute_path_examples()

def run_relative_path_examples():
    """
    Some simple examples of relative paths
    """

    # Examples using relative paths - current_file.txt in same directory as this code
    echo_file("current_file.txt")
    echo_file("child/child_file.txt")           # Note that slash works on Windows
    echo_file("../parent_file.txt")
    print()


#run_relative_path_examples()


import os

def run_os_path_examples():
    """
    Examples of computing/manipulating paths reliably using the os module
    """

    # Get absolute path using os.path - note path uses backslashes on Windows
    current_abs_path = os.path.abspath("current_file.txt")
    print(current_abs_path)

    # Get absolute path to child_file.txt using os.path
    child_abs_path = os.path.abspath("child/child_file.txt")
    print(child_abs_path)

    # Get current working directory
    working_dir = os.getcwd()
    print(working_dir)

    # Construct paths using os.path.join
    child_rel_path = os.path.join(working_dir, "child", "child_file.txt")
    print(child_rel_path)

    parent_rel_path = os.path.join(working_dir, os.pardir, "parent_file.txt")
    print(parent_rel_path)

#run_os_path_examples()
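
Paths built with os.pardir, like parent_rel_path above, keep the ".." component in the string. If a tidier path is wanted, os.path.normpath() collapses it. The sketch below uses `posixpath` (the POSIX flavour of `os.path`) and a made-up working directory so that the output is predictable on any system.

```python
import posixpath

working_dir = "/home/user/week4/paths"   # hypothetical working directory

# Joining with pardir leaves ".." in the middle of the path
joined = posixpath.join(working_dir, posixpath.pardir, "parent_file.txt")

# normpath() resolves the ".." purely by string manipulation
normalized = posixpath.normpath(joined)

print(joined)      # /home/user/week4/paths/../parent_file.txt
print(normalized)  # /home/user/week4/parent_file.txt
```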



---

## Practice Project : Updating the CodeSkulptor Docs

In [None]:
"""
Week 4 practice project template for Python Data Representation
Update syntax for print in CodeSkulptor Docs
from "print ..." syntax in Python 2 to "print(...)" syntax for Python 3
"""

# HTML tags that bound example code
PREFIX = "<pre class='cm'>"
POSTFIX = "</pre>"
PRINT = "print"


def update_line(line):
    """
    Takes a string line representing a single line of code
    and returns a string with print updated
    """

    # Strip left white space using built-in string method lstrip()
    print_index = line.find('print')
    front_print = line[:print_index+5]
    back_print  = line[print_index+5:].lstrip()

    # If line is a print statement, use the format() method to insert parentheses
    if print_index != -1:
        format_print = "{0}({1})".format(front_print, back_print)
    else:
        format_print = line
    # Note that this solution does not handle white space/comments after the print statement

    return format_print

# Some simple tests
print(update_line(""))
print(update_line("foobar()"))  
print(update_line("print 1 + 1"))      
print(update_line("    print 2, 3, 4"))

# Expected output
##
##foobar()
##print(1 + 1)
##    print(2, 3, 4)


def update_pre_block(pre_block):
    """
    Takes a string that corresponds to a <pre> block in HTML and parses it into lines.
    Returns a string corresponding to the updated <pre> block with each line
    updated via update_line()
    """
    line = pre_block.split('\n')
    updated_block = update_line(line[0])
    for i in line[1:]:
        updated_block += "\n"
        updated_block += update_line(i)

    return updated_block

# Some simple tests
print(update_pre_block(""))
print(update_pre_block("foobar()"))
print(update_pre_block("if foo():\n    bar()"))
print(update_pre_block("print\nprint 1+1\nprint 2, 3, 4"))
print(update_pre_block("    print a + b\n    print 23 * 34\n        print 1234"))

# Expected output
##
##foobar()
##if foo():
##    bar()
##print()
##print(1+1)
##print(2, 3, 4)
##    print(a + b)
##    print(23 * 34)
##        print(1234)

def update_file(input_file_name, output_file_name):
    """
    Open and read the file specified by the string input_file_name
    Process the <pre> blocks in the loaded text to update print syntax
    Write the update text to the file specified by the string output_file_name
    """
    
    # open file and read text in file as a string
    file_path = "/content/drive/MyDrive/etc/" + input_file_name

    with open(file_path) as doc_file:
        doc_text = doc_file.read()

    # split text in <pre> blocks and update using update_pre_block()
    pre_blocks = doc_text.split(PREFIX)
    updated_text = pre_blocks[0]

    for block in pre_blocks[1:]:
        updated_text += PREFIX
        [preblock, filler] = block.split(POSTFIX, 1)
        updated_text += update_pre_block(preblock)
        updated_text += POSTFIX
        updated_text += filler
        
    # Write the answer in the specified output file
    with open(output_file_name,"w") as processed_file:
        processed_file.write(updated_text)

# A couple of test files
update_file("table.html", "table_updated.html")
update_file("docs.html", "docs_updated.html")

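# Import some code to check whether the computed files are correct.
# The import is commented out, so the calls that depend on it are commented out
# as well (they would raise a NameError). compare_files() is defined in a later cell.
##import examples3_file_diff as file_diff
##file_diff.compare_files("table_updated.html", "table_updated_solution.html")
##file_diff.compare_files("docs_updated.html", "docs_updated_solution.html")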

# Expected output
##table_updated.html and table_updated_solution.html are the same
##docs_updated.html and docs_updated_solution.html are the same



foobar()
print(1 + 1)
    print(2, 3, 4)

foobar()
if foo():
    bar()
print()
print(1+1)
print(2, 3, 4)
    print(a + b)
    print(23 * 34)
        print(1234)
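
Because update_line() looks for the substring 'print' anywhere in the line, a line such as `fingerprint = 1` would also be rewritten. The sketch below is one possible stricter variant, not the course solution: the name `update_line_strict` and the regular expression are assumptions introduced here. It only rewrites lines whose first token is the keyword print, and it still does not handle trailing comments or print statements that follow other code on the same line.

```python
import re

# Matches a line whose first token is the keyword print:
# group 1 is the indentation, group 2 is everything after print
PRINT_STATEMENT = re.compile(r"^(\s*)print\b[ \t]*(.*)$")


def update_line_strict(line):
    """
    Hypothetical variant of update_line(): add parentheses only when
    the line starts (after indentation) with the keyword print.
    """
    match = PRINT_STATEMENT.match(line)
    if match is None:
        return line
    indent, arguments = match.groups()
    return "{0}print({1})".format(indent, arguments)


print(update_line_strict("    print 2, 3, 4"))  #     print(2, 3, 4)
print(update_line_strict("print"))              # print()
print(update_line_strict("fingerprint = 1"))    # fingerprint = 1
```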


In [None]:
WINDOW_SIZE = 10


def compare_files(file1_name, file2_name):
    """
    Given two files (whose paths are specified as strings),
    find the first location in the files that differ and
    print a small portion of both files around this location
    """

    # open and read both files, closing each one as soon as it has been read
    with open(file1_name) as file1:
        file1_text = file1.read()
    with open(file2_name) as file2:
        file2_text = file2.read()

    smaller_length = min(len(file1_text), len(file2_text))

    for idx in range(smaller_length):
        if file1_text[idx] != file2_text[idx]:
            start_window = max(0, idx - WINDOW_SIZE)
            end_window = min(smaller_length, idx + WINDOW_SIZE)
            print("Found difference at position", idx)
            print(file1_name, "has the characters", file1_text[start_window : end_window])
            print(file2_name, "has the characters", file2_text[start_window : end_window])
            return
        
    if len(file1_text) < len(file2_text):
        print(file1_name, "is a prefix of", file2_name)
    elif len(file2_text) < len(file1_text):
        print(file2_name, "is a prefix of", file1_name)
    else:
        print(file1_name, "and", file2_name, "are the same")
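
The comparison logic in compare_files() can also be separated from the file handling, which makes it easy to try on plain strings. The following is a sketch under that assumption; the helper name `first_difference` is hypothetical and not part of the course code.

```python
def first_difference(text1, text2):
    """
    Return the index of the first position where the two strings differ,
    or None if they are identical. If one string is a proper prefix of
    the other, the index returned is the length of the shorter string.
    """
    # zip() stops at the end of the shorter string
    for idx, (char1, char2) in enumerate(zip(text1, text2)):
        if char1 != char2:
            return idx
    if len(text1) != len(text2):
        return min(len(text1), len(text2))
    return None


print(first_difference("abcdef", "abcxef"))  # 3
print(first_difference("abc", "abcdef"))     # 3
print(first_difference("same", "same"))      # None
```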

In [None]:
compare_files("/content/table_updated.html", "/content/drive/MyDrive/etc/table_updated_solution.html")
compare_files("/content/docs_updated.html", "/content/drive/MyDrive/etc/docs_updated_solution.html")

Found difference at position 24
/content/table_updated.html has the characters om url=(0066)https:/
/content/drive/MyDrive/etc/table_updated_solution.html has the characters om url=(0083)https:/
Found difference at position 39
/content/docs_updated.html has the characters om url=(0065)https:/
/content/drive/MyDrive/etc/docs_updated_solution.html has the characters om url=(0082)https:/




---

## Assessment Project