![LU Logo](https://www.df.lu.lv/fileadmin/user_upload/LU.LV/Apaksvietnes/Fakultates/www.df.lu.lv/Par_mums/Logo/DF_logo/01_DF_logo_LV.png)

# Week 7: File Handling


## Lesson Overview

We will cover the following topics:

* Folder / directory operations: creating, renaming, deleting and listing
  * `Path` from `pathlib`, `glob()` and `rglob()`
* Reading from, writing to and appending to text files
  * Encoding issues
* Binary files
* JSON files
* CSV files

## Lesson Objectives

To learn how to:
* Work with folders / directories
* Work with files

### Import required libraries

In [27]:
# generally imports go at the top of a notebook
# python version
import sys
print(f"Python version: {sys.version}")

Python version: 3.9.16 (main, Dec  7 2022, 02:40:58) 
[Clang 11.0.3 (clang-1103.0.32.62)]


### Topic 1: Folder / Directory Operations

* [pathlib](https://docs.python.org/3/library/pathlib.html) – Object-oriented filesystem paths
* [os](https://docs.python.org/3/library/os.html) – Various operating system interfaces

In [28]:
# Let's import the required Path object
from pathlib import Path

In [29]:
# Listing contents of the current working directory

# Current directory path
cur_dir = Path(".")

# List directory contents (in arbitrary order)
for item in cur_dir.iterdir():
    print(item)

my_module.py
test_data.json
week_2_key_programming_concepts.ipynb
week_1_example.py
my_module2.py
__pycache__
data.csv
README.md
week_4_functions_tuples_dictionaries_sets.ipynb
week_8_classes_objects.ipynb
.ipynb_checkpoints
week_1_Python_basics.ipynb
week_7_file_handling.ipynb
week_5_standard_library_modules.ipynb


In [30]:
# Let's sort directory content list

for item in sorted(cur_dir.iterdir()):
    print(item)

.ipynb_checkpoints
README.md
__pycache__
data.csv
my_module.py
my_module2.py
test_data.json
week_1_Python_basics.ipynb
week_1_example.py
week_2_key_programming_concepts.ipynb
week_4_functions_tuples_dictionaries_sets.ipynb
week_5_standard_library_modules.ipynb
week_7_file_handling.ipynb
week_8_classes_objects.ipynb


In [31]:
# Make a new directory (under the current dir)

new_dir = Path("Test-directory")   # under the current dir by default
new_dir.mkdir(exist_ok=True)       # it's OK if the directory already exists

# Print current dir contents
for item in sorted(cur_dir.iterdir()):
    print(item)

.ipynb_checkpoints
README.md
Test-directory
__pycache__
data.csv
my_module.py
my_module2.py
test_data.json
week_1_Python_basics.ipynb
week_1_example.py
week_2_key_programming_concepts.ipynb
week_4_functions_tuples_dictionaries_sets.ipynb
week_5_standard_library_modules.ipynb
week_7_file_handling.ipynb
week_8_classes_objects.ipynb


In [32]:
# Rename a directory

import os

os.rename("Test-directory", "Test-directory-2")

# Print current dir contents
for item in sorted(cur_dir.iterdir()):
    print(item)

.ipynb_checkpoints
README.md
Test-directory-2
__pycache__
data.csv
my_module.py
my_module2.py
test_data.json
week_1_Python_basics.ipynb
week_1_example.py
week_2_key_programming_concepts.ipynb
week_4_functions_tuples_dictionaries_sets.ipynb
week_5_standard_library_modules.ipynb
week_7_file_handling.ipynb
week_8_classes_objects.ipynb
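The same rename can also be done with `pathlib` alone: `Path.rename()` renames the directory and (since Python 3.8) returns a new `Path` pointing at the target. A minimal sketch, assuming we work inside a throwaway temporary directory so the notebook folder is not touched:

```python
import tempfile
from pathlib import Path

# Work inside a temporary directory that is deleted afterwards
with tempfile.TemporaryDirectory() as tmp:
    old_dir = Path(tmp) / "Test-directory"
    old_dir.mkdir()

    # Path.rename() returns the new Path object
    renamed = old_dir.rename(Path(tmp) / "Test-directory-2")

    old_exists = old_dir.exists()      # the old name is gone
    renamed_exists = renamed.exists()  # the new name exists

print(renamed.name)   # Test-directory-2
print(old_exists)     # False
```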


In [33]:
# Delete a directory (it must be empty)

new_dir = Path("Test-directory-2")
new_dir.rmdir()
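`rmdir()` refuses to delete a directory that still contains files (it raises an `OSError`). To remove a whole directory tree, the standard library offers `shutil.rmtree()`. A sketch inside a temporary directory; use it with care, because it deletes everything below the given path without asking:

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    tree = Path(tmp) / "Test-directory"
    (tree / "sub_dir").mkdir(parents=True)
    (tree / "sub_dir" / "file1.docx").touch()

    try:
        tree.rmdir()              # fails: the directory is not empty
        rmdir_failed = False
    except OSError as err:
        rmdir_failed = True
        print("rmdir() failed:", err)

    shutil.rmtree(tree)           # removes the directory and everything inside it
    tree_exists = tree.exists()

print(tree_exists)   # False
```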

#### Creating some files and a directory for demonstrating filename matching

To create a file, you would normally `open()` it in write mode, but here we will use the `Path.touch()` method to create some empty files.

In [34]:
# create a directory and some empty files

Path("Test-directory").mkdir()
Path("Test-directory/sub_dir").mkdir()

Path("Test-directory/file1.docx").touch()
Path("Test-directory/file2.docx").touch()
Path("Test-directory/test1.py").touch()
Path("Test-directory/sub_dir/file1.docx").touch()
Path("Test-directory/sub_dir/file2.csv").touch()
Path("Test-directory/sub_dir/test3.csv").touch()
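Each `mkdir()` call above creates a single directory level and raises `FileExistsError` if the directory already exists, so that cell fails if it is run twice. `mkdir(parents=True, exist_ok=True)` creates any missing parent directories in one call and tolerates an existing directory. A sketch in a temporary directory (an assumption, so it runs anywhere):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "Test-directory" / "sub_dir"

    # Creates both Test-directory and sub_dir in one call
    nested.mkdir(parents=True, exist_ok=True)
    # Running it again is harmless thanks to exist_ok=True
    nested.mkdir(parents=True, exist_ok=True)

    (nested / "file2.csv").touch()

    # List everything below tmp, relative to tmp
    created = sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*"))

print(created)
# ['Test-directory', 'Test-directory/sub_dir', 'Test-directory/sub_dir/file2.csv']
```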

In [35]:
# Print current dir contents

cur_dir = Path(".")

for item in sorted(cur_dir.iterdir()):
    print(item)

.ipynb_checkpoints
README.md
Test-directory
__pycache__
data.csv
my_module.py
my_module2.py
test_data.json
week_1_Python_basics.ipynb
week_1_example.py
week_2_key_programming_concepts.ipynb
week_4_functions_tuples_dictionaries_sets.ipynb
week_5_standard_library_modules.ipynb
week_7_file_handling.ipynb
week_8_classes_objects.ipynb


In [36]:
# See help for additional information about Path objects

help(cur_dir)

Help on PosixPath in module pathlib object:

class PosixPath(Path, PurePosixPath)
 |  PosixPath(*args, **kwargs)
 |  
 |  Path subclass for non-Windows systems.
 |  
 |  On a POSIX system, instantiating a Path should return this object.
 |  
 |  Method resolution order:
 |      PosixPath
 |      Path
 |      PurePosixPath
 |      PurePath
 |      builtins.object
 |  
 |  Methods inherited from Path:
 |  
 |  __enter__(self)
 |  
 |  __exit__(self, t, v, tb)
 |  
 |  absolute(self)
 |      Return an absolute version of this path.  This function works
 |      even if the path doesn't point to anything.
 |      
 |      No normalization is done, i.e. all '.' and '..' will be kept along.
 |      Use resolve() to get the canonical path to a file.
 |  
 |  chmod(self, mode)
 |      Change the permissions of the path, like os.chmod().
 |  
 |  exists(self)
 |      Whether this path exists.
 |  
 |  expanduser(self)
 |      Return a new path with expanded ~ and ~user constructs
 |  ...

In [41]:
# current working directory (returned as a string)
os.getcwd()

'/Users/captsolo/_changed_stuff_/Code/LU_Python_2023/notebooks'
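`os.getcwd()` returns the working directory as a plain string, while `Path.cwd()` returns it as a `Path`, and `resolve()` turns a relative path such as `Path(".")` into an absolute one. `Path` objects also expose the parts of a file name directly. A short sketch; the name `notes/week_7.tar.gz` is a made-up example and does not need to exist:

```python
import os
from pathlib import Path

cwd = Path.cwd()
print(cwd == Path(os.getcwd()))              # same directory, different type
print(Path(".").resolve() == cwd.resolve())  # "." resolves to the working directory

# Pure path manipulation: no filesystem access is needed
p = Path("notes") / "week_7.tar.gz"
print(p.name)       # week_7.tar.gz
print(p.stem)       # week_7.tar
print(p.suffix)     # .gz
print(p.suffixes)   # ['.tar', '.gz']
print(p.parent)     # notes
```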

#### Filename pattern matching

We often want to find files whose names match a pattern (e.g. `*.py`).

In [37]:
# There are no .docx files in the current directory
#  - there will be no matches for this file pattern

matches = cur_dir.glob("*.docx")

for item in sorted(matches):
    print(item)

In [42]:
# Let's look for matches one level down, in every immediate subdirectory
#  - there will be matches in the "Test-directory" directory
#  - but glob() will not search recursively (in deeper subdirectories)

matches = cur_dir.glob("*/*.docx")

for item in sorted(matches):
    print(item)

Test-directory/file1.docx
Test-directory/file2.docx


In [43]:
# We want to find matches recursively, in any subdirectory
#  - for this purpose we can use the "**" directory pattern

matches = cur_dir.glob("**/*.docx")

for item in sorted(matches):
    print(item)

Test-directory/file1.docx
Test-directory/file2.docx
Test-directory/sub_dir/file1.docx


In [44]:
# rglob() is like calling glob() with "**/" added in front of the filename pattern

matches = cur_dir.rglob("*.docx")

for item in sorted(matches):
    print(item)

Test-directory/file1.docx
Test-directory/file2.docx
Test-directory/sub_dir/file1.docx
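Besides `*` and `**`, glob patterns support `?` (exactly one character) and `[...]` (one character from a set or range). Since a pattern can also match directories, filtering with `is_file()` is a common extra step. A sketch that rebuilds a small copy of the test tree in a temporary directory (an assumption, so it runs anywhere):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub_dir").mkdir()
    for name in ["file1.docx", "file2.docx", "test1.py",
                 "sub_dir/file1.docx", "sub_dir/file2.csv", "sub_dir/test3.csv"]:
        (root / name).touch()

    # "?" matches exactly one character
    one_char = sorted(p.name for p in root.glob("file?.docx"))
    # "[1-2]" matches one character in the range 1..2
    ranged = sorted(p.relative_to(root).as_posix() for p in root.rglob("test[1-2].*"))
    # "*" also matches the directory sub_dir, so keep files only
    files_only = sorted(p.name for p in root.glob("*") if p.is_file())

print(one_char)    # ['file1.docx', 'file2.docx']
print(ranged)      # ['test1.py']
print(files_only)  # ['file1.docx', 'file2.docx', 'test1.py']
```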


In [45]:
# We can also "walk" a directory tree using os.walk()

import os

# Walk a directory tree and print directory and file names
for dirpath, dirnames, files in os.walk('.'):
    print(f'Directory: {dirpath}')
    for filename in files:
        print("   ", filename)


Directory: .
    my_module.py
    test_data.json
    week_2_key_programming_concepts.ipynb
    week_1_example.py
    my_module2.py
    data.csv
    README.md
    week_4_functions_tuples_dictionaries_sets.ipynb
    week_8_classes_objects.ipynb
    week_1_Python_basics.ipynb
    week_7_file_handling.ipynb
    week_5_standard_library_modules.ipynb
Directory: ./__pycache__
    my_module.cpython-39.pyc
    my_module2.cpython-39.pyc
Directory: ./.ipynb_checkpoints
    week_7_file_handling-checkpoint.ipynb
    week_2_key_programming_concepts-checkpoint.ipynb
    week_1_example-checkpoint.py
    week_8_classes_objects-checkpoint.ipynb
    week_1_Python_basics-checkpoint.ipynb
    week_5_standard_library_modules-checkpoint.ipynb
    week_4_functions_tuples_dictionaries_sets-checkpoint.ipynb
Directory: ./Test-directory
    file2.docx
    test1.py
    file1.docx
Directory: ./Test-directory/sub_dir
    file2.csv
    test3.csv
    file1.docx


#### Topic 1 - mini exercise

Choose a directory and explore its contents using the methods described above.

### Topic 2: Reading from and writing to text files

This topic covers reading from, writing to and appending to text files.

Use Python's `with` statement to make sure the file is closed properly once you are done with it (even if an error occurs):

```
with open(filename, "w") as file_object:
    file_object.write("some text")
```

We will also use the Jupyter cell magic `%%writefile` to create a file to read.

#### Reading files

In [46]:
%%writefile test_file.txt
first,second,third
1,2,3
4,5,6
7,8,9

Writing test_file.txt


In [47]:
# using "with" for opening a file
#  - "r" instructs to open the file for reading

with open("test_file.txt", "r") as file:
    text = file.read()

print(text)

first,second,third
1,2,3
4,5,6
7,8,9



In [48]:
help(open)

Help on function open in module io:

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
    Open file and return a stream.  Raise OSError upon failure.
    
    file is either a text or byte string giving the name (and the path
    if the file isn't in the current working directory) of the file to
    be opened or an integer file descriptor of the file to be
    wrapped. (If a file descriptor is given, it is closed when the
    returned I/O object is closed, unless closefd is set to False.)
    
    mode is an optional string that specifies the mode in which the file
    is opened. It defaults to 'r' which means open for reading in text
    mode.  Other common values are 'w' for writing (truncating the file if
    it already exists), 'x' for creating and writing to a new file, and
    'a' for appending (which on some Unix systems, means that all writes
    append to the end of the file regardless of the current seek position).
    ...
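The help text lists the `'x'` mode, which this notebook does not otherwise use: it creates a new file and raises `FileExistsError` if the file already exists, which protects against accidentally overwriting data. A sketch using a temporary directory and a made-up file name `report.txt`:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "report.txt"   # hypothetical file name

    # First time: the file does not exist yet, so "x" behaves like "w"
    with open(path, "x", encoding="utf-8") as f:
        f.write("first version\n")

    # Second time: "x" refuses to overwrite the existing file
    try:
        with open(path, "x", encoding="utf-8") as f:
            f.write("second version\n")
        overwrite_blocked = False
    except FileExistsError:
        overwrite_blocked = True

    content = path.read_text(encoding="utf-8")

print(overwrite_blocked)   # True
print(repr(content))       # 'first version\n'
```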

In [49]:
# use the "encoding" parameter to specify character encoding (usually "utf-8")

with open("test_file.txt", "r", encoding="utf-8") as file:
    text = file.read()

print(text)

first,second,third
1,2,3
4,5,6
7,8,9
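Encoding issues appear when a file is read with a different encoding than the one it was written with. Without `encoding=`, `open()` falls back to a platform-dependent default, so the same code can behave differently on Windows and on macOS/Linux. A sketch, assuming a small UTF-8 file with a Latvian word that is then read with the wrong encoding (`ascii`), with `errors="replace"`, and with the correct encoding:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "latvian.txt"   # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write("ābols\n")

    # Wrong encoding: "ā" is stored as two bytes that ASCII cannot decode
    try:
        with open(path, "r", encoding="ascii") as f:
            f.read()
        decode_failed = False
    except UnicodeDecodeError as err:
        decode_failed = True
        print("UnicodeDecodeError:", err.reason)

    # errors="replace" substitutes undecodable bytes with U+FFFD instead of failing
    with open(path, "r", encoding="ascii", errors="replace") as f:
        replaced = f.read()

    # The correct encoding gives the original text back
    with open(path, "r", encoding="utf-8") as f:
        correct = f.read()

print(repr(replaced))
print(repr(correct))   # 'ābols\n'
```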



In [50]:
# we can also go through the file line-by-line

with open("test_file.txt", "r", encoding="utf-8") as file:
    for line in file:
        print(line)

first,second,third

1,2,3

4,5,6

7,8,9



In [51]:
# let's get rid of extra newline characters

with open("test_file.txt", "r", encoding="utf-8") as file:
    for line in file:
        line = line.rstrip()
        print(line)

first,second,third
1,2,3
4,5,6
7,8,9


In [52]:
# Path objects can also be used in the open() function

test_file = Path("test_file.txt")

with open(test_file, "r", encoding="utf-8") as file:
    for line in file:
        line = line.rstrip()
        print(line)

first,second,third
1,2,3
4,5,6
7,8,9
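For small files, `Path` objects also offer one-call shortcuts: `write_text()` and `read_text()` open the file, write or read it, and close it again (both accept `encoding=`). A sketch in a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "test_file.txt"

    # write_text() overwrites the file and returns the number of characters written
    n_chars = path.write_text("first,second,third\n1,2,3\n", encoding="utf-8")

    # read_text() returns the whole file as one string
    text = path.read_text(encoding="utf-8")

# splitlines() drops the newline characters, much like rstrip() in the loop above
lines = text.splitlines()

print(n_chars)   # 25
print(lines)     # ['first,second,third', '1,2,3']
```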


#### Writing files

In [53]:
# to write to a file (overwriting its contents if the file exists) use open mode "w"

text = """
This is another file.
It contains lines of text.
"""

# let's use Path()
write_file_path = Path("write_file.txt")

with open(write_file_path, "w", encoding="utf-8") as write_file:
    write_file.write(text)

In [54]:
# let's check that the text has been written to the file

with open(write_file_path, "r", encoding="utf-8") as file:
    data = file.read()

print(data)


This is another file.
It contains lines of text.



In [55]:
# Files may also be opened in append mode "a". In this case, new content will
# be added at the end of the file.

with open(write_file_path, "a", encoding="utf-8") as write_file:
    write_file.write("We are appending text at the end of the file.")
    write_file.write("One more line here.")


In [56]:
# let's check file contents

with open(write_file_path, "r", encoding="utf-8") as file:
    data = file.read()

print(data)


This is another file.
It contains lines of text.
We are appending text at the end of the file.One more line here.


In [57]:
# The two strings ended up on one line, because write() does not add a newline.
# To start a new line, we need to write a newline character "\n" ourselves.

with open(write_file_path, "a", encoding="utf-8") as write_file:

    # add the newline character to start on a new line
    write_file.write("\n")
    
    write_file.write("This text should be on a new line.\n")
    write_file.write("One more line here.\n")

In [58]:
with open(write_file_path, "r", encoding="utf-8") as file:
    data = file.read()

print(data)


This is another file.
It contains lines of text.
We are appending text at the end of the file.One more line here.
This text should be on a new line.
One more line here.
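Two alternatives to calling `write()` with explicit `"\n"` characters: `print(..., file=f)` ends the line automatically, just like a normal `print()`, while `writelines()` writes a sequence of strings as-is and, despite its name, does not add newlines. A sketch in a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "write_file.txt"

    with open(path, "w", encoding="utf-8") as f:
        print("This text is on its own line.", file=f)       # print() adds "\n"
        f.writelines(["no newline added, ", "same line\n"])   # writelines() adds nothing

    data = path.read_text(encoding="utf-8")

print(repr(data))
# 'This text is on its own line.\nno newline added, same line\n'
```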



In [59]:
# Jupyter displays the string's repr(), which makes the "\n" characters visible
data

'\nThis is another file.\nIt contains lines of text.\nWe are appending text at the end of the file.One more line here.\nThis text should be on a new line.\nOne more line here.\n'

In [60]:
print(repr(data))

'\nThis is another file.\nIt contains lines of text.\nWe are appending text at the end of the file.One more line here.\nThis text should be on a new line.\nOne more line here.\n'


In [61]:
# delete the files (os.remove() and Path.unlink() do the same thing)

os.remove("test_file.txt")

write_file_path.unlink()

### Topic 3: Reading and writing binary and other types of files

#### Binary files

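To open a binary file, add "b" to the file open mode (e.g. "rb" or "wb").

In binary mode, data is read and written as `bytes` objects and no character encoding is applied, so `open()` must not be given an `encoding` argument.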

In [62]:
# create a bytes object
data = b'0123456789abcdef'
print(data)

b'0123456789abcdef'


In [63]:
# write to file

write_binary_path = Path("write_file.bin")

with open(write_binary_path, "wb") as write_file:
    write_file.write(data)

In [64]:
# read the file

with open(write_binary_path, "rb") as read_file:
    data_read = read_file.read()
    print(data_read)

b'0123456789abcdef'


In [65]:
# use seek() to go to a given position in the file

with open(write_binary_path, "rb") as read_file:

    # go to position 8 (counting from 0) and read 1 byte
    read_file.seek(8)
    print(read_file.read(1))

    print()

    # go to 3 bytes before the end (whence=2 means "relative to the end") and read 1 byte
    read_file.seek(-3, 2)
    print(read_file.read(1))


b'8'

b'd'
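The second argument of `seek()` (`whence`) selects the reference point: `0` (`os.SEEK_SET`, start of the file, the default), `1` (`os.SEEK_CUR`, current position) or `2` (`os.SEEK_END`, end of the file), and `tell()` reports the current position. The sketch uses `io.BytesIO`, an in-memory binary stream with the same read/seek interface, in place of the file on disk:

```python
import io
import os

stream = io.BytesIO(b"0123456789abcdef")

stream.seek(8)                  # absolute position 8 (os.SEEK_SET)
first = stream.read(1)          # reading advances the position by 1
pos_after_read = stream.tell()

stream.seek(2, os.SEEK_CUR)     # 2 bytes forward from the current position
second = stream.read(1)

stream.seek(-3, os.SEEK_END)    # 3 bytes before the end
third = stream.read(1)

print(first, pos_after_read, second, third)   # b'8' 9 b'b' b'd'
```

Files opened in text mode only allow such relative seeks with an offset of 0, which is one more reason to use binary mode for this kind of positioning.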


In [67]:
%%writefile test_file.txt
first,second,third
1,2,3
4,5,6
7,8,9

Writing test_file.txt


In [68]:
with open("test_file.txt", "rb") as read_file:
    data_read = read_file.read()
    print(data_read)

b'first,second,third\n1,2,3\n4,5,6\n7,8,9\n'
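Reading a text file in binary mode returns raw `bytes`, including the `\n` bytes. `bytes.decode()` turns them into a `str` using a given encoding, and `str.encode()` goes the other way. A non-ASCII word (borrowed from the CSV example later in this notebook) shows why the encoding matters:

```python
# The bytes shown above, decoded back into text
raw = b'first,second,third\n1,2,3\n4,5,6\n7,8,9\n'
text = raw.decode("utf-8")
print(text.splitlines()[0])   # first,second,third

# Non-ASCII characters take more than one byte in UTF-8
word = "ābols"
encoded = word.encode("utf-8")
print(encoded)                           # b'\xc4\x81bols'
print(len(word), len(encoded))           # 5 6
print(encoded.decode("utf-8") == word)   # True
```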


In [69]:
help(data_read)

Help on bytes object:

class bytes(object)
 |  bytes(iterable_of_ints) -> bytes
 |  bytes(string, encoding[, errors]) -> bytes
 |  bytes(bytes_or_buffer) -> immutable copy of bytes_or_buffer
 |  bytes(int) -> bytes object of size given by the parameter initialized with null bytes
 |  bytes() -> empty bytes object
 |  
 |  Construct an immutable array of bytes from:
 |    - an iterable yielding integers in range(256)
 |    - a text string encoded using the specified encoding
 |    - any object implementing the buffer API.
 |    - an integer
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(self, key, /)
 |      Return self[key].
 |  
 |  __getnewargs__(...)
 |  
 |  ...

#### JSON files

JSON (JavaScript Object Notation) files let us save nested Python data structures (dictionaries, lists, ...) to a file and read them back.

https://www.json.org/json-en.html

To do this, we will use Python's [json](https://docs.python.org/3/library/json.html) library:

- json.dump() – save structured data to a JSON file
- json.dumps() – return structured data as a JSON string
- json.load() – read structured data from a JSON file
- json.loads() – read structured data from a JSON string

In [70]:
import json

In [71]:
# data to be saved
#  - a list containing a dictionary that contains a tuple

data = ['foo', {'bar': ('baz', None, 1.0, 2)}]

In [72]:
print(data)

['foo', {'bar': ('baz', None, 1.0, 2)}]


In [73]:
data[1]

{'bar': ('baz', None, 1.0, 2)}

In [74]:
data[1]["bar"]

('baz', None, 1.0, 2)

In [75]:
# save data to a JSON file

file_path = Path("test_data.json")

with open(file_path, "w", encoding="utf-8") as file_out:
    json.dump(data, file_out)

In [76]:
# let's look at the file that we created

with open(file_path, "r", encoding="utf-8") as file_in:
    for line in file_in:
        print(line)

["foo", {"bar": ["baz", null, 1.0, 2]}]


In [77]:
# load data from a file

with open(file_path, "r", encoding="utf-8") as file_in:
    new_data = json.load(file_in)

In [78]:
new_data

['foo', {'bar': ['baz', None, 1.0, 2]}]

In [79]:
# the tuple was stored as a JSON array, so it comes back as a list
new_data[1]['bar']

['baz', None, 1.0, 2]

---

You can also transform Python data structures to / from JSON strings:

In [80]:
data

['foo', {'bar': ('baz', None, 1.0, 2)}]

In [81]:
json_str = json.dumps(data)
json_str

'["foo", {"bar": ["baz", null, 1.0, 2]}]'

In [82]:
new_data = json.loads(json_str)
new_data

['foo', {'bar': ['baz', None, 1.0, 2]}]
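`json.dump()` and `json.dumps()` also accept formatting options, for example `indent` for human-readable output and `ensure_ascii=False` to keep non-ASCII characters as they are instead of escaping them. Types that JSON cannot represent, such as sets, raise a `TypeError`. A sketch with made-up sample data:

```python
import json

data = {"apple": "ābols", "numbers": (1, 2)}

# Default: compact, non-ASCII characters escaped, tuple becomes a JSON array
compact = json.dumps(data)
print(compact)   # {"apple": "\u0101bols", "numbers": [1, 2]}

# Indented output that keeps "ā" readable
pretty = json.dumps(data, indent=2, ensure_ascii=False)
print(pretty)

try:
    json.dumps({"tags": {"a", "b"}})   # sets are not JSON-serializable
    set_failed = False
except TypeError:
    set_failed = True
print(set_failed)   # True
```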

#### CSV files

CSV (comma-separated values) files store table-like data as plain text: each line is a row, and the cells in a row are usually separated by commas.

To do this, we will use Python's [csv](https://docs.python.org/3/library/csv.html) library:

In [83]:
import csv

data = [["apple", "ābols"], ["pear", "bumbieris"], ["dog", "suns"], ["white", "balts"], ["black", "melns"]]

In [84]:
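# first we need to open the CSV file for writing
#  - encoding="utf-8" saves characters such as "ā" the same way on every platform
#  - newline="" is recommended by the csv module documentation

with open("data.csv", "w", encoding="utf-8", newline="") as out_file:
    writer = csv.writer(out_file)

    for item in data:
        writer.writerow(item)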

In [85]:
# on Linux / Mac we can see file contents using the "cat" command
!cat data.csv

apple,ābols
pear,bumbieris
dog,suns
white,balts
black,melns


In [86]:
# let's read this file

data = []

with open("data.csv", "r", encoding="utf-8", newline="") as in_file:
    reader = csv.reader(in_file)

    for item in reader:
        print(item)
        data.append(item)

['apple', 'ābols']
['pear', 'bumbieris']
['dog', 'suns']
['white', 'balts']
['black', 'melns']


In [87]:
data

[['apple', 'ābols'],
 ['pear', 'bumbieris'],
 ['dog', 'suns'],
 ['white', 'balts'],
 ['black', 'melns']]
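When a CSV file has a header row, `csv.DictWriter` and `csv.DictReader` let you work with rows as dictionaries keyed by column name. A sketch that writes a made-up `words.csv` with an assumed header (`english`, `latvian`) into a temporary directory and reads it back:

```python
import csv
import tempfile
from pathlib import Path

rows = [{"english": "apple", "latvian": "ābols"},
        {"english": "dog", "latvian": "suns"}]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "words.csv"   # hypothetical file name

    with open(path, "w", encoding="utf-8", newline="") as out_file:
        writer = csv.DictWriter(out_file, fieldnames=["english", "latvian"])
        writer.writeheader()         # first line: english,latvian
        writer.writerows(rows)

    with open(path, "r", encoding="utf-8", newline="") as in_file:
        reader = csv.DictReader(in_file)   # uses the header line as keys
        loaded = [dict(row) for row in reader]

print(loaded)
```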

#### Other file types

Python's standard library also supports many other file formats, including compressed archives:

- [gzip](https://docs.python.org/3/library/gzip.html) archive file support
- [Python's zipfile: Manipulate Your ZIP Files Efficiently](https://realpython.com/python-zipfile/)

Python's support for various archive formats lets you read data directly from archive files without extracting them first. This can be useful when working with large compressed files.
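As an illustration of reading compressed data without extracting it first, `gzip.open()` accepts text modes such as `"wt"` and `"rt"` together with `encoding=`, so a compressed file can be written and read line by line like a plain text file. A sketch using a made-up file `data.csv.gz` in a temporary directory:

```python
import gzip
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.csv.gz"   # hypothetical file name

    # "wt": write text, compressed on the fly
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write("apple,ābols\npear,bumbieris\n")

    # "rt": read text, decompressed on the fly, line by line
    with gzip.open(path, "rt", encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f]

    # The file on disk really is gzip data (it starts with the gzip magic bytes)
    is_gzip = path.read_bytes()[:2] == b"\x1f\x8b"

print(lines)     # ['apple,ābols', 'pear,bumbieris']
print(is_gzip)   # True
```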

There is also Python's [pickle library](https://docs.python.org/3/library/pickle.html) that lets us save more complex or custom Python objects to disk (objects that cannot be saved to JSON files).
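A minimal pickle sketch: unlike JSON, pickle preserves Python-specific types such as tuples and sets. The `pickle` documentation warns that loading pickle data can execute arbitrary code, so only unpickle files you trust. The sample data and the file name `data.pkl` are made up:

```python
import pickle
import tempfile
from pathlib import Path

data = {"bar": ("baz", None, 1.0, 2), "tags": {"a", "b"}}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.pkl"   # hypothetical file name

    # pickle produces bytes, so the file is opened in binary mode
    with open(path, "wb") as f:
        pickle.dump(data, f)

    with open(path, "rb") as f:
        loaded = pickle.load(f)

print(loaded == data)          # True
print(type(loaded["bar"]))     # <class 'tuple'>
```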

## Lesson Summary

In this lesson you learned:
* How to work with directories in Python
* How to work with text and binary files in Python
* How to read and write JSON and CSV files

## Bonus: Dictionary comprehension

Dictionary comprehension gives us a compact way to create dictionaries:
- `{item[0]: item[1] for item in some_list if some_condition}`

In this code snippet, `item[0]` becomes the dictionary key and `item[1]` the corresponding value; the `if some_condition` part is optional.

In [88]:
data

[['apple', 'ābols'],
 ['pear', 'bumbieris'],
 ['dog', 'suns'],
 ['white', 'balts'],
 ['black', 'melns']]

In [89]:
# without dictionary comprehension

new_dict = {}

for key, value in data:
    new_dict[key] = value

new_dict

{'apple': 'ābols',
 'pear': 'bumbieris',
 'dog': 'suns',
 'white': 'balts',
 'black': 'melns'}

In [90]:
# in 1 line using dictionary comprehension

new_dict2 = {key: value for key, value in data}

new_dict2

{'apple': 'ābols',
 'pear': 'bumbieris',
 'dog': 'suns',
 'white': 'balts',
 'black': 'melns'}

In [91]:
new_dict2["dog"]

'suns'
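The examples above do not use the optional `if` part. A sketch that keeps only some entries (the condition on key length is an arbitrary illustration), plus the built-in `dict()`, which builds the same dictionary directly from a list of key/value pairs:

```python
data = [['apple', 'ābols'], ['pear', 'bumbieris'], ['dog', 'suns'],
        ['white', 'balts'], ['black', 'melns']]

# Keep only the entries whose English word is longer than 4 letters
long_words = {key: value for key, value in data if len(key) > 4}
print(long_words)   # {'apple': 'ābols', 'white': 'balts', 'black': 'melns'}

# dict() accepts an iterable of (key, value) pairs
same_dict = dict(data)
print(same_dict == {key: value for key, value in data})   # True
```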

## Additional Resources

### Topic 1 - resources

- [pathlib](https://docs.python.org/3/library/pathlib.html) - Object-oriented filesystem paths
- [Working with files in Python](https://realpython.com/working-with-files-in-python/)

### Topic 2 - resources

- [Reading and writing files](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files) - Python tutorial
- [Reading and writing files](https://automatetheboringstuff.com/2e/chapter9/) - "Automate the boring stuff with Python" book
- [Working with files in Python](https://realpython.com/working-with-files-in-python/)

### Topic 3 - resources

- [Reading binary files in Python](https://www.pythonmorsels.com/reading-binary-files-in-python/#top)
- [gzip — Support for gzip files](https://docs.python.org/3/library/gzip.html)
- [Working With JSON Data in Python](https://realpython.com/python-json/)
- [Python's zipfile: Manipulate Your ZIP Files Efficiently](https://realpython.com/python-zipfile/)
