![LU Logo](https://www.df.lu.lv/fileadmin/user_upload/LU.LV/Apaksvietnes/Fakultates/www.df.lu.lv/Par_mums/Logo/DF_logo/01_DF_logo_LV.png)

# Week 7: File Handling


## Lesson Overview

We will cover the following topics:

* Folder / directory operations: creating, renaming, deleting, listing.
  * Path from pathlib, glob and rglob
* Reading from, appending and writing to text files.
  * Encoding issues
* Binary files
* JSON and CSV files

## Lesson Objectives

To learn how to:
* Work with folders / directories
* Work with files

### Import required libraries

In [None]:
# generally imports go at the top of a notebook
# python version
import sys
print(f"Python version: {sys.version}")

## Topic 1 - Folder / Directory Operations

* [pathlib](https://docs.python.org/3/library/pathlib.html) – Object-oriented filesystem paths
* [os](https://docs.python.org/3/library/os.html) – Various operating system interfaces

In [None]:
# Let's import the required Path object
from pathlib import Path

In [None]:
# Listing contents of the current working directory

# Current directory path
cur_dir = Path(".")

# List directory contents (in arbitrary order)
for item in cur_dir.iterdir():
    print(item)

In [None]:
# Let's sort directory content list

for item in sorted(cur_dir.iterdir()):
    print(item)

In [None]:
# Make a new directory (under the current dir)

new_dir = Path("Test-directory")   # under the current dir by default
new_dir.mkdir(exist_ok=True)       # it's OK if the directory already exists

# Print current dir contents
for item in sorted(cur_dir.iterdir()):
    print(item)

In [None]:
# Rename a directory

import os

os.rename("Test-directory", "Test-directory-2")

# Print current dir contents
for item in sorted(cur_dir.iterdir()):
    print(item)
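
`Path` objects also have their own `rename()` method, so the same operation can be done without `os`. Here is a minimal sketch that runs inside a temporary directory so it does not touch the lesson's files; the names `demo-old` and `demo-new` are made up for illustration:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    old_dir = Path(tmp) / "demo-old"
    old_dir.mkdir()

    # rename() returns a new Path pointing to the renamed directory (Python 3.8+)
    renamed_dir = old_dir.rename(Path(tmp) / "demo-new")

    # check both names while the temporary directory still exists
    old_exists = old_dir.exists()
    new_exists = renamed_dir.exists()

print(old_exists, new_exists)
```

After the rename, the old name no longer exists and the new one does.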

In [None]:
# Delete a directory (it must be empty)

new_dir = Path("Test-directory-2")
new_dir.rmdir()
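
`rmdir()` refuses to delete a directory that still has something in it. One possible way to remove a non-empty directory is `shutil.rmtree()` from the standard library. The sketch below works in a temporary directory, and the names `demo-dir` and `notes.txt` are hypothetical:

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "demo-dir"
    demo_dir.mkdir()
    (demo_dir / "notes.txt").touch()

    try:
        demo_dir.rmdir()          # fails because the directory is not empty
        rmdir_failed = False
    except OSError:
        rmdir_failed = True

    shutil.rmtree(demo_dir)       # removes the directory and everything in it
    still_exists = demo_dir.exists()

print(rmdir_failed, still_exists)
```

Be careful with `shutil.rmtree()`: it deletes the whole directory tree without asking.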

#### Creating some files and a directory for demonstrating filename matching operations.

To create files, you would normally `open` the file in write mode, but here we will use the `touch()` method to create some empty files.

In [None]:
# create a directory and some empty files
#  - exist_ok=True lets us re-run this cell without errors

Path("Test-directory").mkdir(exist_ok=True)
Path("Test-directory/sub_dir").mkdir(exist_ok=True)

Path("Test-directory/file1.docx").touch()
Path("Test-directory/file2.docx").touch()
Path("Test-directory/test1.py").touch()
Path("Test-directory/sub_dir/file1.docx").touch()
Path("Test-directory/sub_dir/file2.csv").touch()
Path("Test-directory/sub_dir/test3.csv").touch()

In [None]:
# Print current dir contents

cur_dir = Path(".")

for item in sorted(cur_dir.iterdir()):
    print(item)

In [None]:
# See help for additional information about Path objects

help(cur_dir)

In [None]:
os.getcwd()

#### Filename pattern matching

We want to search for files according to some filename pattern (e.g. `*.docx` to search for all Microsoft Word files)

In [None]:
# There are no .docx files in the current directory
#  - there will be no matches for this file pattern

matches = cur_dir.glob("*.docx")

for item in sorted(matches):
    print(item)

In [None]:
# Let's tell Python to look for matches one level down, in any immediate subdirectory
#  - there will be matches in the "Test-directory" directory
#  - but Python will not search recursively (in further subdirectories)

matches = cur_dir.glob("*/*.docx")

for item in sorted(matches):
    print(item)

In [None]:
# We want to find matches recursively, in any subdirectory
#  - for this purpose we can use the special "**" directory pattern

matches = cur_dir.glob("**/*.docx")

for item in sorted(matches):
    print(item)

In [None]:
# rglob() is like calling glob() with "**/" added in front of the filename pattern

matches = cur_dir.rglob("*.docx")

for item in sorted(matches):
    print(item)

In [None]:
# We can also "walk" a directory tree using os.walk()

import os

# Walk a directory tree and print directory and file names
for dirpath, dirnames, files in os.walk('.'):
    print(f'Directory: {dirpath}')
    for filename in files:
        print("   ", filename)
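
The items returned by `glob()` and `rglob()` are `Path` objects, so they can be filtered further with methods such as `is_file()` and attributes such as `suffix`. A small sketch, assuming a made-up directory tree created in a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub_dir").mkdir()
    (root / "file1.docx").touch()
    (root / "sub_dir" / "file2.csv").touch()
    (root / "sub_dir" / "test3.csv").touch()

    # rglob("*") yields both files and directories; keep only the .csv files
    csv_names = sorted(
        item.name
        for item in root.rglob("*")
        if item.is_file() and item.suffix == ".csv"
    )

print(csv_names)
```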


#### Topic 1 - mini exercise

Choose a directory and explore its contents using the methods described above.

## Topic 2 - Reading from and writing to text files

This topic will cover reading from, appending and writing to text files.

Use Python's `with` statement to make sure the file is properly closed after opening:

```
with open(filename, "w") as file_object:
    file_object.write("some text")
```

This `with` statement opens the `filename` file for writing (`"w"`), assigns it to the `file_object` variable, executes the code block containing the `write()` method and, after the code block has finished executing, closes the file. 
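
To see that the file really is closed, we can check the `closed` attribute of the file object inside and after the `with` block. This is only an illustrative sketch; the file name `with_demo.txt` is made up and the file lives in a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "with_demo.txt"

    with open(demo_path, "w", encoding="utf-8") as file_object:
        file_object.write("some text")
        closed_inside = file_object.closed   # the file is still open here

    closed_after = file_object.closed        # the file is closed once the block ends

print(closed_inside, closed_after)
```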

We will also use Jupyter's command `%%writefile` to create a file to read.

#### Reading files

In [None]:
%%writefile test_file.txt
first,second,third
1,2,3
4,5,6
7,8,9

In [None]:
# using "with" for opening a file
#  - "r" opens the file for reading (this is also the default mode)

with open("test_file.txt", "r") as file:
    text = file.read()

print(text)

In [None]:
help(open)

In [None]:
# use the "encoding" parameter to specify character encoding (usually "utf-8")

with open("test_file.txt", "r", encoding="utf-8") as file:
    text = file.read()

print(text)
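
Why does the encoding matter? The sketch below writes a Latvian word as UTF-8 and then reads it back with different encodings. The file name `encoding_demo.txt` is hypothetical, and `latin-1` and `ascii` are chosen here only as examples of a wrong encoding:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "encoding_demo.txt"

    with open(demo_path, "w", encoding="utf-8") as file:
        file.write("ābols")

    # the wrong encoding may silently produce wrong characters
    with open(demo_path, "r", encoding="latin-1") as file:
        wrong_text = file.read()

    # a stricter encoding refuses to decode the non-ASCII bytes
    try:
        with open(demo_path, "r", encoding="ascii") as file:
            file.read()
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True

    # the encoding that was used for writing gives the original text back
    with open(demo_path, "r", encoding="utf-8") as file:
        right_text = file.read()

print(repr(wrong_text))
print(ascii_failed)
print(right_text)
```

If you do not pass `encoding`, Python uses a platform-dependent default, so it is safer to always specify it.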

In [None]:
# we can also go through the file line-by-line

with open("test_file.txt", "r", encoding="utf-8") as file:
    for line in file:
        print(line)

In [None]:
# let's get rid of extra newline characters

with open("test_file.txt", "r", encoding="utf-8") as file:
    for line in file:
        line = line.rstrip()
        print(line)

In [None]:
# Path objects can also be used in the open() function

test_file = Path("test_file.txt")

with open(test_file, "r", encoding="utf-8") as file:
    for line in file:
        line = line.rstrip()
        print(line)
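
For small files, `Path` objects also offer the shortcut methods `read_text()` and `write_text()`, which open the file, read or write the whole content and close the file again. A minimal sketch, using a made-up file `path_demo.txt` in a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "path_demo.txt"

    # write_text() opens the file, writes the string and closes the file
    demo_path.write_text("first,second,third\n1,2,3\n", encoding="utf-8")

    # read_text() opens the file, reads everything and closes the file
    content = demo_path.read_text(encoding="utf-8")

lines = content.splitlines()
print(lines)
```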

#### Writing files

In [None]:
# to write to a file (overwriting its contents if the file exists) use the file open mode "w"

text = """
This is another file.
It contains lines of text.
"""

# let's use Path()
write_file_path = Path("write_file.txt")

with open(write_file_path, "w", encoding="utf-8") as write_file:
    write_file.write(text)

In [None]:
# let's check that the text has been written to the file

with open(write_file_path, "r", encoding="utf-8") as file:
    data = file.read()

print(data)

In [None]:
# Files may also be opened in the append mode "a". In this case, new content will
# be appended at the end of the file.

with open(write_file_path, "a", encoding="utf-8") as write_file:
    write_file.write("We are appending text at the end of the file.")
    write_file.write("One more line here.")


In [None]:
# let's check file contents

with open(write_file_path, "r", encoding="utf-8") as file:
    data = file.read()

print(data)

In [None]:
# Lines got merged together. To write them on separate lines, 
# we need to add the newline character "\n" to the end of the line.

with open(write_file_path, "a", encoding="utf-8") as write_file:

    # add the newline character to start on a new line
    write_file.write("\n")
    
    write_file.write("This text should be on a new line.\n")
    write_file.write("One more line here.\n")

In [None]:
with open(write_file_path, "r", encoding="utf-8") as file:
    data = file.read()

print(data)

In [None]:
data

In [None]:
print(repr(data))

In [None]:
# delete the files

os.remove("test_file.txt")

write_file_path.unlink()

## Topic 3 - Reading and writing binary and other types of files

#### Binary files

To open binary files, append "b" to the file open mode.

Binary files do not have an encoding.
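
Text becomes bytes only after it is encoded, and bytes become text only after they are decoded. The short sketch below shows this conversion with `str.encode()` and `bytes.decode()`; the example word is chosen just for illustration:

```python
text = "ābols"

encoded = text.encode("utf-8")      # str -> bytes
decoded = encoded.decode("utf-8")   # bytes -> str

print(encoded)
print(len(text), len(encoded))      # characters vs. bytes
print(decoded == text)
```

The byte count is larger than the character count because "ā" takes more than one byte in UTF-8.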

In [None]:
# create a bytes object
data = b'0123456789abcdef'
print(data)

In [None]:
# write to file

write_binary_path = Path("write_file.bin")

with open(write_binary_path, "wb") as write_file:
    write_file.write(data)

In [None]:
# read the file

with open(write_binary_path, "rb") as read_file:
    data_read = read_file.read()
    print(data_read)

In [None]:
# use seek() to go to a given position in the file

with open(write_binary_path, "rb") as read_file:

    # go to position 8 and read 1 byte
    read_file.seek(8)
    print(read_file.read(1))

    print()

    # go 3 bytes back from the end of the file (whence=2) and read 1 byte
    read_file.seek(-3, 2)
    print(read_file.read(1))
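
The second argument of `seek()` can also be written with the named constants from `os`, and `tell()` reports the current position. The sketch below uses `io.BytesIO`, an in-memory object that behaves like a binary file, so that no file on disk is needed:

```python
import io
import os

# io.BytesIO behaves like a binary file held in memory
buffer = io.BytesIO(b"0123456789abcdef")

buffer.seek(8)                    # same as seek(8, os.SEEK_SET)
byte_at_8 = buffer.read(1)
position_after_read = buffer.tell()

buffer.seek(-3, os.SEEK_END)      # same as seek(-3, 2)
third_from_end = buffer.read(1)

print(byte_at_8, position_after_read, third_from_end)
```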


In [None]:
%%writefile test_file.txt
first,second,third
1,2,3
4,5,6
7,8,9

In [None]:
with open("test_file.txt", "rb") as read_file:
    data_read = read_file.read()
    print(data_read)

In [None]:
help(data_read)

#### JSON files

JSON (JavaScript Object Notation) files let us save Python data hierarchies (dictionaries, lists, ...) to a file and read them back from a file.

https://www.json.org/json-en.html

```
json_object = {
  "key 1": "value 1",
  "key 2": ["value 2", "is", "a", "list"],
  "key 3": {"lists and dictionaries": "can be nested"}
}
```

To do this, we will use Python's [json](https://docs.python.org/3/library/json.html) library:

- json.dump() – save structured data to a JSON file
- json.dumps() – return structured data as a JSON string
- json.load() – read structured data from a JSON file
- json.loads() – read structured data from a JSON string
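
`json.dump()` and `json.dumps()` accept optional parameters that change how the output looks. The sketch below shows two of them, `indent` and `ensure_ascii`; the `word_pairs` dictionary is made up for this example:

```python
import json

word_pairs = {"apple": "ābols", "colours": ["balts", "melns"]}

# default output: a single line, non-ASCII characters are escaped
compact = json.dumps(word_pairs)

# indent=2 adds line breaks and indentation, ensure_ascii=False keeps "ā" as it is
pretty = json.dumps(word_pairs, indent=2, ensure_ascii=False)

print(compact)
print(pretty)
```

Both strings describe the same data, so `json.loads()` returns an equal dictionary for either of them.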

In [None]:
import json

In [None]:
# data to be saved
#  - a list containing a dictionary that contains a tuple

data = [
    'foo', 
    {'bar': ('baz', None, 1.0, 2)}
]

In [None]:
print(data)

In [None]:
data[1]

In [None]:
data[1]["bar"]

In [None]:
# save data to a JSON file

file_path = Path("test_data.json")

with open(file_path, "w", encoding="utf-8") as file_out:
    json.dump(data, file_out)

In [None]:
# let's look at the file that we created

with open(file_path, "r", encoding="utf-8") as file_in:
    for line in file_in:
        print(line)

In [None]:
# load data from a file

with open(file_path, "r", encoding="utf-8") as file_in:
    new_data = json.load(file_in)

In [None]:
new_data

In [None]:
new_data[1]['bar']

---

You can also transform Python data structures to / from JSON strings:

In [None]:
data

In [None]:
json_str = json.dumps(data)
json_str

In [None]:
new_data = json.loads(json_str)
new_data
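
Note that a round trip through JSON does not always give back exactly the same Python object. JSON has no tuple type, so tuples come back as lists. A short sketch with the same kind of data as above (the variable names are chosen for this example only):

```python
import json

original = ["foo", {"bar": ("baz", None, 1.0, 2)}]
restored = json.loads(json.dumps(original))

print(restored)
print(type(original[1]["bar"]).__name__, type(restored[1]["bar"]).__name__)
print(original == restored)
```

The comparison is `False` because a tuple is never equal to a list, even when the items are the same.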

#### CSV files

CSV (comma-separated values) files store table-like data: each line is a row, and the cells in a row are usually separated by commas.

To do this, we will use Python's [csv](https://docs.python.org/3/library/csv.html) library:

In [None]:
import csv

data = [["apple", "ābols"], ["pear", "bumbieris"], ["dog", "suns"], ["white", "balts"], ["black", "melns"]]

In [None]:
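# first we need to open the CSV file for writing
#  - the csv documentation recommends newline="" so that the csv module controls line endings

with open("data.csv", "w", encoding="utf-8", newline="") as out_file: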
    csv_file = csv.writer(out_file, lineterminator="\n")

    for item in data:
        csv_file.writerow(item)

In [None]:
# on Linux / Mac we can see file contents using the "cat" command
# uncomment the line below to see the file contents if you have Linux / Mac
# !cat data.csv

In [None]:
# let's read this file

data = []

with open("data.csv", "r", encoding="utf-8", newline="") as in_file:
    csv_file = csv.reader(in_file)

    for item in csv_file:
        print(item)
        data.append(item)

In [None]:
data
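
When a CSV file has a header row, `csv.DictWriter` and `csv.DictReader` let us work with each row as a dictionary instead of a list. A minimal sketch, assuming a made-up file `words.csv` with the hypothetical column names `english` and `latvian`:

```python
import csv
import tempfile
from pathlib import Path

rows = [
    {"english": "apple", "latvian": "ābols"},
    {"english": "dog", "latvian": "suns"},
]

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "words.csv"

    # newline="" lets the csv module control line endings itself
    with open(demo_path, "w", encoding="utf-8", newline="") as out_file:
        writer = csv.DictWriter(out_file, fieldnames=["english", "latvian"])
        writer.writeheader()
        writer.writerows(rows)

    # DictReader uses the first row of the file as dictionary keys
    with open(demo_path, "r", encoding="utf-8", newline="") as in_file:
        loaded = list(csv.DictReader(in_file))

print(loaded)
```

Keep in mind that the csv module reads every cell as a string, so numbers have to be converted with `int()` or `float()` after reading.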

#### Other file types

Python supports many other file types including archives:

- [gzip](https://docs.python.org/3/library/gzip.html) archive file support
- [Python's zipfile: Manipulate Your ZIP Files Efficiently](https://realpython.com/python-zipfile/)

Python's support for various archive formats allows you to read data directly from archive files without unarchiving them first. This can be useful when working with large (archived) files.
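
As an illustration, `gzip.open()` can be used much like the built-in `open()`. The sketch below writes a small compressed text file and reads it back line by line; the file name `demo.txt.gz` is made up and the file is created in a temporary directory:

```python
import gzip
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    archive_path = Path(tmp) / "demo.txt.gz"

    # "wt" and "rt" open the compressed file in text mode
    with gzip.open(archive_path, "wt", encoding="utf-8") as gz_file:
        gz_file.write("first,second,third\n1,2,3\n")

    # the data is decompressed on the fly while we read
    with gzip.open(archive_path, "rt", encoding="utf-8") as gz_file:
        lines = [line.rstrip() for line in gz_file]

print(lines)
```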

There is also Python's [pickle library](https://docs.python.org/3/library/pickle.html) that allows us to save custom or more complex Python objects (that cannot be saved to JSON files) to disk.
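
A small sketch of a pickle round trip, using `pickle.dumps()` and `pickle.loads()` on a made-up dictionary. It contains a set and a tuple, which JSON does not preserve:

```python
import pickle

# JSON cannot store a set and turns a tuple into a list; pickle keeps both
original = {"tags": {"fruit", "food"}, "pair": ("apple", "ābols")}

pickled = pickle.dumps(original)      # bytes, so files must be opened with "wb" / "rb"
restored = pickle.loads(pickled)

print(type(pickled).__name__)
print(restored == original)
```

Only unpickle data that you trust: loading a pickle from an unknown source can execute arbitrary code.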

## Lesson Summary

In this lesson you learned:
* How to work with directories in Python
* How to work with text and binary files in Python

## Bonus: Dictionary comprehension

Dictionary comprehension gives us a compact way to create dictionaries:
- `{item[0]: item[1] for item in some_list if some_condition}`

In this code snippet, `item[0]` becomes the dictionary key and `item[1]` becomes the corresponding value.

In [None]:
data = [["apple", "ābols"], ["pear", "bumbieris"], ["dog", "suns"], ["white", "balts"], ["black", "melns"]]

In [None]:
# without dictionary comprehension

new_dict = {}

for key, value in data:
    new_dict[key] = value

new_dict

In [None]:
# in 1 line using dictionary comprehension

new_dict2 = {key: value for key, value in data}

new_dict2

In [None]:
new_dict2["dog"]
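
The optional `if` part of a dictionary comprehension filters the items, and the key and value expressions can be swapped or changed freely. A short sketch with a made-up `word_pairs` list:

```python
word_pairs = [["apple", "ābols"], ["pear", "bumbieris"], ["dog", "suns"]]

# keep only the pairs whose English word has more than 3 letters
long_words = {key: value for key, value in word_pairs if len(key) > 3}

# swap keys and values to translate in the other direction
reversed_dict = {value: key for key, value in word_pairs}

print(long_words)
print(reversed_dict["suns"])
```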

## Additional Resources

### Topic 1 - resources

- [pathlib](https://docs.python.org/3/library/pathlib.html) - Object-oriented filesystem paths
- [Working with files in Python](https://realpython.com/working-with-files-in-python/)

### Topic 2 - resources

- [Reading and writing files](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files) - Python tutorial
- [Reading and writing files](https://automatetheboringstuff.com/2e/chapter9/) - "Automate the boring stuff with Python" book
- [Working with files in Python](https://realpython.com/working-with-files-in-python/)

### Topic 3 - resources

- [Reading binary files in Python](https://www.pythonmorsels.com/reading-binary-files-in-python/#top)
- [gzip — Support for gzip files](https://docs.python.org/3/library/gzip.html)
- [Working With JSON Data in Python](https://realpython.com/python-json/)
- [Python's zipfile: Manipulate Your ZIP Files Efficiently](https://realpython.com/python-zipfile/)
