# Organizing Files

## Introduction

This notebook is about organizing files and folders: you will learn how to copy, rename and delete files, and how to create and extract ZIP archives.

*[Chapter 8](https://automatetheboringstuff.com/2e/chapter8) in the book is about input validation, but it's heavily based on the [PyInputPlus](https://pyinputplus.readthedocs.io/en/latest/) library, which was written by the book author and [isn't used widely](https://github.com/asweigart/pyinputplus/network/dependents?dependent_type=PACKAGE) outside of the book. Thus, we are skipping the topic in this course.*

**Instead, this lab covers [chapter 10](https://automatetheboringstuff.com/2e/chapter10/) from the book.**

### Optional resources

You can find more information about the modules used in this lab in the Python documentation:

- [`shutil` — High-level file operations](https://docs.python.org/3/library/shutil.html)
- [`os` — Miscellaneous operating system interfaces](https://docs.python.org/3/library/os.html)
- [`zipfile` — Work with ZIP archives](https://docs.python.org/3/library/zipfile.html)

For more in-depth coverage of these modules, see PyMOTW (Python Module of the Week):

- [`shutil` — High-level File Operations](https://pymotw.com/3/shutil/index.html)
- [`os` — Portable access to operating system specific features](https://pymotw.com/3/os/index.html)
- [`zipfile` — ZIP Archive Access](https://pymotw.com/3/zipfile/index.html)

We are also going to use the third-party `send2trash` library - check [its README file](https://github.com/arsenetar/send2trash) for more information.

Relevant Real Python tutorials:

- [Working With Files in Python](https://realpython.com/working-with-files-in-python/) (also contains some material on the prior lab)

You might also find the `pathlib`-related material from the last lab useful:

- [Python docs: `pathlib` — Object-oriented filesystem paths](https://docs.python.org/3/library/pathlib.html)
- [Real Python: Python 3's pathlib Module: Taming the File System](https://realpython.com/python-pathlib/)
- [PyMOTW: pathlib — Filesystem Paths as Objects](https://pymotw.com/3/pathlib/index.html)

## Warnings

**Please read this**, so you don't end up like this:

![XKCD 293: RTFM](https://imgs.xkcd.com/comics/rtfm.png)

*([XKCD 293: RTFM](https://xkcd.com/293/); a [man page](https://en.wikipedia.org/wiki/Man_page) is the usual way to get a command's documentation on Linux/Unix systems)*


**In this lab, you are going to delete files permanently.** Please be careful *before* running your code, to avoid accidentally deleting e.g. this file or other labs. If in doubt, consider running the `!submit` command first, or downloading the file via the right-click menu in the file explorer at the left.

If something goes wrong and you did run `!submit` or download the lab beforehand, it will be easy to restore things: either upload the files via the upload arrow in the file browser, or contact us if you need a backup from the submission repository. If you didn't do so and something goes wrong, there's likely **nothing we can do**.

Some additional hints to avoid interfering with the tests and the submission process:

- **Do not use `os.chdir()`** in your solutions, as this will break the `!submit` command and is generally seen as bad practice.
- Only define your function in the cell provided for it, without calling it. If you want to try out your function by calling it, do so **in the separate cells** provided for that. This avoids interfering with the tests, which run the cells containing your functions.
- In previous labs, we provided sample calls. This time, it's your job to "play" with your code to ensure it works.

## Summary

### Libraries used

```python
# import the shutil module to copy, move, rename and delete folders and their contents
import shutil

# import the os module to permanently delete a single file or a single empty folder and to walk a directory tree.
import os

# import the zipfile module to read, create, extract from and add to zip files
import zipfile

# import the class "Path" to simplify path declarations. Path.home() represents the user's home directory.
from pathlib import Path
home = Path.home()

# import the third-party module send2trash to send files and folders to your computer's trash or recycle bin.
import send2trash
```

If a function (e.g. in `os` or `shutil`) requires a path as a parameter, it usually accepts:

- Either a `Path` object (e.g. `home / "test.txt"`, where `home = Path.home()`)
  - Keep in mind `from pathlib import Path` will be necessary
- Or a string (`"/home/harry/test.txt"`)
  - On Windows, keep in mind you'd have to use double backslashes: `"C:\\Users\\Username\\test.txt"`
  - ...or use a raw string: `r"C:\Users\Username\test_folder"`
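
As a small sketch of this (the file name `test.txt` and the temporary folder are made up for illustration), the same `os` function can be called with a `Path` object or with the equivalent string:

```python
import os
import tempfile
from pathlib import Path

# Work in a throwaway folder so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    path_obj = Path(tmp) / "test.txt"   # a Path object
    path_str = str(path_obj)            # the same location as a plain string
    path_obj.write_text("hello")

    # Both forms are accepted and refer to the same file.
    size_from_path = os.path.getsize(path_obj)
    size_from_str = os.path.getsize(path_str)

print(size_from_path, size_from_str)
```

Both calls report the same size, since they point at the same file.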

### Copying, Moving and Renaming Files and Folders

#### Copying Files and Folders

`shutil.copy(source, destination)` will copy the file at the source path to the folder at the destination path. If the destination path is a filename, it will be used as the new name of the copied file. This returns the path of the copied file.

`shutil.copytree(source, destination)` will copy an entire folder and all of its contents to the destination path. By default, the destination folder must not exist yet. This function returns the path of the copied folder.

**Attention:** If the destination folder already contains a file with the same name as `source`, it will be overwritten. <br>
**Note:** There is no distinction between non-existent folders and filenames. For example, if you want to copy a `plants.txt` file to `/home/harry/notes` but no such folder exists, the copy will end up as a file named `notes` in `/home/harry`.
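
The following sketch tries out these cases in a temporary folder, so that nothing in your real home directory is touched. The names `backup`, `herbs.txt` and `backup_copy` are invented for this example:

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    source = base / "plants.txt"
    source.write_text("mandrake")
    (base / "backup").mkdir()

    # Destination is an existing folder: the copy keeps the original name.
    copied_into_folder = Path(shutil.copy(source, base / "backup"))

    # Destination is a filename: the copy gets that name.
    copied_renamed = Path(shutil.copy(source, base / "backup" / "herbs.txt"))

    # Destination "notes" does not exist: the copy becomes a *file* called "notes".
    copied_to_missing = Path(shutil.copy(source, base / "notes"))
    notes_is_file = copied_to_missing.is_file()

    # copytree() copies a folder including everything inside it.
    copied_tree = Path(shutil.copytree(base / "backup", base / "backup_copy"))
    tree_contents = sorted(p.name for p in copied_tree.iterdir())

print(copied_into_folder.name, copied_renamed.name, notes_is_file, tree_contents)
```

The return values are wrapped in `Path(...)` here because `shutil.copy()` may hand back either a string or a `Path`, depending on what was passed in.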

#### Moving and Renaming Files

`shutil.move(source, destination)` will move the file or folder at the source path to the destination path. If the destination is an existing folder, the source is moved into that folder and keeps its name. Otherwise, the destination is used as the new path, which is also how you rename a file within the same folder.

**Attention:** If the destination is the path of an existing file, that file will be overwritten. If the destination is an existing folder that already contains an entry with the same name as `source`, `shutil.move()` raises an error instead. <br>
**Note:** There is no distinction between non-existent folders and filenames. For example, if you want to move a `plants.txt` file to `/home/harry/notes` but no such folder exists, the `plants.txt` file will end up as a file named `notes` in `/home/harry`.
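
A minimal sketch of renaming and moving, again inside a temporary folder (the file and folder names are only examples, and a reasonably recent Python 3 is assumed):

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "notes").mkdir()
    old_path = base / "plants.txt"
    old_path.write_text("mandrake")

    # Same folder, new filename: this is a rename.
    renamed = Path(shutil.move(old_path, base / "herbs.txt"))

    # Destination is an existing folder: the file is moved into it.
    moved = Path(shutil.move(renamed, base / "notes"))

    old_exists = old_path.exists()
    moved_relative = moved.relative_to(base).as_posix()

print(old_exists, moved_relative)
```

After both calls, the original `plants.txt` no longer exists and the file lives inside the `notes` folder under its new name.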

### Deleting Files and Folders

#### Permanently Deleting Files and Folders

`os.unlink(path)` / `Path.unlink()` will permanently delete the single file at the given path.

`os.rmdir(path)` / `Path.rmdir()` will permanently delete the single empty folder at the given path.

`shutil.rmtree(path)` will permanently delete the folder at the given path, including all of its contents.

These functions can be used for single files and folders, or within a `for` loop to delete many files at once.

**Attention:** To avoid accidental deletions, comment out the deleting calls first and test with `print()` lines. Uncomment them once the result is as expected.
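
One way to follow this advice is sketched below: first collect and print the candidates, then delete them in a second step. The file names are invented, and everything happens in a temporary folder:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    for name in ["a.txt", "b.txt", "keep.png"]:
        (base / name).write_text("")

    # Step 1 (dry run): only print what would be deleted.
    candidates = sorted(base.glob("*.txt"))
    for path in candidates:
        print("Would delete:", path.name)

    # Step 2: once the printed list looks right, delete for real.
    for path in candidates:
        path.unlink()

    remaining = sorted(p.name for p in base.iterdir())

print("Remaining:", remaining)
```

Keeping the "what to delete" step separate from the "delete it" step makes it easy to inspect the list before anything is lost.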

#### Safely Deleting Files and Folders

`send2trash.send2trash(path)` will send the file or folder specified in path to your computer's trash or recycle bin.

**Note:** As the file is still in the recycle bin, this operation doesn't free up disk space.

If working locally, send2trash needs to be installed, e.g. by running `pip install send2trash` in a terminal. The [pip command](https://pip.pypa.io/en/latest/) is the usual way to install third-party packages from [PyPI](https://pypi.org/), the "Python Package Index".

### Walking a Directory Tree

In the last lab, you've already seen how to iterate through the files in a directory using `Path.iterdir()`. But sometimes you need to access the entire content of a folder, including all subfolders.

To do this, Python provides the `os.walk(path)` function. Note, however, that its usage is a bit confusing, as it produces three values for each folder (foldername, subfolders and filenames). When using `Path` objects, consider using [Path.rglob()](https://docs.python.org/3/library/pathlib.html#pathlib.Path.rglob) with `*` as pattern instead, e.g. `for path in Path.home().rglob('*'):`.
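
To compare the two approaches, here is a small sketch that walks the same (made-up) folder tree once with `os.walk()` and once with `Path.rglob()`:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "potions").mkdir()
    (base / "potions" / "polyjuice.txt").write_text("")
    (base / "spells.txt").write_text("")

    # os.walk() yields one (foldername, subfolders, filenames) tuple per folder.
    walked_files = []
    for foldername, subfolders, filenames in os.walk(base):
        for filename in filenames:
            full_path = Path(foldername) / filename
            walked_files.append(full_path.relative_to(base).as_posix())

    # Path.rglob("*") yields every file *and* folder below base as a Path.
    globbed = [p.relative_to(base).as_posix() for p in base.rglob("*")]

walked_files.sort()
globbed.sort()
print(walked_files)
print(globbed)
```

Note that the `rglob("*")` result also contains the `potions` folder itself, while the `os.walk()` loop above only collects files. Use `path.is_file()` or `path.is_dir()` to filter the `rglob()` results if needed.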

### Working with the Zipfile Module 

A zip file packages several files and folders into a single file, usually compressing them to reduce its size. This makes the contents easier to send over the internet.

#### Creating a Zip File

`with zipfile.ZipFile('path_of_new.zip', 'w', compression=zipfile.ZIP_DEFLATED) as new_zip:` opens a zipfile object in write mode. Similar to when using `open` for normal files, use a `with` statement to automatically finish and close the zip file after utilizing `write()`.

Using `compression`, we can set the compression mode to [deflate](https://en.wikipedia.org/wiki/DEFLATE) for all files, so that we don't need to specify `compress_type` for each individual file (as shown in the book). The default `compression` is "store", which means that files aren't compressed at all. Compression typically makes the ZIP files smaller than the data stored in them, by storing repeated data in clever ways.
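
The difference between "store" and "deflate" can be made visible with a short experiment. This is only a sketch: the file `repetitive.txt` and its contents are invented, and real-world savings depend heavily on the data:

```python
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    data_file = base / "repetitive.txt"
    data_file.write_text("polyjuice " * 1000)  # highly repetitive, compresses well

    sizes = {}
    for label, method in [("stored", zipfile.ZIP_STORED), ("deflated", zipfile.ZIP_DEFLATED)]:
        zip_path = base / f"{label}.zip"
        with zipfile.ZipFile(zip_path, "w", compression=method) as archive:
            archive.write(data_file, arcname=data_file.name)
            info = archive.getinfo(data_file.name)
        # (original size, size inside the zip)
        sizes[label] = (info.file_size, info.compress_size)

print(sizes)
```

With `ZIP_STORED`, both numbers are identical, while `ZIP_DEFLATED` needs far fewer bytes for this repetitive text.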

`new_zip.write(file, arcname)` creates a new file in the zip, with the data from the given `file`. Using `arcname`, we can control how the file will be organized inside the zip archive. For example, if we added `Path.home() / "school" / "potions" / "polyjuice.xlsx"` to the archive without specifying `arcname`, unzipping the archive would create this folder structure:

```
home
└── harry
    └── school
        └── potions
            └── polyjuice.xlsx
```

which is usually not what we want. Normally, we might only want to have e.g. the `potions` folder inside the zip - to do so, we'd do something like:

```python
zip_root_path = Path.home() / "school"   # All paths in the zip should be relative to this directory
polyjuice_path = zip_root_path / "potions" / "polyjuice.xlsx"

with zipfile.ZipFile("files_for_hermione.zip", "w", compression=zipfile.ZIP_DEFLATED) as hermione_zip:
    hermione_zip.write(polyjuice_path, arcname=polyjuice_path.relative_to(zip_root_path))
```

**Attention:** Setting the mode to `w` will erase the contents of the specified zip file, if it already exists. If you want to add files to an existing zip file, use `zipfile.ZipFile('path_of_existing.zip', 'a')`.
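
The effect of the two modes can be checked with a short sketch. It uses `writestr()`, which writes a string directly into the archive, so that no source files are needed. The entry names are made up:

```python
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp) / "files_for_hermione.zip"

    # "w" creates the zip (or replaces an existing one).
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("first.txt", "one")

    # "a" keeps the existing entries and adds a new one.
    with zipfile.ZipFile(zip_path, "a", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("second.txt", "two")
        names_after_append = archive.namelist()

    # "w" again: the previous entries are gone.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("third.txt", "three")
        names_after_overwrite = archive.namelist()

print(names_after_append)
print(names_after_overwrite)
```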

#### Reading a Zip File

Using the default `'r'` file mode for `ZipFile`, we can get information about files inside an existing zip:

```python
import zipfile
from pathlib import Path

zip_path = Path.home() / "existing.zip"

# create a ZipFile object using the path of an existing zip file
with zipfile.ZipFile(zip_path) as zip_example:
    # List the names (as strings) of all files and folders contained in 'existing.zip'
    print(zip_example.namelist())

    # With .getinfo(), we can get a ZipInfo object, which holds information about a single file in the zip file
    # (e.g. file_size or compress_size)
    info = zip_example.getinfo("file.txt")
    print(f"file.txt got compressed from {info.file_size} to {info.compress_size}")
```

#### Extracting a Zip File

```python
import zipfile
from pathlib import Path

zip_path = Path.home() / "existing.zip"
output_path = Path.home() / "extracted"
output_path.mkdir(exist_ok=True)  # don't fail if the folder already exists

# Opening the zip and creating a ZipFile object
with zipfile.ZipFile(zip_path) as zip_example:

    # extract all the contents of the zip file into the current working directory.
    zip_example.extractall()

    # to extract somewhere else, pass the path of the target folder as an argument.
    zip_example.extractall(output_path)

    # extract a single file from the zip into the current working directory.
    zip_example.extract("test.txt")

    # extract a single file from the zip and put it in the specified folder.
    zip_example.extract("test.txt", output_path)
```

## Exercises

### Exercise 1: Preparation

Write a function `create_files(dest_path)` that creates the following folder structure, so that you can check later whether your other functions work the way they should.

The function receives the root path (`exercises`) as an argument and should create the following tree of files:

```bash
exercises
├── images
│   ├── img_0.png
│   ├── img_1.png
│   ├── img_2.png
│   ├── ...etc...
│   ├── img_8.png
│   └── img_9.png
└── notes
    ├── empty_folder_1
    ├── empty_folder_2
    ├── folder_3
    │   └── text_file.txt
    ├── text_file_1.txt
    ├── text_file_2.txt
    ├── text_file_3.txt
    └── text_file_4.txt
```

In [6]:
import pathlib
import shutil


def create_files(dest_path):
    if dest_path.exists():
        shutil.rmtree(dest_path)
    dest_path.mkdir()

    # Create images folder and add image files
    images_path = dest_path / 'images'
    images_path.mkdir(parents=True, exist_ok=True)
    for i in range(10):
        (images_path / f'img_{i}.png').touch()  # create an empty file without leaving it open
    
    # Create notes folder and add files and folders
    notes_path = dest_path / 'notes'
    notes_path.mkdir(parents=True, exist_ok=True)
    (notes_path / 'empty_folder_1').mkdir(parents=True, exist_ok=True)
    (notes_path / 'empty_folder_2').mkdir(parents=True, exist_ok=True)
    folder_3_path = notes_path / 'folder_3'
    folder_3_path.mkdir(parents=True, exist_ok=True)
    (folder_3_path / 'text_file.txt').touch()
    for i in range(1, 5):
        (notes_path / f'text_file_{i}.txt').touch()

Use this separate cell to run your code.
You **shouldn't change it** (as the given folder is deleted!), but your code needs to work with other folders too.

We pass the `exercises` path to it, so the folder will be created next to this notebook (since it's a relative path, it's based on the current working directory). In later exercises, you will re-use the `exercises_path` variable defined in this cell.

In [8]:
exercises_path = pathlib.Path("exercises")
create_files(exercises_path)

### Exercise 2: Zipping Files

Write a function called `create_archive(destination_zip, source_files, append)` that zips all the files and subfolders of a given directory. The function has the following arguments:

* `destination_zip`: path (including zip filename) to the destination zip.
* `source_files`: path to the folder whose contents should be zipped.
* `append`: A boolean. If `True`, appends the provided `source_files` to the given zip. If `False`, an existing .zip file is overwritten. Default value is `False`.

Additional requirements:

* The zip should *not* contain the full paths to the files, but the relative path to `source_files` (see example)
* Return value: A list of all the files and folders in the zip, as strings, with the path as stored in the zip file. Hint: Check the documentation on zipping.

You can use the files and folder structure used in the previous exercise to test the expected behavior of your function.

Expected behavior (the exact order of the files can change, which is why we use `sorted(...)`):

```python
# dest: path to the zip file to create
# notes_source, images_source: paths to the source, e.g. exercises/notes

>>> sorted(create_archive(dest, notes_source))
[
    'empty_folder_1/',
    'empty_folder_2/',
    'folder_3/',
    'folder_3/text_file.txt',
    'text_file_1.txt',
    'text_file_2.txt',
    'text_file_3.txt',
    'text_file_4.txt'
]

>>> sorted(create_archive(dest, images_source, append=True))
[
    'empty_folder_1/',
    'empty_folder_2/',
    'folder_3/',
    'folder_3/text_file.txt',
    'img_0.png',
    'img_1.png',
    'img_2.png',
    'img_3.png',
    'img_4.png',
    'img_5.png',
    'img_6.png',
    'img_7.png',
    'img_8.png',
    'img_9.png',
    'text_file_1.txt',
    'text_file_2.txt',
    'text_file_3.txt',
    'text_file_4.txt'
]
```

In [1]:
import zipfile


def create_archive(destination_zip, source_files, append=False):
    # 'a' adds to an existing zip, 'w' creates a new one (replacing any existing file)
    mode = 'a' if append else 'w'
    try:
        with zipfile.ZipFile(destination_zip, mode, compression=zipfile.ZIP_DEFLATED) as archive:
            add_files_to_zip(archive, source_files, source_files)
            # namelist() returns the paths exactly as they are stored in the zip
            return archive.namelist()
    except Exception:
        return []


def add_files_to_zip(archive, directory, base_path):
    # Recursively add every file and folder below `directory`,
    # storing each one under its path relative to `base_path`
    for item in directory.iterdir():
        if item.is_dir():
            archive.write(item, str(item.relative_to(base_path)))
            add_files_to_zip(archive, item, base_path)
        elif item.is_file():
            archive.write(item, str(item.relative_to(base_path)))

Use these separate cells to try out your code.
Your code should work with the example below, but you're free to change it.

In [9]:
dest_path = exercises_path / "ex2.zip"
notes_source = exercises_path / "notes"

sorted(create_archive(dest_path, notes_source))

['empty_folder_1/',
 'empty_folder_2/',
 'folder_3/',
 'folder_3/text_file.txt',
 'text_file_1.txt',
 'text_file_2.txt',
 'text_file_3.txt',
 'text_file_4.txt']

In [10]:
images_source = exercises_path / "images"

sorted(create_archive(dest_path, images_source, append=True))

['empty_folder_1/',
 'empty_folder_2/',
 'folder_3/',
 'folder_3/text_file.txt',
 'img_0.png',
 'img_1.png',
 'img_2.png',
 'img_3.png',
 'img_4.png',
 'img_5.png',
 'img_6.png',
 'img_7.png',
 'img_8.png',
 'img_9.png',
 'text_file_1.txt',
 'text_file_2.txt',
 'text_file_3.txt',
 'text_file_4.txt']

### Exercise 3: Selected Extraction

Write a function called `extract_selected(archive_path, dest_path, file_extension)` that does the following:
* Extracts only selected items from a zipped archive to a given destination.
* Takes the following arguments:
    - `archive_path`: Filepath to the zip.
    - `dest_path`: Filepath to where the contents should be extracted.
    - `file_extension`: A string. The type of files that should be extracted, for example `txt` or `png`.
    - You can assume that `archive_path` and `dest_path` already exist, and that `file_extension` will not be an empty string.
* Returns a list of all the extracted files, as strings, including the folder given in `dest_path` (see example).
    - Hint: The return value of `.extract(...)` will be in the proper format.

Expected output (the exact order of the files is not relevant):

```python
# archive_path: A Path object like exercises/ex2.zip
# dest_path: A Path object like exercises/ex3/
>>> extract_selected(archive_path, dest_path, "png")
[
    'exercises/ex3/img_0.png',
    'exercises/ex3/img_1.png',
    'exercises/ex3/img_2.png',
    'exercises/ex3/img_3.png',
    'exercises/ex3/img_4.png',
    'exercises/ex3/img_5.png',
    'exercises/ex3/img_6.png',
    'exercises/ex3/img_7.png',
    'exercises/ex3/img_8.png',
    'exercises/ex3/img_9.png',
]
# The dest_path folder now contains the png-files
```

In [3]:
import zipfile
import os


def extract_selected(archive_path, dest_path, file_extension):
    wanted = file_extension.lower()
    try:
        files = []
        with zipfile.ZipFile(archive_path, 'r') as archive:
            for member in archive.infolist():
                # Compare the part after the last dot, ignoring case
                if member.filename.lower().split(".")[-1] == wanted:
                    archive.extract(member, dest_path)
                    files.append(os.path.join(dest_path, member.filename))
        return files
    except Exception:
        return []

Use this separate cell to try out your code.
Your code should work with the example below, but you're free to change it.

In [11]:
archive_path = exercises_path / "ex2.zip"  # reusing the file we created above
dest_path = exercises_path / "ex3"
dest_path.mkdir(exist_ok=True)

extract_selected(archive_path, dest_path, "png")

['exercises/ex3/img_2.png',
 'exercises/ex3/img_1.png',
 'exercises/ex3/img_3.png',
 'exercises/ex3/img_7.png',
 'exercises/ex3/img_0.png',
 'exercises/ex3/img_9.png',
 'exercises/ex3/img_6.png',
 'exercises/ex3/img_8.png',
 'exercises/ex3/img_4.png',
 'exercises/ex3/img_5.png']

### Exercise 4: Deleting Files

Implement these two functions:

First function: `delete_files(rootdir, extension)`
* Sends all files of a given extension to the computer's recycle bin.
* `rootdir` is a `Path` object and the base directory to start from.
* `extension` is a string, for example `txt`.
* Return value:
    - A list of the files that were moved to the recycle bin, an empty list if nothing was moved.
    - Every file is a string, with the file path **relative to the given `rootdir`** (see example)


Second function: `delete_empty_folders(rootdir)`
* Permanently deletes all empty folders.
* `rootdir` is a `Path` object and the base directory to start from.
* Return value:
    - A list of the deleted folders, an empty list if nothing was deleted.
    - Every folder is a string, with its path **relative to the given `rootdir`** (see example)
* You don't need to implement logic that deletes folders recursively, i.e. you do not have to consider the case `rootdir/subfolder/emptyfolder`.


If you execute the functions on the sample folder structure, only the following files should be left:

* `ex2.zip`
* The `notes` folder, with three empty folders in it

Expected output (the exact order of the files is not relevant):

```python
>>> delete_files(exercises_path, "txt")
[
    'notes/text_file_4.txt',
    'notes/text_file_2.txt',
    'notes/text_file_3.txt',
    'notes/text_file_1.txt',
    'notes/folder_3/text_file.txt',
]

>>> delete_files(exercises_path, "png")
[
    'images/img_2.png',
    'images/img_4.png',
    'images/img_5.png',
    'images/img_0.png',
    'images/img_8.png',
    'images/img_9.png',
    'images/img_3.png',
    'images/img_7.png',
    'images/img_6.png',
    'images/img_1.png',
    'ex3/img_2.png',
    'ex3/img_4.png',
    'ex3/img_5.png',
    'ex3/img_0.png',
    'ex3/img_8.png',
    'ex3/img_9.png',
    'ex3/img_3.png',
    'ex3/img_7.png',
    'ex3/img_6.png',
    'ex3/img_1.png',
]

>>> delete_empty_folders(exercises_path)
['images', 'ex3']
```

In [12]:
import os

import send2trash


def delete_files(rootdir, extension):
    deleted = []

    # os.walk() visits rootdir and every folder below it
    for root, _dirs, files in os.walk(rootdir):
        for file in files:
            if file.endswith("." + extension):
                file_path = os.path.join(root, file)
                send2trash.send2trash(file_path)
                deleted.append(os.path.relpath(file_path, rootdir))

    return deleted


def delete_empty_folders(rootdir):
    deleted = []
    # topdown=False visits the deepest folders first
    for root, dirs, _files in os.walk(rootdir, topdown=False):
        for directory in dirs:
            folder_path = os.path.join(root, directory)
            if not os.listdir(folder_path):
                os.rmdir(folder_path)
                deleted.append(os.path.relpath(folder_path, rootdir))

    return deleted

Use these separate cells to try out your code.
Your code should work with the example below, but you're free to change it.

In [None]:
delete_files(exercises_path, "txt")

In [None]:
delete_files(exercises_path, "png")

In [None]:
delete_empty_folders(exercises_path)

# Feedback form

We'd like to get some feedback for this lab! To give us feedback, double-click the cells below and edit them in the appropriate places:

- Replace `[ ]` by `[x]` to cross checkboxes. They should look like this once you finish editing:
  * [ ] uncrossed
  * [x] crossed
- Add additional text where indicated (optional)

**Difficulty:**

The difficulty of the materials in this lab was:

- [ ] Much too difficult
- [ ] A little too difficult
- [ ] Just right
- [ ] A little too easy
- [ ] Much too easy

**Time:**

For one block (usually multiple labs), you should spend around 4h at home and 4h in the course. There are two labs in this block, so we'd expect you to spend a total of **around 4h on this one (both reading and solving)**.

For the materials in *this lab*, do you think you spent:

- [ ] Much more time
- [ ] A little more time
- [ ] About the scheduled amount of time
- [ ] A little less time
- [ ] Much less time

**Any topics you found especially enjoyable or difficult in this lab?**

<!-- Write below this line -->

**Anything else you'd like to tell us?**

<!-- Write below this line -->

# Submit

First, **save this file** (no grey dot should be visible in the tab above). Then, run the cell below to submit your work and see the results. You can submit as often as you like.

In case of problems:
- *Don't panic!*
- If you're in a course, show the error to your instructor.
- If the **tests failed** and you suspect an issue in the tests:
    * Mail your instructor, Cc `florian.bruhin@ost.ch` (if instructor != florian)
    * **No attachments** necessary.
- If the **submission failed** (error message, etc.):
    * Mail your instructor, Cc `florian.bruhin@ost.ch` (if instructor != florian)
    * Attach a screenshot of the issue
    * Attach the notebook (File > Download).

In [14]:
!submit organizing-files.ipynb

Last change: 2 seconds ago

Testing...
╭─────────────────────────────── creating files ───────────────────────────────╮
│ 100% passed                                                                  │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─────────────────────────────── deleting files ───────────────────────────────╮
│ 100% passed                                                                  │
╰──────────────────────────────────────────────────────────────────────────────╯
╭──────────────────────────── selected extraction ─────────────────────────────╮
│ 100% passed                                                                  │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─────────────────────────────── zipping files ────────────────────────────────╮
│ 100% passed                                                       