# Exercises for Introduction to Python for Data Science

Week 05 - Files and Object-Oriented Programming

Matthias Feurer and Andreas Bender  
2027-03-05

# Exercise 1

Write a function called `replace_all` that takes as arguments a pattern
string, a replacement string, and two filenames. It should read the
first file and write the contents into the second file (creating it if
necessary). If the pattern string appears anywhere in the contents, it
should be replaced with the replacement string.

Here’s an outline of the function to get you started.

``` python
def replace_all(old, new, source_path, dest_path):
    # read the contents of the source file, open the file using a context manager

    # replace the old string with the new
    
    # write the result into the destination file
```

To test your function, read the file `photos/notes.txt`, replace
`'photos'` with `'images'`, and write the result to the file
`photos/new_notes.txt`. You can obtain `photos/notes.txt` from the zip
file available at
<https://github.com/AllenDowney/ThinkPython/blob/v3/photos.zip>.
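
If you get stuck, the following is one possible shape of the function, as a
minimal sketch rather than a reference solution. It assumes the file is
UTF-8 text and small enough to be read in one go. The demo writes a
throwaway file into a temporary directory, so it does not depend on the
real `photos/notes.txt`.

``` python
import os
import tempfile


def replace_all(old, new, source_path, dest_path):
    # Read the whole source file; the context manager closes it for us.
    # Assumption: UTF-8 text that fits comfortably in memory.
    with open(source_path, encoding="utf-8") as source:
        contents = source.read()

    # str.replace returns a new string and changes nothing if `old` is absent.
    contents = contents.replace(old, new)

    # Mode "w" creates the destination file or overwrites an existing one.
    with open(dest_path, "w", encoding="utf-8") as dest:
        dest.write(contents)


# Demo on a throwaway file (not the real photos/notes.txt).
with tempfile.TemporaryDirectory() as tmp:
    source_path = os.path.join(tmp, "notes.txt")
    dest_path = os.path.join(tmp, "new_notes.txt")
    with open(source_path, "w", encoding="utf-8") as f:
        f.write("All photos are in the photos directory.\n")

    replace_all("photos", "images", source_path, dest_path)

    with open(dest_path, encoding="utf-8") as f:
        result = f.read()

print(result)
```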

# Exercise 2

In a large collection of files, there may be more than one copy of the
same file, stored in different directories or with different file names.
The goal of this exercise is to search for duplicates. As an example,
we’ll work with image files in the `photos` directory.

Here’s how it will work:

-   We’ll use the `walk` function to search this directory for files
    that end with one of the extensions in `config['extensions']` (hint:
    have a look at the config given in the lecture).
-   For each file, we’ll use `md5_digest` from the section “Checking for
    equivalent files” to compute a digest of the contents.
-   Using a dict, we’ll make a mapping from each digest to a list of
    paths with that digest.
-   Finally, we’ll search the dict for any digests that map to multiple
    files.
-   If we find any, we’ll use the function `same_contents` to confirm
    that the files contain the same data.
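
The steps above rely on two helpers, `md5_digest` and `same_contents`. If
you do not have them at hand, the following stand-ins may help. They are
assumptions about how the helpers behave, written with `hashlib`, and are
not necessarily identical to the original versions. Both read each file
completely into memory, which is fine for small photos.

``` python
import hashlib
import os
import tempfile


def md5_digest(path):
    """Return the MD5 hex digest of a file's bytes (hypothetical stand-in)."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def same_contents(path1, path2):
    """Return True if both files contain exactly the same bytes (stand-in)."""
    with open(path1, "rb") as f1, open(path2, "rb") as f2:
        return f1.read() == f2.read()


# Demo with three tiny fake "images" in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "a.jpg")
    copy = os.path.join(tmp, "b.jpg")
    other = os.path.join(tmp, "c.jpg")
    for path, data in [(first, b"pixels"), (copy, b"pixels"), (other, b"other")]:
        with open(path, "wb") as f:
            f.write(data)

    digests_match = md5_digest(first) == md5_digest(copy)
    digests_differ = md5_digest(first) != md5_digest(other)
    copies_equal = same_contents(first, copy)

print(digests_match, digests_differ, copies_equal)
```

Equal digests make identical contents very likely but do not prove it,
which is why the final byte-by-byte check is part of the plan.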

Here are some suggestions on which functions to write first. In a second
step, these can be brought together to solve the task above.

1.  To identify image files, write a function called `is_image` that
    takes a path and a list of file extensions, and returns `True` if
    the path ends with one of the extensions in the list. Hint: Use
    `os.path.splitext`. Also: How can this be solved using `pathlib`?
2.  Write a function called `add_path` that takes as arguments a path
    and a dict. It should use `md5_digest` to compute a digest of the
    file contents. Then it should update the dict, either creating a new
    item that maps from the digest to a list containing the path, or
    appending the path to the list if it exists. Hint: can you use a
    specialized version of dict?
3.  Write a function called `walk_images` that takes a dict and a
    directory and uses a `walk` function to walk through the files in
    the directory and its subdirectories. For each file, it should use
    `is_image` to check whether it’s an image file and `add_path` to add
    it to the dict.
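
The two hints in this list can be sketched as follows. The sketch assumes
that the extensions are given without a leading dot (for example
`['jpg', 'jpeg']`) and that the comparison should ignore upper and lower
case; adjust both if your config differs. The digest strings in the
second half are placeholders, not real MD5 values.

``` python
import os
from collections import defaultdict
from pathlib import Path


def is_image(path, extensions):
    """Variant 1: os.path.splitext splits 'a/b.jpg' into ('a/b', '.jpg')."""
    _, ext = os.path.splitext(path)
    # Assumption: extensions are listed without the dot, so strip it here.
    return ext.lstrip(".").lower() in extensions


def is_image_pathlib(path, extensions):
    """Variant 2: Path.suffix yields the same '.jpg' part."""
    return Path(path).suffix.lstrip(".").lower() in extensions


extensions = ["jpg", "jpeg"]
print(is_image("photos/jan-2023/photo1.jpg", extensions))
print(is_image_pathlib("photos/notes.txt", extensions))

# The "specialized dict": defaultdict(list) creates an empty list the first
# time a key is used, so no "if digest in mapping" check is needed.
mapping = defaultdict(list)
mapping["digest-1"].append("photos/a.jpg")      # placeholder digest
mapping["digest-1"].append("photos/sub/b.jpg")  # same digest, second path
mapping["digest-2"].append("photos/c.jpg")
print(dict(mapping))
```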

When everything is working, you can use the following program to search
the `photos` directory, add the paths to the dict `mapping`, and then
check whether there are multiple files with the same digest.

``` python
mapping = {}  # or defaultdict(list), depending on your add_path
walk_images(mapping, 'photos')

for digest, paths in mapping.items():
    if len(paths) > 1:
        print(paths)
```

You should find one pair of files that have the same digest. Use
`same_contents` to check whether they contain the same data.

Bonus: if you are eager, you can try to work directly on the zip file
(without extracting it) using the
[zipfile](https://docs.python.org/3/library/zipfile.html) module.
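
As a starting point for this bonus, here is a minimal sketch that computes
digests for the members of a zip archive without extracting them. The
function name `digests_in_zip` is hypothetical, and extensions are again
assumed to be given without a leading dot. The demo builds a small archive
in memory, so it does not need `photos.zip`.

``` python
import hashlib
import io
import zipfile
from collections import defaultdict


def digests_in_zip(zip_file, extensions):
    """Map MD5 digest -> list of member names, reading members in memory."""
    suffixes = tuple("." + ext for ext in extensions)
    mapping = defaultdict(list)
    with zipfile.ZipFile(zip_file) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries
            if not info.filename.lower().endswith(suffixes):
                continue  # skip non-image members
            data = archive.read(info)  # bytes of one member, nothing is extracted
            mapping[hashlib.md5(data).hexdigest()].append(info.filename)
    return mapping


# Build a tiny archive in memory for the demo.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("photos/a.jpg", b"pixels")
    archive.writestr("photos/sub/b.jpg", b"pixels")
    archive.writestr("photos/c.jpg", b"other")
    archive.writestr("photos/notes.txt", b"pixels")

duplicates = [
    names for names in digests_in_zip(buffer, ["jpg"]).values() if len(names) > 1
]
print(duplicates)
```

`ZipFile` accepts either a path or a file-like object, so the same function
should also work when called with the path to the downloaded archive.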

Bonus 2: save the data structure `mapping` using the three storage
formats discussed in the lecture: YAML, JSON and pickle. Open each file
with the text editor and check its readability.
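
For JSON and pickle, which are part of the standard library, a round trip
might look like the sketch below. The contents of `mapping` are
placeholders. YAML is left out here because it needs the third-party
PyYAML package; its usage is analogous.

``` python
import json
import os
import pickle
import tempfile

# Placeholder data; in the exercise this comes from walk_images.
mapping = {
    "digest-1": ["photos/a.jpg", "photos/sub/b.jpg"],
    "digest-2": ["photos/c.jpg"],
}

with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, "mapping.json")
    pickle_path = os.path.join(tmp, "mapping.pkl")

    # JSON is a text format: open in text mode.
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(mapping, f, indent=2)

    # pickle is a binary format: open in binary mode.
    with open(pickle_path, "wb") as f:
        pickle.dump(mapping, f)

    # Read both files back to check the round trip.
    with open(json_path, encoding="utf-8") as f:
        from_json = json.load(f)
    with open(pickle_path, "rb") as f:
        from_pickle = pickle.load(f)

print(from_json == mapping, from_pickle == mapping)
```

Only load pickle files that you created yourself or otherwise trust, since
unpickling can execute arbitrary code.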

# Exercise 3

Write a function called `subtract_time` that takes two `Time` objects
and returns the interval between them in seconds – assuming that they
are two times during the same day.
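
One way to approach this is to convert both times to seconds since
midnight and subtract. The sketch below defines its own minimal `Time`
class, `make_time`, and `time_to_int`, which are assumptions about what
the lecture versions look like. It also assumes that the result is the
first time minus the second, so it is negative if the first time is
earlier.

``` python
class Time:
    """Represents a time of day (hypothetical stand-in for the lecture class)."""


def make_time(hour, minute, second):
    """Create a Time object and assign its attributes."""
    time = Time()
    time.hour = hour
    time.minute = minute
    time.second = second
    return time


def time_to_int(time):
    """Return the number of seconds since midnight."""
    minutes = time.hour * 60 + time.minute
    return minutes * 60 + time.second


def subtract_time(t1, t2):
    """Return t1 - t2 in seconds, assuming both are on the same day."""
    return time_to_int(t1) - time_to_int(t2)


# 11:12:00 minus 09:40:00 is 1 h 32 min, i.e. 92 * 60 seconds.
interval = subtract_time(make_time(11, 12, 0), make_time(9, 40, 0))
print(interval)
```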

# Exercise 4

Write a function called `is_after` that takes two `Time` objects and
returns `True` if the first time is later in the day than the second,
and `False` otherwise.

``` python
def is_after(t1, t2):
    """Checks whether `t1` is after `t2`.
    
    >>> is_after(make_time(3, 2, 1), make_time(3, 2, 0))
    True
    >>> is_after(make_time(3, 2, 1), make_time(3, 2, 1))
    False
    >>> is_after(make_time(11, 12, 0), make_time(9, 40, 0))
    True
    """
    return None
```

# Exercise 5

Here’s a definition for a `Date` class that represents a date – that is,
a year, month, and day of the month.

``` python
class Date:
    """Represents a year, month, and day"""
```

1.  Write a function called `make_date` that takes `year`, `month`, and
    `day` as parameters, makes a `Date` object, assigns the parameters
    to attributes, and returns the new object. Create an object that
    represents June 22, 1933.
2.  Write a function called `print_date` that takes a `Date` object,
    uses an f-string to format the attributes, and prints the result. If
    you test it with the `Date` you created, the result should be
    `1933-06-22`.
3.  Write a function called `is_after` that takes two `Date` objects as
    parameters and returns `True` if the first comes after the second.
    Create a second object that represents September 17, 1933, and check
    whether it comes after the first object.

Hint: You might find it useful to write a function called
`date_to_tuple` that takes a `Date` object and returns a tuple that
contains its attributes in year, month, day order.
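
The hint works because Python compares tuples element by element, from
left to right. The short sketch below shows this with plain tuples, along
with the f-string format specification `02d`, which pads a number with a
leading zero to two digits. The variable names are only for illustration.

``` python
# Tuples in (year, month, day) order compare like dates.
earlier = (1933, 6, 22)
later = (1933, 9, 17)
comes_after = later > earlier  # years are equal, so the months decide
print(comes_after)

# Zero-padded formatting with an f-string.
year, month, day = earlier
formatted = f"{year}-{month:02d}-{day:02d}"
print(formatted)
```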

# Exercise 6

In Exercise 5, you worked with a `Date` class and wrote several
functions that operate on `Date` objects. Now let’s practice rewriting
those functions as methods.

1.  Write a definition for a `Date` class that represents a date – that
    is, a year, month, and day of the month.
2.  Write an `__init__` method that takes `year`, `month`, and `day` as
    parameters and assigns the parameters to attributes. Create an
    object that represents June 22, 1933.
3.  Write a `__str__` method that uses an f-string to format the
    attributes and returns the result. If you test it with the `Date`
    you created, the result should be `1933-06-22`.
4.  Write a method called `is_after` that compares two `Date` objects,
    `self` and another `Date` passed as an argument, and returns `True`
    if the first (`self`) comes after the second. Create a second
    object that represents September 17, 1933, and check whether it
    comes after the first object.

Hint: You might find it useful to write a method called `to_tuple` that
returns a tuple that contains the attributes of a `Date` object in
year-month-day order.
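
To compare your result against something, here is one possible sketch of
the class. It is not the only reasonable design. In particular, it
assumes that `is_after` is called as `later.is_after(earlier)` and that
month and day are stored as integers.

``` python
class Date:
    """Represents a year, month, and day."""

    def __init__(self, year, month, day):
        self.year = year
        self.month = month
        self.day = day

    def __str__(self):
        # 02d pads month and day with a leading zero to two digits.
        return f"{self.year}-{self.month:02d}-{self.day:02d}"

    def to_tuple(self):
        """Return the attributes in year-month-day order."""
        return self.year, self.month, self.day

    def is_after(self, other):
        """Return True if this date comes after `other`."""
        return self.to_tuple() > other.to_tuple()


birthday = Date(1933, 6, 22)
later = Date(1933, 9, 17)
print(birthday)                  # print calls __str__ behind the scenes
print(later.is_after(birthday))
```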