# Class Design Exercise


<small><a href="https://colab.research.google.com/github/brandeis-jdelfino/cosi-10a/blob/main/lectures/notebooks/12_class_design.ipynb">Link to interactive slides on Google Colab</a></small>

# Exercise

Create a program to model a library network:
* These libraries deal in books only - no need to model magazines, video, etc.
* There are multiple library branches, each with their own book inventory.
* Library patrons have accounts in the library network, and can have books out on loan.
* Loans have due dates.
* Data on books, branches, and patrons will be loaded from files.

Our program should be able to:
* List the branches and their info.
* Print out all the available and/or checked out books from a branch.
* Print out the books a patron has checked out.
* Provide very simple text search (substring matching) over book titles.

Today we'll focus mostly on how we can model the data with classes and data structures, and less on the outer scaffold of a program like this.

# File structure

We have 3 files:
* `branches.csv`
  * A comma-delimited file that contains one row per branch, with 3 string fields: `id`, `name`, and `address`.
* `books.json`
  * A JSON file with a list of dictionaries. Each dictionary represents a unique title. They have the following fields:
    * `id` (str)
    * `title` (str)
    * `description` (str)
    * `copies`
      * A list of dictionaries, each with four keys: `copy_id`, `branch_id`, `due_date`, and `patron_id`. `due_date` and `patron_id` are `null` in the file (loaded as `None` in Python) if the copy is not checked out.
* `patrons.json`
  * A JSON file that contains a list of dictionaries. Each dictionary represents a patron. They have the following fields:
    * `id` (str)
    * `name` (str)
    * `checked_out_copy_ids` (list of str)
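
To make these shapes concrete, here is a minimal sketch of how records like these might look once parsed. Every value below (ids, titles, names, addresses, and the date format) is made up for illustration, and the real files may differ. The `title` key follows the code later in these slides.

```python
import csv
import io
import json

# Hypothetical sample contents; every value here is invented for illustration.
branches_csv = "b1,Main Branch,1 Main St\nb2,North Branch,22 Oak Ave\n"

books_json = """
[
  {
    "id": "book1",
    "title": "Example Title",
    "description": "An example description.",
    "copies": [
      {"copy_id": "c1", "branch_id": "b1", "due_date": null, "patron_id": null},
      {"copy_id": "c2", "branch_id": "b2", "due_date": "2023-11-01", "patron_id": "p1"}
    ]
  }
]
"""

patrons_json = '[{"id": "p1", "name": "Ada", "checked_out_copy_ids": ["c2"]}]'

# csv.reader yields each row as a list of strings.
branch_rows = list(csv.reader(io.StringIO(branches_csv)))

# json.loads turns JSON null into Python None.
books = json.loads(books_json)
patrons = json.loads(patrons_json)

print(branch_rows[0])
print(books[0]["copies"][0]["due_date"])
print(patrons[0]["checked_out_copy_ids"])
```

Note that JSON itself has no `None`: an empty value is written as `null` in the file and only becomes `None` after Python parses it.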

# Where do we start??

This is a big problem, and we can't tackle it all at once.

Two main strategies:
* "Bottom up"
  * Start by defining small classes/functions for basic operations we think we'll need, then combine them into the larger, more complicated user-facing operations.
  * e.g. Define classes and some methods for `Branch`, `Book`, and `Patron`, then implement functions that do each of the larger operations above using the class methods.
* "Top down"
  * Start by writing the main operations, noting and creating classes and methods as we need them. 
  * e.g. Write a function that prints out a patron's checked out books, and discover that we need a Patron class, and a way to get a book by copy_id. Then implement those methods.
  
Neither strategy is necessarily better, and you don't need to follow either one dogmatically - they just give you a place to start.

I prefer "top down" in this case, because the exact set of operations we'll need isn't clear. With "bottom up" we risk defining a bunch of classes / methods we don't need.
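
As a rough sketch of what "top down" can look like in practice: write the user-facing operation first, and leave stubs for whatever it turns out to need. The helper names here (`get_patron`, `get_book_title_for_copy`) are hypothetical placeholders, not the final design.

```python
# Top-down sketch: write the operation first, and let it tell us what we need.
# Every helper name below is hypothetical and only exists to show the workflow.

def get_patron(patron_id):
    # Discovered need #1: a way to look up a patron by id. Stubbed for now.
    raise NotImplementedError("write this once we know how patrons are stored")


def get_book_title_for_copy(copy_id):
    # Discovered need #2: a way to get from a copy_id to its book's title.
    raise NotImplementedError("write this once we know how books are stored")


def list_checkouts_for_patron(patron_id):
    # The operation we actually care about, written against the stubs above.
    patron = get_patron(patron_id)
    for copy_id in patron["checked_out_copy_ids"]:
        print(get_book_title_for_copy(copy_id))
```

Calling `list_checkouts_for_patron` at this stage raises `NotImplementedError`, which is the point: the stubs are a to-do list generated by the operation itself.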

# Listing branches

This is the simplest operation - load `branches.csv` into a list, and then print each branch.

In [None]:
import csv

def load_branches():
    branches = []
    with open('../../data/library/branches.csv') as f:
        reader = csv.reader(f, delimiter=',')
        for line in reader:
            branches.append(line)
    return branches

def list_branches():
    print("Listing all branches...")    
    branches = load_branches()
    for branch in branches:
        print(f"Branch name: {branch[1]}, address: {branch[2]}")

## Add some structure

If this were the only operation we needed to support, this code would be fine. However, we have a few other operations that will deal with branches as well. It makes sense to add some structure to this data by introducing a `Branch` class.

In [None]:
class Branch:
    def __init__(self, branch_id, name, address):
        self.id = branch_id
        self.name = name
        self.address = address

    def __str__(self):
        return f"{self.name} (id: {self.id}); Address: \"{self.address}\""
    
def load_branches():
    branches = []
    with open('../../data/library/branches.csv') as f:
        reader = csv.reader(f, delimiter=',')
        for line in reader:
            branches.append(Branch(line[0], line[1], line[2]))
    return branches

def list_branches():
    print("Listing all branches...")
    branches = load_branches()
    for branch in branches:
        print(branch)

## Side note: the `__str__` method

If you add a method with the signature `__str__(self)` to a class, and make it return a string, then Python will use it whenever an instance of your class is converted to a string (for example by `print()`, `str()`, or an f-string)!
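
Here is a small sketch of that behavior. `Point` is a hypothetical class used only for this illustration; it is not part of the library program.

```python
class Point:
    # Hypothetical example class, not part of the library program.
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return f"Point({self.x}, {self.y})"


p = Point(1, 2)
print(p)            # print() converts p to a string, which calls p.__str__()
text = str(p)       # explicit conversion uses __str__ as well
label = f"at {p}"   # so does formatting inside an f-string
```

Without a `__str__` method, Python falls back to a default representation that shows the class name and a memory address, which is rarely what you want to print.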

# Listing books for a branch


This one makes us confront trickier data modeling questions. "Books" in this exercise have metadata, but also a list of copies, each of which can belong to a different branch and be checked out separately.

We need to list all books from a branch, with an optional filter on book availability.

Do we: 
* Load a list of all books, and iterate over them to find books from the branch we care about?
* Build a list of book copies for a branch when we load the data, and store them on a `Branch` object? 

The first option is simpler for now, so let's write a function that does this.

In [None]:
import json
def load_books():
    with open('../../data/library/books.json', 'r') as f:
        return json.load(f)

def list_books_for_branch(branch_id, available_only=True):
    print(f"Listing books for {branch_id=}")
    books = load_books()
    for book in books:
        for copy in book['copies']:
            if copy['branch_id'] == branch_id:
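                # Skip checked-out copies when only available books are wanted.
                if available_only and (copy['due_date'] is not None):
                    continue

                # Print each title once, even if the branch has several copies.
                print(book['title'])
                break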

## Good enough?

This is ok. But what if we need to print the branch name, instead of the id?

In [None]:
import json
import csv

class Branch:
    def __init__(self, branch_id, name, address):
        self.id = branch_id
        self.name = name
        self.address = address

    def __str__(self):
        return f"{self.name} (id: {self.id}); Address: \"{self.address}\""
    
    
def load_branches():
    branches = []
    with open('../../data/library/branches.csv') as f:
        reader = csv.reader(f, delimiter=',')
        for line in reader:
            branches.append(Branch(line[0], line[1], line[2]))
    return branches


def load_books():
    with open('../../data/library/books.json', 'r') as f:
        return json.load(f)

    
def list_books_for_branch(branch_id, available_only=True):
    books = load_books()
    branches = load_branches()

    branch = None
    for b in branches:
        if b.id == branch_id:
            branch = b
            break
            
    print(f"Listing books for {branch.name}")
    
    for book in books:
        for copy in book['copies']:
            if copy['branch_id'] == branch_id:
                if available_only and (copy['due_date'] is not None):
                    continue

                print(book['title'])
                break

## Can we do better?

This works! But we can improve it.

1. Looking up a branch by `branch_id` seems like an operation we'll need again. Let's apply functional decomposition.
2. Storing the branches in a list makes finding a branch inefficient - let's use a dictionary from `branch_id` -> `Branch` instance.

We're going to **refactor** - change the way our code is structured without changing its behavior.

We'll introduce a `BranchCollection` class to hold all our branches, and provide easy lookup of `Branch` instance by `branch_id`:

In [None]:
class Branch:
    def __init__(self, branch_id, name, address):
        self.id = branch_id
        self.name = name
        self.address = address

    def __str__(self):
        return f"{self.name} (id: {self.id}); Address: \"{self.address}\""

    
class BranchCollection:
    def __init__(self):
        self.branches = {}
        
    def load_from_file(self, filename):
        with open(filename) as f:
            reader = csv.reader(f, delimiter=',')
            for line in reader:
                self.branches[line[0]] = Branch(line[0], line[1], line[2])
                
    def list_branches(self):
        print("Listing all branches...")
        for branch in self.branches.values():
            print(branch)
    
    def get_branch(self, branch_id):
        return self.branches.get(branch_id)
    
    
def list_books_for_branch(branch_id, available_only=True):
    books = load_books()
    branches = BranchCollection()
    branches.load_from_file('../../data/library/branches.csv')
    
    branch = branches.get_branch(branch_id)
    print(f"Listing books for {branch.name}")
    
    for book in books:
        for copy in book['copies']:
            if copy['branch_id'] == branch_id:
                if available_only and (copy['due_date'] is not None):
                    continue

                print(book['title'])
                break

## Any suggestions?

How does this feel? What else could we refactor?

At this point in the problem, it's still quite hard to say whether one design is "better" than another. Let's move on to the next operation, rather than risk over-designing things we won't need.

# Listing books checked out by a patron

Now we need to load the third file, containing the patron data. We'll now have three related data collections - Branches, Books, and Patrons. It seems like a good idea to create a class to hold this relationship.

Let's refactor again. We'll repurpose `BranchCollection` into a `LibraryNetwork`, and make it hold all of our data.

In [None]:
class LibraryNetwork:
    def __init__(self):
        self.branches = {}
        self.books = {}
        self.patrons = {}
        
    def _load_branches(self, filename):
        with open(filename) as f:
            reader = csv.reader(f, delimiter=',')
            for line in reader:
                self.branches[line[0]] = Branch(line[0], line[1], line[2])

    def _load_books(self, filename):
        with open(filename, 'r') as f:
            book_list = json.load(f)
            for book in book_list:
                self.books[book['id']] = book
    
    def _load_patrons(self, filename):
        with open(filename, 'r') as f:
            patron_list = json.load(f)
            for patron in patron_list:
                self.patrons[patron['id']] = patron

    def load_from_files(self, branches_filename, books_filename, patrons_filename):
        self._load_branches(branches_filename)
        self._load_books(books_filename)
        self._load_patrons(patrons_filename)

            
    def get_branch(self, branch_id):
        return self.branches.get(branch_id)

    def list_branches(self):
        print("Listing all branches...")
        for branch in self.branches.values():
            print(branch)
    
    def list_books_for_branch(self, branch_id, available_only=True):
        branch = self.get_branch(branch_id)
        print(f"Listing books for {branch.name}")

        for book in self.books.values():
            for copy in book['copies']:
                if copy['branch_id'] == branch_id:
                    if available_only and (copy['due_date'] is not None):
                        continue

                    print(book['title'])
                    break

## What are those leading underscores? 

We added `_load_branches`, `_load_books`, and `_load_patrons` - why did I name them that way?

The leading underscore indicates they are "private". To understand what that means, let's first talk about **interfaces** and **encapsulation**.

# Interfaces

One of the powerful things about classes (and modules! and packages!) is that they provide an **interface**: a set of methods and/or data structures that can be used to interact with the class (or module! or package!). 

A good interface is:
1. Cohesive
1. Easy to understand
1. Easy to use
1. Hard to misuse

Designing a great interface - one that is hard to misuse - is harder than it sounds.

# Encapsulation

An interface provides **encapsulation**. This means we've hidden the details of an operation behind an interface. You, or someone else, can use the interface without needing to worry about the details of the implementation. 

This is an important and powerful strategy to keep code structured and understandable as a codebase grows larger.
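
A minimal sketch of the idea, using two hypothetical classes (`ListBranchDirectory` and `DictBranchDirectory`, invented for this example) that store their data differently but expose the same interface. The calling code works with either one, because it never touches the internals.

```python
class ListBranchDirectory:
    # Hypothetical class: keeps (branch_id, name) pairs in a list.
    def __init__(self):
        self._rows = []

    def add(self, branch_id, name):
        self._rows.append((branch_id, name))

    def name_of(self, branch_id):
        for row_id, name in self._rows:
            if row_id == branch_id:
                return name
        return None


class DictBranchDirectory:
    # Same interface, different internals: a dictionary keyed by branch id.
    def __init__(self):
        self._names = {}

    def add(self, branch_id, name):
        self._names[branch_id] = name

    def name_of(self, branch_id):
        return self._names.get(branch_id)


def describe(directory):
    # The caller only uses add() and name_of(), so it works with either class.
    directory.add("b1", "Main Branch")
    return directory.name_of("b1")


print(describe(ListBranchDirectory()))
print(describe(DictBranchDirectory()))
```

Because `describe` depends only on the interface, we could swap one implementation for the other without changing any calling code.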

## Privacy

In Python it is a **convention** that methods that start with underscores (`_`) are considered **private**.

**Private**, in the context of classes, means a data structure or method that is only meant to be accessed from within the class. 
* Making things "private" makes them easier to change later without needing to update every usage of the class.

**Convention** means that there's no enforcement of this at the language level. You can happily name and use functions with leading underscores however you want. 
* But, if other people are using your code, you will confuse them if you expect them to use functions that start with underscores. 
* Conversely, if you are using someone else's classes, avoid accessing private data structures or methods - they are likely to change without warning in future versions of the class, or be hard to use correctly.

Other languages (e.g. Java) have language-level enforcement of privacy - you can declare that a method is "private", and the language will prevent you from accessing it from outside the class.
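
To see that the underscore really is only a convention in Python, here is a short sketch with a hypothetical `Counter` class (not part of the library program):

```python
class Counter:
    # Hypothetical example class.
    def __init__(self):
        self._count = 0   # "private" by convention only

    def increment(self):
        self._count += 1

    def value(self):
        return self._count


c = Counter()
c.increment()
print(c.value())   # intended usage, through the public interface

# Python does not stop outside code from touching a "private" attribute.
# This runs without error, but it bypasses the interface and may break
# if the class later changes how it stores its count.
c._count = 100
print(c.value())
```

Nothing fails here, which is exactly why the convention matters: the underscore is a signal to other programmers, not a lock.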

# Back to the library...

Now we'll add our `list_checkouts_for_patron` method. 

Note that there is a distinction between a "book" and a "copy of a book". Patrons check out copies of books, but the book metadata (title, description) is specified for the book. In order to print out a book title and a due date, we need information about both the book and the copy.

In [None]:
class LibraryNetwork:
    # ...
    def get_patron(self, patron_id):
        return self.patrons.get(patron_id)
    
    def list_checkouts_for_patron(self, patron_id):
        patron = self.get_patron(patron_id)
        print(f"Checkouts for {patron['name']}:")
        for copy_id in patron['checked_out_copy_ids']:
            bc, book = self.get_copy(copy_id)
            print(f"{book['title']} (id: {book['id']}), due: {bc['due_date']}")

    def get_copy(self, copy_id):
        for book in self.books.values():
            for bc in book['copies']:
                if bc['copy_id'] == copy_id:
                    return bc, book
        return None, None

# Last operation: simple text search

Let's add this last function, then we'll look at our whole program. This one is relatively straightforward.

In [None]:
class LibraryNetwork:
    # ...
    
    def find_matching_books(self, substring):
        print(f"Books with \"{substring}\" in the title: ")
        for book in self.books.values():
            if substring.lower() in book['title'].lower():
                print(book['title'])

# The whole thing

[repl.it link](https://replit.com/@cosi-10a-fall23/Library#main.py)

In [None]:
import csv
import json


class Branch:

    def __init__(self, branch_id, name, address):
        self.id = branch_id
        self.name = name
        self.address = address

    def __str__(self):
        return f"{self.name} (id: {self.id}); Address: \"{self.address}\""


class LibraryNetwork:

    def __init__(self):
        self.branches = {}
        self.books = {}
        self.patrons = {}

    def _load_branches(self, filename):
        with open(filename) as f:
            reader = csv.reader(f, delimiter=',')
            for line in reader:
                self.branches[line[0]] = Branch(line[0], line[1], line[2])

    def _load_books(self, filename):
        with open(filename, 'r') as f:
            book_list = json.load(f)
            for book in book_list:
                self.books[book['id']] = book

    def _load_patrons(self, filename):
        with open(filename, 'r') as f:
            patron_list = json.load(f)
            for patron in patron_list:
                self.patrons[patron['id']] = patron

    def load_from_files(self, branches_filename, books_filename,
                                            patrons_filename):
        self._load_branches(branches_filename)
        self._load_books(books_filename)
        self._load_patrons(patrons_filename)

    def get_branch(self, branch_id):
        return self.branches.get(branch_id)

    def get_patron(self, patron_id):
        return self.patrons.get(patron_id)

    def get_copy(self, copy_id):
        for book in self.books.values():
            for bc in book['copies']:
                if bc['copy_id'] == copy_id:
                    return bc, book
        return None, None

    def list_branches(self):
        print("Listing all branches...")
        for branch in self.branches.values():
            print(str(branch))

    def list_books_for_branch(self, branch_id, available_only=True):
        branch = self.get_branch(branch_id)
        print(f"Listing books for {branch.name}")

        for book in self.books.values():
            for copy in book['copies']:
                if copy['branch_id'] == branch_id:
                    if available_only and (copy['due_date'] is not None):
                        continue

                    print(book['title'])
                    break

    def list_checkouts_for_patron(self, patron_id):
        patron = self.get_patron(patron_id)
        print(f"Checkouts for {patron['name']}:")
        for copy_id in patron['checked_out_copy_ids']:
            bc, book = self.get_copy(copy_id)
            print(f"{book['title']} (id: {book['id']}), due: {bc['due_date']}")

    def find_matching_books(self, substring):
        print(f"Books with \"{substring}\" in the title: ")
        for book in self.books.values():
            if substring.lower() in book['title'].lower():
                print(book['title'])

# What do you think now?

Do you understand this code? Do you like it? Where could we improve? 

Some things I like:
* We avoided writing a lot of code that we didn't need. 
   * My first instinct with a problem like this is to run off and define a class for every type of object, model the relationships between them, etc. We didn't need most of that.
* The interface is pretty simple, clear, and mostly hard to misuse.
   * Our `LibraryNetwork` can be misused if you create one but forget to load the data. Otherwise, the methods we provided are pretty straightforward.
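
One possible way to close that gap, sketched under assumptions: do the loading in the constructor, so an instance without data cannot exist. `PatronDirectory` below is a hypothetical, simplified stand-in that holds only patrons, and the sample file is made up so the sketch can run on its own.

```python
import json
import os
import tempfile


class PatronDirectory:
    # Hypothetical, simplified stand-in for LibraryNetwork (patrons only).
    def __init__(self, patrons_filename):
        # Loading happens in the constructor, so there is no way to get
        # an instance that has not loaded its data.
        self.patrons = {}
        with open(patrons_filename) as f:
            for patron in json.load(f):
                self.patrons[patron['id']] = patron

    def get_patron(self, patron_id):
        return self.patrons.get(patron_id)


# Write a tiny made-up patrons file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "patrons.json")
    with open(path, "w") as f:
        json.dump([{"id": "p1", "name": "Ada", "checked_out_copy_ids": []}], f)
    directory = PatronDirectory(path)

print(directory.get_patron("p1")["name"])
```

This is a trade-off rather than a clear win: a constructor that reads files is harder to use in tests, so a separate load step can still be a reasonable choice.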

Some things I don't like:

* No `Book` or `Patron` classes - we access the data directly in the dictionaries that are loaded from the JSON.
   * What happens if the format of the JSON changes? If this code is widely used, we'd need to update many places. 
   * **But**: the code is not widely used, and it is unlikely to ever be used again.  
* Listing the checkouts for a patron is very inefficient
   * We look at every copy of every book to find the copy_id we're looking for.  
   * **But**: the scale of data we're dealing with is tiny (in computer terms): hundreds of books and patrons.
* You need the whole `LibraryNetwork` object in order to link between `Branch`, `Book`, and `Patron`.
   * Everything is referenced by ids, and the only way to get to the objects is to look them up by id in `LibraryNetwork`'s data structures. 
   * We could use **references** to link objects together. E.g. a `BookCopy` object could reference a `Book` object, rather than holding a `book_id`. 
   * **But**: we don't need it, the code is small and manageable.
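
If the inefficient copy lookup ever did become a problem, one modest fix (short of a full redesign) would be to build an index from `copy_id` to its copy and book once, after loading. This is only a sketch: `build_copy_index` is a hypothetical helper, and the sample data is invented.

```python
# Hypothetical parsed book data, shaped like the books file described earlier.
books = {
    "book1": {
        "id": "book1",
        "title": "Example Title",
        "copies": [
            {"copy_id": "c1", "branch_id": "b1", "due_date": None, "patron_id": None},
            {"copy_id": "c2", "branch_id": "b2", "due_date": "2023-11-01", "patron_id": "p1"},
        ],
    },
}


def build_copy_index(books):
    # One pass over all copies, done once after loading.
    index = {}
    for book in books.values():
        for bc in book["copies"]:
            index[bc["copy_id"]] = (bc, book)
    return index


copy_index = build_copy_index(books)

# Each lookup is now a single dictionary access instead of a nested loop.
bc, book = copy_index["c2"]
print(f"{book['title']}, due: {bc['due_date']}")
```

The cost is one more data structure to keep in sync if copies are ever added or removed, which is the kind of complexity we chose to avoid here.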

# Keep it simple

Managing complexity is the biggest challenge when writing code. 

> "YAGNI" (You Ain't Gonna Need it) - _Kent Beck_

> "Premature optimization is the root of all evil" - _Donald Knuth_

> "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." - _Brian W. Kernighan_

It can be tempting to over-engineer: to add features, flexibility, or optimizations that you don't actually need.

Don't do it! Keep it simple! Write the code you need to solve the problem in front of you.

But also: writing simple code is harder than it looks, so don't get discouraged if you sometimes (often) end up with a tangled mess. 

The best, most experienced programmers may build **complicated systems**, but they tend to write the **simplest code**.

# An alternative

We discussed more complicated alternatives. Here's a version of the code that uses references to tie objects together. If we needed this code to be more robust, flexible, or efficient, this might be a good approach.

In this code, the loading of data is much more complicated, because all the different objects reference one another.

The benefit of this version is that access to related objects is simpler, and much more efficient.

[repl.it link](https://replit.com/@cosi-10a-fall23/Library-references-version)

In [None]:
import csv
import json


class Branch:
    def __init__(self, branch_id, name, address):
        self.id = branch_id
        self.name = name
        self.address = address
        self.copies = {}

    def register_copy(self, bc):
        ''' Registers a BookCopy with this branch '''
        self.copies[bc.id] = bc

    def __str__(self):
        return f"{self.name} (id: {self.id}); Address: \"{self.address}\""


class Book:

    def __init__(self, book_id, title, description):
        self.id = book_id
        self.title = title
        self.description = description
        self.copies = []

    def make_copy(self, copy_id, branch):
        ''' 
        Creates a BookCopy, and registers it with this book and `branch`

        returns: The new BookCopy
        '''
        bc = BookCopy(copy_id, self, branch)
        branch.register_copy(bc)

        self.copies.append(bc)
        return bc


class BookCopy:

    def __init__(self, copy_id, book, branch):
        self.id = copy_id
        self.book = book
        self.branch = branch
        self.due_date = None
        self.patron = None

    def _register_checkout(self, patron, due_date):
        ''' 
        Registers a checkout of this book.
        Intended to be used from the Patron class only; does not update 
        Patron's list of checked out books.
        '''
        self.patron = patron
        self.due_date = due_date

    def _register_return(self):
        ''' 
        Registers a return of this book.
        Intended to be used from the Patron class only; does not update 
        Patron's list of checked out books.
        '''
        self.patron = None
        self.due_date = None

    def is_available(self):
        ''' Returns True if this book is not checked out '''
        return self.due_date is None


class Patron:

    def __init__(self, patron_id, name):
        self.id = patron_id
        self.name = name
        self.checked_out_copies = {}

    def checkout(self, bc, due_date):
        ''' Checks a copy of a book out '''
        self.checked_out_copies[bc.id] = bc
        bc._register_checkout(self, due_date)

    def return_book(self, bc):
        ''' Marks a copy of a book as returned '''
        del self.checked_out_copies[bc.id]
        bc._register_return()


class LibraryNetwork:

    def __init__(self):
        self.branches = {}
        self.books = {}
        self.copies = {}
        self.patrons = {}

    def _load_branches(self, filename):
        with open(filename) as f:
            reader = csv.reader(f, delimiter=',')
            for line in reader:
                self.branches[line[0]] = Branch(line[0], line[1], line[2])

    def _load_patrons(self, filename):
        with open(filename, 'r') as f:
            patron_list = json.load(f)
            for patron in patron_list:
                self.patrons[patron['id']] = Patron(patron['id'], patron['name'])

    def _load_books(self, filename):
        with open(filename, 'r') as f:
            book_list = json.load(f)

            for book_json in book_list:
                # Steps for loading a single book and its copies:
                # 1. Create the book object
                # 2. For each copy:
                # 2a. Look up its Branch
                # 2b. Make a BookCopy object, linking it to the branch.
                # 2c. If the book is checked out, look up its Patron and 
                #     register the checkout
                # 2d. Record the BookCopy in the Network's list of copies
                # 3. Record the Book in the Network's list of books.
                book = Book(book_json['id'], book_json['title'],
                                        book_json['description'])

                for bc_json in book_json['copies']:
                    branch = self.get_branch(bc_json['branch_id'])
                    bc = book.make_copy(bc_json['copy_id'], branch)

                    if bc_json['patron_id'] is not None:
                        patron = self.get_patron(bc_json['patron_id'])
                        patron.checkout(bc, bc_json['due_date'])
                    self.copies[bc.id] = bc

                self.books[book.id] = book

    def load_from_files(self, branches_filename, books_filename,
                                            patrons_filename):
        print("Load start")
        self._load_branches(branches_filename)
        self._load_patrons(patrons_filename)
        # Books must be loaded after Branches and Patrons - the loading process
        # relies on those objects already existing.
        self._load_books(books_filename)
        print("Load end")

    def get_branch(self, branch_id):
        return self.branches.get(branch_id)

    def get_patron(self, patron_id):
        return self.patrons.get(patron_id)

    def get_copy(self, copy_id):
        return self.copies.get(copy_id)

    def list_branches(self):
        print("Listing all branches...")
        for branch in self.branches.values():
            print(str(branch))

    def list_books_for_branch(self, branch_id, available_only=True):
        branch = self.get_branch(branch_id)
        print(f"Listing books for {branch.name}")

        found_books = {}
        for bc in branch.copies.values():
            if available_only and not bc.is_available():
                continue
            found_books[bc.book.id] = bc.book

        for book in found_books.values():
            print(book.title)

    def list_checkouts_for_patron(self, patron_id):
        patron = self.get_patron(patron_id)
        print(f"Checkouts for {patron.name}:")
        for bc in patron.checked_out_copies.values():
            print(f"{bc.book.title} (id: {bc.id}), due: {bc.due_date}")

    def find_matching_books(self, substring):
        print(f"Books with \"{substring}\" in the title: ")
        for book in self.books.values():
            if substring.lower() in book.title.lower():
                print(book.title)