# Automating Bureaucracy with Python - the Case of the Springfield Bail Fund
## Instructor workbook
(C) 2021, Daniel Guetta

_Many thanks to Drew Feldman (CBS '22) for his invaluable assistance in developing the technical aspects of the case._

This workbook contains instructor instructions for  running the Springfield Bail Fund case.

The case runs in two steps (all specific dates/numbers can be modified in the first section below)
  - First, the instructor generates cases from January 2021 to February 2022 at the Springfield court, and sends out all emails that would have been sent in 2021.
  - The class happens based on those emails.
  - The instructor sends any January 2022 emails, and students carry out the homework based on these new emails
  
This notebook carries out all these steps. To run it, carry out the following steps
  - Modify the "create parameters" section if necessary (the most important change - if you're not teaching this case at Columbia - is to change the `EMAIL_FROM` and `EMAIL_TO` varaiables to reflect the gmail account you've set up)
  - Download client credentials for the email address you're sending emails _from_, and save them in a file called client_credentials.json
  - Run the entire notebook straight through. It will save all emails to a file whose name is specified in the `LEFTOVER_EMAIL_FILE` variable, and send the subject of emails required before class.
  - After class, take the commented-out code in the last cell of this workbook, and run it to read the `LEFTOVER_EMAIL_FILE` and send remaining emails

## Create parameters
This section contains parameters that will determine the 

In [1]:
# The email address from which the eTrack emails will come
EMAIL_FROM = 'cbs.python.courts@gmail.com'

# The email address to which the eTrack emails will go
EMAIL_TO = 'cbs.python.bailfund@gmail.com'

In [2]:
# The initial run of this script will send emails up to a certain date,
# and it will save any remaining emails for sending after class.

# Define the name of the file in which the remaining emails will be
# saved
LEFTOVER_EMAIL_FILE = 'leftover_emails.pickle'

In [3]:
# Location of the image that will appear in the email signatures
SIGNATURE_IMAGE_LINK = 'https://raw.githubusercontent.com/danguetta/bail_fund_case_code/main/Instructor%20code/court-logo.png'

In [4]:
# Average delay - in days - between a case closing and reimbursements
REIMBURSEMENT_DELAY = 12

In [5]:
# Determine the number of cases we will generate in total
N_CASES = 225

In [6]:
import datetime
# The point before which emails will be sent in the FIRST batch and after
# which emails will be sent in the SECOND batch
EMAIL_SPLIT_POINT = datetime.datetime(2022,1,1)

In [7]:
EMAIL_SPLIT_POINT - datetime.timedelta(days=36)

datetime.datetime(2021, 11, 26, 0, 0)

In [8]:
def generate_arrest_date():
    '''
    This function takes no arguments, and simply returns a single date.
    It will be used to generate the arrest date for the cases. It can
    be made as complex as needed to produce interesting case arrival
    patterns
    '''
    
    # Begin with a list of all possible dates an arrest could happen on
    # (In this case, we generate cases for all of 2021 and one month in 
    # 2022)
    START_DATE = datetime.datetime(2021, 9, 1, 0, 0, 0)
    END_DATE = datetime.datetime(2022, 1, 31, 0, 0, 0)
    all_dates = [START_DATE + datetime.timedelta(days=i) for i in range((END_DATE-START_DATE).days)]
    
    # Begin by assuming every day is equally likely
    date_prob = [1]*len(all_dates)
    
    # Go through each day and make some modifications to make things interesting
    for day_n, day in enumerate(all_dates):
        # Create a sinusoidal pattern throughout the year
        date_prob[day_n] *= np.sin(2*np.pi*day_n/365) + 1.5
        
        # Add extra density around Christmas and July 4th
        if np.abs((day - datetime.datetime(2021, 7, 4)).days) < 4 \
                or np.abs((day - datetime.datetime(2021, 12, 25)).days) < 4:
            date_prob[day_n] *= 1.8
        
        # Add a large density 16-24 days before the EMAIL_SPLIT_POINT to
        # ensure there are plenty of open cases for class
        
        if np.abs((day - (EMAIL_SPLIT_POINT - datetime.timedelta(days=36))).days) < 8:
            date_prob[day_n] *= 4
        
        # Add extra density on weekends
        if day.weekday() > 4:
            date_prob[day_n] *= 3
    
    # Standardize the probabilities to sum to 1
    stand = lambda x : [i/sum(x) for i in x]
    date_prob = stand(date_prob)
    
    # Generate the date
    arrest_date = np.random.choice(all_dates, p=date_prob)
    
    # Generate a time with interesting daily patterns
    hours = np.random.choice(          [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23],
                             p = stand([6,5,5,3,1,1,2,2,2,3,3 ,3 ,4, 4, 4, 4, 4 ,4, 5, 5, 6, 6, 6, 6]))
    minutes = random.randrange(0, 59)
    
    # Generate and return the final date
    return arrest_date + datetime.timedelta(hours=float(hours), minutes=float(minutes))

## Import packages

In [9]:
# Easy random number generation
import random

In [10]:
# Date and time
import datetime
import time

In [11]:
# Standard libraries
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import os.path
import pickle
from tqdm import tqdm

In [12]:
# Random name generation
import names

In [13]:
# Graph utilities
import networkx as nx

In [14]:
# Google login and email libraries
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
import base64

## Setup
This section sets up basic utility functions

In [15]:
def random_step_date(start_date, mu):
    '''
    This function generates the date of the next step in a sequence. It
    takes two arguments
      - start_date: the date of the previous step
      - mu: the average number of days till the next step; we will assume
            the number of days is exponentially distributed with that mean
    
    It will return a single date
    '''
    
    return start_date + datetime.timedelta(days=np.random.exponential(mu))

In [16]:
def gmail_connect(client_credentials_file='client_credentials.json', token_file='token.json'):
    '''
    This function establishes a connection to the Gmail API for reading
    only. It takes two arguments
      - client_credentials_file: a json file containing client credentials
        from Google
      - token_file: a json file containing a previously obtained tokens. If
        this file exists and is valid, it will be used to establish a
        connection. If not, a new connection will be established
    
    The function returns Google service object that can be used to access
    the Gmail API.    
    '''
    
    # Specify the read-only scope we'll want
    SCOPES = ['https://www.googleapis.com/auth/gmail.modify', 'https://www.googleapis.com/auth/gmail.labels']
    
    # Begin with empty credentials
    creds = None
    
    # Check whether the token file exists
    if os.path.exists(token_file):
        # Attempt to connect using this token file
        creds = Credentials.from_authorized_user_file(token_file, SCOPES)
        
    # If valid credentials were not obtained, get new ones
    if not creds or not creds.valid:
        flow = InstalledAppFlow.from_client_secrets_file(client_credentials_file, SCOPES)
        creds = flow.run_local_server(port=0)
        
        # Save the credentials for future use
        with open(token_file, 'w') as f:
            f.write(creds.to_json())
    
    # Authenticate to GMail with this token, and return the Google service
    service = build('gmail', 'v1', credentials=creds)
    return service

In [17]:
# Create random seeds so the same cases are created every time
np.random.seed(123)
random.seed(123)

## Data
This function loads data that will be used to generate realistic cases

In [18]:
# Load a list of charges, their severity, and the probability any given case
# arose because of a given charge
CHARGES = pd.read_excel('bail_case_data.xlsx', sheet_name='charges')
CHARGES.head()

Unnamed: 0,Charge,Type,Class,p
0,Criminal solicitation in the fourth degree,Misdemeanor,A,0.0122
1,Conspiracy in the fifth degree,Misdemeanor,A,0.0101
2,Assault in the third degree,Misdemeanor,A,0.0101
3,Hazing in the first degree,Misdemeanor,A,0.0101
4,Reckless endangerment in the second degree,Misdemeanor,A,0.0101


Create a set of graphs that represent the progression of a case

In [19]:
# Create a NODES list, listing the different potential steps in a case,
NODES = ['Case opened', 'Evidence received 1', 'Arraignment', 'Bail hearing', 'Bail posted',
             'Grand jury', 'Indictment', 'Evidence received 2', 'Pre trial appearance', 'Jury selection',
             'Trial', 'Evidence received 3', 'Sentencing', 'Plea', 'Case Closed']

In [20]:
# Create a tuple that lists the potential states a case might start in.
# Each element will correspond to one state the case might start in,
# and provide
#   - The probability the case will begin in that state
#   - The average amount of time, in days, between the arrest and that
#     state if the case begins in that state
#
# Most cases begin in the 'Case opened' state, but some begin at the
# bail hearing or bail posted stage if the bail fund picks the case up
# after it has opened
STARTING_STATES = ({'state' : 'Case opened',  'p' : 0.7,  'mu' : 2},
                   {'state' : 'Bail hearing', 'p' : 0.15, 'mu' : 10},
                   {'state' : 'Bail posted',  'p' : 0.15, 'mu' : 10})

In [21]:
# Create three graphs, for three kinds of cases
GRAPH_MISDEMEANOR = nx.DiGraph()
GRAPH_LOW_FELONY  = nx.DiGraph()
GRAPH_HIGH_FELONY = nx.DiGraph()

GRAPH_MISDEMEANOR.add_nodes_from(NODES)
GRAPH_LOW_FELONY.add_nodes_from(NODES)
GRAPH_HIGH_FELONY.add_nodes_from(NODES)

In [22]:
# Define transition probabilities for the three kinds of cases. Each
# entry below represents a transition from one state to another, and comes
# with two values:
#   - p: the transition probability for that specific transition
#   - mu: the average time taken (in days) for the transition. We will
#         assume transition times are exponentially distributed, with these
#         averages

GRAPH_MISDEMEANOR.add_edges_from([
    ('Case opened',          'Evidence received 1',    {'p': 0.3,   'mu' : 1}), 
    ('Case opened',          'Arraignment',            {'p': 0.7,   'mu' : 1}), 
    ('Evidence received 1',  'Evidence received 1',    {'p': 0.3,   'mu' : 2}), 
    ('Evidence received 1',  'Arraignment',            {'p': 0.7,   'mu' : 2}), 
    ('Arraignment',          'Bail hearing',           {'p': 0.15,  'mu' : 2}), 
    ('Arraignment',          'Bail posted',            {'p': 0.4,   'mu' : 5}), 
    ('Arraignment',          'Grand jury',             {'p': 0.0,   'mu' : 10}),
    ('Arraignment',          'Evidence received 2',    {'p': 0.25,  'mu' : 5}), 
    ('Arraignment',          'Plea',                   {'p': 0.2,   'mu' : 0}), 
    ('Bail hearing',         'Bail posted',            {'p': 0.9,   'mu' : 3}), 
    ('Bail hearing',         'Grand jury',             {'p': 0.0,   'mu' : 10}),
    ('Bail hearing',         'Evidence received 2',    {'p': 0.1,   'mu' : 5}), 
    ('Bail posted',          'Grand jury',             {'p': 0.0,   'mu' : 10}),
    ('Bail posted',          'Evidence received 2',    {'p': 1.0,   'mu' : 5}), 
    ('Grand jury',           'Indictment',             {'p': 0.0,   'mu' : 0.5}),
    ('Grand jury',           'Plea',                   {'p': 0.0,   'mu' : 1}), 
    ('Indictment',           'Evidence received 2',    {'p': 0.0,   'mu' : 5}), 
    ('Indictment',           'Plea',                   {'p': 0.0,   'mu' : 1}), 
    ('Evidence received 2',  'Evidence received 2',    {'p': 0.3,   'mu' : 2}), 
    ('Evidence received 2',  'Pre trial appearance',   {'p': 0.5,   'mu' : 2}), 
    ('Evidence received 2',  'Jury selection',         {'p': 0.1,   'mu' : 4}), 
    ('Evidence received 2',  'Plea',                   {'p': 0.1,   'mu' : 1}), 
    ('Pre trial appearance', 'Pre trial appearance',   {'p': 0.3,   'mu' : 12}),
    ('Pre trial appearance', 'Jury selection',         {'p': 0.6,   'mu' : 4}), 
    ('Pre trial appearance', 'Plea',                   {'p': 0.1,   'mu' : 1}), 
    ('Jury selection',       'Trial',                  {'p': 1.0,   'mu' : 15}),
    ('Trial',                'Evidence received 3',    {'p': 0.9,   'mu' : 10}),
    ('Trial',                'Plea',                   {'p': 0.1,   'mu' : 1}), 
    ('Evidence received 3',  'Evidence received 3',    {'p': 0.1,   'mu' : 2}), 
    ('Evidence received 3',  'Sentencing',             {'p': 0.9,   'mu' : 10}),
    ('Sentencing',           'Case Closed',            {'p': 1.0,   'mu' : 0.5}),
    ('Plea',                 'Case Closed',            {'p': 1.0,   'mu' : 0.5})])

GRAPH_LOW_FELONY.add_edges_from([
    ('Case opened',          'Evidence received 1',    {'p': 0.5,   'mu' : 1}), 
    ('Case opened',          'Arraignment',            {'p': 0.5,   'mu' : 1}), 
    ('Evidence received 1',  'Evidence received 1',    {'p': 0.3,   'mu' : 2}), 
    ('Evidence received 1',  'Arraignment',            {'p': 0.7,   'mu' : 2}), 
    ('Arraignment',          'Bail hearing',           {'p': 0.2,   'mu' : 2}), 
    ('Arraignment',          'Bail posted',            {'p': 0.3,   'mu' : 5}), 
    ('Arraignment',          'Grand jury',             {'p': 0.3,   'mu' : 10}),
    ('Arraignment',          'Evidence received 2',    {'p': 0.0,   'mu' : 5}), 
    ('Arraignment',          'Plea',                   {'p': 0.2,   'mu' : 0}), 
    ('Bail hearing',         'Bail posted',            {'p': 0.9,   'mu' : 3}), 
    ('Bail hearing',         'Grand jury',             {'p': 0.1,   'mu' : 10}),
    ('Bail hearing',         'Evidence received 2',    {'p': 0.0,   'mu' : 5}), 
    ('Bail posted',          'Grand jury',             {'p': 1.0,   'mu' : 10}),
    ('Bail posted',          'Evidence received 2',    {'p': 0.0,   'mu' : 5}), 
    ('Grand jury',           'Indictment',             {'p': 0.9,   'mu' : 0.5}),
    ('Grand jury',           'Plea',                   {'p': 0.1,   'mu' : 1}), 
    ('Indictment',           'Evidence received 2',    {'p': 0.95,  'mu' : 5}), 
    ('Indictment',           'Plea',                   {'p': 0.05,  'mu' : 1}), 
    ('Evidence received 2',  'Evidence received 2',    {'p': 0.3,   'mu' : 2}), 
    ('Evidence received 2',  'Pre trial appearance',   {'p': 0.5,   'mu' : 2}), 
    ('Evidence received 2',  'Jury selection',         {'p': 0.1,   'mu' : 4}), 
    ('Evidence received 2',  'Plea',                   {'p': 0.1,   'mu' : 1}), 
    ('Pre trial appearance', 'Pre trial appearance',   {'p': 0.3,   'mu' : 12}),
    ('Pre trial appearance', 'Jury selection',         {'p': 0.6,   'mu' : 4}), 
    ('Pre trial appearance', 'Plea',                   {'p': 0.1,   'mu' : 1}), 
    ('Jury selection',       'Trial',                  {'p': 1.0,   'mu' : 15}),
    ('Trial',                'Evidence received 3',    {'p': 0.9,   'mu' : 10}),
    ('Trial',                'Plea',                   {'p': 0.1,   'mu' : 1}), 
    ('Evidence received 3',  'Evidence received 3',    {'p': 0.2,   'mu' : 2}), 
    ('Evidence received 3',  'Sentencing',             {'p': 0.8,   'mu' : 10}),
    ('Sentencing',           'Case Closed',            {'p': 1.0,   'mu' : 0.5}), 
    ('Plea',                 'Case Closed',            {'p': 1.0,   'mu' : 0.5})])

GRAPH_HIGH_FELONY.add_edges_from([
    ('Case opened',          'Evidence received 1',    {'p': 0.5,   'mu' : 1}), 
    ('Case opened',          'Arraignment',            {'p': 0.5,   'mu' : 1}), 
    ('Evidence received 1',  'Evidence received 1',    {'p': 0.3,   'mu' : 2}), 
    ('Evidence received 1',  'Arraignment',            {'p': 0.7,   'mu' : 2}), 
    ('Arraignment',          'Bail hearing',           {'p': 0.2,   'mu' : 2}), 
    ('Arraignment',          'Bail posted',            {'p': 0.2,   'mu' : 5}), 
    ('Arraignment',          'Grand jury',             {'p': 0.3,   'mu' : 10}),
    ('Arraignment',          'Evidence received 2',    {'p': 0.0,   'mu' : 5}), 
    ('Arraignment',          'Plea',                   {'p': 0.3,   'mu' : 0}), 
    ('Bail hearing',         'Bail posted',            {'p': 0.9,   'mu' : 3}), 
    ('Bail hearing',         'Grand jury',             {'p': 0.1,   'mu' : 10}),
    ('Bail hearing',         'Evidence received 2',    {'p': 0.0,   'mu' : 5}), 
    ('Bail posted',          'Grand jury',             {'p': 1.0,   'mu' : 10}),
    ('Bail posted',          'Evidence received 2',    {'p': 0.0,   'mu' : 5}), 
    ('Grand jury',           'Indictment',             {'p': 0.9,   'mu' : 0.5}),
    ('Grand jury',           'Plea',                   {'p': 0.1,   'mu' : 1}), 
    ('Indictment',           'Evidence received 2',    {'p': 0.95,  'mu' : 5}), 
    ('Indictment',           'Plea',                   {'p': 0.05,  'mu' : 1}), 
    ('Evidence received 2',  'Evidence received 2',    {'p': 0.3,   'mu' : 2}), 
    ('Evidence received 2',  'Pre trial appearance',   {'p': 0.5,   'mu' : 2}), 
    ('Evidence received 2',  'Jury selection',         {'p': 0.1,   'mu' : 4}), 
    ('Evidence received 2',  'Plea',                   {'p': 0.1,   'mu' : 1}), 
    ('Pre trial appearance', 'Pre trial appearance',   {'p': 0.3,   'mu' : 12}),
    ('Pre trial appearance', 'Jury selection',         {'p': 0.6,   'mu' : 4}), 
    ('Pre trial appearance', 'Plea',                   {'p': 0.1,   'mu' : 1}), 
    ('Jury selection',       'Trial',                  {'p': 1.0,   'mu' : 15}),
    ('Trial',                'Evidence received 3',    {'p': 0.9,   'mu' : 10}),
    ('Trial',                'Plea',                   {'p': 0.1,   'mu' : 1}), 
    ('Evidence received 3',  'Evidence received 3',    {'p': 0.2,   'mu' : 2}), 
    ('Evidence received 3',  'Sentencing',             {'p': 0.8,   'mu' : 10}),
    ('Sentencing',           'Case Closed',            {'p': 1.0,   'mu' : 0.5}),
    ('Plea',                 'Case Closed',            {'p': 1.0,   'mu' : 0.5})])

## Create classes
This section creates three classes (Defendant, Email, and Case) which will form the bulk of the work we'll need to carry out.

In [23]:
class Defendant:
    '''
    This class represents a single defendant. It requires no arguments to initialize,
    and exposes the following instance variables
      - sex
      - name
      - dob (date of birth)
    '''
    
    def __init__(self):
        '''
        Creates a fake defendant
        '''
        
        self.sex  = random.choice([["male", "M"], ["female", "F"]])
        self.name = names.get_full_name(gender=self.sex[0])
        
        # Generate DOB starting 70 years ago and ending 18 years ago
        start_date = datetime.datetime.now() - datetime.timedelta(days=365*70)
        self.dob = start_date + datetime.timedelta(days=random.randrange(0, 365*(70-18)))

In [24]:
class Case:
    '''
    This class represent a single case in the Sprinfield court system. It requires
    a single argument to initialize - a list of case IDs that have already been
    generated, to ensure no duplication.
    
    When initialized it generates every part of the case, including the steps in it
    and any emails involved.
    
    It exposes the following instance variables (these are separated in this list
    based on the functions they are defined in)
      - defendant: the defendant, an object of class Defendant
      - location: the location of the case
      - case_number
      - arrest_date
      
      - charge, charge_type, charge_class: the specific charge (crime) in this case
      - pmat, mumat: the relevant p and mu matrices that govern transitions between
                     states for this case (see data section above)
      
      - path: a list of steps in the case. Each step will be a number, corresponding
              to one of the states in the NODE list
      - path_timeline: a list of datetimes, representing the date and time at which
                       each step in path is entered
                       
      - bail_set: True if bail is set in the case, False if it is denied
      - bail_amount: None if bail is not set, otherwise the bail amount
      - bail_set_step_num: The step at which the bail is set or denied. This will
                           correspond to an index in self.path (so if bail is set/
                           denied at the fifth step, this will be equal to 4). If
                           Bail was set *before* we received our first email for this
                           case, this will be set to -1
      - posted_by_fund: None if bail was denied or never posted, True if bail was
                        paid by the fund, False otherwise
      - reimbursement_time: None if bail was not paid by the fund, or if the
                            reimbursement failed (see next point). Otherwise, the
                            date on which the reimbursement happens (which will be
                            delayed after the end of the case).
      - reimbursement_failed: None if bail was not paid by the fund. Otherwise, True
                              if the fund forgot to get bail reimbursed when it could
                              have, False if reimbursement was claimed when it was due
      
      - all_emails: A list containing one element of class Email for every email 
                    relevant to this case
                
    It exposes one externally useful method
      - get_charge_string: returns a string that summarizes the charge
    '''
    
    def __init__(self, existing_ids):
        '''
        Initialize the case, and generate the defendant, location, case_number, and
        arrest_date instance variables.
        
        This accepts a single argument - the list of case IDs that already exist, to
        ensure we do not duplicate any IDs
        '''
        
        # Generate the defendant
        self.defendant = Defendant()
        
        # Location is always Springfield
        self.location = 'Springfield'
        
        # Create the case number, ensuring there are no duplicates
        self.case_number = None
        while (self.case_number is None) or (self.case_number in existing_ids):
            self.case_number = "CR-" + str(random.randrange(1000000, 9999999))
        
        # Generate the arrest date
        self.arrest_date = generate_arrest_date()
        
        # Generate the charge
        self.generate_charge()
        
        # Generate the case path
        self.generate_case_path()
        
        # Determine bail
        self.generate_bail()
        
        # Generate all emails
        self.generate_emails()
        
    def generate_charge(self):
        '''
        This function generates the charge for this particular case, and creates
        the instance variables charge, charge_type, charge_class, pmat, and mumat
        '''
        
        # Select a single charge from the CHARGES DataFrame, using the provided
        # probabilities
        charge = CHARGES.sample(weights=CHARGES.p)
        
        self.charge       = charge.Charge.iloc[0]
        self.charge_type  = charge.Type.iloc[0]
        self.charge_class = charge.Class.iloc[0]
        
        # Select the right case transition graph based on the gravitiy of the
        # charge
        if self.charge_type == 'Misdemeanor':
            graph = GRAPH_MISDEMEANOR
        elif self.charge_type == "Felony":
            if self.charge_class in ["A", "B", "C"]:
                graph = GRAPH_HIGH_FELONY
            elif self.charge_class in ["D", "E"]:
                graph = GRAPH_LOW_FELONY
        
        # Extract p and mu transition matrices from the graph
        self.pmat  = np.array(nx.attr_matrix(graph, edge_attr="p", rc_order=NODES))
        self.mumat = np.array(nx.attr_matrix(graph, edge_attr="mu", rc_order=NODES))
    
    def get_charge_string(self):
        '''
        This function returns a string that summarizes the charge in this case
        '''
        
        return self.charge + "- Class " + self.charge_class + " " + self.charge_type
        
    def generate_case_path(self):
        '''
        This function creates the path of a case, and creates the instance variables
        path and path_timeline
        '''
        
        # Select a first step in the path
        first_step = np.random.choice(STARTING_STATES, p=[i['p'] for i in STARTING_STATES])
        
        # Create the lists required
        self.path          = [NODES.index(first_step['state'])]
        self.path_timeline = [random_step_date(self.arrest_date, first_step['mu'])]
        
        # Continue adding steps until the final absorbing state ('Case Closed')
        # is reached
        while NODES[self.path[-1]] != 'Case Closed':
            # Identify the current stage, and generate the next stage
            cur_stage = self.path[-1]
            next_stage = np.random.choice(range(len(NODES)), p=self.pmat[cur_stage,])
            
            # Generate the date of the next step; if it's a weekend, move it forward
            next_date = random_step_date(self.path_timeline[-1],
                                         self.mumat[cur_stage, next_stage])
            if next_date.weekday() > 4:
                next_date = next_date + datetime.timedelta(days = 7-next_date.weekday())
            
            # Add the next stage to the path
            self.path.append(next_stage)
            self.path_timeline.append(next_date)
            
    def generate_bail(self):
        '''
        This function determines all the nuts of bolts of bail for this case, and sets
        the instance variables bail_set, bail_amount, bail_set_step_num, posted_by_fund,
        reimbursement_delay, and reimbursement_failed
        '''
        
        # First determine whether the bail is set; if bail is posted, it is necessarily
        # set. If not, assume it is set with probability 0.4
        if NODES.index('Bail posted') in self.path:
            self.bail_set = True
        else:
            self.bail_set = (random.uniform(0, 1) > 0.4)
        
        # If bail is set, determine how much it is
        self.bail_amount = None
        if self.bail_set:
            if self.charge_type == 'Misdemeanor':
                if self.charge_class == 'B':
                    mu, sigma = 300, 100
                elif self.charge_class == 'A':
                    mu, sigma = 500, 100
                    
            elif self.charge_type == 'Felony':
                if self.charge_class == 'E' or self.charge_class == 'D':
                    mu, sigma = 1000, 200
                elif self.charge_class == 'C' or self.charge_class == 'B':
                    mu, sigma = 5000, 400
                elif self.charge_class == 'A':
                    mu, sigma = 8000, 600
                    
            else:
                raise 'Unknown charge type/class'
            
            self.bail_amount = np.maximum(1, np.random.normal(mu, sigma))
            
            # Ensure bail is at an increment of $50, unless it is under $50
            if self.bail_amount >= 50:
                self.bail_amount = self.bail_amount - (self.bail_amount % 50)
                
        # Next determine WHEN bail is set/denied. If there is a bail hearing, this
        # happens then. If not, it happens during the Arraignment. In some cases,
        # we start *after* the arraignment and so this step never occurs
        self.bail_set_step_num = -1
        if NODES.index('Bail hearing') in self.path:
            self.bail_set_step_num = self.path.index(NODES.index('Bail hearing'))
        elif NODES.index('Arraignment') in self.path:
            self.bail_set_step_num = self.path.index(NODES.index('Arraignment'))
        
        # Finally, determine if the bail is posted by the fund. If it is, determine
        # how long the delay is before the fund gets its payment back, or whether
        # the fund fails to get this payment back, resulting in a "hanging payment"
        self.posted_by_fund = None
        self.reimbursement_time = None
        self.reimbursement_failed = None
        
        if NODES.index('Bail posted') in self.path:
            # If the fund has signed up to receive updates on these cases, assume
            # there's a 70% chance it was involved
            self.posted_by_fund = (random.uniform(0, 1) > 0.3)
            
            if self.posted_by_fund:
                # Bail was posted by the fund. Assume there's a 10% chance the bail
                # never claimed back by the fund, due to administrative error
                self.reimbursement_failed = (random.uniform(0, 1) > 0.9)
                
                if not self.reimbursement_failed:
                    # Find the time at which the refund to the bail fund happens
                    self.reimbursement_time = self.path_timeline[-1] + datetime.timedelta(days=np.random.exponential(REIMBURSEMENT_DELAY))
                
    def generate_emails(self):
        '''
        This function generates all the emails pertaining to this case, and sets the
        all_emails instance variable.
        
        In particular, three kinds of emails will be sent
          - One email for every step in the case
          - One email receipt if the bail fund posts bail in the case
          - One email reimbursement receipt if the bail fund gets a refund in the case
        '''
        
        # Create an empty list of emails
        self.all_emails = []
        
        # Create an email for every step in the case's path
        for i in range(len(self.path)):
            self.all_emails.append(Email(email_type='update', email_date=self.path_timeline[i], case=self, step_num=i))
        
        # If bail was posted by the fund, send receipts
        if self.posted_by_fund:
            # Send a payment receipt, at exactly the same time the Bail posted state
            # is entered into
            bail_posted_step = self.path.index(NODES.index('Bail posted'))
            self.all_emails.append(Email(email_type='payment_receipt', email_date=self.path_timeline[bail_posted_step], case=self))
            
            # If this is not a case where the reimbursement is forgotten, send
            # a reimbursement receipt
            if not self.reimbursement_failed:
                # Add the receipt email
                self.all_emails.append(Email('reimbursement_receipt', self.reimbursement_time, self))

In [25]:
class Email:
    '''
    This class represents a single email.
    
    It is initialized with four arguments
      - email_type: the type of the email. This should be one of the following
                       * 'update': an email that is sent when a case enters a
                                   certain state
                       * 'payment_receipt': an email that is sent when a receipt
                                   is sent to the bail fund
                       * 'reimbursement_receipt': an email that is sent when the
                                   bail fund gets a reimbursement
      - email_date: the date this email is sent
      - case: the case in question
      - step_num: the step number an update is being sent for if the email_type is
                  'update'. Else, None.
    
    It exposes the following instance variables
      - email_text: the full text of the email
      - sent: whether the email was sent or not
      - send_error: any error that was encountered while trying to send the email
    
    It exposes the following method
      - send: given a gmail service, use it to send the email and update the sent
              and send_error variables if necessary
    '''
    
    def __init__(self, email_type, email_date, case, step_num = None):
        '''
        Create an email.
        '''
        
        # Save the case, step number, email type, and date
        self.email_type = email_type
        self.email_date = email_date
        self.case       = case
        self.step_num   = step_num
        
        # Note that this email hasn't been sent yet, and that no sending
        # error has been recorded
        self.sent       = False
        self.send_error = ''
        
        # Create the email
        self.email_text = self.generate_start() + self.generate_body() + self.generate_end()
        
    def generate_subject(self):
        '''
        Generate the subject of the email
        '''
        
        if self.email_type == 'update':
            return 'Court Update Alert: Case Number ' + self.case.case_number
        
        elif self.email_type == 'payment_receipt':
            return 'Bail Paid Confirmation: Case Number ' + self.case.case_number
        
        elif self.email_type == 'reimbursement_receipt':
            return 'Bail Reimbursement Confirmation: Case Number ' + self.case.case_number
    
    def generate_start(self):
        '''
        Generate the first line of the email
        '''
        
        return "<p style='font-weight:bold'>Sent Date = " + self.email_date.strftime('%m-%d-%Y %H:%M:%S') + '<br></p>\n'
                
    def generate_body(self):
        '''
        Generate the email body
        '''
        
        if self.email_type == 'update':
            # If this is an update, begin with basic case information
            body_text  = f'<p>Case Number =  {self.case.case_number}<br>'
            body_text += f'Defendant     =  {self.case.defendant.name}<br>'
            body_text += f'DOB           =  ' + self.case.defendant.dob.strftime('%m-%d-%Y') + '<br>'
            body_text += f'Sex           = {self.case.defendant.sex[1]}<br>'
            body_text += f'Charge        = {self.case.get_charge_string()}<br>'
            body_text += f'Arrest Date   = ' + self.case.arrest_date.strftime('%m-%d-%Y %H:%M:%S') + '</p>'
            body_text += "<p style='font-weight:bold'>Update details<br></p>"
            body_text += f'<p>Purpose       = {NODES[self.case.path[self.step_num]]}<br>\n'
            body_text += f'Location      = {self.case.location}<br>\n'
            body_text +=  'Modality      = ' + np.random.choice(['Virtual', 'In person']) + '<br></p>\n'
            
            # If bail was set or denied in this step, make a note
            if self.case.bail_set_step_num == self.step_num:
                if self.case.bail_set:
                    body_text += '<p>Bail Set<br>'
                    body_text += 'Bail          = $' + '{0:.2f}'.format(self.case.bail_amount) + '</p>\n'
                else:
                    body_text += '<p>Bail Denied</p>\n'
            
            # Add a "case open/closed" line
            body_text +=  '<p>Case Closed   =      ' + ('YES' if NODES[self.case.path[self.step_num]] == 'Case Closed' else 'NO')
            body_text +=  '</p>\n'
            
        elif self.email_type == 'payment_receipt':
            body_text  = '<p>Bail Paid Confirmation<br>\n'
            body_text += f'Case Number = {self.case.case_number}<br>\n'
            body_text += 'Amount =$' + '{0:.2f}'.format(self.case.bail_amount) + '</p>\n'
                        
        elif self.email_type == 'reimbursement_receipt':
            body_text  = '<p>Bail Reimbursement Confirmation<br>\n'
            body_text += f'Case Number = {self.case.case_number}<br>\n'
            body_text += 'Amount =$' + '{0:.2f}'.format(self.case.bail_amount) + '</p>\n'
    
        return body_text
    
    def generate_end(self):
        '''
        Generate the email signature
        '''
        
        return "<p>--<br>CBS Python Bail Fund Case</p><img src='" + SIGNATURE_IMAGE_LINK + "' width='100'>"
    
    def send(self, gmail_service):
        '''
        Send the email, catching any errors.
        
        This function takes a single argument, the gmail service
        
        It returns True if the email was successfully sent, and False otherwise
        '''
        
        # Convert the message into MIME format
        message = MIMEMultipart()
        message['to'] = EMAIL_TO
        message['from'] = EMAIL_FROM
        message['subject'] = self.generate_subject()
        
        text = MIMEText(self.email_text, 'html')
        
        message.attach(text)
        message.set_payload(text)
        
        final_message = {'labelIds': ['INBOX'], 'raw': base64.urlsafe_b64encode(message.as_string().encode()).decode()}
        
        # Send the message, updating the sent flag or noting the error if
        # there is one
        try:
            gmail_service.users().messages().send(userId = 'me', body=final_message).execute()
            self.sent = True
            
            return True
        except Exception as e:
            self.send_error = str(e)
            
            return False

## Create cases and emails

In [26]:
cases = []
emails = []
# Create the cases
for i in tqdm(range(N_CASES)):
    # Create the case, ensuring no duplicate case ID
    cases.append(Case([i.case_number for i in cases]))
    
    # Save all emails from the case
    emails.extend(cases[-1].all_emails)

100%|████████████████████████████████████████████████████████████████████████████████| 225/225 [00:03<00:00, 72.58it/s]


In [27]:
# Sort the emails by their date
emails = sorted(emails, key = lambda x : x.email_date)

In [28]:
# Split the emails into two batches - before the split point and after
first_batch = [i for i in emails if i.email_date < EMAIL_SPLIT_POINT]
second_batch = [i for i in emails if i.email_date >= EMAIL_SPLIT_POINT and i.email_date < EMAIL_SPLIT_POINT + datetime.timedelta(days=30)]

## Summaries

In [29]:
print(f'{len(cases)} were generated, with an average of {len(emails)/len(cases)} emails per case')

225 were generated, with an average of 9.893333333333333 emails per case


In [30]:
print(f'{len(first_batch)} emails will be sent in the first batch, and {len(second_batch)} in the second, from a total of {len(emails)}')

1385 emails will be sent in the first batch, and 452 in the second, from a total of 2226


In [31]:
bail_set             = [i.bail_set for i in cases]
bail_posted          = [NODES.index("Bail posted") in i.path for i in cases]
bail_posted_by_fund  = [i.posted_by_fund==True for i in cases]
not_reimbursed       = [i.path_timeline[-1] < EMAIL_SPLIT_POINT and i.posted_by_fund==True and ((i.reimbursement_time is None) or (i.reimbursement_time > EMAIL_SPLIT_POINT)) for i in cases]
reimbursement_failed = [i.path_timeline[-1] < EMAIL_SPLIT_POINT and i.reimbursement_failed==True for i in cases]

In [32]:
# Find the proportion of cases with bail set, posted, and posted by the fund
print(f'{round(sum(bail_set)*100/len(cases),2)}% of cases had bail set')
print(f'{round(sum(bail_posted)*100/sum(bail_set),2)}% of cases with bail set had it posted')
print(f'{round(sum(bail_posted_by_fund)*100/sum(bail_posted), 2)}% of cases with bail posted were posted by fund')
print(f'{round(sum(not_reimbursed)*100/sum(bail_posted_by_fund),2)}% ({sum(not_reimbursed)}) of cases with bail posted by fund not yet reimbursed.')
print(f'Of which {sum(reimbursement_failed)} have a failed reimbursement.')
print('')
print(f'Average case length is {(cases[0].path_timeline[-1] - cases[0].path_timeline[0]).days} days')

83.11% of cases had bail set
73.26% of cases with bail set had it posted
66.42% of cases with bail posted were posted by fund
13.19% (12) of cases with bail posted by fund not yet reimbursed.
Of which 1 have a failed reimbursement.

Average case length is 28 days


## Send emails

In [33]:
# Connect to the gmail service
service = gmail_connect()

Please visit this URL to authorize this application: https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=363908162380-opllgbe43jkr8mf1rfsa7iklr8j66mt4.apps.googleusercontent.com&redirect_uri=http%3A%2F%2Flocalhost%3A62193%2F&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fgmail.modify+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fgmail.labels&state=2581CC0X5IoB5O3A2Bs0EdkQexgrHR&access_type=offline


In [37]:
# Send every email in the first batch.
for i in tqdm(first_batch):
    # Only send an email if it hasn't been sent yet - that way, we can re-run
    # this loop if it fails the first time
    if i.sent == False:
        if i.send(service):
            # Send success! Pause to try not to hit rate limits
            time.sleep(60)
        else:
            # Send error - if it's a rate limit error, stop the loop, else continue
            if 'User-rate limit exceeded.' in i.send_error:
                print('Rate limit hit; stopping')
                print(i.send_error)
                break

# Dump the email lists into a pickle; if this line is reached after a
# user rate limit has been excluded, it will keep a record of all the
# emails that have already been sent. If the notebook is somehow killed
# before it can be re-run, the pickle file can just be re-loaded and the
# code above re-run to send the remaining emails
pickle.dump([first_batch, second_batch], open(LEFTOVER_EMAIL_FILE, 'wb'))

100%|████████████████████████████████████████████████████████████████████████████| 1385/1385 [4:37:16<00:00, 12.01s/it]


## Post-class code
This section loads the email pickle file, and sends the second batch of emails - it should be run *after* class and before the homework.

In [42]:
'''
(first_batch, second_batch) = pickle.load(open(LEFTOVER_EMAIL_FILE, 'rb'))

# Send every email in the second batch.
for i in tqdm(second_batch):
    # Only send an email if it hasn't been sent yet - that way, we can re-run
    # this loop if it fails the first time
    if i.sent == False:
        if i.send(service):
            # Send success! Pause to try not to hit rate limits
            time.sleep(60)
        else:
            # Send error - if it's a rate limit error, stop the loop, else continue
            if 'User-rate limit exceeded.' in i.send_error:
                print('Rate limit hit; stopping')
                print(i.send_error)
                break

# Dump the email lists into a pickle; if this line is reached after a
# user rate limit has been excluded, it will keep a record of all the
# emails that have already been sent. If the notebook is somehow killed
# before it can be re-run, the pickle file can just be re-loaded and the
# code above re-run to send the remaining emails
pickle.dump([first_batch, second_batch], open(LEFTOVER_EMAIL_FILE, 'wb'))
'''

100%|██████████████████████████████████████████████████████████████████████████████| 452/452 [7:33:11<00:00, 60.16s/it]
