# Welcome

Python programming basics: variables, types, lists, dictionaries, functions, dates, strings, dir, help, simulated transactional data, computing Earned Premium

Welcome aboard! In our first session, we will do the following:

| What | How long |
|:------------------------------------------------------------------------|--------------:|
| Preliminary stuff: up and running with Colab | 5 min |
| Boring stuff: operations, data types, and package import | 15 min |
| Semi-fun stuff: functions and loops | 15 min |
| Fun stuff: iterable data | 15 min |
| Some other stuff: stats functions, numpy and a tiny bit of matplotlib | 10 min |
| Practical assignment: earned premium | 15 min |

## Preliminary stuff: up and running with Colab

There are several different ways to work with Python. You can run it from the command line, you can use an IDE like Spyder, PyCharm, or even RStudio (!), or you can use a notebook. A notebook requires the least amount of investment up front for instruction and it will cover most of your initial use cases.

A notebook is divided into "cells". A cell may contain code or text. Text follows the [markdown](https://daringfireball.net/projects/markdown/) format.

The computer perspective:

**REPL**

1. Read
2. Evaluate
3. Print
4. Loop

The user perspective:

**TWRL**

1. Type
2. Wait
3. Read
4. Loop

Contrast **REPL** / **TWRL** with spreadsheet, SQL, or compiled programming languages (C/C++, Fortran, etc). It's interactive, imperative (not declarative) and (more or less) immediate.

Go ahead and type a few basic commands into a cell, press CTRL+ENTER and see what happens.

In [None]:
5 + 6

# Boring stuff: operators, data types and package import

## Objects

An object is an area of storage in memory, similar to a cell in a spreadsheet. You may also see the word "variable" (similar to a random variable in statistics) to describe this concept.

Objects may be created by assigning the result of an operation to the name of the object.

In [None]:
x = 5

Typing the name of a variable will cause the value of that variable to be printed.

In [None]:
x

In a notebook, only the value of the last expression in a cell is displayed.

In [None]:
y = 100
y
x

### Multiple assignment

Python has support for multiple assignment. Check this out:

In [None]:
x = 2
y = 5
x, y

x, y = y, x

x, y

## Data types

Python data types are about what you would expect from a programming language. Scalar/primitive types are as follows:

* boolean
* integer
* float
* string

There are also `complex` and `bytes`. We will not go into those. Just for fun, though, here's a complex number:

In [None]:
(-4) ** (1 / 2)

You can query the object type using the `type()` function. 

In [None]:
type(2)
type(2.0)
type(5 // 2)  # We'll see why soon

## Casting

You can cast from one type to another using functions like:

* `bool(some_value)`  -> boolean
* `int(some_value)`   -> integer
* `float(some_value)` -> float
* `str(some_value)`   -> string

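Numeric upcasting (what Python does automatically when you mix types in arithmetic) runs as: boolean -> integer -> float -> complex. Any of these can also be converted explicitly to a string with `str()`.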

In [None]:
some_value = True
some_value
int(some_value)
float(some_value)
str(some_value)

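Downcasting works when Python knows how to make the conversion. Note that you may lose precision: `int(5.6)` truncates to `5`. Also beware that `bool()` of any non-empty string, even `'False'`, is `True`!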

In [None]:
bool('True')
int(5.6)
float('5.0')
int('5')

x = True
y = False

x and y
x or y

type(x)

name = 'Dave'

name

type(name)

x + name  # raises TypeError: can't add a bool and a str

5 * 'Dave'

### More on strings

Strings are delimited by single or double quotes.

In [None]:
'This is a string'
"And so is this"

You can create a multi-line string by using triple quotes

In [None]:
"""This string will take more than one line
Here's the second line.
"""

To include a single quote character, you can use double quotes, or escape the single quote with a `\` character.

In [None]:
print("This'll do for an apostrophe")
print('This\'ll also do')

Speaking of `\`, if you need one, you'll need to type it twice.

In [None]:
print("A Windows path looks like C:\\Users\\Dave")

`\n` and `\t` are special characters which will give a new line and tab, respectively.

In [None]:
print("I'm done with this line.\nTime for another")
print("Wait for it: \tHere it is")

## Operators

Most operators are binary. Negation (in a numeric, not a logical sense) is about the only unary operation we'll encounter. Technically, `+` is also a unary operator, but it has no effect, so it's fairly trivial. It will not convert a negative number to a positive; it simply returns the operand.

In [None]:
-5
+5
+(-5)

### Numeric

Numeric operators are the ones we encounter in basic math. One potential gotcha for R/Excel users is that the caret operator is a bitwise XOR, _not_ exponentiation.

In [None]:
2 + 5
2 * 5
2 / 5
2 // 5  # Drop the remainder
2 % 5   # Return the modulus
2 ** 5  # Exponentiation
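To see the caret gotcha concretely, here is a minimal sketch comparing `^` with `**` on the same operands:

```python
# ^ is bitwise XOR: 2 is 0b010 and 5 is 0b101, so 2 ^ 5 is 0b111
xor_result = 2 ^ 5
# ** is exponentiation
power_result = 2 ** 5

print(xor_result)    # 7
print(power_result)  # 32
```

If an R or Excel formula like `x^2` gives a strangely small integer in Python, this is usually why.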

The `+` and `*` operators can also work on character data. We'll see more of this in a moment.

### Logical comparison

In [None]:
2 == 5  # Note the double equal sign!
2 != 5
2 < 5
2 > 5

### Logical operators

Logical operators use the very expressive and brief English words `and` and `or`.

In [None]:
True or False # logical OR
True and False # logical AND

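Don't confuse them with the bitwise operators `&` and `|`. They give the same result for booleans, but not for integers, and the bitwise operators don't work at all for floats. Note also that `and` and `or` return one of their operands rather than a strict `True` or `False`: `1 or 4` is `1`, while `1 | 4` is `5`.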

In [None]:
True | False
True or False

1 or 4
1 | 4

1.0 or 4.0
1.0 | 4.0  # raises TypeError: bitwise operators don't apply to floats

True & False
True and False

1 and 4
1 & 4

### Operate and assign

You can combine the basic operators with an equal sign to update an object in a single step: `x += 4` is shorthand for `x = x + 4`.

In [None]:
x = 5
x
x += 4
x

y = "Dave"
y += " Thomas"
y

### String operations

The `+` and `*` operators also apply to strings. This is because strings, like lists, are sequences; we'll get to lists later. Logical comparisons also work.

In [None]:
'Dave' + 'Thomas'
'Dave' * 3
'Dave' * 'Thomas'  # raises TypeError: a string can only be multiplied by an integer
'Dave' < 'Thomas'
'Dave' > 'Thomas'

In [None]:
float(x)

str(x) + name

### f-strings

An `f-string` is a way to have objects formatted and placed into a string, as below.

In [None]:
f'It is {x} that my name is {name}.'

f'Pi is equal to {3.1415927:.3f}'
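The `:.3f` after the value is a format specification. A few other specs come up constantly with insurance data; here is a sketch with made-up figures:

```python
loss = 1234567.891
loss_ratio = 0.6543

# ',' adds a thousands separator; '.2f' rounds to two decimal places
loss_text = f'Incurred loss: {loss:,.2f}'
# '%' multiplies by 100 and appends a percent sign
ratio_text = f'Loss ratio: {loss_ratio:.1%}'
# '>10' right-aligns the value in a field 10 characters wide
padded_text = f'[{"Dave":>10}]'

print(loss_text)    # Incurred loss: 1,234,567.89
print(ratio_text)   # Loss ratio: 65.4%
print(padded_text)  # [      Dave]
```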

In [None]:
name.upper()

Note that many methods, like `upper()` above, return a new object rather than changing the object itself. Some functions, particularly in pandas, will modify an object in place.

In [None]:
name
name = name.upper()
name

## Modules

Functionality beyond Python's built-in functions is available by importing modules, either from the standard library that ships with Python or from packages you install. This is similar to an add-in in Excel, or a package in R.

There are three ways to do this:

1. `from [package] import [item_1], [item_2]`
2. `import [package] as [abbreviation]`
3. `import [package]`

The first option will bring those items into the global namespace.

```
from datetime import date

christmas = date(2020, 12, 25)
```

The second option means that you will need to prefix the name of a function with the abbreviation.

```
import datetime as dt

christmas = dt.date(2020, 12, 25)
```

No abbreviation means that you will need to prefix the function name with the full name of the package.

```
import datetime

christmas = datetime.date(2020, 12, 25)
```

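Python ships with quite a few modules (the standard library), such as `datetime` and `math`. If you need a package that isn't included, you can install it via `pip` or `conda` at the command line. In Colab, run the same command in a code cell, prefixed with `!`.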

`pip install scikit-learn`
or
`conda install scikit-learn`

### Math

You can find some additional mathematical functions and constants in the `math` package: https://docs.python.org/3/library/math.html.

In [None]:
from math import sqrt, pi, e

## Dates

Dates require the `datetime` package. 

In [None]:
from datetime import date, timedelta

christmas = date(2020, 12, 25)
christmas

# date(2020, 4, 31) would raise ValueError: April has only 30 days
end_of_april = date(2020, 4, 30)
end_of_april

halloween = date(2020, 10, 31)

christmas - halloween

Date arithmetic works as you would expect.

In [None]:
date(2020, 1, 31) - date(2020, 1, 1)

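Note that the result is 30 days, not 31. A date represents one instant $\epsilon$ after midnight, so the difference between two dates counts the whole days elapsed between them, not the days inclusive of both endpoints.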

In [None]:
date(2010, 12, 31) - date(2010, 1, 1)

(christmas - halloween) / (date(2020, 12, 31) - date(2020, 1, 1))
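The `.days` attribute and `timedelta` objects make these calculations explicit. A short sketch using the same holiday dates; measuring a year from January 1 to the following January 1 sidesteps the off-by-one issue above:

```python
from datetime import date, timedelta

elapsed = date(2020, 12, 25) - date(2020, 10, 31)
print(elapsed.days)  # 55

# Adding a timedelta to a date gives another date
new_year = date(2020, 12, 25) + timedelta(days=7)
print(new_year)  # 2021-01-01

# Dividing one timedelta by another gives a plain float
year_length = date(2021, 1, 1) - date(2020, 1, 1)
print(year_length.days)  # 366, since 2020 is a leap year
fraction_of_year = elapsed / year_length
print(round(fraction_of_year, 4))  # 0.1503
```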

You can use the `replace()` method to return a new date with some of its elements changed. The original date is not modified, which is why we assign the result back to `christmas`.

In [None]:
christmas = christmas.replace(year = 1)

## Miscellany

Comments are noted by a single hash mark.

In [None]:
# This is a comment

### Classes/OOP

We won't cover this. However, these are very good things and you should know about them.

### Coding practices

Code with the same attention to detail that you apply for getting dressed for a job interview. Intent matters. Python has a de facto standard commonly referred to as PEP-8: https://www.python.org/dev/peps/pep-0008/. 

Quick notes:

* Give your variables sensible, expressive names. `loss_ratio` > `LR`
* Document _why_, not _what_
* Functions should do one and only one thing. Don't read in a file, construct a plot and save results in one monolithic function
* Code is _read_ (by a human!) more often than it is written. Make your code readable.

## Quick exercise:

1. What is the square root of pi?
2. Express your birthday as a date.
3. Use an f-string to wish yourself a happy birthday.
4. Is "A" more or less than "a"? Why?
5. Why does `type(5 // 2)` return `int`?

# Semi-fun stuff: functions, loops and packages

Spaces, not braces! The beginning of a code block is marked with a `:` at the end of the line. The lines inside the block are indented; PEP 8 recommends four spaces, though any consistent indentation works (many of the examples below use two). The block ends when the indentation returns to the previous level. Note that we don't need to use `end if`, `next`, `loop`, or anything else to indicate that the block is closed.

Blocks are used for:

* Functions
* Conditional blocks, e.g. if/else
* Loops

## Functions

Functions are defined by the keyword `def`, followed by the function name, an argument list within parentheses and then the aforementioned `:`. The value returned from the function is indicated by `return`.

In [None]:
def add_two(num_one, num_two):
  return num_one + num_two


Note that we did not specify data types in the function.

In [None]:
add_two(3, 4)
add_two('a', 'b')
add_two(True, False)
add_two(True, 1)

Functions may return more than one value (strictly speaking, they return a single tuple, which we can unpack with multiple assignment). This is an amazing and useful thing!

In [None]:
from math import log, sqrt

def lognormal_params(mean, sd):
  cv = sd / mean
  sigma = sqrt(log(cv ** 2 + 1))
  mu = log(mean) - (sigma ** 2 /2)
  return mu, sigma

mu, sigma = lognormal_params(10 ** 3, 1.5)
mu, sigma
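One way to check `lognormal_params()` is to run the standard lognormal moment formulas, $E[X] = e^{\mu + \sigma^2/2}$ and $CV^2 = e^{\sigma^2} - 1$, in reverse: if the parameters are right, we should recover the mean and standard deviation we started with. The helper `lognormal_moments()` and the inputs 1,000 and 250 are hypothetical, chosen only for this check.

```python
from math import exp, log, sqrt

def lognormal_params(mean, sd):
    # Same calculation as above: method of moments for a lognormal
    cv = sd / mean
    sigma = sqrt(log(cv ** 2 + 1))
    mu = log(mean) - (sigma ** 2 / 2)
    return mu, sigma

def lognormal_moments(mu, sigma):
    # Hypothetical helper: map (mu, sigma) back to (mean, sd)
    mean = exp(mu + sigma ** 2 / 2)
    sd = mean * sqrt(exp(sigma ** 2) - 1)
    return mean, sd

check_mu, check_sigma = lognormal_params(1000, 250)
check_mean, check_sd = lognormal_moments(check_mu, check_sigma)
print(check_mean, check_sd)  # approximately 1000 and 250
```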

Objects created inside a function are local to that function and are not preserved after it returns.

In [None]:
def add_five(x):
  the_result = x + 5
  return the_result

add_five(5)
the_result  # raises NameError: the_result only existed inside add_five()

Object names local to the function take precedence. In the block below, the function `add_five()` will _not_ use the object `the_result` which is defined in the global namespace.

In [None]:
the_result = 100
add_five(5)

However, a function _will_ use a global variable if no local object with that name exists.

In [None]:
def add_const(x):
  return x + y

y = 4
add_const(10)
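Reading a global is one thing; assigning to a name inside a function is another. In this minimal sketch (the names `premium_total` and `reset_total` are made up for illustration), the assignment inside the function creates a brand-new local object, so the global is left alone:

```python
premium_total = 500

def reset_total():
    # This assignment creates a local variable that hides the global one
    premium_total = 0
    return premium_total

local_value = reset_total()
print(local_value)    # 0
print(premium_total)  # 500: the global was not changed
```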

### Named and default arguments

You can change the order of arguments to a function by specifying them by name. Unlike R, once you've specified an argument by name, you can't rely on position for the other arguments.

In [None]:
def limited_loss(loss, limit):
  """
  This function assumes that losses are from-ground-up
  """
  loss = min(loss, limit)
  return loss

limited_loss(50000, 10000)
limited_loss(limit = 10000, loss = 50000)

You can also provide sensible defaults to function arguments.

In [None]:
def limited_loss_2(loss, limit = 100000, deductible = 5000):
  """
  This function assumes that losses are from-ground-up
  """
  loss = min(loss, limit)
  loss = max(0, loss - deductible)
  return loss

limited_loss_2(50000)
limited_loss_2(50000, deductible=250)

### lambda functions

Python supports the creation of "lambda" functions. These are anonymous functions whose body is a single expression, so they can be written in one line. They're used for disposable functions, largely for convenience.

In [None]:
cap_500 = lambda x: min(x, 500)
cap_500(250)
cap_500(1000)
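A common home for a lambda is the `key` argument of the built-in `sorted()`, where a throwaway function is all you need. A sketch with made-up claim data:

```python
# Hypothetical claims: [line of business, incurred amount]
claims = [['auto', 1200], ['home', 450], ['marine', 3000]]

# Sort by the second element of each pair, i.e. the amount
by_amount = sorted(claims, key=lambda claim: claim[1])
print(by_amount)  # [['home', 450], ['auto', 1200], ['marine', 3000]]

# The same idea, largest first
largest_first = sorted(claims, key=lambda claim: claim[1], reverse=True)
```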

### Quick exercise

1. Create a function that will return the amount of time between a date and today's date. You will want to use `date.today()` from the `datetime` module.

## Conditional execution

In [None]:
x = 100
if x > 50:
  print("x is larger than 50")
else:
  print(x)


Note that execution stops at the first branch whose condition is `True`. Even though `x > 75` is also true here, only the first message is printed.

In [None]:
x = 100
if x > 50:
  print("x is larger than 50")
elif x > 75:
  print("x is larger than 75")
else:
  print(x)
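To get the more specific message, test the most restrictive condition first. A sketch that wraps the logic in a function (the name `size_message` is just for illustration):

```python
def size_message(x):
    # Check the narrowest condition first, because only the first True branch runs
    if x > 75:
        return "x is larger than 75"
    elif x > 50:
        return "x is larger than 50"
    else:
        return str(x)

print(size_message(100))  # x is larger than 75
print(size_message(60))   # x is larger than 50
print(size_message(10))   # 10
```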


## Loops

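A Python `for` loop always iterates over the items of a collection (more precisely, an _iterable_), such as a `list` or a `tuple`, which we'll talk about shortly. For now, the `range()` function is a nice one for generating a sequence of integers. Note that `range()` excludes its end point: `range(5)` gives 0 through 4, and `range(2, 21, 2)` gives the even numbers 2 through 20.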

In [None]:
for i in range(5):
    print(i)

for i in range(5, 10):
  print(i)

for i in range(2, 21, 2):
  print(i)
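`for` loops cover most needs, but a `while` loop (which appears again later in the generator examples) repeats as long as a condition stays true. That's handy when you don't know the number of iterations up front. A sketch, assuming a hypothetical 7% annual premium growth rate: how many years until premium doubles?

```python
projected_premium = 1000.0
growth_rate = 0.07  # assumed for illustration
years = 0

# Keep compounding until the premium reaches double its starting value
while projected_premium < 2000:
    projected_premium *= 1 + growth_rate
    years += 1

print(years)  # 11
```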


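Count the number of days in January 2020:

In [None]:
from datetime import date, timedelta

# Start on January 1 so that the 365-day window covers all of January
start_day = date(2020, 1, 1)
jan_days = 0
for day in range(365):
  test_day = start_day + timedelta(days = day)
  if test_day.month == 1:
    jan_days += 1

jan_days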

## Quick exercises

1. Create a function that will return the amount of time between a date and today's date.
2. Create a function that will wish someone a happy birthday, if today happens to be their birthday. If it is not, have it return a message indicating how many days remain until their birthday.

# The fun stuff

Python embraces the concept of iteration over a collection of objects. Contrast this with R or Matlab's embrace of vectorized data and operations. Similar, but with some important differences.

## Lists

Lists are one of the most common non-trivial data structures. Lists in Python are very similar to lists in R. They represent an ordered collection of (possibly) heterogeneous data.

You can form a list by manually placing items inside square brackets. Items are separated by commas. Trailing commas are not a problem.

In [None]:
my_list = [0, 1, 2, 3, 4, ]
my_list

Lists may store heterogeneous data types:

In [None]:
my_het_list = [0, 1, 2, 'a', 'b', 'c']
my_het_list

A list may contain another list:

In [None]:
my_recursive_list = ['x', 'y', 'z', my_list]
my_recursive_list

Elements in a list may be modified by assigning values. 

In [None]:
my_list[0] = 5
my_list
my_list[0] = 0
my_list

You may append new elements using the `append()` function. Note that this will modify the object in place.

In [None]:
my_list.append(5)
my_list

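You can sort a list using the `sort()` method and reverse it using `reverse()`. These will modify the list in place. The built-in `sorted()` function, by contrast, returns a new sorted list and leaves the original unchanged.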

In [None]:
my_list.sort()
my_list
my_list.reverse()
my_list
my_list.sort()

sorted(my_list)

### Slicing

You can extract individual elements of a list by referring to their ordinal position. Note that indexing is zero-based, in contrast to R, which is one-based. That is, the first element is at position zero. Despite what anyone may tell you, there is no advantage to zero- or one-based indexing.

In [None]:
my_list[0]
my_list[1]
my_list[4]

Passing in a negative number will extract elements starting from the end.

In [None]:
my_list[-1]
my_list[-3]

Multiple elements may be extracted using the `:` slice operator:

`[start:end:step]`

Note that `end` is not inclusive.

Omitting one of the elements means:

* If `step` is omitted, it will be one
* If `start` is omitted, the slice will begin at the first element
* If `end` is omitted, the slice will end with the last element

Negative values:

* A negative `start` means that the slice will begin counting from the end of the list
* A negative `end` means that the slice will stop that many elements before the end of the list
* A negative `step` means that the results will be reversed (the defaults flip, too: an omitted `start` begins at the last element)

In [None]:
my_list[0:4]
my_list[0:4:2]
my_list[:3]
my_list[::-1]
my_list[-3:]
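A few more slices, using a hypothetical list of letters so each position is easy to see. Note in particular the negative `end` and the negative `step` with explicit bounds:

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']

print(letters[1:4])     # ['b', 'c', 'd']: position 4 is not included
print(letters[:-2])     # ['a', 'b', 'c', 'd']: negative end stops 2 short of the end
print(letters[-3:])     # ['d', 'e', 'f']: negative start counts from the end
print(letters[::2])     # ['a', 'c', 'e']: every second element
print(letters[4:1:-1])  # ['e', 'd', 'c']: negative step walks backwards
```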

You can use slicing for assignment of multiple items

In [None]:
my_list[:2] = [11, 12]
my_list

The magic of multiple assignment means we can extract to more than one variable at a time

In [None]:
x, y = my_list[:2]
x, y

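Fun fact: a string behaves a lot like a list of characters! Both are sequences, so indexing and slicing work the same way. Unlike a list, though, a string can't be modified in place.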

In [None]:
my_name = 'Brian'
my_name[0]
my_name[-1]

Revisit the `+` and `*` operators on strings which we saw earlier. They work on lists too: `+` concatenates and `*` repeats.

In [None]:
my_list * 2
my_list + [1,2]

Quick exercise:

1. Pull out every third letter from your name
2. Extract the penultimate letter of your name

## Tuples

A tuple is very similar to a list. The most significant difference is that the data is _immutable_. This means that items inside the tuple may not be changed. They are created similar to lists, but use parentheses rather than square brackets.

In [None]:
my_tuple = (3, 5)
my_tuple[0]
my_tuple[0] = 4  # raises TypeError: tuples can't be changed

Tuples exist, but you will not often create them. They are often returned from functions, so you will see them.

## Sets

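A set is an unordered collection of unique items. Unlike a tuple, a set itself can be changed (with `add()`, for example), but its elements must be immutable, and there is no positional indexing.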

In [None]:
set('Brian Fannin')
set([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])
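Sets support the familiar set operations, reusing `|`, `&` and `-` as operators. A sketch with made-up lines of business; `sorted()` is used only to give a predictable display order, since sets are unordered:

```python
# Hypothetical lines of business written by two agencies
agency_a = {'auto', 'home', 'umbrella'}
agency_b = {'home', 'marine', 'auto'}

either = agency_a | agency_b  # union: written by either agency
both = agency_a & agency_b    # intersection: written by both
only_a = agency_a - agency_b  # difference: written only by agency A

print(sorted(either))  # ['auto', 'home', 'marine', 'umbrella']
print(sorted(both))    # ['auto', 'home']
print(sorted(only_a))  # ['umbrella']

# The set itself can be changed in place
agency_a.add('marine')
```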

## Dictionaries

A dictionary is a collection of key-value pairs. The values may be lists. This is similar to JSON/XML. It may also remind you of a structure in C, or user-defined type in VBA.

A dictionary may be created by placing key-value pairs inside curly braces, or by calling the `dict()` function.

In [None]:
my_dict = {
  'effective_date': date(2020, 7, 1),
  'expiration_date': date(2021, 6, 30),
  'premium': 10e3,
}

my_dict

my_other_dict = dict(
  effective_date = date(2020, 7, 1),
  expiration_date = date(2021, 6, 30),
  premium = 10e3,
)
my_other_dict

my_dict == my_other_dict

Elements may be accessed by key, but not by position.

In [None]:
my_dict['effective_date']
my_dict[0]  # raises KeyError: there is no key 0

Reassignment doesn't care about data types

In [None]:
my_dict['effective_date'] = 7
my_dict['effective_date']

New keys may be added simply through assignment

In [None]:
my_dict['mojo'] = 'cazart'
my_dict

You may investigate what keys are included by calling the `keys()` method.

In [None]:
my_dict.keys()

`values()` will give you the, um, values.

In [None]:
my_dict.values()

You can check whether a key exists in a dictionary by using the `in` operator.

In [None]:
'effective_date' in my_dict
'line_of_business' in my_dict
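Two more methods are worth knowing. `get()` looks up a key but falls back to a default if it's missing, and `items()` hands back key-value pairs together. A sketch using a fresh copy of the policy dictionary from above, named `policy_info` so it doesn't overwrite `my_dict`:

```python
from datetime import date

policy_info = {
    'effective_date': date(2020, 7, 1),
    'expiration_date': date(2021, 6, 30),
    'premium': 10e3,
}

# get() returns a default instead of raising KeyError for a missing key
lob = policy_info.get('line_of_business', 'unknown')
print(lob)  # unknown

# items() yields (key, value) pairs, handy in a for loop
for key, value in policy_info.items():
    print(f'{key}: {value}')
```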

### Why dictionaries?

A tangible benefit of dictionaries is that they are fast. Looking up a key takes, on average, about the same time no matter how large the dictionary is; by contrast, finding a value in a list means checking the items one by one. The price is some extra memory, and there's no notion of position: since Python 3.7, dictionaries remember the order in which keys were inserted, but you can't insert a key into the middle.

The less tangible benefit is that the elements are expressive.

## Comprehensions

Lists and dictionaries are _iterable_. We can walk through the elements in sequence and perform operations to create new lists and dictionaries. This is closely associated with list comprehensions and dictionary comprehensions.

### List comprehension

A list comprehension (sometimes abbreviated "listcomp") will generate a list. This is a more compact way of achieving something for which you may have used a `for` loop.

* `[expression for list_element in a_list]`
* `[expression for list_element in a_list if condition]`

In [None]:
[add_two(x, 1)  for x in range(2)]

Get all the days in February 2020:

In [None]:
february_days = [date(2020, 2, x) for x in range(1, 30)]
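To make the link to `for` loops concrete, here is a sketch building the same February list both ways:

```python
from datetime import date

# Comprehension version, as in the cell above
february_days = [date(2020, 2, x) for x in range(1, 30)]

# The equivalent explicit loop
february_days_loop = []
for x in range(1, 30):
    february_days_loop.append(date(2020, 2, x))

print(february_days == february_days_loop)  # True
print(len(february_days))                   # 29, since 2020 is a leap year
```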

We can use more than one `for` clause to create the same effect as a nested loop.

In [None]:
[add_two(x, y)  for x in range(2) 
                for y in range(2)]

my_last_name = 'Fannin'

[add_two(x, y)  for x in my_name 
                for y in my_last_name]

A list comprehension will also let us create a logical subset of a list. We can check membership in a list by using the keyword `in`.

In [None]:
[x for x in my_name if x in 'aeiou']

Quick question

1. How could you use list comprehension to create a list of quarter start dates for the years 2001 through 2005?

### Dictionary comprehension

Similar to a list comprehension. Note that we can perform an operation on each element to form the key (here, `str(x)`) as well as the value (`x**2`).

In [None]:
new_dict = {str(x): x**2 for x in [1, 2, 3, 4]}
new_dict
new_dict.keys()
new_dict.values()
new_dict['1']

Comprehensions can also be used inside other structures, for example to build one of the lists in a dictionary of lists:

In [None]:
{
    'written_premium': [100, 200]
    , 'effective_date': [date(a_year, 1, 1) for a_year in range(2001, 2003)]
}

## Iteration

Python supports many iterable expressions. These can be thought of as functions which generate the next value in a sequence. 

### Generator expression

Generator expressions look similar to a list comprehension, but are enclosed within parentheses rather than square brackets. 

We'll create a simple function which determines if an integer is odd.

In [None]:
def is_odd(x):
  return x % 2 != 0

is_odd(8)

We can use list comprehension to form a Boolean list of the oddness of numbers.

In [None]:
mojo = [is_odd(x) for x in range(10)]
mojo

A generator expression will not execute until you ask it to.

In [None]:
gonzo = (is_odd(x) for x in range(10))
gonzo

So, let's ask it.

In [None]:
cazart = list(gonzo)
cazart

Note that the two lists are identical

In [None]:
cazart == mojo

The difference:

* A list comprehension is _greedy_. It will produce all results at once.
* A generator expression is _lazy_. It will generate the next result in a sequence as needed

We can use `next()` to ask for the next item in the sequence:

In [None]:
gonzo = (is_odd(x) for x in range(10))
next(gonzo)
next(gonzo)
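One consequence of laziness: a generator can only be walked through once. A minimal sketch:

```python
gen = (x * 2 for x in range(3))

first_pass = list(gen)   # [0, 2, 4]
second_pass = list(gen)  # []: every value has already been produced

# Once exhausted, next() raises StopIteration unless you supply a default
last = next(gen, 'done')
print(first_pass, second_pass, last)  # [0, 2, 4] [] done
```

If you need the values more than once, store them in a list, or create a fresh generator.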

### `zip()`

`zip()` will pair up matching items in two or more lists.

In [None]:
my_zip = zip(
  ['a', 'b', 'c'],
  [1, 2, 3]
)
my_zip

In Python 3, `zip()` will return an iterator. We can extract the results using `list()`.

In [None]:
zipped_stuff = list(my_zip)
zipped_stuff

In [None]:
# Prints nothing: list() above has already consumed every item in my_zip
for item in my_zip:
  print(item)


In [None]:
ids = [str(i) for i in range(100)]
written_premium = [float(i) / 2 for i in range(100)]

policies = dict(zip(ids, written_premium))
policies['3']

### `enumerate()`

`enumerate()` will produce each item in a list, along with the index of the item.

In [None]:
my_list = list('Brian')
for i, letter in enumerate(my_list):
  # Expressions inside a loop aren't displayed automatically, so print them
  print(f'Item {i} is {letter}')


We can use `enumerate()` with the generator we created earlier. 

In [None]:
gonzo = (is_odd(x) for x in range(10))
for i, oddness in enumerate(gonzo):
  print(f'The number {i} is odd: {oddness}')


### `yield`

You can construct a generator function by using the `yield` keyword. This will look a lot like a function. However, instead of using `return`, you will use `yield`. This will return the next value in a sequence. Below we create a generator which will count integers beginning at zero.

In [None]:
def counter(limit):
  the_counter = 0
  while the_counter < limit:
    yield the_counter
    the_counter += 1

count_four = counter(4)

next(count_four)
next(count_four)
next(count_four)
next(count_four)
next(count_four)  # raises StopIteration: the generator has run out of values

We can use list comprehension to create a list using a generator.

In [None]:
new_counter = counter(10)
my_list = [next(new_counter) for _ in range(10)]
my_list
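A generator doesn't need a `limit` at all: `while True` gives an endless sequence, and the caller decides how much to take. `itertools.islice()` from the standard library is one tidy way to do that (`naturals()` is a name made up for this sketch). The practical assignment below uses an endless generator in the same spirit.

```python
from itertools import islice

def naturals():
    # An infinite generator: the while condition never becomes False
    n = 0
    while True:
        yield n
        n += 1

# islice() lazily takes only the first five values
first_five = list(islice(naturals(), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```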

# Other stuff

## scipy.stats

https://docs.scipy.org/doc/scipy/reference/stats.html 

scipy stats has some useful stats functions, including random number generators, cumulative and density functions, etc.

In [None]:
from scipy.stats import poisson, describe

sims = 10 ** 3
count_mean = 5
arr_pois = poisson.rvs(count_mean, size = sims)

sum(arr_pois)
describe(arr_pois)

Notice that I didn't name the Poisson mean `lambda`, as is customary in statistics: `lambda` is a reserved word in Python.

## numpy

https://numpy.org/doc/stable/index.html

In our next session, we will begin to use `pandas`. This will remain our preferred means of handling data sets. `pandas` uses `numpy` under the hood. Benefits of using `numpy`:

* Vectorized operations
* Smaller memory footprint
* Faster execution

In [None]:
import numpy as np

an_array = np.arange(5)
an_array

an_array * 2
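The same `*` operator behaves very differently on a list and on an array, which is the heart of "vectorized operations". A sketch:

```python
import numpy as np

a_list = [0, 1, 2, 3, 4]
an_array = np.arange(5)

# For a list, * repeats the sequence
print(a_list * 2)           # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
# For an array, * is applied element by element
print(an_array * 2)         # [0 2 4 6 8]
# Arithmetic between two arrays is also element-wise
print(an_array + an_array)  # [0 2 4 6 8]
```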

### numpy stats

numpy has its own random number generators, which cover many of the same distributions as `scipy.stats` and return numpy arrays. The results work directly with scipy functions like `describe()`.

In [None]:
arr_pois_2 = np.random.poisson(count_mean, sims)
describe(arr_pois_2)
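`np.random.poisson()` draws from numpy's older global random state. numpy 1.17 and later recommend creating a `Generator` with `np.random.default_rng()`, which also makes seeding explicit. A sketch, with the seed `1234` chosen arbitrarily:

```python
import numpy as np
from scipy.stats import describe

# A Generator seeded explicitly, so the draws are reproducible
rng = np.random.default_rng(seed=1234)
arr_pois_3 = rng.poisson(lam=5, size=1000)

print(arr_pois_3.shape)  # (1000,)
print(describe(arr_pois_3))

# Re-creating the generator with the same seed reproduces the draws
rng_again = np.random.default_rng(seed=1234)
same_draws = rng_again.poisson(lam=5, size=1000)
```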

## matplotlib

We'll get into visualization later, but can't let the first session pass without a quick mention. The code below will let you get a basic line plot (swap `plot()` for `scatter()` to get a scatter plot, which handles 80% of your use cases) and a histogram (which handles another 10%).

Within a Jupyter notebook, you won't need to call `plt.show()`.

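In older versions of Jupyter, you may need to call the "magic" command `%matplotlib inline` to display plots inside the notebook. Colab and recent Jupyter versions do this by default, so the cell below is harmless either way.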

In [None]:
%matplotlib inline  

In [None]:
import matplotlib.pyplot as plt

In [None]:
fig, axis = plt.subplots()

axis.plot(an_array, an_array * 2)
axis.set_title("A very silly straight line")

In [None]:
plt.show()

We can also have more than one plot

In [None]:
fig, (axis1, axis2) = plt.subplots(1, 2)

axis1.hist(arr_pois, bins = 20, ec = 'w')
axis1.set_title("20 bins")
axis2.hist(arr_pois, bins = 10, ec = 'w')
axis2.set_title("10 bins")

In [None]:
plt.show()

`matplotlib` is utilitarian, like R's base graphics. It's OK for getting started, but we'll show some better options.

# A practical assignment

We would like to randomly generate a set of policies. Nothing fancy. Each policy will contain an effective and expiration date and written premium. Next, we'd like to calculate the earned premium by calendar year. 

## Random set of policyholders

We'll start by importing some functions that we'll need.

In [None]:
from datetime import date, timedelta
from random import seed, randrange

Next, we'll create a lambda function to increment the effective date by a year. This is largely to make the code more readable.

In [None]:
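# February 29 has no counterpart in the following year, so those policies expire on February 28
increment_year = lambda date_in: date(date_in.year + 1, date_in.month, min(date_in.day, 28) if date_in.month == 2 else date_in.day)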

Finally, we'll create a generator which gives us a randomized policy. There's not much special about the written premium. It's simply a random integer between 0 and 999.

Note 1) the use of our lambda function, 2) the use of a random number seed and 3) the `yield` keyword which makes this a generator.

In [None]:
def random_policy(new_seed = 1234):
  """
  This is a generator which produces a dictionary of basic policy information
  """
  seed(new_seed)
  while True:
    policy = {
      'written_premium': randrange(10**3),
      'effective_date': date(2020, 1, 1) + timedelta(randrange(365))
    }
    policy['expiration_date'] = increment_year(policy['effective_date'])
    yield policy

We can now generate a list of dictionaries by calling `next()` as many times as we like. We don't need the result of our call to `range()`. We indicate that we're ignoring it by assigning it to the `_` object.

In [None]:
policy_generator = random_policy()

policies = [next(policy_generator) for _ in range(10)]

Take a look at our policies:

In [None]:
policies[:2]

And confirm that we have as many as we're expecting:

In [None]:
len(policies)

## Earned premium

Compose a function to give the earned premium between two dates. For example, a one-year policy written on July 1, 2020 is roughly 50% earned between July 1, 2020 and December 31, 2020.
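Assuming an even earning pattern, the calculation can be written as a worked equation. Here $p_0$ and $p_1$ are the policy effective and expiration dates, $e_0$ and $e_1$ bound the earning period, and date differences are measured in days; these symbols are introduced only for this sketch.

$$
\text{EP} = \text{WP} \times \frac{\max\left(0,\ \min(e_1, p_1) - \max(e_0, p_0)\right)}{p_1 - p_0}
$$

For a written premium of 1,000 on a policy running from July 1, 2020 to July 1, 2021, calendar year 2020 contains 184 of the 365 policy days:

$$
\text{EP}_{2020} = 1000 \times \frac{184}{365} \approx 504.11
$$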

In [None]:
def earned_premium(written_premium, policy_start, policy_end, earn_start, earn_end):
    """
    Calculate earned premium. Presumes an even (linear) earning pattern
    """
    policy_term = policy_end - policy_start
    # Restrict the earning period to the part that overlaps the policy
    earn_start = max(earn_start, policy_start)
    earn_end = min(earn_end, policy_end)
    # If the periods don't overlap, nothing is earned (rather than a negative amount)
    earned_time = max(earn_end - earn_start, timedelta(0))
    frac_earned = earned_time / policy_term

    return written_premium * frac_earned

We can confirm that the function works with some sample data.

In [None]:
premium = 10**3
earned_premium(premium, date(2020, 7, 1), date(2021, 7, 1), date(2020, 1, 1), date(2021, 1, 1))
earned_premium(premium, date(2020, 7, 1), date(2021, 7, 1), date(2021, 1, 1), date(2022, 1, 1))

This function will return the calendar year intervals which bracket a list of dates. Note the use of `set()`.

In [None]:
def get_cy_intervals(dates_list):
    """
    Return a [start, end) pair of dates for each calendar year in which a date falls
    """
    # set() removes duplicate years; sorted() puts them in a predictable order
    the_years = sorted(set([a_date.year for a_date in dates_list]))
    cy_year = [[date(a_year, 1, 1), date(a_year + 1, 1, 1)] for a_year in the_years]
    return cy_year

We can see it in action here:

In [None]:
some_dates = [
    date(2020, 7, 1)
    , date(2020, 12, 31)
    , date(2021, 6, 30)
]

get_cy_intervals(some_dates)

Those two functions will let us compute the earned premium in all of the calendar years in a policy coverage period.

In [None]:
def earned_premium_cy(written_premium, policy_start, policy_end):
    """
    Calculate the amount of premium earned by calendar year for a single policy period
    """
    cy_return = {}
    for cy_start, cy_end in get_cy_intervals([policy_start, policy_end]):
        cy_return[cy_start.year] = earned_premium(
          written_premium, policy_start, policy_end, cy_start, cy_end
        )
    return cy_return


Does it work?

In [None]:
earned_premium_cy(10e3, date(2020, 7, 1), date(2021, 6, 30))

It does!

## All together now

In [None]:
# Initialize empty dictionary to store total earned premium by calendar year
cy_earned = {}
for policy in policies:
  
  cy_ep = earned_premium_cy(policy['written_premium'], policy['effective_date'], policy['expiration_date'])
  
  for year in cy_ep:
    if year in cy_earned:
      cy_earned[year] += cy_ep[year]
    else:
      cy_earned[year] = cy_ep[year]

cy_earned
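The `if`/`else` inside the loop can be collapsed with `dict.get()`, which returns a default when a key is missing. So that it runs on its own, this sketch uses made-up per-policy results shaped like the output of `earned_premium_cy()` (not output from the simulation above). Since each one-year policy is fully earned across its calendar years, the totals should add back to the written premium of 2,000.

```python
# Hypothetical per-policy results: {calendar year: earned premium}
policy_results = [
    {2020: 504.11, 2021: 495.89},
    {2020: 250.0, 2021: 750.0},
]

cy_totals = {}
for cy_ep in policy_results:
    for year, amount in cy_ep.items():
        # get() supplies 0 the first time a year is seen, replacing the if/else
        cy_totals[year] = cy_totals.get(year, 0) + amount

print(cy_totals)
print(sum(cy_totals.values()))  # approximately 2000
```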

Here's a sneak preview of what's happening next week:

In [None]:
import pandas as pd

pd.DataFrame(policies)

For the student

1. Generate 5,000 simulated values from a Poisson distribution with an expected value of 5
2. Do the same for a negative binomial. Use values of 0.2, 0.5 and 1.0 for the coefficient of variation