# Welcome back

Welcome back! In our third session, we will do the following:

| What | How long |
|:------------------------------------------------------------------------|--------------:|
| Data frames refresh and fetch triangle data | 10 min |
| Indexing | 10 min |
| Querying a data frame | 15 min |
| Packing/unpacking | 15 min |
| Grouped operations | 20 min | 
| I/O | 5 min |
| Recap on iterators | 15 min |

# Data frames refresh

In [27]:
import pandas as pd
import numpy as np

In [2]:
url = 'https://www.casact.org/research/reserve_data/wkcomp_pos.csv'
df_triangle = pd.read_csv(url)

df_triangle.head()

Unnamed: 0,GRCODE,GRNAME,AccidentYear,DevelopmentYear,DevelopmentLag,IncurLoss_D,CumPaidLoss_D,BulkLoss_D,EarnedPremDIR_D,EarnedPremCeded_D,EarnedPremNet_D,Single,PostedReserve97_D
0,86,Allstate Ins Co Grp,1988,1988,1,367404,70571,127737,400699,5957,394742,0,281872
1,86,Allstate Ins Co Grp,1988,1989,2,362988,155905,60173,400699,5957,394742,0,281872
2,86,Allstate Ins Co Grp,1988,1990,3,347288,220744,27763,400699,5957,394742,0,281872
3,86,Allstate Ins Co Grp,1988,1991,4,330648,251595,15280,400699,5957,394742,0,281872
4,86,Allstate Ins Co Grp,1988,1992,5,354690,274156,27689,400699,5957,394742,0,281872


In [3]:
df_triangle['GRNAME'].head(), df_triangle['GRNAME'].tail()

(0    Allstate Ins Co Grp
 1    Allstate Ins Co Grp
 2    Allstate Ins Co Grp
 3    Allstate Ins Co Grp
 4    Allstate Ins Co Grp
 Name: GRNAME, dtype: object,
 13195    Tower Ins Co Of NY
 13196    Tower Ins Co Of NY
 13197    Tower Ins Co Of NY
 13198    Tower Ins Co Of NY
 13199    Tower Ins Co Of NY
 Name: GRNAME, dtype: object)

In [4]:
type(df_triangle['GRNAME'])

pandas.core.series.Series

In [5]:
df_triangle.columns

Index(['GRCODE', 'GRNAME', 'AccidentYear', 'DevelopmentYear', 'DevelopmentLag',
       'IncurLoss_D', 'CumPaidLoss_D', 'BulkLoss_D', 'EarnedPremDIR_D',
       'EarnedPremCeded_D', 'EarnedPremNet_D', 'Single', 'PostedReserve97_D'],
      dtype='object')

In [6]:
df_triangle.shape

(13200, 13)

By the by, `shape` gives us back a tuple: `(rows, columns)`.
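
Because `shape` is a plain tuple, it can be unpacked straight into two names. A minimal sketch on a small made-up frame (`df_demo` and its values are illustrative, not part of the triangle data):

```python
import pandas as pd

# Small illustrative frame; the values are made up for this sketch.
df_demo = pd.DataFrame({
    'AccidentYear': [1988, 1988, 1989],
    'cumulative_paid': [100, 150, 90],
})

# shape is a (rows, columns) tuple, so it unpacks into two names.
n_rows, n_cols = df_demo.shape
print(n_rows, n_cols)
```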

In [7]:
df_triangle.dtypes

GRCODE                int64
GRNAME               object
AccidentYear          int64
DevelopmentYear       int64
DevelopmentLag        int64
IncurLoss_D           int64
CumPaidLoss_D         int64
BulkLoss_D            int64
EarnedPremDIR_D       int64
EarnedPremCeded_D     int64
EarnedPremNet_D       int64
Single                int64
PostedReserve97_D     int64
dtype: object

Note that the financial amounts are stored as integers. This is great if we're using a Poisson GLM.

In [8]:
df_triangle.describe().round(2)

Unnamed: 0,GRCODE,AccidentYear,DevelopmentYear,DevelopmentLag,IncurLoss_D,CumPaidLoss_D,BulkLoss_D,EarnedPremDIR_D,EarnedPremCeded_D,EarnedPremNet_D,Single,PostedReserve97_D
count,13200.0,13200.0,13200.0,13200.0,13200.0,13200.0,13200.0,13200.0,13200.0,13200.0,13200.0,13200.0
mean,17153.05,1992.5,1997.0,5.5,11532.05,8215.74,1570.13,18438.47,1812.34,16626.13,0.73,39714.4
std,12512.21,2.87,4.06,2.87,35595.56,25714.08,7259.02,51830.7,6666.66,48941.72,0.45,130130.68
min,86.0,1988.0,1988.0,1.0,-59.0,-338.0,-4621.0,-6518.0,-3522.0,-9731.0,0.0,0.0
25%,8526.0,1990.0,1994.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,411.0
50%,14110.0,1992.5,1997.0,5.5,544.0,351.5,5.0,1419.0,144.5,827.0,1.0,2732.0
75%,26983.25,1995.0,2000.0,8.0,6526.5,4565.0,259.25,11354.25,1141.0,9180.5,1.0,19265.75
max,44300.0,1997.0,2006.0,10.0,367404.0,325322.0,145296.0,421223.0,78730.0,418755.0,1.0,1090093.0


## Renaming columns

In [9]:
new_names = {
  'CumPaidLoss_D': 'cumulative_paid',
  'IncurLoss_D': 'cumulative_incurred',
}
df_triangle = df_triangle.rename(columns = new_names)
df_triangle.columns

Index(['GRCODE', 'GRNAME', 'AccidentYear', 'DevelopmentYear', 'DevelopmentLag',
       'cumulative_incurred', 'cumulative_paid', 'BulkLoss_D',
       'EarnedPremDIR_D', 'EarnedPremCeded_D', 'EarnedPremNet_D', 'Single',
       'PostedReserve97_D'],
      dtype='object')

## Synthesizing new data

In [10]:
df_triangle['paid_to_incurred'] = df_triangle['cumulative_paid'] / df_triangle['cumulative_incurred']
df_triangle[['cumulative_paid', 'cumulative_incurred', 'paid_to_incurred']].head()

Unnamed: 0,cumulative_paid,cumulative_incurred,paid_to_incurred
0,70571,367404,0.19208
1,155905,362988,0.429505
2,220744,347288,0.635622
3,251595,330648,0.760915
4,274156,354690,0.772945


In [63]:
import numpy as np

def div0(a, b):
    """Divide a by b element-wise, ignoring division by zero.

    Example: div0([-1, 0, 1], 0) -> [0, 0, 0]
    """
    with np.errstate(divide='ignore', invalid='ignore'):
        c = np.true_divide(a, b)
        c[~np.isfinite(c)] = 0  # replace -inf, inf and NaN with 0
    return c

df_triangle['paid_to_incurred'] = div0( df_triangle['cumulative_paid'], df_triangle['cumulative_incurred'] )

In [67]:
import division_by_zero as dv
df_triangle['paid_to_incurred'] = dv.div0( df_triangle['cumulative_paid'], df_triangle['cumulative_incurred'] )

ModuleNotFoundError: No module named 'division_by_zero'

In [71]:
# Replace infinite ratios (the result of dividing by zero) with 0
df_triangle['paid_to_incurred'] = df_triangle['cumulative_paid'] / df_triangle['cumulative_incurred']
df_triangle['paid_to_incurred'] = df_triangle['paid_to_incurred'].apply(lambda x: 0 if np.isinf(x) else x)
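
Checking only for infinities leaves the `0 / 0` case untouched, because that produces `NaN` rather than `inf`. A minimal vectorised sketch that handles both is shown here; `safe_ratio` is a hypothetical helper name, and replacing undefined ratios with 0 is a modelling choice rather than a rule:

```python
import numpy as np
import pandas as pd

def safe_ratio(numerator, denominator, fill=0.0):
    """Divide two Series, replacing inf, -inf and NaN results with `fill`.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    ratio = numerator / denominator
    return ratio.replace([np.inf, -np.inf], np.nan).fillna(fill)

# Made-up values covering x/0 (inf), 0/0 (NaN) and -x/0 (-inf).
paid = pd.Series([50, 10, 0, -5])
incurred = pd.Series([100, 0, 0, 0])

print(safe_ratio(paid, incurred).tolist())
```

Passing `fill=np.nan` instead keeps the undefined ratios visible as missing values, which `describe()` then excludes from its counts.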

In [66]:
import os 
os.getcwd()

'C:\\Users\\e015614\\Documents\\GitHub\\CAS-Python-Intro\\CAS Python Course'

In [42]:
df_triangle['paid_to_incurred'].describe()

count    13200.000000
mean         0.515107
std          0.427412
min         -3.333333
25%          0.000000
50%          0.638353
75%          0.931815
max          3.000000
Name: paid_to_incurred, dtype: float64

In [72]:
df_triangle.head()

Unnamed: 0_level_0,GRCODE,AccidentYear,DevelopmentYear,DevelopmentLag,cumulative_incurred,cumulative_paid,BulkLoss_D,EarnedPremDIR_D,EarnedPremCeded_D,EarnedPremNet_D,Single,PostedReserve97_D,paid_to_incurred
GRNAME,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
Allstate Ins Co Grp,86,1988,1988,1,367404,70571,127737,400699,5957,394742,0,281872,0.19208
Allstate Ins Co Grp,86,1988,1989,2,362988,155905,60173,400699,5957,394742,0,281872,0.429505
Allstate Ins Co Grp,86,1988,1990,3,347288,220744,27763,400699,5957,394742,0,281872,0.635622
Allstate Ins Co Grp,86,1988,1991,4,330648,251595,15280,400699,5957,394742,0,281872,0.760915
Allstate Ins Co Grp,86,1988,1992,5,354690,274156,27689,400699,5957,394742,0,281872,0.772945


In [13]:
df_triangle['paid_to_incurred'] = df_triangle.cumulative_paid / df_triangle.cumulative_incurred
df_triangle['paid_to_incurred'].describe()

count    9251.000000
mean            -inf
std              NaN
min             -inf
25%         0.575371
50%         0.868966
75%         0.965742
max         3.000000
Name: paid_to_incurred, dtype: float64

# Indexing

Pandas pays a lot more attention to indices than R's data frames do.

In [73]:
df_triangle.index

Index(['Allstate Ins Co Grp', 'Allstate Ins Co Grp', 'Allstate Ins Co Grp',
       'Allstate Ins Co Grp', 'Allstate Ins Co Grp', 'Allstate Ins Co Grp',
       'Allstate Ins Co Grp', 'Allstate Ins Co Grp', 'Allstate Ins Co Grp',
       'Allstate Ins Co Grp',
       ...
       'Tower Ins Co Of NY', 'Tower Ins Co Of NY', 'Tower Ins Co Of NY',
       'Tower Ins Co Of NY', 'Tower Ins Co Of NY', 'Tower Ins Co Of NY',
       'Tower Ins Co Of NY', 'Tower Ins Co Of NY', 'Tower Ins Co Of NY',
       'Tower Ins Co Of NY'],
      dtype='object', name='GRNAME', length=13200)

In [15]:
df_triangle = df_triangle.set_index('GRNAME')
df_triangle.index

Index(['Allstate Ins Co Grp', 'Allstate Ins Co Grp', 'Allstate Ins Co Grp',
       'Allstate Ins Co Grp', 'Allstate Ins Co Grp', 'Allstate Ins Co Grp',
       'Allstate Ins Co Grp', 'Allstate Ins Co Grp', 'Allstate Ins Co Grp',
       'Allstate Ins Co Grp',
       ...
       'Tower Ins Co Of NY', 'Tower Ins Co Of NY', 'Tower Ins Co Of NY',
       'Tower Ins Co Of NY', 'Tower Ins Co Of NY', 'Tower Ins Co Of NY',
       'Tower Ins Co Of NY', 'Tower Ins Co Of NY', 'Tower Ins Co Of NY',
       'Tower Ins Co Of NY'],
      dtype='object', name='GRNAME', length=13200)

An index may have multiple levels, built from several columns.

In [16]:
# df_triangle.reset_index(inplace=True)
# df_triangle.index

In [74]:
df_triangle = df_triangle.set_index(['GRNAME', 'AccidentYear', 'DevelopmentYear'])

KeyError: "None of ['GRNAME'] are in the columns"

In [75]:
df_triangle.columns

Index(['GRCODE', 'AccidentYear', 'DevelopmentYear', 'DevelopmentLag',
       'cumulative_incurred', 'cumulative_paid', 'BulkLoss_D',
       'EarnedPremDIR_D', 'EarnedPremCeded_D', 'EarnedPremNet_D', 'Single',
       'PostedReserve97_D', 'paid_to_incurred'],
      dtype='object')

In [76]:
df_triangle.head()

Unnamed: 0_level_0,GRCODE,AccidentYear,DevelopmentYear,DevelopmentLag,cumulative_incurred,cumulative_paid,BulkLoss_D,EarnedPremDIR_D,EarnedPremCeded_D,EarnedPremNet_D,Single,PostedReserve97_D,paid_to_incurred
GRNAME,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
Allstate Ins Co Grp,86,1988,1988,1,367404,70571,127737,400699,5957,394742,0,281872,0.19208
Allstate Ins Co Grp,86,1988,1989,2,362988,155905,60173,400699,5957,394742,0,281872,0.429505
Allstate Ins Co Grp,86,1988,1990,3,347288,220744,27763,400699,5957,394742,0,281872,0.635622
Allstate Ins Co Grp,86,1988,1991,4,330648,251595,15280,400699,5957,394742,0,281872,0.760915
Allstate Ins Co Grp,86,1988,1992,5,354690,274156,27689,400699,5957,394742,0,281872,0.772945


In [77]:
df_triangle = df_triangle.set_index(['AccidentYear', 'DevelopmentYear'], append = True)
df_triangle.index

MultiIndex([('Allstate Ins Co Grp', 1988, 1988),
            ('Allstate Ins Co Grp', 1988, 1989),
            ('Allstate Ins Co Grp', 1988, 1990),
            ('Allstate Ins Co Grp', 1988, 1991),
            ('Allstate Ins Co Grp', 1988, 1992),
            ('Allstate Ins Co Grp', 1988, 1993),
            ('Allstate Ins Co Grp', 1988, 1994),
            ('Allstate Ins Co Grp', 1988, 1995),
            ('Allstate Ins Co Grp', 1988, 1996),
            ('Allstate Ins Co Grp', 1988, 1997),
            ...
            ( 'Tower Ins Co Of NY', 1997, 1997),
            ( 'Tower Ins Co Of NY', 1997, 1998),
            ( 'Tower Ins Co Of NY', 1997, 1999),
            ( 'Tower Ins Co Of NY', 1997, 2000),
            ( 'Tower Ins Co Of NY', 1997, 2001),
            ( 'Tower Ins Co Of NY', 1997, 2002),
            ( 'Tower Ins Co Of NY', 1997, 2003),
            ( 'Tower Ins Co Of NY', 1997, 2004),
            ( 'Tower Ins Co Of NY', 1997, 2005),
            ( 'Tower Ins Co Of NY', 1997, 2006)],
   

Columns used for the index disappear from the data frame's columns unless you explicitly tell pandas to keep them with `drop=False`.
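
A minimal sketch of the difference, using the `drop` argument of `set_index()` on a small made-up frame (`df_demo` is illustrative only):

```python
import pandas as pd

# Made-up data for illustration.
df_demo = pd.DataFrame({
    'GRNAME': ['A', 'A', 'B'],
    'AccidentYear': [1988, 1989, 1988],
    'cumulative_paid': [100, 90, 40],
})

# Default: the column moves into the index and leaves the columns.
moved = df_demo.set_index('GRNAME')
print(list(moved.columns))   # ['AccidentYear', 'cumulative_paid']

# drop=False: the column is used for the index and also kept as a column.
kept = df_demo.set_index('GRNAME', drop=False)
print(list(kept.columns))    # ['GRNAME', 'AccidentYear', 'cumulative_paid']
```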

In [78]:
df_triangle.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,GRCODE,DevelopmentLag,cumulative_incurred,cumulative_paid,BulkLoss_D,EarnedPremDIR_D,EarnedPremCeded_D,EarnedPremNet_D,Single,PostedReserve97_D,paid_to_incurred
GRNAME,AccidentYear,DevelopmentYear,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
Allstate Ins Co Grp,1988,1988,86,1,367404,70571,127737,400699,5957,394742,0,281872,0.19208
Allstate Ins Co Grp,1988,1989,86,2,362988,155905,60173,400699,5957,394742,0,281872,0.429505
Allstate Ins Co Grp,1988,1990,86,3,347288,220744,27763,400699,5957,394742,0,281872,0.635622
Allstate Ins Co Grp,1988,1991,86,4,330648,251595,15280,400699,5957,394742,0,281872,0.760915
Allstate Ins Co Grp,1988,1992,86,5,354690,274156,27689,400699,5957,394742,0,281872,0.772945


In [79]:
df_triangle['lag'] = df_triangle['DevelopmentYear'] - df_triangle['AccidentYear'] + 1
# DevelopmentYear is now an index level, not a column.


KeyError: 'DevelopmentYear'

A useful strategy is to carry out all of the non-indexed operations before creating the index.
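
If the index is already in place, another option is to read the levels back out with `index.get_level_values()`. A minimal sketch under that assumption, on made-up data:

```python
import pandas as pd

# Made-up data, indexed the same way as the failing example.
df_demo = pd.DataFrame({
    'AccidentYear': [1988, 1988, 1989],
    'DevelopmentYear': [1988, 1989, 1989],
    'cumulative_paid': [100, 150, 90],
}).set_index(['AccidentYear', 'DevelopmentYear'])

# Index levels are not columns, so read them from the index itself.
accident_year = df_demo.index.get_level_values('AccidentYear')
development_year = df_demo.index.get_level_values('DevelopmentYear')

# Convert to a plain array so the assignment is purely positional.
df_demo['lag'] = (development_year - accident_year + 1).to_numpy()
print(df_demo['lag'].tolist())
```

This avoids the `reset_index()` round trip, at the cost of slightly noisier code.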

In [81]:
df_triangle = df_triangle.reset_index()
df_triangle.index

RangeIndex(start=0, stop=13200, step=1)

In [83]:
df_triangle['lag'] = df_triangle['DevelopmentYear'] - df_triangle['AccidentYear'] + 1
df_triangle.lag
df_triangle['lagmo'] = (df_triangle['DevelopmentYear'] - df_triangle['AccidentYear'] + 1)*12

Those are the only derived columns we need for now, so we can go ahead and set an index.

In [85]:
df_triangle = df_triangle.set_index(['GRNAME', 'AccidentYear', 'lag'])

In [86]:
df_triangle.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,DevelopmentYear,GRCODE,DevelopmentLag,cumulative_incurred,cumulative_paid,BulkLoss_D,EarnedPremDIR_D,EarnedPremCeded_D,EarnedPremNet_D,Single,PostedReserve97_D,paid_to_incurred,lagmo
GRNAME,AccidentYear,lag,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
Allstate Ins Co Grp,1988,1,1988,86,1,367404,70571,127737,400699,5957,394742,0,281872,0.19208,12
Allstate Ins Co Grp,1988,2,1989,86,2,362988,155905,60173,400699,5957,394742,0,281872,0.429505,24
Allstate Ins Co Grp,1988,3,1990,86,3,347288,220744,27763,400699,5957,394742,0,281872,0.635622,36
Allstate Ins Co Grp,1988,4,1991,86,4,330648,251595,15280,400699,5957,394742,0,281872,0.760915,48
Allstate Ins Co Grp,1988,5,1992,86,5,354690,274156,27689,400699,5957,394742,0,281872,0.772945,60


# Querying a data frame

## Columnar subsets

In [87]:
df_triangle[['cumulative_paid', 'cumulative_incurred', 'paid_to_incurred']]

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,cumulative_paid,cumulative_incurred,paid_to_incurred
GRNAME,AccidentYear,lag,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Allstate Ins Co Grp,1988,1,70571,367404,0.192080
Allstate Ins Co Grp,1988,2,155905,362988,0.429505
Allstate Ins Co Grp,1988,3,220744,347288,0.635622
Allstate Ins Co Grp,1988,4,251595,330648,0.760915
Allstate Ins Co Grp,1988,5,274156,354690,0.772945
...,...,...,...,...,...
Tower Ins Co Of NY,1997,6,287,334,0.859281
Tower Ins Co Of NY,1997,7,293,318,0.921384
Tower Ins Co Of NY,1997,8,300,323,0.928793
Tower Ins Co Of NY,1997,9,297,310,0.958065


Pass in a list with the names of columns to return.

In [89]:
my_cols = ['cumulative_paid', 'cumulative_incurred']
df_triangle[my_cols]

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,cumulative_paid,cumulative_incurred
GRNAME,AccidentYear,lag,Unnamed: 3_level_1,Unnamed: 4_level_1
Allstate Ins Co Grp,1988,1,70571,367404
Allstate Ins Co Grp,1988,2,155905,362988
Allstate Ins Co Grp,1988,3,220744,347288
Allstate Ins Co Grp,1988,4,251595,330648
Allstate Ins Co Grp,1988,5,274156,354690
...,...,...,...,...
Tower Ins Co Of NY,1997,6,287,334
Tower Ins Co Of NY,1997,7,293,318
Tower Ins Co Of NY,1997,8,300,323
Tower Ins Co Of NY,1997,9,297,310


Create the list using a list comprehension.

In [90]:
# cumul_cols is a plain Python list of the matching column names
cumul_cols = [col for col in df_triangle.columns if 'cumul' in col]
cumul_cols

['cumulative_incurred', 'cumulative_paid']

In [91]:
df_triangle[cumul_cols]

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,cumulative_incurred,cumulative_paid
GRNAME,AccidentYear,lag,Unnamed: 3_level_1,Unnamed: 4_level_1
Allstate Ins Co Grp,1988,1,367404,70571
Allstate Ins Co Grp,1988,2,362988,155905
Allstate Ins Co Grp,1988,3,347288,220744
Allstate Ins Co Grp,1988,4,330648,251595
Allstate Ins Co Grp,1988,5,354690,274156
...,...,...,...,...
Tower Ins Co Of NY,1997,6,334,287
Tower Ins Co Of NY,1997,7,318,293
Tower Ins Co Of NY,1997,8,323,300
Tower Ins Co Of NY,1997,9,310,297


Use `iloc[]` to select by integer position.

In [None]:
print(df_triangle.iloc[:, 0])
print(df_triangle.iloc[:, :1])
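
The two calls look alike but return different types: a scalar position gives a Series, while a slice gives a one-column data frame. A minimal sketch on made-up data:

```python
import pandas as pd

# Made-up data for illustration.
df_demo = pd.DataFrame({
    'cumulative_paid': [100, 150],
    'cumulative_incurred': [200, 210],
})

first_as_series = df_demo.iloc[:, 0]    # scalar position -> Series
first_as_frame = df_demo.iloc[:, :1]    # slice -> one-column DataFrame

print(type(first_as_series).__name__)   # Series
print(type(first_as_frame).__name__)    # DataFrame
print(first_as_frame.shape)             # (2, 1)
```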

## Row-wise subsets

In [None]:
df_triangle[df_triangle['paid_to_incurred'] > 2]

In [None]:
df_triangle[df_triangle['paid_to_incurred'] > 2]['cumulative_paid']

In [None]:
df_triangle[df_triangle['paid_to_incurred'] > 2][['cumulative_paid']]

In [None]:
df_triangle[df_triangle['paid_to_incurred'] > 2].cumulative_paid

In [None]:
# Raises KeyError: a bare pair of names is treated as one tuple key, not a list of columns
df_triangle[df_triangle['paid_to_incurred'] > 2]['cumulative_paid', 'cumulative_incurred']

In [None]:
df_triangle[df_triangle['paid_to_incurred'] > 2][['cumulative_paid', 'cumulative_incurred']]

In [None]:
df_triangle.loc[df_triangle['paid_to_incurred'] > 2, ['cumulative_paid', 'cumulative_incurred']]

In [None]:
df_triangle.loc[df_triangle.index.get_level_values('AccidentYear') <= 1989]  # AccidentYear is an index level, not a column

We could use `filter()` here, but I'm not wild about that.

In [None]:
df_triangle.query('AccidentYear <= 1989 & lag == 1')['cumulative_paid']

In [None]:
df_triangle.query('AccidentYear <= 1989 & lag == 1')[['cumulative_paid']] #Joseph added
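
`query()` accepts named index levels as well as columns, and local Python variables can be referenced with an `@` prefix. A minimal sketch on made-up data (`cutoff_year` is an illustrative variable name):

```python
import pandas as pd

# Made-up data with AccidentYear and lag as index levels.
df_demo = pd.DataFrame({
    'AccidentYear': [1988, 1988, 1989, 1990],
    'lag': [1, 2, 1, 1],
    'cumulative_paid': [100, 150, 90, 80],
}).set_index(['AccidentYear', 'lag'])

cutoff_year = 1989  # local variable, referenced inside query() as @cutoff_year

# Named index levels can be used in the expression just like columns.
first_lag = df_demo.query('AccidentYear <= @cutoff_year and lag == 1')
print(first_lag['cumulative_paid'].tolist())
```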

In [112]:
df_upper = df_triangle.query('DevelopmentYear <= 1997') 
df_upper.shape

(7260, 13)

In [113]:
df_triangle.shape



(13200, 13)

In [114]:
df_upper.shape[0] / 55  # 55 cells per upper triangle (10 * 11 / 2), so this counts the companies

132.0

You may be tempted by the `filter()` method, but it selects rows or columns by their labels (index values or column names), not by the values in the data.
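
A minimal sketch of what `filter()` does and does not do, on made-up data:

```python
import pandas as pd

# Made-up data for illustration.
df_demo = pd.DataFrame({
    'cumulative_paid': [100, 150],
    'cumulative_incurred': [200, 210],
    'BulkLoss_D': [30, 20],
})

# filter() matches on labels (column names by default), never on cell values.
cumulative_only = df_demo.filter(like='cumulative')
print(list(cumulative_only.columns))

# Selecting rows by value still needs a boolean mask (or query()).
large_paid = df_demo[df_demo['cumulative_paid'] > 120]
print(len(large_paid))
```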

In [None]:
# df_triangle[1996].cumulative_paid

In [None]:
# df_triangle.loc(1996,'cumulative_paid') #Joseph added

# Reshaping

## Pivoting

In [101]:
make_float = lambda x: "${:,.2f}".format(x)

In [115]:
df_allstate = df_upper.query('GRNAME == "Allstate Ins Co Grp"')
df_allstate

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,DevelopmentYear,GRCODE,DevelopmentLag,cumulative_incurred,cumulative_paid,BulkLoss_D,EarnedPremDIR_D,EarnedPremCeded_D,EarnedPremNet_D,Single,PostedReserve97_D,paid_to_incurred,lagmo
GRNAME,AccidentYear,lag,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
Allstate Ins Co Grp,1988,1,1988,86,1,367404,70571,127737,400699,5957,394742,0,281872,0.19208,12
Allstate Ins Co Grp,1988,2,1989,86,2,362988,155905,60173,400699,5957,394742,0,281872,0.429505,24
Allstate Ins Co Grp,1988,3,1990,86,3,347288,220744,27763,400699,5957,394742,0,281872,0.635622,36
Allstate Ins Co Grp,1988,4,1991,86,4,330648,251595,15280,400699,5957,394742,0,281872,0.760915,48
Allstate Ins Co Grp,1988,5,1992,86,5,354690,274156,27689,400699,5957,394742,0,281872,0.772945,60
Allstate Ins Co Grp,1988,6,1993,86,6,350092,287676,20641,400699,5957,394742,0,281872,0.821715,72
Allstate Ins Co Grp,1988,7,1994,86,7,346808,298499,14513,400699,5957,394742,0,281872,0.860704,84
Allstate Ins Co Grp,1988,8,1995,86,8,349124,304873,15862,400699,5957,394742,0,281872,0.873251,96
Allstate Ins Co Grp,1988,9,1996,86,9,348157,321808,8974,400699,5957,394742,0,281872,0.924319,108
Allstate Ins Co Grp,1988,10,1997,86,10,347762,325322,8843,400699,5957,394742,0,281872,0.935473,120


`pivot_table(values, index, columns)`
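
One thing to keep in mind is that `pivot_table()` aggregates: if several rows share the same index/column pair, they are combined, by default with the mean. That is one reason to restrict the data to a single company before pivoting. A minimal sketch on made-up data with one deliberate duplicate:

```python
import pandas as pd

# Made-up data; the (1988, lag 1) cell appears twice on purpose.
df_demo = pd.DataFrame({
    'AccidentYear': [1988, 1988, 1989, 1988],
    'lag': [1, 2, 1, 1],
    'cumulative_paid': [100.0, 150.0, 90.0, 300.0],
})

# Default aggregation: duplicates are averaged.
wide_mean = df_demo.pivot_table(values='cumulative_paid',
                                index='AccidentYear', columns='lag')

# aggfunc='sum' adds the duplicates instead.
wide_sum = df_demo.pivot_table(values='cumulative_paid',
                               index='AccidentYear', columns='lag',
                               aggfunc='sum')

print(wide_mean.loc[1988, 1], wide_sum.loc[1988, 1])
```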

In [116]:
#df_allstate['cumulative_paid']
df_allstate.pivot_table(values='cumulative_paid',index = 'AccidentYear', columns = 'lag') #.apply(make_float)

lag,1,2,3,4,5,6,7,8,9,10
AccidentYear,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
1988,70571.0,155905.0,220744.0,251595.0,274156.0,287676.0,298499.0,304873.0,321808.0,325322.0
1989,66547.0,136447.0,179142.0,211343.0,231430.0,244750.0,254557.0,270059.0,273873.0,
1990,52233.0,133370.0,178444.0,204442.0,222193.0,232940.0,253337.0,256788.0,,
1991,59315.0,128051.0,169793.0,196685.0,213165.0,234676.0,239195.0,,,
1992,39991.0,89873.0,114117.0,133003.0,154362.0,159496.0,,,,
1993,19744.0,47229.0,61909.0,85099.0,87215.0,,,,,
1994,20379.0,46773.0,88636.0,91077.0,,,,,,
1995,18756.0,84712.0,87311.0,,,,,,,
1996,42609.0,44916.0,,,,,,,,
1997,691.0,,,,,,,,,


In [117]:
df_wide_paid = df_allstate.pivot_table('cumulative_paid', 'AccidentYear', 'lag')
df_wide_paid

lag,1,2,3,4,5,6,7,8,9,10
AccidentYear,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
1988,70571.0,155905.0,220744.0,251595.0,274156.0,287676.0,298499.0,304873.0,321808.0,325322.0
1989,66547.0,136447.0,179142.0,211343.0,231430.0,244750.0,254557.0,270059.0,273873.0,
1990,52233.0,133370.0,178444.0,204442.0,222193.0,232940.0,253337.0,256788.0,,
1991,59315.0,128051.0,169793.0,196685.0,213165.0,234676.0,239195.0,,,
1992,39991.0,89873.0,114117.0,133003.0,154362.0,159496.0,,,,
1993,19744.0,47229.0,61909.0,85099.0,87215.0,,,,,
1994,20379.0,46773.0,88636.0,91077.0,,,,,,
1995,18756.0,84712.0,87311.0,,,,,,,
1996,42609.0,44916.0,,,,,,,,
1997,691.0,,,,,,,,,


In [118]:
df_wide_paid.shape

(10, 10)

In [119]:
df_wide_paid.columns

Int64Index([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype='int64', name='lag')

In [120]:
df_wide_paid.index

Int64Index([1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997], dtype='int64', name='AccidentYear')

Let's also construct an incurred triangle.

In [121]:
df_wide_incurred = df_allstate.pivot_table('cumulative_incurred', 'AccidentYear', 'lag')
df_wide_incurred

lag,1,2,3,4,5,6,7,8,9,10
AccidentYear,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
1988,367404.0,362988.0,347288.0,330648.0,354690.0,350092.0,346808.0,349124.0,348157.0,347762.0
1989,336928.0,316483.0,278496.0,303033.0,299496.0,295061.0,299251.0,297492.0,300620.0,
1990,289198.0,311381.0,277980.0,277732.0,276563.0,278067.0,276704.0,281101.0,,
1991,297174.0,277209.0,269739.0,272666.0,271318.0,267578.0,269592.0,,,
1992,181796.0,205079.0,199106.0,187833.0,185663.0,184940.0,,,,
1993,114807.0,114774.0,101460.0,98430.0,96930.0,,,,,
1994,107934.0,107569.0,97730.0,96185.0,,,,,,
1995,100686.0,94456.0,92314.0,,,,,,,
1996,53381.0,51205.0,,,,,,,,
1997,6725.0,,,,,,,,,


## Unstack

`unstack()` behaves similarly to `pivot_table()`, but it pivots a level of the MultiIndex into columns rather than working from a regular column.
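
A minimal sketch of that relationship on made-up data: when every index/column pair is unique, both routes should give the same wide table.

```python
import numpy as np
import pandas as pd

# Made-up long-format data with unique (AccidentYear, lag) pairs.
df_demo = pd.DataFrame({
    'AccidentYear': [1988, 1988, 1989],
    'lag': [1, 2, 1],
    'cumulative_paid': [100.0, 150.0, 90.0],
})

# Route 1: pivot from regular columns.
via_pivot = df_demo.pivot_table('cumulative_paid', 'AccidentYear', 'lag')

# Route 2: put the keys in the index, then move 'lag' into the columns.
via_unstack = (
    df_demo.set_index(['AccidentYear', 'lag'])['cumulative_paid'].unstack('lag')
)

same_values = np.array_equal(via_pivot.to_numpy(), via_unstack.to_numpy(),
                             equal_nan=True)
print(same_values)
```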

In [126]:
df_allstate.drop(columns='lagmo', inplace=True)
df_allstate.unstack()  # by default, unstacks the innermost (right-most) index level

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


Unnamed: 0_level_0,Unnamed: 1_level_0,DevelopmentYear,DevelopmentYear,DevelopmentYear,DevelopmentYear,DevelopmentYear,DevelopmentYear,DevelopmentYear,DevelopmentYear,DevelopmentYear,DevelopmentYear,...,paid_to_incurred,paid_to_incurred,paid_to_incurred,paid_to_incurred,paid_to_incurred,paid_to_incurred,paid_to_incurred,paid_to_incurred,paid_to_incurred,paid_to_incurred
Unnamed: 0_level_1,lag,1,2,3,4,5,6,7,8,9,10,...,1,2,3,4,5,6,7,8,9,10
GRNAME,AccidentYear,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2,Unnamed: 22_level_2
Allstate Ins Co Grp,1988,1988.0,1989.0,1990.0,1991.0,1992.0,1993.0,1994.0,1995.0,1996.0,1997.0,...,0.19208,0.429505,0.635622,0.760915,0.772945,0.821715,0.860704,0.873251,0.924319,0.935473
Allstate Ins Co Grp,1989,1989.0,1990.0,1991.0,1992.0,1993.0,1994.0,1995.0,1996.0,1997.0,,...,0.197511,0.431135,0.643248,0.697426,0.772732,0.829489,0.850647,0.907786,0.911027,
Allstate Ins Co Grp,1990,1990.0,1991.0,1992.0,1993.0,1994.0,1995.0,1996.0,1997.0,,,...,0.180613,0.428318,0.641931,0.736113,0.803408,0.837712,0.915552,0.913508,,
Allstate Ins Co Grp,1991,1991.0,1992.0,1993.0,1994.0,1995.0,1996.0,1997.0,,,,...,0.199597,0.461929,0.629471,0.72134,0.785665,0.877038,0.887248,,,
Allstate Ins Co Grp,1992,1992.0,1993.0,1994.0,1995.0,1996.0,1997.0,,,,,...,0.219977,0.438236,0.573147,0.708092,0.83141,0.86242,,,,
Allstate Ins Co Grp,1993,1993.0,1994.0,1995.0,1996.0,1997.0,,,,,,...,0.171976,0.411496,0.610181,0.864564,0.899773,,,,,
Allstate Ins Co Grp,1994,1994.0,1995.0,1996.0,1997.0,,,,,,,...,0.18881,0.434819,0.906948,0.946894,,,,,,
Allstate Ins Co Grp,1995,1995.0,1996.0,1997.0,,,,,,,,...,0.186282,0.896841,0.945805,,,,,,,
Allstate Ins Co Grp,1996,1996.0,1997.0,,,,,,,,,...,0.798205,0.87718,,,,,,,,
Allstate Ins Co Grp,1997,1997.0,,,,,,,,,,...,0.102751,,,,,,,,,


In [127]:
df_allstate[['cumulative_paid', 'cumulative_incurred']].unstack()

Unnamed: 0_level_0,Unnamed: 1_level_0,cumulative_paid,cumulative_paid,cumulative_paid,cumulative_paid,cumulative_paid,cumulative_paid,cumulative_paid,cumulative_paid,cumulative_paid,cumulative_paid,cumulative_incurred,cumulative_incurred,cumulative_incurred,cumulative_incurred,cumulative_incurred,cumulative_incurred,cumulative_incurred,cumulative_incurred,cumulative_incurred,cumulative_incurred
Unnamed: 0_level_1,lag,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10
GRNAME,AccidentYear,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2
Allstate Ins Co Grp,1988,70571.0,155905.0,220744.0,251595.0,274156.0,287676.0,298499.0,304873.0,321808.0,325322.0,367404.0,362988.0,347288.0,330648.0,354690.0,350092.0,346808.0,349124.0,348157.0,347762.0
Allstate Ins Co Grp,1989,66547.0,136447.0,179142.0,211343.0,231430.0,244750.0,254557.0,270059.0,273873.0,,336928.0,316483.0,278496.0,303033.0,299496.0,295061.0,299251.0,297492.0,300620.0,
Allstate Ins Co Grp,1990,52233.0,133370.0,178444.0,204442.0,222193.0,232940.0,253337.0,256788.0,,,289198.0,311381.0,277980.0,277732.0,276563.0,278067.0,276704.0,281101.0,,
Allstate Ins Co Grp,1991,59315.0,128051.0,169793.0,196685.0,213165.0,234676.0,239195.0,,,,297174.0,277209.0,269739.0,272666.0,271318.0,267578.0,269592.0,,,
Allstate Ins Co Grp,1992,39991.0,89873.0,114117.0,133003.0,154362.0,159496.0,,,,,181796.0,205079.0,199106.0,187833.0,185663.0,184940.0,,,,
Allstate Ins Co Grp,1993,19744.0,47229.0,61909.0,85099.0,87215.0,,,,,,114807.0,114774.0,101460.0,98430.0,96930.0,,,,,
Allstate Ins Co Grp,1994,20379.0,46773.0,88636.0,91077.0,,,,,,,107934.0,107569.0,97730.0,96185.0,,,,,,
Allstate Ins Co Grp,1995,18756.0,84712.0,87311.0,,,,,,,,100686.0,94456.0,92314.0,,,,,,,
Allstate Ins Co Grp,1996,42609.0,44916.0,,,,,,,,,53381.0,51205.0,,,,,,,,
Allstate Ins Co Grp,1997,691.0,,,,,,,,,,6725.0,,,,,,,,,


## Stack

In [128]:
df_wide_paid.stack()

AccidentYear  lag
1988          1       70571.0
              2      155905.0
              3      220744.0
              4      251595.0
              5      274156.0
              6      287676.0
              7      298499.0
              8      304873.0
              9      321808.0
              10     325322.0
1989          1       66547.0
              2      136447.0
              3      179142.0
              4      211343.0
              5      231430.0
              6      244750.0
              7      254557.0
              8      270059.0
              9      273873.0
1990          1       52233.0
              2      133370.0
              3      178444.0
              4      204442.0
              5      222193.0
              6      232940.0
              7      253337.0
              8      256788.0
1991          1       59315.0
              2      128051.0
              3      169793.0
              4      196685.0
              5      213165.0
              6      2

Notice that we dropped the NA values. We can keep them if we like.
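
Depending on the pandas version, `stack()` may or may not discard missing cells by default, so a version-independent sketch calls `dropna()` explicitly. The small `wide` frame here is made up for illustration:

```python
import numpy as np
import pandas as pd

# Made-up 2x2 triangle with one empty cell in the lower right.
wide = pd.DataFrame(
    [[100.0, 150.0], [90.0, np.nan]],
    index=pd.Index([1988, 1989], name='AccidentYear'),
    columns=pd.Index([1, 2], name='lag'),
)

# An explicit dropna() gives the same long result whether or not
# stack() itself discards the missing cell.
long_paid = wide.stack().dropna()
print(long_paid.index.tolist())
```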

In [129]:
df_wide_paid.stack(dropna = False)

AccidentYear  lag
1988          1       70571.0
              2      155905.0
              3      220744.0
              4      251595.0
              5      274156.0
                       ...   
1997          6           NaN
              7           NaN
              8           NaN
              9           NaN
              10          NaN
Length: 100, dtype: float64

In [130]:
df_long_paid = df_wide_paid.stack()
df_long_paid

AccidentYear  lag
1988          1       70571.0
              2      155905.0
              3      220744.0
              4      251595.0
              5      274156.0
              6      287676.0
              7      298499.0
              8      304873.0
              9      321808.0
              10     325322.0
1989          1       66547.0
              2      136447.0
              3      179142.0
              4      211343.0
              5      231430.0
              6      244750.0
              7      254557.0
              8      270059.0
              9      273873.0
1990          1       52233.0
              2      133370.0
              3      178444.0
              4      204442.0
              5      222193.0
              6      232940.0
              7      253337.0
              8      256788.0
1991          1       59315.0
              2      128051.0
              3      169793.0
              4      196685.0
              5      213165.0
              6      2

In [131]:
df_long_paid = df_wide_paid.stack().to_frame()
df_long_paid.columns = ['cumulative_paid']

In [132]:
df_long_incurred = df_wide_incurred.stack().to_frame()
df_long_incurred.columns = ['cumulative_incurred']

In [133]:
df_long_paid

Unnamed: 0_level_0,Unnamed: 1_level_0,cumulative_paid
AccidentYear,lag,Unnamed: 2_level_1
1988,1,70571.0
1988,2,155905.0
1988,3,220744.0
1988,4,251595.0
1988,5,274156.0
1988,6,287676.0
1988,7,298499.0
1988,8,304873.0
1988,9,321808.0
1988,10,325322.0


In [134]:
df_long_incurred

Unnamed: 0_level_0,Unnamed: 1_level_0,cumulative_incurred
AccidentYear,lag,Unnamed: 2_level_1
1988,1,367404.0
1988,2,362988.0
1988,3,347288.0
1988,4,330648.0
1988,5,354690.0
1988,6,350092.0
1988,7,346808.0
1988,8,349124.0
1988,9,348157.0
1988,10,347762.0


## Merge two data frames

In [None]:
df_new = pd.merge(df_long_paid, df_long_incurred)  # Fails: the frames share no columns, so merging on the index must be requested explicitly

In [135]:
df_new = pd.merge(df_long_paid, df_long_incurred, left_index = True, right_index = True)
df_new

Unnamed: 0_level_0,Unnamed: 1_level_0,cumulative_paid,cumulative_incurred
AccidentYear,lag,Unnamed: 2_level_1,Unnamed: 3_level_1
1988,1,70571.0,367404.0
1988,2,155905.0,362988.0
1988,3,220744.0,347288.0
1988,4,251595.0,330648.0
1988,5,274156.0,354690.0
1988,6,287676.0,350092.0
1988,7,298499.0,346808.0
1988,8,304873.0,349124.0
1988,9,321808.0,348157.0
1988,10,325322.0,347762.0
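
When both frames share the same index, `DataFrame.join()` is a shorter way to write the same thing, since it aligns on the index by default. A minimal sketch on made-up data (`paid` and `incurred` are illustrative stand-ins for the long frames):

```python
import pandas as pd

# Made-up long-format frames sharing one MultiIndex.
idx = pd.MultiIndex.from_tuples(
    [(1988, 1), (1988, 2), (1989, 1)], names=['AccidentYear', 'lag'])

paid = pd.DataFrame({'cumulative_paid': [100.0, 150.0, 90.0]}, index=idx)
incurred = pd.DataFrame({'cumulative_incurred': [200.0, 210.0, 180.0]}, index=idx)

# join() aligns on the index without extra arguments.
joined = paid.join(incurred)

# The merge() spelling needs the index requested explicitly.
merged = pd.merge(paid, incurred, left_index=True, right_index=True)

print(list(joined.columns), merged.shape)
```

Note that `join()` defaults to a left join while `merge()` defaults to an inner join; the two only coincide here because the indexes match exactly.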


# Group-wise operations

In [136]:
df_allstate['cumulative_paid'].shift()

GRNAME               AccidentYear  lag
Allstate Ins Co Grp  1988          1           NaN
                                   2       70571.0
                                   3      155905.0
                                   4      220744.0
                                   5      251595.0
                                   6      274156.0
                                   7      287676.0
                                   8      298499.0
                                   9      304873.0
                                   10     321808.0
                     1989          1      325322.0
                                   2       66547.0
                                   3      136447.0
                                   4      179142.0
                                   5      211343.0
                                   6      231430.0
                                   7      244750.0
                                   8      254557.0
                                   9      2

In [137]:
df_allstate['prior_cumulative_paid'] = df_allstate['cumulative_paid'].shift()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


In [138]:
df_allstate[['cumulative_paid', 'prior_cumulative_paid']].head(15)

GRNAME,AccidentYear,lag,cumulative_paid,prior_cumulative_paid
Allstate Ins Co Grp,1988,1,70571,
Allstate Ins Co Grp,1988,2,155905,70571.0
Allstate Ins Co Grp,1988,3,220744,155905.0
Allstate Ins Co Grp,1988,4,251595,220744.0
Allstate Ins Co Grp,1988,5,274156,251595.0
Allstate Ins Co Grp,1988,6,287676,274156.0
Allstate Ins Co Grp,1988,7,298499,287676.0
Allstate Ins Co Grp,1988,8,304873,298499.0
Allstate Ins Co Grp,1988,9,321808,304873.0
Allstate Ins Co Grp,1988,10,325322,321808.0


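We have a problem. The shift runs straight across accident years, so the prior value for 1989, lag 1 is 325,322, which is the lag 10 value for 1988; it should be missing. We need to shift within each accident year, which means grouping by `AccidentYear` first.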
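
The difference is easy to see on a tiny made-up series (the values below are illustrative only). A plain `shift` carries the last value of one accident year into the first lag of the next, while a grouped `shift` restarts for each year.

```python
import pandas as pd

idx = pd.MultiIndex.from_product([[1988, 1989], [1, 2, 3]], names=['AccidentYear', 'lag'])
paid = pd.Series([100.0, 180.0, 200.0, 110.0, 190.0, 215.0], index=idx, name='cumulative_paid')

naive = paid.shift()                                   # ignores the accident-year boundary
grouped = paid.groupby(level='AccidentYear').shift()   # shifts within each accident year

print(naive.loc[(1989, 1)])    # 200.0, leaked from 1988, lag 3
print(grouped.loc[(1989, 1)])  # nan, as it should be
```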

In [139]:
df_allstate['prior_cumulative_paid'] = df_allstate['cumulative_paid'].groupby(
    level='AccidentYear'
).shift(1)  # shift within each accident year

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


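Pay attention to this warning: pandas is telling us that `df_allstate` may be a slice of another data frame, so it cannot be sure which object the new column should land on. Here the assignment does what we want, so there is nothing to worry about in this case.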
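
One common way to keep this kind of assignment unambiguous is to take an explicit copy when the subset is created. A minimal sketch, assuming the subset comes from filtering a larger frame (the names `df_all` and `df_a` and the values are made up for illustration):

```python
import pandas as pd

df_all = pd.DataFrame({
    'GRNAME': ['A', 'A', 'B'],
    'cumulative_paid': [100.0, 180.0, 90.0],
})

# .copy() makes the subset an independent frame, so adding a column
# clearly affects df_a only
df_a = df_all[df_all['GRNAME'] == 'A'].copy()
df_a['prior_cumulative_paid'] = df_a['cumulative_paid'].shift()

print('prior_cumulative_paid' in df_a.columns)    # True
print('prior_cumulative_paid' in df_all.columns)  # False
```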

In [140]:
df_allstate[['cumulative_paid', 'prior_cumulative_paid']].head(15)

GRNAME,AccidentYear,lag,cumulative_paid,prior_cumulative_paid
Allstate Ins Co Grp,1988,1,70571,
Allstate Ins Co Grp,1988,2,155905,70571.0
Allstate Ins Co Grp,1988,3,220744,155905.0
Allstate Ins Co Grp,1988,4,251595,220744.0
Allstate Ins Co Grp,1988,5,274156,251595.0
Allstate Ins Co Grp,1988,6,287676,274156.0
Allstate Ins Co Grp,1988,7,298499,287676.0
Allstate Ins Co Grp,1988,8,304873,298499.0
Allstate Ins Co Grp,1988,9,321808,304873.0
Allstate Ins Co Grp,1988,10,325322,321808.0


Do the same for the incurred losses.

In [141]:
df_allstate['prior_cumulative_incurred'] = df_allstate['cumulative_incurred'].groupby(
    level='AccidentYear'
).shift(1)  # shift within each accident year

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


In [143]:
cols = list(df_allstate.columns)
cols
#df_allstate[cols]

## Make some link ratios

In [144]:
df_allstate['paid_ldf'] = df_allstate['cumulative_paid'] / df_allstate['prior_cumulative_paid']
df_allstate['incurred_ldf'] = df_allstate['cumulative_incurred'] / df_allstate['prior_cumulative_incurred']

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


In [146]:
df_allstate[['cumulative_paid','paid_ldf', 'prior_cumulative_paid', 'incurred_ldf']]

GRNAME,AccidentYear,lag,cumulative_paid,paid_ldf,prior_cumulative_paid,incurred_ldf
Allstate Ins Co Grp,1988,1,70571,,,
Allstate Ins Co Grp,1988,2,155905,2.209194,70571.0,0.987981
Allstate Ins Co Grp,1988,3,220744,1.415888,155905.0,0.956748
Allstate Ins Co Grp,1988,4,251595,1.139759,220744.0,0.952086
Allstate Ins Co Grp,1988,5,274156,1.089672,251595.0,1.072712
Allstate Ins Co Grp,1988,6,287676,1.049315,274156.0,0.987037
Allstate Ins Co Grp,1988,7,298499,1.037622,287676.0,0.99062
Allstate Ins Co Grp,1988,8,304873,1.021354,298499.0,1.006678
Allstate Ins Co Grp,1988,9,321808,1.055548,304873.0,0.99723
Allstate Ins Co Grp,1988,10,325322,1.01092,321808.0,0.998865


In [147]:
df_allstate.pivot_table(index = 'AccidentYear', columns = 'lag', values = 'paid_ldf').round(3)

AccidentYear / lag,2,3,4,5,6,7,8,9,10
1988,2.209,1.416,1.14,1.09,1.049,1.038,1.021,1.056,1.011
1989,2.05,1.313,1.18,1.095,1.058,1.04,1.061,1.014,
1990,2.553,1.338,1.146,1.087,1.048,1.088,1.014,,
1991,2.159,1.326,1.158,1.084,1.101,1.019,,,
1992,2.247,1.27,1.165,1.161,1.033,,,,
1993,2.392,1.311,1.375,1.025,,,,,
1994,2.295,1.895,1.028,,,,,,
1995,4.517,1.031,,,,,,,
1996,1.054,,,,,,,,


In [148]:
df_allstate.pivot_table(index = 'AccidentYear', columns = 'lag', values = 'paid_ldf').round(3).fillna("")

AccidentYear / lag,2,3,4,5,6,7,8,9,10
1988,2.209,1.416,1.14,1.09,1.049,1.038,1.021,1.056,1.011
1989,2.05,1.313,1.18,1.095,1.058,1.04,1.061,1.014,
1990,2.553,1.338,1.146,1.087,1.048,1.088,1.014,,
1991,2.159,1.326,1.158,1.084,1.101,1.019,,,
1992,2.247,1.27,1.165,1.161,1.033,,,,
1993,2.392,1.311,1.375,1.025,,,,,
1994,2.295,1.895,1.028,,,,,,
1995,4.517,1.031,,,,,,,
1996,1.054,,,,,,,,


## Weighted average link ratios

In [149]:
cumul_cols = [col_name for col_name in df_allstate.columns if 'cumulative' in col_name]  # all cumulative columns, including the priors
df_links = df_allstate.groupby(level='lag')[cumul_cols].sum()  # total each column by lag
df_links

lag,cumulative_incurred,cumulative_paid,prior_cumulative_paid,prior_cumulative_incurred
1,1856033,390836,0.0,0.0
2,1841144,867276,390145.0,1849308.0
3,1664113,1100096,822360.0,1789939.0
4,1566527,1173244,1012785.0,1571799.0
5,1484660,1182521,1082167.0,1470342.0
6,1375738,1159538,1095306.0,1387730.0
7,1192355,1045588,1000042.0,1190798.0
8,927717,831720,806393.0,922763.0
9,648777,595681,574932.0,646616.0
10,347762,325322,321808.0,348157.0


In [150]:
df_links['paid_ata'] = df_links.cumulative_paid / df_links.prior_cumulative_paid
df_links['incurred_ata'] = df_links.cumulative_incurred / df_links.prior_cumulative_incurred
df_links

lag,cumulative_incurred,cumulative_paid,prior_cumulative_paid,prior_cumulative_incurred,paid_ata,incurred_ata
1,1856033,390836,0.0,0.0,inf,inf
2,1841144,867276,390145.0,1849308.0,2.222958,0.995585
3,1664113,1100096,822360.0,1789939.0,1.33773,0.929704
4,1566527,1173244,1012785.0,1571799.0,1.158433,0.996646
5,1484660,1182521,1082167.0,1470342.0,1.092734,1.009738
6,1375738,1159538,1095306.0,1387730.0,1.058643,0.991359
7,1192355,1045588,1000042.0,1190798.0,1.045544,1.001308
8,927717,831720,806393.0,922763.0,1.031408,1.005369
9,648777,595681,574932.0,646616.0,1.036089,1.003342
10,347762,325322,321808.0,348157.0,1.01092,0.998865
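
Dividing the summed losses by the summed prior losses gives a volume-weighted average: larger accident years count for more than smaller ones. A minimal sketch on a made-up triangle (all values are illustrative) compares it with the simple average of the individual link ratios. It drops the rows without a prior value first, which is an alternative to filtering out lag 1 afterwards.

```python
import pandas as pd

df = pd.DataFrame({
    'AccidentYear': [1988, 1988, 1988, 1989, 1989, 1990],
    'lag': [1, 2, 3, 1, 2, 1],
    'cumulative_paid': [100.0, 200.0, 220.0, 300.0, 450.0, 50.0],
}).set_index(['AccidentYear', 'lag'])

# Prior value within each accident year; lag 1 has none
df['prior_cumulative_paid'] = df['cumulative_paid'].groupby(level='AccidentYear').shift()
has_prior = df.dropna(subset=['prior_cumulative_paid'])

# Weighted average: ratio of sums
sums = has_prior.groupby(level='lag').sum()
weighted = sums['cumulative_paid'] / sums['prior_cumulative_paid']

# Simple average: mean of the individual ratios
ratios = has_prior['cumulative_paid'] / has_prior['prior_cumulative_paid']
simple = ratios.groupby(level='lag').mean()

print(weighted.loc[2])  # 1.625 = (200 + 450) / (100 + 300)
print(simple.loc[2])    # 1.75  = (2.0 + 1.5) / 2
```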


In [151]:
df_links = df_links.query('lag > 1')  # lag 1 has no prior value, so it has no link ratio
df_links

lag,cumulative_incurred,cumulative_paid,prior_cumulative_paid,prior_cumulative_incurred,paid_ata,incurred_ata
2,1841144,867276,390145.0,1849308.0,2.222958,0.995585
3,1664113,1100096,822360.0,1789939.0,1.33773,0.929704
4,1566527,1173244,1012785.0,1571799.0,1.158433,0.996646
5,1484660,1182521,1082167.0,1470342.0,1.092734,1.009738
6,1375738,1159538,1095306.0,1387730.0,1.058643,0.991359
7,1192355,1045588,1000042.0,1190798.0,1.045544,1.001308
8,927717,831720,806393.0,922763.0,1.031408,1.005369
9,648777,595681,574932.0,646616.0,1.036089,1.003342
10,347762,325322,321808.0,348157.0,1.01092,0.998865


In [None]:
df_links.paid_ata.cumprod()  # Problem: this accumulates from lag 2 forward; age-to-ultimate needs the product from the last lag backward

In [152]:
df_links.paid_ata[::-1].cumprod()  # Reverse, then take the cumulative product; each value keeps its own lag label

lag
10    1.010920
9     1.047403
8     1.080300
7     1.129501
6     1.195738
5     1.306624
4     1.513637
3     2.024839
2     4.501131
Name: paid_ata, dtype: float64
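
The same idea on three made-up age-to-age factors (illustrative values only): the forward product measures development so far, while the reversed product measures development still to come. The sketch uses `.iloc[::-1]`, which makes the positional reversal explicit.

```python
import pandas as pd

ata = pd.Series([2.0, 1.5, 1.1], index=pd.Index([2, 3, 4], name='lag'), name='paid_ata')

forward = ata.cumprod()              # accumulates from the first lag: not what we want
atu = ata.iloc[::-1].cumprod()       # accumulates from the last lag back to the first
atu = atu.sort_index()               # put the lags back in ascending order

print(atu)  # roughly 3.3 at lag 2, 1.65 at lag 3, 1.1 at lag 4
```

Because each value keeps its lag label, assigning the reversed result to a column of a frame indexed by lag lines up correctly even without the `sort_index` step.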

In [153]:
df_links['paid_atu'] = df_links.paid_ata[::-1].cumprod()
df_links['incurred_atu'] = df_links.incurred_ata[::-1].cumprod()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


In [None]:
df_links

In [None]:
df_links = df_links[['paid_atu', 'incurred_atu']]
df_links

In [None]:
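df_links = df_links.reset_index()  # Move lag out of the index so we can adjust it
df_links.lag = df_links.lag - 1    # The factor computed at lag k applies to losses at age k - 1
df_links = df_links.set_index('lag')
df_links
# Alternatively, skip the reset: df_links.index = df_links.index - 1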

In [None]:
df_ultimate = df_allstate.query('DevelopmentYear == 1997')  # latest diagonal: the most recent evaluation of each accident year
df_ultimate

In [None]:
df_ultimate = pd.merge(df_ultimate, df_links, left_index = True, right_index = True)
df_ultimate['ult_paid'] = df_ultimate.cumulative_paid * df_ultimate.paid_atu
df_ultimate['ult_incurred'] = df_ultimate.cumulative_incurred * df_ultimate.incurred_atu
df_ultimate[['ult_paid', 'ult_incurred']]

In [None]:
df_ultimate[['ult_paid', 'paid_atu']]

In [None]:
df_ultimate[[ 'ult_incurred', 'incurred_atu']]
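
Here is a minimal sketch of the projection step on made-up numbers, using `map` to look up the factor for each age instead of a merge. The factors and losses are hypothetical. Treating ages beyond the last available factor as fully developed (factor 1.0) is an assumption of this sketch, not something taken from the data; note that a default (inner) merge would drop such rows instead.

```python
import pandas as pd

# Hypothetical age-to-ultimate factors, indexed by the age (lag) they apply to
paid_atu = pd.Series([3.3, 1.65, 1.1], index=pd.Index([1, 2, 3], name='lag'))

# Hypothetical latest diagonal: one row per accident year
latest = pd.DataFrame({
    'AccidentYear': [1994, 1995, 1996, 1997],
    'lag': [4, 3, 2, 1],
    'cumulative_paid': [230.0, 220.0, 450.0, 50.0],
})

# ASSUMPTION: ages with no factor (here lag 4) are fully developed
latest['paid_atu'] = latest['lag'].map(paid_atu).fillna(1.0)
latest['ult_paid'] = latest['cumulative_paid'] * latest['paid_atu']

print(latest[['AccidentYear', 'paid_atu', 'ult_paid']])
```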

# Saving our work

In [None]:
import os
data = os.getcwd()  # current working directory
data

In [None]:
!dir

In [None]:
df_ultimate.to_csv(os.path.join(data, 'df_ultimate.csv'))
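
When a frame with a multi-level index is written to CSV, the index levels become ordinary leading columns, so they need to be named again when reading the file back. A minimal round-trip sketch, assuming a throwaway temporary directory and a hypothetical file name:

```python
import tempfile
from pathlib import Path

import pandas as pd

idx = pd.MultiIndex.from_tuples([(1995, 3), (1996, 2)], names=['AccidentYear', 'lag'])
df_out = pd.DataFrame({'ult_paid': [242.0, 742.5]}, index=idx)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'df_ultimate_demo.csv'   # hypothetical file name
    df_out.to_csv(path)                          # index levels are written as the first columns
    # Tell read_csv which columns to turn back into the index
    df_back = pd.read_csv(path, index_col=['AccidentYear', 'lag'])

print(df_back)
```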

# If there's time

## Recap on generator objects and iteration
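
As a starting point for the recap, here is a minimal sketch of a generator function and of iterating over a `groupby`, which yields one `(key, group)` pair at a time. The function `link_ratios` and the small frame are made up for illustration.

```python
import pandas as pd


def link_ratios(values):
    """Yield each value divided by the one before it, one ratio at a time."""
    previous = None
    for current in values:
        if previous is not None:
            yield current / previous
        previous = current


gen = link_ratios([100.0, 200.0, 220.0])
first = next(gen)        # ask for one value: 200 / 100
rest = list(gen)         # consume whatever is left
leftover = list(gen)     # the generator is now exhausted, so this is empty

# Iterating over a groupby gives (key, sub-frame) pairs
df = pd.DataFrame({'company': ['A', 'A', 'B'], 'paid': [1.0, 2.0, 5.0]})
totals = {name: group['paid'].sum() for name, group in df.groupby('company')}
print(first, rest, leftover, totals)
```

The same loop pattern over `groupby` is one way to repeat a calculation company by company.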

# Homework

1. Repeat the construction of LDFs for every company. 
2. Which company had the most significant difference between paid and incurred ultimate estimates?
3. Which company had the largest case reserves? In which cell can you find this?