<a href="https://colab.research.google.com/github/AndreyDyachkov/Data_analytics_with_Python_2/blob/main/Analysis%20of%20an%20unusual%20rise%20in%20GPE/GPE_analysis_of_anomalies.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Analysis of an unusual rise in GPE (gross premium earned) in a portfolio subsegment in Jan 2023

(All sensitive data were excluded)

Goal: Find out the reason for an abnormal increase in GPE in Jan 2023 and the following plummet in Feb 2023.

Plan:
1. Get the list of clients with a peak in GPE in Jan 2023.
2. Get the list of clients with a decline in GPE in Feb 2023.
3. Get details for those clients.

### Conclusion

We found a pull of clients with a significant increase in GPE in Jan 2023 and a decrease in Feb 2023.
<list of clients>

#### Imports

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/frequency_analysis/data/data_GPE.csv')
df.head()

Unnamed: 0,HOLDING,MedOrg.Company_name,CONTRCODE,BEGDATE,CloseEndDate,GPE,YearMonthnumber
0,group19,Company25,AB-C037-0000788,06/09/2022 00:00,05/09/2023 00:00,7610496,2022/10
1,group19,Company25,AB-C037-0000788,06/09/2022 00:00,05/09/2023 00:00,7610496,2022/12
2,group19,Company25,AB-C037-0000788,06/09/2022 00:00,05/09/2023 00:00,7610496,2023/01
3,group19,Company25,AB-C037-0000788,06/09/2022 00:00,05/09/2023 00:00,7364996,2022/11
4,group19,Company25,AB-C037-0000455,06/09/2021 00:00,05/09/2022 00:00,7249291,2022/03


#### Data processing

Change column names

In [3]:
df.columns = ['holding','company','contract_id','start_date','end_date','gpe','year_month']

In [4]:
df.shape

(1647, 7)

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1647 entries, 0 to 1646
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   holding      1385 non-null   object
 1   company      1647 non-null   object
 2   contract_id  1647 non-null   object
 3   start_date   1647 non-null   object
 4   end_date     1647 non-null   object
 5   gpe          1647 non-null   int64 
 6   year_month   1647 non-null   object
dtypes: int64(1), object(6)
memory usage: 90.2+ KB


In [6]:
df.duplicated().sum()

0

In [7]:
df.isnull().mean().sort_values(ascending=False)

holding        0.159077
company        0.000000
contract_id    0.000000
start_date     0.000000
end_date       0.000000
gpe            0.000000
year_month     0.000000
dtype: float64

In [8]:
df['company'].value_counts().head(50)

Company26     34
Company27     32
Company25     26
Company140    26
Company145    24
Company119    24
Company14     23
Company29     19
Company115    17
Company113    17
Company144    13
Company126    13
Company20     13
Company136    13
Company103    13
Company125    13
Company106    13
Company88     13
Company74     13
Company48     13
Company104    13
Company75     13
Company78     13
Company50     13
Company107    13
Company19     13
Company133    13
Company22     13
Company44     13
Company39     13
Company24     13
Company112    13
Company15     13
Company33     13
Company91     13
Company143    13
Company141    13
Company37     13
Company18     13
Company118    13
Company63     13
Company4      13
Company46     13
Company101    13
Company21     13
Company109    13
Company60     13
Company110    13
Company93     13
Company85     13
Name: company, dtype: int64

Cleaning column names

In [25]:
df['company'] = df['company'].str.lower()
df['company'] = df['company'].str.strip()

Replace empty holdings with company names

In [10]:
df['holding'] = df['holding'].fillna(df['company'])

Fix data types

In [11]:
df['start_date'] = pd.to_datetime(df['start_date'])
df['end_date'] = pd.to_datetime(df['end_date'])

Pivot the table to look at the change in GPE over months

In [12]:
df_pivoted = df.pivot_table(index=['holding','company'], values='gpe', aggfunc = 'sum', columns='year_month').reset_index()
df_pivoted.drop(['2022/03','2022/04','2022/05','2022/06','2022/07','2022/08','2022/09'],axis=1, inplace=True)
df_pivoted

year_month,holding,company,2022/10,2022/11,2022/12,2023/01,2023/02
0,company10,company10,3189650.0,3086758.0,3189650.0,3406774.0,3077086.0
1,company100,company100,27532.0,26644.0,27532.0,27532.0,24868.0
2,company101,company101,1072695.0,1038092.0,449840.0,,
3,company107,company107,37352.0,36147.0,37352.0,35240.0,30258.0
4,company11,company11,,,,,
...,...,...,...,...,...,...,...
148,group8,company98,11239.0,10876.0,11239.0,11239.0,10151.0
149,group9,company34,6489.0,6279.0,6489.0,6489.0,5861.0
150,group9,company62,3346.0,3238.0,3346.0,3346.0,3022.0
151,group9,company86,9228.0,8930.0,9228.0,9228.0,8335.0


Fill missing values with 0

In [13]:
df_pivoted = df_pivoted.fillna(0)
df_pivoted

year_month,holding,company,2022/10,2022/11,2022/12,2023/01,2023/02
0,company10,company10,3189650.0,3086758.0,3189650.0,3406774.0,3077086.0
1,company100,company100,27532.0,26644.0,27532.0,27532.0,24868.0
2,company101,company101,1072695.0,1038092.0,449840.0,0.0,0.0
3,company107,company107,37352.0,36147.0,37352.0,35240.0,30258.0
4,company11,company11,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...
148,group8,company98,11239.0,10876.0,11239.0,11239.0,10151.0
149,group9,company34,6489.0,6279.0,6489.0,6489.0,5861.0
150,group9,company62,3346.0,3238.0,3346.0,3346.0,3022.0
151,group9,company86,9228.0,8930.0,9228.0,9228.0,8335.0


Total GPE by month

In [14]:
df_pivoted.sum(numeric_only=True)

year_month
2022/10    64570379.0
2022/11    62425471.0
2022/12    65090268.0
2023/01    69807554.0
2023/02    60357221.0
dtype: float64

Let's calculate the average growth in Jan vs Dec

In [15]:
(df_pivoted['2023/01'].sum() / df_pivoted['2022/12'].sum() -1).round(2)

0.07

and average decrease in Feb vs Jan

In [16]:
(df_pivoted['2023/02'].sum() / df_pivoted['2023/01'].sum()).round(2)

0.86

GPE growth in Jan vs Dec
and
GPE growth in Feb vs Dec

In [17]:
df_pivoted['jan_vs_dec'] = (df_pivoted['2023/01'] / df_pivoted['2022/12'] -1).round(2)
df_pivoted['feb_vs_dec'] = (df_pivoted['2023/02'] / df_pivoted['2022/12'] -1).round(2)

#### Clients with abnormal growth in GPE in Jan 2023

Companies with an increase in GPE in Jan over 7% (avg increase) and a drop in GPE in Feb

In [18]:
df_pivoted_to_check = df_pivoted[(df_pivoted['jan_vs_dec']>0.07)&(df_pivoted['feb_vs_dec']<=0)]
df_pivoted_to_check

year_month,holding,company,2022/10,2022/11,2022/12,2023/01,2023/02,jan_vs_dec,feb_vs_dec
69,group21,company111,42105.0,40747.0,42105.0,46821.0,42290.0,0.11,0.0
83,group22,company82,0.0,0.0,11311.0,12523.0,11311.0,0.11,0.0
92,group3,company129,2295496.0,2221448.0,2295496.0,4755049.0,2221532.0,1.07,-0.03


##### List #1 of the companies to check their contracts

In [19]:
list_to_check1 = df_pivoted_to_check['company']
list_to_check1

69    company111
83     company82
92    company129
Name: company, dtype: object

#### Companies with an abnormal decrease in GPE in Feb

The drop is over 20% (a typical decrease is around 10% (28/31=0.9)

In [20]:
df_pivoted_to_check2 = df_pivoted[df_pivoted['2023/02'] < (df_pivoted['2023/01']*0.8)]
df_pivoted_to_check2

year_month,holding,company,2022/10,2022/11,2022/12,2023/01,2023/02,jan_vs_dec,feb_vs_dec
10,company13,company13,67928.0,65737.0,67928.0,67928.0,32175.0,0.0,-0.53
13,company134,company134,358751.0,347178.0,358751.0,358751.0,231452.0,0.0,-0.35
19,company53,company53,61124.0,59152.0,61124.0,61124.0,17746.0,0.0,-0.71
22,company67,company67,82840.0,80167.0,82840.0,37411.0,0.0,-0.55,-1.0
23,company68,company68,48210.0,46655.0,48210.0,10886.0,0.0,-0.77,-1.0
33,group10,company43,414622.0,401247.0,414622.0,414622.0,173874.0,0.0,-0.58
45,group12,company145,382834.0,393147.0,406252.0,344608.0,251869.0,-0.15,-0.38
92,group3,company129,2295496.0,2221448.0,2295496.0,4755049.0,2221532.0,1.07,-0.03
145,group8,company80,54506.0,52747.0,54506.0,54506.0,40669.0,0.0,-0.25


##### List #2 of the companies to check their contracts

In [21]:
list_to_check2 = df_pivoted_to_check2['company']
list_to_check2

10      company13
13     company134
19      company53
22      company67
23      company68
33      company43
45     company145
92     company129
145     company80
Name: company, dtype: object

Appending lists

In [22]:
list_to_check = list_to_check1.append(list_to_check2, ignore_index=True)
list_to_check

  list_to_check = list_to_check1.append(list_to_check2, ignore_index=True)


0     company111
1      company82
2     company129
3      company13
4     company134
5      company53
6      company67
7      company68
8      company43
9     company145
10    company129
11     company80
Name: company, dtype: object

Filtering contracts by the list

In [23]:
df_to_check = df[df['company'].isin(list_to_check)]
df_to_check

Unnamed: 0,holding,company,contract_id,start_date,end_date,gpe,year_month
71,group3,company129,AB-C037-0000966,2023-01-01,2023-12-31,2459553,2023/01
77,group3,company129,AB-C037-0000564,2022-01-02,2023-01-31,2295496,2022/03
78,group3,company129,AB-C037-0000564,2022-01-02,2023-01-31,2295496,2022/05
79,group3,company129,AB-C037-0000564,2022-01-02,2023-01-31,2295496,2022/07
80,group3,company129,AB-C037-0000564,2022-01-02,2023-01-31,2295496,2022/08
...,...,...,...,...,...,...,...
1325,company53,company53,AB-C037-0000641,2022-10-02,2023-09-02,17746,2023/02
1392,group22,company82,AB-C037-0000937,2022-04-12,2023-03-12,12523,2023/01
1406,group22,company82,AB-C037-0000937,2022-04-12,2023-03-12,11311,2022/12
1407,group22,company82,AB-C037-0000937,2022-04-12,2023-03-12,11311,2023/02


#### Contract details

In [24]:
df_to_check_pivoted = df_to_check.pivot_table(index=['holding','company','contract_id','start_date','end_date'], values='gpe', aggfunc='sum', columns='year_month').reset_index()
df_to_check_pivoted.drop(['2022/03','2022/04','2022/05','2022/06','2022/07','2022/08','2022/09'],axis=1, inplace=True)
df_to_check_pivoted

year_month,holding,company,contract_id,start_date,end_date,2022/10,2022/11,2022/12,2023/01,2023/02
0,company13,company13,AB-C037-0000616,2022-01-02,2023-01-31,67928.0,65737.0,67928.0,67928.0,
1,company13,company13,AB-C037-0001014,2023-01-02,2024-01-31,,,,,32175.0
2,company134,company134,AB-C037-0000650,2022-02-21,2023-02-20,358751.0,347178.0,358751.0,358751.0,231452.0
3,company53,company53,AB-C037-0000641,2022-10-02,2023-09-02,61124.0,59152.0,61124.0,61124.0,17746.0
4,company67,company67,AB-C037-0000563,2022-01-15,2023-01-14,82840.0,80167.0,82840.0,37411.0,
5,company68,company68,AB-C037-0000573,2022-08-01,2023-07-01,48210.0,46655.0,48210.0,10886.0,
6,group10,company43,AB-C037-0000638,2022-02-14,2023-02-13,414622.0,401247.0,414622.0,414622.0,173874.0
7,group12,company145,AB-C037-0000513,2021-12-10,2022-11-10,75531.0,,,,
8,group12,company145,AB-C037-0000615,2022-01-17,2023-01-16,127397.0,123288.0,127397.0,65753.0,
9,group12,company145,AB-C037-0000885,2022-12-10,2023-11-10,179906.0,269859.0,278855.0,278855.0,251869.0
