alexkotsscott/Customised_Insruance_Data_Analysis


Last Update - 2023/02/25

Works in Progress

    The project sits in the master branch

  1. Stratify the data better (by LOB in particular) to better reflect realistic scenarios
  2. For example, for commercial property, use external data to set the value of the individual site
  3. Going between R & Python with rpy2 (calling R from Python; separate notebooks for now): not worth spending time on here. This is easier in Databricks with the %r, %sql and %python magic commands
  4. Include some outliers for testing (so that a control table can be created to flag or drop these values)
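A minimal sketch of the control-table idea in item 4, assuming the control table is simply a set of acceptable (low, high) bounds per field. The field names come from the variable definitions below, but the bounds in `CONTROL_TABLE` and the `Breached_Fields` column are hypothetical choices for illustration. Outliers injected for testing would land in the flagged list, which can then be reviewed or dropped.

```python
# Hypothetical control table: acceptable (low, high) bounds per field
CONTROL_TABLE = {
    "Age": (18, 80),
    "Premium": (50, 10_000),
}


def apply_control_table(rows, control_table):
    """Split rows into (clean, flagged) using per-field bounds from the control table."""
    clean, flagged = [], []
    for row in rows:
        # Fields whose value falls outside the bounds set in the control table
        breaches = [
            field
            for field, (low, high) in control_table.items()
            if field in row and not (low <= row[field] <= high)
        ]
        if breaches:
            # Keep the record but note which fields breached, so it can be flagged or dropped
            flagged.append({**row, "Breached_Fields": breaches})
        else:
            clean.append(row)
    return clean, flagged
```

Keeping the flagged rows (rather than silently dropping them) leaves the flag-or-drop decision to whoever reviews the output.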

Exploring ways to share code & demos via the new GitHub Codespaces feature. Codespaces is pretty basic: a Linux-based instance with about 5 GB of RAM, which can handle some of the smaller demos. The instance shuts down automatically after a period of inactivity and is quite slow to boot up.

At this point in time, the code is written primarily in R & Python. Both the R and Python scripts have a dedicated Engine that powers the behind-the-scenes work, such as installing any packages that are not already installed. The Engine also contains user-defined functions that are used regularly and are likely to be required to run a particular project.
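The "install (if not installed)" step of an Engine could look roughly like the sketch below on the Python side. This is not the repository's Engine; the function names `missing_packages` and `ensure_packages` are hypothetical, and the sketch assumes top-level import names and that pip is available in the running interpreter.

```python
import importlib.util
import subprocess
import sys


def missing_packages(import_names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


def ensure_packages(import_names, pip_names=None, dry_run=False):
    """Install any missing packages with pip and return what was (or would be) installed.

    pip_names optionally maps an import name to its pip distribution name,
    e.g. {"sklearn": "scikit-learn"}, because the two do not always match.
    """
    pip_names = pip_names or {}
    to_install = [pip_names.get(name, name) for name in missing_packages(import_names)]
    if to_install and not dry_run:
        # Use the running interpreter's pip so packages land in the same environment
        subprocess.check_call([sys.executable, "-m", "pip", "install", *to_install])
    return to_install
```

For example, `ensure_packages(["pandas", "sklearn"], pip_names={"sklearn": "scikit-learn"})` would install only whichever of the two is missing; `dry_run=True` reports what would be installed without touching the environment.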

Random_Data_Exploration

01_Create_Data: Generate random data, then EDA it, model it and map it. A continual work in progress.

Some of the scripts in this folder can be run in the new Codespaces feature.

The random data consists of the fields below, which will likely evolve to enrich the data. Geographic data: for example, latitudes & longitudes are randomly generated

  • They are then reverse geocoded to get the postcode
    • The open source version is new and very slow; there are many ways to break the work down, for example:
      • Break the data set into separate samples and run them on different machines
    • Run time is about 5 hours (on average) for 10,000 random rows of lats & longs
    • The coordinates are currently just random, so good luck finding a postcode in the Adriatic
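The two ideas above (random coordinates, and splitting the data set into separate samples for different machines) can be sketched as follows. The reverse geocoder itself is not shown. The default bounding box is a rough box around Great Britain and is an assumption, not taken from the repository; it narrows the points to a region but still includes sea, so some points will still have no postcode.

```python
import random


def random_coordinates(n, lat_range=(49.9, 58.7), lon_range=(-8.2, 1.8), seed=None):
    """Generate n (latitude, longitude) pairs uniformly within a bounding box.

    The default box is a rough, assumed box around Great Britain.
    """
    rng = random.Random(seed)
    return [(rng.uniform(*lat_range), rng.uniform(*lon_range)) for _ in range(n)]


def split_into_chunks(rows, n_chunks):
    """Split rows into n_chunks contiguous samples whose sizes differ by at most one."""
    size, remainder = divmod(len(rows), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `remainder` chunks take one extra row each
        end = start + size + (1 if i < remainder else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks
```

Each chunk could then be written out and reverse geocoded on a separate machine; with 10,000 rows and 4 chunks, each machine gets 2,500 rows.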

Another approach for generating REAL postcodes is via Python random addresses (US)

  • Currently only generates random postcodes for a few states in the US

Randomly Generated fields below

The link below generates random data based on a set of criteria. The sample size is optional, and simple logic is applied: for example, you can't have a claim count where there was no conversion (sale) to begin with.

The first link below generates a generic random dataset for multiple lines of business:

  • Work is still to be done on the data sample here: a £200 claim for a phone is not the same as a £2,000,000 claim for commercial property
  • The second link, for example, creates a random data set purely for the motor market, so the inputs for premiums, claims etc. are more sensible

01_Create_Data

01_Create_Data_Motor
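A minimal sketch of the two rules described above (no claim without a sale, and claim sizes that depend on the line of business), assuming uniform sampling throughout. It is not the code behind either link. The LOB names, the claim ranges in `CLAIM_AMOUNT_RANGES`, the `sale_rate` of 0.3, the fallback sample size of 100 to 1,000 and the `Claim_Amount` field are all hypothetical. Real claim severities are usually heavily skewed, so a uniform draw is only a placeholder.

```python
import random

# Hypothetical claim-amount ranges (GBP) per line of business, for illustration only
CLAIM_AMOUNT_RANGES = {
    "phone": (50, 1_000),
    "motor": (500, 50_000),
    "commercial_property": (10_000, 2_000_000),
}


def generate_rows(sample_size=None, sale_rate=0.3, seed=None):
    """Generate rows where claims only exist for sales and claim size depends on LOB."""
    rng = random.Random(seed)
    if sample_size is None:
        # Sample size is optional: fall back to a random size (assumed range)
        sample_size = rng.randint(100, 1_000)
    rows = []
    for _ in range(sample_size):
        lob = rng.choice(list(CLAIM_AMOUNT_RANGES))
        sale_flag = 1 if rng.random() < sale_rate else 0
        # No conversion (sale) means no policy, so no claim can be recorded
        claims_count = rng.randint(0, 1) if sale_flag else 0
        low, high = CLAIM_AMOUNT_RANGES[lob]
        # Claim size is drawn from the LOB's own range, so a phone claim stays small
        claim_amount = round(rng.uniform(low, high), 2) if claims_count else 0.0
        rows.append({
            "LOB": lob,
            "Sale_Flag": sale_flag,
            "Claims_Count": claims_count,
            "Claim_Amount": claim_amount,
        })
    return rows
```

Passing a `seed` makes a sample reproducible, which helps when comparing EDA or model results across runs.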

Variable Definition
Customer_ID: Cryptographically generated random identifiers
Purchase_Date: Dates are random within a given range; the purchase date must always be earlier than or equal to Cover_Start_Date
Cover_Start_Date: Date cover starts, random
LOB: Line Of Business
Sale_Flag: Binary, 0/1
Purchase_Price: Randomly generated; needs to be tailored for each LOB
Claims_Count: Number of claims in the customer's history; currently just 0 or 1 at random, and only generated for sales (work to be done here)
Convictions Count: Number of historical convictions (regardless of sale 0/1 here), bound between 0 and 5 at random
Period_of_Cover: 12, 24, 36 or 48 months, randomly sampled
Premium: The premium, random; needs logic to keep it sensible by LOB etc.
Age: Between 18 & 80, random
Broker: Random choice of 'london_ins', 'some_syndicate', 'some_mga' (weights could be added for balance)
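Several of the definitions above translate directly into code. The sketch below generates the ID, date, cover, age, convictions and broker fields for one record. It is an illustration, not the repository's generator: the date range (2020 to 2022), the 60-day maximum gap between purchase and cover start, the 16-character hex ID and the function name `random_policy_fields` are all assumptions. `broker_weights` shows one way of adding the optional weights for balance.

```python
import random
import secrets
from datetime import date, timedelta

BROKERS = ["london_ins", "some_syndicate", "some_mga"]


def random_policy_fields(start=date(2020, 1, 1), end=date(2022, 12, 31),
                         max_lag_days=60, broker_weights=None, rng=None):
    """Generate one record's ID, date, cover, age, convictions and broker fields."""
    rng = rng or random.Random()
    purchase_date = start + timedelta(days=rng.randint(0, (end - start).days))
    # Cover starts on or after purchase, so Purchase_Date <= Cover_Start_Date always holds
    cover_start_date = purchase_date + timedelta(days=rng.randint(0, max_lag_days))
    return {
        # secrets draws from the operating system's cryptographically secure source
        "Customer_ID": secrets.token_hex(8),
        "Purchase_Date": purchase_date,
        "Cover_Start_Date": cover_start_date,
        "Period_of_Cover": rng.choice([12, 24, 36, 48]),
        "Convictions Count": rng.randint(0, 5),
        "Age": rng.randint(18, 80),
        # weights=None gives an even split; e.g. [0.5, 0.3, 0.2] rebalances brokers
        "Broker": rng.choices(BROKERS, weights=broker_weights, k=1)[0],
    }
```

Deriving Cover_Start_Date from Purchase_Date (rather than drawing both independently and rejecting bad pairs) guarantees the ordering rule without any retries.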

About

Generate Random Data for Modelling & EDA showcasing
