Clear and uncomplicated database and data warehouse for conceptual insurance company. With analytical queries.

Kielx/Insurance-company-database

Insurance company database and data warehouse

πŸ‡΅πŸ‡± POLISH VERSION

The goal of the project was to create a fragment of an insurance company's database and a data warehouse based on it. The database contains information about the insurance company's customers, employees and branches, as well as the insurance policies that connect these entities. Designed this way, the database and warehouse allow for easy visualization and for building summaries of the work of all branches and employees, as well as of the policies taken out by clients. The assumption is that the company can offer different types of policies - e.g., home insurance or vehicle insurance - and each customer can take out an unlimited number of policies. The project was developed entirely in the environment offered by Oracle, using SQL Developer and SQL Developer Data Modeler, as well as the SQL*Loader tool. The scripts generating the data that the database was fed with were written in Python using the Faker library.

Report Bug

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing
  6. License
  7. Contact
  8. Acknowledgments

About The Project

Database screenshot

This project was part of my coursework done through my studies at the Kielce University of Technology. It aimed to create a simplified version of a database for an insurance company (under 20 tables) and later to build a data warehouse on top of it (using a star schema).

Prior to starting work on this project, I did research to determine whether anyone had created something similar and made it available to the public. Unfortunately, the results of my search were less than satisfactory, so I had to create the database model, the relationships, the data generator, the transformation of the data into a data warehouse, and all the queries myself. This was my first approach to creating a database and data warehouse, so the model or queries may contain errors or fall short of good practices. Nevertheless, below you will find the results of my work, which may be useful to those who are tasked with creating a similar project.

If you would like to know more about the creation process, you can jump to Roadmap. If you want to see the project in action, follow the steps described in Getting Started.

(back to top)

Built With

  • SQL Developer
  • SQL Data Modeler
  • SQL*Loader
  • Python

(back to top)

Getting Started

To start working with this project you need to install the Oracle Express database. You can find instructions on how to perform this process here: Oracle Database XE Quick Start

More experienced users can use the Docker image instead: Oracle Container Registry. Look for the appropriate image under Database -> Express.

The database model is created in Oracle SQL Data Modeler. This tool makes it easier to visualize and create the structure of the database.

To operate the database I used SQL Developer

Once you have all the tools, you can launch the project. Use the project structure below and the Roadmap to navigate the project, and refer to the Usage section for quick tips on how to work with it.

Basic Usage

  1. Be sure that you have installed all the prerequisites from the Getting Started section
  2. Open SQL Developer and run 01_database/create.sql to create the database
  3. Load the data:
  • Open 01_database/dataGenerator/main.py and change the USERNAME, PASSWORD and CONNECTION_STRING values to fit your database connection, then run 01_database/dataGenerator/main.py

  • OR invoke load_files.bat, providing the username, password and connection string via the command line:

.\load_files.bat yourUserName yourPassword yourDatabaseConnectionString
  4. Query the database using the queries of your choice from 03_queries/
  5. You can create a separate database instance and use 02_data_warehouse/create.sql and 02_data_warehouse/insert.sql to create tables with data exported from the database. If you want to export your own set of data you can follow the instructions provided on the Oracle website
  6. Query the created data warehouse with queries from 03_queries/
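The USERNAME, PASSWORD and CONNECTION_STRING values from step 3 end up combined into a single sqlldr connect argument. Here is a minimal sketch of that combination (the variable and function names are illustrative, not necessarily those used in main.py):

```python
# Illustrative only - names and defaults assumed, not copied from main.py.
# The default CONNECTION_STRING below matches the one in load_files.bat.
USERNAME = "yourUserName"
PASSWORD = "yourPassword"
CONNECTION_STRING = "@//localhost:1521/XEPDB1"

def sqlldr_connect_arg(user: str, password: str, conn: str) -> str:
    # sqlldr takes user/password@//host:port/service as its first argument
    return f"{user}/{password}{conn}"

print(sqlldr_connect_arg(USERNAME, PASSWORD, CONNECTION_STRING))
```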

Project structure

β”œβ”€β”€ 01_database // Database related files
β”‚Β Β  β”œβ”€β”€ create.sql // Sql script that creates database tables - DDL generated from model
β”‚Β Β  β”œβ”€β”€ dataGenerator // Files related to data generator script AND generated by script
β”‚Β Β  β”‚Β Β  β”œβ”€β”€ generatedData // Folder containing .csv files with generated data from python script
β”‚Β Β  β”‚Β Β  β”‚Β Β  β”œβ”€β”€ branch.csv
β”‚Β Β  β”‚Β Β  β”‚Β Β  β”œβ”€β”€ city.csv
β”‚Β Β  β”‚Β Β  β”‚Β Β  β”œβ”€β”€ claim.csv
β”‚Β Β  β”‚Β Β  β”‚Β Β  β”œβ”€β”€ claimStatus.csv
β”‚Β Β  β”‚Β Β  β”‚Β Β  β”œβ”€β”€ client.csv
β”‚Β Β  β”‚Β Β  β”‚Β Β  β”œβ”€β”€ clientType.csv
β”‚Β Β  β”‚Β Β  β”‚Β Β  β”œβ”€β”€ employee.csv
β”‚Β Β  β”‚Β Β  β”‚Β Β  β”œβ”€β”€ houseNr.csv
β”‚Β Β  β”‚Β Β  β”‚Β Β  β”œβ”€β”€ insurance.csv
β”‚Β Β  β”‚Β Β  β”‚Β Β  β”œβ”€β”€ insuranceType.csv
β”‚Β Β  β”‚Β Β  β”‚Β Β  β”œβ”€β”€ payment.csv
β”‚Β Β  β”‚Β Β  β”‚Β Β  β”œβ”€β”€ phone.csv
β”‚Β Β  β”‚Β Β  β”‚Β Β  β”œβ”€β”€ phoneType.csv
β”‚Β Β  β”‚Β Β  β”‚Β Β  β”œβ”€β”€ region.csv
β”‚Β Β  β”‚Β Β  β”‚Β Β  └── street.csv
β”‚Β Β  β”‚Β Β  └── main.py // Generates data to populate database. Also contains functions that create files and scripts to load data
β”‚Β Β  β”œβ”€β”€ load_files.bat // Batch file that invokes sqlldr 
β”‚Β Β  β”œβ”€β”€ model_database // Folder containing model files
β”‚Β Β  β”‚Β Β  └── model.dmd
β”‚Β Β  └── sqlldr // Everything related to sqlloader. All files inside get generated by main.py script
β”‚Β Β      β”œβ”€β”€ bads // If loading goes wrong, info will be available here
β”‚Β Β      β”œβ”€β”€ branch.ctl
β”‚Β Β      β”œβ”€β”€ city.ctl
β”‚Β Β      β”œβ”€β”€ claim.ctl
β”‚Β Β      β”œβ”€β”€ claimStatus.ctl
β”‚Β Β      β”œβ”€β”€ client.ctl
β”‚Β Β      β”œβ”€β”€ clientType.ctl
β”‚Β Β      β”œβ”€β”€ employee.ctl
β”‚Β Β      β”œβ”€β”€ houseNr.ctl
β”‚Β Β      β”œβ”€β”€ insurance.ctl
β”‚Β Β      β”œβ”€β”€ insuranceType.ctl
β”‚Β Β      β”œβ”€β”€ payment.ctl
β”‚Β Β      β”œβ”€β”€ phone.ctl
β”‚Β Β      β”œβ”€β”€ phoneType.ctl
β”‚Β Β      β”œβ”€β”€ region.ctl
β”‚Β Β      └── street.ctl
β”‚Β Β      β”œβ”€β”€ logs // Sqlloader log files
β”‚Β Β      β”‚Β Β  β”œβ”€β”€ branch.log
β”‚Β Β      β”‚Β Β  β”œβ”€β”€ city.log
β”‚Β Β      β”‚Β Β  β”œβ”€β”€ claim.log
β”‚Β Β      β”‚Β Β  β”œβ”€β”€ claimStatus.log
β”‚Β Β      β”‚Β Β  β”œβ”€β”€ client.log
β”‚Β Β      β”‚Β Β  β”œβ”€β”€ clientType.log
β”‚Β Β      β”‚Β Β  β”œβ”€β”€ employee.log
β”‚Β Β      β”‚Β Β  β”œβ”€β”€ houseNr.log
β”‚Β Β      β”‚Β Β  β”œβ”€β”€ insurance.log
β”‚Β Β      β”‚Β Β  β”œβ”€β”€ insuranceType.log
β”‚Β Β      β”‚Β Β  β”œβ”€β”€ payment.log
β”‚Β Β      β”‚Β Β  β”œβ”€β”€ phone.log
β”‚Β Β      β”‚Β Β  β”œβ”€β”€ phoneType.log
β”‚Β Β      β”‚Β Β  β”œβ”€β”€ region.log
β”‚Β Β      β”‚Β Β  └── street.log
β”œβ”€β”€ 02_data_warehouse // Files related to data warehouse. Create and insert scripts along with model.
β”‚Β Β  β”œβ”€β”€ create.sql
β”‚Β Β  β”œβ”€β”€ insert.sql
β”‚Β Β  └── model_data_warehouse
β”‚Β Β      └── model.dmd
β”œβ”€β”€ 03_queries // All database and data warehouse queries divided by type
β”‚Β Β  β”œβ”€β”€ 01_rollup
β”‚Β Β  β”‚Β Β  β”œβ”€β”€ database.sql
β”‚Β Β  β”‚Β Β  └── warehouse.sql
β”‚Β Β  β”œβ”€β”€ 02_cube
β”‚Β Β  β”‚Β Β  β”œβ”€β”€ database.sql
β”‚Β Β  β”‚Β Β  └── warehouse.sql
β”‚Β Β  β”œβ”€β”€ 03_partition
β”‚Β Β  β”‚Β Β  β”œβ”€β”€ database.sql
β”‚Β Β  β”‚Β Β  └── warehouse.sql
β”‚Β Β  β”œβ”€β”€ 04_window
β”‚Β Β  β”‚Β Β  β”œβ”€β”€ database.sql
β”‚Β Β  β”‚Β Β  └── warehouse.sql
β”‚Β Β  └── 05_rank
β”‚Β Β      β”œβ”€β”€ database.sql
β”‚Β Β      └── warehouse.sql
β”œβ”€β”€ images // README images
β”‚Β Β  β”œβ”€β”€ data_warehouse.png
β”‚Β Β  β”œβ”€β”€ example_query.png
β”‚Β Β  β”œβ”€β”€ logo.png
β”‚Β Β  β”œβ”€β”€ original_design.png
β”‚Β Β  └── screenshot.png
β”œβ”€β”€ LICENSE.txt
└── README.md

(back to top)

Roadmap

Original Idea and database model

My original idea was to create a more realistic database design following all normalization principles. It looked something like this:

Original design

You can check it out here.

I later had to transform it into the design you can see in the first screenshot in the About section. The reason was that creating a data warehouse and queries based on my original design could become too complicated. Here is the finished data warehouse model:

The finished model of data warehouse

To create the models I used Oracle SQL Data Modeler. It's a great tool that can be used to create logical models, convert them to physical models, and finally generate DDL.

Data generation and loading data with SQL*Loader

With the database and data warehouse models created, I had to generate the necessary data (one of the project requirements was that tables should be loaded with at least 1000 or 10000 rows of data). I could have taken the short path and used one of the data generation tools available on the web, but I decided those tools wouldn't make the cut and instead wrote a Python script using the Faker library to generate the data. It proved to be a sound decision, because I used the same script to generate control files for SQL*Loader and to write lines to batch files that could then be run to load the data into the database. SQL*Loader is a tool developed by Oracle that can be invoked through the command line to load files (.csv files in our case) into the database. The syntax to load data looks like this:

sqlldr %USERNAME%/%PASSWORD%%CONNECTION_STRING% control='sqlldr/region.ctl' log='sqlldr/logs/region.log' bad='sqlldr/bads/region.bad'

Invoking such a command for each of the 15 tables would be tedious and inefficient.

My idea to simplify the process was to:

  1. Generate the data for the tables using separate Python functions (the last line invokes a function that generates the control file)
# Requires: import csv, plus faker = Faker() from the Faker library
def generate_branches(records):
    headers = ["branch_id", "branch_name", "region_id", "city_id", "street_id", "houseNr_id"]

    with open("./generatedData/branch.csv", 'wt', newline='') as csvFile:
        writer = csv.DictWriter(csvFile, fieldnames=headers)
        writer.writeheader()
        for i in range(records):
            writer.writerow({
                headers[0]: i,
                headers[1]: faker.company(),
                headers[2]: i % 16,
                headers[3]: i % 16,
                headers[4]: i % 16,
                headers[5]: i,
            })
    print(f'Successfully generated {records} branches')
    create_ctl_file("branch", headers)
  2. Invoke a function that creates a .ctl file controlling how SQL*Loader loads the data into the database
# Requires: import re
def create_ctl_file(filename, headers):
    file = open(f'../sqlldr/{filename}.ctl', "w+")
    file.write("LOAD DATA\n")
    file.write(f"INFILE 'dataGenerator/generatedData/{filename}.csv'\n")
    file.write("REPLACE\n")
    file.write(f"INTO TABLE {filename}\n")
    file.write("FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'\n")
    file.write("TRAILING NULLCOLS\n")
    file.write("(\n")
    for i in range(len(headers)):
        if re.search(r'date', headers[i], re.IGNORECASE):
            file.write(f'{headers[i]} DATE "yyyy-mm-dd"')
        else:
            file.write(headers[i])
        if i != (len(headers) - 1):
            file.write(",")
    file.write("\n)")
    file.close()
    print(f"Successfully generated {filename}.ctl file")
    bat_file = open('../load_files.bat', "a+")
    bat_file.write(f"sqlldr %USERNAME%/%PASSWORD%%CONNECTION_STRING% control='sqlldr/{filename}.ctl' log='sqlldr/logs/{filename}.log' bad='sqlldr/bads/{filename}.bad'\n")
    bat_file.close()
    print(f"Successfully appended {filename} data to load_files.bat")
  3. Append the appropriate line to the batch file and save it. The resulting batch file can then be conveniently run to load the data using SQL*Loader. load_files.bat:
@echo off

rem Script that loads the previously created data into the database using sqlldr
rem To change the default values, call the script and pass them sequentially as command-line arguments
rem Run the script from a command line, e.g. cmd
rem First move to the folder where the script is located
rem Then run it by entering the command .\load_files.bat
rem Run without arguments, it uses the default user data
rem To change it, pass other data, e.g. .\load_files.bat username password my_base:1521


rem Set default values for the connection
set USERNAME=kielx
set PASSWORD=d11
set CONNECTION_STRING=@//localhost:1521/XEPDB1

rem Check if values exist
if not "%1"=="" (
  set USERNAME=%1
) 

if not "%2"=="" (
  set PASSWORD=%2
) 

if not "%3"=="" (
  set CONNECTION_STRING=%3
) 

sqlldr %USERNAME%/%PASSWORD%%CONNECTION_STRING% control='sqlldr/region.ctl' log='sqlldr/logs/region.log' bad='sqlldr/bads/region.bad'
sqlldr %USERNAME%/%PASSWORD%%CONNECTION_STRING% control='sqlldr/city.ctl' log='sqlldr/logs/city.log' bad='sqlldr/bads/city.bad'
sqlldr %USERNAME%/%PASSWORD%%CONNECTION_STRING% control='sqlldr/street.ctl' log='sqlldr/logs/street.log' bad='sqlldr/bads/street.bad'
sqlldr %USERNAME%/%PASSWORD%%CONNECTION_STRING% control='sqlldr/houseNr.ctl' log='sqlldr/logs/houseNr.log' bad='sqlldr/bads/houseNr.bad'
sqlldr %USERNAME%/%PASSWORD%%CONNECTION_STRING% control='sqlldr/clientType.ctl' log='sqlldr/logs/clientType.log' bad='sqlldr/bads/clientType.bad'
sqlldr %USERNAME%/%PASSWORD%%CONNECTION_STRING% control='sqlldr/client.ctl' log='sqlldr/logs/client.log' bad='sqlldr/bads/client.bad'
sqlldr %USERNAME%/%PASSWORD%%CONNECTION_STRING% control='sqlldr/employee.ctl' log='sqlldr/logs/employee.log' bad='sqlldr/bads/employee.bad'
sqlldr %USERNAME%/%PASSWORD%%CONNECTION_STRING% control='sqlldr/branch.ctl' log='sqlldr/logs/branch.log' bad='sqlldr/bads/branch.bad'
sqlldr %USERNAME%/%PASSWORD%%CONNECTION_STRING% control='sqlldr/phoneType.ctl' log='sqlldr/logs/phoneType.log' bad='sqlldr/bads/phoneType.bad'
sqlldr %USERNAME%/%PASSWORD%%CONNECTION_STRING% control='sqlldr/phone.ctl' log='sqlldr/logs/phone.log' bad='sqlldr/bads/phone.bad'
sqlldr %USERNAME%/%PASSWORD%%CONNECTION_STRING% control='sqlldr/payment.ctl' log='sqlldr/logs/payment.log' bad='sqlldr/bads/payment.bad'
sqlldr %USERNAME%/%PASSWORD%%CONNECTION_STRING% control='sqlldr/insuranceType.ctl' log='sqlldr/logs/insuranceType.log' bad='sqlldr/bads/insuranceType.bad'
sqlldr %USERNAME%/%PASSWORD%%CONNECTION_STRING% control='sqlldr/insurance.ctl' log='sqlldr/logs/insurance.log' bad='sqlldr/bads/insurance.bad'
sqlldr %USERNAME%/%PASSWORD%%CONNECTION_STRING% control='sqlldr/claimStatus.ctl' log='sqlldr/logs/claimStatus.log' bad='sqlldr/bads/claimStatus.bad'
sqlldr %USERNAME%/%PASSWORD%%CONNECTION_STRING% control='sqlldr/claim.ctl' log='sqlldr/logs/claim.log' bad='sqlldr/bads/claim.bad'

All this code lives in 01_database/dataGenerator, with the generated data in the 01_database/dataGenerator/generatedData subfolder and the control files that handle loading in 01_database/sqlldr. The .bat file that loads the data is generated by the Python program and can be run with command-line arguments that specify a username, password, and database connection string.
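The fifteen loader commands above follow a single template, so they can be regenerated from one table list. A minimal sketch of the idea (not the project's exact code), with the tables kept in the same parent-before-child order as the batch file:

```python
# Sketch only: rebuilds the load_files.bat command lines from a table list.
TABLES = [
    "region", "city", "street", "houseNr", "clientType", "client",
    "employee", "branch", "phoneType", "phone", "payment",
    "insuranceType", "insurance", "claimStatus", "claim",
]

def sqlldr_line(table: str) -> str:
    # One sqlldr invocation per table: control, log and bad file paths
    # all derive from the table name.
    return (
        "sqlldr %USERNAME%/%PASSWORD%%CONNECTION_STRING% "
        f"control='sqlldr/{table}.ctl' "
        f"log='sqlldr/logs/{table}.log' "
        f"bad='sqlldr/bads/{table}.bad'"
    )

script = "\n".join(sqlldr_line(t) for t in TABLES)
print(script)
```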

Database to data warehouse conversion

To convert the database tables and data into a data warehouse I used the Oracle SQL Developer Export Wizard. It proved to be an easy way to achieve my goal. I tried to use Oracle Data Pump but couldn't get it to work. All related files are held in the 02_data_warehouse folder.

Queries

Queries to the database are stored in the 03_queries folder. Each type of query is stored in a separate folder (one each for rollup, cube, partition, window and rank functions). Queries differ depending on whether they are executed against the database or the data warehouse. One of the project requirements was to store them this way, with queries to the database in a file named database.sql and queries to the data warehouse in warehouse.sql.

Example ROLLUP query producing an annual summary of the number of insurance policies sold, broken down by branch and policy type:

SELECT EXTRACT(YEAR FROM insurance.begin_date) AS Year,
  NVL2(branch.branch_name, branch.branch_name, 'All Branches') AS Branch_Name,
  NVL2(insurancetype.insurance_type, insurancetype.insurance_type, 'All Policies') AS Policy_Type,
  COUNT(insurance.insuranceType_id) AS Number_of_Policies
FROM insurance
INNER JOIN insuranceType
ON insurance.insurancetype_id = insurancetype.insurancetype_id
INNER JOIN branch
ON insurance.branch_id = branch.branch_id
GROUP BY ROLLUP (EXTRACT(YEAR FROM insurance.begin_date), branch.branch_name, insurancetype.insurance_type )
ORDER BY EXTRACT(YEAR FROM insurance.begin_date),
  branch.branch_name,
  COUNT(insurance.insuranceType_id) DESC;

This query returns a summary of the number of insurance policies sold each year, grouped by branch name and policy type. The ROLLUP operator is used to generate subtotals for each year, branch name, and policy type. The NVL2 function is used to replace NULL values in the branch name and policy type columns with 'All Branches' and 'All Policies', respectively. The result is ordered by year, branch name, and number of policies sold in descending order.
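To make the ROLLUP behaviour concrete, here is a small Python illustration of the subtotal levels it generates (the sample rows are invented, not taken from the project's generated data):

```python
from collections import Counter

# Invented sample (year, branch, policy_type) rows - illustration only
rows = [
    (2021, "Branch A", "home"),
    (2021, "Branch A", "vehicle"),
    (2021, "Branch A", "home"),
    (2021, "Branch B", "home"),
]

counts = Counter()
for year, branch, ptype in rows:
    # ROLLUP(year, branch, type) counts each row at four grouping levels;
    # NVL2 in the query replaces the NULLs of the subtotal levels with labels.
    counts[(year, branch, ptype)] += 1                            # full detail
    counts[(year, branch, "All Policies")] += 1                   # per-branch subtotal
    counts[(year, "All Branches", "All Policies")] += 1           # per-year subtotal
    counts[("All Years", "All Branches", "All Policies")] += 1    # grand total

print(counts[(2021, "Branch A", "home")])                     # 2
print(counts[(2021, "Branch A", "All Policies")])             # 3
print(counts[("All Years", "All Branches", "All Policies")])  # 4
```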

Example query

(back to top)

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

Contact

Krzysztof Pantak - kielx.dev@gmail.com

Website

(back to top)
