# Shipping Company database

In this project I created a database for a Polish shipping company. I assume that the company began operating at the beginning of 2012. In this notebook I generate the data for the database and upload it to an SQL server. The data cover the period from the beginning of 2012 to the middle of 2020.

<br>

## Table of contents

1. [Generation of data](#Generation-of-data)  
    1.1. [Employees table](#Generation-of-the-Employees-table)  
    1.2. [Clients table](#Generation-of-the-Clients-table)  
    1.3. [Vehicles table](#Generation-of-the-Vehicles-table)  
    1.4. [Contractors table](#Generation-of-the-Contractors-table)  
    1.5. [Commissions table](#Generation-of-the-Commissions-table)  
    1.6. [Order details table](#Generation-of-the-Order-details-table)  
    1.7. [Orders table](#Generation-of-the-Orders-table)  
    1.8. [Incoming transactions table](#Generation-of-the-Incoming-transactions-table)  
    1.9. [Transactors table](#Generation-of-the-Transactors-table)  
    1.10. [Outgoing transactions table](#Generation-of-the-Outgoing-transactions-table)  
2. [Inserting data into the server](#Inserting-data-into-the-server)

    

<br>

# Generation of data

In [23]:
import random 
import pandas as pd
import numpy as np
import datetime
import os
import string
from dateutil.relativedelta import relativedelta

# Load the name pools: surnames, male first names and female first names.
# Note: read_csv treats the first line of each file as a column header;
# pass header=None if the files contain only names.
nazwiska = pd.read_csv('nazwiska.txt')
nazwiska = nazwiska.values.tolist()

imiona_m = pd.read_csv('first-m.txt')
imiona_m = imiona_m.values.tolist()

imiona_f = pd.read_csv('first-f.txt')
imiona_f = imiona_f.values.tolist()

# Flatten the lists of rows into plain lists of names
nazwiska = [val for sublist in nazwiska for val in sublist]
imiona_f = [val for sublist in imiona_f for val in sublist]
imiona_m = [val for sublist in imiona_m for val in sublist]

# Random generation of all names

# Male: first names may repeat, surnames are drawn without repetition
Imiona_M = random.choices(imiona_m, k=93)
Nazwiska_M = random.sample(nazwiska, k=93)

# Feminine surname forms: -ski -> -ska, -cki -> -cka, -dzki -> -dzka
Nazwiska_1 = [sub.replace('ski', 'ska') for sub in nazwiska]
Nazwiska_2 = [sub.replace('cki', 'cka') for sub in Nazwiska_1]
Nazwiska_3 = [sub.replace('dzki', 'dzka') for sub in Nazwiska_2]

# Female
Imiona_D = random.choices(imiona_f, k=93)
Nazwiska_D = random.sample(Nazwiska_3, k=93)

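The `str.replace` calls above rewrite `ski`, `cki` and `dzki` wherever they occur in a surname, not only at its end. A minimal sketch of a suffix-only variant, assuming these three endings are the only ones that need a feminine form; `feminine_surname` is a hypothetical helper, not part of the notebook:

```python
import re

# Masculine adjectival endings -ski, -cki and -dzki: the captured stem is kept and the final "i" becomes "a"
_MASCULINE_ENDING = re.compile(r"(dzk|sk|ck)i$")

def feminine_surname(surname: str) -> str:
    """Return the feminine form of a surname, changing only a final -ski/-cki/-dzki."""
    return _MASCULINE_ENDING.sub(r"\1a", surname)

# Applied to the loaded pool this would read:
# Nazwiska_3 = [feminine_surname(s) for s in nazwiska]
print(feminine_surname("Kowalski"), feminine_surname("Zawadzki"), feminine_surname("Nowak"))
# Kowalska Zawadzka Nowak
```

A name that contains `ski` mid-word, such as `Kaskiewicz`, is left unchanged by this version, whereas `str.replace` would turn it into `Kaskaewicz`.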

## Generation of the Employees table

This table contains the data of all the people working for the company: a unique employee ID, first name, last name, date of birth, educational background, hire date, job title, salary, phone number, bank account number, city and country. Drivers additionally have their driving licence category and the ID of their default vehicle.

In [24]:
# Generation of drivers data
##############################################################

# Driver ID
pula_id = list(np.linspace(10000,20000,10001,dtype=int))
WorkerID = random.sample(pula_id, k=200)  # all workers ID
DriverID = WorkerID[0:20]

# First name and last name
FirstName_Driver = Imiona_M[0:16] + Imiona_D[0:4]   
LastName_Driver = Nazwiska_M[0:16] + Nazwiska_D[0:4]

# Date of birth
start_date = datetime.date(1965, 1, 1)  # Period when drivers could be born
end_date = datetime.date(1995, 1, 1)
BirthDate_Driver = []
i=0
while i < 20:
    time_between_dates = end_date - start_date
    days_between_dates = time_between_dates.days
    random_number_of_days = random.randrange(days_between_dates)
    random_date = start_date + datetime.timedelta(days=random_number_of_days)
    BirthDate_Driver.append(random_date)
    i = i + 1
    
# Hire date
start_date = datetime.date(2012, 1, 1)  # Period when drivers could be hired
end_date = datetime.date(2020, 6, 6)
HireDate_Driver = []
i = 0
# 16 drivers are hired on random dates; the remaining 4 have worked since the very beginning (2012-01-01)
while i < 16:
    time_between_dates = end_date - start_date
    days_between_dates = time_between_dates.days
    random_number_of_days = random.randrange(days_between_dates)
    random_date = start_date + datetime.timedelta(days=random_number_of_days)
    HireDate_Driver.append(random_date)
    i = i + 1
    
i = 0
while i < 4:
    HireDate_Driver.append(datetime.date(2012, 1, 1))
    i = i + 1
    
# Salary
pula_zarobkow = list(np.linspace(2000, 4600, 14, dtype = int))
Salary_Driver = random.choices(pula_zarobkow, k = 20) 

# Default vehicle
pula_id_pojazdow = list(np.linspace(1, 20, 20, dtype = int))
DefaultVehicle = random.sample(pula_id_pojazdow, k = 20)  # no repetitions

# Driving licence
pula_driving_licence = ["C + E", "C1 + E"]
DrivingLicence = random.choices(pula_driving_licence, k = 20)

# Phone number
pula_nr_telefonu = list(np.linspace(700000001,800000000,5000, dtype = int))
PhoneNumber = random.sample(pula_nr_telefonu, k = 205)  # pool of all phone numbers - no repetitions
PhoneNumber_Driver = PhoneNumber[0:20]

# Country
Country_Driver = ['Polska' for i in range(20)]

# City
pula_miast = ['Wrocław', 'Wrocław', 'Wrocław', 'Wrocław', 'Wrocław', 'Wrocław', 'Wrocław', 'Chrzanów', 'Wrocław', 'Wrocław', 'Wrocław', "Bielany Wrocławskie", "Żórawina", "Siechnice", "Domasław"]
City_Driver = random.choices(pula_miast, k = 20)

# Bank account number: 26 random digits (the length of a Polish account number);
# random.choices draws each digit independently, so digits may repeat
BankAccount_Driver = [''.join(random.choices(string.digits, k=26)) for i in range(20)]

# Creating DataFrame 
Drivers = pd.DataFrame(data = {
    'EmployeeID': DriverID,
    'FirstName': FirstName_Driver,
    'LastName': LastName_Driver,
    'BirthDate': BirthDate_Driver,
    'EducationalBackground': ['Wykształcenie średnie']*20,
    'JobTitle': ['Kierowca']*20,
    'HireDate': HireDate_Driver,
    'Salary': Salary_Driver,
    'DefaultVehicle': DefaultVehicle,
    'DrivingLicence': DrivingLicence,
    'PhoneNumber': PhoneNumber_Driver,
    'Country': Country_Driver,
    'City': City_Driver,
    'BankAccountNumber': BankAccount_Driver,
})
Drivers

| | EmployeeID | FirstName | LastName | BirthDate | EducationalBackground | JobTitle | HireDate | Salary | DefaultVehicle | DrivingLicence | PhoneNumber | Country | City | BankAccountNumber |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 15753 | Kamil | Kalinowski | 1976-10-23 | Wykształcenie średnie | Kierowca | 2014-04-06 | 3400 | 16 | C1 + E | 707001401 | Polska | Wrocław | 17263495808435719026407829 |
| 1 | 13179 | Tomasz | Makowski | 1969-04-13 | Wykształcenie średnie | Kierowca | 2017-03-23 | 2800 | 18 | C1 + E | 734886978 | Polska | Wrocław | 78659321402796815430501987 |
| 2 | 11031 | Krzysztof | Laskowski | 1993-10-15 | Wykształcenie średnie | Kierowca | 2018-08-22 | 2000 | 5 | C1 + E | 765693138 | Polska | Wrocław | 86301247956037941528214583 |
| 3 | 10779 | Sławomir | Maciejewski | 1978-08-26 | Wykształcenie średnie | Kierowca | 2012-02-09 | 3600 | 20 | C + E | 749069814 | Polska | Bielany Wrocławskie | 94305817629725106834397865 |
| 4 | 12721 | Mariusz | Andrzejewski | 1981-04-01 | Wykształcenie średnie | Kierowca | 2019-09-20 | 3600 | 2 | C1 + E | 759411882 | Polska | Wrocław | 24089731652071938654107469 |
| 5 | 18665 | Zbigniew | Sikorski | 1994-12-30 | Wykształcenie średnie | Kierowca | 2016-05-26 | 2800 | 12 | C1 + E | 734526906 | Polska | Siechnice | 09324768519610475823287903 |
| 6 | 19608 | Artur | Sikora | 1974-04-13 | Wykształcenie średnie | Kierowca | 2017-10-17 | 2600 | 4 | C + E | 785337067 | Polska | Wrocław | 06574128396950378124039187 |
| 7 | 15900 | Tomasz | Wójcik | 1975-02-28 | Wykształcenie średnie | Kierowca | 2012-05-05 | 2000 | 6 | C1 + E | 791358271 | Polska | Żórawina | 43718069523198756042826593 |
| 8 | 17098 | Wiesław | Brzeziński | 1989-06-22 | Wykształcenie średnie | Kierowca | 2020-05-30 | 3400 | 19 | C + E | 733766754 | Polska | Wrocław | 71082956343218475690618905 |
| 9 | 15834 | Mieczysław | Jakubowski | 1969-03-09 | Wykształcenie średnie | Kierowca | 2016-10-06 | 2000 | 3 | C1 + E | 728845769 | Polska | Siechnice | 01794358269208641357172845 |

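The same random-date loop is repeated for birth dates, hire dates and service dates. It could be factored into one helper; the sketch below assumes uniform sampling over whole days in `[start, end)`, exactly as the loops do, and `random_dates` is a hypothetical name:

```python
import datetime
import random

def random_dates(start: datetime.date, end: datetime.date, n: int, rng=random):
    """Draw n dates uniformly from the half-open interval [start, end)."""
    span = (end - start).days
    return [start + datetime.timedelta(days=rng.randrange(span)) for _ in range(n)]

# Drivers: 20 birth dates, then 16 random hire dates plus 4 drivers employed from the very beginning
birth_dates = random_dates(datetime.date(1965, 1, 1), datetime.date(1995, 1, 1), 20)
hire_dates = (random_dates(datetime.date(2012, 1, 1), datetime.date(2020, 6, 6), 16)
              + [datetime.date(2012, 1, 1)] * 4)
```

Passing a seeded `random.Random` instance as `rng` makes the generated dates reproducible between runs.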

In [25]:
# Generation of administration employees data
###################################################################################################

# ID
EmployeeID = WorkerID[90:110]

# First name and last name
FirstName_Administration = Imiona_M[60:65] + Imiona_D[30:45]   
LastName_Administration = Nazwiska_M[60:65] + Nazwiska_D[30:45]

# Date of birth
start_date = datetime.date(1965, 1, 1) 
end_date = datetime.date(1995, 1, 1)
BirthDate_Administration= []
i=0
while i < 20:
    time_between_dates = end_date - start_date
    days_between_dates = time_between_dates.days
    random_number_of_days = random.randrange(days_between_dates)
    random_date = start_date + datetime.timedelta(days=random_number_of_days)
    BirthDate_Administration.append(random_date)
    i = i + 1
    
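# Educational background: secondary ("średnie") or higher ("wyższe") education
pula_poziomu_wyksztalcenia = ["Wykształcenie średnie", "Wykształcenie wyższe"]
EducationalBackground = random.choices(pula_poziomu_wyksztalcenia, k = 20)

# Job title. The 15 titles below (accountant x5, chief accountant, secretariat employee x2,
# board member x2, archivist, transport assistant x3, IT specialist) are shuffled;
# the last 5 titles go to the employees who have worked since the very beginning.
stanowiska = ["Księgowy"]*5 + ["Główny księgowy"] + ["Pracownik sekretariatu"]*2 + ["Członek zarządu"]*2 + ["Archiwista"] + ["Asystent ds. transportu"]*3 + ["Informatyk"]
JobTitle = random.sample(stanowiska, k = 15)  # k equals the list length, so this is a random permutation
JobTitle = JobTitle + ["Księgowy","Pracownik sekretariatu","Członek zarządu","Manager ds. transportu","Informatyk"]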

# Date of hiring
start_date = datetime.date(2012, 1, 1) 
end_date = datetime.date(2020, 6, 6)
HireDate_Administration = []
i=0
# 15 employees are hired on random dates; the remaining 5 have worked since the very beginning (2012-01-01)
while i < 15:
    time_between_dates = end_date - start_date
    days_between_dates = time_between_dates.days
    random_number_of_days = random.randrange(days_between_dates)
    random_date = start_date + datetime.timedelta(days=random_number_of_days)
    HireDate_Administration.append(random_date)
    i = i + 1
    
i = 0
while i < 5:
    HireDate_Administration.append(datetime.date(2012, 1, 1))
    i = i + 1
    
# Salary depends on the job title
salary_by_title = {
    "Księgowy": 4000,                 # accountant
    "Pracownik sekretariatu": 3500,   # secretariat employee
    "Członek zarządu": 10000,         # board member
    "Archiwista": 3000,               # archivist
    "Manager ds. transportu": 6000,   # transport manager
    "Asystent ds. transportu": 4000,  # transport assistant
    "Informatyk": 7000,               # IT specialist
    "Główny księgowy": 7000,          # chief accountant
}
Salary_Administration = [salary_by_title[title] for title in JobTitle]
        
# Phone number
PhoneNumber_Administration = PhoneNumber[90:110]

# Country
Country_Administration = ['Polska' for i in range(20)]

# City
City_Administration = random.choices(pula_miast, k = 20)

# Bank account number: 26 random digits (the length of a Polish account number)
bank_administration = [''.join(random.choices(string.digits, k=26)) for i in range(20)]

# DataFrame
Administration = pd.DataFrame(data={
    'EmployeeID': EmployeeID,
    'FirstName': FirstName_Administration,
    'LastName': LastName_Administration,
    'BirthDate': BirthDate_Administration,
    'EducationalBackground': EducationalBackground,
    'JobTitle': JobTitle,
    'HireDate': HireDate_Administration,
    'Salary': Salary_Administration,
    'DefaultVehicle': ['NULL']*20,
    'DrivingLicence': ['NULL']*20,
    'PhoneNumber': PhoneNumber_Administration,
    'Country': Country_Administration,
    'City': City_Administration,
    'BankAccountNumber': bank_administration,
})
Administration

| | EmployeeID | FirstName | LastName | BirthDate | EducationalBackground | JobTitle | HireDate | Salary | DefaultVehicle | DrivingLicence | PhoneNumber | Country | City | BankAccountNumber |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11510 | Rafał | Gajewski | 1972-12-15 | Wykształcenie średnie | Asystent ds. transportu | 2015-01-11 | 4000 | | | 727845569 | Polska | Chrzanów | 53870926413246709581769342 |
| 1 | 17001 | Tomasz | Szymański | 1968-06-26 | Wykształcenie średnie | Księgowy | 2019-04-21 | 4000 | | | 708441689 | Polska | Wrocław | 78201359642456910873382956 |
| 2 | 14974 | Tadeusz | Michalski | 1991-10-27 | Wykształcenie wyższe | Księgowy | 2015-09-26 | 4000 | | | 723984797 | Polska | Wrocław | 14865723097615034289057986 |
| 3 | 16734 | Andrzej | Kozłowski | 1983-04-08 | Wykształcenie wyższe | Członek zarządu | 2018-06-23 | 10000 | | | 757291458 | Polska | Bielany Wrocławskie | 45917236808042593761341907 |
| 4 | 17406 | Dariusz | Wasilewski | 1978-03-15 | Wykształcenie wyższe | Asystent ds. transportu | 2014-08-13 | 4000 | | | 799079815 | Polska | Chrzanów | 95381720462675841930764239 |
| 5 | 18618 | Natalia | Sawicka | 1975-10-22 | Wykształcenie średnie | Księgowy | 2013-05-13 | 4000 | | | 717383477 | Polska | Wrocław | 43782950617234156809816453 |
| 6 | 19784 | Magdalena | Czerwińska | 1973-04-02 | Wykształcenie wyższe | Księgowy | 2016-03-03 | 4000 | | | 746209242 | Polska | Wrocław | 23507618495486921073198320 |
| 7 | 18093 | Monika | Gajewska | 1992-12-10 | Wykształcenie wyższe | Księgowy | 2014-02-11 | 4000 | | | 755511102 | Polska | Wrocław | 67420951382187645093640382 |
| 8 | 18908 | Genowefa | Sobczak | 1985-11-08 | Wykształcenie wyższe | Pracownik sekretariatu | 2014-02-15 | 3500 | | | 772154431 | Polska | Wrocław | 37908651420794631528218057 |
| 9 | 18075 | Marta | Michalska | 1984-01-17 | Wykształcenie średnie | Główny księgowy | 2013-12-07 | 7000 | | | 797819563 | Polska | Chrzanów | 32451987609618572340154207 |

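The account numbers above are random digits, so they would fail a checksum test. A Polish account number (NRB) has 26 digits: two check digits followed by 24 digits, where the check digits follow the same mod-97 rule as an IBAN with country code `PL`. If valid-looking numbers are wanted, a sketch could compute them as follows; the helper names are hypothetical, and the sketch does not model the internal structure of the 24 digits (such as real bank routing numbers):

```python
import random
import string

# Letters of the country code "PL" converted to digits as in IBAN (ISO 13616): P=25, L=21
PL_DIGITS = "2521"

def nrb_check_digits(bban: str) -> str:
    """Two mod-97 check digits for a 24-digit Polish account body."""
    remainder = int(bban + PL_DIGITS + "00") % 97
    return f"{98 - remainder:02d}"

def random_nrb(rng=random) -> str:
    """Random 26-digit account number: 2 check digits followed by 24 random digits."""
    bban = "".join(rng.choices(string.digits, k=24))
    return nrb_check_digits(bban) + bban

def is_valid_nrb(nrb: str) -> bool:
    """Mod-97 check: the digits rearranged as body + 'PL' + check digits leave remainder 1."""
    if len(nrb) != 26 or not nrb.isdigit():
        return False
    return int(nrb[2:] + PL_DIGITS + nrb[:2]) % 97 == 1

accounts = [random_nrb() for _ in range(20)]
```

Prefixing such a number with `PL` gives the corresponding IBAN form.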

In [26]:
# Concatenate the two DataFrames into one Employees table
###################################################################################################

Employees = pd.concat([Drivers, Administration], ignore_index=True)
Employees

| | EmployeeID | FirstName | LastName | BirthDate | EducationalBackground | JobTitle | HireDate | Salary | DefaultVehicle | DrivingLicence | PhoneNumber | Country | City | BankAccountNumber |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 15753 | Kamil | Kalinowski | 1976-10-23 | Wykształcenie średnie | Kierowca | 2014-04-06 | 3400 | 16.0 | C1 + E | 707001401 | Polska | Wrocław | 17263495808435719026407829 |
| 1 | 13179 | Tomasz | Makowski | 1969-04-13 | Wykształcenie średnie | Kierowca | 2017-03-23 | 2800 | 18.0 | C1 + E | 734886978 | Polska | Wrocław | 78659321402796815430501987 |
| 2 | 11031 | Krzysztof | Laskowski | 1993-10-15 | Wykształcenie średnie | Kierowca | 2018-08-22 | 2000 | 5.0 | C1 + E | 765693138 | Polska | Wrocław | 86301247956037941528214583 |
| 3 | 10779 | Sławomir | Maciejewski | 1978-08-26 | Wykształcenie średnie | Kierowca | 2012-02-09 | 3600 | 20.0 | C + E | 749069814 | Polska | Bielany Wrocławskie | 94305817629725106834397865 |
| 4 | 12721 | Mariusz | Andrzejewski | 1981-04-01 | Wykształcenie średnie | Kierowca | 2019-09-20 | 3600 | 2.0 | C1 + E | 759411882 | Polska | Wrocław | 24089731652071938654107469 |
| 5 | 18665 | Zbigniew | Sikorski | 1994-12-30 | Wykształcenie średnie | Kierowca | 2016-05-26 | 2800 | 12.0 | C1 + E | 734526906 | Polska | Siechnice | 09324768519610475823287903 |
| 6 | 19608 | Artur | Sikora | 1974-04-13 | Wykształcenie średnie | Kierowca | 2017-10-17 | 2600 | 4.0 | C + E | 785337067 | Polska | Wrocław | 06574128396950378124039187 |
| 7 | 15900 | Tomasz | Wójcik | 1975-02-28 | Wykształcenie średnie | Kierowca | 2012-05-05 | 2000 | 6.0 | C1 + E | 791358271 | Polska | Żórawina | 43718069523198756042826593 |
| 8 | 17098 | Wiesław | Brzeziński | 1989-06-22 | Wykształcenie średnie | Kierowca | 2020-05-30 | 3400 | 19.0 | C + E | 733766754 | Polska | Wrocław | 71082956343218475690618905 |
| 9 | 15834 | Mieczysław | Jakubowski | 1969-03-09 | Wykształcenie średnie | Kierowca | 2016-10-06 | 2000 | 3.0 | C1 + E | 728845769 | Polska | Siechnice | 01794358269208641357172845 |


## Generation of the Clients table

This table contains information about the clients: client ID, first name, last name, phone number and discount. The discount is stored as a price multiplier: $1$ means no discount, while regular customers have a value below $1$ ($0.9$ or $0.8$).

In [27]:
# ID
ClientID = WorkerID[110:186]

# First name and last name
FirstName_Client = Imiona_M[65:93] + Imiona_D[45:93]   
LastName_Client = Nazwiska_M[65:93] + Nazwiska_D[45:93]

# Phone number
PhoneNumber_Client = PhoneNumber[110:186]

# Discount
pula_discount = [1, 1, 1, 1, 1, 0.9, 0.8]
Discount = random.choices(pula_discount, k = 76)

# DataFrame
Clients = pd.DataFrame(data={'ClientID':ClientID,'FirstName':FirstName_Client,'LastName':LastName_Client,'PhoneNumber':PhoneNumber_Client,'Discount':Discount})
Clients

| | ClientID | FirstName | LastName | PhoneNumber | Discount |
|---|---|---|---|---|---|
| 0 | 11502 | Jerzy | Wojciechowski | 739887978 | 1.0 |
| 1 | 16160 | Zdzisław | Sokołowski | 716283257 | 1.0 |
| 2 | 19189 | Adam | Mazur | 766093218 | 1.0 |
| 3 | 19722 | Dariusz | Jasiński | 770774155 | 1.0 |
| 4 | 18347 | Piotr | Sawicki | 741648330 | 0.9 |
| ... | ... | ... | ... | ... | ... |
| 71 | 13373 | Danuta | Malinowska | 706901381 | 1.0 |
| 72 | 13823 | Marianna | Jabłońska | 772694539 | 1.0 |
| 73 | 11135 | Karolina | Włodarczyk | 758531706 | 1.0 |
| 74 | 15043 | Renata | Wróblewska | 714542909 | 1.0 |

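`random.choices` over `[1, 1, 1, 1, 1, 0.9, 0.8]` gives each multiplier a probability proportional to how often it appears in the pool (5/7, 1/7, 1/7). The same distribution can be expressed with explicit weights, and regular customers can then be selected with a boolean mask. A sketch, with `draw_discounts` as a hypothetical helper and a toy DataFrame standing in for `Clients`:

```python
import random
import pandas as pd

def draw_discounts(n, rng=random):
    """Discount multipliers: 1 (no discount) with weight 5, 0.9 and 0.8 with weight 1 each."""
    return rng.choices([1, 0.9, 0.8], weights=[5, 1, 1], k=n)

discounts = draw_discounts(7000, rng=random.Random(0))
share_regular = sum(d < 1 for d in discounts) / len(discounts)  # close to 2/7

# Regular customers are the rows whose multiplier is below 1
clients = pd.DataFrame({"ClientID": [1, 2, 3], "Discount": [1.0, 0.9, 0.8]})
regular = clients[clients["Discount"] < 1]
```

Explicit weights make the intended proportions visible without having to count repeated entries in the pool.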

## Generation of the Vehicles table

This table contains information about the vehicles owned by the company, such as: vehicle ID, make of the vehicle, model of the vehicle, licence plate number, capacity, last service date and insurance number.

In [28]:
# VehicleID: reuse DefaultVehicle, so row i of Vehicles is the default vehicle of driver i
VehicleID = DefaultVehicle

# Vehicle Make
pula_tirow = ["Mercedes-Benz", "MAN", "Volvo"]
Vehicle_Make = random.choices(pula_tirow, k=20)
pula_mercedes = ["Actros 1851 LS 4x2", "Actros 1845 LS 4x2", "Actros 1848 LSnRL", "Actros 1851 MirrorCam"]
pula_volvo = ["FH 500", "FH 500 4x2 Low Liner", "FH 460 4x2"]
pula_man = ["TGX 26.440 6X2 BLS", "TGX 18.500 LLS-U", "TGX 18.500 4X2 LLS-U", "TGX 18.560 4X2 BLS"]

# Vehicle Model
Vehicle_Model = []
for i in range(20):
    if Vehicle_Make[i] == "Mercedes-Benz":
        Vehicle_Model.append(random.choice(pula_mercedes))
    elif Vehicle_Make[i] == "MAN":
        Vehicle_Model.append(random.choice(pula_man))
    elif Vehicle_Make[i] == "Volvo":
        Vehicle_Model.append(random.choice(pula_volvo))

# Licence Plate
pula_LP1 = ["DWR", "DW"]
LP_1 = random.choices(pula_LP1, k=20)
pula_LP2 = list(np.linspace(1000,9999,600,dtype=int))
LP_2 = random.sample(pula_LP2, k=20)
pula_LP3 = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P"]
LP_3 = random.choices(pula_LP3, k=20)

Tablica = [i + str(j) + k for i, j, k in zip(LP_1, LP_2, LP_3)] 

# Capacity
pula_capacity = ["120m3", "86m3", "60m3"]
Capacity = random.choices(pula_capacity, k=20)

# Last Service Date
start_date = datetime.date(2019, 7, 1) 
end_date = datetime.date(2020, 6, 1)
Service_Date = []
i=0
while i < 20:
    time_between_dates = end_date - start_date
    days_between_dates = time_between_dates.days
    random_number_of_days = random.randrange(days_between_dates)
    random_date = start_date + datetime.timedelta(days=random_number_of_days)
    Service_Date.append(random_date)
    i = i + 1

# Insurance number: 10 random digits (random.choices allows repeated digits)
polisa=[]
for i in range(20):
    polisa.append(''.join(random.choices(string.digits, k=10)))
    

# DataFrame
Vehicles = pd.DataFrame(data={'VehicleID':VehicleID,'VehicleMake':Vehicle_Make,'VehicleModel':Vehicle_Model,'LicencePlate':Tablica,'Capacity':Capacity,'LastServiceDate':Service_Date,'InsuranceNumber':polisa})
Vehicles

| | VehicleID | VehicleMake | VehicleModel | LicencePlate | Capacity | LastServiceDate | InsuranceNumber |
|---|---|---|---|---|---|---|---|
| 0 | 16 | Mercedes-Benz | Actros 1851 MirrorCam | DW4665L | 60m3 | 2019-11-17 | 1854327069 |
| 1 | 18 | Volvo | FH 500 | DWR4004L | 60m3 | 2019-09-25 | 1357940826 |
| 2 | 5 | Mercedes-Benz | Actros 1851 LS 4x2 | DW1961N | 60m3 | 2020-02-11 | 5629108743 |
| 3 | 20 | Mercedes-Benz | Actros 1851 MirrorCam | DWR3553D | 120m3 | 2019-12-26 | 5406182397 |
| 4 | 2 | Volvo | FH 500 4x2 Low Liner | DWR2877D | 86m3 | 2019-07-13 | 3196087452 |
| 5 | 12 | Volvo | FH 500 4x2 Low Liner | DW1165H | 86m3 | 2019-08-16 | 8502913476 |
| 6 | 4 | Volvo | FH 500 4x2 Low Liner | DWR2171P | 120m3 | 2020-05-23 | 792816534 |
| 7 | 6 | Mercedes-Benz | Actros 1851 LS 4x2 | DWR2847B | 120m3 | 2020-05-21 | 8645309127 |
| 8 | 19 | Volvo | FH 460 4x2 | DW3433K | 86m3 | 2019-12-07 | 3820951467 |
| 9 | 3 | Mercedes-Benz | Actros 1851 LS 4x2 | DWR7535C | 120m3 | 2019-11-04 | 1492307658 |

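The `if`/`elif` chain that picks a model for each make can be replaced by a dictionary lookup, which keeps the make-to-model mapping in one place. A sketch with the same model pools; `MODELS_BY_MAKE` and `random_fleet` are hypothetical names:

```python
import random

# Model pools per make, copied from the cell above
MODELS_BY_MAKE = {
    "Mercedes-Benz": ["Actros 1851 LS 4x2", "Actros 1845 LS 4x2", "Actros 1848 LSnRL", "Actros 1851 MirrorCam"],
    "MAN": ["TGX 26.440 6X2 BLS", "TGX 18.500 LLS-U", "TGX 18.500 4X2 LLS-U", "TGX 18.560 4X2 BLS"],
    "Volvo": ["FH 500", "FH 500 4x2 Low Liner", "FH 460 4x2"],
}

def random_fleet(n, rng=random):
    """Pick n makes uniformly, then a model from each make's own pool."""
    makes = rng.choices(list(MODELS_BY_MAKE), k=n)
    models = [rng.choice(MODELS_BY_MAKE[make]) for make in makes]
    return makes, models

makes, models = random_fleet(20)
```

Adding a new make then only requires a new dictionary entry rather than another branch.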

## Generation of the Contractors table

The Contractors table stores information about the companies the business works with (suppliers of fuel, office supplies, office cleaning and insurance): contractor ID, contractor name, phone number and city.

In [29]:
# ID
ContractorID = random.sample(list(np.linspace(1, 10, 10, dtype = int)), k = 4)

# Contractor Name
ContractorName = ["Kopciuszek-Pol", "ExtraPaliwex", "Papier i Pióro", "PZU"]

# Phone Number
PhoneNumber_Contractors = PhoneNumber[201:205] 

# City
City_Contractors = random.choices(pula_miast, k = 3)
City_Contractors.append("Wrocław")

# DataFrame
Contractors = pd.DataFrame(data={'ContractorID':ContractorID,'ContractorName':ContractorName,'PhoneNumber':PhoneNumber_Contractors,'City':City_Contractors})
Contractors

| | ContractorID | ContractorName | PhoneNumber | City |
|---|---|---|---|---|
| 0 | 2 | Kopciuszek-Pol | 716903381 | Żórawina |
| 1 | 3 | ExtraPaliwex | 711782357 | Wrocław |
| 2 | 4 | Papier i Pióro | 724884977 | Wrocław |
| 3 | 5 | PZU | 702640529 | Wrocław |


## Generation of the Commissions table

This table registers the monthly payments for commissioned work. Each record has its own ID, the contractor's ID and name, the commission date, the maturity date and the price. The prices of fuel and insurance depend on the number of vehicles the company has at the time; since each driver has one default vehicle, this is counted as the number of drivers hired so far.

In [30]:
# CommissionID
pula_id_comm = list(np.linspace(1000,2000,1001,dtype=int))
ilosc_miesiecy = 12*8+6  # number of months from January 2012 to June 2020
ilosc_transakcji = 4*ilosc_miesiecy  # each of the 4 contractors is paid once a month
CommissionID = random.sample(pula_id_comm, k = ilosc_transakcji)

# ContractorID, CommissionDate, ContractorName: one commission per contractor per month,
# dated the 1st (Contractors rows 0 and 3) or the 2nd (rows 1 and 2) of the month
CommissionDate = []
ContractorName = []
ContractorID = []
billing_day = {0: 1, 1: 2, 2: 2, 3: 1}  # Contractors row -> day of the month
for row, day in billing_day.items():
    for i in range(ilosc_miesiecy):
        CommissionDate.append(datetime.date(2012, 1, day) + relativedelta(months=i))
        ContractorID.append(Contractors.loc[row]['ContractorID'])
        ContractorName.append(Contractors.loc[row]['ContractorName'])

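# Price
# Count the drivers hired before the first day of the following month. CommissionDate[0:102]
# holds the monthly dates of the first contractor, so CommissionDate[i+1] is the first day of month i+1.
num_hired_monthly = []
for i in range(ilosc_miesiecy - 1):
    num_hired_monthly.append(sum(j < CommissionDate[i+1] for j in HireDate_Driver))
num_hired_monthly.append(num_hired_monthly[-1])  # the last month reuses the previous count

# Prices in contractor order: Kopciuszek-Pol and Papier i Pióro are flat monthly fees,
# ExtraPaliwex (fuel) and PZU (insurance) are charged per driver
CommissionPrice = [3000]*ilosc_miesiecy + [x*8775 for x in num_hired_monthly] + [250]*ilosc_miesiecy + [x*250 for x in num_hired_monthly]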

# Maturity Date
MaturityDate = [x+relativedelta(days=3) for x in CommissionDate]

# DataFrame
Commissions = pd.DataFrame(data={'CommissionID':CommissionID,'ContractorID':ContractorID,'ContractorName':ContractorName,'CommissionDate':CommissionDate,'CommissionPrice':CommissionPrice,'MaturityDate':MaturityDate})
Commissions = Commissions.sort_values(by = 'CommissionDate')
Commissions

| | CommissionID | ContractorID | ContractorName | CommissionDate | CommissionPrice | MaturityDate |
|---|---|---|---|---|---|---|
| 0 | 1086 | 2 | Kopciuszek-Pol | 2012-01-01 | 3000 | 2012-01-04 |
| 306 | 1181 | 5 | PZU | 2012-01-01 | 1000 | 2012-01-04 |
| 204 | 1710 | 4 | Papier i Pióro | 2012-01-02 | 250 | 2012-01-05 |
| 102 | 1010 | 3 | ExtraPaliwex | 2012-01-02 | 35100 | 2012-01-05 |
| 1 | 1802 | 2 | Kopciuszek-Pol | 2012-02-01 | 3000 | 2012-02-04 |
| ... | ... | ... | ... | ... | ... | ... |
| 202 | 1580 | 3 | ExtraPaliwex | 2020-05-02 | 175500 | 2020-05-05 |
| 407 | 1530 | 5 | PZU | 2020-06-01 | 5000 | 2020-06-04 |
| 101 | 1352 | 2 | Kopciuszek-Pol | 2020-06-01 | 3000 | 2020-06-04 |
| 305 | 1792 | 4 | Papier i Pióro | 2020-06-02 | 250 | 2020-06-05 |

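A worked example of the per-driver prices: fuel costs 8775 and insurance 250 for every driver hired before the first day of the following month. With four drivers employed since 2012-01-01, January 2012 gives 4 × 8775 = 35100 for fuel and 4 × 250 = 1000 for insurance, matching the ExtraPaliwex and PZU rows above. A sketch of the counting rule, with hypothetical helper names and a toy list of hire dates:

```python
import datetime
from dateutil.relativedelta import relativedelta

FUEL_PER_DRIVER = 8775      # monthly fuel price per driver, as in CommissionPrice
INSURANCE_PER_DRIVER = 250  # monthly insurance price per driver

def drivers_on_staff(hire_dates, month_start):
    """Count drivers hired before the first day of the month after month_start."""
    next_month = month_start + relativedelta(months=1)
    return sum(d < next_month for d in hire_dates)

def variable_costs(hire_dates, month_start):
    """Return (fuel, insurance) for the month starting at month_start."""
    n = drivers_on_staff(hire_dates, month_start)
    return n * FUEL_PER_DRIVER, n * INSURANCE_PER_DRIVER

hires = [datetime.date(2012, 1, 1)] * 4 + [datetime.date(2012, 2, 9)]
print(variable_costs(hires, datetime.date(2012, 1, 1)))  # (35100, 1000)
print(variable_costs(hires, datetime.date(2012, 2, 1)))  # (43875, 1250)
```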

## Generation of the Order details table

This table contains detailed information about all orders taken while the company has been active. Each record holds the order ID, the assigned driver ID, client ID and vehicle ID, the weight of the cargo, the pickup and delivery addresses, the price and cost of the delivery, the order date, the pickup date, the expected delivery date and the maturity date. Orders are generated separately for each driver, starting from the driver's hire date.

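A worked example of how each order's price and cost are derived in the loop below: the price depends on the truck's capacity and the cargo weight, and the cost is 20% of the price plus 200 for each day of the trip, truncated to an integer. For the first order in the output below (20t on a 120m3 truck, picked up 2012-01-01 and due 2012-01-04, so a 3-day trip) this gives a price of 9600 and a cost of 0.2 × 9600 + 3 × 200 = 2520. A sketch with a hypothetical helper:

```python
# Price of one order by vehicle capacity and cargo weight (values from the loop below)
ORDER_PRICES = {
    "120m3": {"20t": 9600, "22t": 10400, "24t": 11200},
    "86m3": {"10t": 5600, "12t": 6400, "14t": 7200},
    "60m3": {"5t": 3200, "7t": 4000, "8t": 4800},
}

def order_price_and_cost(capacity, weight, trip_days):
    """Return (price, cost); cost = 20% of the price + 200 per trip day, truncated to int."""
    price = ORDER_PRICES[capacity][weight]
    cost = int(price * 0.2 + trip_days * 200)
    return price, cost

print(order_price_and_cost("120m3", "20t", 3))  # (9600, 2520)
```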
In [31]:
OrderID = [] 
DriverID_ord = [] 
ClientID_ord = [] 
VehicleID_ord = []
OrderWeight = [] 
OrderDate = [] 
PickupDate = [] 
ExpectedDeliveryDate = [] 
PickupAddress = []
DeliveryAddress = []
Price_ord = [] 
Cost_ord = [] 
MaturityDate_ord = [] 

# Price of a single order, by vehicle capacity and cargo weight
ORDER_PRICES = {
    '120m3': {'20t': 1200*8, '22t': 1300*8, '24t': 1400*8},
    '86m3': {'10t': 700*8, '12t': 800*8, '14t': 900*8},
    '60m3': {'5t': 400*8, '7t': 500*8, '8t': 600*8},
}

for i in range(20):
    # Days between the driver's hire date and the end of the data period
    num_of_days = (datetime.date(2020, 6, 6) - HireDate_Driver[i]).days
    j = 0       # day offset of the current slot, counted from the hire date
    urlop = 1   # slot counter; every third slot is a break ("urlop" = leave)
    while j < num_of_days:
        if (urlop % 3) != 0:
            k = random.randint(2, 5)  # trip duration in days
            PickupDate.append(HireDate_Driver[i] + relativedelta(days=j))
            ExpectedDeliveryDate.append(HireDate_Driver[i] + relativedelta(days=j + k))
            DriverID_ord.append(DriverID[i])
            VehicleID_ord.append(DefaultVehicle[i])
            # Vehicle row i belongs to driver i, so Capacity[i] is the capacity of this driver's truck
            prices = ORDER_PRICES[Capacity[i]]
            OrderWeight.append(random.choice(list(prices)))
            Price_ord.append(prices[OrderWeight[-1]])
            # Cost: 20% of the price plus 200 for each day of the trip
            Cost_ord.append(int(Price_ord[-1]*0.2 + k*200))
        urlop = urlop + 1
        j = j + k + 1  # one additional day between orders; a break reuses the previous k
    # Discard the driver's last order, whose delivery could fall after 2020-06-06
    for column in (PickupDate, ExpectedDeliveryDate, DriverID_ord, VehicleID_ord, OrderWeight, Price_ord, Cost_ord):
        column.pop()
    
# OrderID
OrderID =  np.linspace(1,len(DriverID_ord),len(DriverID_ord), dtype=int)

# OrderDate: assume each order is placed one to three days before pickup
for i in range(len(PickupDate)):
    OrderDate.append(PickupDate[i] - relativedelta(days=random.randint(1,3)))

# MaturityDate
MaturityDate_ord = [x+relativedelta(days=7) for x in OrderDate]

# ClientID
ClientID_ord = random.choices(ClientID, k = len(PickupDate))

# Pickup and delivery address
pula_miast_2 = ['Wrocław','Wałbrzych', 'Bielany Wrocławskie','Jelenia Góra','Lublin','Głogów','Świdnica', 'Bolesławiec','Oleśnica','Oława','Świebodzice','Kłodzko','Jawor','Polkowice','Nowa Ruda', 'Złotoryja','Strzelin','Milicz','Sobótka','Lwówek Śląski','Kobierzyce','Żarów','Twardogóra','Szklarska Poręba','Głuszyca']
pula_ulic = ['Polna','Leśna','Słoneczna','Krótka','Szkolna','Ogrodowa','Lipowa','Łąkowa','Brzozowa','Kwiatowa','Kościelna','Sosnowa','Zielona','Parkowa','Akacjowa','Kolejowa','Wietrzna','Krakowska','Katowicka','Zimowa','Wiosenna','Dębowa','Jastrzębia','Piaskowa','Sportowa','Tyniecka','Orla']
for i in range(len(PickupDate)):
    miasta = random.sample(pula_miast_2, k = 2)
    PickupAddress.append(miasta[0]+', ul. '+random.choice(pula_ulic)+' '+str(random.randint(1,70)))
    DeliveryAddress.append(miasta[1]+', ul. '+random.choice(pula_ulic)+' '+str(random.randint(1,70)))

# DataFrame
OrderDetails = pd.DataFrame(data = {
    'OrderID': OrderID,
    'ClientID': ClientID_ord,
    'DriverID': DriverID_ord,
    'VehicleID': VehicleID_ord,
    'OrderWeight': OrderWeight,
    'OrderDate': OrderDate,
    'PickupDate': PickupDate,
    'ExpectedDeliveryDate': ExpectedDeliveryDate,
    'PickupAddress': PickupAddress,
    'DeliveryAddress': DeliveryAddress,
    'Price': Price_ord,
    'Cost': Cost_ord,
    'MaturityDate': MaturityDate_ord,
})
OrderDetails = OrderDetails.sort_values(by = 'OrderDate')
OrderDetails


| | OrderID | ClientID | DriverID | VehicleID | OrderWeight | OrderDate | PickupDate | ExpectedDeliveryDate | PickupAddress | DeliveryAddress | Price | Cost | MaturityDate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3707 | 3708 | 13265 | 16775 | 14 | 20t | 2011-12-29 | 2012-01-01 | 2012-01-04 | Twardogóra, ul. Orla 39 | Strzelin, ul. Wietrzna 14 | 9600 | 2520 | 2012-01-05 |
| 5074 | 5075 | 14434 | 16535 | 17 | 12t | 2011-12-30 | 2012-01-01 | 2012-01-03 | Nowa Ruda, ul. Kolejowa 1 | Oława, ul. Zimowa 47 | 6400 | 1680 | 2012-01-06 |
| 4160 | 4161 | 18347 | 15741 | 1 | 5t | 2011-12-30 | 2012-01-01 | 2012-01-04 | Jelenia Góra, ul. Orla 62 | Kłodzko, ul. Sportowa 8 | 3200 | 1240 | 2012-01-06 |
| 4617 | 4618 | 18686 | 10439 | 11 | 12t | 2011-12-31 | 2012-01-01 | 2012-01-04 | Żarów, ul. Akacjowa 69 | Nowa Ruda, ul. Parkowa 7 | 6400 | 1880 | 2012-01-07 |
| 4161 | 4162 | 17950 | 15741 | 1 | 8t | 2012-01-02 | 2012-01-05 | 2012-01-10 | Głuszyca, ul. Tyniecka 54 | Oława, ul. Kwiatowa 2 | 4800 | 1960 | 2012-01-09 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1885 | 1886 | 15763 | 15900 | 6 | 24t | 2020-05-28 | 2020-05-30 | 2020-06-04 | Polkowice, ul. Wiosenna 42 | Głogów, ul. Kwiatowa 5 | 11200 | 3240 | 2020-06-04 |
| 1092 | 1093 | 15018 | 12721 | 2 | 12t | 2020-05-28 | 2020-05-30 | 2020-06-01 | Jelenia Góra, ul. Kolejowa 39 | Milicz, ul. Kolejowa 18 | 6400 | 1680 | 2020-06-04 |
| 2504 | 2505 | 13489 | 14563 | 8 | 7t | 2020-05-30 | 2020-05-31 | 2020-06-04 | Głuszyca, ul. Wietrzna 31 | Świebodzice, ul. Parkowa 61 | 4000 | 1600 | 2020-06-06 |
| 4616 | 4617 | 16667 | 15741 | 1 | 5t | 2020-05-31 | 2020-06-02 | 2020-06-04 | Złotoryja, ul. Polna 10 | Żarów, ul. Kwiatowa 21 | 3200 | 1040 | 2020-06-07 |

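The scheduling rule inside the order loop can be isolated as a generator: starting on the hire date, each trip lasts 2 to 5 days, the next slot starts one day after the previous one ends, and every third slot is a break (`urlop`) whose length reuses the previous trip's duration. A sketch of that rule, with `trip_schedule` as a hypothetical helper; unlike the notebook, it does not drop the driver's last trip:

```python
import datetime
import random

def trip_schedule(hire_date, end_date, rng=random):
    """Yield (pickup_date, expected_delivery_date) pairs for one driver.

    Every third slot is a break; its length reuses the previous trip's duration k.
    """
    days_total = (end_date - hire_date).days
    day, slot, k = 0, 1, 0
    while day < days_total:
        if slot % 3 != 0:
            k = rng.randint(2, 5)  # trip duration in days
            pickup = hire_date + datetime.timedelta(days=day)
            yield pickup, pickup + datetime.timedelta(days=k)
        slot += 1
        day += k + 1  # the next slot starts one day after the current one ends

trips = list(trip_schedule(datetime.date(2012, 1, 1), datetime.date(2012, 4, 1), random.Random(7)))
```

Because the first slot is always a trip, `k` is set before any break reuses it.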

## Generation of the Orders table

This table contains a condensed view of each order together with the ID of the corresponding incoming transaction.

In [32]:
# INTransactionID: each order has exactly one incoming payment, so the transaction IDs equal the order IDs
INTransactionID = OrderID

# Dataframe
Orders = pd.DataFrame(data={'OrderID':OrderID,'VehicleID':VehicleID_ord, 'DriverID':DriverID_ord, 'ClientID':ClientID_ord, 'INTransactionID':INTransactionID})
Orders

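| | OrderID | VehicleID | DriverID | ClientID | INTransactionID |
|---|---|---|---|---|---|
| 0 | 1 | 16 | 15753 | 13373 | 1 |
| 1 | 2 | 16 | 15753 | 19189 | 2 |
| 2 | 3 | 16 | 15753 | 16736 | 3 |
| 3 | 4 | 16 | 15753 | 14248 | 4 |
| 4 | 5 | 16 | 15753 | 12725 | 5 |
| ... | ... | ... | ... | ... | ... |
| 5524 | 5525 | 17 | 16535 | 19882 | 5525 |
| 5525 | 5526 | 17 | 16535 | 11502 | 5526 |
| 5526 | 5527 | 17 | 16535 | 16434 | 5527 |
| 5527 | 5528 | 17 | 16535 | 14434 | 5528 |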


## Generation of the Incoming transactions table

This table contains the transactions corresponding to the payments for orders: transaction ID, transaction type, order ID, transactor ID (the client ID), the amount paid and the transaction date. Each order is assumed to be paid in full on its maturity date.

In [33]:
# TransactionType: 'Opłata za przewóz' (transport fee) for every order
TransactionType = ['Opłata za przewóz']*len(OrderID)

# DataFrame
Incoming_Transactions = pd.DataFrame(data={'INTransactionID':INTransactionID,'TransactionType':TransactionType,'OrderID':OrderID,'TransactorID':ClientID_ord,'Amount':Price_ord,'TransactionDate':MaturityDate_ord})
Incoming_Transactions = Incoming_Transactions.sort_values(by='TransactionDate')
Incoming_Transactions

| | INTransactionID | TransactionType | OrderID | TransactorID | Amount | TransactionDate |
|---|---|---|---|---|---|---|
| 3707 | 3708 | Opłata za przewóz | 3708 | 13265 | 9600 | 2012-01-05 |
| 5074 | 5075 | Opłata za przewóz | 5075 | 14434 | 6400 | 2012-01-06 |
| 4160 | 4161 | Opłata za przewóz | 4161 | 18347 | 3200 | 2012-01-06 |
| 4617 | 4618 | Opłata za przewóz | 4618 | 18686 | 6400 | 2012-01-07 |
| 4161 | 4162 | Opłata za przewóz | 4162 | 17950 | 4800 | 2012-01-09 |
| ... | ... | ... | ... | ... | ... | ... |
| 1885 | 1886 | Opłata za przewóz | 1886 | 15763 | 11200 | 2020-06-04 |
| 1092 | 1093 | Opłata za przewóz | 1093 | 15018 | 6400 | 2020-06-04 |
| 2504 | 2505 | Opłata za przewóz | 2505 | 13489 | 4000 | 2020-06-06 |
| 4616 | 4617 | Opłata za przewóz | 4617 | 16667 | 3200 | 2020-06-07 |

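Since every order is assumed to be paid in full on its maturity date, the order details and the incoming transactions can be cross-checked. A sketch using `DataFrame.merge` with `validate="one_to_one"`, which raises `pandas.errors.MergeError` if an order ID appears more than once on either side; `payment_mismatches` is a hypothetical helper and the toy frames only mimic the relevant columns:

```python
import datetime
import pandas as pd

def payment_mismatches(order_details: pd.DataFrame, incoming: pd.DataFrame) -> list:
    """Return IDs of orders that are unpaid, paid a different amount, or paid on another date."""
    joined = order_details.merge(incoming, on="OrderID", how="left", validate="one_to_one")
    bad = (
        joined["Amount"].isna()
        | (joined["Amount"] != joined["Price"])
        | (joined["TransactionDate"] != joined["MaturityDate"])
    )
    return joined.loc[bad, "OrderID"].tolist()

# Toy frames with only the columns the check needs
orders = pd.DataFrame({
    "OrderID": [1, 2, 3],
    "Price": [9600, 6400, 3200],
    "MaturityDate": [datetime.date(2012, 1, 5), datetime.date(2012, 1, 6), datetime.date(2012, 1, 6)],
})
incoming = pd.DataFrame({
    "OrderID": [1, 2],
    "Amount": [9600, 6000],
    "TransactionDate": [datetime.date(2012, 1, 5), datetime.date(2012, 1, 6)],
})
print(payment_mismatches(orders, incoming))  # [2, 3]
```

Order 2 is flagged for a wrong amount and order 3 for having no payment at all.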

## Generation of the Transactors table

This table gives employees and contractors a common transactor ID, so that the outgoing transactions table can refer to either kind of payee through a single column.

In [34]:
Employee_ID = list(Employees['EmployeeID']) + ['NULL']*len(list(Contractors['ContractorID']))
Contractor_ID = ['NULL']*len(list(Employees['EmployeeID'])) + list(Contractors['ContractorID'])
TransactorID = np.linspace(1, len(Employee_ID), len(Employee_ID), dtype=int)

Out_Transactors = pd.DataFrame(data={'TransactorID':TransactorID, 'EmployeeID':Employee_ID, 'ContractorID':Contractor_ID})
Out_Transactors

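| | TransactorID | EmployeeID | ContractorID |
|---|---|---|---|
| 0 | 1 | 15753.0 | |
| 1 | 2 | 13179.0 | |
| 2 | 3 | 11031.0 | |
| 3 | 4 | 10779.0 | |
| 4 | 5 | 12721.0 | |
| 5 | 6 | 18665.0 | |
| 6 | 7 | 19608.0 | |
| 7 | 8 | 15900.0 | |
| 8 | 9 | 17098.0 | |
| 9 | 10 | 15834.0 | |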

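In the output above the employee IDs appear as floats (`15753.0`) because the column mixes integers with missing values. The pandas nullable `Int64` dtype keeps integer IDs alongside missing entries. A sketch, assuming the missing values should be real nulls rather than the `'NULL'` strings used in the cell above; `build_transactors` is a hypothetical helper:

```python
import pandas as pd

def build_transactors(employee_ids, contractor_ids):
    """One row per employee, then one per contractor, with nullable integer ID columns."""
    n_emp, n_con = len(employee_ids), len(contractor_ids)
    return pd.DataFrame({
        "TransactorID": range(1, n_emp + n_con + 1),
        "EmployeeID": pd.array(list(employee_ids) + [None] * n_con, dtype="Int64"),
        "ContractorID": pd.array([None] * n_emp + list(contractor_ids), dtype="Int64"),
    })

transactors = build_transactors([15753, 13179], [2, 3])
```

Missing entries are then shown as `<NA>` and the IDs stay integers, which also maps cleanly onto nullable integer columns in SQL.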

## Generation of the Outgoing transactions table

The tables below contain the outgoing transactions to employees and contractors. Their columns are: transaction ID, transaction type, transactor ID, amount paid and transaction date.

In [35]:
# Outgoing transactions - commissions
#########################################################################

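# Commission payments ("Zlecenie") followed by insurance fees ("Opłata za ubezpieczenie");
# the two counts must add up to the length of ContractorID, CommissionPrice and MaturityDate
TransactionType = ["Zlecenie"]*306 + ["Opłata za ubezpieczenie"]*102
Transactor_out = ContractorID
Amount = CommissionPrice
TransactionDate_out = MaturityDate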

# DataFrame
Outgoing_Transactions_commissions_temp = pd.DataFrame(data={'TransactionType':TransactionType,'ContractorID':Transactor_out,'Amount':Amount,'TransactionDate':TransactionDate_out})
Outgoing_Transactions_commissions = Outgoing_Transactions_commissions_temp.merge(Out_Transactors[['ContractorID', 'TransactorID']], on='ContractorID')
del Outgoing_Transactions_commissions['ContractorID']
Outgoing_Transactions_commissions

,TransactionType,Amount,TransactionDate,TransactorID
0,Zlecenie,3000,2012-01-04,41
1,Zlecenie,3000,2012-02-04,41
2,Zlecenie,3000,2012-03-04,41
3,Zlecenie,3000,2012-04-04,41
4,Zlecenie,3000,2012-05-04,41
...,...,...,...,...
403,Opłata za ubezpieczenie,4500,2020-02-04,44
404,Opłata za ubezpieczenie,4750,2020-03-04,44
405,Opłata za ubezpieczenie,4750,2020-04-04,44
406,Opłata za ubezpieczenie,5000,2020-05-04,44
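The merge above swaps `ContractorID` for the shared `TransactorID`. Because `merge` performs an inner join by default, a payment whose contractor is missing from `Out_Transactors` would silently disappear. A small sketch with hypothetical data showing how a left join together with pandas' `validate` and `indicator` options makes that mapping explicit:

```python
import pandas as pd

# Hypothetical lookup table: transactors 41 and 44 are contractors 1 and 4
out_transactors = pd.DataFrame({"TransactorID": [41, 44], "ContractorID": [1, 4]})

payments = pd.DataFrame({
    # commission, commission, insurance fee
    "TransactionType": ["Zlecenie", "Zlecenie", "Opłata za ubezpieczenie"],
    "ContractorID": [1, 1, 4],
    "Amount": [3000, 3000, 4500],
})

# Left join keeps every payment; validate checks that each contractor maps to one transactor
merged = payments.merge(
    out_transactors,
    on="ContractorID",
    how="left",
    validate="many_to_one",
    indicator=True,
)

# Every payment should have found its transactor
assert (merged["_merge"] == "both").all()
merged = merged.drop(columns=["ContractorID", "_merge"])
print(merged)
```

If a contractor were missing, the assertion would fail instead of the row being dropped, and `validate` raises a `MergeError` if a contractor appeared twice in the lookup table.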


In [36]:
# Outgoing transactions - employees
####################################################################

TransactionType = []
Transactor_out = []
Amount = []
TransactionDate_out = []

# Salary payments ("Wypłata pensji"): one per month, starting on the hire date,
# for every date before the cut-off of 2020-06-06
def add_monthly_salary(hire_date, employee_id, salary, cutoff=datetime.date(2020, 6, 6)):
    date = hire_date
    while date < cutoff:
        TransactionDate_out.append(date)
        TransactionType.append("Wypłata pensji")
        Transactor_out.append(employee_id)
        Amount.append(salary)
        date = date + relativedelta(months=1)

for i in range(20):
    add_monthly_salary(HireDate_Driver[i], DriverID[i], Salary_Driver[i])                    # drivers
    add_monthly_salary(HireDate_Administration[i], EmployeeID[i], Salary_Administration[i])  # office workers

# DataFrame
Outgoing_Transactions_employees_temp = pd.DataFrame(data={'TransactionType':TransactionType,'EmployeeID':Transactor_out,'Amount':Amount,'TransactionDate':TransactionDate_out})
Outgoing_Transactions_employees = Outgoing_Transactions_employees_temp.merge(Out_Transactors[['EmployeeID', 'TransactorID']], on='EmployeeID')
del Outgoing_Transactions_employees['EmployeeID']
Outgoing_Transactions_employees

,TransactionType,Amount,TransactionDate,TransactorID
0,Wypłata pensji,3400,2014-04-06,1
1,Wypłata pensji,3400,2014-05-06,1
2,Wypłata pensji,3400,2014-06-06,1
3,Wypłata pensji,3400,2014-07-06,1
4,Wypłata pensji,3400,2014-08-06,1
...,...,...,...,...
2488,Wypłata pensji,7000,2020-02-01,40
2489,Wypłata pensji,7000,2020-03-01,40
2490,Wypłata pensji,7000,2020-04-01,40
2491,Wypłata pensji,7000,2020-05-01,40
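The salary loop adds one month to the previous payment date each time. With `relativedelta` this is exact for hire dates on days 1 to 28, which covers the dates in the output above, but a hire date on the 29th to 31st would drift (31 January, 28 February, 28 March, ...). A sketch that offsets every payment from the hire date instead, keeping the same "before the cut-off" rule; `monthly_dates` is a hypothetical helper name:

```python
import datetime
from dateutil.relativedelta import relativedelta

def monthly_dates(start, cutoff):
    """Return start, start + 1 month, start + 2 months, ... while the date is before cutoff."""
    dates = []
    k = 0
    while True:
        # Offset from the original start date, so a day-31 date is clipped only in short months
        d = start + relativedelta(months=k)
        if d >= cutoff:
            break
        dates.append(d)
        k += 1
    return dates

print(monthly_dates(datetime.date(2013, 1, 31), datetime.date(2013, 5, 1)))
```

For a hire date of 31 January 2013 this yields 31 January, 28 February, 31 March and 30 April, whereas repeated addition would stay on the 28th after February.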


In [37]:
# Combine both kinds of outgoing payments and order them chronologically
Outgoing_Transactions = pd.concat([Outgoing_Transactions_commissions, Outgoing_Transactions_employees])
Outgoing_Transactions = Outgoing_Transactions.sort_values(by='TransactionDate')
# Number the transactions 1..n in date order
Outgoing_Transactions['OUTTransactionID'] = np.arange(1, len(Outgoing_Transactions) + 1)
Outgoing_Transactions = Outgoing_Transactions[['OUTTransactionID', 'TransactionType', 'TransactorID', 'Amount', 'TransactionDate']]
Outgoing_Transactions

,OUTTransactionID,TransactionType,TransactorID,Amount,TransactionDate
2391,1,Wypłata pensji,40,7000,2012-01-01
1575,2,Wypłata pensji,36,4000,2012-01-01
1677,3,Wypłata pensji,17,2600,2012-01-01
2289,4,Wypłata pensji,20,2800,2012-01-01
2187,5,Wypłata pensji,39,6000,2012-01-01
...,...,...,...,...,...
789,2897,Wypłata pensji,8,2000,2020-06-05
305,2898,Zlecenie,43,250,2020-06-05
1491,2899,Wypłata pensji,35,10000,2020-06-05
1160,2900,Wypłata pensji,11,2200,2020-06-05
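`sort_values` uses quicksort by default, which is not a stable sort, so payments that share a date (such as the salaries on 2012-01-01 above) are not guaranteed to keep their original relative order before the IDs are assigned. A sketch with hypothetical data that requests a stable sort and adds the ID as the first column:

```python
import pandas as pd

# Hypothetical outgoing payments; two of them share the same date
payments = pd.DataFrame({
    "TransactionType": ["Wypłata pensji", "Zlecenie", "Wypłata pensji"],  # salary, commission, salary
    "TransactorID": [40, 43, 36],
    "Amount": [7000, 250, 4000],
    "TransactionDate": pd.to_datetime(["2012-01-01", "2020-06-05", "2012-01-01"]),
})

# Stable sort: rows with equal TransactionDate keep their input order
ordered = payments.sort_values("TransactionDate", kind="stable").reset_index(drop=True)

# Sequential surrogate key 1..n, placed as the first column
ordered.insert(0, "OUTTransactionID", list(range(1, len(ordered) + 1)))
print(ordered)
```

`reset_index(drop=True)` also removes the duplicated index labels that `pd.concat` leaves behind.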


<br>

# Inserting data into the server

In [38]:
import mysql.connector

In [39]:
# "host", "user" and "password" are placeholders; replace them with the real connection details
conn = mysql.connector.connect(host="host",
                               user="user",
                               password="password",
                               database="company")

In [40]:
cursor = conn.cursor(buffered=True,dictionary=True)

In [41]:
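# itertuples keeps each column's dtype and supports positional access row[k]
# (positional row[k] on the Series returned by iterrows is deprecated in recent pandas)
for row in Clients.itertuples(index=False):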
    sql = "INSERT INTO Clients VALUES (" + str(row[0])+',' + "'"+str(row[1])+"'"+ ','+"'"+str(row[2])+"'"+','+str(row[3])+ ','+str(row[4]) + ");"
    cursor.execute(sql)
    conn.commit()

In [42]:
for row in Vehicles.itertuples(index=False):
    sql = "INSERT INTO Vehicles VALUES (" + str(row[0])+',' + "'"+str(row[1])+"'"+ ','+"'"+str(row[2])+"'"+','+ "'"+str(row[3])+"'"+ ','+"'"+str(row[4])+"'"+ ','+"'"+str(row[5])+"'"+ ','+ "'"+str(row[6])+"'"+ ");"
    cursor.execute(sql)
    conn.commit()

In [43]:
for row in Employees.itertuples(index=False):
    if row[9] == 'NULL':
        sql = "INSERT INTO Employees VALUES (" + str(row[0])+',' + "'"+str(row[1])+"'"+ ','+"'"+str(row[2])+"'"+','+ "'"+str(row[3])+"'"+ ',' + "'" + str(row[4]) + "'" + ',' + "'" + str(row[5]) + "'" + ',' +"'"+str(row[6])+"'"+ ','+ str(row[7])+ ','+str(row[8])+ ','+ str(row[9])+ ','+str(row[10])+ ','+"'"+str(row[11])+"'"+ ','+"'"+str(row[12])+"'"+ ','+"'"+str(row[13])+"'" + ");"
    else:
        sql = "INSERT INTO Employees VALUES (" + str(row[0])+',' + "'"+str(row[1])+"'"+ ','+"'"+str(row[2])+"'"+','+ "'"+str(row[3])+"'"+ ',' + "'" + str(row[4]) + "'" + ',' + "'" + str(row[5]) + "'" + ',' +"'"+str(row[6])+"'"+ ','+ str(row[7])+ ','+str(row[8])+ ','+"'"+str(row[9])+"'"+ ','+str(row[10])+ ','+"'"+str(row[11])+"'"+ ','+"'"+str(row[12])+"'"+ ','+"'"+str(row[13])+"'" + ");"
    cursor.execute(sql)
    conn.commit()

In [44]:
for row in Contractors.itertuples(index=False):
    sql = "INSERT INTO Contractors VALUES (" + str(row[0])+',' + "'"+str(row[1])+"'"+ ','+str(row[2])+','+"'"+str(row[3])+ "'" + ");"
    cursor.execute(sql)
    conn.commit()

In [45]:
for row in Commissions.itertuples(index=False):
    sql = "INSERT INTO Commissions VALUES (" + str(row[0])+',' +str(row[1])+ ','+"'"+str(row[2])+"'"+','+ "'"+str(row[3])+"'"+ ','+str(row[4])+ ','+"'"+str(row[5])+"'"+ ");"
    cursor.execute(sql)
    conn.commit()

In [46]:
for row in OrderDetails.itertuples(index=False):
    sql = "INSERT INTO Order_Details VALUES (" + str(row[0])+',' +str(row[1])+ ','+str(row[2])+','+ str(row[3])+ ','+"'"+str(row[4])+"'"+ ','+"'"+str(row[5])+ "'"+','+"'"+str(row[6])+"'"+ ','+"'"+str(row[7])+"'"+ ','+"'"+str(row[8])+"'"+ ','+"'"+str(row[9])+"'"+ ','+str(row[10])+ ','+str(row[11])+ ','+"'"+str(row[12])+"'" + ");"
    cursor.execute(sql)
    conn.commit()

In [47]:
for row in Incoming_Transactions.itertuples(index=False):
    sql = "INSERT INTO Incoming_Transactions VALUES (" + str(row[0])+',' +"'"+str(row[1])+"'"+ ','+str(row[2])+','+ "'"+str(row[3])+"'"+ ','+str(row[4])+ ','+"'"+str(row[5])+"'"+ ");"
    cursor.execute(sql)
    conn.commit()

In [48]:
for row in Out_Transactors.itertuples(index=False):
    sql = "INSERT INTO Out_Transactors VALUES (" + str(row[0])+',' +str(row[1])+ ','+ str(row[2])+ ");"
    cursor.execute(sql)
    conn.commit()

In [49]:
for row in Outgoing_Transactions.itertuples(index=False):
    sql = "INSERT INTO Outgoing_Transactions VALUES (" + str(row[0])+',' +"'"+str(row[1])+"'"+ ','+ str(row[2])+ ','+str(row[3])+ ','+"'"+str(row[4])+"'"+ ");"
    cursor.execute(sql)
    conn.commit()

In [50]:
for row in Orders.itertuples(index=False):
    sql = "INSERT INTO Orders VALUES (" + str(row[0])+',' +str(row[1])+ ','+str(row[2])+','+ str(row[3])+ ','+str(row[4])+ ");"
    cursor.execute(sql)
    conn.commit()
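Building each `INSERT` by string concatenation breaks on any value that contains a quote, leaves the database open to SQL injection, and needs the special-casing of `'NULL'` seen in the `Employees` cell. A sketch of a parameterized alternative, assuming a DB-API 2.0 cursor: `mysql.connector` uses the `%s` placeholder style, while the self-contained demo below uses the standard-library `sqlite3` module, whose placeholder is `?`. The helper names `to_sql_value` and `insert_dataframe` are hypothetical, and the table name is interpolated directly, so it must come from trusted code:

```python
import sqlite3
import numpy as np
import pandas as pd

def to_sql_value(v):
    """Convert a pandas/numpy cell into a value the DB driver accepts."""
    if isinstance(v, str) and v == "NULL":        # the notebook's placeholder for SQL NULL
        return None
    if v is None or (not isinstance(v, str) and pd.isna(v)):
        return None                               # NaN, NaT and pd.NA become SQL NULL
    if isinstance(v, pd.Timestamp):
        return v.to_pydatetime()                  # plain datetime.datetime
    if isinstance(v, (np.integer, np.floating, np.bool_)):
        return v.item()                           # numpy numeric scalar -> Python int/float/bool
    return v

def insert_dataframe(cursor, table, df, placeholder="%s"):
    """Insert all rows of df into table with one parameterized executemany call."""
    marks = ", ".join([placeholder] * df.shape[1])
    sql = f"INSERT INTO {table} VALUES ({marks})"  # table name must be trusted
    rows = [tuple(to_sql_value(v) for v in row) for row in df.itertuples(index=False)]
    cursor.executemany(sql, rows)

# Self-contained demo with an in-memory SQLite database
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Out_Transactors (TransactorID INTEGER, EmployeeID INTEGER, ContractorID INTEGER)")
df = pd.DataFrame({"TransactorID": [1, 2],
                   "EmployeeID": [15753, "NULL"],
                   "ContractorID": ["NULL", 7]})
insert_dataframe(cur, "Out_Transactors", df, placeholder="?")
conn.commit()  # one commit for the whole table instead of one per row
print(cur.execute("SELECT * FROM Out_Transactors ORDER BY TransactorID").fetchall())
```

With the notebook's MySQL connection, the call would look like `insert_dataframe(cursor, "Clients", Clients)` followed by a single `conn.commit()`; as in the original cells, the DataFrame columns must be in the same order as the table's columns.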