# Building a Corporate Valuation Model with Python Pt. I: Build the Database

I will build a corporate valuation model identical to the methods given in [Brigham & Houston ("Fundamentals of Financial Management", 6th ed., 2009, South Western Cengage Learning, pp. 288 and pp. 306)](https://www.valorebooks.com/textbooks/fundamentals-of-financial-management-concise-edition-with-thomson-one-business-school-edition-6th-edition/9780324664553). Additionally, one can check the [this source](https://corporatefinanceinstitute.com/resources/knowledge/modeling/dcf-model-training-free-guide/), too, for further insights on the topic of corporate valuation.
   

## Set up PostgreSQL Database to store Financial Information

To set up the database we will use **sqlalchemy**, **PostgreSQL** and **Psycopg2** as driver and build the model similar to the approach given in [this article](https://www.pythonforfinance.net/2020/10/24/build-a-financial-data-database-with-python/). We will create several tables in order to store the financial statement data from yahoo finance. With each run we can append the most recent data and so build up a historical database which is usually not available. Yahoo Finance usually just offers financial statement data for the last 4 financial years. If we establish a database we can store the data values and so - going forward - extend the data beyond just 4 financial years. As we will see this will require a workaround with several SQL statements and tables. 

Specifically, we will build 'backup' temporary tables into which we simply insert the data with each new run and then store the new data - as identified with the id-column and the date-column - in the standard tables. So each statement table is identified by a **combined Primary Key (id, date)**. Note that after the comparison of the new data in the temporary tables with the given data in the standard tables we might have some duplicate values in the **id column** of each standard table, which would usually not be allowed in the case of single Primary Keys - however, in combination with each **date** each row will be uniquely identified. 

Thus, the `insert_new_data()` function will check whether (id, date)-value pairs are already existent in the standard table and insert only those rows from the temporary table which are not yet existent, given the respective (id, date)-value pair.

We will first add the necessary packages and setup an engine object as 'medium' for database communication with Postgres. Then, we will define the **standard tables**. The **temporary tables** will be setup identically to the standard tables. Hence, we will have four standard tables + four temporary tables in the following structure:

   - Table "company", with `shortName`, `symbol`, `sector`, `industry` and `currency` as attributes (identical to the yfinance API keys)
   - Table "balanceSheet", with  **primary key `(id, date)`** and **foreign key `company_id`**
   - Table "incomeStatement", with **primary key `(id, date)`** and **foreign key `company_id`**
   - Table "cashflowStatement", with **primary key `(id, date)`** and **foreign key `company_id`**

I use **PgAdmin 4** as a (visual) database management tool.

In [90]:
#---- DATABASE MANAGEMENT TOOLS --------------#
from sqlalchemy import create_engine
import psycopg2
import psycopg2.extras as extras

#---- DATA MANIPULATION TOOLS ----------------#
import yfinance as yf
import pandas_datareader as dr
import numpy as np
import pandas as pd

#---- OWN MODULE IMPORTS --------------------#
import ValuationModel.pw
from ValuationModel.assist_functions import swap_columns
from ValuationModel.assist_functions import execute_values

In [91]:
# Set necessary url variables for the sqlalchemy create_engine() method.
user='svenst89' # or default user 'postgres'
password=ValuationModel.pw.password # edit the password if you switch to the default user 'postgres'; I setup different passwords.
host='localhost'
port='5433'
database='fundamentalsdb'

### Connect to the Database & Build the Basic Database Structure

In order to connect with any Database management system, it is essential to create an engine object, that serves as a central source of connection by providing a connection pool that manages the database connections. This SQLAlchemy engine is a global object which can be created and configured once and use the same engine object multiple times for different operations.

The first step in establishing a connection with the PostgreSQL database is creating an engine object using the create_engine() function of SQLAlchemy.

In [94]:
%%timeit -r 1 -n 1
# Create an engine object as medium for database exchange with PostgreSQL
def run_engine():
    return create_engine(f"postgresql://{user}:{password}@{host}:{port}/{database}")
if __name__=='__main__':
    try:
        engine=run_engine()
        print(f"You have successfully created an engine object for your Postgres DB at {host} for user {user}!")
    except Exception as ex:
        print("Sorry your engine has not been created. Some exception has occurred. Please check and try it again!\n", ex)

You have successfully created an engine object for your Postgres DB at localhost for user svenst89!
1.07 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


In [95]:
# Function to instantiate a database connection
def connect():
    """ create a database connection to the Postgres database and a cursor object to control queries
    :return: Connection and a cursor object or None
    """
    try:
        conn = psycopg2.connect(
            database=database,
            user=user,
            password=password,
            host=host,
            port=port)
        cur = conn.cursor()
        print(f"Successfully created a connection with the Postgres Database {database} at host {host} for user {user}!")
    except (Exception, psycopg2.DatabaseError) as error:
        print ("Error while creating PostgreSQL database connection", error)
    
    return conn, cur

In [96]:
%%timeit -r 1 -n 1
# With PostgreSQL I deleted the foreign keys in the company table of the statements, as the tables are sequentially built, which means that errors will be
# thrown if the statement tables are not yet created but we try to reference them as a foreign key in the company table!
def create_tables():
    """ create tables in the PostgreSQL database"""
    commands = (
        """
        CREATE TABLE IF NOT EXISTS company (
                id BIGINT PRIMARY KEY,
                shortName TEXT NOT NULL,
                symbol VARCHAR(100) NOT NULL,
                industry VARCHAR(100) NOT NULL,
                sector VARCHAR(100) NOT NULL,
                currency VARCHAR(100) NOT NULL,
                bs_id INTEGER,
                is_id INTEGER,
                cs_id INTEGER
        )
        """,
        """ CREATE TABLE IF NOT EXISTS balancesheet (
                id BIGINT,
                shortName TEXT NOT NULL,
                date TIMESTAMP without TIME ZONE NOT NULL,
                item TEXT NOT NULL,
                value INTEGER,
                company_id BIGINT,
                PRIMARY KEY (id, date),
                FOREIGN KEY (company_id) REFERENCES company (id)
        )
        """,
        """
        CREATE TABLE IF NOT EXISTS incomestatement (
                id BIGINT,
                shortName TEXT NOT NULL,
                date TIMESTAMP without TIME ZONE NOT NULL,
                item TEXT NOT NULL,
                value INTEGER,
                company_id BIGINT,
                PRIMARY KEY (id, date),
                FOREIGN KEY (company_id) REFERENCES company (id)
        )
        """,
        """
        CREATE TABLE IF NOT EXISTS cashflowstatement (
                id BIGINT,
                shortName TEXT NOT NULL,
                date TIMESTAMP without TIME ZONE NOT NULL,
                item TEXT NOT NULL,
                value INTEGER,
                company_id BIGINT,
                PRIMARY KEY (id, date),
                FOREIGN KEY (company_id) REFERENCES company (id)
        )
        """
        )
    conn = None
    try:
        # connect to the PostgreSQL server
        conn, cur = connect()
        # create table one by one
        for command in commands:
            cur.execute(command)
        # close communication with the PostgreSQL database server
        cur.close()
        # commit the changes
        conn.commit()
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
    finally:
        if conn is not None:
            conn.close()
  
  
if __name__ == '__main__':
    create_tables()

Successfully created a connection with the Postgres Database fundamentalsdb at host localhost for user svenst89!
158 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


## Get Company Data & Store it into the Database

In [97]:
%%timeit -r 1 -n 1
# Get the Dax Stock List (40 stocks): https://www.banken-auskunft.de/blog/dax-index-ticker-symbole/
# I added the ".DE" to each Ticker where necessary as Yahoo Finance requires this extension to the usual Tickers for German stocks
tickers=pd.read_excel('data/dax_40_stocks.xlsx', sheet_name='Tabelle1')[['Ticker']]

# Make a ticker list to iterate through with a for loop to fetch company information data
ticker_list=tickers['Ticker'].tolist()

# Make a dictionary to transform it into a dataframe later
dict_list=[]

for ticker in ticker_list:
    # create a ticker object, that will deliver a dictionary of infos
    info_dict=yf.Ticker(ticker).info
    # Append the dictionary with the company data
    frame=dict_list.append(info_dict)

df = pd.DataFrame(dict_list)

5min 8s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


In [98]:
# Now make a sub-dataframe containing your respective indicators you want
company_data=df[['shortName', 'symbol', 'industry', 'sector', 'currency']]
company_data.rename(columns = {'shortName':'shortname'}, inplace = True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  company_data.rename(columns = {'shortName':'shortname'}, inplace = True)


In [99]:
# Create an id-column in the company data table
company_data['id']=company_data.index
# Map the company id as foreign key for the statement tables
company_id_mapper = pd.Series(company_data.id.values, index=company_data.shortname).to_dict()
company_data

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  company_data['id']=company_data.index


Unnamed: 0,shortname,symbol,industry,sector,currency,id
0,ADIDAS AG,ADS.DE,Footwear & Accessories,Consumer Cyclical,EUR,0
1,AIRBUS SE,AIR.PA,Aerospace & Defense,Industrials,EUR,1
2,ALLIANZ SE,ALV.DE,Insurance—Diversified,Financial Services,EUR,2
3,BASF SE,BAS.DE,Chemicals,Basic Materials,EUR,3
4,BAYER AG,BAYN.DE,Drug Manufacturers—General,Healthcare,EUR,4
5,BEIERSDORF AG O.N.,BEI.DE,Household & Personal Products,Consumer Defensive,EUR,5
6,BAYERISCHE MOTOREN WERKE AG,BMW.DE,Auto Manufacturers,Consumer Cyclical,EUR,6
7,BRENNTAG SE NA O.N.,BNR.DE,Specialty Chemicals,Basic Materials,EUR,7
8,CONTINENTAL AG,CON.DE,Auto Parts,Consumer Cyclical,EUR,8
9,COVESTRO AG,1COV.DE,Specialty Chemicals,Basic Materials,EUR,9


### Retrieve Balance Sheet Data

In [100]:
%%timeit -r 1 -n 1
# Retrieve balance sheet data
tickers=[yf.Ticker(ticker) for ticker in ticker_list]
all_bs_data_list=[]
for ticker in tickers:
    bs=ticker.balancesheet
    bs=bs.reset_index()
    # Rename the Index column 'Index' to 'Item'
    bs.columns=['item', *bs.columns[1:]]
    bs['shortname']=ticker.info['shortName']
    # Now use pd.melt() to transform the dataframe into a transposed version with Dates along rows and items along rows
    bs_transformed=pd.melt(bs, id_vars=["shortname", "item"], var_name="date", value_name="value")
    all_bs_data_list.append(bs_transformed)
all_bs_df=pd.concat(all_bs_data_list, ignore_index=True) # change to 'ignore_index=True' otherwise you will get an 'id' column which is not continuous, but instead restarts always at '0' at each new company

6min 53s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


In [101]:
# Function for swapping columns to fit the database schema
def swap_columns(df, col1, col2):
    col_list = list(df.columns)
    x, y = col_list.index(col1), col_list.index(col2)
    col_list[y], col_list[x] = col_list[x], col_list[y]
    df = df[col_list]
    return df

#---- SET ID for the BS -----------------------------------------#
all_bs_df['id']=all_bs_df.index

In [103]:
#%%timeit -r 1 -n 1
# Map back and forth between the company table and the balance sheet table to link their respective foreign keys
# Map company id to balance sheet table
all_bs_df['company_id'] = all_bs_df['shortname'].map(company_id_mapper)
# Map balance sheet id to company table
bs_id_mapper1 = pd.Series(all_bs_df.id.values, index=all_bs_df.shortname)
#bs_id_mapper2 = pd.Series(all_bs_df.date.values, index=all_bs_df.shortname)
#bs_id_mapper=pd.concat([bs_id_mapper1, bs_id_mapper2], axis=1).to_dict()
company_data['bs_id'] = company_data['shortname'].map(bs_id_mapper1.to_dict())
all_bs_df.reset_index(inplace=True, drop=True)
# Fill the 'NaN' rows with zeros
all_bs_df=all_bs_df.fillna(0)
# Convert the value column to integers otherwise you cannot write the data to the database table! It will give you a 'type' error!
all_bs_df['value'] = all_bs_df['value'].astype(int)
all_bs_df['shortname'] = all_bs_df['shortname'].astype(str)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  company_data['bs_id'] = company_data['shortname'].map(bs_id_mapper1.to_dict())


In [105]:
#%%timeit -r 1 -n 1
# Swap columns to fit into the database schema
all_bs_df=swap_columns(all_bs_df, 'value', 'id')
all_bs_df=swap_columns(all_bs_df, 'date', 'id')
all_bs_df=swap_columns(all_bs_df, 'item', 'id')
all_bs_df=swap_columns(all_bs_df, 'shortname', 'id')

In [108]:
#%%timeit -r 1 -n 1
all_bs_df=swap_columns(all_bs_df, 'item', 'date')
#all_bs_df.rename(columns = {'shortName':'shortname'}, inplace = True)
all_bs_df

Unnamed: 0,id,shortname,date,item,value,company_id
0,0,ADIDAS AG,2021-12-31,Intangible Assets,352000000,0
1,1,ADIDAS AG,2021-12-31,Total Liab,-2147483648,0
2,2,ADIDAS AG,2021-12-31,Total Stockholder Equity,-2147483648,0
3,3,ADIDAS AG,2021-12-31,Minority Interest,318000000,0
4,4,ADIDAS AG,2021-12-31,Other Current Liab,-2147483648,0
...,...,...,...,...,...,...
4379,4379,ZALANDO SE,2018-12-31,Net Receivables,477600000,39
4380,4380,ZALANDO SE,2018-12-31,Long Term Debt,5600000,39
4381,4381,ZALANDO SE,2018-12-31,Inventory,819500000,39
4382,4382,ZALANDO SE,2018-12-31,Accounts Payable,1298900000,39


### Create Temporary Tables and Insert the Data

As we cannot use `to_sql()` in order to append only the new data which we retrieved from the yahoo finance API, we need to insert **temporary tables** in which we simply append with `to_sql()` all the obtained data from each run and then insert with a `WHERE` statement comparison only the new data into the **'original' standard tables**.

The `to_sql()` method of pandas does not check with its `if_exists` parameter whether each row already exists, but instead checks whether the table already exists. If the table exists, `to_sql()` either replaces the whole table OR simply appends each new dataframe from each run of this script such that we get a duplicate error, since the structure of the yahoo finance statements is always the same.

I follow [this article](https://stackoverflow.com/questions/63992639/pandas-to-sql-append-vs-replace) on _stackoverflow_ to construct and execute the necessary SQL query.

In [109]:
%%timeit -r 1 -n 1
# CREATE A FUNCTION FOR TEMPORARY TABLE CREATION
def create_temp_tables():
    """ create temporary tables in the PostgreSQL database in which 
        we can store our yahoo finance data with 'to_sql' append or replace (probably 'replace')"""
    
    commands = (
        """ CREATE TABLE IF NOT EXISTS balancesheet_temp (
                id BIGINT,
                shortName TEXT NOT NULL,
                date TIMESTAMP without TIME ZONE NOT NULL,
                item TEXT NOT NULL,
                value INTEGER,
                company_id BIGINT,
                PRIMARY KEY (id, date),
                FOREIGN KEY (company_id) REFERENCES company (id)
        )
        """,
        """
        CREATE TABLE IF NOT EXISTS incomestatement_temp (
                id BIGINT,
                shortName TEXT NOT NULL,
                date TIMESTAMP without TIME ZONE NOT NULL,
                item TEXT NOT NULL,
                value INTEGER,
                company_id BIGINT,
                PRIMARY KEY (id, date),
                FOREIGN KEY (company_id) REFERENCES company (id)
        )
        """,
        """
        CREATE TABLE IF NOT EXISTS cashflowstatement_temp (
                id BIGINT,
                shortName TEXT NOT NULL,
                date TIMESTAMP without TIME ZONE NOT NULL,
                item TEXT NOT NULL,
                value INTEGER,
                company_id BIGINT,
                PRIMARY KEY (id, date),
                FOREIGN KEY (company_id) REFERENCES company (id)
        )
        """
        )
    conn = None
    try:
        # connect to the PostgreSQL server
        conn, cur = connect()
        # create table one by one
        for command in commands:
            cur.execute(command)
        # close communication with the PostgreSQL database server
        cur.close()
        # commit the changes
        conn.commit()
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
    finally:
        if conn is not None:
            conn.close()
  
  
if __name__ == '__main__':
    create_temp_tables()

Successfully created a connection with the Postgres Database fundamentalsdb at host localhost for user svenst89!
148 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


So, once we have the temp tables created, we should then always upload the data we collected with each run from the yahoo finance API into those tables and later on copy these values with a specific function called `insert_new_data()` (see below!) into the 'original' standard tables.

In [110]:
%%timeit -r 1 -n 1
#---WRITING DATA TO TABLE-----------------------------------#
# USE 'engine' as connection entrypoint from sqlalchemy engine defined above! Otherwise sqlalchemy will use the default sqlite3 schema, which does not
# match with our 'Postgres' schema here and which will throw an error! : https://stackoverflow.com/questions/45326026/to-sql-pandas-data-frame-into-sql-server-error-databaseerror
all_bs_df.to_sql('balancesheet_temp', engine, if_exists='replace', index=False)

449 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


### Retrieve Income Statement Data

In [111]:
%%timeit -r 1 -n 1
all_is_data_list=[]
for ticker in tickers:
    incstate=ticker.financials
    incstate=incstate.reset_index()
    # Rename the Index column 'Index' to 'Item'
    incstate.columns=['item', *incstate.columns[1:]]
    incstate['shortname']=ticker.info['shortName']
    # Now use pd.melt() to transform the dataframe into a transposed version with Dates along rows and items along rows
    incstate_transformed=pd.melt(incstate, id_vars=["shortname", "item"], var_name="date", value_name="value")
    all_is_data_list.append(incstate_transformed)
all_is_df=pd.concat(all_is_data_list, ignore_index=True)
all_is_df['id']=all_is_df.index

248 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


In [112]:
#%%timeit -r 1 -n 1
# Map back and forth between the company table and the income statement table to link their respective foreign keys
# Map company id to income statement table
all_is_df['company_id'] = all_is_df['shortname'].map(company_id_mapper)
# Map income statement id to company table
is_id_mapper = pd.Series(all_is_df.id.values, index=all_is_df.shortname).to_dict()
company_data['is_id'] = company_data['shortname'].map(is_id_mapper)
all_is_df.reset_index(inplace=True, drop=True)
# Fill the 'NaN' rows with zeros
all_is_df=all_is_df.fillna(0)
# Convert the value column to integers otherwise you cannot write the data to the database table! It will give you a 'type' error!
all_is_df['value'] = all_is_df['value'].astype(int)
all_is_df['shortname'] = all_is_df['shortname'].astype(str)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  company_data['is_id'] = company_data['shortname'].map(is_id_mapper)


In [113]:
#%%timeit -r 1 -n 1
# Swap columns to fit into the database schema
all_is_df=swap_columns(all_is_df, 'value', 'id')
all_is_df=swap_columns(all_is_df, 'date', 'id')
all_is_df=swap_columns(all_is_df, 'item', 'id')
all_is_df=swap_columns(all_is_df, 'shortname', 'id')
all_is_df=swap_columns(all_is_df, 'item', 'date')
all_is_df

Unnamed: 0,value,id,date,shortname,item,company_id
0,0,0,2021-12-31,ADIDAS AG,Research Development,0
1,0,1,2021-12-31,ADIDAS AG,Effect Of Accounting Charges,0
2,1852000000,2,2021-12-31,ADIDAS AG,Income Before Tax,0
3,318000000,3,2021-12-31,ADIDAS AG,Minority Interest,0
4,2116000000,4,2021-12-31,ADIDAS AG,Net Income,0
...,...,...,...,...,...,...
3515,-2147483648,3515,2018-12-31,ZALANDO SE,Cost Of Revenue,39
3516,-13600000,3516,2018-12-31,ZALANDO SE,Total Other Income Expense Net,39
3517,0,3517,2018-12-31,ZALANDO SE,Discontinued Operations,39
3518,51200000,3518,2018-12-31,ZALANDO SE,Net Income From Continuing Ops,39


In [119]:
%%timeit -r 1 -n 1
# Store income statement data in the database
all_is_df.to_sql("incomestatement_temp", engine, if_exists="replace", index=False)

252 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


### Retrieve Cashflow Statement Data

In [120]:
%%timeit -r 1 -n 1
all_cs_data_list=[]
for ticker in tickers:
    cashstate=ticker.cashflow
    cashstate=cashstate.reset_index()
    # Rename the Index column 'Index' to 'Item'
    cashstate.columns=['item', *cashstate.columns[1:]]
    cashstate['shortname']=ticker.info['shortName']
    # Now use pd.melt() to transform the dataframe into a transposed version with Dates along rows and items along rows
    cashstate_transformed=pd.melt(cashstate, id_vars=["shortname", "item"], var_name="date", value_name="value")
    all_cs_data_list.append(cashstate_transformed)
all_cs_df=pd.concat(all_cs_data_list, ignore_index=True)
all_cs_df['id']=all_cs_df.index

464 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


In [121]:
#%%timeit -r 1 -n 1
# Map back and forth between the company table and the income statement table to link their respective foreign keys
# Map company id to income statement table
all_cs_df['company_id'] = all_cs_df['shortname'].map(company_id_mapper)
# Map income statement id to company table
cs_id_mapper = pd.Series(all_cs_df.id.values, index=all_cs_df.shortname).to_dict()
company_data['cs_id'] = company_data['shortname'].map(cs_id_mapper)
all_cs_df.reset_index(inplace=True, drop=True)
# Fill the 'NaN' rows with zeros
all_cs_df=all_cs_df.fillna(0)
# Convert the value column to integers otherwise you cannot write the data to the database table! It will give you a 'type' error!
all_cs_df['value'] = all_cs_df['value'].astype(int)
all_cs_df['shortname'] = all_cs_df['shortname'].astype(str)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  company_data['cs_id'] = company_data['shortname'].map(cs_id_mapper)


In [122]:
#%%timeit -r 1 -n 1
# Swap columns to fit into the database schema
all_cs_df=swap_columns(all_cs_df, 'value', 'id')
all_cs_df=swap_columns(all_cs_df, 'date', 'id')
all_cs_df=swap_columns(all_cs_df, 'item', 'id')
all_cs_df=swap_columns(all_cs_df, 'shortname', 'id')
all_cs_df=swap_columns(all_cs_df, 'item', 'date')
all_cs_df

Unnamed: 0,id,shortname,date,item,value,company_id
0,0,ADIDAS AG,2021-12-31,Investments,49000000,0
1,1,ADIDAS AG,2021-12-31,Change To Liabilities,226000000,0
2,2,ADIDAS AG,2021-12-31,Total Cashflows From Investing Activities,-424000000,0
3,3,ADIDAS AG,2021-12-31,Net Borrowings,-1251000000,0
4,4,ADIDAS AG,2021-12-31,Total Cash From Financing Activities,-2147483648,0
...,...,...,...,...,...,...
2915,2915,ZALANDO SE,2018-12-31,Change To Account Receivables,-116400000,39
2916,2916,ZALANDO SE,2018-12-31,Other Cashflows From Financing Activities,-400000,39
2917,2917,ZALANDO SE,2018-12-31,Change To Netincome,59500000,39
2918,2918,ZALANDO SE,2018-12-31,Capital Expenditures,-226100000,39


In [123]:
%%timeit -r 1 -n 1
# Store income statement data in the database
all_cs_df.to_sql("cashflowstatement_temp", engine, if_exists="replace", index=False)

371 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


In [124]:
%%timeit -r 1 -n 1
# Add data content to the databases
company_data.to_sql("company", engine, if_exists="append", index=False)

18.2 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


#### Append only the new Data to the Statement Tables

Now, that we have all the data ready in the temporary tables we can insert the new data only into the original standard tables. The following function iterates across the tables to insert the relevant data.

In [126]:
%%timeit -r 1 -n 1
def insert_new_data():
    """ After the data has been stored into the temporary tables, check for each table
        whether there are new id-date value pairs with an SQL WHERE clause. If yes, then insert only the new data into the 
        original standard tables.
        
        Specifically, the SQL query does the following:
        It checks in the original standard table which id-date value pairs are identical in the temporary table. Then, it selects
        only those rows from the temporary table in which the id-date value pairs are NOT yet in the original standard table and inserts these
        rows in the respective original standard table."""
    
    commands = (
        """ INSERT INTO balancesheet (id, shortname, date, item, value, company_id)
            SELECT id, shortname, date, item, value, company_id FROM balancesheet_temp
            WHERE NOT EXISTS (
                SELECT * FROM balancesheet bs
                WHERE bs.id = balancesheet_temp.id AND bs.date = balancesheet_temp.date
                )
        """,
        """ INSERT INTO incomestatement (id, shortname, date, item, value, company_id)
            SELECT id, shortname, date, item, value, company_id FROM incomestatement_temp
            WHERE NOT EXISTS (
                SELECT * FROM incomestatement
                WHERE incomestatement.id = incomestatement_temp.id AND incomestatement.date = incomestatement_temp.date
                )
        """,
        """ INSERT INTO cashflowstatement (id, shortname, date, item, value, company_id)
            SELECT id, shortname, date, item, value, company_id FROM cashflowstatement_temp
            WHERE NOT EXISTS (
                SELECT * FROM cashflowstatement cs
                WHERE cs.id = cashflowstatement_temp.id AND cs.date = cashflowstatement_temp.date
                )
        """
        )
    conn = None
    try:
        # connect to the PostgreSQL server
        conn, cur = connect()
        # insert data into each table one by one
        for command in commands:
            cur.execute(command)
        # close communication with the PostgreSQL database server
        cur.close()
        # commit the changes
        conn.commit()
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
    finally:
        if conn is not None:
            conn.close()
  
  
if __name__ == '__main__':
    insert_new_data()

Successfully created a connection with the Postgres Database fundamentalsdb at host localhost for user svenst89!
556 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
