<a href="https://colab.research.google.com/github/Thukyd/TradeRepublic_Transactions_Sheet/blob/master/TradeRepublic_PDF_Converter.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#   0.1 | TradeRepublic - PDF Converter for Portfolio Performance
# a) Description
This tool helps keep track of the transactions made in TradeRepublic (TR). TR can export all orders as PDFs. For testing purposes, import and export currently work only with Google Drive; alternatives are on the todo list.

## Output A: Master Sheet
The master sheet lists all previous transactions from the PDFs. 

## Output B: Delta Sheet (dated)
The delta sheet contains all new transactions that have been added since the script was last executed. These are intended to be imported into Portfolio Performance. 

# b) Setup
*   Optimized for Google Colab (https://colab.research.google.com/)
*   TradeRepublic statements are imported via Google Drive. All PDF files should be in a single folder. You need to configure the path to your G-Drive folder before usage (see: gdrive_path)
*   Export .csv optimized for "Portfolio Performance" (https://www.portfolio-performance.info/)

# c) Options to customize
- for different data sources, see: https://colab.research.google.com/notebooks/io.ipynb
- to create different data structures, take a look at "Examples for extracted values".

# d) Further Information
## Handling of cost statements
*   "Kosten des Wertpapierkaufs/verkaufs" (costs of buying/selling a security) are considered.
*   "Kosten während der Haltedauer (pro Jahr)" (running costs per year while holding) are not extracted and therefore do not appear in the sheets.

## Deposits & Withdrawals
* Deposits to and withdrawals from the depot are not available as PDF documents. They are therefore not taken into account and must be entered by hand if necessary.

## "Order" ID of PDF documents
* Each TradeRepublic document has an "order" number. It is extracted and stored in the field "Notiz" (note), where it serves to prevent duplicate entries.
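
The duplicate check can be sketched as a filter on the order ID. This is a minimal sketch rather than the notebook's actual sheet logic: the helper `split_new_transactions` and the sample records are hypothetical, and only the field name "Notiz" is taken from the data structure used here.

```python
def split_new_transactions(extracted, known_order_ids):
    """Return the transactions whose order ID ("Notiz") is not yet known."""
    known = set(known_order_ids)  # set lookup keeps the check cheap
    return [t for t in extracted if t["Notiz"] not in known]


# Hypothetical sample records; real ones carry all fields of the data structure.
extracted = [
    {"Notiz": "a1", "Typ": "Kauf"},
    {"Notiz": "b2", "Typ": "Verkauf"},
    {"Notiz": "c3", "Typ": "Kauf"},
]

# Order IDs that are assumed to be in the master sheet already.
delta = split_new_transactions(extracted, known_order_ids=["a1", "b2"])
print(delta)
```

Only the record with the unknown order ID remains, which corresponds to what the delta sheet is meant to contain.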

# e) Todos
## Done
* Extract G-Drive folder of TradeRepublic PDFs
* Create data structure (for Portfolio Performance or other purposes)
* Generate master sheet of all transactions
* Generate delta sheet for new transactions (basis for Trade Republic Import)

## Open
* Create sheets for portfolio performance in .csv format
* Offer alternatives to G-Drive import/export
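
For the open .csv item, one possible approach is to serialize the same row lists that are sent to Google Sheets. The sketch below is an assumption, not part of the notebook: `rows_to_csv` is a hypothetical helper, and the semicolon delimiter is chosen only because the values contain decimal commas.

```python
import csv
import io


def rows_to_csv(rows, delimiter=";"):
    """Serialize a list of row lists (header first) to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical rows with a shortened header for illustration.
rows = [
    ["Datum", "Typ", "Wert"],
    ["08.07.2010", "Kauf", "-1.314,00"],
]
csv_text = rows_to_csv(rows)
print(csv_text)
```

The resulting text could then be written to a file in the G-Drive folder with a regular `open(..., "w")` call.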

## To be fixed
- calculate_stock_value() => conversion of german/international number system
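
One way to approach the number conversion is to strip the thousands separator before swapping the decimal comma. The following is a sketch under that assumption; the helper names `parse_german_number`, `format_german_number` and `price_per_share` are hypothetical and not part of the notebook, and the formatted result does not reinsert thousands separators.

```python
from decimal import Decimal


def parse_german_number(text):
    """Convert a German-formatted number such as "1.312,23" to a Decimal."""
    # remove thousands separators first, then turn the decimal comma into a dot
    normalized = text.strip().replace(".", "").replace(",", ".")
    return Decimal(normalized)


def format_german_number(value, places=2):
    """Format a Decimal with a decimal comma (no thousands separator)."""
    return f"{value:.{places}f}".replace(".", ",")


def price_per_share(total_value, count):
    """Divide the total order value by the number of shares."""
    return parse_german_number(total_value) / parse_german_number(count)


single_value = format_german_number(price_per_share("1.314,00", "6"))
print(single_value)
```

`Decimal` is used instead of `float` to avoid binary rounding artifacts in monetary values.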

In [None]:
######################################
########    Define before Usage
######################################
# load all pdfs in folder
gdrive_path = "/content/drive/My Drive/your_document_folder/"

In [None]:
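# PyPDF2 for PDF extraction
# (PdfFileReader/getPage/extractText, used below, were removed in PyPDF2 3.0, so an older release is pinned)
!pip install "PyPDF2<3.0"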
import PyPDF2
# Regex for text processing
import re
# requirement for pdf-folder extraction
import glob
import os
# requirement for get_date_today()
from datetime import date



# Extraction & helper functions

In [None]:
def get_date_today ():
  today = date.today()
  return today.strftime("%Y-%m-%d")

In [None]:
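def preprocess_letter (full_extraction):
  # split the text into the part before (header) and after (core) the document title
  title = "WERTPAPIERGESCHÄFTWERTPAPIERORDER"
  header, core = full_extraction.split(title, 1) # split only at the first occurrence
  return header, core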

In [None]:
def extract_transaction_date (raw_text):
  # raw string with escaped dots, so only dates such as "08.07.2010" match
  pattern = r"DATUM(\d{2}\.\d{2}\.\d{4})"
  matches = re.findall(pattern, raw_text) # the group captures the date without "DATUM"
  return matches[0]

In [None]:
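def extract_order_id (raw_text):
  # the order id stands between "ORDER" and "DEPOT" in the header
  pattern = "(?<=ORDER)(.*?)(?=DEPOT)"
  order_ids = re.findall(pattern, raw_text) # avoid shadowing the builtin id()
  return order_ids[0]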

In [None]:
def extract_isin(raw_text):
  # Search for ISIN of Stock: country code, 9 alphanumeric characters, 1 check digit
  isin_pattern = "(?<=ISIN: )(BE|BM|FR|BG|VE|DK|HR|DE|JP|HU|HK|JO|BR|XS|FI|GR|IS|RU|LB|PT|NO|TW|UA|TR|LK|LV|LU|TH|NL|PK|PH|RO|EG|PL|AA|CH|CN|CL|EE|CA|IR|IT|ZA|CZ|CY|AR|AU|AT|IN|CS|CR|IE|ID|ES|PE|TN|PA|SG|IL|US|MX|SK|KR|SI|KW|MY|MO|SE|GB|GG|KY|JE|VG|NG|SA|MU)([0-9A-Z]{9})([0-9])"
  matches = re.findall(isin_pattern, raw_text) # search the argument, not the global "text"
  item = "".join(matches[0]) # join the three captured groups into one ISIN
  return item

In [None]:
def extract_name(raw_text):
  pattern = "(?<=ANZAHLWERTAUSFÜHRUNGSPLATZ)(.*?)(?=ISIN)"
  name = re.findall(pattern, raw_text)
  return name[0]

In [None]:
def extract_order(raw_text):
  ISIN = extract_isin(raw_text)
  pattern = "(?<="+ ISIN +")(.*?)(?=Lang und Schwarz Exchange)"
  order = re.findall(pattern, raw_text)
  matched = ''.join(order)
  matched = matched.split(" ")
  transaction_type = matched[0]
  order_count, total_value = matched[2].split("Stk.") 
  return [transaction_type, order_count, total_value]

In [None]:
# does not include running costs
# takes the first appearance of costs in the pdf
  # buy operation => consists of buying costs, running costs & selling costs | only buying costs considered
  # sell operation => consists of selling costs | selling costs considered
def extract_order_costs(raw_text):
  transaction_type = extract_order(raw_text)[0]
  pattern = "(?<=SUMME)(.*?)(?= )"
  costs = re.findall(pattern, raw_text)
  return costs[0]

In [None]:
# TODO: does not work with numbers as "1.312,23"

# import locale
# def calculate_stock_value(raw_text):
#   t_type, count, value = extract_order(raw_text) # extract how many stocks & for which value - transaction type not needed
#   count = int(count)
#   value
#   value = float(value.replace(",",".")) # replace "," for float conversion
#   single_value = float(value) / float(count)
#   single_value = str(single_value).replace(".",",") # use german "," system again
#   return single_value

In [None]:
# In TradeRepublic statements all total values for buy & sell are positive numbers.
# For the conversion to Portfolio Performance (PP) the value has to take the type of transaction into account.
# It needs a sign: negative numbers for "buy" and positive numbers for "sell".
def add_sign_to_order_value(raw_text):
  transaction_type, count, total_value = extract_order(raw_text) # count not needed
  if transaction_type == "Kauf":
    return "-" + total_value
  else: 
    return total_value

# Get data from Google Drive

- for different data sources, see: https://colab.research.google.com/notebooks/io.ipynb



In [None]:
# Load Google Drive helper
from google.colab import drive
# This will prompt for authorization
drive.mount("/content/drive")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# Create Data Structure
- optimized for Portfolio Performance

In [None]:
# create list to save extracted data
transaction_record = []

for filename in glob.glob(os.path.join(gdrive_path, '*.pdf')):
   with open(filename, 'rb') as fin: # open in readonly mode
      # read and extract pdf infos
      pdf_reader = PyPDF2.PdfFileReader(fin)
      # extract first page
      extr_page = pdf_reader.getPage(0)
      text = extr_page.extractText()
      # check if current pdf file is a "WERTPAPIERGESCHÄFTWERTPAPIERORDER"
      transaction_pdf = "WERTPAPIERGESCHÄFTWERTPAPIERORDER"
      duplicate_file = ").pdf"

      if duplicate_file in filename: # try to find pdf duplicates - e.g. "filename (1).pdf" instead of "filename.pdf"
          #print("Processing PDF | DUPLICATE   | \"Wertpapiergeschäftsorder\" |  " + filename)
          continue
      elif transaction_pdf in text:  # all non-duplicate "Wertpapiergeschäftsorder" files
          #print("Processing PDF | RELEVANT    | \"Wertpapiergeschäftsorder\" |  " + filename)
          pass
      else: # non-duplicate, irrelevant files
          #print("Processing PDF | IRRELEVANT  |                            |  " + filename)
          continue

      # split document for easier processing
      full_text = preprocess_letter(text)
      text = full_text[1]
      header = full_text[0]

      # set data structure
      current_transaction = {
      "Datum" : extract_transaction_date(header),
      "ISIN" : extract_isin(text),
      "Typ": extract_order(text)[0],
      "Wert" : add_sign_to_order_value(text),
      "Buchungswährung" : "EUR",
      "Steuern" : "", # taxes are not extracted; the key is defined only once
      "Stück" : extract_order(text)[1],
      "Werpapiername" : extract_name(text), # key spelling is reused by the export cell
      "Notiz" : extract_order_id(header),
      "Gebühren" : extract_order_costs(text),
      "WKN" : "",
      "Ticker-Symbol": ""
      }

      transaction_record.append(current_transaction)

transaction_record



# Google Sheets Export


In [None]:
!pip install --upgrade --quiet gspread

In [None]:
from google.colab import auth
auth.authenticate_user()

import gspread
# oauth2client is deprecated; google.auth provides the default credentials instead
from google.auth import default

creds, _ = default()
gc = gspread.authorize(creds)

In [None]:
# prepare transaction record for spreadsheet
worksheet_data = [["Datum", "Typ", "Wert", "Buchungswährung", "Steuern", "Stück", "ISIN", "WKN", "Ticker-Symbol", "Wertpapiername", "Notiz"]]

# iterate over the records directly instead of indexing by position
for transaction in transaction_record:
  worksheet_data.append([
                        transaction["Datum"],
                        transaction["Typ"],
                        transaction["Wert"],
                        transaction["Buchungswährung"],
                        transaction["Steuern"],
                        transaction["Stück"],
                        transaction["ISIN"],
                        transaction["WKN"],
                        transaction["Ticker-Symbol"],
                        transaction["Werpapiername"],
                        transaction["Notiz"]
                        ])
worksheet_data

 ['08.07.1921',
  'Kauf',
  '-1.314,00',
  'EUR',
  '',
  '6',
  'US342354543',
  '',
  '',
  'McBurgerChicken',
  '8532-d3432']

In [None]:
######################
# Update Mastersheet => worksheet_master
 
try: # Check if Spreadsheet already exists
  spreadsheet = gc.open("tradeRepublic_Complete") # open existing sheet
except gspread.exceptions.SpreadsheetNotFound: # only handle the "not found" case
  spreadsheet = gc.create("tradeRepublic_Complete") # create new sheet
 
worksheet_master = spreadsheet.sheet1 # get worksheet for document 
 
 
# Update information by order id
for row in range(len(worksheet_data)): 
  current_data_item = worksheet_data[row][10] # data from extraction; 10 is the column index of "Notiz", which holds the TradeRepublic order id
  current_sheet_row = worksheet_master.acell("K" + str(row+1)).value # equivalent data in sheets
 
  if current_sheet_row == current_data_item:   # update only new entries (defined by order_id & row in sheet)
    print("Entry exists already")
  else:
    print("Added new entry to sheet")
    worksheet_master.update("A" + str(row+1), [worksheet_data[row]]) # update sheet row by row
 
#######################
# Update Delta Sheet => "worksheet_delta"
    today = get_date_today()
 
    try: # Check if Spreadsheet already exists
      spreadsheet_delta = gc.open("tradeRepublic_" + today) # open existing sheet
    except gspread.exceptions.SpreadsheetNotFound: # only handle the "not found" case
      spreadsheet_delta = gc.create("tradeRepublic_" + today) # create new sheet
    worksheet_delta = spreadsheet_delta.sheet1
 
    # don't overwrite existing entries, but check for the first free row before inserting current item
    count = 0
    while True:
      count += 1
      # an empty cell may be reported as "" or None, so test for any falsy value
      if not worksheet_delta.acell("A" + str(count)).value:
        worksheet_delta.update("A" + str(count), [worksheet_data[row]])
        break

Entry exists already
Entry exists already
Entry exists already
Entry exists already
Entry exists already
Entry exists already
Added new entry to sheet
Added new entry to sheet
Added new entry to sheet
Added new entry to sheet
Entry exists already
Entry exists already
Entry exists already


# Examples for extracted values

In [None]:
# extract ISIN
ISIN = extract_isin(text)
ISIN

'US01609sdf027'

In [None]:
# date of transaction
date = extract_transaction_date(header)
date

'08.07.2010'

In [None]:
# name of position
name = extract_name(text)
name

'Alibaba'

In [None]:
# transaction type & order volume
transaction_type, order_count, total_value = extract_order(text)
transaction_type

'Kauf'

In [None]:
# transaction costs
costs = extract_order_costs(text)
costs

'1,00'

In [None]:
# TODO: needs to be fixed
# stock_value = calculate_stock_value(text)
# stock_value

In [None]:
# add sign in total number for buy/sell actions
order_value = add_sign_to_order_value(text)
order_value

'-1.314,00'

In [None]:
# Range of extracted fields
print("Date : " + date)
print("Werpapiername : " + name)
print("ISIN : " + ISIN)
print("Transaction Type : " + transaction_type)
print("Order Count : " + order_count)
#print("Stock Value : " + stock_value)
print("Total Value : " + total_value)
print("Transaction Costs : " + costs)
print("Total Value with Sign: " + order_value)

Date : 08.07.1920
Werpapiername : BurgerEnron
ISIN : fsdf
Transaction Type : Kauf
Order Count : 6
Total Value : 1.314,00
Transaction Costs : 1,00
Total Value with Sign: -1.314,00
