<div class="alert alert-block alert-info text-center">
    <H1> Requirements and setup </H1>
</div>

- If you are new to Python I recommend installing the Anaconda distribution (anaconda.com), which already has Jupyter Notebooks included.


- If you already know Python, you should be able to follow along just fine. We'll use a few packages that can easily be installed with "pip install" but other than that it will be straight forward.


- Again, you **don't need to have a portfolio to take advantage of this course**.

<div class="alert alert-block alert-info text-center">
    <H1> PHASE I </H1>
</div>

## From transactions list to transactions final

- Using the transactions from your brokers to create a cumulative view on your portfolio
- Expanding the table with Buy and Sell orders
- Save the dataframe

In [1]:
# Imports 

import pandas as pd
import numpy as np
from datetime import datetime

In [2]:
# Define function expand buy and sell df.
def expand_buyselldf(df):
    buysell_df = df.copy()
    buysell_df['transact_val'] = round(buysell_df['quantity'] * buysell_df['price'], 2)

    # Getting the previous row for the ticker
    prev_row = []
    for x, tick in enumerate(buysell_df['ticker']):
        if x == 0:
            prev_row.append(pd.NA)

        else:
            row_tick = buysell_df['ticker'][:x]
            last_occ = row_tick.where(row_tick == tick).last_valid_index()

            if last_occ is not None:
                prev_row.append(last_occ)
            else:
                prev_row.append(pd.NA)

    buysell_df['last_occurrence'] = prev_row
    buysell_df['last_occurrence'] = buysell_df['last_occurrence'].astype('Int64')

    # Getting the cashflow column
    cash_flow = []
    for x, ref in enumerate(buysell_df['type']):
        if ref == 'Buy':
            cash_flow.append(buysell_df['quantity'].iloc[x] * buysell_df['price'].iloc[x] * (-1))
        else:
            cash_flow.append(buysell_df['quantity'].iloc[x] * buysell_df['price'].iloc[x])

    buysell_df['cashflow'] = cash_flow
    buysell_df['cashflow'] = buysell_df['cashflow'].round(2)

    # Getting the previous units and the cumulative units for each row
    buysell_df['prev_units'] = 0.0
    buysell_df['cml_units'] = 0.0
    for x, ref in enumerate(buysell_df['last_occurrence']):
        if ref is pd.NA:
            buysell_df.iat[x,9] = 0
            if buysell_df['type'].iloc[x] == 'Buy':
                buysell_df.iat[x,10] = buysell_df['quantity'].iloc[x]

        else:
            buysell_df.iat[x,9] = buysell_df['cml_units'].iloc[ref]
            if buysell_df['type'].iloc[x] == 'Buy':
                buysell_df.iat[x,10] = round(buysell_df['cml_units'].iloc[ref] + buysell_df['quantity'].iloc[x], 4)
            else:
                buysell_df.iat[x,10] = round(buysell_df['cml_units'].iloc[ref] - buysell_df['quantity'].iloc[x], 4)

    # Getting the previous cost, cumulative cost, transtype cost and unit cost for each row
    buysell_df['prev_cost'] = 0.0 # 11
    buysell_df['cml_cost'] = 0.0 # 12
    buysell_df['cost_transact'] = 0.0 # 13
    buysell_df['cost_unit'] = 0.0 # 14

    for x, ref in enumerate(buysell_df['last_occurrence']):
        if ref is pd.NA:
            buysell_df.iat[x,11] = 0
            buysell_df.iat[x,13] = np.nan
            buysell_df.iat[x,14] = np.nan
            if buysell_df['type'].iloc[x] == 'Buy':
                buysell_df.iat[x,12] = buysell_df['transact_val'].iloc[x]
                buysell_df.iat[x,13] = np.nan
                buysell_df.iat[x,14] = np.nan
            # there should be no SELL on the first row!

    else: # in case last occurrence is not nan
        buysell_df.iat[x,11] = buysell_df['cml_cost'].iloc[ref]
        if buysell_df['type'].iloc[x] == 'Buy':
            buysell_df.iat[x,12] = round(buysell_df['cml_cost'].iloc[ref] + buysell_df['transact_val'].iloc[x], 4)
            buysell_df.iat[x,13] = np.nan
            buysell_df.iat[x,14] = np.nan
        else: # in case SELL
            if buysell_df['cml_units'].iloc[ref] == 0:
                buysell_df.iat[x,13] = np.nan
                buysell_df.iat[x,14] = np.nan
            else:
                buysell_df.iat[x,13] = round((buysell_df['quantity'].iloc[x]) / (buysell_df['cml_units'].iloc[ref]) * (buysell_df['cml_cost'].iloc[ref]), 4)
                buysell_df.iat[x,14] = round(buysell_df['cml_cost'].iloc[ref] / buysell_df['cml_units'].iloc[ref], 4)
                buysell_df.iat[x,12] = round(buysell_df['cml_cost'].iloc[ref] - buysell_df['cost_transact'].iloc[x], 4)
                buysell_df.iat[x, 14] = round(buysell_df['cml_cost'].iloc[ref] / buysell_df['cml_units'].iloc[ref], 4)

                
    # Getting the realized Gain/Loss and yield %
    buysell_df['gain_loss'] = 0.0
    buysell_df['yield'] = 0.0
    for x, ref in enumerate(buysell_df['type']):
        if ref == 'Sell':
            buysell_df.iat[x,15] = round(buysell_df['transact_val'].iloc[x] - buysell_df['cost_transact'].iloc[x], 4)
            buysell_df.iat[x,16] = round(buysell_df.iat[x,15] / buysell_df['cost_transact'].iloc[x], 4)
    return buysell_df.fillna(0)

def clean_header(df):
    df.columns = df.columns.str.strip().str.lower().str.replace('.', '', regex=False).str.replace('(', \
                '', regex=False).str.replace(')', '', regex=False).str.replace(' ', '_', regex=False).str.replace('_/_', '/', regex=False)
    
def get_now():
    now = datetime.now().strftime('%Y-%m-%d_%Hh%Mm')
    return now

def datetime_maker(df, datecol):
    df[datecol] = pd.to_datetime(df[datecol])
    


Tip:
If you have multiple brokers , you might want to add a script to merge them and format them together.

You can also pick what assets you want to track separatly

In [3]:
##In case of excel file use this code to read it:

# broker1_raw = pd.read_csv(("../inputs/broker1/cryptocom.csv", sep=';')
# broker1_raw.sort_index(inplace=True)
# clean_header(broker1_raw)
# datetime_maker(broker1_raw, 'time')
# broker1_raw['no_of_shares'] = broker1_raw['no_of_shares'].round(4)
# broker1_raw['action'].mask(broker1_raw['action'].str.contains('uy'), 'Buy', inplace=True)
# broker1_raw['action'].mask(broker1_raw['action'].str.contains('ell'), 'Sell', inplace=True)
# buysell_filter = (broker1_raw['action'].str.contains('Buy') | broker1_raw['action'].str.contains('Sell'))
# cols_brok1 = ['time', 'action', 'ticker', 'no_of_shares', 'price/share', 'withholding_tax']
# broker1_buysell = broker1_raw[buysell_filter][cols_brok1]
# broker1_buysell.reset_index(inplace=True, drop=True)
# cols_buysell = ['date', 'type', 'ticker', 'quantity', 'price', 'fees']
# broker1_buysell.columns = cols_buysell
# broker1_buysell['date'] = broker1_buysell['date'].dt.normalize()

READ CSV FILE AND CLEAN YOUR DATA

In [4]:
# # Use pandas to read csv file
broker1_raw = pd.read_csv("../inputs/broker1/crypto_com.csv")
broker1_raw.rename(columns = {'Timestamp (UTC)' : 'date', 'Transaction Description' : 'type', 'Currency' : 'ticker','Amount':'quantity', 'Native Amount (in USD)': 'price'}, inplace = True)


In [5]:
# Add the fees column to the Data Frame and 
# Calculate fees per transaction and add the result to the fees column
fee = 2.99
broker1_raw['fees']= fee * broker1_raw['price']/100
broker1_raw = broker1_raw[['date', 'type', 'ticker', 'quantity','To Currency','To Amount', 'price', 'fees']].round(2)

In [6]:
# Display Dataframe
display(broker1_raw.head(5))
# display(broker1_raw.tail(5))

Unnamed: 0,date,type,ticker,quantity,To Currency,To Amount,price,fees
0,2023-02-25 08:00:06,Recurring Buy,USD,15.0,VET,526.67,15.0,0.45
1,2023-02-11 08:00:13,Recurring Buy,USD,15.0,VET,622.05,15.0,0.45
2,2023-01-28 08:04:06,Recurring Buy,USD,15.0,VET,586.44,15.0,0.45
3,2023-01-20 03:21:11,Buy ONE,ONE,946.0,,,15.51,0.46
4,2023-01-14 08:00:20,Recurring Buy,USD,15.0,VET,675.99,15.0,0.45


In [7]:
# Drop rows containing Adjustment|Withdraw (no necesary) 
broker1_raw = broker1_raw[broker1_raw['type'].str.contains('Adjustment|Withdraw ')==False]

In [8]:
# define function to mofdify type column content buy/sell
def change_type(broker1_raw):
    broker1_raw.loc[broker1_raw['type'].str.contains('Buy', case=False), 'type'] = 'buy'
    broker1_raw.loc[broker1_raw['type'].str.contains('->', case=False), 'type'] = 'sell'

# Call the  function
change_type(broker1_raw)

In [9]:
# Display df changes
display(broker1_raw.tail(25))
# display(broker1_raw.tail(50))

Unnamed: 0,date,type,ticker,quantity,To Currency,To Amount,price,fees
52,2021-06-06 14:28:37,buy,ONE,150.0,,,14.25,0.43
53,2021-05-08 17:54:56,buy,AUDIO,25.0,,,55.99,1.67
54,2021-05-08 17:17:56,sell,USDC,-57.84,AUDIO,25.74,57.84,1.73
55,2021-05-08 07:25:23,sell,USDC,-19.96,ONE,150.0,19.96,0.6
56,2021-05-08 07:24:25,sell,USDC,-18.23,VET,80.0,18.23,0.55
57,2021-05-08 07:20:50,sell,DOGE,-139.5,USDC,95.9,95.9,2.87
58,2021-05-07 04:06:12,sell,USDC,-5.88,ONE,46.0,5.88,0.18
59,2021-05-07 04:05:22,sell,DOGE,-10.5,USDC,5.93,5.93,0.18
62,2021-05-02 19:44:22,buy,VET,220.0,,,49.21,1.47
63,2021-05-02 15:40:34,buy,DOGE,150.0,,,58.44,1.75


In [10]:
broker1_raw.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 75 entries, 0 to 80
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   date         75 non-null     object 
 1   type         75 non-null     object 
 2   ticker       75 non-null     object 
 3   quantity     75 non-null     float64
 4   To Currency  27 non-null     object 
 5   To Amount    27 non-null     float64
 6   price        75 non-null     float64
 7   fees         75 non-null     float64
dtypes: float64(4), object(4)
memory usage: 5.3+ KB


In [11]:
# drop last two unnecesary columns
broker1_raw.drop(['To Currency', 'To Amount'],  axis=1,  inplace=True) 


In [12]:


# # Clean rows wrong content this can be improved implementing a function for future reference
# broker1_raw.iloc[12, broker1_raw.columns.get_loc('ticker')] = 'VET'
# broker1_raw.iloc[13, broker1_raw.columns.get_loc('ticker')] = 'AUDIO'
# broker1_raw.iloc[21, broker1_raw.columns.get_loc('ticker')] = 'AUDIO'
# broker1_raw.iloc[22, broker1_raw.columns.get_loc('ticker')] = 'VET'
# broker1_raw.iloc[23, broker1_raw.columns.get_loc('ticker')] = 'AUDIO'
# broker1_raw.iloc[30, broker1_raw.columns.get_loc('ticker')] = 'AUDIO'
# broker1_raw.iloc[31, broker1_raw.columns.get_loc('ticker')] = 'ONE'
# broker1_raw.iloc[32, broker1_raw.columns.get_loc('ticker')] = 'VET'
# broker1_raw.iloc[34, broker1_raw.columns.get_loc('ticker')] = 'ONE'
# broker1_raw.iloc[35, broker1_raw.columns.get_loc('ticker')] = 'VET'
# broker1_raw.iloc[43, broker1_raw.columns.get_loc('ticker')] = 'ONE'

# broker1_raw.iloc[12, broker1_raw.columns.get_loc('quantity')] = 734.70
# broker1_raw.iloc[21, broker1_raw.columns.get_loc('quantity')] = 13.63
# broker1_raw.iloc[22, broker1_raw.columns.get_loc('quantity')] = 404.50
# broker1_raw.iloc[30, broker1_raw.columns.get_loc('quantity')] = 25.74
# broker1_raw.iloc[31, broker1_raw.columns.get_loc('quantity')] = 150.00
# broker1_raw.iloc[32, broker1_raw.columns.get_loc('quantity')] = 80.00
# broker1_raw.iloc[34, broker1_raw.columns.get_loc('quantity')] = 46.00
# broker1_raw.iloc[43, broker1_raw.columns.get_loc('quantity')] = 292.00


cols_buysell = ['date', 'type', 'ticker', 'quantity', 'price', 'fees']
broker1_raw.columns = cols_buysell
broker1_raw.date = pd.to_datetime(broker1_raw.date, format="%Y/%m/%d %H:%M")
broker1_raw.type = broker1_raw.type.str.capitalize()
broker1_buysell = broker1_raw[broker1_raw.type.str.lower().str.contains('buy|sell')]
broker1_buysell.sort_values(by='date')
broker1_buysell.reset_index(inplace=True, drop=True)
broker1_buysell.date = broker1_buysell.date.dt.date

broker1_buysell

Unnamed: 0,date,type,ticker,quantity,price,fees
0,2023-02-25,Buy,USD,15.0,15.00,0.45
1,2023-02-11,Buy,USD,15.0,15.00,0.45
2,2023-01-28,Buy,USD,15.0,15.00,0.45
3,2023-01-20,Buy,ONE,946.0,15.51,0.46
4,2023-01-14,Buy,USD,15.0,15.00,0.45
...,...,...,...,...,...,...
70,2021-03-07,Buy,CRO,200.0,31.62,0.95
71,2021-03-07,Buy,ADA,30.0,34.21,1.02
72,2021-03-07,Buy,BTC,0.0,116.89,3.50
73,2021-03-04,Buy,ADA,100.0,121.59,3.64


In [13]:
#Get the Data types of each column
broker1_buysell.dtypes

date         object
type         object
ticker       object
quantity    float64
price       float64
fees        float64
dtype: object

<div class="alert alert-block alert-success">
<b>Run it!</b><br>Once you have the transactions ready, just call the function to expand the dataframe!
    <br><br>For the Excel die-hards, you can also use this table to do some quick exploring on a spreadsheet.
</div>

In [14]:
final = expand_buyselldf(broker1_buysell).sort_values(by='date')



In [15]:
final#.tail(10)


Unnamed: 0,date,type,ticker,quantity,price,fees,transact_val,last_occurrence,cashflow,prev_units,cml_units,prev_cost,cml_cost,cost_transact,cost_unit,gain_loss,yield
74,2021-03-04,Buy,CRO,200.0,30.67,0.92,6134.00,70,-6134.00,200.0,400.0,0.0,6134.00,0.0,0.0,0.0,0.0
73,2021-03-04,Buy,ADA,100.0,121.59,3.64,12159.00,71,-12159.00,200.0,300.0,0.0,0.00,0.0,0.0,0.0,0.0
71,2021-03-07,Buy,ADA,30.0,34.21,1.02,1026.30,68,-1026.30,170.0,200.0,0.0,0.00,0.0,0.0,0.0,0.0
70,2021-03-07,Buy,CRO,200.0,31.62,0.95,6324.00,69,-6324.00,0.0,200.0,0.0,0.00,0.0,0.0,0.0,0.0
72,2021-03-07,Buy,BTC,0.0,116.89,3.50,0.00,0,-0.00,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4,2023-01-14,Buy,USD,15.0,15.00,0.45,225.00,2,-225.00,45.0,60.0,0.0,0.00,0.0,0.0,0.0,0.0
3,2023-01-20,Buy,ONE,946.0,15.51,0.46,14672.46,0,-14672.46,0.0,946.0,0.0,14672.46,0.0,0.0,0.0,0.0
2,2023-01-28,Buy,USD,15.0,15.00,0.45,225.00,1,-225.00,30.0,45.0,0.0,0.00,0.0,0.0,0.0,0.0
1,2023-02-11,Buy,USD,15.0,15.00,0.45,225.00,0,-225.00,15.0,30.0,0.0,0.00,0.0,0.0,0.0,0.0


In [16]:
final['cml_cashflow'] = final['cashflow'].cumsum()*-1


In [17]:
final['avg_price'] = final['cml_cost']/final['cml_units'].round(2)

In [18]:
final.dtypes


date                object
type                object
ticker              object
quantity           float64
price              float64
fees               float64
transact_val       float64
last_occurrence      Int64
cashflow           float64
prev_units         float64
cml_units          float64
prev_cost          float64
cml_cost           float64
cost_transact      float64
cost_unit          float64
gain_loss          float64
yield              float64
cml_cashflow       float64
avg_price          float64
dtype: object

In [19]:
final.to_excel('../outputs/transactions_all/transactions_finaldf_{}.xlsx'.format(get_now()), index=False)
