<img src="http://eikon.tpq.io/refinitiv_logo.png" width="28%" align="left" style="vertical-align: top; padding-top: 23px;">
<img src="http://hilpisch.com/tpq_logo_long.png" width="36%" align="right" style="vertical-align: top;">

# Eikon Data API

**Portfolio Selection**

Dr. Yves J. Hilpisch | The Python Quants GmbH

<a href="http://tpq.io" target="_blank">http://tpq.io</a> | <a href="http://twitter.com/dyjh" target="_blank">@dyjh</a> | <a href="mailto:training@tpq.io">training@tpq.io</a>

<img src="http://hilpisch.com/images/tr_eikon_02.png" width=350px align=left>

## The Agenda

This tutorial shows

* how to retrieve historical data across asset classes via the Eikon Data API,
* how to work with such data using `pandas`, `Plotly` and `Cufflinks` and
* how to compose and analyze portfolios with regard to their expected return and volatility.

## Portfolio Selection

Markowitz (1952), “Portfolio Selection”:

> “Various reasons recommend the use of the expected return-variance of return rule, both as a hypothesis to explain well-established investment behavior and as a maxim to guide one's own action.”

In what follows, the composition of a portfolio made up of different risky assets is analyzed in light of the **Mean-Variance Portfolio Theory (MVP)** of Markowitz (1952), i.e. with regard to the resulting **expected portfolio return** and **expected portfolio volatility** (instead of variance).
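
For reference, the two quantities can be written down compactly. The notation is an assumption introduced here and does not appear in the original text: $w$ is the vector of portfolio weights, $\mu$ the vector of expected asset returns and $\Sigma$ the covariance matrix of the asset returns.

$$
\mu_p = w^\top \mu, \qquad
\sigma_p^2 = w^\top \Sigma \, w, \qquad
\sigma_p = \sqrt{w^\top \Sigma \, w}, \qquad
\sum_i w_i = 1
$$

Here $\mu_p$ is the expected portfolio return, $\sigma_p^2$ the expected portfolio variance and $\sigma_p$ the expected portfolio volatility.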

## Importing Required Packages

In [1]:
import math
import eikon as ek  # the Eikon Python wrapper package
import numpy as np  # NumPy
import pandas as pd  # pandas
import cufflinks as cf  # Cufflinks
import configparser as cp
import scipy.optimize as sco  # optimization routines
cf.set_config_file(offline=True)  # set the plotting mode to offline

The following **Python and package versions** are used.

In [2]:
import sys
print(sys.version)

3.9.13 (main, Aug 25 2022, 23:51:50) [MSC v.1916 64 bit (AMD64)]


In [3]:
ek.__version__

'1.1.16'

In [4]:
np.__version__

'1.21.5'

In [5]:
pd.__version__

'1.4.4'

In [6]:
cf.__version__

'0.17.3'

## Connecting to Eikon Data API

This code loads the app key from a local `eikon.env` file and passes it to `ek.set_app_key()` to connect to the **Eikon Data API Proxy**, which needs to be running locally.

In [7]:
cfg = cp.ConfigParser()
cfg.read('eikon.cfg')

[]

In [8]:
from dotenv import load_dotenv
from IPython.display import display, Markdown
import warnings
import os

cf.set_config_file(offline=True)  
warnings.filterwarnings("ignore")
pd.set_option('display.max_columns', None)

In [9]:
load_dotenv("eikon.env")
eikon_api_key = os.getenv("eikon_api_key")
ek.set_app_key(eikon_api_key)
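
`os.getenv()` returns `None` when a variable is missing, so a typo in the file name or variable name would only surface later, when the key is used. A minimal sketch of a guard, assuming a hypothetical helper `require_env()` (not part of the `eikon` package) and a placeholder value instead of a real key:

```python
import os


def require_env(name):
    """Return the value of environment variable `name` or raise a clear error.

    Hypothetical helper for illustration; not part of the eikon package.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"environment variable {name!r} is not set")
    return value


# Demo with a placeholder value instead of a real key
os.environ["eikon_api_key"] = "placeholder-key"
print(require_env("eikon_api_key"))
```

The returned value could then be passed to `ek.set_app_key()` as in the cell above.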

In [None]:
# ek.set_app_id(cfg['eikon']['app_id'])  # set_app_id() is deprecated; set_app_key() is used above instead

## Retrieving Cross-Asset Data

We first define a **small universe of RICs** (Reuters Instrument Codes) for which to retrieve data.

In [27]:
ticker = [
    'SPAXX',
]

In [28]:
ek.get_symbology(ticker, from_symbol_type='ticker', to_symbol_type='RIC')

| | error |
|---|---|
| SPAXX | No best match available |


In [None]:
'''
rics = [
    'AAPL.O',  # Apple stock
    'AMZN.O',  # Amazon stock
    'SPY',  # S&P 500 ETF
    'GLD',  # Gold ETF
    'EUR=',  # EUR/USD exchange rate
]
'''

In [64]:
# rics = ['BRKa', 'US10YT=RR', 'US1MT=RR', 'AMT', 'GLD', 'BLK', 'BX', 'IVZ', 'LAZ', 'KKR', 'IEP.O']
rics = ['BRKa', 'US10YT=RR', 'AMT', 'GLD', 'BLK', 'BX', 'IVZ', 'LAZ', 'KKR', 'IEP.O']

Second, **end-of-day (EOD) data** is retrieved.

In [65]:
data = ek.get_timeseries(rics,  # the RICs
                         fields='CLOSE',  # the required fields
                         start_date='2011-01-01',  # start date
                         end_date='2023-06-01')  # end date

In [66]:
start_date = '2011-01-01'
end_date = '2023-06-01'

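# start from an empty DataFrame and append one chunk per year
data = pd.DataFrame()

# request the data year by year; note that each request only covers
# 1 January to 1 June, so the second half of every year is not retrieved
for year in range(2011, 2024):
    start_date_year = f'{year}-01-01'
    end_date_year = f'{year}-06-01'
    df = ek.get_timeseries(rics, fields='CLOSE',
                           start_date=start_date_year, end_date=end_date_year)

    # append the chunk for this year
    data = pd.concat([data, df])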
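
The loop above requests 1 January to 1 June of each year, so the resulting series has a gap in the second half of every year, and the first return after each gap spans several months. If gap-free coverage were wanted, contiguous request windows could be generated along the following lines. This is a sketch only; `yearly_windows()` is a hypothetical helper and not part of the `eikon` package.

```python
from datetime import date, timedelta


def yearly_windows(start, end):
    """Split [start, end] into contiguous windows of at most one calendar year.

    Hypothetical helper for illustration; returns (start, end) ISO date strings.
    """
    windows = []
    current = start
    while current <= end:
        year_end = date(current.year, 12, 31)
        stop = min(year_end, end)
        windows.append((current.isoformat(), stop.isoformat()))
        current = stop + timedelta(days=1)  # next window starts the day after
    return windows


windows = yearly_windows(date(2011, 1, 1), date(2023, 6, 1))
print(windows[0])   # first window
print(windows[-1])  # last window
```

Each pair could be passed as `start_date` and `end_date` to `ek.get_timeseries()` in place of the hard-coded strings in the loop.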


In [67]:
data.head()  # first five rows

| Date | BRKa | US10YT=RR | AMT | GLD | BLK | BX | IVZ | LAZ | KKR | IEP.O |
|---|---|---|---|---|---|---|---|---|---|---|
| 2011-01-03 | 120498.0 | 3.336 | 51.63 | 138.0 | 190.19 | 14.503767 | 24.46 | 35.366027 | 14.5 | 34.29421 |
| 2011-01-04 | 120200.0 | 3.338 | 51.47 | 134.75 | 190.04 | 14.680284 | 24.41 | 35.14315 | 14.45 | 34.275984 |
| 2011-01-05 | 121300.0 | 3.463 | 50.76 | 134.37 | 192.0 | 14.670477 | 24.39 | 36.346683 | 15.14 | 34.53403 |
| 2011-01-06 | 120600.0 | 3.403 | 50.62 | 133.83 | 189.93 | 14.69009 | 24.35 | 37.050972 | 15.23 | 34.514844 |
| 2011-01-07 | 119681.0 | 3.326 | 50.5 | 133.58 | 188.36 | 14.621445 | 24.33 | 36.587389 | 14.97 | 35.243896 |


In [68]:
data.tail()  # final five rows

| Date | BRKa | US10YT=RR | AMT | GLD | BLK | BX | IVZ | LAZ | KKR | IEP.O |
|---|---|---|---|---|---|---|---|---|---|---|
| 2023-05-25 | 484000.0 | 3.815 | 182.56 | 180.2 | 660.52 | 83.53 | 14.7 | 28.15 | 50.7 | 20.63 |
| 2023-05-26 | 486650.0 | 3.82 | 182.18 | 180.92 | 672.3 | 85.7 | 14.89 | 28.76 | 51.68 | 20.65 |
| 2023-05-30 | 489224.73 | 3.696 | 182.0 | 182.04 | 673.58 | 86.4 | 14.82 | 28.93 | 51.67 | 22.38 |
| 2023-05-31 | 488023.98 | 3.637 | 184.44 | 182.32 | 657.55 | 85.64 | 14.38 | 28.69 | 51.49 | 22.57 |
| 2023-06-01 | 492000.01 | 3.608 | 187.01 | 183.76 | 668.84 | 87.14 | 14.74 | 29.1 | 52.43 | 21.81 |


Only complete data rows are selected.

In [69]:
data.dropna(inplace=True)  # deletes rows with NaN values

In [70]:
data.info()  # DataFrame meta information

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1351 entries, 2011-01-03 to 2023-06-01
Data columns (total 10 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   BRKa       1351 non-null   Float64
 1   US10YT=RR  1351 non-null   Float64
 2   AMT        1351 non-null   Float64
 3   GLD        1351 non-null   Float64
 4   BLK        1351 non-null   Float64
 5   BX         1351 non-null   Float64
 6   IVZ        1351 non-null   Float64
 7   LAZ        1351 non-null   Float64
 8   KKR        1351 non-null   Float64
 9   IEP.O      1351 non-null   Float64
dtypes: Float64(10)
memory usage: 129.3 KB


In [71]:
data.normalize().iplot(kind='lines')
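
Normalization makes instruments with very different price levels comparable in one chart. The sketch below illustrates the idea in plain Python, assuming that normalization means rebasing every series to a common starting value; the base of 100 and the prices are assumptions chosen for illustration.

```python
# Hypothetical price series with very different levels
prices = {
    "A": [50.0, 55.0, 60.0],
    "B": [200.0, 190.0, 210.0],
}

BASE = 100  # assumed common starting value

# Rebase: divide every price by the first price of its series, then scale
rebased = {
    name: [BASE * p / series[0] for p in series]
    for name, series in prices.items()
}

for name, series in rebased.items():
    print(name, [round(x, 1) for x in series])
```

After rebasing, both series start at the same value, so the chart shows relative performance rather than absolute price levels.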

## Statistics for Single Instruments

We calculate the **log returns** in vectorized fashion for all financial instruments and all days available.

In [72]:
rets = np.log(data / data.shift(1))  # log returns in vectorized fashion
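
One reason for working with log returns is that they are additive over time. A small sketch with hypothetical prices, written in plain Python so that it runs without the data retrieved above:

```python
import math

prices = [100.0, 102.0, 99.0, 105.0]  # hypothetical closing prices

# Log return of day t: ln(P_t / P_{t-1}); the first day has no predecessor
log_rets = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

# The daily log returns sum to the log return over the whole period
total = sum(log_rets)
print(total, math.log(prices[-1] / prices[0]))
```

The first row of `rets` is `NaN` for the same reason that `log_rets` has one element fewer than `prices`: there is no previous price for the first observation.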

In [73]:
rets.head()

| Date | BRKa | US10YT=RR | AMT | GLD | BLK | BX | IVZ | LAZ | KKR | IEP.O |
|---|---|---|---|---|---|---|---|---|---|---|
| 2011-01-03 | | | | | | | | | | |
| 2011-01-04 | -0.002476 | 0.000599 | -0.003104 | -0.023832 | -0.000789 | 0.012097 | -0.002046 | -0.006322 | -0.003454 | -0.000532 |
| 2011-01-05 | 0.00911 | 0.036763 | -0.01389 | -0.002824 | 0.010261 | -0.000668 | -0.00082 | 0.033673 | 0.046646 | 0.0075 |
| 2011-01-06 | -0.005788 | -0.017478 | -0.002762 | -0.004027 | -0.01084 | 0.001336 | -0.001641 | 0.019192 | 0.005927 | -0.000556 |
| 2011-01-07 | -0.007649 | -0.022887 | -0.002373 | -0.00187 | -0.008301 | -0.004684 | -0.000822 | -0.012591 | -0.017219 | 0.020903 |


In [74]:
rets.iplot(kind='histogram', subplots=True)

In MVP, the **average returns** of the financial instruments play an important role since they are used to approximate the **expected returns**.

In [75]:
rets.mean()  # daily mean returns

CLOSE
BRKa         0.001042
US10YT=RR    0.000058
AMT          0.000953
GLD          0.000212
BLK          0.000931
BX           0.001328
IVZ         -0.000375
LAZ         -0.000144
KKR          0.000952
IEP.O       -0.000335
dtype: float64

In [76]:
rets.mean() * 252  # annualized mean returns

CLOSE
BRKa         0.262611
US10YT=RR    0.014631
AMT          0.240251
GLD          0.053457
BLK          0.234737
BX           0.334713
IVZ         -0.094542
LAZ         -0.036403
KKR          0.239928
IEP.O       -0.084487
dtype: float64

In [77]:
(rets.mean() * 252).iplot(kind='bar')

The **historical volatility** (standard deviation of returns) of each instrument is also important, although MVP does not use it directly: it enters the calculations through the covariance matrix.

In [78]:
rets.std()  # daily volatilities

CLOSE
BRKa         0.016959
US10YT=RR    0.040567
AMT          0.020039
GLD          0.012327
BLK          0.025460
BX           0.034240
IVZ          0.036300
LAZ          0.032247
KKR          0.032291
IEP.O        0.031954
dtype: float64

In [79]:
rets.std() * math.sqrt(252)  # annualized volatilities

CLOSE
BRKa         0.269208
US10YT=RR    0.643975
AMT          0.318116
GLD          0.195687
BLK          0.404169
BX           0.543540
IVZ          0.576249
LAZ          0.511903
KKR          0.512610
IEP.O        0.507247
dtype: float64
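
The two annualization rules differ: mean returns scale with the number of periods, while volatility scales with its square root (equivalently, variance scales linearly), which rests on the assumption that daily returns are uncorrelated. A sketch with hypothetical returns, using the sample standard deviation as `pandas` does by default:

```python
import math
import statistics

daily_rets = [0.010, -0.004, 0.003, -0.007, 0.006]  # hypothetical daily log returns
TRADING_DAYS = 252  # assumed number of trading days per year, as in the text

daily_mean = statistics.mean(daily_rets)
daily_vol = statistics.stdev(daily_rets)  # sample standard deviation (n - 1)

annual_mean = daily_mean * TRADING_DAYS             # scales with time
annual_vol = daily_vol * math.sqrt(TRADING_DAYS)    # scales with sqrt(time)
annual_var = statistics.variance(daily_rets) * TRADING_DAYS

print(annual_mean, annual_vol, annual_var)
```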

In [80]:
(rets.std() * math.sqrt(252)).iplot(kind='bar')

## Portfolio Statistics

Assume a portfolio composed of **all financial instruments** with **equal weighting**.

In [81]:
weights = len(rics) * [1 / len(rics)]
weights

[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]

The **expected portfolio return** according to MVP is the dot product of the expected returns and the weights.

In [82]:
np.dot(rets.mean() * 252, weights)

0.11648982347633949

In [83]:
def portfolio_return(symbols, weights):
    return np.dot(rets[symbols].mean() * 252, weights)

In [84]:
portfolio_return(rics, weights)

0.11648982347633949
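
The dot product can be written out as a plain weighted sum. The sketch below re-enters the rounded annualized mean returns printed above and reproduces the equally weighted portfolio return up to rounding:

```python
# Annualized mean returns as printed above (rounded to six decimals)
mean_returns = [0.262611, 0.014631, 0.240251, 0.053457, 0.234737,
                0.334713, -0.094542, -0.036403, 0.239928, -0.084487]
weights = [1 / len(mean_returns)] * len(mean_returns)  # equal weighting

# Dot product written out: sum of weight times expected return
port_return = sum(w * m for w, m in zip(weights, mean_returns))
print(round(port_return, 4))
```

With equal weights the portfolio return is simply the arithmetic mean of the single returns.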

`pandas` makes it possible to derive the **covariance matrix** with a single method call. It plays a major role in the theory of Markowitz (1952) in that it accounts for **diversification** effects.

In [85]:
data.cov() * 252  # NOTE: covariance of the price levels (not of the returns) times 252; the calculations below use rets.cov() * 252

| | BRKa | US10YT=RR | AMT | GLD | BLK | BX | IVZ | LAZ | KKR | IEP.O |
|---|---|---|---|---|---|---|---|---|---|---|
| BRKa | 3551380000000.0 | 2478288.0 | 1835329000.0 | 364035100.0 | 5439244000.0 | 830738600.0 | -96383350.0 | 52855330.0 | 402980200.0 | -66195060.0 |
| US10YT=RR | 2478288.0 | 132.6483 | -2692.11 | -160.1114 | -1262.355 | 104.1108 | 138.0936 | 324.0966 | 248.7087 | -236.4812 |
| AMT | 1835329000.0 | -2692.11 | 1188057.0 | 208318.5 | 2980495.0 | 436335.7 | -79136.89 | 14849.79 | 206490.0 | -39510.16 |
| GLD | 364035100.0 | -160.1114 | 208318.5 | 138671.2 | 566272.6 | 112537.9 | -28663.62 | -19452.99 | 61224.02 | -53897.29 |
| BLK | 5439244000.0 | -1262.355 | 2980495.0 | 566272.6 | 9310935.0 | 1298647.0 | -121885.3 | 136705.2 | 651255.7 | -68595.86 |
| BX | 830738600.0 | 104.1108 | 436335.7 | 112537.9 | 1298647.0 | 227020.4 | -22834.22 | 6537.543 | 109980.7 | -14014.61 |
| IVZ | -96383350.0 | 138.0936 | -79136.89 | -28663.62 | -121885.3 | -22834.22 | 14736.15 | 8639.787 | -10629.53 | 20460.12 |
| LAZ | 52855330.0 | 324.0966 | 14849.79 | -19452.99 | 136705.2 | 6537.543 | 8639.787 | 16703.94 | 4013.26 | 15674.35 |
| KKR | 402980200.0 | 248.7087 | 206490.0 | 61224.02 | 651255.7 | 109980.7 | -10629.53 | 4013.26 | 57127.61 | -5575.09 |
| IEP.O | -66195060.0 | -236.4812 | -39510.16 | -53897.29 | -68595.86 | -14014.61 | 20460.12 | 15674.35 | -5575.09 | 90165.35 |


With the covariance matrix, **expected portfolio variance** is calculated as follows.

In [86]:
np.dot(weights, np.dot(rets.cov() * 252, weights))

0.09894103441818411

Accordingly, **expected portfolio volatility** is then given by:

In [87]:
math.sqrt(np.dot(weights, np.dot(rets.cov() * 252, weights)))

0.31454893803378847

In [88]:
def portfolio_volatility(symbols, weights):
    return math.sqrt(np.dot(weights, np.dot(rets[symbols].cov() * 252, weights)))

In [89]:
portfolio_volatility(rics, weights)

0.31454893803378847
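
The diversification effect captured by the covariance matrix is easiest to see with two instruments. The following sketch uses hypothetical volatilities and a hypothetical correlation and writes the matrix product out as a double sum:

```python
import math

vol_a, vol_b = 0.20, 0.30  # hypothetical annualized volatilities
corr = 0.25                # hypothetical correlation between the two
weights = [0.5, 0.5]

cov_ab = corr * vol_a * vol_b
cov = [[vol_a ** 2, cov_ab],
       [cov_ab, vol_b ** 2]]

# Portfolio variance w' * Cov * w written out as a double sum
port_var = sum(weights[i] * cov[i][j] * weights[j]
               for i in range(2) for j in range(2))
port_vol = math.sqrt(port_var)

# Weighted average of the single volatilities, for comparison
weighted_avg_vol = weights[0] * vol_a + weights[1] * vol_b
print(port_vol, weighted_avg_vol)
```

As long as the correlation is below 1, the portfolio volatility is lower than the weighted average of the single volatilities.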

## Simulating Portfolio Compositions &mdash; Two Instruments

To get started, consider just two financial instruments for which **portfolio compositions** are simulated that add up to 100% (= 1).

In [92]:
fis = ['BRKa', 'BX']

In [93]:
w = np.random.random((500, len(fis)))  # random portfolio compositions ...

In [94]:
w[:5].round(2)  # ... that do not yet add up to 100%

array([[0.4 , 0.33],
       [0.16, 0.99],
       [0.02, 0.12],
       [0.73, 0.81],
       [0.48, 0.2 ]])

In [95]:
w = (w.T / w.sum(axis=1)).T  # normalization ...

In [96]:
w[:5].round(2)  # ... lets the random numbers add up to 100%

array([[0.55, 0.45],
       [0.14, 0.86],
       [0.16, 0.84],
       [0.47, 0.53],
       [0.71, 0.29]])

In [97]:
w.sum(axis=1)[:5]  # check: every row now sums to 1

array([1., 1., 1., 1., 1.])
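
The normalization step divides every random number by the sum of its row. A plain-Python sketch of the same idea, with a fixed seed and small, arbitrarily chosen dimensions:

```python
import random

random.seed(0)  # fixed seed so that the sketch is reproducible
n_portfolios, n_assets = 5, 3  # arbitrary sizes for illustration

# Random numbers in [0, 1) that do not yet add up to 1 per row
raw = [[random.random() for _ in range(n_assets)] for _ in range(n_portfolios)]

# Divide each entry by its row sum so that every row adds up to 1
weights = [[x / sum(row) for x in row] for row in raw]

for row in weights:
    print([round(x, 2) for x in row], round(sum(row), 10))
```

Because all raw numbers are non-negative, the resulting weights lie between 0 and 1, i.e. the simulated portfolios are long-only.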

Given these random compositions, the resulting **portfolio statistics** can be derived.

In [98]:
mvp = [(portfolio_volatility(fis, weights),
       portfolio_return(fis, weights))
         for weights in w]

In [99]:
mvp = pd.DataFrame(np.array(mvp), columns=['volatility', 'return'])
mvp.iloc[:5]

| | volatility | return |
|---|---|---|
| 0 | 0.346175 | 0.295163 |
| 1 | 0.489013 | 0.324776 |
| 2 | 0.480753 | 0.323233 |
| 3 | 0.369152 | 0.300592 |
| 4 | 0.305718 | 0.283869 |


These results can then be **visualized**.

In [100]:
mvp.iplot(x='volatility', y='return', kind='scatter', mode='markers', color='red')

## Simulating Portfolio Compositions &mdash; All Instruments

Second, consider portfolio compositions for **all financial instruments**. The code is basically the same.

In [116]:
w = np.random.random((2500, len(rics)))  # random portfolio compositions ...

In [117]:
w[:5].round(2)  # ... that do not yet add up to 100%

array([[0.8 , 0.56, 0.24, 0.19, 0.99, 0.48, 0.46, 0.92, 0.08, 0.36],
       [0.07, 0.15, 0.17, 0.62, 0.81, 0.47, 0.66, 0.15, 0.51, 0.96],
       [0.53, 0.57, 0.09, 0.95, 0.28, 0.68, 0.92, 0.4 , 0.44, 0.77],
       [0.98, 0.26, 0.91, 0.71, 0.03, 0.24, 0.41, 0.51, 0.69, 0.01],
       [0.07, 0.76, 0.76, 0.99, 0.37, 0.35, 0.05, 0.01, 0.34, 0.27]])

In [118]:
w = (w.T / w.sum(axis=1)).T  # normalization ...

In [119]:
w[:5].round(2)  # ... lets the random numbers add up to 100%

array([[0.16, 0.11, 0.05, 0.04, 0.19, 0.09, 0.09, 0.18, 0.01, 0.07],
       [0.02, 0.03, 0.04, 0.14, 0.18, 0.1 , 0.14, 0.03, 0.11, 0.21],
       [0.09, 0.1 , 0.02, 0.17, 0.05, 0.12, 0.16, 0.07, 0.08, 0.14],
       [0.21, 0.06, 0.19, 0.15, 0.01, 0.05, 0.09, 0.11, 0.15, 0.  ],
       [0.02, 0.19, 0.19, 0.25, 0.09, 0.09, 0.01, 0.  , 0.08, 0.07]])

In [120]:
w.sum(axis=1)[:5]  # check: every row now sums to 1

array([1., 1., 1., 1., 1.])

Given these random compositions, the resulting **portfolio statistics** can be derived.

In [121]:
mvp = [(portfolio_volatility(rics, weights),
       portfolio_return(rics, weights))
         for weights in w]

In [122]:
mvp = pd.DataFrame(np.array(mvp), columns=['volatility', 'return'])
mvp.iloc[:5]

| | volatility | return |
|---|---|---|
| 0 | 0.341534 | 0.115825 |
| 1 | 0.329462 | 0.091485 |
| 2 | 0.317421 | 0.080172 |
| 3 | 0.272132 | 0.149706 |
| 4 | 0.251852 | 0.131961 |


These results can then be **visualized**.

In [123]:
mvp.iplot(x='volatility', y='return', kind='scatter', mode='markers', color='red')

## Minimum Volatility Portfolio

Let us now derive the portfolio composition that **minimizes the expected volatility** of the portfolio. First, the **boundary conditions** restrict every single weight to lie between 0 and 1.

In [124]:
bounds = len(rics) * [(0, 1)]

In [125]:
bounds

[(0, 1),
 (0, 1),
 (0, 1),
 (0, 1),
 (0, 1),
 (0, 1),
 (0, 1),
 (0, 1),
 (0, 1),
 (0, 1)]

Second, the condition that the **weights add up to 100%**.

In [126]:
constraints = {'type': 'eq', 'fun': lambda weights: weights.sum() - 1} 

Third, the **function to be minimized**. This is the function `portfolio_volatility()` from above.

These three elements are used in combination with the **general minimizer function `sco.minimize()`** to derive the minimum volatility portfolio.

In [127]:
res = sco.minimize(lambda x: portfolio_volatility(rics, x),  # function to be minimized
                   len(rics) * [1 / len(rics)],  # initial guess
                   bounds=bounds,  # boundary conditions
                   constraints=constraints  # single equality constraint
                  )

The **results** are:

In [128]:
res

     fun: 0.15138806688674178
     jac: array([0.15191929, 0.15305229, 0.15156108, 0.15102613, 0.20106824,
       0.22623222, 0.23814757, 0.2262693 , 0.24008676, 0.15241497])
 message: 'Optimization terminated successfully'
    nfev: 89
     nit: 8
    njev: 8
  status: 0
 success: True
       x: array([8.66831872e-02, 6.75865033e-02, 1.77607199e-01, 6.30795242e-01,
       3.08184468e-17, 0.00000000e+00, 0.00000000e+00, 1.30104261e-18,
       3.22550146e-18, 3.73278681e-02])

In [129]:
res['fun']  # minimum volatility

0.15138806688674178

In [130]:
for r in zip(rics, res['x']):
    print('%7s | %7.3f' % (r[0], r[1])) # optimal portfolio composition

   BRKa |   0.087
US10YT=RR |   0.068
    AMT |   0.178
    GLD |   0.631
    BLK |   0.000
     BX |   0.000
    IVZ |   0.000
    LAZ |   0.000
    KKR |   0.000
  IEP.O |   0.037
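
A numerical optimum of this kind can be sanity-checked in the two-instrument case, where the minimum-variance weight has a well-known closed form. The sketch below uses hypothetical variances and a hypothetical covariance and compares the closed form with a brute-force search over a grid of weights; it does not use the data or the optimizer from above.

```python
import math

var_a, var_b = 0.04, 0.09  # hypothetical annualized variances
cov_ab = 0.015             # hypothetical annualized covariance


def port_vol(w_a):
    """Volatility of a two-asset portfolio with weight w_a in asset A."""
    w_b = 1 - w_a  # weights add up to 1
    return math.sqrt(w_a ** 2 * var_a + w_b ** 2 * var_b
                     + 2 * w_a * w_b * cov_ab)


# Closed-form minimum-variance weight of asset A for two assets
w_star = (var_b - cov_ab) / (var_a + var_b - 2 * cov_ab)

# Brute-force check on a grid of weights between 0 and 1
grid = [i / 1000 for i in range(1001)]
w_grid = min(grid, key=port_vol)

print(w_star, w_grid, port_vol(w_star))
```

In this example the minimum volatility is below the volatility of the less risky asset alone, which is again the diversification effect at work.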


## Conclusions

Based on this tutorial, we can conclude that

* it is easy to retrieve **historical end-of-day data across asset classes** via the Eikon Data API,
* `Plotly` and `Cufflinks` make **financial data visualization** convenient and
* `pandas` is a powerful data analysis tool to implement **financial algorithms** efficiently, such as the portfolio selection approach from Markowitz (1952).

## Eikon Data API Developer Resources

* [Overview](https://developers.thomsonreuters.com/eikon-data-apis) 
* [Quick Start](https://developers.thomsonreuters.com/eikon-data-apis/quick-start)
* [Documentation](https://developers.thomsonreuters.com/eikon-data-apis/docs)
* [Downloads](https://developers.thomsonreuters.com/eikon-data-apis/downloads)
* [Tutorials](https://developers.thomsonreuters.com/eikon-data-apis/learning)
* [Q&A Forums](https://developers.thomsonreuters.com/eikon-data-apis/qa) 

Data Item Browser Application: Type `DIB` into Eikon Search Bar.
