## Exploring our World Indices data and simulating investments over 20 years

### Introduction
In this notebook we import our data, then analyse and manipulate it before exporting new datasets.

We are using 4 World Indices to model investments: FTSE 100, S&P 500, EURO STOXX 50, and Nikkei 225.

We will simulate investing in these indices over a 20-year period.

### Importing Libraries and Data

In [2]:
#Import libraries
import pandas as pd
import numpy as np
import yfinance as yf
from datetime import datetime
import matplotlib.pyplot as plt

In [3]:
#Import indices data
FTSE100 = pd.read_csv("Data//FTSE100 Data.csv", index_col=0, parse_dates=True)
SNP500 = pd.read_csv("Data//SNP500 Data.csv", index_col=0, parse_dates=True)
EURO50 = pd.read_csv("Data//EURO50 Data.csv", index_col=0, parse_dates=True)
NIKKEI = pd.read_csv("Data//Nikkei Data.csv", index_col=0, parse_dates=True)

#Import exchange rate data
USDtoGBP = pd.read_csv("Data//USD to GBP Data.csv", index_col=0, parse_dates=True)
EURtoGBP = pd.read_csv("Data//EUR to GBP Data.csv", index_col=0, parse_dates=True)
JPYtoGBP = pd.read_csv("Data//JPY to GBP Data.csv", index_col=0, parse_dates=True)




### Combining and Formatting the Data

In [4]:
#We want to combine all the data into a single dataframe

#First we make a list of all the datasets
data_list = [FTSE100, SNP500, EURO50, NIKKEI, USDtoGBP, EURtoGBP, JPYtoGBP]

#Next we cycle through the list and merge each dataset together sequentially
indexes = range(1,len(data_list))
dataset = data_list[0]

for i in indexes:
    dataset = pd.merge(dataset,data_list[i],on="Date")
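
The same sequential merge can also be written as a fold over the list. The sketch below is only an illustration on made-up data: the frames `frame_a`, `frame_b` and `frame_c` and their values are assumptions, not the real CSV contents. It also shows that `pd.merge` defaults to an inner join, so any month missing from one frame is dropped from the result.

```python
from functools import reduce

import pandas as pd

# Toy monthly frames standing in for the real CSV data (values are made up)
dates = pd.DatetimeIndex(["2005-01-01", "2005-02-01", "2005-03-01"], name="Date")
frame_a = pd.DataFrame({"A_Price": [1.0, 2.0, 3.0]}, index=dates)
frame_b = pd.DataFrame({"B_Price": [4.0, 5.0, 6.0]}, index=dates)
# frame_c has no March row, so the inner join drops March from the result
frame_c = pd.DataFrame({"C_Price": [7.0, 8.0]}, index=dates[:2])

frames = [frame_a, frame_b, frame_c]

# Fold the list into one frame, merging on the shared "Date" index each time
combined = reduce(lambda left, right: pd.merge(left, right, on="Date"), frames)
print(combined)
```

If dropped months would be a problem, checking `len(combined)` against the length of the shortest input is a quick sanity test.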

In [5]:
#Renaming columns
dataset = dataset.rename(columns = {"EURO_Price": "EURO50_Price_EUR", "Nikkei_Price": "Nikkei_Price_JPY"})

#Removing thousands-separator commas and making sure every value is a float value
dataset.replace(',','', regex=True,inplace=True)
dataset = dataset.astype("float64")
dataset

Date,FTSE100_Price,SNP500_Price_USD,EURO50_Price_EUR,Nikkei_Price_JPY,USDtoGBP,EURtoGBP,JPYtoGBP
2025-07-01,9024.64,6307.67,5289.05,39744.00,0.7392,0.8685,0.503905
2025-06-01,8760.96,6204.95,5303.24,40487.39,0.7281,0.8582,0.505450
2025-05-01,8772.38,5911.69,5366.59,37965.10,0.7429,0.8431,0.515700
2025-04-01,8494.85,5569.06,5160.22,36045.38,0.7502,0.8498,0.524300
2025-03-01,8582.81,5611.85,5248.39,35617.56,0.7741,0.8374,0.516150
...,...,...,...,...,...,...,...
2005-03-01,4894.37,1180.60,3055.73,11668.95,0.5291,0.6859,0.494000
2005-02-01,4968.50,1203.60,3058.32,11740.60,0.5202,0.6887,0.497500
2005-01-01,4852.31,1181.30,2984.59,11387.59,0.5310,0.6919,0.512150
2004-12-01,4814.30,1211.90,2951.24,11488.76,0.5212,0.7070,0.508900
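
To see why the comma removal is needed before the cast, here is a small sketch. It assumes the raw CSV values arrive as strings with thousands separators (for example `"4,814.30"`). The frame `raw` and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical raw values as they might be read from a CSV with thousands separators
raw = pd.DataFrame(
    {
        "Index_Price": ["4,814.30", "4,852.31"],
        "Rate": ["0.5212", "0.5310"],
    }
)

# Strip every comma, then cast all columns to float in one go
cleaned = raw.replace(",", "", regex=True).astype("float64")
print(cleaned.dtypes)
```

Calling `astype("float64")` directly on `raw` would raise an error, because `"4,814.30"` is not a valid float literal.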


In [6]:
#Next we want to convert every value into GBP, to do this we multiply the index value by the exchange rate for each date respectively
dataset["SNP500_Price_USD"] = dataset["SNP500_Price_USD"] * dataset["USDtoGBP"]
dataset["EURO50_Price_EUR"] = dataset["EURO50_Price_EUR"] * dataset["EURtoGBP"]
dataset["Nikkei_Price_JPY"] = dataset["Nikkei_Price_JPY"] * dataset["JPYtoGBP"]

#Renaming the columns to show we are working in GBP and dropping the exchange rate columns
dataset = dataset.rename(columns = { "SNP500_Price_USD": "SNP500_Price_GBP", 
                                    "EURO50_Price_EUR":"EURO50_Price_GBP", 
                                    "Nikkei_Price_JPY": "Nikkei_Price_GBP"})
dataset = dataset.drop(columns = ["USDtoGBP", "EURtoGBP", "JPYtoGBP"])
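
As a quick check of the conversion, the sketch below applies the same row-wise multiplication to two S&P 500 rows copied from the table above. The frame name `sample` is an assumption used only for this illustration.

```python
import pandas as pd

# Two rows copied from the table above (2025-07-01 and 2004-12-01)
sample = pd.DataFrame(
    {
        "SNP500_Price_USD": [6307.67, 1211.90],
        "USDtoGBP": [0.7392, 0.5212],
    },
    index=pd.to_datetime(["2025-07-01", "2004-12-01"]),
)

# Row-wise product: each month's index level times that month's exchange rate
sample["SNP500_Price_GBP"] = sample["SNP500_Price_USD"] * sample["USDtoGBP"]
print(sample)
```

Because pandas aligns the two columns on the shared date index, each price is multiplied by the rate for its own month.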

### Creating our Investment Frequency Functions

In [7]:
#Set start and end date
start = datetime(2005,1,1)
end = datetime(2024,12,31)

#Creating our desired date range
date_range = pd.date_range(start, end, freq='MS')

#Creating a list of year starts for use in our functions
year_starts = pd.date_range(start, end, freq='YS')

#Loading in data for quarterly dates and dates in the middle of each year
#Here, year_middle means 1st of July of each year, quarterly means 1st day of each quarter
year_middle = pd.read_csv("Data//year_middle.csv", parse_dates=True)
year_middle = pd.DatetimeIndex(data=year_middle["Date"])

quaterly_dates = pd.read_csv("Data//quaterly dates.csv", parse_dates=True)
quaterly_dates = pd.DatetimeIndex(data=quaterly_dates["Date"])
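
If the two CSV files hold exactly the dates described in the comments (the 1st of July of each year, and the 1st of January, April, July and October), the same indexes could probably be derived from the monthly range instead of being loaded. This is a sketch under that assumption. The names `months`, `quarter_starts` and `july_firsts` are hypothetical.

```python
from datetime import datetime

import pandas as pd

start = datetime(2005, 1, 1)
end = datetime(2024, 12, 31)

# First day of every month in the simulation window
months = pd.date_range(start, end, freq="MS")

# Quarter starts: keep only months 1, 4, 7 and 10
quarter_starts = months[months.month.isin([1, 4, 7, 10])]

# Mid-year dates: keep only the 1st of July
july_firsts = months[months.month == 7]

print(len(months), len(quarter_starts), len(july_firsts))
```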

In [8]:
#Restricting the dataset to our desired date range
dataset = dataset.loc[date_range]

In [9]:
#First we create our function to simulate a monthly investment of £100, taking 4 inputs of the proportions to be invested in each index

#Refer to the diverse portfolio notebook for an in-depth explanation of how the functions are designed
def invest_monthly(FTSE100_prop,SNP500_prop,EURO50_prop,Nikkei_prop):
    total_shares = 0
    total_invested = 0
    
    FTSE100_shares = 0
    SNP500_shares = 0
    EURO50_shares = 0
    Nikkei_shares = 0
    
    df = pd.DataFrame({'Date': [], 'FTSE100_Value': [], 'SNP500_Value': [],
                       "EURO50_Value":[],"Nikkei_Value":[], "Total_Value":[]}).set_index("Date")
    
    for date in date_range:
        #FTSE100
        FTSE100_Price = dataset.loc[date,"FTSE100_Price"]
        FTSE100_shares =  FTSE100_shares + (100*FTSE100_prop/FTSE100_Price)
        FTSE100_Value = FTSE100_shares*FTSE100_Price
        
        #SNP500
        SNP500_Price = dataset.loc[date,"SNP500_Price_GBP"]
        SNP500_shares =  SNP500_shares + (100*SNP500_prop/SNP500_Price)
        SNP500_Value = SNP500_shares*SNP500_Price
        
        #EURO50
        EURO50_Price = dataset.loc[date,"EURO50_Price_GBP"]
        EURO50_shares =  EURO50_shares + (100*EURO50_prop/EURO50_Price)
        EURO50_Value = EURO50_shares*EURO50_Price

        #Nikkei
        Nikkei_Price = dataset.loc[date,"Nikkei_Price_GBP"]
        Nikkei_shares =  Nikkei_shares + (100*Nikkei_prop/Nikkei_Price)
        Nikkei_Value = Nikkei_shares*Nikkei_Price

        #Total Value
        Total_Value = FTSE100_Value + SNP500_Value + EURO50_Value + Nikkei_Value
    
        newrow = pd.DataFrame({'Date': [date], 'FTSE100_Value': [FTSE100_Value], 'SNP500_Value': [SNP500_Value],
                       "EURO50_Value":[EURO50_Value],"Nikkei_Value":[Nikkei_Value], "Total_Value": [Total_Value]}).set_index("Date")
    
        df = pd.concat([df, newrow])
   
    
    results = df
    return results
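
The per-index arithmetic inside `invest_monthly` is a running total of shares bought, multiplied by the current price. The sketch below uses made-up prices for a single index to compare that loop with a vectorised `cumsum` version. It is an illustration of the arithmetic only, not a replacement for the function above. The names `prices`, `proportion` and `monthly_amount` are assumptions.

```python
import pandas as pd

# Made-up prices for one index over four months
prices = pd.Series(
    [100.0, 110.0, 90.0, 120.0],
    index=pd.date_range("2005-01-01", periods=4, freq="MS"),
)
proportion = 0.25      # share of each £100 going to this index
monthly_amount = 100

# Loop version: same arithmetic as one index inside invest_monthly
shares = 0.0
loop_values = []
for date in prices.index:
    shares = shares + monthly_amount * proportion / prices.loc[date]
    loop_values.append(shares * prices.loc[date])

# Vectorised version: cumulative shares bought, times the current price
vector_values = (monthly_amount * proportion / prices).cumsum() * prices

print(loop_values)
print(vector_values.tolist())
```

On a 240-row dataset the vectorised form also avoids calling `pd.concat` once per month.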

In [10]:
#Very similar to the monthly investment, to invest quarterly we have an if statement to check if we are at the start of the quarter
#If yes, we invest £300 (since it's 3 months), else we do nothing
def invest_quarterly(FTSE100_prop,SNP500_prop,EURO50_prop,Nikkei_prop):
    total_shares = 0
    total_invested = 0
    
    FTSE100_shares = 0
    SNP500_shares = 0
    EURO50_shares = 0
    Nikkei_shares = 0
    
    df = pd.DataFrame({'Date': [], 'FTSE100_Value': [], 'SNP500_Value': [],
                       "EURO50_Value":[],"Nikkei_Value":[], "Total_Value":[]}).set_index("Date")
    
    for date in date_range:
        if date in quaterly_dates:
            #FTSE100
            FTSE100_Price = dataset.loc[date,"FTSE100_Price"]
            FTSE100_shares =  FTSE100_shares + (300*FTSE100_prop/FTSE100_Price)
            FTSE100_Value = FTSE100_shares*FTSE100_Price
            
            #SNP500
            SNP500_Price = dataset.loc[date,"SNP500_Price_GBP"]
            SNP500_shares =  SNP500_shares + (300*SNP500_prop/SNP500_Price)
            SNP500_Value = SNP500_shares*SNP500_Price
            
            #EURO50
            EURO50_Price = dataset.loc[date,"EURO50_Price_GBP"]
            EURO50_shares =  EURO50_shares + (300*EURO50_prop/EURO50_Price)
            EURO50_Value = EURO50_shares*EURO50_Price
    
            #Nikkei
            Nikkei_Price = dataset.loc[date,"Nikkei_Price_GBP"]
            Nikkei_shares =  Nikkei_shares + (300*Nikkei_prop/Nikkei_Price)
            Nikkei_Value = Nikkei_shares*Nikkei_Price

        else:
            #FTSE100
            FTSE100_Price = dataset.loc[date,"FTSE100_Price"]
            FTSE100_Value = FTSE100_shares*FTSE100_Price
            
            #SNP500
            SNP500_Price = dataset.loc[date,"SNP500_Price_GBP"]
            SNP500_Value = SNP500_shares*SNP500_Price
            
            #EURO50
            EURO50_Price = dataset.loc[date,"EURO50_Price_GBP"]
            EURO50_Value = EURO50_shares*EURO50_Price
    
            #Nikkei
            Nikkei_Price = dataset.loc[date,"Nikkei_Price_GBP"]
            Nikkei_Value = Nikkei_shares*Nikkei_Price

        #Total Value
        Total_Value = FTSE100_Value + SNP500_Value + EURO50_Value + Nikkei_Value
    
        newrow = pd.DataFrame({'Date': [date], 'FTSE100_Value': [FTSE100_Value], 'SNP500_Value': [SNP500_Value],
                       "EURO50_Value":[EURO50_Value],"Nikkei_Value":[Nikkei_Value], "Total_Value": [Total_Value]}).set_index("Date")
    
        df = pd.concat([df, newrow])
   
    
    results = df
    return results

In [11]:
#Again very similar, here we check if we are at July 1st of each year, if yes we invest £1200, if not, we do nothing
def invest_yearly(FTSE100_prop,SNP500_prop,EURO50_prop,Nikkei_prop):
    total_shares = 0
    total_invested = 0
    
    FTSE100_shares = 0
    SNP500_shares = 0
    EURO50_shares = 0
    Nikkei_shares = 0
    
    df = pd.DataFrame({'Date': [], 'FTSE100_Value': [], 'SNP500_Value': [],
                       "EURO50_Value":[],"Nikkei_Value":[], "Total_Value":[]}).set_index("Date")
    
    for date in date_range:
        if date in year_middle:
            #FTSE100
            FTSE100_Price = dataset.loc[date,"FTSE100_Price"]
            FTSE100_shares =  FTSE100_shares + (1200*FTSE100_prop/FTSE100_Price)
            FTSE100_Value = FTSE100_shares*FTSE100_Price
            
            #SNP500
            SNP500_Price = dataset.loc[date,"SNP500_Price_GBP"]
            SNP500_shares =  SNP500_shares + (1200*SNP500_prop/SNP500_Price)
            SNP500_Value = SNP500_shares*SNP500_Price
            
            #EURO50
            EURO50_Price = dataset.loc[date,"EURO50_Price_GBP"]
            EURO50_shares =  EURO50_shares + (1200*EURO50_prop/EURO50_Price)
            EURO50_Value = EURO50_shares*EURO50_Price
    
            #Nikkei
            Nikkei_Price = dataset.loc[date,"Nikkei_Price_GBP"]
            Nikkei_shares =  Nikkei_shares + (1200*Nikkei_prop/Nikkei_Price)
            Nikkei_Value = Nikkei_shares*Nikkei_Price

        else:
            #FTSE100
            FTSE100_Price = dataset.loc[date,"FTSE100_Price"]
            FTSE100_Value = FTSE100_shares*FTSE100_Price
            
            #SNP500
            SNP500_Price = dataset.loc[date,"SNP500_Price_GBP"]
            SNP500_Value = SNP500_shares*SNP500_Price
            
            #EURO50
            EURO50_Price = dataset.loc[date,"EURO50_Price_GBP"]
            EURO50_Value = EURO50_shares*EURO50_Price
    
            #Nikkei
            Nikkei_Price = dataset.loc[date,"Nikkei_Price_GBP"]
            Nikkei_Value = Nikkei_shares*Nikkei_Price

        #Total Value
        Total_Value = FTSE100_Value + SNP500_Value + EURO50_Value + Nikkei_Value
    
        newrow = pd.DataFrame({'Date': [date], 'FTSE100_Value': [FTSE100_Value], 'SNP500_Value': [SNP500_Value],
                       "EURO50_Value":[EURO50_Value],"Nikkei_Value":[Nikkei_Value], "Total_Value": [Total_Value]}).set_index("Date")
    
        df = pd.concat([df, newrow])
   
    
    results = df
    return results
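
The three functions differ only in the amount invested and the dates on which it is invested. One possible way to express that is a single helper that takes both as arguments. The function `simulate` below is hypothetical and is shown on flat, made-up prices so the totals are easy to verify by hand.

```python
import pandas as pd


def simulate(prices, weights, amount, invest_dates):
    """Hypothetical helper: invest `amount` on each date in `invest_dates`,
    split across the columns of `prices` according to `weights`."""
    weights = pd.Series(weights, dtype="float64")
    # Cash contributed each month: `amount` on investment dates, 0 otherwise
    cash = pd.Series(0.0, index=prices.index)
    cash.loc[prices.index.isin(invest_dates)] = amount
    # Shares bought each month per index, then accumulated over time
    bought = (1.0 / prices).mul(cash, axis=0).mul(weights, axis=1)
    values = bought.cumsum() * prices
    values["Total_Value"] = values.sum(axis=1)
    return values


# Flat made-up prices over two years, so the final value equals the cash invested
months = pd.date_range("2005-01-01", "2006-12-01", freq="MS")
prices = pd.DataFrame({"A": 100.0, "B": 50.0}, index=months)
weights = {"A": 0.5, "B": 0.5}

monthly = simulate(prices, weights, 100, months)
quarterly = simulate(prices, weights, 300, months[months.month.isin([1, 4, 7, 10])])
yearly = simulate(prices, weights, 1200, months[months.month == 7])

print(monthly["Total_Value"].iloc[-1])
print(quarterly["Total_Value"].iloc[-1])
print(yearly["Total_Value"].iloc[-1])
```

With constant prices all three schedules finish at the same value, which makes this a convenient smoke test before running on real data.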

### Running our Simulations and Creating New Datasets

In [12]:
#First we simulate investing at an equal 25% split across each of the four indices

#We want to create a dataframe with the results of investing monthly, quarterly and yearly at equal proportions

#We combine the last rows (values at the end of our simulation)
equal_split = pd.concat([invest_monthly(0.25,0.25,0.25,0.25).tail(1), 
                   invest_quarterly(0.25,0.25,0.25,0.25).tail(1),
                   invest_yearly(0.25,0.25,0.25,0.25).tail(1)])

#Changing the index to show at what frequency the corresponding values were invested at
equal_split["Investment Frequency"]=["Monthly","Quarterly","Yearly"]
equal_split = equal_split.set_index("Investment Frequency")
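
When comparing the three frequencies it helps to confirm that each schedule contributes the same amount of cash over the 20 years. The sketch below counts the investment dates, assuming quarterly investments fall on the 1st of January, April, July and October. The `final_value` figure is made up purely to show how a percentage gain would be computed.

```python
import pandas as pd

months = pd.date_range("2005-01-01", "2024-12-31", freq="MS")

# Cash paid in under each schedule: amount per investment times number of investments
contributed = {
    "Monthly": 100 * len(months),
    "Quarterly": 300 * int(months.month.isin([1, 4, 7, 10]).sum()),
    "Yearly": 1200 * int((months.month == 7).sum()),
}
print(contributed)

# Hypothetical end value, used only to illustrate the gain calculation
final_value = 30000.0
gain_pct = (final_value / contributed["Monthly"] - 1) * 100
print(gain_pct)
```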

In [13]:
#Next we want to simulate investing into each index at an amount proportional to said country's/region's GDP

#Here are the percentages of world GDP contributed: EU 14.7%, USA 26.1%, UK 3.2%, Japan 4%

#Adding up the total and working out what percentage each country/region accounts for
total_gdp=14.7+26.1+3.2+4

UK_gdp = 3.2/total_gdp
USA_gdp = 26.1/total_gdp
EU_gdp = 14.7/total_gdp
Japan_gdp = 4/total_gdp

print(UK_gdp, USA_gdp,EU_gdp,Japan_gdp)



0.06666666666666667 0.5437500000000001 0.30624999999999997 0.08333333333333333
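
The same normalisation can be sketched with a dictionary, which keeps each share next to its label and makes it easy to check that the weights sum to 1. The names `gdp_share` and `weights` are assumptions for this illustration. The percentages are the ones quoted above.

```python
# Percentage of world GDP for each region, as quoted above
gdp_share = {"UK": 3.2, "USA": 26.1, "EU": 14.7, "Japan": 4.0}

# Rescale so the four weights sum to 1
total = sum(gdp_share.values())
weights = {region: share / total for region, share in gdp_share.items()}

print(weights)
print(sum(weights.values()))
```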


In [14]:
#Again we simulate investing monthly, quarterly and yearly
gdp_split = pd.concat([invest_monthly(UK_gdp,USA_gdp,EU_gdp,Japan_gdp).tail(1), 
                   invest_quarterly(UK_gdp,USA_gdp,EU_gdp,Japan_gdp).tail(1),
                   invest_yearly(UK_gdp,USA_gdp,EU_gdp,Japan_gdp).tail(1)])
gdp_split["Investment Frequency"]=["Monthly","Quarterly","Yearly"]
gdp_split = gdp_split.set_index("Investment Frequency")


### Exporting our Results and Creating a Normalised Dataset

In [15]:
#Exporting our results to csv files

gdp_split.to_csv("Exported_Data//World Indices GDP Split.csv")
equal_split.to_csv("Exported_Data//World Indices Equal Split.csv")

In [16]:
#For use on Power BI we want to create a normalised version of our original Indices dataset

#We are going to normalise each column by dividing by its first value
normalised_dataset = pd.DataFrame([])

for column in ["FTSE100_Price", "SNP500_Price_GBP", "EURO50_Price_GBP", "Nikkei_Price_GBP"]:
    #Use .iloc[0] so the first row is selected by position rather than by label
    normalised_dataset[column] = dataset[column] / dataset[column].iloc[0]
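
The column-by-column loop can also be written as one division, since dividing a dataframe by its first row aligns on column names. A small sketch on made-up prices follows. The frame `prices` is hypothetical.

```python
import pandas as pd

# Made-up prices for two indices over three months
prices = pd.DataFrame(
    {"A": [50.0, 55.0, 60.0], "B": [200.0, 190.0, 220.0]},
    index=pd.date_range("2005-01-01", periods=3, freq="MS"),
)

# Divide every row by the first row, so each column starts at 1.0
normalised = prices / prices.iloc[0]
print(normalised)
```

Every series then starts at 1.0, so indices quoted at very different levels can be drawn on one axis.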


In [17]:
#Exporting our indices dataset, and our new normalised data
dataset.to_csv("Exported_Data//World Indices Dataset.csv")
normalised_dataset.to_csv("Exported_Data//Normalised World Indices Dataset.csv")


### Quickly comparing with vs without rebalancing on our World Indices

In [18]:
#Refer to our diverse portfolio notebook to further understand this rebalancing function
def rebalancing(FTSE100_prop,SNP500_prop,EURO50_prop,Nikkei_prop):
    total_shares = 0
    total_invested = 0
    
    FTSE100_shares = 0
    SNP500_shares = 0
    EURO50_shares = 0
    Nikkei_shares = 0
    
    df = pd.DataFrame({'Date': [], 'FTSE100_Value': [], 'SNP500_Value': [],
                       "EURO50_Value":[],"Nikkei_Value":[], "Total_Value":[]}).set_index("Date")
    Total_Value = 0
    
    for date in date_range:
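        if date in year_starts:
            #Revalue the existing holdings at this month's prices, so the rebalance uses the current portfolio value
            Total_Value = (FTSE100_shares*dataset.loc[date,"FTSE100_Price"] + SNP500_shares*dataset.loc[date,"SNP500_Price_GBP"]
                           + EURO50_shares*dataset.loc[date,"EURO50_Price_GBP"] + Nikkei_shares*dataset.loc[date,"Nikkei_Price_GBP"])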
            #FTSE100
            FTSE100_Price = dataset.loc[date,"FTSE100_Price"]
            FTSE100_shares =  (100+Total_Value)*FTSE100_prop/FTSE100_Price
            FTSE100_Value = FTSE100_shares*FTSE100_Price
            
            #SNP500
            SNP500_Price = dataset.loc[date,"SNP500_Price_GBP"]
            SNP500_shares = (100+Total_Value)*SNP500_prop/SNP500_Price
            SNP500_Value = SNP500_shares*SNP500_Price
            
            #EURO50
            EURO50_Price = dataset.loc[date,"EURO50_Price_GBP"]
            EURO50_shares =  (100+Total_Value)*EURO50_prop/EURO50_Price
            EURO50_Value = EURO50_shares*EURO50_Price
    
            #Nikkei
            Nikkei_Price = dataset.loc[date,"Nikkei_Price_GBP"]
            Nikkei_shares =  (100+Total_Value)*Nikkei_prop/Nikkei_Price
            Nikkei_Value = Nikkei_shares*Nikkei_Price
            
        else:
            #FTSE100
            FTSE100_Price = dataset.loc[date,"FTSE100_Price"]
            FTSE100_shares =  FTSE100_shares + (100*FTSE100_prop/FTSE100_Price)
            FTSE100_Value = FTSE100_shares*FTSE100_Price
            
            #SNP500
            SNP500_Price = dataset.loc[date,"SNP500_Price_GBP"]
            SNP500_shares =  SNP500_shares + (100*SNP500_prop/SNP500_Price)
            SNP500_Value = SNP500_shares*SNP500_Price
            
            #EURO50
            EURO50_Price = dataset.loc[date,"EURO50_Price_GBP"]
            EURO50_shares =  EURO50_shares + (100*EURO50_prop/EURO50_Price)
            EURO50_Value = EURO50_shares*EURO50_Price
    
            #Nikkei
            Nikkei_Price = dataset.loc[date,"Nikkei_Price_GBP"]
            Nikkei_shares =  Nikkei_shares + (100*Nikkei_prop/Nikkei_Price)
            Nikkei_Value = Nikkei_shares*Nikkei_Price

        #Total Value
        Total_Value = FTSE100_Value + SNP500_Value + EURO50_Value + Nikkei_Value
    
        newrow = pd.DataFrame({'Date': [date], 'FTSE100_Value': [FTSE100_Value], 'SNP500_Value': [SNP500_Value],
                       "EURO50_Value":[EURO50_Value],"Nikkei_Value":[Nikkei_Value], "Total_Value": [Total_Value]}).set_index("Date")
    
        df = pd.concat([df, newrow])
   
    
    results = df
    return results
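
A single rebalancing step can be looked at in isolation. In the sketch below, made-up holdings that have drifted away from a 50/50 target are valued at the current prices, the £100 contribution is added, and the whole pot is re-split at the target weights. All names and numbers are assumptions for illustration.

```python
# Made-up holdings that have drifted away from a 50/50 target
shares = {"A": 2.0, "B": 1.0}
prices = {"A": 100.0, "B": 50.0}
target = {"A": 0.5, "B": 0.5}
contribution = 100.0

# Value the holdings at current prices, then add this month's contribution
current_value = sum(shares[name] * prices[name] for name in shares)
pot = current_value + contribution

# Re-split the whole pot at the target weights
new_shares = {name: pot * target[name] / prices[name] for name in shares}
new_values = {name: new_shares[name] * prices[name] for name in shares}

print(current_value, pot)
print(new_values)
```

After the step each holding is worth the same amount, whereas before it the split was 200 to 50.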

In [19]:
#Creating a new dataframe with our with vs without rebalancing results, we have run the functions with our earlier equal and GDP weights
with_vs_without_rebalancing = pd.concat([invest_monthly(0.25,0.25,0.25,0.25).tail(1), 
                                    rebalancing(0.25,0.25,0.25,0.25).tail(1),
                                    invest_monthly(UK_gdp, USA_gdp,EU_gdp,Japan_gdp).tail(1),
                                    rebalancing(UK_gdp, USA_gdp,EU_gdp,Japan_gdp).tail(1)])

In [20]:
#Adding a description column to explain the origin of each value
with_vs_without_rebalancing["Description"] = ["Equal Weights w/o Rebalancing", "Equal Weights with Rebalancing",
                                              "GDP Weights w/o Rebalancing", "GDP Weights with Rebalancing"]
                                        

In [21]:
#Exporting our results to a csv file
with_vs_without_rebalancing.to_csv("Exported_Data//World Indicies Rebalancing Data.csv")