## Capstone: EUR/USD Forecast - Time Series Analysis
---

#### Problem Statement

Build a model to predict the future price of the EUR/USD currency pair using historical daily closing prices together with exogenous variables such as economic and financial indicators.

#### Target Audience

This model is aimed at hedge funds, financial institutions and multinational corporations with businesses in both the USA and Europe that are looking to reduce their exposure to market volatility by hedging their current positions.

#### Sections

- [Import libraries](#Import-libraries)
- [Load Data](#Load-Data)
- [Data Cleaning](#Data-Cleaning)

### Import libraries

In [17]:
import pandas as pd
import numpy as np
from functools import reduce

### Load Data

In [18]:
# Load features from Jan 1999 to Apr 2020 (the latest row in the merged data is 4/30/2020)
# eur_usd is the target variable
eur_usd = pd.read_csv(r'../datasets/EURUSD.csv')
gbp_usd = pd.read_csv(r'../datasets/GBPUSD.csv')
usd_jpy = pd.read_csv(r'../datasets/USDJPY.csv')
chf_usd = pd.read_csv(r'../datasets/CHFUSD.csv')
usd_index = pd.read_csv(r'../datasets/DollarIndex.csv')
wti_crude = pd.read_csv(r'../datasets/WTICrude.csv')                        
sp500 = pd.read_csv(r'../datasets/S&P500.csv')
dow_jones = pd.read_csv(r'../datasets/DowJones.csv')
nasdaq = pd.read_csv(r'../datasets/Nasdaq.csv')
euro_n100 = pd.read_csv(r'../datasets/EURO_N100.csv')
cac40 = pd.read_csv(r'../datasets/CAC40.csv')
dax = pd.read_csv(r'../datasets/DAX.csv')
gold_usd = pd.read_csv(r'../datasets/gold_usd.csv')
brent_usd = pd.read_csv(r'../datasets/BrentCrude.csv')
fed_rate = pd.read_csv(r'../datasets/FedFundRate.csv')
euro_libor = pd.read_csv(r'../datasets/EuroLibor.csv')

In [19]:
# Combine the features and the target variable into the main dataset
# Outer-merge on the shared ' Date' column (the CSV headers carry a leading space)
features = [eur_usd, gbp_usd, usd_jpy, chf_usd, usd_index, wti_crude, sp500, dow_jones, nasdaq, euro_n100, cac40,
            dax, gold_usd, brent_usd, fed_rate, euro_libor]
main_df = reduce(lambda left, right: pd.merge(left, right, on=' Date', how='outer'), features)
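The `reduce`/`merge` pattern in the cell above can be illustrated on three toy frames. The values below are hypothetical, not the project data; the sketch only shows that `reduce` folds the list pairwise and that `how='outer'` keeps every date seen in any source, leaving NaN wherever a source has no row for that date.

```python
from functools import reduce

import pandas as pd

# Hypothetical frames sharing a ' Date' key (leading space, as in the notebook)
a = pd.DataFrame({" Date": ["1/4/1999", "1/5/1999", "1/6/1999"], "A Close": [1.0, 2.0, 3.0]})
b = pd.DataFrame({" Date": ["1/4/1999", "1/6/1999"], "B Close": [10.0, 30.0]})
c = pd.DataFrame({" Date": ["1/5/1999", "1/7/1999"], "C Close": [100.0, 200.0]})

# reduce computes merge(merge(a, b), c); the outer join keeps the union of dates
merged = reduce(lambda left, right: pd.merge(left, right, on=" Date", how="outer"), [a, b, c])
print(merged)
```

Because the union of dates can include days on which the target has no quote (1/7/1999 for `A Close` here), the cleaning step that follows drops rows where `EURUSD Close` is missing.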

In [20]:
# Display the datasets
main_df.head()

| | Date | EURUSD Open | EURUSD Low | EURUSD High | EURUSD Close | EURUSD Adj. Close | GBPUSD Open | GBPUSD Low | GBPUSD High | GBPUSD Close | ... | XAUUSD Adj. Close | CO1 Open | CO1 Low | CO1 High | CO1 Close | CO1 Adj. Close | FDTR Close | FDTR Adj. Close | EMUEVOLVINTRAT Close | EMUEVOLVINTRAT Adj. Close |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4/30/2020 | 1.0873 | 1.0836 | 1.0965 | 1.0952 | 1.0952 | 1.2467 | 1.2433 | 1.2634 | 1.2593 | ... | 1686.63 | 22.87 | 22.87 | 25.76 | 25.27 | 25.27 | NaN | NaN | -0.16 | -0.16 |
| 1 | 4/29/2020 | 1.082 | 1.0818 | 1.0885 | 1.0874 | 1.0874 | 1.243 | 1.2391 | 1.2485 | 1.2468 | ... | 1712.6 | 20.66 | 20.53 | 23.88 | 22.54 | 22.54 | 0.25 | 0.25 | -0.17 | -0.17 |
| 2 | 4/28/2020 | 1.0831 | 1.081 | 1.0888 | 1.0819 | 1.0819 | 1.243 | 1.2404 | 1.2517 | 1.2426 | ... | 1708.18 | 19.9 | 18.73 | 21.29 | 20.46 | 20.46 | NaN | NaN | -0.15 | -0.15 |
| 3 | 4/27/2020 | 1.0823 | 1.0811 | 1.086 | 1.0829 | 1.0829 | 1.2369 | 1.236 | 1.2455 | 1.2429 | ... | 1714.9 | 21.55 | 19.11 | 21.91 | 19.99 | 19.99 | NaN | NaN | -0.15 | -0.15 |
| 4 | 4/24/2020 | 1.0778 | 1.0728 | 1.0819 | 1.0817 | 1.0817 | 1.2344 | 1.23 | 1.2373 | 1.2363 | ... | 1726.96 | 21.93 | 20.5 | 22.7 | 21.44 | 21.44 | NaN | NaN | -0.14 | -0.14 |


### Data Cleaning

In [21]:
# Drop the extra rows (introduced by the outer merge) where the target eur/usd is NaN
# Keep only the date and closing-price columns; .copy() makes final_df an independent
# DataFrame, which avoids the SettingWithCopyWarning on later in-place edits
main_df = main_df[main_df['EURUSD Close'].notna()]
main_df.columns = main_df.columns.str.replace(' Close', '', regex=False)
final_df = main_df[[' Date', 'EURUSD', 'GBPUSD', 'USDJPY', 'CHFUSD', 'DXY', 'CL1', 'SPX', 'INDU_EC', 'CCMP_EC', 'N100',
                    'CAC_EC', 'DAX_EC', 'XAUUSD', 'CO1', 'FDTR', 'EMUEVOLVINTRAT']].copy()
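The suffix stripping and column selection in the cell above can be checked on a small hypothetical frame that mimics the merged column layout. The sketch shows that a literal `' Close'` replacement also shortens the adjusted-close column to `'EURUSD Adj.'` (which is then simply not selected), and that a `.copy()` of the selection can be edited without touching its parent.

```python
import pandas as pd

# Hypothetical frame mimicking the merged column layout
raw = pd.DataFrame({
    " Date": ["1/4/1999", "1/5/1999"],
    "EURUSD Close": [1.18, 1.17],
    "EURUSD Adj. Close": [1.18, 1.17],
    "GBPUSD Close": [1.65, 1.66],
})

renamed = raw.copy()
# regex=False treats ' Close' as a literal substring:
# 'EURUSD Close' -> 'EURUSD', 'EURUSD Adj. Close' -> 'EURUSD Adj.'
renamed.columns = renamed.columns.str.replace(" Close", "", regex=False)

# .copy() returns an independent frame, so editing it leaves the parent unchanged
subset = renamed[[" Date", "EURUSD", "GBPUSD"]].copy()
subset.loc[0, "EURUSD"] = 0.0
print(list(renamed.columns))
print(renamed.loc[0, "EURUSD"])  # parent keeps its original value
```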

In [22]:
# Rename columns to shorter, lowercase names
new_columns = {
    ' Date': 'date',
    'EURUSD': 'eur/usd',
    'GBPUSD': 'gbp/usd',
    'USDJPY': 'usd/jpy',
    'CHFUSD': 'chf/usd',
    'DXY': 'usd_index',
    'CL1': 'wti_crude',
    'SPX': 'snp_500',
    'INDU_EC': 'dow_jones',
    'CCMP_EC': 'nasdaq',
    'N100': 'euro_n100',
    'CAC_EC': 'cac_40',
    'DAX_EC': 'dax',
    'XAUUSD': 'gold_usd',
    'CO1': 'brent_crude',
    'FDTR': 'fed_rate',
    'EMUEVOLVINTRAT': 'euro_libor'
}

final_df = final_df.rename(columns=new_columns)
final_df.head()


| | date | eur/usd | gbp/usd | usd/jpy | chf/usd | usd_index | wti_crude | snp_500 | dow_jones | nasdaq | euro_n100 | cac_40 | dax | gold_usd | brent_crude | fed_rate | euro_libor |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4/30/2020 | 1.0952 | 1.2593 | 107.1835 | 1.0356 | 99.58 | 18.84 | 2912.43 | 24345.72 | 8889.55 | 899.87 | 4559.18 | 10861.64 | 1686.63 | 25.27 | NaN | -0.16 |
| 1 | 4/29/2020 | 1.0874 | 1.2468 | 106.647 | 1.027 | 99.87 | 15.06 | 2939.51 | 24633.86 | 8914.71 | 919.99 | 4675.32 | 11107.74 | 1712.6 | 22.54 | 0.25 | -0.17 |
| 2 | 4/28/2020 | 1.0819 | 1.2426 | 106.8695 | 1.0254 | 100.09 | 12.34 | 2863.39 | 24101.55 | 8607.73 | 900.85 | 4566.22 | 10795.63 | 1708.18 | 20.46 | NaN | -0.15 |
| 3 | 4/27/2020 | 1.0829 | 1.2429 | 107.26 | 1.0251 | 100.3 | 12.78 | 2878.48 | 24133.78 | 8730.16 | 886.07 | 4505.26 | 10659.99 | 1714.9 | 19.99 | NaN | -0.15 |
| 4 | 4/24/2020 | 1.0817 | 1.2363 | 107.482 | 1.0269 | 100.51 | 16.94 | 2836.74 | 23775.27 | 8634.52 | 865.8 | 4393.32 | 10336.09 | 1726.96 | 21.44 | NaN | -0.14 |


In [23]:
# Check the column data types and the shape of the dataset
final_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5559 entries, 0 to 5558
Data columns (total 17 columns):
date           5559 non-null object
eur/usd        5559 non-null float64
gbp/usd        5559 non-null float64
usd/jpy        5559 non-null float64
chf/usd        5559 non-null float64
usd_index      5559 non-null float64
wti_crude      5370 non-null float64
snp_500        5366 non-null object
dow_jones      5467 non-null object
nasdaq         5375 non-null object
euro_n100      5204 non-null object
cac_40         5439 non-null object
dax            5423 non-null object
gold_usd       5555 non-null object
brent_crude    5549 non-null float64
fed_rate       288 non-null float64
euro_libor     5550 non-null float64
dtypes: float64(9), object(8)
memory usage: 781.7+ KB


In [24]:
# Convert the date column to datetime (source dates are M/D/YYYY, e.g. 4/30/2020)
# and add month and year columns using the vectorised .dt accessor
final_df['date'] = pd.to_datetime(final_df['date'], format='%m/%d/%Y')
final_df['month'] = final_df['date'].dt.month
final_df['year'] = final_df['date'].dt.year


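The date conversion above can be sketched on a few hypothetical date strings in the same M/D/YYYY layout as the source CSVs. Passing an explicit `format` removes any month/day ambiguity (is `1/4/1999` January or April?), and the `.dt` accessor extracts date parts for the whole column at once instead of calling a Python lambda per row.

```python
import pandas as pd

# Hypothetical dates in the M/D/YYYY layout used by the source CSVs
df = pd.DataFrame({"date": ["4/30/2020", "12/31/1999", "1/4/1999"]})

# Explicit format: first field is the month, second the day
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")

# Vectorised extraction of month and year
df["month"] = df["date"].dt.month
df["year"] = df["date"].dt.year
print(df)
```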

In [25]:
# Set date as index and sort from oldest to latest 
final_df.set_index('date', inplace=True)
final_df.sort_index(inplace=True)


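Sorting before filling matters here because the source data arrives newest-first (the first rows of `main_df` are from April 2020). The hypothetical three-day series below shows that `ffill` on an unsorted, newest-first index copies the *later* value into a gap, while after `sort_index` it copies the *earlier* value, which is the "previous business day" behaviour the cleaning step relies on.

```python
import numpy as np
import pandas as pd

# Hypothetical series stored newest-first, like the raw CSVs
s = pd.Series(
    [3.0, np.nan, 1.0],
    index=pd.to_datetime(["2020-01-03", "2020-01-02", "2020-01-01"]),
)
gap = pd.Timestamp("2020-01-02")

# Unsorted: ffill carries the later value (3.0) backwards in time
unsorted_fill = s.ffill()
# Sorted oldest-first: ffill carries the previous day's value (1.0)
sorted_fill = s.sort_index().ffill()
print(unsorted_fill.loc[gap], sorted_fill.loc[gap])
```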

In [26]:
# Check for null values
null_values = final_df.isnull().sum().sort_values(ascending=False)
print(null_values[null_values > 0])

fed_rate       5271
euro_n100       355
snp_500         193
wti_crude       189
nasdaq          184
dax             136
cac_40          120
dow_jones        92
brent_crude      10
euro_libor        9
gold_usd          4
dtype: int64
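Raw null counts are easier to compare across columns as a share of rows. On the real data, `fed_rate` has 5271 nulls out of 5559 rows (about 94.8%), far more than any other column. The sketch below uses a small hypothetical frame to show the calculation with `isna().mean()`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: a sparse rate column and a mostly complete index column
df = pd.DataFrame({
    "fed_rate": [4.75, np.nan, np.nan, np.nan, 5.0],
    "dax": [5000.0, np.nan, 5010.0, 5020.0, 5030.0],
})

# Fraction of missing rows per column, largest first
missing_share = df.isna().mean().sort_values(ascending=False)
print(missing_share)
```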


In [27]:
# Euronext 100 (euro_n100) only starts on 31 Dec 1999, so its first value is back-filled over 1999 (about 260 null values)
# The remaining nulls fall on public holidays; forward-fill them from the previous business day
final_df['euro_n100'] = final_df['euro_n100'].ffill().bfill()
# CAC 40 is missing on French public holidays: forward-fill from the previous business day, back-fill 1 Jan 1999
final_df['cac_40'] = final_df['cac_40'].ffill().bfill()
# DAX is missing on German public holidays: forward-fill from the previous business day, back-fill 1 Jan 1999
final_df['dax'] = final_df['dax'].ffill().bfill()

# dow_jones, snp_500 and nasdaq are missing on US public holidays
# Forward-fill from the previous business day, back-fill 1 Jan 1999
final_df['snp_500'] = final_df['snp_500'].ffill().bfill()
final_df['dow_jones'] = final_df['dow_jones'].ffill().bfill()
final_df['nasdaq'] = final_df['nasdaq'].ffill().bfill()

# Commodities (WTI, Brent and gold) are missing on US public holidays
# Forward-fill from the previous business day, back-fill 1 Jan 1999
final_df['wti_crude'] = final_df['wti_crude'].ffill().bfill()
final_df['brent_crude'] = final_df['brent_crude'].ffill().bfill()
final_df['gold_usd'] = final_df['gold_usd'].ffill().bfill()

# Rates: Euro Libor is missing on public holidays, and the Fed Fund Rate is not reported daily
# Forward-fill from the previous available value, back-fill 1 Jan 1999
final_df['fed_rate'] = final_df['fed_rate'].ffill().bfill()
final_df['euro_libor'] = final_df['euro_libor'].ffill().bfill()


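The fill strategy above (forward-fill gaps, then back-fill only the leading rows that have nothing to carry forward) can be applied to a list of columns in one statement. The frame below is hypothetical. Note that the back-fill step copies a *later* observation into the earliest rows, a small look-ahead that may be worth excluding from any training window.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("1999-01-01", periods=5, freq="D")
# Hypothetical columns: leading NaNs (no data yet) plus holiday gaps
df = pd.DataFrame(
    {
        "cac_40": [np.nan, 4000.0, np.nan, 4050.0, 4060.0],
        "dax": [np.nan, np.nan, 5000.0, 5010.0, np.nan],
    },
    index=idx,
)

fill_cols = ["cac_40", "dax"]
# ffill: carry the previous value into gaps; bfill: fill the leading NaNs left over
df[fill_cols] = df[fill_cols].ffill().bfill()
print(df)
```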
In [28]:
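# Convert the remaining object columns to float by stripping the thousands separators
col_to_float = ['snp_500', 'dow_jones', 'nasdaq',
                'euro_n100', 'cac_40', 'dax',
                'gold_usd']

for col in col_to_float:
    # Plain column assignment replaces the column, so its dtype becomes float64
    # (assigning through .loc[:, col] may keep the old object dtype in newer pandas)
    final_df[col] = final_df[col].astype(str).str.replace(',', '', regex=False).astype(float)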


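The conversion above can be sketched on a hypothetical object column holding numbers with thousands separators, the likely reason `info()` reports these columns as `object`. Casting to `str` first guards against any non-string cells mixed into the column before the commas are removed.

```python
import pandas as pd

# Hypothetical object column as read from a CSV with thousands separators
prices = pd.Series(["2,912.43", "24,345.72", "899.87"], dtype=object)

# Remove the separators literally, then cast to float
as_float = prices.astype(str).str.replace(",", "", regex=False).astype(float)
print(as_float.dtype)
```

Another option worth testing at load time is the `thousands=","` argument of `pd.read_csv`, which parses the separators while reading instead of in a separate cleaning pass.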

In [29]:
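# Convert daily data to weekly (weeks ending Sunday) and monthly (month-end) averages
# Note: pandas >= 2.2 deprecates the 'M' alias in favour of 'ME'
final_df1 = final_df.resample('W').mean()
final_df2 = final_df.resample('M').mean()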
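The weekly bins can be checked on ten hypothetical business days starting Monday 4 January 1999. With the `'W'` alias (an alias for `'W-SUN'`), each bin ends on a Sunday and is labelled with that Sunday's date, so the first five trading days average into the row dated 1999-01-10.

```python
import numpy as np
import pandas as pd

# Hypothetical closes for ten business days, Mon 4 Jan to Fri 15 Jan 1999
idx = pd.bdate_range("1999-01-04", periods=10)
close = pd.Series(np.arange(10, dtype=float), index=idx)

# Weekly mean; bins end on Sunday and take Sunday's date as their label
weekly = close.resample("W").mean()
print(weekly)
```

The same averaging is applied to the `month` and `year` columns in `final_df1`, so a week that straddles two months gets a fractional `month` value; recomputing those columns from the resampled index may be cleaner if they are needed at weekly frequency.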

In [31]:
# Save the clean datasets to csv (forward slashes, as in the loading cell, also work on Windows)
final_df1.to_csv(r'../datasets/weekly_clean.csv')
final_df2.to_csv(r'../datasets/monthly_clean.csv')
final_df.to_csv(r'../datasets/daily_clean.csv')

In [32]:
final_df2.head()

| date | eur/usd | gbp/usd | usd/jpy | chf/usd | usd_index | wti_crude | snp_500 | dow_jones | nasdaq | euro_n100 | cac_40 | dax | gold_usd | brent_crude | fed_rate | euro_libor | month | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1999-01-31 | 1.15991 | 1.650562 | 113.334762 | 0.7219 | 94.583333 | 12.430952 | 1247.527619 | 9337.773333 | 2350.211905 | 1000.0 | 4137.265238 | 5147.515238 | 287.516667 | 11.051905 | 4.75 | 3.13619 | 1 | 1999 |
| 1999-02-28 | 1.120615 | 1.627655 | 116.6825 | 0.70072 | 97.07 | 12.014 | 1245.7595 | 9320.539 | 2355.2335 | 1000.0 | 4124.2045 | 4960.8245 | 287.7575 | 10.5345 | 4.75 | 3.0935 | 2 | 1999 |
| 1999-03-31 | 1.08837 | 1.621683 | 119.534783 | 0.683709 | 99.405217 | 14.682174 | 1281.663913 | 9753.632174 | 2391.14 | 1000.0 | 4139.752174 | 4878.616522 | 285.934783 | 12.877391 | 4.75 | 3.048261 | 3 | 1999 |
| 1999-04-30 | 1.070041 | 1.608532 | 119.793182 | 0.669168 | 100.468182 | 17.272273 | 1332.892273 | 10415.725909 | 2535.863182 | 1000.0 | 4317.932727 | 5158.316364 | 282.563636 | 15.396364 | 4.75 | 2.696818 | 4 | 1999 |
| 1999-05-31 | 1.06251 | 1.614914 | 121.948095 | 0.663143 | 101.003333 | 17.726667 | 1330.634286 | 10842.446667 | 2510.591429 | 1000.0 | 4360.141905 | 5220.360952 | 276.761905 | 15.843333 | 4.75 | 2.58 | 5 | 1999 |
