<a href="https://colab.research.google.com/github/blackbudge98-cpu/gt-markets/blob/main/Google_Keywords_as_a_predictive_indicator_of_USD_trading_performance.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Google Keywords as a predictive indicator of USD trading performance**

This project explores how Google Trends keyword data can be used in forward validation to estimate the probability of a movement in a trading pair.

USD is the control variable, and its performance is measured on the following trading pairs:

*   USD to Chinese Yuan
*   USD to BTC
*   USD to Oil
*   USD to Gold





In [None]:
# The following libraries are used to download and handle the data for the project

import yfinance as yf
import pandas as pd
from datetime import date

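# The first dataset we want to see is USD over a 10-year period
# Note: verify that the bare symbol "USD" resolves to the intended US dollar series
# on Yahoo Finance; it may map to an unrelated listed security with that ticker.

tickers = ["USD", "USDCNY=X", "BTC-USD", "CL=F", "GC=F"]

# Passing a list of tickers allows one batch query instead of a separate query per ticker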

df = yf.download(tickers, period="10y", interval="1d")["Close"]

# Rename the columns to be more user-friendly and to align with our assumptions

df.rename(columns={"CL=F": "USD to Oil", "GC=F": "USD to Gold", "BTC-USD": "USD to BTC", "USDCNY=X": "USD to Chinese Yuan"}, inplace=True)

# Print the first 10 rows to see what the data looks like

print(df.head(10))

[*********************100%***********************]  5 of 5 completed


Ticker      USD to BTC  USD to Oil  USD to Gold       USD  USD to Chinese Yuan
Date                                                                          
2015-08-29  229.779999         NaN          NaN       NaN                  NaN
2015-08-30  228.761002         NaN          NaN       NaN                  NaN
2015-08-31  230.056000   49.200001  1131.599976  1.390876               6.3785
2015-09-01  228.121002   45.410000  1138.699951  1.281003               6.3664
2015-09-02  229.283997   46.250000  1132.500000  1.341353               6.3545
2015-09-03  227.182999   46.750000  1123.699951  1.383659               6.3459
2015-09-04  230.298004   46.049999  1120.599976  1.345162               6.3459
2015-09-05  235.018997         NaN          NaN       NaN                  NaN
2015-09-06  239.839996         NaN          NaN       NaN                  NaN
2015-09-07  239.847000         NaN          NaN       NaN               6.3459


In [None]:
# Next, inspect the DataFrame before applying pre-processing steps

df.info()

# Count the rows in the dataset
print('\n')
num_rows = len(df)
print(f"Number of rows: {num_rows}")

# Export the dataset as a CSV file so the raw data can be viewed
today = date.today()
filename = f"financial_data_raw_data_from_yf{today.strftime('%Y-%m-%d')}.csv"
df.to_csv(filename)

print(f"Data exported to {filename}")

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3654 entries, 2015-08-29 to 2025-08-29
Freq: D
Data columns (total 5 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   USD to BTC           3653 non-null   float64
 1   USD to Oil           2515 non-null   float64
 2   USD to Gold          2514 non-null   float64
 3   USD                  2514 non-null   float64
 4   USD to Chinese Yuan  2603 non-null   float64
dtypes: float64(5)
memory usage: 171.3 KB


Number of rows: 3654
Data exported to financial_data_raw_data_from_yf2025-08-29.csv


In [None]:
print("Blank values in the raw database")
print('\n')

# Count the blank (NaN) values in each column
print(df.isna().sum())

print('\n')
blank_rate = (df.isna().sum() / num_rows) * 100
print("Blank Rate (%):")
print(blank_rate.round(2))

Blank values in the raw database


Ticker
USD to BTC                1
USD to Oil             1139
USD to Gold            1140
USD                    1140
USD to Chinese Yuan    1051
dtype: int64


Blank Rate (%):
Ticker
USD to BTC              0.03
USD to Oil             31.17
USD to Gold            31.20
USD                    31.20
USD to Chinese Yuan    28.76
dtype: float64
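
The blank rate above can also be obtained in one step, because the column mean of a boolean mask is the fraction of `True` values. The following is a minimal sketch on a hypothetical toy frame (`toy` and its column names are illustrative assumptions, not the downloaded data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the downloaded prices
toy = pd.DataFrame(
    {"A": [1.0, np.nan, 3.0, 4.0], "B": [np.nan, np.nan, 1.0, 2.0]}
)

# isna() gives booleans; their column mean is the fraction of missing rows
blank_rate_toy = (toy.isna().mean() * 100).round(2)
print(blank_rate_toy)
```

On the real data, `df.isna().mean() * 100` should give the same figures as dividing the NaN counts by `num_rows`.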


In [None]:
# After identifying the blank rate in the original DataFrame, pre-processing needs to be applied
df_for_pre_processing = df.copy()
df_for_pre_processing['Day of Week'] = df_for_pre_processing.index.day_name()
print(df_for_pre_processing.head(10))

# Bitcoin trades every day, so check how the blanks in the dataset are spread across days of the week
print('\n')
print("Blank values in the pre-processed database")
print('\n')
# Count NaNs per row, then total them by day of week (avoids groupby.apply operating on the grouping column)
missing_values_per_day_of_week = df_for_pre_processing.isna().sum(axis=1).groupby(df_for_pre_processing['Day of Week']).sum()
print(missing_values_per_day_of_week)

Ticker      USD to BTC  USD to Oil  USD to Gold       USD  \
Date                                                        
2015-08-29  229.779999         NaN          NaN       NaN   
2015-08-30  228.761002         NaN          NaN       NaN   
2015-08-31  230.056000   49.200001  1131.599976  1.390876   
2015-09-01  228.121002   45.410000  1138.699951  1.281003   
2015-09-02  229.283997   46.250000  1132.500000  1.341353   
2015-09-03  227.182999   46.750000  1123.699951  1.383659   
2015-09-04  230.298004   46.049999  1120.599976  1.345162   
2015-09-05  235.018997         NaN          NaN       NaN   
2015-09-06  239.839996         NaN          NaN       NaN   
2015-09-07  239.847000         NaN          NaN       NaN   

Ticker      USD to Chinese Yuan Day of Week  
Date                                         
2015-08-29                  NaN    Saturday  
2015-08-30                  NaN      Sunday  
2015-08-31               6.3785      Monday  
2015-09-01               6.3664     T



In [None]:
# Therefore, the pre-processing step drops both Saturday and Sunday
df_weekday = df_for_pre_processing[~df_for_pre_processing['Day of Week'].isin(['Saturday', 'Sunday'])]
df_weekday_reordered = df_weekday[['Day of Week', 'USD', 'USD to Chinese Yuan', 'USD to BTC', 'USD to Oil', 'USD to Gold']]
df_weekday_reordered.head(10)

Date,Day of Week,USD,USD to Chinese Yuan,USD to BTC,USD to Oil,USD to Gold
2015-08-31,Monday,1.390876,6.3785,230.056,49.200001,1131.599976
2015-09-01,Tuesday,1.281003,6.3664,228.121002,45.41,1138.699951
2015-09-02,Wednesday,1.341353,6.3545,229.283997,46.25,1132.5
2015-09-03,Thursday,1.383659,6.3459,227.182999,46.75,1123.699951
2015-09-04,Friday,1.345162,6.3459,230.298004,46.049999,1120.599976
2015-09-07,Monday,,6.3459,239.847,,
2015-09-08,Tuesday,1.448019,6.3559,243.606995,45.939999,1120.400024
2015-09-09,Wednesday,1.419348,6.3572,238.167999,44.150002,1102.199951
2015-09-10,Thursday,1.444812,6.3678,238.477005,45.919998,1109.5
2015-09-11,Friday,1.426165,6.3672,240.106995,44.630001,1103.5
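
The weekend filter can also be written without a helper column by using the index directly. This is a sketch under the assumption that the index is a `DatetimeIndex`; the frame `toy` and its `price` column are hypothetical:

```python
import pandas as pd

# Hypothetical toy frame: seven consecutive calendar days starting on a Saturday
idx = pd.date_range("2015-08-29", periods=7, freq="D")
toy = pd.DataFrame({"price": range(7)}, index=idx)

# dayofweek is 0 for Monday through 6 for Sunday, so < 5 keeps Monday to Friday
weekdays_only = toy[toy.index.dayofweek < 5]
print(weekdays_only.index.day_name().tolist())
```

The `Day of Week` column used in the notebook is still useful for readability and for the per-day NaN counts, so this is an alternative rather than a replacement.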


In [None]:
# Isolate the USD series; missing values will be replaced with the previous value in a later step
df_weekday_usd = df_weekday.drop(columns=['USD to BTC', 'USD to Oil', 'USD to Gold', 'USD to Chinese Yuan'])
# .copy() gives an independent DataFrame so later column assignments do not raise SettingWithCopyWarning
df_weekday_usd_reordered = df_weekday_usd[['Day of Week', 'USD']].copy()
df_weekday_usd_reordered.head(10)

Date,Day of Week,USD
2015-08-31,Monday,1.390876
2015-09-01,Tuesday,1.281003
2015-09-02,Wednesday,1.341353
2015-09-03,Thursday,1.383659
2015-09-04,Friday,1.345162
2015-09-07,Monday,
2015-09-08,Tuesday,1.448019
2015-09-09,Wednesday,1.419348
2015-09-10,Thursday,1.444812
2015-09-11,Friday,1.426165


In [None]:
# Next, count the NaN values in each column
df_weekday_usd_reordered.isna().sum()



Ticker,0
Day of Week,0
USD,96


In [None]:
# Show dates where 'USD' is NaN
dates_with_missing_usd = df_weekday_usd_reordered[df_weekday_usd_reordered['USD'].isna()].index
print("Dates with missing 'USD' values:")
print(dates_with_missing_usd)

Dates with missing 'USD' values:
DatetimeIndex(['2015-09-07', '2015-11-26', '2015-12-25', '2016-01-01',
               '2016-01-18', '2016-02-15', '2016-03-25', '2016-05-30',
               '2016-07-04', '2016-09-05', '2016-11-24', '2016-12-26',
               '2017-01-02', '2017-01-16', '2017-02-20', '2017-04-14',
               '2017-05-29', '2017-07-04', '2017-09-04', '2017-11-23',
               '2017-12-25', '2018-01-01', '2018-01-15', '2018-02-19',
               '2018-03-30', '2018-05-28', '2018-07-04', '2018-09-03',
               '2018-11-22', '2018-12-05', '2018-12-25', '2019-01-01',
               '2019-01-21', '2019-02-18', '2019-04-19', '2019-05-27',
               '2019-07-04', '2019-09-02', '2019-11-28', '2019-12-25',
               '2020-01-01', '2020-01-20', '2020-02-17', '2020-04-10',
               '2020-05-25', '2020-07-03', '2020-09-07', '2020-11-26',
               '2020-12-25', '2021-01-01', '2021-01-18', '2021-02-15',
               '2021-04-02', '2021-05-31', '

Some United States holidays do not fall on a fixed date but on a nearby weekday (for example, the first Monday of September). For simplicity, each missing value is filled with the previous close.
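
One way to sanity-check that the gaps are holidays is to compare them with a holiday calendar. The sketch below uses `USFederalHolidayCalendar` from pandas on four of the missing dates printed above; treating the federal calendar as a proxy for market closures is an assumption, since exchanges follow their own schedule.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

# A few of the missing dates printed above
missing = pd.to_datetime(["2015-09-07", "2015-11-26", "2016-03-25", "2018-12-05"])

# Federal holidays over the same window (pandas applies the observance rules)
holidays = USFederalHolidayCalendar().holidays(start="2015-01-01", end="2018-12-31")

# True where a missing date is a federal holiday
is_federal_holiday = pd.Series(missing.isin(holidays), index=missing)
print(is_federal_holiday)
```

Dates flagged `False` are not necessarily data errors; they may be exchange closures outside the federal list, such as Good Friday (2016-03-25).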

In [None]:
# Where there is a NaN, use the previous day's close to populate the value
df_weekday_usd_reordered['USD'] = df_weekday_usd_reordered['USD'].ffill()
df_weekday_usd_reordered.head(10)

Date,Day of Week,USD
2015-08-31,Monday,1.390876
2015-09-01,Tuesday,1.281003
2015-09-02,Wednesday,1.341353
2015-09-03,Thursday,1.383659
2015-09-04,Friday,1.345162
2015-09-07,Monday,1.345162
2015-09-08,Tuesday,1.448019
2015-09-09,Wednesday,1.419348
2015-09-10,Thursday,1.444812
2015-09-11,Friday,1.426165


In [None]:
# Now add a daily change amount and a fractional daily change to the dataset
df_weekday_usd_reordered['Daily Change'] = df_weekday_usd_reordered['USD'].diff()
# Note: this divides by the current day's close; the conventional return divides by the previous day's close
df_weekday_usd_reordered['% Daily Change'] = df_weekday_usd_reordered['Daily Change'] / df_weekday_usd_reordered['USD']

# Fill the initial NaN values with 0 (assign back rather than calling inplace on a column selection)
df_weekday_usd_reordered['Daily Change'] = df_weekday_usd_reordered['Daily Change'].fillna(0)
df_weekday_usd_reordered['% Daily Change'] = df_weekday_usd_reordered['% Daily Change'].fillna(0)

df_weekday_usd_reordered.head(10)


Date,Day of Week,USD,Daily Change,% Daily Change
2015-08-31,Monday,1.390876,0.0,0.0
2015-09-01,Tuesday,1.281003,-0.109874,-0.085772
2015-09-02,Wednesday,1.341353,0.06035,0.044992
2015-09-03,Thursday,1.383659,0.042306,0.030575
2015-09-04,Friday,1.345162,-0.038496,-0.028618
2015-09-07,Monday,1.345162,0.0,0.0
2015-09-08,Tuesday,1.448019,0.102857,0.071033
2015-09-09,Wednesday,1.419348,-0.028671,-0.0202
2015-09-10,Thursday,1.444812,0.025463,0.017624
2015-09-11,Friday,1.426165,-0.018647,-0.013075
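
The `% Daily Change` values above are the daily change divided by the same day's close. The more common convention divides by the previous day's close, which is what `pct_change` computes. The sketch below compares the two on the first three USD closes from the table; the variable names are illustrative assumptions.

```python
import pandas as pd

# First three USD closes from the table above
usd = pd.Series([1.390876, 1.281003, 1.341353])

# Conventional return: (today - yesterday) / yesterday
conventional = usd.pct_change(fill_method=None).fillna(0)

# The notebook's variant: (today - yesterday) / today
notebook_variant = (usd.diff() / usd).fillna(0)

print(pd.DataFrame({"conventional": conventional, "notebook": notebook_variant}).round(6))
```

The two columns differ in magnitude on every non-zero row, so whichever definition is chosen should be used consistently in later modelling steps. Both are fractions rather than percentages; multiplying by 100 would match the column name.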


In [None]:
# Export the weekday USD performance DataFrame as a CSV file, with today's date in the file name
today = date.today()
filename = f"financial_data_pre_processed_data_from_yf{today.strftime('%Y-%m-%d')}.csv"
df_weekday_usd_reordered.to_csv(filename)