<h1 align="center">Predicting EUR/USD with LSTM Network</h1> 
<h3 align="center">Bradley Droegkamp</h3> 

# Introduction
***
Forex price prediction, much like stock price prediction, is a nearly impossible task given the noise in price time series data.  However, profitable trading strategies can be built from models that provide only a sliver of edge.  In this project, I use a Long Short-Term Memory (LSTM - http://colah.github.io/posts/2015-08-Understanding-LSTMs/) network to predict the price 5 minutes ahead for the front-month EUR/USD futures contract (EU) listed on the Chicago Mercantile Exchange (CME - https://www.cmegroup.com/trading/fx/g10/euro-fx.html).
<br>

# Data
***
The data set consists of 1-minute front-month EU price bars from September 27, 2009 to April 18, 2018. I purchased this data from kibot (www.kibot.com), a vendor of CME intraday data.  Note that the data covers all open trading hours: a 23-hour trading day that runs from 17:00 on the previous day to 16:00 CST, starting Sunday evening (the open of Monday's session) and ending Friday.
<br>
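
The session boundaries described above can be written as a small check. This is a minimal sketch in plain Python, not code from the original notebook: the helper name `in_session` is hypothetical, and it assumes timestamps are already in CST and ignores holidays and early closes.

```python
from datetime import datetime, time

SESSION_OPEN = time(17, 0)   # evening open, CST
SESSION_CLOSE = time(16, 0)  # afternoon close, CST

def in_session(ts: datetime) -> bool:
    """Return True if a CST timestamp falls inside the 23-hour trading day.

    Hypothetical helper: assumes the Sunday 17:00 to Friday 16:00 schedule
    described in the text and ignores holidays.
    """
    wd = ts.weekday()  # Monday == 0 ... Sunday == 6
    t = ts.time()
    # Evening part of a session: opens at 17:00 on Sunday through Thursday.
    if t >= SESSION_OPEN:
        return wd in (6, 0, 1, 2, 3)
    # Daytime part of a session: runs until 16:00 on Monday through Friday.
    if t < SESSION_CLOSE:
        return wd in (0, 1, 2, 3, 4)
    # 16:00 to 16:59 is the daily one-hour break.
    return False
```

A check like this could be used to flag bars that fall outside the expected hours before any modeling is done.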
#### First let's bring in the raw data.

In [75]:
# The code was removed by Watson Studio for sharing.

+----------+-----+------+------+------+------+------+
|      Date| Time|  Open|  High|   Low| Close|Volume|
+----------+-----+------+------+------+------+------+
|09/27/2009|18:00|  1.47|1.4701| 1.469|1.4691|   441|
|09/27/2009|18:01|1.4691|1.4691|1.4689| 1.469|    29|
|09/27/2009|18:02| 1.469| 1.469|1.4688|1.4688|    22|
|09/27/2009|18:03|1.4687|1.4691|1.4687|1.4691|    38|
|09/27/2009|18:04|1.4692|1.4693|1.4692|1.4692|    20|
|09/27/2009|18:05|1.4692|1.4693| 1.469|1.4691|    11|
|09/27/2009|18:06|1.4691|1.4692|1.4689|1.4692|    14|
|09/27/2009|18:07|1.4691|1.4691| 1.469| 1.469|     6|
|09/27/2009|18:08| 1.469|1.4691| 1.469|1.4691|     5|
|09/27/2009|18:09| 1.469|1.4692| 1.469|1.4692|     7|
|09/27/2009|18:10|1.4692|1.4692|1.4684|1.4685|    81|
|09/27/2009|18:11|1.4686|1.4687|1.4683|1.4686|    63|
|09/27/2009|18:12|1.4687|1.4688|1.4686|1.4687|     7|
|09/27/2009|18:13|1.4687|1.4692|1.4687|1.4691|    25|
|09/27/2009|18:14| 1.469|1.4691|1.4684|1.4688|    37|
|09/27/2009|18:15|1.4686|1.4

#### Combine the Date and Time columns.  These times are in EST, but I prefer CST, so I also shift them back one hour.

In [87]:
from pyspark.sql.functions import unix_timestamp, from_unixtime, concat, col, lit
from pyspark.sql.types import TimestampType  # needed for the cast below

# Convert Date and Time columns to Timestamps and combine
df_raw_2 = df_raw.select(unix_timestamp(concat(col('Date'), lit(' '), col('Time')), 'MM/dd/yyyy HH:mm')\
                   .cast(TimestampType()).alias('Timestamp'),
                   'Open', 'High', 'Low', 'Close', 'Volume')

# Now subtract one hour from the EST timestamps to get CST
df = df_raw_2.select(from_unixtime(unix_timestamp(col('Timestamp')) - 60 * 60).alias('Timestamp'),
                    'Open', 'High', 'Low', 'Close', 'Volume')

df.createOrReplaceTempView('df')
spark.sql("SELECT * FROM df ORDER BY Timestamp LIMIT 10").show()

+-------------------+------+------+------+------+------+
|          Timestamp|  Open|  High|   Low| Close|Volume|
+-------------------+------+------+------+------+------+
|2009-09-27 17:00:00|  1.47|1.4701| 1.469|1.4691|   441|
|2009-09-27 17:01:00|1.4691|1.4691|1.4689| 1.469|    29|
|2009-09-27 17:02:00| 1.469| 1.469|1.4688|1.4688|    22|
|2009-09-27 17:03:00|1.4687|1.4691|1.4687|1.4691|    38|
|2009-09-27 17:04:00|1.4692|1.4693|1.4692|1.4692|    20|
|2009-09-27 17:05:00|1.4692|1.4693| 1.469|1.4691|    11|
|2009-09-27 17:06:00|1.4691|1.4692|1.4689|1.4692|    14|
|2009-09-27 17:07:00|1.4691|1.4691| 1.469| 1.469|     6|
|2009-09-27 17:08:00| 1.469|1.4691| 1.469|1.4691|     5|
|2009-09-27 17:09:00| 1.469|1.4692| 1.469|1.4692|     7|
+-------------------+------+------+------+------+------+
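
As a quick sanity check on the conversion above, the same parse-and-shift can be reproduced for a single row with the standard library. This is a minimal sketch and not part of the original notebook: the function name `est_to_cst` is hypothetical, and it assumes the same fixed one-hour offset the Spark code uses.

```python
from datetime import datetime, timedelta

def est_to_cst(date_str: str, time_str: str) -> datetime:
    """Combine the vendor's Date and Time fields and shift EST to CST.

    Hypothetical helper: applies a fixed one-hour offset, as the Spark code does.
    """
    # Parse 'MM/dd/yyyy' and 'HH:mm' joined by a space.
    eastern = datetime.strptime(f"{date_str} {time_str}", "%m/%d/%Y %H:%M")
    # Central time is one hour behind Eastern time.
    return eastern - timedelta(hours=1)

print(est_to_cst("09/27/2009", "18:00"))  # 2009-09-27 17:00:00
```

The printed value matches the first row of the Spark output, which is what we would expect if the two columns were combined and shifted correctly.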

