# Simple Forex Backtest Project

## Initial data read-in and cleaning

In this notebook we read in the raw data exported from TradingView and carry out some basic cleaning steps.

In [18]:
#Import required libraries

import pandas as pd
import datetime as dt
import numpy as np
import warnings
warnings.filterwarnings('ignore')

In [19]:
!ls

Data Preparation.ipynb      gj_base.csv
README.md                   tv_export.csv
Simple Forex Features.ipynb


In [29]:
#Read in the data into a pandas dataframe
data = pd.read_csv("tv_export.csv")

In [30]:
data.head()

Unnamed: 0,time,open,high,low,close
0,1546291800,139.764,139.86,139.716,139.828
1,1546380000,139.828,139.851,139.695,139.822
2,1546381800,139.822,139.844,139.708,139.774
3,1546383600,139.774,139.915,139.76,139.883
4,1546385400,139.883,139.915,139.85,139.892


We can see that the data includes the time, open (the opening price of the GBP/JPY thirty-minute candle), high, low and close.

The time column is currently in Unix time (seconds since the epoch), so we'll need to convert it to a datetime format that is suitable for our purposes.

In [31]:
#Convert Unix time (in seconds) into pandas datetime format
data['time'] = pd.to_datetime(data['time'], unit='s')

In [32]:
data.head(3)

Unnamed: 0,time,open,high,low,close
0,2018-12-31 21:30:00,139.764,139.86,139.716,139.828
1,2019-01-01 22:00:00,139.828,139.851,139.695,139.822
2,2019-01-01 22:30:00,139.822,139.844,139.708,139.774
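
As a quick check of the conversion, the sketch below applies the same call to the first two epoch values from the raw export. The `utc=True` variant is an optional extra not used in this notebook: it assumes the export's timestamps are UTC (which is how Unix time is defined) and makes that explicit in the dtype.

```python
import pandas as pd

# Two epoch values copied from the first rows of the raw export
raw = pd.Series([1546291800, 1546380000])

# unit='s' tells pandas the integers are seconds since 1970-01-01 00:00:00 UTC
converted = pd.to_datetime(raw, unit='s')
print(converted)

# Optional: utc=True returns timezone-aware values instead of naive ones
aware = pd.to_datetime(raw, unit='s', utc=True)
print(aware.dt.tz)
```

The naive result matches the `time` column shown above; the timezone-aware version is only useful if later steps need to compare against local session times.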


In [33]:
#Build a boolean mask for our date range: start of the first full week of trading in 2019 up to the current date (2020-09-24)
date_range = data['time'].between('2019-01-06 22:00:00', '2020-09-24 18:00')

In [34]:
#Apply this date-range mask to our dataframe
data = data[date_range]

In [35]:
data.head(2)
# The index now starts at 145, so it needs to be reset after the date filter

Unnamed: 0,time,open,high,low,close
145,2019-01-06 22:00:00,138.091,138.179,138.011,138.02
146,2019-01-06 22:30:00,138.02,138.084,137.922,138.074


In [36]:
#reset the index
data.reset_index(inplace=True, drop=True)

In [38]:
data.head(2)

Unnamed: 0,time,open,high,low,close
0,2019-01-06 22:00:00,138.091,138.179,138.011,138.02
1,2019-01-06 22:30:00,138.02,138.084,137.922,138.074
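
The filter-and-reset steps above can be sketched on a toy frame. The timestamps and `close` values here are placeholders, not rows from the export; the point is that `Series.between` includes both endpoints by default and that filtered rows keep their original index labels until `reset_index(drop=True)` is called.

```python
import pandas as pd

# Toy frame with placeholder values (not real prices)
toy = pd.DataFrame({
    'time': pd.to_datetime([
        '2019-01-06 21:30:00',
        '2019-01-06 22:00:00',
        '2019-01-06 22:30:00',
        '2019-01-06 23:00:00',
    ]),
    'close': [1.0, 2.0, 3.0, 4.0],
})

# between() returns a boolean Series; both bounds are inclusive by default
mask = toy['time'].between('2019-01-06 22:00:00', '2019-01-06 22:30:00')

# Boolean indexing keeps the original index labels of the surviving rows
filtered = toy[mask]
print(filtered.index.tolist())

# drop=True discards the old labels instead of adding them as a column
clean = filtered.reset_index(drop=True)
print(clean.index.tolist())
```

Returning a new frame from `reset_index` rather than using `inplace=True` is a stylistic choice here; both approaches give the same result.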


In [15]:
data.isna().sum()

time     0
open     0
high     0
low      0
close    0
dtype: int64

In [16]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21449 entries, 0 to 21448
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   time    21449 non-null  datetime64[ns]
 1   open    21449 non-null  float64       
 2   high    21449 non-null  float64       
 3   low     21449 non-null  float64       
 4   close   21449 non-null  float64       
dtypes: datetime64[ns](1), float64(4)
memory usage: 838.0 KB
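
Beyond missing values and dtypes, a few extra sanity checks could be run on candle data before moving on. The sketch below is one possible set, not part of the original notebook: `basic_ohlc_checks` is a hypothetical helper, and the sample rows are the first two rows of the filtered frame shown earlier.

```python
import pandas as pd

def basic_ohlc_checks(df):
    """Hypothetical helper: return simple sanity checks for an OHLC frame."""
    body = df[['open', 'close']]
    return {
        # Timestamps should be in ascending order
        'times_sorted': bool(df['time'].is_monotonic_increasing),
        # Number of repeated timestamps
        'duplicate_times': int(df['time'].duplicated().sum()),
        # High should be at least as large as both open and close
        'high_is_max': bool((df['high'] >= body.max(axis=1)).all()),
        # Low should be no larger than either open or close
        'low_is_min': bool((df['low'] <= body.min(axis=1)).all()),
    }

# First two rows of the filtered data shown earlier in the notebook
sample = pd.DataFrame({
    'time': pd.to_datetime(['2019-01-06 22:00:00', '2019-01-06 22:30:00']),
    'open': [138.091, 138.02],
    'high': [138.179, 138.084],
    'low': [138.011, 137.922],
    'close': [138.02, 138.074],
})

checks = basic_ohlc_checks(sample)
print(checks)
```

On the full frame the same helper could be called as `basic_ohlc_checks(data)`.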


This dataframe looks OK to pass through to the next step. Let's save it to a .csv file and store it in our project directory.

In [17]:
#Save our dataframe to .csv
data.to_csv('gj_base.csv', index=False)
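
One thing to keep in mind when loading the saved file in the next notebook: CSV has no datetime type, so the `time` column is written as text and is not parsed back into datetimes unless asked. The sketch below illustrates this with an in-memory buffer standing in for `gj_base.csv` (the two rows are taken from the output above); using `parse_dates` on reload is a suggestion, not something this notebook does.

```python
import io
import pandas as pd

frame = pd.DataFrame({
    'time': pd.to_datetime(['2019-01-06 22:00:00', '2019-01-06 22:30:00']),
    'close': [138.02, 138.074],
})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
frame.to_csv(buffer, index=False)  # index=False avoids writing the row labels

# Plain reload: 'time' comes back as text
buffer.seek(0)
plain = pd.read_csv(buffer)

# Reload with parse_dates: 'time' is restored to a datetime dtype
buffer.seek(0)
parsed = pd.read_csv(buffer, parse_dates=['time'])

print(plain['time'].dtype, parsed['time'].dtype)
```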