<a href="https://colab.research.google.com/github/Coyote-Schmoyote/currency-exchange-prediction/blob/main/currency_exchange.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 3. Currency exchange rate prediction model

This notebook uses Python-based machine learning and data science libraries to analyze time series data and build a machine learning model that predicts the exchange rate of JPY to USD, and of USD to EUR, for a given day.

## 1. Problem Definition

#### Problem 1 
Fill in the missing (NaN) values with the value from the most recent previous day. 
If the year or the month of a record is missing, ignore that record.

#### Problem 2 
With the above data, visualize each statistic and the time series.

#### Problem 3

Display a histogram of the daily change in the exchange rate, computed as the difference between each day and the previous day (day - previous day).
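
As a minimal sketch of the "day - previous day" calculation, the snippet below applies `Series.diff()` to the first five DEXJPUS values of this dataset. The variable names `rates` and `daily_change` are chosen here for illustration only.

```python
import pandas as pd

# First five DEXJPUS observations from this dataset
rates = pd.Series(
    [114.73, 114.26, 115.47, 116.19, 115.97],
    index=pd.to_datetime(
        ["2001-01-02", "2001-01-03", "2001-01-04", "2001-01-05", "2001-01-08"]
    ),
    name="DEXJPUS",
)

# diff() subtracts the previous row from each row (day - previous day).
# The first row has no previous day, so it becomes NaN and is dropped.
daily_change = rates.diff().dropna()
print(daily_change.round(2))
```

Calling `daily_change.plot.hist(bins=50)` would then draw the histogram, assuming matplotlib is installed; the bin count of 50 is an arbitrary choice.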

#### Problem 4
Build a linear regression model to predict future prices (e.g., next day), using November 2016 as training data.
Use the price of the day as the target variable, and build a model that predicts the price of the day based on the prices from the previous days. Use December 2016 as test data.
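
One possible way to set this up is to turn the previous days' prices into lag features with `shift()`. The sketch below uses a synthetic random-walk series rather than the FRED data, and the number of lags (`n_lags = 3`) is an assumption made for illustration, not a value given in the problem statement.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic random-walk prices on business days (illustration only, not FRED data)
rng = np.random.default_rng(0)
dates = pd.bdate_range("2016-10-24", "2016-12-30")
prices = pd.Series(110 + rng.normal(0, 0.5, len(dates)).cumsum(), index=dates)

n_lags = 3  # hypothetical choice: predict from the previous three days
frame = pd.DataFrame({"price": prices})
for lag in range(1, n_lags + 1):
    # lag_k holds the price observed k rows (business days) earlier
    frame[f"lag_{lag}"] = prices.shift(lag)
frame = frame.dropna()  # the first n_lags rows have incomplete history

feature_cols = [f"lag_{lag}" for lag in range(1, n_lags + 1)]
train = frame.loc["2016-11"]  # November 2016 as training data
test = frame.loc["2016-12"]   # December 2016 as test data

model = LinearRegression().fit(train[feature_cols], train["price"])
predictions = model.predict(test[feature_cols])
```

Splitting by calendar month rather than at random keeps the test period strictly after the training period, which matters for time series.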

## 2. Data

For this project, we are going to download the data with the `pandas_datareader.data` module from the `pandas-datareader` package (a separate package, not part of `pandas` itself). This module extracts data from various Internet sources and converts it to a pandas DataFrame. We will use data from Federal Reserve Economic Data (FRED), from January 2, 2001 to December 30, 2016.

## 3. Evaluation
For evaluation of our Linear Regression model, we are going to use 3 metrics:
* Mean Squared Error (MSE)
* Root Mean Squared Error (RMSE)
* Mean Absolute Error (MAE)
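
As a quick illustration of how the three metrics relate, here is a small sketch on made-up numbers; `y_true` and `y_pred` are hypothetical values, not results from this project.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical actual and predicted rates
y_true = np.array([117.0, 118.0, 116.0, 115.0])
y_pred = np.array([116.0, 118.0, 118.0, 115.0])

mse = mean_squared_error(y_true, y_pred)   # mean of squared errors
rmse = np.sqrt(mse)                        # square root of MSE, in the target's units
mae = mean_absolute_error(y_true, y_pred)  # mean of absolute errors

print(f"MSE: {mse:.4f}, RMSE: {rmse:.4f}, MAE: {mae:.4f}")
```

Because MSE squares each error, the single 2-point miss dominates it, while MAE weights all errors linearly.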

> We already used these metrics in the Abalone Age Prediction Project: https://colab.research.google.com/drive/1LaPYv6-9fyaSoHuqYVK3uzLYx2CFqKgt?usp=sharing 

In [2]:
# Import the tools
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Import linear regression model
from sklearn.linear_model import LinearRegression

# Import model evaluation tools
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.metrics import r2_score, mean_squared_log_error

# Download the data from FRED
import pandas_datareader.data as pdr
start_date = "2001-01-02"
end_date = "2016-12-30"

jpy_usd = pdr.DataReader("DEXJPUS", "fred", start_date, end_date)
usd_eur = pdr.DataReader("DEXUSEU", "fred", start_date, end_date)

## Problem 1: Deal with missing data
In our previous projects, we didn't have any missing data. In the real world, however, most datasets have missing values. Even a small amount of missing data can cause major problems for analysis and for the machine learning process, so one of the first things to do when starting a new data science project is to check for missing values and handle them. The most common ways to handle missing data are:
* Imputation 
* Removal
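
The two approaches can be sketched on a toy series as follows. The values are hypothetical, and forward fill (`ffill`) is used as the imputation method because Problem 1 asks for the value from the most recent previous day.

```python
import numpy as np
import pandas as pd

# Hypothetical toy series with gaps (not real FRED data)
rates = pd.Series(
    [np.nan, 100.0, np.nan, np.nan, 101.5],
    index=pd.to_datetime(
        ["2016-12-01", "2016-12-02", "2016-12-05", "2016-12-06", "2016-12-07"]
    ),
)

# Imputation: carry the most recent previous value forward
filled = rates.ffill()

# Removal: drop every row that has no value
dropped = rates.dropna()

print(filled)
print(dropped)
```

Note that a forward fill cannot fill a gap at the very start of the series, since there is no previous day to copy from; here the first value stays NaN.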


In [5]:
jpy_usd

DATE,DEXJPUS
2001-01-02,114.73
2001-01-03,114.26
2001-01-04,115.47
2001-01-05,116.19
2001-01-08,115.97
...,...
2016-12-26,
2016-12-27,117.52
2016-12-28,117.66
2016-12-29,116.32


In [6]:
jpy_usd.isna().sum()

DEXJPUS    154
dtype: int64

In [7]:
usd_eur

DATE,DEXUSEU
2001-01-02,0.9465
2001-01-03,0.9473
2001-01-04,0.9448
2001-01-05,0.9535
2001-01-08,0.9486
...,...
2016-12-26,
2016-12-27,1.0458
2016-12-28,1.0389
2016-12-29,1.0486


In [8]:
usd_eur.isna().sum()

DEXUSEU    154
dtype: int64