# Dollar daily closing forecast (USD-BRL)
This project retrieves, through an API, the closing value of the US dollar in Brazilian reais (BRL) for the last few days and uses that data to build a machine learning model that predicts the value at which the dollar will close tomorrow.

The model selected for this project is [BayesianRidge](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.BayesianRidge.html), because it works well with few data samples.

##### Details about BayesianRidge:
The BayesianRidge model places a prior distribution over the regression coefficients and estimates them through their posterior distribution given the training data. The regularization strength is not fixed in advance; it is estimated from the data during fitting. This built-in regularization helps the model handle multicollinearity, which is the correlation between predictors, and regression problems where there are few samples relative to the number of predictors.
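As a minimal sketch of this behaviour, the snippet below fits `BayesianRidge` on a small synthetic dataset and requests the predictive standard deviation through `return_std=True`. The data, the seed and the coefficients are made up for illustration and are not taken from this project.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# synthetic data (illustrative assumption): 20 samples, 3 features
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
true_coef = np.array([1.5, -2.0, 0.5])
y_demo = X_demo @ true_coef + rng.normal(scale=0.1, size=20)

demo_model = BayesianRidge()
demo_model.fit(X_demo, y_demo)

# return_std=True also returns the standard deviation of the
# predictive distribution for each query point
y_mean, y_std = demo_model.predict(X_demo[:2], return_std=True)

print("estimated coefficients:", demo_model.coef_)
print("prediction mean:", y_mean)
print("prediction std:", y_std)
```

The standard deviation gives a rough idea of how uncertain each prediction is, which a plain point estimate does not show.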

The API used for this project is provided by [AwesomeAPI](https://docs.awesomeapi.com.br/).

In [1]:
# importing the core libraries
import requests
import json
import pandas as pd
import numpy as np
import datetime
from sklearn.model_selection import train_test_split

In [2]:
# model import
from sklearn import linear_model
bayesian_ridge = linear_model.BayesianRidge()

In [3]:
# timestamp conversion function
def convert_timestamp(timestamp):
    dt = datetime.datetime.fromtimestamp(timestamp)
    return dt.strftime("%m/%d/%Y")
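`datetime.datetime.fromtimestamp` interprets the timestamp in the local time zone of the machine, so the same timestamp can map to different dates on different machines. A possible variant, sketched here under the assumption that a fixed UTC reference is acceptable (the name `convert_timestamp_utc` is hypothetical), passes an explicit time zone:

```python
import datetime

def convert_timestamp_utc(timestamp):
    # interpret the Unix timestamp in UTC instead of the local time zone
    dt = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)
    return dt.strftime("%m/%d/%Y")

# a sample USD-BRL timestamp as returned by the API
print(convert_timestamp_utc(1682110792))
```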

In [15]:
# connecting to the API. This endpoint returns the current dollar value in BRL
url = 'https://economia.awesomeapi.com.br/last/USD-BRL'
response = requests.get(url)

dolar_today = 0

if response.status_code == 200:
    data = response.json()
    # the API returns the bid as a string, so it is converted to float
    dolar_today = float(data["USDBRL"]["bid"])
    print(data)
    print(dolar_today)
else:
    print(f'>> connection error (status_code={response.status_code})')

{'USDBRL': {'code': 'USD', 'codein': 'BRL', 'name': 'Dólar Americano/Real Brasileiro', 'high': '5.0493', 'low': '5.0493', 'varBid': '0.0008', 'pctChange': '0.01', 'bid': '5.0486', 'ask': '5.0501', 'timestamp': '1682110792', 'create_date': '2023-04-21 17:59:52'}}
5.0486
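The `bid` field arrives as a string, so it has to be converted before any arithmetic. The helper below is a small sketch of that step (the name `extract_bid` is an assumption for illustration). It works on the payload printed above and raises a clear error when the expected keys are missing.

```python
def extract_bid(payload, pair="USDBRL"):
    # the API returns numeric fields as strings, e.g. 'bid': '5.0486'
    try:
        return float(payload[pair]["bid"])
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"unexpected response format for {pair}") from exc

# trimmed copy of the response shown above
sample = {"USDBRL": {"code": "USD", "codein": "BRL", "bid": "5.0486", "ask": "5.0501"}}
print(extract_bid(sample))
```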


In [5]:
# obtaining the dollar closing values of the last 360 days
url = 'https://economia.awesomeapi.com.br/json/daily/USD-BRL/360'

response = requests.get(url)
data = json.loads(response.text)

# DataFrame.append was removed in pandas 2.0, so the rows are collected
# in a list first and the DataFrame is built in a single step
rows = [
    {'USD': 1.0, 'BRL': float(item['bid']), 'timestamp': float(item['timestamp'])}
    for item in data
]
df = pd.DataFrame(rows, columns=['USD', 'BRL', 'timestamp'])

df["closing_date"] = df['timestamp'].apply(convert_timestamp)
print(df.head(5))
print("shape:", df.shape)

   USD     BRL     timestamp closing_date
0  1.0  5.0486  1.682111e+09   04/21/2023
1  1.0  5.0486  1.682111e+09   04/21/2023
2  1.0  5.0486  1.682024e+09   04/20/2023
3  1.0  5.0754  1.681949e+09   04/19/2023
4  1.0  4.9854  1.681862e+09   04/18/2023
shape: (360, 4)
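In the output above, rows 0 and 1 carry the same closing date and value. If such repeated days are not wanted in the training data, one option is to keep a single row per `closing_date`. The sketch below uses a small hand-made frame rather than the real response.

```python
import pandas as pd

# hand-made frame mimicking the first rows printed above
demo = pd.DataFrame({
    "BRL": [5.0486, 5.0486, 5.0486, 5.0754],
    "closing_date": ["04/21/2023", "04/21/2023", "04/20/2023", "04/19/2023"],
})

# keep the first (most recent) row for each closing date
deduplicated = demo.drop_duplicates(subset="closing_date", keep="first").reset_index(drop=True)
print(deduplicated)
```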


In [6]:
# splitting the data into overlapping windows of tamanho_sequencia + 1 values
# (tamanho_sequencia, the sequence size, is the number of features; the extra value is the target)
tamanho_sequencia = 15

sequencias_brl = []
for i in range(tamanho_sequencia, len(df['BRL'])):
    sequencia = df['BRL'][i-tamanho_sequencia:i+1]
    sequencias_brl.append(list(sequencia))

total_quotes = np.unique(np.array(sequencias_brl)).size

print(">> total_quotes:", total_quotes)
print(sequencias_brl[:3])

>> total_quotes: 326
[[5.0486, 5.0486, 5.0486, 5.0754, 4.9854, 4.9413, 4.9065, 4.907, 4.9275, 4.9176, 5.0049, 5.0662, 5.0549, 5.0549, 5.0549, 5.0594], [5.0486, 5.0486, 5.0754, 4.9854, 4.9413, 4.9065, 4.907, 4.9275, 4.9176, 5.0049, 5.0662, 5.0549, 5.0549, 5.0549, 5.0594, 5.034], [5.0486, 5.0754, 4.9854, 4.9413, 4.9065, 4.907, 4.9275, 4.9176, 5.0049, 5.0662, 5.0549, 5.0549, 5.0549, 5.0594, 5.034, 5.0692]]
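The same windows can also be produced without an explicit loop. The sketch below uses NumPy's `sliding_window_view` (available in NumPy 1.20 and later) on the first 18 quotes printed above. `window_size` is a name introduced here and equals `tamanho_sequencia + 1`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# first 18 quotes taken from the output above (newest first)
quotes = np.array([5.0486, 5.0486, 5.0486, 5.0754, 4.9854, 4.9413, 4.9065,
                   4.907, 4.9275, 4.9176, 5.0049, 5.0662, 5.0549, 5.0549,
                   5.0549, 5.0594, 5.034, 5.0692])

window_size = 16  # 15 features + 1 target
windows = sliding_window_view(quotes, window_size)

print(windows.shape)
print(windows[1].tolist())
```

Each row of `windows` corresponds to one entry of `sequencias_brl`.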


In [7]:
# turning the list of windows into a dataframe: the first value of each
# window (the most recent quote) is the target, and the remaining, older
# values are the features that will be used to feed the model
data = []
for seq in sequencias_brl:
    target = seq[0]
    preds = seq[1:]
    data.append([target] + preds)

colunas = ['brl_target'] + [f'brl_pred{i}' for i in range(1, tamanho_sequencia + 1)]

df_brl_sequencias = pd.DataFrame(data, columns=colunas)
df_brl_sequencias = df_brl_sequencias.iloc[:, ::-1]

display(df_brl_sequencias.head(3))
print("shape:", df_brl_sequencias.shape)

|  | brl_pred15 | brl_pred14 | brl_pred13 | brl_pred12 | brl_pred11 | brl_pred10 | brl_pred9 | brl_pred8 | brl_pred7 | brl_pred6 | brl_pred5 | brl_pred4 | brl_pred3 | brl_pred2 | brl_pred1 | brl_target |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 5.0594 | 5.0549 | 5.0549 | 5.0549 | 5.0662 | 5.0049 | 4.9176 | 4.9275 | 4.907 | 4.9065 | 4.9413 | 4.9854 | 5.0754 | 5.0486 | 5.0486 | 5.0486 |
| 1 | 5.034 | 5.0594 | 5.0549 | 5.0549 | 5.0549 | 5.0662 | 5.0049 | 4.9176 | 4.9275 | 4.907 | 4.9065 | 4.9413 | 4.9854 | 5.0754 | 5.0486 | 5.0486 |
| 2 | 5.0692 | 5.034 | 5.0594 | 5.0549 | 5.0549 | 5.0549 | 5.0662 | 5.0049 | 4.9176 | 4.9275 | 4.907 | 4.9065 | 4.9413 | 4.9854 | 5.0754 | 5.0486 |


shape: (345, 16)


In [8]:
# defining the variables X and y, to be used as feed for the model later
X = df_brl_sequencias.drop(columns=["brl_target"])
y = df_brl_sequencias["brl_target"]

In [9]:
# dividing the variables X and y so that it is possible to analyze the
# performance of the model with a training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)
print("training data size:", len(X_train))
print("test data size:", len(X_test))

training data size: 276
test data size: 69
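Because neighbouring rows are built from overlapping windows, a random split places almost identical sequences in both sets, which tends to make the test score optimistic. A chronological hold-out is a common alternative. The function below is only a sketch: `chronological_split` is a hypothetical helper, and it assumes the rows are ordered from newest to oldest, as in `df_brl_sequencias`.

```python
import numpy as np

def chronological_split(X, y, test_fraction=0.2):
    # rows are assumed to be ordered newest first, so the first rows
    # (the most recent windows) become the test set
    n_test = int(round(len(X) * test_fraction))
    return X[n_test:], X[:n_test], y[n_test:], y[:n_test]

# toy arrays with the same number of rows as the notebook's dataset
X_demo = np.arange(345 * 2).reshape(345, 2)
y_demo = np.arange(345)

X_tr, X_te, y_tr, y_te = chronological_split(X_demo, y_demo)
print(len(X_tr), len(X_te))
```

With this layout the model is trained only on windows that are older than the ones it is tested on.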


In [10]:
# Here the BayesianRidge model is trained and evaluated. For regressors,
# score() returns the coefficient of determination (R²), not an accuracy
regression_model_ = bayesian_ridge
regression_model_.fit(X_train, y_train)

r2 = regression_model_.score(X_test, y_test)
print(f">> {regression_model_}, R²: {r2:.4f}")

>> BayesianRidge(), R²: 0.9390
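For scikit-learn regressors, `score` returns the coefficient of determination (R²) rather than a classification accuracy. As a sketch of what that number measures, R² can be computed by hand with NumPy. The arrays below are made-up values, not the notebook's data, and `r2_manual` is a name introduced for this example.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true_demo = [5.00, 5.10, 5.20, 5.30]
print(r2_manual(y_true_demo, y_true_demo))   # perfect predictions
print(r2_manual(y_true_demo, [5.15] * 4))    # always predicting the mean
```

A value of 1 means perfect predictions, while a value close to 0 means the model does no better than always predicting the mean of the targets.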


In [11]:
# this function lets the developer inspect the prediction of the
# trained model for a single row of the test data
def experiment(model, evaluation_df, line):
    # select one row (1-based `line`) and keep it as a one-row DataFrame
    query_feature = evaluation_df.iloc[line-1:line]

    X_ = query_feature.drop(columns=["brl_target"])
    y_true = float(query_feature["brl_target"].iloc[0])

    prediction = model.predict(X_)

    print("features:")
    display(X_)

    print(f"\n>> model prediction: {prediction[0]:.4f}")
    print(f">> correct answer: {y_true}")
    print(f">> error: {prediction[0] - y_true:.4f}")

evaluation_df = X_test.copy()
evaluation_df["brl_target"] = y_test
evaluation_df = evaluation_df.reset_index(drop=True)

# index (0-based) of the row, among the data separated for tests,
# that will be used to evaluate the model:
test_df_row_selection = 16

experiment(regression_model_, evaluation_df, test_df_row_selection + 1)

features:


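|  | brl_pred15 | brl_pred14 | brl_pred13 | brl_pred12 | brl_pred11 | brl_pred10 | brl_pred9 | brl_pred8 | brl_pred7 | brl_pred6 | brl_pred5 | brl_pred4 | brl_pred3 | brl_pred2 | brl_pred1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 16 | 4.7535 | 4.7314 | 4.7965 | 4.776 | 4.7953 | 4.8707 | 4.895 | 4.9043 | 4.9855 | 4.9868 | 5.1146 | 5.117 | 5.0558 | 5.0535 | 5.1533 |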



>> model prediction: 5.1580
>> correct answer: 5.1525
>> error: 0.0055
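A single-row check is easier to interpret next to a naive baseline. One common choice, assumed here for illustration, is the persistence forecast, which predicts that the next close equals the most recent known one (`brl_pred1`). Using the numbers printed above:

```python
# values copied from the output above
model_prediction = 5.1580
last_known_close = 5.1533   # brl_pred1, the most recent feature
correct_answer = 5.1525

model_error = abs(model_prediction - correct_answer)
baseline_error = abs(last_known_close - correct_answer)

print(f"model error:    {model_error:.4f}")
print(f"baseline error: {baseline_error:.4f}")
```

On this particular row the persistence forecast is closer than the model. A comparison over the whole test set would be needed before drawing any conclusion.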


In [12]:
# now that the model has been evaluated, it is refit on the full dataset
regression_model = bayesian_ridge
regression_model.fit(X, y)

BayesianRidge()

In [13]:
# selecting the quotes of the most recent days and reversing them so that
# they follow the column order of X (oldest first)
to_tomorrow = sequencias_brl[0][:tamanho_sequencia]
to_tomorrow.reverse()
df_to_tomorrow = pd.DataFrame([to_tomorrow], columns=X.columns)
print(to_tomorrow)

[5.0549, 5.0549, 5.0549, 5.0662, 5.0049, 4.9176, 4.9275, 4.907, 4.9065, 4.9413, 4.9854, 5.0754, 5.0486, 5.0486, 5.0486]


In [41]:
# based on data from the last few days, the trained model can now
# estimate the closing value of the dollar tomorrow
prediction = regression_model.predict(df_to_tomorrow)
print(f"Tomorrow the dollar should close at the value of '{prediction[0]}'\n")
differ = prediction[0] - float(dolar_today)

print(f">> Dollar today: {dolar_today}")
print(f">> Dollar tomorrow (estimate): {prediction[0]:.3f}")
print(f">> Difference: {differ:.4f}\n")

if differ > 0:
    print(f"Tomorrow the price of the dollar should rise by '{differ:.4f}'")
elif differ < 0:
    print(f"Tomorrow the price of the dollar should fall by '{abs(differ):.4f}'")
else:
    print("Tomorrow the price probably won't change significantly.")

Tomorrow the dollar should close at the value of '5.057606777207697'

>> Dollar today: 5.0486
>> Dollar tomorrow (estimate): 5.058
>> Difference: 0.0090

Tomorrow the price of the dollar should rise by '0.0090'
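Since `differ` is a floating-point number, it is almost never exactly zero, so the final `else` branch above is rarely reached. One way to make the "no significant change" case meaningful is a tolerance band. The sketch below uses a hypothetical `describe_change` helper and an arbitrary tolerance of 0.005 BRL; both are assumptions, not part of the original notebook.

```python
def describe_change(differ, tolerance=0.005):
    # treat moves smaller than the tolerance as "no significant change"
    if differ > tolerance:
        return f"Tomorrow the dollar should rise by {differ:.4f}"
    if differ < -tolerance:
        return f"Tomorrow the dollar should fall by {abs(differ):.4f}"
    return "Tomorrow the price probably won't change significantly."

print(describe_change(0.0090))
print(describe_change(-0.0120))
print(describe_change(0.0010))
```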
