# Importing the Libraries
Importing all the libraries needed to run the notebook.

In [1]:
import yfinance as yf
import pandas as pd

## Importing the Dataset

Here I loaded the dataset with the yfinance library, which provides daily market data for Bitcoin (open, high, low, close, adjusted close, and volume for the BTC-USD ticker). I then took a first look at the data to understand its format and how it is organized.

In [2]:
# get historical market data
df_bitcoin = yf.download(tickers="BTC-USD", period="5y", interval="1d")

df_bitcoin.head()

[*********************100%***********************]  1 of 1 completed


Date,Open,High,Low,Close,Adj Close,Volume
2019-09-28,8251.273438,8285.617188,8125.431641,8245.915039,8245.915039,14141152736
2019-09-29,8246.037109,8261.707031,7990.49707,8104.185547,8104.185547,13034629109
2019-09-30,8104.226562,8314.231445,7830.758789,8293.868164,8293.868164,17115474183
2019-10-01,8299.720703,8497.692383,8232.679688,8343.276367,8343.276367,15305343413
2019-10-02,8344.212891,8393.041992,8227.695312,8393.041992,8393.041992,13125712443
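
A first look at a freshly downloaded frame usually goes beyond `head()`. The sketch below shows a few checks on shape, column types, and date span. To stay runnable offline it rebuilds a three-row `sample` frame from the output above instead of calling `yf.download`; the name `sample` is an assumption of this sketch, not part of the notebook.

```python
import pandas as pd

# First rows copied from the output above, so the sketch runs without a network call.
sample = pd.DataFrame(
    {
        "Open": [8251.273438, 8246.037109, 8104.226562],
        "High": [8285.617188, 8261.707031, 8314.231445],
        "Low": [8125.431641, 7990.497070, 7830.758789],
        "Close": [8245.915039, 8104.185547, 8293.868164],
        "Adj Close": [8245.915039, 8104.185547, 8293.868164],
        "Volume": [14141152736, 13034629109, 17115474183],
    },
    index=pd.DatetimeIndex(["2019-09-28", "2019-09-29", "2019-09-30"], name="Date"),
)

n_rows, n_cols = sample.shape                          # number of rows and columns
column_types = sample.dtypes                           # dtype of each column
date_span = (sample.index.min(), sample.index.max())   # first and last date in the index

print(n_rows, n_cols)
print(column_types)
print(date_span)
```

The same three expressions can be applied to `df_bitcoin` directly once the download has finished.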


In [3]:
colunas = df_bitcoin.columns
# Count the missing values in each column
print("Null count per column: \n")
for coluna in colunas:
    print(f'{coluna}:{df_bitcoin[coluna].isna().sum()}')

Null count per column:

Open:0
High:0
Low:0
Close:0
Adj Close:0
Volume:0
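
The per-column loop above can also be written as a single vectorized call. This is a minimal sketch on a hypothetical toy frame (`toy` is not part of the notebook) that contains one missing value, so the count is visibly non-zero.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing value, used only to illustrate the call.
toy = pd.DataFrame({"Open": [1.0, np.nan, 3.0], "Close": [1.5, 2.5, 3.5]})

# isna() marks missing cells as True; sum() then counts them per column.
null_counts = toy.isna().sum()
print(null_counts)
```

Applied to the notebook's data, `df_bitcoin.isna().sum()` should report the same zero counts as the loop.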


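In [9]:
# Executed after the feature-engineering cells below (execution count 9),
# so the output already lists Percentage_Variation and Amplitude.
df_bitcoin.info()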

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1828 entries, 2019-09-28 to 2024-09-28
Data columns (total 8 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Open                  1828 non-null   float64
 1   High                  1828 non-null   float64
 2   Low                   1828 non-null   float64
 3   Close                 1828 non-null   float64
 4   Adj Close             1828 non-null   float64
 5   Volume                1828 non-null   int64  
 6   Percentage_Variation  1828 non-null   float64
 7   Amplitude             1828 non-null   float64
dtypes: float64(7), int64(1)
memory usage: 128.5 KB


## Feature Engineering

To build a more effective model, I thought about which features could be useful to it. I implemented two: the daily percentage variation between the opening and closing prices, and the daily amplitude (high minus low).

In [4]:
# Round every column to two decimal places
for coluna in colunas:
    df_bitcoin[coluna] = df_bitcoin[coluna].round(2)
df_bitcoin.head()

Date,Open,High,Low,Close,Adj Close,Volume
2019-09-28,8251.27,8285.62,8125.43,8245.92,8245.92,14141152736
2019-09-29,8246.04,8261.71,7990.5,8104.19,8104.19,13034629109
2019-09-30,8104.23,8314.23,7830.76,8293.87,8293.87,17115474183
2019-10-01,8299.72,8497.69,8232.68,8343.28,8343.28,15305343413
2019-10-02,8344.21,8393.04,8227.7,8393.04,8393.04,13125712443


In [5]:
# Intraday change from open to close, as a percentage of the opening price
df_bitcoin["Percentage_Variation"] = 100*(df_bitcoin["Close"] - df_bitcoin["Open"])/df_bitcoin["Open"]
df_bitcoin.head()

Date,Open,High,Low,Close,Adj Close,Volume,Percentage_Variation
2019-09-28,8251.27,8285.62,8125.43,8245.92,8245.92,14141152736,-0.064839
2019-09-29,8246.04,8261.71,7990.5,8104.19,8104.19,13034629109,-1.72022
2019-09-30,8104.23,8314.23,7830.76,8293.87,8293.87,17115474183,2.340013
2019-10-01,8299.72,8497.69,8232.68,8343.28,8343.28,15305343413,0.524837
2019-10-02,8344.21,8393.04,8227.7,8393.04,8393.04,13125712443,0.585196
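
As a sanity check on the formula, the sketch below recomputes the percentage variation for the first three rows shown above. The `sample` frame is rebuilt by hand from the rounded output (an assumption made so the sketch runs offline), and the results should agree with the `Percentage_Variation` column to the displayed precision.

```python
import pandas as pd

# Rounded Open/Close values copied from the first three rows of the output above.
sample = pd.DataFrame(
    {"Open": [8251.27, 8246.04, 8104.23], "Close": [8245.92, 8104.19, 8293.87]},
    index=pd.DatetimeIndex(["2019-09-28", "2019-09-29", "2019-09-30"], name="Date"),
)

# Same formula as the notebook cell: open-to-close change relative to the open.
pct = 100 * (sample["Close"] - sample["Open"]) / sample["Open"]
print(pct.round(6))
```

Note that this feature measures the move within a single day. It is different from `Series.pct_change()`, which compares each row with the previous one.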


In [6]:
# Daily price range: highest minus lowest traded price
df_bitcoin["Amplitude"] = df_bitcoin["High"] - df_bitcoin["Low"]
df_bitcoin.head()

Date,Open,High,Low,Close,Adj Close,Volume,Percentage_Variation,Amplitude
2019-09-28,8251.27,8285.62,8125.43,8245.92,8245.92,14141152736,-0.064839,160.19
2019-09-29,8246.04,8261.71,7990.5,8104.19,8104.19,13034629109,-1.72022,271.21
2019-09-30,8104.23,8314.23,7830.76,8293.87,8293.87,17115474183,2.340013,483.47
2019-10-01,8299.72,8497.69,8232.68,8343.28,8343.28,15305343413,0.524837,265.01
2019-10-02,8344.21,8393.04,8227.7,8393.04,8393.04,13125712443,0.585196,165.34
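
The two feature cells above modify `df_bitcoin` in place. One possible alternative, sketched here under the assumption that a reusable helper is wanted, is to wrap both formulas in a function that returns a new frame. The name `add_features` and the `sample` frame are hypothetical and do not appear in the notebook.

```python
import pandas as pd


def add_features(prices: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `prices` with the two features from this section added."""
    return prices.assign(
        # Open-to-close change as a percentage of the opening price
        Percentage_Variation=100 * (prices["Close"] - prices["Open"]) / prices["Open"],
        # Difference between the daily high and the daily low
        Amplitude=prices["High"] - prices["Low"],
    )


# Rounded values copied from the first two rows of the output above.
sample = pd.DataFrame(
    {
        "Open": [8251.27, 8246.04],
        "High": [8285.62, 8261.71],
        "Low": [8125.43, 7990.50],
        "Close": [8245.92, 8104.19],
    },
    index=pd.DatetimeIndex(["2019-09-28", "2019-09-29"], name="Date"),
)

features = add_features(sample)
print(features)
```

Because `assign` returns a new DataFrame, `sample` itself is left unchanged, which makes the cell safe to re-run in any order.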


In [7]:
df_bitcoin.index

DatetimeIndex(['2019-09-28', '2019-09-29', '2019-09-30', '2019-10-01',
               '2019-10-02', '2019-10-03', '2019-10-04', '2019-10-05',
               '2019-10-06', '2019-10-07',
               ...
               '2024-09-19', '2024-09-20', '2024-09-21', '2024-09-22',
               '2024-09-23', '2024-09-24', '2024-09-25', '2024-09-26',
               '2024-09-27', '2024-09-28'],
              dtype='datetime64[ns]', name='Date', length=1828, freq=None)
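
The index reports `freq=None`, so pandas does not by itself guarantee that every calendar day is present. A quick way to check is to compare the index against a full daily range over the same span. The sketch below does that with a hypothetical helper, `missing_days`, and also counts the calendar days between the first and last dates shown above.

```python
import pandas as pd


def missing_days(index: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Return the calendar days between the first and last date that are absent from `index`."""
    full = pd.date_range(index.min(), index.max(), freq="D")
    return full.difference(index)


# Number of calendar days in the span reported by the index above (both ends included).
expected = pd.date_range(start="2019-09-28", end="2024-09-28", freq="D")
print(len(expected))

# Hypothetical index with one gap, used only to show what the helper returns.
gappy = pd.DatetimeIndex(["2019-09-28", "2019-09-29", "2019-10-01"])
print(missing_days(gappy))
```

The span contains 1828 calendar days, the same as the `length=1828` reported for the index, which suggests the series has no missing days. Running `missing_days(df_bitcoin.index)` would confirm it directly.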