# Bitcoin Price Prediction Project

## Abstract

Bitcoin is a cryptocurrency created in 2009. The Bitcoin system is a set of decentralized nodes running the Bitcoin software, each of which holds the full collection of transactions. These transactions are stored in a blockchain: a distributed database that is shared among the nodes of a computer network and stores information in digital form. All computers running the blockchain hold the same list of blocks and transactions and can see new blocks being filled with new Bitcoin transactions.

The original purpose of Bitcoin (BTC for short) is to let two people exchange value directly, using peer-to-peer technology, without central banks or governments and regardless of where they are. This means that the Bitcoin blockchain is decentralized: no central authority controls the network.

Transactions on the blockchain are processed by so-called miners. Mining is performed on specialized hardware that competes to solve a computationally hard puzzle. The first miner to find a solution adds the next block to the chain and receives newly issued bitcoins as a reward, and then the process begins again. Newly mined bitcoins can be held as a store of value or sold.
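The puzzle described above can be illustrated with a toy proof-of-work search. This is only a sketch under simplifying assumptions: the function name `mine`, the payload layout, and the "leading zero hex digits" difficulty rule are made up for illustration. Real Bitcoin mining hashes an 80-byte block header with double SHA-256 and compares the result against a numeric target.

```python
import hashlib


def mine(block_data: bytes, difficulty: int):
    """Find a nonce whose double SHA-256 digest starts with `difficulty` zero hex digits.

    Toy model only: the payload layout and the difficulty rule are illustrative
    assumptions, not the actual Bitcoin block header format.
    """
    prefix = "0" * difficulty
    nonce = 0
    while True:
        payload = block_data + nonce.to_bytes(8, "little")
        digest = hashlib.sha256(hashlib.sha256(payload).digest()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1  # no shortcut exists: the only strategy is to keep trying


nonce, digest = mine(b"toy block", difficulty=3)
print("nonce:", nonce)
print("digest:", digest)
```

Each extra zero digit multiplies the expected number of attempts by 16, which is why finding a solution is expensive while checking one takes a single hash computation.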

The total supply of bitcoins is predetermined, and every bitcoin in circulation had to be mined. Roughly every four years (every 210,000 blocks), the number of new bitcoins issued per block is cut in half. This event is called the Bitcoin halving. The next halving is expected in 2024, after which miners will receive half of the current reward for processing transactions.
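The halving arithmetic can be sketched in a few lines. The constants are the well-known protocol values (an initial subsidy of 50 BTC, halved every 210,000 blocks); the function names are made up for this example, and transaction fees, which miners also collect, are ignored.

```python
SATS_PER_BTC = 100_000_000               # 1 BTC = 100,000,000 satoshis
HALVING_INTERVAL = 210_000               # blocks between two halvings
INITIAL_SUBSIDY_SATS = 50 * SATS_PER_BTC


def block_subsidy_sats(height: int) -> int:
    """Newly issued satoshis for the block at `height` (fees not included)."""
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:
        return 0
    # Integer right shift halves the subsidy once per completed interval.
    return INITIAL_SUBSIDY_SATS >> halvings


def total_supply_sats() -> int:
    """Sum the subsidy over all halving eras until it rounds down to zero."""
    total = 0
    for halvings in range(64):
        total += (INITIAL_SUBSIDY_SATS >> halvings) * HALVING_INTERVAL
    return total


# Block 840,000 is the start of the era that the text expects in 2024.
for height in (0, 210_000, 420_000, 630_000, 840_000):
    print(height, block_subsidy_sats(height) / SATS_PER_BTC)

print("Total supply in BTC:", total_supply_sats() / SATS_PER_BTC)
```

Because each era issues half as much as the previous one, the sum converges to just under 21 million BTC, which is the predetermined supply mentioned above.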

Decentralization, a predetermined supply, and the current difficulty of mining new bitcoins are the reasons why Bitcoin is so popular. Some people even call it digital gold. This popularity makes the Bitcoin market far more volatile than traditional currency markets. A volatile market can be an opportunity for speculation, while Bitcoin's other advantages may support a long-term store-of-value strategy.

The popularity of Bitcoin has led investment funds to add digital assets to their portfolios. This may disrupt Bitcoin's four-year halving cycles.

Predicting the Bitcoin price from historical data should therefore also take into account market sentiment, the current phase of the Bitcoin cycle, and the movement of large capital held by investment funds and current large holders.

### Data description

Historical price data was taken from the Coinpaprika API Python client.
- Web Site - https://coinpaprika.com/waluta/btc-bitcoin/
- Github   - https://github.com/s0h3ck/coinpaprika-api-python-client

Coinpaprika is a popular data source for various cryptocurrencies. It is of Polish origin and based in the city of Poznań.
The free Coinpaprika API returns data in JSON and limits the amount of data per request: for example, at most 50 tweets, or historical data for at most 365 days, in a single request.
Usage examples can be found on the project's GitHub page.

To access the API, install the coinpaprika package:

In [1]:
# pip install coinpaprika

Then import the data client from the installed package.

In [2]:
from coinpaprika import client as Coinpaprika

The remaining packages used in this notebook:

In [3]:
import pandas as pd
import numpy as np

from datetime import datetime, date, timedelta

import matplotlib.pyplot as plt
import seaborn as sns

Create the Coinpaprika client, which is then used to fetch a sample of data.

In [4]:
client = Coinpaprika.Client()

## Gathering data

In [5]:
# Get historical OHLCV information for a specific coin (USD,BTC)

client.candles("btc-bitcoin", quotes="USD", start="2014-01-11T00:00:00Z")

[{'time_open': '2014-01-11T00:00:00Z',
  'time_close': '2014-01-11T23:59:59Z',
  'open': 867.32,
  'high': 921.48,
  'low': 861.72,
  'close': 913.95,
  'volume': 44754200,
  'market_cap': 11195636163}]

Historical data is limited to 365 days per request. It is therefore necessary to collect the data from Coinpaprika in parts and combine those parts into one dataset.

In [6]:
# client.candles("btc-bitcoin", quotes="USD", start="2020-01-01T00:00:00Z", end="2020-12-31T00:00:00Z")

The request above would miss one day (2020-12-31): 2020 is a leap year with 366 days, so the 365-day limit ends the response on the 365th day of the year (2020-12-30). For this reason, each one-year request is split into two requests.
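The effect of the limit can be reproduced with plain date arithmetic. The sketch below is a generic alternative to a fixed half-year split, not the approach this notebook uses: it cuts an arbitrary date range into windows of at most 365 days. The helper name `request_windows` is hypothetical, and it assumes the limit counts both the first and the last day of a request.

```python
from datetime import date, timedelta

MAX_DAYS_PER_REQUEST = 365  # limit described in the text; assumed to be inclusive


def request_windows(start: date, end: date, max_days: int = MAX_DAYS_PER_REQUEST):
    """Yield inclusive (window_start, window_end) pairs that cover start..end."""
    current = start
    while current <= end:
        window_end = min(current + timedelta(days=max_days - 1), end)
        yield current, window_end
        current = window_end + timedelta(days=1)


windows = list(request_windows(date(2020, 1, 1), date(2020, 12, 31)))
for window_start, window_end in windows:
    # Formatted like the `start` and `end` arguments used in this notebook.
    print(f"{window_start}T00:00:00Z -> {window_end}T00:00:00Z")
```

For the leap year 2020 the first window ends on 2020-12-30 and a second, one-day window covers 2020-12-31, which is exactly the day a single request would miss.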

In [7]:
btc_array = []  # all daily candles, in chronological order
days = 0        # running total of candles received

# Request every year in two halves to stay under the 365-day limit per request.
for year in range(2009, 2022 + 1):

    first_half_start = f"{year}-01-01T00:00:00Z"
    first_half_end = f"{year}-06-30T00:00:00Z"
    second_half_start = f"{year}-07-01T00:00:00Z"
    second_half_end = f"{year}-12-31T00:00:00Z"

    first_half = client.candles("btc-bitcoin", quotes="USD", start=first_half_start, end=first_half_end)
    second_half = client.candles("btc-bitcoin", quotes="USD", start=second_half_start, end=second_half_end)

    btc_array = btc_array + first_half + second_half
    days_year = len(first_half) + len(second_half)
    days = days + days_year
    print("Year:", year, "; Number of days in count:", days_year)

print("Days in total:", days)

Year: 2009 ; Number of days in count: 0
Year: 2010 ; Number of days in count: 168
Year: 2011 ; Number of days in count: 365
Year: 2012 ; Number of days in count: 366
Year: 2013 ; Number of days in count: 365
Year: 2014 ; Number of days in count: 365
Year: 2015 ; Number of days in count: 365
Year: 2016 ; Number of days in count: 366
Year: 2017 ; Number of days in count: 365
Year: 2018 ; Number of days in count: 365
Year: 2019 ; Number of days in count: 365
Year: 2020 ; Number of days in count: 366
Year: 2021 ; Number of days in count: 365
Year: 2022 ; Number of days in count: 69
Days in total: 4255


As shown above, Coinpaprika has no historical data for 2009 and only incomplete data for 2010.
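The reported counts can be cross-checked with date arithmetic. The sketch below uses the first and last candle dates that appear in this notebook's output (2010-07-17 and 2022-03-10); the helper name `days_inclusive` is made up for this example.

```python
from datetime import date


def days_inclusive(first: date, last: date) -> int:
    """Number of calendar days from `first` to `last`, counting both ends."""
    return (last - first).days + 1


first_candle = date(2010, 7, 17)  # first row of the gathered data
last_candle = date(2022, 3, 10)   # last row of the gathered data

days_2010 = days_inclusive(first_candle, date(2010, 12, 31))
days_2022 = days_inclusive(date(2022, 1, 1), last_candle)
days_total = days_inclusive(first_candle, last_candle)

print(days_2010, days_2022, days_total)  # 168 69 4255
```

The three values match the per-year counts for 2010 and 2022 and the overall total, which suggests, assuming there are no duplicate rows, that no day is missing between the first and the last candle.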

Check the length of the gathered data.

In [8]:
len(btc_array)

4255

Check the last record.

In [9]:
btc_array[-1]

{'time_open': '2022-03-10T00:00:00Z',
 'time_close': '2022-03-10T16:27:00Z',
 'open': 41987.5749979755,
 'high': 42065.63722434308,
 'low': 38835.87721561094,
 'close': 39210.11347540771,
 'volume': 35927645380,
 'market_cap': 744188348706}
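Note that this last record closes at 16:27:00 rather than 23:59:59, so it most likely describes a day that was still in progress when the data was requested. A minimal way to flag such records, assuming (as the other rows in this notebook suggest) that a complete daily candle always closes at 23:59:59Z, is sketched below; the helper name is hypothetical.

```python
def is_complete_daily_candle(candle: dict) -> bool:
    """Return True if the candle closes at the end of its UTC day.

    Assumption: complete daily candles have a `time_close` of 23:59:59Z.
    """
    return candle["time_close"].endswith("T23:59:59Z")


# Only the timestamp fields of the two most recent records are reproduced here.
previous_candle = {"time_open": "2022-03-09T00:00:00Z", "time_close": "2022-03-09T23:59:59Z"}
last_candle = {"time_open": "2022-03-10T00:00:00Z", "time_close": "2022-03-10T16:27:00Z"}

print(is_complete_daily_candle(previous_candle))  # True
print(is_complete_daily_candle(last_candle))      # False
```

Whether to drop or keep an incomplete candle depends on the model; its `close` value is the price at request time, not an end-of-day close.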

Convert the gathered data to a pandas DataFrame and set the index to start from 1.

In [10]:
btc_df_gathered = pd.DataFrame(btc_array, index=range(1, len(btc_array) + 1))

Display the first 100 rows of the DataFrame.

In [11]:
btc_df_gathered.head(100)

Unnamed: 0,time_open,time_close,open,high,low,close,market_cap,volume
1,2010-07-17T00:00:00Z,2010-07-17T23:59:59Z,0.04951,0.04951,0.04951,0.04951,,
2,2010-07-18T00:00:00Z,2010-07-18T23:59:59Z,0.04951,0.04951,0.04951,0.04951,,
3,2010-07-19T00:00:00Z,2010-07-19T23:59:59Z,0.08584,0.08584,0.08584,0.08584,,
4,2010-07-20T00:00:00Z,2010-07-20T23:59:59Z,0.08080,0.08080,0.08080,0.08080,,
5,2010-07-21T00:00:00Z,2010-07-21T23:59:59Z,0.07474,0.07474,0.07474,0.07474,,
...,...,...,...,...,...,...,...,...
96,2010-10-20T00:00:00Z,2010-10-20T23:59:59Z,0.09700,0.09700,0.09700,0.09700,,
97,2010-10-21T00:00:00Z,2010-10-21T23:59:59Z,0.09900,0.09900,0.09900,0.09900,,
98,2010-10-22T00:00:00Z,2010-10-22T23:59:59Z,0.10700,0.10700,0.10700,0.10700,,
99,2010-10-23T00:00:00Z,2010-10-23T23:59:59Z,0.10250,0.10250,0.10250,0.10250,,


Display the tail of the DataFrame.

In [12]:
btc_df_gathered.tail()

Unnamed: 0,time_open,time_close,open,high,low,close,market_cap,volume
4251,2022-03-06T00:00:00Z,2022-03-06T23:59:59Z,39446.033979,39688.489646,38342.367803,38427.596073,729209700000.0,24254710000.0
4252,2022-03-07T00:00:00Z,2022-03-07T23:59:59Z,38437.880981,39390.162713,37332.215703,38054.883339,722172000000.0,33136230000.0
4253,2022-03-08T00:00:00Z,2022-03-08T23:59:59Z,38070.263249,39258.762167,38065.562271,38769.972213,735778200000.0,30321650000.0
4254,2022-03-09T00:00:00Z,2022-03-09T23:59:59Z,38725.211796,42477.180071,38708.018503,42001.198148,797137000000.0,40501450000.0
4255,2022-03-10T00:00:00Z,2022-03-10T16:27:00Z,41987.574998,42065.637224,38835.877216,39210.113475,744188300000.0,35927650000.0


Save the gathered data to a .csv file as a backup.

In [13]:
btc_df_gathered.to_csv("BTC_Preprocess_Backup.csv", sep = ",", index = True)

## Data preprocessing

Only the "close" and "time_close" columns will be used for price prediction, so the current DataFrame has to be restructured.