# Machine Learning project: Bitcoin Price Prediction

One interesting relationship I noticed at first glance was the steady increase in the price of the cryptocurrency over time. While deeper analysis may reveal periods of sharp decline, it looks like the price of this particular coin has always been on a trajectory **to the moon**.

I would like my model to be able to predict the price of Bitcoin based on the date (`Date`), the opening price (`Open`) and the volume (`Volume`).

# Reading the bitcoin.csv file into a DataFrame

Here the bitcoin.csv data file is read into a data frame using pandas. The `keys()` method is used to display the names of the fields and `head(5)` is used to display the first 5 rows of the data frame.

In [3]:
# Import the data into a pandas DataFrame.

import pandas as pd
dataframe_btc = pd.read_csv('bitcoin.csv')

# Print the column names.
print(dataframe_btc.keys())

# Show the first five rows of the DataFrame using head().
dataframe_btc.head(5)

Index(['SNo', 'Name', 'Symbol', 'Date', 'High', 'Low', 'Open', 'Close',
       'Volume', 'Marketcap'],
      dtype='object')


Unnamed: 0,SNo,Name,Symbol,Date,High,Low,Open,Close,Volume,Marketcap
0,1,Bitcoin,BTC,4/29/2013 23:59,147.488007,134.0,134.444,144.539993,0.0,1603769000.0
1,2,Bitcoin,BTC,4/30/2013 23:59,146.929993,134.050003,144.0,139.0,0.0,1542813000.0
2,3,Bitcoin,BTC,5/1/2013 23:59,139.889999,107.720001,139.0,116.989998,0.0,1298955000.0
3,4,Bitcoin,BTC,5/2/2013 23:59,125.599998,92.281898,116.379997,105.209999,0.0,1168517000.0
4,5,Bitcoin,BTC,5/3/2013 23:59,108.127998,79.099998,106.25,97.75,0.0,1085995000.0


## Part 4 - Quick Overview
Let's get a quick overview of the data using the `info()` function.

In [5]:
# Use the info() function to get a quick overview of the data.

dataframe_btc.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2991 entries, 0 to 2990
Data columns (total 10 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   SNo        2991 non-null   int64  
 1   Name       2991 non-null   object 
 2   Symbol     2991 non-null   object 
 3   Date       2991 non-null   object 
 4   High       2991 non-null   float64
 5   Low        2991 non-null   float64
 6   Open       2991 non-null   float64
 7   Close      2991 non-null   float64
 8   Volume     2991 non-null   float64
 9   Marketcap  2991 non-null   float64
dtypes: float64(6), int64(1), object(3)
memory usage: 233.8+ KB


## _Notes_
`dataframe.info()` is a useful way to obtain a quick summary of the data. The output above shows the index range (2991 entries, 0 to 2990), the number of columns (10), the name and non-null count of each column, the data type of each column and the memory usage. No column has missing values, and `Date` is stored as `object` (a string) rather than as a datetime.
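The `info()` output lists `Date` as `object`, so the column would need converting before it can serve as a numeric input to a model. The snippet below is a minimal sketch of one way to do that, using two rows copied from the `head()` output. The format string and the `DaysSinceStart` column name are assumptions made for illustration, not part of the original notebook.

```python
import pandas as pd

# Two sample rows copied from the head() output above.
sample = pd.DataFrame({
    "Date": ["4/29/2013 23:59", "4/30/2013 23:59"],
    "Open": [134.444, 144.0],
})

# Assumed format: month/day/year hour:minute, as seen in the sample rows.
sample["Date"] = pd.to_datetime(sample["Date"], format="%m/%d/%Y %H:%M")

# One possible numeric encoding (an assumption): whole days since the first row.
sample["DaysSinceStart"] = (sample["Date"] - sample["Date"].min()).dt.days

print(sample.dtypes)
print(sample)
```

The same two statements could be applied to `dataframe_btc` directly, provided every row follows the format seen in the sample.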

In [9]:
# Count how often each date occurs; every date appears exactly once.
dataframe_btc['Date'].value_counts()

4/29/2013 23:59     1
10/17/2018 23:59    1
10/8/2018 23:59     1
10/9/2018 23:59     1
10/10/2018 23:59    1
                   ..
1/23/2016 23:59     1
1/24/2016 23:59     1
1/25/2016 23:59     1
1/26/2016 23:59     1
7/6/2021 23:59      1
Name: Date, Length: 2991, dtype: int64
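The `value_counts()` output has 2991 entries, each with a count of 1, which suggests that no date is repeated. A more direct check is sketched below on a few sample dates taken from the data. The variable names are illustrative only.

```python
import pandas as pd

# A few dates copied from the data, standing in for dataframe_btc['Date'].
dates = pd.Series(
    ["4/29/2013 23:59", "4/30/2013 23:59", "5/1/2013 23:59"],
    name="Date",
)

# True when no value occurs more than once.
all_unique = dates.is_unique

# Number of rows that repeat an earlier value.
n_duplicates = int(dates.duplicated().sum())

print(all_unique, n_duplicates)
```

On the full column, `is_unique` returning `True` would confirm in one line what the count table implies.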

## Looking at the numerical fields
We can look at the numerical fields using the `describe()` function.

In [10]:
dataframe_btc.describe()

Unnamed: 0,SNo,High,Low,Open,Close,Volume,Marketcap
count,2991.0,2991.0,2991.0,2991.0,2991.0,2991.0,2991.0
mean,1496.0,6893.326038,6486.009539,6700.14624,6711.290443,10906330000.0,120876100000.0
std,863.571653,11642.832456,10869.03213,11288.043736,11298.141921,18888950000.0,210943800000.0
min,1.0,74.561096,65.526001,68.504997,68.431,0.0,778411200.0
25%,748.5,436.179001,422.879486,430.445496,430.569489,30367250.0,6305579000.0
50%,1496.0,2387.610107,2178.5,2269.889893,2286.409912,946036000.0,37415030000.0
75%,2243.5,8733.926948,8289.80046,8569.656493,8576.238715,15920150000.0,149996000000.0
max,2991.0,64863.09891,62208.96437,63523.75487,63503.45793,350968000000.0,1186360000000.0
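The summary reports a minimum `Volume` of 0.0, and the first five rows shown earlier all have zero volume. Since `Volume` is one of the intended model inputs, it may be worth counting such rows. The sketch below does this on three rows copied from the outputs in this notebook. Whether a zero means "no trades" or "not recorded" is not stated in the data, so treat any interpretation as an assumption.

```python
import pandas as pd

# Three rows copied from the notebook outputs (two early rows, one from 2018).
sample = pd.DataFrame({
    "Date": ["4/29/2013 23:59", "4/30/2013 23:59", "10/13/2018 23:59"],
    "Volume": [0.0, 0.0, 3.064030e+09],
})

# Boolean mask marking rows where the recorded volume is exactly zero.
zero_volume = sample["Volume"] == 0

# Count the zero-volume rows and list their dates.
n_zero = int(zero_volume.sum())
print(n_zero)
print(sample.loc[zero_volume, "Date"].tolist())
```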


## Splitting the data frame into test and training set.

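The `train_test_split()` function is used to split the data set into training (75%) and test (25%) sets. With `shuffle=True`, rows are assigned to the two sets at random rather than in chronological order.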
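As a quick sanity check, the expected set sizes can be worked out by hand from the 2991 rows reported by `info()`. The sketch below assumes the test size is rounded up and the remainder goes to the training set, which is consistent with the sizes printed by the split cell.

```python
import math

n_rows = 2991          # row count reported by info()
test_fraction = 0.25   # test_size passed to train_test_split

# Assumption: the test size is rounded up and the rest is used for training.
n_test = math.ceil(n_rows * test_fraction)
n_train = n_rows - n_test

print(n_train, n_test)  # 2243 748
```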

In [13]:
from sklearn.model_selection import train_test_split

train_set, test_set = train_test_split(dataframe_btc, test_size=0.25, random_state=123, shuffle=True)
print(len(train_set), len(test_set))
print(train_set.head())
print(test_set.head())

# Make a copy of the training set
working_set = train_set.copy()

2243 748
       SNo     Name Symbol              Date         High          Low  \
1408  1409  Bitcoin    BTC    3/7/2017 23:59  1275.550049  1204.800049   
838    839  Bitcoin    BTC   8/15/2015 23:59   266.666992   261.295990   
1993  1994  Bitcoin    BTC  10/13/2018 23:59  6308.510000  6259.810000   
2242  2243  Bitcoin    BTC   6/19/2019 23:59  9299.621044  9070.395994   
2526  2527  Bitcoin    BTC   3/29/2020 23:59  6250.467309  5920.086018   

             Open        Close        Volume     Marketcap  
1408  1273.209961  1223.540039  2.912560e+08  1.982477e+10  
838    265.528992   261.550995  1.932110e+07  3.792758e+09  
1993  6278.080000  6285.990000  3.064030e+09  1.088780e+11  
2242  9078.727603  9273.521766  1.554681e+10  1.647810e+11  
2526  6245.624627  5922.043123  2.837369e+10  1.083390e+11  
       SNo     Name Symbol             Date         High          Low  \
641    642  Bitcoin    BTC  1/30/2015 23:59   242.850998   225.839004   
1391  1392  Bitcoin    BTC  2/18/2