<h1>Beginner: Getting Comfortable with Data, Using Bitcoin Market Data - Part 1 (Python)</h1>

Looking back at 2017, a few things stand out, and Bitcoin is definitely one of them. It was a roller coaster of a year for Bitcoin: it hit record highs and surpassed everyone's expectations.

Let's try looking at Bitcoin market data and take this opportunity to really get to know what it means to get "comfortable" with data.

If you're following my blog, then you probably just finished setting up your environment. If not, check out my post on <a href="https://datadatagoose.wordpress.com/2018/01/03/beginner-step-1-setting-up-your-data-environment-python/"><i>how to set up a Python data environment</i></a>.

<h2>Quick side step before moving ahead</h2>

For this post, we're going to be using <b>Quandl - a platform for financial, economic, and alternative data that serves investment professionals</b>. We'll get our Bitcoin market data from this resource.

Before we can get the data, we need to do a <b>quick installation of Quandl's Python package</b>. So how do we do that? Easy! Follow the 3-step process below, and it shouldn't take you more than a couple of minutes:
<b>
1. Open up the <i>Anaconda Prompt</i>, which was installed along with Anaconda (Psst! If you haven't already installed Anaconda, you can always find out how to do that <a href="https://datadatagoose.wordpress.com/2018/01/03/beginner-step-1-setting-up-your-data-environment-python/"><i>here</i></a>!)
2. Type <i>pip install quandl</i>
3. Let Anaconda do the rest =) !
</b>

That was easy, right?

Alright, so now we have our environment and we can get access to a dataset. Let's begin with the fun stuff!

<i>By the way, you can either copy and paste the code from here into your console in Spyder (my <a href="https://datadatagoose.wordpress.com/2018/01/03/beginner-step-1-setting-up-your-data-environment-python/">post on how to set up your environment</a> shows where the console is located) or you can head on over to my <a href="https://github.com/datadatagoose/Getting-Comfortable-with-Data-Using-Bitcoin-Market-Data">GitHub</a> and get the Jupyter Notebook for it.</i>

<h2>Finally, we can have fun!</h2>

First things first, let's import the Quandl package we just installed. That way, we can use it in this tutorial.

In [1]:
import quandl

The next line of code is going to import the dataset from Quandl. I already took the liberty of finding a dataset which we can use for this post. I had never looked at this dataset before writing this post, so this is just as new to me as it is to you. If you wish to take a look at the Quandl page where I found it, you can find it <a href="https://www.quandl.com/data/BCHARTS/ROCKUSD-Bitcoin-Markets-rockUSD">here</a>.

<b>To import the dataset, all you have to do is the following:</b>

In [2]:
df = quandl.get('BCHARTS/ROCKUSD')

Now that we have our dataset imported, <b>let's look at the columns contained in this dataset</b>.

In [3]:
df.columns

Index(['Open', 'High', 'Low', 'Close', 'Volume (BTC)', 'Volume (Currency)',
       'Weighted Price'],
      dtype='object')

Interesting. What we're seeing above is that this dataset seems to have 7 columns. Cool!
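
If you'd rather not count the columns by eye, pandas can count them for you. Here's a minimal sketch of that idea. To keep it runnable without downloading anything, it uses a small stand-in DataFrame that I'm calling <i>sample</i> (a made-up name), filled by hand with the first three rows of this dataset. With the real data, you would run the same two lines on <i>df</i>.

```python
import pandas as pd

# Stand-in for the Quandl dataset (an assumption for illustration):
# the first three rows, typed in by hand so no download is needed.
sample = pd.DataFrame(
    {
        "Open": [2.85, 2.96, 3.00],
        "High": [3.01, 2.96, 3.00],
        "Low": [2.85, 2.96, 2.74],
        "Close": [3.01, 2.96, 2.74],
        "Volume (BTC)": [7.5, 1.0, 6.55],
        "Volume (Currency)": [22.255, 2.96, 18.868],
        "Weighted Price": [2.967333, 2.96, 2.880611],
    },
    index=pd.DatetimeIndex(["2011-11-12", "2011-11-13", "2011-11-14"], name="Date"),
)

# Count the columns instead of reading them off the screen.
n_columns = len(sample.columns)

# shape gives (number of rows, number of columns) in one go.
n_rows, n_cols = sample.shape

print(n_columns)
print(n_rows, n_cols)
```

The row count here is 3 only because the stand-in is tiny. The real dataset has one row per day.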

<b>Let's look at the first 5 rows of the dataset.</b>

In [4]:
df.head()

Date,Open,High,Low,Close,Volume (BTC),Volume (Currency),Weighted Price
2011-11-12,2.85,3.01,2.85,3.01,7.5,22.255,2.967333
2011-11-13,2.96,2.96,2.96,2.96,1.0,2.96,2.96
2011-11-14,3.0,3.0,2.74,2.74,6.55,18.868,2.880611
2011-11-15,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2011-11-16,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Alright, so we're seeing that this dataset is sort of like a stock dataset. It shows Bitcoin's dollar value on each day. (Imagine having invested back in 2011! I could have turned 1 dollar into thousands of dollars...)

Everything seems well and good, but hm... Didn't we just see above that there were 7 columns? Where did this 8th column by the name of <i>Date</i> come from? Good question.

<b>The column <i>Date</i> is actually a special kind of column known as the index... but what's an index? An index consists of key values (one label per row) which help speed up the process of data retrieval.</b> An index isn't strictly necessary, but it is very useful, so try to get into the habit of setting one. Since this is stock-like data, it makes sense to use the date as the index: each date is unique (you're not going to see a given date show up more than once in this dataset).
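
To make the index idea a bit more concrete, here's a small sketch. It assumes a hand-typed stand-in DataFrame called <i>sample</i> (my own made-up name) holding the first three rows of this dataset, so it runs without a download. It shows that <i>Date</i> is not one of the regular columns, that you can use a date to look up a row, and that turning the index back into a regular column is what gives you one extra column.

```python
import pandas as pd

# Stand-in for the Quandl dataset (an assumption for illustration):
# the first three rows, typed in by hand.
sample = pd.DataFrame(
    {
        "Open": [2.85, 2.96, 3.00],
        "High": [3.01, 2.96, 3.00],
        "Low": [2.85, 2.96, 2.74],
        "Close": [3.01, 2.96, 2.74],
        "Volume (BTC)": [7.5, 1.0, 6.55],
        "Volume (Currency)": [22.255, 2.96, 18.868],
        "Weighted Price": [2.967333, 2.96, 2.880611],
    },
    index=pd.DatetimeIndex(["2011-11-12", "2011-11-13", "2011-11-14"], name="Date"),
)

# The index has the name "Date", but "Date" is not a regular column.
index_name = sample.index.name
date_is_column = "Date" in sample.columns

# Every date appears only once, which makes it a good key.
dates_are_unique = sample.index.is_unique

# Use a date as the key to fetch a single row.
row = sample.loc[pd.Timestamp("2011-11-13")]
close_on_13th = row["Close"]

# reset_index() turns the index into a regular column,
# so the result has one more column than before.
flat = sample.reset_index()

print(index_name, date_is_column, dates_are_unique)
print(close_on_13th)
print(len(sample.columns), len(flat.columns))
```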

Good. Now that we cleared up that confusing issue, let's move forward.

<b>Let's look at the data types for each column. This will help later on if we wish to do some arithmetic or other fancy stuff.</b>

In [5]:
df.dtypes

Open                 float64
High                 float64
Low                  float64
Close                float64
Volume (BTC)         float64
Volume (Currency)    float64
Weighted Price       float64
dtype: object

Good. As expected, the data types for the columns are float64. I'm not going to go over in this post why one should have the correct data types. That will probably be covered in another post, but until then, just know that it's a good thing =).
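
Just as a quick taste of the arithmetic that float64 columns allow, here's a minimal sketch. It assumes a hand-typed stand-in DataFrame called <i>sample</i> (a made-up name) with the first three rows of two columns from this dataset. The text version of the column is only there for contrast and is not something the real dataset contains.

```python
import pandas as pd

# Stand-in for two columns of the Quandl dataset (an assumption for
# illustration): the first three rows, typed in by hand.
sample = pd.DataFrame(
    {
        "High": [3.01, 2.96, 3.00],
        "Low": [2.85, 2.96, 2.74],
    },
    index=pd.DatetimeIndex(["2011-11-12", "2011-11-13", "2011-11-14"], name="Date"),
)

# With float64 columns, arithmetic works row by row.
daily_range = sample["High"] - sample["Low"]

# For contrast: if the same numbers were stored as text,
# "+" would glue the strings together instead of adding them.
as_text = sample["High"].astype(str)
doubled_text = as_text + as_text

# pd.to_numeric converts text like "3.01" back into real numbers.
restored = pd.to_numeric(as_text)

print(daily_range.round(2).tolist())
print(doubled_text.iloc[0])
print(restored.dtype)
```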

<b>You can probably guess what's next. Since we're looking at data types, there's one "column" that is missing above. Yup, the index (i.e. the date in this dataset), so let's take a look at the data type for the index.</b>

In [6]:
df.index

DatetimeIndex(['2011-11-12', '2011-11-13', '2011-11-14', '2011-11-15',
               '2011-11-16', '2011-11-17', '2011-11-18', '2011-11-19',
               '2011-11-20', '2011-11-21',
               ...
               '2017-12-25', '2017-12-26', '2017-12-27', '2017-12-28',
               '2017-12-29', '2017-12-30', '2017-12-31', '2018-01-01',
               '2018-01-02', '2018-01-03'],
              dtype='datetime64[ns]', name='Date', length=2245, freq=None)

What we're seeing is that the first date stamp is November 12, 2011, and that the last date stamp is today's date (January 3, 2018). Keep in mind that this dataset gets new data daily, so your last date may differ from mine.

<b>Also, the data type is datetime64. Having real dates in the index, rather than plain text, will make it easier to work with the data by time.</b>
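
Here's a small sketch of what a datetime64 index lets you do. Again, it assumes a hand-typed stand-in DataFrame called <i>sample</i> (a made-up name) with the first three rows of the <i>Close</i> column, so the dates and counts below belong to that tiny stand-in and not to the full dataset.

```python
import pandas as pd

# Stand-in for one column of the Quandl dataset (an assumption for
# illustration): the first three rows, typed in by hand.
sample = pd.DataFrame(
    {"Close": [3.01, 2.96, 2.74]},
    index=pd.DatetimeIndex(["2011-11-12", "2011-11-13", "2011-11-14"], name="Date"),
)

# First and last date stamps, without scrolling through the output.
first_day = sample.index.min()
last_day = sample.index.max()

# Subtracting two dates gives a time span we can measure in days.
days_spanned = (last_day - first_day).days

# Select a date range just by writing the dates as text.
window = sample.loc["2011-11-13":"2011-11-14"]

# Ask each date which day of the week it falls on.
weekdays = sample.index.day_name()

print(first_day.date(), last_day.date(), days_spanned)
print(len(window))
print(list(weekdays))
```

None of this would work if the dates were stored as plain strings, which is why it's worth checking the index type early.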

In our next post, we'll dive a little deeper and add some visualizations. That way, we'll really get a sense of how our data behaves.