Your job here is to get the historical daily pricing data for Ethereum from CoinMarketCap for all days up to and including Dec 31, 2017 (be sure to pick the right time range on the website!). Write the function get_data(), which should print, for each day in reverse chronological order, whether the opening price was strictly greater than the closing price. Print one line per day consisting of two space-separated items:
- the date, in the format MM/DD/YYYY
- 1 if the opening price strictly exceeds the closing price for that day, and 0 otherwise.
Example 1:
```
Output:
12/31/2017 <1 or 0>
12/30/2017 <1 or 0>
...
```
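
The per-day output line can be sketched in isolation from the scraping step. The helper below is hypothetical (`format_day` is not part of the exercise) and assumes the site renders dates like `Dec 31, 2017`, as the hint in the starter code suggests. The date is written month first, following the example output, and the sample prices are made up.

```python
from datetime import datetime


def format_day(raw_date, open_price, close_price):
    """Return one output line: the date as MM/DD/YYYY, then 1 or 0."""
    # Assumption: raw_date looks like "Dec 31, 2017".
    dt = datetime.strptime(raw_date, "%b %d, %Y")
    # Strictly greater: equal open and close prices yield 0.
    flag = 1 if open_price > close_price else 0
    return "%s %d" % (dt.strftime("%m/%d/%Y"), flag)


# Made-up prices, for illustration only.
print(format_day("Dec 31, 2017", 712.21, 756.73))
print(format_day("Dec 30, 2017", 753.82, 717.26))
```

Keeping the formatting in a small pure function like this makes it testable without any network access.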

In [1]:
from bs4 import BeautifulSoup
import urllib.request
import datetime
import pandas as pd
import string
import numpy as np

###########################################################################
#
# To send an HTTP request to http://example.com, and get the results, run:
# r = urllib.request.urlopen('http://example.com').read()
# soup = BeautifulSoup(r, "lxml")
#
# To convert their date format to a datetime object, use:
# dt = datetime.datetime.strptime(original_date, '%b %d, %Y')
#
# To convert the datetime object to the desired MM/DD/YYYY format, use:
# date = dt.strftime("%m/%d/%Y")
#
###########################################################################


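def strp(s):
    # Strip punctuation (e.g. the comma in "Dec 31, 2017") before parsing.
    exclude = set(string.punctuation)
    x = ''.join(ch for ch in s if ch not in exclude)
    return datetime.datetime.strptime(x, '%b %d %Y')


def strp2(s):
    # "%D" would give a two-digit year (MM/DD/YY), so spell out the format.
    return s.strftime("%m/%d/%Y")
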
def get_data(self): 
    url = 'https://coinmarketcap.com/currencies/ethereum/historical-data/?start=20130428&end=20171231'
    r = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(r, "lxml")  # name the parser explicitly
    entries = []
    rows = soup.find_all("tr") # returns an array of <tr> objects
    for row in rows:
        cols = row.find_all("td")
        if len(cols) > 0:
            entry = {
                "time": cols[0].get_text(),
                # Drop thousands separators so values like "1,234.56" parse.
                "open": float(cols[1].get_text().replace(",", "")),
                "close": float(cols[4].get_text().replace(",", ""))
            }
            entries.append(entry)

    df = pd.DataFrame(entries)
    df['date'] = df['time'].apply(strp)
    df['date2'] = df['date'].apply(strp2)
    # Rows are printed in page order, which is assumed to be newest first.
    for ind, row in df.iterrows():
        flag = 1 if row['open'] > row['close'] else 0
        print(row['date2'], flag)

Now that you've been able to obtain the daily prices of a cryptocurrency, let's take a look at how to compare two different coins. 

Golem is another cryptocurrency that is built on top of Ethereum's blockchain, so the two currencies should be closely related. Using the same web scraping technique we previously implemented, we've obtained the daily closing prices for both Ethereum and Golem, from 1/1/2017 to 12/31/2017, in chronological order.

If you run the current boilerplate code, which just prints the raw input prices, you can see the results in our visualization. Clearly, the data is not very informative because the raw numerical prices of the two currencies are on two vastly different scales. We'll solve this using normalization.

These two time series of prices will be passed as arrays to your function, compare_coins(eth, golem), which will process the data into two modified lists that are easier to compare.

Your job is to scale each price list by a constant factor, so that the price on 1/1/2017 is scaled to equal $1. This will put the two datasets on an equal footing. Print out the sequence of space-separated normalized prices, one day per line, in chronological order.

Example 1:
```
Input: 
eth = [23, 24, 25, ...]
golem = [1, 2, 3, ...]

Output: 
1.00 1.00
1.04 2.00
...
```
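
One way to read the normalization step: divide every price in a list by that list's first price. A minimal sketch, where `normalize` is a hypothetical helper name and the inputs are just the first three values from the example above:

```python
def normalize(prices):
    """Scale a price list by a constant so that its first entry becomes 1."""
    first = prices[0]  # price on the first day (1/1/2017 in the exercise)
    return [p / first for p in prices]


eth = [23, 24, 25]
golem = [1, 2, 3]
for e, g in zip(normalize(eth), normalize(golem)):
    print("%.2f %.2f" % (e, g))
```

The first two printed lines match the example output above. The sketch assumes Python 3, where `/` is true division even for integer inputs.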

In [2]:
def compare_coins(self, eth, golem):
    # Print prices scaled so that the first day's price equals 1
    scl_eth = eth[0]
    scl_gol = golem[0]
    for eth_price, golem_price in zip(eth, golem):
        eth_price = 1.0 * eth_price / scl_eth
        golem_price = 1.0 * golem_price / scl_gol
        print("%.2f %.2f" % (eth_price, golem_price))

Let's take our comparison analysis one step further. We've been able to see a pretty good overview of how the two prices compare, but the goal is really to understand one coin relative to the other. Ethereum is much more prevalent as a cryptocurrency, so we'll use it as the baseline. Also note that the data points in the previous graph were unnecessarily dense. This time, we're going to smooth the prices using a 5-day moving average.

Write a new function, set_baseline(eth, golem), that takes your previous result of the normalized prices (as two arrays) and prints the new, processed prices, where Ethereum has been set as the baseline and the Golem values have been smoothed. This involves two steps:
1. First, set every Ethereum price to $0, and recalculate each Golem price as its fractional offset from the Ethereum price. For example, if at time t the normalized price for Ethereum is 10 and Golem is at 9, then your output should be 0 for Ethereum and (9-10)/10 = -0.1 for Golem.
2. After you do that, we want to smooth the values a bit. The adjusted Golem price value on day i should be the average of the values on days i, i+1, i+2, i+3, i+4. This also means your new results will be 4 days shorter than before.
As before, print your results, one day per line, in chronological order. You'll be able to see the two time series again in our visualization.

Example 1:
```
Input: 
eth = [8, 9, 10, ...]
golem = [9, 11, 7, ...]

Output:
0.00 <golem_price_1>
0.00 <golem_price_2>
...
```
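
The two steps can be sketched without any libraries. `fractional_offsets` and `forward_average` are hypothetical helper names, and the input lists extend the example with made-up values so that at least one full 5-day window exists.

```python
def fractional_offsets(eth, golem):
    """Step 1: Golem's offset from Ethereum, as a fraction of the Ethereum price."""
    return [(g - e) / e for e, g in zip(eth, golem)]


def forward_average(values, window=5):
    """Step 2: average of days i .. i+window-1; the result is window-1 entries shorter."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Made-up inputs, for illustration only.
eth = [8, 9, 10, 10, 10, 10]
golem = [9, 11, 7, 10, 12, 8]
for value in forward_average(fractional_offsets(eth, golem)):
    # Ethereum is the baseline, so its column is always 0.
    print("%.2f %.2f" % (0.0, value))
```

With six input days and a window of 5, this prints two lines, which illustrates why the result is 4 days shorter than the input.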

In [3]:
def set_baseline(self, eth, golem):
    # Express Golem as a fractional offset from Ethereum; Ethereum becomes
    # the all-zero baseline.
    golem = np.asarray(golem, dtype=float)
    eth = np.asarray(eth, dtype=float)
    offset = (golem - eth) / eth
    baseline = np.zeros(eth.shape[0])
    # A trailing 5-day mean at index i covers days i-4..i, which is the
    # forward average for day i-4, so the first four (NaN) entries are skipped.
    smoothed = pd.Series(offset).rolling(window=5).mean()
    for i in range(4, len(smoothed)):
        print("%.2f %.2f" % (baseline[i], smoothed.iloc[i]))

So far, our analysis has been largely qualitative, based on viewing the price time series in a graph. Before we get to the final analysis of the ETF, let's do some preliminary quantitative analysis of how the cryptocurrencies compare, so we can better interpret our final results.

Your job in this problem is to compute the correlation between each pair of raw price sequences, again passed in as arrays. Your function compute_correlations(eth, golem, neo) should print three lines, consisting of the pairwise correlation values between the three cryptocurrencies.

While simple, this procedure will give us a much better idea of which coins are most related. We suggest you look into the built-in functionality of the numpy package.

As a little background, correlation is a number that measures how closely two variables are linearly related. It ranges between -1 and 1, and is unaffected by scaling either of the variables by a positive constant.
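
The scale-invariance mentioned above can be checked directly. This is a minimal sketch using `np.corrcoef`, which returns a correlation matrix whose off-diagonal entry `[0, 1]` is the Pearson correlation of its two inputs. The price sequences are made up.

```python
import numpy as np

# Made-up price sequences, for illustration only.
eth = np.array([23.0, 24.0, 25.0, 22.0, 27.0])
golem = np.array([1.0, 2.0, 3.0, 2.5, 4.0])

raw = np.corrcoef(eth, golem)[0, 1]
# Rescaling either series by a positive constant leaves the correlation unchanged.
scaled = np.corrcoef(eth / eth[0], 100.0 * golem)[0, 1]

print("%.5f" % raw)
print("%.5f" % scaled)
```

Both lines print the same value, which is why the correlations can be computed on the raw prices without normalizing first.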

Example 1:
```
Input: 
eth = [23, 24, 25, ...]
golem = [1, 2, 3, ...]
neo = [7, 5, 11, ...]

Output:
<correlation_eth_and_golem>
<correlation_eth_and_neo>
<correlation_golem_and_neo>
```

In [4]:
def compute_correlations(self, eth, golem, neo):
    # np.corrcoef returns a 2x2 matrix; entry [0, 1] is the pairwise correlation.
    print("%.5f" % np.corrcoef(eth, golem)[0, 1])
    print("%.5f" % np.corrcoef(eth, neo)[0, 1])
    print("%.5f" % np.corrcoef(golem, neo)[0, 1])