In [None]:
import AlphaVantage as av
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Fetching the Data

For the following example we'll fetch the daily price history for Google stock (ticker `GOOG`).
The API call returns a Python dictionary, which we then need to split into the desired fields.

In [None]:
dat = av.apicall(function='TIME_SERIES_DAILY', symbol='GOOG')

In [None]:
# the keys are YYYY-MM-DD date strings in a dict, so sorting them as strings puts them in chronological order
dat_ord = sorted(dat['Time Series (Daily)'].items())

In [None]:
dat_open = np.array([float(x[1]['1. open']) for x in dat_ord])
dat_high = np.array([float(x[1]['2. high']) for x in dat_ord])
dat_low = np.array([float(x[1]['3. low']) for x in dat_ord])
dat_close = np.array([float(x[1]['4. close']) for x in dat_ord])
dat_volume = np.array([float(x[1]['5. volume']) for x in dat_ord])
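
The five extractions above repeat the same pattern, so they could be folded into one helper. The following is a minimal sketch, assuming the response has the same shape as the one used above. The name `extract_fields` and the tiny `sample` dictionary are hypothetical and exist only for illustration; they are not part of the `AlphaVantage` module.

```python
import numpy as np

def extract_fields(series):
    """Return (dates, {field: array}) from a 'Time Series (Daily)'-style dict.

    Hypothetical helper: assumes the keys are date strings that sort
    chronologically and the values map labels like '1. open' to numeric strings.
    """
    ordered = sorted(series.items())  # oldest date first
    dates = [day for day, _ in ordered]
    labels = ['1. open', '2. high', '3. low', '4. close', '5. volume']
    fields = {
        # '1. open' -> 'open', and so on
        label.split('. ', 1)[1]: np.array([float(row[label]) for _, row in ordered])
        for label in labels
    }
    return dates, fields

# Made-up sample data in the same shape as the API response above
sample = {
    '2020-01-03': {'1. open': '101.0', '2. high': '103.0', '3. low': '100.0',
                   '4. close': '102.0', '5. volume': '1500'},
    '2020-01-02': {'1. open': '100.0', '2. high': '102.0', '3. low': '99.0',
                   '4. close': '101.0', '5. volume': '1000'},
}
dates, fields = extract_fields(sample)
```

With the real response, the call would be `extract_fields(dat['Time Series (Daily)'])`, and `fields['open']` would play the role of `dat_open`.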

# Normalizing the Data for ANN Learning

Different stocks trade at very different price levels.
What we're actually interested in is the relative growth of any given stock.
Thus I propose we divide by the furthest-back data point, so that every series is expressed as a multiple of its starting value.
This gives growth that is independent of the absolute stock price (1.0 is the starting level, so 1.5 means 50% growth), and zero is still true zero.

In [None]:
# arbitrarily use the oldest open price to normalize all price metrics
open_norm = dat_open / dat_open[0]
close_norm = dat_close / dat_open[0]
high_norm = dat_high / dat_open[0]
low_norm = dat_low / dat_open[0]
volume_norm = dat_volume / dat_volume[0]

In [None]:
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, squeeze=True)
fig.subplots_adjust(hspace=0)

ax1.plot(open_norm, label='open')
ax1.plot(close_norm, label='close')
ax1.plot(high_norm, label='high')
ax1.plot(low_norm, label='low')
ax1.legend()

ax2.plot(volume_norm, label='volume')
ax2.legend()

plt.show()

However, the difference between adjacent time points may matter more than the general trend.
We can examine day-to-day relative changes like so:

In [None]:
# relative change from each day to the next: (x[i] - x[i-1]) / x[i-1]
open_diff = np.diff(dat_open) / dat_open[:-1]
close_diff = np.diff(dat_close) / dat_close[:-1]
high_diff = np.diff(dat_high) / dat_high[:-1]
low_diff = np.diff(dat_low) / dat_low[:-1]
volume_diff = np.diff(dat_volume) / dat_volume[:-1]

In [None]:
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, squeeze=True)
fig.subplots_adjust(hspace=0)

ax1.plot(open_diff, label='open')
ax1.plot(close_diff, label='close')
ax1.plot(high_diff, label='high')
ax1.plot(low_diff, label='low')
ax1.legend()

ax2.plot(volume_diff, label='volume')
ax2.legend()

plt.show()

In the price difference graph, the values are quite small.
We may need to scale up these relative changes to put them into a more appropriate range for whatever ANN design we choose to use.
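
One possible way to rescale the relative changes is to standardize them to zero mean and unit standard deviation. This is a minimal sketch, not a decision about the final preprocessing: the choice of z-score scaling and the helper name `standardize` are assumptions, and the input values are made up.

```python
import numpy as np

def standardize(x):
    """Shift to zero mean and scale to unit standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

# Made-up day-to-day relative changes, similar in magnitude to the plot above
example_diff = np.array([0.01, -0.02, 0.005, 0.015, -0.01])
scaled = standardize(example_diff)
```

In the notebook this would be applied as `standardize(close_diff)` and so on. If the data is later split into training and test sets, the mean and standard deviation should come from the training portion only.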

From all the graphs shown, it's likely that using open, close, high, *and* low is unnecessary.
We could likely come up with one or two values to summarize them.
Obvious candidates include the high-low average or the open-close average.
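
Both summaries are simple element-wise averages. The sketch below uses made-up prices for three days so that it runs on its own; the `example_*` arrays are placeholders for `dat_open`, `dat_high`, `dat_low`, and `dat_close`.

```python
import numpy as np

# Made-up prices for three consecutive days
example_open = np.array([100.0, 101.0, 103.0])
example_high = np.array([102.0, 103.0, 105.0])
example_low = np.array([99.0, 100.0, 101.0])
example_close = np.array([101.0, 102.0, 104.0])

# midpoint of the daily trading range
high_low_avg = (example_high + example_low) / 2
# midpoint between the opening and closing price
open_close_avg = (example_open + example_close) / 2
```

Either summary could then be normalized or differenced in the same way as the individual price series above.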