<h2>Numpy</h2>

<h4>Multi-dimensional arrays</h4>
<li>Faster and more space-efficient than Python lists

<h4>Can incorporate C/C++/Fortran code</h4>

<h4>Linear algebra, Fourier transforms, and random number support</h4>


In [None]:
import numpy as np

In [None]:
x=[[0,1,2,3],[4,5,6,7],[8,9,10,11]]
ax=np.array(x,int)
#ax.reshape(4,3)
#ax.reshape(6,2)
ax
#print(ax.reshape(6,2)[1,1])

#np.ones_like(ax) 
#np.identity(5)
#np.where(ax%2==0,ax,0)

#ax.mean()
#ax.std()

#ay=np.array([[3,4],[5,6],[7,8],[9,10]],float)

#np.dot(ay,ax.reshape(2,6))

#NumPy also includes np.linalg, a linear algebra module,
#and functions for polynomials (np.polynomial), numerical differences (np.diff), etc.
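
The commented-out calls in the cell above can be tried on their own. The following is a minimal sketch that reuses the same `ax` and `ay` arrays; the variable names `reshaped`, `element`, `evens_only` and `product` are introduced here for illustration only.

```python
import numpy as np

ax = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]], dtype=int)

# reshape keeps the same 12 elements and lays them out row by row
reshaped = ax.reshape(6, 2)
element = reshaped[1, 1]  # row 1 of the reshaped array is [2, 3]

# np.where keeps the even entries and replaces the odd ones with 0
evens_only = np.where(ax % 2 == 0, ax, 0)

# Matrix product: a (4, 2) array times a (2, 6) array gives a (4, 6) array
ay = np.array([[3, 4], [5, 6], [7, 8], [9, 10]], dtype=float)
product = np.dot(ay, ax.reshape(2, 6))

print(element, product.shape)
```

The inner dimensions must match for `np.dot`, which is why `ax` is reshaped to `(2, 6)` before multiplying.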


<h3>Random number support in numpy</h3>

In [None]:
np.random.normal(size=10)
#np.random.normal(size=(100,100))
#np.random.exponential()
#np.random.exponential(1.0,size=(6,3))
#np.random.randint(-10,10)
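
Random draws change on every run, which makes results hard to reproduce. One option, sketched here as an assumption rather than as part of the original notes, is NumPy's `default_rng` generator with a fixed seed (available in NumPy 1.17 and later). The seed value 42 is arbitrary.

```python
import numpy as np

# A seeded generator produces the same sequence on every run
rng = np.random.default_rng(42)

normal_draws = rng.normal(size=10)             # 10 standard normal values
exp_draws = rng.exponential(1.0, size=(6, 3))  # 6x3 array, scale 1.0
int_draw = rng.integers(-10, 10)               # upper bound is exclusive

# Re-creating the generator with the same seed repeats the first draw
repeat_draws = np.random.default_rng(42).normal(size=10)
print(np.array_equal(normal_draws, repeat_draws))
```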

<h2>Pandas</h2>

<h4>Integrated data manipulation and analysis capabilities</h4>
<h4>Integration with data visualization libraries</h4>
<h4>Built in time-series capabilities</h4>
<h4>Optimized for speed (many functions are written in C)</h4>
<h4>Built-in support for grabbing data from multiple sources</h4>
csv, xls, html, yahoo, google, worldbank, FRED

In [None]:
#installing pandas libraries
!pip install pandas-datareader
!pip install --upgrade html5lib==1.0b8

#There is a bug in the latest version of html5lib so install an earlier version
#Restart kernel after installing html5lib

<h3>Necessary imports</h3>

In [None]:
import pandas as pd
from pandas_datareader import data
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mp
import datetime as dt

<h3>Data structures in Pandas</h3>

<h4>Pandas organizes data into two data objects</h4>
<li>Series: A one dimensional array object
<li>DataFrame: A two dimensional table object
<h4>Each column in a dataframe corresponds to a named series</h4>
<h4>Rows in a dataframe can be indexed by a column of any datatype</h4>
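
A small sketch of the two objects, using invented exchange-rate numbers (the values and the `rates` name are hypothetical and are not taken from any data source):

```python
import pandas as pd

# Hypothetical values, invented for illustration only
rates = pd.DataFrame(
    {"Value": [1.10, 1.25, 0.009], "Change": [0.002, -0.001, 0.0]},
    index=pd.Index(["EUR-USD", "GBP-USD", "JPY-USD"], name="Currency"),
)

# Selecting one column returns a Series named after that column
value_column = rates["Value"]
print(type(value_column).__name__, value_column.name)

# The Series shares the row index of the DataFrame
print(list(value_column.index))
```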

<h3>Getting data</h3>
<b>html data tables</b>

In [None]:
#Returns a list of data frames
pd.read_html('http://www.bloomberg.com/markets/currencies/major')

In [None]:
#Takes the first (0th) row and makes it a header row
datalist = pd.read_html('http://www.bloomberg.com/markets/currencies/major',header=0)
datalist
#print(type(datalist))
#dataframe = datalist[0]
#dataframe

In [None]:
#Access a column by name (a DataFrame behaves like a dictionary of Series keyed by column name)
dataframe = datalist[0]
dataframe['Currency']

In [None]:
#Create an index that can be used to access rows
datalist = pd.read_html('http://www.bloomberg.com/markets/currencies/major',header=0,index_col='Currency')
dataframe=datalist[0]

In [None]:
#Rows are accessed through the loc or iloc indexers of a dataframe
#loc uses index labels
dataframe.loc['EUR-USD']
#iloc uses integer row positions
#dataframe.iloc[0]
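
Because the Bloomberg table may not be reachable, here is a self-contained sketch of the same two lookups on a hypothetical frame (all values invented):

```python
import pandas as pd

# Hypothetical values, invented for illustration only
frame = pd.DataFrame(
    {"Value": [1.10, 1.25, 0.009], "Change": [0.002, -0.001, 0.0]},
    index=pd.Index(["EUR-USD", "GBP-USD", "JPY-USD"], name="Currency"),
)

row_by_label = frame.loc["EUR-USD"]  # look up by index label
row_by_position = frame.iloc[0]      # look up by integer row position

# Both calls return the same row as a Series
print(row_by_label.equals(row_by_position))
```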

In [None]:
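#Or pass a row label and a column label to loc in a single call
#(the older ix indexer was deprecated in pandas 0.20 and removed in pandas 1.0)
dataframe.loc['EUR-USD','Change']
#dataframe.loc['EUR-USD',dataframe.columns[1]]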

In [None]:
#Views vs copies
#Chained indexing can return a copy, so the assignment changes the copy, not the original
eur_usd = dataframe.loc['EUR-USD']['Change']
dataframe.loc['EUR-USD']['Change'] = 1.0
#print(eur_usd)
#print(dataframe.loc['EUR-USD']['Change'])

In [None]:
#Use a single loc call instead
dataframe.loc['EUR-USD','Change'] = 1.0
print(dataframe.loc['EUR-USD']['Change'])
print(eur_usd)
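
The difference between changing a copy and changing the original can be sketched on a hypothetical frame (values invented). The copy is made explicitly with `.copy()` so that the behavior does not depend on the pandas version.

```python
import pandas as pd

# Hypothetical values, invented for illustration only
frame = pd.DataFrame(
    {"Value": [1.10, 1.25], "Change": [0.002, -0.001]},
    index=["EUR-USD", "GBP-USD"],
)

# Changing an explicit copy of a row leaves the frame untouched
row = frame.loc["EUR-USD"].copy()
row["Change"] = 5.0
unchanged = frame.loc["EUR-USD", "Change"]

# A single loc call with row and column labels writes to the frame itself
frame.loc["EUR-USD", "Change"] = 1.0
updated = frame.loc["EUR-USD", "Change"]

print(unchanged, updated)
```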

In [None]:
#Data frame can be sliced (Not that it means much for this dataset!)
dataframe.loc['EUR-USD':'GBP-USD']

In [None]:
#Slices using iloc
dataframe.iloc[0:3]
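
One detail worth checking when slicing: a `loc` slice includes both end labels, while an `iloc` slice excludes the stop position, as in ordinary Python slicing. A sketch on a hypothetical frame (values invented):

```python
import pandas as pd

# Hypothetical values, invented for illustration only
frame = pd.DataFrame(
    {"Value": [1.10, 1.25, 0.009, 0.75]},
    index=["EUR-USD", "GBP-USD", "JPY-USD", "CAD-USD"],
)

label_slice = frame.loc["EUR-USD":"GBP-USD"]  # both end labels are included
position_slice = frame.iloc[0:3]              # positions 0, 1, 2; position 3 is excluded

print(list(label_slice.index))
print(list(position_slice.index))
```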

In [None]:
#read_html reads all the tables

df = pd.read_html('http://finance.google.com',header=None)
df

In [None]:
#Use match for controlling which tables you get
df = pd.read_html('http://finance.google.com',header=None,match='Sector')
df

<h2>Yahoo and Google Finance</h2>

In [None]:
import datetime as dt
start=dt.datetime(1980, 1, 1)
end=dt.datetime.today()
#Syntax - DataReader(ticker,source,startdate,enddate)
df = data.DataReader('AAPL', 'yahoo', start, end)
df

In [None]:
df = data.DataReader('AAPL', 'google', start, end)
df

In [None]:
start = dt.datetime(2010, 1, 1)

end = dt.datetime(2015, 11, 8)

gdp=data.DataReader('GDP', "fred", start, end)
gdp

<h2>Datareader documentation</h2>
http://pandas-datareader.readthedocs.io/en/latest/

In [None]:
# Let's do some analysis
#Create a new column of up days using np.where
#And use it to find the percent of days when the stock went up
df['UP']=np.where(df['Close']>df['Open'],1,0)
df['UP']
#up_percent = df['UP'].sum()/df['UP'].count()
#up_percent
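
The up-day calculation can be checked by hand on a few rows. This sketch uses a hypothetical `prices` frame with invented open and close values in place of the downloaded data:

```python
import numpy as np
import pandas as pd

# Hypothetical prices, invented for illustration only
prices = pd.DataFrame(
    {"Open": [10.0, 10.5, 10.2, 10.8], "Close": [10.4, 10.3, 10.6, 10.8]}
)

# 1 when the close is above the open, 0 otherwise (a flat day counts as 0)
prices["UP"] = np.where(prices["Close"] > prices["Open"], 1, 0)

# Fraction of days on which the price went up
up_percent = prices["UP"].sum() / prices["UP"].count()
print(up_percent)
```

Since the column holds only 0s and 1s, `prices["UP"].mean()` gives the same fraction.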

In [None]:
#Compute X day percent changes
close_prices = df['Close']
print(close_prices)
#print(type(close_prices))
# Example - x = 21
#pct_changes = close_prices.pct_change(21)
#print(pct_changes)
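
`pct_change(n)` compares each value with the value `n` rows earlier, so the first `n` results are missing. A sketch on a short hypothetical series (values invented), with the same quantity computed by hand using `shift`:

```python
import pandas as pd

# Hypothetical closing prices, each 10% above the previous one
close = pd.Series([100.0, 110.0, 121.0, 133.1])

two_step = close.pct_change(2)

# Equivalent manual calculation: price now / price two rows ago - 1
manual = close / close.shift(2) - 1

print(two_step)
```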

In [None]:
#Compute moving averages
#rolling() creates a rolling window of a specified width; apply a function to the window to compute statistics
ma_8 = close_prices.rolling(window=8).mean()
ma_55 = close_prices.rolling(window=55).mean()
ma_8
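
A rolling mean has no value until the window is full, so the first `window - 1` entries are missing. A sketch with a window of 3 on a short hypothetical series (the name `ma_3` is used only here):

```python
import pandas as pd

# Hypothetical closing prices, invented for illustration only
close = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Each entry is the mean of the current value and the two before it
ma_3 = close.rolling(window=3).mean()

print(ma_3)
```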


In [None]:
ma_8.plot()
#ma_55.plot()


In [None]:
#Let's look at the most recent data 
ma_8[3900:].plot()
#ma_55[3900:].plot()


In [None]:
#Plot the departure (difference) between the 8-day and 55-day moving averages
(ma_8[3500:]-ma_55[3500:]).plot()


<h4>Working with multiple stocks and ordinary least squares</h4>

In [None]:
#Let's take a look at the solar sector
import datetime as dt
solar_df = data.DataReader(['FSLR', 'TAN','RGSE','SCTY'],'yahoo', start=dt.datetime(2016, 1, 1))['Adj Close']
#Calculate returns
rets = solar_df.pct_change()
solar_df
#rets

In [None]:
#Construct a scatter plot to compare returns on FSLR vs. returns on TAN
plt.scatter(rets.FSLR,rets.TAN)

In [None]:
#Get correlations
solar_corr = rets.corr()
solar_corr
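
To see what `corr()` reports, here is a sketch on a hypothetical price table with three made-up tickers (`AAA`, `BBB`, `CCC`; all values invented). The returns of `BBB` are twice those of `AAA`, and the returns of `CCC` have the opposite sign.

```python
import pandas as pd

# Hypothetical adjusted closes, invented for illustration only
prices = pd.DataFrame(
    {
        "AAA": [10.0, 11.0, 10.45, 11.495],
        "BBB": [20.0, 24.0, 21.6, 25.92],
        "CCC": [30.0, 27.0, 28.35, 25.515],
    }
)

# pct_change works column by column; the first row has no prior value
rets = prices.pct_change()

# Pairwise correlation of the return columns
corr = rets.corr()
print(corr.round(3))
```

Correlation ignores scale: `BBB` moves twice as much as `AAA` but is still perfectly correlated with it, while `CCC` is perfectly negatively correlated.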

In [None]:
#Do some risk analysis
#We'll plot the mean and standard deviation of returns for each ticker to get a sense of the
#risk-return profile
plt.scatter(rets.mean(), rets.std())
# Add info and formatting to the chart
#plt.xlabel('Expected returns')
#plt.ylabel('Standard deviations')
#for label, x, y in zip(rets.columns, rets.mean(), rets.std()):
#    plt.annotate(
#        label,
#        xy = (x, y), xytext = (20, -20),
#        textcoords = 'offset points', ha = 'right', va = 'bottom',
#        bbox = dict(boxstyle = 'round,pad=0.5', fc = 'yellow', alpha = 0.5),
#        arrowprops = dict(arrowstyle = '->', connectionstyle = 'arc3,rad=0'))
plt.show()


In [None]:
import numpy as np
import statsmodels.api as sm

<h2>Regressions</h2>
http://statsmodels.sourceforge.net/

In [None]:
#Construct y
#Construct matrix (dataframe) of X
#Add intercept
#Model the regression
#Get the results
X=solar_df[['FSLR','RGSE']]
X = sm.add_constant(X)
y=solar_df['SCTY']
model = sm.OLS(y,X)
result = model.fit()
print(result.summary())

In [None]:
fig, ax = plt.subplots(figsize=(8,6))
ax.plot(y)
ax.plot(result.fittedvalues)
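
To make the regression steps concrete without downloading data, the sketch below fits the same kind of model with NumPy's `np.linalg.lstsq` instead of statsmodels. This is a swapped-in technique, not the notebook's method. The data are hypothetical and generated without noise from y = 1 + 2*x1 - 3*x2, so the fit should recover those coefficients.

```python
import numpy as np

# Hypothetical regressors, invented for illustration only
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
y = 1.0 + 2.0 * x1 - 3.0 * x2

# Prepend a column of ones for the intercept (the role sm.add_constant plays)
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares solution: coefficients that minimize the sum of squared residuals
coeffs, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

# Fitted values, comparable to result.fittedvalues in statsmodels
fitted = X @ coeffs
print(np.round(coeffs, 6))
```

statsmodels adds standard errors, t-statistics and R-squared on top of these coefficients, which is what `result.summary()` prints.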