# Pulling Stock Data
This is a Python notebook for analyzing stock data based on ticker symbols.


<br/>
<br/>
<br/>
<br/>
<br/>
## 0. Programming in Python
Let's start by seeing how to make variables, functions, and logic in Python.

### Variables
Variables can take on many data types, from `'Strings'` to numbers (`0`), `True`/`False` and even functions.

Let's create a variable called `x` and assign it a string. 

Type **`x = 'Hello world!'`** in the box below, then press **`Shift + Enter`** to execute the code.

Nothing happened... That's because we've only just created the variable `x`. 

Now to show the value of the variable, we need to type **`x`**. Again, press **`Shift + Enter`** to execute the code.
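
To illustrate the data types mentioned above, here is a minimal sketch. The variable names are made up for this example and are not used anywhere else in the notebook.

```python
# Hypothetical variable names, chosen only for illustration
greeting = 'Hello world!'   # a string
count = 0                   # a number
is_ready = True             # a boolean (True/False)
shout = str.upper           # a function stored in a variable

# type() reports what kind of value a variable holds
print(type(greeting).__name__)   # str
print(shout(greeting))           # HELLO WORLD!
```

Unlike typing a bare variable name, `print()` shows a value from any line of a cell, not just the last one.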

### Functions

We can build and use functions to perform some action.

Functions can operate on one or more `variables` that are defined when you create the function. When defined, these variables are just placeholders for whatever data gets sent to the function when it is used.

Python uses indentation instead of `{` and `}` like in JavaScript or CSS. So be careful how you indent your code.

Below, type **`def myFunction(y):`** on the first line, then hit `Enter` and type **`return y + 3`**. The second line should be automatically indented. Press **`Shift + Enter`** to save that function.

In the box below that, type **`myFunction(3)`** - what do you expect it to return when you press **`Shift + Enter`**?
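
If the idea of placeholder variables and indentation is still fuzzy, here is a separate sketch. The function `add_tax` and its parameters are invented for illustration only.

```python
# Hypothetical function for illustration; `price` and `rate` are placeholders
# that only receive values when the function is called.
def add_tax(price, rate):
    # The indented lines form the body of the function
    total = price + price * rate
    return total

# Back at the left margin, we are outside the function again
print(add_tax(100, 0.5))   # 150.0
```

When `add_tax(100, 0.5)` runs, `price` stands for `100` and `rate` stands for `0.5` inside the function body.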

### Logic
Logic is the last piece of the programming foundation. It tests a comparison and, depending on whether that comparison is `True` or `False`, runs one outcome or the other.

We've put the basic structure in the box below, but you need to add a comparison in the `()` to test: e.g. **`(5 > 3)`** or **`('a' == 'a')`**. Press **`Shift + Enter`** to run the logic below.

In [None]:
if ():
    print('The comparison is True')
else:
    print('The comparison is False')
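
For reference, here is a small sketch of how comparisons feed into `if`/`else`. The helper name `describe` is made up for this example.

```python
# Comparisons evaluate to True or False
print(5 > 3)        # True
print('a' == 'b')   # False

# Hypothetical helper that wraps the same if/else structure
def describe(comparison):
    if comparison:
        return 'The comparison is True'
    else:
        return 'The comparison is False'

print(describe(5 > 3))       # The comparison is True
print(describe('a' == 'b'))  # The comparison is False
```

Note that an empty `()` is an empty tuple, which Python treats as `False`, so running the cell above without adding a comparison takes the `else` branch.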

### Wrap Up
So that's it! Variables, functions and logic are the building blocks of programming in any language.

Now that you've got a handle on those, we're going to get a bit more complicated working with our data.

*Note: Just like above, you'll need to press `Shift + Enter` to run any code in an `In [ ]:` box.*

## <font color='green'>You're now finished with this section! Let your facilitators know, and have a break.</font>

![done](https://media.giphy.com/media/XreQmk7ETCak0/giphy.gif)

<br/>
<br/>
<br/>
<br/>
<br/>
## 1. Sourcing Data
To begin working with our data, let's use an API called [Quandl](https://www.quandl.com) to bring in stock data.


### 1.1 Packages

We need to first `import quandl` to get the Quandl library of functions, then give the API our API key to get access to the data. The key for today is **`Byjzu4U8rmR1iEhZnp7V`** - copy and paste that between the `""` below.

In [None]:
!pip install quandl              # install quandl
!pip install --upgrade pandas    # upgrade pandas (some housekeeping)

import quandl
quandl.ApiConfig.api_key = ""

### 1.2 Getting the data

Now that we've got our connection to Quandl, let's pull a single stock (`AAPL`) and store that in a variable called `data`.

Add **`WIKI/AAPL`** between the `""` below.

In [None]:
data = quandl.get("", rows=5)

And if we want to look at the data itself, we can type **`data`** below.

To assess the health of each stock, let's find the `Close` price for each stock. If you notice above, that's the 4th column.

Based on Quandl's [API documentation](https://docs.quandl.com/docs/time-series-2), we can extract just that column by adding **`.4`** after `WIKI/AAPL`. Change the request below to use **`WIKI/AAPL.4`**:

In [None]:
data = quandl.get("WIKI/AAPL", rows=5, collapse='monthly')
data

That's great! But it looks like we're pulling data starting from 1980, when Apple had its IPO. Not all stocks will date back that far, so we need to choose a time window that we think will work for most stocks.

To make a ***human choice***, let's say the last 10 years.

First, we get the date 10 years ago, then we use that start date to pull our data.

We need to check that it worked - write **`data.head(5)`** in place of `#TODO`:

In [None]:
import datetime

# Get start date of 10 years ago
start_date = (datetime.datetime.now() - datetime.timedelta(days=10*365)).strftime('%Y-%m-01')

# Make same request, but with our new start date
data = quandl.get("WIKI/AAPL.4", rows=120, collapse='monthly', start_date=start_date)

#TODO
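
To see what the start date calculation above produces, here is a sketch of the same arithmetic wrapped in a function. The function name is made up, and a fixed date is passed in so the result is reproducible.

```python
import datetime

# Hypothetical helper mirroring the start_date calculation above
def start_of_month_years_ago(today, years=10):
    """Approximate `years` ago as years * 365 days, then snap to the 1st of that month."""
    earlier = today - datetime.timedelta(days=years * 365)
    # The literal '01' in the format string replaces the day of the month
    return earlier.strftime('%Y-%m-01')

print(start_of_month_years_ago(datetime.datetime(2017, 6, 15)))   # 2007-06-01
```

Because `10 * 365` days ignores leap days, the raw result lands a few days off the exact anniversary. Snapping to the first of the month hides that small error, which is fine for monthly data.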

### 1.3 Visualization for insight

Python has a number of great visualization packages available. A common one is Matplotlib, which we enable in the notebook below.

Let's do a quick visualization of the data to see if it looks right.

Complete the `#TODO` below by typing **`data.plot()`** :

In [None]:
%matplotlib inline
!pip install mpld3 # install a package to let us zoom into our plots
import mpld3
mpld3.enable_notebook()

#TODO

Uh-oh! Looks like there's a problem: There's a big drop in AAPL stock in 2014! **If you don't see this, check with a facilitator**

Why?!  Well, a quick Google shows they [split their stock](https://www.washingtonpost.com/news/the-switch/wp/2014/06/09/apples-stock-price-just-dropped-more-than-500-a-share-but-dont-panic/). 

Luckily, Quandl has accounted for that. Instead of `Close`, we'll need to use the `Adjusted Close` price (column 11) from Quandl.
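
To see why an adjusted price removes the drop, here is a rough sketch of the split part of the adjustment. The prices are made up, the 7-for-1 ratio is assumed to match Apple's 2014 split, and Quandl's actual `Adjusted Close` may account for other events such as dividends.

```python
# Made-up closing prices around a hypothetical 7-for-1 split
SPLIT_RATIO = 7
closes = [630.0, 644.0, 93.0, 94.0]   # the last two values are after the split
split_index = 2                        # position of the first post-split price

def adjust_for_split(prices, split_at, ratio):
    """Divide every pre-split price by the ratio so the series is continuous."""
    return [p / ratio if i < split_at else p for i, p in enumerate(prices)]

adjusted = adjust_for_split(closes, split_index, SPLIT_RATIO)
print(adjusted)   # [90.0, 92.0, 93.0, 94.0]
```

The raw series falls from 644 to 93 overnight, while the adjusted series moves from 92 to 93, which reflects what shareholders actually experienced.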

Modify the code below to get the **11**th column instead of the 4th.

In [None]:
data = quandl.get("WIKI/AAPL.11", collapse='monthly', start_date = start_date)

# Now plot the data:
data.plot()

Great! Let's now focus on the full portfolio.


## <font color='green'>You're now finished with this section! Let your facilitators know, and have a break.</font>

![done](https://media.giphy.com/media/3o7TKLpxzkbvjwEgSc/giphy.gif)

<br/>
<br/>
<br/>
<br/>
<br/>
## 2. Sourcing more Data

Now that we have access to stock data, we need to get that data for more companies. Let's pull in our list of Warren's 2003 acquisitions, and get the Quandl data for each.

First, we need to bring in the cleaned CSV file we exported from Open Refine, and store it as a variable. Let's call it `buffett`.

Now let's take the file name of the output from the Open Refine scrubbing step and use it in place of **`my_clean_csv.csv`** below:

In [None]:
from pandas import read_csv
buffett = read_csv('/resources/data/my_clean_csv.csv')
buffett.head(5)

Next we want to choose some of those stocks for our analysis. Let's choose the first 3 companies from `buffett`. 

Replace `buffett['TICKER'][0:0]` with `buffett['TICKER']`**`[0:3]`**. If your column is called something different (like `Ticker`) then update accordingly:

In [None]:
symbols = [ 'WIKI/%s.11' % ticker for ticker in buffett['TICKER'][0:0] ]
symbols
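
If the list-building line above looks dense, here is a sketch of the same pattern on a plain Python list. The tickers are a made-up sample rather than the contents of `buffett`.

```python
# Made-up sample tickers, standing in for buffett['TICKER']
sample_tickers = ['AAPL', 'KO', 'IBM', 'WFC']

# [0:3] keeps positions 0, 1 and 2; '%s' is replaced by each ticker in turn
sample_symbols = ['WIKI/%s.11' % ticker for ticker in sample_tickers[0:3]]
print(sample_symbols)        # ['WIKI/AAPL.11', 'WIKI/KO.11', 'WIKI/IBM.11']

# [0:0] selects nothing, which is why the unedited cell gives an empty list
print(sample_tickers[0:0])   # []
```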

We can now load in the symbol data from Quandl.

Let's create a new dataframe, `data2` and store our historical stock data for those 3 companies in it.

We'll also clean up our column names to something more readable. 

To look at the final 5 rows of our dataframe, replace `#TODO` with **`data2.tail(5)`**:

In [None]:
# Get the historical stock prices from quandl for our top 3 companies
data2 = quandl.get(symbols, rows=121, collapse='monthly', start_date=start_date)

# Rename our columns
data2.columns = [col.replace(' - Adj. Close','').replace('WIKI/','') for col in data2.columns]

#TODO
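
Here is a sketch of what the column renaming does, using plain strings. The raw names below are assumed from the text being replaced in the cell above, so check them against your own `data2.columns`.

```python
# Assumed raw column names, inferred from the replace() calls above
raw_columns = ['WIKI/AAPL - Adj. Close', 'WIKI/KHC - Adj. Close']

# Strip the suffix first, then the 'WIKI/' prefix, leaving just the ticker
clean_columns = [col.replace(' - Adj. Close', '').replace('WIKI/', '') for col in raw_columns]
print(clean_columns)   # ['AAPL', 'KHC']
```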

Let's have another quick look at the data, to make sure everything looks good.

Add the code to plot `data2` (hint: you've used similar code above already for `data`):

In [None]:
data2.plot() #TODO

Explore the data by using the icons in the bottom left corner of the plot. 

It looks like we have some incomplete data.

Further investigation reveals that we only have data for Kraft Heinz (KHC) from July 2015 onwards. We'll need to remember this for later, as this might affect our analysis.

## <font color='green'>You're now finished with this section! Let your facilitators know, and have a break.</font>

![done](https://media.giphy.com/media/26FL2NwYBOq3Z6C6Q/giphy.gif)

<br/>
<br/>
<br/>
<br/>
<br/>
## 3. Analysis

## <font color='orange'>Please wait for your instructor to begin the Analysis section before proceeding</font>


### 3.1 Retrieve code, tune model

**Paste the model code below and run!**

A reminder of the meaning of each parameter is given below:
 - **p**.......The complexity of the "**A**uto**R**egressive" part. High numbers will follow previous behaviour closely, but may be slower and less flexible in the long run.
 - **d**.......This refers to the "**I**ntegrated" differencing, and is the number of times we need to difference the data to make it stationary. This can be left at 1.
 - **q**.......The complexity of the "**M**oving **A**verage" part. A higher number will place a stronger emphasis on seasonality and trend.

In [None]:
#TODO
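
To make the **d** parameter above more concrete, here is a sketch of what differencing a series means. This is a hand-rolled illustration with made-up numbers and a made-up function name, not the code the model uses internally.

```python
# Hypothetical helper: difference a series d times
def difference(series, d=1):
    for _ in range(d):
        # Replace each value with its change from the previous value
        series = [b - a for a, b in zip(series, series[1:])]
    return series

prices = [10, 12, 15, 19, 24]   # made-up prices with an accelerating trend
print(difference(prices, 1))    # [2, 3, 4, 5]
print(difference(prices, 2))    # [1, 1, 1]
```

Each round of differencing shortens the series by one value and flattens out more of the trend.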

### 3.2 Apply forecast model to Apple stock data

Now let's apply it to Apple. As you can see in the code above, the function that we run to tune the model is called `custom_slider_ARIMA()`.

It takes one variable - the data to be modelled.

Run the function below, and pass the **`custom_slider_ARIMA()`** function the Apple stock data, which is accessed as **`data2.AAPL`**:

In [None]:
#TODO

### 3.3 Apply model to all stocks

Once you are happy with the shape of your prediction, **remember the (p,d,q) values you used for Apple**. We are now going to use them to create predictions for the other stocks we collected data for.

The code below applies the model we've built to each of the stocks, and adds the predictions to the end of our existing data. It will then print out the last 10 values of our new dataset.

**Complete the first two lines below** by naming the number of months you want to predict ahead (we recommend 6), and replace the (p,d,q) values with the values you used for Apple:

In [None]:
numberofmonths = 6         # how many predictions do we want to make?
master_params = [p,d,q]    # what are the best parameters?


# Here we need to add numberofmonths onto the end of the dataframe using data.reindex:
data3 = data2.reindex(pd.date_range(datetime.datetime.now().date(), periods=121 + numberofmonths, freq='MS')+pd.DateOffset(days=-1, months=-120),fill_value="NaN")



for i, column in enumerate(data3):
    if data3[column].isnull().values.any(): # select only the range of dates for which we have data
        last_nan = np.where(data3[column].isnull().values)[0][-1]
        stock = data3[column].iloc[(last_nan+1):-numberofmonths].values
    else:
        stock = data3[column].iloc[:-numberofmonths].values
    
    stock = stock.tolist()
    forecast = ARIMAforecast(stock, params = master_params, steps = numberofmonths) # create our forecast
    
    col_forecast = pd.Series(forecast[-numberofmonths:], index = pd.date_range(start= datetime.datetime.now().date(), periods=numberofmonths, freq='MS') + pd.DateOffset(days=-1, months=1))
    data3[column].iloc[-numberofmonths:] = col_forecast
    

data3.tail(10)
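
The loop above only feeds the model the stretch of each column that comes after its last missing value. Here is a simplified sketch of that selection on a plain list. The function name and sample values are made up, and it assumes the forecast slots at the end hold placeholders rather than real prices.

```python
import math

# Hypothetical helper mirroring the "select only the range we have data for" step
def usable_history(values, horizon):
    """Return the values after the last missing entry, leaving out the
    final `horizon` slots reserved for forecasts (horizon must be > 0)."""
    history = values[:-horizon]
    missing = [i for i, v in enumerate(history)
               if isinstance(v, float) and math.isnan(v)]
    start = missing[-1] + 1 if missing else 0
    return history[start:]

nan = float('nan')
late_starter = [nan, nan, 70.0, 72.0, 75.0, 'NaN', 'NaN']   # made-up column
print(usable_history(late_starter, 2))   # [70.0, 72.0, 75.0]
```

A stock with a short history, like KHC, therefore gets a forecast based on fewer data points than the others.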

### 3.4 Plot results

Execute the code below to see the results.

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
_ = ax.plot(data3.iloc[:-numberofmonths])
_ = ax.plot(data3.iloc[-(numberofmonths+1):], color = "Red")

Congratulations! We have built and adapted a model that can predict the future.

This is a simplistic introduction to time series analysis, although these methods are the foundation of many of the cutting edge techniques used currently. If you'd like to investigate ARIMA further, we recommend following the introduction here:

https://datascience.ibm.com/exchange/public/entry/view/815137c868b916821dec777bdc23013c

## <font color='green'>You're now finished with this section!</font>

![done](https://media.giphy.com/media/R6aNZ3Uc1aR1K/giphy.gif)


<br/>
<br/>
<br/><br/>
<br/>
<br/>
## 4. Visualization for Communication

Let's get our data ready for a more snazzy visualization.


### 4.1 Exporting for further visualization


Almost there! We've done the hard work, now we just need it in a CSV format.

Based on the [D3.js show reel](https://bl.ocks.org/mbostock/1256572), we need the data to be arranged like this:

```
symbol,date,price
MSFT,Jan 2000,39.81
MSFT,Feb 2000,36.35
MSFT,Mar 2000,43.22
MSFT,Apr 2000,28.37
MSFT,May 2000,25.45
```

We've written this bit of code to clean the data a bit more and output it in comma-separated value format. Check over the code and add **`print(csv)`** on the bottom line, to check the format is correct:

In [None]:
datalist=data3.unstack()
csv = datalist.to_csv(header=True, index_label=['symbol','date','price'], date_format='%b %Y', index=True)
csv = csv.replace("price,0","price") # remove addition of ',0' on first line
#TODO
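
To show the reshaping idea behind the cell above, here is a sketch that builds the same `symbol,date,price` layout by hand. The symbols and prices are made up, and plain Python is used instead of pandas to keep each step visible.

```python
import datetime

# Made-up prices in "wide" form: one entry per symbol, keyed by date
wide = {
    'AAA': {datetime.date(2000, 1, 31): 39.81, datetime.date(2000, 2, 29): 36.35},
    'BBB': {datetime.date(2000, 1, 31): 10.5, datetime.date(2000, 2, 29): 11.0},
}

# "Long" form: one line per (symbol, date) pair
lines = ['symbol,date,price']
for symbol, series in wide.items():
    for date, price in sorted(series.items()):
        # '%b %Y' formats a date as e.g. 'Jan 2000'
        lines.append('%s,%s,%s' % (symbol, date.strftime('%b %Y'), price))

long_csv = '\n'.join(lines)
print(long_csv)
```

Each symbol's rows are grouped together and ordered by date, which matches the layout of the example data shown earlier.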

We could copy & paste this into a new CSV file for our D3.js visualization, or we could write code to do that for us.

To make the downloadable file, we've got to bring in a library called `base64` which will encode the file. Then we use that to create the file and add a bit of HTML to make it so we can download the file.

Run the cell below, and download the resulting **`stocks.csv`** file.

In [None]:
import base64
from IPython.display import HTML

b64 = base64.b64encode(csv.encode())
payload = b64.decode()
html = '<a download="{filename}" href="data:text/csv;base64,{payload}" target="_blank">{filename}</a>'
html = html.format(payload=payload,title="stocks.csv",filename="stocks.csv")
HTML(html)
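
If you're curious what the encoding step does, here is a sketch showing that base64 is reversible. The sample text and variable names are made up so they don't overwrite anything from the cell above.

```python
import base64

sample_text = 'symbol,date,price\nAAA,Jan 2000,39.81\n'   # made-up CSV content

# Encode: text -> bytes -> base64 bytes -> plain ASCII string
sample_payload = base64.b64encode(sample_text.encode()).decode()
sample_href = 'data:text/csv;base64,' + sample_payload

# Decode: the browser does the reverse when you click the link
round_trip = base64.b64decode(sample_payload).decode()
print(round_trip == sample_text)   # True
```

Encoding the file this way lets the whole CSV travel inside the link itself, so no separate file needs to be saved on the server.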


Now you can import this file into your D3.js visualization.

## <font color='green'>You're done!</font>

![done](https://media.giphy.com/media/15BuyagtKucHm/giphy.gif)