# Pulling Stock Data
This Python notebook pulls stock data based on ticker symbols.

That data can then be used for a D3 visualization like: https://bl.ocks.org/mbostock/1256572

## 0. Programming in Python
Let's start by seeing how to make variables, functions, and logic in Python.

### Variables
Variables can take on many data types, from `'Strings'` to numbers (`0`), `True`/`False` and even functions.

Type `x = 'Hello world!'` in the box below, then press `Shift + Enter` to execute the code.

Nothing happened... That's because we've only just created the variable `x`. Now to show the value of the variable, we need to type `x`. Again, press `Shift + Enter` to execute the code.
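To make the data types mentioned above concrete, here is a minimal sketch. The variable names (`greeting`, `count`, `is_ready`, `shout`) are made up for illustration and are not part of the exercise.

```python
# Hypothetical variables, one per data type mentioned above.
greeting = 'Hello world!'   # a string
count = 0                   # a number
is_ready = True             # a boolean (True/False)
shout = str.upper           # a function stored in a variable

# type() reports what kind of value a variable holds.
kinds = [type(v).__name__ for v in (greeting, count, is_ready)]
print(kinds)

# A variable that holds a function can be called like the function itself.
print(shout(greeting))
```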

### Functions
Functions can take on any name - you get to choose - but the syntax for defining a function is always the same. Functions can also operate on one or more variables that are named when you create the function. These variables are placeholders for whatever data gets sent to the function when it is used.
- First, define the name of the function with `def nameOfFunction():`
- Then, if you want to send a variable to the function, put that variable name between the ().
- Lastly, on the next line, tab in one tab, and write the instructions for what the function will do.

Below, type `def myFunction(y):` on the first line, then tab in and type `return y + 3`. Press `Shift + Enter` to save that function.

In the box below that, type `myFunction(3)` - what do you expect it to return when you press `Shift + Enter`?
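The steps above can also be sketched with a different function, so the exercise stays yours to try. The name `add_tax` and its two parameters are hypothetical.

```python
def add_tax(price, rate):
    # 'price' and 'rate' are placeholders for whatever values are passed in.
    return price * (1 + rate)

# Calling the function substitutes real values for the placeholders.
total = add_tax(100, 0.25)
print(total)
```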

### Logic
Logic is the last piece of the programming foundation. An `if` statement tests a comparison, and depending on whether that comparison is `True` or `False`, one of two outcomes will result.

We've put the basic structure in the box below, but you need to add a comparison in the `()` to test: e.g. `(5 > 3)` or `('a' == 'a')`. Press `Shift + Enter` to run the logic below.

In [None]:
if ():
    print('The comparison is True')
else:
    print('The comparison is False')
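As a slightly larger sketch of the same idea, logic can live inside a function and choose between more than two outcomes. The function `compare_prices` is a made-up example.

```python
def compare_prices(old_price, new_price):
    # Each branch runs only when its comparison is True.
    if new_price > old_price:
        return 'up'
    elif new_price < old_price:
        return 'down'
    else:
        return 'unchanged'

print(compare_prices(10, 12))
print(compare_prices(10, 10))
```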

### Wrap Up
So that's it!  Variables, functions, and logic are the building blocks of programming in any language.

Now that you've got a handle on those, we're going to get a bit more complicated working with our data.

*Note: Just like above, you'll need to press `Shift + Enter` to run any code in an `In [ ]:` box.*

## 1. Sourcing Data
To begin working with our data, let's use an API called [Quandl](https://www.quandl.com) to bring in stock data.

We need to first `import quandl` to get the Quandl library of functions, then set our API key. The key for today is `Byjzu4U8rmR1iEhZnp7V` - copy and paste that between the `""` below.

In [1]:
import quandl
quandl.ApiConfig.api_key = ""

Now that we've got our connection to Quandl, let's pull a single stock (`AAPL`) and store that in a variable called `data`.

Add `WIKI/AAPL` between the `""` below.

In [None]:
data = quandl.get("", rows=5)

To find out the type of data, we can type `print(type(data))`.

And if we want to look at the data itself, we can write `data` below.

To assess the health of each stock, let's find its `Close` price. In the output above, that's the 4th column.

Based on Quandl's [API documentation](https://docs.quandl.com/docs/time-series-2), we can extract just that column by adding `.4` after `WIKI/AAPL` to get `WIKI/AAPL.4`. Update the code below accordingly:

In [None]:
data = quandl.get("WIKI/AAPL", rows=5, collapse='monthly')
data

That's great! But now we want to show data for the last 10 years.

First, we need to have Python work out the date 10 years before today. We've written most of this for you, but you need to add a line to output the value of `start_date`.

In [None]:
import datetime
start_date = (datetime.datetime.now() - datetime.timedelta(days=10*365)).strftime('%Y-%m-01')
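To see what that line produces, here is a small sketch that uses the same arithmetic but takes the current time as an argument, so the result is repeatable. The function name `ten_years_back` and the sample dates are assumptions for illustration.

```python
import datetime

def ten_years_back(now):
    # Same arithmetic as the cell above, but with 'now' passed in
    # so the result does not depend on when the code runs.
    earlier = now - datetime.timedelta(days=10 * 365)
    # '%Y-%m-01' keeps the year and month and pins the day to the 1st.
    return earlier.strftime('%Y-%m-01')

print(ten_years_back(datetime.datetime(2020, 6, 15)))

# 10 * 365 days ignores leap days, so a date late in a month can
# land in the following month.
print(ten_years_back(datetime.datetime(2020, 6, 29)))
```

Because leap days are skipped, the start date can drift by a month near month boundaries. For a monthly chart that is usually harmless, but it is worth knowing.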


Now we need to add a new parameter to our Quandl get request. After `collapse='monthly'`, add a comma and then `start_date=start_date` inside the parentheses.

Then, to save space in the notebook, we're only going to show the top five rows of data using `data.head(5)`.

How could you show the top 10 rows instead?

In [None]:
data = quandl.get("WIKI/AAPL.4", rows=120, collapse='monthly')
data.head(5)

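Python has a number of great visualization tools available. The data we pulled is a pandas DataFrame, which can draw itself with `.plot()` using the matplotlib library.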

Let's do a quick visualization of the data to see if it looks right:

In [None]:
%matplotlib inline
ax = data.plot()

Uh-oh! Looks like there's a problem: There's a big drop in AAPL stock in 2014!

Why?!  Well, a quick Google shows they [split their stock](https://www.washingtonpost.com/news/the-switch/wp/2014/06/09/apples-stock-price-just-dropped-more-than-500-a-share-but-dont-panic/). 

Luckily, Quandl has accounted for that. Instead of `Close`, we'll need to use the `Adjusted Close` price (column 11) from Quandl.
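To see why a split produces a cliff in the raw `Close` series, here is a simplified sketch with made-up prices and a 7-for-1 split. The function `adjust_for_split` is hypothetical and is not how Quandl computes its adjusted columns; real adjusted prices typically account for dividends as well.

```python
def adjust_for_split(prices, split_index, ratio):
    # Divide every price before the split by the split ratio so the
    # series is on the same per-share scale before and after.
    return [
        price / ratio if i < split_index else price
        for i, price in enumerate(prices)
    ]

# Made-up closing prices: the split takes effect at index 2.
raw_close = [630.0, 644.0, 93.0, 95.0]
adjusted = adjust_for_split(raw_close, split_index=2, ratio=7)
print(adjusted)
```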

Modify the code below to get the 11th column instead of the 4th.

In [None]:
data = quandl.get("WIKI/AAPL.4", rows=120, collapse='monthly', start_date=start_date)
ax = data.plot()

Great! Let's get it ready for D3.js.

## 2. Scrubbing Data

Based on the [D3.js show reel](https://bl.ocks.org/mbostock/1256572), we need the data to be arranged like this:

```
symbol,date,price
MSFT,Jan 2000,39.81
MSFT,Feb 2000,36.35
MSFT,Mar 2000,43.22
MSFT,Apr 2000,28.37
MSFT,May 2000,25.45
```

Let's choose a few symbols from the list of Warren's companies and pull the data we need.

First, we need to bring in the cleaned CSV file we exported from OpenRefine, and store it as a variable. Let's call it `buffet`.

In [None]:
buffet = ??IMPORT CSV FILE??
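The placeholder above depends on the exported file, which is not shown here. As one possible sketch, the standard-library `csv` module reads a CSV into a list of rows. The contents and column layout below are invented for illustration, and an in-memory string stands in for the real file.

```python
import csv
import io

# Invented stand-in for the exported file; the real columns may differ.
fake_file = io.StringIO(
    "rank,company,shares,ticker\n"
    "1,Example One,100,AAA\n"
    "2,Example Two,200,BBB\n"
)

# Each row becomes a list of strings; row 0 is the header.
rows = list(csv.reader(fake_file))

# rows[2][3] means: third row (counting the header), fourth column.
print(rows[2][3])
```

With a real file, an `open(...)` call on the exported CSV would replace the `io.StringIO(...)` stand-in.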

Next we want to choose some of those stocks for our analysis. We could either type out the ticker symbol (e.g. `MSFT`) by hand, or we could refer to the `buffet` variable in order to select the ticker symbols.

To bring in a few ticker symbols, substitute a lookup into `buffet` (for example `buffet[2][3]`, though the exact indexes depend on how the CSV was loaded) between each pair of `+` signs below in the definition of the `symbols` list.

In [None]:
symbols = ['WIKI/'++'.11', 'WIKI/'++'.11', 'WIKI/'++'.11', 'WIKI/'++'.11']
data = quandl.get(symbols, rows=120, collapse='monthly', start_date=start_date)
data.head(5)
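The pattern behind the `symbols` list can be sketched on its own, without calling Quandl. The tickers here are typed by hand as an assumption; in the notebook they could come from `buffet` instead.

```python
# Hand-typed tickers; in the notebook these could come from 'buffet'.
tickers = ['AAPL', 'MSFT']

# 'WIKI/' is the dataset prefix and '.11' selects the 11th column.
symbols = ['WIKI/' + ticker + '.11' for ticker in tickers]
print(symbols)
```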

Let's clean up the column names so each is just its symbol, by removing `WIKI/` and ` - Adj. Close` from each column name.

We can iterate over the columns using the `for col in data.columns` syntax:

In [None]:
data.columns = [col.replace(' - Adj. Close','').replace('WIKI/','') for col in data.columns]
data.head(5)
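The same clean-up can be tried on a plain list of strings to see exactly what each `replace` does. The column names below are assumed to follow the `WIKI/<symbol> - Adj. Close` shape that the cell above expects.

```python
# Hypothetical column names in the shape the cell above expects.
columns = ['WIKI/AAPL - Adj. Close', 'WIKI/MSFT - Adj. Close']

# Strip the suffix first, then the prefix, leaving only the ticker.
cleaned = [
    col.replace(' - Adj. Close', '').replace('WIKI/', '')
    for col in columns
]
print(cleaned)
```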

Let's have another quick look at the data.

First, add the code to show the data plot below. (Hint: you used it above each time you wanted to show the graph.)

## 3. Analysis

What type of analysis would be most useful for Warren?

What if we could predict the future behavior of each stock based on its past behavior?  We might be able to project stock performance forward to determine which will do well, and which might not.

To do this, we're going to use a forecasting model called ARIMA (autoregressive integrated moving average), which is available in Python libraries such as `statsmodels`.

## 4. Preparing for Visualization

Next, we want to rearrange all of the data to be more like what we need for D3. Our pandas DataFrame has a method called `unstack()` that does just that!

Below, make a variable called `datalist` and set it equal to `data.unstack()`.

Then show the top 10 rows of the datalist. (Hint, you've used this type of function before)
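To picture the reshaping, here is a plain-Python sketch of going from one column per symbol (wide) to one row per symbol and date (long). It only mimics the idea with made-up numbers and a hypothetical `to_long` helper; it is not how pandas implements `unstack()`.

```python
# Wide layout: one list of prices per symbol, aligned with 'dates'.
dates = ['2000-01-31', '2000-02-29']
wide = {'AAPL': [1.0, 2.0], 'MSFT': [3.0, 4.0]}

def to_long(wide, dates):
    # Emit one (symbol, date, price) row per cell, symbol by symbol.
    # sorted() is used here only to give a predictable order.
    return [
        (symbol, date, price)
        for symbol in sorted(wide)
        for date, price in zip(dates, wide[symbol])
    ]

long_rows = to_long(wide, dates)
for row in long_rows:
    print(row)
```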

Almost there! We've done the hard work, now we just need it in a CSV format.

We've written this bit of code to clean the data a bit more and to output the data in a comma-separated value format.

In [None]:
# Write the data as CSV text; '%b %Y' formats dates like 'Jan 2000'.
csv = datalist.to_csv(header=True, index_label=['symbol','date','price'], date_format='%b %Y', index=True)
csv = csv.replace("price,0","price") # remove addition of ',0' on first line
print(csv)
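For comparison, the same target layout can be produced with only the standard library. This is a sketch under assumptions: the rows are made up (the prices are borrowed from the sample layout in section 2, the days of the month are invented), and month abbreviations from `%b` depend on the system locale.

```python
import csv
import datetime
import io

# Made-up long-format rows: (symbol, date, price).
rows = [
    ('MSFT', datetime.date(2000, 1, 31), 39.81),
    ('MSFT', datetime.date(2000, 2, 29), 36.35),
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(['symbol', 'date', 'price'])
for symbol, date, price in rows:
    # '%b %Y' renders dates like 'Jan 2000', matching the target layout.
    writer.writerow([symbol, date.strftime('%b %Y'), price])

csv_text = buffer.getvalue()
print(csv_text)
```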

Lastly, we want to download the file so we can bring it into our D3 visualization.

To make the downloadable file, we've got to bring in a library called `base64` which will encode the file. Then we use that to create the file and add a bit of HTML to make it so we can download the file.

In [None]:
import base64
from IPython.display import HTML

b64 = base64.b64encode(csv.encode())
payload = b64.decode()
html = '<a download="{filename}" href="data:text/csv;base64,{payload}" target="_blank">{filename}</a>'
html = html.format(payload=payload,title="stocks.csv",filename="stocks.csv")
HTML(html)
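The encoding step can be checked outside the notebook. The sketch below builds the same kind of link as a plain string and then decodes the payload to confirm that nothing was lost. The helper name `make_download_link` is hypothetical, and it leaves out the `IPython.display.HTML` wrapper used above.

```python
import base64

def make_download_link(text, filename):
    # Base64 turns the CSV bytes into plain ASCII that is safe in a URL.
    payload = base64.b64encode(text.encode('utf-8')).decode('ascii')
    template = (
        '<a download="{filename}" '
        'href="data:text/csv;base64,{payload}">{filename}</a>'
    )
    return template.format(filename=filename, payload=payload)

link = make_download_link('symbol,date,price\n', 'stocks.csv')
print(link)

# Decoding the payload gives back the original text.
encoded = link.split('base64,')[1].split('"')[0]
decoded = base64.b64decode(encoded).decode('utf-8')
print(decoded == 'symbol,date,price\n')
```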
