### Lesson outline

In this lesson you will learn how to read data, select subsets of it and generate useful plots, using [pandas](http://pandas.pydata.org/) and [matplotlib](http://matplotlib.org/). The documentation links below are for your reference.

- #### Read stock data from CSV files:
  - [pandas.DataFrame](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html)
  - [pandas.read_csv](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html)
- #### Select desired rows and columns:
  - [Indexing and Slicing Data](http://pandas.pydata.org/pandas-docs/stable/indexing.html)
  - Gotchas: [Label-based slicing conventions](http://pandas.pydata.org/pandas-docs/stable/gotchas.html?#label-based-slicing-conventions)
- #### Visualize data by generating plots:
  - [Plotting](http://pandas.pydata.org/pandas-docs/stable/visualization.html)
  - [pandas.DataFrame.plot](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html)
  - [matplotlib.pyplot.plot](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.plot)


### Data in CSV files

Note: CSV stands for comma-separated values.

In [13]:
import pandas as pd

df_hcp = pd.read_csv('data/HCP.csv')
df_hcp.head()

Unnamed: 0,Date,Open,High,Low,Close,Volume,Adj Close
0,2017-02-09,30.91,31.059999,30.620001,30.68,4677000,30.68
1,2017-02-08,30.860001,31.129999,30.77,30.91,2221900,30.91
2,2017-02-07,31.0,31.09,30.68,30.75,3108300,30.75
3,2017-02-06,30.75,30.93,30.540001,30.92,3385300,30.92
4,2017-02-03,30.799999,31.0,30.58,30.75,3835900,30.75

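As a small variation on the cell above, the sketch below reads the same kind of data but uses the `Date` column as the index, so rows can later be selected and plotted by date. The five rows are copied from the HCP output above and inlined with `io.StringIO` so the example runs without `data/HCP.csv`; the helper name `load_prices` is hypothetical and not part of the lesson.

```python
import io

import pandas as pd

# Five rows copied from the HCP output above, inlined so that the
# sketch does not depend on data/HCP.csv being present.
CSV_TEXT = """Date,Open,High,Low,Close,Volume,Adj Close
2017-02-09,30.91,31.059999,30.620001,30.68,4677000,30.68
2017-02-08,30.860001,31.129999,30.77,30.91,2221900,30.91
2017-02-07,31.0,31.09,30.68,30.75,3108300,30.75
2017-02-06,30.75,30.93,30.540001,30.92,3385300,30.92
2017-02-03,30.799999,31.0,30.58,30.75,3835900,30.75
"""


def load_prices(source):
    """Hypothetical helper: read price data with Date as a DatetimeIndex."""
    return pd.read_csv(source, index_col='Date', parse_dates=True)


df_sample = load_prices(io.StringIO(CSV_TEXT))
print(df_sample.head())
print(type(df_sample.index).__name__)
```

With a real file, the same call would be `load_prices('data/HCP.csv')`, assuming the file has a `Date` column as shown in the output above.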

In [14]:
df_hcp[10:21]  # Rows at positions 10 through 20; the slice end (21) is excluded.

Unnamed: 0,Date,Open,High,Low,Close,Volume,Adj Close
10,2017-01-26,29.940001,30.07,29.530001,29.6,5296900,29.6
11,2017-01-25,30.5,30.610001,29.68,29.879999,6100600,29.879999
12,2017-01-24,30.34,30.48,30.030001,30.24,3076700,30.24
13,2017-01-23,30.110001,30.370001,29.959999,30.33,2420000,30.33
14,2017-01-20,29.950001,30.15,29.870001,30.07,2977300,30.07
15,2017-01-19,30.41,30.41,29.75,29.959999,4502400,29.959999
16,2017-01-18,30.549999,30.92,30.530001,30.549999,2997800,30.549999
17,2017-01-17,30.33,30.73,30.33,30.68,2826100,30.68
18,2017-01-13,30.34,30.51,29.99,30.219999,2643700,30.219999
19,2017-01-12,30.4,30.5,29.99,30.41,2904200,30.41

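The slicing conventions mentioned in the "Gotchas" link are easy to mix up, so here is a minimal sketch on a made-up 30-row frame (`df_demo` is hypothetical, not one of the lesson's data files). It compares plain `[]` slicing and `.iloc`, which are positional and exclude the end, with `.loc`, which is label-based and includes the end.

```python
import pandas as pd

# Hypothetical demo frame with the default integer index 0..29.
df_demo = pd.DataFrame({'Close': [float(i) for i in range(30)]})

by_position = df_demo[10:21]     # positional slice: rows 10..20, end 21 excluded
by_iloc = df_demo.iloc[10:21]    # same positional rule, stated explicitly
by_label = df_demo.loc[10:20]    # label-based slice: end label 20 is included

print(len(by_position), len(by_iloc), len(by_label))  # prints: 11 11 11
print(by_position.index[0], by_position.index[-1])    # prints: 10 20
```

On a default integer index the positions and labels coincide, which is why all three selections return the same 11 rows here. Once the index holds dates or other labels, only `.loc` slices by those labels.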

In [61]:
def get_max_close(symbol):
    """Return the maximum closing value for stock indicated by symbol.
    Note: Data for a stock is stored in file: data/<symbol>.csv
    """
    df = pd.read_csv('data/{}.csv'.format(symbol))  # Read in data
    return df['Close'].max()  # Compute and return the maximum

def test_run():
    """Function called by Test Run."""
    for symbol in ['AAPL','IBM','HCP']:
        print('The max close for %s is %.2f.' % (symbol, get_max_close(symbol)))

        
if __name__ == '__main__':
    test_run()

The max close for AAPL is 702.10.
The max close for IBM is 215.80.
The max close for HCP is 55.28.

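A natural follow-up to `get_max_close` is asking when the maximum occurred. The sketch below is one possible way to do that with `idxmax`, which returns the index label of the largest value. The function name `get_max_close_with_date` is hypothetical, and the inlined rows are the five HCP rows shown earlier rather than a full data file.

```python
import io

import pandas as pd

# Five rows copied from the HCP output earlier in this lesson.
CSV_TEXT = """Date,Open,High,Low,Close,Volume,Adj Close
2017-02-09,30.91,31.059999,30.620001,30.68,4677000,30.68
2017-02-08,30.860001,31.129999,30.77,30.91,2221900,30.91
2017-02-07,31.0,31.09,30.68,30.75,3108300,30.75
2017-02-06,30.75,30.93,30.540001,30.92,3385300,30.92
2017-02-03,30.799999,31.0,30.58,30.75,3835900,30.75
"""


def get_max_close_with_date(source):
    """Hypothetical helper: return (date, close) for the highest close."""
    df = pd.read_csv(source)
    row = df.loc[df['Close'].idxmax()]  # row whose Close is the maximum
    return row['Date'], float(row['Close'])


date, close = get_max_close_with_date(io.StringIO(CSV_TEXT))
print('Max close %.2f on %s' % (close, date))  # prints: Max close 30.92 on 2017-02-06
```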

---
## Quiz: Compute mean volume


In [55]:
def get_mean_volume(symbol):
    """Return the mean volume for stock indicated by symbol.
    
    Note: Data for a stock is stored in file: data/<symbol>.csv
    """
    df = pd.read_csv("data/{}.csv".format(symbol))  # read in data
    # TODO: Compute and return the mean volume for this stock
    return df['Volume'].mean() #Compute and return the mean value.

def test_run():
    """Function called by Test Run."""
    for symbol in ['AAPL','IBM']:
        print("Mean Volume")
        print(symbol, get_mean_volume(symbol))


if __name__ == "__main__":
    test_run()

Mean Volume
AAPL 91824774.7589
Mean Volume
IBM 4722494.66818

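To make the mean concrete, the sketch below checks `Series.mean()` against a manual sum-over-count on the five HCP volumes shown earlier in this lesson. The variable names are illustrative only.

```python
import pandas as pd

# Volume values copied from the first five HCP rows shown earlier.
volume = pd.Series([4677000, 2221900, 3108300, 3385300, 3835900], name='Volume')

mean_builtin = volume.mean()
# count() ignores missing values, matching mean()'s default skipna=True.
mean_manual = volume.sum() / volume.count()

print(mean_builtin, mean_manual)  # prints: 3445680.0 3445680.0
```

Dividing by `count()` rather than `len()` matters only when the column contains missing values, which `mean()` skips by default.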

---

In [56]:
import matplotlib.pyplot as plt
%matplotlib nbagg

def test_run():
    df = pd.read_csv('data/AAPL.csv')
    print(df['Adj Close'].head())
    plt.figure()
    df['Adj Close'].plot()
    plt.show()  # Must be called to show plots
    
if __name__ == "__main__":
    test_run()

0    132.419998
1    131.469994
2    130.962201
3    129.727549
4    128.522781
Name: Adj Close, dtype: float64

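The plot above uses the defaults. As a rough sketch of how a title and axis labels could be added, the example below plots the five `Adj Close` values printed above onto an explicit axes object. The `Agg` backend line is an assumption made so the code also runs as a plain script without a display; it is not needed inside the notebook.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend for running outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# The five AAPL 'Adj Close' values printed in the output above.
adj_close = pd.Series(
    [132.419998, 131.469994, 130.962201, 129.727549, 128.522781],
    name='Adj Close',
)

fig, ax = plt.subplots()
adj_close.plot(ax=ax, title='AAPL Adj Close (first five rows)')
ax.set_xlabel('Row index')
ax.set_ylabel('Adj Close price')
```

In a notebook the figure appears once the cell finishes; in a script, `plt.show()` with an interactive backend or `fig.savefig(...)` would be needed to see it.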


--- 
## Quiz: Plot High prices for IBM

In [57]:
import pandas as pd
import matplotlib.pyplot as plt

def test_run():
    df = pd.read_csv("data/IBM.csv")
    # TODO: Your code here
    
    ax = df['High'].plot()  # plot the 'High' column against the row index
    ax.set_xlabel('Index')
    ax.set_ylabel('High Price')
    plt.show()  # must be called to show plots


if __name__ == "__main__":
    test_run()



---

In [58]:

def test_run():
    df = pd.read_csv("data/AAPL.csv")
    
    #plt.xlabel('Index')
    #plt.ylabel('High Price')
    df[['Close', 'Adj Close']].plot()
    plt.show()  # must be called to show plots


if __name__ == "__main__":
    test_run()

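The rows in these files appear newest-first (see the HCP output earlier), so plotting against the row index draws time from right to left. The sketch below is one way to plot two columns against dates instead: it indexes by `Date`, sorts chronologically, and then plots. It reuses the five HCP rows from earlier, and the `Agg` backend line is an assumption for running outside a notebook.

```python
import io

import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend for running outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Five rows copied from the HCP output earlier in this lesson (newest first).
CSV_TEXT = """Date,Open,High,Low,Close,Volume,Adj Close
2017-02-09,30.91,31.059999,30.620001,30.68,4677000,30.68
2017-02-08,30.860001,31.129999,30.77,30.91,2221900,30.91
2017-02-07,31.0,31.09,30.68,30.75,3108300,30.75
2017-02-06,30.75,30.93,30.540001,30.92,3385300,30.92
2017-02-03,30.799999,31.0,30.58,30.75,3835900,30.75
"""

df_dates = pd.read_csv(io.StringIO(CSV_TEXT), index_col='Date', parse_dates=True)
df_dates = df_dates.sort_index()  # oldest row first

# Selecting a list of columns gives one line per column, with a legend.
ax = df_dates[['Close', 'Adj Close']].plot(title='HCP: Close vs. Adj Close')
ax.set_xlabel('Date')
ax.set_ylabel('Price')
```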