### Lesson outline

In this lesson you will learn how to read data, select subsets of it, and generate useful plots using [pandas](http://pandas.pydata.org/) and [matplotlib](http://matplotlib.org/). The documentation links below are for reference.

- #### Read stock data from CSV files:
  - [pandas.DataFrame](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html)
  - [pandas.read_csv](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html)
- #### Select desired rows and columns:
  - [Indexing and Slicing Data](http://pandas.pydata.org/pandas-docs/stable/indexing.html)
  - Gotchas: [Label-based slicing conventions](http://pandas.pydata.org/pandas-docs/stable/gotchas.html?#label-based-slicing-conventions)
- #### Visualize data by generating plots:
  - [Plotting](http://pandas.pydata.org/pandas-docs/stable/visualization.html)
  - [pandas.DataFrame.plot](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html)
  - [matplotlib.pyplot.plot](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.plot)


### Data in CSV files

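Note: CSV stands for Comma-Separated Values. Each line of a CSV file holds one record, with its fields separated by commas; the first line typically lists the column names.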
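To see what `pd.read_csv` does with such a file, here is a minimal sketch that parses an in-memory string instead of a file on disk. The two data rows are copied from the HCP output shown later in this lesson; using `io.StringIO` in place of `data/HCP.csv` is an assumption made only so the example runs without the data folder.

```python
import io

import pandas as pd

# Header line plus two records, copied from the HCP output later in this lesson.
csv_text = """Date,Open,High,Low,Close,Volume,Adj Close
2017-02-09,30.910000,31.059999,30.620001,30.68,4677000,30.68
2017-02-08,30.860001,31.129999,30.770000,30.91,2221900,30.91
"""

# read_csv accepts any file-like object, so StringIO can stand in for a path.
df = pd.read_csv(io.StringIO(csv_text))

print(df.columns.tolist())  # column names come from the header line
print(df.shape)             # (rows, columns): the header is not counted as a row
print(df.dtypes)            # numeric fields are parsed as numbers; Date stays text
```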

---
## Quiz: Read CSV

```python
import pandas as pd


def test_run():
    """Function called by Test Run."""
    df = pd.read_csv("data/AAPL.csv")
    # TODO: Print last 5 rows of the data frame
    print(df.tail())


if __name__ == "__main__":
    test_run()
```


```
            Date        Open        High         Low       Close     Volume  \
1784  2010-01-08  210.299994  212.000006  209.060005  211.980005  111902700
1785  2010-01-07  211.750000  212.000006  209.050005  210.580000  119282800
1786  2010-01-06  214.379993  215.230000  210.750004  210.969995  138040000
1787  2010-01-05  214.599998  215.589994  213.249994  214.379993  150476200
1788  2010-01-04  213.429998  214.499996  212.380001  214.009998  123432400

      Adj Close
1784  27.464034
1785  27.282650
1786  27.333178
1787  27.774976
1788  27.727039
```
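The output above ends at 2010-01-04 (row 1788), so this file stores the newest trading day first and `tail()` returns the oldest rows. Both `head()` and `tail()` return five rows by default and accept an explicit count. A small sketch, using five HCP rows copied from this lesson's output as an assumed in-memory stand-in for a data file:

```python
import io

import pandas as pd

# Five HCP rows copied from this lesson's output; newest date first, like the data files.
csv_text = """Date,Open,High,Low,Close,Volume,Adj Close
2017-02-09,30.910000,31.059999,30.620001,30.68,4677000,30.68
2017-02-08,30.860001,31.129999,30.770000,30.91,2221900,30.91
2017-02-07,31.000000,31.090000,30.680000,30.75,3108300,30.75
2017-02-06,30.750000,30.930000,30.540001,30.92,3385300,30.92
2017-02-03,30.799999,31.000000,30.580000,30.75,3835900,30.75
"""
df = pd.read_csv(io.StringIO(csv_text))

last_two = df.tail(2)   # last 2 rows: the two oldest dates in this file
first_two = df.head(2)  # first 2 rows: the two newest dates
print(last_two)
print(first_two)
```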


---

```python
'''Read CSV.'''
import pandas as pd

def test_run():
    df = pd.read_csv('data/HCP.csv')
    print(df.head())  # first 5 rows

if __name__ == '__main__':
    test_run()
```

```
         Date       Open       High        Low  Close   Volume  Adj Close
0  2017-02-09  30.910000  31.059999  30.620001  30.68  4677000      30.68
1  2017-02-08  30.860001  31.129999  30.770000  30.91  2221900      30.91
2  2017-02-07  31.000000  31.090000  30.680000  30.75  3108300      30.75
3  2017-02-06  30.750000  30.930000  30.540001  30.92  3385300      30.92
4  2017-02-03  30.799999  31.000000  30.580000  30.75  3835900      30.75
```


```python
'''Select rows.'''

import pandas as pd

def test_run():
    df = pd.read_csv('data/HCP.csv')
    print(df[10:21])  # rows at positions 10 through 20; the stop value 21 is excluded

if __name__ == '__main__':
    test_run()
```

```
          Date       Open       High        Low      Close   Volume  Adj Close
10  2017-01-26  29.940001  30.070000  29.530001  29.600000  5296900  29.600000
11  2017-01-25  30.500000  30.610001  29.680000  29.879999  6100600  29.879999
12  2017-01-24  30.340000  30.480000  30.030001  30.240000  3076700  30.240000
13  2017-01-23  30.110001  30.370001  29.959999  30.330000  2420000  30.330000
14  2017-01-20  29.950001  30.150000  29.870001  30.070000  2977300  30.070000
15  2017-01-19  30.410000  30.410000  29.750000  29.959999  4502400  29.959999
16  2017-01-18  30.549999  30.920000  30.530001  30.549999  2997800  30.549999
17  2017-01-17  30.330000  30.730000  30.330000  30.680000  2826100  30.680000
18  2017-01-13  30.340000  30.510000  29.990000  30.219999  2643700  30.219999
19  2017-01-12  30.400000  30.500000  29.990000  30.410000  2904200  30.410000
20  2017-01-11  30.100000  30.490000  29.969999  30.400000  5589600  30.400000
```
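The slice `df[10:21]` is positional and, like a Python list slice, excludes its stop value, which is why it returns rows 10 through 20. The label-based slicing gotcha linked in the outline is that `.loc` slices include both ends. A minimal sketch on a synthetic frame (the `Close` values are made up purely for illustration):

```python
import pandas as pd

# Synthetic 30-row frame with the default integer index 0..29 (illustrative values only).
df = pd.DataFrame({'Close': [float(i) for i in range(30)]})

by_position = df.iloc[10:21]  # positions 10..20; the stop position 21 is excluded
by_label = df.loc[10:20]      # labels 10..20; label-based slices include the stop label
plain = df[10:21]             # a plain integer slice on rows behaves like iloc here

print(len(by_position), len(by_label), len(plain))
```

All three select the same 11 rows, but note that the `.loc` version needs a stop of 20, not 21.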


```python
'''Compute max closing price.'''

import pandas as pd

def get_max_close(symbol):
    """Return the maximum closing value for stock indicated by symbol.
    Note: Data for a stock is stored in file: data/<symbol>.csv
    """
    df = pd.read_csv('data/{}.csv'.format(symbol))  # read in all data
    return df['Close'].max()  # compute and return the max

def test_run():
    """Function called by Test Run."""
    for symbol in ['AAPL', 'IBM', 'HCP']:
        print('The max close for %s is %.2f.' % (symbol, get_max_close(symbol)))


if __name__ == '__main__':
    test_run()
```

```
The max close for AAPL is 702.10.
The max close for IBM is 215.80.
The max close for HCP is 55.28.
```
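If you also want the date on which the maximum close occurred, `Series.idxmax()` returns the index label of the largest value, which `.loc` can then use to look up that row. A minimal sketch on the five HCP rows shown earlier, held in memory as an assumed stand-in for `data/<symbol>.csv`; the helper name `get_max_close_date` is hypothetical:

```python
import io

import pandas as pd


def get_max_close_date(df):
    """Hypothetical helper: return (date, close) for the row with the highest Close."""
    label = df['Close'].idxmax()  # index label of the largest Close (first one on ties)
    return df.loc[label, 'Date'], df.loc[label, 'Close']


# Five HCP rows copied from this lesson's output.
csv_text = """Date,Open,High,Low,Close,Volume,Adj Close
2017-02-09,30.910000,31.059999,30.620001,30.68,4677000,30.68
2017-02-08,30.860001,31.129999,30.770000,30.91,2221900,30.91
2017-02-07,31.000000,31.090000,30.680000,30.75,3108300,30.75
2017-02-06,30.750000,30.930000,30.540001,30.92,3385300,30.92
2017-02-03,30.799999,31.000000,30.580000,30.75,3835900,30.75
"""
df = pd.read_csv(io.StringIO(csv_text))

date, close = get_max_close_date(df)
print('Max close %.2f on %s' % (close, date))  # prints "Max close 30.92 on 2017-02-06"
```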


---
## Quiz: Compute mean volume


```python
import pandas as pd

def get_mean_volume(symbol):
    """Return the mean volume for stock indicated by symbol.

    Note: Data for a stock is stored in file: data/<symbol>.csv
    """
    df = pd.read_csv("data/{}.csv".format(symbol))  # read in data
    # TODO: Compute and return the mean volume for this stock
    return df['Volume'].mean()  # compute and return the mean value

def test_run():
    """Function called by Test Run."""
    for symbol in ['AAPL', 'IBM']:
        print("Mean Volume")
        print(symbol, get_mean_volume(symbol))


if __name__ == "__main__":
    test_run()
```

```
Mean Volume
AAPL 91824774.7589
Mean Volume
IBM 4722494.66818
```
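Since `get_mean_volume` only needs one column, `read_csv` can be told to keep just that column through its `usecols` parameter, which can reduce memory use on large files. A sketch, again using the five HCP rows in memory as an assumed stand-in for `data/<symbol>.csv`:

```python
import io

import pandas as pd

# Five HCP rows copied from this lesson's output.
csv_text = """Date,Open,High,Low,Close,Volume,Adj Close
2017-02-09,30.910000,31.059999,30.620001,30.68,4677000,30.68
2017-02-08,30.860001,31.129999,30.770000,30.91,2221900,30.91
2017-02-07,31.000000,31.090000,30.680000,30.75,3108300,30.75
2017-02-06,30.750000,30.930000,30.540001,30.92,3385300,30.92
2017-02-03,30.799999,31.000000,30.580000,30.75,3835900,30.75
"""

# Only the Volume column is kept in the resulting DataFrame.
df_volume = pd.read_csv(io.StringIO(csv_text), usecols=['Volume'])
print(df_volume['Volume'].mean())  # prints 3445680.0
```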


---

```python
'''Plotting stock price data.'''

import pandas as pd
import matplotlib.pyplot as plt
# IPython magic: render interactive figures inside the notebook
%matplotlib nbagg

def test_run():
    df = pd.read_csv('data/AAPL.csv')
    print(df['Adj Close'].head())
    plt.figure()
    df['Adj Close'].plot()
    plt.show()  # must be called to show plots

if __name__ == "__main__":
    test_run()
```

```
0    132.419998
1    131.469994
2    130.962201
3    129.727549
4    128.522781
Name: Adj Close, dtype: float64
```


[Figure: line plot of AAPL Adj Close against row index]
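This plot uses the row number as its x-axis, and because the data files list the newest trading day first (row 0 is the latest date), time runs from right to left. One way to plot against actual dates, sketched here on in-memory HCP rows rather than `data/AAPL.csv`, is to load `Date` as the index with `index_col` and `parse_dates`, sort it, and then plot. The non-interactive `Agg` backend is an assumption so the sketch runs outside a notebook.

```python
import io

import matplotlib
matplotlib.use('Agg')  # assumption: render off-screen so no display is needed
import pandas as pd

# Five HCP rows copied from this lesson's output (newest first, as in the data files).
csv_text = """Date,Open,High,Low,Close,Volume,Adj Close
2017-02-09,30.910000,31.059999,30.620001,30.68,4677000,30.68
2017-02-08,30.860001,31.129999,30.770000,30.91,2221900,30.91
2017-02-07,31.000000,31.090000,30.680000,30.75,3108300,30.75
2017-02-06,30.750000,30.930000,30.540001,30.92,3385300,30.92
2017-02-03,30.799999,31.000000,30.580000,30.75,3835900,30.75
"""

# Parse Date into a DatetimeIndex, then sort so the oldest day comes first.
df = pd.read_csv(io.StringIO(csv_text), index_col='Date', parse_dates=True)
df = df.sort_index()

# The x-axis now shows dates in chronological order.
ax = df['Adj Close'].plot(title='HCP Adj Close')
ax.set_ylabel('Price')
# In a notebook, call plt.show() here to display the figure.
```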

--- 
## Quiz: Plot High prices for IBM

```python
import pandas as pd
import matplotlib.pyplot as plt
# IPython magic: render interactive figures inside the notebook
%matplotlib nbagg

def test_run():
    df = pd.read_csv("data/IBM.csv")
    # Plot the 'High' column, then label the axes it was drawn on
    ax = df['High'].plot()
    ax.set_xlabel('Index')
    ax.set_ylabel('High Price')
    plt.show()  # must be called to show plots


if __name__ == "__main__":
    test_run()
```


[Figure: line plot of IBM High prices against row index]
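`Series.plot` also accepts common figure options such as `title`, `grid`, and `figsize` as keyword arguments, and returns the matplotlib Axes it drew on, so labels can be set on that object directly. A sketch under two assumptions: five HCP rows from this lesson stand in for `data/IBM.csv`, and the off-screen `Agg` backend is used so it runs without a display.

```python
import io

import matplotlib
matplotlib.use('Agg')  # assumption: off-screen rendering, no display required
import pandas as pd

# Stand-in for data/IBM.csv: five HCP rows copied from this lesson's output.
csv_text = """Date,Open,High,Low,Close,Volume,Adj Close
2017-02-09,30.910000,31.059999,30.620001,30.68,4677000,30.68
2017-02-08,30.860001,31.129999,30.770000,30.91,2221900,30.91
2017-02-07,31.000000,31.090000,30.680000,30.75,3108300,30.75
2017-02-06,30.750000,30.930000,30.540001,30.92,3385300,30.92
2017-02-03,30.799999,31.000000,30.580000,30.75,3835900,30.75
"""
df = pd.read_csv(io.StringIO(csv_text))

# Figure options passed straight to plot(); the returned Axes carries the labels.
ax = df['High'].plot(title='High prices', grid=True, figsize=(8, 4))
ax.set_xlabel('Index')
ax.set_ylabel('High Price')
```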

---

```python
'''Plot two columns.'''

import pandas as pd
import matplotlib.pyplot as plt
# IPython magic: render interactive figures inside the notebook
%matplotlib nbagg

def test_run():
    df = pd.read_csv("data/AAPL.csv")
    # Plot both columns on the same axes; DataFrame.plot adds a legend by default
    ax = df[['Close', 'Adj Close']].plot()
    ax.set_xlabel('Index')
    ax.set_ylabel('Price')
    plt.show()  # must be called to show plots


if __name__ == "__main__":
    test_run()
```

[Figure: AAPL Close and Adj Close plotted together against row index]
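`Adj Close` is the closing price adjusted for corporate actions such as splits and dividends, which is why the two AAPL series sit far apart (211.98 versus 27.46 on 2010-01-08 in the first quiz's output), compressing both lines on a shared axis. Giving each column its own panel with `subplots=True` is one option. A sketch using the five AAPL rows from that output, held in memory as an assumed stand-in for `data/AAPL.csv`, with the off-screen `Agg` backend assumed:

```python
import io

import matplotlib
matplotlib.use('Agg')  # assumption: off-screen rendering, no display required
import pandas as pd

# AAPL rows (Date, Close, Adj Close) copied from the first quiz's output.
csv_text = """Date,Close,Adj Close
2010-01-08,211.980005,27.464034
2010-01-07,210.580000,27.282650
2010-01-06,210.969995,27.333178
2010-01-05,214.379993,27.774976
2010-01-04,214.009998,27.727039
"""
df = pd.read_csv(io.StringIO(csv_text), index_col='Date', parse_dates=True).sort_index()

# subplots=True draws one panel per column and returns an array of Axes.
axes = df[['Close', 'Adj Close']].plot(subplots=True, figsize=(8, 6))
for ax in axes:
    ax.set_ylabel('Price')
```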