# Review of pandas DataFrames

## pandas Common Commands

* type(df)
* df.shape
* df.columns
* df.iloc[:5,:]
    * This slice returns the same result as df.head()
        * It selects the first five rows (positions 0 to 4) and all columns
* df.info()
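
The commands above can be tried without a network connection. The block below is a minimal sketch on a small, made-up DataFrame (the dates, column values, and variable names are illustrative assumptions, not real market data) that checks that positional slicing with iloc matches head() and tail().

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for downloaded price data: 8 business days, 3 columns.
dates = pd.date_range("2010-01-04", periods=8, freq="B")
df = pd.DataFrame(
    {
        "High": np.linspace(30.0, 37.0, 8),
        "Low": np.linspace(29.0, 36.0, 8),
        "Volume": np.arange(8) * 1000.0,
    },
    index=dates,
)

print(type(df))          # the DataFrame class
print(df.shape)          # (rows, columns)
print(list(df.columns))  # column labels

# iloc slices by position; the stop position (5) is excluded.
first_five = df.iloc[:5, :]
same_as_head = first_five.equals(df.head())

# A negative start counts from the end, mirroring df.tail().
last_five = df.iloc[-5:, :]
same_as_tail = last_five.equals(df.tail())

print(same_as_head, same_as_tail)
```

Both head() and tail() default to five rows, which is why the slices use 5 and -5.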

In [1]:
import pandas as pd
from pandas_datareader import data as wb
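# Download Apple (AAPL) daily price data from Yahoo Finance, starting 2010-01-01
# Note: the 'yahoo' source depends on Yahoo's site and may fail in newer pandas-datareader releases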
AAPL = wb.DataReader('AAPL', data_source='yahoo', start='2010-1-1')
type(AAPL)

pandas.core.frame.DataFrame

In [2]:
AAPL.shape

(2453, 6)

In [3]:
AAPL.columns

Index(['High', 'Low', 'Open', 'Close', 'Volume', 'Adj Close'], dtype='object')

In [4]:
# Slice from the start of the DataFrame up to, but not including, position 5
# Positions start at 0, so this selects rows 0, 1, 2, 3, 4 and all columns
AAPL.iloc[:5, :]

| Date | High | Low | Open | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|
| 2010-01-04 | 30.642857 | 30.34 | 30.49 | 30.572857 | 123432400.0 | 26.68133 |
| 2010-01-05 | 30.798571 | 30.464285 | 30.657143 | 30.625713 | 150476200.0 | 26.727465 |
| 2010-01-06 | 30.747143 | 30.107143 | 30.625713 | 30.138571 | 138040000.0 | 26.30233 |
| 2010-01-07 | 30.285715 | 29.864286 | 30.25 | 30.082857 | 119282800.0 | 26.253704 |
| 2010-01-08 | 30.285715 | 29.865715 | 30.042856 | 30.282858 | 111902700.0 | 26.428249 |


In [5]:
AAPL.tail()

| Date | High | Low | Open | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|
| 2019-09-25 | 221.5 | 217.139999 | 218.550003 | 221.029999 | 21903400.0 | 221.029999 |
| 2019-09-26 | 220.940002 | 218.830002 | 220.0 | 219.889999 | 18833500.0 | 219.889999 |
| 2019-09-27 | 220.960007 | 217.279999 | 220.539993 | 218.820007 | 25352000.0 | 218.820007 |
| 2019-09-30 | 224.580002 | 220.789993 | 220.899994 | 223.970001 | 25977400.0 | 223.970001 |
| 2019-10-01 | 228.199997 | 224.419998 | 225.070007 | 225.570099 | 27613144.0 | 225.570099 |


In [6]:
AAPL.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2453 entries, 2010-01-04 to 2019-10-01
Data columns (total 6 columns):
High         2453 non-null float64
Low          2453 non-null float64
Open         2453 non-null float64
Close        2453 non-null float64
Volume       2453 non-null float64
Adj Close    2453 non-null float64
dtypes: float64(6)
memory usage: 134.1 KB


## Series

* Each column of a *DataFrame* is itself a specialized **pandas** structure called a **Series**
* Extracting a single column from a *DataFrame* returns a **Series**
    * Note: The extracted **Series** has its own head method, and its name attribute is set to the *DataFrame* column label
* To extract the numerical entries from the **Series**, use the values attribute
    * For numeric columns, the data in the **Series** is stored as a *NumPy* array, which is what the values attribute returns
* **Series** - 1D labeled array (backed by a *NumPy* array for numeric data)
* **DataFrame** - 2D labeled array whose columns are **Series**

In [7]:
low = AAPL['Low']
type(low)

pandas.core.series.Series
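
The Series properties listed above can be checked on a toy example. The following sketch uses a small, made-up DataFrame (the values and the names `prices`, `values`, and `arr` are illustrative assumptions) to show the name attribute, the head method, and the values attribute. It also calls to_numpy(), which the pandas documentation recommends over values for getting the underlying array.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame standing in for the AAPL data.
prices = pd.DataFrame(
    {"Low": [30.34, 30.46, 30.11], "High": [30.64, 30.80, 30.75]},
    index=pd.to_datetime(["2010-01-04", "2010-01-05", "2010-01-06"]),
)

# Selecting one column by label returns a Series.
low = prices["Low"]
print(type(low))
print(low.name)     # the column label the Series was taken from
print(low.head(2))  # a Series has its own head method

# The values attribute exposes the data as a NumPy array.
values = low.values
print(type(values))

# to_numpy() returns the same data for this numeric column.
arr = low.to_numpy()

# The Series keeps the row labels (index) of its parent DataFrame.
shares_index = low.index.equals(prices.index)
print(shares_index)
```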