### What does pandas consist of?
* Data structures (DS): Series (homogeneous), DataFrame (heterogeneous columns), Panel (deprecated in pandas 0.20 and removed in 0.25)
* Index objects: Hierarchical axis indexing
* **Group by engine**: aggregating & transforming
* Date tools: range, offset, frequencies
* I/O tools: CSV, XLS, HDF5/PyTables, SQL DB
* Memory-efficient DS for sparse data
* Sliding-window tools: rolling mean, rolling standard deviation, regression, and other statistics

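A minimal sketch of the date and sliding-window tools listed above. The dates, values, and window size are made up for illustration, and `Series.rolling` is the method-based API of newer pandas releases; older versions exposed separate rolling functions instead.

```python
import pandas as pd

# Date tools: a fixed-frequency DatetimeIndex of 6 consecutive days
idx = pd.date_range("2013-01-01", periods=6, freq="D")
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

# Sliding window: statistics over a 3-observation window.
# The first two entries are NaN because the window is not yet full.
rolling_mean = s.rolling(window=3).mean()
rolling_std = s.rolling(window=3).std()

# Offsets: shift every timestamp in the index by one day
shifted = idx + pd.offsets.Day(1)

print(rolling_mean)
print(rolling_std)
print(shifted)
```
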
### Misc.
* Series, DataFrame, Panel: a container hierarchy (a DataFrame is a container for Series, a Series for scalars) with dict-like operations
* NumPy axes vs. pandas's design: the axes are named, the **index (the rows)** and the **columns**
* **Size-mutable**: columns can be inserted into and deleted from a DataFrame
* In short:
  - pandas does not implement significant modeling functionality beyond linear and panel regression; for that, look to statsmodels and scikit-learn.

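The dict-like operations and size mutability noted above can be sketched roughly as follows. The column names and values are hypothetical, chosen only to make the effect of each call visible.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# Dict-like: a DataFrame behaves like a dict of Series keyed by column label
col = df["a"]                    # selecting a key yields a Series
has_b = "b" in df                # membership tests the column labels

# Size-mutable: columns can be inserted and deleted in place
df["c"] = df["a"] * 2            # insert by assignment
df.insert(0, "z", [0, 0, 0])     # insert at a given position
del df["b"]                      # delete, dict-style
popped = df.pop("z")             # delete and return the column

# Named axes: axis=0 runs along the index (rows), axis=1 along the columns
col_sums = df.sum(axis=0)        # one value per column
row_sums = df.sum(axis=1)        # one value per row

print(df)
```
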
### Misc.Misc.
* pandas development began at **AQR Capital Management** in April 2008
* 3-clause ("Simplified" or "New") BSD license

### 10-Min to pandas

* Object Creation
  - Intro to DS in pandas
  - IPython, tab completion for column names, and public attributes
  
* Viewing Data
  - df.head(), df.tail()
  - df.index, df.columns, df.values, df.T
  - df.sort_index(axis=...), df.sort_values(by=...)
  
* Selection
  - optimized access methods: .at, .iat, .loc, .iloc (and .ix, which was deprecated in pandas 0.20 and removed in 1.0)
  - Selecting via df[a:b], which slices the rows.
  - Selecting a single column, which yields a Series
  - Selection by Label
      - df.loc[]
      - df.loc[:,['A','B']]: multi-axis label selecting
      - **both endpoints are included**
  - Selection by Position
      - .iloc[]
      - **right endpoint excluded**
  - Boolean Indexing
      - where operation
      - isin() method filtering
* Setting
  - by label, by position
  - by *`where`* operation
  - by assigning a NumPy array
* Missing Data
  - df.reindex: returns a **copy** of the data
  - df1.dropna(how='any')
  - df1.fillna(value=5)
  - get the boolean mask where values are nan: pd.isnull(df1)
* Operations
  - Stats
  - Apply
  - Histogramming and Discretization
  - String Methods: obj_series.str.<string_method>()
      - pattern matching uses regular expressions by default (and in some methods *always*)
      - [Vectorized String Methods](http://pandas.pydata.org/pandas-docs/stable/text.html#text-string-methods)
* Merge
  - Concat: pd.concat([...])
  - Join: SQL style merges. 
    ```python
        left = pd.DataFrame({'key': ['foo', 'bar'], 'lval': [1, 2]})
        right = pd.DataFrame({'key': ['foo', 'bar'], 'rval': [4, 5]})
        pd.merge(left, right, on='key')
    ```
    
  - Append
    ```python
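        df = pd.DataFrame(np.random.randn(8, 4), columns=['A','B','C','D'])
        s = df.iloc[3]
        # DataFrame.append was removed in pandas 2.0; on older versions
        # this was df.append(s, ignore_index=True)
        pd.concat([df, s.to_frame().T], ignore_index=True)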
    ```
* Grouping
  - **Splitting** the data into groups based on some criteria
  - **Applying** a function to each group independently
  - **Combining** the results into a data structure
  - A group-by operation involves one or more of the steps above
  - e.g. df.groupby('A').sum(), df.groupby(['A', 'B']).sum()
* Reshaping
  - stack, unstack, and unstack by level
  - pivot tables (built on stack/unstack under the hood)
* Time Series
* Categoricals: see the [categoricals section](http://pandas.pydata.org/pandas-docs/stable/10min.html#categoricals)
* Plotting
* Getting Data In/Out

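To make the Selection notes above concrete, here is a small sketch contrasting label-based and position-based slicing. The frame, its dates, and the filter values are invented for the example.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

# Selection by label: both endpoints of the slice are included
by_label = df.loc["2013-01-02":"2013-01-04", ["A", "B"]]   # 3 rows

# Selection by position: the right endpoint is excluded, as in Python slicing
by_position = df.iloc[1:3, 0:2]                             # 2 rows

# Fast scalar access by label and by position
v_label = df.at[dates[0], "A"]
v_pos = df.iat[0, 0]

# Boolean indexing: keep the rows where a condition holds
big = df[df["A"] > 8]

# isin(): filter on membership in a list of values
picked = df[df["B"].isin([1, 9])]

print(by_label)
print(by_position)
```
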

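The split-apply-combine steps under Grouping, and the stack/unstack and pivot-table items under Reshaping, might look like this on a toy frame. The keys `foo`/`bar` and `one`/`two` and the values in `C` are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "bar", "foo", "bar"],
                   "B": ["one", "one", "two", "two"],
                   "C": [1, 2, 3, 4]})

# Split on the key(s), apply sum to each group, combine into one result
by_a = df.groupby("A")["C"].sum()
by_ab = df.groupby(["A", "B"])["C"].sum()   # indexed by the pair (A, B)

# unstack moves the index level "B" into the columns
wide = by_ab.unstack("B")

# stack is the inverse: the column labels go back into the index
tall = wide.stack()

# A pivot table expresses the same group-and-reshape in one call
pivoted = pd.pivot_table(df, values="C", index="A", columns="B", aggfunc="sum")

print(by_a)
print(wide)
```
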
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'AAA': [4, 5, 6, 7], 'BBB': [10, 20, 30, 40], 'CCC': [100, 50, -30, -50]})
print(df)
```

```
   AAA  BBB  CCC
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50
```

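As a rough illustration of the Setting and Missing Data items from the 10-minute outline, the sketch below re-creates the small AAA/BBB/CCC frame so that it runs on its own. The new column name `DDD`, the assigned values, and the extra row label `4` are assumptions for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'AAA': [4, 5, 6, 7], 'BBB': [10, 20, 30, 40], 'CCC': [100, 50, -30, -50]})

# Setting
df.at[0, 'AAA'] = 0              # by label
df.iat[1, 1] = -1                # by position (row 1, column 'BBB')
df['DDD'] = np.arange(len(df))   # by assigning a NumPy array
df[df < 0] = 0                   # by a where-style boolean mask

# Missing data
df1 = df.reindex(index=[0, 1, 2, 3, 4])   # returns a copy; the new row 4 is all NaN
mask = pd.isna(df1)                        # boolean mask of missing values
dropped = df1.dropna(how='any')            # drop rows containing any NaN
filled = df1.fillna(value=5)               # replace NaN with 5

print(df)
print(df1)
```

Because `reindex` returns a new object, `df` still has four rows afterwards; only `df1` carries the extra all-NaN row.
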