# CHAPTER 5 
---
# Getting Started With pandas

In [6]:
import pandas as pd
from pandas import DataFrame, Series

In [7]:
import numpy as np
import matplotlib.pyplot as plt

## 5.1 Introduction to pandas Data Structures 

### Series
- Attributes
    - Series.values
    - Series.index
    - Series.name
    - Series.index.name
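
A minimal sketch of these attributes. The data and the names `population` and `state` are illustrative assumptions, not taken from the notes.

```python
import pandas as pd

# Illustrative data (assumed for this sketch)
obj = pd.Series([35000, 71000, 16000], index=['Ohio', 'Texas', 'Oregon'])

# name labels the values; index.name labels the index
obj.name = 'population'
obj.index.name = 'state'

values = obj.values        # underlying NumPy array
labels = list(obj.index)   # index labels as a plain list
print(values, labels, obj.name, obj.index.name)
```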

### DataFrame
- Attributes
    - DataFrame.T
- Possible data inputs to DataFrame constructor
    - 2D ndarray
    - dict of arrays, lists, or tuples
    - NumPy structured/record array
    - dict of Series
    - dict of dicts
    - List of dicts or Series
    - List of lists or tuples
    - Another DataFrame
    - NumPy MaskedArray
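
A few of these inputs can be sketched as follows. The column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# dict of lists: keys become columns
from_lists = pd.DataFrame({'year': [2000, 2001], 'pop': [1.5, 1.7]})

# dict of dicts: outer keys become columns, inner keys become the row index
from_dicts = pd.DataFrame({'Ohio': {2000: 1.5, 2001: 1.7},
                           'Nevada': {2001: 2.4}})

# 2D ndarray with explicit row and column labels
from_array = pd.DataFrame(np.arange(6).reshape((2, 3)),
                          index=['r0', 'r1'], columns=['a', 'b', 'c'])

# T swaps rows and columns
transposed = from_array.T
print(from_dicts)
print(transposed.shape)
```

In the dict-of-dicts case, an inner key that is missing for one column (here 2000 for `Nevada`) shows up as NaN.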

### Index Objects
- some Index methods and properties
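    - append | Concatenate with another Index object, producing a new Index
    - difference | Compute the set difference as an Index
    - intersection | Compute the set intersection
    - union | Compute the set union
    - isin | Compute a Boolean array indicating whether each value is contained in the passed collection
    - delete | Compute a new Index with the element at index i deleted
    - drop | Compute a new Index by deleting the passed values
    - insert | Compute a new Index by inserting an element at index i
    - is_monotonic | Return True if each element is greater than or equal to the previous element (in pandas 2.0 and later, use `is_monotonic_increasing`)
    - is_unique | Return True if the Index has no duplicate values
    - unique | Compute the array of unique values in the Index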
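
The set-like behavior of these methods can be sketched with two small, made-up Index objects. Each method returns a new object and leaves the original Index unchanged.

```python
import pandas as pd

left = pd.Index(['a', 'b', 'c'])
right = pd.Index(['b', 'c', 'd'])

both = left.intersection(right)      # labels present in both
either = left.union(right)           # labels present in at least one
only_left = left.difference(right)   # labels in left but not in right

mask = left.isin(['a', 'c'])         # Boolean array, one entry per label
inserted = left.insert(1, 'z')       # new Index with 'z' at position 1
dropped = left.drop(['a'])           # new Index without the passed values

dup = pd.Index(['x', 'x', 'y'])
print(dup.is_unique, list(dup.unique()))
```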

## 5.2 Essential Functionality

### Reindexing
- `reindex` function arguments
    - `index=`
    - `method=`
    - `fill_value=`
    - `limit=`
    - `tolerance=`
    - `level=`
    - `copy=`
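
A short sketch of how `method=`, `limit=`, and `fill_value=` interact, using small made-up Series:

```python
import pandas as pd

obj = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])

# method='ffill' fills each new label from the previous existing label
filled = obj.reindex(range(6), method='ffill')

# limit caps how many consecutive new labels are forward-filled
sparse = pd.Series([1.0, 2.0], index=[0, 3])
limited = sparse.reindex(range(6), method='ffill', limit=1)

# fill_value replaces the default NaN for labels that did not exist
nums = pd.Series([1.0, 2.0], index=['a', 'c'])
padded = nums.reindex(['a', 'b', 'c'], fill_value=0.0)

print(filled.tolist())
print(limited.tolist())
print(padded.tolist())
```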

### Dropping Entries from an Axis
- `drop` function arguments
    - `axis=`
    - `inplace=`
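
The difference between the default behavior and `inplace=True` can be sketched as follows; the small frame and its labels are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=['r0', 'r1'], columns=['a', 'b', 'c'])

# Default: return a new object and leave the original untouched
without_b = data.drop('b', axis='columns')

# inplace=True: modify the object itself and return None
scratch = data.copy()
result = scratch.drop('r0', axis='index', inplace=True)

print(list(without_b.columns), list(scratch.index), result)
```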

### Indexing, Selection, and Filtering
- Indexing options with DataFrame
    - `df[val]`
    - `df.loc[val]`
    - `df.loc[:, val]`
    - `df.iloc[where]`
    - `df.iloc[:, where]`
    - `df.iloc[where_i, where_j]`
    - `df.at[label_i, label_j]`
    - `df.iat[i, j]`
    - `reindex()`
    - `get_value(), set_value()` (removed in pandas 1.0; use `at` / `iat` instead)
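
Several of these options side by side, as a sketch on a small labeled frame:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

col = data['two']                   # df[val]: select a column
row = data.loc['Utah']              # df.loc[val]: select a row by label
row_pos = data.iloc[2]              # df.iloc[where]: select a row by position
block = data.iloc[:2, [0, 3]]       # df.iloc[where_i, where_j]
by_label = data.at['Utah', 'two']   # scalar by row and column label
by_pos = data.iat[2, 1]             # scalar by row and column position

print(by_label, by_pos)
```

`at` and `iat` only return single scalars, whereas `loc` and `iloc` can also return rows, columns, or sub-frames.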
    

### Integer Indexes
To keep things consistent, if you have an axis index containing integers, data selection
will always be label-oriented.  
For more precise handling, use `loc` (for labels) or `iloc` (for integer positions).
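
A sketch of the difference, using a Series whose integer labels are deliberately out of order (an illustrative choice) so that labels and positions disagree:

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(3.), index=[2, 1, 0])

by_label = ser.loc[0]    # label 0 is the last element
by_pos = ser.iloc[0]     # position 0 is the first element

# With an integer index, ser[-1] is looked up as a label and fails
try:
    ser[-1]
    lookup_failed = False
except KeyError:
    lookup_failed = True

print(by_label, by_pos, lookup_failed)
```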

### Arithmetic and Data Alignment
An important pandas feature for some applications is the behavior of arithmetic between objects with different indexes.  
When you add objects together, if any index pairs are not the same, the respective index in the result will be the union of the index pairs.  
For users with database experience, this is similar to an automatic outer join on the index labels.

The internal data alignment introduces missing values in the label locations that don’t overlap.   
Missing values will then propagate in further arithmetic computations.

In the case of DataFrame, alignment is performed on both the rows and the columns.
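
As a sketch of two-axis alignment, the frames below (made up for illustration) share only the row `Texas` and the column `b`, so that is the only cell with a value in the result:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(4.).reshape((2, 2)),
                   columns=list('bc'), index=['Ohio', 'Texas'])
df2 = pd.DataFrame(np.arange(4.).reshape((2, 2)),
                   columns=list('bd'), index=['Texas', 'Utah'])

# Rows and columns of the result are each the union of the inputs
total = df1 + df2
print(total)

# NaN propagates through later arithmetic
doubled = total * 2
```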

Each of the flexible arithmetic methods listed below has a counterpart, starting with the letter r, that has its arguments flipped.

- Flexible arithmetic methods
    - add, radd
    - sub, rsub
    - div, rdiv
    - floordiv, rfloordiv
    - mul, rmul
    - pow, rpow
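
A sketch of what "arguments flipped" means in practice, on a small illustrative frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1., 5.).reshape((2, 2)), columns=list('ab'))

# 1 / df and df.rdiv(1) compute the same thing: the reversed method
# places the caller on the right-hand side of the operator
recip_op = 1 / df
recip_method = df.rdiv(1)

# sub computes df - 10, rsub computes 10 - df
minus = df.sub(10)
rminus = df.rsub(10)

print(recip_method)
```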
    
By default, arithmetic between DataFrame and Series matches the index of the Series on the DataFrame’s columns, broadcasting down the rows.

If an index value is not found in either the DataFrame’s columns or the Series’s index, the objects will be reindexed to form the union.

### Function Application and Mapping
NumPy ufuncs (element-wise array methods) also work with pandas objects.  
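
A sketch with fixed, made-up values (rather than random data) so the results are easy to check by hand. It also shows `apply`, which calls a function once per column or per row, and `Series.map`, which works element-wise.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame([[1.0, -2.0], [-3.0, 4.0]],
                     index=['Utah', 'Ohio'], columns=['b', 'd'])

# ufunc applied element-wise; the labels are preserved
absolute = np.abs(frame)

# apply calls the function once per column by default ...
col_range = frame.apply(lambda x: x.max() - x.min())
# ... and once per row with axis='columns'
row_range = frame.apply(lambda x: x.max() - x.min(), axis='columns')

# map applies a function to each element of a Series
formatted = frame['d'].map(lambda v: f'{v:.1f}')

print(col_range.tolist(), row_range.tolist(), formatted.tolist())
```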

## Exercises

### Function Application and Mapping

In [29]:
frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])

In [27]:
np.abs(frame)

|        | b        | d        | e        |
|--------|----------|----------|----------|
| Utah   | 0.242653 | 0.752836 | 1.589859 |
| Ohio   | 1.106992 | 0.337747 | 0.066669 |
| Texas  | 0.166825 | 0.472245 | 0.044857 |
| Oregon | 0.529458 | 0.107138 | 1.203166 |


In [34]:
frame.apply(lambda x: x.max() - x.min(), axis='index')

b    0.514991
d    4.395140
e    4.063117
dtype: float64

### Arithmetic and Data Alignment

In [None]:
s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])

In [None]:
# Adding these together yields:
s1 + s2

#### Arithmetic methods with fill values

In [21]:
df1 = pd.DataFrame(np.arange(12.).reshape((3, 4)), columns=list('abcd'))
df2 = pd.DataFrame(np.arange(20.).reshape((4, 5)), columns=list('abcde'))
df2.loc[1, 'b'] = np.nan

In [22]:
df1.add(df2, fill_value=0)

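|   | a    | b    | c    | d    | e    |
|---|------|------|------|------|------|
| 0 | 0.0  | 2.0  | 4.0  | 6.0  | 4.0  |
| 1 | 9.0  | 5.0  | 13.0 | 15.0 | 9.0  |
| 2 | 18.0 | 20.0 | 22.0 | 24.0 | 14.0 |
| 3 | 15.0 | 16.0 | 17.0 | 18.0 | 19.0 |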


In [23]:
df1.radd(df2, fill_value=1)

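|   | a    | b    | c    | d    | e    |
|---|------|------|------|------|------|
| 0 | 0.0  | 2.0  | 4.0  | 6.0  | 5.0  |
| 1 | 9.0  | 6.0  | 13.0 | 15.0 | 10.0 |
| 2 | 18.0 | 20.0 | 22.0 | 24.0 | 15.0 |
| 3 | 16.0 | 17.0 | 18.0 | 19.0 | 20.0 |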


#### Operations between DataFrame and Series

In [10]:
arr = np.arange(12.).reshape((3, 4))
arr - arr[0]

array([[ 0.,  0.,  0.,  0.],
       [ 4.,  4.,  4.,  4.],
       [ 8.,  8.,  8.,  8.]])

In [12]:
frame = pd.DataFrame(np.arange(12.).reshape((4, 3)),
                     columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])
series = frame.iloc[0]

frame - series

|        | b   | d   | e   |
|--------|-----|-----|-----|
| Utah   | 0.0 | 0.0 | 0.0 |
| Ohio   | 3.0 | 3.0 | 3.0 |
| Texas  | 6.0 | 6.0 | 6.0 |
| Oregon | 9.0 | 9.0 | 9.0 |


In [15]:
series2 = pd.Series(range(3), index=['b', 'e', 'f'])
frame + series2

|        | b   | d   | e    | f   |
|--------|-----|-----|------|-----|
| Utah   | 0.0 | NaN | 3.0  | NaN |
| Ohio   | 3.0 | NaN | 6.0  | NaN |
| Texas  | 6.0 | NaN | 9.0  | NaN |
| Oregon | 9.0 | NaN | 12.0 | NaN |


In [18]:
series3 = frame['d']
frame.sub(series3, axis='index')

|        | b    | d   | e   |
|--------|------|-----|-----|
| Utah   | -1.0 | 0.0 | 1.0 |
| Ohio   | -1.0 | 0.0 | 1.0 |
| Texas  | -1.0 | 0.0 | 1.0 |
| Oregon | -1.0 | 0.0 | 1.0 |


### Integer Indexes

In [None]:
ser = pd.Series(np.arange(3.), index=['a', 'b', 'c']) ; ser

In [None]:
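# Works here because the index holds strings, so -1 cannot be mistaken for a label.
# Recent pandas versions deprecate this positional fallback; ser.iloc[-1] is explicit.
ser[-1]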

### Reindexing

In [None]:
frame = pd.DataFrame(np.arange(9).reshape((3,3)),
                     index=['a', 'c', 'd'],
                     columns=['Ohio', 'Texas', 'California'])

In [None]:
frame.reindex(index=['a', 'b', 'c', 'd'], columns=['Texas', 'Utah', 'California'])

In [None]:
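# Many users prefer to use loc exclusively. Since pandas 1.0, loc raises a KeyError
# for missing labels such as 'b' and 'Utah', so pass only labels that already exist:
frame.loc[['a', 'c', 'd'], ['Texas', 'California']]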

### Dropping Entries from an Axis

In [None]:
obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns = ['one', 'two', 'three', 'four'])

In [None]:
obj.drop('b', axis='index')

In [None]:
data.drop(['Colorado', 'Ohio'])

In [None]:
data.drop(['one', 'two'], axis='columns')

### Indexing, Selection, and Filtering

In [None]:
obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])
# Select by position explicitly; obj[[1, 3]] relies on a deprecated positional fallback
obj.iloc[[1, 3]]

In [None]:
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns = ['one', 'two', 'three', 'four'])

In [None]:
data.loc[data['three'] > 5, 'three'] = 5 ; data

In [None]:
data.loc['Colorado', ['two', 'three']]

In [None]:
data.iloc[[1, 2], [3, 0, 1]]

In [None]:
data.loc[:'Utah', 'two']

In [None]:
data.iloc[:, :3][data.two > 5]