# Mod09 Series Indexing and Selection

## Data Selection in Series

### Series as dictionary

In [2]:
import numpy as np
import pandas as pd

In [None]:
np.__version__


In [3]:
data = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=['a', 'b', 'c', 'd'])
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

In [4]:
data['b']

0.5

In [6]:
# fancy indexing: select several labels at once
data[['a','c']]

a    0.25
c    0.75
dtype: float64

In [7]:
# a comparison returns a boolean Series
data>0.5

a    False
b    False
c     True
d     True
dtype: bool

In [8]:
# use the boolean Series as a mask
data[data>0.5]

c    0.75
d    1.00
dtype: float64

We can also use dictionary-like Python expressions and methods to examine the keys/indices and values:

In [9]:
# returns a boolean
'a' in data

True

In [10]:
# with an auto-generated index this would be a RangeIndex; here it holds the labels
data.index

Index(['a', 'b', 'c', 'd'], dtype='object')

In [11]:
data.keys()

Index(['a', 'b', 'c', 'd'], dtype='object')
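The dictionary analogy can be pushed a little further. The sketch below rebuilds the same `data` Series and uses `items()` and `get()`, which are not shown in the cells above but behave much like their `dict` counterparts.

```python
import pandas as pd

# Same Series as in the cells above.
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

# Membership tests look at the index labels, like keys in a dict.
has_a = 'a' in data
has_z = 'z' in data

# items() yields (label, value) pairs, like dict.items().
pairs = list(data.items())

# get() returns a default instead of raising KeyError for a missing label.
fallback = data.get('z', default=-1.0)

print(has_a, has_z)
print(pairs)
print(fallback)
```

Note that `in` checks labels only; to test for a value, look in `data.values` instead.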

Extend a ``Series`` by assigning to a new index value:

In [12]:
# the Series is mutable: the assignment takes effect immediately
data['e'] = 1.25
data

a    0.25
b    0.50
c    0.75
d    1.00
e    1.25
dtype: float64
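As a small sketch of the same idea, assignment to a label either appends or overwrites depending on whether the label already exists. The overwrite of `'a'` is an illustrative addition, not part of the original cells.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

data['e'] = 1.25   # new label: the Series grows by one element
data['a'] = 0.0    # existing label: the stored value is overwritten

labels = list(data.index)
print(labels)
print(data)
```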

### Series as one-dimensional array

In [14]:
data

a    0.25
b    0.50
c    0.75
d    1.00
e    1.25
dtype: float64

In [13]:
data['b']

0.5

In [15]:
# slicing by explicit index
data['a':'c']

a    0.25
b    0.50
c    0.75
dtype: float64

In [16]:
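# the integer is treated as a position because the index holds strings;
# newer pandas versions deprecate this, so data.iloc[2] is the safer spelling
data[2]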

0.75

In [17]:
# slicing by implicit integer index
data[0:2]

a    0.25
b    0.50
dtype: float64

In [19]:
data[::2]

a    0.25
c    0.75
e    1.25
dtype: float64

In [20]:
data[0:4:2]

a    0.25
c    0.75
dtype: float64

In [21]:
data[:-1:2]

a    0.25
c    0.75
dtype: float64

In [22]:
# masking
data[(data > 0.3) & (data < 0.8)]

b    0.50
c    0.75
dtype: float64

In [23]:
# fancy indexing
data[['a', 'e']]

a    0.25
e    1.25
dtype: float64
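One detail in the slices above is easy to miss: a slice by label includes its end point, while a slice by position excludes it. A minimal sketch follows; it uses `iloc` for the positional slice (an assumption made here to keep the example unambiguous across pandas versions).

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0, 1.25],
                 index=['a', 'b', 'c', 'd', 'e'])

# Label-based slice: the end label 'c' is included.
by_label = data['a':'c']

# Position-based slice: the end position 2 is excluded.
by_position = data.iloc[0:2]

print(list(by_label.index))
print(list(by_position.index))
```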

### Indexers: loc, iloc

In [25]:
data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])
data

1    a
3    b
5    c
dtype: object

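Computation speed (fastest to slowest):
numeric NumPy array > object NumPy array > Python list
Note: check the result type after an operation, so the data does not silently turn into a Python list and slow things down.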
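One way to follow the note above is to check the dtype after building or transforming a Series. The sketch below is only an illustration: the `numeric` and `mixed` Series are made-up examples, and `is_numeric_dtype` and `pd.to_numeric` are one possible way to detect and repair an object dtype.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

numeric = pd.Series([1.0, 2.0, 3.0])
mixed = pd.Series([1.0, 'two', 3.0])   # one string forces dtype=object

numeric_ok = is_numeric_dtype(numeric)
mixed_ok = is_numeric_dtype(mixed)

# errors='coerce' replaces entries that cannot be parsed with NaN,
# which brings the Series back to a numeric dtype.
repaired = pd.to_numeric(mixed, errors='coerce')

print(numeric.dtype, numeric_ok)
print(mixed.dtype, mixed_ok)
print(repaired.dtype)
```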

In [27]:
data.values
# lists perform worst, followed by object arrays

array(['a', 'b', 'c'], dtype=object)

In [30]:
# indexing uses the explicit index (the labels)
data[1]

'a'

In [31]:
# slicing uses the implicit index (the positions)
data[1:3]

3    b
5    c
dtype: object

In [32]:
data[0:2]  # slicing > implicit (positional)

1    a
3    b
dtype: object

The ``loc`` attribute allows indexing and slicing that always references the explicit index:

In [33]:
data.loc[1]

'a'

In [34]:
data.loc[1:3]

1    a
3    b
dtype: object

The ``iloc`` attribute allows indexing and slicing that always references the implicit Python-style index:

In [35]:
data.iloc[1]

'b'

In [37]:
data.iloc[1:3]

3    b
5    c
dtype: object
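Putting the two indexers side by side makes the difference easier to see. The following is a minimal sketch on the same integer-labelled Series; the variable names are chosen here for illustration.

```python
import pandas as pd

data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

by_label = data.loc[1]        # label 1
by_position = data.iloc[1]    # position 1 (the second element)

label_slice = data.loc[1:3]   # labels from 1 to 3, end included
pos_slice = data.iloc[1:3]    # positions 1 and 2, end excluded

print(by_label, by_position)
print(list(label_slice.index))
print(list(pos_slice.index))
```

Because the same integers mean different things to `loc` and `iloc`, spelling out the indexer avoids the ambiguity of plain `data[...]` on an integer index.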

In [38]:
data.at[1] # equivalent to loc[1] for a single value; replaces ix. Gets the element whose label is 1

'a'

In [39]:
data.iat[1] # equivalent to iloc[1] for a single value. Gets the element at position 1

'b'
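`at` and `iat` handle exactly one element at a time, for both reading and writing. A short sketch, assuming the same Series as above; the replacement values `'B'` and `'C'` are arbitrary.

```python
import pandas as pd

data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

data.at[3] = 'B'     # write by label
data.iat[2] = 'C'    # write by position

# at looks up labels only, so a label that is not in the index raises KeyError.
try:
    data.at[2]
    missing_label = False
except KeyError:
    missing_label = True

print(list(data))
print(missing_label)
```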

In [42]:
# ix was deprecated and later removed from pandas, hence the error below
data.ix[1]

AttributeError: 'Series' object has no attribute 'ix'

In [40]:
data

1    a
3    b
5    c
dtype: object

Performance test

In [54]:
%%timeit
data.loc[3]

5.06 µs ± 61.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [55]:
%%timeit
data.iloc[1]

3.14 µs ± 26.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [56]:
%%timeit
data.at[3]

2.97 µs ± 9.67 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [57]:
%%timeit
data.iat[1]

1.75 µs ± 5.57 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
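Outside IPython, where `%%timeit` is not available, a rough equivalent can be put together with the standard `timeit` module. This is only a sketch: the loop count of 10,000 is an arbitrary choice, and the absolute numbers depend on the machine and the pandas version, so they will not match the figures above.

```python
import timeit
import pandas as pd

data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

# Four ways to fetch the same element ('b').
accessors = {
    'loc': lambda: data.loc[3],
    'iloc': lambda: data.iloc[1],
    'at': lambda: data.at[3],
    'iat': lambda: data.iat[1],
}

n_loops = 10_000
timings = {}
for name, fn in accessors.items():
    total_seconds = timeit.timeit(fn, number=n_loops)
    timings[name] = total_seconds / n_loops * 1e6   # microseconds per call

for name, micros in timings.items():
    print(f"{name:>4}: {micros:.2f} µs per call")
```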
