# Pandas / Python: Additional Notes

## MultiIndex

- Each index label is an n-tuple (one element per level)
- The index can therefore be treated as a hierarchical structure
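
A minimal sketch of the two points above, using `pd.MultiIndex.from_arrays` (an alternative to the `from_tuples` call used below) on a subset of the same data. Iterating the index yields one tuple per row, and each level keeps its own distinct values:

```python
import pandas as pd

# One array per level; position i of every array forms the i-th label
idx = pd.MultiIndex.from_arrays(
    [['서울', '서울', '경기'], ['종로구', '중구', '안양시']],
    names=['loc_wide', 'loc_mid'],
)

print(list(idx))    # [('서울', '종로구'), ('서울', '중구'), ('경기', '안양시')]
print(idx.nlevels)  # 2

# The outer level stores the distinct provinces; get_level_values() returns
# that level's value for every row
print(list(idx.levels[0]))
print(list(idx.get_level_values('loc_wide')))  # ['서울', '서울', '경기']
```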


In [1]:
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```

In [2]:
```python
# Two-level index: province (서울 = Seoul, 경기 = Gyeonggi) and district/city
loc_arr = list(zip(*[['서울','서울', '서울', '서울', '경기', '경기', '경기'],
                      ['종로구', '중구', '강동구', '강남구', '안양시', '성남시', '가평군']]))

loc_index = pd.MultiIndex.from_tuples(loc_arr, names=['loc_wide', 'loc_mid'])
# Random values, so the numbers below differ on every run
df_loc = pd.DataFrame(np.random.randn(len(loc_arr), 4), index=loc_index, columns=list('ABCD'))
df_loc
```

loc_wide | loc_mid | A | B | C | D
---|---|---|---|---|---
서울 | 종로구 | 1.443038 | -0.807395 | 0.429408 | 0.568598
서울 | 중구 | -1.551394 | -0.033334 | -2.304304 | 1.999278
서울 | 강동구 | 0.677651 | -2.629746 | 0.830142 | -0.650174
서울 | 강남구 | 0.71828 | -0.844303 | -1.644878 | -0.204476
경기 | 안양시 | 0.019149 | 0.964109 | -0.336355 | -1.237866
경기 | 성남시 | -0.316764 | -0.286946 | -0.256938 | 0.05998
경기 | 가평군 | -0.193901 | -1.247648 | 0.638637 | 0.223801


In [3]:
```python
df_loc.loc['서울']
```

loc_mid | A | B | C | D
---|---|---|---|---
종로구 | 1.443038 | -0.807395 | 0.429408 | 0.568598
중구 | -1.551394 | -0.033334 | -2.304304 | 1.999278
강동구 | 0.677651 | -2.629746 | 0.830142 | -0.650174
강남구 | 0.71828 | -0.844303 | -1.644878 | -0.204476


In [4]:
```python
df_kyunggi = df_loc.loc['경기']
# True where the district name ends in '시' (city)
df_kyunggi.index.str.endswith('시')
```

```
array([ True,  True, False], dtype=bool)
```

In [5]:
```python
df_kyunggi = df_loc.loc['경기']
# Use the boolean array as a mask to keep only the cities
df_kyunggi[df_kyunggi.index.str.endswith('시')]
```

loc_mid | A | B | C | D
---|---|---|---|---
안양시 | 0.019149 | 0.964109 | -0.336355 | -1.237866
성남시 | -0.316764 | -0.286946 | -0.256938 | 0.05998
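
The filter above first drops the outer level with `.loc['경기']`. As a sketch, similar selections can also be made on the full `MultiIndex`: a complete tuple picks one row, `xs()` takes a cross-section by level name, and `get_level_values()` lets `.str` methods run on a single level. Deterministic `np.arange` values replace the random data here so the result is reproducible.

```python
import numpy as np
import pandas as pd

# Same two-level index as above, with predictable values 0..27
loc_arr = list(zip(['서울'] * 4 + ['경기'] * 3,
                   ['종로구', '중구', '강동구', '강남구', '안양시', '성남시', '가평군']))
loc_index = pd.MultiIndex.from_tuples(loc_arr, names=['loc_wide', 'loc_mid'])
df_loc = pd.DataFrame(np.arange(28).reshape(7, 4), index=loc_index, columns=list('ABCD'))

# A full tuple selects a single row (returned as a Series)
row = df_loc.loc[('경기', '성남시')]

# Cross-section on a named level; drop_level=False keeps both levels
gyeonggi = df_loc.xs('경기', level='loc_wide', drop_level=False)

# Filter on the inner level without dropping the outer one
is_city = df_loc.index.get_level_values('loc_mid').str.endswith('시')
cities = df_loc[is_city]
print(cities)
```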


## Time Series

In [6]:
```python
rng = pd.date_range('2017-01-01 00', periods=5, freq='H')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts
```

```
2017-01-01 00:00:00    0.802577
2017-01-01 01:00:00    0.909392
2017-01-01 02:00:00    2.335054
2017-01-01 03:00:00   -0.170556
2017-01-01 04:00:00   -0.443956
Freq: H, dtype: float64
```
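
A `DatetimeIndex` also accepts date strings for label slicing and can be regrouped with `resample()`. The sketch below assumes the same hourly spacing as above but spells it `'60min'` (an alias recent pandas versions accept unchanged), and uses `np.arange` values instead of random ones so the sums are predictable:

```python
import numpy as np
import pandas as pd

# Hourly timestamps; '60min' gives the same spacing as freq='H'
rng = pd.date_range('2017-01-01 00:00', periods=6, freq='60min')
ts = pd.Series(np.arange(6, dtype=float), index=rng)  # 0.0, 1.0, ..., 5.0

# Label slicing with date strings; both endpoints are included
window = ts.loc['2017-01-01 01:00':'2017-01-01 03:00']

# Regroup into 3-hour bins (labelled by their start) and sum each bin
three_hourly = ts.resample('180min').sum()
print(three_hourly)
```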

### Time Offset Alias

Alias | Description
---|---
B | business day frequency
C | custom business day frequency (experimental)
D | calendar day frequency
W | weekly frequency
M | month end frequency
SM | semi-month end frequency (15th and end of month)
BM | business month end frequency
CBM | custom business month end frequency
MS | month start frequency
SMS | semi-month start frequency (1st and 15th)
BMS | business month start frequency
CBMS | custom business month start frequency
Q | quarter end frequency
BQ | business quarter end frequency
QS | quarter start frequency
BQS | business quarter start frequency
A | year end frequency
BA | business year end frequency
AS | year start frequency
BAS | business year start frequency
BH | business hour frequency
H | hourly frequency
T, min | minutely frequency
S | secondly frequency
L, ms | milliseconds
U, us | microseconds
N | nanoseconds
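
The aliases can be tried directly with `pd.date_range`. Recent pandas releases (2.2 and later) rename several entries of this table, for example `H` to `h`, `T` to `min`, `S` to `s` and `M` to `ME`, so this sketch sticks to aliases whose spelling did not change. The start date 2017-01-01 is a Sunday, which shows how `B` and `W` align to their anchors:

```python
import pandas as pd

# A few aliases from the table whose spelling is the same before and after pandas 2.2
aliases = ['D', 'B', 'W', 'MS', '15min']
ranges = {alias: pd.date_range('2017-01-01', periods=3, freq=alias) for alias in aliases}

for alias, rng in ranges.items():
    # 'B' rolls the Sunday start forward to Monday; 'W' (= 'W-SUN') keeps it
    print(alias, list(rng.strftime('%Y-%m-%d %H:%M')))
```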

- Monthly frequency `M`: each timestamp is recorded at the end of its period (the last day of the month)

In [7]:
```python
rng = pd.date_range('2016-01-01 00', periods=5, freq='M')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts
```

```
2016-01-31    2.803418
2016-02-29   -0.056898
2016-03-31   -0.379156
2016-04-30   -1.388276
2016-05-31    0.959279
Freq: M, dtype: float64
```

In [8]:
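```python
# Convert to monthly periods, then back; to_timestamp() uses the start of each period by default
ps = ts.to_period()
ps.to_timestamp()
```

```
2016-01-01    2.803418
2016-02-01   -0.056898
2016-03-01   -0.379156
2016-04-01   -1.388276
2016-05-01    0.959279
Freq: MS, dtype: float64
```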
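
Because `to_timestamp()` places each period at its start by default, the month-end dates above came back as the 1st of each month. A sketch with a hand-built month-end index (so it does not depend on the `M`/`ME` alias spelling) shows that `how='end'` keeps the end of each month instead:

```python
import pandas as pd

# Month-end timestamps, like those produced by freq='M' above
ts = pd.Series([1.0, 2.0, 3.0],
               index=pd.DatetimeIndex(['2016-01-31', '2016-02-29', '2016-03-31']))

ps = ts.to_period('M')            # monthly periods: 2016-01, 2016-02, 2016-03
start = ps.to_timestamp()         # default how='start': first day of each month
end = ps.to_timestamp(how='end')  # last instant of each month
print(end)
```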

In [9]:
```python
rng = pd.date_range('2016-01-01 00', periods=5, freq='MS')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts
```

```
2016-01-01    0.273257
2016-02-01   -0.793216
2016-03-01   -1.821601
2016-04-01   -0.980874
2016-05-01   -1.010886
Freq: MS, dtype: float64
```

In [10]:
```python
ps = ts.to_period()
ps.to_timestamp()
```

```
2016-01-01    0.273257
2016-02-01   -0.793216
2016-03-01   -1.821601
2016-04-01   -0.980874
2016-05-01   -1.010886
Freq: MS, dtype: float64
```

## Python 

### List Comprehension

- A set such as the one below can be written as a list comprehension
  - $S = \{ x^2 \mid x \in \{ 0, \dots, 9 \} \}$

In [11]:
```python
[x**2 for x in range(0,10)]
```

```
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```
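
A condition in set-builder notation maps onto a trailing `if` clause, and curly braces give a set comprehension that, like $S$ itself, discards duplicates. A short sketch:

```python
# S = { x^2 | x in {0, ..., 9}, x even } as a comprehension with an `if` clause
evens_squared = [x**2 for x in range(10) if x % 2 == 0]
print(evens_squared)  # [0, 4, 16, 36, 64]

# Set comprehension: repeated last digits of the squares are kept only once
last_digits = {x**2 % 10 for x in range(10)}
print(sorted(last_digits))  # [0, 1, 4, 5, 6, 9]
```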

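- Example: finding the primes from 2 to 49. First build the set `noprimes` of composite numbers (the multiples of 2 through 7; since $\sqrt{49} = 7$, every composite below 50 has such a factor), then keep each number that is not in it as a prime in `primes`.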

In [12]:
```python
noprimes = {j for i in range(2,8) for j in range(i*2, 50, i)}  # a set, so duplicates are stored only once
primes = [x for x in range(2,50) if x not in noprimes]
primes
```

```
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
```
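
The bound `range(2,8)` is tied to the limit 50. A sketch of the same "collect the composites" idea for an arbitrary limit `n` (the function name `primes_below` is hypothetical) derives the bound with `math.isqrt` (Python 3.8+) and starts each run of multiples at `i*i`, since smaller multiples were already covered by smaller factors:

```python
from math import isqrt

def primes_below(n):
    """Primes in [2, n); assumes n >= 1."""
    # Any composite m < n has a factor no larger than sqrt(m) <= isqrt(n - 1)
    noprimes = {j for i in range(2, isqrt(n - 1) + 1) for j in range(i * i, n, i)}
    return [x for x in range(2, n) if x not in noprimes]

print(primes_below(50))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
```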

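- The two steps can also be merged into a single expression, as below, but if that hurts readability it is better to keep them separate. The merged form also rebuilds the inner set once for every `x`, because the `if` clause is evaluated on each iteration.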

In [13]:
```python
primes = [x for x in range(2,50) if x not in {j for i in range(2,8) for j in range(i*2, 50, i)}]
primes
```

```
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
```

- Comprehensions are also handy for building a `list` of `tuple`s:

In [14]:
```python
words = 'The quick brown fox jumps over the lazy dog'.split()
stuff = [(w.upper(), w.lower(), len(w)) for w in words]
stuff
```

```
[('THE', 'the', 3),
 ('QUICK', 'quick', 5),
 ('BROWN', 'brown', 5),
 ('FOX', 'fox', 3),
 ('JUMPS', 'jumps', 5),
 ('OVER', 'over', 4),
 ('THE', 'the', 3),
 ('LAZY', 'lazy', 4),
 ('DOG', 'dog', 3)]
```

- Instead of a comprehension, the same list can be built with `map()` and a `lambda` function.

> Note that `map()` returns a lazy iterator (a `map` object), not a list, so convert it with `list()` when you need a list.

In [15]:
```python
stuff = list(map(lambda w: (w.upper(), w.lower(), len(w)), words))
stuff
```

```
[('THE', 'the', 3),
 ('QUICK', 'quick', 5),
 ('BROWN', 'brown', 5),
 ('FOX', 'fox', 3),
 ('JUMPS', 'jumps', 5),
 ('OVER', 'over', 4),
 ('THE', 'the', 3),
 ('LAZY', 'lazy', 4),
 ('DOG', 'dog', 3)]
```
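
Because a `map` object is an iterator, it can be consumed only once. A small sketch, using the built-in `len` instead of a `lambda`:

```python
words = 'The quick brown fox jumps over the lazy dog'.split()

lazy = map(len, words)    # nothing is computed yet
first_pass = list(lazy)   # consumes the iterator
second_pass = list(lazy)  # already exhausted, so this is empty

print(first_pass)   # [3, 5, 5, 3, 5, 4, 3, 4, 3]
print(second_pass)  # []
```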

### Packing / Unpacking Arguments

#### Argument Packing

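> Parameters declared with `*` and `**`

- `*` (e.g. `*args`): unnamed (positional) arguments, collected into a `tuple`
- `**` (e.g. `**kwargs`): named (keyword) arguments, collected into a `dict`
- In a call, unnamed arguments must come before named ones; in a definition, `*args` must come before `**kwargs`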

In [16]:
```python
def func(*args, **kwargs):
    print(type(args), type(kwargs))
    print("args: %s" % str(args))
    print("kwargs: %s" % kwargs)

func(1,"a", 27, x=1, y="abc")
```

```
<class 'tuple'> <class 'dict'>
args: (1, 'a', 27)
kwargs: {'x': 1, 'y': 'abc'}
```
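
A sketch of where regular, `*args`, keyword-only and `**kwargs` parameters sit in one definition (the function `describe` is hypothetical). A parameter placed after `*args` can only be passed by name. The call-order rule is enforced by the compiler, which `compile()` makes visible without stopping the script:

```python
def describe(a, b=2, *args, flag=False, **kwargs):
    # a, b     : regular parameters (positional or by name)
    # *args    : remaining positional arguments, as a tuple
    # flag     : keyword-only, because it comes after *args
    # **kwargs : remaining named arguments, as a dict
    return a, b, args, flag, kwargs

print(describe(1, 3, 4, 5, flag=True, x=10))  # (1, 3, (4, 5), True, {'x': 10})
print(describe(1))                            # (1, 2, (), False, {})

# A named argument followed by an unnamed one is rejected at compile time
try:
    compile("describe(x=10, 1)", "<example>", "eval")
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)  # True
```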


#### Unpacking

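> At the call site, `*` unpacks an iterable into unnamed (positional) arguments and `**` unpacks a mapping into named (keyword) arguments. Passing a `dict` without either operator sends it as a single positional argument, `*` passes only its keys, and `**` passes its key/value pairs as keyword arguments.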

In [17]:
```python
words = 'The quick brown fox jumps over the lazy dog'.split()
term_len = {x:len(x) for x in words}
func(term_len)
```

```
<class 'tuple'> <class 'dict'>
args: ({'The': 3, 'quick': 5, 'brown': 5, 'fox': 3, 'jumps': 5, 'over': 4, 'the': 3, 'lazy': 4, 'dog': 3},)
kwargs: {}
```


In [18]:
```python
func(*term_len)
```

```
<class 'tuple'> <class 'dict'>
args: ('The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog')
kwargs: {}
```


In [19]:
```python
func(**term_len)
```

```
<class 'tuple'> <class 'dict'>
args: ()
kwargs: {'The': 3, 'quick': 5, 'brown': 5, 'fox': 3, 'jumps': 5, 'over': 4, 'the': 3, 'lazy': 4, 'dog': 3}
```


In [20]:
```python
stuff = map(lambda w: (w.upper(), w.lower(), len(w)), words)
print(type(stuff))  # a lazy map object; * consumes it into positional arguments
func(*stuff)
```

```
<class 'map'>
<class 'tuple'> <class 'dict'>
args: (('THE', 'the', 3), ('QUICK', 'quick', 5), ('BROWN', 'brown', 5), ('FOX', 'fox', 3), ('JUMPS', 'jumps', 5), ('OVER', 'over', 4), ('THE', 'the', 3), ('LAZY', 'lazy', 4), ('DOG', 'dog', 3))
kwargs: {}
```
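
Unpacking is most often used to feed an existing list or dict into a function with fixed parameters. A sketch with a hypothetical `point` function, plus the same operators inside literals (PEP 448, Python 3.5+) and in assignment (PEP 3132):

```python
def point(x, y, z=0):
    return (x, y, z)

coords = [1, 2]
opts = {'z': 5}
print(point(*coords, **opts))  # (1, 2, 5)

# The same operators inside a dict literal and on the left of an assignment
merged = {**{'a': 1}, **{'b': 2}}  # {'a': 1, 'b': 2}
first, *rest = [10, 20, 30]        # first == 10, rest == [20, 30]

# ** only accepts string keys, because each key becomes a parameter name
try:
    point(**{1: 'a'})
    raised_type_error = False
except TypeError:
    raised_type_error = True
print(raised_type_error)  # True
```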
