# Pandas Basics

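- pandas is a Python module that offers data-handling capabilities comparable to R.
- You can think of it as a programmable, extensible Excel.
- Within a single process it is highly efficient; some describe it as "Excel on steroids".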

- Refer to the official pandas documentation:
https://pandas.pydata.org/docs/reference/api/pandas.Series.html

#### 🔰 Import Module

- pandas is conventionally imported under the alias `pd`.
- NumPy, which provides many numerical functions, is usually imported as `np`.

In [2]:
import pandas as pd
import numpy as np

-----

#### 🔰 Series

- A Series consists of an index and values.
- It can hold only one data type (dtype).
- Each column of a DataFrame is a Series.
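
As a small sketch of the index/value structure described above (the labels and the name `price` are made up for illustration), a Series can be given an explicit index and then read by label:

```python
import pandas as pd

# Hypothetical example data: two prices labelled "a" and "b".
prices = pd.Series([1.5, 2.5], index=["a", "b"], name="price")

labels = prices.index.tolist()   # the index part
values = prices.tolist()         # the value part
second = prices["b"]             # look a value up by its index label
print(labels, values, second)
```

When no index is passed, pandas falls back to the default integer index 0, 1, 2, ... seen in the cells below.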

In [3]:
pd.Series()

Series([], dtype: object)

⬆ When unsure, start from the template shown in the output above: a list of values plus a `dtype`.

In [4]:
pd.Series([1, 2, 3, 4])

0    1
1    2
2    3
3    4
dtype: int64

In [6]:
pd.Series([1, 2, 3, 4], dtype=float64)

NameError: name 'float64' is not defined

In [7]:
pd.Series([1, 2, 3, 4], dtype=np.float64)

0    1.0
1    2.0
2    3.0
3    4.0
dtype: float64

In [8]:
pd.Series([1, 2, 3, 4], dtype=float)

0    1.0
1    2.0
2    3.0
3    4.0
dtype: float64
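
The `NameError` above arises because a bare `float64` is not a defined Python name. A minimal sketch of spellings that pandas does accept; the variable names are illustrative only:

```python
import numpy as np
import pandas as pd

values = [1, 2, 3, 4]

from_string = pd.Series(values, dtype="float64")   # dtype given as a string
from_numpy = pd.Series(values, dtype=np.float64)   # NumPy type object
from_builtin = pd.Series(values, dtype=float)      # Python built-in float
converted = pd.Series(values).astype("float64")    # convert after creation

print(from_string.dtype, from_numpy.dtype, from_builtin.dtype, converted.dtype)
```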

In [9]:
pd.Series([1, 2, 3, 4], dtype=str)

0    1
1    2
2    3
3    4
dtype: object

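👆 `object` is the dtype pandas traditionally uses to store Python strings. It can hold any Python object, so it is not strictly the same thing as a string type.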

In [10]:
pd.Series(np.array([1, 2, 3]))

0    1
1    2
2    3
dtype: int32

In [11]:
pd.Series({"key": "Value"})

key    Value
dtype: object

In [12]:
pd.Series([1, 2, 3, "5"])

0    1
1    2
2    3
3    5
dtype: object

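👆 Because the values mix integers and a string, pandas falls back to the `object` dtype for the whole Series.<br>
⭐ In other words, a Series can have only one dtype.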
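
A minimal sketch of what the mixed-type Series above actually stores, and one possible way to turn it into a numeric Series. Using `pd.to_numeric` here is an illustrative choice, not something the original notebook does:

```python
import pandas as pd

mixed = pd.Series([1, 2, 3, "5"])

# The Series has a single dtype (object), yet each element keeps its Python type.
element_types = [type(v).__name__ for v in mixed]

# pd.to_numeric parses the string "5" so the Series becomes numeric.
numeric = pd.to_numeric(mixed)
print(mixed.dtype, element_types, numeric.dtype)
```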

In [13]:
data = pd.Series([1, 2, 3, 4])
data

0    1
1    2
2    3
3    4
dtype: int64

In [15]:
print(data % 2)
data % 2

0    1
1    0
2    1
3    0
dtype: int64


0    1
1    0
2    1
3    0
dtype: int64

**<Date data>**

In [19]:
dates = pd.date_range("20230101", periods=6)
dates

DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',
               '2023-01-05', '2023-01-06'],
              dtype='datetime64[ns]', freq='D')
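
`pd.date_range` can also be driven by an end date or a different step. A short sketch under the assumption that daily and 7-day steps are what you need; the variable names are illustrative:

```python
import pandas as pd

by_periods = pd.date_range("2023-01-01", periods=6)             # start + count
by_end = pd.date_range(start="2023-01-01", end="2023-01-06")    # start + end (inclusive)
every_week = pd.date_range("2023-01-01", periods=3, freq="7D")  # custom step

print(by_periods.equals(by_end))
print(every_week.strftime("%Y-%m-%d").tolist())
```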

In [17]:
pd.date_range("20240101", periods=60)

DatetimeIndex(['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04',
               '2024-01-05', '2024-01-06', '2024-01-07', '2024-01-08',
               '2024-01-09', '2024-01-10', '2024-01-11', '2024-01-12',
               '2024-01-13', '2024-01-14', '2024-01-15', '2024-01-16',
               '2024-01-17', '2024-01-18', '2024-01-19', '2024-01-20',
               '2024-01-21', '2024-01-22', '2024-01-23', '2024-01-24',
               '2024-01-25', '2024-01-26', '2024-01-27', '2024-01-28',
               '2024-01-29', '2024-01-30', '2024-01-31', '2024-02-01',
               '2024-02-02', '2024-02-03', '2024-02-04', '2024-02-05',
               '2024-02-06', '2024-02-07', '2024-02-08', '2024-02-09',
               '2024-02-10', '2024-02-11', '2024-02-12', '2024-02-13',
               '2024-02-14', '2024-02-15', '2024-02-16', '2024-02-17',
               '2024-02-18', '2024-02-19', '2024-02-20', '2024-02-21',
               '2024-02-22', '2024-02-23', '2024-02-24', '2024-02-25',
               '2024-02-26', '2024-02-27', '2024-02-28', '2024-02-29'],
              dtype='datetime64[ns]', freq='D')

-----

#### 🔰 DataFrame

- pd.Series()
    - index, value
- pd.DataFrame()
    - index, value, **column**
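
To make the extra "column" dimension concrete, here is a small sketch (the names `s`, `x` and `y` are invented for illustration) that builds a DataFrame from two Series sharing one index:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Two Series that share an index become two columns of one DataFrame.
frame = pd.DataFrame({"x": s, "y": s * 2})

column = frame["x"]             # selecting one column gives back a Series
cell = frame.loc["b", "y"]      # a value is addressed by index label + column name
print(type(column).__name__, cell)
```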

In [None]:
# Call pattern (not runnable as written): pd.DataFrame(data, index=..., columns=...)
# In Jupyter Notebook, pressing [Shift + Tab] shows detailed help for a function (Init signature and Docstring).

In [18]:
# Generate random numbers sampled from the standard normal distribution
data = np.random.randn(6, 4)
data

array([[-0.37484744, -0.17419817, -0.07262281,  0.8790324 ],
       [-0.28999003, -1.91634375, -0.23137327, -0.92556844],
       [ 0.12510264, -1.21995893, -0.5331098 ,  0.72458363],
       [-1.76312672, -1.59429909,  0.51346475,  0.40591184],
       [ 0.12999805,  0.19284568,  0.17997607, -1.02660509],
       [ 1.10226085,  0.41388531, -0.12591309, -0.20470654]])

`(*args: int)` -> ndarray[Any, dtype[float64]]

__randn__(d0, d1, ..., dn)

Return a sample (or samples) from the "standard normal" distribution.

If positive int_like arguments are provided, randn generates an array of shape (d0, d1, ..., dn), filled with random floats sampled from a univariate "normal" (Gaussian) distribution of mean 0 and variance 1. A single float randomly sampled from the distribution is returned if no argument is provided.

In [20]:
df = pd.DataFrame(data, index=dates, columns=["A", "B", "C", "D"])
df

Unnamed: 0,A,B,C,D
2023-01-01,-0.374847,-0.174198,-0.072623,0.879032
2023-01-02,-0.28999,-1.916344,-0.231373,-0.925568
2023-01-03,0.125103,-1.219959,-0.53311,0.724584
2023-01-04,-1.763127,-1.594299,0.513465,0.405912
2023-01-05,0.129998,0.192846,0.179976,-1.026605
2023-01-06,1.102261,0.413885,-0.125913,-0.204707


-----

#### 🔰 Exploring DataFrame information (contents)

- Methods of a pandas DataFrame object : df.head(), df.tail(), ...

- Attributes (variables) of a pandas DataFrame object : df.index, df.columns, df.values, ...
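
The practical difference between the two groups is the parentheses: methods are called and can take arguments, attributes are simply read. A minimal sketch on a made-up 6x2 frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(6, 2), columns=["A", "B"])

first_two = df.head(2)     # method: called with parentheses, takes arguments
last_two = df.tail(2)
shape = df.shape           # attribute: no parentheses
column_names = list(df.columns)
print(shape, column_names)
```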

In [21]:
df.head()

Unnamed: 0,A,B,C,D
2023-01-01,-0.374847,-0.174198,-0.072623,0.879032
2023-01-02,-0.28999,-1.916344,-0.231373,-0.925568
2023-01-03,0.125103,-1.219959,-0.53311,0.724584
2023-01-04,-1.763127,-1.594299,0.513465,0.405912
2023-01-05,0.129998,0.192846,0.179976,-1.026605


In [22]:
df.tail()

Unnamed: 0,A,B,C,D
2023-01-02,-0.28999,-1.916344,-0.231373,-0.925568
2023-01-03,0.125103,-1.219959,-0.53311,0.724584
2023-01-04,-1.763127,-1.594299,0.513465,0.405912
2023-01-05,0.129998,0.192846,0.179976,-1.026605
2023-01-06,1.102261,0.413885,-0.125913,-0.204707


In [23]:
df.index

DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',
               '2023-01-05', '2023-01-06'],
              dtype='datetime64[ns]', freq='D')

In [24]:
df.columns

Index(['A', 'B', 'C', 'D'], dtype='object')

In [25]:
df.values

array([[-0.37484744, -0.17419817, -0.07262281,  0.8790324 ],
       [-0.28999003, -1.91634375, -0.23137327, -0.92556844],
       [ 0.12510264, -1.21995893, -0.5331098 ,  0.72458363],
       [-1.76312672, -1.59429909,  0.51346475,  0.40591184],
       [ 0.12999805,  0.19284568,  0.17997607, -1.02660509],
       [ 1.10226085,  0.41388531, -0.12591309, -0.20470654]])

- **df.info()**

	- A method that prints a summary (basic information) of the DataFrame

In [26]:
df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 6 entries, 2023-01-01 to 2023-01-06
Freq: D
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   A       6 non-null      float64
 1   B       6 non-null      float64
 2   C       6 non-null      float64
 3   D       6 non-null      float64
dtypes: float64(4)
memory usage: 240.0 bytes


- **df.describe()**

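	- count : number of non-null values per column
	- mean : mean
	- std : standard deviation
	- min, max : minimum and maximum values
	- 25%, 50%, 75% : quartiles (50% is the median)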
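
A small worked example helps to check what these rows mean. The sketch below uses a hypothetical `score` column; note that `std` is the sample standard deviation (divides by n-1):

```python
import pandas as pd

scores = pd.DataFrame({"score": [1.0, 2.0, 3.0, 4.0]})
summary = scores.describe()

mean = summary.loc["mean", "score"]
std = summary.loc["std", "score"]      # sample standard deviation (ddof=1)
median = summary.loc["50%", "score"]
print(summary)
```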

In [27]:
df.describe()

Unnamed: 0,A,B,C,D
count,6.0,6.0,6.0,6.0
mean,-0.178434,-0.716345,-0.04493,-0.024559
std,0.936986,0.986144,0.358551,0.825998
min,-1.763127,-1.916344,-0.53311,-1.026605
25%,-0.353633,-1.500714,-0.205008,-0.745353
50%,-0.082444,-0.697079,-0.099268,0.100603
75%,0.128774,0.101085,0.116826,0.644916
max,1.102261,0.413885,0.513465,0.879032


-----

#### 🔰 Sorting data


- __df.sort_values(by='')__

	- Sorts the data by a specific column.

In [28]:
df

Unnamed: 0,A,B,C,D
2023-01-01,-0.374847,-0.174198,-0.072623,0.879032
2023-01-02,-0.28999,-1.916344,-0.231373,-0.925568
2023-01-03,0.125103,-1.219959,-0.53311,0.724584
2023-01-04,-1.763127,-1.594299,0.513465,0.405912
2023-01-05,0.129998,0.192846,0.179976,-1.026605
2023-01-06,1.102261,0.413885,-0.125913,-0.204707


In [29]:
df.sort_values(by='B')

Unnamed: 0,A,B,C,D
2023-01-02,-0.28999,-1.916344,-0.231373,-0.925568
2023-01-04,-1.763127,-1.594299,0.513465,0.405912
2023-01-03,0.125103,-1.219959,-0.53311,0.724584
2023-01-01,-0.374847,-0.174198,-0.072623,0.879032
2023-01-05,0.129998,0.192846,0.179976,-1.026605
2023-01-06,1.102261,0.413885,-0.125913,-0.204707


In [30]:
df.sort_values(by='B', ascending=False)

Unnamed: 0,A,B,C,D
2023-01-06,1.102261,0.413885,-0.125913,-0.204707
2023-01-05,0.129998,0.192846,0.179976,-1.026605
2023-01-01,-0.374847,-0.174198,-0.072623,0.879032
2023-01-03,0.125103,-1.219959,-0.53311,0.724584
2023-01-04,-1.763127,-1.594299,0.513465,0.405912
2023-01-02,-0.28999,-1.916344,-0.231373,-0.925568


- `sort_values()` returns a new, sorted DataFrame. To change `df` itself, pass `inplace=True` (or assign the result back to `df`).
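
As an alternative to `inplace=True`, the sorted result can be assigned back to the variable. A minimal sketch on a made-up three-row frame:

```python
import pandas as pd

df = pd.DataFrame({"B": [3, 1, 2]}, index=["x", "y", "z"])

sorted_copy = df.sort_values(by="B", ascending=False)  # returns a new DataFrame
order_after_call = df.index.tolist()                   # df itself is unchanged

df = df.sort_values(by="B")                            # reassign to keep the result
print(order_after_call, df.index.tolist())
```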

In [31]:
df

Unnamed: 0,A,B,C,D
2023-01-01,-0.374847,-0.174198,-0.072623,0.879032
2023-01-02,-0.28999,-1.916344,-0.231373,-0.925568
2023-01-03,0.125103,-1.219959,-0.53311,0.724584
2023-01-04,-1.763127,-1.594299,0.513465,0.405912
2023-01-05,0.129998,0.192846,0.179976,-1.026605
2023-01-06,1.102261,0.413885,-0.125913,-0.204707


In [32]:
df.sort_values(by='B', ascending=False, inplace=True)
df

Unnamed: 0,A,B,C,D
2023-01-06,1.102261,0.413885,-0.125913,-0.204707
2023-01-05,0.129998,0.192846,0.179976,-1.026605
2023-01-01,-0.374847,-0.174198,-0.072623,0.879032
2023-01-03,0.125103,-1.219959,-0.53311,0.724584
2023-01-04,-1.763127,-1.594299,0.513465,0.405912
2023-01-02,-0.28999,-1.916344,-0.231373,-0.925568


#### 🔰 Selecting data

- Selecting a single column

In [33]:
df['A']

2023-01-06    1.102261
2023-01-05    0.129998
2023-01-01   -0.374847
2023-01-03    0.125103
2023-01-04   -1.763127
2023-01-02   -0.289990
Name: A, dtype: float64

In [34]:
type(df['A'])

pandas.core.series.Series

In [35]:
df.A

2023-01-06    1.102261
2023-01-05    0.129998
2023-01-01   -0.374847
2023-01-03    0.125103
2023-01-04   -1.763127
2023-01-02   -0.289990
Name: A, dtype: float64

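👆 Attribute access (`df.A`) works only when the column name is a string that is a valid Python identifier and does not clash with an existing DataFrame attribute or method. It does not work for numeric column names!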
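
A short sketch of when attribute-style access is and is not available. The frame below is hypothetical and deliberately mixes a normal name, a numeric name, and a name that collides with a DataFrame method:

```python
import pandas as pd

# Hypothetical frame with a normal name, a numeric name, and a name that
# collides with the DataFrame.count() method.
df = pd.DataFrame({"A": [1, 2], 0: [3, 4], "count": [5, 6]})

col_a = df.A                 # fine: "A" is a valid identifier
col_zero = df[0]             # numeric label: bracket access only
col_count = df["count"]      # df.count would be the method, not the column
count_is_method = callable(df.count)
print(count_is_method)
```

Bracket access works in every case, so it is the safer default.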

- Selecting two or more columns

	- The column names must be passed as a list.

In [36]:
df[['A', 'B']]

Unnamed: 0,A,B
2023-01-06,1.102261,0.413885
2023-01-05,0.129998,0.192846
2023-01-01,-0.374847,-0.174198
2023-01-03,0.125103,-1.219959
2023-01-04,-1.763127,-1.594299
2023-01-02,-0.28999,-1.916344


#### 🔰 Slicing : offset index

- [n:m] : from n through (m-1)
- When slicing by index or column labels, the end label is included.
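
The two rules can be compared side by side. A minimal sketch with a made-up single-column frame on a daily index:

```python
import pandas as pd

dates = pd.date_range("2023-01-01", periods=6)
df = pd.DataFrame({"A": range(6)}, index=dates)

by_position = df[0:3]                          # positions 0, 1, 2 (end excluded)
by_label = df["2023-01-01":"2023-01-04"]       # four rows (end label included)
print(len(by_position), len(by_label))
```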

In [37]:
df = pd.DataFrame(data, index=dates, columns=['A', 'B', 'C', 'D'])
df

Unnamed: 0,A,B,C,D
2023-01-01,-0.374847,-0.174198,-0.072623,0.879032
2023-01-02,-0.28999,-1.916344,-0.231373,-0.925568
2023-01-03,0.125103,-1.219959,-0.53311,0.724584
2023-01-04,-1.763127,-1.594299,0.513465,0.405912
2023-01-05,0.129998,0.192846,0.179976,-1.026605
2023-01-06,1.102261,0.413885,-0.125913,-0.204707


In [38]:
df[0:3]

Unnamed: 0,A,B,C,D
2023-01-01,-0.374847,-0.174198,-0.072623,0.879032
2023-01-02,-0.28999,-1.916344,-0.231373,-0.925568
2023-01-03,0.125103,-1.219959,-0.53311,0.724584


In [39]:
df["20230101":"20230104"]

Unnamed: 0,A,B,C,D
2023-01-01,-0.374847,-0.174198,-0.072623,0.879032
2023-01-02,-0.28999,-1.916344,-0.231373,-0.925568
2023-01-03,0.125103,-1.219959,-0.53311,0.724584
2023-01-04,-1.763127,-1.594299,0.513465,0.405912


- __df.loc[index, column]__

	- location
	- Selects specific rows (index) and columns by label.
	- `:` means everything (all rows or all columns).

In [40]:
df

Unnamed: 0,A,B,C,D
2023-01-01,-0.374847,-0.174198,-0.072623,0.879032
2023-01-02,-0.28999,-1.916344,-0.231373,-0.925568
2023-01-03,0.125103,-1.219959,-0.53311,0.724584
2023-01-04,-1.763127,-1.594299,0.513465,0.405912
2023-01-05,0.129998,0.192846,0.179976,-1.026605
2023-01-06,1.102261,0.413885,-0.125913,-0.204707


In [41]:
df.loc[:, 'A']

2023-01-01   -0.374847
2023-01-02   -0.289990
2023-01-03    0.125103
2023-01-04   -1.763127
2023-01-05    0.129998
2023-01-06    1.102261
Freq: D, Name: A, dtype: float64

In [42]:
df.loc[:, ['A', 'B']]

Unnamed: 0,A,B
2023-01-01,-0.374847,-0.174198
2023-01-02,-0.28999,-1.916344
2023-01-03,0.125103,-1.219959
2023-01-04,-1.763127,-1.594299
2023-01-05,0.129998,0.192846
2023-01-06,1.102261,0.413885


In [45]:
df.loc["20230102", ['A', 'B']]

A   -0.289990
B   -1.916344
Name: 2023-01-02 00:00:00, dtype: float64

In [43]:
df.loc["20230102":"20230104", ['A', 'B']]

Unnamed: 0,A,B
2023-01-02,-0.28999,-1.916344
2023-01-03,0.125103,-1.219959
2023-01-04,-1.763127,-1.594299


In [44]:
df.loc["20230102":"20230104", 'A':'D']

Unnamed: 0,A,B,C,D
2023-01-02,-0.28999,-1.916344,-0.231373,-0.925568
2023-01-03,0.125103,-1.219959,-0.53311,0.724584
2023-01-04,-1.763127,-1.594299,0.513465,0.405912


- __df.iloc[rows, columns]__

	- integer location
	- Selects by integer position (0-based), regardless of the labels.
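
To contrast label-based and position-based selection, here is a small sketch on a hypothetical frame with string row labels. The same cell and the same two rows are reached both ways:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [10, 20, 30], "B": [1, 2, 3]},
    index=["r1", "r2", "r3"],
)

by_label = df.loc["r2", "B"]          # row label, column label
by_position = df.iloc[1, 1]           # row position, column position

label_slice = df.loc["r1":"r2", "A"]  # end label included -> 2 rows
position_slice = df.iloc[0:2, 0]      # end position excluded -> 2 rows
print(by_label, by_position)
```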

In [46]:
df

Unnamed: 0,A,B,C,D
2023-01-01,-0.374847,-0.174198,-0.072623,0.879032
2023-01-02,-0.28999,-1.916344,-0.231373,-0.925568
2023-01-03,0.125103,-1.219959,-0.53311,0.724584
2023-01-04,-1.763127,-1.594299,0.513465,0.405912
2023-01-05,0.129998,0.192846,0.179976,-1.026605
2023-01-06,1.102261,0.413885,-0.125913,-0.204707


In [48]:
df.iloc[3]

A   -1.763127
B   -1.594299
C    0.513465
D    0.405912
Name: 2023-01-04 00:00:00, dtype: float64

In [49]:
df.iloc[3, 2]

0.5134647463577452

In [50]:
df.iloc[3:5, 0:2]

Unnamed: 0,A,B
2023-01-04,-1.763127,-1.594299
2023-01-05,0.129998,0.192846


In [51]:
df.iloc[[1, 2, 4], [0, 2]]

Unnamed: 0,A,C
2023-01-02,-0.28999,-0.231373
2023-01-03,0.125103,-0.53311
2023-01-05,0.129998,0.179976


In [52]:
df.iloc[:, 1:3]

Unnamed: 0,B,C
2023-01-01,-0.174198,-0.072623
2023-01-02,-1.916344,-0.231373
2023-01-03,-1.219959,-0.53311
2023-01-04,-1.594299,0.513465
2023-01-05,0.192846,0.179976
2023-01-06,0.413885,-0.125913


-----

#### 🔰 Condition

In [53]:
df

Unnamed: 0,A,B,C,D
2023-01-01,-0.374847,-0.174198,-0.072623,0.879032
2023-01-02,-0.28999,-1.916344,-0.231373,-0.925568
2023-01-03,0.125103,-1.219959,-0.53311,0.724584
2023-01-04,-1.763127,-1.594299,0.513465,0.405912
2023-01-05,0.129998,0.192846,0.179976,-1.026605
2023-01-06,1.102261,0.413885,-0.125913,-0.204707


In [54]:
# Check, row by row, whether the value in column A is greater than 0 (positive)

df['A'] > 0

2023-01-01    False
2023-01-02    False
2023-01-03     True
2023-01-04    False
2023-01-05     True
2023-01-06     True
Freq: D, Name: A, dtype: bool

- masking

In [55]:
# Show the rows of the DataFrame whose value in column 'A' is greater than 0

df[df['A'] > 0]

Unnamed: 0,A,B,C,D
2023-01-03,0.125103,-1.219959,-0.53311,0.724584
2023-01-05,0.129998,0.192846,0.179976,-1.026605
2023-01-06,1.102261,0.413885,-0.125913,-0.204707


In [56]:
# Keep only the values in the whole DataFrame that are greater than 0

df[df > 0]

Unnamed: 0,A,B,C,D
2023-01-01,,,,0.879032
2023-01-02,,,,
2023-01-03,0.125103,,,0.724584
2023-01-04,,,0.513465,0.405912
2023-01-05,0.129998,0.192846,0.179976,
2023-01-06,1.102261,0.413885,,


👆 Values that do not satisfy the condition (df > 0) are shown as NaN.

- NaN : Not a Number
	- pandas uses NaN to mark a missing value, i.e. there is no data in that cell.
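
Masks can be combined, and the resulting NaN cells can be counted, filled, or dropped. The sketch below uses a made-up two-column frame; `fillna(0)` and `dropna()` are shown as two common options, not as the notebook's own method:

```python
import pandas as pd

df = pd.DataFrame({"A": [-1.0, 2.0, 3.0], "B": [4.0, -5.0, 6.0]})

# Combine conditions with & (and) / | (or); each condition needs parentheses.
both_positive = df[(df["A"] > 0) & (df["B"] > 0)]

masked = df[df > 0]                         # non-matching cells become NaN
missing = int(masked.isna().sum().sum())    # count the NaN cells
filled = masked.fillna(0)                   # replace NaN with a value
complete_rows = masked.dropna()             # or drop rows containing NaN
print(missing)
```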

-----

#### 🔰 Adding a column

- If the column does not exist in the data, it is __added__.

- If the column already exists in the data, it is __modified__.
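
Both cases use the same assignment syntax. A minimal sketch with hypothetical column names:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})

df["E"] = ["one", "two", "three"]   # "E" does not exist yet -> added
df["A"] = df["A"] * 10              # "A" exists -> overwritten
df["F"] = df["A"] + 1               # new column derived from an existing one
print(df)
```

The assigned list must have the same length as the DataFrame, otherwise pandas raises a `ValueError`.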

In [57]:
df

Unnamed: 0,A,B,C,D
2023-01-01,-0.374847,-0.174198,-0.072623,0.879032
2023-01-02,-0.28999,-1.916344,-0.231373,-0.925568
2023-01-03,0.125103,-1.219959,-0.53311,0.724584
2023-01-04,-1.763127,-1.594299,0.513465,0.405912
2023-01-05,0.129998,0.192846,0.179976,-1.026605
2023-01-06,1.102261,0.413885,-0.125913,-0.204707


In [58]:
df['E'] = ['one', 'one', 'two', 'three', 'four', 'seven']
df

Unnamed: 0,A,B,C,D,E
2023-01-01,-0.374847,-0.174198,-0.072623,0.879032,one
2023-01-02,-0.28999,-1.916344,-0.231373,-0.925568,one
2023-01-03,0.125103,-1.219959,-0.53311,0.724584,two
2023-01-04,-1.763127,-1.594299,0.513465,0.405912,three
2023-01-05,0.129998,0.192846,0.179976,-1.026605,four
2023-01-06,1.102261,0.413885,-0.125913,-0.204707,seven


- __df.isin()__

	- Checks whether each value is among the given values and returns the result as booleans.

	- The check can be limited to a single column or applied to the entire DataFrame.

	- (values: Iterable | Series | dict) -> Series[_bool]

		Whether elements in Series are contained in values.

		Return a boolean Series showing whether each element in the Series matches an element in the passed sequence of values exactly.

		Parameters<br>
		values : set or list-like<br>
		The sequence of values to test. Passing in a single string will raise a TypeError. Instead, turn a single string into a list of one element.
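
A short sketch of the column-level and frame-level forms, plus inverting the mask with `~`. The frame and its values are invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({"E": ["one", "two", "three"], "F": ["two", "x", "y"]})

column_mask = df["E"].isin(["two", "three"])   # one boolean per row
frame_mask = df.isin(["two"])                  # one boolean per cell
excluded = df[~column_mask]                    # ~ inverts the mask
print(frame_mask)
```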

In [59]:
df['E'].isin(['two'])

2023-01-01    False
2023-01-02    False
2023-01-03     True
2023-01-04    False
2023-01-05    False
2023-01-06    False
Freq: D, Name: E, dtype: bool

In [60]:
df['E'].isin(['two', 'three', 'five'])

2023-01-01    False
2023-01-02    False
2023-01-03     True
2023-01-04     True
2023-01-05    False
2023-01-06    False
Freq: D, Name: E, dtype: bool

In [61]:
# Mask the whole DataFrame: show the rows where the condition is True

df[df['E'].isin(['two', 'three', 'five'])]

Unnamed: 0,A,B,C,D,E
2023-01-03,0.125103,-1.219959,-0.53311,0.724584,two
2023-01-04,-1.763127,-1.594299,0.513465,0.405912,three


-----

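#### 🔰 Removing a column

- Removes a specific column.

- **del** df[column_name] : the removal is applied to the data immediately.
- df.**drop**([column_name]) : returns a new DataFrame; pass `inplace=True` (or assign the result back) to change the data itself.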

In [62]:
df

Unnamed: 0,A,B,C,D,E
2023-01-01,-0.374847,-0.174198,-0.072623,0.879032,one
2023-01-02,-0.28999,-1.916344,-0.231373,-0.925568,one
2023-01-03,0.125103,-1.219959,-0.53311,0.724584,two
2023-01-04,-1.763127,-1.594299,0.513465,0.405912,three
2023-01-05,0.129998,0.192846,0.179976,-1.026605,four
2023-01-06,1.102261,0.413885,-0.125913,-0.204707,seven


In [63]:
del df['E']
df

Unnamed: 0,A,B,C,D
2023-01-01,-0.374847,-0.174198,-0.072623,0.879032
2023-01-02,-0.28999,-1.916344,-0.231373,-0.925568
2023-01-03,0.125103,-1.219959,-0.53311,0.724584
2023-01-04,-1.763127,-1.594299,0.513465,0.405912
2023-01-05,0.129998,0.192846,0.179976,-1.026605
2023-01-06,1.102261,0.413885,-0.125913,-0.204707


- drop() needs the axis to be specified when removing columns.

	- axis : {0 or 'index', 1 or 'columns'}, default 0
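
Besides `axis`, `drop()` also accepts the `columns=` and `index=` keywords, which some find easier to read. A minimal sketch on a hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "D": [5, 6]}, index=["r1", "r2"])

no_d_axis = df.drop(["D"], axis=1)        # axis=1 -> column labels
no_d_keyword = df.drop(columns=["D"])     # same result, keyword form
no_r1 = df.drop(index=["r1"])             # row labels via index=
print(no_d_axis.equals(no_d_keyword), list(df.columns))
```

None of these calls changes `df`, because `inplace=True` was not passed.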

In [65]:
df.drop(['D'], axis=1) # axis=1: columns

Unnamed: 0,A,B,C
2023-01-01,-0.374847,-0.174198,-0.072623
2023-01-02,-0.28999,-1.916344,-0.231373
2023-01-03,0.125103,-1.219959,-0.53311
2023-01-04,-1.763127,-1.594299,0.513465
2023-01-05,0.129998,0.192846,0.179976
2023-01-06,1.102261,0.413885,-0.125913


In [66]:
df

Unnamed: 0,A,B,C,D
2023-01-01,-0.374847,-0.174198,-0.072623,0.879032
2023-01-02,-0.28999,-1.916344,-0.231373,-0.925568
2023-01-03,0.125103,-1.219959,-0.53311,0.724584
2023-01-04,-1.763127,-1.594299,0.513465,0.405912
2023-01-05,0.129998,0.192846,0.179976,-1.026605
2023-01-06,1.102261,0.413885,-0.125913,-0.204707


In [69]:
# With axis=0 (the default), pass row labels.

df.drop(['20230104'], inplace=True)
df

Unnamed: 0,A,B,C,D
2023-01-01,-0.374847,-0.174198,-0.072623,0.879032
2023-01-02,-0.28999,-1.916344,-0.231373,-0.925568
2023-01-03,0.125103,-1.219959,-0.53311,0.724584
2023-01-05,0.129998,0.192846,0.179976,-1.026605
2023-01-06,1.102261,0.413885,-0.125913,-0.204707


#### 🔰 The apply() method

- Applies a function to a DataFrame (or a Series).

- Pass the function you want to apply as an argument, and the result of that operation is returned.

- func : function - Python function or NumPy ufunc to apply.
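
For a DataFrame, the `axis` argument decides whether the function receives columns or rows. A small sketch with made-up data and illustrative lambda functions:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 5], "B": [10, 20]})

# Default axis=0: the function receives each column as a Series.
column_ranges = df.apply(lambda col: col.max() - col.min())

# axis=1: the function receives each row as a Series.
row_totals = df.apply(lambda row: row["A"] + row["B"], axis=1)
print(column_ranges.tolist(), row_totals.tolist())
```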

In [70]:
df = pd.DataFrame(data, index=dates, columns=['A', 'B', 'C', 'D'])
df

Unnamed: 0,A,B,C,D
2023-01-01,-0.374847,-0.174198,-0.072623,0.879032
2023-01-02,-0.28999,-1.916344,-0.231373,-0.925568
2023-01-03,0.125103,-1.219959,-0.53311,0.724584
2023-01-04,-1.763127,-1.594299,0.513465,0.405912
2023-01-05,0.129998,0.192846,0.179976,-1.026605
2023-01-06,1.102261,0.413885,-0.125913,-0.204707


In [71]:
df['A'].apply("sum")

-1.0706026548216272

In [72]:
df['A'].apply(np.sum)

2023-01-01   -0.374847
2023-01-02   -0.289990
2023-01-03    0.125103
2023-01-04   -1.763127
2023-01-05    0.129998
2023-01-06    1.102261
Freq: D, Name: A, dtype: float64

In [76]:
df[['A', 'D']].apply("sum")

A   -1.070603
D   -0.147352
dtype: float64

In [79]:
df[['A', 'D']].apply(np.sum)

A   -1.070603
D   -0.147352
dtype: float64

In [80]:
df.apply("sum")

A   -1.070603
B   -4.298069
C   -0.269578
D   -0.147352
dtype: float64

In [78]:
df.apply(np.sum)

A   -1.070603
B   -4.298069
C   -0.269578
D   -0.147352
dtype: float64

In [73]:
df['A'].apply("mean")

-0.17843377580360453

In [74]:
df['A'].apply(np.mean)

2023-01-01   -0.374847
2023-01-02   -0.289990
2023-01-03    0.125103
2023-01-04   -1.763127
2023-01-05    0.129998
2023-01-06    1.102261
Freq: D, Name: A, dtype: float64

In [77]:
df['A'].apply("min"), df['A'].apply("max")

(-1.7631267247078541, 1.1022608472636044)

In [81]:
df['A'].apply(np.std)

2023-01-01    0.0
2023-01-02    0.0
2023-01-03    0.0
2023-01-04    0.0
2023-01-05    0.0
2023-01-06    0.0
Freq: D, Name: A, dtype: float64
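
In the outputs above, passing `np.sum`, `np.mean` or `np.std` to `Series.apply` produced one result per element rather than a single number, because the function was called on each value separately (the standard deviation of a single number is 0). Here is a hedged sketch of how to obtain a single summary value instead. The behaviour of `Series.apply` with NumPy reductions may differ between pandas versions, so the example avoids relying on it:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

doubled = s.apply(lambda x: x * 2)       # apply: called once per element
total = s.sum()                          # reduction method: one number
stats = s.agg(["sum", "mean", "std"])    # several reductions at once
print(total, stats["mean"])
```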

In [82]:
df

Unnamed: 0,A,B,C,D
2023-01-01,-0.374847,-0.174198,-0.072623,0.879032
2023-01-02,-0.28999,-1.916344,-0.231373,-0.925568
2023-01-03,0.125103,-1.219959,-0.53311,0.724584
2023-01-04,-1.763127,-1.594299,0.513465,0.405912
2023-01-05,0.129998,0.192846,0.179976,-1.026605
2023-01-06,1.102261,0.413885,-0.125913,-0.204707


- Using a user-defined function

In [83]:
def plusminus(num):
    return "plus" if num > 0 else "minus"

In [84]:
df['A'].apply(plusminus)

2023-01-01    minus
2023-01-02    minus
2023-01-03     plus
2023-01-04    minus
2023-01-05     plus
2023-01-06     plus
Freq: D, Name: A, dtype: object

- Using a lambda function

In [85]:
df['A'].apply(lambda num: "plus" if num > 0 else "minus")

2023-01-01    minus
2023-01-02    minus
2023-01-03     plus
2023-01-04    minus
2023-01-05     plus
2023-01-06     plus
Freq: D, Name: A, dtype: object
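
For a simple two-way label like this, a vectorized alternative is `np.where`, which evaluates the condition for the whole Series at once. This is a sketch of an optional technique that the notebook itself does not use; the sample values are invented:

```python
import numpy as np
import pandas as pd

s = pd.Series([-0.4, 0.1, 1.1])

via_apply = s.apply(lambda num: "plus" if num > 0 else "minus")

# np.where evaluates the condition for the whole Series at once.
via_where = pd.Series(np.where(s > 0, "plus", "minus"), index=s.index)
print(via_where.tolist())
```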