### 【ACCESSOR】
- pandas provides accessors to simplify working with string (str) and datetime data
- They are available on Series objects
- Types
    * str data: the Series.str accessor
    * datetime data: the Series.dt accessor

In [1]:
import pandas as pd

In [2]:
## Create a DataFrame instance ('이름' = name, '댓글' = comment)
dataDF = pd.DataFrame({'이름': ['홍 길동', '마 징가', '배 트맨'],
                       '댓글': ['Good Luck', 'Happy New Year', 'good day']})
dataDF

,이름,댓글
0,홍 길동,Good Luck
1,마 징가,Happy New Year
2,배 트맨,good day


In [3]:
## Convert every value in the '댓글' column to lowercase
# dataDF['댓글'].lower()      # <= a Series has no string methods of its own
dataDF['댓글'][0].lower()       # <= a single element is a plain str, so str methods work

dataDF['댓글'].shape[0]         # number of rows
for idx in range(dataDF['댓글'].shape[0]):
    # .loc assigns in a single step, which avoids chained assignment
    dataDF.loc[idx, '댓글'] = dataDF.loc[idx, '댓글'].lower()

dataDF


,이름,댓글
0,홍 길동,good luck
1,마 징가,happy new year
2,배 트맨,good day


In [4]:
## Convert every value in the '댓글' column to lowercase, then to uppercase
# dataDF['댓글'].lower()      # <= a Series has no string methods of its own
# dataDF['댓글'][0].lower()       # <= a single element is a plain str, so str methods work
for idx in range(dataDF['댓글'].shape[0]):
    # .loc assigns in a single step, which avoids chained assignment
    dataDF.loc[idx, '댓글'] = dataDF.loc[idx, '댓글'].lower()
display(dataDF)

for idx in dataDF.index:
    dataDF.loc[idx, '댓글'] = dataDF.loc[idx, '댓글'].upper()

display(dataDF)

,이름,댓글
0,홍 길동,good luck
1,마 징가,happy new year
2,배 트맨,good day


,이름,댓글
0,홍 길동,GOOD LUCK
1,마 징가,HAPPY NEW YEAR
2,배 트맨,GOOD DAY
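
The row-by-row loops above can usually be replaced by a single call on the `.str` accessor. The following is a minimal sketch that rebuilds the same sample DataFrame; the names `lowerSR` and `upperSR` are introduced here only for illustration.

```python
import pandas as pd

# Same sample data as the DataFrame used above
dataDF = pd.DataFrame({'이름': ['홍 길동', '마 징가', '배 트맨'],
                       '댓글': ['Good Luck', 'Happy New Year', 'good day']})

# One vectorized call per conversion instead of a Python loop over rows
lowerSR = dataDF['댓글'].str.lower()
upperSR = dataDF['댓글'].str.upper()

# Assigning the whole column back needs no per-row indexing at all
dataDF['댓글'] = upperSR

print(lowerSR.tolist())
print(dataDF['댓글'].tolist())
```

Because the whole column is assigned at once, no element-wise indexing is involved, so the chained-assignment problem does not arise.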


[1] Series.str accessor: provides string methods <hr>

In [5]:
dataSR = pd.Series( [" Hello, World! ",
                    "Pandas_is_FUN",
                    "email: user01@example.com",
                    None
])

display(dataSR, dataSR.index)

0               Hello, World! 
1                Pandas_is_FUN
2    email: user01@example.com
3                         None
dtype: object

RangeIndex(start=0, stop=4, step=1)

In [6]:
## 1) Strip leading/trailing whitespace & convert to lowercase

print(dataSR.str.strip())

print(dataSR.str.strip().str.lower())

0                Hello, World!
1                Pandas_is_FUN
2    email: user01@example.com
3                         None
dtype: object
0                hello, world!
1                pandas_is_fun
2    email: user01@example.com
3                         None
dtype: object


In [15]:
## 2) Containment check (boolean) / finding a position
## contains(): returns True/False / case: whether matching is case-sensitive / na: value to return for missing entries
print(dataSR.str.contains("world", case=False, na=False))

## find(): returns the lowest index at which the substring is found
print(dataSR.str.find("FUN"))                                   # -1 if not found / NaN for None entries

0     True
1    False
2    False
3    False
dtype: bool
0    -1.0
1    10.0
2    -1.0
3     NaN
dtype: float64
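
A common use of `contains()` is to build a boolean mask and filter the Series with it. The sketch below assumes the same sample Series as above; `mask` and `emailSR` are illustrative names, and `regex=False` is chosen here so that `"@"` is matched as a literal character.

```python
import pandas as pd

dataSR = pd.Series([" Hello, World! ",
                    "Pandas_is_FUN",
                    "email: user01@example.com",
                    None])

# na=False maps the missing entry to False, so the mask is purely boolean
mask = dataSR.str.contains("@", regex=False, na=False)

# Boolean indexing keeps only the rows where the mask is True
emailSR = dataSR[mask]
print(emailSR)
```

Without `na=False`, the missing row would stay missing in the mask, and filtering with such a mask can fail, which is why the argument is set explicitly here.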


In [8]:
## 3) Substring extraction (slice) / length
print(dataSR.str.slice(0, 5))       # first 5 characters (the leading space counts)

print(dataSR.str.len())             # number of characters (a Hangul syllable also counts as 1)

0     Hell
1    Panda
2    email
3     None
dtype: object
0    15.0
1    13.0
2    25.0
3     NaN
dtype: float64
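
The `.str` accessor also accepts Python-style indexing, which is a shorthand for `slice()`. A small sketch under the same sample data follows; `headSR` and `trimmedHeadSR` are names made up for this example.

```python
import pandas as pd

dataSR = pd.Series([" Hello, World! ",
                    "Pandas_is_FUN",
                    "email: user01@example.com",
                    None])

# .str[:5] is shorthand for .str.slice(0, 5)
headSR = dataSR.str[:5]

# Stripping first means the leading space no longer uses up one of the 5 characters
trimmedHeadSR = dataSR.str.strip().str[:5]

print(headSR)
print(trimmedHeadSR)
```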


In [9]:
## 4) Splitting on a delimiter and expanding
## Split on a delimiter -> each row becomes a list: Series => Series
print(dataSR.str.split("_"))

ret = dataSR.str.split("_")
print(type(ret[0]))     # ret[0] is the whole list for the first row

## expand=True => expand into a DataFrame: Series => DataFrame
print(dataSR.str.split("_", expand=True))

ret = dataSR.str.split("_", expand=True)
print(type(ret))

0              [ Hello, World! ]
1              [Pandas, is, FUN]
2    [email: user01@example.com]
3                           None
dtype: object
<class 'list'>
                           0     1     2
0             Hello, World!   None  None
1                     Pandas    is   FUN
2  email: user01@example.com  None  None
3                       None  None  None
<class 'pandas.core.frame.DataFrame'>
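
Two variations on `split()` are often useful: limiting the number of splits with `n`, and picking one element out of each resulting list with `.str[...]`. The sketch below reuses the sample Series; `twoColDF` and `firstSR` are hypothetical names used only here.

```python
import pandas as pd

dataSR = pd.Series([" Hello, World! ",
                    "Pandas_is_FUN",
                    "email: user01@example.com",
                    None])

# n=1 splits only at the first "_", so at most two columns are produced
twoColDF = dataSR.str.split("_", n=1, expand=True)

# .str[0] takes the first element of each row's list; the missing row stays missing
firstSR = dataSR.str.split("_").str[0]

print(twoColDF)
print(firstSR)
```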


[2] Series.dt accessor: date/time attributes and methods <hr>

In [10]:
## Create a Series instance
dateSR = pd.Series([
    "2025-10-01 08:30",
    "2025/10/02 21:15",
    "Oct 03, 2025 06:00",
    None
])

display(dateSR)

0      2025-10-01 08:30
1      2025/10/02 21:15
2    Oct 03, 2025 06:00
3                  None
dtype: object

In [11]:
## Convert strings -> datetime type
## pd.to_datetime() function
## - format: a strftime-style format string, or 'mixed' when the elements use different formats
# "2025-10-01 08:30",
# "2025/10/02 21:15",
# "Oct 03, 2025 06:00",
# None

## - errors: how to handle conversion failures [default] 'raise' / 'coerce' ==> NaT
tsSR = pd.to_datetime(dateSR, format='mixed', errors="coerce")      # errors="coerce": entries that fail to parse become NaT (Not a Time)
print(tsSR)
print(tsSR)

0   2025-10-01 08:30:00
1   2025-10-02 21:15:00
2   2025-10-03 06:00:00
3                   NaT
dtype: datetime64[ns]
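
To make the difference between the two `errors` settings concrete, the sketch below feeds an unparseable string to `pd.to_datetime()`. The Series `badSR` and the flag `raised` are made up for this example, and a single explicit format is assumed instead of `'mixed'` to keep the case small.

```python
import pandas as pd

# Hypothetical input: one valid timestamp, one invalid string, one missing value
badSR = pd.Series(["2025-10-01 08:30", "not a date", None])

# errors="coerce": anything that cannot be parsed becomes NaT
coercedSR = pd.to_datetime(badSR, format="%Y-%m-%d %H:%M", errors="coerce")
print(coercedSR)

# errors="raise" (the default): the same input raises an exception instead
try:
    pd.to_datetime(badSR, format="%Y-%m-%d %H:%M", errors="raise")
    raised = False
except ValueError:
    raised = True
print(raised)
```

`coerce` is convenient for exploratory work, but it silently hides bad rows, so counting the NaT values afterwards with `isna().sum()` is a reasonable habit.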


In [12]:
## 1) Extracting components
print(tsSR.dt.year[0])
print(tsSR.dt.month[0])
print(tsSR.dt.day[0])
print(tsSR.dt.weekday[0])      # 0 (Mon) to 6 (Sun)
print(tsSR.dt.day_name()[0])   # name of the weekday
print(tsSR.dt.hour[0])         # hour

2025.0
10.0
1.0
2.0
Wednesday
8.0
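
The components above print as floats (`2025.0`, `10.0`, ...), most likely because the NaT row turns into NaN and forces a float result. Two possible ways to get integers are sketched below on a hypothetical two-row Series; `sampleSR`, `yearCleanSR` and `yearNullableSR` are illustrative names.

```python
import pandas as pd

# Hypothetical sample: one valid timestamp and one missing value
sampleSR = pd.to_datetime(pd.Series(["2025-10-01 08:30", None]))

# Option 1: drop NaT first, so the component keeps an integer dtype
yearCleanSR = sampleSR.dropna().dt.year

# Option 2: keep the missing row and convert to the nullable Int64 dtype
yearNullableSR = sampleSR.dt.year.astype("Int64")

print(yearCleanSR)
print(yearNullableSR)
```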


In [13]:
# 2) Output using a format string
print(tsSR.dt.strftime("%Y/%m/%d %H:%M")[0])

2025/10/01 08:30
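
`strftime()` works on the whole Series at once and returns strings, not datetimes. The sketch below, on a hypothetical two-row Series named `sampleSR`, shows that a missing timestamp stays missing rather than becoming a formatted string.

```python
import pandas as pd

# Hypothetical sample: one valid timestamp and one missing value
sampleSR = pd.to_datetime(pd.Series(["2025-10-01 08:30", None]))

# Each valid timestamp is rendered as text; the NaT row yields a missing value
labelSR = sampleSR.dt.strftime("%Y/%m/%d %H:%M")
print(labelSR)
```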


In [14]:
## 3) Calendar properties (month start / month end, etc.) as booleans
print(tsSR.dt.is_month_start)
print(tsSR.dt.is_month_end)

0     True
1    False
2    False
3    False
dtype: bool
0    False
1    False
2    False
3    False
dtype: bool
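
None of the sample dates above falls on the last day of a month, so `is_month_end` is False everywhere. The sketch below uses a hypothetical Series `edgeSR` with month-end dates, including a leap day, to show the True case and how the result can serve as a filter.

```python
import pandas as pd

# Hypothetical dates: a month start, a month end, and a leap-day month end
edgeSR = pd.Series(pd.to_datetime(["2025-10-01", "2025-10-31", "2024-02-29"]))

startMask = edgeSR.dt.is_month_start
endMask = edgeSR.dt.is_month_end

# The boolean result can be used directly for filtering
print(edgeSR[endMask])
```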
