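### Time series data
* Time series data
    - Data used to analyze and forecast how values change over time
    - Timestamp: a data type that represents a specific point in time
* Features
    - to_datetime: a function that converts data to a datetime type
    - to_period: converts dates to periods at a chosen resolution (year, month, day, and so on)
        * Options
            - freq selects the resolution of the resulting period
            - freq = A (year), M (month), D (day); newer pandas versions deprecate A in favor of Y
            - D keeps the full date (2020-01-01), M keeps the year and month (2020-01), and A keeps only the year (2020)
    - DatetimeIndex type
        * Exposes year, month, day, hour, minute, second, and similar attributes directly; a datetime Series offers the same attributes through the dt accessor (dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second, ...)
* For details, see the pandas reference:
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html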
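
The freq options for to_period described above can be sketched on a couple of made-up dates. The sample values and variable names here are illustrative assumptions, not part of the notebook's data.

```python
import pandas as pd

# Illustrative sample dates (not from the notebook's data)
dates = pd.Series(pd.to_datetime(["2020-01-01", "2020-02-15"]))

# to_period truncates each timestamp to the requested resolution
by_day = dates.dt.to_period("D").astype(str).tolist()
by_month = dates.dt.to_period("M").astype(str).tolist()
by_year = dates.dt.to_period("Y").astype(str).tolist()  # "A" is the older alias

print(by_day)    # ['2020-01-01', '2020-02-15']
print(by_month)  # ['2020-01', '2020-02']
print(by_year)   # ['2020', '2020']

# A DatetimeIndex exposes the same fields without the dt accessor
idx = pd.DatetimeIndex(dates)
print(idx.year.tolist(), idx.month.tolist())  # [2020, 2020] [1, 2]
```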

In [111]:
import pandas as pd
dates = ["2023-01-01", "2023-02-11", "2023-03-02", "2023-10-11"]
df = pd.DataFrame(dates, columns=['date'])
df

Unnamed: 0,date
0,2023-01-01
1,2023-02-11
2,2023-03-02
3,2023-10-11


In [112]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   date    4 non-null      object
dtypes: object(1)
memory usage: 164.0+ bytes


In [113]:
df['new_date'] = pd.to_datetime(df['date'])
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype         
---  ------    --------------  -----         
 0   date      4 non-null      object        
 1   new_date  4 non-null      datetime64[ns]
dtypes: datetime64[ns](1), object(1)
memory usage: 196.0+ bytes


In [114]:
df

Unnamed: 0,date,new_date
0,2023-01-01,2023-01-01
1,2023-02-11,2023-02-11
2,2023-03-02,2023-03-02
3,2023-10-11,2023-10-11
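
Real input often contains strings that are not valid dates. As a hedged sketch, the `format` and `errors` parameters of `pd.to_datetime` can make the conversion explicit and turn unparseable values into `NaT` instead of raising. The sample series `raw` is an assumption for illustration.

```python
import pandas as pd

# Illustrative input with one value that is not a date
raw = pd.Series(["2023-01-01", "2023-02-11", "not a date"])

# format states the expected layout; errors="coerce" yields NaT on failure
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

print(parsed.isna().tolist())  # [False, False, True]
```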


In [115]:
df['year'] = df['new_date'].dt.year
df['month'] = df['new_date'].dt.month
df['day'] = df['new_date'].dt.day
print(df.info())
df

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype         
---  ------    --------------  -----         
 0   date      4 non-null      object        
 1   new_date  4 non-null      datetime64[ns]
 2   year      4 non-null      int32         
 3   month     4 non-null      int32         
 4   day       4 non-null      int32         
dtypes: datetime64[ns](1), int32(3), object(1)
memory usage: 244.0+ bytes
None


Unnamed: 0,date,new_date,year,month,day
0,2023-01-01,2023-01-01,2023,1,1
1,2023-02-11,2023-02-11,2023,2,11
2,2023-03-02,2023-03-02,2023,3,2
3,2023-10-11,2023-10-11,2023,10,11


### Functions and lambdas used for conversion

In [116]:
def test(num):
    return num*10

In [117]:
number = 100
type( number )

int

In [118]:
type( test(number) )

int

In [119]:
lb = lambda x : x+1000
type(lb(1000))

int

In [120]:
lb(100)

1100

### Computing on the data and writing it back

In [121]:
dates = [1, 2, 3]
df = pd.DataFrame(dates, columns=['date'])
df

Unnamed: 0,date
0,1
1,2
2,3


In [122]:
df['date'] = df['date'].apply( lambda x : x+10 )
df

Unnamed: 0,date
0,11
1,12
2,13
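
For simple arithmetic like this, a column can also be operated on directly, which usually avoids the per-element function call that `apply` makes. The sketch below rebuilds the same three-row frame and checks that both approaches agree.

```python
import pandas as pd

df = pd.DataFrame({"date": [1, 2, 3]})

# Element-wise function call
via_apply = df["date"].apply(lambda x: x + 10)

# Vectorized arithmetic on the whole column
vectorized = df["date"] + 10

print(via_apply.tolist())             # [11, 12, 13]
print(via_apply.equals(vectorized))   # True
```

`apply` remains the more flexible choice when the per-element logic cannot be expressed as column arithmetic.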


### Without apply

In [123]:
# Update each row through .loc so the write reaches df itself
for idx in df.index:
    value = df.loc[idx, 'date']
    print(value)
    df.loc[idx, 'date'] = test(value)

11
12
13


In [124]:
df

Unnamed: 0,date
0,110
1,120
2,130


In [125]:
dates = ["2023-01-01", "2023-02-11", "2023-03-02", "2023-10-11"]
df = pd.DataFrame(dates, columns=['date'])
df

Unnamed: 0,date
0,2023-01-01
1,2023-02-11
2,2023-03-02
3,2023-10-11


In [126]:
print("before: ", type( df['date'][0] ))
datetimer = df['date'].apply( lambda x : pd.to_datetime(x) )
print("after: ", type( datetimer[0] ))
datetimer

before:  <class 'str'>
after:  <class 'pandas._libs.tslibs.timestamps.Timestamp'>


0   2023-01-01
1   2023-02-11
2   2023-03-02
3   2023-10-11
Name: date, dtype: datetime64[ns]

In [127]:
df['year'] = datetimer.apply( lambda x : x.year )
df['month'] = datetimer.apply( lambda x : x.month )
df['day'] = datetimer.apply( lambda x : x.day )
df['dayofweek'] = datetimer.apply( lambda x : x.dayofweek )
df

Unnamed: 0,date,year,month,day,dayofweek
0,2023-01-01,2023,1,1,6
1,2023-02-11,2023,2,11,5
2,2023-03-02,2023,3,2,3
3,2023-10-11,2023,10,11,2
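
The same columns can be derived without a per-row lambda by converting the column once and reading fields through the `dt` accessor. This is a minimal sketch on the same four dates; the `day_name` column is an added illustration and is not part of the notebook.

```python
import pandas as pd

df = pd.DataFrame({"date": ["2023-01-01", "2023-02-11", "2023-03-02", "2023-10-11"]})

# Convert once, then read fields through the dt accessor
parsed = pd.to_datetime(df["date"])
df["dayofweek"] = parsed.dt.dayofweek    # Monday=0 ... Sunday=6
df["day_name"] = parsed.dt.day_name()    # illustrative extra column

print(df["dayofweek"].tolist())  # [6, 5, 3, 2]
print(df["day_name"].tolist())   # ['Sunday', 'Saturday', 'Thursday', 'Wednesday']
```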


In [128]:
df_test = df[['date', 'year']]
df_test

Unnamed: 0,date,year
0,2023-01-01,2023
1,2023-02-11,2023
2,2023-03-02,2023
3,2023-10-11,2023


In [129]:
df_test = df_test.to_dict('list')
df_test

{'date': ['2023-01-01', '2023-02-11', '2023-03-02', '2023-10-11'],
 'year': [2023, 2023, 2023, 2023]}

In [130]:
df_test['date']

['2023-01-01', '2023-02-11', '2023-03-02', '2023-10-11']

In [131]:
df_test['date'][0]

'2023-01-01'

### The len function

In [132]:
len(df_test['date'])

4

In [133]:
# for i in range(0, len(df_test['date']), 1 ):
for i in range( len(df_test['date']) ):
    print(i)
    print( df_test['date'][i], ":", df_test['year'][i] )

0
2023-01-01 : 2023
1
2023-02-11 : 2023
2
2023-03-02 : 2023
3
2023-10-11 : 2023
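
When the index itself is not needed, the two lists can be walked in parallel with `zip` instead of `range(len(...))`. The dictionary below mirrors the shape of `df_test` after `to_dict('list')`; the name `data` is an assumption for this sketch.

```python
# Same shape as the dict produced by to_dict('list')
data = {
    "date": ["2023-01-01", "2023-02-11", "2023-03-02", "2023-10-11"],
    "year": [2023, 2023, 2023, 2023],
}

# zip pairs up the i-th elements of both lists
lines = [f"{date} : {year}" for date, year in zip(data["date"], data["year"])]
for line in lines:
    print(line)
```

If the position is also needed, `enumerate(zip(...))` provides it without manual indexing.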


In [134]:
df

Unnamed: 0,date,year,month,day,dayofweek
0,2023-01-01,2023,1,1,6
1,2023-02-11,2023,2,11,5
2,2023-03-02,2023,3,2,3
3,2023-10-11,2023,10,11,2


In [136]:
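# Not idempotent: 333 and 1234 are both > 2, so running this cell a second time sets every row to 1234
df['month'] = df['month'].apply( lambda x : 1234 if x > 2 else 333 )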

In [137]:
df

Unnamed: 0,date,year,month,day,dayofweek
0,2023-01-01,2023,1234,1,6
1,2023-02-11,2023,1234,11,5
2,2023-03-02,2023,1234,2,3
3,2023-10-11,2023,1234,11,2
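
A conditional mapping like this can also be written with `numpy.where`, which evaluates the condition on the whole column at once. The sketch below starts from the original month values and writes to a separate, hypothetical column `month_flag`, so the source values stay available.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"month": [1, 2, 3, 10]})

# np.where(condition, value_if_true, value_if_false), applied element-wise
df["month_flag"] = np.where(df["month"] > 2, 1234, 333)

print(df["month_flag"].tolist())  # [333, 333, 1234, 1234]
```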


In [None]:
df['month'].apply( lambda x : 1234 if x > 2 else 333 )

In [139]:
df['month'] == 1234

0    True
1    True
2    True
3    True
Name: month, dtype: bool

In [140]:
df['year'] == 2023

0    True
1    True
2    True
3    True
Name: year, dtype: bool

In [141]:
(df['month'] == 1234) & (df['year'] == 2023)

0    True
1    True
2    True
3    True
dtype: bool

In [142]:
df[ (df['month'] == 1234) & (df['year'] == 2023) ]

Unnamed: 0,date,year,month,day,dayofweek
0,2023-01-01,2023,1234,1,6
1,2023-02-11,2023,1234,11,5
2,2023-03-02,2023,1234,2,3
3,2023-10-11,2023,1234,11,2


In [143]:
df[ (df['month'] == 1234) & (df['year'] == 2023) ]['dayofweek']

0    6
1    5
2    3
3    2
Name: dayofweek, dtype: int64

### Assigning this way does not change the data

In [147]:
df[ (df['month'] == 1234) & (df['year'] == 2023) ]['dayofweek']= df[ (df['month'] == 1234) & (df['year'] == 2023) ]['dayofweek'].apply( lambda x : 1111 if x > 2 else 333 )

In [149]:
df

Unnamed: 0,date,year,month,day,dayofweek
0,2023-01-01,2023,1234,1,6
1,2023-02-11,2023,1234,11,5
2,2023-03-02,2023,1234,2,3
3,2023-10-11,2023,1234,11,2
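
The reason is that `df[mask]` returns a new object, so an assignment into it never reaches `df`. The sketch below makes that temporary explicit with `.copy()` on a small made-up frame (the column values are illustrative), then contrasts it with a single `.loc` assignment.

```python
import pandas as pd

# Illustrative frame, not the notebook's data
df = pd.DataFrame({"year": [2023, 2023, 2024], "dayofweek": [6, 5, 3]})
mask = df["year"] == 2023

# Boolean indexing produces a separate object; writing to it leaves df alone
subset = df[mask].copy()
subset["dayofweek"] = 0
print(df["dayofweek"].tolist())  # [6, 5, 3]

# A single .loc call selects and assigns on df itself
df.loc[mask, "dayofweek"] = 0
print(df["dayofweek"].tolist())  # [0, 0, 3]
```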


### Changing data with loc

In [150]:
df.loc[ (df['month'] == 1234) & (df['year'] == 2023), 'dayofweek' ]= df[ (df['month'] == 1234) & (df['year'] == 2023) ]['dayofweek'].apply( lambda x : 1111 if x > 2 else 333 )

In [151]:
df

Unnamed: 0,date,year,month,day,dayofweek
0,2023-01-01,2023,1234,1,1111
1,2023-02-11,2023,1234,11,1111
2,2023-03-02,2023,1234,2,1111
3,2023-10-11,2023,1234,11,333
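
Because the same condition appears on both sides of the assignment, it can be stored once in a variable and reused. This is a sketch that rebuilds only the relevant columns; the name `mask` is an assumption for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2023, 2023, 2023, 2023],
    "month": [1234, 1234, 1234, 1234],
    "dayofweek": [6, 5, 3, 2],
})

# Build the row filter once and reuse it for both the read and the write
mask = (df["month"] == 1234) & (df["year"] == 2023)
df.loc[mask, "dayofweek"] = df.loc[mask, "dayofweek"].apply(
    lambda x: 1111 if x > 2 else 333
)

print(df["dayofweek"].tolist())  # [1111, 1111, 1111, 333]
```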
