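### Time series data
* Time series data
    - Data used to analyze and forecast how values change over time
    - Timestamp: a data type that represents a specific point in time
* Features
    - to_datetime: a function that converts data to a datetime type
    - to_period: converts datetime data to a period with year, month, or day resolution.
        * Options
            - freq selects the resolution of the resulting period
            - freq = A (year), M (month), D (day)
            - With D the result looks like 2020-01-01, with M like 2020-01, and with A like 2020.
    - DatetimeIndex type
        * A datetime Series exposes these fields through the dt accessor (dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second, ...); a DatetimeIndex provides the same fields directly as attributes.
* For details, see the following page.
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html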
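
The freq options described for to_period can be tried on a small Series. The following is a minimal sketch, assuming a hypothetical two-row sample that is not part of the notebook. It uses `Y` as the yearly alias, on the assumption that it yields the same yearly period as `A`; recent pandas releases appear to warn about `A`.

```python
import pandas as pd

# Hypothetical sample dates, chosen only for illustration.
s = pd.to_datetime(pd.Series(["2020-01-01", "2021-07-15"]))

# to_period is reached through the .dt accessor on a datetime Series.
by_day = s.dt.to_period("D")    # daily resolution
by_month = s.dt.to_period("M")  # monthly resolution
by_year = s.dt.to_period("Y")   # yearly resolution

print(by_day.astype(str).tolist())
print(by_month.astype(str).tolist())
print(by_year.astype(str).tolist())
```

Each result is a Series of Period values, so the chosen resolution is kept as a type rather than as a formatted string.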

In [3]:
import pandas as pd
dates=["2023-01-01", "2023-03-02","1999-08-27"]
df = pd.DataFrame(dates, columns=['date'])
df

Unnamed: 0,date
0,2023-01-01
1,2023-03-02
2,1999-08-27


In [4]:
df.info()  # reports columns, non-null counts, and dtypes in one call

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   date    3 non-null      object
dtypes: object(1)
memory usage: 156.0+ bytes


In [5]:
df['new_date']= pd.to_datetime(df['date'])
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype         
---  ------    --------------  -----         
 0   date      3 non-null      object        
 1   new_date  3 non-null      datetime64[ns]
dtypes: datetime64[ns](1), object(1)
memory usage: 180.0+ bytes


In [6]:
df

Unnamed: 0,date,new_date
0,2023-01-01,2023-01-01
1,2023-03-02,2023-03-02
2,1999-08-27,1999-08-27


In [8]:
df['year'] = df['new_date'].dt.year
df['month'] = df['new_date'].dt.month
df['day'] = df['new_date'].dt.day
print(df.info())
df

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype         
---  ------    --------------  -----         
 0   date      3 non-null      object        
 1   new_date  3 non-null      datetime64[ns]
 2   year      3 non-null      int32         
 3   month     3 non-null      int32         
 4   day       3 non-null      int32         
dtypes: datetime64[ns](1), int32(3), object(1)
memory usage: 216.0+ bytes
None


Unnamed: 0,date,new_date,year,month,day
0,2023-01-01,2023-01-01,2023,1,1
1,2023-03-02,2023-03-02,2023,3,2
2,1999-08-27,1999-08-27,1999,8,27


In [None]:
## Type conversion (lambda functions)

In [10]:
def test(num):
    # Convert the given value to its string representation.
    return str(num)

In [11]:
number = 100
type(number)

int

In [12]:
type(test(number))

str

In [13]:
lb = lambda x : str(x)

In [14]:
type(lb(1000))

str

In [15]:
lb = lambda x : x+1000
type(lb(1000))

int

In [16]:
lb(1000)

2000

In [17]:
dates=[1,2,3]
df =pd.DataFrame(dates, columns=['date'])
df

Unnamed: 0,date
0,1
1,2
2,3


In [19]:
df['date'] = df['date'].apply(lambda x : x+10)
df

Unnamed: 0,date
0,11
1,12
2,13
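
For simple arithmetic like this, the lambda is not strictly needed, because pandas applies operators element by element. A minimal sketch, assuming a separate demo frame (`df_demo`, a hypothetical name) so the notebook's `df` is not affected:

```python
import pandas as pd

df_demo = pd.DataFrame([1, 2, 3], columns=["date"])

# Element-wise lambda, as in the cell above.
via_apply = df_demo["date"].apply(lambda x: x + 10)

# Vectorized arithmetic on the whole column.
via_vector = df_demo["date"] + 10

print(via_apply.equals(via_vector))
```

`apply` remains useful when the per-element logic cannot be written as a column operation.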


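In [27]:
converted = []
for value in df['date']:
    print(value)
    converted.append(test(value))  # convert each value to str
df['date'] = converted  # assign the whole column at once instead of chained indexing

11
12
13


In [28]:
df

Unnamed: 0,date
0,11
1,12
2,13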
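
Converting a whole column to strings does not require a loop. Writing to `df['date'][count]` inside a loop is chained indexing, which may modify a temporary object rather than the DataFrame, and incrementing the counter before using it shifts every write by one row. A minimal sketch of the column-level alternative, assuming a hypothetical demo frame:

```python
import pandas as pd

df_demo = pd.DataFrame([11, 12, 13], columns=["date"])

# astype converts every element of the column in one call.
as_text = df_demo["date"].astype(str)

print(type(as_text.iloc[0]))
print(as_text.tolist())
```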


In [29]:
dates=["2023-01-01", "2023-03-02","1999-08-27"]
df = pd.DataFrame(dates, columns=['date'])
df

Unnamed: 0,date
0,2023-01-01
1,2023-03-02
2,1999-08-27


In [31]:
print("Before: ", type(df['date'][0]))
datetimer = df['date'].apply(lambda x : pd.to_datetime(x))
print("After: ", type(datetimer[0]))
datetimer

Before:  <class 'str'>
After:  <class 'pandas._libs.tslibs.timestamps.Timestamp'>


0   2023-01-01
1   2023-03-02
2   1999-08-27
Name: date, dtype: datetime64[ns]

In [34]:
df['year']=datetimer.apply(lambda x : x.year)
df['month']=datetimer.apply(lambda x : x.month)
df['day']=datetimer.apply(lambda x : x.day)
df['dayofweek']=datetimer.apply(lambda x : x.dayofweek)
df

Unnamed: 0,date,year,month,day,dayofweek
0,2023-01-01,2023,1,1,6
1,2023-03-02,2023,3,2,3
2,1999-08-27,1999,8,27,4
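
The dayofweek values count from Monday = 0 to Sunday = 6, so 6 is a Sunday. The sketch below checks this with `dt.day_name()` and reaches the same fields through the `.dt` accessor instead of a lambda; it assumes the same three dates and the default English locale.

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(["2023-01-01", "2023-03-02", "1999-08-27"]))

# Same values as the lambda version, read through the .dt accessor.
dow = dates.dt.dayofweek.tolist()

# Day names make the numeric codes easier to read.
names = dates.dt.day_name().tolist()

print(dow)
print(names)
```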


In [38]:
df_test=df[['date', 'year']]
df_test

Unnamed: 0,date,year
0,2023-01-01,2023
1,2023-03-02,2023
2,1999-08-27,1999


In [39]:
df_test = df_test.to_dict('list')
df_test

{'date': ['2023-01-01', '2023-03-02', '1999-08-27'],
 'year': [2023, 2023, 1999]}

In [40]:
df_test['date']

['2023-01-01', '2023-03-02', '1999-08-27']

In [41]:
df_test['date'][0]

'2023-01-01'

In [44]:
len(df_test['date'])

3

In [45]:
for i in range(len(df_test['date'])):  # like a C-style loop: i = 0; i < len; i++
    print(i)
    print( df_test['date'][i], ":", df_test['year'][i])

0
2023-01-01 : 2023
1
2023-03-02 : 2023
2
1999-08-27 : 1999
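
When the index itself is not needed, the two lists can be walked together with `zip`. A minimal sketch, assuming a plain dict (`records`, a hypothetical name) with the same contents as `df_test`:

```python
records = {
    "date": ["2023-01-01", "2023-03-02", "1999-08-27"],
    "year": [2023, 2023, 1999],
}

lines = []
# zip pairs the i-th date with the i-th year, so no index variable is needed.
for date, year in zip(records["date"], records["year"]):
    lines.append(f"{date} : {year}")

for line in lines:
    print(line)
```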


In [46]:
df

Unnamed: 0,date,year,month,day,dayofweek
0,2023-01-01,2023,1,1,6
1,2023-03-02,2023,3,2,3
2,1999-08-27,1999,8,27,4


In [47]:
df['month']= df['month'].apply(lambda x : 1234 if x>2 else 333)

In [48]:
df

Unnamed: 0,date,year,month,day,dayofweek
0,2023-01-01,2023,333,1,6
1,2023-03-02,2023,1234,2,3
2,1999-08-27,1999,1234,27,4


In [49]:
df['month'].apply(lambda x : 1234 if x>2 else 333)

0    1234
1    1234
2    1234
Name: month, dtype: int64
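
Applying the same lambda a second time returns 1234 for every row, because the replaced values 333 and 1234 are both greater than 2. The conditional replacement itself can also be written without a lambda. A minimal sketch using `numpy.where`, assuming the original month values 1, 3, and 8:

```python
import numpy as np
import pandas as pd

month = pd.Series([1, 3, 8], name="month")

# Element-wise conditional, as in the notebook.
via_apply = month.apply(lambda x: 1234 if x > 2 else 333)

# np.where picks 1234 where the condition holds and 333 elsewhere.
via_where = pd.Series(np.where(month > 2, 1234, 333), name="month")

print(via_apply.tolist())
print(via_where.tolist())
```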

In [50]:
df['year']==2023

0     True
1     True
2    False
Name: year, dtype: bool

In [51]:
(df['month']== 1234)& (df['year']==2023)

0    False
1     True
2    False
dtype: bool

In [52]:
df[(df['month'] == 1234) & (df['year']==2023)]

Unnamed: 0,date,year,month,day,dayofweek
1,2023-03-02,2023,1234,2,3


In [53]:
df[(df['month'] == 1234) & (df['year']==2023)]['dayofweek']

1    3
Name: dayofweek, dtype: int64

In [55]:
df[(df['month'] == 1234) & (df['year']==2023)]['dayofweek'].apply(lambda x : 1111 if x >2 else 333)

1    1111
Name: dayofweek, dtype: int64

In [56]:
df

Unnamed: 0,date,year,month,day,dayofweek
0,2023-01-01,2023,333,1,6
1,2023-03-02,2023,1234,2,3
2,1999-08-27,1999,1234,27,4


In [58]:
mask = (df['month'] == 1234) & (df['year'] == 2023)  # rows to update
df.loc[mask, 'dayofweek'] = df.loc[mask, 'dayofweek'].apply(lambda x : 1111 if x > 2 else 333)
df

Unnamed: 0,date,year,month,day,dayofweek
0,2023-01-01,2023,333,1,6
1,2023-03-02,2023,1234,2,1111
2,1999-08-27,1999,1234,27,4
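
The difference between the last two steps is that `apply` on a filtered selection only returns a new Series, while assigning through `.loc` writes the result back. A minimal sketch, assuming a hypothetical demo frame with the same relevant columns:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "year": [2023, 2023, 1999],
    "month": [333, 1234, 1234],
    "dayofweek": [6, 3, 4],
})
mask = (df_demo["month"] == 1234) & (df_demo["year"] == 2023)

# apply returns a new Series; df_demo is untouched at this point.
replaced = df_demo.loc[mask, "dayofweek"].apply(lambda x: 1111 if x > 2 else 333)
unchanged = df_demo["dayofweek"].tolist()

# Writing the result back through .loc updates only the masked rows.
df_demo.loc[mask, "dayofweek"] = replaced
updated = df_demo["dayofweek"].tolist()

print(unchanged)
print(updated)
```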
