<a href="https://colab.research.google.com/github/chaeyeongSon/pdm09/blob/master/py-pandas/pandas_1_intro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Python module 3. **pandas**

# Using pandas

* [10 Minutes to pandas](https://pandas.pydata.org/pandas-docs/stable/10min.html)
* [Pandas tutorial with interactive exercises](https://www.kaggle.com/pistak/pandas-tutorial-with-interactive-exercises)

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# %matplotlib inline  # works in Jupyter Notebook or JupyterLab

## [1] Make data: Series and DataFrame
> pandas data structures
- Series
- DataFrame

### Series
> One-dimensional data

In [None]:
# Creating a Series by passing a list of values
s = pd.Series([1,3,5,np.nan,6,8,'you'])  # store the list values 1, 3, ..., 'you' in s
s  # np.nan = NaN (Not a Number)

0      1
1      3
2      5
3    NaN
4      6
5      8
6    you
dtype: object
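The `dtype: object` in the output above comes from mixing numbers with the string `'you'`. A minimal sketch of the difference, assuming the same values with and without the string (the names `numeric` and `mixed` are illustrative, not from the notebook):

```python
import numpy as np
import pandas as pd

# Only numbers and NaN (NaN is itself a float), so pandas chooses float64.
numeric = pd.Series([1, 3, 5, np.nan, 6, 8])

# A single string among the numbers forces the generic object dtype.
mixed = pd.Series([1, 3, 5, np.nan, 6, 8, 'you'])

print(numeric.dtype)  # float64
print(mixed.dtype)    # object
```

An object Series stores arbitrary Python objects, so numeric operations on it are typically slower than on a float64 Series.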

In [None]:
type(s)  # s is a pandas Series object

pandas.core.series.Series

In [None]:
# indexing & slicing of a Series
s[0], s[:3]  # s[0] = 1; s[-1:] is shown in a later cell

(1, 0    1
 1    3
 2    5
 dtype: object)

In [None]:
s[:3]  # a Series is displayed with its index alongside its values

0    1
1    3
2    5
dtype: object

In [None]:
s[-1:]  # s[-1] raises a KeyError here because -1 is not a label in the integer index; use a slice (s[-1:]) or s.iloc[-1]

6    you
dtype: object
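Position-based access is less ambiguous with `.iloc`, which always counts positions and never looks up labels. A small sketch, assuming the same Series `s` as above (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8, 'you'])

last_value = s.iloc[-1]    # a single element, selected by position
last_slice = s.iloc[-1:]   # a length-1 Series that keeps its index label 6
first_three = s.iloc[:3]   # the first three elements

print(last_value)          # you
print(len(last_slice))     # 1
```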

### Uses of a one-dimensional Series
- Forms a column of a two-dimensional DataFrame
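As a rough illustration of that relationship, the sketch below joins two Series into a DataFrame and then pulls one column back out as a Series. The data and the names `height`, `weight`, and `people` are hypothetical.

```python
import pandas as pd

# Two hypothetical Series; each name becomes a column label.
height = pd.Series([170, 165, 180], name='height')
weight = pd.Series([65, 58, 77], name='weight')

# Place the Series side by side as columns.
people = pd.concat([height, weight], axis=1)

# Selecting a single column returns a Series again.
col = people['height']
print(type(col).__name__)  # Series
```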

In [None]:
# Creating a DataFrame by passing a NumPy array, with a datetime index and labeled columns:
dates = pd.date_range('20200928', periods=6)  # date_range: generates a range of dates; periods: number of dates to generate
dates

DatetimeIndex(['2020-09-28', '2020-09-29', '2020-09-30', '2020-10-01',
               '2020-10-02', '2020-10-03'],
              dtype='datetime64[ns]', freq='D')

In [None]:
type(dates)  # a pandas DatetimeIndex object

pandas.core.indexes.datetimes.DatetimeIndex
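`pd.date_range` can also be driven by an end date or by a different step. A short sketch, assuming the same start date as above (`ranged` and `every_7_days` are illustrative names):

```python
import pandas as pd

# Start plus a number of periods; the step defaults to one day.
dates = pd.date_range('20200928', periods=6)

# The same six days, specified by start and end instead of periods.
ranged = pd.date_range(start='2020-09-28', end='2020-10-03')

# A different step: one date every 7 days.
every_7_days = pd.date_range('2020-09-28', periods=3, freq='7D')

print(dates.equals(ranged))  # True
print(every_7_days[1])       # 2020-10-05 00:00:00
```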

In [None]:
# Make dataframe using an array with random numbers
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
# Build a 6x4 array of random numbers; use dates as the index and 'A', 'B', 'C', 'D' as the column labels
df

| | A | B | C | D |
|---|---|---|---|---|
| 2020-09-28 | 1.55011 | -2.162304 | 1.105802 | -0.801202 |
| 2020-09-29 | -1.007182 | 0.839841 | -1.311886 | -0.56513 |
| 2020-09-30 | 0.330995 | -1.695883 | -2.507643 | 0.704404 |
| 2020-10-01 | 0.206262 | -0.023656 | -0.64288 | 0.800684 |
| 2020-10-02 | 0.954752 | -0.813296 | -0.738221 | 0.296182 |
| 2020-10-03 | -1.22332 | 0.13533 | 0.290725 | -0.777037 |
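Because `np.random.randn` draws new numbers on every run, the values above will differ each time the cell executes. One possible way to make the table reproducible is a seeded generator; the seed value 42 and the names `df_seeded` and `df_again` are arbitrary choices for this sketch, not part of the original notebook.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20200928', periods=6)

# A seeded generator yields the same "random" numbers on every run.
rng = np.random.default_rng(seed=42)
df_seeded = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list('ABCD'))

# Re-creating the generator with the same seed reproduces the table exactly.
rng_again = np.random.default_rng(seed=42)
df_again = pd.DataFrame(rng_again.standard_normal((6, 4)), index=dates, columns=list('ABCD'))

print(df_seeded.equals(df_again))  # True
```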


In [None]:
# check the dtype of each column in df --> all columns share the same dtype
df.dtypes

A    float64
B    float64
C    float64
D    float64
dtype: object

In [None]:
type(df)

pandas.core.frame.DataFrame

### DataFrame
- Two-dimensional data
- Multidimensional data

In [None]:
# Creating a DataFrame by passing a dict of objects that can be converted to series-like.
df2 = pd.DataFrame({ 'A' : 1.,  
                    'B' : pd.Timestamp('20200928'), 
                    'C' : pd.Series(1,index=list(range(4)),dtype='float32'),
                    'D' : np.array([3] * 4,dtype='int32'),
                    'E' : pd.Categorical(["test","train","test","train"]),
                    'F' : 'foo' })
# key : value pairs - a dictionary structure
# C, D, and E each hold 4 values, so the scalar values (A, B, F) are expanded to 4 rows -> broadcasting

In [None]:
df2

| | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| 0 | 1.0 | 2020-09-28 | 1.0 | 3 | test | foo |
| 1 | 1.0 | 2020-09-28 | 1.0 | 3 | train | foo |
| 2 | 1.0 | 2020-09-28 | 1.0 | 3 | test | foo |
| 3 | 1.0 | 2020-09-28 | 1.0 | 3 | train | foo |
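The repeated scalars in the table above can be explored with a smaller sketch. It assumes a two-column dictionary and also shows two cases where the expansion cannot happen; `ok`, `mismatch_raised`, and `only_scalars` are illustrative names.

```python
import numpy as np
import pandas as pd

# A scalar is repeated to match the length-4 array.
ok = pd.DataFrame({'A': 1.0, 'D': np.array([3] * 4, dtype='int32')})
print(ok.shape)  # (4, 2)

# Arrays of different lengths cannot be reconciled, so pandas raises.
mismatch_raised = False
try:
    pd.DataFrame({'x': np.arange(4), 'y': np.arange(3)})
except ValueError as err:
    mismatch_raised = True
    print('ValueError:', err)

# With only scalars there is no length to expand to, so an index must be given.
only_scalars = pd.DataFrame({'A': 1.0, 'F': 'foo'}, index=range(4))
print(only_scalars.shape)  # (4, 2)
```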


In [None]:
# check the dtype of each column in df2 --> the columns have different dtypes
df2.dtypes

A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object
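Since each column carries its own dtype, columns can be filtered by type. The sketch below rebuilds `df2` as defined above and uses `DataFrame.select_dtypes`, which the original notebook does not use; `numeric_cols` and `categorical_cols` are illustrative names.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20200928'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo'})

# Keep only the numeric columns (float64, float32, int32 here).
numeric_cols = df2.select_dtypes(include='number')
print(list(numeric_cols.columns))  # ['A', 'C', 'D']

# Keep only the categorical column.
categorical_cols = df2.select_dtypes(include='category')
print(list(categorical_cols.columns))  # ['E']
```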

In [None]:
type(df2)

pandas.core.frame.DataFrame



---

