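# pandas Data Structures
- pandas has two main data structures: DataFrame and Series.
    - DataFrame: 2-dimensional (a table)
    - Series: 1-dimensional
    - A DataFrame is one or more Series combined, one per column.
    
- dtype 
    - The data type of the values in a column
    - Check the dtype of every column with `DataFrame.dtypes`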
    ```
    ----------------------------------
     dtype            Python type
    ----------------------------------
     int64            int
     float64          float
     bool             bool
     object (mixed)   str
     datetime64       datetime
     category         -
    ----------------------------------
    ```
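
To make the relationship between Series and DataFrame concrete, here is a minimal sketch that builds a DataFrame from two Series and inspects `dtypes`. The names `counts`, `ratios`, and `frame` and their values are illustrative assumptions, not part of the notes above.

```python
import pandas as pd

# Two 1-dimensional Series (illustrative names and values)
counts = pd.Series([1, 2, 3], name="count")
ratios = pd.Series([0.5, 1.5, 2.5], name="ratio")

# Placing Series side by side (axis=1) yields a 2-dimensional DataFrame
frame = pd.concat([counts, ratios], axis=1)

# Selecting a single column gives back a Series
column = frame["count"]

# dtypes reports one dtype per column
print(frame.dtypes)
print(type(column).__name__)
```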

In [57]:
import pandas as pd


df = pd.DataFrame(
    [
        range(1, 7, 2),
        range(2, 7, 2),
        range(10, 31, 10)
    ],
    
    columns=[f'co{i}' for i in range(1,4)]
)
df

,co1,co2,co3
0,1,3,5
1,2,4,6
2,10,20,30


In [58]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   co1     3 non-null      int64
 1   co2     3 non-null      int64
 2   co3     3 non-null      int64
dtypes: int64(3)
memory usage: 200.0 bytes


In [59]:
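# The nested list has length 1 (one inner list), not 3, so this raises an error.
# '오십' is the Korean word for "fifty".
df.loc[7] = [[50,50,'오십']]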

ValueError: cannot set a row with mismatched columns
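
The error above arises because `[[50, 50, '오십']]` is a list with a single element (the inner list), while the DataFrame has three columns. A minimal sketch reproducing the length mismatch, using an illustrative DataFrame `demo` and integer-only rows:

```python
import pandas as pd

demo = pd.DataFrame([[1, 3, 5], [2, 4, 6]], columns=["co1", "co2", "co3"])

nested = [[50, 50, 60]]   # outer list has length 1
flat = [50, 50, 60]       # length 3, one value per column

print(len(nested), len(flat), len(demo.columns))

# Assigning the nested list to a new row label fails the length check
try:
    demo.loc[7] = nested
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```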

## Length: number of rows

In [60]:
len(df)

3

In [61]:
df.shape

(3, 3)

In [62]:
# Number of rows
df.shape[0]

3

In [63]:
# Number of columns
df.shape[1]

3
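
The three ways of counting shown above relate to each other as follows. This is a small sketch with an illustrative 3x2 DataFrame (`sample` is an assumed name), chosen so that rows and columns differ in number:

```python
import pandas as pd

sample = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

n_rows, n_cols = sample.shape           # shape is a (rows, columns) tuple

assert len(sample) == n_rows            # len() counts rows
assert len(sample.columns) == n_cols    # columns are counted via .columns
assert sample.size == n_rows * n_cols   # size is the total number of cells

print(n_rows, n_cols)
```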

In [64]:
# Append a row at the end ('오십' is the Korean word for "fifty")
df.loc[len(df)] = [50,50,'오십']
df

,co1,co2,co3
0,1,3,5
1,2,4,6
2,10,20,30
3,50,50,오십
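
`df.loc[len(df)] = ...` appends only while the index is the default 0..n-1 range. If a row labelled `len(df)` already exists, that row is overwritten instead. The sketch below shows that case and one possible alternative using `pd.concat`. The names `shifted`, `base`, `new_row`, and `extended` are illustrative.

```python
import pandas as pd

# Index starts at 1, so the label len(shifted) == 2 already exists
shifted = pd.DataFrame({"x": [10, 20]}, index=[1, 2])
shifted.loc[len(shifted)] = [99]        # overwrites row 2, nothing is appended

base = pd.DataFrame([[1, 3, 5], [2, 4, 6]], columns=["co1", "co2", "co3"])

# Build the new row as a one-row DataFrame with the same columns
new_row = pd.DataFrame([[50, 50, 60]], columns=base.columns)

# ignore_index=True renumbers the result 0..n-1, so the row is always appended
extended = pd.concat([base, new_row], ignore_index=True)

print(shifted)
print(extended)
```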


In [65]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   co1     4 non-null      int64 
 1   co2     4 non-null      int64 
 2   co3     4 non-null      object
dtypes: int64(2), object(1)
memory usage: 128.0+ bytes


In [66]:
type(df.co3[3])

str
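
An `object` column stores references to arbitrary Python objects, so each element keeps its own type. A minimal sketch (the Series `mixed` is illustrative) that lists the type of every element:

```python
import pandas as pd

# Integers and one string together force the object dtype
mixed = pd.Series([5, 6, 30, "오십"], name="co3")

# Apply the built-in type() to each element
element_types = mixed.map(type).tolist()

print(mixed.dtype)
print(element_types)
```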

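## Changing a co2 value to a float
- A column has a single dtype, so once a float is assigned to an integer column, every value in that column is converted to float.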

In [68]:
df.loc[2, 'co2'] = 20.5
df

,co1,co2,co3
0,1,3.0,5
1,2,4.0,6
2,10,20.5,30
3,50,50.0,오십
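
Storing `20.5` requires the whole column to become `float64`, which is why `3` and `4` are now shown as `3.0` and `4.0`. Recent pandas versions may warn about, or reject, this kind of implicit upcast on assignment, so one hedged alternative is to convert the column explicitly first. `scores` is an illustrative name.

```python
import pandas as pd

scores = pd.DataFrame({"co2": [3, 4, 20, 50]})
dtype_before = scores["co2"].dtype      # an integer dtype

# Convert the column explicitly, then assign the float value
scores["co2"] = scores["co2"].astype("float64")
scores.loc[2, "co2"] = 20.5

print(dtype_before, "->", scores["co2"].dtype)
print(scores["co2"].tolist())
```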


In [48]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   co1     5 non-null      int64  
 1   co2     5 non-null      float64
 2   co3     5 non-null      object 
dtypes: float64(1), int64(1), object(1)
memory usage: 332.0+ bytes


# Missing values (NaN: Not a Number) (seen often)

## Replace the value 2 at column co1, index 1 with a missing value


In [98]:
import numpy as np
# Chained form (avoid): df.co1[1] = np.nan
# Single-step .loc assignment; NaN is a float, so co1 becomes float64
df.loc[1,'co1'] = np.nan
df

,co1,co2,co3,1
0,1.0,3.0,5,
1,,4.0,6.5,
2,10.0,20.5,30,
3,50.0,50.0,오십,
co3,,,,6.7
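
`np.nan` is itself a float, which is why an integer column holding it is displayed as `1.0`, `NaN`, `10.0`. As a sketch of an alternative, pandas also provides the nullable `Int64` extension dtype, which keeps integers alongside a missing marker (`pd.NA`). The names below are illustrative.

```python
import numpy as np
import pandas as pd

ints = pd.Series([1, 2, 10, 50], name="co1")

# Classic approach: a float column with NaN as the missing marker
as_float = ints.astype("float64")
as_float.loc[1] = np.nan

# Nullable integer approach: values stay integers, missing is pd.NA
as_nullable = ints.astype("Int64")
as_nullable.loc[1] = pd.NA

print(as_float.dtype, as_float.isna().sum())
print(as_nullable.dtype, as_nullable.isna().sum())
```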


In [97]:
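# Chained indexing: selects the column first, then assigns; triggers the warning below
df.co3[1] = 6.5
# Here 'co3' is read as a row label and 1 as a column label,
# so this adds a new row 'co3' and a new column 1 instead of updating a cell
df.loc['co3',[1]] = 6.7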


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df.co3[1] = 6.5


AttributeError: __delitem__
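
The warning above is triggered by chained indexing (`df.co3[1] = ...`), where the column is selected first and then assigned into. A minimal sketch of the single-step form, `df.loc[row_label, column_label]`, using an illustrative DataFrame `table`:

```python
import pandas as pd

table = pd.DataFrame({
    "co1": [1, 2, 10, 50],
    "co3": [5, 6, 30, "오십"],   # mixed values, so the dtype is object
})

# Row label first, column label second: updates exactly one cell
table.loc[1, "co3"] = 6.5

print(table)
print(table.shape)
```

Swapping the order, as in `df.loc['co3', [1]] = 6.7`, makes pandas look for a row labelled `'co3'` and a column labelled `1`. Neither exists, so both are created, which is where the extra row and column in the earlier output come from.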

In [88]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   co1     3 non-null      object 
 1   co2     4 non-null      float64
 2   co3     4 non-null      object 
dtypes: float64(1), object(2)
memory usage: 300.0+ bytes


In [93]:
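# get_loc returns the integer position of the column (0 here), not its values,
# so comparing it with the string 'NaN' is always False
df.columns.get_loc('co1') == 'NaN'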

False
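
`get_loc` returns a column's position rather than its contents, and NaN never compares equal to anything, including itself. Missing values are therefore usually detected with `pd.isna` or `.isna()`. A short sketch with illustrative names:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"co1": [1.0, np.nan, 10.0], "co2": [3.0, 4.0, 20.5]})

position = data.columns.get_loc("co1")          # integer position of the column
nan_equals_itself = np.nan == np.nan            # equality cannot detect NaN
cell_is_missing = pd.isna(data.loc[1, "co1"])   # test a single cell
missing_per_column = data.isna().sum()          # count missing values per column

print(position, nan_equals_itself, cell_is_missing)
print(missing_per_column)
```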