**Required libraries**

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib

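**Korean font setup for matplotlib**

- Configures Korean text support according to the operating system. Supports Windows, Ubuntu, and Google Colab.
- Reference: [Supporting Korean in matplotlib](https://github.com/codingalzi/datapy/blob/master/matplotlib-korean.md)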

In [7]:
import platform

if platform.system() == 'Windows': # Windows
    from matplotlib import font_manager, rc
    font_path = "C:/Windows/Fonts/NGULIM.TTF"
    font = font_manager.FontProperties(fname=font_path).get_name()
    rc('font', family=font)
elif platform.system() == 'Linux': # Ubuntu or Google Colab
    # Run the commented-out commands below once to install the Nanum fonts
#     if 'google.colab' in str(get_ipython()):
#         !apt-get install -y fonts-nanum*
#     else:
#         !sudo apt-get install -y fonts-nanum*
#     !fc-cache -fv
    
    applyfont = "NanumBarunGothic"
    import matplotlib.font_manager as fm
    if not any(map(lambda ft: ft.name == applyfont, fm.fontManager.ttflist)):
        fm.fontManager.addfont("/usr/share/fonts/truetype/nanum/NanumBarunGothic.ttf")
    plt.rc("font", family=applyfont)
    plt.rc("axes", unicode_minus=False)
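
A quick way to confirm that the font setup took effect is to ask matplotlib's font manager whether the font is registered. The sketch below is only an illustration: `font_is_registered` is a hypothetical helper name, not part of matplotlib, and the font names are examples that should be replaced with whatever is installed on your system.

```python
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm


def font_is_registered(name):
    """Return True if matplotlib's font manager lists a font with this family name."""
    return any(ft.name == name for ft in fm.fontManager.ttflist)


# "DejaVu Sans" is bundled with matplotlib, so it should always be found.
print(font_is_registered("DejaVu Sans"))

# Example check for the font used in the Linux branch above;
# the result depends on whether the Nanum fonts are installed.
print(font_is_registered("NanumBarunGothic"))

# Some fonts lack the Unicode minus sign (U+2212), so use the ASCII hyphen instead.
plt.rc("axes", unicode_minus=False)
print(plt.rcParams["axes.unicode_minus"])
```

If the second check prints `False`, the font has not been registered yet and Korean labels will likely render as empty boxes.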
    

**Data**

The data is organized by year.

In [3]:
base_url = "https://github.com/codingalzi/water-data/raw/master/reservoirs/"

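**Loading the Seungchonbo-Gwangsan (Excel) data into a DataFrame**

- `header=0`: use row 0 as the header, i.e. as the column index.
- `sheet_name=None`: load every worksheet, creating one DataFrame per sheet; the return value is a dictionary. The call below does not pass this option, so only the first worksheet is read.
- `na_values=0`: treat values entered as 0 as missing as well.
- `index_col=1`: use the measurement date as the row index.
- `parse_dates=True`: parse the dates used as the row index.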
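
The `header`, `na_values`, `index_col`, and `parse_dates` options are also accepted by `pd.read_csv`, so their effect can be tried on a small in-memory table without downloading the Excel file. The sketch below uses made-up data and hypothetical column names (`site`, `date`, `temp`, `flow`); it is not the actual layout of the workbook.

```python
import io
import pandas as pd

# Hypothetical miniature table: column 1 holds the measurement date,
# and one flow value was entered as 0.
csv_text = """site,date,temp,flow
A,2021-02-01,7.7,31.74
A,2021-02-08,7.4,0
A,2021-02-15,10.4,12.286
"""

toy = pd.read_csv(io.StringIO(csv_text),
                  header=0,          # row 0 supplies the column names
                  na_values=0,       # a value of 0 is read as missing (NaN)
                  index_col=1,       # the "date" column becomes the row index
                  parse_dates=True)  # parse the index as dates

print(type(toy.index).__name__)   # kind of index that was built
print(toy["flow"].isna().sum())   # number of flow values read as missing
```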

Note: you may need to install the following module first.

```python
!pip install openpyxl
```

In [25]:
scb_gw = pd.read_excel(base_url+"Seungchonbo-Gwangsan.xlsx",
                       header=0,
                       na_values=0,
                       index_col=1,
                       parse_dates=True)

In [32]:
# Drop the first two columns and keep the 20 measurement columns
scb_gw = scb_gw.iloc[:, 2:].copy()

In [34]:
scb_gw

년/월/일,수온(℃),DO(㎎/L),BOD(㎎/L),COD(㎎/L),클로로필 a(㎎/㎥),TN(㎎/L),TP(㎎/L),TOC(㎎/L),수소이온농도,페놀류(㎎/L),전기전도도(μS/㎝),총대장균군수(총대장균군수/100ml),용존총질소(㎎/L),암모니아성 질소(㎎/L),질산성 질소(㎎/L),용존총인(㎎/L),인산염인(㎎/L),SS(㎎/L),분원성대장균군수,유량(㎥/s)
2021-02-01,7.7,12.1,4.8,8.7,63.5,9.422,0.150,5.3,7.2,0,484,5600,8.830,5.695,3.112,0.032,0.009,12.5,800,31.740
2021-02-08,7.4,11.8,4.5,8.7,51.7,8.182,0.103,5.3,7.4,0,463,4800,8.000,5.075,2.850,0.033,0.008,11.9,210,13.071
2021-02-15,10.4,14.1,5.1,10.7,91.8,11.109,0.133,6.1,7.5,0,474,1600,9.787,7.131,2.632,0.038,0.006,10.4,130,12.286
2021-02-22,9.6,12.2,4.4,9.0,52.7,9.629,0.127,5.6,7.2,0,584,2200,9.603,5.933,3.347,0.032,0.007,12.1,250,11.338
2021-03-02,10.7,8.5,3.9,9.3,32.8,4.568,0.288,7.9,7.1,0,278,82000,4.121,1.266,1.937,0.112,0.095,39.0,7200,94.936
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2006-08-10,28.8,5.7,4.8,6.8,0.4,12.864,0.301,,7.9,0,226,19700,10.260,0.018,0.035,0.251,0.197,13.2,3000,
2006-09-14,22.0,7.6,6.2,4.0,0.3,6.972,0.249,,7.9,0,220,29000,5.916,0.013,5.286,0.211,0.208,9.2,5400,
2006-10-13,20.0,8.4,4.0,4.0,0.3,12.734,0.488,,8.2,0,185,29000,11.892,0.355,8.363,0.457,0.420,9.2,5400,
2006-11-09,18.0,8.2,5.9,6.6,2.0,13.626,0.892,,7.5,0,390,500,13.553,0.014,9.374,0.595,0.430,0.4,1600,


In [33]:
scb_gw.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 771 entries, 2021-02-01 to 2006-12-12
Data columns (total 20 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   수온(℃)                 771 non-null    float64
 1   DO(㎎/L)               771 non-null    float64
 2   BOD(㎎/L)              771 non-null    float64
 3   COD(㎎/L)              771 non-null    float64
 4   클로로필 a(㎎/㎥)           771 non-null    float64
 5   TN(㎎/L)               771 non-null    float64
 6   TP(㎎/L)               771 non-null    float64
 7   TOC(㎎/L)              691 non-null    float64
 8   수소이온농도                771 non-null    float64
 9   페놀류(㎎/L)              771 non-null    int64  
 10  전기전도도(μS/㎝)           771 non-null    int64  
 11  총대장균군수(총대장균군수/100ml)  771 non-null    int64  
 12  용존총질소(㎎/L)            771 non-null    float64
 13  암모니아성 질소(㎎/L)         771 non-null    float64
 14  질산성 질소(㎎/L)           770 non-null    float64
 15  용존총인

In [35]:
scb_gw.columns

Index(['수온(℃)', 'DO(㎎/L)', 'BOD(㎎/L)', 'COD(㎎/L)', '클로로필 a(㎎/㎥)', 'TN(㎎/L)',
       'TP(㎎/L)', 'TOC(㎎/L)', '수소이온농도', '페놀류(㎎/L)', '전기전도도(μS/㎝)',
       '총대장균군수(총대장균군수/100ml)', '용존총질소(㎎/L)', '암모니아성 질소(㎎/L)', '질산성 질소(㎎/L)',
       '용존총인(㎎/L)', '인산염인(㎎/L)', 'SS(㎎/L)', '분원성대장균군수', '유량(㎥/s)'],
      dtype='object')

The flow-rate data (`유량(㎥/s)`) contains missing values.
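
One way to see which columns contain missing values is to count them per column with `isna().sum()`. The same numbers follow from the non-null counts that `info()` reports: in the output above, `TOC(㎎/L)` has 691 non-null entries out of 771, so 80 are missing. The sketch below uses a small made-up DataFrame with hypothetical column names rather than the real data.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only
toy = pd.DataFrame({
    "temp": [7.7, 7.4, 10.4, 9.6],
    "toc": [5.3, np.nan, 6.1, 5.6],
    "flow": [31.74, np.nan, np.nan, 11.338],
})

# Number of missing values in each column
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Equivalent: total rows minus the non-null count of each column
print(len(toy) - toy.count())
```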

In [42]:
mask = scb_gw['유량(㎥/s)'].isna()

Number of missing values: 233

In [44]:
mask.sum()

233

In [43]:
scb_gw[mask]

년/월/일,수온(℃),DO(㎎/L),BOD(㎎/L),COD(㎎/L),클로로필 a(㎎/㎥),TN(㎎/L),TP(㎎/L),TOC(㎎/L),수소이온농도,페놀류(㎎/L),전기전도도(μS/㎝),총대장균군수(총대장균군수/100ml),용존총질소(㎎/L),암모니아성 질소(㎎/L),질산성 질소(㎎/L),용존총인(㎎/L),인산염인(㎎/L),SS(㎎/L),분원성대장균군수,유량(㎥/s)
2020-02-18,7.7,10.2,6.8,12.9,45.3,6.728,0.270,7.4,7.4,0,365,100000,6.250,3.502,2.273,0.147,0.110,15.2,12000,
2019-09-03,24.3,7.7,3.8,6.4,40.7,3.690,0.078,4.9,7.3,0,280,7900,3.456,1.444,1.526,0.036,0.035,8.7,5700,
2011-01-11,4.0,12.7,5.9,8.0,25.7,9.022,0.524,6.3,7.8,0,514,5292,7.728,5.774,1.863,0.406,0.389,23.1,155,
2011-08-11,24.0,7.9,3.0,4.8,3.1,2.395,0.134,3.6,6.9,0,99,36000,1.996,0.138,1.337,0.092,0.060,22.5,9500,
2011-09-06,28.0,9.5,5.4,10.0,102.3,3.363,0.233,6.0,8.0,0,294,983,3.043,0.173,1.043,0.135,0.122,11.8,181,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2006-08-10,28.8,5.7,4.8,6.8,0.4,12.864,0.301,,7.9,0,226,19700,10.260,0.018,0.035,0.251,0.197,13.2,3000,
2006-09-14,22.0,7.6,6.2,4.0,0.3,6.972,0.249,,7.9,0,220,29000,5.916,0.013,5.286,0.211,0.208,9.2,5400,
2006-10-13,20.0,8.4,4.0,4.0,0.3,12.734,0.488,,8.2,0,185,29000,11.892,0.355,8.363,0.457,0.420,9.2,5400,
2006-11-09,18.0,8.2,5.9,6.6,2.0,13.626,0.892,,7.5,0,390,500,13.553,0.014,9.374,0.595,0.430,0.4,1600,
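
The three steps used above (build a boolean mask with `isna()`, count the `True` entries with `sum()`, and select the matching rows with `df[mask]`) can be sketched on a tiny made-up DataFrame. The column names and values here are hypothetical and only mimic the shape of the real data.

```python
import numpy as np
import pandas as pd

# Made-up rows indexed by measurement date; two have no flow value
toy = pd.DataFrame(
    {"temp": [7.7, 24.3, 4.0, 10.7],
     "flow": [31.74, np.nan, np.nan, 94.936]},
    index=pd.to_datetime(["2021-02-01", "2019-09-03", "2011-01-11", "2021-03-02"]),
)

mask = toy["flow"].isna()   # boolean Series: True where flow is missing
n_missing = mask.sum()      # each True counts as 1
rows_missing = toy[mask]    # keep only the rows where the mask is True

print(n_missing)
print(rows_missing)
```

Because the mask shares the DataFrame's index, the selected rows keep their original dates, which is why the output above still shows the measurement date of each row with a missing flow rate.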
