# Seungchonbo Gwangsan Weekly Data

- Take a quick look at the weekly Seungchonbo-Gwangsan data.

**Required libraries**

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib

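**Korean font setup for matplotlib**

- Configure Korean font support according to the operating system. Windows, Ubuntu, and Google Colab are supported.
- Reference: [Supporting Korean in matplotlib](https://github.com/codingalzi/datapy/blob/master/matplotlib-korean.md)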

In [2]:
import platform
plt.rc('figure', figsize=(10, 6))  # set the figure size
if platform.system() == 'Windows': # Windows
    from matplotlib import font_manager, rc
    font_path = "C:/Windows/Fonts/NGULIM.TTF"
    font = font_manager.FontProperties(fname=font_path).get_name()
    rc('font', family=font)
elif platform.system() == 'Linux': # Ubuntu or Google Colab
    # !sudo apt-get install -y fonts-nanum*
    # !fc-cache -fv
    
    applyfont = "NanumBarunGothic"
    import matplotlib.font_manager as fm
    if not any(map(lambda ft: ft.name == applyfont, fm.fontManager.ttflist)):
        fm.fontManager.addfont("/usr/share/fonts/truetype/nanum/NanumBarunGothic.ttf")
    plt.rc("font", family=applyfont)
    plt.rc("axes", unicode_minus=False)

**Data**

The data is organized by year.

In [3]:
base_url = "https://github.com/codingalzi/water-data/raw/master/reservoirs/"

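**Loading the Seungchonbo-Gwangsan (Excel) data into a DataFrame**

- `header=0`: use row 0 as the header, that is, as the column index.
- `sheet_name=None`: load every worksheet, creating one DataFrame per sheet; the return value is a dictionary. This option is not passed in the `pd.read_excel` call in this section, so only the first worksheet is read.
- `na_values=0`: treat values entered as 0 as missing values as well.
- `index_col=1`: use the measurement date as the row index.
- `parse_dates=True`: parse the dates used as the row index.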
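
The effect of these options can be checked on a small inline table. The sketch below is only an illustration under stated assumptions: it uses `pd.read_csv` on made-up text instead of `pd.read_excel` on the real workbook (the four options shown have the same meaning in both readers), and the column names and values in `sample` are hypothetical.

```python
import io
import pandas as pd

# Hypothetical sample data; not taken from the real workbook.
sample = io.StringIO(
    "station,date,temp,phenol\n"
    "A,2021-01-04,5.9,0\n"
    "A,2021-01-14,0,0\n"
)

toy = pd.read_csv(
    sample,
    header=0,          # the first row supplies the column names
    na_values=0,       # cells equal to 0 are read as NaN
    index_col=1,       # the second column (date) becomes the row index
    parse_dates=True,  # parse the row index as dates
)

print(type(toy.index).__name__)  # kind of index that was built
print(toy.isna().sum())          # missing values per column
```

Keep in mind that `na_values=0` also hides genuine zero measurements, so it fits best when 0 is known to be a placeholder rather than a real reading.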

Note: the following module may need to be installed first.

```python
!pip install openpyxl
```
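
Whether the install step is needed can be checked up front. The following is a minimal sketch using only the standard library; the helper name `has_module` is hypothetical and introduced here for illustration.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module with this name can be imported."""
    return importlib.util.find_spec(name) is not None


# pandas needs openpyxl to read .xlsx files.
if not has_module("openpyxl"):
    print("openpyxl is missing; install it before calling pd.read_excel")
```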

Load the data.

In [4]:
scb_gw = pd.read_excel(base_url + "Seungchonbo_Gwangsan.xlsx",
                       header=0,
                       na_values=0,
                       index_col=1,
                       parse_dates=True)

In [5]:
scb_gw.head()

| 년/월/일 | 측정소명 | 회차 | 수온(℃) | DO(㎎/L) | BOD(㎎/L) | COD(㎎/L) | 클로로필 a(㎎/㎥) | TN(㎎/L) | TP(㎎/L) | TOC(㎎/L) | ... | 전기전도도(μS/㎝) | 총대장균군수(총대장균군수/100ml) | 용존총질소(㎎/L) | 암모니아성 질소(㎎/L) | 질산성 질소(㎎/L) | 용존총인(㎎/L) | 인산염인(㎎/L) | SS(㎎/L) | 분원성대장균군수 | 유량(㎥/s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2021-01-04 | 광산 | 1회차 | 5.9 | 12.7 | 5.8 | 8.5 | 66.9 | 9.644 | 0.272 | 7.2 | ... | 631 | 3400 | 9.393 | 5.148 | 3.405 | 0.097 | 0.074 | 10.7 | 820 | 10.306 |
| 2021-01-14 | 광산 | 2회차 | 5.3 | 13.5 | 4.7 | 8.6 | 47.4 | 10.516 | 0.12 | 5.1 | ... | 508 | 4800 | 10.511 | 6.2 | 3.35 | 0.027 | 0.007 | 10.9 | 190 | 13.2 |
| 2021-01-19 | 광산 | 3회차 | 5.2 | 12.4 | 3.9 | 7.7 | 49.4 | 10.447 | 0.218 | 4.0 | ... | 478 | 3400 | 9.364 | 6.558 | 2.79 | 0.065 | 0.063 | 8.3 | 660 | 12.458 |
| 2021-01-25 | 광산 | 4회차 | 8.7 | 9.0 | 4.6 | 8.4 | 51.4 | 8.09 | 0.305 | 5.6 | ... | 500 | 21000 | 8.066 | 5.866 | 2.183 | 0.215 | 0.162 | 9.7 | 1200 | 12.248 |
| 2021-02-01 | 광산 | 1회차 | 7.7 | 12.1 | 4.8 | 8.7 | 63.5 | 9.422 | 0.15 | 5.3 | ... | 484 | 5600 | 8.83 | 5.695 | 3.112 | 0.032 | 0.009 | 12.5 | 800 | 31.74 |


In [6]:
scb_gw.columns

Index(['측정소명', '회차', '수온(℃)', 'DO(㎎/L)', 'BOD(㎎/L)', 'COD(㎎/L)', '클로로필 a(㎎/㎥)',
       'TN(㎎/L)', 'TP(㎎/L)', 'TOC(㎎/L)', '수소이온농도', '페놀류(㎎/L)', '전기전도도(μS/㎝)',
       '총대장균군수(총대장균군수/100ml)', '용존총질소(㎎/L)', '암모니아성 질소(㎎/L)', '질산성 질소(㎎/L)',
       '용존총인(㎎/L)', '인산염인(㎎/L)', 'SS(㎎/L)', '분원성대장균군수', '유량(㎥/s)'],
      dtype='object')

Drop the station name (`측정소명`) and sampling round (`회차`) columns.

In [7]:
scb_gw = scb_gw.iloc[:, 2:].copy()
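
Positional slicing with `iloc` depends on the two columns staying in the first two positions. As a hedged alternative, the same columns can be dropped by name; the sketch below uses a small hypothetical frame (`toy`) that reuses a few values from the preview above, not the full dataset.

```python
import pandas as pd

# Small stand-in frame with the same leading columns as the real data.
toy = pd.DataFrame(
    {
        "측정소명": ["광산", "광산"],
        "회차": ["1회차", "2회차"],
        "수온(℃)": [5.9, 5.3],
        "DO(㎎/L)": [12.7, 13.5],
    }
)

by_position = toy.iloc[:, 2:]                     # drop the first two columns by position
by_name = toy.drop(columns=["측정소명", "회차"])  # drop the same columns by name

print(by_position.equals(by_name))
```

Dropping by name raises a `KeyError` if a column is absent, which makes an unexpected change in the file layout visible instead of silently removing the wrong columns.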

Features

In [8]:
scb_gw.columns

Index(['수온(℃)', 'DO(㎎/L)', 'BOD(㎎/L)', 'COD(㎎/L)', '클로로필 a(㎎/㎥)', 'TN(㎎/L)',
       'TP(㎎/L)', 'TOC(㎎/L)', '수소이온농도', '페놀류(㎎/L)', '전기전도도(μS/㎝)',
       '총대장균군수(총대장균군수/100ml)', '용존총질소(㎎/L)', '암모니아성 질소(㎎/L)', '질산성 질소(㎎/L)',
       '용존총인(㎎/L)', '인산염인(㎎/L)', 'SS(㎎/L)', '분원성대장균군수', '유량(㎥/s)'],
      dtype='object')

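The phenols column (`페놀류(㎎/L)`) contains only zeros, so it needs to be dropped. With `na_values=0`, those zeros are read as missing values, so the column carries no information.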

In [9]:
scb_gw = scb_gw.drop(['페놀류(㎎/L)'], axis=1).copy()
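
Columns that carry no information can also be found programmatically rather than by inspection. This is a minimal sketch on a hypothetical two-column frame (`toy`), assuming the zeros have already been read as NaN.

```python
import numpy as np
import pandas as pd

# Stand-in frame: one useful column and one that is entirely missing.
toy = pd.DataFrame(
    {
        "수온(℃)": [5.9, 5.3],
        "페놀류(㎎/L)": [np.nan, np.nan],
    }
)

# Names of the columns in which every value is missing.
empty_cols = toy.columns[toy.isna().all()].tolist()
print(empty_cols)

# Drop every column that consists only of missing values.
cleaned = toy.dropna(axis=1, how="all")
print(list(cleaned.columns))
```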

In [10]:
scb_gw.head()

| 년/월/일 | 수온(℃) | DO(㎎/L) | BOD(㎎/L) | COD(㎎/L) | 클로로필 a(㎎/㎥) | TN(㎎/L) | TP(㎎/L) | TOC(㎎/L) | 수소이온농도 | 전기전도도(μS/㎝) | 총대장균군수(총대장균군수/100ml) | 용존총질소(㎎/L) | 암모니아성 질소(㎎/L) | 질산성 질소(㎎/L) | 용존총인(㎎/L) | 인산염인(㎎/L) | SS(㎎/L) | 분원성대장균군수 | 유량(㎥/s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2021-01-04 | 5.9 | 12.7 | 5.8 | 8.5 | 66.9 | 9.644 | 0.272 | 7.2 | 7.0 | 631 | 3400 | 9.393 | 5.148 | 3.405 | 0.097 | 0.074 | 10.7 | 820 | 10.306 |
| 2021-01-14 | 5.3 | 13.5 | 4.7 | 8.6 | 47.4 | 10.516 | 0.12 | 5.1 | 7.4 | 508 | 4800 | 10.511 | 6.2 | 3.35 | 0.027 | 0.007 | 10.9 | 190 | 13.2 |
| 2021-01-19 | 5.2 | 12.4 | 3.9 | 7.7 | 49.4 | 10.447 | 0.218 | 4.0 | 7.2 | 478 | 3400 | 9.364 | 6.558 | 2.79 | 0.065 | 0.063 | 8.3 | 660 | 12.458 |
| 2021-01-25 | 8.7 | 9.0 | 4.6 | 8.4 | 51.4 | 8.09 | 0.305 | 5.6 | 7.1 | 500 | 21000 | 8.066 | 5.866 | 2.183 | 0.215 | 0.162 | 9.7 | 1200 | 12.248 |
| 2021-02-01 | 7.7 | 12.1 | 4.8 | 8.7 | 63.5 | 9.422 | 0.15 | 5.3 | 7.2 | 484 | 5600 | 8.83 | 5.695 | 3.112 | 0.032 | 0.009 | 12.5 | 800 | 31.74 |


Sort the rows in ascending order of measurement date.

In [11]:
scb_gw.sort_index(axis=0, inplace=True)
scb_gw.head()

| 년/월/일 | 수온(℃) | DO(㎎/L) | BOD(㎎/L) | COD(㎎/L) | 클로로필 a(㎎/㎥) | TN(㎎/L) | TP(㎎/L) | TOC(㎎/L) | 수소이온농도 | 전기전도도(μS/㎝) | 총대장균군수(총대장균군수/100ml) | 용존총질소(㎎/L) | 암모니아성 질소(㎎/L) | 질산성 질소(㎎/L) | 용존총인(㎎/L) | 인산염인(㎎/L) | SS(㎎/L) | 분원성대장균군수 | 유량(㎥/s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2005-12-28 | 4.2 | 12.6 | 6.9 | 4.0 | 4.2 | 14.244 | 0.856 | NaN | 6.9 | 884 | 300 | 12.672 | 0.035 | 11.53 | 0.67 | 0.658 | 16.0 | 100 | NaN |
| 2006-01-25 | 6.9 | 13.5 | 7.4 | 4.1 | 17.3 | 15.108 | 0.558 | NaN | 7.6 | 3 | 280 | 11.304 | 0.065 | 5.338 | 0.496 | 0.492 | 24.8 | 700 | NaN |
| 2006-02-24 | 8.3 | 11.0 | 5.9 | 2.8 | 2.0 | 11.88 | 0.473 | NaN | 7.2 | 223 | 280 | 11.34 | 0.07 | 7.301 | 0.412 | 0.389 | 16.8 | 700 | NaN |
| 2006-03-24 | 13.6 | 5.8 | 9.2 | 5.7 | 18.3 | 14.88 | 1.002 | NaN | 8.1 | 417 | 180 | 12.95 | 0.024 | 11.574 | 0.916 | 0.684 | 9.2 | 100 | NaN |
| 2006-04-28 | 16.8 | 9.5 | 11.7 | 6.0 | 17.7 | 12.07 | 1.85 | NaN | 8.1 | 539 | 150 | 10.699 | 0.033 | 11.397 | 1.727 | 1.054 | 34.4 | 9 | NaN |
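
The sorting step can be written without `inplace=True` and verified afterwards. The sketch below is an illustration on a three-row stand-in frame (`toy`) that reuses three dates and water temperatures shown in the previews; it is not the full dataset.

```python
import pandas as pd

# Three rows in the original (unsorted) order.
toy = pd.DataFrame(
    {"수온(℃)": [5.9, 4.2, 6.9]},
    index=pd.to_datetime(["2021-01-04", "2005-12-28", "2006-01-25"]),
)

# sort_index returns a new frame; the original is left untouched.
sorted_toy = toy.sort_index()

# Confirm that the dates now run from oldest to newest.
print(sorted_toy.index.is_monotonic_increasing)
```

Assigning the result to a new name keeps the unsorted frame available for comparison, which `inplace=True` does not.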


In [12]:
scb_gw.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 775 entries, 2005-12-28 to 2021-12-20
Data columns (total 19 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   수온(℃)                 775 non-null    float64
 1   DO(㎎/L)               775 non-null    float64
 2   BOD(㎎/L)              775 non-null    float64
 3   COD(㎎/L)              775 non-null    float64
 4   클로로필 a(㎎/㎥)           775 non-null    float64
 5   TN(㎎/L)               775 non-null    float64
 6   TP(㎎/L)               775 non-null    float64
 7   TOC(㎎/L)              695 non-null    float64
 8   수소이온농도                775 non-null    float64
 9   전기전도도(μS/㎝)           775 non-null    int64  
 10  총대장균군수(총대장균군수/100ml)  775 non-null    int64  
 11  용존총질소(㎎/L)            775 non-null    float64
 12  암모니아성 질소(㎎/L)         775 non-null    float64
 13  질산성 질소(㎎/L)           774 non-null    float64
 14  용존총인(㎎/L)             775 non-null    float64
 15  인산염인

The flow rate feature (`유량(㎥/s)`) contains a large number of missing values: 233 of the 775 rows, or about 30%.

In [13]:
scb_gw.isna().sum()

수온(℃)                     0
DO(㎎/L)                   0
BOD(㎎/L)                  0
COD(㎎/L)                  0
클로로필 a(㎎/㎥)               0
TN(㎎/L)                   0
TP(㎎/L)                   0
TOC(㎎/L)                 80
수소이온농도                    0
전기전도도(μS/㎝)               0
총대장균군수(총대장균군수/100ml)      0
용존총질소(㎎/L)                0
암모니아성 질소(㎎/L)             0
질산성 질소(㎎/L)               1
용존총인(㎎/L)                 0
인산염인(㎎/L)                 7
SS(㎎/L)                   0
분원성대장균군수                  0
유량(㎥/s)                 233
dtype: int64
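
Raw counts are easier to compare as percentages of the 775 rows. The sketch below recomputes them from the non-zero counts printed above; on the real frame, `scb_gw.isna().mean() * 100` should give the same figures, assuming the frame is loaded as in this section.

```python
import pandas as pd

n_rows = 775  # number of entries reported by scb_gw.info()

# Non-zero missing-value counts copied from the output above.
missing = pd.Series(
    {
        "TOC(㎎/L)": 80,
        "질산성 질소(㎎/L)": 1,
        "인산염인(㎎/L)": 7,
        "유량(㎥/s)": 233,
    }
)

# Share of missing rows per feature, in percent, largest first.
ratio = (missing / n_rows * 100).sort_values(ascending=False)
print(ratio.round(1))
```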