# Seattle Precipitation Data Analysis
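- Load the data
- Drop unneeded columns (STATION, STATION_NAME)
- Convert DATE to a date type: cast the existing int DATE column to str, then parse it as a date
- Set the date column as the index
- Show the 10 days with the highest precipitation ('PRCP')
- Show January precipitation only
- How many days in the year had precipitation below 40?
- Total and mean precipitation for January
- Total and mean precipitation for February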

In [131]:
import pandas as pd
import numpy as np

df = pd.read_csv('Seattle2014.csv')
df.head(1)

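|   | STATION | STATION_NAME | DATE | PRCP | SNWD | SNOW | TMAX | TMIN | AWND | WDF2 | WDF5 | WSF2 | WSF5 | WT01 | WT05 | WT02 | WT03 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GHCND:USW00024233 | SEATTLE TACOMA INTERNATIONAL AIRPORT WA US | 20140101 | 0 | 0 | 0 | 72 | 33 | 12 | 340 | 310 | 36 | 40 | -9999 | -9999 | -9999 | -9999 |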
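As a minimal sketch, the unneeded columns could also be skipped at load time: `pd.read_csv` accepts a callable for `usecols`, and columns for which it returns False are never loaded. The in-memory CSV below is a hypothetical stand-in for `Seattle2014.csv`, with a subset of the columns and values copied from the first row above; with the real file you would pass the filename instead of the buffer.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for Seattle2014.csv: a subset of the
# columns, with values copied from the first row of the real data.
csv_text = """STATION,STATION_NAME,DATE,PRCP,TMAX,TMIN
GHCND:USW00024233,SEATTLE TACOMA INTERNATIONAL AIRPORT WA US,20140101,0,72,33
"""

unneeded = {"STATION", "STATION_NAME"}

# usecols is called on each column name; columns that return False are skipped.
df = pd.read_csv(io.StringIO(csv_text), usecols=lambda col: col not in unneeded)

print(df.columns.tolist())  # ['DATE', 'PRCP', 'TMAX', 'TMIN']
```

For a 365-row file either approach is fine; skipping at load time mainly matters when the dropped columns are large.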


- Drop unneeded columns (STATION, STATION_NAME)

In [132]:
df2 = df.drop(['STATION','STATION_NAME'],axis=1)
df2.head(3)

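|   | DATE | PRCP | SNWD | SNOW | TMAX | TMIN | AWND | WDF2 | WDF5 | WSF2 | WSF5 | WT01 | WT05 | WT02 | WT03 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20140101 | 0 | 0 | 0 | 72 | 33 | 12 | 340 | 310 | 36 | 40 | -9999 | -9999 | -9999 | -9999 |
| 1 | 20140102 | 41 | 0 | 0 | 106 | 61 | 32 | 190 | 200 | 94 | 116 | -9999 | -9999 | -9999 | -9999 |
| 2 | 20140103 | 15 | 0 | 0 | 89 | 28 | 26 | 30 | 50 | 63 | 72 | 1 | -9999 | -9999 | -9999 |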
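`DataFrame.drop` also accepts a `columns=` keyword, which states the intent more directly than `axis=1` and produces the same result. A minimal sketch on a hypothetical two-row frame (PRCP values copied from the first two days above; the repeated station values are assumed for illustration):

```python
import pandas as pd

# Hypothetical miniature frame; PRCP values copied from the first two days.
df = pd.DataFrame({
    "STATION": ["GHCND:USW00024233"] * 2,
    "STATION_NAME": ["SEATTLE TACOMA INTERNATIONAL AIRPORT WA US"] * 2,
    "DATE": [20140101, 20140102],
    "PRCP": [0, 41],
})

cols = ["STATION", "STATION_NAME"]
by_axis = df.drop(cols, axis=1)   # form used in the notebook
by_name = df.drop(columns=cols)   # equivalent keyword form

# drop returns a new frame; df itself still has all four columns.
assert by_axis.equals(by_name)
print(by_name.columns.tolist())  # ['DATE', 'PRCP']
```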


- Convert DATE to a date type: cast the existing int DATE column to str, then parse it as a date


In [133]:
# Cast the int DATE column to str, then parse it explicitly as YYYYMMDD
df2['DATE'] = pd.to_datetime(df2['DATE'].astype(str), format='%Y%m%d')

df2.head(3)


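|   | DATE | PRCP | SNWD | SNOW | TMAX | TMIN | AWND | WDF2 | WDF5 | WSF2 | WSF5 | WT01 | WT05 | WT02 | WT03 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2014-01-01 | 0 | 0 | 0 | 72 | 33 | 12 | 340 | 310 | 36 | 40 | -9999 | -9999 | -9999 | -9999 |
| 1 | 2014-01-02 | 41 | 0 | 0 | 106 | 61 | 32 | 190 | 200 | 94 | 116 | -9999 | -9999 | -9999 | -9999 |
| 2 | 2014-01-03 | 15 | 0 | 0 | 89 | 28 | 26 | 30 | 50 | 63 | 72 | 1 | -9999 | -9999 | -9999 |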
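Passing an explicit `format='%Y%m%d'` to `pd.to_datetime` avoids relying on format inference and makes malformed values fail loudly; adding `errors='coerce'` turns them into `NaT` instead. A minimal sketch; the value `20141301` is a hypothetical invalid date (month 13) added for illustration and is not in the dataset.

```python
import pandas as pd

# Integer dates as stored in the raw DATE column; the last one is a
# hypothetical invalid value (month 13) added for illustration.
raw = pd.Series([20140101, 20140102, 20141301])

# int -> str, then parse explicitly as YYYYMMDD; invalid dates become NaT.
dates = pd.to_datetime(raw.astype(str), format="%Y%m%d", errors="coerce")

print(dates.tolist())
```

Without `errors='coerce'`, the same call would raise on the invalid value, which is usually preferable when the data is expected to be clean.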


- Set the date column as the index

In [136]:
df2 = df2.set_index('DATE')
df2.head()


| DATE | PRCP | SNWD | SNOW | TMAX | TMIN | AWND | WDF2 | WDF5 | WSF2 | WSF5 | WT01 | WT05 | WT02 | WT03 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2014-01-01 | 0 | 0 | 0 | 72 | 33 | 12 | 340 | 310 | 36 | 40 | -9999 | -9999 | -9999 | -9999 |
| 2014-01-02 | 41 | 0 | 0 | 106 | 61 | 32 | 190 | 200 | 94 | 116 | -9999 | -9999 | -9999 | -9999 |
| 2014-01-03 | 15 | 0 | 0 | 89 | 28 | 26 | 30 | 50 | 63 | 72 | 1 | -9999 | -9999 | -9999 |
| 2014-01-04 | 0 | 0 | 0 | 78 | 6 | 27 | 40 | 40 | 45 | 58 | 1 | -9999 | -9999 | -9999 |
| 2014-01-05 | 0 | 0 | 0 | 83 | -5 | 37 | 10 | 10 | 67 | 76 | -9999 | -9999 | -9999 | -9999 |
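After `set_index('DATE')` the index is a `DatetimeIndex`, which exposes calendar fields such as `.month` and `.day_name()`; the January filter later in this notebook relies on `.month`. A minimal sketch on the first three days (DATE and PRCP only) from the output above:

```python
import pandas as pd

# First three rows (DATE, PRCP) as shown in the output above.
df = pd.DataFrame({
    "DATE": pd.to_datetime(["2014-01-01", "2014-01-02", "2014-01-03"]),
    "PRCP": [0, 41, 15],
})

# set_index returns a new frame, so reassign (as the notebook does).
df = df.set_index("DATE")

print(type(df.index).__name__)       # DatetimeIndex
print(df.index.month.tolist())       # [1, 1, 1]
print(df.index.day_name().tolist())  # ['Wednesday', 'Thursday', 'Friday']
```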


- Show the 10 days with the highest precipitation ('PRCP')

In [137]:
# Sort by PRCP in descending order and keep only the PRCP column.
# head() shows the first 5 rows; head(10) lists the top 10 days.
df2.sort_values(by='PRCP', ascending=False)[['PRCP']].head()

| DATE | PRCP |
|---|---|
| 2014-03-05 | 467 |
| 2014-11-28 | 343 |
| 2014-05-03 | 333 |
| 2014-03-08 | 323 |
| 2014-10-22 | 320 |
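For a top-n query, `Series.nlargest(n)` (or `DataFrame.nlargest(n, 'PRCP')`, e.g. `df2.nlargest(10, 'PRCP')`) returns the n largest values already in descending order, without an explicit sort. A minimal sketch using the 31 January PRCP values from this notebook as input:

```python
import pandas as pd

# January 2014 PRCP values from this notebook, indexed by date.
jan = pd.Series(
    [0, 41, 15, 0, 0, 3, 122, 97, 58, 43, 213, 15, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 89, 216, 0, 23],
    index=pd.date_range("2014-01-01", periods=31, freq="D"),
    name="PRCP",
)

# nlargest(n) returns the n largest values, sorted in descending order.
top10 = jan.nlargest(10)
print(top10.head(3))
```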


- Show January precipitation only

In [138]:
# jan_prcp = df2['PRCP'].iloc[:31]
# jan_prcp

df2['PRCP'][df2.index.month == 1]


DATE
2014-01-01      0
2014-01-02     41
2014-01-03     15
2014-01-04      0
2014-01-05      0
2014-01-06      3
2014-01-07    122
2014-01-08     97
2014-01-09     58
2014-01-10     43
2014-01-11    213
2014-01-12     15
2014-01-13      0
2014-01-14      0
2014-01-15      0
2014-01-16      0
2014-01-17      0
2014-01-18      0
2014-01-19      0
2014-01-20      0
2014-01-21      0
2014-01-22      5
2014-01-23      0
2014-01-24      0
2014-01-25      0
2014-01-26      0
2014-01-27      0
2014-01-28     89
2014-01-29    216
2014-01-30      0
2014-01-31     23
Name: PRCP, dtype: int64
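With a `DatetimeIndex`, pandas also supports partial-string indexing: `.loc['2014-01']` selects every row in January 2014. Here it matches the `.month == 1` mask because the data covers a single year; on multi-year data the mask would match January of every year. A minimal sketch on a hypothetical Jan-Feb daily series whose values are placeholders, not the Seattle data:

```python
import pandas as pd

# Hypothetical daily series for Jan-Feb 2014; the values are placeholders.
s = pd.Series(range(59), index=pd.date_range("2014-01-01", periods=59, freq="D"))

by_mask = s[s.index.month == 1]   # boolean mask, as in the notebook
by_label = s.loc["2014-01"]       # partial-string indexing on a DatetimeIndex

assert by_mask.equals(by_label)
print(len(by_label))  # 31
```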

- How many days in the year had precipitation below 40?

In [139]:
df2['PRCP'] < 40

DATE
2014-01-01     True
2014-01-02    False
2014-01-03     True
2014-01-04     True
2014-01-05     True
              ...  
2014-12-27     True
2014-12-28    False
2014-12-29     True
2014-12-30     True
2014-12-31     True
Name: PRCP, Length: 365, dtype: bool

In [142]:
# True counts as 1, so summing the boolean mask counts the days below 40
np.sum(df2['PRCP'] < 40)
# Equivalent: count the rows that survive the filter
len(df2[df2['PRCP'] < 40])

277
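Because `True` counts as 1, summing a boolean mask counts the days that satisfy it, and the same pattern works for any threshold. A minimal sketch on the 31 January values from this notebook; the millimetre conversion assumes PRCP follows the GHCN-Daily convention of tenths of a millimetre (so 40 means 4.0 mm), which is worth confirming against the dataset's documentation.

```python
import pandas as pd

# January 2014 PRCP values from this notebook.
jan = pd.Series(
    [0, 41, 15, 0, 0, 3, 122, 97, 58, 43, 213, 15, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 89, 216, 0, 23],
    name="PRCP",
)

below_40 = (jan < 40).sum()   # days with PRCP < 40 (includes dry days)
rainy = (jan > 0).sum()       # days with any recorded precipitation

# Assumption: PRCP is stored in tenths of a millimetre.
jan_mm = jan / 10
print(below_40, rainy, jan_mm.max())
```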

- Total and mean precipitation for January


In [140]:
df2[['PRCP']][df2.index.month == 1].agg(['sum','mean']).round(2)

|      | PRCP  |
|---|---|
| sum  | 940.0 |
| mean | 30.32 |
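The same `agg(['sum', 'mean'])` pattern extends to every month at once by grouping on the index's month, which would also cover the February total and mean from the task list. On this notebook's data the call would be `df2['PRCP'].groupby(df2.index.month).agg(['sum', 'mean'])`. A minimal sketch on a hypothetical Jan-Feb series whose values are placeholders:

```python
import pandas as pd

# Hypothetical placeholder data: 31 days of 1 (January), 28 days of 2 (February).
idx = pd.date_range("2014-01-01", periods=59, freq="D")
prcp = pd.Series([1] * 31 + [2] * 28, index=idx, name="PRCP")

# One row per month number, with the monthly total and the daily mean.
monthly = prcp.groupby(prcp.index.month).agg(["sum", "mean"]).round(2)
print(monthly)
```

`prcp.resample('MS').agg(['sum', 'mean'])` gives the same figures keyed by month-start timestamps instead of month numbers, which keeps years apart on multi-year data.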
