
### Python module 3. **pandas**

# Using pandas

* [10 Minutes to pandas](https://pandas.pydata.org/pandas-docs/stable/10min.html)
* [Pandas tutorial with interactive exercises](https://www.kaggle.com/pistak/pandas-tutorial-with-interactive-exercises)

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Uncomment the next line in Jupyter Notebook or JupyterLab to render plots inline:
# %matplotlib inline

In [None]:
# Creating a DataFrame by passing a NumPy array, with a datetime index and labeled columns:
dates = pd.date_range('20210927', periods=6)
dates

In [None]:
# Build a 6x4 DataFrame of standard-normal random values, indexed by dates
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
df



---



### Setting or extending data

In [None]:
# Setting a new column automatically aligns the data by the indexes.
s1 = pd.Series([1,2,3,4,5,6], index=pd.date_range('20210927', periods=6))
s1

In [None]:
df['F'] = s1

In [None]:
df  # The new column is aligned to the existing index of df, which extends the frame.
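Index alignment is easiest to see when the Series does not cover every row. The sketch below uses a hypothetical frame `demo` and a hypothetical Series `partial` (neither is part of the notebook) whose index covers only three of the six dates and is listed in reverse order.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20210927', periods=6)
# Hypothetical frame used only for this illustration
demo = pd.DataFrame(np.zeros((6, 2)), index=dates, columns=list('AB'))

# Hypothetical Series covering only the 2nd to 4th dates, in reverse order
partial = pd.Series([30, 20, 10], index=dates[[3, 2, 1]])

# Values are matched by index label, not by position;
# rows with no matching label receive NaN
demo['G'] = partial
print(demo)
```

Rows whose date is absent from `partial` end up as NaN, which is why the missing-data tools matter after this kind of assignment.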

#### Setting data by label or by position

> **at, iat**

- at: set a single value by label
- iat: set a single value by integer position

In [None]:
# Setting values by label:
df.at[dates[0],'A'] = 0
df

In [None]:
# Setting values by position (index):
df.iat[0,1] = 0
df
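As a minimal sketch of how `at` and `iat` relate, the example below addresses the same cell both ways and uses `columns.get_loc` to convert a column label into a position. The frame `demo` is a hypothetical stand-in, not the notebook's `df`.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20210927', periods=6)
# Hypothetical frame with predictable values 0.0 .. 23.0
demo = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list('ABCD'))

# The same cell addressed two ways: (row label, column label) vs. (row 0, column 0)
by_label = demo.at[dates[0], 'A']
by_position = demo.iat[0, 0]

# Convert a column label to its integer position before using iat
col_pos = demo.columns.get_loc('B')
demo.iat[0, col_pos] = -1.0
print(by_label, by_position, demo.at[dates[0], 'B'])
```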

In [None]:
# Basic size properties of a DataFrame: row count, (rows, columns), total number of cells
len(df), df.shape, df.size

In [None]:
# Setting by assigning with a NumPy array:
df.loc[:,'D'] = np.array([5] * len(df))
df

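### Handling missing data
> pandas primarily uses the value **np.nan** to represent missing data.

- dropna(): drop rows (or columns) that contain missing values
- fillna(): replace missing values with a given value
- isna(): return a boolean mask marking missing values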

In [None]:
df

In [None]:
df.columns

In [None]:
# Reindexing allows you to change/add/delete the index on a specified axis.
df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ['E'])
df1

In [None]:
df1.loc[dates[0]:dates[1],'E'] = 1
df1
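Reindexing never invents data: cells created for new labels are filled with NaN unless a default is supplied. The sketch below, on a hypothetical frame `demo`, contrasts the default behavior with the `fill_value` parameter of `reindex`.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20210927', periods=6)
# Hypothetical frame used only for this illustration
demo = pd.DataFrame(np.ones((6, 2)), index=dates, columns=list('AB'))

# New column 'E' has no source data, so it is filled with NaN by default
with_nan = demo.reindex(index=dates[:4], columns=['A', 'B', 'E'])

# fill_value supplies a default for the newly created cells instead
with_zero = demo.reindex(index=dates[:4], columns=['A', 'B', 'E'], fill_value=0)

print(with_nan)
print(with_zero)
```

In both cases `reindex` returns a new object and leaves `demo` unchanged.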

In [None]:
df1.info()

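### Count NaN values in a DataFrame
- df.isnull().sum()
- df.isna().sum()

> isnull() is an alias of isna(); both flag NaN only, so a value of 0 is not counted as missing.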

In [None]:
df1.isnull().sum()

In [None]:
df1.isna().sum()
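Counting zeros needs a separate comparison, because `isna()` and `isnull()` flag only missing values. A small sketch on a hypothetical frame `demo`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame containing both zeros and NaN
demo = pd.DataFrame({'A': [0.0, 1.5, np.nan], 'B': [0.0, 0.0, 2.0]})

nan_counts = demo.isna().sum()    # number of NaN values per column
zero_counts = (demo == 0).sum()   # number of exact zeros per column (NaN == 0 is False)
same = demo.isnull().equals(demo.isna())  # isnull is an alias of isna

print(nan_counts)
print(zero_counts)
print(same)
```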

### Dropping and filling missing data

In [None]:
# Drop every row that has at least one missing value (returns a new DataFrame; df1 is unchanged).
df1.dropna(how='any')

In [None]:
# Fill missing values with 5 (also returns a new DataFrame).
df1.fillna(value=5)

In [None]:
# Get the boolean mask where values are nan.
pd.isna(df1)
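Two details are easy to miss here: `dropna` and `fillna` return new objects rather than modifying the frame, and `how='all'` is much less aggressive than `how='any'`. The sketch below uses a hypothetical frame `demo`; the per-column fill values are chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 is complete, row 1 is all NaN, row 2 is partly NaN
demo = pd.DataFrame({'A': [1.0, np.nan, np.nan], 'E': [1.0, np.nan, 3.0]})

any_dropped = demo.dropna(how='any')   # keeps only rows with no NaN at all
all_dropped = demo.dropna(how='all')   # drops only rows where every value is NaN

# A dict gives a different fill value per column (here: 0 for A, the column mean for E)
filled = demo.fillna({'A': 0.0, 'E': demo['E'].mean()})

print(any_dropped)
print(all_dropped)
print(filled)
```

To keep a result, assign it to a name, as done with `any_dropped` and `filled` above.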



---



### Statistics

In [None]:
df

In [None]:
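df.mean()  # mean of each column (axis=0 is the default)

In [None]:
df.mean(0)  # same as df.mean(): one value per column

In [None]:
df.mean(1)  # one value per row, i.e. per date

In [None]:
df.std(0)  # standard deviation of each column; use df.std(1) for each row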
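The axis argument and the default normalization of `std` can be checked on a small hand-computable example. The frame `demo` below is hypothetical. Note that pandas uses `ddof=1` (sample standard deviation) by default, whereas `np.std` defaults to `ddof=0`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with values that are easy to verify by hand
demo = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [4.0, 6.0, 8.0]})

col_means = demo.mean(axis=0)   # one value per column
row_means = demo.mean(axis=1)   # one value per row

sample_std = demo.std(axis=0)               # ddof=1 by default
population_std = demo.std(axis=0, ddof=0)   # matches the np.std default

print(col_means)
print(row_means)
print(sample_std)
print(population_std)
```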

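#### [DIY: Challenge] Plot the mean (mean(0)) and standard deviation of DataFrame df
- A line graph of the means
- A bar graph of the means with the standard deviations as error bars
> The x-axis shows the columns of df: A, B, C, D, F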

In [None]:
import numpy as np
import matplotlib.pyplot as plt
# Render figures inside notebook cells (Jupyter Notebook or JupyterLab)
# %matplotlib inline

In [None]:
# Draw a bar graph with error bars.
means = [1, 2, 3]
stddevs = [0.2, 0.4, 0.5]
bar_labels = ['bar 1', 'bar 2', 'bar 3']

# plot bars and label each one on the x-axis
x_pos = list(range(len(bar_labels)))
plt.bar(x_pos, means, yerr=stddevs)
plt.xticks(x_pos, bar_labels)

plt.show()

In [None]:
plt.plot(df.mean(0), '-o', ms=8)

In [None]:
# Bar graph of the column means of df (mean(0)) with standard deviations as error bars
bar_labels = df.columns
# plot bars
plt.bar(bar_labels, df.mean(0), yerr=df.std(0)) #, color='rgbcy')
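The same chart can also be built with matplotlib's object-oriented interface, which makes it straightforward to label the axes. This is only a sketch on a hypothetical frame `demo`; the `Agg` backend line is an assumption made so the code runs without a display, and should be removed inside a notebook so the figure renders inline.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical small frame standing in for df
demo = pd.DataFrame({'A': [1.0, 2.0, 3.0],
                     'B': [4.0, 6.0, 8.0],
                     'C': [0.0, 1.0, 5.0]})

means = demo.mean(axis=0)
stds = demo.std(axis=0)

fig, ax = plt.subplots()
# One bar per column; yerr draws the standard deviation as an error bar
bars = ax.bar(list(means.index), means.to_numpy(), yerr=stds.to_numpy(), capsize=4)
ax.set_xlabel('column')
ax.set_ylabel('mean (error bar: 1 std)')
ax.set_title('Column means with standard deviation')
```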

In [None]:
df

In [None]:
# [DIY] Bar graph of the per-date means of df (mean(1)) with standard deviations as error bars
# Your code
# plt.plot(df.mean(1), '-o', ms=8)
# plt.bar(df.index.strftime('%m-%d'), df.mean(1), yerr=df.std(1))  # x must have one label per date, not per column

---