# Summary Functions and Maps

```python
import pandas as pd
import numpy as np
```

```python
data = pd.read_csv('TempFolder/train.csv')
```

```python
data.head()
```

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


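### Data Description

The output of pandas' `describe()` depends on the column's data type. For an object (string) column such as `Ticket`, it reports `count`, `unique`, `top` (the most frequent value), and `freq` (how often `top` occurs). For a numeric column such as `Age`, it reports `count`, `mean`, `std`, `min`, the 25%/50%/75% percentiles, and `max`.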

```python
data.Ticket.describe()
```

```text
count        891
unique       681
top       347082
freq           7
Name: Ticket, dtype: object
```

```python
data.Age.describe()
```

```text
count    714.000000
mean      29.699118
std       14.526497
min        0.420000
25%       20.125000
50%       28.000000
75%       38.000000
max       80.000000
Name: Age, dtype: float64
```
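To see the type-dependent behavior without the Titanic file, here is a minimal sketch on a small hand-made DataFrame. The column names `ticket` and `age` and their values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one string (object) column and one numeric column
df = pd.DataFrame({
    "ticket": ["A1", "B2", "A1", "C3"],
    "age": [22.0, 38.0, np.nan, 35.0],
})

# String column -> count / unique / top / freq
obj_summary = df["ticket"].describe()
# Numeric column -> count / mean / std / min / 25% / 50% / 75% / max (NaN is skipped)
num_summary = df["age"].describe()

print(obj_summary)
print(num_summary)
```

Here `count` for `age` is 3 rather than 4, because the missing value is not counted, just as `Age` above reports 714 instead of 891.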

```python
print(data.Age.mean())
print(data.Age.median())
```

```text
29.69911764705882
28.0
```
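`Age` has only 714 non-null values out of 891 rows, so the mean and median above are computed over the non-missing ages only: pandas skips NaN by default (`skipna=True`). A minimal sketch on made-up values shows the difference:

```python
import numpy as np
import pandas as pd

# Made-up values with one missing entry
ages = pd.Series([1.0, 2.0, np.nan, 5.0])

mean_skip = ages.mean()                # (1 + 2 + 5) / 3, NaN ignored
median_skip = ages.median()            # median of [1, 2, 5] -> 2.0
mean_strict = ages.mean(skipna=False)  # NaN, because a missing value is present

print(mean_skip, median_skip, mean_strict)
```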


```python
print(data.Embarked.unique())
print(data.Embarked.value_counts())
```

```text
['S' 'C' 'Q' nan]
S    644
C    168
Q     77
Name: Embarked, dtype: int64
```
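`unique()` lists `nan`, but `value_counts()` omits it by default: 644 + 168 + 77 = 889, two short of the 891 rows. A small sketch with made-up port codes shows the `dropna` and `normalize` parameters of `value_counts()`:

```python
import numpy as np
import pandas as pd

# Made-up embarkation codes with one missing value
ports = pd.Series(["S", "C", "S", np.nan, "Q", "S"])

counts = ports.value_counts()                       # NaN excluded (dropna=True by default)
counts_with_nan = ports.value_counts(dropna=False)  # NaN counted as its own category
shares = ports.value_counts(normalize=True)         # relative frequencies of non-missing values

print(counts)
print(counts_with_nan)
print(shares)
```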


### Data Manipulation

```python
data2 = pd.read_csv('TempFolder/data2.csv')
data2
```

| | Name | Score |
|---|---|---|
| 0 | 猫猫虫 | 60.0 |
| 1 | 咖波 | 70.0 |
| 2 | 布偶猫 | 100.0 |
| 3 | 熊猫 | 30.0 |
| 4 | 狗 | NaN |
| 5 | NaN | 1.0 |


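`map` is called on a Series, for example a single column of a DataFrame. You can pass it a lambda expression, a named function, and more.

Note that `lambda` defines an [anonymous function](https://blog.csdn.net/qq_40089648/article/details/89022804).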

```python
print(data2.Score.map(lambda x: x - 1), end='\n\n')
data2.Score = data2.Score.map(lambda x: x - 1)
print(data2)
```

```text
0    59.0
1    69.0
2    99.0
3    29.0
4     NaN
5     0.0
Name: Score, dtype: float64

  Name  Score
0  猫猫虫   59.0
1   咖波   69.0
2  布偶猫   99.0
3   熊猫   29.0
4    狗    NaN
5  NaN    0.0
```
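Besides lambdas, `Series.map` also accepts a named function or a dict used as a lookup table; values missing from the dict become NaN. The series and the dict below are made up for illustration:

```python
import numpy as np
import pandas as pd

scores = pd.Series([60.0, 70.0, np.nan])

def minus_one(x):
    """Named equivalent of `lambda x: x - 1`."""
    return x - 1

via_lambda = scores.map(lambda x: x - 1)
via_function = scores.map(minus_one)

# A dict acts as a lookup table; keys not found in it map to NaN
animals = pd.Series(["cat", "dog", "bird"])
sounds = animals.map({"cat": "meow", "dog": "woof"})

print(via_lambda)
print(sounds)
```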


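`na_action` controls how NaN values are handled. With the default `None`, NaN values are passed to the mapping function like any other value; with `'ignore'`, they are left as NaN and never passed to the function.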

```python
data2.Name.map('I like {}'.format, na_action='ignore')
```

```text
0    I like 猫猫虫
1     I like 咖波
2    I like 布偶猫
3     I like 熊猫
4      I like 狗
5           NaN
Name: Name, dtype: object
```
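A minimal sketch contrasting the two settings on a made-up two-element Series: without `'ignore'`, the NaN reaches `str.format` and becomes the literal text `'nan'`.

```python
import numpy as np
import pandas as pd

names = pd.Series(["cat", np.nan])

# Default na_action=None: NaN is passed to str.format, giving the string 'I like nan'
default = names.map("I like {}".format)
# na_action='ignore': NaN is propagated and the function is never called on it
ignored = names.map("I like {}".format, na_action="ignore")

print(default.tolist())
print(ignored.tolist())
```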

```python
def Function(x):
    return x + 1

data2.Score.map(Function)
```

```text
0     60.0
1     70.0
2    100.0
3     30.0
4      NaN
5      1.0
Name: Score, dtype: float64
```

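`apply` is used on a DataFrame. By default it calls the function once per column, passing each column as a Series; here `Function` adds 1 to every value of each column.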

```python
data3 = pd.DataFrame({'Score1': [1, 2, 3], 'Score2': [233, 666, 520]})
data3.apply(Function)
```

| | Score1 | Score2 |
|---|---|---|
| 0 | 2 | 234 |
| 1 | 3 | 667 |
| 2 | 4 | 521 |
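A small sketch confirming that each call receives a whole column, and that a function which reduces a column to one value yields one result per column. It rebuilds `data3` so it runs on its own; the range computation is only an illustration.

```python
import pandas as pd

data3 = pd.DataFrame({"Score1": [1, 2, 3], "Score2": [233, 666, 520]})

# Each call receives one whole column as a Series
is_series = data3.apply(lambda col: isinstance(col, pd.Series))
# A reduction per column returns one value per column (max minus min)
col_range = data3.apply(lambda col: col.max() - col.min())

print(is_series)
print(col_range)
```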


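You can also choose the axis: `axis='index'` (equivalently `0`) passes each column to the function and returns one result per column, while `axis='columns'` (equivalently `1`) passes each row and returns one result per row.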

```python
data3.apply(np.sum, axis='index')
```

```text
Score1       6
Score2    1419
dtype: int64
```

```python
data3.apply(np.sum, axis='columns')
```

```text
0    234
1    668
2    523
dtype: int64
```
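With `axis='columns'`, each row arrives as a Series indexed by column name, so the function can combine fields from the same row. The sketch below computes a per-row difference (an illustrative choice) and checks that the built-in `DataFrame.sum` gives the same sums as `apply(np.sum, ...)` above.

```python
import pandas as pd

data3 = pd.DataFrame({"Score1": [1, 2, 3], "Score2": [233, 666, 520]})

# axis='columns': each call receives one row, indexed by column name
gap = data3.apply(lambda row: row["Score2"] - row["Score1"], axis="columns")

# Built-in reductions along each axis
col_sums = data3.sum(axis="index")    # one sum per column
row_sums = data3.sum(axis="columns")  # one sum per row

print(gap)
print(col_sums)
print(row_sums)
```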

Besides `apply` and `map`, you can operate on whole columns directly with vectorized arithmetic. This is faster, but less flexible than `apply` and `map`.

```python
data3
```

| | Score1 | Score2 |
|---|---|---|
| 0 | 1 | 233 |
| 1 | 2 | 666 |
| 2 | 3 | 520 |


```python
data3.Score2 * 10
```

```text
0    2330
1    6660
2    5200
Name: Score2, dtype: int64
```
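A rough sketch of the speed claim, comparing `map` with a lambda against a vectorized multiply on a made-up column of 200,000 integers. Timings depend on the machine and pandas version, so no numbers are given here; the point is that both produce the same values while `map` calls Python code once per element.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical large column to make the speed difference visible
s = pd.Series(np.arange(200_000, dtype="int64"))

start = time.perf_counter()
via_map = s.map(lambda x: x * 10)  # calls the Python lambda once per element
map_seconds = time.perf_counter() - start

start = time.perf_counter()
vectorized = s * 10                # one vectorized operation over the whole column
vec_seconds = time.perf_counter() - start

print(f"map: {map_seconds:.4f}s, vectorized: {vec_seconds:.4f}s")
```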