# 4.2 Data Formatting

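After an aggregation such as the mean (the mean function) is applied during data computation, the results often carry many more decimal places than the input. Formatting the data makes it easier to read, for example by fixing the number of decimal places, displaying values as percentages, or adding thousands separators.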


## 4.2.1 Setting the Number of Decimal Places

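To set the number of decimal places, use the round function of the DataFrame object. It rounds each value (exact ties go to the nearest even digit, following NumPy), and its decimals parameter sets how many decimal places to keep. The data type does not change: the result is still floating point.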

Syntax:
**DataFrame.round(decimals=0, \*args, \*\*kwargs)**

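* decimals: the number of decimal places to round each column to; an int, dict, or Series. If an int is given, every column is rounded to the same number of places. Otherwise, a dict or Series rounds each column to its own number of places: for a dict, the column names are the keys; for a Series, the column names are in the index. Any column not included in decimals is left as is, and entries in decimals that do not match a column of the DataFrame are ignored.
* args: additional positional arguments; they have no effect and are accepted for compatibility with NumPy.
* kwargs: additional keyword arguments; likewise, they have no effect and are accepted for compatibility with NumPy.
* Return value: a DataFrame object.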
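The rules in the parameter list can be checked with a small sketch. The frame `ties` and the extra key `"z"` are made up for illustration; the example assumes the NumPy-style rounding that pandas delegates to, where exact ties go to the nearest even digit rather than always upward.

```python
import pandas as pd

# Hypothetical sample data: every value sits exactly on a rounding tie.
ties = pd.DataFrame({"x": [0.5, 1.5, 2.5], "y": [0.5, 1.5, 2.5]})

# "z" is not a column, so that entry is ignored.
# "y" is not listed in the dict, so that column is left unchanged.
rounded = ties.round({"x": 0, "z": 3})

print(rounded)          # column x becomes 0.0, 2.0, 2.0 (ties go to the even digit)
print(rounded.dtypes)   # both columns are still float64
```

If strict "round half up" behaviour is required, the values would need to be handled differently (for example with the standard-library `decimal` module), since `round` alone does not provide it.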

In [5]:
import pandas as pd
import numpy  as np

df = pd.DataFrame(np.random.random([5,5]),columns=['A1','A2','A3','A4','A5'])
print(df)  
print(df.round(2)) # keep two decimal places
print(df.round({'A1':1,'A2':2})) # column A1 keeps one decimal place, column A2 keeps two



         A1        A2        A3        A4        A5
0  0.824280  0.776118  0.322127  0.779743  0.193635
1  0.703388  0.077386  0.421462  0.449983  0.332416
2  0.642077  0.321645  0.566401  0.502627  0.078501
3  0.548182  0.013079  0.476411  0.387628  0.362283
4  0.521940  0.703082  0.158230  0.085805  0.830950
     A1    A2    A3    A4    A5
0  0.82  0.78  0.32  0.78  0.19
1  0.70  0.08  0.42  0.45  0.33
2  0.64  0.32  0.57  0.50  0.08
3  0.55  0.01  0.48  0.39  0.36
4  0.52  0.70  0.16  0.09  0.83
    A1    A2        A3        A4        A5
0  0.8  0.78  0.322127  0.779743  0.193635
1  0.7  0.08  0.421462  0.449983  0.332416
2  0.6  0.32  0.566401  0.502627  0.078501
3  0.5  0.01  0.476411  0.387628  0.362283
4  0.5  0.70  0.158230  0.085805  0.830950


In [6]:
s1 = pd.Series([1,0,2],index=['A1','A2','A3'])  # decimal places given as a Series indexed by column name
print(df.round(s1))

    A1   A2    A3        A4        A5
0  0.8  1.0  0.32  0.779743  0.193635
1  0.7  0.0  0.42  0.449983  0.332416
2  0.6  0.0  0.57  0.502627  0.078501
3  0.5  0.0  0.48  0.387628  0.362283
4  0.5  1.0  0.16  0.085805  0.830950


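Decimal places can also be fixed with a custom function, for example to format every floating-point value in the DataFrame to two decimal places: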

In [7]:
df.applymap(lambda x: '%.2f'%x)

     A1    A2    A3    A4    A5
0  0.82  0.78  0.32  0.78  0.19
1  0.70  0.08  0.42  0.45  0.33
2  0.64  0.32  0.57  0.50  0.08
3  0.55  0.01  0.48  0.39  0.36
4  0.52  0.70  0.16  0.09  0.83


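Note: after processing with the custom function, the data is no longer floating point but object type (each cell is a string). Convert it back to a numeric type before doing any further calculations. In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map.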
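The type conversion mentioned in the note can be sketched as follows. The frame `sample` reuses two rows of values from the output in this section; `formatted` and `restored` are illustrative names. `apply` combined with `Series.map` is used here instead of `applymap` only so that the sketch does not depend on the pandas version.

```python
import pandas as pd

# Two rows of values taken from the sample output in this section.
sample = pd.DataFrame({"A1": [0.824280, 0.703388], "A2": [0.776118, 0.077386]})

# Format every value as a two-decimal string, column by column.
formatted = sample.apply(lambda col: col.map(lambda x: "%.2f" % x))
print(formatted.dtypes)   # no longer float64: each cell is now a string

# Convert back to floats before doing arithmetic again.
restored = formatted.astype(float)
print(restored.dtypes)    # float64
print(restored.mean())
```

When the goal is only a tidier display, `df.round(2)` avoids the round trip entirely because the values stay numeric.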

## 4.2.2 Formatting as Percentages

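A custom function can format the data so that floating-point values become percentage strings with a specified number of decimal places. This mainly uses the apply function together with the format function.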
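Before applying this to a whole column, it may help to see what the format specification does to a single value. The sketch below uses plain Python; `ratio` is simply one value taken from column A3 of this section's sample output, and the other variable names are illustrative.

```python
ratio = 0.946658  # one value from column A3 in this section's sample output

# The "%" presentation type multiplies by 100, rounds to the requested
# number of decimal places, and appends a percent sign.
zero_places = format(ratio, ".0%")
two_places = format(ratio, ".2%")
one_place = "{:.1%}".format(ratio)   # str.format accepts the same specification

print(zero_places)   # 95%
print(two_places)    # 94.67%
print(one_place)     # 94.7%
```

The result is always a `str`, which is why a column formatted this way can no longer be used directly in numeric calculations.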


In [13]:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.random([5,5]),columns=['A1','A2','A3','A4','A5'])
df['百分比'] = df['A3'].apply(lambda x:format(x,'.0%')) # format the whole column as a percentage with 0 decimal places
df


         A1        A2        A3        A4        A5  百分比
0  0.954851  0.229141  0.946658  0.263225  0.644966  95%
1  0.062135  0.230992  0.581908  0.343151  0.568561  58%
2  0.836021  0.653030  0.749556  0.010234  0.881066  75%
3  0.725784  0.409120  0.166953  0.517864  0.587144  17%
4  0.464563  0.680571  0.807811  0.865758  0.887963  81%


In [14]:
df['百分比'] = df['A3'].apply(lambda x:format(x,'.2%')) # format the whole column as a percentage with 2 decimal places
df


         A1        A2        A3        A4        A5     百分比
0  0.954851  0.229141  0.946658  0.263225  0.644966  94.67%
1  0.062135  0.230992  0.581908  0.343151  0.568561  58.19%
2  0.836021  0.653030  0.749556  0.010234  0.881066  74.96%
3  0.725784  0.409120  0.166953  0.517864  0.587144  16.70%
4  0.464563  0.680571  0.807811  0.865758  0.887963  80.78%


In [15]:
df['百分比'] = df['A3'].map(lambda x:'{:.0%}'.format(x)) # use the map function to format the whole column with 0 decimal places
df

         A1        A2        A3        A4        A5  百分比
0  0.954851  0.229141  0.946658  0.263225  0.644966  95%
1  0.062135  0.230992  0.581908  0.343151  0.568561  58%
2  0.836021  0.653030  0.749556  0.010234  0.881066  75%
3  0.725784  0.409120  0.166953  0.517864  0.587144  17%
4  0.464563  0.680571  0.807811  0.865758  0.887963  81%
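Because the 百分比 column now holds strings, it cannot be used in arithmetic directly. One possible way to recover numeric fractions is sketched below; the Series `pct` reuses a few formatted values from the output above, and `as_fraction` is an illustrative name.

```python
import pandas as pd

# A few percentage strings as produced by format(x, '.2%').
pct = pd.Series(["94.67%", "58.19%", "16.70%"], name="百分比")

# Strip the trailing percent sign, parse as float, and rescale to a fraction.
as_fraction = pct.str.rstrip("%").astype(float) / 100

print(as_fraction)
print(as_fraction.dtype)   # float64
```

The recovered values carry only the precision that was kept when formatting, so keeping the original numeric column (A3 here) alongside the formatted one is usually the safer choice.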


## 4.2.3 Adding Thousands Separators



In [18]:
import pandas as pd
data = [['零基础学python','1月',49768889],['零基础学python','2月',11777775],['零基础学python','3月',13799795]]
columns = ['图书','月份','码洋']
df = pd.DataFrame(data=data,columns=columns)
df['码洋'] = df['码洋'].apply(lambda x:format(int(x),','))
print(df)

           图书  月份          码洋
0  零基础学python  1月  49,768,889
1  零基础学python  2月  11,777,775
2  零基础学python  3月  13,799,795


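Note:
After thousands separators are added, the program no longer treats these values as numbers; they are strings made up of digits and commas. Turning them back into numeric values later takes extra work, so add thousands separators with care.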
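Two ways of limiting the cost described in the note are sketched below, under the assumption that the original numbers are still available or that the strings contain only digits and commas. The column name `码洋_显示` and the variable `restored` are hypothetical names introduced for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"月份": ["1月", "2月"], "码洋": [49768889, 11777775]})

# Option 1: keep the numeric column and store the formatted text separately
# (the column name "码洋_显示" is a hypothetical choice for this sketch).
df["码洋_显示"] = df["码洋"].apply(lambda x: format(int(x), ","))

# Option 2: if only the formatted strings exist, remove the commas
# and convert the result back to integers.
restored = df["码洋_显示"].str.replace(",", "", regex=False).astype("int64")

print(df)
print(restored.sum())   # arithmetic works again on the restored values
```

With the first option the numeric column stays available for calculations, and the formatted column is used only for display or export.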