Selecting columns based on dtype
====

New in version 0.14.1.

The [`select_dtypes()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.select_dtypes.html#pandas.DataFrame.select_dtypes) method selects a subset of columns based on their `dtype`.

First, let’s create a [`DataFrame`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.html#pandas.DataFrame) with a variety of dtypes:

In [3]:
import numpy as np
import pandas as pd
    
df = pd.DataFrame({'string': list('abc'),
                   'int64': list(range(1, 4)),
                   'uint8': np.arange(3, 6).astype('u1'),
                   'float64': np.arange(4.0, 7.0),
                   'bool1': [True, False, True],
                   'bool2': [False, True, False],
                   'dates': pd.date_range('now', periods=3).values,
                   'category': pd.Series(list("ABC")).astype('category')})

In [4]:
df

|   | string | int64 | uint8 | float64 | bool1 | bool2 | dates | category |
|---|---|---|---|---|---|---|---|---|
| 0 | a | 1 | 3 | 4.0 | True | False | 2018-10-13 10:43:05.309131 | A |
| 1 | b | 2 | 4 | 5.0 | False | True | 2018-10-14 10:43:05.309131 | B |
| 2 | c | 3 | 5 | 6.0 | True | False | 2018-10-15 10:43:05.309131 | C |


In [6]:
df['tdeltas'] = df.dates.diff()

In [7]:
df['uint64'] = np.arange(3, 6).astype('u8')

In [8]:
df['other_dates'] = pd.date_range('20130101', periods=3).values

In [9]:
df['tz_aware_dates'] = pd.date_range('20130101', periods=3, tz='US/Eastern')

In [10]:
df

|   | string | int64 | uint8 | float64 | bool1 | bool2 | dates | category | tdeltas | uint64 | other_dates | tz_aware_dates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | a | 1 | 3 | 4.0 | True | False | 2018-10-13 10:43:05.309131 | A | NaT | 3 | 2013-01-01 | 2013-01-01 00:00:00-05:00 |
| 1 | b | 2 | 4 | 5.0 | False | True | 2018-10-14 10:43:05.309131 | B | 1 days | 4 | 2013-01-02 | 2013-01-02 00:00:00-05:00 |
| 2 | c | 3 | 5 | 6.0 | True | False | 2018-10-15 10:43:05.309131 | C | 1 days | 5 | 2013-01-03 | 2013-01-03 00:00:00-05:00 |


And the resulting dtypes:

In [11]:
df.dtypes

string                                object
int64                                  int64
uint8                                  uint8
float64                              float64
bool1                                   bool
bool2                                   bool
dates                         datetime64[ns]
category                            category
tdeltas                      timedelta64[ns]
uint64                                uint64
other_dates                   datetime64[ns]
tz_aware_dates    datetime64[ns, US/Eastern]
dtype: object

[`select_dtypes()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.select_dtypes.html#pandas.DataFrame.select_dtypes) has two parameters, `include` and `exclude`, that let you say “give me the columns with these dtypes” (`include`) and/or “give me the columns without these dtypes” (`exclude`).

For example, to select `bool` columns:

In [12]:
df.select_dtypes(include=[bool])

|   | bool1 | bool2 |
|---|---|---|
| 0 | True | False |
| 1 | False | True |
| 2 | True | False |


You can also pass the name of a dtype in the [numpy dtype hierarchy](http://docs.scipy.org/doc/numpy/reference/arrays.scalars.html) as a string:

In [13]:
df.select_dtypes(include=['bool'])

|   | bool1 | bool2 |
|---|---|---|
| 0 | True | False |
| 1 | False | True |
| 2 | True | False |
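As a minimal sketch of the equivalence described above, the snippet below selects the same column with a Python type, a dtype name, and a numpy scalar type. The small frame here is a stand-in built for illustration, not the `df` from the earlier cells.

```python
import numpy as np
import pandas as pd

# Illustrative frame (an assumption for this sketch, smaller than the one above).
df = pd.DataFrame({
    "bool1": [True, False, True],
    "int64": [1, 2, 3],
    "string": list("abc"),
})

# Three spellings of the same request: Python type, dtype name, numpy scalar type.
by_type = df.select_dtypes(include=[bool])
by_name = df.select_dtypes(include=["bool"])
by_numpy = df.select_dtypes(include=[np.bool_])

print(list(by_type.columns), list(by_name.columns), list(by_numpy.columns))
```

All three calls should pick out only the boolean column, so the choice between them is mostly a matter of readability.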


[`select_dtypes()`](http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.select_dtypes.html#pandas.DataFrame.select_dtypes) also works with generic dtypes.

For example, to select all numeric and boolean columns while excluding unsigned integers:

In [14]:
df.select_dtypes(include=['number', 'bool'], exclude=['unsignedinteger'])

|   | int64 | float64 | bool1 | bool2 | tdeltas |
|---|---|---|---|---|---|
| 0 | 1 | 4.0 | True | False | NaT |
| 1 | 2 | 5.0 | False | True | 1 days |
| 2 | 3 | 6.0 | True | False | 1 days |
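One way to reason about a generic selection like this is to check each column's scalar type against the numpy abstract classes that the generic names stand for. The following is a rough sketch on a reduced, illustrative frame; the `issubclass` loop is only a way to inspect the hierarchy and is not a claim about how pandas implements the selection internally.

```python
import numpy as np
import pandas as pd

# Reduced frame for illustration (an assumption, not the full df above).
df = pd.DataFrame({
    "int64": np.arange(1, 4, dtype="int64"),
    "uint8": np.arange(3, 6).astype("u1"),
    "float64": np.arange(4.0, 7.0),
    "bool1": [True, False, True],
})

# Inspect where each column's scalar type sits in the numpy hierarchy.
for name, dtype in df.dtypes.items():
    is_number = issubclass(dtype.type, np.number)
    is_unsigned = issubclass(dtype.type, np.unsignedinteger)
    print(f"{name}: number={is_number}, unsigned={is_unsigned}")

# Numeric or boolean columns, minus anything under unsignedinteger.
selected = df.select_dtypes(include=["number", "bool"],
                            exclude=["unsignedinteger"])
print(list(selected.columns))
```

Here `uint8` is a number but is also unsigned, so it is dropped, while `bool1` is kept only because `'bool'` was listed explicitly.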


To select string columns, you must use the `object` dtype:

In [15]:
df.select_dtypes(include=['object'])

|   | string |
|---|---|
| 0 | a |
| 1 | b |
| 2 | c |
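Because `object` columns can hold arbitrary Python objects, selecting on `object` may return more than strings. A possible follow-up check, sketched below, uses `pandas.api.types.infer_dtype` to keep only the object columns whose values are all strings. The `mixed` column and the explicit `dtype=object` are assumptions made for this illustration, and the helper is not part of the original example.

```python
import pandas as pd
from pandas.api.types import infer_dtype

# Illustrative frame: one all-string object column, one mixed object column.
df = pd.DataFrame({
    "string": pd.Series(list("abc"), dtype=object),
    "mixed": pd.Series(["a", 1, 2.5], dtype=object),  # hypothetical column
    "int64": [1, 2, 3],
})

# Step 1: select by dtype, as in the text above.
object_cols = df.select_dtypes(include=["object"])

# Step 2: inspect the values to keep only columns that contain strings.
string_cols = [col for col in object_cols.columns
               if infer_dtype(object_cols[col]) == "string"]
print(string_cols)
```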


To see all the child dtypes of a generic `dtype` like `numpy.number`, you can define a function that returns a tree of child dtypes:

In [16]:
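def subdtypes(dtype):
    """Return `dtype` and, recursively, a nested list of its subclasses."""
    subs = dtype.__subclasses__()
    if not subs:
        # Leaf type: no further subclasses to descend into.
        return dtype
    return [dtype, [subdtypes(dt) for dt in subs]]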

All numpy dtypes are subclasses of `numpy.generic`:

In [17]:
subdtypes(np.generic)

[numpy.generic,
 [[numpy.number,
   [[numpy.integer,
     [[numpy.signedinteger,
       [numpy.int8,
        numpy.int16,
        numpy.int32,
        numpy.int32,
        numpy.int64,
        numpy.timedelta64]],
      [numpy.unsignedinteger,
       [numpy.uint8,
        numpy.uint16,
        numpy.uint32,
        numpy.uint32,
        numpy.uint64]]]],
    [numpy.inexact,
     [[numpy.floating,
       [numpy.float16, numpy.float32, numpy.float64, numpy.float64]],
      [numpy.complexfloating,
       [numpy.complex64, numpy.complex128, numpy.complex128]]]]]],
  [numpy.flexible,
   [[numpy.character, [numpy.bytes_, numpy.str_]],
    [numpy.void, [numpy.record]]]],
  numpy.bool_,
  numpy.datetime64,
  numpy.object_]]
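The tree can also be queried directly rather than read by eye. As a small sketch, the hypothetical helper `is_under` below (a name introduced here, not part of pandas or numpy) checks whether a dtype's scalar type descends from a given generic type:

```python
import numpy as np

def is_under(dtype, generic):
    """Return True if the scalar type of `dtype` descends from `generic`."""
    return issubclass(np.dtype(dtype).type, generic)

print(is_under("uint8", np.unsignedinteger))
print(is_under("float64", np.number))
print(is_under("bool", np.number))

# timedelta64 is listed under signedinteger in the tree above, which is
# consistent with the tdeltas column being picked up by 'number' earlier.
print(is_under("timedelta64[ns]", np.number))
```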

**Note**

Pandas also defines the types `category` and `datetime64[ns, tz]`, which are not integrated into the normal numpy hierarchy and won't show up with the above function.

**Note**

The `include` and `exclude` parameters must be non-string sequences.
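Although these pandas-specific types do not appear in the numpy tree, they can still be selected by name. The sketch below assumes the `'category'` and `'datetimetz'` selectors accepted by `select_dtypes()`, and wraps each one in a list in line with the note on non-string sequences. The frame is a reduced stand-in for the one built earlier.

```python
import pandas as pd

# Reduced frame for illustration, mirroring three of the columns above.
df = pd.DataFrame({
    "category": pd.Series(list("ABC")).astype("category"),
    "tz_aware_dates": pd.date_range("20130101", periods=3, tz="US/Eastern"),
    "other_dates": pd.date_range("20130101", periods=3),
})

cats = df.select_dtypes(include=["category"])        # categorical columns
tz_aware = df.select_dtypes(include=["datetimetz"])  # tz-aware datetimes
naive = df.select_dtypes(include=["datetime64"])     # tz-naive datetimes

print(list(cats.columns), list(tz_aware.columns), list(naive.columns))
```

Note that `'datetime64'` and `'datetimetz'` select disjoint sets of columns here, so both are needed to pick up every datetime column.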