# Structured Arrays


## Creating Structured Arrays

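Suppose we want to store the following table:

| |name|age|wgt|
|---|---|---|---|
|0|dan|1|23.1|
|1|ann|0|25.1|
|2|sam|2|8.3|

We want a one-dimensional array in which every element has three attributes: `name`, `age`, and `wgt`. This calls for a structured array.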

In [1]:
import numpy as np

In [2]:
a = np.array([1.0,2.0,3.0,4.0],np.float32)

In [3]:
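# Use the view method to reinterpret the memory of a as complex numbers
# A complex64 value is laid out like a structure with two float32 fields, real and imag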
a.view(np.complex64)

array([1.+2.j, 3.+4.j], dtype=complex64)

In [4]:
# Define a custom structured dtype
my_dtype = np.dtype([('mass', 'float32'), ('vol', 'float32')])
a.view(my_dtype)

array([(1., 2.), (3., 4.)], dtype=[('mass', '<f4'), ('vol', '<f4')])

Here `f4` denotes a 4-byte floating-point number, and `<` denotes little-endian byte order.
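
To make the `<f4` notation concrete, here is a minimal sketch that reads the same four bytes under both byte orders. The variable names (`little`, `big`, `raw`) are illustrative and not part of the original notebook.

```python
import numpy as np

# '<f4' is a little-endian 4-byte float; '>f4' is the big-endian counterpart.
little = np.dtype('<f4')
big = np.dtype('>f4')
print(little.itemsize, big.itemsize)  # both types occupy 4 bytes

# Serialize 1.0 as little-endian bytes, then decode those bytes both ways.
raw = np.array([1.0], dtype='<f4').tobytes()
as_little = np.frombuffer(raw, dtype='<f4')[0]
as_big = np.frombuffer(raw, dtype='>f4')[0]

print(as_little)  # the original value
print(as_big)     # a different value, because the bytes are read in reverse order
```

The byte-order prefix matters mainly when data is exchanged between machines or read from binary files; within one program NumPy handles it transparently.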

In [5]:
# Initialize a structured array from a list of tuples
my_data = np.array([(1,1),(2,3),(4,5),(1,3)],my_dtype)
print(my_data)

[(1., 1.) (2., 3.) (4., 5.) (1., 3.)]


In [6]:
my_data[0]

(1., 1.)

In [8]:
my_data[0]['vol']

1.0

In [9]:
my_data['mass']

array([1., 2., 4., 1.], dtype=float32)

In [10]:
# Sort with a custom field order: first by volume (vol), then by mass
my_data.sort(order=('vol','mass'))
print(my_data)

[(1., 1.) (1., 3.) (2., 3.) (4., 5.)]
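
The `sort` method above reorders the array in place. As a hedged sketch of the alternatives, `np.sort` returns a sorted copy and `np.argsort` returns the permutation, and both accept the same `order` argument. The names `sorted_copy` and `idx` are illustrative.

```python
import numpy as np

my_dtype = np.dtype([('mass', 'float32'), ('vol', 'float32')])
my_data = np.array([(1, 1), (2, 3), (4, 5), (1, 3)], my_dtype)

# np.sort leaves my_data untouched and returns a new, sorted array.
sorted_copy = np.sort(my_data, order=['vol', 'mass'])

# np.argsort returns the indices that would sort the array.
idx = np.argsort(my_data, order=['vol', 'mass'])

print(sorted_copy)
print(idx)
```

Indexing with `my_data[idx]` gives the same result as `sorted_copy`, which is useful when a second array has to be reordered in the same way.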


In [11]:
# Now define the record structure we wanted at the start
my_dtype = np.dtype([('name','S10'),('age','int'),('weight','float')])

In [12]:
# Size in bytes of one element (here 10 + 4 + 8; platforms where int is 8 bytes report 26)
my_dtype.itemsize

22
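
To see where the 22 bytes come from, the sketch below inspects the offset of each field. It assumes explicit sizes (`i4`, `f8`) instead of `int` and `float` so that the layout is the same on every platform; `person_dtype` is a name introduced here for illustration only.

```python
import numpy as np

# Explicit field sizes: 10-byte string, 4-byte integer, 8-byte float.
person_dtype = np.dtype([('name', 'S10'), ('age', 'i4'), ('weight', 'f8')])

# dtype.fields maps each field name to (field dtype, byte offset, ...).
for field_name in person_dtype.names:
    field_dtype, offset = person_dtype.fields[field_name][:2]
    print(field_name, field_dtype, offset)

# Without alignment padding, the item size is the sum of the field sizes.
total = sum(person_dtype.fields[n][0].itemsize for n in person_dtype.names)
print(total, person_dtype.itemsize)
```

The fields are packed back to back by default, so each offset equals the combined size of the fields before it.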

In [13]:
# Create an uninitialized 3x4 array
people = np.empty((3,4),my_dtype)

In [14]:
# Assign values field by field
people['name'] = [['Brad', 'Jane', 'John', 'Fred'],
                  ['Henry', 'George', 'Brain', 'Amy'],
                  ['Ron', 'Susan', 'Jennife', 'Jill']]
people['age'] = [[33, 25, 47, 54],
                 [29, 61, 32, 27],
                 [19, 33, 18, 54]]
people['weight'] = [[135., 105., 255., 140.],
                    [154., 202., 137., 187.],
                    [188., 135., 88., 145.]]

In [15]:
print(people)

[[('Brad', 33, 135.) ('Jane', 25, 105.) ('John', 47, 255.)
  ('Fred', 54, 140.)]
 [('Henry', 29, 154.) ('George', 61, 202.) ('Brain', 32, 137.)
  ('Amy', 27, 187.)]
 [('Ron', 19, 188.) ('Susan', 33, 135.) ('Jennife', 18,  88.)
  ('Jill', 54, 145.)]]


In [16]:
# Look at the record of the last person
people[-1,-1]

('Jill', 54, 145.)
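
Because each field behaves like an ordinary array, fields can also drive boolean masks. The following is a small sketch under stated assumptions: it uses a three-record array named `roster` (hypothetical) and a `U10` string field rather than `S10`, so that names come back as text on Python 3.

```python
import numpy as np

person_dtype = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
roster = np.array([('Brad', 33, 135.),
                   ('Fred', 54, 140.),
                   ('Ron', 19, 188.)], dtype=person_dtype)

# Select whole records with a mask built from a single field.
over_30 = roster[roster['age'] > 30]
names = over_30['name'].tolist()

# Numeric fields support the usual reductions.
mean_weight = roster['weight'].mean()

print(names)
print(mean_weight)
```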

## Reading Structured Arrays from Text

In [24]:
%%writefile people.txt
name age weight
amy 11 38.2
john 10 40.3
bill 12 21.2

Overwriting people.txt


In [25]:
people = np.loadtxt('people.txt',
                   skiprows=1,
                   dtype=my_dtype)
people

array([('amy', 11, 38.2), ('john', 10, 40.3), ('bill', 12, 21.2)],
      dtype=[('name', 'S10'), ('age', '<i4'), ('weight', '<f8')])

In [26]:
people['name']

array(['amy', 'john', 'bill'], dtype='|S10')
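
On Python 3, an `S10` field holds bytes, so the names print as `b'amy'` rather than `'amy'`. One possible way to get text back, sketched here with a stand-in array called `raw_names` (hypothetical), is to cast the field to a Unicode dtype.

```python
import numpy as np

# Stand-in for people['name'] as it would look on Python 3.
raw_names = np.array([b'amy', b'john', b'bill'], dtype='S10')

# Casting from 'S' to 'U' decodes ASCII bytes into text strings.
text_names = raw_names.astype('U10')

print(raw_names)
print(text_names)
```

Declaring the field as `U10` in the dtype from the start avoids the conversion, at the cost of four bytes per character.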

In [27]:
import os
os.remove('people.txt')

In [28]:
%%writefile wood.csv
item,material,number
100,oak,33
110,maple,14
120,oak,7
145,birch,3

Writing wood.csv


In [29]:
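# Define a converter that maps each material name to an integer:
tree_to_int = dict(oak =1,
                  maple=2,
                  birch=3)
def convert(s):
    # genfromtxt may pass the field as bytes, depending on NumPy version and encoding
    if isinstance(s, bytes):
        s = s.decode()
    return tree_to_int.get(s.strip(),0)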

In [30]:
data = np.genfromtxt('wood.csv',
                     delimiter=',', # comma-separated values
                     dtype=int,     # integer data type
                     names=True,    # read field names from the first line
                     converters={1:convert} # convert the material column
                    )
data

array([(100, 1, 33), (110, 2, 14), (120, 1,  7), (145, 3,  3)],
      dtype=[('item', '<i4'), ('material', '<i4'), ('number', '<i4')])

In [31]:
data['material']

array([1, 2, 1, 3])

In [32]:
os.remove('wood.csv')
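
As an alternative to a converter, `genfromtxt` can infer a type for each column when `dtype=None`, which keeps the material names as strings. This is a sketch rather than the notebook's method: it reads from an in-memory `io.StringIO` buffer instead of a file, and the names `csv_text` and `wood` are introduced here for illustration.

```python
import io
import numpy as np

csv_text = """item,material,number
100,oak,33
110,maple,14
120,oak,7
145,birch,3
"""

wood = np.genfromtxt(io.StringIO(csv_text),
                     delimiter=',',
                     names=True,        # field names come from the header line
                     dtype=None,        # infer a dtype for each column
                     encoding='utf-8')  # decode text columns as str, not bytes

materials = wood['material'].tolist()
print(wood.dtype)
print(materials)
```

Keeping the strings is convenient for display and filtering, while the integer mapping above is more compact when the set of materials is fixed.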

## Nested Structures

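We want to record the position and mass of a particle in a two-dimensional plane, where the position itself has two components:

|position| |mass|
|---|---|---|
|`x`|`y`| |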

In [33]:
particle_dtype = np.dtype([('position', [('x', 'float'), 
                                         ('y', 'float')]),
                           ('mass', 'float')
                          ])

In [34]:
%%writefile data.txt
2.0 3.0 42.0
2.1 4.3 32.5
1.2 4.6 32.3
4.5 -6.4 23.3

Writing data.txt


In [35]:
data = np.loadtxt('data.txt', dtype=particle_dtype)

In [45]:
name = ['position','mass']
for i in name:
    print(data[i])

[(2. ,  3. ) (2.1,  4.3) (1.2,  4.6) (4.5, -6.4)]
[42.  32.5 32.3 23.3]


In [46]:
os.remove('data.txt')
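
Nested fields are reached by chaining the field names. The sketch below builds a two-particle array directly with `np.array` instead of reading a file, and uses `f8` in place of `float`; `particles`, `xs`, `ys`, and `dist` are illustrative names.

```python
import numpy as np

particle_dtype = np.dtype([('position', [('x', 'f8'),
                                         ('y', 'f8')]),
                           ('mass', 'f8')])

# Each record is ((x, y), mass), mirroring the nesting of the dtype.
particles = np.array([((2.0, 3.0), 42.0),
                      ((2.1, 4.3), 32.5)], dtype=particle_dtype)

# Chain the field names to reach the inner components.
xs = particles['position']['x']
ys = particles['position']['y']

# Distance of each particle from the origin.
dist = np.hypot(xs, ys)
print(xs, ys, dist)
```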