# 5-2 How to handle binary files: WAV as an example

## WAV file format

![wav](./5-2-wav-format.gif)

In [1]:
with open("5-2-test.wav","rb") as f:
    info = f.read(44)
info

b'RIFF$\xbc(\x00WAVEfmt \x10\x00\x00\x00\x01\x00\x02\x00"V\x00\x00\x88X\x01\x00\x04\x00\x10\x00data\x00\xbc(\x00'
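If a file follows the diagram's canonical 44-byte layout exactly (a 16-byte `fmt ` chunk immediately followed by `data`), the whole header can be decoded with a single `struct.unpack` call. Below is a minimal sketch under that assumption. `parse_canonical_header` and the field names are illustrative, not part of any library. The `<` prefix selects little-endian byte order with no alignment padding, which matches how WAV stores its fields.

```python
import struct

# Canonical 44-byte WAV header, little-endian ("<"), no alignment padding:
# 4s RIFF id, I RIFF size, 4s WAVE id, 4s "fmt " id, I fmt size,
# H audio format, H channels, I sample rate, I byte rate,
# H block align, H bits per sample, 4s "data" id, I data size
HEADER_FORMAT = "<4sI4s4sIHHIIHH4sI"
FIELD_NAMES = (
    "riff_id", "riff_size", "wave_id", "fmt_id", "fmt_size",
    "audio_format", "num_channels", "sample_rate", "byte_rate",
    "block_align", "bits_per_sample", "data_id", "data_size",
)

def parse_canonical_header(header: bytes) -> dict:
    """Decode a header laid out exactly as in the diagram (hypothetical helper)."""
    if len(header) != struct.calcsize(HEADER_FORMAT):  # 44 bytes
        raise ValueError("expected exactly 44 header bytes")
    fields = dict(zip(FIELD_NAMES, struct.unpack(HEADER_FORMAT, header)))
    if fields["riff_id"] != b"RIFF" or fields["wave_id"] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    return fields

# The 44 bytes printed above
info = (b'RIFF$\xbc(\x00WAVEfmt \x10\x00\x00\x00\x01\x00\x02\x00"V\x00\x00'
        b'\x88X\x01\x00\x04\x00\x10\x00data\x00\xbc(\x00')
print(parse_canonical_header(info))
```

For this file the result reports 2 channels, 22050 Hz, 16 bits per sample and a `data` size of 2669568 bytes.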

### parse bytes

In [2]:
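import struct
# unpack NumChannels: a 2-byte integer at bytes 22-23, so the format code is "h"
print(struct.unpack("<h",info[22:24])) # "<" = little-endian, the byte order WAV uses
print(struct.unpack(">h",info[22:24])) # big-endian reads the same two bytes as 512, which is wrong
# unpack SampleRate: an unsigned 4-byte integer at bytes 24-27, so the format code is "I"
print(struct.unpack("<I",info[24:28]))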

(2,)
(512,)
(22050,)
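The 2 vs. 512 discrepancy is purely a byte-order question. The sketch below decodes the same two bytes both ways with the built-in `int.from_bytes` and checks that `struct` agrees. A bare `"h"` uses the machine's native order, which happens to be little-endian on x86 and most ARM systems; spelling out `"<"` makes the code correct on any machine.

```python
import struct

two_bytes = b"\x02\x00"  # the NumChannels field as stored in the file (bytes 22-23)

# Little-endian: the least significant byte comes first, so 0x02 + 0x00 * 256 = 2
little = int.from_bytes(two_bytes, "little")
# Big-endian: the first byte is the most significant, so 0x02 * 256 + 0x00 = 512
big = int.from_bytes(two_bytes, "big")

# struct gives the same answers when the byte order is stated explicitly
assert struct.unpack("<h", two_bytes) == (little,)
assert struct.unpack(">h", two_bytes) == (big,)
print(little, big)  # 2 512
```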


In [3]:
info.find(b"data")

36

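However, the diagram above is not entirely accurate: not every WAV file follows this exact layout. For example, the `fmt ` chunk can be longer than 16 bytes, or extra subchunks such as `LIST` can appear before `data`, so `data` does not always start at byte 36.  
A more reliable approach is to walk the file subchunk by subchunk, using each subchunk's size field to jump to the next one.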
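To see the chunk walk in action, the sketch below builds a tiny WAV file in memory whose `data` chunk is pushed past byte 36 by an odd-sized `LIST` chunk, then lists every subchunk by following the size fields. `list_chunks` is a hypothetical helper, and the `LIST` payload is an arbitrary placeholder rather than a well-formed INFO list. It also skips the pad byte that RIFF places after any odd-sized chunk.

```python
import io
import struct

def list_chunks(f):
    """Yield (chunk_id, payload_offset, payload_size) for each subchunk (hypothetical helper)."""
    f.seek(0)
    riff_id, _riff_size, wave_id = struct.unpack("<4sI4s", f.read(12))
    if riff_id != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    while True:
        header = f.read(8)
        if len(header) < 8:
            return  # end of file reached
        chunk_id, chunk_size = struct.unpack("<4sI", header)
        yield chunk_id, f.tell(), chunk_size
        # Skip the payload plus one pad byte if the size is odd (RIFF chunks are word-aligned)
        f.seek(chunk_size + (chunk_size & 1), 1)

# In-memory WAV: "fmt " (16 bytes), then a 7-byte LIST chunk plus pad byte, then "data"
fmt = struct.pack("<HHIIHH", 1, 2, 22050, 88200, 4, 16)
list_payload = b"INFOabc"            # arbitrary placeholder, 7 bytes, so one pad byte follows
samples = b"\x00\x00" * 8            # 8 silent int16 samples
body = (b"WAVE"
        + b"fmt " + struct.pack("<I", len(fmt)) + fmt
        + b"LIST" + struct.pack("<I", len(list_payload)) + list_payload + b"\x00"
        + b"data" + struct.pack("<I", len(samples)) + samples)
wav = b"RIFF" + struct.pack("<I", len(body)) + body

for chunk_id, offset, size in list_chunks(io.BytesIO(wav)):
    print(chunk_id, offset, size)
```

Here the `data` header sits at byte 52 and its samples start at byte 60, so any code hard-wired to offsets 36 and 44 would read the wrong bytes.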

### func (find the position of a specific subchunk)

In [4]:
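def find_subchunk(f,name):
    f.seek(12) # the first 12 bytes ("RIFF", RIFF size, "WAVE") are fixed, so skip them
    while True:
        chunk_name = f.read(4)
        if len(chunk_name) < 4: # reached the end of the file without finding the subchunk
            raise ValueError(f"subchunk {name!r} not found")
        chunk_size, = struct.unpack("<I",f.read(4)) # note the comma after chunk_size: unpack always returns a tuple
        
        if chunk_name == name:
            return f.tell(),chunk_size
        
        f.seek(chunk_size + (chunk_size & 1),1) # whence=1 means relative to the current position; odd-sized chunks carry one pad byte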
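For everyday work, Python's standard-library `wave` module performs this chunk walk internally and exposes the `fmt ` fields through `getparams()`. The sketch below writes a two-frame stereo clip to an in-memory buffer and reads it back. It assumes a little-endian host, because `wave` expects frame bytes in native order and the samples are packed with `"<"` here.

```python
import io
import struct
import wave

# Write a tiny stereo, 16-bit, 22050 Hz clip into memory
mem = io.BytesIO()
with wave.open(mem, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)        # bytes per sample, i.e. 16-bit
    w.setframerate(22050)
    w.writeframes(struct.pack("<4h", 100, -100, 200, -200))  # 2 frames of interleaved L/R samples

# Read it back; wave locates the "fmt " and "data" chunks itself
mem.seek(0)
with wave.open(mem, "rb") as r:
    params = r.getparams()
    frames = r.readframes(r.getnframes())

print(params.nchannels, params.sampwidth, params.framerate, params.nframes)
print(struct.unpack("<4h", frames))
```

Closing a `wave` object does not close a file object that was passed in, which is why `mem` can still be rewound and read.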

In [5]:
with open("5-2-test.wav","rb") as f:
    offset, size = find_subchunk(f, b"data")
offset,size,offset+size

(44, 2669568, 2669612)
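The `data` size, combined with the `fmt ` fields decoded earlier, is enough to compute the clip's length. The sketch below derives block align and byte rate from the channel count and bit depth. They should match the values stored in the header (4 at bytes 32-33 and 88200 at bytes 28-31), which gives a quick consistency check.

```python
# Header fields taken from the 44 bytes read earlier
num_channels = 2
sample_rate = 22050
bits_per_sample = 16
data_size = 2669568          # size reported for the "data" subchunk

block_align = num_channels * bits_per_sample // 8   # bytes per frame (one sample per channel)
byte_rate = sample_rate * block_align                # bytes per second of audio
num_frames = data_size // block_align                # frames stored in the data chunk
duration_s = num_frames / sample_rate                # length of the clip in seconds

print(block_align, byte_rate, num_frames, round(duration_s, 2))  # 4 88200 667392 30.27
```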

In [6]:
ll 5-2-test.wav

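-rw-r--r-- 1 ubuntu 2669612 Nov 10 08:04 5-2-test.wav

The file size, 2669612 bytes, equals offset + size, so the data subchunk runs to the end of the file and nothing follows it.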
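Another cheap check is the RIFF size field at bytes 4-7: it counts everything after the first 8 bytes, so it should equal the file length minus 8 (for this file, 2669604 = 2669612 - 8). Below is a minimal sketch of both checks on a file held in memory. `check_riff_sizes` is a hypothetical helper, and the usage builds a small synthetic mono file instead of reading the real one.

```python
import struct

def check_riff_sizes(wav: bytes, data_offset: int, data_size: int) -> None:
    """Raise ValueError if the size fields disagree with the file length (hypothetical helper)."""
    riff_size, = struct.unpack("<I", wav[4:8])
    # The RIFF size excludes the 8 bytes of "RIFF" and the size field itself
    if riff_size + 8 != len(wav):
        raise ValueError(f"RIFF size {riff_size} does not match file length {len(wav)}")
    # The data chunk must fit inside the file
    if data_offset + data_size > len(wav):
        raise ValueError("data chunk extends past the end of the file")

# Synthetic 44-byte-header file: mono, 8000 Hz, 16-bit, four silent samples
samples = b"\x00\x00" * 4
wav = (b"RIFF" + struct.pack("<I", 36 + len(samples)) + b"WAVE"
       + b"fmt " + struct.pack("<IHHIIHH", 16, 1, 1, 8000, 16000, 2, 16)
       + b"data" + struct.pack("<I", len(samples)) + samples)
check_riff_sizes(wav, 44, len(samples))  # passes silently
```

A truncated download is the typical failure this catches: the header still claims the full size, but the bytes are not there.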


### example: lowering the volume

In [7]:
import numpy as np
# BitsPerSample is 16, so each sample is a 2-byte signed integer (np.short / np.int16, struct code "h").
# The data chunk holds `size` bytes, hence size // 2 samples (both channels, interleaved).
buf = np.zeros(size//2,dtype=np.int16)
buf

array([0, 0, 0, ..., 0, 0, 0], dtype=int16)
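Instead of allocating a zero buffer and filling it, NumPy can also interpret bytes directly with `np.frombuffer`, given a dtype, a sample count and a byte offset. The sketch below uses a synthetic buffer with a placeholder header. `"<i2"` spells out little-endian 16-bit integers explicitly. The resulting array is read-only because it shares memory with an immutable `bytes` object, so call `.copy()` before modifying it in place. For a file on disk, `np.fromfile` accepts the same `dtype`, `count` and `offset` arguments (`offset` requires NumPy 1.17 or later).

```python
import numpy as np

# A 44-byte placeholder header followed by four little-endian int16 samples
header = bytes(44)
raw = header + np.array([100, -100, 300, -300], dtype="<i2").tobytes()

offset, size = 44, len(raw) - 44
# "<i2" = little-endian 2-byte signed integer, the layout of 16-bit PCM in WAV.
# count is the number of samples: data bytes divided by 2 bytes per sample.
samples = np.frombuffer(raw, dtype="<i2", count=size // 2, offset=offset)
print(samples)
```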

In [8]:
f = open("5-2-test.wav","rb") # kept open: it is reused and closed in the next cell
offset,size  = find_subchunk(f,b"data") # the file position is now at the first byte of the sample data
f.readinto(buf) # reads len(buf) * 2 bytes straight into buf, filling it in place
buf[1000:1020]

array([ -50, -298, -148,   68, -228,  173,  -98,  -77,  130, -294,  175,
       -123,   68,   77,  -57,  162,    7, -112,   34, -232], dtype=int16)
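Because this file is stereo (block align 4 = 2 channels × 2 bytes), `buf` holds interleaved samples: left, right, left, right, and so on. Reshaping to one row per frame separates the channels without copying. The sketch below uses the first six values printed above (index 1000 is even, so it starts on a left sample) and assumes the usual WAV convention that channel 0 is left.

```python
import numpy as np

num_channels = 2
# Interleaved 16-bit stereo samples as stored in the data chunk: L0, R0, L1, R1, ...
buf = np.array([-50, -298, -148, 68, -228, 173], dtype=np.int16)

frames = buf.reshape(-1, num_channels)  # one row per frame, one column per channel (a view)
left, right = frames[:, 0], frames[:, 1]
print(left, right)
```

Since `frames` is a view, scaling `left` in place changes `buf` as well, which is handy for adjusting one channel before writing the file back out.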

In [9]:
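buf //= 8 # scale every sample to 1/8 of its amplitude (about -18 dB); floor division keeps dtype int16
f2 = open("5-2-test-out.wav","wb")
f.seek(0)
info = f.read(offset) # everything before the samples: RIFF header, fmt chunk and the data chunk header
f2.write(info) # the header can be reused as-is because the data size has not changed
buf.tofile(f2) # writes the raw bytes (native order, little-endian on x86/ARM) without the copy made by f2.write(buf.tobytes())
f2.close()
f.close()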
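`buf //= 8` works for one fixed ratio, but it always rounds toward negative infinity and cannot make the sound louder without risking overflow. Below is a minimal sketch of a general gain function, assuming 16-bit samples. It scales in floating point, rounds (`np.round` rounds exact halves to the nearest even value), and clips to the int16 range so loud samples saturate instead of wrapping around. `apply_gain` is a hypothetical helper, not a NumPy function.

```python
import numpy as np

def apply_gain(samples: np.ndarray, gain: float) -> np.ndarray:
    """Scale int16 PCM samples by `gain`, rounding and clipping to the int16 range (hypothetical helper)."""
    limits = np.iinfo(np.int16)
    scaled = np.round(samples.astype(np.float64) * gain)
    return np.clip(scaled, limits.min, limits.max).astype(np.int16)

buf = np.array([-50, -298, 68, 173, 30000], dtype=np.int16)
print(apply_gain(buf, 0.125))   # quieter: rounds to the nearest integer
print(apply_gain(buf, 2.0))     # louder: 30000 * 2 is clipped to 32767
print(buf // 8)                 # floor division rounds toward negative infinity: -50 // 8 == -7
```

Without the clip, 30000 * 2 = 60000 would wrap around to a large negative value when stored back as int16, which is heard as a loud click.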