# Data storage

Python provides file reading and writing, as well as object serialisation and reconstruction (the `pickle` module). `numpy` provides methods for storing and retrieving structured arrays quickly and efficiently, including data compression. `scipy` provides helper functions for common file formats such as `netcdf` and `matlab`.
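As a minimal sketch of the first two options, the cell below round-trips a Python object through `pickle` and an array through a compressed `.npz` archive. The data, the file names and the temporary directory are made up for illustration and are not part of the course material.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical example data: a plain Python object and a numeric array
record = {"station": "demo", "samples": [1, 2, 3]}
grid = np.arange(12, dtype=float).reshape(3, 4)

workdir = tempfile.mkdtemp()

# Serialise the Python object with pickle, then reconstruct it
pickle_path = os.path.join(workdir, "record.pkl")
with open(pickle_path, "wb") as f:
    pickle.dump(record, f)
with open(pickle_path, "rb") as f:
    record_back = pickle.load(f)

# Store the array in a compressed .npz archive under the key "grid"
npz_path = os.path.join(workdir, "grid.npz")
np.savez_compressed(npz_path, grid=grid)
with np.load(npz_path) as archive:
    grid_back = archive["grid"]

print(record_back == record)
print(np.array_equal(grid_back, grid))
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.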

Sometimes data are hard won, and reformatting them into easily retrieved files can be a lifesaver.

In [15]:
import numpy as np
from scipy.io import netcdf

In [45]:
nf = netcdf.netcdf_file(filename="../../Data/Reference/velocity_AU.nc")

from pprint import pprint # pretty printer for python objects

pprint( nf.dimensions )
pprint( nf.variables )

print(nf.variables["lat"].data.shape)
print(nf.variables["lon"].data.shape)
print(nf.variables["ve"].data.shape)
print(nf.variables["vn"].data.shape)

{'lat': 161, 'lon': 360}
{'lat': <scipy.io.netcdf.netcdf_variable object at 0x10c8f6950>,
 'lon': <scipy.io.netcdf.netcdf_variable object at 0x10c95aa90>,
 've': <scipy.io.netcdf.netcdf_variable object at 0x10c95a9d0>,
 'vn': <scipy.io.netcdf.netcdf_variable object at 0x10c95a910>}
(161,)
(360,)
(360, 161)
(360, 161)
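The same `scipy` class can also write files. The following is a sketch only: it creates a small file with the same layout as above (dimensions `lat` and `lon`, a variable `ve` of shape `(lon, lat)`) and reads it back. The file name, grid sizes and values are invented for the example.

```python
import os
import tempfile

import numpy as np
from scipy.io import netcdf_file

path = os.path.join(tempfile.mkdtemp(), "demo_velocity.nc")  # hypothetical file

# Write: declare the dimensions first, then the variables defined on them
lat = np.linspace(-10.0, 10.0, 5)
lon = np.linspace(100.0, 130.0, 4)

out = netcdf_file(path, "w")
out.createDimension("lat", lat.size)
out.createDimension("lon", lon.size)
lat_var = out.createVariable("lat", "f8", ("lat",))
lon_var = out.createVariable("lon", "f8", ("lon",))
ve_var = out.createVariable("ve", "f8", ("lon", "lat"))
lat_var[:] = lat
lon_var[:] = lon
ve_var[:] = np.ones((lon.size, lat.size))
out.close()

# Read back; mmap=False plus .copy() keeps the array valid after close()
nf_demo = netcdf_file(path, "r", mmap=False)
dims = dict(nf_demo.dimensions)
ve_shape = nf_demo.variables["ve"].shape
ve_back = nf_demo.variables["ve"][:].copy()
nf_demo.close()

print(dims)
print(ve_shape)
```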


In [52]:
%%timeit

# np.load on an .npz archive only opens the file; each array is read when it is accessed
gad = np.load("../../Data/Reference/global_age_data.3.6.z.npz")

The slowest run took 6.54 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 44.7 µs per loop
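A timing of tens of microseconds is consistent with `np.load` being lazy for `.npz` archives: the call opens the archive, and each array is read and decompressed only when it is indexed by name. The sketch below shows this on a small made-up archive; the file name and array size are assumptions for the example, and only the key `ageData` is taken from the cell that follows.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "demo_age.npz")  # hypothetical file
np.savez_compressed(path, ageData=np.zeros((181, 361)))

archive = np.load(path)           # cheap: opens the archive only
names = list(archive.files)       # keys are available without reading any data
ages_demo = archive["ageData"]    # the array is read and decompressed here
archive.close()

print(type(archive).__name__, names, ages_demo.shape)
```

To time the full cost of retrieval, index the array inside the timed statement as well.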


In [13]:
# Seafloor age data and global image - data from Earthbyters

# The data come as ascii lon / lat / age tuples with NaN for no data. 
# This can be loaded with ...

age = np.loadtxt("Resources/global_age_data.3.6.xyz")
age_data = age.reshape(1801,3601,3)  # I looked at the data and figured out what numbers to use
age_img  = age_data[:,:,2]

# But this is very slow, so I have stored only the age values on the (1801 x 3601) grid, from which the full array is easily reconstructed

datasize = (1801, 3601, 3)
age_data = np.empty(datasize)

ages = np.load("Resources/global_age_data.3.6.z.npz")["ageData"]

lats = np.linspace(90, -90, datasize[0])
lons = np.linspace(-180.0,180.0, datasize[1])

arrlons,arrlats = np.meshgrid(lons, lats)

age_data[...,0] = arrlons[...]
age_data[...,1] = arrlats[...]
age_data[...,2] = ages[...]
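The one-off conversion that produces such an archive is not shown above. A minimal sketch of how it could be done follows, using a tiny made-up grid and in-memory buffers in place of the real files; the grid size and values are assumptions, and the key `ageData` simply mirrors the one used above.

```python
import io

import numpy as np

# Hypothetical small grid standing in for the 1801 x 3601 global one
nlat, nlon = 4, 5
lats = np.linspace(90.0, -90.0, nlat)
lons = np.linspace(-180.0, 180.0, nlon)
arrlons, arrlats = np.meshgrid(lons, lats)
values = np.arange(nlat * nlon, dtype=float).reshape(nlat, nlon)
values[0, 0] = np.nan  # NaN marks "no data", as in the age file

# Emulate the ascii lon / lat / value file in memory
xyz = np.column_stack([arrlons.ravel(), arrlats.ravel(), values.ravel()])
text = io.StringIO()
np.savetxt(text, xyz)
text.seek(0)

# One-off conversion: parse the text, keep only the value column on the grid
parsed = np.loadtxt(text).reshape(nlat, nlon, 3)
buffer = io.BytesIO()
np.savez_compressed(buffer, ageData=parsed[:, :, 2])
buffer.seek(0)

# Later sessions: load the compact archive and rebuild the coordinates
restored = np.load(buffer)["ageData"]
rebuilt = np.empty((nlat, nlon, 3))
rebuilt[..., 0], rebuilt[..., 1], rebuilt[..., 2] = arrlons, arrlats, restored

print(np.allclose(rebuilt, parsed, equal_nan=True))
```

Storing only the value column works because the longitudes and latitudes lie on a regular grid and can be regenerated with `np.linspace` and `np.meshgrid`.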

