ArrayServer

Serves Numpy Arrays Persistently from Memory Mapped Files

Ever wanted to store large NumPy arrays persistently, but with something faster than reading them back from disk? Well, here you go. The server process holds a handle to a memory-mapped file containing a binary representation of a NumPy array. When the client requests the array, the server returns the filename, and the client can access the memory-mapped array without having to read it from disk.
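In other words, the client side of a lookup likely reduces to a single np.memmap call over the server-managed file. The helper below is only a sketch of that idea, not ArrayServer's actual code; the filename, dtype, and shape arguments stand in for whatever metadata the server really hands back.

import numpy as np

def open_shared_array(filename, dtype, shape):
    # Map the file the server already holds open. Because the server keeps
    # its own mapping alive, the pages stay warm in the OS page cache, so
    # touching the array here generally does not hit the disk.
    return np.memmap(filename, dtype=dtype, mode='r', shape=shape)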

Obligatory

This is not production-quality code. It is a one-hour hack. I do think it's not too far off from being useful to a lot of people. Pull requests welcome!

Starting the server

$ python -m arrayserver

Storing a large array

>>> from arrayserver import ArrayClient
>>> import numpy as np
>>> c = ArrayClient()
>>> myarray = np.random.random(500000)
>>> myarray.sum()
249876.10611505382
>>> c['myarray'] = myarray
>>> quit() # the array is still in memory in the server process

Accessing the array later (look ma, no disk!)

>>> from arrayserver import ArrayClient

>>> c = ArrayClient()
>>> myarray = c['myarray']
>>> myarray.shape
(500000,)
>>> myarray.sum()
memmap(249876.10611505382)
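The memmap(...) wrapper in the output appears because the returned array is an np.memmap, an ndarray subclass, so reductions like sum() return a memmap scalar. Assuming that is what the client hands back, you can convert to a plain Python float explicitly:

>>> float(myarray.sum())
249876.10611505382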

Deleting the array

>>> from arrayserver import ArrayClient

>>> c = ArrayClient()
>>> del c['myarray']

TODO

  • create a package
  • oh god how do I test this?
  • benchmarks
  • maybe persist mmapped files if server crashes?
  • implement keys() method
  • see how this plays with Pandas Series/DataFrame (see the sketch below)
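On that last point, here is a rough, untested sketch of the interaction in question: wrapping a memory-mapped array in a pandas Series. The memmap below is built over a temporary file purely as a stand-in for an array fetched through ArrayClient; whether pandas copies the underlying buffer depends on the pandas version, which is exactly what would need checking.

import tempfile

import numpy as np
import pandas as pd

# Stand-in for a server-backed array: a read-only memmap over a temp file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    np.random.random(1000).tofile(f)

arr = np.memmap(f.name, dtype='float64', mode='r', shape=(1000,))

# Wrap it in a Series and use it like any other column of data.
s = pd.Series(arr)
print(s.mean())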
