## Decoupling data from compute in FaaS workflows with ProxyStore

Valerie Hayot-Sasson\
Braid meeting\
Jan 25, 2023

### Data transfer in FaaS is inefficient


![images/trad_faas.png](images/trad_faas.png)

### ProxyStore: Transfer proxy first, data later
<img src="images/proxy_data_transfer.png" alt="images/proxy_data_transfer.png" width="500" style="display:block;margin-right:auto;margin-left:auto"/>

### But what are Proxy objects?

A reference to an immutable Python object located anywhere

They: 
- Transparently wrap ***target*** objects
- Are initialized by a ***factory***
- Provide ***just-in-time resolution***
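
To make just-in-time resolution concrete, here is a minimal pure-Python sketch. `LazyProxy` and `load_target` are hypothetical names used only for illustration. This is not ProxyStore's `Proxy` class, which forwards far more than plain attribute access. The sketch only shows the idea that the factory runs on first use and never before.

```python
from typing import Any, Callable


class LazyProxy:
    """Illustrative stand-in for a proxy (hypothetical, not ProxyStore's class)."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._target: Any = None
        self._resolved = False

    def _resolve(self) -> Any:
        # Call the factory at most once, the first time the target is needed.
        if not self._resolved:
            self._target = self._factory()
            self._resolved = True
        return self._target

    def __getattr__(self, name: str) -> Any:
        # Only reached for names not found on the proxy itself,
        # so these lookups are forwarded to the resolved target.
        return getattr(self._resolve(), name)


calls = 0


def load_target() -> str:
    """Factory: counts how often it runs so resolution can be observed."""
    global calls
    calls += 1
    return "hello"


proxy = LazyProxy(load_target)
calls_before_use = calls   # the factory has not run yet
upper = proxy.upper()      # first attribute access resolves the target
title = proxy.title()      # later accesses reuse the resolved target
```

Because `__getattr__` does not intercept implicit special-method lookups such as `len(proxy)` or `str(proxy)`, this sketch is deliberately limited to explicit attribute access.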

### How to create a Proxy object

In [1]:
from proxystore.proxy import Proxy

x = "The proxy has been resolved"

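class MyFactory:
    """Callable that returns the target object when the proxy is resolved."""

    def __init__(self, obj):
        self.obj = obj

    def __call__(self):
        # Invoked when the proxy is first used.
        return self.obj


# Creating the proxy does not call the factory yet.
p = Proxy(MyFactory(x))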

In [2]:
from proxystore.proxy import is_resolved
is_resolved(p)

False

In [3]:
print(p)  # first use of the proxy resolves it

The proxy has been resolved


In [4]:
isinstance(p, Proxy)

True

In [5]:
from proxystore.proxy import extract

x = extract(p)  # extract the target object from the proxy
isinstance(x, Proxy)

False

### The Stores

A class that handles the communication of objects via mediated channels

Stores also enable asynchronous transfers

Exposes:
- `get`
- `set`
- `evict`
- `exists`
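
As a rough sketch of what these four operations amount to, the snippet below implements them over a plain dictionary. `DictStore` is a hypothetical class written for this example and is not part of ProxyStore. It assumes that `set` generates the key itself and returns it, which matches the `FileStore` usage shown later in these slides.

```python
import pickle
import uuid
from typing import Any, Dict, Optional


class DictStore:
    """Hypothetical in-process store offering get, set, evict and exists."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def set(self, obj: Any) -> str:
        # Serialize the object and file it under a freshly generated key.
        key = uuid.uuid4().hex
        self._data[key] = pickle.dumps(obj)
        return key

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        # Return the deserialized object, or the default if the key is unknown.
        blob = self._data.get(key)
        return default if blob is None else pickle.loads(blob)

    def exists(self, key: str) -> bool:
        return key in self._data

    def evict(self, key: str) -> None:
        # Removing a missing key is treated as a no-op.
        self._data.pop(key, None)


store = DictStore()
key = store.set({"weights": [1, 2, 3]})
found_before = store.exists(key)
value = store.get(key)
store.evict(key)
found_after = store.exists(key)
```

A real store swaps the dictionary for a mediated channel (a Redis server, a shared file system, a Globus transfer) while keeping the same four operations.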

### Many Available Stores

And possibly more to come...

Stores based on existing services:
- RedisStore
- GlobusStore
- FileStore
- LocalStore

Custom in-memory stores:
- EndpointStore
- Distributed in-memory stores 

### The EndpointStore

An in-memory store for inter-site communication

Key format: `(object_id, endpoint_id)`


![images/endpointstore.png](images/endpointstore.png)
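
The sketch below illustrates one way a key of this shape could be used to route requests between sites. It assumes `endpoint_id` names the endpoint that holds the object and that an endpoint forwards requests for keys it does not own. `EndpointKey`, `Endpoint`, and the shared `registry` dictionary (standing in for the network between sites) are hypothetical and do not reflect the actual EndpointStore implementation.

```python
from typing import Any, Dict, NamedTuple


class EndpointKey(NamedTuple):
    """Key in the format (object_id, endpoint_id)."""

    object_id: str
    endpoint_id: str


class Endpoint:
    """Hypothetical endpoint that keeps its objects in memory."""

    def __init__(self, endpoint_id: str, peers: Dict[str, "Endpoint"]) -> None:
        self.endpoint_id = endpoint_id
        self.peers = peers  # shared registry standing in for the network
        self.objects: Dict[str, Any] = {}
        peers[endpoint_id] = self

    def set(self, object_id: str, obj: Any) -> EndpointKey:
        # The key records which endpoint owns the object.
        self.objects[object_id] = obj
        return EndpointKey(object_id, self.endpoint_id)

    def get(self, key: EndpointKey) -> Any:
        if key.endpoint_id == self.endpoint_id:
            return self.objects[key.object_id]
        # Not ours: forward the request to the owning endpoint.
        return self.peers[key.endpoint_id].get(key)


registry: Dict[str, Endpoint] = {}
site_a = Endpoint("site-a", registry)
site_b = Endpoint("site-b", registry)

key = site_a.set("obj-1", [1, 2, 3])
local_read = site_a.get(key)   # served from site-a's own memory
remote_read = site_b.get(key)  # site-b forwards the request to site-a
```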

### The distributed in-memory stores

In-memory stores for intra-site communication

Key format: `(object_id, size, peer_id)`

<img src="images/dimstore.png" alt="images/dimstore.png" width="400"/>

### Available distributed in-memory stores

- WebsocketStore (fallback)
- UCXStore
- MargoStore

### How to use a store

In [6]:
from proxystore.store.file import FileStore

fs = FileStore('mystore', store_dir='mystore', cache_size=16)
p = fs.set("hello")  # set() returns a key identifying the stored object, not a proxy

In [7]:
p

FileStoreKey(filename='92383075450681783512448073936453971083')

In [8]:
!ls mystore

92383075450681783512448073936453971083


In [9]:
!cat "mystore/$(ls mystore)"

02
hello

In [10]:
q = fs.get(p)
q

'hello'

In [11]:
fs.evict(p)

In [12]:
!ls -larth mystore

total 0
drwxr-xr-x@ 8 valeriehayot-sasson  staff   256B Jan 26 07:41 ..
drwxr-xr-x@ 2 valeriehayot-sasson  staff    64B Jan 26 07:41 .
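
Putting the two ideas together, the following sketch shows why sending a reference first and the data later can pay off. The object goes into a store, and only a small reference to it travels with the task. All names here (`STORE`, `make_reference`, `resolve`) are hypothetical, and the dictionary stands in for a store that both the client and the worker can reach. ProxyStore's own proxies and factories are not used.

```python
import json
import pickle
from typing import Any, Dict

# Hypothetical store reachable from both the client and the worker.
STORE: Dict[str, bytes] = {}


def make_reference(key: str, obj: Any) -> str:
    """Client side: put the object in the store, return a small reference."""
    STORE[key] = pickle.dumps(obj)
    return json.dumps({"key": key})


def resolve(reference: str) -> Any:
    """Worker side: fetch the object only when the task needs it."""
    key = json.loads(reference)["key"]
    return pickle.loads(STORE[key])


data = list(range(100_000))
reference = make_reference("big-object", data)

reference_size = len(reference.encode())  # bytes sent along with the task
payload_size = len(STORE["big-object"])   # bytes moved through the store instead
resolved = resolve(reference)
```

The reference stays a few dozen bytes regardless of how large `data` grows, so the FaaS service itself never has to carry the payload.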


### Experimental Results

![images/store_performance.png](images/store_performance.png)

### Distributed in-memory store

![images/dim_perf.png](images/dim_perf.png)

### Future work

- Develop a MultiStore
- Add policies for moving and persisting data
- Enable prefetching of data based on task placement

### Questions?
[GitHub](https://github.com/proxystore)

[Documentation](https://proxystore.readthedocs.io/en/latest/)