# A simple example of using dataspace for MongoDB ETL

Subclasses of the `Workspace` class live in the `workspaces` package. We'll load `MongoFrame`, which interfaces with MongoDB collections.

In [1]:
from dataspace.workspaces.remote_db import MongoFrame

To construct a workspace instance, pass the location of the MongoDB collection: host, port, database and collection name. Login credentials are optional keyword arguments.

In [2]:
retriever = MongoFrame(host='localhost', port=27017, database='test', collection='test')

Each instance has `from_storage` and `to_storage` methods, as well as a stateful `memory` attribute that holds the data being moved in either direction.

In [3]:
from pandas import DataFrame

test_frame = DataFrame(data=[[1, 'one'], [2, 'two']], columns=['number', 'name'])

# saves the test DataFrame in storage
retriever.memory = test_frame
retriever.to_storage(identifier='number', upsert=True)  # the id column uniquely identifies documents for upsert

# loads the "name" fields from all documents in the collection
retriever.from_storage(  # from_storage wraps the pymongo collection-level find operation
    filter={},
    projection={'name': 1}
)
print(retriever.memory)

connected to test @ localhost
disconnected 

connected to test @ localhost
disconnected 

                        _id name
0  5cf86e1e7adc82470c58675b  one
1  5cf86e1e7adc82470c58675d  two
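
The round trip above (stage data in `memory`, write it with `to_storage`, read it back with `from_storage`) can be illustrated without a database. The following is a minimal sketch, not dataspace's implementation: `InMemoryFrame` and its `_storage` list are hypothetical names, records are plain dicts rather than a DataFrame, and the filter only supports exact equality. It assumes that an upsert updates the record matching `identifier` and inserts a new one when nothing matches.

```python
from typing import Any, Dict, List, Optional


class InMemoryFrame:
    """Hypothetical stand-in for a workspace; not part of dataspace."""

    def __init__(self) -> None:
        self.memory: List[Dict[str, Any]] = []    # staging area for IO
        self._storage: List[Dict[str, Any]] = []  # plays the role of the collection

    def to_storage(self, identifier: str, upsert: bool = True) -> None:
        """Write each staged record, matching stored records on `identifier`."""
        for record in self.memory:
            match = next(
                (doc for doc in self._storage
                 if doc.get(identifier) == record[identifier]),
                None,
            )
            if match is not None:
                match.update(record)                # update the matching record
            elif upsert:
                self._storage.append(dict(record))  # insert when nothing matches

    def from_storage(self,
                     filter: Optional[Dict[str, Any]] = None,
                     projection: Optional[Dict[str, int]] = None) -> None:
        """Load records whose fields equal every key/value pair in `filter`."""
        filter = filter or {}
        rows = [
            doc for doc in self._storage
            if all(doc.get(key) == value for key, value in filter.items())
        ]
        if projection:
            # keep only the fields flagged with a truthy value
            keep = [key for key, flag in projection.items() if flag]
            rows = [{key: doc[key] for key in keep if key in doc} for doc in rows]
        else:
            rows = [dict(doc) for doc in rows]
        self.memory = rows


frame = InMemoryFrame()
frame.memory = [{'number': 1, 'name': 'one'}, {'number': 2, 'name': 'two'}]
frame.to_storage(identifier='number')

# writing record 2 again updates it instead of creating a duplicate
frame.memory = [{'number': 2, 'name': 'TWO'}]
frame.to_storage(identifier='number')

frame.from_storage(filter={}, projection={'name': 1})
print(frame.memory)  # [{'name': 'one'}, {'name': 'TWO'}]
```

Unlike the real output above, the sketch returns no `_id` field, because MongoDB adds that field itself and includes it in projections by default.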


`delete_storage` deletes documents from the collection. As a safeguard, deleting every document requires setting the `clear_collection` keyword argument to `True`; otherwise an exception is raised.

In [4]:
retriever.delete_storage(filter={})  # an empty filter matches all documents

connected to test @ localhost
disconnected 



Exception: Do you mean to delete everything in test.test? If so,then flag clear_collection as True.

In [5]:
retriever.delete_storage(filter={}, clear_collection=True)  # flag enables full deletion
retriever.from_storage()

print(retriever.memory)

connected to test @ localhost
disconnected 

connected to test @ localhost
disconnected 

Empty DataFrame
Columns: []
Index: []
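
The safeguard shown in the two cells above follows a common pattern: refuse a delete whose filter matches everything unless the caller confirms it explicitly. Here is a minimal sketch of that pattern on a plain list of dicts. The function `guarded_delete` is a hypothetical helper, not dataspace's code, and it raises `ValueError` where the library raises a generic `Exception`.

```python
from typing import Any, Dict, List


def guarded_delete(documents: List[Dict[str, Any]],
                   filter: Dict[str, Any],
                   clear_collection: bool = False) -> List[Dict[str, Any]]:
    """Return the documents that survive the delete (hypothetical helper)."""
    if not filter and not clear_collection:
        # an empty filter matches every document, so require explicit consent
        raise ValueError(
            'An empty filter deletes everything; '
            'pass clear_collection=True to confirm.'
        )
    # a document is deleted when it equals the filter on every filtered field
    return [
        doc for doc in documents
        if not all(doc.get(key) == value for key, value in filter.items())
    ]


docs = [{'number': 1}, {'number': 2}]

try:
    guarded_delete(docs, filter={})
except ValueError as error:
    print(error)

print(guarded_delete(docs, filter={'number': 1}))              # [{'number': 2}]
print(guarded_delete(docs, filter={}, clear_collection=True))  # []
```

Keeping the flag separate from the filter means a filter that ends up empty by accident, for example one built from missing user input, cannot wipe the collection.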
