Low-level MongoDB support
=========================
{block}[WARNING]
This chapter covers advanced uses of MongoDB in Opa and details the low-level access to MongoDB. For most applications, you should read [the database chapter](/manual/Hello--database) instead.
{block}
Introduction
------------
In this chapter, we describe the current state of support for MongoDB in the Opa
standard library.
We assume some familiarity with MongoDB concepts and particularly with the
MongoDB
[shell](http://www.mongodb.org/display/DOCS/mongo+-+The+Interactive+Shell).
This familiarization can be gained by reading the MongoDB
[tutorial](http://www.mongodb.org/display/DOCS/Tutorial).
MongoDB is a server-based document-oriented non-relational database intended to
be scalable and fast.
Documents are stored in a binary JSON-like format called
[BSON](http://bsonspec.org).
Although BSON has a richer set of types than JSON, it is 100%
compatible with JSON.
For speed, MongoDB does not implement joins. Instead, it provides a
powerful query language of its own, and almost anything that can be done with a
relational database can be implemented in MongoDB with a little bit of effort
(see MongoDB's page on
[SQL compatibility](http://www.mongodb.org/display/DOCS/SQL+to+Mongo+Mapping+Chart)).
In addition, MongoDB allows multiple indices into its data although these are
not automatic and have to be initiated in client code.
MongoDB is intended to be deployed in reliable large-scale web-based
applications and thus has features which facilitate scalability such as sharding
and master-slave arrangements of servers along with features for reliability
such as replicated servers with fail-over.
Backups of MongoDB data are usually done either offline on a slave server in the
network using [external tools](http://www.mongodb.org/display/DOCS/Backups) or to
redundant nodes in the MongoDB server network.
### Setting-up MongoDB
If you are not familiar with the MongoDB database, here are some quick
instructions to get you going.
Firstly, make sure that you have MongoDB installed on your system:
```
% which mongod
```
Note that MongoDB is not yet included in major distributions such as Ubuntu, but
installation is trivial: download the latest version from the MongoDB
[downloads](http://www.mongodb.org/downloads) site and unpack the files locally.
You should then just have to add the `bin` directory to your path to be up and
running.
To run a MongoDB server, you first have to create a directory to store the
database files.
In fact, you need a directory for each node you wish to run, see the MongoDB
documentation for how to create replica sets, sharding etc.
At its simplest, start a `mongod` server with:
```
% mkdir -p ~/mongodata/master
% mongod --rest --oplogSize 500 --noprealloc --master --dbpath ~/mongodata/master > ~/mongodata/master/log.txt 2>&1 &
```
Use the `--oplogSize` and `--noprealloc` options to limit the initial allocated
disk space (the default is about 1 GB).
The `--rest` option allows you to monitor your database via the HTTP interface
(found at the port number plus 1000).
If you wish to run the server on a different port, use the `--port` option;
the default MongoDB server port is 27017.
Note, however, that to run the MongoDB shell on a non-default port you also need
the `--port` option:
```
% mongo --port 27017
MongoDB shell version: 2.0.1
connecting to: test
>
```
For the MongoDB Opa drivers we recommend version 1.6.0 or greater, since much of
the current functionality was mature by that version.
We always recommend the current MongoDB stable version (at the time of writing 2.0.2),
but for the most part the driver is quite stable with respect to backwards compatibility.
### Overview
The Opa support for MongoDB consists of a hierarchy of modules leading to
successively higher-level programming.
#### Bson
Support for the BSON binary format is in the form of the `Bson` module; all
other modules are built on top of this one.
In general, BSON values are handled by the `Mongo.document` Opa data-type but we
also provide the `Bson.opa2doc` and `Bson.doc2opa` functions to allow conversion
between Opa types and BSON documents.
#### MongoCommon
This contains general support routines for dealing with replies from the MongoDB
server.
These include:
- printing results to meaningful strings
- testing results for error status
- handling tag lists instead of bit-mapped integers
- extracting fields and Opa types from MongoDB replies
#### MongoConnection
The code which talks to the MongoDB server is in the private `MongoDriver`
module. This includes support for
[replica sets](http://www.mongodb.org/display/DOCS/Replica+Sets) with automatic
reconnection on fail-over and
[cursors](http://www.mongodb.org/display/DOCS/Queries+and+Cursors).
For programming at this level, however, we provide a single all-purpose module
called `MongoConnection`.
Advanced programmers wishing to use some of the more obscure features of MongoDB
can use the driver code directly, but this is not recommended.
MongoDB has a complex API involving over 70 functions and many of the simple
access commands have numerous options.
Our intention with this driver is to make accessing MongoDB databases as simple
and logical as possible while still exposing the power and flexibility of the
MongoDB engine.
#### MongoCommands
As an adjunct to the low-level programming interface we provide a module called
`MongoCommands`, which contains a large (but still incomplete) subset of the
MongoDB command set.
These encompass most functions that will be required for meta-programming the
MongoDB database, such as `dropDatabase`, `repairDatabase`, `createCollection`
and so on plus functions associated with normal database access operations such
as `getLastError`.
The more advanced MongoDB functionality is also supported here, including
`findAndModify` and the very powerful `mapReduce` function.
These commands occur in two flavors: those which return `Bson.document` values
and those which convert their results into Opa types.
If you are only looking for a single value out of a large and complex reply
document then using the `Bson` module access functions on the raw BSON may be
more efficient.
If you intend complex analysis of the reply then the Opa types may be more
convenient.
At the present time only partial support is provided for Opa types.
Some command results may never be treated this way because they include
arbitrary field names which we can't safely convert into Opa types.
#### MongoCollection
This module represents a type-safe view of the low-level routines in
`MongoConnection`.
Here, we insist upon Opa types as arguments and results from MongoDB operations.
This necessarily limits what we can put into the database since the BSON
documents stored in the database have to be consistent with the Opa types they
represent.
To achieve this, we have implemented the `MongoSelect` and `MongoUpdate` modules
which enforce a type discipline upon the arguments to, for example,
`MongoCollection.insert`.
The type safety is implemented as run-time type checks so there is a significant
performance penalty for using these routines.
In the future, however, we will provide fully type-safe compile-time type checks
along the lines of the Opa internal database.
Programming
-----------
Here, we provide some notes on programming with the Opa MongoDB driver.
The full interface is too large for complete coverage here; refer to the online
Opa [API documentation](http://doc.opalang.org/api) for detailed notes on each
function.
### Using BSON types in Opa
The full Opa BSON data-type is as follows:
```
/**
* A BSON value encapsulates the types used by MongoDB.
**/
type Bson.value =
{ float Double }
or { string String }
or { Bson.document Document }
or { Bson.document Array }
or { string Binary }
or { string ObjectID }
or { bool Boolean }
or { Date.date Date }
or { Null }
or { (string, string) Regexp }
or { string Code }
or { string Symbol }
or { (string, Bson.document) CodeScope }
or { int Int32 }
or { int32 RealInt32 }
or { (int, int) Timestamp }
or { int Int64 }
or { int64 RealInt64 }
or { Min }
or { Max }
/**
* A BSON element is a named value.
**/
type Bson.element = { string name, Bson.value value }
/**
* The main exported type, a BSON document is just a list of elements.
*/
type Bson.document = list(Bson.element)
```
While values of this type can be constructed manually:
```
doc = Bson.document
[{name: "$eval", value: {Code:"function(x,y) \{return x*y;}"}},
{name: "args", value:{Array:[{name:"0", value:{Int32:6}},
{name:"1", value:{Int32:7}}]}}]
```
there are two more convenient ways of constructing BSON values.
Firstly, we provide a set of abbreviations in the `Bson.Abbrevs` module:
```
H = Bson.Abbrevs
doc = Bson.document [H.code("$eval","function(x,y) \{return x*y;}"),
H.valarr("args",[{Int32:6},{Int32:7}])]
```
Secondly, we can construct the values in Opa and use `Bson.opa2doc`:
```
doc = Bson.opa2doc({`$eval`:(Bson.code "function(x,y) \{return x*y;}"),
args:(list(Bson.int32) [6,7])})
```
Notice that to get a field with non-alphanumeric characters we have to back-quote
the field name in the Opa value and that to control the representation in the
BSON type we can apply helper types, for example `Bson.code` is just a string
but it instructs `Bson.opa2doc` to treat it as code.
Remember also to escape curly brackets in strings.
Note that to get `Int32` values you need the `Bson.int32` type; the default for
`int` is actually `Bson.int64`.
There are several such types provided by the `Bson` module but some merit
special mention:
* Optional types have a special significance with respect to `Bson.doc2opa`: if a field value is missing in the document it will appear in the Opa type as `{none}`. The reverse direction does not apply: `{none}` values are represented in the BSON document as `{ none : null }`.
* We take this one step further, however, with the `Bson.register` type, which behaves much like `option('a)` except that when we call `Bson.opa2doc` any `{absent}` values are omitted from the resulting document altogether. Note that there is a module `Bson.Register` which provides the same functionality for `Bson.register` as the `Option` module does for type `option`.
```
type Bson.register('a) = {'a present} or {absent}
```
* Care should be taken in dealing with integer values which may have been placed into the database outside of Opa. Opa internally uses the OCaml integer representation `int`, which is 31 bits wide on 32-bit systems and 63 bits wide on 64-bit systems (the spare bit is reserved by the garbage collector). MongoDB, however, uses full 32-bit and 64-bit integers, so it is possible to find an integer value in a MongoDB database which is too large for the Opa representation (all values generated by Opa and stored in the database are guaranteed to be within range). Currently, Opa only has 32-bit and 64-bit integers as abstract values: they can be stored in Opa as external types (`int32` and `int64`) but no operations are possible on them (they are sometimes needed by external libraries). We handle this situation in the MongoDB driver by automatically detecting overflow values and storing them as `RealInt32` and `RealInt64` when returning `Bson.document` types from the driver. While these values may appear to be invisible to the `Bson` module functions such as `find_int`, you can detect overflows by inspecting the document values:
```
match (value) {
case {RealInt32:_}: error("overflow");
case {Int32:i}: i;
default: error("not an int");
}
```
* The `Bson.meta` type is intended to support situations where MongoDB can return a field of different types depending upon the nature of the command executed. A good example of this is the `out` option to the `mapReduce` function which can be either a `string` or a document type. We cast the parameter as `Bson.meta` which allows us to control the type at the function's application. We can also apply this trick to the `result` type from `mapReduce` calls:
```
mr = MC.mapReduceSimple(mongodb,map,reduce,{String:"example1"})
/* or */
mr = MC.mapReduceSimple(mongodb,map,reduce,{Document:[H.str("reduce","session_stat")]})
```
* Two other cases should be mentioned. Both `list` and `intmap` are mapped onto `Array` values in BSON. The difference is that `list` is mapped to consecutive-numbered elements in the `Array` document whereas `intmap` allows sparse arrays.
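The integer range rule above can be checked with a little arithmetic. The following is only a Python model of the rule as described in the text, not Opa driver code; the helper names `opa_int_range` and `classify_int32` are hypothetical.

```python
# Model of the overflow rule: an Opa int has one bit fewer than the machine
# word (31 or 63 bits), while BSON integers use the full 32 or 64 bits.

def opa_int_range(word_bits):
    """Return (min, max) for a signed integer of word_bits - 1 bits."""
    usable = word_bits - 1  # one bit is reserved by the runtime
    return -(1 << (usable - 1)), (1 << (usable - 1)) - 1

def classify_int32(value, word_bits=32):
    """Tag a BSON 32-bit integer as Int32 or, on overflow, RealInt32."""
    if not -(1 << 31) <= value <= (1 << 31) - 1:
        raise ValueError("not a 32-bit integer")
    lo, hi = opa_int_range(word_bits)
    return ("Int32", value) if lo <= value <= hi else ("RealInt32", value)
```

On a 32-bit system the model treats any value outside `-2^30 .. 2^30 - 1` as an overflow, whereas on a 64-bit system every BSON 32-bit integer fits and only 64-bit values can overflow.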
As a rough guide to `Bson.opa2doc` and `Bson.doc2opa`, the following simple
schema shows the mapping:
```
/* We use a "natural" mapping of constant types */
float <-> Double
string <-> String
Bson.binary <-> Binary
Bson.oid <-> ObjectID
bool <-> Boolean
Date.date <-> Date
void <-> Null
Bson.regexp <-> Regexp
Bson.code <-> Code
Bson.symbol <-> Symbol
Bson.codescope <-> CodeScope
Bson.int32 <-> Int32
Bson.realint32 <-> Int32
Bson.timestamp <-> Timestamp
Bson.realint64 <-> Int64
Bson.min <-> Min
Bson.max <-> Max
/* Basic record scheme */
{a:'a; b:'b} <-> { a: 'a, b: 'b }
/* Sum types */
{a:'a} / {b:'b} <-> { a: 'a } <or> { b: 'b }
/* Non-record types are called "value" */
'a <-> { value: 'a }
/* Special cases */
/* Default for int is Int64 */
int <-> Int64
/* Overflow */
Bson.realint32 <- Int32 /* when integer exceeds range */
Bson.realint64 <- Int64 /* when integer exceeds range */
/* Options */
option('a):
{some=a} <-> { some : 'a }
{none} <-> { none : null }
{none} <- { }
/* Registers */
Bson.register('a):
{present=a} <-> { present : 'a }
{absent} <- { absent : null }
{absent} <-> { }
/* Lists are consecutive arrays */
list('a) <-> { Array=(<label>,{ 0:'a; 1:'a; ... }) }
/* Intmaps are non-consecutive arrays */
ordered_map(int,'a) <or>
intmap('a) <-> { Array=(<label>,{ 1:'a; 3:'a; ... }) }
/* Bson.document is treated verbatim (including labels) */
Bson.document <-> Bson.document
/* Bson.meta is treated as a variable type */
int:Bson.meta <-> { Int64:int }
string:Bson.meta <-> { String:string }
bool:Bson.meta <-> { Boolean:bool }
etc.
```
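The asymmetry between `option` and `Bson.register` in the schema above is easy to miss, so here is a small Python model of just those rows. It is a sketch of the mapping as listed, not of the actual `Bson.opa2doc` and `Bson.doc2opa` code; Opa values are represented as tuples and BSON documents as dictionaries, and all four function names are hypothetical.

```python
# Opa values are modelled as tuples: ("some", a) / ("none",) for option,
# ("present", a) / ("absent",) for Bson.register.

def option_to_doc(opt):
    """{some=a} -> { some : a }; {none} -> { none : null }."""
    return {"some": opt[1]} if opt[0] == "some" else {"none": None}

def doc_to_option(doc):
    """{ none : null } and the empty document both decode to {none}."""
    return ("some", doc["some"]) if "some" in doc else ("none",)

def register_to_doc(reg):
    """{present=a} -> { present : a }; {absent} is omitted altogether."""
    return {"present": reg[1]} if reg[0] == "present" else {}

def doc_to_register(doc):
    """{ absent : null } and the empty document both decode to {absent}."""
    return ("present", doc["present"]) if "present" in doc else ("absent",)
```

The only difference between the two encoders is the last case: a `{none}` is written out explicitly, while an `{absent}` leaves no trace in the document.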
Notes:
* For `ObjectID` values, there are a couple of routines which convert between (hex value) strings and the BSON representation, `Bson.oid_of_string` and `Bson.oid_to_string`. You can also create a BSON-style OID value with `Bson.new_oid`.
* `Bson.document` types are completely write-through, i.e. they are not processed at all.
* In case you're wondering, `Min` and `Max` are used in sharded databases to indicate infimum and supremum bounds on sharding regions, respectively.
//TODO: other functions find_xyz, to_pretty, error stuff
### Using the low-level interface
Connecting to and using the low-level drivers should be done using the
`MongoConnection` module.
This gathers together various low-level features in a single module.
#### Opening a connection to the MongoDB server
The preferred method is to use the system of named connections, which can be
defined from the command line or set up internally using the `Mongo.param` type
and the `MongoConnection.add_named_connection` function.
Initially, there is one default connection (called ''default'') which is set to
`localhost:27017`, the default port for MongoDB servers on the local machine.
To open this connection use:
```
mongodb =
match (MongoConnection.open("default")) {
case {success:mongodb}: mongodb
case {~failure}: ... /* take action on error */
}
/* or */
mongodb = MongoConnection.openfatal("default")
```
The `MongoConnection.open` function returns an outcome of either the connection
or the standard `Mongo.failure` type whereas the `MongoConnection.openfatal`
function returns just the connection but treats a failed connection as a fatal
error.
To set up the connection from the command line the following options are defined:
{table}
{* Option | Abbrev Type | Description *}
{| `--mongo-name` | `(--mn) <string>` | Name for the MongoDB server connection |}
{| `--mongo-repl-name` | `(--mr) <string>` | Replica set name for the MongoDB server |}
{| `--mongo-buf-size` | `(--mb) <int>` | Hint for initial MongoDB connection buffer size |}
{| `--mongo-socket-pool`| `(--mp) <int>` | Number of sockets in socket pool (>=2 enables socket pool) |}
{| `--mongo-seed` | `(--ms) <host>{:<port>}` | Add a seed to a replica set, allows multiple seeds |}
{| `--mongo-host` | `(--mh) <host>{:<port>}` | Host name of a MongoDB server, overwrites any previous hosts |}
{| `--mongo-log` | `(--ml) <bool>` | Enable MongoLog logging |}
{| `--mongo-log-type` | `(--mt) <string>` | Type of logging: stdout, stderr, logger, none |}
{| `--mongo-auth` | `(--ma) <user:pwd@dbname>` | Define user name and password for database dbname |}
{table}
So, for example, to point the default connection at `machinexyz:12345` you
would use:
```{.sh}
% prog.exe --mh machinexyz:12345
```
This remains a single connection; to connect to a replica set you also need to
define a name for the replica set plus some seeds:
```{.sh}
% prog.exe --mn blort --mr blort --ms machinexyz:27017 --ms machineuvw:27017
```
Here we have defined a connection called ''blort'' to a replica set also called
''blort'' with two seed machines.
Remember that you only really need one seed which is active in the set: the
connection logic queries the seeds for the actual host list and then polls the
hosts until it finds the current primary server.
From then on reconnection will be attempted if the current primary goes down.
Note that you can define as many named connections as you like; this example
still retains the default connection.
Note also that you can clone a connection, in which case the connection itself
will not be closed until all clones have been closed.
Handling concurrency within an Opa program is done by a socket pool.
This means that a pool of open connections is maintained to the same server such
that blocking only occurs if there are no more available connections in the pool
(set with `--mp 2`, for example).
If you ensure that the pool size is at least as big as the number of threads in
your code then no blocking will occur.
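The blocking behaviour of such a pool can be modelled in a few lines of Python. This is only an illustration of the idea, assuming a fixed-size pool; the class `SocketPool` and its string "sockets" are stand-ins and say nothing about how the Opa driver implements its pool.

```python
import queue

class SocketPool:
    """Fixed-size pool: acquire blocks only when every socket is in use."""

    def __init__(self, size):
        self._free = queue.Queue()
        for n in range(size):
            self._free.put(f"socket-{n}")  # stand-ins for open sockets

    def acquire(self, timeout=None):
        # Waits until a socket is free; raises queue.Empty on timeout.
        return self._free.get(timeout=timeout)

    def release(self, sock):
        # Return the socket so that a waiting caller can proceed.
        self._free.put(sock)
```

With a pool of size 2, a third concurrent `acquire` waits until one of the first two callers calls `release`, which is why a pool at least as large as the number of threads never blocks.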
Named connections can also be defined within the program:
```
MongoConnection.add_named_connection({
name: "blort",
replname: {some: "blort"},
bufsize: 50*1024,
pool_max: 2,
log: false,
seeds:[("localhost",10001),("localhost",10002)],
auth:[{dbname:"mydb",user:"me",password:"secret"}]
})
mongodb2 = MongoConnection.openfatal("blort")
```
Once a connection has been opened, it can be pointed to different databases and
collections using a functional interface.
The default database is ''db'' and the default collection is ''collection'' but
we can make a connection to a different collection without re-opening the
connection as follows:
```
mongodb_wiki = MongoConnection.namespace(mongodb,"db","wiki")
```
This mechanism also applies to the flags that some of the MongoDB operations can
take, for example to set the `Upsert` flag for all insert operations:
```
mongodb3 = MongoConnection.upsert(mongodb)
```
This method is quite flexible since you can define these flags once when the
connection is made, making the flags globally persistent, or you can add these
function calls at the point of calling the operation, i.e. locally defined flags
(there are examples below).
All of the MongoDB flags are supported in this way.
One particular flag is worth mentioning, the `log` flag which can be set on the
command line and can actually be overridden in this way allowing you to generate
logs for targeted sections of code.
In fact, you can change any of the command line options this way but bear in mind
that some of them, for example, seed lists, will not take effect until the
connection is reconnected.
#### Authentication
As you can see, you can add the MongoDB authentication parameters for a given database
either on the command line using the `--mongo-auth` argument which is of the
form: `user:password@database_name` or by placing the authentication
parameters in the `auth` field in the `add_named_connection` function argument.
Alternatively, you can call the `MongoCommands.authenticate` function to perform
an additional, external authentication.
Note that if you are connecting to a replica set then the driver needs to
re-authenticate after connecting to the new host so the authentication
parameters are built into the low-level Mongo datatype.
This means that if you call this function you should perform all subsequent
operations on the returned Mongo datatype, not on the original which won't have
the parameters built in.
Remember that authentication in MongoDB is to a database, not to a connection so
you can have multiple user names and passwords associated with a single
connection.
If you want to authenticate with all of the databases over a connection you need
to authenticate with the `admin` database which acts a bit like ''root'' access
for databases.
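To make the `user:password@database_name` form concrete, here is a Python sketch that splits such a string into the three fields used by the `auth` entry of `add_named_connection`. The function `parse_mongo_auth` is hypothetical, and the choice to split on the last `@` and the first `:` is an assumption made for this example; the text does not say how the driver treats passwords containing those characters.

```python
def parse_mongo_auth(spec):
    """Split 'user:password@dbname' into a dict with the three fields."""
    # Assumption: the database name follows the LAST '@' ...
    credentials, at, dbname = spec.rpartition("@")
    # ... and the user name ends at the FIRST ':'.
    user, colon, password = credentials.partition(":")
    if not at or not colon or not user or not dbname:
        raise ValueError(f"expected user:password@dbname, got {spec!r}")
    return {"dbname": dbname, "user": user, "password": password}
```

For instance, `parse_mongo_auth("me:secret@mydb")` yields the same three values as the `auth` entry in the named-connection example above.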
#### Basic operations
The basic database access operations are the same as the MongoDB protocol
operations, i.e. insert, update, query, get_more, delete, kill_cursors and msg.
So, for example, to insert a document:
```
/* A couple of documents */
p1 = [H.str("name","Joe1"), H.i32("age",44)]
p2 = [H.str("name","Joe2"), H.i32("age",55)]
/* Insert the documents */
MongoConnection.insert(mongodb,p1)
MongoConnection.insert_batch(mongodb,[p1,p2])
```
The basic write operations come in three types:
* `insert` is the write-and-forget operation where the insert message is sent and a boolean value is returned which simply states that the correct number of bytes was written to the socket.
* `inserte` is a ''safe'' operation where the insert message has a `getlasterror` query piggy-backed onto it and then the raw optional reply is returned.
* `insert_result` does an `inserte` and then analyzes the reply, turning it into a standard `Mongo.result` type.
All of the basic write operations have these three forms.
The `Mongo.result` type is an `outcome` of either success as a `Bson.document`
type or failure as a `Mongo.failure` type.
The `Mongo.failure` type looks like:
```
type Mongo.failure =
{OK}
or {string Error}
or {Bson.document DocError}
or {Incomplete}
or {NotFound}
```
This defines either a raw document error `{DocError:doc}` which is an error as
reported by the MongoDB server, a driver error `{Error:str}` which is a
message generated by the Opa driver or a few special-purpose errors returned
under specific circumstances (`{OK}` is simply a connection that has never
been used).
Post-processing of results may include checking for errors:
```
error = MongoConnection.insert_result(MongoConnection.upsert(mongodb),[H.i32("i",n)])
println("insert error={MongoCommon.is_error(error)}")
```
or extracting specific fields from the reply:
```
println("errmsg={MongoCommon.result_string(error,"errmsg")}")
```
noting that we also support the MongoDB
[dot notation](http://www.mongodb.org/display/DOCS/Dot+Notation+%28Reaching+into+Objects%29)
syntax:
```
println("indexSizes._id_={MongoCommon.dotresult_int(collStats,"indexSizes._id_")}")
```
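Dot notation simply walks down nested documents one key at a time. The Python sketch below models that lookup on plain dictionaries and lists; it is not the `MongoCommon` implementation, the function name `dot_lookup` is hypothetical, and the sample document in the usage note is invented for illustration.

```python
def dot_lookup(doc, path):
    """Follow a dotted path through nested dicts and lists; None if absent."""
    current = doc
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif isinstance(current, list) and key.isdigit() and int(key) < len(current):
            current = current[int(key)]  # numeric keys index into arrays
        else:
            return None
    return current
```

Given a reply such as `{"indexSizes": {"_id_": 8176}}`, the path `"indexSizes._id_"` resolves to the inner value, while a path with a missing component resolves to `None`.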
Closing a connection is as simple as:
```
MongoConnection.close(mongodb)
```
Remember that the connection will only close once all of the clones have also
been closed.
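One way to picture the clone rule is as a shared reference count. The Python model below is an assumption about the observable behaviour only (the underlying connection stays open until the original and every clone have been closed); the class `Connection` is hypothetical and is not how the Opa driver is necessarily implemented.

```python
class Connection:
    """Handles share one underlying connection via a reference count."""

    def __init__(self, shared=None):
        # A fresh connection starts with a single reference.
        self._shared = shared if shared is not None else {"refs": 1, "open": True}

    def clone(self):
        self._shared["refs"] += 1
        return Connection(self._shared)

    def close(self):
        self._shared["refs"] -= 1
        if self._shared["refs"] == 0:
            self._shared["open"] = False  # last handle closes the socket

    @property
    def is_open(self):
        return self._shared["open"]
```

In this model, closing the original handle while a clone is still alive leaves `is_open` true; only the final `close` shuts the connection.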
#### Cursors
Handling queries in MongoDB has the complication that, for efficiency, cursors
are stored on the server which entails tracking them at the client side.
While the bare `MongoConnection.query` and `MongoConnection.get_more` operations
can be used to handle queries in conjunction with the reply support code in
`MongoCommon` they are a bit inconvenient.
For this purpose we have defined cursor operations in the `MongoCursor` module
and re-exported the most important ones into the `MongoConnection.Cursor`
module.
A cursor object itself contains all the parameters needed to manage the cursor
at the server side and, in fact, duplicates some of the information in the
connection object.
Using the re-exported functions reduces the number of parameters to the basic
functions since this information can be lifted from the connection into the
cursor object.
Here is an example of a low-level cursor dialog:
```
cursor = MongoConnection.Cursor.init(mongodb)
cursor = MongoConnection.Cursor.set_query(cursor,{some:[H.str("name","Joe")]})
cursor = MongoConnection.Cursor.set_limit(cursor,3)
cursor = MongoConnection.Cursor.set_fields(cursor,{some:[H.i32("_id",0)]})
cursor = MongoConnection.Cursor.next(cursor)
result = MongoConnection.Cursor.check_cursor_error(cursor)
println("result 1 = {MongoCommon.pretty_of_result(result)}")
println("valid 1 ={MongoConnection.Cursor.valid(cursor)}")
cursor = MongoConnection.Cursor.next(cursor)
result = MongoConnection.Cursor.check_cursor_error(cursor)
println("result 2 = {MongoCommon.pretty_of_result(result)}")
println("valid 2 = {MongoConnection.Cursor.valid(cursor)}")
MongoConnection.Cursor.reset(cursor)
```
The cursor is initialized with `init` and then the parameters for the query
are set up.
The `next` function generates the `query` (or `get_more`) call to the server and
places the next document internally in the cursor object along with any error
status.
The `check_cursor_error` function is a convenient way of extracting either the
current document or the error as a `Mongo.result`.
Subsequent calls to `next` will either return the next document from the
previous reply or issue a `get_more` call to re-populate the cursor.
The end of the matching documents (or if no document matches) is signaled with
`NotFound` and if you try to read past the end of matching documents you will
get an ''end of data'' error from the driver.
The `valid` function is used to poll whether there is any remaining data.
Finally, the call to `reset` is important here because it doesn't just end the
query, it will issue a `kill_cursors` operation to the server to tell it to
delete the cursor (cursors time out after 10 minutes by default on the MongoDB
server).
This method works fine but this logic has been wrapped up into some convenience
functions:
* `find_one` returns the first matching document as a `Mongo.result`
* `find_all` gives all the matches as a list of documents (use the `limit` function to limit the number of replies).
For example:
```
/* Find all objects in db.session, excluding the _id field */
mongo_session_no_id =
MongoConnection.fields(MongoConnection.namespace(mongodb,"db","session"),{some:[H.i32("_id",0)]})
println("findAll: {MongoCommon.pretty_of_results(MongoConnection.Cursor.find_all(mongo_session_no_id,[]))}")
```
You can also define custom loops over the matches using `start` (or `find`) in
conjunction with `next` and `valid`.
Note that you must use the `MongoConnection.Cursor.for` loop rather than the
more usual `for` function in the Opa stdlib: the loop has to check `valid` first
and only call `next` if the cursor is still valid at that point, otherwise you
will miss the last document in the list of matches.
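The cursor life cycle described in this section (a `query` on the first `next`, a `get_more` when the local batch runs out, `NotFound` at the end and `kill_cursors` on `reset`) can be modelled with a toy server. Everything in this Python sketch is hypothetical: `FakeServer` and `Cursor` are stand-ins written for illustration, and the real protocol carries more state (for example, the server reports when a cursor is exhausted).

```python
class FakeServer:
    """Toy server that hands out documents in fixed-size batches."""

    def __init__(self, docs, batch_size=2):
        self.docs, self.batch_size = docs, batch_size
        self.open_cursors = {}  # cursor id -> position in docs
        self._next_id = 1

    def query(self):
        cursor_id, self._next_id = self._next_id, self._next_id + 1
        self.open_cursors[cursor_id] = 0
        return cursor_id, self.get_more(cursor_id)

    def get_more(self, cursor_id):
        start = self.open_cursors[cursor_id]
        batch = self.docs[start:start + self.batch_size]
        self.open_cursors[cursor_id] = start + len(batch)
        return batch

    def kill_cursors(self, cursor_id):
        self.open_cursors.pop(cursor_id, None)


class Cursor:
    """Client-side cursor: keeps the current batch and document."""

    def __init__(self, server):
        self.server, self.cursor_id = server, None
        self.batch, self.current = [], None

    def next(self):
        if self.cursor_id is None:  # first call issues the query
            self.cursor_id, self.batch = self.server.query()
        elif not self.batch:  # local batch exhausted: ask for more
            self.batch = self.server.get_more(self.cursor_id)
        self.current = self.batch.pop(0) if self.batch else "NotFound"
        return self

    def reset(self):
        if self.cursor_id is not None:  # tell the server to drop the cursor
            self.server.kill_cursors(self.cursor_id)
        self.cursor_id, self.batch, self.current = None, [], None
```

Calling `next` four times on a three-document collection yields the three documents followed by `"NotFound"`, and `reset` removes the cursor from the server's table instead of leaving it to time out.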
//Commands
//~~~~~~~~
Collections
-----------
While you can achieve anything that MongoDB is capable of using the low-level
drivers, there are no guarantees of type safety while converting between BSON
documents and Opa values.
You can of course base your entire project around BSON values and eliminate
the need for converting between MongoDB's documents and Opa types altogether but
this may not be very convenient depending upon what is happening elsewhere in
your application.
Secondly, using the low-level drivers requires an investment in learning
MongoDB's powerful but rather complex interface (which may be new to users of
relational databases) in order to exploit what MongoDB has to offer.
Finally, basing your application on MongoDB's API will tie your application to
MongoDB and you may at some point in the future wish to migrate to other
database solutions.
Ultimately, the intention is to provide an abstract view of the database which
is general enough to encompass several of the existing database solutions, of
which MongoDB is an important player, and support this with compiler-generated
syntax in the manner of the Opa inbuilt database.
This support is still not available but we can offer an intermediate layer of
programming MongoDB whereby we assume collections of Opa types and support
type-safety by performing run-time type-checks on operations over these
collections.
This support is in the form of the `MongoCollection` module plus some support
modules for generating values suitable to be applied to these functions.
### The `collection` type
The central idea in the `MongoCollection` module is a collection (in the MongoDB
terminology sense) of Opa values.
This is embodied in the `Mongo.collection` type, which is extremely simple: it is
just a `MongoConnection` value cast to the specific type of the values to be
stored in the collection:
```
type Mongo.collection('a) = {
Mongo.mongodb db /* the mongodb connection */
}
```
When a value is stored in the collection it is automatically converted from its
Opa type into a matching BSON document and _vice versa_ for queries.
While this sounds simple there are a number of pitfalls to watch out for.
We assume that any offline modifications of the collection will not
create any incompatible values.
If, for example, we add or delete a field from a record then the entry can no
longer be represented as an Opa type.
To overcome this problem we place checks in the code to verify the suitability
of documents read from the collection and an error will be generated if any such
values are found.
We also provide features to allow handling of this situation in some specific
circumstances, for example, if you type a field in the collection as
`Bson.register` it will allow you to successfully read in values with missing
fields but this is not recommended for collections.
Ultimately, it is up to the maintainer of the database to ensure that the values
stored there are consistent with the application's usage of the collection.
Despite these provisos, using a collection is very simple and gives the
programmer the ability to integrate Opa types with the MongoDB system without
having to understand the underlying complexity of the database and with a modest
level of type-safety.
The cost, for the moment, is the overhead of the run-time type-checks which will
slow down database operations.
### Programming with collections
A simple dialog for creating and manipulating a collection might be as follows:
```
/* The type of our first collection */
type t = {int i}
/* Create a collection of type t */
Mongo.collection(t) c1 = MongoCollection.openfatal("default","db","collection")
/* Put a single value into the collection */
result = MongoCollection.insert_result(c1,{i:0})
/* Finally, destroy the collection */
MongoCollection.destroy(c1)
```
We define a type for the collection (`type t`) so that when we open a connection
to the database we can cast the resulting collection object and thus install the
correct run-time representation of the type.
The `openfatal` function returns a collection and treats a connection failure as
fatal.
There are several variants of the `open` function.
A collection is a pointer to a specific collection in the database (here,
`db.collection`) and we create a connection to the MongoDB server using the
connection name (in this instance, `default`).
Inserting a value into the collection is trivial: the value is simply passed as
it is to the `insert` function (here we use the safe `insert_result` function
which also returns the result of a `getlasterror` call).
The insert has exactly the same effect as a call to `MongoConnection.insert` but
with the value automatically converted into a BSON document using the scheme
outlined above.
The call to `MongoCollection.destroy` should not be forgotten because this
closes the underlying connection.
While the `insert` function is trivial, we need more care with `update` and
`delete`.
The problem is that to maintain our level of type-safety we need to match select
(and update) documents with the type of the collection they are applied to.
We do this with a system of run-time type-checks applied to the select
documents.
For example:
```
/* Create pre-typed select and update generation functions */
(Bson.document -> Mongo.select(t)) createst = MongoSelect.create
(Bson.document -> Mongo.update(t)) createut = MongoUpdate.create
/* Generate the select documents */
select = createst(MongoSelectUpdate.int64(MongoSelectUpdate.empty(),"i",0))
update = createut(MongoSelectUpdate.inc(MongoSelectUpdate.int64(MongoSelectUpdate.empty(),"i",1)))
/* We can now apply update to these documents */
result = MongoCollection.update_result(c1,select,update)
```
Firstly, we use the `MongoSelectUpdate` module to generate the basic documents.
Note that we could also have used the `Bson.opa2doc` function to achieve the
same result:
```
select = createst(Bson.opa2doc({i:0}))
update = createut(Bson.opa2doc({`$inc`:{i:1}}))
```
The choice between these two styles may depend upon the type of document being
generated.
The Opa type-based versions are more readable but the `MongoSelectUpdate` ones
are much faster since no conversion is required.
The select documents have to be correctly typed for the collection they apply to
so we generate a couple of convenience functions `createst` and `createut` to do
the casting for us.
Secondly, once we have these documents we can apply the `update` function to
them but note that although a select document is just a typed `Bson.document` it
triggers a set of suitability tests.
These tests are complex and probably do not cover all possible MongoDB
operations. Briefly, the select document is checked against a knowledge base of
the kinds of MongoDB fields; for example, `$inc` only applies to updates,
`$and` only applies to selects, whereas `$comment` can apply to both.
Once the status (select/update/both) is determined, the type of the resulting
values is determined from the select document and is verified to be a subtype of
the type of the collection.
So, for example, `{int a}` is a subtype of `{int a, string b}` but `{int a, bool c}`
is not.
Presently, we only print a suitable warning but in future, once these routines
have fully matured we may return an error value.
All of the basic database write operations occur in both send-and-forget and in
send-with-getlasterror forms: `insert`, `insert_result`, `insert_batch`,
`insert_batch_result`, `update`, `update_result`, `delete` and `delete_result`.
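The difference between the two forms can be sketched as follows. This is only an illustration: `demo_writes` is a hypothetical helper, and it assumes that `insert` takes the same arguments as `insert_result` and that `MongoCommon.is_error` accepts the result of `insert_result` in the same way as it accepts the result of `update_result`.

```
/* Hypothetical helper, for illustration only.
 * c1 is a collection of type t = {int i}, opened as in the first example. */
function demo_writes(Mongo.collection(t) c1) {
  /* Send-and-forget: no getlasterror call is made, so a failed write is not reported here */
  _ = MongoCollection.insert(c1,{i:1})
  /* Send-with-getlasterror: the result can be tested for errors (assumed usage of is_error) */
  result = MongoCollection.insert_result(c1,{i:2})
  if (MongoCommon.is_error(result)) println("insert of i=2 failed")
  else println("insert of i=2 was acknowledged")
}
```

The `_result` forms cost an extra `getlasterror` call per write, so the send-and-forget forms are the natural choice only where a lost write is acceptable.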
As an aside, notice that we use a similar functional interface for flags as for
the low-level code:
```
MongoCollection.delete(MongoCollection.singleRemove(c1),createst(Bson.opa2doc({i:104})))
```
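Since a flag function takes a collection and returns a collection, the flagged collection can presumably be named and reused. The following is a sketch under that assumption; the name `c1_single` and the value `105` are introduced only for illustration.

```
/* Assumption: the collection returned by a flag function is an ordinary value */
c1_single = MongoCollection.singleRemove(c1)
/* Each call removes at most one matching document */
_ = MongoCollection.delete(c1_single,createst(Bson.opa2doc({i:104})))
_ = MongoCollection.delete(c1_single,createst(Bson.opa2doc({i:105})))
```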
The select mechanism applies to queries as well but in this case we have to be
careful what types we return from the database:
```
result = MongoCollection.find_one(c1,createst(Bson.opa2doc({`$where`:(Bson.code "this.i > 106")})))
match (result) {
case {success:{~i}}: println("i={i}")
case {~failure}: println("error={MongoCommon.string_of_failure(failure)}")
}
```
This example returns the first value in the collection for which `i` is greater
than 106; the select is expressed as a JavaScript expression.
Many of the MongoDB query methods, such as the `$where` example here, are
perfectly safe with collections.
Some are not, because they return documents which contain fields other than
those in the Opa type.
A good example is the [`$explain`](http://www.mongodb.org/display/DOCS/Explain)
documents, which are a set of statistical data concerning the given query (see
the `Mongo.explainType` type in `MongoCommands`).
In general, we attempt to support such features with special purpose functions
rather than via the normal database operations.
The usual simplified query functions are present in `MongoCollection`:
`find_one` and `find_all`.
There are also two functions which return the bare `Bson.document`
representation of the result, `find_one_doc` and `find_all_doc` which may be
useful in the above situation where the result of the query is not compatible
with Opa types.
For more general query scanning, the cursor-based routines are available.
For example, the following code scans the results of a `MongoCollection` query:
```
query = createst(Bson.opa2doc({i:{`$gt`:102, `$lt`:106}}))
match (MongoCollection.query(MongoCollection.limit(c1,0),query)) {
case {success:cc1}:
  cc1 =
    while(cc1,(function(cc1) {
      match (MongoCollection.next(cc1)) {
      case (cc1,{success:{~i}}):
        println("i={i}")
        (cc1,MongoCollection.has_more(cc1))
      case (cc1,{~failure}):
        println("error={MongoCommon.string_of_failure(failure)}")
        (cc1,false)
      }
    }))
MongoCollection.kill(cc1)
case {~failure}:
println("error={MongoCommon.string_of_failure(failure)}")
}
```
In this code, we create a `Mongo.collection_cursor` object using
`MongoCollection.query` to which we can then apply the collection-specific cursor
functions `MongoCollection.next` and `MongoCollection.has_more`.
This allows arbitrary processing of collection queries.
Remember, as with the low-level cursors above, that the `MongoCollection.kill`
function does not just end the scan, it also sends a `kill_cursors` message to
the MongoDB server to tell it to destroy the cursor.
Another aside in this code is that we set the `limit` value to `0` which means
"use the default number of documents per reply".
If we had set this to `1` we would only ever get one document in the reply
because MongoDB treats this as a special case, i.e. "just return one document".
Again, to help with the situation where return values may be incompatible with
Opa types, we provide the `_unsafe` variants of the query functions.
These, for example `query_unsafe`, take an additional boolean flag,
`ignore_incomplete`, which instructs the driver to simply ignore any returned
documents which have missing fields and are thus not compatible with Opa types.
MongoDB will actually return partial documents if the document meets the query
document but does not contain all of the fields (an exception is the `_id` field
which is always returned unless specifically excluded with the return field
selector document).
These functions should be used with care.
Apart from the support described here the `MongoCollection` module also provides
a few convenience functions such as creating indexes using collection objects
and some direct support for some of the aggregation functions (`count`,
`distinct` and `group`).
Finally, two of the variants of the `open` function, `openpkg` and
`openpkgfatal`, supply a set of pre-cast versions of `MongoSelect.create` and
`MongoUpdate.create`.
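As a sketch, the earlier update example could be written with `openpkgfatal` instead of the hand-made `createst` and `createut` functions. The name `pkg` is chosen for illustration, and the `select` and `update` fields are used in the same way as in the wiki example of the next section.

```
type t = {int i}
/* Open the collection together with its package of pre-cast functions */
Mongo.pkg(t) (c1,pkg) = MongoCollection.openpkgfatal("default","db","collection")
/* pkg.select and pkg.update play the role of createst and createut */
select = pkg.select(Bson.opa2doc({i:0}))
update = pkg.update(Bson.opa2doc({`$inc`:{i:1}}))
result = MongoCollection.update_result(c1,select,update)
```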
Example: Hello, MongoDB wiki
----------------------------
In this section, we describe how to convert the `hello_wiki` example described
in the [previous chapter](/manual/Hello--wiki) to using the MongoDB database.
This is actually a simple process and uses MongoDB as a simple key-value storage
database.
The first task is to open a connection to the database.
We are going to use collections and in fact, we will use the version of `open`
which also gives us the casting functions for selects:
```
/**
* The basic info. about the database and table location.
*/
type page = {
string _id,
Bson.int32 _rev,
string content
}
/**
* We work at level 1, run-time type-checked storage of a collection of Opa values.
* The Mongo.pkg type provides convenience functions for building select and update documents.
**/
Mongo.pkg(page) (wiki_collection,wiki_pkg) = MongoCollection.openpkgfatal("default","db","wiki");
function pageselect(v) { wiki_pkg.select(Bson.opa2doc(v)); }
function pageupdate(v) { wiki_pkg.update(Bson.opa2doc(v)); }
```
The `_rev` field has been cast to `Bson.int32` so we can use 32-bit integers for
this field (a signed 32-bit integer allows about two billion revisions, which is
unlikely ever to be exceeded for any value in the database!).
We then open our connection using the default named connection and connect to
the collection `db.wiki`.
This returns a collection object plus a package of values which we use to build
our select documents.
Next we are actually going to search for documents including the `_rev` field so
we can't just use the default index for our collection (the `_id` field):
```
/**
* Indexes aren't automatic in MongoDB apart from the non-removable _id index.
* Since we're searching on _rev as well, we need a separate index.
**/
MongoCollection.create_index(wiki_collection, "db.wiki", Bson.opa2doc({_id:1, _rev:1}), 0)
```
The `get_content` function can then be modified using a simple call to
`MongoCollection.find_one`:
```
function get_content(docid) {
default_page = "This page is empty. Double-click to edit."
function extract_content(page record) { record.content }
/* Order by reverse _rev to get highest numbered _rev. */
orderby = {some:Bson.opa2doc({_rev:-1})}
match (MongoCollection.find_one(MongoCollection.orderby(wiki_collection,orderby),pageselect({_id:docid}))) {
case {success:page}: extract_content(page)
case {failure:{NotFound}}: default_page
case {~failure}:
jlog("hello_wiki_mongo: failure={MongoCommon.string_of_failure(failure)}")
default_page
}
}
```
We search the database for the given `_id` value but we want the
highest-numbered `_rev` field so we sort by inverse order on that field (the
default ordering for numerical fields is in increasing order).
A missing document is signaled by the `NotFound` failure condition; other
`failure` values are errors.
Finally, the `save_source` function becomes a call to
`MongoCollection.update_result`:
```
exposed function save_source(topic, source) {
select = pageselect({_id:topic})
update = pageupdate({`$set`:{content:source}, `$inc`:{_rev:(Bson.int32 1)}})
/* Upsert this so we create it if it isn't there */
result = MongoCollection.update_result(MongoCollection.upsert(wiki_collection),select,update);
if (MongoCommon.is_error(result))
  <>Error: {MongoCommon.pretty_of_result(result)}</>
else
  load_rendered(topic)
}
```
In this case, we select only the `_id` field and we update the document by
setting the `content` field and incrementing the `_rev` field.
Note that we use the `Upsert` flag which tells MongoDB to insert the document if
it isn't already present in the collection.
We test the result for errors using the safe update operation but apart from
that the code is identical to the existing "Hello wiki" example.