## Storing the Polyline Data

Storing the polylines for emission via the API requires a database.

`PythonAnywhere` provides `MySQL` databases, and I initially thought I could use one. `MySQL` is an open source, Oracle-controlled relational database system, and possibly the most widely used DBMS out there. It is far heavier in code weight and feature set than anything I have done so far in `sqlite`. `PostgreSQL` is a far more competent open source DBMS which has significantly less code yet somehow a better feature set. In particular, `PostgreSQL` is the basis for `PostGIS`, which is the de facto DBMS of choice for spatial relational databases.

`MySQL` does have certain spatial features itself, of course. My hope was that I would be able to use the `SQLAlchemy` ORM to manage a connection to my `PythonAnywhere` `MySQL` database, and that this would form the backbone of my service.

It's technically possible to SSH tunnel into the database. However, in my own prior experience, SSH tunneling is very difficult on Windows. It is far easier to download, install, and begin working with a local `MySQL` database instead, which is what I did.

However, I found that while `MySQL` has `JSON` features, `SQLAlchemy` does not directly provide an interface to them, except in the current beta version, which is difficult to install. It also has zero support for spatial database features. There is an extension for this called `GeoAlchemy`, but the extension was most recently rewritten to work with `PostGIS` *only*. This means that if I were to use the `SQLAlchemy` ORM for this project, I would have to map the `JSON` strings manually. This is fine for everything except the coordinate lists, which I do not know how to acceptably serialize.
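To make "mapping the `JSON` strings manually" concrete, here is a minimal sketch of what that would mean for a coordinate list: dump it to text before writing it to a plain text column, and parse it again on the way out. The coordinates are made up for illustration, and this is only one possible encoding, not necessarily an acceptable one for spatial queries.

```python
import json

# Made-up [longitude, latitude] pairs standing in for one trip's polyline.
coordinates = [[-73.99, 40.73], [-73.98, 40.74], [-73.97, 40.75]]

# Serialize to a string that could be stored in an ordinary text column.
as_text = json.dumps(coordinates)

# Parse the string back into nested lists when reading the row.
restored = json.loads(as_text)

print(as_text)
print(restored == coordinates)
```

The round trip preserves the list, but the database only ever sees an opaque string, so it cannot index or query the individual points.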

After some more research I decided that this route was not very practical. I searched around and was happy to discover that `MongoDB` NoSQL document-based data stores, which allow me to represent my data with almost zero effort, are freely available (up to a 0.5 GB limit) on a service called [mLab](https://mlab.com/). Yay!

I created an account, created a `citibike` database, created a `biker` user for the database, and got things rolling...

<!-- Mainly I have no idea how to do the same thing that has always tripped me up when it comes to relational schemas&mdash;how best to store a list of things (coordinates in this case) which is (mostly) unique to every single relation (trip in this case). -->

## MongoDB Test Drive

In [1]:
from pymongo import MongoClient
import json

The `MongoDB` URI format is `mongodb://<username>:<password>@<instance>/<database>`.
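As a small sketch of how such a URI could be assembled, the helper below fills in the four parts and percent-escapes the username and password with `urllib.parse.quote_plus`, which matters if either contains characters such as `@` or `/`. The function name `build_mongo_uri` and the example values are hypothetical.

```python
from urllib.parse import quote_plus


def build_mongo_uri(username, password, instance, database):
    """Assemble a mongodb:// URI, escaping the credentials (hypothetical helper)."""
    return "mongodb://{}:{}@{}/{}".format(
        quote_plus(username),  # escape reserved characters in the username
        quote_plus(password),  # escape reserved characters in the password
        instance,              # host:port of the MongoDB instance
        database,              # database to authenticate against
    )


# Example values are made up; the password deliberately contains '@' and '/'.
example_uri = build_mongo_uri("biker", "p@ss/word", "example.mlab.com:12345", "citibike")
print(example_uri)
```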

In [2]:
uri = None
with open("../credentials/mlab_dummy_instance_api_key.json") as cred:
    uri = json.load(cred)['uri']
    print(uri)

client = MongoClient(uri)

mongodb://biker:thereismathtobedone@ds031975.mlab.com:31975/testing_my_stuff


The first layer is the name of the database.

In [3]:
citibike_db = client['citibike']

The second layer is the name of the document collection.

In [4]:
bike_weeks_document_collection = citibike_db['bike-weeks']

In [5]:
import geojson

We store documents in this layer.

In [9]:
client.system.users.find({})

<pymongo.cursor.Cursor at 0x55dcef0>

**TODO**: Fix this error so that I can return this notebook to working order.

In [6]:
bike_weeks_document_collection.insert_one(geojson.loads(open("../data/part_1/one_trip.geojson").read()))

OperationFailure: not authorized on citibike to execute command { insert: "bike-weeks", ordered: true, documents: [ { _id: ObjectId('579abd0beb758c14d86ca5e5'), properties: { bike_id: 20249 }, features: [], type: "FeatureCollection" } ] }
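One possible reading of this error, offered as an assumption and not a confirmed diagnosis: the URI printed above ends in `testing_my_stuff`, while the insert targets the `citibike` database, so the `biker` user may simply not have write access to the database being used. The sketch below pulls the database name out of a URI with the standard library and compares it with the one the code uses. The helper name `database_from_uri` is hypothetical and the password is replaced with a placeholder.

```python
from urllib.parse import urlsplit


def database_from_uri(uri):
    """Return the database name at the end of a mongodb:// URI, or None if absent."""
    path = urlsplit(uri).path  # e.g. "/testing_my_stuff"
    return path.lstrip("/") or None


# Same shape as the URI printed earlier, with a placeholder password.
uri = "mongodb://biker:PASSWORD@ds031975.mlab.com:31975/testing_my_stuff"
target = "citibike"  # the database name used in client['citibike']

uri_db = database_from_uri(uri)
if uri_db != target:
    print("URI authenticates against {!r}, but the code uses {!r}".format(uri_db, target))
```

`PyMongo` also offers `client.get_default_database()`, which returns the database named in the URI and avoids hard-coding the name a second time.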

In [None]:
result = _

We can tell when a document is stored successfully.

In [None]:
result.acknowledged

In [None]:
result.inserted_id

To delete that one document, and then everything in the collection:

In [None]:
bike_weeks_document_collection.delete_one({'_id': result.inserted_id})

In [None]:
del_result = _

In [None]:
bike_weeks_document_collection.delete_many({})

Before version `3.2`, which added the `$sample` aggregation stage, MongoDB was apparently very bad at selecting a random document. The version of the database handed to me by `mLab` is `3.0.12`, an older (still maintained) branch. Given that, the most efficient way to implement random document selection appears to be to embed an integer in each document, a workaround that I do not like much.

[Here are the options](http://bdadam.com/blog/finding-a-random-document-in-mongodb.html).
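For reference, the embedded-integer workaround mentioned above might look roughly like this. It is a sketch under assumptions: the field name `seq` and both helper names are hypothetical, and the documents are plain dictionaries so the idea can be shown without a live database.

```python
import random


def tag_with_index(documents):
    """Return copies of the documents, each with a sequential integer 'seq' field."""
    return [dict(doc, seq=i) for i, doc in enumerate(documents)]


def random_seq_query(n_documents, rng=random):
    """Build a filter matching exactly one tagged document, chosen uniformly."""
    if n_documents < 1:
        raise ValueError("need at least one document")
    return {"seq": rng.randrange(n_documents)}


# Made-up documents standing in for stored bike weeks.
tagged = tag_with_index([{"bike_id": 1}, {"bike_id": 2}, {"bike_id": 3}])
query = random_seq_query(len(tagged))

# Local stand-in for the lookup; with PyMongo this would be roughly
# collection.find_one(query) after inserting the tagged documents.
match = next(doc for doc in tagged if doc["seq"] == query["seq"])
print(query, match)
```

The catch is that the `seq` values have to stay contiguous, so every delete means renumbering, which is part of why the workaround feels like a gimmick.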

Since our working document store will not have more than 10-ish documents in it at once, it's fine to implement a relatively dumb solution that doesn't rely on this gimmick.

In [None]:
import random

In [None]:
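# count_documents() replaces Collection.count(), which newer PyMongo releases removed.
r = random.randint(0, bike_weeks_document_collection.count_documents({}) - 1)
# Skip r documents and take the next one.
randomElement = bike_weeks_document_collection.find({}).skip(r).limit(1)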

In [None]:
for document in randomElement:
    print(document)

Well, that's all we need to be able to do, really. I'm super nervous about the service's claims that their single-node environment is not good for production usage, but there's nothing I can really do there.