We will work with data from the [Open Data portal of Italy](http://www.datiopen.it), which aggregates datasets published by the Italian government, the Italian Regions and Provinces, and the municipalities. Specifically, we will use the dataset produced by the Region of Lazio on the bars and restaurants operating in the region in the period 2007-2013. The data was originally provided by the [Regional Observatory of Lazio](http://www.osservatoriocommercio.lazio.it/). The dataset is available in several formats; here we will use the JSON one:
* [Regione Lazio - Pubblici esercizi permanenti (Bar-Ristoranti)](http://www.datiopen.it/it/opendata/Regione_Lazio_Pubblici_esercizi_permanenti_Bar_Ristoranti_) (permanent public establishments: bars and restaurants)

The Open Data portal is built on the [CKAN platform](https://ckan.org/), a data management system that makes data accessible by providing tools to streamline publishing, sharing, finding and using data. You can download the dataset in JSON format from the *Scarica* (Download) tab, under the section *Esporta in altri formati* (Export to other formats). For convenience, rename the downloaded file to *dataset.json*.

The dataset provides the following information:
- Municipality (ccomune)
- Province (cprovincia)
- Year (canno)
- Bars
  * Total (cbar_totale)
  * New openings (cbar_aperture)
  * Closures (cbar_chiusure)
  * Takeovers, i.e. *subingressi*, transfers of an existing business to a new owner (cbar_subingressi)
- Restaurants
  * Total (cristoranti_totale)
  * New openings (cristoranti_aperture)
  * Closures (cristoranti_chiusure)
  * Takeovers (cristoranti_subingressi)
- Bar-Restaurants
  * Total (cbar_ristoranti_totale)
  * New openings (cbar_ristoranti_aperture)
  * Closures (cbar_ristoranti_chiusure)
  * Takeovers (cbar_ristoranti_subingressi)

Each record also contains the fields ctotale, caperture, cchiusure and csubingressi, which appear to hold the same four counts without a breakdown by category.
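The field names in the list above follow a regular `<prefix>_<measure>` pattern, so the per-category fields can be generated rather than typed by hand. A minimal sketch; the helper names `category_fields` and `category_values` and the English category keys are our own, hypothetical choices:

```python
# Prefixes and measure names are taken from the field list above.
CATEGORY_PREFIXES = {
    'bars': 'cbar',
    'restaurants': 'cristoranti',
    'bar-restaurants': 'cbar_ristoranti',
}
MEASURES = ('totale', 'aperture', 'chiusure', 'subingressi')


def category_fields(category):
    """Return the dataset field names of one category, keyed by measure."""
    prefix = CATEGORY_PREFIXES[category]
    return {measure: f'{prefix}_{measure}' for measure in MEASURES}


def category_values(row, category):
    """Pick the raw (string) values of one category out of a dataset row."""
    return {measure: row.get(field, '')
            for measure, field in category_fields(category).items()}


print(category_fields('bars')['aperture'])  # cbar_aperture
```

Using an explicit prefix table avoids the ambiguity of matching on `cbar_` alone, which would also match the bar-restaurant fields.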
  
### Loading the dataset in JSON format

The downloaded file is in JSON format.

To parse it we will use the [JSON encoder and decoder](https://docs.python.org/3/library/json.html) (`json`), a module of the Python standard library. An introduction to the JSON format itself is available in the [W3Schools JSON tutorial](https://www.w3schools.com/js/js_json_intro.asp).

We read the entire contents of the file into the string variable *data*.

```python
# Read the whole file into a single string; the with-block closes the file
with open('dataset.json', 'r', encoding='utf-8') as fo:
    data = fo.read()
```

At this point the entire file is held in memory as a single string; its length in characters is:

```python
len(data)
```

```
1216003
```
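Reading and parsing can also be done in one step with `json.load`, which accepts a file object instead of a string. A minimal sketch; the function name `load_dataset` is hypothetical, and the UTF-8 encoding is an assumption about the downloaded file:

```python
import json


def load_dataset(path='dataset.json'):
    """Parse a JSON file straight into Python objects."""
    # json.load reads the file object and decodes it; the whole file is
    # still read into memory, it just is not kept in a separate variable.
    with open(path, 'r', encoding='utf-8') as fo:
        return json.load(fo)
```

With this helper, `dataset = load_dataset()` replaces the separate read and decode steps.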

We then use the standard [loads](https://docs.python.org/3/library/json.html#encoders-and-decoders) function, `json.loads`, which parses the JSON string into Python objects. Because the top-level JSON value is an array of objects, the result is a `list` in which each element is a `dict` holding one row of the dataset.

```python
import json
dataset = json.loads(data)
```
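To see how JSON types map to Python types, here is a small, self-contained illustration on an excerpt of the first row of the dataset (reduced to three of its fields):

```python
import json

# One row of the dataset, reduced to three of its fields
excerpt = '[{"canno": "2007", "ccomune": "Acquapendente", "ctotale": "32"}]'

rows = json.loads(excerpt)
print(type(rows).__name__)     # list  (JSON array)
print(type(rows[0]).__name__)  # dict  (JSON object)
print(rows[0]['ctotale'])      # 32    (a str: it was quoted in the JSON)

# Unquoted JSON values map to int, float, bool and None
print(json.loads('[1, 2.5, true, null]'))  # [1, 2.5, True, None]
```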

```python
type(dataset)
```

```
list
```

The dataset contains a total of 2646 rows.

```python
len(dataset)
```

```
2646
```
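Since every row is a dictionary, it is easy to check how the rows are spread across years or provinces. A minimal sketch using `collections.Counter`; the helper name `rows_per_key` is hypothetical:

```python
from collections import Counter


def rows_per_key(rows, key):
    """Count how many rows share each value of `key` (e.g. 'canno' or 'cprovincia')."""
    return Counter(row.get(key) for row in rows)
```

For example, `rows_per_key(dataset, 'canno')` should show one entry per year in the 2007-2013 period, and the counts always add up to `len(dataset)`.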

In this format we can access each row directly by its index:

```python
dataset[0]
```

```
{'canno': '2007',
 'caperture': '0',
 'cbar_aperture': '',
 'cbar_chiusure': '',
 'cbar_ristoranti_aperture': '',
 'cbar_ristoranti_chiusure': '',
 'cbar_ristoranti_subingressi': '',
 'cbar_ristoranti_totale': '',
 'cbar_subingressi': '',
 'cbar_totale': '',
 'cchiusure': '0',
 'ccomune': 'Acquapendente',
 'cprovincia': 'VITERBO',
 'cristoranti_aperture': '',
 'cristoranti_chiusure': '',
 'cristoranti_subingressi': '',
 'cristoranti_totale': '',
 'csubingressi': '1',
 'ctotale': '32'}
```
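The row above shows that every value, including the year and the counts, is stored as a string, and that missing values are empty strings. Before analysing or storing the data it may help to convert them. A minimal sketch, assuming that every field except ccomune and cprovincia holds either an integer count or an empty string; the helper names are hypothetical:

```python
def to_int_or_none(value):
    """Convert a numeric string such as '32' to int; map '' (missing) to None."""
    text = '' if value is None else str(value).strip()
    return int(text) if text else None


def normalize_row(row, text_fields=('ccomune', 'cprovincia')):
    """Return a copy of `row` with all non-text fields converted to int or None."""
    return {
        key: value if key in text_fields else to_int_or_none(value)
        for key, value in row.items()
    }
```

Usage: `clean = [normalize_row(row) for row in dataset]`. Mapping empty strings to `None` rather than `0` keeps "not reported" distinct from "zero"; if a field unexpectedly holds non-numeric text, `int()` raises `ValueError`, which exposes the bad value instead of hiding it.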

## Storing data in a Document Database

We now want to store the contents of the dataset in a document database, where each row becomes a separate document. We will use [MongoDB](https://www.mongodb.com/what-is-mongodb) and the Database-as-a-Service provider [mLab](https://mlab.com/).

Using the web interface provided by mLab:
1. Create a new account on mLab.
2. Create a new database using the free 500 MB sandbox plan provided by mLab.

We will access the MongoDB database hosted by mLab through its [API (Application Programming Interface)](http://docs.mlab.com/data-api/). To do so, we need to enable Data API access and obtain an API key. Again, through the mLab web interface:

1. Open your account page by clicking on your username at the top right of the main page.
2. Copy the API key shown there; it is unique to your account.
3. Make sure that Data API access is enabled by clicking the *Enable Data API access* button.
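Since the API key grants access to your databases, it is safer not to paste it into the notebook itself. A minimal sketch that reads it from an environment variable; the variable name `MLAB_API_KEY` and the function `get_api_key` are our own, hypothetical choices:

```python
import os


def get_api_key(var_name='MLAB_API_KEY'):
    """Read the mLab API key from an environment variable (name is hypothetical)."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f'Set the {var_name} environment variable to your mLab API key')
    return key
```

After `export MLAB_API_KEY=...` in the shell that starts the notebook, `params = {'apiKey': get_api_key()}` builds the request parameters without exposing the key in the code.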

Please note that mLab’s Data API uses [MongoDB Extended JSON in strict mode](https://docs.mongodb.com/v3.2/reference/mongodb-extended-json/) to encode queries and documents.
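In strict-mode Extended JSON, BSON types without a plain JSON equivalent are written as tagged objects; for instance an ObjectId becomes `{"$oid": "<24 hex characters>"}`. Queries are sent to the Data API as JSON strings in the URL. The sketch below assumes, based on our reading of the mLab Data API documentation, collection URLs of the form `.../databases/<db>/collections/<collection>`, a `q` parameter for the query and an `l` parameter for the result limit; treat these names as assumptions to verify, and the helper names as hypothetical:

```python
import json

BASE_URL = 'https://api.mlab.com/api/1/databases'


def collection_url(database, collection):
    # Assumed URL layout of the mLab Data API for a single collection
    return f'{BASE_URL}/{database}/collections/{collection}'


def object_id(hex_id):
    # Strict-mode Extended JSON representation of an ObjectId
    return {'$oid': hex_id}


def build_query_params(api_key, query, limit=None):
    """Encode a MongoDB query as a JSON string for the (assumed) `q` parameter."""
    params = {'apiKey': api_key, 'q': json.dumps(query)}
    if limit is not None:
        params['l'] = limit  # assumption: 'l' limits the number of results
    return params
```

For example, `build_query_params(key, {'cprovincia': 'VITERBO', 'canno': '2007'})` selects the 2007 rows of the Viterbo province. Note that `canno` must be matched as the string `'2007'`, because MongoDB equality matches are type-sensitive and the raw dataset stores years as strings.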


### Simple operations via MongoDB Data API

We start with the API endpoint */databases*, which lists the databases linked to the authenticated account. To call it we use [requests](http://docs.python-requests.org/en/master/#), a Python library that makes working with HTTP simple and straightforward.

```python
import requests

# Replace with the API key copied from your mLab account page
params = {'apiKey': 'YOUR_API_KEY'}
url = 'https://api.mlab.com/api/1/databases'
response = requests.get(url, params=params)
```

The [get](http://docs.python-requests.org/en/master/user/quickstart/#make-a-request) method sends an HTTP GET request to the endpoint; the entries of the `params` dictionary are appended to the URL as a query string. Here they carry the only parameter this endpoint requires, *apiKey*.
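To see exactly which URL `requests.get` builds from `url` and `params`, the request can be prepared without sending it. A small sketch; `preview_url` is a hypothetical helper name:

```python
import requests


def preview_url(url, params):
    """Return the URL that requests would send for these params, without sending it."""
    return requests.Request('GET', url, params=params).prepare().url


print(preview_url('https://api.mlab.com/api/1/databases', {'apiKey': 'YOUR_API_KEY'}))
# https://api.mlab.com/api/1/databases?apiKey=YOUR_API_KEY
```

This is handy for debugging queries that contain JSON, since characters such as braces and quotes are percent-encoded in the final URL.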

If the request succeeds (the apiKey is valid and Data API access is enabled), the server responds with status code 200 (OK):

```python
response
```

```
<Response [200]>
```
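The steps above can be wrapped into a reusable function that fails loudly on an HTTP error status instead of silently parsing an error body. A minimal sketch; the function name, the `http` parameter (any object with a requests-style `get`, by default the `requests` module itself) and the 10-second timeout are our own choices:

```python
import requests

DATABASES_URL = 'https://api.mlab.com/api/1/databases'


def list_databases(api_key, http=requests):
    """Return the names of the databases linked to the account."""
    response = http.get(DATABASES_URL, params={'apiKey': api_key}, timeout=10)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx status codes,
    # so error responses stop here instead of being parsed as data
    response.raise_for_status()
    # The body is a JSON array of names, so json() returns a Python list
    return response.json()
```

Passing the HTTP client as a parameter makes the function easy to exercise with a stand-in object, without network access.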

We can access the body of the response through the *text* attribute of the [response object](http://docs.python-requests.org/en/master/user/quickstart/#response-content):

```python
response.text
```

```
'[ "adm" , "adm2017" , "ds" , "seed" ]'
```
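The body is a plain string that happens to contain a JSON array, so it can be decoded with the same `json.loads` used for the dataset; `response.json()` performs the same decoding directly on the response. A small illustration on the text shown above:

```python
import json

text = '[ "adm" , "adm2017" , "ds" , "seed" ]'  # the response.text shown above
databases = json.loads(text)
print(databases)       # ['adm', 'adm2017', 'ds', 'seed']
print(len(databases))  # 4
```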