# Exercise sheet \#5
## Using MongoDB
### Exercise 1
For this exercise, you will work with the Paris Tourist Information dataset (see zip file on ARCHE).
This dataset contains information about sightseeing in Paris. Each document describes a venue belonging to one of the following types:
- points of interest (POI)
- restaurants
- attractions
- accommodations

Here is an example of a document:
<pre>
{
   "_id" : 83292,
   "contact" : {
      "website" : "http://www.trocaderolatour.com",
      "GooglePlaces" : "https://plus.google.com/107754700607079935569/about?hl=en-US"
   },
   "name" : "Best Western Premier Trocadero La Tour",
   "location" : {
      "city" : "Paris",
      "coord" : {"coordinates" : [2.2795155644417,48.858311118724],"type" : "Point"},
      "address" : "Paris,   France    5 bis, rue Massenet, 16. Trocadéro - Passy, 75016 Paris"
   },
   "category" : "accommodation",
   "description" : " Situé à 15 minutes à pied de la tour Eiffel, le Best Western Premier Trocadero La Tour bénéficie d'un emplacement idéal pour découvrir Paris. Il abrite un bar lambrissé doté de fauteuils en cuir et un patio.",
   "services" : [
      "jardin",
      "terrasse",
      "journaux",
      "bar",
      "petit-déjeuner en chambre",
      "réception ouverte 24h 24",
      "enregistrement et règlement rapides",
      "bagagerie",
      "service d'étage",
      "salles de réunions banquets",
      "centre d'affaires",
      "garde d'enfants",
      "blanchisserie",
      "chambres non-fumeurs"
   ],
   "reviews" : [
      {
          "wordsCount" : 30,
          "rating" : 0,
          "language" : "en",
          "source" : "Foursquare",
          "text" : "Nice beds, rooms andstaff. Perfect central location. Breakfast is very expensive for a contenintal breakfast, however many bakeries and restaurants in the area. Will stay here again my next visit.",
          "time" : "2010-09-30"
      }
   ]
}
</pre>

#### Question 1.1 - Setting up the database
- Install a local MongoDB server on your machine, along with the [Robo3T](https://robomongo.org/) MongoDB client.
- Create a database named "tourPedia" containing a collection named "paris".
- Import the content of the `tour-Pedia_paris.json` file into that collection.
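
The import step can also be scripted from Python rather than done through a command-line tool or the GUI. The following is only a sketch under assumptions: the layout of `tour-Pedia_paris.json` is not specified here, so the loader accepts either a single JSON array or one JSON document per line, and the server is assumed to listen on `localhost:27017`. The helper names `load_documents` and `import_into_mongo` are hypothetical.

```python
import json
from pathlib import Path


def load_documents(path):
    """Read documents from a file holding either a JSON array or one JSON object per line.

    Both layouts are assumptions; check how the dataset file is actually laid out.
    """
    text = Path(path).read_text(encoding="utf-8").strip()
    if text.startswith("["):
        # The whole file is a single JSON array of documents
        return json.loads(text)
    # Otherwise treat every non-empty line as one JSON document
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def import_into_mongo(path, uri="mongodb://localhost:27017"):
    """Insert the documents into tourPedia.paris (needs pymongo and a running server)."""
    from pymongo import MongoClient  # lazy import: load_documents works without pymongo

    docs = load_documents(path)
    collection = MongoClient(uri).tourPedia.paris
    if docs:
        collection.insert_many(docs)  # insert_many rejects an empty list
    return collection.count_documents({})
```

Calling `import_into_mongo("tour-Pedia_paris.json")` would then return the number of documents present in the collection after the import. MongoDB creates the database and the collection on first insert, so no explicit creation step is needed in this variant.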

NB: For questions 1.2 to 1.5, please use the [Robo3T](https://robomongo.org/) graphical MongoDB client to design and check your queries.


#### Question 1.2 - Filtering and projecting data
- Select the venues whose type is "accommodation" and which offer the "blanchisserie" (laundry) service.
- Project the addresses of venues whose type is "accommodation".

#### Question 1.3 - Constrained filtering
- Retrieve the lists of reviews of venues that have at least one English review whose rating is greater than 3.
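
Note that "at least one English review rated above 3" requires both conditions to hold for the same review. In MongoDB this is what `$elemMatch` expresses, whereas two separate dotted conditions (`reviews.language` and `reviews.rating`) may be satisfied by different elements of the array. The pure-Python sketch below mimics both behaviours on two made-up documents; the function names and the sample data are illustrative only.

```python
def any_review_matches(doc):
    """Mimics {"reviews": {"$elemMatch": {"language": "en", "rating": {"$gt": 3}}}}."""
    return any(
        r.get("language") == "en" and r.get("rating", 0) > 3
        for r in doc.get("reviews", [])
    )


def conditions_match_separately(doc):
    """Mimics {"reviews.language": "en", "reviews.rating": {"$gt": 3}}."""
    reviews = doc.get("reviews", [])
    has_english = any(r.get("language") == "en" for r in reviews)
    has_high_rating = any(r.get("rating", 0) > 3 for r in reviews)
    return has_english and has_high_rating


# Made-up documents, for illustration only
same_review = {"_id": 1, "reviews": [{"language": "en", "rating": 5}]}
split_reviews = {"_id": 2, "reviews": [{"language": "en", "rating": 1},
                                         {"language": "fr", "rating": 5}]}

for doc in (same_review, split_reviews):
    print(doc["_id"], any_review_matches(doc), conditions_match_separately(doc))
```

The second document is accepted only by the dotted form: it has an English review and a highly rated review, but they are not the same review.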

#### Question 1.4 - Grouping data
- Group venues by type and count them.

#### Question 1.5 - Aggregating data
- For venues of type "accommodation", give the number of venues per "service".
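
Because `services` is an array, counting venues per service means treating each (venue, service) pair as one unit before grouping, which is the role of the `$unwind` stage in an aggregation pipeline. As a rough illustration of what such a pipeline computes, here is a pure-Python sketch working on an in-memory list; the function name `count_services` and the sample documents are made up.

```python
from collections import Counter


def count_services(venues, category="accommodation"):
    """Count how many venues of the given category list each service."""
    counts = Counter()
    for venue in venues:
        if venue.get("category") != category:       # plays the role of $match
            continue
        for service in venue.get("services", []):   # plays the role of $unwind
            counts[service] += 1                     # plays the role of $group with $sum: 1
    return counts


# Made-up documents, for illustration only
sample = [
    {"category": "accommodation", "services": ["bar", "jardin"]},
    {"category": "accommodation", "services": ["bar"]},
    {"category": "restaurant", "services": ["bar"]},
]
print(count_services(sample).most_common())
```

On this sample, "bar" is counted twice and "jardin" once; the restaurant is ignored because it does not pass the category filter.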

### Exercise 2
For this exercise, we will reuse the data from Exercise 1.

In the following questions (which are similar to Exercise 1), you are required to use [pymongo](https://api.mongodb.com/python/current/api/pymongo/index.html).

#### Question 2.1 - Filtering and projecting data
- Select the venues whose type is "accommodation" and which offer the "blanchisserie" (laundry) service.
- Project the addresses of venues whose type is "accommodation".

Compare your results with those of question 1.2 above.

In [7]:
from pymongo import MongoClient
from pprint import pprint
client = MongoClient('mongodb://localhost:27017')


In [11]:
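db = client.tourPedia
# Accommodation venues offering the "blanchisserie" (laundry) service
venues1 = db.paris.find({"category": "accommodation", "services": "blanchisserie"})
# Projection: keep only the address (plus _id) of every accommodation venue
venues2 = db.paris.find({"category": "accommodation"}, {"location.address": 1})
# Print the first three documents of each result
print(list(venues1.limit(3)))
print(list(venues2.limit(3)))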

[{'_id': 83265, 'contact': {'website': 'http://www.ares-paris-hotel.com', 'GooglePlaces': 'https://plus.google.com/117469042429892205022/about?hl=en-US', 'Foursquare': 'https://foursquare.com/v/h%C3%B4tel-ar%C3%A8s-tour-eiffel/4adcd9fff964a5208f3021e3'}, 'name': 'Arès Tour Eiffel', 'location': {'city': 'Paris', 'coord': {'coordinates': [2.2981756925583, 48.850407339623], 'type': 'Point'}, 'address': 'Paris,   France    7 rue du Général de Larminat, 15. Eiffel Tower - Porte de Versailles, 75015 Paris'}, 'category': 'accommodation', 'description': " L'Hotel Arès Tour Eiffel est un hôtel de caractère 4 étoiles situé à 10 minutes à pied de la Tour Eiffel. Il propose des chambres climatisées avec connexion Wi-Fi gratuite. Vous pourrez accéder gratuitement au centre de remise en forme et de bien-être situé à 50 mètres de l'hôtel. Les chambres de l'hôtel Arès Tour Eiffel sont décorées avec un mélange de styles baroque et contemporain. Chaque chambre est équipée d'une télévision à écran plat a

#### Question 2.2 - Constrained filtering
- Retrieve the lists of reviews of venues that have at least one English review whose rating is greater than 3.

Compare your results with those of question 1.3 above.

In [2]:
db = client.tourPedia
# Venues with at least one review that is both in English and rated above 3
query = {"reviews": {"$elemMatch": {"language": "en", "rating": {"$gt": 3}}}}
reviews = db.paris.find(query)  # find() returns a cursor over the matching documents
nbreviews1 = db.paris.count_documents(query)
nbreviews2 = len(list(reviews))
print(nbreviews1)
print(nbreviews2)

4017
4017


#### Question 2.3 - Grouping data
- Group venues by type and count them.

Compare your results with those of question 1.4 above.

In [3]:
db  = client.tourPedia
agr = [ {'$group' : { '_id' : '$category', 'total' : {'$sum':1}}} ]
val = list(db.paris.aggregate(agr))
for v in val:
    pprint(v)

{'_id': 'accommodation', 'total': 3376}
{'_id': 'restaurant', 'total': 21823}
{'_id': 'attraction', 'total': 4316}
{'_id': 'poi', 'total': 26846}
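
The groups above come back in no particular order. If a ranked list is easier to compare with the Robo3T result, a `$sort` stage can be appended to the pipeline. This is a minimal sketch; the helper names `category_count_pipeline` and `count_by_category` are hypothetical, and `collection` is assumed to be a pymongo collection such as `client.tourPedia.paris`.

```python
def category_count_pipeline():
    """Group venues by category, count them, and sort by decreasing count."""
    return [
        {"$group": {"_id": "$category", "total": {"$sum": 1}}},
        {"$sort": {"total": -1}},
    ]


def count_by_category(collection):
    """Run the pipeline on a pymongo collection and return the groups as a list."""
    return list(collection.aggregate(category_count_pipeline()))
```

With the counts printed above, `count_by_category(client.tourPedia.paris)` should list `poi` first, followed by `restaurant`, `attraction` and `accommodation`.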


#### Question 2.4 - Aggregating data
- For venues of type "accommodation", give the number of venues per "service".

Compare your results with those of question 1.5 above.

In [6]:
db  = client.tourPedia
agr = [{'$match'  : {'category' : 'accommodation'}},
       {'$unwind' : '$services'},
       {'$group'  : { '_id' : '$services', 'total' : {'$sum':1}}} ]
val = list(db.paris.aggregate(agr))
for v in val:
    pprint(v)

{'_id': 'estonien', 'total': 1}
{'_id': 'jardin', 'total': 170}
{'_id': 'bibliothèque', 'total': 76}
{'_id': ' anglais', 'total': 2}
{'_id': 'distributeur automatique de billets sur place', 'total': 8}
{'_id': ' suédois', 'total': 9}
{'_id': 'club pour enfants', 'total': 1}
{'_id': 'menus pour régimes spéciaux (sur demande)', 'total': 17}
{'_id': 'restaurant (à la carte)', 'total': 69}
{'_id': 'letton', 'total': 1}
{'_id': 'billard', 'total': 3}
{'_id': 'livraison de courses', 'total': 1}
{'_id': 'sauna', 'total': 37}
{'_id': 'billetterie', 'total': 299}
{'_id': 'équitation', 'total': 1}
{'_id': 'service de concierge', 'total': 380}
{'_id': 'salle de jeux', 'total': 7}
{'_id': 'ascenseur', 'total': 1280}
{'_id': 'local à ski', 'total': 1}
{'_id': "boutiques dans l'hôtel", 'total': 24}
{'_id': "bureau d'excursions", 'total': 255}
{'_id': 'service de cireur', 'total': 86}
{'_id': 'solarium', 'total': 4}
{'_id': 'randonnée à vélo', 'total': 4}
{'_id': 'discothèque', 'total': 2}
{'_id': 'r