# Exercise sheet \#5
## Using MongoDB
### Exercise 1
For this exercise, you will work with the Paris Tourist Information dataset (see zip file on ARCHE).
This dataset contains information about sightseeing tours in Paris. Each document describes a venue belonging to one of the following types:
- points of interest (POI)
- restaurants
- attractions
- accommodations

Here is an example of a document:
<pre>
{
   "_id" : 83292,
   "contact" : {
      "website" : "http://www.trocaderolatour.com",
      "GooglePlaces" : "https://plus.google.com/107754700607079935569/about?hl=en-US"
   },
   "name" : "Best Western Premier Trocadero La Tour",
   "location" : {
      "city" : "Paris",
      "coord" : {"coordinates" : [2.2795155644417,48.858311118724],"type" : "Point"},
      "address" : "Paris,   France    5 bis, rue Massenet, 16. Trocadéro - Passy, 75016 Paris"
   },
   "category" : "accommodation",
   "description" : " Situé à 15 minutes à pied de la tour Eiffel, le Best Western Premier Trocadero La Tour bénéficie d'un emplacement idéal pour découvrir Paris. Il abrite un bar lambrissé doté de fauteuils en cuir et un patio.",
   "services" : [
      "jardin",
      "terrasse",
      "journaux",
      "bar",
      "petit-déjeuner en chambre",
      "réception ouverte 24h 24",
      "enregistrement et règlement rapides",
      "bagagerie",
      "service d'étage",
      "salles de réunions banquets",
      "centre d'affaires",
      "garde d'enfants",
      "blanchisserie",
      "chambres non-fumeurs"
   ],
   "reviews" : [
      {
          "wordsCount" : 30,
          "rating" : 0,
          "language" : "en",
          "source" : "Foursquare",
          "text" : "Nice beds, rooms andstaff. Perfect central location. Breakfast is very expensive for a contenintal breakfast, however many bakeries and restaurants in the area. Will stay here again my next visit.",
          "time" : "2010-09-30"
      }
   ]
}
</pre>

#### Question 1.1 - Setting up the database
- Install a local MongoDB server on your machine, along with a [Robo3T](https://robomongo.org/) MongoDB client.
- Create a database named "tourPedia" containing a collection named "paris".
- Import the content of the `tour-Pedia_paris.json` file into that collection.
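
The import step can also be scripted. The sketch below is one possible approach, assuming pymongo is installed and a server listens on `localhost:27017`. The helper names `load_documents` and `import_into_paris` are hypothetical, and since the layout of `tour-Pedia_paris.json` is not specified here, the loader accepts either a single JSON array or one JSON document per line.

```python
import json


def load_documents(path):
    """Parse a file holding either a JSON array or one JSON document per line."""
    with open(path, encoding="utf-8") as handle:
        text = handle.read().strip()
    if text.startswith("["):
        # The whole file is a single JSON array of documents.
        return json.loads(text)
    # Otherwise assume newline-delimited JSON and skip blank lines.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def import_into_paris(path, uri="mongodb://localhost:27017/"):
    """Insert the parsed documents into tourPedia.paris and return how many were inserted."""
    from pymongo import MongoClient  # imported lazily: only needed for the actual import

    docs = load_documents(path)
    with MongoClient(uri) as client:
        # The database and the collection are created implicitly on first insert.
        result = client.tourPedia.paris.insert_many(docs)
    return len(result.inserted_ids)
```

Calling `import_into_paris("tour-Pedia_paris.json")` would then perform the import. The `mongoimport` command-line tool is an alternative if you prefer not to write any code for this step.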

NB: For questions 1.2 to 1.5, please use the [Robo3T](https://robomongo.org/) graphical MongoDB client to design and check your queries.


#### Question 1.2 - Filtering and projecting data
- Select the venues whose type (`category` field) is "accommodation" and whose services include "blanchisserie" (laundry).
- Project the addresses of the venues whose type is "accommodation".

#### Question 1.3 - Constrained filtering
- Retrieve the lists of reviews of the venues that have at least one English review with a rating greater than 3.

#### Question 1.4 - Grouping data
- Group venues by type and count them.

<pre>
{
    "_id" : "accommodation",
    "sum" : 3376.0
}
{
    "_id" : "attraction",
    "sum" : 4316.0
}
{
    "_id" : "poi",
    "sum" : 26846.0
}
{
    "_id" : "restaurant",
    "sum" : 21823.0
}
</pre>

#### Question 1.5 - Aggregating data
- For venues of type "accommodation", give the number of venues per service.

### Exercise 2
For this exercise, we will reuse the data from Exercise 1.

In the following questions (which are similar to Exercise 1), you are required to use [pymongo](https://api.mongodb.com/python/current/api/pymongo/index.html).

#### Question 2.1 - Filtering and projecting data
- Select the venues whose type (`category` field) is "accommodation" and whose services include "blanchisserie" (laundry).
- Project the addresses of the venues whose type is "accommodation".

Compare your results with those of question 1.2 above.

<pre>
from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')
with client:
    db = client.tourPedia
    accommodations = db.paris.find({'category': 'accommodation'})
    # Client-side filter on the laundry service (left disabled):
    # for a in accommodations:
    #     if a['services'] is not None and "blanchisserie" in a['services']:
    #         print(a['name'])

    # Projection: keep only _id and name for every accommodation.
    type_accommodation = db.paris.find({'category': 'accommodation'}, {'_id': 1, 'name': 1})
    for accom in type_accommodation:
        print(accom)
</pre>


<pre>
{'_id': 83269, 'name': 'Hôtel Minerve Paris'}
{'_id': 83263, 'name': 'Forest-Hill Villette'}
{'_id': 83280, 'name': 'Hôtel Napoléon'}
{'_id': 83296, 'name': "Hôtel de L'Europe"}
{'_id': 83332, 'name': 'Hôtel Verneuil'}
{'_id': 83369, 'name': 'Hôtel Victoria - 1 boulevard Ornano'}
{'_id': 83303, 'name': 'Hôtel de la Trémoille'}
{'_id': 83286, 'name': 'Murano Urban Resort'}
{'_id': 83905, 'name': 'Waldorf Arc de Triomphe Hôtel & Spa'}
{'_id': 83972, 'name': 'Hotel de France'}
{'_id': 83801, 'name': 'Champs-Elysees Plaza'}
{'_id': 84791, 'name': 'Quality Hotel Opera Saint-Lazare'}
{'_id': 84795, 'name': 'La Villa Maillot & Spa ****'}
{'_id': 85315, 'name': 'Entrée des artistes'}
{'_id': 84963, 'name': 'Holiday Inn Paris - Saint-Germain-des-Prés'}
{'_id': 85516, 'name': 'Best Western Premier Hotel Pergolese'}
{'_id': 85547, 'name': 'Adagio Bercy Aparthotel'}
{'_id': 85341, 'name': 'Hôtel Wilson Opera'}
{'_id': 85575, 'name': 'Hotel Cervantes 3*'}
{'_id': 85563, 'name': 'InterContinental'}
</pre>
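
The output above lists venue names, whereas the question asks for addresses and for a filter on the laundry service. A possible variant is sketched below, assuming the same local server and the field names of the sample document (`services`, `location.address`). The names `LAUNDRY_FILTER`, `ADDRESS_PROJECTION`, `offers_laundry` and `show_addresses` are made up for this illustration.

```python
# Server-side filter: an equality test on an array field matches
# when any element of the array equals the given value.
LAUNDRY_FILTER = {"category": "accommodation", "services": "blanchisserie"}

# Projection on a nested field; _id is returned by default.
ADDRESS_PROJECTION = {"location.address": 1}


def offers_laundry(doc):
    """Local equivalent of the services test, tolerant of missing or null `services`."""
    services = doc.get("services") or []
    return "blanchisserie" in services


def show_addresses(uri="mongodb://localhost:27017/", limit=5):
    """Print the laundry count and a few projected addresses (needs a running server)."""
    from pymongo import MongoClient  # imported lazily so the constants are usable alone

    with MongoClient(uri) as client:
        paris = client.tourPedia.paris
        print(paris.count_documents(LAUNDRY_FILTER), "accommodations offer laundry")
        cursor = paris.find({"category": "accommodation"}, ADDRESS_PROJECTION).limit(limit)
        for doc in cursor:
            # .get() guards against documents without a location sub-document
            print(doc["_id"], doc.get("location", {}).get("address"))
```

Pushing the filter to the server avoids transferring every accommodation to the client just to discard most of them, which matters little on this dataset but is the habit worth keeping.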


#### Question 2.2 - Constrained filtering
- Retrieve the lists of reviews of the venues that have at least one English review with a rating greater than 3.

Compare your results with those of question 1.3 above.

<pre>
client = MongoClient('mongodb://localhost:27017/')
with client:
    db = client.tourPedia
    # $elemMatch: a single review must be both in English and rated above 3.
    reviews = db.paris.find({"reviews": {"$elemMatch": {"language": "en", "rating": {"$gt": 3}}}})
    for r in reviews:
        print(r['name'])
</pre>

<pre>
Le Congrès Maillot
Le Murat
Publicis Drugstore Brasserie
Starbucks Opera
Café des Deux Moulins
Starbucks
Starbucks
Starbucks
Starbucks
Bar du Marché
Royal-Jussieu
Les P'tites indécises
Favela Chic
Eric Kayser
Loulou' Friendly diner
Antico Caffe della Pace
Café Pinson
Café Marly
Auberge Dab
Pizza Pino
L'Européen
Restaurant le Laumière
Le Cardinal
Le Petit Villiers
Le Basile
Angelina
Pause Café
Café Français
Aux Trois Obus
Hard Rock Cafe Paris
Au Passage
Au Boeuf Couronné
L'As Du Fallafel
Le Coutume café
La Terrasse Mirabeau
AntiCafé
Le Meurice Restaurant
Factory & Co
The Long Red Bar
Corcoran's
L'Atlantique
Corcoran's Irish Pub
Les Editeurs
Dédé la Frite
Mariage Frères Salon de Thé
Les Deux Magots
Berthillon
Frog & Princess
Emporio Armani Caffé
Café de Flore
Chez Prune
Le Barrio Latino
Café Beaubourg
Au bon coin
Le Fumoir
Le P'tit Troquet
Terminus Nord
La Closerie des Lilas
Haagen-Dazs
Senderens
Le Grand B
Le Pick-Clops
Bread and Roses
Café de la Paix
The Bombardier
Café de L'Alma
Le Re
</pre>
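
`$elemMatch` requires one and the same review to satisfy both conditions. Writing the conditions as two separate dotted fields (`reviews.language` and `reviews.rating`) would also match a venue where one review is in English and a different review has a high rating. The pure-Python sketch below mimics both behaviours; the function names and the two mini review lists are hypothetical and only serve to illustrate the difference.

```python
def elem_match(reviews, language="en", min_rating=3):
    """True if ONE review is in `language` AND rated above `min_rating`."""
    return any(
        r.get("language") == language and r.get("rating", 0) > min_rating
        for r in reviews
    )


def separate_conditions(reviews, language="en", min_rating=3):
    """True if some review is in `language` and some (possibly other) review is rated above `min_rating`."""
    has_language = any(r.get("language") == language for r in reviews)
    has_rating = any(r.get("rating", 0) > min_rating for r in reviews)
    return has_language and has_rating


# Hypothetical venue: an English review rated 2 and a French review rated 5.
mixed = [{"language": "en", "rating": 2}, {"language": "fr", "rating": 5}]
# Hypothetical venue: a single English review rated 4.
good = [{"language": "en", "rating": 4}]

print(elem_match(mixed), separate_conditions(mixed))  # False True
print(elem_match(good), separate_conditions(good))    # True True
```

To return the review lists themselves rather than the venue names, a projection such as `{'name': 1, 'reviews': 1}` could be passed as the second argument of `find`.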

#### Question 2.3 - Grouping data
- Group venues by type and count them.

Compare your results with those of question 1.4 above.

<pre>
client = MongoClient('mongodb://localhost:27017/')
with client:
    db = client.tourPedia
    # Group by category and count one per document.
    agr = [{'$group': {"_id": "$category", "sum": {'$sum': 1}}}]
    val = list(db.paris.aggregate(agr))
for v in val:
    print(v)
</pre>

<pre>
{'_id': 'accommodation', 'sum': 3376}
{'_id': 'restaurant', 'sum': 21823}
{'_id': 'poi', 'sum': 26846}
{'_id': 'attraction', 'sum': 4316}
</pre>
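
The counts agree with those listed under question 1.4, but the rows come back in a different order and the sums there are displayed as doubles (`3376.0`). `$group` does not guarantee any output order, so a `$sort` stage can be appended for a stable listing. The sketch below shows such a pipeline together with a hypothetical helper, `same_counts`, that compares two result sets while ignoring row order and number type.

```python
# Same grouping as above, with an added $sort stage for a stable order.
PIPELINE = [
    {"$group": {"_id": "$category", "sum": {"$sum": 1}}},
    {"$sort": {"_id": 1}},
]


def same_counts(rows_a, rows_b):
    """Compare two result sets, ignoring row order and int/float differences."""

    def as_dict(rows):
        return {row["_id"]: int(row["sum"]) for row in rows}

    return as_dict(rows_a) == as_dict(rows_b)


# Rows as listed under question 1.4 and as printed by the pymongo cell.
question_1_4 = [
    {"_id": "accommodation", "sum": 3376.0},
    {"_id": "attraction", "sum": 4316.0},
    {"_id": "poi", "sum": 26846.0},
    {"_id": "restaurant", "sum": 21823.0},
]
question_2_3 = [
    {"_id": "accommodation", "sum": 3376},
    {"_id": "restaurant", "sum": 21823},
    {"_id": "poi", "sum": 26846},
    {"_id": "attraction", "sum": 4316},
]

print(same_counts(question_1_4, question_2_3))  # True
```

`PIPELINE` would be passed to `db.paris.aggregate(...)` exactly like `agr` in the cell above.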


#### Question 2.4 - Aggregating data
- For venues of type "accommodation", give the number of venues per service.

Compare your results with those of question 1.5 above.

<pre>
client = MongoClient('mongodb://localhost:27017/')
with client:
    db = client.tourPedia
    agr = [
        {'$match': {"category": "accommodation"}},             # keep accommodations only
        {'$unwind': "$services"},                              # one document per service
        {'$group': {"_id": "$services", "sum": {'$sum': 1}}},  # count venues per service
    ]
    val = list(db.paris.aggregate(agr))
for v in val:
    print(v)
</pre>

{'_id': ' slovaque', 'sum': 2}
{'_id': 'roumain', 'sum': 10}
{'_id': 'bibliothèque', 'sum': 76}
{'_id': 'bain turc à vapeur', 'sum': 43}
{'_id': 'letton', 'sum': 1}
{'_id': "équipe d'animation", 'sum': 2}
{'_id': 'restaurant (à la carte)', 'sum': 69}
{'_id': 'club pour enfants', 'sum': 1}
{'_id': 'ascenseur', 'sum': 1280}
{'_id': ' italien', 'sum': 145}
{'_id': 'turc', 'sum': 2}
{'_id': 'hébreu', 'sum': 12}
{'_id': 'slovène', 'sum': 1}
{'_id': 'enregistrement départ privé', 'sum': 24}
{'_id': 'centre de remise en forme', 'sum': 85}
{'_id': 'randonnée pédestre', 'sum': 4}
{'_id': ' portugais', 'sum': 57}
{'_id': 'presse à pantalons', 'sum': 23}
{'_id': 'distributeur automatique de billets sur place', 'sum': 8}
{'_id': ' cadeaux', 'sum': 38}
{'_id': 'néerlandais', 'sum': 16}
{'_id': 'salon de coiffure institut de beauté', 'sum': 20}
{'_id': 'installations pour barbecue', 'sum': 2}
{'_id': 'livraison de courses', 'sum': 1}
{'_id': ' thaïlandais', 'sum': 1}
{'_id': 'piscine intérieure', 's