<a href="https://colab.research.google.com/github/organisciak/Scripting-Course/blob/master/labs/10-aggregations.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Week 10 Lab: Aggregations

*Reminder - save your work. Go to* File > Save a Copy in Drive *to ensure that you have your work saved.*

This week's questions focus on aggregations in MongoDB. There is no other review this week, but keep practicing your SQL or Pandas in the context of your projects!

For reference, here are the most notable *stages* of the pipeline:

- **\$match**: Select a subset of data (as you can do with 'find')
- **\$sort**: Order data by the values of a certain key
- **\$group**: Group data based on a key - like 'groupby' in Pandas
- **\$limit**: Trim the number of documents in the dataset
- **\$unwind**: Deconstruct an array, so that there is a document for every value of the array
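
As a rough sketch of how these stages chain together, the pipeline below is written against a hypothetical `books` collection whose documents have a `year` key and a `genres` array. None of these names come from this week's data. A pipeline is an ordinary Python list of stage dictionaries, applied in order and passed to `collection.aggregate()`.

```python
# Hypothetical example: the `books` collection and its keys
# (`year`, `genres`) are assumptions made for illustration only.
pipeline = [
    {"$match": {"year": {"$gte": 2000}}},   # keep a subset of documents
    {"$unwind": "$genres"},                 # one document per value in the array
    {"$group": {"_id": "$genres",           # group on the unwound value...
                "count": {"$sum": 1}}},     # ...and count the documents per group
    {"$sort": {"count": -1}},               # largest groups first
    {"$limit": 3},                          # keep only the top three
]

# With a live connection, this could be run as:
# for doc in db.books.aggregate(pipeline):
#     print(doc)
```

Order matters: sorting before grouping, for example, would sort the raw documents rather than the group counts.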

## Dataset 1: Foursquare venues

We'll be working with a dataset of 50 venues near the University of Denver. The file is [close_venues.json](https://raw.githubusercontent.com/organisciak/Scripting-Course/master/data/close_venues.json). Dr. O has already put it into MongoDB in a database called `week10` and a collection named `foursquare`.

In [4]:
#@title Connect to a MongoDB database
#@markdown This cell connects to a remote MongoDB instance.
#@markdown The database is set to a variable `db` - write the
#@markdown code to set a `collection` variable that points to the
#@markdown collection named "foursquare"
!pip install dnspython
from urllib.parse import quote_plus
from pymongo import MongoClient
import os
from getpass import getpass

if os.path.exists('credentials.txt'):
    # Allow loading credentials for user, pw, url, one per line
    with open('credentials.txt', mode='r') as f:
        user, mongopw, cluster_url = [l.strip() for l in f.readlines()]
else:
    user = "scriptingStudent" #@param {type:"string"}
    cluster_url = "cluster0-ga5s0.mongodb.net" #@param {type:"string"}
    mongopw = getpass('Enter your MongoDB password for "{}":\n'.format(user))
    with open('credentials.txt', mode='w') as f:
        f.write("{}\n{}\n{}".format(user, mongopw, cluster_url))

client = MongoClient("mongodb+srv://{}:{}@{}/test?retryWrites=true&w=majority".format(quote_plus(user), quote_plus(mongopw), cluster_url))
db = client.week10



In [10]:
collection.count_documents({})

50

**Q1)** What's the venue with a cross-street of 'E Iliff Ave'? Remember that nested object keys can be referred to using `.`, like `key.subkey`. Paste the exact name string for auto-grading (e.g. if it says 'name': 'XYZ', enter *XYZ* as the answer)  (*6pts*)
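
To illustrate the dot notation, here is a minimal sketch using a made-up document. The keys `owner` and `last` are hypothetical and do not appear in the venue data. The small `get_path` helper is plain Python, included only to show which value a dotted key refers to.

```python
from functools import reduce

# Hypothetical document shape, not taken from the lab data.
doc = {"name": "Corner Cafe", "owner": {"first": "Ada", "last": "Lopez"}}

# A MongoDB filter reaches into the nested object by joining keys with dots.
query = {"owner.last": "Lopez"}

def get_path(document, dotted_key):
    """Follow a dotted key such as 'owner.last' through nested dictionaries."""
    return reduce(lambda d, key: d[key], dotted_key.split("."), document)

# The value that the dotted key in `query` would be compared against.
nested_value = get_path(doc, "owner.last")

# With a live collection, the filter would be used as:
# collection.find_one(query)
```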

In [0]:
q1_answer = "" #@param {type:'string'}

**Q2)** Which of these results has received the most 'tips'? (*5pts*)

*   A) 'Daniels College of Business'
*   B) 'Nelson Hall Dining'
*   C) 'Anderson Academic Commons'



In [0]:
q2_answer = "A) 'Daniels College of Business'" #@param ["A) 'Daniels College of Business'", "B) 'Nelson Hall Dining'", "C) 'Anderson Academic Commons'"]

**Q3)** Write an aggregation pipeline to answer the above question. (*8pts*)

In [0]:
# Answer-Q3 (Write your answer here)


**Q4)** How much do the following categories show up?  (*5pts*)

*   'College Academic Building'
*   'Bookstore'
*   'College Library'



In [0]:
# Count for 'College Academic Building'
q4a_answer = "" #@param {type:"string"}

In [0]:
# Count for 'Bookstore'
q4b_answer = "" #@param {type:"string"}

In [0]:
# Count for 'College Library'
q4c_answer = "" #@param {type:"string"}

**Q5)** Write an aggregation pipeline to answer the above question.  (*8pts*)

In [0]:
# Answer-Q5 (Write your answer here)


**Q6)** What's the average distance for each of these categories, rounding down to the nearest whole number:  (*8pts*)
  - Bagel Shop
  - College Administrative Building
  - College Academic Building
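
"Rounding down" here means taking the floor of the average, not rounding to the nearest integer. A quick sketch with a made-up value (the number is hypothetical, not an answer to this question):

```python
import math

average = 412.87  # hypothetical average distance, for illustration only

rounded_down = math.floor(average)  # 412: drops the fractional part
nearest = round(average)            # 413: rounds to the nearest integer instead
```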

In [0]:
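# Average distance for 'Bagel Shop'
q6a_answer = "" #@param {type:"string"}

In [0]:
# Average distance for 'College Administrative Building'
q6b_answer = "" #@param {type:"string"}

In [0]:
# Average distance for 'College Academic Building'
q6c_answer = "" #@param {type:"string"}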

## Dataset 2: Recommended Food in Denver

Our second dataset has 50 recommended food venues and is in a collection called `popular`. The original data is [popular_venues.json](https://raw.githubusercontent.com/organisciak/Scripting-Course/master/data/popular_venues.json).

In [0]:
import json

# Assumes popular_venues.json has been downloaded to the working directory
with open('popular_venues.json') as f:
    data2 = json.load(f)

In [0]:
# `collection` should now point to the `popular` collection
collection.count_documents({})

50

**Q7)** In this dataset of popular food places, which of the following categories is best represented?  (*7pts*)

*   A) 'Breakfast Spot'
*   B) 'Sushi Restaurant'
*   C) 'Burger Joint'

In [0]:
q7_answer = "A) 'Breakfast Spot'" #@param ["A) 'Breakfast Spot'", "B) 'Sushi Restaurant'", "C) 'Burger Joint'"]

**Q8)** Which restaurant has the highest rating? Paste the exact 'name' string. (*5pts*)

In [0]:
q8_answer = "" #@param {type:"string"}

**Q9)** Which restaurant has the most categories? Paste the name string.  (*5pts*)

In [0]:
q9_answer = "" #@param {type:"string"}

**Q10)** Write the aggregation pipeline to get the above answer.  (*8pts*)

In [0]:
# Answer-Q10 (Write your answer here)


**Q11)** Which restaurant has the most tips in this dataset? (Folding tipCounts from restaurants with multiple locations).  (*7pts*)

In [0]:
q11_answer = "" #@param {type:"string"}

**Q12)** Which 'tip' has the most likes?  (*7pts*)

*   A) "Get the Cricket Burger..."
*   B) "Voted Best Overall Wine List..."
*   C) "The Cinco Burger combines the best of..."

In [0]:
q12_answer = "A) \"Get the Cricket Burger...\"" #@param ["A) \"Get the Cricket Burger...\"", "B) \"Voted Best Overall Wine List...\"", "C) \"The Cinco Burger combines the best of...\""]

**Q13)** What's the Male/Female gender split among users providing tips?  (*6pts*)

*  A) 50/50
*  B) 33/66
*  C) 66/33

In [0]:
q13_answer = "A) 50/50" #@param ["A) 50/50", "B) 33/66", "C) 66/33"]

**Q14)** Write the code to determine which category of restaurant has the fewest average checkins, focusing only on categories with 3 or more restaurants in this dataset. Tip: the answer to the question is 'Café', with an average of 1258.7 checkins, followed by 'Pizza Place'. (*15pts*)

In [0]:
# Answer-Q14 (Write your answer here)


# Submission Instructions

In [0]:
#@markdown ### First, enter your name for grading
my_name = "" #@param { type:'string' }

#@markdown _Have you saved your work for yourself? Don't forget to Save a Copy in Drive so that you have your progress._

In [0]:
#@markdown ### Second, check your work:

#@markdown - Have you answered all the questions?
#@markdown     - Some answers can be checked automatically - just run this cell.
#@markdown - Does this notebook run from top to bottom?
#@markdown     - Go to "Runtime > Restart and run all..." to check. Do all the cells run, to the very bottom, or is there a cell in the middle with an error?
#@markdown - Have you completed all the answers where you entered code, keeping the `# Answer-Qx` line at the start of those cells?

#@markdown *A lab that the professor has to fix manually will lose 10pts - run the checks!*

# Question types and point values as listed in the questions above.
lab10 = dict(
    q1 = dict(entrytype='var', pts=6),
    q2 = dict(entrytype='var', pts=5),
    q3 = dict(entrytype='cell', pts=8),
    q4 = dict(entrytype='multi', pts=5),
    q5 = dict(entrytype='cell', pts=8),
    q6 = dict(entrytype='multi', pts=8),
    q7 = dict(entrytype='var', pts=7),
    q8 = dict(entrytype='var', pts=5),
    q9 = dict(entrytype='var', pts=5),
    q10 = dict(entrytype='cell', pts=8),
    q11 = dict(entrytype='var', pts=7),
    q12 = dict(entrytype='var', pts=7),
    q13 = dict(entrytype='var', pts=6),
    q14 = dict(entrytype='cell', pts=15),
)
# Only 'var' answers are pre-checked here. Q4 and Q6 use several form
# fields and are marked 'multi' so that this loop skips them.
var_qs = [k for k, v in lab10.items() if v['entrytype'] == 'var']
for q in var_qs:
    qvar = '{}_answer'.format(q)
    assert qvar in globals(), "I don't see '{}' - did you run the cell where you set it?".format(qvar)
    answer = globals()[qvar]
    assert answer != "", "{} is blank - did you mean to do that?".format(qvar)
print("Good work. The ones that I can pre-check look correct.")

#@markdown ### Finally, submit it

#@markdown - Download the file with "File > Download .ipynb" and submit it to the Canvas assignment page.