<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Project:" data-toc-modified-id="Project:-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Project:</a></span></li><li><span><a href="#Computational-details:" data-toc-modified-id="Computational-details:-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Computational details:</a></span></li><li><span><a href="#Database:-Nobel-Prize-API" data-toc-modified-id="Database:-Nobel-Prize-API-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Database: Nobel Prize API</a></span></li><li><span><a href="#Results:" data-toc-modified-id="Results:-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Results:</a></span><ul class="toc-item"><li><span><a href="#Missing-nobel-prizes" data-toc-modified-id="Missing-nobel-prizes-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Missing nobel prizes</a></span></li><li><span><a href="#Winner-of-nobel-in-Physics:" data-toc-modified-id="Winner-of-nobel-in-Physics:-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>Winner of nobel in Physics:</a></span></li></ul></li></ul></div>

# Project:
List every year in which at least one of the original Nobel Prize categories (those awarded in 1901) was not awarded, together with the missing categories, in the format shown below.

We solve this in two ways: the first method groups the results in Python with the itertools.groupby function, and the second does the grouping inside MongoDB.
In both cases we construct an aggregation pipeline that collects prize documents for all original categories (that is, categories $in the set awarded in 1901) and reports the years in reverse chronological order (i.e., descending year). Project only the prize year and category (including the document _id is fine).


1934: physics

1933: chemistry

1932: peace

1931: physics

1928: peace

1925: medicine

1924: chemistry, peace


In the second part, list the winners of the physics prize for each year, for example:

2013: Englert and Higgs

2012: Haroche and Wineland

2011: Perlmutter and Riess and Schmidt

2010: Geim and Novoselov

2009: Boyle and Kao and Smith

2008: Kobayashi and Maskawa and Nambu

2007: Fert and Grünberg



# Computational details:
In the first method, the aggregation cursor is fed to Python's itertools.groupby function to group prizes by year. For each year in which at least one of the original prize categories was missing, a line listing all missing categories for that year is printed.
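The grouping step can be tried without a database. The following is a minimal sketch, assuming a small hand-made list of dictionaries stands in for the aggregation cursor; the toy_docs values, the toy_categories set, and the helper name missing_by_year are illustrative assumptions, not actual prize records or part of the original notebook.

```python
from itertools import groupby
from operator import itemgetter

# Toy stand-in for the "original" categories (assumption, not real data).
toy_categories = {"chemistry", "peace", "physics"}

# Toy stand-in for the aggregation cursor: documents already sorted by
# year in descending order (invented values for illustration only).
toy_docs = [
    {"year": "2003", "category": "physics"},
    {"year": "2003", "category": "peace"},
    {"year": "2003", "category": "chemistry"},
    {"year": "2002", "category": "physics"},
    {"year": "2001", "category": "chemistry"},
    {"year": "2001", "category": "physics"},
]


def missing_by_year(docs, original):
    """Return (year, sorted missing categories) for years with gaps."""
    result = []
    # groupby yields one group per run of consecutive equal years.
    for year, group in groupby(docs, key=itemgetter("year")):
        missing = original - {doc["category"] for doc in group}
        if missing:
            result.append((year, sorted(missing)))
    return result


for year, missing in missing_by_year(toy_docs, toy_categories):
    print("{year}: {missing}".format(year=year, missing=", ".join(missing)))
```

Because itertools.groupby only merges consecutive items with the same key, the input must already be sorted by year. A year with no documents at all in the original categories forms no group, so it would not appear in the output.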


# Database: Nobel Prize API
Nobelprize.org offers open data to developers in two ways: An API and as Linked Data. The data is free to use and contains information about who has been awarded the Nobel Prize, when, in what prize category and the motivation, as well as basic information about the Nobel Laureates such as birth data and the affiliation at the time of the award. The data is regularly updated as the information on Nobelprize.org is updated, including at the time of announcements of new Laureates.

https://nobelprize.readme.io/docs/getting-started




# Results:
## Missing nobel prizes

In [59]:
import pymongo
from pymongo import MongoClient
client = MongoClient('localhost',27017)
client.nobeldatacamp.list_collection_names()

['prizes', 'laureates']

In [67]:
#  Check how many documents are in each collection
n_prizes = client.nobeldatacamp.prizes.count_documents({})
n_laureates = client.nobeldatacamp.laureates.count_documents({})

# Print the document counts
print("number of documents in prize:", n_prizes)
print("number of documents in laureates:", n_laureates)


db = client.nobeldatacamp

number of documents in prize: 590
number of documents in laureates: 934


In [74]:
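# Retrieve one sample document from each collection
prize = db.prizes.find_one()
laureate = db.laureates.find_one()

# Get the list of fields present in each type of document
prize_fields = list(prize.keys())
laureate_fields = list(laureate.keys())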

print(prize_fields)
print(laureate_fields)

['_id', 'year', 'category', 'overallMotivation', 'laureates']
['_id', 'id', 'firstname', 'surname', 'born', 'died', 'bornCountry', 'bornCountryCode', 'bornCity', 'diedCountry', 'diedCountryCode', 'gender', 'prizes']


In [73]:
# Retrieve a sample prize document
db.prizes.find_one()

{'_id': ObjectId('5edf82888b17802c1e585f46'),
 'year': '2018',
 'category': 'physics',
 'overallMotivation': '“for groundbreaking inventions in the field of laser physics”',
 'laureates': [{'id': '960',
   'firstname': 'Arthur',
   'surname': 'Ashkin',
   'motivation': '"for the optical tweezers and their application to biological systems"',
   'share': '2'},
  {'id': '961',
   'firstname': 'Gérard',
   'surname': 'Mourou',
   'motivation': '"for their method of generating high-intensity, ultra-short optical pulses"',
   'share': '4'},
  {'id': '962',
   'firstname': 'Donna',
   'surname': 'Strickland',
   'motivation': '"for their method of generating high-intensity, ultra-short optical pulses"',
   'share': '4'}]}

In [90]:
from collections import OrderedDict
from itertools import groupby
from operator import itemgetter

original_categories = set(db.prizes.distinct("category", {"year": "1901"}))
print("original 1901", original_categories)

# Save a pipeline to collect original-category prizes, newest year first
pipeline = [
    {"$match": {"category": {"$in": list(original_categories)}}},
    {"$project": {"category": 1, "year": 1}},
    {"$sort": OrderedDict([("year", -1)])}
]
docs = db.prizes.aggregate(pipeline)

for key, group in groupby(docs, key=itemgetter("year")):
    missing = original_categories - {doc["category"] for doc in group}
    if missing:
        print("{year}: {missing}".format(year=key, missing=", ".join(sorted(missing))))

original 1901 {'literature', 'medicine', 'peace', 'chemistry', 'physics'}
2018: literature
1972: peace
1967: peace
1966: peace
1956: peace
1955: peace
1948: peace
1943: literature, peace
1939: peace
1935: literature
1934: physics
1933: chemistry
1932: peace
1931: physics
1928: peace
1925: medicine
1924: chemistry, peace
1923: peace
1921: medicine
1919: chemistry
1918: literature, medicine, peace
1917: chemistry, medicine
1916: chemistry, medicine, peace, physics
1915: medicine, peace
1914: literature, peace


In [102]:
import pymongo
from pymongo import MongoClient
from collections import OrderedDict

original_categories = sorted(set(db.prizes.distinct("category", {"year": "1901"})))
print("original 1901", original_categories)


pipeline = [
    {"$match": {"category": {"$in": original_categories}}},
    {"$project": {"category": 1, "year": 1}},
    
    # Collect the set of category values for each prize year.
    {"$group": {"_id": "$year", "categories": {"$addToSet": "$category"}}},
    
    # Project categories *not* awarded (i.e., that are missing this year).
    {"$project": {"missing": {"$setDifference": [original_categories, "$categories"]}}},
    
    # Only include years with at least one missing category
    {"$match": {"missing.0": {"$exists": True}}},
    
    # Sort in reverse chronological order. Note that "_id" is a distinct year at this stage.
    {"$sort": OrderedDict([("_id", -1)])},
]

docs = db.prizes.aggregate(pipeline)

for doc in docs:
    print("{year}: {missing}".format(year=doc["_id"], missing=", ".join(sorted(doc["missing"]))))

original 1901 ['chemistry', 'literature', 'medicine', 'peace', 'physics']
2018: literature
1972: peace
1967: peace
1966: peace
1956: peace
1955: peace
1948: peace
1943: literature, peace
1939: peace
1935: literature
1934: physics
1933: chemistry
1932: peace
1931: physics
1928: peace
1925: medicine
1924: chemistry, peace
1923: peace
1921: medicine
1919: chemistry
1918: literature, medicine, peace
1917: chemistry, medicine
1916: chemistry, medicine, peace, physics
1915: medicine, peace
1914: literature, peace
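To sanity-check the pipeline logic without a running MongoDB instance, the same stages can be mimicked in plain Python. This is only a sketch of what the stages compute, not how MongoDB executes them; the helper name missing_categories and the toy documents are assumptions made up for illustration.

```python
from collections import defaultdict


def missing_categories(docs, original):
    """Plain-Python analogue of the $match/$group/$project/$match/$sort stages."""
    # $match with $in, then $group with $addToSet: one set per year.
    seen = defaultdict(set)
    for doc in docs:
        if doc["category"] in original:
            seen[doc["year"]].add(doc["category"])

    rows = []
    for year, categories in seen.items():
        # $project with $setDifference: original categories not seen that year.
        missing = sorted(set(original) - categories)
        # $match on "missing.0": keep only years with a non-empty list.
        if missing:
            rows.append({"_id": year, "missing": missing})

    # $sort on _id, descending.
    return sorted(rows, key=lambda row: row["_id"], reverse=True)


# Invented documents in arbitrary order (not actual prize records).
toy_docs = [
    {"year": "2001", "category": "physics"},
    {"year": "2002", "category": "physics"},
    {"year": "2002", "category": "economics"},
    {"year": "2003", "category": "peace"},
    {"year": "2001", "category": "chemistry"},
    {"year": "2002", "category": "chemistry"},
    {"year": "2002", "category": "peace"},
]
toy_categories = ["chemistry", "peace", "physics"]

for row in missing_categories(toy_docs, toy_categories):
    print("{year}: {missing}".format(year=row["_id"], missing=", ".join(row["missing"])))
```

Unlike the itertools.groupby approach, the input order does not matter here, because grouping happens before sorting.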


## Winner of nobel in Physics:
Sorting together: MongoDB + Python
In this exercise we explore the prizes in the physics category. Two sorts are involved: prizes are sorted by year, and when a prize has several winners they are sorted alphabetically by surname. We use Python to sort the laureates of one prize by surname, and MongoDB to sort the prizes by year. The helper function is first tested on a single prize document.

In [76]:
from operator import itemgetter

def all_laureates(prize):  
  # sort the laureates by surname
  sorted_laureates = sorted(prize["laureates"], key=itemgetter("surname"))
  
  # extract surnames
  surnames = [laureate["surname"] for laureate in sorted_laureates]
  
  # concatenate surnames separated with " and " 
  all_names = " and ".join(surnames)
  
  return all_names

# test the first document for the function
print(all_laureates(db.prizes.find_one()))

Ashkin and Mourou and Strickland


In [79]:
# Find physics prizes, project year and laureate surnames, sort by year
# (descending), skip the 5 most recent prizes and return the next 20
docs = db.prizes.find(
    filter={"category": "physics"},
    projection=["year", "laureates.surname"],
    sort=[("year", -1)],
    skip=5,
    limit=20,
)


# print the year and laureate names (from all_laureates)
for doc in docs:
  print("{year}: {names}".format(year=doc["year"], names=all_laureates(doc)))

2013: Englert and Higgs
2012: Haroche and Wineland
2011: Perlmutter and Riess and Schmidt
2010: Geim and Novoselov
2009: Boyle and Kao and Smith
2008: Kobayashi and Maskawa and Nambu
2007: Fert and Grünberg
2006: Mather and Smoot
2005: Glauber and Hall and Hänsch
2004: Gross and Politzer and Wilczek
2003: Abrikosov and Ginzburg and Leggett
2002: Davis Jr. and Giacconi and Koshiba
2001: Cornell and Ketterle and Wieman
2000: Alferov and Kilby and Kroemer
1999: 't Hooft and Veltman
1998: Laughlin and Störmer and Tsui
1997: Chu and Cohen-Tannoudji and Phillips
1996: Lee and Osheroff and Richardson
1995: Perl and Reines
1994: Brockhouse and Shull
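The listing starts at 2013 rather than 2018 because the query skips the five most recent physics prizes before returning twenty. MongoDB applies sort, then skip, then limit, regardless of the order in which the arguments are written. A minimal plain-Python sketch of that order follows, assuming invented documents and a hypothetical helper named find_sorted:

```python
from operator import itemgetter


def find_sorted(docs, skip=0, limit=None):
    """Mimic sort (descending year), then skip, then limit."""
    # Years are four-digit strings, so string order matches chronological order.
    ordered = sorted(docs, key=itemgetter("year"), reverse=True)
    end = None if limit is None else skip + limit
    return ordered[skip:end]


# Invented documents in arbitrary order (not actual prize records).
toy_docs = [{"year": str(year)} for year in (2004, 2001, 2008, 2006, 2002, 2007, 2003, 2005)]

for doc in find_sorted(toy_docs, skip=5, limit=2):
    print(doc["year"])
```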
