# Module 5 Activities

Completing the activities here will allow you to demonstrate your ability to:

-   Leverage `map`, `filter`, and `reduce` functions in Python to simplify data processing.
-   Implement custom error handling for Python applications, including exception handling.
-   Leverage a variety of concepts to implement advanced text analysis processes in Python.

## Instructions

For each activity:

-   Use lambda, `map`, `reduce`, and `filter` functions as appropriate to solve these questions rather than traditional looping and conditional statements.

-   You may apply any appropriate transformation to the data before applying lambda, `map`, `reduce`, and `filter` functions.

-   You may use any Python data structures.

-   Include appropriate exception handling for exceptions related to conversion errors or I/O errors.
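
As a rough illustration of the pattern these instructions describe, the sketch below converts raw strings to integers with `map`, drops failed conversions with `filter`, and totals the rest with `reduce`. The helper name `to_int` and the sample values are hypothetical and do not come from any activity dataset.

```python
from functools import reduce


def to_int(text):
    """Convert text to int, returning None when the conversion fails."""
    try:
        return int(text)
    except (TypeError, ValueError):
        # int(None) raises TypeError; int("n/a") raises ValueError.
        return None


raw_scores = ["12", "7", "n/a", "30", None]  # hypothetical sample input

converted = map(to_int, raw_scores)
valid = list(filter(lambda value: value is not None, converted))
total = reduce(lambda acc, value: acc + value, valid, 0)

print(valid)  # [12, 7, 30]
print(total)  # 49
```

Returning `None` from the conversion helper keeps the exception handling in one place, so the `map`/`filter`/`reduce` pipeline itself stays free of `try` blocks.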

## Activity 1

For this activity, download and use the file [prize.json](https://the-software-guild.s3.amazonaws.com/techstart-1909/data-files/prize.json).

This dataset includes information about Nobel Prizes, including the laureates who received each prize, when the prize was awarded, and the category of the prize.

1.  Identify the most recent year in the dataset in which someone received a Nobel Prize.
1.  Identify the earliest year in which someone received a Nobel Prize.
1.  Identify the category with the highest number of prizes.
1.  Identify the laureate with the highest number of prizes.
1.  Identify the laureate who won the most recent prize in peace.
1.  Identify the laureate who won the most recent prize in medicine.
1.  Identify the year in which the most laureates jointly won the same prize.
1.  How many prizes have been given in the economics category?
1.  How many prizes have been given in peace?
1.  How many prizes have been given in literature?
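
One possible way to approach the counting and "most recent" questions with `map`, `filter`, and `reduce` is sketched below. It assumes each entry under `"prizes"` carries `year`, `category`, and `laureates` keys. The three sample records are invented for illustration, and `count_by_category` is a hypothetical helper name.

```python
from functools import reduce

# Invented sample shaped like the assumed "prizes" list; not real data.
prizes = [
    {"year": "2001", "category": "peace", "laureates": [{"id": "1"}, {"id": "2"}]},
    {"year": "2001", "category": "physics", "laureates": [{"id": "3"}]},
    {"year": "1999", "category": "peace", "laureates": [{"id": "4"}]},
]


def count_by_category(counts, prize):
    """Reducer: return a new dict with this prize's category incremented."""
    category = prize["category"]
    return {**counts, category: counts.get(category, 0) + 1}


category_counts = reduce(count_by_category, prizes, {})
# Rank categories by their count, not by their name.
top_category = max(category_counts, key=category_counts.get)

# Years are stored as strings, so convert before comparing.
years = list(map(lambda prize: int(prize["year"]), prizes))

peace = filter(lambda prize: prize["category"] == "peace", prizes)
latest_peace = max(peace, key=lambda prize: int(prize["year"]))

print(category_counts)         # {'peace': 2, 'physics': 1}
print(top_category)            # peace
print(min(years), max(years))  # 1999 2001
print(latest_peace["year"])    # 2001
```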

In [3]:
# your code here
import json
from pprint import pprint 

file_path = 'FileIO-DataFiles/prizes.json'

# Keep asking until a file loads; otherwise `data` would be undefined below.
while True:
    try:
        with open(file_path, 'r') as jsonfile:
            data = json.load(jsonfile)
    except (TypeError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError.
        print("The file could not be decoded")
        file_path = input("Enter a valid file path:\n")
    except FileNotFoundError:
        print("The JSON file entered was not found! Please try again!")
        file_path = input("Enter a valid file path:\n")
    else:
        print("That was a valid JSON file!")
        break
    
_min = 100000
_max = 0

for k in data['prizes']:
    for key, val in k.items():
        if key == "year":
            value = int(val)
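            if value < _min:
                _min = value
            # Check the maximum independently: with `elif`, a year that sets
            # a new minimum could never also set the maximum.
            if value > _max:
                _max = value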
        
prizes = data['prizes']
new_dict = dict()

for prize in prizes:
    for k, v in prize.items():
        if k == "category" and v not in new_dict:
            new_dict[v] = 1
        elif k == "category" and v in new_dict:
            val = new_dict[v]
            val += 1
            new_dict[v] = val
        else:
            pass
        
new_dict2 = dict()

for prize in prizes:
    for k, v in prize.items():
        if k == "laureates":
            for d in v:
                # pprint(d)
                for key, values in d.items():
                    if key == "id" and values not in new_dict2:
                        new_dict2[values] = 1
                        
                    elif key == "id" and values in new_dict2:
                        new_value = new_dict2[values]
                        new_value += 1
                        new_dict2[values] = new_value
                    else:
                        pass
                    
max_4 = 0
keys_id = None
                    
for k, v in new_dict2.items():
    value = int(v)
    if value > max_4:
        max_4 = value
        keys_id = k

key_name = ""
for prize in prizes:
    for k, v in prize.items():
        if k == "laureates":
            for d in v:
                # pprint(d)
                for key, values in d.items():
                    if key == "id" and values == keys_id:
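                        # Join first name and surname; organizations have an empty surname.
                        key_name = (d['firstname'] + " " + (d.get('surname') or '')).strip()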
        
list_of_2017 = []
        
for k in data['prizes']:
    for key, val in k.items():
        if key == "year" and val == str(_max):
            list_of_2017.append(k)
           
list_of_peace = []

for k in list_of_2017:
    for key, val in k.items():
        if key == "category" and val == "peace":
            list_of_peace.append(k)


list_of_medicine = []

for k in list_of_2017:
    for key, val in k.items():
        if key == "category" and val == "medicine":
            list_of_medicine.append(k)

year_dic = dict()
new_dict3 = dict()

def find_number_laureates(prize):
    # Return 0 when a prize has no "laureates" entry so the totals below never add None.
    return len(prize.get("laureates", []))

for prize in prizes:
    for k, v in prize.items():
        if k == "year":
            if v not in new_dict3:
                new_dict3[v] = find_number_laureates(prize)
            else:
                new_dict3[v] += find_number_laureates(prize)

max_7 = 0
_year = None

for k, v in new_dict3.items():
    if v > max_7:
        max_7 = v
        _year = k

for k, v in new_dict.items():
    if k == "economics":
        ecoK = k
        ecoV = v
    elif k == "peace":
        pK = k
        pV = v
    elif k == "literature":
        litK = k
        litV = v
    else:
        pass
        
# Identify the laureate that won the most recent prize in peace.
# Identify the laureate that won the most recent prize in medicine.
# Identify the year when the most laureates jointly won the same prize in the same year.        

print("The most recent year was", _max, "and the earliest year was", _min)
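# max() over a dict compares its keys alphabetically, so rank by count instead.
top_category = max(new_dict, key=new_dict.get)
print("The most common category of Nobel prize is", top_category, "with", new_dict[top_category], "occurrences.")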
print("The person who won the most is", key_name, "with", max_4, "occurrences.")
print("The most recent nobel peace prize winner was", list_of_peace)
print("The most recent nobel medicine prize winner was", list_of_medicine)
print("The most jointly won laureates within the years was", max_7, "in the year", _year)
print("The amount of prizes in the", ecoK, "category was", ecoV, ".")
print("The amount of prizes in the", pK, "category was", pV, ".")
print("The amount of prizes in the", litK, "category was", litV, ".")

That was a valid JSON file!
The most recent year was 2017 and the earliest year was 1901
The most common category of Nobel prize is physics with 111 occurrences.
The person who won the most is Comité international de la Croix Rouge (International Committee of the Red Cross) with 3 occurrences.
The most recent nobel peace prize winner was [{'year': '2017', 'category': 'peace', 'laureates': [{'id': '948', 'firstname': 'International Campaign to Abolish Nuclear Weapons (ICAN)', 'motivation': '"for its work to draw attention to the catastrophic humanitarian consequences of any use of nuclear weapons and for its ground-breaking efforts to achieve a treaty-based prohibition of such weapons"', 'share': '1', 'surname': ''}]}]
The most recent nobel medicine prize winner was [{'year': '2017', 'category': 'medicine', 'laureates': [{'id': '938', 'firstname': 'Jeffrey C.', 'surname': 'Hall', 'motivation': '"for their discoveries of molecular mechanisms controlling the circadian rhythm"', 'share': '3

## Activity 2

For this exercise, use the *evolutionPopUSA\_MainData.csv* file, which you can find on [Figshare](https://figshare.com/articles/Main_Dataset_for_Evolution_of_Popular_Music_USA_1960_2010_/1309953).

Take some time to review the information about the dataset on that page before completing this activity.

For the following questions, use lambda, `map`, `reduce`, and `filter` functions as appropriate. You may apply any transformation to the data before applying the functions, and you may use any Python data structure.

1.  Identify the artist with the most songs in the Pop USA playlist.

2.  Identify the cluster with the most songs.

3.  Identify the genres within each cluster.

4.  Identify the era with the most songs.
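
A minimal sketch of the "most songs per X" questions follows. It assumes the CSV exposes columns named `artist_name`, `cluster`, and `era`; check these against the real header row, since they are assumptions here. The rows are invented, and `tally` and `most_common` are hypothetical helper names.

```python
import csv
import io
from functools import reduce

# Invented rows; the real file's column names and values may differ.
sample_csv = io.StringIO(
    "artist_name,cluster,era\n"
    "Artist A,3,1\n"
    "Artist B,3,2\n"
    "Artist A,5,2\n"
)

rows = list(csv.DictReader(sample_csv))


def tally(column):
    """Return a {value: count} dict for one column, built with reduce."""
    return reduce(
        lambda counts, row: {**counts, row[column]: counts.get(row[column], 0) + 1},
        rows,
        {},
    )


def most_common(column):
    """Return the (value, count) pair with the highest count."""
    return max(tally(column).items(), key=lambda item: item[1])


print(most_common("artist_name"))  # ('Artist A', 2)
print(most_common("cluster"))      # ('3', 2)
print(most_common("era"))          # ('2', 2)
```

`csv.DictReader` yields every value as a string, so any numeric comparison would need an explicit conversion first.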

In [None]:
# your code here

## Activity 3

Create a program to find the following information using the [restaurant.json](https://the-software-guild.s3.amazonaws.com/techstart-1909/data-files/restaurant.json) file.

Use lambda, `map`, `reduce`, and `filter` functions as appropriate to answer the following questions. You may apply any transformation to the data before applying the functions, and you may use any Python data structure.

1.  Compute the average score for each restaurant.

2.  Compute the minimum score for each restaurant.

3.  Compute the maximum score for each restaurant.

4.  Compute the average score for each type of cuisine in each borough.

5.  Compute the minimum score for each type of cuisine in each borough.

6.  Compute the maximum score for each type of cuisine in each borough.
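
The per-restaurant and per-group statistics could be sketched roughly as follows. The field names (`name`, `borough`, `cuisine`, and a `grades` list whose items have a `score`) are assumptions about restaurant.json, the records are invented, and `summarize`, `scores_of`, and `group_scores` are hypothetical helpers.

```python
from functools import reduce

# Invented records; the field names are assumptions about restaurant.json.
restaurants = [
    {"name": "Cafe One", "borough": "Queens", "cuisine": "Bakery",
     "grades": [{"score": 10}, {"score": 4}]},
    {"name": "Cafe Two", "borough": "Queens", "cuisine": "Bakery",
     "grades": [{"score": 6}]},
    {"name": "Empty Diner", "borough": "Bronx", "cuisine": "American",
     "grades": []},
]


def summarize(scores):
    """Return (average, minimum, maximum), or None for an empty score list."""
    if not scores:
        return None
    total = reduce(lambda acc, score: acc + score, scores, 0)
    return (total / len(scores), min(scores), max(scores))


def scores_of(restaurant):
    """Extract the list of scores from one restaurant's grades."""
    return list(map(lambda grade: grade["score"], restaurant["grades"]))


per_restaurant = dict(
    map(lambda r: (r["name"], summarize(scores_of(r))), restaurants)
)


def group_scores(groups, restaurant):
    """Reducer: collect scores under a (borough, cuisine) key."""
    key = (restaurant["borough"], restaurant["cuisine"])
    return {**groups, key: groups.get(key, []) + scores_of(restaurant)}


grouped = reduce(group_scores, restaurants, {})
per_group = dict(map(lambda item: (item[0], summarize(item[1])), grouped.items()))

print(per_restaurant["Cafe One"])  # (7.0, 4, 10)
print(per_group[("Queens", "Bakery")])
```

Returning `None` for a restaurant with no grades avoids a `ZeroDivisionError` in the average and makes the missing data explicit.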

### Challenge

Identify at least three other values in the dataset for which you can calculate the average, minimum, and maximum, then compute them.

In [None]:
# your code here

## Activity 4

Download this file to use for this activity: [stocks.zip](https://the-software-guild.s3.amazonaws.com/techstart-1909/data-files/stocks.zip).

-   You will need to extract or uncompress the zip file before using the data.

-   The zipped file contains many text files, each of which represents
    the stocks for one particular company. You should treat this
    collection as a single dataset rather than as individual files when
    solving the problems listed below.

Write a program to retrieve data to find the following information:

1. Return a list of all company indexes that appear at least once in
   the dataset, sorted alphabetically by index value.

2. Return the number of publicly traded companies for each year in the
   dataset.

3. Return a list of indexes of companies that were publicly traded in each
   year in the dataset, grouped by year.

   - For example, the output should look something like this:

     <pre>{ "Year 1": [index1, index2, index3, …],
     "Year 2": [index1, index2, index3, …],
     … }</pre>

4. Choose a company index value and return all historical pricing data for
   the company.

5. Choose a company index value and year and return the pricing data for
   the company for that particular year.

6. Choose a company index value, a year, and a month, and return all
   pricing data for the company for that particular year and month.

7. Choose a year and return the list of companies that were publicly
   traded during that year. The results should include all companies
   that traded for at least one day during the given year.
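
Once the text files have been parsed into one combined list, the queries above might look something like the sketch below. The record layout is entirely hypothetical (a `symbol` field standing in for the company index, an ISO-style `date` string, and a `close` price), so inspect the actual files before relying on it.

```python
from functools import reduce

# Hypothetical parsed rows; the real files' layout must be checked first.
records = [
    {"symbol": "AAA", "date": "2001-03-15", "close": 10.0},
    {"symbol": "AAA", "date": "2002-01-02", "close": 11.5},
    {"symbol": "BBB", "date": "2001-07-09", "close": 20.0},
]


def year_of(record):
    """Return the four-digit year prefix of the record's date string."""
    return record["date"][:4]


# Every symbol that appears at least once, sorted alphabetically.
symbols = sorted(set(map(lambda record: record["symbol"], records)))


def add_symbol(by_year, record):
    """Reducer: add the record's symbol to the set for its year."""
    year = year_of(record)
    return {**by_year, year: by_year.get(year, set()) | {record["symbol"]}}


symbols_by_year = reduce(add_symbol, records, {})
counts_by_year = dict(
    map(lambda item: (item[0], len(item[1])), symbols_by_year.items())
)


def prices(symbol, year=None, month=None):
    """Filter rows by symbol, optionally narrowing to a year and month."""
    def matches(record):
        date = record["date"]
        return (record["symbol"] == symbol
                and (year is None or date[:4] == year)
                and (month is None or date[5:7] == month))
    return list(filter(matches, records))


print(symbols)         # ['AAA', 'BBB']
print(counts_by_year)  # {'2001': 2, '2002': 1}
print(prices("AAA", "2001", "03"))
```

Storing a set of symbols per year means a company that traded on many days in a year is still counted once for that year.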


In [None]:
# your code here