In [2]:
from cs103 import *

# Tutorial Solution - Analysis Programs

## Pre-Tutorial Work

None this week.

## Overview

You have a file containing information about newspapers' advertising revenue and circulation revenue.

Take a look at the included file called `newspaper_advertising_and_circulation.csv` to see how it is structured. Two files containing small subsets of this information have also been provided for testing purposes (`newspaper_advertising_and_circulation_test1.csv` and `newspaper_advertising_and_circulation_test2.csv`). You can find the original information [here](http://www.journalism.org/fact-sheet/newspapers/) (note that the information for years 2013-2020 is estimated, not measured).

Now that you have looked at the file, we'll complete the planning steps of the HtDAP recipe. 

#### Step 1a
The file contains information about newspapers' circulation and advertising revenue between 1956 and 2020. The information for 2013-2020 is estimated, not measured. For each year, the advertising revenue and circulation revenue are given in thousands of US dollars. The circulation revenue is missing for 1990.

#### Step 1b

Now, here are some ideas of what a program operating on this information might produce.

We might find the year that had the biggest ratio of advertising revenue to circulation revenue.

We might find the highest circulation revenue.

We might plot the circulation revenue and advertising revenue over time.

We might find the year that had the highest advertising revenue.

We might find the change in advertising revenue for each year in comparison to the previous year as a percentage.
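
As a rough illustration of the last idea above, here is a minimal sketch of a year-over-year percentage change computation. The `YearRevenue` type and the `percent_changes` function are hypothetical names that are not part of this tutorial's solution. The two sample values are the 1956 and 1957 advertising revenues that appear in the data definition examples in Problem 1.

```python
from typing import List, NamedTuple

# Hypothetical type for this sketch only (not the tutorial's data definition)
YearRevenue = NamedTuple('YearRevenue', [('year', int),
                                         ('ad_rev', int)])  # thousands of US dollars


def percent_changes(rows: List[YearRevenue]) -> List[float]:
    """
    return the percentage change in ad revenue of each row relative to
    the row before it (so the result has one fewer element than rows)

    ASSUME: rows is sorted by year and no ad_rev is 0
    """
    changes = []  # type: List[float]
    # pair each row with the row that follows it
    for prev, curr in zip(rows, rows[1:]):
        changes.append((curr.ad_rev - prev.ad_rev) / prev.ad_rev * 100)
    return changes


rows = [YearRevenue(1956, 3223000), YearRevenue(1957, 3268000)]
changes = percent_changes(rows)  # one value: the change from 1956 to 1957
```

With these two rows the single result is an increase of roughly 1.4 percent.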


#### We are going to find the year that had the highest advertising revenue.

#### Step 1c
Here's an example that shows the kind of output we expect from this program:
```python
expect(main('newspaper_advertising_and_circulation.csv'), 2005)
```


## Problem 1

Now it is time to start building the program. Using the planning steps completed above, determine the information that you will need to represent in your program as data. 

You must clearly state which pieces of information you will choose to represent.

*Note: We recommend that you only store information for years that have complete information. i.e. if the advertising revenue or circulation revenue is missing, ignore that row.*

Then complete the design of data definition(s) to represent that information. 

In [3]:
# I only need the year and advertising revenue for each row
from typing import NamedTuple, List

NewspaperRevenue = NamedTuple('NewspaperRevenue', [('year', int),     # in range [0, ...)
                                                   ('ad_rev', int)])  # in range [0, ...)  
# interp. the advertising revenue ('ad_rev'), in thousands of US dollars, of newspapers for the given year.
NR1956 = NewspaperRevenue(1956, 3223000)
NR1957 = NewspaperRevenue(1957, 3268000)
NR1991 = NewspaperRevenue(1991, 30349000)

# template based on compound (2 fields)
def fn_for_newspaper_revenue(nr: NewspaperRevenue) -> ...:
    return ...(nr.year,
               nr.ad_rev)

# List[NewspaperRevenue]
# interp. a list of information about newspapers' revenues
L0 = []
L1 = [NR1956, NR1957]
L2 = [NR1991]

# Template based on arbitrary-sized and the reference rule
def fn_for_lonr(lonr: List[NewspaperRevenue]) -> ...:
    # description of the acc
    acc = ...    # type: ...
    for nr in lonr:
        acc = ...(fn_for_newspaper_revenue(nr), acc)
    return ...(acc)

## Problem 2a

Once you have your data definition(s) from Problem 1, design a function that reads
the information from the file and stores it as data in your program. 

*Remember: we recommend that you only store information for years that have complete information. i.e. if the advertising revenue or circulation revenue is missing, ignore that row.*

You should begin by copying the template from the HtDAP page, then complete the 
design of the `main` and `read` functions. When testing your functions, you may use the testing files called `newspaper_advertising_and_circulation_test1.csv` and `newspaper_advertising_and_circulation_test2.csv`.
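
To make the "ignore incomplete rows" recommendation concrete, here is a minimal sketch that reads from an in-memory string instead of a file. It rests on several assumptions: a missing value is taken to be an empty field, the header names and the revenue numbers in `SAMPLE` are made up for illustration (only the column order of year, advertising, circulation follows this tutorial), and `parse_optional_int` and `read_rows` are hypothetical helpers rather than functions from the `cs103` library.

```python
import csv
import io
from typing import List, NamedTuple, Optional

NewspaperRevenue = NamedTuple('NewspaperRevenue', [('year', int),
                                                   ('ad_rev', int)])


def parse_optional_int(s: str) -> Optional[int]:
    """
    hypothetical helper: return s as an int, or None if s is blank
    """
    s = s.strip()
    if s == '':
        return None
    return int(s)


def read_rows(csvfile) -> List[NewspaperRevenue]:
    """
    return the rows of csvfile that have both revenue values present
    """
    lonr = []  # type: List[NewspaperRevenue]
    reader = csv.reader(csvfile)
    next(reader)  # skip header line
    for row in reader:
        ad_revenue = parse_optional_int(row[1])
        circ_revenue = parse_optional_int(row[2])
        # keep the row only when neither value is missing
        if ad_revenue is not None and circ_revenue is not None:
            lonr.append(NewspaperRevenue(int(row[0]), ad_revenue))
    return lonr


# Made-up sample data: the 1990 row has no circulation revenue
SAMPLE = "Year,Advertising,Circulation\n1990,100,\n1991,200,50\n"
sample_rows = read_rows(io.StringIO(SAMPLE))
```

Here `sample_rows` holds only the 1991 row, because the 1990 row has a blank circulation field.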

In [4]:
import csv
from typing import Optional

def main(fn: str) -> int:
    """
    Reads the file from given filename and returns the year that has the highest advertising revenue
    """
    # template as a function composition
    return highest_ad_rev(read(fn))

def is_valid(rev: Optional[int]) -> bool:
    """
    return True if rev is an int and False otherwise
    """
    # return False # body of the stub
    # template based on optional
    if rev is None:
        return False
    else:
        return True
    
def read(fn: str) -> List[NewspaperRevenue]:
    """    
    Reads the file from given filename and returns a list of the
    newspaper revenues
    """
    #return []   #stub
    #template from HtDAP
    
    # lonr contains the result so far
    lonr = []   # type: List[NewspaperRevenue]
    with open(fn) as csvfile:
        reader = csv.reader(csvfile, delimiter=',')
        next(reader) # skip header line
        
        for row in reader:     
            year = parse_int(row[0])
            ad_revenue = parse_int(row[1])
            circ_revenue = parse_int(row[2])
            # based on the suggestion, we are only storing rows that had both ad revenue and
            # circulation revenue in the .csv file. Since we are only using ad revenue in
            # our compound, you could have just checked that the ad revenue wasn't empty
            if is_valid(ad_revenue) and is_valid(circ_revenue): 
                nr = NewspaperRevenue(year,  
                                      ad_revenue)  
                lonr.append(nr)
    return lonr

def highest_ad_rev(lonr: List[NewspaperRevenue]) -> int:
    """
    return the year from lonr that had the highest advertising revenue
    
    ASSUME: lonr is not empty
    """
    #return 1  # body of the stub
    # template from List[NewspaperRevenue]
    # curr_max is the NewspaperRevenue with the highest ad revenue seen so far
    curr_max = lonr[0]    # type: NewspaperRevenue
    for nr in lonr:
        curr_max = higher_ad(nr, curr_max)
    return curr_max.year

def higher_ad(nr1: NewspaperRevenue, nr2: NewspaperRevenue) -> NewspaperRevenue:
    """
    return the NewspaperRevenue with the higher ad revenue (nr2 if they are tied)
    """
    # return nr1 # body of the stub
    # template based on NewspaperRevenue (all selectors from both inputs)
    if nr1.ad_rev > nr2.ad_rev:
        return nr1
    else:
        return nr2
        
start_testing()

# examples and tests for main
expect(main('newspaper_advertising_and_circulation.csv'), 2005)
expect(main('newspaper_advertising_and_circulation_test2.csv'), 1957)

# examples and tests for is_valid
expect(is_valid(899813), True)
expect(is_valid(None), False)

# examples and tests for read
expect(read('newspaper_advertising_and_circulation_test1.csv'), [NR1991])
expect(read('newspaper_advertising_and_circulation_test2.csv'), [NR1956, NR1957])
expect(len(read('newspaper_advertising_and_circulation.csv')), 64)

# examples and tests for highest_ad_rev
expect(highest_ad_rev([NR1991]), 1991)
expect(highest_ad_rev([NR1956, NR1957]), 1957)
expect(highest_ad_rev(read('newspaper_advertising_and_circulation.csv')), 2005)

# examples and tests for higher_ad
expect(higher_ad(NR1956, NR1957), NR1957)
expect(higher_ad(NR1957, NR1991), NR1991)

summary()

12 of 12 tests passed


## Problem 2b

To finish your program, complete the design of the analysis function(s). For this particular problem, we would like your program to find the year that had the highest advertising revenue.

Think about your data definitions and the helper rules to determine how many helper functions you will need to write when designing this function. 
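
For comparison only, the same analysis can be written without an explicit accumulator or helper by using Python's built-in `max` with a `key` function. This is a sketch outside the HtDAP recipe, not the design this tutorial asks for, and `highest_ad_rev_year` is a hypothetical name. The example values are the ones from the data definition in Problem 1.

```python
from typing import List, NamedTuple

NewspaperRevenue = NamedTuple('NewspaperRevenue', [('year', int),
                                                   ('ad_rev', int)])


def highest_ad_rev_year(lonr: List[NewspaperRevenue]) -> int:
    """
    return the year from lonr that had the highest advertising revenue

    ASSUME: lonr is not empty (max raises ValueError on an empty list)
    """
    # key tells max to compare rows by ad revenue instead of by (year, ad_rev)
    return max(lonr, key=lambda nr: nr.ad_rev).year


NR1956 = NewspaperRevenue(1956, 3223000)
NR1957 = NewspaperRevenue(1957, 3268000)
NR1991 = NewspaperRevenue(1991, 30349000)

best_year = highest_ad_rev_year([NR1956, NR1957, NR1991])
```

When two years are tied, `max` returns the first one it encounters, so the earlier row in the list wins.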

In [14]:
# RETURN to the cell above to complete your design of the analysis functions.
# Do not design them here.