In [1]:
from cs103 import *
import csv

# Tutorial Starter - Analysis Programs

## Pre-Tutorial Work

None this week

## Overview

You have a file containing information about newspapers' advertising revenue and circulation revenue.

Take a look at the included file called `newspaper_advertising_and_circulation.csv` to see how it is structured. Two files containing small subsets of this information have also been provided for testing purposes (`newspaper_advertising_and_circulation_test1.csv` and `newspaper_advertising_and_circulation_test2.csv`). You can find the original information [here](http://www.journalism.org/fact-sheet/newspapers/) (note that the information for years 2013-2016 is estimated, not measured).

Now that you have looked at the file, we'll complete the planning steps of the HtDAP recipe. 

#### Step 1a
The file contains information about newspapers' circulation and advertising revenue between 1956 and 2016. The information for 2013-2016 is estimated, not measured. For each year, the advertising revenue and circulation revenue are given in thousands of US dollars. The circulation revenue is missing for 1990.
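Because `csv.reader` returns every field as a string, a missing value such as the 1990 circulation revenue is read as an empty string `""`, not as `None`. Below is a minimal sketch. It assumes the file has a header line followed by three columns (year, advertising revenue, circulation revenue). The header names and the 1990 advertising value are placeholders, not figures from the real file.

```python
import csv
import io

# Hypothetical CSV text. The header names and the 1990 advertising value (999)
# are placeholders; only the 1956 row matches this tutorial's test data.
SAMPLE = "Year,Advertising,Circulation\n1956,3223000,1344492\n1990,999,\n"

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)  # the first row is the header line
rows = list(reader)    # every remaining field is a str

print(rows)  # [['1956', '3223000', '1344492'], ['1990', '999', '']]

# Revenue is given in thousands of US dollars, so multiply by 1000 to get dollars.
adv_dollars = int(rows[0][1]) * 1000
print(adv_dollars)  # 3223000000
```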

#### Step 1b

Now, here are some ideas of what a program operating on this information might produce.

We might find the year that had the biggest ratio of advertising revenue to circulation revenue.

We might find the highest circulation revenue.

We might plot the circulation revenue and advertising revenue over time.

We might find the year that had the highest advertising revenue.

We might find the change in advertising revenue for each year in comparison to the previous year as a percentage.
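Two of these ideas reduce to simple arithmetic on the revenue fields. The percentage change is `(this year - previous year) / previous year * 100`, and the ratio is `adv / circ`. The sketch below assumes a `Newspaper` type shaped like the one designed in Problem 1 and uses the two rows from `newspaper_advertising_and_circulation_test2.csv`. The function names `pct_change_adv` and `year_biggest_ratio` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Newspaper(NamedTuple):
    year: int
    adv: int   # advertising revenue, thousands of USD
    circ: int  # circulation revenue, thousands of USD


def pct_change_adv(lon: List[Newspaper]) -> List[Tuple[int, float]]:
    """Return (year, percent change in advertising revenue vs. the previous entry).

    Assumes lon is sorted by year. The first entry has no predecessor, so it
    produces no result.
    """
    result = []  # type: List[Tuple[int, float]]
    for prev, cur in zip(lon, lon[1:]):
        result.append((cur.year, (cur.adv - prev.adv) / prev.adv * 100))
    return result


def year_biggest_ratio(lon: List[Newspaper]) -> int:
    """Return the year with the largest advertising/circulation revenue ratio.

    Assumes lon is non-empty and every circ is non-zero.
    """
    return max(lon, key=lambda n: n.adv / n.circ).year


data = [Newspaper(1956, 3223000, 1344492), Newspaper(1957, 3268000, 1373464)]
changes = pct_change_adv(data)
print(changes[0][0], round(changes[0][1], 2))  # 1957 1.4
print(year_biggest_ratio(data))                # 1956
```

Rows with missing data are dropped, so neighboring entries are not always consecutive years. For example, 1990 is absent. For that reason the sketch compares each entry with the previous entry rather than assuming the previous year is `year - 1`.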

#### Step 1c
Here's an example that shows the kind of output we expect from this program:
```python
expect(main('newspaper_advertising_and_circulation.csv'), 2005)
```


## Problem 1

Now it is time to start building the program. Using the planning steps completed above, determine the information that you will need to represent in your program as data. 

You must clearly state which pieces of information you will choose to represent.

*Note: We recommend that you only store information for years that have complete information, i.e., if the advertising revenue or circulation revenue is missing, ignore that row.*

Then complete the design of data definition(s) to represent that information. 

In [2]:
# your solution goes here
from typing import NamedTuple, Optional, List

Newspaper = NamedTuple("Newspaper", [("year", int),
                                    ("adv", int),
                                    ("circ", int)])
# interp. a newspaper year's advertising revenue (adv) and circulation revenue (circ),
# both in thousands of US dollars

N1 = Newspaper(1956, 3223000, 1344492)
N2 = Newspaper(1961, 3601000, 1684319)

@typecheck
def fn_for_newspaper(n: Newspaper) -> ...:
    return ...(n.year,
              n.adv,
              n.circ)

# List[Newspaper]
# interp. a list of newspapers

LON0 = []
LON1 = [N1, N2]
LON2 = [N1]

@typecheck
def fn_for_lon(lon: List[Newspaper]) -> ...:
    #description of accumulator
    acc = ... #type: ...
    
    for n in lon:
        acc = ...(fn_for_newspaper(n), acc)
    return acc


Data = str
# interp. a single field read from a row of the CSV file;
# the empty string "" means the value is missing

D1 = "1956"
D2 = ""

@typecheck
def fn_for_data(d: Data) -> ...:
    if d == "":
        return ...
    else:
        return ...(d)

# List[Data]
# interp. a list of data fields, e.g. one row of the CSV file

LOD1 = [D1, D2]
LOD2 = [D1]
LOD3 = []

@typecheck
def fn_for_lod(lod: List[Data]) -> ...:
    #description of accumulator
    acc = ... #type: ...
    
    for d in lod:
        acc = ...(fn_for_data(d), acc)
    return acc

## Problem 2a

Once you have your data definition(s) from Problem 1, design a function that reads
the information from the file and stores it as data in your program. 

*Remember: we recommend that you only store information for years that have complete information, i.e., if the advertising revenue or circulation revenue is missing, ignore that row.*

You should begin by copying the template from the HtDAP page, then complete the 
design of the `main` and `read` functions. When testing your functions, you may use the testing files called `newspaper_advertising_and_circulation_test1.csv` and `newspaper_advertising_and_circulation_test2.csv`.
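You can also test `read` without the provided files by first writing a tiny CSV to a temporary file. The sketch below is plain Python with no `cs103` helpers. It substitutes `int()` for `parse_int` and `assert` for `expect`, and it assumes the same three-column layout with a header line. The names `read_newspapers` and `is_complete` are hypothetical, chosen so they do not clash with the notebook's `read` and `is_valid`. The header names and the 1990 values are placeholders.

```python
import csv
import os
import tempfile
from typing import List, NamedTuple


class Newspaper(NamedTuple):
    year: int
    adv: int   # advertising revenue, thousands of USD
    circ: int  # circulation revenue, thousands of USD


def is_complete(row: List[str]) -> bool:
    """True if no field in the row is the empty string."""
    return all(field != "" for field in row)


def read_newspapers(filename: str) -> List[Newspaper]:
    """Read the CSV, skip the header, and keep only complete rows."""
    acc = []  # type: List[Newspaper]
    with open(filename, newline="") as csvfile:
        reader = csv.reader(csvfile)
        next(reader)  # skip header line
        for row in reader:
            if is_complete(row):
                acc.append(Newspaper(int(row[0]), int(row[1]), int(row[2])))
    return acc


# Build a throwaway test file; the 1990 row (placeholder values) lacks circulation revenue.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    with open(path, "w", newline="") as f:
        f.write("Year,Advertising,Circulation\n1956,3223000,1344492\n1990,999,\n")
    result = read_newspapers(path)

print(result)  # [Newspaper(year=1956, adv=3223000, circ=1344492)]
```

The temporary directory is deleted when the `with` block ends. The test therefore leaves nothing behind and does not depend on the working directory.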

In [3]:
# your solution goes here

@typecheck
def main(filename: str) -> int:
    """
    Reads information from given filename and returns the year with the highest advertising revenue
    """
    # Template from HtDAP, based on function composition
    return year_highest_adv_rev(read(filename))

@typecheck
def read(filename: str) -> List[Newspaper]:
    """
    Reads information from the specified file and returns a list of newspapers, skipping rows with missing data
    """
    # return LON1 #stub
    #template from HTDAP
    acc = [] #type: List[Newspaper]
    
    with open(filename) as csvfile:
        reader = csv.reader(csvfile)
        next(reader) #skip header line
        
        for row in reader:
            if is_valid(row):  # only store years that have complete information
                c = Newspaper(parse_int(row[0]), parse_int(row[1]), parse_int(row[2]))
                
                acc.append(c)
                
        return acc
    
            
            
@typecheck
def is_valid(lod: List[Data]) -> bool:
    """
    Return True if every field in lod is non-empty (the row is complete); otherwise return False
    """
    # return False #stub
    # template from List[Data]

    for d in lod:
        if d == "":
            return False
    return True

                

        
@typecheck
def year_highest_adv_rev(lon: List[Newspaper]) -> int:
    """
    Return the year with the highest advertising revenue; lon must be non-empty
    """
    # return 0 #stub
    # template from List[Newspaper] and reference rule

    # acc keeps track of the newspaper seen so far that has the highest advertising revenue;
    # starting from lon[0] raises IndexError if lon is empty
    acc = lon[0]
    
    for n in lon:
        if n.adv > acc.adv:
            acc = n
    
    return year(acc)

@typecheck
def year(n: Newspaper) -> int:
    """
    return the year of a newspaper
    """
    # return 0 #stub
    # template from Newspaper
    return n.year


start_testing()

# test main
expect(main("newspaper_advertising_and_circulation_test1.csv"), 1991)
expect(main("newspaper_advertising_and_circulation.csv"), 2005)
expect(main("newspaper_advertising_and_circulation_test2.csv"), 1957)
summary()

# test read
expect(read("newspaper_advertising_and_circulation_test1.csv"), [Newspaper(1991, 30349000, 8697679)])
expect(read("newspaper_advertising_and_circulation_test2.csv"), [Newspaper(1956,3223000,1344492),
                                                                Newspaper(1957,3268000,1373464)])
summary()

# test is_valid
expect(is_valid([D1, D2]), False)
expect(is_valid([D1]), True)
expect(is_valid([D2]), False)

summary()

# test year_highest_adv_rev
expect(year_highest_adv_rev([Newspaper(1991, 30349000, 8697679)]), 1991)
expect(year_highest_adv_rev([Newspaper(1956,3223000,1344492), Newspaper(1957,3268000,1373464)]), 1957)

summary()

# test year
expect(year(N1), 1956)
expect(year(N2), 1961)

summary()

[92m3 of 3 tests passed[0m
[92m2 of 2 tests passed[0m
[92m3 of 3 tests passed[0m
[92m2 of 2 tests passed[0m
[92m2 of 2 tests passed[0m


## Problem 2b

To finish your program, complete the design of the analysis function(s). For this particular problem, we would like your program to find the year that had the highest advertising revenue.

Think about your data definitions and the helper rules to determine how many helper functions you will need to write when designing this function. 
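For comparison, Python's built-in `max` with a `key` function performs the same search as the accumulator loop in `year_highest_adv_rev`. This sketch is not the HtDAP-style design the tutorial asks for, and the name `year_highest_adv_rev_max` is hypothetical. When several entries tie, `max` returns the first one, just as the `>` comparison in the loop keeps the first. On an empty list, `max` raises `ValueError`, whereas the loop's `lon[0]` raises `IndexError`.

```python
from typing import List, NamedTuple


class Newspaper(NamedTuple):
    year: int
    adv: int   # advertising revenue, thousands of USD
    circ: int  # circulation revenue, thousands of USD


def year_highest_adv_rev_max(lon: List[Newspaper]) -> int:
    """Return the year with the highest advertising revenue (lon must be non-empty)."""
    # max() returns the first element with the largest key, like the `>` loop.
    return max(lon, key=lambda n: n.adv).year


print(year_highest_adv_rev_max([Newspaper(1956, 3223000, 1344492),
                                Newspaper(1957, 3268000, 1373464)]))  # 1957
```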

In [4]:
# RETURN to the cell above to complete your design of the analysis functions.
# Do not design them here.
