# Project Milestone Template

### Step 1a: Planning 
#### Identify the information in the file your program will read

Describe (all) the information that is available. Be sure to note any surprising or unusual features. (For example, some information sources have missing data, which may be blank or flagged using values like -99, NaN, or something else.)

<font color="blue">

Available information includes:
- the year the movie was released 
- the IMDb ID of the movie (given by "tt" followed by a combination of seven digits unique to each movie)
- the title of the movie 
- how the movie fares on the Bechdel Test (test):
    - "nowomen-disagree" indicates that there are no women in the movie
    - "notalk" indicates that while there are two women in the movie, they do not talk to each other
    - "men" indicates that while there are two women who talk to each other in the movie, their conversation is about men/a man
    - "ok" indicates that the movie has two women who talk to each other about a subject that does not involve men; the movie passes the Bechdel Test
    - "notalk-disagree"
    - "men-disagree"
    - "ok-disagree" indicates that, under a generous eye, the movie does not pass the Bechdel Test
    - "dubious-disagree" indicates that, given the benefit of the doubt, the movie does not pass the Bechdel Test (equivalent to "ok-disagree" in that the given movie is in the "generous category")
- the movie's rank on a clean test (clean_test), in which
    - "notalk" indicates
    - "ok" indicates
    - "men" indicates
    - "nowomen" indicates
    - "dubious" indicates
- whether the movie fails (FAIL) or passes (PASS) the test
- the budget for production of the movie in dollars
- the movie's domestic gross (how much revenue the movie made in theaters in its home country)
- the movie's international gross (how much revenue the movie made in theaters outside of its home country)
- the movie's code, which consists of either 2012 or 2013 followed by "FAIL" or "PASS"
    
    
    
</font>
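
One way to look for the missing-data flags mentioned in the prompt above is to count suspicious cells per column before designing any data definitions. The sketch below is only illustrative: the flag values in `MISSING_FLAGS`, the helper name `count_missing`, and the sample rows are all assumptions, not values taken from the actual file.

```python
import csv
from typing import Dict, List

# Assumed flag values; the real file may use different ones.
MISSING_FLAGS = {"", "-99", "NaN", "#N/A"}

def count_missing(lines: List[str]) -> Dict[str, int]:
    """Count, per column, the cells whose value is one of MISSING_FLAGS."""
    reader = csv.reader(lines)
    header = next(reader)
    counts = {name: 0 for name in header}
    for row in reader:
        for name, cell in zip(header, row):
            if cell.strip() in MISSING_FLAGS:
                counts[name] += 1
    return counts

# Made-up sample rows, for illustration only.
sample = [
    "title,budget,domgross",
    "Movie A,13000000,25682380",
    "Movie B,,#N/A",
]
print(count_missing(sample))
```

Because `csv.reader` accepts any iterable of lines, the same helper could be passed an open file object instead of a list of strings.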

### Step 1b: Planning 
#### Brainstorm ideas for what your program will produce
#### Select the idea you will build on for subsequent steps

You must brainstorm at least three ideas for graphs or charts that your program could produce and choose the one that you'd like to work on. You can choose between a line chart, histogram, bar chart, scatterplot, or pie chart.

If you would like to change your project idea from what was described in the proposal, you will need to get permission from your project TA. This is intended to help ensure that your new project idea will meet the requirements of the project. Please see the project proposal for things to be aware of when communicating with your project TA.

<font color="blue">
    
Based on our initial research question, [Does the number of characters in a movie title affect its overall income and success?], three possible ideas include:
1. A line chart
    - x axis: the number of characters in the title of a given movie, including spaces (e.g. "The Great Gatsby" has 16 characters)
    - y axis: the profit made by a given movie, specifically its domestic gross. We did not choose the sum of the domestic and international gross because, if a relationship between title length and profit exists, the international figure would reflect the movie's translated title in each country.
    - possible relationships that could be identified from this line chart include (a) an upward slope indicating a positive correlation (as the number of characters in the title increases, the total profit made by the movie also increases), and (b) a downward slope indicating a negative correlation (as the number of characters in the title increases, the total profit made by the movie decreases).
2. A line chart
    - Given the same parameters as the first idea, instead of taking the domestic profit, we can focus on the sum of the domestic and international profits.
    - Moreover, instead of graphing the number of characters on the x axis, we can instead take in the number of words in the title, where numbers would also be considered a "word". 
3. A histogram
    - We can select a specific total gross profit range and only consider movies within that range
    - From there, each bar is the number of movies whose titles fall within a certain range of character counts (e.g. possible bars would be movies with 1-6 characters, 7-12 characters, 13-18 characters, etc.)
    
    
</font>
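
The three ideas above rely on a few small measurements of a title: its character count, its word count, and the histogram bar it would fall into. The following is a minimal sketch of those measurements; the helper names (`title_length`, `title_word_count`, `length_bin`) are hypothetical, and the bin width of 6 simply follows the example ranges given in idea 3.

```python
def title_length(title: str) -> int:
    """Number of characters in the title, including spaces."""
    return len(title)

def title_word_count(title: str) -> int:
    """Number of whitespace-separated words; numbers count as words."""
    return len(title.split())

def length_bin(n: int, width: int = 6) -> str:
    """Label of the histogram bin containing n, for n >= 1."""
    low = ((n - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"

print(title_length("The Great Gatsby"))              # 16
print(title_word_count("The Great Gatsby"))          # 3
print(length_bin(title_length("The Great Gatsby")))  # 13-18
```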

### Step 1c: Planning 
#### Write or draw examples of what your program will produce

You must include a **hand-drawn image** that shows what your chart or plot will look like. You can insert an image using Edit -> Insert Image.

![IMG_0162.jpg](attachment:IMG_0162.jpg)
<font color="blue"></font>

### Step 2a: Building
#### Document which information you will represent in your data definitions
#### Design data definitions

Before you design data definitions in the code cell below, you must explicitly document here which information in the file you chose to represent and why that information is crucial to the chart or graph that you'll produce when you complete step 2c.

In [1]:
from cs103 import *
from typing import NamedTuple, List
import csv

##################
# Data Definitions

MovieData = NamedTuple('MovieData', [('characters', int), # in range [0, ...)
                                    ('dom_gross', int)])  # in range [1000000, ...)
# interp. data about a movie, including the number of characters in the movie title (characters)
# and the movie's domestic gross profit (dom_gross). 

MD0 = MovieData(0, 1000000)
MD1 = MovieData(12, 240120460)
MD2 = MovieData(20, 12000380)

@typecheck
def fn_for_movie_data(md: MovieData) -> ...:
    # template based on Compound (2 fields), atomic non-distinct
    return ...(md.characters,
              md.dom_gross)

# List[MovieData]
# interp. a list of movie data

LOMD0 = []
LOMD1 = [MD1, MD2]

@typecheck
def fn_for_lomd(lomd: List[MovieData]) -> ...:
    # template based on arbitrary-sized data and reference rule
    # description of accumulator
    acc = ... #type: ...
    for md in lomd:
        acc = ...(fn_for_movie_data(md), acc)
        
    return ...(acc)
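
As an illustration of how the list template above might be filled in, the sketch below averages the domestic gross for each title length, which is one possible way to get a single y value per x value on a line chart. It is an assumption about a later step, not part of the template: the function name `avg_gross_by_length` is hypothetical, and `MovieData` is redefined with plain `typing` so the block runs without the `cs103` library.

```python
from typing import Dict, List, NamedTuple

MovieData = NamedTuple('MovieData', [('characters', int),
                                     ('dom_gross', int)])

def avg_gross_by_length(lomd: List[MovieData]) -> Dict[int, float]:
    """Map each title length to the mean domestic gross of movies with that length."""
    # totals maps title length -> [sum of dom_gross, number of movies] seen so far
    totals = {}  # type: Dict[int, List[int]]
    for md in lomd:
        entry = totals.setdefault(md.characters, [0, 0])
        entry[0] += md.dom_gross
        entry[1] += 1
    return {length: s / n for length, (s, n) in sorted(totals.items())}

example = [MovieData(12, 240120460), MovieData(20, 12000380), MovieData(12, 1000000)]
print(avg_gross_by_length(example))
```

The accumulator keeps a running sum and count per title length, so the list is traversed only once.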




### Step 2b: Building
#### Design a function to read the information and store it as data in your program

Complete this step in the code cell below. Your `read` function should remove any row with invalid or missing data but otherwise keep all the data. I.e., you should **not** design the `read` function such that it only returns the data you need for step 2c.
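
The row filtering described above could look roughly like the sketch below. It is a sketch under stated assumptions, not the required solution: the column positions `TITLE_COL` and `DOM_GROSS_COL` are guesses based on the order in which the columns were listed in Step 1a, the header names and sample rows are made up, and the hypothetical helper `rows_to_movie_data` builds only the two-field `MovieData` from Step 2a rather than keeping every column. It takes a list of lines rather than a filename so that it runs without a file.

```python
import csv
from typing import List, NamedTuple

MovieData = NamedTuple('MovieData', [('characters', int),
                                     ('dom_gross', int)])

# Assumed 0-based column positions; verify against the real header.
TITLE_COL = 2
DOM_GROSS_COL = 7

def rows_to_movie_data(lines: List[str]) -> List[MovieData]:
    """Convert CSV lines (header first) to MovieData, skipping invalid rows."""
    reader = csv.reader(lines)
    next(reader)  # skip header line
    result = []  # type: List[MovieData]
    for row in reader:
        try:
            title = row[TITLE_COL]
            gross = int(row[DOM_GROSS_COL])
        except (IndexError, ValueError):
            continue  # short row, or gross is blank / not an integer
        if title == "":
            continue  # missing title
        result.append(MovieData(len(title), gross))
    return result

# Made-up rows for illustration; the second has a flagged domestic gross.
sample = [
    "year,imdb,title,test,clean_test,binary,budget,domgross,intgross,code",
    "2013,tt0000001,Example One,ok,ok,PASS,1000000,2500000,4000000,2013PASS",
    "2013,tt0000002,Example Two,men,men,FAIL,2000000,#N/A,#N/A,2013FAIL",
]
print(rows_to_movie_data(sample))
```

Inside a `read` function, the object returned by `open(filename)` could be passed to `csv.reader` in the same way as the list of lines used here.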

In [None]:
###########
# Functions

@typecheck
def read(filename: str) -> List[MovieData]:
    """    
    reads information from the specified file and returns a list of movie data
    (returns all rows except those in which the domestic gross profit is less than the 
    budget of the movie)
    """
    #return []  #stub
    # Template from HtDAP
    # loc contains the result so far
    loc = [] # type: List[MovieData]

    with open(filename) as csvfile:
        
        reader = csv.reader(csvfile)
        next(reader) # skip header line

        for row in reader:
            # you may not need to store all the rows, and you may need
            # to convert some of the strings to other types
            c = Consumed(row[0], ... ,row[n])
            loc.append(c)
    
    return loc


# Begin testing
start_testing()

# Examples and tests for read
expect(..., ...)

# show testing summary
summary()

In [None]:
# Be sure to select ALL THE FILES YOU NEED (including csv's) 
# when you submit. Also, UNLIKE USUAL, YOU CAN EDIT THIS CELL!
# That's in case you want to switch the ASSIGNMENT code for the final
# submission. Run this cell to start the submission process.
from cs103 import submit

COURSE = 83388
ASSIGNMENT = 1090674
#ASSIGNMENT = 1090673 # UNCOMMENT for final submission and COMMENT line above

submit(COURSE, ASSIGNMENT)

# If your submission fails, SUBMIT by downloading your files and uploading them to 
# Canvas. You can learn how on the page "How to submit your Jupyter notebook" on 
# our Canvas site.

# Please double check your submission on Canvas to ensure that the right files (Jupyter file + CSVs) have been submitted

<font color="red">**You should always check your submission on Canvas. It is your responsibility to ensure that the correct file has been submitted for grading.**</font> Regrade or accommodation requests using reasoning such as "I didn't realize I submitted the wrong file"/"I didn't realize the submission didn't work"/"I didn't realize I didn't save before submitting so some of my work is missing" will not be considered.