# Midterm Project (400 Points)

For the midterm project, you will combine the knowledge and work from previous assignments and lessons into a simple but complete data science project.

## Project Description

The midterm project should follow the directory structure shown below.  You will be responsible for implementing each of the scripts, ensuring that they work correctly, and generating any necessary data and reports. 

```nohighlight
msds510/                              <- Top-level project directory
├── data
│   ├── processed
│   │   └── avengers_processed.csv    <- UTF-8 encoded CSV with Python-friendly headers
│   └── raw
│       └── avengers.csv              <- Original `avengers.csv` data. 
├── reports
│   └── top_ten_appearances.md        <- Markdown report of the top 10 Avengers
└── src
    ├── make_report.py                <- Script to make a report of top 10 Avengers
    ├── msds510                       <- Package name for project specific code. 
    │   ├── __init__.py
    │   └── util.py                   <- Helper functions used to process records
    └── process_csv.py                <- Script to process the original CSV data
```

## Detailed Requirements

### Script and Python Code

1. Running `python process_csv.py ../data/raw/avengers.csv ../data/processed/avengers_processed.csv` from the `src` directory should generate the processed CSV file.
2. Running `python make_report.py ../data/processed/avengers_processed.csv ../reports/top_ten_appearances.md` from the `src` directory should generate a Markdown-formatted report of the top ten Avengers by number of appearances.
3. Each script should define a `main` function and call it from an `if __name__ == '__main__':` block, as shown in the [Top-level script environment](https://docs.python.org/3/library/__main__.html) documentation.
4. Implement any helper functions in the `msds510.util` module.  The scripts should only define one top-level function called `main`.  
5. You may not use any third-party packages to complete this project, but you are encouraged to use the Python Standard Library. Consider using the following:
    1. `DictReader` and `DictWriter` from the `csv` module to process CSV files. 
    2. `date` and `timedelta` from the `datetime` module to work with dates. 
    3. `argv` from the `sys` module to process command line arguments. 
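The script requirements above can be sketched as follows. This is a minimal skeleton for `process_csv.py`, assuming the input and output paths arrive as `sys.argv[1]` and `sys.argv[2]`; it copies rows through unchanged, and the actual field processing would call helpers from `msds510.util` (not shown). The `INPUT_ENCODING` constant is an assumption: the processed file is specified as UTF-8, which suggests the raw file may use a different encoding, so adjust it to match your data.

```python
import csv
import sys

# Assumption: change this if the raw file is not UTF-8 (e.g. "latin-1").
INPUT_ENCODING = "utf-8"


def main(argv):
    """Read the CSV named in argv[1] and write it to the path in argv[2]."""
    # Expected usage: python process_csv.py <input_csv> <output_csv>
    if len(argv) != 3:
        print("usage: python process_csv.py <input_csv> <output_csv>", file=sys.stderr)
        return 1

    input_path, output_path = argv[1], argv[2]

    # newline="" lets the csv module handle line endings itself.
    with open(input_path, newline="", encoding=INPUT_ENCODING) as infile:
        reader = csv.DictReader(infile)
        rows = list(reader)
        fieldnames = reader.fieldnames or []

    # Placeholder: rename headers and transform each row with msds510.util helpers here.

    with open(output_path, "w", newline="", encoding="utf-8") as outfile:
        writer = csv.DictWriter(outfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Keeping all logic inside `main` (and helpers in `msds510.util`) means importing the script never triggers any file I/O; only running it directly does.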

### Processing CSV Data

Below are the requirements for the `avengers_processed.csv` file.  

1. The first row should contain a header with Python-friendly names (i.e., names that are valid Python identifiers, like `month_joined`).
2. The data should contain a `month_joined` field with the numeric month (e.g. 1 - January, 5 - May, 12 - December) the person joined the Avengers. This field should be empty if the information is not available. 
3. Trim any trailing whitespace (e.g. newlines and spaces) from the `notes` field. 
4. Convert `YES/NO` values to boolean `True/False` values. 
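The four processing rules above map naturally onto small helpers in `msds510/util.py`. The sketch below uses hypothetical names (`to_python_name`, `get_month`, `clean_notes`, `to_bool`) and assumes the join month appears in the raw data as a three-letter English abbreviation in a value like `Sep-63`; check the actual raw column and adapt the parsing if it differs.

```python
import re

# Three-letter month abbreviations, in calendar order (index 0 -> January).
MONTH_ABBREVIATIONS = ["jan", "feb", "mar", "apr", "may", "jun",
                       "jul", "aug", "sep", "oct", "nov", "dec"]


def to_python_name(header):
    """Turn a raw header such as 'Years since joining' into 'years_since_joining'."""
    # Replace each run of non-alphanumeric characters with a single underscore.
    name = re.sub(r"[^0-9A-Za-z]+", "_", header.strip()).strip("_").lower()
    # Python identifiers cannot start with a digit.
    if name and name[0].isdigit():
        name = "_" + name
    return name


def get_month(intro):
    """Return the month number (1-12) from a value like 'Sep-63', or '' if none."""
    # Assumption: the month is written as a three-letter English abbreviation.
    for part in re.split(r"[^A-Za-z]+", intro or ""):
        if part.lower() in MONTH_ABBREVIATIONS:
            return MONTH_ABBREVIATIONS.index(part.lower()) + 1
    return ""  # Leave month_joined empty when no month is available.


def clean_notes(notes):
    """Strip trailing spaces and newlines from a notes value."""
    return (notes or "").rstrip()


def to_bool(value):
    """Map 'YES'/'NO' (any case) to True/False; return other values unchanged."""
    normalized = (value or "").strip().upper()
    if normalized == "YES":
        return True
    if normalized == "NO":
        return False
    return value
```

When a row containing `True`/`False` is written with `DictWriter`, the values are stored as the strings `True` and `False`.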

### Markdown Report

Markdown is a lightweight markup language for writing documentation without having to learn a complicated language like *HTML* or *LaTeX*. It is relatively easy to generate Markdown reports from Python scripts and then convert them into *HTML*, *PDF*, *DOCX*, and other formats.

For this project, you will not need to understand the details of Markdown.  You will just need to generate text that conforms to a template.  

Your Markdown report will list the top ten Avengers, sorted in descending order by number of appearances. Each Avenger's section should use the following template:

```
# {rank}. {name}

* Number of Appearances: {appearances}
* Year Joined: {year}
* Years Since Joining: {years_since_joining}
* URL: {url}

## Notes

{notes}
```
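One way to produce a section is a module-level template string filled in with `str.format`. This is a minimal sketch; `format_section` is a hypothetical helper name, and the keys read from `record` (`name_alias`, `appearances`, `year`, `years_since_joining`, `url`, `notes`) are assumptions that must match the Python-friendly headers in your processed CSV.

```python
# Mirrors the section template; each {placeholder} is filled by str.format.
SECTION_TEMPLATE = """# {rank}. {name}

* Number of Appearances: {appearances}
* Year Joined: {year}
* Years Since Joining: {years_since_joining}
* URL: {url}

## Notes

{notes}
"""


def format_section(rank, record):
    """Fill in the report template for one Avenger.

    `record` is a dict read from avengers_processed.csv; the key names used
    here are hypothetical and must match your processed headers.
    """
    return SECTION_TEMPLATE.format(
        rank=rank,
        name=record["name_alias"],
        appearances=record["appearances"],
        year=record["year"],
        years_since_joining=record["years_since_joining"],
        url=record["url"],
        notes=record["notes"],
    )
```

Like the other helpers, this function belongs in `msds510.util` so that `make_report.py` defines only `main`.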

Here is an example with the values filled in. Your report should look similar. *Note*: Do not copy the data from this example (e.g., rank, year joined) into your report. The rank and other values may not accurately reflect the underlying data.


```nohighlight
# 8. Henry Jonathan "Hank" Pym

* Number of Appearances: 1269
* Year Joined: 1963
* Years Since Joining: 55
* URL: http://marvel.wikia.com/Henry_Pym_(Earth-616)

## Notes

Merged with Ultron in Rage of Ultron Vol. 1. A funeral was held.
```

When complete, your report should contain a section like this for each of the top ten Avengers. The completed report should look something like this:

```
# 1. Name of Number One Avenger

* Number of Appearances: 9000
* Year Joined: 1952
* Years Since Joining: 89
* URL: http://marvel.wikia.com/That_Guy

## Notes

These are not real notes.  This is just an example. 

# 2. Name of Number Two Avenger

...

# 3. Name of Number Three Avenger

. 
. 
.


# 10. Name of Number Ten Avenger

...
```
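Producing the full report comes down to ranking the records and concatenating one section per Avenger. A minimal sketch, assuming the processed file has an `appearances` column of whole numbers; `read_records`, `top_ten`, and `build_report` are hypothetical helper names, and `format_section` stands for whatever function you write to render one section of the template.

```python
import csv


def read_records(path):
    """Load avengers_processed.csv (or any CSV with a header) as a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def top_ten(records, key="appearances"):
    """Return the ten records with the most appearances, highest first.

    Assumes the `key` column holds whole numbers; blank values count as 0.
    """
    return sorted(records, key=lambda r: int(r[key] or 0), reverse=True)[:10]


def build_report(records, format_section):
    """Render each record with its 1-based rank and join the sections.

    `format_section(rank, record)` is any function that fills in the template
    for one Avenger and returns a string ending in a newline.
    """
    sections = [format_section(rank, record)
                for rank, record in enumerate(records, start=1)]
    # Joining with "\n" leaves one blank line between sections.
    return "\n".join(sections)
```

In `make_report.py`, `main` could then read the path in `argv[1]`, pass the records through `top_ten` and `build_report`, and write the resulting string to the path in `argv[2]`. Note that `int(...)` is needed in the sort key: sorting the raw strings would rank `"9"` above `"1269"`.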