# Assignment 4 - Statements and Syntax

## Instructions

In this assignment, we will continue working with the `avengers.csv` data. Copy the `avengers_utf8.csv` file you created in the last exercise into the working directory of this notebook. Open `avengers_utf8.csv` and assign its lines to the `lines` variable.

Follow the instructions for submitting a Jupyter Notebook assignment in the submitting assignments documentation. 

## 1. Convert lines to rows (5 points)

Using a `for` statement, loop through the `lines` sequence, process each line into a row, and add each processed row to the `rows` list. For the purposes of this assignment, a row is the list of values that results from splitting a line by its delimiter (in this case, a comma). For example, if the input line is `'1,2,apple,pear'`, then the output row is `['1', '2', 'apple', 'pear']`. Do not worry about converting values or cleaning up the input data at this point.


<div class="alert alert-info">
**Note**: In this data set, there is one record that contains extra commas. This happens when commas are used to separate the fields and commas also appear within some of the data values. In this case, the `url` field for this record is `"...Nicholas_Fury,_Jr._(Earth-616)#"` and the `name_alias` field is `"Nicholas Fury, Jr., Marcus Johnson"`. Both values are enclosed in quote characters, which means the commas inside the quotes should not be treated as field separators. In later assignments, we will implement code to handle escaped commas, but for now we will ignore this record.
</div>

A normal row should contain 21 values.  Only add a row to the `rows` list if it contains 21 values. 
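
To see why the length check is needed, here is a minimal sketch using two made-up lines (`plain_line` and `quoted_line` are illustrative and not taken from the data file). A plain `split(',')` knows nothing about quotes, so a quoted value that contains a comma produces an extra field.

```python
# Illustrative lines only; real lines in the file have 21 fields.
plain_line = '1,2,apple,pear'
quoted_line = '3,"Fury, Jr.",apple,pear'

plain_row = plain_line.split(',')
quoted_row = quoted_line.split(',')

print(plain_row)        # ['1', '2', 'apple', 'pear']
print(len(plain_row))   # 4
print(len(quoted_row))  # 5, because the quoted comma was treated as a separator
```

Checking `len(row)` against the expected field count filters out lines like `quoted_line` without having to parse the quotes.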

In [4]:
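# Read every line of the file; the encoding is stated explicitly to match the UTF-8 file name
with open('avengers_utf8.csv', encoding='utf-8') as f:
    lines = f.readlines()

rows = []

for line in lines:
    row = line.split(',')
    # Keep only rows with the expected 21 values; the record with quoted commas is skipped
    if len(row) == 21:
        rows.append(row)

len(rows)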

173

## 2. Create a header with Python friendly names (5 points)

In the second step, create a header with Python friendly names. These names will be used to create dictionary-based records. Assign the new header names to the `fieldnames` variable. When creating the names, convert each original name to a new name using the following rules.

1. The new name should only use lower case letters, numbers, and underscores.
2. Replace spaces and slashes with underscores. 
3. Strip trailing question marks and newlines.  
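
As a rough illustration of the three rules, the sketch below applies them to a few sample names. The helper `to_fieldname` and the `samples` list are hypothetical names introduced here; the regular expression only checks rule 1 on the result.

```python
import re

def to_fieldname(name):
    """Hypothetical helper: apply the three renaming rules to one header name."""
    cleaned = name.rstrip('?\n')                            # rule 3: trailing '?' and newlines
    cleaned = cleaned.replace(' ', '_').replace('/', '_')   # rule 2: spaces and slashes
    return cleaned.lower()                                  # rule 1: lower case only

# Sample names chosen for illustration
samples = ['Name/Alias', 'Current?', 'Years since joining', 'Notes\n']
converted = [to_fieldname(s) for s in samples]
print(converted)
# ['name_alias', 'current', 'years_since_joining', 'notes']

# Every result should consist only of lower case letters, digits, and underscores
all_valid = all(re.fullmatch(r'[a-z0-9_]+', n) for n in converted)
print(all_valid)  # True
```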

In [8]:
header = rows[0]

fieldnames = []

for name in header:
    new_name = name.lower()
    # Strip trailing question marks and newlines
    new_name = new_name.rstrip('?\n')
    new_name = new_name.replace('/', '_').replace(' ', '_')

    fieldnames.append(new_name)

# Ending the cell with the bare name displays one field per line,
# whereas print(fieldnames) would show the whole list on a single line
fieldnames

['url',
 'name_alias',
 'appearances',
 'current',
 'gender',
 'probationary_introl',
 'full_reserve_avengers_intro',
 'year',
 'years_since_joining',
 'honorary',
 'death1',
 'return1',
 'death2',
 'return2',
 'death3',
 'return3',
 'death4',
 'return4',
 'death5',
 'return5',
 'notes']

## 3. Create dict-based records (10 points)

Using the `fieldnames` and `rows` variables from the last two parts, create `records` which contain Python dictionaries with the field names assigned to the appropriate values. 
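
One way to pair names with values is `zip`, which walks two sequences in step. The sketch below uses a shortened, made-up row (`fieldnames_demo` and `row_demo` are illustrative, not the real 21-field data) and builds the dictionary with `dict(zip(...))`.

```python
# Shortened, illustrative inputs; the real header and rows have 21 entries each
fieldnames_demo = ['url', 'name_alias', 'appearances']
row_demo = ['http://example.com/some_hero', 'Some Hero', '12']

# zip pairs each field name with the value in the same position
record_demo = dict(zip(fieldnames_demo, row_demo))
print(record_demo)
# {'url': 'http://example.com/some_hero', 'name_alias': 'Some Hero', 'appearances': '12'}
```

Note that every value is still a string at this stage, including `'12'`.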

In [12]:
records = []  

for row in rows[1:]:
    record = {} 
    for field, value in zip(fieldnames, row):
        record[field] = value

    records.append(record)
records[0]


{'url': 'http://marvel.wikia.com/Henry_Pym_(Earth-616)',
 'name_alias': '"Henry Jonathan ""Hank"" Pym"',
 'appearances': '1269',
 'current': 'YES',
 'gender': 'MALE',
 'probationary_introl': '',
 'full_reserve_avengers_intro': 'Sep-63',
 'year': '1963',
 'years_since_joining': '52',
 'honorary': 'Full',
 'death1': 'YES',
 'return1': 'NO',
 'death2': '',
 'return2': '',
 'death3': '',
 'return3': '',
 'death4': '',
 'return4': '',
 'death5': '',
 'return5': '',
 'notes': 'Merged with Ultron in Rage of Ultron Vol. 1. A funeral was held.\n'}

## 4. Convert record values to appropriate types (10 points)

Using a `for` loop, iterate through each record and convert each record value to the appropriate type. Additionally, make sure to update the `years_since_joining` value so that it reflects the current year. If a value does not need to be converted, strip any trailing spaces or newlines.
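
The conversion rules can also be sketched as a small helper applied to one made-up record. The names `YES_NO`, `convert_value`, and `demo_record` are hypothetical, and the sketch assumes that only `appearances` and `year` are numeric and that any value equal to `YES` or `NO` should become a boolean regardless of its field.

```python
# Lookup table for YES/NO strings; any other value is returned stripped
YES_NO = {'YES': True, 'NO': False}

def convert_value(key, value):
    """Hypothetical helper: convert one raw string based on its field name."""
    if key in ('appearances', 'year'):
        return int(value)
    stripped = value.strip()
    # Blank or free-text values fall through unchanged (apart from stripping)
    return YES_NO.get(stripped, stripped)

# Illustrative record with one field of each kind
demo_record = {
    'appearances': '1269',
    'current': 'YES',
    'death2': '',
    'notes': 'A funeral was held.\n',
}
converted = {key: convert_value(key, value) for key, value in demo_record.items()}
print(converted)
# {'appearances': 1269, 'current': True, 'death2': '', 'notes': 'A funeral was held.'}
```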

In [13]:
# If a field holds a YES/NO answer, convert it to a boolean
# If a field holds a number, convert it to an integer
# Blank values in YES/NO fields are left as empty strings

for record in records:
    for key, value in record.items():
        if key.startswith(('death', 'return', 'current')):
            if value == 'YES':
                record[key] = True
            elif value == 'NO':
                record[key] = False
        elif key in ['year', 'appearances']:
            record[key] = int(value)
        else:
            record[key] = value.strip()
            
for record in records:
    record['years_since_joining'] = 2018 - record['year']
    
records[0]['years_since_joining']

55
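
The cell above uses a fixed year. As an alternative sketch, the current year can be read from the system clock with the standard `datetime` module, so the result stays up to date whenever the notebook is rerun. The `demo_record` below is illustrative and only carries the `year` field.

```python
from datetime import date

# Year according to the system clock at run time
current_year = date.today().year

# Illustrative record; in the notebook this would be each item in `records`
demo_record = {'year': 1963}
demo_record['years_since_joining'] = current_year - demo_record['year']

print(demo_record['years_since_joining'])
```

The printed value depends on when the code is run, so it will differ from the output shown above.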

## 5. Filtering Records (10 points)

Using a list comprehension, get the records for those Avengers with over 3000 appearances. Assign the output to the `over3000` variable. 

Use `over3000` to print the name and number of appearances for those Avengers. When printing the names, note that some names contain extra quotes. Strip the `"` characters that wrap the entire name and replace each doubled quote (`""`) with a single `"` character.
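
A minimal sketch of both steps on two made-up records follows. The helper `clean_name` is hypothetical and is a slight variant of the instruction: it removes exactly one wrapping quote from each end by slicing, which is an assumption made here so that a name ending in a quoted nickname keeps its closing quote.

```python
def clean_name(raw):
    """Hypothetical helper: drop one wrapping quote per side, then collapse doubled quotes."""
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        raw = raw[1:-1]
    return raw.replace('""', '"')

# Illustrative records with already-converted appearance counts
demo_records = [
    {'name_alias': '"Henry Jonathan ""Hank"" Pym"', 'appearances': 1269},
    {'name_alias': 'Peter Benjamin Parker', 'appearances': 4333},
]

# List comprehension: keep only records above the threshold
over_demo = [r for r in demo_records if r['appearances'] > 3000]
print([r['name_alias'] for r in over_demo])   # ['Peter Benjamin Parker']

print(clean_name(demo_records[0]['name_alias']))  # Henry Jonathan "Hank" Pym
```

By contrast, `str.strip('"')` removes every quote character at either end, which gives the same result for these names but would also remove the closing quote of a nickname that ends the string.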

In [15]:
# Go through each record and keep those with more than 3000 appearances
over3000 = [record for record in records if record['appearances'] > 3000]

for record in over3000:
    print('{}: {}'.format(record['appearances'], record['name_alias'].strip('"').replace('""', '"')))

3068: Anthony Edward "Tony" Stark
3458: Steven Rogers
4333: Peter Benjamin Parker
3130: James "Logan" Howlett
