# Assignment 4 - Statements and Syntax

## Instructions

In this assignment, we will continue working with the Avengers data set (`avengers.csv`). Copy the `avengers_utf8.csv` file you created in the last exercise into the working directory of this notebook. Open `avengers_utf8.csv` and assign its lines to the `lines` variable.

Follow the instructions for submitting a Jupyter Notebook assignment in the submitting assignments documentation. 

## 1. Convert lines to rows (5 points)

Using a `for` statement, loop through the `lines` sequence, process each line into a row, and add each processed row to the `rows` list. For the purposes of this assignment, a row is the list of values that results from splitting a line on its delimiter (in this case, a comma). For example, if the input line is `'1,2,apple,pear'`, then the output row is `['1', '2', 'apple', 'pear']`. Do not worry about converting values or cleaning up the input data at this point.
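As a quick check of the example above, the sketch below splits one sample line. It assumes the line still ends with `\n`, as lines returned by `readlines()` do. The newline stays attached to the last value, which is why the header names are stripped of newlines in part 2.

```python
# A line as returned by readlines() keeps its trailing newline (assumed here)
line = '1,2,apple,pear\n'
row = line.split(',')
print(row)       # ['1', '2', 'apple', 'pear\n']
print(len(row))  # 4
```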


<div class="alert alert-info">
**Note**: In this data set, one record contains extra commas. This happens when commas are used both to separate the fields and within some of the data values. In this case, the `url` field for this record is `"...Nicholas_Fury,_Jr._(Earth-616)#"` and the `name_alias` field is `"Nicholas Fury, Jr., Marcus Johnson"`. Both values are enclosed in quote characters, which means the commas inside the quotes should not be treated as field separators. In later assignments, we will implement code to handle commas inside quoted values, but for now we will skip this record.
</div>
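For comparison only (this assignment asks us to skip the record), Python's standard `csv` module already understands quoted fields. The sketch below uses a made-up three-field line, not a real row from the data set, to show why a plain `split(',')` produces too many values when a quoted field contains commas.

```python
import csv

# Made-up line (not a real row): the middle field is quoted and contains commas
line = 'x,"Nicholas Fury, Jr., Marcus Johnson",y\n'

naive = line.split(',')
print(len(naive))  # 5: the quoted field is broken into three pieces

# csv.reader accepts any iterable of lines and honours the quotes
parsed = next(csv.reader([line]))
print(parsed)  # ['x', 'Nicholas Fury, Jr., Marcus Johnson', 'y']

# Doubled quotes inside a quoted field are unescaped as well
print(next(csv.reader(['"Henry Jonathan ""Hank"" Pym",1269\n'])))
# ['Henry Jonathan "Hank" Pym', '1269']
```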

A normal row should contain 21 values.  Only add a row to the `rows` list if it contains 21 values. 

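In [1]:
```python
# Read the UTF-8 copy of the data set from the notebook's working directory
with open('avengers_utf8.csv', encoding='utf-8') as f:
    lines = f.readlines()

rows = []
for line in lines:
    row = line.split(',')
    # Keep only well-formed rows; the record with quoted commas splits into extra values
    if len(row) == 21:
        rows.append(row)
```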

## 2. Create a header with Python-friendly names (5 points)

In the second step, create a header with Python-friendly names. These names will be used to create dictionary-based records. When creating the names, convert each original name to a new name using the following rules. Assign the new header names to the `fieldnames` variable.

1. The new name should only use lower case letters, numbers, and underscores.
2. Replace spaces and slashes with underscores. 
3. Strip trailing question marks and newlines.  
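The three rules can be wrapped in a small function. `to_python_name` is a hypothetical helper, not part of the assignment template, and the header strings in the example are assumed spellings of the original column names. The final `re.sub` enforces rule 1 by dropping any remaining character that is not a lower case letter, digit, or underscore.

```python
import re

def to_python_name(name):
    """Hypothetical helper applying the three naming rules to one header value."""
    # Rule 3: drop the trailing newline (and other trailing whitespace) and question marks
    name = name.rstrip().rstrip('?')
    # Rule 2: spaces and slashes become underscores
    name = name.replace(' ', '_').replace('/', '_')
    # Rule 1: lower case, then remove anything that is not a-z, 0-9 or _
    return re.sub(r'[^a-z0-9_]', '', name.lower())

print(to_python_name('Full/Reserve Avengers Intro'))  # full_reserve_avengers_intro
print(to_python_name('Current?'))                      # current
print(to_python_name('Notes\n'))                       # notes
```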

In [2]:
```python
fieldnames = []
for name in rows[0]:
    # Rule 3: strip the trailing newline and any trailing question mark
    new_name = name.rstrip('\n').rstrip('?')
    # Rules 1 and 2: lower case, with spaces and slashes replaced by underscores
    new_name = new_name.lower().replace(' ', '_').replace('/', '_')
    fieldnames.append(new_name)
print(fieldnames)
```

```
['url', 'name_alias', 'appearances', 'current', 'gender', 'probationary_introl', 'full_reserve_avengers_intro', 'year', 'years_since_joining', 'honorary', 'death1', 'return1', 'death2', 'return2', 'death3', 'return3', 'death4', 'return4', 'death5', 'return5', 'notes']
```

## 3. Create dict-based records (10 points)

Using the `fieldnames` and `rows` variables from the last two parts, create `records`, a list of Python dictionaries in which each field name is mapped to the corresponding value in a row.

In [3]:
```python
records = []
# rows[0] is the header row, so start from rows[1]
for row in rows[1:]:
    record = {}
    # Pair each field name with the value in the same position
    for fieldname, value in zip(fieldnames, row):
        record[fieldname] = value
    records.append(record)
```

In [4]:
```python
print(records[4]['name_alias'], records[4]['url'])
```

```
Thor Odinson http://marvel.wikia.com/Thor_Odinson_(Earth-616)
```

## 4. Convert record values to appropriate types (10 points)

Using a `for` loop, iterate through each record and convert each value to the appropriate type. Additionally, recalculate the `years_since_joining` value relative to the current year. If a value does not need to be converted, strip any trailing spaces or newlines.

In [5]:
```python
# Year used for "years since joining" (2018, as in the output below);
# datetime.date.today().year would give the actual current year instead
CURRENT_YEAR = 2018

for record in records:
    for key in fieldnames:
        # Remove surrounding spaces and the trailing newline on the last field
        value = record[key].strip()
        if key in ('appearances', 'year', 'years_since_joining'):
            value = int(value)
        elif key == 'current' or key.startswith(('death', 'return')):
            # YES/NO flags become booleans; blank values become None
            if value == 'YES':
                value = True
            elif value == 'NO':
                value = False
            else:
                value = None
        record[key] = value
    # Recalculate years since joining relative to CURRENT_YEAR
    record['years_since_joining'] = CURRENT_YEAR - record['year']
```

In [6]:
```python
records
```

```
[{'appearances': 1269,
  'current': True,
  'death1': True,
  'death2': None,
  'death3': None,
  'death4': None,
  'death5': None,
  'full_reserve_avengers_intro': 'Sep-63',
  'gender': 'MALE',
  'honorary': 'Full',
  'name_alias': '"Henry Jonathan ""Hank"" Pym"',
  'notes': 'Merged with Ultron in Rage of Ultron Vol. 1. A funeral was held.',
  'probationary_introl': '',
  'return1': False,
  'return2': None,
  'return3': None,
  'return4': None,
  'return5': None,
  'url': 'http://marvel.wikia.com/Henry_Pym_(Earth-616)',
  'year': 1963,
  'years_since_joining': 55},
 {'appearances': 1165,
  'current': True,
  'death1': True,
  'death2': None,
  'death3': None,
  'death4': None,
  'death5': None,
  'full_reserve_avengers_intro': 'Sep-63',
  'gender': 'FEMALE',
  'honorary': 'Full',
  'name_alias': 'Janet van Dyne',
  'notes': 'Dies in Secret Invasion V1:I8. Actually was sent tto Microverse later recovered',
  'probationary_introl': '',
  'return1': True,
  'return2': None,
  'return3':
```
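Parts 3 and 4 can also be written more compactly. The sketch below is an alternative to the cells above, not the notebook's own code: it builds a record with `dict(zip(...))` and applies a hypothetical `convert` helper to each value. The five-field header and row are made up for illustration, and 2018 is assumed as the current year to match the output above.

```python
# Made-up header and row, shortened to five fields for illustration
sample_fields = ['name_alias', 'appearances', 'current', 'year', 'years_since_joining']
sample_row = ['Example Hero', '100', 'NO', '2000', '3\n']

INT_FIELDS = ('appearances', 'year', 'years_since_joining')
CURRENT_YEAR = 2018  # assumed "current year", matching the notebook's output

def convert(key, value):
    """Hypothetical helper: convert one raw string value based on its field name."""
    value = value.strip()
    if key in INT_FIELDS:
        return int(value)
    if key == 'current' or key.startswith(('death', 'return')):
        # YES/NO flags become booleans; anything else (e.g. '') becomes None
        return {'YES': True, 'NO': False}.get(value)
    return value

# Part 3: zip pairs each field name with the value in the same position
record = dict(zip(sample_fields, sample_row))
# Part 4: convert every value, then recalculate years_since_joining
record = {key: convert(key, value) for key, value in record.items()}
record['years_since_joining'] = CURRENT_YEAR - record['year']
print(record)
```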

## 5. Filtering Records (10 points)

Using a list comprehension, get the records for those Avengers with over 3000 appearances. Assign the output to the `over3000` variable. 

Use this to print the name and number of appearances for those Avengers. When printing the names, note that some names contain extra quotes. Strip the `"` characters around the entire name and replace each pair of double quotes (`""`) with a single double quote (`"`).
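The name clean-up rule can be isolated in a small function and combined with a list comprehension. `clean_name` is a hypothetical helper, and the second sample record below is made up; only the Hank Pym name and appearance count come from the output in part 4.

```python
def clean_name(name):
    """Hypothetical helper: remove enclosing quotes, then unescape doubled quotes."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        name = name[1:-1]
    return name.replace('""', '"')

print(clean_name('"Henry Jonathan ""Hank"" Pym"'))  # Henry Jonathan "Hank" Pym

# Sample records with only the two fields this step needs
sample_records = [
    {'name_alias': '"Henry Jonathan ""Hank"" Pym"', 'appearances': 1269},
    {'name_alias': 'Example Hero', 'appearances': 3500},  # made-up record
]

# List comprehension: keep the records with more than 3000 appearances
popular = [record for record in sample_records if record['appearances'] > 3000]
for record in popular:
    print("NAME:", clean_name(record['name_alias']), "APPEARANCES:", record['appearances'])
```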

In [13]:
```python
# Keep the records (dictionaries) of Avengers with more than 3000 appearances
over3000 = [record for record in records if record['appearances'] > 3000]
```

In [20]:
```python
for record in over3000:
    name = record['name_alias']
    # Strip the quotes that enclose the whole name, if present
    if name.startswith('"') and name.endswith('"'):
        name = name[1:-1]
    # Replace each escaped pair of double quotes ("") with a single double quote (")
    name = name.replace('""', '"')
    print("NAME:", name, "APPEARANCES:", record['appearances'])
```

```
NAME: Anthony Edward "Tony" Stark APPEARANCES: 3068
NAME: Steven Rogers APPEARANCES: 3458
NAME: Peter Benjamin Parker APPEARANCES: 4333
NAME: James "Logan" Howlett APPEARANCES: 3130
```
