# Week 12

Here are some questions that should exercise a bunch of the stuff we've studied in the course. 

## Question 1
Create your own text file with TextEdit or NotePad. Save it as a .txt file! The file should have multiple lines - at least 10 of them to make it interesting. Make sure your code works with both an odd and an even number of lines.

Write a function that takes a string as a parameter that will be the name of the file. Your function should do the following:
* open the file whose name is passed in for reading
* open a second file whose name is the same as the one passed in with a "_rev" before the .txt extension. So, if the parameter passed in is "my_text.txt", you should create a new file called "my_text_rev.txt".
* write the contents of the first file to the second file but swap the order of each pair of consecutive lines. 

For example, if this is your input file:


    Bicycle
    We are the champions
    Bohemian rhapsody 
    We will rock you
    Another one bites the dust

then this should be the output file:

    We are the champions
    Bicycle
    We will rock you
    Bohemian rhapsody 
    Another one bites the dust
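
The naming rule in the second bullet above (insert "_rev" before the extension) can be sketched with `os.path.splitext`, which splits off the extension whatever its length. The helper name `rev_name` is hypothetical and only for illustration; it is not part of the exercise.

```python
import os

def rev_name(filename):
    """Return filename with '_rev' inserted before its extension."""
    # os.path.splitext('my_text.txt') gives ('my_text', '.txt')
    root, ext = os.path.splitext(filename)
    return root + '_rev' + ext

print(rev_name('my_text.txt'))   # my_text_rev.txt
```

Slicing with `filename[:-4]` gives the same result for ".txt" files; `splitext` just avoids hard-coding the extension length.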

1. Write the function using a list. (Easy)
2. Write the function without using any collections (no lists, tuples, sets, dictionaries). You cannot write your own collection. Why? What if your text file is 800 GB? In part 1 you read the whole thing into memory, so your program would either crash or take a very long time. Here, you should write the code so that you only ever have two lines in memory at any one time. (Intermediate)


### Part 1 (Easy)

In [14]:
filename = 'my_text.txt'

def read_pair_swap_rev_list (filename):
    ''' (str) -> None
    Read the file in filename and write it to a new file with '_rev' before the extension,
    swapping each pair of consecutive lines.
    This function is memory intensive because it holds every line of the file in a list.
    '''
    
    lines = []

    # reading
    with open(filename, 'r') as in_file:
        text = in_file.readline()
        while text != '':
            lines.append(text)
            text = in_file.readline()    # next line

    print("lines read: ", lines)

    # creating a new file
    out_file_name = filename[:-4] +'_rev.txt'

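    # swapping the list element pairs:
    for i in range(0, len(lines) - 1, 2):
        # the file's last line may lack '\n'; add it so swapped lines do not merge
        if not lines[i + 1].endswith('\n'):
            lines[i + 1] += '\n'
        lines[i], lines[i + 1] = lines[i + 1], lines[i]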

    # writing
    with open(out_file_name, 'w') as out_file:
        for line in lines:
            out_file.write(line)

    print("lines wrote: ", lines)

# Execution
read_pair_swap_rev_list(filename)

lines read:  ['We are the champions\n', 'Bicycle\n', 'We will rock you\n', 'Bohemian rhapsody \n', 'Another one bites the dust']
lines wrote:  ['Bicycle\n', 'We are the champions\n', 'Bohemian rhapsody \n', 'We will rock you\n', 'Another one bites the dust']


### Part 2 (Intermediate)

In [15]:
def read_pair_swap_rev (filename):
    ''' (str) -> None
    Read the file in filename and write it to a new file with '_rev_noList' before the
    extension, swapping each pair of consecutive lines.
    This version uses no list, but it still holds the whole file in one string,
    so its memory use grows with the size of the file.
    '''

    # reading the whole file into one string (no list involved)
    with open(filename) as infile:
        s = infile.read()
            
    print("string read:\n", s)

    # creating a new file
    out_file_name = filename[:-4] +'_rev_noList.txt'

    # swapping and writing simultaneously
    start = 0
    with open(out_file_name, 'w') as out_file:
        while start < len(s):

            end1 = s.find('\n', start)

            # No newline left, or the newline ends the file: last line has no partner
            if end1 == -1 or end1 == len(s) - 1:
                out_file.write(s[start:])
                break

            line1 = s[start:end1 + 1]
            end2 = s.find('\n', end1 + 1)

            if end2 == -1: # Only one line left after this
                line2 = s[end1 + 1:]
                out_file.write( line2 + '\n' + line1)
                break
            else:
                line2 = s[end1 + 1: end2]
                out_file.write( line2 + '\n' + line1)
                start = end2 +1
    
# Execution
read_pair_swap_rev(filename)

string read:
 We are the champions
Bicycle
We will rock you
Bohemian rhapsody 
Another one bites the dust
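
The Part 2 task asks for at most two lines in memory at any one time. A minimal sketch of that idea, assuming the input and output names are passed in separately, reads one pair of lines per loop iteration with `readline()`. The function name `swap_pairs_streaming` is hypothetical.

```python
def swap_pairs_streaming(in_name, out_name):
    """Copy in_name to out_name, swapping each pair of consecutive lines."""
    with open(in_name) as in_file, open(out_name, 'w') as out_file:
        first = in_file.readline()
        while first != '':
            second = in_file.readline()
            if second == '':
                # odd number of lines: the last line has no partner
                out_file.write(first)
                break
            if not second.endswith('\n'):
                # the file's last line lacked a newline; add it so lines do not merge
                second += '\n'
            out_file.write(second)
            out_file.write(first)
            first = in_file.readline()
```

Only `first` and `second` are ever held, so memory use does not depend on the size of the file.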


## Question 2
Write a program (you can choose to use functions or not) to create a "fake" marks file as follows.
* the course has the following assessments: Assignment1, Assignment2, Midterm1, Midterm2, Final
    * Assignment1 has 1 part (A)
    * Assignment2 has 2 parts (A, B)
    * Midterm1 has 4 parts (A-D)
    * Midterm2 has 5 parts (A-E)
    * Final has 5 parts (A-E)
* each assessment is marked out of 100 and so the sum of the values of each part must be 100. Each part of a given assessment is worth the same. So if there are 4 parts, each is worth 25 marks.
* you have 30 students in the class and each one has a unique 3-digit ID number. For each student, generate a random 3-digit ID number.
* For each student, for each part of each assessment, generate the student's mark. The mark must be between 0 and the value of that part of that assessment.
* Write out the marks file as a CSV with the following format:


    id, assessment_name, mark_part1, mark_part2, ... 

(up to the number of parts of the assessment). For example, the first three lines of the file might be:

    333, Assignment1, 67
    333, Assignment2, 44, 20
    333, Midterm1, 20, 21, 18, 22

This file will be used in Q3.

One option is to create data structures using classes and/or dictionaries. You could do it this way. However, the main purpose of this file is to be used in Q3, so you can feel free to be "quick and dirty". My code takes about 20 lines.
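
As a rough illustration of the dictionary option, the sketch below keeps the number of parts in a dictionary and builds the rows in memory before anything is written. The names `PARTS` and `make_rows` and the seed value are assumptions made for this example only.

```python
import random

# parts per assessment, taken from the list above
PARTS = {'Assignment1': 1, 'Assignment2': 2, 'Midterm1': 4, 'Midterm2': 5, 'Final': 5}

def make_rows(num_students, rng):
    """Return a list of CSV rows of the form [id, assessment, mark_part1, ...]."""
    rows = []
    # sample() draws without replacement, so the 3-digit IDs are unique
    for student_id in rng.sample(range(100, 1000), num_students):
        for name, n_parts in PARTS.items():
            part_value = 100 // n_parts      # every part is worth the same
            marks = [rng.randint(0, part_value) for _ in range(n_parts)]
            rows.append([student_id, name] + marks)
    return rows

# seed 0 is an arbitrary choice that makes the fake data repeatable
rows = make_rows(30, random.Random(0))
print(len(rows))   # 150 rows: 30 students x 5 assessments
```

The rows can then be passed to `csv.writer(...).writerows(rows)` to produce the marks file.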


In [16]:
import random
import csv

# hardcoded data
assessments = ('Assignment1', 'Assignment2', 'Midterm1', 'Midterm2', 'Final')
num_parts = (1,2,4,5,5) # number of parts of each assessment
num_students = 30

with open("marks.csv", "w",  newline='') as outfile:
    # Note on newline='': csv.writer already ends each row with \r\n. Without newline='',
    # text mode on Windows translates that \n to \r\n again, so rows end in \r\r\n and appear double-spaced.
    writer = csv.writer(outfile)
    
    # random.sample draws without replacement, so each student gets a unique
    # 3-digit ID (randint(100,1000) could repeat an ID or return the 4-digit 1000)
    for id in random.sample(range(100, 1000), num_students):
        
        for i in range(len(assessments)):
            out_row = [id, assessments[i]]
            
            for p in range(num_parts[i]):
                # integer division: randint needs integer bounds (100/num_parts[i] is a float)
                out_row.append(str(random.randint(0, 100 // num_parts[i])))
                
            writer.writerow(out_row)
            

## Question 3:
The purpose of this problem is to give you practice with nested dictionaries.

### a)
Write a function that takes in a filename, reads in the CSV file you created in Q2 from a file of that name, and returns a dictionary of the following form:

    {id : {assessment : total_mark}}

This is a nested dictionary. The outer dictionary has keys corresponding to student IDs. For each student ID, the value is an inner dictionary whose keys are the 5 assessments and whose values are the total score on the corresponding assessment. For example, from the partial CSV file listed in Q2, you would create the following partial dictionary.

    {333 : {'Assignment1' : 67,
            'Assignment2' : 64,
            'Midterm1' : 81, ...},...}

Then write a function that takes the dictionary and returns a list of tuples - one for each student. Each tuple should be of the form (`id`, `avg_mark`) where `id` is an ID number and `avg_mark` is the average mark from all assessments of that student.

### b) 
Write a function that takes in a filename, reads in the CSV file you created in Q2 from a file of that name, and returns a dictionary of the following form:

    {id : {assessment : {assessment_part : mark}}}

This is a triply nested dictionary. The outer dictionary has keys corresponding to student IDs. For each student ID, the value is an inner dictionary whose keys are the 5 assessments and whose values are another dictionary. This inner-most dictionary should have keys corresponding to the parts of the assessment (i.e., A,B,C,D,E). For example, from the partial CSV file listed in Q2, you would create the following partial dictionary.

    {333 : {'Assignment1' : {'A' : 67},
            'Assignment2' : {'A' : 44, 'B' : 20},
            'Midterm1' : {'A' : 20, 'B' : 21, 'C' : 18, 'D' : 22},
    ...},
    ...}

Then write a function that takes the dictionary and returns a list of tuples - one for each student. Each tuple should be of the form (`id`, `avg_mark`) where `id` is an ID number and `avg_mark` is the average mark from all assessments of that student. Note that this function is different from the one in part a) because the dictionary has a different form.
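
One possible shape for part b) is sketched below. It assumes that part letters follow column order (first mark is 'A', second is 'B', and so on) and that IDs are kept as strings as read from the file. The function names `read_marks_by_part` and `average_by_part` are hypothetical.

```python
import csv

def read_marks_by_part(filename):
    """Return {id: {assessment: {part_letter: mark}}} from the Q2 CSV file."""
    all_students = {}
    with open(filename, newline='') as infile:
        for row in csv.reader(infile):
            if not row:                      # skip blank lines
                continue
            student_id = row[0].strip()
            assessment = row[1].strip()
            parts = {}
            for i, mark in enumerate(row[2:]):
                parts[chr(ord('A') + i)] = float(mark)   # 0 -> 'A', 1 -> 'B', ...
            all_students.setdefault(student_id, {})[assessment] = parts
    return all_students

def average_by_part(all_students):
    """Return [(id, avg_mark), ...], averaging the assessment totals per student."""
    result = []
    for student_id, assessments in all_students.items():
        totals = [sum(parts.values()) for parts in assessments.values()]
        result.append((student_id, sum(totals) / len(totals)))
    return result
```

Compared with part a), the only extra step in the averaging function is summing the innermost dictionary to recover each assessment total.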

In [17]:
def generate_student_marks(filename):
    '''
    (str) -> dictionary {id : {assessment : mark}}
    Opens the CSV file indicated by filename.
    Reads in the student mark records and stores them in a nested dictionary indexed
    first by ID and then by assessment name
    '''
    
    all_students = {}
    with open(filename, "r") as infile:
    
        reader = csv.reader(infile)
        for row in reader:
            id = row[0]
            assessment = row[1]
            total_mark = 0
            for mark in row[2:]:
                float_mark = float(mark)
                total_mark += float_mark
                    
            if id not in all_students:
                all_students[id] = {assessment : total_mark} 
            else:
                all_students[id][assessment] = total_mark
            
    return all_students

def calculate_average(all_students):
    '''
    (dict) -> list of tuples
    Calculate the average mark for each student and return it in a list of tuples
    (id, average)
    '''
    mark_list = []
    for id in all_students:
        sum_mark = 0
        for assessment in all_students[id]:
            sum_mark += all_students[id][assessment]
        avg = sum_mark / len(all_students[id])
        mark_list.append((id, avg))
        
    return mark_list

students = generate_student_marks("marks.csv")
print(students, len(students))

student_list = calculate_average(students)
print(student_list)



{'870': {'Assignment1': 56.0, 'Assignment2': 35.0, 'Midterm1': 79.0, 'Midterm2': 50.0, 'Final': 44.0}, '412': {'Assignment1': 71.0, 'Assignment2': 44.0, 'Midterm1': 58.0, 'Midterm2': 52.0, 'Final': 67.0}, '120': {'Assignment1': 100.0, 'Assignment2': 40.0, 'Midterm1': 40.0, 'Midterm2': 66.0, 'Final': 47.0}, '622': {'Assignment1': 53.0, 'Assignment2': 37.0, 'Midterm1': 52.0, 'Midterm2': 40.0, 'Final': 59.0}, '284': {'Assignment1': 94.0, 'Assignment2': 47.0, 'Midterm1': 54.0, 'Midterm2': 45.0, 'Final': 14.0}, '509': {'Assignment1': 90.0, 'Assignment2': 14.0, 'Midterm1': 56.0, 'Midterm2': 64.0, 'Final': 59.0}, '951': {'Assignment1': 55.0, 'Assignment2': 37.0, 'Midterm1': 68.0, 'Midterm2': 35.0, 'Final': 61.0}, '900': {'Assignment1': 41.0, 'Assignment2': 36.0, 'Midterm1': 11.0, 'Midterm2': 49.0, 'Final': 75.0}, '961': {'Assignment1': 0.0, 'Assignment2': 66.0, 'Midterm1': 68.0, 'Midterm2': 56.0, 'Final': 46.0}, '244': {'Assignment1': 10.0, 'Assignment2': 54.0, 'Midterm1': 18.0, 'Midterm2': 4

## Question 4

Using the same data and set-up as Q3 do the following.

### a) 
Redo Q3a using a `Student` class. 

Create a `Student` class. The class should contain the student ID and a dictionary that is the same as the inner dictionary in Q3a. That is, each `Student` object should have a dictionary of the form: `{assessment : total_mark}`

For example:

    {'Assignment1' : 67,'Assignment2' : 64,'Midterm1' : 81, ...}

Create a dictionary of the form: `{id : Student-object}`

Then write a function that takes the dictionary and returns a list of tuples - one for each student. Each tuple should be of the form (`id`, `avg_mark`) where `id` is an ID number and `avg_mark` is the average mark from all assessments of that student.

### b) 
Redo Q3b using a `Student` class. 

Create a `Student` class. The class should contain the student ID and a dictionary that is the same as the inner dictionary in Q3b. That is, each `Student` object should have a dictionary of the form: `{assessment : {assessment_part : mark}}`

For example:

    {'Assignment1' : {'A' : 67},
    'Assignment2' : {'A' : 44, 'B' : 20},
    'Midterm1' : {'A' : 20, 'B' : 21, 'C' : 18, 'D' : 22},...}

Create a dictionary of the form: `{id : Student-object}`

Then write a function that takes the dictionary and returns a list of tuples - one for each student. Each tuple should be of the form (`id`, `avg_mark`) where `id` is an ID number and `avg_mark` is the average mark from all assessments of that student.
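
A minimal sketch of the class for part b) could look like the following. It assumes the part marks arrive as a list in column order. The class is named `PartStudent` here only to keep it apart from the `Student` class of part a); that name and `add_part_marks` are hypothetical.

```python
class PartStudent:
    """A student whose marks are stored per assessment part."""

    def __init__(self, student_id):
        self.id = student_id
        self.marks = {}          # {assessment: {part_letter: mark}}

    def add_part_marks(self, assessment, part_marks):
        """Store a list of part marks under the letters 'A', 'B', ..."""
        self.marks[assessment] = {
            chr(ord('A') + i): mark for i, mark in enumerate(part_marks)
        }

    def calc_avg(self):
        """Return the average of the assessment totals."""
        totals = [sum(parts.values()) for parts in self.marks.values()]
        return sum(totals) / len(totals)

s = PartStudent('333')
s.add_part_marks('Assignment1', [67])
s.add_part_marks('Assignment2', [44, 20])
s.add_part_marks('Midterm1', [20, 21, 18, 22])
print(s.marks['Assignment2'])   # {'A': 44, 'B': 20}
print(s.calc_avg())
```

With `calc_avg` on the object, the function that builds the list of (`id`, `avg_mark`) tuples can stay the same as in part a).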

In [18]:
class Student:
    '''An object that represents student information'''
    
    def __init__(self, id):
        '''
        (self,str) -> NoneType
        Assigns the id, creates an empty marks dictionary
        '''
        self.id = id
        self.marks = {}
        
    def __str__(self):
        '''
        (self) -> str
        '''
        ret_str = self.id + " ["
        for assessment in self.marks:
            ret_str += "(" + assessment + "," + str(self.marks[assessment]) + ")"
        ret_str += "]"
        return ret_str

    def add_mark(self, assessment, mark):
        '''
        (self, str, num) -> NoneType
        Adds num as the mark for assessment
        '''
        self.marks[assessment] = mark  # overwrites existing mark if any. Check?

    def calc_avg(self):
        '''
        (self) -> float
        Returns the average mark over the assessments
        '''
        total_mark = 0
        for assessment in self.marks:
            total_mark += self.marks[assessment]
        return total_mark / len(self.marks)

def generate_student_marks(filename):
    '''
    (str) -> dictionary {str : Student}
    Parse the student data from CSV file filename
    '''
    
    all_students = {}
    with open(filename, "r") as infile:
    
        reader = csv.reader(infile)
        for row in reader:
            id = row[0]
            if id not in all_students:
                s = Student(id)
                all_students[id] = s
            else:
                s = all_students[id]
                
            assessment = row[1]
            total_mark = 0
            for mark in row[2:]:
                float_mark = float(mark)
                total_mark += float_mark
                
            s.add_mark(assessment, total_mark)
                        
    return all_students

def calculate_average(all_students):
    '''
    (dict of Students) -> list of tuples
    Calculate the average mark for each student and return it in a list of tuples
    (id, average)
    '''
    mark_list = []
    for id in all_students:
        mark_list.append((id, all_students[id].calc_avg()))
        
    return mark_list

students = generate_student_marks("marks.csv")
for s in students:
    print(students[s])
    
student_list = calculate_average(students)
print(student_list)

870 [(Assignment1,56.0)(Assignment2,35.0)(Midterm1,79.0)(Midterm2,50.0)(Final,44.0)]
412 [(Assignment1,71.0)(Assignment2,44.0)(Midterm1,58.0)(Midterm2,52.0)(Final,67.0)]
120 [(Assignment1,100.0)(Assignment2,40.0)(Midterm1,40.0)(Midterm2,66.0)(Final,47.0)]
622 [(Assignment1,53.0)(Assignment2,37.0)(Midterm1,52.0)(Midterm2,40.0)(Final,59.0)]
284 [(Assignment1,94.0)(Assignment2,47.0)(Midterm1,54.0)(Midterm2,45.0)(Final,14.0)]
509 [(Assignment1,90.0)(Assignment2,14.0)(Midterm1,56.0)(Midterm2,64.0)(Final,59.0)]
951 [(Assignment1,55.0)(Assignment2,37.0)(Midterm1,68.0)(Midterm2,35.0)(Final,61.0)]
900 [(Assignment1,41.0)(Assignment2,36.0)(Midterm1,11.0)(Midterm2,49.0)(Final,75.0)]
961 [(Assignment1,0.0)(Assignment2,66.0)(Midterm1,68.0)(Midterm2,56.0)(Final,46.0)]
244 [(Assignment1,10.0)(Assignment2,54.0)(Midterm1,18.0)(Midterm2,42.0)(Final,84.0)]
156 [(Assignment1,65.0)(Assignment2,42.0)(Midterm1,62.0)(Midterm2,45.0)(Final,47.0)]
132 [(Assignment1,100.0)(Assignment2,43.0)(Midterm1,25.0)(Midterm