
# Object-Oriented Programming - Example 2

---

### Mark Stephens

*Consider the scenario of a senior secondary college with **Year10** classes, where **Year10Math**, **Year10English**, **Year10Humanities** and **Year10Science** are all **Year10** classes. The Year 10 cohort has many students, who are split into 6 different classes for each subject. Each class completes an end-of-semester examination. Student awards are given to the highest-achieving students, so a ranking of the best-performing students is necessary. The teachers of the classes also wish to identify which class achieved the best examination performance in order to analyse teacher strategies for revision and exam preparation. This can be done by ranking the mean exam score per class and ranking the z-scores of all students.*

---
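The z-score ranking mentioned in the scenario can be illustrated on a tiny, made-up class. The student IDs and results below are hypothetical, and the sketch uses only the standard library; it assumes the population standard deviation (the same convention as the default of `numpy.std`).

```python
from statistics import mean, pstdev

def z_scores(results):
    """Return the z-score of each result, rounded to 2 decimal places."""
    mu = mean(results)
    sigma = pstdev(results)  # population standard deviation (ddof = 0)
    return [round((x - mu) / sigma, 2) for x in results]

# Hypothetical class: student ID -> exam result (%)
class_results = {"AAAA0001": 40, "BBBB0002": 50, "CCCC0003": 60,
                 "DDDD0004": 70, "EEEE0005": 80}

scores = z_scores(list(class_results.values()))
# Pair each student with their z-score and rank from highest to lowest
ranking = sorted(zip(class_results, scores), key=lambda pair: pair[1], reverse=True)
print(ranking[0])  # ('EEEE0005', 1.41)
```

Because a z-score measures distance from the class mean in units of the class standard deviation, it lets students from classes with different means and spreads be placed on one ranking.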

In [None]:
import random, string
import numpy as np
import pandas as pd
# User defined module
import Prac2Module as p2m

## Useful functions

### Truncated skew normal random number

Examination results are a proportion of correctness expressed as a percentage, with limiting values of 0 and 100. Any given class comprises students of mixed abilities, with no guarantee of equal proportions of students of similar strength, so a skewed normal distribution, truncated at 0 and 100, is a reasonable model for examination results. This is implemented as an approximate truncated skew normal random number generator in the attached module *Prac2Module*. An example is given below:

In [None]:
m, s, l, a, b, h = 50,10,5,0,100,0.25
p2m.trucskewnormrand(m, s, l, a, b, h)

68.75

In [None]:
help(p2m)

Help on module Prac2Module:

NAME
    Prac2Module

DESCRIPTION
    Truncated Skew Normal Random Number Generator
    This uses a simpson method approximation for the integral of the pdf of the skew normal distribution
    to form the cumulative distribution function.
    Attributes:
        m - mean of distribution.
        s - standard deviation of distribution
        l - skew of distribution
        a - minimum value of truncated distribution
        b - maximum value of truncated distriubtion
        h - step size of numerical integration

FUNCTIONS
    approx_inverse_sncdf(m, s, l, a, b, h)
    
    simpson_sncdf(x, m, s, l, h)
    
    skewnormpdf(x, m, s, l)
    
    trucskewnormrand(m, s, l, a, b, h)
    
    znormcdf(x)
    
    znormpdf(x)

FILE
    c:\users\08632717\documents\masters of data science study\foundations of computer science - python b\assignment 1\prac2module.py
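
The source of *Prac2Module* is not shown here, so the following is only a rough sketch of how such a generator could work, not the module's implementation. It assumes `m`, `s` and `l` act as the location, scale and shape of a skew normal density, tabulates the CDF on `[a, b]` with the trapezoid rule (rather than the Simpson method the module describes) and inverts it by linear interpolation. The function names are hypothetical.

```python
import math
import random
from bisect import bisect_left

def skew_normal_pdf(x, m, s, l):
    """Skew normal density with location m, scale s and shape l (assumed parameterisation)."""
    z = (x - m) / s
    phi = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)  # standard normal pdf at z
    big_phi = 0.5 * (1 + math.erf(l * z / math.sqrt(2)))   # standard normal cdf at l*z
    return 2 * phi * big_phi / s

def truncated_skew_normal_sample(m, s, l, a, b, h, rng=random):
    """Draw one value in [a, b] by inverting a tabulated, unnormalised CDF."""
    n = int(round((b - a) / h))
    xs = [a + i * h for i in range(n + 1)]
    pdf = [skew_normal_pdf(x, m, s, l) for x in xs]
    cdf = [0.0]
    for i in range(n):
        cdf.append(cdf[-1] + 0.5 * h * (pdf[i] + pdf[i + 1]))  # trapezoid rule
    u = rng.random() * cdf[-1]   # uniform draw scaled to the mass inside [a, b]
    j = bisect_left(cdf, u)      # first grid index with cdf[j] >= u
    if j == 0:
        return xs[0]
    # linear interpolation between the two grid points that bracket u
    t = (u - cdf[j - 1]) / (cdf[j] - cdf[j - 1])
    return xs[j - 1] + t * h

random.seed(0)
samples = [truncated_skew_normal_sample(50, 10, 5, 0, 100, 0.25) for _ in range(200)]
print(min(samples), max(samples))
```

Scaling the uniform draw by the total mass inside `[a, b]` is what performs the truncation: values outside the interval can never be returned.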




In [None]:
def generate_cohort(size: int) -> list[str]:
    """
    Returns a list of randomly generated ID strings such as 'ABCD0123': four distinct uppercase
    letters followed by a number below 1000, zero-padded to four digits.
    Arguments:
    size - size of the population.
    """
    list_cohort = []
    for i in range(size):
        str1 = "".join(random.sample(string.ascii_uppercase,4))
        str2 = (str(random.randrange(1000)).zfill(4))
        ID = str1 + str2
        list_cohort.append(ID)
    return list_cohort


def generate_class_result(m: float, s: float, l: float, student_list: list) -> list[list[str,int]]:
    """
    Generates a list of [student ID, exam result] lists, with each result a percentage drawn
    from the truncated skew normal random number generator.
    Arguments:
    m - mean of distribution.
    s - standard deviation of distribution.
    l - skew of distribution.
    student_list - a list of student IDs.
    """
    class_data = []
    for student in student_list:
        result = round(p2m.trucskewnormrand(m, s, l, 0, 100, 0.2))
        class_data.append([student,result])
    return class_data

def stats(class_data: list[list[str,int]]) -> tuple:
    """
    Determines the mean, standard deviation and sample size of the results in a list.
    Returns the list of results, class mean, class standard deviation and class size.
    Arguments:
    class_data - a list of lists, e, where e[0] is a student ID string and e[1] is an integer result.
    """
    results = [e[1] for e in class_data]
    class_mean = round(np.mean(results),1)
    class_stdev = round(np.std(results),1)
    class_size = len(results)
    return results, class_mean, class_stdev, class_size

def get_initials(fullname):
    """
    Take a full name and return capitalised initials.
    Arguments:
    fullname - the full name as a string.
    reference:
    https://stackoverflow.com/questions/41005700/function-that-returns-capitalized-initials-of-name
    """
    xs = (fullname)
    name_list = xs.split()

    initials = ""

    for name in name_list:  # go through each name
        initials += name[0].upper()  # append the initial

    return initials
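
Because `generate_cohort` draws every ID independently, two students could in principle receive the same ID, although this is unlikely for a cohort of this size. A minimal sketch of a variant that guarantees uniqueness is given below; the function name `generate_unique_cohort` and its `seed` parameter are hypothetical additions, not part of the notebook.

```python
import random
import string

def generate_unique_cohort(size, seed=None):
    """Return `size` distinct IDs: four distinct uppercase letters plus a zero-padded number below 1000."""
    rng = random.Random(seed)  # a local generator keeps the global random state untouched
    ids = set()
    while len(ids) < size:     # a set silently discards any repeated ID
        letters = "".join(rng.sample(string.ascii_uppercase, 4))
        digits = str(rng.randrange(1000)).zfill(4)
        ids.add(letters + digits)
    return sorted(ids)

unique_cohort = generate_unique_cohort(125, seed=1)
print(len(unique_cohort), unique_cohort[:3])
```

The list is returned sorted only to make the result reproducible for a given seed; the notebook shuffles the cohort before forming classes anyway.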

In [None]:
class Year10:
    """
    A Year 10 class that completed an end of semester examination.
    Typically comprises 18 to 25 students, with corresponding exam results.
    Class summary statistics are calculated and the z-score for each student is determined.

    Attributes:
    class_size - number of students in the class.
    class_mean - mean of exam results for class.
    rank_1 - the student with the highest result and their z-score as a tuple.
    class_data - a list of student IDs and corresponding results.

    Methods:
    analyse_exam() - the mean and five number summary of exam results for the class are printed, and the
    attribute class_mean is updated.
    top_zscore() - student results are converted to z-scores and printed in descending order. The student with
    first rank is assigned to attribute rank_1.
    """
    def __init__(self,class_data: list[list[str,int]]):
        self.class_data = class_data
        self.class_size = len(class_data)
        self.class_mean = 0
        self.rank_1 = 0

    def __lt__(self , other):
        return self.class_mean < other.class_mean

    def __gt__(self, other):
        return ((self.class_mean) > (other.class_mean))

    def __eq__(self, other):
        return self.class_mean == other.class_mean

    def __repr__(self):
        if self.class_mean == 0:
            return f"Teacher is yet to correct the exam"
        else:
            return f"A Year 10 class with mean exam result of {self.class_mean}"

    def analyse_exam(self):
        # assigns results and statistics
        if bool(self.class_data):
            results, self.class_mean, class_stdev, class_size = stats(self.class_data)
            quantile_val = np.quantile(results, [0,0.25,0.5,0.75,1])
        else:
            return print("No class data available.")
        new_line = "\n"
        return print(f"mean result is: {self.class_mean},{new_line}\
        minimum is: {quantile_val[0]},{new_line}\
        lower quartile is: {quantile_val[1]},{new_line}\
        median is: {quantile_val[2]},{new_line}\
        upper quartile is: {quantile_val[3]},{new_line}\
        maximum is: {quantile_val[4]}{new_line}")

    def top_zscore(self):
        if bool(self.class_data):
            # determine statistics
            results, self.class_mean, class_stdev, class_size = stats(self.class_data)
        else:
            return print("No class data available.")
        # convert all results to z-scores in a new list so that class_data keeps the raw results
        zscore_data = [[student, round((result - self.class_mean)/class_stdev,2)]
                       for student, result in self.class_data]
        #find the index for element with maximum score and assign student and zscore to attribute rank_1
        index = np.array(results).argmax()
        self.rank_1 = tuple(zscore_data[index])
        zscore_data.sort(key = lambda a: a[1],reverse=True)
        new_line = "\n"
        return print(f"The sorted class list with z-scores:{new_line}\
        {np.array(zscore_data)}")

class Year10Math(Year10):
    """
    A Year 10 Math class that completed an end of semester examination.
    Typically comprises 18 to 25 students, with corresponding exam results.
    Class summary statistics are calculated and the z-score for each student is determined.

    Attributes:
    teacher_name - the name of the teacher of the Math class.
    class_size - number of students in the class.
    class_mean - mean of exam results for class.
    rank_1 - the student with the highest result and their z-score as a tuple.
    class_data - a list of student IDs and corresponding results.

    Methods:
    analyse_exam() - the mean and five number summary of exam results for the class are printed, and the
    attribute class_mean is updated.
    top_zscore() - student results are converted to z-scores and sorted in descending order. The student with
    first rank is assigned to attribute rank_1. Returns an array with teacher initials, class name,
    student ID and z-score in each row.
    """
    def __init__(self, class_data: list[list[str,int]], teacher_name: str):
        Year10.__init__(self,class_data)
        self.teacher_name = teacher_name

    def __lt__(self , other):
        return self.class_mean < other.class_mean

    def __gt__(self, other):
        return ((self.class_mean) > (other.class_mean))

    def __eq__(self, other):
        return self.class_mean == other.class_mean

    def __repr__(self):
        if self.class_mean == 0:
            return f"{self.teacher_name} is yet to correct the Year 10 Mathematics exam"
        else:
            return f"{self.teacher_name}'s Year 10 Mathematics class with mean exam score of {self.class_mean}."

    def analyse_exam(self):
        # assigns results and statistics
        if bool(self.class_data):
            results, self.class_mean, class_stdev, class_size = stats(self.class_data)
            quantile_val = np.quantile(results, [0,0.25,0.5,0.75,1])
        else:
            return print("No class data available.")
        new_line = "\n"
        return print(f"{self.teacher_name}'s Mathematics class: {new_line}\
mean result is: {self.class_mean},{new_line}\
minimum is: {quantile_val[0]},{new_line}\
lower quartile is: {quantile_val[1]},{new_line}\
median is: {quantile_val[2]},{new_line}\
upper quartile is: {quantile_val[3]},{new_line}\
maximum is: {quantile_val[4]}{new_line}")

    def top_zscore(self):
        if bool(self.class_data):
            # determine statistics
            results, self.class_mean, class_stdev, class_size = stats(self.class_data)
        else:
            return print("No class data available.")
        # convert all results to z-scores in a new list so that class_data keeps the raw results
        zscore_data = [[student, round((result - self.class_mean)/class_stdev,2)]
                       for student, result in self.class_data]
        # find the index for element with maximum score and assign student and zscore to attribute rank_1
        index = np.array(results).argmax()
        self.rank_1 = tuple(zscore_data[index])
        # sort the data
        zscore_data.sort(key = lambda a: a[1],reverse=True)
        # join teacher name and class name to class data.
        a = np.full((len(zscore_data),2),(get_initials(self.teacher_name),"10Math"))
        b = np.array(zscore_data)
        return np.concatenate((a, b), axis=1)

class Year10English(Year10):
    """
    A Year 10 English class that completed an end of semester examination.
    Typically comprises 18 to 25 students, with corresponding exam results.
    Class summary statistics are calculated and the z-score for each student is determined.

    Attributes:
    teacher_name - the name of the teacher of the English class.
    class_size - number of students in the class.
    class_mean - mean of exam results for class.
    rank_1 - the student with the highest result and their z-score as a tuple.
    class_data - a list of student IDs and corresponding results.

    Methods:
    analyse_exam() - the mean and five number summary of exam results for the class are printed, and the
    attribute class_mean is updated.
    top_zscore() - student results are converted to z-scores and sorted in descending order. The student with
    first rank is assigned to attribute rank_1. Returns an array with teacher initials, class name,
    student ID and z-score in each row.
    """
    def __init__(self, class_data: list[list[str,int]], teacher_name: str):
        Year10.__init__(self,class_data)
        self.teacher_name = teacher_name

    def __lt__(self , other):
        return self.class_mean < other.class_mean

    def __gt__(self, other):
        return ((self.class_mean) > (other.class_mean))

    def __eq__(self, other):
        return self.class_mean == other.class_mean

    def __repr__(self):
        if self.class_mean == 0:
            return f"{self.teacher_name} is yet to correct the Year 10 English exam"
        else:
            return f"{self.teacher_name}'s Year 10 English class with mean exam score of {self.class_mean}."

    def analyse_exam(self):
        # assigns results and statistics
        if bool(self.class_data):
            results, self.class_mean, class_stdev, class_size = stats(self.class_data)
            quantile_val = np.quantile(results, [0,0.25,0.5,0.75,1])
        else:
            return print("No class data available.")
        new_line = "\n"
        return print(f"{self.teacher_name}'s English class: {new_line}\
mean result is: {self.class_mean},{new_line}\
minimum is: {quantile_val[0]},{new_line}\
lower quartile is: {quantile_val[1]},{new_line}\
median is: {quantile_val[2]},{new_line}\
upper quartile is: {quantile_val[3]},{new_line}\
maximum is: {quantile_val[4]}{new_line}")

    def top_zscore(self):
        if bool(self.class_data):
            # determine statistics
            results, self.class_mean, class_stdev, class_size = stats(self.class_data)
        else:
            return print("No class data available.")
        # convert all results to z-scores in a new list so that class_data keeps the raw results
        zscore_data = [[student, round((result - self.class_mean)/class_stdev,2)]
                       for student, result in self.class_data]
        # find the index for element with maximum score and assign student and zscore to attribute rank_1
        index = np.array(results).argmax()
        self.rank_1 = tuple(zscore_data[index])
        # sort the data
        zscore_data.sort(key = lambda a: a[1],reverse=True)
        # join teacher name and class name to class data.
        a = np.full((len(zscore_data),2),(get_initials(self.teacher_name),"10English"))
        b = np.array(zscore_data)
        return np.concatenate((a, b), axis=1)

class Year10Science(Year10):
    """
    A Year 10 Science class that completed an end of semester examination.
    Typically comprises 18 to 25 students, with corresponding exam results.
    Class summary statistics are calculated and the z-score for each student is determined.

    Attributes:
    teacher_name - the name of the teacher of the Science class.
    class_size - number of students in the class.
    class_mean - mean of exam results for class.
    rank_1 - the student with the highest result and their z-score as a tuple.
    class_data - a list of student IDs and corresponding results.

    Methods:
    analyse_exam() - the mean and five number summary of exam results for the class are printed, and the
    attribute class_mean is updated.
    top_zscore() - student results are converted to z-scores and sorted in descending order. The student with
    first rank is assigned to attribute rank_1. Returns an array with teacher initials, class name,
    student ID and z-score in each row.
    """
    def __init__(self, class_data: list[list[str,int]], teacher_name: str):
        Year10.__init__(self,class_data)
        self.teacher_name = teacher_name

    def __lt__(self , other):
        return self.class_mean < other.class_mean

    def __gt__(self, other):
        return ((self.class_mean) > (other.class_mean))

    def __eq__(self, other):
        return self.class_mean == other.class_mean

    def __repr__(self):
        if self.class_mean == 0:
            return f"{self.teacher_name} is yet to correct the Year 10 Science exam"
        else:
            return f"{self.teacher_name}'s Year 10 Science class with mean exam score of {self.class_mean}."

    def analyse_exam(self):
        # assigns results and statistics
        if bool(self.class_data):
            results, self.class_mean, class_stdev, class_size = stats(self.class_data)
            quantile_val = np.quantile(results, [0,0.25,0.5,0.75,1])
        else:
            return print("No class data available.")
        new_line = "\n"
        return print(f"{self.teacher_name}'s Science class: {new_line}\
mean result is: {self.class_mean},{new_line}\
minimum is: {quantile_val[0]},{new_line}\
lower quartile is: {quantile_val[1]},{new_line}\
median is: {quantile_val[2]},{new_line}\
upper quartile is: {quantile_val[3]},{new_line}\
maximum is: {quantile_val[4]}{new_line}")

    def top_zscore(self):
        if bool(self.class_data):
            # determine statistics
            results, self.class_mean, class_stdev, class_size = stats(self.class_data)
        else:
            return print("No class data available.")
        # convert all results to z-scores in a new list so that class_data keeps the raw results
        zscore_data = [[student, round((result - self.class_mean)/class_stdev,2)]
                       for student, result in self.class_data]
        # find the index for element with maximum score and assign student and zscore to attribute rank_1
        index = np.array(results).argmax()
        self.rank_1 = tuple(zscore_data[index])
        # sort the data
        zscore_data.sort(key = lambda a: a[1],reverse=True)
        # join teacher name and class name to class data.
        a = np.full((len(zscore_data),2),(get_initials(self.teacher_name),"10Science"))
        b = np.array(zscore_data)
        return np.concatenate((a, b), axis=1)

class Year10Humanities(Year10):
    """
    A Year 10 Humanities class that completed an end of semester examination.
    Typically comprises 18 to 25 students, with corresponding exam results.
    Class summary statistics are calculated and the z-score for each student is determined.

    Attributes:
    teacher_name - the name of the teacher of the Humanities class.
    class_size - number of students in the class.
    class_mean - mean of exam results for class.
    rank_1 - the student with the highest result and their z-score as a tuple.
    class_data - a list of student IDs and corresponding results.

    Methods:
    analyse_exam() - the mean and five number summary of exam results for the class are printed, and the
    attribute class_mean is updated.
    top_zscore() - student results are converted to z-scores and sorted in descending order. The student with
    first rank is assigned to attribute rank_1. Returns an array with teacher initials, class name,
    student ID and z-score in each row.
    """
    def __init__(self, class_data: list[list[str,int]], teacher_name: str):
        Year10.__init__(self,class_data)
        self.teacher_name = teacher_name

    def __lt__(self , other):
        return self.class_mean < other.class_mean

    def __gt__(self, other):
        return ((self.class_mean) > (other.class_mean))

    def __eq__(self, other):
        return self.class_mean == other.class_mean

    def __repr__(self):
        if self.class_mean == 0:
            return f"{self.teacher_name} is yet to correct the Year 10 Humanities exam"
        else:
            return f"{self.teacher_name}'s Year 10 Humanities class with mean exam score of {self.class_mean}."

    def analyse_exam(self):
        # assigns results and statistics
        if bool(self.class_data):
            results, self.class_mean, class_stdev, class_size = stats(self.class_data)
            quantile_val = np.quantile(results, [0,0.25,0.5,0.75,1])
        else:
            return print("No class data available.")
        new_line = "\n"
        return print(f"{self.teacher_name}'s Humanities class: {new_line}\
mean result is: {self.class_mean},{new_line}\
minimum is: {quantile_val[0]},{new_line}\
lower quartile is: {quantile_val[1]},{new_line}\
median is: {quantile_val[2]},{new_line}\
upper quartile is: {quantile_val[3]},{new_line}\
maximum is: {quantile_val[4]}{new_line}")

    def top_zscore(self):
        if bool(self.class_data):
            # determine statistics
            results, self.class_mean, class_stdev, class_size = stats(self.class_data)
        else:
            return print("No class data available.")
        # convert all results to z-scores in a new list so that class_data keeps the raw results
        zscore_data = [[student, round((result - self.class_mean)/class_stdev,2)]
                       for student, result in self.class_data]
        # find the index for element with maximum score and assign student and zscore to attribute rank_1
        index = np.array(results).argmax()
        self.rank_1 = tuple(zscore_data[index])
        # sort the data
        zscore_data.sort(key = lambda a: a[1],reverse=True)
        # array of teacher initials and subject
        a = np.full((len(zscore_data),2),(get_initials(self.teacher_name),"10Humanities"))
        # array of exam data
        b = np.array(zscore_data)
        # join teacher initials, class name and class data.
        return np.concatenate((a, b), axis=1)
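
The four subject classes above repeat the same comparison and reporting methods. As an illustration only, the sketch below shows one way the shared logic could live in the base class, with each subclass supplying just its subject name. The class names `ExamClass`, `MathClass` and `EnglishClass`, the placeholder teacher names and the use of `functools.total_ordering` are assumptions of this sketch, not the notebook's design.

```python
from functools import total_ordering
from statistics import mean

@total_ordering  # fills in <=, > and >= from __eq__ and __lt__
class ExamClass:
    """Hypothetical base class holding the comparison and reporting logic once."""
    subject = "General"  # overridden by each subclass (polymorphism through a class attribute)

    def __init__(self, class_data, teacher_name):
        self.class_data = class_data
        self.teacher_name = teacher_name
        self.class_mean = None  # None, rather than 0, marks an exam that is not yet corrected

    def analyse_exam(self):
        results = [result for _, result in self.class_data]
        self.class_mean = round(mean(results), 1)
        return self.class_mean

    def __eq__(self, other):
        if not isinstance(other, ExamClass):
            return NotImplemented
        return self.class_mean == other.class_mean

    def __lt__(self, other):
        if not isinstance(other, ExamClass):
            return NotImplemented
        return self.class_mean < other.class_mean

    def __repr__(self):
        if self.class_mean is None:
            return f"{self.teacher_name} is yet to correct the Year 10 {self.subject} exam"
        return (f"{self.teacher_name}'s Year 10 {self.subject} class "
                f"with mean exam score of {self.class_mean}")

class MathClass(ExamClass):
    subject = "Mathematics"

class EnglishClass(ExamClass):
    subject = "English"

# Made-up data: two small classes with placeholder teacher names
classes = [EnglishClass([["CCCC0003", 40], ["DDDD0004", 45]], "Teacher Two"),
           MathClass([["AAAA0001", 70], ["BBBB0002", 50]], "Teacher One")]
for c in classes:
    c.analyse_exam()
ranked = sorted(classes, reverse=True)  # highest class mean first
print(ranked)
```

Sorting only needs `__lt__`, so the comparison methods inherited from the base class are enough for `sort()` and `sorted()` to work on a mixed list of subclasses.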


___
1. Define a collection of entities that have inheritance and polymorphism relationships. Define comparator methods for the entities and then sort the collection both ascending and descending.

In [None]:
# Assign randomised data to class attributes and generate a list of object instances of classes.
student_no = 125 # number of students in the cohort
cohort = generate_cohort(student_no)
class_no = 6 # number of different classes of students per subject.
Class_types = [Year10Math, Year10English, Year10Science, Year10Humanities]
class_object_list = []
for C in Class_types:
    random.shuffle(cohort) # randomise students in cohort
    class_lists = np.array_split(cohort,class_no) # produces class lists of student IDs
    # list of teachers
    teachers = ["Hugh Jackman",
                "Ryan Reynolds",
                "Scarlett Johansson",
                "Charlize Theron",
                "Chris Hemsworth",
                "Jennifer Lawrence"]
    i = 0
    while (len(teachers) > 0):
        teacher_name = random.choice(teachers) #randomly assign a value to class object parameter teacher_name
        teachers.remove(teacher_name)
        # randomly assign mean, standard deviation and skew to random number distribution.
        m = random.randint(40,60) # class mean result ranges between 40 and 60
        s = random.uniform(10,25) # class sd of result ranges between 10 and 25
        l = random.uniform(-1.5, 1.5) # class skew of results ranges between -1.5 and 1.5
        student_list = class_lists[i] # select a class of students from cohort
        class_data = generate_class_result(m,s,l,student_list) # generate results
        obj = C(class_data,teacher_name) #define object of class type.
        class_object_list.append(obj)#add to list
        i += 1

In [None]:
class_object_list

[Ryan Reynolds is yet to correct the Year 10 Mathematics exam,
 Charlize Theron is yet to correct the Year 10 Mathematics exam,
 Chris Hemsworth is yet to correct the Year 10 Mathematics exam,
 Scarlett Johansson is yet to correct the Year 10 Mathematics exam,
 Jennifer Lawrence is yet to correct the Year 10 Mathematics exam,
 Hugh Jackman is yet to correct the Year 10 Mathematics exam,
 Hugh Jackman is yet to correct the Year 10 English exam,
 Charlize Theron is yet to correct the Year 10 English exam,
 Ryan Reynolds is yet to correct the Year 10 English exam,
 Jennifer Lawrence is yet to correct the Year 10 English exam,
 Scarlett Johansson is yet to correct the Year 10 English exam,
 Chris Hemsworth is yet to correct the Year 10 English exam,
 Chris Hemsworth is yet to correct the Year 10 Science exam,
 Jennifer Lawrence is yet to correct the Year 10 Science exam,
 Hugh Jackman is yet to correct the Year 10 Science exam,
 Scarlett Johansson is yet to correct the Year 10 Science exam

#### The teachers correct the exams!!

In [None]:
for obj in class_object_list:
    obj.analyse_exam()

Ryan Reynolds's Mathematics class: 
mean result is: 42.7,
minimum is: 2.0,
lower quartile is: 31.0,
median is: 47.0,
upper quartile is: 60.0,
maximum is: 68.0

Charlize Theron's Mathematics class: 
mean result is: 44.0,
minimum is: 15.0,
lower quartile is: 36.0,
median is: 42.0,
upper quartile is: 56.0,
maximum is: 70.0

Chris Hemsworth's Mathematics class: 
mean result is: 29.7,
minimum is: 3.0,
lower quartile is: 18.0,
median is: 32.0,
upper quartile is: 41.0,
maximum is: 53.0

Scarlett Johansson's Mathematics class: 
mean result is: 46.4,
minimum is: 11.0,
lower quartile is: 33.0,
median is: 49.0,
upper quartile is: 63.0,
maximum is: 73.0

Jennifer Lawrence's Mathematics class: 
mean result is: 68.2,
minimum is: 39.0,
lower quartile is: 65.0,
median is: 70.0,
upper quartile is: 75.0,
maximum is: 79.0

Hugh Jackman's Mathematics class: 
mean result is: 48.0,
minimum is: 15.0,
lower quartile is: 33.0,
median is: 48.0,
upper quartile is: 63.0,
maximum is: 87.0

Hugh Jackman's English c

#### Sorting the objects via class_mean attribute

In [None]:
class_object_list.sort() # sorts the objects in ascending order via class_mean attribute
class_object_list

[Chris Hemsworth's Year 10 Mathematics class with mean exam score of 29.7.,
 Ryan Reynolds's Year 10 Science class with mean exam score of 37.7.,
 Chris Hemsworth's Year 10 Humanities class with mean exam score of 37.9.,
 Scarlett Johansson's Year 10 Science class with mean exam score of 40.8.,
 Hugh Jackman's Year 10 English class with mean exam score of 41.2.,
 Ryan Reynolds's Year 10 Mathematics class with mean exam score of 42.7.,
 Charlize Theron's Year 10 Mathematics class with mean exam score of 44.0.,
 Scarlett Johansson's Year 10 Mathematics class with mean exam score of 46.4.,
 Hugh Jackman's Year 10 Mathematics class with mean exam score of 48.0.,
 Chris Hemsworth's Year 10 English class with mean exam score of 48.0.,
 Charlize Theron's Year 10 Science class with mean exam score of 48.2.,
 Charlize Theron's Year 10 Humanities class with mean exam score of 48.8.,
 Jennifer Lawrence's Year 10 Humanities class with mean exam score of 51.3.,
 Ryan Reynolds's Year 10 Humanities c

In [None]:
class_object_list.sort(reverse=True) # sorts the objects in descending order via class_mean attribute
class_object_list

[Hugh Jackman's Year 10 Humanities class with mean exam score of 69.2.,
 Jennifer Lawrence's Year 10 Mathematics class with mean exam score of 68.2.,
 Jennifer Lawrence's Year 10 Science class with mean exam score of 62.0.,
 Ryan Reynolds's Year 10 English class with mean exam score of 61.0.,
 Chris Hemsworth's Year 10 Science class with mean exam score of 59.1.,
 Hugh Jackman's Year 10 Science class with mean exam score of 58.5.,
 Jennifer Lawrence's Year 10 English class with mean exam score of 56.7.,
 Scarlett Johansson's Year 10 English class with mean exam score of 55.2.,
 Scarlett Johansson's Year 10 Humanities class with mean exam score of 53.7.,
 Charlize Theron's Year 10 English class with mean exam score of 53.0.,
 Ryan Reynolds's Year 10 Humanities class with mean exam score of 52.9.,
 Jennifer Lawrence's Year 10 Humanities class with mean exam score of 51.3.,
 Charlize Theron's Year 10 Humanities class with mean exam score of 48.8.,
 Charlize Theron's Year 10 Science class 
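
As an alternative to defining comparator methods, the same ordering can be obtained by passing a key function to `sorted`. The sketch below uses stand-in objects built with `types.SimpleNamespace` and made-up labels; only the attribute name `class_mean` is taken from the notebook.

```python
from operator import attrgetter
from types import SimpleNamespace

# Stand-in objects with a class_mean attribute (hypothetical labels)
classes = [SimpleNamespace(label="A", class_mean=48.0),
           SimpleNamespace(label="B", class_mean=29.7),
           SimpleNamespace(label="C", class_mean=48.0),
           SimpleNamespace(label="D", class_mean=69.2)]

ascending = sorted(classes, key=attrgetter("class_mean"))
descending = sorted(classes, key=attrgetter("class_mean"), reverse=True)

print([c.label for c in ascending])   # ['B', 'A', 'C', 'D']
print([c.label for c in descending])  # ['D', 'A', 'C', 'B']
```

Python's sort is stable, so classes with equal means (A and C here) keep their original relative order in both directions.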

___
2. Write an example to show the union of two DataFrames. Feel free to define your own DataFrames or to use existing DataFrames. How long does it take to run the union operation?

In [None]:
# Union of two dataframes
df1 = pd.DataFrame(class_object_list[1].top_zscore())
df2 = pd.DataFrame(class_object_list[2].top_zscore())
%timeit pd.concat([df1,df2],join = 'outer') # union of two dataframes

730 µs ± 123 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


*The union operation on two dataframes takes approximately 730 microseconds.*
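
To make clear what `pd.concat` with `join='outer'` does, here is a small sketch on two made-up dataframes. The outer join applies to the columns: every column from either frame is kept and missing entries become `NaN`. Rows are simply stacked, so a set-style union without repeated rows needs an explicit `drop_duplicates`.

```python
import pandas as pd

# Two small, made-up dataframes; the second has an extra column
math_df = pd.DataFrame({"Student_IDs": ["ABCD0123", "EFGH0456"],
                        "z_scores": [1.2, -0.3]})
eng_df = pd.DataFrame({"Student_IDs": ["EFGH0456", "IJKL0789"],
                       "z_scores": [-0.3, 0.8],
                       "Class": ["10English", "10English"]})

# Stack the rows; join="outer" keeps the union of the column labels
stacked = pd.concat([math_df, eng_df], join="outer", ignore_index=True)

# Set-style union on the shared columns: stack, then drop repeated rows
shared = ["Student_IDs", "z_scores"]
set_union = pd.concat([math_df, eng_df[shared]], ignore_index=True).drop_duplicates(ignore_index=True)

print(stacked)
print(set_union)
```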

In [None]:
def union_dataframe():
    """
    Demonstrates the union of two dataframes. This function accesses the arrays of class exam results,
    converts each to a dataframe and unions it with the running result, two at a time.
    """
    df = pd.DataFrame()
    for obj in class_object_list:
        class_array = obj.top_zscore() # numpy array of class data, computed once per class
        df1 = pd.DataFrame(class_array,
                           columns = ['Teacher','Class','Student_IDs','z_scores']) # defining columns
        df = pd.concat([df,df1],join = 'outer',ignore_index = True) # union of two dataframes
    # convert, sort and re-index once, after all classes have been joined
    df['z_scores'] = df['z_scores'].astype(float)
    df.sort_values(by = "z_scores", ascending = False, inplace = True)
    df.reset_index(drop=True, inplace = True)
    return df
union_dataframe()

| | Teacher | Class | Student_IDs | z_scores |
|---|---|---|---|---|
| 0 | HJ | 10Humanities | TDCL0555 | 2.31 |
| 1 | HJ | 10Math | GZPV0437 | 2.17 |
| 2 | CT | 10Humanities | GPDN0718 | 2.15 |
| 3 | CH | 10English | KQAT0967 | 2.05 |
| 4 | CT | 10Humanities | OQNA0738 | 2.00 |
| ... | ... | ... | ... | ... |
| 495 | SJ | 10Science | ASEH0735 | -2.35 |
| 496 | JL | 10Science | VIGB0856 | -2.36 |
| 497 | RR | 10Humanities | FCUX0870 | -2.43 |
| 498 | CT | 10English | KUYI0208 | -2.78 |


In [None]:
%timeit union_dataframe()

146 ms ± 10.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
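
The `%timeit` magic is only available in IPython and Jupyter. In a plain Python script, a comparable measurement can be sketched with the standard `timeit` module, as below on two made-up dataframes. Note that timing `union_dataframe()` as above also includes the cost of the `top_zscore()` calls and the sorting, not only the concatenation.

```python
import timeit
import pandas as pd

# Two made-up dataframes of 100 rows each
df_a = pd.DataFrame({"value": range(100)})
df_b = pd.DataFrame({"value": range(100, 200)})

number = 200  # calls per timing run
runs = timeit.repeat(lambda: pd.concat([df_a, df_b], join="outer"),
                     repeat=5, number=number)
per_call_us = [1e6 * t / number for t in runs]  # microseconds per call in each run

print(f"best of {len(runs)} runs: {min(per_call_us):.1f} µs per call")
```

Reporting the minimum over the runs is a common convention because it is least affected by other activity on the machine; `%timeit` reports the mean and standard deviation instead.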


___
3. Write an example to show the merge of two DataFrames. Feel free to define your own DataFrames or to use existing DataFrames. How long does it take to run the merge operation?

### A different scenario

In [None]:
# define two dataframes
u = [i for i in range(10)]
w = [2*i for i in range(10)]
x = [(3*i - 4) for i in range(10)]
y = [(i**2 - 4) for i in range(10)]
z = [(i**2 - 2*i +1) for i in range(10)]
df1 = pd.DataFrame({
        "w_values": w,
        "x_values": x,
        "y-values": y,
        "z-values": z
    })
df2 = pd.DataFrame({
        "u_values": u,
        "x_values": x,
        "y-values": y
        })

df = df1.merge(df2,how='inner')
df

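| | w_values | x_values | y-values | z-values | u_values |
|---|---|---|---|---|---|
| 0 | 0 | -4 | -4 | 1 | 0 |
| 1 | 2 | -1 | -3 | 0 | 1 |
| 2 | 4 | 2 | 0 | 1 | 2 |
| 3 | 6 | 5 | 5 | 4 | 3 |
| 4 | 8 | 8 | 12 | 9 | 4 |
| 5 | 10 | 11 | 21 | 16 | 5 |
| 6 | 12 | 14 | 32 | 25 | 6 |
| 7 | 14 | 17 | 45 | 36 | 7 |
| 8 | 16 | 20 | 60 | 49 | 8 |
| 9 | 18 | 23 | 77 | 64 | 9 |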


In [None]:
%timeit df1.merge(df2,how='inner')

4.26 ms ± 749 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
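
In the merge above no `on` argument is given, so pandas joins on the columns the two dataframes have in common (`x_values` and `y-values`), and every row matches because both frames were built from the same lists. The sketch below, on made-up data, shows how an inner merge behaves when only some rows match, and uses `indicator=True` on an outer merge to show where each row came from.

```python
import pandas as pd

# Made-up frames that share the columns "x" and "y" but only partly overlap
left = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30], "w": [0.1, 0.2, 0.3]})
right = pd.DataFrame({"x": [2, 3, 4], "y": [20, 30, 40], "u": ["b", "c", "d"]})

# With no `on` argument the merge keys are the shared columns, here x and y
inner = left.merge(right, how="inner")

# An outer merge keeps unmatched rows; the _merge column records their origin
outer = left.merge(right, how="outer", indicator=True)

print(inner)
print(outer)
```

Only the two rows whose `x` and `y` values appear in both frames survive the inner merge, while the outer merge keeps all four distinct key pairs.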
