<a href="https://colab.research.google.com/drive/1t6qdzu0Wr17kjaUsntTXcQ4zzOfL8YcP" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Topic 01: Basic Software Engineering Concepts


$_{\text{©L.A. Guy, R. Surio Jr., M.A. Sustento, T.J. Vallarta | 2022 | Fundamentals of Artificial Intelligence and Data Analytics}}$

Data science and machine learning involve more than math and business applications. AI, ML, and DS are increasingly integrated into industry products, so it is important to keep code clean, manage files properly, and keep your projects and their code readable.
This module reviews software engineering concepts and applies them to machine learning programming. This notebook covers:
* Modular Programming
* Object-oriented Programming Concepts
* Documentation
* Version Control

### Task 1: Setting the Class Grades
1. Create variable declarations for:
    - Last Names
    - First Names
    - Grades for Prelims
    - Grades for Midterms
    - Grades for Finals
2. Create a Pandas `DataFrame` that consolidates the data from Task 1.1.
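
The two steps above can be sketched as follows. This is only an illustration: the names and grades are hypothetical placeholder values, and the column names `last_name` and `first_name` are assumptions rather than part of the provided grade sheet.

```python
import pandas as pd

# Hypothetical sample records; replace with the actual class data.
last_names = ["Cruz", "Reyes", "Santos"]
first_names = ["Ana", "Ben", "Carla"]
prelim_grades = [90.0, 75.5, 82.0]
midterm_grades = [88.0, 70.0, 85.5]
final_grades = [92.0, 68.5, 80.0]

# Consolidate the lists into one DataFrame, one column per variable.
sample_df = pd.DataFrame({
    "last_name": last_names,
    "first_name": first_names,
    "prelim_grades": prelim_grades,
    "midterm_grades": midterm_grades,
    "final_grades": final_grades,
})
print(sample_df)
```

Building the frame from a dictionary keeps each column tied to a named variable, which makes the source of every column easy to trace.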

In [1]:
### CODE HERE ###
import pandas as pd
import numpy as np

## Creating DataFrame
cl_df = pd.read_excel('class_grades.xlsx')          ## creating class_grades DataFrame
cl_df

Unnamed: 0,name,id,prelim_grades,midterm_grades,final_grades
0,alpha,1,94.96,95.26,92.2192
1,aleph,2,94.96,95.26,92.7792
2,beta,3,78.173333,94.36,84.32
3,beth,4,78.173333,81.372,84.64
4,cuprum,5,94.96,95.26,92.2192
5,delta,6,84.826667,92.485,83.5792
6,eta,7,93.98,97.3,95.6528
7,epsilon,8,60.7882,68.0,83.8672
8,foxtrot,9,94.306667,82.0,84.1472
9,gamma,10,96.406667,82.0,84.1472


### Task 2: Getting Class Statistics
1. Create a function named `get_grades()` for computing the semestral grade of each student:

  `inputs`: `DataFrame` of a class grade sheet.
  
  `outputs`: `DataFrame` showing:
  * The prelim, midterms, and finals grades of each student
  * The semestral grade of each student

2. Create a function named `get_class_stats()`.
  
  `inputs`: `DataFrame` from `get_grades()`.
  
  `outputs`: `DataFrame` showing:
  * The lowest and highest prelim, midterm, finals, and semestral grades
  * The mean, median, mode, and standard deviation of the grades


In [2]:
## Getting the sum of Prelim, Midterm, Finals Grade then average
def get_grades(df):
    df['Semestral Grade'] = (df.iloc[:,2:5].sum(axis=1))/3              
    return df

## Getting the lowest and highest prelim, midterm, finals, and semestral grades
## Getting the mean, median, mode, and standard deviation of the grades
def get_class_stats(df):
    get_grades(df)
    df['Lowest Prelim Grade']= df.sort_values('prelim_grades').iloc[0,2]
    df['Lowest Midterm Grade']= df.sort_values('midterm_grades').iloc[0,3]
    df['Lowest Finals Grade']= df.sort_values('final_grades').iloc[0,4]
    df['Lowest Semestral Grade']= df.sort_values('Semestral Grade').iloc[0,5]

    df['Highest Prelim Grade']= df.sort_values('prelim_grades', ascending=False).iloc[0,2]
    df['Highest Midterm Grade']= df.sort_values('midterm_grades', ascending=False).iloc[0,3]
    df['Highest Finals Grade']= df.sort_values('final_grades', ascending=False).iloc[0,4]
    df['Highest Semestral Grade']= df.sort_values('Semestral Grade', ascending=False).iloc[0,5]

    df['Mean of Grades'] = df['Semestral Grade'].mean()                         ## Mean of Semestral Grades
    df['Median of Grades'] = df['Semestral Grade'].median()                     ## Median of Semestral Grades
    df['Mode of Grades'] = df['Semestral Grade'].mode().iloc[0]                 ## Mode of semestral grades (mode() returns a Series; keep its first value)
    df['Standard Deviation of Grades'] = df['Semestral Grade'].std()            ## Standard deviation of semestral grade

    return df

get_grades(cl_df)

Unnamed: 0,name,id,prelim_grades,midterm_grades,final_grades,Semestral Grade
0,alpha,1,94.96,95.26,92.2192,94.1464
1,aleph,2,94.96,95.26,92.7792,94.333067
2,beta,3,78.173333,94.36,84.32,85.617778
3,beth,4,78.173333,81.372,84.64,81.395111
4,cuprum,5,94.96,95.26,92.2192,94.1464
5,delta,6,84.826667,92.485,83.5792,86.963622
6,eta,7,93.98,97.3,95.6528,95.644267
7,epsilon,8,60.7882,68.0,83.8672,70.885133
8,foxtrot,9,94.306667,82.0,84.1472,86.817956
9,gamma,10,96.406667,82.0,84.1472,87.517956
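
As an alternative to storing each statistic as a repeated column, the class statistics could be collected in a compact summary table. The sketch below is one possible approach, not the required solution; `summarize_grades` is a hypothetical helper name and the demo values are made up for illustration.

```python
import pandas as pd

def summarize_grades(df, grade_cols):
    """Return one row per statistic and one column per grade column."""
    grades = df[grade_cols]
    summary = grades.agg(["min", "max", "mean", "median", "std"])
    # mode() can return several values per column; keep the first (smallest) one.
    summary.loc["mode"] = grades.mode().iloc[0]
    return summary

# Hypothetical data for illustration only.
demo = pd.DataFrame({
    "prelim_grades": [90.0, 80.0, 80.0],
    "midterm_grades": [70.0, 85.0, 85.0],
})
stats = summarize_grades(demo, ["prelim_grades", "midterm_grades"])
print(stats)
```

Each statistic is computed once for the whole column, so no loops are needed and the result stays small regardless of class size.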


### Task 3: Advanced Class Functions
1. Optimize `get_grades()` by eliminating for loops in your function.
2. Create a function named `get_similar_students()`
  
  `inputs`: two (2) class `DataFrames`
  
  `outputs`: list of common student numbers.

  *Note*: The function should not contain iterative code blocks (i.e. `for` or `while` loops, or list comprehension)

In [3]:
### CODE HERE ###
clist_df1 = pd.read_excel('classlists.xlsx', sheet_name='class1')          ## class list DataFrame for class 1
clist_df2 = pd.read_excel('classlists.xlsx', sheet_name='class2')          ## class list DataFrame for class 2

## Getting the sum of Prelim, Midterm, Finals Grade then average
def get_grades(df):
    df['Semestral Grade'] = (df.iloc[:,2:5].sum(axis=1))/3
    return df.round(2)

## Getting the student numbers common to the two class lists
def get_similar_students(cl_1,cl_2):
    df_similar = pd.merge(cl_1,cl_2)           ## Inner join (the default) on all columns the two DataFrames share
    return df_similar[df_similar.columns[0]].values.tolist()                         ## Converting the first column to a list

get_similar_students(clist_df1, clist_df2)


[201005821,
 201008697,
 201005415,
 201009096,
 201105430,
 201109930,
 201005367,
 201007925,
 201109299,
 201002173,
 201108238,
 201006042,
 201009820,
 201009820,
 201105857,
 201009997,
 201101893,
 201101893,
 201005822,
 201102320,
 201105259,
 201106780,
 201106199,
 201102471,
 201105592,
 201102510,
 201003675,
 201102367,
 201108241,
 201108071,
 201109056,
 201002279,
 201009157,
 201003688,
 201005275,
 201102397,
 201004454,
 201105896,
 201106739,
 201005150,
 201008775,
 201009967,
 201105318,
 201109680,
 201101996,
 201103263,
 201102318,
 201109823,
 201009671,
 201103024,
 201003730,
 201104063,
 201104594,
 201108816,
 201004865,
 201007074,
 201002245,
 201005482,
 201002849,
 201007043,
 201105295,
 201009959,
 201105014,
 201008815]
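
The list above contains some repeated student numbers (for example, `201009820` appears twice). If unique values are wanted, one loop-free option is `np.intersect1d`, which returns the sorted, unique values found in both inputs. This is a sketch under assumptions: `get_common_ids` is a hypothetical helper, and `student_no` is an assumed column name that may differ from the actual class lists.

```python
import numpy as np
import pandas as pd

def get_common_ids(cl_1, cl_2, id_col):
    """Return the sorted, unique IDs that appear in both class lists."""
    return np.intersect1d(cl_1[id_col], cl_2[id_col]).tolist()

# Hypothetical class lists; `student_no` is an assumed column name.
class_a = pd.DataFrame({"student_no": [101, 102, 102, 103]})
class_b = pd.DataFrame({"student_no": [102, 103, 104]})
common = get_common_ids(class_a, class_b, "student_no")
print(common)
```

Unlike a merge, this compares only the ID column, so differences in other columns between the two lists do not affect the result.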

### Task 4: Class of Classes
1. Modify the `Section` class and integrate the functions from Tasks 1 to 3 as its methods. Make sure that the code is optimized.
2. Create a method named `get_failed()` that creates a list of all the failed students in the class.
3. Create a method named `fail_count()` that returns the count of the failed students.

**Note:** Due to data privacy, the data privacy office has mandated that your code must not print the student numbers and names of students. Student numbers and names must be masked with asterisks.
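
A minimal sketch of the masking requirement is shown below. It assumes the identifying columns are called `name` and `id`, as in the grade sheet from Task 1; `mask_identifiers` and the demo values are hypothetical.

```python
import pandas as pd

def mask_identifiers(df, cols, token="******"):
    """Return a copy of df with the given columns replaced by a fixed token."""
    masked = df.copy()
    masked[cols] = token  # assigning a scalar overwrites every row of each column
    return masked

# Hypothetical records for illustration only.
demo = pd.DataFrame({
    "name": ["alpha", "beta"],
    "id": [1, 2],
    "final_grades": [92.2, 84.3],
})
masked_demo = mask_identifiers(demo, ["name", "id"])
print(masked_demo)
```

Working on a copy leaves the original records intact for computation, while only the masked copy is displayed.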

In [4]:
### CODE HERE ###
class Section:
    def __init__(self, max_pop, cl_grade=0, class_list1=0, class_list2=0):
        self.max_pop = max_pop
        self.cl_grade = cl_grade
        self.class_list1 = class_list1
        self.class_list2 = class_list2

    ## Return a copy of the DataFrame with names and IDs replaced by asterisks
    def masking(self, df):
        df_copy = df.copy()
        df_copy['name'] = '******'
        df_copy['id'] = '******'
        return df_copy
        
    ## Count the number of students in class
    def count_class(self):
        return len(self.cl_grade)

    ## Checking if the number of students exceeds the max population of the class
    def is_overloaded(self):
        return self.count_class() > self.max_pop
    
    ## Getting the sum of Prelim, Midterm, Finals Grade then average
    def get_grades(self):
        self.cl_grade['Semestral Grade'] = (self.cl_grade.iloc[:,2:5].sum(axis=1))/3
        return self.masking(self.cl_grade)
    
    ## Getting the lowest and highest prelim, midterm, finals, and semestral grades
    ## Getting the mean, median, mode, and standard deviation of the grades
    def get_class_stats(self):
        self.get_grades()
        self.cl_grade['Lowest Prelim Grade']= self.cl_grade.sort_values('prelim_grades').iloc[0,2]
        self.cl_grade['Lowest Midterm Grade']= self.cl_grade.sort_values('midterm_grades').iloc[0,3]
        self.cl_grade['Lowest Finals Grade']= self.cl_grade.sort_values('final_grades').iloc[0,4]
        self.cl_grade['Lowest Semestral Grade']= self.cl_grade.sort_values('Semestral Grade').iloc[0,5]

        self.cl_grade['Highest Prelim Grade']= self.cl_grade.sort_values('prelim_grades', ascending=False).iloc[0,2]
        self.cl_grade['Highest Midterm Grade']= self.cl_grade.sort_values('midterm_grades', ascending=False).iloc[0,3]
        self.cl_grade['Highest Finals Grade']= self.cl_grade.sort_values('final_grades', ascending=False).iloc[0,4]
        self.cl_grade['Highest Semestral Grade']= self.cl_grade.sort_values('Semestral Grade', ascending=False).iloc[0,5]

        self.cl_grade['Mean of Grades'] = self.cl_grade['Semestral Grade'].mean()                           ## Mean of Semestral Grades
        self.cl_grade['Median of Grades'] = self.cl_grade['Semestral Grade'].median()                       ## Median of Semestral Grades
        self.cl_grade['Mode of Grades'] = self.cl_grade['Semestral Grade'].mode().iloc[0]                   ## Mode of semestral grades (mode() returns a Series; keep its first value)
        self.cl_grade['Standard Deviation of Grades'] = self.cl_grade['Semestral Grade'].std()              ## Standard deviation of semestral grade

        return self.cl_grade
    
    ## Getting the student numbers common to the two class lists
    def get_similar_students(self):
        df_similar = pd.merge(self.class_list1, self.class_list2)                                   ## Inner join (the default) on all columns the two DataFrames share
        return df_similar[df_similar.columns[0]].values.tolist()                                         ## Converting the first column to a list
    
    ## Getting the failed students (private method due to data privacy)
    def __get_failed(self):
        self.get_grades()
        filter_failed = (self.cl_grade['Semestral Grade'] <= 69.9)
        return self.cl_grade.loc[filter_failed]

    ## Getting the number of failed students
    def fail_count(self):
        return len(self.__get_failed())


section = Section(40, cl_df)
print("The number of failed student/s: ", section.fail_count())

The number of failed student/s:  1



---
**END OF LABORATORY**

---