<a href="https://colab.research.google.com/github/dyjdlopez/fund-of-aiml/blob/main/activities/01%20-%20Software%20Engineering%20Concepts%20and%20Review/fund_aiml_01v1_2021.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Topic 01: Software Engineering Concepts Review
$_{\text{©D.J. Lopez | 2021 | Fundamentals of Machine Learning}}$

Data Science and Machine Learning involve more than math or business applications. AI, ML, and DS are being integrated into industry products, so it is important to keep your code clean, manage your files properly, and make your projects readable.
This module reviews software engineering concepts and applies them to Machine Learning programming. This notebook covers:
* Modular Programming
* Object-oriented Programming Concepts
* Documentation
* Version Control	


# 1. Modular Programming
Readable and understandable code is needed in industry and in any project you take on. It benefits not only you but also other teams, users, and future researchers. In this module, we discuss how to format your projects and code for readability. One technique for achieving this is modular programming.

The idea of modular programming is to break your code into small, granular pieces that can be reused. This makes your code not only simpler but also easier to optimize. Making code more efficient is about more than how it looks; it is also about how well it performs.


## 1.1 Refactoring

Refactoring means restructuring your code without changing its expected output. We refactor to improve naming or to optimize methods. The first refactoring activity we will consider is renaming; the second is "functionizing" your code.

### *Renaming*

Discussing this topic may seem pointless or too simplistic, but poor naming is one of the most frequent mistakes that data scientists, professors, and researchers make. Naming variables and functions properly makes code much clearer.

In [None]:
'''
    CASE 1: Setting the class grades.

    The case for this section is computing the grades of a 
    certain class. In the next cells we will look at how to create a 
    class analytics implementation.
'''
### Example of bad naming in module import nicknames.
### Nicknames should be short and still relevant to the module.

# import pandas as food_panda 
## bad, since not representative and long
#import pandas as nd 
##bad, although short it is not representative
import pandas as pd 
##good, aside from community acceptance it is short and representative

ln = ["Ingles", "Español", "Cruz", "Jones"]
# bad, since "ln" is ambiguous.

names = ["Ingles", "Español", "Cruz", "Jones"] 
# good, since descriptive

lnames = ["Ingles", "Español", "Cruz", "Jones"]
# better, more specific that it is about last names

last_names_of_the_class = ["Ingles", "Español", "Cruz", "Jones"] 
# bad, too long


### Task 1: Setting the Class Grades
1. Create variable declarations for:
        - Last Names
        - First Names
        - Grades for Prelims
        - Grades for Midterms
        - Grades for Finals 
2. Create a DataFrame (Pandas) for consolidating the data from Task 1.1

In [None]:
### CODE HERE ###

### *Code Reusability*

As in any writing, we want to reduce redundancy in our code. One way to apply brevity to code is reusability: avoid writing the same code more than once. This is where functions are most useful. Consider the following code.

In [None]:
'''
    CASE 2: Computing grades.

    For this cell we are going to look into creating functions for the class.
    This includes computing for their grades and some classroom management
    routines.
'''
### Consider this procedural programming script
gs_1 = [89.4, 78.2, 88.0, 28.5, 67.3]
# Get average grades of the class
class_g = 0
for g in gs_1:
  class_g += g
cmean = class_g/len(gs_1)
print(cmean)

In [None]:
### Now let's say you want to reuse the code above for a different class.
### You might just copy and paste it, as in the code below, and change some 
### variable names

gs_2 = [99.4, 64.5, 87.2, 68.5, 57.]
# Get average grades of the class
class_g = 0
for g in gs_2:
  class_g += g
cmean = class_g/len(gs_2)
print(cmean)

75.34


In [None]:
### That works but if you keep doing that for production, it would just look
### like spaghetti code. Instead, if you are repeating code, consider creating
### a function.

def average(grades):
  grades_class = 0
  for g in grades:
    grades_class += g
  avg = grades_class/len(grades)
  return avg


In [None]:
### The code above is reusable, but it is not easy to read. We need to 
### apply renaming not just to the variables but also to the functions we
### create. So if we refactor the 'average' function from earlier, it
### could be:

def get_class_ave(arr):
  class_grades = 0
  for grade in arr:
    class_grades += grade
  class_avg = class_grades/len(arr)
  return class_avg
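
The payoff of the refactor is that one function now serves every class. The following is a minimal sketch rather than part of the original notebook: it redefines `get_class_ave` so the cell is self-contained (with the parameter renamed to `grades`, my own choice), reuses the two grade lists from the earlier cells, and cross-checks the result against `statistics.fmean` from the standard library (assuming Python 3.8 or later).

```python
from statistics import fmean

def get_class_ave(grades):
    """Return the arithmetic mean of a list of numeric grades."""
    class_grades = 0
    for grade in grades:
        class_grades += grade
    return class_grades / len(grades)

# Grade lists copied from the earlier cells.
gs_1 = [89.4, 78.2, 88.0, 28.5, 67.3]
gs_2 = [99.4, 64.5, 87.2, 68.5, 57.]

# One function call per class instead of one copied loop per class.
class_means = [get_class_ave(grades) for grades in (gs_1, gs_2)]
print([round(mean, 2) for mean in class_means])

# Cross-check against the standard library implementation.
for grades, mean in zip((gs_1, gs_2), class_means):
    assert abs(mean - fmean(grades)) < 1e-9
```

In practice you would likely call `statistics.fmean` or `numpy.mean` directly; the hand-written loop is kept here only to mirror the refactoring exercise.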

### Task 2: Getting Class Statistics
1. Create a function named `get_grades()` for computing the semestral grade of each student:

  `inputs`: `DataFrame` of a class grade sheet.
  
  `outputs`: `DataFrame` showing:
  * The prelim, midterms, and finals grades of each student
  * The semestral grade of each student

2. Create a function named `get_class_stats()`.
  
  `inputs`: `DataFrame` from `get_grades()`.
  
  `outputs`: `DataFrame` showing:
  * The lowest and highest prelim, midterm, finals, and semestral grades
  * The mean, median, mode, and standard deviation of the grades


In [None]:
### CODE HERE ###

## 1.2 Code Optimization

When your code is part of production or customer delivery, it must be neither slow nor too space-consuming for the target devices. This matters in data science and machine learning, where training algorithms consume computing resources and models take up space in your workspace or on your customer’s device.

### *Space Complexity*

Space complexity depends largely on your choice of data structures, so it is worth reviewing data structures and algorithms with memory use in mind. In Python for AI, the data structures we work with most are matrices and data frames.

In [None]:
'''
    CASE 3: Class management.

    For this cell we are going to look into optimizing functions for the class.
    This includes finding unique data and similar data among class materials.
'''
## The following code stores the same data more efficiently in memory
enrollees_2016_2020 = ['15022', '18302', '8845', '9203', '10035']  # values stored as strings
enrollees_2016_2020 = [15022, 18302, 8845, 9203, 10035]  # same values stored as integers

In [None]:
img1 = [[125.4543, 254.0001], [64.2132, 84.5336]]
img1

[[125.4543, 254.0001], [64.2132, 84.5336]]

In [None]:
import numpy as np
img1_vector = np.array(img1)
img1_vector

array([[125.4543, 254.0001],
       [ 64.2132,  84.5336]])

In [None]:
img1_quantized = np.array(img1_vector, dtype='int32')
img1_quantized

array([[125, 254],
       [ 64,  84]], dtype=int32)
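
To make the memory effect of the dtype change concrete, the sketch below compares the `nbytes` attribute of the same 2×2 array stored with different element types. The `uint8` variant is my own addition and assumes the pixel values fit in the range 0 to 255. Note that casting to an integer type truncates the fractional part, so the saving comes at the cost of precision.

```python
import numpy as np

img1 = [[125.4543, 254.0001], [64.2132, 84.5336]]

img1_vector = np.array(img1)                  # defaults to float64: 8 bytes per element
img1_quantized = img1_vector.astype('int32')  # 4 bytes per element
img1_uint8 = img1_vector.astype('uint8')      # 1 byte per element (assumes values in 0..255)

# nbytes reports the size of the array's data buffer.
for name, arr in [('float64', img1_vector),
                  ('int32', img1_quantized),
                  ('uint8', img1_uint8)]:
    print(name, arr.nbytes, 'bytes')
```

With four elements, the three buffers take 32, 16, and 4 bytes respectively; the same ratios hold for a full-size image.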

### *Time Complexity*

If you recall your data structures and algorithms course, time complexity pertains to Big-O notation. In this notebook, however, we’ll look at specific, intuitive examples of time complexity, including AI-specific ones such as vectorization.

In [None]:
## The following code optimizes getting the unique elements from the given
## list of class codes

curr_2018_courses = ['220', '270', '318', '450', '101', '768', '223L', '223',
                     '727', '418', '673', '672', '450', '124', '771', '654',
                     '231', '768', '768', '224', '654', '673']
### print an array that shows only the unique elements
### Code from Geeks for Geeks
def unique(arr):
    # initialize an empty list
    unique_list = []
     
    # traverse for all elements
    for item in arr:
        # check if exists in unique_list or not
        if item not in unique_list:
            unique_list.append(item)
    return sorted(unique_list)

unique(curr_2018_courses)  

In [None]:
### Optimized way
sorted(list(set(curr_2018_courses)))
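
Why the set version is faster: in the loop version, every `item not in unique_list` test scans the list, so the work grows roughly quadratically with the input, while set membership is a hash lookup. The sketch below measures the gap with `timeit`. The synthetic course codes and the function names `unique_loop` and `unique_set` are hypothetical and exist only for this illustration; actual timings depend on your machine.

```python
import random
import timeit

def unique_loop(arr):
    """Loop-based version: each membership test scans unique_list."""
    unique_list = []
    for item in arr:
        if item not in unique_list:
            unique_list.append(item)
    return sorted(unique_list)

def unique_set(arr):
    """Set-based version: membership is a hash lookup."""
    return sorted(set(arr))

# Hypothetical synthetic course codes, only to make the gap measurable.
random.seed(0)
codes = [str(random.randint(100, 1099)) for _ in range(2000)]

# Both versions must agree before their speed is compared.
assert unique_loop(codes) == unique_set(codes)

loop_time = timeit.timeit(lambda: unique_loop(codes), number=5)
set_time = timeit.timeit(lambda: unique_set(codes), number=5)
print(f'loop: {loop_time:.4f}s  set: {set_time:.4f}s')
```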

In [None]:
### Optimizing Functions
import math
def get_floor_binlog(num):
  return math.floor(math.log2(num))
get_floor_binlog(31)

4

In [None]:
### Lambda Functions
binlog = lambda num: math.floor(math.log2(num))
binlog(31)

4

In [None]:
### Optimizing Booleans
val = 1
if val%2 == 0:
  val = 1
else:
  val = 2

In [None]:
### Single-line Boolean Statements
val = 1
val = 1 if val%2 == 0 else 2
val

2

In [None]:
### Optimizing Iterations
bits = [1,0,1,1,1]
not_bits = []
for bit in bits:
  not_bits.append(int(not(bit)))
print(not_bits)

[0, 1, 0, 0, 0]


In [None]:
### List Comprehensions
bits = [1,0,1,0,1]
not_bits = [int(not(bit)) for bit in bits]
not_bits

[0, 1, 0, 1, 0]
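
A list comprehension still runs a Python-level loop. As a sketch of the vectorization idea mentioned at the start of this section, the same bit flip can be written as one NumPy array expression. This variant is an illustration and assumes the input contains only 0s and 1s.

```python
import numpy as np

bits = np.array([1, 0, 1, 0, 1])

# Elementwise NOT for 0/1 values, with no Python-level loop.
not_bits = 1 - bits
print(not_bits.tolist())  # [0, 1, 0, 1, 0]

# Equivalent route through a boolean array.
not_bits_bool = np.logical_not(bits).astype(int)
print(not_bits_bool.tolist())  # [0, 1, 0, 1, 0]
```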

In [None]:
### A Networks Example
def get_floor_binlog(num):
  return math.floor(math.log2(num))
def get_subnet_bits(user_count):
  
  floor_bin = get_floor_binlog(user_count) # gets the floor bits from 
                                           # the user count
  if floor_bin == 0: # assign 1 if bit count is 0.
    floor_bin = 1
  bits = [] # instantiate the bit container
  for i in range(floor_bin): # add a bit to the list of bits according to the
                             # bit count.
    bits.append(1) 
  return bits

get_subnet_bits(32)

[1, 1, 1, 1, 1]

In [None]:
#### An Optimized Networks Example
floor_bin = lambda num: math.floor(math.log2(num))

def get_subnet_bits(user_count):
  
  nbits = floor_bin(user_count) # gets the floor bits from the user count
  nbits = 1 if nbits==0 else nbits # assign 1 if bit count is 0.
  bits = [1 for i in range(nbits)] # add a bit to the list of bits according 
                                   # to the bit count.
  return bits

get_subnet_bits(32)

[1, 1, 1, 1, 1]
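
One further variation, offered as a sketch rather than as the notebook's method: for a positive integer `n`, `n.bit_length() - 1` equals `floor(log2(n))`, so the function can stay in integer arithmetic, and list repetition (`[1] * nbits`) replaces the comprehension. The input check is my own addition, since `math.log2` fails for zero or negative counts.

```python
def get_subnet_bits(user_count):
    """Return a list of 1s, one per floor(log2(user_count)) bit, minimum one."""
    if user_count < 1:
        raise ValueError('user_count must be a positive integer')
    # For a positive int n, n.bit_length() - 1 equals floor(log2(n)).
    nbits = max(1, user_count.bit_length() - 1)
    return [1] * nbits

print(get_subnet_bits(32))  # [1, 1, 1, 1, 1]
```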

In [None]:
## The following code optimizes computing the inner product of two
## lists
vectA = [1,2,3,1,-1,2]
vectB = [-1,4,5,3,2,0]

In [None]:
def inner_product(vect1, vect2):
  result = 0 
  for va,vb in zip(vect1, vect2): # iterate over the arguments, not the globals
    result += va*vb
  return result
inner_product(vectA, vectB)

23

In [None]:
## Optimized method
np.array(vectA) @ np.array(vectB) 

23
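
To see what the `@` operator does for two 1-D arrays, the sketch below splits the inner product into its two vectorized steps, an elementwise multiply followed by a sum, and checks that the result agrees with `np.dot` and `@`.

```python
import numpy as np

vectA = np.array([1, 2, 3, 1, -1, 2])
vectB = np.array([-1, 4, 5, 3, 2, 0])

products = vectA * vectB    # elementwise products, computed without a Python loop
print(products.tolist())    # [-1, 8, 15, 3, -2, 0]
print(int(products.sum()))  # 23

# For 1-D arrays, all three routes give the same inner product.
assert products.sum() == np.dot(vectA, vectB) == vectA @ vectB
```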

### Task 3: Advanced Class Functions
1. Optimize `get_grades()` by eliminating for loops in your function.
2. Create a function named `get_similar_students()`
  
  `inputs`: two (2) class `DataFrames`
  
  `outputs`: list of common student numbers.

  *Note*: The function should not contain iterative code blocks (i.e. `for` or `while` loops, or list comprehension)

In [None]:
### CODE HERE ###

# 2. Some Object-Oriented Programming Concepts

Object-oriented programming (OOP) is a foundational topic in Software Engineering. The idea of OOP is to organize procedural code into a more modular style built around objects. In this section, we review the fundamentals of OOP: objects, classes, and encapsulation.

## 2.1 Objects and Classes
Objects and classes are the foundation of OOP. Think of an object as any thing you could point to: a car, a shirt, a human, a profession, even your favorite anime character. Classes, on the other hand, are the categories or concepts those things belong to. In this section, we’ll review how to define classes and instantiate objects.

### *Attributes and Methods*
Because objects are things, we can describe them: they have descriptions or characteristics. In OOP we call these attributes. An attribute can be any value representing an aspect of the object that belongs to the concept of its class. For example, a tree (object) under the category of plants (class) could have attributes such as its name, age, height, or even the amount of oxygen it produces.

Methods, on the other hand, are actions that the object can perform or that can be performed on the object. For example, our tree could grow, photosynthesize, or absorb water, or we could harvest its fruits. In the following example, we review the attributes and methods of a class.

In [None]:
'''
    CASE 4: Class of Classes

    For this section we are going to look at implementing the fundamentals of
    OOP for our class routines.
'''
class Section:
  def __init__(self, max_pop, class_list):
    self.max_pop = max_pop
    self.class_list = class_list
  
  def count_class(self):
    return len(self.class_list)

  def is_overloaded(self):
    is_overload = True if self.count_class() > self.max_pop else False
    return is_overload
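
The tree example from the prose can be sketched the same way as `Section`. The class name `Plant`, its attributes, and the `grow` method are hypothetical names chosen for illustration; they are not part of the notebook's case study.

```python
class Plant:
    """A hypothetical class illustrating the plant/tree example from the text."""

    def __init__(self, name, age, height):
        # Attributes: values that describe this particular plant.
        self.name = name
        self.age = age        # in years
        self.height = height  # in meters

    def grow(self, meters):
        # Method: an action that changes the object's state.
        self.height += meters
        return self.height

tree = Plant('Narra', 12, 8.5)  # an object (instance) of the Plant class
tree.grow(0.5)
print(tree.name, tree.height)   # Narra 9.0
```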

### *Class Instantiation*
Now that we have created our class, let’s put it to use. To use a class in your code, you instantiate it to create an object. There are several ways to instantiate an object and we will be trying them out in this section.

In [None]:
CpE_58053 = Section(40, ['John'])
print(CpE_58053.is_overloaded())
print(CpE_58053.count_class())

False
1

## 2.2 Encapsulation
The idea of encapsulation is to restrict users or developers from using or tampering with the internal attributes or methods of a class. This is similar to the private and public access modifiers in Java, C++, or C#. Python supports this too, but in a different way: through naming conventions and name mangling rather than strictly enforced access modifiers.

Not every procedure or transaction in a class should be exposed to its users. Encapsulation addresses this. In Python, all attributes and methods are public by default. In this section, we will see how to make methods and attributes private.

In [None]:
class Section:
  def __init__(self, max_pop, class_list):
    self.max_pop = max_pop
    self.class_list = class_list
  
  def __count_class(self):
    return len(self.class_list)

  def is_overloaded(self):
    is_overload = True if self.__count_class() > self.max_pop else False
    return is_overload

In [None]:
CpE_58054 = Section(2, ['Jeanne', 'Pietro', 'Dude'])
print(CpE_58054.is_overloaded())
print(CpE_58054.__count_class())  # raises AttributeError: the double underscore hides the method outside the class
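
The last call in the cell above fails because Python rewrites names that start with two underscores inside a class body (name mangling). The sketch below repeats the `Section` class so it is self-contained, catches the resulting `AttributeError`, and shows that the method is still reachable under its mangled name `_Section__count_class`. In other words, this kind of privacy discourages outside access rather than strictly preventing it. The variable name `section` is my own.

```python
class Section:
    def __init__(self, max_pop, class_list):
        self.max_pop = max_pop
        self.class_list = class_list

    def __count_class(self):
        # Stored on the class as _Section__count_class (name mangling).
        return len(self.class_list)

    def is_overloaded(self):
        # Inside the class, the double-underscore name resolves normally.
        return self.__count_class() > self.max_pop

section = Section(2, ['Jeanne', 'Pietro', 'Dude'])
print(section.is_overloaded())  # True

try:
    section.__count_class()     # outside the class the plain name does not exist
except AttributeError as err:
    print('AttributeError:', err)

# The mangled name is still accessible, so this is a convention, not a lock.
print(section._Section__count_class())  # 3
```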

### Task 4: Class of Classes
1. Modify the `class` `Section` and integrate the functions from Tasks 1 to 3 as its methods. Make sure that the code is optimized.
2. Create a `method` named `get_failed()` wherein it will create a list of all the failed students in the class.
3. Create a `method` named `fail_count()` wherein it will return the count of the failed students.

**Note:** Due to data privacy, the data privacy office has mandated that your code refrain from printing the student numbers and names of students. Their student numbers and names must be masked with asterisks.

In [None]:
### CODE HERE ###



---
**END OF LABORATORY**

---
