## Software (Python) Engineering Notebook 1

The Software (Python) Engineering Notebook 1 serves as an onboarding notebook for new developers.

This notebook focuses on the expected practices and recommended guidelines surrounding Git, style guides and project elements.

Future notebooks will discuss more topics surrounding code quality.

### 1. Common Git Commands and Practices

As interns and new developers, we will be working in teams and sharing a common repository. Therefore, communication and tracking of changes to the code base are very important.

The following section explains how to work as a member of a team in the context of Git.

<img src="assets/images/git_flow.png" width="400" height="400">



##### 1.1 Sync your Local Feature Repository with the Remote Dev Repository


```git pull``` runs ```git fetch``` followed, by default, by ```git merge```. It is a good idea to run ```git pull``` every morning, after lunch and whenever you finish a piece of work, so that your local feature branch stays up to date with the remote dev branch (for example ```git pull origin dev```, assuming your remote is named origin).

Any conflict is then caught early and resolved early.

```git pull``` will also refuse to complete its ```git merge``` step if that would overwrite uncommitted changes, so commit your work before pulling.

If you are in the middle of your work and are not ready to commit, you can use ```git stash``` to shelve your work, run ```git pull``` to update your local feature branch, and finally run ```git stash pop``` to get your work back.

Note: by default, ```git stash``` stashes only the following:
* changes that have been added to your index (staged changes)
* changes made to files that are currently tracked by Git (unstaged changes)

But it will not stash:
* new files that have not yet been staged
* files that have been ignored.

##### 1.2 Working on Multiple Branches in the same Remote Repository (Keep Single Branch Small)

```git checkout```

```git checkout <branch name>``` lets you switch branches on your local machine. During ```git checkout```, Git updates the <b>index</b> (staging) and the files in your local directory to match the branch you switch to. This means that you can access the folders and files of the different branches in the repo locally in your code editor.

This is important because every branch should be confined to the issue that it is solving. Therefore, if you are working on different issues for the repository at the same time, you should switch branches accordingly.

<span style="color:red">Also, it is good practice to keep a single branch small and focused so it is easy to review and merge.</span>

##### 1.3 Commit Your Codes Often

One common bad habit that has been observed is that new developers do not commit their code often enough. It is important to commit often because each commit, once pushed, communicates your changes to other developers on the same project, and it also allows the project to be reverted to a specific point in time.

Therefore, it is important for new developers to understand how commits work and to commit often. (Refer to the diagram above for the section below.)

```git add -> git commit```

The ```git add <filename>``` command adds the changes made to the <b>index</b> (staging). It is bad practice to add all changes with a single ```git add -A``` command. This makes reverting to a previous version of the code difficult, as you want to be able to go back to very specific points in "time". This means that it is better to add a single file, or at least only the files for a particular change.

```git commit -m "text"``` then records the changes from the <b>index</b> (staging) in the local repository together with a text message. This text message is very important and will be covered later.

How often is often? It could be after you have added a class, changed a function or modified the documentation.

You will need to judge when it is a good time to add and commit. Usually, however, you will run ```git add``` and ```git commit``` multiple times a day.


##### 1.4 Do Not Ever Force Push Your Changes

```git push```

The ```git push``` command is straightforward in the sense that it pushes your local work to the remote repository.

<span style="color:red">However, DO NOT ever force your git push if there is an issue.</span>

You may unwittingly "overwrite" or "change" teammates' work that was pushed earlier. You should always de-conflict with the team if there is an issue.

##### 1.5 How to write good commit messages?

Proper and good commit messages are beneficial because
* It's a way of communication between members who work with each other's code.
* It can help save time in finding recent changes related to a bug.
* It can help us find out where and how to add the code.
* It saves you time for your own tasks as people come look for you to explain less.

<span style="color:red">The guidelines below are opinionated and often differ between projects and teams. Therefore, check with your Lead if he/she has a certain template for your commit messages.</span>

<u>Content of the message</u>
* It should be easily understandable
* It should be unambiguous
* It should be just enough and not overly detailed
* A single commit ideally should contain only 1 kind of change. For example, a bug fix and a minor refactoring should not be committed together.
* It should describe the change to the behavior of the code, not the actual code changes which can already be seen.
* More intuitively, the message answers "What are the changes for?" rather than "What are the changes?".
* Specify the type of commit in the title
* Specify the issue number or ticket number in the description

<u>Formatting of the message</u>

* Length of title: below 50 characters
* Length of description: up to 72 characters per line

```
    git commit -m <title> -m <description>
```
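The length limits above can be checked mechanically before committing. The sketch below is only an illustration: `check_commit_message` is a hypothetical helper, not part of Git, and it assumes the 72-character limit applies to each line of the description.

```python
TITLE_LIMIT = 50             # title must be below this many characters
DESCRIPTION_LINE_LIMIT = 72  # assumption: the limit applies per line


def check_commit_message(title: str, description: str = "") -> list[str]:
    """Return a list of formatting problems; an empty list means none."""
    problems = []
    if not title.strip():
        problems.append("title is empty")
    if len(title) >= TITLE_LIMIT:
        problems.append(
            f"title has {len(title)} characters (limit: below {TITLE_LIMIT})"
        )
    # Check every line of the description separately.
    for line_number, line in enumerate(description.splitlines(), start=1):
        if len(line) > DESCRIPTION_LINE_LIMIT:
            problems.append(
                f"description line {line_number} has {len(line)} characters "
                f"(limit: {DESCRIPTION_LINE_LIMIT})"
            )
    return problems


print(check_commit_message("add an image dataloader function"))
```

An empty list means the message passes both checks. Your team may use different limits, so treat the two constants as placeholders.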


In [1]:
# Example, scroll to see the whole message

"""

git commit -m "add an image dataloader function" -m "The pipeline class can now load images into pytorch tensor from a csv file that maps images filepath"

"""


##### 1.6 How to Name your Branch in the Repository?

Within the GitLab repository, the ```main``` branch will always be protected. No one is supposed to work directly in this branch.

From the ```main``` branch, the Lead will create a ```dev``` branch, and all merge requests will target this ```dev``` branch.

One common way to name your issue branch:

* Pattern: issue-number_description_your-name
* Example: 28-build_nosql_parser_chrisho

The issue-number is the number from the task in the issue board.

![image](assets/images/issue_number.jpg)

<span style="color:red">However, check with your Lead on the way he/she wants you to name your branches. </span>
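To keep branch names consistent, the pattern above could be assembled by a small helper. This is a sketch under assumptions: `build_branch_name` is a hypothetical function, and the separators simply follow the `28-build_nosql_parser_chrisho` example. Adjust it to whatever convention your Lead prefers.

```python
import re


def build_branch_name(
        issue_number: int,
        description: str,
        developer: str) -> str:
    """Assemble a branch name from issue number, description and name."""
    # Lowercase the description and turn every run of characters that is
    # not a letter or digit into a single underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")
    # The developer name is lowercased and stripped of spaces and symbols.
    name = re.sub(r"[^a-z0-9]+", "", developer.lower())
    return f"{issue_number}-{slug}_{name}"


print(build_branch_name(28, "Build NoSQL parser", "Chris Ho"))
# prints: 28-build_nosql_parser_chrisho
```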

In [2]:
# Example for above task
"""

28-build_nosql_parser_chrisho

"""


##### Further Resources

1. [githowto-tutorial](https://githowto.com/)
2. [reflectoring-meaningful-commit-messages](https://reflectoring.io/meaningful-commit-messages/)
3. [freecodecamp-how-to-write-better-git-commit-messages](https://www.freecodecamp.org/news/how-to-write-better-git-commit-messages/)

### 2. Basic Coding Styles Guides

The main mantra of this section is quoted from PEP 8 in red below. It is the core idea for you to take away.

<span style="color:red">"The guidelines provided here are intended to improve the <b>readability</b> of code and make it <b>consistent</b> across the wide spectrum of Python code.</span>

<span style="color:red">A style guide is about consistency. Consistency with this style guide is important. Consistency within a project is <b>MORE</b> important. Consistency within one module or function is the <b>MOST</b> important."
</span>

Therefore

<span style="color:red">Always check with your Lead for any deviation from this guide</span>


##### 2.1.0 Managing Imports

For imports, it is not sufficient to just list them alphabetically. On top of alphabetical order, you should divide them into three sections.

The first section is the Generic/Built-in libraries. These are libraries that come with Python, such as "os", "time" and "pathlib".

The second section is the third-party libraries. These are libraries you have installed into your virtual environment, such as pytorch and sklearn.

The third section is your own custom modules or packages, which you wrote and need to import.

<span style="color:red">Finally, remember to remove any dangling imports which your code is not using.</span>

In [1]:
### For Notebooks put all your imports at the top ###

import yaml

### For py files: ###

#!/usr/bin/env python

####################
# Required Modules #
####################

# Generic/Built-in

# Libs

# Custom

##################
# Configurations #
##################

##################################
# All your Classes and Functions #
##################################

# if __name__ == '__main__':
#     pass

##### 2.2.0 Naming Convention, Docstrings and Type-Hinting

In [2]:
class StoreClass:
    """
    A class to represent a store with functionalities to track and
    update inventory.

    Attributes:
    -----------
    store_id: str
        Unique identifier for the store.
    inventory: dict[str, tuple[int, float]]
        Dictionary of products available in the store. The keys are
        product ids and the values are tuples containing the quantity
        in store and the unit price.
    revenue: float
        Total revenue for the store.

    Methods:
    --------
    get_total_price(quantity: int, price: float) -> float:
        Calculate the total price based on unit quantity and unit price.

    update_after_sale(product_id: str, sold_quantity: int) -> None:
        Update the store inventory and revenue after a sale.
    """

    def __init__(
            self,
            store_id: str,
            inventory: dict[str, tuple[int, float]]) -> None:
        """
        Construct all the necessary attributes for the store object.

        Parameters:
        -----------
        store_id: str
            Unique identifier for the store.
        inventory: dict[str, tuple[int, float]]
            Dictionary of products available in the store. The keys are
            product ids and the values are tuples containing the
            quantity in store and the unit price.
        """
        self.store_id = store_id
        self.inventory = inventory
        # Revenue is not a parameter: every store starts with zero revenue.
        self.revenue = 0.0

    @staticmethod
    def get_total_price(quantity: int, price: float) -> float:
        """Calculate the total price based on unit quantity and unit price.

        Args:
            quantity (int): total number of products
            price (float): unit price of the product

        Returns:
            float: total price of the total number of products
        """
        return quantity * price

    def update_after_sale(self, product_id: str, sold_quantity: int) -> None:
        """Update the store inventory and revenue after a sale.

        Args:
            product_id (str): The unique identifier of the product sold
            sold_quantity (int): The quantity of the product sold

        Returns:
            None
        """
        current_quantity, price = self.inventory[product_id]

        # Update new quantity after sale
        new_quantity = current_quantity - sold_quantity
        self.inventory[product_id] = (new_quantity, price)

        # Update revenue after sale
        self.revenue += self.get_total_price(sold_quantity, price)

We will use the simple synthetic class above as an example to demonstrate each component of naming convention, docstrings and typing.

<span style="color:red">If you are not familiar with writing classes, future notebooks will touch on how to write a class, subclassing, name mangling etc.</span>

##### 2.2.1 Style Guide for Constant, Variables and Functions

It is good practice to declare constants in a config.yaml file and to write their names in all capital letters. See the example below.

In [3]:
with open("config.yaml", "r") as file:
    CONFIG = yaml.safe_load(file)

# Constants are declared in all capital letters.
NUM_EPOCHS = CONFIG["number_epochs"]

print(NUM_EPOCHS)

5


Function names should start with a relevant verb that makes sense for the operation and its output.

Variable names should be meaningful, like "quantity" and "price" in the example below.

Both function and variable names should be lowercase, with words separated by underscores as necessary. These practices help readers understand your code more easily.

On Type-Hinting, it is important to state the variable types, such as "int" and "float" in the example, so that readers and teammates understand what type of data goes into the function.

It is also important to document the function's return type, which in this case is "float".

For Docstrings, it is important to write them if the function is of non-trivial size or the logic is not intuitive.

See the example below on how to write """Docstrings""" with details on the arguments and the return value.

<img src="assets/images/function_variable_names.jpg">
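As a further illustration of the points above, here is a small standalone function that combines a verb-based name, meaningful variable names, type hints and a docstring. The function and its discount rule are made up for this sketch and are not part of the store example.

```python
def calculate_discounted_price(
        unit_price: float,
        discount_rate: float) -> float:
    """Calculate the unit price after applying a discount.

    Args:
        unit_price (float): original price of one product
        discount_rate (float): fraction to take off, between 0 and 1

    Returns:
        float: price of one product after the discount

    Raises:
        ValueError: if discount_rate is outside the range 0 to 1
    """
    if not 0 <= discount_rate <= 1:
        raise ValueError("discount_rate must be between 0 and 1")
    return unit_price * (1 - discount_rate)


print(calculate_discounted_price(unit_price=20.0, discount_rate=0.25))
# prints: 15.0
```

A reader can tell what goes in and what comes out from the signature alone, and the docstring covers the one thing the signature cannot say: the valid range of `discount_rate`.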

##### 2.2.2 Style Guides for Classes and Methods

The class name should be in CamelCase, which capitalizes the first letter of each word without spaces. The class should also have a docstring below its name that describes the class, its attributes and its methods. See below for a naming example.

The class name is usually noun-based.

Methods of a class mostly follow the same style as functions. See the full example.

<img src="assets/images/class_naming.jpg">

##### 2.2.3 Additional Guides for Type-Hinting

Eventually, you will notice that Type-Hinting is a task that requires diligence, as it may not be enough to say that a variable is a list.

You will also need to document what it is a list of.

This is very important for teammates who need to understand, integrate or modify code that you wrote.

In the example below, inventory is a dictionary whose keys are strings and whose values are tuples of an integer and a float.

<img src="assets/images/type-hinting.jpg">
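The same idea extends to return values and to values that may be missing. The sketch below reuses the inventory shape described above. The alias `Inventory` and both functions are illustrative names, not part of `StoreClass`.

```python
from typing import Optional

# A type alias keeps long hints readable and documents the shape once.
Inventory = dict[str, tuple[int, float]]


def get_low_stock_ids(inventory: Inventory, threshold: int) -> list[str]:
    """Return the ids of products whose quantity is below the threshold."""
    return [
        product_id
        for product_id, (quantity, _price) in inventory.items()
        if quantity < threshold
    ]


def find_unit_price(inventory: Inventory, product_id: str) -> Optional[float]:
    """Return the unit price of a product, or None if it is not stocked."""
    if product_id not in inventory:
        return None
    _quantity, price = inventory[product_id]
    return price


stock = {"ID01": (50, 1.2), "ID02": (25, 3.5)}
print(get_low_stock_ids(stock, threshold=30))
print(find_unit_price(stock, "ID99"))
```

`list[str]` says "a list of what", and `Optional[float]` warns the caller that `None` is a possible result that must be handled.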

In [4]:
# Create Store Inventory Data
inventory_data = {"ID01": (50, 1.2), "ID02": (25, 3.5)}

# Init Store Class
my_store = StoreClass(store_id="AISG", inventory=inventory_data)
print(my_store.inventory)
print(my_store.revenue)

# Assume the store sold 10 of ID01
my_store.update_after_sale(product_id="ID01", sold_quantity=10)
print(my_store.inventory)
print(my_store.revenue)

{'ID01': (50, 1.2), 'ID02': (25, 3.5)}
0.0
{'ID01': (40, 1.2), 'ID02': (25, 3.5)}
12.0


##### 2.3.0 Line Break Formatting 

Typically for readability, PEP 8 suggests limiting all lines to a maximum of 79 characters. PEP 8 also has suggestions on how to break lines that get too long.

However, you will notice from different libraries that different developers break their lines differently.

<span style="color:red">Therefore, check with your Lead on how he/she wants you to break your long lines. There may even be some instances where longer lines are preferred.</span>

In [5]:
# What line breaking looks like?
def function(arg_1: str, arg_2: float, arg_3: list[int], arg_4: tuple[int], arg_5: float, arg_6: str) -> None:
    pass

# Versus below

def function(
        arg_1: str,
        arg_2: float,
        arg_3: list[int],
        arg_4: tuple[int],
        arg_5: float,
        arg_6: str) -> None:
    pass

# Example 2

first_number = 1
second_number = 2
third_number = 3

value = (first_number + second_number - third_number)

# Break before binary operators. Some may believe the above is still very readable.
# Hence, check with your Project Lead on the way lines should break.

value = (first_number
         + second_number
         - third_number)

##### 2.3.1 Automating Code Formatting (Black)

Black is a popular code formatter that auto-formats your code consistently, especially if the entire team adopts it.

If you are using VS Code, you can install the extension.

<img src="assets/images/black_vscode.png">

<span style="color:red">Do check with your Lead if he/she wants to adopt this.</span>

##### 2.4.0 Commenting Your Code Inline

A program has 2 very different audiences.

* A human reader, who at times needs your comments to understand your code.
* The machine, which ignores your comments and executes only your code.

This means there is a delicate balance to practice between what to comment inline and how to comment.

You have to comment to help readers understand.

But you also don't want to comment unnecessarily: text that adds no value makes the file longer and becomes hard to maintain once you start making changes.

See some best practices below.

In [6]:
# Make comments meaningful. Comments should not duplicate code.

# Good comments explain why code is written the way it is.

# Use clear and concise language.

# Comment on your classes, methods and functions (context)

# Include relevant links from documentation, stackoverflow, research papers

# If you find it hard to comment, there is probably something wrong with the code.

# Don't over-comment
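To make the first two practices concrete, the sketch below contrasts a comment that merely repeats the code with one that explains why the code exists. The function, the threshold and the business reason are all invented for illustration.

```python
BULK_THRESHOLD = 100  # hypothetical number of units that counts as bulk
BULK_DISCOUNT = 0.1   # hypothetical discount for bulk orders


def apply_bulk_discount(quantity: int, unit_price: float) -> float:
    """Return the total price, discounted for bulk orders."""
    total_price = quantity * unit_price

    # A comment like "check if quantity is at least the threshold" would
    # only repeat the code. The comment below explains the reason instead.
    #
    # Bulk orders ship on a single pallet, which is cheaper to deliver,
    # so the saving is passed on to the customer.
    if quantity >= BULK_THRESHOLD:
        total_price *= 1 - BULK_DISCOUNT
    return total_price


print(apply_bulk_discount(quantity=10, unit_price=2.0))
```

If the reason behind the rule changes, the comment tells the next developer exactly which assumption to revisit.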

### 3. Managing Project Elements

##### 3.1.0 Managing Folder Structures

It is important to get the project structure correct at the start, so you won't have to deal with conflicts down the line when your teammates have a different project structure in mind.

This is a typical AI/ML project structure.

<span style="color:red">However, project structure is also very dependent on the type of project and scope. Therefore, check with your Lead on the project structure early on to reduce conflicts later on.</span>

```
data/
    input/
        {name_version}/
            train/
            test/
            validation/
    output/
        {experiment}/
            prediction/
src/
    __init__.py
    main.py
    module.py
    package/
        __init__.py
        package_module.py
config.yaml
.env
.gitignore
requirements.txt
README.md
```
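If you set up this layout often, the folders can be created with a short script. The sketch below is one possible approach: `create_project_structure` is a hypothetical helper, the dataset and experiment names are placeholders you pass in, and the root-level files (config, README and so on) are left for you to add by hand.

```python
import tempfile
from pathlib import Path


def create_project_structure(
        root: Path,
        dataset_name: str,
        experiment_name: str) -> None:
    """Create the data and src folders of the layout shown above."""
    input_root = root / "data" / "input" / dataset_name
    folders = [
        input_root / "train",
        input_root / "test",
        input_root / "validation",
        root / "data" / "output" / experiment_name / "prediction",
        root / "src" / "package",
    ]
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)

    # Empty placeholder modules so that src and package are importable.
    files = [
        root / "src" / "__init__.py",
        root / "src" / "main.py",
        root / "src" / "module.py",
        root / "src" / "package" / "__init__.py",
        root / "src" / "package" / "package_module.py",
    ]
    for file_path in files:
        file_path.touch(exist_ok=True)


# Demonstrate in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as temp_dir:
    project_root = Path(temp_dir)
    create_project_structure(project_root, "mnist_v1", "baseline")
    for path in sorted(project_root.rglob("*")):
        print(path.relative_to(project_root))
```

Because `mkdir` and `touch` are called with `exist_ok=True`, running the function again on an existing project does not overwrite anything.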

##### 3.2.0 Managing Environment Variables and Configurations

<span style="color:red">It is not enough to avoid pushing them: NEVER even commit your secrets, keys, passwords and sensitive file paths to Git. Always store them in a .env file, and list that file in .gitignore.</span>

See the example below on how to use .env to load your environment variables.

At the same time, you also don't want to hard-code any configuration or hyper-parameter into the code itself. For such variables, use a config.yaml file instead. Load the configuration after your imports and before the rest of your code.

In [3]:
import os
import yaml

from dotenv import load_dotenv

load_dotenv()

# Rename .env.example to .env in order to load the data
SECRET_KEY = os.getenv("SECRET_KEY")
print(SECRET_KEY)

with open("config.yaml", "r") as file:
    CONFIG = yaml.safe_load(file)

# Load the configuration after your imports and before your code starts
NUM_EPOCHS = CONFIG["number_epochs"]

print(NUM_EPOCHS)

None
5
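The `None` printed above is what `os.getenv` returns when a variable is not set, which can surface much later as a confusing error. One way to fail early, and to avoid printing a full secret, is sketched below. `get_required_env` and `mask_secret` are hypothetical helpers that use only the standard library, and the key set in the demo is a dummy value.

```python
import os


def get_required_env(name: str) -> str:
    """Return an environment variable, or fail early if it is not set."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set. "
            "Add it to your .env file or export it in your shell."
        )
    return value


def mask_secret(secret: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the value is safe to log."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# Demo only: a dummy value stands in for what load_dotenv() would provide.
os.environ["SECRET_KEY"] = "example-not-a-real-key"
print(mask_secret(get_required_env("SECRET_KEY")))
```

With this pattern, a missing variable stops the program at startup with a clear message instead of passing `None` further into the code.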
