
# PEP 8

PEP 8 is the official style guide for Python code. It sets conventions for naming, code layout, indentation, line length, and spacing, among others. Following PEP 8 keeps formatting consistent, which makes code easier to read, understand, and share with collaborators.

## Names

If there's one thing to take from *PEP 8*, it's this. Simple and effective:

* *Variables*: use lowercase letters and underscores to separate words
* *Functions*: use lowercase letters and underscores to separate words
* *Classes*: capitalize the first letter of each word (*CapWords*)
* *Constants*: use uppercase letters and underscores to separate words
* If you need to use a reserved word, add an underscore at the end
* Avoid using names that are too short like `fn`; instead, write `first_name`
* Never use lowercase `l`, uppercase `O`, or uppercase `I` as single-character variable names; in some fonts they are easily confused with `1` or `0`


In [None]:
students = 10  # Variable


def calculate_area(length_x, length_y):  # Function
    return length_x * length_y


class FlyingPig:  # Class
    weight = 25


PI = 3.1415927  # Constant

class_ = 'Mammal'  # I'm stubbornly using 'class' as a variable name
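
The trailing-underscore rule for reserved words can be automated. The following is a minimal sketch, assuming a hypothetical helper called `safe_name` (it is not part of PEP 8 or of any library); it relies on `keyword.iskeyword` from the standard library to detect reserved words.

```python
import keyword


def safe_name(name):
    """Return the name, adding a trailing underscore if it is reserved."""
    if keyword.iskeyword(name):
        return name + '_'
    return name


print(safe_name('class'))  # class_
print(safe_name('students'))  # students
```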

## Indentation

Since Python uses indentation instead of braces to denote the scope of a block of code, it's crucial to have a clean and consistent indentation style:

* Guidelines are not too strict, with multiple options; the key is to choose a convention and stick to it.
* *PEP 8* settles the spaces vs. tabs debate, recommending four spaces instead of a tab character.
* Indent the continuation lines of any statement that spans multiple lines to improve readability
* For long lists of values, either align the continuation lines with the opening delimiter or use a hanging indent
* With a *hanging indent*, put no values on the first line; add an extra level of indentation if a block follows, so the continuation lines stand out from the block body
* In multiline constructs, the closing bracket can line up either with the first non-whitespace character of the last line of values or with the first character of the line that starts the construct


In [None]:
# This block won't run as is! It only illustrates the layout

# Long list of values with 'opening delimiter'
result = my_function(first_argument, second_argument,
                     third_argument, fourth_argument)

# Long list of values with 'hanging indent'
result = my_function(
    first_argument, second_argument,
    third_argument, fourth_argument)

# Long list of values with 'hanging indent' and block afterward
def my_function(
        first_argument, second_argument,
        third_argument, fourth_argument):
    print(first_argument)

# Closing bracket aligned with the first non-whitespace character of the last line
my_list = [
    1, 2, 3,
    4, 5, 6,
    ]

# Closing bracket aligned with the first character of the line that starts the construct
my_list = [
    1, 2, 3,
    4, 5, 6,
]
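
Since the cell above is only meant to show layout, here is a small runnable sketch of the same indentation styles. The function `join_words` and its arguments are hypothetical names chosen for this example.

```python
def join_words(
        first_word, second_word,
        third_word, fourth_word):
    """Join four words with single spaces."""
    return ' '.join([first_word, second_word, third_word, fourth_word])


# Continuation lines aligned with the opening delimiter
aligned = join_words('four', 'spaces',
                     'per', 'level')

# Hanging indent: nothing after the opening parenthesis
hanging = join_words(
    'four', 'spaces',
    'per', 'level')

print(aligned == hanging)  # True
```

Both calls are equivalent to the interpreter; the choice between them only affects how the code reads.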

## Whitespace and Line Breaks

PEP 8 has quite strict guidelines on when to break long lines and add whitespace.

Use the following guidelines as a starting point; the most important thing is to be consistent.

Criteria for line breaks:

* It's recommended not to include multiple statements on the same line
* Limit lines to a maximum of 79 characters
* Use two blank lines before and after top-level function and class definitions
* Use one blank line before and after a method definition inside a class
* Use a blank line to separate logical steps in long sequences

Criteria for whitespace:

* Avoid unnecessary whitespace
* Avoid whitespace immediately inside parentheses, braces, or brackets
* Use whitespace around assignment, comparison, and logical operators
* Don't use whitespace around `=` when assigning default parameter values or passing keyword arguments
* When an expression mixes operators of different priority, put whitespace around the lowest-priority ones to clarify the order of operations
* Use whitespace after commas and colons unless they are next to the end of a bracket, brace, or parenthesis
* When using `:` to slice a list, do not put whitespace around it
* Don't use whitespace to align variable values



In [None]:
# This block won't run as is! It only illustrates the layout

# Whitespace inside parentheses, braces, and brackets
spam(ham[1], {eggs: 2})  # Correct
spam ( ham[ 1 ], { eggs: 2 } )  # Incorrect

# Whitespace in assignment and logical operators
egg = 12  # Correct
egg=12  # Incorrect

# Whitespace in default parameter assignment
def complex(real, imag=0.0):  # Correct
    return magic(r=real, i=imag)  # Correct

def complex(real, imag = 0.0):  # Incorrect
    return magic(r = real, i = imag)  # Incorrect

# Whitespace in mathematical operators
hypot = x*x + y*y  # Correct
c = (a+b) * (a-b)  # Correct

hypot = x * x + y * y  # Incorrect
c = (a + b) * (a - b)  # Incorrect

# Whitespace around commas
x, y = y, x  # Correct
x , y = y , x  # Incorrect

# Whitespace with ':'
my_list[2:5] = [10]  # Correct
my_list[2 : 5] = [10]  # Incorrect

# Whitespace to align variable values
user_name    = 'Pepito'  # Incorrect
user_country = 'Spain'
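
The line-break criteria can be seen together in a short runnable sketch. The names `MAX_WEIGHT`, `total_weight`, and `Basket` are hypothetical and exist only for this illustration: note the two blank lines around top-level definitions, the single blank line between methods, and the absence of spaces around `=` in default and keyword arguments.

```python
MAX_WEIGHT = 100


def total_weight(weights, extra=0):
    """Return the sum of all weights plus an optional extra."""
    return sum(weights) + extra


class Basket:
    """A hypothetical container used only for this example."""

    def __init__(self, weights=None):
        self.weights = list(weights or [])

    def is_overloaded(self, extra=0):
        current = total_weight(self.weights, extra=extra)
        return current > MAX_WEIGHT


basket = Basket([40, 35, 20])
print(basket.is_overloaded())  # False
print(basket.is_overloaded(extra=10))  # True
```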

## Single Quotes or Double Quotes?

Both single (`'`) and double (`"`) quotes are used to define string values.

Additionally, triple quotes (`'''` or `"""`) are used for multi-line strings.

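PEP 8 doesn't favor single or double quotes, but it does offer usage guidelines. Pick one style, stick to it, and switch to the other one when it avoids escaping quotes with backslashes: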

* Use double quotes for strings that contain single quotes
* Use single quotes for strings with double quotes

In [None]:
# All these examples are correct

message = 'In a place of "La Mancha"'

message = "In a place of 'La Mancha'"

message = '''In a place of "La Mancha", whose name I do not wish to remember,
             there lived not long ago a gentleman of those with a lance on the rack,
             an old shield, a skinny horse, and a fast greyhound.'''

message = """In a place of 'La Mancha', whose name I do not wish to remember,
             there lived not long ago a gentleman of those with a lance on the rack,
             an old shield, a skinny horse, and a fast greyhound."""
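
Two details about quotes are easy to check in code. This is a minimal sketch with illustrative variable names: it shows that choosing the other quote style gives the same string as escaping with backslashes, and that spaces used to align the lines of a triple-quoted string become part of the string. `textwrap.dedent` from the standard library is one way to strip that common indentation.

```python
import textwrap

single = 'In a place of "La Mancha"'
escaped = "In a place of \"La Mancha\""
print(single == escaped)  # True

aligned = """In a place of 'La Mancha',
             whose name I do not wish to remember"""
print(aligned.splitlines()[1].startswith(' '))  # True

clean = textwrap.dedent("""\
    In a place of 'La Mancha',
    whose name I do not wish to remember""")
print(clean.splitlines()[1])  # whose name I do not wish to remember
```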

## Comments

Keep comments updated; incorrect comments are worse than no comments:

* Write full sentences
* A comment starts with `#` followed by a single space
* Block comments:
  * One or more lines of comments starting with `#`
  * They must be indented to the same level as the code they are commenting on
  * Separate paragraphs in the comment with a blank comment line
* Inline comments:
  * Follow the code they comment on the same line
  * They should be used sparingly
  * Separate the comment from the code with at least two spaces
* All public modules, functions, classes, and methods should be documented with *docstrings*
* Use triple double quotes (`"""`) for *docstrings*

In [None]:
# This is a block comment
# It can have one or more lines
#
# I've left a blank line to separate paragraphs within the comment
# Easy, right?

var = 34  # This is an inline comment


def my_function():
    """This is a single-line docstring."""


def my_other_function(parameter=False):
    """This is a multiline docstring.

    Docstrings also have their own formats, such as Epytext, reST, and Google.
    Yeah, you can look into that at home...
    """

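As an illustration of one of the docstring formats mentioned above, here is a sketch that documents a function using the Google style (`Args:` and `Returns:` sections). The function reuses the `calculate_area` name from the naming example; the wording of the docstring is an assumption made for this example. Docstrings are stored in the `__doc__` attribute, which is what `help()` displays.

```python
def calculate_area(length_x, length_y):
    """Return the area of a rectangle.

    Args:
        length_x: Length of the horizontal side.
        length_y: Length of the vertical side.

    Returns:
        The product of both lengths.
    """
    return length_x * length_y


print(calculate_area(3, 4))  # 12
summary = calculate_area.__doc__.splitlines()[0]
print(summary)  # Return the area of a rectangle.
```
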
## Expressions and Modules

* Use inline negation (`if a is not b`) instead of negating a positive expression (`if not a is b`)
* Don't check if a list is empty using `len(my_list) == 0`; instead use `if not my_list`, since empty sequences evaluate to `False`
* Always place `import` at the beginning of the file
* Put each `import` statement on its own line; importing specific names with `from my_module import MyClass` is fine, but avoid wildcard imports (`from my_module import *`)
* You should import libraries in this order:
  * Standard library modules
  * External modules
  * Project modules
* Separate these groups with a blank line; sorting each group alphabetically is a common convention, although PEP 8 doesn't require it
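
These guidelines can be combined in a short sketch. Only standard library imports are used so that the example runs anywhere; external and project imports would follow in their own groups. The function `summarize` and the file path are hypothetical.

```python
from json import dumps
from os.path import basename


def summarize(path, values, label=None):
    """Return a JSON summary of the values read from a file path."""
    if not values:  # Preferred over: len(values) == 0
        return None
    summary = {'file': basename(path), 'count': len(values)}
    if label is not None:  # Preferred over: not label is None
        summary['label'] = label
    return dumps(summary)


print(summarize('data/scores.csv', []))  # None
print(summarize('data/scores.csv', [7, 9], label='test'))
```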

## Exercise

In [None]:
# Install 'Flake8' to analyze code style

!pip install flake8

In [None]:
# 'Flake8' doesn't check naming format
# A 'Flake8' plugin called 'pep8-naming' needs to be installed
# Error codes of type 'Nxxx' are for names

!pip install pep8-naming
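
To get an intuition for what a naming check does, here is a rough sketch built on the standard library modules `ast` and `re`. It is not how `pep8-naming` is implemented; the function `find_naming_issues`, the two regular expressions, and the sample code are assumptions made for this illustration, and only function and class names are checked.

```python
import ast
import re

SNAKE_CASE = re.compile(r'^_*[a-z][a-z0-9_]*$')
CAP_WORDS = re.compile(r'^_*[A-Z][a-zA-Z0-9]*$')


def find_naming_issues(source):
    """Return (line, name, expected style) for badly named defs."""
    issues = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            if not SNAKE_CASE.match(node.name):
                issues.append((node.lineno, node.name, 'lowercase'))
        elif isinstance(node, ast.ClassDef):
            if not CAP_WORDS.match(node.name):
                issues.append((node.lineno, node.name, 'CapWords'))
    return sorted(issues)


sample = '''
def DictToArray(d):
    return list(d.values())


class my_class:
    x = 12
'''

for line, name, style in find_naming_issues(sample):
    print(f'line {line}: {name!r} should use {style}')
```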

Example of messy code with many style issues. Let's create a `test.py` file with this code to analyze it (the `%%writefile` cell magic must be the first line of the cell):

In [None]:
%%writefile test.py
#define our data
my_dict ={
    'a'  : 10,
'b': 3,
    'c'  :   4,
          'd': 7}
#import the module we need
import numpy as np
#helper function
def DictToArray(d):
  """Convert dictionary values to a numpy array"""
  #extract values and convert
               x=np.array(d.values())
               return x
# This is an unnecessarily long comment line that should be split into several
print(DictToArray(my_dict))

class my_class:
    x = 12

In [None]:
# If we want to see the line numbers

!cat -n test.py

In [None]:
# Use 'Flake8' from the command line to analyze the file

!flake8 test.py

### Do you dare to fix it?

Fix the style errors in this cell:

In [None]:
%%writefile test.py
#define our data
my_dict ={
    'a'  : 10,
'b': 3,
    'c'  :   4,
          'd': 7}
#import the module we need
import numpy as np
#helper function
def DictToArray(d):
  """Convert dictionary values to a numpy array"""
  #extract values and convert
               x=np.array(d.values())
               return x
# This is an unnecessarily long comment line that should be split into several
print(DictToArray(my_dict))

class my_class:
    x = 12

In [None]:
# Re-analyze the style until everything is correct

!flake8 test.py

# References

* [A Five-Minute Introduction to Python's Style Guide: PEP 8](https://medium.com/code-85/a-five-minute-introduction-to-pythons-style-guide-pep-8-57202886265f)
* [A Summary of PEP 8: Style Guide for Python Code](https://tandysony.com/2018/02/14/pep-8.html)
* [An Overview of The PEP 8 Style Guide](https://towardsdatascience.com/an-overview-of-the-pep-8-style-guide-5672459c7682)
* [What is PEP 8 and why should I implement it?](https://dev.to/viktorvillalobos/que-es-el-pep-8-y-porque-deberia-implementarlo-54bh)