# Software Engineering Best Practices (Python)

## Topics
* Modularity
    * The idea is to break the code into smaller pieces.  We use OOP (Object Oriented Programming) to write modular code.
    * It improves:
        * Readability: "Code is read much more than it is written" (PEP8)
        * Maintainability
    * What does modularity leverage?
        * Packages, Classes, and Methods
    * Class Inheritance & DRY principle 
        * Start with Parent class and pass on its functionality to a Child class, which inherits methods and attributes of its Parent.
        * This applies the DRY (Don't Repeat Yourself) principle: you avoid copying & pasting code from an original class just to extend it into a new one.
* Conventions & PEP8
    * Conventions work like social norms.
    * Pythonistas have their own conventions based on PEP 8, which is the de facto style guide for Python code.
    * pycodestyle package: it flags violations of the PEP 8 style.

* Documentation & Packages
    * Comments, Docstrings, and Self-Documenting Code
    * pip & PyPI
    * help()
        * can be applied to many objects (a number, a package, a method of a class)
* Automated Testing
    * Use tools like the pytest package to automatically run and re-run your tests to ensure the code is working
* Version Control & Git
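
As a small illustration of the Automated Testing topic above, here is a minimal sketch of what a pytest-style test file could look like. The function `count_words` and the test names are hypothetical, made up for this example only.

```python
def count_words(text):
    """Return the number of whitespace-separated words in text."""
    return len(text.split())


# pytest collects functions whose names start with "test_".
def test_count_words_simple_sentence():
    assert count_words("code is read more than written") == 6


def test_count_words_empty_string():
    assert count_words("") == 0
```

Saved in a file such as `test_word_count.py`, these tests would be picked up by running `pytest` from the shell in the same directory.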

### Conventions & PEP8 

In [None]:
# Use pycodestyle's StyleGuide class to check multiple files for PEP 8 compliance.
    # pycodestyle can be run from the command line to check a file for PEP 8 compliance. 
    # Sometimes it's useful to run this kind of check from a Python script.

# Import needed package
import pycodestyle

# Create a StyleGuide instance
style_checker = pycodestyle.StyleGuide()

# Run PEP 8 check on multiple files
result = style_checker.check_files([
    '/workspace/sources/datacamp/general_python_scripts/nay_pep8.py',
    '/workspace/sources/datacamp/general_python_scripts/yay_pep8.py',
])

# Print result of PEP 8 style check
print(result.messages)


### Documentation & Packages

* Packages:
    * Recall: A package is a collection of Python modules.
    * Adding Functionality to your package
        * We use functions or classes (as defined in a utils.py file) to add functionality to a package.
    * Using classes to strengthen the functionality
        * We use OOP (Object Oriented Programming) to write modular code
    * Writing your first package:
        * Minimal package structure: 
            * a directory called my_package (folder with the package name) and an `__init__.py` file to indicate it's a package
        * When we import and work with our resulting package, we will be in the my_script.py file, which is at the same level as the my_package folder
        * In the my_package folder, the utils.py file is where you define the function xyz(argument)
        * When you are writing your code in my_script.py, you do:
            * import my_package.utils as mp
            * mp.xyz(argument)
        * To make life easier for the user, we can import this function in `__init__.py` using relative import syntax, so that my_package.xyz(argument) works directly:
            * from .utils import xyz
        * Reference for package naming: https://peps.python.org/pep-0008/#package-and-module-name
        * Sample file structure without inheritance:  
        <img src="sources/datacamp/general_python_scripts/package_structure.png" alt="package structure" width="400px">
        * Sample file structure with inheritance:  
        <img src="sources/datacamp/general_python_scripts/package_structure_with_inherit.png" alt="inheritance" width="400px">
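
The package layout described above can be tried out without touching your real working directory. The sketch below builds a throwaway `my_package` in a temporary folder and imports it; the body of `xyz` is a placeholder invented for this example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway copy of the layout in a temporary directory.
work_dir = Path(tempfile.mkdtemp())
package_dir = work_dir / "my_package"
package_dir.mkdir()

# utils.py holds the package's functionality (xyz is a placeholder function).
(package_dir / "utils.py").write_text(
    "def xyz(argument):\n"
    "    return f'xyz received {argument}'\n"
)

# __init__.py marks the folder as a package and re-exports xyz
# through a relative import.
(package_dir / "__init__.py").write_text("from .utils import xyz\n")

# Make work_dir importable, as if my_script.py were running from inside it.
sys.path.insert(0, str(work_dir))
importlib.invalidate_caches()

import my_package

print(my_package.utils.xyz("a"))  # full path through the submodule
print(my_package.xyz("a"))        # shortcut enabled by __init__.py
```

Both calls reach the same function; the relative import in `__init__.py` only shortens what the user has to type.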

In [5]:
# Using help() on an integer
# It provides help on python's integer class
help(42)


Help on int object:

class int(object)
 |  int([x]) -> integer
 |  int(x, base=10) -> integer
 |  
 |  Convert a number or string to an integer, or return 0 if no arguments
 |  are given.  If x is a number, return x.__int__().  For floating point
 |  numbers, this truncates towards zero.
 |  
 |  If x is not a number or if base is given, then x must be a string,
 |  bytes, or bytearray instance representing an integer literal in the
 |  given base.  The literal can be preceded by '+' or '-' and be surrounded
 |  by whitespace.  The base defaults to 10.  Valid bases are 0 and 2-36.
 |  Base 0 means to interpret the base from the string as an integer literal.
 |  >>> int('0b100', base=0)
 |  4
 |  
 |  Built-in subclasses:
 |      bool
 |  
 |  Methods defined here:
 |  
 |  __abs__(self, /)
 |      abs(self)
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __and__(self, value, /)
 |      Return self&value.
 |  
 |  __bool__(self, /)
 |      True if self else False
 |

In [4]:
# Using help() on a method of a class
from collections import Counter

# It provides help on the most_common() method of the Counter class in the collections module
help(Counter.most_common)

Help on function most_common in module collections:

most_common(self, n=None)
    List the n most common elements and their counts from the most
    common to the least.  If n is None, then list all element counts.
    
    >>> Counter('abracadabra').most_common(3)
    [('a', 5), ('b', 2), ('r', 2)]
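
help() is not limited to built-ins: it also displays the docstrings you write yourself. A minimal sketch, using a hypothetical function `count_vowels` that exists only for this illustration:

```python
import inspect


def count_vowels(text):
    """Count the vowels in a string.

    Args:
        text (str): The text to inspect.

    Returns:
        int: Number of characters in text that are vowels (a, e, i, o, u).
    """
    return sum(1 for char in text.lower() if char in "aeiou")


# help() prints the signature followed by the docstring written above.
help(count_vowels)

# The same text is available programmatically.
print(inspect.getdoc(count_vowels).splitlines()[0])
```

This is why docstrings matter for users of a package: they are what help() shows.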



In [None]:
# Writing your first package: my_package
    # We create a my_package folder inside our working directory.
    # Each package folder must have an __init__.py file.
# To make your package installable by pip, you need to create a setup.py file.
    # work_dir/
    # ├── my_package
    # │    ├── __init__.py
    # │    └── utils.py
    # └── setup.py
# In the example below, work_dir is named text_analyzer and the package folder inside it is also named text_analyzer,
# which is why setup() lists packages=['text_analyzer']:
    # text_analyzer/
    # ├── text_analyzer
    # │    ├── __init__.py
    # │    └── utils.py
    # └── setup.py

# Import needed function from setuptools
from setuptools import setup

# Create proper setup to be used by pip
setup(name='text_analyzer',
      version='0.0.1',
      description='Perform and visualize a text analysis.',
      author='Caio',
      packages=['text_analyzer'])  # package directory


In [1]:
# Installing your requirements.txt

# To be able to share your newly created package, you need to create two things: setup.py and requirements.txt, both at the work_dir level
    # setup.py: tells pip how to install the package (and it will be used by PyPI if you decide to publish)
        # the packages argument: lists the package directories, i.e. the locations of the __init__.py files.
            # recall that we can use a "relative import" inside __init__.py by doing: from .utils import xyz
        # in our case, we have just one __init__.py, in the my_package folder
        # utils.py is where you write the functions (the "functionalities") that are part of the package
    # requirements.txt: lists the packages (dependencies) that must be installed for your package to work

    # work_dir/
    # ├── my_package
    # │    ├── __init__.py
    # │    └── utils.py
    # ├── requirements.txt
    # └── setup.py

# Given that you are running a shell session in work_dir, this command installs the dependencies listed in requirements.txt:
    # pip install -r requirements.txt

In [10]:
# Writing a Class for the package
# PEP8 Conventions:
    # CamelCase (CapWords) for class names
    # for the variable that refers to the future instance of the class: write 'self' instead of other names

# Learning points:
    # Self and __init__: step by step idea 
        # You create a class Person, initially with two methods: '__init__' and 'introduce'
        # Within __init__ you define three parameters: self, name, age.
        # self is the variable that refers to a future instance of the class
        # self.name and self.age are called "instance variables" because they store attributes that are attached to one particular instance of the class.
        # You create an instance of this class when you call it and assign the result to a variable in your future code: person1 = Person("Caio", 36).
        # When that call happens, Python passes the new instance to __init__ as 'self', and __init__ attaches the values ("Caio", 36) to it as self.name and self.age.
        # You can access and work with the instance variables (name, age) by using them within the instance methods (introduce)
    # Accessing the Class:
        # You add functionality to utils.py; in this case, we will call the file parent_class_utils.py because later we will create a child_class_utils.py 
        # You do:
            # change (cd) to the directory that contains __init__.py and open that file
            # add `from .parent_class_utils import Person`, which imports this class in __init__.py using relative import syntax
            # This makes the class easily accessible to your users.
            # in your my_script.py you would just start with: import my_package
    # Other methods within a Class:
        # If a class has only __init__(self, ...), then this class is only a "container" for the attributes that will be assigned to instances.
        # Your methods can use pre-built objects from other modules, but you need to import them first: `from module_xyz import object_xyz`
        # You can add other functionality to classes using non-public methods (names starting with "_"). By defining methods as non-public you're signifying to the user that the method is only to be used inside the package

# 1) Define Class
# 2) Once the class is written you will modify your package's __init__.py file to make it easily accessible by your users

# working_dir
# ├── my_package
# │    ├── __init__.py
# │    ├── utils.py
# │    └── parent_class_utils.py
# ├── requirements.txt
# ├── setup.py
# └── my_script.py


In [9]:
class Person:
    """A class to represent a person."""

    def __init__(self, name, age):
        """Initialize a Person object with a name and age.

        Args:
            name (str): The name of the person.
            age (int): The age of the person.
        """
        self.name = name  # Assign the name attribute of the instance
        self.age = age    # Assign the age attribute of the instance

    def introduce(self):
        """Introduce the person.

        Returns:
            str: A string containing the person's name and age.
        """
        return f"My name is {self.name} and I am {self.age} years old."

# Instantiate the Person Class
person1 = Person("Caio", 36)
print("This is the person1 object: ", person1)
print("This is the person1 name and age: ", person1.name, ",", person1.age)



This is the person1 object:  <__main__.Person object at 0x7fa9fc5c4bd0>
This is the person1 name and age:  Caio , 36
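
The notes above mention instance methods and non-public methods, but the cell never calls `introduce`. A minimal sketch of both ideas follows; the helper `_format_age` is hypothetical and was added only to show the leading-underscore convention.

```python
class Person:
    """A class to represent a person (compact copy of the class above)."""

    def __init__(self, name, age):
        self.name = name
        self.age = age

    def _format_age(self):
        # Non-public helper (hypothetical): the leading underscore signals
        # that it is meant for internal use only.
        return f"{self.age} years old"

    def introduce(self):
        # Public method that relies on the non-public helper.
        return f"My name is {self.name} and I am {self._format_age()}."


person1 = Person("Caio", 36)
print(person1.introduce())  # My name is Caio and I am 36 years old.
```

Python does not enforce the underscore; it is a convention that tells users which methods they should not depend on.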


In [2]:
# Package Structure with Class Inheritance (Parent class -> Child class)

# Instead of copy-pasting the already written functionality, you will use the principles of 'DRY' and inheritance to quickly create your new Child class.

# Learning Points
    # Create another file, child_class_utils.py, to hold the Child class code.
    # Inside child_class_utils.py, `from .parent_class_utils import ParentClass` imports the parent class using relative import syntax
    # Person plays the role of "ParentClass": it is passed in parentheses in the definition of the ChildClass within child_class_utils.py
    # After calling the parent's __init__, self has all the methods and attributes that an instance of ParentClass would
    # Next, you use self as you would normally and build additional functionality unique to ChildClass
    # Then, you instantiate the child class and can access both parent and child class functionalities

# working_dir
# ├── my_package
# │    ├── __init__.py
# │    ├── utils.py
# │    ├── parent_class_utils.py
# │    └── child_class_utils.py
# ├── requirements.txt
# ├── setup.py
# └── my_script.py


In [None]:
# Import ParentClass object
from .parent_class_utils import ParentClass


# Create a child class with inheritance
class ChildClass(ParentClass):  # The child class inherits from the parent class
    def __init__(self):
        # Call parent's __init__ method
        # This runs ParentClass's initializer on self, so self receives the parent's attributes
        ParentClass.__init__(self)
        # Add attribute unique to child class
        self.child_attribute = "I'm a child class attribute!"


# Create a ChildClass instance
child_class = ChildClass()
print(child_class.child_attribute)
print(child_class.parent_attribute)
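
The cell above depends on a relative import, so it only runs inside a package. Below is a self-contained sketch of the same idea that can run in a notebook. The body of `ParentClass` is not shown in these notes, so the definition here is an assumption made for illustration; `super().__init__()` is used as a common alternative to calling `ParentClass.__init__(self)` directly.

```python
class ParentClass:
    def __init__(self):
        # Assumed attribute, chosen to match the print statement in the cell above.
        self.parent_attribute = "I'm a parent class attribute!"


class ChildClass(ParentClass):
    def __init__(self):
        # Run the parent's initializer on this same instance.
        super().__init__()
        # Add an attribute unique to the child class.
        self.child_attribute = "I'm a child class attribute!"


child_class = ChildClass()
print(child_class.child_attribute)
print(child_class.parent_attribute)
print(isinstance(child_class, ParentClass))  # True
```

No code was copied from the parent, which is the DRY principle in practice: the child instance carries the parent's attribute purely through inheritance.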

In [None]:
# Example 1/2 (Document - Parent Class)

# working_dir
# ├── text_analyzer
# │    ├── __init__.py
# │    ├── counter_utils.py
# │    └── document.py
# └── my_script.py

# Procedures:
    # We create a text_analyzer package that provides functions and classes to analyze text.
    # We create a document.py that holds the Document class (parent), which initially works as a container for a text attribute.
    # Suppose we want to tokenize our document, i.e., break the document into individual words (tokens) that form a list.
    # We put the tokenization procedure within __init__, so that the Document instance is already tokenized when it is created.
    # The tokenize method is named _tokenize ("_" because this method is only used internally in the Document class).
    # Since the tokenize function is already written in "token_utils.py", _tokenize only needs to call it with the self.text attribute as argument.
    # __init__ stores the resulting tokens in self.tokens.
    # The same happens in the _count_words method: you define it and call it in __init__; it passes self.tokens to Counter().

# Document class (Parent Class) created within document.py
# The Document class will perform text analysis in the text_analyzer package
from collections import Counter

# tokenize is assumed to be defined in token_utils.py, as described in the notes above
from .token_utils import tokenize


class Document:
    """A class for text analysis
    
    :param text: string of text to be analyzed
    :ivar text: string of text to be analyzed; set by `text` parameter
    """
    # Initialize a new Document instance
    def __init__(self, text):
        # Store text parameter to the text attribute
        self.text = text
        # Pre tokenize the document with non-public tokenize method
        self.tokens = self._tokenize()
        # Pre count the document's words with non-public count_words method
        self.word_counts = self._count_words()

    # Non-public method to split the document's text into tokens
    def _tokenize(self):
        return tokenize(self.text)

    # Non-public method to tally document's word counts
    def _count_words(self):
        # Use collections.Counter to count the document's tokens
        return Counter(self.tokens)

# Usage, from my_script.py:
# Import your custom text_analyzer package
import text_analyzer

# Create an instance of Document with datacamp_tweets. Now datacamp_doc is an instance of the Document class whose "text" attribute holds the value given in datacamp_tweets
datacamp_doc = text_analyzer.Document(datacamp_tweets)

# print the first 5 tokens from datacamp_doc
print(datacamp_doc.tokens[:5])

# print the top 5 most used words in datacamp_doc
print(datacamp_doc.word_counts.most_common(5))
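
The Document cell above relies on the package's `tokenize` helper and on `datacamp_tweets`, neither of which is shown in these notes. The sketch below swaps in a hypothetical regex-based `tokenize` and a made-up sample string so the class can be run on its own; the real helper may behave differently.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical stand-in for the package's tokenizer:
    # keeps words, #hashtags and @mentions.
    return re.findall(r"[#@]?\w+", text)


class Document:
    """A class for text analysis (same structure as the class above)."""

    def __init__(self, text):
        self.text = text
        self.tokens = self._tokenize()
        self.word_counts = self._count_words()

    def _tokenize(self):
        return tokenize(self.text)

    def _count_words(self):
        return Counter(self.tokens)


# Made-up sample text standing in for datacamp_tweets.
sample_text = "Learn #python with @DataCamp. Python is fun! #python"
doc = Document(sample_text)

print(doc.tokens[:5])                  # ['Learn', '#python', 'with', '@DataCamp', 'Python']
print(doc.word_counts.most_common(1))  # [('#python', 2)]
```

Because tokenizing and counting happen inside `__init__`, the instance is ready to query as soon as it is created.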



In [None]:
# Example 2/2 (SocialMedia class - Child Class) 

# working_dir
# ├── text_analyzer
# │    ├── __init__.py
# │    ├── counter_utils.py
# │    ├── document.py
# │    └── tweet.py
# └── my_script.py

# Procedures:

# SocialMedia class (Child Class) created within tweet.py
# Define a SocialMedia class that is a child of the `Document` class.
# It extends Document's tokenization and word counting to also count hashtags and mentions.
class SocialMedia(Document):
    def __init__(self, text):
        # Call the parent's __init__ with text so that tokens and word_counts are set
        Document.__init__(self, text)
        self.hashtag_counts = self._count_hashtags()
        self.mention_counts = self._count_mentions()
        
    def _count_hashtags(self):
        # Filter attribute so only words starting with '#' remain
        return filter_word_counts(self.word_counts, first_char='#')      
    
    def _count_mentions(self):
        # Filter attribute so only words starting with '@' remain
        return filter_word_counts(self.word_counts, first_char='@')

# Import custom text_analyzer package
import text_analyzer

# Create a SocialMedia instance with datacamp_tweets
dc_tweets = text_analyzer.SocialMedia(text=datacamp_tweets)

# Print the top five most mentioned users
print(dc_tweets.mention_counts.most_common(5))

# Plot the most used hashtags
text_analyzer.plot_counter(dc_tweets.hashtag_counts)
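
The SocialMedia cell above uses `filter_word_counts`, `plot_counter`, and `datacamp_tweets`, which are not defined in these notes. The following self-contained sketch replaces them with assumptions: a hypothetical `tokenize`, a hypothetical `filter_word_counts` that keeps only the entries starting with a given character, and a made-up sample string. Plotting is left out.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: keeps words, #hashtags and @mentions.
    return re.findall(r"[#@]?\w+", text)


def filter_word_counts(word_counts, first_char):
    # Hypothetical helper: keep only the entries whose word starts with first_char.
    return Counter({word: count for word, count in word_counts.items()
                    if word.startswith(first_char)})


class Document:
    def __init__(self, text):
        self.text = text
        self.tokens = tokenize(self.text)
        self.word_counts = Counter(self.tokens)


class SocialMedia(Document):
    def __init__(self, text):
        # The parent's __init__ needs text to build tokens and word_counts.
        Document.__init__(self, text)
        self.hashtag_counts = filter_word_counts(self.word_counts, first_char="#")
        self.mention_counts = filter_word_counts(self.word_counts, first_char="@")


# Made-up sample text standing in for datacamp_tweets.
sample = "Thanks @DataCamp for the #python course! #python #datascience @DataCamp @PyData"
tweets = SocialMedia(sample)

print(tweets.hashtag_counts.most_common(1))  # [('#python', 2)]
print(tweets.mention_counts.most_common(1))  # [('@DataCamp', 2)]
```

SocialMedia only adds the hashtag and mention logic; tokenizing and word counting are inherited from Document.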