# Software Engineering Best Practices (Python)

## Topics
* Conventions & PEP8
    * Conventions are like social norms (or conventions). 
    * Pythonistas have their own "conventions" based on PEP8, which is the defacto Style Guide for Python Code.
    * pycodestyle package: it flags violations to PEP8 style.
    * Readability follows the Zen of Python

* Modularity
    * The idea is to break the code into smaller pieces.  We use OOP (Object Oriented Programming) to write modular code.
    * It improves:
        * Readability: "Code is read much more than it is written" (PEP8)
        * Maintanability
    * What modularity leverages?
        * Packages, Classes, and Methods
    * Class Inheritance & DRY principle 
        * Start with Parent class.
        * Think on an specific objective and expand it (pass its functionality) to another class, the child class, which inherits methods and attributes of its Parent.
        * Multilevel (Parent -> Child -> Grandchild) vs. Multiple (Many Parents -> Child) Inheritance
        * This used the DRY concept, where you avoid copying & pasting code from an original class to extend it to a new extended one.

* Documentation
    * Comments, Docstrings, and Self-Documenting Code
    * Readability follows the Zen of Python
    * Pip & PyPi
    * help()
        * can be applied to many objects (to number, to a package, to a method of a package)
        
* Automated Testing
    * Use tools like doctest and pytest package to automatically run and re-run your tests to ensure code is working.
    * CI Tools: 
        * Travis CI (can help test your code when new code is added)
        * Code Climate (can help point out if your code isn't modular)
    
* Version Control & Git

## Conventions & PEP8 

In [None]:
# Use pycodestyle's StyleGuide class to check multiple files for PEP 8 compliance.
    # pycodestyle can be run from the command line to check a file for PEP 8 compliance. 
    # Sometimes it's useful to run this kind of check from a Python script.

# Import needed package
import pycodestyle

# Create a StyleGuide instance
style_checker = pycodestyle.StyleGuide()

# Run PEP 8 check on multiple files
result = style_checker.check_files(['/workspace/sources/datacamp/general_python_scripts/nay_pep8.py', '/workspace/sources/datacamp/general_python_scripts/yay_pep8.py'])

# Print result of PEP 8 style check
print(result.messages)


## Packages, Classes

* Packages:
    * Recall: A package is a collection of Python modules.
    * Adding Functionality to your package
        * We use functions or classes  (as defined in a utils.py file) to add functionality to a package.
    * Using  Classes to strenghten the functionality
        * We use OOP (Object Oriented Programming) to write modular code
    * Writing your first package:
        * Minimial package structure: 
            * a directory called my_package (folder with the package name) and a `__init__`.py file to indicate it's a package
        * When we import and work with our resulting package, we will be in the my_script.py file, which is in the same level as the my_package folder
        * In the my_package folder, the utils.py file is where you define the function xyz(argument)
        * When you are writing your code in the my_script.py, you do:
            * import my_package.utils as mp
            * mp.utils.xyz(argument)
        * To make life easier for the user, we can import this class in `__init__`.py using relative import syntax:
            * from .utils import xyz(argument)
        * Reference for package naming: https://peps.python.org/pep-0008/#package-and-module-name
        * Sample file structure without inheritance:  
        <img src="sources/datacamp/general_python_scripts/package_structure.png" alt="package structure" width="400px">
        * Sample file structure with inheritance:  
        <img src="sources/datacamp/general_python_scripts/package_structure_with_inherit.png" alt="inheritance" width="400px">

In [5]:
# Using help() on an integer
# It provides help on python's integer class
help(42)


Help on int object:

class int(object)
 |  int([x]) -> integer
 |  int(x, base=10) -> integer
 |  
 |  Convert a number or string to an integer, or return 0 if no arguments
 |  are given.  If x is a number, return x.__int__().  For floating point
 |  numbers, this truncates towards zero.
 |  
 |  If x is not a number or if base is given, then x must be a string,
 |  bytes, or bytearray instance representing an integer literal in the
 |  given base.  The literal can be preceded by '+' or '-' and be surrounded
 |  by whitespace.  The base defaults to 10.  Valid bases are 0 and 2-36.
 |  Base 0 means to interpret the base from the string as an integer literal.
 |  >>> int('0b100', base=0)
 |  4
 |  
 |  Built-in subclasses:
 |      bool
 |  
 |  Methods defined here:
 |  
 |  __abs__(self, /)
 |      abs(self)
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __and__(self, value, /)
 |      Return self&value.
 |  
 |  __bool__(self, /)
 |      True if self else False
 |

In [4]:
# Using help() on a method of a function
from collections import Counter

# It provides help on python's most_common() method from the Counter function in the collections package
help(Counter.most_common)

Help on function most_common in module collections:

most_common(self, n=None)
    List the n most common elements and their counts from the most
    common to the least.  If n is None, then list all element counts.
    
    >>> Counter('abracadabra').most_common(3)
    [('a', 5), ('b', 2), ('r', 2)]



In [None]:
# Writing your first package: my_package
    # we create a my_package folder inside our working directory
    # each package folder must have a __init__.py file
# In order to make your package installable by pip you need to create a setup.py
    # work_dir/
    # ├── my_package
    # │    ├── __init__.py
    # │    └── utils.py
    # └── setup.py
# Keep in mind your package is located in a directory named work_dir = text_analyzer
    # text_analyzer/
    # ├── my_package
    # │    ├── __init__.py
    # │    └── utils.py
    # └── setup.py

# Import needed function from setuptools
from setuptools import setup

# Create proper setup to be used by pip
setup(name='text_analyzer',
      version='0.0.1',
      description='Perform and visualize a text anaylsis.',
      author='Caio',
      packages=['text_analyzer']) #package directory


In [1]:
# Installing your requirements.txt

# To be able to share your newly created package, you need to create two things: setup.py and requirements.txt, both under work_dir level
    # setup.py: tells pip how to install the package (and it will be used by PyPI if you decide to publish)
        # the packages tag: lists the location of the __init__.py files in the package.
            # recall that we can use "relative import" by doing: from .utils import xyz(argument)
        # in our case, we have just one __init.py__ in the my_package folder
        # the utils.py is where you write the functions (or the "functionalities") that are part of the package

    # work_dir/
    # ├── my_package
    # │    ├── __init__.py
    # │    └── utils.py
    # ├── requirements.txt
    # └── setup.py

# Given that you are running a shell session in the work_dir structure, this is the command to run the requirements.txt file
    # pip install -r requirements.txt

In [10]:
# Writing a Class for the package
# PEP8 Conventions:
    # Camel case for Class names
    # for the variable that refers to the future instance of the class: write 'self' instead of other names

# Leaning points:
    # Self and __init__: step by step idea 
        # You create a Class Person, initially with two methods: '__init__' and 'print_test'
        # Within _init_ you define two three arguments: self, name, age.
        # self is called: the variable that refers to a future instance of the class
        # (name, age) are called "instance variables" because they store attributes that will be assigned to an instance of the class.
        # You create an instance of this class when you assign it to a variable in your future code: person1 = Person("Caio", 36).
        # 'self' is responsible for attaching the instance variables (name, age) to the instance of the class (person1), as well as its values ("Caio", 36).
        # 'self' does this by the _init_ method, defined within the Class definition.
        # you can access and work with the instance variables (name, age) by using them within the instance methods (print_test)
    # Accessing the Class:
        # You add functionalities to the utils.py, in this case, we will call the file parent_class_utils.py because later we will create a child_class_utils.py 
        # You do:
            # change (cd) to the __init__.py file's directory
            # `from .parent_class_utils import Person`, which would correctly import this class in __init__.py using relative import syntax
            # This will let it be easily accessible by your users.
            # in your my_script.py you would just start with: import my_package
    # Other methods within a Class:
        # If a Class has only __init__ (self, etc) than this class is only a "container" for the attributes that will be assigned to instances.
        # You can add pre-built methods to your Class, but you need to do the `from module_xyz import object_xyz `
        # You can add other functionality to classes using non-public methods. By defining methods as non-public you're signifying to the user that the method is only to be used inside the package

# 1) Define Class
# 2) Once the class is written you will modify your package's __init__.py file to make it easily accessible by your users

# working_dir
# ├── my_package
# │    ├── __init__.py
# │    ├── utils.py 
# │    ├── parent_class_utils.py
# ├── requirements.txt
# └── setup.py
# └── my_script.py


In [9]:
class Person:
    """A class to represent a person."""

    def __init__(self, name, age):
        """Initialize a Person object with a name and age.

        Args:
            name (str): The name of the person.
            age (int): The age of the person.
        """
        self.name = name  # Assign the name attribute of the instance
        self.age = age    # Assign the age attribute of the instance

    def introduce(self):
        """Introduce the person.

        Returns:
            str: A string containing the person's name and age.
        """
        return f"My name is {self.name} and I am {self.age} years old."

# Instantiate the Person Class
person1 = Person("Caio", 36)
print("This is the person1 object: ", person1)
print("This is the person1 name and age: ", person1.name, ",", person1.age)



This is the person1 object:  <__main__.Person object at 0x7fa9fc5c4bd0>
This is the person1 name and age:  Caio , 36


In [2]:
# Package Structure with Class Inheritance (Parent class -> Child class)

# Instead of copy-pasting the already written functionality, you will use the principles of 'DRY' and inheritance to quickly create your new Child class.

# Learning Points
    # Create another other_utils2.py to add a Child class code.
    # `from .parent_class_utils import ParentClass`, which would correctly import this parent class in __init__.py using relative import syntax
    # Person is the "ParentClass" and it will be an argument for the ChildClass defined within child_class_utils.py
    # self now has all the methods and attributes that an instance of a ParentClass would
    # Next, you use self as you would normally and build additional functionality unique to ChildClass
    # Then, you instantiate the child class and can access both parent and child class functionalities

# working_dir
# ├── my_package
# │    ├── __init__.py
# │    ├── utils.py 
# │    ├── parent_class_utils.py
# │    ├── child_class_utils.py
# ├── requirements.txt
# └── setup.py
# └── my_script.py


In [None]:
#Import ParentClass object
from .parent_class import ParentClass

# Create a child class with inheritanceclass
ChildClass(ParentClass): # The child class inherits from the parent class
    def __init__(self):
    # Call parent's __init__ method        
    ParentClass.__init__(self) # Here we build an instance of ParentClass and store it into self, i.e., it initializes a ParentClass
    # Add attribute unique to child class
    self.child_attribute = "I'm a child class attribute!"

# Create a ChildClass instance
child_class = ChildClass()
print(child_class.child_attribute)
print(child_class.parent_attribute)

In [None]:
# Example 1/3
    # Class Document - Parent Class
    # A class for text analysis (tokenization and word count)

# working_dir
# ├── text_analyzer
# │    ├── __init__.py
# │    ├── counter_utils.py
# │    ├── document.py
# └── my_script.py

# Procedures:
    # We create a text_analyzer package that provides functions and classes to analyze text
    # We create a document.py that hold Document class (parent) which initially works as a container for a text attribute
    # Suppose we want to tokenize our document, i.e., you break the document into individual words (tokens) that forms a list.
    # We put the tokenization procedure within the __init__, so that the Document instance is created and already tokenized.
    # The tokenize method comes as _tokenize ("_" because this method is only used internally in the Docuemnt class).
    # Since it is already written in the "token_utils.py", then we only define it and call it with the self.text attribute as argument.
    # This will store the text as token in the _init_ method
    # The same happens in the _count_words method, you define it and call it with the self.tokens, with the essence of Counter().

# Document class (Parent Class) created within document.py
# The Document class will perform text analysis in my_package
class Document:
    """A class for text analysis
    
    :param text: string of text to be analyzed
    :ivar text: string of text to be analyzed; set by `text` parameter
    """
    # Initialize a new Document instance
    def __init__(self, text):
        # Store text parameter to the text attribute
        self.text = text
        # Pre tokenize the document with non-public tokenize method
        self.tokens = self._tokenize()
        # Pre tokenize the document with non-public count_words
        self.word_counts = self._count_words()

    def _tokenize(self):
        return tokenize(self.text)

    # Non-public method to tally document's word counts
    def _count_words(self):
        # Use collections.Counter to count the document's tokens
        return Counter(self.tokens)

# Import your custom text_analyzer package
import text_analyzer

# Create an instance of Document with datacamp_tweet. Now, your document variable has a class "document" with am attribute "text" with a value given in datacamp_tweet
datacamp_doc = text_analyzer.Document(datacamp_tweets)

# Use dir() to show all of datacamp_doc's methods and attributes
dir(datacamp_doc)

# Run help on my_doc
help(my_doc)

# Run help on my_doc's plot method
help(my_doc.plot_counts)

# print the first 5 tokens from datacamp_doc
print(datacamp_doc.tokens[:5])

# print the top 5 most used words in datacamp_doc
print(datacamp_doc.word_counts.most_common(5))



In [None]:
# Example 2/3
    # Class SocialMedia - Child Class
    # A child class expanding text analysis (from tokenization and word count to conting hashtags and mentions)


# working_dir
# ├── text_analyzer
# │    ├── __init__.py
# │    ├── counter_utils.py
# │    ├── document.py
# │    ├── tweet.py
# └── my_script.py

# Procedures:

# SocialMedia class (Child Class) created within tweet.py
# Define a SocialMedia class that is a child of the `Document class`. 
# Its extends Document tokenization and word counting to also conting hashtag and mentions.
class SocialMedia(Document):
    def __init__(self, text):
        Document.__init__(self)
        self.hashtag_counts = self._count_hashtags()
        self.mention_counts = self._count_mentions()
        
    def _count_hashtags(self):
        # Filter attribute so only words starting with '#' remain
        return filter_word_counts(self.word_counts, first_char='#')      
    
    def _count_mentions(self):
        # Filter attribute so only words starting with '@' remain
        return filter_word_counts(self.word_counts, first_char='@')

# Import custom text_analyzer package
import text_analyzer

# Create a SocialMedia instance with datacamp_tweets
dc_tweets = text_analyzer.SocialMedia(text=datacamp_tweets)

# Print the top five most most mentioned users
print(dc_tweets.mention_counts.most_common(5))

# Plot the most used hashtags
text_analyzer.plot_counter(dc_tweets.mention_counts)

In [1]:
# Example 3/3
    # Class SocialMedia - Grandchild Class
    # A child class expanding text analysis (from tokenization and word count + conting hashtags and mentions, to retweets)
    # Here we will use Multilevel Inheritance and the super() function

# Import custom text_analyzer package
import text_analyzer

# Define a Tweet class that inherits from SocialMedia
class Tweets(SocialMedia):
    def __init__(self, text):
        # Call parent's __init__ with super()
        super().__init__(text)
        # Define retweets attribute with non-public method
        self.retweets = self._process_retweets()

    def _process_retweets(self):
        # Filter tweet text to only include retweets
        retweet_text = filter_lines(self.text, first_chars='RT')
        # Return retweet_text as a SocialMedia object
        return SocialMedia(retweet_text)

# Create instance of Tweets
my_tweets = text_analyzer.Tweets(datacamp_tweets)

# Plot the most used hashtags in the retweets
my_tweets.retweets.plot_counts('hashtag_counts')



### Summary of Folder structure, Packages, Functions, Classes, Inheritance

* Folder structure and Packages

In [None]:
# 1) We build a folder structure for our project.
        # The top level is the project working directory: working_dir
        # Within the working_dir, we have 2 files:
            # my_script.py: where you build your project code main code
            # package_name: where you build the functionalities to be used when importing the package -> call it text_analyzer
            # requirements.txt: where you specify the environment needed to properly use your package (python packages with versions)
                # to install everything within this file, you do: pip install -r requirements.txt
                # note that in this case we are only creating the environment to properly use our package (we are not installing our package)
            # setup.py: tells pip how to install our package
                # in our case, it contains a single call to an specific function: from setuptools import setup;
                # Example of the setup():
                    # setup(name='my_package', version='0.0.1', description='An example package', author='Caio', 
                    #       packages=['my_package'], 
                    #       install_requires=['matplotlib','numpy==1.15.4','pycodestyle>=2.4.0'])
                # Now, we cd to "working_dir" and we can install our package with: pip install .
        # working_dir
        # ├── setup.py
        # ├── requirements.txt
        # ├── text_analyzer
        # │    ├── __init__.py
        # │    ├── counter_utils.py
        # │    ├── document.py
        # │    ├── social_media.py
        # │    ├── tweet.py
        # └── my_script.py
    # 2) We create a package with minimum content
        # PEP8 naming for the package: "package_name".
        # We add a __init__.py file (let's python know that the directory is a package).
        # Now, this package can be imported just like any other package: import text_analyzer
        # You can also use the help to check it: help(text_analyzer)
    # 3) We expand the packages functionalities
        # Within text_analyzer, we added a utils.py file, i.e., a submodule (not a subpackage!) -> call it counter_utils.py
            # counter_utils.py: we add a function "some_function()" to perform some analysis
            # We can import it in my_scrip.py: import package_name.utils
            # We can use the function: package_name.utils.some_function()
        # We can use relative import to be easier for the user:
            # Within the __init__.py file, we add: 
                # from .utils import some_function
            # We can import it in my_script.py only by doing this: import package_name
            # We can use the function without 'utils': package_name.some_function()
        # We can add many "utils.py" files, just depend on how we one to organize different functionalities for this package


* Creating Functions

In [None]:
    # 4) We create functions within utils.py using a call from another package to make it easier
    # Example: 3 functions

        # First, we create a counter_utils.py function that will serve as a place for counting procedures with the Counter Class

        # Recall
            # working_dir
            # ├── setup.py
            # ├── requirements.txt
            # ├── text_analyzer
            # │    ├── __init__.py
            # │    ├── counter_utils.py (THIS ONE)
            # │    ├── document.py
            # │    ├── social_media.py
            # │    ├── tweet.py
            # └── my_script.py

        # Import needed functionality
        from collections import Counter
            # Counter is a Class. More specifically, a subclass of a dict.
            # It instantiates a Counter object and ingerits dictionaries functionalitis.
            # Thus, elements are stored as dictionary keys and their counts values. You access it as with a dict.

        def plot_counter(counter, n_most_common=5):
            # receives a counter object
            # Subset the n_most_common items from the input counter
            top_items = counter.most_common(n_most_common)
                # most_common(n): a method from the Counter calss. 
                # It returns a list of the n most common elements and their counts as a tuple. It has to be applied to a counter instance.
            # Plot `top_items`
            plot_counter_most_common(top_items)

        def sum_counters(counters):
            # Sum the inputted counters
            return sum(counters, Counter()) 
            # Python doesn't know how to add a list of Counters directly, so by providing an empty Counter() as the starting point,
            # we're essentially telling Python, "Start with an empty Counter and add each Counter in the list to it."
            # While you might typically see sum() used with numbers, it can be used with other types of objects as well, as long as they support addition with each other. 
            # In this case, Counter() objects support addition with other Counter() objects, which is why it works in summing up Counter instances
    
    

* Start building the my_script.py

In [None]:
# 5) We start building my_script.py
        # word_count is a list
            # it's elements are a Counter:
                # Counter({'DataCamp': 1,
                #          'Introduction': 1,
                #          'to': 1,
                #          'H': 2,
                #          'O': 2,
                #          ...
                #          'its': 1,
                #          'auto': 1})

        # Import local package
        import text_analyzer

        # Sum word_counts using sum_counters from text_analyzer ()
        word_count_totals = text_analyzer.sum_counters(word_counts)
            # it returns a Counter() object from all words in all Counter() objects within the word_counts list: word_count_totals
            # it has one element, a Counter() object:
                # Counter({'DataCamp': 24,
                #          'Introduction': 27,
                #          'to': 263,
                #          'H': 2,
                #           ...})
        # Plot word_count_totals using plot_counter from text_analyzer
        text_analyzer.plot_counter(word_count_totals)

* Creating Parent Classes

In [None]:
# First, let's create a function for separating words into a list and saving it under tokenize_utils.py
   
# Recall
    # working_dir
    # ├── setup.py
    # ├── requirements.txt
    # ├── text_analyzer
    # │    ├── __init__.py
    # │    ├── counter_utils.py
    # │    ├── tokenize_utils.py (THIS ONE)
    # │    ├── document.py
    # │    ├── social_media.py
    # │    ├── tweet.py
    # └── my_script.py

# Complete the function's docstring
def tokenize(text, regex=r'[a-zA-z]+'):
  """Split text into tokens using a regular expression

  :param text: text to be tokenized
  :param regex: regular expression used to match tokens using re.findall 
  :return: a list of resulting tokens

  >>> tokenize('the rain in spain')
  ['the', 'rain', 'in', 'spain']
  """
  return re.findall(regex, text, flags=re.IGNORECASE)

# Print the docstring
help(tokenize)

In [None]:
# 6) Creating Parent Classes with non-public functionalities

# Recall
        # working_dir
        # ├── setup.py
        # ├── requirements.txt
        # ├── text_analyzer
        # │    ├── __init__.py
        # │    ├── counter_utils.py
        # │    ├── document.py (THIS IS WHERE THE DOCUMENT CLASS IS DEFINED)
        # │    ├── social_media.py
        # │    ├── tweet.py
        # └── my_script.py

# Suppose we want to use a function (calling it within a Class) from another python package within a script called token_utils.py
# We use relative import to be easier for the user, and within the __init__.py file, we add: from .token_utils import tokenize 
# With this, you can call this function directly without the need to do "my_package.function()" 

# Import function to perform tokenization 
from .tokenize_utils import tokenize
from collections import Counter

class Document:
    """A class for text analysis
    
    :param text: string of text to be analyzed
    :ivar text: string of text to be analyzed; set by `text` parameter
    """
    # Initialize a new Document instance
    def __init__(self, text):
        # Store text parameter to the text attribute
        self.text = text
        # Pre tokenize the document with non-public tokenize method
        self.tokens = self._tokenize()
        # Pre tokenize the document with non-public count_words
        self.word_counts = self._count_words()

    def _tokenize(self): # Use PEP8 naming with "_function
        return tokenize(self.text) 

    # Non-public method to tally document's word counts
    def _count_words(self): # Use PEP8 naming with "_function
        # Use collections.Counter to count the document's tokens. Returns a Counter object
        return Counter(self.tokens)
    
# Here you could already go to myscript.py and do this example:
    # datacamp_tweets is a string
        # Example: datacamp_tweets = '[DataCamp] Introduction to H2O AutoML --> In this tutorial, you will learn about...'
    # Import custom text_analyzer package
    import text_analyzer
    # create a new document instance from datacamp_tweets
    datacamp_doc = Document(datacamp_tweets)
        # this will create an datacamp_doc instance with 3 attributes: 
            # text via the normal __init__ self procedure for instantiation
            # tokens (via the _tokenize public method)
            # word_counts (via the _count_words public method)

    # print the first 5 tokens from datacamp_doc
    print(datacamp_doc.tokens[:5])

    # print the top 5 most used words in datacamp_doc. 
        # It uses the most_common method which is applied to word_counts, a Counter instance of the Counter class
    print(datacamp_doc.word_counts.most_common(5))



* Creating Child Classes (Inheritance)
    * Instead of copy-pasting the already written Parent Class functionality and improving its code to do more specific functionalities, we will instead use the principles of 'DRY' and inheritance to create a Child Class with only these specific functionalitis.

In [None]:
# 7) The SocialMedia class inherits from Document and expands its functionalities

# Recall
        # working_dir
        # ├── setup.py
        # ├── requirements.txt
        # ├── text_analyzer
        # │    ├── __init__.py
        # │    ├── counter_utils.py
        # │    ├── document.py 
        # │    ├── social_media.py (THIS IS WHERE THE SOCIAL MEDIA CLASS IS DEFINED)
        # │    ├── tweet.py
        # └── my_script.py

# We had a Document Class that is used to analyze text:
    # It serves as a container for a text (self.text)
    # It tokanizes the text, transforming into a list (self.tokens)
    # It counts the words of the list using the Counter class methods (self.word_counts)

# Now, we want to expand particularize this class to work with Social Media text. 
# We build a Child Class SocialMedia that inherits from its Parent Class (Document)
# We add functionalities to perfome social media text procedures

# Import Parent Class (document.py) object for use in defining the Child Class
from .document import Document

# Create a Child Class inheriting from its Parent Class
class SocialMedia(Document):
    
    # Initialize a new SocialMedia instance, inheriting Document Class functionalities
    def __init__(self, text): 
        # Here, we call the Parent Class __init__ method, which builds an instance of Parent and stores back into the SocialMedia self.
        # This means that self now also have all functionalities from Document 
        Document.__init__(self) 
        # Now, we use self normally to build the other attributes particular to the SocialMedia Class
            # Note that both attributes below will be an instance of a Counter Class because it uses the word_counts method, 
            # which uses collections.Counter to count the document's tokens and returns a Counter object
        self.hashtag_counts = self._count_hashtags()
        self.mention_counts = self._count_mentions()
        
    def _count_hashtags(self):
        # Filter attribute so only words starting with '#' remain
            # It uses the word_counts attribute from the Document class, which is a Counter instance of the Counter class, i.e., words and its counts in a dict.
            # It also uses filter_word_counts(), a particular function created to filter words by their first characters.
        return filter_word_counts(self.word_counts, first_char='#')      
    
    def _count_mentions(self):
        # Filter attribute so only words starting with '@' remain
            # It uses the word_counts attribute from the Document class, which is a Counter instance of the Counter class, i.e., words and its counts in a dict.
            # It also uses filter_word_counts(), a particular function created to filter words by their first characters.
        return filter_word_counts(self.word_counts, first_char='@')
    
# Here you could already go to myscript.py and do this example:
    # datacamp_tweets is a string
        # Example: datacamp_tweets = '[DataCamp] Introduction to H2O AutoML --> In this tutorial, you will learn about...'
    # Import custom text_analyzer package
    import text_analyzer

    # Create a SocialMedia instance with datacamp_tweets
    dc_tweets = text_analyzer.SocialMedia(text=datacamp_tweets)

    # Print the top five most most mentioned users
    print(dc_tweets.mention_counts.most_common(5))

    # Plot the most used hashtags
    text_analyzer.plot_counter(dc_tweets.mention_counts)

* Creating GrandChild Classes (Multilevel Inheritance)


In [None]:
# 8) The Tweets class inherits from SocialMedia and expands its functionalities

# Recall
        # working_dir
        # ├── setup.py
        # ├── requirements.txt
        # ├── text_analyzer
        # │    ├── __init__.py
        # │    ├── counter_utils.py
        # │    ├── document.py 
        # │    ├── social_media.py 
        # │    ├── tweet.py (THIS IS WHERE THE TWEET CLASS IS DEFINED)
        # └── my_script.py

# Here we change the essence of the inheritance procedure
# We will use the super() function with the __init__ to instantiate a Parent or Child class within the respective Child or GrandChild Class

class Tweets(SocialMedia):
    def __init__(self, text):
        # Here, we call parent's __init__ with super(), which builds an instance of Document and stores back into the SocialMedia self.
        # However, since we only provide 'text', it only inherits the text functionality from Document
        super().__init__(text)
        # Define retweets attribute with non-public method
        self.retweets = self._process_retweets()

    def _process_retweets(self):
        # Filter tweet text to only include retweets
        retweet_text = filter_lines(self.text, first_chars='RT')
        # Return retweet_text as a SocialMedia object
        return SocialMedia(retweet_text)

# Here you could already go to myscript.py and do this example:
    # datacamp_tweets is a string
        # Example: datacamp_tweets = '[DataCamp] Introduction to H2O AutoML --> In this tutorial, you will learn about...'
    # Import custom text_analyzer package
    import text_analyzer

    # Create instance of Tweets
    my_tweets = text_analyzer.Tweets(datacamp_tweets)

    # Plot the most used hashtags in the tweets
    my_tweets.plot_counts('hashtag_counts')


## Documentation

* Comments
    * Are used inline
    * Cannot be seen by end users unless they enter the code
    * Focus on "Why the code is doing what it is doing" (and not "What the code is doing")

* Docstrings
    * Documentation for end users
    * It is output when a user calls help() on functions and classes

* Redability
    * Self-documenting code ("def is_even" is better than "def check_even")
    * If the code doesn't fit the screen, maybe it needs to be refactored (broken down into modules)

* For tests
    * Sphinx (generates HTML for the documentation)
    * Code Climate (can help point out if your code isn't modular)
    * Travis CI (can help test your code when new code is added)

In [13]:
# Suppose we build a square function:

def square(x):
    """Square the number x    

    :param x: number to square    
    :return: x squared    

    >>> square(2)
    4    
    """
    # `x * x` is faster than `x ** 2`
    # reference: https://stackoverflow.com/a/29055266/5731525return x * x

# Calling the help() on this function
help(square)

Help on function square in module __main__:

square(x)
    Square the number x    
    
    :param x: number to square    
    :return: x squared    
    
    >>> square(2)
    4



In [None]:
# Example of docstring documentation for the SocialMedia Class
from text_analyzer import Document

class SocialMedia(Document):
    """Analyze text data from social media
    
    :param text: social media text to analyze

    :ivar hashtag_counts: Counter object containing counts of hashtags used in text
    :ivar mention_counts: Counter object containing counts of @mentions used in text
    """
    def __init__(self, text):
        Document.__init__(self, text)
        self.hashtag_counts = self._count_hashtags()
        self.mention_counts = self._count_mentions()

* Testing
    * A test folder will be included in the folder structure
    * Easy python options for testing
        * doctest
        * pytest
    * Documentation for tests
        * Sphinx (generates HTML for the documentation)
        * CI Testing tools:
            * Travis CI (Continuous Integration)
                * When you add new test to your code (because, normally, we all do it continuously), than Travis will automatically test for you.
            * Schedule tools
                * By scheduling a build, your tests can be run periodically even without adding new code (maybe a dependency was changed and you didn't know, so a error would be raised even when the main code was not updated)

In [15]:
# Doctest
import doctest

# Run the testmod function from doctest to test your function's example code
doctest.testmod()

**********************************************************************
File "__main__", line 9, in __main__.square
Failed example:
    square(2)
Expected:
    4    
Got nothing
**********************************************************************
File "__main__", line 24, in __main__.tokenize
Failed example:
    tokenize('the rain in spain')
Exception raised:
    Traceback (most recent call last):
      File "/usr/local/lib/python3.11/doctest.py", line 1355, in __run
        exec(compile(example.source, filename, "single",
      File "<doctest __main__.tokenize[0]>", line 1, in <module>
        tokenize('the rain in spain')
      File "/tmp/ipykernel_441/2143170907.py", line 27, in tokenize
        return re.findall(regex, text, flags=re.IGNORECASE)
               ^^
    NameError: name 're' is not defined
**********************************************************************
2 items had failures:
   1 of   1 in __main__.square
   1 of   1 in __main__.tokenize
***Test Failed*** 2 fail

TestResults(failed=2, attempted=2)

In [None]:
# Pytest
    # Pytest looks for files that start or end with "test"

# Recall
        # working_dir
        # ├── setup.py
        # ├── requirements.txt
        # ├── text_analyzer
        # │    ├── __init__.py
        # │    ├── counter_utils.py
        # │    ├── document.py 
        # │    ├── social_media.py 
        # │    └── tweet.py (THIS IS WHERE THE TWEET CLASS IS DEFINED)
        # ├── tests
        # │    └── test_document.py
        # └── my_script.py

# This will be written in the test_document.py
from text_analyzer import Document

# Example 1: Testing the Document class token attribute
def test_document_tokens():    
    doc = Document('a e i o u')
    
    assert doc.tokens == ['a', 'e', 'i', 'o', 'u']

# Example 2: Testing edge case of a blank document
def test_document_empty():    
    doc = Document('')
    
    assert doc.tokens == []
    assert doc.word_counts == Counter()

# Example 3: Comparing instances of the same Class
# Create 2 identical Document objects
doc_a = Document('a e i o u')
doc_b = Document('a e i o u')
# Check if objects are ==
print(doc_a == doc_b) # -> FALSE

# Check if attributes are ==
print(doc_a.tokens == doc_b.tokens)  # -> TRUE
print(doc_a.word_counts == doc_b.word_counts)  # -> TRUE

In [None]:
# Example of Pytest

from collections import Counter
from text_analyzer import SocialMedia

# Create an instance of SocialMedia for testing
test_post = 'learning #python & #rstats is awesome! thanks @datacamp!'
sm_post = SocialMedia(test_post)

# Test hashtag counts are created properly
def test_social_media_hashtags():
    expected_hashtag_counts = Counter({'#python': 1, '#rstats': 1})
    assert sm_post.hashtag_counts == expected_hashtag_counts