## Using Libraries


### Lesson Overview

In this lesson, we will:

- Explore community-driven libraries using pip
- Implement the strategy object design pattern to ensure scalable code
- Learn common libraries you’ll use in data science roles


### pip – The Package Installer for Python

![class3.png](attachment:class3.png)

As we mentioned earlier, pip is the main tool used to install Python packages.

Although many developers refer to pip as "the package installer for Python", an entertaining bit of trivia is that Ian Bicking, the developer who introduced pip, has stated that the name actually stands for "pip installs packages", which makes it an example of a recursive acronym.

### Searching PyPI for Packages

Many third-party packages exist, covering a wide variety of use cases. You can discover new packages on the Python Package Index (PyPI, https://pypi.org), where projects are published and distributed, typically under open-source licenses.

Here are a few popular ones:

- Importing files (e.g. pandas): https://pandas.pydata.org/
- Using frameworks (e.g. Flask): https://flask.palletsprojects.com/en/1.1.x/
- Advanced math (e.g. NumPy): https://numpy.org/
- Advanced machine learning (e.g. scikit-learn): https://scikit-learn.org/stable/install.html
- Security and cryptography (e.g. PyJWT): https://pyjwt.readthedocs.io/en/latest/
- Third-party APIs (e.g. AWS Boto3): https://aws.amazon.com/sdk-for-python/


### Using pip to Install via the Command Line

Try it!
Before trying this exercise, reset the workspace.

First, let's experiment with the Python interpreter's runtime environment by running commands directly in the terminal window. We can start the interpreter in interactive mode by running the python3 command on its own (without specifying a file):


python3
This will launch the Python runtime. Now we can attempt to import numpy by entering:


import numpy
Since numpy is not installed yet, this command fails with an ImportError (shown as ModuleNotFoundError on Python 3.6 and later). We can exit the Python interpreter by entering:


exit()
Now, let's use pip to install numpy. From the terminal window (but not within the Python Interpreter), run the following command:


pip install numpy
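Once a package is installed, a script can check that it is importable before relying on it. A minimal sketch using importlib.util.find_spec from the standard library; is_installed is a hypothetical helper name, and it is meant for top-level module names:

```python
import importlib.util

def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found by the current interpreter."""
    # find_spec returns None (instead of raising) when a top-level module is missing
    return importlib.util.find_spec(module_name) is not None

if __name__ == "__main__":
    for name in ("json", "numpy"):
        print(name, "installed:", is_installed(name))
```

Keep in mind that the name you pass to pip and the name you import can differ: for example, you install Pillow but import PIL.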


## Importing Installed Dependencies

Try it!
Once again, we can launch the Python interpreter runtime with:

python3
Now we can again attempt the import of numpy. This time, we won't encounter an exception:

import numpy
Now numpy is available in this session, and we can use its functions for numerical computing in Python.

Note that you'll often see numpy imported using aliasing, like this:


import numpy as np
Practice

- Try using pip to install additional libraries:
  - scipy – useful for mathematics, science, and engineering.
  - pandas – fast, easy-to-use data structures for tabular data.
- Try using the pip uninstall command to remove libraries!


### Virtual Environments
As we gain more experience developing many Python applications that rely on 3rd party dependencies, we'll quickly realize that not every project requires the same dependencies. Instead, we'll have specific needs for each project. We'll need a way of managing which dependencies are required for a given project.

Ideally, we'd even like to ensure we know what version of a dependency is required in case the method signatures in that library change as that library is maintained. And it would be better still if we could somehow install different versions of dependencies for each project so we can have different states for different projects.
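To see which version of a dependency is actually installed, Python 3.8 and later (newer than the Python 3.5 used in this lesson's workspace) provide importlib.metadata. A minimal sketch; installed_version is a hypothetical helper name, and it takes the distribution name you would pass to pip, which can differ from the import name:

```python
from importlib import metadata  # requires Python 3.8+
from typing import Optional

def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None

if __name__ == "__main__":
    print("pip:", installed_version("pip"))
    print("numpy:", installed_version("numpy"))
```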

We can accomplish all of these goals with Python Virtual Environments!

#### Understanding Imports and Managing Dependencies
As we work on more projects, we'll find that each one may require different packages at different versions. So far, we've been installing packages into the system-wide Python runtime using pip install. A virtual environment lets us organize each project's dependencies and manage its state separately.

## Additional Resources

https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/

https://realpython.com/python-virtual-environments-a-primer/

https://www.geeksforgeeks.org/creating-python-virtual-environment-windows-linux/



### Which Python and Creating Virtual Environments

Try it!
In the terminal to the right, check which Python executable the python command points to:


which python
On this machine, we can repeat this check for python, python2, python3, and python3.5. To see the version number itself, run a command such as python3 --version.

Next, let's create a virtual environment running specifically on Python 3.5. First, we need to install the python3-venv package on this Debian/Ubuntu-based Linux machine (you may need to prefix the command with sudo):


apt-get install python3-venv -y
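Then we can use the venv module to create the virtual environment. To be explicit about which Python version the environment uses, we run the venv module with that specific interpreter: python3.5 selects the version, the -m flag tells it to run the venv module, and the final argument (venv) is the name of the directory to create: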


python3.5 -m venv venv
Running the ls command will show a new directory with our virtual environment runtime. Next, we'll learn how we can use this virtual environment.
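If you ever need to create an environment from a Python script rather than the shell, the standard library's venv module exposes venv.create(). A minimal sketch; the helper name make_env and the temporary directory are illustrative. It passes with_pip=False so it does not depend on ensurepip, which also means the new environment has no pip installed:

```python
import os
import tempfile
import venv

def make_env(env_dir: str) -> str:
    """Create a virtual environment in env_dir and return the path to its pyvenv.cfg."""
    # with_pip=False skips bootstrapping pip, so this works even where ensurepip is absent
    venv.create(env_dir, with_pip=False)
    return os.path.join(env_dir, "pyvenv.cfg")

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = make_env(os.path.join(tmp, "venv"))
        with open(cfg) as f:
            print(f.read())  # records the base interpreter ("home = ...") and settings
```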

Note: This exercise is for Linux. Creating and activating a virtual environment differs slightly depending on your operating system. For detailed instructions for Windows, Mac, and Linux, check out the venv docs.

### Activating the Virtual Environment

Try it!
In a previous step, we created a new virtual environment directory using the venv module. We can now use the Bash source command to run the environment's activate script, which puts venv/bin at the front of your PATH so that python and pip point to the virtual environment's copies. To activate it, run:


source venv/bin/activate
Now if we run the which python command, we can observe that it points to the virtual environment directory! You can now install dependencies using pip as usual, but they will instead be installed to your virtual environment directory:


pip install numpy
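You can also confirm from inside Python which environment is active. A minimal sketch, assuming a venv-created environment: there, sys.prefix points at the environment directory while sys.base_prefix keeps pointing at the base installation (older virtualenv releases set sys.real_prefix instead, which this check does not cover). The helper name in_virtualenv is hypothetical:

```python
import sys

def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv-created environment."""
    # Inside a venv, sys.prefix is the environment directory,
    # while sys.base_prefix is the Python installation it was created from.
    return sys.prefix != sys.base_prefix

if __name__ == "__main__":
    print("executable:", sys.executable)
    print("in virtual environment:", in_virtualenv())
```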
To exit the virtual environment and go back to your system's default Python runtime, run:


deactivate
Note: As with the previous exercises, this exercise is for Linux. Activating a virtual environment differs slightly depending on your operating system. For detailed instructions for Windows, Mac, and Linux, check out the venv docs.


### Importing from the Virtual Environment

Let's run through all of the main steps Gabe covered in the video.

First, make sure your virtual environment is active by entering:


which python3
You should get a path that includes the venv directory:


/home/workspace/venv/bin/python3
If you like, you can launch Python, import numpy, and see where the numpy module is being imported from:


python3
>>> import numpy
>>> numpy
<module 'numpy' from '/home/workspace/venv/lib/python3.5/site-packages/numpy/__init__.py'>
You should see that this numpy installation is now pointing to the virtual environment's site-packages directory!

If you exit() Python and then run pip freeze, this will show you what libraries are installed within your virtual environment. Your results should include only a few items, like this:


numpy==1.17.4
pkg-resources==0.0.0
pyspark==2.4.3
You can then save these to a file by entering:


pip freeze > requirements.txt
This won’t generate any output in the terminal, but if you look in the workspace directory (or enter ls in the terminal) you’ll now see that a requirements.txt file has been created. (By the way, we could name this file something else if we had a good reason to, but requirements.txt is the standard name developers use.) We can view the contents by entering cat requirements.txt.
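To see what a requirements.txt produced by pip freeze encodes, here is a small sketch that reads exact name==version pins into a dictionary. It is a hypothetical helper, not pip's own parser: it handles only the simple pins that pip freeze typically writes, and does not understand options such as -r or -e, version ranges, or environment markers.

```python
from typing import Dict

def parse_pinned_requirements(text: str) -> Dict[str, str]:
    """Map package names to pinned versions for lines of the form name==version."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if "==" not in line:
            continue  # skip blank lines and anything that isn't an exact pin
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins

if __name__ == "__main__":
    sample = "numpy==1.17.4\npkg-resources==0.0.0\npyspark==2.4.3\n"
    print(parse_pinned_requirements(sample))
```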

Now, deactivate the virtual environment and delete the venv directory:


deactivate
rm -rf venv
Then create a new virtual environment called venv2 and activate it:


python3.5 -m venv venv2
source venv2/bin/activate
And now we can install all the dependencies we saved by entering:


pip install -r requirements.txt
The -r simply tells pip to install from a requirements file (in contrast to, say, pip install numpy, which tells pip to go straight to installing numpy).

Hopefully you can see how powerful this workflow is—you can easily create an environment whenever you need to and reinstall all of the required dependencies, locked at whatever specific versions you need for that particular environment.

Practice:
You probably won't remember this workflow (and all the commands) from doing it only once, so we encourage you to spend a few more minutes practicing:

1. Try adding a few different dependencies to your virtual environment:
   - scipy – useful for mathematics, science, and engineering.
   - pandas – fast, easy-to-use data structures for tabular data.
2. Then run pip freeze. You'll notice the output may include sub-dependencies that these more complex libraries pull in.
3. Save the state of pip to a requirements.txt file.
4. Delete the virtual environment.
5. Create a new virtual environment following the earlier steps.
6. Reinstall using the requirements.txt file you generated in step 3, and use pip freeze to view the state. And of course, be sure to deactivate your virtual environment when you're done!

source venv2/bin/activate
pip install scipy
pip install pandas
pip freeze > requirements.txt
deactivate
rm -rf venv2/
python3.5 -m venv venv2
source venv2/bin/activate
pip install -r requirements.txt
pip freeze
deactivate

### Solving Problems with Libraries
#### Manipulating Images
Let's now build out a more complex script that uses an open source library. Our goal will be to transform an image: crop it, resize it, and add text. This process is similar to what you might encounter if you're building a multimedia application.

Here's an example image transformation for the pipeline you'll build:

![dog.png](attachment:dog.png)


#### Step 1: Study and Install the Library
Now that we have clearly defined our problem, we need to find a library to help us. Our goal is to find a single library that is capable of:

- Resizing and cropping an image
- Adding text onto the image
- Saving the resulting image as a jpg file

Take a moment to review the following libraries:

- NumPy
- Pillow
- spaCy

Which library will best solve our problem?

The solution is Pillow:

- NumPy offers powerful features for working with matrices. Images can be represented as a matrix of pixel values, so we can accomplish some simple tasks like resizing and cropping photos, but adding text will be a challenge.
- Pillow offers features for working with graphics and has interfaces for filtering, resizing, and drawing onto images (including drawing text!).
- spaCy is a powerful tool for working with text and written words. We're interested in working with images, so it won't be very helpful.

What is the command to install the selected library?


pip install Pillow
Try it!
Before continuing, run the install command in the terminal window so the library is available as we continue to build!
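To make the NumPy comparison above concrete: treating an image as a matrix of pixel values turns cropping into slicing rows and columns. A toy sketch using nested Python lists of grayscale values (the crop helper is hypothetical), following the same (left, upper, right, lower) box convention, with exclusive right and lower edges, that Pillow's Image.crop uses:

```python
from typing import List

Matrix = List[List[int]]  # rows of grayscale pixel values (0-255) in this toy example

def crop(pixels: Matrix, left: int, upper: int, right: int, lower: int) -> Matrix:
    """Crop a row-major pixel matrix using a (left, upper, right, lower) box."""
    # Rows are indexed by y (upper..lower), columns by x (left..right); right/lower are exclusive
    return [row[left:right] for row in pixels[upper:lower]]

if __name__ == "__main__":
    image = [[10 * y + x for x in range(4)] for y in range(3)]  # 4 pixels wide, 3 tall
    print(crop(image, 1, 0, 3, 2))  # -> [[1, 2], [11, 12]]
```

Drawing text has no such shortcut: the glyphs have to be rendered from a font first, which is what Pillow's ImageDraw and ImageFont modules handle for us.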


#### Step 2: Using the Library to Resize Images
After identifying a suitable library, let's break our development into a series of steps. We'll start by opening the input image and performing some relatively simple transformations to crop and resize it.

Using the Pillow documentation, see if you can find out how to use the library to resize and crop an image. Here are some links to point you in the right direction:

- Image class
- Image.open function
- Image.size attribute
- Image.resize instance method
- Image.crop instance method
- Image.save instance method

Try it!
Your tasks are to complete the skeleton code (top right) to achieve the following requirements:

1. Load an image into a Pillow Image object.
2. Crop the image at ./imgs/img.jpg to center on the dog's face.
3. Resize the cropped image to be two times the size, while preserving the aspect ratio (the proportion of the image height to its width).
4. Save the image to a new jpg file.




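from PIL import Image

def generate_postcard(in_path, out_path, crop=None, width=None):
    """Create a Postcard With a Text Greeting

    Arguments:
        in_path {str} -- the file location for the input image.
        out_path {str} -- the desired location for the output image.
        crop {tuple} -- the crop rectangle, as a (left, upper, right, lower)-tuple. Default=None.
        width {int} -- the output width in pixels; the height is scaled to keep the aspect ratio. Default=None.
    Returns:
        str -- the file path to the output image.
    """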
    img = Image.open(in_path)

    if crop is not None:
        img = img.crop(crop)

    if width is not None:
        ratio = width/float(img.size[0])
        height = int(ratio*float(img.size[1]))
        img = img.resize((width, height), Image.NEAREST)

    img.save(out_path)
    return out_path

if __name__=='__main__':
    print(generate_postcard('./imgs/img.jpg', 
                            './imgs/out.jpg',
                            (450, 900, 900, 1300),
                            200))
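The arithmetic inside generate_postcard can be pulled out and checked on its own. A minimal sketch with two hypothetical helpers: scaled_size mirrors the ratio calculation above, and centered_box builds a (left, upper, right, lower) box of a given size around a chosen center point, which is one way to derive a crop like (450, 900, 900, 1300). centered_box does not clamp the box to the image's bounds.

```python
from typing import Tuple

def scaled_size(size: Tuple[int, int], width: int) -> Tuple[int, int]:
    """Return (width, height) that scales `size` to `width` while keeping its aspect ratio."""
    original_width, original_height = size
    ratio = width / float(original_width)
    return width, int(ratio * original_height)  # int() truncates, as in generate_postcard

def centered_box(center: Tuple[int, int], box_width: int, box_height: int) -> Tuple[int, int, int, int]:
    """Return a (left, upper, right, lower) crop box of the given size centered on `center`."""
    cx, cy = center
    left = cx - box_width // 2
    upper = cy - box_height // 2
    return left, upper, left + box_width, upper + box_height

if __name__ == "__main__":
    print(centered_box((675, 1100), 450, 400))  # -> (450, 900, 900, 1300)
    print(scaled_size((450, 400), 900))         # doubling a 450 x 400 crop -> (900, 800)
```

For example, doubling the 450 × 400 crop used above would mean passing width=900.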

#### Step 3: Adding Text
Let's complete our simple application and add some complexity, using more advanced features of Pillow to add a message onto our image. We can use Pillow to draw text and geometric shapes onto images.

Once again, we should start by reviewing the documentation to understand how we can use the library for this goal. Here are some links to point you in the right direction:

- ImageDraw module
- ImageDraw.Draw.text method
- ImageFont module
- ImageFont.truetype function

Try it!
Your tasks are to continue to extend your code in img.py to:

- Add a message onto the cropped image.
- Style the message's typography by adjusting its fill and size. We've included LilitaOne-Regular.ttf in the ./fonts directory. You can add additional fonts from Google Fonts.

from PIL import Image, ImageDraw, ImageFont

def generate_postcard(in_path, out_path, message=None, crop=None, width=None):
    """Create a Postcard With a Text Greeting

    Arguments:
        in_path {str} -- the file location for the input image.
        out_path {str} -- the desired location for the output image.
        message {str} -- the text to draw onto the image. Default=None.
        crop {tuple} -- The crop rectangle, as a (left, upper, right, lower)-tuple. Default=None.
        width {int} -- The pixel width value. Default=None.
    Returns:
        str -- the file path to the output image.
    """
    img = Image.open(in_path)

    if crop is not None:
        img = img.crop(crop)

    if width is not None:
        ratio = width/float(img.size[0])
        height = int(ratio*float(img.size[1]))
        img = img.resize((width, height), Image.NEAREST)

    if message is not None:
        draw = ImageDraw.Draw(img)
        font = ImageFont.truetype('./fonts/LilitaOne-Regular.ttf', size=20)
        draw.text((10, 30), message, font=font, fill='white')

    img.save(out_path)
    return out_path

if __name__=='__main__':
    print(generate_postcard('./imgs/img.jpg', 
                            './imgs/out.jpg',
                            'woof!',
                            (450, 900, 900, 1300),
                            200))

### Strategy Object Design Pattern
#### Combining Our Skills: Encapsulating Third-Party Libraries
As our system grows and implements more complex code, we'll want a way to organize complex code into a simpler interface. One way of achieving this goal is to encapsulate a more complex class or library into a simple class with limited functionality. We'll call this class a strategy object because it represents one possible strategy for achieving an action.

The strategy object design pattern provides many benefits, including:

- Decreasing complexity for developers consuming the library who do not need its entire functionality. For example, an encapsulated class may be designed to provide only one simplified method for working with the relevant data.
- Easily replacing a library without needing to make changes throughout a large codebase. You can write a second strategy object and simply replace the old object with the new one. Since the new object exposes the same methods, no additional refactoring is required.
- Creating additional classes to solve similar problems, and dynamically selecting the most appropriate class without needing to write additional code. Similar to replacing an object, you can add additional objects to perform the same action in multiple ways. For example, you may need a different strategy object for different file types.

#### Static Methods
There may be times when you'd like to define a method signature that will be consistently implemented in multiple ways.

For example, suppose we're implementing a conversion between two units (e.g. inches to centimeters, or Fahrenheit to Celsius). In these cases, we don't need any information beyond the input value itself (no instance variables), but we would like a consistent signature so we can easily swap in a different strategy later (e.g., replace Fahrenheit to Celsius with Fahrenheit to Kelvin). In this scenario, it would be ideal to have an additional structure that ensures the interface is implemented consistently for each method. More concretely, we can define our method signature as convert(input) -> output, which is general enough to be shared across all of our conversion equations.

We could use an abstract class with an instance method, but this would require us to first create an object in memory. Instead, we can use a Python feature called a static method. A static method is defined on the class and can be called directly on it, without first instantiating an object.

A static method is similar to a class method in many ways, except it has no reference to the class or any of its default instance variables. Although static methods are not the most commonly used features in Python, they can be a helpful thing to have in your tool belt when the need arises.
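The three kinds of methods differ in what Python passes implicitly as the first argument. A minimal sketch with a hypothetical Thermometer class:

```python
class Thermometer:
    scale = "celsius"  # class attribute, shared by all instances

    def __init__(self, reading: float):
        self.reading = reading  # instance attribute

    def describe(self) -> str:
        # Instance method: receives the instance as `self`, so it needs an object
        return "{} degrees {}".format(self.reading, self.scale)

    @classmethod
    def default_scale(cls) -> str:
        # Class method: receives the class as `cls`, no instance required
        return cls.scale

    @staticmethod
    def to_kelvin(celsius: float) -> float:
        # Static method: receives neither; it only sees its explicit arguments
        return celsius + 273.15

print(Thermometer.default_scale())   # called on the class, no object created
print(Thermometer.to_kelvin(0))      # called on the class, no object created
print(Thermometer(21.5).describe())  # needs an instance
```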


Implementing the conversion example from before the video, we can create a three-class relationship with a well-defined abstract, static method. Using a different strategy object is as simple as re-assigning the strategy variable.

from abc import ABC, abstractmethod

class ConversionStrategy(ABC):
    @staticmethod
    @abstractmethod
    def convert(x):
        pass

class FahrenheitToCelsiusConverter(ConversionStrategy):
    @staticmethod
    def convert(x):
        return (x-32) * 5 / 9

class CelsiusToFahrenheitConverter(ConversionStrategy):
    @staticmethod
    def convert(x):
        return ( x * 9 / 5 ) + 32

result = FahrenheitToCelsiusConverter.convert(32)
print(result)
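The same shape works when each strategy wraps a different library, which is the "easily replacing a library" benefit listed earlier. A minimal sketch using the standard library's json and csv modules as stand-ins for third-party libraries; ExportStrategy, JsonExporter, CsvExporter, and save_report are hypothetical names, and CsvExporter assumes a non-empty list of records that all share the same keys:

```python
import csv
import io
import json
from abc import ABC, abstractmethod
from typing import Dict, List, Type

Record = Dict[str, str]

class ExportStrategy(ABC):
    """Hypothetical interface: every exporter turns a list of records into a string."""
    @staticmethod
    @abstractmethod
    def export(records: List[Record]) -> str:
        pass

class JsonExporter(ExportStrategy):
    @staticmethod
    def export(records: List[Record]) -> str:
        return json.dumps(records)  # wraps the json library

class CsvExporter(ExportStrategy):
    @staticmethod
    def export(records: List[Record]) -> str:
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=list(records[0]), lineterminator="\n")
        writer.writeheader()
        writer.writerows(records)
        return buffer.getvalue()  # wraps the csv library

def save_report(records: List[Record], exporter: Type[ExportStrategy] = JsonExporter) -> str:
    # The calling code depends only on the shared export() signature,
    # so swapping libraries means passing a different strategy class.
    return exporter.export(records)

if __name__ == "__main__":
    rows = [{"name": "Mittens", "age": "3"}]
    print(save_report(rows))
    print(save_report(rows, CsvExporter))
```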


The following class diagram illustrates a simple strategy object design:

The DocxImporter class is responsible for loading data from Docx (Microsoft Word Document) files and the CSVImporter class is responsible for loading data from CSV (Comma Separated Value) files.

Simple Strategy Object Class Diagram
![object%20design.png](attachment:object%20design.png)


#### Assigning Strategy Object References to Variables
We can assign our strategies to variables that can be used later in the code. For example, the line strategy = FahrenheitToCelsiusConverter saves a reference to the strategy class, but does not instantiate a FahrenheitToCelsiusConverter object. The class is only instantiated (invoking its __init__ method) when it is called with parentheses, as in FahrenheitToCelsiusConverter().

We can use this concept to neatly select a strategy at runtime:

def pick_strategy(unit):
    if unit == 'fahrenheit':
        return FahrenheitToCelsiusConverter
    else:
        return CelsiusToFahrenheitConverter

def smart_convert(temp, unit):
    strategy = pick_strategy(unit)
    result = strategy.convert(temp)
    print(result)
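Instead of an if/else chain, a dictionary can map each input unit to its strategy class. A sketch that re-declares the converters from above so it runs on its own; the STRATEGIES mapping and the ValueError for unknown units are illustrative choices, not part of the original code:

```python
from abc import ABC, abstractmethod

class ConversionStrategy(ABC):
    @staticmethod
    @abstractmethod
    def convert(x):
        pass

class FahrenheitToCelsiusConverter(ConversionStrategy):
    @staticmethod
    def convert(x):
        return (x - 32) * 5 / 9

class CelsiusToFahrenheitConverter(ConversionStrategy):
    @staticmethod
    def convert(x):
        return (x * 9 / 5) + 32

# Maps the unit of the *input* temperature to a strategy class (a reference, not an instance)
STRATEGIES = {
    'fahrenheit': FahrenheitToCelsiusConverter,
    'celsius': CelsiusToFahrenheitConverter,
}

def smart_convert(temp, unit):
    try:
        strategy = STRATEGIES[unit.lower()]
    except KeyError:
        raise ValueError('unsupported unit: {}'.format(unit))
    return strategy.convert(temp)

print(smart_convert(212, 'Fahrenheit'))  # 100.0
print(smart_convert(100, 'celsius'))     # 212.0
```

The original pick_strategy falls through to CelsiusToFahrenheitConverter for any unrecognized unit, while the lookup fails loudly; adding a new strategy becomes a one-line change to the dictionary.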

Additional Resources:
- Strategy Pattern: https://en.wikipedia.org/wiki/Strategy_pattern
- Class Methods vs Static Methods: https://www.geeksforgeeks.org/class-method-vs-static-method-python/


### Setting Up Our Interface Class


Try it!
Let's take a minute to lay the groundwork for our strategy object importer:

- Review the file structure and organization of the skeleton code we've provided.
- Create a new ImportInterface abstract class:
  - It should implement a can_ingest class method, which decides whether a file is compatible with the importer.
  - It should declare an abstract parse class method signature, which we will realize in the child classes that implement ImportInterface.


from abc import ABC, abstractmethod

from typing import List
from .Cat import Cat

class ImportInterface(ABC):

    allowed_extensions = []

    @classmethod
    def can_ingest(cls, path):
        ext = path.split('.')[-1]
        return ext in cls.allowed_extensions

    @classmethod
    @abstractmethod
    def parse(cls, path: str) -> List[Cat]:
        pass
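The can_ingest check above compares the text after the last dot with allowed_extensions, so it is case-sensitive: a file named CATS.CSV would be rejected. One possible variation, sketched on a hypothetical stand-in class reduced to the can_ingest logic, uses os.path.splitext and lowercases the extension:

```python
import os

class ExtensionMatcher:
    # Hypothetical stand-in for ImportInterface, keeping only the can_ingest logic
    allowed_extensions = ['csv', 'docx']

    @classmethod
    def can_ingest(cls, path):
        # splitext looks only at the final path component and returns e.g. '.CSV'
        ext = os.path.splitext(path)[1].lstrip('.').lower()
        return ext in cls.allowed_extensions

for p in ['./data/cats.docx', './data/CATS.CSV', './data/cats', 'archive.tar.gz']:
    print(p, ExtensionMatcher.can_ingest(p))
```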

### Importing Word Documents

Try it!
Before implementing our code, we need to install the python-docx library to work with Word documents in Python. This library requires a newer version of setuptools, a Python packaging helper module. To install the updated helper and the docx library, run:


pip install -U setuptools
pip install python-docx 
Then, we're ready to implement our first strategy object:

- Create a new DocxImporter class that inherits from ImportInterface.
- Implement the parse method, using the python-docx library to import data from a docx file.
- Import and use your importer in the run.py file.

#DocxImporter.py

from typing import List
import docx

from .ImportInterface import ImportInterface
from .Cat import Cat

class DocxImporter(ImportInterface):
    allowed_extensions = ['docx']

    @classmethod
    def parse(cls, path: str) -> List[Cat]:
        if not cls.can_ingest(path):
            raise Exception('cannot ingest exception')

        cats = []
        doc = docx.Document(path)

        for para in doc.paragraphs:
            if para.text.strip() != "":
                # Each non-empty paragraph is assumed to hold "name, age, is_indoor"
                fields = [field.strip() for field in para.text.split(',')]
                # bool('False') is True, so compare the text instead of calling bool()
                is_indoor = fields[2].lower() == 'true'
                new_cat = Cat(fields[0], int(fields[1]), is_indoor)
                cats.append(new_cat)

        return cats

#run.py
from ImportEngine import DocxImporter

print(DocxImporter.parse('./data/cats.docx'))



### Importing CSV Files

Try it!
Before implementing our code, we need to install the pandas library to work with CSV files in Python by running:


pip install pandas
Then, we're ready to implement our second strategy object:

- Create a new CSVImporter class that inherits from ImportInterface.
- Implement the parse method, using the pandas library to import data from a csv file.
- Import and use your importer in the run.py file.

from typing import List
import pandas

from .ImportInterface import ImportInterface
from .Cat import Cat

class CSVImporter(ImportInterface):
    allowed_extensions = ['csv']

    @classmethod
    def parse(cls, path: str) -> List[Cat]:
        if not cls.can_ingest(path):
            raise Exception('cannot ingest exception')

        cats = []
        df = pandas.read_csv(path, header=0)

        for index, row in df.iterrows():
            new_cat = Cat(row['Name'], row['Age'], row['isIndoor'])
            cats.append(new_cat)

        return cats
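If pulling in pandas only to read a small CSV file feels heavy, the standard library's csv module can fill the same role, which is exactly the kind of swap the strategy pattern makes cheap. A minimal sketch, assuming the Name, Age, and isIndoor columns used above and that isIndoor is spelled True or False; the Cat NamedTuple is a hypothetical stand-in for the lesson's Cat class, the sample rows are illustrative, and the function parses text so it runs without a file (with a real file, you would pass open(path, newline='') to csv.DictReader):

```python
import csv
import io
from typing import List, NamedTuple

class Cat(NamedTuple):
    # Hypothetical stand-in for the lesson's Cat class (argument order as in Cat(...) above)
    name: str
    age: int
    indoor: bool

def parse_cats_csv(text: str) -> List[Cat]:
    """Parse CSV text with Name, Age, isIndoor columns into Cat objects."""
    cats = []
    for row in csv.DictReader(io.StringIO(text)):
        # csv returns every field as a string, so convert types explicitly
        is_indoor = row['isIndoor'].strip().lower() == 'true'
        cats.append(Cat(row['Name'], int(row['Age']), is_indoor))
    return cats

sample = "Name,Age,isIndoor\nMittens,3,True\nTom,5,False\n"
print(parse_cats_csv(sample))
```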

### Encapsulating Our Strategy Objects

Try it!
Encapsulation can make our software easy to work with. Refactor your code to:

- Include a new Importer class that encapsulates the CSVImporter and DocxImporter classes. It should realize ImportInterface.
- Write a parse method that decides which importer to use based on the file type.
- Refactor run.py to consume the Importer class!

#importer.py
from typing import List

from .ImportInterface import ImportInterface
from .Cat import Cat
from .DocxImporter import DocxImporter
from .CSVImporter import CSVImporter


class Importer(ImportInterface):
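    importers = [DocxImporter, CSVImporter]

    @classmethod
    def can_ingest(cls, path):
        # The encapsulating class accepts any file that one of its strategies accepts
        return any(importer.can_ingest(path) for importer in cls.importers)

    @classmethod
    def parse(cls, path: str) -> List[Cat]:
        # Delegate to the first strategy object that can ingest this file type
        for importer in cls.importers:
            if importer.can_ingest(path):
                return importer.parse(path)
        # Without this, unsupported file types would silently return None
        raise Exception('cannot ingest exception')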