# Advanced Python Packaging and Deploying Applications

## What will we learn in this class?


## Packages

Packages are a way of structuring Python’s module namespace by using “dotted module names”. For example, the module name A.B designates a submodule named B in a package named A. Just like the use of modules saves the authors of different modules from having to worry about each other’s global variable names, the use of dotted module names saves the authors of multi-module packages like NumPy or Pillow from having to worry about each other’s module names.

Suppose you want to design a collection of modules (a “package”) for the uniform handling of sound files and sound data. There are many different sound file formats (usually recognized by their extension, for example: .wav, .aiff, .au), so you may need to create and maintain a growing collection of modules for the conversion between the various file formats. There are also many different operations you might want to perform on sound data (such as mixing, adding echo, applying an equalizer function, creating an artificial stereo effect), so in addition you will be writing a never-ending stream of modules to perform these operations. Here’s a possible structure for your package (expressed in terms of a hierarchical filesystem):

```
sound/                          Top-level package
      __init__.py               Initialize the sound package
      formats/                  Subpackage for file format conversions
              __init__.py
              wavread.py
              wavwrite.py
              aiffread.py
              aiffwrite.py
              auread.py
              auwrite.py
              ...
      effects/                  Subpackage for sound effects
              __init__.py
              echo.py
              surround.py
              reverse.py
              ...
      filters/                  Subpackage for filters
              __init__.py
              equalizer.py
              vocoder.py
              karaoke.py
              ...
```

When importing the package, Python searches through the directories on sys.path looking for the package subdirectory.

The `__init__.py` files are required to make Python treat directories containing the file as packages (unless you are using a namespace package, a relatively advanced feature). This prevents directories with a common name, such as `string`, from unintentionally hiding valid modules that occur later on the module search path. In the simplest case, `__init__.py` can just be an empty file, but it can also execute initialization code for the package or set the `__all__` variable.

Users of the package can import individual modules from the package, for example:
```
import sound.effects.echo
```
This loads the submodule `sound.effects.echo`. It must be referenced with its full name:
```
sound.effects.echo.echofilter(input, output, delay=0.7, atten=4)
```
An alternative way of importing the submodule is:

```
from sound.effects import echo
```
Note that when using `from package import item`, the item can be either a submodule (or subpackage) of the package, or some other name defined in the package, like a function, class or variable. The import statement first tests whether the item is defined in the package; if not, it assumes it is a module and attempts to load it. If it fails to find it, an ImportError exception is raised.

Contrarily, when using syntax like `import item.subitem.subsubitem`, each item except for the last must be a package; the last item can be a module or a package but can’t be a class or function or variable defined in the previous item.
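The two import forms can be tried without writing a real sound library. The sketch below builds a throwaway copy of part of the `sound/` layout in a temporary directory; the body of `echofilter` is a hypothetical stand-in, not a real audio filter.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway copy of part of the sound/ layout in a temporary directory.
root = Path(tempfile.mkdtemp())
effects_dir = root / "sound" / "effects"
effects_dir.mkdir(parents=True)
(root / "sound" / "__init__.py").write_text("")
(effects_dir / "__init__.py").write_text("")
# Hypothetical stand-in for a real echo filter.
(effects_dir / "echo.py").write_text(
    "def echofilter(samples, delay=0.7, atten=4):\n"
    "    return [s / atten for s in samples]\n"
)

# Python finds packages by scanning the directories listed in sys.path.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import sound.effects.echo  # binds the name 'sound'; use the full dotted name
full = sound.effects.echo.echofilter([4, 8])

from sound.effects import echo  # binds the name 'echo' directly
short = echo.echofilter([4, 8])

try:
    # The last item is a function, not a module or package, so this fails.
    import sound.effects.echo.echofilter
except ImportError as exc:
    failure = type(exc).__name__

print(full, short, failure)
```

Both forms call the same function; only the name bound in the current namespace differs.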


## Importing * From a Package

Now what happens when the user writes `from sound.effects import *`? Ideally, one would hope that this somehow goes out to the filesystem, finds which submodules are present in the package, and imports them all. This could take a long time and importing sub-modules might have unwanted side-effects that should only happen when the sub-module is explicitly imported.

The only solution is for the package author to provide an explicit index of the package. The import statement uses the following convention: if a package’s `__init__.py` code defines a list named `__all__`, it is taken to be the list of module names that should be imported when `from package import *` is encountered. It is up to the package author to keep this list up-to-date when a new version of the package is released. Package authors may also decide not to support it, if they don’t see a use for importing * from their package. For example, the file `sound/effects/__init__.py` could contain the following code:
```
__all__ = ["echo", "surround", "reverse"]
```

This would mean that `from sound.effects import *` would import the three named submodules of the `sound.effects` package.

If `__all__` is not defined, the statement `from sound.effects import *` does not import all submodules from the package sound.effects into the current namespace; it only ensures that the package sound.effects has been imported (possibly running any initialization code in `__init__.py`) and then imports whatever names are defined in the package. This includes any names defined (and submodules explicitly loaded) by `__init__.py`. It also includes any submodules of the package that were explicitly loaded by previous import statements. 
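As a small experiment, the sketch below creates a package whose `__init__.py` lists only two of its three submodules in `__all__` and then checks which names `import *` brings in. The package name `sound_demo` is hypothetical, chosen so it cannot clash with a real `sound` package, and the shorter `__all__` list is an assumption made to show that `reverse` is left out.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package created in a temporary directory.
root = Path(tempfile.mkdtemp())
pkg = root / "sound_demo" / "effects"
pkg.mkdir(parents=True)
(root / "sound_demo" / "__init__.py").write_text("")
# Only two of the three submodules are listed in __all__.
(pkg / "__init__.py").write_text('__all__ = ["echo", "surround"]\n')
for name in ("echo", "surround", "reverse"):
    (pkg / f"{name}.py").write_text(f"NAME = {name!r}\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# 'import *' is only legal at module level, so run it in a fresh namespace.
namespace = {}
exec("from sound_demo.effects import *", namespace)

imported = sorted(n for n in namespace if not n.startswith("__"))
print(imported)
```

The submodule `reverse` exists on disk but is never loaded, because it is not named in `__all__`.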

## Packaging Python Projects

We will slightly modify the tutorial from the Python packaging documentation to give more context to what we are doing. In this tutorial we will build a simple project called class_python_utilities.

To create this project locally, create the following file structure:
```
class_python_utilities
└── class_python_utilities
    └── __init__.py
```

Once you create this structure, you’ll want to run all of the commands in this tutorial within the top-level folder, so be sure to `cd class_python_utilities`.


### Creating the package files
You will now create a handful of files to package up this project and prepare it for distribution. Create the new files listed below and place them in the project’s root directory - you will add content to them in the following steps.

```
class_python_utilities
├── LICENSE
├── README.md
├── class_python_utilities
│   └── __init__.py
├── setup.py
└── tests

```
`tests/` is a placeholder for unit test files. Leave it empty for now.

### Creating setup.py

`setup.py` is the build script for setuptools. It tells setuptools about your package (such as the name and version) as well as which code files to include.

Open setup.py and enter the following content. Update the package name to include your own ID; this ensures that you have a unique package name and that your package doesn’t conflict with packages uploaded by other people following this tutorial.

```
import setuptools

# Use the README as the long description shown on the package index.
with open("README.md", "r", encoding="utf-8") as fh:
    long_description = fh.read()

# Third-party packages that pip installs together with this one.
REQUIRED_PACKAGES = ["pandas"]

setuptools.setup(
    name="class_python_utilities-[YOUR_ID]",  # Replace with your own desired name
    version="0.0.1",
    author="Example Author",
    author_email="author@example.com",
    description="A small example package",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/pypa/sampleproject",
    packages=setuptools.find_packages(),
    install_requires=REQUIRED_PACKAGES,
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
    python_requires=">=3.6",
)
```


setup() takes several arguments. This example package uses a relatively minimal set:

* name is the distribution name of your package. This can be any name as long as it only contains letters, numbers, _ , and -. It also must not already be taken on pypi.org. Be sure to update this with your username, as this ensures you won’t try to upload a package with the same name as one which already exists when you upload the package.

* version is the package version. See PEP 440 for more details on versions.

* author and author_email are used to identify the author of the package.

* description is a short, one-sentence summary of the package.

* long_description is a detailed description of the package. This is shown on the package detail page on the Python Package Index. In this case, the long description is loaded from README.md, which is a common pattern.

   
* long_description_content_type tells the index what type of markup is used for the long description. In this case, it’s Markdown.

* url is the URL for the homepage of the project. For many projects, this will just be a link to GitHub, GitLab, Bitbucket, or similar code hosting service.

* packages is a list of all Python import packages that should be included in the distribution package. Instead of listing each package manually, we can use find_packages() to automatically discover all packages and subpackages. In this case, the list of packages will be class_python_utilities as that’s the only package present.

* install_requires is a list of the distributions that pip installs together with your package. In this case, it is the REQUIRED_PACKAGES list, which contains only pandas.

* classifiers gives the index and pip some additional metadata about your package. In this case, the package is only compatible with Python 3, is licensed under the MIT license, and is OS-independent. You should always include at least which version(s) of Python your package works on, which license your package is available under, and which operating systems your package will work on. For a complete list of classifiers, see https://pypi.org/classifiers/.

There are many more arguments than the ones mentioned here. See “Packaging and distributing projects” in the Python Packaging User Guide for more details.
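The naming rule for `name` can be checked before you upload anything. The sketch below is a minimal, illustrative check and not the validation code used by pip or PyPI: `is_valid_name` and `normalized` are hypothetical helper names, the pattern covers only the characters mentioned above (PyPI also accepts `.`), and the normalization follows the rule given in PEP 503.

```python
import re

# Letters, digits, '_' and '-' only, starting and ending with a letter or digit.
# Assumption: '.' is deliberately left out to match the description above.
NAME_PATTERN = re.compile(r"[A-Za-z0-9]([A-Za-z0-9_-]*[A-Za-z0-9])?")


def is_valid_name(name):
    """Return True if the distribution name uses only the allowed characters."""
    return NAME_PATTERN.fullmatch(name) is not None


def normalized(name):
    """Normalize a name as in PEP 503: runs of '-', '_' and '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


# The placeholder from setup.py is rejected until [YOUR_ID] is replaced.
print(is_valid_name("class_python_utilities-[YOUR_ID]"))
# 'jdoe' is a made-up ID used only for illustration.
print(is_valid_name("class_python_utilities-jdoe"))
print(normalized("class_python_utilities-jdoe"))
```

Because of the normalization, names that differ only in `-` versus `_` or in letter case refer to the same project on the index.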

### Create README.md

Open README.md and enter the following content:

```
# Class Utilities
Utilities module for class
```

### Creating a LICENSE

It’s important for every package uploaded to the Python Package Index to include a license. This tells users who install your package the terms under which they can use your package. For help picking a license, see https://choosealicense.com/. Once you have chosen a license, open LICENSE and enter the license text. For example, if you had chosen the MIT license:

```
Copyright (c) 2018 The Python Packaging Authority

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```

### Generating distribution archives

Let's pause the tutorial here. If you want to create a package that is distributed through pip from the Python Package Index, please continue with the tutorial here: https://packaging.python.org/tutorials/packaging-projects/

Since most of the time we work in a private setup, we will follow another route.


### Versioning

This is a toy example, but version numbers are meaningful. Given a version number {major}.{minor}.{patch}, increment:

* major when you make backward-incompatible changes
* minor when you add backward-compatible features
* patch when you fix bugs
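The rule can be written as a small helper. This is an illustrative sketch: `bump` is a hypothetical function name, and resetting the lower components to zero is the usual semantic-versioning convention, which is assumed here rather than stated in the rule above.

```python
def bump(version, part):
    """Return the next version string after incrementing one component.

    Assumes a plain '{major}.{minor}.{patch}' string of integers.
    """
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        # Backward-incompatible change: lower components restart at zero.
        return f"{major + 1}.0.0"
    if part == "minor":
        # Backward-compatible feature: only the patch number restarts.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # Bug fix.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(bump("0.0.1", "patch"))
print(bump("0.0.1", "minor"))
print(bump("0.0.1", "major"))
```

The resulting string is what you would put in the `version` argument of `setup.py` before publishing a new release.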

## Getting your package into GitHub

Before continuing, we need to add something to our package. During the course you have been building a few classes and functions. Collect all of them and add them to `utils.py`:

```
class_python_utilities
├── LICENSE
├── README.md
├── class_python_utilities
│   ├── __init__.py
│   └── utils.py
├── setup.py
└── tests
```

You should have at least:
```
class PortfolioModels: ...
def Perceptron(train_data, test_data): ...
class NonLinearRegression: ...
```

Add these classes and functions to `utils.py`, together with the imports for the libraries they require. Also update `setup.py` (for example, its list of required packages) to reflect this.





* Create a new empty repository on GitHub called `class_python_utilities`.
* Follow the instructions and initialize a repository in the root directory of `class_python_utilities`:
```
git init
git add -A
git commit -m "first commit"
git remote add origin https://github.com/[GIT_USER]/class_python_utilities.git
git push -u origin master
```


Now you can install your package in your environment with the command below (replace the GitHub user name with your own):

```
pip install git+https://github.com/JalatorrS1/class_python_utilities.git
```
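One way to confirm that the install worked is to ask Python for the installed version of the distribution. The sketch below uses the standard-library `importlib.metadata` module, which requires Python 3.8 or newer (an assumption, since `setup.py` allows 3.6); `installed_version` is a hypothetical helper and the distribution name is a made-up example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Replace this made-up name with the 'name' you set in setup.py.
print(installed_version("class_python_utilities-jdoe"))
```

If this prints `None`, the distribution is not installed in the environment that is running the script, which usually means pip installed it into a different environment.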

After this, let's import the functions from the package's `utils` module and test our code:

    from class_python_utilities.utils import *
    pm = PortfolioModels(data_repo="data/")

    importing data: 100%|██████████| 12/12 [00:12<00:00,  1.06s/it]


## Create an executable package

Let's add another file, `__main__.py`, to our project:
```
class_python_utilities
├── LICENSE
├── README.md
├── class_python_utilities
│   ├── __init__.py
│   ├── utils.py
│   └── __main__.py
├── setup.py
└── tests
```

Then add the following code to it:

```
from argparse import ArgumentDefaultsHelpFormatter, ArgumentParser

from .utils import *


def get_parser():
    """Build the command-line argument parser for this package."""
    parser = ArgumentParser(
        description="Financial Data Harvest Luigi Tasks",
        formatter_class=ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument(
        "--portfolio_path",
        help="Read portfolio data",
        required=False,
    )
    return parser


def main(args):
    # Load the portfolio data and print the first rows of all_closes.
    pm = PortfolioModels(data_repo=args.portfolio_path)
    print(pm.all_closes.head())


if __name__ == "__main__":
    parser = get_parser()
    args = parser.parse_args()
    main(args)
```

What are we doing here, and why?

Test your module in your environment by running:

```
python -m class_python_utilities --portfolio_path /home/jose/code/intro_python_finance/data/
```
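When a package is run with `python -m`, Python executes the package's `__main__.py`, which is what makes the package usable as a command-line tool. The sketch below reproduces that mechanism with a tiny package in a temporary directory; `demo_cli` is a hypothetical package name and its `__main__.py` only echoes the argument instead of loading portfolio data.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Create a minimal, hypothetical package that contains a __main__.py.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_cli"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "__main__.py").write_text(
    "from argparse import ArgumentParser\n"
    "\n"
    "parser = ArgumentParser()\n"
    "parser.add_argument('--portfolio_path')\n"
    "args = parser.parse_args()\n"
    "print('portfolio_path =', args.portfolio_path)\n"
)

# 'python -m demo_cli' runs demo_cli/__main__.py; the package is found
# because the working directory of the child process is on its search path.
result = subprocess.run(
    [sys.executable, "-m", "demo_cli", "--portfolio_path", "data/"],
    cwd=root,
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

The same entry point is what a container can call later, so the package behaves identically on a laptop and inside an image.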


## Containers (Docker)

<img src="static/img/docker.png">

### What is Docker

Docker is a tool that allows developers, sys-admins, etc. to easily deploy their applications in a sandbox (called a container) that runs on the host operating system, i.e. Linux. The key benefit of Docker is that it allows users to package an application with all of its dependencies into a standardized unit for software development. Unlike virtual machines, containers do not have high overhead and hence enable more efficient usage of the underlying system and resources.

### What are containers?

The industry standard today is to use Virtual Machines (VMs) to run software applications. VMs run applications inside a guest Operating System, which runs on virtual hardware powered by the server’s host OS.

VMs are great at providing full process isolation for applications: there are very few ways a problem in the host operating system can affect the software running in the guest operating system, and vice-versa. But this isolation comes at great cost — the computational overhead spent virtualizing hardware for a guest OS to use is substantial.

Containers take a different approach: by leveraging the low-level mechanics of the host operating system, containers provide most of the isolation of virtual machines at a fraction of the computing power.

### Why use containers?

Containers offer a logical packaging mechanism in which applications can be abstracted from the environment in which they actually run. This decoupling allows container-based applications to be deployed easily and consistently, regardless of whether the target environment is a private data center, the public cloud, or even a developer’s personal laptop. This gives developers the ability to create predictable environments that are isolated from the rest of the applications and can be run anywhere.

From an operations standpoint, apart from portability, containers also give more granular control over resources, which improves the efficiency of your infrastructure and can result in better utilization of your compute resources.

Now we are ready to build a fully reproducible version of our package.

### Building the Data Science Container
#### The Base Alpine Linux Image
Alpine Linux is a tiny Linux distribution designed for power users who appreciate security, simplicity and resource efficiency. As claimed by Alpine: “Small. Simple. Secure. Alpine Linux is a security-oriented, lightweight Linux distribution based on musl libc and busybox.”

The Alpine image is surprisingly tiny, with a size of no more than 8MB, and it ships with minimal packages installed to reduce the attack surface of the container. This makes Alpine an image of choice for our data science container.

Downloading and running an Alpine Linux container is as simple as:

```
docker container run --rm alpine:latest cat /etc/os-release
```



#### The Dockerfile
Docker can build images automatically by reading the instructions from a Dockerfile. A Dockerfile is a text document that contains all the commands a user could call on the command line to assemble an image. Using docker build users can create an automated build that executes several command-line instructions in succession.

#### Usage
The `docker build` command builds an image from a Dockerfile and a context. The build’s context is the set of files at a specified location `PATH` or `URL`. The `PATH` is a directory on your local filesystem. The `URL` is a Git repository location.

A context is processed recursively. So, a `PATH` includes any subdirectories and the `URL` includes the repository and its submodules. For example, `docker build .` uses the current directory as context.

The build is run by the Docker daemon, not by the CLI. The first thing a build process does is send the entire context (recursively) to the daemon. In most cases, it’s best to start with an empty directory as context and keep your Dockerfile in that directory. Add only the files needed for building the Dockerfile.

**Warning:** Do not use your root directory, `/`, as the `PATH` as it causes the build to transfer the entire contents of your hard drive to the Docker daemon.
...


...

The Docker daemon runs the instructions in the Dockerfile one-by-one, committing the result of each instruction to a new image if necessary, before finally outputting the ID of your new image. The Docker daemon will automatically clean up the context you sent.

Note that each instruction is run independently, and causes a new image to be created - so `RUN cd /tmp` will not have any effect on the next instructions.

Whenever possible, Docker will re-use the intermediate images (cache), to accelerate the `docker build` process significantly. This is indicated by the `Using cache` message in the console output. For more information, see the Dockerfile best practices guide.

Source: https://docs.docker.com/engine/reference/builder/

```
FROM alpine:latest

WORKDIR /var/www/

# SOFTWARE PACKAGES
#   * musl: standard C library
#   * libc6-compat: compatibility libraries for glibc
#   * linux-headers: commonly needed, and an unusual package name from Alpine.
#   * build-base: used so we include the basic development packages (gcc)
#   * bash: so we can access /bin/bash
#   * git: to ease up clones of repos
#   * ca-certificates: for SSL verification during Pip and easy_install
#   * freetype: library used to render text onto bitmaps, and provides support font-related operations
#   * libgfortran: contains a Fortran shared library, needed to run Fortran
#   * libgcc: contains shared code that would be inefficient to duplicate every time as well as auxiliary helper routines and runtime support
#   * libstdc++: The GNU Standard C++ Library. This package contains an additional runtime library for C++ programs built with the GNU compiler
#   * openblas: open source implementation of the BLAS(Basic Linear Algebra Subprograms) API with many hand-crafted optimizations for specific processor types
#   * tcl: scripting language
#   * tk: GUI toolkit for the Tcl scripting language
#   * libssl1.0: SSL shared libraries
ENV PACKAGES="\
    dumb-init \
    musl \
    libc6-compat \
    linux-headers \
    build-base \
    bash \
    git \
    ca-certificates \
    freetype \
    libgfortran \
    libgcc \
    libstdc++ \
    openblas \
    tcl \
    tk \
    libssl1.0 \
    "

# PYTHON DATA SCIENCE PACKAGES
#   * numpy: support for large, multi-dimensional arrays and matrices
#   * matplotlib: plotting library for Python and its numerical mathematics extension NumPy.
#   * scipy: library used for scientific computing and technical computing
#   * scikit-learn: machine learning library integrates with NumPy and SciPy
#   * pandas: library providing high-performance, easy-to-use data structures and data analysis tools
#   * class_python_utilities: Our great package
ENV PYTHON_PACKAGES="\
    numpy \
    matplotlib \
    scipy \
    scikit-learn \
    pandas \
    git+https://github.com/JalatorrS1/class_python_utilities.git \
    "

RUN apk add --no-cache --virtual build-dependencies python3 \
    && apk add --virtual build-runtime \
    build-base python3-dev openblas-dev freetype-dev pkgconfig gfortran \
    && ln -s /usr/include/locale.h /usr/include/xlocale.h \
    && python3 -m ensurepip \
    && rm -r /usr/lib/python*/ensurepip \
    && pip3 install --upgrade pip setuptools \
    && ln -sf /usr/bin/python3 /usr/bin/python \
    && ln -sf pip3 /usr/bin/pip \
    && rm -r /root/.cache \
    && pip install --no-cache-dir $PYTHON_PACKAGES \
    && apk del build-runtime \
    && apk add --no-cache --virtual build-dependencies $PACKAGES \
    && rm -rf /var/cache/apk/*

CMD ["python3"]
```

The FROM directive is used to set alpine:latest as the base image. Using the WORKDIR directive we set /var/www/ as the working directory for our container. The ENV PACKAGES variable lists the software packages required for our container, like git, openblas and libgfortran. The Python packages for our data science container are defined in ENV PYTHON_PACKAGES.

We have combined all the commands under a single Dockerfile RUN directive to reduce the number of layers, which in turn helps in reducing the resultant image size.


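Now run:

```
docker build -t class-image .
```

What happened? And how can you fix this error?

This happened because git was not installed when pip tried to install our package from its `git+https` URL. Remember that each image is a fresh recreation of what the Dockerfile commands, so nothing is available unless an earlier step installed it. In this case we need to add the following line to the RUN directive, before the `pip install` step: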

```
   && apk add git \
```
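The same failure can occur outside Docker, because pip relies on a `git` executable being on the PATH whenever a requirement starts with `git+`. A minimal sketch of a pre-flight check, assuming you want to test for this before calling pip (`git_available` is a hypothetical helper name):

```python
import shutil


def git_available():
    """Return True if a 'git' executable can be found on the PATH."""
    return shutil.which("git") is not None


if git_available():
    print("git found: pip can install git+https requirements")
else:
    print("git missing: install git before running pip on a git+https URL")
```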

A cool feature of Docker is the ability to tag images. For example, instead of waiting hours every time scikit-learn builds, we can build that image once, tag it, and upload it to Docker Hub. Then anyone can just pull it without needing to rebuild that stage.

Another error? Well, it turns out that installing scikit-learn on Alpine is not an easy task. It can be done, but it takes time. To avoid wasting time, let's use a prebuilt image from Docker Hub that comes with Anaconda pre-installed.

You can check the images built on your system with the command `docker images`.

Now let's open a bash shell inside the image:
```
docker run -i -t class-image /bin/bash
```

## Homework

Create an account on Docker Hub, then tag and upload your image.