# Distributing Code in Python
Date: 2020-28-01  
Author: Jason Beach  
Categories: Deployment, Distribution  
Tags: python, setuptools  
<!--eofm-->

Distributing and deploying products is a necessary step in the solution development process.  Each solution must be thoughtfully analyzed for strengths and weaknesses, especially from the perspective of security.  The decisions you make are largely based on the language used to create the solution.  This post describes the steps for distributing Python code.

## Introduction

Simple Python scripts are a terrific way to get work done quickly.  However, in more professional environments, that ease of use can make distribution tricky.  Python is a bytecode-interpreted language, so it is usually disseminated as source scripts, which the interpreter compiles to bytecode on the machine that runs them.  Providing users with the bytecode (.pyc) directly is possible, but it must be run by a Python interpreter whose virtual machine version matches the one that compiled it.

Compiled Python bytecode files are architecture-independent, but VM-dependent.  A .pyc file will only work on a specific set of Python versions, determined by the magic number stored at the start of the file.  You can compare the interpreter's magic number with a file's as follows:

In [1]:
! python -V

Python 3.7.3


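In [None]:
# Python 3: the running interpreter's magic number
import importlib.util
importlib.util.MAGIC_NUMBER.hex()   #output: '420d0d0a' on Python 3.7

In [None]:
# Python 3: the magic number stored in a .pyc file (opened in binary mode)
with open('test25.pyc', 'rb') as f:
    magic = f.read(4)
magic.hex()     #output: 'b3f20d0a' > differs from the interpreter's value, so not compatible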

Bytecode is platform-independent, so bytecode compiled on Windows will still run on Linux, Unix, or macOS, provided the Python version matches.  Machine code is platform-specific: if it is compiled for Windows x86, it will run only on Windows x86.
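
The magic-number check above can be wrapped into a small helper.  The following is a minimal sketch, not a definitive tool: the helper name `pyc_matches_interpreter` and the file names `hello.py` and `stale.pyc` are made up for illustration, and the "stale" file is simulated by overwriting the header with the `b3f20d0a` value shown earlier.

```python
import importlib.util
import py_compile
import tempfile
from pathlib import Path


def pyc_matches_interpreter(pyc_path):
    """Return True if the .pyc header carries this interpreter's magic number."""
    with open(pyc_path, "rb") as f:
        return f.read(4) == importlib.util.MAGIC_NUMBER


with tempfile.TemporaryDirectory() as tmp:
    # Compile a throwaway script with the running interpreter.
    source = Path(tmp) / "hello.py"            # hypothetical file name
    source.write_text("print('hello')\n")
    pyc_file = py_compile.compile(
        str(source), cfile=str(Path(tmp) / "hello.pyc"), doraise=True
    )
    fresh_ok = pyc_matches_interpreter(pyc_file)

    # Simulate bytecode from another Python by swapping in a foreign magic number.
    stale = Path(tmp) / "stale.pyc"            # hypothetical file name
    stale.write_bytes(b"\xb3\xf2\r\n" + Path(pyc_file).read_bytes()[4:])
    stale_ok = pyc_matches_interpreter(stale)

print(fresh_ok, stale_ok)   # True False
```

A check like this only compares the header; it says nothing about whether the rest of the file is valid bytecode.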

This differs from other popular languages, such as Java, where code is distributed as compiled bytecode in a Java Archive (.jar) file and then run on the Java Virtual Machine (JVM).  That was the original appeal of Java: it was a _portable_ language.  All you needed was the correct JVM.

Bytecodes are the machine language of the Java virtual machine. When a JVM loads a class file, it gets one stream of bytecodes for each method in the class. The bytecode streams are stored in the method area of the JVM. The bytecodes for a method are executed when that method is invoked during the course of running the program. They can be executed by interpretation, just-in-time compiling, or any other technique that was chosen by the designer of a particular JVM.

A method's bytecode stream is a sequence of instructions for the Java virtual machine. Each instruction consists of a one-byte opcode followed by zero or more operands. The opcode indicates the action to take. If more information is required before the JVM can take the action, that information is encoded into one or more operands that immediately follow the opcode.
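
Python's bytecode has a broadly similar shape, with each instruction pairing an opcode with an optional argument.  As a rough illustration, the standard library's `dis` module can list the instructions for a function.  The function `add` is a made-up example, and the exact opcodes printed vary between Python versions, so no output is shown.

```python
import dis


def add(a, b):
    """A tiny function whose bytecode is easy to read."""
    return a + b


# Collect (opcode number, opcode name, human-readable argument) per instruction.
rows = [(ins.opcode, ins.opname, ins.argrepr) for ins in dis.get_instructions(add)]

for opcode, opname, argrepr in rows:
    print(f"{opcode:>3}  {opname:<20} {argrepr}")
```

The raw bytes behind this listing are available as `add.__code__.co_code`, which is what ends up inside a .pyc file.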

While providing the source code is preferred when working in a collaborative environment, it is not ideal when distributing to end-users who are uninterested in the internals.  In addition, providing source code might present security concerns, especially if the Python developer is trying to create a closed-source product.  Source code obfuscators exist, but this approach was an afterthought in Python's development.

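Binary packages (wheels) are another option: they install without a build step, although a pure-Python wheel still contains readable source files.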

## Distribution Methods

* source code repo
* PyPI
  - dependency
  - commandline utility
* bundle your program and the python runtime into a single file
* Docker
* obfuscator

The typical method of disseminating Python solutions is to maintain source code in a repository, such as GitHub, then package and upload the solution to PyPI, the [Python Package Index](https://pypi.org/).  PyPI allows projects and dependencies to be installed with a simple `pip install <package>`.  What's more, the packaging process is quite a bit easier than that of more demanding communities, such as R's [Comprehensive R Archive Network (CRAN)](https://cran.r-project.org/).  This ease comes at a cost: PyPI is known to have hosted malicious impostors of popular packages, whose names may differ from the desired package by a single character (typosquatting).  So, some care should be taken when typing package names.
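
One way to picture the near-identical naming problem is a quick similarity check before installing.  This is only a sketch under stated assumptions: the `KNOWN_PACKAGES` list, the helper name `possible_typosquat`, and the 0.8 cutoff are all hypothetical choices for illustration, and the check is not a substitute for verifying a project on PyPI.

```python
import difflib

# Hypothetical allow-list of package names you actually intend to use.
KNOWN_PACKAGES = ["requests", "numpy", "pandas", "setuptools"]


def possible_typosquat(name, known_names=KNOWN_PACKAGES, cutoff=0.8):
    """Return known names that look very similar to `name` without matching it.

    An empty list means `name` is either an exact known name or not close
    to anything on the list.
    """
    if name in known_names:
        return []
    return difflib.get_close_matches(name, known_names, n=3, cutoff=cutoff)


for candidate in ["requests", "reqeusts", "flask"]:
    print(candidate, possible_typosquat(candidate))
```

Here `reqeusts` is flagged as a near match for `requests`, while the exact name and the unrelated name `flask` return empty lists.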

Some terminology:

* file - 
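* module - usually a single Python script (name.py)
* package - a directory of modules, and possibly other packages, usually marked by an `__init__.py` file
* binary package (bdist) - a built distribution, such as a wheel (.whl), that installs without a build step
* source package (sdist) - an archive of the source code and metadata (typically .tar.gz) that is built at install time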
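
The module and package terms above can be seen in a small, self-contained sketch.  The names `demo_pkg` and `greet` are invented for this example, and the files are written to a temporary directory purely for demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # A package is a directory; __init__.py marks it and runs on import.
    pkg_dir = Path(tmp) / "demo_pkg"               # hypothetical package name
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("from .greet import hello\n")
    # A module is a single .py file inside (or outside) a package.
    (pkg_dir / "greet.py").write_text(
        "def hello():\n    return 'hello from demo_pkg'\n"
    )

    sys.path.insert(0, tmp)                        # make the package importable
    try:
        importlib.invalidate_caches()
        package = importlib.import_module("demo_pkg")
        module = importlib.import_module("demo_pkg.greet")
        message = package.hello()
        # Packages expose __path__; plain modules do not.
        is_package = hasattr(package, "__path__")
        is_plain_module = not hasattr(module, "__path__")
    finally:
        sys.path.remove(tmp)

print(message, is_package, is_plain_module)   # hello from demo_pkg True True
```

A source or binary distribution is then an archive built from a directory like `demo_pkg` plus its packaging metadata.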

## Packaging

## References

* [nice discussion on packaging](https://python-packaging-tutorial.readthedocs.io/en/latest/setup_py.html)
* [distributing a package](https://packaging.python.org/guides/distributing-packages-using-setuptools/)