# 19) Be a Pythonista <a class="tocSkip">

### Install packages

When you need to develop some code, the fastest solution is often to reuse ideas from existing code. The first places to look are the Python standard library and the Python Package Index (PyPI). PyPI is constantly updated and, at the time of writing, hosts more than 113,000 Python packages. When you use pip, it searches PyPI. Another popular repository is GitHub, and the Popular Python recipes collection has more than four thousand short Python programs.

There are many ways to install Python packages:

- Use pip if you can. It is the most common method and you can install most of the Python packages you are likely to encounter using pip.

- Use pipenv, which combines pip and virtualenv.

- Use conda if you do a lot of scientific work and want to use the Anaconda distribution of Python.

- Install from source.

The simplest use of pip is to install the latest version of a single package by using the following command:

    pip install flask
    
You can also ask pip to install a specific version or a minimum version:

    pip install flask==0.9.0
    pip install 'flask>=0.9.0'
    
If you want to install more than one Python package, you can use a requirements file. Although it has many options, the simplest use is a list of packages, one per line, optionally with a specific or relative version. Your sample requirements.txt file might contain:

    flask==0.9.0
    django
    psycopg2
    
You can then use:

    pip install -r requirements.txt
    
We can also upgrade a package to its latest version, or uninstall a package, using:
    
    pip install --upgrade package
    pip uninstall package
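
After installing, it can be handy to confirm from Python which version pip actually put in place. The sketch below uses importlib.metadata from the standard library (Python 3.8 and later). The helper name installed_version is only illustrative, and the package names are the ones from the sample requirements file above; they may or may not be installed on your machine.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report on the packages listed in the sample requirements.txt
for name in ("flask", "django", "psycopg2"):
    version = installed_version(name)
    print(name, version if version is not None else "is not installed")
```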

#### Using virtualenv

The standard way of installing third-party Python packages is to use pip and virtualenv. A virtual environment is just a directory that contains the Python interpreter, some other programs like pip, and some packages. You activate it by running the shell script activate that is in the bin directory of that virtual environment. This sets the environment variable PATH that your shell uses to find programs. By activating a virtual environment, you put its bin directory ahead of the usual directories in your PATH. The result is that when you type a command like pip or python, your shell first finds the one in your virtual environment, instead of the one in system directories like /bin, /usr/bin, or /usr/local/bin.
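
The PATH lookup described above can be imitated in a few lines. This is a minimal sketch, assuming a POSIX-style system: the two temporary directories stand in for a system directory and a virtual environment's bin directory, and mytool and make_fake_tool are made-up names used only for illustration. shutil.which searches a PATH-like string from left to right, the same way the shell does.

```python
import os
import shutil
import sys
import tempfile
from pathlib import Path


def make_fake_tool(directory, name="mytool"):
    """Create a tiny executable file called `name` inside `directory`."""
    path = Path(directory) / name
    path.write_text("#!/bin/sh\n")
    path.chmod(0o755)
    return str(path)


with tempfile.TemporaryDirectory() as system_bin, \
        tempfile.TemporaryDirectory() as venv_bin:
    system_tool = make_fake_tool(system_bin)
    venv_tool = make_fake_tool(venv_bin)

    # Before "activation": only the system directory is searched
    found_before = shutil.which("mytool", path=system_bin)

    # After "activation": the environment's bin directory comes first
    activated_path = os.pathsep.join([venv_bin, system_bin])
    found_after = shutil.which("mytool", path=activated_path)

    system_wins_before = found_before == system_tool
    venv_wins_after = found_after == venv_tool

print("before activation, system copy found:", system_wins_before)
print("after activation, venv copy found:", venv_wins_after)

# Inside a real virtual environment, sys.prefix differs from sys.base_prefix
print("running inside a virtual environment:", sys.prefix != sys.base_prefix)
```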

#### Installing from source

Occasionally, a Python package is new, or the author has not managed to make it available with pip. To build the package, you generally do the following:

- Download the code
- Extract the files by using zip, tar or another appropriate tool if they are archived or compressed.
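- Run "python setup.py install" in the directory containing a setup.py file (with recent versions of pip and setuptools, running "pip install ." in that directory is the preferred equivalent).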

### Test

Python lacks the type-checking of static languages, which makes some things easier but also lets undesirable results through the door. As a result, testing is essential. Before creating actual test programs, we should run a Python code checker. The most popular are pylint and pyflakes. These check for actual code errors and style faux pas. Below is an example of a poorly written Python script:

In [4]:
%%writefile bad_script.py

# Poor Python code

a = 1
b = 2
print(a)
print(b)
print(c)

Writing bad_script.py


In [7]:
# Pylint report of the above script

!pylint bad_script.py

************* Module bad_script
bad_script.py:1:0: C0114: Missing module docstring (missing-module-docstring)
bad_script.py:4:0: C0103: Constant name "a" doesn't conform to UPPER_CASE naming style (invalid-name)
bad_script.py:5:0: C0103: Constant name "b" doesn't conform to UPPER_CASE naming style (invalid-name)
bad_script.py:8:6: E0602: Undefined variable 'c' (undefined-variable)

--------------------------------------------------------------------

Your code has been rated at -6.00/10 (previous run: -6.00/10, +0.00)





If we fix the error and restructure the code, our pylint score improves:

In [28]:
%%writefile good_script.py

# Better Python code

def func():
    """ Function that prints three numbers. """
    first = 1
    second = 2
    third = 3
    print(first)
    print(second)
    print(third)

Overwriting good_script.py


In [29]:
# Pylint report of the above script

!pylint good_script.py

************* Module good_script
good_script.py:1:0: C0114: Missing module docstring (missing-module-docstring)

------------------------------------------------------------------

Your code has been rated at 8.57/10 (previous run: 8.57/10, +0.00)





It is good practice to write independent test programs first, and to ensure that they all pass before you commit your code to any source control system. Writing tests helps you find problems faster, especially regressions (breaking something that used to work). The standard library contains two test packages: unittest and doctest. We shall write a module that capitalizes words and test it with unittest:

#### Unittest

We start by writing our capitalizing function:

In [44]:
%%writefile cap.py

def just_do_it(text):
    return text.capitalize()

Overwriting cap.py


The basis of testing is to decide what outcome you want from a certain input (here, you want the capitalized version of whatever text you input), submit the input to the function you are testing and then check whether it returned the expected result. The expected result is called an assertion, so in unittest you check your results by using methods with names that begin with assert. We shall now write our tests:

In [45]:
%%writefile test_cap.py

import unittest
import cap

class TestCap(unittest.TestCase):
    
    def setUp(self):
        pass
    
    def tearDown(self):
        pass
    
    def test_one_word(self):
        text = 'duck'
        result = cap.just_do_it(text)
        self.assertEqual(result, 'Duck')
        
    def test_multiple_words(self):
        text = 'a flock of ducks'
        result = cap.just_do_it(text)
        self.assertEqual(result, 'A Flock Of Ducks')

if __name__ == '__main__':
    unittest.main()

Overwriting test_cap.py


In [46]:
!python test_cap.py

F.
FAIL: test_multiple_words (__main__.TestCap)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_cap.py", line 21, in test_multiple_words
    self.assertEqual(result, 'A Flock Of Ducks')
AssertionError: 'A flock of ducks' != 'A Flock Of Ducks'
- A flock of ducks
?   ^     ^  ^
+ A Flock Of Ducks
?   ^     ^  ^


----------------------------------------------------------------------
Ran 2 tests in 0.000s

FAILED (failures=1)


It liked the first test (test_one_word), but not the second (test_multiple_words). The up arrows (^) show where the strings actually differed. Reading the documentation for the string capitalize() method, we see that it capitalizes only the first letter of the first word. If we had used title() instead, the test would have passed. unittest provides a small but powerful set of assertions, letting you check values, confirm whether you have the class you want, determine whether an error was raised and so on.
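
As a sketch of the fix suggested above, here is the same function using title(), together with two extra tests that illustrate the other kinds of assertion mentioned: one checks the class of the result and one checks that an error is raised. The extra test names and the choice of passing an integer are illustrative assumptions. The tests are run programmatically, with the runner's report sent to an in-memory stream, so the block also works inside a notebook cell.

```python
import io
import unittest


def just_do_it(text):
    """Capitalize the first letter of every word in text."""
    return text.title()


class TestCap(unittest.TestCase):

    def test_one_word(self):
        self.assertEqual(just_do_it('duck'), 'Duck')

    def test_multiple_words(self):
        self.assertEqual(just_do_it('a flock of ducks'), 'A Flock Of Ducks')

    def test_returns_string(self):
        # Confirm that we get back the class we expect
        self.assertIsInstance(just_do_it('duck'), str)

    def test_non_string_raises(self):
        # Integers have no title() method, so an AttributeError is raised
        with self.assertRaises(AttributeError):
            just_do_it(42)


# Load and run the tests without calling unittest.main()
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCap)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
print("tests run:", result.testsRun, "all passed:", result.wasSuccessful())
```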

#### Doctest

The second test package in the standard library is doctest. With this package, you can write tests within the docstring itself, where they also serve as documentation. A test looks like a session in the interactive interpreter: the characters >>>, followed by the call, and then the expected result on the following line. Below is the same example as above using doctest:

In [6]:
%%writefile cap2.py

def just_do_it(text):
    """
    >>> just_do_it('a flock of ducks')
    'A Flock Of Ducks'
    """
    return text.capitalize()

if __name__ == '__main__':
    import doctest
    doctest.testmod()

Overwriting cap2.py


In [7]:
!python cap2.py

**********************************************************************
File "cap2.py", line 4, in __main__.just_do_it
Failed example:
    just_do_it('a flock of ducks')
Expected:
    'A Flock Of Ducks'
Got:
    'A flock of ducks'
**********************************************************************
1 items had failures:
   1 of   1 in __main__.just_do_it
***Test Failed*** 1 failures.


We see the same result as above, but without having to write a separate class for the test. If there were no issues then there would be no output.
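
For comparison, here is a sketch of a version whose docstring examples pass, again assuming title() is the behaviour we want. Instead of doctest.testmod(), it parses the docstring and runs the examples directly with DocTestParser and DocTestRunner, which makes the counts of attempted and failed examples available as values.

```python
import doctest


def just_do_it(text):
    """
    Capitalize the first letter of every word.

    >>> just_do_it('a flock of ducks')
    'A Flock Of Ducks'
    >>> just_do_it('duck')
    'Duck'
    """
    return text.title()


# Parse the docstring and run its examples, collecting the counts
parser = doctest.DocTestParser()
test = parser.get_doctest(
    just_do_it.__doc__,            # text containing the >>> examples
    {'just_do_it': just_do_it},    # names available to the examples
    'just_do_it',                  # name used in any failure report
    None,                          # no source file
    0,                             # starting line number
)
results = doctest.DocTestRunner(verbose=False).run(test)
print("examples attempted:", results.attempted, "failed:", results.failed)
```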

#### Continuous integration

When a group produces a lot of code daily, it helps to run the tests automatically as soon as changes arrive. You can set up a source control system to run tests on all code as it is checked in. Below are large systems capable of performing this automation:

- buildbot - Written in Python, this continuous integration framework automates building, testing and releasing.
- jenkins - This is written in Java and seems to be the preferred CI tool of the moment.
- travis-ci - This automates projects hosted at GitHub and is free for open source projects.
- circleci - This one is commercial but free for open source and private projects.

### Debug Python code

Test first. The better your tests are, the less you will have to fix later. When code breaks, it is usually because of something you just did. So you typically debug "from the bottom up", starting with your most recent changes. But sometimes the cause is elsewhere, in something that you trusted and thought worked. You would think that if there were problems in something that many people use, someone would have noticed by now. That is not always what happens. So after checking your recent changes, question your assumptions. This is a "top-down" approach and it takes longer.

The simplest way to debug in Python is to print out strings. Some useful things to print include vars(), which extracts the values of your local variables, including function arguments. We could also use decorators. A decorator can call code before or after a function without modifying the code within the function itself. This means that you can use a decorator to do something before or after any Python function, not just ones that you wrote.
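
A minimal sketch of both ideas follows. The decorator name dump and the example function double are made up for illustration. The decorator prints the arguments and the return value around each call, and vars() inside the function shows its local variables, including the arguments.

```python
import functools


def dump(func):
    """Print the input arguments and output value of each call to func."""
    @functools.wraps(func)          # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        print("Function name:", func.__name__)
        print("Input arguments:", args)
        print("Input keyword arguments:", kwargs)
        output = func(*args, **kwargs)
        print("Output:", output)
        return output
    return wrapper


@dump
def double(number, extra=0):
    """Return twice the number, plus an optional extra amount."""
    # vars() with no argument returns the local variables as a dictionary
    print("Locals:", vars())
    return 2 * number + extra


result = double(3, extra=1)
```

Because the decorator only wraps the call, it can be applied to functions you did not write, for example with dump(some_function), without touching their source.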

### Log error messages

At some point it may be necessary to move from print statements to logging messages. A log is usually a system file that accumulates messages, often inserting useful information such as a timestamp or the name of the user who is running the program. When something goes wrong with your program, you can look at the appropriate log file to see what happened. The contents of exceptions are especially useful in logs because they show you the actual line at which your program broke, and why.

The standard Python library module for this is logging. The logging module includes these concepts:

- The message that you want to save to the log.
- Ranked priority levels and matching functions: debug(), info(), warning(), error() and critical().
- One or more logger objects as the main connection with the module.
- Handlers that direct the message to your terminal, a file, a database or somewhere else.
- Formatters that create the output.
- Filters that make decisions based on the input.

Here are some of its functions:

In [1]:
import logging

In [11]:
# Examples of logging functions, one per priority level

logging.debug('Debugging')
logging.info('Info')
logging.warning('Warning')  # warn() is a deprecated alias of warning()
logging.error('Error')
logging.critical('Critical')

ERROR:root:Error
CRITICAL:root:Critical


Notice that debug() and info() did not do anything, and the messages that were printed have LEVEL:root: before each message. You can scan for a particular value of LEVEL in a log file to find particular messages, compare timestamps and so on. The default priority level is WARNING, and that got locked in as soon as we called the first function, debug(). We can set the default level by using basicConfig(). DEBUG is the lowest level, so setting it lets DEBUG messages and all the higher levels flow through. We use handlers to direct the messages to different places. The most common is a log file:

In [8]:
# Creating a log file

logging.basicConfig(level='DEBUG', filename='logfile.log')
logger = logging.getLogger('logger_name')
logger.debug('There is a bug.')
logger.warning('This is a warning.')  # warn() is a deprecated alias



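The logging module includes at least 15 handlers to send messages to places such as email and web servers as well as the screen and files. You can also control the format of your logged messages by providing a format string to basicConfig(). Note that basicConfig() does nothing if the root logger already has handlers, so in a session where logging has already been used (as in this notebook) the format string is ignored and the default format still appears in the output. To apply it, restart the interpreter or, on Python 3.8 and later, pass force=True: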

In [14]:
# Formatting the logger

fmt = "%(asctime)s %(levelname)s %(lineno)s %(message)s"
logging.basicConfig(level = 'DEBUG', format = fmt)
logger = logging.getLogger('logger_name')
logger.error('This is an error.')

ERROR:logger_name:This is an error.
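
One way to avoid depending on basicConfig() is to attach a handler and a formatter to a named logger directly. The sketch below is illustrative only: the logger name formatted_example is made up, and the output is captured in an in-memory stream so that it can be inspected. A real program might use logging.FileHandler('logfile.log') instead.

```python
import io
import logging

# Capture the log output in memory instead of a file or the terminal
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))

logger = logging.getLogger('formatted_example')   # hypothetical logger name
logger.setLevel(logging.DEBUG)                    # let every level through
logger.addHandler(handler)
logger.propagate = False    # keep the root logger from printing a second copy

logger.debug('There is a bug.')
logger.error('This is an error.')

print(stream.getvalue(), end='')
```

Adding %(asctime)s or %(lineno)s to the format string works the same way; they are left out here only to keep the output identical from run to run.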


### Optimize

In many cases, you can gain speed by using a better algorithm or data structure. The trick is knowing where to do this, which leads us to timers. A quick way of timing something is to get the current time, do something, get the new time, and then subtract the original time from the new time. There is a handier way to measure code snippets: the standard module timeit. It has a function called timeit(), which runs your test code count times and returns the total elapsed time in seconds. The syntax is: timeit.timeit(code, number=count). Using timeit() usually means wrapping the code you are trying to measure in a string. If you have multiple lines of code, you could pass them in a triple-quoted multiline string, but that might be hard to read. Instead we could use a decorator to wrap around a function:

In [15]:
import time

In [25]:
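# Decorator function to time a function

import functools

def time_decorator(func):
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def inner(*args, **kwargs):
        t1 = time.perf_counter()  # high-resolution clock meant for timing
        result = func(*args, **kwargs)
        t2 = time.perf_counter()
        print("Time for function to run: {:.4f}".format(t2-t1))
        return result
    return inner

@time_decorator
def snooze():
    time.sleep(1)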

In [26]:
# Run our sleeper function
snooze()

Time for function to run: 1.0121
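
For comparison with the decorator, here is a short sketch of calling timeit.timeit() directly. The snippet being timed and the helper build_squares are arbitrary stand-ins. Besides a string, timeit() also accepts a callable, which avoids quoting multi-line code.

```python
import timeit

# Code passed as a string is compiled once and run `number` times
string_seconds = timeit.timeit("sum(range(100))", number=1000)


def build_squares():
    """Build a small list; stands in for the code being measured."""
    return [n * n for n in range(100)]


# A callable works too, which avoids quoting multi-line code
callable_seconds = timeit.timeit(build_squares, number=1000)

print("string version:   {:.6f} s for 1000 runs".format(string_seconds))
print("callable version: {:.6f} s for 1000 runs".format(callable_seconds))
```

The returned value is the total time for all runs, so divide by number to get a per-run figure.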


If you are pushing Python as hard as you can and still cannot get the performance you want, there are options.

#### Cython

Cython is a hybrid of Python and C, designed to translate Python with some performance annotations to compiled C code. These annotations are fairly small, like declaring the types of some variables. For scientific-style loops of numeric calculations, adding these hints can make them much faster. Many parts of Python and its standard library are written in C for speed and wrapped in Python for convenience. If you know C and Python and really want to make code fast, writing a C extension is hard, but the improvements can be worth the trouble. The standard Python implementation is written in C, and is often called CPython (not to be confused with Cython).

#### PyPy

Like PHP, Perl and Java, Python is not compiled to machine language, but translated to an intermediate language which is then interpreted in a virtual machine. PyPy is an alternative Python interpreter that applies some of the tricks that sped up Java. According to its own benchmarks, PyPy is faster than CPython on most tests, and several times faster on average. You can download it and use it instead of CPython. PyPy is constantly being improved, and it might even replace CPython some day.

#### Numba

You can use Numba to compile your Python code on the fly to machine code and speed it up. By using the @jit decorator, you can speed up calls after the first time a function is called. Numba is especially useful with NumPy and other mathematically demanding packages.

### Source control

When you are working on a small group of programs, you can usually keep track of your changes. If you work with a group of developers, source control becomes a necessity. There are many commercial and open source packages in this area. The most popular in the open source world, where Python lives, are Mercurial and Git. Both are examples of distributed version control systems, which produce multiple copies of code repositories.

#### Git

Git was originally written for Linux kernel development, but now dominates open source in general. GitHub is the largest git host, with more than a million repositories, but there are many other hosts. Here we shall go through some of the main commands:

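Assume we have created a file called test.py inside a directory called newdir, initialized a repository there with git init, and staged the file with git add test.py.

Running git status would then report that there are "Changes to be committed". This means that test.py is part of the local repository but its changes have not yet been committed. Running git commit records them.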

To add all changed files in the current directory we can use:

    git add .

To see all the changes that have been made to a file we can use:

    git log test.py