# Dev environments, workflow and style

This talk is aimed at Python developers who have already written a few scripts or libraries and want to start looking at best practices.

## Topics covered

* Editors (IDEs)
* Python 3 or python 2
* (spoiler) Which python 3?
* virtualenv
* pip
* Linux / Mac / Windows install
* PEP 8 - Style Guide for Python Code
* Antipatterns

This lesson is going to be very opinionated. By that I mean, it will tell you a way to do it. There may be other ways to do it, even other better ways. However, the methods and software covered here are considered best practice in most circles and are a good starting point for learning more.

## Editors (IDEs)

### VSCode

If you don't already have an IDE (Integrated development environment) installed, or one that you're particularly fond of, try VSCode. 

https://code.visualstudio.com/download

I'm aware that Microsoft in the past have been the bad guys, but that's all changed and VSCode is excellent. You need to install the python plugin for it to get all the fancy code completion and other goodness. 

There's a lot of active development on VSCode and Microsoft hired the guy who originally developed the Python plugin, so that's being actively worked on too.

### Other editors

There are loads of alternatives if you don't want to use VSCode. The most popular ones for Python are:
* https://www.jetbrains.com/pycharm/download/
* https://www.sublimetext.com/3
* https://jupyter.org/install

### Vim

If you want to use Vim, there are a lot of good plugins. However, before you start down that route, I'd say check out VSCode. It has Vim bindings, which is the main reason I switched to it.

If you still want to use Vim, check out these plugins / recommendations:
* My .vimrc is here if you want to reference it
    * https://github.com/laxdog/dotfiles
* There are a lot of settings you can use too, you'll see some in my vimrc, this is a good guide:
    * https://realpython.com/vim-and-python-a-match-made-in-heaven/   
* Vundle
    * This is a plugin manager for vim, it's excellent, use it
    * Once installed you can just add the plugins you want to your `.vimrc`
    * `git clone https://github.com/gmarik/Vundle.vim.git ~/.vim/bundle/Vundle.vim`
* syntastic (syntax checker)
* vim-flake8 (pep-8 checker)
* nerdtree (file explorer window)
* YouCompleteMe (ctags / autocomplete)

There are loads more, but that's a good start. 

### Python2 vs python3

The simple answer is Python 3. Python 2 will no longer be supported after Jan 1st, 2020.

In [10]:
from IPython.display import display, HTML
display(HTML("""<script>const second = 1000, minute = second * 60, hour = minute * 60, day = hour * 24; let countDown = new Date('Jan 01, 2020 00:00:00').getTime(), x = setInterval(function() { let now = new Date().getTime(), distance = countDown - now; document.getElementById('days').innerText = Math.floor(distance / (day)), document.getElementById('hours').innerText = Math.floor((distance % (day)) / (hour)), document.getElementById('minutes').innerText = Math.floor((distance % (hour)) / (minute)), document.getElementById('seconds').innerText = Math.floor((distance % (minute)) / second); }, second)</script><div class="container"> <h1 id="head">Python 2.7 will retire in :</h1> <table> <th>Days</th><th>Hours</th><th>Minutes</th><th>Seconds</th> <tr> <td><span id="days"></span></td> <td><span id="hours"></span></td> <td><span id="minutes"></span></td> <td><span id="seconds"></span></td> </tr> </table> </div>"""))



### But why?

* It's faster. There are a few edge cases where Python 2.7 wins, but in general 3.7 is faster.
* It uses less memory.
* The language features developed make Python both performant and stable.
* The changes that broke 2 to 3 migrations were down to bad architectural choices. This shouldn't happen again.
* Python 2.7 will be out of support at the start of 2020.
* A lot of major packages no longer support or are dropping support for 2:
    * Django
    * Numpy
    * Pandas
    * matplotlib
* There are some great features in 3:
    * f-strings
    * asyncio
    * dataclasses
    * type hints
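
As a rough illustration of three of the features listed above, here is a small sketch that uses a dataclass, type hints and an f-string together. It assumes Python 3.7 or later, and the `Book` class and `describe` function are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Book:
    """A hypothetical record type; @dataclass generates __init__, __repr__ and __eq__."""
    title: str
    author: str


def describe(book: Book) -> str:
    """Return a one-line description built with an f-string."""
    return f"'{book.title}' by {book.author}"


book = Book("The Way of Kings", "Brandon Sanderson")
print(describe(book))
print(book)
```

The type hints are not enforced at runtime, but editors and type checkers can use them to catch mistakes early.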

### What if I'm stuck with 2?

Basically you need to start moving things over. The sooner you start the better. 

There are a lot of good resources and libraries to help you switch. The main python docs are a good start:

https://docs.python.org/3/howto/pyporting.html

You can chat to me afterwards if you need help in convincing team members or managers.

### Which python 3?

If you can, use Python 3.7. That's the latest and greatest. However, it's not always as easy as that. Sometimes you are limited to what's available on the machine you're logging into and you can't install anything else.

If you plan on running your code on a machine internally, check which version it's going to be deployed on before getting too far into things. I think most machines have at least 3.5 installed by default.

That is actually fine, as you get most of the good features from 3.5 onwards. Here's a quick list of what you do and don't get in the different versions in case you have to choose.

|Feature           |3.4|3.5|3.6|3.7|
|------------------|---|---|---|---|
|dataclass         | ✘ | ✘ | ✘ | ✓ |
|Ordered dicts     | ✘ | ✘ | ✓ | ✓ |
|f-strings         | ✘ | ✘ | ✓ | ✓ |
|Advanced unpacking| ✘ | ✓ | ✓ | ✓ |
|async / await     | ✘ | ✓ | ✓ | ✓ |
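
If your code has to run on more than one of these versions, you can check the interpreter version at runtime. The sketch below simply encodes the table above in a dictionary; the `MINIMUM_VERSION` mapping and the `available_features` helper are hypothetical names used for illustration.

```python
import sys

# Minimum Python version for each feature, copied from the table above.
MINIMUM_VERSION = {
    "dataclass": (3, 7),
    "ordered dicts": (3, 6),
    "f-strings": (3, 6),
    "advanced unpacking": (3, 5),
    "async / await": (3, 5),
}


def available_features(version=None):
    """Return the features from the table that a (major, minor) version supports."""
    if version is None:
        # Default to the interpreter that is running this code.
        version = sys.version_info[:2]
    version = tuple(version)
    return sorted(name for name, minimum in MINIMUM_VERSION.items() if version >= minimum)


print(available_features((3, 5)))
print(available_features())
```

Comparing tuples like this is generally safer than comparing version strings, where `"3.10"` would sort before `"3.7"`.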


### Linux install

This is the easiest of the three installs. Open a terminal and use your distribution's package manager.

On Ubuntu and other Debian derivatives, use apt.

```bash
sudo apt-get install python3 python3-pip
```

On Centos / Fedora / Red Hat and derivatives, use yum.

```bash
# pip is not available in the base repos
sudo yum install epel-release
sudo yum install python3 python3-pip
```

For SUSE / SLES and derivatives, use zypper.
```bash
sudo zypper install python3 python3-pip
```

### Mac install

The best way to install is via Homebrew. You'll need to have the Xcode command line tools installed first though.

Open up a terminal and use the following commands.

```bash
xcode-select --install
```

Then install homebrew.

```bash
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
```

The script should tell you as much, but you need to update your `~/.profile` with the following line:

```bash
export PATH=/usr/local/bin:/usr/local/sbin:$PATH
```

Finally we can install python

```bash
brew install python
```

This will install pip for you as well, so at this point you're finished. 


### Windows install

For Windows, you first need to browse to the downloads section on the Python website:

https://www.python.org/downloads/release/python-372/

Grab the `Windows x86-64 executable installer`

Once it's downloaded, just double click and install it.

**Important**:
Make sure you tick the 'Add Python 3.7 to PATH' checkbox.

### pip

#### What is it?
Pip is a package management tool for Python. Think of it like apt/yum/homebrew for Python. It's a recursive acronym that stands for **Pip Installs Packages**.

#### Why do I need it?
You *could* go and download the source or the packages for all the libraries you need and install them manually, however that would be an absolute nightmare. 

Not only can pip fetch and install these packages for you, it also manages the versions for you. If a package declares its dependencies correctly, pip can make sure the versions it installs work nicely together.

#### How do I use it?
It's pretty simple, if you already know the name of the package you're interested in:

```bash
pip3 install requests
```

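Or you can search, though normally you'll already know what you want. Note that PyPI has since disabled the API that `pip search` relies on, so if this command fails, search on https://pypi.org instead.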

```bash
pip3 search requests
```

#### pip freeze

When you're ready to ship your code and run it somewhere else, you need to know what the dependencies are. You can use pip to work this out:

```bash
$ pip3 freeze

certifi==2019.3.9
chardet==3.0.4
idna==2.8
requests==2.21.0
urllib3==1.24.1
```

You can save this output in a file called `requirements.txt`, which can be used later to install those requirements easily:

```bash
$ pip3 freeze > requirements.txt
```

Then on another machine (or another virtualenv)

```bash
pip3 install -r requirements.txt
```

You'll frequently see these files in repositories. If you checkout a new project from github or elsewhere and see this file, this is how you install the requirements.
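
To make the file format concrete, here is a minimal sketch that reads the `name==version` pins produced by `pip freeze`. The `parse_requirements` function is a hypothetical helper written for this example, not part of pip, and it deliberately ignores anything more elaborate than a simple pin.

```python
def parse_requirements(text):
    """Map package name -> pinned version for simple 'name==version' lines.

    Blank lines, comments and lines without '==' are skipped, so this is
    only a sketch of the format rather than a full parser.
    """
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


# The same output shown in the pip freeze example above.
example = """\
certifi==2019.3.9
chardet==3.0.4
idna==2.8
requests==2.21.0
urllib3==1.24.1
"""

for name, version in parse_requirements(example).items():
    print(f"{name} is pinned to {version}")
```

Pinning exact versions like this is what lets another machine end up with the same set of packages you developed against.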

#### Proofpoint pip

Internally we have a pip repository, the main reasons for using this are

* To host packages that have been developed internally
* To give access to pip packages for machines that do not or should not have access to the internet

The repository is set up using Artifactory, so it's the same place we host Ubuntu and CentOS packages. I'll do a lesson on creating and hosting packages in the future. For now, if you want to use the repo, you can add this to your pip config at `~/.config/pip/pip.conf`:

```
[global]
index-url = https://repocache.nonprod.ppops.net/artifactory/api/pypi/pypi/simple
index = https://repocache.nonprod.ppops.net/artifactory/api/pypi/pypi/simple
```

### Virtualenv (venv)

#### What is it?

A virtual env is a way to separate the code you are developing from your main / system Python install. It's not a hard requirement for writing applications, but it's a really good habit to get into.

It basically creates an isolated, standalone python environment.

A fairly standard workflow would be:

* Checkout your source repo / cd to your source directory
* create a virtual env
* install the requirements
* start writing code

#### Why do I need it?

There are a number of reasons you could / should use it:
* You don't need admin rights to install packages, as they're installed into a user-owned directory
* You need to use multiple versions of a library, so you don't want to install it to the system
* You are working on a library / code, so you don't want to break the system version

#### How do I use it?

In Python 2 we needed to install a separate package to enable virtual environments. This is no longer required in Python 3, as `venv` is a built-in module.

There are two steps, creating the environment, then activating it. 

```bash
python3 -m venv venv
source venv/bin/activate
```

In the first command, the first *venv* is telling python which module to use. The second *venv* is telling it what to call the directory your isolated python will be stored in. 

The second command is calling a script that gets installed into the environment. This basically sets up all your paths.
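
If you are ever unsure whether the environment is active, you can ask Python itself. This is a small sketch that assumes the environment was created with the built-in `venv` module, as above; environments made with older versions of the separate `virtualenv` tool may behave differently. The `in_virtualenv` function is a made-up helper name.

```python
import sys


def in_virtualenv():
    """Return True when running inside an environment created by the venv module."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the Python install it was created from.
    return sys.prefix != sys.base_prefix


print("virtual env active:", in_virtualenv())
print("environment root:", sys.prefix)
```

Running this before and after `source venv/bin/activate` should show the environment root change to your `venv` directory.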

If you use these a lot you can consider putting them in your `~/.bashrc` like so:

```bash
alias v3='python3 -m venv venv && source venv/bin/activate'
```

This command creates and sources the environment. But every time you want to use it, you'll need to run the second part of that, so it's probably a good idea to add another line to your `~/.bashrc` like this:

```bash
alias sv='source venv/bin/activate'
```

### pip and virtualenv

A lot of the time when you're working in a virtual env, you'll be installing and using the code that you're working on. It can be a pain to have to build the code and install it into your virtual env every time you change something.

There's a nice feature in pip called editable mode, which is enabled with the `-e` flag. Basically this installs a link back to your source directory rather than a copy of the code, so when you change the code in your local directory, the change is automatically available in the virtual env.

To install a local package (there should be a setup.py) in editable mode, you do the following:

```bash
pip install -e .
```

#### New project workflow summary

1. Create a new project dir `new_dir`
2. git init, git add *, git commit
3. Set up a new virtual environment, activate it
4. pip install `some libs you need`
5. pip freeze > requirements.txt
6. git add, commit requirements.txt
7. pip install -e . # Only required if your code is a package
8. Start writing code

#### Existing project 
1. Checkout existing code and cd to dir
2. Set up a new virtual environment, activate it
3. pip install -r requirements.txt
4. pip install -e . # Only required if your code is a package
5. Start writing code

## PEP 8 -- Style Guide for Python Code

### What is it?

As the title says it's a style guide for writing python code. It's a guide to certain conventions that you should follow when writing code. This can be anything from how many blank lines or spaces you use between things, to how you should name variables.

### Why do I care?

Because it helps everyone, including you. 

Imagine reading a magazine article that put just two words on every line. Or had the text written right to left instead of left to right. It would be annoying and it would take you a lot longer to understand what was going on. You'd have to concentrate a lot more on what it was trying to tell you. 

This is the same thing. If everyone uses a consistent style when writing code, it makes it easier for everyone to understand. 

Some day you may want to contribute to open source software, which can be developed and maintained by thousands of developers from around the world. Having everyone on the same page lowers the barrier to entry.

### Do I have to use it?

No, it's just a guide. Your code will run just as well without following PEP-8; it's simply good practice.

However, if you follow it, your code is much more likely to be used by others, understood by others, accepted at code review and understood by yourself when you come back to it.

It just makes sense.

### What should I do?

The full PEP-8 guide can be found at https://www.python.org/dev/peps/pep-0008/ but it can be a bit overwhelming, so we'll just cover some of the main points and the most common things you'll come across.

The first thing you should do is make sure your IDE can tell you whether your code is PEP-8 compliant. Most either support it out of the box or have a plugin to do so.

You can easily check by breaking any of the following guidelines:

### Tabs or spaces (indentation)

* Spaces. 4 of them. NOTHING ELSE!

You should always use spaces, don't listen to what anyone else says about it not looking right on their screen or some other crap. Use 4 spaces and never tabs.

### Line length

* Less than 120 characters

Strictly speaking, PEP 8 says lines should be limited to 79 characters (teams that agree can go up to 99), though a lot of people find this too small. I personally use a limit of 120 characters, as do a lot of people.
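
Whatever limit you pick, you will sometimes need to break a long line. PEP 8 prefers wrapping inside parentheses, brackets or braces over using backslashes. Here is a short sketch of that approach; the `build_greeting` function and its parameters are invented for the example.

```python
# Implied continuation: anything inside (), [] or {} can span several lines.
def build_greeting(first_name, last_name, title="Dr", punctuation="!"):
    return (
        "Hello, "
        + title + " "
        + first_name + " "
        + last_name
        + punctuation
    )


# Adjacent string literals are joined into one string, which helps with long messages.
message = (
    "This sentence would run past the line limit if it were written "
    "on one line, so it is split into two literals."
)

print(build_greeting("Ada", "Lovelace"))
print(message)
```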

### imports

* All imports should be at the top of the file
* Imports should be on separate lines.
* If you're using 'from' then separate by comma

Yes:

```python
import os
import sys
```

No:

```python
import sys, os
```

It's okay to say this though:

```python
from subprocess import Popen, PIPE
```

### function and variable names
* Lowercase, with words separated by underscores as necessary to improve readability.

```python
def my_lovely_function():
    print("Do something")
    

your_variable = 10
```

### class names
* Class names should normally use the CapWords convention. 

```python
class ThisIsAnExampleClass:
    pass


instance_of_class = ThisIsAnExampleClass()
```

## Other good practices

Although they are not explicitly mentioned in the PEP-8 guide, there are a number of other things to consider:

### Be descriptive

* Don't use single letter variables
* Use descriptive names for functions and variables

There's no real reason not to be descriptive. Maybe if it's a short lived small loop. But why bother? 

Why not get into the habit of not using them and use something descriptive so that you can quickly read and identify what's happening. 

Forming good habits means you don't have to think about things as much. You'll automatically be writing more descriptive, more readable code. 

When we read:

```python
for x in my_list:
    process(x)
```

We've no idea what's going on. But if we take: 

```python
for line in log_file:
    search_for_log_error(line)
```

it's clear to even the casual observer what is happening. The less time we spend on working out what something is doing the better. 

## Anti-patterns

_"An anti-pattern is a common response to a recurring problem that is usually ineffective and risks being highly counterproductive."_ The term was coined in 1995 by Andrew Koenig.

There are of course a lot of really bad anti-patterns. We'll cover some of the more common ones that you should avoid.

There's a good list of them here:

https://docs.quantifiedcode.com/python-anti-patterns/

### No exception type(s) specified

This is probably the single worst antipattern in python. NEVER do this. Ever.

```python
try:
    do_something()
except:
    pass
```

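Not just the `pass`, though that's pretty bad. The worst part is there is no exception type. This means that every exception raised in `do_something()`, including ones you never anticipated (even `KeyboardInterrupt` and `SystemExit`), will be caught and silently thrown away.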

I have spent full days trying to debug issues due to this. It's completely silent and incredibly difficult to find.

Catching an error and doing something else is normal; in fact it's pretty common in Python. Exceptions can occur as part of normal operation, but you need to deal with them in the correct manner. If you know which exception to expect, you can catch just that one:

```python
try:
    do_something()
except ValueError:
    pass
```

There's a full list of exceptions here https://docs.python.org/3/library/exceptions.html

If you must catch all exceptions then you need to log it or do something else. However, catching the explicit error is preferred.

```python
import logging

try:
    do_something()
except Exception:
    logging.exception('Caught an error')
```
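
To see the difference in practice, here is a self-contained sketch that handles only the error it expects and logs it instead of hiding it. The `parse_port` function and its default value are hypothetical, chosen purely for illustration.

```python
import logging

logging.basicConfig(level=logging.INFO)


def parse_port(raw_value, default=8080):
    """Convert a string to a port number, falling back to a default."""
    try:
        return int(raw_value)
    except ValueError:
        # Only the failure we expect is handled, and it is recorded rather than hidden.
        logging.warning("Invalid port %r, using default %d", raw_value, default)
        return default


print(parse_port("443"))
print(parse_port("not-a-number"))
# parse_port(None) raises TypeError, which is deliberately not caught here,
# so a genuine bug surfaces instead of being swallowed.
```

Because only `ValueError` is caught, passing the wrong type still fails loudly, which is exactly what you want when something unexpected happens.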


### Assigning to built-in function

In [20]:
# Using the dict var overwrites the python built-in
dict = {'one':1, 'two':2, 'three':3}

cars = dict()


TypeError: 'dict' object is not callable

There are special cases where you may want to overwrite built-ins. This isn't one of them. 

In general you should steer clear of this. You can find a list of Python built-ins here:

https://docs.python.org/3/library/functions.html
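
If you're not sure whether a name is already taken, you can check it against the standard `builtins` module. This is a small sketch; `shadows_builtin` is a made-up helper name, and the candidate names are just examples.

```python
import builtins


def shadows_builtin(name):
    """Return True if `name` is already defined in Python's builtins module."""
    return hasattr(builtins, name)


for candidate in ["dict", "list", "id", "cars", "numbers_by_word"]:
    status = "shadows a built-in" if shadows_builtin(candidate) else "is safe to use"
    print(f"{candidate!r} {status}")

# A more descriptive name avoids the clash, so dict() keeps working.
numbers_by_word = {"one": 1, "two": 2, "three": 3}
cars = dict()
```

Most linters and IDEs will flag this for you as well, so the check is rarely needed by hand.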

### Using getters and setters

This is a common pattern in Java and other languages, though it's not required in Python. In python you should make the member public and access it directly. 

If you need to process something before getting/setting you can use the `@property` decorator.

In [21]:
# Don't do this!
class Circle(object):
    def __init__(self, radius):
        self._radius = radius

    def get_radius(self):
        return self._radius
    
    def set_radius(self, radius):
        self._radius = radius

my_circle = Circle(5)
my_circle.get_radius()
my_circle.set_radius(6)

In [23]:
# Instead do this
class Circle(object):
    def __init__(self, radius):
        self.radius = radius


my_circle = Circle(5)
my_circle.radius
my_circle.radius = 6

In [24]:
# If you want to use @property
from math import pi

class Circle(object):
    def __init__(self, radius):
        self.radius = radius

    @property
    def circumference(self):
        return 2 * pi * self.radius
    
my_circle = Circle(3)
print(my_circle.circumference)


18.84955592153876
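
The example above only computes a value on read. If you need to process something before *setting* an attribute, a property setter does the job without changing how callers use the class. This is a minimal sketch; the rule that a radius must not be negative is an assumption made for the example.

```python
class Circle:
    def __init__(self, radius):
        self.radius = radius  # this assignment goes through the setter below

    @property
    def radius(self):
        return self._radius

    @radius.setter
    def radius(self, value):
        # Validation happens here, yet callers still write `circle.radius = 6`.
        if value < 0:
            raise ValueError("radius must not be negative")
        self._radius = value


my_circle = Circle(5)
my_circle.radius = 6
print(my_circle.radius)
```

The nice part of this design is that you can start with a plain public attribute and add the property later, without breaking any code that already uses `my_circle.radius`.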


### Not using explicit unpacking


This code is error-prone and too verbose.

In [25]:
# Don't do this!
list_of_ages = [41, 17, 28]

alice_age = list_of_ages[0]
bob_age = list_of_ages[1]
claire_age = list_of_ages[2]

Instead use unpacking. This does the same thing and it's easier to read and maintain. 

In [26]:
# Do this instead.
list_of_ages = [41, 17, 28]

alice_age, bob_age, claire_age = list_of_ages
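
Unpacking also copes with lists that are longer than the names you care about. The sketch below uses a starred name to collect the leftovers, and shows that a mismatch raises an error instead of quietly reading the wrong index. The extra ages are made-up sample data.

```python
list_of_ages = [41, 17, 28, 35, 52]

# A starred name collects whatever is left over into a list.
alice_age, bob_age, *other_ages = list_of_ages
print(alice_age, bob_age, other_ages)

# Unpacking fails loudly when the shapes do not match,
# rather than silently picking the wrong index.
try:
    alice_age, bob_age = list_of_ages
except ValueError as error:
    print(f"Unpacking failed: {error}")
```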

### Not using .items() for dictionary iteration

This kind of follows on from the last one.

This works, but is less readable than the preferred version.

In [32]:
# Don't do this!
book = {"title": "The Way of Kings", "author":"Brandon Sanderson"}

for key in book:
    print("{0} : {1}".format(key, book[key]))

title : The Way of Kings
author : Brandon Sanderson


Instead use items() to iterate over the dictionary. 

When you define two variables in a for loop in conjunction with a call to items() on a dictionary, Python automatically assigns the first variable as the name of a key in that dictionary, and the second variable as the corresponding value for that key.

In [33]:
# Do this instead.
book = {"title": "The Way of Kings", "author":"Brandon Sanderson"}

for key, value in book.items():
    print("{0} : {1}".format(key, value))

title : The Way of Kings
author : Brandon Sanderson


This becomes even more useful when you're using the key as part of the data you're interested in.

In [41]:
books = {"The Way of Kings": "Brandon Sanderson",
         "The Name of the Wind": "Patrick Rothfuss",
         "The Lies of Locke Lamora": "Scott Lynch"}

for title, author in books.items():
    print("'{0}' by {1}".format(title, author))

'The Way of Kings' by Brandon Sanderson
'The Name of the Wind' by Patrick Rothfuss
'The Lies of Locke Lamora' by Scott Lynch


### Using * while importing

Please don't do this... It imports everything from the module. It's lazy and makes it much harder for people to work out what you're doing. Explicitly import the functions you want, or import the base module and call the functions directly from there.

Imagine you have several libraries that you do an `import *` from. How is anyone meant to know where the functions you're using come from? If there's a random function call in the middle of your code that's broken, you can't expect people to search through all the imported libraries just to find it.

In [None]:
# Don't do this!
from math import *

In [49]:
# Do this
from math import ceil

# or this
import math
x = math.ceil(1)

# or this
import pandas as pd
df = pd.DataFrame([8, 1])
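
Star imports can also overwrite each other without any warning. As a rough demonstration, the sketch below compares the public names of the standard `math` and `cmath` modules. The `public_names` helper is hypothetical and assumes the module defines no `__all__`, in which case a star import pulls in every name that doesn't start with an underscore.

```python
import cmath
import math


def public_names(module):
    """Names a star import would pull in when the module defines no __all__."""
    return {name for name in dir(module) if not name.startswith("_")}


clashes = sorted(public_names(math) & public_names(cmath))
print(f"{len(clashes)} names exist in both math and cmath, for example 'sqrt'")

# With two star imports, whichever came last would silently win.
print(math.sqrt(4))   # a float
print(cmath.sqrt(4))  # a complex number
```

Keeping the module prefix, as in `math.sqrt` and `cmath.sqrt`, makes it obvious which one you meant.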