# Workshop 14. Code Style and Code Reading. 


# Code style. Zen.

High readability is one of the great things about Python. For example, indentation is part of the syntax: the language forces the programmer to indent correctly, which keeps the block structure visible and removes the need for "end" statements.

However, high readability comes not only from the syntax of the language itself, but also from its community and guidelines.

A more readable version of the PEP 8 style guide than the one on python.org: https://pep8.org/

## Explicit vs Implicit

Which one of the two snippets of code is explicit, which one is implicit?

What is better about the explicit one? What is better about the implicit one? What should your reasoning be when you choose whether to use implicit code or explicit code?


Example #1

In [None]:
def make_complex(*args):
    x, y = args
    return dict(**locals())

In [None]:
def make_complex(x, y):
    return {'x': x, 'y': y}

Example #2

In [None]:
import requests
r = requests.get("https://miguelgfierro.com")

In [None]:
from requests import *
r = get("https://miguelgfierro.com")

Example #3

In [None]:
def read(filename):
    if filename[-4:] == '.csv':
        # code for reading a csv
        pass
    elif filename[-5:] == '.json':
        # code for reading a json
        pass    

In [None]:
def read_csv(filename):
    pass
    # code for reading a csv

def read_json(filename):
    pass
    # code for reading a json

Explicit examples make it clear right away what is happening even if in some cases this leads to redundant information.

Implicit examples assume additional knowledge. This assumption may be correct. It may be incorrect.

It is often easier to write implicit code. *When you write code, you have the additional knowledge. When you read code, this may not be the case*. As a result, while writing you tend to compare the implicit and explicit versions from the perspective of someone who already has that knowledge, and from this perspective explicit code looks redundant.

Your reasoning shouldn't be "what is easier to write". It should be "what is easier to read assuming minimal pre-existing knowledge about how the program works". By this measure explicit code is usually the better option.
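To see what the implicit version of Example #1 hides, here is a small sketch that runs both variants side by side. The names `make_complex_implicit` and `make_complex_explicit` are introduced here only to tell the two versions apart.

```python
def make_complex_implicit(*args):
    x, y = args
    # locals() holds every local name, including `args` itself.
    return dict(**locals())


def make_complex_explicit(x, y):
    return {'x': x, 'y': y}


implicit_result = make_complex_implicit(1, 2)
explicit_result = make_complex_explicit(1, 2)

print(implicit_result)
print(explicit_result)
```

The implicit version also returns an `args` key. A reader would expect that only if they already knew how `locals()` behaves, which is exactly the kind of additional knowledge implicit code assumes.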

## Simple vs Complex. One statement per line

"Sparse is better than dense"

It may be fun to put everything into a single dense statement. However, it hurts readability.

Splitting code into small simple parts lets the reader easily understand what each part does separately.

In [None]:
print('one'); print('two')

if x == 1: print('one')

if <complex comparison> and <other complex comparison>:
    # do something

In [None]:
print('one')
print('two')

if x == 1:
    print('one')

cond1 = <complex comparison>
cond2 = <other complex comparison>
if cond1 and cond2:
    # do something

In [None]:
# A single statement that is very hard to understand.
print('\n'.join("%i bytes = %i bits which has %i possible values." % (j, j*8, 256**j-1) for j in (1 << i for i in range(8))))

1 bytes = 8 bits which has 255 possible values.
2 bytes = 16 bits which has 65535 possible values.
4 bytes = 32 bits which has 4294967295 possible values.
8 bytes = 64 bits which has 18446744073709551615 possible values.
16 bytes = 128 bits which has 340282366920938463463374607431768211455 possible values.
32 bytes = 256 bits which has 115792089237316195423570985008687907853269984665640564039457584007913129639935 possible values.
64 bytes = 512 bits which has 13407807929942597099574024998205846127479365820592393377723561443721764030073546976801874298166903427690031858186486050853753882811946569946433649006084095 possible values.
128 bytes = 1024 bits which has 179769313486231590772930519078902473361797697894230657273430081157732675805500963132708477322407536021120113879871393357658789768814416622492847430639474124377767893424865485276302219601246094119453082952085005768838150682342462881473913110540827237163350510684586298239947245938479716304835356329624224137215 possible values.
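
The dense one-liner above can be unpacked into small steps. The following is one possible sparse version, not the only one; the function name `describe_sizes` and the variable names are chosen here for illustration. It keeps the original `256**j - 1` expression, which is strictly the largest unsigned value rather than the number of values.

```python
def describe_sizes(count=8):
    """Return one description line per power-of-two byte size."""
    lines = []
    for exponent in range(count):
        n_bytes = 1 << exponent         # 1, 2, 4, ..., 128
        n_bits = n_bytes * 8
        max_value = 256 ** n_bytes - 1  # same expression as in the one-liner
        lines.append("%i bytes = %i bits which has %i possible values."
                     % (n_bytes, n_bits, max_value))
    return lines


print('\n'.join(describe_sizes()))
```

Each intermediate value now has a name, so a reader can check one step at a time.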


Example from Homework 1. Problem #18

In [None]:
A = int(input())
B = int(input())
print(((((A // B) * A) + ((B // A) * B)) // ((A // B) + (B // A))))


5
8
8


In [None]:
a = int(input())
b = int(input())

quotient_a = a // b
quotient_b = b // a

max_number = (quotient_a * a + quotient_b * b) // (quotient_a + quotient_b)

print(max_number)

8
4
8


## Indentation and whitespace

Use four spaces per indentation level.

You can continue a statement on the next line inside parentheses, brackets and braces. In this case you are technically free to use any indentation you want, so make readability your priority when choosing it.

* Using vertical alignment makes it easy to see all the arguments of a function at once.
* Using additional indentation makes it easy to see where the arguments end and the next line of code starts.

In [None]:
# Aligned with opening delimiter.
foo = long_function_name(var_one, var_two,
                         var_three, var_four)

# More indentation included to distinguish this from the rest.
def long_function_name(
        var_one, var_two, var_three,
        var_four):
    print(var_one)

# Hanging indents should add a level.
foo = long_function_name(
    var_one, var_two,
    var_three, var_four)

In [None]:
# Arguments on first line forbidden when not using vertical alignment.
foo = long_function_name(var_one, var_two,
    var_three, var_four)

# Further indentation required as indentation is not distinguishable.
def long_function_name(
    var_one, var_two, var_three,
    var_four):
    print(var_one)

Use whitespace
* To surround these operators with a single space on either side: assignment (=), augmented assignment (+=, -= etc.), comparisons (==, <, >, !=, <=, >=, in, not in, is, is not), Booleans (and, or, not).
* Around the operators with the lowest priority when an expression mixes operators of different priorities. `hypot2 = x*x + y*y`
* After commas, except between a trailing comma and a close parenthesis

Avoid whitespace
* Immediately inside parentheses. `spam(ham[1], {eggs: 2})`
* Before commas, semicolons, colons
* Between function/list name and the following parentheses/brackets. (`spam(1)` and `spam_list[1]`)
* Around the = sign in keyword arguments or default values. (`print(a, b, sep='')`)

## Line length

The recommended maximum line length is 79 characters (plus the newline character, making the total 80).

Why?

* To open code in narrow editor windows side by side. This is useful in code reviews and comparisons.
* To avoid automatic wrapping, which is harder to read than manual wrapping.

The preferred way of wrapping long lines is by using Python’s implied line continuation inside parentheses, brackets and braces. Long lines can be broken over multiple lines by wrapping expressions in parentheses.

Backslashes may be appropriate in some cases where the preferred way specifically fails. If you have a free choice of method, use parentheses.
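
As a rough illustration of both wrapping styles, the sketch below wraps a long expression in parentheses and then uses a backslash in a `with` statement that opens two files, a case where parentheses were not officially supported before Python 3.10. The file names and values are placeholders invented for this example.

```python
import os
import tempfile

first_value, second_value, third_value = 10, 20, 30

# Preferred: implied line continuation inside parentheses.
total = (first_value
         + second_value
         + third_value)

# Backslash: continuing a `with` statement that manages two files.
with tempfile.TemporaryDirectory() as folder:
    source_path = os.path.join(folder, 'source.txt')
    target_path = os.path.join(folder, 'target.txt')
    with open(source_path, 'w') as source:
        source.write('hello')
    with open(source_path) as source, \
         open(target_path, 'w') as target:
        target.write(source.read())
    with open(target_path) as target:
        copied_text = target.read()

print(total, copied_text)
```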

## Naming variables and functions

The preferred way of naming variables and functions in Python is `lower_case_with_underscores`.

Use descriptive names. Make code self-documenting, so that anyone would be able to understand what happens right away.

In [None]:
# Init a & b to 0
a = 0
b = 0

# Read a & b (readvalues() is assumed to be defined elsewhere)
a, b = readvalues()

# Add a and b
c = a + b

# Divide c by 2
d = c / 2

# Print d
print(d)


In [None]:
number1 = 0
number2 = 0

number1, number2 = readvalues()

# Named `total` rather than `sum` so the built-in sum() is not shadowed.
total = number1 + number2

average = total / 2

print(average)

Use comments to explain the reasoning behind the code.

In [None]:
# Our code doesn't work correctly with USB devices, so we need this extra check.
if special_case_usb:
    do_something_special()

# Code Reading

One of the most common tasks in programming is changing code you don't completely understand. This includes both your own old code and other people's code.

There are two ways to get better at this task. The first is to write good code that is easy to understand. If you follow the Zen of Python, learn good programming practices and name your variables well, you will be able to do this. The code you read most often is your own, so this is an important part of the solution.  
The second way is to get used to reading unfamiliar code, so that you can quickly recognize common patterns and understand how the code works.

A good way to practice reading code, and to learn what Python can do, is to study Python scripts written for various tasks.

For example, you can find them in these repositories.

* https://github.com/geekcomputers/Python
* https://github.com/bamos/python-scripts
* https://github.com/hastagAB/Awesome-Python-Scripts
* https://github.com/realpython/python-scripts

Open a script that interests you, try running it and changing something.

For example, here is a script that prints file information.

https://github.com/geekcomputers/Python/blob/master/fileinfo.py

In [None]:
from __future__ import print_function

import os
import stat
import sys
import time

if sys.version_info >= (3, 0):
    raw_input = input

file_name = raw_input("Enter a file name: ")  # pick a file you have
count = 0
t_char = 0
try:
    with open(file_name) as f:
        line = f.readline()
        t_char += len(line)
        while line:
            count += 1
            line = f.readline()
            t_char += len(line)
except FileNotFoundError as e:
    print(e)
    sys.exit()

file_stats = os.stat(file_name)
# create a dictionary to hold file info
file_info = {
    'fname': file_name,
    'fsize': file_stats[stat.ST_SIZE],
    'f_lm': time.strftime("%d/%m/%Y %I:%M:%S %p",
                          time.localtime(file_stats[stat.ST_MTIME])),
    'f_la': time.strftime("%d/%m/%Y %I:%M:%S %p",
                          time.localtime(file_stats[stat.ST_ATIME])),
    'f_ct': time.strftime("%d/%m/%Y %I:%M:%S %p",
                          time.localtime(file_stats[stat.ST_CTIME])),
    'no_of_lines': count,
    't_char': t_char
}

print("\nfile name =", file_info['fname'])
print("file size =", file_info['fsize'], "bytes")
print("last modified =", file_info['f_lm'])
print("last accessed =", file_info['f_la'])
print("creation time =", file_info['f_ct'])
print("Total number of lines are =", file_info['no_of_lines'])
print("Total number of characters are =", file_info['t_char'])

if stat.S_ISDIR(file_stats[stat.ST_MODE]):
    print("This is a directory")
else:
    print("This is not a directory\n")
    print("A closer look at the os.stat(%s) tuple:" % file_name)
    print(file_stats)
    print("\nThe above tuple has the following sequence:   ")
    print("""st_mode (protection bits), st_ino (inode number), 
    st_dev (device),    st_nlink (number of hard links),    
    st_uid (user ID of owner),   st_gid (group ID of owner),    
    st_size (file size, bytes),  st_atime (last access time, seconds since epoch),  
    st_mtime (last modification time),   st_ctime (time of creation, Windows)"""
          )


A good way to start is to find where the output happens and work backwards. For example, the code above outputs the file size. To find out what exactly provides this information, start at the point where it is printed.

## Tracing the path of file size

In [None]:
print("file size =", file_info['fsize'], "bytes")

In this case, you can easily find the place that outputs the file size because the line of code contains `file size =`. There are only a few ways of writing information to output, so you can search for them. For example, in larger programs you can search for the term "print" and browse through all calls of the print function.

The file size is stored in a dictionary called `file_info` at the key `'fsize'`.

Now you can find where the `file_info` dictionary got the information about the file size. To do that you need to find either where the dictionary was created or where the key `'fsize'` was added to it.

```
    'fsize': file_stats[stat.ST_SIZE],
```
This is part of the dictionary creation. Now you know that the dictionary got the file size from another source, the `file_stats` object. At this point you only know that `file_stats` is something indexable, such as a list, a tuple or another dictionary, and that it is indexed with `stat.ST_SIZE`, which could be an integer or something else.

In [None]:
file_stats = os.stat(file_name)

Finally, `file_stats` is the return value of the library function `os.stat`, which takes the file name as input. This means that if you ever want to find the size of a file, you can call `os.stat` with the file name as an argument, and the result will be an object that contains the file size at the index `stat.ST_SIZE`.
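
What was learned from tracing the file size can be tried out in isolation. The sketch below creates a temporary file (an assumption made so the example does not depend on files on your machine) and reads its size from `os.stat` in two ways: by index, as the script does, and through the `st_size` attribute of the returned object.

```python
import os
import stat
import tempfile

# Create a small temporary file with known content.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'hello world')
    example_path = tmp.name

file_stats = os.stat(example_path)
size_by_index = file_stats[stat.ST_SIZE]  # the style used in the script
size_by_name = file_stats.st_size         # the same value as an attribute

print(size_by_index, size_by_name)
os.remove(example_path)
```

Both lookups return the number of bytes written to the file.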

Notice how naming the variable `file_name` tells you right away what it is. You can be reasonably sure that it is a string containing the name of the file you get the information for. To be absolutely certain that it is the case, you can check where this variable gets its value.

In [None]:
file_name = raw_input("Enter a file name: ")  # pick a file you have

It seems like this is the function that takes information from standard input. However, the function that takes that information in Python 3 is `input()`, not `raw_input()`.

The code near the top of the script explains this.

```
if sys.version_info >= (3, 0):
    raw_input = input
```
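On Python 3.0 or later, this assignment makes the name `raw_input` refer to `input`, so both work the same way. On Python 2 the built-in `raw_input()` is used as is. So this is a compatibility measure that lets the script run on both versions.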

## Tracing the path of character count and line count

If you browse through the `file_info` dictionary initialization code, you may notice that some parts of the file information don't come from the `file_stats` object. `file_name` is an obvious one, but there are also `count` and `t_char`.

You can trace their path in a similar way, going backwards and asking the question "where did this information come from?"

For `count` and `t_char` there is a piece of code that changes them in a loop. So to figure out why these variables hold the values for the line count and character count you need to understand how the loop works and what causes these variables to change. 

In [None]:
count = 0
t_char = 0
try:
    with open(file_name) as f:
        line = f.readline()
        t_char += len(line)
        while line:
            count += 1
            line = f.readline()
            t_char += len(line)
except FileNotFoundError as e:
    print(e)
    sys.exit()

There is exception handling, a `with` statement and a loop here.

You can read them the same way, going from what you know (`count` and `t_char`) to the parts of code these variables interact with. `t_char` is increased by the `len` of the `line`. Now `line` is the new unknown object.

`line = f.readline()` explains what `line` is: a string containing one line from the file `f`. Now `f` is the next part that is unaccounted for.

`f` is the file object created by the `with` statement. The file it represents is opened with `open(file_name)`, so you can be sure that it's the same file as in the previous case, because the same variable `file_name` contains its name.
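
Once the loop is understood, one way to check that understanding is to rewrite it in a simpler form and compare results. The sketch below is such a rewrite, not the original author's code; the function name `count_lines_and_chars` and the temporary sample file are assumptions made for this example.

```python
import os
import tempfile


def count_lines_and_chars(path):
    """Return (line_count, char_count) for the text file at `path`."""
    line_count = 0
    char_count = 0
    with open(path) as f:
        for line in f:  # each iteration yields one line, like readline()
            line_count += 1
            char_count += len(line)
    return line_count, char_count


# Write a two-line sample file so the result is easy to verify by hand.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first line\nsecond\n')
    sample_path = tmp.name

result = count_lines_and_chars(sample_path)
print(result)
os.remove(sample_path)
```

Iterating over the file object replaces the repeated `readline()` calls, and the counts include the newline characters, just as `len(line)` does in the original loop.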

## Another example

By following similar steps you can analyze and understand many other pieces of code.

Here is another example of a simple script dealing with files.

https://github.com/geekcomputers/Python/blob/master/folder_size.py

In [None]:
import os  # Load the os module, used to walk directories and read file sizes

directory = input("Enter directory name: ")

dir_size = 0  # Set the size to 0
fsizedicr = {'Bytes': 1,
             'Kilobytes': float(1) / 1024,
             'Megabytes': float(1) / (1024 * 1024),
             'Gigabytes': float(1) / (1024 * 1024 * 1024)}
for (path, dirs, files) in os.walk(
        directory):  # Walk through all the directories. For each iteration, os.walk returns the folders, subfolders and files in the dir.
    for file in files:  # Get all the files
        filename = os.path.join(path, file)
        dir_size += os.path.getsize(filename)  # Add the size of each file in the root dir to get the total size.

fsizeList = [str(round(fsizedicr[key] * dir_size, 2)) + " " + key for key in fsizedicr]  # List of units

if dir_size == 0:
    print("File Empty")  # Sanity check to eliminate corner-case of empty file.
else:
    for units in sorted(fsizeList)[::-1]:  # Reverse sort list of units so smallest magnitude units print first.
        print("Folder Size: " + units)


Just like in the previous example, you can follow the value of a variable and see where and how it changes to understand how the whole piece of code works.
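
Following `dir_size` through the script shows that it is only changed inside the `os.walk` loop. That part can be isolated as a small function to experiment with. This is a sketch under the assumption that only regular files are present; the name `directory_size` and the temporary test folder are invented for the example.

```python
import os
import tempfile


def directory_size(directory):
    """Return the total size in bytes of all files under `directory`."""
    total_bytes = 0
    for path, dirs, files in os.walk(directory):
        for name in files:
            total_bytes += os.path.getsize(os.path.join(path, name))
    return total_bytes


# Build a tiny folder tree with known file sizes.
with tempfile.TemporaryDirectory() as folder:
    os.mkdir(os.path.join(folder, 'sub'))
    with open(os.path.join(folder, 'a.bin'), 'wb') as f:
        f.write(b'x' * 100)
    with open(os.path.join(folder, 'sub', 'b.bin'), 'wb') as f:
        f.write(b'y' * 50)
    size = directory_size(folder)

print(size)
```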

## Task 1

Modify the following script to include coordinates and humidity data in the output.

Print the JSON data to see the structure of the data given to you and add two functions: one to retrieve coordinates and another one to retrieve humidity from the JSON data.


In [None]:
import requests

def get_temperature(json_data):
    temp_in_celsius = json_data['main']['temp']
    return temp_in_celsius

def get_weather_type(json_data):
    weather_type = json_data['weather'][0]['description']
    return weather_type

def get_wind_speed(json_data):
    wind_speed = json_data['wind']['speed']
    return wind_speed



def get_weather_data(json_data, city):
    weather_type = get_weather_type(json_data)
    temperature = get_temperature(json_data)
    wind_speed = get_wind_speed(json_data)
    # With units=metric the API reports temperature in Celsius and wind speed in m/s.
    return (
        "The weather in {} is currently {} with a temperature of {} degrees "
        "Celsius and wind speeds reaching {} m/s"
    ).format(city, weather_type, temperature, wind_speed)


def main():
    api_address = 'https://api.openweathermap.org/data/2.5/weather?appid=a10fd8a212e47edf8d946f26fb4cdef8&q='
    city = input("City Name : ")
    units_format = "&units=metric"
    final_url = api_address + city + units_format
    json_data = requests.get(final_url).json()
    weather_details = get_weather_data(json_data, city)
    # print formatted data
    print(weather_details)



main()


## Task 2

Modify the following script.

Right now it creates a Markdown table with information about several repositories. Add information about each repository's topics.

You can find the relevant documentation about the PyGithub package here: https://pygithub.readthedocs.io/en/latest/examples/Repository.html

In [None]:
!pip install PyGithub

In [None]:
from github import Github
import argparse
import os
import sys

repos = ['hastagAB/Awesome-Python-Scripts', 'google/googletest']
github = Github(os.getenv("GITHUB_TOKEN"))


def sanitize_for_md(s):
    # Escape asterisks so Markdown does not treat them as emphasis markers.
    s = s.replace("*", "\\*")
    return s

# print("Generated on {}.\n".format(time.strftime("%Y-%m-%d")))
print("Name | Stargazers | Description")
print("|".join(["----"] * 3))
for r_name in sorted(repos, key=lambda v: v.upper()):
    try:
        r = github.get_repo(r_name)
    except Exception:  # avoid a bare except, which would also catch KeyboardInterrupt
        sys.stderr.write("Error: Repository '{}' not found.\n".format(r_name))
        sys.exit(-1)
    content = " | ".join([
        "[{}]({})".format(r.full_name, r.html_url),
        str(r.stargazers_count),
        sanitize_for_md(r.description or "")  # description is None when a repository has none
    ])
    print(content)