# Using Python Language Features and the Standard Library

- Before moving on to data structures and algorithms, we should go through some Python basics
- We are going to focus on Pythonic approaches

## 1.1 Built-in functions

In [None]:
import random
from timeit import timeit

N = 100000  # Number of elements in the list

# Ensure every list is the same
random.seed(12)
my_data = [random.random() for i in range(N)]

- Let's look at summing the values in a list in a C style and in a Python style
- In C, you would loop over the indices of the array; written in Python, that style looks like this:

In [None]:
def manualSumC():
    n = 0
    for i in range(len(my_data)):
        n += my_data[i]
    return n

- In Python you can loop directly over the list of elements instead

In [None]:
def manualSumPy(): 
    n = 0
    for evt_count in my_data:
        n += evt_count
    return n


- There is also a built-in sum() function in Python

In [None]:
def builtinSum(): 
    return sum(my_data)

- If we compare all three, the fastest is the built-in sum(), then the Pythonic loop, and the slowest is the C-style loop
- By leveraging Python's built-ins we can write code that is both faster and cleaner

In [None]:
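repeats = 1000
# timeit() returns the total time in seconds for all repeats; with 1000 repeats
# that number is also the mean time per call in milliseconds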
print(f"manualSumC: {timeit(manualSumC, globals=globals(), number=repeats):.3f}ms")
print(f"manualSumPy: {timeit(manualSumPy, globals=globals(), number=repeats):.3f}ms")
print(f"builtinSum: {timeit(builtinSum, globals=globals(), number=repeats):.3f}ms")
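The "ms" label in the timing code only holds because the number of repeats is 1000. A minimal sketch of a more general conversion, assuming a hypothetical helper `mean_ms_per_call` and a toy function `sum_small_list` (neither is part of the original notebook):

```python
from timeit import timeit


def mean_ms_per_call(func, number=1000):
    """Return the mean run time of func() in milliseconds (hypothetical helper)."""
    total_seconds = timeit(func, number=number)  # total time for all calls
    return total_seconds / number * 1000  # seconds per call -> milliseconds


def sum_small_list():
    # Toy workload used only for this illustration
    return sum([0.5] * 1000)


print(f"sum_small_list: {mean_ms_per_call(sum_small_list, number=200):.4f} ms per call")
```

With a helper like this, the reported unit stays correct even if the number of repeats is changed.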

- The built-in sum() is fastest because built-in functions are typically implemented in C inside CPython, so the loop runs as compiled code instead of being executed step by step by the Python interpreter
- What is CPython?

CPython is the main implementation of Python.
It’s written in the C programming language.
When you install “Python” from python.org, you’re almost always using CPython.
So when you write Python code, CPython is the program that interprets it and makes it run.
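To see this on your own machine, the standard library can report which implementation is running and whether a function is a compiled built-in. A small sketch, where `manual_sum` is a hypothetical pure-Python function added only for comparison:

```python
import inspect
import platform

# Name of the running implementation, e.g. "CPython" or "PyPy"
print(platform.python_implementation())


def manual_sum(values):
    # Pure-Python loop, executed by the interpreter one step at a time
    total = 0
    for v in values:
        total += v
    return total


# On CPython, sum is a compiled built-in, while manual_sum is an ordinary Python function
print(inspect.isbuiltin(sum))
print(inspect.isbuiltin(manual_sum))
print(inspect.isfunction(manual_sum))
```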

- In particular, built-in functions that are passed an iterable (e.g. a list) are likely to provide the greatest performance benefits. The Python documentation provides equivalent Python code for many of these cases.

- all(): boolean AND of all items
- any(): boolean OR of all items
- max(): return the maximum item
- min(): return the minimum item
- sum(): return the sum of all items
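As an illustration of what these built-ins replace, here is a rough sketch of hand-written equivalents for any() and max(). The names `manual_any` and `manual_max` are hypothetical, and `manual_max` assumes a non-empty iterable:

```python
values = [3, 0, 7, 2]


def manual_any(iterable):
    # Return True as soon as one truthy item is found
    for item in iterable:
        if item:
            return True
    return False


def manual_max(iterable):
    # Assumes the iterable is not empty
    it = iter(iterable)
    largest = next(it)
    for item in it:
        if item > largest:
            largest = item
    return largest


# The manual versions agree with the built-ins
assert manual_any(values) == any(values)
assert manual_max(values) == max(values)

print(all(values), any(values), max(values), min(values), sum(values))
# → False True 7 0 12
```

The built-in calls are one line each and avoid running the loop in the interpreter.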

- It’s usually better to tell Python what you want done (at a high level), rather than writing out all the steps. Built-ins and libraries will often do the work in optimised C code for you, and then just hand back a Python object.

## 1.2 Searching an element in a list

- As another example, consider searching for elements in a list
- As before, we are going to compare a manual Python search with the Pythonic way of searching for elements
- Let's first generate the inputs

In [None]:
import random

N = 2500  # Number of elements in list
M = 2  # N*M == Range over which the elements span

def generateInputs():
    random.seed(12)  # Ensure every list is the same
    return [random.randint(0, int(N*M)) for i in range(N)]

- The manual search is a linear search which iterates through the list

In [None]:
def manualSearch():
    ls = generateInputs()
    ct = 0
    for i in range(0, int(N*M), M):
        for j in range(0, len(ls)):
            if ls[j] == i:
                ct += 1
                break
    return ct

- operatorSearch() uses the in operator to perform each search, which allows CPython to implement the inner loop in its C back-end

In [None]:
def operatorSearch():
    ls = generateInputs()
    ct = 0
    for i in range(0, int(N*M), M):
        if i in ls:
            ct += 1
    return ct

- In this benchmark the manual search is about 5x slower than the Pythonic implementation; the exact ratio will vary between machines and Python versions

In [None]:
repeats = 1000
gen_time = timeit(generateInputs, number=repeats)
print(f"manualSearch: {timeit(manualSearch, number=repeats)-gen_time:.2f}ms")
print(f"operatorSearch: {timeit(operatorSearch, number=repeats)-gen_time:.2f}ms")
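Before trusting the timings, it is worth checking that both approaches count the same matches. A minimal sketch on a smaller input, using hypothetical parameterised helpers `count_manual` and `count_with_in` that mirror the two search styles (they are not part of the original code):

```python
import random


def count_manual(values, targets):
    # Explicit linear search: stop at the first match for each target
    count = 0
    for target in targets:
        for value in values:
            if value == target:
                count += 1
                break
    return count


def count_with_in(values, targets):
    # Same logic, but the inner loop is handled by the in operator
    count = 0
    for target in targets:
        if target in values:
            count += 1
    return count


random.seed(12)  # Ensure the list is reproducible
values = [random.randint(0, 50) for _ in range(25)]
targets = range(0, 50, 2)

assert count_manual(values, targets) == count_with_in(values, targets)
print(count_with_in(values, targets))
```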

## 1.3 Parsing data from a text file

- Let's take on a little challenge 
- Let’s say we have read in some data from a text file, each line containing a time bin and a mean energy:

In [None]:
f = [
    ' 0000   0.9819 ',
    ' 0001   0.3435 ',
    # ...
    ' 0099   0.2275 ',
    ' 0100   0.7067 ',
    # ...
]

- If you have a C programming background, you might write the following code to parse the data into a dictionary:

In [None]:
def manualSplit():
    data = {}
    for line in f:
        first_char = line.find("0")
        end_time = line.find(" ", first_char, -1)

        energy_found = line.find(".", end_time, -1)
        begin_energy = line.rfind(" ", end_time, energy_found)
        end_energy = line.find(" ", energy_found)
        if end_energy == -1:
            end_energy = len(line)
        
        time = line[first_char:end_time]
        energy = line[begin_energy + 1:end_energy]

        data[time] = energy
    return data

Solution: Pythonic code using split()!
- Much shorter
- Much easier to read
- More flexible (doesn’t care how many spaces there are)

In [None]:
def builtinSplit():
    data = {}
    for line in f:
        time, energy = line.split()
        data[time] = energy
    return data
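To illustrate the flexibility, the sketch below runs split() on lines with different whitespace. Converting the fields to int and float is an assumption added here for illustration; the functions above keep both values as strings.

```python
lines = [
    ' 0000   0.9819 ',   # several spaces
    '0001 0.3435',       # single space, no padding
    '\t0099\t0.2275\n',  # tabs and a trailing newline
]

# With no arguments, split() separates on any run of whitespace
print(lines[0].split())  # → ['0000', '0.9819']

parsed = {}
for line in lines:
    time, energy = line.split()
    # Assumption: numeric types are wanted instead of strings
    parsed[int(time)] = float(energy)

print(parsed)  # → {0: 0.9819, 1: 0.3435, 99: 0.2275}
```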

- Let's compare those two methods

In [None]:
N = 10_000  # Number of elements in the list

# Ensure every list is the same
random.seed(12)
f = [f" {i:0>6d} {random.random():8.4f} " for i in range(N)]

repeats = 1000
print(f"manualSplit: {timeit(manualSplit, globals=globals(), number=repeats):.3f}ms")
print(f"builtinSplit: {timeit(builtinSplit, globals=globals(), number=repeats):.3f}ms")

Why is it faster?
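- Even though split() is doing something similar to the manual version, it’s implemented in C inside CPython. The manual version also makes several separate method calls (find(), rfind() and slicing) for every line, and each call adds interpreter overhead, whereas split() does all the work in a single call.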