# Introduction to Python

**Feb 6, 2019**

_Material adapted from:_  
Copyright [Steve Phelps](http://sphelps.net) 2014
_by Rebecca Lowdon_

* For LaunchCode CoderGirl Data Science track
* Updated for Python3


# Modules

* Beyond Python's built-in functions, there are many add-on libraries, called _packages_ or _modules_, that provide suites of additional functions. A function that belongs to a module is called with dot notation (`module.function()`); it should not be confused with a _method_, which belongs to an object such as a list.
* Modules can be specialized (e.g. `Biopython` for computational biology) or general-purpose (e.g. the standard library's `statistics`).
* `numpy` is a popular module for scientific computing; we'll use it for its linear algebra and random number capabilities.
* NumPy provides another data structure, the _array_ (type `numpy.ndarray`). Arrays are similar to lists, but all of their elements must be of the same type. Unlike lists, which are one-dimensional unless nested, arrays can be N-dimensional.
* Once a module is installed on your machine, it is loaded into Python with an `import` statement. At the same time, we can give the module a short _alias_ using `as`, which makes it convenient to call `numpy` functions later.

In [10]:
import numpy as np
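
The `import ... as ...` form is only one option. As a small sketch, using the standard-library `statistics` and `math` modules (which are not otherwise used in this notebook), here are two common import forms side by side:

```python
import statistics as st     # import a whole module under an alias
from math import sqrt       # import one specific function by name

print(st.mean([1, 2, 3, 4]))  # 2.5
print(sqrt(16.0))             # 4.0
```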

# Arrays

- NumPy arrays are fixed-length, N-dimensional, and hold values of a single type.
- Because we imported `numpy` as `np`, we call its functions by prefixing them with `np.`.
- The function `np.array()` creates an array from a list.


In [3]:
x = np.array([0, 1, 2, 3, 4])
print(x)
print(type(x))

[0 1 2 3 4]
<class 'numpy.ndarray'>
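
Two of the properties listed above, shown in code: a nested list becomes a multi-dimensional array, and mixed numeric types are converted to one common type. A short sketch with arbitrary values:

```python
import numpy as np

# A list of lists becomes a 2-dimensional array (2 rows, 3 columns)
m = np.array([[1, 2, 3], [4, 5, 6]])
print(m.ndim)   # 2
print(m.shape)  # (2, 3)

# Mixing ints and floats: NumPy stores every element as the same type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
```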


# Functions over arrays

- When we use arithmetic operators on arrays, we create a new array with the result of applying the operator to each element.

In [4]:
y = x * 2
print(y)

[0 2 4 6 8]
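
Arithmetic between two arrays of the same shape also works element by element. This differs from plain Python lists, where `+` concatenates and `*` repeats. A sketch:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])
print((a + b).tolist())  # [11, 22, 33]  (element-wise addition)

# The same operators on plain lists do something different
print([1, 2, 3] + [10, 20, 30])  # [1, 2, 3, 10, 20, 30]  (concatenation)
print([1, 2, 3] * 2)             # [1, 2, 3, 1, 2, 3]  (repetition)
```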


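- The same goes for many functions; for example, the built-in `abs()` returns a new array holding the absolute value of each element: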

In [5]:
x = np.array([-1, 2, 3, -4])
y = abs(x)
print(y)

[1 2 3 4]
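
NumPy also provides its own element-wise versions of many mathematical functions, such as `np.abs()` and `np.sqrt()`. A brief sketch:

```python
import numpy as np

x = np.array([1, 4, 9, 16])
print(np.sqrt(x).tolist())                         # [1.0, 2.0, 3.0, 4.0]
print(np.abs(np.array([-1, 2, 3, -4])).tolist())   # [1, 2, 3, 4]
```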


# Populating Arrays

- To populate an array with a range of values we use the `np.arange()` function:


In [7]:
x = np.arange(0, 10)
print(x)

[0 1 2 3 4 5 6 7 8 9]


- We can also use floating point increments.


In [8]:
x = np.arange(0, 1, 0.1)
print(x)

[0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]
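
As the `np.arange` docstring below also advises, `np.linspace()` is usually a safer choice for non-integer steps: instead of a step size you give the number of points, and the stop value is included by default. A sketch:

```python
import numpy as np

# 5 evenly spaced points from 0 to 1, endpoint included
points = np.linspace(0, 1, 5)
print(points.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]

# arange excludes the stop value, so this gives 10 values, not 11
print(len(np.arange(0, 1, 0.1)))  # 10
```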


## <font color='green'>Sidebar: finding help</font>

* Typing `?` before a function name displays its help text (its _docstring_) in Jupyter Notebook.

In [9]:
?np.arange

[0;31mDocstring:[0m
arange([start,] stop[, step,], dtype=None)

Return evenly spaced values within a given interval.

Values are generated within the half-open interval ``[start, stop)``
(in other words, the interval including `start` but excluding `stop`).
For integer arguments the function is equivalent to the Python built-in
`range` function, but returns an ndarray rather than a list.

When using a non-integer step, such as 0.1, the results will often not
be consistent.  It is better to use `numpy.linspace` for these cases.

Parameters
----------
start : number, optional
    Start of interval.  The interval includes this value.  The default
    start value is 0.
stop : number
    End of interval.  The interval does not include this value, except
    in some cases where `step` is not an integer and floating point
    round-off affects the length of `out`.
step : number, optional
    Spacing between values.  For any output `out`, this is the distance
    between two adjacent values,

* Read the **docstring** to learn what each argument to the function `np.arange()` means.
* How many arguments are required?
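
Outside Jupyter, where `?` is not available, the built-in `help()` function shows the same docstring, and the text is also stored in the function's `__doc__` attribute. A minimal sketch:

```python
import numpy as np

# Works in any Python session, not just Jupyter
help(np.arange)

# The docstring is an ordinary string attribute of the function
doc = np.arange.__doc__
print(doc.strip().splitlines()[0])
```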

# Basic Plotting

- We will use a module called `matplotlib` to plot some simple graphs.

- This module provides functions which are very similar to MATLAB plotting commands.


In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

y = x*2 + 5
plt.plot(x, y)
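
In a script, or when a plot needs labels, it can help to use matplotlib's object-oriented interface: `plt.subplots()` returns a figure and an axes object whose methods set the labels and title. A minimal sketch re-creating the plot above (the label and title text are our own):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 1, 0.1)
y = x * 2 + 5

# Create a figure and axes explicitly, then label them
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel("x")
ax.set_ylabel("y = 2x + 5")
ax.set_title("A straight line")
plt.show()  # not needed in Jupyter with %matplotlib inline, but needed in a script
```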


# Plotting a sine curve

In [None]:
from numpy import pi, sin

x = np.arange(0, 2*pi, 0.01)
y = sin(x)
plt.plot(x, y)

# Plotting a histogram

- We can use the `hist()` function in `matplotlib` to plot a histogram

In [None]:
# Generate some random data
data = np.random.randn(1000)

counts, bins, patches = plt.hist(data)  # plt.hist() returns the counts, the bin edges, and the drawn bars

## <font color='green'>Dive deeper</font>
* What does `np.random.randn()` do?
* Why did we use it above?

In [None]:
?np.random.randn
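
`np.random.randn()` draws samples from the _standard normal_ distribution (mean 0, standard deviation 1), which is why the histogram above is bell-shaped. A quick sketch to check this empirically; it uses NumPy's newer `default_rng()` generator with a fixed seed (our choice, not from the original) so the result is reproducible:

```python
import numpy as np

# Old-style call: the arguments give the output shape
print(np.random.randn(2, 3).shape)  # (2, 3)

# Newer Generator API, seeded for reproducibility
rng = np.random.default_rng(seed=0)
samples = rng.standard_normal(10_000)

# With many samples, the mean is close to 0 and the standard deviation close to 1
print(round(samples.mean(), 2), round(samples.std(), 2))
```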

# Computing histograms as matrices

- The function `histogram()` in the `numpy` module counts frequencies into bins. It returns a tuple of two 1-dimensional arrays: the count in each bin, and the bin edges (one more edge than there are bins).

In [None]:
np.histogram(data)

In [None]:
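counts, bin_edges = np.histogram(data)  # unpack the two returned arrays
len(counts), len(bin_edges)  # 10 bins by default, so 11 edges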
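
To see exactly what `np.histogram()` returns, here is a small sketch on hand-made data (the values and `bins=3` are arbitrary choices for illustration):

```python
import numpy as np

counts, bin_edges = np.histogram([1, 2, 2, 3, 3, 3], bins=3)
print(counts.tolist())  # [1, 2, 3]
print(len(bin_edges))   # 4 (one more edge than there are bins)
```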

# Defining new functions

* We can create custom functions with the keyword `def`.

In [None]:
def squared(x):
    return x ** 2

print(squared(5))
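
Functions can take several parameters, give some of them default values, and begin with a docstring, the same text that `?` displays in Jupyter. A sketch with a hypothetical `power()` function:

```python
def power(x, exponent=2):
    """Return x raised to the given exponent (2 by default)."""
    return x ** exponent

print(power(5))      # 25, using the default exponent
print(power(2, 10))  # 1024
```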

# Local Variables

- Variables created inside functions are _local_ to that function.

- They are not accessible to code outside of that function.

In [None]:
def squared(x):
    result = x ** 2
    return result

print(squared(5))

In [None]:
print(result)

* A `NameError` might mean a variable you are trying to call is out of scope.
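
Because `result` inside the function is local, it can even share a name with a variable defined outside the function without affecting it. A small sketch (the global `result` is added here for illustration):

```python
result = "global value"

def squared(x):
    result = x ** 2  # a new local variable that shadows the global one
    return result

print(squared(5))  # 25
print(result)      # global value (unchanged by the call)
```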

# Functional Programming

- Functions are first-class citizens in Python.

- They can be passed around just like any other value.





In [None]:
print(squared)

In [None]:
y = squared
print(y)

In [None]:
print(y(5))
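
Because functions are values, they can also be passed as arguments to other functions. A minimal sketch with a hypothetical helper, `apply_twice()`:

```python
def squared(x):
    return x ** 2

def apply_twice(f, x):
    """Hypothetical helper: apply the function f to x two times."""
    return f(f(x))

print(apply_twice(squared, 3))  # 81, i.e. squared(squared(3))
```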

# Mapping the elements of a collection

- We can apply a function to each element of a collection using the built-in function `map()`.

- This works with any iterable collection: list, set, tuple, or string.

- `map()` takes two arguments: _another function_, and the collection we want to apply it to.

- It returns a _map object_, a lazy iterator that computes the results only when they are requested. We can convert it into a list with `list()`.

In [None]:
map(squared, [1, 2, 3, 4])

In [None]:
list(map(squared, [1, 2, 3, 4]))
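
Two details are worth seeing in code. A map object is an iterator, so it can be consumed only once; and `map()` accepts several collections, passing one element from each to the function (here the built-in `pow()`):

```python
def squared(x):
    return x ** 2

squares = map(squared, [1, 2, 3, 4])
print(list(squares))  # [1, 4, 9, 16]
print(list(squares))  # [] because the map object is already exhausted

# With two lists, pow() receives one element from each: pow(2, 1), pow(3, 2), pow(4, 3)
print(list(map(pow, [2, 3, 4], [1, 2, 3])))  # [2, 9, 64]
```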

# List Comprehensions

- Because this is such a common operation, Python has a special syntax to do the same thing, called a _list comprehension_.


In [None]:
[squared(i) for i in [1, 2, 3, 4]]

- If we want a set instead of a list, we can use a _set comprehension_, written with curly braces:

In [None]:
{squared(i) for i in [1, 2, 3, 4]}
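
The same syntax also builds dictionaries, and a set comprehension stores each distinct result only once. A short sketch (the input lists are arbitrary):

```python
def squared(x):
    return x ** 2

# Dictionary comprehension: key: value pairs
squares = {i: squared(i) for i in [1, 2, 3, 4]}
print(squares)  # {1: 1, 2: 4, 3: 9, 4: 16}

# Set comprehension: -2 and 2 both give 4, which is stored only once
unique_squares = {squared(i) for i in [-2, -1, 1, 2]}
print(sorted(unique_squares))  # [1, 4]
```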

# Iterating over multiple collections at once
## Cartesian product using list comprehensions

![Cartesian product](../../../images/Cartesian_Product.png)

The [Cartesian product](https://en.wikipedia.org/wiki/Cartesian_product) $A \times B$ of two collections is the set of all ordered pairs $(a, b)$ with $a \in A$ and $b \in B$. It can be expressed by using multiple `for` clauses in a comprehension.


In [None]:
A = {'x', 'y', 'z'}
B = {1, 2, 3}
{(a,b) for a in A for b in B}
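
The standard-library function `itertools.product()` produces the same pairs. A sketch comparing the two:

```python
from itertools import product

A = {'x', 'y', 'z'}
B = {1, 2, 3}

pairs_comprehension = {(a, b) for a in A for b in B}
pairs_product = set(product(A, B))

print(pairs_comprehension == pairs_product)  # True
print(len(pairs_product))                    # 9, i.e. 3 * 3 pairs
```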

## Cartesian products with other collections

- The syntax for Cartesian products can be used with any collection type.


In [None]:
first_names = ('Steve', 'John', 'Peter')
surnames = ('Smith', 'Doe')

[(first_name, surname) for first_name in first_names for surname in surnames]

# Anonymous Function Literals

- We can also write _anonymous_ functions.
- These are function literals that do not need to be given a name.
- They are called _lambda expressions_ (after the $\lambda$-calculus).
- The syntax is `lambda <arguments>: <expression>`; the value of the expression is returned automatically, without a `return` keyword.
- When combined with `map()`, the lambda is passed as the first argument, followed by the collection whose values it is applied to.

In [None]:
list(map(lambda x: x ** 2, [1, 2, 3, 4]))
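
Lambdas are handy wherever a small throwaway function is expected, for example as the `key` argument of the built-in `sorted()`, and they can take more than one argument. A sketch with arbitrary sample data:

```python
words = ['banana', 'apple', 'Cherry']

# Default sort compares character codes, so uppercase 'C' comes first
print(sorted(words))                           # ['Cherry', 'apple', 'banana']

# Sort case-insensitively by passing a lambda as the key
print(sorted(words, key=lambda s: s.lower()))  # ['apple', 'banana', 'Cherry']

# A lambda with two arguments, used with map() over two lists
print(list(map(lambda x, y: x * y, [1, 2, 3], [4, 5, 6])))  # [4, 10, 18]
```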

# Filtering data

- We can filter a list by applying a _predicate_ to each element of the list.

- A predicate is a function which takes a single argument, and returns a boolean value.

- `filter(p, X)` corresponds to the set-builder expression $\{ x \in X : p(x) \}$, except that it keeps the original order and returns a lazy `filter` object rather than a set.
- Below, we filter the list `[-5, 2, 3, -10, 0, 1]` with the predicate `lambda x: x > 0`.


In [None]:
list(filter(lambda x: x > 0, [-5, 2, 3, -10, 0, 1]))

* We can use both `filter()` and `map()` on other collections such as strings or sets.

In [None]:
list(filter(lambda x: x != ' ', 'hello world'))

In [None]:
list(map(ord, 'hello world'))

In [None]:
list(filter(lambda x: x > 0, {-5, 2, 3, -10, 0, 1}))
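
A related shortcut: if the predicate is `None`, `filter()` keeps only the elements that are truthy (non-zero numbers, non-empty strings and lists, and so on). A sketch with mixed sample values:

```python
values = [0, 1, '', 'a', None, 2, []]

# 0, '', None and [] are falsy, so they are dropped
print(list(filter(None, values)))  # [1, 'a', 2]
```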

# Filtering using a list comprehension

- Again, because this is such a common operation, we can use simpler syntax to say the same thing.

- We can express a filter using a list-comprehension by using the keyword `if`:

In [None]:
data = [-5, 2, 3, -10, 0, 1]
[x for x in data if x > 0]

- We can also filter and then map in the same expression:

In [None]:
from numpy import sqrt
[sqrt(x) for x in data if x > 0]
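
Note where the `if` goes. An `if` at the end of a comprehension filters elements out, while an `if ... else` conditional expression placed before `for` transforms every element and keeps them all. A sketch on the same `data` list:

```python
data = [-5, 2, 3, -10, 0, 1]

# `if` at the end filters: non-positive values are dropped
print([x for x in data if x > 0])         # [2, 3, 1]

# `if ... else` before `for` transforms: every value is kept, non-positives become 0
print([x if x > 0 else 0 for x in data])  # [0, 2, 3, 0, 0, 1]
```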

# The reduce function

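- The `reduce()` function (imported from the standard-library `functools` module in Python 3) applies a two-argument function cumulatively to the items of a collection, from left to right, reducing them to a _single_ return value. For example, `reduce(f, [a, b, c])` computes `f(f(a, b), c)`.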

In [None]:
from functools import reduce
reduce(lambda x, y: x + y, [0, 1, 2, 3, 4, 5])
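
A few more `reduce()` calls, sketched to show the left-to-right accumulation and the optional starting value:

```python
from functools import reduce

# reduce(f, [1, 2, 3, 4]) computes f(f(f(1, 2), 3), 4)
total_product = reduce(lambda acc, x: acc * x, [1, 2, 3, 4])
print(total_product)  # 24

# An optional third argument is the starting value; it is returned for an empty list
empty_sum = reduce(lambda acc, x: acc + x, [], 0)
print(empty_sum)  # 0

# For plain sums, the built-in sum() gives the same result as the reduce() call above
print(sum([0, 1, 2, 3, 4, 5]))  # 15
```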

# Big Data

- The `map()` and `reduce()` functions form the basis of the map-reduce programming model.

- [Map-reduce](https://en.wikipedia.org/wiki/MapReduce) is the basis of modern highly-distributed large-scale computing frameworks.

- It is used in distributed computing frameworks such as Hadoop and Apache Spark.

- See [these examples in Python](https://spark.apache.org/examples.html) for Apache Spark.
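
To make the idea concrete, here is a minimal single-machine sketch of the classic word-count example using only `map()` and `reduce()`. The documents are made-up sample strings, and `collections.Counter` (standard library) stands in for the per-key aggregation that a real framework would distribute across machines:

```python
from collections import Counter
from functools import reduce

# Hypothetical sample "documents"
documents = ["the cat sat", "the dog sat", "the end"]

# Map step: turn each document into a Counter of its words
partial_counts = map(lambda doc: Counter(doc.split()), documents)

# Reduce step: merge the Counters pairwise into one overall count
total = reduce(lambda a, b: a + b, partial_counts, Counter())

print(total["the"], total["sat"], total["cat"])  # 3 2 1
```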

# Reading Text Files

- To read an entire text file as a list of lines use the `readlines()` method of a file object.


In [None]:
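# /etc/group exists on Linux and macOS; on Windows, substitute the path of any text file
with open('/etc/group') as f:  # `with` closes the file automatically
    result = f.readlines()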


In [None]:
# Print the first line
print(result[0])

To concatenate into a single string:


In [None]:
single_string = ''.join(result)

In [None]:
single_string
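
Two other common ways to read a file, sketched here against a small temporary file so the example does not depend on `/etc/group`: `read()` returns the whole file as one string (the same text as joining `readlines()`), and looping over the file object yields one line at a time without loading the whole file at once.

```python
import os
import tempfile

# Write a small hypothetical sample file so the example is self-contained
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("alpha\nbeta\ngamma\n")
    path = tmp.name

with open(path) as f:
    text = f.read()  # the whole file as one string

with open(path) as f:
    lines = [line.rstrip('\n') for line in f]  # one line at a time, newline stripped

os.remove(path)  # clean up the temporary file
print(repr(text))  # 'alpha\nbeta\ngamma\n'
print(lines)       # ['alpha', 'beta', 'gamma']
```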