# Module 1 - Introducing Modules and NumPy

### Introduction

#### *Our goals today are to be able to*:  

- Identify and import Python modules
- Use the Python Standard Library
- Install new modules if we need them
- Identify differences between NumPy and base Python in usage and operation

#### *Big questions for this lesson*:  
- What is a package, what do packages do, and why might we want to use them?
- When do we want to use NumPy?

### 1. Importing Python Libraries


Previously, we wrote a function to calculate the mean of a list. That was tedious.

Thankfully, other people have written and optimized functions and wrapped them into **modules and packages** that we can call and use in our analysis.

To import a package, type `import` followed by the name of the library as shown below, or use `from` and `import` to import specific objects.

In [1]:
import math
from collections import Counter
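
As a minimal sketch of what the two import styles look like in use: a name imported with `import math` is reached through the `math.` prefix, while a name imported with `from ... import ...` is used directly. The sample values and the word `"mississippi"` are made up for illustration.

```python
import math                      # whole module: names are reached as math.<name>
from collections import Counter  # single object: used without a prefix

values = [4, 3, 25, 40, 62, 20]  # hypothetical sample data

# math.fsum adds floats accurately; dividing by the length gives the mean
mean = math.fsum(values) / len(values)
print(mean)

# Counter tallies how often each item appears in an iterable
letter_counts = Counter("mississippi")
print(letter_counts.most_common(2))  # [('i', 4), ('s', 4)]
```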

In [42]:
import numpy as np
!pip install --upgrade numpy
np.__version__

Requirement already up-to-date: numpy in c:\users\seawr\anaconda3\envs\learn-env\lib\site-packages (1.18.5)

'1.18.1'

In [21]:
z = np.array([1,2,3,4.2])
z
z.dtype
z.__truediv__(2)   # the method that z / 2 calls behind the scenes

array([0.5, 1. , 1.5, 2.1])

In [4]:
import numpy
import numpy as np
x = np.array([1,2,3])
x

array([1, 2, 3])

In [None]:
# os & sys
import os
import sys
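
A short sketch of what `os` and `sys` are commonly used for. The path pieces `"data"`, `"raw"`, and `"file.csv"` are hypothetical and do not need to exist; the output will differ from machine to machine.

```python
import os
import sys

cwd = os.getcwd()                                   # current working directory
print(cwd)

# os.path.join builds a path with the correct separator for this operating system
example_path = os.path.join("data", "raw", "file.csv")
print(example_path)

print(sys.version_info.major)   # major version of the running interpreter
print(sys.executable)           # path of the Python executable running this code
```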

In [None]:
# math
import math
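
A few `math` functions and constants we might reach for, as a quick sketch. The inputs are arbitrary.

```python
import math

print(math.pi)            # the constant pi as a float
print(math.sqrt(16))      # 4.0
print(math.floor(2.7))    # 2
print(math.ceil(2.1))     # 3
print(math.factorial(5))  # 120
```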


In [None]:
# datetime & time
import datetime

x = datetime.datetime.now()
print(x)

In [None]:
print(x.year)
print(x.strftime("%A"))

In [None]:
x = datetime.datetime(2020, 5, 17)

print(x)

[Datetime formats](https://www.w3schools.com/python/python_datetime.asp)
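
The format codes from the link above work in both directions. The sketch below uses an arbitrary date and time: `strftime` turns a `datetime` into a string, and `strptime` parses a string back into a `datetime`. Day and month names assume the default (English) locale.

```python
import datetime

moment = datetime.datetime(2020, 5, 17, 14, 30)

# strftime: datetime -> string, using format codes
print(moment.strftime("%Y-%m-%d"))        # 2020-05-17
print(moment.strftime("%A, %B %d, %Y"))   # Sunday, May 17, 2020
print(moment.strftime("%H:%M"))           # 14:30

# strptime: string -> datetime, using the same codes
parsed = datetime.datetime.strptime("2020-05-17", "%Y-%m-%d")
print(parsed.year, parsed.month, parsed.day)  # 2020 5 17
```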

In [None]:
# collections
from collections import Counter, defaultdict, namedtuple
c = Counter()                           # a new, empty counter
c = Counter('gallahad')                 # a new counter from an iterable
c = Counter({'red': 4, 'blue': 2})      # a new counter from a mapping
c = Counter(cats=4, dogs=8) 
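
Once a `Counter` exists, we can query it much like a dictionary. A small sketch, reusing two of the counters built above; the extra pets passed to `update` are made up.

```python
from collections import Counter

c = Counter('gallahad')
print(c['a'])            # 3
print(c['z'])            # 0: missing keys count as zero instead of raising KeyError
print(c.most_common(2))  # [('a', 3), ('l', 2)]

pets = Counter(cats=4, dogs=8)
pets.update(['cats', 'fish'])      # add more observations to the tally
print(pets['cats'], pets['fish'])  # 5 1
```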

In [None]:
s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
d = defaultdict(list)
for k, v in s:
    d[k].append(v)

sorted(d.items())
# [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

In [None]:
Point = namedtuple('Point', ['x', 'y'])
p = Point(11, y=22)     # instantiate with positional or keyword arguments
p[0] + p[1]             # indexable like the plain tuple (11, 22)


In [None]:
x, y = p                # unpack like a regular tuple
x, y
p.x + p.y               # fields also accessible by name
p                       # readable __repr__ with a name=value style


In [None]:
# pprint
import pprint
tup = ('spam', ('eggs', ('lumberjack', ('knights', ('ni', ('dead', ('parrot', ('fresh fruit',))))))))
pp = pprint.PrettyPrinter(depth=6)
pp.pprint(tup)

In [None]:
# random
import random

print(random.random())

print(random.randrange(1, 10))
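
Random numbers change on every run, which makes results hard to reproduce. One common approach, sketched here with an arbitrary seed of 42, is to call `random.seed` first so the same sequence comes back each time.

```python
import random

random.seed(42)   # fix the seed so the sequence is repeatable
first = [random.randrange(1, 10) for _ in range(5)]

random.seed(42)   # same seed -> same sequence
second = [random.randrange(1, 10) for _ in range(5)]

print(first == second)                   # True
print(all(1 <= n < 10 for n in first))   # True: the upper bound 10 is excluded
print(random.choice(['heads', 'tails']))  # pick one item at random
```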

In [None]:
# zipfile, gzip, zlib, bz2
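
The compression modules share a similar idea: compress bytes, then decompress them back to the original. A minimal sketch with `zlib` and `gzip`, using made-up, highly repetitive data so the size difference is easy to see:

```python
import gzip
import zlib

text = b"spam " * 1000   # hypothetical data: 5000 very repetitive bytes

compressed = zlib.compress(text)
print(len(text), len(compressed))            # the compressed copy is far smaller
print(zlib.decompress(compressed) == text)   # True: the round trip loses nothing

gz = gzip.compress(text)                     # same idea, in the gzip file format
print(gzip.decompress(gz) == text)           # True
```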

In [None]:
# pdb

### 2. NumPy

![numpy](https://raw.githubusercontent.com/donnemartin/data-science-ipython-notebooks/master/images/numpy.png)

[NumPy](https://www.numpy.org/) is the fundamental package for scientific computing with Python. 


In [None]:
import numpy as np
import numpy

x = numpy.array([1, 2, 3])
print(x)

# Many packages have a canonical way to import them

y = np.array([4, 5, 6])
print(y)

With NumPy, we can now get the **mean** and other quick summary statistics of lists and arrays in a single call.

In [None]:
example = [4, 3, 25, 40, 62, 20]
print(np.mean(example))
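
The mean is only one of several summary functions. As a quick sketch on the same `example` list:

```python
import numpy as np

example = [4, 3, 25, 40, 62, 20]

print(np.median(example))  # 22.5
print(np.max(example))     # 62
print(np.sum(example))     # 154
print(np.std(example))     # population standard deviation
```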

Now let's import some other packages. We will cover some of the more fun features of NumPy in detail later.

In [None]:
import scipy
import pandas as pd
import matplotlib as mpl

In [None]:
# sometimes we will want to import a specific module from a library
import matplotlib.pyplot as plt
from matplotlib.pyplot import plot

# What happens when we uncomment the next line?
# %matplotlib inline

plt.plot(x, y)

In [None]:
# OR we can also import it this way
from matplotlib import pyplot as plt
plt.plot(x, y)

Try importing the `seaborn` library as `sns`, which is the convention.

In [None]:
# your code here


#### Helpful links: library documentation

Libraries have associated documentation to explain how to use the different tools included in a library.

- [NumPy](https://docs.scipy.org/doc/numpy/)
- [SciPy](https://docs.scipy.org/doc/scipy/reference/)
- [Pandas](http://pandas.pydata.org/pandas-docs/stable/)
- [Matplotlib](https://matplotlib.org/contents.html)

### 3. NumPy versus base Python

Now that we know libraries exist, why do we want to use them? Let us compare base Python with NumPy.

Base Python has lists and can do basic math on individual numbers. NumPy adds a helpful object called the array.

NumPy arrays have a few advantages over base Python lists, which we will look at now.

In [None]:
import numpy as np

In [22]:
names_list = ['Bob', 'John', 'Sally']
names_array = np.array(['Bob', 'John', 'Sally'])
print(names_list)
print(names_array)
names_array
# names_array / 2 would raise a TypeError: strings cannot be divided

['Bob', 'John', 'Sally']
['Bob' 'John' 'Sally']

array(['Bob', 'John', 'Sally'], dtype='<U5')

In [None]:
# Make a list and an array of three numbers (call them numbers_list and numbers_array)
# your code here

In [None]:
# divide your array by 2


In [None]:
# divide your list by 2


NumPy arrays support the `/` operator (which calls the `__truediv__()` method in Python 3) while Python lists do not. There are other features that make NumPy more useful than base Python for evaluating data.
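
One way the comparison might look, as a sketch with made-up numbers: the array divides element by element, the list raises a `TypeError`, and base Python needs a loop or list comprehension to get the same result.

```python
import numpy as np

numbers_list = [2, 4, 6]                 # hypothetical values
numbers_array = np.array(numbers_list)

print(numbers_array / 2)                 # [1. 2. 3.]

try:
    numbers_list / 2                     # lists do not define division
except TypeError as err:
    print("TypeError:", err)

# In base Python we have to divide each element ourselves
halved_list = [n / 2 for n in numbers_list]
print(halved_list)                       # [1.0, 2.0, 3.0]
```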

In [None]:
# shape tells us the size of the array along each dimension

numbers_array.shape

In [None]:
numbers_array

In [None]:
# Selection and assignment work as you might expect
numbers_array[1]
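
A short sketch of selection and assignment, assuming a three-element integer array (the values are made up). Note the last step: an integer array keeps its dtype, so assigning a float drops the fractional part.

```python
import numpy as np

numbers_array = np.array([10, 20, 30])   # hypothetical values

print(numbers_array.shape)   # (3,)
print(numbers_array[1])      # 20

numbers_array[1] = 99        # assignment by index, just like a list
print(numbers_array)         # [10 99 30]

numbers_array[0] = 7.9       # the array holds integers, so 7.9 is stored as 7
print(numbers_array[0])      # 7
```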

Take 5 minutes and explore each of the following functions.  What does each one do?  What is the syntax of each?
- `np.zeros()`
- `np.ones()`
- `np.full()`
- `np.eye()`
- `np.random.random()`

In [37]:
np.zeros([4,5,6])

array([[[0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.]]])

In [6]:
np.ones(5)

array([1., 1., 1., 1., 1.])

In [10]:
np.full([3,5], 7)   # fill_value (here 7) is a required argument

array([[7, 7, 7, 7, 7],
       [7, 7, 7, 7, 7],
       [7, 7, 7, 7, 7]])

In [31]:
np.eye(4,5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.]])

In [33]:
np.random.random(2)

array([0.77295899, 0.96553628])

### Slicing in NumPy

In [None]:
# We remember slicing from lists
numbers_list = list(range(10))
numbers_list[3:7]

In [None]:
# Slicing in NumPy Arrays is very similar!
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
a

In [None]:
# first 2 rows, columns 1 & 2 (remember 0-index!)
b = a[:2, 1:3]
b
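
One difference from lists is worth seeing in code: a slice of a NumPy array is a view into the same data, so changing the slice changes the original. The sketch below repeats the arrays above and uses `.copy()` to get an independent array; the values 77 and -1 are arbitrary.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
b = a[:2, 1:3]

b[0, 0] = 77            # b[0, 0] is the same piece of data as a[0, 1]
print(a[0, 1])          # 77

c = a[:2, 1:3].copy()   # an independent copy of the same slice
c[0, 0] = -1
print(a[0, 1])          # still 77

numbers_list = list(range(10))
piece = numbers_list[3:7]   # slicing a list always makes a copy
piece[0] = -1
print(numbers_list[3])      # 3
```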

### Datatypes in NumPy

In [None]:
a.dtype

In [None]:
names_array.dtype

In [None]:
a.astype(np.float64).dtype
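
Unlike a list, an array stores a single datatype, and NumPy picks one that fits every element. A small sketch with made-up values:

```python
import numpy as np

floats = np.array([1, 2, 3.5])    # one float makes the whole array float64
print(floats.dtype)               # float64

# astype returns a converted copy; converting to int drops the fractional part
print(floats.astype(np.int64))    # [1 2 3]

mixed = np.array([1, 'two', 3.0])
print(mixed.dtype.kind)           # U: every element became a string
print(mixed[0] == '1')            # True
```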

### More Array Math

In [None]:
x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

# Elementwise sum; both produce the array
# [[ 6.0  8.0]
#  [10.0 12.0]]
print(x + y)
print(np.add(x, y))

In [None]:
# Elementwise difference; both produce the array
# [[-4.0 -4.0]
#  [-4.0 -4.0]]
print(x - y)
print(np.subtract(x, y))

In [None]:
# Elementwise product; both produce the array
# [[ 5.0 12.0]
#  [21.0 32.0]]
print(x * y)
print(np.multiply(x, y))

In [None]:
# Elementwise division; both produce the array
# [[ 0.2         0.33333333]
#  [ 0.42857143  0.5       ]]
print(x / y)
print(np.divide(x, y))

In [None]:
# Elementwise square root; both produce the same array
# [[ 1.          1.41421356]
#  [ 1.73205081  2.        ]]
print(x ** .5)
print(np.sqrt(x))
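
Since `*` multiplies element by element, it is not matrix multiplication. As a brief sketch on the same `x` and `y`, the matrix product is available through the `@` operator or `np.dot`:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

# Elementwise product
# [[ 5. 12.]
#  [21. 32.]]
print(x * y)

# Matrix product; both produce the array
# [[19. 22.]
#  [43. 50.]]
print(x @ y)
print(np.dot(x, y))
```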

Below, you will find a piece of code we will use to compare the speed of operations on a list and operations on an array. In this speed test, we will use the library [time](https://docs.python.org/3/library/time.html).

In [None]:
import time
import numpy as np

size_of_vec = 1000


def pure_python_version():
    # perf_counter() is a high-resolution clock meant for timing short code;
    # time.time() can be too coarse and report 0 elapsed seconds
    t1 = time.perf_counter()
    X = range(size_of_vec)
    Y = range(size_of_vec)
    Z = [X[i] + Y[i] for i in range(len(X))]
    return time.perf_counter() - t1


def numpy_version():
    t1 = time.perf_counter()
    X = np.arange(size_of_vec)
    Y = np.arange(size_of_vec)
    Z = X + Y
    return time.perf_counter() - t1


t1 = pure_python_version()
t2 = numpy_version()
print("python: " + str(t1), "numpy: " + str(t2))
print("In this example, NumPy is " + str(t1/t2) + " times faster!")
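
A single timed run can be noisy. One alternative, sketched here with the standard library's `timeit` module, is to run each version many times and compare the totals. The function names `add_lists` and `add_arrays` and the repeat count of 1000 are choices made for this illustration, and the numbers will vary by machine.

```python
import timeit
import numpy as np

size_of_vec = 1000

# Build the inputs once so only the addition itself is timed
X_list = list(range(size_of_vec))
Y_list = list(range(size_of_vec))
X_arr = np.arange(size_of_vec)
Y_arr = np.arange(size_of_vec)


def add_lists():
    return [x + y for x, y in zip(X_list, Y_list)]


def add_arrays():
    return X_arr + Y_arr


repeats = 1000
t_list = timeit.timeit(add_lists, number=repeats)    # total seconds for all runs
t_array = timeit.timeit(add_arrays, number=repeats)

print("python:", t_list / repeats, "numpy:", t_array / repeats)
print("ratio python/numpy:", t_list / t_array)
```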

In pairs, run the speed test with a different number, and share your results with the class.