# Agenda

1. NumPy
    - NumPy arrays
    - Setting + retrieving
    - Broadcasting
    - Boolean / mask arrays for retrieving
    - dtypes and `nan`
2. Pandas
    - series vs. data frames
    - creating a series, from scratch or via NumPy

# What is NumPy? Why do we care?

Python is *not*:

- slim (in terms of memory)
- fast (in terms of execution)



In [1]:
import sys
x = 0

sys.getsizeof(x) # how many bytes does this integer use in memory?

28

In [4]:
x = 100_000_000
sys.getsizeof(x)

28

# What does NumPy do?

The problem is that everything in Python is an object, and every object carries bookkeeping overhead on top of its actual value. That's why even the integer 0 takes 28 bytes.

In C, integers are tiny -- they're typically 64 bits (8 bytes) at most.

NumPy lets us use C data structures through a Python API. We (mostly) feel like we're working in Python, but we get the speed and memory efficiency of C.
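To make the memory difference concrete, here is a rough sketch comparing a Python list of integers with a NumPy array holding the same values. The names `py_list` and `arr`, the size of 1,000, and the choice of `int64` are just for illustration, and the exact byte count for the list varies by Python version.

```python
import sys
import numpy as np

n = 1_000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)   # force 8-byte integers for the demo

# the list stores references to separate int objects, so count both
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(i) for i in py_list)

# the array stores the raw values back to back, the way C would
array_bytes = arr.nbytes             # 8 bytes per element * 1,000 elements

print(list_bytes, array_bytes)
```

The array's data comes to exactly 8,000 bytes, while the list plus its integer objects is several times larger.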

The big deal in NumPy is actually one data structure, the NumPy array, aka `ndarray` -- short for "n-dimensional array." If you're a mathematician or a physicist, then you'll want all of those dimensions. We'll be using just 1D and 2D arrays.
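As a quick illustration of what "dimensions" means here, the sketch below builds a 1D and a 2D array and asks each one for its `ndim` and `shape`. The variable names `one_d` and `two_d` are made up for this example.

```python
import numpy as np

one_d = np.array([10, 20, 30])
two_d = np.array([[10, 20, 30],
                  [40, 50, 60]])     # a list of lists becomes rows

print(one_d.ndim, one_d.shape)   # 1 (3,)
print(two_d.ndim, two_d.shape)   # 2 (2, 3)
```

The shape of a 2D array reads as (rows, columns).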

A NumPy array actually has two pieces:

- The Python part, which we work with
- The C part, which allocates the memory and works with the values at that level



In [5]:
# let's load NumPy!

import numpy as np     

In [6]:
# create a NumPy array
# we *don't* directly use np.ndarray, even though it exists!

# rather, we'll use np.array, and pass it a regular Python list, which it'll turn
# into a NumPy array with the appropriate back-end values

a = np.array([10, 20, 30, 40, 50, 60, 70])
type(a)

numpy.ndarray

In [7]:
a

array([10, 20, 30, 40, 50, 60, 70])

In [8]:
# things that are similar between lists and arrays
# basic retrieval

a[3]   # get the element at index 3

40

In [9]:
# arrays are mutable
a[3] = 41
a

array([10, 20, 30, 41, 50, 60, 70])

In [10]:
# they are iterable, as well -- so we can put them in a for loop

# but DON'T DO THAT! a Python-level loop visits one element at a time,
# which gives up the speed we came to NumPy for
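Here's a small sketch of the alternative to looping, using `sum` as the example (my choice, not the only option). Both versions give the same answer, but the array method does its looping in C rather than in Python.

```python
import numpy as np

a = np.array([10, 20, 30, 41, 50, 60, 70])

# slow way: Python pulls out one element at a time
loop_total = 0
for value in a:
    loop_total += value

# fast way: one method call, and the loop runs in C
fast_total = a.sum()

print(loop_total, fast_total)   # 281 281
```

On an array this small you won't notice a difference; it starts to matter once arrays have thousands or millions of elements.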

In [11]:
# a few other ways to create NumPy arrays

# (1) get a range, using np.arange (similar to Python's "range" builtin)
a = np.arange(10, 200, 3)   # start at 10, end before 200, step size 3

In [12]:
a

array([ 10,  13,  16,  19,  22,  25,  28,  31,  34,  37,  40,  43,  46,
        49,  52,  55,  58,  61,  64,  67,  70,  73,  76,  79,  82,  85,
        88,  91,  94,  97, 100, 103, 106, 109, 112, 115, 118, 121, 124,
       127, 130, 133, 136, 139, 142, 145, 148, 151, 154, 157, 160, 163,
       166, 169, 172, 175, 178, 181, 184, 187, 190, 193, 196, 199])
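To check the comparison with `range`, this sketch builds the same sequence both ways. The name `r` is just for this example. It also shows one difference: `np.arange` accepts a non-integer step, which `range` does not.

```python
import numpy as np

a = np.arange(10, 200, 3)
r = list(range(10, 200, 3))

print(len(a), a[0], a[-1])   # 64 10 199
print(a.tolist() == r)       # True

# unlike range, np.arange also accepts a float step
quarters = np.arange(0, 1, 0.25)
print(quarters.tolist())     # [0.0, 0.25, 0.5, 0.75]
```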

In [13]:
# (2) get a bunch of 0s
np.zeros(10)   # it's spelled "zeros" not "zeroes"

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [14]:
# (3) get a bunch of 1s
np.ones(10)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
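Notice the trailing dots in those outputs: `np.zeros` and `np.ones` give us floats by default. A minimal sketch of how to ask for integers instead, and how to get a 2D result by passing a tuple (the variable names are illustrative):

```python
import numpy as np

z_float = np.zeros(3)                    # default dtype is float64
z_int = np.zeros(3, dtype=np.int64)      # ask for integers explicitly
grid = np.ones((2, 3))                   # a tuple gives (rows, columns)

print(z_float.dtype)   # float64
print(z_int)           # [0 0 0]
print(grid.shape)      # (2, 3)
```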

In [17]:
# (4) get a bunch of random integers

np.random.seed(0)    # start the random-number generator at a known value
np.random.randint(0, 100, 20)   # 20 random ints from 0-100 (not including 100)

array([44, 47, 64, 67, 67,  9, 83, 21, 36, 87, 70, 88, 88, 12, 58, 65, 39,
       87, 46, 88])
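The point of the seed is reproducibility: reset it to the same value and you get the same "random" numbers again. The sketch below checks that, and also shows `np.random.default_rng`, the generator-style interface in newer NumPy versions. The names `first`, `second`, `third`, and `rng` are just for illustration, and the generator's values will generally not match the ones from `np.random.randint`.

```python
import numpy as np

np.random.seed(0)
first = np.random.randint(0, 100, 20)

np.random.seed(0)                        # reset to the same starting point
second = np.random.randint(0, 100, 20)

print((first == second).all())           # True

# generator-style alternative: the seed lives in the object, not globally
rng = np.random.default_rng(0)
third = rng.integers(0, 100, 20)         # also excludes the upper bound, 100
print(third.shape)                       # (20,)
```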