The idea of this notebook is to develop a basic understanding of how `numpy` works, and how to use it properly. First of all, let's import numpy.

In [1]:
import numpy as np

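Numpy has many convenient mathematical functions built in. Let's compute $\sin(\pi/2)$, $\cos(\pi/2)$, and $\tan(\pi/2)$. Because `np.pi` is only a floating-point approximation of $\pi$, $\cos(\pi/2)$ comes out approximately (not exactly) zero, and $\tan(\pi/2)$, which is mathematically undefined, comes out as a huge number, so be careful when comparing such results with exact values!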

In [2]:
print(np.sin(np.pi/2),np.cos(np.pi/2), np.tan(np.pi/2))

1.0 6.123233995736766e-17 1.633123935319537e+16
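
Since results like this are only approximately zero, comparing them with `==` is fragile. A minimal sketch of a tolerance-based comparison, assuming the default tolerances of `np.isclose` are acceptable for your problem:

```python
import numpy as np

# cos(pi/2) is mathematically zero, but np.pi is only a floating-point approximation of pi.
value = np.cos(np.pi / 2)

exact_match = (value == 0.0)          # False: the result is about 6e-17, not 0
close_match = np.isclose(value, 0.0)  # True: within the default absolute tolerance (1e-8)

print(exact_match, close_match)
```

Whether the default tolerance is appropriate depends on the scale of the numbers involved; `np.isclose` accepts `rtol` and `atol` arguments to adjust it.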


The basic numpy data structure is the array, which is similar to the built-in Python list but much more efficient for numerical work. The reason is that, under the hood, numpy's core routines are written in a compiled language!

In [3]:
print(np.arange(1,100,1))

[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96
 97 98 99]
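
Arrays look like lists when printed, but they behave differently under arithmetic. The short comparison below is only an illustration; the variable names are made up for this sketch:

```python
import numpy as np

py_list = [1, 2, 3]
np_array = np.array([1, 2, 3])

# Multiplying a list by an integer repeats it; multiplying an array scales each element.
doubled_list = py_list * 2      # [1, 2, 3, 1, 2, 3]
doubled_array = np_array * 2    # array([2, 4, 6])

# Every element of an array shares one dtype (the exact integer type is platform dependent).
print(doubled_list, doubled_array, np_array.dtype)
```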


You can index a numpy array in many different ways, for example:
- Using single indices and slices, like you would a Python list
- Using an array or list of indices
- Using booleans (`True`, `False`) and conditions

In [4]:
a = np.arange(1,100,1)
print("Slicing and indexing:")
print(a[1], a[1:10])
# Using a list to index the array:
print("Indexing using a list:")
print(a[[0,1,2,3]])
#Let's now take only places where a < 10. There are two ways of doing this:
print("Condition a < 10. Where is it true?")
print(a < 10, np.where(a < 10))
print("Indexing using the condition:")
print(a[a<10], a[np.where(a < 10)])

Slicing and indexing:
2 [ 2  3  4  5  6  7  8  9 10]
Indexing using a list:
[1 2 3 4]
Condition a < 10. Where is it true?
[ True  True  True  True  True  True  True  True  True False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False] (array([0, 1, 2, 3, 4, 5, 6, 7, 8]),)
Indexing using the condition:
[1 2 3 4 5 6 7 8 9] [1 2 3 4 5 6 7 8 9]
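
Conditions can also be combined before indexing. A small sketch, reusing the same `a` as above:

```python
import numpy as np

a = np.arange(1, 100, 1)

# Combine conditions element-wise with & (and), | (or), ~ (not).
# Parentheses are required because these operators bind tighter than comparisons.
mask = (a > 10) & (a < 20) & (a % 2 == 0)
selected = a[mask]

print(selected)  # [12 14 16 18]
```

Note that Python's `and`/`or` keywords do not work element-wise on arrays, which is why `&` and `|` are used here.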


In [5]:
a[a < 10]

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

We can also do some fancy things with indexing that change the shape of the result. Let's first create a 2 by 2 array of indices, then use it to index our original array! The result takes the shape of the index array.

In [5]:
b = np.array([[98,87],[45,3]])
print("b:")
print(b)
print("a:")
print(a[b])

b:
[[98 87]
 [45  3]]
a:
[[99 88]
 [46  4]]


In [6]:
c = b + a[b]
print(c)

[[197 175]
 [ 91   7]]


In [8]:
a.shape, c.shape

((99,), (2, 2))

We can also change the shape of our array using reshape:

In [9]:
b = a.reshape((3,33))
print(b)

[[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
  25 26 27 28 29 30 31 32 33]
 [34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57
  58 59 60 61 62 63 64 65 66]
 [67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
  91 92 93 94 95 96 97 98 99]]
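
Two details of `reshape` are worth knowing: one dimension can be inferred with `-1`, and for a contiguous array like this one the result is a view onto the same data rather than a copy. A minimal sketch:

```python
import numpy as np

a = np.arange(1, 100, 1)

# -1 asks numpy to infer that dimension: 99 elements / 3 rows = 33 columns.
b = a.reshape(3, -1)
print(b.shape)  # (3, 33)

# For a contiguous array like this one, reshape returns a view, not a copy,
# so writing through b also changes a.
print(np.shares_memory(a, b))  # True
b[0, 0] = -1
print(a[0])  # -1
```

If an independent array is needed, calling `.copy()` on the reshaped result avoids this sharing.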


One very important thing in numpy is to use its broadcasting/vectorization capabilities as often as possible. Under the hood, numpy uses C, so every time you call a numpy function, it calls a C function that operates on your input. Numpy functions can take whole arrays as inputs, and this is almost always faster than using, say, a Python for loop to compute the same thing element by element. For example, let's make an array of 1000000 evenly spaced points from $-\pi$ to $\pi$ and compute the cosine of it.

In [10]:
a = np.linspace(-np.pi,np.pi,1000000)
print(a)

[-3.14159265 -3.14158637 -3.14158009 ...  3.14158009  3.14158637
  3.14159265]


In [11]:
%%timeit
b = []
for i in a:
    b.append(np.cos(i))
b = np.array(b)

1.4 s ± 16.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [12]:
%%timeit
b = np.cos(a)

14.2 ms ± 63.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In this run, the vectorized numpy operation is almost 100x faster! This works for arrays of any shape. Let's take, for example, a plane:

In [13]:
a = np.linspace(-np.pi, np.pi, 1000)
b = np.linspace(-np.pi, np.pi, 1000)

plane = np.meshgrid(a,b)[0]
print("Meshgrid makes this a matrix of size 1000x1000")
print(plane)


Meshgrid makes this a matrix of size 1000x1000
[[-3.14159265 -3.13530318 -3.1290137  ...  3.1290137   3.13530318
   3.14159265]
 [-3.14159265 -3.13530318 -3.1290137  ...  3.1290137   3.13530318
   3.14159265]
 [-3.14159265 -3.13530318 -3.1290137  ...  3.1290137   3.13530318
   3.14159265]
 ...
 [-3.14159265 -3.13530318 -3.1290137  ...  3.1290137   3.13530318
   3.14159265]
 [-3.14159265 -3.13530318 -3.1290137  ...  3.1290137   3.13530318
   3.14159265]
 [-3.14159265 -3.13530318 -3.1290137  ...  3.1290137   3.13530318
   3.14159265]]


In [14]:
plane.shape

(1000, 1000)

In [15]:
%%time
c = np.zeros_like(plane)
for i in range(1000):
    for j in range(1000):
        c[i,j] = np.cos(plane[i,j])

CPU times: user 1.59 s, sys: 27.8 ms, total: 1.62 s
Wall time: 1.63 s


In [16]:
%%time
c = np.cos(plane)

CPU times: user 16.4 ms, sys: 7 µs, total: 16.4 ms
Wall time: 20.1 ms


In [17]:
c + 1

array([[0.00000000e+00, 1.97786813e-05, 7.91139429e-05, ...,
        7.91139429e-05, 1.97786813e-05, 0.00000000e+00],
       [0.00000000e+00, 1.97786813e-05, 7.91139429e-05, ...,
        7.91139429e-05, 1.97786813e-05, 0.00000000e+00],
       [0.00000000e+00, 1.97786813e-05, 7.91139429e-05, ...,
        7.91139429e-05, 1.97786813e-05, 0.00000000e+00],
       ...,
       [0.00000000e+00, 1.97786813e-05, 7.91139429e-05, ...,
        7.91139429e-05, 1.97786813e-05, 0.00000000e+00],
       [0.00000000e+00, 1.97786813e-05, 7.91139429e-05, ...,
        7.91139429e-05, 1.97786813e-05, 0.00000000e+00],
       [0.00000000e+00, 1.97786813e-05, 7.91139429e-05, ...,
        7.91139429e-05, 1.97786813e-05, 0.00000000e+00]])
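
The `c + 1` above is the simplest case of broadcasting: the scalar is stretched to match the array. The same rule applies between arrays of different shapes. A small sketch with deliberately tiny arrays (the sizes 4 and 3 are arbitrary choices for illustration):

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 4)   # shape (4,)
y = np.linspace(-np.pi, np.pi, 3)   # shape (3,)

# Adding a (3, 1) column to a (1, 4) row broadcasts to a (3, 4) grid,
# without building the full meshgrid arrays first.
grid = y[:, np.newaxis] + x[np.newaxis, :]
print(grid.shape)  # (3, 4)

# Same result as the explicit meshgrid route.
xx, yy = np.meshgrid(x, y)
print(np.allclose(grid, xx + yy))  # True
```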

In [18]:
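def f(x):
    # Gaussian with unit width (not normalized)
    return np.exp(-x**2 / 2)

def g(x):
    # Logistic sigmoid
    return 1 / (1 + np.exp(-x))

def h(x):
    # Sine wave with an exponentially decaying envelope
    return np.sin(np.pi * x) * np.exp(-(x + 1))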

In [19]:
x = np.linspace(-1,1, 100)

gauss = f(x)
sigmoid = g(x)

damped = h(x)

Now, let's save this data!

In [23]:
np.array([x, gauss]).T

array([[-1.        ,  0.60653066],
       [-0.97979798,  0.61878213],
       [-0.95959596,  0.63102348],
       [-0.93939394,  0.64324443],
       [-0.91919192,  0.65543451],
       [-0.8989899 ,  0.66758309],
       [-0.87878788,  0.6796794 ],
       [-0.85858586,  0.69171252],
       [-0.83838384,  0.70367144],
       [-0.81818182,  0.71554503],
       [-0.7979798 ,  0.72732208],
       [-0.77777778,  0.7389913 ],
       [-0.75757576,  0.75054136],
       [-0.73737374,  0.76196092],
       [-0.71717172,  0.77323858],
       [-0.6969697 ,  0.78436298],
       [-0.67676768,  0.79532277],
       [-0.65656566,  0.80610665],
       [-0.63636364,  0.81670336],
       [-0.61616162,  0.82710174],
       [-0.5959596 ,  0.83729073],
       [-0.57575758,  0.84725938],
       [-0.55555556,  0.85699689],
       [-0.53535354,  0.86649261],
       [-0.51515152,  0.87573605],
       [-0.49494949,  0.88471696],
       [-0.47474747,  0.89342527],
       [-0.45454545,  0.90185116],
       [-0.43434343,

In [24]:
np.savetxt('data/gauss.txt', np.array([x,gauss]).T) #transpose makes the file formatting look right

Two ways of loading it back in with `numpy`:

In [28]:
array = np.loadtxt('data/gauss.txt')
print(array)

[[-1.          0.60653066]
 [-0.97979798  0.61878213]
 [-0.95959596  0.63102348]
 [-0.93939394  0.64324443]
 [-0.91919192  0.65543451]
 [-0.8989899   0.66758309]
 [-0.87878788  0.6796794 ]
 [-0.85858586  0.69171252]
 [-0.83838384  0.70367144]
 [-0.81818182  0.71554503]
 [-0.7979798   0.72732208]
 [-0.77777778  0.7389913 ]
 [-0.75757576  0.75054136]
 [-0.73737374  0.76196092]
 [-0.71717172  0.77323858]
 [-0.6969697   0.78436298]
 [-0.67676768  0.79532277]
 [-0.65656566  0.80610665]
 [-0.63636364  0.81670336]
 [-0.61616162  0.82710174]
 [-0.5959596   0.83729073]
 [-0.57575758  0.84725938]
 [-0.55555556  0.85699689]
 [-0.53535354  0.86649261]
 [-0.51515152  0.87573605]
 [-0.49494949  0.88471696]
 [-0.47474747  0.89342527]
 [-0.45454545  0.90185116]
 [-0.43434343  0.90998505]
 [-0.41414141  0.91781764]
 [-0.39393939  0.92533992]
 [-0.37373737  0.93254319]
 [-0.35353535  0.93941905]
 [-0.33333333  0.94595947]
 [-0.31313131  0.95215675]
 [-0.29292929  0.95800356]
 [-0.27272727  0.96349297]
 

In [25]:
x, gauss = np.loadtxt('data/gauss.txt', unpack = True)
print(x, gauss)

[-1.         -0.97979798 -0.95959596 -0.93939394 -0.91919192 -0.8989899
 -0.87878788 -0.85858586 -0.83838384 -0.81818182 -0.7979798  -0.77777778
 -0.75757576 -0.73737374 -0.71717172 -0.6969697  -0.67676768 -0.65656566
 -0.63636364 -0.61616162 -0.5959596  -0.57575758 -0.55555556 -0.53535354
 -0.51515152 -0.49494949 -0.47474747 -0.45454545 -0.43434343 -0.41414141
 -0.39393939 -0.37373737 -0.35353535 -0.33333333 -0.31313131 -0.29292929
 -0.27272727 -0.25252525 -0.23232323 -0.21212121 -0.19191919 -0.17171717
 -0.15151515 -0.13131313 -0.11111111 -0.09090909 -0.07070707 -0.05050505
 -0.03030303 -0.01010101  0.01010101  0.03030303  0.05050505  0.07070707
  0.09090909  0.11111111  0.13131313  0.15151515  0.17171717  0.19191919
  0.21212121  0.23232323  0.25252525  0.27272727  0.29292929  0.31313131
  0.33333333  0.35353535  0.37373737  0.39393939  0.41414141  0.43434343
  0.45454545  0.47474747  0.49494949  0.51515152  0.53535354  0.55555556
  0.57575758  0.5959596   0.61616162  0.63636364  0.
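
A saved text file is easier to read later if it records what the columns mean. The sketch below is one possible variation on the save/load round trip above; it writes to a temporary directory instead of `data/` (an assumption made so the example runs anywhere) and uses `np.column_stack` in place of the transpose:

```python
import os
import tempfile

import numpy as np

x = np.linspace(-1, 1, 100)
gauss = np.exp(-x**2 / 2)

# column_stack puts each 1-D array in its own column, like np.array([x, gauss]).T
data = np.column_stack([x, gauss])

# A temporary directory stands in for the notebook's data/ folder (assumption for this sketch).
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "gauss.txt")
    # The header is written as a comment line ("# x gauss"), which loadtxt skips by default.
    np.savetxt(path, data, header="x gauss")
    loaded = np.loadtxt(path)

print(loaded.shape)                # (100, 2)
print(np.allclose(loaded, data))   # True
```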

### Pandas dataframes

Let's save the data for the three different functions inside a Pandas dataframe! This is essentially a combination of a numpy array and a dictionary.

In [27]:
import pandas as pd

Let's create our dataframe:

In [28]:
df = pd.DataFrame()

df['x'] = x

## we can access it in two ways:
print(df['x'], df.x)


0    -1.000000
1    -0.979798
2    -0.959596
3    -0.939394
4    -0.919192
5    -0.898990
6    -0.878788
7    -0.858586
8    -0.838384
9    -0.818182
10   -0.797980
11   -0.777778
12   -0.757576
13   -0.737374
14   -0.717172
15   -0.696970
16   -0.676768
17   -0.656566
18   -0.636364
19   -0.616162
20   -0.595960
21   -0.575758
22   -0.555556
23   -0.535354
24   -0.515152
25   -0.494949
26   -0.474747
27   -0.454545
28   -0.434343
29   -0.414141
        ...   
70    0.414141
71    0.434343
72    0.454545
73    0.474747
74    0.494949
75    0.515152
76    0.535354
77    0.555556
78    0.575758
79    0.595960
80    0.616162
81    0.636364
82    0.656566
83    0.676768
84    0.696970
85    0.717172
86    0.737374
87    0.757576
88    0.777778
89    0.797980
90    0.818182
91    0.838384
92    0.858586
93    0.878788
94    0.898990
95    0.919192
96    0.939394
97    0.959596
98    0.979798
99    1.000000
Name: x, Length: 100, dtype: float64 0    -1.000000
1    -0.979798
2    -0.959596
3  

In [29]:
print(df)

           x
0  -1.000000
1  -0.979798
2  -0.959596
3  -0.939394
4  -0.919192
5  -0.898990
6  -0.878788
7  -0.858586
8  -0.838384
9  -0.818182
10 -0.797980
11 -0.777778
12 -0.757576
13 -0.737374
14 -0.717172
15 -0.696970
16 -0.676768
17 -0.656566
18 -0.636364
19 -0.616162
20 -0.595960
21 -0.575758
22 -0.555556
23 -0.535354
24 -0.515152
25 -0.494949
26 -0.474747
27 -0.454545
28 -0.434343
29 -0.414141
..       ...
70  0.414141
71  0.434343
72  0.454545
73  0.474747
74  0.494949
75  0.515152
76  0.535354
77  0.555556
78  0.575758
79  0.595960
80  0.616162
81  0.636364
82  0.656566
83  0.676768
84  0.696970
85  0.717172
86  0.737374
87  0.757576
88  0.777778
89  0.797980
90  0.818182
91  0.838384
92  0.858586
93  0.878788
94  0.898990
95  0.919192
96  0.939394
97  0.959596
98  0.979798
99  1.000000

[100 rows x 1 columns]
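
Because a dataframe behaves much like a dictionary of columns, it can also be built in one step from a dict instead of starting empty. A minimal sketch (the column choice here is just for illustration):

```python
import numpy as np
import pandas as pd

x = np.linspace(-1, 1, 100)

# Keys become column names; each value becomes a column.
df = pd.DataFrame({
    "x": x,
    "gauss": np.exp(-x**2 / 2),
})

print(df.shape)          # (100, 2)
print(list(df.columns))  # ['x', 'gauss']
```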


An interesting thing about dataframes is that you can add as many columns as you'd like, as well as do operations on entire columns at once!

In [30]:
## assigning new values, either by using previously made variables:
df['gauss'] = gauss

## or making new ones using data from the dataframe:

df['sigmoid'] = g(df.x)

df['damped'] = h(df['x'])

In [31]:
print(df)

           x     gauss   sigmoid        damped
0  -1.000000  0.606531  0.268941 -1.224647e-16
1  -0.979798  0.618782  0.272932 -6.215548e-02
2  -0.959596  0.631023  0.276959 -1.215796e-01
3  -0.939394  0.643244  0.281023 -1.781221e-01
4  -0.919192  0.655435  0.285123 -2.316515e-01
5  -0.898990  0.667583  0.289258 -2.820545e-01
6  -0.878788  0.679679  0.293429 -3.292357e-01
7  -0.858586  0.691713  0.297635 -3.731177e-01
8  -0.838384  0.703671  0.301875 -4.136405e-01
9  -0.818182  0.715545  0.306150 -4.507609e-01
10 -0.797980  0.727322  0.310458 -4.844523e-01
11 -0.777778  0.738991  0.314799 -5.147041e-01
12 -0.757576  0.750541  0.319173 -5.415210e-01
13 -0.737374  0.761961  0.323579 -5.649225e-01
14 -0.717172  0.773239  0.328016 -5.849422e-01
15 -0.696970  0.784363  0.332484 -6.016268e-01
16 -0.676768  0.795323  0.336983 -6.150361e-01
17 -0.656566  0.806107  0.341512 -6.252414e-01
18 -0.636364  0.816703  0.346069 -6.323252e-01
19 -0.616162  0.827102  0.350655 -6.363803e-01
20 -0.595960 

We can save the dataframe in, for example, a csv file, as well as load it back:

In [32]:
df.to_csv('data/functions.csv')

new_df = pd.read_csv('data/functions.csv')

In [33]:
print(new_df)

    Unnamed: 0         x     gauss   sigmoid        damped
0            0 -1.000000  0.606531  0.268941 -1.224647e-16
1            1 -0.979798  0.618782  0.272932 -6.215548e-02
2            2 -0.959596  0.631023  0.276959 -1.215796e-01
3            3 -0.939394  0.643244  0.281023 -1.781221e-01
4            4 -0.919192  0.655435  0.285123 -2.316515e-01
5            5 -0.898990  0.667583  0.289258 -2.820545e-01
6            6 -0.878788  0.679679  0.293429 -3.292357e-01
7            7 -0.858586  0.691713  0.297635 -3.731177e-01
8            8 -0.838384  0.703671  0.301875 -4.136405e-01
9            9 -0.818182  0.715545  0.306150 -4.507609e-01
10          10 -0.797980  0.727322  0.310458 -4.844523e-01
11          11 -0.777778  0.738991  0.314799 -5.147041e-01
12          12 -0.757576  0.750541  0.319173 -5.415210e-01
13          13 -0.737374  0.761961  0.323579 -5.649225e-01
14          14 -0.717172  0.773239  0.328016 -5.849422e-01
15          15 -0.696970  0.784363  0.332484 -6.016268e-
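
The `Unnamed: 0` column appears because `to_csv` writes the row index as the first column by default, and `read_csv` then reads it back as ordinary data. Two common ways around this are sketched below, using an in-memory buffer and a tiny stand-in dataframe rather than the real file (both are assumptions made to keep the example self-contained):

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame({"x": np.linspace(-1, 1, 5)})

# Option 1: don't write the index at all.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
no_index = pd.read_csv(buffer)
print(list(no_index.columns))  # ['x']

# Option 2: keep the index in the file and tell read_csv which column holds it.
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)
with_index = pd.read_csv(buffer, index_col=0)
print(list(with_index.columns))  # ['x']
```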

You can select rows of a dataframe based on the values in a column. For example, suppose we only wanted to use the negative side of the x axis:

In [34]:
negative = df[df['x'] < 0]

negative

           x     gauss   sigmoid        damped
0  -1.000000  0.606531  0.268941 -1.224647e-16
1  -0.979798  0.618782  0.272932 -6.215548e-02
2  -0.959596  0.631023  0.276959 -1.215796e-01
3  -0.939394  0.643244  0.281023 -1.781221e-01
4  -0.919192  0.655435  0.285123 -2.316515e-01
5  -0.898990  0.667583  0.289258 -2.820545e-01
6  -0.878788  0.679679  0.293429 -3.292357e-01
7  -0.858586  0.691713  0.297635 -3.731177e-01
8  -0.838384  0.703671  0.301875 -4.136405e-01
9  -0.818182  0.715545  0.306150 -4.507609e-01
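
Row selections can combine several conditions and pick columns at the same time. A minimal sketch using `.loc`, with a reduced stand-in dataframe and an arbitrary threshold of 0.9 chosen only for illustration:

```python
import numpy as np
import pandas as pd

x = np.linspace(-1, 1, 100)
df = pd.DataFrame({"x": x, "gauss": np.exp(-x**2 / 2)})

# Rows where x is negative AND gauss exceeds 0.9; keep only two columns.
# As with numpy masks, each condition needs its own parentheses.
subset = df.loc[(df["x"] < 0) & (df["gauss"] > 0.9), ["x", "gauss"]]

print(len(subset))
print(subset.head())
```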
