# Introduction to NumPy
We discuss techniques for effectively loading, storing, and manipulating in-memory data in Python.
Although data comes in a wide range of formats, it helps to think of all of it, fundamentally, as arrays of numbers.

NumPy (short for Numerical Python) provides an efficient interface to store and operate on dense data buffers. In some ways, NumPy arrays are like Python's built-in list type, but NumPy arrays provide much more efficient storage and data operations as the arrays grow larger in size. 
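
To make the storage claim concrete, here is a minimal sketch comparing the memory used by a Python list and by a NumPy array holding the same 1000 integers. The names `py_list`, `np_arr`, `list_bytes`, and `arr_bytes` are illustrative only, and the list figure is approximate because CPython shares some small integer objects.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list stores pointers to separate Python int objects,
# so count the list container plus each element object.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# A NumPy array stores the raw values in one contiguous buffer.
arr_bytes = np_arr.nbytes  # 1000 elements * 8 bytes each

print("list: ", list_bytes, "bytes (approximate)")
print("array:", arr_bytes, "bytes")
```

The exact list size depends on the Python version, but it should come out several times larger than the array buffer.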

In [1]:
# You should have this already available through Anaconda
import numpy 
numpy.__version__

'1.14.0'

In [7]:
# import NumPy using np as an alias and display its documentation
# (the trailing ? is IPython/Jupyter syntax, not plain Python)
import numpy as np
np?

Python offers several different options for storing data in efficient, fixed-type data buffers. The built-in array module (part of the standard library) can be used to create dense arrays of a uniform type:

In [11]:
import array
L = list(range(10))
A = array.array('i', L)
#print(A)
A

array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
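
"Uniform type" means the array refuses values that do not match its type code. The following is a small sketch of that behaviour; the flag `rejected` is just an illustrative name.

```python
import array

A = array.array('i', range(5))  # 'i' is the type code for signed integers
A.append(5)                     # fine: 5 is an int

try:
    A.append(2.5)               # a float does not fit the 'i' type code
    rejected = False
except TypeError:
    rejected = True

print(A)
print("float rejected:", rejected)
```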

While Python's array object provides efficient storage of array-based data, NumPy adds efficient operations on that data.

In [13]:
import numpy as np
# integer array:
np.array([1, 4, 2, 5, 3])

array([1, 4, 2, 5, 3])
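
As a rough illustration of what "efficient operations" means in practice, the sketch below doubles every element once with a list comprehension and once with a single NumPy expression. The variable names are made up for this example.

```python
import numpy as np

values = [1, 4, 2, 5, 3]

# With a plain list we loop over the elements ourselves
doubled_list = [2 * v for v in values]

# With a NumPy array the operation applies to every element at once
arr = np.array(values)
doubled_arr = 2 * arr
total = arr.sum()

print(doubled_list)
print(doubled_arr)
print(total)
```

Both approaches give the same numbers; the NumPy version pushes the loop into compiled code, which is where the speed-up on large arrays comes from.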

In [14]:
import numpy as np
# mixed integers and floats are upcast to floating point
np.array([3.14, 4, 2, 3])

array([3.14, 4.  , 2.  , 3.  ])
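
If you would rather not rely on automatic upcasting, the data type can be set explicitly with the `dtype` keyword. A short sketch (the names `mixed` and `explicit` are illustrative):

```python
import numpy as np

# Type chosen by NumPy: one float in the list makes the whole array float64
mixed = np.array([3.14, 4, 2, 3])

# Type chosen by us: integers stored as 32-bit floats
explicit = np.array([1, 2, 3, 4], dtype='float32')

print(mixed.dtype)     # float64
print(explicit.dtype)  # float32
```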

In [15]:
import numpy as np
# nested lists result in multi-dimensional arrays
np.array([range(i, i + 3) for i in [2, 4, 6]])

array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])

In [16]:
# Create a 3x3 array of uniformly distributed
# random values between 0 and 1
np.random.random((3, 3))

array([[0.28117216, 0.76799987, 0.36732274],
       [0.38403858, 0.54513874, 0.35955431],
       [0.41318414, 0.31666455, 0.9456578 ]])

In [17]:
# Create a 3x3 array of normally distributed random values
# with mean 0 and standard deviation 1
np.random.normal(0, 1, (3, 3))

array([[ 0.32548092,  0.25536068, -0.24204799],
       [ 0.12739611,  0.17707865,  0.17733738],
       [-1.07840374,  0.79341387, -1.82077558]])
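
Arrays do not have to come from lists or random draws. As a hedged sketch, a few other standard NumPy creation routines are shown here; the variable names are only for illustration.

```python
import numpy as np

zeros = np.zeros(5, dtype=int)    # length-5 array of integer zeros
ones = np.ones((2, 3))            # 2x3 array of floating-point ones
filled = np.full((2, 2), 3.14)    # 2x2 array filled with 3.14
steps = np.arange(0, 10, 2)       # 0, 2, 4, 6, 8 (the stop value is excluded)
grid = np.linspace(0, 1, 5)       # 5 evenly spaced values, endpoints included

for name, a in [("zeros", zeros), ("ones", ones), ("filled", filled),
                ("steps", steps), ("grid", grid)]:
    print(name, a.shape, a.dtype)
```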

In [50]:
'''
Some useful array attributes. We'll start by defining three random arrays:
a one-dimensional, a two-dimensional, and a three-dimensional array.
Each array has the attributes ndim (the number of dimensions), shape (the size of each dimension),
size (the total number of elements), and dtype (the data type of the elements).
Then we demonstrate some slicing and copying; see the comments.
'''
import numpy as np
np.random.seed(0)  # seed for reproducibility

x1 = np.random.randint(10, size=6)  # One-dimensional array
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array

print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)
print("dtype:", x3.dtype)
print('\n---x1----\n',x1)
print('\n---x2----\n',x2)
print('\n---x3----\n',x3)

print('\n first 3 elements of x1:',x1[:3])  # first three elements
print('\n first column of x2: ',x2[:, 0])  # first column of x2
print('\n all rows, every other column of x2:\n' ,x2[:3, ::2])  # all rows, every other column
x2_sub = x2[:2, :2] # extract a 2×2  subarray from x2
print('\n 2×2  subarray from x2 \n',x2_sub)

# slices are views on the original data; use the copy() method for an independent copy
x2_sub_copy = x2[:2, :2].copy()
print('\n 2×2  subarray from x2 using copy method\n', x2_sub_copy)

x2_sub_copy[0, 0] = 42
print('\n replace an element of a copy\n',x2_sub_copy)

x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60
dtype: int64

---x1----
 [5 0 3 3 7 9]

---x2----
 [[3 5 2 4]
 [7 6 8 8]
 [1 6 7 7]]

---x3----
 [[[8 1 5 9 8]
  [9 4 3 0 3]
  [5 0 2 3 8]
  [1 3 3 3 7]]

 [[0 1 9 9 0]
  [4 7 3 2 7]
  [2 0 0 4 5]
  [5 6 8 4 1]]

 [[4 9 8 1 1]
  [7 9 9 3 6]
  [7 2 0 3 5]
  [9 4 4 6 4]]]

 first 3 elements of x1: [5 0 3]

 first column of x2:  [3 7 1]

 all rows, every other column of x2:
 [[3 2]
 [7 8]
 [1 7]]

 2×2  subarray from x2 
 [[3 5]
 [7 6]]

 2×2  subarray from x2 using copy method
 [[3 5]
 [7 6]]

 replace an element of a copy
 [[42  5]
 [ 7  6]]
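
The reason copy() matters is that a plain slice is a view onto the original array, not a new array. The sketch below uses a small made-up array `x` (not the random `x2` from the cell above) so the result is easy to follow.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

view = x[:2, :2]          # slicing returns a view onto x's data
view[0, 0] = 99           # so this write is visible through x

copy = x[:2, :2].copy()   # copy() allocates independent storage
copy[0, 0] = -1           # so x is unaffected by this write

print(x[0, 0])                      # 99
print(np.shares_memory(x, view))    # True
print(np.shares_memory(x, copy))    # False
```

Views are handy for working on a piece of a large array without duplicating it, but they are also an easy way to modify data by accident.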


Imagine that we have several categories of data on a number of people (say, name, age, and weight), and we'd like to store these values for use in a Python program. It would be possible to store these in three separate lists:


In [53]:
import numpy as np

In [54]:
name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]
# Clumsy! Nothing here says that the three lists belong together.

We can create a structured array using a compound data type specification:

In [55]:
# Use a compound data type for structured arrays
data = np.zeros(4, dtype={'names':('name', 'age', 'weight'),
                          'formats':('U10', 'i4', 'f8')})
print(data.dtype)

[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]


Here 'U10' translates to "Unicode string of maximum length 10," 'i4' translates to "4-byte (i.e., 32-bit) integer," and 'f8' translates to "8-byte (i.e., 64-bit) float."
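
One way to check these format codes is to ask NumPy how many bytes each one occupies. This is a small sketch; `record` is an illustrative name for the same compound type written as a list of (name, format) pairs.

```python
import numpy as np

# Size in bytes of each individual format code
for code in ('U10', 'i4', 'f8'):
    print(code, np.dtype(code).itemsize, "bytes")

# The compound type packs the three fields one after another
record = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
print("one record:", record.itemsize, "bytes")
```

NumPy stores each Unicode character in 4 bytes, so 'U10' takes 40 bytes, and one full record takes 40 + 4 + 8 = 52 bytes.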

In [56]:
data['name'] = name
data['age'] = age
data['weight'] = weight
print(data)

[('Alice', 25, 55. ) ('Bob', 45, 85.5) ('Cathy', 37, 68. )
 ('Doug', 19, 61.5)]
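
With everything in one structured array, values can be looked up by field name, by row index, or with a Boolean mask. The sketch below rebuilds the same array so it runs on its own; `first`, `last_name`, and `young` are illustrative names.

```python
import numpy as np

data = np.zeros(4, dtype={'names': ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})
data['name'] = ['Alice', 'Bob', 'Cathy', 'Doug']
data['age'] = [25, 45, 37, 19]
data['weight'] = [55.0, 85.5, 68.0, 61.5]

first = data[0]                         # the whole first record
last_name = data[-1]['name']            # one field of the last record
young = data[data['age'] < 30]['name']  # names of people under 30

print(first)
print(last_name)
print(young)
```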


This is still clumsy! We need something more concise and user-friendly. Hence Pandas!