## Chapter 4 NumPy Basics: Arrays and Vectorized Computation
For most data analysis applications, the main areas of functionality to focus on are:
* Fast vectorized array operations for data munging and cleaning, subsetting, and any other kinds of computations.
* Common array algorithms like sorting, unique, and set operations.
* Efficient descriptive statistics and aggregating/summarizing data.
* Data alignment and relational data manipulations for merging and joining together heterogeneous datasets.
* Expressing conditional logic as array expressions instead of loops with if-elif-else branches.
* Group-wise data manipulations (aggregation, transformation, function application).
<br>
***
### 4.1 The NumPy ndarray: A Multidimensional Array Object
One of the key features of NumPy is its N-dimensional array object (__ndarray__), which is a fast, flexible container for large datasets in Python.<br>
Arrays enable you to perform mathematical operations on whole blocks of data using similar syntax to the equivalent operations between scalar<br> 
elements. <br>
<br>
To show how NumPy enables batch computations with similar syntax to scalar values on built-in Python objects, I first import NumPy and generate a small array<br>
of random data:

In [2]:
#Example:
import numpy as np

#Generate some random data

data = np.random.randn(2,3)
data

array([[-0.22546893, -0.5899201 ,  0.47216614],
       [ 1.09566851, -0.76248735,  0.01563008]])

Then write mathematical operations with data:

In [3]:
data * 10

array([[-2.25468933, -5.89920099,  4.72166139],
       [10.95668507, -7.62487347,  0.15630083]])

In [4]:
data + data

array([[-0.45093787, -1.1798402 ,  0.94433228],
       [ 2.19133701, -1.52497469,  0.03126017]])
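
The random values shown above will differ on every run. As a minimal sketch, assuming NumPy 1.17 or later, a seeded generator makes the same experiment reproducible; the seed value and the names `rng`, `scaled`, and `doubled` are illustrative choices, not part of the original notes:

```python
import numpy as np

# Seeded generator so the same numbers come back on every run
# (12345 is an arbitrary seed chosen for this sketch).
rng = np.random.default_rng(12345)
data = rng.standard_normal((2, 3))

scaled = data * 10      # every element is multiplied by 10
doubled = data + data   # element-wise sum of the array with itself

print(scaled.shape, doubled.shape)
print(np.allclose(doubled, data * 2))
```

Both results keep the original (2, 3) shape, because each operation is applied element by element.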

An ndarray is a generic multidimensional container for homogeneous data; that is, all of the elements must be the same type. Every array has a __shape__, a tuple indicating the size of each dimension,<br>
and a __dtype__, an object describing the _data type_ of the array.

In [7]:
display(data.shape)
display(data.dtype)

(2, 3)

dtype('float64')
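
To illustrate what "homogeneous" means in practice, here is a small sketch (the names `ints` and `mixed` are hypothetical): when the input mixes integers and floats, NumPy picks a single dtype that can hold every element.

```python
import numpy as np

ints = np.array([1, 2, 3])       # all integers: an integer dtype is inferred
mixed = np.array([1, 2.5, 3])    # one float forces a floating-point dtype

print(ints.dtype.kind)   # 'i' (signed integer; the exact width is platform-dependent)
print(mixed.dtype)       # float64
print(mixed)
```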

#### Creating ndarrays
The easiest way to create an array is to use the array function. This accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data. For example,<br>
a list is a good candidate for conversion:

In [10]:
data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
arr1

array([6. , 7.5, 8. , 0. , 1. ])

Nested sequences, like a list of equal-length lists, will be converted into a multidimensional array:

In [12]:
data2 = [[1,2,3,4],[5,6,7,8]]
arr2 = np.array(data2)
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

Since data2 was a list of lists, the NumPy array __arr2__ has two dimensions, with the shape inferred from the data. We can confirm this by inspecting the __ndim__ and __shape__ attributes:

In [14]:
display(arr2.ndim)
display(arr2.shape)

2

(2, 4)

__arange__ is an array-valued version of the built-in Python range function:

In [16]:
np.arange(15)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
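
Like range, arange also accepts start, stop, and step arguments. A short sketch comparing the two (the name `evens` is illustrative):

```python
import numpy as np

evens = np.arange(2, 10, 2)     # start=2, stop=10 (exclusive), step=2
print(evens)                    # [2 4 6 8]
print(list(range(2, 10, 2)))    # [2, 4, 6, 8]
```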

You can explicitly convert or _cast_ an array from one dtype to another using ndarray's __astype__ method:

In [18]:
arr= np.array([1,2,3,4,5])
display(arr.dtype)

float_arr = arr.astype(np.float64)
display(float_arr.dtype)

dtype('int32')

dtype('float64')

Casting floating-point numbers to an integer dtype truncates the decimal part:

In [19]:
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
display(arr)
display(arr.astype(np.int32))

array([ 3.7, -1.2, -2.6,  0.5, 12.9, 10.1])

array([ 3, -1, -2,  0, 12, 10])
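
The integer cast above drops the fractional part rather than rounding. If rounding is what you want, one option is to round first and cast afterwards. This is a sketch, not the only approach; the names `truncated` and `rounded` are illustrative:

```python
import numpy as np

arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])

truncated = arr.astype(np.int64)          # drops the fractional part (toward zero)
rounded = np.rint(arr).astype(np.int64)   # round to the nearest integer first

print(truncated)   # [ 3 -1 -2  0 12 10]
print(rounded)     # [ 4 -1 -3  0 13 10]
print(arr.dtype)   # float64: astype returned a new array, arr is unchanged
```

Note that np.rint rounds halves to the nearest even integer, which is why 0.5 becomes 0.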

If you have an array of strings representing numbers, you can use __astype__ to convert them to numeric form:

In [21]:
# np.bytes_ replaces the np.string_ alias, which was removed in NumPy 2.0
numeric_strings = np.array(['1.25', '-9.6', '4.2'], dtype=np.bytes_)
numeric_strings.astype(np.float64)

array([ 1.25, -9.6 ,  4.2 ])
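
The conversion only succeeds when every string parses as a number. The sketch below uses a made-up bad entry ('n/a') to show that astype raises a ValueError otherwise; the names `good`, `bad`, and `failed` are hypothetical:

```python
import numpy as np

good = np.array(['1.25', '-9.6', '4.2'])
print(good.astype(np.float64))

bad = np.array(['1.25', 'n/a', '4.2'])   # 'n/a' is a made-up invalid entry
try:
    bad.astype(np.float64)
    failed = False
except ValueError as exc:
    failed = True
    print('conversion failed:', exc)
```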

#### Arithmetic with NumPy Arrays
Arrays are important because they enable you to express batch operations on data without writing any for loops. NumPy users call this _vectorization_. Any arithmetic operation between equal-size arrays<br>
applies the operation element-wise:

In [25]:
arr = np.array([[1.,2.,3.], [4., 5.,6.]])
display(arr) 
display(arr*arr)
display(arr-arr)

array([[1., 2., 3.],
       [4., 5., 6.]])

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

array([[0., 0., 0.],
       [0., 0., 0.]])
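
To make the "no for loops" point concrete, the sketch below computes the same element-wise product twice, once with explicit Python loops and once with the vectorized expression (the names `looped` and `vectorized` are illustrative):

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])

# Explicit Python loops over rows and elements
looped = [[x * x for x in row] for row in arr.tolist()]

# Vectorized form: one expression, no Python-level loop
vectorized = arr * arr

print(np.array_equal(vectorized, np.array(looped)))   # True
```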

Arithmetic operations with scalars propagate the scalar argument to each element in the array:

In [26]:
display(1/arr)
display(arr**0.5)

array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974]])

Comparisons between arrays of the same size yield boolean arrays:

In [27]:
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])
display(arr2)
arr2 > arr

array([[ 0.,  4.,  1.],
       [ 7.,  2., 12.]])

array([[False,  True, False],
       [ True, False,  True]])
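
A boolean array can itself be summarized. As a small sketch using the same two arrays (the name `mask` is illustrative), True counts as 1 when summed, so the sum gives the number of positions where the comparison holds:

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])

mask = arr2 > arr
print(mask.dtype)               # bool
print(mask.sum())               # 3
print(mask.any(), mask.all())   # True False
```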

Evaluating operations between differently sized arrays is called _broadcasting_ and will be discussed in more detail in Appendix A.<br>
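
As a brief preview, here is a minimal sketch of one common case: adding a one-dimensional array of length 3 to a (2, 3) array. The names `row` and `result` are illustrative.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])   # shape (2, 3)
row = np.array([10., 20., 30.])                # shape (3,)

# row is applied to each row of arr
result = arr + row
print(result.shape)   # (2, 3)
print(result)
```
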
#### Basic Indexing and Slicing
One-dimensional arrays are simple; on the surface they act similarly to Python lists:

In [32]:
arr = np.arange(10)
display(arr)
display(arr[5])
display(arr[5:8])
arr[5:8] = 12
display(arr)


array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

5

array([5, 6, 7])

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

When you assign a scalar value to a slice, as in arr[5:8] = 12, the value is propagated (or _broadcast_) to the entire selection.
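
This is one place where arrays differ from Python lists. The sketch below shows scalar and sequence assignment to an array slice, and the error a plain list raises for the scalar case (the names `nums` and `list_failed` are illustrative):

```python
import numpy as np

arr = np.arange(10)
arr[5:8] = 12              # the scalar is repeated across the three selected slots
print(arr)                 # [ 0  1  2  3  4 12 12 12  8  9]

arr[5:8] = [50, 60, 70]    # a same-length sequence is assigned element by element
print(arr)                 # [ 0  1  2  3  4 50 60 70  8  9]

nums = list(range(10))
try:
    nums[5:8] = 12         # a list slice only accepts an iterable
    list_failed = False
except TypeError as exc:
    list_failed = True
    print('list assignment failed:', exc)
```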