# Introduction to NumPy
 >- **Datasets** can come from a wide range of sources and in a wide range of formats, including collections of documents,
    collections of images, collections of sound clips, collections of numerical measurements, or nearly anything else.
 >- Despite this apparent heterogeneity, it helps to think of all data fundamentally as arrays of numbers.
 >    - For example, images (particularly digital images) can be thought of as two-dimensional arrays of numbers representing pixel brightness across the area. Sound clips can be thought of as one-dimensional arrays of intensity versus time. Text can be converted in various ways into numerical representations.
 >
 > No matter what the data are, the first step in making them analyzable is to transform them into arrays of numbers.
 > For this reason, efficient storage and manipulation of numerical arrays is absolutely
 > fundamental to the process of doing data science.
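
As a rough illustration of the idea above, the sketch below builds tiny stand-ins for an image, a sound clip, and a piece of text as NumPy arrays. The pixel values, the sampled sine wave, and the character-code encoding are all made up for illustration; real data would be loaded from files and would usually use a more suitable encoding.

```python
import numpy as np

# A hypothetical 3x4 grayscale "image": one brightness value (0-255) per pixel
image = np.array([[  0,  64, 128, 255],
                  [ 32,  96, 160, 224],
                  [ 16,  80, 144, 208]], dtype=np.uint8)

# A hypothetical "sound clip": intensity sampled at 8 evenly spaced times
t = np.linspace(0, 1, 8)
sound = np.sin(2 * np.pi * t)

# One of many possible text encodings: the Unicode code point of each character
text = np.array([ord(ch) for ch in "NumPy"])

print(image.ndim, image.shape)   # two-dimensional array
print(sound.ndim, sound.shape)   # one-dimensional array
print(text)
```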
    
## NumPy
    NumPy (short for Numerical Python) provides an efficient interface to store and operate on dense data.
    
    In some ways, NumPy arrays are like Python’s built-in list type, but NumPy arrays provide much
    more efficient storage and data operations as the arrays grow larger in size.
    
    NumPy arrays form the core of nearly the entire ecosystem of data science tools in Python.
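
One way to get a feel for the storage difference is to compare the memory used by a list of integers with the memory used by an equivalent array. This is only a sketch: it assumes CPython, where `sys.getsizeof` reports the size of a single object, and the exact list figure will vary between Python versions and platforms.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# The list stores one pointer per element, and each element is a full Python int object
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# The array stores the raw 8-byte values in one contiguous buffer
array_bytes = np_array.nbytes

print("list: ", list_bytes, "bytes")
print("array:", array_bytes, "bytes")
```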
    
    
    
    By convention, you’ll find that most people in the PyData world import NumPy using np as an alias.
    
    
<b>
More detailed documentation, along with tutorials and other resources, can be found at
<a href="http://www.numpy.org" target="blank">NumPy</a>.
</b>

In [9]:
import numpy as np

# 1 - Understanding Data Types in Python


```c
/*C code*/
int result = 0;
for(int i=0; i<100; i++){
 result += i;
}
```
```python
#python code
result = 0
for i in range(100):
 result += i
```

```c
/*C code*/
int x = 4;
x = "four"; /* FAILS */
```
```python
#python code
x = 4
x = "four"
```

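    The standard Python implementation is written in C. This means that every Python
    object is simply a cleverly disguised C structure, which holds bookkeeping information
    in addition to its value. A Python integer, for example, is defined roughly as follows: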
    
```c
struct _longobject {
 long ob_refcnt;
 PyTypeObject *ob_type;
 size_t ob_size;
 long ob_digit[1];
};
```


![title](imgs/CvsPython.png)
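
The extra information carried by each Python object can be observed from Python itself. The snippet below is a small sketch that assumes CPython; the size it prints is an implementation detail and differs between versions.

```python
import sys

x = 4
print(type(x))            # the type travels with the object, not with the name x
print(sys.getsizeof(x))   # bytes used by the whole object, more than a bare C int

x = "four"                # rebinding the name to a different type is allowed
print(type(x))
```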



## a) A Python List Is More Than Just a List

In [10]:
L = [True, "2", 3.0, 4]
[type(i) for i in L]

[bool, str, float, int]

![title](imgs/CvsPythonList.png)
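
For contrast, the sketch below shows what happens when similar data is placed in a NumPy array. With `dtype=object` the array keeps references to the original Python objects, much like a list; with purely numeric input and no explicit dtype, NumPy settles on one common type for every element.

```python
import numpy as np

L = [True, "2", 3.0, 4]

# dtype=object keeps references to the original Python objects, like a list does
obj_arr = np.array(L, dtype=object)
print([type(i) for i in obj_arr])

# With numeric input and no dtype, NumPy picks one common type for all elements
num_arr = np.array([True, 3.0, 4])
print(num_arr.dtype)
```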

## b) Creating Arrays from Python Lists


In [11]:
np.array([1, 4, 2, 5, 3])


array([1, 4, 2, 5, 3])

In [12]:
np.array([3.14, 4, 2, 3])


array([3.14, 4.  , 2.  , 3.  ])

In [13]:
np.array([1, 2, 3, 4], dtype='float32')


array([1., 2., 3., 4.], dtype=float32)

In [14]:
np.array([range(i, i + 3) for i in [2, 4, 6]])

array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])

## c) Creating Arrays from Scratch

In [15]:
np.zeros(10, dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [16]:
np.ones((3, 5), dtype=float)

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [20]:
# shape, fill value
np.full((3, 5), 3.14)

array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])

In [19]:
# start, stop (exclusive), step
np.arange(0, 20, 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [21]:
# start, stop (inclusive), number of points
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])
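
The main difference between `np.arange` and `np.linspace` is how they treat the end value. The short sketch below puts the two side by side; the variable names are only for illustration.

```python
import numpy as np

a = np.arange(0, 1, 0.25)                  # stop value is excluded
b = np.linspace(0, 1, 5)                   # stop value is included by default
c = np.linspace(0, 1, 4, endpoint=False)   # excluding the endpoint gives the same points as a

print(a)
print(b)
print(np.allclose(a, c))
```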

In [22]:
# Create a 3x3 array of normally distributed random values
# with mean 0 and standard deviation 1
np.random.normal(0, 1, (3, 3))

array([[-2.43017624, -0.14547795,  1.03245602],
       [-0.13185584, -0.08883609,  1.86035345],
       [ 0.03216037, -0.2314063 ,  0.20616837]])

In [24]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))

array([[5, 4, 3],
       [6, 9, 5],
       [6, 3, 0]])
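
Newer NumPy code often draws random numbers from a `Generator` object instead of the module-level functions used above. The following is a sketch of the same two arrays using `np.random.default_rng`; the seed value is arbitrary, and the numbers produced will differ from the outputs shown above.

```python
import numpy as np

rng = np.random.default_rng(seed=0)       # arbitrary seed, for reproducibility

normal_vals = rng.normal(0, 1, (3, 3))    # mean 0, standard deviation 1
int_vals = rng.integers(0, 10, (3, 3))    # integers in the interval [0, 10)

print(normal_vals)
print(int_vals)
```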

In [26]:
# Create a 3x3 identity matrix
np.eye(3,dtype=int)

array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])

In [27]:
# Create an uninitialized array of three floats (the default dtype)
# The values will be whatever happens to already exist at that
# memory location
np.empty(3)

array([1., 1., 1.])

## d) NumPy Standard Data Types
![title](imgs/numpyDataTypes.png)


In [29]:
np.zeros(10, dtype=np.int16)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int16)

In [30]:
np.zeros(10, dtype='int16')

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int16)
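
The two cells above specify the same type in two ways. The sketch below checks that equivalence and uses `np.iinfo` to show the range of values a 16-bit integer can hold, which is the practical consequence of choosing a small dtype.

```python
import numpy as np

# A string name and the NumPy type object describe the same dtype
same = np.dtype('int16') == np.dtype(np.int16)
print(same)

# Range of values representable by a 16-bit signed integer
info = np.iinfo(np.int16)
print(info.min, info.max)
```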

# 2 - The Basics of NumPy Arrays

#### This section will cover:
- **Attributes of arrays:**
    Determining the size, shape, memory consumption, and data types of arrays
- **Indexing of arrays:**
    Getting and setting the value of individual array elements
- **Slicing of arrays:**
    Getting and setting smaller subarrays within a larger array
- **Reshaping of arrays:**
    Changing the shape of a given array
- **Joining and splitting of arrays:**
    Combining multiple arrays into one, and splitting one array into many
    
## a) NumPy Array Attributes

In [33]:
np.random.seed(0) # seed for reproducibility


x1 = np.random.randint(0,10, 6) # One-dimensional array
x2 = np.random.randint(0,10, (3, 4)) # Two-dimensional array
x3 = np.random.randint(0,10, (3, 4, 5)) # Three-dimensional array

In [35]:
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)

x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60


In [37]:
print("dtype:", x3.dtype)
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")

dtype: int32
itemsize: 4 bytes
nbytes: 240 bytes
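
The default integer dtype depends on the platform and NumPy version, so the `int32` shown above will not appear everywhere. As a sketch, the dtype can be set explicitly to make the sizes predictable; in general `nbytes` equals `itemsize` times `size`.

```python
import numpy as np

# Same shape as x3 above, with the dtype fixed explicitly
arr = np.random.randint(0, 10, (3, 4, 5), dtype=np.int32)

print("size:    ", arr.size)
print("itemsize:", arr.itemsize, "bytes")
print("nbytes:  ", arr.nbytes, "bytes")
print(arr.nbytes == arr.itemsize * arr.size)
```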


## b) Array Indexing: Accessing Single Elements


In [38]:
x1

array([5, 0, 3, 3, 7, 9])

In [39]:
print("x1[0]: ", x1[0])
print("x1[-1]: ", x1[-1])

x1[0]:  5
x1[-1]:  9


In [40]:
x2

array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 7]])

In [41]:
print("x2[0, 0]: ", x2[0, 0])
print("x2[0, -1]: ", x2[0, -1])
print("x2[-1, -1]: ", x2[-1, -1])

x2[0, 0]:  3
x2[0, -1]:  4
x2[-1, -1]:  7
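
The same index notation can be used to set values. The sketch below recreates the arrays shown above from their printed values and modifies them; note that, because NumPy arrays have a fixed type, a floating-point value assigned into an integer array is silently truncated.

```python
import numpy as np

x1 = np.array([5, 0, 3, 3, 7, 9])
x2 = np.array([[3, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])

x2[0, 0] = 12        # set a single element by its index
x1[0] = 3.14159      # the float is truncated to fit the integer dtype

print(x2)
print(x1)
```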


## c) Array Slicing: Accessing Subarrays

***x[start:stop:step]***


In [43]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [44]:
x[:5]

array([0, 1, 2, 3, 4])

In [45]:
x[5:]

array([5, 6, 7, 8, 9])

In [46]:
x[4:7]

array([4, 5, 6])

In [47]:
x[::2]

array([0, 2, 4, 6, 8])

In [48]:
x[1::2]

array([1, 3, 5, 7, 9])

**Negative step value:** the defaults for start and stop are swapped, which gives a convenient way to reverse an array.

In [50]:
x[::-1] # all elements, reversed

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

In [51]:
x[5::-2] # reversed every other from index 5

array([5, 3, 1])

### Multidimensional subarrays

In [52]:
x2

array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 7]])

In [53]:
x2[:2, :3]

array([[3, 5, 2],
       [7, 6, 8]])

In [54]:
x2[:3, ::2] 

array([[3, 2],
       [7, 8],
       [1, 7]])

In [55]:
x2[::-1, ::-1]

array([[7, 7, 6, 1],
       [8, 8, 6, 7],
       [4, 2, 5, 3]])

In [56]:
# access the first column
x2[:, 0]

array([3, 7, 1])

In [57]:
# access the first row
x2[0, :]

array([3, 5, 2, 4])

### Subarrays as no-copy views

> Array slices return views rather than copies of the array data.\
  This is one area in which NumPy array slicing differs from Python list slicing: in lists, slices are copies.

In [58]:
x2

array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 7]])

In [59]:
x2_sub = x2[:2, :2]
x2_sub

array([[3, 5],
       [7, 6]])

In [60]:
x2_sub[0,0] = 100

In [61]:
x2_sub

array([[100,   5],
       [  7,   6]])

In [62]:
x2

array([[100,   5,   2,   4],
       [  7,   6,   8,   8],
       [  1,   6,   7,   7]])

> This default behavior is actually quite useful: it means that when we work with large
datasets, we can access and process pieces of these datasets without the need to copy
the underlying data buffer.

In [63]:
x2_sub_copy = x2[:2, :2].copy()
x2_sub_copy

array([[100,   5],
       [  7,   6]])

In [64]:
x2_sub_copy[0,0] = 99
print(x2_sub_copy)
print(x2)

[[99  5]
 [ 7  6]]
[[100   5   2   4]
 [  7   6   8   8]
 [  1   6   7   7]]
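
Whether two arrays refer to the same underlying buffer can also be checked directly with `np.shares_memory`. The sketch below uses a fresh example array (not the `x2` from above) and contrasts the behavior with list slicing.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
view = arr[:2, :2]            # slice: a view onto the same buffer
copy = arr[:2, :2].copy()     # explicit copy: its own buffer

print(np.shares_memory(arr, view))
print(np.shares_memory(arr, copy))

# List slices are copies, so changing the slice leaves the original untouched
lst = [1, 2, 3, 4]
lst_slice = lst[:2]
lst_slice[0] = 100
print(lst)
```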


## d) Reshaping of Arrays

In [66]:
x = np.arange(1, 10)
grid = x.reshape((3, 3))
grid

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

- **Note that for this to work, the size of the initial array must match the size of the reshaped array.**
- > Where possible, the `reshape` method will use a no-copy view of the initial array, but with noncontiguous memory buffers this is not always the case.

In [67]:
grid[0,0] = 0
print(grid)
print(x)

[[0 2 3]
 [4 5 6]
 [7 8 9]]
[0 2 3 4 5 6 7 8 9]
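
Both points in the note above can be checked with a small experiment. In this sketch, reshaping a contiguous array gives a view, reshaping a transposed (noncontiguous) array forces a copy, and a size mismatch raises a `ValueError`.

```python
import numpy as np

x = np.arange(6)
grid = x.reshape((2, 3))                 # contiguous input: a view
print(np.shares_memory(x, grid))

flat_t = grid.T.reshape(6)               # transposed, noncontiguous input: a copy
print(np.shares_memory(grid, flat_t))

try:
    x.reshape((4, 2))                    # 6 elements cannot fill 8 slots
except ValueError as err:
    print("reshape failed:", err)
```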


### Converting a one-dimensional array into a two-dimensional row or column matrix

In [76]:
x = np.array([1,2,3])
x

array([1, 2, 3])

In [77]:
# row vector via reshape
x.reshape(1, 3)

array([[1, 2, 3]])

In [78]:
# row vector via newaxis
x[np.newaxis, :]

array([[1, 2, 3]])

In [79]:
# column vector via reshape
x.reshape((3, 1))

array([[1],
       [2],
       [3]])

In [81]:
# column vector via newaxis
x[:, np.newaxis]

array([[1],
       [2],
       [3]])
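
Comparing shapes makes the effect of these conversions easier to see. The sketch below also uses `-1` in `reshape`, which asks NumPy to infer that dimension from the array's size.

```python
import numpy as np

x = np.array([1, 2, 3])
row = x[np.newaxis, :]
col = x[:, np.newaxis]
col_inferred = x.reshape(-1, 1)   # -1 lets NumPy work out the number of rows

print(x.shape, row.shape, col.shape, col_inferred.shape)
print(np.newaxis is None)         # np.newaxis is an alias for None
```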

## e) Array Concatenation and Splitting
### Concatenation of arrays


In [82]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])


array([1, 2, 3, 3, 2, 1])

In [83]:
z = np.array([99,99,99])
np.concatenate([x,y,z])

array([ 1,  2,  3,  3,  2,  1, 99, 99, 99])

In [91]:
grid = np.arange(1,7).reshape(2,3)
print(np.concatenate([grid,grid]))
print(np.concatenate([grid,grid], axis = 1) )

[[1 2 3]
 [4 5 6]
 [1 2 3]
 [4 5 6]]
[[1 2 3 1 2 3]
 [4 5 6 4 5 6]]


#### For Mixed Dimensions

In [100]:
x = np.array([1, 2, 3])

y = np.array([[1], 
              [2]])

grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

In [101]:
np.vstack([x, grid])

array([[1, 2, 3],
       [9, 8, 7],
       [6, 5, 4]])

In [99]:
np.hstack([y, grid])

array([[1, 9, 8, 7],
       [2, 6, 5, 4]])
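
To see how these helpers relate to `np.concatenate`, the sketch below reproduces both results by hand: the one-dimensional array is first turned into a single row, after which an ordinary concatenation along the chosen axis gives the same answer.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([[1],
              [2]])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vstack treats the 1-D array as a single row, then joins along axis 0
v = np.concatenate([x[np.newaxis, :], grid], axis=0)

# hstack joins along axis 1; y already has the matching number of rows
h = np.concatenate([y, grid], axis=1)

print(np.array_equal(v, np.vstack([x, grid])))
print(np.array_equal(h, np.hstack([y, grid])))
```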

### Splitting of arrays

The opposite of concatenation is splitting, which is implemented by the functions `np.split`, `np.hsplit`, and `np.vsplit`. \
For each of these, we can pass a list of indices giving the split points; N split points produce N + 1 subarrays.

In [102]:
x = np.array([1, 2, 3, 99, 99, 3, 2, 1])
x1, x2, x3 = np.split(x, [3,5])
print(x1)
print(x2)
print(x3)

[1 2 3]
[99 99]
[3 2 1]


In [103]:
grid = np.arange(16).reshape((4, 4))
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [106]:
upper, middle, lower = np.vsplit(grid, [1, 3])
print(upper)
print(middle)
print(lower)

[[0 1 2 3]]
[[ 4  5  6  7]
 [ 8  9 10 11]]
[[12 13 14 15]]


In [107]:
left, middle, right = np.hsplit(grid, [1, 3])
print(left)
print(middle)
print(right)

[[ 0]
 [ 4]
 [ 8]
 [12]]
[[ 1  2]
 [ 5  6]
 [ 9 10]
 [13 14]]
[[ 3]
 [ 7]
 [11]
 [15]]
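
Besides a list of split points, `np.split` also accepts an integer number of equal sections. The sketch below shows that form, the related `np.array_split` for lengths that do not divide evenly, and the `axis` argument that `np.vsplit` corresponds to.

```python
import numpy as np

x = np.arange(9)
a, b, c = np.split(x, 3)                    # an integer asks for that many equal parts
print(a, b, c)

parts = np.array_split(np.arange(10), 3)    # array_split allows uneven parts
print([p.tolist() for p in parts])

grid = np.arange(16).reshape((4, 4))
top, bottom = np.split(grid, [2], axis=0)   # same result as np.vsplit(grid, [2])
print(top)
print(bottom)
```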
