# Introduction to NumPy

**NumPy, short for Numerical Python, is one of the most important foundational packages for numerical computing, data science, and ML/AI in Python. Most computational packages that provide scientific functionality use NumPy under the hood for data exchange. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation, and much more.**

**At the core of the NumPy package is the ndarray object. This encapsulates n-dimensional arrays of homogeneous data types, with many operations being performed in compiled code for performance. There are several important differences between NumPy arrays and the standard Python sequences:**
- NumPy arrays have a fixed size at creation, unlike Python lists (which can grow dynamically). Changing the size of an ndarray will create a new array and delete the original.
- The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in memory. The exception: one can have arrays of (Python, including NumPy) objects, thereby allowing for arrays of different sized elements.
- NumPy arrays facilitate advanced mathematical and other types of operations on large numbers of data. Typically, such operations are executed more efficiently and with less code than is possible using Python’s built-in sequences.
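The three differences above can be illustrated with a minimal sketch. The variable names are made up for this illustration and are not part of the original material.

```python
import numpy as np

# 1. Fixed size: np.append does not grow the array in place, it returns a new one.
values = np.array([1, 2, 3])
extended = np.append(values, 4)
print(values.size, extended.size)

# 2. One data type per array: mixing ints and floats upcasts everything to float.
mixed = np.array([1, 2.5])
print(mixed.dtype)

# The exception: an object array can hold arbitrary Python objects.
objs = np.array([1, "two", 3.0], dtype=object)
print(objs.dtype)

# 3. Less code: arithmetic applies element by element, whereas
#    multiplying a list repeats it.
doubled_array = values * 2
repeated_list = [1, 2, 3] * 2
print(doubled_array, repeated_list)
```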


**Here are some of the things you’ll find in NumPy:**
- ndarray, an efficient multidimensional array providing fast array-oriented arithmetic operations and flexible broadcasting capabilities.
- Mathematical functions for fast operations on entire arrays of data without having to write loops.
- Tools for reading/writing array data to disk and working with memory-mapped files.
- Linear algebra, random number generation, and Fourier transform capabilities.
- A C API for connecting NumPy with libraries written in C, C++, or FORTRAN.
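As a rough illustration of the first two items in this list, the sketch below applies arithmetic and a mathematical function to whole arrays without an explicit loop, and lets broadcasting combine arrays of different shapes. The array names and values are invented for the example.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])

# A scalar is broadcast across every element of the array.
discounted = prices * 0.9

# A (3, 1) column and a length-2 row broadcast to a (3, 2) grid.
column = np.array([[1], [2], [3]])
row = np.array([10, 20])
grid = column + row

# Mathematical functions operate on the entire array at once.
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))

print(discounted)
print(grid)
print(roots)
```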


## Why Is NumPy Faster?
**NumPy's speed advantage over standard Python lists stems from several key factors:**
1. **Homogeneous Data Types and Contiguous Memory:**
    NumPy arrays store elements of the same data type contiguously in memory. This contrasts with Python lists, which store elements as pointers to objects scattered in memory. Contiguous storage allows for efficient access and manipulation of data, leveraging CPU cache optimization and reducing memory access overhead.

2. **Vectorized Operations:**
    NumPy operations are implemented in C and Fortran, enabling vectorized computations. Instead of looping through individual elements, NumPy can perform operations on entire arrays at once. This significantly reduces the overhead associated with Python loops, pointer indirection, and dynamic type checking.

3. **C Implementation:**
    Many core NumPy functions are implemented in C, a compiled language. This provides a substantial performance boost compared to Python's interpreted nature, especially for computationally intensive tasks.

4. **Memory Efficiency:**
    NumPy arrays are more memory-efficient than Python lists due to their homogeneous data types and contiguous storage. This is particularly beneficial when dealing with large datasets, as it minimizes memory consumption and improves performance.

5. **Locality of Reference:**
    The contiguous memory layout of NumPy arrays promotes locality of reference: elements that are processed one after another also sit next to each other in memory. This lets the CPU cache and prefetch data efficiently, further enhancing performance.
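The memory claims in points 1 and 4 can be checked roughly with the sketch below. It assumes CPython, where `sys.getsizeof` reports the size of the list's pointer table and of each integer object separately. The exact list figure varies by Python version, so only the comparison matters.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)

# A list stores pointers; every integer is a separate Python object.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# The array stores raw 8-byte integers in a single buffer.
array_bytes = arr.nbytes

print(f"list:  about {list_bytes} bytes")
print(f"array: {array_bytes} bytes")
print("contiguous:", arr.flags["C_CONTIGUOUS"])
```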

In [7]:
# Let's see an example
import numpy as np
import time

# Using NumPy
start_time = time.perf_counter()  # perf_counter is a high-resolution clock meant for timing
a = np.arange(1000000)
b = a * 2
end_time = time.perf_counter()
numpy_time = end_time - start_time

# Using Python list
start_time = time.perf_counter()
a = list(range(1000000))
b = [x * 2 for x in a]
end_time = time.perf_counter()
list_time = end_time - start_time

print(f"NumPy time: {numpy_time:.4f} seconds")
print(f"List time: {list_time:.4f} seconds")

print(f'Difference: {(list_time-numpy_time):.4f}')

NumPy time: 0.0245 seconds
List time: 0.1304 seconds
Difference: 0.1059
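A single timed run like the one above is noisy and also includes the time spent creating the data. The following sketch is one possible way to get a steadier comparison with the standard library's `timeit` module. The array size and repeat counts are arbitrary choices, and the numbers will differ from machine to machine.

```python
import timeit
import numpy as np

n = 100_000
arr = np.arange(n)
lst = list(range(n))

runs = 10
# Take the best of three repeats, then average over the runs in that repeat.
numpy_time = min(timeit.repeat(lambda: arr * 2, number=runs, repeat=3)) / runs
list_time = min(timeit.repeat(lambda: [x * 2 for x in lst], number=runs, repeat=3)) / runs

print(f"NumPy time per run: {numpy_time:.6f} seconds")
print(f"List time per run:  {list_time:.6f} seconds")
```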


## Who Else Uses NumPy?
NumPy fully supports an object-oriented approach, starting, once again, with ndarray. For example, ndarray is a class, possessing numerous methods and attributes. Many of its methods are mirrored by functions in the outermost NumPy namespace, allowing the programmer to code in whichever paradigm they prefer. This flexibility has allowed the NumPy array dialect and NumPy ndarray class to become the de facto language of multi-dimensional data interchange used in Python.
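To make the "mirrored" methods and functions concrete, here is a small sketch that computes the same results in both styles. `sum` and `reshape` are just two examples of such pairs.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# Method style (object-oriented)
total_method = a.sum()
reshaped_method = a.reshape(3, 2)

# Function style (top-level NumPy namespace)
total_function = np.sum(a)
reshaped_function = np.reshape(a, (3, 2))

print(total_method, total_function)
print(np.array_equal(reshaped_method, reshaped_function))
```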

**Having an understanding of NumPy arrays and array-oriented computing will help you use tools with array-oriented semantics, like pandas, much more effectively.**

## Installation
```python
# Run this in a terminal first (inside a notebook cell, use: %pip install numpy):
#   pip install numpy

import numpy as np

np.__version__
```

## Prerequisites
Before working through this class, you should know a bit of Python. If you would like to refresh your memory, take a look at our Python tutorial.

**NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers.**


**NumPy’s array class is called ndarray.** It is also known by the alias array. Note that numpy.array is not the same as the Python standard library class array.array, which only handles one-dimensional arrays and offers less functionality. The more important attributes of an ndarray object are:
- **ndarray.ndim** the number of axes (dimensions) of the array.
- **ndarray.shape** the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the number of axes, ndim.
- **ndarray.size** the total number of elements of the array. This is equal to the product of the elements of shape.
- **ndarray.dtype** an object describing the type of the elements in the array. One can create or specify dtypes using standard Python types. Additionally, NumPy provides types of its own; numpy.int32, numpy.int16, and numpy.float64 are some examples.
- **ndarray.itemsize** the size in bytes of each element of the array. For example, an array of elements of type float64 has itemsize 8 (=64/8), while one of type complex64 also has itemsize 8 (=64/8, two 32-bit floats). It is equivalent to ndarray.dtype.itemsize.
- **ndarray.data** the buffer containing the actual elements of the array. Normally, we won’t need to use this attribute because we will access the elements in an array using indexing facilities.

In [11]:
# An example
import numpy as np
    
a = np.arange(15).reshape(3, 5)
a


array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [12]:
a.shape


(3, 5)

In [13]:
a.ndim

2

In [14]:
a.dtype.name

'int32'

In [15]:
a.itemsize

4

In [16]:
a.size

15

In [17]:
type(a)

numpy.ndarray

In [18]:
b = np.array([6, 7, 8])
b

array([6, 7, 8])

In [19]:
type(b)

numpy.ndarray
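The attributes are related to one another, which the sketch below checks. The dtypes are set explicitly so the results do not depend on the platform's default integer size. The names `x` and `z` are chosen only for this illustration.

```python
import numpy as np

x = np.zeros((3, 5), dtype=np.float64)
z = np.zeros((2, 4), dtype=np.complex64)

# ndim is the length of the shape tuple.
print(x.ndim == len(x.shape))

# size is the product of the entries of shape.
print(x.size == 3 * 5)

# itemsize is the number of bytes per element.
print(x.itemsize, z.itemsize)

# Total bytes in the buffer: number of elements times bytes per element.
print(x.nbytes == x.size * x.itemsize)
```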

## Array Creation
There are several ways to create arrays. For example, you can create an array from a regular Python list or tuple using the array function. The type of the resulting array is deduced from the type of the elements in the sequences.

In [25]:
a = np.array([2, 3, 4])
a

array([2, 3, 4])

In [26]:
a.dtype

dtype('int32')

In [27]:
b = np.array([1.2, 3.5, 5.1])
b

array([1.2, 3.5, 5.1])

In [28]:
b.dtype

dtype('float64')

In [None]:
# What happens here?
b = np.array([1, 3, '5.1'])
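One way to investigate the question in the cell above is to inspect the resulting dtype. Because an array holds a single element type, NumPy converts every element to a common type, which is a string type when a string is present. The conversion back to numbers with `astype` is an extra step added here for illustration.

```python
import numpy as np

b = np.array([1, 3, '5.1'])

# 'U' is the dtype kind for Unicode strings: the integers were converted to text.
print(b.dtype.kind)
print(b)

# If numbers were intended, convert explicitly.
as_floats = b.astype(float)
print(as_floats)
```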

In [30]:
# Frequent error: passing several numbers instead of a single sequence

# a = np.array(1, 2, 3, 4)   # WRONG: raises TypeError
a = np.array([1, 2, 3, 4])   # RIGHT

### Note
**array transforms sequences of sequences into two-dimensional arrays, sequences of sequences of sequences into three-dimensional arrays, and so on.**

In [31]:
b = np.array([(1.5,2,3), (4,5,6)])
b

array([[1.5, 2. , 3. ],
       [4. , 5. , 6. ]])

In [34]:
c = np.array([
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[10, 11, 12], [13, 14, 15], [16, 17, 18]],
    [[19, 20, 21], [22, 23, 24], [25, 26, 27]]
])

c

array([[[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9]],

       [[10, 11, 12],
        [13, 14, 15],
        [16, 17, 18]],

       [[19, 20, 21],
        [22, 23, 24],
        [25, 26, 27]]])

In [35]:
c.ndim

3

In [36]:
c.shape

# 3 layers, 3 rows in each layer, 3 columns in each row

(3, 3, 3)

**The type of the array can also be explicitly specified at creation time:**

In [38]:
a = np.array([[1,2], 
              [3,4]], 
             dtype=complex)

a

array([[1.+0.j, 2.+0.j],
       [3.+0.j, 4.+0.j]])
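The same `dtype` argument works with NumPy's own types, and an existing array can be converted with `astype`. This is a minimal sketch with made-up values. Note that converting floats to integers discards the fractional part.

```python
import numpy as np

# Request a compact integer type at creation time.
small = np.array([1, 2, 3], dtype=np.int8)
print(small.dtype, small.itemsize)

# Convert an existing array to another type; this returns a new array.
as_float = small.astype(np.float32)
print(as_float.dtype)

# Float to integer conversion truncates toward zero.
truncated = np.array([1.9, -2.7]).astype(np.int64)
print(truncated)
```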

## Indexing, Slicing and Iterating

In [41]:
a = np.arange(10)**3
a

array([  0,   1,   8,  27,  64, 125, 216, 343, 512, 729], dtype=int32)

In [42]:
a[2]

8

In [43]:
a[2:5]

array([ 8, 27, 64], dtype=int32)

In [44]:
a[:6:2]

array([ 0,  8, 64], dtype=int32)

In [45]:
a[ : :-1]

array([729, 512, 343, 216, 125,  64,  27,   8,   1,   0], dtype=int32)

In [46]:
for i in a:
    print(i)

0
1
8
27
64
125
216
343
512
729


In [48]:
for i in a:
    print(i**(2))

0
1
64
729
4096
15625
46656
117649
262144
531441
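Iterating element by element works, but the same result can usually be computed on the whole array at once, which is the array-oriented style described earlier. The sketch below compares the two approaches. The integer type is fixed explicitly so the example behaves the same on every platform.

```python
import numpy as np

a = np.arange(10, dtype=np.int64) ** 3

# Loop version: square each element one at a time.
looped = [int(i) ** 2 for i in a]

# Vectorized version: square the whole array in one expression.
vectorized = a ** 2

print(vectorized)
print(vectorized.tolist() == looped)
```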


In [50]:
b = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12],
              [13, 14, 15, 16]
             ])
b

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]])

In [51]:
b[2]

array([ 9, 10, 11, 12])

In [52]:
b[2,3]

# task: give me '15'

12

In [62]:
# second column of b
b[0:4, 1] 

array([ 2,  6, 10, 14])

In [63]:
# same thing

b[ : ,1]

array([ 2,  6, 10, 14])

In [66]:
# second and third row of b
b[1:3, : ]

array([[ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [67]:
b[-1]

array([13, 14, 15, 16])

In [69]:
# task
# return the second and third row of b with each element multiplied by 2
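One possible answer to this task, offered as a sketch rather than the only solution, combines the row slice used earlier with element-wise multiplication:

```python
import numpy as np

b = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12],
              [13, 14, 15, 16]])

# Rows at index 1 and 2 (the second and third rows), all columns, doubled.
result = b[1:3, :] * 2
print(result)
```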
