# Crash Course on Numpy

[Numpy Cheat sheet](https://www.datacamp.com/cheat-sheet/numpy-cheat-sheet-data-analysis-in-python)

## Numpy Basics

NumPy stands for `Numerical Python`. It is a fundamental package for scientific computing with Python. It provides support for arrays, matrices, and a large collection of mathematical functions to operate on these arrays. NumPy is widely used in data science, machine learning, and scientific research due to its efficiency and ease of use for numerical operations.

NumPy was created in 2005 by Travis Oliphant, who also co-founded Anaconda.

### Why do we need Numpy?

NumPy arrays offer many advantages over Python lists for numerical and scientific computing, because Python lists are not designed for numerical computation. As a warm-up, try writing plain Python code that creates a new list containing double the value of every element in the following list, `list_1`.

``` python
list_1 = [1, 2, 3]
```

In [1]:
list_1 = [1, 2, 3]

# write the code below.




Here are some reasons why NumPy is preferred over lists for these purposes:

- **Performance**:

    - Speed: NumPy operations are implemented in C and are much faster than their Python equivalents. For example, element-wise operations on arrays are significantly faster in NumPy.
    - Memory Efficiency: NumPy arrays consume less memory than Python lists. This is because they store elements of the same type in contiguous memory locations, which reduces overhead.

- **Functionality**:

    - Broad Range of Functions: NumPy provides a comprehensive suite of mathematical functions, including linear algebra, random number generation, Fourier transforms, and more.
    - Vectorized Operations: NumPy allows for vectorized operations, which means you can perform operations on entire arrays at once without needing explicit loops. This leads to cleaner and more readable code.

- **Convenience**:

    - Multidimensional Arrays: NumPy supports multidimensional arrays, which are essential for working with matrices and higher-dimensional data.
    - Broadcasting: NumPy’s broadcasting feature allows for operations on arrays of different shapes and sizes without the need for explicit replication of data.
    - Interoperability: NumPy arrays integrate well with many other libraries in the scientific Python ecosystem, such as pandas, SciPy, and scikit-learn.

- **Specialized Data Structures**:

    - Matrix Operations: NumPy provides specialized data structures and functions for matrix operations, which are not as straightforward to implement using lists.
    - Advanced Indexing: NumPy offers advanced indexing and slicing capabilities, making it easier to manipulate and access data.
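
As a rough illustration of the vectorization and broadcasting points above, the sketch below converts temperatures with a plain list and with an array. The variable names (`celsius`, `offsets`, and so on) are made up for this example.

```python
import numpy as np

celsius = [0, 25, 100]

# Plain Python: a list comprehension visits each element explicitly.
fahrenheit_list = [c * 9 / 5 + 32 for c in celsius]

# NumPy: the same arithmetic is applied to the whole array at once.
celsius_arr = np.array(celsius)
fahrenheit_arr = celsius_arr * 9 / 5 + 32

# Broadcasting: the 1D row of offsets is added to every row of the 2D array.
matrix = np.array([[1, 2, 3], [4, 5, 6]])
offsets = np.array([10, 20, 30])
shifted = matrix + offsets

print(fahrenheit_list)  # [32.0, 77.0, 212.0]
print(fahrenheit_arr)
print(shifted)
```

The array version reads like the formula itself, which is the main readability benefit of vectorized code.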


### Importing NumPy

To start using NumPy, you need to import it. The convention is to import it as `np`.

In [2]:
import numpy as np

We can now access the tools provided by the NumPy package using the `np.` prefix.

### Creating Arrays

In [3]:
# Creating a 1D array
array_1d_1 = np.array([1, 2, 3, 4, 5])
print("1D array using np.array with list:")
print(array_1d_1)

array_1d_2 = np.array(range(10))
print("\n1D array using np.array with range:")
print(array_1d_2)

array_1d_3 = np.arange(10)
print("\n1D array using np.arange with single argument:")
print(array_1d_3)

array_1d_4 = np.arange(3, 10)
print("\n1D array using np.arange with start and stop arguments:")
print(array_1d_4)

array_1d_5 = np.arange(3, 10, 2)
print("\n1D array using np.arange with start, stop and stepsize arguments:")
print(array_1d_5)

array_1d_6 = np.linspace(0, 1, 5)
print("\n1D array using np.linspace with start, stop, and number of elements:")
print(array_1d_6)

# Creating a 2D array
array_2d_1 = np.array([[1, 2, 3], [4, 5, 6]])
print("\n2D array using np.array with nested lists:")
print(array_2d_1)

array_2d_2 = np.zeros((4, 3))
print("\n2D array of zeros using np.zeros:")
print(array_2d_2)

array_2d_3 = np.ones((4, 3))
print("\n2D array of ones using np.ones:")
print(array_2d_3)

array_2d_4 = np.random.rand(2, 2)
print("\n2D array with random values from a uniform distribution using np.random.rand:")
print(array_2d_4)

array_2d_5 = np.random.randn(2, 2)
print("\n2D array with random values from a normal distribution using np.random.randn:")
print(array_2d_5)


1D array using np.array with list:
[1 2 3 4 5]

1D array using np.array with range:
[0 1 2 3 4 5 6 7 8 9]

1D array using np.arange with single argument:
[0 1 2 3 4 5 6 7 8 9]

1D array using np.arange with start and stop arguments:
[3 4 5 6 7 8 9]

1D array using np.arange with start, stop and stepsize arguments:
[3 5 7 9]

1D array using np.linspace with start, stop, and number of elements:
[0.   0.25 0.5  0.75 1.  ]

2D array using np.array with nested lists:
[[1 2 3]
 [4 5 6]]

2D array of zeros using np.zeros:
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]

2D array of ones using np.ones:
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]

2D array with random values from a uniform distribution using np.random.rand:
[[0.40053925 0.98047946]
 [0.47796402 0.12159739]]

2D array with random values from a normal distribution using np.random.randn:
[[-0.78938227  0.12113704]
 [-0.84322527  1.77492065]]
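
The random values above change on every run. If repeatable results are needed, one option is a seeded `Generator` created with `np.random.default_rng`. This is a minimal sketch; the seed value 42 is an arbitrary choice.

```python
import numpy as np

# A fixed seed makes the "random" values repeatable across runs.
rng = np.random.default_rng(42)

uniform_2d = rng.random((2, 2))          # uniform on [0, 1)
normal_2d = rng.standard_normal((2, 2))  # standard normal (mean 0, std 1)

# Re-creating the generator with the same seed reproduces the same draws.
rng_again = np.random.default_rng(42)
same_uniform = rng_again.random((2, 2))

print(uniform_2d)
print(normal_2d)
print(np.array_equal(uniform_2d, same_uniform))  # True
```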


### Array Attributes

NumPy arrays have various attributes that provide information about the array.




In [4]:
# Example array
array = np.array([[1, 2, 3], [4, 5, 6]])

# Shape of the array
print("Shape:", array.shape)

# Number of dimensions
print("Number of dimensions:", array.ndim)

# Size of the array (total number of elements)
print("Size:", array.size)

# Data type of the elements
print("Data type:", array.dtype)


Shape: (2, 3)
Number of dimensions: 2
Size: 6
Data type: int64
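
The default integer type reported above can differ between platforms and NumPy versions. A small sketch of requesting a dtype explicitly and checking how much memory the elements take, using the `itemsize` and `nbytes` attributes:

```python
import numpy as np

# Requesting the dtype explicitly gives the same result on every platform.
array_i32 = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

# astype returns a converted copy; array_i32 itself is unchanged.
array_f64 = array_i32.astype(np.float64)

# itemsize is bytes per element; nbytes is itemsize * size.
print(array_i32.dtype, array_i32.itemsize, array_i32.nbytes)  # int32 4 24
print(array_f64.dtype, array_f64.itemsize, array_f64.nbytes)  # float64 8 48
```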


### Array Indexing and Slicing

NumPy arrays can be indexed and sliced much like Python lists. Slicing extracts a subset of an array's elements using the familiar Python syntax:

```python
array[start:stop:step]
```
- `start` is the index where the slice starts (inclusive).
- `stop` is the index where the slice ends (exclusive).
- `step` is the step size between each index.
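
Any of the three parts can be omitted or negative, and a 2D array takes one slice per axis separated by a comma. The following sketch shows a few common patterns; the array names are chosen only for illustration.

```python
import numpy as np

arr = np.arange(10)          # [0 1 2 3 4 5 6 7 8 9]

first_three = arr[:3]        # start omitted: begins at index 0
last_three = arr[-3:]        # negative start counts from the end
reversed_arr = arr[::-1]     # negative step walks backwards

grid = np.array([[0, 1, 2, 3],
                 [4, 5, 6, 7],
                 [8, 9, 10, 11]])
top_left = grid[:2, :2]      # rows 0-1, columns 0-1
second_column = grid[:, 1]   # every row, column 1

print(first_three)    # [0 1 2]
print(last_three)     # [7 8 9]
print(reversed_arr)   # [9 8 7 6 5 4 3 2 1 0]
print(top_left)
print(second_column)  # [1 5 9]
```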


In [None]:
# Creating a NumPy array
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Slicing the array from index 2 to 7 with a step of 2
sliced_arr = arr[2:8:2]
print(sliced_arr)  # Output: [2 4 6]


# Accessing a single element
print("Element at index 2:", arr[2])

# Slicing the array
print("Elements from index 1 to 4:", arr[1:5])

# Modifying an element
arr[3] = 10
print("Modified array:", arr)


#### Slicing by Reference 

When you slice a NumPy array, it does not create a new independent array. Instead, it creates a view of the original array. This means that if you modify the sliced array, it will also modify the original array.

Let's see an example:

In [18]:
# Creating a NumPy array
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Slicing the array
sliced_arr = arr[2:8]

# Modifying the sliced array
sliced_arr[0] = 9999

print("Original array:", arr)  # Output: [   0    1 9999    3    4    5    6    7    8    9]
print("Sliced array:", sliced_arr)  # Output: [9999    3    4    5    6    7]

Original array: [   0    1 9999    3    4    5    6    7    8    9]
Sliced array: [9999    3    4    5    6    7]


#### Importance of the `.copy` Method

To avoid unwanted changes to the original array, you can use the `.copy` method to create a copy of the sliced array. This creates a new array that is independent of the original array.

Let's see an example:


In [20]:
# Creating a NumPy array
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Slicing the array and creating a copy
sliced_arr_copy = arr[2:8].copy()

# Modifying the copied array
sliced_arr_copy[0] = 9999

print("Original array:", arr)  # Output: [0 1 2 3 4 5 6 7 8 9]
print("Copied array:", sliced_arr_copy)  # Output: [9999    3    4    5    6    7]


Original array: [0 1 2 3 4 5 6 7 8 9]
Copied array: [9999    3    4    5    6    7]
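
When it is unclear whether two arrays are linked, one way to check is `np.shares_memory` together with the `.base` attribute. A short sketch comparing a slice with its copy:

```python
import numpy as np

arr = np.arange(10)
view = arr[2:8]           # basic slicing returns a view
copied = arr[2:8].copy()  # .copy() allocates independent storage

print(np.shares_memory(arr, view))    # True
print(np.shares_memory(arr, copied))  # False
print(view.base is arr)               # True: the view borrows arr's data
print(copied.base is None)            # True: the copy owns its data
```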


### Array Operations
NumPy supports element-wise operations on arrays.



In [6]:
# Example arrays
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

# Addition
print("Addition:", array1 + array2)

# Subtraction
print("Subtraction:", array1 - array2)

# Multiplication
print("Multiplication:", array1 * array2)

# Division
print("Division:", array1 / array2)


Addition: [5 7 9]
Subtraction: [-3 -3 -3]
Multiplication: [ 4 10 18]
Division: [0.25 0.4  0.5 ]


In [7]:
## Dimension matters
x = np.array([1,2,3])  
y = np.array([1,2,3,4]) 

# x + y
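
Element-wise operations need compatible shapes. The sketch below shows what happens with the two arrays above, and one way to make the shapes compatible; adding an axis with `np.newaxis` is an illustration, not the only option.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([1, 2, 3, 4])

# Shapes (3,) and (4,) do not match, so the addition raises ValueError.
try:
    x + y
except ValueError as err:
    message = str(err)
    print("ValueError:", message)

# Turning x into a column of shape (3, 1) lets broadcasting pair
# every element of x with every element of y.
table = x[:, np.newaxis] + y
print(table.shape)  # (3, 4)
print(table)
```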

#### Exercise

In [8]:
# Create a numpy array based on a python list. 

array_1 = np.array(list_1)

print(array_1, type(array_1))

# Write code to double every value in the array_1


# Write code to add 5 to each value in the array_1


# Write code to divide each value in the array_1 by 10


# Write code to subtract 1 from each value in the array_1


[1 2 3] <class 'numpy.ndarray'>


### Numpy is faster

To see how much faster NumPy is, compare the timings produced by the code below:

In [9]:
import time

# Creating large lists and arrays
size = 1000000
list1 = list(range(size))
list2 = list(range(size))
array1 = np.arange(size)
array2 = np.arange(size)

# List addition
start_time = time.time()
list_sum = [x + y for x, y in zip(list1, list2)]
end_time = time.time()
print("Time for list addition:", end_time - start_time)

# NumPy array addition
start_time = time.time()
array_sum = array1 + array2
end_time = time.time()
print("Time for NumPy array addition:", end_time - start_time)

Time for list addition: 0.029050111770629883
Time for NumPy array addition: 0.0018529891967773438
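
A single `time.time()` measurement can be noisy. As an alternative sketch, the standard-library `timeit` module can repeat each measurement and keep the fastest run. The helper names `add_lists` and `add_arrays` are made up for this example, and the exact timings depend on the machine.

```python
import timeit
import numpy as np

size = 1_000_000
list1 = list(range(size))
list2 = list(range(size))
array1 = np.arange(size)
array2 = np.arange(size)

def add_lists():
    # Element-by-element addition in pure Python.
    return [x + y for x, y in zip(list1, list2)]

def add_arrays():
    # Vectorized addition performed by NumPy.
    return array1 + array2

# Run each function five times (one call per run) and keep the fastest run,
# which is less sensitive to background noise than a single measurement.
list_time = min(timeit.repeat(add_lists, repeat=5, number=1))
array_time = min(timeit.repeat(add_arrays, repeat=5, number=1))

print(f"List addition:  {list_time:.6f} s")
print(f"NumPy addition: {array_time:.6f} s")
```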


### Numpy arrays can only hold elements of the same data type

NumPy arrays are designed to hold elements of a single data type. This characteristic is one of the key reasons for their efficiency and performance advantages over Python lists.

In [10]:
np.array([1, 'haha', True])

array(['1', 'haha', 'True'], dtype='<U21')

Because an array holds a single data type, NumPy converted all three values to strings, the only type here that can represent every element. The `<U21` data type refers to a Unicode string with a maximum length of 21 characters: `<` marks little-endian byte order, `U` indicates a Unicode string, and 21 is the maximum number of characters per element. More generally, the dtype object specifies how the bytes in the fixed-size block of memory corresponding to an array item should be interpreted. See [more](https://stackoverflow.com/questions/65447594/why-does-np-array1-a-consume-unicode-string-of-21-characters).
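
The same conversion to a common type happens with other mixes of values. The sketch below shows two such cases, plus `dtype=object`, which keeps the original Python objects at the cost of most of NumPy's speed advantage. The variable names are illustrative.

```python
import numpy as np

ints_and_float = np.array([1, 2, 3.5])      # integers are promoted to float64
ints_and_bool = np.array([1, True, False])  # booleans are promoted to integers
mixed_objects = np.array([1, 'haha', True], dtype=object)  # keeps Python objects

print(ints_and_float.dtype)      # float64
print(ints_and_bool.dtype.kind)  # i (a signed integer type)
print(mixed_objects.dtype)       # object
print([type(item).__name__ for item in mixed_objects])  # ['int', 'str', 'bool']
```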

## Numpy Reshaping

In this section, we will explore the reshaping capabilities of NumPy. Reshaping allows us to change the shape of an array without changing its data. This is particularly useful in various data manipulation and preparation tasks. Reshaping is essential when dealing with different data structures, machine learning models, and preparing data for visualization. It helps to convert arrays to the required dimensions for various operations.

In [11]:
# Example


# Create a 1D array
array_1d = np.arange(12)
print("Original 1D array:")
print(array_1d)

# Reshape to 2D array
array_2d = array_1d.reshape(3, 4)
print("\nReshaped to 2D array (3x4):")
print(array_2d)

# Reshape to 3D array
array_3d = array_2d.reshape(2, 2, 3)
print("\nReshaped to 3D array (2x2x3):")
print(array_3d)


Original 1D array:
[ 0  1  2  3  4  5  6  7  8  9 10 11]

Reshaped to 2D array (3x4):
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

Reshaped to 3D array (2x2x3):
[[[ 0  1  2]
  [ 3  4  5]]

 [[ 6  7  8]
  [ 9 10 11]]]


In [12]:
# Reshaping is constantly needed for Deep Learning

# Example data: 3 samples, 2 features each
data = np.arange(6).reshape(3, 2)
print("Original data (3x2):")
print(data)

# Reshape for a neural network (e.g., adding a trailing feature/channel dimension)
reshaped_data = data.reshape(3, 2, 1)
print("\nReshaped data for neural network (3x2x1):")
print(reshaped_data)


Original data (3x2):
[[0 1]
 [2 3]
 [4 5]]

Reshaped data for neural network (3x2x1):
[[[0]
  [1]]

 [[2]
  [3]]

 [[4]
  [5]]]


### Using -1 in Reshaping


NumPy allows using -1 to automatically calculate one dimension, making reshaping more flexible.



In [13]:
# Using -1 to automatically calculate the dimension
array_reshaped = array_1d.reshape(2, -1)
print("\nReshaped using -1 (2x6):")
print(array_reshaped)



Reshaped using -1 (2x6):
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]]
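
A few more patterns with -1, sketched under the assumption of a 12-element array. Only one dimension can be -1, and the remaining dimensions must divide the total number of elements evenly.

```python
import numpy as np

values = np.arange(12)

column = values.reshape(-1, 1)     # 12 rows inferred, 1 column
four_rows = values.reshape(4, -1)  # 3 columns inferred
flat = four_rows.reshape(-1)       # back to one dimension

print(column.shape)     # (12, 1)
print(four_rows.shape)  # (4, 3)
print(flat.shape)       # (12,)

# 12 elements cannot be split into 5 equal rows, so NumPy raises an error.
error_message = None
try:
    values.reshape(5, -1)
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)
```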


### Reshape vs. Resize
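`reshape` and `resize` both change an array's shape, but `reshape` returns a new array object and leaves the original untouched, while the `resize` method modifies the array in place. In the cell below, `array_1d` is still one-dimensional after `reshape` and becomes 3x4 only after `resize`.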

In [14]:
# Using reshape
reshaped_array = array_1d.reshape(3, 4)
print(array_1d)
print("\nReshaped array (3x4):")
print(reshaped_array)

# Using resize
array_1d.resize(3, 4)
print("\nOriginal array resized (3x4):")
print(array_1d)

[ 0  1  2  3  4  5  6  7  8  9 10 11]

Reshaped array (3x4):
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

Original array resized (3x4):
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
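
The two also differ when the element count changes. In the sketch below, `reshape` rejects a shape with a different number of elements, while the `np.resize` function (as opposed to the in-place method used above) returns a new array and repeats the data to fill the extra slots.

```python
import numpy as np

original = np.arange(6)

# reshape returns a new array object; original keeps its shape (6,).
reshaped = original.reshape(2, 3)
print(original.shape, reshaped.shape)  # (6,) (2, 3)

# np.resize returns a new array and repeats the data when the requested
# shape needs more elements than are available.
bigger = np.resize(original, (2, 4))
print(bigger)

# reshape refuses a shape whose element count does not match.
reshape_error = None
try:
    original.reshape(2, 4)
except ValueError as err:
    reshape_error = str(err)
    print("ValueError:", reshape_error)
```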


### Flattening

In [15]:
# Create a 2D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("Original 2D array:")
print(array_2d)

# Flatten the 2D array
flattened_array = array_2d.flatten()
print("\nFlattened array:")
print(flattened_array)


Original 2D array:
[[1 2 3]
 [4 5 6]]

Flattened array:
[1 2 3 4 5 6]
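
`flatten` always returns a copy. A related method, `ravel`, returns a view when the memory layout allows it, which ties back to the view-versus-copy distinction from the slicing section. A small sketch of the difference:

```python
import numpy as np

array_2d = np.array([[1, 2, 3], [4, 5, 6]])

flat_copy = array_2d.flatten()  # always a new, independent array
flat_view = array_2d.ravel()    # a view here, because the data is contiguous

flat_copy[0] = 100  # does not touch array_2d
flat_view[1] = 200  # writes through to array_2d

print(array_2d)  # the 2 became 200; the 1 is unchanged
print(np.shares_memory(array_2d, flat_copy))  # False
print(np.shares_memory(array_2d, flat_view))  # True
```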


In [16]:
# Exercises:
    
# Exercise 1: Reshape a 1D array of size 16 to a 2D array of shape (4, 4).


# Exercise 2: Given a 2D array of shape (6, 6), reshape it to a 3D array of shape (2, 3, 6).


# Exercise 3: Create a 1D array with 12 elements and reshape it to (3, -1).

## Comparison/Logical Operations between Numpy Arrays and Data Filtering

In [20]:
array1 = np.array([1, 2, 3, 4, 5])
array2 = np.array([5, 4, 3, 2, 1])

print(array1 == array2)
print(array1 != array2)
print(array1 > array2)
print(array1 < array2)
print(array1 >= array2)
print(array1 <= array2)


print((array1 > 2) & (array2 < 4))
print((array1 < 4) | (array2 > 2))
print(~(array1 == array2))





[False False  True False False]
[ True  True False  True  True]
[False False False  True  True]
[ True  True False False False]
[False False  True  True  True]
[ True  True  True False False]
[False False  True  True  True]
[ True  True  True False False]
[ True  True False  True  True]


You can use the resulting boolean arrays to filter data. 

In [25]:
filtered_array_1 = array1[array1 > 3]
print(filtered_array_1)

filtered_array_2 = array1[(array1 > 2) & (array2 < 4)]
print(filtered_array_2)

common_elements = array1[array1 == array2]
print(common_elements)


[4 5]
[3 4 5]
[3]
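
Boolean masks can do more than select elements. The sketch below, using a made-up `scores` array, counts matches, labels elements with `np.where`, and overwrites the selected elements.

```python
import numpy as np

scores = np.array([55, 82, 47, 91, 68])

passed = scores >= 60            # boolean mask, one entry per element
print(passed)                    # [False  True False  True  True]
print(scores[passed])            # [82 91 68]
print(np.count_nonzero(passed))  # 3

# Parentheses are required: & binds more tightly than the comparisons.
mid_range = scores[(scores >= 60) & (scores < 90)]
print(mid_range)                 # [82 68]

# np.where picks from two alternatives element by element.
labels = np.where(passed, "pass", "fail")
print(labels)

# A mask can also select the elements to overwrite (on a copy here).
clipped = scores.copy()
clipped[clipped < 60] = 60
print(clipped)                   # [60 82 60 91 68]
```

Note that Python's `and` and `or` keywords do not work element-wise on arrays; the `&`, `|`, and `~` operators are used instead.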
