# NumPy

NumPy (short for Numerical Python) is a powerful library for handling large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. NumPy arrays are the cornerstone of scientific computing in Python and are particularly useful when working with numerical data.

## Why NumPy?
- **Performance:**  
  Operations on NumPy arrays are typically much faster than equivalent loops over Python's built-in sequences, because the work is done in compiled C code.

- **Memory Efficiency:**  
  Because NumPy arrays are densely packed blocks of a single data type, they use far less memory than Python lists holding the same values. This makes computations practical on datasets that would be unwieldy in regular Python collections.

- **Convenience:**  
  Many mathematical operations on arrays, such as cross products, dot products, and transposes, are one-liners.

- **Interoperability:**  
  Many Python libraries, including Pandas, Matplotlib, and Scikit-learn, rely heavily on NumPy. Mastery of NumPy allows effective use of these tools.
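
The memory and convenience points above can be illustrated with a rough sketch. The variable names are illustrative, and the list's byte count is only an estimate for CPython (the list object plus the integer objects it references), so exact numbers will vary by platform.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# Approximate list footprint: the list object plus every int object it references
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)
# Array data buffer: n items, 8 bytes each for int64
array_bytes = np_arr.nbytes

print(f"list: ~{list_bytes} bytes, array data: {array_bytes} bytes")

# Common vector operations are one-liners
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
dot_product = np.dot(a, b)      # 1*4 + 2*5 + 3*6
cross_product = np.cross(a, b)  # vector perpendicular to a and b
print(dot_product, cross_product)
```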

## NumPy applications

NumPy can be used for a wide range of applications. For instance, when working with a large dataset, it's often more efficient to use a NumPy array rather than a Python list for computations because NumPy is specifically designed to work with large arrays of numerical data.

It is also used for random number generation, Fourier transforms, linear algebra, and much more. Its integration with other Python libraries like Pandas makes it an invaluable tool for data manipulation and cleaning in data engineering tasks.
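
As a minimal sketch of those three areas, the snippet below draws random samples, takes a Fourier transform of a small synthetic signal, and solves a 2x2 linear system. The signal and the equations are made up for illustration.

```python
import numpy as np

# Random number generation with a seeded generator for reproducibility
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=1000)
print(samples.mean(), samples.std())

# Fourier transform: a cosine with 2 cycles over 8 samples
t = np.arange(8)
signal = np.cos(2 * np.pi * 2 * t / 8)
spectrum = np.abs(np.fft.fft(signal))
print(spectrum.round(3))  # the energy sits in frequency bin 2 and its mirror bin

# Linear algebra: solve 2x + y = 5 and x + 3y = 10
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
solution = np.linalg.solve(A, b)
print(solution)
```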

NumPy's ndarray objects can also be used to store and manipulate data in a way that is efficient in terms of memory and performance, especially for large datasets. The elements in a NumPy array are all of the same type, which allows NumPy to handle memory more efficiently than a Python list.

## NumPy Arrays
NumPy arrays are used for storing and manipulating large arrays of numerical data. You create a NumPy array by passing a list, tuple, or other array-like object to the `numpy.array()` function. NumPy arrays can be indexed and manipulated in a variety of ways. Basic operations with arrays include arithmetic operations, logical operations, and statistical operations.

Multidimensional arrays can also be created. These are arrays with more than one dimension, and they can be thought of as matrices or tensors. NumPy provides tools for creating and manipulating multidimensional arrays, as well as performing linear algebra operations on them.

![arrays.png](attachment:arrays.png)

NumPy array dimensions can be understood in terms of shape and size. The shape is a tuple giving the length of each dimension, while the size is the total number of elements in the array (the product of the lengths in the shape).

In [None]:
import numpy as np # import numpy library, and use np as alias (this is a convention)

# Create a 1-dimensional array
arr1d = np.array([1, 2, 3, 4, 5, 6])
print(f"1D Array:\n{arr1d}\nShape: {arr1d.shape}, Size: {arr1d.size}\n")

# Create a 2-dimensional array
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"2D Array:\n{arr2d}\nShape: {arr2d.shape}, Size: {arr2d.size}\n")

# Create a 3-dimensional array
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(f"3D Array:\n{arr3d}\nShape: {arr3d.shape}, Size: {arr3d.size}\n")



## Array Shape Manipulation
NumPy provides a variety of functions for reshaping arrays, including changing their dimensions, rearranging elements, and combining arrays.


In [None]:
# Reshaping an array
reshaped_arr = arr1d.reshape(2, 3)  # reshapes to 2 rows, 3 columns
print(reshaped_arr)

# Flattening an array
flattened_arr = reshaped_arr.flatten()
print(flattened_arr)

# Transposing an array
transposed_arr = reshaped_arr.T
print(transposed_arr)
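
The cell above covers reshaping, flattening, and transposing. Combining arrays could be sketched as follows; the arrays `a` and `b` are illustrative and not part of the original notebook.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Stack vertically (more rows) and horizontally (more columns)
stacked_rows = np.vstack((a, b))         # shape (4, 2)
stacked_cols = np.hstack((a, b))         # shape (2, 4)

# concatenate is the general form; axis=0 joins along rows
joined = np.concatenate((a, b), axis=0)  # same result as vstack here

# Passing -1 lets NumPy infer that dimension from the total size
inferred = np.arange(12).reshape(3, -1)  # shape (3, 4)

print(stacked_rows.shape, stacked_cols.shape, inferred.shape)
```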


## Indexing and Slicing


Indexing a NumPy array is similar to indexing a regular Python list. For example:

In [None]:
# access the second element of a one-dimensional array
print(arr1d[1])

# access the element in the first row, second column of a two-dimensional array
print(arr2d[0, 1])

In [None]:
# 1D array slicing: elements at index 1 up to (but not including) index 5
print(arr1d[1:5])  # prints [2 3 4 5]

# 2D array slicing: second row, columns at index 1 and 2
print(arr2d[1, 1:3])  # prints [5 6]
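
A few more indexing patterns, sketched on an illustrative array. One difference from Python lists is worth noting: a basic slice of a NumPy array is a view on the same data, not a copy, so writing to the slice changes the original.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

print(arr[-1])    # negative indices count from the end
print(arr[::2])   # every second element
print(arr[::-1])  # reversed

# A slice is a view: modifying it modifies the original array
view = arr[1:4]
view[0] = 99
print(arr)        # arr[1] is now 99

# Use .copy() when an independent array is needed
independent = arr[1:4].copy()
independent[0] = -1
print(arr)        # unchanged by the edit to the copy
```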

## Using arrays for Efficient Data Storage and Manipulation

The `ndarray` object is the core of the NumPy library. `ndarray` stands for 'n-dimensional array', and as the name suggests, it allows you to store and manipulate multi-dimensional arrays of fixed-size items. Every item has the same data type (`dtype`), which is set at array creation time, so each item occupies the same number of bytes (for example, 4 bytes for `int32`). This uniform layout is what allows the ndarray to handle memory efficiently.


In [None]:
# Creating an ndarray
x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)  # Array of int32 type

# Shape manipulation
y = np.reshape(x, (3, 2))  # Reshaping array to 3x2
print(y)
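
The per-item size can be inspected directly. This short sketch recreates the same `int32` array and reads its `dtype`, `itemsize`, and `nbytes` attributes; the conversion to `float64` is added only for comparison.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)
print(x.dtype, x.itemsize, x.nbytes)  # 4 bytes per item, 6 items in total

# astype returns a converted copy; the original array keeps its dtype
x64 = x.astype(np.float64)
print(x64.dtype, x64.itemsize, x64.nbytes)  # 8 bytes per item
```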


Remember that the size and type of an ndarray are fixed at creation. Operations that appear to grow an array, such as `np.append`, do not resize it in place; they allocate a new array and copy the data into it. For large datasets, it's best to preallocate the full array at the beginning and fill it in, which avoids repeated copying.
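
A small sketch of the difference, using illustrative names: `np.append` hands back a new array, while a preallocated array is filled in place.

```python
import numpy as np

small = np.array([1, 2, 3])
grown = np.append(small, 4)  # returns a new array; small is unchanged
print(small.size, grown.size)

# Preallocate once, then fill in place (np.empty leaves contents uninitialized)
n = 5
result = np.empty(n, dtype=np.int64)
for i in range(n):
    result[i] = i * i
print(result)
```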

## Basic Array Operations

NumPy arrays can be manipulated in a variety of ways, making it easy to perform mathematical and statistical operations on data.


In [None]:
# Arithmetic operations on arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Adding arrays
print(arr1 + arr2)

# Multiplying arrays
print(arr1 * arr2)
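
Arithmetic also works between arrays of different but compatible shapes, a behavior NumPy calls broadcasting. A minimal sketch with made-up values:

```python
import numpy as np

arr = np.array([1, 2, 3])
plus_ten = arr + 10             # the scalar is applied to every element

matrix = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([10, 20, 30])
plus_row = matrix + row         # the row is added to each row of the matrix

col = np.array([[100], [200]])  # shape (2, 1)
plus_col = matrix + col         # the column is added to each column

print(plus_ten)
print(plus_row)
print(plus_col)
```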

Logical operations can also be performed with arrays:

In [None]:
# create a boolean array based on a condition
e = (arr1d > 2)

# use boolean indexing to select elements from an array
f = arr1d[e]
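
Conditions can be combined element-wise with `&` (and), `|` (or), and `~` (not). The parentheses around each comparison are required because of operator precedence. The array below is illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

between = arr[(arr > 2) & (arr < 6)]    # both conditions hold
outside = arr[(arr <= 2) | (arr >= 6)]  # either condition holds
not_even = arr[~(arr % 2 == 0)]         # negated condition
count_gt2 = np.count_nonzero(arr > 2)   # number of True values

print(between, outside, not_even, count_gt2)
```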

NumPy arrays can have any number of dimensions, but are commonly used for two-dimensional arrays (matrices) and three-dimensional arrays (tensors). Multidimensional arrays can be indexed and manipulated in a similar way to one-dimensional arrays:

In [None]:
# create a three-dimensional array
g = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

# access the element in the second "layer" (first axis), first row, second column
print(g[1, 0, 1])

# compute the sum of each row in a two-dimensional array
h = np.sum(arr2d, axis=1)

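Can you guess what happened in the code above? Let's break it down. `g[1, 0, 1]` selects the second block along the first axis (index 1), then the first row of that block (index 0), then the second column (index 1), which is the value 6. `np.sum(arr2d, axis=1)` adds the values along each row of `arr2d`, giving `[6 15 24]`.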
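
The effect of the `axis` argument is easiest to see side by side. This sketch rebuilds the same 3x3 and 2x2x2 arrays under new names so that it runs on its own.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

col_sums = np.sum(m, axis=0)  # collapse the rows: one sum per column
row_sums = np.sum(m, axis=1)  # collapse the columns: one sum per row
total = np.sum(m)             # no axis: sum of every element
print(col_sums, row_sums, total)

g = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
second_block = g[1]           # the 2x2 block at index 1 of the first axis
element = g[1, 0, 1]          # first row, second column of that block
print(second_block, element)
```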

## Universal Functions
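NumPy provides a range of mathematical functions, called "universal functions" (ufuncs), that operate element-wise on an entire array at once. Because the loop over the elements runs in compiled code rather than in Python, computations are much faster than an explicit Python loop.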


In [None]:
arr = np.array([1, 2, 3, 4, 5])

# Applying the square root function to all elements in the array
sqrt_arr = np.sqrt(arr)
print(sqrt_arr)

# Applying the exponential function to all elements in the array
exp_arr = np.exp(arr)
print(exp_arr)
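
To make the element-wise behavior concrete, this sketch compares a ufunc with the equivalent Python loop and shows two ufuncs that take two arrays. The input values are illustrative.

```python
import math
import numpy as np

values = np.array([1.0, 4.0, 9.0, 16.0])

# A Python loop and the ufunc compute the same values
loop_result = [math.sqrt(v) for v in values]
ufunc_result = np.sqrt(values)
print(loop_result, ufunc_result)

# Binary ufuncs combine two arrays element by element
a = np.array([1, 5, 3])
b = np.array([4, 2, 6])
pairwise_max = np.maximum(a, b)
pairwise_sum = np.add(a, b)  # equivalent to a + b
print(pairwise_max, pairwise_sum)

print(isinstance(np.sqrt, np.ufunc))  # np.sqrt is a ufunc object
```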

## Statistical Methods

NumPy also provides a host of statistical functions for calculating mean, median, percentile, and standard deviation among others.


In [None]:
# Create a sample array
arr = np.array([1,2,3,4,5])

# Mean
print(np.mean(arr))

# Median
print(np.median(arr))

# Standard Deviation
print(np.std(arr))
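
Percentiles and per-axis statistics follow the same pattern. In this sketch the `scores` array is made up for illustration. Note that `np.std` computes the population standard deviation by default; passing `ddof=1` gives the sample standard deviation.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])

# 25th and 75th percentiles in one call
q1, q3 = np.percentile(data, [25, 75])
print(q1, q3)

pop_std = np.std(data)             # divides by n
sample_std = np.std(data, ddof=1)  # divides by n - 1
print(pop_std, sample_std)

# Statistics along an axis of a 2D array
scores = np.array([[80, 90], [70, 100]])
col_means = scores.mean(axis=0)    # one mean per column
print(col_means)
```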


## Masking and Boolean Logic
NumPy arrays can hold Boolean values and support element-wise logical operations such as AND, OR, and NOT. They can also be used for masking, which is the process of selecting, hiding, or modifying certain elements of an array based on a Boolean condition.


In [None]:
# Creating a masked array
arr = np.array([1, 2, 3, 4, 5])
mask = np.array([True, False, True, False, False])
masked_arr = np.ma.masked_array(arr, mask)
print(masked_arr)

# Boolean logic with arrays
arr1 = np.array([True, False, True, False])
arr2 = np.array([False, True, False, True])
print(np.logical_and(arr1, arr2))
print(np.logical_or(arr1, arr2))
print(np.logical_not(arr1))
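
Masks can also be used to modify values, and masked arrays skip their hidden entries in computations. A short sketch with illustrative names:

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])

# Assign through a Boolean mask: cap every value above 3
capped = values.copy()
capped[capped > 3] = 3
print(capped)

# np.where picks between two values based on a condition
labels = np.where(values % 2 == 0, "even", "odd")
print(labels)

# Masked entries are ignored by reductions such as mean
masked = np.ma.masked_array(values, mask=[True, False, True, False, False])
visible_mean = masked.mean()   # mean of the unmasked values 2, 4, 5
visible = masked.compressed()  # plain array of the unmasked values
print(visible_mean, visible)
```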


## Structured Arrays
NumPy allows for the creation of structured arrays, which are arrays of structured data types that can be used to represent more complex data structures than regular arrays. They can be used to store and manipulate heterogeneous data, such as data from a database or spreadsheet.

In [None]:
# Creating a structured array
# 'U10' = Unicode string (up to 10 chars), 'i4' = 32-bit int, 'f4' = 32-bit float, '?' = Boolean
person_dtype = np.dtype([('name', 'U10'), ('age', 'i4'), ('height', 'f4'), ('is_married', '?')])
person_arr = np.array([('John', 25, 1.75, True), ('Jane', 30, 1.68, False)], dtype=person_dtype)
print(person_arr)

# Accessing fields in a structured array
print(person_arr['name'])
print(person_arr['age'])
print(person_arr['is_married'])
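
Fields can also drive filtering and sorting. This sketch uses its own small dataset, including a hypothetical third person, so that it runs independently of the cell above.

```python
import numpy as np

people_dtype = np.dtype([("name", "U10"), ("age", "i4"), ("height", "f4")])
people = np.array(
    [("John", 25, 1.75), ("Jane", 30, 1.68), ("Ana", 22, 1.60)],
    dtype=people_dtype,
)

# Boolean indexing on one field selects whole records
older = people[people["age"] > 24]
print(older["name"])

# Sort records by a named field
by_age = np.sort(people, order="age")
print(by_age["name"])

# A single field behaves like a regular array
mean_height = people["height"].mean()
print(mean_height)
```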

## NumPy and Pandas

Pandas is built on top of NumPy, and many of the functions in Pandas rely on NumPy functions. Often, data engineers use Pandas for data manipulation and NumPy for more mathematical operations. A Pandas DataFrame can be thought of as a dictionary-like container for Series objects, which are built on top of NumPy arrays.


In [None]:
import pandas as pd

# Create a Pandas DataFrame from a 2D NumPy array
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['a', 'b', 'c'])
print(df)
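
Going the other way is just as direct. This sketch, which assumes pandas is installed, converts a DataFrame back to a NumPy array and applies a NumPy function to a column.

```python
import numpy as np
import pandas as pd

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
frame = pd.DataFrame(data, columns=["a", "b", "c"])

# DataFrame and Series back to NumPy arrays
values = frame.to_numpy()
col_a = frame["a"].to_numpy()
print(values.shape, col_a)

# NumPy ufuncs accept a Series and return a Series
roots = np.sqrt(frame["b"])
print(roots)
```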

## Exercises

The following NumPy exercises cover the topics above:

1. Create a 1D NumPy array with the numbers from 1 to 10.
2. Create a 2D NumPy array with shape (3, 3) with random integer values between 0 and 9.
3. Use indexing to access the second element of the 1D array and the (1, 1) element of the 2D array.
4. Use slicing to create a new array with the last five elements of the 1D array and the first two rows of the 2D array.
5. Use basic operations (+, -, *, /, **) to perform calculations on the 1D and 2D arrays.
6. Use a universal function (e.g. np.sqrt()) to calculate the square root of all elements in the 1D array.
7. Use broadcasting to add a scalar value of 10 to each element in the 1D array and each element in the second column of the 2D array.
8. Use array shape manipulation functions (e.g. np.reshape(), np.transpose()) to manipulate the shape of the 1D and 2D arrays.
9. Use masking and Boolean logic to create a new array with all elements of the 1D array greater than 5 and all elements of the 2D array less than 5.
10. Create a structured array with two fields, 'name' and 'age', and add data for three people.

> Content created by [**Carlos Cruz-Maldonado**](https://www.linkedin.com/in/carloscruzmaldonado/).  
> I am available to answer any questions or provide further assistance.   
> Feel free to reach out to me at any time.  