# NumPy Tutorial for Beginners in Data Science

## Introduction to NumPy

NumPy (Numerical Python) is a core library in the Python ecosystem for scientific computing, widely used in data science. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions that operate on these arrays efficiently. This efficient handling of large datasets is what makes NumPy fundamental to tasks such as data analysis, machine learning, and numerical simulations.



While both NumPy arrays and Python lists can store collections of elements, they have critical differences:
1. Speed: NumPy's array operations run in compiled C loops, making them much faster for operations like element-wise addition or multiplication, especially on large datasets. Python lists, on the other hand, must be processed element by element by the Python interpreter, which is far slower.
2. Memory Efficiency: A Python list stores pointers to separate Python objects, each carrying its own overhead, while a NumPy array stores values of a single data type in one contiguous block of memory, allowing for more compact storage and faster access.
3. Broadcasting: One of the key features of NumPy arrays is broadcasting, which allows operations to be performed on arrays of different shapes without explicitly looping over elements. This is not possible with Python lists.
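A rough way to see the memory and broadcasting differences is sketched below. It assumes CPython and a 64-bit integer dtype. sys.getsizeof counts the list's pointer array plus each int object; exact byte counts vary by Python version, so only the comparison is meaningful.

```python
import sys
import numpy as np

n = 1_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list holds pointers to separate int objects, so count both the list and the ints
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
# A NumPy array stores raw 8-byte integers in one contiguous buffer
array_bytes = np_arr.nbytes

print(f"list:  ~{list_bytes} bytes")
print(f"array: {array_bytes} bytes")  # 1000 elements * 8 bytes = 8000

# Broadcasting: adding a scalar works element-wise on the array...
print(np_arr[:5] + 1)  # [1 2 3 4 5]

# ...but a list does not support element-wise arithmetic with a number
try:
    py_list[:5] + 1
except TypeError as exc:
    print("list + int fails:", exc)
```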

Vectorization in NumPy allows for batch operations on data without the need for explicit loops. This is critical for performance because it shifts the computation to low-level, highly optimized C and Fortran code. As a result, vectorized operations can be orders of magnitude faster than traditional Python loops. For example, performing element-wise addition on large datasets with NumPy is much faster compared to using a loop with Python lists. Additionally, vectorized operations lead to more concise and readable code, reducing the risk of errors in complex data operations.

Example of Vectorization vs. Python Loops:

In [None]:
import numpy as np
import time

size = 10**6

# Python list example
py_list = list(range(size))
start = time.perf_counter()  # perf_counter is the recommended clock for timing intervals
py_list = [x + 1 for x in py_list]
print(f"Python List Time: {time.perf_counter() - start:.5f} seconds")

# NumPy array example
np_array = np.arange(size)
start = time.perf_counter()
np_array += 1
print(f"NumPy Array Time: {time.perf_counter() - start:.5f} seconds")


## Getting Started with NumPy

You can install NumPy using pip: pip install numpy


Once installed, import NumPy into your Python script:

In [None]:
import numpy as np

## Creating NumPy Arrays

Convert Python lists or lists of lists into NumPy arrays:

In [None]:

# 1D array
my_list = [1, 2, 3, 5]
np_array = np.array(my_list)
print(np_array)
# Output: [1 2 3 5]

# 2D array
my_list_2d = [[1, 2], [3, 4]]
np_array_2d = np.array(my_list_2d)
print(np_array_2d)
# Output:
# [[1 2]
#  [3 4]]
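When the input list mixes types, np.array picks a single dtype that can hold every element, and passing dtype explicitly overrides that choice. A short sketch; the variable names are illustrative:

```python
import numpy as np

# Mixing ints and a float: NumPy picks one dtype that can hold every element
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)      # float64

# Passing dtype explicitly overrides the inferred type
explicit = np.array([1, 2, 3], dtype=np.float32)
print(explicit.dtype)   # float32
```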


## Built-in Functions

NumPy provides many built-in functions for generating arrays, along with attributes and indexing tools for inspecting and accessing them:
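1. arange: Return evenly spaced values within a given interval (the stop value itself is excluded).

In [None]:
print(np.arange(0, 10))
# Output: [0 1 2 3 4 5 6 7 8 9]

print(np.arange(0, 11, 2))
# Output: [ 0  2  4  6  8 10]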

2. zeros: Generate arrays filled with zeros.

In [None]:
print(np.zeros(5))

print(np.zeros((4, 4)))

3. ones: Generate arrays filled with ones.

In [None]:
print(np.ones(5))

print(np.ones((4, 4)))

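4. linspace: Return a given number of evenly spaced values over a specified interval.

In [None]:
print(np.linspace(0, 10, 4))   # 4 evenly spaced points from 0 to 10

print(np.linspace(0, 10, 60))  # 60 evenly spaced points from 0 to 10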
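Because arange excludes the stop value while linspace includes it by default, the two can give different results for what looks like the same range. For non-integer steps, the NumPy documentation suggests linspace, since rounding can make the number of elements arange produces hard to predict. A small comparison:

```python
import numpy as np

by_step = np.arange(0, 1, 0.25)    # step size given, stop excluded
by_count = np.linspace(0, 1, 5)    # number of points given, stop included

print(by_step)    # 4 values: 0, 0.25, 0.5, 0.75
print(by_count)   # 5 values: 0, 0.25, 0.5, 0.75, 1.0

# endpoint=False makes linspace use a half-open interval like arange
half_open = np.linspace(0, 1, 4, endpoint=False)
print(half_open)  # same values as by_step
```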

5. eye: Creates an identity matrix

In [None]:
np.eye(5)

6. rand: Create an array of the given shape and populate it with random samples from a uniform distribution over [0, 1).

In [None]:
print(np.random.rand(2))
print(np.random.rand(5, 5))

7. randn: Return a sample (or samples) from the standard normal distribution (mean 0, variance 1), unlike rand, which samples uniformly.

In [None]:
np.random.randn(2)
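For a large sample, randn values should have a mean close to 0 and a standard deviation close to 1, and scaling plus shifting gives a normal distribution with any other mean and spread. The seed and the mu/sigma values below are illustrative assumptions:

```python
import numpy as np

np.random.seed(0)                      # fixed seed so the check is repeatable
samples = np.random.randn(100_000)

# For a large sample these should be close to 0 and 1
print(f"mean ~ {samples.mean():.3f}, std ~ {samples.std():.3f}")

# mu + sigma * randn gives samples from N(mu, sigma^2); values are illustrative
mu, sigma = 170.0, 10.0
heights = mu + sigma * np.random.randn(5)
print(heights)
```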

Seeding the random number generator: using the same seed produces the same sequence of random numbers, which makes results reproducible.

In [None]:
np.random.seed(42)
random_numbers = np.random.rand(3)
print(random_numbers)
# Output: [0.37454012 0.95071431 0.73199394]


8. randint: Return random integers from low (inclusive) to high (exclusive).

In [None]:
print(np.random.randint(1, 100))      # one integer in [1, 100)
print(np.random.randint(1, 100, 10))  # ten integers in [1, 100)
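Newer NumPy code often uses a Generator object from np.random.default_rng instead of the global np.random.seed. A minimal sketch: rng.random, rng.standard_normal and rng.integers roughly correspond to rand, randn and randint (integers also excludes the high value by default), but the numbers differ from the legacy functions even with the same seed.

```python
import numpy as np

rng = np.random.default_rng(42)        # an independent, seeded generator
uniform = rng.random(3)                # like np.random.rand(3)
normal = rng.standard_normal(3)        # like np.random.randn(3)
ints = rng.integers(1, 100, size=10)   # like np.random.randint(1, 100, 10)

# Re-creating a generator with the same seed reproduces the sequence
rng_again = np.random.default_rng(42)
print(np.array_equal(uniform, rng_again.random(3)))  # True
```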

9. empty(shape): Create an array without initializing its entries, so it holds whatever values happened to be in memory. This is useful when you are going to overwrite every element anyway, because it skips the cost of zeroing out the array.

In [None]:
arr = np.empty((3, 3))  # Creates a 3x3 array without initializing entries
print(arr)  # The printed values are arbitrary
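A common pattern is to allocate the result with np.empty once and then write every entry before reading any of it. The sketch below fills the array with squares purely for illustration; in practice a vectorized expression such as np.arange(5) ** 2 would produce the same values directly.

```python
import numpy as np

n_items = 5
squares = np.empty(n_items)   # allocated, but contents are arbitrary until written

# Write every slot before reading any of them
for i in range(n_items):
    squares[i] = i ** 2

print(squares)
```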


10. full(shape, fill_value): This creates an array where all elements have the specified fill value, ideal when you need a constant matrix or array.

In [None]:
arr = np.full((2, 4), 7)  # Creates a 2x4 array filled with the value 7
print(arr)


11. Data Type (dtype): The type of the elements stored in the array.

In [None]:
a = np.array([[1, 2], [3, 4]])
print(a.dtype)
# Output: int64 (int32 on some platforms)
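To change the data type of an existing array, astype returns a converted copy. Converting floats to integers truncates toward zero rather than rounding, as this sketch shows:

```python
import numpy as np

floats = np.array([1.7, -2.2, 3.9])
ints_from_floats = floats.astype(np.int64)  # new array; fractional parts are dropped
print(ints_from_floats)                     # [ 1 -2  3]
print(floats.dtype, ints_from_floats.dtype) # float64 int64
```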


12. Size: Total number of elements in the array.

In [None]:
print(a.size)
# Output: 4


13. Number of Dimensions (ndim): The number of axes of the array.

In [None]:
print(a.ndim)
# Output: 2
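The related shape attribute gives the length of each axis; ndim is the length of that tuple and size is the product of its entries. A quick check on an illustrative 2x3 array:

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.shape)   # (2, 3): 2 rows, 3 columns
print(grid.ndim)    # 2, i.e. len(grid.shape)
print(grid.size)    # 6, i.e. 2 * 3
```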


14. Indexing: Accessing individual elements or subsets of the array.

In [None]:
a = np.array([[1, 2], [3, 4]])

# Access element at row 0, column 1
print(a[0, 1])
# Output: 2


15. Slicing: Extracting subarrays from the original array.

In [None]:
# Slice rows 0 to 1 and columns 0 to 1 (the stop index 2 is excluded)
print(a[0:2, 0:2])
# Output:
# [[1 2]
#  [3 4]]

# All rows, column 1
print(a[:, 1])
# Output: [2 4]
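One detail worth knowing about slicing: a basic slice returns a view that shares memory with the original array, so writing through the slice changes the original. Calling .copy() gives independent data. A small sketch with illustrative names:

```python
import numpy as np

base = np.array([[1, 2], [3, 4]])

row_view = base[0, :]         # a basic slice is a view on the same memory
row_view[0] = 99              # ...so this also changes base
print(base[0, 0])             # 99

row_copy = base[1, :].copy()  # .copy() makes an independent array
row_copy[0] = -1
print(base[1, 0])             # still 3
```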


16. Boolean Indexing: Selecting the elements that satisfy a condition.

In [None]:
b = np.array([1, 2, 3, 4, 5])
print(b[b > 2])
# Output: [3 4 5]
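Conditions can be combined with & (and), | (or) and ~ (not); Python's own and/or do not work element-wise on arrays. Each comparison needs parentheses because & and | bind more tightly than > and <. A boolean mask can also be used for assignment:

```python
import numpy as np

vals = np.array([1, 2, 3, 4, 5, 6])

middle = vals[(vals > 2) & (vals < 6)]   # both conditions true
outer = vals[(vals < 2) | (vals > 5)]    # either condition true
odd = vals[~(vals % 2 == 0)]             # negate the "is even" mask
print(middle, outer, odd)                # [3 4 5] [1 6] [1 3 5]

# A mask on the left-hand side assigns to the selected elements only
vals[vals > 4] = 0
print(vals)                              # [1 2 3 4 0 0]
```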


17. Reshaping Arrays: Changing the shape of an array without changing its data.
The total number of elements must remain unchanged during reshaping.

In [None]:
e = np.arange(6).reshape((2, 3))
print(e)
# Output:
# [[0 1 2]
#  [3 4 5]]
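Passing -1 for one dimension lets NumPy infer it from the total number of elements, and a shape whose sizes do not multiply to that total raises a ValueError. A sketch:

```python
import numpy as np

flat = np.arange(12)
table = flat.reshape(3, -1)    # -1 is inferred as 12 / 3 = 4
print(table.shape)             # (3, 4)

try:
    flat.reshape(5, -1)        # 12 elements cannot be split into rows of 5
except ValueError as exc:
    print("reshape failed:", exc)

print(table.ravel().shape)     # (12,): flattened back to 1D
```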


## Mathematical Operations: Efficient Batch Processing Without Loops

NumPy excels in performing operations on entire arrays without needing explicit loops. This is achieved through vectorization, where operations are applied element-wise to arrays in a single step. Some examples include:

1. Element-wise operations:

In [None]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = a + b  # Adds corresponding elements of a and b
print(c)
# Output: [5 7 9]

print(c * 2)  # Scalar multiplication
# Output: [10 14 18]

2. Matrix multiplication:

In [None]:
mat1 = np.array([[1, 2], [3, 4]])
mat2 = np.array([[5, 6], [7, 8]])
result = np.dot(mat1, mat2)  # Performs matrix multiplication
print(result)
# Output:
# [[19 22]
#  [43 50]]
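A frequent beginner mistake is using * for matrix multiplication: on arrays, * multiplies matching entries, while the @ operator (equivalent to np.dot for 2D arrays) computes the matrix product. Comparing the two on the same matrices:

```python
import numpy as np

mat1 = np.array([[1, 2], [3, 4]])
mat2 = np.array([[5, 6], [7, 8]])

elementwise = mat1 * mat2   # multiplies matching entries
matmul = mat1 @ mat2        # row-by-column matrix product

print(elementwise)  # [[ 5 12]
                    #  [21 32]]
print(matmul)
```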


3. Batch operations (no loops needed):

In [None]:
arr = np.arange(1000000)
arr = arr * 3  # Multiply all elements by 3 in one go


4. Statistical Functions: Compute statistical measures directly on arrays.

In [None]:
d = np.array([1, 2, 3])

print(np.mean(d))    # Mean
# Output: 2.0

print(np.median(d))  # Median
# Output: 2.0

print(np.std(d))     # Standard deviation (population, ddof=0)
# Output: 0.816496580927726
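On 2D arrays, these functions accept an axis argument: axis=0 aggregates down each column and axis=1 across each row. The scores below are illustrative data (2 students x 3 tests):

```python
import numpy as np

scores = np.array([[70, 80, 90],
                   [60, 75, 95]])

print(np.mean(scores))          # mean of all 6 values, about 78.33
print(np.mean(scores, axis=0))  # per column (per test): 65, 77.5, 92.5
print(np.mean(scores, axis=1))  # per row (per student): 80, about 76.67
print(scores.max(axis=1))       # [90 95]
```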


5. Linear Algebra Operations

In [None]:
h = np.array([[1, 2], [3, 4]])

print(np.linalg.inv(h))  # Inverse of matrix
# Output:
# [[-2.   1. ]
#  [ 1.5 -0.5]]

eigenvalues, eigenvectors = np.linalg.eig(h)  # Eigenvalues and eigenvectors
print(eigenvalues)
# Output: [-0.37228132  5.37228132]
print(eigenvectors)  # Each column is an eigenvector
# Output:
# [[-0.82456484 -0.41597356]
#  [ 0.56576746 -0.90937671]]
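These results can be checked numerically: a matrix times its inverse should give the identity, and each eigenpair should satisfy h @ v = lambda * v, both up to floating-point rounding. To solve a linear system, np.linalg.solve is generally preferred over multiplying by the inverse; the right-hand side below is illustrative.

```python
import numpy as np

h = np.array([[1, 2], [3, 4]])
h_inv = np.linalg.inv(h)
print(np.allclose(h @ h_inv, np.eye(2)))   # True

eigvals, eigvecs = np.linalg.eig(h)
for lam, v in zip(eigvals, eigvecs.T):     # transpose so we iterate over columns
    print(np.allclose(h @ v, lam * v))     # True for each pair

# Solve h @ x = rhs directly
rhs = np.array([5, 6])
x = np.linalg.solve(h, rhs)
print(np.allclose(h @ x, rhs))             # True
```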


## Broadcasting

Broadcasting is a powerful mechanism that allows NumPy to perform operations on arrays of different shapes in a flexible manner without explicitly replicating data.

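* Align Dimensions: If the arrays have different numbers of dimensions, prepend ones to the shape of the array with fewer dimensions until both shapes have the same length.
* Compatible Dimensions: For each dimension, the sizes must either be equal or one of them must be 1. If any dimension fails this check, broadcasting cannot be performed and NumPy raises a ValueError.
* Stretch: Along each dimension where one array has size 1, that array behaves as if its values were repeated to match the other array's size, without actually copying data.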

In [None]:
i = np.array([1, 2, 3])
j = np.array([[10], [20], [30]])

print(i + j)
# Output:
# [[11 12 13]
#  [21 22 23]
#  [31 32 33]]


Explanation: Here, i has shape (3,) and j has shape (3, 1). NumPy first treats i as shape (1, 3), then stretches i down the rows and j across the columns so that both act as (3, 3) arrays, and finally performs element-wise addition.
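The broadcasting rules above can be sketched as a small function that computes the result shape of two arrays. The function name broadcast_shape is hypothetical; NumPy's own np.broadcast_shapes (available since NumPy 1.20) is used only to cross-check it.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: result shape of broadcasting two shapes."""
    # Align: left-pad the shorter shape with ones
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for da, db in zip(a, b):
        # Compatible: sizes must match, or one of them must be 1 (which is stretched)
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"incompatible shapes {shape_a} and {shape_b}")
    return tuple(result)

print(broadcast_shape((3,), (3, 1)))        # (3, 3)
print(np.broadcast_shapes((3,), (3, 1)))    # NumPy's helper agrees
print(broadcast_shape((4, 1, 5), (3, 1)))   # (4, 3, 5)
```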

## Array Manipulation

1. Concatenation: Combine multiple arrays into one.

In [None]:
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

# Concatenate along rows (axis=0)
k = np.concatenate([a, b], axis=0)
print(k)
# Output:
# [[1 2]
#  [3 4]
#  [5 6]]


2. Splitting: Divide an array into multiple sub-arrays.

In [None]:
l = np.split(k, 3, axis=0)
print(l)
# Output:
# [array([[1, 2]]), array([[3, 4]]), array([[5, 6]])]
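np.split requires the array to divide into equal parts and raises a ValueError otherwise; np.array_split accepts an uneven division, giving the extra elements to the earlier parts. A sketch on illustrative data:

```python
import numpy as np

data = np.arange(7)

try:
    np.split(data, 3)          # 7 elements cannot form 3 equal parts
except ValueError as exc:
    print("np.split failed:", exc)

parts = np.array_split(data, 3)
print([p.tolist() for p in parts])   # [[0, 1, 2], [3, 4], [5, 6]]
```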


3. Vertical Stacking (vstack):

In [None]:
m = np.vstack([a, b])
print(m)
# Output:
# [[1 2]
#  [3 4]
#  [5 6]]


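4. Horizontal Stacking (hstack): Joins arrays along columns, so they must have the same number of rows.

In [None]:
col = np.array([[5], [6]])  # shape (2, 1), matching the 2 rows of a
n = np.hstack([a, col])     # np.hstack([a, b]) would raise a ValueError: b has only 1 row
print(n)
# Output:
# [[1 2 5]
#  [3 4 6]]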


5. Transposing Arrays

In [None]:
print(a.T)
# Output:
# [[1 3]
#  [2 4]]


## Saving and Loading Arrays
1. Saving Arrays (save):

In [None]:
np.save('array.npy', a)

2. Loading Arrays (load):

In [None]:
loaded_array = np.load('array.npy')
print(loaded_array)
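To store several arrays in one file, np.savez writes an .npz archive whose keys are the keyword names you pass (np.savez_compressed does the same with compression). The sketch uses a temporary directory so it leaves no files behind; the file and array names are illustrative.

```python
import os
import tempfile
import numpy as np

first = np.array([[1, 2], [3, 4]])
second = np.arange(5)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "arrays.npz")
    np.savez(path, first=first, second=second)   # keyword names become keys

    with np.load(path) as archive:               # closes the file when done
        loaded_first = archive["first"]
        loaded_second = archive["second"]

print(loaded_first)
print(loaded_second)
```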

## Reading and Writing Text Files
1. Saving to Text File (savetxt):


In [None]:
np.savetxt('array.txt', a, fmt='%d', delimiter=',')


2. Loading from Text File (loadtxt):

In [None]:
loaded_text_array = np.loadtxt('array.txt', delimiter=',')
print(loaded_text_array)
# Output:
# [[1. 2.]
#  [3. 4.]]
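np.loadtxt returns floats by default, which is why the output above shows 1. and 2. instead of integers. Passing dtype=int restores integer values. A self-contained round trip in a temporary directory (the file name is illustrative):

```python
import os
import tempfile
import numpy as np

original = np.array([[1, 2], [3, 4]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "array.txt")
    np.savetxt(path, original, fmt="%d", delimiter=",")
    as_float = np.loadtxt(path, delimiter=",")           # default dtype is float64
    as_int = np.loadtxt(path, delimiter=",", dtype=int)  # request integers

print(as_float.dtype, as_int.dtype)
print(np.array_equal(as_int, original))  # True
```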


NumPy is an indispensable tool for data scientists and anyone involved in numerical computations with Python. Its efficient handling of large datasets, comprehensive mathematical functions, and seamless integration with other libraries like pandas and scikit-learn make it a cornerstone of the data science ecosystem.