# NumPy Advanced Topics
This notebook covers more advanced NumPy operations with examples.

## NumPy Operations
Performing arithmetic operations on arrays and scalars.

In [5]:
import numpy as np

In [3]:
# with Python lists, arithmetic operations need a longer approach
# for example, adding the elements of two lists in Python

list_1 = [1,2,3,4]
list_2 = [5,6,7,8]

sum_array = []
for i in range(len(list_1)):
    sum_array.append(list_1[i] + list_2[i])

print(sum_array)

[6, 8, 10, 12]


In [4]:
# or using map()
list(map( lambda x,y: x+y, list_1, list_2))

[6, 8, 10, 12]

In [6]:
# NumPy arrays offer an easier approach to arithmetic operations

array_1_np = np.array([1,2,3,4])
array_2_np = np.array([5,6,7,8])

print('Addition: ' , array_1_np + array_2_np)
print('Subtraction:  ', array_1_np - array_2_np)
print('Multiplication: ',  array_1_np * array_2_np)
print('Power: ',  array_1_np ** array_2_np)


Addition:  [ 6  8 10 12]
Subtraction:   [-4 -4 -4 -4]
Multiplication:  [ 5 12 21 32]
Power:  [    1    64  2187 65536]
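
The same element-wise behaviour applies to the other arithmetic operators. The following is a minimal sketch with the same values as above; the variable names `a` and `b` are illustrative and not part of the original notebook.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

true_div = b / a      # true division returns floats
floor_div = b // a    # floor division keeps the integer dtype
remainder = b % a     # element-wise modulo
scaled = a * 2.5      # the scalar is applied to every element

print('Division: ', true_div)
print('Floor division: ', floor_div)
print('Modulo: ', remainder)
print('Scalar multiplication: ', scaled)
```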


## Broadcasting
- Performing operations on different shaped arrays.
- NumPy automatically expands dimensions to perform operations without loops.

In [8]:
# updating multiple elements of 2d array
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print('Original 2d array \n', arr_2d)

arr_2d[1,:] = 999 #update second row elements with 999

print('Updated array: \n', arr_2d)


Original 2d array 
 [[1 2 3]
 [4 5 6]]
Updated array: 
 [[  1   2   3]
 [999 999 999]]


### Broadcasting in NumPy
Broadcasting allows NumPy to perform element-wise operations on arrays of different shapes without explicitly replicating data. This makes computations more efficient and memory-friendly.

### Rules of Broadcasting
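When operating on two arrays, NumPy compares their shapes dimension by dimension, starting from the trailing (rightmost) dimension and working left. Two dimensions are compatible if:

1. They are equal, or
2. One of them is 1 (it gets expanded to match the other).

If one array has fewer dimensions, its shape is treated as if it were padded with 1s on the left. When every pair of dimensions is compatible, NumPy broadcasts the smaller array across the larger one; otherwise it raises a `ValueError`.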
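
The rules can be checked on shapes alone, before any data is involved. This sketch assumes NumPy 1.20 or newer, where `np.broadcast_shapes` is available; the shapes are illustrative.

```python
import numpy as np

# np.broadcast_shapes applies the broadcasting rules to shapes only
shape_a = np.broadcast_shapes((2, 3), (3,))     # trailing 3 matches 3
shape_b = np.broadcast_shapes((2, 3), (2, 1))   # the 1 is stretched to 3
shape_c = np.broadcast_shapes((4, 1), (1, 5))   # both sides are stretched
print(shape_a, shape_b, shape_c)

# np.broadcast_to returns the expanded (read-only) view without copying data
row = np.array([10, 20, 30])
expanded = np.broadcast_to(row, (2, 3))
print(expanded)
```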

In [9]:
# Example 1: Adding a Scalar to an Array
array_1 = np.array([1, 2, 3])
result = array_1 + 10  # 10 is broadcasted to each element
print(result)

[11 12 13]


In [10]:
# Example 2: Adding a 1D Array to a 2D Array

arr_2d = np.array([[1, 2, 3],
                    [4, 5, 6]])

arr_1d = np.array([10, 20, 30])

result = arr_2d + arr_1d  # arr_1d is broadcasted to each row
print(result)

[[11 22 33]
 [14 25 36]]


In [11]:
# Example 3: Broadcasting a Column Vector
arr_2d = np.array([[1, 2, 3],
                    [4, 5, 6]])

arr_col = np.array([[10],
                    [20]])  # 2D column vector

result = arr_2d + arr_col
print(result)


[[11 12 13]
 [24 25 26]]


### When Broadcasting Fails
Broadcasting does not work if the arrays have incompatible shapes.

In [12]:
arr_1 = np.array([1, 2, 3])
arr_2 = np.array([[1, 2],
                  [3, 4]])

try:
    print(arr_1 + arr_2)  # raises ValueError (shapes not compatible)
except ValueError:
    print('Failed!!!')

Failed!!!
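
In some cases an incompatible pair can be made compatible by adding a length-1 axis. The sketch below uses its own illustrative arrays (`vec` and `mat` are not from the notebook) and assumes the intent is to add one value of `vec` to each row of `mat`.

```python
import numpy as np

vec = np.array([1, 2, 3])            # shape (3,)
mat = np.array([[1, 2],
                [3, 4],
                [5, 6]])             # shape (3, 2)

# (3,) against (3, 2) fails: the trailing dimensions are 3 and 2
try:
    vec + mat
    failed = False
except ValueError as err:
    failed = True
    print('Failed:', err)

# A new length-1 axis gives shape (3, 1), which is compatible with (3, 2)
col = vec[:, np.newaxis]
result = col + mat
print(result)
```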


## Selection Using Comparison Operators
In NumPy, you can use comparison operators (>, <, >=, <=, ==, !=) to filter elements in an array. This technique is also called Boolean masking.

In [2]:
# Example 1: filter elements greater than 25
arr = np.array([10, 20, 30, 40, 50])
print(arr > 25)   # Boolean array
print(arr[arr > 25])  # Filter elements greater than 25

[False False  True  True  True]
[30 40 50]


In [3]:
print(arr[[True,False,False,False,False]])
#this will print only the elements that align with True

[10]


In [4]:
# Example 2: Create a 4x4 matrix
# Select even elements, greater than 5
np.random.seed(10)
matrix_4_4 = np.random.randint(1,20, (4,4))
print(matrix_4_4)

[[10  5 16  1]
 [18 17 18  9]
 [10  1 11  9]
 [ 5 17  5 16]]


In [6]:
# we can first create the conditions, then use them
even = (matrix_4_4 % 2 == 0)
greater_than_5 = (matrix_4_4>5)

print(matrix_4_4[even & greater_than_5])

[10 16 18 18 10 16]


In [7]:
# or
print(matrix_4_4[ (matrix_4_4 % 2 == 0) & (matrix_4_4>5) ])

[10 16 18 18 10 16]
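
Conditions can also be combined with `|` (OR) and inverted with `~` (NOT). A small sketch on an illustrative 1D array follows. Each comparison needs its own parentheses because `&`, `|` and `~` bind more tightly than the comparison operators.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# OR: keep elements below 20 or above 40
outside = arr[(arr < 20) | (arr > 40)]

# NOT: keep elements that are not greater than 25
not_large = arr[~(arr > 25)]

print(outside)
print(not_large)
```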


### Replacing Values Using Boolean Masks
Instead of filtering, we can also modify elements based on conditions.

In [8]:
arr_rep = np.array([1, 2, 3, 4, 5, 6, 7, 8])

print(f'Before change: {arr_rep}')
# Replace all values greater than 5 with 99
arr_rep[arr_rep > 5] = 99

print(f'After change: {arr_rep}')

Before change: [1 2 3 4 5 6 7 8]
After change: [ 1  2  3  4  5 99 99 99]


### Using Boolean Masks with NumPy Functions
Boolean masks can be combined with NumPy functions.

In [18]:
arr = np.array([10, 20, 30, 40, 50])

# sum the elements greater than 25
summ = np.sum(arr[arr > 25])
print(summ)

120


In [10]:
# Get the mean of values greater than 15
mean_value = np.mean(arr[arr > 15])
print(mean_value)

35.0


In [19]:
# count how many elements are greater than 25
count = len(arr[arr>25])
print(count)

# or

print(  np.sum(arr>25)  )

3
3


## np.where()
`np.where()` allows conditional selection and modification of elements in a NumPy array. It can be used for filtering, replacing values, and creating new arrays based on conditions.

### Basic Usage of np.where
np.where(condition, value_if_true, value_if_false)

In [20]:
arr = np.array([10, 20, 30, 40, 50])

# Replace values greater than 25 with 100, otherwise keep them the same
new_arr = np.where(arr > 25, 100, arr)

print(new_arr)


[ 10  20 100 100 100]


### Using np.where for Index Retrieval
If you pass only the condition, np.where returns the indexes where the condition is True.

In [21]:
arr = np.array([5, 10, 15, 20, 25])

# get indexes where value greater than 10
indexes = np.where(arr > 10)

print(indexes)
print(arr[indexes])

(array([2, 3, 4]),)
[15 20 25]


### Multiple Conditions with np.where
You can use logical operators like & (AND) and | (OR) inside np.where.

In [22]:
arr = np.array([2, 4, 6, 8, 10, 12, 14])

# Replace values strictly between 5 and 12 with 99
new_arr = np.where((arr > 5) & (arr < 12), 99, arr)

print(new_arr)

[ 2  4 99 99 99 12 14]
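
The `|` (OR) operator works the same way inside np.where. This is a minimal sketch on the same values; the thresholds and the replacement value 0 are illustrative.

```python
import numpy as np

arr = np.array([2, 4, 6, 8, 10, 12, 14])

# Replace values below 5 or above 12 with 0, keep the rest unchanged
new_arr = np.where((arr < 5) | (arr > 12), 0, arr)

print(new_arr)
```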


### Conditional Replacement in a 2D Array
np.where works the same way with 2D arrays.

In [26]:
arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6]])

# Replace values less than 4 with 0, otherwise set them to 1
new_arr = np.where(arr_2d < 4, 0, 1)

print(new_arr)


[[0 0 0]
 [1 1 1]]


In [27]:
# Example: Create random matrix 4_4
# Find indexes of even numbers
# print them as (i,j)

np.random.seed(10)
matrix_4_4 = np.random.randint(1,20, (4,4))
print(matrix_4_4)

[[10  5 16  1]
 [18 17 18  9]
 [10  1 11  9]
 [ 5 17  5 16]]


In [28]:
print(np.where(matrix_4_4 % 2 == 0))

(array([0, 0, 1, 1, 2, 3]), array([0, 2, 0, 2, 0, 3]))


In [29]:
# save rows and cols of even numbers
rows, cols = np.where(matrix_4_4 % 2 == 0)
print(rows)
print(cols)

[0 0 1 1 2 3]
[0 2 0 2 0 3]


In [31]:
# print indexes as (row, col) pairs
index = []
for r, c in zip(rows, cols):
    index.append((int(r), int(c)))

print(f'Even numbers in positions: {index}')

Even numbers in positions: [(0, 0), (0, 2), (1, 0), (1, 2), (2, 0), (3, 3)]
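
As an alternative sketch, `np.argwhere` returns the same positions directly as an array with one (row, col) pair per row. To avoid depending on the random seed, the matrix printed above is written out explicitly here.

```python
import numpy as np

# Same values as the 4x4 matrix shown above, written out by hand
matrix = np.array([[10, 5, 16, 1],
                   [18, 17, 18, 9],
                   [10, 1, 11, 9],
                   [5, 17, 5, 16]])

# Pair up the two index arrays returned by np.where
rows, cols = np.where(matrix % 2 == 0)
pairs_zip = [(int(r), int(c)) for r, c in zip(rows, cols)]

# np.argwhere gives an array of shape (number_of_matches, 2)
pairs_argwhere = np.argwhere(matrix % 2 == 0)

print(pairs_zip)
print(pairs_argwhere)
```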


## Aggregations
Computing sum, mean, standard deviation, and more. <br>
Perform row-wise and column-wise calculations using the `axis` parameter.

In [37]:
arr_agg = np.array([[1, 2, 3], [4, 5, 6]])
print(arr_agg)
print("Sum:", np.sum(arr_agg))
print("Mean:", np.mean(arr_agg))
print("Standard Deviation:", np.std(arr_agg))
print("Product:", np.prod(arr_agg))
print("Variance:", np.var(arr_agg))

# if axis is not specified, each function aggregates over all elements

[[1 2 3]
 [4 5 6]]
Sum: 21
Mean: 3.5
Standard Deviation: 1.707825127659933
Product: 720
Variance: 2.9166666666666665


In [40]:
# Aggregation along an axis
print(arr_agg)
print("Sum by row:", np.sum(arr_agg, axis=1))
# axis = 1: to sum the elements of a row we move across the columns

[[1 2 3]
 [4 5 6]]
Sum by row: [ 6 15]


In [41]:
print(arr_agg)
print("Sum by column:", np.sum(arr_agg, axis=0))
# axis = 0: to sum the elements of a column we move down the rows

[[1 2 3]
 [4 5 6]]
Sum by column: [5 7 9]


In [44]:
print(arr_agg)
print("Sum by row: \n", np.sum(arr_agg, axis=1, keepdims=True))
print("Sum by column:", np.sum(arr_agg, axis=0 ,keepdims=True))

[[1 2 3]
 [4 5 6]]
Sum by row: 
 [[ 6]
 [15]]
Sum by column: [[5 7 9]]
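
One place where keepdims helps is combining an aggregate with the original array through broadcasting. The sketch below scales each row so that it sums to 1; this normalization is an illustrative use and not part of the original notebook.

```python
import numpy as np

arr_agg = np.array([[1, 2, 3], [4, 5, 6]])

# keepdims=True keeps the result as shape (2, 1) instead of (2,)
row_sums = np.sum(arr_agg, axis=1, keepdims=True)

# (2, 3) / (2, 1) broadcasts: every row is divided by its own sum
normalized = arr_agg / row_sums

print(row_sums.shape)
print(normalized)
print(normalized.sum(axis=1))
```

Without keepdims the row sums would have shape (2,), which does not broadcast against (2, 3).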


## Rounding, Sorting, Unique Values
Manipulating numerical data.

In [45]:
arr = np.array([3.14159, 2.718, 1.618, 1.414])
print("Rounded:", np.round(arr, 2))

arr_sort = np.array([5, 2, 9, 1, 7])
print("Sorted:", np.sort(arr_sort))

arr_unique = np.array([1, 2, 2, 3, 4, 4, 4, 5])
print("Unique values:", np.unique(arr_unique))


Rounded: [3.14 2.72 1.62 1.41]
Sorted: [1 2 5 7 9]
Unique values: [1 2 3 4 5]
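
np.unique can also report how often each value occurs when called with `return_counts=True`. A minimal sketch on the same array:

```python
import numpy as np

arr_unique = np.array([1, 2, 2, 3, 4, 4, 4, 5])

# values holds the sorted unique elements, counts their frequencies
values, counts = np.unique(arr_unique, return_counts=True)

for value, count in zip(values, counts):
    print(f'{value} appears {count} time(s)')
```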


In [48]:
sorted_array = np.sort(arr_sort)

#reverse
reversed_array = sorted_array[::-1]
print(reversed_array)

#or use np.flip()
print(np.flip(sorted_array))

[9 7 5 2 1]
[9 7 5 2 1]


In [52]:
# using axis while sorting

arr_2_3 = np.array([[9, 2, 5], [4, 3, 6]])

print(arr_2_3)
print(np.sort(arr_2_3, axis=0)) # will sort elements of same column

[[9 2 5]
 [4 3 6]]
[[4 2 5]
 [9 3 6]]


In [53]:
# np.sort() returns a new sorted array
# arr_2_3.sort() sorts the existing array in place

arr_2_3.sort(axis=1)  #sort elements of same row
print(arr_2_3)

[[2 5 9]
 [3 4 6]]


## Working with Missing Data (nan, isnan(), nanmean())
Handling `NaN` values in NumPy.

- Creating NaN values
- Detecting missing values with isnan() and finding the indexes of NaN values
- Handling NaN values
- Replacing NaN values

In [54]:
# creating nan values using np.nan
array_n = np.array([1,2,3, np.nan, 5,6, np.nan, 8])
print(array_n)


[ 1.  2.  3. nan  5.  6. nan  8.]


In [55]:
# np.isnan() returns a Boolean mask, just like the conditions used for selection
print(np.isnan(array_n))

[False False False  True False False  True False]


In [59]:
# use where to find indexes
print(np.where(np.isnan(array_n)))
# it will return a tuple
# if it's a 1d array, the tuple holds just one array of indexes
# if it's 2d, the first element is the array of rows, the second the array of columns

position_of_nans = np.where(np.isnan(array_n))
print(position_of_nans[0])

(array([3, 6]),)
[3 6]


### Performing Calculations with NaNs
- np.nanmean(arr) -- Mean ignoring NaNs
- np.nanmedian(arr) -- Median ignoring NaNs
- np.nanstd(arr) -- Standard deviation ignoring NaNs
- np.nansum(arr) -- Sum ignoring NaNs
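
A short sketch of these functions next to their regular counterparts, using an illustrative array with two NaN values:

```python
import numpy as np

arr = np.array([1, 2, 3, np.nan, 5, 6, np.nan, 8])

# The regular functions propagate NaN into the result
plain_mean = np.mean(arr)

# The nan-aware versions skip the NaN entries
nan_sum = np.nansum(arr)
nan_mean = np.nanmean(arr)
nan_median = np.nanmedian(arr)
nan_std = np.nanstd(arr)

print('np.mean:', plain_mean)
print('np.nansum:', nan_sum)
print('np.nanmean:', nan_mean)
print('np.nanmedian:', nan_median)
print('np.nanstd:', nan_std)
```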

In [61]:
# Exercise 1: For given array, replace nan values with mean of the array

array_nan = np.array([1,2,3, np.nan, 5,6, np.nan, 8])
print(array_nan)

[ 1.  2.  3. nan  5.  6. nan  8.]


In [62]:
# finding position of nan values
np.where(np.isnan(array_nan))

(array([3, 6]),)

In [64]:
# finding mean, while ignoring nan values
mean = np.nanmean(array_nan)
print(mean)
#rounded mean
print('Rounded: ', mean.round(2))

4.166666666666667
Rounded:  4.17


In [66]:
# replace nan values with rounded mean
array_nan = np.where(np.isnan(array_nan), mean.round(2), array_nan)
print(array_nan)

[1.   2.   3.   4.17 5.   6.   4.17 8.  ]
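
The same replacement can be sketched with a Boolean mask, which modifies the array in place, or with `np.nan_to_num` and its `nan` argument (available in NumPy 1.17 and newer). The names `filled_mask` and `filled_n2n` are illustrative.

```python
import numpy as np

array_nan = np.array([1, 2, 3, np.nan, 5, 6, np.nan, 8])
fill_value = np.nanmean(array_nan).round(2)

# Option 1: Boolean mask assignment on a copy
filled_mask = array_nan.copy()
filled_mask[np.isnan(filled_mask)] = fill_value

# Option 2: np.nan_to_num with an explicit replacement for NaN
filled_n2n = np.nan_to_num(array_nan, nan=fill_value)

print(filled_mask)
print(filled_n2n)
```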


## Performance Testing
Comparing Python list vs NumPy array for summation.

In [67]:
import time

# Python list sum
py_list = list(range(1_000_000))
start = time.time()
sum(py_list)
end = time.time()
print("Python list sum time:", end - start)

# NumPy array sum
np_arr = np.arange(1_000_000)
start = time.time()
np.sum(np_arr)
end = time.time()
print("NumPy array sum time:", end - start)
    

Python list sum time: 0.012954235076904297
NumPy array sum time: 0.0009641647338867188
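
A single measurement like the one above can be noisy. As a hedged alternative, the sketch below repeats each measurement with the standard-library `timeit` module and keeps the best run. The repeat counts are arbitrary choices and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

n = 1_000_000
py_list = list(range(n))
# int64 is set explicitly so the total does not overflow on platforms
# where the default integer is 32-bit
np_arr = np.arange(n, dtype=np.int64)

calls = 10  # calls per measurement
list_time = min(timeit.repeat(lambda: sum(py_list), number=calls, repeat=5))
numpy_time = min(timeit.repeat(lambda: np.sum(np_arr), number=calls, repeat=5))

print(f'Python list sum: {list_time / calls:.6f} s per call')
print(f'NumPy array sum: {numpy_time / calls:.6f} s per call')
```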


## Conclusion on NumPy Library
NumPy (Numerical Python) is a fundamental library for scientific computing in Python, providing powerful tools for handling numerical data efficiently. It is widely used in data science, machine learning, engineering, and finance due to its ability to perform fast mathematical operations on large datasets.

Key Takeaways:
<br>✅ Efficient Arrays: NumPy provides ndarray, a memory-efficient array structure that is typically much faster than Python lists for numerical work.
<br>✅ Vectorized Operations: Replaces explicit Python loops with element-wise operations for better performance.
<br>✅ Broadcasting: Enables operations on arrays of different shapes without explicit looping.
<br>✅ Indexing & Selection: Supports slicing, Boolean masking, and advanced indexing techniques.
<br>✅ Mathematical & Statistical Functions: Provides built-in methods for mathematical computations, including mean, sum, standard deviation, and linear algebra operations.
<br>✅ Integration with Other Libraries: Works seamlessly with libraries like Pandas, SciPy, and Matplotlib.
<br><br>
In summary, NumPy is the backbone of numerical computing in Python, enabling efficient data manipulation and computation. Mastering it is essential for working with large datasets and optimizing performance in scientific applications. 🚀