1. Explain the purpose and advantages of NumPy in scientific computing and data analysis. How does it
enhance Python's capabilities for numerical operations?

In [23]:
# NumPy (Numerical Python) is a fundamental library for scientific computing in Python. It provides powerful tools for working with arrays and matrices,
# enabling efficient numerical operations and data analysis.

# Advantages:
# 1. Efficient Array Operations:
#    - NumPy arrays are highly optimized for numerical operations, offering significantly faster performance compared to Python lists.
#    - This is achieved through contiguous memory allocation and vectorized operations that leverage optimized C code.

# 2. Mathematical Functions:
#    - NumPy provides a vast collection of mathematical functions for linear algebra, Fourier transforms, random number generation, and more.
#    - These functions are optimized for array operations and are essential for scientific and engineering applications.

# 3. Broadcasting:
#    - NumPy's broadcasting feature allows element-wise operations on arrays of different but compatible shapes, simplifying calculations and reducing the need for loops.

# 4. Integration with Other Libraries:
#    - NumPy serves as the foundation for many other scientific computing libraries in Python, such as SciPy, Pandas, and scikit-learn.
#    - Its seamless integration facilitates data exchange and analysis across different domains.

# 5. Data Manipulation:
#    - NumPy provides tools for manipulating arrays, including indexing, slicing, reshaping, sorting, and searching.
#    - These capabilities are crucial for data preprocessing and analysis tasks.

# How NumPy enhances Python's capabilities for numerical operations:
# - Pure Python is slow for computationally intensive loops: every number is a separate Python object, and each operation goes through the interpreter's dynamic type checks.
# - NumPy leverages optimized C code for its core functionalities, enabling faster execution of numerical operations.
# - Its array data structure and mathematical functions greatly simplify working with numerical data and performing complex calculations.
# - By providing tools for vectorized operations, broadcasting, and array manipulation, NumPy significantly improves the efficiency and ease of use for numerical computations in Python.

# Example:
# Python Lists vs Numpy Array

import numpy as np
import time

# Python List
python_list = list(range(1000000))
start_time = time.perf_counter()  # perf_counter is a high-resolution clock intended for timing
result_list = [x * 2 for x in python_list]
end_time = time.perf_counter()
print(f"Python List Time: {end_time - start_time} seconds")

# NumPy Array
numpy_array = np.arange(1000000)
start_time = time.perf_counter()
result_array = numpy_array * 2
end_time = time.perf_counter()
print(f"NumPy Array Time: {end_time - start_time} seconds")

# NumPy is much faster for this numerical operation; a single run is noisy, so exact timings vary by machine and run.


Python List Time: 0.14045357704162598 seconds
NumPy Array Time: 0.0017397403717041016 seconds
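The advantages listed above also mention linear algebra, Fourier transforms, random number generation, and array manipulation. As a small illustrative sketch (the equations, seed, and signal are arbitrary example values, not part of the original answer), each can be exercised with a standard NumPy call: np.linalg.solve, np.random.default_rng, np.fft.rfft, and np.sort with np.searchsorted.

```python
import numpy as np

# Linear algebra: solve the system 2x + y = 5, x + 3y = 10
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
solution = np.linalg.solve(A, b)
print(solution)                          # approximately [1. 3.]

# Random numbers through the Generator API; a fixed seed makes runs reproducible
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.0, scale=1.0, size=5)
print(samples.shape)                     # (5,)

# Fourier transform: a cosine completing 4 cycles over 64 samples
t = np.arange(64)
signal = np.cos(2 * np.pi * 4 * t / 64)
spectrum = np.abs(np.fft.rfft(signal))
peak_bin = int(np.argmax(spectrum))
print(peak_bin)                          # 4, the dominant frequency bin

# Data manipulation: reshape, sort, and binary search
data = np.array([5, 1, 4, 2, 3, 6])
grid = data.reshape(2, 3)                # 2 rows x 3 columns
sorted_data = np.sort(data)              # [1 2 3 4 5 6]
position = np.searchsorted(sorted_data, 4)
print(grid.shape, position)              # (2, 3) 3
```

Each call replaces what would otherwise be a hand-written Python loop, which is the same efficiency argument made for the element-wise example above.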


2. Compare and contrast np.mean() and np.average() functions in NumPy. When would you use one over the
other?

In [24]:
import numpy as np

# np.mean() and np.average() both compute an average (optionally along an axis), but only np.average() accepts weights.

# np.mean() calculates the arithmetic mean, which is the sum of the elements divided by the number of elements.
# It's suitable for finding the simple average of a dataset.

# np.average() allows for weighted averaging. It takes an optional 'weights' argument: an array with one weight per element
# (or per element along the chosen axis). It returns sum(weights * data) / sum(weights), so the weights do not need to sum to 1.
# This is useful when some elements in the dataset should contribute more to the average than others.

# Example:
data = np.array([1, 2, 3, 4, 5])

# Using np.mean()
mean = np.mean(data)
print(f"Mean using np.mean(): {mean}")

# Using np.average() without weights (equivalent to giving every element the same weight)
average_equal_weights = np.average(data)
print(f"Average using np.average() with equal weights: {average_equal_weights}")

# Using np.average() with custom weights (these happen to sum to 1, but that is not required)
weights = np.array([0.1, 0.2, 0.3, 0.2, 0.2])
average_weighted = np.average(data, weights=weights)
print(f"Average using np.average() with custom weights: {average_weighted}")

# When to use np.mean():
# - When you need to find the simple arithmetic mean of a dataset.
# - When all elements should contribute equally to the average.

# When to use np.average():
# - When you need to calculate a weighted average, where some elements have more importance than others.
# - For example, in calculating a grade point average (GPA) where different courses have different credit weights.

# In summary, np.mean() is for unweighted averages, and np.average() is for weighted averages.

Mean using np.mean(): 3.0
Average using np.average() with equal weights: 3.0
Average using np.average() with custom weights: 3.2
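The GPA case mentioned above can be sketched directly. The grades and credit hours below are hypothetical values chosen for illustration; the sketch checks np.average against the weighted-mean formula and uses its returned=True option to also get the total weight.

```python
import numpy as np

# Hypothetical grades (4.0 scale) and the credit hours of each course
grades = np.array([4.0, 3.0, 3.7, 2.0])
credits = np.array([3, 4, 3, 1])

# Weighted average: sum(credits * grades) / sum(credits)
gpa = np.average(grades, weights=credits)
manual_gpa = np.sum(credits * grades) / np.sum(credits)
print(f"GPA: {gpa:.3f}")                             # GPA: 3.373

# returned=True also gives back the sum of the weights (total credits)
gpa_again, total_credits = np.average(grades, weights=credits, returned=True)
print(total_credits)                                 # 11.0

# The unweighted mean treats a 1-credit course the same as a 4-credit course
print(f"Unweighted mean: {np.mean(grades):.3f}")     # Unweighted mean: 3.175
```

The gap between 3.373 and 3.175 comes entirely from the weights, which is why np.average is the right choice when elements are not equally important.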


3. Describe the methods for reversing a NumPy array along different axes. Provide examples for 1D and 2D
arrays

In [25]:
import numpy as np

# For 1D arrays: a slice with step -1 walks the array from the last element to the first

arr_1d = np.array([1, 2, 3, 4, 5])
reversed_arr_1d = arr_1d[::-1]
print(f"Original 1D array: {arr_1d}")
print(f"Reversed 1D array: {reversed_arr_1d}")


# For 2D arrays:

arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"Original 2D array:\n{arr_2d}")

# Reverse along the first axis (rows):
reversed_rows_arr_2d = arr_2d[::-1, :]
print(f"Reversed rows 2D array:\n{reversed_rows_arr_2d}")


# Reverse along the second axis (columns):
reversed_cols_arr_2d = arr_2d[:, ::-1]
print(f"Reversed columns 2D array:\n{reversed_cols_arr_2d}")


# Reverse along both axes:
reversed_both_axes_arr_2d = arr_2d[::-1, ::-1]
print(f"Reversed both axes 2D array:\n{reversed_both_axes_arr_2d}")

Original 1D array: [1 2 3 4 5]
Reversed 1D array: [5 4 3 2 1]
Original 2D array:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Reversed rows 2D array:
[[7 8 9]
 [4 5 6]
 [1 2 3]]
Reversed columns 2D array:
[[3 2 1]
 [6 5 4]
 [9 8 7]]
Reversed both axes 2D array:
[[9 8 7]
 [6 5 4]
 [3 2 1]]
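Slicing is not the only method: np.flip takes an axis argument and covers the same cases. The sketch below reuses the arrays from the cell above and also checks a property that matters in practice: reversed slices are views, so a .copy() is needed when an independent reversed array is required.

```python
import numpy as np

arr_1d = np.array([1, 2, 3, 4, 5])
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# np.flip(arr, axis) reverses one axis; axis=None (the default) reverses every axis
rows_flipped = np.flip(arr_2d, axis=0)         # same as arr_2d[::-1, :]
cols_flipped = np.flip(arr_2d, axis=1)         # same as arr_2d[:, ::-1]
both_flipped = np.flip(arr_2d)                 # same as arr_2d[::-1, ::-1]
print(np.flip(arr_1d))                         # [5 4 3 2 1]
print(np.array_equal(both_flipped, np.flip(arr_2d, axis=(0, 1))))  # True

# Reversed slices (and np.flip results) are views that share memory with the original
rev_view = arr_2d[::-1]
print(np.shares_memory(rev_view, arr_2d))      # True

# Use .copy() when the reversed array must be independent of the original
rev_copy = arr_2d[::-1].copy()
rev_copy[0, 0] = 0
print(arr_2d[2, 0])                            # 7, the original is unchanged
```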


4. How can you determine the data type of elements in a NumPy array? Discuss the importance of data types
in memory management and performance.

In [26]:
import numpy as np

# Create a NumPy array
arr = np.array([1, 2, 3, 4, 5])

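# Determine the data type of the elements with the dtype attribute
# (the default integer dtype is platform-dependent: int64 on most 64-bit Linux/macOS systems,
#  int32 on Windows before NumPy 2.0)
data_type = arr.dtype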

print(f"Data type of the array: {data_type}")

# Importance of data types in memory management and performance:

# Memory Management:
# - Data types determine how much memory is allocated for each element in the array.
# - Choosing the appropriate data type for your data can significantly reduce memory usage,
#   especially when dealing with large datasets. For example, using int8 instead of int64
#   for storing small integer values will save memory.

# Performance:
# - Smaller data types mean fewer bytes to move through memory and caches, and more values fit in each
#   SIMD register, so operations on large arrays are often faster with narrower types.
# - Mixing data types triggers type promotion (e.g. int32 + float64 gives float64), which can create a
#   converted temporary copy. Keeping data types consistent throughout your code avoids this extra work and memory.

# Example:

# Array with int32 data type (4 bytes per element)
arr_int32 = np.array([1, 2, 3], dtype=np.int32)
print(f"Size of int32 array: {arr_int32.nbytes}")  # nbytes = number of elements * itemsize, in bytes

# Array with int8 data type (1 byte per element)
arr_int8 = np.array([1, 2, 3], dtype=np.int8)
print(f"Size of int8 array: {arr_int8.nbytes}")

# The int8 array needs 3 bytes versus 12 bytes for the int32 array: a quarter of the memory for the same values.

# In summary, understanding data types is crucial for efficient memory management and performance
# optimization when working with NumPy arrays. It allows you to choose the appropriate data type for
# your data, minimize memory usage, and maximize the speed of calculations.


Data type of the array: int64
Size of int32 array: 12
Size of int8 array: 3
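Choosing a dtype is a trade-off between memory, range, and promotion behavior. The following sketch (array sizes and values are arbitrary) shows itemsize and nbytes, downcasting with astype, type promotion with np.result_type, and the silent wrap-around that narrow integer types can cause.

```python
import numpy as np

values = np.arange(1_000, dtype=np.int64)
print(values.itemsize, values.nbytes)    # 8 8000  (bytes per element, total bytes)

# Values 0..999 fit in int16 (range -32768..32767), so downcasting is safe here
small = values.astype(np.int16)
print(small.itemsize, small.nbytes)      # 2 2000

# Mixing dtypes triggers type promotion to a type that can hold both
ints = np.array([1, 2, 3], dtype=np.int32)
floats = np.array([0.5, 0.5, 0.5])       # float64 by default
mixed = ints + floats
print(mixed.dtype)                       # float64
print(np.result_type(np.int8, np.int16)) # int16

# Narrow types trade range for memory: int8 holds only -128..127,
# and array arithmetic wraps around silently when it overflows
tiny = np.array([100], dtype=np.int8)
wrapped = tiny * 2
print(wrapped, wrapped.dtype)            # [-56] int8
```

The last case is the main risk of aggressive downcasting: check the value range of your data before choosing a narrow type.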



5. Define ndarrays in NumPy and explain their key features. How do they differ from standard Python lists?

In [27]:
# In NumPy, an ndarray is a multidimensional container for homogeneous data, meaning all elements must have the same data type.
# It's the fundamental data structure in NumPy and provides efficient storage and manipulation of numerical data.

# Key features of ndarrays:

# 1. Homogeneous Data Type: All elements in an ndarray must have the same data type (e.g., int, float, complex). This allows for optimized memory allocation and operations.
# 2. Multidimensional: ndarrays can have any number of dimensions, from 1D (vectors) to higher dimensions (matrices, tensors).
# 3. Shape and Size: The shape of an ndarray is a tuple representing the size of each dimension. The size of an ndarray is the total number of elements.
# 4. Data Type: The dtype attribute specifies the data type of the elements.
# 5. Fast Element-wise Operations: ndarrays allow for fast element-wise operations using vectorization, which avoids the need for explicit loops.
# 6. Broadcasting: This feature allows for arithmetic operations between arrays of different shapes under specific conditions, making code concise and efficient.

# Differences from Python lists:

# | Feature       | NumPy ndarray                   | Python List                          |
# |---------------|---------------------------------|--------------------------------------|
# | Data Type     | Homogeneous                     | Can contain different data types     |
# | Memory Usage  | Efficient (contiguous memory)   | Less efficient (pointers to objects) |
# | Size          | Fixed at creation               | Grows and shrinks (append, pop)      |
# | Performance   | Faster for numerical operations | Slower for numerical operations      |
# | Functionality | Specialized for numerical tasks | General-purpose data structure       |
# | Vectorization | Supported                       | Not natively supported               |
# | Broadcasting  | Supported                       | Not supported                        |

# Example:

# Python List
python_list = [1, 2, 3, 4, 5]

# NumPy ndarray
numpy_array = np.array([1, 2, 3, 4, 5])

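# Show different features: type() reports the container class, while .dtype reports the type of the elements stored in the array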
print(f"Data type of Python list: {type(python_list)}")
print(f"Data type of NumPy array: {type(numpy_array)}")

print(f"Data type of elements in NumPy array: {numpy_array.dtype}")

print(f"Shape of NumPy array: {numpy_array.shape}")

# Multiply all elements in the list by 2 (Python list)
multiplied_list = [x * 2 for x in python_list]
print(f"Python list multiplied by 2: {multiplied_list}")

# Multiply all elements in the array by 2 (NumPy ndarray)
multiplied_array = numpy_array * 2
print(f"NumPy array multiplied by 2: {multiplied_array}")

Data type of Python list: <class 'list'>
Data type of NumPy array: <class 'numpy.ndarray'>
Data type of elements in NumPy array: int64
Shape of NumPy array: (5,)
Python list multiplied by 2: [2, 4, 6, 8, 10]
NumPy array multiplied by 2: [ 2  4  6  8 10]
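A practical consequence of the differences in the table is that the same operators behave differently on the two containers. This small sketch (values are arbitrary) contrasts list repetition and concatenation with element-wise array arithmetic, shows how np.array coerces mixed element types to one dtype, and shows that "appending" to an ndarray builds a new array.

```python
import numpy as np

py_list = [1, 2, 3]
arr = np.array([1, 2, 3])

# The same operators mean different things for the two containers
print(py_list * 2)      # [1, 2, 3, 1, 2, 3]  (repetition)
print(arr * 2)          # [2 4 6]             (element-wise multiplication)
print(py_list + [10])   # [1, 2, 3, 10]       (concatenation)
print(arr + 10)         # [11 12 13]          (scalar broadcast to every element)

# A list keeps each element's own type; np.array converts everything to one dtype
mixed_list = [1, 2.5, True]
mixed_arr = np.array(mixed_list)
print([type(x).__name__ for x in mixed_list])  # ['int', 'float', 'bool']
print(mixed_arr.dtype, mixed_arr.tolist())     # float64 [1.0, 2.5, 1.0]

# An ndarray has a fixed size: np.append returns a new, larger array
bigger = np.append(arr, 4)
print(arr.size, bigger.size)                   # 3 4
```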


6. Analyze the performance benefits of NumPy arrays over Python lists for large-scale numerical operations

In [28]:
import numpy as np
import time

# Define the size of the arrays/lists
size = 1000000

# Create a Python list
python_list = list(range(size))

# Create a NumPy array
numpy_array = np.arange(size)

# Measure the time taken for element-wise multiplication using Python list
start_time = time.perf_counter()  # high-resolution clock intended for timing
result_list = [x * 2 for x in python_list]
end_time = time.perf_counter()
python_list_time = end_time - start_time

# Measure the time taken for element-wise multiplication using NumPy array
start_time = time.perf_counter()
result_array = numpy_array * 2
end_time = time.perf_counter()
numpy_array_time = end_time - start_time

# Print the results
print(f"Python List Time: {python_list_time:.6f} seconds")
print(f"NumPy Array Time: {numpy_array_time:.6f} seconds")

# Calculate the performance improvement
performance_improvement = python_list_time / numpy_array_time
print(f"Performance improvement of NumPy over Python List: {performance_improvement:.2f} times")


# Analysis:
# The code shows a large performance advantage for NumPy arrays over Python lists on this element-wise operation
# (a single timing run is noisy, so the exact ratio varies between runs and machines).
# NumPy arrays are faster because:

# 1. Vectorized Operations: the loop over elements runs in compiled code inside NumPy, instead of the Python
#    interpreter executing bytecode once per element.
# 2. Optimized C Code: NumPy's core is written in C and can use SIMD instructions, avoiding per-element
#    interpreter overhead and dynamic type checks.
# 3. Contiguous Memory: NumPy arrays store raw values in a contiguous memory block, which is cache-friendly,
#    while a Python list stores pointers to separately allocated Python objects.

# Conclusion:
# For numerical computations and data analysis tasks involving large datasets, NumPy arrays are the preferred choice due to their superior performance and efficiency compared to Python lists.


Python List Time: 0.080248 seconds
NumPy Array Time: 0.006810 seconds
Performance improvement of NumPy over Python List: 11.78 times
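The two runs in this notebook report very different speedups (about 80x in question 1, about 12x here), which is typical of one-shot timings. A more robust sketch uses the standard-library timeit module, repeating each measurement and keeping the fastest trial; the counts number=5 and repeat=3 are arbitrary choices for illustration.

```python
import timeit

import numpy as np

size = 1_000_000
python_list = list(range(size))
numpy_array = np.arange(size)

# Each trial runs the statement `number` times; `repeat` trials are collected.
# The minimum is the least disturbed by other activity on the machine.
runs = 5
list_times = timeit.repeat(lambda: [x * 2 for x in python_list], number=runs, repeat=3)
array_times = timeit.repeat(lambda: numpy_array * 2, number=runs, repeat=3)

list_best = min(list_times) / runs
array_best = min(array_times) / runs
print(f"list : {list_best:.6f} s per run")
print(f"numpy: {array_best:.6f} s per run")
print(f"speedup: {list_best / array_best:.1f}x")
```

The absolute numbers will still differ between machines, but repeated measurements make the comparison far less sensitive to one-off delays.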


7. Compare vstack() and hstack() functions in NumPy. Provide examples demonstrating their usage and
output

In [29]:
import numpy as np

# Create two sample arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Vertical stacking using vstack()
vstack_result = np.vstack((arr1, arr2))
print("Vertical Stacking (vstack()):\n", vstack_result)

# Horizontal stacking using hstack()
hstack_result = np.hstack((arr1, arr2))
print("\nHorizontal Stacking (hstack()):\n", hstack_result)


# Example with 2D arrays
arr3 = np.array([[1, 2], [3, 4]])
arr4 = np.array([[5, 6], [7, 8]])

# Vertical stacking using vstack()
vstack_result_2d = np.vstack((arr3, arr4))
print("\nVertical Stacking (vstack()) with 2D arrays:\n", vstack_result_2d)

# Horizontal stacking using hstack()
hstack_result_2d = np.hstack((arr3, arr4))
print("\nHorizontal Stacking (hstack()) with 2D arrays:\n", hstack_result_2d)


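# Summary:
# - vstack(): stacks arrays vertically (row-wise), along axis 0. 1-D inputs of shape (N,) are first treated as rows
#   of shape (1, N), so two length-3 vectors become a (2, 3) array. Inputs must have the same number of columns.
# - hstack(): stacks arrays horizontally (column-wise), along axis 1 for 2-D inputs. For 1-D inputs it simply joins
#   them end to end along axis 0, giving a longer 1-D array. Inputs must have the same number of rows.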


Vertical Stacking (vstack()):
 [[1 2 3]
 [4 5 6]]

Horizontal Stacking (hstack()):
 [1 2 3 4 5 6]

Vertical Stacking (vstack()) with 2D arrays:
 [[1 2]
 [3 4]
 [5 6]
 [7 8]]

Horizontal Stacking (hstack()) with 2D arrays:
 [[1 2 5 6]
 [3 4 7 8]]
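The shape requirements in the summary can be checked directly. In this sketch (arrays chosen for illustration) a (2, 2) and a (1, 2) array stack vertically because their column counts match, but cannot be stacked horizontally because their row counts differ; it also confirms that vstack matches np.concatenate along axis 0 for 2-D inputs.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])   # shape (2, 2)
b = np.array([[5, 6]])           # shape (1, 2)

# vstack needs matching column counts; row counts may differ
v = np.vstack((a, b))
print(v.shape)                                            # (3, 2)
print(np.array_equal(v, np.concatenate((a, b), axis=0)))  # True

# hstack needs matching row counts: a has 2 rows, b has 1
hstack_error = None
try:
    np.hstack((a, b))
except ValueError as err:
    hstack_error = err
    print("hstack failed:", err)

# 1-D inputs: vstack treats each as a row, hstack joins them end to end
x, y = np.array([1, 2, 3]), np.array([4, 5, 6])
print(np.vstack((x, y)).shape)   # (2, 3)
print(np.hstack((x, y)).shape)   # (6,)
```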


8. Explain the differences between fliplr() and flipud() methods in NumPy, including their effects on various
array dimensions.

In [30]:
import numpy as np

# Create a 2D array
arr_2d = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

# Fliplr: Flips the array in the left/right direction (along the second axis).
flipped_lr = np.fliplr(arr_2d)
print("Original 2D array:\n", arr_2d)
print("\nFlipped left/right (fliplr):\n", flipped_lr)

# Flipud: Flips the array in the up/down direction (along the first axis).
flipped_ud = np.flipud(arr_2d)
print("\nFlipped up/down (flipud):\n", flipped_ud)

# Create a 3D array
arr_3d = np.array([[[1, 2], [3, 4]],
                  [[5, 6], [7, 8]],
                  [[9, 10], [11, 12]]])

# Fliplr on a 3D array: reverses axis 1 (the rows inside each 2D block), equivalent to arr_3d[:, ::-1].
flipped_lr_3d = np.fliplr(arr_3d)
print("\nOriginal 3D array:\n", arr_3d)
print("\nFlipped left/right (fliplr) on 3D array:\n", flipped_lr_3d)

# Flipud on a 3D array: reverses axis 0, i.e. the order of the 2D blocks themselves; each block is unchanged.
flipped_ud_3d = np.flipud(arr_3d)
print("\nFlipped up/down (flipud) on 3D array:\n", flipped_ud_3d)

# Summary:
# - fliplr(): Reverses the order of elements along axis 1 (columns for a 2D array). Equivalent to arr[:, ::-1].
# - flipud(): Reverses the order of elements along axis 0 (rows for a 2D array). Equivalent to arr[::-1, ...].

# Effect on arrays with different numbers of dimensions:
# - fliplr() always acts on axis 1, so it requires at least 2 dimensions and raises ValueError for 1D input.
# - flipud() always acts on axis 0, so it also works on 1D input (same as arr[::-1]).
# - Both return views of the input rather than copies.


Original 2D array:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]

Flipped left/right (fliplr):
 [[3 2 1]
 [6 5 4]
 [9 8 7]]

Flipped up/down (flipud):
 [[7 8 9]
 [4 5 6]
 [1 2 3]]

Original 3D array:
 [[[ 1  2]
  [ 3  4]]

 [[ 5  6]
  [ 7  8]]

 [[ 9 10]
  [11 12]]]

Flipped left/right (fliplr) on 3D array:
 [[[ 3  4]
  [ 1  2]]

 [[ 7  8]
  [ 5  6]]

 [[11 12]
  [ 9 10]]]

Flipped up/down (flipud) on 3D array:
 [[[ 9 10]
  [11 12]]

 [[ 5  6]
  [ 7  8]]

 [[ 1  2]
  [ 3  4]]]
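The question also asks about different array dimensions, and the cell above only covers 2D and 3D input. This sketch shows the 1D case, where flipud works but fliplr raises an error, and checks that for a 3D array both functions agree with np.flip on a fixed axis.

```python
import numpy as np

vec = np.array([1, 2, 3])

# flipud reverses axis 0, so it works on 1-D input (same as vec[::-1])
print(np.flipud(vec))            # [3 2 1]

# fliplr reverses axis 1, which a 1-D array does not have
fliplr_error = None
try:
    np.fliplr(vec)
except ValueError as err:
    fliplr_error = err
    print("fliplr failed:", err)

# For any number of dimensions, both are shorthands for np.flip on a fixed axis
cube = np.arange(12).reshape(3, 2, 2)
print(np.array_equal(np.fliplr(cube), np.flip(cube, axis=1)))   # True
print(np.array_equal(np.flipud(cube), np.flip(cube, axis=0)))   # True
```

For flipping along the last axis of a 3D array (the columns inside each block), neither shortcut applies; np.flip(cube, axis=2) or cube[..., ::-1] is needed.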


9. Discuss the functionality of the array_split() method in NumPy. How does it handle uneven splits?

In [31]:
import numpy as np

# Create a sample array
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# Split the array into 3 sub-arrays (8 elements do not divide evenly into 3)
sub_arrays = np.array_split(arr, 3)

print("Original array:", arr)
print("\nSub-arrays after splitting:")
for i, sub_array in enumerate(sub_arrays):
    print(f"Sub-array {i+1}: {sub_array}")


# A second uneven split: 7 elements into 3 sub-arrays
arr2 = np.array([1, 2, 3, 4, 5, 6, 7])
sub_arrays2 = np.array_split(arr2, 3)

print("\nOriginal array with uneven split:", arr2)
print("\nSub-arrays after uneven splitting:")
for i, sub_array in enumerate(sub_arrays2):
    print(f"Sub-array {i+1}: {sub_array}")


# Explanation:

# The array_split() method in NumPy splits an array into a given number of sub-arrays of (nearly) equal size.
# - It takes the array, the number of sections (or a list of split indices), and an optional axis (default 0).
# - It returns a list of sub-arrays, which are views into the original array.

# Handling uneven splits:
# When an array of length l is split into n sections and l is not divisible by n, array_split() does not raise an
# error (unlike np.split()). Instead, the first l % n sub-arrays get l // n + 1 elements and the rest get l // n.
# The extra elements therefore go to the first sub-arrays, not the last one.

# In the provided examples:
# - arr (8 elements, 3 sections): 8 % 3 = 2, so the sizes are 3, 3, 2.
# - arr2 (7 elements, 3 sections): 7 % 3 = 1, so the sizes are 3, 2, 2; the extra element goes to the first sub-array.

Original array: [1 2 3 4 5 6 7 8]

Sub-arrays after splitting:
Sub-array 1: [1 2 3]
Sub-array 2: [4 5 6]
Sub-array 3: [7 8]

Original array with uneven split: [1 2 3 4 5 6 7]

Sub-arrays after uneven splitting:
Sub-array 1: [1 2 3]
Sub-array 2: [4 5]
Sub-array 3: [6 7]
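To make the uneven-split rule concrete, this sketch contrasts np.array_split with np.split, which refuses a division that is not exact, and shows two further behaviors: splitting at explicit indices, and asking for more sections than there are elements.

```python
import numpy as np

arr = np.arange(1, 8)  # [1 2 3 4 5 6 7]

# array_split: the first len(arr) % 3 = 1 piece gets one extra element
parts = np.array_split(arr, 3)
sizes = [p.size for p in parts]
print(sizes)                                   # [3, 2, 2]

# np.split requires an exact division and raises instead of adjusting sizes
split_error = None
try:
    np.split(arr, 3)
except ValueError as err:
    split_error = err
    print("np.split failed:", err)

# Both functions also accept a list of indices at which to cut
pieces = [p.tolist() for p in np.array_split(arr, [2, 5])]
print(pieces)                                  # [[1, 2], [3, 4, 5], [6, 7]]

# Requesting more sections than elements produces empty trailing pieces
print([p.size for p in np.array_split(np.arange(2), 4)])  # [1, 1, 0, 0]
```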


10. Explain the concepts of vectorization and broadcasting in NumPy. How do they contribute to efficient array
operations?

In [32]:
import numpy as np

# Vectorization:

# Vectorization in NumPy refers to performing operations on entire arrays at once, rather than iterating through elements individually.
# It leverages optimized C code and allows for faster execution of operations.

# Example: Element-wise multiplication

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

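# Using a loop (inefficient): one interpreted multiplication per element
result_loop = []
for a, b in zip(arr1, arr2):
    result_loop.append(int(a * b))  # int() keeps plain Python ints in the printed list (NumPy 2 shows np.int64(...) otherwise)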

# Using vectorization (efficient)
result_vectorized = arr1 * arr2

print("Result using loop:", result_loop)
print("Result using vectorization:", result_vectorized)

# Broadcasting:

# Broadcasting is a mechanism that lets NumPy operate on arrays with different shapes when those shapes are compatible:
# comparing shapes from the last dimension backwards, each pair of sizes must be equal or one of them must be 1
# (missing leading dimensions count as 1). Size-1 dimensions are stretched virtually, without copying any data.

# Example: Adding a scalar to an array

arr = np.array([1, 2, 3])
scalar = 5

# Adding the scalar to each element of the array using broadcasting
result_broadcasting = arr + scalar

print("Result of broadcasting:", result_broadcasting)


# Example: Adding a row vector to a matrix

matrix = np.array([[1, 2, 3], [4, 5, 6]])
row_vector = np.array([10, 20, 30])

# Adding the row vector to each row of the matrix using broadcasting
result_matrix_broadcasting = matrix + row_vector

print("Result of matrix broadcasting:", result_matrix_broadcasting)

# How they contribute to efficient array operations:

# - Vectorization:
#     - Eliminates the need for explicit loops, resulting in faster execution.
#     - Leverages optimized C code for efficient array operations.

# - Broadcasting:
#     - Allows for concise and efficient code when working with arrays of different shapes.
#     - Avoids materializing replicated copies of the smaller array (e.g. with np.tile), saving memory and time.

# By combining vectorization and broadcasting, NumPy facilitates efficient execution of complex mathematical and array operations,
# making it a powerful tool for scientific computing and data analysis.


Result using loop: [4, 10, 18]
Result using vectorization: [ 4 10 18]
Result of broadcasting: [6 7 8]
Result of matrix broadcasting: [[11 22 33]
 [14 25 36]]
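The broadcasting rule stated above can be checked on shapes directly. In this sketch (values chosen for illustration) a (2, 1) column broadcasts against a (2, 3) matrix, a length-2 vector fails because it is aligned with the trailing axis of length 3, inserting an axis with np.newaxis fixes it, and np.broadcast_to shows that the stretched axis has stride 0, meaning no data is copied.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
col = np.array([[100], [200]])              # shape (2, 1)

# Trailing dims: 3 vs 1 (the 1 is stretched), then 2 vs 2 (equal), giving (2, 3)
col_sum = matrix + col
print(col_sum)
print(np.broadcast_shapes(matrix.shape, col.shape))  # (2, 3)

# A 1-D array of length 2 is aligned with the trailing axis (length 3), so this fails
vec = np.array([100, 200])
broadcast_error = None
try:
    matrix + vec
except ValueError as err:
    broadcast_error = err
    print("shape mismatch:", err)

# Adding an axis gives it shape (2, 1), which broadcasts across the columns
fixed_sum = matrix + vec[:, np.newaxis]

# Stretching is virtual: broadcast_to returns a read-only view whose stretched axis has stride 0
stretched = np.broadcast_to(np.array([10, 20, 30]), (2, 3))
print(stretched.strides[0])                 # 0
```

The stride-0 view is why broadcasting is memory-efficient: the row vector is read repeatedly rather than copied into every row.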
