In [1]:
# 1. Explain the purpose and advantages of NumPy in scientific computing and data analysis. How does it
# enhance Python's capabilities for numerical operations?

# Purpose of NumPy:
# Efficient Array Handling: Provides a powerful `ndarray` for large, multi-dimensional arrays.
# Mathematical Functions: Offers a wide range of built-in mathematical operations for arrays.
# Broadcasting: Enables operations on arrays of different shapes without explicit loops.
# Integration: Serves as a foundation for other libraries like SciPy and Pandas.

# Advantages of NumPy:
# Performance: Fast execution due to C implementation and vectorized operations.
# Convenience: Intuitive syntax leads to cleaner and more readable code.
# Functionality: Comprehensive mathematical and statistical functions for diverse applications.
# Memory Efficiency: Lower memory usage compared to Python lists and efficient data manipulation.
# Community Support: Strong ecosystem with extensive documentation and community resources.

# Enhancements to Python:
# Vectorization: Applies functions to entire arrays, reducing loop usage.
# Linear Algebra: Robust tools for matrix operations and solving equations.
# Interoperability: Seamless conversion between NumPy arrays and other data formats.

In [2]:
# 2. Compare and contrast np.mean() and np.average() functions in NumPy. When would you use one over the
# other?

# Both np.mean() and np.average() are functions in NumPy used to compute the central tendency of data, but they have distinct features and use cases. Here’s a comparison:

# np.mean()
# Purpose: Calculates the arithmetic mean (average) of array elements.
# Syntax: np.mean(array, axis=None, keepdims=False)
# Default Behavior: Computes the mean of all elements if no axis is specified.
# Weights: Does not support weights; all elements contribute equally to the mean.
# Use Case: Ideal for simple average calculations where all data points are treated equally.

# np.average()
# Purpose: Computes the weighted average of array elements.
# Syntax: np.average(array, axis=None, weights=None, returned=False)
# Weights: Supports an optional weights parameter, allowing specific contributions for each element.
# Output: Can return both the weighted average and the sum of weights if returned=True.
# Use Case: Useful when you need to account for varying significance of data points, such as in statistical analysis or when working with probabilities.

# When to Use Each:
# Use np.mean() when you need a straightforward average of values with no additional considerations.
# Use np.average() when you require a weighted average or need to specify the importance of certain values in the dataset.
# Example Scenarios:
# np.mean(): Calculating the average test score of a class where all scores are equally important.
# np.average(): Finding the average price of products where some products have higher sales volumes, requiring their prices to weigh more in the final average.
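
# A minimal sketch of the difference, using made-up prices and sales volumes
# (the names `prices` and `units_sold` are illustrative assumptions, not from any dataset).

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])   # hypothetical product prices
units_sold = np.array([1, 1, 8])        # hypothetical sales volumes, used as weights

# np.mean(): every price contributes equally
simple_mean = np.mean(prices)

# np.average(): each price is weighted by how many units were sold
weighted_avg = np.average(prices, weights=units_sold)

# returned=True also gives back the sum of the weights
avg, weight_sum = np.average(prices, weights=units_sold, returned=True)

print(simple_mean)    # 20.0
print(weighted_avg)   # 27.0  -> (10*1 + 20*1 + 30*8) / 10
print(weight_sum)     # 10.0
```

# Without the weights argument, np.average() gives the same result as np.mean().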


In [5]:
# 3. Describe the methods for reversing a NumPy array along different axes. Provide examples for 1D and 2D arrays.

# Reversing a NumPy array can be done easily using slicing techniques. Here’s how you can reverse 1D and 2D arrays along different axes.

# 1D Array Reversal
# To reverse a 1D array, you can use slicing with a step of -1.

#Example:
import numpy as np

arr1d = np.array([1, 2, 3, 4])
revarry1 = arr1d[::-1]  # a step of -1 walks the array from the last element to the first
revarry1

array([4, 3, 2, 1])

In [None]:
# 2D Array Reversal
# For a 2D array, you can reverse along specific axes using slicing as well.

# To reverse the entire array (both rows and columns), use [::-1, ::-1].
# To reverse along a specific axis:
# Axis 0: Reverse the rows (vertically).
# Axis 1: Reverse the columns (horizontally).

In [6]:
arry2d = np.array([[1, 2, 3], [4, 5, 6]])
revarry2d = arry2d[::-1, ::-1]  # reverse the row order and the column order at once
revarry2d

array([[6, 5, 4],
       [3, 2, 1]])
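
# Reversing along a single axis can be sketched the same way. The variable names
# below are illustrative; np.flip() is shown as an equivalent to the slice.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])

rows_reversed = grid[::-1, :]   # axis 0: the order of the rows is reversed
cols_reversed = grid[:, ::-1]   # axis 1: the order of the columns is reversed

# np.flip() with an explicit axis gives the same result as the slices above
flip_axis0 = np.flip(grid, axis=0)
flip_axis1 = np.flip(grid, axis=1)

print(rows_reversed)
print(cols_reversed)
```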

In [7]:
# 4. How can you determine the data type of elements in a NumPy array? Discuss the importance of data types
# in memory management and performance.

# In NumPy, you can determine the data type of elements in an array using the .dtype attribute of the array (for example, arr.dtype).
# The points below discuss why data types matter for memory management and performance.

# Importance of Data Types
# Memory Management:

# Different data types consume different amounts of memory.

# For example, an int64 type uses 8 bytes, while an int32 uses only 4 bytes. Choosing the appropriate data type can significantly reduce memory usage, especially when dealing with large datasets.
# If you have a large array of integers that can fit into a smaller type (like int8 or int16), using those types instead of larger types can help optimize memory usage.
# Performance:

# Operations are generally faster when an array already has the data type the computation needs, because NumPy does not have to
# convert the data first. For example, mixing an integer array with a float array forces a conversion to float before the arithmetic runs.
# Smaller data types also fit more elements into the CPU cache, which can improve cache efficiency and reduce overhead on large arrays.

# Data Integrity:

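# Choosing the correct data type helps the data keep its integrity. For example, using an integer type for a value that
# should never be a decimal helps prevent unexpected behavior during calculations. The type must also be wide enough for the values:
# an int8 holds only -128 to 127, and array arithmetic that exceeds the range wraps around silently.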

# Compatibility with Libraries:

# Many scientific computing libraries and functions expect specific data types. Using the correct data type ensures compatibility and
# helps avoid type-related errors.
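
# A short sketch of inspecting dtypes and their memory cost. The array contents
# are arbitrary; .itemsize and .nbytes are standard ndarray attributes.

```python
import numpy as np

values = np.array([1, 2, 3])
print(values.dtype)   # default integer type; platform-dependent, commonly int64

# The same three numbers stored with different integer widths
small = values.astype(np.int8)
wide = values.astype(np.int64)

print(small.dtype, small.itemsize, small.nbytes)   # int8 1 3
print(wide.dtype, wide.itemsize, wide.nbytes)      # int64 8 24

# A float array reports a floating-point dtype
floats = np.array([1.5, 2.5])
print(floats.dtype)   # float64
```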


In [8]:
# 5. Define ndarrays in NumPy and explain their key features. How do they differ from standard Python lists?

# Definition of ndarrays in NumPy
# In NumPy, ndarrays (n-dimensional arrays) are the core data structure that allows for efficient storage and manipulation of multi-dimensional data. They are designed to provide a fast and flexible way to work with large datasets, supporting a variety of numerical operations.

# Key Features of ndarrays
# Homogeneous Data: All elements in an ndarray must be of the same data type (e.g., all integers, all floats). This uniformity allows for optimized memory usage and performance.

# Multi-dimensional: Ndarrays can be one-dimensional (1D), two-dimensional (2D), or even multi-dimensional (N-D). This flexibility allows for complex data representations like matrices and tensors.

# Contiguous Memory Allocation: Ndarrays store data in contiguous blocks of memory, which enhances performance during numerical operations and minimizes overhead.

# Broadcasting: This feature allows for arithmetic operations between arrays of different shapes without explicit replication of data. It makes it easier to perform operations on multi-dimensional data.

# Vectorized Operations: Ndarrays support vectorized operations, which means operations can be applied to entire arrays at once, resulting in concise and efficient code that avoids explicit loops.

# Advanced Indexing and Slicing: Ndarrays support advanced indexing capabilities, allowing for sophisticated data manipulation through boolean indexing, slicing, and integer array indexing.

# Rich Library Support: Ndarrays serve as the foundational data structure for many scientific computing libraries, making them integral to data analysis and machine learning workflows.

# Differences from Standard Python Lists
# Data Type:

# ndarrays: Must contain elements of the same data type.
# Python Lists: Can contain elements of different types (e.g., integers, floats, strings).
# Performance:

# ndarrays: Faster for numerical operations due to contiguous memory layout and optimized C implementations.
# Python Lists: Slower for numerical computations because operations typically involve iterating through elements.
# Memory Efficiency:

# ndarrays: More memory-efficient since they store data in a uniform type without the overhead of type information for each element.
# Python Lists: Use more memory due to storing references to each element and allowing for mixed types.
# Functionality:

# ndarrays: Support a wide range of mathematical functions and operations directly, enabling easier complex calculations.
# Python Lists: Do not have built-in support for element-wise mathematical operations; loops, comprehensions, or additional libraries are required.
# Multi-dimensional Capability:

# ndarrays: Naturally support multi-dimensional structures, making them ideal for matrix and tensor operations.
# Python Lists: Can be nested to create multi-dimensional structures, but this is less efficient and more cumbersome.
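
# A brief sketch contrasting the two structures. The values are arbitrary and
# only illustrate the behavior described above.

```python
import numpy as np

py_list = [1, 2, 3]
nd = np.array([1, 2, 3])

# The * operator repeats a list but multiplies an ndarray element-wise
repeated = py_list * 2
doubled = nd * 2
print(repeated)   # [1, 2, 3, 1, 2, 3]
print(doubled)    # [2 4 6]

# A list can mix types; an ndarray converts everything to one common dtype
mixed_list = [1, 2.5, 3]
mixed_array = np.array(mixed_list)
print(mixed_array.dtype)   # float64
```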


In [9]:
# 6. Analyze the performance benefits of NumPy arrays over Python lists for large-scale numerical operations.

# NumPy arrays provide several performance benefits over Python lists, particularly for large-scale numerical operations. Here are the key advantages:

# 1. Faster Computation
# Vectorization: NumPy arrays support vectorized operations, allowing calculations to be applied to entire arrays at once. This eliminates the need for explicit loops, significantly speeding up computations.
# Optimized C Implementation: NumPy is built on optimized C and Fortran libraries, which allow for faster execution of array operations compared to the interpreted nature of Python.

# 2. Memory Efficiency
# Contiguous Memory Allocation: NumPy arrays store data in contiguous blocks of memory, which reduces overhead and improves cache performance. This is particularly beneficial for large datasets, as accessing data from contiguous memory is faster.
# Reduced Memory Footprint: Ndarrays require less memory than Python lists because they store data in a homogeneous type without the additional overhead of storing type information for each element.

# 3. Efficient Operations
# Broadcasting: NumPy allows for broadcasting, which means that operations can be performed on arrays of different shapes without creating copies of the data. This reduces memory usage and computation time.
# Built-in Mathematical Functions: NumPy provides a rich set of built-in functions optimized for array operations. These functions are implemented in low-level languages, ensuring faster execution than equivalent Python list operations.

# 4. Better Performance for Large Data Sets
# Scalability: As data size increases, the performance gap between NumPy arrays and Python lists widens. NumPy can handle large-scale numerical operations more efficiently due to its optimized architecture.
# Parallel Processing: NumPy's compiled loops can take advantage of low-level optimizations such as SIMD (Single Instruction, Multiple Data), and some operations (for example, linear algebra routines backed by BLAS/LAPACK) may run on multiple threads, depending on how NumPy was built.

# 5. Advanced Indexing and Slicing
# Efficient Access: NumPy’s basic slicing returns a view of the original data rather than a copy, so selecting a subset is cheap. Boolean and integer-array indexing select elements in compiled code, which is typically much faster than manually iterating over Python lists.
# Manipulation of Multi-dimensional Data: NumPy excels at manipulating multi-dimensional data, making it ideal for applications like image processing, where operations on large matrices are common.
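
# A rough way to observe the speed difference, assuming an arbitrary array size
# of 100,000 elements. Timings vary by machine, so no expected numbers are shown.

```python
import timeit
import numpy as np

n = 100_000
py_list = list(range(n))
nd = np.arange(n, dtype=np.int64)   # explicit int64 so the squares cannot overflow

def square_list():
    # Pure-Python loop: one interpreted multiplication per element
    return [x * x for x in py_list]

def square_array():
    # Vectorized: the loop runs inside NumPy's compiled code
    return nd * nd

list_time = timeit.timeit(square_list, number=20)
array_time = timeit.timeit(square_array, number=20)

print(f"list comprehension: {list_time:.4f} s")
print(f"NumPy vectorized:   {array_time:.4f} s")
```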

In [10]:
# 7. Compare vstack() and hstack() functions in NumPy. Provide examples demonstrating their usage and output.

# In NumPy, vstack() and hstack() are functions used to stack arrays vertically and horizontally, respectively. Here’s a detailed comparison of the two functions along with examples demonstrating their usage.

# numpy.vstack()
# Purpose: Stacks arrays in sequence vertically (row-wise).
# Input: Requires arrays to have the same shape along all but the first axis (number of columns must match).

# Example:

arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[7, 8, 9], [10, 11, 12]])

stack_ver = np.vstack((arr1, arr2))
stack_ver

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [11]:
# numpy.hstack()
# Purpose: Stacks arrays in sequence horizontally (column-wise).
# Input: Requires arrays to have the same shape along all but the second axis (number of rows must match).
# Output: Returns a new array with the stacked columns.

# Example

arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[7, 8, 9], [10, 11, 12]])

stack_hor = np.hstack((arr1, arr2))
stack_hor

array([[ 1,  2,  3,  7,  8,  9],
       [ 4,  5,  6, 10, 11, 12]])

In [12]:
# 8. Explain the differences between fliplr() and flipud() methods in NumPy, including their effects on various
# array dimensions.

# Differences between fliplr() and flipud() in NumPy
# Functionality:

# fliplr(): Flips an array left to right (horizontally).
# flipud(): Flips an array up to down (vertically).
# Input Requirements:

# flipud() works on arrays with at least one dimension; fliplr() requires at least two, because it reverses axis 1.
# 1D Arrays:

# fliplr(): Raises a ValueError, since a 1D array has no second axis (columns) to flip.
# flipud(): Reverses the array, because it flips along axis 0, the only axis.
# 2D Arrays:

# fliplr(): Reverses the order of columns within each row.
# flipud(): Reverses the order of rows in the array.
# Higher-Dimensional Arrays:

# Both functions can be applied: fliplr() reverses only axis 1 and flipud() reverses only axis 0, while all other axes are left unchanged.
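
# A small sketch of both functions on a 2D and a 3D array. The arrays are
# arbitrary examples chosen so the flipped axis is easy to see.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

left_right = np.fliplr(m)   # column order reversed within each row
up_down = np.flipud(m)      # row order reversed

print(left_right)
print(up_down)

# On a 3D array, fliplr() acts on axis 1 and flipud() acts on axis 0
cube = np.arange(8).reshape(2, 2, 2)
cube_lr = np.fliplr(cube)
cube_ud = np.flipud(cube)
```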


In [13]:
# 9. Discuss the functionality of the array_split() method in NumPy. How does it handle uneven splits?

# The array_split() method in NumPy is used to split an array into multiple sub-arrays. It provides a flexible way to partition arrays into a specified number of sections, accommodating cases where the splits may not be even.

# Functionality of array_split()
# Basic Usage:

# The method splits an array into a specified number of sub-arrays.
# Syntax: numpy.array_split(ary, indices_or_sections, axis=0), where:
# ary: The input array to be split.
# indices_or_sections: Number of splits or the indices at which to split.
# axis: The axis along which to split (default is 0).
# Returns:

# It returns a list of sub-arrays after splitting.
# Handling Uneven Splits
# When the array cannot be evenly divided by the specified number of sections, array_split() distributes the elements as evenly as possible:
# Some sub-arrays may contain one more element than others.
# The extra elements are distributed among the first few sub-arrays.
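
# The uneven case can be sketched with 10 elements split into 3 sections.
# np.split() is included only for contrast, since it refuses an uneven division.

```python
import numpy as np

data = np.arange(10)

# 10 elements into 3 sections: the first section gets the extra element
parts = np.array_split(data, 3)
sizes = [len(part) for part in parts]

for part in parts:
    print(part)
print(sizes)   # [4, 3, 3]

# np.split() requires an equal division and raises ValueError otherwise
try:
    np.split(data, 3)
    split_failed = False
except ValueError:
    split_failed = True
print(split_failed)   # True
```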

In [14]:
# 10. Explain the concepts of vectorization and broadcasting in NumPy. How do they contribute to efficient array operations?

# Vectorization
# Definition: Vectorization refers to the ability to perform operations on entire arrays (or large blocks of data) at once rather than element by element.
# This is made possible by utilizing optimized low-level libraries that can operate on array data without explicit loops in Python.

# Benefits:

# Performance: Vectorized operations are implemented in highly optimized C and Fortran code, making them much faster than equivalent Python loops.
# Code Clarity: Using vectorized operations leads to cleaner and more readable code, reducing the likelihood of errors.


# Broadcasting
# Definition: Broadcasting is a mechanism that allows NumPy to perform arithmetic operations on arrays of different shapes without explicitly
# resizing them. When two arrays with different but compatible shapes are used in an operation,
# NumPy conceptually "stretches" the smaller array across the larger one so they can be combined element-wise, without actually copying the data.

# How It Works:

# If the arrays have different numbers of dimensions, NumPy pads the smaller array's shape with ones on the left until both shapes are the same.
# The sizes of the dimensions are compared element-wise; if they are equal, or one of them is 1, NumPy can broadcast the smaller array across the
# larger one.

# Benefits:

# Memory Efficiency: Broadcasting avoids the need to create large copies of data, saving memory.
# Flexibility: It allows for operations on arrays of different shapes, making it easier to write generalized code.
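
# A minimal sketch of both ideas. The shapes are chosen to exercise the two
# broadcasting rules above; the values themselves are arbitrary.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])                # shape (3,), treated as (1, 3)
col = np.array([[100], [200]])              # shape (2, 1)

# Vectorization: one expression operates on every element, no Python loop
scaled = matrix * 2

# Broadcasting: the row is applied to each row, the column to each column
plus_row = matrix + row
plus_col = matrix + col

print(scaled)
print(plus_row)
print(plus_col)

# Shapes (2, 3) and (2,) are incompatible: trailing sizes 3 and 2 differ and neither is 1
try:
    matrix + np.array([1, 2])
    incompatible = False
except ValueError:
    incompatible = True
```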


In [15]:
# Practical Questions:

# 1. Create a 3x3 NumPy array with random integers between 1 and 100. Then, interchange its rows and columns.

arr = np.random.randint(1, 101, size=(3, 3))  # the upper bound is exclusive, so 101 includes 100
arr

array([[97,  8, 57],
       [69, 43, 33],
       [59, 31, 31]])

In [16]:
transposed_arr = arr.T  # .T transposes the array, swapping rows and columns
transposed_arr

array([[97, 69, 59],
       [ 8, 43, 31],
       [57, 33, 31]])

In [23]:
# 2. Generate a 1D NumPy array with 10 elements. Reshape it into a 2x5 array, then into a 5x2 array.

arr1 = np.arange(10)
arr1

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [24]:
arr1.reshape(2, 5)

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [25]:
arr1.reshape(5, 2)

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

In [26]:
# 3. Create a 4x4 NumPy array with random float values. Add a border of zeros around it, resulting in a 6x6 array.

arry = np.random.rand(4, 4)
arry

array([[0.08812789, 0.52099936, 0.1535074 , 0.63223429],
       [0.29923872, 0.42029817, 0.74220701, 0.51052726],
       [0.07511395, 0.01771773, 0.53861988, 0.86125191],
       [0.43864067, 0.50827101, 0.03754549, 0.37270501]])

In [29]:
arry0 = np.zeros((6, 6))
arry0

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [35]:
arry0[1:5, 1:5] = arry
arry0

array([[0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        ],
       [0.        , 0.08812789, 0.52099936, 0.1535074 , 0.63223429,
        0.        ],
       [0.        , 0.29923872, 0.42029817, 0.74220701, 0.51052726,
        0.        ],
       [0.        , 0.07511395, 0.01771773, 0.53861988, 0.86125191,
        0.        ],
       [0.        , 0.43864067, 0.50827101, 0.03754549, 0.37270501,
        0.        ],
       [0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        ]])
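
# As an alternative to assigning into a zeros array, np.pad() can add the border
# in one call. This is a sketch of a different technique, not the approach used above.

```python
import numpy as np

inner = np.random.rand(4, 4)

# pad_width=1 adds one element on every side of every axis;
# mode="constant" fills the new cells with constant_values (0 here)
bordered = np.pad(inner, pad_width=1, mode="constant", constant_values=0)

print(bordered.shape)   # (6, 6)
```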

In [36]:
# 4. Using NumPy, create an array of integers from 10 to 60 with a step of 5.

int_arry = np.arange(10, 61, 5)
int_arry

array([10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60])

In [37]:
# 5. Create a NumPy array of strings ['python', 'numpy', 'pandas']. Apply different case transformations
# (uppercase, lowercase, title case, etc.) to each element.

string_array = np.array(["python", "numpy", "pandas"])
uppercase = np.char.upper(string_array)
uppercase

array(['PYTHON', 'NUMPY', 'PANDAS'], dtype='<U6')

In [38]:
lowercase = np.char.lower(string_array)
lowercase

array(['python', 'numpy', 'pandas'], dtype='<U6')

In [39]:
titlecase = np.char.title(string_array)
titlecase

array(['Python', 'Numpy', 'Pandas'], dtype='<U6')

In [40]:
# 6. Generate a NumPy array of words. Insert a space between each character of every word in the array.

string_array = np.array(["python" , "numpy", "pandas"])
space_array = np.char.join(" ", string_array)
space_array

array(['p y t h o n', 'n u m p y', 'p a n d a s'], dtype='<U11')

In [42]:
# 7. Create two 2D NumPy arrays and perform element-wise addition, subtraction, multiplication, and division.

arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[7, 8, 9], [10, 11, 12]])

addition = arr1 + arr2
addition

array([[ 8, 10, 12],
       [14, 16, 18]])

In [43]:
subtraction = arr1 - arr2
subtraction

array([[-6, -6, -6],
       [-6, -6, -6]])

In [44]:
multiplication = arr1 * arr2
multiplication

array([[ 7, 16, 27],
       [40, 55, 72]])

In [46]:
division = arr1 / arr2
division

array([[0.14285714, 0.25      , 0.33333333],
       [0.4       , 0.45454545, 0.5       ]])

In [47]:
# 8. Use NumPy to create a 5x5 identity matrix, then extract its diagonal elements.

iden_mat = np.eye(5)
iden_mat

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [48]:
diag_elem = np.diagonal(iden_mat)
diag_elem

array([1., 1., 1., 1., 1.])

In [49]:
# 9. Generate a NumPy array of 100 random integers between 0 and 1000. Find and display all prime numbers in
# this array.

random_integers = np.random.randint(0, 1001, size=100)
random_integers

array([421, 348, 252, 316, 860,  75, 286, 344, 430, 995, 536, 535, 588,
        84, 444, 483, 981, 150, 232, 222,  11, 612, 988, 841, 250, 807,
       807, 319, 979, 290, 568, 643, 392, 726, 178, 546, 796, 426, 161,
       167, 264, 396,  70, 562, 107, 430, 160, 406, 269, 413, 868, 405,
       404, 249, 741,  19, 235, 112, 993, 561, 718, 825, 706, 881, 244,
       705, 951, 902, 780, 820, 913, 995, 508, 690, 562, 959, 855, 684,
       941, 300, 724, 293, 437, 546, 456, 696, 730, 484, 966, 705, 485,
       270, 657, 293, 325, 462, 961, 918, 866, 377])

In [50]:
# Helper that returns True when n is prime, using trial division up to sqrt(n)
def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

In [51]:
# Display the prime numbers in random_integers

prime = [num for num in random_integers if is_prime(num)]
prime

[421, 11, 643, 167, 107, 269, 19, 881, 941, 293, 293]
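
# For larger arrays, a sieve of Eratosthenes built on a boolean mask avoids the
# per-element Python loop. This is a sketch of a different technique from the
# trial-division helper; `primes_mask` is a hypothetical helper name and the
# sample values are arbitrary.

```python
import numpy as np

def primes_mask(limit):
    """Return a boolean array where mask[k] is True when k is prime (0 <= k <= limit)."""
    mask = np.ones(limit + 1, dtype=bool)
    mask[:2] = False                      # 0 and 1 are not prime
    for i in range(2, int(limit ** 0.5) + 1):
        if mask[i]:
            mask[i * i :: i] = False      # cross out multiples of i, starting at i*i
    return mask

sample = np.array([421, 348, 11, 841, 643, 1, 0, 2])
mask = primes_mask(1000)

# Index the sieve with the sample values, then use the result as a boolean filter
primes_found = sample[mask[sample]]
print(primes_found)   # keeps 421, 11, 643 and 2
```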

In [53]:
# 10. Create a NumPy array representing daily temperatures for a month. Calculate and display the weekly
# averages.

daily_temp = np.random.randint(10, 41, size=30)
daily_temp

array([12, 35, 10, 13, 19, 16, 39, 36, 18, 30, 23, 30, 31, 18, 38, 16, 28,
       35, 38, 17, 22, 22, 15, 36, 26, 22, 33, 38, 22, 38])

In [55]:
weekly_temp = daily_temp[:28].reshape(4, 7)  # four full weeks, one row per week
weekly_temp

array([[12, 35, 10, 13, 19, 16, 39],
       [36, 18, 30, 23, 30, 31, 18],
       [38, 16, 28, 35, 38, 17, 22],
       [22, 15, 36, 26, 22, 33, 38]])

In [57]:
rem_days = daily_temp[28 : ]
rem_days

array([22, 38])

In [61]:
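weekly_averages = np.mean(weekly_temp, axis=1)  # one mean per row, i.e. per full week
if rem_days.size > 0:
    # Days 29 and 30 form a partial week; average them separately and append
    remaining_average = np.mean(rem_days)
    weekly_averages = np.append(weekly_averages, remaining_average)
weekly_averages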

array([20.57142857, 26.57142857, 27.71428571, 27.42857143, 30.        ])