# NumPy Assignment

## Theoretical Questions

### 1. Explain the purpose and advantages of NumPy in scientific computing and data analysis. How does it enhance Python's capabilities for numerical operations?

Purpose of NumPy:

1) Array Handling: It introduces the 'ndarray', an N-dimensional array object that is the central data structure for numerical operations in Python.

2) Mathematical Operations: It provides a wide range of mathematical functions and operations that can be performed on these arrays, such as element-wise operations, linear algebra functions, random number generation, and more.

3) Interoperability: NumPy arrays are used as the foundational data structure in many other data science libraries, such as pandas, SciPy, and scikit-learn, which makes it an essential tool for data analysis and machine learning.
    

Advantages of NumPy:
    
1) Performance: NumPy arrays are more memory-efficient and faster than Python lists due to their compact storage and optimized implementation in C.

2) Vectorization: NumPy supports vectorized operations, which apply to entire arrays without explicit Python loops. Example: Adding two arrays element-wise.

3) Broadcasting: Operations can be performed on arrays of different shapes, provided the shapes are compatible. NumPy virtually stretches the smaller array to match the larger one without copying its data.
Example: Adding a scalar to an array, or adding a 1D array to each row of a 2D array.

4) Large-scale Data Handling: It allows for efficient storage and manipulation of data in memory, making it a key tool in big data analysis.

How NumPy Enhances Python's Capabilities:

1) Efficient Memory Usage
2) Faster Computations
3) Advanced Mathematical Functions (like Fourier transforms, random number generation, and more)
4) Data Manipulation and Transformation (like reshaping, slicing, and indexing on arrays)

### 2. Compare and contrast np.mean() and np.average() functions in NumPy. When would you use one over the other?

np.mean():

1) Computes the arithmetic mean of array elements.
2) Does not support weights; calculates simple mean.
3) Syntax: np.mean(array, axis=None)
4) Default Behavior: Averages all elements of the array if axis=None.
5) Use when you need the standard arithmetic mean.
6) Handling NaN: Requires np.nanmean() to ignore NaNs.
7) Example with Weights: Not applicable.

np.average()

1) Computes the weighted average of array elements.
2) Supports weights, allowing for weighted mean calculations.
3) Syntax: np.average(array, axis=None, weights=None)
4) Default Behavior: Averages all elements, or computes the weighted average if weights are provided.
5) Use when different elements should contribute differently to the average.
6) Handling NaN: Requires manual handling of NaNs; NumPy provides no NaN-ignoring counterpart.
7) Example with Weights: np.average([1, 2, 3], weights=[0.1, 0.3, 0.6]) results in 2.5.

When to Use:

np.mean(): When you need to calculate the simple arithmetic mean of an array without considering any weights.

np.average(): When you need to calculate the average with specific weights for each element, making it useful in cases where some values should have more influence on the result.
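The comparison above can be sketched in a few lines. The arrays `scores`, `weights`, and `with_nan` are made-up illustrative values, not data from the assignment.

```python
import numpy as np

scores = np.array([1.0, 2.0, 3.0])       # illustrative data
weights = np.array([0.1, 0.3, 0.6])      # illustrative weights

simple_mean = np.mean(scores)                        # (1 + 2 + 3) / 3
unweighted_avg = np.average(scores)                  # no weights: same as np.mean
weighted_avg = np.average(scores, weights=weights)   # sum(w * x) / sum(w)

# NaN handling: np.mean propagates NaN, np.nanmean skips it
with_nan = np.array([1.0, np.nan, 3.0])
mean_with_nan = np.mean(with_nan)
mean_ignoring_nan = np.nanmean(with_nan)

print(simple_mean, unweighted_avg, weighted_avg)
print(mean_with_nan, mean_ignoring_nan)
```

Without weights the two functions agree, so np.average() is only needed when weights are actually supplied.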

### 3. Describe the methods for reversing a NumPy array along different axes. Provide examples for 1D and 2D arrays

In [2]:
import numpy as np

#### 1D Array 

In [3]:
arr = np.array([1,2,3,4])

In [6]:
# using slicing
arr[::-1]

array([4, 3, 2, 1])

In [7]:
# using np.flip()
np.flip(arr)

array([4, 3, 2, 1])

#### 2D Array

In [9]:
a2d = np.array([[9,6,3],[8,5,2],[7,4,1]])

##### Reversing Each Row (Axis 1):

In [10]:
# using slicing: 

a2d[:, ::-1]

array([[3, 6, 9],
       [2, 5, 8],
       [1, 4, 7]])

In [12]:
# using np.flip()

np.flip(a2d,axis = 1)

array([[3, 6, 9],
       [2, 5, 8],
       [1, 4, 7]])

##### Reversing Each Column (Axis 0):

In [14]:
# using slicing
a2d[::-1, :]

array([[7, 4, 1],
       [8, 5, 2],
       [9, 6, 3]])

In [17]:
# using np.flip()

np.flip(a2d, axis = 0)

array([[7, 4, 1],
       [8, 5, 2],
       [9, 6, 3]])

##### Reversing Along Both Axes:

In [18]:
# using np.flip()
np.flip(a2d)

array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])

In [21]:
# using slicing
a2d[::-1,::-1]

array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])
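One detail worth knowing about both approaches: slicing with a negative step and np.flip() each return a view of the original data rather than a copy. The sketch below illustrates this with a small array; the variable names are illustrative only.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

rev_slice = arr[::-1]      # reversed via slicing
rev_flip = np.flip(arr)    # reversed via np.flip

# Both results share memory with the original array (they are views)
shares_slice = np.shares_memory(arr, rev_slice)
shares_flip = np.shares_memory(arr, rev_flip)
print(shares_slice, shares_flip)

# Writing through the view changes the original array
rev_slice[0] = 99
print(arr)

# .copy() gives an independent reversed array
independent = arr[::-1].copy()
independent[0] = -1
print(arr)
```

If the reversed array will be modified, taking a `.copy()` first avoids changing the source array by accident.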

### 4. How can you determine the data type of elements in a NumPy array? Discuss the importance of data types in memory management and performance

In [25]:
# Determining data type
a2d.dtype

dtype('int64')

Importance of Data Types in Memory Management and Performance

1) Memory Efficiency:

- Fixed Size: NumPy arrays have a fixed data type for all elements, which ensures that each element takes up a consistent amount of memory. 

- Optimized Storage: Choosing the right data type (dtype) allows you to optimize memory usage. For instance, using int8 (8-bit integer) instead of int64 (64-bit integer) can significantly reduce memory consumption when working with large datasets.

2) Data Integrity and Precision:

- Ensuring Precision: Using the correct data type ensures that calculations are performed with the appropriate level of precision. For example, using float32 versus float64 can affect the precision of floating-point operations.

- Avoiding Overflows: Selecting a data type with sufficient range (e.g., int32 vs. int8) prevents overflows in calculations, which could otherwise lead to incorrect results.

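3) Performance:

- Fast Computation: Because every element has the same fixed-size type, operations run in compiled loops over a contiguous block of memory.

- Avoiding Type Checking Overhead: The type is stored once for the whole array, so NumPy does not need to inspect the type of each element during an operation, as Python must for list items.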
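The memory and overflow points can be checked directly. This is a minimal sketch with illustrative array sizes and values; it assumes nothing beyond standard NumPy dtypes.

```python
import numpy as np

n = 1_000_000
small = np.zeros(n, dtype=np.int8)
large = np.zeros(n, dtype=np.int64)

print(small.itemsize, small.nbytes)   # bytes per element, total bytes
print(large.itemsize, large.nbytes)

# Fixed-width integers wrap around silently in array arithmetic
a = np.array([127], dtype=np.int8)
b = np.array([1], dtype=np.int8)
wrapped = a + b
print(wrapped)

# Casting to a wider type first avoids the wrap-around
safe = a.astype(np.int16) + b
print(safe, safe.dtype)

# float32 stores fewer significant digits than float64
same_value = bool(np.float32(0.1) == np.float64(0.1))
print(same_value)
```

The int8 array uses one eighth of the memory of the int64 array, at the cost of a range limited to -128..127.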

### 5. Define ndarrays in NumPy and explain their key features. How do they differ from standard Python lists?


An ndarray (N-dimensional array) is NumPy's primary data structure for storing and manipulating homogeneous data in one or more dimensions. It is essentially a grid of values, all of the same data type.

Key Features:

- Multidimensional
- Homogeneous Data
- Efficient Memory Usage
- Element-wise Operations
- Vectorized Operations
- Broadcasting
- Built-in Functions and Methods
- Memory Views and Slicing
- Interoperability

Python List:
    
1) Heterogeneous data type
2) Less memory-efficient; elements are stored as objects, leading to overhead
3) Slower for numerical operations due to interpreted nature and lack of vectorization
4) Primarily one-dimensional; nested lists are required for higher dimensions
5) Requires explicit loops or list comprehensions for element-wise operations
6) No built-in support for broadcasting
7) Limited built-in functions; relies on external libraries
8) Basic slicing; no memory views, so slicing creates new lists
9) Dynamic size; elements can be added or removed

NumPy ndarray:
    
1) Homogeneous data type
2) More memory-efficient due to contiguous memory allocation
3) Faster for numerical operations due to optimized C-based implementation and vectorization
4) Can be multi-dimensional (1D, 2D, 3D, etc.)
5) Supports direct element-wise operations without explicit loops
6) Supports broadcasting for operations between arrays of different shapes
7) Extensive set of built-in mathematical and statistical functions
8) Advanced, with support for multi-dimensional slicing and views
9) Fixed size; once created, cannot be dynamically resized
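A few of the differences listed above (slicing, homogeneity, element-wise operations) can be seen side by side. This is a small sketch with illustrative values.

```python
import numpy as np

py_list = [1, 2, 3, 4]
np_arr = np.array([1, 2, 3, 4])

list_slice = py_list[:2]   # new list (a copy)
arr_slice = np_arr[:2]     # view onto the same buffer

list_slice[0] = 100
arr_slice[0] = 100

print(py_list)   # unchanged by the write to list_slice
print(np_arr)    # first element changed through the view

# A list can hold mixed types; an ndarray coerces to one common dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)

# Element-wise arithmetic: a list needs a comprehension, an ndarray does not
doubled_list = [x * 2 for x in py_list]
doubled_arr = np_arr * 2
print(doubled_list, doubled_arr)
```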

### 6. Analyze the performance benefits of NumPy arrays over Python lists for large-scale numerical operations.


NumPy arrays offer significant performance benefits over Python lists, particularly for large-scale numerical operations. These benefits include:

- Efficient memory usage due to contiguous storage and homogeneous data types.

- Faster computations through vectorization and compiled C code.
- Advanced features like broadcasting and efficient slicing, which simplify operations and enhance performance.

In contrast, Python lists are less efficient for numerical tasks because each element is a separate Python object, operations run in the interpreter one element at a time, and there is no built-in support for array arithmetic.
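One way to observe the speed difference is a rough timing comparison with the standard `timeit` module. The sketch below is illustrative only: the array size, the repeat count, and the helper function names are arbitrary choices, and the measured times depend on the machine.

```python
import timeit
import numpy as np

n = 100_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)   # explicit dtype so the squares cannot overflow

def list_sum_of_squares():
    # Pure-Python loop over individual int objects
    return sum(x * x for x in py_list)

def numpy_sum_of_squares():
    # Vectorized multiply and sum in compiled code
    return int(np.sum(np_arr * np_arr))

list_time = timeit.timeit(list_sum_of_squares, number=20)
numpy_time = timeit.timeit(numpy_sum_of_squares, number=20)

print(f"list:  {list_time:.4f} s")
print(f"numpy: {numpy_time:.4f} s")
```

Both functions return the same value; only the time taken differs.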

### 7. Compare vstack() and hstack() functions in NumPy. Provide examples demonstrating their usage and output.

np.vstack():
    
1) Stacks arrays vertically (along rows)
2) axis = 0 (1D inputs are first treated as single rows)
3) The number of columns (second dimension) must be the same
4) Useful for combining arrays into a larger array along the row axis

np.hstack():

1) Stacks arrays horizontally (along columns)
2) axis = 1 for 2D inputs (1D inputs are joined end to end along axis 0)
3) For 2D inputs, the number of rows (first dimension) must be the same
4) Useful for combining arrays into a larger array along the column axis

In [1]:
import numpy as np

In [2]:
arr1 = np.array([[1, 2, 3],
                 [4, 5, 6]])

arr2 = np.array([[7, 8, 9],
                 [10, 11, 12]])
res = np.vstack((arr1, arr2)) # stack arrays vertically
print(res)

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


In [6]:
arr1 = np.array([[1, 2, 3],
                 [4, 5, 6]])

arr2 = np.array([[7, 8, 9],
                 [10, 11, 12]])
res = np.hstack((arr1, arr2)) # stack arrays horizontally
print(res)

[[ 1  2  3  7  8  9]
 [ 4  5  6 10 11 12]]
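The examples above use 2D inputs. As a supplementary sketch, the two functions behave differently on 1D inputs, and for 2D inputs they match np.concatenate() with the corresponding axis. The arrays below are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

v = np.vstack((x, y))   # each 1D array becomes a row
h = np.hstack((x, y))   # 1D arrays are joined end to end

print(v.shape)
print(h.shape)

# For 2D inputs, the same results come from np.concatenate
m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5, 6], [7, 8]])
same_v = np.array_equal(np.vstack((m1, m2)), np.concatenate((m1, m2), axis=0))
same_h = np.array_equal(np.hstack((m1, m2)), np.concatenate((m1, m2), axis=1))
print(same_v, same_h)
```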


### 8. Explain the differences between fliplr() and flipud() methods in NumPy, including their effects on various array dimensions.

fliplr():

- Reverses the order of elements along the horizontal axis (left-right).
- axis = 1
- Useful for flipping the columns of a 2D array; the input must have at least two dimensions.

flipud():

- Reverses the order of elements along the vertical axis (up-down).
- axis = 0
- Useful for flipping the rows of a 2D array; it also works on a 1D array, where it reverses the elements.

In [7]:
# fliplr()

arr = np.array([[7, 8, 9],
                 [10, 11, 12]])
res = np.fliplr(arr)
print(res)

[[ 9  8  7]
 [12 11 10]]


In [8]:
# flipud()

arr = np.array([[7, 8, 9],
                 [10, 11, 12]])
res = np.flipud(arr)
print(res)

[[10 11 12]
 [ 7  8  9]]
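To cover other array dimensions, the sketch below tries both functions on a 1D array and a 3D array. The arrays are illustrative; the flag `fliplr_failed` is a name introduced here only to record the outcome.

```python
import numpy as np

one_d = np.array([1, 2, 3])

# flipud reverses axis 0, so it works on 1D input
flipped_1d = np.flipud(one_d)
print(flipped_1d)

# fliplr reverses axis 1, which a 1D array does not have
try:
    np.fliplr(one_d)
    fliplr_failed = False
except ValueError as exc:
    fliplr_failed = True
    print("fliplr on 1D input:", exc)

# On a 3D array, only the targeted axis is reversed
cube = np.arange(8).reshape(2, 2, 2)
lr_matches = np.array_equal(np.fliplr(cube), np.flip(cube, axis=1))
ud_matches = np.array_equal(np.flipud(cube), np.flip(cube, axis=0))
print(lr_matches, ud_matches)
```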


### 9. Discuss the functionality of the array_split() method in NumPy. How does it handle uneven splits?


Functionality of array_split()

Purpose: To divide an array into multiple sub-arrays along a specified axis. Unlike split(), which requires that the array be evenly divisible, array_split() can handle uneven splits.

Parameters:

- ary: The input array to be split.
- indices_or_sections: Defines how to split the array. This can be:
  - An integer N, which specifies the number of sections. If the array length is not divisible by N, the first (length % N) sections each get one extra element.
  - A 1D array of sorted indices, which specifies where to split the array.
- axis: The axis along which to split the array (default is 0).

In [12]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
result = np.array_split(arr, 3) # Split into 3 parts
for sub_array in result:
    print(sub_array)

[1 2 3]
[4 5 6]
[7 8 9]


In [11]:
arr2D = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9],
                  [10, 11, 12]])

result2D = np.array_split(arr2D, 2, axis=0) # Split into 2 sections along axis 0
for sub_array in result2D:
    print(sub_array)

[[1 2 3]
 [4 5 6]]
[[ 7  8  9]
 [10 11 12]]


In [13]:
# Handling Uneven Splits

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
result = np.array_split(arr, 4) # Split into 4 parts
for sub_array in result:
    print(sub_array)

[1 2 3]
[4 5]
[6 7]
[8 9]
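For contrast, the sketch below shows np.split() rejecting the same uneven split, and array_split() being given split indices instead of a section count. The flag `split_failed` is a name introduced here for illustration.

```python
import numpy as np

arr = np.arange(1, 10)   # 9 elements: 1..9

# np.split insists on equal-sized parts
try:
    np.split(arr, 4)
    split_failed = False
except ValueError:
    split_failed = True

# array_split: with 9 elements and 4 sections, the first 9 % 4 = 1
# sub-array gets one extra element
parts = np.array_split(arr, 4)
sizes = [len(p) for p in parts]
print(split_failed, sizes)

# Passing indices instead of a count splits before positions 2 and 5
by_index = np.array_split(arr, [2, 5])
for sub_array in by_index:
    print(sub_array)
```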


### 10. Explain the concepts of vectorization and broadcasting in NumPy. How do they contribute to efficient array operations?

Vectorization refers to the process of performing operations on entire arrays (or large chunks of data) at once, rather than using loops to process individual elements.

In [3]:
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Vectorized addition
c = a + b  
c

array([5, 7, 9])

Broadcasting is a mechanism that allows NumPy to perform operations on arrays of different shapes. Shapes are compared dimension by dimension, starting from the last; two dimensions are compatible when they are equal or one of them is 1. When the shapes are compatible, NumPy automatically "broadcasts" the smaller array across the larger one without copying its data; otherwise it raises a ValueError.

In [5]:
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([10, 20, 30])

c = a + b
c

array([[11, 22, 33],
       [14, 25, 36]])

Efficiency of Vectorization and Broadcasting: 

- Memory Efficiency: Vectorization minimizes the overhead of Python loops, while broadcasting avoids creating unnecessarily large temporary arrays.

- Speed: Both techniques allow for better utilization of CPU cache and vectorized instruction sets, leading to faster execution.

- Less Code: These techniques often result in shorter and more intuitive code, making it easier to write and maintain.
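Broadcasting only works when the shapes are compatible. The sketch below shows one compatible pair, one incompatible pair, and a common fix with np.newaxis. All arrays are illustrative.

```python
import numpy as np

col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3, 4])        # shape (4,)

# Shapes are compared from the right: (3, 1) with (4,) gives (3, 4)
table = col + row
print(table.shape)
print(table)

# Shapes (2, 3) and (2,) do not line up: 3 vs 2 in the last dimension
a = np.ones((2, 3))
b = np.array([1.0, 2.0])
try:
    a + b
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# Adding an axis turns b into a (2, 1) column that broadcasts across columns
fixed = a + b[:, np.newaxis]
print(fixed)
```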

## Practical Questions

### 1. Create a 3x3 NumPy array with random integers between 1 and 100. Then, interchange its rows and columns.


In [14]:
a = np.random.randint(1,101,size = (3,3))
trans_a = a.T

In [15]:
a

array([[77, 60, 88],
       [36,  5,  4],
       [47, 37, 64]])

In [17]:
trans_a # interchanged rows and columns

array([[77, 36, 47],
       [60,  5, 37],
       [88,  4, 64]])

### 2. Generate a 1D NumPy array with 10 elements. Reshape it into a 2x5 array, then into a 5x2 array.


In [20]:
a = np.arange(10)
a25 = a.reshape(2,5)
a52 = a25.reshape(5,2)

In [21]:
a25

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [22]:
a52

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
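As a side note, reshape() accepts -1 for one dimension and infers its size, and the result usually shares data with the original array. A small sketch of both behaviors, using the same 10-element array:

```python
import numpy as np

a = np.arange(10)

a25 = a.reshape(2, -1)    # -1 lets NumPy infer 5 columns
a52 = a.reshape(-1, 2)    # -1 lets NumPy infer 5 rows
print(a25.shape, a52.shape)

# Here reshape returns a view, so the data is shared with `a`
a25[0, 0] = 42
print(a[0])

# A shape whose size does not match the 10 elements raises ValueError
try:
    a.reshape(3, 4)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)
```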

### 3. Create a 4x4 NumPy array with random float values. Add a border of zeros around it, resulting in a 6x6 array.


In [25]:
a = np.random.rand(4,4)
a6 = np.pad(a,pad_width = 1 , mode = 'constant' , constant_values = 0)
a6

array([[0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        ],
       [0.        , 0.68367841, 0.09355227, 0.26710421, 0.92840138,
        0.        ],
       [0.        , 0.24830143, 0.76209981, 0.38764299, 0.04969862,
        0.        ],
       [0.        , 0.36780692, 0.3863775 , 0.80094147, 0.40129781,
        0.        ],
       [0.        , 0.16402754, 0.58759995, 0.18934137, 0.4081832 ,
        0.        ],
       [0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        ]])

### 4. Using NumPy, create an array of integers from 10 to 60 with a step of 5.


In [27]:
a = np.arange(10,61,5)
a

array([10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60])

### 5. Create a NumPy array of strings ['python', 'numpy', 'pandas']. Apply different case transformations (uppercase, lowercase, title case, etc.) to each element

In [36]:
a = np.array(['python', 'numpy', 'pandas'])
ua = np.char.upper(a)
la = np.char.lower(a)
ta = np.char.title(a)
ca = np.char.capitalize(a)

ua,la,ta,ca

(array(['PYTHON', 'NUMPY', 'PANDAS'], dtype='<U6'),
 array(['python', 'numpy', 'pandas'], dtype='<U6'),
 array(['Python', 'Numpy', 'Pandas'], dtype='<U6'),
 array(['Python', 'Numpy', 'Pandas'], dtype='<U6'))

### 6. Generate a NumPy array of words. Insert a space between each character of every word in the array.


In [40]:
w = np.array(['python','numpy','pandas'])
sw = np.char.join(' ',w)
sw

array(['p y t h o n', 'n u m p y', 'p a n d a s'], dtype='<U11')

### 7. Create two 2D NumPy arrays and perform element-wise addition, subtraction, multiplication, and division.


In [49]:
a1 = np.array([[5,4],[9,8]])
a2 = np.array([[3,2],[6,1]])

add = np.add(a1,a2)
sub = np.subtract(a1,a2)
mul = np.multiply(a1,a2)
div = np.divide(a1,a2)

add,sub,mul,div

(array([[ 8,  6],
        [15,  9]]),
 array([[2, 2],
        [3, 7]]),
 array([[15,  8],
        [54,  8]]),
 array([[1.66666667, 2.        ],
        [1.5       , 8.        ]]))

### 8. Use NumPy to create a 5x5 identity matrix, then extract its diagonal elements.


In [53]:
import numpy as np

identity_matrix = np.eye(5)
diagonal_elements = np.diag(identity_matrix)

identity_matrix, diagonal_elements

(array([[1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 1.]]),
 array([1., 1., 1., 1., 1.]))

### 9. Generate a NumPy array of 100 random integers between 0 and 1000. Find and display all prime numbers in this array.

In [55]:
def is_prime(n):
    if n <= 1:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    for i in range(3,int(np.sqrt(n))+1 , 2):
        if n % i == 0:
            return False
    return True

ra = np.random.randint(0,1001, size = 100)
pn = np.array([num for num in ra if is_prime(num)])

pn

array([349, 349, 461, 409, 101, 859, 547, 547, 373, 223, 443, 179])
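An alternative to testing each number separately is to precompute a lookup table with a Sieve of Eratosthenes and index into it. This is a sketch of a different technique from the one used above; the helper `primes_mask_up_to` is a hypothetical name, and the seeded generator means the random values differ from the run shown earlier.

```python
import numpy as np

def primes_mask_up_to(limit):
    """Boolean array where mask[k] is True when k is prime (hypothetical helper)."""
    mask = np.ones(limit + 1, dtype=bool)
    mask[:2] = False                      # 0 and 1 are not prime
    for p in range(2, int(limit ** 0.5) + 1):
        if mask[p]:
            mask[p * p :: p] = False      # cross out multiples of p
    return mask

rng = np.random.default_rng(0)            # seeded so the run is repeatable
values = rng.integers(0, 1001, size=100)  # 100 integers in [0, 1000]

is_prime_lookup = primes_mask_up_to(1000)
primes_found = values[is_prime_lookup[values]]   # boolean-mask selection
print(primes_found)
```

The sieve is built once for the whole value range, so checking each of the 100 numbers becomes a single array lookup.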

### 10. Create a NumPy array representing daily temperatures for a month. Calculate and display the weekly averages

In [57]:
np.random.seed(0)   # For reproducibility
daily = np.random.randint(0,54,size = 28)   # 28 days = 4 full weeks; values in [0, 53]
week = daily.reshape(4,7)
w_avg = np.mean(week,axis = 1)
w_avg

array([27.        , 23.42857143, 23.        , 24.28571429])
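The solution above uses 28 days so the data reshapes cleanly into 4 weeks. For a 30-day month, one possible approach is to average the complete weeks and handle the leftover days separately. This is a sketch under that assumption; the variable names and the seeded generator are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)           # seeded for repeatability
daily = rng.integers(0, 54, size=30)     # 30 illustrative daily readings

full_weeks = len(daily) // 7             # 4 complete weeks
weekly_avg = daily[: full_weeks * 7].reshape(full_weeks, 7).mean(axis=1)

leftover = daily[full_weeks * 7 :]       # the 2 remaining days
leftover_avg = leftover.mean() if leftover.size else None

print(weekly_avg)
print(leftover_avg)
```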