## Theoretical Questions


**Q1. Explain the purpose and advantages of NumPy in scientific computing and data analysis. How does it enhance Python's capabilities for numerical operations?**

**Sol.** NumPy (Numerical Python) is a fundamental library for scientific computing and data analysis in Python. It serves the following purposes:

1. **Efficient Array Handling:** NumPy provides the ndarray (N-dimensional array) object, which is a powerful data structure for storing and manipulating large datasets efficiently.

2. **Numerical Operations:** It offers a comprehensive suite of mathematical functions for performing a wide range of operations, including arithmetic, linear algebra, and statistical analysis.

3. **Interoperability:** NumPy serves as the foundation for many other scientific libraries (like SciPy, Pandas, and Matplotlib), facilitating seamless integration and collaboration within the Python ecosystem.

**Advantages of NumPy**

1. **Speed:** NumPy's array operations are implemented in C, which makes them significantly faster than equivalent loops over Python lists, especially for large datasets.

*For example*

Suppose we have two lists, lst1 and lst2, each with 1 crore (10 million) elements, and we add them element by element into a third list. We then perform the same addition with NumPy arrays. The execution times are:


In [None]:
import numpy as np
import time

def list_time():
  """Function adds two lists containing 1 crore elements each.
     Returns : time of operation
  """
  lst1=[i for i in range(10000000)]
  lst2=[i for i in range(10000000,20000000)]

  lst=[]

  start1=time.time()
  for i in range(len(lst1)):
    lst.append(lst1[i]+lst2[i])
  end1=time.time()
  return end1-start1

def array_time():
  """Function adds two arrays containing 1 crore elements each.
     Returns: time of operation of addition
  """
  a=np.arange(10000000)
  b=np.arange(10000000,20000000)

  start2=time.time()
  c=a+b
  end2=time.time()
  return end2-start2

print("Time of operation for list:",list_time())
print("Time of operation for Numpy:", array_time())

Time of operation for list: 1.6052958965301514
Time of operation for Numpy: 0.03467273712158203


2. **Memory Efficiency:** NumPy arrays use less memory than Python lists because they store fixed-size values of a single type in one contiguous block, whereas a list stores pointers to separately allocated Python objects.

*For example:*

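In this example we can clearly see that lists occupy more space than arrays. Note that sys.getsizeof() on a list counts only the list's internal pointer array, not the integer objects it refers to, so the real gap is larger than these figures suggest.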

In [None]:

import sys
import numpy as np

list1=[i for i in range(10000000)]
array1=np.arange(10000000)

# the default integer dtype here is 64-bit, but we can request 32-bit instead
array2=np.arange(10000000,dtype=np.int32)

print("Size of list:",sys.getsizeof(list1))
print("Size of array1:",sys.getsizeof(array1))
print("Size of array2:",sys.getsizeof(array2))
print("Difference:",sys.getsizeof(list1)-sys.getsizeof(array2))

Size of list: 89095160
Size of array1: 80000112
Size of array2: 40000112
Difference: 49095048
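
To make the caveat about `sys.getsizeof()` concrete, here is a rough sketch that also counts the integer objects a list points to and compares the total with the array's `nbytes`. The size `n` is a smaller, hypothetical value chosen only to keep the sketch quick, and the exact byte counts depend on the Python build.

```python
import sys
import numpy as np

n = 100_000  # assumed size, smaller than the example above to keep this fast

values = list(range(n))
arr = np.arange(n, dtype=np.int64)

# Size of the list object itself (its internal array of pointers)
list_container = sys.getsizeof(values)
# Rough total: add the size of every int object the list refers to
list_total = list_container + sum(sys.getsizeof(v) for v in values)

# nbytes is the size of the array's data buffer: n elements * 8 bytes each
array_data = arr.nbytes

print("List container only:", list_container)
print("List plus its int objects:", list_total)
print("Array data buffer:", array_data)
```

The estimate for the list is approximate (CPython shares some small integer objects), but it shows why the container size alone understates what a list really costs.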


3. **Broad Functionality:**

  NumPy includes a vast array of built-in mathematical and statistical functions, making it easy to perform complex calculations without needing to implement algorithms from scratch.

4. **Broadcasting:**

  This feature allows NumPy to perform operations on arrays of different shapes and sizes, automatically expanding the smaller array to match the dimensions of the larger one, which simplifies coding.

5. **Rich Ecosystem:**

  Many scientific computing and data analysis libraries build on NumPy, creating a rich ecosystem that enhances Python's capabilities in data science, machine learning, and artificial intelligence.

**Q2. Compare and contrast np.mean() and np.average() functions in NumPy. When would you use one over the other?**

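**Sol.** Both functions compute the arithmetic mean along an axis. np.mean() treats every element equally, while np.average() accepts an optional weights argument; without weights, the two return the same value.

**Use np.mean():**

1. When we need a straightforward average without considering weights.
2. When we want the simplest, most direct call for unweighted data, including large datasets.

**Use np.average():**

1. When we need to compute a weighted average, where elements contribute differently to the final result.
2. When we are working in scenarios such as grading systems or statistical models where different values have different significance.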

In [None]:
import numpy as np

data = np.array([1, 2, 3, 4, 5])

# Using np.mean()
mean_value = np.mean(data)

# Using np.average() with weights
weights = np.array([1, 1, 2, 2, 3])
weighted_avg = np.average(data, weights=weights)

print("Mean:", mean_value)
print("Weighted Average:", weighted_avg)


Mean: 3.0
Weighted Average: 3.5555555555555554
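
As a small check of the relationship between the two functions, the sketch below reuses the same `data` and `weights` and recomputes the weighted average by hand as sum(w * x) / sum(w). The variable names are illustrative only.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])
weights = np.array([1, 1, 2, 2, 3])

# Without weights, np.average() gives the same result as np.mean()
plain_avg = np.average(data)

# The weighted average is sum(w * x) / sum(w)
manual_weighted = (data * weights).sum() / weights.sum()

# returned=True also hands back the sum of the weights
weighted_avg, weight_sum = np.average(data, weights=weights, returned=True)

print("Unweighted average equals mean:", plain_avg == np.mean(data))
print("Manual formula matches:", np.isclose(manual_weighted, weighted_avg))
print("Sum of weights:", weight_sum)
```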


**Q3. Describe the methods for reversing a NumPy array along different axes. Provide examples for 1D and 2D arrays.**

**Sol.** Arrays can be reversed by using *slicing* or *np.flip()*.



**Syntax for slicing:**
```
        array_name[a:b:c,x:y:z]
```

a = start index for rows

b = stop index for rows (exclusive)

c = step value for rows (-1 reverses the order)

x = start index for columns

y = stop index for columns (exclusive)

z = step value for columns (-1 reverses the order)


**Syntax for flip():**
```
        np.flip(array_name, axis=0/1)    
```

axis = 0 : reverses the order of the rows (flips the array vertically)

axis = 1 : reverses the order of the columns (flips the array horizontally)





In [None]:
import numpy as np

a=np.random.randint(1,10,5)                         #1D array
b=np.random.randint(1,10,12).reshape(3,4)           #2D array

print("Original_A:",a)
print("Original_B:")
print(b)
#Using slicing

rev_a=a[::-1]
rev_b=b[::-1,:]
rev_b1=b[:,::-1]

print("Reversed A:",rev_a)
print("Reversed b along rows:")
print(rev_b)
print("Reversed b along columns:")
print(rev_b1)
#Using flip()

flip_a=np.flip(a)
flip_b=np.flip(b, axis=0)       #along rows
flip_b1=np.flip(b, axis=1)      #along columns

print("Flipped_A:",flip_a)
print("Flipped_B:")
print(flip_b)
print("Flipped_B1:")
print(flip_b1)

Original_A: [9 9 9 1 6]
Original_B:
[[6 3 7 5]
 [8 8 2 7]
 [5 4 1 9]]
Reversed A: [6 1 9 9 9]
Reversed b along rows:
[[5 4 1 9]
 [8 8 2 7]
 [6 3 7 5]]
Reversed b along columns:
[[5 7 3 6]
 [7 2 8 8]
 [9 1 4 5]]
Flipped_A: [6 1 9 9 9]
Flipped_B:
[[5 4 1 9]
 [8 8 2 7]
 [6 3 7 5]]
Flipped_B1:
[[5 7 3 6]
 [7 2 8 8]
 [9 1 4 5]]
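
Two further details may be worth seeing in code: calling `np.flip()` without an axis reverses every axis at once, and a reversed slice is a view that shares memory with the original array. This is a minimal sketch on a fixed array so that the result is reproducible.

```python
import numpy as np

b = np.arange(12).reshape(3, 4)

# With no axis argument, np.flip() reverses every axis at once
flipped_all = np.flip(b)
sliced_all = b[::-1, ::-1]

# Reversed slices are views: they share memory with the original array
rev_rows = b[::-1, :]
shares = np.shares_memory(b, rev_rows)

print(flipped_all)
print("Same as slicing both axes:", np.array_equal(flipped_all, sliced_all))
print("Slice shares memory with original:", shares)
```

Because the slice is a view, writing into `rev_rows` would also change `b`; call `.copy()` when an independent array is needed.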


**Q4. How can you determine the data type of elements in a NumPy array? Discuss the importance of data types in memory management and performance.**

**Sol.** The data type of an array's elements can be determined with the **dtype** attribute.

In [None]:
#use of dtype attribute

import numpy as np
a=np.arange(5)
print(a.dtype)

int64


 The choice of data types in NumPy is vital for efficient memory management and optimal performance. Properly selecting data types not only minimizes memory usage but also enhances the speed of computations through better alignment with the underlying hardware and NumPy’s optimized functions.

 **1. Memory Management**
1. **Efficient Storage:** Different data types have varying sizes in memory. For example, an int32 takes up 4 bytes, while a float64 takes up 8 bytes. Choosing the appropriate data type helps minimize memory usage, especially when working with large datasets.

2. **Contiguous Memory Allocation:** NumPy arrays are stored in contiguous blocks of memory. The data type fixes the size of each element, so smaller types pack more values into the same amount of memory, which can improve cache performance.

3. **Type Consistency:** NumPy arrays are homogeneous, meaning all elements must be of the same type. This consistency allows for more efficient memory management compared to Python lists, which can store mixed types but incur overhead due to type checks and storage management.

**2. Performance**
1. **Speed of Operations:** Operations on NumPy arrays are generally faster than on Python lists because NumPy is implemented in C. Choosing an appropriate data type can further enhance performance since operations can be optimized for specific types (e.g., integer vs. floating-point operations).

2. **Vectorization:** NumPy allows for vectorized operations, which means computations are performed on entire arrays rather than element by element. The performance benefits of vectorization are amplified when the data type is suitable for the operation being performed. For example, using float32 can lead to faster computations than using float64 in certain contexts due to reduced data size.

3. **Broadcasting:** NumPy’s broadcasting mechanism allows for operations on arrays of different shapes. This feature is most efficient when the data types are compatible, avoiding unnecessary type conversions and maintaining performance.
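
The effect of the data type on storage can be inspected through the `itemsize` and `nbytes` attributes. The sketch below builds the same 1000-element array with several dtypes; the `sizes` dictionary is just an illustrative holder for the results.

```python
import numpy as np

sizes = {}
for dtype in (np.int8, np.int32, np.int64, np.float32, np.float64):
    arr = np.ones(1000, dtype=dtype)
    # itemsize is bytes per element; nbytes is bytes for the whole data buffer
    sizes[arr.dtype.name] = (arr.itemsize, arr.nbytes)
    print(arr.dtype.name, "itemsize:", arr.itemsize, "nbytes:", arr.nbytes)
```

Smaller types save memory but hold a narrower range of values, so the choice should follow the data.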

**Q5. Define ndarrays in NumPy and explain their key features. How do they differ from standard Python lists?**

**Sol.** An ndarray is a (usually fixed-size) multidimensional container of items of the same type and size. The number of dimensions and items in an array is defined by its shape, which is a tuple of N non-negative integers that specify the size of each dimension.


**Key Features of ndarrays**

1. **Homogeneous Data Type:** All elements in an ndarray are of the same data type, which allows for optimized performance and memory usage. This is unlike Python lists, which can hold elements of different types.

2. **Multidimensional:** ndarrays can be one-dimensional, two-dimensional (like matrices), or n-dimensional. This flexibility makes them suitable for a variety of scientific computing tasks.

3. **Efficient Storage:** NumPy uses contiguous blocks of memory for ndarrays, leading to more efficient data access and manipulation than Python lists, which store pointers to separately allocated objects.

4. **Vectorized Operations:** NumPy supports element-wise operations, allowing you to perform mathematical operations on entire arrays without the need for explicit loops. This leads to cleaner code and faster execution.

5. **Broadcasting:** This feature allows NumPy to perform operations on arrays of different shapes by automatically expanding the smaller array's dimensions to match the larger array.

6. **Comprehensive Functionality:** NumPy provides a wide range of mathematical functions and tools for linear algebra, Fourier transforms, and random number generation, making it a comprehensive library for numerical computing.

7. **Slicing and Indexing:** NumPy supports advanced slicing and indexing techniques, allowing for more powerful and flexible data manipulation than standard Python lists.

**Differences from Standard Python Lists**
1. **Data Type Restriction:** Python lists can hold mixed types (e.g., integers, strings, objects), while ndarrays require all elements to be of the same type.

2. **Performance:** Operations on ndarrays are generally faster due to optimized C implementations, while Python lists can be slower because they involve more overhead.

3. **Memory Efficiency:** ndarrays use less memory for large data sets because they store raw values, while a list stores a pointer per element plus a full Python object for each value.

4. **Functionality:** NumPy provides many built-in functions for mathematical operations, linear algebra, and statistical analysis, which are not available for standard lists without using additional libraries.

5. **Dimensionality:** While lists can be nested to create multidimensional structures, these are not as efficient or easy to manipulate as true multidimensional ndarrays.
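
A few of these differences show up directly in how the same expression behaves on each type. The following is a minimal sketch with made-up values.

```python
import numpy as np

numbers = [1, 2, 3]
arr = np.array(numbers)

# The * operator repeats a list but multiplies an array element-wise
list_times_two = numbers * 2
array_times_two = arr * 2

# Mixed numeric input is converted to one common dtype
mixed = np.array([1, 2.5, 3])

print(list_times_two)    # the list repeated twice
print(array_times_two)   # each element doubled
print(mixed.dtype)       # a single floating-point dtype for all elements
```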

**Q6. Analyze the performance benefits of NumPy arrays over Python lists for large-scale numerical operations.**

**Sol.** Suppose we have two lists, lst1 and lst2, each with 1 crore (10 million) elements, and we add them element by element into a third list. We then perform the same addition with NumPy arrays. The execution times are:

In [None]:
import numpy as np
import time

def list_time():
  """Function adds two lists containing 1 crore elements each.
     Returns : time of operation
  """
  lst1=[i for i in range(10000000)]
  lst2=[i for i in range(10000000,20000000)]

  lst=[]

  start1=time.time()
  for i in range(len(lst1)):
    lst.append(lst1[i]+lst2[i])
  end1=time.time()
  return end1-start1

def array_time():
  """Function adds two arrays containing 1 crore elements each.
     Returns: time of operation of addition
  """
  a=np.arange(10000000)
  b=np.arange(10000000,20000000)

  start2=time.time()
  c=a+b
  end2=time.time()
  return end2-start2

print("Time of operation for list:",list_time())
print("Time of operation for Numpy:", array_time())

Time of operation for list: 1.6491460800170898
Time of operation for Numpy: 0.03323554992675781


**Q7. Compare vstack() and hstack() functions in NumPy. Provide examples demonstrating their usage and output.**

**Sol.** Stacking is used to **concatenate** or **stack** two or more arrays together. The shapes of the arrays must be compatible: for 2D arrays, hstack needs the same number of rows and vstack needs the same number of columns. For example, if arr1 is 2x4 and arr2 is 2x8, we use hstack; if arr1 is 2x4 and arr2 is 6x4, we use vstack.

Stacking is of two types:
1. **Horizontal Stack** : Stacks arrays in horizontal sequence (column-wise)
2. **Vertical Stack** : Stacks arrays in vertical sequence (row-wise)

In [None]:
import numpy as np

a=np.arange(9).reshape(3,3)
b=np.arange(20,38).reshape(3,6)
c=np.arange(24).reshape(4,6)
x=np.hstack((a,b))
y=np.vstack((b,c))

print("Matrix A:")
print(a)
print("---------------------------------------")
print("Matrix B:")
print(b)
print("---------------------------------------")
print("Matrix C:")
print(c)
print("---------------------------------------")
print("Hstack of A and B:")
print(x)
print("---------------------------------------")
print("Vstack of B and C:")
print(y)

Matrix A:
[[0 1 2]
 [3 4 5]
 [6 7 8]]
---------------------------------------
Matrix B:
[[20 21 22 23 24 25]
 [26 27 28 29 30 31]
 [32 33 34 35 36 37]]
---------------------------------------
Matrix C:
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]
---------------------------------------
Hstack of A and B:
[[ 0  1  2 20 21 22 23 24 25]
 [ 3  4  5 26 27 28 29 30 31]
 [ 6  7  8 32 33 34 35 36 37]]
---------------------------------------
Vstack of B and C:
[[20 21 22 23 24 25]
 [26 27 28 29 30 31]
 [32 33 34 35 36 37]
 [ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]
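
For 2D inputs like these, both functions behave like `np.concatenate` along a fixed axis, and a shape mismatch raises a `ValueError`. The sketch below reuses the same three arrays to illustrate both points.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
b = np.arange(20, 38).reshape(3, 6)
c = np.arange(24).reshape(4, 6)

# For 2D arrays, hstack joins along axis 1 and vstack joins along axis 0
same_h = np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1))
same_v = np.array_equal(np.vstack((b, c)), np.concatenate((b, c), axis=0))

# a has 3 columns and c has 6, so stacking them vertically fails
try:
    np.vstack((a, c))
    vstack_error = None
except ValueError as exc:
    vstack_error = exc

print("hstack matches concatenate(axis=1):", same_h)
print("vstack matches concatenate(axis=0):", same_v)
print("vstack((a, c)) raised:", type(vstack_error).__name__)
```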


**Q8. Explain the differences between fliplr() and flipud() methods in NumPy, including their effects on various array dimensions.**

**Sol.**

**Orientation of Flip:**

*fliplr()*: Flips the array horizontally, reversing columns.

*flipud()*: Flips the array vertically, reversing rows.

**Effect on Dimensions:**

**2D Arrays:**

*fliplr():* Affects the columns of each row.

*flipud():* Affects the rows themselves.

**1D Arrays:**
Only *flipud()* works on a 1D array, where it simply reverses the elements. *fliplr()* requires at least two dimensions and raises a ValueError for 1D input.

**Higher Dimensions:**

For arrays with more than two dimensions, *flipud()* always reverses axis 0 and *fliplr()* always reverses axis 1, leaving the other dimensions unchanged.

In [None]:
import numpy as np

a=np.arange(9).reshape(3,3)

print(a)

print("fliplr")
print(np.fliplr(a))

print("flipud")
print(np.flipud(a))

[[0 1 2]
 [3 4 5]
 [6 7 8]]
fliplr
[[2 1 0]
 [5 4 3]
 [8 7 6]]
flipud
[[6 7 8]
 [3 4 5]
 [0 1 2]]
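
The behaviour on 1D and 3D inputs can be checked with a short sketch. The arrays `v` and `t` are made-up examples.

```python
import numpy as np

v = np.arange(5)                    # 1D array
t = np.arange(24).reshape(2, 3, 4)  # 3D array

# flipud accepts 1D input and reverses it
flipped_v = np.flipud(v)

# fliplr needs at least two dimensions
try:
    np.fliplr(v)
    fliplr_error = None
except ValueError as exc:
    fliplr_error = exc

# On higher-dimensional input, flipud acts on axis 0 and fliplr on axis 1
ud_matches = np.array_equal(np.flipud(t), np.flip(t, axis=0))
lr_matches = np.array_equal(np.fliplr(t), np.flip(t, axis=1))

print(flipped_v)
print("fliplr on 1D raised:", type(fliplr_error).__name__)
print(ud_matches, lr_matches)
```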


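**Q9. Discuss the functionality of the array_split() method in NumPy. How does it handle uneven splits?**

**Sol.** Splitting breaks an array into several sub-arrays along an axis. Unlike np.split(), which requires the array to divide into equal parts, np.array_split() also accepts a number of sections that does not divide the axis length evenly.

```
        np.array_split(array_name , number of sections, axis)
```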

**Handling Uneven Splits:**

When the size of the array is not evenly divisible by the number of desired splits, array_split() distributes the remaining elements across the resulting sub-arrays. Specifically:

The first few sub-arrays will contain one more element than the others until all elements have been distributed.
For example, if you split an array of size 20 into 3 parts, array_split() will return two sub-arrays of size 7 followed by one of size 6.



In [None]:
import numpy as np

a = np.arange(20).reshape((5,4))

b = np.array_split(a, 3, axis=0)

print("Original 2D Array:")
print(a)
print("Split into 3 Parts (Rows):")
for i, sub_array in enumerate(b):
    print(f"\n Sub-array {i}:\n {sub_array}")


Original 2D Array:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]]
Split into 3 Parts (Rows):

 Sub-array 0:
 [[0 1 2 3]
 [4 5 6 7]]

 Sub-array 1:
 [[ 8  9 10 11]
 [12 13 14 15]]

 Sub-array 2:
 [[16 17 18 19]]
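
The size-20 case described earlier can be verified directly, along with the contrast to `np.split()`, which refuses an uneven division. This is a minimal sketch on a 1D array.

```python
import numpy as np

a = np.arange(20)

# 20 elements into 3 parts: the leading parts get the extra elements
parts = np.array_split(a, 3)
sizes = [len(p) for p in parts]

# np.split() requires an equal division and raises otherwise
try:
    np.split(a, 3)
    split_error = None
except ValueError as exc:
    split_error = exc

print("Sub-array sizes:", sizes)
print("np.split raised:", type(split_error).__name__)
```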


**Q10. Explain the concepts of vectorization and broadcasting in NumPy. How do they contribute to efficient array operations?**

**Sol.** **Vectorization** means expressing a computation as operations on whole arrays instead of explicit Python for loops. The looping then happens inside NumPy's compiled C code, which makes the program considerably faster.

**Broadcasting** lets NumPy perform arithmetic on arrays of different shapes. It provides a means of vectorizing such operations.

**How broadcasting is efficient:**

Broadcasting eliminates the need for Python loops, so the element-wise work runs in NumPy's compiled C code rather than in the interpreter.

It does this without making needless copies of data, which usually leads to efficient algorithm implementations. There are, however, cases where broadcasting is a bad idea because the broadcast result is very large, leading to inefficient use of memory that slows down the computation.

The resulting array returned after broadcasting will have the same number of dimensions as the array with the greatest number of dimensions.
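
As an illustration of both ideas, the sketch below adds a (3, 1) column to a length-4 row. Broadcasting produces a (3, 4) result in one vectorized expression, and the nested loops show the equivalent element-by-element work. The arrays are made-up examples.

```python
import numpy as np

column = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3, 4])           # shape (4,)

# Broadcasting stretches both operands to the common shape (3, 4)
table = column + row

# The same result written with explicit Python loops
looped = np.empty((3, 4), dtype=table.dtype)
for i in range(3):
    for j in range(4):
        looped[i, j] = column[i, 0] + row[j]

print(table)
print("Shape:", table.shape)
print("Matches loop version:", np.array_equal(table, looped))
```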

## Practical Questions

Q1. Create a 3x3 NumPy array with random integers between 1 and 100. Then, interchange its rows and columns.

In [None]:
import numpy as np

a=np.random.randint(1,100,12).reshape(3,4)

print("Original Matrix: \n",a)

#Rows are interchanged

b=np.flipud(a)
print("\n When rows are interchanged: \n",b)

#Columns are interchanged
c=np.fliplr(a)
print("\n When columns are interchanged: \n",c)

#Rows are interchanged with columns

d=a.T
print("\n Rows are interchanged with columns: \n",d)


Original Matrix: 
 [[44 31 45 22]
 [95 50 51 11]
 [27 66 31 83]]

 When rows are interchanged: 
 [[27 66 31 83]
 [95 50 51 11]
 [44 31 45 22]]

 When columns are interchanged: 
 [[22 45 31 44]
 [11 51 50 95]
 [83 31 66 27]]

 Rows are interchanged with columns: 
 [[44 95 27]
 [31 50 66]
 [45 51 31]
 [22 11 83]]
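
The run above uses a 3x4 array, while the question asks for 3x3 with values up to 100. A sketch that matches the question more closely might look like the following; it assumes NumPy's `default_rng` generator, and the output differs on every run because no seed is set.

```python
import numpy as np

rng = np.random.default_rng()

# The upper bound is exclusive, so 101 allows the value 100 to appear
a = rng.integers(1, 101, size=(3, 3))

# Transposing interchanges rows and columns
swapped = a.T

print("Original Matrix:\n", a)
print("\nRows interchanged with columns:\n", swapped)
```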


Q2. Generate a 1D NumPy array with 10 elements. Reshape it into a 2x5 array, then into a 5x2 array.

In [None]:
import numpy as np

a=np.arange(10)

print("1D array: \n ",a)
print("\n 2X5 array: \n",a.reshape((2,5)))
print("\n 5X2 array: \n",a.reshape((5,2)))

1D array: 
  [0 1 2 3 4 5 6 7 8 9]

 2X5 array: 
 [[0 1 2 3 4]
 [5 6 7 8 9]]

 5X2 array: 
 [[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]


Q3. Create a 4x4 NumPy array with random float values. Add a border of zeros around it, resulting in a 6x6 array.

In [None]:
import numpy as np

a=np.random.rand(4,4)
print("Original Array:\n",a)

b=np.zeros((1,4))
c=np.zeros((6,1))

a=np.vstack((b,a,b))
a=np.hstack((c,a,c))

print("\n\n New Array: \n",a)


Original Array:
 [[0.23183883 0.10680832 0.45249975 0.69447828]
 [0.19308934 0.96946783 0.38952061 0.99343088]
 [0.45154758 0.463192   0.17758781 0.11028824]
 [0.40496892 0.37908542 0.77251136 0.34342992]]


 New Array: 
 [[0.         0.         0.         0.         0.         0.        ]
 [0.         0.23183883 0.10680832 0.45249975 0.69447828 0.        ]
 [0.         0.19308934 0.96946783 0.38952061 0.99343088 0.        ]
 [0.         0.45154758 0.463192   0.17758781 0.11028824 0.        ]
 [0.         0.40496892 0.37908542 0.77251136 0.34342992 0.        ]
 [0.         0.         0.         0.         0.         0.        ]]
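
The same border can also be added in one call with `np.pad`, as an alternative to stacking rows and columns of zeros. This is a minimal sketch of that approach.

```python
import numpy as np

a = np.random.rand(4, 4)

# pad_width=1 adds one layer on every side; constant mode fills it with zeros
bordered = np.pad(a, pad_width=1, mode="constant", constant_values=0)

print("Shape:", bordered.shape)
print(bordered)
```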


Q4.  Using NumPy, create an array of integers from 10 to 60 with a step of 5.

In [None]:
import numpy as np

print(np.arange(10,61,5))

[10 15 20 25 30 35 40 45 50 55 60]


Q5. Create a NumPy array of strings ['python', 'numpy', 'pandas']. Apply different case transformations (uppercase, lowercase, title case, etc.) to each element.

In [None]:
import numpy as np

a=np.array(['python', 'numpy', 'pandas'])

print("Original Array:",a)
print("\n Upper case: ",np.char.upper(a))
print("\n Lower case: ",np.char.lower(a))
print("\n Title case: ",np.char.title(a))



Original Array: ['python' 'numpy' 'pandas']

 Upper case:  ['PYTHON' 'NUMPY' 'PANDAS']

 Lower case:  ['python' 'numpy' 'pandas']

 Title case:  ['Python' 'Numpy' 'Pandas']


Q6. Generate a NumPy array of words. Insert a space between each character of every word in the array.

In [6]:
import numpy as np

a=np.array(["Hello", "this", "is", "Numpy"])

print(np.char.join(" ",a))



['H e l l o' 't h i s' 'i s' 'N u m p y']


Q7. Create two 2D NumPy arrays and perform element-wise addition, subtraction, multiplication, and division.

In [5]:
import numpy as np

a=np.random.randint(50,70,12).reshape((3,4))
b=np.random.randint(1,20,12).reshape((3,4))

print("Original Array A:\n",a)
print("\nOriginal Array B:\n",b)

print("\n \n Sum:\n",a+b)
print("\n \n Diff:\n",a-b)
print("\n \n Product:\n",a*b)
print("\n \n Div:\n",a/b)

Original Array A:
 [[50 66 63 56]
 [55 50 65 62]
 [54 56 67 62]]

Original Array B:
 [[ 7 14 11 12]
 [ 4 19 10 17]
 [16 12 16  8]]

 
 Sum:
 [[57 80 74 68]
 [59 69 75 79]
 [70 68 83 70]]

 
 Diff:
 [[43 52 52 44]
 [51 31 55 45]
 [38 44 51 54]]

 
 Product:
 [[ 350  924  693  672]
 [ 220  950  650 1054]
 [ 864  672 1072  496]]

 
 Div:
 [[ 7.14285714  4.71428571  5.72727273  4.66666667]
 [13.75        2.63157895  6.5         3.64705882]
 [ 3.375       4.66666667  4.1875      7.75      ]]


Q8.  Use NumPy to create a 5x5 identity matrix, then extract its diagonal elements.

In [10]:
import numpy as np
a=np.random.randint(1,30,25).reshape((5,5))
print("Array:\n",a)

b=np.identity(5)

a=a*b
a=a[a>0]
print("\n Diagonal elements :\n",a)




Array:
 [[ 9 20 10 19 14]
 [ 5 27 19 22 27]
 [ 4  9 16 16  6]
 [16 28 10 29 27]
 [ 9 23 11 15  8]]

 Diagonal elements :
 [ 9. 27. 16. 29.  8.]
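
The cell above takes the diagonal of a random matrix by masking it with an identity matrix, an approach that would drop any diagonal entry that is not positive. For the identity matrix itself, as the question asks, a more direct sketch uses `np.eye` and `np.diag`.

```python
import numpy as np

# 5x5 identity matrix
identity = np.eye(5)

# np.diag extracts the main diagonal of a 2D array
diagonal = np.diag(identity)

print("Identity matrix:\n", identity)
print("\nDiagonal elements:", diagonal)
```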


Q9. Generate a NumPy array of 100 random integers between 0 and 1000. Find and display all prime numbers in this array.

In [19]:
import numpy as np

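def prime(array):
  """Print every prime number found in the array."""
  for i in array:
    if i < 2:          # 0 and 1 are not prime
      continue
    for j in range(2,i):
      if i%j==0:
        break
    else:              # no divisor found, so i is prime
      print(i, end=" ")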

a=np.random.randint(0,1000,100)

print(a)

print("\n \n Prime numbers: \n")

prime(a)

[ 43 289 546 801 451 502 141 302 670 767 316 987 483 343 344 405 270 119
 401 427 836 223 574 743 314 720  41  31 305 429 964 401 355  51 495 936
 368 866 410 501 794 345  88 261  22 961 521 662 227 493 577 997 562 874
 435 183 804 754 928 897 196 215 308 251 541 927 822 102 998 565 384  74
 725 372 626 352  17 142 350 964 633 220 913 193 818 951 444  56 650 999
 730 121 373  84 867 784 269 248 628 133]

 
 Prime numbers: 

43 401 223 743 41 31 401 521 227 577 997 251 541 17 193 373 269 
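
Trial division works for 100 values, but the check can also be vectorized. The sketch below swaps in a different technique, a sieve of Eratosthenes over a boolean NumPy array, and then uses that array as a lookup mask. The helper name `prime_mask` and the sample values are hypothetical.

```python
import numpy as np

def prime_mask(limit):
    """Return a boolean array where mask[k] is True when k is prime."""
    mask = np.ones(limit + 1, dtype=bool)
    mask[:2] = False                      # 0 and 1 are not prime
    for p in range(2, int(limit ** 0.5) + 1):
        if mask[p]:
            mask[p * p::p] = False        # cross out multiples of p
    return mask

values = np.array([0, 1, 2, 17, 289, 401, 999, 997])
mask = prime_mask(int(values.max()))

# Index the sieve with the values themselves to keep only the primes
primes_found = values[mask[values]]
print(primes_found)
```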

Q10. Create a NumPy array representing daily temperatures for a month. Calculate and display the weekly averages.

In [50]:
import numpy as np

a=np.random.randint(-15,50,30).reshape((1,30))
print("Daily Temp:\n",a)

b=np.zeros((1,5),dtype=int)
a=np.hstack((a,b))

a=a.reshape((5,7))
a=a[[0,1,2,3]]
print("\n Temperature weekly array excluding extra 2 days:\n",a)
avg=np.mean(a,axis=1)
print("\n Weekly average:\n",avg)

Daily Temp:
 [[ 22  -5 -13  49   0   3  22  20 -14  29  29  39  36  -1  20  10   5  46
   47  45  -2  10  11  25  24 -14   1  -7  -5  45]]

 Temperature weekly array excluding extra 2 days:
 [[ 22  -5 -13  49   0   3  22]
 [ 20 -14  29  29  39  36  -1]
 [ 20  10   5  46  47  45  -2]
 [ 10  11  25  24 -14   1  -7]]

 Weekly average:
 [11.14285714 19.71428571 24.42857143  7.14285714]
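
The same weekly grouping can be reached without padding, by slicing off the first 28 days and reshaping them into four rows of seven. The sketch below uses `np.arange(30)` as a stand-in for real temperature readings so that the result is easy to verify by hand.

```python
import numpy as np

temps = np.arange(30)                  # stand-in for 30 daily readings

# Keep the four complete weeks and average each row of 7 days
full_weeks = temps[:28].reshape(4, 7)
weekly_avg = full_weeks.mean(axis=1)

# The remaining 2 days can be averaged separately if needed
leftover_avg = temps[28:].mean()

print("Weekly averages:", weekly_avg)
print("Average of remaining days:", leftover_avg)
```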
