# GETTING FAMILIAR WITH NUMPY


Getting familiar with NumPy involves practicing its core functionalities, which are essential for numerical and data manipulation tasks in Python. Here's a step-by-step approach to help you get comfortable with NumPy:

## Installing the NumPy package

In [1]:
pip install numpy

Note: you may need to restart the kernel to use updated packages.


#### Importing the NumPy package

In [2]:
import numpy 

## Basic operations

#### 1. Creating a NumPy array

A NumPy array can be created by passing a Python list or tuple of elements to 'numpy.array'.

In [4]:
data=numpy.array([1,2,3,4,5])
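As a small sketch of the point above, the following shows that a list and a tuple produce the same array, and that a list of equal-length lists becomes a 2D array. The names 'from_list', 'from_tuple' and 'grid' are chosen here for illustration only, and the 'np' alias is the usual convention rather than a requirement.

```python
import numpy as np

# The same values passed as a list and as a tuple
from_list = np.array([1, 2, 3, 4, 5])
from_tuple = np.array((1, 2, 3, 4, 5))

# A list of equal-length lists becomes a 2D array
grid = np.array([[1, 2, 3], [4, 5, 6]])

print(np.array_equal(from_list, from_tuple))  # True
print(grid.shape)                             # (2, 3)
```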

#### 2. Accessing elements

In [5]:
print(data[0])

1


In [6]:
print(data[1])

2


#### 3. Slicing

In [8]:
print(data[1:4])

[2 3 4]


In [9]:
print(data[2:])

[3 4 5]


## Mathematical Operations on NumPy arrays

NumPy provides a wide range of mathematical operations that can be performed on arrays. These operations are typically element-wise, meaning they operate on each element of the array individually. Below are some common mathematical operations you can perform with NumPy arrays:

In [10]:
# Creating two NumPy arrays of the same size
array1=numpy.array([1,3,5,7,9])
array2=numpy.array([0,2,4,6,8])

#### Addition

In [11]:
array1+array2

array([ 1,  5,  9, 13, 17])

#### Subtraction

In [12]:
array1-array2

array([1, 1, 1, 1, 1])

#### Multiplication

In [13]:
array1*array2

array([ 0,  6, 20, 42, 72])

#### Division

In [15]:
array2/array1

array([0.        , 0.66666667, 0.8       , 0.85714286, 0.88888889])
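The same element-wise behaviour applies when one operand is a scalar, and to exponentiation and functions such as 'np.sqrt'. The sketch below reuses the values of 'array1' from above; the other variable names are illustrative.

```python
import numpy as np

array1 = np.array([1, 3, 5, 7, 9])

# A scalar is applied to every element
doubled = array1 * 2
shifted = array1 + 10

# Exponentiation and universal functions are element-wise too
squared = array1 ** 2
roots = np.sqrt(np.array([1, 4, 9]))

print(doubled)  # [ 2  6 10 14 18]
print(shifted)  # [11 13 15 17 19]
print(squared)  # [ 1  9 25 49 81]
print(roots)    # [1. 2. 3.]
```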

## Understanding Array properties

Understanding the properties of NumPy arrays is essential for effectively working with them in your computations. Below are some key properties of NumPy arrays that you should be familiar with:

In [17]:
import numpy as np

#### Shape

The shape of a NumPy array tells you the dimensions of the array. It is represented as a tuple of integers, where each integer corresponds to the size of the array along a particular dimension.

In [18]:
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape)

(2, 3)


This indicates that 'arr' is a 2x3 matrix (2 rows, 3 columns).

#### Number of Dimensions

The 'ndim' property gives the number of dimensions (axes) of the array.

In [19]:
arr = np.array([1, 2, 3])
print(arr.ndim)

1


In [20]:
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.ndim) 

2


#### Size

The 'size' property gives the total number of elements in the array.

In [21]:
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.size)

6


#### Data type

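The 'dtype' property shows the data type of the elements in the array. NumPy arrays are homogeneous, meaning all elements must be of the same type. The default integer type depends on the platform and NumPy version, so the first example below may print int32 or int64.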

In [23]:
arr = np.array([1, 2, 3])
print(arr.dtype)

int32


In [24]:
arr = np.array([1.0, 2.0, 3.0])
print(arr.dtype)

float64
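To see what homogeneity means in practice, here is a short sketch: mixing integers and floats yields a float array, and a type can also be requested explicitly with the 'dtype' argument or changed with 'astype'. The variable names are illustrative.

```python
import numpy as np

# Mixing ints and floats: every element is stored as a float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# The dtype can be requested explicitly at creation time
small = np.array([1, 2, 3], dtype=np.int8)
print(small.dtype)     # int8
print(small.itemsize)  # 1 (bytes per element)

# astype returns a converted copy; float-to-int conversion truncates toward zero
as_int = mixed.astype(np.int64)
print(as_int)  # [1 2 3]
```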


#### Transposition

The 'T' attribute transposes the array (i.e., flips it over its diagonal).

In [25]:
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.T)

[[1 4]
 [2 5]
 [3 6]]


#### Flattening

The 'flatten' method returns a copy of a multi-dimensional array collapsed into a 1D array.

In [26]:
arr = np.array([[1, 2, 3], [4, 5, 6]])
flat_arr = arr.flatten()
print(flat_arr)

[1 2 3 4 5 6]
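A related method is 'ravel', which also produces a 1D array but returns a view when the memory layout allows it. The sketch below compares the two on the same small array; 'np.shares_memory' is used only to make the difference visible.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

flat_copy = arr.flatten()  # always a new array
flat_view = arr.ravel()    # a view when the memory layout allows it

print(np.shares_memory(arr, flat_copy))  # False
print(np.shares_memory(arr, flat_view))  # True for this contiguous array

flat_view[0] = 100   # writes through to arr
print(arr[0, 0])     # 100
print(flat_copy[0])  # 1
```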


#### View and Copy

A view is a new array object that looks at the same data as the original array. Modifying the view will modify the original array.

In [27]:
arr = np.array([1, 2, 3, 4])
view = arr[1:3]  # View
view[0] = 10
print(arr)

[ 1 10  3  4]


A copy creates a new array with a new data buffer. Modifying the copy will not affect the original array.


In [28]:
arr = np.array([1, 2, 3, 4])
copy = arr[1:3].copy()  # Copy
copy[0] = 20
print(arr)

[1 2 3 4]
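When it is not obvious whether an operation produced a view or a copy, it can be checked directly. The following sketch uses the 'base' attribute and 'np.shares_memory'; the name 'fancy' is illustrative and shows that indexing with a list of positions returns a copy, unlike basic slicing.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

view = arr[1:3]         # basic slicing returns a view
copy = arr[1:3].copy()  # explicit copy
fancy = arr[[1, 2]]     # indexing with a list of positions returns a copy

print(view.base is arr)              # True
print(copy.base is None)             # True
print(np.shares_memory(arr, fancy))  # False
```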


# Writing a Python Program to Manipulate, Aggregate, and Analyze Data Using NumPy

Let us first define data manipulation, aggregation, and analysis.

## Data Manipulation
Data manipulation in NumPy involves a range of techniques for reshaping, modifying, and processing arrays to meet specific needs. NumPy provides a powerful and flexible set of functions to handle and manipulate data efficiently.

## Data Aggregation
Data aggregation in NumPy involves summarizing or combining data to provide useful insights and statistics. Aggregation functions compute summary statistics across different dimensions of arrays or across specified elements. 
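As a minimal sketch of aggregating "across different dimensions", the 'axis' argument selects the direction that is collapsed. The array 'table' is a made-up example, not part of the dataset used later.

```python
import numpy as np

# Hypothetical 3x2 table: rows are samples, columns are features
table = np.array([[1, 2],
                  [3, 4],
                  [5, 6]])

print(np.sum(table))           # 21, all elements
print(np.sum(table, axis=0))   # [ 9 12], one value per column
print(np.sum(table, axis=1))   # [ 3  7 11], one value per row
print(np.mean(table, axis=0))  # [3. 4.]
```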

## Data Analysis
Data analysis with NumPy involves extracting insights from data through various statistical and mathematical techniques. NumPy provides a robust set of tools for analyzing numerical data, enabling users to perform operations such as finding outliers, correlations, and calculating percentiles.
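To illustrate percentiles and outlier detection, here is a short sketch on made-up scores. It uses the common 1.5 * IQR rule, which is one convention among several and is an assumption of this example rather than something NumPy prescribes.

```python
import numpy as np

# Hypothetical scores; the last value is deliberately extreme
scores = np.array([70, 72, 75, 78, 80, 82, 85, 150])

# Quartiles via np.percentile (linear interpolation by default)
q1, median, q3 = np.percentile(scores, [25, 50, 75])

# Flag values outside 1.5 * IQR from the quartiles
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = scores[(scores < lower) | (scores > upper)]
print(outliers)  # [150]
```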

### Example Program

Let us import the NumPy package as np for convenience,
and then create a dataset that consists of students' roll numbers and their marks.

In [30]:
#importing numpy package
import numpy as np

In [34]:
# Creating a sample dataset
# Each row represents a student; columns represent Roll No, Math, Physics, Science, English, History
students_data = np.array([
    [101, 85, 92, 78, 90, 89],
    [102, 76, 85, 80, 88, 92],
    [103, 89, 94, 92, 95, 91],
    [104, 65, 70, 72, 68, 74],
    [105, 91, 85, 87, 93, 90],
    [106, 78, 82, 84, 79, 85],
    [107, 88, 90, 86, 92, 87],
    [108, 95, 98, 99, 100, 97],
    [109, 70, 75, 73, 78, 72],
    [110, 84, 88, 85, 87, 90]
])
columns=np.array(['Roll_no','Math','Physics','Science','English','History'])

In [35]:
print(columns)
print(students_data)

['Roll_no' 'Math' 'Physics' 'Science' 'English' 'History']
[[101  85  92  78  90  89]
 [102  76  85  80  88  92]
 [103  89  94  92  95  91]
 [104  65  70  72  68  74]
 [105  91  85  87  93  90]
 [106  78  82  84  79  85]
 [107  88  90  86  92  87]
 [108  95  98  99 100  97]
 [109  70  75  73  78  72]
 [110  84  88  85  87  90]]


In [33]:
# Separating roll numbers and marks
roll_numbers = students_data[:, 0]
marks = students_data[:, 1:]

#### Data Manipulation


In [36]:
# 1. Calculate total marks for each student
total_marks = np.sum(marks, axis=1)
print("\nTotal Marks for Each Student:")
for roll_no, total in zip(roll_numbers, total_marks):
    print(f"Roll No: {roll_no}, Total Marks: {total}")


Total Marks for Each Student:
Roll No: 101, Total Marks: 434
Roll No: 102, Total Marks: 421
Roll No: 103, Total Marks: 461
Roll No: 104, Total Marks: 349
Roll No: 105, Total Marks: 446
Roll No: 106, Total Marks: 408
Roll No: 107, Total Marks: 443
Roll No: 108, Total Marks: 489
Roll No: 109, Total Marks: 368
Roll No: 110, Total Marks: 434


In [43]:
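# 2. Add total marks as a new column
# reshape(-1, 1) turns the 1D totals into a column so it can be joined along axis=1
total_marks=total_marks.reshape(-1,1)
# The header array could be extended the same way with np.append(columns, 'Total')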
students_data_with_total = np.concatenate((students_data, total_marks),axis=1)
print("\nStudents' Marks with Total Marks Column:")
print(students_data_with_total)


Students' Marks with Total Marks Column:
[[101  85  92  78  90  89 434]
 [102  76  85  80  88  92 421]
 [103  89  94  92  95  91 461]
 [104  65  70  72  68  74 349]
 [105  91  85  87  93  90 446]
 [106  78  82  84  79  85 408]
 [107  88  90  86  92  87 443]
 [108  95  98  99 100  97 489]
 [109  70  75  73  78  72 368]
 [110  84  88  85  87  90 434]]


In [46]:
# 3. Replace marks below 75 with 75 to simulate minimum passing marks
marks_passing = np.where(marks < 75, 75, marks)
students_data_passing = np.concatenate((roll_numbers.reshape(-1, 1), marks_passing),axis=1)
print("\nStudents' Marks with Minimum Passing Marks Applied:")
print(students_data_passing)



Students' Marks with Minimum Passing Marks Applied:
[[101  85  92  78  90  89]
 [102  76  85  80  88  92]
 [103  89  94  92  95  91]
 [104  75  75  75  75  75]
 [105  91  85  87  93  90]
 [106  78  82  84  79  85]
 [107  88  90  86  92  87]
 [108  95  98  99 100  97]
 [109  75  75  75  78  75]
 [110  84  88  85  87  90]]
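For this particular case, where every value below a threshold is raised to the threshold, 'np.maximum' and 'np.clip' give the same result as 'np.where'. The sketch below checks this on the two rows that were changed above (roll numbers 104 and 109).

```python
import numpy as np

# Two rows copied from the marks table above (roll numbers 104 and 109)
marks = np.array([[65, 70, 72, 68, 74],
                  [70, 75, 73, 78, 72]])

with_where = np.where(marks < 75, 75, marks)
with_maximum = np.maximum(marks, 75)
with_clip = np.clip(marks, 75, None)  # None leaves the upper bound open

print(np.array_equal(with_where, with_maximum))  # True
print(np.array_equal(with_where, with_clip))     # True
```

'np.where' remains the more general tool, since the replacement value does not have to equal the threshold.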


#### Data Aggregation

In [47]:
# 1. Compute the mean marks in each subject
mean_marks_per_subject = np.mean(marks, axis=0)
print("\nMean Marks in Each Subject:")
# Subject names in the same order as the columns of marks
subjects = ["Math", "Physics", "Science", "English", "History"]
for subject, mean in zip(subjects, mean_marks_per_subject):
    print(f"{subject}: {mean:.2f}")



Mean Marks in Each Subject:
Math: 82.10
Physics: 85.90
Science: 83.60
English: 87.00
History: 86.70


In [48]:
# 2. Compute the highest marks in each subject
highest_marks_per_subject = np.max(marks, axis=0)
print("\nHighest Marks in Each Subject:")
for subject, highest in zip(subjects, highest_marks_per_subject):
    print(f"{subject}: {highest}")


Highest Marks in Each Subject:
Math: 95
Physics: 98
Science: 99
English: 100
History: 97


In [49]:
# 3. Compute the average total marks across all students
average_total_marks = np.mean(total_marks)
print("\nAverage Total Marks Across All Students:")
print(f"Average Total Marks: {average_total_marks:.2f}")


Average Total Marks Across All Students:
Average Total Marks: 425.30


#### Data Analysis

In [50]:
# 1. Find the correlation between marks in different subjects
correlation_matrix = np.corrcoef(marks, rowvar=False)
print("\nCorrelation Matrix Between Subjects:")
print(correlation_matrix)



Correlation Matrix Between Subjects:
[[1.         0.91130116 0.88395876 0.94354531 0.84376091]
 [0.91130116 1.         0.83279954 0.93901354 0.89334103]
 [0.88395876 0.83279954 1.         0.83645861 0.81865189]
 [0.94354531 0.93901354 0.83645861 1.         0.8744601 ]
 [0.84376091 0.89334103 0.81865189 0.8744601  1.        ]]


In [51]:
# 2. Identify the student with the highest total marks
index_highest_total = np.argmax(total_marks)
student_highest_total = roll_numbers[index_highest_total]
print("\nStudent with the Highest Total Marks:")
# total_marks was reshaped into a column earlier, so flatten it to print a plain number
print(f"Roll No: {student_highest_total}, Total Marks: {total_marks.ravel()[index_highest_total]}")



Student with the Highest Total Marks:
Roll No: 108, Total Marks: 489


In [52]:
# 3. Analyze how many students passed each subject (considering 75 as the passing marks)
passing_count_per_subject = np.sum(marks >= 75, axis=0)
print("\nNumber of Students Passed in Each Subject (Passing Marks = 75):")
for subject, count in zip(subjects, passing_count_per_subject):
    print(f"{subject}: {count}")



Number of Students Passed in Each Subject (Passing Marks = 75):
Math: 8
Physics: 9
Science: 8
English: 9
History: 8
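
The same boolean comparison can be reduced along the other axis to ask a per-student question, for example which students passed every subject. This is a sketch on three rows copied from the dataset above, not an additional cell from the original notebook.

```python
import numpy as np

# Three rows copied from the dataset above (roll numbers 101, 104, 109)
roll_numbers = np.array([101, 104, 109])
marks = np.array([[85, 92, 78, 90, 89],
                  [65, 70, 72, 68, 74],
                  [70, 75, 73, 78, 72]])

passed_all = np.all(marks >= 75, axis=1)  # one boolean per student
print(roll_numbers[passed_all])           # [101]

# Number of subjects each student passed
print(np.sum(marks >= 75, axis=1))        # [5 0 2]
```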


### Conclusion

#### **The Power of NumPy in Data Science**

This program shows how NumPy supports data manipulation, aggregation, and analysis on numerical data. As a core library in the Python ecosystem, NumPy is widely used in data science because it handles large datasets and complex mathematical operations efficiently.

#### **Advantages of Using NumPy over Traditional Python Data Structures**

1. **Performance and Efficiency:**
   - **Speed:** NumPy's core is implemented in C, so operations on large arrays run in compiled code rather than in the Python interpreter. Its linear algebra routines delegate to optimized libraries such as BLAS and LAPACK.
   - **Memory Efficiency:** NumPy arrays are more memory-efficient than Python lists for large numerical datasets. Elements are stored as fixed-size values in a single, typically contiguous, block of memory instead of as separate Python objects, which reduces overhead and speeds up access.

2. **Array Operations:**
   - **Vectorization:** NumPy enables vectorized operations, allowing element-wise operations to be performed on entire arrays without the need for explicit loops. This leads to cleaner, more concise code and significantly improved performance.
   - **Broadcasting:** NumPy's broadcasting feature allows operations to be performed on arrays of different shapes, facilitating tasks like adding a scalar to an entire array or performing operations between arrays of different dimensions.

3. **Mathematical and Statistical Functions:**
   - NumPy provides a wide range of mathematical and statistical functions out of the box, making it easy to perform complex calculations such as mean, median, standard deviation, correlation, and more.

4. **Integration with Other Libraries:**
   - **Seamless Integration:** NumPy arrays serve as the foundation for other key Python libraries in data science, such as pandas, SciPy, and scikit-learn. This integration allows for seamless data manipulation and analysis workflows, making it easier to transition from data preprocessing to model building.
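
Vectorization and broadcasting, mentioned in the list above, can be sketched as follows. The arrays 'values' and 'table' are made-up examples; the point is only to compare an explicit loop with the equivalent array expression and to show shapes (3, 2) and (2,) combining.

```python
import numpy as np

values = np.arange(5)  # [0 1 2 3 4]

# Explicit Python loop
squares_loop = np.array([v * v for v in values])

# Vectorized equivalent: one expression, the loop runs in compiled code
squares_vec = values * values
print(np.array_equal(squares_loop, squares_vec))  # True

# Broadcasting: subtract each column's mean from a (3, 2) array
table = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0]])
col_means = table.mean(axis=0)  # shape (2,)
centered = table - col_means    # (3, 2) - (2,) -> (3, 2)
print(centered.mean(axis=0))    # [0. 0.]
```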

#### **Real-World Examples of NumPy’s Capabilities**

1. **Machine Learning:**
   - **Data Preprocessing:** In machine learning pipelines, NumPy is often used for tasks like normalizing data, calculating distances between data points, or even implementing gradient descent algorithms. Its speed and efficiency are critical when dealing with large datasets, as found in image processing or natural language processing tasks.

2. **Financial Analysis:**
   - **Portfolio Optimization:** NumPy is crucial in financial modeling and analysis, where tasks like computing the covariance matrix of asset returns, performing Monte Carlo simulations, and calculating Value at Risk (VaR) are common. The ability to handle large matrices and perform complex calculations quickly makes NumPy indispensable in this domain.

3. **Scientific Research:**
   - **Numerical Simulations:** Scientists rely on NumPy for simulations in physics, chemistry, and biology. For example, simulating the behavior of particles in a fluid or analyzing large-scale genetic data would be impractical with traditional Python lists due to the computational complexity and data volume involved.
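
As a rough sketch of the preprocessing tasks named under Machine Learning above, the following standardizes features and computes pairwise Euclidean distances with broadcasting. The feature matrix 'X' is hypothetical, and real pipelines would usually rely on a dedicated library for these steps.

```python
import numpy as np

# Hypothetical feature matrix: 3 samples, 2 features
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])

# Standardize each feature to zero mean and unit variance
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Pairwise Euclidean distances via broadcasting: (3, 1, 2) - (1, 3, 2)
diff = X[:, None, :] - X[None, :, :]
distances = np.sqrt((diff ** 2).sum(axis=-1))
print(distances[0])  # [ 0.  5. 10.]
```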

#### **Summary**

In summary, NumPy is a foundational tool for any data science professional. Its capabilities in handling large-scale numerical data efficiently, combined with its integration into the broader Python ecosystem, make it essential for tasks ranging from basic data manipulation to advanced scientific computations. By leveraging NumPy, data scientists can write code that is not only faster and more memory-efficient but also more readable and maintainable, ultimately enabling more effective analysis and decision-making in various domains.