**Data Manipulation, Aggregation, and Analysis for Data Science Using NumPy**

Here’s a structured approach to get familiar with NumPy, along with a Python program to demonstrate data manipulation, aggregation, and analysis. I’ll cover core functionalities, basic examples, and real-world applications.

**Getting Familiar with NumPy**

Core Functionalities:

**Creating Arrays**:

NumPy arrays are the core data structure. Because they store elements of a single data type in a compact block of memory, they are more efficient than Python lists for numerical operations.

In [1]:
import numpy as np

# Creating a 1D array
arr = np.array([1, 2, 3, 4, 5])
print("1D Array:", arr)

# Creating a 2D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("2D Array:\n", arr_2d)

1D Array: [1 2 3 4 5]
2D Array:
 [[1 2 3]
 [4 5 6]]
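
Arrays do not have to be built from Python lists. As a minimal sketch, the constructors below (`np.zeros`, `np.arange`, `np.linspace`) create arrays directly. The variable names are chosen here for illustration only.

```python
import numpy as np

# Array of zeros with a given shape (the default dtype is float64)
zeros = np.zeros((2, 3))

# Evenly spaced integers: start is inclusive, stop is exclusive
evens = np.arange(0, 10, 2)

# Five evenly spaced values from 0 to 1, both endpoints included
grid = np.linspace(0.0, 1.0, 5)

print("Zeros:\n", zeros)
print("Evens:", evens)
print("Grid:", grid)
```

`np.arange` is usually the better fit when the step matters, and `np.linspace` when the number of points matters.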


**Basic Operations**:

NumPy supports a variety of mathematical operations, applied element by element to arrays of the same shape.

In [2]:
# Element-wise addition
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
sum_arr = arr1 + arr2
print("Sum:", sum_arr)

# Element-wise multiplication
product_arr = arr1 * arr2
print("Product:", product_arr)


Sum: [5 7 9]
Product: [ 4 10 18]
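
Element-wise operations also work between an array and a scalar, or between arrays of different but compatible shapes. NumPy calls this broadcasting. The sketch below is illustrative, and the `matrix` array is an assumption added for the example.

```python
import numpy as np

arr1 = np.array([1, 2, 3])

# A scalar is broadcast across every element
scaled = arr1 * 10

# A (2, 3) array plus a (3,) array: the 1D array is added to each row
matrix = np.array([[1, 2, 3], [4, 5, 6]])
shifted = matrix + arr1

print("Scaled:", scaled)
print("Shifted:\n", shifted)
```

If the shapes are not compatible (for example, adding a length-2 array to a length-3 array), NumPy raises a `ValueError` instead of guessing.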


**Array Properties**:

In [3]:
# Array shape and size
print("Shape of arr_2d:", arr_2d.shape)
print("Size of arr_2d:", arr_2d.size)
print("Data type of arr_2d:", arr_2d.dtype)


Shape of arr_2d: (2, 3)
Size of arr_2d: 6
Data type of arr_2d: int64
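
The default integer dtype can differ by platform and NumPy version, so the `int64` shown above is not guaranteed everywhere. As a hedged sketch, the example below requests the dtype explicitly and inspects a few more properties (`ndim`, `itemsize`, `nbytes`).

```python
import numpy as np

# Request the dtype explicitly so the result does not depend on platform defaults
arr_2d = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

print("Dimensions:", arr_2d.ndim)             # number of axes
print("Bytes per element:", arr_2d.itemsize)
print("Total bytes:", arr_2d.nbytes)          # size * itemsize

# astype returns a converted copy; the original array is unchanged
as_float = arr_2d.astype(np.float32)
print("Converted dtype:", as_float.dtype)
```

Choosing a smaller dtype such as `float32` halves the memory per element compared with `float64`, at the cost of precision.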


**Data Manipulation**

Python Program Example:

In [4]:
import numpy as np

# Creating an array
data = np.array([10, 20, 30, 40, 50, 60])

# Indexing
print("Element at index 2:", data[2])

# Slicing
print("Slice from index 1 to 4:", data[1:4])

# Reshaping
reshaped_data = data.reshape(2, 3)
print("Reshaped Data:\n", reshaped_data)

# Mathematical Operations
data_squared = np.power(data, 2)
print("Squared Data:", data_squared)


Element at index 2: 30
Slice from index 1 to 4: [20 30 40]
Reshaped Data:
 [[10 20 30]
 [40 50 60]]
Squared Data: [ 100  400  900 1600 2500 3600]
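
Beyond basic indexing and slicing, a few related selection techniques come up often. This is a short sketch using the same `data` array. The threshold of 30 is an arbitrary value picked for illustration.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50, 60])

# Boolean masking: keep only the elements that satisfy a condition
mask = data > 30
print("Values above 30:", data[mask])

# Negative indices count from the end; a step selects every n-th element
print("Last element:", data[-1])
print("Every second element:", data[::2])

# -1 asks NumPy to infer that dimension from the array's size
inferred = data.reshape(-1, 2)
print("Shape with inferred rows:", inferred.shape)
```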


**Data Aggregation**

Using NumPy Functions for Summary Statistics:



In [5]:
import numpy as np

data = np.array([10, 20, 30, 40, 50])

# Mean
mean_val = np.mean(data)
print("Mean:", mean_val)

# Median
median_val = np.median(data)
print("Median:", median_val)

# Standard Deviation (population form; np.std uses ddof=0 by default)
std_dev = np.std(data)
print("Standard Deviation:", std_dev)

# Sum
total_sum = np.sum(data)
print("Sum:", total_sum)


Mean: 30.0
Median: 30.0
Standard Deviation: 14.142135623730951
Sum: 150
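
The same functions accept an `axis` argument for multi-dimensional data, and `np.std` accepts `ddof` to switch between population and sample standard deviation. The sketch below assumes a small, made-up `scores` table (rows as students, columns as tests).

```python
import numpy as np

# Hypothetical scores: rows are students, columns are tests
scores = np.array([[10, 20, 30],
                   [40, 50, 60]])

col_means = np.mean(scores, axis=0)  # collapse rows: one mean per column
row_sums = np.sum(scores, axis=1)    # collapse columns: one sum per row
print("Column means:", col_means)
print("Row sums:", row_sums)

data = np.array([10, 20, 30, 40, 50])
population_std = np.std(data)        # divides by N (ddof=0, the default)
sample_std = np.std(data, ddof=1)    # divides by N - 1
print("Population std:", population_std)
print("Sample std:", sample_std)
```

Use `ddof=1` when the data is a sample from a larger population. Note that pandas uses `ddof=1` by default, so the two libraries can give different results for the same input.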


**Data Analysis**

Correlations, Outliers, and Percentiles:

In [6]:
import numpy as np

data = np.array([1, 2, 2, 3, 4, 4, 5, 6, 8, 100])

# Correlation (requires two arrays, example skipped for simplicity)

# Identifying outliers
mean = np.mean(data)
std_dev = np.std(data)
outliers = data[(data > mean + 2 * std_dev) | (data < mean - 2 * std_dev)]
print("Outliers:", outliers)

# Percentiles
percentiles = np.percentile(data, [25, 50, 75])
print("25th, 50th, 75th Percentiles:", percentiles)


Outliers: [100]
25th, 50th, 75th Percentiles: [2.25 4.   5.75]
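
The correlation step skipped above needs two arrays of equal length. As a minimal sketch, `np.corrcoef` computes the Pearson correlation. The `hours` and `scores` values are hypothetical and exist only to illustrate the call.

```python
import numpy as np

# Hypothetical paired measurements (illustrative values only)
hours = np.array([1, 2, 3, 4, 5])
scores = np.array([52, 58, 61, 70, 74])

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal
# entry is the Pearson correlation between the two inputs
corr_matrix = np.corrcoef(hours, scores)
r = corr_matrix[0, 1]
print("Correlation coefficient:", round(r, 3))
```

A value near 1 indicates a strong positive linear relationship, a value near -1 a strong negative one, and a value near 0 little linear relationship.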


**Advantages of Using NumPy**:

Performance:

NumPy operations are typically much faster than equivalent loops over Python lists because the core routines are implemented in C.

Memory Efficiency:

NumPy arrays store elements of a single type in a compact buffer, so they use less memory than Python lists for large datasets.

Vectorized Operations:

Operations apply to entire arrays at once, without explicit loops.

**Real-World Examples**:

Machine Learning:

NumPy arrays are used for handling large datasets, performing matrix operations, and feeding data into machine learning algorithms.

Financial Analysis:

NumPy helps in analyzing stock prices, computing financial metrics, and running simulations.

Scientific Research:

NumPy is essential for numerical simulations, data analysis, and handling complex mathematical operations.
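
To make the vectorization and financial-analysis points concrete, here is a small sketch that computes daily returns from a price series. The `prices` values are made up for illustration, and the explicit loop is included only to show that it matches the vectorized result.

```python
import numpy as np

# Hypothetical closing prices (made-up values for illustration)
prices = np.array([100.0, 102.0, 101.0, 105.0, 107.0])

# Vectorized: each day's return relative to the previous day, no explicit loop
returns = prices[1:] / prices[:-1] - 1

# Equivalent explicit loop, shown only for comparison
loop_returns = []
for i in range(1, len(prices)):
    loop_returns.append(prices[i] / prices[i - 1] - 1)

print("Daily returns:", np.round(returns, 4))
print("Mean daily return:", returns.mean())
print("Volatility (sample std):", returns.std(ddof=1))
```

The vectorized form is shorter and, on large arrays, generally faster, because the arithmetic runs in compiled code rather than in the Python interpreter.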

