1. Getting Familiar with NumPy

In [1]:
import numpy as np

# Creating a 1D array from a list
array_1d = np.array([1, 2, 3, 4, 5])
print("1D array:")
print(array_1d)

# Creating a 2D array (matrix)
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("\n2D array:")
print(array_2d)

# Basic operations
array_sum = array_1d + 10
print("\nArray after adding 10 to each element:")
print(array_sum)

array_product = array_2d * 2
print("\nArray after multiplying each element by 2:")
print(array_product)

# Array properties
print("\nArray properties:")
print(f"Shape of array_2d: {array_2d.shape}")
print(f"Size of array_2d: {array_2d.size}")
print(f"Data type of array_2d: {array_2d.dtype}")

1D array:
[1 2 3 4 5]

2D array:
[[1 2 3]
 [4 5 6]]

Array after adding 10 to each element:
[11 12 13 14 15]

Array after multiplying each element by 2:
[[ 2  4  6]
 [ 8 10 12]]

Array properties:
Shape of array_2d: (2, 3)
Size of array_2d: 6
Data type of array_2d: int64
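
The scalar operations above are a special case of broadcasting, where NumPy stretches the smaller operand to match the larger one. The sketch below reuses `array_2d` from the cell above; the `column_offsets` values are hypothetical and chosen only for illustration.

```python
import numpy as np

# Same 2D array as in the cell above
array_2d = np.array([[1, 2, 3], [4, 5, 6]])

# Hypothetical per-column offsets (not part of the original example)
column_offsets = np.array([10, 20, 30])

# Shapes (2, 3) and (3,) are compatible: the 1D array is added to each row
shifted = array_2d + column_offsets
print(shifted)
```

Broadcasting only works when the trailing dimensions match or one of them is 1; otherwise NumPy raises a `ValueError`.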


2. Data Manipulation with NumPy

In [2]:
# Array creation
array = np.arange(10)  # Create an array with a range of values
print("\nArray created with arange:")
print(array)

# Indexing and slicing
print("\nElement at index 5:")
print(array[5])

print("\nSliced array (elements from index 2 to 5):")
print(array[2:6])

# Reshaping arrays
array_reshaped = np.reshape(array, (2, 5))
print("\nReshaped array (2x5):")
print(array_reshaped)

# Applying mathematical operations
array_squared = array ** 2
print("\nArray after squaring each element:")
print(array_squared)

array_sqrt = np.sqrt(array)
print("\nSquare root of each element in the array:")
print(array_sqrt)


Array created with arange:
[0 1 2 3 4 5 6 7 8 9]

Element at index 5:
5

Sliced array (elements from index 2 to 5):
[2 3 4 5]

Reshaped array (2x5):
[[0 1 2 3 4]
 [5 6 7 8 9]]

Array after squaring each element:
[ 0  1  4  9 16 25 36 49 64 81]

Square root of each element in the array:
[0.         1.         1.41421356 1.73205081 2.         2.23606798
 2.44948974 2.64575131 2.82842712 3.        ]
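
The indexing shown above extends naturally to the reshaped 2D array. As a minimal sketch, the example below rebuilds the same `array` and `array_reshaped`, then selects a single element, a whole column, and the even values through a boolean mask. The names `element`, `second_column`, and `even_values` are illustrative only.

```python
import numpy as np

array = np.arange(10)
array_reshaped = array.reshape(2, 5)

# Row 1, column 2 of the 2x5 array
element = array_reshaped[1, 2]

# Every row, column 1
second_column = array_reshaped[:, 1]

# Boolean mask: keep only the even values of the 1D array
even_values = array[array % 2 == 0]

print(element)
print(second_column)
print(even_values)
```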


3. Data Aggregation

In [3]:
# Create a 2D array for aggregation examples
data = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])
print("\nData array:")
print(data)

# Computing summary statistics
mean = np.mean(data)
median = np.median(data)
std_dev = np.std(data)
total_sum = np.sum(data)

print("\nSummary statistics:")
print(f"Mean: {mean}")
print(f"Median: {median}")
print(f"Standard Deviation: {std_dev}")
print(f"Sum: {total_sum}")

# Filtering with a boolean mask: select the elements greater than the mean
above_mean = data[data > mean]
print("\nElements above mean:")
print(above_mean)


Data array:
[[10 20 30]
 [40 50 60]
 [70 80 90]]

Summary statistics:
Mean: 50.0
Median: 50.0
Standard Deviation: 25.81988897471611
Sum: 450

Elements above mean:
[60 70 80 90]
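
The statistics above are computed over the whole array. Aggregating per row or per column is often closer to what "grouping" means in practice, and the `axis` argument handles it. The sketch below reuses the same `data` array; the variable names are illustrative.

```python
import numpy as np

data = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])

# axis=0 collapses the rows, giving one value per column
column_means = data.mean(axis=0)

# axis=1 collapses the columns, giving one value per row
row_sums = data.sum(axis=1)

print(column_means)
print(row_sums)
```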


4. Data Analysis

In [4]:
# Correlation example
# Create two arrays representing different data points
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])

# Compute the 2x2 correlation matrix; the off-diagonal entries hold the x-y coefficient
correlation = np.corrcoef(x, y)
print("\nCorrelation coefficient between x and y:")
print(correlation)

# Identifying outliers using Z-score
data_with_outliers = np.array([10, 12, 15, 18, 20, 22, 100])  # 100 is an outlier
mean_data = np.mean(data_with_outliers)
std_dev_data = np.std(data_with_outliers)
z_scores = (data_with_outliers - mean_data) / std_dev_data

print("\nZ-scores of the data (detect outliers):")
print(z_scores)

# Find elements that are considered outliers (absolute Z-score > 2)
outliers = data_with_outliers[np.abs(z_scores) > 2]
print("\nOutliers in the data:")
print(outliers)

# Calculating percentiles
percentile_25 = np.percentile(data_with_outliers, 25)
percentile_50 = np.percentile(data_with_outliers, 50)  # Also the median
percentile_75 = np.percentile(data_with_outliers, 75)

print("\nPercentiles of the data:")
print(f"25th percentile: {percentile_25}")
print(f"50th percentile (median): {percentile_50}")
print(f"75th percentile: {percentile_75}")


Correlation coefficient between x and y:
[[1. 1.]
 [1. 1.]]

Z-scores of the data (detect outliers):
[-0.6129475  -0.54537848 -0.44402496 -0.34267144 -0.27510242 -0.2075334
  2.4276582 ]

Outliers in the data:
[100]

Percentiles of the data:
25th percentile: 13.5
50th percentile (median): 18.0
75th percentile: 21.0
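
The percentiles above can also drive outlier detection. The sketch below applies the common 1.5 x IQR rule to the same data as an alternative to the Z-score approach. The 1.5 multiplier is a convention, not something prescribed by the example above, and the variable names are illustrative.

```python
import numpy as np

data_with_outliers = np.array([10, 12, 15, 18, 20, 22, 100])

# Interquartile range from the 25th and 75th percentiles
q1, q3 = np.percentile(data_with_outliers, [25, 75])
iqr = q3 - q1

# Conventional fences: 1.5 * IQR beyond each quartile (assumed multiplier)
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

mask = (data_with_outliers < lower_bound) | (data_with_outliers > upper_bound)
iqr_outliers = data_with_outliers[mask]
print(iqr_outliers)  # [100]
```

Unlike the Z-score, the IQR fences are not inflated by the outlier itself, because quartiles are far less sensitive to extreme values than the mean and standard deviation.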


5. Application in Data Science

Advantages of Using NumPy in Data Science

NumPy provides several advantages over traditional Python data structures for numerical computations:

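Performance: NumPy arrays are typically faster and more memory-efficient than Python lists for numerical computations, because they store elements of a single data type in one contiguous block and apply operations in compiled code instead of a Python-level loop.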
Ease of Use: NumPy offers a wide range of mathematical functions that simplify the implementation of complex algorithms.
Integration: NumPy integrates well with other Python libraries like Pandas, Scikit-Learn, and SciPy, enhancing its utility in data science workflows.
Handling Large Datasets: NumPy is designed to work efficiently with large datasets, making it a valuable tool for data scientists working with high-dimensional data.
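
To make the performance and ease-of-use points concrete, the sketch below computes the same result with a Python list comprehension and with a single vectorized NumPy expression. The input values are hypothetical, and no timings are claimed here since they depend on the machine.

```python
import numpy as np

# Hypothetical input values, used only for illustration
values = list(range(1_000))

# Pure Python: one interpreted operation per element
squares_loop = [v * v + 1 for v in values]

# NumPy: the same arithmetic expressed once for the whole array
arr = np.array(values)
squares_vec = arr * arr + 1

# Every element shares one dtype and a fixed per-element size in bytes
print(arr.dtype, arr.itemsize)
```

Both approaches give identical values; the vectorized form is shorter and hands the loop to NumPy's compiled routines.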

Real-World Examples

Machine Learning: NumPy is used to handle numerical data, perform matrix operations, and compute gradients, which are crucial in building machine learning models.
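
As a minimal sketch of the matrix operations and gradient computation mentioned above, the example below fits a straight line by gradient descent on the mean squared error. The data, learning rate, and iteration count are all assumptions chosen for illustration; this is not how any particular library implements linear regression.

```python
import numpy as np

# Hypothetical noise-free data on the line y = 3x + 2 (illustration only)
x = np.linspace(0.0, 1.0, 50)
y = 3.0 * x + 2.0

# Design matrix: one column for the slope, one column of ones for the intercept
X = np.column_stack([x, np.ones_like(x)])
weights = np.zeros(2)
learning_rate = 0.5  # assumed value, small enough to converge on this data

for _ in range(2000):
    residuals = X @ weights - y                  # prediction error per sample
    gradient = 2.0 / len(y) * (X.T @ residuals)  # gradient of the mean squared error
    weights -= learning_rate * gradient

slope, intercept = weights
print(f"slope={slope:.4f}, intercept={intercept:.4f}")
```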

Financial Analysis: In finance, NumPy is used to perform quantitative analysis, compute financial indicators, and model time-series data.
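
Two simple indicators of the kind described above are period-over-period returns and a moving average. The sketch below computes both; the price series and the 3-period window are hypothetical values, not real market data.

```python
import numpy as np

# Hypothetical closing prices (illustration only, not real market data)
prices = np.array([100.0, 102.0, 101.0, 105.0, 107.0])

# Simple returns: change relative to the previous price
daily_returns = np.diff(prices) / prices[:-1]

# 3-period simple moving average via convolution with a uniform kernel
window = 3
moving_average = np.convolve(prices, np.ones(window) / window, mode="valid")

print(daily_returns)
print(moving_average)
```

With `mode="valid"`, the moving average has `len(prices) - window + 1` entries, one for each full window.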

Scientific Research: NumPy is widely used in scientific computing for simulations, solving mathematical problems, and analyzing experimental data due to its ability to handle large, multi-dimensional arrays efficiently.
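
A small Monte Carlo simulation illustrates the kind of array-based computation described above. The sketch below estimates pi by sampling random points in the unit square; the seed and sample size are arbitrary assumptions.

```python
import numpy as np

# Seeded generator so the run is reproducible (seed value is arbitrary)
rng = np.random.default_rng(seed=42)

n_points = 100_000
points = rng.random((n_points, 2))  # uniform samples in the unit square

# Fraction of points inside the quarter circle of radius 1
inside = (points ** 2).sum(axis=1) <= 1.0
pi_estimate = 4.0 * inside.mean()

print(pi_estimate)
```

The whole simulation runs as a few array operations with no explicit Python loop over the samples.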

Conclusion

The examples above show how NumPy can be used for data manipulation, aggregation, and analysis in data science. By simplifying numerical computation, NumPy has become a core tool for data science professionals who work with large datasets and complex numerical tasks.