## 1. Getting familiar with NumPy

## NumPy’s core functionalities:

The cells below walk through NumPy's basic and intermediate core functionality.

In [12]:
import numpy as np

# 1. Array Creation
# 1D Array
array_1d = np.array([1, 2, 3])
print(array_1d)

[1 2 3]


In [13]:
# 2D Array
array_2d = np.array([[1, 2], [3, 4]])
print(array_2d)

[[1 2]
 [3 4]]


In [14]:
# Zeros and Ones
zeros_array = np.zeros((3, 3))
ones_array = np.ones((2, 2))
print(zeros_array)
print("\n")
print(ones_array)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


[[1. 1.]
 [1. 1.]]


In [11]:
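# Empty Array: np.empty allocates memory without initializing it,
# so the printed values are arbitrary and differ from run to run.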
empty_array = np.empty((2, 3))
print(empty_array)

[[-6.95208352e-310  3.92860901e-104  1.08134514e-311]
 [ 1.08134514e-311  5.92878775e-323  7.21335843e-322]]


## Performing basic operations

In [20]:
# 2. Array Operations
# Element-wise Operations
sum_array = array_1d + 2
product_array = array_1d * 3
print(sum_array)
print(product_array)

[3 4 5]
[3 6 9]
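
The scalar operations above are the simplest case of broadcasting. As a minimal sketch (the names `matrix` and `row` are illustrative and not part of the original notebook), the same rule lets a 1D array combine with every row of a 2D array:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
row = np.array([10, 20, 30])               # shape (3,)

# The 1D row is broadcast across both rows of the matrix.
shifted = matrix + row
print(shifted)

# `*` between two arrays of the same shape multiplies element by element;
# it is not matrix multiplication.
squared = matrix * matrix
print(squared)
```

Broadcasting only works when the trailing dimensions match or one of them is 1; otherwise NumPy raises a `ValueError`.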


In [21]:
# Matrix Multiplication
matrix_mult = np.dot(array_2d, array_2d)
print(matrix_mult)

[[ 7 10]
 [15 22]]
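
For 2D arrays, the `@` operator gives the same result as `np.dot` and is often easier to read. The sketch below (using an illustrative array `a` with the same values as `array_2d`) also contrasts it with element-wise `*`, which is a common source of confusion:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

via_dot = np.dot(a, a)   # matrix product
via_operator = a @ a     # same matrix product, operator form
elementwise = a * a      # element-wise product, NOT a matrix product

print(np.array_equal(via_dot, via_operator))
print(elementwise)
```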


In [22]:
# Transpose
transpose_array = np.transpose(array_2d)
print(transpose_array)

[[1 3]
 [2 4]]


In [23]:
# Slicing and Indexing
slice_array = array_2d[0:2, 0:2]
print(slice_array)

[[1 2]
 [3 4]]
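
One detail worth knowing about basic slicing is that it returns a view onto the original data rather than a copy. The following sketch (the names `base`, `view`, and `independent` are illustrative) shows the difference:

```python
import numpy as np

base = np.array([[1, 2], [3, 4]])

# A basic slice shares memory with the array it came from.
view = base[0:1, :]
view[0, 0] = 99
print(base[0, 0])   # the change is visible through `base`

# .copy() produces independent data.
independent = base[0:1, :].copy()
independent[0, 0] = -1
print(base[0, 0])   # unchanged by the write to the copy

print(np.shares_memory(base, view), np.shares_memory(base, independent))
```

Call `.copy()` whenever a slice needs to be modified without affecting the source array.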


In [24]:
# Reshape
reshaped_array = array_1d.reshape((3, 1))
print(reshaped_array)

[[1]
 [2]
 [3]]
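
When only one dimension is known in advance, `reshape` accepts `-1` and infers the remaining size. A short sketch, with illustrative names:

```python
import numpy as np

values = np.arange(6)            # [0 1 2 3 4 5]

column = values.reshape(-1, 1)   # -1 is inferred as 6 -> shape (6, 1)
grid = values.reshape(2, -1)     # -1 is inferred as 3 -> shape (2, 3)
flat = grid.ravel()              # back to one dimension

print(column.shape, grid.shape, flat.shape)
```

The total number of elements must stay the same; for example, `values.reshape(4, -1)` raises a `ValueError` because 6 is not divisible by 4.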


## Understanding array properties

In [25]:
import numpy as np
array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(array.shape)  # Output: (3, 3)


(3, 3)


In [26]:
array = np.array([[1, 2, 3], [4, 5, 6]])
print(array.ndim)  # Output: 2


2


In [27]:
array = np.array([[1, 2], [3, 4], [5, 6]])
print(array.size)  # Output: 6


6


In [28]:
array = np.array([1.5, 2.5, 3.5])
print(array.dtype)  # Output: float64


float64
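
The `dtype` can also be set explicitly or converted after the fact. This sketch (names are illustrative) shows `astype` and how the dtype determines memory use per element:

```python
import numpy as np

floats = np.array([1.5, 2.5, 3.5])

# astype returns a new array; float -> int conversion truncates toward zero.
as_ints = floats.astype(np.int32)
print(as_ints, as_ints.dtype)

# A smaller dtype uses fewer bytes per element.
small = np.array([1, 2, 3], dtype=np.int8)
print(small.itemsize, small.nbytes)
```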


## Data Manipulation

In [29]:
array = np.array([11, 27, 37, 44, 58])
print(array[0])   # first element
print(array[2])   # third element
print(array[-1])  # last element (negative indices count from the end)

11
37
58


In [30]:
array_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(array_2d[1:3, :])  # rows 1 and 2, all columns
print(array_2d[:, 1:3])  # all rows, columns 1 and 2

[[4 5 6]
 [7 8 9]]
[[2 3]
 [5 6]
 [8 9]]


In [31]:
array = np.arange(24).reshape((2, 3, 4))
swapped_array = array.swapaxes(1, 2)
print(swapped_array)

[[[ 0  4  8]
  [ 1  5  9]
  [ 2  6 10]
  [ 3  7 11]]

 [[12 16 20]
  [13 17 21]
  [14 18 22]
  [15 19 23]]]
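
To make the effect of `swapaxes(1, 2)` easier to see, the sketch below checks the shapes before and after, and confirms that the result matches an explicit `transpose` with the last two axes exchanged:

```python
import numpy as np

array = np.arange(24).reshape((2, 3, 4))
swapped = array.swapaxes(1, 2)

print(array.shape, swapped.shape)

# swapped[i, j, k] refers to the same element as array[i, k, j].
print(swapped[1, 3, 2] == array[1, 2, 3])

# Equivalent formulation with an explicit axis order.
same = np.array_equal(swapped, array.transpose(0, 2, 1))
print(same)
```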


## Data Aggregation

In [32]:
import numpy as np
array = np.array([1, 2, 3, 4, 5])
total_sum = np.sum(array)
print(total_sum) 

15
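
For multi-dimensional arrays, aggregation functions take an `axis` argument that selects the dimension to collapse. A minimal sketch with an illustrative 2D array `table`:

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

column_sums = table.sum(axis=0)  # collapse rows: one sum per column
row_sums = table.sum(axis=1)     # collapse columns: one sum per row
grand_total = table.sum()        # no axis: sum of every element

print(column_sums, row_sums, grand_total)
```

The same `axis` argument works for `np.mean`, `np.median`, `np.std`, and most other reductions.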


In [33]:
data = np.array([1, 2, 3, 4, 5])
mean_value = np.mean(data)
print("Mean:", mean_value)  

Mean: 3.0


In [34]:
median_value = np.median(data)
print("Median:", median_value) 

Median: 3.0


In [35]:
std_dev = np.std(data)  # population standard deviation (ddof=0 by default)
print("Standard Deviation:", std_dev) 

Standard Deviation: 1.4142135623730951
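
By default `np.std` divides by `N`, giving the population standard deviation. When the data is a sample from a larger population, passing `ddof=1` divides by `N - 1` instead. A short comparison on the same values:

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])

population_std = np.std(data)       # divides by N      -> sqrt(10 / 5)
sample_std = np.std(data, ddof=1)   # divides by N - 1  -> sqrt(10 / 4)

print("Population std:", population_std)
print("Sample std:", sample_std)
```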


In [36]:
total_sum = np.sum(data)
print("Sum:", total_sum)

Sum: 15


## Data Analysis

In [38]:
data1 = np.array([1, 2, 3, 4, 5])
data2 = np.array([5, 4, 3, 2, 1])

# Compute correlation coefficient
correlation = np.corrcoef(data1, data2)[0, 1]
print("Correlation coefficient:", correlation) 

Correlation coefficient: -0.9999999999999999


In [39]:
data = np.array([10, 12, 12, 13, 12, 100])  # 100 is an outlier

# Calculate Z-scores
mean = np.mean(data)
std_dev = np.std(data)
z_scores = (data - mean) / std_dev

# Identify outliers
outliers = data[np.abs(z_scores) > 2]
print("Outliers:", outliers)  # Output: Outliers: [100]


Outliers: [100]


In [40]:
data = np.array([10, 20, 30, 40, 50])

# Calculate percentiles
percentile_25 = np.percentile(data, 25)
percentile_75 = np.percentile(data, 75)

print("25th Percentile:", percentile_25)  # Output: 25th Percentile: 20.0
print("75th Percentile:", percentile_75)  # Output: 75th Percentile: 40.0


25th Percentile: 20.0
75th Percentile: 40.0
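
`np.percentile` also accepts a list of percentiles, which makes it easy to compute quartiles and the interquartile range (IQR) in one call. A sketch on the same data (the names `q1`, `q2`, `q3`, and `iqr` are illustrative):

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

# Request several percentiles at once; the result is an array.
q1, q2, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

print("Quartiles:", q1, q2, q3)
print("IQR:", iqr)
```

The 50th percentile is the median, so `q2` matches `np.median(data)`.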


In [41]:
# Create a large dataset
large_data = np.random.rand(1000000)  # 1 million random numbers

# Compute mean and standard deviation efficiently
mean_large = np.mean(large_data)
std_dev_large = np.std(large_data)

print("Mean of large dataset:", mean_large)
print("Standard Deviation of large dataset:", std_dev_large)

Mean of large dataset: 0.5005314478299429
Standard Deviation of large dataset: 0.2888059302178745
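
The exact values above change on every run because the random numbers are not seeded. One way to make such a cell reproducible is NumPy's `Generator` API with a fixed seed; the seed value `42` below is an arbitrary choice for illustration:

```python
import numpy as np

# A seeded generator produces the same sequence on every run.
rng = np.random.default_rng(42)
large_data = rng.random(1_000_000)  # uniform samples in [0, 1)

mean_large = large_data.mean()
std_dev_large = large_data.std()

# For a uniform distribution on [0, 1), the theoretical mean is 0.5
# and the theoretical standard deviation is sqrt(1/12), about 0.2887.
print("Mean of large dataset:", mean_large)
print("Standard Deviation of large dataset:", std_dev_large)
```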


## Advantages of NumPy

NumPy offers significant advantages over traditional Python data structures for numerical computations:

1. Performance: NumPy's array operations run in compiled C code over whole arrays (vectorization), which is typically much faster than looping over a Python list element by element.
2. Memory Efficiency: NumPy arrays store fixed-size elements of a single dtype in one block of memory, whereas a Python list stores pointers to separate Python objects. This reduces per-element overhead and improves cache locality.
3. Convenient Operations: NumPy supports element-wise operations and broadcasting, simplifying complex mathematical computations and reducing the need for explicit loops.
4. Advanced Functions: NumPy provides a vast array of mathematical functions, including linear algebra, statistical analysis, and Fourier transforms, which are not natively available in Python lists.
5. Multi-dimensional Arrays: NumPy handles multi-dimensional arrays (e.g., matrices, tensors) efficiently, enabling sophisticated data manipulations and analyses.
6. Integration: NumPy integrates with other scientific libraries such as SciPy, pandas, and Matplotlib, which build on its array type to form a robust ecosystem for data science and analysis.
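
The performance and memory points can be illustrated with a rough sketch. The size `n` and all variable names are illustrative, and the list memory estimate relies on `sys.getsizeof`, so the figure is specific to CPython and only approximate:

```python
import sys
import numpy as np

n = 100_000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# Same computation two ways: an explicit Python loop vs. one vectorized expression.
squared_loop = [x * x for x in py_list]
squared_vec = np_array * np_array
print(np.array_equal(squared_vec, squared_loop))

# Rough memory comparison: the list holds pointers plus separate int objects,
# while the array holds 8 bytes per int64 element in a single buffer.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
array_bytes = np_array.nbytes
print("List (approx.):", list_bytes, "bytes")
print("Array data:", array_bytes, "bytes")
```

To compare speed, the two squaring lines can be timed with the `%timeit` magic in a notebook; the actual ratio depends on the machine and the array size.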

### Real-World Examples:

1. Machine Learning: In machine learning, NumPy is crucial for handling and preprocessing large datasets. For instance, in training neural networks, NumPy arrays are used to represent feature matrices and perform operations like matrix multiplication.
2. Financial Analysis: Financial analysts use NumPy for tasks such as modeling stock prices and calculating moving averages. The ability to handle large datasets and perform complex calculations efficiently is vital for real-time trading algorithms.
3. Scientific Research: In scientific research, NumPy is employed for simulations, data analysis, and visualization. For example, it is used in astrophysics to analyze large-scale astronomical data and in biology for genetic sequence analysis. 
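
As a small illustration of the moving-average use case mentioned above, the following sketch computes a simple moving average with `np.convolve`. The `prices` values and the window size are made up for the example, and this is only one of several ways to compute a moving average:

```python
import numpy as np

# Hypothetical daily closing prices (illustrative data only).
prices = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
window = 3

# Equal weights that sum to 1 turn the convolution into an average.
weights = np.ones(window) / window

# mode="valid" keeps only positions where the window fully overlaps the data,
# so the result has len(prices) - window + 1 entries.
moving_avg = np.convolve(prices, weights, mode="valid")
print(moving_avg)
```

Each output entry is the mean of three consecutive prices, so the first value averages 10, 11, and 12.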

Overall, NumPy’s efficiency and extensive functionality make it an essential tool in fields requiring high-performance numerical computations.

