# NumPy Data Explorer

## Project Overview
This project explores the core features of NumPy, a fundamental Python library for numerical
and scientific computing. The objective is to demonstrate practical understanding of
NumPy arrays, indexing and slicing, mathematical and statistical operations, reshaping
and broadcasting, file handling, and performance comparison with standard Python lists.

No external dataset was provided, so a synthetic dataset was created using NumPy for
demonstration purposes.


## Importing Libraries

NumPy is used for numerical operations, while the time module is used to measure
performance differences between NumPy arrays and Python lists.

In [2]:
import numpy as np
import time

## Array Creation

A two-dimensional NumPy array was created to simulate student scores across three subjects.
Each row represents a student, and each column represents a subject.

In [58]:
# Create a 2D NumPy array representing student scores
scores = np.array([
    [65, 70, 75],
    [80, 85, 90],
    [55, 60, 58],
    [90, 92, 95]
])

scores

array([[65, 70, 75],
       [80, 85, 90],
       [55, 60, 58],
       [90, 92, 95]])
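
As a quick check on the array created above, the following sketch inspects its basic attributes. The integer `dtype` depends on the platform, so it is printed here rather than assumed.

```python
import numpy as np

scores = np.array([
    [65, 70, 75],
    [80, 85, 90],
    [55, 60, 58],
    [90, 92, 95]
])

print(scores.shape)  # (4, 3): 4 students, 3 subjects
print(scores.ndim)   # 2 dimensions
print(scores.size)   # 12 elements in total
print(scores.dtype)  # a platform-dependent integer type
```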

## Indexing and Slicing

Indexing allows access to individual elements, while slicing enables extraction
of subsets of data from an array.

In [63]:
# Access a specific element (2nd student, 3rd subject)
scores[1, 2]

# Access the first student's scores - an entire row
scores[0]

# Access all scores for the second subject - an entire column
scores[:, 1]

# Extract/slice a sub-array (first two students and first two subjects)
scores[:2, :2]

array([[65, 70],
       [80, 85]])
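
The slicing examples above can be extended with negative indices and step values. The sketch below also illustrates that a basic slice is a view onto the original array, so an explicit `.copy()` is needed before making independent changes. The variable names are illustrative only.

```python
import numpy as np

scores = np.array([
    [65, 70, 75],
    [80, 85, 90],
    [55, 60, 58],
    [90, 92, 95]
])

last_student = scores[-1]   # negative index: the last row
every_other = scores[::2]   # step of 2: rows 0 and 2

# A basic slice shares memory with the original array
view = scores[:2, :2]
print(np.shares_memory(scores, view))  # True

# A copy can be modified without affecting the original
independent = scores[:2, :2].copy()
independent[0, 0] = 0
print(scores[0, 0])  # 65
```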

## Mathematical Operations

NumPy supports fast, vectorized mathematical operations that are applied
element-wise across arrays.

In [70]:
# Increase all scores by 5 points (Add 5 to all elements)
scores + 5

array([[ 70,  75,  80],
       [ 85,  90,  95],
       [ 60,  65,  63],
       [ 95,  97, 100]])

In [72]:
# Double all scores (Multiply all elements by 2)
scores * 2

array([[130, 140, 150],
       [160, 170, 180],
       [110, 120, 116],
       [180, 184, 190]])
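
Element-wise operations are not limited to addition and multiplication. As a minimal sketch, division produces a floating-point array and a comparison produces a boolean array. The pass mark of 60 is a hypothetical value chosen for illustration, not part of the original dataset.

```python
import numpy as np

scores = np.array([
    [65, 70, 75],
    [80, 85, 90],
    [55, 60, 58],
    [90, 92, 95]
])

# Division is element-wise and yields floats
fractions = scores / 100

# Comparison is element-wise and yields booleans
PASS_MARK = 60  # hypothetical threshold
passed = scores >= PASS_MARK

print(fractions[0])   # first student's scores as fractions of 100
print(passed.sum())   # number of passing scores: 10
```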

## Axis-wise and Statistical Operations

Axis-wise operations compute a result along rows (`axis=1`) or columns (`axis=0`).
Statistical functions summarize the dataset.

In [75]:
# Average score per student (row-wise)
np.mean(scores, axis=1)

array([70.        , 85.        , 57.66666667, 92.33333333])

In [77]:
# Average score per subject (column-wise)
np.mean(scores, axis=0)

array([72.5 , 76.75, 79.5 ])

In [79]:
# Other statistics
print(np.sum(scores))
print(np.min(scores))
print(np.max(scores))
print(np.std(scores))

915
55
95
13.796889746122735
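
The same `axis` argument works with other reductions. The following sketch, using the same `scores` array, finds each student's best subject, the student with the highest average, the per-subject score range, and the overall median.

```python
import numpy as np

scores = np.array([
    [65, 70, 75],
    [80, 85, 90],
    [55, 60, 58],
    [90, 92, 95]
])

# Column index of the highest score for each student (row-wise)
best_subject = np.argmax(scores, axis=1)
print(best_subject)  # [2 2 1 2]

# Row index of the student with the highest average
top_student = np.argmax(scores.mean(axis=1))
print(top_student)  # 3

# Range (max - min) of each subject (column-wise)
subject_range = scores.max(axis=0) - scores.min(axis=0)
print(subject_range)  # [35 32 37]

# Median over all scores
print(np.median(scores))  # 77.5
```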


## Reshaping Arrays

Reshaping changes the structure of an array without modifying the underlying data.


In [31]:
# Create a 1D array
arr = np.arange(12)

# Reshape into 3 rows and 4 columns
arr.reshape(3, 4)


array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [84]:
# Create a one-dimensional array
numbers = np.arange(12)

numbers

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [86]:
# Reshape into a 3x4 matrix
numbers.reshape(3, 4)

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
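
To illustrate that reshaping leaves the underlying data untouched, the sketch below checks that the reshaped array shares memory with the original. It also shows `-1`, which asks NumPy to infer one dimension, and the error raised when the requested shape does not fit the number of elements.

```python
import numpy as np

numbers = np.arange(12)

# -1 lets NumPy infer the column count: 12 / 3 = 4
matrix = numbers.reshape(3, -1)
print(matrix.shape)  # (3, 4)

# For a contiguous array like this one, reshape returns a view
print(np.shares_memory(numbers, matrix))  # True

# Writing through the view changes the original array
matrix[0, 0] = 99
print(numbers[0])  # 99

# A shape that does not match the element count raises ValueError
try:
    numbers.reshape(5, -1)
except ValueError as error:
    print("Cannot reshape:", error)
```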

## Broadcasting

Broadcasting allows NumPy to perform operations between arrays of different shapes
without explicitly copying data, improving efficiency.


In [99]:
# Bonus scores added per subject (1D array added to a 2D array)
bonus = np.array([2, 3, 5])

# Apply bonus to all students
scores + bonus

array([[ 67,  73,  80],
       [ 82,  88,  95],
       [ 57,  63,  63],
       [ 92,  95, 100]])
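
Broadcasting compares shapes from the trailing dimension backwards: dimensions must be equal, or one of them must be 1. As a sketch of the row-wise counterpart to the per-subject bonus, a per-student bonus needs shape `(4, 1)` so it stretches across the three subjects. The `student_bonus` values are hypothetical.

```python
import numpy as np

scores = np.array([
    [65, 70, 75],
    [80, 85, 90],
    [55, 60, 58],
    [90, 92, 95]
])

# Hypothetical per-student bonus as a column vector of shape (4, 1)
student_bonus = np.array([1, 2, 3, 4]).reshape(4, 1)

adjusted = scores + student_bonus  # (4, 3) + (4, 1) -> (4, 3)
print(adjusted)

# A flat array of length 4 does not line up with the 3 columns
try:
    scores + np.array([1, 2, 3, 4])
except ValueError as error:
    print("Shapes are incompatible:", error)
```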

## Saving and Loading Arrays

NumPy provides efficient methods for saving arrays to disk and loading them back
for reuse in future computations.

In [102]:
# Save the array to a file
np.save("student_scores.npy", scores)

In [104]:
# Load the array from file
loaded_scores = np.load("student_scores.npy")
loaded_scores

array([[65, 70, 75],
       [80, 85, 90],
       [55, 60, 58],
       [90, 92, 95]])
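
The round trip can be verified programmatically. The sketch below writes to a temporary directory (an assumption made here so that no files are left behind) and also shows the text-based `np.savetxt` / `np.loadtxt` pair as an alternative when a human-readable file is preferred.

```python
import os
import tempfile

import numpy as np

scores = np.array([
    [65, 70, 75],
    [80, 85, 90],
    [55, 60, 58],
    [90, 92, 95]
])

with tempfile.TemporaryDirectory() as tmp_dir:
    # Binary .npy format: preserves shape and dtype
    npy_path = os.path.join(tmp_dir, "student_scores.npy")
    np.save(npy_path, scores)
    loaded = np.load(npy_path)

    # Plain-text CSV: readable, but the dtype must be given when loading
    csv_path = os.path.join(tmp_dir, "student_scores.csv")
    np.savetxt(csv_path, scores, fmt="%d", delimiter=",")
    from_csv = np.loadtxt(csv_path, delimiter=",", dtype=int)

print(np.array_equal(loaded, scores))    # True
print(loaded.dtype == scores.dtype)      # True
print(np.array_equal(from_csv, scores))  # True
```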

## Performance Comparison: NumPy vs Python Lists

NumPy arrays store elements of a single type in contiguous memory and apply operations
in compiled loops, so they are generally much faster than Python lists for large
element-wise operations.

In [131]:
# Create large datasets
size = 1000000

numpy_array = np.arange(size)
python_list = list(range(size))

In [155]:
# NumPy operation (perf_counter is a monotonic, high-resolution clock)
start = time.perf_counter()
numpy_array * 2
numpy_duration = time.perf_counter() - start
print("NumPy array time:", numpy_duration)

# Python list operation
start = time.perf_counter()
[x * 2 for x in python_list]
list_duration = time.perf_counter() - start
print("Python list time:", list_duration)

NumPy array time: 0.0029976367950439453
Python list time: 0.10867929458618164
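
A single timed run can be noisy. As a hedged alternative, the sketch below uses the standard-library `timeit` module to repeat each measurement and keep the best run. The `number` and `repeat` values are arbitrary choices, and the timings will differ between machines.

```python
import timeit

import numpy as np

size = 1_000_000
numpy_array = np.arange(size)
python_list = list(range(size))

RUNS = 3  # executions per measurement (arbitrary choice)

# Best of 3 measurements, averaged over RUNS executions each
numpy_best = min(timeit.repeat(lambda: numpy_array * 2, number=RUNS, repeat=3)) / RUNS
list_best = min(
    timeit.repeat(lambda: [x * 2 for x in python_list], number=RUNS, repeat=3)
) / RUNS

print(f"NumPy array time:  {numpy_best:.6f} s")
print(f"Python list time:  {list_best:.6f} s")
print(f"Ratio (list / NumPy): {list_best / numpy_best:.1f}")
```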


## Conclusion

This project demonstrated the essential capabilities of NumPy, including array creation,
indexing and slicing, mathematical and statistical operations, reshaping and broadcasting,
saving and loading arrays, and performance comparison with Python lists. The results
highlight NumPy's efficiency and importance in numerical computing and data analysis.
