### What is NumPy?

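NumPy (short for Numerical Python) is the fundamental Python package for scientific computing. Its core is a fast, memory-efficient array object, plus a large library of functions that operate on whole arrays at once.

#### Key Features:
- Works with 1D, 2D, and higher-dimensional arrays
- Fast, vectorized math operations (like `sum`, `mean`, etc.) that avoid slow Python loops
- Supports matrix operations and linear algebra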



# Using NumPy for Smart Fitness Score Calculation

## 🏋️ Scenario: Fitness Performance Scoring

You run a fitness platform that evaluates daily performance using three metrics:
- Minutes of exercise
- Liters of water consumed
- Hours of sleep

You want to combine these metrics into a single **daily fitness score** for each user.

---

## Fitness Score Formula

We will use a simple linear formula:

`score = w1 * exercise + w2 * water + w3 * sleep`


Where `w1`, `w2`, and `w3` reflect the importance of each metric.
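As a quick worked check of the formula, here is the arithmetic for the first sample day defined below (45 minutes of exercise, 2.0 liters of water, 6.5 hours of sleep) with weights 0.5, 0.2, and 0.3:

$$
\text{score} = 0.5 \times 45 + 0.2 \times 2.0 + 0.3 \times 6.5 = 22.5 + 0.4 + 1.95 = 24.85
$$

Note that the exercise term contributes 22.5 of the 24.85 points, partly because minutes are on a much larger numeric scale than liters or hours, not only because its weight is the largest.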

---

## Define the Weights and Sample Data


In [85]:
# Assigning importance to each factor
weights = [0.5, 0.2, 0.3]  # More emphasis on exercise

# Sample user data: [exercise_minutes, water_liters, sleep_hours]
day1 = [45, 2.0, 6.5]
day2 = [30, 1.5, 8.0]
day3 = [60, 3.0, 7.0]

In [87]:
def compute_score(day_data, weights):
    result = 0
    for i in range(len(day_data)):
        result += day_data[i] * weights[i]
    return result

# Example usage
print(compute_score(day1, weights))  # Output: Fitness score for day1
print(compute_score(day2, weights))
print(compute_score(day3, weights))


24.849999999999998
17.7
32.7
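The first score prints as `24.849999999999998` instead of exactly `24.85` because values such as 0.2 and 0.3 have no exact binary floating-point representation, so tiny rounding errors accumulate. This sketch reuses the same sample values in plain Python to show the usual remedies: round only for display, and compare floats with a tolerance instead of `==`.

```python
import math

weights = [0.5, 0.2, 0.3]
day1 = [45, 2.0, 6.5]

# Same weighted sum as compute_score, written with a generator expression
score = sum(x * w for x, w in zip(day1, weights))

print(score)                       # unrounded value, carries the tiny binary error
print(round(score, 2))             # 24.85: round only when displaying
print(math.isclose(score, 24.85))  # True: compare floats with a tolerance
print(0.1 + 0.2 == 0.3)            # False, for the same underlying reason
```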


### Using the `zip()` Function
The built-in `zip()` function combines several iterables (such as lists, tuples, or strings) element-wise. It returns an iterator of tuples, where the i-th tuple holds the i-th element of each input, and it stops as soon as the shortest input is exhausted.

In [88]:
list1 = [1, 2, 3]
list2 = ['a', 'b', 'c']
zipped_result = zip(list1, list2)

# Convert the zip object to a list to view the contents
print(list(zipped_result))

[(1, 'a'), (2, 'b'), (3, 'c')]


In [91]:
print(list(zip(day1, weights)))

[(45, 0.5), (2.0, 0.2), (6.5, 0.3)]


In [92]:
def compute_score(day_data, weights):
    result = 0
    for data, w in zip(day_data, weights):
        result += data * w
    return result

# Example usage
print(compute_score(day1, weights))

24.849999999999998


## Now Use NumPy

In [None]:
import numpy as np


### Convert Lists to Arrays

In [93]:
day1 = np.array([45, 2.0, 6.5])
weights = np.array([0.5, 0.2, 0.3])


In [94]:
type(day1)

numpy.ndarray

In [None]:
day1[0]

np.float64(45.0)

In [None]:
a=5.0
print(type(a))

<class 'float'>
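The two outputs above differ: indexing a NumPy array returns a NumPy scalar (`np.float64`), while `5.0` is a built-in Python `float`. In practice they mix freely, since `np.float64` is a subclass of Python's `float`. When you need a plain Python value (for example, to serialize it to JSON), `.item()` converts one element and `.tolist()` converts a whole array:

```python
import numpy as np

day1 = np.array([45, 2.0, 6.5])

first = day1[0]
print(type(first))                  # <class 'numpy.float64'>
print(isinstance(first, float))     # True: np.float64 subclasses float

plain = first.item()                # convert one element to a Python float
print(type(plain))                  # <class 'float'>

as_list = day1.tolist()             # whole array -> list of Python floats
print(as_list, type(as_list[0]))
```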


In [None]:
score = np.dot(day1, weights)
print("Fitness Score:", score)


Fitness Score: 24.849999999999998


### Or: Multiply Elementwise, Then Sum

In [None]:
print(day1*weights)

[22.5   0.4   1.95]


In [None]:
(day1 * weights).sum()


np.float64(24.849999999999998)

### Compare performance of Python loop vs NumPy dot product

In [None]:

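# Two lists of integers from 30 to 999,999
workout_minutes = list(range(30, 1000000))
calories_burned = list(range(30, 1000000))

# Convert to NumPy arrays; int64 is set explicitly so the large products
# cannot overflow where the default integer is 32-bit (e.g. Windows, NumPy < 2)
workout_np = np.array(workout_minutes, dtype=np.int64)
calories_np = np.array(calories_burned, dtype=np.int64)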


In [None]:
%%time
energy_score_loop = 0
for w, c in zip(workout_minutes, calories_burned):
    energy_score_loop += w * c
print(energy_score_loop)

333332833333491445
CPU times: user 201 ms, sys: 0 ns, total: 201 ms
Wall time: 203 ms


### Using NumPy vectorized dot product

In [None]:
%%time
energy_score_np = np.dot(workout_np, calories_np)
print(energy_score_np)

333332833333491445
CPU times: user 3.12 ms, sys: 143 µs, total: 3.27 ms
Wall time: 2.67 ms
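The `%%time` magic only works inside Jupyter/IPython. Below is a minimal sketch of the same comparison as a plain Python script using the standard-library `timeit` module. It first checks that both methods return the same total (the exact sum of i² for i from 30 to 999,999 is 333,332,833,333,491,445), then reports the best of three runs. The arrays are created with `dtype=np.int64` explicitly: on platforms whose default NumPy integer is 32-bit (for example Windows with NumPy versions before 2.0), these products would otherwise overflow silently. Timings will vary by machine.

```python
import timeit

import numpy as np

# Same data as the cells above: integers 30 .. 999,999
workout_minutes = list(range(30, 1_000_000))
calories_burned = list(range(30, 1_000_000))
# Explicit int64 so products up to ~1e12 cannot overflow a 32-bit default integer
workout_np = np.array(workout_minutes, dtype=np.int64)
calories_np = np.array(calories_burned, dtype=np.int64)


def loop_dot():
    """Pure-Python dot product over two lists (exact, arbitrary-precision ints)."""
    total = 0
    for w, c in zip(workout_minutes, calories_burned):
        total += w * c
    return total


def numpy_dot():
    """Vectorized dot product, converted back to a Python int."""
    return int(np.dot(workout_np, calories_np))


# Make sure both approaches agree before comparing speed
assert loop_dot() == numpy_dot()

# Best of 3 single runs each; min() filters out noise from other processes
loop_s = min(timeit.repeat(loop_dot, number=1, repeat=3))
numpy_s = min(timeit.repeat(numpy_dot, number=1, repeat=3))
print(f"Python loop: {loop_s * 1e3:.1f} ms")
print(f"NumPy dot:   {numpy_s * 1e3:.2f} ms")
print(f"Speedup:     ~{loop_s / numpy_s:.0f}x")
```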


### Evaluate a Week at Once

In [97]:
week_data = np.array([
    [45, 2.0, 6.5],
    [30, 1.5, 8.0],
    [60, 3.0, 7.0],
    [50, 2.5, 7.5],
    [40, 1.8, 6.0],
    [55, 2.2, 6.8],
    [35, 1.6, 7.2]
])
# 2D array: 7 days (rows) x 3 metrics (columns)

In [99]:
# print(week_data)
print(week_data.shape)

(7, 3)


In [95]:
x=np.array([1,2,3])
print(x.shape)

(3,)


In [96]:
weights = np.array([0.5, 0.2, 0.3])
# 1D array, shape (3,)

In [103]:

arr = np.array([[[1, 2, 3], [4, 5, 6]],
                [[1, 2, 3], [4, 5, 6]]])  # 2 blocks x 2 rows x 3 columns

print(arr.shape)

(2, 2, 3)


All elements in a NumPy array share a single data type. You can inspect it with the `.dtype` attribute:

In [104]:
arr = np.array([
    [8, 2, 1],
    [9, 3, 2],
    [7, 1, 2]
])
print("arr dtype:", arr.dtype)

arr dtype: int64


Introducing a single float makes NumPy upcast the whole array to a floating-point type:

In [105]:
# Introduce one float into the array
arr2 = np.array([
    [8.0, 2, 1],
    [9, 3, 2],
    [7, 1, 2]
])
print("arr2 dtype:", arr2.dtype)


arr2 dtype: float64
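NumPy picks one dtype that can hold every element, which is why a single float upcasts the whole array. Two related behaviors are worth knowing, sketched here on the integer array from above: `astype` converts explicitly (returning a new array), and writing a float into an existing integer array truncates it silently. Mixing numbers with strings falls back to a string dtype.

```python
import numpy as np

arr = np.array([[8, 2, 1], [9, 3, 2], [7, 1, 2]])  # integer dtype

# Explicit conversion: astype returns a new array, the original is unchanged
arr_f = arr.astype(np.float64)
print(arr.dtype.kind, arr_f.dtype)   # i float64

# Writing a float into an existing integer array truncates it silently
arr[0, 0] = 8.7
print(arr[0, 0])                     # 8

# Mixing numbers and strings falls back to a Unicode string dtype
mixed = np.array([1, 2.5, "3"])
print(mixed.dtype.kind)              # U
```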


In [106]:
weekly_scores = np.dot(week_data, weights)
print("Weekly Scores:", weekly_scores)


Weekly Scores: [24.85 17.7  32.7  27.75 22.16 29.98 19.98]


In [107]:
weekly_scores1 = week_data @ weights  # @ is matrix multiplication; same result as np.dot here
print("Weekly Scores1:", weekly_scores1)

Weekly Scores1: [24.85 17.7  32.7  27.75 22.16 29.98 19.98]
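With a 2D array, `week_data @ weights` (or `np.dot`) takes the dot product of every row with `weights`, so a `(7, 3)` array times a `(3,)` vector gives a `(7,)` vector: one score per day. The sketch below checks that this matches multiplying elementwise and summing across each row (`axis=1`, which relies on the broadcasting rules described later), then uses `np.argmax` and `.mean()` to summarize the week. Names other than `week_data` and `weights` are illustrative.

```python
import numpy as np

week_data = np.array([
    [45, 2.0, 6.5],
    [30, 1.5, 8.0],
    [60, 3.0, 7.0],
    [50, 2.5, 7.5],
    [40, 1.8, 6.0],
    [55, 2.2, 6.8],
    [35, 1.6, 7.2],
])
weights = np.array([0.5, 0.2, 0.3])

# (7, 3) @ (3,) -> (7,): one dot product per row (day)
scores = week_data @ weights

# Same result spelled out: multiply each row by weights, then sum across columns
scores_manual = (week_data * weights).sum(axis=1)
print(np.allclose(scores, scores_manual))   # True

best_day = int(np.argmax(scores))           # 0-based index of the highest score
print("Best day:", best_day + 1, "score:", round(float(scores[best_day]), 2))
print("Weekly average:", round(float(scores.mean()), 2))
```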


# NumPy Analysis on Wine Quality Dataset

We will use NumPy to explore the [Wine Quality dataset](https://archive.ics.uci.edu/ml/datasets/wine+quality), which contains physicochemical and quality-related properties of red and white wine samples.

---

## Step 1: Load CSV Data using NumPy


In [None]:
# Load data from the UCI Wine Quality dataset (hosted on the UCI Machine Learning Repository)
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'


In [108]:
# Use genfromtxt with proper delimiter and header skip
wine_data = np.genfromtxt(url, delimiter=';', skip_header=1)
print("Shape of dataset:", wine_data.shape)


Shape of dataset: (1599, 12)


In [110]:
print(wine_data)

[[ 7.4    0.7    0.    ...  0.56   9.4    5.   ]
 [ 7.8    0.88   0.    ...  0.68   9.8    5.   ]
 [ 7.8    0.76   0.04  ...  0.65   9.8    5.   ]
 ...
 [ 6.3    0.51   0.13  ...  0.75  11.     6.   ]
 [ 5.9    0.645  0.12  ...  0.71  10.2    5.   ]
 [ 6.     0.31   0.47  ...  0.66  11.     6.   ]]


Download and save the CSV file locally, so later runs can load it without re-fetching the URL:

In [111]:
import urllib.request

# URL of the dataset
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'

# Local filename to save as
filename = 'winequality-red.csv'

# Download the file from `url` and save it locally under `filename`
urllib.request.urlretrieve(url, filename)

print(f"File saved as: {filename}")


File saved as: winequality-red.csv


In [112]:

wine_data = np.genfromtxt('winequality-red.csv', delimiter=';', skip_header=1)
print(wine_data.shape)

(1599, 12)


In [113]:
wine_data

array([[ 7.4  ,  0.7  ,  0.   , ...,  0.56 ,  9.4  ,  5.   ],
       [ 7.8  ,  0.88 ,  0.   , ...,  0.68 ,  9.8  ,  5.   ],
       [ 7.8  ,  0.76 ,  0.04 , ...,  0.65 ,  9.8  ,  5.   ],
       ...,
       [ 6.3  ,  0.51 ,  0.13 , ...,  0.75 , 11.   ,  6.   ],
       [ 5.9  ,  0.645,  0.12 , ...,  0.71 , 10.2  ,  5.   ],
       [ 6.   ,  0.31 ,  0.47 , ...,  0.66 , 11.   ,  6.   ]])

Check column-wise statistics (`axis=0` gives one value per column)

In [None]:
mean_vals = np.mean(wine_data, axis=0)
std_vals = np.std(wine_data, axis=0)
print("Means:\n", mean_vals)
print("Standard Deviations:\n", std_vals)


Means:
 [ 8.31963727  0.52782051  0.27097561  2.5388055   0.08746654 15.87492183
 46.46779237  0.99674668  3.3111132   0.65814884 10.42298311  5.63602251]
Standard Deviations:
 [1.74055180e+00 1.79003704e-01 1.94740214e-01 1.40948711e+00
 4.70505826e-02 1.04568856e+01 3.28850367e+01 1.88674370e-03
 1.54338181e-01 1.69453967e-01 1.06533430e+00 8.07316877e-01]
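The means are easier to read next to their column names. The names below are the header row of `winequality-red.csv`, in file order; they match the indices this notebook uses (volatile acidity 1, sulphates 9, alcohol 10, quality last). `COLUMNS` and `col()` are hypothetical helpers, not part of NumPy. If you prefer not to hard-code the list, the commented-out lines read the header from the downloaded file.

```python
# Header row of winequality-red.csv, in column order (index 0 .. 11)
COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol", "quality",
]

# Alternative: read the names from the file (header fields are quoted, ';'-separated)
# with open("winequality-red.csv") as f:
#     COLUMNS = f.readline().strip().replace('"', "").split(";")

_INDEX = {name: i for i, name in enumerate(COLUMNS)}


def col(name):
    """Hypothetical helper: return the column index for a header name."""
    return _INDEX[name]


print(col("alcohol"), col("sulphates"), col("volatile acidity"), col("quality"))

# Usage with the arrays from the cells above, e.g.:
# alcohol = wine_data[:, col("alcohol")]
# for name, m, s in zip(COLUMNS, mean_vals, std_vals):
#     print(f"{name:>22}: mean={m:8.3f}  std={s:8.3f}")
```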


In [115]:
x = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
row_means = np.mean(x, axis=1)  # axis=1: one mean per row
print(row_means)


[2. 5. 8.]
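The `axis` argument names the dimension that gets collapsed: `axis=0` aggregates down the rows and returns one value per column (which is why the wine means above have 12 entries), while `axis=1` aggregates across the columns and returns one value per row. Both side by side on the same small array:

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# axis=0 collapses the rows: one result per column
print(np.mean(x, axis=0))  # [4. 5. 6.]
# axis=1 collapses the columns: one result per row
print(np.mean(x, axis=1))  # [2. 5. 8.]
# no axis: a single value over all elements
print(np.mean(x))          # 5.0
```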


### Indexing and Filtering Examples

Filter wines with alcohol > 10

In [116]:
wine_data[:,10]>10

array([False, False, False, ...,  True,  True,  True])

In [119]:
high_alcohol = wine_data[wine_data[:, 10] > 10]
print(high_alcohol)
print("High alcohol wines:", high_alcohol.shape[0])


[[ 7.5    0.5    0.36  ...  0.8   10.5    5.   ]
 [ 7.5    0.5    0.36  ...  0.8   10.5    5.   ]
 [ 8.5    0.28   0.56  ...  0.75  10.5    7.   ]
 ...
 [ 6.3    0.51   0.13  ...  0.75  11.     6.   ]
 [ 5.9    0.645  0.12  ...  0.71  10.2    5.   ]
 [ 6.     0.31   0.47  ...  0.66  11.     6.   ]]
High alcohol wines: 852


Filter wines with quality (last column) ≥ 7

In [None]:
good_wines = wine_data[wine_data[:, -1] >= 7]
print("Good quality wines:", good_wines.shape[0])


Good quality wines: 217
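To combine conditions, use the elementwise operators `&` (and), `|` (or), and `~` (not) rather than Python's `and`/`or`, and wrap each comparison in parentheses, because `&` binds more tightly than `>` or `>=`. Summing a boolean mask counts its `True` values. The arrays below are small made-up stand-ins for `wine_data[:, 10]` and `wine_data[:, -1]`, not values from the dataset.

```python
import numpy as np

# Made-up stand-ins for wine_data[:, 10] (alcohol) and wine_data[:, -1] (quality)
alcohol = np.array([9.4, 10.5, 12.1, 9.8, 11.0])
quality = np.array([5, 7, 8, 6, 7])

# Parentheses are required: & binds more tightly than >= and >
mask = (quality >= 7) & (alcohol > 10)
print(mask)                     # [False  True  True False  True]

print(mask.sum())               # 3: True counts as 1
print(np.count_nonzero(mask))   # 3: same count
print(np.where(mask)[0])        # [1 2 4]: row indices where mask is True

# | is "or", ~ is "not"
print(((quality < 6) | (alcohol > 12)).sum())  # 2
print((~mask).sum())                           # 2
```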


In [120]:
lst = [1, 2, 3, 4, 5, 6]
print(lst[:5])  # first 5 elements; the stop index 5 is excluded

[1, 2, 3, 4, 5]


Slice the first 5 rows and the first 5 columns (the same `[:5]` syntax, applied to each axis)

In [121]:
print(wine_data[:5, :5])


[[ 7.4    0.7    0.     1.9    0.076]
 [ 7.8    0.88   0.     2.6    0.098]
 [ 7.8    0.76   0.04   2.3    0.092]
 [11.2    0.28   0.56   1.9    0.075]
 [ 7.4    0.7    0.     1.9    0.076]]


Compute average alcohol content per quality level

In [122]:
np.unique(wine_data[:, -1])

array([3., 4., 5., 6., 7., 8.])

In [126]:
wine_data[:, -1] == 5

array([ True,  True,  True, ..., False,  True, False])

In [128]:
np.mean(wine_data[wine_data[:, -1] == 5][:,10])

np.float64(9.899706314243758)

In [None]:
for quality in np.unique(wine_data[:, -1]):
    avg_alcohol = np.mean(wine_data[wine_data[:, -1] == quality][:, 10])
    print(f"Quality {int(quality)}: Avg Alcohol = {avg_alcohol:.2f}")


Quality 3: Avg Alcohol = 9.96
Quality 4: Avg Alcohol = 10.27
Quality 5: Avg Alcohol = 9.90
Quality 6: Avg Alcohol = 10.63
Quality 7: Avg Alcohol = 11.47
Quality 8: Avg Alcohol = 12.09
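The loop above filters the full array once per quality level. A vectorized alternative, sketched here with `np.unique(..., return_inverse=True, return_counts=True)` and `np.bincount(..., weights=...)`, computes every group's count and mean in a single pass. The two short arrays are made-up stand-ins for the quality and alcohol columns; with the real data you would pass `wine_data[:, -1]` and `wine_data[:, 10]`.

```python
import numpy as np

# Made-up stand-ins for the quality and alcohol columns
quality = np.array([5.0, 6.0, 5.0, 7.0, 6.0, 5.0])
alcohol = np.array([9.4, 10.0, 9.8, 11.5, 10.4, 9.6])

# levels: sorted unique qualities; group: each row's position in levels;
# counts: how many rows fall into each level
levels, group, counts = np.unique(quality, return_inverse=True, return_counts=True)

# Sum alcohol per group in one pass, then divide by the group sizes
group_means = np.bincount(group, weights=alcohol) / counts

for q, n, m in zip(levels, counts, group_means):
    print(f"Quality {int(q)}: n={n}, avg alcohol = {m:.2f}")
```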


Compute correlation between alcohol and quality

In [None]:
alcohol = wine_data[:, 10]
quality = wine_data[:, -1]
correlation = np.corrcoef(alcohol, quality)[0, 1]
print("Correlation between alcohol and quality:", correlation)


Correlation between alcohol and quality: 0.47616632400113584
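`np.corrcoef` returns the full 2x2 correlation matrix, with 1s on the diagonal (each variable with itself); `[0, 1]` picks the alcohol/quality entry. The Pearson coefficient is the covariance divided by the product of the standard deviations, $r = \frac{\sum (x_i-\bar x)(y_i-\bar y)}{\sqrt{\sum (x_i-\bar x)^2 \sum (y_i-\bar y)^2}}$. A small sketch computing it by hand on made-up data and checking it against `np.corrcoef`:

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation: centered cross-products over the product of spreads."""
    xc = x - x.mean()
    yc = y - y.mean()
    return (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())


# Made-up data: y tends to rise with x
x = np.array([9.4, 9.8, 10.5, 11.0, 12.1])
y = np.array([5.0, 5.0, 6.0, 6.0, 7.0])

r_manual = pearson(x, y)
# corrcoef gives [[r_xx, r_xy], [r_yx, r_yy]]; [0, 1] is r_xy
r_numpy = np.corrcoef(x, y)[0, 1]
print(r_manual, r_numpy)
```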


### Compute Weighted Wine Score
Let’s say you care about three columns:

- Alcohol (index 10)
- Sulphates (index 9)
- Volatile acidity (index 1)

In [None]:
# Select relevant columns
features = wine_data[:, [10, 9, 1]]

# Assign weights: high alcohol is good, low volatile acidity is good
weights = np.array([0.4, 0.3, -0.3])

# Compute custom wine score
scores = features @ weights
print("Wine scores (first 5):", scores[:5])


Wine scores (first 5): [3.718 3.86  3.887 4.01  3.718]
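One thing to keep in mind with this score: the columns are on very different scales (alcohol averages around 10, while sulphates and volatile acidity average well below 1, as the column means above show), so the alcohol term dominates regardless of the weights. A common remedy, sketched here as an optional variant rather than part of the original analysis, is min-max scaling each column to [0, 1] before weighting. It uses column-wise `min`/`max` with `axis=0` and broadcasting to subtract and divide per column. The feature rows are made-up values; with the real data you would use `features` from the cell above. A column whose values are all equal would cause a division by zero and needs special handling.

```python
import numpy as np

# Made-up feature rows with columns: alcohol, sulphates, volatile acidity
features = np.array([
    [9.4, 0.56, 0.70],
    [9.8, 0.68, 0.88],
    [12.8, 0.75, 0.31],
    [10.0, 0.47, 1.10],
])
weights = np.array([0.4, 0.3, -0.3])

# Min-max scale each column to [0, 1]; (4, 3) and (3,) broadcast row by row
col_min = features.min(axis=0)
col_max = features.max(axis=0)
scaled = (features - col_min) / (col_max - col_min)

scores = scaled @ weights
print(scaled.round(2))
print(scores.round(3))
```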


## Array Arithmetic

NumPy lets you apply arithmetic operators such as `+`, `-`, `*`, `/`, and `%` directly to arrays. Each operation is applied elementwise, and the other operand can be either a single number (a scalar) or another array of the same shape.
Here are some useful examples:


In [129]:
import numpy as np

# Example arrays
matrixA = np.array([[4, 7, 2, 5],
                    [6, 3, 8, 1],
                    [0, 9, 2, 6]])

matrixB = np.array([[14, 11, 16, 12],
                    [13, 19, 10, 15],
                    [21, 14, 18, 13]])

# Matrix sum
print("# Matrix sum\n", matrixA + matrixB)

# Subtract arrays
print("\n# Subtract arrays\n", matrixB - matrixA)

# Add a scalar
print("\n# Add a scalar\n", matrixA + 2)

# Divide by scalar
print("\n# Divide by scalar\n", matrixA / 2)

# Elementwise multiplication
print("\n# Elementwise multiplication\n", matrixA * matrixB)

# Modulus with scalar
print("\n# Modulus with scalar\n", matrixA % 3)


# Matrix sum
 [[18 18 18 17]
 [19 22 18 16]
 [21 23 20 19]]

# Subtract arrays
 [[10  4 14  7]
 [ 7 16  2 14]
 [21  5 16  7]]

# Add a scalar
 [[ 6  9  4  7]
 [ 8  5 10  3]
 [ 2 11  4  8]]

# Divide by scalar
 [[2.  3.5 1.  2.5]
 [3.  1.5 4.  0.5]
 [0.  4.5 1.  3. ]]

# Elementwise multiplication
 [[ 56  77  32  60]
 [ 78  57  80  15]
 [  0 126  36  78]]

# Modulus with scalar
 [[1 1 2 2]
 [0 0 2 1]
 [0 0 2 0]]


## Array Broadcasting

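Broadcasting describes how NumPy handles operations between arrays of different shapes. Instead of requiring you to build arrays with exactly the same dimensions, NumPy conceptually "broadcasts" (stretches) the smaller array across the larger one so that their shapes match, without actually copying the data.

When operating on two arrays, NumPy compares their shapes element-wise, starting with the trailing (i.e. rightmost) dimension and working its way left. If one array has fewer dimensions, its shape is treated as if it were padded with 1s on the left. Two dimensions are compatible when they are equal or one of them is 1; a dimension of size 1 is stretched to match the other. If any pair of dimensions is incompatible, NumPy raises a `ValueError`.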
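The rule can be written out as a short function. This is an illustrative re-implementation for learning purposes (the name `broadcast_shape` is hypothetical); NumPy exposes the real computation as `np.broadcast_shapes` (NumPy 1.20+), which the example uses to cross-check the result on the shape pairs that appear in the cells below.

```python
import numpy as np


def broadcast_shape(shape_a, shape_b):
    """Illustrative re-implementation of NumPy's broadcasting rule for two shapes."""
    a, b = tuple(shape_a), tuple(shape_b)
    # Pad the shorter shape with 1s on the left so both have the same length
    ndim = max(len(a), len(b))
    a = (1,) * (ndim - len(a)) + a
    b = (1,) * (ndim - len(b)) + b
    result = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            result.append(da)      # equal, or b's size-1 dim stretches to da
        elif da == 1:
            result.append(db)      # a's size-1 dim stretches to db
        else:
            raise ValueError(f"shapes {shape_a} and {shape_b} are not compatible")
    return tuple(result)


# Shape pairs used in the cells of this section
pairs = [((3, 4), (4,)), ((2, 3), (3,)), ((2, 3), (2, 1)), ((3, 1), (3,)), ((1, 3, 5), (4, 3, 1))]
for sa, sb in pairs:
    print(sa, "+", sb, "->", broadcast_shape(sa, sb), "| NumPy:", np.broadcast_shapes(sa, sb))

incompatible_error = None
try:
    broadcast_shape((3, 4), (2,))
except ValueError as e:
    incompatible_error = e
    print("ValueError:", e)
```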

In [None]:
a = np.array([1, 2, 3])
b = 10
print(a + b)

[11 12 13]


2D array and 1D array


In [130]:
A = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)

B = np.array([10, 20, 30])  # shape (3,)
print(A + B)
# B is broadcast across each row of A

[[11 22 33]
 [14 25 36]]


In [131]:
matrixA = np.array([[4, 7, 2, 5],
                    [6, 3, 8, 1],
                    [0, 9, 2, 6]])
vectorC = np.array([2, 4, 6, 8])

print("matrixA.shape:", matrixA.shape)
print("vectorC.shape:", vectorC.shape)
print("\n# Broadcasting add\n", matrixA + vectorC)

vectorD = np.array([5, 7])
# This raises an error: trailing dimensions 4 and 2 are not compatible
try:
    matrixA + vectorD
except ValueError as e:
    print("\n# Broadcasting error:", e)


matrixA.shape: (3, 4)
vectorC.shape: (4,)

# Broadcasting add
 [[ 6 11  8 13]
 [ 8  7 14  9]
 [ 2 13  8 14]]

# Broadcasting error: operands could not be broadcast together with shapes (3,4) (2,) 


In [None]:
A = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)

B = np.array([[10],
              [20]])        # shape (2, 1): broadcast across the columns of A
print(A + B)


[[11 12 13]
 [24 25 26]]


In [None]:
a = np.array([1, 2, 3])
b = np.array([1, 2])
a + b  # This will raise an error because their shapes are not compatible!

ValueError: operands could not be broadcast together with shapes (3,) (2,) 

In [146]:
A = np.array([[1],
              [2],
              [3]])             # Shape (3, 1)
B = np.array([10, 20, 30])     # Shape (3,)
print(A.shape)
print(B.shape)

(3, 1)
(3,)
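These two shapes are compatible: `B` is treated as shape `(1, 3)`, so the `(3, 1)` column is stretched across three columns and the row is stretched down three rows. The sum is a `(3, 3)` array in which entry `[i, j]` equals `A[i, 0] + B[j]`, a quick way to build an "outer sum" table:

```python
import numpy as np

A = np.array([[1], [2], [3]])   # shape (3, 1): a column
B = np.array([10, 20, 30])      # shape (3,): behaves like a (1, 3) row

C = A + B                       # both operands stretched to (3, 3)
print(C.shape)                  # (3, 3)
print(C)
# [[11 21 31]
#  [12 22 32]
#  [13 23 33]]
```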


In [144]:
a = np.array([1, 2, 3])
b = np.array([4, 5])
print(a.shape, b.shape)
a + b

(3,) (2,)


ValueError: operands could not be broadcast together with shapes (3,) (2,) 

In [134]:
np.ones((4, 3, 5))

array([[[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]],

       [[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]],

       [[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]],

       [[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]]])

In [143]:
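a = np.ones((1, 3, 5))
b = np.ones((4, 3, 1))
print(a.shape)
print(b.shape)
# No error: comparing right to left, 5 vs 1, 3 vs 3, and 1 vs 4 are all
# compatible pairs, so the result has shape (4, 3, 5)
c = a + b

print(c.shape)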

(1, 3, 5)
(4, 3, 1)
(4, 3, 5)


## Array Comparison

Elementwise comparisons in NumPy return boolean arrays.
You can use operations like `==`, `!=`, `>`, `<`, `>=`, `<=`.

Example:


In [147]:
X = np.array([[5, 1, 8], [2, 4, 7]])
Y = np.array([[3, 1, 8], [2, 9, 6]])

print("# X == Y\n", X == Y)
print("\n# X >= Y\n", X >= Y)
print("\n# Count not equal:", (X != Y).sum())


# X == Y
 [[False  True  True]
 [ True False False]]

# X >= Y
 [[ True  True  True]
 [ True False  True]]

# Count not equal: 3
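An elementwise comparison gives a whole array of booleans, so it cannot be used directly as an `if` condition. To get a single answer, reduce it with `.all()` / `.any()`, or use `np.array_equal` (same shape and all elements equal). For floats, compare with a tolerance using `np.allclose`, since rounding makes exact `==` unreliable. A short sketch reusing `X` and `Y` from above:

```python
import numpy as np

X = np.array([[5, 1, 8], [2, 4, 7]])
Y = np.array([[3, 1, 8], [2, 9, 6]])

# One True/False for "same shape and every element equal"
print(np.array_equal(X, Y))           # False
print(np.array_equal(X, X.copy()))    # True

# Reduce the boolean array yourself
print((X == Y).all(), (X == Y).any())  # False True

# Floats: compare with a tolerance instead of ==
a = np.array([0.1 + 0.2, 1.0])
b = np.array([0.3, 1.0])
print(a == b)                # [False  True]
print(np.allclose(a, b))     # True

# Using an elementwise result directly in `if` is ambiguous and raises ValueError
ambiguous_error = None
try:
    if X == Y:
        print("equal")
except ValueError as e:
    ambiguous_error = e
    print("ValueError:", e)
```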


## Array Indexing and Slicing

You can select single elements, slices, or subarrays in NumPy arrays using indices and ranges.
Here are a few examples:


In [148]:
cube = np.array([
    [[ 3,  5,  7], [ 8, 10, 12]],
    [[13, 15, 17], [18, 20, 22]],
    [[23, 25, 27], [28, 30, 32]]
])

# print("# Shape:", cube.shape)

# Single element
print("\n# cube[2, 1, 0]:", cube[2, 1, 0])

# Subarray using ranges
print("\n# cube[1:, :, 1]:\n", cube[1:, :, 1])

# Mixing indices and ranges
print("\n# cube[1, :, 1:]:\n", cube[1, :, 1:])

# Fewer indices (returns 2D slice)
print("\n# cube[2]:\n", cube[2])



# cube[2, 1, 0]: 28

# cube[1:, :, 1]:
 [[15 20]
 [25 30]]

# cube[1, :, 1:]:
 [[15 17]
 [20 22]]

# cube[2]:
 [[23 25 27]
 [28 30 32]]
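One behavior worth knowing before modifying indexed results: basic indexing and slicing (integers and `start:stop` ranges, as in the cell above) return a *view* that shares memory with the original array, so writing to it changes `cube` as well. Call `.copy()` when you need an independent array. Fancy indexing with a list of indices, and boolean masks like those in the wine filtering cells, always return copies.

```python
import numpy as np

cube = np.array([
    [[ 3,  5,  7], [ 8, 10, 12]],
    [[13, 15, 17], [18, 20, 22]],
    [[23, 25, 27], [28, 30, 32]]
])

# Basic indexing and slicing return a view: same memory as cube
row = cube[2, 1]            # [28 30 32]
row[0] = -1
print(cube[2, 1, 0])        # -1: the original array changed too

# .copy() makes an independent array
block = cube[1].copy()
block[:] = 0
print(cube[1, 0, 0])        # 13: unchanged

# Fancy indexing (with a list of indices) always returns a copy
picked = cube[[0, 2]]
picked[0, 0, 0] = 99
print(cube[0, 0, 0])        # 3: unchanged

print(np.shares_memory(cube, row), np.shares_memory(cube, picked))  # True False
```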


## Different Ways to Initialize NumPy Arrays

NumPy offers many built-in functions to create arrays with preset or random values.  
Here are some useful examples with different shapes and values.  
Check the [official docs](https://numpy.org/doc/stable/reference/routines.array-creation.html) for more options!


In [None]:
import numpy as np

# All zeros array
zero_grid = np.zeros((4, 2))
print("# All zeros array\n", zero_grid)

# All ones, higher dimension
ones_cube = np.ones((2, 3, 2))
print("\n# All ones (3D)\n", ones_cube)

# Identity matrix
identity = np.eye(4)
print("\n# Identity matrix\n", identity)

# Random vector (0 to 1)
rand_vec = np.random.rand(6)
print("\n# Random vector\n", rand_vec)

# Random matrix, standard normal distribution
rand_matrix = np.random.randn(3, 4)
print("\n# Random matrix (normal dist)\n", rand_matrix)

# Array with all entries set to a fixed value
fixed_arr = np.full((3, 2), 77)
print("\n# All 77s\n", fixed_arr)

# Array with range and step (the stop value 50 is excluded)
range_arr = np.arange(5, 50, 7)
print("\n# Range with step\n", range_arr)

# 9 equally spaced points in an interval (both endpoints included)
eq_space = np.linspace(2, 18, 9)
print("\n# Evenly spaced in [2,18]\n", eq_space)


# All zeros array
 [[0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]]

# All ones (3D)
 [[[1. 1.]
  [1. 1.]
  [1. 1.]]

 [[1. 1.]
  [1. 1.]
  [1. 1.]]]

# Identity matrix
 [[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]

# Random vector
 [0.23978397 0.41837078 0.82905372 0.6075684  0.41140772 0.13890524]

# Random matrix (normal dist)
 [[ 1.26101764 -1.16950645  0.60976757  0.9550554 ]
 [-0.73022155 -0.39450002  0.02619099 -0.25443079]
 [ 0.36582914 -0.17520133 -0.53313406 -0.35327963]]

# All 77s
 [[77 77]
 [77 77]
 [77 77]]

# Range with step
 [ 5 12 19 26 33 40 47]

# Evenly spaced in [2,18]
 [ 2.  4.  6.  8. 10. 12. 14. 16. 18.]
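`np.random.rand` and `np.random.randn` use NumPy's legacy global random state, which is why the random values above change on every run. The NumPy documentation recommends the newer Generator API for new code: create a generator with `np.random.default_rng(seed)` and call methods on it. Seeding makes results reproducible; the seed value 42 below is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)       # seeded generator: same draws every run

rand_vec = rng.random(6)                   # uniform in [0, 1), like np.random.rand(6)
rand_matrix = rng.standard_normal((3, 4))  # standard normal, like np.random.randn(3, 4)
ints = rng.integers(1, 7, size=5)          # 5 dice rolls; the high end (7) is excluded

print(rand_vec)
print(rand_matrix)
print(ints)

# Re-creating the generator with the same seed reproduces the same draws
rng2 = np.random.default_rng(seed=42)
print(np.array_equal(rand_vec, rng2.random(6)))  # True
```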


# Assignment


Use the [Wine Quality Red dataset](https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv) from UCI.

**Rules:**
- Use only NumPy (not Pandas, not scikit-learn, not SciPy)
- Download, load, and manipulate the data with NumPy only

### 1. What is the shape of the data array? (How many wine samples are there, and how many columns?)

### 2. Calculate the average (mean) quality score of the red wines. (Hint: this is the mean of the last column in the data array.)

### 3. What are the minimum and maximum pH values observed in the dataset? (Use NumPy to find the smallest and largest pH.)

### 4. Determine the mean and standard deviation of the alcohol content in these wines. (Calculate the average alcohol percentage and how much it varies.)

### 5. List all the unique quality ratings present in the dataset. (Which distinct quality values appear? Use `np.unique` on the quality column.)

### 6. How many wine samples have a quality rating of 7? (Count the number of entries where the quality column is exactly 7.)

### 7. How many wines have an alcohol content greater than 10%? (Hint: create a boolean mask for alcohol > 10 and sum it, or use `np.where`.)

### 8. What is the average citric acid concentration across all red wines? (Compute the mean of the citric acid column.)

### 9. Determine the median residual sugar content in the dataset. (Hint: use `np.median`, or `np.percentile` with the 50th percentile.)

### 10. What is the 75th percentile (upper quartile) of the alcohol content? (In other words, 25% of the wines have an alcohol content above what value?)

### 11. Determine how many wines fall into each quality score category. (Find the count of samples for each unique quality value.) Which quality rating is most common in the red wine dataset?

### 12. Which five wine samples have the highest alcohol content? Identify their alcohol values and corresponding quality scores. (Hint: `np.argsort` sorts the alcohol column in ascending order, so the top 5 are the last 5 indices it returns.)

### 13. Do wines with higher alcohol content tend to have higher quality? To investigate, compare the average quality of wines with above-average alcohol to that of wines with below-average alcohol. (Calculate the mean quality for wines whose alcohol is above the overall average, and compare it to the mean quality for wines whose alcohol is below the average.)

### 14. Compute the average alcohol percentage for each quality score in the dataset. (Group the data by the quality column and calculate the mean alcohol content for each quality value. Which quality level has the highest average alcohol content?)

### 15. Define total acidity as the sum of fixed acidity and volatile acidity for each wine. First, calculate the total acidity for every sample. Next, add it as a new column to the dataset (so the array now has 13 columns). Which wine has the highest total acidity, and what is its quality rating? (Find the index of the maximum total acidity and check the quality at that index.)

### 16. Compute the Pearson correlation coefficient between alcohol content and quality. (Use NumPy to see whether higher alcohol correlates with higher quality. A positive correlation would indicate that as alcohol increases, quality tends to increase.)

### 17. How many wines have quality ≥ 7 and alcohol > 10%? (In other words, count the wines that are high quality and also have relatively high alcohol. Use a boolean condition combining both criteria.)

### 18. Which feature exhibits the greatest variability among all the wines? (Calculate the standard deviation of each of the 11 physicochemical feature columns and identify which feature has the highest standard deviation.)

### 19. Compare the mean volatile acidity of high-quality wines versus lower-quality wines. (Consider wines with quality ≥ 7 as "high quality" and those with quality ≤ 6 as "lower quality". Compute the average volatile acidity in each group. Do high-quality wines have lower volatile acidity on average?)

### 20. What is the highest quality score attained by any red wine in the dataset, and how many samples achieved that score? (Find the maximum value in the quality column, and count how many wines have that maximum quality.)