### NumPy Array

In this mini project, you are provided with a dataset representing the daily sales (in USD) of 6 different products recorded over a period of 10 consecutive days. Each day's sales are structured as a row in a 2D NumPy array, with each column corresponding to a specific product. The objective is to analyze this sales data using Python and NumPy by performing various operations such as calculating total and average sales, identifying trends, comparing product performance, and extracting useful insights to support data-driven decision-making.



In [1]:
import numpy as np

sales_data = np.array([
    [200, 220, 250, 210, 180, 190],  # Day 1
    [230, 240, 260, 200, 195, 205],  # Day 2
    [210, 215, 255, 220, 185, 200],  # Day 3
    [205, 225, 270, 215, 190, 195],  # Day 4
    [215, 230, 265, 225, 200, 210],  # Day 5
    [225, 235, 275, 230, 205, 215],  # Day 6
    [235, 245, 280, 240, 210, 220],  # Day 7
    [245, 255, 290, 250, 215, 225],  # Day 8
    [255, 265, 300, 260, 220, 230],  # Day 9
    [265, 275, 310, 270, 225, 235]   # Day 10
])
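
Since every later cell depends on which axis is "days" and which is "products", here is a minimal sketch of the axis convention on a toy array. The values and the names `toy_sales`, `per_product`, and `per_day` are hypothetical and only illustrate the layout described above.

```python
import numpy as np

# Hypothetical toy data: 3 days (rows) x 2 products (columns)
toy_sales = np.array([
    [10, 20],  # Day 1
    [30, 40],  # Day 2
    [50, 60],  # Day 3
])

# axis=0 collapses the rows, leaving one value per product (column)
per_product = toy_sales.sum(axis=0)

# axis=1 collapses the columns, leaving one value per day (row)
per_day = toy_sales.sum(axis=1)

print(toy_sales.shape)       # (3, 2)
print(per_product.tolist())  # [90, 120]
print(per_day.tolist())      # [30, 70, 110]
```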

#### Q1. Mathematical Operations
 *  a. Calculate the total sales per product over 10 days
 *  b. Calculate the average daily sales per product
 *  c. Increase each sale by a 5% commission using broadcasting
 *  d. Apply the square root to all sales values (for testing purposes)

In [2]:
# a. Calculate the total sales per product over 10 days

total_sales_per_product = np.sum(sales_data,axis=0)
for i,total in enumerate(total_sales_per_product , start=1):
    print(f"Product {i} : ${total}")

Product 1 : $2285
Product 2 : $2405
Product 3 : $2755
Product 4 : $2320
Product 5 : $2025
Product 6 : $2125


In [3]:
# b. Calculate the average daily sales per product

average_daily_sales_per_product = np.mean(sales_data,axis=0)
for i,total in enumerate(average_daily_sales_per_product , start=1):
    print(f"Product {i} : ${total}")

Product 1 : $228.5
Product 2 : $240.5
Product 3 : $275.5
Product 4 : $232.0
Product 5 : $202.5
Product 6 : $212.5


In [4]:
# c. Increase each sale by a 5% commission using broadcasting

# Multiplying by 1.05 adds 5% to every value; the scalar is broadcast across the whole array.
sales_with_commission = np.round(sales_data * 1.05, 2)
print(sales_with_commission)

[[210.   231.   262.5  220.5  189.   199.5 ]
 [241.5  252.   273.   210.   204.75 215.25]
 [220.5  225.75 267.75 231.   194.25 210.  ]
 [215.25 236.25 283.5  225.75 199.5  204.75]
 [225.75 241.5  278.25 236.25 210.   220.5 ]
 [236.25 246.75 288.75 241.5  215.25 225.75]
 [246.75 257.25 294.   252.   220.5  231.  ]
 [257.25 267.75 304.5  262.5  225.75 236.25]
 [267.75 278.25 315.   273.   231.   241.5 ]
 [278.25 288.75 325.5  283.5  236.25 246.75]]


In [5]:
# d. Apply the square root to all sales values (for testing purposes)

square_root_on_sales_values = np.round(np.sqrt(sales_data),2)
print(square_root_on_sales_values)

[[14.14 14.83 15.81 14.49 13.42 13.78]
 [15.17 15.49 16.12 14.14 13.96 14.32]
 [14.49 14.66 15.97 14.83 13.6  14.14]
 [14.32 15.   16.43 14.66 13.78 13.96]
 [14.66 15.17 16.28 15.   14.14 14.49]
 [15.   15.33 16.58 15.17 14.32 14.66]
 [15.33 15.65 16.73 15.49 14.49 14.83]
 [15.65 15.97 17.03 15.81 14.66 15.  ]
 [15.97 16.28 17.32 16.12 14.83 15.17]
 [16.28 16.58 17.61 16.43 15.   15.33]]


#### Q2. Broadcasting Concepts
* a. Create a 1D array and add it to each day’s sales using broadcasting:
  bonus_array = np.array([10, 20, 15, 25, 30, 5])

* b. Add a $50 flat bonus to each sale using broadcasting.

In [6]:
# a. Create a 1D array (one bonus per product) and add it to each day’s sales using broadcasting.

bonus_array = np.array([10, 20, 15, 25, 30, 5])

sales_data_with_bonus = sales_data + bonus_array

print(sales_data_with_bonus)

[[210 240 265 235 210 195]
 [240 260 275 225 225 210]
 [220 235 270 245 215 205]
 [215 245 285 240 220 200]
 [225 250 280 250 230 215]
 [235 255 290 255 235 220]
 [245 265 295 265 240 225]
 [255 275 305 275 245 230]
 [265 285 315 285 250 235]
 [275 295 325 295 255 240]]
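
The addition above works because the trailing dimension of the sales array matches the length of the bonus array. As a rough sketch of the shape rule, the example below contrasts a per-product bonus with a per-day bonus; the arrays `sales`, `per_product_bonus`, and `per_day_bonus` are hypothetical and not part of the project data.

```python
import numpy as np

# Hypothetical data: 4 days (rows) x 3 products (columns)
sales = np.arange(12).reshape(4, 3)
per_product_bonus = np.array([1, 2, 3])       # shape (3,): one value per column
per_day_bonus = np.array([10, 20, 30, 40])    # shape (4,): one value per row

# (4, 3) + (3,): trailing dimensions match, so the bonus is repeated down the rows
with_product_bonus = sales + per_product_bonus

# (4, 3) + (4,) would raise a ValueError; adding a new axis gives shape (4, 1),
# which is then repeated across the columns
with_day_bonus = sales + per_day_bonus[:, np.newaxis]

print(with_product_bonus[0].tolist())  # [1, 3, 5]
print(with_day_bonus[3].tolist())      # [49, 50, 51]
```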


In [7]:
# b. Add a $50 flat bonus to each sale using broadcasting.

flat_bonus = sales_data + 50
print(flat_bonus)

[[250 270 300 260 230 240]
 [280 290 310 250 245 255]
 [260 265 305 270 235 250]
 [255 275 320 265 240 245]
 [265 280 315 275 250 260]
 [275 285 325 280 255 265]
 [285 295 330 290 260 270]
 [295 305 340 300 265 275]
 [305 315 350 310 270 280]
 [315 325 360 320 275 285]]


#### Q3. Statistical Analysis
* a. Find the mean, median, variance, and standard deviation of the entire dataset.
* b. Find the maximum and minimum sale value and calculate the range.
* c. Calculate the interquartile range (IQR) of all values.

In [8]:
# a. Find the mean, median, variance, and standard deviation of the entire dataset.

mean=np.round(np.mean(sales_data),2)
median=np.round(np.median(sales_data),2)
variance=np.round(np.var(sales_data),2)
standard_deviation=np.round(np.std(sales_data),2)

print("Mean : ",mean)
print("Median : ",median)
print("Variance : ",variance)
print("Standard deviation : " ,standard_deviation)

Mean :  231.92
Median :  225.0
Variance :  870.08
Standard deviation :  29.5
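
`np.var` and `np.std` divide by N by default (`ddof=0`), which treats the data as a full population. If the 10 days were instead regarded as a sample, `ddof=1` would be the usual choice. A small sketch using the Day 1 row; the names `population_var` and `sample_var` are illustrative.

```python
import numpy as np

# Day 1 values from the dataset
day1 = np.array([200, 220, 250, 210, 180, 190])

population_var = np.var(day1)        # ddof=0 (default): divide by N
sample_var = np.var(day1, ddof=1)    # ddof=1: divide by N - 1

print(round(float(population_var), 2))
print(round(float(sample_var), 2))
```

The two results differ by a factor of N / (N - 1), so the gap shrinks as the dataset grows.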


In [9]:
# b. Find the maximum and minimum sale value and calculate the range.

maximum = np.round(np.max(sales_data),2) 
minimum = np.round(np.min(sales_data),2)
sale_range = np.round(maximum-minimum,2)

print("Maximum : " , maximum)
print("Minimum : " , minimum)
print("Range : " , sale_range)

Maximum :  310
Minimum :  180
Range :  130


In [10]:
# c. Calculate the interquartile range (IQR) of all values.

# Flatten the 2D array to 1D for global IQR
flat_sales = sales_data.flatten()

# Calculate Q1 and Q3
q1 = np.percentile(flat_sales, 25)
q3 = np.percentile(flat_sales, 75)

# Calculate IQR
iqr = q3 - q1

print(f"Q1 (25th percentile): {q1}")
print(f"Q3 (75th percentile): {q3}")
print(f"Interquartile Range (IQR): {iqr}")

Q1 (25th percentile): 210.0
Q3 (75th percentile): 255.0
Interquartile Range (IQR): 45.0
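
As a possible shortcut, `np.percentile` accepts a list of percentiles and flattens its input when no `axis` is given, so both quartiles can be obtained in one call. A minimal sketch on the Day 1 row; the variable names are illustrative.

```python
import numpy as np

# Day 1 values from the dataset
day1 = np.array([200, 220, 250, 210, 180, 190])

# Request both quartiles at once (default linear interpolation)
q1, q3 = np.percentile(day1, [25, 75])
iqr = q3 - q1

print(q1, q3, iqr)
```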


#### Q4. Logical & Comparison Operations
* a. Identify and list all sales greater than $250.
  
* b. Replace all values greater than $300 with 300 (cap the max sale).

  
* c. Count how many times sales were between $200 and $250.

  
* d. Create a new array showing only the sales below the mean.

In [11]:
# a. Identify and list all sales greater than $250.

sales_greater_than_250 = sales_data[sales_data>250]
print(f"Sales Data Greater Than 250 : {sales_greater_than_250}")

Sales Data Greater Than 250 : [260 255 270 265 275 280 255 290 255 265 300 260 265 275 310 270]


In [12]:
# b. Replace all values greater than $300 with 300 (cap the max sale).

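# Note: this boolean-mask assignment modifies sales_data in place,
# so every later cell works with the capped values.
sales_data[sales_data > 300] = 300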

print(f"Sales data after capping values above $300: {sales_data}")

Sales data after capping values above $300: [[200 220 250 210 180 190]
 [230 240 260 200 195 205]
 [210 215 255 220 185 200]
 [205 225 270 215 190 195]
 [215 230 265 225 200 210]
 [225 235 275 230 205 215]
 [235 245 280 240 210 220]
 [245 255 290 250 215 225]
 [255 265 300 260 220 230]
 [265 275 300 270 225 235]]
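
If the original values need to stay available for later steps, one alternative is to build a capped copy rather than assigning into the array. The sketch below uses `np.minimum` on hypothetical values; `original` and `capped` are illustrative names.

```python
import numpy as np

# Hypothetical values, two of them above the 300 cap
original = np.array([[290, 300, 310],
                     [250, 305, 180]])

# np.minimum returns a new array; `original` is left untouched
capped = np.minimum(original, 300)

print(capped.tolist())       # [[290, 300, 300], [250, 300, 180]]
print(int(original.max()))   # 310
```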


In [13]:
# c. Count how many times sales were between 200 and 250 (exclusive: 200 and 250 themselves are not counted).

sales_count = np.count_nonzero((sales_data > 200) & (sales_data < 250))
print(f"The Count Of Sales Between 200 and 250 : {sales_count} ")

The Count Of Sales Between 200 and 250 : 32 
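
Whether "between" includes the endpoints changes the count. A small sketch of both readings on hypothetical values; the names `exclusive_count` and `inclusive_count` are illustrative.

```python
import numpy as np

# Hypothetical values, including both boundary values
values = np.array([199, 200, 205, 250, 260])

# Strict comparison: the endpoints 200 and 250 are excluded
exclusive_count = np.count_nonzero((values > 200) & (values < 250))

# Non-strict comparison: the endpoints are included
inclusive_count = np.count_nonzero((values >= 200) & (values <= 250))

print(exclusive_count, inclusive_count)  # 1 3
```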


In [14]:
# d. Create a new array showing only the sales below the mean.

sales_below_mean = sales_data[sales_data<np.mean(sales_data)]
print(f"Sales Data Less Than Mean : {sales_below_mean} ")

Sales Data Less Than Mean : [200 220 210 180 190 230 200 195 205 210 215 220 185 200 205 225 215 190
 195 215 230 225 200 210 225 230 205 215 210 220 215 225 220 230 225] 


#### Q5. Searching, Sorting, and Final Summary
* a. Sort the sales of Day 5 in ascending order.
* b. Find the day (row index) with the highest total sales.
* c. Calculate column-wise means (average per product).
* d. Calculate row-wise means (average per day).
* e. Print the overall average sales.

In [15]:
# a. Sort the sales of Day 5 in ascending order.

day5_sorted = np.sort(sales_data[4])

print(f"Day 5 sales sorted in ascending order: {day5_sorted}")

Day 5 sales sorted in ascending order: [200 210 215 225 230 265]


In [16]:
# b. Find the day (row index) with the highest total sales.

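# Sum each row (day) first, then take the index of the largest daily total.
daily_totals = np.sum(sales_data, axis=1)
highest_row_index = np.argmax(daily_totals)
print(f"Day (row index) with the highest total sales : {highest_row_index}")

Day (row index) with the highest total sales : 9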
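
A note on `np.argmax`: when called on a 2D array without an `axis` argument, it returns a position in the flattened array, not a row index. The sketch below, on a hypothetical 2x2 array named `grid`, shows the difference between the largest single value and the row with the largest total.

```python
import numpy as np

# Hypothetical data: row 0 holds the largest single value,
# row 1 has the largest row total
grid = np.array([[1, 9],
                 [5, 6]])

# Position of the maximum in the flattened array [1, 9, 5, 6]
flat_index = np.argmax(grid)

# Convert the flat position back to (row, column) coordinates
row, col = np.unravel_index(flat_index, grid.shape)

# Row with the highest total: row sums are [10, 11]
best_row = np.argmax(grid.sum(axis=1))

print(int(flat_index), (int(row), int(col)), int(best_row))  # 1 (0, 1) 1
```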


In [17]:
# c. Calculate column-wise means (average per product).
# Note: Product 3 differs from Q1b because sales_data was capped at 300 in Q4b.

average_column_wise_sales_per_product = np.mean(sales_data,axis=0)
for i,total in enumerate(average_column_wise_sales_per_product , start=1):
    print(f"Product {i} : ${total}")

Product 1 : $228.5
Product 2 : $240.5
Product 3 : $274.5
Product 4 : $232.0
Product 5 : $202.5
Product 6 : $212.5


In [18]:
# d. Calculate row-wise means (average per day).

average_row_wise_sales_per_day = np.round(np.mean(sales_data,axis=1),2)
for i,average in enumerate(average_row_wise_sales_per_day , start=1):
    print(f"Day {i} : ${average}")

Day 1 : $208.33
Day 2 : $221.67
Day 3 : $214.17
Day 4 : $216.67
Day 5 : $224.17
Day 6 : $230.83
Day 7 : $238.33
Day 8 : $246.67
Day 9 : $255.0
Day 10 : $261.67


In [19]:
# e. Print the overall average sales.

overall_average_sales = np.round(np.mean(sales_data))
print(f"Overall Average Sales : {overall_average_sales}")

Overall Average Sales : 232.0


#### Q6. Bonus
* Create a function highlight_outliers(data) that returns all values more than 2 standard deviations above or below the mean.

In [20]:
def highlight_outliers(data):
    mean = np.mean(data)
    std_dev = np.std(data)
    
    # Define bounds for outliers
    lower_bound = mean - 2 * std_dev
    upper_bound = mean + 2 * std_dev
    
    # Return the values that fall outside the 2 standard deviation range
    outliers = data[(data < lower_bound) | (data > upper_bound)]
    return outliers

print(f"Outliers : {highlight_outliers(sales_data)}")

Outliers : [290 300 300]
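
The 2 standard deviation cutoff is a convention rather than a fixed rule, so one possible variation is to make it a parameter. The sketch below is an illustration only: the function name `values_beyond_k_std`, the parameter `k`, and the sample data are all hypothetical.

```python
import numpy as np

def values_beyond_k_std(data, k=2.0):
    """Return the values lying more than k standard deviations from the mean."""
    data = np.asarray(data)
    mean = np.mean(data)
    std_dev = np.std(data)  # population standard deviation (ddof=0)
    mask = np.abs(data - mean) > k * std_dev
    return data[mask]

# Hypothetical data: nine identical values and one extreme value
sample = np.array([10, 10, 10, 10, 10, 10, 10, 10, 10, 100])

print(values_beyond_k_std(sample).tolist())         # [100]
print(values_beyond_k_std(sample, k=3.5).tolist())  # []
```

In this sample the mean is 19 and the standard deviation is 27, so the value 100 sits 3 standard deviations above the mean: it is flagged with `k=2.0` but not with `k=3.5`.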