# Foundations of Data Science - CMU Portugal Academy

> In this second lab, we will use NumPy features to solve a set of problems. Use NumPy operations as much as possible in all exercises.
> 
> Instructors:
>   - David Semedo (df.semedo@fct.unl.pt)
>   - Rafael Ferreira (rah.ferreira@fct.unl.pt)
> 

In [1]:
import numpy as np

## Ex. 1 - Computing Project Costs with Matrix Multiplication


You are working for a construction company that handles multiple projects, each requiring different quantities of materials like steel, cement, and wood. Each material has a specific cost per unit, and your task is to help the project managers quickly estimate the total cost for each project.

Using the provided data on material requirements and costs, use matrix multiplication to compute the total cost for each project.

This will help the company plan budgets more effectively and ensure that projects stay within financial limits.



In [2]:
# Material requirements for each project (rows: projects, columns: materials)
material_requirements = np.array([[10, 5, 2],  # Project 1
                                  [7, 8, 3],   # Project 2
                                  [5, 4, 1],   # Project 3
                                  [12, 7, 4]]) # Project 4

# Costs per unit for each material
material_costs = np.array([100, 50, 25])  # Cost for Material 1, Material 2, Material 3

In [3]:
total_costs = np.dot(material_requirements, material_costs)

In [4]:
# Print the results
print("Total costs for each project: ", total_costs)

Total costs for each project:  [1300 1175  725 1650]


In [5]:
# validate the results
assert np.all(total_costs == np.array([1300, 1175,  725, 1650])), "Try Again!"
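
As a cross-check, the same totals can be obtained with the `@` operator or with an explicit broadcast-and-sum. The following is a minimal sketch that redefines the arrays from above so it runs on its own; the names `costs_matmul` and `costs_broadcast` are illustrative and not part of the lab.

```python
import numpy as np

material_requirements = np.array([[10, 5, 2],
                                  [7, 8, 3],
                                  [5, 4, 1],
                                  [12, 7, 4]])
material_costs = np.array([100, 50, 25])

# For a 2-D array times a 1-D array, @ gives the same matrix-vector product as np.dot
costs_matmul = material_requirements @ material_costs

# Equivalent: scale each column by its unit cost, then sum across materials
costs_broadcast = (material_requirements * material_costs).sum(axis=1)

print(costs_matmul)
print(costs_broadcast)
```

Both expressions compute, for each project, the sum over materials of quantity times unit cost, which is exactly what the matrix-vector product does row by row.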

## Ex. 2 - Finding the Peak Temperatures Over the Year for Each City


You are tasked with analyzing temperature data for multiple cities over a full year.
Each city has provided daily temperature recordings for 365 days, and your goal is to identify, for each city, the hottest day of the year and its temperature.

Organize the results in a dictionary and use NumPy's `np.argmax` to efficiently find the peak temperatures and when they occurred. Your function should return a dictionary whose keys are the city names (from the list named `cities`) and whose values are dictionaries of the form `{"day": day_number, "temperature": temperature_reading}`. Here, `day_number` should be a number between 1 and 365, and `temperature_reading` a float with the corresponding temperature.
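
To illustrate how `np.argmax` behaves along an axis, here is a small sketch on made-up data. The array `toy_temperatures` and its values are hypothetical and unrelated to the lab's dataset.

```python
import numpy as np

# Hypothetical readings: 2 cities (rows) x 4 days (columns)
toy_temperatures = np.array([[12.5, 30.1, 18.0, 25.4],
                             [21.0, 19.5, 33.2, 33.0]])

# 0-based index of the hottest day in each row
peak_idx = np.argmax(toy_temperatures, axis=1)

# Pick each row's peak value by pairing row indices with the peak column indices
peak_values = toy_temperatures[np.arange(toy_temperatures.shape[0]), peak_idx]

print(peak_idx + 1)   # 1-based day numbers
print(peak_values)
```

Note that `np.argmax` returns 0-based positions, so 1 is added to obtain a day number in the range 1 to 365.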

In [6]:
np.random.seed(42) # Fix the seed for reproducibility

cities = ['Lisbon', 'Porto', 'Sintra', 'Faro', 'Coimbra']

city_temperatures = np.random.uniform(low=-10, high=50, size=(len(cities), 365,))
city_temperatures = np.round(city_temperatures, 2)
print("City_temperatures shape: ", city_temperatures.shape)

City_temperatures shape:  (5, 365)


In [7]:
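def analyze_peak_temperatures(temperatures):
    # Maps each city name (from the global `cities` list) to its hottest day and temperature
    results = {}

    # 0-based index of the hottest day in each row (one row per city)
    peak_indexes = np.argmax(temperatures, axis=1)

    for i in range(len(cities)):
        city_data = {
            "day": int(peak_indexes[i]) + 1,  # convert to a 1-based day number
            "temperature": float(np.round(temperatures[i, peak_indexes[i]], 2))
        }
        results[cities[i]] = city_data

    return results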


In [8]:
temps_dict = analyze_peak_temperatures(city_temperatures)
print(temps_dict)

{'Lisbon': {'day': 262, 'temperature': 49.4}, 'Porto': {'day': 167, 'temperature': 49.98}, 'Sintra': {'day': 325, 'temperature': 49.88}, 'Faro': {'day': 115, 'temperature': 49.83}, 'Coimbra': {'day': 5, 'temperature': 49.96}}


In [9]:
# validate the answer
assert temps_dict == {'Lisbon': {'day': 262, 'temperature': 49.4}, 'Porto': {'day': 167, 'temperature': 49.98}, 'Sintra': {'day': 325, 'temperature': 49.88}, 'Faro': {'day': 115, 'temperature': 49.83}, 'Coimbra': {'day': 5, 'temperature': 49.96}}

## Ex. 3 - Salesperson's Performance and Income

In a competitive tech company, the sales team is challenged to maximize revenue by selling three key products: smartphones, laptops, and tablets. Each salesperson's performance will be evaluated based on the revenue they generate and the total units they sell.

Salespeople who sell more than 100 units receive a higher commission (10%), while those who sell 100 units or fewer receive 5%. Write a function that:

* Computes each salesperson's total units sold.
* Assigns the correct commission rate.
* Computes the total revenue based on product sales and applies the commission.

Hint: Consider the `np.where` (https://numpy.org/doc/stable/reference/generated/numpy.where.html) function.
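
As a quick illustration of the hint, `np.where(condition, x, y)` picks `x` where the condition holds and `y` elsewhere, element by element. The sketch below uses hypothetical unit totals (`toy_units`), not the lab's sales data.

```python
import numpy as np

# Hypothetical unit totals for four salespeople
toy_units = np.array([80, 100, 101, 250])

# 10% for strictly more than 100 units, 5% otherwise
toy_rates = np.where(toy_units > 100, 0.10, 0.05)

print(toy_rates)
```

A total of exactly 100 units falls in the 5% bracket, because the condition is a strict inequality.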

In [10]:
# Sales data: each row is a salesperson, each column is the number of units sold for a specific product

sales_data = np.array([[50, 30, 20],
                       [100, 200, 150],
                       [70, 40, 30],
                       [90, 100, 110]])

# Prices of the products
prices = np.array([20, 30, 50])

In [11]:
def calculate_sales_income(sales_data, prices):
    
    # Calculate the total units sold by each salesperson
    total_units_sold = np.sum(sales_data, axis=1)
    
    # Assign commission based on total units sold
    commissions = np.where(total_units_sold > 100, 0.10, 0.05)
    
    # Calculate the total revenue for each salesperson (sales * prices)
    total_revenue = np.dot(sales_data, prices)
    
    # Calculate the final income for each salesperson
    final_income = total_revenue * (1.0 + commissions)
    
    return final_income, total_revenue, commissions

In [12]:

final_income, total_revenue, commissions = calculate_sales_income(sales_data, prices)

print("Total revenue for each salesperson: ", total_revenue)
print("Commission rates: ", commissions)
print("Final income for each salesperson: ", final_income)

Total revenue for each salesperson:  [ 2900 15500  4100 10300]
Commission rates:  [0.05 0.1  0.1  0.1 ]
Final income for each salesperson:  [ 3045. 17050.  4510. 11330.]


In [13]:
# validate answer
assert np.all(total_revenue == np.array([ 2900, 15500,  4100, 10300])), "Try Again!"
assert np.all(commissions == np.array([0.05, 0.1, 0.1, 0.1])), "Try Again!"
assert np.all(np.round(final_income) == np.array([ 3045., 17050.,  4510., 11330.])), "Try Again!"

## Ex. 4 - Analyzing Sensor Data for a Space Mission


You are a data scientist working for a space agency, and you’ve just received sensor data from a probe exploring a distant planet. The probe is equipped with three sensors that measure different environmental conditions, such as radiation levels, temperature, and atmospheric pressure.

The mission control team has tasked you with analyzing the data to find safe areas where the probe can land. Based on the data, safe landing zones must meet the following criteria:

* Radiation levels (first sensor) must be below a certain threshold (values greater than 0.5 indicate high radiation and should be avoided).
* Temperature (second sensor) must be stable and below a critical point (anything above 0.75 could indicate volatile conditions).

Filter the sensor data to find rows that represent safe landing zones, allowing the team to focus on those areas for further exploration. 

You should use indexing through multiplication to efficiently filter the data based on these two conditions.
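
The idea behind indexing through multiplication is that multiplying two boolean arrays behaves like a logical AND. The following is a small sketch on made-up readings; `toy_readings` and the variable names are hypothetical.

```python
import numpy as np

# Hypothetical readings: columns are radiation, temperature, pressure
toy_readings = np.array([[0.2, 0.5, 0.9],
                         [0.7, 0.3, 0.1],
                         [0.4, 0.8, 0.5],
                         [0.1, 0.1, 0.3]])

low_radiation = toy_readings[:, 0] <= 0.5
stable_temperature = toy_readings[:, 1] < 0.75

# The product of two boolean arrays is True only where both are True
both = low_radiation * stable_temperature

# Use the combined mask to keep only the rows that satisfy both conditions
safe_rows = toy_readings[both]

print(both)
print(safe_rows)
```

The same mask could be written as `low_radiation & stable_temperature`; the multiplication form is used here because the exercise asks for it.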

In [15]:
np.random.seed(42)  # For reproducibility
sensor_data = np.random.rand(1000, 3)  # Generate 1000 random landing zone measurements from 3 sensors

In [16]:
# Your code goes here

def find_safe_landing_zones(sensor_readings):

    radiation_safe = sensor_readings[:, 0] <= 0.5  # Condition for safe radiation levels
    temperature_stable = sensor_readings[:, 1] < 0.75  # Condition for stable temperature

    safe_zones = radiation_safe * temperature_stable  # Only areas that meet both conditions

    safe_landing_zones = sensor_readings[safe_zones == 1]  # Extract rows that satisfy both conditions

    return safe_landing_zones


In [17]:
safe_landing_zones = find_safe_landing_zones(sensor_data)
print("Safe landing zones (filtered sensor data):\n", safe_landing_zones)
print("Safe landing zones shape: ", safe_landing_zones.shape)
print(f"Number of safe landing zones: {safe_landing_zones.shape[0]}")

Safe landing zones (filtered sensor data):
 [[0.18340451 0.30424224 0.52475643]
 [0.43194502 0.29122914 0.61185289]
 [0.13949386 0.29214465 0.36636184]
 ...
 [0.34534195 0.33561045 0.97852547]
 [0.38051771 0.16303534 0.78620565]
 [0.30978786 0.29004553 0.87141403]]
Safe landing zones shape:  (355, 3)
Number of safe landing zones: 355


In [18]:
# validate the answer
assert safe_landing_zones.shape == (355, 3), "Try Again!"
assert len(safe_landing_zones) == 355, "Try Again!"