# Scientific Programming with Python Assignment: The Performance of Numpy Versus Regular Python Lists when Computing a Loss Function

Author: **SciPro_19757**

Date: December 1, 2023 

Goal: Understand the NumPy library [1], strengthen knowledge of Pandas [2], and learn about loss functions using a given dataset.


### Problem and Input Data:

Weather researchers have developed a machine learning model for predicting rainfall and evaporation on various days across different locations in Australia [3, 4]. The collected experimental and model observables include the following parameters:

- Date: The date of observation.
- Location: The weather station location.
- MinTemp: Minimum temperature (℃).
- MaxTemp: Maximum temperature (℃).
- Rainfall: Rainfall amount in 24 hours (mm).
- Evaporation: Evaporation amount in 24 hours (mm).
- Sunshine: Sunshine duration in 24 hours (h).
- WindGustSpeed: Maximum wind gust speed in 24 hours (km/h).
- RainToday: Indicates whether it rained on that day (yes if precipitation ≥ 1 mm, no if precipitation < 1 mm).
- RainTomorrow: Indicates whether it will rain on the following day (yes if precipitation ≥ 1 mm, no if precipitation < 1 mm).

A loss function, such as the one in equation (1), quantifies how closely a model's predicted observables match the target data [5].

$$ \text{Loss} = \alpha \cdot |R^{\text{Pred.}} - R^{\text{Exp.}}| + \beta \cdot |E^{\text{Pred.}} - E^{\text{Exp.}}| \ \ \ \ \ \ \ \ (1),$$ 

where $\alpha$ is the rainfall weighting factor, $\beta$ is the evaporation weighting factor, $R^{\text{Pred.}}$ and $R^{\text{Exp.}}$ are the predicted and experimental rainfall values, and $E^{\text{Pred.}}$ and $E^{\text{Exp.}}$ are the corresponding evaporation values.
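
As a quick check of equation (1), the sketch below evaluates the loss for a single observation. The numbers are made up for illustration and are not taken from the dataset.

```python
# Hypothetical single-day values (illustrative only, not from the dataset).
alpha, beta = 0.5, 0.5        # rainfall and evaporation weighting factors
r_pred, r_exp = 2.0, 1.0      # predicted / experimental rainfall (mm)
e_pred, e_exp = 10.0, 12.0    # predicted / experimental evaporation (mm)

# Equation (1): weighted sum of the two absolute errors.
loss = alpha * abs(r_pred - r_exp) + beta * abs(e_pred - e_exp)
print(loss)  # 0.5 * 1.0 + 0.5 * 2.0 = 1.5
```

With equal weights, a 1 mm rainfall error and a 2 mm evaporation error contribute 0.5 and 1.0 to the loss, respectively.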

Loss functions have become more widely used with the growing popularity of machine learning, where model parameters must be optimized to improve modeling outcomes [6]. The target data will be used to accomplish the following tasks.

**Task 1** Read in the data contained in weather_experiment.csv and weather_prediction.csv files.

**Task 2** Create user-defined functions that encode and compute the loss function (Equation 1). One function:
1. performs the calculation using regular Python lists (i.e. do not use Numpy or ndarrays), and the other
2. performs the calculation using Numpy (i.e. maximizing the use of Numpy's library and performance).

**Task 3** Compare the speed of the two Task 2 functions by computing the loss for $\alpha = \beta = 0.5$. Use the timeit library (i.e. timeit.timeit) for this, and set its "number" parameter to 100 [7].



The initial step involves importing the necessary libraries.

In [607]:
import timeit

import numpy as np
import pandas as pd

To fulfill Task 1, the data are read into two dataframes. The first rows of each dataframe are displayed to confirm that loading succeeded.

In [614]:
df_prediction = pd.read_csv("weather_prediction.csv")
# Inspect the prediction table itself; the timing cells below read its
# "Rainfall Pred." and "Evaporation Pred." columns.
df_prediction.head()


In [615]:
df_experiment = pd.read_csv("weather_experiment.csv")
df_experiment.head()

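| Unnamed: 0 | Date | Location | MinTemp | MaxTemp | Rainfall | Evaporation | Sunshine | WindGustSpeed | RainToday | RainTomorrow |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2009-01-01 | Cobar | 17.9 | 35.2 | 0.0 | 12.0 | 12.3 | 48.0 | No | No |
| 1 | 2009-01-02 | Cobar | 18.4 | 28.9 | 0.0 | 14.8 | 13.0 | 37.0 | No | No |
| 2 | 2009-01-04 | Cobar | 19.4 | 37.6 | 0.0 | 10.8 | 10.6 | 46.0 | No | No |
| 3 | 2009-01-05 | Cobar | 21.9 | 38.4 | 0.0 | 11.4 | 12.2 | 31.0 | No | No |
| 4 | 2009-01-06 | Cobar | 24.2 | 41.0 | 0.0 | 11.2 | 8.4 | 35.0 | No | No |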
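
Because the loss is computed element by element, row i of the prediction table has to describe the same observation as row i of the experiment table. The sketch below shows one way to check this on two tiny stand-in tables. The prediction values are invented, and it is an assumption that the prediction file also carries `Date` and `Location` columns; only the column names `Rainfall Pred.` and `Evaporation Pred.` are taken from the timing cells in this notebook.

```python
import io

import pandas as pd

# Tiny stand-in tables (hypothetical values); the real data live in
# weather_experiment.csv and weather_prediction.csv.
experiment_csv = io.StringIO(
    "Date,Location,Rainfall,Evaporation\n"
    "2009-01-01,Cobar,0.0,12.0\n"
    "2009-01-02,Cobar,0.0,14.8\n"
)
prediction_csv = io.StringIO(
    "Date,Location,Rainfall Pred.,Evaporation Pred.\n"
    "2009-01-01,Cobar,0.2,11.5\n"
    "2009-01-02,Cobar,0.0,15.0\n"
)
df_exp = pd.read_csv(experiment_csv)
df_pred = pd.read_csv(prediction_csv)

# Same number of rows, and the same (Date, Location) key in every row.
same_length = len(df_exp) == len(df_pred)
same_keys = df_exp[["Date", "Location"]].equals(df_pred[["Date", "Location"]])
print(same_length, same_keys)
```

If either check fails, the two tables would need to be aligned (for example by merging on the key columns) before the loss is computed.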


Let's proceed to implement both functions for Task 2. One function will leverage the computational capabilities of NumPy arrays and functions to calculate the loss function, while the other function will rely on standard Python lists for the same purpose.

In [610]:
def loss_with_numpy_arrays(alpha: float,
                           beta: float,
                           rainfall_prediction: np.ndarray,
                           rainfall_experiment: np.ndarray,
                           evaporation_prediction: np.ndarray,
                           evaporation_experiment: np.ndarray) -> np.ndarray:
    """
    Calculate the loss using NumPy arrays based on the specified formula.

    Parameters:
        - alpha (float): Rainfall weighting factor.
        - beta (float): Evaporation weighting factor.
        - rainfall_prediction (np.ndarray): Predicted rainfall values.
        - rainfall_experiment (np.ndarray): Experimental rainfall values.
        - evaporation_prediction (np.ndarray): Predicted evaporation values.
        - evaporation_experiment (np.ndarray): Experimental evaporation values.
    Returns:
        np.ndarray: Element-wise loss calculated using the provided formula.
    """
    # isinstance needs the array type np.ndarray; np.array is a factory
    # function, not a type, and would make isinstance itself raise.
    if not all(isinstance(val, (float, np.ndarray)) for val in
               [alpha, beta, rainfall_prediction, rainfall_experiment, evaporation_prediction, evaporation_experiment]):
        raise TypeError("Input values must be of type float or np.ndarray")
    # Vectorized evaluation of Equation (1) over all observations at once.
    loss = (alpha*np.abs(rainfall_prediction - rainfall_experiment)
            + beta*np.abs(evaporation_prediction - evaporation_experiment))
    return loss

In [611]:
def loss_with_lists(alpha: float,
                    beta: float,
                    rainfall_prediction: list,
                    rainfall_experiment: list,
                    evaporation_prediction: list,
                    evaporation_experiment: list) -> list:
    """
    Calculate the loss using lists based on the specified formula.

    Parameters:
    - alpha (float): Rainfall weighting factor.
    - beta (float): Evaporation weighting factor.
    - rainfall_prediction (list): Predicted rainfall values.
    - rainfall_experiment (list): Experimental rainfall values.
    - evaporation_prediction (list): Predicted evaporation values.
    - evaporation_experiment (list): Experimental evaporation values.

    Returns:
    List[float]: Loss calculated using the provided formula.
    """
    if not all(isinstance(val, (float, list)) for val in
               [alpha, beta, rainfall_prediction, rainfall_experiment, evaporation_prediction, evaporation_experiment]):
        raise TypeError("Input values must be of type float or list")
    loss_list = []
    # zip stops at the shortest of the four lists, so any extra trailing
    # values in a longer list are ignored.
    for r_pred, r_exp, e_pred, e_exp in zip(rainfall_prediction,
                                            rainfall_experiment,
                                            evaporation_prediction,
                                            evaporation_experiment):
        loss_list.append(alpha*abs(r_pred - r_exp) + beta*abs(e_pred - e_exp))
    return loss_list
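
Before comparing speed, it is worth confirming that the two approaches give the same numbers. The sketch below uses compact stand-ins for the two functions (`loss_numpy` and `loss_lists` are shortened, hypothetical versions without the type checks) and a few invented values.

```python
import numpy as np


def loss_numpy(alpha, beta, r_pred, r_exp, e_pred, e_exp):
    """Vectorized form of equation (1)."""
    return alpha * np.abs(r_pred - r_exp) + beta * np.abs(e_pred - e_exp)


def loss_lists(alpha, beta, r_pred, r_exp, e_pred, e_exp):
    """Pure-Python form of equation (1)."""
    return [alpha * abs(rp - rx) + beta * abs(ep - ex)
            for rp, rx, ep, ex in zip(r_pred, r_exp, e_pred, e_exp)]


# Invented sample values (not from the dataset).
r_pred, r_exp = [0.2, 0.0, 1.5], [0.0, 0.0, 1.0]
e_pred, e_exp = [11.5, 15.0, 10.0], [12.0, 14.8, 10.8]

from_lists = loss_lists(0.5, 0.5, r_pred, r_exp, e_pred, e_exp)
from_numpy = loss_numpy(0.5, 0.5, np.array(r_pred), np.array(r_exp),
                        np.array(e_pred), np.array(e_exp))

# allclose tolerates tiny floating-point differences between the two paths.
results_agree = np.allclose(from_numpy, from_lists)
print(results_agree)
```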

To conclude the assignment, both functions compute the loss for the provided data from the predicted and experimental rainfall and evaporation values. Each implementation is timed with `timeit` to determine which one is faster.
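
A minimal sketch of the timing pattern, using synthetic random arrays in place of the weather files. The function `loss` and the array names are illustrative. Everything inside the callable passed to `timeit.timeit` is counted, so here the arrays are created once beforehand and only the loss computation is measured.

```python
import timeit

import numpy as np

# Synthetic stand-in data (not the weather dataset).
rng = np.random.default_rng(0)
r_pred, r_exp = rng.random(10_000), rng.random(10_000)
e_pred, e_exp = rng.random(10_000), rng.random(10_000)


def loss(alpha, beta):
    """Equation (1) evaluated on the arrays prepared above."""
    return alpha * np.abs(r_pred - r_exp) + beta * np.abs(e_pred - e_exp)


# number=100 runs the callable 100 times and returns the total time in seconds.
elapsed = timeit.timeit(lambda: loss(0.5, 0.5), number=100)
print(f"{elapsed:.4f} s for 100 calls")
```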

In [612]:
timeit_numpy_arrays = timeit.timeit(lambda:
              loss_with_numpy_arrays(alpha=0.5,
                                     beta=0.5,
                                     rainfall_prediction=df_prediction["Rainfall Pred."].to_numpy(),
                                     rainfall_experiment=df_experiment["Rainfall"].to_numpy(),
                                     evaporation_prediction=df_prediction["Evaporation Pred."].to_numpy(),
                                     evaporation_experiment=df_experiment["Evaporation"].to_numpy()
                                    ),
              number=100
             )

print(f'The Loss function using Numpy arrays took: {timeit_numpy_arrays} seconds.')

The Loss function using Numpy arrays took: 0.36826619994826615 seconds.


In [613]:
timeit_python_lists = timeit.timeit(lambda: 
              loss_with_lists(alpha=0.5,
                              beta=0.5,
                              rainfall_prediction=df_prediction["Rainfall Pred."].tolist(),
                              rainfall_experiment=df_experiment["Rainfall"].tolist(),
                              evaporation_prediction=df_prediction["Evaporation Pred."].tolist(),
                              evaporation_experiment=df_experiment["Evaporation"].tolist()
                             ),
              number=100
             )
print(f'The Loss function using Python lists took {timeit_python_lists} seconds.')

The Loss function using Python lists took 3.4708454000065103 seconds.
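
The two reported timings can be turned into a speed-up factor and a per-call cost. The sketch below only restates the numbers printed above; results will differ between machines and runs.

```python
# Total times reported above for number=100 calls (seconds).
t_numpy = 0.36826619994826615
t_lists = 3.4708454000065103
n_calls = 100

speedup = t_lists / t_numpy
print(f"NumPy version is about {speedup:.1f}x faster")
print(f"Per call: NumPy {t_numpy / n_calls * 1e3:.2f} ms, "
      f"lists {t_lists / n_calls * 1e3:.2f} ms")
```

Both measurements include the `.to_numpy()` or `.tolist()` conversion, because the conversion happens inside the timed lambda, so the ratio reflects conversion plus computation rather than the loss calculation alone.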


**References**  
  

1. Harris, C.R., Millman, K.J., van der Walt, S.J. et al. Array programming with NumPy. Nature, 585 (2020) 357-362 (DOI: 10.1038/s41586-020-2649-2)
2. The Pandas Development Team. pandas-dev/pandas: Pandas. Zenodo, 2023 (https://pandas.pydata.org/)
3. Oswal, N. Predicting Rainfall using Machine Learning Techniques. arXiv, 2019
4. Joe Young and Adam Young. Rain in Australia, Kaggle https://www.kaggle.com/datasets/jsphyg/weather-dataset-rattle-package?resource=download&select=weatherAus.csv. Online; accessed on December 1, 2023
5. Wikipedia contributors, Loss function, https://en.wikipedia.org/wiki/Loss_function. Online; accessed on December 1, 2023
6. Datarobot, Introduction to Loss Functions, updated on March 26, 2021 https://www.datarobot.com/blog/introduction-to-loss-function/. Online; accessed on December 1, 2023
7. Karl N. "Scientific Programming with Python Assignment: The Performance of Numpy Versus Regular Python Lists when Computing a Loss Function". Department of Computer Science, University of Applied Sciences Bonn Sankt Augustin, Germany, 27 November 2023