# Technical test for a Python Dev Junior position
# Part 1: Data processing

<div style="text-align: justify;">

1. Create a function *`load_json_data(file_path: str) -> List[Dict[str, Any]]`* that loads data from the JSON file and returns a list of dictionaries, where each dictionary represents a flight record. Make sure to handle possible errors when loading the file and use the logging module to log the errors.  

2. Create a function *`load_csv_data(file_path: str) -> List[Dict[str, Any]]`* that loads data from the CSV file and returns a list of dictionaries, where each dictionary represents a flight record with passenger and revenue information. Implement error handling to deal with issues while reading the CSV file and use the logging module to log them.

3. Create a function *`parse_log_data(file_path: str) -> List[Dict[str, Any]]`* that parses the LOG file and returns a list of dictionaries, where each dictionary represents a flight record with date, flight ID, departure, and arrival airports information. Make sure to handle errors while parsing the LOG file and use the logging module to log the errors.
</div>

In [48]:
# Importing libraries needed
import json
import csv
import logging
import re
import pytest
# Type Hints: Allow us to use annotations to indicate the types of variables and function return values
# List type; Dictionary type; Any allows variables to be of any type
from typing import List, Dict, Any

## 1. Loading data from a JSON file
### Functions

<div style="text-align: justify;">

***`setup_custom_logger`*** creates and configures a custom logger for logging errors and messages. 
 
*<u>Parameters:</u> *`name`* of the logger, path to the *`log file`* where the log messages will be stored and *`logging level`* to be set for the logger. 

*<u>Actions:</u>

- Creates a *`formatter`* object with a specific log message format that includes a timestamp, log level, and log message content. 

*`asctime`* is a format specifier used by the logging module. By default it renders as yyyy-mm-dd HH:MM:SS,mmm, where the trailing digits are milliseconds.

- Creates a *`FileHandler`* object to define the log file where the log messages will be written. The file is opened in *`a`* (append) mode, which adds data to the end of the file without deleting the existing content. If the file does not exist, a new file is created. This is useful if you want to maintain a continuous log history. If needed, *`w`* (write) mode can be used instead, in which case the existing content of app.log is removed and only the new log messages are written to the file.  

- Associates the Formatter created earlier with the FileHandler, specifying the log message format for the log file.  

- Creates a *`logger`* object with the given name, sets the logging level to the provided level, and adds the FileHandler to the logger. Now the logger is ready to log messages to the specified log file with the configured settings.  

- Returns the configured logger object.
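The following is a minimal sketch of these steps, assuming an illustrative logger name (`demo_logger`) and a temporary log file rather than the notebook's `app.log`. It also shows one thing to watch for: `logging.getLogger(name)` returns the same logger object for the same name, so attaching a handler on every call makes each message appear more than once. A guard such as `if not logger.handlers:` is one common way to avoid this.

```python
import logging
import os
import tempfile

# Illustrative log file in a temporary directory (not the notebook's app.log).
log_path = os.path.join(tempfile.mkdtemp(), "demo.log")


def add_file_handler(name: str, log_file: str) -> logging.Logger:
    """Attach a new FileHandler on every call, following the steps above."""
    handler = logging.FileHandler(log_file, mode="a")
    handler.setFormatter(
        logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
    )
    logger = logging.getLogger(name)
    logger.setLevel(logging.ERROR)
    logger.addHandler(handler)
    return logger


first = add_file_handler("demo_logger", log_path)
second = add_file_handler("demo_logger", log_path)  # same name, second handler

same_object = first is second         # getLogger caches loggers by name
handler_count = len(second.handlers)  # one handler was added per call

second.error("something failed")      # written once per attached handler

with open(log_path) as f:
    logged_lines = f.read().splitlines()
```

Here a single `error()` call produces two lines in the file, because the logger carries two handlers.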

***`load_json_data`*** loads data from a JSON file and returns a list of dictionaries, where each dictionary represents a flight record.

It defines a custom logger called json_logger that logs messages of severity ERROR or higher to the file *`app.log`*. Because the handler opens the file in append mode, each run creates *`app.log`* in the current directory if it does not exist, or appends to it otherwise, and logs any errors that occur during the execution of the load_json_data function. 

*<u>Parameter:</u> *`file_path`*: The path to the JSON file to be loaded.  

*<u>Actions:</u>

- Calls the setup_custom_logger function to create a logger named *`json_logger`*. This logger will log any errors that occur during the loading process to the log file *`app.log`*.  

- Attempts to open the specified JSON file using a with statement to ensure proper file handling.  

- Parses the JSON data from the file and stores it in a variable called data. Parsing means converting the JSON text into a Python data structure. 

- Extracts the list of dictionaries representing flight records from the data variable under the key *`flights`*.  

If any errors occur during the loading process (e.g., file not found, invalid JSON format), the function logs the specific error message using the json_logger created earlier. This allows you to track and diagnose any issues that may occur while loading the JSON data.

If no errors occur, the function returns the list of flight records obtained from the JSON data. This list of dictionaries contains the flight details such as flight ID, date, departure airport, arrival airport, duration, passengers, and revenue.

If there are errors during the loading process, the function returns an empty list ([]) to indicate that the loading was not successful.

</div>
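As a small illustration of the parsing and error cases described above, the sketch below uses `json.loads` on in-memory strings instead of a file. The sample strings are made up for the example, and the `flights` key follows the description above.

```python
import json

# Made-up sample inputs; the second one is truncated on purpose.
valid_text = '{"flights": [{"Flight_ID": "FL001", "Passengers": 232}]}'
broken_text = '{"flights": [{"Flight_ID": "FL001", '

data = json.loads(valid_text)  # JSON text -> Python dict
records = data["flights"]      # list of flight dictionaries

try:
    json.loads(broken_text)
    decode_failed = False
except json.JSONDecodeError as exc:
    decode_failed = True
    error_position = exc.pos   # character offset where parsing stopped
```

`json.load(file)` behaves the same way on an open file object. A JSON document without a `flights` key would raise `KeyError` instead, which in the function described above falls under the generic `except Exception` branch.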

In [49]:
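def setup_custom_logger(name, log_file, level=logging.ERROR):
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # getLogger returns the same object for the same name, so only attach a
    # handler the first time; otherwise every call would duplicate log lines
    if not logger.handlers:
        formatter = logging.Formatter(fmt='%(asctime)s - %(levelname)s - %(message)s')
        handler = logging.FileHandler(log_file, mode='a')
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger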

# This function takes a string as input and returns a list of dictionaries
# Where keys are strings and values can be of any type
def load_json_data(file_path: str) -> List[Dict[str, Any]]:
    logger = setup_custom_logger('json_logger', 'app.log', level=logging.ERROR)
    
    try:
        with open(file_path, 'r') as file:
            # load() allows us to convert json data (strings) into Python objects
            data = json.load(file)
        return data['flights']
    except FileNotFoundError as e:
        logger.error(f"File '{file_path}' not found.")
    except json.JSONDecodeError as e:
        logger.error(f"Error decoding JSON in file '{file_path}': {e}")
    except Exception as e:
        logger.error(f"An unexpected error occurred while loading data: {e}")
    return []  # Return an empty list in case of any error.

# Using data from a local path. Change this path if the data is loaded from another location
file_path = '/Users/sheila/Desktop/Prueba_Tecnica/data.json'
# Save the list of dictionaries into a variable called flight_records1 to check the data
flight_records1 = load_json_data(file_path)
# Each flight record is a dictionary that contains information about a specific flight
# Use a for loop to print each flight on console and check the function's action:
for flight in flight_records1:
    print(flight)

{'Flight_ID': 'FL001', 'Date': '2023-05-05', 'Departure_Airport': 'LAX', 'Arrival_Airport': 'JFK', 'Duration_Minutes': 154, 'Passengers': 232, 'Revenue': 14152}
{'Flight_ID': 'FL002', 'Date': '2023-04-11', 'Departure_Airport': 'ATL', 'Arrival_Airport': 'LAX', 'Duration_Minutes': 704, 'Passengers': 108, 'Revenue': 7344}
{'Flight_ID': 'FL003', 'Date': '2023-11-18', 'Departure_Airport': 'ORD', 'Arrival_Airport': 'SFO', 'Duration_Minutes': 244, 'Passengers': 160, 'Revenue': 13600}
{'Flight_ID': 'FL004', 'Date': '2023-05-02', 'Departure_Airport': 'LAX', 'Arrival_Airport': 'SEA', 'Duration_Minutes': 148, 'Passengers': 115, 'Revenue': 4715}
{'Flight_ID': 'FL005', 'Date': '2023-03-17', 'Departure_Airport': 'LAX', 'Arrival_Airport': 'DEN', 'Duration_Minutes': 623, 'Passengers': 142, 'Revenue': 7810}
{'Flight_ID': 'FL006', 'Date': '2023-07-25', 'Departure_Airport': 'JFK', 'Arrival_Airport': 'SFO', 'Duration_Minutes': 110, 'Passengers': 220, 'Revenue': 19800}
{'Flight_ID': 'FL007', 'Date': '2023-

## 2. Loading data from a CSV file
### Functions

<div style="text-align: justify;">

***`load_csv_data`***

*<u>Parameter:</u> *`file_path`*: The path to the CSV file to be loaded.  

For CSV data, each row has to be read separately and converted into a dictionary representing one flight record. 

It creates a list *`flight_records2`* to store these individual records.

During the parsing process, each row of the CSV file is read, converted into a dictionary representing a flight record, and then appended to the list *`flight_records2`* for further processing.

For JSON data, the entire data structure is loaded as a Python data structure, and there's no need for an intermediary list to store individual records. 

</div>
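The row-by-row conversion can be sketched as follows, assuming made-up CSV text read through `io.StringIO` instead of a file. `csv.DictReader` yields every value as a string, so the numeric columns need an explicit `int()` conversion, and a non-numeric value raises `ValueError` rather than `csv.Error`. Skipping only the bad row, as done here, is one possible choice and not necessarily what the notebook's function does.

```python
import csv
import io

# Made-up CSV content; the second data row has a non-numeric passenger count.
csv_text = "Flight_ID,Passengers,Revenue\nFL001,232,14152\nFL002,abc,7344\n"

raw_rows = list(csv.DictReader(io.StringIO(csv_text)))  # all values are str

records = []
skipped = []
for row in raw_rows:
    try:
        records.append({
            "Flight_ID": row["Flight_ID"],
            "Passengers": int(row["Passengers"]),
            "Revenue": int(row["Revenue"]),
        })
    except ValueError:
        # int("abc") fails here; remember the row and keep going
        skipped.append(row["Flight_ID"])
```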

In [50]:
def load_csv_data(file_path: str) -> List[Dict[str, Any]]:
    logger = setup_custom_logger('csv_logger', 'app.log', level=logging.ERROR)

    flight_records2 = []
    try:
        with open(file_path, 'r', newline='') as csvfile:
            csv_reader = csv.DictReader(csvfile)
            for row in csv_reader:
                flight_record = {
                    'Flight_ID': row['Flight_ID'],
                    'Passengers': int(row['Passengers']),
                    'Revenue': int(row['Revenue'])
                }
                flight_records2.append(flight_record)
        return flight_records2
    except FileNotFoundError as e:
        logger.error(f"File '{file_path}' not found.")
    except csv.Error as e:
        logger.error(f"Error reading CSV file '{file_path}': {e}")
    except Exception as e:
        logger.error(f"An unexpected error occurred while loading data: {e}")
    return []  # Return an empty list in case of any error.

# Using data from a local path. Change this path if the data is loaded from another location
file_path = '/Users/sheila/Desktop/Prueba_Tecnica/data.csv'
flight_records2 = load_csv_data(file_path)

# Print the list of flight records to check function's action
for flight in flight_records2:
    print(flight)

{'Flight_ID': 'FL001', 'Passengers': 232, 'Revenue': 14152}
{'Flight_ID': 'FL002', 'Passengers': 108, 'Revenue': 7344}
{'Flight_ID': 'FL003', 'Passengers': 160, 'Revenue': 13600}
{'Flight_ID': 'FL004', 'Passengers': 115, 'Revenue': 4715}
{'Flight_ID': 'FL005', 'Passengers': 142, 'Revenue': 7810}
{'Flight_ID': 'FL006', 'Passengers': 220, 'Revenue': 19800}
{'Flight_ID': 'FL007', 'Passengers': 200, 'Revenue': 17000}
{'Flight_ID': 'FL008', 'Passengers': 142, 'Revenue': 7384}
{'Flight_ID': 'FL009', 'Passengers': 242, 'Revenue': 12584}
{'Flight_ID': 'FL010', 'Passengers': 126, 'Revenue': 3780}
{'Flight_ID': 'FL011', 'Passengers': 216, 'Revenue': 11016}
{'Flight_ID': 'FL012', 'Passengers': 248, 'Revenue': 10912}
{'Flight_ID': 'FL013', 'Passengers': 127, 'Revenue': 11176}
{'Flight_ID': 'FL014', 'Passengers': 198, 'Revenue': 12474}
{'Flight_ID': 'FL015', 'Passengers': 135, 'Revenue': 6345}
{'Flight_ID': 'FL016', 'Passengers': 154, 'Revenue': 12474}
{'Flight_ID': 'FL017', 'Passengers': 211, 'Rev

## 3. Loading data from a LOG file
### Functions

<div style="text-align: justify;">

***`parse_log_data`***

*<u>Parameter:</u> *`file_path`*: The path to the LOG file to be loaded.  

It creates a list *`flight_records3`* to store individual records and uses `regular expressions`, through the *`re`* module, to match each line. 

It assigns the result of the regular expression match to the variable *`match`*.

The *`re.match()`* function returns a match object if the regular expression pattern matches the beginning of the line. 

The regular expression pattern that the *`re.match()`* function will attempt to match against the line is:  
 
*`r'(\d{4}-\d{2}-\d{2}) (\w+) from (\w+) to (\w+) departed.'`* 

- *`(\d{4}-\d{2}-\d{2})`*: This part of the pattern matches the date in the format 'yyyy-mm-dd' and captures it using parentheses to create a group. The *`\d`* matches any digit, and *`{4}`*, *`{2}`*, and *`{2}`* specify the number of digits expected for the year, month, and day, respectively.

- *`(\w+)`*: This part of the pattern matches and captures one or more word characters (letters, digits, or underscores). This is used to capture the flight ID.

- *`from`*: This part of the pattern matches the literal word 'from'.

- *`(\w+)`*: This part of the pattern matches and captures one or more word characters. This is used to capture the departure airport.

- *`to`*: This part of the pattern matches the literal word 'to'.

- *`(\w+)`*: This part of the pattern matches and captures one or more word characters. This is used to capture the arrival airport.

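- *`departed.`*: This part of the pattern matches the literal word 'departed' followed by any single character, because the unescaped *`.`* is a wildcard. To match only a literal period it would be written *`\.`*. Since *`re.match()`* anchors only at the start of the line, nothing in the pattern requires this to be the end of the line.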

*`line`* is the current line being processed from the log file. 

<u>After this line of code executes:</u>

If the regular expression pattern matches the line, the match variable will hold a match object, and the subsequent code block (the body of the if statement) will be executed.

If the regular expression pattern does not match the line, the match variable will be assigned None, and the subsequent code block (inside the else statement) will be executed.

In summary, it uses a regular expression pattern to match and capture specific parts of the line, which represents a log entry from the log file. If the log entry matches the expected format, the regular expression groups are used to extract the date, flight ID, departure airport, and arrival airport information. If the log entry doesn't match the expected format, an error message will be logged.

The error message is formatted using an f-string (f"Error parsing line: {line.strip()}").

*`line.strip()`* removes leading and trailing whitespace, including the newline at the end of the line, so that it is not included in the error message.

</div>
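To make the matching behaviour concrete, here is a short sketch that applies the pattern to three made-up lines (the sample lines are assumptions inferred from the pattern, not taken from the real log file). It also compares the pattern with a stricter variant, named `STRICT` here for illustration, that escapes the final period and uses `re.fullmatch`.

```python
import re

# Pattern as used in the notebook: the final "." is an unescaped wildcard.
PATTERN = r'(\d{4}-\d{2}-\d{2}) (\w+) from (\w+) to (\w+) departed.'
# Illustrative stricter variant: literal period, checked against the whole line.
STRICT = r'(\d{4}-\d{2}-\d{2}) (\w+) from (\w+) to (\w+) departed\.'

good = "2023-05-05 FL001 from LAX to JFK departed.\n"
bad = "FL001 left LAX on 2023-05-05\n"
odd = "2023-05-05 FL001 from LAX to JFK departed! extra text\n"

good_match = re.match(PATTERN, good)
bad_match = re.match(PATTERN, bad)   # None: line does not start with a date
odd_match = re.match(PATTERN, odd)   # still matches: "." accepts "!"

fields = good_match.groups()         # date, flight ID, departure, arrival

strict_good = re.fullmatch(STRICT, good.strip())
strict_odd = re.fullmatch(STRICT, odd.strip())  # None: trailing text rejected
```

Whether the stricter form is preferable depends on how uniform the real log lines are.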

In [51]:
def parse_log_data(file_path: str) -> List[Dict[str, Any]]:
    logger = setup_custom_logger('log_parser', 'app.log', level=logging.ERROR)

    flight_records3 = []
    try:
        with open(file_path, 'r') as log_file:
            for line in log_file:
                match = re.match(r'(\d{4}-\d{2}-\d{2}) (\w+) from (\w+) to (\w+) departed.', line)
                if match:
                    flight_record = {
                        'Date': match.group(1),
                        'Flight_ID': match.group(2),
                        'Departure_Airport': match.group(3),
                        'Arrival_Airport': match.group(4)
                    }
                    flight_records3.append(flight_record)
                else:
                    logger.error(f"Error parsing line: {line.strip()}")
        return flight_records3
    except FileNotFoundError as e:
        logger.error(f"File '{file_path}' not found.")
    except Exception as e:
        logger.error(f"An unexpected error occurred while parsing data: {e}")
    return []  # Return an empty list in case of any error

# Using data from a local path. Change this path if the data is loaded from another location
file_path = '/Users/sheila/Desktop/Prueba_Tecnica/data.log'

flight_records3 = parse_log_data(file_path)

# Print the list of flight records
for flight in flight_records3:
    print(flight)

{'Date': '2023-05-05', 'Flight_ID': 'FL001', 'Departure_Airport': 'LAX', 'Arrival_Airport': 'JFK'}
{'Date': '2023-04-11', 'Flight_ID': 'FL002', 'Departure_Airport': 'ATL', 'Arrival_Airport': 'LAX'}
{'Date': '2023-11-18', 'Flight_ID': 'FL003', 'Departure_Airport': 'ORD', 'Arrival_Airport': 'SFO'}
{'Date': '2023-05-02', 'Flight_ID': 'FL004', 'Departure_Airport': 'LAX', 'Arrival_Airport': 'SEA'}
{'Date': '2023-03-17', 'Flight_ID': 'FL005', 'Departure_Airport': 'LAX', 'Arrival_Airport': 'DEN'}
{'Date': '2023-07-25', 'Flight_ID': 'FL006', 'Departure_Airport': 'JFK', 'Arrival_Airport': 'SFO'}
{'Date': '2023-07-16', 'Flight_ID': 'FL007', 'Departure_Airport': 'JFK', 'Arrival_Airport': 'SFO'}
{'Date': '2023-06-08', 'Flight_ID': 'FL008', 'Departure_Airport': 'SFO', 'Arrival_Airport': 'JFK'}
{'Date': '2023-03-30', 'Flight_ID': 'FL009', 'Departure_Airport': 'JFK', 'Arrival_Airport': 'SEA'}
{'Date': '2023-08-07', 'Flight_ID': 'FL010', 'Departure_Airport': 'SFO', 'Arrival_Airport': 'SFO'}
{'Date': '

# Part 2: Data Transformation

<div style="text-align: justify;">

1. Create a function *`combine_data(json_data: List[Dict[str, Any]], csv_data: List[Dict[str, Any]], log_data: List[Dict[str, Any]]) -> List[Dict[str, Any]]`* that combines the information from the three types of data into a list of dictionaries, where each dictionary represents a flight record with the complete information (Flight_ID, Date, Departure_Airport, Arrival_Airport, Duration_Minutes, Passengers, Revenue). Implement error handling to manage possible data combination issues and use the logging module to record any errors that occur.

2. Create a function *`calculate_revenue_per_passenger(flights_data: List[Dict[str, Any]]) -> None`* that adds the "Revenue_Per_Passenger" column to the list of dictionaries, where the value of this column is the result of dividing the revenue (Revenue) by the number of passengers (Passengers) for each flight. This function does not have an expected output as it directly modifies the flight data in place. Make sure to handle possible errors when calculating the "Revenue_Per_Passenger" column and use the logging module to record any issues.

</div>

### 1. Function: ***`combine_data`*** 

*<u>Parameters:</u> 
- *`json_data`*: a list of dictionaries, as indicated by its type hint.
- *`csv_data`*: also a list of dictionaries.
- *`log_data`*: also a list of dictionaries.

*<u>Action:</u> 

The function receives as inputs the lists of dictionaries produced by `load_json_data`, `load_csv_data` and `parse_log_data` and merges the data from these three sources (JSON, CSV, and LOG data) to create a final list of dictionaries, where each dictionary represents a flight record with complete information, including 'Flight_ID', 'Date', 'Departure_Airport', 'Arrival_Airport', 'Duration_Minutes', 'Passengers', and 'Revenue'. In the code below, the date, airports and duration are taken from the JSON record, passengers and revenue from the CSV record, and the LOG data is used to confirm that the flight also appears there. The function handles possible errors while combining the data and logs any errors using the custom logger *`data_combiner`*.

It has a similar structure, but in this case `list comprehensions` are used within a for loop to find the records with the same flight ID in each source. The steps are commented in detail within the code below. 
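As a point of comparison, the lookup by flight ID can also be done with a dictionary index instead of a list comprehension per flight. The sketch below is an alternative under that assumption, using made-up records and only two sources for brevity. It is not the approach taken in the notebook's function.

```python
# Made-up records for illustration.
json_rows = [
    {"Flight_ID": "FL001", "Date": "2023-05-05", "Duration_Minutes": 154},
    {"Flight_ID": "FL002", "Date": "2023-04-11", "Duration_Minutes": 704},
]
csv_rows = [{"Flight_ID": "FL001", "Passengers": 232, "Revenue": 14152}]

# List-comprehension lookup: scans csv_rows once for every flight searched.
matches = [row for row in csv_rows if row["Flight_ID"] == "FL001"]

# Dictionary index: built once, then each lookup is a single key access.
csv_by_id = {row["Flight_ID"]: row for row in csv_rows}

merged = []
missing = []
for flight in json_rows:
    csv_row = csv_by_id.get(flight["Flight_ID"])
    if csv_row is None:
        missing.append(flight["Flight_ID"])  # no CSV record for this flight
        continue
    merged.append({**flight, **csv_row})     # combine both dictionaries
```

For a handful of flights either approach is fine. The index mainly pays off when the lists are large, since the comprehension rescans the whole list for every flight.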

In [52]:
def combine_data(json_data: List[Dict[str, Any]], csv_data: List[Dict[str, Any]], log_data: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Step 1: create a custom logger to log any errors that occur during data combination
    logger = setup_custom_logger('data_combiner', 'app.log', level=logging.ERROR)
    # Step 2: create an empty list to store the combined flight records
    combined_data = []
    try:
        # Step 3: create a for loop to go through each flight in the JSON data
        for json_flight in json_data:
            # Step 4: get the `flight_ID` 
            flight_id = json_flight['Flight_ID']
            # Step 5: find matching flight records in the CSV data based on the `flight_ID`
            matching_csv_flights = [flight for flight in csv_data if flight['Flight_ID'] == flight_id]
            # Step 6: find matching flight records in the LOG data based on the `flight_ID`
            matching_log_flights = [flight for flight in log_data if flight['Flight_ID'] == flight_id]
            # Step 7: check if matching flights were found in both files
            if not matching_csv_flights:
                # Step 8: if not found in CSV data, log an error
                logger.error(f"CSV data not found for Flight_ID '{flight_id}'. Skipping data combination.")
                continue
            if not matching_log_flights:
                # Step 9: if not found in LOG data, log an error
                logger.error(f"Log data not found for Flight_ID '{flight_id}'. Skipping data combination.")
                continue
            # Step 10: if flight is found in both files, then combine the flight records in a dictionary
            combined_flight = {
                'Flight_ID': flight_id,
                'Date': json_flight.get('Date'),
                'Departure_Airport': json_flight.get('Departure_Airport'),
                'Arrival_Airport': json_flight.get('Arrival_Airport'),
                'Duration_Minutes': json_flight.get('Duration_Minutes'),
                'Passengers': matching_csv_flights[0].get('Passengers'),
                'Revenue': matching_csv_flights[0].get('Revenue')
            }
            # Step 11: append the combined records to the list `combined_data`
            combined_data.append(combined_flight)
    # Step 12: if any unexpected error occurs during data combination, log the error
    except Exception as e:
        logger.error(f"An unexpected error occurred while combining data: {e}")
    # Step 13: return the list of combined flight records
    return combined_data

# Use of the function with the provided JSON, CSV, and log data:
json_data = flight_records1 # list of dictionaries representing JSON flight data
csv_data = flight_records2  # list of dictionaries representing CSV flight data
log_data = flight_records3   # list of dictionaries representing log flight data

combined_data = combine_data(json_data, csv_data, log_data)

# Print the list of combined flight records to check results
for flight in combined_data:
    print(flight)

{'Flight_ID': 'FL001', 'Date': '2023-05-05', 'Departure_Airport': 'LAX', 'Arrival_Airport': 'JFK', 'Duration_Minutes': 154, 'Passengers': 232, 'Revenue': 14152}
{'Flight_ID': 'FL002', 'Date': '2023-04-11', 'Departure_Airport': 'ATL', 'Arrival_Airport': 'LAX', 'Duration_Minutes': 704, 'Passengers': 108, 'Revenue': 7344}
{'Flight_ID': 'FL003', 'Date': '2023-11-18', 'Departure_Airport': 'ORD', 'Arrival_Airport': 'SFO', 'Duration_Minutes': 244, 'Passengers': 160, 'Revenue': 13600}
{'Flight_ID': 'FL004', 'Date': '2023-05-02', 'Departure_Airport': 'LAX', 'Arrival_Airport': 'SEA', 'Duration_Minutes': 148, 'Passengers': 115, 'Revenue': 4715}
{'Flight_ID': 'FL005', 'Date': '2023-03-17', 'Departure_Airport': 'LAX', 'Arrival_Airport': 'DEN', 'Duration_Minutes': 623, 'Passengers': 142, 'Revenue': 7810}
{'Flight_ID': 'FL006', 'Date': '2023-07-25', 'Departure_Airport': 'JFK', 'Arrival_Airport': 'SFO', 'Duration_Minutes': 110, 'Passengers': 220, 'Revenue': 19800}
{'Flight_ID': 'FL007', 'Date': '2023-

### 2. Function: ***`calculate_revenue_per_passenger`*** 

<div style="text-align: justify;">

*<u>Parameter:</u> 

*`flights_data`*: a list of dictionaries representing flight records.   

*`None`*: the return annotation indicates that the function does not return a value. Instead, it directly modifies the flights_data list in place, adding the `Revenue_Per_Passenger` key to each flight record.

*<u>Action:</u> 

The function takes the data produced by `combine_data` and adds the requested extra column (`Revenue_Per_Passenger`). 

If passengers or revenue is None, it means that the corresponding key (`Passengers` or `Revenue`) is not present in the flight record, or that its value is None.

If passengers is 0 or a negative value, it means that the number of passengers is invalid for the flight.

If any of these conditions are met, the function cannot calculate the `Revenue_Per_Passenger` for the flight, so it logs an error. The use of None and the conditional check help ensure that the function handles cases where the required data is missing or invalid and avoids potential errors during the calculation.

</div>
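The in-place behaviour and the validity check can be sketched on a few made-up records. The helper name `add_ratio` and the flight IDs `FL900` and `FL901` are hypothetical, and logging is left out to keep the example short.

```python
# Made-up records: one valid, one with zero passengers, one missing the key.
flights = [
    {"Flight_ID": "FL001", "Passengers": 232, "Revenue": 14152},
    {"Flight_ID": "FL900", "Passengers": 0, "Revenue": 5000},
    {"Flight_ID": "FL901", "Revenue": 5000},
]


def add_ratio(records):
    """Mutate each valid record in place; return nothing."""
    for record in records:
        passengers = record.get("Passengers")  # None if the key is absent
        revenue = record.get("Revenue")
        if passengers is not None and revenue is not None and passengers > 0:
            record["Revenue_Per_Passenger"] = revenue / passengers


result = add_ratio(flights)  # result is None; flights itself was modified
```

After the call, only the first record has the new key. The other two are left unchanged, which is the situation in which the notebook's function logs an error.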

In [53]:
def calculate_revenue_per_passenger(flights_data: List[Dict[str, Any]]) -> None:
    # Step 1: Create a custom logger to log any errors that occur during the calculation
    logger = setup_custom_logger('revenue_calculator', 'app.log', level=logging.ERROR)

    try:
        # Step 2: Loop through each flight record in the flights_data
        for flight in flights_data:
            passengers = flight.get('Passengers')
            revenue = flight.get('Revenue')

            # Step 3: Check that 'Passengers' and 'Revenue' are both present and that 'Passengers' is positive
            if passengers is not None and revenue is not None and passengers > 0:
                # Step 4: Calculate "Revenue_Per_Passenger" by dividing 'Revenue' by 'Passengers'
                revenue_per_passenger = revenue / passengers

                # Step 5: Add "Revenue_Per_Passenger" key and value to the flight record
                flight['Revenue_Per_Passenger'] = revenue_per_passenger
            else:
                # Step 6: Log an error if any of the required data is missing or invalid
                logger.error(f"Invalid data for Flight_ID '{flight.get('Flight_ID')}'. Unable to calculate 'Revenue_Per_Passenger'.")

    except Exception as e:
        # Step 7: If any unexpected error occurs during calculation, log the error
        logger.error(f"An unexpected error occurred while calculating 'Revenue_Per_Passenger': {e}")

flights_data = combined_data  # list of dictionaries representing flight records

calculate_revenue_per_passenger(flights_data)

# Print the list of flight records with the new column `Revenue_Per_Passenger`
for flight in flights_data:
    print(flight)

{'Flight_ID': 'FL001', 'Date': '2023-05-05', 'Departure_Airport': 'LAX', 'Arrival_Airport': 'JFK', 'Duration_Minutes': 154, 'Passengers': 232, 'Revenue': 14152, 'Revenue_Per_Passenger': 61.0}
{'Flight_ID': 'FL002', 'Date': '2023-04-11', 'Departure_Airport': 'ATL', 'Arrival_Airport': 'LAX', 'Duration_Minutes': 704, 'Passengers': 108, 'Revenue': 7344, 'Revenue_Per_Passenger': 68.0}
{'Flight_ID': 'FL003', 'Date': '2023-11-18', 'Departure_Airport': 'ORD', 'Arrival_Airport': 'SFO', 'Duration_Minutes': 244, 'Passengers': 160, 'Revenue': 13600, 'Revenue_Per_Passenger': 85.0}
{'Flight_ID': 'FL004', 'Date': '2023-05-02', 'Departure_Airport': 'LAX', 'Arrival_Airport': 'SEA', 'Duration_Minutes': 148, 'Passengers': 115, 'Revenue': 4715, 'Revenue_Per_Passenger': 41.0}
{'Flight_ID': 'FL005', 'Date': '2023-03-17', 'Departure_Airport': 'LAX', 'Arrival_Airport': 'DEN', 'Duration_Minutes': 623, 'Passengers': 142, 'Revenue': 7810, 'Revenue_Per_Passenger': 55.0}
{'Flight_ID': 'FL006', 'Date': '2023-07-25'

# Part 3: Testing with pytest

<div style="text-align: justify;">

Implement tests for each of the previous functions using pytest. Make sure to cover different scenarios, such as empty data, data with errors, and valid data. Also, verify that the functions return the expected results. For functions without an expected output, ensure to check that the flight data has been modified correctly after calling the `calculate_revenue_per_passenger()` function.

Observations:
- The functions should be independent and testable, with no side effects.
- Use the `json`, `csv`, and file handling functions to load and parse the data.
- Implement proper error handling using `try` and `except` in the functions and utilize the `logging` module to log errors and issues encountered during data processing.
- The randomly generated data in the `generate_flight_data()` function from the previous question can also be used as input for the tests.

</div>

### Testing with pytest ***`load_json_data`*** function:

<div style="text-align: justify;">

1. `test_load_json_data_valid_file`: This test function verifies the behavior of the `load_json_data` function when it is provided with a valid JSON file. It loads the data from the JSON file and checks that the function returns a non-empty list.

2. `test_load_json_data_invalid_file`: This test function verifies the behavior of the `load_json_data` function when it is provided with an invalid file path, for example a file that does not exist. In this case, the function is expected to return an empty list.

3. `test_load_json_data_invalid_json`: This test function verifies the behavior of the `load_json_data` function when it is provided with a JSON file that contains invalid data, i.e., data that cannot be decoded as a valid Python object. Again, the function is expected to return an empty list.

These test functions help ensure that the `load_json_data` function handles various scenarios properly and returns the expected results, both for valid data and for incorrect data or non-existent files.

### Test Assertions in Python Testing

With `pytest`, you can define test functions using the def keyword and use assertions (assert statements) to check if the actual output of a function matches the expected output.

`assert` is a keyword in Python used for making assertions. When an assertion is evaluated, Python checks if the expression following `assert` is true. If it is, the program continues executing normally. If it is false, an `AssertionError` exception is raised, and `pytest` reports the test as failed.

- `isinstance(flight_records, list)`: `isinstance` is used to check if `flight_records` is an instance of the `list` class. In this case, we want to ensure that `flight_records` is a list since we expect the `load_json_data` function to return a list of dictionaries.

- `len(flight_records) > 0`: This assertion checks that the `flight_records` list has a length greater than zero. In other words, we are checking that the list is not empty, which implies that the `load_json_data` function has loaded valid data from the JSON file.

- `len(flight_records) == 0`: This assertion verifies that the `flight_records` list has a length equal to zero. In this case, we want to ensure that the list is empty, indicating that the `load_json_data` function has returned an empty list due to a non-existent JSON file or invalid data.

In summary, the `assert` statements in the test functions allow us to verify if the obtained results align with our expectations for the function being tested. If any of the assertions fail, an `AssertionError` is raised in the test function, indicating that there might be an issue with the function being tested or with the test data used. This helps us detect problems and errors in our code during the development and testing process.

For help on terminal: `python3 -m pydoc pytest`

</div>
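The tests below depend on files at fixed local paths. One way to make such tests self-contained is pytest's built-in `tmp_path` fixture, which gives each test its own temporary directory. The sketch below assumes a hypothetical stand-in loader, `load_flights`, so that it can run on its own. In the notebook the function under test would be `load_json_data`.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_flights(file_path: str) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for the loader under test."""
    try:
        with open(file_path, "r") as file:
            return json.load(file)["flights"]
    except (OSError, KeyError, json.JSONDecodeError):
        return []


def test_valid_file(tmp_path: Path) -> None:
    # Write a small valid file into the temporary directory
    path = tmp_path / "data.json"
    path.write_text(json.dumps({"flights": [{"Flight_ID": "FL001"}]}))
    assert load_flights(str(path)) == [{"Flight_ID": "FL001"}]


def test_missing_file(tmp_path: Path) -> None:
    # The file is never created, so the loader should return an empty list
    assert load_flights(str(tmp_path / "missing.json")) == []


def test_invalid_json(tmp_path: Path) -> None:
    # The file exists but does not contain valid JSON
    path = tmp_path / "broken.json"
    path.write_text("{not valid json")
    assert load_flights(str(path)) == []
```

Saved in a file such as `test_loaders.py` (an illustrative name), these can be run with `python3 -m pytest test_loaders.py`.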

In [54]:
# Test loading JSON data function

# def test_load_json_data_valid_file():
#     file_path = '/Users/sheila/Desktop/Prueba_Tecnica/data.json'  # Change this to your test data location
#     flight_records = load_json_data(file_path)
#     assert isinstance(flight_records, list)
#     assert len(flight_records) > 0

# def test_load_json_data_invalid_file():
#     file_path = '/Users/sheila/Desktop/Prueba_Tecnica/invalid_jsonfile.json'
#     flight_records = load_json_data(file_path)
#     assert isinstance(flight_records, list)
#     assert len(flight_records) == 0

# def test_load_json_data_invalid_json():
#     file_path = '/Users/sheila/Desktop/Prueba_Tecnica/invalid_jsondata.json'
#     flight_records = load_json_data(file_path)
#     assert isinstance(flight_records, list)
#     assert len(flight_records) == 0

### Testing with pytest the ***`load_csv_data`*** function

The same tests are carried out. 

In [55]:
# Test loading CSV data function

# def test_load_csv_data_valid_file():
#     file_path = '/Users/sheila/Desktop/Prueba_Tecnica/data.csv'  
#     flight_records = load_csv_data(file_path)
#     assert isinstance(flight_records, list)
#     assert len(flight_records) > 0

# def test_load_csv_data_invalid_file():
#     file_path = '/Users/sheila/Desktop/Prueba_Tecnica/invalid_csvfile.csv'
#     flight_records = load_csv_data(file_path)
#     assert isinstance(flight_records, list)
#     assert len(flight_records) == 0

# def test_load_csv_data_invalid_csv():
#     file_path = '/Users/sheila/Desktop/Prueba_Tecnica/invalid_csvdata.csv'
#     flight_records = load_csv_data(file_path)
#     assert isinstance(flight_records, list)
#     assert len(flight_records) == 0

### Testing with pytest the ***`parse_log_data`*** function

The same tests are carried out. 

In [56]:
# Test loading LOG data function

# def test_load_log_data_valid_file():  
#     file_path = '/Users/sheila/Desktop/Prueba_Tecnica/data.log'   
#     flight_records = parse_log_data(file_path) 
#     assert isinstance(flight_records, list) 
#     assert len(flight_records) > 0 

# def test_load_log_data_invalid_file():
#     file_path = '/Users/sheila/Desktop/Prueba_Tecnica/invalid_logfile.log'
#     flight_records = parse_log_data(file_path)
#     assert isinstance(flight_records, list)
#     assert len(flight_records) == 0

# def test_load_log_data_invalid_log():
#     file_path = '/Users/sheila/Desktop/Prueba_Tecnica/invalid_logdata.log'
#     flight_records = parse_log_data(file_path)
#     assert isinstance(flight_records, list)
#     assert len(flight_records) == 0

### Testing with pytest the ***`combine_data`*** function

In this part, I verify that the data loaded by the previous functions has been combined correctly.

The tests below have been designed for this purpose.

In [57]:
# Check that the combined data has the same length as each of the original datasets
# def test_length_combine_data():
#     combined_data = combine_data(flight_records1, flight_records2, flight_records3)
#     assert len(combined_data) == len(flight_records1) == len(flight_records2) == len(flight_records3)

# # Check if the combined data has the correct structure 
# def test_keys_combine_data():
#     combined_data = combine_data(flight_records1, flight_records2, flight_records3)
#     expected_keys = {'Flight_ID', 'Date', 'Departure_Airport', 'Arrival_Airport', 'Duration_Minutes', 'Passengers', 'Revenue'}
#     assert all(set(record.keys()) == expected_keys for record in combined_data)

# # Check that every `Flight_ID` is present in all files
# def get_flight_ids(records):
#     return [record['Flight_ID'] for record in records]

# def test_combine_data_with_valid_data():
#     combined_data = combine_data(flight_records1, flight_records2, flight_records3)

#     # Obtain lists of `Flight_ID` from each file 
#     flight_ids_1 = get_flight_ids(flight_records1)
#     flight_ids_2 = get_flight_ids(flight_records2)
#     flight_ids_3 = get_flight_ids(flight_records3)
#     combined_flight_ids = get_flight_ids(combined_data)

#     # Check if the Flight_IDs of each file are the same 
#     assert set(flight_ids_1) == set(flight_ids_2) == set(flight_ids_3)

#     # Check if the Flight_IDs are the same in combined_data and the original files
#     assert set(combined_flight_ids) == set(flight_ids_1)

# # Check that the combined data contains no duplicate records
# def test_no_duplicates_in_combined_data():
#     combined_data = combine_data(flight_records1, flight_records2, flight_records3)
#     ids = [record['Flight_ID'] for record in combined_data]
#     assert len(ids) == len(set(ids))
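
The tests above read `flight_records1`, `flight_records2` and `flight_records3` from the notebook's global state. As a minimal sketch of how the same checks could run on self-contained data, the example below uses small in-memory records. `combine_data_stub` is a hypothetical stand-in that merges records sharing a `Flight_ID`; the sample rows and the way fields are split across the three sources are assumptions made only for illustration.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def combine_data_stub(*sources: List[Record]) -> List[Record]:
    """Hypothetical stand-in for combine_data: merges records by Flight_ID."""
    merged: Dict[Any, Record] = {}
    for source in sources:
        for record in source:
            merged.setdefault(record["Flight_ID"], {}).update(record)
    return list(merged.values())


# Invented sample rows (assumption: each source contributes different fields)
json_rows = [
    {"Flight_ID": "F1", "Duration_Minutes": 90},
    {"Flight_ID": "F2", "Duration_Minutes": 120},
]
csv_rows = [
    {"Flight_ID": "F1", "Passengers": 100, "Revenue": 2000},
    {"Flight_ID": "F2", "Passengers": 50, "Revenue": 1000},
]
log_rows = [
    {"Flight_ID": "F1", "Date": "2023-01-01", "Departure_Airport": "MAD", "Arrival_Airport": "BCN"},
    {"Flight_ID": "F2", "Date": "2023-01-02", "Departure_Airport": "BCN", "Arrival_Airport": "MAD"},
]

EXPECTED_KEYS = {
    "Flight_ID", "Date", "Departure_Airport", "Arrival_Airport",
    "Duration_Minutes", "Passengers", "Revenue",
}


def test_combined_structure_and_uniqueness() -> None:
    combined = combine_data_stub(json_rows, csv_rows, log_rows)
    # One combined record per flight
    assert len(combined) == len(json_rows)
    # Every combined record carries the full set of keys
    assert all(set(record) == EXPECTED_KEYS for record in combined)
    # No Flight_ID appears twice
    ids = [record["Flight_ID"] for record in combined]
    assert len(ids) == len(set(ids))
```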

### Testing with pytest the ***`calculate_revenue_per_passenger`*** function

<div style="text-align: justify;">

In this section, I create test cases that cover different scenarios and verify that the function behaves as expected.

</div>

In [58]:
# Test when flights_data is an empty list
# def test_calculate_revenue_per_passenger_empty_data():
#     flights_data = []
#     calculate_revenue_per_passenger(flights_data)
#     assert len(flights_data) == 0  # The empty list should remain unchanged

# # Test when flights_data contains valid data
# def test_calculate_revenue_per_passenger_valid_data():
#     # Create some sample flight data with 'Passengers' and 'Revenue'
#     flights_data = [
#         {'Flight_ID': 1, 'Passengers': 100, 'Revenue': 2000},
#         {'Flight_ID': 2, 'Passengers': 50, 'Revenue': 1000},
#     ]
#     calculate_revenue_per_passenger(flights_data)
#     # Check if the 'Revenue_Per_Passenger' column has been added and calculated correctly
#     assert 'Revenue_Per_Passenger' in flights_data[0]
#     assert 'Revenue_Per_Passenger' in flights_data[1]
#     assert flights_data[0]['Revenue_Per_Passenger'] == 20.0
#     assert flights_data[1]['Revenue_Per_Passenger'] == 20.0

# # Test when flights_data contains missing 'Passengers' or 'Revenue' information
# def test_calculate_revenue_per_passenger_missing_data():
#     # Create some sample flight data with a missing 'Passengers' key and a None 'Revenue' value
#     flights_data = [
#         {'Flight_ID': 1, 'Revenue': 2000},
#         {'Flight_ID': 2, 'Passengers': 100, 'Revenue': None},
#     ]
#     calculate_revenue_per_passenger(flights_data)
#     # The function should not add the 'Revenue_Per_Passenger' column for missing data
#     assert 'Revenue_Per_Passenger' not in flights_data[0]
#     assert 'Revenue_Per_Passenger' not in flights_data[1]
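
One scenario the cases above do not cover is a flight with zero passengers, where a plain division would raise `ZeroDivisionError`. The sketch below shows how such a test might look. `calculate_revenue_per_passenger_stub` is a hypothetical stand-in, and skipping zero-passenger records (instead of raising or storing `None`) is an assumption about the desired behaviour, not something the notebook states.

```python
from typing import Any, Dict, List


def calculate_revenue_per_passenger_stub(flights_data: List[Dict[str, Any]]) -> None:
    """Hypothetical stand-in: adds 'Revenue_Per_Passenger' in place when computable."""
    for flight in flights_data:
        passengers = flight.get("Passengers")
        revenue = flight.get("Revenue")
        # Skip records with missing values or zero passengers (assumed behaviour)
        if passengers is None or revenue is None or passengers == 0:
            continue
        flight["Revenue_Per_Passenger"] = revenue / passengers


def test_calculate_revenue_per_passenger_zero_passengers() -> None:
    flights_data = [{"Flight_ID": 3, "Passengers": 0, "Revenue": 500}]
    calculate_revenue_per_passenger_stub(flights_data)
    # No column is added, and no ZeroDivisionError is raised
    assert "Revenue_Per_Passenger" not in flights_data[0]
```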


### To run the test functions: 

<div style="text-align: justify;">

Several files have been created to simulate invalid input files and files containing invalid data, so that the error handling and logging in each function can be tested.

Additionally, a file `test_flights.py` has been created to make testing easier with pytest. This file includes the functions under test, as well as the uncommented test functions. All explanations and clarifications are included in this notebook with the goal of making this project more didactic.

Running `pytest test_flights.py` in the terminal then executes the tests.

Note: The `generate_flight_data()` function mentioned in part 3 has not been previously requested; therefore, nothing has been done with respect to it.

</div>
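
As a minimal sketch, the same run can also be started from Python (for example from a notebook cell) by calling pytest as a module through `subprocess`. The helper names `build_pytest_command` and `run_tests` are hypothetical, and the example assumes pytest is installed in the active interpreter and that `test_flights.py` is in the current working directory.

```python
import subprocess
import sys
from typing import List


def build_pytest_command(test_file: str = "test_flights.py") -> List[str]:
    """Build the command line; '-v' makes pytest list each test by name."""
    return [sys.executable, "-m", "pytest", test_file, "-v"]


def run_tests(test_file: str = "test_flights.py") -> int:
    """Run pytest in a subprocess and return its exit code (0 means all tests passed)."""
    completed = subprocess.run(build_pytest_command(test_file), check=False)
    return completed.returncode
```

Calling `run_tests()` prints pytest's report to the cell output and returns the exit code.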