# OilDesk-Python-Assessment: Question 5

This notebook rewrites the code that loads the data from the CSV file. I chose this code block because the problems unique to real datasets, such as missing values and other errors, really interested me. The markdown cell above each code block describes how it improves maintainability and modularity.

In [1]:
import pandas as pd
import logging

## 1. Take one of your previously written code blocks and refactor it to be more maintainable and modular. Explain your decisions.

Created a **`csv_file_path`** variable rather than passing the file path directly to the **`read_csv()`** function. This improves readability, maintenance and flexibility (the path may be referenced several times, the file may move, or a different file may be loaded). It also gives a single place to check the path and handle errors before attempting to load the file.

In [2]:
csv_file_path = "../data/MarketData.csv"
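The early path check mentioned above could be sketched as follows. This is only an illustration: `resolve_csv_path` is a hypothetical helper that is not part of the original notebook, and it assumes that logging an error and returning `None` is an acceptable way to report a wrong path.

```python
import logging
from pathlib import Path


def resolve_csv_path(path_str):
    """Return path_str as a Path if it points to an existing file, else None.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    path = Path(path_str)
    if not path.is_file():
        # Report the problem early, before any attempt to parse the file
        logging.error("No file found at: %s", path)
        return None
    return path
```

Because the path lives in one variable, a check like `resolve_csv_path(csv_file_path)` only has to be written once, however many times the path is used later.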

Defined a custom **`read_csv_file()`** function, which uses a try-except block and Python's **`logging`** module to handle and log the following errors:
- **`UnicodeDecodeError`** - the function specifies the widely used **`utf-8`** encoding in case the CSV file contains non-ASCII characters, and logs an error if the file cannot be decoded with it.
- **`FileNotFoundError`** - raised when no file exists at the specified path; the path is logged.
- Unexpected errors via **`except Exception as e`**.

The function also handles missing values in the dataset by listing, in **`na_values`**, the strings that should be treated as missing. This list is easily extended to include other placeholders commonly seen.

The errors handled above are not exhaustive, and more **`except`** statements can easily be added to handle other errors in business-specific contexts.
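As a rough sketch of how both lists might be extended, the variant below adds two more **`except`** clauses and makes the missing-value placeholders a parameter. `pd.errors.EmptyDataError` and `pd.errors.ParserError` are existing pandas exception classes; the function name `read_csv_file_extended` and the extra placeholders `"-"` and `"missing"` are assumptions made for illustration, not part of the original code.

```python
import logging

import pandas as pd

# Assumed placeholder list for illustration; adjust to the dataset at hand
DEFAULT_NA_VALUES = ["NA", "N/A", "-", "missing"]


def read_csv_file_extended(file_path, na_values=None):
    """Load a CSV file into a DataFrame, or log the error and return None."""
    if na_values is None:
        na_values = DEFAULT_NA_VALUES
    try:
        return pd.read_csv(file_path, encoding="utf-8", na_values=na_values)
    except FileNotFoundError:
        logging.error("File not found: %s", file_path)
    except UnicodeDecodeError:
        logging.error("Unable to decode file with UTF-8 encoding: %s", file_path)
    except pd.errors.EmptyDataError:
        # Raised by pandas when the file contains no columns to parse
        logging.error("File is empty: %s", file_path)
    except pd.errors.ParserError:
        # Raised by pandas when the file cannot be tokenized into rows
        logging.error("File is malformed: %s", file_path)
    except Exception as e:
        logging.error("An unexpected error occurred: %s", e)
    # Reached only when one of the handlers above has run
    return None
```

Passing `na_values` as an argument keeps dataset-specific placeholders out of the function body, so a different file can use a different list without editing the loader.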
     



In [3]:
def read_csv_file(file_path):
    """Load a CSV file into a DataFrame, or log the error and return None."""
    try:
        # Decode as UTF-8 and treat 'NA' and 'N/A' as missing values
        df = pd.read_csv(file_path, encoding='utf-8', na_values=['NA', 'N/A'])
        logging.info("CSV file loaded successfully.")
        return df
    except FileNotFoundError:
        logging.error("File not found: %s", file_path)
        return None
    except UnicodeDecodeError:
        logging.error("Unable to decode file with UTF-8 encoding: %s", file_path)
        return None
    except Exception as e:
        # Catch-all so that unforeseen problems are logged rather than raised
        logging.error("An unexpected error occurred: %s", e)
        return None

**`read_csv_file()`** uses logging because it affords more flexibility (messages can be sent to various destinations without changing the logging statements), persistence (logs can be written to a file) and integration (with monitoring tools or dashboards, for example). The extra if-else block underneath confirms whether or not the CSV file has loaded successfully, using **`print()`** statements for speed and convenience in a Jupyter notebook.

In [4]:
# Confirms if CSV file successfully loaded to pandas DataFrame or not
df = read_csv_file(csv_file_path)
if df is not None:
    print("Data successfully loaded.")
else:
    print("Error loading data. Please check logs for details.")

Data successfully loaded.
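The logging flexibility described earlier depends on logging being configured. With Python's default configuration only messages of level WARNING and above are shown, so the INFO message in **`read_csv_file()`** stays hidden. A minimal sketch of one possible setup follows; the helper name `configure_logging`, the file name `data_loading.log` and the message format are assumptions for illustration only.

```python
import logging


def configure_logging(log_file="data_loading.log"):
    """Send messages of level INFO and above to the console and to a log file."""
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
        handlers=[
            logging.StreamHandler(),        # shows messages in the notebook output
            logging.FileHandler(log_file),  # persists messages to a file
        ],
        force=True,  # replace any handlers already installed (Python 3.8+)
    )
```

Calling `configure_logging()` once, before `read_csv_file(csv_file_path)`, would route the existing logging statements to both destinations without any change to the function itself.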
