***Name: Mubanga Nsofu***


***Date: 13th July 2024***


***Institution: Nexford University***


***Lecturer: Prof. Wanjiku***


***Course: BAN 6420 (Programming in R and Python)***


***Assignment: Module 2***

***Assignment Solution Starts Here***

***1.0 Install library for fast Exploratory Data Analysis in Python***

In [None]:
# Let us install a library that will facilitate EDA before we proceed to write the Salary function

!pip install ydata-profiling[notebook]

***2.0 Let us load the libraries needed for this assignment***

In [1]:
# Import necessary libraries

import pandas as pd  # pandas is used for data manipulation and analysis
import os  # Used for interacting with the operating system with respect to file management e.g. removing files.
import zipfile # For creating Zip files as requested in the assignment question
from ydata_profiling import ProfileReport # For automated Exploratory Data Analysis (EDA)


***3.0 Check working directory and ensure dataset is in the working directory***

In [84]:
# To check your working directory run the following commands

# get the current working directory
current_working_directory = os.getcwd()

# print the working directory, then use this output as the folder part of file_path when reading the dataset
print(current_working_directory)





D:\Nexford\MSDA\MSDA Modules\BAN6420 Python and R Programming\From Github\Assignment 2 Submission
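
Printing the working directory shows where Python is looking, but it does not confirm that the dataset is actually there. The sketch below is one way to check this in code. It assumes the dataset file is named `Total.csv`, and `dataset_in_directory` is a hypothetical helper name that is not part of the assignment.

```python
from pathlib import Path


def dataset_in_directory(file_name, directory=None):
    """Return True if file_name exists as a regular file in directory (default: current working directory)."""
    base = Path(directory) if directory is not None else Path.cwd()
    return (base / file_name).is_file()


# "Total.csv" is assumed to be the dataset's file name
print(dataset_in_directory("Total.csv"))
```

If this prints `False`, either move the dataset into the working directory or pass the full path to `pd.read_csv`.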


***Read in the employee dataset as a pandas dataframe called employee_df***

In [None]:
# Example : file_path = "Your/Path/Total.csv"

file_path = "D:/Nexford/MSDA/MSDA Modules/BAN6420 Python and R Programming/From Github/Assignment 2 Submission/Total.csv"


# Read the CSV file 
employee_df = pd.read_csv(file_path)

# Display the first few rows of the DataFrame
print(employee_df.head())

***4.0 Let us have a quick view using EDA, as this is good practice in Data Analytics***

In [None]:
# generate report
profile = ProfileReport(employee_df, title="Profiling Report")

profile

***The EDA report above gives the following insights:***

1. Total Pay and Total Pay Benefits are highly correlated, as expected.
2. BasePay is flagged as an unsupported type; check whether it needs cleaning or further analysis.
3. OvertimePay is flagged as an unsupported type; check whether it needs cleaning or further analysis.
4. OtherPay is flagged as an unsupported type; check whether it needs cleaning or further analysis.
5. Benefits is flagged as an unsupported type; check whether it needs cleaning or further analysis.

Items 2 to 5 suggest data quality issues that need attention, especially where computations on these columns are required.
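
A column is often reported as an unsupported type when it mixes numbers with text. Assuming that is the cause here (the report alone does not confirm it), a minimal sketch of one possible cleaning step is to coerce the pay columns to numeric values. The `sample_df` DataFrame below is hypothetical and only stands in for `employee_df`.

```python
import pandas as pd

# Hypothetical sample standing in for employee_df; the real column contents may differ
sample_df = pd.DataFrame({
    "BasePay": ["1000.50", "Not Provided", 2500],
    "OvertimePay": ["0", "120.75", None],
})

pay_columns = ["BasePay", "OvertimePay"]

# errors="coerce" turns any value that cannot be parsed as a number into NaN
for column in pay_columns:
    sample_df[column] = pd.to_numeric(sample_df[column], errors="coerce")

print(sample_df.dtypes)        # data type of each column after the conversion
print(sample_df.isna().sum())  # number of missing values per column
```

Counting the missing values after the conversion shows how many entries could not be read as numbers, which helps in deciding whether to drop or impute them.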

In [None]:
# The EDA points to data quality issues; for example, the EmployeeName
# column has entries with the value 'not provided'
# I filter out the rows whose employee name contains the string "not provided"
# I use 'str.contains' to check whether 'EmployeeName' contains 'not provided' (case insensitive)
# I pass na=False so that rows with a missing name are kept instead of raising an error
# I use the '~' operator to negate the boolean Series returned by 'str.contains'
# Finally I use a print statement to check the output

employee_df = employee_df[~employee_df['EmployeeName'].str.contains('not provided', case=False, na=False)]

print(employee_df)


***5.0 Let us create the employee function, which returns employee details and includes dictionary processing and error handling.***

In [None]:


def employee_details(employee_name):
    """
    Retrieve details for a given employee by name.

    The function searches for an employee in the employee_df DataFrame
    by matching the provided employee_name. If one or more matches are found,
    it returns the first matching employee's details as a dictionary. If no
    match is found, it returns a message indicating that no details were found.
    The function also includes error handling to manage any potential issues gracefully.

    Parameters:
    employee_name (str): The name of the employee to search for.

    Returns:
    dict or str: A dictionary containing the employee's details if found, 
                 otherwise a string message indicating that no details were found.
                 If an error occurs, a string message with the error details is returned.

    Example:
    employee_details("thomas")
    """
    try:
        # I search for the employee by name in the pandas DataFrame
        # I use str.contains with case=False because the DataFrame has upper case and lower case characters
        # regex=False treats the name as plain text and na=False skips rows with a missing name

        employee = employee_df[employee_df['EmployeeName'].str.contains(employee_name, case=False, regex=False, na=False)]
        
        if not employee.empty:
            # I convert the employee's details to a dictionary and return if the employee is found in the dataframe
            return employee.iloc[0].to_dict()
        else:
            # I return the message below if no employee is found in the pandas DataFrame
            return f"No details found for employee: {employee_name}"
    except Exception as e:
        # I handle any exceptions that occur and return an error message
        return f"An error occurred: {str(e)}"
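
Because the search matches substrings, a short name such as "thomas" can match several employees, while the function above returns only the first one. The sketch below shows one way to return every match instead. The `sample_df` data and the `find_all_matches` name are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical sample data; the real employee_df has more columns
sample_df = pd.DataFrame({
    "EmployeeName": ["THOMAS LEE", "Thomas Grant", "ANNA SMITH", None],
    "JobTitle": ["Clerk", "Engineer", "Analyst", "Nurse"],
})


def find_all_matches(name, df):
    """Return every row whose EmployeeName contains name, as a list of dictionaries."""
    # regex=False matches the text literally; na=False treats missing names as non-matches
    mask = df["EmployeeName"].str.contains(name, case=False, regex=False, na=False)
    return df[mask].to_dict("records")


matches = find_all_matches("thomas", sample_df)
print(len(matches))
```

Returning a list keeps the dictionary processing from the assignment while making it visible when a name is ambiguous.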



In [58]:
# Example Usage # 1

employee_details("THOMAS")

'No details found for employee: THOMAS'

In [64]:
# Example Usage # 2

employee_details("thomas")

'No details found for employee: thomas'

In [47]:
# Example Usage # 3

employee_details("not provided")

'No details found for employee: not provided'

In [48]:
# Let us test the function with a non-existent employee
print(employee_details('Mubanga Nsofu'))


No details found for employee: Mubanga Nsofu


In [50]:
# Another example

print(employee_details('Raphael Wanjiku'))

No details found for employee: Raphael Wanjiku


***6.0 Let us export the employee details as a zipped CSV file, as required by the question***

In [73]:
def save_employee_details_to_zip(employee_name, output_zip_file):
    """
    Retrieve details for a given employee by name and save them to a CSV file,
    which is then zipped into a specified output file.

    This function uses the employee_details function to get the employee's details,
    writes the details to a CSV file, and then compresses the CSV file into a ZIP file.

    Parameters:
    employee_name (str): The name of the employee to search for.
    output_zip_file (str): The path to the output ZIP file.

    Returns:
    str: A message indicating whether the process was successful or if an error occurred.

    Example:
    save_employee_details_to_zip("thomas", "employee_details.zip")
    """
    try:
        # I get employee details using the employee_details function defined previously
        details = employee_details(employee_name)
        
        if isinstance(details, dict):
            # If the details are found and returned as a dictionary then,
            # Create a DataFrame from the employee details dictionary
            df = pd.DataFrame([details])
            csv_file = "employee_details.csv"
            
            # Then save the DataFrame to a CSV file
            df.to_csv(csv_file, index=False)
            
            # And then create a ZIP file and add the CSV file to it as requested 
            with zipfile.ZipFile(output_zip_file, 'w', zipfile.ZIP_DEFLATED) as zipf:
                zipf.write(csv_file)
            
            # I then clean up the CSV file by removing it after zipping
            os.remove(csv_file)
            # Return a success message after the operation
            return f"Employee details saved and zipped successfully as {output_zip_file}."
        else:
            # If details are not found, return the message from the employee_details function
            return details 
    except Exception as e:
        # I handle any exceptions that occur and return an error message
        return f"An error occurred: {str(e)}"
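
The function above writes a temporary CSV file, zips it, and then deletes it. As an alternative sketch, the CSV text can be written straight into the archive with `writestr`, and the archive can be read back to confirm its contents. The `details` record below is hypothetical, and an in-memory buffer is used so that the example leaves no files behind.

```python
import io
import zipfile

import pandas as pd

# Hypothetical record standing in for the dictionary returned by employee_details
details = {"EmployeeName": "THOMAS LEE", "JobTitle": "Clerk", "TotalPay": 1000.5}

buffer = io.BytesIO()  # in-memory target, so no files are created on disk
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zipf:
    # writestr stores the CSV text directly as a member of the archive
    zipf.writestr("employee_details.csv", pd.DataFrame([details]).to_csv(index=False))

# Read the archive back to confirm what it contains
buffer.seek(0)
with zipfile.ZipFile(buffer) as zipf:
    member_names = zipf.namelist()
    with zipf.open("employee_details.csv") as csv_member:
        restored_df = pd.read_csv(csv_member)

print(member_names)
print(restored_df)
```

Replacing `buffer` with a file path such as the `output_zip_file` argument would write the same archive to disk.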




In [78]:
# Example usage number 1:

# Let us call the function with an employee name
print(save_employee_details_to_zip("thomas", "Employee Profile.zip"))


Employee details saved and zipped successfully as Employee Profile.zip.


In [79]:
# Example usage number 2:
# Let us call the function with a non-existent employee name
print(save_employee_details_to_zip("mubanga", "Employee Profile.zip"))


No details found for employee: mubanga
