# Invoice Data Consolidation Script

## Overview
This script consolidates invoice data from three monthly CSV files (January, February, and March 2024) into a single DataFrame. It applies explicit column data types on read, reports each month's shape and total `Corporate Net Amount`, and saves the merged data to a new CSV file.

## Process Goal
The goal of this script is to:
1. Read invoice data from three separate CSV files representing different months (January, February, March 2024).
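2. Specify data types for the identifier and category columns (read as strings) and for `Corporate Net Amount` (read as float), so that values such as IDs and invoice numbers are not reinterpreted as numbers. Columns not listed in the `dtype` mapping are left to pandas' type inference.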
3. Print the shape and total corporate net amount for each month's data.
4. Merge the data from all three months into a final consolidated DataFrame.
5. Save the final DataFrame as a new CSV file for further analysis.
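As an illustration of why step 2 matters, the following minimal sketch uses made-up data (the `Vendor ID` values are hypothetical and not taken from the invoice files). It shows how an identifier with leading zeros is altered when pandas infers the column type, and preserved when the column is read as a string.

```python
import io
import pandas as pd

# Hypothetical sample data; the real invoice files are not reproduced here.
csv_text = "Vendor ID,Corporate Net Amount\n00123,10.50\n00045,20.25\n"

# Without dtype, pandas infers an integer column and the leading zeros are lost.
inferred = pd.read_csv(io.StringIO(csv_text))

# With an explicit string dtype, the identifier is kept exactly as written.
typed = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"Vendor ID": "str", "Corporate Net Amount": float},
)

print(inferred["Vendor ID"].tolist())  # [123, 45]
print(typed["Vendor ID"].tolist())     # ['00123', '00045']
```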

## Considerations for Improvement
- **Dynamic File Handling**: Use a loop or list comprehension to read and merge files, reducing redundancy in code.
- **Error Handling**: Implement error handling for file reading operations to manage potential issues (e.g., missing files).
- **Data Validation**: Include checks to validate the integrity of the data (e.g., ensuring no missing values in critical columns) before merging.
- **Documentation**: Add comments or docstrings to explain the purpose of specific code sections for future reference and clarity.
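The first three suggestions above could be combined roughly as follows. This is a sketch under stated assumptions, not the script's actual implementation: the function name `read_invoice_files` and its parameters are hypothetical, and which columns count as "critical" is a choice left to the caller.

```python
from pathlib import Path

import pandas as pd


def read_invoice_files(paths, dtype=None, amount_column="Corporate Net Amount",
                       required_columns=("Corporate Net Amount",)):
    """Read each CSV, validate it, report shape and total, and return one DataFrame.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    frames = []
    for path in map(Path, paths):
        # Error handling: fail early with a clear message if a file is missing.
        if not path.exists():
            raise FileNotFoundError(f"Invoice file not found: {path}")

        frame = pd.read_csv(path, encoding="UTF-8-SIG", dtype=dtype)

        # Data validation: required columns must exist and contain no missing values.
        wanted = set(required_columns) | {amount_column}
        missing = [col for col in wanted if col not in frame.columns]
        if missing:
            raise ValueError(f"{path.name} is missing columns: {missing}")
        nulls = frame[list(required_columns)].isna().sum()
        if nulls.any():
            raise ValueError(f"{path.name} has missing values: {nulls[nulls > 0].to_dict()}")

        print(f"{path.name} shape {frame.shape} and total amount ({frame[amount_column].sum()})")
        frames.append(frame)

    # Dynamic file handling: one concat over however many files were supplied.
    return pd.concat(frames, ignore_index=True)
```

With the `dtype` mapping defined in the first code cell, the call could look like `final = read_invoice_files(['Invoice_FAF_JAN 04.18.2024.csv', 'Invoice_FAF_FEB 04.18.2024.csv', 'Invoice_FAF_MAR 04.18.2024.csv'], dtype=dtype)`.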

In [None]:
# Read the three monthly invoice files with explicit column dtypes
import pandas as pd

dtype = {
    'Vendor':'str',
    'Vendor ID':'str',
    'Vendor Name':'str',
    'CorpSegment6 _ FAF Category':'str',
    'GL Description':'str',
    'Project Code':'str',
    'CorpSegment6 _ FAF Managerial Mapping':'str',
    'CorpSegment6 _ Unconsolidated Mapping':'str',
    'FPA Function':'str',
    'ORG UNIT _ Region Rollup':'str',
    'ORG UNIT _ Business Unit Rollup':'str',
    'Subway or FAF':'str',
    'Invoice #':'str',
    'Project Code Description':'str',
    'Org Unit - Description':'str',
    'CorpSegment6 _ FAF Working Capital':'str',
    'Month':'str',
    'Quarter':'str',
    'FILE_NAME':'str',
    'Corporate Net Amount': float
}

# Note: the period_* names do not follow calendar order:
# period_n = February, period_n_plus1 = March, period_n_plus2 = January.
period_n = pd.read_csv('Invoice_FAF_FEB 04.18.2024.csv', encoding='UTF-8-SIG', dtype=dtype)
period_n_plus1 = pd.read_csv('Invoice_FAF_MAR 04.18.2024.csv', encoding='UTF-8-SIG', dtype=dtype)
period_n_plus2 = pd.read_csv('Invoice_FAF_JAN 04.18.2024.csv', encoding='UTF-8-SIG', dtype=dtype)

print(f"Jan 2024 shape {period_n_plus2.shape} and total amount({period_n_plus2['Corporate Net Amount'].sum()})")
print(f"Feb 2024 shape {period_n.shape} and total amount ({period_n['Corporate Net Amount'].sum()})")
print(f"Mar 2024 shape {period_n_plus1.shape} and total amount ({period_n_plus1['Corporate Net Amount'].sum()})")

Jan 2024 shape (1012, 28) and total amount(53143177.26)
Feb 2024 shape (1149, 28) and total amount (37179546.95)
Mar 2024 shape (1303, 28) and total amount (71022512.69)


In [None]:
final = pd.concat([period_n,period_n_plus1,period_n_plus2], ignore_index=True)

final_shape = final.shape
final_amount = final['Corporate Net Amount'].sum()

print(f"Final shape{final_shape} and amount({final_amount})")

Final shape(3464, 28) and amount(161345236.89999998)
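
The trailing digits in the final amount (`...89999998`) come from binary floating-point summation, not from a data problem: the three monthly totals printed earlier add up to 161345236.90, and the row counts add up to 3464. A check along the following lines could confirm that the concatenation preserved both. It is a sketch only; the helper name `reconcile` and the half-cent tolerance are assumptions, not part of the original script.

```python
import math

import pandas as pd


def reconcile(frames, combined, amount_column="Corporate Net Amount"):
    """Check that `combined` has the same row count and total as `frames` together.

    Hypothetical helper; returns the combined total rounded to two decimals.
    """
    expected_rows = sum(len(frame) for frame in frames)
    expected_total = sum(frame[amount_column].sum() for frame in frames)
    actual_total = combined[amount_column].sum()

    if len(combined) != expected_rows:
        raise ValueError(f"Row count mismatch: {len(combined)} != {expected_rows}")

    # Compare with a tolerance because float sums depend on the order of addition.
    # The 0.005 absolute tolerance (half a cent) is an assumed threshold.
    if not math.isclose(actual_total, expected_total, rel_tol=1e-9, abs_tol=0.005):
        raise ValueError(f"Total mismatch: {actual_total} != {expected_total}")

    return round(actual_total, 2)
```

In this notebook it could be called as `reconcile([period_n, period_n_plus1, period_n_plus2], final)`.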


In [None]:
final.to_csv('Invoice_FAF_FY24_Q1.csv', encoding='UTF-8-SIG', index=False)
