# CSV Data Cleaning Script

This Python script uses pandas to clean and standardize the data in a CSV file, making it easier to work with. It performs three tasks:

1. **Date Formatting**:
    - The script reformats every entry in the `Posting Date` column from `DD/MM/YYYY` to `YYYY-MM-DD`, so dates are consistent across the dataset. If the file has no `Posting Date` column, this step is skipped.
    - If a date is missing or does not match `DD/MM/YYYY`, the script replaces it with `N/A`.

2. **Handling Missing or Invalid Data**:
    - The script replaces any empty cell or cell containing `"-"` with `"N/A"`. Marking missing values uniformly helps prevent errors and inconsistencies during later analysis.

3. **Saving the Cleaned Data**:
    - After processing the data, the script saves the cleaned CSV file to the specified output path.
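
The missing-value step can be sketched on a small in-memory sample. This is only an illustration under stated assumptions: the sample rows and the `Amount` and `Note` column names are made up, and `io.StringIO` stands in for a real file. It shows that `pd.read_csv` loads empty cells as `NaN` rather than as empty strings, so a fill step is needed in addition to replacing `"-"`.

```python
import io
import pandas as pd

# Hypothetical sample data; only 'Posting Date' comes from the script's description.
sample = io.StringIO(
    "Posting Date,Amount,Note\n"
    "26/08/2024,10.5,-\n"
    ",-,ok\n"
)

df = pd.read_csv(sample)

# Empty cells arrive as NaN, not as '' (the second row has no date).
print(df["Posting Date"].isna().tolist())

# Replace the '-' placeholder, then fill the remaining NaN cells.
cleaned = df.replace("-", "N/A").fillna("N/A")
print(cleaned.to_csv(index=False))
```

Using the non-inplace forms of `replace` and `fillna` returns a new DataFrame and leaves the original `df` available for comparison.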


In [2]:
import pandas as pd
from datetime import datetime

# Convert a DD/MM/YYYY date string to YYYY-MM-DD
def format_date(date_str):
    """
    Converts a 'DD/MM/YYYY' date string to 'YYYY-MM-DD'.
    
    Parameters:
    date_str (str or NaN): The date string to format.
    
    Returns:
    str: The formatted date string, or 'N/A' if the input is missing
         or does not match 'DD/MM/YYYY'.
    """
    if pd.isna(date_str) or date_str == "N/A":
        return "N/A"
    try:
        # strip() tolerates stray whitespace around the value
        return datetime.strptime(str(date_str).strip(), "%d/%m/%Y").strftime("%Y-%m-%d")
    except ValueError:
        return "N/A"  # The value is not a valid DD/MM/YYYY date

def clean_csv(input_path, output_path):
    """
    Cleans and formats a CSV file by:
    - Formatting the 'Posting Date' column to 'YYYY-MM-DD'.
    - Replacing empty cells and '-' with 'N/A'.
    
    Parameters:
    input_path (str): The path to the input CSV file.
    output_path (str): The path to save the cleaned CSV file.
    """
    # Reading data from CSV
    df = pd.read_csv(input_path)
    
    # Formatting dates in the 'Posting Date' column
    if 'Posting Date' in df.columns:
        df['Posting Date'] = df['Posting Date'].apply(format_date)
    
    # Replacing '-' and empty values with 'N/A'. read_csv loads empty cells
    # as NaN rather than '', so fillna is needed to catch them.
    df = df.replace({"-": "N/A", "": "N/A"}).fillna("N/A")
    
    # Saving the cleaned CSV
    df.to_csv(output_path, index=False)

# File paths
input_path = 'Pet Project 2608.csv'  
output_path = 'cleaned_data.csv'  


clean_csv(input_path, output_path)
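
For larger files, the row-by-row `apply(format_date)` call could be swapped for a vectorized conversion. The following is a minimal sketch of that alternative, not part of the script above: `format_dates_vectorized` is a hypothetical helper name, and the sample values are made up. It assumes the same `DD/MM/YYYY` input format and relies on `pd.to_datetime` with `errors="coerce"` to turn unparseable values into `NaT`.

```python
import pandas as pd

def format_dates_vectorized(dates: pd.Series) -> pd.Series:
    """Hypothetical vectorized counterpart to format_date."""
    # Values that do not match DD/MM/YYYY become NaT instead of raising.
    parsed = pd.to_datetime(dates, format="%d/%m/%Y", errors="coerce")
    # strftime leaves missing entries as NaN, which are then filled with 'N/A'.
    return parsed.dt.strftime("%Y-%m-%d").fillna("N/A")

# Made-up sample: one valid date, one in the wrong format,
# one missing value, and one impossible calendar date.
dates = pd.Series(["26/08/2024", "2024-08-26", None, "31/02/2024"])
print(format_dates_vectorized(dates).tolist())
```

Inside `clean_csv`, this would be used as `df['Posting Date'] = format_dates_vectorized(df['Posting Date'])`. Unlike the `format_date` function it replaces, this sketch does not strip surrounding whitespace before parsing.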

