# Introduction
This script combines every CSV file in a single directory into one CSV file, keeping the column headers of the first file. It assumes all files share the same column layout; if the files are spread across different directories or have varied structures, the script will need to be modified. The sections below walk through each step.

## Imports and Setup
The script imports the libraries it needs: pandas for data manipulation and `os` for interacting with the file system.

In [25]:
import pandas as pd
import os

## File Discovery
The script allows for easy switching between using the current directory and specifying a custom path. You can uncomment the appropriate line for the directory you want to use.

In [26]:
# Uncomment the line below to use the current working directory
# path = '.'

# Uncomment and modify the line below to specify a different directory
path = r'C:\Users\chaolu\Project folder\INGENIOUS\Dixie Valley\Dixie Valley Excel\Final version'

The script then searches the specified directory for CSV files and counts them. If none are found, it raises an error with a descriptive message. Otherwise it prints the total and, when there are 10 or fewer, lists the file names.

In [27]:
# Get the list of all CSV files in the specified directory
csv_files = [f for f in os.listdir(path) if f.endswith('.csv')]
# Check the number of CSV files found
num_files = len(csv_files)

# Ensure there is at least one CSV file
if num_files == 0:
    raise ValueError("No CSV files found in the directory: " + path)
else:
    print(f"Total CSV files found: {num_files}")

# List CSV files if 10 or fewer are found
if num_files <= 10:
    print("Listing CSV files:")
    for file in csv_files:
        print(file)

Total CSV files found: 7
Listing CSV files:
45-5 Water Analysis Reports_CL2.csv
52-18 Water Analysis Reports_CL2.csv
62-21 Water Analysis Reports_CL2.csv
65-18 Water Analysis Reports_CL2.csv
84-7 Water Analysis Reports_CL2.csv
S.W.LAMB #1 Water Analysis Reports_CL2.csv
S.W.LAMB #3 Water Analysis Reports_CL2.csv
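The discovery step above can also be sketched with `pathlib`. This is a minimal variant, not the script's own method: the helper name `find_csv_files` and the demo file names are hypothetical. It matches the extension case-insensitively and sorts the result, since directory listings come back in no guaranteed order.

```python
from pathlib import Path
import tempfile


def find_csv_files(directory):
    """Return CSV paths in `directory`, sorted by name (hypothetical helper)."""
    folder = Path(directory)
    # Match the extension case-insensitively so 'DATA.CSV' is also picked up
    files = [p for p in folder.iterdir()
             if p.is_file() and p.suffix.lower() == ".csv"]
    # Directory listings have no guaranteed order, so sort explicitly
    return sorted(files, key=lambda p: p.name)


# Small demonstration in a throwaway directory with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    for name in ["b.csv", "a.CSV", "notes.txt"]:
        (Path(tmp) / name).write_text("x\n1\n")
    found_names = [p.name for p in find_csv_files(tmp)]
    print(found_names)
```

Because the helper returns full `Path` objects rather than bare names, each result can be passed to `pd.read_csv` directly.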


## Read Initial File
The script reads the first CSV file to retrieve column names, which are used to ensure consistency when reading additional files.

In [28]:
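# Read the first file to get the column names
# (os.listdir returns bare file names, so join each one with the directory path)
first_file = pd.read_csv(os.path.join(path, csv_files[0]))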
# Initialize a list to store dataframes, starting with the first file
df_list = [first_file]

## Reading Remaining Files
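Each subsequent CSV file is read with the first file's column names applied as headers. Passing `header=0` together with `names` makes pandas discard the file's own header row, so headers are not duplicated as data rows. Columns are matched by position, so every file must list its columns in the same order.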

In [29]:
# Read remaining files using the column names from the first file and skip the header
for file in csv_files[1:]:
    df = pd.read_csv(os.path.join(path, file), names=first_file.columns, header=0)
    df_list.append(df)
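Because the loop above relabels columns by position, a file whose columns are in a different order would be merged silently under the wrong names. One possible safeguard, sketched here under assumptions, is to compare each file's header with the first file's before reading it. The helper `check_headers` and the column names `well` and `pH` are hypothetical and not part of the original script.

```python
import io
import pandas as pd


def check_headers(reference_columns, csv_source, label):
    """Raise if the CSV header differs from reference_columns (hypothetical helper)."""
    # nrows=0 parses only the header row, so no data rows are loaded
    header = list(pd.read_csv(csv_source, nrows=0).columns)
    if header != list(reference_columns):
        raise ValueError(
            f"{label}: expected columns {list(reference_columns)}, found {header}"
        )


# Invented column names standing in for the first file's header
reference = ["well", "pH"]

# Same header: the check passes silently
check_headers(reference, io.StringIO("well,pH\n45-5,7.1\n"), "good.csv")

# Swapped columns: the check raises instead of mislabeling the data
try:
    check_headers(reference, io.StringIO("pH,well\n7.1,45-5\n"), "swapped.csv")
except ValueError as err:
    mismatch_message = str(err)
    print(mismatch_message)
```

In the script itself, the same check could be called with `first_file.columns` and the full path of each file inside the loop.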

## Concatenation
All dataframes are concatenated into a single dataframe, effectively merging all CSV files into one.

In [30]:
# Concatenate all dataframes
final_df = pd.concat(df_list, ignore_index=True)
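To illustrate what `ignore_index=True` does in the concatenation above, the following sketch uses two tiny stand-in dataframes with invented columns and values. It also shows an optional extra, not present in the original script: tagging each row with a hypothetical `source_file` column so rows can be traced back to their file.

```python
import pandas as pd

# Two tiny stand-in frames; the column names and values are invented
df_a = pd.DataFrame({"well": ["45-5", "45-5"], "pH": [7.1, 7.3]})
df_b = pd.DataFrame({"well": ["52-18"], "pH": [6.9]})

# Without ignore_index each frame keeps its own 0-based row labels
kept = pd.concat([df_a, df_b])
print(kept.index.tolist())        # [0, 1, 0]

# With ignore_index=True the result is renumbered 0..n-1
renumbered = pd.concat([df_a, df_b], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2]

# Optional: record the originating file before concatenating
tagged = pd.concat(
    [df_a.assign(source_file="a.csv"), df_b.assign(source_file="b.csv")],
    ignore_index=True,
)
print(tagged)
```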

## Save Output
The combined dataframe is then saved to a new CSV file named 'Combined output.csv' in the current working directory, without including row indices.

In [31]:
# Save the concatenated dataframe to a new CSV file
final_df.to_csv('Combined output.csv', index=False)
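One caveat with this step: if the output lands in the same directory that is scanned for inputs (for example when `path = '.'`), a second run would pick up 'Combined output.csv' as an input and duplicate every row. A minimal sketch of one way to guard against that follows. The helper `list_inputs` and the demo files are hypothetical, and this is an assumption about how the script might be hardened rather than part of the original.

```python
import os
import tempfile
import pandas as pd

OUTPUT_NAME = "Combined output.csv"  # same name the script writes


def list_inputs(folder):
    """List input CSVs, skipping the combined output (hypothetical helper)."""
    return sorted(f for f in os.listdir(folder)
                  if f.endswith(".csv") and f != OUTPUT_NAME)


with tempfile.TemporaryDirectory() as tmp:
    # Create two made-up input files
    pd.DataFrame({"x": [1]}).to_csv(os.path.join(tmp, "a.csv"), index=False)
    pd.DataFrame({"x": [2]}).to_csv(os.path.join(tmp, "b.csv"), index=False)

    first_run = list_inputs(tmp)
    combined = pd.concat(
        [pd.read_csv(os.path.join(tmp, f)) for f in first_run],
        ignore_index=True,
    )
    # index=False keeps the row labels out of the written file
    combined.to_csv(os.path.join(tmp, OUTPUT_NAME), index=False)

    # The output now exists in the folder but is filtered out on a rerun
    second_run = list_inputs(tmp)
    round_trip = pd.read_csv(os.path.join(tmp, OUTPUT_NAME))

print(first_run, second_run)
```

Writing the output to a separate directory would achieve the same result without the filter.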