In [None]:
import pandas as pd
import os

# Set input and output directories
input_directory = "output"
output_file = "cmm_data.csv"

# Create empty lists to collect the per-file results
deviation_rows = []
quality_status = []
all_file_names = []

# Flag to track if the first file has been processed
first_file_processed = False
first_shape_item = None  # To store the shape_item from the first file

# Loop through all files in the input directory
for filename in os.listdir(input_directory):
    if filename.endswith(".csv"):
        # Read CSV file
        file_path = os.path.join(input_directory, filename)
        df = pd.read_csv(file_path, delimiter=',', encoding='cp949', index_col=False)  # index_col=False: do not use the first column as the index

        # Exclude rows where the '항목' (item) column is "SMmf";
        # .copy() avoids SettingWithCopyWarning on the assignments below
        df = df[df['항목'] != "SMmf"].copy()

        # Create the '도형_항목' (shape_item) column
        df['도형_항목'] = df['도형'] + ',' + df['항목']

        # Convert deviations to numbers; the '-' placeholder becomes NaN
        df['편차'] = pd.to_numeric(df['편차'], errors='coerce')

        # Mean of the available deviations (NaN values are skipped by default)
        avg_deviation = df['편차'].mean()

        # Replace missing deviation values with that mean
        df['편차'] = df['편차'].fillna(avg_deviation)

        # Extract deviation values
        deviations = df['편차'].tolist()

        # The first file defines the reference structure (its shape_item list)
        # and is kept in the output like any other matching file
        if not first_file_processed:
            first_shape_item = df['도형_항목'].tolist()
            first_file_processed = True
        # Skip later files whose structure differs from the first file
        elif df['도형_항목'].tolist() != first_shape_item:
            print(f"Skipping file: {filename}. Structure does not match the first file.")
            continue  # Skip processing for this file

        # Record the quality status for this file (1 = OK, 0 = otherwise)
        if df.iloc[1, 16] == 'OK':  # Cell at row position 1, column position 16
            quality_status.append(1)
        else:
            quality_status.append(0)

        # Add to list
        deviation_rows.append(deviations)
        all_file_names.append(os.path.splitext(filename)[0]) # Save the file name without extension

# Create a data frame by arranging deviations as rows and shape_items as columns
combined_data = pd.DataFrame(deviation_rows, columns=first_shape_item) # Create a DataFrame using the shape_item from the first file

# Add file name as first column
combined_data.insert(0, '파일명', all_file_names)

# Add quality status column
combined_data['품질상태'] = quality_status

# Save the results as a CSV file
combined_data.to_csv(output_file, encoding='cp949', index=False) # Do not store index


Skipping file: 240129_일상검사_주_초_1-6-1.csv. Structure does not match the first file.
Skipping file: 240129_일상검사_주_초_1-5-1.csv. Structure does not match the first file.
Skipping file: 240423_일상검사_야_초_1-6-1_.csv. Structure does not match the first file.
Skipping file: 240118_일상검사_야_중_2-3-1_NG.csv. Structure does not match the first file.
Skipping file: 240124_일상검사_주_초_2-3-1.csv. Structure does not match the first file.
Skipping file: 240205_일상검사_야_초_2-6-1_OK.csv. Structure does not match the first file.
Skipping file: 240117_일상검사_야_중_C24A04H_1-6-1_OK.csv. Structure does not match the first file.
Skipping file: 240129_일상검사_주_초_1-3-1.csv. Structure does not match the first file.
Skipping file: 240423_일상검사_야_초_1-4-1_.csv. Structure does not match the first file.
Skipping file: 240415_일상검사_야_중_1-3-2_.csv. Structure does not match the first file.
Skipping file: 240417_일상검사_주_중_1-5-1_OK.csv. Structure does not match the first file.
Skipping file: 240129_일상검사_주_초_1-4-1.csv. Structure does not mat

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['도형_항목'] = df['도형'] + ',' + df['항목']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['편차'] = df['편차'].replace('-', pd.NA)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['편차'] = df['편차'].fillna(avg_deviation)
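
The SettingWithCopyWarning above is raised because `df[df['항목'] != "SMmf"]` returns a filtered object, and the later column assignments write into it. A minimal sketch of one common way to avoid it follows. It assumes a small made-up frame: only the column names come from the notebook, the row values are hypothetical. The filtered frame is copied explicitly, and the deviation column is converted to numeric before the mean is imputed.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the notebook.
raw = pd.DataFrame({
    "도형": ["circle1", "circle1", "plane1", "plane1"],
    "항목": ["D", "SMmf", "X", "Y"],
    "편차": ["0.010", "0.500", "-", "0.030"],
})

# An explicit copy makes the filtered frame independent of `raw`,
# so the assignments below do not trigger SettingWithCopyWarning.
df = raw[raw["항목"] != "SMmf"].copy()

df["도형_항목"] = df["도형"] + "," + df["항목"]

# Coerce to float: the '-' placeholder becomes NaN.
df["편차"] = pd.to_numeric(df["편차"], errors="coerce")

# Fill the gaps with the mean of the remaining values.
df["편차"] = df["편차"].fillna(df["편차"].mean())

deviations = df["편차"].tolist()
labels = df["도형_항목"].tolist()
```

With this approach the deviation column ends up as a float column rather than a mix of strings and an imputed number.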


Skipping file: 240126_일상검사_주_중_4-4-1_(공구교환).csv. Structure does not match the first file.
Skipping file: 240423_일상검사_야_초_1-2-1_.csv. Structure does not match the first file.
Skipping file: 240118_일상검사_야_중_2-2-1_OK.csv. Structure does not match the first file.
Skipping file: 240423_일상검사_야_초_1-5-1_.csv. Structure does not match the first file.




Skipping file: 240423_일상검사_야_초_1-3-1_.csv. Structure does not match the first file.
Skipping file: 220420_일상검사_주_초_1-2-1_OK.csv. Structure does not match the first file.
Skipping file: 240403_일상검사_야_초_1-3-1_OK.csv. Structure does not match the first file.
Skipping file: 240417_일상검사_주_중_1-6-1_OK.csv. Structure does not match the first file.
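
The long run of "Skipping file" messages follows from using whichever file `os.listdir` happens to return first as the structural template. One possible alternative, sketched here under assumptions, is to collect every file's shape_item list first and use the most common one as the reference. The helper name `most_common_structure` and the sample file names and labels are hypothetical, not part of the notebook.

```python
from collections import Counter


def most_common_structure(structures):
    """Return the label sequence shared by the largest number of files.

    `structures` maps a file name to its list of shape/item labels.
    """
    # Lists are not hashable, so count tuples instead.
    counts = Counter(tuple(labels) for labels in structures.values())
    reference, _ = counts.most_common(1)[0]
    return list(reference)


# Hypothetical file names and labels, for illustration only.
structures = {
    "a.csv": ["circle1,D", "plane1,X"],
    "b.csv": ["circle1,D", "plane1,X", "plane1,Y"],
    "c.csv": ["circle1,D", "plane1,X", "plane1,Y"],
}

reference = most_common_structure(structures)

# Files whose structure matches the chosen reference.
matching = sorted(
    name for name, labels in structures.items() if labels == reference
)
```

This costs a second pass over the files, but the reference no longer depends on directory listing order.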


