### Task 1: Building a Transaction Database in Google BigQuery

#### Overview  
Task 1 loads the raw Wedge Co-Op transaction data into Google BigQuery and resolves its data quality issues along the way: mixed delimiters (commas and semicolons), several representations of null values (`NULL`, `\N`, and `\\N`), and mismatched column headers. Python scripts automate the Extract, Transform, and Load (ETL) process. The raw transaction files are first extracted from zip archives in local directories. In the transformation stage, the script standardizes null values, corrects column headers, and ensures data type consistency across files, including proper handling of numeric fields and date-time formats. The cleaned files are then loaded into Google BigQuery with an explicit schema, so that every table adheres to the expected structure. The result is a clean, structured dataset in BigQuery, ready for analysis in the subsequent tasks.
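
As a small illustration of the delimiter and null-value problems described above, the sketch below uses only the standard library rather than the pandas-based code in this notebook. The helper names (`detect_delimiter`, `normalize_rows`) and the sample rows are hypothetical, and blanking null markers to empty strings is an assumption about the desired output.

```python
import csv
import io

# Null markers named in the overview: NULL, \N, and \\N
NULL_MARKERS = {"NULL", r"\N", r"\\N"}

def detect_delimiter(sample, candidates=",;"):
    """Guess the delimiter of a CSV sample, restricted to the given candidates."""
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter

def normalize_rows(text):
    """Parse CSV text with the detected delimiter and blank out null markers."""
    delimiter = detect_delimiter(text[:4096])
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [["" if cell in NULL_MARKERS else cell for cell in row] for row in reader]

# Hypothetical three-column samples, one per delimiter style
semicolon_sample = "upc;description;total\n123;MILK;3.99\n456;NULL;\\N\n"
comma_sample = "upc,description,total\n123,MILK,3.99\n456,BREAD,NULL\n"

print(detect_delimiter(semicolon_sample))      # ;
print(normalize_rows(semicolon_sample)[2])     # ['456', '', '']
print(normalize_rows(comma_sample)[2])         # ['456', 'BREAD', '']
```

Restricting the sniffer to the two delimiters known to occur keeps it from picking an unrelated character that happens to repeat consistently.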

##### Step 1: Extract  

Extract Zipped Raw Files

In [2]:
import zipfile
import os
import shutil

def extract_zip_file(zip_file_path, destination_folder):
    """
    Extracts the contents of a zip file to the specified destination folder.
    
    Parameters:
    - zip_file_path (str): The path to the zip file to be extracted.
    - destination_folder (str): The folder where the contents of the zip file will be extracted.

    The function handles:
    - Skipping extraction of files that already exist.
    - Creating necessary directories if they don't exist.
    - Extracting files and directories from the zip archive.
    """
    # Ensure the destination folder exists
    os.makedirs(destination_folder, exist_ok=True)

    # Open the zip file
    with zipfile.ZipFile(zip_file_path, 'r') as zip_file:
        # Loop through each file in the zip archive
        for zip_info in zip_file.infolist():
            # Determine the output file path
            extracted_file_path = os.path.join(destination_folder, zip_info.filename)
            
            # Skip extraction if the file or folder already exists
            if os.path.exists(extracted_file_path):
                print(f"Skipping {zip_info.filename}, it already exists.")
                continue

            # If the entry is a directory, create it
            if zip_info.is_dir():
                os.makedirs(extracted_file_path, exist_ok=True)
                print(f"Created directory {extracted_file_path}")
            else:
                # Ensure the parent directories for the file exist
                os.makedirs(os.path.dirname(extracted_file_path), exist_ok=True)
                
                # Extract the file
                with zip_file.open(zip_info) as source_file, open(extracted_file_path, 'wb') as target_file:
                    shutil.copyfileobj(source_file, target_file)
                print(f"Extracted {zip_info.filename} to {extracted_file_path}")

# Define the input paths
main_zip_file_path = 'D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/WedgeZipOfZips.zip'
extracted_folder_path = 'D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/extracted_main_zip'

# Extract the main zip file
extract_zip_file(main_zip_file_path, extracted_folder_path)


Skipping transArchive_201001_201003.zip, already exists.
Skipping transArchive_201004_201006.zip, already exists.
Skipping transArchive_201007_201009.zip, already exists.
Skipping transArchive_201010_201012.zip, already exists.
Skipping transArchive_201101_201103.zip, already exists.
Skipping transArchive_201104.zip, already exists.
Skipping transArchive_201105.zip, already exists.
Skipping transArchive_201106.zip, already exists.
Skipping transArchive_201107_201109.zip, already exists.
Skipping transArchive_201110_201112.zip, already exists.
Skipping transArchive_201201_201203.zip, already exists.
Skipping transArchive_201201_201203_inactive.zip, already exists.
Skipping transArchive_201204_201206.zip, already exists.
Skipping transArchive_201204_201206_inactive.zip, already exists.
Skipping transArchive_201207_201209.zip, already exists.
Skipping transArchive_201207_201209_inactive.zip, already exists.
Skipping transArchive_201210_201212.zip, already exists.
Skipping transArchive_201
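
Because `extract_zip_file` builds each output path with `os.path.join` and writes the file itself, it does not get the member-name sanitizing that `ZipFile.extract` applies. For archives from an untrusted source, a guard along the following lines could be called before writing. This is a sketch only; `safe_member_path` is a hypothetical helper, not part of the notebook.

```python
import os

def safe_member_path(destination_folder, member_name):
    """Return the extraction path for a zip member, refusing paths outside the destination."""
    base = os.path.realpath(destination_folder)
    target = os.path.realpath(os.path.join(base, member_name))
    # commonpath equals base only when target lies inside (or is) the destination
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"Unsafe path in archive: {member_name}")
    return target

# Hypothetical member names: one ordinary, one attempting to escape the folder
for name in ["nested/archive.zip", "../outside.txt"]:
    try:
        print("OK:", safe_member_path("extracted_main_zip", name))
    except ValueError as error:
        print("Rejected:", error)
```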

Extract the Nested Zip Files

In [3]:
import zipfile
import os

def extract_csv_from_nested_zips(source_folder, destination_folder):
    """
    Extracts all CSV files from nested zip files within the specified folder and saves them to a single destination folder.
    
    Parameters:
    - source_folder (str): The folder containing the nested zip files.
    - destination_folder (str): The folder where the extracted CSV files will be saved.

    The function handles:
    - Walking through directories to locate and extract CSV files from nested zip files.
    - Skipping CSV files that already exist in the destination folder.
    - Handling invalid or corrupt zip files gracefully.
    """
    # Ensure the destination folder exists
    os.makedirs(destination_folder, exist_ok=True)

    # Traverse the source folder to locate zip files
    for root, dirs, files in os.walk(source_folder):
        for file_name in files:
            if file_name.endswith('.zip'):
                zip_file_path = os.path.join(root, file_name)
                
                # Verify if the file is a valid zip archive
                try:
                    with zipfile.ZipFile(zip_file_path, 'r') as zip_file:
                        # Extract only CSV files from the nested zip archive
                        for zip_info in zip_file.infolist():
                            if zip_info.filename.endswith('.csv'):
                                destination_file_path = os.path.join(destination_folder, zip_info.filename)
                                
                                # Skip extraction if the CSV file already exists
                                if not os.path.exists(destination_file_path):
                                    zip_file.extract(zip_info, destination_folder)
                                    print(f"Extracted {zip_info.filename} to {destination_folder}")
                                else:
                                    print(f"Skipping {zip_info.filename}, it already exists.")
                except zipfile.BadZipFile:
                    print(f"Skipping {zip_file_path}, not a valid zip file.")

# Define the input paths
source_folder_path = 'D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/extracted_main_zip'
destination_folder_path = 'D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/extracted_csv_files'  

# Extract all CSV files from nested zip files
extract_csv_from_nested_zips(source_folder_path, destination_folder_path)


Skipping transArchive_201001_201003.csv, already exists.
Skipping transArchive_201004_201006.csv, already exists.
Skipping transArchive_201007_201009.csv, already exists.
Skipping transArchive_201010_201012.csv, already exists.
Skipping transArchive_201101_201103.csv, already exists.
Skipping transArchive_201104.csv, already exists.
Skipping transArchive_201105.csv, already exists.
Skipping transArchive_201106.csv, already exists.
Skipping transArchive_201107_201109.csv, already exists.
Skipping transArchive_201110_201112.csv, already exists.
Skipping transArchive_201201_201203.csv, already exists.
Skipping transArchive_201201_201203_inactive.csv, already exists.
Skipping transArchive_201204_201206.csv, already exists.
Skipping transArchive_201204_201206_inactive.csv, already exists.
Skipping transArchive_201207_201209.csv, already exists.
Skipping transArchive_201207_201209_inactive.csv, already exists.
Skipping transArchive_201210_201212.csv, already exists.
Skipping transArchive_201
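
The two extraction passes above write the intermediate zip files to disk first. An alternative, sketched here under the assumption that each inner archive fits in memory, reads the nested archives directly from the outer one. The function name `iter_nested_csv` and the demo file names are hypothetical.

```python
import io
import zipfile

def iter_nested_csv(outer_zip):
    """Yield (csv_name, raw_bytes) for every CSV inside every zip inside outer_zip."""
    with zipfile.ZipFile(outer_zip) as outer:
        for inner_name in outer.namelist():
            if not inner_name.lower().endswith(".zip"):
                continue
            # Load the inner archive into memory instead of extracting it to disk
            inner_bytes = io.BytesIO(outer.read(inner_name))
            try:
                with zipfile.ZipFile(inner_bytes) as inner:
                    for csv_name in inner.namelist():
                        if csv_name.lower().endswith(".csv"):
                            yield csv_name, inner.read(csv_name)
            except zipfile.BadZipFile:
                print(f"Skipping {inner_name}, not a valid zip file.")

def build_demo_archive():
    """Build a tiny zip-of-zips in memory for demonstration."""
    inner_buffer = io.BytesIO()
    with zipfile.ZipFile(inner_buffer, "w") as inner:
        inner.writestr("demo_a.csv", "upc,total\n1,2.50\n")
        inner.writestr("readme.txt", "not a csv")
    outer_buffer = io.BytesIO()
    with zipfile.ZipFile(outer_buffer, "w") as outer:
        outer.writestr("demo_a.zip", inner_buffer.getvalue())
        outer.writestr("broken.zip", b"this is not a zip")
    outer_buffer.seek(0)
    return outer_buffer

found = dict(iter_nested_csv(build_demo_archive()))
print(sorted(found))  # ['demo_a.csv']
```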

##### Step 2: Transform  

In [76]:
import pandas as pd
import glob
import os

# Path to the reference file containing the correct column headers
reference_file_path = 'D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/reference_files/transArchive_201001_201003_clean.csv'

# Load the reference file to retrieve the correct column headers
df_reference = pd.read_csv(reference_file_path)
reference_columns = df_reference.columns.tolist()

def clean_and_standardize_csv(input_csv_path, output_csv_path):
    """
    Cleans and standardizes the input CSV file, ensuring correct headers and handling NULL values.
    Saves the cleaned file to the specified output path.
    
    Parameters:
    - input_csv_path (str): Path to the input CSV file.
    - output_csv_path (str): Path to save the cleaned CSV file.
    
    The function performs:
    - Automatic delimiter detection, with fallback to semicolon and comma delimiters if needed.
    - Replacement of various representations of NULL with None/NaN.
    - Ensuring correct headers by either applying the reference headers or using the first row if applicable.
    """
    try:
        # Attempt to load the file with automatic delimiter detection
        try:
            df = pd.read_csv(input_csv_path, sep=None, engine='python')
        except pd.errors.ParserError:
            # Fallback to semicolon delimiter
            df = pd.read_csv(input_csv_path, delimiter=';')
        
        # If column count doesn't match the reference, retry with comma delimiter
        if df.shape[1] != len(reference_columns):
            df = pd.read_csv(input_csv_path, delimiter=',')
        
        # Replace various NULL representations with None/NaN
        df.replace({"NULL": None, r"\\N": None, r"\N": None}, inplace=True)
        
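        # Ensure the correct headers are applied
        if list(df.columns) != reference_columns:
            # Fail with a clear message if the delimiter retries still gave the wrong width
            if df.shape[1] != len(reference_columns):
                raise ValueError(
                    f"expected {len(reference_columns)} columns, found {df.shape[1]}"
                )
            # Check if the first data row repeats the reference headers
            first_row_as_header = df.iloc[0].tolist()
            if set(first_row_as_header) == set(reference_columns):
                # Use the first row as headers if it matches
                df.columns = first_row_as_header
                df = df.iloc[1:].reset_index(drop=True)
            else:
                # Apply the reference headers directly.
                # Caveat: read_csv has already used the file's first line as column
                # names, so if that line was data rather than a header, it is not
                # carried into the output.
                df.columns = reference_columns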
        
        # Save the cleaned CSV file
        df.to_csv(output_csv_path, index=False, sep=",")
        print(f"File cleaned and saved: {output_csv_path}")
    
    except pd.errors.EmptyDataError:
        print(f"Error: {input_csv_path} is empty.")
    except pd.errors.ParserError:
        print(f"Error: Could not parse {input_csv_path}.")
    except FileNotFoundError:
        print(f"Error: {input_csv_path} not found.")
    except Exception as e:
        print(f"Error processing {input_csv_path}: {e}")

def process_csv_files_in_folder(source_folder, destination_folder):
    """
    Processes all CSV files in the source folder by cleaning and standardizing them before saving to the destination folder.
    Skips files that have already been processed.
    
    Parameters:
    - source_folder (str): Folder containing the CSV files to be processed.
    - destination_folder (str): Folder to save the cleaned and standardized CSV files.
    """
    # Ensure the destination folder exists
    os.makedirs(destination_folder, exist_ok=True)
    
    # Get all CSV files from the source folder
    csv_files = glob.glob(f"{source_folder}/**/*.csv", recursive=True)
    print(f"Found {len(csv_files)} CSV files to process.")
    
    # Process each CSV file
    for csv_file in csv_files:
        output_file_path = os.path.join(destination_folder, os.path.basename(csv_file))
        
        # Skip files that have already been processed
        if os.path.exists(output_file_path):
            print(f"Skipping already processed file: {output_file_path}")
            continue
        
        # Clean and save the file
        clean_and_standardize_csv(csv_file, output_file_path)
    
    # List all saved files in the destination folder
    saved_files = glob.glob(f"{destination_folder}/*.csv")
    print(f"\nSaved {len(saved_files)} files to {destination_folder}:")
    for saved_file in saved_files:
        print(saved_file)

# Define the input and output folder paths
source_folder_path = 'D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/extracted_csv_files'
destination_folder_path = 'D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files'

# Process all CSV files
process_csv_files_in_folder(source_folder_path, destination_folder_path)


Found 53 CSV files to process.
Skipping already processed file: D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files\transArchive_201001_201003.csv
Skipping already processed file: D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files\transArchive_201004_201006.csv
Skipping already processed file: D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files\transArchive_201007_201009.csv
Skipping already processed file: D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files\transArchive_201010_201012.csv
Skipping already processed file: D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files\transArchive_201101_201103.csv
File cleaned and saved: D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files\transArchive_201104.csv
File cleaned and saved: D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files\transArchive_201105.csv
File cleaned and save

File cleaned and saved: D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files\transArchive_201201_201203_inactive.csv
File cleaned and saved: D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files\transArchive_201204_201206.csv


File cleaned and saved: D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files\transArchive_201204_201206_inactive.csv
File cleaned and saved: D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files\transArchive_201207_201209.csv


File cleaned and saved: D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files\transArchive_201207_201209_inactive.csv
File cleaned and saved: D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files\transArchive_201210_201212.csv


File cleaned and saved: D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files\transArchive_201210_201212_inactive.csv
File cleaned and saved: D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files\transArchive_201301_201303.csv


File cleaned and saved: D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files\transArchive_201301_201303_inactive.csv
File cleaned and saved: D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files\transArchive_201304_201306.csv


File cleaned and saved: D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files\transArchive_201304_201306_inactive.csv
File cleaned and saved: D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files\transArchive_201307_201309.csv


File cleaned and saved: D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files\transArchive_201307_201309_inactive.csv
File cleaned and saved: D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files\transArchive_201310_201312.csv
File cleaned and saved: D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files\transArchive_201310_201312_inactive.csv
File cleaned and saved: D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files\transArchive_201401_201403.csv


File cleaned and saved: D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files\transArchive_201401_201403_inactive.csv
File cleaned and saved: D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files\transArchive_201404_201406.csv
File cleaned and saved: D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files\transArchive_201404_201406_inactive.csv
Error: Could not parse D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/extracted_csv_files\transArchive_201407_201409.csv.
File cleaned and saved: D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files\transArchive_201407_201409_inactive.csv
File cleaned and saved: D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files\transArchive_201410_201412.csv
File cleaned and saved: D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files\transArchive_201410_201412_inactive.csv
File cleaned and saved: D:/WedgeProject/Wedge-Proje
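
The output above reports that `transArchive_201407_201409.csv` could not be parsed, but not where the problem lies. One possible cause (an assumption, since the log does not say) is rows whose field count differs from the rest of the file. The standard-library sketch below locates such rows; `find_ragged_rows` and the sample data are hypothetical.

```python
import csv
import io
from collections import Counter

def find_ragged_rows(handle, delimiter=",", limit=10):
    """Return (expected_width, [(line_number, width), ...]) for rows whose
    field count differs from the most common field count in the file."""
    reader = csv.reader(handle, delimiter=delimiter)
    # reader.line_num is the physical line number of the row just read
    widths = [(reader.line_num, len(row)) for row in reader if row]
    if not widths:
        return 0, []
    expected = Counter(width for _, width in widths).most_common(1)[0][0]
    ragged = [(line, width) for line, width in widths if width != expected]
    return expected, ragged[:limit]

# Hypothetical semicolon-delimited sample with one short and one long row
sample = "a;b;c\n1;2;3\n4;5\n6;7;8\n9;10;11;12\n"
expected, ragged = find_ragged_rows(io.StringIO(sample), delimiter=";")
print(expected)  # 3
print(ragged)    # [(3, 2), (5, 4)]
```

For a file on disk, the same function can be called on a handle opened with `open(path, newline="")`.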

##### Step 3: Load

In [82]:
from google.cloud import bigquery
import os

# Initialize the BigQuery client with the correct project ID
client = bigquery.Client(project='wedgeproject-rileyororke')

# Define the dataset ID and create the dataset if it does not exist
dataset_id = 'wedgeproject-rileyororke.transaction_tables'
dataset_ref = bigquery.Dataset(dataset_id)
client.create_dataset(dataset_ref, exists_ok=True)

# Schema definition for the transaction tables
schema = [
    bigquery.SchemaField("datetime", "TIMESTAMP"),
    bigquery.SchemaField("register_no", "FLOAT"),
    bigquery.SchemaField("emp_no", "FLOAT"),
    bigquery.SchemaField("trans_no", "FLOAT"),
    bigquery.SchemaField("upc", "STRING"),
    bigquery.SchemaField("description", "STRING"),
    bigquery.SchemaField("trans_type", "STRING"),
    bigquery.SchemaField("trans_subtype", "STRING"),
    bigquery.SchemaField("trans_status", "STRING"),
    bigquery.SchemaField("department", "FLOAT"),
    bigquery.SchemaField("quantity", "FLOAT"),
    bigquery.SchemaField("Scale", "FLOAT"),
    bigquery.SchemaField("cost", "FLOAT"),
    bigquery.SchemaField("unitPrice", "FLOAT"),
    bigquery.SchemaField("total", "FLOAT"),
    bigquery.SchemaField("regPrice", "FLOAT"),
    bigquery.SchemaField("altPrice", "FLOAT"),
    bigquery.SchemaField("tax", "FLOAT"),
    bigquery.SchemaField("taxexempt", "FLOAT"),
    bigquery.SchemaField("foodstamp", "FLOAT"),
    bigquery.SchemaField("wicable", "FLOAT"),
    bigquery.SchemaField("discount", "FLOAT"),
    bigquery.SchemaField("memDiscount", "FLOAT"),
    bigquery.SchemaField("discountable", "FLOAT"),
    bigquery.SchemaField("discounttype", "FLOAT"),
    bigquery.SchemaField("voided", "FLOAT"),
    bigquery.SchemaField("percentDiscount", "FLOAT"),
    bigquery.SchemaField("ItemQtty", "FLOAT"),
    bigquery.SchemaField("volDiscType", "FLOAT"),
    bigquery.SchemaField("volume", "FLOAT"),
    bigquery.SchemaField("VolSpecial", "FLOAT"),
    bigquery.SchemaField("mixMatch", "FLOAT"),
    bigquery.SchemaField("matched", "FLOAT"),
    bigquery.SchemaField("memType", "STRING"),
    bigquery.SchemaField("staff", "FLOAT"),
    bigquery.SchemaField("numflag", "FLOAT"),
    bigquery.SchemaField("itemstatus", "FLOAT"),
    bigquery.SchemaField("tenderstatus", "FLOAT"),
    bigquery.SchemaField("charflag", "STRING"),
    bigquery.SchemaField("varflag", "FLOAT"),
    bigquery.SchemaField("batchHeaderID", "STRING"),
    bigquery.SchemaField("local", "FLOAT"),
    bigquery.SchemaField("organic", "STRING"),
    bigquery.SchemaField("display", "STRING"),
    bigquery.SchemaField("receipt", "FLOAT"),
    bigquery.SchemaField("card_no", "FLOAT"),
    bigquery.SchemaField("store", "FLOAT"),
    bigquery.SchemaField("branch", "FLOAT"),
    bigquery.SchemaField("match_id", "FLOAT"),
    bigquery.SchemaField("trans_id", "FLOAT"),
]

# get_table raises NotFound when the table does not exist
from google.api_core.exceptions import NotFound

def load_csv_to_bigquery(folder_path, dataset_id):
    """
    Load all CSV files from the specified folder into BigQuery.
    Each file is uploaded to a table named after the file (without extension).
    Skips tables that already exist in BigQuery.
    
    Parameters:
    - folder_path (str): The folder containing the CSV files to be uploaded.
    - dataset_id (str): The fully qualified dataset ID (project.dataset) where the tables will be stored.
    """
    # Iterate over all files in the folder and process only CSV files
    for file_name in os.listdir(folder_path):
        if file_name.endswith('.csv'):
            file_path = os.path.join(folder_path, file_name)
            
            # Use the file name (without extension) as the table ID
            table_id = os.path.splitext(file_name)[0]
            
            # Build the fully qualified table ID from the dataset_id argument
            table_ref = f"{dataset_id}.{table_id}"

            # Check if the table already exists
            try:
                client.get_table(table_ref)
                print(f"Skipping table {table_id}, it already exists.")
                continue  # Skip if the table exists
            except NotFound:
                # The table does not exist yet, so proceed with loading
                pass

            # Configure the load job for CSV format
            job_config = bigquery.LoadJobConfig(
                source_format=bigquery.SourceFormat.CSV,
                schema=schema,
                skip_leading_rows=1  # Skip header row
            )

            # Open the CSV file and load it into BigQuery
            with open(file_path, 'rb') as csv_file:
                load_job = client.load_table_from_file(csv_file, table_ref, job_config=job_config)
                load_job.result()  # Wait for the job to complete
            
            print(f"Loaded {file_name} into {table_ref}")

# Define the folder path for the cleaned CSV files
folder_path = 'D:/WedgeProject/Wedge-Project-ADA-Riley-ORorke/data/final_cleaned_csv_files'

# Run the function to load CSVs from the folder to BigQuery
load_csv_to_bigquery(folder_path, dataset_id)


BadRequest: 400 Error while reading data, error message: CSV processing encountered too many errors, giving up. Rows: 0; errors: 22; max bad: 0; error percent: 0

The same error is reported for each of the 22 lines listed below:
reason: invalid, message: Error while reading data, error message: CSV table references column position 49, but line contains only 2 columns.; column_index: 49 column_name: "trans_id" column_type: DOUBLE

line_number: 2 byte_offset_to_start_of_line: 22
line_number: 3 byte_offset_to_start_of_line: 42
line_number: 4 byte_offset_to_start_of_line: 53
line_number: 5 byte_offset_to_start_of_line: 61
line_number: 6 byte_offset_to_start_of_line: 76
line_number: 7 byte_offset_to_start_of_line: 86
line_number: 8 byte_offset_to_start_of_line: 96
line_number: 9 byte_offset_to_start_of_line: 105
line_number: 10 byte_offset_to_start_of_line: 113
line_number: 11 byte_offset_to_start_of_line: 126
line_number: 12 byte_offset_to_start_of_line: 142
line_number: 13 byte_offset_to_start_of_line: 160
line_number: 14 byte_offset_to_start_of_line: 177
line_number: 15 byte_offset_to_start_of_line: 186
line_number: 16 byte_offset_to_start_of_line: 200
line_number: 17 byte_offset_to_start_of_line: 213
line_number: 18 byte_offset_to_start_of_line: 230
line_number: 19 byte_offset_to_start_of_line: 244
line_number: 20 byte_offset_to_start_of_line: 256
line_number: 21 byte_offset_to_start_of_line: 275
line_number: 22 byte_offset_to_start_of_line: 294
line_number: 23 byte_offset_to_start_of_line: 307
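
The error above indicates that at least one CSV file in the folder has only two columns per line, while the schema defines 50 fields. A pre-flight check along the following lines could separate files that match the expected width from those that do not before any load job is submitted. This is a sketch under the assumption that the header row reflects the file's width; `split_by_column_count` and the demo files are hypothetical.

```python
import csv
import os
import tempfile

def split_by_column_count(folder_path, expected_columns):
    """Return (loadable, rejected) lists of CSV file names based on header width."""
    loadable, rejected = [], []
    for file_name in sorted(os.listdir(folder_path)):
        if not file_name.endswith(".csv"):
            continue
        file_path = os.path.join(folder_path, file_name)
        with open(file_path, newline="", encoding="utf-8") as handle:
            # An empty file yields an empty header and is rejected
            header = next(csv.reader(handle), [])
        if len(header) == expected_columns:
            loadable.append(file_name)
        else:
            rejected.append((file_name, len(header)))
    return loadable, rejected

# Demo with hypothetical files and a three-column layout
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "wide.csv"), "w", newline="") as handle:
        handle.write("datetime,upc,total\n2010-01-01 09:00:00,123,3.99\n")
    with open(os.path.join(folder, "lookup.csv"), "w", newline="") as handle:
        handle.write("code,label\n1,EXAMPLE\n")
    loadable, rejected = split_by_column_count(folder, expected_columns=3)

print(loadable)  # ['wide.csv']
print(rejected)  # [('lookup.csv', 2)]
```

With the notebook's own objects, the call would presumably be `split_by_column_count(folder_path, expected_columns=len(schema))`, loading only the files in the first list.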