### Create a simple Python script that merges all the .csv files of a specific location into a single .tsv file. 

Only the `source`, `before`, `after` and `distance` columns need to be kept. The files “report_5458.csv” and “report_5699.csv” are provided as examples.
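Keeping only those four columns can be done at read time with the `usecols` argument of `pd.read_csv`. The sketch below uses a small in-memory CSV with hypothetical values and an assumed extra column (`id`, `notes`), since the real report layout is not shown here. Note that `usecols` keeps the file's column order, so the result is re-indexed to the required order explicitly.

```python
import io

import pandas as pd

# Hypothetical sample data: the real report files may have other extra columns.
sample_csv = io.StringIO(
    "id,source,before,after,distance,notes\n"
    "1,A,x,y,3,foo\n"
    "2,B,y,z,5,bar\n"
)

# Columns the task asks us to keep.
REQUIRED_COLUMNS = ["source", "before", "after", "distance"]

# usecols makes pandas parse only these columns; it raises ValueError
# if any of them is missing from the header.
df = pd.read_csv(sample_csv, usecols=REQUIRED_COLUMNS)

# usecols does not reorder columns, so select them in the required order.
df = df[REQUIRED_COLUMNS]
print(df)
```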

---

In [1]:
# Importing libraries:

import pandas as pd  # Used for data manipulation and analysis.
import glob  # Used to find all file paths matching a specified pattern.
import os  # Used for interacting with the operating system, such as file and directory operations.

In [2]:
# Library versions:

print(pd.__version__)

2.3.3


---

### Setup: Verify Working Directory

Check the current working directory to ensure we're in the correct location for processing files.

In [3]:

# Get the current working directory
current_dir = os.getcwd()
current_dir

'/home/obraisan/MT-technical-test_Santos/Solutions/EX.6'
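The notebook processes whatever directory it is launched from. Since the task talks about "a specific location", a standalone script could instead accept the folder on the command line and fall back to the working directory. The function below, `parse_target_dir`, is a hypothetical helper sketched with the standard `argparse` module; it is not part of the original solution.

```python
import argparse
import os


def parse_target_dir(argv=None):
    """Return the directory to scan: the first positional argument, or the cwd.

    argv=None makes argparse read sys.argv[1:]; pass an explicit list
    (e.g. []) when calling from a notebook, where sys.argv holds kernel flags.
    """
    parser = argparse.ArgumentParser(
        description="Merge the .csv files of a directory into one .tsv file."
    )
    # nargs="?" makes the positional argument optional.
    parser.add_argument(
        "directory",
        nargs="?",
        default=os.getcwd(),
        help="folder containing the .csv reports (default: current directory)",
    )
    args = parser.parse_args(argv)
    # Normalise to an absolute path so later messages are unambiguous.
    return os.path.abspath(args.directory)


print(parse_target_dir([]))
```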

---

### Identify CSV Files

Scan the current directory for all `.csv` files that will be merged.

In [4]:
# Verifying the .csv files in the current directory

csv_files = glob.glob(os.path.join(current_dir, "*.csv"))
csv_files


['/home/obraisan/MT-technical-test_Santos/Solutions/EX.6/report_5699_copy.csv',
 '/home/obraisan/MT-technical-test_Santos/Solutions/EX.6/report_5458_copy.csv']
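The listing above comes back in filesystem order (`report_5699` before `report_5458`), because `glob.glob` does not sort its results. The sketch below uses a throwaway folder with hypothetical file names to show two properties of the `*.csv` pattern: wrapping it in `sorted()` makes the merge order reproducible, and a file without the `.csv` suffix (such as an output named `merged_tsv`) is not picked up on a re-run.

```python
import glob
import os
import tempfile

# Build a temporary folder with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["report_b.csv", "report_a.csv", "notes.txt", "merged_tsv"]:
        open(os.path.join(tmp, name), "w").close()

    # glob order is arbitrary; sorting gives a deterministic merge order.
    csv_files = sorted(glob.glob(os.path.join(tmp, "*.csv")))
    names = [os.path.basename(f) for f in csv_files]

print(names)  # ['report_a.csv', 'report_b.csv']
```

On Linux the match is case-sensitive, so a file ending in `.CSV` would also be skipped.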

---

### Merge CSV Files into TSV

Process all CSV files in the directory:
1. Extract the required columns: `source`, `before`, `after`, `distance`
2. Combine all data into a single dataset
3. Export it as a tab-separated file named `merged_tsv` in the same directory

The cell prints progress information and, instead of stopping, reports and skips any file that cannot be processed (for example, one missing a required column).
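The same three steps can be packaged as a reusable function. This is a sketch, not the notebook's code: the name `merge_csv_to_tsv` and the default output name `merged.tsv` (with an extension, matching the task's ".tsv file") are assumptions. It narrows the error handling to `ValueError`, which in pandas covers a missing `usecols` column, a malformed file (`ParserError`) and an empty file (`EmptyDataError`).

```python
import glob
import os

import pandas as pd

REQUIRED_COLUMNS = ["source", "before", "after", "distance"]


def merge_csv_to_tsv(directory, output_name="merged.tsv"):
    """Merge every .csv in `directory` into one tab-separated file.

    Returns the output path, or None if nothing could be merged.
    `output_name` is a hypothetical default, not the notebook's file name.
    """
    frames = []
    for path in sorted(glob.glob(os.path.join(directory, "*.csv"))):
        try:
            # Read only the required columns, then put them in the required order.
            df = pd.read_csv(path, usecols=REQUIRED_COLUMNS)[REQUIRED_COLUMNS]
            frames.append(df)
            print(f"Processed {os.path.basename(path)}: {len(df)} rows")
        except ValueError as exc:
            # Missing columns, parser errors and empty files all subclass ValueError.
            print(f"Skipping {os.path.basename(path)}: {exc}")

    if not frames:
        return None

    merged = pd.concat(frames, ignore_index=True)
    output_path = os.path.join(directory, output_name)
    merged.to_csv(output_path, sep="\t", index=False)
    return output_path
```

In the notebook it would be called as `merge_csv_to_tsv(current_dir)`.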

In [5]:

# Find all CSV files in the current directory
csv_files = glob.glob(os.path.join(current_dir, "*.csv"))

if not csv_files:
    print("No CSV files found in the current directory.")
else:
    print(f"Found {len(csv_files)} CSV file(s):")
    for file in csv_files:
        print(f"  - {os.path.basename(file)}")
    
    # Read and merge all CSV files
    all_data = []
    for file in csv_files:
        try:
            df = pd.read_csv(file)
            # Keep only the required columns
            df = df[['source', 'before', 'after', 'distance']]
            all_data.append(df)
            
            print(f"\nProcessed: {os.path.basename(file)}")
            print(f"Number of rows: {len(df)}\n")
        except Exception as e:
            print(f"Error processing {os.path.basename(file)}: {e}")
    
    if all_data:
        # Combine all dataframes
        merged = pd.concat(all_data, ignore_index=True)
        
        # Save as TSV file with the name "merged_tsv"
        output_file = os.path.join(current_dir, "merged_tsv")
        merged.to_csv(output_file, sep='\t', index=False)
        
        print(f"\nSuccess! Merged {len(all_data)} file(s) into 'merged_tsv'")
        print(f"merged_tsv number of rows: {len(merged)}")
    else:
        print("No data to merge.")

Found 2 CSV file(s):
  - report_5699_copy.csv
  - report_5458_copy.csv

Processed: report_5699_copy.csv
Number of rows: 4


Processed: report_5458_copy.csv
Number of rows: 3


Success! Merged 2 file(s) into 'merged_tsv'
merged_tsv number of rows: 7
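A quick sanity check is to read the file back with `sep="\t"` and confirm that it has exactly the four required columns and the expected number of rows (4 + 3 = 7 in the run above). `check_merged_tsv` is a hypothetical helper sketched for this purpose.

```python
import pandas as pd

REQUIRED_COLUMNS = ["source", "before", "after", "distance"]


def check_merged_tsv(path, expected_rows):
    """Read a merged TSV back and confirm its columns and row count."""
    # sep="\t" tells pandas the file is tab-separated.
    merged = pd.read_csv(path, sep="\t")
    columns_ok = merged.columns.tolist() == REQUIRED_COLUMNS
    rows_ok = len(merged) == expected_rows
    return columns_ok and rows_ok
```

In the notebook this would be `check_merged_tsv(os.path.join(current_dir, "merged_tsv"), expected_rows=7)`.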
