# 01: TXT to CSV Conversion and Merging (Energy Sector Only)

This notebook reads CRS raw .txt files (2014–2023) with pipe (`|`) delimiter,
filters projects related to the Energy sector (SectorCode between 230–236),
selects only relevant columns, and merges all years into a single dataset
saved as `CRS_energy_raw_merged.csv`.



Goals for 01_txt_to_csv_and_merge.ipynb

- Read all raw .txt CRS files across 10 years.
- Extract energy sector projects using sector codes.
- Select relevant columns.
- Merge all 10 years into a single merged CSV.
- Save the merged CSV as CRS_energy_raw_merged.csv

In [2]:
# Import libraries
import pandas as pd
import os

In [3]:
# Define paths
input_folder = '/content/'  # where all your raw .txt files are

In [4]:
# Initialize an empty list to collect all years' filtered data
all_years_data = []

In [8]:
# Loop over each year
for year in range(2014, 2024):  # 2014 to 2023 inclusive
    input_file = os.path.join(input_folder, f'CRS {year} data.txt')

    # Check if file exists
    if not os.path.isfile(input_file):
        print(f"File not found: {input_file}")
        continue

    # Read the txt file
    data = pd.read_csv(input_file, delimiter='|', header=0, dtype=str)

    # Filter by SectorCode between 230 and 236
    data['SectorCode'] = pd.to_numeric(data['SectorCode'], errors='coerce')
    df_energy = data[data['SectorCode'].between(230, 236, inclusive='both')].copy()
    print(f"Selected {len(df_energy)} energy sector records for {year}")

    # Select only relevant columns
    columns_to_keep = [
        "Year", "CrsID", "ProjectTitle", "SectorCode",
        "Bi_Multi", "ClimateMitigation", "ClimateAdaptation",
        "DonorName", "RecipientName", "RegionName", "IncomegroupName",
        "USD_Commitment", "USD_Disbursement", "USD_Received",
        "LongDescription",
    ]
    df_energy_selected = df_energy[columns_to_keep]

    # Collect for merging
    all_years_data.append(df_energy_selected)

# Merge all years together
merged_df = pd.concat(all_years_data, ignore_index=True)
print(f"Total merged records: {len(merged_df)}")

# Save the merged dataset
merged_output_path = '/content/CRS_energy_raw_merged.csv'
merged_df.to_csv(merged_output_path, index=False)
print(f"Merged data saved to: {merged_output_path}")


Selected 6575 energy sector records for 2014
Selected 7264 energy sector records for 2015
Selected 7592 energy sector records for 2016
Selected 8484 energy sector records for 2017
Selected 9006 energy sector records for 2018
Selected 9728 energy sector records for 2019
Selected 9766 energy sector records for 2020
Selected 10761 energy sector records for 2021
Selected 10986 energy sector records for 2022
Selected 12517 energy sector records for 2023
Total merged records: 185358
Merged data saved to: /content/CRS_energy_raw_merged.csv
