## Download & Collation Module

This module provides two helper functions: one downloads raw response files from a remote server, and the other merges them into a single file.


### 1. `download_answer_files(cloud_url, save_folder, total_files)`

- **Objective**  
  Retrieve sequentially numbered text files (`answers_respondent_1.txt`, `answers_respondent_2.txt`, etc.) from a specified cloud endpoint and store them locally under the same names.

- **Parameters**  
  - `cloud_url` (str): Base URL hosting the files, without a trailing slash (e.g. `https://example.com/answers`)  
  - `save_folder` (str): Target directory for saving downloads (e.g. `data/`)  
  - `total_files` (int): Number of files to fetch  

- **Process**  
  1. Create `save_folder` if it does not exist.  
  2. Iterate from 1 to `total_files`.  
  3. Download each file via HTTP GET.  
  4. Save each response as `answers_respondent_{i}.txt`, and log success or failure.

- **Code**

```python
import os
import requests

def download_answer_files(cloud_url, save_folder, total_files):
    # Create the folder (and any missing parent folders) if it doesn't exist
    os.makedirs(save_folder, exist_ok=True)

    # Download each file in turn
    for i in range(1, total_files + 1):
        file_name = f"answers_respondent_{i}.txt"
        url = f"{cloud_url}/{file_name}"
        file_path = os.path.join(save_folder, file_name)

        # A timeout keeps one unresponsive request from stalling the loop
        response = requests.get(url, timeout=30)
        if response.status_code == 200:
            with open(file_path, "wb") as f:
                f.write(response.content)
            print(f"Saved: {file_name}")
        else:
            print(f"Failed to download: {url}")
```
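
The function above reports failures only by printing them. As a minimal sketch, assuming a caller wants to know which downloads failed, the variant below returns the saved paths and the failed URLs. The names `download_with_report` and `fetch_bytes` are hypothetical and not part of this module, and the `fetch` parameter is an illustrative hook that lets the loop run without network access.

```python
import os


def fetch_bytes(url, timeout=10):
    """Return the body of `url` as bytes, or None if the request fails."""
    import requests  # lazy import: only needed when this default fetcher runs

    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # treat 4xx/5xx responses as failures
    except requests.RequestException:
        return None
    return response.content


def download_with_report(cloud_url, save_folder, total_files, fetch=fetch_bytes):
    """Download the numbered files and return (saved_paths, failed_urls)."""
    os.makedirs(save_folder, exist_ok=True)
    saved, failed = [], []
    for i in range(1, total_files + 1):
        file_name = f"answers_respondent_{i}.txt"
        # Tolerate a trailing slash on the base URL (an added convenience)
        url = f"{cloud_url.rstrip('/')}/{file_name}"
        content = fetch(url)
        if content is None:
            failed.append(url)
            continue
        file_path = os.path.join(save_folder, file_name)
        with open(file_path, "wb") as f:
            f.write(content)
        saved.append(file_path)
    return saved, failed
```

A caller could then retry only the URLs in the second list, or stop before collation if that list is not empty.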

### 2. `collate_answer_files(folder_path)`

- **Objective**  
  Merge all individual `answers_respondent_*.txt` files into a single document, with a separator line between consecutive respondents’ answers.

- **Parameters**  
  - `folder_path` (str): Path to the directory containing the downloaded respondent files  

- **Process**  
  1. Ensure an `output/` directory exists.  
  2. Open `output/collated_answers.txt` for writing.  
  3. Identify and sort all files matching `answers_respondent_{i}.txt` by their index.  
  4. Append each file’s contents, stripped of leading and trailing whitespace, to the output file, inserting a line containing a single `*` between entries.

- **Code**

```python
import os

def collate_answer_files(folder_path):
    # Create the output folder if it doesn't exist
    os.makedirs("output", exist_ok=True)

    # Open the final file for writing
    with open("output/collated_answers.txt", "w", encoding="utf-8") as out_file:
        # Find the respondent files and sort them by numeric index,
        # so that respondent 10 follows respondent 9 rather than respondent 1
        files = [
            f for f in os.listdir(folder_path)
            if f.startswith("answers_respondent_") and f.endswith(".txt")
        ]
        files.sort(key=lambda x: int(x.split("_")[-1].split(".")[0]))

        for i, file_name in enumerate(files):
            file_path = os.path.join(folder_path, file_name)
            with open(file_path, "r", encoding="utf-8") as f:
                content = f.read().strip()
            out_file.write(content)

            # Only add a separator if it's not the last file
            if i < len(files) - 1:
                out_file.write("\n*\n")
            else:
                out_file.write("\n")

    print("All answers have been successfully collated!")
```
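
Sorting by index matters because a plain string sort would place `answers_respondent_10.txt` before `answers_respondent_2.txt`. The sketch below illustrates this with a stricter, regex-based filter that also skips names whose suffix is not a number. The helper `sorted_respondent_files` and the pattern `RESPONDENT_FILE` are hypothetical names introduced for illustration, not part of this module.

```python
import re

# Matches names such as "answers_respondent_12.txt" and captures the index
RESPONDENT_FILE = re.compile(r"^answers_respondent_(\d+)\.txt$")


def sorted_respondent_files(names):
    """Keep only well-formed respondent file names, ordered by numeric index."""
    indexed = []
    for name in names:
        match = RESPONDENT_FILE.match(name)
        if match:
            indexed.append((int(match.group(1)), name))
    return [name for _, name in sorted(indexed)]


names = [
    "answers_respondent_10.txt",
    "answers_respondent_2.txt",
    "answers_respondent_1.txt",
    "answers_respondent_notes.txt",  # non-numeric suffix: skipped
    "README.md",                     # unrelated file: skipped
]
print(sorted_respondent_files(names))
# ['answers_respondent_1.txt', 'answers_respondent_2.txt', 'answers_respondent_10.txt']
```

With the `int(...)` sort key used in `collate_answer_files`, a name like `answers_respondent_notes.txt` would raise a `ValueError`; filtering on the pattern first is one way to avoid that.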


### Example Usage

```python
from data_preparation_M2 import download_answer_files, collate_answer_files

# 1. Download responses from the cloud
download_answer_files(
    cloud_url="https://example.com/answers",
    save_folder="data",
    total_files=5
)

# 2. Combine downloaded files into one
collate_answer_files("data")