# Reviewer: Abraham Sobowale #

# M1: Parsing Module  #
## Objective of `data_extraction_M1.py`
The module provides two core functions for handling quiz answer data:
* `extract_answers_sequence()`: Parses raw quiz files into structured answer sequences
* `write_answers_sequence()`: Writes processed sequences to respondent-specific files

## Key Functions Review
* #### `extract_answers_sequence()`: Parses a quiz answers text file into a list of 100 integers, mapping each answer to 1, 2, 3, or 4, or to 0 if unanswered.
* > M1 wraps the file handling in a `try` block, which helps catch file-related errors

In [None]:
try:
    with open(file_path, 'r') as file:
        content = file.read()

* > Error-handling code

In [None]:
except FileNotFoundError:
    print(f"Error: File not found: {file_path}")
    return None
except Exception as e:
    print(f"Unexpected error: {e}") 
    return None

* > M1 scans each block for `[x]` or `[ ]` and breaks once the answer is found

In [None]:
for i, block in enumerate(question_blocks[:100]):  # Process up to 100 questions
    for line in block.split('\n'):
        line = line.strip()
        if line.startswith('[x]'):  
            answers[i] = option_number  # Store selected option (1-4)
            break
        elif line.startswith('[ ]'):  
            option_number += 1  # Track option position

* #### `write_answers_sequence()`: Takes the extracted list of answers and an ID number `n`, and writes the list to a respondent-specific file.
* > M1 has a file-writing section that creates the file, or overwrites it if it already exists

In [None]:
def write_answers_sequence(answer = list, n = int):
    file_name = f"answers_list_respondent_{n}.txt" # Sets file name to n
    f = open(file_name, "w") # Creates file if it's not there before
    answer = str(answer) # Convert to string in order to write
    f.write(answer)
    f.close()   

## Reflections

### `extract_answers_sequence()`
* Printing errors, while useful, can limit how easily the function integrates into future projects
* There's no validation of the file's content, so malformed files can produce erroneous data (Example: the questions could be formatted incorrectly or randomised)
* Hardcoded for 100 questions when the count could be set with another parameter (Example: 50 questions?)
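
The three points above could be addressed along these lines. This is a minimal sketch, not M1's actual code: the name `parse_answers`, the `num_questions` parameter, the blank-line block separator, and the choice to raise `ValueError` instead of printing are all assumptions made for illustration.

```python
from __future__ import annotations


def parse_answers(text: str, num_questions: int = 100) -> list[int]:
    """Return one answer per question: 1-4 for the chosen option, 0 if unanswered.

    Assumption: question blocks are separated by blank lines and every
    option line starts with '[x]' (selected) or '[ ]' (not selected).
    """
    blocks = [b for b in text.strip().split("\n\n") if b.strip()]
    if len(blocks) < num_questions:
        # Raise instead of printing so the caller decides how to react.
        raise ValueError(
            f"expected {num_questions} question blocks, found {len(blocks)}"
        )

    answers = [0] * num_questions
    for i, block in enumerate(blocks[:num_questions]):
        option_number = 1  # reset the option counter for every question
        for line in block.split("\n"):
            line = line.strip()
            if line.startswith("[x]"):
                answers[i] = option_number
                break
            if line.startswith("[ ]"):
                option_number += 1
    return answers
```

Because the function raises rather than prints, a caller such as an integration script can catch the exception and log it in its own format.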
### `write_answers_sequence()`
* The default values `= list` and `= int` are incorrect (they should be the type hints `: list` and `: int`)
* There's no validation of the answers' content or of the answer list's length
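
A possible corrected version is sketched below. It keeps the file name used by M1 but is otherwise an assumption: the `expected_length` parameter, the specific checks, and the returned file name are illustrative additions, not part of the reviewed module.

```python
from __future__ import annotations


def write_answers_sequence(
    answers: list[int], n: int, expected_length: int = 100
) -> str:
    """Write the answer list to a respondent-specific file and return its name."""
    # Validate the inputs before touching the filesystem.
    if not isinstance(n, int) or n < 1:
        raise ValueError("n must be a positive integer")
    if len(answers) != expected_length:
        raise ValueError(
            f"expected {expected_length} answers, got {len(answers)}"
        )
    if any(a not in (0, 1, 2, 3, 4) for a in answers):
        raise ValueError("answers must be integers from 0 to 4")

    file_name = f"answers_list_respondent_{n}.txt"
    # The context manager closes the file even if the write fails.
    with open(file_name, "w", encoding="utf-8") as f:
        f.write(str(answers))
    return file_name
```

Using annotations (`answers: list[int]`) rather than defaults (`answer = list`) means a call with missing arguments fails immediately instead of silently writing the text `<class 'list'>` to a file.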

## Conclusion
The parsing module performs its core functions effectively but would benefit from:
* Better error handling
* More flexible question handling
* Correct type hints

# M2: Download & Collation Module  #

## Objective of `data_preparation_M2.py`
The module handles downloading and combining quiz answer files:
* `download_answer_files()`: Retrieves answer files from cloud storage
* `collate_answer_files()`: Combines the individual files into a single output file

## Key Functions Review
* #### `download_answer_files()`: Downloads all the files from a cloud server and stores them.
* > M2 has file-handling code that creates the destination folder if it doesn't exist

In [None]:
    if not os.path.exists(save_folder):
        os.mkdir(save_folder)

* > Download error-handling and verification code

In [None]:
        response = requests.get(url)
        if response.status_code == 200:
            with open(file_path, "wb") as f:
                f.write(response.content)
            print(f"Saved: {file_name}")
        else:
            print(f"Failed to download: {url}")

* > M2 builds each URL from `cloud_url` and saves the download into `save_folder` under `file_name`

In [None]:
for i in range(1, total_files + 1):
    url = f"{cloud_url}/answers_respondent_{i}.txt"
    file_name = f"answers_respondent_{i}.txt"
    file_path = os.path.join(save_folder, file_name)

* #### `collate_answer_files()`: Takes the downloaded files and combines them into one file containing all of their content.
* > M2 has file-handling code that creates the output folder if it doesn't exist

In [None]:
    if not os.path.exists("output"):
        os.mkdir("output")

* > M2 creates a single file into which all the respondent files are written

In [None]:
    # Open the final file for writing
    with open("output/collated_answers.txt", "w", encoding="utf-8") as out_file:
        # Find and sort all respondent files
        files = [f for f in os.listdir(folder_path) if f.startswith("answers_respondent_")]
        files.sort(key=lambda x: int(x.split("_")[-1].split(".")[0]))

        for i, file_name in enumerate(files):
            file_path = os.path.join(folder_path, file_name)
            with open(file_path, "r", encoding="utf-8") as f:
                content = f.read().strip()
                out_file.write(content)

## Reflections

### `download_answer_files()`
* No way to retry a failed download (Ex. if a download fails because of a temporary network issue)
* There's no verification of the file's content, so corrupted files can produce erroneous data (Example: the questions could be formatted incorrectly or randomised)
* No type hints (Ex: someone can pass a real number where an integer is expected)
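
A retry loop for the first point could look like the sketch below. It is an illustration under stated assumptions, not M2's code: `download_with_retry`, its parameters, and the idea of passing the download step in as a `fetch` callable are all hypothetical. Taking a callable keeps the retry logic independent of the HTTP library and easy to test.

```python
from __future__ import annotations

import time
from typing import Callable, Optional


def download_with_retry(
    fetch: Callable[[str], bytes],
    url: str,
    attempts: int = 3,
    delay_seconds: float = 1.0,
) -> Optional[bytes]:
    """Call fetch(url) up to `attempts` times; return None if every try fails."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except OSError as error:  # network and file errors derive from OSError
            print(f"Attempt {attempt}/{attempts} failed for {url}: {error}")
            if attempt < attempts:
                time.sleep(delay_seconds)  # wait before the next try
    return None
```

With `requests`, the `fetch` argument could be a small function that calls `requests.get(url, timeout=10)`, then `response.raise_for_status()`, and returns `response.content`, so that non-200 responses are retried as well.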
### `collate_answer_files()`
* No validation for the folder's existence
* There's no validation code for malformed files
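
Both checks could be handled before any file is opened. The sketch below is an assumption about how that might look, with a hypothetical helper name `list_respondent_files`. It verifies that the folder exists and keeps only names of the form `answers_respondent_<number>.txt`, so a stray file such as `answers_respondent_copy.txt` cannot make the numeric sort fail with a `ValueError`.

```python
from __future__ import annotations

import os
import re

# Only names like "answers_respondent_12.txt" are accepted.
RESPONDENT_FILE = re.compile(r"^answers_respondent_(\d+)\.txt$")


def list_respondent_files(folder_path: str) -> list[str]:
    """Return respondent file names in the folder, sorted by respondent number."""
    if not os.path.isdir(folder_path):
        raise FileNotFoundError(f"folder not found: {folder_path}")

    numbered = []
    for name in os.listdir(folder_path):
        match = RESPONDENT_FILE.match(name)
        if match:  # skip anything that does not follow the naming scheme
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]
```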

## Conclusion
The download and collation module performs its core functions effectively but would benefit from:
* Better error handling for files
* Better verification methods for folders and files
* Type hints

# M3: Analysis Module #

## Objective of the M3 analysis module
The module computes per-question means and visualizes the answer data:
* `generate_means_sequence()`: Reads the collated file and calculates the mean answer for each question
* `visualize_data()`: Generates plots based on the user's input

## Key Functions Review
* #### `generate_means_sequence()`: Reads the data from the collated file and calculates the mean from it.
* > M3 has code for reading the file, but nothing verifies that the file exists

In [None]:
with open(collated_answers_path, 'r', encoding='utf-8') as file:
    raw_data = file.read()

* > M3 writes each respondent block to a temp file in order to extract its sequence

In [None]:
for block in respondent_blocks:
    temp_path = 'temp_response.txt'
    with open(temp_path, 'w', encoding='utf-8') as f:
        f.write(block.strip())
    seq = extract_answers_sequence(temp_path)
    os.remove(temp_path)
    if seq and len(seq) == 100:
        all_sequences.append(seq)

* > M3 Calculates the mean for each question

In [None]:
for i in range(100):
    values = [seq[i] for seq in all_sequences if seq[i] != 0]
    mean = sum(values) / len(values) if values else 0.0
    means.append(mean)
return means

* #### `visualize_data()`: Plots the answer data, either as a scatter plot of the per-question means or as a line plot of each respondent's answers.
* > M3 loads and parses the data into structured sequences

In [None]:
with open(collated_answers_path, 'r', encoding='utf-8') as file:
    raw_data = file.read()

respondent_blocks = raw_data.strip().split('\n*\n')
all_sequences = []

for block in respondent_blocks:
    temp_path = 'temp_response.txt'
    with open(temp_path, 'w', encoding='utf-8') as f:
        f.write(block.strip())
    seq = extract_answers_sequence(temp_path)
    os.remove(temp_path)
    if seq and len(seq) == 100:
        all_sequences.append(seq)

* > M3 generates the plots based on the user's input

In [None]:
if n == 1:
    means = generate_means_sequence(collated_answers_path)
    plt.scatter(range(1, 101), means, color='blue')
    plt.title('Mean Answer Value per Question (Scatter)')
    plt.xlabel('Question Number')
    plt.ylabel('Mean Value')
    plt.grid(True)
    plt.show()

elif n == 2:
    for seq in all_sequences:
        plt.plot(range(1, 101), seq, alpha=0.4)
    plt.title('Individual Respondent Answer Patterns (Line Plot)')
    plt.xlabel('Question Number')
    plt.ylabel('Answer Value')
    plt.grid(True)
    plt.show()

else:
    print("Error: Invalid visualization type. Use 1 for scatter, 2 for line.")

## Reflections

### `generate_means_sequence()`
* Creating and deleting a temp file for every respondent isn't the most efficient method (Ex. parsing the text directly would avoid the extra file I/O)
* No error handling for files
* Duplicated code: the same loading and parsing logic appears in both functions
### `visualize_data()`
* No figure size or other plot customization options
* Too many lines in the line chart, which makes it hard to read
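
The temp-file and duplication points could both be addressed with one shared loader that works on text in memory. The sketch below is illustrative only. `load_sequences`, `question_means`, and the `parse_text` callable (a text-based counterpart to `extract_answers_sequence()`) are hypothetical names. The `"\n*\n"` separator and the "skip zeros" rule are taken from the M3 excerpts above.

```python
from __future__ import annotations

from typing import Callable


def load_sequences(
    raw_data: str,
    parse_text: Callable[[str], list[int]],
    num_questions: int = 100,
    separator: str = "\n*\n",
) -> list[list[int]]:
    """Split the collated text into respondent blocks and parse each in memory."""
    sequences = []
    for block in raw_data.strip().split(separator):
        seq = parse_text(block.strip())  # no temp file needed
        if seq and len(seq) == num_questions:
            sequences.append(seq)
    return sequences


def question_means(
    sequences: list[list[int]], num_questions: int = 100
) -> list[float]:
    """Mean answer per question, ignoring unanswered (0) entries."""
    means = []
    for i in range(num_questions):
        values = [seq[i] for seq in sequences if seq[i] != 0]
        means.append(sum(values) / len(values) if values else 0.0)
    return means
```

Both `generate_means_sequence()` and `visualize_data()` could then call `load_sequences()` once instead of each repeating the read, split, and parse loop.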

## Conclusion
This module performs its core functions effectively but would benefit from:
* More modular code
* Plot customization options
* Type hints

# M4: Integration & Execution  #

## Objective of `run_full_analysis_M4.py`
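The script integrates the earlier modules and runs the full analysis pipeline:
* `setup_environment()`: Creates the data and output directories and resets the log file
* `log_message()`: Logs each major step to the console and to a log file
* `generate_mock_data()`: Creates mock respondent files if the download fails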

## Key Functions Review
* #### `setup_environment()`: Sets up the directories and resets the log file, which also confirms that the filesystem is writable before the pipeline runs

In [None]:
def setup_environment():
    os.makedirs(CONFIG['data_folder'], exist_ok=True)
    os.makedirs(CONFIG['output_folder'], exist_ok=True)
    with open(CONFIG['log_file'], 'w') as f:
        f.write(f"Analysis Log - {datetime.now()}\n{'='*40}\n")

* #### `log_message()`: Logs every major step to both the console and a persistent log file, which helps with tracking execution and debugging errors in the pipeline.

In [None]:
def log_message(message):
    """Log messages to file and console with timestamp."""
    from datetime import datetime
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    log_entry = f"[{timestamp}] {message}"
    print(log_entry)
    with open("analysis_log.txt", 'a') as f:
        f.write(log_entry + "\n")

* #### `generate_mock_data()`: Used if the file download fails; creates respondent files with a simple answer pattern.

In [None]:
def generate_mock_data(num_files=5):
    for i in range(1, num_files + 1):
        with open(file_path, 'w') as f:
            for q in range(1, 101):
                if q % 4 == opt % 4:
                    f.write(f"[x] Answer {q}.{opt}\n")

## Conclusion
This integration script shows strong software practice:
* Great modular design
* Robust error handling
* Detailed logging, comments, and log files
* Type hints