# NSMQ - Kwame AI Project

###### Title: A script for converting a well-structured JSON file into a .txt file.
###### By: Ernest Samuel, Team Member, Data Preprocessing Team
###### Date: 24-06-2023

# Data processing functions

The following functions are used to process the data and convert it into a .txt file.

* `convert_dict_to_strings()`: Converts a list of dictionaries (such as a section's figures) into a flat list of `key: value` strings.
* `convert_list_to_strings()`: Converts a list of lists (such as a section's lists or tables) into a list of strings, joining the items of each inner list with newlines.
* `extract_and_save_data()`: The main function. It reads the JSON file, converts the data, and saves it to a .txt file.
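To make the two helper conversions concrete, here is a small sketch using made-up sample values (the `figures` and `tables` contents below are hypothetical, not taken from a real book file). It reproduces the same transformations with equivalent expressions so that it runs on its own:

```python
# Hypothetical sample values shaped like a section's 'figures' and 'tables' fields
figures = [{"Figure 1": "A parabola opening upward"}, {"Figure 2": "A unit circle"}]
tables = [["x", "y"], [1, 2]]

# Same transformation as convert_dict_to_strings(): one "key: value" string per entry
figure_lines = [f"{key}: {value}" for dic in figures for key, value in dic.items()]

# Same transformation as convert_list_to_strings(): each inner list becomes one
# string, with its items converted to str and joined by newlines
table_lines = ["\n".join(map(str, item)) for item in tables]

print(figure_lines)  # ['Figure 1: A parabola opening upward', 'Figure 2: A unit circle']
print(table_lines)   # ['x\ny', '1\n2']
```

Note that every inner list collapses into a single string, so the boundaries between items survive only as newline characters.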

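The `extract_and_save_data()` function first reads the JSON file. It then splits the input path into its folder and file name and creates an `output_txt_files` folder next to the input file (if it does not already exist). Next, it creates an empty list to store the formatted sections.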

The function then iterates through the pages and sections in the JSON file. For each section, it builds a formatted string containing the section title, paragraphs, lists, tables, and figures, and appends that string to the list of formatted sections.
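The iteration above implies a particular JSON layout: a list of pages, where each page is a dictionary mapping a page title to a list of sections, and each section has the keys `title`, `Section` (the paragraphs), `lists`, `tables`, and `figures`. This layout is inferred from the keys the code reads; the sample content and the helper name `format_section` below are hypothetical. The sketch mirrors the per-section formatting step so you can preview what one section looks like in the output:

```python
import json

# Hypothetical input, shaped like the JSON the script expects (layout inferred from the code)
sample_json = """
[
  {
    "Chapter 1": [
      {
        "title": "1.1 Real Numbers",
        "Section": ["First paragraph.", "Second paragraph."],
        "lists": [["item a", "item b"]],
        "tables": [["x", "y"]],
        "figures": [{"Figure 1": "A number line"}]
      }
    ]
  }
]
"""

def format_section(section):
    """Hypothetical helper mirroring the per-section formatting in extract_and_save_data()."""
    text = f"__section__\n**{section['title']}**\n"
    # Paragraphs are joined with a "_paragraph_" marker placed between them
    text += "\n\n_paragraph_ \n".join(section['Section']) + "\n"
    # Lists and tables: each inner list becomes one newline-joined block
    text += "\n".join("\n".join(map(str, item)) for item in section['lists']) + "\n"
    text += "\n**Table**\n"
    text += "\n".join("\n".join(map(str, item)) for item in section['tables']) + "\n"
    # Figures: one "key: value" line per dictionary entry
    text += "\n**Figures**\n"
    text += "\n".join(f"{k}: {v}" for d in section['figures'] for k, v in d.items()) + "\n\n"
    return text

data = json.loads(sample_json)
for page_data in data:                   # each page is a dict
    for sections in page_data.values():  # page title -> list of sections
        for section in sections:
            print(format_section(section))
```

Because the marker is used as a separator, the first paragraph of each section is not preceded by `_paragraph_`.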

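Finally, the function builds the output file name (the input file's base name with a `.txt` extension, inside `output_txt_files`) and writes the formatted sections to that file, separated by blank lines.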
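The output location depends only on the input path. The sketch below isolates that path logic in a hypothetical helper, `output_path_for()`, using the same `os.path` calls as the script; unlike the script, it does not create the folder or write anything:

```python
import os

def output_path_for(input_filename):
    """Hypothetical helper: where extract_and_save_data() would write its .txt file."""
    # e.g. "books/algebra.json" -> ("books", "algebra.json")
    input_dir, base_name = os.path.split(input_filename)
    # "algebra.json" -> "algebra"
    base_name_without_extension, _ = os.path.splitext(base_name)
    # The output folder sits next to the input file
    output_folder = os.path.join(input_dir, "output_txt_files")
    return os.path.join(output_folder, f"{base_name_without_extension}.txt")

print(output_path_for("College Algebra 2e.json"))  # on POSIX: output_txt_files/College Algebra 2e.txt
```

When the input file name has no folder part, `input_dir` is an empty string, so the output folder is created relative to the current working directory.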



```python
# Data processing functions

def convert_dict_to_strings(list_of_dicts):
    """
    Converts a list of dictionaries into a flat list of "key: value" strings.

    Args:
        list_of_dicts: A list of dictionaries, e.g. the 'figures' entry
            of a section.

    Returns:
        A list of strings, one "key: value" string per dictionary entry.

    Raises:
        TypeError: If a single dictionary is passed instead of a list of them.
    """
    if isinstance(list_of_dicts, dict):
        raise TypeError("list_of_dicts must be a list of dictionaries, not a single dictionary")

    new_list = []
    for dic in list_of_dicts:
        for key, value in dic.items():
            new_list.append(f"{key}: {value}")
    return new_list


def convert_list_to_strings(list_of_lists):
    """
    Converts a list of lists into a list of strings.

    Each inner list becomes one string: its items are converted to str
    and joined by newlines.

    Args:
        list_of_lists: A list of lists, e.g. the 'lists' or 'tables'
            entry of a section.

    Returns:
        A list of strings, one per inner list.
    """
    return ["\n".join(map(str, list_item)) for list_item in list_of_lists]
```

```python
# Main .txt file conversion function

import json
import os

def extract_and_save_data(input_filename):
    """
    Reads a structured JSON file and writes its sections to a .txt file.

    The output file has the same base name as the input and is saved in an
    "output_txt_files" folder next to the input file.
    """
    # Read the JSON file (JSON text is UTF-8)
    with open(input_filename, 'r', encoding='utf-8') as json_file:
        data = json.load(json_file)

    # Get the directory and base name of the input file
    input_dir, base_name = os.path.split(input_filename)
    base_name_without_extension, _ = os.path.splitext(base_name)

    # Create the output folder if it doesn't exist
    output_folder = os.path.join(input_dir, "output_txt_files")
    os.makedirs(output_folder, exist_ok=True)

    # Collect one formatted string per section
    formatted_sections = []

    # Each page is a dict mapping the page title to a list of sections
    for page_data in data:
        for sections in page_data.values():
            for section in sections:
                formatted_section = f"__section__\n**{section['title']}**\n"
                formatted_section += "\n\n_paragraph_ \n".join(section['Section']) + "\n"
                formatted_section += "\n".join(convert_list_to_strings(section['lists'])) + "\n"
                formatted_section += "\n**Table**\n"
                formatted_section += "\n".join(convert_list_to_strings(section['tables'])) + "\n"
                formatted_section += "\n**Figures**\n"
                formatted_section += "\n".join(convert_dict_to_strings(section['figures'])) + "\n\n"
                formatted_sections.append(formatted_section)

    # Build the output file path: <output_folder>/<input base name>.txt
    output_file_name = os.path.join(output_folder, f"{base_name_without_extension}.txt")

    # Save the formatted sections to the text file
    with open(output_file_name, 'w', encoding='utf-8') as output_file:
        output_file.write("\n\n".join(formatted_sections))

    print(f"{base_name_without_extension} Data extracted and saved as .txt successfully in {output_folder} .")
```



Run the script.

**NOTE:** Make sure the JSON file you pass in is the updated version.
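One way to check this before running is to confirm that every section carries the keys the script reads. The sketch below assumes that "updated" means exactly that; the function name `find_missing_keys` is hypothetical. The script itself performs no such check: a missing key simply raises `KeyError` partway through the run.

```python
import json

# Keys that extract_and_save_data() reads from every section
REQUIRED_KEYS = {"title", "Section", "lists", "tables", "figures"}

def find_missing_keys(data):
    """Hypothetical check: list (page title, section index, missing keys) for each incomplete section."""
    problems = []
    for page_data in data:
        for page_title, sections in page_data.items():
            for index, section in enumerate(sections):
                # Iterating a dict yields its keys, so this is "required minus present"
                missing = REQUIRED_KEYS.difference(section)
                if missing:
                    problems.append((page_title, index, sorted(missing)))
    return problems

# Inline, hypothetical document; for a real file, load it with json.load() instead
data = json.loads('[{"Page 1": [{"title": "Intro", "Section": [], "lists": [], "tables": []}]}]')
print(find_missing_keys(data))  # [('Page 1', 0, ['figures'])]
```

An empty result means every section has all five keys the conversion expects.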

```python
# Call the function with the input filename
input_filename = 'College Algebra 2e.json'  # Replace with the actual input file name
extract_and_save_data(input_filename)
```

Output:

```
College Algebra 2e Data extracted and saved as .txt successfully in output_txt_files .
```
