# Production 1: Data translation for storage

To enable a program to function each time it runs, there needs to be an external, persistent data storage system that retains the state of the data between runs. There are two considerations here: the physical storage medium for the data, such as a file or a database, and the format/structure of the data. This week you have explored a number of different formats covering both, as follows:

- CSV
- XML
- JSON
- MongoDB
- SQL Database
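To see how the choice of format changes the shape of the same data, the sketch below serialises one hypothetical record (`record`, with made-up fields) as CSV, XML and JSON using only the Python standard library. The helper names `to_csv`, `to_xml` and `to_json` are illustrative only, and the two database options are left out to keep the sketch self-contained.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# A hypothetical record used only for illustration.
record = {"id": "1", "name": "Ada", "city": "London"}


def to_csv(rows):
    """Serialise a list of flat dicts as CSV text, header row first."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_xml(rows, root_tag="records", row_tag="record"):
    """Serialise a list of flat dicts as XML, one child element per field."""
    root = ET.Element(root_tag)
    for row in rows:
        item = ET.SubElement(root, row_tag)
        for key, value in row.items():
            ET.SubElement(item, key).text = value
    return ET.tostring(root, encoding="unicode")


def to_json(rows):
    """Serialise a list of dicts as a JSON array."""
    return json.dumps(rows)


print(to_csv([record]))   # id,name,city / 1,Ada,London
print(to_xml([record]))   # <records><record><id>1</id>...</record></records>
print(to_json([record]))  # [{"id": "1", "name": "Ada", "city": "London"}]
```

CSV stores the field names once in the header, while XML and JSON repeat them for every record; in exchange, XML and JSON can nest related data, which matters once the data is restructured into groups or objects.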

Select ONE format that you consider most suited to the data in the scenario (the client brief or your own data) and to the aims of the program. The selected format should support both the nature of the data and the aims of the application being designed, offering distinct advantages and minimal limitations compared with the other formats. It should not be selected solely because it is the easiest to program, although ease of programming can be listed as an advantage where applicable.

## Design

Produce a model that shows how the data needs to be restructured to take best advantage of the selected format and to work more effectively within the program. Where you have created groups or objects from the data, show how they relate to each other.
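Assuming JSON is the selected format (as in the notebook code below), one way to restructure flat rows is to nest child records under their parent, so that a one-to-many relationship becomes visible in the structure itself. The column names `customer_id`, `customer_name`, `order_id` and `product`, and the helper `group_rows`, are hypothetical; substitute the fields from your own data.

```python
import json


def group_rows(rows, key, parent_fields, child_name):
    """Nest flat rows: one parent object per distinct value of `key`.

    Fields listed in `parent_fields` are stored once on the parent; every
    other field goes into a child object appended to parent[child_name].
    """
    parents = {}  # dicts keep insertion order, so output follows input order
    for row in rows:
        parent_id = row[key]
        if parent_id not in parents:
            parent = {field: row[field] for field in parent_fields}
            parent[child_name] = []
            parents[parent_id] = parent
        child = {f: v for f, v in row.items() if f not in parent_fields}
        parents[parent_id][child_name].append(child)
    return list(parents.values())


# Hypothetical flat rows, as a CSV parser would return them.
rows = [
    {"customer_id": "c1", "customer_name": "Ada", "order_id": "o1", "product": "Pen"},
    {"customer_id": "c1", "customer_name": "Ada", "order_id": "o2", "product": "Ink"},
    {"customer_id": "c2", "customer_name": "Alan", "order_id": "o3", "product": "Pad"},
]
customers = group_rows(rows, "customer_id", ("customer_id", "customer_name"), "orders")
print(json.dumps(customers, indent=2))
```

The nested shape stores each customer's name once instead of once per order; the trade-off is that a query across all orders now has to loop over the parent objects first.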

## Implementation

Implement a parser that reads in the original data file. You may want to create a subset of the data file for faster testing. Your program should then translate the data from the original format/structure into your selected format and write the result to the relevant physical medium (files/database).
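A test subset can be produced by copying the header and the first few rows into a new file. This is a minimal sketch with a hypothetical `write_subset` helper; it assumes the source file is UTF-8 encoded and starts with a header row.

```python
import csv
import itertools


def write_subset(src_path, dst_path, n_rows=100):
    """Copy the header row and the first `n_rows` data rows of a CSV file.

    Returns the number of data rows written.
    """
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return 0  # empty source file: nothing to copy
        writer.writerow(header)
        written = 0
        for row in itertools.islice(reader, n_rows):
            writer.writerow(row)
            written += 1
    return written
```

For example, `write_subset("datasets/full.csv", "datasets/sample.csv", 50)` (placeholder file names) would give a 50-row file to run the translation against while developing.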

At this stage there is no requirement to handle data types (other than those inherent in the data format, i.e. numbers and strings), conversions or missing data. The program can be demonstrated as a simple console-based application that asks the user for the file name and produces enough output to demonstrate that the translation is correct.
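Type handling is optional at this stage, but it helps to see the one distinction JSON carries natively: a CSV reader returns every value as a string, whereas JSON can store numbers. The hypothetical `infer_value` helper below converts numeric-looking strings; it is a sketch, not a complete type system.

```python
import json
import math


def infer_value(text):
    """Return an int or float if `text` looks numeric, otherwise `text` unchanged."""
    try:
        return int(text)
    except ValueError:
        pass
    try:
        number = float(text)
    except ValueError:
        return text
    # float() also accepts "nan" and "inf", which are not valid JSON numbers.
    return number if math.isfinite(number) else text


row = {"id": "7", "price": "2.50", "name": "Pen"}
typed = {key: infer_value(value) for key, value in row.items()}
print(json.dumps(typed))  # {"id": 7, "price": 2.5, "name": "Pen"}
```

Be careful with identifiers such as phone numbers or codes with leading zeros: `"007"` would become `7`, so such columns are better left as strings.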

Your program should print regular status messages to the console so that it is easy to follow what the program is doing and the translation process is visibly demonstrated. This will also be handy for any debugging required.
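`print` calls are enough for this task. If the number of messages becomes a problem, the standard `logging` module lets you keep progress messages at `INFO` level and switch extra detail on or off without editing each call. The function `rows_to_dicts` and the logger name `csv2json` are hypothetical.

```python
import logging

# Show INFO messages and above, prefixed with the level name.
logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
log = logging.getLogger("csv2json")


def rows_to_dicts(headers, rows, every=1000):
    """Pair each row with the headers, logging progress every `every` rows."""
    records = []
    for n, row in enumerate(rows, start=1):
        records.append(dict(zip(headers, row)))
        if n % every == 0:
            log.info("Translated %d rows", n)
    log.info("Finished: %d rows translated", len(records))
    return records


records = rows_to_dicts(["name", "city"], [["Ada", "London"], ["Alan", "Wilmslow"]], every=1)
```

Setting `level=logging.DEBUG` would also show any `log.debug(...)` calls added while debugging. In a notebook where the root logger already has handlers, `basicConfig` does nothing unless `force=True` is passed (Python 3.8+).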

## Reflection on design decisions

Write a 200-word reflection that states the reason for your format selection, the advantages the format brings to the data and the application, and any limitations on future use of the data in the selected format.

```python
import json
import os
```

```python
CSV_DIR = "datasets/"
JSON_DIR = "json/"
```

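```python
# CSV parser
import csv  # handles quoted fields that contain commas


class CSVParser:
    """Read a CSV file into a list of headers and a list of data rows."""

    def __init__(self, csv_file):
        self.__file = csv_file
        self.__headers = []
        self.__data = []
        self.__parse()

    def __parse(self):
        # newline="" lets the csv module handle line endings, including
        # newlines inside quoted fields; "utf-8-sig" drops a leading BOM
        # (common in Excel exports) so it does not end up in the first header.
        with open(self.__file, "r", newline="", encoding="utf-8-sig") as f:
            for row in csv.reader(f):
                if not row:
                    continue  # skip blank lines, e.g. a trailing newline
                if not self.__headers:
                    self.__headers = row
                else:
                    self.__data.append(row)

    @property
    def headers(self):
        return self.__headers

    @property
    def data(self):
        return self.__data

    def to_dict(self):
        # zip pairs each header with its value; a short row simply yields
        # fewer keys instead of raising IndexError.
        return [dict(zip(self.__headers, row)) for row in self.__data]

    def to_json(self):
        return json.dumps(self.to_dict(), indent=4)
```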

```python
# main functions


def parse_csv():
    """Prompt for a CSV file name in CSV_DIR and parse it; return None if missing."""
    csv_filename = input("Enter the csv filename: ").strip()
    csv_filepath = os.path.join(CSV_DIR, csv_filename)
    if not os.path.isfile(csv_filepath):
        print(f"File {csv_filepath} not found")
        return None
    print(f"Parsing CSV file {csv_filename}...")
    csv_parser = CSVParser(csv_filepath)
    print(f"CSV file {csv_filename} parsed successfully: "
          f"{len(csv_parser.headers)} columns, {len(csv_parser.data)} rows")
    return csv_parser


def export_json(csv_parser):
    """Prompt for a JSON file name and write the parsed data into JSON_DIR."""
    json_filename = input("Enter the json filename: ").strip()
    json_filepath = os.path.join(JSON_DIR, json_filename)
    print(f"Exporting JSON file {json_filename}...")
    os.makedirs(os.path.dirname(json_filepath), exist_ok=True)
    with open(json_filepath, "w", encoding="utf-8") as f:
        f.write(csv_parser.to_json())
    print(f"JSON file {json_filepath} exported successfully")


def main():
    csv_parser = parse_csv()
    if csv_parser is not None:
        print()
        export_json(csv_parser)
    else:
        print("Exited due to error")
```

```python
main()
```
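To give the output needed to demonstrate that the translation is correct, one option is to read both files back and compare them record by record. This sketch assumes a header row in the CSV, UTF-8 files, and values written as strings (no type conversion); `verify_translation` is a hypothetical helper.

```python
import csv
import json


def verify_translation(csv_path, json_path):
    """Return True if the JSON file holds the same records as the CSV file."""
    # DictReader skips blank lines and maps each row to {header: value}.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        expected = [dict(row) for row in csv.DictReader(f)]
    with open(json_path, encoding="utf-8") as f:
        actual = json.load(f)
    if len(expected) != len(actual):
        print(f"Row count differs: CSV {len(expected)}, JSON {len(actual)}")
        return False
    for n, (want, got) in enumerate(zip(expected, actual), start=1):
        if want != got:
            print(f"Row {n} differs: {want} != {got}")
            return False
    print(f"All {len(expected)} rows match")
    return True
```

Called as `verify_translation("datasets/sample.csv", "json/sample.json")` (placeholder paths) after `main()`, it prints either a mismatch report or a confirmation line that can serve as evidence of a correct translation.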