# Data Extraction Script Explanation

This notebook explains the `extract_data` function, which loads student, group, and membership data from JSON files, reformats it, and saves the result to a single consolidated JSON file.

## 1. Introduction

The `extract_data` function is designed to:
1. Load student, group, and existing membership data from JSON files.
2. Reformat student data and create memberships.
3. Generate external IDs for students.
4. Merge old and new memberships.
5. Save the consolidated data into a JSON file.

## 2. Import Necessary Libraries


The script imports several libraries:
- `json`: To read and write JSON data.
- `time`: To add delays where necessary (not used in the code shown in this notebook).
- `uuid`: To generate unique identifiers for new memberships.
- `os`: To check whether a file exists and whether it is empty.
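
The import cell itself is not reproduced in this notebook. A minimal sketch of what it presumably contains, based only on the list above, followed by a quick check of the kind of identifier `uuid` produces:

```python
import json  # read and write the JSON data files
import os    # os.path.exists / os.path.getsize checks
import time  # listed by the script; not used in the cells shown here
import uuid  # uuid.uuid4() for new membership IDs

# A UUID rendered as a string has the canonical 36-character form
new_id = str(uuid.uuid4())
print(len(new_id))  # 36
```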


## 3. Utility Functions

### `ensure_file_exists` Decorator

This decorator ensures that a file exists before the wrapped function tries to open it. If the file does not exist, the decorator creates it with an empty JSON object (`{}`) as its content. The file path is taken from the `path` keyword argument or, if that is absent, from the first positional argument.

### Explanation of the Code
- Checks whether the file exists.
- If not, writes an empty JSON object (`{}`) to a new file at that path.
- Calls the original function with the provided arguments.


In [None]:
import functools


def ensure_file_exists(func):
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        # Take `path` from the keyword arguments, else from the first positional argument
        path = kwargs["path"] if "path" in kwargs else args[0]

        if not os.path.exists(path):
            # Create the file with an empty JSON object so it can be opened and parsed
            with open(path, "w", encoding="utf-8") as file:
                json.dump({}, file, indent=4)

        # Call the original function now that the file is guaranteed to exist
        return func(*args, **kwargs)

    return wrapper
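
To see the decorator in action, the following sketch restates its behavior in condensed form and applies it to `read_json`, a hypothetical helper that is not part of the script. A temporary directory stands in for the project folder so nothing is written to real data files.

```python
import functools
import json
import os
import tempfile


def ensure_file_exists(func):
    """Condensed restatement of the decorator's behavior."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        path = kwargs["path"] if "path" in kwargs else args[0]
        if not os.path.exists(path):
            with open(path, "w", encoding="utf-8") as file:
                json.dump({}, file, indent=4)
        return func(*args, **kwargs)
    return wrapper


@ensure_file_exists
def read_json(path):  # hypothetical helper, for illustration only
    with open(path, "r", encoding="utf-8") as file:
        return json.load(file)


with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "not_there_yet.json")
    existed_before = os.path.exists(missing)
    first = read_json(missing)        # path passed positionally
    second = read_json(path=missing)  # path passed as a keyword
    exists_after = os.path.exists(missing)

print(existed_before, first, second, exists_after)  # False {} {} True
```

Note that the decorator expects the path either as the `path` keyword or as the first positional argument; calling a decorated function with no arguments at all fails with an `IndexError` on `args[0]`.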


### `open_file` Function

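This function opens a JSON file and returns the data stored under a specified key (`data_name`). If the file is empty or the key does not exist, it returns an empty list. Because the function is wrapped in `ensure_file_exists`, a missing file is created first and therefore also yields an empty list.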

### Explanation of the Code
- Opens the specified file and reads its contents.
- Loads the data for the specified key.
- Returns the data or an empty list if the key does not exist.


In [None]:
@ensure_file_exists
def open_file(path, data_name):
    # An empty file holds no JSON document, so there is nothing to load
    if os.path.getsize(path) == 0:
        return []

    with open(path, "r", encoding="utf-8") as data_file:
        data = json.load(data_file)

    # Fall back to an empty list when the key is missing
    return data.get(data_name, [])
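
The fallback cases can be checked in isolation. The sketch below uses `load_section`, a hypothetical stand-in that folds the decorator's file creation and the loading logic into one function, and runs it against a temporary file in four states: missing, zero bytes, key present, and key absent.

```python
import json
import os
import tempfile


def load_section(path, data_name):
    """Hypothetical stand-in mirroring `open_file` plus its decorator."""
    if not os.path.exists(path):
        # Same effect as `ensure_file_exists`: create a file holding {}
        with open(path, "w", encoding="utf-8") as file:
            json.dump({}, file, indent=4)
    if os.path.getsize(path) == 0:
        return []
    with open(path, "r", encoding="utf-8") as file:
        data = json.load(file)
    return data[data_name] if data_name in data else []


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "students.json")

    missing_file = load_section(path, "users")   # file does not exist yet

    open(path, "w", encoding="utf-8").close()    # truncate to zero bytes
    empty_file = load_section(path, "users")

    with open(path, "w", encoding="utf-8") as file:
        json.dump({"users": [{"id": "u1"}]}, file)
    present_key = load_section(path, "users")
    missing_key = load_section(path, "groups")

print(missing_file, empty_file, present_key, missing_key)
# [] [] [{'id': 'u1'}] []
```

In every fallback case the caller receives a list, so a loop such as `for student in students` simply runs zero times instead of failing.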


## 4. `extract_data` Function

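This function orchestrates the extraction and transformation of student and membership data. It loads students and groups from files under `utils/`, loads existing memberships from `systemdata.json`, processes them, and writes the consolidated result back to `systemdata.json`, overwriting the previous contents.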

### Explanation of the Code
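- Loads student, group, and membership data from JSON files.
- Reformats student data and creates a membership for each student, reusing the ID of an existing membership with the same `(user_id, group_id)` pair and generating a new UUID otherwise.
- Generates external IDs for students using the `create_externalids` function.
- Merges old and new memberships by `id`, so each membership appears only once.
- Removes the `url` key from each group.
- Saves the final extracted data into a JSON file.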
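
The ID reuse and merge steps listed above can be tried on a small scale before reading the full function. The records below are hypothetical sample data (including the extra `note` field); only the field names `id`, `user_id`, `group_id`, and `valid` come from the script.

```python
import uuid

# Hypothetical sample records for illustration
old_memberships = [
    {"id": "m-1", "user_id": "u1", "group_id": "g1", "valid": False, "note": "kept"},
    {"id": "m-2", "user_id": "u2", "group_id": "g1", "valid": False},
]
students = [
    {"id": "u1", "group_id": "g1"},  # pair already known: ID is reused
    {"id": "u3", "group_id": "g2"},  # new pair: gets a fresh UUID
]

# Look up existing membership IDs by (user_id, group_id)
old_ids = {(m["user_id"], m["group_id"]): m["id"] for m in old_memberships}

memberships = [
    {
        "id": old_ids.get((s["id"], s["group_id"]), str(uuid.uuid4())),
        "user_id": s["id"],
        "group_id": s["group_id"],
        "valid": True,
    }
    for s in students
]

# Merge by ID: new fields override old ones, other old fields survive
merged = {m["id"]: m for m in old_memberships}
for m in memberships:
    merged[m["id"]] = {**merged.get(m["id"], {}), **m}

result = list(merged.values())
print(len(result))  # 3
print(result[0])
# {'id': 'm-1', 'user_id': 'u1', 'group_id': 'g1', 'valid': True, 'note': 'kept'}
```

Two consequences are worth noting: an old membership whose student no longer appears in the input (`m-2` here) is carried over unchanged, and a reused membership is always set back to `"valid": True`.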


In [None]:
@config_check_web(config_web["extract_data"])
def extract_data():

    # Load the data from the JSON files
    students = open_file("utils/c_students_data.json", "users")
    groups = open_file("utils/b_groups_data.json", "groups")

    # Map each existing (user_id, group_id) pair to its membership ID so IDs can be reused
    old_memberships = open_file("systemdata.json", "memberships")
    old_memberships_dict = {
        (m["user_id"], m["group_id"]): m["id"] for m in old_memberships
    }

    # Define the new data structures for memberships and users
    memberships = []
    users = []

    # Reformat the student data and create new memberships
    for student in students:
        # Split the name into parts
        name_parts = student["name"].split()

        # Assume the first part is the surname and the rest is the given name(s);
        # a blank name yields empty strings instead of raising an IndexError
        surname = name_parts[0] if name_parts else ""
        name = " ".join(name_parts[1:])

        membership_id = old_memberships_dict.get(
            (student["id"], student["group_id"]), str(uuid.uuid4())
        )

        memberships.append(
            {
                "id": membership_id,
                "user_id": student["id"],
                "group_id": student["group_id"],
                "valid": True,
            }
        )

        users.append(
            {
                "id": student["id"],
                "name": name,
                "surname": surname,
                "email": student["email"],
            }
        )
    list_length = len(memberships)
    print(f"Number of memberships: {list_length}")

    # Create externalidtypes
    externalidtypes = []

    # Default externalidtype for MojeAP-Student from gqlUG
    externalidtypes.append(
        {
            "id": "d5bfe043-f82e-4d24-baa2-524a4f443ed0",
            "name": "MojeAP-Student",
            "name_en": "UCO",
            "urlformat": "https://apl.unob.cz/MojeAP/Student/%s",
            "category_id": "0ee3a92d-971f-499a-956f-ca6edb8d6094",
        }
    )

    # Create externalids
    externalids = create_externalids(students, memberships, externalidtypes)

    # Merge the old memberships with the new ones by ID; on a shared ID the new fields win
    membership_dict = {}

    for m in old_memberships:
        membership_dict[m["id"]] = m

    for m in memberships:
        membership_dict[m["id"]] = {**membership_dict.get(m["id"], {}), **m}

    filtered_memberships = list(membership_dict.values())

    # Remove the `url` key from each group, if present
    for group in groups:
        group.pop("url", None)

    # Save the extracted data to a JSON file
    extracted_data = {
        "externalidtypes": externalidtypes,
        "externalids": externalids,
        "groups": groups,
        "users": users,
        "memberships": filtered_memberships,
    }

    with open("systemdata.json", "w", encoding="utf-8") as outfile:
        json.dump(extracted_data, outfile, ensure_ascii=False, indent=4)
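
The name handling inside the loop above rests on the assumption that the surname comes first. The sketch below isolates that rule in `split_name`, a hypothetical helper with an explicit guard for blank names, and runs it on a few made-up inputs to show where the assumption holds and where it yields an empty part.

```python
def split_name(full_name):
    """Hypothetical helper: surname first, remaining words as given name(s)."""
    parts = full_name.split()
    if not parts:
        return "", ""
    return parts[0], " ".join(parts[1:])


print(split_name("Novák Jan"))        # ('Novák', 'Jan')
print(split_name("Novák Jan Karel"))  # ('Novák', 'Jan Karel')
print(split_name("Novák"))            # ('Novák', '')
print(split_name("   "))              # ('', '')
```

Because only the first word is treated as the surname, a multi-word surname would be split across both fields; the source data would need a separate surname field to avoid that.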

## 5. Conclusion

In this notebook, we have broken down the `extract_data` function and its supporting functions, explaining each part and its purpose. The script extracts and organizes student and membership information from several JSON files and saves it in a consolidated format.

For the script to run correctly, your project must provide the pieces that are used here but not defined in this notebook: the `create_externalids` function, the `config_check_web` decorator, and the `config_web` configuration.


