In [None]:
import pandas as pd
import json
import glob
import os

# Corrected the directory path to include a slash at the end
# directory_path = '<PATH>'

# Use the script's directory; __file__ is undefined in a notebook, so fall back to the working directory
try:
    directory_path = os.path.dirname(os.path.abspath(__file__)) + '/'
except NameError:
    directory_path = os.getcwd() + '/'



# Using glob.glob to match the JSON file extension
json_files = glob.glob(directory_path + '*.json')

dataframes = []

for file_path in json_files:
    with open(file_path, 'r') as file:
        data = json.load(file)
        data = [data] if isinstance(data, dict) else data
        df = pd.json_normalize(data)

        gene_name = df['targetGene.name'].iloc[0] if 'targetGene.name' in df.columns else None

        # Check if 'mapped_scores' is in the DataFrame and it's not empty
        if 'mapped_scores' in df.columns and not df['mapped_scores'].isnull().all():
            # Extracting data from 'mapped_scores'
            for scores in df['mapped_scores'].dropna():
                extracted_data = [{
                    # 'gene_name': gene_name,
                    # 'start_value': item.get('post_mapped', {}).get('variation', {}).get('location', {}).get('interval', {}).get('start', {}).get('value'),
                    'end_value': item.get('post_mapped', {}).get('variation', {}).get('location', {}).get('interval', {}).get('end', {}).get('value'),
                    'Ref': item.get('post_mapped', {}).get('vrs_ref_allele_seq'),
                    'Alt': item.get('post_mapped', {}).get('variation', {}).get('state', {}).get('sequence'),
                    'Functional score': item.get('score')
                } for item in scores if item.get('post_mapped')]

                # Only append if extracted_data is not empty
                if extracted_data:
                    df_extracted = pd.DataFrame(extracted_data)
                    dataframes.append(df_extracted)

# Concatenate all the DataFrames into one if dataframes is not empty
if dataframes:
    final_df = pd.concat(dataframes, ignore_index=True)
    # Export the final DataFrame to a CSV file
    final_csv_path = directory_path + 'MaveDV_data.csv'
    final_df.to_csv(final_csv_path, index=False)
else:
    print("No dataframes were created, please check your JSON structure or 'mapped_scores' content.")

## Analysis of 'MaveDB_JSON_CSV.ipynb' Notebook

The notebook 'MaveDB_JSON_CSV.ipynb' contains a Python script that reads the JSON files in a directory, pulls the variant fields out of each file's 'mapped_scores' entries, and writes them to a single CSV file. The key steps are:

1. **Importing Libraries:** The script imports pandas for data manipulation, json for parsing JSON files, glob for file path matching, and os for path handling.

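2. **Setting Directory Path:** The directory path for the JSON files is set using `os.path.dirname(os.path.abspath(__file__))`, which resolves to the directory containing the script. Note that `__file__` is not defined inside a Jupyter notebook session and raises a `NameError` there; in a notebook, the working directory (`os.getcwd()`) is the practical equivalent.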

3. **Reading JSON Files:** The script uses `glob.glob` to find all `*.json` files in that directory. It loads each file with `json.load`, wraps a top-level object in a list so that objects and arrays are handled the same way, and flattens the result into a pandas DataFrame with `pd.json_normalize`.

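4. **Data Extraction and Transformation:** For every entry in 'mapped_scores' that has a 'post_mapped' block, the script extracts four fields: 'end_value' (the end of the variant's location interval), 'Ref' (`vrs_ref_allele_seq`), 'Alt' (the variation's state sequence), and 'Functional score' (`score`). Chained `.get()` calls with `{}` defaults walk the nested structure, so a missing key yields `None` instead of raising an error. The gene name and the interval start are present in the code but commented out of the output.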

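5. **Concatenating DataFrames:** If any data is extracted, the script concatenates the extracted DataFrames into a single DataFrame with `pd.concat(..., ignore_index=True)`. Otherwise it prints a message asking the user to check the JSON structure.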

6. **Exporting to CSV:** Finally, the script exports the consolidated DataFrame to a CSV file named 'MaveDV_data.csv'.
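
The nested lookups in step 4 can be sketched with a small helper. This is a minimal illustration, not the notebook's own code: `dig` is a hypothetical helper name, and `sample_item` is made-up data shaped like the entries the script expects in 'mapped_scores'.

```python
def dig(mapping, *keys, default=None):
    """Follow keys through nested dicts; return default if any level is missing."""
    current = mapping
    for key in keys:
        # Stop as soon as a level is absent or is not a dict (for example None).
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hypothetical record, shaped like one 'mapped_scores' entry.
sample_item = {
    "score": 0.42,
    "post_mapped": {
        "vrs_ref_allele_seq": "A",
        "variation": {
            "state": {"sequence": "G"},
            "location": {
                "interval": {"start": {"value": 99}, "end": {"value": 100}}
            },
        },
    },
}

# Same four output columns as the script.
row = {
    "end_value": dig(sample_item, "post_mapped", "variation", "location",
                     "interval", "end", "value"),
    "Ref": dig(sample_item, "post_mapped", "vrs_ref_allele_seq"),
    "Alt": dig(sample_item, "post_mapped", "variation", "state", "sequence"),
    "Functional score": dig(sample_item, "score"),
}
print(row)
```

One difference from chained `.get(key, {})` calls: if a key is present but holds `None`, the chained form raises an `AttributeError` on the next `.get`, while this helper returns the default.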

The result is a flat table with one row per mapped variant, which can be loaded directly into pandas or other tools for further analysis or fed into a data pipeline.
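
As a rough sketch of downstream use, the exported file can be read back with the standard library alone. The CSV text below is made-up sample data with the same header the script writes; `load_scores` is a hypothetical helper, and the sketch assumes every row is complete (no empty cells).

```python
import csv
import io
from statistics import mean

# Hypothetical content with the header the script writes.
csv_text = """end_value,Ref,Alt,Functional score
100,A,G,0.42
100,A,T,-1.10
101,C,T,0.95
"""


def load_scores(handle):
    """Parse rows, converting the position to int and the score to float."""
    rows = []
    for record in csv.DictReader(handle):
        rows.append({
            "end_value": int(record["end_value"]),
            "Ref": record["Ref"],
            "Alt": record["Alt"],
            "score": float(record["Functional score"]),
        })
    return rows


rows = load_scores(io.StringIO(csv_text))
average_score = mean(r["score"] for r in rows)   # mean functional score
lowest = min(rows, key=lambda r: r["score"])     # variant with the lowest score
print(len(rows), average_score, lowest)
```

For a real file, replace the `io.StringIO` object with `open(path, newline="")`, where `path` points at the exported CSV.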
