# Creating a File Structure for Deployment

When developing Streamlit applications, managing the paths to datasets and model artifacts is crucial for clean, maintainable code. Storing these file paths in a JSON (JavaScript Object Notation) file offers a structured, readable, and flexible way to track them. In this lesson, we'll use a JSON configuration file to manage paths within a Streamlit app.

A Streamlit dashboard can have many components, each with its own assets. Running the modeling code live can take time and leave the user waiting, so we pre-save as many of the components and variables we will need as possible.

Some of the files we may want to track are:


- data:
    - Full dataframe of dataset used
    - EDA/preview DataFrame
    - Data from Machine Learning (ML) Models (if applicable):
        - train-data.joblib (X_train, y_train)
        - test-data.joblib (X_test, y_test)
    - Data from Neural Network (NN) Models (if applicable):
        - TFRecords (saving TensorFlow Datasets)
- models:
    - for Machine Learning (ML): joblib file or TensorFlow model folder
    - for Neural Networks (NN): saved Keras model folder
- images:
    - App assets (e.g. banner png image)
    - Saved Figures
- config:
    - a file for tracking all of the filepaths above.

In [1]:
from pprint import pprint
FPATHS = dict(
    data={
        "raw": {
            "full": "data/ames-housing-dojo-for-ml.csv",  # (This is the original full dataframe we already have)
            "eda": "data/ames-housing-dojo-for-ml-eda.csv" # We haven't saved this yet
        },
        "ml": {
            "train": "data/training-data.joblib",  # (X_train,y_train) We haven't saved this yet
            "test": "data/testing-data.joblib",  # (X_test,y_test) We haven't saved this yet
        },
    },
    models={
        "linear_regression": "models/linear_regression/linreg.joblib", # We haven't saved this yet
        "random_forest": "models/random_forest/rf_reg.joblib", # We haven't saved this yet
    },
    images={
        "banner": "images/app-banner.png", # We haven't saved this yet
    },
)
pprint(FPATHS)

{'data': {'ml': {'test': 'data/testing-data.joblib',
                 'train': 'data/training-data.joblib'},
          'raw': {'eda': 'data/ames-housing-dojo-for-ml-eda.csv',
                  'full': 'data/ames-housing-dojo-for-ml.csv'}},
 'images': {'banner': 'images/app-banner.png'},
 'models': {'linear_regression': 'models/linear_regression/linreg.joblib',
            'random_forest': 'models/random_forest/rf_reg.joblib'}}


We can save the dictionary to a file right away, since we will use the file paths within it to name the files we save later. There are several ways to do this; here we create a "config/" folder and save our dictionary of file paths as "config/filepaths.json".

In [2]:
# Save the file paths to a JSON config file
import json
import os
os.makedirs('config/', exist_ok=True)
FPATHS_FILE = 'config/filepaths.json'
with open(FPATHS_FILE, 'w') as f:
    json.dump(FPATHS, f)
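
Saving the dictionary is only half of the workflow: another notebook or the app script needs to read it back. The sketch below shows one way to do that with `json.load`. It uses a hypothetical miniature dictionary (`example_fpaths`) and a temporary folder, both assumptions made so the example runs without touching the real "config/" folder. The `indent=4` argument is optional and only makes the saved file easier to read.

```python
import json
import os
import tempfile

# Hypothetical miniature of the FPATHS dictionary, used only for this sketch
example_fpaths = {
    "data": {"raw": {"full": "data/ames-housing-dojo-for-ml.csv"}},
    "models": {"random_forest": "models/random_forest/rf_reg.joblib"},
}

# Work in a temporary folder so the real config/ folder is left alone
with tempfile.TemporaryDirectory() as tmp_dir:
    example_file = os.path.join(tmp_dir, "filepaths.json")

    # Save the dictionary; indent=4 is optional and keeps the file human-readable
    with open(example_file, "w") as f:
        json.dump(example_fpaths, f, indent=4)

    # Load the dictionary back, as the app script might do at startup
    with open(example_file) as f:
        loaded_fpaths = json.load(f)

# The loaded dictionary supports the same nested lookups as the original
print(loaded_fpaths["models"]["random_forest"])
```

In the app itself, the same `json.load` call would presumably point at "config/filepaths.json" instead of the temporary file.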

Now that we have defined our file paths, we can optionally walk through the FPATHS dictionary, extract the directory (folder) portion of each file path, and use the os module to create those directories. This avoids errors when saving files later to a folder that does not yet exist. The function below does this recursively for our nested dictionary:

In [3]:
import os
def create_directories_from_paths(nested_dict):
    """Recursively create directories for file paths in a nested dictionary.

    Source: OpenAI. (2023). ChatGPT [Large language model]. https://chat.openai.com

    Parameters:
    nested_dict (dict): The nested dictionary containing file paths.
    """
    for value in nested_dict.values():
        if isinstance(value, dict):
            # If the value is a dictionary, recurse into it
            create_directories_from_paths(value)
        elif isinstance(value, str):
            # If the value is a string, treat it as a file path and get the directory path
            directory_path = os.path.dirname(value)
            # If the directory path is not empty and the directory does not exist, create it
            if directory_path and not os.path.exists(directory_path):
                os.makedirs(directory_path)
                print(f"Directory created: {directory_path}")

# Use the function on your FPATHS dictionary
create_directories_from_paths(FPATHS)

Directory created: models/linear_regression
Directory created: models/random_forest


If you return to your Jupyter Home tab, you will see new folders have been created. For example, there is a "models" folder containing two folders: "linear_regression" and "random_forest".

We can look up any file path through our dictionary. The example below retrieves the path of a file that already exists.

In [4]:
# Look up the path of a file that already exists
FPATHS['data']['raw']['full']

'data/ames-housing-dojo-for-ml.csv'

The example below retrieves the path of a file we have not yet created.

In [5]:
# Look up the path of a file we have not created yet
FPATHS['models']['random_forest']

'models/random_forest/rf_reg.joblib'
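
Because the dictionary holds paths for files that exist alongside paths for files still to be saved, it can help to check which ones are missing. The helper below is a minimal sketch of that idea, not part of the original lesson: `find_missing_files` is a hypothetical name, and the example paths are created in a temporary folder purely for illustration.

```python
import os
import tempfile

def find_missing_files(nested_dict):
    """Return the file paths in a nested dictionary that are not on disk yet."""
    missing = []
    for value in nested_dict.values():
        if isinstance(value, dict):
            # Recurse into nested dictionaries and collect their missing paths
            missing.extend(find_missing_files(value))
        elif isinstance(value, str) and not os.path.isfile(value):
            missing.append(value)
    return missing

# Hypothetical example: one file that exists and one that has not been saved
with tempfile.TemporaryDirectory() as tmp_dir:
    existing_path = os.path.join(tmp_dir, "full-data.csv")
    open(existing_path, "w").close()  # create an empty placeholder file
    unsaved_path = os.path.join(tmp_dir, "rf_reg.joblib")

    example_fpaths = {
        "data": {"raw": {"full": existing_path}},
        "models": {"random_forest": unsaved_path},
    }
    missing_files = find_missing_files(example_fpaths)

# Only the path that was never saved is reported
print(missing_files)
```

Calling the same helper on FPATHS would list every file that still needs to be created before the app can use it.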

Now that we have created our file structure, we must create the rest of the files we will need for our app.

**Banner Image**

In [6]:
# Confirm the image is in the correct location
from IPython.display import display, Markdown
Markdown(f"<img src='{FPATHS['images']['banner']}'>")

<img src='images/app-banner.png'>

### Summary

In this lesson, we defined a dictionary to organize the files we will use in our deployment app. We plan to use data, model, and image files for our app, so we have used those as our top-level keys. We created all of the folders we will need from our file structure dictionary. Next, we will make the rest of the required files and use our convenient dictionary to save them according to our file structure.