Step 3 - Plot TFs

This script lets you compare the effect of modifying a given TF across different conditions. For each TF, it plots the data for the overexpression, overexpression with FL protein, knock-out and control strains for one condition variant in one subplot. Each strain is drawn as the mean of its data, with the standard deviation shown as a semi-transparent band.

This visualisation should make it easier to draw conclusions about the impact of each TF on the Yarrowia phenotype.

You need to run the following cell only once (if you haven't already done so in the Step 2 script). Then you can comment it out (#), because reinstalling the package takes a long time and is not needed.

In [2]:
import altair as alt
from datetime import datetime
import vl_convert as vlc
import glob
import pandas as pd
import os

If you want to use the latest Excel file generated by the 'Step 1' script, you don't need to change anything in the following cell.

However, you might want to go back to a file you generated earlier. In that case, wrap the first part of the script in triple quotes, remove the triple quotes from the second part, and enter the name of the file you are interested in.

In the output of the following cell you can see which file is used for the analysis.

In [3]:
# Import the data

IMPORT_PATH = os.path.join(os.getcwd(), "output_data")

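# Use the latest Excel file as an input (default option).
# Filenames start with the date (YYYY-MM-DD), so max() returns the newest one.
available_results = glob.glob(os.path.join(IMPORT_PATH, '*growth_data.xlsx'))
if not available_results:
    raise FileNotFoundError(f"No '*growth_data.xlsx' file found in {IMPORT_PATH}")
latest_file = max(available_results)
data = pd.read_excel(latest_file)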

'''
Or enter the exact filename generated in the 01_data_import.ipynb script as an input

data = pd.read_excel('XXX_growth_data.xlsx')
'''

print(latest_file)

\\wsl.localhost\Ubuntu\home\marysia\stress_resistance_msc\output_data\2025-05-13_growth_data.xlsx
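
Picking the latest file with `max` works because the filenames start with an ISO date (YYYY-MM-DD), so alphabetical order matches chronological order. A minimal sketch with made-up filenames (the names below are illustrative, not real files):

```python
# Hypothetical filenames, for illustration only
example_files = [
    "2025-04-30_growth_data.xlsx",
    "2025-05-13_growth_data.xlsx",
    "2025-05-02_growth_data.xlsx",
]

# ISO dates sort alphabetically in chronological order,
# so the "largest" string is the most recent file
latest_example = max(example_files)
print(latest_example)
```

If the filenames did not start with a date in this format, sorting by modification time (`max(files, key=os.path.getmtime)`) would probably be a safer choice.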


In [4]:
data_copy = data.copy()

In [5]:
# Calculate the mean and standard deviation of growth
# for each strain, condition, modification and time point

data_copy['mean_growth'] = data_copy.groupby(
    ['strain_name', 'condition', 'modification', 'time']
)['growth'].transform('mean')

data_copy['std_growth'] = data_copy.groupby(
    ['strain_name', 'condition', 'modification', 'time']
)['growth'].transform('std')

# Then compute lower and upper bounds
data_copy['lower'] = data_copy['mean_growth'] - data_copy['std_growth']
data_copy['upper'] = data_copy['mean_growth'] + data_copy['std_growth']
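
As a sanity check on what the band shows: pandas' `std` computes the sample standard deviation (`ddof=1`) by default, which is the same quantity as `statistics.stdev`. A minimal sketch with three made-up replicate readings at one time point (the numbers are not taken from the dataset):

```python
import statistics

# Hypothetical replicate OD600 readings for one strain at one time point
replicates = [0.8, 1.0, 1.2]

mean_growth = statistics.mean(replicates)
std_growth = statistics.stdev(replicates)  # sample standard deviation (divides by n - 1)

# Bounds of the band, computed the same way as 'lower' and 'upper' above
lower = mean_growth - std_growth
upper = mean_growth + std_growth
print(round(mean_growth, 3), round(std_growth, 3), round(lower, 3), round(upper, 3))
```

Note that in pandas a group with only one replicate gives `NaN` for `std`, so the band bounds are undefined for such a group.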

In [6]:

control_data = data_copy[data_copy['modification'] == 'control']

list_TFs = data['TF'].unique()


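# For each TF, combine its data with a copy of the control data,
# so that the control curve appears in every row of the facet grid
loop_data_for_plotting = []

for tf in list_TFs:
    if tf == 'control':
        continue  # the control is added to every TF's subplots instead

    control_duplicate = control_data.copy()
    control_duplicate['plot_config'] = tf

    tf_data = data_copy[data_copy['TF'] == tf].copy()
    tf_data['plot_config'] = tf

    tf_combined = pd.concat([control_duplicate, tf_data], ignore_index=True)
    loop_data_for_plotting.append(tf_combined)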

data_for_plotting = pd.concat(loop_data_for_plotting, ignore_index=True)

In [7]:
width = 300
height = 300

In the following cell you can set the order in which the rows (transcription factors) and columns (condition variants) appear.
By default they will be displayed in the order given by the 'list_TFs' variable. To keep the default, just pass empty brackets; otherwise specify the order.
To do so, enter the exact names of the TFs/conditions in quotes, separated by commas, e.g. ['test3', 'test', 'test2'].

To help you choose, the following code also prints all the available TFs and conditions. Do not include 'control' when specifying the rows, as it is plotted in every row regardless; you can, however, specify the position of 'control_prot'.

In [8]:
print(list_TFs)
print(data['condition'].unique())

row_order = ['Dal81', 'Hap1', 'Mhy1', 'Msn4', 'Msn4w', 'Msn4m', 'Msn4b', 'TF009', 'TF011', 'TF036', 'Yas1', 'control_prot']
column_order = []

['control' 'TF011' 'Mhy1' 'Hap1' 'TF036' 'control_prot' 'TF009' 'Yas1'
 'Dal81' 'Msn4' 'Msn4w' 'Msn4m' 'Msn4b']
['test' 'test2' 'test3']
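
A misspelled name in `row_order` can easily go unnoticed, so it may help to check the list before plotting. A minimal sketch, where `check_order` is a hypothetical helper (not part of the script) and the values are copied from the printed output above:

```python
def check_order(order, available, ignore=("control",)):
    """Return (unknown, missing): names in `order` that are not available,
    and available names that `order` leaves out."""
    allowed = [name for name in available if name not in ignore]
    unknown = [name for name in order if name not in allowed]
    missing = [name for name in allowed if name not in order]
    return unknown, missing

# Values copied from the output of the cell above
available_tfs = ['control', 'TF011', 'Mhy1', 'Hap1', 'TF036', 'control_prot',
                 'TF009', 'Yas1', 'Dal81', 'Msn4', 'Msn4w', 'Msn4m', 'Msn4b']
example_order = ['Dal81', 'Hap1', 'Mhy1', 'Msn4', 'Msn4w', 'Msn4m', 'Msn4b',
                 'TF009', 'TF011', 'TF036', 'Yas1', 'control_prot']

unknown, missing = check_order(example_order, available_tfs)
print("Unknown names:", unknown)
print("Names left out:", missing)
```

In the notebook you could call it as `check_order(row_order, list_TFs)`; two empty lists mean every name is spelled correctly and nothing was left out.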


In [9]:
# Use the width and height set in the cell above as the size of each subplot
base = alt.Chart(data_for_plotting).properties(
    width=width,
    height=height
)

control_layer = alt.Chart(data_for_plotting).transform_filter(
    alt.datum.modification == 'control'
).mark_line(strokeDash=[4, 4], color='black').encode(
    x=alt.X('time:Q', title='Time [h]'),
    y=alt.Y('average(growth):Q', title='Average Growth [OD\u2086\u2080\u2080]'),
    detail='modification:N'
)

band_layer_control = alt.Chart(data_for_plotting).transform_filter(
    alt.datum.modification == 'control'
).mark_area(opacity=0.2).encode(
    x=alt.X('time:Q'),
    y=alt.Y('lower:Q'),
    y2='upper:Q'
)


mod_layer = base.transform_filter(
    alt.datum.modification != 'control'
).mark_line(point=True).encode(
    x=alt.X('time:Q', title='Time [h]'),
    y=alt.Y('average(growth):Q', title='Average Growth [OD\u2086\u2080\u2080]'),
    color=alt.Color('modification:N', title='Modification',
                    scale=alt.Scale(
                        domain=['KO', 'OE', 'OE_prot'],
                        range=['#3295a8', '#b84839', '#edbc1c']),
    )
)

band_layer = alt.Chart(data_for_plotting).transform_filter(
    # The control band is drawn separately by band_layer_control
    alt.datum.modification != 'control'
).mark_area(opacity=0.2).encode(
    x=alt.X('time:Q'),
    y=alt.Y('lower:Q'),
    y2='upper:Q',
    color=alt.Color('modification:N',
        scale=alt.Scale(
            domain=['KO', 'OE', 'OE_prot'],
            range=['#3295a8', '#b84839', '#edbc1c']
        ),
        #legend=None  # hide duplicate legend
    )
)


layered = alt.layer(mod_layer, band_layer, band_layer_control, control_layer)

chart = layered.facet(
    row=alt.Row('plot_config:N', sort=row_order, title='TF'),
    column=alt.Column('condition:N', sort=column_order, title='Condition'),
    spacing=10
).configure_view(
    continuousWidth=width,
    continuousHeight=height
).resolve_scale(
    y='independent'
)

chart

You can change the name of the figure in the following cell, as well as its resolution in pixels per inch (ppi).
Note that the filename contains only the date, so running the code again on the same day overwrites the previously saved figure. If you want to keep earlier versions of your figures, change the name each time.

In [10]:
VISUALISATION_PATH = os.path.join(os.getcwd(), "visualisations")
os.makedirs(VISUALISATION_PATH, exist_ok=True)  # create the folder if it does not exist yet

current_date = datetime.now().strftime("%Y-%m-%d")
output_filename = f"{current_date}_figure_2.png"
output_path = os.path.join(VISUALISATION_PATH, output_filename)

chart.save(output_path, ppi=600)
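
If you save several versions on the same day, one option is to add the time to the filename so that each run gets its own file. A minimal sketch: `timestamped_name` is a hypothetical helper (not part of the script), the `figure_2` suffix follows the cell above, and the time format is an assumption you can adjust:

```python
from datetime import datetime

def timestamped_name(now, suffix="figure_2", ext="png"):
    """Build a filename with date and time, e.g. 2025-05-13_14-30-05_figure_2.png."""
    return f"{now.strftime('%Y-%m-%d_%H-%M-%S')}_{suffix}.{ext}"

print(timestamped_name(datetime.now()))
```

The result can replace `output_filename` in the cell above; the rest of the saving code stays the same.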