# Sunburst Plots for Metabolites:

**Updated on:** 2023-03-21 10:40 CET

In this Jupyter Notebook, we use the "CCEMetabolites_SunburstInfo" files generated by the "StackedBarPlot_Metabolites.ipynb" notebook to create sunburst charts.

**Authors**: Abzer Kelminal (abzer.shah@uni-tuebingen.de) <br>
**Input file format**: .csv files <br>
**Outputs**: .svg images  <br>
**Dependencies**: numpy, pandas, plotly, kaleido (used to save the plotly images as .svg)

---
**Necessary input files**:
 The "CCEMetabolites_SunburstInfo" csv files for all cycles, from Cycle 1 to Cycle 4
 
---

# 1. Loading packages and setting working directory

In [2]:
# installing necessary packages (skip this cell if they are already installed)
! pip install plotly

Collecting plotly
  Downloading plotly-5.13.1-py2.py3-none-any.whl (15.2 MB)

In [None]:
! pip install -U kaleido #to save the plotly images as svg

In [37]:
# importing necessary modules
import pandas as pd
import numpy as np
import os
import plotly.express as px
import plotly.graph_objects as go
import kaleido
import datetime

In [None]:
# pip show kaleido             #to check if a particular dependency is already installed, here, kaleido

In [58]:
#Setting working directory
directory = input("Enter the path of the folder in the output cell:\n")
os.chdir(directory)

Enter the path of the folder in the output cell:
 G:\My Drive\CCE DATA\Sunburst Plots\Sunburst_Plots_Metabolites


In [5]:
#get the current working directory (to check)
path= os.getcwd()
path

'G:\\My Drive\\CCE DATA\\Sunburst Plots\\Sunburst_Plots_Metabolites'

# 2. Load the input files

In [6]:
# List all the csv files in the working directory

# to store the file names in a list
names = []

for x in sorted(os.listdir()):  # sorted, so the file order matches Cycle_names used later
    if x.endswith(".csv"):
        print(x)  # prints only the csv files present in the folder
        names.append(x)  # adding the file name to the list 'names'

230309_CCEMetabolites_SunburstInfo_Cycle_1_day1.csv
230309_CCEMetabolites_SunburstInfo_Cycle_1_day2.csv
230309_CCEMetabolites_SunburstInfo_Cycle_1_day3.csv
230309_CCEMetabolites_SunburstInfo_Cycle_2_day1.csv
230309_CCEMetabolites_SunburstInfo_Cycle_2_day2.csv
230309_CCEMetabolites_SunburstInfo_Cycle_2_day3.csv
230309_CCEMetabolites_SunburstInfo_Cycle_2_day4.csv
230309_CCEMetabolites_SunburstInfo_Cycle_3_day1.csv
230309_CCEMetabolites_SunburstInfo_Cycle_3_day2.csv
230309_CCEMetabolites_SunburstInfo_Cycle_3_day3.csv
230309_CCEMetabolites_SunburstInfo_Cycle_4_day1.csv
230309_CCEMetabolites_SunburstInfo_Cycle_4_day2.csv


In [7]:
names[0]

'230309_CCEMetabolites_SunburstInfo_Cycle_1_day1.csv'
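The folder and listing steps above can also be written with `pathlib`. The following is only a sketch: the helper names `resolve_folder` and `list_csv_files` are hypothetical and not part of the notebook, and the temporary folder stands in for the real data folder.

```python
import tempfile
from pathlib import Path


def resolve_folder(raw_path):
    """Turn a pasted path into a Path and fail early if it is not a folder."""
    folder = Path(raw_path.strip().strip('"')).expanduser()
    if not folder.is_dir():
        raise NotADirectoryError(f"Not a folder: {folder}")
    return folder


def list_csv_files(folder):
    """Return the csv file names in the folder, in sorted order."""
    return sorted(p.name for p in Path(folder).glob("*.csv"))


# Demo on a throwaway folder (stand-in for the real working directory).
with tempfile.TemporaryDirectory() as tmp:
    for fname in ("b.csv", "a.csv", "notes.txt"):
        (Path(tmp) / fname).write_text("")
    found = list_csv_files(resolve_folder(tmp))

print(found)
```

Sorting the names explicitly matters here because later cells pair each file with a label by position.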

# 3. Read the input files

Here, we read selected columns of the csv file and replace NaN values with a single space. Finally, we add a column 'middle' holding the cycle name of the file. This 'middle' column is used as the root (the centre) of the sunburst chart.

In [8]:
columns = ["ClassyFire.superclass", "ClassyFire.class", "ClassyFire.subclass", "ClassyFire.level.5", "Avg"]
df = pd.read_csv(names[0])[columns].replace(np.nan, ' ')  # NaN -> single space
df['middle'] = 'Cycle 1 Day 1'  # same label on every row; becomes the centre of the chart
df.head()

Unnamed: 0,ClassyFire.superclass,ClassyFire.class,ClassyFire.subclass,ClassyFire.level.5,Avg,middle
0,Lipids and lipid-like molecules,Prenol lipids,Terpene lactones,,177304.370333,Cycle 1 Day 1
1,Organic acids and derivatives,Carboxylic acids and derivatives,"Amino acids, peptides, and analogues",Amino acids and derivatives,26484.540275,Cycle 1 Day 1
2,Organic acids and derivatives,Carboxylic acids and derivatives,"Amino acids, peptides, and analogues",Peptides,0.0,Cycle 1 Day 1
3,Lipids and lipid-like molecules,Prenol lipids,Terpene lactones,Diterpene lactones,137584.26625,Cycle 1 Day 1
4,Organic acids and derivatives,Carboxylic acids and derivatives,Tricarboxylic acids and derivatives,,93198.66375,Cycle 1 Day 1


In [9]:
df.shape # gives the number of rows and columns

(8389, 6)
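The read step above can be tried on a small in-memory table. This is a sketch under assumptions: the two rows and the `Extra` column are made up, `read_sunburst_table` is a hypothetical helper, and unlike the cell above it fills only the four classification columns and leaves `Avg` untouched.

```python
import io

import pandas as pd

# Made-up two-row stand-in for one "CCEMetabolites_SunburstInfo" file.
csv_text = """ClassyFire.superclass,ClassyFire.class,ClassyFire.subclass,ClassyFire.level.5,Avg,Extra
Lipids and lipid-like molecules,Prenol lipids,Terpene lactones,Diterpene lactones,10.0,x
Organic acids and derivatives,Carboxylic acids and derivatives,,,5.0,y
"""

PATH_COLUMNS = [
    "ClassyFire.superclass",
    "ClassyFire.class",
    "ClassyFire.subclass",
    "ClassyFire.level.5",
]


def read_sunburst_table(source, label):
    """Read only the needed columns, blank out missing levels, add the centre label."""
    table = pd.read_csv(source, usecols=PATH_COLUMNS + ["Avg"])
    table[PATH_COLUMNS] = table[PATH_COLUMNS].fillna(" ")
    table["middle"] = label  # a scalar is broadcast to every row
    return table


demo = read_sunburst_table(io.StringIO(csv_text), "Cycle 1 Day 1")
print(demo)
```

Passing `usecols` lets pandas skip the unused columns while parsing instead of dropping them afterwards.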

# 4. Sunburst Chart visualization:

## i. Visualize the 1st sunburst chart:

In [10]:
fig= px.sunburst(df,
                 path=["middle","ClassyFire.superclass","ClassyFire.class","ClassyFire.subclass","ClassyFire.level.5"],
                 values="Avg", 
                 color='ClassyFire.superclass',
                 width=1000,height=1000,
                )
#fig.update_traces(labels=['',] * len(fig.data[0]['labels'])) #to turn off the labels
fig.show(renderer='browser') #result shows in a browser window
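To make sense of the sector sizes: with `path` and `values` given, the chart sizes each sector by the sum of `Avg` over all rows that share that path prefix. The sketch below reproduces those sums with `groupby` on a made-up three-row table; `sector_totals` is a hypothetical helper, not a plotly function.

```python
import pandas as pd

# Made-up rows; only the column names come from the notebook.
toy = pd.DataFrame(
    {
        "middle": ["Cycle 1 Day 1"] * 3,
        "ClassyFire.superclass": ["Lipids", "Lipids", "Organic acids"],
        "ClassyFire.class": ["Prenol lipids", "Fatty acyls", "Carboxylic acids"],
        "Avg": [10.0, 3.0, 5.0],
    }
)

PATH = ["middle", "ClassyFire.superclass", "ClassyFire.class"]


def sector_totals(table, path, value_column):
    """Return {depth: Series of summed values}, one entry per ring of the chart."""
    return {
        depth: table.groupby(path[:depth])[value_column].sum()
        for depth in range(1, len(path) + 1)
    }


totals = sector_totals(toy, PATH, "Avg")
for depth, series in totals.items():
    print(f"ring {depth}:")
    print(series)
```

Comparing these totals with the hover values in the browser is a quick way to check that a chart was built from the intended rows.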

## ii. Changing the colors of Superclass levels in sunburst charts:

In [11]:
#get the SuperClass (SPC) names
SPC_names =pd.DataFrame(df["ClassyFire.superclass"].value_counts(dropna=False)) #counting the dataframe based on superclass names
class_list= np.array(SPC_names.index.values) #getting the index (rownames) into an array
class_list.sort() #sort the array in Alphabetical order
print(class_list)

['Alkaloids and derivatives' 'Benzenoids' 'Hydrocarbons'
 'Lignans, neolignans and related compounds'
 'Lipids and lipid-like molecules'
 'Nucleosides, nucleotides, and analogues' 'Organic 1,3-dipolar compounds'
 'Organic acids and derivatives' 'Organic nitrogen compounds'
 'Organic oxygen compounds' 'Organohalogen compounds'
 'Organoheterocyclic compounds' 'Organophosphorus compounds'
 'Organosulfur compounds' 'Phenylpropanoids and polyketides']


---
Create a color_list and assign those colors to the superclass level:

In [12]:
color_list = ['#ff925f','#0152d5','#99d849','#a84ed5','#f09400',
              '#5d3da9','#01673c','#e17fff','#625108','#0071c3',
              '#993300','#01cc1d1a','#ff7987','#e7be8f','#902f52']

#create a new dictionary 'color_dict' with the 'color_list' colors assigned to 'class_list'
color_dict =  dict(zip(class_list, color_list))
color_dict

{'Alkaloids and derivatives': '#ff925f',
 'Benzenoids': '#0152d5',
 'Hydrocarbons': '#99d849',
 'Lignans, neolignans and related compounds': '#a84ed5',
 'Lipids and lipid-like molecules': '#f09400',
 'Nucleosides, nucleotides, and analogues': '#5d3da9',
 'Organic 1,3-dipolar compounds': '#01673c',
 'Organic acids and derivatives': '#e17fff',
 'Organic nitrogen compounds': '#625108',
 'Organic oxygen compounds': '#0071c3',
 'Organohalogen compounds': '#993300',
 'Organoheterocyclic compounds': '#01cc1d1a',
 'Organophosphorus compounds': '#ff7987',
 'Organosulfur compounds': '#e7be8f',
 'Phenylpropanoids and polyketides': '#902f52'}
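Because `zip` stops silently at the shorter input, a length check before building the dictionary can catch a missing color. The sketch below also reports entries that are not in the six-digit `#rrggbb` form; one value in the list above has eight hex digits, and the notebook does not say whether that is intended, so the sketch only reports it. Both helper names are hypothetical.

```python
import re

SIX_DIGIT_HEX = re.compile(r"#[0-9a-fA-F]{6}")


def build_color_dict(class_names, colors):
    """Pair names with colors, refusing lists of different length."""
    if len(class_names) != len(colors):
        raise ValueError(
            f"{len(class_names)} class names but {len(colors)} colors"
        )
    return dict(zip(class_names, colors))


def non_six_digit_colors(colors):
    """Return the entries that are not of the form '#rrggbb'."""
    return [c for c in colors if not SIX_DIGIT_HEX.fullmatch(c)]


# The color list exactly as written in the cell above.
color_list = ['#ff925f', '#0152d5', '#99d849', '#a84ed5', '#f09400',
              '#5d3da9', '#01673c', '#e17fff', '#625108', '#0071c3',
              '#993300', '#01cc1d1a', '#ff7987', '#e7be8f', '#902f52']

flagged = non_six_digit_colors(color_list)
print(flagged)
```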

---
Add the corresponding new colors into a 'hex' column of the 'df' dataframe and visualize the sunburst chart with the new colors:

In [14]:
# Add a column 'hex' with the color assigned to each row's superclass
df['hex'] = np.array([color_dict[superclass] for superclass in df['ClassyFire.superclass']]) 

In [15]:
colorMapSubset = dict(zip(df.hex, df.hex)) #maps each hex code to itself, so the values in 'hex' are used directly as colors

In [16]:
df.head()

Unnamed: 0,ClassyFire.superclass,ClassyFire.class,ClassyFire.subclass,ClassyFire.level.5,Avg,middle,hex
0,Lipids and lipid-like molecules,Prenol lipids,Terpene lactones,,177304.370333,Cycle 1 Day 1,#f09400
1,Organic acids and derivatives,Carboxylic acids and derivatives,"Amino acids, peptides, and analogues",Amino acids and derivatives,26484.540275,Cycle 1 Day 1,#e17fff
2,Organic acids and derivatives,Carboxylic acids and derivatives,"Amino acids, peptides, and analogues",Peptides,0.0,Cycle 1 Day 1,#e17fff
3,Lipids and lipid-like molecules,Prenol lipids,Terpene lactones,Diterpene lactones,137584.26625,Cycle 1 Day 1,#f09400
4,Organic acids and derivatives,Carboxylic acids and derivatives,Tricarboxylic acids and derivatives,,93198.66375,Cycle 1 Day 1,#e17fff


In [26]:
fig = px.sunburst(df,
                  path=['middle','ClassyFire.superclass', 'ClassyFire.class', 'ClassyFire.subclass', 'ClassyFire.level.5'],
                  values='Avg',
                  color='hex', #specifying the column for coloring
                  color_discrete_map=colorMapSubset,
                  width=1000,
                  height=1000)
fig.show(renderer='browser')

In [18]:
# creating a folder 'images'
if not os.path.exists("images"):
    os.mkdir("images")

In [20]:
fig.write_image("images/c1d1.svg") #writing the figure as SVG

## iii. FOR loop to get all Sunburst plots

Let's create a list with the names of the output svg files:

In [38]:
Date = datetime.date.today() # Get the current date
date_string = Date.strftime("%Y-%m-%d") # Format the date as YYYY-MM-DD

In [46]:
Figure_names = ['Cycle1_Day1', 'Cycle1_Day2', 'Cycle1_Day3',
                'Cycle2_Day1', 'Cycle2_Day2', 'Cycle2_Day3', 'Cycle2_Day4',
                'Cycle3_Day1', 'Cycle3_Day2', 'Cycle3_Day3',
                'Cycle4_Day1', 'Cycle4_Day2']

In [52]:
# one output file name per figure, prefixed with today's date
file_name = [f"{date_string}_{name}_Sunburst_Plot.svg" for name in Figure_names]

In [53]:
file_name

['2023-03-21_Cycle1_Day1_Sunburst_Plot.svg',
 '2023-03-21_Cycle1_Day2_Sunburst_Plot.svg',
 '2023-03-21_Cycle1_Day3_Sunburst_Plot.svg',
 '2023-03-21_Cycle2_Day1_Sunburst_Plot.svg',
 '2023-03-21_Cycle2_Day2_Sunburst_Plot.svg',
 '2023-03-21_Cycle2_Day3_Sunburst_Plot.svg',
 '2023-03-21_Cycle2_Day4_Sunburst_Plot.svg',
 '2023-03-21_Cycle3_Day1_Sunburst_Plot.svg',
 '2023-03-21_Cycle3_Day2_Sunburst_Plot.svg',
 '2023-03-21_Cycle3_Day3_Sunburst_Plot.svg',
 '2023-03-21_Cycle4_Day1_Sunburst_Plot.svg',
 '2023-03-21_Cycle4_Day2_Sunburst_Plot.svg']

We also define the cycle names that appear in the middle of each sunburst chart:

In [21]:
Cycle_names = ['Cycle 1 Day 1', 'Cycle 1 Day 2', 'Cycle 1 Day 3',
               'Cycle 2 Day 1', 'Cycle 2 Day 2', 'Cycle 2 Day 3', 'Cycle 2 Day 4',
               'Cycle 3 Day 1', 'Cycle 3 Day 2', 'Cycle 3 Day 3',
               'Cycle 4 Day 1', 'Cycle 4 Day 2']
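The hand-written label lists have to stay in the same order as the input files. As an alternative, both labels could be derived from each file name. The sketch below assumes every file name ends in `Cycle_<n>_day<m>.csv`, as in the listing in section 2; `labels_from_filename` is a hypothetical helper.

```python
import re

# Assumed naming scheme: "..._Cycle_<n>_day<m>.csv"
CYCLE_DAY = re.compile(r"Cycle_(\d+)_day(\d+)\.csv$")


def labels_from_filename(filename, date_string):
    """Return (centre label, output svg name) for one input file name."""
    match = CYCLE_DAY.search(filename)
    if match is None:
        raise ValueError(f"Unexpected file name: {filename}")
    cycle, day = match.groups()
    centre_label = f"Cycle {cycle} Day {day}"
    svg_name = f"{date_string}_Cycle{cycle}_Day{day}_Sunburst_Plot.svg"
    return centre_label, svg_name


example = labels_from_filename(
    "230309_CCEMetabolites_SunburstInfo_Cycle_2_day4.csv", "2023-03-21"
)
print(example)
```

With this approach a file that is added or removed changes the labels automatically, and a file that does not follow the scheme raises an error instead of being mislabeled.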

Finally, we create all the sunburst charts in a for loop and automatically save them in the images folder:

In [57]:
for i, name in enumerate(names):
    df = pd.read_csv(name)[["ClassyFire.superclass","ClassyFire.class","ClassyFire.subclass","ClassyFire.level.5","Avg"]].replace(np.nan,' ')
    df['middle'] = Cycle_names[i]  # centre label of this chart
    
    # Add a column 'hex' with the color assigned to each row's superclass
    df['hex'] = np.array([color_dict[superclass] for superclass in df['ClassyFire.superclass']]) 
    colorMapSubset = dict(zip(df.hex, df.hex))  # each hex code maps to itself
    
    fig = px.sunburst(df,
                      path=['middle','ClassyFire.superclass', 'ClassyFire.class', 'ClassyFire.subclass', 'ClassyFire.level.5'],
                      values='Avg',
                      color='hex', 
                      color_discrete_map=colorMapSubset,
                      width=1000, height=1000)
    
    fig.update_traces(labels=['',] * len(fig.data[0]['labels'])) #to turn off the labels
    
    fig.show(renderer='browser')
    fig.write_image(f"images/{file_name[i]}")  # saved in the 'images' folder created earlier
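The loop looks up every superclass in `color_dict`, which was built from the first file only. If another file contains a superclass that the first one lacks (or a missing superclass, which becomes `' '` after the NaN replacement), the lookup raises a `KeyError`. A possible pre-check is sketched below; the helper names and the grey fallback color are assumptions, not part of the notebook.

```python
def missing_superclasses(superclasses, color_dict):
    """Return the superclass names that have no color assigned, sorted."""
    return sorted(set(superclasses) - set(color_dict))


def colors_with_fallback(superclasses, color_dict, fallback="#808080"):
    """Look up a color per row, using a fallback for unknown names."""
    return [color_dict.get(name, fallback) for name in superclasses]


# Made-up example: one known superclass, one unknown, one blank.
demo_colors = {"Benzenoids": "#0152d5"}
demo_rows = ["Benzenoids", "Hydrocarbons", " ", "Benzenoids"]

unassigned = missing_superclasses(demo_rows, demo_colors)
row_colors = colors_with_fallback(demo_rows, demo_colors)
print(unassigned)
print(row_colors)
```

In the notebook this check could be run on `df['ClassyFire.superclass']` for each file before plotting, so that an unexpected name is reported once rather than stopping the loop halfway through.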