<a href="https://colab.research.google.com/github/Ash100/DaS/blob/main/Plot_your_data_on_Plotnine.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##Plotting with Plotnine
This notebook is compiled by **Dr. Ashfaq Ahmad**. It is intended for creating biplot figures or simple plots (line plots and histograms). I have tested it mainly on RMSD (RMSD.dat) and radius of gyration (RoG) files obtained from molecular dynamics simulations. How do you use this notebook? Please watch the video tutorial "Plotline Plots via Jupyter Notebook" at https://www.youtube.com/@Bioinformaticsinsights

Best Regards,


In [None]:
#Install Plotnine
!pip install pandas plotnine

In [2]:
import warnings
warnings.filterwarnings('ignore')


**Important**. Run the cell below only if you want to load data from Google Drive. If you upload your files directly, you do not need to mount Google Drive, so skip this cell.

In [None]:
#Import google modules:
from google.colab import drive
drive.mount('/content/drive')

##RMSD / RMSF Data or Line Plot
You may add or remove `read_csv` lines as needed. The cell below loads three files: if you have more than three, add one line per extra file, and if you have only two, comment out one line. Adjust the labeling and concatenation cells to match.

In [10]:
import numpy as np
import pandas as pd
from plotnine import *

%matplotlib inline
rmsf1 = pd.read_csv('/content/sample_data/P-rmsd_ca.csv').dropna()
rmsf2 = pd.read_csv('/content/sample_data/d1-rmsd_ca.csv').dropna()
rmsf3 = pd.read_csv('/content/sample_data/d2-rmsd_ca.csv').dropna()
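
The cell above reads CSV files, while the introduction mentions RMSD.dat files. If your data are still in a whitespace-delimited `.dat` file, a minimal sketch for reading it with pandas could look like the following. The file layout (a `#`-prefixed header line followed by two numeric columns) and the column names are assumptions for illustration; check them against your own file.

```python
import io
import pandas as pd

# Hypothetical .dat content standing in for a real file; in the notebook you
# would pass the file path to pd.read_csv instead of io.StringIO(...).
dat_text = """#Frame   RMSD
1   0.00
2   1.23
3   1.31
"""

dat_df = pd.read_csv(
    io.StringIO(dat_text),
    sep=r'\s+',                   # columns are separated by runs of whitespace
    comment='#',                  # skip the '#'-prefixed header line
    names=['Frame', 'RMSD (A)'],  # assumed column names; rename to suit your data
).dropna()

print(dat_df)
```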

In [None]:
# Add a 'Receptor' column to each DataFrame to differentiate them
rmsf1['Receptor'] = 'NDM1-Apo'
rmsf2['Receptor'] = 'NDM1-ABTS'
rmsf3['Receptor'] = 'NDM1-CBM'

In [None]:
# Define custom colors for each receptor
custom_colors = {'NDM1-Apo': 'Green', 'NDM1-ABTS': 'Blue', 'NDM1-CBM': 'Red'}

In [None]:
# Concatenate the three DataFrames into one
combined_data = pd.concat([rmsf1, rmsf2, rmsf3])
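
The load, label, and concatenate steps above can also be written as a loop, which may be convenient when the number of files changes. This is only a sketch: `load_labeled` is a hypothetical helper, and the in-memory CSV text and its values are made up so that the example runs without any files.

```python
import io
import pandas as pd

def load_labeled(sources):
    """Read each CSV source, tag it with its receptor label, and combine them.

    `sources` maps a receptor label to a file path or file-like object.
    """
    frames = []
    for label, src in sources.items():
        df = pd.read_csv(src).dropna()
        df['Receptor'] = label  # same labeling column as in the cells above
        frames.append(df)
    return pd.concat(frames, ignore_index=True)

# Made-up data for illustration; replace the StringIO objects with your paths,
# e.g. '/content/sample_data/P-rmsd_ca.csv'.
sources = {
    'NDM1-Apo': io.StringIO("Time (ns),RMSD (A)\n0,0.0\n1,1.2\n"),
    'NDM1-ABTS': io.StringIO("Time (ns),RMSD (A)\n0,0.0\n1,1.5\n"),
}
combined = load_labeled(sources)
print(combined)
```

Adding or removing a file then only means adding or removing one entry in `sources`.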

In [None]:
print(combined_data.columns)

To plot your data, run the cell below. Remember to edit the x and y names in `aes(...)` so that they match the column names printed above; they are also used as the axis labels of your plot.

In [16]:
#Generate a line graph for the RMSD data
p = (
    ggplot(combined_data, aes(x='Time (ns)', y='RMSD (A)', color='Receptor'))
    + geom_line(size=1.5)
    + scale_color_manual(values=custom_colors)
    + theme_minimal()
    + theme(
        figure_size=(6, 3),
        panel_background=element_rect(fill='white'),
        panel_grid_major=element_blank(),
        panel_grid_minor=element_blank(),
        axis_line=element_line(color='black'),
        panel_border=element_blank(),
    )
    + xlim(0, combined_data['Time (ns)'].max() + 10)
)

In [None]:
display(p)

In [18]:
#Save your plot
p.save(filename='name.png', height=3, width=6, units='in', dpi=600)

Your plot has now been generated. The cells below are optional variants for RMSF data in which I have tweaked a few options; you can skip them.

In [None]:
#To generate a single RMSF plot, use this section
#Assumes combined_data holds RMSF data with 'Residues' and 'RMSF (A)' columns
p = (
    ggplot(combined_data, aes(x='Residues', y='RMSF (A)'))
    + geom_line(size=1.5, color='blue')  # fixed color name; 'Receptor' is a column, not a color
    + theme_minimal()
    + theme(
        figure_size=(6, 3),
        panel_background=element_rect(fill='white'),
        panel_grid_major=element_blank(),
        panel_grid_minor=element_blank(),
        axis_line=element_line(color='black'),
        panel_border=element_blank(),
    )
    + xlim(0, combined_data['Residues'].max() + 10)
)

In [None]:
#To generate a comparison plot (one line per receptor) from RMSF data
p = (
    ggplot(combined_data, aes(x='Residues', y='RMSF (A)', color='Receptor'))
    + geom_line(size=1.5)
    + theme_minimal()
    + theme(
        figure_size=(6, 3),
        panel_background=element_rect(fill='white'),
        panel_grid_major=element_blank(),
        panel_grid_minor=element_blank(),
        axis_line=element_line(color='black'),
        panel_border=element_blank(),
    )
    + xlim(0, combined_data['Residues'].max() + 10)
)

In [None]:
display(p)


In [None]:
p.save(filename='RMSF.png', height=3, width=6, units='in', dpi=600)

##**2. Now we will plot Histogram Data**
In this case I will use the radius of gyration (RoG) file from the simulations.

If you only want to use this section, remember to run the installation cell (the first one) and load your data first.

In [31]:
import numpy as np
import pandas as pd
from plotnine import *

%matplotlib inline
hist1 = pd.read_csv('/content/sample_data/P-radius_gyration.csv').dropna()
hist2 = pd.read_csv('/content/sample_data/d1-radius_gyration.csv').dropna()
hist3 = pd.read_csv('/content/sample_data/d2-radius_gyration_ca.csv').dropna()

In [32]:
# Add a 'Receptor' column to each DataFrame to differentiate them
hist1['Receptor'] = 'NDM1-Apo'
hist2['Receptor'] = 'NDM1-ABTS'
hist3['Receptor'] = 'NDM1-CBM'

In [33]:
# Define custom colors for each receptor
custom_colors = {'NDM1-Apo': 'Green', 'NDM1-ABTS': 'Blue', 'NDM1-CBM': 'Red'}

In [34]:
# Concatenate the three DataFrames into one
combined_data = pd.concat([hist1, hist2, hist3])

In [35]:
#To generate density plots from the RoG or any other data.
p = (
    ggplot(combined_data, aes(x='RoG', fill='Receptor'))
    + geom_density(alpha=0.7, bw=0.7)
    + scale_fill_manual(values=custom_colors)
    + theme_minimal()
    + theme(
        figure_size=(6, 3),
        panel_background=element_rect(fill='white'),
        panel_grid_major=element_blank(),
        panel_grid_minor=element_blank(),
        axis_line=element_line(color='black'),
        panel_border=element_blank(),
        axis_text=element_text(color='black'),
        axis_title=element_text(color='black'),
    )
    + xlim(18, combined_data['RoG'].max() + 2)
)

In [None]:
display(p)

In [37]:
p.save(filename = 'RoG.png', height=3, width=8, units = 'in', dpi=600)

**Congratulations!**

You have plotted your data and saved the figures at high resolution (600 dpi). You can download the saved files from the file menu located on the left side.