<a href="https://colab.research.google.com/github/MWPlabUTSW/Chi-Score-Analysis/blob/main/ChiScore_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Preliminary Information:**
This Colab notebook enables easy use of the Chi-Score Analysis to parse a protein sequence into regions of distinct amino acid composition.
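
As a rough illustration of what "distinct amino acid composition" can mean quantitatively, the sketch below computes a Pearson chi-square statistic on the residue counts of two subsequences. This is not the implementation in `chi_score_analysis.py`, whose internals are not shown in this notebook; the function name `chi_square_composition` and the example fragments are hypothetical and chosen only for illustration.

```python
from collections import Counter


def chi_square_composition(seq_a: str, seq_b: str) -> float:
    """Pearson chi-square statistic comparing residue counts of two sequences.

    Hypothetical helper for illustration; not part of chi_score_analysis.
    """
    counts_a, counts_b = Counter(seq_a), Counter(seq_b)
    total_a, total_b = len(seq_a), len(seq_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("both sequences must be non-empty")
    total = total_a + total_b

    chi2 = 0.0
    for residue in set(counts_a) | set(counts_b):
        # Expected counts if both sequences shared one pooled composition.
        pooled = counts_a[residue] + counts_b[residue]
        expected_a = total_a * pooled / total
        expected_b = total_b * pooled / total
        chi2 += (counts_a[residue] - expected_a) ** 2 / expected_a
        chi2 += (counts_b[residue] - expected_b) ** 2 / expected_b
    return chi2


# Identical fragments score zero; compositionally different ones score higher.
same = chi_square_composition("SSSSDSDGED", "SSSSDSDGED")
different = chi_square_composition("SSSSSSSSSS", "KRKRKRKRKR")
print(same, different)
```

A larger statistic indicates that the two fragments are less likely to share a common composition, which is the general idea behind placing a boundary between them.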

Running the analysis requires only the protein sequence as input. A few parameters can be adjusted if desired, such as the window sizes used to generate the matrices and the confidence level; the recommended values are loaded by default.
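
To relate the confidence level to the z-score cutoff used later (`Min_ZScore = 1.96`), the following sketch converts between the two. It assumes the cutoff is read as a two-sided threshold on a standard normal distribution, which the notebook does not state explicitly; the helper names are hypothetical.

```python
from statistics import NormalDist


def two_sided_confidence(z: float) -> float:
    """Confidence level for a two-sided standard normal cutoff z."""
    return 2 * NormalDist().cdf(z) - 1


def z_for_confidence(confidence: float) -> float:
    """Two-sided standard normal cutoff for a given confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


# Under this assumption, 1.96 corresponds to roughly 95% confidence.
print(round(two_sided_confidence(1.96), 4))
# A stricter cutoff, e.g. for 99% confidence, could be computed the same way.
print(round(z_for_confidence(0.99), 3))
```

Under that assumption, a stricter or looser analysis would set `Min_ZScore` to the value returned by `z_for_confidence` for the desired level.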

Please note that the cells below can be executed one at a time, but this notebook was written with the intent that all cells be run together. For a step-by-step breakdown of the analysis, please see our GitHub repository at https://github.com/MWPlabUTSW/Chi-Score-Analysis.git

In [None]:
from ipywidgets import widgets

#@title <b><font color='#00AAFF'>1 - Input Protein Sequence</font></b>

#@markdown Name of the sequence to be analyzed:
NAME = "CeOrc1-IDR" #@param {type:"string"}

#@markdown Input sequence as string of capitalized amino acid codes:
SEQUENCE = "MNTRKSETSKTVSATPVKRRSTRITNLPKSAPKIVKRSSVRLRGAPQCTYKSDSSSSSSSSDSDGEDEYAATKDELKAVDHDNQMEIDFSDEIGENFSEEDSCSDKENRRVTRSRTPTRLEETPSKRLARELSKASVSKVSTSKTLFKESKSPRKVEISRKTNKARVFQEEDDDDEDDFSDEIDEKFYSKTNKRTPITIKIPSKMITQKVTPLVISKTPGGTLRTRRRARQNSEELEDLVDPLDS" #@param {type:"string"}

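# Analysis parameters (recommended defaults):
# Window sizes used to generate the matrices
Window_Sizes = [6, 8, 10, 12, 14, 16, 18, 20, 22]
# Minimum boundary z-score, which sets the confidence level
Min_ZScore = 1.96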

#@markdown Select "YES" to download output matrix upon completion:
DOWNLOAD_MATRIX   = " NO"       #@param [" NO", " YES"]


In [None]:
#@title <b><font color='#00AAFF'>2 - Setting Environment</font></b>
!pip install wget
import wget
wget.download('https://raw.githubusercontent.com/MWPlabUTSW/Chi-Score-Analysis/main/chi_score_analysis.py')

import chi_score_analysis as xid
import matplotlib.pyplot as plt
import numpy as np
from google.colab import files
import time


In [None]:
#@title <b><font color='#00AAFF'>3 - Run Analysis</font></b>

results = xid.analyze_sequence(SEQUENCE, Window_Sizes)

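# Take the first candidate solution whose lowest boundary z-score
# meets the cutoff; fall back to an empty solution if none does.
solution = [[], []]
for candidate in results:
  if np.min(candidate[1]) < Min_ZScore:
    continue
  solution = candidate
  break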

for _ in range(5):
  print('.....')

modules = xid.get_modules(SEQUENCE, solution[0])
print(f'Analysis Complete: {len(modules)} modules found in {NAME}:')
for x, module in enumerate(modules, start=1):
  print(f'{x}) {module}')

for _ in range(5):
  print('.....')

print('Boundary positions and z-scores are...')
# Check length rather than equality so the test also works for array-like results.
if len(solution[0]) == 0:
  print('No modules found in input sequence for specified confidence level.')
else:
  print(solution)

In [None]:
#@title <b><font color='#00AAFF'>4 - Plot Solution</font></b>

OUTFILE = f'{NAME}_{time.time()}.svg'

WINDOW = 12

xid.plot_solution(SEQUENCE, xid.get_corr_scores(xid.get_heatmap_scores(SEQUENCE, WINDOW)), solution[0], WINDOW, NAME)

if DOWNLOAD_MATRIX == " YES":
  plt.savefig(OUTFILE, bbox_inches = 'tight')
  files.download(OUTFILE)