# CRISPR-Cas Predictor Analysis Examples

This notebook walks through an example workflow with the CRISPR-Cas Predictor project: loading protein sequences from a FASTA file, filtering them, searching them with HMMER, and classifying the resulting hits as potential CRISPR-Cas proteins.

In [1]:
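import os
import sys
import pandas as pd

# Make the project root importable so that `src` resolves.
# Assumption: this notebook lives one directory below the project root,
# as the relative '../data' path used below suggests.
sys.path.insert(0, os.path.abspath('..'))

from src.protein_analyzer import ProteinAnalyzer
from src.hmmer_search import HMMERSearch
from src.cas_typing import CASTyping
from src.utils.file_handling import read_fasta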

# Set the path to the data directory
data_dir = '../data/example_sequences/'
fasta_file = os.path.join(data_dir, 'test_proteins.fasta')

# Read protein sequences from the FASTA file
sequences = read_fasta(fasta_file)
print(f"Loaded {len(sequences)} sequences from {fasta_file}")
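
For readers unfamiliar with the FASTA format, the sketch below shows roughly what a loader like `read_fasta` has to do. It is not the project's implementation: the function name `parse_fasta` and the choice to return a dictionary keyed by record ID are assumptions made for illustration, and the two example records are toy data.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record_id: sequence} dict.

    Hypothetical stand-in for the project's read_fasta, whose return type
    is not shown in this notebook.
    """
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The record ID is the first whitespace-delimited token of the header
            header = line[1:].split()
            current_id = header[0] if header else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines are concatenated until the next header
            records[current_id] += line
    return records


# Toy input: two records, the first wrapped over two lines
example = """>prot1 hypothetical protein
MKTAYIAK
QRQISFVK
>prot2
MDKKYSIG
""".splitlines()

parsed = parse_fasta(example)
print(f"Loaded {len(parsed)} sequences")
```

Wrapped sequence lines are joined into one string per record, which is why `len()` of the result gives the number of sequences rather than the number of lines.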

In [2]:
# Analyze the protein sequences
analyzer = ProteinAnalyzer()
filtered_sequences = analyzer.filter_sequences(sequences)
print(f"Filtered down to {len(filtered_sequences)} sequences after analysis.")
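
The notebook does not state which criteria `ProteinAnalyzer.filter_sequences` applies. As an illustration only, the sketch below filters on two common criteria: a minimum length and the standard 20-letter amino-acid alphabet. The function name, the `min_length` default of 100, and the toy sequences are all assumptions, not values taken from the project.

```python
from typing import Dict

STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def filter_by_length_and_alphabet(
    sequences: Dict[str, str], min_length: int = 100
) -> Dict[str, str]:
    """Keep sequences that are long enough and use only standard residues.

    Hypothetical criteria; the real ProteinAnalyzer may filter differently.
    """
    kept: Dict[str, str] = {}
    for seq_id, seq in sequences.items():
        seq = seq.upper().rstrip("*")  # drop a trailing stop marker, if any
        if len(seq) >= min_length and set(seq) <= STANDARD_AMINO_ACIDS:
            kept[seq_id] = seq
    return kept


# Toy data: one sequence passes, one is too short, one has ambiguous residues
toy = {
    "long_ok": "M" + "ACDEFGHIKL" * 12,  # 121 residues
    "too_short": "MKT",
    "ambiguous": "M" + "X" * 150,
}
filtered = filter_by_length_and_alphabet(toy, min_length=100)
print(f"Filtered down to {len(filtered)} of {len(toy)} sequences")
```

Filtering before the search keeps obviously unsuitable sequences out of the more expensive HMMER step.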

In [3]:
# Perform HMMER search on the filtered sequences
hmmer_search = HMMERSearch()
hmmer_results = hmmer_search.search(filtered_sequences)
print(f"Found {len(hmmer_results)} potential CRISPR-Cas proteins.")
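
How `HMMERSearch` invokes HMMER and what it returns are not shown here. If it wraps `hmmsearch` and reads the tabular output written by `--tblout`, the parsing step might look like the sketch below. The function name `parse_tblout`, the E-value cutoff of `1e-5`, the profile name `cas9_profile`, and the two sample rows are hypothetical; only the column layout follows the HMMER per-target table format.

```python
from typing import Dict, List


def parse_tblout(text: str, max_evalue: float = 1e-5) -> List[Dict[str, object]]:
    """Parse hmmsearch --tblout text and keep hits at or below max_evalue.

    Column layout (whitespace-delimited): target name, target accession,
    query name, query accession, full-sequence E-value, score, bias, ...
    The final column is a free-text description that may contain spaces.
    """
    hits: List[Dict[str, object]] = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split(maxsplit=18)
        if len(fields) < 18:
            continue  # skip malformed rows
        evalue = float(fields[4])
        if evalue <= max_evalue:
            hits.append(
                {
                    "target": fields[0],   # protein sequence ID
                    "profile": fields[2],  # HMM profile that matched
                    "evalue": evalue,
                    "score": float(fields[5]),
                }
            )
    return hits


# Made-up rows in tblout layout: one strong hit and one weak hit
sample = """# target name  accession  query name  accession  E-value  score  bias ...
prot1 - cas9_profile - 1.2e-50 170.3 0.1 1.5e-50 170.0 0.1 1.0 1 0 0 1 1 1 1 hypothetical protein
prot2 - cas9_profile - 0.5 12.1 0.0 0.7 11.8 0.0 1.0 1 0 0 1 1 0 0 -
"""

hits = parse_tblout(sample)
print(f"Found {len(hits)} hits below the E-value cutoff")
```

A list of dictionaries like this converts directly to a table with `pd.DataFrame(hits)`, which matches how the classification results are handled later in the notebook.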

In [4]:
# Classify the identified CRISPR-Cas proteins
cas_typing = CASTyping()
classification_results = cas_typing.classify(hmmer_results)
classification_df = pd.DataFrame(classification_results)
classification_df.head()

## Visualization of Results

This section plots the classification results, for example as a bar chart of the number of proteins assigned to each class.

In [5]:
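# Example visualization with matplotlib.
# Assumes classification_df has a 'class' column with one row per protein;
# adjust the column name to match the actual output of CASTyping.classify().
import matplotlib.pyplot as plt

# Count how many proteins fall into each class
class_counts = classification_df['class'].value_counts()

plt.figure(figsize=(10, 6))
plt.bar(class_counts.index.astype(str), class_counts.values)
plt.title('CRISPR-Cas Protein Classification')
plt.xlabel('Class')
plt.ylabel('Count')
plt.tight_layout()
plt.show()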