# Lipidation Site Analysis Template

This notebook provides a template for analyzing lipidation sites in protein sequences.

## Analysis Overview
- Load and explore sequence data
- Identify potential lipidation motifs
- Visualize results
- Run statistical analyses
- Export summary tables and figures
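This notebook does not search for motifs itself; it reads the output of `sequence_motif_finder.py`. For orientation, here is a minimal sketch of what a motif scan could look like. It emits rows in the same column layout the notebook reads (`Sequence_ID`, `Length`, `Modification_Type`, `Position`). The patterns are deliberately simplified assumptions, not validated predictors: a Met-Gly start for N-myristoylation (the Gly becomes N-terminal after initiator Met removal) and a C-terminal Cys followed by three residues for the prenylation CaaX box. `MOTIFS` and `scan_motifs` are hypothetical names, not part of the script.

```python
import re

import pandas as pd

# Simplified, illustrative motif patterns (assumptions, not validated predictors).
# The modified residue is captured in group 1 so its position can be reported.
MOTIFS = {
    # N-myristoylation: Gly at position 2, exposed after removal of the initiator Met
    "N-myristoylation": re.compile(r"^M(G)"),
    # Prenylation: CaaX box, approximated as a Cys followed by any 3 residues at the C-terminus
    "Prenylation": re.compile(r"(C)[A-Z]{3}$"),
}


def scan_motifs(records):
    """Return one row per motif hit; `records` maps sequence IDs to sequences."""
    rows = []
    for seq_id, seq in records.items():
        seq = seq.upper()
        for mod_type, pattern in MOTIFS.items():
            for match in pattern.finditer(seq):
                rows.append({
                    "Sequence_ID": seq_id,
                    "Length": len(seq),
                    "Modification_Type": mod_type,
                    # 1-based position of the captured (modified) residue
                    "Position": match.start(1) + 1,
                })
    return pd.DataFrame(rows, columns=["Sequence_ID", "Length", "Modification_Type", "Position"])


hits = scan_motifs({"seqA": "MGSSKSCVIM", "seqB": "AKLT"})
print(hits)
```

Capturing only the modified residue keeps `Position` pointing at the Gly or Cys rather than at the start of the whole motif. S-palmitoylation has no simple consensus motif, so it is left out of this sketch.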

## Requirements
```
pip install pandas numpy matplotlib seaborn biopython
```

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

# Set visualization style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)

print("Libraries imported successfully!")

## 1. Load Data

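Load the tab-separated results file produced by `sequence_motif_finder.py`. The cells below expect one row per predicted site, with the columns `Sequence_ID`, `Length`, `Modification_Type`, and `Position`.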

In [None]:
# Load analysis results
# Replace 'results.tsv' with your actual results file
data_file = Path('results.tsv')

if data_file.exists():
    df = pd.read_csv(data_file, sep='\t')
    print(f"Loaded {len(df)} lipidation sites")
    display(df.head())
else:
    print(f"Data file not found: {data_file}")
    print("Please run the sequence_motif_finder.py script first to generate results.")
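Every later cell assumes specific columns exist, so a quick schema check right after loading catches a mismatched results file early. A minimal sketch, assuming `Position` is 1-based and both `Length` and `Position` are stored as integers; `check_results_schema` is a hypothetical helper. In the notebook, call `check_results_schema(df)` after the load cell.

```python
import pandas as pd

# Columns read by the analysis cells in this notebook
REQUIRED_COLUMNS = {"Sequence_ID", "Length", "Modification_Type", "Position"}


def check_results_schema(frame):
    """Raise if expected columns are missing, non-integer, or positions fall outside the sequence."""
    missing = REQUIRED_COLUMNS - set(frame.columns)
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")
    for col in ("Length", "Position"):
        if not pd.api.types.is_integer_dtype(frame[col]):
            raise TypeError(f"Column {col!r} should hold integers, got {frame[col].dtype}")
    # Assumes 1-based positions: valid values run from 1 to Length
    out_of_range = frame[(frame["Position"] < 1) | (frame["Position"] > frame["Length"])]
    if not out_of_range.empty:
        raise ValueError(f"{len(out_of_range)} rows have positions outside 1..Length")
    return True


example = pd.DataFrame({
    "Sequence_ID": ["seqA"],
    "Length": [10],
    "Modification_Type": ["N-myristoylation"],
    "Position": [2],
})
print(check_results_schema(example))
```

A float `Position` column usually means missing values were read as `NaN`, which is why the check rejects non-integer dtypes instead of silently accepting them.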

## 2. Exploratory Data Analysis

In [None]:
# Summary statistics
if 'df' in locals():
    print("Modification type distribution:")
    print(df['Modification_Type'].value_counts())
    print("\nSequence length statistics:")
    print(df.groupby('Sequence_ID')['Length'].first().describe())
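Value counts show how common each modification type is overall, but not whether single sequences carry several types (for example, a protein predicted to be both myristoylated and palmitoylated). A cross-tabulation answers that. The sketch below uses made-up toy rows; in the notebook, pass `df` instead of `toy`.

```python
import pandas as pd

# Made-up example rows in the notebook's results layout
toy = pd.DataFrame({
    "Sequence_ID": ["seqA", "seqA", "seqB", "seqC", "seqC"],
    "Modification_Type": [
        "N-myristoylation", "S-palmitoylation", "Prenylation",
        "S-palmitoylation", "S-palmitoylation",
    ],
})

# Rows: sequences; columns: modification types; values: number of predicted sites
site_matrix = pd.crosstab(toy["Sequence_ID"], toy["Modification_Type"])
print(site_matrix)

# Sequences predicted to carry more than one distinct modification type
multi_type = site_matrix.index[(site_matrix > 0).sum(axis=1) > 1].tolist()
print(multi_type)  # ['seqA']
```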

## 3. Visualization

In [None]:
# Plot distribution of modification types
if 'df' in locals():
    plt.figure(figsize=(10, 6))
    mod_counts = df['Modification_Type'].value_counts()
    plt.bar(range(len(mod_counts)), mod_counts.values)
    plt.xticks(range(len(mod_counts)), mod_counts.index, rotation=45, ha='right')
    plt.xlabel('Modification Type')
    plt.ylabel('Count')
    plt.title('Distribution of Lipidation Modification Types')
    plt.tight_layout()
    plt.show()

In [None]:
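# Plot position distribution of modifications
# Shared bin edges keep the overlaid histograms comparable across modification types
if 'df' in locals():
    plt.figure(figsize=(12, 6))
    bins = np.histogram_bin_edges(df['Position'], bins=20)
    for mod_type in df['Modification_Type'].unique():
        subset = df[df['Modification_Type'] == mod_type]
        plt.hist(subset['Position'], alpha=0.5, label=mod_type, bins=bins)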
    plt.xlabel('Position in Sequence')
    plt.ylabel('Frequency')
    plt.title('Position Distribution of Lipidation Sites')
    plt.legend()
    plt.tight_layout()
    plt.show()
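Absolute positions are hard to compare across proteins of different lengths, and some lipidation types are tied to a terminus (N-myristoylation near the N-terminus, CaaX prenylation at the C-terminus). Scaling each position by its sequence length puts all sites on a common 0 to 1 axis. A minimal sketch, assuming `Position` is 1-based so values fall in (0, 1]; `add_relative_position` and `plot_relative_positions` are hypothetical helpers, and the toy rows are made up.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def add_relative_position(frame):
    """Return a copy with Position divided by sequence Length (assumes 1-based positions)."""
    out = frame.copy()
    out["Relative_Position"] = out["Position"] / out["Length"]
    return out


def plot_relative_positions(frame, n_bins=20):
    """Overlay per-type histograms of relative position on shared bins spanning 0..1."""
    bins = np.linspace(0, 1, n_bins + 1)
    fig, ax = plt.subplots()
    for mod_type, subset in frame.groupby("Modification_Type"):
        ax.hist(subset["Relative_Position"], bins=bins, alpha=0.5, label=mod_type)
    ax.set_xlabel("Relative position (Position / Length)")
    ax.set_ylabel("Frequency")
    ax.legend()
    return fig


toy = pd.DataFrame({
    "Length": [10, 200, 50],
    "Position": [2, 197, 47],
    "Modification_Type": ["N-myristoylation", "Prenylation", "Prenylation"],
})
rel = add_relative_position(toy)
print(rel["Relative_Position"].tolist())
fig = plot_relative_positions(rel)
```

In the notebook, `plot_relative_positions(add_relative_position(df))` followed by `plt.show()` gives the length-normalized counterpart of the histogram above.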

## 4. Statistical Analysis

Add your statistical tests and comparisons here.
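One comparison that fits this table is whether two modification types differ in where their sites fall. A permutation test on the difference in mean position makes no normality assumption and needs only NumPy. This is a minimal sketch; `permutation_test_mean_diff` is a hypothetical helper, and the positions in the example are randomly generated, not real data. With the notebook's data, `a` and `b` would be something like `df.loc[df['Modification_Type'] == 'TypeA', 'Position']`, where the type names are placeholders.

```python
import numpy as np


def permutation_test_mean_diff(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means between samples a and b."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_permutations):
        # Randomly reassign group labels and recompute the difference in means
        shuffled = rng.permutation(pooled)
        diff = shuffled[: len(a)].mean() - shuffled[len(a):].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction keeps the estimated p-value away from exactly zero
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value


rng = np.random.default_rng(1)
near_n_term = rng.integers(1, 20, size=30)    # randomly generated example positions
near_c_term = rng.integers(150, 200, size=30)
obs, p = permutation_test_mean_diff(near_n_term, near_c_term, n_permutations=2000)
print(f"difference = {obs:.1f}, p = {p:.4f}")
```

If positions from proteins of very different lengths are mixed, running the same test on length-normalized positions may be more meaningful.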

In [None]:
# Example: summarize the number of predicted sites per sequence
# (only sequences that appear in the results table, i.e. with at least one site, are counted)
if 'df' in locals():
    mods_per_seq = df.groupby('Sequence_ID').size()
    print(f"Average modifications per sequence: {mods_per_seq.mean():.2f}")
    print(f"Median modifications per sequence: {mods_per_seq.median():.1f}")
    print(f"Max modifications in a single sequence: {mods_per_seq.max()}")
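The cell above averages over sequences that appear in the results table. If the motif finder writes no rows for sequences without hits, those sequences are missing from the average, which inflates it. A minimal sketch, assuming you have the full list of input sequence IDs; `sites_per_sequence` is a hypothetical helper, the toy rows are made up, and the FASTA filename in the comment is a placeholder (Biopython's `SeqIO.parse` is one way to collect the IDs).

```python
import pandas as pd


def sites_per_sequence(frame, all_sequence_ids):
    """Count predicted sites per sequence, giving 0 to sequences absent from `frame`."""
    counts = frame.groupby("Sequence_ID").size()
    return counts.reindex(list(all_sequence_ids), fill_value=0)


toy = pd.DataFrame({"Sequence_ID": ["seqA", "seqA", "seqB"]})
# Placeholder ID list; e.g. [rec.id for rec in SeqIO.parse("input.fasta", "fasta")]
all_ids = ["seqA", "seqB", "seqC", "seqD"]
counts = sites_per_sequence(toy, all_ids)
print(counts.mean())  # 0.75
```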

## 5. Export Results

Save figures and processed data for publication.

In [None]:
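# Example: build and save a per-type summary table
# Named aggregation avoids renaming MultiIndex columns by position
if 'df' in locals():
    summary = df.groupby('Modification_Type').agg(
        Site_Count=('Sequence_ID', 'count'),
        Sequence_Count=('Sequence_ID', 'nunique'),
        Mean_Position=('Position', 'mean'),
        Std_Position=('Position', 'std'),
    )
    print(summary)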
    
    # Uncomment to save
    # summary.to_csv('modification_summary.csv')
    # print("Summary saved to modification_summary.csv")
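The export cell writes only a table. Figures from the visualization section can be saved in the same way. A minimal sketch, assuming PNG and PDF at 300 dpi suit your target venue (adjust as needed); `save_figure` is a hypothetical helper built on Matplotlib's `Figure.savefig`. In the plotting cells, call `save_figure(plt.gcf(), "modification_types")` before `plt.show()`, because with the usual inline backend the current figure may already be closed afterwards.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt


def save_figure(fig, stem, out_dir="figures", formats=("png", "pdf"), dpi=300):
    """Save `fig` once per format as <out_dir>/<stem>.<format>; return the written paths."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for fmt in formats:
        target = out_path / f"{stem}.{fmt}"
        # bbox_inches="tight" trims surrounding whitespace and keeps rotated tick labels in frame
        fig.savefig(target, dpi=dpi, bbox_inches="tight")
        written.append(target)
    return written


# Demo: write into a temporary directory so the example leaves no files behind in the project
fig, ax = plt.subplots()
ax.bar(["A", "B"], [3, 5])
paths = save_figure(fig, "example_bar", out_dir=tempfile.mkdtemp())
print([p.name for p in paths])  # ['example_bar.png', 'example_bar.pdf']
```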

## Notes and Conclusions

Add your observations and conclusions here.