# Protein Structure Homology and Evolutionary Analysis with Foldseek

Welcome! In this notebook, you will explore protein structure homology and evolutionary relationships using the Foldseek API. You can use your own protein structure or select one from the AlphaFold Database, identify homologous structures, and analyze their relationships.

## Objectives
- **Select a protein structure:** Use your own data or choose a structure from the AlphaFold DB.
- **Find homologs:** Query the Foldseek API to identify structurally similar proteins.
- **Download structures:** Retrieve the 3D structures of homologous proteins.
- **Compare structures:** Perform structural alignments and analyze similarities.
- **Build a phylogeny:** Derive a phylogenetic tree based on structural relationships.

## Workflow Overview
1. Choose a protein structure
2. Search for homologs with Foldseek
3. Download homologous structures
4. Structural comparison
5. Phylogenetic analysis

---

Let's get started!

In [None]:

# Path to your query structure file (PDB format). Replace the placeholder before running.
path_to_your_structure = "####change this####"

In [None]:
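import requests
import pandas as pd

def query_foldseek_api(pdb_file_path, timeout=300):
    """
    Query the Foldseek API with a custom PDB file and return the hits as a pandas DataFrame.

    Returns an empty DataFrame when the search yields no hits.
    """
    # NOTE: the endpoint and form-field names below are kept as written in this
    # notebook. Confirm them against the Foldseek server API documentation before
    # relying on them; the server may expect a job-submission and polling workflow.
    url = "https://search.foldseek.com/api/search"
    with open(pdb_file_path, "rb") as pdb_file:
        files = {"query": pdb_file}
        data = {"db": "alphafold"}
        # A timeout keeps the notebook from hanging if the server does not respond.
        response = requests.post(url, files=files, data=data, timeout=timeout)
    # Raise an exception on HTTP errors (4xx/5xx) instead of parsing an error page.
    response.raise_for_status()
    results = response.json()
    hits = results.get("results", [])
    # An empty hit list yields an empty DataFrame, so no special case is needed.
    return pd.DataFrame(hits)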



In [None]:
# Using the UniProt API, let's get the metadata for these proteins
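
A minimal sketch of the metadata lookup, assuming the hit identifiers follow the AlphaFold DB naming pattern (`AF-<accession>-F<n>-model_v<k>`) and that the UniProt REST endpoint `https://rest.uniprot.org/uniprotkb/<accession>.json` is used. The helper names `extract_uniprot_accession`, `fetch_uniprot_entry`, and `summarize_entry` are hypothetical and not part of any library.

```python
import re
import requests

# AlphaFold DB model names look like "AF-P69905-F1-model_v4"; the middle field
# is the UniProt accession. That the hits are labelled this way is an assumption.
AFDB_ID = re.compile(r"AF-([A-Z0-9]+)-F\d+")

def extract_uniprot_accession(target_id):
    """Return the UniProt accession embedded in an AlphaFold DB identifier, or None."""
    match = AFDB_ID.search(target_id)
    return match.group(1) if match else None

def fetch_uniprot_entry(accession, timeout=30):
    """Fetch one UniProtKB entry as a dict from the UniProt REST API."""
    url = f"https://rest.uniprot.org/uniprotkb/{accession}.json"
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.json()

def summarize_entry(entry):
    """Pull a few commonly useful fields out of a UniProtKB JSON entry."""
    organism = entry.get("organism", {})
    return {
        "accession": entry.get("primaryAccession"),
        "organism": organism.get("scientificName"),
        "taxon_id": organism.get("taxonId"),
        "length": entry.get("sequence", {}).get("length"),
    }
```

For a table of hits, one would typically extract the accessions first, drop duplicates, and only then call `fetch_uniprot_entry` once per unique accession to keep the number of requests small.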

In [None]:
# We can visualize the results. What E-values, lengths, and identities do we have?
# Let's plot the results and decide on some meaningful cutoffs.
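
One way to approach this, sketched under assumptions: the column names `evalue`, `identity`, and `aln_length` and the default cutoffs are placeholders, so rename and tune them to match the table the search actually returns. `filter_hits` and `plot_hit_distributions` are hypothetical helper names.

```python
import pandas as pd

def filter_hits(df, max_evalue=1e-5, min_identity=20.0, min_aln_length=50):
    """Keep hits that pass E-value, identity, and alignment-length cutoffs.

    Column names and default cutoffs are placeholders, not Foldseek defaults.
    """
    mask = (
        (df["evalue"] <= max_evalue)
        & (df["identity"] >= min_identity)
        & (df["aln_length"] >= min_aln_length)
    )
    return df[mask].reset_index(drop=True)

def plot_hit_distributions(df):
    """Draw histograms of the three columns to help choose cutoffs."""
    # Imported here so that filter_hits works even without the plotting stack.
    import numpy as np
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
    # E-values span many orders of magnitude, so plot -log10; clip avoids log(0).
    axes[0].hist(-np.log10(df["evalue"].clip(lower=1e-300)), bins=30)
    axes[0].set_xlabel("-log10(E-value)")
    axes[1].hist(df["identity"], bins=30)
    axes[1].set_xlabel("Sequence identity (%)")
    axes[2].hist(df["aln_length"], bins=30)
    axes[2].set_xlabel("Alignment length")
    for ax in axes:
        ax.set_ylabel("Number of hits")
    fig.tight_layout()
    return fig
```

Plotting before filtering makes it easier to see whether a cutoff falls in a gap of the distribution or cuts through the middle of a peak.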

In [None]:
#once we have a reasonable number of structures, we can download them
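
A possible download helper, assuming the structures come from the AlphaFold DB file server and that each protein is a single-fragment model (`F1`). The model version in the file name changes between database releases, so `version=4` is an assumption to check against the current release; `alphafold_pdb_url` and `download_structures` are hypothetical names.

```python
from pathlib import Path
import requests

def alphafold_pdb_url(accession, version=4):
    """Build the AlphaFold DB download URL for fragment 1 of a UniProt accession."""
    # The version suffix is an assumption; adjust it to the current database release.
    return f"https://alphafold.ebi.ac.uk/files/AF-{accession}-F1-model_v{version}.pdb"

def download_structures(accessions, out_dir="structures", version=4, timeout=60):
    """Download each model into out_dir, skipping files that are already present.

    Returns a dict mapping accession -> local Path for every available file.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    downloaded = {}
    for accession in accessions:
        target = out_path / f"{accession}.pdb"
        if not target.exists():
            response = requests.get(alphafold_pdb_url(accession, version), timeout=timeout)
            if response.status_code != 200:
                # Report and move on rather than aborting the whole batch.
                print(f"Skipping {accession}: HTTP {response.status_code}")
                continue
            target.write_bytes(response.content)
        downloaded[accession] = target
    return downloaded
```

Skipping existing files means the cell can be re-run after an interruption without fetching everything again.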


In [None]:
#at this point we can inspect some of the structures
#do we have any interesting domain architectures? Unexpected folds?
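
Before looking at folds in detail, it can help to check how confident each model is. AlphaFold models store the per-residue pLDDT score in the B-factor column of the PDB file, so a small parser is enough. This is a sketch using only the standard library; `read_ca_plddt`, `summarize_confidence`, and the threshold of 70 are illustrative choices.

```python
def read_ca_plddt(pdb_lines):
    """Return per-residue pLDDT values read from C-alpha ATOM records.

    PDB is a fixed-width format: the atom name sits in columns 13-16 and the
    B-factor (pLDDT in AlphaFold models) in columns 61-66.
    """
    values = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    return values

def summarize_confidence(plddt, threshold=70.0):
    """Summarize model length, mean pLDDT, and the fraction of residues at or above threshold."""
    if not plddt:
        return {"n_residues": 0, "mean_plddt": None, "fraction_confident": None}
    confident = sum(1 for value in plddt if value >= threshold)
    return {
        "n_residues": len(plddt),
        "mean_plddt": sum(plddt) / len(plddt),
        "fraction_confident": confident / len(plddt),
    }
```

A typical call would be `summarize_confidence(read_ca_plddt(open(path)))` for each downloaded file. Models with a low confident fraction are often dominated by disordered regions and deserve a closer look before they are compared structurally.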

In [None]:
#we can use the taxonomic tree to see where this protein is found
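
A simple first look at the taxonomic distribution, assuming each hit comes with a lineage given as a list of taxon names ordered from the root downward (UniProtKB JSON entries provide one under `organism.lineage`). `count_by_rank` is a hypothetical helper.

```python
from collections import Counter

def count_by_rank(lineages, depth=0):
    """Count hits by the taxon found at a given depth of their lineage.

    depth=0 is the root-most entry of each lineage; lineages shorter than
    depth + 1 are tallied as "unclassified".
    """
    counts = Counter()
    for lineage in lineages:
        taxon = lineage[depth] if len(lineage) > depth else "unclassified"
        counts[taxon] += 1
    return counts
```

Increasing `depth` step by step shows at which level of the tree the hits start to spread out, which is a quick way to spot a protein restricted to one clade.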


In [None]:
#
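
For the phylogeny step, one simple option is to cluster the structures by pairwise structural distance. The sketch below uses UPGMA (average linkage), which is a swapped-in, deliberately basic method that assumes roughly clock-like divergence; it is not necessarily the method intended for this notebook. It also assumes you already have a symmetric distance matrix, for example 1 minus a pairwise TM-score. `upgma_newick` is a hypothetical helper.

```python
def upgma_newick(names, dist):
    """Build a UPGMA tree from a symmetric distance matrix and return Newick text.

    names: list of leaf labels; dist: list of lists with dist[i][j] the distance
    between names[i] and names[j].
    """
    if not names:
        raise ValueError("at least one leaf is required")
    # Active clusters: id -> (newick string, number of leaves, node height).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        (a, b), d_ab = min(d.items(), key=lambda item: item[1])
        newick_a, size_a, height_a = clusters.pop(a)
        newick_b, size_b, height_b = clusters.pop(b)
        height = d_ab / 2.0
        merged = f"({newick_a}:{height - height_a:.4f},{newick_b}:{height - height_b:.4f})"
        # Distance to every remaining cluster is the size-weighted mean.
        new_d = {}
        for c in clusters:
            d_ac = d[tuple(sorted((a, c)))]
            d_bc = d[tuple(sorted((b, c)))]
            new_d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        # Drop every distance that involved the two merged clusters.
        d = {pair: value for pair, value in d.items() if a not in pair and b not in pair}
        d.update(new_d)
        clusters[next_id] = (merged, size_a + size_b, height)
        next_id += 1
    newick, _, _ = next(iter(clusters.values()))
    return newick + ";"
```

The returned Newick string can be saved to a file and opened in any tree viewer. For a more careful analysis, a method without the clock assumption, such as neighbor joining, would be the usual next step.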