The Jupyter Notebook code below loads a simulated genotype dataset (provided in the study), performs PCA, applies non-negative least squares (NNLS), and compares the estimated ancestry proportions with the true simulated values.

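In [None]:
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from scipy.optimize import nnls

# Load simulated genotype data (assumed layout: one row per sample, one column per variant)
# Note: Replace 'simulated_data.csv' with the actual dataset path from the repository
sim_data = pd.read_csv('simulated_data.csv', index_col=0)

# Perform PCA and keep the first 5 principal components
n_pcs = 5
pca = PCA(n_components=n_pcs)
pc_scores = pca.fit_transform(sim_data.values)

# Proxy sources are assumed to be available as a dataframe of PC scores in the same PC space
# (for example, obtained by calling pca.transform on the source genotypes).
# For demonstration only, random placeholder coordinates stand in for them here.
rng = np.random.default_rng(0)  # fixed seed so the demonstration is reproducible
proxy_df = pd.DataFrame(rng.random((10, n_pcs)), columns=[f'PC{i+1}' for i in range(n_pcs)])

# For each admixed sample, solve NNLS to estimate contributions
# (each column of A holds one proxy source's PC coordinates)
A = proxy_df.values.T
results = []
for score in pc_scores:
    coef, _ = nnls(A, score)
    total = coef.sum()
    # NNLS does not force the coefficients to sum to 1, so rescale them;
    # an all-zero solution is reported as NaN instead of dividing by zero
    results.append(coef / total if total > 0 else np.full_like(coef, np.nan))

# Keep the sample labels so the estimates can be matched to the true proportions later
results = pd.DataFrame(results, index=sim_data.index,
                       columns=[f'Source{i+1}' for i in range(proxy_df.shape[0])])
print(results.head())
# Compare with true ancestry proportions if available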

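The code above runs PCA on the simulated genotype data and, for each sample, estimates ancestry proportions by fitting its PC scores as a non-negative combination of the proxy sources' PC scores (NNLS), then rescaling the coefficients to sum to 1. The proxy sources here are random placeholders, so the printed proportions only illustrate the mechanics. This pipeline can be extended to benchmark PANE for accuracy and speed using both simulated and real datasets from repositories such as Zenodo.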
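
To make the NNLS step concrete, the following minimal sketch builds a noiseless admixed target from known proportions and checks that NNLS recovers them. The PC coordinates, the proportions, and the `weight` value are hypothetical, chosen only for illustration. The second solve adds a heavily weighted row of ones, a common way to softly enforce that the coefficients sum to 1; it is shown as one possible variant, not as the procedure PANE itself uses.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical PC coordinates: one row per PC, one column per proxy source.
source_pcs = np.array([
    [0.10, -0.05, 0.02],   # PC1
    [-0.03, 0.08, 0.01],   # PC2
    [0.02, 0.01, -0.06],   # PC3
])
true_props = np.array([0.5, 0.3, 0.2])  # hypothetical ancestry proportions

# A noiseless admixed target is the proportion-weighted mix of the sources.
target = source_pcs @ true_props

# Plain NNLS: coefficients are non-negative, but their sum is unconstrained.
coef, residual = nnls(source_pcs, target)

# Soft sum-to-one constraint: append a heavily weighted row of ones to the
# system so that deviations of the coefficient sum from 1 are penalised.
weight = 100.0  # assumed value; larger means a stricter constraint
A_aug = np.vstack([source_pcs, weight * np.ones(source_pcs.shape[1])])
b_aug = np.append(target, weight)
coef_sum1, _ = nnls(A_aug, b_aug)

print('plain NNLS:     ', np.round(coef, 3))
print('with sum-to-one:', np.round(coef_sum1, 3))
```

With real, noisy data the two solutions will generally differ, and rescaling the plain NNLS coefficients afterwards (as in the cell above) is not equivalent to constraining the sum during the fit.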

In [None]:
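# Further analysis: calculating estimation error using true proportions stored in true_props.csv
true_props = pd.read_csv('true_props.csv', index_col=0)
# Compare by position: assumes true_props has the same sample order and source columns as results
# (a plain `results - true_props` aligns on labels and yields NaN wherever they differ)
abs_err = np.abs(results.to_numpy() - true_props.to_numpy())
error = pd.Series(abs_err.mean(axis=0), index=results.columns)
print('Average estimation error per source:')
print(error)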

This cell compares the NNLS-estimated ancestry proportions with the true simulated values and reports the mean absolute error per source as a simple accuracy benchmark.
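
The comparison can also be done by label rather than by position, which is safer when the two tables list samples in a different order. The sketch below uses small made-up tables (the sample names, source names, and values are hypothetical) and reports the mean absolute error and root-mean-square error per source.

```python
import numpy as np
import pandas as pd

sources = ['Source1', 'Source2']

# Hypothetical estimated proportions, one row per sample.
estimated = pd.DataFrame(
    [[0.6, 0.4], [0.2, 0.8], [0.5, 0.5]],
    index=['ind1', 'ind2', 'ind3'], columns=sources,
)

# Hypothetical true proportions, deliberately listed in a different row order.
truth = pd.DataFrame(
    [[0.5, 0.5], [0.7, 0.3], [0.1, 0.9]],
    index=['ind3', 'ind1', 'ind2'], columns=sources,
)

# Reorder the truth table so rows and columns match the estimates by label.
truth_aligned = truth.reindex(index=estimated.index, columns=estimated.columns)

diff = estimated - truth_aligned
mae_per_source = diff.abs().mean(axis=0)             # mean absolute error
rmse_per_source = np.sqrt((diff ** 2).mean(axis=0))  # root-mean-square error
overall_mae = diff.abs().to_numpy().mean()

print(mae_per_source)
print(rmse_per_source)
print('Overall MAE:', overall_mae)
```

If a sample is missing from one table, `reindex` fills its row with NaN, which pandas skips when averaging; checking `truth_aligned.isna().any().any()` first makes such gaps visible.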

In [None]:
# Save benchmark results for future reference
results.to_csv('PANE_estimated_proportions.csv')
print('Benchmark results saved.')
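
Because the benchmark is meant to cover speed as well as accuracy, runtime can be recorded alongside the estimates. The sketch below times the per-sample NNLS solves on synthetic data. The helper name `time_nnls`, the problem sizes, and the synthetic sources are assumptions made for illustration, and it measures only the NNLS step, not the PCA or any part of PANE itself.

```python
import time
import numpy as np
from scipy.optimize import nnls

def time_nnls(source_pcs, targets, repeats=3):
    """Solve NNLS for every target; return the best wall-clock time and the coefficients."""
    best = float('inf')
    coefs = None
    for _ in range(repeats):
        start = time.perf_counter()
        coefs = np.array([nnls(source_pcs, t)[0] for t in targets])
        best = min(best, time.perf_counter() - start)
    return best, coefs

# Synthetic setup (hypothetical sizes): 4 sources described by 5 PCs, 200 admixed samples.
rng = np.random.default_rng(0)
n_pcs, n_sources, n_samples = 5, 4, 200
source_pcs = rng.normal(size=(n_pcs, n_sources))           # one column per source
props = rng.dirichlet(np.ones(n_sources), size=n_samples)  # true proportions, rows sum to 1
targets = props @ source_pcs.T                             # noiseless admixed PC scores

elapsed, coefs = time_nnls(source_pcs, targets)
print(f'{n_samples} samples solved in {elapsed:.4f} s')
```

Taking the best of several repeats reduces the influence of one-off system noise; for a fuller benchmark, the same timing wrapper could be placed around each method being compared.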

***

### [Created with BioloGPT](https://biologpt.com/?q=Paper%20Review%3A%20PANE%3A%20fast%20and%20reliable%20ancestral%20reconstruction%20on%20ancient%20genotype%20data%20with%20non-negative%20least%20square%20and%20principal%20component%20analysis)
***