This section downloads the LC-MS dataset from the provided GitHub repository, standardizes the features, and visualizes the samples with a PCA scatter plot and a hierarchical clustering dendrogram, using pandas, scikit-learn, SciPy, seaborn, and matplotlib. The aim is a reproducible view of the main metabolomic differences between coffee species.

In [None]:
import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import seaborn as sns
import matplotlib.pyplot as plt

# Load dataset from GitHub repository
url = 'https://github.com/alrichardbollans/coffeemetabolomics/raw/main/dataset.csv'
data = pd.read_csv(url)

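# Preprocess data: drop the identifier and label columns, then standardize
# each feature to zero mean and unit variance so that no single compound
# dominates the PCA because of its scale
features = data.drop(columns=['SampleID', 'Species'])
scaler = StandardScaler()
scaled_features = scaler.fit_transform(features)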

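# PCA analysis: project the samples onto the first two principal components
pca = PCA(n_components=2)
principal_components = pca.fit_transform(scaled_features)
# Reuse the original index so the species labels align row by row
df_pca = pd.DataFrame(principal_components, columns=['PC1', 'PC2'], index=data.index)
df_pca['Species'] = data['Species']
# Percentage of total variance captured by each component
explained = pca.explained_variance_ratio_ * 100

# Plot PCA scores, coloured by species
plt.figure(figsize=(8, 6))
sns.scatterplot(x='PC1', y='PC2', hue='Species', data=df_pca, palette='deep')
plt.title('PCA of Coffee Metabolomic Profiles')
plt.xlabel(f'PC1 ({explained[0]:.1f}% of variance)')
plt.ylabel(f'PC2 ({explained[1]:.1f}% of variance)')
plt.show()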

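The cell above standardizes the features and projects the samples onto the first two principal components, so that samples can be compared by species in the score plot. Hierarchical clustering offers a complementary view, either of the samples (a dendrogram) or of the compounds (a clustered correlation heatmap).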
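A clustered heatmap of compound correlations could be sketched as follows. This is a minimal illustration under stated assumptions, not the study's method: the data are synthetic, the column names (`compound_1` to `compound_4`) are hypothetical, and the linkage settings are arbitrary choices. With the real dataset, the `features` table from the PCA cell would take the place of the synthetic one.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for a samples x compounds table. The column names are
# hypothetical and are NOT taken from the study's dataset.
rng = np.random.default_rng(0)
n_samples = 12
base_a = rng.normal(size=n_samples)
base_b = rng.normal(size=n_samples)
features = pd.DataFrame({
    "compound_1": base_a + 0.1 * rng.normal(size=n_samples),
    "compound_2": base_a + 0.1 * rng.normal(size=n_samples),
    "compound_3": base_b + 0.1 * rng.normal(size=n_samples),
    "compound_4": base_b + 0.1 * rng.normal(size=n_samples),
})

# Pairwise Pearson correlation between compounds (columns)
corr = features.corr()

# Clustered heatmap: rows and columns are reordered by hierarchical
# clustering so that compounds with similar correlation patterns sit together
grid = sns.clustermap(
    corr,
    method="average",    # linkage method (an arbitrary choice for this sketch)
    metric="euclidean",  # distance between rows of the correlation matrix
    cmap="vlag",
    vmin=-1,
    vmax=1,
    figsize=(6, 6),
)
grid.ax_col_dendrogram.set_title("Compound correlation clustermap (synthetic data)")

# Row order chosen by the clustering, as positions into corr.index
row_order = list(grid.dendrogram_row.reordered_ind)
```

In a notebook the figure renders when the cell finishes. `row_order` gives the compound ordering chosen by the clustering, which helps when reading blocks of co-varying compounds off the heatmap.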

In [None]:
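import scipy.cluster.hierarchy as sch

# Hierarchical clustering of samples: complete linkage on Euclidean distances
linkage_matrix = sch.linkage(scaled_features, method='complete', metric='euclidean')

# Plot the dendrogram, labelling each leaf with its species
plt.figure(figsize=(10, 7))
sch.dendrogram(linkage_matrix, labels=data['Species'].values, leaf_rotation=90)
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('Sample (labelled by species)')
plt.ylabel('Euclidean distance')
plt.show()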

This dendrogram shows how the coffee samples group by metabolomic profile. If samples of the same species fall within the same branches, the clustering supports the interspecific differences reported in the study.
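To check more directly whether the branches follow species, the tree can be cut into flat clusters and cross-tabulated against the species labels. The sketch below assumes synthetic data with two hypothetical groups (`species_A`, `species_B`). The choice of two clusters is illustrative only and would need to match the number of species in the real dataset.

```python
import numpy as np
import pandas as pd
import scipy.cluster.hierarchy as sch

# Synthetic stand-in: two hypothetical groups of 5 samples x 6 features,
# separated by a large mean shift. Names and sizes are illustrative only.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.5, size=(5, 6))
group_b = rng.normal(loc=5.0, scale=0.5, size=(5, 6))
profiles = np.vstack([group_a, group_b])
species = np.array(["species_A"] * 5 + ["species_B"] * 5)

# Same linkage settings as the dendrogram cell
linkage_matrix = sch.linkage(profiles, method="complete", metric="euclidean")

# Cut the tree so that at most two flat clusters remain
cluster_ids = sch.fcluster(linkage_matrix, t=2, criterion="maxclust")

# Rows: species labels; columns: cluster assignments
agreement = pd.crosstab(
    pd.Series(species, name="Species"),
    pd.Series(cluster_ids, name="Cluster"),
)
print(agreement)
```

A table with a single non-zero entry per row means each species falls entirely within one cluster. Counts spread across several columns indicate that the clustering does not separate the species cleanly.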






### [Created with BioloGPT](https://biologpt.com/?q=Paper%20Review%3A%20Metabolomic%20insights%20into%20the%20Arabica-like%20flavour%20of%20stenophylla%20coffee%20and%20the%20chemistry%20of%20quality%20coffee)
***