This notebook section loads the preprocessed TCGA, CCLE, and 10X Visium expression matrices, builds the mRNA feature matrix and the miRNA target matrix, fits an XGBoost regressor, and reports the MSE and R² on the data it was fitted to.

In [None]:
import pandas as pd
import xgboost as xgb
from sklearn.metrics import mean_squared_error, r2_score

# Load preprocessed bulk RNA-seq and spatial transcriptomics data
bulk_data = pd.read_csv('X_ranked_tcga_ccle_mrna.csv', index_col=0)
mirna_data = pd.read_csv('X_ranked_tcga_ccle_mirna.csv', index_col=0)
spatial_data = pd.read_csv('X_st_mrna.csv', index_col=0)

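# Prepare feature matrix and target variable.
# Assumption: rows are samples (or spots) and columns are genes / miRNAs.
# Keep samples that have both mRNA and miRNA profiles, and genes that are
# also measured in the spatial data so the model can later be applied to it.
shared_samples = bulk_data.index.intersection(mirna_data.index)
shared_genes = bulk_data.columns.intersection(spatial_data.columns)
X = bulk_data.loc[shared_samples, shared_genes]
y = mirna_data.loc[shared_samples]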

# Train XGBoost model
model = xgb.XGBRegressor(max_depth=4, n_estimators=100, learning_rate=0.1)
model.fit(X, y)

# Predict and evaluate on the training data (in-sample, so these metrics are optimistic)
predictions = model.predict(X)
mse = mean_squared_error(y, predictions)
r2 = r2_score(y, predictions)
print('MSE:', mse, 'R2:', r2)
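
Because the cell above scores the model on the same samples it was fitted to, the reported MSE and R² will tend to be optimistic. The following is a minimal sketch of a held-out evaluation, not the authors' protocol: `evaluate_holdout`, the 80/20 split, and the fixed seed are illustrative assumptions. It accepts any scikit-learn-style regressor, so an `xgb.XGBRegressor` could be passed in unchanged.

```python
from sklearn.base import clone
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def evaluate_holdout(model, X, y, test_size=0.2, random_state=0):
    """Fit a fresh copy of `model` on a training split and score held-out rows.

    Hypothetical helper for illustration; the split fraction and seed are
    assumptions, not values taken from the original pipeline.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    # clone() gives an unfitted copy so the caller's model is left untouched.
    fitted = clone(model).fit(X_train, y_train)
    y_pred = fitted.predict(X_test)
    return {
        "mse": mean_squared_error(y_test, y_pred),
        "r2": r2_score(y_test, y_pred),
        "n_train": len(X_train),
        "n_test": len(X_test),
    }
```

With the objects defined in the cell above, `evaluate_holdout(model, X, y)` would return a dictionary of test-set metrics that can be compared against the in-sample values.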


This section computes the Spearman correlation between observed and predicted expression for each miRNA and plots the distribution of the resulting coefficients.

In [None]:
import scipy.stats as stats
import matplotlib.pyplot as plt

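# Calculate the Spearman correlation for each miRNA (one column of y per miRNA).
# spearmanr returns NaN when either input is constant.
correlations = []
for i, col in enumerate(y.columns):
    corr, _ = stats.spearmanr(y[col], predictions[:, i])
    correlations.append(corr)

plt.figure(figsize=(8,4))
# Drop NaN coefficients before plotting; plt.hist cannot bin them.
plt.hist(pd.Series(correlations).dropna(), bins=20, color='#6A0C76', edgecolor='black')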
plt.title('Distribution of Spearman Correlations for miRNA Predictions')
plt.xlabel('Spearman Correlation')
plt.ylabel('Frequency')
plt.show()
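
Spearman correlation is the Pearson correlation of rank-transformed values, so the per-miRNA loop can also be written column-wise in pandas. The sketch below assumes `observed` and `predicted` are DataFrames with matching row and column labels; `columnwise_spearman` is a hypothetical helper name, not part of the original pipeline.

```python
import pandas as pd


def columnwise_spearman(observed, predicted):
    """Spearman correlation between matching columns of two DataFrames.

    Each column is ranked (ties receive average ranks), then the Pearson
    correlation of the ranks is taken column by column. Constant columns
    yield NaN because their rank variance is zero.
    """
    return observed.rank().corrwith(predicted.rank())
```

With the notebook's objects, `columnwise_spearman(y, pd.DataFrame(predictions, index=y.index, columns=y.columns))` should agree with the loop's values up to floating-point error.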


The final cell saves the per-miRNA Spearman correlations to a CSV file, completing this evaluation of STmiR's XGBoost-based predictions on the integrated datasets.

In [None]:
# Final evaluation and saving results
results_df = pd.DataFrame({'miRNA': y.columns, 'Spearman_Correlation': correlations})
results_df.to_csv('STmiR_evaluation_results.csv', index=False)
print('Evaluation results saved.')
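
A table of per-miRNA coefficients is easier to interpret alongside a few summary numbers. The sketch below is one possible way to summarize it; `summarize_correlations` and the `threshold` of 0.5 are illustrative assumptions rather than anything specified by the source, and the function expects the `Spearman_Correlation` column written by the cell above.

```python
import pandas as pd


def summarize_correlations(results, threshold=0.5):
    """Summarize per-miRNA Spearman correlations.

    `threshold` is an arbitrary illustrative cutoff, not a value from the
    original pipeline. NaN coefficients are counted and then excluded.
    """
    rho = results["Spearman_Correlation"]
    valid = rho.dropna()
    return {
        "n_mirnas": int(len(rho)),
        "n_undefined": int(rho.isna().sum()),
        "median": float(valid.median()),
        "fraction_above_threshold": float((valid > threshold).mean()),
    }
```

Calling `summarize_correlations(results_df)`, or passing the frame read back with `pd.read_csv('STmiR_evaluation_results.csv')`, would give a compact overview to report next to the histogram.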


### [Created with BioloGPT](https://biologpt.com/?q=Paper%20Review%3A%20STmiR%3A%20A%20Novel%20XGBoost-Based%20Framework%20for%20Spatially%20Resolved%20miRNA%20Activity%20Prediction)
***