This notebook loads the TCGA high-grade serous ovarian cancer (HGSOC) dataset from local CSV files, extracts copy number and expression data for RNF144B and PPP2R2A, and joins the expression data with clinical annotations for survival analysis.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Load the TCGA HGSOC tables from local CSV files (paths assumed to exist)
cn_data = pd.read_csv('tcga_hgsoc_copy_number.csv')
expr_data = pd.read_csv('tcga_hgsoc_expression.csv')
clinical = pd.read_csv('tcga_hgsoc_clinical.csv')

# Extract copy number rows for each gene
rnf144b_cn = cn_data[cn_data['Gene'] == 'RNF144B']
ppp2r2a_cn = cn_data[cn_data['Gene'] == 'PPP2R2A']

# Extract expression rows for each gene
rnf144b_expr = expr_data[expr_data['Gene'] == 'RNF144B']
ppp2r2a_expr = expr_data[expr_data['Gene'] == 'PPP2R2A']

# Pair copy number with expression per patient. Plotting the two filtered
# frames directly would align rows by index rather than by patient.
# Assumes the copy number table also carries a 'PatientID' column.
rnf144b = pd.merge(rnf144b_cn, rnf144b_expr, on=['PatientID', 'Gene'])
ppp2r2a = pd.merge(ppp2r2a_cn, ppp2r2a_expr, on=['PatientID', 'Gene'])

# Scatter plots: copy number vs expression, one panel per gene
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
sns.scatterplot(data=rnf144b, x='CopyNumber', y='Expression', ax=axes[0])
axes[0].set_title('RNF144B Copy Number vs Expression')
sns.scatterplot(data=ppp2r2a, x='CopyNumber', y='Expression', ax=axes[1])
axes[1].set_title('PPP2R2A Copy Number vs Expression')
plt.tight_layout()
plt.show()
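
The scatter plots are only meaningful if each point pairs one patient's copy number with that same patient's expression value. Below is a minimal sketch of that pairing on small made-up tables in the long format the notebook assumes (`PatientID`, `Gene`, `CopyNumber`, `Expression` columns). The values and the helper names `pair_cn_and_expression` and `spearman` are illustrative and not part of the TCGA files. The sketch also computes a Spearman correlation as the Pearson correlation of ranks.

```python
import pandas as pd

def pair_cn_and_expression(cn_data, expr_data, gene):
    """Return one row per patient with CopyNumber and Expression for `gene`."""
    cn = cn_data.loc[cn_data['Gene'] == gene, ['PatientID', 'CopyNumber']]
    expr = expr_data.loc[expr_data['Gene'] == gene, ['PatientID', 'Expression']]
    # Inner join: keep only patients measured on both platforms.
    return pd.merge(cn, expr, on='PatientID', how='inner')

def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of average ranks."""
    return x.rank().corr(y.rank())

# Toy data (hypothetical values); the two tables list patients in different order
# and do not cover exactly the same patients.
toy_cn = pd.DataFrame({
    'PatientID': ['P1', 'P2', 'P3', 'P4'],
    'Gene': ['RNF144B'] * 4,
    'CopyNumber': [1.0, 2.0, 3.0, 4.0],
})
toy_expr = pd.DataFrame({
    'PatientID': ['P4', 'P3', 'P2', 'P5'],
    'Gene': ['RNF144B'] * 4,
    'Expression': [9.5, 8.0, 7.1, 6.0],
})

paired = pair_cn_and_expression(toy_cn, toy_expr, 'RNF144B')
rho = spearman(paired['CopyNumber'], paired['Expression'])
print(paired)
print(f"Spearman rho = {rho:.2f}")
```

Merging on `PatientID` drops patients missing from either table, so the number of plotted points can be smaller than either input; checking `len(paired)` against the inputs is a quick sanity test.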

Next, we perform a survival analysis that compares patient outcomes between high and low RNF144B expression groups using Kaplan-Meier curves.

In [None]:
from lifelines import KaplanMeierFitter

# Merge expression data with clinical data (inner join keeps patients present in both)
merged = pd.merge(clinical, rnf144b_expr, on='PatientID')
# lifelines rejects missing durations or event flags, so drop incomplete rows
merged = merged.dropna(subset=['SurvivalTime', 'Event', 'Expression'])

# Split patients at the median expression (ties at the median go to 'High')
median_expr = merged['Expression'].median()
merged['Group'] = np.where(merged['Expression'] >= median_expr, 'High', 'Low')

# Kaplan-Meier analysis: fit one curve per group and draw both on the same axes
kmf = KaplanMeierFitter()

fig, ax = plt.subplots(figsize=(8, 6))
for name, grouped_df in merged.groupby('Group'):
    kmf.fit(grouped_df['SurvivalTime'], event_observed=grouped_df['Event'], label=name)
    kmf.plot_survival_function(ax=ax)

ax.set_title('RNF144B Expression and Survival in HGSOC')
ax.set_xlabel('Time (days)')
ax.set_ylabel('Survival Probability')
plt.show()
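
Kaplan-Meier curves show a difference between groups but do not test it. A log-rank test is the usual companion, and lifelines provides one as `lifelines.statistics.logrank_test`. To make the calculation explicit, here is a hand-rolled two-group sketch in NumPy on toy data. The function name `logrank_two_groups` and the numbers are illustrative and do not come from the notebook or from TCGA.

```python
import math
import numpy as np

def logrank_two_groups(time_a, event_a, time_b, event_b):
    """Two-sample log-rank test. Returns (chi-square statistic, p-value)."""
    times = np.concatenate([time_a, time_b]).astype(float)
    events = np.concatenate([event_a, event_b]).astype(bool)
    in_a = np.concatenate([np.ones(len(time_a), bool), np.zeros(len(time_b), bool)])

    observed_minus_expected = 0.0
    variance = 0.0
    for t in np.unique(times[events]):           # each distinct event time
        at_risk = times >= t                     # still under observation at t
        n = at_risk.sum()                        # at risk in both groups
        n_a = (at_risk & in_a).sum()             # at risk in group A
        d = (events & (times == t)).sum()        # events at t, both groups
        d_a = (events & (times == t) & in_a).sum()
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:                                # hypergeometric variance term
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("no informative event times; the test is undefined")
    chi2 = float(observed_minus_expected ** 2 / variance)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Toy data (hypothetical): group A fails early, group B fails late, no censoring.
early_times, early_events = list(range(1, 11)), [1] * 10
late_times, late_events = list(range(11, 21)), [1] * 10

chi2, p_value = logrank_two_groups(early_times, early_events, late_times, late_events)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2g}")
```

With the notebook's `merged` frame, the inputs would be the `SurvivalTime` and `Event` columns of the 'High' and 'Low' groups. A median split is a convenience threshold, so a significant result here is exploratory rather than confirmatory.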

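The code above demonstrates a basic integration of copy number and expression data with clinical outcomes, a first step toward validating the roles of RNF144B and PPP2R2A in ovarian cancer. The survival analysis covers RNF144B only; repeating it for PPP2R2A means merging `ppp2r2a_expr` with the clinical table instead.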






### [Created with BioloGPT](https://biologpt.com/?q=Paper%20Review%3A%20Characterization%20of%20RNF144B%20and%20PPP2R2A%20identified%20by%20a%20novel%20approach%20using%20TCGA%20data%20in%20ovarian%20cancer%20%5B2025%5D)
***