# TCGA Cancer Types Case Analysis

This notebook analyzes data from the National Cancer Institute's *The Cancer Genome Atlas* (TCGA) program.  
TCGA molecularly characterized over 20,000 primary cancer and matched normal samples across 33 cancer types.  
The table below lists cancer types studied by TCGA with the number of cases characterized.  
We'll explore which cancer types have the most cases and visualize the distribution of cases.


In [None]:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline


In [None]:

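# Create DataFrame
# NOTE: `data` is not defined in the cells shown here; it must already hold the
# TCGA table with 'Cancer_Type' and 'Cases' columns before this cell runs.
cancer_df = pd.DataFrame(data)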
# Sort by descending cases
cancer_df_sorted = cancer_df.sort_values('Cases', ascending=False)
cancer_df_sorted.head()
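
The cell above expects `data` to be something `pd.DataFrame` can turn into a table with `Cancer_Type` and `Cases` columns. The sketch below shows one shape that would work, a list of records, along with a quick column check. The rows are placeholders for illustration only and are not TCGA counts; the names `example_data`, `example_df`, and `example_sorted` are hypothetical.

```python
import pandas as pd

# Placeholder records, NOT real TCGA counts; replace with values from the TCGA table.
example_data = [
    {"Cancer_Type": "Example Type A", "Cases": 120},
    {"Cancer_Type": "Example Type B", "Cases": 300},
    {"Cancer_Type": "Example Type C", "Cases": 45},
]

example_df = pd.DataFrame(example_data)

# Fail early if the columns used by the later cells are missing.
required = {"Cancer_Type", "Cases"}
missing = required - set(example_df.columns)
if missing:
    raise ValueError(f"missing columns: {sorted(missing)}")

# Same sort as the notebook cell, with a fresh 0..n-1 index for readability.
example_sorted = example_df.sort_values("Cases", ascending=False).reset_index(drop=True)
print(example_sorted)
```

A dict of column lists, such as `{"Cancer_Type": [...], "Cases": [...]}`, would work equally well with `pd.DataFrame`.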


In [None]:

# Visualize top 10 cancer types by number of cases
plt.figure(figsize=(12,6))
sns.barplot(x='Cases', y='Cancer_Type', data=cancer_df_sorted.head(10), palette='viridis')
plt.title('Top 10 TCGA Cancer Types by Number of Cases')
plt.xlabel('Number of Cases')
plt.ylabel('Cancer Type')
plt.tight_layout()
plt.show()


In [None]:

# Pie chart of all cancer types case distribution
plt.figure(figsize=(10,10))
plt.pie(cancer_df['Cases'], labels=cancer_df['Cancer_Type'], autopct='%1.1f%%', startangle=140)
plt.title('Distribution of Cases Across TCGA Cancer Types')
plt.tight_layout()
plt.show()



## Insights

- **Breast Ductal Carcinoma** has the highest number of characterized cases (778), followed by **Colorectal Adenocarcinoma**, **Glioblastoma Multiforme**, **Ovarian Serous Adenocarcinoma**, and **Lung Adenocarcinoma**.  
- **Cholangiocarcinoma** and **Uterine Carcinosarcoma** have the fewest cases, reflecting the rarity of these cancers.  
- The distribution chart shows that a handful of cancer types account for a large share of the total cases, while many types have relatively small sample sizes.  
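
One way to put a number on the concentration described in the last bullet is a cumulative share of cases. This is a minimal sketch on placeholder counts, not the TCGA figures; `toy_df`, `ranked`, and the 80% `threshold` are assumptions chosen for illustration. Substituting `cancer_df` for `toy_df` should give the corresponding result for the real table.

```python
import pandas as pd

# Placeholder counts for illustration only, not TCGA figures.
toy_df = pd.DataFrame({
    "Cancer_Type": ["A", "B", "C", "D", "E"],
    "Cases": [500, 250, 120, 80, 50],
})

# Rank types by case count and compute each type's share of the total.
ranked = toy_df.sort_values("Cases", ascending=False).reset_index(drop=True)
ranked["Share"] = ranked["Cases"] / ranked["Cases"].sum()
ranked["Cumulative_Share"] = ranked["Share"].cumsum()

# Smallest number of top-ranked types that together reach the threshold.
threshold = 0.80  # assumed cutoff; adjust as needed
n_types = min(int((ranked["Cumulative_Share"] < threshold).sum()) + 1, len(ranked))

print(ranked)
print(f"{n_types} types cover at least {threshold:.0%} of cases")
```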

These case numbers can guide researchers in choosing well-powered datasets for machine-learning models and highlight underrepresented cancers where more data collection is needed.
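
As a rough illustration of that screening step, the sketch below flags types that fall under a minimum case count. The cutoff `MIN_CASES = 100` is a hypothetical value, the counts are placeholders rather than TCGA figures, and a simple threshold is only a first-pass filter, not a substitute for a proper power analysis.

```python
import pandas as pd

# Placeholder counts for illustration only, not TCGA figures.
toy_df = pd.DataFrame({
    "Cancer_Type": ["A", "B", "C", "D", "E"],
    "Cases": [500, 250, 120, 80, 50],
})

MIN_CASES = 100  # hypothetical cutoff; choose based on the intended model and task

# Mark each type as meeting or missing the cutoff.
toy_df["Meets_Cutoff"] = toy_df["Cases"] >= MIN_CASES

# Types below the cutoff are candidates for further data collection.
underrepresented = toy_df.loc[~toy_df["Meets_Cutoff"], "Cancer_Type"].tolist()
print(underrepresented)
```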
