# Model with TCGA data

The purpose of this notebook is to prototype an ML model using PyTorch and wandb. I'll base it on TCGA (The Cancer Genome Atlas) data, specifically the data from the following [paper](https://www.cell.com/cancer-cell/fulltext/S1535-6108(17)30053-3).

Disclaimer: I'm not a cancer expert and I'm doing this without guidance from one.

In [1]:
from pathlib import Path

import numpy as np
import pandas as pd
import seaborn as sns

In [2]:
%matplotlib inline
rng = np.random.default_rng(12345)
sns.set_context("talk")
sns.set_palette("colorblind")

In [4]:
DATA_DIR = "/Users/benlacar/Documents/career/data_science/repos/wandb-explore/data/"

# Explore clinical data

In [5]:
# Reading .xlsx files with pandas requires the optional openpyxl package (pip install openpyxl)
df_clinical = pd.read_excel(Path(DATA_DIR, "Cherniack_s1_clinical_data.xlsx"))
df_clinical.head()

ImportError: Missing optional dependency 'openpyxl'.  Use pip or conda to install openpyxl.
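The ImportError above comes from pandas handing `.xlsx` parsing to the optional openpyxl package, which isn't installed in this environment. A minimal sketch, standard library only, of checking for the dependency before calling `pd.read_excel`; `has_optional_dependency` is a hypothetical helper, not part of pandas.

```python
import importlib.util


def has_optional_dependency(module_name: str) -> bool:
    """Return True if the top-level module `module_name` is installed.

    importlib.util.find_spec returns None for a missing top-level module,
    so the module itself is never imported here.
    """
    return importlib.util.find_spec(module_name) is not None


# Check up front instead of waiting for the ImportError raised inside read_excel.
if not has_optional_dependency("openpyxl"):
    print("openpyxl is not installed; run `pip install openpyxl` before reading .xlsx files.")
```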

In [4]:
import wandb

# Run the W&B quickstart

In [5]:
wandb.login()

[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
[34m[1mwandb[0m: Paste an API key from your profile and hit enter, or press ctrl+c to quit:

  ········


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /Users/benlacar/.netrc


True
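The login output notes that the API key is appended to the netrc file under the host `api.wandb.ai`. A minimal sketch, standard library only, of checking whether such an entry already exists; `stored_wandb_key` is a hypothetical helper, and its default host simply mirrors the log line above. The returned value is a secret, so it shouldn't be printed.

```python
import netrc
from pathlib import Path
from typing import Optional


def stored_wandb_key(netrc_path: Path, host: str = "api.wandb.ai") -> Optional[str]:
    """Return the password (API key) stored for `host` in a netrc file, or None."""
    if not netrc_path.exists():
        return None
    entry = netrc.netrc(str(netrc_path)).authenticators(host)
    # authenticators() returns (login, account, password), or None if the host is absent
    return entry[2] if entry else None


# Report only whether a key is present, never the key itself.
print("W&B key stored:", stored_wandb_key(Path.home() / ".netrc") is not None)
```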

In [6]:
run = wandb.init(
    # Set the project where this run will be logged
    project="my-awesome-project",
    # Track hyperparameters and run metadata
    config={
        "learning_rate": 0.01,
        "epochs": 10,
    },
)

[34m[1mwandb[0m: Currently logged in as: [33mbenslack19[0m ([33mbenslack19-self[0m). Use [1m`wandb login --relogin`[0m to force relogin


In [9]:
# train.py
import random  # for demo script

epochs = 10
lr = 0.01

run = wandb.init(
    # Set the project where this run will be logged
    project="my-awesome-project",
    # Track hyperparameters and run metadata
    config={
        "learning_rate": lr,
        "epochs": epochs,
    },
)

offset = random.random() / 5
print(f"lr: {lr}")

# simulating a training run
for epoch in range(2, epochs):
    acc = 1 - 2**-epoch - random.random() / epoch - offset
    loss = 2**-epoch + random.random() / epoch + offset
    print(f"epoch={epoch}, accuracy={acc}, loss={loss}")
    wandb.log({"accuracy": acc, "loss": loss})

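# run.log_code()

# Finish the run so it is fully synced before another wandb.init call starts a new one
run.finish()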



lr: 0.01
epoch=2, accuracy=0.39077614965655927, loss=0.42465757790786096
epoch=3, accuracy=0.5793069813839529, loss=0.40045406666139094
epoch=4, accuracy=0.7569201383804353, loss=0.43469313869166704
epoch=5, accuracy=0.6330842043062369, loss=0.23222180389233693
epoch=6, accuracy=0.7289425020942291, loss=0.25788110512560247
epoch=7, accuracy=0.7924265661406995, loss=0.27971094359898496
epoch=8, accuracy=0.7472078685632099, loss=0.16951062213499507
epoch=9, accuracy=0.7557763101533337, loss=0.16531536937785601
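The quickstart draws from the unseeded global `random` module, so the simulated curves change on every rerun, and the seeded `rng` created in In [2] is never used. A minimal sketch of a reproducible variant, assuming a dedicated `random.Random(seed)` is acceptable in place of the notebook's NumPy generator; `simulate_training` is a hypothetical helper that returns the metrics instead of logging them. Because `random()` is uniform on [0, 1) with mean 1/2, the expected loss at epoch e is 2^-e + 1/(2e) plus the fixed per-run offset.

```python
import random


def simulate_training(epochs: int = 10, seed: int = 12345) -> list[dict[str, float]]:
    """Reproduce the quickstart's fake metrics with a seeded, private RNG.

    Same formulas as the cell above, but drawing from random.Random(seed)
    so reruns give identical curves. Returns one dict per epoch instead of
    calling wandb.log, so it runs without an active W&B run.
    """
    rng = random.Random(seed)
    offset = rng.random() / 5  # fixed per-run shift in [0, 0.2)
    history = []
    for epoch in range(2, epochs):  # starts at 2, as in the quickstart
        acc = 1 - 2**-epoch - rng.random() / epoch - offset
        loss = 2**-epoch + rng.random() / epoch + offset
        history.append({"epoch": epoch, "accuracy": acc, "loss": loss})
    return history


# Each record could be passed to run.log(...) inside an active run.
for row in simulate_training():
    print(row)
```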


In [8]:
%load_ext watermark
%watermark -n -u -v -iv -w

Last updated: Thu Dec 05 2024

Python implementation: CPython
Python version       : 3.11.11
IPython version      : 8.30.0

pandas    : 2.2.3
wandb     : 0.19.0
seaborn   : 0.13.2
matplotlib: 3.9.2
numpy     : 2.0.2

Watermark: 2.5.0
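`%watermark` prints package versions into the notebook but doesn't attach them to a W&B run. A minimal sketch, standard library only, of collecting the same versions as a dict, which could then be merged into the `config` passed to `wandb.init`; `package_versions` is a hypothetical helper, and missing packages map to None rather than raising.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def package_versions(names: list[str]) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


# The same packages that %watermark reports above.
print(package_versions(["pandas", "wandb", "seaborn", "matplotlib", "numpy"]))
```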

