Fraction Unbound (Human)
Description: Fraction unbound (FU) refers to the proportion of a small molecule drug that is not bound to proteins in the bloodstream of humans. FU is an important pharmacokinetic property because only the unbound fraction of a drug is typically available to exert pharmacological effects or be metabolized and eliminated from the body. Therefore, it directly influences the drug's potency, efficacy, and potential for adverse effects.



In pharmacokinetics and pharmacology, Fraction Unbound (Human), also known as fu (human), refers to the fraction of a drug that is unbound or free in the plasma. It represents the proportion of the drug that is not bound to plasma proteins and is available for distribution and pharmacological action.

High Fraction Unbound (fu): A high fraction unbound indicates that a larger portion of the drug is in its free form and available for distribution to tissues and interaction with its target receptors or enzymes. This can lead to increased pharmacological activity and efficacy, as a higher concentration of the drug is present in the bloodstream and able to exert its effects.

Low Fraction Unbound (fu): Conversely, a low fraction unbound suggests that a significant portion of the drug is bound to plasma proteins, reducing its availability for distribution and pharmacological action. While a low fu may increase the drug's plasma half-life and stability, it can also decrease its pharmacological activity and efficacy as less free drug is available to interact with target sites.

The optimal fraction unbound for a given drug depends on various factors, including its pharmacokinetic and pharmacodynamic properties, therapeutic index, and desired clinical outcomes. Therefore, the significance of the fraction unbound in drug therapy depends on the specific context and the therapeutic goals of the treatment.

In [1]:
import pandas as pd

In [10]:
!pip install rdkit
!pip install Sklearn
!pip install tensorflow
import numpy as np
from rdkit import Chem
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import tensorflow as tf
from tensorflow.keras import layers, models

Collecting rdkit
  Downloading rdkit-2023.9.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.9 kB)
Downloading rdkit-2023.9.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (34.9 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m34.9/34.9 MB[0m [31m36.6 MB/s[0m eta [36m0:00:00[0m:00:01[0m00:01[0m
[?25hInstalling collected packages: rdkit
Successfully installed rdkit-2023.9.6
Collecting Sklearn
  Downloading sklearn-0.0.post12.tar.gz (2.6 kB)
  Installing build dependencies ... [?25ldone
[?25h  Getting requirements to build wheel ... [?25lerror
  [1;31merror[0m: [1msubprocess-exited-with-error[0m
  
  [31m×[0m [32mGetting requirements to build wheel[0m did not run successfully.
  [31m│[0m exit code: [1;36m1[0m
  [31m╰─>[0m [31m[15 lines of output][0m
  [31m   [0m The 'sklearn' PyPI package is deprecated, use 'scikit-learn'
  [31m   [0m rather than 'sklearn' for pip commands.
  [31m   [0m 
  [31m   [0m

2024-05-15 16:55:36.146458: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-05-15 16:55:36.149740: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-05-15 16:55:36.187992: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [11]:
data_fu = pd.read_csv("fu_train.csv", header=0)
data_fu.columns = ['smiles', 'label', 'group']

In [12]:
data_fu['Molecule'] = data_fu['smiles'].apply(Chem.MolFromSmiles)

In [45]:
data_fu.shape

(1901, 4)

In [15]:
from rdkit.Chem import Descriptors, AllChem
# Function to calculate all molecular descriptors for a molecule
def calculate_all_descriptors(molecule):
    descriptors = {}
    for descriptor, descriptor_fn in Descriptors.descList:
        descriptors[descriptor] = descriptor_fn(molecule)
    return descriptors

# Calculate all molecular descriptors for each molecule
all_descriptors = data_fu['Molecule'].apply(calculate_all_descriptors)

# Convert dictionary of descriptors into dataframe
descriptor_df = pd.DataFrame(all_descriptors.tolist())

# Concatenate the original dataframe with the descriptor dataframe
data_fu_descriptor = pd.concat([data_fu, descriptor_df], axis=1)

In [83]:
data_fu_descriptor.columns[data_fu_descriptor.isna().any()].tolist()

[]

In [84]:
list_desc =  [descr[0] for descr in Descriptors.descList]

In [98]:
X = data_fu_descriptor[list_desc].values
y = data_fu_descriptor['label'].values

In [99]:
X.shape

(1901, 210)

In [103]:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Trying without scaler to capture variability 
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.fit_transform(X_test)

In [104]:
X_train_scaled.shape

(1520, 210)

In [105]:
learning_rate = 0.001
from keras import optimizers

model = models.Sequential([
    layers.Dense(200, activation='relu', input_shape=(X_train_scaled.shape[1],)),
    layers.Dense(400, activation='relu'),  # Increased complexity
    layers.Dropout(0.2),  # Regularization
    layers.Dense(200, activation='relu'),
    layers.Dense(1)  # Output layer
])

# Compile the model with a lower learning rate
model.compile(optimizer=optimizers.Adam(learning_rate=learning_rate), loss='mean_squared_error')

# Train the model with more epochs
model.fit(X_train_scaled, y_train, epochs=200, batch_size=15, verbose=1)


Epoch 1/200


  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


[1m102/102[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 2ms/step - loss: 0.6026
Epoch 2/200
[1m102/102[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 0.3077
Epoch 3/200
[1m102/102[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 0.1988
Epoch 4/200
[1m102/102[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 0.1727
Epoch 5/200
[1m102/102[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 0.1329
Epoch 6/200
[1m102/102[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 0.1408
Epoch 7/200
[1m102/102[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 0.0995
Epoch 8/200
[1m102/102[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 0.0974
Epoch 9/200
[1m102/102[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 0.0984
Epoch 10/200
[1m102/102[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss

<keras.src.callbacks.history.History at 0x7fe5b062fac0>

In [113]:
loss = model.evaluate(X_test_scaled, y_test)
print("Test Loss:", loss)


[1m12/12[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 928us/step - loss: 0.2140
Test Loss: 0.22780200839042664


In [114]:
predictions = model.predict(X_test_scaled)

[1m12/12[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 979us/step


In [115]:
predictions = np.array(predictions).reshape(-1)  # Reshape predictions to be 1-dimensional
y_test = np.array(y_test).reshape(-1)            # Reshape y_test to be 1-dimensional


In [116]:
results = pd.DataFrame({'Predictions': predictions, 'Targets': y_test})
results

Unnamed: 0,Predictions,Targets
0,0.585859,1.481486
1,0.528993,0.744727
2,1.299636,1.301030
3,0.510466,0.301030
4,0.356507,0.221849
...,...,...
376,1.202487,1.301030
377,1.315314,2.522879
378,1.869944,2.000000
379,1.489623,1.187087


In [117]:
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y_test, predictions)

In [118]:
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_test, predictions)
rmse = np.sqrt(mse)

from sklearn.metrics import r2_score
r2 = r2_score(y_test, predictions)

In [119]:
print(r2, mse)

0.553965995273497 0.22780201894134502
