Fraction Unbound (Human)
Description: Fraction unbound (FU) refers to the proportion of a small molecule drug that is not bound to proteins in the bloodstream of humans. FU is an important pharmacokinetic property because only the unbound fraction of a drug is typically available to exert pharmacological effects or be metabolized and eliminated from the body. Therefore, it directly influences the drug's potency, efficacy, and potential for adverse effects.



In pharmacokinetics and pharmacology, Fraction Unbound (Human), also known as fu (human), refers to the fraction of a drug that is unbound or free in the plasma. It represents the proportion of the drug that is not bound to plasma proteins and is available for distribution and pharmacological action.

High Fraction Unbound (fu): A high fraction unbound indicates that a larger portion of the drug is in its free form and available for distribution to tissues and interaction with its target receptors or enzymes. This can lead to increased pharmacological activity and efficacy, as a higher concentration of the drug is present in the bloodstream and able to exert its effects.

Low Fraction Unbound (fu): Conversely, a low fraction unbound suggests that a significant portion of the drug is bound to plasma proteins, reducing its availability for distribution and pharmacological action. While a low fu may increase the drug's plasma half-life and stability, it can also decrease its pharmacological activity and efficacy as less free drug is available to interact with target sites.

The optimal fraction unbound for a given drug depends on various factors, including its pharmacokinetic and pharmacodynamic properties, therapeutic index, and desired clinical outcomes. Therefore, the significance of the fraction unbound in drug therapy depends on the specific context and the therapeutic goals of the treatment.

In [2]:
import pandas as pd

In [3]:
!pip install rdkit
!pip install Sklearn
!pip install tensorflow
import numpy as np
from rdkit import Chem
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import tensorflow as tf
from tensorflow.keras import layers, models
from keras.models import save_model
from keras import optimizers

Collecting Sklearn
  Using cached sklearn-0.0.post12.tar.gz (2.6 kB)
  Preparing metadata (setup.py) ... [?25lerror
  [1;31merror[0m: [1msubprocess-exited-with-error[0m
  
  [31m×[0m [32mpython setup.py egg_info[0m did not run successfully.
  [31m│[0m exit code: [1;36m1[0m
  [31m╰─>[0m [31m[15 lines of output][0m
  [31m   [0m The 'sklearn' PyPI package is deprecated, use 'scikit-learn'
  [31m   [0m rather than 'sklearn' for pip commands.
  [31m   [0m 
  [31m   [0m Here is how to fix this error in the main use cases:
  [31m   [0m - use 'pip install scikit-learn' rather than 'pip install sklearn'
  [31m   [0m - replace 'sklearn' by 'scikit-learn' in your pip requirements files
  [31m   [0m   (requirements.txt, setup.py, setup.cfg, Pipfile, etc ...)
  [31m   [0m - if the 'sklearn' package is used by one of your dependencies,
  [31m   [0m   it would be great if you take some time to track which package uses
  [31m   [0m   'sklearn' instead of 'scikit-le

2024-05-23 14:12:39.428619: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-05-23 14:12:40.638423: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-05-23 14:12:43.183628: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [4]:
data_fu = pd.read_csv("data/fu_train.csv", header=0)
data_fu.columns = ['smiles', 'label', 'group']

In [5]:
data_fu['Molecule'] = data_fu['smiles'].apply(Chem.MolFromSmiles)

In [6]:
data_fu.shape

(1901, 4)

In [14]:
from rdkit.Chem import Descriptors, AllChem
# Function to calculate all molecular descriptors for a molecule
def calculate_all_descriptors(molecule):
    descriptors = {}
    for descriptor, descriptor_fn in Descriptors.descList:
        descriptors[descriptor] = descriptor_fn(molecule)
    return descriptors

# Calculate all molecular descriptors for each molecule
all_descriptors = data_fu['Molecule'].apply(calculate_all_descriptors)

# Convert dictionary of descriptors into dataframe
descriptor_df = pd.DataFrame(all_descriptors.tolist())

# Concatenate the original dataframe with the descriptor dataframe
data_fu_descriptor = pd.concat([data_fu, descriptor_df], axis=1)

KeyboardInterrupt: 

In [8]:
data_fu_descriptor.columns[data_fu_descriptor.isna().any()].tolist()

[]

In [9]:
list_desc =  [descr[0] for descr in Descriptors.descList]

In [10]:
from rdkit import Chem
from rdkit.Chem import MACCSkeys

def smiles_to_maccs_fingerprint(smiles):
    """
    Convert a SMILES string to a MACCS fingerprint bit string.
    
    Parameters:
    - smiles (str): The SMILES representation of the molecule.
    
    Returns:
    - bitstring (str): The MACCS fingerprint represented as a bit string.
    """
    
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("Invalid SMILES string")

    fingerprint = MACCSkeys.GenMACCSKeys(mol)
    
    # Convert the fingerprint to a bit string
    bitstring = ''.join(['1' if fingerprint.GetBit(i) else '0' for i in range(fingerprint.GetNumBits())])
    
    return bitstring


In [11]:
#Apply fingerprint function on smiles in the df
data_fu_descriptor['maccs_fingerprint'] = data_fu_descriptor['smiles'].apply(smiles_to_maccs_fingerprint)


In [19]:
# Split the column into a list of substrings
data_fu_descriptor['maccs_fingerprint_list'] = data_fu_descriptor['maccs_fingerprint'].apply(list)

# Expand the list of substrings into separate columns
fingerprints_df = pd.DataFrame(data_fu_descriptor['maccs_fingerprint_list'].to_list(), columns=[f'bit_{i}' for i in range(1, 168)])




In [30]:
fingerprints_df

Unnamed: 0,bit_1,bit_2,bit_3,bit_4,bit_5,bit_6,bit_7,bit_8,bit_9,bit_10,...,bit_158,bit_159,bit_160,bit_161,bit_162,bit_163,bit_164,bit_165,bit_166,bit_167
0,0,0,0,0,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,0
1,0,0,0,0,0,0,0,0,0,0,...,1,1,1,0,1,1,1,1,1,0
2,0,0,0,0,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,0
3,0,0,0,0,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,0
4,0,0,0,0,0,0,0,0,0,0,...,1,1,1,0,1,1,1,1,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1896,0,0,0,0,0,0,0,0,0,0,...,0,1,0,0,1,1,1,0,1,0
1897,0,0,0,0,0,0,0,0,0,0,...,0,1,0,1,1,1,1,1,1,0
1898,0,0,0,0,0,0,0,0,1,0,...,1,1,1,0,1,1,1,1,1,0
1899,0,0,0,0,0,0,0,0,1,0,...,1,1,1,1,1,1,1,1,1,0


In [20]:
data_fu_des_fp = pd.concat([data_fu_descriptor, fingerprints_df], axis=1)

In [21]:
data_fu_des_fp

Unnamed: 0,smiles,label,group,Molecule,MaxAbsEStateIndex,MaxEStateIndex,MinAbsEStateIndex,MinEStateIndex,qed,SPS,...,bit_158,bit_159,bit_160,bit_161,bit_162,bit_163,bit_164,bit_165,bit_166,bit_167
0,CC(=N)N1CC[C@H](Oc2ccc(C(Cc3ccc4ccc(C(=N)N)cc4...,0.397940,training,<rdkit.Chem.rdchem.Mol object at 0x793959b75620>,12.073341,12.073341,0.005830,-0.880421,0.324103,16.515152,...,1,1,1,1,1,1,1,1,1,0
1,N=C(N)c1ccc(CNC(=O)C2CCCN2C(=O)C(NCC(=O)O)C(c2...,0.289037,training,<rdkit.Chem.rdchem.Mol object at 0x793959b75690>,14.068602,14.068602,0.021824,-1.070124,0.188924,15.815789,...,1,1,1,0,1,1,1,1,1,0
2,Cc1c(CC2=NN(Cc3ccc(F)cc3F)C(=O)CC2)c2cc(F)ccc2...,1.698970,training,<rdkit.Chem.rdchem.Mol object at 0x793959b75700>,14.036069,14.036069,0.136081,-1.019465,0.621431,14.187500,...,1,1,1,1,1,1,1,1,1,0
3,Cc1ccc2c(c1)c(-c1ccnc3c(Cl)cccc13)c(C)n2CC(=O)O,2.221849,training,<rdkit.Chem.rdchem.Mol object at 0x793959b75770>,11.386073,11.386073,0.074240,-0.861337,0.542446,11.346154,...,1,1,1,1,1,1,1,1,1,0
4,N=C(N)c1cc2c(OC(COC(=O)Nc3ccccc3CN3CCNCC3)c3cc...,1.301030,training,<rdkit.Chem.rdchem.Mol object at 0x793959b757e0>,12.887452,12.887452,0.022221,-0.532500,0.180119,14.631579,...,1,1,1,0,1,1,1,1,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1896,N#Cc1ccc(C(c2ccc(C#N)cc2)n2cncn2)cc1,0.392545,training,<rdkit.Chem.rdchem.Mol object at 0x793959bc5af0>,8.917602,8.917602,0.150013,-0.150013,0.740744,10.136364,...,0,1,0,0,1,1,1,0,1,0
1897,CCN(C(C)=O)c1cccc(-c2ccnc3c(C#N)cnn23)c1,0.397940,training,<rdkit.Chem.rdchem.Mol object at 0x793959bc5b60>,11.740122,11.740122,0.004277,-0.004277,0.745264,10.478261,...,0,1,0,1,1,1,1,1,1,0
1898,NC(C(=O)NC1C(=O)N2C(C(=O)O)=C(CSc3c[nH]nn3)CS[...,0.397940,training,<rdkit.Chem.rdchem.Mol object at 0x793959bc5bd0>,12.720935,12.720935,0.046558,-1.194469,0.279017,21.322581,...,1,1,1,0,1,1,1,1,1,0
1899,CC1=C(C(=O)O)N2C(=O)C(NC(=O)C(N)c3ccc(O)cc3)[C...,0.408935,training,<rdkit.Chem.rdchem.Mol object at 0x793959bc5c40>,12.324446,12.324446,0.011961,-1.151184,0.559704,23.600000,...,1,1,1,1,1,1,1,1,1,0


In [22]:
# Veryfing shape 
print(data_fu_des_fp.shape, data_fu.shape, fingerprints_df.shape, descriptor_df.shape)

(1901, 383) (1901, 4) (1901, 167) (1901, 210)


In [23]:
data_fu_des_fp = data_fu_des_fp.drop(['Molecule', 'group', 'smiles'], axis=1)

In [31]:
data_fu_des_fp = data_fu_des_fp.drop(['maccs_fingerprint'], axis=1)

In [32]:
data_fu_des_fp

Unnamed: 0,MaxAbsEStateIndex,MaxEStateIndex,MinAbsEStateIndex,MinEStateIndex,qed,SPS,MolWt,HeavyAtomMolWt,ExactMolWt,NumValenceElectrons,...,bit_158,bit_159,bit_160,bit_161,bit_162,bit_163,bit_164,bit_165,bit_166,bit_167
0,12.073341,12.073341,0.005830,-0.880421,0.324103,16.515152,444.535,416.311,444.216141,170,...,1,1,1,1,1,1,1,1,1,0
1,14.068602,14.068602,0.021824,-1.070124,0.188924,15.815789,533.654,502.406,533.209675,198,...,1,1,1,0,1,1,1,1,1,0
2,14.036069,14.036069,0.136081,-1.019465,0.621431,14.187500,443.425,423.265,443.145676,166,...,1,1,1,1,1,1,1,1,1,0
3,11.386073,11.386073,0.074240,-0.861337,0.542446,11.346154,364.832,347.696,364.097855,130,...,1,1,1,1,1,1,1,1,1,0
4,12.887452,12.887452,0.022221,-0.532500,0.180119,14.631579,529.666,498.418,529.214761,196,...,1,1,1,0,1,1,1,1,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1896,8.917602,8.917602,0.150013,-0.150013,0.740744,10.136364,285.310,274.222,285.101445,104,...,0,1,0,0,1,1,1,0,1,0
1897,11.740122,11.740122,0.004277,-0.004277,0.745264,10.478261,305.341,290.221,305.127660,114,...,0,1,0,1,1,1,1,1,1,0
1898,12.720935,12.720935,0.046558,-1.194469,0.279017,21.322581,462.513,444.369,462.078010,162,...,1,1,1,0,1,1,1,1,1,0
1899,12.324446,12.324446,0.011961,-1.151184,0.559704,23.600000,363.395,346.259,363.088892,132,...,1,1,1,1,1,1,1,1,1,0


In [25]:

y = data_fu_des_fp.pop('label')

(1901, 210)

In [33]:

X_train, X_test, y_train, y_test = train_test_split(data_fu_des_fp, y, test_size=0.2, random_state=42)
# Trying without scaler to capture variability 
scaler = StandardScaler()

print(X_train.info())
print(X_train.head())


<class 'pandas.core.frame.DataFrame'>
Index: 1520 entries, 1794 to 1126
Columns: 378 entries, MaxAbsEStateIndex to bit_167
dtypes: float64(106), int64(104), object(168)
memory usage: 4.4+ MB
None
      MaxAbsEStateIndex  MaxEStateIndex  MinAbsEStateIndex  MinEStateIndex  \
1794           8.903975        8.903975           0.188201        0.188201   
1775          11.151855       11.151855           0.177547       -0.503545   
339           10.406475       10.406475           0.193858       -0.863050   
824           13.083100       13.083100           0.133371       -0.955111   
733           13.426806       13.426806           0.035554       -3.279809   

           qed        SPS    MolWt  HeavyAtomMolWt  ExactMolWt  \
1794  0.389671  15.428571  204.314         180.122  204.183778   
1775  0.877602  30.250000  324.424         300.232  324.183778   
339   0.776519  12.714286  323.183         311.087  322.038816   
824   0.600540  18.606061  453.543         422.295  453.237604   
733  

In [34]:
object_columns = X_train.select_dtypes(include=['object']).columns
print("Columns with object data type:")
print(len(object_columns))

Columns with object data type:
168


In [35]:
def convert_to_float(column):
    try:
        return column.astype(float)
    except ValueError:
        # If conversion to float fails, fill with NaN
        return pd.to_numeric(column, errors='coerce')

# Apply the function to each column in the DataFrame
X_train = X_train.apply(convert_to_float)

In [37]:
X_test = X_test.apply(convert_to_float)

In [38]:
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.fit_transform(X_test)

  updated_mean = (last_sum + new_sum) / updated_sample_count
  T = new_sum / new_sample_count
  new_unnormalized_variance -= correction**2 / new_sample_count
  updated_mean = (last_sum + new_sum) / updated_sample_count
  T = new_sum / new_sample_count
  new_unnormalized_variance -= correction**2 / new_sample_count


In [39]:
X_train_scaled

array([[ -2.0020046 ,  -2.0020046 ,   0.38368272, ...,   0.17665672,
        -10.3716647 ,   0.        ],
       [ -0.80131464,  -0.80131464,   0.32196063, ...,   0.17665672,
          0.09641654,   0.        ],
       [ -1.19945428,  -1.19945428,   0.41645873, ...,   0.17665672,
          0.09641654,   0.        ],
       ...,
       [  0.94227699,   0.94227699,  -0.5722744 , ...,   0.17665672,
          0.09641654,   0.        ],
       [  0.24196653,   0.24196653,   0.17493629, ...,   0.17665672,
          0.09641654,   0.        ],
       [  0.12301281,   0.12301281,  -0.63906728, ...,   0.17665672,
          0.09641654,   0.        ]])

<rdkit.DataStructs.cDataStructs.ExplicitBitVect object at 0x77b471167290>


Mutagnicity prediction models 
1st layer 100 neurons Tanh
2nd 50 neurons 50 Tanh
3 1 neuron Tanh 
SGD
0.001
batch size 100 

In [40]:
learning_rate = 0.001
from keras import optimizers

model = models.Sequential([
    layers.Dense(200, activation='relu', input_shape=(X_train_scaled.shape[1],)),
    layers.Dense(400, activation='relu'),  # Increased complexity
    layers.Dropout(0.2),  # Regularization
    layers.Dense(200, activation='relu'),
    layers.Dense(1)  # Output layer
])

# Compile the model with a lower learning rate
model.compile(optimizer=optimizers.Adam(learning_rate=learning_rate), loss='mean_squared_error')

# Train the model with more epochs
model.fit(X_train_scaled, y_train, epochs=200, batch_size=15, verbose=1)


Epoch 1/200


  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


[1m102/102[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 2ms/step - loss: nan
Epoch 2/200
[1m102/102[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step - loss: nan
Epoch 3/200
[1m102/102[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step - loss: nan
Epoch 4/200
[1m102/102[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step - loss: nan
Epoch 5/200
[1m102/102[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step - loss: nan
Epoch 6/200
[1m102/102[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step - loss: nan
Epoch 7/200
[1m102/102[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step - loss: nan
Epoch 8/200
[1m102/102[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step - loss: nan
Epoch 9/200
[1m102/102[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step - loss: nan
Epoch 10/200
[1m102/102[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step - loss: nan
Epoch 11/200
[1m102/

<keras.src.callbacks.history.History at 0x793950e69060>

In [14]:
save_model(model, 'my_model.h5')



In [15]:
loss = model.evaluate(X_test_scaled, y_test)
print("Test Loss:", loss)


[1m12/12[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 0.2256 
Test Loss: 0.24393831193447113


In [16]:
predictions = model.predict(X_test_scaled)

[1m12/12[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 3ms/step 


In [17]:
predictions = np.array(predictions).reshape(-1)  # Reshape predictions to be 1-dimensional
y_test = np.array(y_test).reshape(-1)            # Reshape y_test to be 1-dimensional


In [18]:
results = pd.DataFrame({'Predictions': predictions, 'Targets': y_test})
results

Unnamed: 0,Predictions,Targets
0,0.448505,1.481486
1,0.310784,0.744727
2,1.417065,1.301030
3,0.458502,0.301030
4,0.230560,0.221849
...,...,...
376,1.211963,1.301030
377,1.338630,2.522879
378,1.743725,2.000000
379,1.524705,1.187087


In [19]:
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y_test, predictions)

In [20]:
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_test, predictions)
rmse = np.sqrt(mse)

from sklearn.metrics import r2_score
r2 = r2_score(y_test, predictions)

In [21]:
print(r2, mse)

0.522371332101373 0.24393829550795976


In [22]:
import structure

In [23]:
a =structure.smile_to_image("CC")
type(a)

rdkit.Chem.rdchem.Mol