# Creating a Data Table of BBB Permeability Data 
This notebook loads four datasets (LightBBB, MoleculeNet, DeePred, B3DB) from their source files and converts each one into a pandas DataFrame to be filtered and analyzed. The resulting data consists of each molecule's SMILES string and its blood-brain barrier (BBB) classification (BBB permeable or BBB nonpermeable).

## Imports 
Import the Python libraries needed to create a data table of BBB permeability data.

In [19]:
import pandas as pd 

## Load DataSet as Variables
Read each dataset's file with `read_csv()` and store the resulting DataFrame in a variable named after the dataset. The DeePred file is read with `cp1252` encoding, and the B3DB file is tab-separated.

In [20]:
# LightBBB DataSet
LightBBB = pd.read_csv('y_test_indices.csv')

# MoleculeNet DataSet
MoleculeNet = pd.read_csv('MoleculeNet-BBBP-process-flow-step5-traindata.csv')

# DeePred DataSet
DeePred = pd.read_csv('Table 1(Data Set).csv', encoding='cp1252', delimiter=',')

# B3DB DataSet (stored in the variable B3BD)
B3BD = pd.read_csv('B3DB_regression.tsv', sep='\t')
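
The `sep` argument matters because `read_csv()` splits on commas by default. As a minimal sketch, using made-up in-memory text rather than the real files (the `demo_tsv` string and its two rows are hypothetical), the snippet below shows how the same tab-separated text parses with and without `sep='\t'`:

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for a tab-separated file; not real B3DB data
demo_tsv = "SMILES\tlogBB\nCCO\t-0.2\nc1ccccc1\t0.4\n"

# With the default comma separator, every line lands in a single column
as_csv = pd.read_csv(io.StringIO(demo_tsv))

# With sep='\t', the text is split into the two intended columns
as_tsv = pd.read_csv(io.StringIO(demo_tsv), sep='\t')

print(as_csv.shape)  # (2, 1)
print(as_tsv.shape)  # (2, 2)
```

Checking `.shape` or `.head()` right after loading is a quick way to catch a wrong separator before any later step depends on the column names.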

## Standardize Column Names
Standardize the required columns so that the column names are consistent throughout the four datasets and remove any unnecessary columns. 

**Note**: The B3DB dataset (stored in the variable `B3BD`) has no BBB classification column; the following section adds one.

In [21]:
# LightBBB DataSet
LightBBB = LightBBB.rename(columns= {
    'Unnamed: 0': 'SMILES',
})

# MoleculeNet DataSet
MoleculeNet = MoleculeNet.rename(columns={
    'BBB': 'BBclass'
})

# DeePred DataSet
DeePred = DeePred.rename(columns= {
    'Compounds': 'SMILES',
    'BBB-Class': 'BBclass'
})
DeePred = DeePred.drop(['Unnamed: 5', 'Unnamed: 6'], axis=1)
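
The rename-then-drop pattern can be tried on a small frame first. The sketch below uses a hypothetical `raw` DataFrame with made-up rows; it also passes `errors='ignore'` to `drop()`, which is an optional choice (not used in the cell above) that skips labels missing from the frame instead of raising a `KeyError`:

```python
import pandas as pd

# Hypothetical toy frame mimicking a raw export with a stray empty column
raw = pd.DataFrame({
    'Compounds': ['CCO', 'c1ccccc1'],
    'BBB-Class': [1, 0],
    'Unnamed: 5': [None, None],
})

# rename() only changes the labels listed in the mapping
tidy = raw.rename(columns={'Compounds': 'SMILES', 'BBB-Class': 'BBclass'})

# 'Unnamed: 6' is absent here; errors='ignore' skips it instead of raising
tidy = tidy.drop(columns=['Unnamed: 5', 'Unnamed: 6'], errors='ignore')

print(list(tidy.columns))  # ['SMILES', 'BBclass']
```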

## Use logBB value to create a BBB classification column for B3DB Dataset
Because the B3DB dataset has no BBclass (BBB classification) column, we label each molecule as BBB permeable (1) or nonpermeable (0) using a logBB cut-off of 0.3 and store the result in a new `BBclass` column.

In [22]:
# B3DB DataSet
# Label a molecule 1 (BBB permeable) when logBB >= 0.3, otherwise 0 (nonpermeable).
# The comparison is False for a missing logBB, so such rows would be labeled 0.
B3BD['BBclass'] = (B3BD['logBB'] >= 0.3).astype(int)
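
As a minimal sketch of the thresholding rule on made-up values (the `demo` frame and the `CUTOFF` name are hypothetical), the snippet below applies the 0.3 cut-off and shows one way to handle a missing logBB, which a plain `>=` comparison would otherwise treat as below the cut-off:

```python
import numpy as np
import pandas as pd

# Hypothetical logBB values, including one missing measurement
demo = pd.DataFrame({'logBB': [0.5, 0.3, -1.2, np.nan]})

CUTOFF = 0.3  # same cut-off as the cell above

# Any comparison with NaN is False, so drop unmeasured rows before labeling
measured = demo.dropna(subset=['logBB']).copy()

# Boolean comparison cast to 0/1; a value equal to the cut-off counts as permeable
measured['BBclass'] = (measured['logBB'] >= CUTOFF).astype(int)

print(measured['BBclass'].tolist())  # [1, 1, 0]
```

Whether to drop unmeasured rows or keep them with a missing label depends on the later filtering steps; dropping is only one option.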

## Extract Columns
Store the column labels of each dataset in a separate variable. Note that `.columns` returns a pandas `Index` object rather than a plain list.

In [23]:
# LightBBB DataSet
LightBBB_columns = LightBBB.columns

# MoleculeNet DataSet
MoleculeNet_columns = MoleculeNet.columns

# DeePred DataSet
DeePred_columns = DeePred.columns

# B3DB DataSet (stored in the variable B3BD)
B3BD_columns = B3BD.columns
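
One possible use of the stored column labels is to confirm that every dataset exposes the standardized `SMILES` and `BBclass` columns. The sketch below assumes hypothetical toy frames (`frames`, keyed `'A'` and `'B'`) in place of the four real datasets, and a hypothetical `REQUIRED` list:

```python
import pandas as pd

# Hypothetical toy frames standing in for the real datasets
frames = {
    'A': pd.DataFrame({'SMILES': ['CCO'], 'BBclass': [1], 'extra': [0.1]}),
    'B': pd.DataFrame({'SMILES': ['CCN'], 'logBB': [0.4]}),
}

REQUIRED = ['SMILES', 'BBclass']

# For each frame, list the required labels that are absent from .columns
missing = {
    name: [col for col in REQUIRED if col not in df.columns]
    for name, df in frames.items()
}

print(missing)  # {'A': [], 'B': ['BBclass']}
```

An empty list for every dataset would indicate that the frames share the columns needed for later filtering.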