# qHTS for Inhibitors of human tyrosyl-DNA phosphodiesterase 1 (TDP1): qHTS in cells in absence of CPT

## Introduction
###### Human tyrosyl-DNA phosphodiesterase 1 (TDP1) is a novel repair gene, and we propose to use it as a new target for anticancer drug development. TDP1 is not an essential protein, but under treatment with a topoisomerase I poison (camptothecin, CPT) it becomes a critical factor for cell survival. To directly identify novel TDP1 inhibitors that are active in a cellular environment, we knocked out the Tdp1 gene in chicken DT40 cells (Tdp1-/-) and generated a complemented counterpart cell line that is stably transfected with the human TDP1 gene (Tdp1-/-;hTDP1 cells). For the primary screen, Tdp1-/-;hTDP1 cells will be exposed to small molecules in the presence or absence of CPT, and their growth kinetics will be evaluated after 48 hours by measuring ATP activity. If a compound shows a synergistic effect with CPT, it could inhibit a repair pathway for CPT-induced lesions, including the TDP1-mediated repair pathway. The hit compounds will then be evaluated in the presence or absence of CPT using Tdp1-/- cells. If a compound shows a synergistic effect with CPT in Tdp1-/-;hTDP1 cells but not in Tdp1-/- cells, that compound may inhibit the TDP1-mediated repair pathway. In tertiary assays, biochemical gel-based assays will be used to assess whether the hit compounds specifically target TDP1.
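
The screening cascade described above reduces to a simple decision rule. The sketch below is only illustrative: the function name `triage_compound` and its boolean inputs are hypothetical, and how synergy with CPT is actually scored is not specified in this description.

```python
def triage_compound(synergy_in_complemented: bool, synergy_in_knockout: bool) -> str:
    """Hypothetical decision rule for the primary and secondary screens.

    synergy_in_complemented: synergy with CPT observed in Tdp1-/-;hTDP1 cells.
    synergy_in_knockout: synergy with CPT observed in Tdp1-/- cells.
    """
    if not synergy_in_complemented:
        # No synergy in the primary screen: the compound is not a hit.
        return "not a hit"
    if synergy_in_knockout:
        # Synergy persists without TDP1, so the effect is not TDP1-specific.
        return "hit, likely TDP1-independent"
    # Synergy only when human TDP1 is present: candidate for tertiary assays.
    return "candidate TDP1-pathway inhibitor"


print(triage_compound(True, False))
```

Only compounds in the last category would move on to the gel-based biochemical assays.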

## Imports

In [None]:
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.api as sm
from sklearn import preprocessing
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import missingno as msno
import seaborn as sns
from standardizer.CustomStandardizer import CustomStandardizer
from loaders.Loaders import CSVLoader

## Initial exploration

### Load the dataset
###### The first step in analysing this dataset is to load the TDP1 data and display its first rows.

In [None]:
file = 'C:/Users/rafes/Documents/GitHub/SIB_SMILES/src/smiles/dataset/TDP1_activity_dataset.csv'
dataframe = pd.read_csv(file, sep=',', dtype={'Excluded_Points': str, 'Compound QC': str, 'smiles': str})
dataframe.head()
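
Passing `dtype=str` for selected columns tells pandas to read them as text instead of inferring a type from their contents. A minimal sketch on a tiny in-memory CSV, where the two rows are made up for illustration and are not taken from the dataset:

```python
import io

import pandas as pd

# Made-up CSV that mimics the three columns pinned to str above.
csv_text = (
    "PUBCHEM_CID,Excluded_Points,Compound QC,smiles\n"
    "1,,QC'd by vendor,CCO\n"
    "2,0 1 0,,c1ccccc1\n"
)

toy = pd.read_csv(
    io.StringIO(csv_text),
    sep=",",
    dtype={"Excluded_Points": str, "Compound QC": str, "smiles": str},
)

# Pinned columns keep their text as-is; empty cells become missing values.
print(toy.dtypes)
print(toy.isna().sum())
```

Columns that are not listed in `dtype`, such as `PUBCHEM_CID` here, are still inferred by pandas.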

### Simple Analyses
###### This step examines how the data are distributed across the rows and columns of the dataset.

In [None]:
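# A notebook cell only displays its last expression, so print the others.
print(dataframe.size)     # total number of cells (rows x columns)
print(dataframe.shape)    # (rows, columns)
print(dataframe.columns)  # column names
print(dataframe.dtypes)   # data type of each column
dataframe.describe()      # summary statistics of the numeric columns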

## Pre-Processing

### Visualization of the NaNs in each column

In [None]:
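print(dataframe.isna().sum().sum())  # total number of missing values
print(dataframe.isna().sum())        # missing values per column

# Bar chart of the non-missing values in each column
msno.bar(dataframe, sort="ascending")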

###### The dataset contains a large number of NA values (see ...)
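
To put a number on how many values are missing, the per-column counts can be combined with percentages in a single table. A minimal sketch on a made-up frame (the values are illustrative, and `na_summary` is a name introduced here):

```python
import numpy as np
import pandas as pd

# Made-up frame with the same kind of columns as the TDP1 dataset.
toy = pd.DataFrame({
    "PUBCHEM_CID": [1, 2, 3, 4],
    "Phenotype": ["Inhibitor", None, "Inactive", None],
    "Activity at 46.23 uM": [np.nan, np.nan, np.nan, -12.5],
})

# isna().sum() counts missing cells; isna().mean() gives their fraction.
na_summary = pd.DataFrame({
    "n_missing": toy.isna().sum(),
    "pct_missing": toy.isna().mean() * 100,
}).sort_values("pct_missing", ascending=False)

print(na_summary)
```

Sorting by `pct_missing` puts the sparsest columns first, which helps when deciding which features to drop.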

### Drop specific features

In [None]:
del dataframe['PUBCHEM_ACTIVITY_URL']  # drop unnecessary columns
del dataframe['Compound QC']

print(dataframe.shape)  # confirm that two columns were removed
dataframe.columns
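
One caveat with `del` is that re-running the cell raises a `KeyError`, because the columns are already gone. A possible alternative, sketched on a made-up frame, is `DataFrame.drop` with `errors="ignore"`, which can be run repeatedly:

```python
import pandas as pd

# Made-up frame carrying the two columns that are removed above.
toy = pd.DataFrame({
    "PUBCHEM_CID": [1, 2],
    "PUBCHEM_ACTIVITY_URL": [None, None],
    "Compound QC": [None, None],
})

to_drop = ["PUBCHEM_ACTIVITY_URL", "Compound QC"]

# errors="ignore" skips labels that are no longer present.
toy = toy.drop(columns=to_drop, errors="ignore")
toy = toy.drop(columns=to_drop, errors="ignore")  # second call is a no-op

print(list(toy.columns))
```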

### Graphic Analyses
#### Activity_outcome and Phenotype

In [None]:
# Number of compounds per activity outcome
activity = dataframe.groupby('PUBCHEM_ACTIVITY_OUTCOME').size()
labels_activity = activity.index
print(activity)

# Number of compounds per phenotype
fenotipo = dataframe.groupby('Phenotype').size()
labels_fenotipo = fenotipo.index
fenotipo
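
The same class counts can also be obtained with `Series.value_counts`, which additionally returns class shares when `normalize=True`. A minimal sketch on made-up outcome labels (the labels and their proportions are illustrative only):

```python
import pandas as pd

# Made-up outcome column; the real class balance will differ.
toy = pd.DataFrame({
    "PUBCHEM_ACTIVITY_OUTCOME": ["Inactive", "Inactive", "Active", "Inconclusive"],
})

# Absolute counts per class (missing values are excluded by default).
counts = toy["PUBCHEM_ACTIVITY_OUTCOME"].value_counts()

# Share of each class in percent, comparable to pie-chart labels.
shares = toy["PUBCHEM_ACTIVITY_OUTCOME"].value_counts(normalize=True) * 100

print(counts)
print(shares)
```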

#### Pie Charts Activity_outcome and Phenotype

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))
ax1.pie(activity, labels=labels_activity, autopct='%1.1f%%', startangle=90)
ax1.set_title('PUBCHEM_Activity_Outcome')
ax2.pie(fenotipo, labels=labels_fenotipo, autopct='%1.1f%%', startangle=360)
ax2.set_title('Phenotype')

#### Boxplots of Activity at 46.23 uM, 1.849 uM, 0.363 uM, 0.00299 uM and 9.037 uM

In [None]:
sns.set(font_scale=1.4)

# Draw one boxplot per tested concentration
concentrations = ["46.23", "1.849", "0.363", "0.00299", "9.037"]
for conc in concentrations:
    column = f"Activity at {conc} uM"
    plt.subplots(figsize=(10, 10))
    plt.title(column, fontsize=25)
    sns.boxplot(y=column, data=dataframe, palette="Set3")
    plt.show()
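
A boxplot summarises each column by its quartiles and flags points beyond 1.5 times the interquartile range (the default whisker rule in matplotlib and seaborn) as outliers. The sketch below reproduces those numbers for a made-up series, so the values are illustrative rather than taken from the dataset:

```python
import numpy as np
import pandas as pd

# Made-up activity values, including one missing measurement.
values = pd.Series([-80.0, -5.0, -2.0, 0.0, 1.0, 3.0, 4.0, np.nan])

# quantile() skips NaN and interpolates linearly between data points.
q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Points outside these fences are the ones drawn as individual outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(q1, median, q3, iqr)
print(outliers.tolist())
```

Applying the same calculation to each `Activity at ... uM` column gives a numeric counterpart to the plots.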

## Standardize molecules

In [None]:
def standardize(dataset, id_field, mols_field, class_field):
    """Load a CSV file into a dataset and run CustomStandardizer on it.

    dataset is the path to the CSV file; the other arguments name the
    ID, molecule (SMILES) and label columns.
    """
    loader = CSVLoader(dataset,
                       id_field=id_field,
                       mols_field=mols_field,
                       labels_fields=class_field)

    dataset = loader.create_dataset()

    # Options passed to CustomStandardizer
    standardisation_params = {
        'REMOVE_ISOTOPE': True,
        'NEUTRALISE_CHARGE': True,
        'REMOVE_STEREO': False,
        'KEEP_BIGGEST': True,
        'ADD_HYDROGEN': False,
        'KEKULIZE': True,
        'NEUTRALISE_CHARGE_LATE': True}

    CustomStandardizer(params=standardisation_params).standardize(dataset)

    return dataset

In [None]:
dataframe = standardize(file, "PUBCHEM_CID", "smiles", "PUBCHEM_ACTIVITY_OUTCOME")