# Introduction and Problem Statement

Introduction:
When energy intake persistently exceeds energy expenditure in the human body, adipogenesis is activated [1]. Adipogenesis is the process by which adipocytes form and accumulate as adipose tissue, and it can produce fat deposits at different sites in the body [2]. Persistent formation of fat deposits leads to obesity, a condition that can cause downstream health problems, including type 2 diabetes [3].

Alpha-lapachone is an organic heterotricyclic compound and an organooxygen compound. It is a natural product found in Firmiana simplex, Catalpa ovata, and other organisms [4]. The compound has been reported to have anti-infective activity [5].



Problem Statement:

Beta-lapachone is a member of the naphthoquinone class of compounds. Naphthoquinones are mostly reported to have antibacterial, antifungal, antiviral, insecticidal, anti-inflammatory, and antipyretic properties [6]. A recent study reports that beta-lapachone also has antiobesity activity [7]. Alpha-lapachone is another member of the naphthoquinone class with a similar chemical constitution. There is therefore a need to assess whether alpha-lapachone also possesses antiobesity activity. Such an assessment would broaden the known chemical scope of antiobesity compounds.



References:



1. Schwartz, M. W., Seeley, R. J., Zeltser, L. M., Drewnowski, A., Ravussin, E., Redman, L. M., & Leibel, R. L. (2017). Obesity Pathogenesis: An Endocrine Society Scientific Statement. Endocrine Reviews, 38(4), 267–296. https://doi.org/10.1210/er.2017-00111
2. Haider, N., & Larose, L. (2019). Harnessing adipogenesis to prevent obesity. Adipocyte, 8, 1–7. https://doi.org/10.1080/21623945.2019.1583037
3. Camp, H. S., Ren, D., & Leff, T. (2002). Adipogenesis and fat-cell function in obesity and diabetes. Trends in Molecular Medicine, 8(9), 442–447. https://doi.org/10.1016/S1471-4914(02)02396-1
4. National Center for Biotechnology Information (2023). PubChem Compound Summary for CID 72732, alpha-Lapachone. Retrieved September 26, 2023, from https://pubchem.ncbi.nlm.nih.gov/compound/alpha-Lapachone
5. Peixoto, J. F., Oliveira, A. D. S., Gonçalves-Oliveira, L. F., Souza-Silva, F., & Alves, C. R. (2023). Epoxy-α-lapachone (2,2-Dimethyl-3,4-dihydro-spiro[2H-naphtho[2,3-b]pyran-10,2'-oxirane]-5(10H)-one): a promising molecule to control infections caused by protozoan parasites. The Brazilian Journal of Infectious Diseases, 27(2), 102743. https://doi.org/10.1016/j.bjid.2023.102743
6. Aminin, D., & Polonik, S. (2020). 1,4-Naphthoquinones: Some Biological Properties and Application. Chemical and Pharmaceutical Bulletin, 68(1), 46–57. https://doi.org/10.1248/cpb.c19-00911
7. Kwak, H. J., Jeong, M. Y., Um, J. Y., & Park, J. (2019). β-Lapachone Regulates Obesity through Modulating Thermogenesis in Brown Adipose Tissue and Adipocytes: Role of AMPK Signaling Pathway. The American Journal of Chinese Medicine, 47(4), 803–822. https://doi.org/10.1142/S0192415X19500423


# Imports

In [None]:
# imports

import pandas as pd
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from yellowbrick.classifier import ClassificationReport
from yellowbrick.classifier import ROCAUC

# Data Preprocessing

In [None]:
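# data importation

# Positive class (label 1): tripeptide features of alpha-lapachone targets
df1 = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/data/alpha_lapachone_targets_tripeptide.csv", header=None)
print(df1.shape)
# Alternative negative set, not used in this run:
#df2 = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/data/alpha_lapachone_non_targets_tripeptide.csv", header=None)
# Negative class (label 0): keep the first 84 rows, twice the number of positives
df2 = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/data/tripeptide_data_non-pd.csv', header=None)
df2 = df2.head(84).copy()  # copy so that adding the label column does not write to a slice
print(df2.shape)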

(42, 8000)
(84, 8000)
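
The 8000 columns are consistent with one feature per possible tripeptide over the 20 standard amino acids (20³ = 8000), and the printed values look like occurrence counts. The sketch below shows how such a vector could be computed from a protein sequence. It is an illustration only: the alphabet order, the overlapping-window counting, the helper name `tripeptide_counts`, and the toy sequence are assumptions, not the procedure that produced the CSV files above.

```python
from collections import Counter
from itertools import product

# Assumed alphabet and column order: the 20 standard amino acids, alphabetical
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]  # 20**3 = 8000

def tripeptide_counts(sequence):
    """Count overlapping tripeptides and return them as an 8000-long list."""
    windows = (sequence[i:i + 3] for i in range(len(sequence) - 2))
    counts = Counter(windows)
    # Windows containing non-standard residues are simply not looked up
    return [counts.get(t, 0) for t in TRIPEPTIDES]

# Hypothetical toy sequence, used only to exercise the function
vector = tripeptide_counts("MKVLAAGMKV")
print(len(vector), sum(vector))
```

A sequence of length n contributes n − 2 overlapping windows, so the vector for the 10-residue toy sequence sums to 8.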


In [None]:
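# data transformation and wrangling

df1['label'] = 1  # positive class
df2['label'] = 0  # negative class
df = pd.concat([df1, df2], ignore_index=True)
df = shuffle(df)  # no random_state is set, so the row order differs between runs
print(df.shape)
print(df)
df.ffill(inplace=True)  # forward-fill null values (replaces the deprecated fillna(method='pad'))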

(126, 8001)
     0  1  2  3  4  5  6  7  8  9  ...  7991  7992  7993  7994  7995  7996  \
25   0  1  0  0  0  0  0  2  0  0  ...     0     0     0     2     0     0   
55   0  0  0  0  0  0  0  0  0  0  ...     0     0     0     0     0     0   
33   4  0  0  0  0  1  1  2  0  0  ...     1     0     0     0     0     0   
97   0  0  1  0  0  0  0  0  0  0  ...     0     0     0     0     2     1   
69   0  0  0  2  0  0  0  0  0  1  ...     0     0     0     0     0     0   
..  .. .. .. .. .. .. .. .. .. ..  ...   ...   ...   ...   ...   ...   ...   
96   0  0  0  1  0  0  0  1  0  0  ...     0     0     0     0     1     0   
3    0  0  0  0  0  0  0  0  0  0  ...     0     0     0     0     0     0   
86   0  0  0  0  0  0  0  0  0  0  ...     0     0     0     0     0     0   
47   0  0  0  0  0  0  0  0  0  0  ...     0     0     0     0     0     0   
110  1  0  0  0  0  0  1  0  0  1  ...     0     0     0     0     0     1   

     7997  7998  7999  label  
25      1     0     

In [None]:
# features extraction

features = df.iloc[:, :-1]; features

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,7990,7991,7992,7993,7994,7995,7996,7997,7998,7999
25,0,1,0,0,0,0,0,2,0,0,...,0,0,0,0,2,0,0,1,0,0
55,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
33,4,0,0,0,0,1,1,2,0,0,...,0,1,0,0,0,0,0,0,0,0
97,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,2,1,0,0,1
69,0,0,0,2,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,0,0,0,1,0,0,0,1,0,0,...,0,0,0,0,0,1,0,0,0,1
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
86,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
47,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [None]:
# labels extraction

labels = df.iloc[:,-1:]; labels

Unnamed: 0,label
25,1
55,0
33,1
97,0
69,0
...,...
96,0
3,1
86,0
47,0


In [None]:
# splitting dataset

X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.3, random_state=50)
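
With 42 positives and 84 negatives, a plain random split can leave the test set with a class ratio that differs from the full data. As a hedged alternative, the sketch below passes `stratify` to `train_test_split` so both partitions keep roughly the 1:2 ratio. It runs on synthetic stand-in data (`demo_features`, `demo_labels` are hypothetical names), not on the notebook's actual features, and it is a suggestion rather than the split used above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same class sizes as the notebook: 42 positives, 84 negatives
rng = np.random.default_rng(0)
demo_features = pd.DataFrame(rng.integers(0, 3, size=(126, 10)))
demo_labels = pd.Series([1] * 42 + [0] * 84, name="label")

# stratify keeps the positive/negative ratio similar in train and test
Xtr, Xte, ytr, yte = train_test_split(
    demo_features, demo_labels,
    test_size=0.3, random_state=50, stratify=demo_labels,
)

print(Xtr.shape, Xte.shape)
print("positive fraction (train, test):", ytr.mean(), yte.mean())
```

Passing the labels as a one-dimensional Series, as done here, also avoids the column-vector warning that some scikit-learn estimators raise when `y` is a single-column DataFrame.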