# <b><span style='color:#F1A424'>AutoDataPrep - Binary Classification - Titanic Survival - Load Data</span></b>

### Disclaimer
The sample code (“Sample Code”) provided is not covered by any Teradata agreements. Please be aware that Teradata has no control over the model responses to such sample code and such response may vary. The use of the model by Teradata is strictly for demonstration purposes and does not constitute any form of certification or endorsement. The sample code is provided “AS IS” and any express or implied warranties, including the implied warranties of merchantability and fitness for a particular purpose, are disclaimed. In no event shall Teradata be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) sustained by you or a third party, however caused and on any theory of liability, whether in contract, strict liability, or tort arising in any way out of the use of this sample code, even if advised of the possibility of such damage.

## <b> Problem overview - Binary Classification </b>


The Titanic dataset is a well-known dataset in the field of machine learning and data science. It contains information about passengers aboard the RMS Titanic, including whether they survived or not. The dataset is often used for predictive modeling and classification tasks. Here are some key details about the Titanic dataset:

**Features**:

- `PassengerId`: Unique identifier for each passenger.
- `Pclass`: Ticket class (1st, 2nd, or 3rd).
- `Name`: Passenger's name.
- `Sex`: Passenger's gender (male or female).
- `Age`: Passenger's age.
- `SibSp`: Number of siblings or spouses aboard.
- `Parch`: Number of parents or children aboard.
- `Ticket`: Ticket number.
- `Fare`: Fare paid for the ticket.
- `Cabin`: Cabin number.
- `Embarked`: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton).

**Target Variable**:

- `Survived`: Binary variable indicating whether the passenger survived (1) or not (0).
        
**Objective**:

The main objective is typically to build a predictive model that can accurately predict whether a passenger survived based on the available features.

**Challenges**:

- Missing data in columns such as `Age`, `Cabin`, and `Embarked`.
- Exploring feature engineering techniques to improve model performance (`Feature exploration and engineering`).
- Understanding the passenger demographics and characteristics that influenced survival (`Model training`).
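
The missing-data challenge can be made concrete with a small sketch. The rows below are made-up placeholders, not actual Titanic records, and `count_missing` is a hypothetical helper introduced only for illustration. It counts `None` entries per column, which is the kind of check a preparation step would typically run on `Age`, `Cabin`, and `Embarked`.

```python
# Illustrative placeholder rows (NOT real Titanic records); None marks a missing value.
sample_rows = [
    {"Age": 30.0, "Cabin": None, "Embarked": "S"},
    {"Age": None, "Cabin": "X1", "Embarked": "C"},
    {"Age": 41.0, "Cabin": None, "Embarked": None},
]


def count_missing(rows):
    """Return a dict mapping each column name to its number of None values."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            # bool is an int subclass, so True adds 1 and False adds 0.
            counts[column] = counts.get(column, 0) + (value is None)
    return counts


missing_counts = count_missing(sample_rows)
print(missing_counts)
```

On a real table the same question would be answered inside the database rather than by pulling rows to the client; the sketch only shows the logic.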

**Use case**:

- Here, we use AutoML (Automated Machine Learning) functionality to automate the entire process of developing a predictive model.
- In a single automated run it performs `feature exploration`, `feature engineering`, `data preparation`, `model training`, and `model evaluation` on the dataset, and at the end it produces a `leaderboard` containing the different models along with their performance.
- Each model also has a `rank`, which indicates the `best performing model` for the given data, followed by the other models.

In [1]:
# Importing AutoDataPrep from teradataml
from teradataml import AutoDataPrep

In [2]:
# Importing other important libraries
import getpass
from teradataml import create_context, remove_context
from teradataml import DataFrame
from teradataml import load_example_data

In [3]:
# Create the connection.
host = getpass.getpass("Host: ")
username = getpass.getpass("Username: ")
password = getpass.getpass("Password: ")

con = create_context(host=host, username=username, password=password)

Host:  ········
Username:  ········
Password:  ········


## <b><span style='color:#F1A424'>| 1.</span> Load deployed data from AutoDataPrep </b>

In [4]:
# Create an AutoDataPrep instance to access previously deployed data
adp = AutoDataPrep()

In [5]:
# Load the tables deployed under the name "titanic_prep"; the result is a dict of DataFrames
data = adp.load(table_name="titanic_prep")

In [6]:
data

{'rfe_train':          r_sex_0  survived  automl_id  r_embarked_2  r_embarked_0  r_passenger     r_age  r_sibsp  r_pclass    r_fare
 r_sex_1                                                                                                              
 1              0         0         11             1             0     0.662921  0.627451      0.0       1.0  0.125000
 1              0         0         17             1             0     0.341573  0.509804      0.0       1.0  0.141228
 1              0         0         22             1             0     0.249438  0.941176      0.0       1.0  0.141228
 1              0         1         23             1             0     0.502247  0.607843      0.0       0.0  0.465789
 1              0         1         27             1             0     0.800000  0.882353      0.5       0.0  0.912281
 1              0         0         28             1             0     0.570787  0.490196      0.0       1.0  0.395175
 0              1         1        
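
As the output indicates, `load` returns a dictionary whose keys (`rfe_train` here, and `pca_train`, which is accessed in a later cell) name the prepared tables. Below is a minimal sketch for listing what came back, assuming each value exposes `shape` and `columns` attributes the way a teradataml DataFrame does. `summarize_loaded` and `FakeFrame` are hypothetical names, and the row counts are placeholders so that the snippet runs without a database connection.

```python
from dataclasses import dataclass


@dataclass
class FakeFrame:
    """Hypothetical stand-in for a DataFrame; only shape and columns are modelled."""
    shape: tuple
    columns: list


def summarize_loaded(data):
    """Return {table_key: (n_rows, n_cols, column_names)} for each loaded table."""
    return {
        key: (frame.shape[0], frame.shape[1], list(frame.columns))
        for key, frame in data.items()
    }


# Stand-in for `data = adp.load(table_name="titanic_prep")`.
# Column names follow the notebook output; the row counts (10) are placeholders.
demo = {
    "rfe_train": FakeFrame(
        (10, 11),
        ["r_sex_1", "r_sex_0", "survived", "automl_id", "r_embarked_2",
         "r_embarked_0", "r_passenger", "r_age", "r_sibsp", "r_pclass", "r_fare"],
    ),
    "pca_train": FakeFrame(
        (10, 8),
        ["automl_id", "col_0", "col_1", "col_2", "col_3", "col_4", "col_5", "survived"],
    ),
}

summary = summarize_loaded(demo)
for key, (n_rows, n_cols, cols) in summary.items():
    print(key, n_rows, n_cols, cols[:3])
```

In the notebook itself, `summarize_loaded(data)` would be called on the dictionary returned by `adp.load`.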

In [7]:
data['rfe_train']



r_sex_1,r_sex_0,survived,automl_id,r_embarked_2,r_embarked_0,r_passenger,r_age,r_sibsp,r_pclass,r_fare
1,0,0,11,1,0,0.6629213483146067,0.6274509803921569,0.0,1.0,0.125
1,0,0,17,1,0,0.3415730337078652,0.5098039215686274,0.0,1.0,0.1412280701754386
1,0,0,22,1,0,0.249438202247191,0.9411764705882352,0.0,1.0,0.1412280701754386
1,0,1,23,1,0,0.5022471910112359,0.6078431372549019,0.0,0.0,0.4657894736842105
1,0,1,27,1,0,0.8,0.8823529411764706,0.5,0.0,0.912280701754386
1,0,0,28,1,0,0.5707865168539326,0.4901960784313725,0.0,1.0,0.3951754385964912
0,1,1,12,0,1,0.3651685393258427,0.6470588235294118,0.0,0.0,0.2280701754385964
0,1,1,19,1,0,0.9606741573033708,0.2941176470588235,0.0,1.0,0.1640350877192982
0,1,0,21,1,0,0.1123595505617977,0.4901960784313725,0.0,1.0,0.1385228070175438
0,1,1,24,1,0,0.7314606741573034,0.2941176470588235,0.0,0.5,0.4035087719298245
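
The columns `r_sex_1` and `r_sex_0` look like a one-hot encoded pair. The sketch below checks that property on the ten rows displayed above and tallies survivors per group. Only the first three columns are copied from the output. The notebook does not state which sex each indicator encodes, and ten displayed rows are not a representative sample, so the counts say nothing about the full table.

```python
# (r_sex_1, r_sex_0, survived) copied from the ten rows displayed above.
rows = [
    (1, 0, 0), (1, 0, 0), (1, 0, 0), (1, 0, 1), (1, 0, 1),
    (1, 0, 0), (0, 1, 1), (0, 1, 1), (0, 1, 0), (0, 1, 1),
]

# One-hot sanity check: exactly one of the two indicator columns is set per row.
is_one_hot = all(a + b == 1 for a, b, _ in rows)

# For each value of r_sex_1, accumulate (row count, survivor count).
groups = {}
for r_sex_1, _, survived in rows:
    total, alive = groups.get(r_sex_1, (0, 0))
    groups[r_sex_1] = (total + 1, alive + survived)

print(is_one_hot)
print(groups)
```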


In [8]:
data['pca_train']



automl_id,col_0,col_1,col_2,col_3,col_4,col_5,survived
387,-1.207637740325192,-0.6621570227332567,0.0384700035908823,-0.3665275621063439,0.0702415565527134,-0.2506499506779073,1
713,-0.6397864111237941,0.6795591286686358,0.3408087196160689,-0.1970307939082055,0.5098075806362357,-0.0637627941946031,1
19,-0.6378043085682823,0.6799448436278037,0.3751187924630473,-0.2264710545632606,0.5474129237576701,-0.0591823805442088,1
753,0.1355450477161559,-1.1213840547504108,0.2585238211222831,-0.479923959501247,-0.1302030476609008,0.3657801469263169,0
324,-0.7313162602639105,0.6374565663085365,-0.0767831638625762,-0.0118713413750005,0.2133831057521939,-0.2267773926786543,1
385,-0.9770023330605028,-0.1434389422012849,0.990735493983962,0.6599487561085293,0.2457930540334945,-0.0673031293986802,0
59,-0.6402278947848477,0.6766118219189383,0.3830164800142429,-0.243977144009084,0.3278154236282112,-0.1287979557642659,1
856,0.5547656521185322,0.1308294512981963,-0.1721415574218471,-0.0085773402389063,-0.3276486492286639,-0.2357478481357209,0
591,0.5098084106161389,0.1472031149233407,-0.3457482020961598,0.1092652935962184,-0.0338264639389702,0.3791994898805872,1
122,0.4020199456331065,0.1242808353193081,-0.8539623674520939,0.3515111880593369,0.1241595557224409,0.4790149160792117,0


## <b><span style='color:#F1A424'>| 2.</span> Remove Deployed Data </b>

In [9]:
# Remove only the RFE table from the deployed data
adp.delete_data(table_name="titanic_prep", fs_method=['rfe'])

Removed rfe_train table successfully.


In [10]:
# Remove the remaining deployed data for "titanic_prep"
adp.delete_data(table_name="titanic_prep")

Removed pca_train table successfully.
Deployed data removed successfully.


In [11]:
# Close the connection opened by create_context
remove_context()

True
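
The notebook opens the connection in one cell and closes it in the last one, so an error in between would leave the connection open. One way to guard against that is a small context manager, sketched here under the assumption that connecting and disconnecting can each be passed in as a callable. `managed_context` is a hypothetical helper, not part of teradataml, and the lambdas are stand-ins so the snippet runs without a database.

```python
from contextlib import contextmanager


@contextmanager
def managed_context(connect, disconnect):
    """Call connect(), yield its result, and always call disconnect() afterwards."""
    connection = connect()
    try:
        yield connection
    finally:
        # Runs whether the body finished normally or raised an exception.
        disconnect()


# Stand-in callables that record the order of events.
events = []
with managed_context(
    lambda: events.append("connect") or "con",
    lambda: events.append("disconnect"),
) as con:
    events.append("work")

print(events)
```

With teradataml, the two callables could be `lambda: create_context(host=host, username=username, password=password)` and `remove_context`, with the load and delete steps placed inside the `with` block.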