
# Tabular Classification on Titanic Dataset

* **Author:** Ethan Harris (ethan@pytorchlightning.ai)
* **License:** CC BY-SA
* **Generated:** 2023-01-03T14:26:37.616121

In this notebook, we'll go over the basics of Lightning Flash by training a `TabularClassifier` on the [Titanic dataset](https://www.kaggle.com/c/titanic).


---
[Open in Colab](https://colab.research.google.com/github/PytorchLightning/lightning-tutorials/blob/publication/.notebooks/flash_tutorials/tabular_classification.ipynb)

Give us a ⭐ [on GitHub](https://www.github.com/Lightning-AI/lightning/)
| Check out [the documentation](https://pytorch-lightning.readthedocs.io/en/stable/)
| Join us [on Slack](https://www.pytorchlightning.ai/community)

## Setup
This notebook requires some packages besides pytorch-lightning.

In [1]:
! pip install --quiet "ipython[notebook]" "pytorch-lightning>=1.4" "lightning-flash[tabular]>=0.6.0" "setuptools==59.5.0" "torch>=1.8" "torchmetrics>=0.7"


# Training

In [2]:

import flash
from flash.core.data.utils import download_data
from flash.tabular import TabularClassificationData, TabularClassifier

## Download the data
The data are downloaded from a URL and saved in a `data` directory.

In [3]:
download_data("https://pl-flash-data.s3.amazonaws.com/titanic.zip", "data/")

data/titanic.zip:   0%|          | 0/28 [00:00<?, ?KB/s]

## Load the data
Flash Tasks have built-in DataModules that you can use to organize your data. Pass in train, validation and test files and Flash will take care of the rest.

Creating a `TabularClassificationData` relies on a [Pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html).

In [4]:
datamodule = TabularClassificationData.from_csv(
    ["Sex", "Age", "SibSp", "Parch", "Ticket", "Cabin", "Embarked"],
    ["Fare"],
    target_fields="Survived",
    train_file="./data/titanic/titanic.csv",
    test_file="./data/titanic/test.csv",
    val_split=0.25,
    batch_size=8,
)
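
The `val_split=0.25` argument above asks for a quarter of the training rows to be held out for validation. The following is a minimal standard-library sketch of that idea, not Flash's actual implementation: the `split_rows` helper, its `seed` parameter, and the 100 placeholder rows are all assumptions made for illustration.

```python
import random


def split_rows(rows, val_split, seed=0):
    """Shuffle rows and hold out a `val_split` fraction for validation.

    Hypothetical helper for illustration; Flash handles this internally.
    """
    if not 0.0 < val_split < 1.0:
        raise ValueError("val_split must be strictly between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_split)
    # The first n_val shuffled rows become validation, the rest training.
    return shuffled[n_val:], shuffled[:n_val]


# 100 placeholder row indices stand in for the rows of a CSV file.
train_rows, val_rows = split_rows(range(100), val_split=0.25)
print(len(train_rows), len(val_rows))  # 75 25
```

Shuffling before splitting avoids a validation set that only reflects the ordering of the file.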



## Build the model

Note: Categorical columns are mapped to an embedding space. The embedding space is a set of trainable tensors, one associated with each categorical column.

In [5]:
model = TabularClassifier.from_data(datamodule)

Using 'tabnet' provided by manujosephv/PyTorch Tabular (https://github.com/manujosephv/pytorch_tabular).
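
To make the note about embeddings concrete, here is a minimal pure-Python sketch of an embedding lookup for one categorical column. It is an illustration under stated assumptions rather than what Flash or PyTorch Tabular does internally: the helpers `build_vocab` and `init_embedding_table`, the embedding size of 4, and the sample `Embarked` values are all hypothetical, and a real model would store the table as a trainable tensor updated by backpropagation.

```python
import random


def build_vocab(values):
    """Map each distinct category to an integer index, in order of first appearance."""
    vocab = {}
    for value in values:
        vocab.setdefault(value, len(vocab))
    return vocab


def init_embedding_table(num_categories, dim, seed=0):
    """Create one randomly initialised vector per category (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_categories)]


# Illustrative values for a categorical column such as "Embarked".
embarked = ["S", "C", "S", "Q"]
vocab = build_vocab(embarked)  # {"S": 0, "C": 1, "Q": 2}
table = init_embedding_table(len(vocab), dim=4)

# Each row's category is replaced by its vector from the table.
vectors = [table[vocab[value]] for value in embarked]
print(vectors[0] == vectors[2])  # True: both rows hold the category "S"
```

Rows that share a category share a vector, so whatever the model learns about one passenger's category applies to every other passenger with the same value.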


## Create the trainer to run for 10 epochs over the data

In [6]:
trainer = flash.Trainer(max_epochs=10)

GPU available: True, used: False


TPU available: False, using: 0 TPU cores




## Train the model

In [7]:
trainer.fit(model, datamodule=datamodule)


  | Name          | Type                  | Params
--------------------------------------------------------
0 | train_metrics | ModuleDict            | 0     
1 | val_metrics   | ModuleDict            | 0     
2 | test_metrics  | ModuleDict            | 0     
3 | adapter       | PytorchTabularAdapter | 27.0 K
--------------------------------------------------------
27.0 K    Trainable params
0         Non-trainable params
27.0 K    Total params
0.108     Total estimated model params size (MB)



## Test model

In [8]:
trainer.test(model, datamodule=datamodule)



Testing: 0it [00:00, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'valid_accuracy': 0.4333333373069763, 'valid_loss': 0.7570610642433167}
--------------------------------------------------------------------------------


[{'valid_loss': 0.7570610642433167, 'valid_accuracy': 0.4333333373069763}]
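
Assuming `valid_accuracy` is plain classification accuracy (the fraction of rows whose predicted label matches the true label), the reported value of about 0.43 would mean that roughly 43% of the test rows were classified correctly. The sketch below shows that computation on made-up labels; the `accuracy` helper and the sample lists are hypothetical and are not taken from the test set.

```python
def accuracy(predicted, actual):
    """Return the fraction of positions where the predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on empty inputs")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up labels: 1 = survived, 0 = did not survive.
predicted_labels = [0, 1, 1, 0]
true_labels = [0, 1, 0, 0]
print(accuracy(predicted_labels, true_labels))  # 0.75
```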

## Save it!

In [9]:
trainer.save_checkpoint("tabular_classification_model.pt")

# Predicting
## Load the model from a checkpoint

`TabularClassifier.load_from_checkpoint` accepts either a URL or a local path to a checkpoint. If given a URL, the checkpoint is first downloaded and then loaded to re-create the model.

In [10]:
model = TabularClassifier.load_from_checkpoint(
    "https://flash-weights.s3.amazonaws.com/0.7.0/tabular_classification_model.pt"
)

Downloading: "https://flash-weights.s3.amazonaws.com/0.7.0/tabular_classification_model.pt" to /home/AzDevOps_azpcontainer/.cache/torch/hub/checkpoints/tabular_classification_model.pt


  0%|          | 0.00/3.69M [00:00<?, ?B/s]

Using 'fttransformer' provided by manujosephv/PyTorch Tabular (https://github.com/manujosephv/pytorch_tabular).
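
The distinction between a URL and a local path can be sketched with the standard library. This is only an illustration of the idea, not how Flash or PyTorch decides internally; the helper name `is_remote_checkpoint` is hypothetical.

```python
from urllib.parse import urlparse


def is_remote_checkpoint(path_or_url):
    """Return True when the string looks like an http(s) URL rather than a local path."""
    return urlparse(path_or_url).scheme in ("http", "https")


remote = "https://flash-weights.s3.amazonaws.com/0.7.0/tabular_classification_model.pt"
local = "tabular_classification_model.pt"

print(is_remote_checkpoint(remote))  # True
print(is_remote_checkpoint(local))   # False
```

A remote checkpoint needs a download step first, while a local one can be read from disk directly.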


## Generate predictions from a CSV file! Who would survive?

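Predictions can be generated from either a DataFrame or a path to a `.csv` file. Here, a prediction datamodule is built from the CSV file and reuses the `parameters` of the training datamodule.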

In [11]:
datamodule = TabularClassificationData.from_csv(
    predict_file="data/titanic/titanic.csv",
    parameters=datamodule.parameters,
    batch_size=8,
)
predictions = trainer.predict(model, datamodule=datamodule)
print(predictions)



Predicting: 75it [00:00, ?it/s]

[[tensor([1.0370, 0.2658]), tensor([ 1.3271, -1.1785]), tensor([ 1.5275, -0.0543]), tensor([ 1.9531, -0.6869]), tensor([0.3367, 0.0766]), tensor([ 1.0592, -0.0227]), tensor([1.4075, 0.2209]), tensor([1.7006, 0.0568])], [tensor([ 1.7244, -0.4591]), tensor([ 2.4998, -0.0469]), tensor([1.2143, 0.5040]), tensor([ 0.6133, -0.5891]), tensor([1.9369, 0.0949]), tensor([ 1.5767, -0.3554]), tensor([1.6089, 0.1781]), tensor([ 1.9634, -0.1353])], [tensor([1.0589, 0.9602]), tensor([ 0.9735, -0.0896]), tensor([0.9249, 0.2066]), tensor([ 0.8503, -0.4779]), tensor([ 1.8155, -0.8587]), tensor([ 1.7981, -0.5121]), tensor([ 1.9315, -0.3505]), tensor([ 1.4336, -0.4495])], [tensor([0.0940, 0.4807]), tensor([ 1.2381, -0.3445]), tensor([ 2.1075, -0.1645]), tensor([0.9662, 0.6035]), tensor([ 1.0062, -0.2820]), tensor([ 0.0406, -0.4682]), tensor([ 0.8612, -0.0846]), tensor([ 1.3461, -0.1367])], [tensor([ 2.0188, -0.5369]), tensor([ 1.3262, -0.7182]), tensor([ 1.2811, -0.1661]), tensor([1.3779, 0.5776]), tensor
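
Each prediction above is a pair of numbers, one per class. Assuming these are raw, unnormalized scores (logits), which their negative values and values above 1 suggest, they can be turned into probabilities with a softmax and into a class index with an argmax. The sketch below does this in pure Python for the first printed pair; which index corresponds to "survived" depends on how the target labels were encoded and is not stated in the output.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# First prediction from the printed output above.
logits = [1.0370, 0.2658]
probs = softmax(logits)

# The predicted class is the index of the largest probability.
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(predicted_class)  # 0
```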

## Congratulations - Time to Join the Community!

Congratulations on completing this notebook tutorial! If you enjoyed this and would like to join the Lightning
movement, you can do so in the following ways!

### Star [Lightning](https://github.com/Lightning-AI/lightning) on GitHub
The easiest way to help our community is just by starring the GitHub repos! This helps raise awareness of the cool
tools we're building.

### Join our [Slack](https://www.pytorchlightning.ai/community)!
The best way to keep up to date on the latest advancements is to join our community! Make sure to introduce yourself
and share your interests in the `#general` channel.


### Contributions!
The best way to contribute to our community is to become a code contributor! At any time you can go to the
[Lightning](https://github.com/Lightning-AI/lightning) or [Bolt](https://github.com/Lightning-AI/lightning-bolts)
GitHub Issues pages and filter for "good first issue".

* [Lightning good first issue](https://github.com/Lightning-AI/lightning/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22)
* [Bolt good first issue](https://github.com/Lightning-AI/lightning-bolts/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22)
* You can also contribute your own notebooks with useful examples!

### Great thanks from the entire PyTorch Lightning team for your interest!

[PyTorch Lightning](https://pytorchlightning.ai)