# Titanic Dataset Example 🚢

This example shows how to fetch the Titanic dataset from OpenML with **scikit-learn**'s `fetch_openml` function and take a first look at its contents.

[OpenML](https://openml.org) is an open, collaborative platform for machine learning. It provides access to datasets, algorithms, and experiment results, making it easy to share and reuse data and models. Because its datasets are versioned and published with standardized metadata, experiments that load data from OpenML are easier to reproduce and compare.

```python
from pprint import pprint
from sklearn.datasets import fetch_openml

# Fetch the Titanic dataset
titanic = fetch_openml("titanic", version=1, as_frame=True)

# Print details about the dataset
pprint(titanic.details)
```

```text
{'default_target_attribute': 'survived',
 'description_version': '9',
 'file_id': '16826755',
 'format': 'ARFF',
 'id': '40945',
 'licence': 'Public',
 'md5_checksum': '60ac7205eee0ba5045c90b3bba95b1c4',
 'minio_url': 'https://openml1.win.tue.nl/datasets/0004/40945/dataset_40945.pq',
 'name': 'Titanic',
 'parquet_url': 'https://openml1.win.tue.nl/datasets/0004/40945/dataset_40945.pq',
 'processing_date': '2018-10-04 07:19:36',
 'status': 'active',
 'tag': ['Computational Universe', 'Manufacturing', 'text_data'],
 'upload_date': '2017-10-16T01:17:36',
 'url': 'https://api.openml.org/data/v1/download/16826755/Titanic.arff',
 'version': '1',
 'visibility': 'public'}
```


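```python
# Feature columns as a pandas DataFrame; the target column, `survived`,
# is returned separately in `titanic.target`
df = titanic.data

# Print a concise summary: column names, non-null counts, and dtypes
df.info()
```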

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1309 entries, 0 to 1308
Data columns (total 13 columns):
 #   Column     Non-Null Count  Dtype   
---  ------     --------------  -----   
 0   pclass     1309 non-null   int64   
 1   name       1309 non-null   object  
 2   sex        1309 non-null   category
 3   age        1046 non-null   float64 
 4   sibsp      1309 non-null   int64   
 5   parch      1309 non-null   int64   
 6   ticket     1309 non-null   object  
 7   fare       1308 non-null   float64 
 8   cabin      295 non-null    object  
 9   embarked   1307 non-null   category
 10  boat       486 non-null    object  
 11  body       121 non-null    float64 
 12  home.dest  745 non-null    object  
dtypes: category(2), float64(3), int64(3), object(5)
memory usage: 115.4+ KB
```
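The non-null counts in this summary also show how much data is missing in each column. As a quick check, here is a minimal sketch that recomputes the missing fraction per column from those counts using only the standard library. The `non_null` dictionary is copied by hand from the `df.info()` output above, and the names `non_null`, `missing`, and `incomplete` are illustrative, not part of scikit-learn or pandas.

```python
# Non-null counts copied by hand from the df.info() output (1309 rows in total)
TOTAL_ROWS = 1309
non_null = {
    "pclass": 1309,
    "name": 1309,
    "sex": 1309,
    "age": 1046,
    "sibsp": 1309,
    "parch": 1309,
    "ticket": 1309,
    "fare": 1308,
    "cabin": 295,
    "embarked": 1307,
    "boat": 486,
    "body": 121,
    "home.dest": 745,
}

# Fraction of missing values per column: (total - non-null) / total
missing = {col: (TOTAL_ROWS - n) / TOTAL_ROWS for col, n in non_null.items()}

# Keep only columns with at least one missing value, most incomplete first
incomplete = sorted(
    ((col, frac) for col, frac in missing.items() if frac > 0),
    key=lambda item: item[1],
    reverse=True,
)

for col, frac in incomplete:
    print(f"{col:<10} {TOTAL_ROWS - non_null[col]:>5} missing ({frac:.1%})")
```

On the live DataFrame, `df.isna().mean().sort_values(ascending=False)` computes the same fractions directly. The numbers show that `body`, `cabin`, and `boat` are mostly empty (roughly 91%, 77%, and 63% missing), while `fare` and `embarked` are missing only one and two values, respectively.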
