# Optional Notebook – Kaggle + Spaceship Titanic + YData Profiling

**Timebox:** 1 hour

This notebook guides you through: Kaggle intro → manual dataset download → quick EDA → YData Profiling.

## 0) Kaggle Intro (10m)
**Goal:** get familiar with Kaggle's key areas and locate the Spaceship Titanic data.

1. **Create an account**  
   - Go to **kaggle.com** → **Sign In / Create Account** → verify email.  
   - (Optional) Complete your **profile** (bio, location, skills).

2. **Check *Courses***  
   - Top menu → **Courses** (learn tab).  
   - Suggested quick starts: *Intro to Programming*, *Intro to ML*, *Pandas*, *Data Visualization*.

3. **Check *Datasets***  
   - Top menu → **Datasets**. Use the **search bar** and **filters** (file types, topics, sizes).  
   - Open a dataset page to see: description, files, discussions, licenses.  
   - Click **Save/Star** to bookmark for later.

4. **Check *Competitions***  
   - Top menu → **Competitions**; look at *Getting Started* / *Beginner* competitions.  
   - Each competition has **Overview**, **Data**, **Code**, **Discussion**, **Leaderboard**, **Rules** tabs.

5. **Find "Spaceship Titanic"**  
   - Use the global search and open the **Spaceship Titanic** **competition** page (the dataset lives under the *Data* tab there).  
   - Read the **Overview** and **Evaluation** briefly, then go to the **Data** tab.

## 1) Download the data **manually** (10m)
**On the competition page → `Data` tab:**
- Accept the **Rules** if prompted.  
- Download: **`train.csv`**, **`test.csv`**, and **`sample_submission.csv`**.  

**Upload to this notebook environment:**
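- In Colab: upload the three files to the session, for example with `files.upload()` from `google.colab` or via the Files panel in the left sidebar.  
- Locally/Jupyter: place the files in the same folder as this notebook, since the later cells read `train.csv` from the working directory.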
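
A minimal sketch of an upload cell, assuming the three CSV files should end up in the working directory. The helper `missing_files` and the constant `EXPECTED_FILES` are illustrative names, not part of the original notebook. Outside Colab the `google.colab` import fails, and the cell only reports which files are still missing.

```python
from pathlib import Path

# File names from the competition's Data tab (step 1 above).
EXPECTED_FILES = ["train.csv", "test.csv", "sample_submission.csv"]

def missing_files(folder=".", names=EXPECTED_FILES):
    """Return the expected file names that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in names if not (folder / name).is_file()]

try:
    # google.colab is only importable inside a Colab runtime.
    from google.colab import files
except ImportError:
    files = None

if files is not None and missing_files():
    # Opens a file picker; the selected files are saved to the working directory.
    files.upload()

print("Still missing:", missing_files() or "nothing")
```

If the last line still lists files, upload them again or check that they sit in the same folder as the notebook.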

### (Optional) Kaggle CLI path (skip if you already uploaded files)
If you prefer, install the Kaggle CLI and download the files with an API token (`kaggle.json`, created from your Kaggle account settings).

In [None]:
# OPTIONAL: Kaggle CLI install & auth (requires kaggle.json)
# Uncomment the lines below to use this path (the upload step is Colab-specific).
# !pip -q install kaggle
# from google.colab import files
# files.upload()  # select kaggle.json
# import os, shutil
# os.makedirs(os.path.expanduser("~/.kaggle"), exist_ok=True)
# shutil.move("kaggle.json", os.path.expanduser("~/.kaggle/kaggle.json"))
# os.chmod(os.path.expanduser("~/.kaggle/kaggle.json"), 0o600)  # keep the token private
# !mkdir -p data
# !kaggle competitions download -c spaceship-titanic -p data
# !unzip -o data/spaceship-titanic.zip -d data
# Note: the CSVs are extracted into data/, so read data/train.csv or move them next to the notebook.


## 2) Environment Setup (3m)
Install libraries needed for EDA and profiling.

In [1]:
!pip -q install ydata-profiling plotly pandas


## 3) Load Dataset & Sanity Check (7m)
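
One possible way to do this step, sketched under the assumption that `train.csv` and `test.csv` are in the working directory. The helper names `load_split` and `columns_only_in_train` are hypothetical. The cell prints the shape of each file and the columns that exist only in the training data; for this competition that should be just the target column, `Transported`.

```python
from pathlib import Path

import pandas as pd

def load_split(path):
    """Read one CSV file and print its row and column counts."""
    df = pd.read_csv(path)
    print(f"{path}: {df.shape[0]} rows x {df.shape[1]} columns")
    return df

def columns_only_in_train(train, test):
    """Columns present in train but absent from test (normally only the target)."""
    return [col for col in train.columns if col not in test.columns]

# Only runs when the files are present, so the cell does not crash otherwise.
if Path("train.csv").is_file() and Path("test.csv").is_file():
    train = load_split("train.csv")
    test = load_split("test.csv")
    print("Only in train:", columns_only_in_train(train, test))
    print(train.head())
else:
    print("train.csv / test.csv not found in the working directory")
```

If anything other than the target shows up under "Only in train", the two files were probably not downloaded from the same competition page.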

## 4) Quick EDA (25m)

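Work through these checks on the training data:

* Column types
* Missing values
* Unexpected values (for example, impossible or inconsistent entries)
* Duplicate rows
* Descriptive statistics
* Univariate and bivariate analysis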
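
The checklist above could be covered with a handful of pandas calls. This is a rough sketch, not the only way to do it. The helper `quick_overview` and the tiny `demo` frame (with made-up columns `group` and `amount`) are illustrative assumptions so the cell runs on its own; replace `demo` with the DataFrame loaded from `train.csv`.

```python
import pandas as pd

def quick_overview(df):
    """Per-column dtype, missing count, missing share (%) and distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(1),
        "n_unique": df.nunique(),
    })

# Made-up data for illustration only; swap in your loaded training data.
demo = pd.DataFrame({
    "group": ["a", "b", "b", None],
    "amount": [1.0, 2.0, 2.0, 50.0],
})

print(quick_overview(demo))                         # column types and missing values
print("Duplicate rows:", demo.duplicated().sum())   # fully identical rows
print(demo.describe(include="all"))                 # descriptive statistics
print(demo["group"].value_counts(dropna=False))     # univariate: spot unexpected categories
print(demo.groupby("group")["amount"].mean())       # bivariate: numeric vs. categorical
```

On the real data, `value_counts` on each categorical column is a quick way to catch unexpected values, and `describe` helps flag implausible minima or maxima in numeric columns.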

## 5) Full EDA with YData Profiling (25m)

In [2]:
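import pandas as pd
from ydata_profiling import ProfileReport

# Assumes train.csv is in the working directory (see step 1).
df = pd.read_csv("train.csv")

# explorative=True turns on the more detailed analyses in the report.
profile = ProfileReport(df, title="Spaceship Titanic – EDA Profile", explorative=True)
profile.to_notebook_iframe()  # render the report inline in the notebook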



In [3]:
profile.to_file("spaceship_titanic_profile.html")
print("Saved report to spaceship_titanic_profile.html")



Saved report to spaceship_titanic_profile.html


## 6) Check Notebooks

Go to the **Spaceship Titanic** competition page and choose the `Code` tab. Read a few public notebooks and note which EDA techniques they use and why.