# Dataset Project Ideas

This notebook introduces three datasets and proposes project ideas for practicing `pandas` and machine learning techniques.

## Titanic Survival Dataset
[Kaggle link](https://www.kaggle.com/c/titanic)

The Titanic dataset contains one row per passenger, with columns such as ticket class, sex, age, fare, and whether the passenger survived. It's useful for practicing binary classification, for example predicting who survived.

In [None]:
import pandas as pd

titanic_url = 'https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv'
titanic = pd.read_csv(titanic_url)
titanic.head()

**Project ideas**
- Handle missing values and engineer new features (e.g., family size).
- Train a model to predict the `Survived` column.
- Compare algorithms such as logistic regression and random forests.
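
The first idea above can be sketched as follows. This is a minimal example, not the only reasonable approach: filling `Age` with the median and `Embarked` with the most frequent value are simple baseline choices assumed here, and the function name `prepare_titanic` is hypothetical. The column names (`Age`, `Embarked`, `SibSp`, `Parch`) are assumed to match the standard Kaggle schema; a small hand-made sample stands in for the real data so the cell runs offline.

```python
import pandas as pd


def prepare_titanic(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with missing values filled and a FamilySize column added."""
    out = df.copy()
    # Fill missing ages with the median age (a simple baseline, assumed here).
    out["Age"] = out["Age"].fillna(out["Age"].median())
    # Fill missing embarkation ports with the most frequent port.
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode().iloc[0])
    # Family size = siblings/spouses + parents/children + the passenger.
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    return out


# Hand-made sample with the assumed column names (not real passenger data).
titanic_sample = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Age": [22.0, None, 26.0, 35.0],
    "SibSp": [1, 1, 0, 0],
    "Parch": [0, 0, 0, 2],
    "Embarked": ["S", "C", None, "S"],
})
titanic_prepared = prepare_titanic(titanic_sample)
print(titanic_prepared[["Age", "Embarked", "FamilySize"]])
```

To apply it to the real data, call `prepare_titanic(titanic)` on the frame loaded earlier.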

## Wine Quality Dataset
[UCI link](https://archive.ics.uci.edu/ml/datasets/wine+quality)

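This dataset contains physicochemical properties of wine samples (such as acidity, residual sugar, and alcohol) along with an integer quality score for each sample. The cell below loads the red-wine file.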

In [None]:
import pandas as pd

wine_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'
# The file is semicolon-delimited, so the separator must be set explicitly.
wine = pd.read_csv(wine_url, sep=';')
wine.head()

**Project ideas**
- Explore correlations between features and quality.
- Build a regression model to predict the quality score.
- Turn quality into categories (good/bad) for classification.
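
A rough sketch of the first and third ideas above, using only `pandas`. The helper names `quality_correlations` and `add_quality_label`, the `is_good` column, and the cutoff of 7 for a "good" wine are all assumptions made for illustration; pick a threshold that suits your analysis. The sample frame is hand-made and only mimics two of the real column names.

```python
import pandas as pd


def quality_correlations(df: pd.DataFrame) -> pd.Series:
    """Pearson correlation of each numeric feature with `quality`, strongest first."""
    corr = df.select_dtypes("number").corr()["quality"].drop("quality")
    # Order by absolute correlation while keeping the original signs.
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


def add_quality_label(df: pd.DataFrame, threshold: int = 7) -> pd.DataFrame:
    """Return a copy with a binary `is_good` column (1 if quality >= threshold)."""
    out = df.copy()
    out["is_good"] = (out["quality"] >= threshold).astype(int)
    return out


# Hand-made sample (not real measurements) so the sketch runs offline.
wine_sample = pd.DataFrame({
    "volatile acidity": [0.8, 0.6, 0.5, 0.3],
    "alcohol": [9.0, 10.0, 11.0, 12.0],
    "quality": [5, 6, 6, 8],
})
wine_corr = quality_correlations(wine_sample)
wine_labeled = add_quality_label(wine_sample)
print(wine_corr)
print(wine_labeled["is_good"].tolist())
```

With the real data, `quality_correlations(wine)` gives a quick ranking of which features to inspect first, and `add_quality_label(wine)` produces a target for a classifier.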

## Penguins Dataset
[Palmer Penguins](https://github.com/allisonhorst/palmerpenguins)

Body measurements (bill length, bill depth, flipper length, and body mass) for three penguin species make a compact dataset for classification. A few rows have missing values.

In [None]:
import pandas as pd

penguin_url = 'https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/inst/extdata/penguins.csv'
penguins = pd.read_csv(penguin_url)
penguins.head()

**Project ideas**
- Drop rows with missing measurements and visualize feature distributions.
- Train a classifier to predict `species`.
- Try dimensionality reduction for visualization.
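
The first and last ideas above could be combined along these lines. This sketch uses principal component analysis (PCA) computed with NumPy's SVD as one possible dimensionality reduction method; the function name `pca_2d` and the `pc1`/`pc2` output columns are hypothetical, and the measurement column names are assumed to match the CSV loaded earlier. The sample rows are hand-made so the cell runs without a download.

```python
import numpy as np
import pandas as pd

MEASUREMENTS = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]


def pca_2d(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with missing measurements and project the rest onto two principal components."""
    clean = df.dropna(subset=MEASUREMENTS)
    X = clean[MEASUREMENTS].to_numpy(dtype=float)
    # Standardize each column so body mass (in grams) does not dominate.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    coords = X @ vt[:2].T
    return pd.DataFrame({
        "species": clean["species"].to_numpy(),
        "pc1": coords[:, 0],
        "pc2": coords[:, 1],
    })


# Hand-made sample (illustrative values only) with one missing measurement.
penguin_sample = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"],
    "bill_length_mm": [39.1, None, 46.1, 50.0, 46.5],
    "bill_depth_mm": [18.7, 17.4, 13.2, 16.3, 17.9],
    "flipper_length_mm": [181.0, 186.0, 211.0, 230.0, 192.0],
    "body_mass_g": [3750.0, 3800.0, 4500.0, 5700.0, 3500.0],
})
penguin_projected = pca_2d(penguin_sample)
print(penguin_projected)
```

Plotting `pc1` against `pc2` colored by `species` gives a two-dimensional view of how well the measurements separate the species.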