# Exercises XP Gold: W2_D1

## What We Will Learn
- Basic Python programming in Google Colab and Jupyter Notebook
- Accessing and inspecting datasets (e.g., from Kaggle)
- Basic data analysis techniques with pandas/NumPy
- Creating and manipulating data programmatically

## What You Will Create
- Python scripts in Google Colab/Jupyter
- Outputs displaying dataset summaries and basic statistics

---

## Exercise 1: Basic Python Programming with Google Colab
- Create a new Colab notebook.
- Write a Python script to check whether a given number is **prime** (e.g., test `7`).

## Exercise 2: List Comprehensions with Jupyter Notebook
- Launch Jupyter Notebook locally and create a new notebook.
- Using a **list comprehension**, find all numbers in a range (e.g., `1..20`) that are **divisible by a given number** (e.g., `3`).

## Exercise 3: Kaggle Dataset Exploration — Print Dataset Information
- Choose a dataset **other than Titanic** (e.g., Iris).
- Load the dataset and print:
  - `head()` (first rows)
  - the list of **columns**
  - **data types** of each column

## Exercise 4: Basic Data Analysis with Kaggle
- Select a simple dataset (e.g., Boston Housing).
- Compute the **mean**, **median**, and **standard deviation** of a chosen **numeric column** and print the results.

## Exercise 5: Transform Qualitative → Quantitative (Titanic)
- Download and load the **Titanic** dataset (e.g., from Kaggle).
- Transform the `sex` column (named `Sex` in the Kaggle CSV) using `.map` so that:
  - `female → 0`
  - `male → 1`
- Show the transformed column and basic value counts.



## Exercise 1 — Prime check (example with 7)

In [1]:
# Title: Prime number check (with sqrt optimization)
# This function returns True if n is prime, else False. Includes quick tests.

import math

def is_prime(n: int) -> bool:
    """Return True if n is a prime number, else False."""
    if n <= 1:
        return False
    if n <= 3:
        return True
    if n % 2 == 0 or n % 3 == 0:
        return False
    limit = math.isqrt(n)  # integer square root; already an int
    i = 5
    while i <= limit:
        if n % i == 0 or n % (i + 2) == 0:
            return False
        i += 6
    return True

# Example: check if 7 is prime
n = 7
print(f"{n} is prime? {is_prime(n)}")

7 is prime? True
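The loop in `is_prime` tests only `i` and `i + 2` while stepping by 6. This relies on the fact that every prime greater than 3 has the form 6k ± 1. The sketch below checks that fact empirically for small numbers. It uses a deliberately slow helper, `is_prime_naive`, which is a hypothetical name introduced here for illustration and is not part of the solution above.

```python
def is_prime_naive(n: int) -> bool:
    """Trial division by every integer from 2 to n - 1 (slow, but easy to verify)."""
    return n > 1 and all(n % d != 0 for d in range(2, n))

# All primes below 200, found by brute force.
primes = [n for n in range(2, 200) if is_prime_naive(n)]

# Remainders modulo 6 of every prime greater than 3.
residues = sorted({p % 6 for p in primes if p > 3})
print(residues)  # [1, 5]
```

Because only the remainders 1 and 5 occur, candidates of the form 6k + 5 (`i`) and 6k + 7 (`i + 2`) cover every possible prime divisor once 2 and 3 have been ruled out.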


## Exercise 2 — List comprehension (divisible numbers)

In [2]:
# Title: List comprehension for numbers divisible by a given divisor
# Find all numbers in a given inclusive range that are divisible by 'divisor'.

start, end, divisor = 1, 20, 3  # change as needed
divisible = [x for x in range(start, end + 1) if x % divisor == 0]
print(f"Numbers in [{start}, {end}] divisible by {divisor}: {divisible}")

Numbers in [1, 20] divisible by 3: [3, 6, 9, 12, 15, 18]
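For comparison, the same result can be produced with an explicit loop, or without any filtering by letting `range` step through the multiples directly. This is a minimal sketch that assumes `divisor` is a positive integer. The names `looped`, `first`, and `stepped` are introduced here for illustration.

```python
start, end, divisor = 1, 20, 3

# Equivalent explicit loop: same filter as the list comprehension.
looped = []
for x in range(start, end + 1):
    if x % divisor == 0:
        looped.append(x)

# Alternative: jump straight to the first multiple >= start, then step by divisor.
first = start + (-start) % divisor
stepped = list(range(first, end + 1, divisor))

print(looped)   # [3, 6, 9, 12, 15, 18]
print(stepped)  # [3, 6, 9, 12, 15, 18]
```

The comprehension is usually the most readable of the three. The stepped `range` avoids testing numbers that cannot match, which may matter for very large ranges.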


## Exercise 3 — Kaggle dataset exploration (head, columns, dtypes)

In [3]:
# Title: Dataset exploration — head, columns, dtypes (Kaggle-like)
# Option A: set csv_path to your downloaded Kaggle CSV file path.
# Option B: leave csv_path=None to demo with seaborn's built-in "iris" (no download).

import pandas as pd

csv_path = None  # e.g., "/content/iris.csv" or "path/to/your.csv"

try:
    if csv_path:
        df = pd.read_csv(csv_path)
        source = f"Loaded from CSV: {csv_path}"
    else:
        import seaborn as sns
        df = sns.load_dataset("iris")
        source = "Loaded seaborn iris (demo fallback)"
except Exception as e:
    # Chain the original exception so the full traceback is preserved.
    raise RuntimeError(f"Failed to load dataset: {e}") from e

print(source)
print("\n--- HEAD ---")
print(df.head())
print("\n--- COLUMNS ---")
print(list(df.columns))
print("\n--- DTYPES ---")
print(df.dtypes)

Loaded seaborn iris (demo fallback)

--- HEAD ---
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa

--- COLUMNS ---
['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']

--- DTYPES ---
sepal_length    float64
sepal_width     float64
petal_length    float64
petal_width     float64
species          object
dtype: object
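The Option A path (`pd.read_csv`) can be tried without downloading anything by reading CSV text from memory. The sketch below assumes a tiny hypothetical CSV whose three rows are copied from the iris output above. It also prints `shape`, which goes slightly beyond what the exercise asks for.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV; rows copied from the first lines of the iris output.
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
"""

# read_csv accepts any file-like object, so StringIO stands in for a file path.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.head())
print(list(sample.columns))
print(sample.dtypes)
print(sample.shape)  # (rows, columns)
```

The dtype reported for the text column may differ between pandas versions, so it is safer to rely on the numeric dtypes when comparing outputs.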


## Exercise 4 — Basic data analysis (mean, median, std of a column)

In [4]:
# Title: Basic stats — mean, median, std for a chosen numeric column
# Works with any DataFrame 'df' (reuses df from the previous cell). If no column is
# specified, the first numeric column is picked automatically.

import numpy as np

# Optional: set a column name explicitly (string). If None, auto-detect the first numeric one.
column_name = None  # e.g., "sepal_length"

# Ensure df exists (from Exercise 3). Otherwise, quickly demo with seaborn iris.
try:
    df
except NameError:
    import seaborn as sns
    df = sns.load_dataset("iris")

# Pick column
if column_name is None:
    numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()
    if not numeric_cols:
        raise ValueError("No numeric columns found.")
    column_name = numeric_cols[0]

col = df[column_name].dropna()
mean_val = col.mean()
median_val = col.median()
std_val = col.std(ddof=1)  # sample std

print(f"Column: {column_name}")
print(f"Mean:   {mean_val:.4f}")
print(f"Median: {median_val:.4f}")
print(f"Std:    {std_val:.4f}")

Column: sepal_length
Mean:   5.8433
Median: 5.8000
Std:    0.8281
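The `ddof=1` argument matters when results from pandas and NumPy are compared: `Series.std` defaults to the sample standard deviation (`ddof=1`), while `np.std` defaults to the population standard deviation (`ddof=0`). The sketch below illustrates the difference on the five `sepal_length` values shown in the `head()` output of Exercise 3. The variable names are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# First five sepal_length values from the iris head() output.
values = [5.1, 4.9, 4.7, 4.6, 5.0]
s = pd.Series(values)

sample_std = s.std()                 # pandas default: ddof=1 (divide by n - 1)
population_std = np.std(values)      # NumPy default: ddof=0 (divide by n)
matched = np.std(values, ddof=1)     # NumPy made to agree with pandas

print(f"pandas  std (ddof=1): {sample_std:.4f}")
print(f"NumPy   std (ddof=0): {population_std:.4f}")
print(f"NumPy   std (ddof=1): {matched:.4f}")
```

On a small sample the two defaults differ noticeably. On the full 150-row iris column the gap is much smaller, but stating `ddof` explicitly keeps the result unambiguous.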


## Exercise 5 — Titanic: map sex → 0 (female), 1 (male)

In [5]:
# Title: Titanic — map 'sex' to 0/1 with .map
# Option A: set titanic_csv to your Kaggle Titanic CSV path (train.csv).
# Option B: leave titanic_csv=None to demo with seaborn's titanic dataset (similar schema).

import pandas as pd

titanic_csv = None  # e.g., "/content/train.csv"

try:
    if titanic_csv:
        titanic = pd.read_csv(titanic_csv)
        source = f"Kaggle CSV: {titanic_csv}"
    else:
        import seaborn as sns
        titanic = sns.load_dataset("titanic")
        source = "seaborn titanic (demo fallback)"
except Exception as e:
    # Chain the original exception so the full traceback is preserved.
    raise RuntimeError(f"Failed to load Titanic dataset: {e}") from e

print("Source:", source)
print("Columns:", list(titanic.columns))

# Handle common casing: Kaggle uses 'Sex', seaborn uses 'sex'
sex_col = next((c for c in ("sex", "Sex") if c in titanic.columns), None)
if sex_col is None:
    raise KeyError("No 'sex' or 'Sex' column found in the dataset.")

titanic["sex_num"] = titanic[sex_col].map({"female": 0, "male": 1, "Female": 0, "Male": 1})

print("\nValue counts for mapped column 'sex_num':")
print(titanic["sex_num"].value_counts(dropna=False))

print("\nPreview:")
print(titanic[[sex_col, "sex_num"]].head(10))

Source: seaborn titanic (demo fallback)
Columns: ['survived', 'pclass', 'sex', 'age', 'sibsp', 'parch', 'fare', 'embarked', 'class', 'who', 'adult_male', 'deck', 'embark_town', 'alive', 'alone']

Value counts for mapped column 'sex_num':
sex_num
1    577
0    314
Name: count, dtype: int64

Preview:
      sex  sex_num
0    male        1
1  female        0
2  female        0
3  female        0
4    male        1
5    male        1
6    male        1
7    male        1
8  female        0
9  female        0
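One caveat with `.map` and a dictionary: any value that is not a key in the dictionary, including missing values, becomes `NaN`. The sketch below shows one way to guard against that by normalizing the text first and counting unmapped rows. The messy `raw` values are made up for illustration and do not come from the Titanic data.

```python
import pandas as pd

# Made-up labels with inconsistent casing, stray spaces, a missing value, and an unknown label.
raw = pd.Series(["male", "Female", " female ", "MALE", None, "unknown"])

mapping = {"female": 0, "male": 1}

# Normalize whitespace and casing so a two-entry mapping is enough.
normalized = raw.str.strip().str.lower()
sex_num = normalized.map(mapping)

# Values absent from the mapping (and missing values) come out as NaN.
unmapped = int(sex_num.isna().sum())
print(f"Unmapped rows: {unmapped}")  # Unmapped rows: 2

# Optional: a nullable integer dtype keeps 0/1 as integers despite the gaps.
sex_num_int = sex_num.astype("Int64")
print(sex_num_int)
```

In the demo output above every row mapped cleanly, so the result stayed an integer column. With unmapped rows, pandas would fall back to floats unless a nullable dtype such as `Int64` is used.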


## Conclusion

In this set of exercises I:

- Implemented a **prime-check** function and practiced control flow.
- Used a **list comprehension** to filter numbers by divisibility.
- Loaded a real-world dataset and reported its **head, columns, and dtypes**.
- Calculated **mean**, **median**, and **standard deviation** for a numeric column using pandas/NumPy.
- Converted a **categorical** feature (`sex`) into a **numeric** representation with `.map`, preparing the data for downstream analysis or modeling.

These tasks reinforced practical data-wrangling skills in notebooks and laid the groundwork for more advanced exploratory data analysis (EDA) and machine learning workflows.
