
## Agenda 
- Setup + Python mindset
- Python essentials (variables → lists/dicts → control flow → functions)
- Kaggle + pandas mini‑EDA (Titanic)
- SQLite basics (SELECT, WHERE, GROUP BY, JOIN) with `sqlite3`

> If running on **Kaggle**: click **“Add data”** and search for **Titanic** (Kaggle’s classic dataset).  
> The typical path will be `/kaggle/input/titanic/train.csv`.  
> This notebook also includes smart fallbacks if you’re local.


In [1]:

# 🧪 Environment check: Are we on Kaggle?
import os, sys, platform

IN_KAGGLE = "KAGGLE_KERNEL_RUN_TYPE" in os.environ
print("Running on Kaggle:", IN_KAGGLE)
print("Python:", sys.version.split()[0], "| OS:", platform.system())

# A helper to pretty‑print section headers in output:
def banner(title):
    print("\n" + "="*len(title))
    print(title)
    print("="*len(title))


Running on Kaggle: True
Python: 3.11.13 | OS: Linux



## 1) Python in 90 seconds 

- Python is **batteries‑included**: a huge standard library + a vast ecosystem.
- You write **readable** code first; cleverness is optional.
- Lists/dicts are your daily drivers; functions keep code tidy.
- Jupyter notebooks let you **iterate**, **annotate**, and **teach** all in one place.





### Variables & Types (demo)

Common built‑ins you’ll meet everywhere:
- `int`, `float`, `str`, `bool`
- Containers: `list`, `dict`, `tuple`, `set`


In [2]:

banner("Variables & Types")
age = 52               # int
temperature = 72.5     # float
name = "Ada"           # str
is_cool = True         # bool

nums = [3, 1, 4, 1, 5]                         # list
person = {"first": "Ada", "last": "Lovelace"}  # dict
coords = (37.77, -122.42)                      # tuple
unique = {1, 2, 2, 3}                          # set → {1,2,3}

print(type(age), type(temperature), type(name), type(is_cool))
print(nums, person, coords, unique)

# f-strings evaluate expressions inside {} and convert the result to text:
print(f"{name} is {age} years old. Next year: {age + 1}.")



Variables & Types
<class 'int'> <class 'float'> <class 'str'> <class 'bool'>
[3, 1, 4, 1, 5] {'first': 'Ada', 'last': 'Lovelace'} (37.77, -122.42) {1, 2, 3}
Ada is 52 years old. Next year: 53.
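
The cell above formats values with an f-string. Explicit conversions between the built-in types use the type names as functions. A minimal sketch, assuming hypothetical raw inputs (`raw_age` and `raw_temp` are illustrative names, not part of the notebook):

```python
# Hypothetical raw inputs, e.g. text read from a form or a CSV cell
raw_age = "52"
raw_temp = "72.5"

age = int(raw_age)              # str -> int
temperature = float(raw_temp)   # str -> float
label = str(age + 1)            # int -> str

print(age + 1, temperature * 2, "Next year: " + label)

# bool() treats empty strings, zero, and empty containers as False
print(bool(""), bool("Ada"), bool(0), bool([]))
```

Note that `int("72.5")` raises a `ValueError`; convert with `float` first when the text may contain a decimal point.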



### Lists & Dicts


In [None]:

banner("Lists & Dicts")
nums = [10, 20, 30]
nums.append(40)
nums[1] = 25
print("nums:", nums, "| slice nums[1:]:", nums[1:])

grades = {"Alice": 92, "Bob": 85}
grades["Bob"] = 88
grades["Cara"] = 95
print("grades:", grades)
print("keys:", list(grades.keys()), "| values:", list(grades.values()))
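
Two lookups that come up right after the basics above are safe dictionary access and indexing from the end of a list. The sketch below reuses the final values from the cell above; `"Dan"` is a hypothetical missing key added for illustration.

```python
grades = {"Alice": 92, "Bob": 88, "Cara": 95}

# .get returns a default instead of raising KeyError for a missing key
dan_grade = grades.get("Dan", "no grade yet")
print("Dan:", dan_grade)

# .items() yields (key, value) pairs, in insertion order
for student, score in grades.items():
    print(f"{student}: {score}")

# Negative indices and open-ended slices count from the ends of a list
nums = [10, 25, 30, 40]
last = nums[-1]
first_two = nums[:2]
print("last:", last, "| first two:", first_two)
```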



### Control Flow 


In [None]:

banner("Control Flow")
x = 7
if x % 2 == 0:
    print("even")
else:
    print("odd")

# for‑loop & list comprehension
squares = []
for n in range(1, 6):
    squares.append(n*n)

squares_comp = [n*n for n in range(1, 6)]
print("squares:", squares, "| comprehension:", squares_comp)
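
A few related constructs are worth seeing next to the loop and comprehension above: `enumerate` for positions, a dict comprehension, and an `elif` chain. This is only a sketch; the names `square_of` and `sign` are illustrative.

```python
squares = [n * n for n in range(1, 6)]

# enumerate pairs each item with its position (starting at 1 here)
for i, sq in enumerate(squares, start=1):
    print(f"{i} squared is {sq}")

# A dict comprehension builds a lookup table in one expression
square_of = {n: n * n for n in range(1, 6)}

# elif lets one if-statement test several mutually exclusive cases
x = 7
if x < 0:
    sign = "negative"
elif x == 0:
    sign = "zero"
else:
    sign = "positive"

print(sign, "| 4 squared:", square_of[4])
```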



### Functions


In [None]:

banner("Functions")
def greet(first, last="Coder"):
    '''Return a friendly greeting.'''
    return f"Hello, {first} {last}!"

print(greet("Ada", "Lovelace"))
print(greet("Grace"))
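
Building on `greet` above, arguments can also be passed by name, and a function can return several values as a tuple. A minimal sketch; `min_max` is a hypothetical helper introduced only for this example.

```python
def greet(first, last="Coder"):
    '''Return a friendly greeting.'''
    return f"Hello, {first} {last}!"

# Keyword arguments can be given in any order
message = greet(last="Hopper", first="Grace")
print(message)

def min_max(values):
    '''Return (smallest, largest) for a non-empty sequence.'''
    return min(values), max(values)

# Tuple unpacking assigns each returned value to its own name
low, high = min_max([3, 1, 4, 1, 5])
print(low, high)
```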



#### 🧪 Micro‑exercise A (2 minutes)
Write a function `only_evens(seq)` that returns a **new list** containing only the even numbers in `seq`.

<details>
<summary>Solution</summary>

```python
def only_evens(seq):
    return [x for x in seq if x % 2 == 0]

only_evens([1,2,3,4,5,6])
```
</details>



## 2) Kaggle + pandas mini‑EDA (Titanic) (≈15 min)

**Goal:** Load a real dataset, inspect it, do simple transforms, then answer a basic question.

**If on Kaggle:** Click **Add data** → search **Titanic** → add the dataset.  
Typical path: `/kaggle/input/titanic/train.csv`

This notebook will try, in order:
1. Kaggle Titanic path
2. Another common Kaggle path (`/kaggle/input/titanic/titanic.csv`)
3. Public URL (works locally if you have internet)
4. Tiny synthetic fallback so the lesson always runs


In [3]:

banner("Load Titanic (with smart fallbacks)")
from pathlib import Path
import pandas as pd

def load_titanic() -> pd.DataFrame:
    # 1) Kaggle default
    p1 = Path("/kaggle/input/titanic/train.csv")
    if p1.exists():
        return pd.read_csv(p1)
    # 2) Alternate path sometimes used
    p2 = Path("/kaggle/input/titanic/titanic.csv")
    if p2.exists():
        return pd.read_csv(p2)
    # 3) Public URL (works locally; likely blocked on Kaggle)
    try:
        return pd.read_csv("https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv")
    except Exception as e:
        print("URL load failed:", e)
    # 4) Tiny synthetic fallback
    df = pd.DataFrame({
        "Survived":[0,1,1,0,1,0,1,0],
        "Pclass":[3,1,1,3,2,3,2,1],
        "Sex":["male","female","female","male","female","male","female","male"],
        "Age":[22,38,26,35,27,54,14,40],
        "Fare":[7.25,71.2833,7.925,8.05,10.5,51.8625,14.4542,27.7208],
    })
    print("Using a synthetic miniature dataset (8 rows). Add the Kaggle Titanic dataset for the full experience.")
    return df

df = load_titanic()
print(df.shape)
df.head()



Load Titanic (with smart fallbacks)
(891, 12)


|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |



### Quick look


In [4]:

banner("Inspect")
print(df.head(3))
print("\nColumns:", list(df.columns))
print("\nInfo:")
df.info()  # prints its own summary and returns None, so no print() wrapper
print("\nDescribe (numeric):")
print(df.describe())



Inspect
   PassengerId  Survived  Pclass  \
0            1         0       3   
1            2         1       1   
2            3         1       3   

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1   
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                             Heikkinen, Miss. Laina  female  26.0      0   

   Parch            Ticket     Fare Cabin Embarked  
0      0         A/5 21171   7.2500   NaN        S  
1      0          PC 17599  71.2833   C85        C  
2      0  STON/O2. 3101282   7.9250   NaN        S  

Columns: ['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked']

Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0


### Select → Filter → Sort


In [None]:

banner("Select/Filter/Sort")
subset = df[["Survived","Pclass","Sex","Age","Fare"]].copy()
adults = subset[subset["Age"] >= 18]
sorted_adults = adults.sort_values(["Pclass","Fare"], ascending=[True, False]).head(5)
sorted_adults
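
Filters can combine several conditions, and missing ages affect the result. The sketch below uses a small stand-in frame (`demo`, with made-up rows in the same columns as `subset`) so it runs on its own; it assumes pandas is installed.

```python
import pandas as pd

# Stand-in data with the same columns as `subset`; values are illustrative only
demo = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1],
    "Pclass":   [3, 1, 1, 3, 2],
    "Sex":      ["male", "female", "female", "male", "female"],
    "Age":      [22.0, 38.0, None, 35.0, 14.0],
    "Fare":     [7.25, 71.2833, 7.925, 8.05, 14.4542],
})

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
adult_women = demo[(demo["Age"] >= 18) & (demo["Sex"] == "female")]

# A comparison against a missing Age is False, so that row is silently dropped
missing_age = int(demo["Age"].isna().sum())

print(len(adult_women), "adult women |", missing_age, "row(s) with missing Age")
```

On the real Titanic data, checking `df["Age"].isna().sum()` before filtering shows how many passengers an age filter leaves out.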



### New columns & GroupBy


In [None]:

banner("New columns & GroupBy")
subset = df[["Survived","Pclass","Sex","Age","Fare"]].copy()
# Rows with a missing Age compare as False, so they are not flagged as minors
subset["is_minor"] = subset["Age"] < 18
rate_by_sex = subset.groupby("Sex")["Survived"].mean().reset_index(name="survival_rate")
rate_by_sex
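
The `is_minor` column created above is not used by the groupby that follows it. One way to put a derived column to work is to group on it and report the group size next to the mean. A sketch on made-up rows (`demo` is a stand-in, not the real dataset), assuming pandas is installed:

```python
import pandas as pd

# Stand-in rows; values are illustrative only
demo = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Age":      [22.0, 38.0, 14.0, 35.0, 4.0, None],
})
demo["is_minor"] = demo["Age"] < 18   # a missing Age yields False

# Named aggregation: each keyword becomes an output column
rate_by_minor = (
    demo.groupby("is_minor")["Survived"]
    .agg(survival_rate="mean", n="size")
    .reset_index()
)
print(rate_by_minor)
```

Reporting `n` alongside the rate makes it easier to judge whether a group is large enough for its rate to mean much.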



### Quick plot (matplotlib)


In [None]:

banner("Plot survival rate by Sex")
import matplotlib.pyplot as plt

ax = rate_by_sex.plot(kind="bar", x="Sex", y="survival_rate", legend=False, title="Survival Rate by Sex")
ax.set_ylabel("Rate")
plt.show()



#### 🧪 Micro‑exercise B (3 minutes)
Compute survival rate by **class** (Pclass). Which class had the highest rate?

<details>
<summary>Solution</summary>

```python
rate_by_class = subset.groupby("Pclass")["Survived"].mean().reset_index(name="survival_rate")
rate_by_class.sort_values("survival_rate", ascending=False)
```
</details>



## 3) SQLite basics in‑notebook (≈10–12 min)

We’ll push a DataFrame into an **in‑memory** SQLite database, then query it with SQL.

**Why SQLite here?**
- Zero install (built into Python via `sqlite3`)
- SQL syntax you can reuse in Postgres/MySQL later
- Perfect for demos & small projects


In [5]:

banner("SQLite setup: create DB and load table")
import sqlite3
conn = sqlite3.connect(":memory:")  # or 'titanic.db' for a file
df_sql = df[["Survived","Pclass","Sex","Age","Fare"]].copy()
df_sql.to_sql("titanic", conn, index=False, if_exists="replace")

# Create a tiny lookup table for a JOIN demo (pandas is already imported as pd above)
class_lookup = pd.DataFrame({
    "Pclass":[1,2,3],
    "ClassName":["First","Second","Third"]
})
class_lookup.to_sql("class_lookup", conn, index=False, if_exists="replace")

print("Tables ready: titanic, class_lookup")



SQLite setup: create DB and load table
Tables ready: titanic, class_lookup
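
To confirm what a SQLite database actually contains, you can query its built-in catalog. The sketch below creates its own throwaway connection (`conn_demo`, a hypothetical name) with empty tables so it runs independently; the same two queries should work against `conn`.

```python
import sqlite3

# Throwaway in-memory database with empty stand-in tables
conn_demo = sqlite3.connect(":memory:")
conn_demo.execute("CREATE TABLE titanic (Survived INTEGER, Pclass INTEGER, Sex TEXT)")
conn_demo.execute("CREATE TABLE class_lookup (Pclass INTEGER, ClassName TEXT)")

# sqlite_master is SQLite's catalog of tables, indexes, and views
tables = [row[0] for row in conn_demo.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)]
print("tables:", tables)

# PRAGMA table_info returns one row per column; index 1 holds the column name
columns = [row[1] for row in conn_demo.execute("PRAGMA table_info(titanic)")]
print("titanic columns:", columns)

conn_demo.close()
```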



### SELECT, WHERE, ORDER BY, LIMIT


In [6]:

banner("SQL: basic querying")

q1 = '''
SELECT Survived, Pclass, Sex, Age, Fare
FROM titanic
WHERE Age >= 18
ORDER BY Pclass ASC, Fare DESC
LIMIT 5;
'''
pd.read_sql_query(q1, conn)



SQL: basic querying


|   | Survived | Pclass | Sex | Age | Fare |
|---|---|---|---|---|---|
| 0 | 1 | 1 | female | 35.0 | 512.3292 |
| 1 | 1 | 1 | male | 36.0 | 512.3292 |
| 2 | 1 | 1 | male | 35.0 | 512.3292 |
| 3 | 0 | 1 | male | 19.0 | 263.0 |
| 4 | 1 | 1 | female | 23.0 | 263.0 |



### Aggregations & GROUP BY


In [7]:

banner("SQL: GROUP BY")
q2 = '''
SELECT Sex, AVG(Survived) AS survival_rate, COUNT(*) AS n
FROM titanic
GROUP BY Sex
ORDER BY survival_rate DESC;
'''
pd.read_sql_query(q2, conn)



SQL: GROUP BY


|   | Sex | survival_rate | n |
|---|---|---|---|
| 0 | female | 0.742038 | 314 |
| 1 | male | 0.188908 | 577 |



### JOIN example


In [None]:

banner("SQL: JOIN with lookup")
q3 = '''
SELECT c.ClassName, AVG(t.Survived) AS survival_rate, COUNT(*) AS n
FROM titanic t
JOIN class_lookup c ON t.Pclass = c.Pclass
GROUP BY c.ClassName
ORDER BY survival_rate DESC;
'''
pd.read_sql_query(q3, conn)
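
A plain `JOIN` is an inner join: rows without a match in the lookup table disappear from the result. A `LEFT JOIN` keeps them and fills the missing side with NULL. The sketch below shows the difference on a hypothetical `passengers` table that includes an unmatched class value (4), which the real Titanic data does not contain.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
conn_demo.executescript("""
CREATE TABLE passengers (Name TEXT, Pclass INTEGER);
CREATE TABLE class_lookup (Pclass INTEGER, ClassName TEXT);
INSERT INTO passengers VALUES ('A', 1), ('B', 3), ('C', 4);
INSERT INTO class_lookup VALUES (1, 'First'), (2, 'Second'), (3, 'Third');
""")

# Inner join: passenger C (class 4) has no match and is dropped
inner_rows = conn_demo.execute("""
    SELECT p.Name, c.ClassName
    FROM passengers p
    JOIN class_lookup c ON p.Pclass = c.Pclass
    ORDER BY p.Name
""").fetchall()

# Left join: every passenger is kept; unmatched lookups come back as None
left_rows = conn_demo.execute("""
    SELECT p.Name, c.ClassName
    FROM passengers p
    LEFT JOIN class_lookup c ON p.Pclass = c.Pclass
    ORDER BY p.Name
""").fetchall()

print("inner:", inner_rows)
print("left: ", left_rows)
conn_demo.close()
```

Comparing `COUNT(*)` before and after a join is a quick way to notice rows lost to unmatched keys.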



### Parameterized queries (avoid SQL injection)


In [None]:

banner("SQL: parameterized WHERE")
min_age = 30
q4 = '''
SELECT Sex, AVG(Survived) AS survival_rate, COUNT(*) AS n
FROM titanic
WHERE Age >= ?
GROUP BY Sex
ORDER BY survival_rate DESC;
'''
pd.read_sql_query(q4, conn, params=(min_age,))
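
The `?` placeholder above is bound as a value, never spliced into the SQL text. The sketch below shows the same idea with the standard `sqlite3` module directly, using named placeholders and a deliberately hostile string. The table contents and the `hostile` value are made up for illustration.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
conn_demo.execute("CREATE TABLE titanic (Sex TEXT, Age REAL, Survived INTEGER)")
conn_demo.executemany(
    "INSERT INTO titanic VALUES (?, ?, ?)",
    [("female", 38.0, 1), ("male", 22.0, 0), ("male", 54.0, 0)],
)

# Named placeholders (:name) take their values from a dict
older_men = conn_demo.execute(
    "SELECT COUNT(*) FROM titanic WHERE Age >= :min_age AND Sex = :sex",
    {"min_age": 30, "sex": "male"},
).fetchone()[0]

# A bound value is compared as a plain string, so the quote trick matches nothing
hostile = "male' OR '1'='1"
hostile_count = conn_demo.execute(
    "SELECT COUNT(*) FROM titanic WHERE Sex = ?",
    (hostile,),
).fetchone()[0]

print("older men:", older_men, "| rows matched by hostile input:", hostile_count)
conn_demo.close()
```

Had the hostile string been pasted into the query with string formatting instead, the `OR '1'='1'` clause would have matched every row.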



### SQL vs pandas: same question, two ways

**Question:** Survival rate by Sex and Class (sorted by rate desc).  
Try both and compare results.


In [None]:

banner("SQL answer")
q5 = '''
SELECT Sex, Pclass, AVG(Survived) AS survival_rate, COUNT(*) AS n
FROM titanic
GROUP BY Sex, Pclass
ORDER BY survival_rate DESC;
'''
sql_answer = pd.read_sql_query(q5, conn)
sql_answer


In [None]:

banner("pandas answer")
pd_answer = (
    df_sql
    .groupby(["Sex","Pclass"])["Survived"]
    .agg(survival_rate="mean", n="size")  # n mirrors COUNT(*) in the SQL version
    .reset_index()
    .sort_values("survival_rate", ascending=False)
)
pd_answer
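
Rather than comparing the two answers by eye, you can align them on the grouping keys and check them programmatically. A self-contained sketch on made-up rows, assuming pandas is installed; `demo`, `conn_demo`, `sql_demo`, and `pd_demo` are illustrative names.

```python
import sqlite3
import pandas as pd

# Stand-in rows; values are illustrative only
demo = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass":   [3, 1, 1, 3, 2, 3],
    "Sex":      ["male", "female", "female", "male", "female", "female"],
})
conn_demo = sqlite3.connect(":memory:")
demo.to_sql("titanic", conn_demo, index=False)

sql_demo = pd.read_sql_query(
    "SELECT Sex, Pclass, AVG(Survived) AS survival_rate "
    "FROM titanic GROUP BY Sex, Pclass",
    conn_demo,
)
pd_demo = (
    demo.groupby(["Sex", "Pclass"])["Survived"]
    .mean()
    .reset_index(name="survival_rate")
)

# Sort both results by the keys so row order cannot cause a false mismatch
key = ["Sex", "Pclass"]
sql_sorted = sql_demo.sort_values(key).reset_index(drop=True)
pd_sorted = pd_demo.sort_values(key).reset_index(drop=True)

same_keys = sql_sorted[key].values.tolist() == pd_sorted[key].values.tolist()
max_diff = float((sql_sorted["survival_rate"] - pd_sorted["survival_rate"]).abs().max())
print("same groups:", same_keys, "| largest rate difference:", max_diff)
conn_demo.close()
```

Sorting by the keys matters because ties in `survival_rate` may come back in a different order from SQL and from pandas.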



#### 🧪 Micro‑exercise C (3 minutes)
Using **SQL**, compute survival rate by **(ClassName, Sex)** and show the **top 5** rows by survival_rate.

<details>
<summary>Solution</summary>

```sql
SELECT c.ClassName, t.Sex, AVG(t.Survived) AS survival_rate, COUNT(*) AS n
FROM titanic t
JOIN class_lookup c ON t.Pclass = c.Pclass
GROUP BY c.ClassName, t.Sex
ORDER BY survival_rate DESC
LIMIT 5;
```
</details>



## Wrap‑Up & Next Steps

- **Python:** variables, lists/dicts, control flow, functions
- **pandas + Kaggle:** load CSV, filter/sort, groupby, quick plots
- **SQLite:** SELECT/WHERE, GROUP BY, JOIN, parameters

**Practice:**
- Try another Kaggle dataset and repeat the flow.

