# 🎮 Pokémon Data & Analytics Platform

## Objective
Build a modern data engineering pipeline that extracts rich Pokémon data from the [PokeAPI](https://pokeapi.co/), processes and stores it in a scalable **Lakehouse architecture**, and enables complex analytical queries, ML insights, and interactive visualizations.

## Key Features / Workflow

### 1. Data Ingestion (ELT Pipeline)
- Periodically extract data from the PokeAPI.
- Extract hierarchical data:
  - Pokémon → Evolutions → Moves → Stats → Types.
- Store raw JSON data in a **data lake** (e.g., AWS S3, GCS, or local MinIO). Open question: whether SQLite would be enough for a small local setup.
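
The ingestion step above can be sketched as follows, assuming a local directory stands in for the data lake. The names `fetch_json` and `land_raw_json`, the `raw/<resource>/<name>.json` layout, and the `User-Agent` value are illustrative assumptions, not part of the project. The sketch uses the standard library's `urllib` so it runs without extra packages; `requests` would work just as well.

```python
import json
import urllib.request
from pathlib import Path
from typing import Callable

BASE_URL = "https://pokeapi.co/api/v2"


def fetch_json(url: str) -> dict:
    """Download one PokeAPI resource and decode its JSON body."""
    # Some hosts reject urllib's default User-Agent, so send an explicit
    # one (the value here is a placeholder).
    request = urllib.request.Request(url, headers={"User-Agent": "pokemon-pipeline-sketch"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def land_raw_json(
    resource: str,
    name: str,
    lake_root: Path,
    fetch: Callable[[str], dict] = fetch_json,
) -> Path:
    """Fetch one resource and store it unmodified under lake_root/raw/<resource>/."""
    payload = fetch(f"{BASE_URL}/{resource}/{name}")
    target = lake_root / "raw" / resource / f"{name}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(payload), encoding="utf-8")
    return target
```

Calling `land_raw_json("pokemon", "ditto", Path("lake"))` would write `lake/raw/pokemon/ditto.json`. The `fetch` parameter is injectable so the landing logic can be exercised without network access.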

### 2. Data Transformation
- Use **Apache Spark** (PySpark) for transformation:
  - Flatten and normalize nested structures.
  - Create dimensional models:
    - `pokemon` (fact)
    - `types`, `abilities`, `moves` (dimensions)
  - Enrich data with external datasets (e.g., popularity, games, community rankings).
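
To make the flattening step concrete, here is a minimal sketch in plain Python rather than PySpark, so it runs without a Spark installation. It assumes the raw record follows the shape of PokeAPI's `pokemon` response (`id`, `name`, a `stats` list with `base_stat` and `stat.name`, and a `types` list with `slot` and `type.name`). The function name `flatten_pokemon` and the output column names are hypothetical.

```python
from __future__ import annotations


def flatten_pokemon(raw: dict) -> tuple[dict, list[dict]]:
    """Split one raw record into a fact row and rows linking it to its types."""
    fact = {"pokemon_id": raw["id"], "name": raw["name"]}
    for entry in raw["stats"]:
        # "special-attack" becomes "special_attack" so it is a valid column name.
        column = entry["stat"]["name"].replace("-", "_")
        fact[column] = entry["base_stat"]

    # One row per (pokemon, type) pair, ready to join against a types dimension.
    type_rows = [
        {"pokemon_id": raw["id"], "slot": t["slot"], "type_name": t["type"]["name"]}
        for t in raw["types"]
    ]
    return fact, type_rows


# Trimmed sample record; a real response carries many more fields.
sample = {
    "id": 132,
    "name": "ditto",
    "stats": [
        {"base_stat": 48, "stat": {"name": "hp"}},
        {"base_stat": 48, "stat": {"name": "special-attack"}},
    ],
    "types": [{"slot": 1, "type": {"name": "normal"}}],
}
fact_row, type_rows = flatten_pokemon(sample)
print(fact_row)
print(type_rows)
```

The same split (one wide row per Pokémon, one narrow row per Pokémon and type) could be expressed in PySpark by selecting the scalar fields and exploding the nested arrays.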

### 3. Data Storage (Lakehouse Architecture)
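- Store the processed, modeled tables in the lakehouse storage layer, on top of the same data lake that holds the raw JSON.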

### 4. Data Serving & Exploration
- Use tools like:
  - **Apache Superset** / **Metabase** / **Streamlit**
- Build visualizations and dashboards:
  - Top 10 strongest Pokémon by base stats.
  - Evolution treemaps.
  - Type effectiveness matrix.
  - Fun stats and comparisons.
- Interactive querying:
  - E.g., "List all Fire-type Pokémon with speed > 100 and special attack > 90".
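
The example query above could look like this in SQL. The sketch uses an in-memory SQLite database purely as a stand-in for the serving layer; the table names `pokemon` and `pokemon_types`, their columns, and the sample rows are all made up for illustration.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with made-up rows (not real PokeAPI values)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE pokemon (
            pokemon_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            speed INTEGER NOT NULL,
            special_attack INTEGER NOT NULL
        );
        CREATE TABLE pokemon_types (
            pokemon_id INTEGER NOT NULL REFERENCES pokemon (pokemon_id),
            type_name TEXT NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO pokemon VALUES (?, ?, ?, ?)",
        [
            (1, "sample_fast_fire", 110, 95),
            (2, "sample_slow_fire", 60, 120),
            (3, "sample_fast_water", 115, 100),
        ],
    )
    conn.executemany(
        "INSERT INTO pokemon_types VALUES (?, ?)",
        [(1, "fire"), (1, "flying"), (2, "fire"), (3, "water")],
    )
    return conn


def fast_fire_special_attackers(
    conn: sqlite3.Connection, min_speed: int = 100, min_special_attack: int = 90
) -> list[str]:
    """Names of Fire-type rows whose speed and special attack exceed the thresholds."""
    query = """
        SELECT p.name
        FROM pokemon AS p
        JOIN pokemon_types AS pt ON pt.pokemon_id = p.pokemon_id
        WHERE pt.type_name = 'fire'
          AND p.speed > ?
          AND p.special_attack > ?
        ORDER BY p.name
    """
    return [row[0] for row in conn.execute(query, (min_speed, min_special_attack))]


print(fast_fire_special_attackers(build_demo_db()))
```

Keeping types in a separate table means a dual-type Pokémon matches the filter through either of its types without duplicating stat columns.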

### 5. Machine Learning & Graph Analytics (Bonus)
- **Clustering**: Use K-Means to group similar Pokémon based on stats.
- **Classification**: Predict battle outcomes using logistic regression or tree-based models.
- **Graph Analysis**:
  - Use **NetworkX** or **Neo4j** to explore evolution chains as directed graphs.
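
As a rough illustration of treating evolution chains as directed graphs, the sketch below uses a plain adjacency dictionary and a breadth-first walk instead of NetworkX or Neo4j, so it has no dependencies. The helper names `build_evolution_graph` and `all_evolutions` are hypothetical; with NetworkX, a `DiGraph` would take the place of the dictionary.

```python
from collections import defaultdict, deque


def build_evolution_graph(edges: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each species to the species it can evolve into directly."""
    graph: dict[str, list[str]] = defaultdict(list)
    for source, target in edges:
        graph[source].append(target)
    return graph


def all_evolutions(graph: dict[str, list[str]], start: str) -> list[str]:
    """Return every species reachable from start, in breadth-first order."""
    seen = {start}
    queue = deque([start])
    order: list[str] = []
    while queue:
        current = queue.popleft()
        # .get avoids inserting empty entries into the defaultdict.
        for successor in graph.get(current, []):
            if successor not in seen:
                seen.add(successor)
                order.append(successor)
                queue.append(successor)
    return order


edges = [
    ("bulbasaur", "ivysaur"),
    ("ivysaur", "venusaur"),
    ("eevee", "vaporeon"),
    ("eevee", "jolteon"),
    ("eevee", "flareon"),
]
graph = build_evolution_graph(edges)
print(all_evolutions(graph, "bulbasaur"))
print(all_evolutions(graph, "eevee"))
```

A linear chain and a branching chain are both covered by the same traversal, which is the main reason a graph model fits evolution data better than a fixed "stage 1 / stage 2 / stage 3" column layout.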

## Bonus Ideas
- **API Gateway**:
  - Build a FastAPI or GraphQL service layer on top of your Lakehouse data.
- **Streaming**:
  - Simulate real-time "wild Pokémon encounters" using **Kafka** + **Spark Structured Streaming**.
- **Leaderboard**:
  - Create a dynamic ranking of strongest Pokémon by an aggregated score (e.g., sum of normalized stats).
- **Data Versioning**:
  - Integrate **DVC** or **LakeFS** to version data and track evolution of your datasets over time.
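
One possible reading of the leaderboard's "sum of normalized stats" is min-max normalization per stat followed by a sum. The sketch below follows that assumption; the function name `leaderboard` and the sample rows are made up, and other normalizations (for example z-scores) would be equally valid choices.

```python
def leaderboard(rows: list[dict], stat_names: list[str]) -> list[tuple[str, float]]:
    """Rank rows by the sum of their min-max normalized stats, highest first."""
    # Per-stat (min, max) across the whole dataset.
    bounds = {
        stat: (min(row[stat] for row in rows), max(row[stat] for row in rows))
        for stat in stat_names
    }

    def score(row: dict) -> float:
        total = 0.0
        for stat in stat_names:
            low, high = bounds[stat]
            # A stat with no spread carries no ranking information; count it as 0.
            if high > low:
                total += (row[stat] - low) / (high - low)
        return total

    scored = [(row["name"], score(row)) for row in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up rows for illustration, not real PokeAPI values.
rows = [
    {"name": "sample_x", "hp": 50, "attack": 50},
    {"name": "sample_y", "hp": 100, "attack": 80},
    {"name": "sample_z", "hp": 75, "attack": 100},
]
for name, total in leaderboard(rows, ["hp", "attack"]):
    print(f"{name}: {total:.2f}")
```

Normalizing before summing keeps a stat with a wide numeric range from dominating the score.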

In [None]:
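import requests

url = "https://pokeapi.co/api/v2/pokemon/ditto"
# verify=False skips TLS certificate checks. It works around a self-signed
# certificate error here; prefer passing a CA bundle path via verify= instead.
response = requests.get(url, verify=False, timeout=10)
response.raise_for_status()  # fail early on HTTP errors instead of parsing an error body
data = response.json()
print(data)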