# Performance improvement for SQL (analysis)
1. Add proper indexes
   ```sql
   CREATE INDEX idx_customer_id ON orders(customer_id);
   ```
2. Avoid `SELECT *`
   Select only the needed columns so less data is read and sent over the network.
3. Use JOINs efficiently
   Always join on indexed columns and avoid unnecessary JOINs.
4. Filter early using WHERE
   Push filters down so the database processes fewer rows.
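5. Use LIMIT on large tables
   Return only the rows needed (for example when previewing or paginating) instead of the full result set.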
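
A minimal sketch of tips 1, 2, 4 and 5, assuming Python's built-in `sqlite3` module and a made-up in-memory `orders` table (the `id` and `amount` columns and the sample rows are assumptions for illustration). It asks SQLite for the query plan before and after creating the index. The exact plan wording varies between SQLite versions and other databases report plans differently.

```python
import sqlite3

# In-memory database with a hypothetical orders table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

# Named columns instead of SELECT *, an early WHERE filter, and a LIMIT.
query = "SELECT id, amount FROM orders WHERE customer_id = ? LIMIT 10"


def plan(sql):
    """Return the plan description SQLite reports for the given query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,)).fetchall()
    # The human-readable detail text is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


plan_before = plan(query)  # without an index: typically a full table scan
conn.execute("CREATE INDEX idx_customer_id ON orders(customer_id)")
plan_after = plan(query)   # with the index: typically an index search

print("before:", plan_before)
print("after: ", plan_after)
```

Comparing the two printed plans is a quick way to check whether an index is actually used by a given query.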


# View vs materialized view vs table
## Table  
- A physical storage of data. Real rows exist on disk.
- Use case: store actual business data.

## View
- A saved SQL query. No data stored; computed every time I query it.  
- Use case: simplify complex queries; always shows fresh data
```sql
CREATE VIEW high_value_orders AS
SELECT * FROM orders WHERE amount > 1000;
```

## Materialized View
- A view that stores the query result physically. Faster to read than a regular view because the result is precomputed.
- Must be refreshed manually or on a schedule.
- Use case: heavy aggregation queries that **don’t need real-time freshness**.
```sql
CREATE MATERIALIZED VIEW daily_sales AS
SELECT date, SUM(amount) AS total_amount
FROM orders
GROUP BY date;
```
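
A small sketch of the freshness difference, assuming Python's built-in `sqlite3` module. SQLite has no `CREATE MATERIALIZED VIEW`, so the materialized view is imitated here with a plain snapshot table plus a manual refresh function. The names `daily_sales_view`, `daily_sales_snapshot`, `refresh_snapshot` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("2024-01-01", 500.0), ("2024-01-01", 1500.0)],
)

aggregate = "SELECT date, SUM(amount) AS total_amount FROM orders GROUP BY date"

# A view stores only the query; a snapshot table stores the result rows.
conn.execute("CREATE VIEW daily_sales_view AS " + aggregate)
conn.execute("CREATE TABLE daily_sales_snapshot AS " + aggregate)

# New data arrives after the snapshot was taken.
conn.execute("INSERT INTO orders VALUES ('2024-01-01', 1000.0)")


def total(relation):
    """Read the single aggregated total from the given view or table."""
    return conn.execute(f"SELECT total_amount FROM {relation}").fetchone()[0]


def refresh_snapshot():
    """Stand-in for a materialized view refresh: recompute and replace the rows."""
    with conn:  # commit both statements together
        conn.execute("DELETE FROM daily_sales_snapshot")
        conn.execute("INSERT INTO daily_sales_snapshot " + aggregate)


view_total = total("daily_sales_view")       # recomputed now, includes the new row
stale_total = total("daily_sales_snapshot")  # still the old stored result
refresh_snapshot()
fresh_total = total("daily_sales_snapshot")  # matches the view after the refresh
print(view_total, stale_total, fresh_total)
```

In a database with real materialized views, the refresh step would be the engine's own refresh command rather than a hand-written function.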


# NoSQL database types and use cases
![image.png](attachment:image.png)


# ACID vs CAP theorem
## ACID (mostly SQL databases)
- Guarantees safe, stable transactions.
	- **A**tomicity – all or nothing
	- **C**onsistency – data always valid
	- **I**solation – transactions don’t affect each other
	- **D**urability – changes survive crashes

- Example: bank transfer must complete fully or not at all.
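
The bank transfer example can be sketched as follows, assuming Python's built-in `sqlite3` module. The `accounts` table, the account names and the `transfer` helper are made up for illustration. A `CHECK` constraint forbids negative balances, so an overdraft makes the second update fail and the whole transaction roll back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(sender, receiver, amount):
    """Move amount between accounts; both updates apply together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back too.
        return False


def balances():
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


ok = transfer("alice", "bob", 30)       # succeeds
after_ok = balances()
failed = transfer("alice", "bob", 500)  # alice cannot cover this
after_failed = balances()               # unchanged by the failed transfer
print(ok, after_ok, failed, after_failed)
```

The credit to the receiver runs first on purpose: it succeeds, and is then undone when the debit fails, which is the "all or nothing" behaviour.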

## CAP Theorem (distributed systems, including most NoSQL databases)
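- A distributed system can guarantee only 2 of these 3 properties at the same time (in practice, when a network partition happens, it must choose between C and A):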
	- **C**onsistency – everyone sees the same data
	- **A**vailability – system always responds
	- **P**artition-tolerance – system works even if network splits

- Example:
	- **CP** (e.g. MongoDB with strong consistency settings, such as majority write concern): stays consistent, but may reject requests during a partition
	- **AP** (e.g. Cassandra): stays available, but may return slightly stale data
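
The CP/AP trade-off can be illustrated with a toy model in plain Python. This is a deliberately simplified sketch, not how MongoDB or Cassandra work internally: the `ToyCluster` and `Replica` classes, the two-replica setup and the `partitioned` flag are all assumptions made for illustration.

```python
class Replica:
    """One node holding its own copy of the data."""

    def __init__(self):
        self.data = {}


class ToyCluster:
    """Two replicas. Mode 'CP' refuses requests during a partition, 'AP' serves them."""

    def __init__(self, mode):
        self.mode = mode
        self.replicas = [Replica(), Replica()]
        self.partitioned = False  # True means the replicas cannot talk to each other

    def write(self, key, value, node=0):
        if not self.partitioned:
            # Healthy network: every replica receives the write.
            for replica in self.replicas:
                replica.data[key] = value
        elif self.mode == "CP":
            # Consistency first: refuse rather than let the replicas diverge.
            raise RuntimeError("unavailable: cannot reach the other replica")
        else:
            # Availability first: accept the write on the reachable node only.
            self.replicas[node].data[key] = value

    def read(self, key, node=0):
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable: cannot confirm the latest value")
        return self.replicas[node].data.get(key)


cp = ToyCluster("CP")
cp.write("x", 1)
cp.partitioned = True
try:
    cp.write("x", 2)
    cp_result = "accepted"
except RuntimeError:
    cp_result = "rejected"

ap = ToyCluster("AP")
ap.write("x", 1)
ap.partitioned = True
ap.write("x", 2, node=0)
ap_read_node0 = ap.read("x", node=0)  # sees the new value
ap_read_node1 = ap.read("x", node=1)  # still sees the old value (stale read)
print(cp_result, ap_read_node0, ap_read_node1)
```

The CP cluster gives up availability during the split, while the AP cluster keeps answering but lets one replica serve an old value until the partition heals.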



# SQL vs NoSQL decision criteria

**Choose SQL when:**
- I need ACID transactions
- Data has a fixed schema
- I use complex JOINs
- Data consistency is important

Example: banking, inventory, ERP systems

**Choose NoSQL when:**
- Schema is flexible
- I need fast reads/writes at scale
- I store huge volumes of unstructured data
- I can tolerate eventual consistency

Example: IoT data, social media posts, logs


# Document databases (MongoDB)
MongoDB stores data as BSON documents (similar to JSON).  
Each document can have different fields, so the schema is flexible.
```json
{
  "name": "Yannis",
  "skills": ["Python", "SQL"],
  "age": 25
}
```
Good use cases:
- User profiles
- Product catalogs
- Log/event data
- Flexibly structured data

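
To make the flexible schema concrete, here is a sketch in plain Python that uses a list of dicts as a stand-in for a collection. It does not use the MongoDB driver: the `find` helper and the second sample document are hypothetical, and the helper only imitates the idea of matching documents by field value, including a match on one element of an array field.

```python
# Two documents in the same "collection" with different sets of fields.
users = [
    {"name": "Yannis", "skills": ["Python", "SQL"], "age": 25},
    {"name": "Chengyang", "score": 99},  # no skills or age field at all
]


def find(collection, **criteria):
    """Return documents where every given field equals the wanted value.

    For list-valued fields, a document matches if the list contains the value.
    """
    results = []
    for doc in collection:
        for field, wanted in criteria.items():
            value = doc.get(field)  # missing fields simply do not match
            if isinstance(value, list):
                matches = wanted in value
            else:
                matches = value == wanted
            if not matches:
                break
        else:
            results.append(doc)
    return results


python_users = find(users, skills="Python")
high_scorers = find(users, score=99)
nobody = find(users, age=30)
print(python_users, high_scorers, nobody)
```

Querying a field that some documents lack is not an error here; those documents are just skipped, which is the practical meaning of a flexible schema.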
# Key-value stores (Redis)
Redis stores data as simple key → value pairs.
```text
user123 → {"name": "Chengyang", "score": 99}
session_token → "af3k9x92"
```
Good use cases:
- Caching (very fast in-memory reads)
- Session storage
- Rate limiting
- Message queues (using Redis Streams)
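
The caching use case can be sketched without a Redis server. The `TTLCache` class below is a plain-Python stand-in for storing a key with an expiry time, and `get_user`, `slow_load` and the 60-second lifetime are hypothetical names and values chosen for illustration. It shows the cache-aside pattern: read from the cache first and fall back to the slow source only on a miss.

```python
import time


class TTLCache:
    """In-memory key-value store with per-key expiry (stand-in for Redis)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired entries behave like missing keys
            return None
        return value


def get_user(cache, user_id, load_from_db):
    """Cache-aside: try the cache, otherwise load from the source and cache it."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(user_id)
    cache.set(key, value, ttl_seconds=60)
    return value


# Usage with a fake clock so expiry can be shown without waiting.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
db_calls = []


def slow_load(user_id):
    """Pretend database lookup that records how often it is called."""
    db_calls.append(user_id)
    return {"name": "Chengyang", "score": 99}


first = get_user(cache, 123, slow_load)   # miss: loads from the source
second = get_user(cache, 123, slow_load)  # hit: served from the cache
calls_before_expiry = len(db_calls)
now[0] = 61.0                             # move past the 60 second lifetime
third = get_user(cache, 123, slow_load)   # expired: loads from the source again
calls_after_expiry = len(db_calls)
print(calls_before_expiry, calls_after_expiry)
```

With a real Redis server, the expiry bookkeeping would be handled by the server rather than by application code.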