# 📜 IBM Data Science Professional Certificate  
*Curiosity to Capability — One Notebook at a Time*

---

**Compiled and Authored by:**  
**Partho Sarothi Das**  
Dhaka, Bangladesh  
🎓 Bachelor's & Master's in Statistics  
💼 Investment Banking Professional → Aspiring Data Scientist  

>**Disclaimer:** This notebook is based on content from the [IBM Data Science Professional Certificate](https://www.coursera.org/professional-certificates/ibm-data-science) offered on Coursera. It is intended for personal learning and review purposes.

---

# 🐍 Accessing Databases with Python – Summary

### Why Use Python for Databases?

* Simple syntax, open-source, cross-platform
* Rich ecosystem for data science: **NumPy, Pandas, Matplotlib, SciPy**
* Easily integrates with databases via **Python DB API**

---

### Python DB API (Database API)

* A **standard interface** for connecting Python applications to relational databases
* Uses **API calls** to:

  * Connect to the database
  * Send SQL queries
  * Retrieve results and status
  * Disconnect from the database

---

### Popular SQL APIs for Python & Other Languages

| Database / Platform                 | SQL API       |
| ----------------------------------- | ------------- |
| PostgreSQL                          | `psycopg2`    |
| IBM DB2                             | `ibm_db`      |
| SQL Server                          | `dblib`       |
| MySQL                               | `MySQL C API` |
| Oracle                              | `OCI`         |
| Java (database access)              | `JDBC`        |
| Microsoft Windows (database access) | `ODBC`        |

---

### Jupyter Notebooks for Database Work

* Web-based environment for writing & executing Python
* Supports **live code, text, equations, visualizations**
* Widely used in data science
* Supports 40+ languages, including **Python, R, Julia, and Scala**
* Integrates with tools such as **Spark, TensorFlow, and scikit-learn**

---

### How Database Access Works in Python

1. **Connect** using API
2. **Build SQL** query as a text string
3. **Execute** via API function
4. **Fetch results** and handle errors
5. **Close connection**
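
The five steps above can be sketched with `sqlite3`, the DB-API module that ships with Python. The in-memory database, the `instructor` table, and the `run_query` helper are assumptions made for illustration; they are not part of the course material.

```python
import sqlite3


def run_query(sql, params=()):
    """Walk through the five steps against a throwaway in-memory database."""
    # 1. Connect using the API (":memory:" is a temporary SQLite database)
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical sample table so the query has something to return
        conn.execute("CREATE TABLE instructor (id INTEGER, name TEXT)")
        conn.execute("INSERT INTO instructor VALUES (1, 'Ana'), (2, 'Raul')")

        # 2-3. The SQL is a plain text string, executed through an API call
        cursor = conn.execute(sql, params)

        # 4. Fetch the results
        return cursor.fetchall()
    except sqlite3.Error as exc:
        # 4 (cont.). Handle errors reported by the database
        print(f"Query failed: {exc}")
        return []
    finally:
        # 5. Close the connection whether or not the query succeeded
        conn.close()


rows = run_query("SELECT name FROM instructor WHERE id = ?", (2,))
print(rows)
```

Passing values through `params` instead of pasting them into the SQL string lets the driver handle quoting, which is usually the safer habit.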

---

### Key Takeaways

* Python + SQL = powerful combination for data handling
* Python DB API standardizes the way you interact with relational databases
* Jupyter notebooks are ideal for mixing code, output, and explanation
* Each DBMS has a specific library or API to connect from Python

---

# Python DB-API: Writing Code to Access Databases

### What is the DB-API?

* The **DB-API** is Python’s **standard interface** for connecting to relational databases.
* It allows writing database code that is **portable** across multiple types of databases (e.g., MySQL, PostgreSQL, DB2).
* Encourages **consistency and simplicity**, making it easier to learn and apply across different database systems.

---

### Popular Python Database Libraries

| Database   | Python Library                                                 |
| ---------- | -------------------------------------------------------------- |
| IBM DB2    | `ibm_db`                                                       |
| MySQL      | `mysql-connector-python`                                       |
| PostgreSQL | `psycopg2`                                                     |
| MongoDB    | `pymongo` *(Note: NoSQL database; has its own API, not DB-API)* |

---

### Core Concepts in DB-API

#### Connection Object

* Created using the `connect()` function from a DB library.
* Manages the connection session and provides access to:

  * `.cursor()` – to create a **cursor object**
  * `.commit()` – to save changes
  * `.rollback()` – to undo changes
  * `.close()` – to terminate the connection
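
As a rough illustration of how `.commit()` and `.rollback()` interact, the sketch below uses `sqlite3` with a made-up `accounts` table. It assumes the module's default transaction handling, where a transaction is opened implicitly before an `INSERT` or `UPDATE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")

cursor.execute("INSERT INTO accounts VALUES ('alice', 100)")
conn.commit()  # save the inserted row

cursor.execute("UPDATE accounts SET balance = 0 WHERE name = 'alice'")
conn.rollback()  # undo the uncommitted UPDATE

cursor.execute("SELECT balance FROM accounts WHERE name = 'alice'")
balance = cursor.fetchone()[0]  # still the committed value
print(balance)

conn.close()
```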

#### Cursor Object

* Used to **run queries** and **fetch results**
* Acts like a **file handle** — keeps track of your current position in a result set
* Changes made by one cursor on a connection are immediately visible to other cursors on the same connection
* Cursors created on **different connections** may or may not be isolated
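
The file-handle analogy can be made concrete with a small `sqlite3` sketch. The `numbers` table is a hypothetical example; each fetch call continues from where the previous one stopped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (n INTEGER)")
conn.executemany("INSERT INTO numbers VALUES (?)", [(1,), (2,), (3,), (4,)])

reader = conn.cursor()
reader.execute("SELECT n FROM numbers ORDER BY n")

first = reader.fetchone()       # reads row 1; the cursor now sits before row 2
next_two = reader.fetchmany(2)  # reads rows 2 and 3
rest = reader.fetchall()        # reads whatever is left (row 4)
done = reader.fetchone()        # nothing left, so this is None

# A second cursor on the same connection sees the uncommitted inserts
other = conn.cursor()
other.execute("SELECT COUNT(*) FROM numbers")
count_seen = other.fetchone()[0]

conn.close()
```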

---

### Basic Steps in Python DB Access

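```python
import sqlite3  # stand-in here; other DB-API modules (e.g., psycopg2) follow the same steps

# Step 1: Connect to database (connect() arguments differ by driver)
conn = sqlite3.connect(":memory:")

# Step 2: Create a cursor object
cursor = conn.cursor()

# Sample table so the query below has data (illustrative only)
cursor.execute("CREATE TABLE tablename (id INTEGER, name TEXT)")
cursor.execute("INSERT INTO tablename VALUES (1, 'example')")
conn.commit()

# Step 3: Execute SQL query
cursor.execute("SELECT * FROM tablename")

# Step 4: Fetch results
results = cursor.fetchall()
print(results)

# Step 5: Close cursor and connection
cursor.close()
conn.close()
```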

---

### Best Practice

* **Always close** database connections when you're done to free up resources.
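
One way to make sure `.close()` always runs, even when a query raises an exception, is `contextlib.closing` from the standard library. This is a minimal sketch using `sqlite3`; other drivers may offer their own context-manager support.

```python
import sqlite3
from contextlib import closing

# closing() calls .close() on exit, even if the body raises an exception
with closing(sqlite3.connect(":memory:")) as conn:
    with closing(conn.cursor()) as cursor:
        cursor.execute("SELECT 1 + 1")
        total = cursor.fetchone()[0]

print(total)
```

Note that in `sqlite3`, writing `with conn:` on its own manages the transaction (commit or rollback) but does not close the connection, which is why `closing()` is used here.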

---

### Key Takeaways

* Python’s **DB-API** provides a **unified and consistent way** to interact with many different relational databases.
* Use **connection** and **cursor** objects to manage interactions and results.
* The **cursor** is central to executing queries and processing results in Python.
* Clean up: always **close connections and cursors** to prevent resource leaks.

---

# Accessing Databases Using SQL Magic in Jupyter Notebooks

### What are Magic Statements in Jupyter Notebooks?

* Magic commands/functions provide special functionalities in Jupyter Notebooks.
* They are not valid Python code but extend the notebook’s functionality.

### Types of Magic Statements:

1. **Line Magics** (`%`): Operate on a **single line**.

   * Examples:

     * `%pwd`: Print current directory
     * `%ls`: List files
     * `%history`: Show command history
     * `%reset`: Clear variables
     * `%who`, `%whos`: List variables
     * `%matplotlib inline`: Show plots in notebook
     * `%timeit`: Time execution of a single statement

2. **Cell Magics** (`%%`): Operate on the **entire cell**, which can span multiple lines.

   * Can run code in different languages.
   * Examples:

     * `%%timeit`: Time the execution of the entire cell
     * `%%writefile filename.txt`: Save cell content to a file
     * `%%html`, `%%javascript`, `%%bash`: Render HTML, or run JavaScript or Bash code

---

#### Connect to a Database:

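* Install the `ipython-sql` package if needed (`pip install ipython-sql`), load the extension once per session, then connect. Example (for SQLite):

  ```python
  %load_ext sql
  %sql sqlite:///HR.db
  ```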

#### Run Queries:

* **Line Magic Example**:

  ```python
  %sql SELECT * FROM employees;
  ```
* **Cell Magic Example**:

  ```python
  %%sql
  SELECT department, COUNT(*) 
  FROM employees 
  GROUP BY department;
  ```
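
For comparison, the grouped query above can also be run outside a notebook with plain `sqlite3`. The sketch below uses an in-memory database with made-up rows as a stand-in for `HR.db`, whose actual schema is not shown here. An `ORDER BY` is added only to make the output order predictable.

```python
import sqlite3

# Hypothetical stand-in for HR.db: an in-memory database with made-up rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ana", "Sales"), ("Raul", "Sales"), ("Mei", "IT")],
)

query = """
SELECT department, COUNT(*)
FROM employees
GROUP BY department
ORDER BY department;
"""
counts = conn.execute(query).fetchall()
print(counts)

conn.close()
```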

---

### What You Learned:

* The concept of **magic statements**
* Difference between **line magic** and **cell magic**
* Popular magic commands in Jupyter
* How to **install and use SQL Magic** to run SQL queries directly in notebooks

---

# Analyzing Data with Python Using the McDonald’s Nutrition Dataset

---

### Objectives of the Lesson:

After this lesson, you will be able to:

* Perform **Exploratory Data Analysis (EDA)** using **Pandas**
* Use **SQLite3** with Python for data storage and retrieval
* **Visualize data** using **Seaborn** to detect patterns and gain insights

---

### Dataset Overview:

* Data source: **McDonald’s Menu Nutritional Facts** from **Kaggle**
* Covers 260 menu items with nutritional information like fat, sodium, protein, and sugar

---

### Working with SQLite3 and Pandas:

1. Read CSV File:

   ```python
   import pandas as pd
   data = pd.read_csv("mcdonalds.csv")
   ```

2. Connect to SQLite Database:

   ```python
   import sqlite3
   conn = sqlite3.connect("mcdonalds.db")
   ```

3. Store DataFrame to SQL Table:

   ```python
   data.to_sql("mcdonalds_nutrition", conn, if_exists="replace", index=False)
   ```

4. Query Data with Pandas:

   ```python
   df = pd.read_sql("SELECT * FROM mcdonalds_nutrition", conn)
   ```
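
The four steps above can be tried end to end without the CSV file. The sketch below swaps in a tiny hand-made DataFrame and an in-memory database; the rows are invented placeholders, not real nutrition values, and the column names are assumed to match the dataset.

```python
import sqlite3

import pandas as pd

# Made-up rows standing in for the CSV; not real nutrition values
data = pd.DataFrame(
    {
        "Category": ["Breakfast", "Breakfast", "Desserts"],
        "Item": ["Item A", "Item B", "Item C"],
        "Sodium": [750, 300, 90],
    }
)

conn = sqlite3.connect(":memory:")

# if_exists="replace" lets the cell be rerun; index=False skips the index column
data.to_sql("mcdonalds_nutrition", conn, if_exists="replace", index=False)

# Let SQL do the filtering, then get the result back as a DataFrame
df = pd.read_sql(
    "SELECT Item, Sodium FROM mcdonalds_nutrition WHERE Sodium > ? ORDER BY Sodium",
    conn,
    params=(100,),
)
print(df)

conn.close()
```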

---

### Data Exploration Using Pandas:

* Use `df.head()` to preview data
* Use `df.describe()` to get summary statistics
* The dataset contains 260 food items in 9 categories
* Example insight: the maximum fat content is **118 g**
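
A minimal sketch of these exploration calls, using a small invented DataFrame in place of the real dataset (the column names `Category`, `Item`, and `Total Fat` are assumed to match it):

```python
import pandas as pd

# Made-up rows; the real dataset has 260 items in 9 categories
df = pd.DataFrame(
    {
        "Category": ["Breakfast", "Breakfast", "Desserts", "Beverages"],
        "Item": ["Item A", "Item B", "Item C", "Item D"],
        "Total Fat": [13.0, 8.0, 17.0, 0.0],
    }
)

n_items = len(df)                        # number of rows (menu items)
n_categories = df["Category"].nunique()  # number of distinct categories
max_fat = df["Total Fat"].max()          # same value describe() reports as "max"

print(df.head())
print(df.describe())
print(n_items, n_categories, max_fat)
```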

---

### Case Study: Analyzing Sodium Content

* Recommended sodium intake is below 2,000 mg per day
* Use `df['Sodium'].describe()` → max sodium = **3,600 mg**
* Use `df['Sodium'].idxmax()` to get index of max sodium
* Retrieve item using `.at[]` → Highest sodium: **Chicken McNuggets (40 pieces)**
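
The lookup pattern in this case study might look like the following. The data are invented placeholders, so the item names and values are not those of the real menu.

```python
import pandas as pd

# Made-up rows standing in for the real dataset
df = pd.DataFrame(
    {
        "Item": ["Item A", "Item B", "Item C"],
        "Sodium": [750, 2100, 90],
    }
)

row_label = df["Sodium"].idxmax()         # index label of the largest value
saltiest_item = df.at[row_label, "Item"]  # scalar lookup by row label and column
over_limit = df[df["Sodium"] > 2000]      # items above the 2,000 mg guideline

print(row_label, saltiest_item)
print(over_limit)
```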

---

### Data Visualization with Seaborn:

#### 1. **Swarm Plot** for Sodium:

```python
sns.swarmplot(x="Category", y="Sodium", data=df)
```

* Displays sodium content across food categories
* Highlights high-sodium items visually

#### 2. **Scatter Plot (Joint Plot)** for Protein vs. Fat:

```python
sns.jointplot(x="Protein", y="Total Fat", data=df)
```

* Shows **positive correlation** (Pearson’s r = 0.81)
* Highlights potential **outliers**
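
The correlation coefficient itself can be computed directly with pandas. The sketch below uses invented numbers, so its result will differ from the value reported for the real dataset; it also recomputes r from the definition as a cross-check.

```python
import pandas as pd

# Made-up values; only the column names are assumed to match the dataset
df = pd.DataFrame(
    {
        "Protein": [5.0, 12.0, 20.0, 30.0, 8.0],
        "Total Fat": [4.0, 10.0, 22.0, 25.0, 15.0],
    }
)

# pandas computes Pearson's r by default
r = df["Protein"].corr(df["Total Fat"])

# Same quantity from the definition: covariance / (std_x * std_y)
x, y = df["Protein"], df["Total Fat"]
r_manual = x.cov(y) / (x.std() * y.std())

print(round(r, 3), round(r_manual, 3))
```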

#### 3. **Box Plot** for Sugar Content:

```python
sns.boxplot(y="Sugars", data=df)
```

* Shows a median sugar content of about 30 g
* Identifies **outliers**, with some values near **128 g**
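
By default, a box plot draws its whiskers out to the furthest points within 1.5 times the interquartile range (IQR) of the box and marks anything beyond that as an outlier. The same rule can be applied numerically; the values below are made up, with one large value to mimic the long tail.

```python
import pandas as pd

# Made-up sugar values in grams, not taken from the dataset
sugars = pd.Series([2, 5, 10, 20, 30, 32, 35, 40, 48, 128], name="Sugars")

median = sugars.median()
q1 = sugars.quantile(0.25)
q3 = sugars.quantile(0.75)
iqr = q3 - q1

# Points outside these fences are drawn as outliers
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = sugars[(sugars < lower) | (sugars > upper)]

print(median, lower, upper)
print(outliers.tolist())
```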

---

### Key Takeaways:

* **SQLite3** is a serverless SQL engine perfect for small projects
* **Pandas** can load, store, and query SQL data easily
* **Seaborn** offers powerful tools like swarm plots, joint plots, and box plots for EDA
* Visualizations reveal trends, correlations, and outliers effectively

---