# Python Programming 2: Class 4
## Understanding Data – Definitions & Types

### 1. Introduction: Why Do We Need to Understand Data?

So far, we've learned how to read and write files (`.txt`, `.csv`, `.json`). But what is *inside* those files? It's **data**.

Understanding the *type* of data you are working with is the most important first step in any data-related task. It dictates:

1.  **How you store it:** Do you use an integer, a string, a list of dictionaries?
2.  **What you can do with it:** You can calculate the average of student heights, but you can't calculate the average of their names.
3.  **How you clean it:** Finding a negative age is an error, but finding a negative temperature might be valid.
4.  **How you visualize it:** You use a bar chart for categories but a line chart for data over time.
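Here is point 2 in action. Python's built-in `statistics.mean` works on a list of numbers but raises a `TypeError` on a list of names. The heights and names below are made-up sample values.

```python
import statistics

# Made-up sample data for illustration only
heights_cm = [162.5, 175.0, 180.2, 158.3]   # quantitative: averaging makes sense
names = ["Alice", "Bob", "Chen", "Dana"]      # qualitative: averaging does not

average_height = statistics.mean(heights_cm)
print(f"Average height: {average_height:.2f} cm")

error_message = None
try:
    # statistics.mean cannot turn strings into numbers, so it raises TypeError
    statistics.mean(names)
except TypeError as error:
    error_message = str(error)
    print("Cannot average names:", error_message)
```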

Think of it like being a chef. Before you can cook, you must understand your ingredients: Is it a vegetable or a fruit? Does it need to be chopped, boiled, or baked? Getting this wrong leads to a bad meal. In programming, getting data types wrong leads to bugs and bad analysis.

### 2. The Two Main Families: Structured vs. Unstructured Data

At the highest level, data is usually sorted into two broad families.

#### 2.1 Structured Data

This is data that has a pre-defined model or organization. It's highly organized and easy to process. Think of a spreadsheet or a database table.

*   **Characteristics:** Follows a strict schema, fits neatly into rows and columns.
*   **Examples:**
    *   A CSV file of student records (`id, name, major`).
    *   A database of employees.
    *   A JSON object with consistent keys for every entry.
*   **Analogy:** A perfectly organized library where every book has a specific shelf and a catalog number.
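Structured data follows a fixed schema, so Python's built-in `csv` module can turn each row into a dictionary keyed by the column names. The sketch below reads a small, made-up CSV from a string using `io.StringIO` instead of a real file so that it runs on its own. The file name `students.csv` in the comment is hypothetical.

```python
import csv
import io

# Made-up CSV text standing in for a real file such as students.csv (hypothetical)
csv_text = """id,name,major
101,Alice,Computer Science
102,Bob,Physics
103,Chen,History
"""

# DictReader uses the header row as the keys for every record
reader = csv.DictReader(io.StringIO(csv_text))
students = list(reader)

print(reader.fieldnames)  # ['id', 'name', 'major']
for student in students:
    print(student["id"], student["name"], student["major"])
```

`csv` reads every value as a string, including `"101"`. Converting `id` to an `int` is a separate step.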

#### 2.2 Unstructured Data

This is data that has no pre-defined model. It's the messy, free-form data that is often estimated to make up about 80% of the world's data.

*   **Characteristics:** No inherent structure, does not fit into a row-column format.
*   **Examples:**
    *   The text of an email or a social media post.
    *   An image (`.jpg`) or a video file (`.mp4`).
    *   The audio from a phone call.
*   **Analogy:** A giant box of old family photographs and letters. There's rich information inside, but you have to sort through it manually to find anything specific.
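Unstructured text has no columns to look up, so any structure must be extracted with code. The minimal sketch below pulls the weekday names out of an invented email. The lowercasing and punctuation stripping are deliberately simple; real text processing usually needs much more care.

```python
# Invented email body: free-form text with no schema
email_body = "Hi team, the meeting is moved to Friday. Please confirm the new time by Thursday."

weekdays = {"monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"}

# Normalize: lowercase, drop basic punctuation, then split into words
words = email_body.lower().replace(",", "").replace(".", "").split()

# Extract the one piece of information we care about
mentioned_days = [word for word in words if word in weekdays]
print(mentioned_days)  # ['friday', 'thursday']
```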

### 3. Diving Deeper: Quantitative vs. Qualitative Data

Within structured data, we can classify individual data points further. This is the most common classification you will use.

#### 3.1 Quantitative Data (Numerical)

This data represents a **quantity** or a measurement. It answers questions like "how much?" or "how many?". You can perform mathematical operations on it.

Quantitative data is further split into two types:

**A) Discrete Data**
*   **Definition:** Can only take specific, separate values. It's **countable**.
*   **Examples:**
    *   The number of students in a class (you can't have 25.5 students).
    *   The number of cars in a parking lot.
    *   The result of a die roll (1, 2, 3, 4, 5, or 6).
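Discrete data has a fixed set of possible values that you can list and count. For a six-sided die, `range(1, 7)` lists every possible outcome, and the standard `random` module can simulate rolls. The sketch below checks that every simulated roll is one of those whole numbers.

```python
import random

# Every possible outcome of a six-sided die: a countable, finite set
possible_rolls = list(range(1, 7))
print(possible_rolls)  # [1, 2, 3, 4, 5, 6]

# Simulate 10 rolls; random.randint includes both end points (1 and 6)
rolls = [random.randint(1, 6) for _ in range(10)]
print(rolls)

# No roll can ever be 3.5: each value is one of the listed whole numbers
print(all(roll in possible_rolls for roll in rolls))  # True
```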

**B) Continuous Data**
*   **Definition:** Can take any value within a given range. It's **measurable**.
*   **Examples:**
    *   The height of a person (e.g., 175.5 cm).
    *   The temperature of a room (e.g., 21.7°C).
    *   The weight of an object.
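Continuous values are usually stored as `float`. Arithmetic on them can produce any value within the range, such as the midpoint of two readings. Floats only approximate most decimal values, so exact `==` comparisons can surprise you, and `math.isclose` compares within a tolerance instead. The temperatures below are made up.

```python
import math

# Two made-up temperature readings in °C
morning = 21.7
afternoon = 23.1

# The midpoint of two continuous values is another valid value in the range
midpoint = (morning + afternoon) / 2
print(f"Midpoint: {midpoint:.2f} °C")

# Floats store values approximately, so compare with a tolerance
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True
```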

#### 3.2 Qualitative Data (Categorical)

This data represents a **quality** or a characteristic. It describes things by putting them into groups or categories. You cannot perform arithmetic on it, although you can count how many items fall into each category.

*   **Definition:** Descriptive labels or names.
*   **Examples:**
    *   Eye color ('Blue', 'Green', 'Brown').
    *   A student's major ('Computer Science', 'Physics', 'History').
    *   Survey responses ('Yes', 'No', 'Maybe').
    *   A person's name or a country.
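You can't do arithmetic on categories, but counting them works well. This minimal sketch applies `collections.Counter` to invented survey responses.

```python
from collections import Counter

# Invented survey responses: qualitative (categorical) data
responses = ["Yes", "No", "Yes", "Maybe", "Yes", "No"]

# Count how many times each category appears
counts = Counter(responses)
print(counts["Yes"], counts["No"], counts["Maybe"])  # 3 2 1

# The most frequent category
print(counts.most_common(1))  # [('Yes', 3)]
```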

### 4. Python Representations of Data

Now, let's connect this theory back to Python. How do we store these different types of data in our code?

*   **Structured Data:** Often represented as a **List of Dictionaries** or a **List of Lists**. Libraries like pandas can load a list of dictionaries directly into a table (a `DataFrame`).
    ```python
    # List of Dictionaries (very common for JSON data)
    structured_data_1 = [
        {'id': 101, 'name': 'Alice', 'score': 88},
        {'id': 102, 'name': 'Bob', 'score': 92}
    ]
    ```
*   **Unstructured Data:** Typically stored as a single **String** (`str`) for text, or handled by specialized libraries for images/audio.
    ```python
    unstructured_text = "The quick brown fox jumps over the lazy dog."
    ```
*   **Quantitative (Discrete):** Stored as an **Integer** (`int`).
    ```python
    num_students = 25
    ```
*   **Quantitative (Continuous):** Stored as a **Float** (`float`).
    ```python
    room_temperature = 21.7
    ```
*   **Qualitative (Categorical):** Stored as a **String** (`str`). A `set` is useful for finding unique categories.
    ```python
    major = "Computer Science"
    colors = ['Blue', 'Green', 'Blue', 'Brown']
    unique_colors = set(colors)  # {'Blue', 'Green', 'Brown'} (display order may vary)
    ```
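The list above names a **List of Lists** as an alternative to a **List of Dictionaries**. Here is a minimal sketch with the same two student records in both shapes, plus a conversion between them using `zip`. The names `columns`, `structured_data_2`, and `rebuilt` are our own, chosen for illustration.

```python
# List of Dictionaries: every value is labelled by its key
structured_data_1 = [
    {'id': 101, 'name': 'Alice', 'score': 88},
    {'id': 102, 'name': 'Bob', 'score': 92},
]

# List of Lists: same records; column order is kept in a separate header
columns = ['id', 'name', 'score']
structured_data_2 = [
    [101, 'Alice', 88],
    [102, 'Bob', 92],
]

# Rebuild dictionaries by pairing each header name with the matching value
rebuilt = [dict(zip(columns, row)) for row in structured_data_2]
print(rebuilt == structured_data_1)  # True

# Show which Python type each field of the first record uses
for key, value in structured_data_1[0].items():
    print(key, type(value).__name__)
```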

---

### 5. Hands-On Lab Activity: Classify Real-Life Data

**Goal:** Apply our new knowledge to classify the data points within a typical real-world object: a product from an e-commerce website.

```python
# Here is a Python dictionary representing a product for sale online.
product_data = {
    "product_name": "Wireless Noise-Cancelling Headphones",
    "brand": "AudioPhile",
    "price_usd": 249.99,
    "stock_quantity": 150,
    "is_on_sale": True,
    "color_options": ["Black", "White", "Blue"],
    "customer_reviews": [
        {"user": "user123", "rating": 5, "comment": "Amazing sound quality!"},
        {"user": "user456", "rating": 4, "comment": "Good, but battery could be better."}
    ],
    "technical_specs": {
        "weight_grams": 225.5,
        "bluetooth_version": "5.2"
    }
}
```

#### Your Task

For each key in the `product_data` dictionary, classify the data and identify its Python representation. Fill in the blanks below (you can do this in a markdown cell or just write it on paper).

1.  **`product_name`**: "Wireless Noise-Cancelling Headphones"
    *   Data Type: `_________________` (Qualitative / Quantitative)
    *   Python Representation: `_________`

2.  **`price_usd`**: 249.99
    *   Data Type: `_________________` (Qualitative / Quantitative)
    *   Sub-Type: `_________________` (Discrete / Continuous)
    *   Python Representation: `_________`

3.  **`stock_quantity`**: 150
    *   Data Type: `_________________`
    *   Sub-Type: `_________________`
    *   Python Representation: `_________`

4.  **`color_options`**: ["Black", "White", "Blue"]
    *   The overall structure is a list of what data type? `_________________`
    *   Python Representation: `_________`

5.  **`customer_reviews`**: (The list of dictionaries)
    *   This is an example of what kind of data family? `_________________` (Structured / Unstructured)
    *   Python Representation: `_________`

6.  **`bluetooth_version`** (inside `technical_specs`): "5.2"
    *   *Tricky Question!* Based on its Python type (`str`), how should we treat this data? `_________________`

#### Lab Solution (Discussion)

1.  **`product_name`**:
    *   Data Type: **Qualitative** (It's a descriptive name).
    *   Python Representation: **`str`**

2.  **`price_usd`**:
    *   Data Type: **Quantitative** (It's a measurement of value).
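    *   Sub-Type: **Continuous** (Price can take a wide range of decimal values such as $249.99 or $249.98, so it is usually treated as continuous. Strictly speaking, prices move in steps of one cent, which is why some texts classify them as discrete.)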
    *   Python Representation: **`float`**

3.  **`stock_quantity`**:
    *   Data Type: **Quantitative**
    *   Sub-Type: **Discrete** (You can only have whole numbers of items).
    *   Python Representation: **`int`**

4.  **`color_options`**:
    *   The list contains **Qualitative** data (categories of colors).
    *   Python Representation: **`list`** of `str`.

5.  **`customer_reviews`**:
    *   This is **Structured Data**. Each review has a consistent structure (`user`, `rating`, `comment`), even though the `comment` values themselves are free-form text.
    *   Python Representation: **`list`** of `dict`.

6.  **`bluetooth_version`**:
    *   Because it's stored as a string (`"5.2"`), we must treat it as **Qualitative/Categorical** data. We can't perform math on it directly. If we wanted to treat it as a number, we would first have to convert it to a `float`.
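Here is a short sketch of why a version string such as `"5.2"` is safer to treat as a label than as a number. Converting it with `float()` works. However, the hypothetical version `"5.10"` shows how a numeric comparison can put versions in the wrong order. Comparing the dot-separated parts as integers is one common workaround. The helper `version_key` is our own illustration, not a Bluetooth-specific rule.

```python
bluetooth_version = "5.2"

# Converting to a number is possible...
as_number = float(bluetooth_version)
print(as_number)  # 5.2

# ...but "5.10" (a hypothetical later version) becomes 5.1, which looks SMALLER than 5.2
print(float("5.10") < float("5.2"))  # True, the wrong order for versions

def version_key(version):
    """Hypothetical helper: turn '5.10' into a tuple of ints, e.g. (5, 10)."""
    return tuple(int(part) for part in version.split("."))

# Tuples compare part by part, so (5, 10) > (5, 2) as intended
print(version_key("5.10") > version_key("5.2"))  # True
```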

---

### 6. Assignment: Categorize Real-World Data

**Your Task:** Find and identify 5 different real-world data points. They can be from your daily life, a website, a news article, etc. For each one, categorize it fully.

**Submission Format:** Create a text file named `my_data_examples.txt` or a markdown file named `my_data_examples.md`. Follow the template below for each of your 5 examples.
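Not sure what to write in the "Possible Python Representation" field? Ask Python directly with the built-in `type()` function. This minimal sketch uses the values from the two template examples.

```python
# Values taken from the two template examples
family_ages = [52, 50, 25, 21]
letter_grade = 'B+'

print(type(family_ages).__name__)                   # list
print({type(age).__name__ for age in family_ages})  # {'int'}
print(type(letter_grade).__name__)                  # str
```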

--- 

**Example 1:**
*   **Data Point:** The ages of your family members.
*   **Main Type:** Quantitative
*   **Sub-Type:** Discrete
*   **Possible Python Representation:** A `list` of `int` (e.g., `[52, 50, 25, 21]`)

--- 

**Example 2:**
*   **Data Point:** Your current letter grade in a course.
*   **Main Type:** Qualitative
*   **Sub-Type:** N/A (or Categorical)
*   **Possible Python Representation:** A `str` (e.g., `'B+'`)

--- 

*(Now add your 5 unique examples below this line)*