# Variables and Data Types (Python)

<hr>

## 1. What Is a Variable?

A **variable** is a *named reference* to a value stored somewhere in memory.

In Python:
- Variables **do not store values directly**
- They store **references to objects**
    - A reference is a pointer or link that tells Python where an object lives in memory. 

```python
x = 10
```

Here:
- `x` is a name  
- `10` is an object (an integer)  
- `=` binds the name to the object  

$$new_page$$

## 2. Variable Assignment

### Basic Assignment
```python
a = 5
b = a
```

Both `a` and `b` now reference the **same object**.

### Reassignment
```python
a = 10
```

- `a` now references a *new* object  
- `b` still references `5`  
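One way to check this in code is the `is` operator, which tests whether two names refer to the same object. The snippet below is a minimal sketch of the assignment and reassignment steps above:

```python
a = 5
b = a
print(a is b)   # True: both names are bound to the same object

a = 10          # rebind a; the object 5 itself is untouched
print(a, b)     # 10 5
print(a is b)   # False: a and b now refer to different objects
```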

$$new_page$$

## 3. Dynamic Typing

To understand dynamic typing, we first need to look at **compile time** and **runtime**.

### Compile Time
Compile time is the phase in which source code is translated into another form before the program runs. This usually includes:
- Checking syntax
- Determining variable types
- Catching type-related errors
- Generating machine code or bytecode

Languages like C, C++, and Java perform strict checks at compile time.

### Runtime
Runtime is the phase in which the program is actually executing in memory. At runtime:
- Objects are created
- Memory is allocated
- Types are attached to objects
- Expressions are evaluated

```text
You type Python code
        ↓
Source code (plain text .py file)
        ↓
Python reads the file
        ↓
COMPILE TIME
   -> Tokenization (break code into tokens)
   -> Syntax checking (grammar rules)
   -> Bytecode generation (cached as .pyc files for imported modules)
        ↓
If syntax is valid
        ↓
RUNTIME (Python Virtual Machine)
   -> Execute bytecode instruction by instruction
   -> Create objects in memory
   -> Assign types to objects
   -> Bind variable names to objects
   -> Evaluate expressions
   -> Call functions
        ↓
Output sent to screen (stdout)
        ↓
Program finishes
        ↓
Memory cleaned up (garbage collection)
```

Python operates almost entirely at runtime.
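Python's built-in `compile()` and `exec()` functions let us run the two phases separately. The sketch below uses a hypothetical helper, `check`, to show that a type error passes compilation and only surfaces at runtime, while a syntax error is caught before anything runs:

```python
def check(source: str) -> str:
    """Hypothetical helper: report which phase a code snippet fails in."""
    try:
        # Compile time: tokenize, parse, and generate bytecode (nothing runs yet)
        code = compile(source, "<example>", "exec")
    except SyntaxError:
        return "compile-time error"
    try:
        # Runtime: execute the bytecode in a fresh, empty namespace
        exec(code, {})
    except TypeError:
        return "runtime error"
    return "ok"

print(check("result = 'text' + 5"))    # runtime error: compiles fine, fails when run
print(check("result = (1 + "))         # compile-time error: never gets to run
print(check("result = 'text' + '5'"))  # ok: compiles and runs
```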

Python uses **dynamic typing**, meaning:
- Types belong to objects (not to variable names) and are determined **at runtime**
- A variable can reference objects of different types over time

```python
x = 5
x = "hello"
x = 3.14
```

This increases flexibility but requires discipline to avoid logical errors.
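A short sketch of both sides of that trade-off: the same name can hold objects of different types, and a function written with numbers in mind (the hypothetical `double` below) silently does something different when handed a string:

```python
x = 5
print(type(x).__name__)   # int
x = "hello"
print(type(x).__name__)   # str
x = 3.14
print(type(x).__name__)   # float

def double(value):
    # Intended for numbers, but Python does not enforce that
    return value * 2

print(double(5))     # 10
print(double("5"))   # 55: string repetition, a logic error rather than a crash
```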

### Contrast with Static Typing

| Feature | Python | C / Java |
|------|------|------|
| Type declaration | Not required | Required |
| Type checking | Runtime | Compile-time |
| Flexibility | High | Lower |
| Safety | Developer responsibility | Compiler enforced |

$$new_page$$

## 4. Built-in Data Types Overview

Python provides several **core built-in data types**:

| Category | Type | Example |
|------|------|------|
| Numeric | `int` | `10` |
| Numeric | `float` | `3.14` |
| Text | `str` | `"data"` |
| Logical | `bool` | `True` |
| Null | `NoneType` | `None` |

Understanding these types is essential for:
- Data cleaning
- Feature engineering
- Algorithm correctness

$$new_page$$

## 5. Integers (`int`)

Integers represent **whole numbers**.

```python
x = 42
y = -10
```

Key properties:
- Arbitrary precision: Python integers never overflow and can grow as large as your computer's memory allows
    - In languages such as C and Java, fixed-size integers can overflow, wrapping around to an incorrect value
- Exact arithmetic: no approximation is involved, unlike floating-point numbers, which can have rounding errors


`10 + 3` (addition)
- Mathematical addition
- Always exact: the result is the integer `13`

`10 // 3` (floor division)
- Divides the numbers and returns only the quotient, as an integer
- Rounds **down**, towards negative infinity, rather than to the nearest integer
    - `10 / 3` is `3.333...`, so `10 // 3` is `3`
    - `-10 / 3` is `-3.333...`, so `-10 // 3` is `-4` (not `-3`)

`10 % 3` (modulus)
- Returns the remainder after division
    - `10 % 3` is `1`
    - `-10 % 3` is `2` (refer to the division algorithm in the last slide)

Used heavily in:
- Indexing
- Counting
- Loop control
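The properties and operators above can be checked directly. This is a small sketch; the weekday list and the minutes example are made-up illustrations, and `divmod()` is a built-in that returns the `//` and `%` results together:

```python
# Arbitrary precision: no overflow, no rounding
big = 2 ** 100
print(big)                 # 1267650600228229401496703205376
print(big + 1 - big)       # 1

# Floor division and modulus, including a negative dividend
for a in (10, -10):
    print(a, "// 3 =", a // 3, "|", a, "% 3 =", a % 3)
# 10 // 3 = 3 | 10 % 3 = 1
# -10 // 3 = -4 | -10 % 3 = 2

# Typical uses
print(7 % 2 == 0)          # False: 7 is odd
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
print(days[(5 + 3) % 7])   # Tue: wrap an index around the end of a sequence
print(divmod(135, 60))     # (2, 15): 135 minutes is 2 hours and 15 minutes
```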

$$new_page$$

## 6. Floating-Point Numbers (`float`)

Floats represent **real numbers with decimals**.

```python
pi = 3.14159
```

### Important Concept: Precision Errors

```python
0.1 + 0.2 != 0.3  # True: the sum is actually 0.30000000000000004
```

0.1 + 0.2 != 0.3 because floating-point numbers are stored in binary, and 0.1 and 0.2 cannot be represented exactly in base-2, leading to tiny rounding errors. 

Implications for data science:
- Avoid direct equality checks
- Use tolerances when comparing floats
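A minimal sketch of comparing with a tolerance by hand. The tolerance `1e-9` is an arbitrary value chosen for illustration; a suitable value depends on the scale of your data:

```python
a = 0.1 + 0.2
b = 0.3

print(a)           # 0.30000000000000004
print(a == b)      # False: exact equality is too strict

TOLERANCE = 1e-9   # assumed tolerance; pick one that suits your data
print(abs(a - b) < TOLERANCE)  # True: the values differ by roughly 5.6e-17
```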

$$new_page$$

## 7. Strings (`str`)

Strings represent **textual data**.

```python
name = "Alice"
```

### Characteristics
- Immutable
- Indexed
- Iterable

```python
name[0]    # 'A'
len(name)  # 5
```

### Common Operations

| Operation | Example |
|------|------|
| Concatenation | `"a" + "b"` |
| Repetition | `"ha" * 3` |
| Formatting | `f"Hi {name}"` |

Strings are central to:
- File processing
- Data cleaning
- Feature extraction
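The characteristics and operations above, in one short sketch. The messy `raw` value is made up, and `strip()` and `lower()` are standard `str` methods not covered in the table:

```python
name = "Alice"

print(name[0])        # A (indexed: positions start at 0)
print(len(name))      # 5
print("a" + "b")      # ab
print("ha" * 3)       # hahaha
print(f"Hi {name}")   # Hi Alice

# Iterable: a for loop visits one character at a time
letters = []
for ch in name:
    letters.append(ch)
print(letters)        # ['A', 'l', 'i', 'c', 'e']

# Data cleaning: tidy a messy value, e.g. one read from a file
raw = "  ALICE \n"
clean = raw.strip().lower()
print(clean)          # alice
```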

$$new_page$$

## 8. Booleans (`bool`)

Boolean values represent **truth states**.

```python
is_valid = True
```

### Boolean Expressions
```python
x > 5
x == 10
x != 3
```

### Logical Operators

| Operator | Meaning |
|------|------|
| `and` | Both true |
| `or` | At least one true |
| `not` | Negation |

Booleans drive:
- Control flow
- Filtering
- Decision logic
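A sketch of boolean expressions, the logical operators, and filtering. The `ages` list is made-up sample data:

```python
x = 10

print(x > 5)              # True
print(x == 10)            # True
print(x != 3)             # True

print(x > 5 and x < 8)    # False: and needs both sides to be True
print(x < 5 or x == 10)   # True: or needs at least one side to be True
print(not x == 10)        # False: not flips the result

# Filtering: keep only the values that pass a boolean test
ages = [15, 22, 37, 12, 41]
adults = []
for age in ages:
    if age >= 18:
        adults.append(age)
print(adults)             # [22, 37, 41]
```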

$$new_page$$

## 9. NoneType (`None`)

`None` represents **absence of a value**.

```python
result = None
```

Common uses:
- Default return values
- Missing data
- Placeholders

```python
if result is None:
    pass
```

Use `is` and `is not` when comparing to `None`.
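There is only ever one `None` object in a Python program, which is why an identity check with `is` is reliable. A sketch of two common uses; `greet` and `reading` are hypothetical names for illustration:

```python
def greet(name):
    # No return statement, so Python returns None automatically
    print(f"Hello, {name}")

result = greet("Alice")   # prints: Hello, Alice
print(result is None)     # True

# None as a placeholder for data that is missing or not yet known
reading = None
if reading is None:
    print("no reading recorded")
```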

$$new_page$$

## 10. Type Inspection

To check the type of an object:

```python
type(x)
```

Used primarily for:
- Debugging
- Validation
- Learning
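A sketch of `type()` used for inspection and simple validation. `validate_age` is a hypothetical helper. The built-in `isinstance()` is the more common idiom for validation, but note that `isinstance(True, int)` is `True` because `bool` is a subclass of `int`; the `type(...) is int` check below rejects booleans as well:

```python
values = [42, 3.14, "data", True, None]
for v in values:
    print(repr(v), "->", type(v).__name__)
# 42 -> int
# 3.14 -> float
# 'data' -> str
# True -> bool
# None -> NoneType

def validate_age(age):
    # Hypothetical validation helper: accept only real ints
    if type(age) is not int:
        raise TypeError(f"age must be an int, got {type(age).__name__}")
    return age

print(validate_age(21))    # 21
try:
    validate_age("21")     # a number stored as text, e.g. from a CSV
except TypeError as err:
    print(err)             # age must be an int, got str
```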

$$new_page$$

## 11. Type Conversion (Casting)

```python
int("5")       # 5
float("3.14")  # 3.14
str(10)        # '10'
bool(1)        # True
```

Casting is essential when:
- Reading CSV files
- Cleaning datasets
- Converting features
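A sketch of casting values from a single CSV-style row. The row itself is made up, and splitting on `","` is a simplification; real CSV files are better read with the `csv` module or a library:

```python
# A made-up CSV row: every field arrives as text
row = "Alice,21,1.75,False"
name, age_text, height_text, member_text = row.split(",")

age = int(age_text)            # '21'   -> 21
height = float(height_text)    # '1.75' -> 1.75

# int() rejects decimal text; convert via float() first (this truncates)
print(int(float("4.3")))       # 4

# Pitfall: bool() of any non-empty string is True, even "False"
print(bool(member_text))       # True
member = member_text == "True" # compare the text instead
print(member)                  # False
```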

$$new_page$$

## 12. Mutability vs Immutability

| Type | Mutable? |
|------|------|
| int | No |
| float | No |
| str | No |
| bool | No |

Immutable objects:
- Cannot be changed after creation
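- Recall that `x = 5` simply binds the name `x` to the object `5`
- Writing `x = 6` afterwards does not change that object; it rebinds `x` to a different object. If no other name refers to `5`, Python may reclaim its memory through garbage collection (CPython keeps small integers such as `5` cached, so that particular object actually stays alive)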
- New objects are created on modification
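A minimal sketch of immutability in action: an operation that looks like an in-place change rebinds the name instead, and strings refuse item assignment outright:

```python
x = 5
original = x            # a second name for the same object
x += 1                  # looks like an in-place change...
print(x, original)      # 6 5
print(x is original)    # False: x was rebound to a new object

s = "data"
try:
    s[0] = "D"          # strings cannot be changed in place
except TypeError as err:
    print("TypeError:", err)

upper = s.upper()       # string methods return a new string instead
print(s, upper)         # data DATA
```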

$$new_page$$

## 13. Naming Conventions and Best Practices

```python
total_sales = 10
user_age = 21
```

- Use `snake_case`
- Be descriptive
- Avoid single-letter names

$$new_page$$

## 14. Why Variables and Types Matter for Data Science

Variables and types are the foundation for:
- Feature representation
- Memory efficiency
- Algorithm correctness
- Debugging pipelines

$$new_page$$

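## The Division Algorithm

For any integers `a` and `b` (with `b` not equal to zero), Python's `//` and `%` always satisfy:

`a == (a // b) * b + (a % b)`

When `b` is positive, `(a // b) * b` is the largest multiple of `b` that is less than or equal to `a`, and `a % b` is the distance from that multiple up to `a`, so the remainder is never negative.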

**We can look at it in the following way:**

10 % 3:
- Multiples of 3:

`... 0   3   6   9   12   15 ...`

- Closest multiple less than or equal to 10 → 9 (that is, 3 × 3, so `10 // 3` is `3`)
- Distance from 9 to 10 → 1

<br>
<hr style="display: flex; width: 60%; margin:auto;">
<br>

-10 % 3:
- Multiples of 3:

`... -12   -9   -6   -3   0   3 ...`

- Closest multiple less than or equal to -10 → -12 (that is, -4 × 3, so `-10 // 3` is `-4`)
- Distance from -12 to -10 → 2
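The rule can be verified for a handful of pairs, including negative dividends. This sketch only uses positive divisors, matching the examples above:

```python
pairs = [(10, 3), (-10, 3), (7, 2), (-7, 2)]

for a, b in pairs:
    q, r = a // b, a % b
    # The division algorithm holds for every pair
    assert a == q * b + r
    # With a positive divisor, the remainder is never negative
    assert 0 <= r < b
    print(f"{a} = ({q}) x {b} + {r}")
# 10 = (3) x 3 + 1
# -10 = (-4) x 3 + 2
# 7 = (3) x 2 + 1
# -7 = (-4) x 2 + 1
```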

## Why Does 0.1 + 0.2 != 0.3?

Computers store floating-point numbers in binary (base-2), not decimal (base-10).

**In base-10 (what humans use)**
- 0.1 is exact
- 0.2 is exact
- 0.3 is exact

**In base-2 (what computers use)**
- 0.1 → repeating binary fraction
- 0.2 → repeating binary fraction
- Just like 1/3 = 0.3333... in decimal

So the computer stores approximations, not exact values.
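The standard library's `decimal.Decimal`, when built from a float, shows the exact binary value the float really holds, which makes the approximation visible. A short sketch; `0.5` is included for contrast because it is a power-of-two fraction and therefore exact in binary:

```python
from decimal import Decimal

# Decimal(float) reveals the exact value stored in binary
print(Decimal(0.1))   # 0.1000000000000000055511151231257827021181583404541015625

# Decimal(str) is the exact decimal value we typed
print(Decimal(0.1) == Decimal("0.1"))   # False: the stored float is not exactly 0.1
print(Decimal(0.5) == Decimal("0.5"))   # True: 1/2 is exact in base-2
```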

The correct way to compare floats for equality in Python:

```python
import math
math.isclose(0.1 + 0.2, 0.3)  # True
```

math.isclose() checks whether two numbers are close enough within a safe tolerance.
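By default `math.isclose()` uses only a relative tolerance (`rel_tol=1e-09`, `abs_tol=0.0`), which becomes too strict when comparing against zero. A sketch, where `abs_tol=1e-9` is an arbitrary illustrative value:

```python
import math

print(math.isclose(0.1 + 0.2, 0.3))                 # True

# Near zero, a purely relative tolerance is too strict
tiny_error = 0.1 + 0.2 - 0.3                         # about 5.55e-17
print(math.isclose(tiny_error, 0.0))                 # False
print(math.isclose(tiny_error, 0.0, abs_tol=1e-9))   # True
```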

## Object Identity

We can use the function `id()` to obtain the **identity** of an object in Python.
This identity is unique and constant for the object during its lifetime; two objects whose lifetimes do not overlap may share the same `id()` value.

In CPython, the value returned by `id()` often corresponds to the object's
memory address, but this is an **implementation detail** and should not be relied on.

$$new_page$$

```python
10 % 3     # 1
-10 % 3    # 2
10 // 3    # 3
10 / 3     # 3.3333333333333335
```

```python
if 0.1 + 0.2 == 0.3:
    print('correct')
else:
    print('Use math.isclose !')

0.1 + 0.2
# Printed:   Use math.isclose !
# Displayed: 0.30000000000000004
```

```python
x = 5
print(id(x))
y = x
x = 3
print(id(y))
print(id(x))
# Output (the exact numbers vary between machines and runs):
# 4369440656
# 4369440656
# 4369440592
```

```python
x = 3
type(x)    # int
```

```python
str(56)    # '56'
```

```python
int('4.3')
# ValueError: invalid literal for int() with base 10: '4.3'
```

```python
int('4')   # 4
```