# R to Python Bridge

Welcome! You already know how to think like a programmer from your R experience. This notebook will help you translate that knowledge to Python.

## What Transfers Directly
- Your statistical thinking
- Understanding of data structures (vectors, data frames)
- Logic and control flow concepts
- The habit of working with data

## What's Different (and We'll Cover)
- **Indexing starts at 0** (not 1!)
- Assignment uses `=` (not `<-`)
- Indentation matters (no curly braces)
- Assignment shares objects by reference (R behaves as if it copies)

Let's dive in!

---
## 1. Assignment: `=` vs `<-`

In R, you write:
```r
x <- 5
y <- "hello"
```

In Python, just use `=`:

In [None]:
x = 5
y = "hello"
print(x, y)

**Note:** Python won't reject `<-` as a syntax error, but it is not assignment: `x <- 5` is read as `x < -5` (is `x` less than negative 5?). Never use it for assignment!

In [None]:
# This is NOT assignment in Python - it's a comparison!
x = 10
result = x < -5  # Is x less than -5?
print(result)

### Your Turn
Create three variables: your name, your age, and whether you like coffee (True/False).

In [None]:
# YOUR CODE HERE
name = None  # Replace with your name (as a string)
age = None   # Replace with your age (as an integer)
likes_coffee = None  # Replace with True or False

In [None]:
# 🧪 Grading cell - run this to check your answer
assert 'name' in dir(), "Variable 'name' not defined"
assert 'age' in dir(), "Variable 'age' not defined"
assert 'likes_coffee' in dir(), "Variable 'likes_coffee' not defined"
assert isinstance(name, str), f"'name' should be a string, got {type(name).__name__}"
assert isinstance(age, int), f"'age' should be an integer, got {type(age).__name__}"
assert isinstance(likes_coffee, bool), f"'likes_coffee' should be a boolean (True/False), got {type(likes_coffee).__name__}"
print("✓ All variables created correctly!")

<details>
<summary>💡 Hint (click to expand)</summary>

```python
name = "Your Name"  # Use quotes for strings
age = 20            # No quotes for numbers
likes_coffee = True # Note: capital T, no quotes
```

Remember: strings need quotes, numbers and booleans don't!
</details>

---
## 2. The Big Gotcha: 0-Based Indexing

This is the **most important difference** between R and Python.

In R:
```r
vec <- c("a", "b", "c", "d")
vec[1]  # Returns "a"
```

In Python:

In [None]:
my_list = ["a", "b", "c", "d"]
print("Index 0:", my_list[0])  # First element
print("Index 1:", my_list[1])  # Second element

### Predict Before You Run

Given this list, **write down what you think each expression returns BEFORE running the cell**:

```python
letters = ["x", "y", "z"]
```

1. `letters[0]` = ?
2. `letters[1]` = ?
3. `letters[2]` = ?
4. `letters[3]` = ?

In [None]:
letters = ["x", "y", "z"]

# Uncomment each line one at a time and run to check your predictions
# print(letters[0])
# print(letters[1])
# print(letters[2])
# print(letters[3])  # What happens here?

### Negative Indexing

Python has a nice feature R doesn't: negative indices count from the end.

In [None]:
my_list = ["a", "b", "c", "d"]
print("Last element:", my_list[-1])
print("Second to last:", my_list[-2])

**Careful!** In R, negative indices *exclude* elements. In Python, they access from the end. Totally different!

```r
# R: vec[-1] returns everything EXCEPT the first element
# Python: my_list[-1] returns the LAST element
```
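
If you do need R's "drop an element" behaviour in Python, slicing is one way to get it. The following is a minimal sketch; the variable names are illustrative only:

```python
my_list = ["a", "b", "c", "d"]

# R: vec[-1]  -> everything except the FIRST element
all_but_first = my_list[1:]

# R: vec[-length(vec)]  -> everything except the LAST element
all_but_last = my_list[:-1]

# R: vec[-2]  -> everything except the SECOND element (index 1 in Python)
all_but_second = my_list[:1] + my_list[2:]

print(all_but_first)   # ['b', 'c', 'd']
print(all_but_last)    # ['a', 'b', 'c']
print(all_but_second)  # ['a', 'c', 'd']
```

None of these change `my_list` itself; each slice builds a new list.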

---
## 3. Range Endpoints

Another indexing gotcha: ranges in Python **exclude the endpoint**.

In R:
```r
1:5  # Returns 1, 2, 3, 4, 5
```

In Python:

In [None]:
print(list(range(1, 5)))  # Does NOT include 5!

This also applies to slicing:

In [None]:
my_list = ["a", "b", "c", "d", "e"]
print(my_list[1:4])  # Elements at index 1, 2, 3 (NOT 4!)
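
One way to translate an R slice is to subtract 1 from the start and leave the end alone: R's `vec[2:4]` (1-based, end included) picks the same elements as Python's `my_list[1:4]` (0-based, end excluded). The helper `r_style_slice` below is hypothetical, written only to illustrate that rule:

```python
def r_style_slice(items, start, end):
    """Return the elements R would give for items[start:end] (1-based, inclusive)."""
    # Shift only the start: the inclusive R end equals the exclusive Python stop.
    return items[start - 1:end]

my_list = ["a", "b", "c", "d", "e"]
print(r_style_slice(my_list, 2, 4))  # ['b', 'c', 'd'], same as my_list[1:4]
print(len(my_list[1:4]))             # 3, i.e. stop - start
```

A handy side effect of excluding the endpoint: the length of `my_list[start:stop]` is simply `stop - start` (as long as both are within the list).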

### Predict Before You Run

```python
nums = [10, 20, 30, 40, 50]
```

What does `nums[0:3]` return? Write your prediction, then run:

In [None]:
nums = [10, 20, 30, 40, 50]
print(nums[0:3])

---
## 4. Data Types Comparison

| R | Python | Notes |
|---|--------|-------|
| `c(1, 2, 3)` | `[1, 2, 3]` | R vectors → Python lists |
| `list(a=1, b=2)` | `{"a": 1, "b": 2}` | Named lists → dictionaries |
| `data.frame()` | `polars.DataFrame()` | We'll use polars, not pandas |
| `TRUE/FALSE` | `True/False` | Capitalization differs! |
| `NULL` | `None` | Different keyword |
| `NA` | `None` or `float('nan')` | Missing values |
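
The `NA` row deserves a closer look, since plain Python has no single missing-value marker. Here is a short sketch of how the two stand-ins behave, using only the standard library (the variable names are made up for illustration):

```python
import math

missing_obj = None          # General-purpose "nothing here" marker
missing_num = float("nan")  # Floating-point "not a number"

# Test for None with `is`, not `==`
print(missing_obj is None)          # True

# NaN is never equal to anything, including itself, so use math.isnan()
print(missing_num == missing_num)   # False
print(math.isnan(missing_num))      # True

# Dropping None entries from a list (roughly like R's x[!is.na(x)])
values = [1.5, None, 2.5]
clean = [v for v in values if v is not None]
print(clean)                        # [1.5, 2.5]
```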


In [None]:
# R vector equivalent - Python list
numbers = [1, 2, 3, 4, 5]

# R named list equivalent - Python dictionary
person = {"name": "Alice", "age": 25, "student": True}

# Accessing dictionary values (like R's $ operator)
print(person["name"])
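
One difference from `$`: R returns `NULL` for a name that isn't in the list, while Python's square brackets raise a `KeyError`. Here is a brief sketch of the two lookup styles (`"email"` is just an illustrative missing key):

```python
person = {"name": "Alice", "age": 25, "student": True}

# .get() returns None (or a default you supply) when the key is missing
print(person.get("email"))             # None
print(person.get("email", "unknown"))  # unknown

# Square brackets raise KeyError for a missing key
try:
    person["email"]
except KeyError as err:
    print("Missing key:", err)         # Missing key: 'email'

# Adding or updating a value looks like ordinary assignment
person["age"] = 26
print(person["age"])                   # 26
```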

### No `c()` Needed

In R, you use `c()` to combine values into a vector. In Python, just use square brackets:

In [None]:
# R: c(1, 2, 3)
# Python:
my_list = [1, 2, 3]
print(my_list)
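
Two behaviours differ from `c()`, though: a Python list keeps each element's own type instead of coercing everything to one type, and brackets inside brackets nest rather than flatten. A small illustration, with made-up variable names:

```python
# R: c(1, "a", TRUE) coerces everything to character
mixed = [1, "a", True]
print([type(item).__name__ for item in mixed])  # ['int', 'str', 'bool']

# R: c(c(1, 2), 3) flattens to 1 2 3; Python nests instead
nested = [[1, 2], 3]
print(len(nested))   # 2 (one inner list and one number)

# To combine lists the way c() does, use +
combined = [1, 2] + [3]
print(combined)      # [1, 2, 3]
```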

---
## 5. Functions: `def` vs `function()`

In R:
```r
add_numbers <- function(a, b) {
    return(a + b)
}
```

In Python:

In [None]:
def add_numbers(a, b):
    return a + b

result = add_numbers(3, 5)
print(result)

Key differences:
- `def` keyword instead of `function()`
- Colon `:` after the parameters
- **Indentation defines the body** (no curly braces!)
- `return` without parentheses
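
One more habit to unlearn: an R function returns the value of its last expression even without `return()`, but a Python function with no `return` statement gives back `None`. A quick sketch with two hypothetical functions:

```python
def add_without_return(a, b):
    a + b  # The sum is computed and then discarded

def add_with_return(a, b):
    return a + b

print(add_without_return(3, 5))  # None
print(add_with_return(3, 5))     # 8
```

If a function unexpectedly hands you `None`, a missing `return` is a good first thing to check.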

### The Indentation Rule

Python uses indentation to define code blocks. R uses curly braces. This is probably the biggest syntax adjustment.

In [None]:
def greet(name):
    # Everything indented here is part of the function
    message = f"Hello, {name}!"
    return message

# This is outside the function (not indented)
print(greet("World"))

### Your Turn

Write a function called `square` that takes a number and returns its square.

<details>
<summary>💡 Hint (click to expand)</summary>

The function body should be one line that returns `n` multiplied by itself.

In Python, you can use:
- `n * n` (multiplication)
- `n ** 2` (exponentiation)

```python
def square(n):
    return n ** 2
```
</details>

In [None]:
# YOUR CODE HERE
def square(n):
    pass  # Replace this with your code

# Test it:
# print(square(4))  # Should print 16

In [None]:
# 🧪 Grading cell - run this to check your answer
assert square(4) == 16, f"square(4) should return 16, got {square(4)}"
assert square(0) == 0, f"square(0) should return 0, got {square(0)}"
assert square(-3) == 9, f"square(-3) should return 9, got {square(-3)}"
assert square(2.5) == 6.25, f"square(2.5) should return 6.25, got {square(2.5)}"
print("✓ square() function works correctly!")

---
## 6. Control Flow: Indentation Instead of Braces

In R:
```r
if (x > 0) {
    print("positive")
} else if (x < 0) {
    print("negative")
} else {
    print("zero")
}
```

In Python:

In [None]:
x = 5

if x > 0:
    print("positive")
elif x < 0:  # Note: elif, not else if
    print("negative")
else:
    print("zero")

Key differences:
- No parentheses needed around the condition (they are allowed, but optional)
- Colon `:` after each condition
- `elif` instead of `else if`
- **Indentation** defines what's inside each block
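
The same branching logic can be wrapped in a function so it is easy to try with several values. This is only a sketch, and `describe_sign` is a made-up name. It also shows that Python spells its logical operators as words (`and`, `or`, `not`) where R uses `&&`, `||`, and `!`:

```python
def describe_sign(x):
    if x > 0:
        return "positive"
    elif x < 0:
        return "negative"
    else:
        return "zero"

print(describe_sign(5))    # positive
print(describe_sign(-2))   # negative
print(describe_sign(0))    # zero

# Logical operators are words in Python: and, or, not
x = 7
if x > 0 and not x > 10:
    print("positive and at most 10")
```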

### For Loops

In R:
```r
for (i in 1:5) {
    print(i)
}
```

In Python:

In [None]:
for i in range(5):  # 0, 1, 2, 3, 4 (remember: 0-based, endpoint excluded!)
    print(i)

In [None]:
# To get 1, 2, 3, 4, 5 like R's 1:5:
for i in range(1, 6):  # Start at 1, go up to (but not including) 6
    print(i)
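
In practice you often don't need index numbers at all: a Python `for` loop can walk over the elements directly, and the built-in `enumerate()` supplies a counter when you want one. Here is a sketch, with `fruits` as a made-up example list:

```python
fruits = ["apple", "banana", "cherry"]

# R: for (fruit in fruits) { print(fruit) }
for fruit in fruits:
    print(fruit)

# R: for (i in seq_along(fruits)) { ... }
# enumerate() counts from 0 by default; start=1 gives R-style numbering
for i, fruit in enumerate(fruits, start=1):
    print(i, fruit)
```

Note that `start=1` only changes the counter that is printed; `fruits[0]` is still the first element.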

---
## 7. The Reference Trap (R Copies, Python Shares)

This is a subtle but **critical** difference.

In R, assigning a vector to a new variable gives you an independent copy as soon as you modify it (copy-on-modify), so the original never changes:
```r
a <- c(1, 2, 3)
b <- a
b[1] <- 99
print(a)  # Still c(1, 2, 3) - a is unchanged!
```

In Python, assignment creates a **reference** (both variables point to the same object):

In [None]:
a = [1, 2, 3]
b = a  # b now refers to the SAME list as a
b[0] = 99
print("a:", a)  # a is ALSO changed!
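
You can confirm that two names point at one object with the `is` operator, which compares identity rather than contents. A minimal check (the third list, `c`, is added here only for contrast):

```python
a = [1, 2, 3]
b = a            # Same object, second name
c = [1, 2, 3]    # Equal contents, but a separate object

print(a is b)    # True
print(a == c)    # True  (same contents)
print(a is c)    # False (different objects)
```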

### Predict Before You Run

What will `original` contain after running this code?

In [None]:
original = ["a", "b", "c"]
copy = original
copy.append("d")

# What is original now? Make your prediction, then print it:
# print(original)

### How to Actually Copy

If you want R-like behavior (a true copy), you need to be explicit:

In [None]:
original = [1, 2, 3]
actual_copy = original.copy()  # Or: list(original) or original[:]
actual_copy[0] = 99
print("original:", original)  # Unchanged!
print("actual_copy:", actual_copy)
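
One caveat: `.copy()` makes a *shallow* copy, so lists nested inside the list are still shared. The standard library's `copy.deepcopy()` copies every level. A short sketch with a made-up nested list:

```python
import copy

original = [[1, 2], [3, 4]]

shallow = original.copy()        # New outer list, same inner lists
deep = copy.deepcopy(original)   # New outer list AND new inner lists

shallow[0][0] = 99   # Changes the inner list that original also holds
deep[1][0] = -1      # Changes only the deep copy

print("original:", original)  # original: [[99, 2], [3, 4]]
print("shallow:", shallow)    # shallow: [[99, 2], [3, 4]]
print("deep:", deep)          # deep: [[1, 2], [-1, 4]]
```

For a flat list of numbers or strings, as in the cell above, `.copy()` is enough.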

---
## 8. Printing and String Formatting

In R:
```r
name <- "Alice"
age <- 25
paste0("Name: ", name, ", Age: ", age)
# Or: sprintf("Name: %s, Age: %d", name, age)
```

In Python, use **f-strings** (the modern way):

In [None]:
name = "Alice"
age = 25
print(f"Name: {name}, Age: {age}")

The `f` before the string quote lets you embed expressions inside `{}`:

In [None]:
x = 10
print(f"x is {x}, and x squared is {x ** 2}")
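
f-strings also accept format specifiers after a colon, which covers most of what you would reach for `sprintf()` to do. A few common ones, with values made up for illustration:

```python
pi_approx = 3.14159265
share = 0.4567
count = 1234567

print(f"{pi_approx:.2f}")   # 3.14       (like sprintf("%.2f", x) in R)
print(f"{share:.1%}")       # 45.7%      (percentage with one decimal)
print(f"{count:,}")         # 1,234,567  (thousands separator)
```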

---
## 9. Methods vs Functions

R primarily uses functions:
```r
length(vec)
toupper("hello")
```

Python has both functions AND methods (functions attached to objects):

In [None]:
my_list = [1, 2, 3]
text = "hello"

# Functions (like R)
print(len(my_list))  # Like R's length()

# Methods (attached to the object with .)
print(text.upper())  # Like R's toupper(), but as a method
my_list.append(4)    # Modifies the list in place
print(my_list)
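
A related trap: methods that modify a list in place, such as `.append()` and `.sort()`, return `None`, so assigning their result (as you would in R) loses the data. Here is a sketch of the mistake and two ways around it, with made-up variable names:

```python
scores = [3, 1, 2]

# R habit: scores <- sort(scores). In Python this stores None!
wrong = scores.sort()
print(wrong)    # None
print(scores)   # [1, 2, 3]  (the list itself was sorted in place)

# Option 1: call the method and keep using the same list
scores.append(0)
print(scores)   # [1, 2, 3, 0]

# Option 2: use the function sorted(), which returns a NEW list
ordered = sorted(scores)
print(ordered)  # [0, 1, 2, 3]
print(scores)   # [1, 2, 3, 0]  (unchanged)
```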

---
## 10. Quick Reference Cheat Sheet

| Task | R | Python |
|------|---|--------|
| Assignment | `x <- 5` | `x = 5` |
| First element | `vec[1]` | `lst[0]` |
| Last element | `vec[length(vec)]` | `lst[-1]` |
| Range 1-5 | `1:5` | `range(1, 6)` |
| Create list/vector | `c(1, 2, 3)` | `[1, 2, 3]` |
| Named list/dict | `list(a=1)` | `{"a": 1}` |
| Define function | `function(x) {...}` | `def fn(x):` |
| If statement | `if (x) {...}` | `if x:` |
| Else if | `else if` | `elif` |
| For loop | `for (i in 1:n)` | `for i in range(1, n + 1):` |
| Boolean values | `TRUE/FALSE` | `True/False` |
| Null value | `NULL` | `None` |
| String formatting | `paste0()` or `sprintf()` | `f"...{var}..."` |
| Print | `print()` | `print()` |
| Length | `length(x)` | `len(x)` |
| Append | `c(vec, new)` | `lst.append(new)` |

---
## Practice Exercises

### Exercise 1: Translate R to Python

Convert this R code to Python:
```r
grades <- c(85, 92, 78, 95, 88)
average <- mean(grades)
print(paste0("Average grade: ", average))
```

In [None]:
# YOUR CODE HERE
# Hint: Python doesn't have mean() built-in, but you can use sum()/len()

<details>
<summary>💡 Hint (click to expand)</summary>

1. `c(85, 92, ...)` becomes `[85, 92, ...]` (square brackets, no c())
2. There's no built-in `mean()`, but you can calculate it: `sum(grades) / len(grades)`
3. `paste0()` becomes an f-string: `f"Average grade: {average}"`

```python
grades = [85, 92, 78, 95, 88]
average = sum(grades) / len(grades)
print(f"Average grade: {average}")
```
</details>

In [None]:
# 🧪 Grading cell - run this to check your answer
assert 'grades' in dir(), "Variable 'grades' not defined"
assert 'average' in dir(), "Variable 'average' not defined"
assert grades == [85, 92, 78, 95, 88], f"grades should be [85, 92, 78, 95, 88], got {grades}"
assert average == 87.6, f"average should be 87.6, got {average}"
print("✓ R to Python translation correct!")

<details>
<summary>💡 Hint for Exercise 2: Fix the R Habits (click to expand)</summary>

Four bugs to find:

1. **Assignment**: `<-` should be `=`
2. **First element**: Index 1 gets the SECOND element; use index 0 for first
3. **Last three**: Need to start at index 2 (third element) to get indices 2,3,4
4. **String formatting**: `paste0()` doesn't exist; use f-strings

```python
numbers = [10, 20, 30, 40, 50]
first_element = numbers[0]        # Index 0 for first!
last_three = numbers[2:5]         # Or numbers[2:] or numbers[-3:]
print(f"First: {first_element}")
```
</details>

---
## Key Takeaways

1. **Indexing starts at 0** - This will trip you up. A lot. It's normal.
2. **Ranges exclude the endpoint** - `range(1, 5)` gives 1,2,3,4 not 1,2,3,4,5
3. **Indentation is syntax** - Get used to it; your editor will help
4. **Assignment shares references** - Use `.copy()` when you need a real copy
5. **Use f-strings** - They're the modern way to format strings

The good news: your R intuition about *what* to do is right. You just need to learn *how* Python expresses it.

---

**Next up:** Notebook 02 - Data Types & Variables (Python's building blocks)

### Exercise 2: Fix the R Habits

This code was written by someone thinking in R. Find and fix the bugs:

In [None]:
# Buggy code - fix the R habits!
numbers <- [10, 20, 30, 40, 50]
first_element = numbers[1]  # Should get 10
last_three = numbers[3:5]   # Should get [30, 40, 50]
print(paste0("First: ", first_element))

### Exercise 3: Write a Function

Write a Python function that takes a list of numbers and returns both the minimum and maximum as a tuple. In R you might write:
```r
min_max <- function(vec) {
    return(c(min(vec), max(vec)))
}
```
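
Where R packs several results into a vector, Python usually returns a tuple, which the caller can unpack into separate names. The sketch below uses a different, made-up function (`first_and_last`) so it doesn't give the exercise away:

```python
def first_and_last(items):
    # The comma builds the tuple; the parentheses are optional but add clarity
    return (items[0], items[-1])

result = first_and_last(["a", "b", "c"])
print(result)          # ('a', 'c')

# Tuple unpacking: assign each part to its own variable
first, last = first_and_last(["a", "b", "c"])
print(first, last)     # a c
```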

In [None]:
# YOUR CODE HERE
def min_max(numbers):
    pass  # Replace with your implementation

# Test:
# print(min_max([3, 1, 4, 1, 5, 9, 2, 6]))  # Should print (1, 9)

In [None]:
# 🧪 Grading cell - run this to check your answer
assert min_max([3, 1, 4, 1, 5, 9, 2, 6]) == (1, 9), "min_max([3, 1, 4, 1, 5, 9, 2, 6]) should return (1, 9)"
assert min_max([5]) == (5, 5), "min_max([5]) should return (5, 5)"
assert min_max([-1, -5, -2]) == (-5, -1), "min_max([-1, -5, -2]) should return (-5, -1)"
print("✓ min_max() function works correctly!")