# 📚 Notebook 1: R-to-Python Translation Guide for Political Science
## Your Bridge from R to Python for Text Analysis

**Time to complete:** 60-90 minutes  
**Difficulty:** Beginner friendly!  
**Prerequisites:** You know R. That's enough!

### 🎯 What You'll Learn
1. How your R knowledge translates to Python
2. Python equivalents for common R operations
3. How to think in Python while leveraging your R experience

---

## 🚀 Part 0: Welcome R User!

**Good news:** Python and R are more similar than different!

**Your R skills that directly transfer:**
- Working with dataframes
- Writing functions
- Data manipulation logic
- Statistical thinking

**What's different (but easy):**
- Syntax details (we'll cover these!)
- 0-based indexing (instead of 1-based)
- Indentation matters (instead of curly braces)

## 📦 Part 1: Library Loading - R vs Python

In R, you use `library()` or `require()`. In Python, you use `import`.

In [None]:
# R Style:
# library(tidyverse)
# library(stringr)

# Python Style:
import pandas as pd       # Like dplyr/tidyr
import numpy as np        # Like base R vectors/matrices
import re                 # Like stringr

print("✅ Libraries loaded! This is like R's library() function")
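
One difference is worth seeing in code: `library()` attaches every exported function to your session, while a Python `import` keeps names behind the module prefix unless you ask for them by name. The sketch below uses two standard-library modules (`math` and `collections`, chosen only for illustration) to show both styles.

```python
# Style 1: import the module and use a prefix (similar to R's pkg::fun)
import math

# Style 2: import a single name (similar to attaching just one function)
from collections import Counter

root = math.sqrt(16)
party_counts = Counter(['Democrat', 'Republican', 'Democrat'])

print(root)
print(party_counts['Democrat'])
```

The prefix style is the more common one in pandas code, which is why you will see `pd.` and `np.` everywhere.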

## 📊 Part 2: Dataframes - Your Familiar Friend

A pandas DataFrame plays the same role as an R data frame or tibble: rows are observations, columns are variables, and each column holds one type of data.

In [None]:
# Let's create a sample political dataset
# This is like R's data.frame() or tibble()

data = pd.DataFrame({
    'speaker': ['Biden', 'Trump', 'Sanders', 'Harris', 'DeSantis'],
    'party': ['Democrat', 'Republican', 'Democrat', 'Democrat', 'Republican'],
    'speech_length': [1250, 2100, 1800, 1500, 1900],
    'sentiment_score': [0.65, -0.3, 0.4, 0.7, -0.1]
})

print("Our political speeches dataset:")
print(data)
print("\n📝 This is just like viewing a dataframe in R!")

## 🔄 Part 3: R to Python Operation Dictionary

Here's your cheat sheet for common operations:

In [None]:
# === R's glimpse() or str() ===
print("R: glimpse(data) → Python: data.info()")
print("-" * 40)
data.info()

In [None]:
# === R's head() ===
print("\nR: head(data, 3) → Python: data.head(3)")
print("-" * 40)
print(data.head(3))

In [None]:
# === R's summary() ===
print("\nR: summary(data) → Python: data.describe()")
print("-" * 40)
print(data.describe())

In [None]:
# === R's nrow() and ncol() ===
print("\nR: nrow(data) → Python: len(data) or data.shape[0]")
print(f"Number of rows: {len(data)}")

print("\nR: ncol(data) → Python: data.shape[1]")
print(f"Number of columns: {data.shape[1]}")

print("\nR: dim(data) → Python: data.shape")
print(f"Dimensions: {data.shape}")
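
A few more everyday inspection functions have close pandas counterparts. This is a minimal sketch on a small stand-in DataFrame (`df` is a made-up name here, not the `data` object above).

```python
import pandas as pd

df = pd.DataFrame({
    'speaker': ['Biden', 'Trump', 'Sanders'],
    'party': ['Democrat', 'Republican', 'Democrat'],
})

# R: names(df) or colnames(df)
column_names = list(df.columns)

# R: unique(df$party)
unique_parties = df['party'].unique()

# R: table(df$party)
party_table = df['party'].value_counts()

print(column_names)
print(unique_parties)
print(party_table)
```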

## 🔍 Part 4: Data Selection - The Python Way

Selecting data in Python follows the same logic as in R, with different syntax.

In [None]:
# === Selecting Columns ===

# R style: data$speaker or data[['speaker']]
# Python style:
print("Single column (like R's $):")
print(data['speaker'])
print()

# Multiple columns
# R: select(data, speaker, party)
# Python:
print("Multiple columns (like R's select):")
print(data[['speaker', 'party']])

In [None]:
# === Filtering Rows ===

# R: filter(data, party == "Democrat")
# Python:
democrats = data[data['party'] == 'Democrat']
print("Democrats only (like R's filter):")
print(democrats)
print()

# R: filter(data, speech_length > 1600)
# Python:
long_speeches = data[data['speech_length'] > 1600]
print("Long speeches (> 1600 words):")
print(long_speeches)

In [None]:
# === Multiple Conditions ===

# R: filter(data, party == "Democrat" & speech_length > 1500)
# Python: combine conditions with & (and) or | (or), as in R's filter().
# The keywords `and` / `or` raise an error on DataFrame columns.
dems_long = data[(data['party'] == 'Democrat') & (data['speech_length'] > 1500)]
print("Democrats with long speeches:")
print(dems_long)

# Note: wrap each condition in parentheses, because & binds more tightly than == and >.
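
If the bracket-and-parentheses style feels heavy, pandas offers a few alternatives that read closer to dplyr. The sketch below rebuilds a reduced copy of the dataset (named `df` for illustration) and shows `.query()`, `.isin()` as a stand-in for `%in%`, and `~` for negation.

```python
import pandas as pd

df = pd.DataFrame({
    'speaker': ['Biden', 'Trump', 'Sanders', 'Harris', 'DeSantis'],
    'party': ['Democrat', 'Republican', 'Democrat', 'Democrat', 'Republican'],
    'speech_length': [1250, 2100, 1800, 1500, 1900],
})

# R: filter(df, party == "Democrat", speech_length > 1500)
# .query() takes the condition as a string, so `and` / `or` are allowed here
dems_long_q = df.query("party == 'Democrat' and speech_length > 1500")

# R: filter(df, speaker %in% c("Biden", "Harris"))
selected = df[df['speaker'].isin(['Biden', 'Harris'])]

# R: filter(df, !(party == "Democrat"))   (~ is the element-wise NOT)
non_dems = df[~(df['party'] == 'Democrat')]

print(dems_long_q)
print(selected)
print(non_dems)
```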

## 🛠 Part 5: Data Manipulation - dplyr → pandas

In [None]:
# === Adding New Columns (mutate) ===

# R: mutate(data, words_per_minute = speech_length / 10)
# Python:
data['words_per_minute'] = data['speech_length'] / 10
print("Added words_per_minute column:")
print(data)
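
The assignment above changes `data` in place, whereas `mutate()` returns a new data frame. When you want the R behavior, `.assign()` is one option. The sketch below uses a small stand-in DataFrame and an illustrative column name (`length_k`).

```python
import pandas as pd

df = pd.DataFrame({
    'speaker': ['Biden', 'Trump'],
    'speech_length': [1250, 2100],
})

# R: df2 <- mutate(df, length_k = speech_length / 1000)
# .assign() returns a new DataFrame and leaves df unchanged
df2 = df.assign(length_k=df['speech_length'] / 1000)

print(df2)
print('length_k' in df.columns)
```

Because `.assign()` returns a DataFrame, it can also sit in the middle of a method chain.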

In [None]:
# === Group By and Summarize ===

# R: data %>% group_by(party) %>%
#      summarise(speech_length = mean(speech_length), sentiment_score = mean(sentiment_score))
# Python:
party_summary = data.groupby('party').agg({
    'speech_length': 'mean',
    'sentiment_score': 'mean'
}).round(2)

print("\nMean values by party (like R's group_by + summarise):")
print(party_summary)
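
pandas also supports a "named aggregation" form that mirrors `summarise(new_name = fun(column))` more closely. This is a sketch on a reduced copy of the dataset; the output column names `mean_length` and `n_speeches` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'party': ['Democrat', 'Republican', 'Democrat', 'Democrat', 'Republican'],
    'speech_length': [1250, 2100, 1800, 1500, 1900],
})

# R: df %>% group_by(party) %>%
#      summarise(mean_length = mean(speech_length), n_speeches = n())
summary = (
    df.groupby('party')
      .agg(mean_length=('speech_length', 'mean'),
           n_speeches=('speech_length', 'count'))  # 'count' = non-missing values
      .reset_index()   # turn the group labels back into a regular column
)

print(summary)
```

Without `.reset_index()`, `party` becomes the row index rather than a column, which often surprises R users.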

In [None]:
# === Sorting (arrange) ===

# R: arrange(data, desc(speech_length))
# Python:
sorted_data = data.sort_values('speech_length', ascending=False)
print("\nSorted by speech length (like R's arrange):")
print(sorted_data[['speaker', 'speech_length']])
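
The steps covered so far can be combined in one expression, much as you would with `%>%`. A minimal sketch, again on a reduced stand-in DataFrame: wrapping the chain in parentheses lets each method sit on its own line.

```python
import pandas as pd

df = pd.DataFrame({
    'speaker': ['Biden', 'Trump', 'Sanders', 'Harris', 'DeSantis'],
    'party': ['Democrat', 'Republican', 'Democrat', 'Democrat', 'Republican'],
    'speech_length': [1250, 2100, 1800, 1500, 1900],
})

# R: df %>%
#      filter(party == "Democrat") %>%
#      arrange(desc(speech_length)) %>%
#      select(speaker, speech_length)
result = (
    df[df['party'] == 'Democrat']
      .sort_values('speech_length', ascending=False)
      .loc[:, ['speaker', 'speech_length']]
      .reset_index(drop=True)   # renumber rows 0, 1, 2 after sorting
)

print(result)
```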

## 💬 Part 6: String Operations - stringr → re & string methods

In [None]:
# Sample text data
texts = pd.DataFrame({
    'speech': [
        "We must unite the AMERICAN people!",
        "This administration has FAILED America",
        "Healthcare is a HUMAN RIGHT for all",
        "We will build back BETTER together",
        "The woke agenda is DESTROYING our values"
    ],
    'speaker': ['Biden', 'Trump', 'Sanders', 'Harris', 'DeSantis']
})

print("Our text data:")
print(texts)

In [None]:
# === Basic String Operations ===

# R: str_to_lower()
# Python:
texts['speech_lower'] = texts['speech'].str.lower()
print("\nLowercase (like R's str_to_lower):")
print(texts[['speech_lower']].head(2))

# R: str_length()
# Python:
texts['length'] = texts['speech'].str.len()
print("\nString length (like R's str_length):")
print(texts[['speaker', 'length']])

In [None]:
# === Pattern Detection ===

# R: str_detect(speech, regex("America", ignore_case = TRUE))
# Python:
texts['mentions_america'] = texts['speech'].str.contains('America', case=False)
print("\nWho mentions America? (like R's str_detect):")
print(texts[['speaker', 'mentions_america']])
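
Other stringr verbs map onto the `.str` accessor in the same way. The sketch below uses two of the sample sentences; the punctuation pattern `[^\w\s]` is only an approximation of R's `[[:punct:]]` (it keeps underscores, for example).

```python
import pandas as pd

speeches = pd.Series([
    "We must unite the AMERICAN people!",
    "This administration has FAILED America",
])

# R: str_count(speech, "\\w+")
word_counts = speeches.str.count(r'\w+')

# R: str_replace_all(speech, "[[:punct:]]", "")   (approximate equivalent)
no_punct = speeches.str.replace(r'[^\w\s]', '', regex=True)

# R: str_detect(speech, "^We")
starts_with_we = speeches.str.contains(r'^We')

print(word_counts)
print(no_punct)
print(starts_with_we)
```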

In [None]:
# === Pattern Extraction ===

import re

# R: str_extract_all(speech, "[A-Z]{2,}")
# Python: extract ALL-CAPS words (runs of two or more uppercase letters)
def extract_caps(text):
    return re.findall(r'[A-Z]{2,}', text)

texts['emphasis_words'] = texts['speech'].apply(extract_caps)
print("\nAll-caps words used for emphasis:")
print(texts[['speaker', 'emphasis_words']])
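
The same extraction can be done without writing a helper function, using the vectorized `.str.findall()` and `.str.extract()` methods. This sketch adds `\b` word boundaries to the pattern, which is a small change from the cell above.

```python
import pandas as pd

speeches = pd.Series([
    "Healthcare is a HUMAN RIGHT for all",
    "We will build back BETTER together",
])

# R: str_extract_all(speech, "\\b[A-Z]{2,}\\b")
all_caps = speeches.str.findall(r'\b[A-Z]{2,}\b')

# R: str_extract(speech, "\\b[A-Z]{2,}\\b")   (first match only)
# .str.extract() needs a capture group, hence the parentheses
first_caps = speeches.str.extract(r'\b([A-Z]{2,})\b', expand=False)

print(all_caps)
print(first_caps)
```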

## 🎯 Part 7: Your Turn! Practice Exercises

Now let's practice with political science examples.
**Modify only the sections marked with === MODIFY THIS SECTION ===**

In [None]:
# Exercise 1: Create your own political dataset
# Task: Create a dataframe with at least 3 columns about political topics

# === MODIFY THIS SECTION ===
# Change this to your own data!
my_data = pd.DataFrame({
    'country': ['USA', 'UK', 'Canada'],
    'leader': ['Biden', 'Sunak', 'Trudeau'],
    'approval_rating': [42, 28, 31]
})
# === END MODIFICATION ===

print("Your dataset:")
print(my_data)
print("\nGreat job! You just created a Python dataframe!")

In [None]:
# Exercise 2: Filter and analyze your data
# Task: Filter for countries with approval > 30 and calculate mean

# === MODIFY THIS SECTION ===
# Change the filter condition
high_approval = my_data[my_data['approval_rating'] > 30]
# === END MODIFICATION ===

print("Filtered results:")
print(high_approval)
print(f"\nMean approval rating: {high_approval['approval_rating'].mean():.1f}%")

In [None]:
# Exercise 3: Text processing
# Task: Count words in political statements

sample_statements = pd.DataFrame({
    'statement': [
        "We will build a better future",
        "The economy needs immediate reform",
        "Climate change is our greatest challenge"
    ]
})

# === MODIFY THIS SECTION ===
# Add a column counting words (hint: split and count)
sample_statements['word_count'] = sample_statements['statement'].apply(lambda x: len(x.split()))
# === END MODIFICATION ===

print("Word counts:")
print(sample_statements)

## 📊 Part 8: Quick Visualization

Python has plotting libraries too. We'll use plotly express, where you map DataFrame columns to `x`, `y`, and `color` much as you map aesthetics in ggplot2.

In [None]:
import plotly.express as px

# Create a simple bar chart
# R: ggplot(data, aes(x=speaker, y=speech_length, fill=party)) + geom_col()
# Python with plotly:

fig = px.bar(data, 
             x='speaker', 
             y='speech_length',
             color='party',
             title='Speech Length by Speaker',
             color_discrete_map={'Democrat': 'blue', 'Republican': 'red'})

fig.show()

print("📊 This is like ggplot2 in R! Same concept, slightly different syntax.")

## 🎓 Part 9: Functions - Almost Identical to R!

In [None]:
# R function:
# calculate_readability <- function(text) {
#   words <- str_count(text, "\\w+")
#   sentences <- str_count(text, "[.!?]")
#   return(words / sentences)
# }

# Python function (notice: no curly braces, just indentation!)
def calculate_readability(text):
    """Calculate simple readability score"""
    words = len(text.split())
    sentences = len(re.findall(r'[.!?]', text)) or 1  # Avoid division by zero
    return words / sentences

# Test it
test_text = "This is important. Is it clear? We must act now!"
score = calculate_readability(test_text)
print(f"Readability score: {score:.1f} words per sentence")
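
Once a function works on a single string, you will usually want to run it over a whole column, as `sapply()` or `purrr::map_dbl()` would in R. A minimal sketch using `.apply()`; `words_per_sentence` is a hypothetical helper written for this example, and it counts words with `\w+` as the R version above does.

```python
import re
import pandas as pd

def words_per_sentence(text):
    """Hypothetical helper: words divided by sentence-ending marks."""
    words = len(re.findall(r'\w+', text))
    sentences = len(re.findall(r'[.!?]', text)) or 1  # avoid division by zero
    return words / sentences

speeches = pd.Series([
    "This is important. Is it clear? We must act now!",
    "No punctuation here",
])

# R: sapply(speeches, words_per_sentence)
scores = speeches.apply(words_per_sentence)

print(scores)
```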

In [None]:
# === YOUR TURN: Create a function ===
# Task: Create a function that counts political keywords

# === MODIFY THIS SECTION ===
def count_political_keywords(text):
    """Count political keywords in text"""
    # Add your keywords
    keywords = ['democracy', 'freedom', 'rights', 'justice']
    
    # Convert to lowercase for matching
    text_lower = text.lower()
    
    # Count how many keywords appear at least once (substring match, so 'rights' also matches 'copyrights')
    count = sum(1 for word in keywords if word in text_lower)
    return count
# === END MODIFICATION ===

# Test your function
test_speech = "Democracy and freedom are fundamental rights in our justice system"
keyword_count = count_political_keywords(test_speech)
print(f"Found {keyword_count} political keywords")
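
The function above checks whether each keyword appears anywhere in the text, so repeated mentions count once and partial matches count too. If you want whole-word occurrence counts instead, one possible variant is sketched below; `count_keyword_occurrences` is a hypothetical name, and the test sentence is made up.

```python
import re

def count_keyword_occurrences(text, keywords):
    """Hypothetical variant: count whole-word occurrences of each keyword."""
    tokens = re.findall(r'\w+', text.lower())   # split into lowercase words
    return sum(tokens.count(kw) for kw in keywords)

keywords = ['democracy', 'freedom', 'rights', 'justice']
speech = "Freedom, freedom, freedom! Copyrights are not human rights."
total = count_keyword_occurrences(speech, keywords)

print(f"Found {total} keyword occurrences")
```

Here "freedom" contributes three and "rights" one; "Copyrights" is not counted because tokens are compared as whole words.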

## 🌟 Part 10: Key Differences to Remember

### 1. **Indexing starts at 0, not 1**

In [None]:
my_list = ['first', 'second', 'third']
print("R: my_list[1] = 'first'")
print(f"Python: my_list[0] = '{my_list[0]}'")
print(f"Python: my_list[1] = '{my_list[1]}'")
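
Negative indices are a related trap: in R, `my_list[-1]` drops the first element, while in Python `-1` means "the last element". A short sketch:

```python
my_list = ['first', 'second', 'third']

# Python: -1 counts from the end
last_item = my_list[-1]

# R: my_list[-1] drops the first element; in Python, use a slice instead
without_first = my_list[1:]

# R: length(my_list)
n_items = len(my_list)

print(last_item)
print(without_first)
print(n_items)
```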

### 2. **Indentation matters (no curly braces)**

In [None]:
# R uses curly braces:
# if (x > 5) {
#   print("big")
# } else {
#   print("small")
# }

# Python uses indentation:
x = 10
if x > 5:
    print("big")       # Must be indented!
else:
    print("small")     # Same indentation level as the line under if

### 3. **`:` slices in Python (in R it builds sequences)**

In [None]:
numbers = [0, 1, 2, 3, 4, 5]
print("R: numbers[2:4] returns the elements at positions 2, 3, 4 (1-based, inclusive)")
print(f"Python: numbers[2:4] = {numbers[2:4]}  (positions 2 and 3, 0-based)")
print("Note: Python's upper bound is exclusive!")
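
The same slicing rules apply to DataFrame rows through `.iloc`, with one exception worth knowing: `.loc` slices by label and includes the upper bound. A minimal sketch on a stand-in DataFrame with the default 0, 1, 2, ... row labels:

```python
import pandas as pd

df = pd.DataFrame({
    'speaker': ['Biden', 'Trump', 'Sanders', 'Harris'],
    'speech_length': [1250, 2100, 1800, 1500],
})

# R: df[2:3, ]   (rows 2 and 3, 1-based, inclusive)
# Python: positions 1 and 2 (0-based, upper bound exclusive)
middle_rows = df.iloc[1:3]

# R: df[1:2, "speaker"]
first_two_speakers = df.iloc[0:2]['speaker']

# .loc slices by row label and includes the upper bound
by_label = df.loc[1:2, 'speaker']

print(middle_rows)
print(first_two_speakers)
print(by_label)
```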

## ✅ Congratulations! You've Completed the R-to-Python Translation Guide!

### 📝 What You've Learned:
- ✅ pandas DataFrames fill the same role as R data frames
- ✅ Your R logic translates directly to Python
- ✅ The syntax is different but the concepts are the same
- ✅ Most of what you do in R has a direct Python equivalent

### 🎯 Next Steps:
1. Save this notebook for reference
2. Try modifying the exercises with your own data
3. Move on to Notebook 2: Python Essentials for Text Analysis

### 💡 Remember:
- **You're not starting from scratch** - your R knowledge is valuable!
- **Python is a tool** - focus on your research, not becoming a programmer
- **Help is always available** - post questions in Slack!

### 📚 Quick Reference Card:
```
R → Python Cheat Sheet:
-----------------------
library(pkg) → import pkg
df$col → df['col']
filter() → df[condition]
mutate() → df['new_col'] = ...
group_by() → df.groupby()
%>% → . (method chaining)
is.na() → .isna()
!is.na() → .notna()
```
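
The last two rows of the card deal with missing values, which were not demonstrated earlier. A short sketch with made-up approval numbers, one of them missing (`polls` is an illustrative name):

```python
import numpy as np
import pandas as pd

polls = pd.DataFrame({
    'country': ['USA', 'UK', 'Canada'],
    'approval_rating': [42.0, np.nan, 31.0],   # made-up values, one missing
})

# R: is.na(polls$approval_rating)
missing = polls['approval_rating'].isna()

# R: filter(polls, !is.na(approval_rating))
complete = polls[polls['approval_rating'].notna()]

# R: mean(polls$approval_rating, na.rm = TRUE)
# pandas skips missing values by default
mean_rating = polls['approval_rating'].mean()

print(missing)
print(complete)
print(mean_rating)
```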

**🚀 Ready for the next notebook? You've got this!**