# R Programming - Unit III: Data Frames & Control Statements
### Prof. Anjit Raja R  
Welcome to **Unit III – Data Frames and Control Statements**. This notebook includes concept notes, runnable R code examples, exercises, and lab questions aligned with the syllabus for M.Sc. DSBA (2025–2027).


## Learning Outcomes
By the end of this unit, students will be able to:
1. Create and manipulate data frames in R.
2. Merge, subset, and transform data frames.
3. Work with factors and tables; understand levels and ordering.
4. Implement control statements (if, for, while) effectively.
5. Write functions (including recursive and replacement functions) and use environment/scope concepts.


## Quick Diagram: Data Frame Workflow (textual)
```
Raw Data (CSV/Excel) --> read.csv()/read.table() --> data.frame()
       |                                           |
   Cleaning (NA, types) ----------------------> Transformation (mutate, subset)
       |                                           |
   Merge/Join (merge(), dplyr::left_join()) --> Analysis/Modeling
```
*Tip:* Use `str()`, `glimpse()` (dplyr), and `summary()` to inspect data frames.
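
The first stages of the workflow above (read, inspect, clean) can be sketched roughly as follows. This is only an illustration under stated assumptions: `raw_df`, `csv_path`, `loaded_df`, and `clean_df` are hypothetical names, the data is made up, and dropping incomplete rows is just one of several ways to handle `NA` values.

```r
# Hypothetical sample data; the NA stands in for a missing mark.
raw_df <- data.frame(ID = 1:4,
                     Dept = c('CSE', 'ECE', 'CSE', 'ME'),
                     Marks = c(78, NA, 90, 69),
                     stringsAsFactors = FALSE)

# Round-trip through a temporary CSV file so no file is left in the working directory.
csv_path <- tempfile(fileext = '.csv')
write.csv(raw_df, csv_path, row.names = FALSE)
loaded_df <- read.csv(csv_path, stringsAsFactors = FALSE)

# Inspect: column types, then the number of missing values per column.
str(loaded_df)
colSums(is.na(loaded_df))

# Clean: keep only the rows that have no missing values.
clean_df <- loaded_df[complete.cases(loaded_df), ]
nrow(clean_df)

# Remove the temporary file.
unlink(csv_path)
```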


In [ ]:
# --- Creating Data Frames ---
id <- 1:6
name <- c('Amit','Bala','Chitra','Dev','Esha','Faisal')
age <- c(21,22,20,23,21,22)
dept <- c('CSE','ECE','CSE','ME','CSE','ECE')
marks <- c(78,85,90,69,82,76)
students_df <- data.frame(ID=id, Name=name, Age=age, Dept=dept, Marks=marks, stringsAsFactors=FALSE)
print('Students Data Frame:')
print(students_df)

# Inspect structure and summary
str(students_df)
summary(students_df)
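
Once a data frame exists, new columns can be derived from existing ones. The sketch below uses a small hypothetical data frame (`demo_df`, with made-up names and marks) so that it does not alter `students_df`; the column names `Bonus` and `AboveMean` are illustrative assumptions.

```r
# Hypothetical mini data frame, kept separate from students_df.
demo_df <- data.frame(Name = c('Asha', 'Bharat', 'Charu'),
                      Marks = c(78, 85, 90),
                      stringsAsFactors = FALSE)

# Add a derived column by assigning to a new column name.
demo_df$Bonus <- demo_df$Marks + 5

# transform() returns a modified copy, so the result must be assigned back.
demo_df <- transform(demo_df, AboveMean = Marks > mean(Marks))

# Basic shape and column access.
dim(demo_df)          # number of rows, number of columns
names(demo_df)        # column names
demo_df[['Marks']]    # same values as demo_df$Marks
```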


In [ ]:
# --- Reading/Writing CSV (examples) ---
# write.csv(students_df, 'students_sample.csv', row.names=FALSE)
# df <- read.csv('students_sample.csv')
## (In Colab, upload or download files via the Files sidebar, or mount Google Drive.)

# --- Subsetting and Selecting Columns ---
students_df[students_df$Marks > 80, ]          # rows with Marks > 80
students_df[, c('Name','Marks')]               # all rows, only the Name and Marks columns
subset(students_df, Dept=='CSE' & Marks>75)    # CSE students with Marks > 75
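
A few further subsetting idioms, sketched on a hypothetical data frame (`demo_df`, with made-up values) rather than on `students_df`. Note in particular that selecting a single column with `[ , 'col']` returns a plain vector unless `drop = FALSE` is given.

```r
# Hypothetical data, used only for this illustration.
demo_df <- data.frame(Name = c('Asha', 'Bharat', 'Charu', 'Dinesh'),
                      Dept = c('CSE', 'ECE', 'ME', 'CSE'),
                      Marks = c(78, 85, 90, 69),
                      stringsAsFactors = FALSE)

# Rows whose Dept matches any of several values.
in_depts <- demo_df[demo_df$Dept %in% c('CSE', 'ME'), ]

# A single column comes back as a vector by default...
marks_vec <- demo_df[, 'Marks']
# ...and stays a one-column data frame with drop = FALSE.
marks_col <- demo_df[, 'Marks', drop = FALSE]

# subset() can filter rows and choose columns in one call.
cse_names <- subset(demo_df, Dept == 'CSE', select = c(Name, Marks))
```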


In [ ]:
# --- Merging Data Frames ---
exam_df <- data.frame(ID=c(2,4,6,7), ExamScore=c(88,79,85,90))
print('Exam Data:')
print(exam_df)

# Inner join (only matching IDs)
merged_inner <- merge(students_df, exam_df, by='ID')
print('Inner Merge:')
print(merged_inner)

# Left join (keep all students)
merged_left <- merge(students_df, exam_df, by='ID', all.x=TRUE)
print('Left Merge (with NAs for missing exam scores):')
print(merged_left)
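
The same `merge()` call also covers right and full outer joins through its `all.y` and `all` arguments. A minimal sketch, using two hypothetical data frames (`left_df` and `right_df`) with made-up values:

```r
# Hypothetical inputs: ID 1 appears only on the left, ID 4 only on the right.
left_df  <- data.frame(ID = c(1, 2, 3),
                       Name = c('Asha', 'Bharat', 'Charu'),
                       stringsAsFactors = FALSE)
right_df <- data.frame(ID = c(2, 3, 4),
                       ExamScore = c(88, 79, 90))

# Right join: keep every row of right_df; Name is NA for ID 4.
merged_right <- merge(left_df, right_df, by = 'ID', all.y = TRUE)

# Full outer join: keep every ID from both sides.
merged_full <- merge(left_df, right_df, by = 'ID', all = TRUE)

print(merged_right)
print(merged_full)
```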


In [ ]:
# --- Factors and Tables ---
dept_factor <- factor(students_df$Dept)
dept_factor
levels(dept_factor)
table(students_df$Dept)

# Ordered factor example (performance level)
perf <- factor(c('low','high','medium','high','low'), levels=c('low','medium','high'), ordered=TRUE)
perf
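
The practical difference an ordered factor makes is that its levels can be compared. The sketch below rebuilds the same five values under a hypothetical name (`perf_demo`) so that it runs on its own.

```r
# Same values as the example above, under a separate (hypothetical) name.
perf_demo <- factor(c('low', 'high', 'medium', 'high', 'low'),
                    levels = c('low', 'medium', 'high'), ordered = TRUE)

# Comparisons follow the level order, not alphabetical order.
at_least_medium <- perf_demo >= 'medium'
at_least_medium            # FALSE TRUE TRUE TRUE FALSE

# table() reports counts in level order: low, medium, high.
table(perf_demo)

# The underlying integer codes are positions in levels().
as.integer(perf_demo)      # 1 3 2 3 1

# For an unordered factor, levels default to sorted unique values.
dept_demo <- factor(c('ECE', 'CSE', 'ECE'))
levels(dept_demo)          # "CSE" "ECE"
```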


In [ ]:
# --- Control Statements: if, for, while ---
# if / else example
x <- 75
if (x >= 90) {
  print('Grade A')
} else if (x >= 75) {
  print('Grade B')
} else {
  print('Grade C or below')
}

# for loop example: print names
for (nm in students_df$Name) {
  cat('Student:', nm, '\n')
}

# while loop example: add marks until the running total exceeds 100 (or students run out)
s <- 0
i <- 1
while (s <= 100 && i <= nrow(students_df)) {
  s <- s + students_df$Marks[i]
  i <- i + 1
}
cat('Sum reached:', s, 'after', i-1, 'students\n')
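
Loops are often combined with `next` (skip to the next iteration) and `break` (leave the loop early). A small sketch, assuming a hypothetical vector `marks_demo` and an illustrative cut-off of 75:

```r
# Hypothetical marks, used only for this illustration.
marks_demo <- c(78, 85, 90, 69, 82)

# seq_along() gives the index positions and is safe even for an empty vector.
printed <- 0
for (i in seq_along(marks_demo)) {
  if (marks_demo[i] < 75) next          # skip marks below the cut-off
  cat('Position', i, 'has mark', marks_demo[i], '\n')
  printed <- printed + 1
}

# break stops the loop as soon as the first mark of 90 or more is found.
first_high <- NA
for (m in marks_demo) {
  if (m >= 90) {
    first_high <- m
    break
  }
}
first_high
```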

In [ ]:
# --- Functions: Definition, Return, Default Args ---
grade_calc <- function(mark) {
  if (mark >= 90) return('A')
  else if (mark >= 75) return('B')
  else if (mark >= 60) return('C')
  else return('D')
}

# grade_calc() handles one mark at a time, so apply it to each element
sapply(students_df$Marks, grade_calc)

# Function with default argument
greet <- function(name='Student') {
  paste('Hello', name)
}
greet('Anjit')
greet()
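
Functions also introduce their own scope. The sketch below, with hypothetical names (`counter`, `bump_local`, `make_adder`), illustrates two points: an ordinary assignment inside a function creates a local variable and leaves the global one untouched, and a function defined inside another function can still see the enclosing function's arguments (lexical scoping).

```r
counter <- 10    # variable in the global environment

bump_local <- function() {
  # The right-hand side reads the global counter (10);
  # the assignment then creates a separate local variable.
  counter <- counter + 1
  counter
}

result <- bump_local()
result     # 11
counter    # still 10: the global value was not modified

# Lexical scoping: the inner function remembers k from where it was created.
make_adder <- function(k) {
  function(x) x + k
}
add5 <- make_adder(5)
add5(1)    # 6
```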


In [ ]:
# --- Recursion Example: Factorial ---
fact <- function(n) {
  if (n <= 1) return(1)
  else return(n * fact(n-1))
}
fact(5)

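# Replacement function example: append a suffix such as '!' to names.
# A replacement function is named `fname<-`, receives the new value in an
# argument called `value`, and is called as fname(x) <- value.
`add_exclaim<-` <- function(x, value) {
  paste0(x, value)   # must return the modified object
}
add_exclaim(students_df$Name) <- '!'
students_df$Name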
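
In R, a replacement function is one whose name ends in `<-`, whose last argument is named `value`, and which is called on the left-hand side of an assignment. As a further sketch of that form, the hypothetical function `nth<-` below replaces the element at a given position; the name and the sample vector `scores` are assumptions made for illustration.

```r
# Hypothetical replacement function: set the n-th element of a vector.
`nth<-` <- function(x, n, value) {
  x[n] <- value
  x                  # return the whole modified object
}

scores <- c(78, 85, 90)
nth(scores, 2) <- 88   # R rewrites this as: scores <- `nth<-`(scores, 2, value = 88)
scores                 # 78 88 90
```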


## Mini Exercises
1. Read a CSV of your choice (or use `students_df`) and create a new column `Pass` (TRUE if Marks>=75).
2. Merge two data frames where keys have different names (use `by.x` and `by.y`).
3. Create an ordered factor for `Marks` with levels: Low (<65), Medium (65-80), High (>80).
4. Write a function that returns the top 2 students by marks from a data frame.


## Lab Questions (Assignment)
1. Given a dataset of sales (store, product, month, revenue), perform exploratory data analysis: summary stats, top products, and month-wise trends.
2. Implement data cleaning steps: detect and handle missing values, fix incorrect data types, and handle duplicates.
3. Demonstrate environment and scope: create a function that modifies a variable in the parent environment using `<<-` and explain why it should be used with caution.


## Summary & Key Functions
- `data.frame(), read.csv(), merge(), subset(), factor(), table(), str(), summary()`
- Control flow: `if`, `for`, `while`. Prefer vectorized operations where possible; they are usually shorter and faster than explicit loops.
- Functions: default args, returning values, recursion, replacement functions.
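
To illustrate the point about vectorized operations, the sketch below computes the same quantities with a loop and with vectorized calls. The vector `marks_demo` is a hypothetical copy of the marks used earlier in this notebook, and the thresholds (80 and 100) are illustrative.

```r
marks_demo <- c(78, 85, 90, 69, 82, 76)

# Loop version: accumulate a total one element at a time.
total_loop <- 0
for (m in marks_demo) {
  total_loop <- total_loop + m
}

# Vectorized version: one call does the same work.
total_vec <- sum(marks_demo)

# Comparisons are vectorized too; summing a logical vector counts the TRUEs.
n_high <- sum(marks_demo > 80)

# How many marks are needed before the running total exceeds 100?
n_needed <- which(cumsum(marks_demo) > 100)[1]

c(total_loop = total_loop, total_vec = total_vec,
  n_high = n_high, n_needed = n_needed)
```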


In [ ]:
cat('\n✅ Unit III Completed: Data Frames & Control Statements - Notes, Code & Exercises included!')