
# R Programming Complete Tutorial in Hinglish

## Table of Contents
1. [R Kya Hai?](#r-kya-hai)
2. [Installation](#installation)
3. [Basic Syntax](#basic-syntax)
4. [Data Types](#data-types)
5. [Variables aur Operations](#variables-aur-operations)
6. [Data Structures](#data-structures)
7. [Control Structures](#control-structures)
8. [Functions](#functions)
9. [Data Import/Export](#data-import-export)
10. [Data Manipulation](#data-manipulation)
11. [Data Visualization](#data-visualization)
12. [Statistical Analysis](#statistical-analysis)
13. [Machine Learning](#machine-learning)
14. [Advanced Topics](#advanced-topics)
15. [Best Practices](#best-practices)

## R Kya Hai?

R ek **statistical programming language** hai jo data analysis, statistical computing, aur graphics ke liye banai gayi hai. Ye open-source hai aur data science mein bahut popular hai.

**Key Features:**
- Statistical analysis ke liye specially designed
- Powerful data visualization capabilities
- Huge library ecosystem (CRAN packages)
- Cross-platform support
- Strong community support
- Free aur open-source

**R ke Uses:**
- Data Analysis & Statistics
- Machine Learning
- Data Visualization
- Bioinformatics
- Financial Analysis
- Research & Academia

## Installation

### R Install karna:
1. **R Software**: https://cran.r-project.org/ se download karo
2. **RStudio** (Recommended IDE): https://posit.co/download/rstudio-desktop/

### Installation verify karna:
```r
# R version check
version
# ya
R.version.string

# Working directory check
getwd()
```

## Basic Syntax

### Comment karna:
```r
# Ye ek single line comment hai
# R mein sirf single line comments hote hain

# Multi-line comments ke liye har line mein # lagana padta hai
# Jaise ki yahan kiya hai
# Ye multi-line comment ka example hai
```

### Basic Commands:
```r
# Simple calculations
2 + 3        # Addition
10 - 4       # Subtraction
5 * 6        # Multiplication
20 / 4       # Division
2^3          # Power (2 ka cube)
10 %% 3      # Modulus (remainder)

# Help lena
help(mean)   # Function ke bare mein help
?mean        # Same as help
??regression # Topic search

# Objects list karna
ls()         # Current environment ke objects

# Environment clean karna
x <- 5       # Example object banaya
rm(x)        # Specific object remove
rm(list=ls()) # Saare objects remove
```

## Data Types

### Numeric Types:
```r
# Integer
age <- 25L          # L lagane se integer hota hai
class(age)          # "integer"

# Numeric/Double
height <- 5.8
class(height)       # "numeric"

# Complex numbers
complex_num <- 3 + 4i
class(complex_num)  # "complex"
```
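`class()` aur `typeof()` ka fark ek chhote sketch se samajh sakte hain. Ye sirf base R use karta hai; `x_int` aur `x_dbl` illustration ke liye banaye gaye naam hain.

```r
# Integer vs double: class() high-level type batata hai, typeof() internal storage
x_int <- 25L
x_dbl <- 25

typeof(x_int)            # "integer"
typeof(x_dbl)            # "double"
class(x_dbl)             # "numeric"

# Value same hai, lekin storage type alag hai
x_int == x_dbl           # TRUE
identical(x_int, x_dbl)  # FALSE

# Integer ki maximum value fixed hoti hai
.Machine$integer.max     # 2147483647
```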

### Character (String):
```r
name <- "Rahul Kumar"
city <- 'Delhi'
class(name)         # "character"

# String operations
nchar(name)         # String length
toupper(name)       # Uppercase
tolower(name)       # Lowercase
substr(name, 1, 5)  # Substring
paste("Hello", "World", sep=" ")  # String concatenation
```
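Text aur numbers ko ek saath format karne ke liye `paste0()` aur `sprintf()` kaafi kaam aate hain. Ye ek minimal sketch hai; `student` aur `score` sirf example values hain.

```r
student <- "Rahul"
score <- 87.456

# paste0() bina separator ke jodta hai, aur vectors par element-wise kaam karta hai
paste0("Roll-", 1:3)                        # "Roll-1" "Roll-2" "Roll-3"

# sprintf() C-style format codes use karta hai: %s string, %.1f = 1 decimal, %d integer
sprintf("%s ne %.1f marks score kiye", student, score)  # "Rahul ne 87.5 marks score kiye"
sprintf("%03d", 7L)                         # "007" (zero padding)

# paste() ka collapse argument poore vector ko ek string bana deta hai
paste(c("a", "b", "c"), collapse = "-")     # "a-b-c"
```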

### Logical (Boolean):
```r
is_student <- TRUE
is_working <- FALSE
class(is_student)   # "logical"

# Logical operations
TRUE & FALSE        # AND
TRUE | FALSE        # OR
!TRUE              # NOT
```

### Type checking aur conversion:
```r
# Type check karna
is.numeric(25)      # TRUE
is.character("abc") # TRUE
is.logical(TRUE)    # TRUE

# Type conversion
as.numeric("123")   # String to number
as.character(123)   # Number to string
as.logical(1)       # Number to logical (0 = FALSE, koi bhi non-zero = TRUE)
```
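Type conversion hamesha successful nahi hota. Neeche ka sketch (sirf base R) dikhata hai ki fail hone par `NA` milta hai aur `as.integer()` round karne ki jagah decimal part truncate karta hai; `bad` ek example variable hai.

```r
# Conversion fail ho to NA milta hai (warning ke saath, yahan warning suppress ki hai)
bad <- suppressWarnings(as.numeric("abc"))
bad                     # NA
is.na(bad)              # TRUE

# as.integer() round nahi karta, decimal part hata deta hai
as.integer(3.9)         # 3
as.integer(-3.9)        # -3

# Character se logical: sirf kuch specific spellings recognize hoti hain
as.logical(c("TRUE", "true", "T", "yes"))  # TRUE TRUE TRUE NA
```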

## Variables aur Operations

### Variable assignment:
```r
# Assignment operators
x <- 10              # Preferred method
y = 20               # Alternative method
25 -> z              # Right assignment (rarely used)

# Multiple assignments
p <- q <- r <- 5   # Teeno variables ki value 5 (variable ka naam c mat rakho, ye c() function ka naam hai)

# Variable naming rules
student_name <- "Priya"    # Valid
student.age <- 20          # Valid (dot allowed)
# 2student <- "Invalid"    # Invalid (can't start with number)
# student-name <- "Invalid" # Invalid (hyphen not allowed)
```

### Arithmetic Operations:
```r
# Basic operations
a <- 10
b <- 3

addition <- a + b        # 13
subtraction <- a - b     # 7
multiplication <- a * b  # 30
division <- a / b        # 3.333333
power <- a ^ b          # 1000
modulus <- a %% b       # 1 (remainder)
integer_division <- a %/% b  # 3

# Math functions
sqrt(16)        # Square root: 4
abs(-5)         # Absolute value: 5
round(3.14159, 2)  # Round to 2 decimals: 3.14
ceiling(3.2)    # Round up: 4
floor(3.8)      # Round down: 3
log(10)         # Natural logarithm
log10(100)      # Base 10 logarithm: 2
```

### Comparison Operations:
```r
# Comparison operators
10 == 10    # Equal to: TRUE
10 != 5     # Not equal to: TRUE
10 > 5      # Greater than: TRUE
10 < 5      # Less than: FALSE
10 >= 10    # Greater than or equal: TRUE
10 <= 5     # Less than or equal: FALSE
```

### Logical Operations:
```r
# Logical operators
TRUE & TRUE     # AND: TRUE
TRUE | FALSE    # OR: TRUE
!TRUE          # NOT: FALSE

# Element-wise logical operations on vectors
c(TRUE, FALSE) & c(TRUE, TRUE)   # c(TRUE, FALSE)
c(TRUE, FALSE) | c(FALSE, FALSE) # c(TRUE, FALSE)
```
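`&` aur `|` ke alawa R mein `&&` aur `||` bhi hain, jo sirf single `TRUE`/`FALSE` value par kaam karte hain aur short-circuit karte hain: agar pehle part se result pata chal jaye to doosra part evaluate hi nahi hota. `if()` conditions ke liye yahi preferred hain (R 4.3.0 se length > 1 vector dene par error aata hai). Ye sketch base R hai; `x` aur `v` example values hain.

```r
x <- 5

# && / || scalar conditions ke liye, short-circuit evaluation ke saath
is.numeric(x) && x > 0                    # TRUE
FALSE && stop("ye kabhi run nahi hoga")   # FALSE, stop() evaluate hi nahi hua
TRUE || stop("ye bhi run nahi hoga")      # TRUE

# Logical vector ko ek TRUE/FALSE mein summarize karne ke liye any() / all()
v <- c(TRUE, FALSE, TRUE)
any(v)                                    # TRUE
all(v)                                    # FALSE

# Exclusive OR: sirf ek TRUE ho tabhi TRUE
xor(TRUE, FALSE)                          # TRUE
```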

## Data Structures

### 1. Vectors (Most important):
```r
# Numeric vector
numbers <- c(1, 2, 3, 4, 5)
ages <- c(25, 30, 35, 40)

# Character vector
names <- c("Amit", "Priya", "Raj", "Sita")
cities <- c("Delhi", "Mumbai", "Chennai", "Kolkata")

# Logical vector
results <- c(TRUE, FALSE, TRUE, TRUE)

# Mixed type vector (sab kuch character ban jayega)
mixed <- c(1, "hello", TRUE)  # "1" "hello" "TRUE"

# Vector operations
length(numbers)     # Vector ka size: 5
numbers[1]         # First element: 1
numbers[c(1,3,5)]  # Multiple elements: 1 3 5
numbers[2:4]       # Range: 2 3 4
numbers[-1]        # First element chhodkar: 2 3 4 5

# Vector arithmetic
v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6)
v1 + v2           # c(5, 7, 9)
v1 * 2            # c(2, 4, 6)

# Useful vector functions
sum(numbers)       # Total: 15
mean(numbers)      # Average: 3
min(numbers)       # Minimum: 1
max(numbers)       # Maximum: 5
sort(numbers)      # Sorted vector
unique(c(1,1,2,3,2)) # Unique values: 1 2 3
```
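Vectors ke saath roz kaam aane wale kuch aur tools hain: logical indexing, `which()`, `seq()`, `rep()` aur recycling (chhota vector automatically repeat hota hai). Ye base R ka chhota sketch upar wale `numbers` vector jaisa data use karta hai.

```r
numbers <- c(1, 2, 3, 4, 5)

# Logical indexing: condition se filter
numbers[numbers > 2]          # 3 4 5
which(numbers %% 2 == 0)      # Even values ki positions: 2 4

# Sequences aur repetition
seq(0, 10, by = 2)            # 0 2 4 6 8 10
rep(c("A", "B"), times = 2)   # "A" "B" "A" "B"
rep(c("A", "B"), each = 2)    # "A" "A" "B" "B"

# Recycling: length 2 wala vector length 4 tak repeat hota hai
c(1, 2, 3, 4) + c(10, 20)     # 11 22 13 24
```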

### 2. Lists (Different types store kar sakte hain):
```r
# Simple list
student <- list(
  name = "Rahul",
  age = 25,
  marks = c(85, 90, 78),
  passed = TRUE
)

# List elements access karna
student$name           # "Rahul"
student[["age"]]       # 25
student[[3]]           # c(85, 90, 78)

# List mein add karna
student$city <- "Delhi"

# Nested list
company <- list(
  name = "TechCorp",
  employees = list(
    names = c("A", "B", "C"),
    salaries = c(50000, 60000, 55000)
  ),
  founded = 2020
)

# Nested access
company$employees$names
company[["employees"]][["salaries"]]
```
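List mein `[ ]` aur `[[ ]]` ka fark beginners ke liye aksar confusing hota hai: `[ ]` ek sub-list deta hai, jabki `[[ ]]` aur `$` andar ka actual element. Ye sketch upar wale `student` list ka chhota version use karta hai.

```r
student <- list(name = "Rahul", age = 25, marks = c(85, 90, 78))

# [ ] hamesha list return karta hai (sub-list)
class(student["age"])      # "list"
# [[ ]] aur $ andar ka actual element return karte hain
class(student[["age"]])    # "numeric"

# Element ke andar bhi index kar sakte hain
student$marks[2]           # 90

# NULL assign karne se element remove ho jata hai
student$age <- NULL
names(student)             # "name" "marks"
length(student)            # 2
```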

### 3. Matrices (2D arrays):
```r
# Matrix create karna
mat1 <- matrix(1:12, nrow=3, ncol=4)
mat1
#      [,1] [,2] [,3] [,4]
# [1,]    1    4    7   10
# [2,]    2    5    8   11
# [3,]    3    6    9   12

# By row fill karna
mat2 <- matrix(1:12, nrow=3, ncol=4, byrow=TRUE)
mat2
#      [,1] [,2] [,3] [,4]
# [1,]    1    2    3    4
# [2,]    5    6    7    8
# [3,]    9   10   11   12

# Matrix properties
nrow(mat1)         # Number of rows: 3
ncol(mat1)         # Number of columns: 4
dim(mat1)          # Dimensions: 3 4

# Matrix indexing
mat1[2, 3]         # Row 2, Column 3: 8
mat1[1, ]          # Poori first row: 1 4 7 10
mat1[, 2]          # Poora second column: 4 5 6
mat1[1:2, 2:3]     # Submatrix

# Matrix operations
mat_a <- matrix(1:4, nrow=2)
mat_b <- matrix(5:8, nrow=2)

mat_a + mat_b      # Element-wise addition
mat_a * mat_b      # Element-wise multiplication
mat_a %*% mat_b    # Matrix multiplication
t(mat_a)           # Transpose
```
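Element-wise multiplication (`*`) aur matrix multiplication (`%*%`) ke results alag hote hain. Ye sketch upar wale `mat_a` aur `mat_b` ke actual results dikhata hai, aur `solve()` se inverse nikalta hai; `inv_a` ek example naam hai.

```r
mat_a <- matrix(1:4, nrow = 2)   # [1 3; 2 4]  (column-wise fill)
mat_b <- matrix(5:8, nrow = 2)   # [5 7; 6 8]

# Element-wise: same position ke elements multiply
mat_a * mat_b
#      [,1] [,2]
# [1,]    5   21
# [2,]   12   32

# Matrix multiplication: row x column ka dot product
mat_a %*% mat_b
#      [,1] [,2]
# [1,]   23   31
# [2,]   34   46

# Inverse: determinant non-zero hai (-2), isliye solve() kaam karega
det(mat_a)                       # -2
inv_a <- solve(mat_a)
round(inv_a %*% mat_a, 10)       # 2x2 identity matrix
```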

### 4. Data Frames (Most important for data analysis):
```r
# Data frame create karna
students_df <- data.frame(
  name = c("Amit", "Priya", "Raj", "Sita"),
  age = c(20, 22, 19, 21),
  marks = c(85, 92, 78, 88),
  city = c("Delhi", "Mumbai", "Chennai", "Kolkata"),
  passed = c(TRUE, TRUE, TRUE, TRUE)
)

# Data frame inspect karna
head(students_df)      # First 6 rows
tail(students_df)      # Last 6 rows
str(students_df)       # Structure
summary(students_df)   # Statistical summary
names(students_df)     # Column names
rownames(students_df)  # Row names

# Data frame indexing
students_df$name              # Column access
students_df[["marks"]]        # Alternative column access
students_df[1, ]              # First row
students_df[, 2]              # Second column
students_df[1:2, c("name", "marks")]  # Specific rows & columns

# Filtering
students_df[students_df$marks > 85, ]
students_df[students_df$city == "Delhi", ]
high_performers <- students_df[students_df$marks > 80 & students_df$age < 22, ]

# Adding columns
students_df$grade <- ifelse(students_df$marks >= 85, "A", "B")
students_df$total_score <- students_df$marks * 1.1

# Adding rows
new_student <- data.frame(
  name = "Neha",
  age = 20,
  marks = 90,
  city = "Pune",
  passed = TRUE,
  grade = "A",
  total_score = 99
)
students_df <- rbind(students_df, new_student)
```
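Base R ke `subset()` aur `with()` se filtering karte waqt bar-bar `students_df$` likhne ki zarurat nahi padti. Ye ek self-contained sketch hai; `demo_df` aur `top` hypothetical naam hain, taaki upar wala `students_df` change na ho.

```r
demo_df <- data.frame(
  name  = c("Amit", "Priya", "Raj", "Sita"),
  age   = c(20, 22, 19, 21),
  marks = c(85, 92, 78, 88)
)

# subset(): rows filter + columns select, column names seedhe likh sakte hain
top <- subset(demo_df, marks > 85, select = c(name, marks))
top                                          # Priya 92, Sita 88

# with(): expression ko data frame ke columns ke context mein evaluate karta hai
with(demo_df, mean(marks[age >= 20]))        # 88.33333

dim(demo_df)                                 # 4 3
```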

### 5. Arrays (Multi-dimensional):
```r
# 3D array
arr <- array(1:24, dim = c(3, 4, 2))
arr

# Array indexing
arr[1, 2, 1]       # Element access
arr[, , 1]         # First 2D slice
arr[1, , ]         # First row across all slices
```
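Arrays par `dim()` aur `apply()` kaise kaam karte hain, ye upar wale `arr` se dekh sakte hain. Dhyan rahe ki R elements ko column-major order mein bharta hai, isliye `arr[1, 2, 1]` ki value 4 hai.

```r
arr <- array(1:24, dim = c(3, 4, 2))

dim(arr)              # 3 4 2
arr[1, 2, 1]          # 4  (column-major: 1 + (2 - 1) * 3)
dim(arr[, , 1])       # 3 4  (ek slice ek matrix hai)

# apply() mein MARGIN = 3 matlab har 2D slice par function
apply(arr, 3, sum)    # 78 222
```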

## Control Structures

### If-Else Statements:
```r
# Simple if
score <- 85
if (score >= 60) {
  print("Pass")
}

# If-else
if (score >= 85) {
  grade <- "A"
} else if (score >= 70) {
  grade <- "B"
} else if (score >= 60) {
  grade <- "C"
} else {
  grade <- "F"
}

# Vectorized ifelse
marks <- c(85, 67, 92, 45, 78)
grades <- ifelse(marks >= 60, "Pass", "Fail")
grades

# Nested ifelse
results <- ifelse(marks >= 85, "Excellent",
                 ifelse(marks >= 70, "Good",
                       ifelse(marks >= 60, "Average", "Fail")))
```
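Bahut saare nested `ifelse()` padhne mein mushkil ho jate hain. Numeric ranges ke liye `cut()` ek cleaner alternative hai, aur single value par choice ke liye `switch()`. Ye sketch upar wale `marks` se wahi categories banata hai; `results_cut` aur `grade_message()` hypothetical naam hain.

```r
marks <- c(85, 67, 92, 45, 78)

# cut(): numeric ranges ko categories mein convert karta hai
# right = FALSE matlab intervals [a, b) hain, yaani lower bound include hota hai
results_cut <- cut(marks,
                   breaks = c(-Inf, 60, 70, 85, Inf),
                   labels = c("Fail", "Average", "Good", "Excellent"),
                   right = FALSE)
as.character(results_cut)  # "Excellent" "Average" "Excellent" "Fail" "Good"

# switch(): ek value ke basis par option choose karna
grade_message <- function(grade) {
  switch(grade,
         "A" = "Excellent work",
         "B" = "Good",
         "Needs improvement")   # Default: koi match na ho to
}
grade_message("A")   # "Excellent work"
grade_message("Z")   # "Needs improvement"
```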

### Loops:

#### For Loop:
```r
# Simple for loop
for (i in 1:5) {
  print(paste("Number:", i))
}

# Loop over vector
fruits <- c("Apple", "Banana", "Orange")
for (fruit in fruits) {
  print(paste("I like", fruit))
}

# Loop over data frame rows
for (i in seq_len(nrow(students_df))) {  # seq_len() 0 rows par bhi safe hai
  name <- students_df$name[i]
  marks <- students_df$marks[i]
  print(paste(name, "scored", marks))
}

# Nested loops
for (i in 1:3) {
  for (j in 1:3) {
    print(paste(i, "x", j, "=", i*j))
  }
}
```
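`for` loops mein do common pitfalls hain: `1:n` jab `n = 0` ho, aur loop ke andar vector ko har iteration mein grow karna. Ye base R sketch `seq_len()`/`seq_along()` aur pre-allocation dikhata hai; `squares` ek example variable hai.

```r
# 1:n ka risk: n = 0 ho to 1:0 = c(1, 0) banta hai, aur loop 2 baar chal jata hai
1:0                   # 1 0
seq_len(0)            # integer(0): loop ek baar bhi nahi chalega

# Result vector pehle se allocate karna (pre-allocation) fast aur clean hai
squares <- numeric(5)
for (i in seq_along(squares)) {
  squares[i] <- i^2
}
squares               # 1 4 9 16 25
```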

#### While Loop:
```r
# Simple while loop
count <- 1
while (count <= 5) {
  print(paste("Count is", count))
  count <- count + 1
}

# Practical example
total <- 0   # 'sum' naam avoid karo, ye built-in function ka naam hai
i <- 1
while (total < 100) {
  total <- total + i
  i <- i + 1
}
print(paste("Sum reached", total))  # "Sum reached 105"
```

#### Repeat Loop:
```r
# Repeat with break
x <- 1
repeat {
  print(x)
  x <- x + 1
  if (x > 5) {
    break
  }
}
```

### Loop Control:
```r
# Next (continue equivalent)
for (i in 1:10) {
  if (i %% 2 == 0) {
    next  # Skip even numbers
  }
  print(i)
}

# Break
for (i in 1:10) {
  if (i == 6) {
    break  # Stop at 6
  }
  print(i)
}
```

## Functions

### Basic Functions:
```r
# Simple function
greet <- function(name) {
  return(paste("Hello", name))
}

# Function call
greet("Rahul")  # "Hello Rahul"

# Function with multiple parameters
calculate_area <- function(length, width) {
  area <- length * width
  return(area)
}

calculate_area(10, 5)  # 50

# Function with default parameters
greet_with_title <- function(name, title = "Mr.") {
  return(paste("Hello", title, name))
}

greet_with_title("Sharma")           # "Hello Mr. Sharma"
greet_with_title("Priya", "Ms.")     # "Hello Ms. Priya"
```
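R functions mein explicit `return()` zaroori nahi hai; last evaluated expression hi return hota hai. Ye sketch upar wale `calculate_area()` ka ek variant hai jisme `stop()` se input validation aur named arguments dikhaye gaye hain; error message ka text aur `msg` variable sirf illustration ke liye hain.

```r
# Last evaluated expression automatically return hota hai
calculate_area <- function(length, width) {
  # Input validation: galat type par clear error message
  if (!is.numeric(length) || !is.numeric(width)) {
    stop("length aur width numeric hone chahiye")
  }
  length * width
}

calculate_area(width = 5, length = 10)   # 50 (named arguments mein order matter nahi karta)

# tryCatch() se error ko handle karna
msg <- tryCatch(calculate_area("ten", 5),
                error = function(e) conditionMessage(e))
msg                                      # "length aur width numeric hone chahiye"
```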

### Advanced Functions:
```r
# Function with multiple return values
calculate_stats <- function(numbers) {
  result <- list(
    mean = mean(numbers),
    median = median(numbers),
    sd = sd(numbers),
    min = min(numbers),
    max = max(numbers)
  )
  return(result)
}

numbers <- c(10, 15, 20, 25, 30)
stats <- calculate_stats(numbers)
stats$mean    # 20
stats$sd      # 7.905694

# Function with conditional logic
grade_calculator <- function(marks) {
  if (marks >= 90) {
    return("A+")
  } else if (marks >= 80) {
    return("A")
  } else if (marks >= 70) {
    return("B")
  } else if (marks >= 60) {
    return("C")
  } else {
    return("F")
  }
}

# Vectorized function using sapply
marks <- c(95, 87, 76, 65, 45)
grades <- sapply(marks, grade_calculator)
grades  # "A+" "A"  "B"  "C"  "F"
```

### Anonymous Functions (Lambda):
```r
# Anonymous function with sapply
numbers <- c(1, 4, 9, 16, 25)
sqrt_values <- sapply(numbers, function(x) sqrt(x))
sqrt_values  # 1 2 3 4 5

# Anonymous function with lapply
numbers_list <- list(a = 1:5, b = 6:10, c = 11:15)
means <- lapply(numbers_list, function(x) mean(x))
means
```
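`sapply()` ke kuch safer ya shorter alternatives bhi hain. Ye sketch `vapply()` (jisme output type pehle batana padta hai), `Map()` aur R 4.1+ ka `\(x)` lambda shorthand dikhata hai. Purane R versions mein `\(x)` parse nahi hoga; wahan `function(x)` hi use karo.

```r
numbers_list <- list(a = 1:5, b = 6:10, c = 11:15)

# vapply(): sapply jaisa, lekin har result ka type/length fix hai (numeric(1))
vapply(numbers_list, mean, numeric(1))   # a = 3, b = 8, c = 13

# R 4.1+ mein function(x) ka short form \(x)
sapply(1:3, \(x) x * 10)                 # 10 20 30

# Map(): do vectors par saath-saath iterate karta hai, list return karta hai
Map(function(x, y) x + y, 1:3, 4:6)      # list(5, 7, 9)
```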

### Built-in Functions (Important ones):
```r
# Statistical functions
numbers <- c(10, 15, 20, 25, 30, 35, 40)

mean(numbers)      # Average: 25
median(numbers)    # Middle value: 25
sd(numbers)        # Standard deviation
var(numbers)       # Variance
sum(numbers)       # Total: 175
prod(numbers)      # Product of all numbers
min(numbers)       # Minimum: 10
max(numbers)       # Maximum: 40
range(numbers)     # Min and Max: 10 40
quantile(numbers)  # Quartiles

# Apply functions
# sapply - returns vector/array
result1 <- sapply(1:5, function(x) x^2)  # 1 4 9 16 25

# lapply - returns list
result2 <- lapply(1:5, function(x) x^2)  # List with 5 elements

# Apply on data frames
numeric_cols <- sapply(students_df, is.numeric)
numeric_data <- students_df[, numeric_cols]
column_means <- sapply(numeric_data, mean, na.rm = TRUE)
```
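`quantile()` ka output aur matrix par `apply()` ka `MARGIN` argument ek chhote example se clear ho jata hai. `m` ek example matrix hai; quantiles R ke default method (type 7, linear interpolation) se nikle hain.

```r
numbers <- c(10, 15, 20, 25, 30, 35, 40)
quantile(numbers)            # 0%: 10, 25%: 17.5, 50%: 25, 75%: 32.5, 100%: 40

# apply(): matrix ki rows (MARGIN = 1) ya columns (MARGIN = 2) par function
m <- matrix(1:6, nrow = 2)   # [1 3 5; 2 4 6]
apply(m, 1, sum)             # Row sums: 9 12
apply(m, 2, max)             # Column max: 2 4 6
rowSums(m)                   # apply(m, 1, sum) jaisa hi, lekin faster
```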

## Data Import/Export

### CSV Files:
```r
# CSV read karna
data <- read.csv("file.csv")
data <- read.csv("file.csv", header = TRUE, sep = ",")

# CSV write karna
write.csv(students_df, "students.csv", row.names = FALSE)

# Alternative (faster for large files)
library(readr)
data <- read_csv("file.csv")
write_csv(students_df, "students.csv")
```
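Ek complete round trip (pehle write, phir read) dekhne ke liye ye base R sketch `tempfile()` use karta hai, taaki working directory mein koi file na bane. `scores`, `path` aur `back` example naam hain. `row.names = FALSE` na dene par file mein row numbers ka ek extra column likha jata hai, jo `read.csv()` mein `X` naam se wapas aata hai.

```r
# tempfile() ek temporary path deta hai
path <- tempfile(fileext = ".csv")

scores <- data.frame(name = c("Amit", "Priya"), marks = c(85.5, 92))
write.csv(scores, path, row.names = FALSE)

back <- read.csv(path)
str(back)          # name: chr, marks: num
back$marks         # 85.5 92.0
```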

### Excel Files:
```r
# Excel read karna (library install karni padegi)
install.packages("readxl")
library(readxl)

data <- read_excel("file.xlsx")
data <- read_excel("file.xlsx", sheet = "Sheet1")
data <- read_excel("file.xlsx", sheet = 2)

# Excel write karna
install.packages("writexl")
library(writexl)
write_xlsx(students_df, "students.xlsx")
```

### Text Files:
```r
# Text file read karna
lines <- readLines("file.txt")

# Text file write karna
writeLines(c("Line 1", "Line 2", "Line 3"), "output.txt")

# Tab-delimited files
data <- read.delim("file.txt", sep = "\t")
write.table(data, "output.txt", sep = "\t", row.names = FALSE)
```

### R Data Files:
```r
# R objects save karna
save(students_df, file = "students.RData")
save.image("workspace.RData")  # Complete workspace save

# R objects load karna
load("students.RData")

# Single object save/load
saveRDS(students_df, "students.rds")
students_df <- readRDS("students.rds")
```
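`saveRDS()`/`readRDS()` aur `save()`/`load()` ka main fark ye hai ki object kis naam se wapas aata hai. Ye sketch temporary files use karta hai; `obj`, `restored` aur `loaded_names` hypothetical naam hain.

```r
obj <- data.frame(id = 1:3, score = c(70, 85, 90))

# saveRDS()/readRDS(): ek object, load karte waqt naam aap khud choose karte ho
rds_path <- tempfile(fileext = ".rds")
saveRDS(obj, rds_path)
restored <- readRDS(rds_path)
identical(restored, obj)           # TRUE

# save()/load(): objects apne original naam se environment mein wapas aate hain
rdata_path <- tempfile(fileext = ".RData")
save(obj, file = rdata_path)
rm(obj)
loaded_names <- load(rdata_path)   # load() restore kiye gaye objects ke naam return karta hai
loaded_names                       # "obj"
exists("obj")                      # TRUE
```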

## Data Manipulation

### Basic Data Operations:
```r
# Sample data create karna
df <- data.frame(
  name = c("Amit", "Priya", "Raj", "Sita", "Kiran"),
  age = c(25, 30, 22, 28, 35),
  salary = c(50000, 75000, 45000, 65000, 80000),
  department = c("IT", "HR", "IT", "Finance", "IT"),
  city = c("Delhi", "Mumbai", "Chennai", "Delhi", "Mumbai")
)

# Filtering rows
it_employees <- df[df$department == "IT", ]
high_earners <- df[df$salary > 60000, ]
delhi_it <- df[df$city == "Delhi" & df$department == "IT", ]

# Selecting columns
names_salaries <- df[, c("name", "salary")]
selected_cols <- df[, 2:4]  # Columns 2 to 4

# Sorting
df_sorted <- df[order(df$salary), ]              # Ascending by salary
df_sorted_desc <- df[order(-df$salary), ]        # Descending by salary
df_multi_sort <- df[order(df$department, -df$salary), ]  # Multi-column sort

# Adding new columns
df$annual_salary <- df$salary * 12
df$age_group <- ifelse(df$age < 30, "Young", "Senior")
df$salary_grade <- cut(df$salary,
                      breaks = c(0, 50000, 70000, Inf),
                      labels = c("Low", "Medium", "High"))

# Removing columns
df$annual_salary <- NULL
df_subset <- df[, !names(df) %in% c("city", "age_group")]
```
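Grouped summary ke liye dplyr zaroori nahi hai; base R mein `table()` aur `aggregate()` bhi ye kaam karte hain. Ye sketch upar wale `df` ka chhota copy `emp` naam se banata hai (taaki `df` change na ho); `dept_avg` ek example naam hai.

```r
emp <- data.frame(
  name = c("Amit", "Priya", "Raj", "Sita", "Kiran"),
  salary = c(50000, 75000, 45000, 65000, 80000),
  department = c("IT", "HR", "IT", "Finance", "IT")
)

# Department-wise count
table(emp$department)          # Finance: 1, HR: 1, IT: 3

# aggregate(): formula interface se group-wise summary
dept_avg <- aggregate(salary ~ department, data = emp, FUN = mean)
dept_avg                       # Finance: 65000, HR: 75000, IT: 58333.33
```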

### Using dplyr (Modern approach):
```r
# dplyr install aur load karna
install.packages("dplyr")
library(dplyr)

# Pipe operator %>% use karna
df %>%
  filter(department == "IT") %>%
  select(name, salary) %>%
  arrange(desc(salary))

# Common dplyr operations
# Filter
it_employees <- df %>% filter(department == "IT")
high_earners <- df %>% filter(salary > 60000)

# Select
basic_info <- df %>% select(name, age, salary)
without_city <- df %>% select(-city)

# Arrange (sort)
sorted_df <- df %>% arrange(salary)
sorted_desc <- df %>% arrange(desc(salary))

# Mutate (add/modify columns)
df_new <- df %>%
  mutate(
    annual_salary = salary * 12,
    age_category = ifelse(age < 30, "Young", "Senior"),
    salary_ratio = salary / mean(salary)
  )

# Group by aur summarize
dept_summary <- df %>%
  group_by(department) %>%
  summarise(
    count = n(),
    avg_salary = mean(salary),
    total_salary = sum(salary),
    min_age = min(age),
    max_age = max(age)
  )

# Multiple grouping
city_dept_summary <- df %>%
  group_by(city, department) %>%
  summarise(
    employees = n(),
    avg_salary = mean(salary),
    .groups = "drop"   # Result ungrouped rahega, grouping wala message bhi nahi aayega
  )
```
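R 4.1 se base R mein native pipe `|>` bhi hai, jo bina kisi package ke chalta hai. Simple chains mein ye `%>%` jaisa hi behave karta hai, lekin magrittr ka `.` placeholder iske saath kaam nahi karta. Ye sketch `subset()` ke saath native pipe use karta hai; `emp` aur `it_count` hypothetical naam hain.

```r
emp <- data.frame(
  name = c("Amit", "Priya", "Raj", "Sita", "Kiran"),
  salary = c(50000, 75000, 45000, 65000, 80000),
  department = c("IT", "HR", "IT", "Finance", "IT")
)

# x |> f(y) ka matlab hai f(x, y)
it_count <- emp |>
  subset(department == "IT") |>
  nrow()
it_count                                # 3

# Same result bina pipe ke
nrow(subset(emp, department == "IT"))   # 3
```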

### String Operations:
```r
# String functions
text <- c("Hello World", "R Programming", "Data Science")

# Basic operations
nchar(text)          # Length of each string
toupper(text)        # Uppercase
tolower(text)        # Lowercase
substr(text, 1, 5)   # First 5 characters

# String manipulation with stringr
install.packages("stringr")
library(stringr)

# Detect patterns
str_detect(text, "R")           # Which contain "R"
str_count(text, "a")            # Count of "a" in each
str_locate(text, "o")           # Position of first "o"

# Replace and modify
str_replace(text, "World", "Universe")
str_remove(text, "Hello ")
str_split("a,b,c,d", ",")       # Split string

# Real example
raw_names <- c("amit kumar", "priya.sharma", "raj_singh")
cleaned_names <- raw_names %>%
  str_replace_all("[._]", " ") %>% # Pehle dots aur underscores ko space banao
  str_to_title()                   # Phir title case: "Amit Kumar" "Priya Sharma" "Raj Singh"
```
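Agar stringr install na ho, to wahi name-cleaning base R ke `gsub()` se bhi ho sakti hai. Is sketch mein replacement ka `\\U` (uppercase) sirf `perl = TRUE` ke saath kaam karta hai; `raw_names`, `spaced` aur `cleaned` example naam hain.

```r
raw_names <- c("amit kumar", "priya.sharma", "raj_singh")

# Pehle separators ko space banao, phir har word ka pehla letter capital karo
spaced <- gsub("[._]", " ", raw_names)
cleaned <- gsub("\\b([a-z])", "\\U\\1", spaced, perl = TRUE)
cleaned                          # "Amit Kumar" "Priya Sharma" "Raj Singh"

# Pattern detect aur split
grepl("Singh", cleaned)          # FALSE FALSE TRUE
strsplit(cleaned[1], " ")[[1]]   # "Amit" "Kumar"
```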

## Data Visualization

### Base R Graphics:
```r
# Sample data
x <- 1:10
y <- x^2

# Basic plots
plot(x, y)                    # Scatter plot
plot(x, y, type = "l")        # Line plot
plot(x, y, type = "b")        # Both points and lines

# Histogram
set.seed(42)  # Random numbers reproducible banane ke liye
data <- rnorm(1000, mean = 50, sd = 15)
hist(data)
hist(data, breaks = 20, col = "lightblue", main = "Distribution")

# Bar plot
categories <- c("A", "B", "C", "D")
values <- c(23, 45, 56, 78)
barplot(values, names.arg = categories, col = "lightgreen")

# Box plot
boxplot(data)
boxplot(salary ~ department, data = df, main = "Salary by Department")

# Scatter plot matrix
pairs(df[, c("age", "salary")])
```
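Plot ko screen ki jagah file mein save karne ke liye graphics device open karna padta hai aur kaam khatam hone par `dev.off()` se close karna padta hai. Ye sketch `pdf()` device aur temporary file use karta hai (PNG ke liye `png()` bhi hai, lekin kuch headless systems par uske liye extra graphics support chahiye hota hai). `values` aur `plot_path` example naam hain.

```r
set.seed(42)                                   # Reproducible random data
values <- rnorm(1000, mean = 50, sd = 15)

plot_path <- tempfile(fileext = ".pdf")
pdf(plot_path, width = 6, height = 4)          # Device open: ab plots file mein jayenge
hist(values, breaks = 20, col = "lightblue", main = "Distribution")
abline(v = mean(values), col = "red", lwd = 2) # Mean par vertical line
dev.off()                                      # File close karna zaroori hai

file.exists(plot_path)                         # TRUE
```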

### ggplot2 (Professional visualization):
```r
install.packages("ggplot2")
library(ggplot2)

# Basic ggplot structure
# ggplot(data, aes(x, y)) + geom_*() + labs() + theme()

# Scatter plot
ggplot(df, aes(x = age, y = salary)) +
  geom_point() +
  labs(title = "Age vs Salary", x = "Age", y = "Salary")

# Add color by department
ggplot(df, aes(x = age, y = salary, color = department)) +
  geom_point(size = 3) +
  labs(title = "Age vs Salary by Department") +
  theme_minimal()

# Bar plot
ggplot(df, aes(x = department)) +
  geom_bar(fill = "steelblue") +
  labs(title = "Employees by Department") +
  theme_classic()

# Histogram
ggplot(df, aes(x = salary)) +
  geom_histogram(bins = 5, fill = "lightcoral", alpha = 0.7) +
  labs(title = "Salary Distribution") +
  theme_bw()

# Box plot
ggplot(df, aes(x = department, y = salary)) +
  geom_boxplot(fill = "lightgreen") +
  labs(title = "Salary Distribution by Department") +
  theme_minimal()

# Line plot
time_series <- data.frame(
  month = 1:12,
  sales = c(100, 120, 140, 110, 160, 180, 200, 190, 220, 240, 210, 250)
)

ggplot(time_series, aes(x = month, y = sales)) +
  geom_line(color = "blue", linewidth = 1) +   # ggplot2 3.4.0+ mein line ki motai ke liye 'linewidth'
  geom_point(color = "red", size = 2) +
  labs(title = "Monthly Sales", x = "Month", y = "Sales") +
  theme_light()

# Faceting (multiple plots)
ggplot(df, aes(x = age, y = salary)) +
  geom_point() +
  facet_wrap(~department) +
  labs(title = "Age vs Salary by Department") +
  theme_minimal()

# Advanced customization
ggplot(df, aes(x = department, y = salary, fill = city)) +
  geom_col(position = "dodge") +  # geom_col() = geom_bar(stat = "identity"); yahan har department-city ki ek hi row hai
  scale_fill_brewer(palette = "Set3") +
  labs(
    title = "Total Salary by Department and City",
    x = "Department",
    y = "Salary",
    fill = "City"
  ) +
  theme_classic() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )
```
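`geom_col()` (ya `geom_bar(stat = "identity")`) bar ki height seedha data se leta hai; agar ek hi department-city group mein kai rows hon to unki values automatically sum nahi hoti. Isliye "Total" wale plot ke liye pehle `aggregate()` se total nikalna safer hai. Ye sketch ggplot2 ko sirf tabhi use karta hai jab woh installed ho; `emp`, `totals` aur `p` hypothetical naam hain.

```r
emp <- data.frame(
  salary = c(50000, 75000, 45000, 65000, 80000),
  department = c("IT", "HR", "IT", "Finance", "IT"),
  city = c("Delhi", "Mumbai", "Chennai", "Delhi", "Mumbai")
)

# Step 1: har department-city combination ka total pehle hi calculate karo
totals <- aggregate(salary ~ department + city, data = emp, FUN = sum)

# Step 2: plot banao aur file mein save karo (sirf agar ggplot2 installed hai)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(totals, aes(x = department, y = salary, fill = city)) +
    geom_col(position = "dodge") +
    labs(title = "Total Salary by Department and City")
  ggsave(tempfile(fileext = ".pdf"), plot = p, width = 6, height = 4)
}
```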

## Statistical Analysis

### Descriptive Statistics:
```r
# Sample data
data <- c(23, 45, 56, 78, 34, 67, 89, 12, 90, 43)

# Central tendency
mean(data)         # Average: 53.7
median(data)       # Middle value: 50.5
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
Mode(data)         # Most frequent value (yahan sab values unique hain, isliye pehli value 23 milti hai)

# Dispersion
var(data)          # Variance
sd(data)           # Standard deviation
range(data)        # Min and max
IQR(data)          # Interquartile range
quantile(data)     # Quartiles

# Summary statistics
summary(data)      # Complete summary
```
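Descriptive statistics ke do aur useful tools hain: z-scores (standardization) aur ek aisa mode function jo ties hone par saare most-frequent values de. Ye sketch upar wale `data` vector ko use karta hai; `z` aur `Modes()` hypothetical naam hain, `Modes()` base R ka function nahi hai.

```r
data <- c(23, 45, 56, 78, 34, 67, 89, 12, 90, 43)

# z-score: har value mean se kitne standard deviations door hai
z <- (data - mean(data)) / sd(data)
round(mean(z), 10)                     # 0
sd(z)                                  # 1

# scale() yahi calculation karta hai (lekin matrix return karta hai)
all.equal(as.vector(scale(data)), z)   # TRUE

# Ties handle karne wala mode: saare most-frequent values return karta hai
Modes <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}
Modes(c(1, 2, 2, 3, 3))                # 2 3
```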

