```r
my_data <- read.csv("data.csv")

# Check top rows
head(my_data)

# View structure
str(my_data)
```

### summary functions
```r
summary(my_data)   # Summary of all columns
nrow(my_data)      # Number of rows
ncol(my_data)      # Number of columns
names(my_data)     # Column names
```


___

### data manipulation

```r
install.packages("dplyr")  # install once
library(dplyr)             # load every session
```


___

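```r
df <- data.frame(name   = c("alice", "bob", "diana"),
                 age    = c(20, 30, 28),
                 score  = c(34, 55, 66),
                 # group and passed hold illustrative values used by the later snippets
                 group  = c("A", "B", "A"),
                 passed = c(FALSE, TRUE, TRUE))
```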



___

### filter() – Select rows

```r
df %>% filter(age > 25)
df %>% filter(passed == TRUE)
```

### mutate() – Add or change columns

```r
df %>% mutate(new_score = score + 5)
df %>% mutate(status = ifelse(passed, "Yes", "No"))
```

### arrange() – Sort rows

```r
df %>% arrange(age)         # Ascending
df %>% arrange(desc(score)) # Descending
```



___

### summarise() and group_by() – Group + Aggregate

```r
df %>%
  group_by(group) %>%
  summarise(avg_score = mean(score), count = n())
```


#### Step 1: `df %>%`

- This is the start of a pipe.
- It says: "Take the data frame `df` and pass it as the first argument to the next function."
- So `df %>% group_by(group)` is the same as writing:

```r
group_by(df, group)
```


#### Step 2: `group_by(group)`

- This groups your data by the values in the `group` column.
- It returns a grouped data frame, which prints like the original data but with a `Groups:` line at the top.
- The rows themselves are unchanged because grouping is just **metadata**: it tells the next functions (`summarise()`, `mutate()`, etc.) how to behave.

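#### Step 3: `summarise(avg_score = mean(score), count = n())`

- This creates a summary table with one row per group.
- `avg_score` is the mean of `score` within each group, and `count` is the number of rows in that group (`n()`).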
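
The grouping and summarising steps can be sketched end to end on a tiny data frame. The `scores` table and its values are hypothetical, invented only so the result is easy to check by hand.

```r
library(dplyr)

# Hypothetical data, invented for illustration only
scores <- data.frame(
  group = c("A", "A", "B"),
  score = c(80, 90, 70)
)

by_group <- scores %>%
  group_by(group) %>%                               # mark rows as belonging to A or B
  summarise(avg_score = mean(score), count = n())   # collapse to one row per group

by_group
# One row per group: A has avg_score 85 and count 2, B has avg_score 70 and count 1
```

The input has three rows and the result has two, one for each distinct value of `group`.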
 

___

1. **arrange()**
   - Sort the data
   - `df %>% arrange(score)`
   - `df %>% arrange(desc(score))`
2. **rename()**
   - Rename columns (`new_name = old_name`)
   - `df %>% rename(final_score = score)`
3. **distinct()**
   - Remove duplicates; naming a column keeps only its unique values
   - `df %>% distinct(group)`
4. **slice()**
   - Select rows by position
   - `df %>% slice(1)` (first row)
   - `df %>% slice(1:2)` (first two rows)
5. **pull()**
   - Extract a column as a **vector**
   - Useful for quick plots or modeling
   - `df %>% pull(score)`

6. **case_when()**
   - Multiple conditions (like SQL CASE)

```r
# Conditions are checked top to bottom; the first TRUE one wins
df %>% mutate(
  grade = case_when(
    score >= 90 ~ "A",
    score >= 80 ~ "B",
    TRUE ~ "C"
  )
)
```



7. **across()**
   - Apply functions to multiple columns (powerful!)
   - `df %>% mutate(across(c(age, score), log))` (log-transform `age` and `score`)
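
To see several of these helper verbs working together, here is a small sketch. The `people` data frame, its values, and the "pass"/"fail" threshold of 60 are hypothetical and chosen only for illustration.

```r
library(dplyr)

# Hypothetical data, invented for illustration only (note the repeated "bob" row)
people <- data.frame(
  name  = c("alice", "bob", "diana", "bob"),
  age   = c(20, 30, 28, 30),
  score = c(34, 55, 66, 55)
)

unique_people <- people %>% distinct()                     # drops the repeated row
renamed   <- unique_people %>% rename(final_score = score) # new_name = old_name
first_two <- unique_people %>% slice(1:2)                  # rows 1 and 2 by position
score_vec <- unique_people %>% pull(score)                 # plain numeric vector

# case_when() checks conditions in order; TRUE acts as the fallback
graded <- unique_people %>%
  mutate(grade = case_when(
    score >= 60 ~ "pass",
    TRUE        ~ "fail"
  ))

# across() applies mean() to each listed column
means <- unique_people %>%
  summarise(across(c(age, score), mean))
```

Each verb returns a new object and leaves `people` untouched, so the intermediate results can be inspected one at a time.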


___

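1. **Joins**
   - `left_join()` – Keep all rows from the left table, add matching columns from the right
   - `right_join()` – Keep all rows from the right table, add matching columns from the left
   - `inner_join()` – Keep only rows that match in both tables
   - `full_join()` – Keep all rows from both tables
   - `anti_join()` – Keep only rows from the left table that have no match in the right

```r
left_join(df1, df2, by = "id")
```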


2. **Binding & combining data frames**
   - `bind_rows(df1, df2)` – Stack rows (like `rbind`)
   - `bind_cols(df1, df2)` – Add columns side by side (like `cbind`)



3. **Handling missing values**
   - `is.na()` – Detect NAs
   - `filter(!is.na(score))` – Keep only rows where `score` is not NA
   - `mutate(score = ifelse(is.na(score), 0, score))` – Replace NA with 0
   - `drop_na()` from **tidyr** – Drop rows with an NA in any column
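
Joins and missing values often show up together, because a `left_join()` fills unmatched rows with `NA`. The sketch below uses two hypothetical tables, `students` and `results`, invented for illustration only.

```r
library(dplyr)
library(tidyr)

# Hypothetical tables, invented for illustration only
students <- data.frame(id = c(1, 2, 3), name = c("alice", "bob", "diana"))
results  <- data.frame(id = c(1, 3), score = c(34, 66))

joined    <- left_join(students, results, by = "id")   # id 2 has no match, so its score is NA
unmatched <- anti_join(students, results, by = "id")   # only the row for id 2

kept    <- joined %>% filter(!is.na(score))                           # rows with a score
filled  <- joined %>% mutate(score = ifelse(is.na(score), 0, score))  # NA replaced by 0
dropped <- joined %>% drop_na()                                       # rows with no NA in any column
```

Here `kept` and `dropped` contain the same two rows, since `score` is the only column with an `NA`.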

___

4. **Pivoting (reshaping data) – from wide to long and vice versa**

- A `tibble` is a modern version of a **data frame**, provided by the tibble package (part of the `tidyverse`). Compared with a base R data frame, it prints only the first rows along with each column's type, never converts strings to factors, and does not partial-match column names.
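
A short sketch of how a tibble relates to a base data frame. The `df_base` table is hypothetical and exists only for this comparison.

```r
library(tibble)

# Hypothetical data, invented for illustration only
df_base <- data.frame(name = c("alice", "bob"), score = c(34, 55))
tb <- as_tibble(df_base)

class(tb)           # "tbl_df" "tbl" "data.frame"
is.data.frame(tb)   # TRUE: a tibble is still a data frame

df_base$sc          # base data frames partial-match, so this returns the score column
tb$sc               # tibbles do not: this returns NULL with a warning
```

Because a tibble is still a data frame, functions written for data frames generally accept it unchanged.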

#### Wide Format

```r
library(tibble)  # provides tibble()

df_wide <- tibble(
  name = c("Alice", "Bob", "Charlie"),
  math = c(85, 90, 88),
  science = c(92, 87, 91)
)

df_wide
```

```
# A tibble: 3 × 3
  name    math science
  <chr>  <dbl>   <dbl>
1 Alice     85      92
2 Bob       90      87
3 Charlie   88      91
```


#### Convert to Long Format – pivot_longer()

```r
library(tidyr)  # provides pivot_longer() and pivot_wider()

df_long <- df_wide %>%
  pivot_longer(cols = c(math, science),
               names_to = "subject",
               values_to = "score")

df_long
```

```
# A tibble: 6 × 3
  name    subject score
  <chr>   <chr>   <dbl>
1 Alice   math       85
2 Alice   science    92
3 Bob     math       90
4 Bob     science    87
5 Charlie math       88
6 Charlie science    91
```


#### Convert Back to Wide Format – pivot_wider()

```r
df_wide_back <- df_long %>%
  pivot_wider(names_from = subject,
              values_from = score)

df_wide_back
```

```
# A tibble: 3 × 3
  name    math science
  <chr>  <dbl>   <dbl>
1 Alice     85      92
2 Bob       90      87
3 Charlie   88      91
```


___

5. **Nesting & grouped operations**

```r
library(tidyr)  # nest() comes from tidyr

df %>%
  group_by(group) %>%
  nest()     # Turns each group into its own tibble
```
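
As a minimal sketch of what nesting produces and how to undo it, the example below uses a hypothetical `scores` data frame invented for illustration. `unnest()` is also from tidyr.

```r
library(dplyr)
library(tidyr)

# Hypothetical data, invented for illustration only
scores <- data.frame(
  group = c("A", "A", "B"),
  score = c(80, 90, 70)
)

nested <- scores %>%
  group_by(group) %>%
  nest()                   # columns: group, plus data (a list holding one tibble per group)

nested$data[[1]]           # rows of the first group, without the group column

restored <- nested %>%
  unnest(data)             # back to one row per observation
```

The nested table has one row per group, and each element of its `data` column is a tibble that can be processed on its own.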
