# 04 – Summary & Missing Data

Core R concepts: computing summary statistics and detecting, removing, and replacing missing values (`NA`).

*Part of the [Foundations: Python, R & SQL](../README.md) series.*

## 1. Summary Statistics

In [1]:
# Sample data
data <- c(2, 4, 6, 8, 10)

In [2]:
mean(data)

In [3]:
median(data)

In [4]:
sd(data)

In [5]:
min(data)

In [6]:
max(data)

In [7]:
summary(data)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      2       4       6       6       8      10 
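The value returned by `sd()` is the sample standard deviation, which divides by n - 1 rather than n. The sketch below reproduces it by hand for the same `data` vector. The names `n`, `manual_var`, and `manual_sd` are illustrative and not part of the original notebook.

```r
# Same sample data as above
data <- c(2, 4, 6, 8, 10)

n <- length(data)

# Sum of squared deviations from the mean, divided by n - 1
manual_var <- sum((data - mean(data))^2) / (n - 1)
manual_sd <- sqrt(manual_var)

# Both comparisons should agree with the built-in functions
all.equal(manual_var, var(data))
all.equal(manual_sd, sd(data))
```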

## 2. Detecting Missing Values

In [8]:
# Vector with missing values
x <- c(5, NA, 9, NA, 3)

# Logical vector: TRUE where a value is missing
is.na(x)

In [9]:
# Count of missing values
sum(is.na(x))  

In [10]:
# TRUE if at least one value is missing
any(is.na(x))

In [11]:
# TRUE only if every value is missing
all(is.na(x))
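Beyond counting, it can help to know where the missing values sit and what share of the vector they make up. A minimal sketch using base R's `which()` and `mean()` on the same example vector follows. The names `na_positions` and `na_share` are illustrative and not part of the original notebook.

```r
# Same example vector as above
x <- c(5, NA, 9, NA, 3)

# Indices of the missing entries
na_positions <- which(is.na(x))
na_positions

# Proportion of missing entries (TRUE counts as 1, FALSE as 0)
na_share <- mean(is.na(x))
na_share
```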

## 3. Handling Missing Data

In [12]:
# Remove missing values (na.omit() also records the dropped positions in an "na.action" attribute)
x_clean <- na.omit(x)
x_clean

In [13]:
# Replace missing values with the mean of the observed values
x_fixed <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)
x_fixed
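The `na.rm = TRUE` argument matters because most summary functions return `NA` as soon as the input contains one missing value. The sketch below shows the difference. It also shows median imputation as one possible alternative to the mean, which is an assumption for illustration and not something the original notebook uses. The names `mean_obs`, `sd_obs`, and `x_median` are hypothetical.

```r
# Same example vector as above
x <- c(5, NA, 9, NA, 3)

# Without na.rm, a single NA makes the result NA
mean(x)

# With na.rm = TRUE, NAs are dropped before the statistic is computed
mean_obs <- mean(x, na.rm = TRUE)   # (5 + 9 + 3) / 3
sd_obs <- sd(x, na.rm = TRUE)

# Alternative (assumption): fill NAs with the median of the observed values
x_median <- ifelse(is.na(x), median(x, na.rm = TRUE), x)
x_median
```

Whether mean or median imputation is appropriate depends on the data; the median is less sensitive to outliers.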

## 4. Summary for Data Frames

In [14]:
df <- data.frame(
  id = 1:5,
  score = c(90, 85, NA, 88, NA)
)

df

| id | score |
|----|-------|
| 1  | 90    |
| 2  | 85    |
| 3  | NA    |
| 4  | 88    |
| 5  | NA    |


In [15]:
summary(df)

       id        score      
 Min.   :1   Min.   :85.00  
 1st Qu.:2   1st Qu.:86.50  
 Median :3   Median :88.00  
 Mean   :3   Mean   :87.67  
 3rd Qu.:4   3rd Qu.:89.00  
 Max.   :5   Max.   :90.00  
             NA's   :2      

In [16]:
is.na(df$score)
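The same ideas extend from vectors to whole data frames. A minimal sketch using base R's `colSums()` and `complete.cases()` on the same `df` follows. The names `na_per_column`, `df_complete`, and `mean_score` are illustrative and not part of the original notebook.

```r
# Same data frame as above
df <- data.frame(
  id = 1:5,
  score = c(90, 85, NA, 88, NA)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(df))
na_per_column

# Keep only the rows with no missing value in any column
df_complete <- df[complete.cases(df), ]
df_complete

# Mean score over the observed rows only
mean_score <- mean(df$score, na.rm = TRUE)
mean_score
```

The mean computed this way should match the `Mean` reported by `summary(df)` for the `score` column, since `summary()` also ignores `NA` values and reports them separately.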

## Summary

| Task                       | Function / Syntax                   |
|----------------------------|-------------------------------------|
| Mean, median, std dev      | `mean()`, `median()`, `sd()`        |
| Detect NA                  | `is.na(x)`, `sum(is.na(x))`         |
| Remove NAs                 | `na.omit()`                         |
| Replace NAs                | `ifelse(is.na(x), value, x)`        |
| Summary for data frames    | `summary(df)`                       |
