## 🎓 Course Title: **Data Storytelling with R: From Raw Data to Insightful Reports**

### 🔧 Tools Used:

* **Google Colab with R kernel**
* `tidyverse`, `readxl`, `janitor`, `skimr`, `GGally`, `ggplot2`, `broom`, `performance`, `showtext`, `patchwork`, `modelsummary`
* **Canva** (for final data story visuals)

# 📘 Data Science with R: 20-Lesson Course Plan
**Project Theme:** Analyzing the link between education expenditure and refugee population by country

---

## 🟦 Module 1: Setup & Introduction (Lessons 1–3)

### 📗 Lesson 1: Course Introduction & Project Overview
- What is data science? What is data mining?
- Introduce the research question and datasets (GDP + RefugeeData)
- What makes a good data science project?
- Tools: R, Google Colab, tidyverse
- [Hands-on] Create your first Colab R notebook

---

### 📗 Lesson 2: R Basics Refresher
- Variables, data types, vectors, lists, data frames
- Basic R syntax: functions, `if`, `for`, `apply()`
- [Hands-on] Mini coding quiz: manipulate small data frames
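
Here is a minimal refresher sketch of the constructs above. The country codes and counts are made up, and `size_label()` is a hypothetical helper written only for this example.

```r
# Vectors hold one data type each
countries <- c("A", "B", "C")       # character vector
refugees  <- c(1200, 0, 56000)      # numeric vector

# Lists can mix types
info <- list(name = "Toy dataset", n = length(countries), tags = c("demo", "refresher"))

# Data frames: named columns of equal length
df <- data.frame(country = countries, refugees = refugees)

# A function with if / else if / else branches (expects a single number)
size_label <- function(x) {
  if (x == 0) {
    "none"
  } else if (x < 10000) {
    "small"
  } else {
    "large"
  }
}

# for loop: print one label per row
for (i in seq_len(nrow(df))) {
  cat(df$country[i], "->", size_label(df$refugees[i]), "\n")
}

# apply family: sapply() calls the function on each element and simplifies to a vector
df$size <- sapply(df$refugees, size_label)
df

# apply() itself works on matrices: MARGIN = 1 means rows
m <- matrix(1:6, nrow = 2)
row_totals <- apply(m, 1, sum)
row_totals
```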

---

### 📗 Lesson 3: Libraries, Folder Structure & Data Import
- Install and load packages: `tidyverse`, `readxl`, `janitor`, `skimr`
- How to organize a project: raw, clean, output folders
- [Hands-on] Upload and read `GDP.csv` and `RefugeeData.csv`
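
One way to install, load, and organize is sketched below. The `data/raw`, `data/clean`, and `output` folder names are illustrative, and the code assumes you upload the two CSV files into `data/raw/`. Colab runtimes reset between sessions, so the install step may need to run again.

```r
# Packages for this lesson; install only those that are missing
pkgs <- c("tidyverse", "readxl", "janitor", "skimr")
to_install <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}
for (p in pkgs) library(p, character.only = TRUE)

# Project folders: raw data stays untouched; cleaned data and outputs live elsewhere
for (d in c("data/raw", "data/clean", "output")) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# Assumed upload location; adjust if you place the files elsewhere
gdp_path     <- "data/raw/GDP.csv"
refugee_path <- "data/raw/RefugeeData.csv"

if (file.exists(gdp_path) && file.exists(refugee_path)) {
  gdp     <- read_csv(gdp_path)       # readr, loaded with tidyverse
  refugee <- read_csv(refugee_path)
} else {
  message("Upload GDP.csv and RefugeeData.csv to data/raw/ first.")
}
```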

---

## 🟦 Module 2: Data Wrangling & Cleaning (Lessons 4–6)

### 📗 Lesson 4: Inspect & Clean the Data
- Understand `glimpse()`, `head()`, `tail()`, `summary()`
- Clean column names with `janitor::clean_names()`
- [Hands-on] Explore both datasets and clean them
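
A small sketch on a made-up data frame whose messy column names imitate a spreadsheet export; the real course files may use different names.

```r
library(dplyr)
library(janitor)

# Toy data; names and values are invented for illustration
raw <- data.frame(
  `Country Name`  = c("A", "B", "C"),
  Year            = c(2019, 2019, 2019),
  `Refugee Count` = c(1200, NA, 56000),
  check.names = FALSE   # keep the spaces so there is something to clean
)

glimpse(raw)    # column types and first values
head(raw, 2)    # first rows
tail(raw, 1)    # last rows
summary(raw)    # per-column summaries, including NA counts

clean <- clean_names(raw)   # converts names to snake_case
names(clean)
```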

---

### 📗 Lesson 5: Tidy the GDP Dataset
- Wide vs long format
- Use `pivot_longer()` to reshape GDP
- Extract and clean year columns using `str_extract()`
- [Hands-on] Make GDP dataset tidy
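
A minimal reshaping sketch follows. The wide layout and the `"1990 [YR1990]"` header style are assumptions (a common export format), so check the headers in your `GDP.csv` and adjust the `cols` selection and regular expression. `edu_spend` is a hypothetical column name.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical wide table: one column per year
gdp_wide <- tibble(
  country         = c("A", "B"),
  `1990 [YR1990]` = c(3.1, 4.2),
  `2000 [YR2000]` = c(3.5, 4.0)
)

gdp_long <- gdp_wide |>
  pivot_longer(
    cols      = -country,     # every column except country holds a year
    names_to  = "year",
    values_to = "edu_spend"
  ) |>
  mutate(year = as.integer(str_extract(year, "\\d{4}")))   # first 4-digit run

gdp_long
```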

---

### 📗 Lesson 6: Merge Datasets & Handle Missing Data
- Use `left_join()`, match by country and year
- Handle missing values: `drop_na()`, `replace_na()`
- Convert year to numeric and country to factor (join keys must have the same type in both tables)
- [Hands-on] Merge refugee and GDP data into one clean set
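
A sketch of the merge on toy tables. The column names are hypothetical, and treating a missing refugee count as zero is an assumption you should confirm against the data's documentation before using it.

```r
library(dplyr)
library(tidyr)

# Both tables use an integer year so the join keys match
refugee <- tibble(
  country  = c("A", "A", "B", "C"),
  year     = c(1990L, 2000L, 1990L, 1990L),
  refugees = c(1200, 0, NA, 56000)
)
gdp_long <- tibble(
  country   = c("A", "A", "B"),
  year      = c(1990L, 2000L, 1990L),
  edu_spend = c(3.1, 3.5, 4.2)
)

# left_join keeps every refugee row; unmatched spending becomes NA
merged <- refugee |>
  left_join(gdp_long, by = c("country", "year"))

analysis <- merged |>
  drop_na(edu_spend) |>                    # drop rows without spending data
  mutate(
    refugees = replace_na(refugees, 0),    # ASSUMPTION: missing count means zero
    country  = factor(country)
  )

analysis
```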

---

## 🟦 Module 3: Exploratory Data Analysis (Lessons 7–9)

### 📗 Lesson 7: Univariate EDA with ggplot2
- Visualize one variable: histogram, bar, boxplot
- Customize with `theme_minimal()`, titles, colors
- [Hands-on] Plot refugee counts and education spending
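
A sketch using simulated, right-skewed values in place of the real columns (`refugees` and `edu_spend` are hypothetical names). The long right tail in the histogram is what motivates the log transform in Lesson 10.

```r
library(ggplot2)

# Simulated data for illustration only
set.seed(1)
df <- data.frame(
  refugees  = round(rexp(200, rate = 1 / 20000)),
  edu_spend = rnorm(200, mean = 4, sd = 1)
)

p_hist <- ggplot(df, aes(x = refugees)) +
  geom_histogram(bins = 30, fill = "steelblue", color = "white") +
  labs(title = "Distribution of refugee counts", x = "Refugees", y = "Count") +
  theme_minimal()

p_box <- ggplot(df, aes(y = edu_spend)) +
  geom_boxplot(fill = "orange") +
  labs(title = "Education expenditure", y = "Spending (hypothetical units)") +
  theme_minimal()

p_hist
p_box
```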

---

### 📗 Lesson 8: Bivariate Visual Analysis
- Scatter plots: one variable vs another
- Add labels and fitted lines; color points by region or country
- [Hands-on] Scatter plot of refugee counts vs. education expenditure
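
A sketch on simulated data with a hypothetical `region` column; the simulated relationship says nothing about the real datasets. Mapping color only inside `geom_point()` keeps `geom_smooth()` fitting a single overall line rather than one per region.

```r
library(ggplot2)

# Simulated data for illustration only
set.seed(2)
df <- data.frame(
  region    = rep(c("Region 1", "Region 2"), each = 50),
  edu_spend = runif(100, 2, 7)
)
df$refugees <- round(exp(8 + 0.3 * df$edu_spend + rnorm(100)))

p_scatter <- ggplot(df, aes(x = edu_spend, y = refugees)) +
  geom_point(aes(color = region), alpha = 0.7) +
  geom_smooth(method = "lm", formula = y ~ x, color = "black") +   # one OLS line
  labs(
    title = "Refugees vs. education expenditure",
    x = "Education expenditure (hypothetical units)",
    y = "Refugees",
    color = "Region"
  ) +
  theme_minimal()

p_scatter
```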

---

### 📗 Lesson 9: Summaries and Trends Over Time
- Use `group_by()` followed by `summarise()` (chained with a pipe) to aggregate
- Plot country/year trends
- Use `patchwork` to combine multiple plots
- [Hands-on] Refugee and education trends in 3 countries
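
A sketch of aggregation and a combined figure on a simulated panel of three hypothetical countries. In `patchwork`, `/` stacks plots vertically and `|` places them side by side.

```r
library(dplyr)
library(ggplot2)
library(patchwork)

# Simulated country-year panel for illustration only
set.seed(3)
panel <- expand.grid(country = c("A", "B", "C"), year = 2010:2020)
panel$refugees  <- round(runif(nrow(panel), 1000, 50000))
panel$edu_spend <- runif(nrow(panel), 2, 7)

# One summary row per country
country_summary <- panel |>
  group_by(country) |>
  summarise(
    mean_refugees  = mean(refugees),
    mean_edu_spend = mean(edu_spend),
    n_years        = n(),
    .groups = "drop"
  )
country_summary

p_ref <- ggplot(panel, aes(year, refugees, color = country)) +
  geom_line() + labs(title = "Refugees over time") + theme_minimal()
p_edu <- ggplot(panel, aes(year, edu_spend, color = country)) +
  geom_line() + labs(title = "Education expenditure over time") + theme_minimal()

combined <- p_ref / p_edu   # stack vertically
combined
```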

---

## 🟦 Module 4: Data Transformation & Scaling (Lessons 10–11)

### 📗 Lesson 10: Log Transformation for Skewed Data
- Why a log transform helps with right-skewed variables such as refugee counts
- Use `log1p()`, i.e. log(1 + x), so that zero counts stay defined
- [Hands-on] Compare histograms of refugee counts before and after `log1p()`
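
A sketch showing why `log1p()` is used instead of `log()` when counts include zeros, followed by a before/after comparison on simulated counts. `expm1()` reverses the transform.

```r
library(ggplot2)
library(patchwork)

counts <- c(0, 9, 99, 999)
log(counts)     # log(0) is -Inf, which breaks plots and models
log1p(counts)   # log(1 + x): a zero count maps to 0

# Simulated right-skewed counts with some zeros (illustration only)
set.seed(4)
df <- data.frame(refugees = c(rep(0, 20), round(rexp(180, rate = 1 / 20000))))
df$log_refugees <- log1p(df$refugees)

p_raw <- ggplot(df, aes(refugees)) + geom_histogram(bins = 30) +
  labs(title = "Raw counts") + theme_minimal()
p_log <- ggplot(df, aes(log_refugees)) + geom_histogram(bins = 30) +
  labs(title = "log1p(counts)") + theme_minimal()

before_after <- p_raw | p_log   # side by side
before_after
```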

---

### 📗 Lesson 11: Feature Engineering
- Create new columns: ratios, bins, categories
- Detect outliers using IQR or z-score
- [Hands-on] Engineer log variables and a refugee-to-population ratio (if a population column is available)
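
A sketch on a toy table with arbitrary values; the `population` column is hypothetical and may not exist in the course files. It shows a ratio, a binned category with `cut()`, and both outlier rules.

```r
library(dplyr)

# Arbitrary toy values for illustration only
df <- data.frame(
  country    = c("A", "B", "C", "D", "E", "F"),
  refugees   = c(1, 2, 3, 4, 5, 100),
  population = c(10, 20, 30, 40, 50, 60)   # hypothetical column
)

q1  <- quantile(df$refugees, 0.25, names = FALSE)
q3  <- quantile(df$refugees, 0.75, names = FALSE)
iqr <- q3 - q1

features <- df |>
  mutate(
    log_refugees  = log1p(refugees),
    refugee_ratio = refugees / population,            # refugees per resident
    size_band     = cut(refugees, breaks = c(-Inf, 2, 10, Inf),
                        labels = c("low", "medium", "high")),
    outlier_iqr   = refugees < q1 - 1.5 * iqr | refugees > q3 + 1.5 * iqr,
    z             = as.numeric(scale(refugees)),      # (x - mean) / sd
    outlier_z     = abs(z) > 3
  )

features
```

In this toy data the IQR rule flags the value 100 but the z-score rule flags nothing: with n = 6 no observation can exceed |z| = (n - 1)/√n ≈ 2.04, so a |z| > 3 cutoff only becomes meaningful for larger samples.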

---

## 🟦 Module 5: Correlation & Modeling (Lessons 12–15)

### 📗 Lesson 12: Correlation Analysis
- Pearson vs Spearman correlation
- Use `cor()`, `cor.test()` and interpret results
- [Hands-on] Correlation heatmap between variables
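
A sketch contrasting the two coefficients on a relation that is perfectly monotonic but curved, then testing one pair and drawing a heatmap with `geom_tile()`. The data are simulated and the column names hypothetical. `GGally::ggcorr()` (GGally is in the tools list) is a ready-made alternative for the heatmap.

```r
library(ggplot2)

# Pearson measures linear association; Spearman works on ranks
x <- 1:10
y <- x^3
cor(x, y, method = "pearson")    # below 1: the relation is curved
cor(x, y, method = "spearman")   # 1: the ranks agree perfectly

# Simulated data for illustration only
set.seed(5)
df <- data.frame(edu_spend = runif(50, 2, 7))
df$log_refugees <- 9 + 0.2 * df$edu_spend + rnorm(50)
df$gdp_pc_log   <- rnorm(50, 9, 1)

ct <- cor.test(df$edu_spend, df$log_refugees, method = "pearson")
ct$estimate   # sample correlation
ct$p.value    # test of H0: true correlation is 0

# Correlation matrix reshaped to long format for geom_tile()
m <- cor(df, method = "spearman")
long <- as.data.frame(as.table(m))   # columns Var1, Var2, Freq
names(long) <- c("var1", "var2", "r")

p_heat <- ggplot(long, aes(var1, var2, fill = r)) +
  geom_tile() +
  geom_text(aes(label = round(r, 2))) +
  scale_fill_gradient2(limits = c(-1, 1)) +
  labs(x = NULL, y = NULL, fill = "Spearman r") +
  theme_minimal()
p_heat
```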

---

### 📗 Lesson 13: Linear Regression Modeling
- Concept: What is a regression line?
- Use `lm()` for simple linear model
- Interpret output with `summary()`, visualize with `geom_smooth()`
- [Hands-on] Model: refugees ~ education expenditure
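
A minimal `lm()` sketch on simulated data with a known intercept (2) and slope (3), so you can check that the estimates recover them. Column names are hypothetical; with the real data you may prefer the log-transformed outcome from Lesson 10.

```r
library(ggplot2)

# Simulated data: true intercept 2, true slope 3, plus noise
set.seed(6)
df <- data.frame(edu_spend = runif(100, 0, 10))
df$refugees <- 2 + 3 * df$edu_spend + rnorm(100, sd = 1)

model <- lm(refugees ~ edu_spend, data = df)
summary(model)   # coefficients, standard errors, t-tests, R-squared
coef(model)      # estimates should be close to 2 and 3

p_fit <- ggplot(df, aes(edu_spend, refugees)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x) +   # same OLS line as lm()
  theme_minimal()
p_fit
```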

---

### 📗 Lesson 14: Regression Diagnostics
- Residuals, heteroskedasticity, outliers
- Use `broom::augment()` for fitted values and residuals, and `performance::check_model()` for visual checks
- [Hands-on] Plot residuals vs fitted values
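
A diagnostics sketch on a simulated model. `broom::augment()` adds fitted values and residuals to the data; `performance::check_model()` draws a multi-panel check and needs the `see` package for plotting, so the call is guarded.

```r
library(broom)
library(ggplot2)

# Simulated data for illustration only
set.seed(7)
df <- data.frame(edu_spend = runif(100, 0, 10))
df$refugees <- 2 + 3 * df$edu_spend + rnorm(100, sd = 1)
model <- lm(refugees ~ edu_spend, data = df)

# Data plus .fitted, .resid, .hat, .cooksd, .std.resid, ...
diag_df <- augment(model)

p_resid <- ggplot(diag_df, aes(.fitted, .resid)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(title = "Residuals vs fitted", x = "Fitted values", y = "Residuals") +
  theme_minimal()
p_resid
# A funnel shape suggests heteroskedasticity; isolated large residuals suggest outliers

if (requireNamespace("performance", quietly = TRUE) &&
    requireNamespace("see", quietly = TRUE)) {
  performance::check_model(model)
}
```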

---

### 📗 Lesson 15: Multiple Regression
- Add control variables (e.g., GDP per capita if available)
- Model comparison with `modelsummary::modelsummary()`
- [Hands-on] Build and compare 2 models
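
A sketch comparing a simple and a controlled model on simulated data; `gdp_pc_log` (log GDP per capita) is a hypothetical control variable. `output = "data.frame"` returns the table as data; drop that argument to get a formatted table in a notebook.

```r
library(modelsummary)

# Simulated data for illustration only
set.seed(8)
n <- 100
df <- data.frame(edu_spend = runif(n, 2, 7), gdp_pc_log = rnorm(n, 9, 1))
df$log_refugees <- 5 + 0.2 * df$edu_spend + 0.5 * df$gdp_pc_log + rnorm(n)

m1 <- lm(log_refugees ~ edu_spend, data = df)
m2 <- lm(log_refugees ~ edu_spend + gdp_pc_log, data = df)

# Side-by-side coefficient table
tab <- modelsummary(list("Simple" = m1, "With GDP control" = m2),
                    output = "data.frame")
tab

AIC(m1, m2)     # lower AIC: better fit for the number of parameters
anova(m1, m2)   # F-test for nested models: does gdp_pc_log improve fit?
```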

---

## 🟦 Module 6: Insights, Reports & Presentation (Lessons 16–20)

### 📗 Lesson 16: Interpret Model Results
- Coefficients, confidence intervals
- Practical vs. statistical significance
- [Hands-on] Write 3 findings based on model output
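
A sketch of extracting coefficients with confidence intervals and translating a slope into a more readable effect size, assuming the outcome is on the `log1p()` scale from Lesson 10. The data are simulated. A small p-value alone does not make an effect large; the percent change is what tells you whether it matters in practice.

```r
library(broom)

# Simulated data; log_refugees stands for log1p(refugees)
set.seed(9)
df <- data.frame(edu_spend = runif(200, 2, 7))
df$log_refugees <- 8 + 0.05 * df$edu_spend + rnorm(200, sd = 0.5)
model <- lm(log_refugees ~ edu_spend, data = df)

confint(model, level = 0.95)          # base R intervals
est <- tidy(model, conf.int = TRUE)   # same intervals as conf.low / conf.high
est

# On the log1p scale, a slope b multiplies (1 + refugees) by exp(b)
# per one-unit increase in spending, i.e. a 100 * (exp(b) - 1) percent change
b <- est$estimate[est$term == "edu_spend"]
pct_change <- 100 * (exp(b) - 1)
pct_change
```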

---

### 📗 Lesson 17: Visualizing Model Predictions
- Use `predict()`, plot predicted vs actual
- Use `geom_line()`, `geom_point()`, add trendlines
- [Hands-on] Plot prediction line and actual data points
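
A sketch of prediction plots on simulated data: a fitted line with a confidence band over a grid of spending values, then predicted vs. actual values. Column names are hypothetical.

```r
library(ggplot2)

# Simulated data for illustration only
set.seed(10)
df <- data.frame(edu_spend = runif(80, 2, 7))
df$log_refugees <- 8 + 0.3 * df$edu_spend + rnorm(80, sd = 0.5)
model <- lm(log_refugees ~ edu_spend, data = df)

# Predictions on a grid, with 95% confidence intervals (matrix: fit, lwr, upr)
grid <- data.frame(edu_spend = seq(min(df$edu_spend), max(df$edu_spend), length.out = 50))
grid <- cbind(grid, predict(model, newdata = grid, interval = "confidence"))

p_pred <- ggplot() +
  geom_point(data = df, aes(edu_spend, log_refugees), alpha = 0.5) +
  geom_ribbon(data = grid, aes(edu_spend, ymin = lwr, ymax = upr), alpha = 0.2) +
  geom_line(data = grid, aes(edu_spend, fit), color = "firebrick") +
  labs(x = "Education expenditure", y = "log1p(refugees)") +
  theme_minimal()
p_pred

# Predicted vs actual: points near the dashed 45-degree line indicate a close fit
df$predicted <- predict(model)
p_avp <- ggplot(df, aes(predicted, log_refugees)) +
  geom_point(alpha = 0.5) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  theme_minimal()
p_avp
```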

---

### 📗 Lesson 18: Build a Country Profile (Mini-Project)
- For one country, show trends over time, refugee statistics, and education spending
- Combine plots into a dashboard with `patchwork`
- [Hands-on] Create a 1-page profile for chosen country
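
A sketch of a one-page profile on a simulated panel where `"A"` stands in for your chosen country. `plot_annotation()` adds a shared title and caption to the combined layout.

```r
library(dplyr)
library(ggplot2)
library(patchwork)

# Simulated panel for illustration only
set.seed(11)
panel <- expand.grid(country = c("A", "B"), year = 2010:2020, stringsAsFactors = FALSE)
panel$refugees  <- round(runif(nrow(panel), 1000, 50000))
panel$edu_spend <- runif(nrow(panel), 2, 7)

profile <- panel |> filter(country == "A")

p1 <- ggplot(profile, aes(year, refugees)) + geom_line() + geom_point() +
  labs(title = "Refugees") + theme_minimal()
p2 <- ggplot(profile, aes(year, edu_spend)) + geom_line() + geom_point() +
  labs(title = "Education expenditure") + theme_minimal()
p3 <- ggplot(profile, aes(edu_spend, refugees)) + geom_point() +
  labs(title = "Spending vs. refugees") + theme_minimal()

# Two plots side by side on top, one full-width plot below
dashboard <- (p1 | p2) / p3 +
  plot_annotation(title = "Country profile: A", caption = "Simulated data")
dashboard
```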

---

### 📗 Lesson 19: Generate a Report with R Markdown (optional in Colab)
- Structure: Intro → Methods → Results → Insights
- Use `rmarkdown::render()` (outside Colab)
- Export to PDF/HTML (demonstrate manually)
- [Hands-on] Turn your analysis into a report
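
A sketch that writes a bare-bones `.Rmd` file following the Intro → Methods → Results → Insights structure and renders it only if pandoc is available. The file name and title are placeholders. `pdf_document` output additionally requires a LaTeX installation such as TinyTeX.

```r
# Placeholder file name; code chunks would go between ```{r} fences,
# but this skeleton uses inline R only
rmd_path <- "report.Rmd"
writeLines(c(
  "---",
  "title: \"Education Expenditure and Refugee Populations\"",
  "output: html_document",
  "---",
  "",
  "## Intro",
  "Research question and data sources.",
  "",
  "## Methods",
  "Cleaning, log1p transformation, and linear regression.",
  "",
  "## Results",
  "This report was rendered on `r Sys.Date()`.",
  "",
  "## Insights",
  "Key findings in plain language."
), rmd_path)

# Rendering needs the rmarkdown package and pandoc, which may be missing in Colab
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_path, output_format = "html_document", quiet = TRUE)
} else {
  message("pandoc not found: render this file in RStudio or another local setup.")
}
```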

---

### 📗 Lesson 20: Final Presentation & Poster
- Save charts with `ggsave()`
- Export visuals for Canva
- Create a final infographic for your project
- [Hands-on] Share insights in three slides or a one-page summary
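
A sketch of exporting a chart for Canva as a high-resolution PNG; the output path, dimensions, and chart are illustrative. `ggsave()` infers the format from the file extension and can also write `.svg` if the svglite package is installed.

```r
library(ggplot2)

dir.create("output/figures", recursive = TRUE, showWarnings = FALSE)

p <- ggplot(data.frame(x = 1:10, y = (1:10)^2), aes(x, y)) +
  geom_line() +
  labs(title = "Example chart") +
  theme_minimal(base_size = 14)   # larger text reads better on slides and posters

# 8 x 5 inches at 300 dpi with a white (non-transparent) background
ggsave("output/figures/example_chart.png", plot = p,
       width = 8, height = 5, dpi = 300, bg = "white")
```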

---

## 🏁 Course Outcome
By the end of the course, you will:
- Understand the full data science process
- Apply statistical and visual analysis in R
- Use real-world data to answer a meaningful question
- Communicate insights with clarity and confidence

---

## References

- Lander, J. P. (2017). *R for everyone: Advanced analytics and graphics* (2nd ed.). Addison-Wesley.
- James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). *An introduction to statistical learning: With applications in R*. Springer.
- Tan, P.-N., Steinbach, M., Karpatne, A., & Kumar, V. (2018). *Introduction to data mining* (2nd ed.). Pearson.