# Part 2, Lesson 4: Reproducible Workflows, R Markdown/Quarto & Final Projects
**Author:** Your Name  
**Date:** Block Lecture (4 hours)

## Reproducible Projects, Reporting with R Markdown/Quarto & Final Project Best Practices

Welcome to the **fourth 4-hour session** of Part 2! This lesson ties together everything from the course by focusing on:

1. **Recap & Q&A** (from Part 2, Lessons 1–3)
2. **Reproducible Workflows** & why they matter
3. **R Markdown & Quarto** for dynamic reporting
4. **Version Control & Collaboration**
5. **Final Project Guidance** & best practices
6. **Mini-Workshop**: Putting it all together

By the end, you’ll understand how to **structure** your projects, **document** your analyses, and **share** them with others in a transparent, reproducible way.


---
## 1. Recap & Q&A
Over the past lessons in **Part 2**, you learned:
1. **Database & SQL** fundamentals and advanced queries (joins, window functions).
2. Collecting data from **APIs** and **web scraping**.
3. Handling **JSON**, storing data in a database, and bridging SQL + R.

### Check-In
- Any open questions about your own data sources?
- Challenges with advanced SQL (window functions, indexing)?
- Successful scrapes or API calls?

We’ll now move to a higher-level **workflow** approach—how to keep your analysis well-organized and shareable.


---
## 2. Reproducible Workflows & Why They Matter
A **reproducible workflow** means that anyone (including your future self) can rerun the code on the same data and get the same results.

### 2.1 Key Principles
1. **Automation**: Script everything. Avoid manual data cleaning in spreadsheets.
2. **Documentation**: Explain your steps, assumptions, and transformations.
3. **Version Control**: Track changes to code and documentation (and, where practical, small data files).
4. **Environment Management**: Record which R packages and versions you used.
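
Principle 4 can be made concrete with a few lines of base R. The sketch below fixes the random seed and writes the current session details (R version, platform, attached packages) to a text file. The path `output/session_info.txt` is an assumption based on the folder layout in Section 2.3, and `save_session_info()` is a hypothetical helper name. The `renv` package goes further by recording exact package versions in a lockfile (`renv::snapshot()`).

```r
# Minimal sketch: record the computing environment alongside your results.
# Assumption: outputs go to an output/ folder (Section 2.3); adjust the path as needed.

# Fix the random seed so sampling or simulation steps give identical numbers on every run.
set.seed(2024)

# Write the R version, platform, and attached packages to a plain-text file.
save_session_info <- function(path = file.path("output", "session_info.txt")) {
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  writeLines(capture.output(sessionInfo()), path)
  invisible(path)
}

save_session_info()
```

Committing this file next to your report lets readers see exactly which R and package versions produced the results.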

### 2.2 Example Workflow Steps
1. Pull data from **API** or **database**.
2. Use **scripts** for cleaning and preprocessing.
3. Conduct **analysis** and **visualizations** in R.
4. Generate final **report** or **dashboard**.
5. Save everything in a structured, version-controlled **project**.
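
One common way to tie these steps together is a single driver script that runs the numbered scripts in order, so the whole pipeline can be rebuilt with one command. This is a sketch under assumptions: scripts live in `code/` with the `01_`, `02_`, ... naming from Section 2.3, and `run_pipeline()` is a hypothetical helper name.

```r
# run_all.R (hypothetical): rerun the whole analysis from raw data to outputs.
# Assumes scripts sit in code/ and start with a numeric prefix (01_, 02_, ...).
run_pipeline <- function(code_dir = "code") {
  scripts <- sort(list.files(code_dir, pattern = "^[0-9]+_.*\\.R$", full.names = TRUE))
  for (script in scripts) {
    message("Running ", basename(script))
    # Each script gets a fresh environment (child of the global one), so objects
    # don't leak between steps; steps hand results over via files in data/processed/.
    source(script, local = new.env(parent = globalenv()))
  }
  invisible(basename(scripts))
}

# From the project root:
# run_pipeline()
```

Running each step in its own environment is a deliberate choice: if a later script silently depends on an object that only exists in your interactive session, the pipeline fails instead of giving results nobody else can reproduce.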


### 2.3 Project Folder Structure
```text
my-project/
├─ data/
│   ├─ raw/             # unmodified source data
│   └─ processed/       # cleaned or intermediate data
├─ code/
│   ├─ 01_scrape_api.R
│   ├─ 02_clean_data.R
│   └─ 03_analysis.R
├─ docs/
│   ├─ final_report.Rmd # or .qmd
│   └─ references.md
├─ output/
│   └─ final_visuals/
├─ .git/               # version control (created by git init)
├─ README.md            # overview of the project
└─ my-project.Rproj     # R project file (optional, if using RStudio)
```
Numbered scripts make the run order explicit, and a fixed layout means collaborators (and future re-runs) always know where raw data, code, and outputs live.
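
If you want to set up this layout without clicking through a file browser, a few lines of base R will do it. This is a minimal sketch; `create_project_skeleton()` is a hypothetical helper name. It creates only the folders and a starter README, not the `.git/` folder (that comes from `git init`, Section 4) or the `.Rproj` file (created by RStudio).

```r
# Create the folder layout shown above inside `root` (hypothetical helper).
create_project_skeleton <- function(root = "my-project") {
  dirs <- c("data/raw", "data/processed", "code", "docs", "output/final_visuals")
  for (d in dirs) {
    dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
  }
  # Start a README only if one doesn't exist yet, so reruns never overwrite it.
  readme <- file.path(root, "README.md")
  if (!file.exists(readme)) {
    writeLines(c(paste("#", basename(root)), "", "Overview of the project goes here."), readme)
  }
  invisible(file.path(root, dirs))
}

# Usage:
# create_project_skeleton("my-project")
```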


---
## 3. R Markdown & Quarto
### 3.1 What They Are
- **R Markdown**: A file format (`.Rmd`) that combines **Markdown** text + **R code chunks**. You “knit” it into HTML/PDF/Word.
- **Quarto**: A next-generation publishing system that supports multiple languages (R, Python, Julia, Observable JS) and more output types (documents, presentations, websites, books, etc.). `.qmd` is a Quarto file.

### 3.2 Creating a Simple R Markdown
```yaml
---
title: "My Analysis"
author: "My Name"
output: html_document
---
```
Below the header, mix Markdown text with R code chunks:
```{r}
summary(mtcars)
```
When you **knit** (the **Knit** button in RStudio, or `rmarkdown::render()` in the console), the result is an **HTML** file with the code output embedded.

### 3.3 Quarto Example
The header of `my_report.qmd`:
```yaml
---
title: "My Analysis"
author: "My Name"
format: html
---
```
```{r}
summary(mtcars)
```
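Then run `quarto render my_report.qmd` in a terminal (or click **Render** in RStudio). The idea is the same as R Markdown; note that Quarto uses `format:` in the header where R Markdown uses `output:`, and it supports more languages and output types.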
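
Both formats can also be rendered from the R console. The sketch below picks the renderer from the file extension. It assumes the `rmarkdown` package is installed for `.Rmd` files, or the `quarto` R package plus the Quarto CLI for `.qmd` files; `renderer_for()` and `render_report()` are hypothetical helper names.

```r
# Decide which renderer a file needs, based on its extension.
renderer_for <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    rmd = "rmarkdown",
    qmd = "quarto",
    stop("Not an R Markdown or Quarto file: ", path)
  )
}

# Render with the matching package; only the package for your format must be installed.
render_report <- function(path) {
  switch(renderer_for(path),
    rmarkdown = rmarkdown::render(path),
    quarto = quarto::quarto_render(path)
  )
}

# Usage:
# render_report("docs/final_report.Rmd")
# render_report("my_report.qmd")
```

Putting the render call in a script (rather than only clicking a button) means the final report can be rebuilt as the last step of the pipeline.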


### 3.4 Benefits
- All analysis steps are **self-contained** in one document.
- Easy to share with collaborators or editors.
- Reduces the risk of **copy-paste** errors, because every number, table, and plot is regenerated from the code each time the document is rendered.
- Encourages good documentation.

> **Exercise**: Create a short `.Rmd` or `.qmd` file that reads a CSV, does a quick summary, and outputs a plot. Knit (or render) it to HTML.
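
A possible starting point for the exercise: the R code below is what the chunks of such a document might contain. To keep it self-contained, it first writes the built-in `mtcars` data to a temporary CSV (a stand-in for your own file) and saves the plot to a temporary PDF. In a real `.Rmd`/`.qmd` you would point `read.csv()` at a file in `data/raw/` and call `plot()` on its own so the figure appears inline.

```r
# Stand-in data: write the built-in mtcars data to a temporary CSV.
# Replace csv_path with your own file, e.g. "data/raw/my_data.csv".
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path)

# Chunk 1: read the CSV. The car names were saved as an unnamed first column.
cars <- read.csv(csv_path)
names(cars)[1] <- "model"

# Chunk 2: a quick summary of a few columns.
summary(cars[, c("mpg", "hp", "wt")])

# Chunk 3: a scatter plot. In a .Rmd/.qmd chunk, call plot() alone and the figure
# appears inline; here it is saved to a temporary PDF so the script runs on its own.
plot_path <- tempfile(fileext = ".pdf")
pdf(plot_path)
plot(cars$wt, cars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Fuel economy vs. weight (mtcars)")
invisible(dev.off())
```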


---
## 4. Version Control & Collaboration
### 4.1 Why Version Control?
- Track **changes** over time.
- Revert to previous commits if something breaks.
- Collaborate on code with team members.

### 4.2 Basic Git Workflow
1. `git init` to create a local repo.
2. `git add .` and `git commit -m "Initial commit"`.
3. `git remote add origin ...` (if using GitHub/GitLab).
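4. `git push -u origin main` (or `master`, depending on your default branch name) to upload your commits; `-u` sets the upstream branch, so later pushes only need `git push`.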

### 4.3 Git + R Markdown Tips
- The `.Rmd` or `.qmd` source is text-based, so Git can track changes easily.
- Avoid committing large binary files or data where possible. Instead, keep data in a separate repository or location and document in `README.md` how to obtain it.
- Use `.gitignore` to exclude files like `.Rhistory`, `.RData`, or large CSVs if needed.
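
As a sketch, a `.gitignore` for the project layout from Section 2.3 could be written from R like this. The entries are suggestions, not a standard list: `.Rproj.user/` is RStudio's per-user folder, and ignoring `data/raw/` assumes the raw data is stored and documented elsewhere; add patterns such as `*.csv` if needed. `write_gitignore()` is a hypothetical helper name.

```r
# Add entries to a project's .gitignore, skipping ones that are already listed.
write_gitignore <- function(root = ".",
                            entries = c(".Rhistory", ".RData", ".Rproj.user/", "data/raw/")) {
  path <- file.path(root, ".gitignore")
  existing <- if (file.exists(path)) readLines(path, warn = FALSE) else character(0)
  new_entries <- setdiff(entries, existing)
  if (length(new_entries) > 0) {
    # Rewrite the whole file so every entry ends up on its own line.
    writeLines(c(existing, new_entries), path)
  }
  invisible(readLines(path, warn = FALSE))
}

# From the project root, before the first commit:
# write_gitignore()
```

Running it again is harmless: entries already present are not duplicated.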

> **Exercise**: Initialize a Git repo in your project folder. Commit your scripts and `.Rmd` file. Optionally create a private GitHub repo and push.


---
## 5. Final Project Guidance & Best Practices
### 5.1 Structuring Your Analysis
1. **Define a clear question** or hypothesis.
2. **Gather data** from reliable sources (APIs, databases, etc.).
3. **Clean & preprocess** the data thoroughly (document decisions!).
4. Perform **EDA** (exploratory data analysis) and consider advanced methods if needed (SQL window functions, regression, etc.).
5. Create **visualizations** that effectively communicate your findings.
6. Summarize in an **R Markdown/Quarto** report or a **Shiny** app.

### 5.2 Common Pitfalls
- **Lack of documentation**: Hard for others to replicate.
- **Mismatched data types**: e.g., numbers read in as text, or character vectors vs. factors.
- **Ignoring context**: confusing statistical significance with real-world importance.
- **Poor time management**: Data collection or cleaning can be time-consuming.
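
The data-type pitfall is easy to demonstrate. In the sketch below (the values are made up), a numeric column arrives as text, for example after scraping, and is turned into a factor. Converting the factor straight to numbers returns its internal level codes, not the original values.

```r
# Numbers that arrived as text (e.g., scraped from a web page). Made-up values.
population_txt <- c("10", "5", "20")

# Turning them into a factor and then straight into numbers is a classic bug:
population_fac <- factor(population_txt)
as.numeric(population_fac)   # level codes, not the values you expect

# Correct: convert the factor's labels (the original text) to numbers.
population <- as.numeric(as.character(population_fac))
population                   # 10 5 20

# Check column types early, e.g. with str(), before analysing.
str(data.frame(population_txt, population))
```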

### 5.3 Presenting Your Work
- Focus on the **story** or **angle**, as a journalist would: what is the key finding, and why does it matter to the reader?
- Provide **methodology**: Show how data was obtained and cleaned.
- Consider **ethics/privacy**: If personal data is involved, anonymize or handle securely.
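
One simple step before sharing data is to replace names with neutral IDs. The sketch below does this in base R. Note that this is pseudonymization, not full anonymization: other columns (age, location, dates) can still identify people, and the lookup table linking IDs to names must be kept private. `pseudonymize()` and the example data are hypothetical.

```r
# Replace each distinct value in `x` with an ID like "person_001".
pseudonymize <- function(x, prefix = "person") {
  keys <- unique(x)
  ids <- sprintf("%s_%03d", prefix, seq_along(keys))
  list(
    ids = ids[match(x, keys)],                      # pseudonymized column
    lookup = data.frame(original = keys, id = ids)  # store securely, never publish
  )
}

# Hypothetical example data.
interviews <- data.frame(
  name = c("Jane Doe", "John Roe", "Jane Doe"),
  quote_length = c(120, 45, 80)
)
res <- pseudonymize(interviews$name)
interviews$name <- res$ids
interviews
```

The same person always gets the same ID, so counts and joins still work after the names are removed.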

### 5.4 Example Outline for a Thesis or Journalistic Piece
1. **Introduction**: Motivating question or story background.
2. **Data**: Source, licensing, initial structure.
3. **Methods**: Cleaning steps, analysis approach, software used.
4. **Findings**: Key tables, plots, insights.
5. **Discussion**: Interpret results, limitations.
6. **Conclusion**: Summarize main takeaway.
7. **Appendix**: R Markdown code, references, further tables.


---
## 6. Mini-Workshop: Putting It All Together
### 6.1 Proposed Activities
1. **Create a short R Markdown** (or Quarto) document that:
   - Loads a dataset (CSV or from a **SQL** database).
   - Performs a quick cleaning step (rename columns, filter rows).
   - Runs a simple summary or **SQL** query if using a DB.
   - Produces one or two **plots**.
   - Outputs a **conclusion** or short narrative.
2. **Use Git** to track changes:
   - Initialize a local repository.
   - Commit your `.Rmd/.qmd` & associated scripts.
   - Optionally push to GitHub.
3. **Share** your final HTML/PDF with the group or instructor.
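
For the database variant of the workshop, a chunk along these lines loads a table into SQLite, runs a query, and plots the result. It assumes the `DBI` and `RSQLite` packages are installed and uses an in-memory database filled with the built-in `mtcars` data as a stand-in; swap in your own connection and query.

```r
library(DBI)

# Stand-in database: an in-memory SQLite DB with one table.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
cars <- mtcars
cars$model <- rownames(mtcars)   # keep car names as a real column
dbWriteTable(con, "cars", cars)

# SQL query: rename columns with AS and summarise mpg by number of cylinders.
mpg_by_cyl <- dbGetQuery(con, "
  SELECT cyl AS cylinders, COUNT(*) AS n, AVG(mpg) AS mean_mpg
  FROM cars
  GROUP BY cyl
  ORDER BY cyl
")
dbDisconnect(con)

mpg_by_cyl

# Plot (in a knitted document this appears inline).
barplot(mpg_by_cyl$mean_mpg, names.arg = mpg_by_cyl$cylinders,
        xlab = "Cylinders", ylab = "Average miles per gallon")
```

A short narrative paragraph after the plot, interpreting the numbers, completes the document.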

### 6.2 Discussion
- Did you encounter any issues with data or dependencies?
- What was challenging to document?
- How would you expand this approach for a bigger project?


---
## Wrap-Up & Next Steps
You’ve now completed **Part 2, Lesson 4**, concluding this advanced module. Here’s a quick summary:

1. Importance of **reproducible workflows**: consistent file structures, minimal manual steps.
2. **R Markdown/Quarto**: a powerful way to blend text + code.
3. **Version control**: collaborating, tracking changes, ensuring code integrity.
4. **Final project best practices**: structure, clarity, documentation, ethical considerations.

### Where to Go from Here?
- Extend your knowledge with **Shiny** for interactive dashboards.
- Delve into **machine learning** or advanced statistical methods if relevant.
- Explore **Quarto** for building entire websites, books, or multi-language documents (mixing R, Python, or Julia).
- Keep practicing with real datasets, applying all the cleaning, SQL, scraping, and documentation techniques.

**Congratulations** on completing this multi-part course! You now have a solid foundation in R for data preprocessing, SQL, APIs, web scraping, advanced data wrangling, and reproducible reporting. Use these skills to tackle your final thesis or journalism projects with confidence.

# End of Part 2, Lesson 4
