# Welcome to the Diploma Thesis Seminar II R & SQL Block Course

## Course Overview

Welcome to this course of two block lessons, where we will cover the basics of **data preprocessing** and **cleaning**, plus an **introduction to SQL** using R. This notebook gives you a quick overview of:

- Course goals
- Topics covered & format
- Technical setup and resources
- How to get help
- Course policies and expectations
- Next steps

I believe these materials will greatly help you when dealing with real-life data for your diploma thesis.


## 1. Learning Goals

- Learn how to **set up** and **navigate** R (and RStudio/Jupyter) for data analysis.
- Understand **data importing/exporting**, cleaning techniques, and best practices in **data preprocessing**.
- Explore **basic Exploratory Data Analysis (EDA)** and simple visualization to understand your data.
- Get a **practical introduction to SQL** in R, including how to query datasets and join tables.
- Gain confidence applying these skills to real-world data for your **journalism projects** or **final thesis**.


## 2. Course Format

We have **8 hours** of instruction, split into **two 4-hour blocks**. Each block is hands-on, so come prepared to **code along** and **experiment**. Here’s a high-level look at what we’ll do:

### Block 1 (4 hours)

1. **Environment Setup & Basic R Recap**
   - Installing/loading packages, checking your R/Jupyter environment, refreshing R syntax.
2. **Data Import & Export**
   - Using built-in functions and `tidyverse` tools to read/write CSV, Excel, etc.
3. **Data Cleaning & Preprocessing**
   - Handling missing values, renaming columns, converting data types.
4. **Basic EDA**
   - Summaries, quick plots (histograms, bar plots, boxplots) to understand data quality.
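
To give a flavor of the Block 1 steps (import, clean, convert types, summarize), here is a minimal sketch in base R. The survey data, column names, and file paths are invented for illustration and are not the course datasets; in class we may well use `tidyverse` equivalents instead.

```r
# Hypothetical messy CSV, written to a temporary file for illustration only.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Respondent ID,Age,Outlet",
  "1,34,print",
  "2,,online",
  "3,29,",
  "4,forty-one,online"
), path)

# Import: keep the original column names and treat empty fields as missing.
survey <- read.csv(path, check.names = FALSE, na.strings = "",
                   stringsAsFactors = FALSE)

# Rename columns to short, consistent names.
names(survey) <- c("id", "age", "outlet")

# Convert types: non-numeric entries such as "forty-one" become NA.
survey$age <- suppressWarnings(as.integer(survey$age))

# Basic EDA: count missing values per column and summarize.
missing_counts <- colSums(is.na(survey))
print(missing_counts)
print(summary(survey$age))
print(table(survey$outlet, useNA = "ifany"))

# Export the cleaned data to a new temporary CSV file.
clean_path <- tempfile(fileext = ".csv")
write.csv(survey, clean_path, row.names = FALSE)
```

Silently coercing unparseable values to `NA` is convenient for a sketch, but with real thesis data it is usually worth inspecting those rows before discarding the information.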

### Block 2 (4 hours)

1. **Data Reshaping & Combining**
   - Using `dplyr` joins, pivoting data with `tidyr`.
2. **Introduction to SQL in R**
   - Working with `sqldf` or `DBI` + `RSQLite`, running basic queries (`SELECT`, `JOIN`, etc.).
3. **Real-World Workflow & Case Study**
   - End-to-end demonstration (import → clean → transform → query → visualize).
4. **Wrap-up & Thesis Guidance**
   - Best practices, tips for data documentation, referencing code, further resources.
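
As a preview of the SQL part of Block 2, the sketch below loads two small data frames into an in-memory SQLite database with `DBI` and `RSQLite`, then runs a `SELECT` with a `JOIN`. The tables and their contents are hypothetical, and the sketch assumes both packages are installed (as they should be in Binder).

```r
library(DBI)

# Hypothetical tables, invented for illustration.
articles <- data.frame(
  article_id = 1:4,
  author_id  = c(10L, 10L, 20L, 30L),
  words      = c(800L, 1200L, 450L, 950L)
)
authors <- data.frame(
  author_id = c(10L, 20L),
  name      = c("Novak", "Svoboda"),
  stringsAsFactors = FALSE
)

# Create a temporary in-memory database and copy the data frames into it.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "articles", articles)
dbWriteTable(con, "authors", authors)

# Inner join: articles whose author_id has no match in authors are dropped.
per_author <- dbGetQuery(con, "
  SELECT au.name,
         COUNT(*)      AS n_articles,
         SUM(ar.words) AS total_words
  FROM articles AS ar
  JOIN authors  AS au ON ar.author_id = au.author_id
  GROUP BY au.name
  ORDER BY au.name
")
print(per_author)

# Always close the connection when finished.
dbDisconnect(con)
```

An in-memory database disappears when the connection closes, which keeps experiments self-contained; replacing `":memory:"` with a file path would persist the tables instead.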


## 3. Requirements & Setup

- **Software**: We will use [Binder](https://mybinder.org) so that you can start coding right away without setting up a local environment. If you want to set up the required software locally anyway, you’ll need a working installation of **R** (4.0 or above recommended) and **Jupyter** with an **R kernel**, or **RStudio** if you prefer.
- **Packages**: We’ll use libraries such as `tidyverse`, `readxl`, `DBI`, `RSQLite`, and `sqldf`. We’ll install them together as part of the course if needed, but in Binder, they will be pre-installed for you.
- **Hardware**: A laptop with at least 4 GB of memory. If you plan to work with large datasets, more memory is better.
- **Data**: We’ll provide example datasets. You’re also encouraged to bring data relevant to your thesis or journalistic projects.


## 4. Getting Help

- **During class**: Don’t hesitate to **ask questions**. We’ll troubleshoot code and environment issues as we go.
- **Online resources**:
  - R for Data Science (Hadley Wickham & Garrett Grolemund)
  - RStudio cheat sheets (https://rstudio.com/resources/cheatsheets/)
  - Online tutorials for SQL basics (w3schools, etc.)


## 5. Course Policies & Expectations

- **Attendance**: Because this is a hands-on course, we strongly recommend that you attend both sessions live.
- **Participation**: We’ll do in-class exercises. Bring your laptop, follow the demonstrations, and attempt the practice tasks.
- **Collaboration**: You’re encouraged to help each other and discuss coding solutions. However, for individual assignments (if any), submit your own work.
- **Respect**: We value an inclusive environment. Be respectful of peers’ questions and coding experience levels.


## 6. Next Steps

1. **Confirm your environment**: In Binder, the environment should already be set up for you. If you wish to work locally, make sure you can open Jupyter Notebooks with an R kernel (or RStudio if that’s your preference).
2. **Check your packages**: When working locally, install or update `tidyverse`, `readxl`, `DBI`, `RSQLite`, and `sqldf` if you’d like to get a head start.
3. **Download example files**: If you wish, you can download the sample CSV/Excel files for the first lesson.
4. **Come ready to learn**: Have your questions or potential datasets ready for the interactive parts.
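
If you work locally, a quick check along the following lines can confirm the first two steps. This is only a sketch: the package list is taken from the setup section above, and the install line is left commented out so that nothing is installed without your say-so.

```r
# Packages named in the setup section of this notebook.
required <- c("tidyverse", "readxl", "DBI", "RSQLite", "sqldf")

# requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All course packages are installed.")
}

# Check the R version against the recommended minimum of 4.0.
r_ok <- getRversion() >= "4.0.0"
message("R version: ", R.version.string,
        if (r_ok) " (OK)" else " (older than 4.0, consider updating)")
```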

---

**Created by Petr Čala**

_Last updated: [2025-02-26]_
