```{thebe-button}

```


---
<div align="center">

# Exploratory Data Analysis 
<!-- logo -->
<img src="../slides/IDA-Logo.png" alt="IDA Logo" width="400" style="margin-bottom: 1em;"/>


## An Introduction to EDA in R and Python for IDA Staff




### Institute for Defense Analyses
#### 730 East Glebe Road · Alexandria, Virginia 22305

---
</div>
<audio controls src="../audio/EDA Intro.m4a">
</audio>



---
<div style="display: flex; align-items: flex-start; gap: 2rem;">

  <!-- Left column: text -->
  <div style="flex: 1; min-width: 0;">

  ### Exploratory Data Analysis

  **What is Exploratory Data Analysis (EDA)?**  
  - Systematic approach to summarizing and visualizing data before modeling  
  - Helps uncover structure, spot anomalies, and test assumptions  

  **Key EDA Steps**  
  - Data cleaning & handling missing values  
  - Descriptive statistics (mean, median, variance)  
  - Visual inspections (histograms, box plots, scatterplots)  

  **Why EDA First?**  
  - Prevents “garbage in, garbage out” in downstream analyses  
  - Builds your intuition—know your data before you trust your models  

  </div>

  <!-- Right column: image + citation -->
  <div style="flex: 0 0 40%; text-align: center;">
      
  <img src="../slides/eda.png" alt="EDA illustration" width="400" style="margin-bottom: 1em;"/>

  <p style="font-size: 0.8em; line-height: 1.2; color: #555; text-align: left;">
  Azevedo, N. (2023, July 21). <em>What is Exploratory Data Analysis? Steps &amp; Examples.</em> Scalable Path.<br/>
  Retrieved June 25, 2025, from <a href="https://www.scalablepath.com/data-science/exploratory-data-analysis" target="_blank">scalablepath.com/eda</a>
  </p>

  </div>

</div>
<audio controls src="../audio/EDA Slide 2.m4a">
</audio>

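The descriptive statistics named under Key EDA Steps (mean, median, variance) can be sketched in a few lines of Python. This is a minimal illustration using only the standard-library `statistics` module. The `response_times` list and its values are made up for this example and do not come from any course dataset.

```python
import statistics

# Hypothetical sample: these values are invented for illustration only.
response_times = [12.0, 15.0, 11.0, 14.0, 13.0, 48.0, 12.0]

mean_value = statistics.mean(response_times)          # arithmetic average
median_value = statistics.median(response_times)      # middle value when sorted
variance_value = statistics.variance(response_times)  # sample variance (divides by n - 1)

print(f"mean={mean_value:.2f}  median={median_value:.2f}  variance={variance_value:.2f}")

# A mean well above the median suggests a few large values are pulling
# the average up, which is worth checking with a histogram or box plot.
if mean_value > median_value:
    print("Mean exceeds median: look for large values on the high end.")
```

Here the single large value (48.0) pulls the mean above the median, which is the kind of signal that a visual inspection would then confirm.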

---
<div style="display: flex; align-items: flex-start; gap: 2rem;">

  <!-- Left column: text -->
  <div style="flex: 1; min-width: 0;">

  ### Motivation – Why Does EDA Even Matter?

  **Reveals Hidden Patterns in the Data**  
  _which informs_

  - **Anomaly Detection**  
  - **Resource Allocation**  
  - **Operational Planning**  
  - **Risk Mitigation**

  </div>

  <!-- Right column: slide image -->
  <div style="flex: 0 0 40%; text-align: center;">

  <img src="../slides/hidden.png" alt="Motivation – Why Does EDA Even Matter?" style="max-width: 50%; height: auto;"/>

  </div>
</div>
<audio controls src="../audio/EDA Slide 3.m4a">
</audio>


---

## Next: Data Cleaning

The very first step in any analysis is **data cleaning**. Cleaning your data—by finding missing values, fixing data types, and handling outliers—lays the groundwork for reliable, accurate results. 

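As a rough preview, the three cleaning tasks mentioned above might look like this in pandas. This is only a sketch under stated assumptions: the `unit` and `hours` columns and their values are hypothetical, and filling with the median and flagging with the 1.5 × IQR rule are common illustrative choices, not necessarily the ones used in the notebooks.

```python
import pandas as pd

# Hypothetical toy data: column names and values are invented for illustration.
raw = pd.DataFrame(
    {
        "unit": ["A", "B", "C", "D", "E", "F"],
        # Numbers stored as text, with one missing entry.
        "hours": ["10", "12", None, "11", "95", "9"],
    }
)

# 1. Find missing values: count the missing entries in each column.
missing_counts = raw.isna().sum()

# 2. Fix data types: convert text to numbers (unparseable values become NaN).
clean = raw.copy()
clean["hours"] = pd.to_numeric(clean["hours"], errors="coerce")

# 3. Fill missing values with the column median (one option among several).
clean["hours"] = clean["hours"].fillna(clean["hours"].median())

# 4. Flag outliers: values more than 1.5 * IQR outside the middle 50% of the data.
q1 = clean["hours"].quantile(0.25)
q3 = clean["hours"].quantile(0.75)
iqr = q3 - q1
clean["is_outlier"] = (clean["hours"] < q1 - 1.5 * iqr) | (clean["hours"] > q3 + 1.5 * iqr)

print(missing_counts)
print(clean)
```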
Proceed to the Data Cleaning section to get started.

---

### Available Notebooks

To support you as you learn the fundamentals of data cleaning, we’ve created two guided, interactive notebooks—no prior coding or data experience required:

- **Python Notebook**: `01_data_cleaning_python.ipynb`  
  Walks you through each step using the **pandas** library. You’ll learn how to load your dataset, spot and fill missing values, correct data types, and flag outliers—with clear explanations and examples along the way.

- **R Notebook**: `01_data_cleaning_R.ipynb`  
  Covers the exact same tasks using the **tidyverse** packages (`readr`, `dplyr`, `tidyr`). Each step is broken down in plain language so you can follow even if this is your first time in R.

> **Tip for Beginners**: Start with whichever notebook feels more approachable, even if both languages are new to you. Then work through the other to see how the same cleaning steps look in a different tool. This side-by-side practice builds confidence and gives you flexibility for future analyses.

<audio controls src="../audio/Intro transition.m4a">
</audio>