# Module 01: Literature Review Basics

Welcome to Module 01! Now that you understand what research is, let's learn how to **build on existing knowledge** through literature reviews.

## What You'll Learn

- What a literature review is and why it matters
- Different types of research papers
- How academic papers are structured
- How to organize your reading effectively
- Building a knowledge foundation for your research

## Prerequisites

- Completed Module 00
- Curiosity about how knowledge is created!

## Time Required

**30 minutes**

---

In [None]:
# ========================================
# Setup
# ========================================

import os
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Create output directory
output_dir = "outputs/notebook_01"
os.makedirs(output_dir, exist_ok=True)

print("✅ Setup complete!")
print(f"Output directory: {output_dir}")

## Part 1: What is a Literature Review?

### Definition

A **literature review** is a comprehensive survey of existing research on a specific topic. It's like creating a map of what is already known before you venture into unknown territory.

### Why Do Literature Reviews?

#### Real-World Analogy
Imagine you want to open a coffee shop:
- ❌ **Bad approach**: Just open it without research
- ✅ **Good approach**: Study existing coffee shops, customer preferences, successful strategies

Similarly, in research:
- ❌ **Bad approach**: Jump straight into experiments
- ✅ **Good approach**: Review what others have discovered first

### Key Benefits

1. **Avoid Reinventing the Wheel**
   - Someone may have already solved your problem
   - Build on their work instead of starting from scratch

2. **Learn from Others' Mistakes**
   - See what didn't work and why
   - Avoid common pitfalls

3. **Identify Research Gaps**
   - Find what hasn't been studied yet
   - Discover opportunities for original contribution

4. **Understand Context**
   - See how your work fits into the bigger picture
   - Connect different research streams

5. **Build Credibility**
   - Show you understand the field
   - Demonstrate thorough preparation

## Part 2: Types of Research Papers

Not all papers are the same! Understanding different types helps you know what to expect.

### 1. Original Research Papers

**What**: Present new, original findings

**Structure**:
- Introduction (background, research question)
- Methods (how they did it)
- Results (what they found)
- Discussion (what it means)
- Conclusion

**Example**: "Deep Learning for Stock Price Prediction: A Novel LSTM Approach"

**When to read**: When you want specific techniques or results

### 2. Review Papers / Survey Papers

**What**: Summarize existing research on a topic

**Purpose**: Give you an overview of a field

**Example**: "A Survey of Machine Learning Methods for Time Series Forecasting"

**When to read**: When you're new to a topic (start here!)

### 3. Meta-Analysis Papers

**What**: Statistically combine results from multiple studies

**Purpose**: Find overall trends across many studies

**Example**: "Effectiveness of Deep Learning in Medical Imaging: A Meta-Analysis"

**When to read**: When you want evidence-based conclusions

### 4. Position/Opinion Papers

**What**: Author's perspective on a topic

**Purpose**: Stimulate discussion, propose new directions

**Example**: "Why Current AI Benchmarks Are Misleading"

**When to read**: For different perspectives and future directions

### 5. Technical Reports / White Papers

**What**: Detailed technical documentation

**Purpose**: Document systems, methods, or products

**Example**: "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" (Google)

**When to read**: For implementation details

In [None]:
# ========================================
# Visualize Paper Types Distribution
# ========================================

# Illustrative distribution of paper types in a literature review
# (example values for the chart, not measured data)
paper_types = [
    "Original\nResearch",
    "Review/\nSurvey",
    "Meta-\nAnalysis",
    "Position/\nOpinion",
    "Technical\nReports",
]
percentages = [50, 25, 10, 8, 7]
colors = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]

# Create pie chart
fig, ax = plt.subplots(figsize=(10, 7))

wedges, texts, autotexts = ax.pie(
    percentages,
    labels=paper_types,
    autopct="%1.0f%%",
    colors=colors,
    startangle=90,
    textprops={"fontsize": 11},
)

# Make percentage text bold
for autotext in autotexts:
    autotext.set_color("white")
    autotext.set_fontweight("bold")
    autotext.set_fontsize(12)

ax.set_title(
    "Typical Distribution of Paper Types in a Literature Review",
    fontsize=14,
    fontweight="bold",
    pad=20,
)

plt.tight_layout()
plt.savefig(f"{output_dir}/paper_types_distribution.png", dpi=150, bbox_inches="tight")
print(f"✅ Chart saved to: {output_dir}/paper_types_distribution.png")

plt.show()

print("\n💡 Pro Tip:")
print("Start with review papers to get an overview, then dive into")
print("original research papers for specific details!")

## Part 3: Anatomy of a Research Paper

Understanding paper structure helps you read more efficiently!

### Typical Research Paper Structure

#### 1. Title
- **What to look for**: Is this relevant to your research?
- **Tip**: Use titles for quick filtering

#### 2. Abstract (150-300 words)
- **Contains**: Problem, methods, results, conclusions
- **What to look for**: High-level summary
- **Tip**: Read this FIRST to decide if you should continue

#### 3. Introduction
- **Contains**: Background, motivation, research question
- **What to look for**: Why this research matters
- **Typical length**: 1-3 pages

#### 4. Related Work / Literature Review
- **Contains**: Summary of existing research
- **What to look for**: What's been done before, research gaps
- **Tip**: Great source for finding more papers!

#### 5. Methodology / Methods
- **Contains**: How the research was conducted
- **What to look for**: Experimental design, data, algorithms
- **Tip**: This is usually the most technical section; skim it on a first read

#### 6. Results
- **Contains**: Findings, data, figures, tables
- **What to look for**: Key outcomes
- **Tip**: Pay attention to figures and tables

#### 7. Discussion
- **Contains**: Interpretation, implications, limitations
- **What to look for**: What results mean, study limitations
- **Tip**: Understand the "so what?" factor

#### 8. Conclusion
- **Contains**: Summary, future work
- **What to look for**: Main takeaways, research directions

#### 9. References
- **Contains**: Cited papers
- **What to look for**: More papers to read!
- **Tip**: Work backwards through citations

### Reading Strategy: The 3-Pass Approach

**Pass 1: Quick Scan (5-10 minutes)**
- Read: Title, Abstract, Introduction, Conclusion
- Look at: Figures and tables
- Decide: Is this worth reading in detail?

**Pass 2: Deeper Read (30-60 minutes)**
- Read: Everything except detailed proofs/derivations
- Note: Key points, methods, results
- Decide: Do I need to understand this deeply?

**Pass 3: Full Understanding (2-4 hours)**
- Read: Everything in detail
- Try: Reproducing results, understanding proofs
- Goal: Complete mastery

**Note**: Not every paper needs Pass 3!
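
The time ranges above also let you estimate how much reading a review will cost before you start. The sketch below is one way to do that, under two assumptions that are not in the text: each pass costs the midpoint of its range, and a paper that reaches a later pass has also been through the earlier ones. `PASS_MINUTES`, `estimate_reading_hours`, and the example plan are illustrative names and numbers.

```python
# Midpoint of each pass's time range from the text above, in minutes.
# Using midpoints is an assumption made for this sketch.
PASS_MINUTES = {
    "Pass 1": (5 + 10) / 2,     # 7.5 minutes
    "Pass 2": (30 + 60) / 2,    # 45 minutes
    "Pass 3": (120 + 240) / 2,  # 180 minutes
}


def estimate_reading_hours(papers_per_pass):
    """Return estimated total hours, given how many papers stop at each pass.

    Costs are cumulative: a paper that stops at Pass 2 has also had Pass 1,
    and a paper that stops at Pass 3 has had all three passes.
    """
    cumulative = 0.0
    total_minutes = 0.0
    for pass_name in ("Pass 1", "Pass 2", "Pass 3"):
        cumulative += PASS_MINUTES[pass_name]
        total_minutes += cumulative * papers_per_pass.get(pass_name, 0)
    return total_minutes / 60


# Hypothetical plan: 30 papers scanned only, 8 read more deeply, 2 mastered.
plan = {"Pass 1": 30, "Pass 2": 8, "Pass 3": 2}
print(f"Estimated reading time: {estimate_reading_hours(plan):.1f} hours")
# prints: Estimated reading time: 18.5 hours
```

In this example the two Pass 3 papers cost more time than the thirty Pass 1 scans combined, which is why being selective about Pass 3 matters.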

In [None]:
# ========================================
# Create a Sample Paper Reading Tracker
# ========================================

# Create a template for tracking papers you read
paper_tracker = pd.DataFrame(
    {
        "Paper Title": ["Example: Deep Learning for Time Series", "Add your papers here..."],
        "Authors": ["Smith et al.", ""],
        "Year": [2023, ""],
        "Type": ["Original Research", ""],
        "Pass Level": ["Pass 2", ""],
        "Key Finding": ["LSTMs outperform traditional methods by 15%", ""],
        "Relevance (1-5)": [5, ""],
        "Status": ["Read", "To Read"],
    }
)

# Save template
paper_tracker.to_csv(f"{output_dir}/paper_reading_tracker.csv", index=False)
print("✅ Paper tracking template created!")
print(f"Location: {output_dir}/paper_reading_tracker.csv")
print("\nYou can use this to track papers as you read them.\n")

# Display template
print(paper_tracker.to_string(index=False))

print("\n💡 Pro Tip:")
print("Keep a reading tracker to organize your literature review.")
print("Update it as you read each paper!")
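
Once the tracker has a few rows, it can also serve as a reading queue. The sketch below shows one possible workflow with pandas: add a paper, then list unread papers with the most relevant first. It uses a reduced set of the tracker's columns, and every entry ("Paper A" to "Paper D") is invented for illustration.

```python
import pandas as pd

# Hypothetical entries, invented purely to demonstrate the workflow.
tracker = pd.DataFrame(
    {
        "Paper Title": ["Paper A", "Paper B", "Paper C"],
        "Year": [2021, 2023, 2019],
        "Type": ["Review/Survey", "Original Research", "Original Research"],
        "Relevance (1-5)": [4, 5, 2],
        "Status": ["Read", "To Read", "To Read"],
    }
)

# Add a new paper as a one-row DataFrame and concatenate it.
new_paper = pd.DataFrame(
    [
        {
            "Paper Title": "Paper D",
            "Year": 2024,
            "Type": "Meta-Analysis",
            "Relevance (1-5)": 5,
            "Status": "To Read",
        }
    ]
)
tracker = pd.concat([tracker, new_paper], ignore_index=True)

# Reading queue: unread papers, most relevant first, newest first on ties.
reading_queue = tracker[tracker["Status"] == "To Read"].sort_values(
    ["Relevance (1-5)", "Year"], ascending=[False, False]
)
print(reading_queue.to_string(index=False))
```

Sorting on relevance before year is a design choice for this sketch; swap the column order if recency matters more in your field.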

## Part 4: Organizing Your Literature Review

### Organization Strategies

#### 1. By Theme/Topic
**Example**: Stock prediction research
- Papers on technical indicators
- Papers on machine learning methods
- Papers on sentiment analysis
- Papers on hybrid approaches

#### 2. By Chronology
**Purpose**: Show evolution of ideas
- Early work (1990s): Statistical methods
- Mid period (2000s): Traditional ML
- Recent (2010s+): Deep learning

#### 3. By Methodology
**Example**: Group by approach
- Supervised learning methods
- Unsupervised learning methods
- Reinforcement learning methods

#### 4. By Findings/Conclusions
**Purpose**: Compare results
- Papers supporting hypothesis X
- Papers contradicting hypothesis X
- Papers with mixed results
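
The first two strategies can be applied to the same set of notes by changing only the grouping key. The sketch below is a minimal illustration using the standard library; the paper list and the `group_by` helper are hypothetical, and bucketing years into decades is just one way to express chronology.

```python
from collections import defaultdict

# Hypothetical papers, invented for illustration: (title, year, theme).
papers = [
    ("Paper A", 1996, "technical indicators"),
    ("Paper B", 2004, "machine learning methods"),
    ("Paper C", 2015, "sentiment analysis"),
    ("Paper D", 2019, "machine learning methods"),
    ("Paper E", 2021, "hybrid approaches"),
]


def group_by(items, key):
    """Group paper titles into a dict of lists using the given key function."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item[0])  # keep only the title
    return dict(groups)


# Strategy 1: by theme/topic.
by_theme = group_by(papers, key=lambda p: p[2])

# Strategy 2: by chronology, bucketed into decades (1996 -> 1990).
by_decade = group_by(papers, key=lambda p: p[1] // 10 * 10)

for theme, titles in by_theme.items():
    print(f"{theme}: {titles}")

for decade in sorted(by_decade):
    print(f"{decade}s: {by_decade[decade]}")
```

Grouping by methodology or by findings works the same way once each paper carries a label for that dimension.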

### Tools for Organization

#### Reference Managers
- **Zotero** (Free, open source) ⭐ Recommended for beginners
- **Mendeley** (Free)
- **EndNote** (Paid)
- **Papers** (Paid)

#### Note-Taking Tools
- **Notion** (Flexible databases)
- **Obsidian** (Linked notes)
- **Evernote** (Simple notes)
- **Google Docs** (Collaborative)

#### Simple Spreadsheet
- Like our tracker above!
- Easy to sort and filter
- Works for small reviews (< 50 papers)

## Part 5: Best Practices for Literature Reviews

### DO:

✅ **Start with review papers**
   - Get the lay of the land first

✅ **Be systematic**
   - Document your search strategy
   - Keep track of what you've read

✅ **Read critically**
   - Question assumptions
   - Look for limitations

✅ **Follow citation trails**
   - Check references in good papers
   - Look for papers that cite important work

✅ **Take notes as you read**
   - You won't remember everything
   - Note key points, questions, ideas

✅ **Update regularly**
   - New papers are published constantly
   - Set up alerts for your topic
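
Following citation trails goes in two directions: backward (the references a paper lists) and forward (later papers that cite it). The sketch below illustrates both on a tiny hand-written citation map. All paper names are hypothetical, and in practice the forward direction comes from a database's "cited by" feature rather than a local dictionary.

```python
# Hypothetical citation data: each paper maps to the papers it cites.
cites = {
    "Survey 2022": ["Method 2018", "Method 2020", "Classic 1997"],
    "Method 2020": ["Method 2018", "Classic 1997"],
    "Method 2018": ["Classic 1997"],
    "Followup 2023": ["Method 2020"],
    "Classic 1997": [],
}


def backward(paper):
    """Papers that `paper` cites (its reference list)."""
    return set(cites.get(paper, []))


def forward(paper):
    """Papers that cite `paper`, found by scanning every reference list."""
    return {citing for citing, refs in cites.items() if paper in refs}


seed = "Method 2020"
print("References to check:", sorted(backward(seed)))
print("Later work citing it:", sorted(forward(seed)))
```

A paper that keeps appearing in backward searches, like "Classic 1997" here, is often foundational work worth reading early.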

### DON'T:

❌ **Read everything in full detail**
   - Use the 3-pass approach
   - Be selective with Pass 3

❌ **Only read papers that agree with you**
   - Confirmation bias is real
   - Seek contradictory evidence

❌ **Ignore older papers**
   - Foundational work is important
   - Classic papers are classics for a reason

❌ **Trust everything you read**
   - Even peer-reviewed papers can be wrong
   - Look for replication studies

❌ **Work in isolation**
   - Discuss papers with colleagues
   - Join reading groups

### Common Mistakes to Avoid

1. **Starting too broad**
   - ❌ "Machine learning in healthcare"
   - ✅ "Deep learning for diabetic retinopathy detection"

2. **Not setting boundaries**
   - Define scope: time range, paper types, quality threshold

3. **Plagiarizing**
   - Always paraphrase and cite
   - Use quotation marks for direct quotes

4. **Summarizing without synthesizing**
   - Don't just list papers
   - Show connections and patterns

5. **Stopping too early**
   - Keep reading until you see repetition
   - "Saturation" means you're not finding new themes
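
Saturation can be checked roughly from your own notes: record the themes each paper covers, then count how many are new compared with everything read before. The sketch below is one possible way to do this. The function names, the theme notes, and the three-paper window are all assumptions made for illustration, not an established threshold.

```python
def new_themes_per_paper(themes_by_paper):
    """For each paper, in reading order, count themes not seen in earlier papers."""
    seen = set()
    counts = []
    for themes in themes_by_paper:
        fresh = set(themes) - seen
        counts.append(len(fresh))
        seen |= fresh
    return counts


def is_saturated(counts, window=3):
    """Return True if the last `window` papers introduced no new themes.

    The default window of 3 is an arbitrary choice for this sketch.
    """
    return len(counts) >= window and sum(counts[-window:]) == 0


# Hypothetical theme notes from six papers, in the order they were read.
notes = [
    ["LSTM", "feature engineering"],
    ["LSTM", "attention"],
    ["sentiment", "attention"],
    ["LSTM", "sentiment"],
    ["attention"],
    ["feature engineering", "LSTM"],
]

counts = new_themes_per_paper(notes)
print("New themes per paper:", counts)
print("Saturated:", is_saturated(counts))
```

Treat the result as a prompt to reflect, not a stopping rule: a run of familiar papers can also mean the search terms are too narrow.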

In [None]:
# ========================================
# Create Literature Review Checklist
# ========================================

checklist = """
LITERATURE REVIEW CHECKLIST
========================================

Before You Start:
□ Define your research question clearly
□ Set scope boundaries (time, type, quality)
□ Choose reference management tool
□ Set up note-taking system

During the Review:
□ Start with 2-3 recent review papers
□ Follow citation trails
□ Use the 3-pass reading approach
□ Take notes on each paper
□ Track papers in your system
□ Look for patterns and themes
□ Identify research gaps

Quality Checks:
□ Read papers from reputable sources
□ Check citation counts (influence)
□ Look for peer-reviewed publications
□ Consider recency (field dependent)
□ Read critically, question assumptions

After the Review:
□ Synthesize findings (not just summarize)
□ Identify research gaps
□ Connect themes across papers
□ Update your understanding of the field
□ Document your search strategy

Red Flags (Papers to Question):
□ No peer review
□ Unrealistic claims
□ Missing methodology details
□ Cherry-picked results
□ No limitations discussed
□ Conflicts of interest not disclosed
"""

# Save checklist (explicit UTF-8 so the checkbox characters are written
# correctly regardless of the platform's default encoding)
with open(f"{output_dir}/literature_review_checklist.txt", "w", encoding="utf-8") as f:
    f.write(checklist)

print(checklist)
print(f"\n✅ Checklist saved to: {output_dir}/literature_review_checklist.txt")
print("\nPrint this and use it for your literature reviews!")

---

## Summary

Congratulations on completing Module 01!

### Key Takeaways

✅ **Literature reviews** survey existing research to build on prior knowledge

✅ **Different paper types** serve different purposes (start with review papers!)

✅ **Papers have a standard structure** - learn it to read more efficiently

✅ **Use the 3-pass approach** - not every paper needs deep reading

✅ **Organize systematically** - use trackers and reference managers

✅ **Read critically** - question assumptions, look for limitations

### What You Can Do Now

- Explain why literature reviews are important
- Identify different types of research papers
- Navigate a research paper efficiently
- Use the 3-pass reading approach
- Set up a paper tracking system

### Practice Exercise

**Exercise**: Literature Review Starter

1. Pick a data science topic you're interested in
2. Find ONE review/survey paper on that topic
3. Do a Pass 1 reading (10 minutes)
4. Add it to the paper tracker CSV we created
5. List 3 key themes from the paper
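
Step 4 can be done in a spreadsheet or in code. The sketch below shows one way to append a row with the standard `csv` module, assuming the same column names as the tracker created earlier. It writes to a temporary directory so it runs on its own, `append_paper` is a hypothetical helper, and the angle-bracket values are placeholders to replace with your own paper.

```python
import csv
import os
import tempfile

# Column names matching the tracker template created earlier in this notebook.
COLUMNS = [
    "Paper Title", "Authors", "Year", "Type", "Pass Level",
    "Key Finding", "Relevance (1-5)", "Status",
]


def append_paper(csv_path, entry):
    """Append one paper to the tracker CSV, writing the header if the file is new."""
    is_new = not os.path.exists(csv_path)
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        if is_new:
            writer.writeheader()
        writer.writerow(entry)


# A temporary directory stands in for outputs/notebook_01 in this sketch.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "paper_reading_tracker.csv")

# Placeholder values: replace them with the survey paper you picked.
append_paper(demo_path, {
    "Paper Title": "<your survey paper>",
    "Authors": "<authors>",
    "Year": 2024,
    "Type": "Review/Survey",
    "Pass Level": "Pass 1",
    "Key Finding": "<theme 1>; <theme 2>; <theme 3>",
    "Relevance (1-5)": 4,
    "Status": "Read",
})

# Read the file back to confirm the row was stored.
with open(demo_path, newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))
print(rows[0]["Paper Title"], "-", rows[0]["Pass Level"])
```

To update the real tracker, point `demo_path` at the CSV in your output directory instead of the temporary one.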

This will prepare you for Module 02 where we'll learn to find papers!

---

### Up Next

In **Module 02: Finding and Reading Papers**, you'll learn:
- How to use research databases (Google Scholar, arXiv)
- Effective search strategies
- Evaluating paper quality
- Advanced reading techniques
- Efficient note-taking methods

---

**Ready to continue?** Move on to `02_finding_and_reading_papers.ipynb`!

**Want to review?** Go back to the sections you found challenging.