# Module 02: Finding and Reading Papers

Welcome to Module 02! Now that you know why literature reviews matter, let's learn **how to find and read research papers effectively**.

## What You'll Learn

- How to use research databases (Google Scholar, arXiv, PubMed)
- Effective search strategies and keywords
- Evaluating paper quality and credibility
- Efficient reading and note-taking techniques
- Managing your paper library

## Prerequisites

- Completed Modules 00-01
- Internet access

## Time Required

**40 minutes**

---

In [None]:
# ========================================
# Setup
# ========================================

import os
import pandas as pd
import matplotlib.pyplot as plt
import requests
from datetime import datetime

# Create output directory
output_dir = "outputs/notebook_02"
os.makedirs(output_dir, exist_ok=True)

print("✅ Setup complete!")
print(f"Output directory: {output_dir}")

## Part 1: Research Databases

Where do you actually find research papers? Here are the most important databases:

### Google Scholar (Best for Beginners)
**URL**: https://scholar.google.com

**Pros**:
- Free and easy to use
- Covers all disciplines
- Shows citation counts
- Indexes papers even when they sit behind paywalls, and often links to a free version

**Cons**:
- Quality varies (includes everything)
- Can be overwhelming

**Best for**: Starting your search, finding highly cited papers

### arXiv (Computer Science, Math, Physics)
**URL**: https://arxiv.org

**Pros**:
- Free, open access
- Latest research (preprints)
- Direct PDF downloads

**Cons**:
- Not peer-reviewed (preprints)
- Mainly CS/Math/Physics

**Best for**: Cutting-edge ML/AI research
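
arXiv also offers a public HTTP API, which helps when you want to repeat a search later. The sketch below only builds the request URL; `build_arxiv_url` is a hypothetical helper name, and the parameters (`search_query`, `start`, `max_results`, `sortBy`, `sortOrder`) follow the arXiv API user manual as I understand it, so check the manual before relying on them.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"  # public arXiv API endpoint


def build_arxiv_url(terms, max_results=5):
    """Return an arXiv API URL that searches all fields for every term."""
    # Quote multi-word terms so they are treated as phrases.
    parts = [f'all:"{t}"' if " " in t else f"all:{t}" for t in terms]
    params = {
        "search_query": " AND ".join(parts),
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",   # newest submissions first
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


url = build_arxiv_url(["LSTM", "stock prediction"])
print(url)
```

Fetching the URL (for example with `requests.get(url, timeout=10)`) should return an Atom XML feed of matching papers. Keep automated requests infrequent.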

### PubMed (Biomedical/Health Sciences)
**URL**: https://pubmed.ncbi.nlm.nih.gov

**Pros**:
- Free
- High quality (mostly peer-reviewed journal literature)
- Medical/health focus

**Cons**:
- Limited to biomedical research

**Best for**: Healthcare data science

### Others Worth Knowing

- **Semantic Scholar**: AI-powered search (https://www.semanticscholar.org)
- **IEEE Xplore**: Engineering papers (subscription often needed)
- **ACM Digital Library**: Computer science (subscription often needed)
- **SSRN**: Social sciences (https://www.ssrn.com)
- **ResearchGate**: Networking + papers

## Part 2: Effective Search Strategies

### The PICO Framework

Originally from medicine, but useful for any research:

- **P**opulation: Who/what you're studying
- **I**ntervention: What you're applying
- **C**omparison: What you're comparing to
- **O**utcome: What you're measuring

**Example**: Stock price prediction
- **P**: Stock market data
- **I**: LSTM neural networks
- **C**: Traditional ARIMA models
- **O**: Prediction accuracy

**Search query**: "LSTM stock prediction accuracy ARIMA comparison"
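
The PICO-to-query step can be sketched in a few lines of Python. The `Pico` class and its `to_query` method are hypothetical names for illustration, and the keyword forms (for example "stock prediction" for the population) are shortened by hand from the example above.

```python
from dataclasses import dataclass


@dataclass
class Pico:
    population: str
    intervention: str
    comparison: str
    outcome: str

    def to_query(self) -> str:
        """Join the four components into one keyword query."""
        # Method first, then domain, outcome, and the baseline to compare against.
        parts = [self.intervention, self.population, self.outcome, self.comparison]
        return " ".join(p.strip() for p in parts if p.strip())


stock_pico = Pico(
    population="stock prediction",
    intervention="LSTM",
    comparison="ARIMA comparison",
    outcome="accuracy",
)
print(stock_pico.to_query())  # LSTM stock prediction accuracy ARIMA comparison
```

Writing the components down separately makes it easy to swap one of them (say, a different baseline) and rerun the search.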

### Search Operators (Google Scholar)

#### Exact Phrase
```
"deep learning"
```
Finds exact phrase

#### OR
```
"machine learning" OR "deep learning"
```
Finds papers with either term

#### Exclude
```
"machine learning" -medical
```
Excludes medical applications

#### Author Search
```
author:"Yoshua Bengio"
```
Find papers by specific author

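#### Year Range
```
"deep learning"
```
Run the search, then set **Custom range** in the left sidebar to 2020-2024. Google Scholar filters by year through this sidebar control; the `2020..2024` number-range syntax comes from regular Google Search and is not a documented Scholar operator.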

#### Title Search
```
intitle:"convolutional neural networks"
```
Search in titles only
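
The operators can be combined in one query. Below is a minimal sketch that assembles the exact-phrase, exclude, author, and title operators into a query string and a browser URL. `scholar_query` and `scholar_url` are hypothetical helpers, and `as_ylo`/`as_yhi` are the year parameters that appear in Scholar URLs when you use the sidebar year filter; they are an observation, not a documented API. Google Scholar has no official API, so the URL is meant to be opened in a browser, not scraped.

```python
from urllib.parse import urlencode


def scholar_query(phrases=(), exclude=(), author=None, in_title=None):
    """Assemble a Google Scholar query string from search operators."""
    parts = [f'"{p}"' for p in phrases]            # exact phrases
    parts += [f"-{word}" for word in exclude]      # excluded words
    if author:
        parts.append(f'author:"{author}"')
    if in_title:
        parts.append(f'intitle:"{in_title}"')
    return " ".join(parts)


def scholar_url(query, year_from=None, year_to=None):
    """Build a search URL; as_ylo/as_yhi are assumed year-filter parameters."""
    params = {"q": query}
    if year_from:
        params["as_ylo"] = year_from
    if year_to:
        params["as_yhi"] = year_to
    return "https://scholar.google.com/scholar?" + urlencode(params)


query = scholar_query(phrases=["deep learning"], exclude=["medical"],
                      author="Yoshua Bengio")
print(query)
print(scholar_url(query, year_from=2020, year_to=2024))
```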

### Building Better Queries

**Start Broad, Then Narrow**

1. **Too Broad**: "machine learning"
   - Returns millions of papers

2. **Better**: "machine learning stock prediction"
   - More specific, still thousands

3. **Even Better**: "LSTM stock price prediction Malaysia"
   - Specific method + domain + location

4. **Very Specific**: "LSTM stock price prediction Malaysia", limited to 2020-2024 with the year filter
   - Recent research in a specific area

### Alternative Keywords

Same concept, different words:

| Concept | Alternative Terms |
|---------|------------------|
| Machine Learning | ML, statistical learning, predictive modeling |
| Neural Networks | Deep learning, artificial neural networks, ANN |
| Prediction | Forecasting, estimation, projection |
| Classification | Categorization, recognition, identification |
| Performance | Accuracy, effectiveness, results |
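
One way to use alternative terms is to expand each concept into an OR group. The sketch below does this for a small synonym dictionary taken from the table; `SYNONYMS`, `or_group`, and `expand_query` are illustrative names. It assumes the database accepts parentheses for grouping (PubMed does); check how your database treats them.

```python
# A subset of the table above, keyed by the primary term.
SYNONYMS = {
    "machine learning": ["statistical learning", "predictive modeling"],
    "prediction": ["forecasting", "estimation"],
}


def or_group(term, synonyms=SYNONYMS):
    """Return '("term" OR "alt" ...)' for a term and its known alternatives."""
    options = [term] + synonyms.get(term, [])
    if len(options) == 1:
        return f'"{term}"'  # no alternatives known: keep the plain phrase
    return "(" + " OR ".join(f'"{o}"' for o in options) + ")"


def expand_query(terms):
    """Join the OR groups of all terms into one query string."""
    return " ".join(or_group(t) for t in terms)


print(expand_query(["machine learning", "prediction", "stock"]))
```

Expanded queries return more results, so they work best at the broad stage of a search.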

In [None]:
# ========================================
# Create Search Strategy Template
# ========================================

search_template = """
RESEARCH PAPER SEARCH TEMPLATE
========================================

1. DEFINE YOUR RESEARCH QUESTION
   Question: _________________________________________
   
2. IDENTIFY KEY CONCEPTS (PICO)
   Population: _______________________________________
   Intervention: _____________________________________
   Comparison: _______________________________________
   Outcome: __________________________________________

3. GENERATE KEYWORDS
   Primary terms:
   - _______________________________________________
   - _______________________________________________
   
   Alternative terms:
   - _______________________________________________
   - _______________________________________________

4. BUILD SEARCH QUERIES
   Query 1 (Broad):
   _________________________________________________
   
   Query 2 (Medium):
   _________________________________________________
   
   Query 3 (Specific):
   _________________________________________________

5. SET INCLUSION CRITERIA
   Year range: ________ to ________
   Paper types: □ Original research  □ Reviews  □ Meta-analysis
   Languages: □ English  □ Other: ___________
   Quality threshold: _______________________________

6. SET EXCLUSION CRITERIA
   Exclude:
   - _______________________________________________
   - _______________________________________________

7. CHOOSE DATABASES
   □ Google Scholar
   □ arXiv
   □ PubMed
   □ Other: ________________________________________

8. TRACK YOUR SEARCHES
   Date: ___________
   Database: ___________
   Query used: _____________________________________
   Results found: ___________
   Papers selected for reading: ___________
"""

# Save template
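# Write as UTF-8: the template contains "□", which some default encodings cannot store
with open(f"{output_dir}/search_strategy_template.txt", "w", encoding="utf-8") as f: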
    f.write(search_template)

print(search_template)
print(f"\n✅ Template saved to: {output_dir}/search_strategy_template.txt")
print("\nUse this template to plan your literature search!")
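
Step 8 of the template (tracking searches) is easier to keep up if the log lives in a CSV file. The following is a minimal sketch using only the standard library; `log_search` and the column names are assumptions modeled on the template fields, and the result counts in the demo are made-up placeholders.

```python
import csv
import io
from datetime import date

LOG_FIELDS = ["date", "database", "query", "results_found", "papers_selected"]


def log_search(log_file, database, query, results_found, papers_selected, when=None):
    """Append one search record as a CSV row to an open text file."""
    writer = csv.DictWriter(log_file, fieldnames=LOG_FIELDS)
    if log_file.tell() == 0:  # nothing written yet: add the header row
        writer.writeheader()
    writer.writerow({
        "date": (when or date.today()).isoformat(),
        "database": database,
        "query": query,
        "results_found": results_found,
        "papers_selected": papers_selected,
    })


# Demo with an in-memory buffer and placeholder numbers.
buffer = io.StringIO()
log_search(buffer, "arXiv", "LSTM stock prediction", 120, 5, when=date(2024, 1, 15))
log_search(buffer, "Google Scholar", '"LSTM" "stock prediction"', 3400, 8,
           when=date(2024, 1, 15))
print(buffer.getvalue())
```

To log to disk instead, pass a file opened with `open(path, "a", newline="", encoding="utf-8")`.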

## Part 3: Evaluating Paper Quality

Not all papers are equally trustworthy! Here's how to evaluate quality:

### Quick Quality Checks

#### 1. Publication Venue
**High Quality**:
- Top conferences (NeurIPS, ICML, CVPR for ML)
- Peer-reviewed journals
- University press

**Question Carefully**:
- Predatory journals (ask your advisor)
- Unreviewed preprints
- Blog posts (not academic)

#### 2. Citation Count
**Rule of Thumb** (field dependent):
- Recent paper (< 2 years): 10+ citations is good
- Older paper (> 5 years): 100+ citations is influential
- Classic paper: 1000+ citations

**Caution**: New papers haven't had time to be cited!
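
Because raw counts favor older papers, citations per year is often a fairer comparison. The sketch below encodes the rule of thumb above together with that normalization. The function names are hypothetical, the thresholds are the ones listed above, and papers between 2 and 5 years old get no label because the rule of thumb does not cover them.

```python
def citations_per_year(citations, pub_year, current_year):
    """Average citations per year; papers under a year old count as one year."""
    age = max(current_year - pub_year, 1)
    return citations / age


def citation_signal(citations, pub_year, current_year):
    """Label a paper using the rule-of-thumb thresholds."""
    age = current_year - pub_year
    if citations >= 1000:
        return "classic"
    if age > 5 and citations >= 100:
        return "influential"
    if age < 2 and citations >= 10:
        return "good for a recent paper"
    return "no strong signal"


# Hypothetical papers, evaluated as of 2024.
print(citation_signal(15, 2023, 2024))      # good for a recent paper
print(citation_signal(250, 2015, 2024))     # influential
print(citations_per_year(100, 2014, 2024))  # 10.0
```

Treat the labels as a first filter only; thresholds differ a lot between fields.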

#### 3. Author Reputation
- Check author's other work
- Look at their affiliations
- H-index (higher usually means more cited work, but it depends on field and career length)

#### 4. Methodology Quality
**Good signs**:
- Clear experimental design
- Appropriate statistical tests
- Large enough sample size
- Code/data availability

**Red flags**:
- Vague methodology
- Missing details
- No error bars or confidence intervals
- Cherry-picked results

#### 5. Transparency
**Look for**:
- Limitations discussed
- Negative results reported
- Conflicts of interest disclosed
- Reproducibility information

### The CRAAP Test

A framework for evaluating sources:

- **C**urrency: Is it recent enough?
- **R**elevance: Does it address your question?
- **A**uthority: Are the authors credible?
- **A**ccuracy: Is the information reliable?
- **P**urpose: Why was it written?

In [None]:
# ========================================
# Create Paper Quality Evaluation Scorecard
# ========================================

quality_scorecard = pd.DataFrame(
    {
        "Criterion": [
            "Publication Venue",
            "Peer Review Status",
            "Citation Count",
            "Author Reputation",
            "Methodology Clarity",
            "Sample Size",
            "Statistical Rigor",
            "Transparency",
            "Code/Data Available",
            "Limitations Discussed",
        ],
        "Weight": [10, 15, 10, 10, 15, 10, 10, 10, 5, 5],
        "Score (1-5)": ["", "", "", "", "", "", "", "", "", ""],
        "Notes": ["", "", "", "", "", "", "", "", "", ""],
    }
)

print("PAPER QUALITY EVALUATION SCORECARD")
print("=" * 70)
print("\nInstructions: Rate each criterion from 1-5")
print("1 = Poor, 2 = Fair, 3 = Good, 4 = Very Good, 5 = Excellent")
print("\nTotal possible score: 500 points (sum of Weight × Score)\n")
print(quality_scorecard.to_string(index=False))

# Save scorecard
quality_scorecard.to_csv(f"{output_dir}/paper_quality_scorecard.csv", index=False)
print(f"\n✅ Scorecard saved to: {output_dir}/paper_quality_scorecard.csv")
print("\nScoring Guide:")
print("  400-500 points: Excellent paper, highly trustworthy")
print("  300-399 points: Good paper, generally reliable")
print("  200-299 points: Fair paper, use with caution")
print("  < 200 points: Poor quality, probably skip")
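
Once a scorecard is filled in, the total is the sum of Weight × Score over all criteria. A minimal sketch of that arithmetic follows; `WEIGHTS` copies the weights from the scorecard, `total_score` and `rating` are hypothetical helper names, and the example scores are invented for illustration.

```python
WEIGHTS = {
    "Publication Venue": 10,
    "Peer Review Status": 15,
    "Citation Count": 10,
    "Author Reputation": 10,
    "Methodology Clarity": 15,
    "Sample Size": 10,
    "Statistical Rigor": 10,
    "Transparency": 10,
    "Code/Data Available": 5,
    "Limitations Discussed": 5,
}


def total_score(scores):
    """Sum Weight × Score over every criterion (each score must be 1-5)."""
    for criterion in WEIGHTS:
        if not 1 <= scores[criterion] <= 5:
            raise ValueError(f"Score for {criterion!r} must be between 1 and 5")
    return sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)


def rating(total):
    """Map a total to the bands of the scoring guide."""
    if total >= 400:
        return "Excellent paper, highly trustworthy"
    if total >= 300:
        return "Good paper, generally reliable"
    if total >= 200:
        return "Fair paper, use with caution"
    return "Poor quality, probably skip"


# Invented scores for one paper, in the same order as WEIGHTS.
example_scores = dict(zip(WEIGHTS, [4, 5, 3, 3, 4, 3, 3, 4, 2, 4]))
total = total_score(example_scores)
print(total, "-", rating(total))  # 365 - Good paper, generally reliable
```

Since every score is at least 1, the lowest possible total is 100, not 0.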

## Part 4: Efficient Reading Techniques

### The 3-Pass Method (Detailed)

Remember the 3-pass method from Module 01? Let's go deeper:

#### Pass 1: Quick Scan (5-10 minutes)

**Goal**: Decide whether the paper is worth reading in detail

**What to read**:
1. Title and abstract
2. Introduction (first few paragraphs)
3. Section headings
4. Conclusion
5. Figures and captions
6. References (scan for familiar papers)

**Questions to answer**:
- What is the main contribution?
- Is it relevant to my research?
- Is the quality acceptable?
- Should I read in more detail?

**Output**: Yes/No decision + 1-sentence summary

#### Pass 2: Detailed Read (30-60 minutes)

**Goal**: Understand the paper's content

**What to read**:
1. Everything except detailed math/proofs
2. Look carefully at figures, tables, graphs
3. Note key points and techniques

**What to do**:
- Take notes as you read
- Mark important passages
- Note questions/confusion
- Jot down new ideas

**Questions to answer**:
- What methods did they use?
- What were the main results?
- What are the limitations?
- Do I believe the conclusions?

**Output**: Detailed notes and an understanding of roughly 80% of the paper

#### Pass 3: Deep Understanding (2-4 hours)

**Goal**: Complete mastery (only for critical papers!)

**What to do**:
- Read every detail including proofs
- Work through examples yourself
- Try to reproduce results
- Identify assumptions
- Think about extensions

**Questions to answer**:
- Can I explain this paper to someone else?
- Can I implement their method?
- What would I do differently?
- How can I build on this?

**Output**: Near-expert understanding
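
The time ranges above make it easy to budget a reading session before you start. The sketch below adds them up for a plan of how many papers get each pass. `PASS_MINUTES` copies the ranges from the three headings, `reading_budget` is a hypothetical helper, and the example plan matches this module's practice exercise (Pass 1 on five papers, Pass 2 on one).

```python
# Minutes per paper for each pass, taken from the ranges in the headings above.
PASS_MINUTES = {1: (5, 10), 2: (30, 60), 3: (120, 240)}


def reading_budget(papers_per_pass):
    """Return (low, high) total minutes for a {pass_number: paper_count} plan."""
    low = sum(PASS_MINUTES[p][0] * n for p, n in papers_per_pass.items())
    high = sum(PASS_MINUTES[p][1] * n for p, n in papers_per_pass.items())
    return low, high


plan = {1: 5, 2: 1}  # Pass 1 on five papers, Pass 2 on one of them
low, high = reading_budget(plan)
print(f"{low}-{high} minutes")  # 55-110 minutes
```

A single Pass 3 adds 2-4 hours on its own, which is why it is reserved for critical papers.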

### Active Reading Strategies

#### 1. Ask Questions While Reading
- Why did they make this choice?
- What are alternative explanations?
- How does this connect to other work?
- What's missing?

#### 2. Visualize Concepts
- Draw diagrams
- Create flowcharts
- Sketch the architecture

#### 3. Summarize in Your Own Words
- Don't just highlight
- Paraphrase key points
- Explain it like you're teaching

#### 4. Connect to What You Know
- How is this similar to X?
- How is this different from Y?
- What's the bigger picture?

### Note-Taking Methods

#### Cornell Notes
```
┌─────────────┬───────────────────────┐
│             │                       │
│  Keywords   │   Main Notes          │
│  Questions  │   - Point 1           │
│             │   - Point 2           │
│             │                       │
├─────────────┴───────────────────────┤
│  Summary (in your own words)        │
└─────────────────────────────────────┘
```

#### Mind Mapping
- Central idea in middle
- Branch out to subtopics
- Visual and connective

#### Structured Template
- Citation
- Research question
- Methods
- Key findings
- Limitations
- My thoughts/questions
- How it relates to my work
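
If you prefer to keep notes as data instead of free text, the structured template maps naturally onto a small record type. This is a sketch under that assumption: `PaperNote` and `to_markdown` are hypothetical names, the fields mirror the list above, and the example entry uses placeholder text, not a real paper.

```python
from dataclasses import dataclass, field


@dataclass
class PaperNote:
    citation: str
    research_question: str = ""
    methods: str = ""
    key_findings: list = field(default_factory=list)
    limitations: list = field(default_factory=list)
    thoughts: str = ""
    relation_to_my_work: str = ""

    def to_markdown(self) -> str:
        """Render the note as a short Markdown section."""
        def bullets(items):
            return "\n".join(f"- {item}" for item in items) or "- (none yet)"

        return "\n".join([
            f"## {self.citation}",
            f"**Research question**: {self.research_question}",
            f"**Methods**: {self.methods}",
            "**Key findings**:",
            bullets(self.key_findings),
            "**Limitations**:",
            bullets(self.limitations),
            f"**My thoughts/questions**: {self.thoughts}",
            f"**How it relates to my work**: {self.relation_to_my_work}",
        ])


# Placeholder entry; replace every field with details from a paper you read.
note = PaperNote(
    citation="Author, A. (Year). Placeholder title.",
    research_question="What problem does the paper address?",
    key_findings=["First finding", "Second finding"],
)
print(note.to_markdown())
```

Notes stored this way can be filtered or exported later, for example one Markdown file per paper.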

In [None]:
# ========================================
# Create Paper Summary Template
# ========================================

summary_template = """
RESEARCH PAPER SUMMARY TEMPLATE
========================================

CITATION
Authors: _____________________________________________
Title: _______________________________________________
Year: ________  Venue: _______________________________
DOI/URL: _____________________________________________

OVERVIEW (Pass 1 - After quick scan)
Main contribution (1 sentence):
_____________________________________________________
_____________________________________________________

Relevance to my research (1-5): ____
Quality (1-5): ____

RESEARCH QUESTION
What problem are they solving?
_____________________________________________________
_____________________________________________________

METHODOLOGY (Pass 2 - After detailed read)
Approach/Methods used:
_____________________________________________________
_____________________________________________________

Data:
_____________________________________________________

KEY FINDINGS
Main results:
1. __________________________________________________
2. __________________________________________________
3. __________________________________________________

Supporting evidence:
_____________________________________________________
_____________________________________________________

STRENGTHS
- ___________________________________________________
- ___________________________________________________
- ___________________________________________________

LIMITATIONS
- ___________________________________________________
- ___________________________________________________
- ___________________________________________________

CONNECTIONS
Related to these papers:
- ___________________________________________________
- ___________________________________________________

Contrasts with:
- ___________________________________________________

MY THOUGHTS
Questions this raised:
- ___________________________________________________
- ___________________________________________________

Ideas for my research:
- ___________________________________________________
- ___________________________________________________

What I'd do differently:
- ___________________________________________________

ACTIONABLE INSIGHTS
How can I apply this?
_____________________________________________________
_____________________________________________________

FOLLOW-UP
Papers to read next (from references):
- ___________________________________________________
- ___________________________________________________

TAGS/KEYWORDS
_____________________________________________________

DATE READ: ___________
PASS LEVEL: □ Pass 1  □ Pass 2  □ Pass 3
"""

# Save template
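# Write as UTF-8: the template contains "□", which some default encodings cannot store
with open(f"{output_dir}/paper_summary_template.txt", "w", encoding="utf-8") as f: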
    f.write(summary_template)

print(summary_template)
print(f"\n✅ Template saved to: {output_dir}/paper_summary_template.txt")
print("\nUse this template to take notes on each paper you read!")

## Part 5: Managing Your Paper Library

### Reference Management Software

#### Zotero (Recommended for Beginners)
**Free, open source**

**Features**:
- Browser extension (one-click save)
- PDF management and annotation
- Automatic citation generation
- Sync across devices
- Organize with collections

**How to use**:
1. Install the Zotero desktop app
2. Install the browser connector
3. Click the connector while viewing a paper
4. Check that the paper was saved with its metadata

#### Mendeley
**Free (owned by Elsevier)**

**Features**:
- Similar to Zotero
- Social networking features
- Mobile app

### File Organization

#### Option 1: By Topic
```
Papers/
├── Time_Series/
│   ├── LSTM_methods/
│   └── Traditional_methods/
├── Stock_Prediction/
└── Technical_Indicators/
```

#### Option 2: By Date
```
Papers/
├── 2024/
├── 2023/
└── Older/
```

#### Option 3: By Reading Status
```
Papers/
├── To_Read/
├── Reading/
├── Read_Summarized/
└── Key_Papers/
```
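
If you do file papers by hand, a few lines of Python can create the folders and move files between statuses. This is a minimal sketch of Option 3: `create_library` and `move_paper` are hypothetical helpers, and the demo runs in a temporary directory with an empty placeholder file so nothing is left on disk.

```python
import tempfile
from pathlib import Path

STATUS_FOLDERS = ["To_Read", "Reading", "Read_Summarized", "Key_Papers"]


def create_library(root):
    """Create the status folders under root and return root as a Path."""
    root = Path(root)
    for name in STATUS_FOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


def move_paper(pdf_path, library, status):
    """Move a file into the folder for the given reading status."""
    if status not in STATUS_FOLDERS:
        raise ValueError(f"Unknown status: {status}")
    target = Path(library) / status / Path(pdf_path).name
    return Path(pdf_path).rename(target)


# Demo in a temporary directory with an empty placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    library = create_library(Path(tmp) / "Papers")
    paper = library / "To_Read" / "example_paper.pdf"
    paper.write_bytes(b"")
    moved = move_paper(paper, library, "Reading")
    created = sorted(p.name for p in library.iterdir())
    moved_to = moved.parent.name
    print(created)
    print(moved_to)
```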

**Pro Tip**: Use a reference manager instead of filing papers by hand!

---

## Summary

Congratulations on completing Module 02!

### Key Takeaways

✅ **Use multiple databases** - Google Scholar, arXiv, PubMed for different purposes

✅ **Build effective search queries** - Use PICO framework and search operators

✅ **Evaluate quality carefully** - Check venue, citations, methodology, transparency

✅ **Read efficiently** - 3-pass method, not every paper needs deep reading

✅ **Take structured notes** - Use templates, summarize in your own words

✅ **Use reference managers** - Zotero or Mendeley save time and effort

### What You Can Do Now

- Find relevant papers using research databases
- Build effective search queries
- Evaluate paper quality and credibility
- Use the 3-pass reading method
- Take structured, actionable notes
- Organize your paper library

### Practice Exercise

**Exercise**: Find and Evaluate Papers

1. Pick your research topic from Module 01
2. Use the search template to build 3 queries
3. Search Google Scholar and arXiv
4. Find 5 relevant papers
5. Do Pass 1 reading on all 5
6. Use the quality scorecard on the best 2
7. Do Pass 2 reading on 1 paper
8. Fill out the summary template

This will prepare you for Module 03!

---

### Up Next

In **Module 03: Research Methodology**, you'll learn:
- How to formulate research questions
- Developing testable hypotheses
- Types of research approaches
- Designing your research plan
- Avoiding common pitfalls

---

**Ready to continue?** Move on to `03_research_methodology.ipynb`!

**Need more practice?** Do the practice exercise above before moving on.