# Module 02: Data Collection with yfinance

**Difficulty**: ⭐ (Beginner)

**Estimated Time**: 60 minutes

**Prerequisites**: 
- Completed Module 00: Setup and Introduction
- Completed Module 01: Bursa Malaysia Fundamentals
- Understanding of pandas DataFrames

## Learning Objectives

By the end of this notebook, you will be able to:
1. Download stock data at different intervals (daily, weekly, monthly, intraday)
2. Efficiently download multiple stocks simultaneously
3. Handle missing data and data quality issues
4. Save and load stock data locally for offline analysis
5. Build a watchlist database for your favorite Malaysian stocks
6. Retrieve company information and fundamental data

## Introduction: Why Data Quality Matters

**"Garbage in, garbage out"**: this classic computing principle applies with full force to stock analysis.

### The Foundation of Technical Analysis

Every trading decision, indicator calculation, and backtest depends on **quality data**. Poor data leads to:
- ❌ Incorrect indicator signals
- ❌ False trading opportunities
- ❌ Misleading backtest results
- ❌ Potential losses in real trading

### What You'll Master

By the end of this module, you'll have a robust data collection system that:
- ✅ Downloads clean, validated data
- ✅ Handles errors gracefully
- ✅ Stores data efficiently for offline use
- ✅ Scales to analyze hundreds of stocks

Let's build your data infrastructure!

In [None]:
# Setup: Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import yfinance as yf
from datetime import datetime, timedelta
from pathlib import Path
import warnings

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')

# Visualization configuration
%matplotlib inline
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (14, 7)

# Set random seed for reproducibility
np.random.seed(42)

print("✅ Environment setup complete!")
print(f"Today's date: {datetime.now().strftime('%Y-%m-%d')}")

## 1. Understanding Data Intervals

yfinance supports multiple time intervals for different trading strategies:

### Available Intervals

| Interval | Bar Size | Max History (approx.) | Best For |
|----------|----------|-------------|----------|
| **1m** | 1 minute | 7 days | Day trading, scalping |
| **5m** | 5 minutes | 60 days | Day trading |
| **15m** | 15 minutes | 60 days | Intraday swing trading |
| **1h** | 1 hour | 730 days | Short-term analysis |
| **1d** | 1 day | All history | Swing trading, position trading |
| **1wk** | 1 week | All history | Long-term trends |
| **1mo** | 1 month | All history | Very long-term analysis |
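
Before requesting intraday bars, it can help to check the date range against the limits in the table. The sketch below treats each "Max History" value as how far back from today the start date may lie. That reading, the `INTERVAL_MAX_DAYS` mapping, and the helper name `check_interval_window` are assumptions for illustration; Yahoo's actual rules may differ or change over time.

```python
from datetime import datetime, timedelta

# Assumed limits (days back from today), copied from the table above.
# Intervals not listed ('1d', '1wk', '1mo') are treated as unlimited.
INTERVAL_MAX_DAYS = {'1m': 7, '5m': 60, '15m': 60, '1h': 730}

def check_interval_window(interval, start, now=None):
    """Hypothetical helper: True if `start` is recent enough for `interval`."""
    if now is None:
        now = datetime.now()
    max_days = INTERVAL_MAX_DAYS.get(interval)
    if max_days is None:
        return True  # daily and longer intervals: no limit assumed
    return now - start <= timedelta(days=max_days)

# 90 days of 5-minute bars exceeds the assumed 60-day limit
start = datetime.now() - timedelta(days=90)
print(check_interval_window('5m', start))  # False
print(check_interval_window('1d', start))  # True
```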

### Choosing the Right Interval

- **Beginners**: Start with **daily (1d)** data - easier to analyze, less noise
- **Swing Traders**: Daily and weekly data
- **Day Traders**: 1m, 5m, or 15m data (requires full-time attention)
- **Long-term Investors**: Weekly or monthly data for big picture
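
Weekly bars are aggregates of daily bars: the week's first open, highest high, lowest low, last close and total volume. A minimal pandas sketch using `DataFrame.resample` on synthetic prices (the numbers are made up, not Maybank's). Weeks here end on Friday (`'W-FRI'`); Yahoo's own weekly bars may use different week boundaries or labels, so they may not line up exactly with this.

```python
import numpy as np
import pandas as pd

# Synthetic daily OHLCV bars for 10 weekdays: Mon 1 Jan to Fri 12 Jan 2024
dates = pd.bdate_range('2024-01-01', periods=10)
close = np.linspace(9.50, 9.95, 10)   # 9.50, 9.55, ..., 9.95
daily = pd.DataFrame({
    'Open': close - 0.02,
    'High': close + 0.05,
    'Low': close - 0.05,
    'Close': close,
    'Volume': np.full(10, 1_000_000),
}, index=dates)

# Aggregate each Monday-to-Friday block into one bar labelled by its Friday
weekly = daily.resample('W-FRI').agg({
    'Open': 'first',   # first open of the week
    'High': 'max',     # highest high
    'Low': 'min',      # lowest low
    'Close': 'last',   # final close
    'Volume': 'sum',   # total shares traded
})
print(weekly)
```

Ten daily rows collapse into two weekly rows, which is why the weekly download above has roughly one fifth as many rows as the daily one.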

Let's download data at different intervals!

In [None]:
# Download Maybank data at different intervals
ticker = '1155.KL'

def download_flat(symbol, **kwargs):
    """Call yf.download and return single-level columns that include 'Adj Close'.

    Newer yfinance releases default to auto_adjust=True (no 'Adj Close' column)
    and return (field, ticker) MultiIndex columns even for a single ticker.
    """
    data = yf.download(symbol, auto_adjust=False, progress=False, **kwargs)
    if isinstance(data.columns, pd.MultiIndex):
        data.columns = data.columns.get_level_values(0)  # keep the field names only
    return data

# Define date ranges
end_date = datetime.now()
start_date_daily = end_date - timedelta(days=365)   # 1 year for daily and weekly
start_date_hourly = end_date - timedelta(days=60)   # 60 days for hourly
start_date_intraday = end_date - timedelta(days=7)  # 7 days for 5-minute

print(f"Downloading {ticker} (Maybank) at different intervals...\n")

# Daily data
daily_data = download_flat(ticker, start=start_date_daily, end=end_date, interval='1d')
print(f"✅ Daily data:   {len(daily_data)} rows")

# Weekly data
weekly_data = download_flat(ticker, start=start_date_daily, end=end_date, interval='1wk')
print(f"✅ Weekly data:  {len(weekly_data)} rows")

# Hourly data (last 60 days)
hourly_data = download_flat(ticker, start=start_date_hourly, end=end_date, interval='1h')
print(f"✅ Hourly data:  {len(hourly_data)} rows")

# 5-minute data (last 7 days)
minute_5_data = download_flat(ticker, start=start_date_intraday, end=end_date, interval='5m')
print(f"✅ 5-min data:   {len(minute_5_data)} rows")

print("\n📊 Notice how row count varies by interval!")
print("More granular intervals = more data points but shorter history.")

In [None]:
# Let's visualize the same stock at different intervals
fig, axes = plt.subplots(2, 2, figsize=(16, 10))
fig.suptitle('Maybank (1155.KL) at Different Time Intervals', 
             fontsize=16, fontweight='bold')

# Daily chart
axes[0, 0].plot(daily_data.index, daily_data['Adj Close'], linewidth=1.5)
axes[0, 0].set_title('Daily Data (1 Year)', fontsize=12)
axes[0, 0].set_xlabel('Date')
axes[0, 0].set_ylabel('Price (RM)')
axes[0, 0].grid(True, alpha=0.3)

# Weekly chart
axes[0, 1].plot(weekly_data.index, weekly_data['Adj Close'], 
                linewidth=2, color='orange')
axes[0, 1].set_title('Weekly Data (1 Year)', fontsize=12)
axes[0, 1].set_xlabel('Date')
axes[0, 1].set_ylabel('Price (RM)')
axes[0, 1].grid(True, alpha=0.3)

# Hourly chart
if len(hourly_data) > 0:
    axes[1, 0].plot(hourly_data.index, hourly_data['Adj Close'], 
                    linewidth=1, color='green')
    axes[1, 0].set_title('Hourly Data (60 Days)', fontsize=12)
    axes[1, 0].set_xlabel('Date')
    axes[1, 0].set_ylabel('Price (RM)')
    axes[1, 0].grid(True, alpha=0.3)
    axes[1, 0].tick_params(axis='x', rotation=45)

# 5-minute chart
if len(minute_5_data) > 0:
    axes[1, 1].plot(minute_5_data.index, minute_5_data['Adj Close'], 
                    linewidth=0.5, color='red', alpha=0.7)
    axes[1, 1].set_title('5-Minute Data (7 Days)', fontsize=12)
    axes[1, 1].set_xlabel('Date')
    axes[1, 1].set_ylabel('Price (RM)')
    axes[1, 1].grid(True, alpha=0.3)
    axes[1, 1].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

print("\nüí° Key Observations:")
print("‚Ä¢ Daily/weekly = smooth trends, clear patterns")
print("‚Ä¢ Hourly/5-min = more noise, harder to analyze")
print("‚Ä¢ For learning: START with daily data!")

## 2. Downloading Multiple Stocks Efficiently

Analyzing one stock is good, but comparing multiple stocks is better! Let's download several Malaysian blue-chips at once.
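
With `group_by='ticker'`, `yf.download` returns a two-level column index of (ticker, field), which is why the code in this section indexes `all_data[ticker]['Adj Close']`. pandas' `DataFrame.xs` can pull one field for every ticker in a single step instead of a loop. The sketch builds a small synthetic frame in that assumed layout rather than calling Yahoo; the prices and volumes are made up.

```python
import pandas as pd

# Synthetic frame shaped like group_by='ticker' output: (ticker, field) columns
dates = pd.bdate_range('2024-01-01', periods=3)
columns = pd.MultiIndex.from_product([['1155.KL', '1295.KL'], ['Adj Close', 'Volume']])
demo_data = pd.DataFrame(
    [[9.6, 100, 4.3, 200],
     [9.7, 110, 4.4, 210],
     [9.8, 120, 4.2, 220]],
    index=dates, columns=columns,
)

# Take 'Adj Close' from every ticker at once (level 1 holds the field names)
demo_prices = demo_data.xs('Adj Close', axis=1, level=1)

# Swap ticker codes for readable names
demo_prices = demo_prices.rename(columns={'1155.KL': 'Maybank', '1295.KL': 'Public Bank'})
print(demo_prices)
```

Against the real download, the equivalent call would be `all_data.xs('Adj Close', axis=1, level=1)`.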

In [None]:
# Create a watchlist of top Malaysian stocks
watchlist = {
    'Maybank': '1155.KL',
    'Public Bank': '1295.KL',
    'CIMB': '1023.KL',
    'Sime Darby Plantation': '5285.KL',
    'Gamuda': '5398.KL',
    'Nestle Malaysia': '4707.KL',
    'Sunway': '5211.KL',
    'Tenaga Nasional': '5347.KL'
}

print("Your Watchlist:")
print("=" * 50)
for name, ticker in watchlist.items():
    print(f"{name:25s} : {ticker}")

print(f"\nüìä Total stocks: {len(watchlist)}")

In [None]:
# Download all tickers in a single call (faster than one request per ticker)
# group_by='ticker' puts the ticker in the first column level: all_data[ticker]['Close']

ticker_list = list(watchlist.values())
print(f"Downloading {len(ticker_list)} stocks...\n")

# yfinance treats `end` as exclusive, so use 2025-01-01 to include 31 Dec 2024
start_date = '2023-01-01'
end_date = '2025-01-01'

# auto_adjust=False keeps the 'Adj Close' column used below
all_data = yf.download(ticker_list, start=start_date, end=end_date, 
                       auto_adjust=False, progress=True, group_by='ticker')

print(f"\n✅ Downloaded {len(all_data)} trading days of data for {len(ticker_list)} stocks")
print(f"Data shape: {all_data.shape}")

In [None]:
# Extract closing prices for all stocks
# This creates a DataFrame with one column per stock

closing_prices = pd.DataFrame()

for name, ticker in watchlist.items():
    if ticker in all_data:
        # Extract Adj Close column for this ticker
        closing_prices[name] = all_data[ticker]['Adj Close']

# Display first few rows
print("Closing Prices DataFrame:")
print(closing_prices.head())

print(f"\nShape: {closing_prices.shape}")
print(f"Columns: {closing_prices.columns.tolist()}")

In [None]:
# Visualize all stocks on one chart
# Normalize to 100 at start for fair comparison

plt.figure(figsize=(16, 8))

# Normalize each stock to 100 at its first available price
# (bfill().iloc[0] takes each column's first non-missing value)
normalized_prices = closing_prices / closing_prices.bfill().iloc[0] * 100

# Plot each stock
for column in normalized_prices.columns:
    plt.plot(normalized_prices.index, normalized_prices[column], 
             linewidth=2, label=column, alpha=0.8)

plt.axhline(y=100, color='gray', linestyle='--', alpha=0.5)
plt.title('Malaysian Blue-Chip Performance Comparison (2023-2024)\nNormalized to 100', 
         fontsize=16, fontweight='bold')
plt.xlabel('Date', fontsize=12)
plt.ylabel('Normalized Price (Start = 100)', fontsize=12)
plt.legend(loc='best', fontsize=9, ncol=2)
plt.grid(True, alpha=0.3)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

print("\nüí° This chart shows relative performance - which stocks outperformed!")

In [None]:
# Calculate performance metrics for all stocks
rows = []

for name in closing_prices.columns:
    prices = closing_prices[name].dropna()
    
    if len(prices) > 1:
        start_price = prices.iloc[0]
        end_price = prices.iloc[-1]
        total_return = ((end_price - start_price) / start_price) * 100
        
        # Volatility: standard deviation of daily returns, annualized with sqrt(252)
        daily_returns = prices.pct_change().dropna()
        volatility = daily_returns.std() * np.sqrt(252) * 100
        
        rows.append({'Stock': name, 'Start Price': start_price, 'End Price': end_price,
                     'Total Return': total_return, 'Volatility': volatility})

# Sort numerically first, then format the columns for display
performance_summary = pd.DataFrame(rows).sort_values('Total Return', ascending=False)
performance_summary['Start Price'] = performance_summary['Start Price'].map('RM{:.2f}'.format)
performance_summary['End Price'] = performance_summary['End Price'].map('RM{:.2f}'.format)
performance_summary['Total Return'] = performance_summary['Total Return'].map('{:.2f}%'.format)
performance_summary['Volatility'] = performance_summary['Volatility'].map('{:.2f}%'.format)

print("\nPerformance Summary (2023-2024):")
print("=" * 80)
print(performance_summary.to_string(index=False))

print("\nüí° Higher return often comes with higher volatility (risk)!")

## 3. Handling Missing Data and Data Quality

Real-world data is messy. Let's learn to handle common issues:

### Common Data Issues

1. **Missing dates**: Market holidays, trading halts
2. **Zero volume**: Trading suspensions (often around corporate actions), illiquid stocks, data errors
3. **Price gaps**: News events, earnings announcements
4. **Outliers**: Fat-finger errors, flash crashes
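
`validate_stock_data()` below flags large close-to-close moves but not overnight gaps (item 3). One way to measure a gap is today's open relative to yesterday's close: `Open / previous Close - 1`. `find_price_gaps` is a hypothetical helper, the 5% threshold is an illustrative choice rather than a standard, and the prices are made up.

```python
import pandas as pd

def find_price_gaps(data, threshold=0.05):
    """Hypothetical helper: rows whose open differs from the prior close by > threshold."""
    prev_close = data['Close'].shift(1)
    gap = data['Open'] / prev_close - 1     # overnight gap as a fraction
    flagged = gap.abs() > threshold         # first row is NaN, which compares False
    return pd.DataFrame({'Prev Close': prev_close[flagged],
                         'Open': data['Open'][flagged],
                         'Gap %': (gap[flagged] * 100).round(2)})

# Synthetic bars with a 10% gap up on the third day
bars = pd.DataFrame({'Open': [10.00, 10.05, 11.00, 11.02],
                     'Close': [10.00, 10.00, 11.05, 11.00]},
                    index=pd.bdate_range('2024-03-04', periods=4))
print(find_price_gaps(bars))
```

Gaps that line up with earnings or announcement dates are usually genuine; unexplained ones are worth cross-checking against Bursa Malaysia announcements.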

### Data Quality Checks

In [None]:
# Function to validate stock data quality

def validate_stock_data(data, ticker_name):
    """
    Perform comprehensive data quality checks.
    
    Args:
        data (DataFrame): Stock data from yfinance
        ticker_name (str): Name/ticker for reporting
    
    Returns:
        dict: Quality report
    """
    print(f"\nData Quality Report: {ticker_name}")
    print("=" * 60)
    
    # Check 1: Missing values
    missing_values = data.isnull().sum()
    print(f"\n1. Missing Values:")
    print(missing_values)
    
    # Check 2: Zero volume days
    zero_volume_days = (data['Volume'] == 0).sum()
    print(f"\n2. Zero Volume Days: {zero_volume_days}")
    
    # Check 3: Price consistency (High >= Low)
    price_inconsistencies = (data['High'] < data['Low']).sum()
    print(f"\n3. Price Inconsistencies (High < Low): {price_inconsistencies}")
    
    # Check 4: Extreme price changes (>20% in one day)
    daily_returns = data['Adj Close'].pct_change()
    extreme_moves = (abs(daily_returns) > 0.20).sum()
    print(f"\n4. Extreme Price Moves (>20% daily): {extreme_moves}")
    
    if extreme_moves > 0:
        print("   Dates with extreme moves:")
        extreme_dates = data[abs(daily_returns) > 0.20].index
        for date in extreme_dates:
            ret = daily_returns.loc[date]
            print(f"   - {date.strftime('%Y-%m-%d')}: {ret*100:+.2f}%")
    
    # Check 5: Data completeness
    total_days = (data.index[-1] - data.index[0]).days
    trading_days_expected = total_days / 7 * 5  # Rough estimate
    completeness = (len(data) / trading_days_expected) * 100
    print(f"\n5. Data Completeness: ~{completeness:.1f}%")
    print(f"   ({len(data)} rows over {total_days} calendar days)")
    
    # Overall assessment
    issues = missing_values.sum() + zero_volume_days + price_inconsistencies
    
    if issues == 0 and completeness > 90:
        print("\n✅ Overall: Data quality is EXCELLENT")
    elif issues < 10 and completeness > 85:
        print("\n⚠️  Overall: Data quality is GOOD (minor issues)")
    else:
        print("\n❌ Overall: Data quality needs attention")
    
    return {
        'missing_values': missing_values.sum(),
        'zero_volume_days': zero_volume_days,
        'price_inconsistencies': price_inconsistencies,
        'extreme_moves': extreme_moves,
        'completeness': completeness
    }

# Test with Maybank data
quality_report = validate_stock_data(daily_data, 'Maybank (1155.KL)')
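
The completeness check estimates expected trading days as calendar days × 5/7, which is only right on average and ignores public holidays. NumPy's `np.busday_count` counts weekdays exactly and accepts a `holidays` list. In this sketch `expected_trading_days` is a hypothetical helper, and no Bursa Malaysia holiday calendar is built in; you would supply those dates yourself.

```python
import numpy as np
import pandas as pd

def expected_trading_days(index, holidays=()):
    """Hypothetical helper: weekdays from index[0] to index[-1], both inclusive.

    `holidays` is an optional list of 'YYYY-MM-DD' strings to exclude.
    """
    start = np.datetime64(index[0].date(), 'D')
    end = np.datetime64(index[-1].date(), 'D') + 1  # busday_count excludes the end date
    return int(np.busday_count(start, end, holidays=list(holidays)))

idx = pd.bdate_range('2024-01-01', '2024-01-31')            # every weekday in January 2024
print(expected_trading_days(idx))                             # 23
print(expected_trading_days(idx, holidays=['2024-01-25']))    # 22, with one example holiday
```

In the notebook, completeness could then be computed as `len(daily_data) / expected_trading_days(daily_data.index) * 100`.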

In [None]:
# Handling missing data - forward fill method
# This is appropriate for stock prices (carry forward last known price)

def clean_stock_data(data):
    """
    Clean stock data by handling missing values.
    
    Args:
        data (DataFrame): Raw stock data
    
    Returns:
        DataFrame: Cleaned data
    """
    cleaned = data.copy()
    
    # Forward fill price data (use last known price)
    price_columns = ['Open', 'High', 'Low', 'Close', 'Adj Close']
    cleaned[price_columns] = cleaned[price_columns].ffill()
    
    # Fill volume with 0 (if no trading occurred)
    cleaned['Volume'] = cleaned['Volume'].fillna(0)
    
    # Drop any remaining rows with missing values
    cleaned = cleaned.dropna()
    
    return cleaned

# Clean the data
cleaned_daily_data = clean_stock_data(daily_data)

print(f"Original data: {len(daily_data)} rows")
print(f"Cleaned data:  {len(cleaned_daily_data)} rows")
print(f"Rows removed:  {len(daily_data) - len(cleaned_daily_data)}")
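
Forward fill only uses past values, so unlike backward fill it never leaks a future price into an earlier row, which matters for backtests. Its downside is that an unlimited fill carries one stale price across a long trading suspension. pandas' `ffill(limit=...)` caps the number of consecutive fills; the limit of 2 below is an arbitrary illustrative choice, and the prices are synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic closes with a three-day gap (for example, a suspension)
prices = pd.Series([10.0, np.nan, np.nan, np.nan, 10.4],
                   index=pd.bdate_range('2024-05-06', periods=5))

filled_all = prices.ffill()            # every gap filled with the last known price
filled_capped = prices.ffill(limit=2)  # at most 2 consecutive fills; the rest stay NaN

print(pd.DataFrame({'raw': prices, 'ffill': filled_all, 'ffill(limit=2)': filled_capped}))
```

Rows still missing after a capped fill would then be removed by the `dropna()` step in `clean_stock_data()`.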

## 4. Saving and Loading Data Locally

**Why save data locally?**

1. **Faster analysis**: No need to re-download every time
2. **Offline work**: Analyze without internet connection
3. **Consistent data**: Same data for backtesting
4. **API limits**: Avoid hitting rate limits

We'll use the `../data/` directory structure.
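
The benefits above all come from not downloading the same data twice. A common way to get them is a "load if cached, otherwise download and save" wrapper. In this sketch `get_cached` and the `<code>_cache.csv` filename are hypothetical, and the download function is passed in as `fetch` so the example runs offline with a fake fetcher.

```python
import tempfile
from pathlib import Path

import pandas as pd

def get_cached(ticker, fetch, cache_dir):
    """Hypothetical helper: read `ticker` from a CSV cache, calling fetch(ticker) only on a miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{ticker.replace('.KL', '')}_cache.csv"
    if path.exists():
        return pd.read_csv(path, index_col=0, parse_dates=True)
    data = fetch(ticker)
    data.to_csv(path)
    return data

# Fake fetcher that records each call (no network needed)
calls = []
def fake_fetch(ticker):
    calls.append(ticker)
    return pd.DataFrame({'Close': [9.6, 9.7]},
                        index=pd.DatetimeIndex(['2024-01-02', '2024-01-03'], name='Date'))

with tempfile.TemporaryDirectory() as tmp:
    first = get_cached('1155.KL', fake_fetch, tmp)   # cache miss: calls fake_fetch
    second = get_cached('1155.KL', fake_fetch, tmp)  # cache hit: reads the CSV
    print(len(calls))                                # 1
```

In the notebook, `fetch` could wrap `yf.download(t, period='1y', auto_adjust=False, progress=False)`, provided the returned frame has single-level columns, since a two-level header does not round-trip through this simple CSV read. The cache never expires, so delete the file when you want fresh data.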

In [None]:
# Create data directory structure if it doesn't exist
data_dir = Path('../data')
raw_dir = data_dir / 'raw'
processed_dir = data_dir / 'processed'
sample_dir = data_dir / 'sample'

# Create directories
for directory in [raw_dir, processed_dir, sample_dir]:
    directory.mkdir(parents=True, exist_ok=True)
    print(f"✅ Directory ready: {directory}")

print("\n📁 Data directory structure created!")

In [None]:
# Function to save stock data

def save_stock_data(data, ticker, data_type='raw'):
    """
    Save stock data to CSV file.
    
    Args:
        data (DataFrame): Stock data to save
        ticker (str): Stock ticker symbol
        data_type (str): 'raw', 'processed', or 'sample'
    
    Returns:
        Path: File path where data was saved
    """
    # Determine directory based on data type
    if data_type == 'raw':
        directory = raw_dir
    elif data_type == 'processed':
        directory = processed_dir
    else:
        directory = sample_dir
    
    # Clean ticker for filename (remove .KL suffix)
    clean_ticker = ticker.replace('.KL', '')
    
    # Create filename with date
    date_str = datetime.now().strftime('%Y%m%d')
    filename = f"{clean_ticker}_{data_type}_{date_str}.csv"
    filepath = directory / filename
    
    # Save to CSV
    data.to_csv(filepath)
    
    print(f"✅ Saved: {filepath}")
    print(f"   Size: {filepath.stat().st_size / 1024:.2f} KB")
    print(f"   Rows: {len(data)}")
    
    return filepath

# Save Maybank data
saved_file = save_stock_data(cleaned_daily_data, '1155.KL', data_type='processed')

In [None]:
# Function to load stock data

def load_stock_data(ticker, data_type='processed'):
    """
    Load stock data from CSV file (most recent).
    
    Args:
        ticker (str): Stock ticker symbol
        data_type (str): 'raw', 'processed', or 'sample'
    
    Returns:
        DataFrame: Loaded stock data
    """
    # Determine directory
    if data_type == 'raw':
        directory = raw_dir
    elif data_type == 'processed':
        directory = processed_dir
    else:
        directory = sample_dir
    
    # Clean ticker
    clean_ticker = ticker.replace('.KL', '')
    
    # Find matching files (use most recent)
    pattern = f"{clean_ticker}_{data_type}_*.csv"
    matching_files = sorted(directory.glob(pattern))
    
    if not matching_files:
        raise FileNotFoundError(f"No data files found for {ticker} in {directory}")
    
    # Load most recent file
    latest_file = matching_files[-1]
    data = pd.read_csv(latest_file, index_col=0, parse_dates=True)
    
    print(f"✅ Loaded: {latest_file.name}")
    print(f"   Rows: {len(data)}")
    print(f"   Date range: {data.index[0].strftime('%Y-%m-%d')} to {data.index[-1].strftime('%Y-%m-%d')}")
    
    return data

# Test loading
loaded_data = load_stock_data('1155.KL', data_type='processed')

# Verify it matches the original (compare with a tolerance: a CSV round trip
# may not reproduce floats bit-for-bit)
print(f"\n✅ Data loaded successfully!")
print(f"First row matches: {np.isclose(loaded_data['Close'].iloc[0], cleaned_daily_data['Close'].iloc[0])}")

## 5. Retrieving Company Information

yfinance provides more than just price data - you can also get company fundamentals!
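
The statements returned by `Ticker.balance_sheet`, `Ticker.income_stmt` and `Ticker.cashflow` are DataFrames, typically with line items in rows and reporting periods in columns. The sketch below assumes that layout and the row labels `'Total Revenue'` and `'Net Income'`, and uses made-up figures so it runs offline. It transposes the table so each row is a period, then derives net margin and revenue growth.

```python
import pandas as pd

# Synthetic income statement in the assumed layout:
# rows = line items, columns = fiscal period end dates (newest first). Figures are made up.
demo_stmt = pd.DataFrame(
    {pd.Timestamp('2023-12-31'): [60_000.0, 9_300.0],
     pd.Timestamp('2022-12-31'): [55_000.0, 8_200.0]},
    index=['Total Revenue', 'Net Income'],
)

# Transpose so each row is a period, oldest first, then derive simple ratios
periods = demo_stmt.T.sort_index()
periods['Net Margin %'] = periods['Net Income'] / periods['Total Revenue'] * 100
periods['Revenue Growth %'] = periods['Total Revenue'].pct_change() * 100
print(periods.round(2))
```

Real Bursa statements may use different row labels or be missing rows entirely, so check `demo_stmt.index`-style labels on the actual DataFrame before selecting them.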

In [None]:
# Get comprehensive company information
ticker = yf.Ticker('1155.KL')

# Company info (dictionary)
info = ticker.info

print("Maybank Company Information:")
print("=" * 60)

# Display key information
key_fields = [
    'longName', 'symbol', 'sector', 'industry', 'country',
    'marketCap', 'previousClose', 'open', 'volume',
    'dividendYield', 'trailingPE', 'forwardPE'
]

for field in key_fields:
    value = info.get(field)
    if value is None:
        continue  # field missing or null for this stock
    
    # Format based on field type
    if field == 'marketCap':
        value = f"RM {value:,.0f}"
    elif field == 'dividendYield':
        value = f"{value*100:.2f}%"
    elif field in ['previousClose', 'open']:
        value = f"RM {value:.2f}"
    elif field == 'volume':
        value = f"{value:,}"
    
    print(f"{field:20s}: {value}")

In [None]:
# Get financial statements (if available)
# Note: Coverage may be limited for some Malaysian stocks

print("\nAttempting to retrieve financial data...\n")

# Balance Sheet
try:
    balance_sheet = ticker.balance_sheet
    if not balance_sheet.empty:
        print("✅ Balance Sheet available")
        print(f"   Periods (columns): {len(balance_sheet.columns)}")
    else:
        print("⚠️  Balance Sheet: Limited data")
except Exception as e:
    print(f"❌ Balance Sheet: Not available ({e})")

# Income Statement
try:
    income_stmt = ticker.income_stmt
    if not income_stmt.empty:
        print("✅ Income Statement available")
        print(f"   Periods (columns): {len(income_stmt.columns)}")
    else:
        print("⚠️  Income Statement: Limited data")
except Exception as e:
    print(f"❌ Income Statement: Not available ({e})")

# Cash Flow
try:
    cash_flow = ticker.cashflow
    if not cash_flow.empty:
        print("✅ Cash Flow available")
        print(f"   Periods (columns): {len(cash_flow.columns)}")
    else:
        print("⚠️  Cash Flow: Limited data")
except Exception as e:
    print(f"❌ Cash Flow: Not available ({e})")

print("\n💡 Note: Fundamental data coverage varies by stock.")
print("   For detailed financials, check Bursa Malaysia announcements.")

## 6. Building a Watchlist Database

Let's create a system to manage and update a watchlist of stocks.
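
`MalaysianStockWatchlist` keeps its `stocks` dictionary in memory, so the watchlist disappears when the kernel restarts. A minimal stdlib sketch for saving that kind of mapping to JSON and reading it back. `save_watchlist` and `load_watchlist` are hypothetical helpers, and `added_date` is stored as an ISO-8601 string because `json` cannot serialize `datetime` objects directly.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path

def save_watchlist(stocks, path):
    """Hypothetical helper: write {ticker: {'name', 'sector', 'added_date'}} to JSON."""
    serializable = {
        ticker: {**meta, 'added_date': meta['added_date'].isoformat()}
        for ticker, meta in stocks.items()
    }
    Path(path).write_text(json.dumps(serializable, indent=2))

def load_watchlist(path):
    """Hypothetical helper: read the JSON back, restoring 'added_date' as a datetime."""
    raw = json.loads(Path(path).read_text())
    return {
        ticker: {**meta, 'added_date': datetime.fromisoformat(meta['added_date'])}
        for ticker, meta in raw.items()
    }

stocks = {'1155.KL': {'name': 'Maybank', 'sector': 'Banking',
                      'added_date': datetime(2024, 1, 2, 9, 30)}}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'watchlist.json'
    save_watchlist(stocks, path)
    restored = load_watchlist(path)
print(restored == stocks)  # True
```

With the class below, this would be `save_watchlist(my_watchlist.stocks, data_dir / 'watchlist.json')`.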

In [None]:
# Complete watchlist management system

class MalaysianStockWatchlist:
    """
    Manage a watchlist of Malaysian stocks with automatic data updates.
    """
    
    def __init__(self, name='My Watchlist'):
        self.name = name
        self.stocks = {}
        self.data_cache = {}
    
    def add_stock(self, name, ticker, sector='Other'):
        """Add a stock to the watchlist."""
        self.stocks[ticker] = {
            'name': name,
            'sector': sector,
            'added_date': datetime.now()
        }
        print(f"✅ Added: {name} ({ticker}) - {sector}")
    
    def remove_stock(self, ticker):
        """Remove a stock from the watchlist."""
        if ticker in self.stocks:
            name = self.stocks[ticker]['name']
            del self.stocks[ticker]
            print(f"❌ Removed: {name} ({ticker})")
        else:
            print(f"⚠️  {ticker} not in watchlist")
    
    def update_all_data(self, days=365):
        """Download latest data for all stocks in watchlist."""
        print(f"\nUpdating data for {len(self.stocks)} stocks...\n")
        
        end_date = datetime.now()
        start_date = end_date - timedelta(days=days)
        
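        for ticker in self.stocks:
            try:
                data = yf.download(ticker, start=start_date, end=end_date, 
                                   auto_adjust=False, progress=False)
                # yf.download usually reports a failed ticker by returning an empty frame
                if data.empty:
                    print(f"⚠️  {ticker}: No data returned")
                    continue
                self.data_cache[ticker] = data
                print(f"✅ {ticker}: {len(data)} rows")
            except Exception as e:
                print(f"❌ {ticker}: Error - {e}")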
    
    def get_summary(self):
        """Get summary of all stocks in watchlist."""
        print(f"\n{self.name}")
        print("=" * 70)
        print(f"{'Ticker':<12} {'Name':<25} {'Sector':<15} {'Status'}")
        print("-" * 70)
        
        for ticker, info in self.stocks.items():
            status = '✅ Data' if ticker in self.data_cache else '⚠️  No data'
            print(f"{ticker:<12} {info['name']:<25} {info['sector']:<15} {status}")
        
        print(f"\nTotal stocks: {len(self.stocks)}")
        print(f"Data cached: {len(self.data_cache)}")

# Create watchlist
my_watchlist = MalaysianStockWatchlist('Malaysian Blue-Chips')

# Add stocks
my_watchlist.add_stock('Maybank', '1155.KL', 'Banking')
my_watchlist.add_stock('Public Bank', '1295.KL', 'Banking')
my_watchlist.add_stock('Gamuda', '5398.KL', 'Construction')
my_watchlist.add_stock('Nestle Malaysia', '4707.KL', 'Consumer')
my_watchlist.add_stock('Sime Darby Plantation', '5285.KL', 'Plantation')

# Update data
my_watchlist.update_all_data(days=365)

# Show summary
my_watchlist.get_summary()

## 7. Practice Exercises

Apply what you've learned!

### Exercise 1: Weekly vs Daily Comparison

Download **CIMB (1023.KL)** data for 2024 at both daily and weekly intervals. Create a comparison chart and explain which interval would be better for:

- **(a)** Swing trading (1-3 week holds)
- **(b)** Long-term investing (6+ months)

In [None]:
# YOUR CODE HERE



### Exercise 2: Build Your REIT Portfolio

Create a watchlist of 5 Malaysian REITs. Download their data and calculate:
1. Total return for each REIT in 2024
2. Average daily volume
3. Which REIT would you invest in and why?

REITs to consider: 5123.KL (Sentral REIT), 5180.KL (CapitaLand Malaysia Trust), 5109.KL (YTL Hospitality REIT). Confirm each stock code on Bursa Malaysia before downloading.

In [None]:
# YOUR CODE HERE



### Exercise 3: Data Quality Detective

Download data for **Sunway (5211.KL)** for 2024. Run the `validate_stock_data()` function and investigate any data quality issues you find. Are there any extreme price moves? If so, can you find news explaining them?

In [None]:
# YOUR CODE HERE



### Exercise 4: Save Your Portfolio

Download data for 3 stocks of your choice. Save each one using the `save_stock_data()` function to the appropriate directory (raw/processed/sample). Then verify you can load them back successfully.

In [None]:
# YOUR CODE HERE



## 8. Summary and Key Takeaways

Congratulations! You now have a solid foundation in data collection.

### ✅ Skills Mastered

1. **Multiple Intervals**: Download data at daily, weekly, hourly, and minute intervals
2. **Batch Downloads**: Efficiently download multiple stocks simultaneously
3. **Data Quality**: Validate and clean stock data
4. **Persistence**: Save and load data locally for offline analysis
5. **Watchlist Management**: Build and maintain a stock watchlist system
6. **Company Info**: Retrieve fundamental data beyond just prices

### 📊 Key Takeaways

1. **Start Simple**: Daily data is best for learning (less noise)
2. **Quality Matters**: Always validate data before analysis
3. **Save Locally**: Faster analysis and consistent results
4. **Multiple Stocks**: Compare performance across stocks/sectors
5. **Clean Data**: Handle missing values and outliers appropriately

### 🔧 Tools You Built

- `calculate_transaction_costs()`: Cost calculator (from Module 01)
- `validate_stock_data()`: Data quality checker
- `clean_stock_data()`: Data cleaning function
- `save_stock_data()` / `load_stock_data()`: Persistence functions
- `MalaysianStockWatchlist`: Complete watchlist management class

### 🎯 What's Next?

In **Module 03: Introduction to Technical Indicators**, you'll learn:
- Moving Averages (SMA, EMA)
- Trend identification
- Support and resistance levels
- Your first trading signals!

### 💡 Pro Tips

1. **Regular Updates**: Update your watchlist data weekly
2. **Version Control**: Include date in filenames for tracking
3. **Backup Data**: Keep copies of important datasets
4. **Document Issues**: Note any data quality problems you find
5. **Cross-Reference**: Verify important data points with official sources

### 📚 Additional Resources

- [yfinance Documentation](https://pypi.org/project/yfinance/)
- [Pandas Data Cleaning Guide](https://pandas.pydata.org/docs/user_guide/missing_data.html)
- [Bursa Malaysia Announcements](https://www.bursamalaysia.com/market_information/announcements) - Official data source

---

**Excellent work completing Module 02!** üéâ

You now have a robust data collection and management system. This is the foundation for all future technical analysis.

**Next up**: `03_moving_averages_and_trends.ipynb` - Your first technical indicators!

---

*"In God we trust. All others must bring data." - W. Edwards Deming*