Educational-Only Market Data Collection Project
A clean, modular Python project for collecting historical intraday minute-level data for major Indian stock market indices.
PLEASE READ BEFORE USING:
- LICENSE - MIT License with data usage terms
- DISCLAIMER.md - Comprehensive legal disclaimer and terms
- CONTRIBUTING.md - How to contribute to this project
This project is strictly for EDUCATIONAL, RESEARCH, and ACADEMIC purposes only.
- NOT intended for commercial use
- NOT intended for live trading or investment decisions
- Data may be incomplete, delayed, or inaccurate
- Always verify data from official sources before any financial decisions
- Use at your own risk
By using this software, you acknowledge that:
- Market data collection must comply with the data provider's Terms of Service
- You are responsible for ensuring your usage complies with applicable laws
- The authors assume no liability for any financial losses
- This is a learning tool, not a production trading system
This project collects data for the following Indian market indices:
- NIFTY 50 (^NSEI) - NSE's benchmark index
- BANK NIFTY (^NSEBANK) - NSE's banking sector index
- SENSEX (^BSESN) - BSE's benchmark index
The following minute-level timeframes are supported:
- 1 minute (1min)
- 3 minutes (3min) - Resampled from 1min data
- 5 minutes (5min)
- 10 minutes (10min) - Resampled from 5min data
- 15 minutes (15min)
- 30 minutes (30min)
- 60 minutes (60min)
Each timeframe is stored in separate CSV files for organized data management.
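The resampled timeframes above (3min from 1min, 10min from 5min) can be illustrated with pandas. This is a minimal sketch, not the project's actual implementation: the function name `resample_ohlcv`, the lowercase column names, and the choice of `origin="start"` (so that bins line up with the first bar, such as the 09:15 open, rather than with midnight) are assumptions made for illustration.

```python
import pandas as pd


def resample_ohlcv(df: pd.DataFrame, rule: str) -> pd.DataFrame:
    """Aggregate finer OHLCV bars into coarser ones (hypothetical helper).

    Expects a DatetimeIndex and open/high/low/close/volume columns.
    """
    aggregation = {
        "open": "first",   # first price in the bin
        "high": "max",     # highest price in the bin
        "low": "min",      # lowest price in the bin
        "close": "last",   # last price in the bin
        "volume": "sum",   # total volume in the bin
    }
    # origin="start" anchors the bins at the first timestamp in the data.
    out = df.resample(rule, origin="start").agg(aggregation)
    # Bins with no source bars (e.g. overnight) have NaN prices; drop them.
    return out.dropna(subset=["open"])


# Six synthetic 1-minute bars starting at 09:15 IST (made-up numbers).
index = pd.date_range("2024-01-15 09:15", periods=6, freq="min", tz="Asia/Kolkata")
bars = pd.DataFrame(
    {
        "open": [100.0, 101.0, 102.0, 103.0, 104.0, 105.0],
        "high": [101.5, 102.5, 103.5, 104.5, 105.5, 106.5],
        "low": [99.5, 100.5, 101.5, 102.5, 103.5, 104.5],
        "close": [101.0, 102.0, 103.0, 104.0, 105.0, 106.0],
        "volume": [10, 20, 30, 40, 50, 60],
    },
    index=index,
)

three_min = resample_ohlcv(bars, "3min")
print(three_min)
```

With the synthetic input above, the six 1-minute bars collapse into two 3-minute bars labelled 09:15 and 09:18.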
project_root/
│
├── data/                          # All collected data (auto-created)
│   ├── nifty50/
│   │   ├── 1min/
│   │   ├── 3min/
│   │   ├── 5min/
│   │   ├── 10min/
│   │   ├── 15min/
│   │   └── all_available_minutes/
│   │
│   ├── banknifty/
│   │   ├── 1min/
│   │   ├── 3min/
│   │   ├── 5min/
│   │   ├── 10min/
│   │   ├── 15min/
│   │   └── all_available_minutes/
│   │
│   └── sensex/
│       ├── 1min/
│       ├── 3min/
│       ├── 5min/
│       ├── 10min/
│       ├── 15min/
│       └── all_available_minutes/
│
├── src/                           # Source code
│   ├── fetchers/
│   │   ├── __init__.py
│   │   └── intraday_fetcher.py    # Data fetching logic
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── logger.py              # Logging utilities
│   │   └── file_manager.py        # File I/O operations
│   ├── __init__.py
│   └── main.py                    # Main orchestration script
│
├── logs/                          # Log files (auto-created)
├── requirements.txt               # Python dependencies
└── README.md                      # This file
- Python 3.8 or higher
- pip (Python package manager)
- Internet connection for data fetching
cd /path/to/your/project
pip install -r requirements.txt

This will install:
- pandas - Data manipulation
- yfinance - Yahoo Finance API client
- pytz - Timezone handling
- requests - HTTP requests
- aiohttp - Async HTTP requests
- pydantic - Data validation
Fetch all indices with all default timeframes:
python src/main.py

# Only NIFTY 50 and BANK NIFTY
python src/main.py --symbols nifty50 banknifty
# Only SENSEX
python src/main.py --symbols sensex

# Only 1min and 5min data
python src/main.py --timeframes 1min 5min
# Only 15min data for all indices
python src/main.py --timeframes 15min

# Last 30 days (default is 60)
python src/main.py --days-back 30
# Last 7 days (useful for 1min data)
python src/main.py --timeframes 1min --days-back 7

# NIFTY 50 only, 1min and 5min, last 7 days
python src/main.py --symbols nifty50 --timeframes 1min 5min --days-back 7

# Verbose logging
python src/main.py --verbose

# Custom output directory
python src/main.py --output-dir /path/to/custom/data/folder

# Show all options
python src/main.py --help

All data is saved in CSV format with the following standardized columns:
| Column | Type | Description |
|---|---|---|
| datetime | string | ISO 8601 format, timezone-aware IST |
| open | float | Opening price |
| high | float | Highest price in the interval |
| low | float | Lowest price in the interval |
| close | float | Closing price |
| volume | int | Trading volume (if available) |
| symbol | string | Index symbol (nifty50, banknifty, sensex) |
| timeframe | string | Timeframe (1min, 5min, etc.) |
datetime,open,high,low,close,volume,symbol,timeframe
2024-01-15 09:15:00+0530,21500.50,21520.75,21495.25,21510.00,1234567,nifty50,1min
2024-01-15 09:16:00+0530,21510.00,21530.50,21505.00,21525.25,1345678,nifty50,1min

This project uses Yahoo Finance as the primary free data source via the yfinance library.
Yahoo Finance free tier has important limitations:
- 1-minute data: Available for the last 7 days only
- 5-minute and 15-minute data: Available for the last 60 days
- Historical data beyond these periods is not available for free
- Data may have gaps, especially during market holidays or low-volume periods
- No official SLA or data quality guarantees
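The lookback limits listed above suggest capping the requested window before calling the data source. The sketch below is one way to do that; the names `MAX_LOOKBACK_DAYS`, `clamp_days_back`, and `date_window` are hypothetical and not taken from the project. The 3min and 10min entries are inferred from the fact that those timeframes are resampled from 1min and 5min data, and timeframes the list does not mention are left uncapped.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Lookback caps in days, taken from the limitations listed in this README.
# 3min and 10min inherit the cap of the data they are resampled from.
MAX_LOOKBACK_DAYS = {
    "1min": 7,
    "3min": 7,
    "5min": 60,
    "10min": 60,
    "15min": 60,
}


def clamp_days_back(timeframe: str, days_back: int) -> int:
    """Return days_back, reduced to the known cap for this timeframe."""
    limit = MAX_LOOKBACK_DAYS.get(timeframe)
    if limit is None:
        # No documented cap for this timeframe: leave the request unchanged.
        return days_back
    return min(days_back, limit)


def date_window(
    timeframe: str, days_back: int, now: Optional[datetime] = None
) -> Tuple[datetime, datetime]:
    """Build a (start, end) window that respects the cap."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(days=clamp_days_back(timeframe, days_back))
    return start, end


print(clamp_days_back("1min", 60))   # capped to 7
print(clamp_days_back("5min", 30))   # within the cap, stays 30
```

Clamping silently is a design choice; a stricter variant could raise an error or log a warning when the requested range is reduced.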
- Symbol Mapping: Our standard symbols are mapped to Yahoo Finance tickers
  - nifty50 → ^NSEI
  - banknifty → ^NSEBANK
  - sensex → ^BSESN
- Interval Handling: Some intervals are resampled from finer data
  - 3min is resampled from 1min data
  - 10min is resampled from 5min data
- Data Validation: Each DataFrame is validated for:
  - Required columns
  - Data types
  - OHLC relationships (High ≥ Low, etc.)
  - Null values
- File Organization: Data is automatically saved to the correct directory structure
- Retry Logic: Failed requests are retried up to 3 times with exponential backoff
- Rate Limiting: 1-second delay between requests to respect API limits
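The retry behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: `fetch_with_retry` is a hypothetical helper, "up to 3 times" is read as three attempts in total, and the 1-second base delay is an assumed starting value for the backoff.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying on failure with exponentially growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts:
                # Out of attempts: let the last error propagate.
                raise
            # Wait base_delay, then 2x, then 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("max_attempts must be at least 1")


# Demonstration with a fake fetch that fails twice and then succeeds.
calls = {"count": 0}


def flaky_fetch() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


recorded_delays = []
# Record the delays instead of sleeping so the demo finishes instantly.
result = fetch_with_retry(flaky_fetch, sleep=recorded_delays.append)
print(result, recorded_delays)
```

Passing `sleep` as a parameter keeps the helper easy to test; in real use the default `time.sleep` applies.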
The project is built with clean separation of concerns:
- Data Sources (src/fetchers/intraday_fetcher.py)
  - Abstract base class DataSourceBase for easy extension
  - YahooFinanceSource implementation
  - Easy to add new data sources (NSE API, paid providers, etc.)
- File Management (src/utils/file_manager.py)
  - Handles all file I/O operations
  - Data validation and integrity checks
  - Standardized file naming
  - No hardcoded paths (uses pathlib)
- Logging (src/utils/logger.py)
  - Colored console output
  - File-based logging
  - Configurable log levels
- Orchestration (src/main.py)
  - Command-line interface
  - Progress tracking
  - Summary reports
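The validation checks this README names (required columns, data types, OHLC relationships, null values) could look roughly like the sketch below. The function `validate_ohlc` and its messages are hypothetical; the actual checks in `src/utils/file_manager.py` may differ.

```python
from typing import List

import pandas as pd

# Column names follow the CSV format documented in this README.
REQUIRED_COLUMNS = [
    "datetime", "open", "high", "low", "close", "volume", "symbol", "timeframe",
]
PRICE_COLUMNS = ["open", "high", "low", "close"]


def validate_ohlc(df: pd.DataFrame) -> List[str]:
    """Return a list of problems found; an empty list means the frame passed."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]

    problems = [
        f"{col} is not numeric"
        for col in PRICE_COLUMNS
        if not pd.api.types.is_numeric_dtype(df[col])
    ]
    if problems:
        return problems

    if df[PRICE_COLUMNS].isna().any().any():
        problems.append("null values in price columns")
    if (df["high"] < df["low"]).any():
        problems.append("high below low")
    if (df["high"] < df[["open", "close"]].max(axis=1)).any():
        problems.append("high below open or close")
    if (df["low"] > df[["open", "close"]].min(axis=1)).any():
        problems.append("low above open or close")
    return problems


# One consistent row and one deliberately broken row (made-up numbers).
good = pd.DataFrame({
    "datetime": ["2024-01-15 09:15:00+0530"],
    "open": [100.0], "high": [102.0], "low": [99.0], "close": [101.0],
    "volume": [1000], "symbol": ["nifty50"], "timeframe": ["1min"],
})
bad = good.assign(high=[98.0])

print(validate_ohlc(good))
print(validate_ohlc(bad))
```

Returning a list of messages instead of raising on the first failure makes it possible to report every problem in one pass.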
Adding a new data source is simple:
# IntradayDataFetcher is assumed to live in the same module as DataSourceBase
from src.fetchers.intraday_fetcher import DataSourceBase, IntradayDataFetcher


class MyCustomSource(DataSourceBase):
    def fetch_intraday_data(self, symbol, interval, start_date, end_date):
        # Your implementation
        pass

    def get_supported_intervals(self):
        return ['1min', '5min', '15min']


# Use it
fetcher = IntradayDataFetcher(data_source=MyCustomSource())

Filenames follow this pattern:
{symbol}_{timeframe}_{start_date}_{end_date}.csv
Examples:
- nifty50_1min_20240101_20240107.csv
- banknifty_5min_20240101_20240201.csv
- sensex_15min_latest.csv
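The dated filename pattern and the per-symbol, per-timeframe directory layout can be combined with `pathlib`, which this README says the project uses for paths. The helper name `build_csv_path` is an assumption for illustration, and the `_latest.csv` variant is not covered here.

```python
from datetime import date
from pathlib import Path


def build_csv_path(
    data_dir: Path, symbol: str, timeframe: str, start: date, end: date
) -> Path:
    """Build data_dir/symbol/timeframe/{symbol}_{timeframe}_{start}_{end}.csv."""
    filename = f"{symbol}_{timeframe}_{start:%Y%m%d}_{end:%Y%m%d}.csv"
    return data_dir / symbol / timeframe / filename


csv_path = build_csv_path(
    Path("data"), "nifty50", "1min", date(2024, 1, 1), date(2024, 1, 7)
)
print(csv_path.as_posix())
# data/nifty50/1min/nifty50_1min_20240101_20240107.csv
```

Before writing a file, `csv_path.parent.mkdir(parents=True, exist_ok=True)` would create the directories if they do not yet exist.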
After each run, a summary CSV is generated:
summary_YYYYMMDD_HHMMSS.csv
Contains:
- Symbol
- Timeframe
- Number of rows
- Start and end dates
- Success/failure status
Edit INDICES_CONFIG in src/fetchers/intraday_fetcher.py:
INDICES_CONFIG = {
    'nifty50': {...},
    'banknifty': {...},
    'your_custom_index': {
        'name': 'Your Index',
        'exchange': 'NSE',
    }
}

Edit STANDARD_TIMEFRAMES in src/fetchers/intraday_fetcher.py:
STANDARD_TIMEFRAMES = ['1min', '5min', '15min', '30min']

Modify the days_back parameter in main.py or use the --days-back flag.
- Yahoo Finance: Review Yahoo's Terms of Service
- Data is provided "as-is" without guarantees
- Respect rate limits and fair use policies
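- This project accesses Yahoo Finance through the yfinance library, an unofficial client that is not affiliated with Yahoo, rather than scraping web pages itself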
- Rate limiting is implemented to avoid server overload
- Educational use typically falls under fair use
- Data is not suitable for live trading
- May contain errors, gaps, or delays
- Always use official broker data for actual trading
Problem: Empty DataFrames or "No data returned" messages
Solutions:
- Check date range (Yahoo Finance limits apply)
- For 1min data, use --days-back 7 or less
- Verify symbol names are correct
- Check internet connection
Problem: ModuleNotFoundError
Solutions:
pip install -r requirements.txt

Problem: Cannot create directories or files
Solutions:
- Check write permissions in project directory
- Run with appropriate user permissions
- Use --output-dir to specify a writable location
Problem: Too many requests errors
Solutions:
- Built-in rate limiting should prevent this
- If issues persist, increase the delay in YahooFinanceSource
Enable verbose logging to see detailed information:
python src/main.py --verbose

Check the log files in the ./logs/ directory for detailed error traces.
Potential improvements for this educational project:
- Additional Data Sources
  - NSE official API integration
  - BSE data integration
  - Support for other free data providers
- Data Quality
  - Automated data quality checks
  - Gap detection and filling
  - Outlier detection
- Performance
  - Parallel downloads using async/await
  - Incremental updates (only fetch new data)
  - Caching mechanisms
- Analysis Tools
  - Built-in data visualization
  - Basic technical indicators
  - Data exploration notebooks
- Additional Features
  - Support for individual stocks (not just indices)
  - Options chain data
  - Fundamental data integration
- NSEpy - NSE Python library
- jugaad-data - Alternative NSE data
This is an educational project. Contributions for learning purposes are welcome:
- Fork the repository
- Create a feature branch
- Add your improvements
- Test thoroughly
- Submit a pull request
Focus areas:
- Additional data sources
- Improved error handling
- Better documentation
- Unit tests
- Performance optimizations
This project is released for educational purposes. Users are responsible for:
- Complying with data provider Terms of Service
- Using data ethically and legally
- Not using for unauthorized commercial purposes
- Yahoo Finance for providing free market data access
- Pandas community for excellent data tools
- Python community for the amazing ecosystem
For issues related to:
- Code bugs: Check logs and error messages
- Data availability: Contact data provider (Yahoo Finance)
- Feature requests: Educational contributions welcome
This project demonstrates:
- Clean Python architecture
- Modular design patterns
- Error handling and retry logic
- File I/O operations
- API integration
- Data validation
- Logging best practices
- Command-line interfaces
Use it to learn about:
- Financial data structures
- Time series data handling
- Data engineering workflows
- Production-grade Python code
Remember: This is a learning tool. Always verify data and comply with all applicable laws and terms of service.
Happy Learning!