An intelligent Instagram research assistant powered by OpenAI GPT and Apify. Send natural language requests to scrape and analyze Instagram profiles, posts, and hashtags with AI-driven insights.
⚠️ Disclaimer: This tool is for educational and research purposes only. Users must comply with Instagram's Terms of Service and applicable laws. Use responsibly and ethically.
- Natural Language Interface: Ask questions in plain English, get intelligent insights
- Contextual Conversations: Multi-turn conversations with memory
- Smart Tool Selection: LLM automatically chooses the right scraping tools
- Analytical Insights: Get engagement analysis, trends, and recommendations
- Profile Scraping: Followers, bio, verification status, posts count
- Post Scraping: Captions, likes, comments, hashtags, media URLs
- Hashtag Analysis: Recent posts, trends, top performers
- Async Job Management: Handle long-running scrapes efficiently
- REST API: Full-featured FastAPI application
- Bearer Token Auth: Secure API access
- Type Safety: Pydantic models for all data
- Comprehensive Logging: Track all operations
- Production-Ready: Systemd service, deployment scripts included
- Python 3.11 or higher
- Poetry
- Apify API Token
- OpenAI API Key
1. Clone the repository

       git clone https://github.com/MeshCore-ai/instagram-scraper.git
       cd instagram-scraper

2. Install dependencies

       poetry install

3. Configure environment

       cp deploy/.env.template .env
       # Edit .env with your API keys and tokens

4. Generate a bearer token

       openssl rand -hex 32
       # Add this to MESH_BEARER_SECRET in .env

5. Run the development server

       make dev
       # Or: poetry run uvicorn instagram_scraper.app:app --reload
The API will be available at http://localhost:8000. Visit http://localhost:8000/docs for interactive API documentation.
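If openssl is not available, the bearer token from the setup steps can also be generated with Python's standard library. This is a minimal sketch; the function name generate_bearer_token is illustrative and not part of the project.

```python
import secrets


def generate_bearer_token(num_bytes: int = 32) -> str:
    """Return a random hex string; 32 bytes yields 64 hex characters,
    the same length that `openssl rand -hex 32` produces."""
    return secrets.token_hex(num_bytes)


token = generate_bearer_token()
print(token)  # paste this value into MESH_BEARER_SECRET in .env
```

secrets draws from the operating system's cryptographically secure random source, so the result is suitable for use as a shared secret.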
All configuration is managed through environment variables. See deploy/.env.template for all options.
| Variable | Description | Where to Get |
|---|---|---|
| MESH_BEARER_SECRET | API authentication token | Generate with openssl rand -hex 32 |
| APIFY_API_TOKEN | Apify API token | https://console.apify.com/account/integrations |
| OPENAI_API_KEY | OpenAI API key | https://platform.openai.com/api-keys |
| Variable | Default | Description |
|---|---|---|
| OPENAI_MODEL | gpt-4-turbo | OpenAI model to use |
| LOG_LEVEL | INFO | Logging level (DEBUG, INFO, WARNING, ERROR) |
| MAX_RESULTS_LIMIT | 200 | Maximum scraping results |
| REQUEST_TIMEOUT | 300 | Timeout in seconds |
| PORT | 8000 | Server port |
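To make the required/optional split concrete, the sketch below reads the variables from the two tables, applies the documented defaults, and fails early when a required value is missing. It is an illustration only: the Settings class and load_settings function are hypothetical names, and the project's own config.py may be organized differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Variables listed as required in the first table.
REQUIRED = ("MESH_BEARER_SECRET", "APIFY_API_TOKEN", "OPENAI_API_KEY")


@dataclass(frozen=True)
class Settings:
    """Hypothetical container; defaults mirror the optional-variables table."""
    mesh_bearer_secret: str
    apify_api_token: str
    openai_api_key: str
    openai_model: str = "gpt-4-turbo"
    log_level: str = "INFO"
    max_results_limit: int = 200
    request_timeout: int = 300
    port: int = 8000


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (os.environ when none is given)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return Settings(
        mesh_bearer_secret=env["MESH_BEARER_SECRET"],
        apify_api_token=env["APIFY_API_TOKEN"],
        openai_api_key=env["OPENAI_API_KEY"],
        openai_model=env.get("OPENAI_MODEL", "gpt-4-turbo"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        max_results_limit=int(env.get("MAX_RESULTS_LIMIT", "200")),
        request_timeout=int(env.get("REQUEST_TIMEOUT", "300")),
        port=int(env.get("PORT", "8000")),
    )
```

Passing a plain dict instead of os.environ makes the loader easy to exercise in tests without touching the real environment.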
Comprehensive Examples: See examples/ for complete code examples in curl, Python, and JavaScript covering all use cases.
Ask the agent anything about Instagram in natural language:
Example: Profile Analysis
curl -X POST http://localhost:8000/agent/chat \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"message": "Analyze the Instagram profile @nike. How many followers do they have and what is their engagement strategy?"
}'

Example: Hashtag Research
curl -X POST http://localhost:8000/agent/chat \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"message": "Show me the top posts for #fitness from the last week"
}'

Example: Comparison
curl -X POST http://localhost:8000/agent/chat \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"message": "Compare the engagement rates of @nike and @adidas"
}'

For structured requests without LLM overhead:
Scrape a Profile
curl -X POST http://localhost:8000/scrape/profile \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"username": "instagram"}'

Scrape Hashtag Posts
curl -X POST http://localhost:8000/scrape/hashtag \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"hashtag": "travel",
"limit": 50
}'

For long-running scrapes:
Start a Job
curl -X POST http://localhost:8000/scrape/async \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"scrape_type": "hashtag",
"parameters": {
"hashtag": "photography",
"limit": 200
}
}'
# Returns: {"run_id": "abc123", "status_url": "/job/abc123/status"}

Check Status
curl http://localhost:8000/job/abc123/status \
-H "Authorization: Bearer YOUR_TOKEN"

Get Results
curl http://localhost:8000/job/abc123/results \
-H "Authorization: Bearer YOUR_TOKEN"

- GET /health - Health check (no auth required)
- POST /agent/chat - Chat with AI agent (natural language)
- POST /scrape/profile - Scrape Instagram profile
- POST /scrape/posts - Scrape specific posts by URLs
- POST /scrape/hashtag - Scrape hashtag posts
- POST /scrape/async - Start async scraping job
- GET /job/{run_id}/status - Check job status
- GET /job/{run_id}/results - Get job results
- DELETE /job/{run_id} - Cancel running job
Full API documentation is available at /docs when the server is running.
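The curl calls above translate directly to Python. The sketch below uses only the standard library to build authenticated requests and to poll an async job until it finishes. The helper names (build_request, call, wait_for_job) are hypothetical, and because this README does not specify the shape of the status response, the caller supplies an is_done predicate rather than the code assuming particular field names or values.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000"  # development server address from this README


def build_request(path, token, payload=None):
    """Create a Request with the bearer header; POST when a JSON payload is given."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Authorization": f"Bearer {token}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    method = "GET" if data is None else "POST"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def call(path, token, payload=None, timeout=300):
    """Send the request and decode the JSON response body."""
    request = build_request(path, token, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_job(run_id, token, is_done, fetch=call, interval=5.0, max_polls=60):
    """Poll /job/{run_id}/status until is_done(status) is true, then fetch results.

    is_done is supplied by the caller because the status payload format
    is not documented here.
    """
    for _ in range(max_polls):
        status = fetch(f"/job/{run_id}/status", token)
        if is_done(status):
            return fetch(f"/job/{run_id}/results", token)
        time.sleep(interval)
    raise TimeoutError(f"Job {run_id} did not finish after {max_polls} polls")
```

A typical flow would call call("/scrape/async", token, {"scrape_type": "hashtag", "parameters": {"hashtag": "photography", "limit": 200}}), read run_id from the response, and pass it to wait_for_job with a predicate that matches the service's actual status values.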
make help # Show all available commands
make install # Install dependencies
make dev # Run development server with auto-reload
make test # Run tests
make test-cov # Run tests with coverage
make format # Format code with black
make lint # Run linting (ruff + mypy)
make clean # Remove cache and build artifacts

instagram-scraper/
├── src/instagram_scraper/
│   ├── app.py              # FastAPI application
│   ├── config.py           # Configuration management
│   ├── models.py           # Pydantic data models
│   ├── auth.py             # Authentication
│   ├── apify_client.py     # Apify scraper wrapper
│   └── agent/
│       ├── llm_agent.py    # LLM orchestrator
│       ├── tools.py        # Function tool definitions
│       └── prompts.py      # System prompts
├── tests/                  # Test suite
├── deploy/                 # Deployment scripts
│   ├── .env.template
│   ├── instagram-scraper.service
│   ├── deploy.sh
│   └── server-setup.sh
└── Makefile                # Development commands
See DEPLOYMENT.md for detailed deployment instructions.
1. Prepare the server

       ssh ubuntu@your-server
       wget https://raw.githubusercontent.com/MeshCore-ai/instagram-scraper/main/deploy/server-setup.sh
       chmod +x server-setup.sh
       ./server-setup.sh

2. Configure environment

       # Edit /home/ubuntu/.env with your API keys
       nano /home/ubuntu/.env

3. Deploy from local machine

       ./deploy/deploy.sh production ubuntu@your-server
The service will be running on port 8000 and managed by systemd.
The system consists of three main layers:
1. API Layer (FastAPI)
   - REST endpoints with bearer token authentication
   - Request validation with Pydantic
   - Async job management

2. Intelligence Layer (OpenAI GPT)
   - Natural language understanding
   - Smart tool selection and orchestration
   - Context management for conversations
   - Response formatting and insights

3. Scraping Layer (Apify)
   - Instagram profile scraping
   - Post and hashtag scraping
   - Job status tracking
   - Error handling and retries
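The flow between the three layers can be sketched as a small dispatch table. Everything in this example is hypothetical: the function names are invented, the scrapers are stubs rather than Apify calls, and a simple keyword rule stands in for the LLM's tool selection. It is meant to show the shape of the hand-off, not the project's implementation.

```python
from typing import Callable, Dict, Tuple


# Scraping layer: stubs standing in for Apify-backed scrapers.
def scrape_profile(username: str) -> dict:
    return {"tool": "profile", "username": username}


def scrape_hashtag(hashtag: str, limit: int = 50) -> dict:
    return {"tool": "hashtag", "hashtag": hashtag, "limit": limit}


# Registry mapping tool names to the callables that implement them.
TOOLS: Dict[str, Callable[..., dict]] = {
    "scrape_profile": scrape_profile,
    "scrape_hashtag": scrape_hashtag,
}


# Intelligence layer: a keyword rule used here in place of the LLM's choice.
def select_tool(message: str) -> Tuple[str, dict]:
    for word in message.split():
        token = word.strip(".,?!")
        if token.startswith("@") and len(token) > 1:
            return "scrape_profile", {"username": token[1:]}
        if token.startswith("#") and len(token) > 1:
            return "scrape_hashtag", {"hashtag": token[1:]}
    raise ValueError("No profile or hashtag found in message")


# API layer: receives the message, delegates selection, runs the chosen tool.
def handle_chat(message: str) -> dict:
    name, arguments = select_tool(message)
    return TOOLS[name](**arguments)
```

Keeping the tools in a registry means a new scraper can be added by registering one more function, without changing the handler.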
- Authentication: All endpoints (except /health) require bearer token
- Secret Management: Never commit .env files; use environment-specific configs
- Rate Limiting: Consider implementing rate limiting in production
- HTTPS: Always use HTTPS in production to protect tokens
- API Keys: Rotate Apify and OpenAI keys regularly
- Monitoring: Log all authentication attempts and API usage
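As an illustration of the authentication point, the following sketch checks an Authorization header against the expected secret using a constant-time comparison. The function name is_authorized is hypothetical, and the project's auth.py may implement this differently (for example through FastAPI's security utilities).

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], expected_token: str) -> bool:
    """Return True only for a header of the form 'Bearer <expected_token>'."""
    if not authorization_header or not expected_token:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(
        token.strip().encode("utf-8"), expected_token.encode("utf-8")
    )
```

Using hmac.compare_digest instead of == reduces exposure to timing attacks when comparing secrets.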
- Public Data Only: Can only access publicly available Instagram data
- Instagram API Limits: Subject to Instagram's rate limits and restrictions
- Private Accounts: Limited information available for private profiles
- Cost: Usage incurs costs from both Apify and OpenAI APIs. Monitor your usage carefully.
- No Warranty: Provided as-is without guarantees. Instagram may change their platform at any time.
- Rate Limiting: Be respectful of rate limits to avoid being blocked by Instagram or Apify
- Data Privacy: Handle scraped data responsibly and in compliance with privacy laws
Service won't start
# Check logs
sudo journalctl -u instagram-scraper -f
# Verify environment variables
cat /home/ubuntu/.env
# Check service status
sudo systemctl status instagram-scraper

Authentication errors
- Verify MESH_BEARER_SECRET matches between server and requests
- Check Authorization header format: Bearer YOUR_TOKEN
Scraping fails
- Verify Apify API token is valid
- Check Apify account has sufficient credits
- Ensure Instagram username/URL is correct
LLM not responding
- Verify OpenAI API key is valid
- Check OpenAI account has credits
- Review logs for detailed error messages
Contributions are welcome! Please ensure:
- Code passes all linting checks (make lint)
- Tests pass (make test)
- Follow existing code style
- Update documentation as needed
MIT License
Copyright (c) 2025 MeshCore AI
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Important: This tool is provided for educational and research purposes. Users are responsible for:
- Terms of Service: Complying with Instagram's Terms of Service and Community Guidelines
- Rate Limits: Respecting rate limits and API usage policies from Instagram and Apify
- Data Rights: Ensuring they have the right to scrape and use data for their intended purpose
- Privacy Laws: Following applicable data protection laws (GDPR, CCPA, etc.)
- Ethical Use: Not using this tool for spam, harassment, or malicious purposes
- Commercial Use: Understanding any commercial use restrictions from Instagram and third-party services
Disclaimer: The authors and contributors of this project are not responsible for misuse of this tool or any violations of third-party terms of service. Users assume all legal risks associated with using this software.
For issues, questions, or contributions:
- Open an issue on GitHub
- Review the API documentation at /docs
- Check logs with sudo journalctl -u instagram-scraper -f
Built with ❤️ by MeshCore AI using FastAPI, OpenAI, and Apify