Instagram Scraper Agent

An intelligent Instagram research assistant powered by OpenAI GPT and Apify. Send natural language requests to scrape and analyze Instagram profiles, posts, and hashtags with AI-driven insights.

Python 3.11+ · FastAPI · Poetry · License: MIT

⚠️ Disclaimer: This tool is for educational and research purposes only. Users must comply with Instagram's Terms of Service and applicable laws. Use responsibly and ethically.

Features

πŸ€– AI-Powered Intelligence

  • Natural Language Interface: Ask questions in plain English, get intelligent insights
  • Contextual Conversations: Multi-turn conversations with memory
  • Smart Tool Selection: LLM automatically chooses the right scraping tools
  • Analytical Insights: Get engagement analysis, trends, and recommendations

πŸ“Š Instagram Scraping Capabilities

  • Profile Scraping: Followers, bio, verification status, posts count
  • Post Scraping: Captions, likes, comments, hashtags, media URLs
  • Hashtag Analysis: Recent posts, trends, top performers
  • Async Job Management: Handle long-running scrapes efficiently

πŸ› οΈ Developer-Friendly

  • REST API: Full-featured FastAPI application
  • Bearer Token Auth: Secure API access
  • Type Safety: Pydantic models for all data
  • Comprehensive Logging: Track all operations
  • Production-Ready: Systemd service, deployment scripts included

Quick Start

Prerequisites

  • Python 3.11+ and Poetry
  • An Apify API token (https://console.apify.com/account/integrations)
  • An OpenAI API key (https://platform.openai.com/api-keys)

Installation

  1. Clone the repository

    git clone https://github.com/MeshCore-ai/instagram-scraper.git
    cd instagram-scraper
  2. Install dependencies

    poetry install
  3. Configure environment

    cp deploy/.env.template .env
    # Edit .env with your API keys and tokens
  4. Generate a bearer token

    openssl rand -hex 32
    # Add this to MESH_BEARER_SECRET in .env
  5. Run the development server

    make dev
    # Or: poetry run uvicorn instagram_scraper.app:app --reload

The API will be available at http://localhost:8000. Visit http://localhost:8000/docs for interactive API documentation.

Configuration

All configuration is managed through environment variables. See deploy/.env.template for all options.

Required Variables

Variable             Description                Where to Get
MESH_BEARER_SECRET   API authentication token   Generate with openssl rand -hex 32
APIFY_API_TOKEN      Apify API token            https://console.apify.com/account/integrations
OPENAI_API_KEY       OpenAI API key             https://platform.openai.com/api-keys

Optional Variables

Variable            Default       Description
OPENAI_MODEL        gpt-4-turbo   OpenAI model to use
LOG_LEVEL           INFO          Logging level (DEBUG, INFO, WARNING, ERROR)
MAX_RESULTS_LIMIT   200           Maximum number of results per scrape
REQUEST_TIMEOUT     300           Request timeout in seconds
PORT                8000          Server port

Usage Examples

πŸ“š Comprehensive Examples: See examples/ for complete code examples in curl, Python, and JavaScript covering all use cases.

Natural Language Chat (Recommended)

Ask the agent anything about Instagram in natural language:

Example: Profile Analysis

curl -X POST http://localhost:8000/agent/chat \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Analyze the Instagram profile @nike. How many followers do they have and what is their engagement strategy?"
  }'

Example: Hashtag Research

curl -X POST http://localhost:8000/agent/chat \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Show me the top posts for #fitness from the last week"
  }'

Example: Comparison

curl -X POST http://localhost:8000/agent/chat \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Compare the engagement rates of @nike and @adidas"
  }'

Direct Scraping Endpoints

For structured requests without LLM overhead:

Scrape a Profile

curl -X POST http://localhost:8000/scrape/profile \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"username": "instagram"}'

Scrape Hashtag Posts

curl -X POST http://localhost:8000/scrape/hashtag \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "hashtag": "travel",
    "limit": 50
  }'

Asynchronous Jobs

For long-running scrapes:

Start a Job

curl -X POST http://localhost:8000/scrape/async \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "scrape_type": "hashtag",
    "parameters": {
      "hashtag": "photography",
      "limit": 200
    }
  }'
# Returns: {"run_id": "abc123", "status_url": "/job/abc123/status"}

Check Status

curl http://localhost:8000/job/abc123/status \
  -H "Authorization: Bearer YOUR_TOKEN"

Get Results

curl http://localhost:8000/job/abc123/results \
  -H "Authorization: Bearer YOUR_TOKEN"

API Endpoints

Health & Status

  • GET /health - Health check (no auth required)

Agent Endpoints

  • POST /agent/chat - Chat with AI agent (natural language)

Direct Scraping

  • POST /scrape/profile - Scrape Instagram profile
  • POST /scrape/posts - Scrape specific posts by URLs
  • POST /scrape/hashtag - Scrape hashtag posts

Async Jobs

  • POST /scrape/async - Start async scraping job
  • GET /job/{run_id}/status - Check job status
  • GET /job/{run_id}/results - Get job results
  • DELETE /job/{run_id} - Cancel running job

Full API documentation is available at /docs while the server is running.

Development

Available Make Commands

make help       # Show all available commands
make install    # Install dependencies
make dev        # Run development server with auto-reload
make test       # Run tests
make test-cov   # Run tests with coverage
make format     # Format code with black
make lint       # Run linting (ruff + mypy)
make clean      # Remove cache and build artifacts

Project Structure

instagram-scraper/
β”œβ”€β”€ src/instagram_scraper/
β”‚   β”œβ”€β”€ app.py              # FastAPI application
β”‚   β”œβ”€β”€ config.py           # Configuration management
β”‚   β”œβ”€β”€ models.py           # Pydantic data models
β”‚   β”œβ”€β”€ auth.py             # Authentication
β”‚   β”œβ”€β”€ apify_client.py     # Apify scraper wrapper
β”‚   └── agent/
β”‚       β”œβ”€β”€ llm_agent.py    # LLM orchestrator
β”‚       β”œβ”€β”€ tools.py        # Function tool definitions
β”‚       └── prompts.py      # System prompts
β”œβ”€β”€ tests/                  # Test suite
β”œβ”€β”€ deploy/                 # Deployment scripts
β”‚   β”œβ”€β”€ .env.template
β”‚   β”œβ”€β”€ instagram-scraper.service
β”‚   β”œβ”€β”€ deploy.sh
β”‚   └── server-setup.sh
└── Makefile               # Development commands

Deployment

See DEPLOYMENT.md for detailed deployment instructions.

Quick Deployment to Ubuntu Server

  1. Prepare the server

    ssh ubuntu@your-server
    wget https://raw.githubusercontent.com/MeshCore-ai/instagram-scraper/main/deploy/server-setup.sh
    chmod +x server-setup.sh
    ./server-setup.sh
  2. Configure environment

    # Edit /home/ubuntu/.env with your API keys
    nano /home/ubuntu/.env
  3. Deploy from local machine

    ./deploy/deploy.sh production ubuntu@your-server

The service will be running on port 8000 and managed by systemd.

Architecture

The system consists of three main layers:

  1. API Layer (FastAPI)

    • REST endpoints with bearer token authentication
    • Request validation with Pydantic
    • Async job management
  2. Intelligence Layer (OpenAI GPT)

    • Natural language understanding
    • Smart tool selection and orchestration
    • Context management for conversations
    • Response formatting and insights
  3. Scraping Layer (Apify)

    • Instagram profile scraping
    • Post and hashtag scraping
    • Job status tracking
    • Error handling and retries

Security Considerations

  • Authentication: All endpoints (except /health) require bearer token
  • Secret Management: Never commit .env files; use environment-specific configs
  • Rate Limiting: Consider implementing rate limiting in production
  • HTTPS: Always use HTTPS in production to protect tokens
  • API Keys: Rotate Apify and OpenAI keys regularly
  • Monitoring: Log all authentication attempts and API usage

Limitations & Important Notes

  • Public Data Only: Can only access publicly available Instagram data
  • Instagram API Limits: Subject to Instagram's rate limits and restrictions
  • Private Accounts: Limited information available for private profiles
  • Cost: Usage incurs costs from both Apify and OpenAI APIs. Monitor your usage carefully.
  • No Warranty: Provided as-is without guarantees. Instagram may change their platform at any time.
  • Rate Limiting: Be respectful of rate limits to avoid being blocked by Instagram or Apify
  • Data Privacy: Handle scraped data responsibly and in compliance with privacy laws

Troubleshooting

Common Issues

Service won't start

# Check logs
sudo journalctl -u instagram-scraper -f

# Verify environment variables
cat /home/ubuntu/.env

# Check service status
sudo systemctl status instagram-scraper

Authentication errors

  • Verify MESH_BEARER_SECRET matches between server and requests
  • Check Authorization header format: Bearer YOUR_TOKEN

Scraping fails

  • Verify Apify API token is valid
  • Check Apify account has sufficient credits
  • Ensure Instagram username/URL is correct

LLM not responding

  • Verify OpenAI API key is valid
  • Check OpenAI account has credits
  • Review logs for detailed error messages

Contributing

Contributions are welcome! Please ensure:

  • Code passes all linting checks (make lint)
  • Tests pass (make test)
  • Follow existing code style
  • Update documentation as needed

License

MIT License

Copyright (c) 2025 MeshCore AI

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Legal & Compliance

Important: This tool is provided for educational and research purposes. Users are responsible for:

  • Terms of Service: Complying with Instagram's Terms of Service and Community Guidelines
  • Rate Limits: Respecting rate limits and API usage policies from Instagram and Apify
  • Data Rights: Ensuring they have the right to scrape and use data for their intended purpose
  • Privacy Laws: Following applicable data protection laws (GDPR, CCPA, etc.)
  • Ethical Use: Not using this tool for spam, harassment, or malicious purposes
  • Commercial Use: Understanding any commercial use restrictions from Instagram and third-party services

Disclaimer: The authors and contributors of this project are not responsible for misuse of this tool or any violations of third-party terms of service. Users assume all legal risks associated with using this software.

Support

For issues, questions, or contributions:

  • Open an issue on GitHub
  • Review the API documentation at /docs
  • Check logs with sudo journalctl -u instagram-scraper -f

Built with ❤️ by MeshCore AI using FastAPI, OpenAI, and Apify
