A comprehensive system for scraping Reddit posts about memecoins, analyzing sentiment and token mentions, and executing automated trades on Solana using Jupiter Aggregator.
This project was built for a hackathon to demonstrate automated sentiment analysis and trading of memecoins. It combines:
- Web Scraping: Automated Reddit scraping using Browser Cash's hosted browsers
- AI Analysis: Token identification using Browser Cash's Agent API
- Automated Trading: Buy/sell execution on Solana via Jupiter Aggregator
## Components

- **Browser Cash Integration** (`src/browser_cash_client.py`)
  - Manages remote browser sessions via the Browser Cash API
  - Uses Playwright with CDP (Chrome DevTools Protocol) for browser control
  - Handles navigation, script execution, and session management
- **Reddit Scraper** (`src/reddit_scraper.py`)
  - Scrapes posts from multiple subreddits simultaneously
  - Extracts post metadata (title, content, upvotes, comments, timestamps)
  - Navigates to individual posts to scrape comments
  - Filters posts from the past week
  - Handles infinite scroll to load more posts
- **Agent API Client** (`src/agent_client.py`)
  - Uses the Browser Cash Agent API for AI-powered token identification
  - Analyzes post titles, content, and comments to identify token names
  - Queues calls behind a semaphore to prevent session-limit errors
  - Includes retry logic with exponential backoff
- **Jupiter Trading Client** (`src/jupiter_client.py`)
  - Interfaces with the Jupiter Aggregator API for Solana token swaps
  - Looks up token addresses (with a Birdeye fallback)
  - Fetches price quotes and executes swaps
  - Signs transactions using Solana Web3.py
- **Data Models** (`src/models.py`)
  - `Post` dataclass for structured scraped data, with JSON serialization (see the sketch below)
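A minimal sketch of what that dataclass might look like, using the field names from the output format shown later (the real definition lives in `src/models.py`):

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional
import json

@dataclass
class Post:
    """One scraped Reddit post, mirroring the scraped_posts.json schema below."""
    id: int
    source: str                # e.g. "r/pumpfun"
    platform: str              # "reddit"
    title: str
    content: str
    author: str
    timestamp: str             # ISO 8601, e.g. "2025-01-15T10:30:00Z"
    post_age: str              # e.g. "2 hours ago"
    upvotes_likes: int
    comment_count: int
    comments: List[str] = field(default_factory=list)
    link: str = ""
    token_name: Optional[str] = None
    sentiment_score: Optional[float] = None
    hype_score: Optional[float] = None

    def to_json(self) -> str:
        """Serialize the post as a JSON object string."""
        return json.dumps(asdict(self), ensure_ascii=False)
```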
## Features

- ✅ **Parallel Subreddit Scraping**: Scrapes 3 subreddits simultaneously (`altcoin`, `CryptoMoonShots`, `pumpfun`)
- ✅ **Historical Scraping**: Scrapes all posts from the past week
- ✅ **Comment Extraction**: Navigates to each post to scrape comments
- ✅ **Infinite Scroll**: Aggressively scrolls to load more posts
- ✅ **Duplicate Prevention**: Tracks seen posts to avoid duplicates
- ✅ **Incremental Saving**: Saves posts to JSON in real time as they're scraped
- ✅ **Thread-Safe**: Uses locks to prevent JSON file corruption across parallel instances
- ✅ **Regex Fallback**: Fast `$TOKEN` pattern matching in titles (2-5 characters); see the sketch after this list
- ✅ **AI Analysis**: Uses the Agent API to analyze post content plus comments for token names
- ✅ **Queued Processing**: Agent calls are queued globally to prevent session-limit errors
- ✅ **Retry Logic**: Automatic retries with exponential backoff on failures
- ✅ **Token Lookup**: Finds token addresses from ticker symbols
- ✅ **Price Quotes**: Gets swap quotes from Jupiter
- ✅ **Automated Swaps**: Executes buy/sell orders on Solana
- ✅ **Transaction Signing**: Signs transactions with the private key from `.env`
- ✅ **Error Handling**: Robust error handling and retry logic
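A plausible version of the regex fallback mentioned above; the exact pattern lives in the repo, but per the description it only has to match a `$` followed by 2-5 characters in the title:

```python
import re
from typing import Optional

# $TOKEN: a dollar sign followed by 2-5 uppercase letters, e.g. "$PEPE".
TICKER_RE = re.compile(r"\$([A-Z]{2,5})\b")

def extract_ticker(title: str) -> Optional[str]:
    """Return the first $TOKEN ticker found in a post title, if any."""
    match = TICKER_RE.search(title)
    return match.group(1) if match else None

print(extract_ticker("Check out $TOKEN - going to the moon!"))  # -> "TOKEN"
```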
## Prerequisites

- Python 3.8+
- Browser Cash API key (Browser API + Agent API)
- Solana wallet with private key (for trading)
- `.env` file with API keys and wallet credentials
## Installation

1. Clone the repository:

```bash
git clone https://github.com/Cirbble/codejam2025.git
cd codejam2025
```

2. Install dependencies:

```bash
pip install -r requirements.txt
python -m playwright install chromium
```

3. Configure environment variables:

```bash
cp .env.example .env
```

Edit `.env` with your credentials:

```env
BROWSER_CASH_API_KEY=your_browser_api_key
AGENT_CASH_API_KEY=your_agent_api_key
MILAN_HOST=gcp-usc1-1.milan-taurine.tera.space
SOLANA_PRIVATE_KEY=your_solana_private_key_here
```

If you have a Phantom wallet with a 12-word recovery phrase:

- Use a tool like the `mnemonic` library to convert the seed phrase to a private key (see the sketch below)
- The private key should be 64 bytes (128 hex characters)
- Store it securely in `.env` (never commit it to git!)
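A hedged sketch of that conversion, assuming the `mnemonic` and `pynacl` packages and Phantom's default SLIP-0010 derivation path (`m/44'/501'/0'/0'`); verify the derived address against your wallet before trusting the key:

```python
import hashlib
import hmac
import struct

from mnemonic import Mnemonic        # pip install mnemonic
from nacl.signing import SigningKey  # pip install pynacl

def slip10_ed25519(seed: bytes, path=(44, 501, 0, 0)) -> bytes:
    """Derive a 32-byte ed25519 signing seed along a fully hardened path."""
    digest = hmac.new(b"ed25519 seed", seed, hashlib.sha512).digest()
    key, chain = digest[:32], digest[32:]
    for index in path:
        data = b"\x00" + key + struct.pack(">I", index | 0x80000000)
        digest = hmac.new(chain, data, hashlib.sha512).digest()
        key, chain = digest[:32], digest[32:]
    return key

phrase = "your twelve word recovery phrase goes here please replace me"  # placeholder
seed = Mnemonic("english").to_seed(phrase)
secret = slip10_ed25519(seed)                    # 32-byte signing seed
public = SigningKey(secret).verify_key.encode()  # 32-byte public key
print((secret + public).hex())                   # 128 hex chars for SOLANA_PRIVATE_KEY
```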
## Usage

Run the scraper:

```bash
python main.py
```

This will:

- Start 3 parallel browser sessions (one per subreddit)
- Scrape posts from the past week
- Extract comments from each post
- Identify tokens using AI (queued to prevent session limits)
- Save all data to `scraped_posts.json` incrementally
## Output Format

Posts are saved to `scraped_posts.json` with the following structure:

```json
{
"id": 1,
"source": "r/pumpfun",
"platform": "reddit",
"title": "Check out $TOKEN - going to the moon!",
"content": "Post content here...",
"author": "username",
"timestamp": "2025-01-15T10:30:00Z",
"post_age": "2 hours ago",
"upvotes_likes": 42,
"comment_count": 5,
"comments": ["Comment 1", "Comment 2", ...],
"link": "https://www.reddit.com/r/pumpfun/comments/...",
"token_name": "TOKEN",
"sentiment_score": null,
"hype_score": null
}
```
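Since the file is written incrementally, downstream steps can simply reload it. A small illustrative snippet (not part of the repo), assuming the file holds a JSON array of these objects:

```python
import json

with open("scraped_posts.json", encoding="utf-8") as f:
    posts = json.load(f)

# Keep only posts where a token was identified (by regex or the Agent API).
tokenized = [p for p in posts if p.get("token_name")]
for p in tokenized:
    print(f'{p["source"]}: ${p["token_name"]} ({p["upvotes_likes"]} upvotes)')
```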
## Testing the Trader

Test buying a token:

```bash
python test_buy_hege.py
```

This will:
- Check your wallet balance (see the sketch after this list)
- Look up the token address
- Get a quote for $1 worth
- Execute the buy order
- Display the transaction hash
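For the first step, a minimal balance check, assuming the `solana`/`solders` packages and a public mainnet RPC endpoint (the wallet address is a placeholder):

```python
from solana.rpc.api import Client
from solders.pubkey import Pubkey

client = Client("https://api.mainnet-beta.solana.com")
wallet = Pubkey.from_string("YourWalletAddressHere")  # placeholder - use your address

# get_balance returns lamports; 1 SOL = 1_000_000_000 lamports.
lamports = client.get_balance(wallet).value
print(f"Balance: {lamports / 1_000_000_000:.4f} SOL")
```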
## Configuration

Edit `src/config.py` to change which subreddits are scraped:

```python
MEMECOIN_SUBREDDITS = [
    "CryptoMoonShots",
    "SatoshiStreetBets",
    "altcoin",
    "pumpfun",
    # Add more...
]
```

Edit `main.py` to change which subreddits run in parallel:

```python
SUBREDDITS = ["altcoin", "CryptoMoonShots", "pumpfun"]
```

In `main.py`, adjust `limit_per_subreddit` (posts per page):

```python
posts = scraper.scrape_all_subreddits(
    limit_per_subreddit=25,   # Posts per page
    scrape_comments=True,
    take_screenshots=False,
    output_file=output_file,
)
```
## Troubleshooting

If you see "Session limit reached" errors:
- The scraper uses a global semaphore to queue agent calls (max 1 concurrent)
- Browser sessions are limited by Browser Cash's API limits
- Try reducing the number of parallel instances
If you see `ERR_CONNECTION_RESET`:
- Reddit may be rate-limiting your requests
- The scraper includes retry logic with exponential backoff
- Try reducing the number of parallel instances
If tokens aren't being identified:
- Check that the Agent API has sufficient credits
- Verify the regex pattern matches (e.g., `$TOKEN` in the title)
- Check the agent logs for failures
- Ensure comments are being scraped (the agent analyzes comments too)
If trading fails:
- Verify your wallet has sufficient SOL balance
- Check that the token address is correct
- Ensure Jupiter API is accessible (check DNS if needed)
- Verify your private key is correct (64 bytes)
## Project Structure

```
.
├── main.py                    # Main entry point (parallel scraping)
├── requirements.txt           # Python dependencies
├── .env                       # Environment variables (not in git)
├── scraped_posts.json         # Output file (generated)
├── src/
│   ├── __init__.py
│   ├── config.py              # Configuration (API keys, subreddits)
│   ├── models.py              # Data models (Post dataclass)
│   ├── browser_cash_client.py # Browser Cash API client
│   ├── agent_client.py        # Agent API client (token identification)
│   ├── reddit_scraper.py      # Reddit scraping logic
│   ├── jupiter_client.py      # Jupiter trading client
│   └── twitter_scraper.py     # Twitter scraper (not used)
├── test_buy_hege.py           # Test script for trading
├── test_jupiter.py            # Jupiter API test
├── test_network.py            # Network diagnostics
└── README.md                  # This file
```
## Security Notes

- Never commit `.env` to git - it contains sensitive API keys and private keys
- Private keys: store them securely and never share them
- API keys: rotate them if exposed
- Trading: start with small amounts for testing
## How It Works

### Browser automation

- Uses Browser Cash's Session API to create remote browser sessions
- Connects via CDP (Chrome DevTools Protocol) using Playwright
- Handles navigation, script execution, and page interactions
- Manages the session lifecycle (start, stop, cleanup)
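A minimal sketch of that connection, assuming the Session API hands back a CDP WebSocket URL (the URL here is a placeholder):

```python
from playwright.sync_api import sync_playwright

CDP_URL = "ws://<session-host>/devtools/browser/<id>"  # placeholder from the Session API

with sync_playwright() as p:
    # Attach to the remote, already-running Chromium over CDP.
    browser = p.chromium.connect_over_cdp(CDP_URL)
    context = browser.contexts[0] if browser.contexts else browser.new_context()
    page = context.new_page()
    page.goto("https://www.reddit.com/r/pumpfun/")
    print(page.title())
    browser.close()
```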
### Token identification

- Uses Browser Cash's Agent API for AI-powered analysis
- Sends post content + comments to the agent for token identification
- Queues calls to prevent session-limit errors
- Includes retry logic for reliability
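The retry behavior likely resembles this generic exponential-backoff wrapper (a sketch, not the repo's exact code; `call_agent` is a stand-in for the real Agent API call):

```python
import time

def with_backoff(fn, max_retries=5, base_delay=1.0):
    """Call fn(), doubling the delay after each failure."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, 8s, ...
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.0f}s")
            time.sleep(delay)

# Usage (call_agent is hypothetical):
# result = with_backoff(lambda: call_agent(post.title, post.comments))
```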
### Trading

- Uses the Jupiter Aggregator API (`lite-api.jup.ag`) for swaps
- Supports versioned transactions (v1 API)
- Handles token account creation (rent costs)
- Signs transactions with Solana Web3.py
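A hedged sketch of fetching a quote, assuming Jupiter's `/swap/v1/quote` endpoint on `lite-api.jup.ag` (check the current docs, as Jupiter's endpoints have moved over time):

```python
import requests

# SOL -> USDC quote for 0.01 SOL (amounts are in base units, lamports here).
params = {
    "inputMint": "So11111111111111111111111111111111111111112",   # wrapped SOL
    "outputMint": "EPjFWdd5AufqSSqeM2qN1xzybapC8G4wEGGkZwyTDt1v",  # USDC
    "amount": 10_000_000,  # 0.01 SOL in lamports
    "slippageBps": 50,     # 0.5% slippage tolerance
}
resp = requests.get("https://lite-api.jup.ag/swap/v1/quote", params=params, timeout=10)
resp.raise_for_status()
quote = resp.json()
print(quote["outAmount"], "USDC base units")
```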
### Parallelization

- Parallel subreddit scraping using `threading.Thread`
- Global semaphore for agent API calls (prevents session limits)
- Thread-safe JSON file updates using locks
- Global post ID counter for unique IDs across instances
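Put together, the concurrency model probably looks something like this (function bodies elided; `scrape_subreddit` is a stand-in for the real worker):

```python
import threading

AGENT_SEMAPHORE = threading.Semaphore(1)  # max 1 concurrent agent call
FILE_LOCK = threading.Lock()              # guards scraped_posts.json writes

def identify_token(post):
    with AGENT_SEMAPHORE:                 # queue agent calls globally
        ...  # call the Agent API here

def save_post(post):
    with FILE_LOCK:                       # one writer at a time
        ...  # append to scraped_posts.json

def scrape_subreddit(name):               # hypothetical worker
    ...

threads = [threading.Thread(target=scrape_subreddit, args=(s,))
           for s in ["altcoin", "CryptoMoonShots", "pumpfun"]]
for t in threads:
    t.start()
for t in threads:
    t.join()
```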
## Performance

- Scraping speed: ~3-5 posts per minute per subreddit (includes comment scraping)
- Token identification: queued, ~10-30 seconds per post (depending on queue depth)
- Parallel instances: 3 subreddits scraped simultaneously
- Memory usage: moderate (browser sessions + Playwright)
## Known Limitations

- Reddit rate limiting may slow down scraping
- Agent API session limits require queuing
- Browser Cash API has concurrent session limits
- Token identification accuracy depends on post content quality
- Trading requires sufficient SOL balance for gas + token account rent
## Future Improvements

- Sentiment analysis scoring
- Hype score calculation
- Automated trading based on sentiment thresholds
- Twitter/X integration
- Telegram channel monitoring
- Real-time monitoring (vs. historical scraping)
- Dashboard/UI for monitoring
- Database storage (vs. JSON files)
- More robust error recovery
- Performance optimizations
## Contributing

This is a hackathon project. Contributions welcome!
## License

See the LICENSE file for details.
## Acknowledgments

- Browser Cash for hosted browser infrastructure
- Jupiter Aggregator for Solana DEX aggregation
- Reddit for the platform we scrape
- Solana for the blockchain we trade on
Built for CodeJam 2025 🚀