A real-time web application for tracking biotech and biopharma mergers, acquisitions, and partnerships using the GDELT 2.0 DOC API.
# Navigate to project directory
cd '/Users/junchilu/Desktop/2025/AI & Data Science/00_Projects/easy_prompt_app'
# Create virtual environment (if it does not exist)
python3 -m venv venv
# Activate virtual environment
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Option 1: Direct command
streamlit run app.py
# Option 2: Use startup script
./start_app.sh
Once the app starts, it is available at:
- Local URL: http://localhost:8501
- Network URL: http://192.168.0.13:8501
easy_prompt_app/
├── app.py            # MAIN APPLICATION (635 lines)
├── requirements.txt  # Python dependencies
├── start_app.sh      # Quick launcher script
├── venv/             # Virtual environment
└── README.md         # This comprehensive guide
Single Source of Truth: Everything is in app.py, so there is no confusion about which file to run!
- GDELT 2.0 DOC API Integration: Fetches biotech news from thousands of sources
- No API Key Required: Free public API access
- Smart Filtering: Automatically filters for biotech/pharma deals
- Deal Size Parsing: Converts "$2.3B", "800 million", "1.2bn" to USD millions
- Company Extraction: Identifies acquirer and target companies
- Deal Type Classification: Acquisition, merger, partnership, investment
- Deduplication: Removes duplicate articles by URL
- KPI Dashboard: Total deals, deals with size, median/largest deal values
- Bar Chart: Top 10 deals by size (Acquirer → Target)
- Timeline: Deal dots over time with size proportional to value
- Network Graph: Acquirer-target relationships with edge thickness = deal size
- Data Table: Complete deal information with clickable source links
- Backend: Python 3.13.5
- Frontend: Streamlit
- Data Processing: Pandas, NumPy
- Visualizations: Plotly, NetworkX, Matplotlib, Seaborn
- API: GDELT 2.0 DOC API
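The deduplication step listed above could be sketched as follows. This is a minimal illustration, not the code in app.py: it assumes each article is a dict with `url` and `title` keys, and the title normalization is an assumed heuristic for catching near-identical headlines. `normalize_title` and `deduplicate_articles` are hypothetical names.

```python
import re


def normalize_title(title: str) -> str:
    """Hypothetical helper: lowercase a headline and collapse punctuation/whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate_articles(articles: list) -> list:
    """Keep the first article seen for each URL and for each normalized title."""
    seen_urls = set()
    seen_titles = set()
    unique = []
    for article in articles:
        # Treat "https://x.com/a" and "https://x.com/a/" as the same URL.
        url = (article.get("url") or "").strip().rstrip("/")
        title_key = normalize_title(article.get("title") or "")
        if (url and url in seen_urls) or (title_key and title_key in seen_titles):
            continue
        if url:
            seen_urls.add(url)
        if title_key:
            seen_titles.add(title_key)
        unique.append(article)
    return unique
```

Matching on normalized titles as well as URLs catches the same wire story syndicated under different links, at the cost of occasionally merging two genuinely different articles that share a headline.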
source venv/bin/activate
streamlit run app.py
- Use the sidebar to set search parameters
- Adjust date range (30-365 days)
- Customize search queries
- Click "Fetch Deals Data" to search for recent deals
- The app automatically searches multiple biotech-related queries
- Results are filtered for biotech/pharma relevance
- View KPI cards with key metrics
- Analyze top deals in the bar chart
- Track deal timeline
- Explore company relationships in network graph
- Browse detailed deals table
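The KPI cards come down to a few summary statistics. Here is a minimal sketch that assumes each deal is a dict whose hypothetical `deal_size_musd` key holds the parsed size in USD millions, or `None` when no size was found. `compute_kpis` is also a hypothetical name.

```python
from statistics import median


def compute_kpis(deals: list) -> dict:
    """Summarize deals for the KPI cards; sizes are in USD millions."""
    # Only deals with a parsed size contribute to the median and the maximum.
    sizes = [d["deal_size_musd"] for d in deals if d.get("deal_size_musd") is not None]
    return {
        "total_deals": len(deals),
        "deals_with_size": len(sizes),
        "median_deal_musd": median(sizes) if sizes else None,
        "largest_deal_musd": max(sizes) if sizes else None,
    }
```

The median is a more robust headline number than the mean here, because a single mega-deal would otherwise dominate the average.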
The app automatically searches for:
- "biotech acquisition AND (merck OR abbvie OR pfizer OR novartis OR roche)"
- "biopharma merger AND (therapeutics OR pharmaceuticals OR biotech)"
- "pharmaceutical acquisition AND (drug OR medicine OR therapy)"
- "biotech partnership AND (drug development OR clinical trial)"
- "pharma merger AND (biotech OR therapeutics)"
- "biotech buyout AND (drug OR medicine)"
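Running these queries and merging the results might look like the sketch below. `collect_articles` is a hypothetical helper. `fetch` stands in for whatever function calls the GDELT API and returns a list of article dicts. The default pause between requests is an assumed courtesy delay, not a documented GDELT limit, so adjust it to GDELT's current guidance.

```python
import time
from typing import Callable, Dict, List

# The default queries listed in this README, copied verbatim.
DEFAULT_QUERIES = [
    "biotech acquisition AND (merck OR abbvie OR pfizer OR novartis OR roche)",
    "biopharma merger AND (therapeutics OR pharmaceuticals OR biotech)",
    "pharmaceutical acquisition AND (drug OR medicine OR therapy)",
    "biotech partnership AND (drug development OR clinical trial)",
    "pharma merger AND (biotech OR therapeutics)",
    "biotech buyout AND (drug OR medicine)",
]


def collect_articles(
    queries: List[str],
    fetch: Callable[[str], List[Dict]],
    pause_seconds: float = 5.0,
) -> List[Dict]:
    """Run each query through `fetch`, tag every article with its query, and skip repeated URLs."""
    seen_urls = set()
    combined = []
    for index, query in enumerate(queries):
        if index > 0 and pause_seconds > 0:
            time.sleep(pause_seconds)  # assumed courtesy delay between API calls
        for article in fetch(query):
            url = article.get("url")
            if url and url in seen_urls:
                continue
            if url:
                seen_urls.add(url)
            combined.append({**article, "query": query})
    return combined
```

Injecting `fetch` keeps the loop testable without network access. For example, `collect_articles(DEFAULT_QUERIES, my_fetch)` would run all six searches, where `my_fetch` is any function that takes a query string.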
- Deal Size Patterns: Recognizes $2.3B, 800 million, 1.2bn USD, etc.
- Company Names: Extracts acquirer and target from headlines
- Deal Types: Classifies as acquisition, merger, partnership, investment
- Biotech Filtering: Keeps only biotech/pharma related deals
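A keyword check is one simple way to implement the biotech filtering step. In this sketch the keyword patterns and the `is_biotech_deal` name are assumptions, not the app's actual lists. Word boundaries keep a word like "emerging" from counting as a merger.

```python
import re

# Hypothetical keyword patterns; \b anchors stop partial-word hits such as "emerging".
BIOTECH_PATTERN = re.compile(
    r"\b(?:biotech\w*|\w*pharma\w*|therapeutics?|drugs?|clinical|vaccines?)\b", re.IGNORECASE
)
DEAL_PATTERN = re.compile(
    r"\b(?:acqui\w*|merg\w*|buyout|partner\w*|deals?|licens\w*|invest\w*)\b", re.IGNORECASE
)


def is_biotech_deal(title: str) -> bool:
    """Keep a headline only if it mentions both a biotech/pharma term and a deal term."""
    return bool(BIOTECH_PATTERN.search(title)) and bool(DEAL_PATTERN.search(title))
```

Requiring both a sector term and a deal term filters out general pharma news and non-healthcare M&A in a single pass.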
The app successfully extracts deals like:
- Merck nears $10 billion deal for biotech Verona → $10,000M, Merck → Verona
- AbbVie Finalizes Acquisition Of Capstan Therapeutics → AbbVie → Capstan Therapeutics
- Recursion Pharmaceuticals acquiring REV102 rights → Recursion Pharmaceuticals → REV102
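Headline heuristics like the ones behind these examples can be sketched with a few ordered regular expressions. The patterns, keyword groups, and function names below are illustrative assumptions, not the app's code. They reproduce the Merck and AbbVie examples above but will miss many headline shapes, which is why NER appears among the enhancements.

```python
import re
from typing import Optional, Tuple

# Hypothetical deal-type keyword groups, checked in order; the first match wins.
DEAL_TYPE_PATTERNS = [
    ("acquisition", re.compile(r"\b(?:acqui\w*|buys?|buyout|takeover)\b", re.IGNORECASE)),
    ("merger", re.compile(r"\bmerg\w*", re.IGNORECASE)),
    ("partnership", re.compile(r"\b(?:partner\w*|collaborat\w*|licens\w*)", re.IGNORECASE)),
    ("investment", re.compile(r"\b(?:invest\w*|stake|funding)\b", re.IGNORECASE)),
]

# Hypothetical headline shapes that name both parties.
PARTY_PATTERNS = [
    # e.g. "AbbVie Finalizes Acquisition Of Capstan Therapeutics"
    re.compile(r"^(?P<acquirer>.+?)\s+(?:finalizes|completes|closes|announces)\s+"
               r"acquisition\s+of\s+(?P<target>.+)$", re.IGNORECASE),
    # e.g. "Acme Bio to acquire Beta Therapeutics for $1.2 billion"
    re.compile(r"^(?P<acquirer>.+?)\s+(?:to\s+acquire|acquires|acquiring|buys|to\s+buy)\s+"
               r"(?P<target>.+)$", re.IGNORECASE),
    # e.g. "Merck nears $10 billion deal for biotech Verona"
    re.compile(r"^(?P<acquirer>.+?)\s+(?:nears|strikes|inks|signs)\s+.*?\bdeal\s+(?:for|with)\s+"
               r"(?:biotech\s+)?(?P<target>.+)$", re.IGNORECASE),
]

# A trailing "for $X ..." or "in $X ..." clause is not part of the target's name.
TRAILING_PRICE = re.compile(r"\s+(?:for|in)\s+\$.*$", re.IGNORECASE)


def classify_deal_type(title: str) -> str:
    for label, pattern in DEAL_TYPE_PATTERNS:
        if pattern.search(title):
            return label
    return "other"


def extract_parties(title: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (acquirer, target) from the first matching headline shape, else (None, None)."""
    text = title.strip()
    for pattern in PARTY_PATTERNS:
        match = pattern.match(text)
        if match:
            target = TRAILING_PRICE.sub("", match.group("target")).strip(" .,:;")
            return match.group("acquirer").strip(), target
    return None, None
```

Checking "acquisition" before "partnership" reflects an assumed priority: a headline that mentions both is more likely to report a buyout.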
- Date Range: 30-365 days lookback
- Max Records: 250 per query (configurable)
- Search Terms: Customizable biotech queries
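Filtering to the selected lookback window is where the timezone issue listed under Key considerations tends to surface. In Python, ordering comparisons between a timezone-aware timestamp and a naive `datetime.now()` raise `TypeError`. The sketch below assumes GDELT `seendate` values look like `20250101T120000Z` (UTC). `parse_seendate` and `within_lookback` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_seendate(value: str) -> datetime:
    """Parse a GDELT-style timestamp such as '20250101T120000Z' into an aware UTC datetime."""
    return datetime.strptime(value, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)


def within_lookback(seendate: str, days: int, now: Optional[datetime] = None) -> bool:
    """True if the article falls inside the last `days` days (the README allows 30-365)."""
    if not 30 <= days <= 365:
        raise ValueError("lookback must be between 30 and 365 days")
    # Use an aware "now": comparing it with a naive datetime would raise TypeError.
    now = now or datetime.now(timezone.utc)
    return parse_seendate(seendate) >= now - timedelta(days=days)
```

Passing `now` explicitly makes the cutoff reproducible in tests, while the default uses the current UTC time.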
Supported deal size formats:
- Billions: $2.3B, $2.3 billion, 2.3B dollars
- Millions: $800M, $800 million, 800M USD
- Thousands: $500K, $500 thousand, 500K dollars
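These formats can be normalized to USD millions with one regular expression and a unit table. This is a sketch under assumptions: the pattern, the unit table, and the `parse_deal_size_musd` name do not come from app.py. The function takes the first amount in the text, so a headline such as "5 million patients" would be misread as a deal size.

```python
import re
from typing import Optional

# Multipliers that convert each recognized unit to USD millions.
UNIT_TO_MUSD = {
    "k": 0.001, "thousand": 0.001,
    "m": 1.0, "mn": 1.0, "million": 1.0,
    "b": 1000.0, "bn": 1000.0, "billion": 1000.0,
}

# An amount (thousands separators allowed, e.g. "1,200"), optional whitespace,
# then a unit word or suffix; \b stops "19 boosters" from reading as "19 b".
DEAL_SIZE_RE = re.compile(
    r"(?P<amount>\d+(?:,\d{3})*(?:\.\d+)?)\s*(?P<unit>billion|million|thousand|bn|mn|b|m|k)\b",
    re.IGNORECASE,
)


def parse_deal_size_musd(text: str) -> Optional[float]:
    """Return the first deal size found in `text`, in USD millions, or None."""
    match = DEAL_SIZE_RE.search(text)
    if match is None:
        return None
    amount = float(match.group("amount").replace(",", ""))
    return amount * UNIT_TO_MUSD[match.group("unit").lower()]
```

For example, `parse_deal_size_musd("Merck nears $10 billion deal for biotech Verona")` gives 10000.0, which matches the $10,000M shown in the examples.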
Current limitations:
- Basic company name extraction (could use NER)
- Limited deal type classification
- No historical data persistence
- Basic deduplication logic
Possible future enhancements:
- Named Entity Recognition (NER) for better company extraction
- Machine learning for deal type classification
- Database integration for historical tracking
- Advanced filtering and search capabilities
- Email alerts for new deals
- Export to Excel/CSV functionality
# Check virtual environment
source venv/bin/activate
python --version
# Reinstall dependencies
pip install -r requirements.txt
# Use a different port
streamlit run app.py --server.port 8502
- Check your internet connection
- Verify GDELT API is accessible
- Check the Streamlit logs for error messages
- Endpoint: https://api.gdeltproject.org/api/v2/doc/doc
- Authentication: None required
- Rate Limits: Reasonable usage limits apply
- Documentation: GDELT Project
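A request to this endpoint can be sketched using only the standard library. The parameter names (`query`, `mode=ArtList`, `format=json`, `maxrecords`, `sort`, `startdatetime`, `enddatetime`) are assumed to follow the GDELT DOC 2.0 API documentation, so check them against the current docs. The helper names are hypothetical. GDELT's documentation has described the DOC API as searching a rolling window of roughly the last three months, so lookbacks near 365 days may not return older articles. Confirm this before relying on long ranges.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

GDELT_DOC_ENDPOINT = "https://api.gdeltproject.org/api/v2/doc/doc"


def build_gdelt_url(query: str, days: int = 30, max_records: int = 250,
                    now: Optional[datetime] = None) -> str:
    """Build an article-list request covering the last `days` days, using UTC timestamps."""
    end = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    start = end - timedelta(days=days)
    params = {
        "query": query,
        "mode": "ArtList",
        "format": "json",
        "maxrecords": min(max_records, 250),  # 250 per query, as noted in this README
        "sort": "DateDesc",
        "startdatetime": start.strftime("%Y%m%d%H%M%S"),
        "enddatetime": end.strftime("%Y%m%d%H%M%S"),
    }
    return f"{GDELT_DOC_ENDPOINT}?{urlencode(params)}"


def fetch_gdelt_articles(query: str, days: int = 30, max_records: int = 250,
                         timeout: float = 30.0) -> List[dict]:
    """Call the DOC API and return its 'articles' list (empty when nothing matched)."""
    with urlopen(build_gdelt_url(query, days, max_records), timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        # Errors such as malformed queries may come back as plain text instead of JSON.
        raise RuntimeError(f"GDELT returned non-JSON text: {body[:200]!r}") from exc
    return payload.get("articles", [])
```

For example, `fetch_gdelt_articles("biotech acquisition", days=90)` would return article dicts, each expected to carry fields such as `url`, `title`, and `seendate`. Building the URL separately makes it easy to print or open in a browser when checking that the GDELT API is reachable.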
This is a workshop project designed for easy modification and extension. Key areas for improvement:
- Data Extraction: Enhance company name and deal size parsing
- Visualizations: Add more chart types and interactive features
- Data Storage: Implement database persistence
- UI/UX: Improve the Streamlit interface
- Performance: Optimize API calls and data processing
This project is created for educational and workshop purposes.
Here's the effective prompt to create this Biopharma M&A Radar app:
Cursor Kick-off Prompt
I want to build a Biopharma M&A Radar web app.
Requirements:
Stack: Use Python + Streamlit (preferred for fast prototyping).
API source: Use the GDELT 2.0 DOC API (JSON, no API key required) to fetch recent news headlines about biotech / biopharma mergers, acquisitions, and partnerships.
Data extraction: For each article, parse the headline, URL, date, acquirer, target, deal size (USD), and deal type. Use regex and heuristics to capture deal size (e.g., $2.3B, 800 million, etc.) and normalize to USD numeric values.
Deduplicate by URL or similar titles. Filter only healthcare/biopharma deals.
Visualization:
KPI cards (total deals, # with deal size, median deal size, largest deal).
Bar chart: Top 10 deals by size (Acquirer → Target, Year).
Timeline: Deal dots over time, size = deal value.
Breakdown by big pharma companies: Pie chart and bar chart showing deal distribution.
Table: Show extracted fields in a clean table with links to the source articles.
Keep code clean, modular, and well-commented so I can edit easily in the workshop.
Deliverables in this session:
Create a Streamlit app skeleton with placeholders for data fetching, parsing, and visualizations.
Implement the GDELT API fetch function with a test query for "biotech acquisition" in the last 6 months.
Return raw JSON results in a Streamlit preview to confirm API works.
Key considerations:
- Handle timezone issues in date comparisons
- Improve company name extraction with known biotech/pharma company lists
- Add realistic sample deal sizes for demonstration
- Filter for recent deals only (within specified date range)
- Remove network graph visualization (too complex)
- Focus on big pharma breakdown instead
- Clean up folder structure - consolidate documentation
- Ensure single source of truth (one main app.py file)
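The big pharma breakdown comes down to counting deals per acquirer. In this sketch the `BIG_PHARMA` tuple reuses the five companies named in the default search queries. The matching rule is an assumption: a case-insensitive substring match, with every other acquirer grouped as "Other". `big_pharma_breakdown` is a hypothetical name.

```python
from collections import Counter

# The five companies named in this README's default search queries.
BIG_PHARMA = ("Merck", "AbbVie", "Pfizer", "Novartis", "Roche")


def big_pharma_breakdown(deals: list) -> Counter:
    """Count deals per big pharma acquirer; all other acquirers are grouped as 'Other'."""
    counts = Counter()
    for deal in deals:
        acquirer = (deal.get("acquirer") or "").lower()
        # Hypothetical matching rule: case-insensitive substring, first listed company wins.
        group = next((name for name in BIG_PHARMA if name.lower() in acquirer), "Other")
        counts[group] += 1
    return counts
```

The resulting counts map directly onto the labels and values of a pie or bar chart.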
Just run: streamlit run app.py
That's it! No confusion, no multiple files to choose from. Everything you need is in the main app.py file.
Built with ❤️ for the biopharma community