codingnoodle/aiworkshop_exercise

🧬 Biopharma M&A Radar

A real-time web application for tracking biotech and biopharma mergers, acquisitions, and partnerships using the GDELT 2.0 DOC API.

🚀 Quick Start

1. Setup & Installation

# Navigate to the project directory (adjust the path to your own checkout)
cd path/to/easy_prompt_app

# Create virtual environment (if not exists)
python3 -m venv venv

# Activate virtual environment
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

2. Run the App

# Option 1: Direct command
streamlit run app.py

# Option 2: Use startup script
./start_app.sh

3. Access the App

Open http://localhost:8501 in your browser. This is Streamlit's default address; the port differs if you pass --server.port.

📁 Project Structure

easy_prompt_app/
├── app.py                    # 🎯 MAIN APPLICATION (635 lines)
├── requirements.txt          # 📦 Python dependencies
├── start_app.sh              # 🚀 Quick launcher script
├── venv/                     # 🐍 Virtual environment
└── README.md                 # 📖 This comprehensive guide

Single Source of Truth: Everything is in app.py - no confusion about which file to run!

🎯 Features

Real-time Data

  • GDELT 2.0 DOC API Integration: Fetches biotech news from thousands of sources
  • No API Key Required: Free public API access
  • Smart Filtering: Automatically filters for biotech/pharma deals

Intelligent Data Extraction

  • Deal Size Parsing: Converts "$2.3B", "800 million", "1.2bn" to USD millions
  • Company Extraction: Identifies acquirer and target companies
  • Deal Type Classification: Acquisition, merger, partnership, investment
  • Deduplication: Removes duplicate articles by URL
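
The URL-based deduplication mentioned above could look roughly like the sketch below. It is an illustration, not the code in app.py: the helper names `normalize_url` and `dedupe_by_url`, the normalization rules, and the assumption that each article is a dict with a `url` key are all hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Lowercase scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def dedupe_by_url(articles):
    """Keep the first article seen for each normalized URL, preserving order.

    Articles without a URL are skipped (an assumption made for this sketch).
    """
    seen = set()
    unique = []
    for article in articles:
        key = normalize_url(article.get("url", ""))
        if key and key not in seen:
            seen.add(key)
            unique.append(article)
    return unique


# Hypothetical input: the first two entries point at the same page.
sample = [
    {"url": "https://Example.com/a/", "title": "first"},
    {"url": "https://example.com/a#comments", "title": "duplicate"},
    {"url": "https://example.com/b", "title": "second"},
]
unique_articles = dedupe_by_url(sample)
```

Normalizing before comparing catches trivially different spellings of the same link; it does not catch the same story republished under different URLs.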

Interactive Visualizations

  • KPI Dashboard: Total deals, deals with size, median/largest deal values
  • Bar Chart: Top 10 deals by size (Acquirer → Target)
  • Timeline: Deal dots over time with size proportional to value
  • Network Graph: Acquirer-target relationships with edge thickness = deal size
  • Data Table: Complete deal information with clickable source links
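
As a rough illustration of the KPI cards, the four headline numbers can be computed with the standard library alone. The field name `deal_size_musd` (deal size in USD millions, `None` when unknown) and the function name `compute_kpis` are assumptions made for this sketch and may not match app.py.

```python
from statistics import median


def compute_kpis(deals):
    """Summarize a list of deal dicts into the four KPI values."""
    sizes = [d["deal_size_musd"] for d in deals if d.get("deal_size_musd") is not None]
    return {
        "total_deals": len(deals),
        "deals_with_size": len(sizes),
        "median_size_musd": median(sizes) if sizes else None,
        "largest_size_musd": max(sizes) if sizes else None,
    }


# Hypothetical rows, for illustration only.
example_deals = [
    {"acquirer": "A", "target": "B", "deal_size_musd": 10000},
    {"acquirer": "C", "target": "D", "deal_size_musd": None},
    {"acquirer": "E", "target": "F", "deal_size_musd": 800},
]
kpis = compute_kpis(example_deals)
```

In a Streamlit app, each value could then be shown with `st.metric(label, value)`.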

🛠️ Tech Stack

  • Backend: Python 3.13.5
  • Frontend: Streamlit
  • Data Processing: Pandas, NumPy
  • Visualizations: Plotly, NetworkX, Matplotlib, Seaborn
  • API: GDELT 2.0 DOC API

📊 How to Use

1. Launch the App

source venv/bin/activate
streamlit run app.py

2. Configure Search

  • Use the sidebar to set search parameters
  • Adjust date range (30-365 days)
  • Customize search queries

3. Fetch Data

  • Click "Fetch Deals Data" to search for recent deals
  • The app automatically searches multiple biotech-related queries
  • Results are filtered for biotech/pharma relevance

4. Explore Results

  • View KPI cards with key metrics
  • Analyze top deals in the bar chart
  • Track deal timeline
  • Explore company relationships in network graph
  • Browse detailed deals table

🔍 Data Sources & Processing

GDELT API Queries

The app automatically searches for:

  • "biotech acquisition AND (merck OR abbvie OR pfizer OR novartis OR roche)"
  • "biopharma merger AND (therapeutics OR pharmaceuticals OR biotech)"
  • "pharmaceutical acquisition AND (drug OR medicine OR therapy)"
  • "biotech partnership AND (drug development OR clinical trial)"
  • "pharma merger AND (biotech OR therapeutics)"
  • "biotech buyout AND (drug OR medicine)"
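
One way to run several queries and merge the results is sketched below. The function `collect_articles`, the pause between requests, and the injected `fetch` callable are assumptions for illustration; the README does not say how app.py loops over its queries. Passing the fetcher in as an argument keeps the loop testable without network access.

```python
import time

QUERIES = [
    "biotech acquisition AND (merck OR abbvie OR pfizer OR novartis OR roche)",
    "biopharma merger AND (therapeutics OR pharmaceuticals OR biotech)",
    "pharmaceutical acquisition AND (drug OR medicine OR therapy)",
    "biotech partnership AND (drug development OR clinical trial)",
    "pharma merger AND (biotech OR therapeutics)",
    "biotech buyout AND (drug OR medicine)",
]


def collect_articles(queries, fetch, pause_seconds=1.0):
    """Call fetch(query) for each query and merge the returned lists.

    A failing query is recorded and skipped so that one bad response
    does not abort the whole run.
    """
    merged = []
    failed = []
    for index, query in enumerate(queries):
        if index and pause_seconds:
            time.sleep(pause_seconds)  # be polite to the public API
        try:
            merged.extend(fetch(query))
        except (OSError, ValueError) as exc:
            failed.append((query, str(exc)))
    return merged, failed


def fake_fetch(query):
    """Stand-in fetcher used only to demonstrate the loop offline."""
    if "merger" in query:
        raise ValueError("simulated bad response")
    return [{"title": query}]


articles, failed = collect_articles(QUERIES, fake_fetch, pause_seconds=0)
```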

Data Extraction

  • Deal Size Patterns: Recognizes $2.3B, 800 million, 1.2bn USD, etc.
  • Company Names: Extracts acquirer and target from headlines
  • Deal Types: Classifies as acquisition, merger, partnership, investment
  • Biotech Filtering: Keeps only biotech/pharma related deals
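
A keyword-based take on deal type classification and biotech filtering might look like this. The keyword lists, the `"other"` fallback label, and the function names are assumptions for this sketch; app.py may use different rules.

```python
import re

# Checked in order; the first matching pattern decides the label.
DEAL_TYPE_PATTERNS = [
    ("acquisition", re.compile(r"\b(?:acquir\w*|acquisition\w*|buyout|takeover|buys?)\b", re.I)),
    ("merger", re.compile(r"\bmerg\w*\b", re.I)),
    ("partnership", re.compile(r"\b(?:partner\w*|collaborat\w*|alliance|licens\w*)\b", re.I)),
    ("investment", re.compile(r"\b(?:invest\w*|stake|funding)\b", re.I)),
]

BIOTECH_RE = re.compile(
    r"\b(?:biotech\w*|biopharma\w*|pharma\w*|therapeutics?|drugs?|clinical)\b", re.I
)


def classify_deal_type(title):
    """Return the first matching deal type, or "other" if nothing matches."""
    for label, pattern in DEAL_TYPE_PATTERNS:
        if pattern.search(title):
            return label
    return "other"


def is_biotech_related(title):
    """True if the headline contains at least one biotech/pharma keyword."""
    return BIOTECH_RE.search(title) is not None


headline = "AbbVie Finalizes Acquisition Of Capstan Therapeutics"
deal_type = classify_deal_type(headline)
keep = is_biotech_related(headline)
```

The word boundaries matter: without them, "merg" would also fire on words such as "emerges".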

📈 Sample Results

The app successfully extracts deals like:

  • Merck nears $10 billion deal for biotech Verona → $10,000M, Merck → Verona
  • AbbVie Finalizes Acquisition Of Capstan Therapeutics → AbbVie → Capstan Therapeutics
  • Recursion Pharmaceuticals acquiring REV102 rights → Recursion Pharmaceuticals → REV102
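
The three headlines above can be split into acquirer and target with a few headline-shaped regular expressions. This is a minimal sketch, assuming patterns written specifically for these phrasings; the pattern list and the name `extract_parties` are hypothetical and not taken from app.py, and real headlines will need many more cases.

```python
import re

# Optional trailing word that is not part of the target name.
_TAIL = r"(?:\s+rights)?$"

HEADLINE_PATTERNS = [
    # "X to acquire Y", "X acquires Y", "X acquiring Y", "X buys Y"
    re.compile(
        r"^(?P<acquirer>.+?)\s+(?:to acquire|acquires|acquiring|to buy|buys)\s+(?P<target>.+?)" + _TAIL,
        re.I,
    ),
    # "X finalizes acquisition of Y"
    re.compile(r"^(?P<acquirer>.+?)\s+\w+\s+acquisition of\s+(?P<target>.+?)" + _TAIL, re.I),
    # "X nears $10 billion deal for biotech Y"
    re.compile(
        r"^(?P<acquirer>.+?)\s+\w+\s+.*?\bdeal for\s+(?:biotech\s+)?(?P<target>.+?)" + _TAIL,
        re.I,
    ),
]


def extract_parties(headline):
    """Return (acquirer, target) from the first matching pattern, else (None, None)."""
    text = headline.strip()
    for pattern in HEADLINE_PATTERNS:
        match = pattern.match(text)
        if match:
            return match.group("acquirer").strip(), match.group("target").strip()
    return None, None


parties = extract_parties("AbbVie Finalizes Acquisition Of Capstan Therapeutics")
```

Headlines that match none of the patterns return `(None, None)`, which is where a list of known company names or NER would take over.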

🔧 Configuration

Search Parameters

  • Date Range: 30-365 days lookback
  • Max Records: 250 per query (configurable)
  • Search Terms: Customizable biotech queries
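
The lookback window can be enforced with timezone-aware datetimes, as in the sketch below. It assumes article timestamps arrive as UTC strings shaped like `20250615T080000Z` (the `seendate` format this sketch expects from GDELT); the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def clamp_lookback(days, low=30, high=365):
    """Keep the lookback inside the supported 30-365 day range."""
    return max(low, min(high, days))


def parse_seendate(value):
    """Parse a 'YYYYMMDDTHHMMSSZ' string into an aware UTC datetime."""
    return datetime.strptime(value, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)


def within_lookback(seendate, days, now=None):
    """True if the article timestamp falls inside the lookback window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=clamp_lookback(days))
    return parse_seendate(seendate) >= cutoff


reference = datetime(2025, 6, 30, tzinfo=timezone.utc)
recent = within_lookback("20250615T080000Z", 30, now=reference)
stale = within_lookback("20250101T000000Z", 30, now=reference)
```

Comparing aware datetimes on both sides avoids the "can't compare offset-naive and offset-aware datetimes" error that appears when only one side carries a timezone.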

Deal Size Recognition

  • $2.3B, $2.3 billion, 2.3B dollars
  • $800M, $800 million, 800M USD
  • $500K, $500 thousand, 500K dollars
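
The formats listed above can be normalized to USD millions with a single regular expression. This is a sketch under stated assumptions rather than the parser in app.py: the name `parse_deal_size_musd` is hypothetical, and the rule that a bare single-letter unit needs a `$` or a `USD`/`dollars` suffix is a choice made here to avoid matching strings such as "Phase 2b".

```python
import re

# Multipliers that convert each unit (keyed by its first letter) to USD millions.
_UNIT_TO_MILLIONS = {"b": 1000.0, "m": 1.0, "t": 0.001, "k": 0.001}

_SIZE_RE = re.compile(
    r"(?P<dollar>\$)?\s*"
    r"(?P<num>\d+(?:,\d{3})*(?:\.\d+)?)\s*"
    r"(?P<unit>billion|bn|b|million|mn|m|thousand|k)\b"
    r"(?P<suffix>\s*(?:usd|dollars))?",
    re.IGNORECASE,
)


def parse_deal_size_musd(text):
    """Return the first deal size found in text, in USD millions, or None."""
    for match in _SIZE_RE.finditer(text):
        unit = match.group("unit").lower()
        has_currency = bool(match.group("dollar") or match.group("suffix"))
        # Bare single letters ("2b", "5m") are too ambiguous without "$" or "USD".
        if len(unit) == 1 and not has_currency:
            continue
        number = float(match.group("num").replace(",", ""))
        return round(number * _UNIT_TO_MILLIONS[unit[0]], 6)
    return None


size = parse_deal_size_musd("Merck nears $10 billion deal for biotech Verona")
```

Only the first acceptable match is returned, so a headline that mentions two amounts yields the earlier one.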

🚧 Current Limitations

  • Basic company name extraction (could use NER)
  • Limited deal type classification
  • No historical data persistence
  • Basic deduplication logic

🔮 Future Enhancements

  • Named Entity Recognition (NER) for better company extraction
  • Machine learning for deal type classification
  • Database integration for historical tracking
  • Advanced filtering and search capabilities
  • Email alerts for new deals
  • Export to Excel/CSV functionality

🆘 Troubleshooting

If the app doesn't start:

# Check virtual environment
source venv/bin/activate
python --version

# Reinstall dependencies
pip install -r requirements.txt

If you get port conflicts:

# Use a different port
streamlit run app.py --server.port 8502

If API calls fail:

  • Check your internet connection
  • Verify GDELT API is accessible
  • Check the Streamlit logs for error messages

📝 API Information

GDELT 2.0 DOC API

  • Endpoint: https://api.gdeltproject.org/api/v2/doc/doc
  • Authentication: None required
  • Rate Limits: Reasonable usage limits apply
  • Documentation: GDELT Project
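
A request against this endpoint can be assembled with the standard library, as sketched below. The function names are hypothetical, and the parameter choices (`mode=ArtList`, `format=json`, `maxrecords`, `sort`, `startdatetime`/`enddatetime` as `YYYYMMDDHHMMSS`) reflect this sketch's reading of the DOC API rather than what app.py necessarily sends. The defaults mirror the 250-record cap and a six-month lookback.

```python
import json
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode
from urllib.request import urlopen

GDELT_DOC_ENDPOINT = "https://api.gdeltproject.org/api/v2/doc/doc"


def build_gdelt_url(query, days_back=180, max_records=250, now=None):
    """Build an article-list request URL for the GDELT 2.0 DOC API."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(days=days_back)
    params = {
        "query": query,
        "mode": "ArtList",
        "format": "json",
        "maxrecords": max_records,
        "sort": "DateDesc",
        "startdatetime": start.strftime("%Y%m%d%H%M%S"),
        "enddatetime": now.strftime("%Y%m%d%H%M%S"),
    }
    return GDELT_DOC_ENDPOINT + "?" + urlencode(params)


def fetch_articles(query, timeout=30, **kwargs):
    """Fetch one query; return the "articles" list, or [] if the body is not JSON."""
    with urlopen(build_gdelt_url(query, **kwargs), timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return []  # the API can answer with a plain-text error message
    return data.get("articles", []) if isinstance(data, dict) else []


url = build_gdelt_url(
    "biotech acquisition",
    days_back=30,
    now=datetime(2025, 1, 31, tzinfo=timezone.utc),
)
```

Calling `fetch_articles("biotech acquisition")` performs the actual network request; printing `url` first is a quick way to test the query in a browser when API calls fail.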

🤝 Contributing

This is a workshop project designed for easy modification and extension. Key areas for improvement:

  1. Data Extraction: Enhance company name and deal size parsing
  2. Visualizations: Add more chart types and interactive features
  3. Data Storage: Implement database persistence
  4. UI/UX: Improve the Streamlit interface
  5. Performance: Optimize API calls and data processing

📄 License

This project is created for educational and workshop purposes.


🎨 Vibe Coding Prompt

Here's the effective prompt to create this Biopharma M&A Radar app:

🚀 Cursor Kick-off Prompt

I want to build a Biopharma M&A Radar web app.

Requirements:

Stack: Use Python + Streamlit (preferred for fast prototyping).

API source: Use the GDELT 2.0 DOC API (JSON, no API key required) to fetch recent news headlines about biotech / biopharma mergers, acquisitions, and partnerships.

Data extraction: For each article, parse the headline, URL, date, acquirer, target, deal size (USD), and deal type. Use regex and heuristics to capture deal size (e.g., $2.3B, 800 million, etc.) and normalize to USD numeric values.

Deduplicate by URL or similar titles. Filter only healthcare/biopharma deals.

Visualization:

KPI cards (total deals, # with deal size, median deal size, largest deal).

Bar chart: Top 10 deals by size (Acquirer → Target, Year).

Timeline: Deal dots over time, size = deal value.

Breakdown by big pharma companies: Pie chart and bar chart showing deal distribution.

Table: Show extracted fields in a clean table with links to the source articles.

Keep code clean, modular, and well-commented so I can edit easily in the workshop.

Deliverables in this session:

Create a Streamlit app skeleton with placeholders for data fetching, parsing, and visualizations.

Implement the GDELT API fetch function with a test query for "biotech acquisition" in the last 6 months.

Return raw JSON results in a Streamlit preview to confirm API works.

Key considerations:
- Handle timezone issues in date comparisons
- Improve company name extraction with known biotech/pharma company lists
- Add realistic sample deal sizes for demonstration
- Filter for recent deals only (within specified date range)
- Remove network graph visualization (too complex)
- Focus on big pharma breakdown instead
- Clean up folder structure - consolidate documentation
- Ensure single source of truth (one main app.py file)

🎉 Ready to Go!

Just run: streamlit run app.py

That's it! No confusion, no multiple files to choose from. Everything you need is in the main app.py file.

Built with ❤️ for the biopharma community
