# Twitter Account Discovery with CrewAI

This notebook demonstrates how to use [CrewAI](https://github.com/crewAIInc/crewAI) to build a hierarchical agent system that collects Twitter accounts related to a topic. Candidate handles are gathered from DuckDuckGo search results rather than from Twitter's own search, and each handle is then checked against the Twitter API with Tweepy.

## Install dependencies
Run the following cell if the required packages are not installed.

In [None]:
!pip install crewai tweepy duckduckgo-search pandas -q

## Imports and API credentials
Set your API keys as environment variables before running the rest of the notebook:

In [None]:
import os
TWITTER_CONSUMER_KEY = os.getenv('TWITTER_CONSUMER_KEY')
TWITTER_CONSUMER_SECRET = os.getenv('TWITTER_CONSUMER_SECRET')
TWITTER_ACCESS_TOKEN = os.getenv('TWITTER_ACCESS_TOKEN')
TWITTER_ACCESS_SECRET = os.getenv('TWITTER_ACCESS_SECRET')

OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
SERPER_API_KEY = os.getenv('SERPER_API_KEY')


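If any of the variables above is reported as `MISSING`, set it in your environment before continuing.

In [None]:
# Report each credential as set or missing without echoing the secret values.
for name in ('TWITTER_CONSUMER_KEY', 'TWITTER_CONSUMER_SECRET', 'TWITTER_ACCESS_TOKEN', 'TWITTER_ACCESS_SECRET', 'OPENAI_API_KEY', 'SERPER_API_KEY'):
    print(f'{name}:', 'set' if os.getenv(name) else 'MISSING')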

## Twitter profile and DuckDuckGo search tools

In [None]:
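import tweepy
from typing import Dict, List, Optional
from duckduckgo_search import DDGS

class TwitterProfileTool:
    """Looks up public profile data for a handle through the Twitter API (via Tweepy)."""

    def __init__(self):
        auth = tweepy.OAuth1UserHandler(
            TWITTER_CONSUMER_KEY,
            TWITTER_CONSUMER_SECRET,
            TWITTER_ACCESS_TOKEN,
            TWITTER_ACCESS_SECRET,
        )
        # Wait out rate-limit windows instead of failing mid-collection.
        self.api = tweepy.API(auth, wait_on_rate_limit=True)

    def get_profile(self, username: str) -> Optional[Dict]:
        """Return basic profile fields, or None if the lookup fails."""
        try:
            user = self.api.get_user(screen_name=username)
            return {
                'username': user.screen_name,
                'name': user.name,
                'description': user.description,
                'followers': user.followers_count,
                'profile_url': f'https://twitter.com/{user.screen_name}',
            }
        except tweepy.TweepyException:
            # Any failed lookup (unknown or suspended account, auth error) is treated as "no profile".
            return None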

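class DuckDuckGoSearchTool:
    """Finds candidate Twitter handles in DuckDuckGo results restricted to twitter.com."""

    # First path segments on twitter.com that are pages, not user handles.
    NON_PROFILE_PATHS = {'hashtag', 'search', 'i', 'intent', 'home', 'share', 'explore'}

    def search_accounts(self, query: str, count: int = 20) -> List[str]:
        handles = []
        with DDGS() as ddgs:
            results = ddgs.text(f"site:twitter.com {query}", max_results=count)
            for r in results:
                url = r.get('href') or r.get('url')
                if not url or 'twitter.com/' not in url:
                    continue
                # Keep the first path segment and drop any query string or fragment.
                username = url.split('twitter.com/')[-1].split('/')[0]
                username = username.split('?')[0].split('#')[0].lstrip('@')
                if not username or username.lower() in self.NON_PROFILE_PATHS:
                    continue
                if username not in handles:
                    handles.append(username)
        return handles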
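
The handle extraction in `search_accounts` relies on string splitting. A stricter variant can be sketched with `urllib.parse` and a pattern for valid handles (up to 15 letters, digits, or underscores). The names `extract_handle`, `TWITTER_HOSTS`, and `RESERVED_PATHS` are hypothetical, and the host and reserved-path sets are illustrative assumptions rather than complete lists.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Twitter handles are 1-15 characters: letters, digits, or underscores.
HANDLE_RE = re.compile(r'^[A-Za-z0-9_]{1,15}$')
# Assumed, non-exhaustive sets used only for this illustration.
TWITTER_HOSTS = {'twitter.com', 'www.twitter.com', 'mobile.twitter.com', 'x.com', 'www.x.com'}
RESERVED_PATHS = {'hashtag', 'search', 'i', 'intent', 'home', 'share', 'explore'}

def extract_handle(url: str) -> Optional[str]:
    """Return the handle from a Twitter profile or tweet URL, or None."""
    parsed = urlparse(url)
    # Match on the parsed host so "twitter.com/" inside another site's path is ignored.
    if (parsed.hostname or '') not in TWITTER_HOSTS:
        return None
    segments = [s for s in parsed.path.split('/') if s]
    if not segments:
        return None
    candidate = segments[0].lstrip('@')
    if candidate.lower() in RESERVED_PATHS or not HANDLE_RE.match(candidate):
        return None
    return candidate

print(extract_handle('https://twitter.com/OpenAI/status/123?lang=en'))
print(extract_handle('https://twitter.com/hashtag/AI'))
```

Checking the parsed hostname rather than searching for a substring avoids picking up handles from unrelated sites that merely mention a Twitter URL in their own path.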


## Define CrewAI agents

In [None]:
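import json
from crewai import Agent, Task, Crew, Process
# Assumes a recent CrewAI release where `tool` is importable from `crewai.tools`.
from crewai.tools import tool

@tool('Twitter account search')
def search_twitter_accounts(query: str) -> str:
    """Search DuckDuckGo for Twitter handles related to the query; returns a JSON list."""
    return json.dumps(DuckDuckGoSearchTool().search_accounts(query))

searcher = Agent(
    role='Account Finder',
    goal='Identify Twitter accounts related to a given topic',
    backstory='Expert at social media research and data collection',
    tools=[search_twitter_accounts],  # agents expect CrewAI tool objects, not plain classes
    verbose=True,
)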

manager = Agent(
    role='Search Manager',
    goal='Ensure at least 1000 unique accounts are collected',
    backstory='Oversees the search process and removes duplicates',
    verbose=True,
)

search_task = Task(
    description='Search for Twitter accounts related to {topic}. Return results as JSON.',
    expected_output='A JSON list of accounts',
    agent=searcher,
)

manager_task = Task(
    description='Coordinate the search until 1000 unique accounts are gathered.',
    expected_output='A deduplicated JSON list of at least 1000 Twitter accounts',
    agent=manager,
)

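# The hierarchical process needs an explicit manager, and the manager
# must not also appear in the `agents` list.
crew = Crew(
    agents=[searcher],
    tasks=[manager_task, search_task],
    process=Process.hierarchical,
    manager_agent=manager,
)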


## Collect accounts

In [None]:
import pandas as pd

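def gather_accounts(topic: str, batch_size: int = 20, target: int = 1000, max_batches: int = 50) -> pd.DataFrame:
    searcher_tool = DuckDuckGoSearchTool()
    verifier = TwitterProfileTool()
    unique = {}
    seen = set()  # every handle already looked up, whether or not it was verified
    for batch in range(1, max_batches + 1):
        if len(unique) >= target:
            break
        print(f'Searching batch {batch}...')
        # Widen the result window each round; repeating the same query with the
        # same count would mostly return handles that were already seen.
        handles = searcher_tool.search_accounts(query=topic, count=batch_size * batch)
        new_handles = [h for h in handles if h not in seen]
        if not new_handles:
            # Stop instead of looping forever when the search is exhausted.
            print('No new handles found; stopping early.')
            break
        for handle in new_handles:
            seen.add(handle)
            profile = verifier.get_profile(handle)
            if profile and profile['followers'] > 0:
                unique[handle] = profile
        print(f'Total unique verified accounts: {len(unique)}')
    return pd.DataFrame(list(unique.values()))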
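
A single query returns a limited number of results, so reaching a large target usually means issuing several related queries. The sketch below separates the deduplicate-and-verify loop from the network calls so it can be tried offline. `collect_handles`, `search_fn`, `verify_fn`, and the sample data are hypothetical stand-ins, not part of the notebook's tools.

```python
from typing import Callable, Dict, Iterable, List, Optional

def collect_handles(
    queries: Iterable[str],
    search_fn: Callable[[str], List[str]],
    verify_fn: Callable[[str], Optional[Dict]],
    target: int,
) -> Dict[str, Dict]:
    """Run each query, verify unseen handles once, and stop at `target`."""
    verified: Dict[str, Dict] = {}
    seen = set()
    for query in queries:
        for handle in search_fn(query):
            key = handle.lower()  # Twitter handles are case-insensitive
            if key in seen:
                continue
            seen.add(key)
            profile = verify_fn(handle)
            if profile is not None:
                verified[key] = profile
            if len(verified) >= target:
                return verified
    return verified

# Offline stand-ins for the search and verification tools (made-up data).
fake_index = {
    'AI research': ['OpenAI', 'DeepMind'],
    'AI research lab': ['deepmind', 'AnthropicAI'],
    'AI research news': ['ghost_account'],
}

def fake_search(query: str) -> List[str]:
    return fake_index.get(query, [])

def fake_verify(handle: str) -> Optional[Dict]:
    # Pretend one handle fails verification.
    return None if handle == 'ghost_account' else {'username': handle}

found = collect_handles(fake_index.keys(), fake_search, fake_verify, target=10)
print(sorted(found))
```

In the notebook, `search_fn` and `verify_fn` could be bound to `DuckDuckGoSearchTool().search_accounts` and `TwitterProfileTool().get_profile`, with the query list built from variations on the topic.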


## Run the search

In [None]:
topic = 'AI research'
accounts_df = gather_accounts(topic)
accounts_df.head()
