# Introduction
This script collects tweets matching an "AI Art" query (for example, "AI Art Theft") to support a study of online toxicity in discussions about artificial intelligence and digital art. It uses the Tweepy library to call the Twitter API and supports multiple API keys to help manage rate limits. Tweets are retrieved in batches of up to 100, pagination is handled through the API's `next_token`, and rate-limit errors are handled separately from other failures. Collected tweets are deduplicated by tweet ID and saved as JSON with each tweet's ID, text, and timestamp. The resulting file serves as raw data for analyzing themes, sentiment, and toxic behavior in this discourse.



In [166]:
import time
import json
import tweepy


In [197]:

API_KEYS = [
    
    {
        "API_KEY": 'Your api key',
        "API_SECRET": 'Your api secret',
        "BEARER_TOKEN": 'Your bearer token',
    },
    
  

]
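
Hardcoding credentials in a notebook makes them easy to leak when the notebook is shared. As a minimal sketch of one alternative, the helper below builds the same `API_KEYS` structure from numbered environment variables. The variable names (`TWITTER_BEARER_TOKEN_1` and so on) and the `load_api_keys` helper are hypothetical and not part of the original script. Only `BEARER_TOKEN` is read by the collection loop, so the other two fields are treated as optional here.

```python
import os
from typing import Dict, List, Mapping


def load_api_keys(environ: Mapping[str, str], max_keys: int = 10) -> List[Dict[str, str]]:
    """Build an API_KEYS-style list from numbered environment variables.

    The variable names used here are hypothetical; adapt them to your setup.
    """
    keys = []
    for i in range(1, max_keys + 1):
        bearer = environ.get(f"TWITTER_BEARER_TOKEN_{i}")
        if not bearer:
            break  # stop at the first missing slot
        keys.append({
            "API_KEY": environ.get(f"TWITTER_API_KEY_{i}", ""),
            "API_SECRET": environ.get(f"TWITTER_API_SECRET_{i}", ""),
            "BEARER_TOKEN": bearer,
        })
    return keys


# Returns an empty list when no matching variables are set
API_KEYS = load_api_keys(os.environ)
```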

## Top topics
- AI art ethical (done)
- AI Killing Creativity (done)
- AI Art Copyright (done) 70
- AI Art Exploitation (done) 30
- AI Art Fairness (done) 1
- AI Art Soulless (done) 96
- AI Art Theft (done) 99

In [200]:
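query = "AI Art Theft"        # search query for the current topic
max_results = 100             # tweets per request (recent search accepts 10 to 100)
total_tweets_to_fetch = 1000  # stop once this many tweets have been collected
pause_time = 30               # seconds to wait after hitting a rate limit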
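
These settings determine how many requests the collection loop needs at minimum. The sketch below works through that arithmetic; `requests_needed` is a hypothetical helper, not part of the original script. The result is a lower bound, since a page can return fewer tweets than `max_results`.

```python
import math


def requests_needed(total_tweets: int, max_results: int) -> int:
    """Minimum number of search requests to reach total_tweets at max_results per page."""
    if not 10 <= max_results <= 100:
        raise ValueError("recent search accepts max_results between 10 and 100")
    return math.ceil(total_tweets / max_results)


print(requests_needed(1000, 100))  # 10
```

With the values used in this notebook, at least 10 successful requests are required, so a key that is rate limited after its first request cannot reach the target on its own.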


In [201]:

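tweets = []                 # all tweets collected so far
next_token = None           # pagination token for the next page
current_api_index = 0       # index of the API key currently in use
rate_limit_count = 0        # number of rate-limit hits
total_rate_limit_count = 0  # total rate-limit hits across all keys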

# Capture theme-related tweets
while len(tweets) < total_tweets_to_fetch:
    try:
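        # Select the current API key and build a client for it
        api = API_KEYS[current_api_index]
        client = tweepy.Client(bearer_token=api["BEARER_TOKEN"])
        print(f"Using API key {current_api_index + 1} grab themed tweets...")

        # Request one page of results. created_at is not returned by default,
        # so it is requested explicitly through tweet_fields.
        response = client.search_recent_tweets(
            query=query,
            max_results=max_results,
            next_token=next_token,
            tweet_fields=["created_at"],
        )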


        # If the response contains data, append it to the list
        if response.data:
            tweets.extend(response.data)
            print(f"Currently, a total of {len(tweets)} topic tweets have been captured.")
        else:
            print("No more tweets to capture.")

        # Get the pagination token for the next page
        next_token = response.meta.get("next_token")
        if not next_token:
            print(f"API key {current_api_index + 1} Grabbing has been completed.")
            current_api_index += 1  # Switch to the next API key
            if current_api_index >= len(API_KEYS):
                print("All API keys have been used up.")
                break  # Stop here; otherwise the next iteration indexes past the end of API_KEYS

    except tweepy.TooManyRequests:
        # Handle rate limit
        print(f"API key {current_api_index + 1} has hit the rate limit, pausing for {pause_time} seconds...")
        time.sleep(pause_time)
        rate_limit_count += 1  # Count the rate limit occurrences
        total_rate_limit_count += 1  # Update total rate limit count

        # If the rate limit has been hit three times per configured key, stop fetching and save what has been collected
        if total_rate_limit_count >= len(API_KEYS) * 3:
            print("All API keys have reached the rate limit, skipping comment fetching and saving data directly...")
            break

    except Exception as e:
        # Handle other errors
        print(f"An error occurred: {e}")
        break


# Remove duplicates, keeping one tweet per unique ID
unique_tweets = list({tweet.id: tweet for tweet in tweets}.values())

# Convert each tweet to a JSON-serializable dictionary
tweet_data = [{"id": tweet.id, "text": tweet.text, "created_at": str(tweet.created_at)} for tweet in unique_tweets]

# Save the data to a JSON file
with open("tweets.json", "w", encoding="utf-8") as f:
    json.dump(tweet_data, f, ensure_ascii=False, indent=4)

print(f"A total of {len(unique_tweets)} topic tweets have been captured.")



Using API key 1 grab themed tweets...
Currently, a total of 99 topic tweets have been captured.
Using API key 1 grab themed tweets...
API key 1 has hit the rate limit, pausing for 0 minutes...
Using API key 1 grab themed tweets...
API key 1 has hit the rate limit, pausing for 0 minutes...
Using API key 1 grab themed tweets...
API key 1 has hit the rate limit, pausing for 0 minutes...
All API keys have reached the rate limit, skipping comment fetching and saving data directly...
A total of 99 topic tweets have been captured.
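
In the run above, the single configured key was retried three times before the loop gave up. With several keys available, one option is to switch keys as soon as one is rate limited instead of pausing on the same key. The sketch below illustrates that idea without network access. It is not the script's actual behavior: `RateLimited` stands in for `tweepy.TooManyRequests`, `fetch_page` is a hypothetical callable that wraps the search request, and the sketch assumes that a pagination token obtained with one key remains valid with another, which has not been verified here.

```python
from typing import Callable, List, Optional, Tuple


class RateLimited(Exception):
    """Stand-in for tweepy.TooManyRequests so the sketch runs offline."""


# Hypothetical signature: fetch_page(key_index, next_token) -> (items, next_token)
FetchPage = Callable[[int, Optional[str]], Tuple[List[dict], Optional[str]]]


def collect_with_rotation(fetch_page: FetchPage, n_keys: int, target: int) -> List[dict]:
    """Collect up to `target` items, switching keys whenever one is rate limited."""
    items: List[dict] = []
    next_token: Optional[str] = None
    key_index = 0
    consecutive_limited = 0  # keys that failed in a row without progress

    while len(items) < target:
        try:
            page, next_token = fetch_page(key_index, next_token)
        except RateLimited:
            consecutive_limited += 1
            if consecutive_limited >= n_keys:
                break  # every key is limited; the caller can sleep and retry
            key_index = (key_index + 1) % n_keys
            continue

        consecutive_limited = 0
        items.extend(page)
        if next_token is None:
            break  # pagination exhausted

    return items[:target]
```

Because the failed call raises before the tuple assignment, `next_token` keeps its previous value, so the next key resumes from the same page rather than restarting the search.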
