# First protocol
_Code written for and tested with Python 3.11.0. Modify environment variables and queries as needed._  
_Please use a virtual environment (venv)._

## Protocol
Start with keywords:

- Smartchain

- Nft

- Airdrop

- Crypto

- …etc.

1. Sample up to 10k tweets containing at least one keyword from each of 100 random hours in the past year (so up to 1M tweets in total)

2. Determine the most engaged-with (top) users from this combined sample (top 100 or 1000)

3. Pull up to 1000 comments for each top user

4. Determine the top users whose comments mention at least three users other than the top user (the airdrop seeders)

5. Expand the top user sample if we don’t have at least 100 airdrop seeders

6. Time series chart plots:

   - Top user activity

   - Airdrop seeder activity

   - Negative reaction activity? (based on sentiment analysis of replies to airdrop messages)

   - External crypto value signals (from where?)
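
Step 4 is not implemented in this notebook yet. The sketch below shows one way it could work, assuming each comment is a dict with a `text` field (as in the v2 API responses) and that "mention at least three users" is checked per comment. The names `other_mentions`, `is_airdrop_seeder`, and `min_mentions` are hypothetical and not part of the code further down.

```python
import re

# Twitter handles are 1-15 characters: letters, digits, or underscores.
MENTION_RE = re.compile(r"@([A-Za-z0-9_]{1,15})")

def other_mentions(text: str, top_username: str) -> set[str]:
    """Return the lower-cased handles mentioned in text, excluding the top user."""
    handles = {handle.lower() for handle in MENTION_RE.findall(text)}
    handles.discard(top_username.lower())
    return handles

def is_airdrop_seeder(comments: list[dict], top_username: str, min_mentions: int = 3) -> bool:
    """True if any single comment mentions at least min_mentions other users."""
    return any(
        len(other_mentions(comment.get("text", ""), top_username)) >= min_mentions
        for comment in comments
    )
```

If the threshold should instead apply across all of a user's comments combined, the sets returned by `other_mentions` could be unioned before comparing against `min_mentions`.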



# Dependencies
Run the following commands in the terminal to install the required packages (datetime is part of the Python standard library and does not need to be installed with pip):

$ pip install requests  
  
$ pip install pandas  
  
$ pip install python-dateutil
  

--------------------  
Create a file named data.json in the same folder as this Jupyter notebook.


# Authentication step
In the code cell below, replace the empty string with your bearer token. Run the cell, then delete your bearer token from the cell.
This stores the token in an environment variable named TOKEN. Deleting it from the cell afterwards keeps it out of the notebook when the code is shared via GitHub. I plan to switch to dotenv and a .gitignore file later; I just haven't done that yet.
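
Until the dotenv switch happens, a small stand-in using only the standard library could look like the sketch below. It assumes a local file named `.env` containing a line such as `TOKEN=...`; the function name `load_env_file` is hypothetical and this is not the python-dotenv API.

```python
import os

def load_env_file(path: str = ".env") -> None:
    """Set environment variables from KEY=VALUE lines in a local file."""
    with open(path, encoding="utf-8") as env_file:
        for raw_line in env_file:
            line = raw_line.strip()
            # Skip blank lines, comments, and anything without a KEY=VALUE shape.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            # Existing variables win, so a token already exported in the shell is kept.
            os.environ.setdefault(key.strip(), value.strip().strip("'\""))
```

After calling `load_env_file()`, `auth()` would read TOKEN as before. The `.env` file would need to be listed in `.gitignore` so that it never reaches GitHub.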

In [155]:
import os

os.environ['TOKEN'] = ''  # paste your bearer token between the quotes, run the cell, then clear it

In [156]:
"""This cell creates functions to be used for authentication as well as creating endpoints."""
import requests

def auth():
    """Retrieves your bearer token."""
    return os.getenv('TOKEN')

def create_headers(bearer_token):
    headers = {"Authorization": "Bearer {}".format(bearer_token)}
    return headers


def create_full_search_url(keyword: str, start_date: list[str], end_date: list[str], max_results: int = 100):
    
    search_url: str = "https://api.twitter.com/2/tweets/search/all" 

    #change params based on the endpoint you are using
    query_params = {'query': keyword,
                    'start_time': start_date,
                    'end_time': end_date,
                    'max_results': max_results,
                    'expansions': 'referenced_tweets.id.author_id',
                    'tweet.fields': 'id,author_id,conversation_id,created_at,in_reply_to_user_id,lang,public_metrics,referenced_tweets,source,text',
                    #'user.fields': 'id,name,public_metrics,username,verified',
                    #'place.fields': 'country',
                    'next_token': {}}
    return (search_url, query_params)

def search_retweet_id_url(id: str):
    
    search_url: str = f"https://api.twitter.com/2/tweets/{id}" 

    #change params based on the endpoint you are using
    query_params = {'tweet.fields': 'author_id'}
    return (search_url, query_params)

def connect_to_endpoint(url, headers, params, next_token = None):
    params['next_token'] = next_token   #params object received from create_url function
    response = requests.request("GET", url, headers = headers, params = params)
    print("\nEndpoint Response Code: " + str(response.status_code))
    if response.status_code != 200:
        raise Exception(response.status_code, response.text)
    return response.json()

#print(auth())
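
`connect_to_endpoint` raises on any non-200 response, including HTTP 429 when the rate limit is hit. The sketch below is one possible way to wait and retry instead, assuming the response carries the `x-rate-limit-reset` header (epoch seconds). The names `seconds_until_reset` and `get_with_retry` are hypothetical helpers, not part of the requests library.

```python
import time
from typing import Callable, Mapping, Optional

def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Seconds until the rate-limit window resets, based on x-rate-limit-reset."""
    now = time.time() if now is None else now
    reset_epoch = float(headers.get("x-rate-limit-reset", now))
    return max(0.0, reset_epoch - now)

def get_with_retry(send: Callable[[], object], max_attempts: int = 3, sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        response = send()
        if response.status_code != 429:
            return response
        if attempt < max_attempts:
            # Wait for the reset time plus one second of slack before retrying.
            sleep(seconds_until_reset(response.headers) + 1)
    return response
```

A call could look like `get_with_retry(lambda: requests.get(url, headers=headers, params=params))`. Passing the request in as a callable keeps the retry logic independent of how the request is built.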

In [128]:
import datetime
import random
"""This code cell contains two functions (is_leap_year and random_date) which help generate a random one hour date range when random_date() is called"""
# Use by calling "start_time, end_time = random_date()"

def is_leap_year(year):
    """Returns True if the given year is a leap year, False otherwise."""
    if year % 4 == 0:
        if year % 100 == 0:
            if year % 400 == 0:
                return True
            else:
                return False
        else:
            return True
    else:
        return False

def sort_timestamps(timestamps):
    # Convert timestamps to datetime objects
    datetimes = [datetime.datetime.fromisoformat(ts) for ts in timestamps]
    # Sort datetime objects
    datetimes.sort()
    # Convert sorted datetime objects back to timestamps
    sorted_timestamps = [dt.strftime("%Y-%m-%dT%H:%M:%SZ") for dt in datetimes]
    return sorted_timestamps

def random_date():
    """Generate a random one hour date range within the last year in RFC 3339 format to be used with twitter API."""
    now = datetime.datetime.now()
    month = random.randint(1, 12)
    # Months up to and including the current month fall in this year; later months fall in last year.
    year = now.year if month <= now.month else now.year - 1
    if month == now.month and now.day == 1:
        # On the first of the month there is no earlier day this month, so use the same month last year.
        year = now.year - 1
    if month == now.month and year == now.year:
        # Only days before today, so the range never reaches into the future.
        day = random.randint(1, now.day - 1)
    elif month == 2:
        day = random.randint(1, 29 if is_leap_year(year) else 28)
    elif month in [1, 3, 5, 7, 8, 10, 12]:
        day = random.randint(1, 31)
    else:
        day = random.randint(1, 30)
    hour = random.randint(0, 23)
    start_time = datetime.datetime(year, month, day, hour)
    end_time = start_time + datetime.timedelta(hours=1)
    start_timestamp = start_time.strftime("%Y-%m-%dT%H:%M:%SZ")
    end_timestamp = end_time.strftime("%Y-%m-%dT%H:%M:%SZ")
    return start_timestamp, end_timestamp

def return_n_random_hour_ranges_sorted(n: int) -> list:
    start_time1_list = list()
    end_time1_list = list()
    for i in range(0, n):
        s1, s2 = random_date()
        while s1 in start_time1_list:
            s1, s2 = random_date()
        start_time1_list.append(s1)
        end_time1_list.append(s2)

    sorted_start = sort_timestamps(start_time1_list)
    sorted_end = sort_timestamps(end_time1_list)
    return (sorted_start, sorted_end)
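
An alternative to assembling the date field by field is to sample whole-hour offsets back from the current time, which gives unique hours without a retry loop. This is only a sketch: the name `sample_hour_ranges` and the `days_back` and `now` parameters are hypothetical, and a "year" is assumed to be 365 days.

```python
import datetime
import random
from typing import Optional

def sample_hour_ranges(n: int, days_back: int = 365,
                       now: Optional[datetime.datetime] = None) -> list[tuple[str, str]]:
    """Return n unique, chronologically sorted (start, end) one-hour ranges in RFC 3339 format."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    latest = now.replace(minute=0, second=0, microsecond=0)
    # Offsets are whole hours before the current hour. Starting at 2 keeps every
    # end time at least one hour in the past; sampling without replacement
    # guarantees that no hour is picked twice.
    offsets = random.sample(range(2, days_back * 24 + 2), n)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    ranges = []
    for offset in sorted(offsets, reverse=True):  # largest offset = earliest hour
        start = latest - datetime.timedelta(hours=offset)
        end = start + datetime.timedelta(hours=1)
        ranges.append((start.strftime(fmt), end.strftime(fmt)))
    return ranges
```

The two lists expected by the functions below can be recovered with `start_list, end_list = map(list, zip(*sample_hour_ranges(100)))`.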

In [178]:
"""WORKING EXAMPLE OF SO FAR, Above is for more function use etc."""
import json
import pandas as pd
import csv
from typing import Optional
import time

def tweets_per_range(keyword: str, start_times_list: list, end_times_list: list, results_per_range: int, next_token: Optional[str] = None) -> dict:
    """Collects roughly results_per_range tweets for each (start, end) time range, keyed by 'time_range_<i>'."""
    bearer_token = auth()
    headers: dict[str, str] = create_headers(bearer_token)
    json_obj_by_time_range: dict[str, dict] = dict()
    max_results: int = 500  # tweets requested per page
    for i in range(0, len(start_times_list)):
        total_count = 0
        # Creates the url, connects to the endpoint, then assigns the parsed JSON API response to json_obj_response
        url = create_full_search_url(keyword, start_times_list[i], end_times_list[i], max_results)
        json_obj_response = connect_to_endpoint(url[0], headers, url[1], next_token) # prints response code
        print(f"Outer for loop endpoint called for list index {i} / {len(start_times_list) - 1}")
        json_obj_response.pop('includes', None)
        json_obj_response['time'] = (f"{start_times_list[i]} --- {end_times_list[i]}")
        # An empty first page has no 'data' key, so create it before later pages are added.
        json_obj_response.setdefault('data', [])

        # Stores the first page of the API response under this time range's key.
        json_obj_by_time_range[f'time_range_{i}'] = json_obj_response
        total_count += json_obj_response['meta']['result_count']
        time.sleep(5)
        
        # Keeps requesting pages until the target is reached or no pages remain.
        while total_count < results_per_range:
            if 'next_token' in json_obj_response['meta']:
                next_token = json_obj_response['meta']['next_token']

                # Requests the next page using the same url and params as the first call
                json_obj_response = connect_to_endpoint(url[0], headers, url[1], next_token) # prints response code
                print(f"While loop endpoint called: index {i} / {len(start_times_list) - 1}")
                next_token = None
                if 'data' in json_obj_response:
                    # extend (not append) keeps 'data' a flat list of tweet objects
                    json_obj_by_time_range[f'time_range_{i}']['data'].extend(json_obj_response['data'])
                    total_count += json_obj_response['meta']['result_count']
                    print(f"data key found and data appended: {total_count} / {results_per_range} tweets in this range scraped")
                else:
                    print("page returned no data")
            else:
                print("No more tweets to scrape, total tweets will be less than amount desired.")
                print(f"total results {total_count}")
                next_token = None
                break
            time.sleep(5)
        time.sleep(5)
    return json_obj_by_time_range
        

def return_user_ids_original_tweets():
    
    return
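
Step 2 of the protocol (finding the most engaged-with users) could be sketched as below. It assumes each tweet dict carries `author_id` and `public_metrics`, as requested in `tweet.fields` above, and it treats the sum of retweets, replies, likes, and quotes as "engagement", which is an assumption rather than a definition from the protocol. Retweets are skipped so that engagement is only credited to tweets the author wrote. The name `top_engaged_users` is hypothetical.

```python
from collections import Counter

def top_engaged_users(tweets: list[dict], n: int = 100) -> list[tuple[str, int]]:
    """Rank authors by the summed engagement on their own (non-retweet) tweets."""
    engagement = Counter()
    for tweet in tweets:
        refs = tweet.get("referenced_tweets", [])
        if any(ref.get("type") == "retweeted" for ref in refs):
            continue  # a retweet is not the author's own content
        metrics = tweet.get("public_metrics", {})
        engagement[tweet["author_id"]] += (
            metrics.get("retweet_count", 0)
            + metrics.get("reply_count", 0)
            + metrics.get("like_count", 0)
            + metrics.get("quote_count", 0)
        )
    return engagement.most_common(n)
```

The input would be the combined `data` lists from every time range returned by `tweets_per_range`.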

In [165]:
"""WORKING EXAMPLE OF SO FAR, Above is for more function use etc."""
import json
import pandas as pd
import csv




bearer_token = auth()
"""input the necessary inputs below."""
headers = create_headers(bearer_token)
keyword = "Smartchain OR Airdrop OR Crypto"
# Use the commented out code below when using full access api
#start_time, end_time = random_date()
start_time = "2023-01-02T17:00:00Z"
end_time = "2023-01-04T20:00:00Z"

max_results = 10
url = create_full_search_url(keyword, start_time, end_time, max_results)
json_response = connect_to_endpoint(url[0], headers, url[1])
main_dict = dict()
"""This function scrapes through a JSON data file from a twitter API call and adds the retweeted user id to the referenced_tweets key."""
def analyze_retweet_counts():
    tweet_metrics_dict: dict = dict()
    for i in range(0, len(json_response['data'])):
        if 'referenced_tweets' in json_response['data'][i] and json_response['data'][i]['referenced_tweets'][0]['type'] == "retweeted":
            original_tweet_id_from_retweet = json_response['data'][i]['referenced_tweets'][0]['id']
            json_response['data'][i]['original_tweet_user_id'] = original_tweet_user_id
            if original_tweet_id_from_retweet in tweet_metrics_dict:
                tweet_metrics_dict[original_tweet_id_from_retweet] += 1
            else:
                tweet_metrics_dict[original_tweet_id_from_retweet] = 1
        else:
            this_tweet_id = json_response['data'][i]['id']
            if this_tweet_id in tweet_metrics_dict:
                tweet_metrics_dict[this_tweet_id] += 1
            else:
                tweet_metrics_dict[this_tweet_id] = 1
    return(tweet_metrics_dict)

analyze_retweet_counts()

json_response['time'] = (f"{start_time} --- {end_time}")
main_dict['time_range_1'] = json_response

print(json.dumps(main_dict, indent=15))


Endpoint Response Code: 200


NameError: name 'original_tweet_user_id' is not defined

# Main Working Cell

The cell below is intended to be the final cell, tying all the functions together into the 'solution' to the protocol.

In [159]:
start_list, end_list = return_n_random_hour_ranges_sorted(2)

['2022-02-26T23:00:00Z', '2022-06-11T08:00:00Z']


In [166]:
#start_list = ["2023-01-10T17:00:00Z", "2023-01-11T17:00:00Z"] #"2023-01-08T17:00:00Z", 
#end_list = ["2023-01-10T18:00:00Z", "2023-01-11T18:00:00Z"] #"2023-01-08T18:00:00Z",

json_final_data = tweets_per_range("Smartchain OR Airdrop OR Crypto OR Nft", start_list, end_list, 10000)

json_to_file = json.dumps(json_final_data)
with open("data.json", "w") as outfile:
    outfile.write(json_to_file)


Endpoint Response Code: 200
Outer for loop enpoint called for list index 0 / 2

Endpoint Response Code: 200
While loop enpoint called: index 0 / 2
data key found and data appended: 694 / 10000 tweets in this range scraped

Endpoint Response Code: 200
While loop enpoint called: index 0 / 2
data key found and data appended: 1036 / 10000 tweets in this range scraped

Endpoint Response Code: 200
While loop enpoint called: index 0 / 2
data key found and data appended: 1383 / 10000 tweets in this range scraped

Endpoint Response Code: 200
While loop enpoint called: index 0 / 2
data key found and data appended: 1721 / 10000 tweets in this range scraped

Endpoint Response Code: 200
While loop enpoint called: index 0 / 2
data key found and data appended: 2045 / 10000 tweets in this range scraped

Endpoint Response Code: 200
While loop enpoint called: index 0 / 2
data key found and data appended: 2400 / 10000 tweets in this range scraped

Endpoint Response Code: 200
While loop enpoint called: i

In [196]:
with open('data.json') as f:
    data = json.load(f)

count = 0
#print((data['time_range_1']['data'][300]))
for item in data['time_range_1']['data']:
    # An entry is either a single tweet (dict) or a whole page of tweets (list)
    if isinstance(item, list):
        count += len(item)
    else:
        count += 1
print(count)
#print(count)
#print(json.dumps(data['time_range_1']['data'][360], indent = 4))
#for item in data['time_range_0']['data']:
    #print(len(item))
#print(json.dumps(data['time_range_0']['data'][360], indent = 4))


10134
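
For the time series plots in step 6, the saved tweets first need to be bucketed by time. A minimal sketch is shown below, assuming each tweet dict has a `created_at` string in the API's UTC format (for example `2023-01-02T17:05:09.000Z`) and that hourly buckets are wanted. The name `hourly_counts` is hypothetical.

```python
from collections import Counter

def hourly_counts(tweets: list[dict]) -> dict[str, int]:
    """Count tweets per UTC hour, keyed by 'YYYY-MM-DDTHH:00Z' in chronological order."""
    counts = Counter()
    for tweet in tweets:
        # The first 13 characters of created_at ('YYYY-MM-DDTHH') identify the hour.
        counts[tweet["created_at"][:13] + ":00Z"] += 1
    return dict(sorted(counts.items()))
```

The resulting dict could then be handed to pandas for plotting, for example via `pd.Series(hourly_counts(tweets))`. Filtering the tweets to top users or airdrop seeders beforehand would give the separate series listed in the protocol.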
