# Pulling Climate Account data from Twitter

Logan Beskoon

I will be pulling followers from CFigueres' and EcoSenseNow's accounts. They occupy an interesting climate space: one calls for climate action now, while the other tries to make sense of the climate discussion by positioning itself against fear.

I would like to pull followers from Greta Thunberg's account; however, she has millions and millions of followers. Sooo that's for another day.

In [30]:
#Imports
import datetime
import tweepy as tw

# I've put my API keys in a .py file called API_keys.py
from API_keys import api_key, api_key_secret, access_token, access_token_secret

In [31]:
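# Authenticate the Tweepy API (tweepy was imported as tw above)
auth = tw.OAuthHandler(api_key, api_key_secret)
auth.set_access_token(access_token, access_token_secret)
# wait_on_rate_limit=True makes Tweepy sleep automatically whenever a rate limit is hit
api = tw.API(auth, wait_on_rate_limit=True)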

## Grab Followers

In [3]:

# I'm putting the handles in a list to iterate through below
chandles = ['EcoSenseNow', 'CFigueres']


# This will iterate through each Twitter handle that we're collecting from
for screen_name in chandles:
    
    # Tells Tweepy we want information on the handle we're collecting from
    # The next line specifies which information we want, which in this case is the number of followers 
    user = api.get_user(screen_name=screen_name) 
    followers_count = user.followers_count

    # Let's see roughly how long it will take to grab all the follower IDs. 
    # At 5,000 IDs per request and roughly one request per minute, that is followers_count/5000 minutes.
    print(f'''
    @{screen_name} has {followers_count} followers. 
    That will take roughly {followers_count//(5000*60)} hours and {(followers_count/5000)%60:.2f} minutes
    ''')


    @EcoSenseNow has 103108 followers. 
    That will take roughly 0 hours and 20.62 minutes
    

    @CFigueres has 160055 followers. 
    That will take roughly 0 hours and 32.01 minutes
    


Okay, so it's going to take a hot second. Sigh.
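
The estimate above comes straight from the rate limit on the follower-ID endpoint. Here is a minimal sketch of that arithmetic, assuming 5,000 IDs per request and 15 requests per 15-minute window (my understanding of the limits, so check the current Twitter API docs). The helper names `requests_needed` and `estimate_wait_minutes` are my own and not part of Tweepy.

```python
import math

IDS_PER_REQUEST = 5000      # assumed page size for follower IDs
REQUESTS_PER_WINDOW = 15    # assumed number of requests allowed per window
WINDOW_MINUTES = 15         # assumed length of one rate-limit window


def requests_needed(followers_count):
    """Number of pages required to fetch every follower ID."""
    return math.ceil(followers_count / IDS_PER_REQUEST)


def estimate_wait_minutes(total_requests):
    """Minutes spent waiting: the first 15 requests go out right away,
    and every further block of 15 has to wait for a new window."""
    if total_requests <= 0:
        return 0
    return ((total_requests - 1) // REQUESTS_PER_WINDOW) * WINDOW_MINUTES


# Follower counts taken from the output above
counts = {'EcoSenseNow': 103108, 'CFigueres': 160055}
total_requests = sum(requests_needed(n) for n in counts.values())
print(f'{total_requests} requests, about {estimate_wait_minutes(total_requests)} minutes of waiting')
```

Because both handles share the same rate-limit window, the waiting time depends on the combined number of requests rather than on each account separately.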

In [5]:
# This creates a dictionary containing a list for each Twitter handle we'll be grabbing follower IDs from
id_dict = {'EcoSenseNow' : [],
           'CFigueres' : []}

# Grabs the time when we start making requests to the API
start_time = datetime.datetime.now()

# .keys() allows us to iterate through each key in the dictionary
for handle in id_dict.keys():
    
    # Each page contains up to 5,000 IDs. Both accounts have far more than 5,000 followers,
    # so we must iterate through every page in order to get all follower IDs.
    # To grab the follower IDs, we will be using get_follower_ids.
    # Rate limiting is already handled by the API object (wait_on_rate_limit=True above).
    # Passing wait_on_rate_limit, wait_on_rate_limit_notify or compression here only
    # produces a stream of "Unexpected parameter" warnings, so they are left out.
    for page in tw.Cursor(api.get_follower_ids, screen_name=handle).pages():

        # The page variable comes back as a list, so we have to use .extend rather than .append
        id_dict[handle].extend(page)
        

# Let's see how long it took to grab all follower IDs
end_time = datetime.datetime.now()
elapsed_time = end_time - start_time
print(elapsed_time)

0:45:09.039050


In [17]:
#take a look at the results
id_dict['EcoSenseNow'][:10]

[1456148493400035330,
 824560350557245440,
 46043757,
 1456665298325606407,
 511541146,
 20859500,
 68105618,
 365577229,
 198452257,
 1110555799129071616]

In [13]:
length_key = len(id_dict['CFigueres'])
print(length_key)

160059


In [18]:
users = id_dict['CFigueres'][:10]

for user_id in users:
    
    user = api.get_user(user_id=user_id)
    print(user.screen_name)

ryangzepeda
Stefanoutdoors
2903Esteban
kosickey
GroisoD
Jackie_OPR
Clara_Mottura
portflyfishing
BeyhkerM
NStaple


In [39]:
# Save the id_dict I created to a local .txt file for working purposes.
# The with block closes the file automatically once the write finishes.
with open("id_dict.txt", "w") as f:
    f.write(str(id_dict))
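
Writing `str(id_dict)` gives a file that is awkward to load back into Python later. As a hedged alternative, here is a small sketch using the standard library `json` module. The file name `id_dict.json` is my own choice, and the tiny `id_dict` below is only a stand-in built from a few IDs shown in the output above.

```python
import json

# Stand-in for the real dictionary: a few IDs copied from the output above
id_dict = {'EcoSenseNow': [1456148493400035330, 824560350557245440, 46043757],
           'CFigueres': []}

# Write the dictionary as JSON (the file name is an arbitrary choice)
with open('id_dict.json', 'w', encoding='utf-8') as f:
    json.dump(id_dict, f)

# Read it back; Python's json module keeps these large integers exact
with open('id_dict.json', encoding='utf-8') as f:
    reloaded = json.load(f)

print(reloaded == id_dict)
```

With a JSON file, a later session can pick up the follower IDs without hitting the API again.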




## Grab data from follower IDs

Now we need to get more information about each Twitter account based on its ID.

In [55]:
# The code inside this function comes from Brenden Connors
# This will quickly grab information about each follower.
import time  # needed for time.sleep below

def get_screen_names(list_of_ids, list_for_screen_names):
    
    followers = []
    # We have to feed the API at most 100 IDs at a time, so step through the list in slices of 100
    for start in range(0, len(list_of_ids), 100):
        batch = list_of_ids[start:start + 100]
        try:
            followers.extend(api.lookup_users(user_id=batch))
        except tw.TooManyRequests:
            # Rate limit hit even with wait_on_rate_limit=True: sleep 15 minutes, then retry this batch once
            print('sleeping, 900 seconds')
            time.sleep(900)
            followers.extend(api.lookup_users(user_id=batch))
        except tw.TweepyException as err:
            # Any other API error (for example, no ID in the batch resolves to a user): report it and move on
            print(f'skipping IDs {start}-{start + len(batch) - 1}: {err}')
        
    list_for_screen_names.extend(followers)
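
The batching idea in the function above (no more than 100 IDs per lookup call) can be sketched on its own, without touching the API. `chunked` is a hypothetical helper name of mine, and the IDs below are made up purely for illustration.

```python
def chunked(items, size=100):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


fake_ids = list(range(250))   # made-up IDs, only for illustration
batch_sizes = [len(batch) for batch in chunked(fake_ids)]
print(batch_sizes)
```

Stepping with `range(0, len(items), size)` never produces an empty trailing batch, even when the list length is an exact multiple of 100, so no lookup call is wasted on an empty slice.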

In [56]:
# Let's put the function to use and make a new dictionary holding all user information
user_dict = {'EcoSenseNow' : [],
             'CFigueres' : []}

for handle in user_dict.keys():
    
    get_screen_names(id_dict[handle],user_dict[handle])

Rate limit reached. Sleeping for: 57
Rate limit reached. Sleeping for: 82


In [61]:
user_dict['EcoSenseNow'][0]

User(_api=<tweepy.api.API object at 0x0000029B2A3F3FD0>, _json={'id': 1456148493400035330, 'id_str': '1456148493400035330', 'name': 'John Smith', 'screen_name': 'notfakhongs1', 'location': '', 'description': 'Banned for saying Colin Powell should have been nixed for war crimes, been on twitter since 2013', 'url': None, 'entities': {'description': {'urls': []}}, 'protected': False, 'followers_count': 11, 'friends_count': 483, 'listed_count': 0, 'created_at': 'Thu Nov 04 06:37:18 +0000 2021', 'favourites_count': 61, 'utc_offset': None, 'time_zone': None, 'geo_enabled': False, 'verified': False, 'statuses_count': 4, 'lang': None, 'status': {'created_at': 'Mon Nov 08 00:13:37 +0000 2021', 'id': 1457501525173960704, 'id_str': '1457501525173960704', 'text': '@FluorescentGrey I strongly believe child sexual abuse is portrayed in an completely erroneous way on both the righ… https://t.co/gUkbZVKG7v', 'truncated': True, 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [{'screen_name

In [62]:
length_key = len(user_dict['EcoSenseNow'])
print(length_key)

103099


In [63]:
# Save the user_dict I created to a local .txt file for working purposes.
# utf-8 avoids encoding errors from emoji or non-Roman letters in the profiles.
with open("user_dict.txt", "w", encoding="utf-8") as f:
    f.write(str(user_dict))

Now we can write the data we want to two .txt files for further research.

In [66]:
headers = ['screen_name','name','ID','Location','Followers_Count','Friends_Count','description']

for figure in user_dict.keys():

    # Descriptions with emoji or non-Roman letters can cause trouble. Encoding your .txt file in utf-8 will help
    with open(f'{figure}_followers.txt','w', encoding='utf-8') as out_file:
        out_file.write('\t'.join(headers) + '\n')

        for idx, user in enumerate(user_dict[figure]):
            
            # Names, locations, and descriptions can contain tabs or newlines, which would
            # break the tab-delimited layout, so we replace them with spaces before writing.
            name = str(user.name).replace('\t',' ').replace('\n',' ')
            location = str(user.location).replace('\t',' ').replace('\n',' ')
            description = str(user.description).replace('\t',' ').replace('\n',' ')
            outline = [user.screen_name, name, user.id, location, user.followers_count, user.friends_count, description]

            out_file.write('\t'.join([str(item) for item in outline]) + '\n')

                
            #if idx == 100:
                #break
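
For a quick sanity check on the output, the tab-delimited files can be read back with the standard library `csv` module. This is only a sketch: it writes a tiny sample file to a temporary folder first so that it runs on its own, and the sample row is made up rather than taken from the real data.

```python
import csv
import os
import tempfile

headers = ['screen_name', 'name', 'ID', 'Location', 'Followers_Count', 'Friends_Count', 'description']

# Made-up row standing in for one line of a *_followers.txt file
sample_rows = [['example_user', 'Example User', '12345', 'Nowhere', '11', '483', 'Climate 🌍 reader']]

# Write the sample in the same layout the loop above produces
path = os.path.join(tempfile.mkdtemp(), 'sample_followers.txt')
with open(path, 'w', encoding='utf-8', newline='') as out_file:
    out_file.write('\t'.join(headers) + '\n')
    for row in sample_rows:
        out_file.write('\t'.join(row) + '\n')

# QUOTE_NONE treats quote characters inside descriptions as ordinary text
with open(path, encoding='utf-8', newline='') as in_file:
    reader = csv.DictReader(in_file, delimiter='\t', quoting=csv.QUOTE_NONE)
    rows = list(reader)

print(len(rows), rows[0]['screen_name'], rows[0]['Followers_Count'])
```

To check a real file, point `path` at something like `EcoSenseNow_followers.txt` and compare `len(rows)` with the length of the matching list in `user_dict`.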