# Scrape Users' Tweets | Data with Python Tweepy | Tweepy Tutorial

Tutorial: https://youtu.be/1ELzPZcpTsg

The following code uses **Tweepy's `API` class**, which provides access to the Twitter RESTful API methods. Each method takes its own parameters and returns the response as Tweepy model objects (for example, `get_user` returns a `User`). For more information about these methods, refer to the [API Reference](https://docs.tweepy.org/en/stable/api.html).

**Sections**:
- Authentication
- Creating API Object
- Getting User Info
- Retrieve User Info and Tweets for Multiple Users
- Read JSON File

In [1]:
```python
import tweepy
from tweepy import OAuthHandler
import json
import os
```

In [2]:
```python
tweepy.__version__
```

```
'4.1.0'
```

## Authentication

In [3]:
```python
# Load the API credentials from a local JSON file (keep this file out of version control)
with open('Twitter_API_keys_melihcanyardi.json', 'r') as f:
    keys_tokens = json.load(f)

consumer_key = keys_tokens['consumer_key']
consumer_secret = keys_tokens['consumer_secret']
access_token = keys_tokens['access_token']
access_token_secret = keys_tokens['access_token_secret']
```
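The cell above assumes the key file is a flat JSON object with four string fields. A minimal sketch of that layout, plus a hypothetical `load_twitter_keys` helper that fails with a clear message when a field is missing or empty (the helper name and the placeholder values are illustrative, not part of Tweepy):

```python
import json
import os
import tempfile

# Field names the notebook reads from the key file
REQUIRED_KEYS = ("consumer_key", "consumer_secret",
                 "access_token", "access_token_secret")


def load_twitter_keys(path: str) -> dict:
    """Hypothetical helper: load the key file and check every field is present."""
    with open(path, "r") as f:
        keys = json.load(f)
    missing = [k for k in REQUIRED_KEYS if not keys.get(k)]
    if missing:
        raise KeyError(f"missing or empty fields in {path}: {', '.join(missing)}")
    return keys


# Usage with a throwaway file holding placeholder (not real) credentials
sample = {k: f"<your-{k}>" for k in REQUIRED_KEYS}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "keys.json")
    with open(path, "w") as f:
        json.dump(sample, f, indent=4)
    keys = load_twitter_keys(path)

print(keys["consumer_key"])  # <your-consumer_key>
```

Checking the fields up front turns a bare `KeyError: 'access_token'` deep in the notebook into a message that names the file and every missing field at once.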

In [4]:
```python
# OAuth 1.0a user context: app credentials plus the account's access token
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
```

## Creating API Object

- **wait_on_rate_limit**: whether to automatically wait for rate limits to replenish.
- **wait_on_rate_limit_notify**: whether to print a notification while Tweepy waits for rate limits to replenish.
    - _Note: `wait_on_rate_limit_notify` was removed in Tweepy 4.0.0 and is not accepted by versions >= 4.0.0._

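**Note:** Rate limits are applied per endpoint, as a maximum number of requests in each 15-minute window. They are a different limit from the **Monthly Tweet Cap** usage shown in the Twitter Developer Dashboard.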
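Conceptually, `wait_on_rate_limit=True` means: when a request is refused for exceeding the rate limit (HTTP 429), sleep until the window resets and then retry. Twitter reports the reset time as a Unix timestamp in the `x-rate-limit-reset` response header. A minimal sketch of the sleep-time arithmetic, assuming that header value and a one-second safety margin (an illustration, not Tweepy's own code):

```python
import time
from typing import Optional


def seconds_until_reset(reset_epoch: int, now: Optional[float] = None,
                        margin: int = 1) -> int:
    """Seconds to sleep before the rate-limit window reopens.

    reset_epoch: value of the x-rate-limit-reset header (Unix time, seconds).
    margin: extra seconds so the retry does not land a moment too early (assumed).
    """
    if now is None:
        now = time.time()
    return max(0, int(reset_epoch - now)) + margin


# A window that resets 15 minutes (900 s) from "now"
now = 1_636_000_000
print(seconds_until_reset(now + 900, now=now))  # 901
# A reset time already in the past only waits the margin
print(seconds_until_reset(now - 10, now=now))   # 1
```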

In [5]:
```python
# Tweepy < 4.0.0 also accepted wait_on_rate_limit_notify:
# api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
api = tweepy.API(auth, wait_on_rate_limit=True)
```
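Because `wait_on_rate_limit_notify` was removed in Tweepy 4.0.0, passing it to a newer version should fail with a `TypeError` for an unexpected keyword argument. One way to keep a notebook working across both major versions is to build the keyword arguments from the version string. A sketch, assuming plain `X.Y.Z` version strings like the `'4.1.0'` printed above (`api_kwargs` is a hypothetical helper):

```python
def api_kwargs(version: str) -> dict:
    """Hypothetical helper: keyword arguments for tweepy.API for a given Tweepy version."""
    major = int(version.split(".")[0])
    kwargs = {"wait_on_rate_limit": True}
    if major < 4:
        # Pre-4.0 versions accept this flag; 4.x does not
        kwargs["wait_on_rate_limit_notify"] = True
    return kwargs


print(api_kwargs("4.1.0"))   # {'wait_on_rate_limit': True}
print(api_kwargs("3.10.0"))  # {'wait_on_rate_limit': True, 'wait_on_rate_limit_notify': True}
```

In the notebook this would be used as `api = tweepy.API(auth, **api_kwargs(tweepy.__version__))`.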

## Getting User Info

In [6]:
```python
user_info = api.get_user(screen_name='elonmusk')
print(type(user_info))
print(user_info)
```

```
<class 'tweepy.models.User'>
User(_api=<tweepy.api.API object at 0x0000021B69EAC100>, _json={'id': 44196397, 'id_str': '44196397', 'name': 'Elon Musk', 'screen_name': 'elonmusk', 'location': '', 'profile_location': None, 'description': '', 'url': None, 'entities': {'description': {'urls': []}}, 'protected': False, 'followers_count': 62343823, 'friends_count': 105, 'listed_count': 80114, 'created_at': 'Tue Jun 02 20:12:29 +0000 2009', 'favourites_count': 10713, 'utc_offset': None, 'time_zone': None, 'geo_enabled': False, 'verified': True, 'statuses_count': 15899, 'lang': None, 'status': {'created_at': 'Thu Nov 04 22:11:36 +0000 2021', 'id': 1456383656512180230, 'id_str': '1456383656512180230', 'text': '@RealSkyWatcher @thesheetztweetz 🤣🤣', 'truncated': False, 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [{'screen_name': 'RealSkyWatcher', 'name': 'SkyWatcher', 'id': 1164004907839557633, 'id_str': '1164004907839557633', 'indices': [0, 15]}, {'screen_name': 'thesheetztweetz
```

In [7]:
```python
print(f"Name: {user_info.name}")
print(f"Description: {user_info.description}")
print(f"Location: {user_info.location}")
print(f"Followers Count: {user_info.followers_count}")
print(f"Friends Count: {user_info.friends_count}")
```

```
Name: Elon Musk
Description: 
Location: 
Followers Count: 62343823
Friends Count: 105
```
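The `User` object printed earlier wraps the raw API response in its `_json` attribute (visible in the output of In [6]), and the attributes used above mirror keys of that dictionary. A sketch of the same field selection done on a plain dict, using a trimmed copy of the values shown above; `summarize_user` is a hypothetical name:

```python
def summarize_user(raw: dict) -> dict:
    """Pick the profile fields printed above out of a raw user dictionary."""
    return {
        "Name": raw.get("name", ""),
        "Description": raw.get("description", ""),
        "Location": raw.get("location", ""),
        "Followers Count": raw.get("followers_count", 0),
        "Friends Count": raw.get("friends_count", 0),
    }


# Trimmed copy of the _json values from the output above
raw_user = {
    "id": 44196397,
    "name": "Elon Musk",
    "screen_name": "elonmusk",
    "location": "",
    "description": "",
    "followers_count": 62343823,
    "friends_count": 105,
}

for field, value in summarize_user(raw_user).items():
    print(f"{field}: {value}")
```

With a live API object the call would be `summarize_user(user_info._json)`. The leading underscore marks `_json` as an implementation detail, so the attribute access used in In [7] is the safer default.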


## Retrieve User Info and Tweets for Multiple Users

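- `api.get_user(screen_name=user)`: returns the user's profile as a `User` object.
- `api.user_timeline(screen_name=user)`: returns the user's most recent tweets, up to 200 per request (`count`). With `include_rts=False`, retweets are removed after `count` is applied, so a request can return well under `count` tweets.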
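A single `user_timeline` call stops at 200 tweets, and the v1.1 endpoint can only return up to 3,200 of a user's most recent tweets in total. To page further back, each request passes `max_id` set to one less than the oldest ID already received (Tweepy's `tweepy.Cursor` automates this). The sketch below shows that loop with a stand-in `fetch_page` function so it runs without credentials; `fetch_all`, `fake_fetch_page` and the fake timeline are hypothetical:

```python
from typing import Callable, List, Optional


def fetch_all(fetch_page: Callable[[Optional[int]], List[dict]],
              limit: int = 3200) -> List[dict]:
    """Collect tweets page by page, newest first, walking backwards with max_id."""
    collected: List[dict] = []
    max_id: Optional[int] = None
    while len(collected) < limit:
        page = fetch_page(max_id)
        if not page:
            break  # nothing older is available
        collected.extend(page)
        # max_id is inclusive, so ask for IDs strictly below the oldest one seen
        max_id = min(t["id"] for t in page) - 1
    return collected[:limit]


# Fake timeline of 450 tweets with IDs 450..1, served 200 at a time (newest first)
TIMELINE = [{"id": i} for i in range(450, 0, -1)]


def fake_fetch_page(max_id: Optional[int]) -> List[dict]:
    eligible = [t for t in TIMELINE if max_id is None or t["id"] <= max_id]
    return eligible[:200]


all_tweets = fetch_all(fake_fetch_page)
print(len(all_tweets), all_tweets[0]["id"], all_tweets[-1]["id"])  # 450 450 1
```

The loop stops on an empty page rather than a short one, because with `include_rts=False` a page can come back shorter than `count` even when older tweets exist.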

In [8]:
```python
# Create the output folder if it does not exist yet
os.makedirs('output', exist_ok=True)
```

In [9]:
```python
# Screen names (handles) of the accounts to scrape
#userID_list = ['elonmusk']
userID_list = ["sundarpichai", "elonmusk", "finkd", "JeffBezos", "BillGates"]
```

In [10]:
```python
for user in userID_list:
    # Profile information (a tweepy User object)
    user_info = api.get_user(screen_name=user)

    # Most recent tweets, retweets excluded; tweet_mode='extended' makes the
    # untruncated text available as `full_text`
    tweets = api.user_timeline(screen_name=user, count=200, include_rts=False, tweet_mode='extended')

    # The first record in each output file summarizes the user
    user_summary = {
        "Tweets Found": len(tweets),
        "Name": user_info.name,
        "Bio": user_info.description,
        "followers_count": user_info.followers_count,
        "friends_count": user_info.friends_count
    }

    tweets_list = [user_summary]
    print(f"Tweets Found ({user}): ", len(tweets))

    # One record per tweet
    for info in tweets:
        tweet_dict = {
            'ID': info.id,
            'date_time_posted': str(info.created_at),
            'tweet': info.full_text
        }
        tweets_list.append(tweet_dict)

    with open(f'output/{user}.json', 'w') as outfile:
        json.dump(tweets_list, outfile, indent=4)
```

```
Tweets Found (sundarpichai):  163
Tweets Found (elonmusk):  22
Tweets Found (finkd):  19
Tweets Found (JeffBezos):  188
Tweets Found (BillGates):  172
```
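Each output file stores the user summary as its first element and one dictionary per tweet after it. A sketch of that layout as a standalone function, using `types.SimpleNamespace` stand-ins for Tweepy's status objects (only `id`, `created_at` and `full_text` are read, as in the loop above) so it runs offline; `build_records` and the sample tweets are hypothetical. Reading the file back, element `0` is the summary and the rest are tweets:

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from types import SimpleNamespace


def build_records(summary: dict, statuses) -> list:
    """Summary first, then one dict per tweet, matching the notebook's JSON layout."""
    records = [summary]
    for status in statuses:
        records.append({
            "ID": status.id,
            "date_time_posted": str(status.created_at),
            "tweet": status.full_text,
        })
    return records


# Stand-ins for two tweets (made-up IDs and text)
statuses = [
    SimpleNamespace(id=2, created_at=datetime(2021, 11, 4, 5, 12, tzinfo=timezone.utc),
                    full_text="second tweet"),
    SimpleNamespace(id=1, created_at=datetime(2021, 11, 3, 17, 10, 35, tzinfo=timezone.utc),
                    full_text="first tweet"),
]
summary = {"Tweets Found": len(statuses), "Name": "Example User"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.json")
    with open(path, "w") as f:
        json.dump(build_records(summary, statuses), f, indent=4)
    with open(path, "r") as f:
        data = json.load(f)

# Split the loaded list back into the summary and the tweets
header, tweets_back = data[0], data[1:]
print(header["Tweets Found"], tweets_back[0]["date_time_posted"])
# 2 2021-11-04 05:12:00+00:00
```

Storing `created_at` with `str()` produces the `YYYY-MM-DD HH:MM:SS+00:00` strings seen in the saved files; `datetime.fromisoformat` can turn them back into timezone-aware datetimes.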


## Read JSON File

In [11]:
```python
for user in userID_list:
    print(f"User: {user}\n")
    with open(f"output/{user}.json", "r") as f:
        data = json.load(f)
        print(data)
    print("\n" + "-"*125 + "\n")
```

User: sundarpichai

[{'Tweets Found': 163, 'Name': 'Sundar Pichai', 'Bio': 'CEO,  Google and Alphabet', 'followers_count': 4033668, 'friends_count': 359}, {'ID': 1456127065871110145, 'date_time_posted': '2021-11-04 05:12:00+00:00', 'tweet': 'Happy Diwali to everyone celebrating the festival of lights! (Look for the 🪔 when you search for "Diwali" on Google:) \nhttps://t.co/7Lzc3FvDNl'}, {'ID': 1455945516672434184, 'date_time_posted': '2021-11-03 17:10:35+00:00', 'tweet': 'Proud to partner with @hiringourheroes to launch Career Forward as part of a new $20M commitment to help veterans, transitioning service members, and military spouses with digital skills training, job placement support + more to expand economic opportunity https://t.co/AOPNMVgtfQ'}, {'ID': 1454185182899019781, 'date_time_posted': '2021-10-29 20:35:39+00:00', 'tweet': 'In 2019 we committed to making Google Career Certificates available in 100+ community colleges to help students grow their careers. Starting today, the G