# Web Scraping with Python – Scrape Data from Twitter using Tweepy

In [10]:
"""
- If you are a data enthusiast, you'll likely agree that social media is one of the richest sources of real-world data. Sites like Twitter are full of it.
- You can use social media data in a number of ways, such as sentiment analysis (analyzing people's opinions) on a specific issue or field of interest.

- There are several ways to scrape (or gather) data from Twitter. In this project, we will use Tweepy.

- We will scrape public conversations from people on a specific trending topic, as well as tweets from a particular user.

Now without further ado, let's get started.

Tweepy:
Introduction to Our Scraping Tool
Tweepy is a Python library for integrating with the Twitter API. Because Tweepy is connected with the Twitter API, you can perform complex queries in addition to scraping tweets. It enables you to take advantage of all of the Twitter API's capabilities.

But there are some drawbacks. The standard Twitter API that Tweepy talks to only lets you search tweets from roughly the past week, so Tweepy cannot retrieve historical data beyond that window.

There are also limits on how many tweets you can retrieve from a single user's account (the v1.1 user timeline endpoint only reaches back through roughly the 3,200 most recent tweets).
"""
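To make the one-week limit concrete, here is a minimal sketch that checks whether a timestamp falls inside the search window. It assumes the window is exactly seven days; `SEARCH_WINDOW` and `is_within_search_window` are illustrative names, not part of Tweepy.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the standard search window is treated as exactly 7 days.
SEARCH_WINDOW = timedelta(days=7)


def is_within_search_window(tweet_time, now=None):
    """Return True if `tweet_time` falls inside the assumed 7-day window."""
    now = now or datetime.now(timezone.utc)
    return now - SEARCH_WINDOW <= tweet_time <= now


# Fixed reference date so the example is reproducible.
reference = datetime(2023, 6, 15, tzinfo=timezone.utc)
recent = is_within_search_window(datetime(2023, 6, 10, tzinfo=timezone.utc), reference)
old = is_within_search_window(datetime(2023, 6, 1, tzinfo=timezone.utc), reference)
print(recent, old)
```

A tweet from five days before the reference date is inside the window, while one from two weeks earlier is not.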


# How to Use Tweepy to Scrape Tweets
- Before we begin using Tweepy, we must make sure that our Twitter credentials are ready. With them, Tweepy can authenticate against the Twitter API and we can begin scraping.

- If you do not have Twitter credentials, you can register for a Twitter developer account. You will be asked some basic questions about how you intend to use the Twitter API. After that, you can begin the implementation.

- The first step is to install the Tweepy library on your local machine, which you can do by typing: pip install git+https://github.com/tweepy/tweepy.git
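Once you have the four credentials, one option is to keep them out of the notebook and read them from environment variables. The sketch below shows that idea; the variable names and the `load_credentials` helper are hypothetical, so rename them to suit your setup.

```python
import os

# Hypothetical environment variable names; choose whatever fits your setup.
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=os.environ):
    """Return the four credentials in order, or raise if any is missing."""
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return tuple(env[name] for name in CREDENTIAL_VARS)


# Demonstration with a stand-in mapping instead of the real environment.
demo_env = {name: "xxxxxx" for name in CREDENTIAL_VARS}
consumer_key, consumer_secret, access_token, access_token_secret = load_credentials(demo_env)
```

In a real session you would call `load_credentials()` with no argument, and a missing variable would fail early with a readable message instead of a confusing authentication error later on.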

# How to Scrape Tweets from a User on Twitter
- Now that we’ve installed the Tweepy library, let’s scrape 100 tweets from a user called "@PythonJobsFeed" on Twitter. Here is the full code implementation that lets us do this:

In [11]:
import tweepy
import time
import pandas as pd

consumer_key = "xxxxxx"  # Your API/consumer key
consumer_secret = "xxxxxxxxx"  # Your API/consumer secret
access_token = "xxxxxxx"  # Your access token
access_token_secret = "xxxxxxx"  # Your access token secret

# Pass our Twitter API credentials to the authentication handler
auth = tweepy.OAuth1UserHandler(
    consumer_key, consumer_secret,
    access_token, access_token_secret
)

# Instantiate the Tweepy API client; wait_on_rate_limit pauses when the rate limit is hit
api = tweepy.API(auth, wait_on_rate_limit=True)

"""
- In the above code, we imported the Tweepy library and created variables to store our Twitter credentials (the Tweepy authentication handler requires four of them).

- We then passed those variables into the Tweepy authentication handler and saved the result in another variable, auth.

- In the last statement, we instantiated the Tweepy API and passed in the required parameters.
"""


username = "PythonJobsFeed"
no_of_tweets = 100


try:
    # Retrieve the most recent tweets from the user's timeline
    tweets = api.user_timeline(screen_name=username, count=no_of_tweets)

    # Pull some attributes from each tweet
    attributes_container = [
        [tweet.created_at, tweet.favorite_count, tweet.source, tweet.text]
        for tweet in tweets
    ]

    # Column names, in the same order as the attributes above
    columns = ["Date Created", "Number of Likes", "Source of Tweet", "Tweet"]

    # Create the dataframe
    tweets_df = pd.DataFrame(attributes_container, columns=columns)
except tweepy.TweepyException as e:
    # Catch Tweepy errors only, so that Ctrl+C and unrelated bugs still surface
    print('Status Failed On,', str(e))
    time.sleep(3)
"""
In the above code, we set the name of the user (the @name on Twitter) whose tweets we want to retrieve, along with the number of tweets. We then wrapped the request in an exception handler so that any error is caught and reported.

Inside it, api.user_timeline() returns a collection of the most recent tweets posted by the user named in the screen_name parameter, up to the number given in count.

In the next line of code, we selected the attributes we want from each tweet and saved them in a list. To see more attributes you can retrieve from a tweet, see the Twitter API documentation for the tweet object.

In the last chunk of code, we created a dataframe from that list, together with the column names we defined.

Note that the column names must be in the same order as the attributes in the attributes container (that is, the order in which you listed the attributes when retrieving them from each tweet).

If you followed these steps and your developer account has access to this endpoint, you should see something like this:
"""


Status Failed On, 403 Forbidden
453 - You currently have access to a subset of Twitter API v2 endpoints and limited v1.1 endpoints (e.g. media post, oauth) only. If you need access to this endpoint, you may need a different access level. You can learn more here: https://developer.twitter.com/en/portal/product
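Because the column names have to match the order of the attributes, one way to make the pairing harder to get wrong is to build one dictionary per tweet, so each value sits next to its column name. This is a sketch of that alternative, not the approach used above; `tweet_to_record` is a hypothetical helper and `fake_tweet` is a made-up stand-in for a real tweet object.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def tweet_to_record(tweet):
    """Map one tweet-like object to a dict keyed by column name."""
    return {
        "Date Created": tweet.created_at,
        "Number of Likes": tweet.favorite_count,
        "Source of Tweet": tweet.source,
        "Tweet": tweet.text,
    }


# Stand-in for a tweet object; every value here is made up for illustration.
fake_tweet = SimpleNamespace(
    created_at=datetime(2023, 1, 1, tzinfo=timezone.utc),
    favorite_count=3,
    source="Example Client",
    text="Hello, world",
)

records = [tweet_to_record(fake_tweet)]
print(list(records[0]))  # column names, in insertion order
```

Passing `records` to `pd.DataFrame(records)` builds the table with the dictionary keys as column names, so no separate columns list is needed.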



In [12]:
print(tweets_df.head(20))


NameError: name 'tweets_df' is not defined
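The NameError follows from the failed request: tweets_df is only assigned inside the try block, so when the API call is rejected the name never exists. A minimal sketch of one way to guard against this is to always return a value and check it before printing. `fetch_rows` and `failing_fetch` are hypothetical names, and the failure is simulated rather than a real API call.

```python
def fetch_rows(fetch):
    """Call `fetch()` and return its rows, or an empty list if it fails."""
    try:
        return list(fetch())
    except Exception as exc:  # broad on purpose for this stand-alone sketch
        print("Status Failed On,", exc)
        return []


def failing_fetch():
    # Hypothetical stand-in that mimics a rejected API request.
    raise PermissionError("403 Forbidden")


rows = fetch_rows(failing_fetch)
if rows:
    print(rows[:20])
else:
    print("No tweets were retrieved, so there is nothing to display.")
```

The same idea applies to the dataframe: assign a default before the try block, or check that the request succeeded before calling head().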