This project scraped data using the Twitter API and analyzed it with sentiment analysis and machine learning.
First, let's look at the web scraping part.
Import the packages needed for web scraping:

In [None]:
import csv 
from twython import Twython 
import pandas as pd

The Twitter API requires a consumer key and an access token to authorize data requests. It also limits what an individual account can retrieve: a timeline request returns at most 200 tweets, and the standard search endpoint only returns tweets from the last 7 days.

In [2]:
consumer_key='' 
consumer_secret='' 
access_token_key='' 
access_token_secret='' 
twitter = Twython(consumer_key,consumer_secret, access_token_key, access_token_secret)
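
Leaving the keys as empty strings in the notebook keeps them out of version control, but they still have to come from somewhere at run time. One common option is to read them from environment variables. The sketch below is an illustration only: the variable names and the `load_credentials` helper are hypothetical and are not part of the original notebook.

```python
import os

# Hypothetical environment variable names; use whatever naming suits your setup.
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN_KEY",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=None):
    """Return the four credentials as a tuple, in the order used in the cell above."""
    env = os.environ if env is None else env
    # Fail early with a clear message instead of sending empty keys to the API.
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing credentials: " + ", ".join(missing))
    return tuple(env[name] for name in CREDENTIAL_VARS)
```

With the variables set, the client could then be created with `twitter = Twython(*load_credentials())`.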

The main part of the scraping is implemented as a class with several functions:
1. Request up to 200 tweets from the timeline of each of the specified Twitter users.
2. Write the results to a csv file.
3. Filter the tweets down to those containing keywords such as "vaccine", which are the ones we want from these accounts. At the same time, collect information about each filtered tweet, such as its favorite count, its text, and the sender's follower count.
4. Get the profile information of these senders.
5. Fetch the full text of the filtered tweets by their ids.

In [3]:
class TwitterScraper:
    """Helpers for collecting and filtering tweets from a fixed list of accounts."""

    @staticmethod
    def get_user_timeline():
        """
        Get up to 200 recent tweets from the timeline of each user in the list
        """
        users = ["HelenBranswell", "CMSRIResearch", "EsotericExposal", "vaccinationmyth", "STHGibbs"]
        results = []
        for user in users:
            try:
                this_timeline = twitter.get_user_timeline(screen_name=user, count=200)
                results.extend(this_timeline)
            except Exception:
                # Skip accounts whose timeline cannot be fetched and keep going
                print('There was an exception on: ' + user)
                continue
        return results

    @staticmethod
    def to_csv(df, file_name):
        """
        Write the dataframe to a csv file
        """
        df.to_csv(file_name, index=False)

    @staticmethod
    def filter_tweets(results):
        """
        Keep only tweets that contain one of the keywords, are not retweets,
        and are not duplicates. Return only the fields listed below.
        """
        data = []
        tweet_set = set()  # ids already seen, used to drop duplicate tweets
        keywords = ['vaccine', 'vaccines', 'vaccination', 'vaccinate', 'vaccinations']
        for tweet in results:
            try:
                if tweet['id'] in tweet_set:
                    continue
                tweet_set.add(tweet['id'])
                if TwitterScraper.is_valid(tweet, keywords):
                    row = {
                        'id': tweet['id'],
                        'text': str(tweet['text']),
                        'retweet_count': tweet['retweet_count'],
                        'timestamp': tweet['created_at'],
                        'media': [],
                        'favorite_count': tweet['favorite_count'],
                        'user_follower_count': tweet['user']['followers_count'],
                        'user_id': tweet['user']['id'],
                        'user_screen_name': str(tweet['user']['screen_name']),
                        'url': [],
                        'hash_tags': []
                    }
                    data.append(row)
            except Exception as e:
                print(e)
        return data

    @staticmethod
    def is_valid(tweet, keywords):
        """
        A tweet is valid if it has no 'RT' marker and mentions at least one keyword
        """
        text = str(tweet['text'])
        return 'RT' not in text and any(k in text for k in keywords)

    @staticmethod
    def get_user_information():
        """
        Get the profile of each user, keyed by screen name
        """
        results = {}
        users = ["HelenBranswell", "CMSRIResearch", "EsotericExposal", "vaccinationmyth", "STHGibbs"]
        for user in users:
            results[user] = twitter.show_user(screen_name=user)
        return results

    @staticmethod
    def get_full_tweet(df):
        """
        Get the full information for each tweet id in the dataframe
        and replace the truncated text with the full text
        """
        r = []
        for i, tweet_id in df['id'].items():
            try:
                tweet = twitter.show_status(id=str(tweet_id), tweet_mode='extended')
                df.loc[i, 'text'] = tweet['full_text']
                r.append(tweet)
            except Exception as e:
                print(tweet_id, e)
        return r


# The cells below call these helpers by their bare names, so expose them at module level.
get_user_timeline = TwitterScraper.get_user_timeline
filter_tweets = TwitterScraper.filter_tweets
get_user_information = TwitterScraper.get_user_information
get_full_tweet = TwitterScraper.get_full_tweet
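
The `is_valid` check above looks for the substring 'RT' anywhere in the text and matches keywords case-sensitively, so a tweet containing a word like "START" is rejected and one that begins with "Vaccines" is missed. The sketch below shows one possible variant, not the notebook's method: it treats a tweet as a retweet only if it carries the API's `retweeted_status` field or its text starts with "RT @", and it lowercases the text before matching. The helper names and the sample tweets in the usage note are made up for illustration.

```python
KEYWORDS = ("vaccine", "vaccines", "vaccination", "vaccinate", "vaccinations")


def is_retweet(tweet):
    """True if the API marked the tweet as a retweet or the text starts with 'RT @'."""
    return "retweeted_status" in tweet or tweet.get("text", "").startswith("RT @")


def mentions_keyword(text, keywords=KEYWORDS):
    """Case-insensitive substring match against the keyword list."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def keep_unique_matches(tweets):
    """Drop duplicates and retweets, keep tweets that mention a keyword."""
    seen_ids = set()
    kept = []
    for tweet in tweets:
        if tweet["id"] in seen_ids:
            continue  # duplicate of a tweet already processed
        seen_ids.add(tweet["id"])
        if not is_retweet(tweet) and mentions_keyword(tweet["text"]):
            kept.append(tweet)
    return kept
```

For example, given made-up tweets with the texts "Vaccines save lives" and "START of vaccination week", both would be kept, while "RT @someone: vaccine news" would be dropped.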

Call the class functions to get the results.
First fetch the timelines of these users, keep only the tweets containing the keywords, load the filtered tweets into a dataframe, and then fetch their full text.

In [9]:
results = get_user_timeline()
results2 = filter_tweets(results)
df = pd.DataFrame(results2)
tweets = get_full_tweet(df)
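
`get_user_timeline` makes a single request per account with `count=200`. The timeline endpoint also accepts a `max_id` parameter, which can be used to walk further back in time one page after another. The sketch below illustrates that paging pattern under stated assumptions: `fetch_timeline_pages` is a hypothetical helper, and `fetch_page` stands for any callable that accepts the same keyword arguments as `twitter.get_user_timeline`.

```python
def fetch_timeline_pages(fetch_page, screen_name, max_pages=3, count=200):
    """Collect several pages of one user's timeline, newest first.

    fetch_page is assumed to accept screen_name, count and max_id keyword
    arguments and to return a list of tweet dicts that each have an 'id'.
    """
    tweets = []
    max_id = None
    for _ in range(max_pages):
        params = {"screen_name": screen_name, "count": count}
        if max_id is not None:
            params["max_id"] = max_id
        page = fetch_page(**params)
        if not page:
            break  # no older tweets are available
        tweets.extend(page)
        # max_id is inclusive, so step just below the oldest tweet seen so far.
        max_id = min(tweet["id"] for tweet in page) - 1
    return tweets
```

A call such as `fetch_timeline_pages(twitter.get_user_timeline, "HelenBranswell")` would then issue up to three requests for that account. Each request still counts against the API rate limits.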

Print the scraped data to see what it looks like:

In [10]:
print(df)

     favorite_count hash_tags                   id media  retweet_count  \
0                 2        []  1005899221852147712    []              0   
1                 9        []  1005853124861616128    []             12   
2                15        []  1005493698489208833    []             51   
3                51        []  1005481133507792896    []             36   
4                 0        []  1005209205337415681    []              0   
5                14        []  1005172926713384961    []             14   
6                 0        []  1005170203909554180    []              0   
7                 5        []  1005113336130830336    []              9   
8                 3        []  1005051983634550786    []             14   
9                 2        []  1004870459782025217    []              0   
10               11        []  1004484833370877952    []              4   
11                3        []  1004113140613832704    []              0   
12                1      

Save it to a csv file:

In [11]:
df.to_csv('result_1.csv')
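
One thing to keep in mind when reading such a file back: csv stores everything as text, so ids and list-valued columns such as `hash_tags` come back as strings unless they are converted again. The sketch below demonstrates this with the standard-library `csv` module (which the notebook imports) on an in-memory buffer rather than with pandas. The helper names and the sample row are made up for illustration.

```python
import csv
import io


def rows_to_csv_text(rows, fieldnames):
    """Serialize a list of dicts to csv text, header first."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def csv_text_to_rows(text):
    """Parse csv text back into a list of dicts; every value is a string."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


# Made-up sample row with an integer id, a multi-line text and an empty list.
sample_rows = [{"id": 123, "text": "line one\nline two, with comma", "hash_tags": []}]
restored = csv_text_to_rows(rows_to_csv_text(sample_rows, ["id", "text", "hash_tags"]))
```

Here `restored[0]["id"]` is the string `"123"` and `restored[0]["hash_tags"]` is the string `"[]"`, while the multi-line text survives unchanged because the writer quotes it.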