# Interacting with the Twitter API

Twitter can be used as a data source for various data science projects, including geospatial analysis (where are users tweeting about certain subjects?) and sentiment analysis (how do users feel about certain subjects?).

In this exercise we will learn how to stream real-time Twitter data. We will practice storing it in a dataframe to produce some visualizations, storing it in a SQLite database, and building a web app using Streamlit. The work divides into three areas:

1. Database set-up: This can be done directly in the RDBMS of your choice; in this project we use SQLite.

2. Tweepy: Credentials are required to interact with the Twitter API through Tweepy. Once these have been obtained from dev.twitter.com, we can set up a stream with keyword filters.

3. Streamlit: Once we have our data stream working, we’ll need to set up our web app using Streamlit. This is surprisingly simple and can be done within a single Python file!


Tweepy is a Python library for accessing the Twitter API. You’ll need to set up a Twitter application at dev.twitter.com to obtain a set of authentication keys to use with the API. Streaming with Tweepy involves three objects: Stream, StreamListener, and OAuthHandler. OAuthHandler simply handles API authentication and requires the unique keys generated when you created your Twitter app. Tweepy v4.0.0 changed the streaming interface (StreamListener was merged into Stream), so to avoid version conflicts and to match the available documentation, the StreamListener code provided here works with Tweepy v3.10.0. If you decide to use Tweepy v4.0.0 or later, make sure to adapt the StreamListener code accordingly.

**Good practice to include in your project**

-Keep secrets and configuration out of version control

You really don't want to leak your Twitter secret key or your database username and password on GitHub. One way to avoid this is to store your secrets and config variables in a special file (you saw this in the Cookiecutter template).

Create a .env file in the project root folder. As long as .env is listed in .gitignore, this file will never be committed to the version control repository. Here's an example:

```py
# example .env file
DATABASE_URL=postgres://username:password@localhost:5432/dbname
AWS_ACCESS_KEY=myaccesskey
AWS_SECRET_ACCESS_KEY=mysecretkey
OTHER_VARIABLE=something
```

-Using a package to load these variables automatically

There is a package called python-dotenv that loads all the entries in this file as environment variables, so they are accessible with os.environ.get. Here's an example snippet adapted from the python-dotenv documentation, as used in the Cookiecutter Data Science template:

```py
# src/data/dotenv_example.py
import os
from dotenv import load_dotenv, find_dotenv

# find .env automatically by walking up directories until it's found
dotenv_path = find_dotenv()

# load up the entries as environment variables
load_dotenv(dotenv_path)

database_url = os.environ.get("DATABASE_URL")
other_variable = os.environ.get("OTHER_VARIABLE")
```
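
Note that `os.environ.get` returns `None` when a variable is missing, which tends to surface later as a confusing authentication or connection error. A minimal sketch of a fail-fast lookup follows; `require_env` is a hypothetical helper name (it is not part of python-dotenv), and the variable set in the demo is a placeholder:

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    # Hypothetical helper: not part of python-dotenv or the standard library.
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Placeholder value set here only so the example runs on its own;
# in the project it would come from the .env file via load_dotenv().
os.environ["OTHER_VARIABLE"] = "something"
other_variable = require_env("OTHER_VARIABLE")
print(other_variable)
```

Raising at start-up makes a missing or misspelled entry in .env visible immediately instead of at the first API call.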

## Part I: Storing in a dataframe

### Authentication

In [None]:
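# Import packages
import os
import json

import tweepy
from dotenv import load_dotenv, find_dotenv

# Load the entries of the .env file as environment variables
load_dotenv(find_dotenv())

# Store OAuth authentication credentials in relevant variables.
# The variable names below are assumptions; use the names from your own .env file.
access_token = os.environ.get("ACCESS_TOKEN")
access_token_secret = os.environ.get("ACCESS_TOKEN_SECRET")
consumer_key = os.environ.get("CONSUMER_KEY")
consumer_secret = os.environ.get("CONSUMER_SECRET")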

# Pass OAuth details to tweepy's OAuth handler

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

### Streaming Tweets

In [None]:
from stream import TweetListener

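# Initialize Stream listener
listener = TweetListener()

# Create your Stream object with authentication
stream = tweepy.Stream(auth, listener)

# Filter Twitter Streams to capture data by the keywords.
# The first positional argument of filter() is `follow` (user IDs),
# so the keywords must be passed by name as `track`.
stream.filter(track=['russia', 'ukraine'])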

### Exploring the data

Before building a dataframe, take a look at what the stream collected. The cells below assume that the collected tweets are available as a list of dictionaries named tweets_data, with one dictionary per tweet.
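
The dataframe cell further down expects a list named `tweets_data`. As a minimal sketch, assuming the listener writes each raw tweet as one JSON object per line to a text file, the tweets could be loaded as shown here. The helper name `load_tweets` and the file name `tweets.txt` are assumptions (the provided `stream.py` may store tweets differently), and the two sample tweets are made up so that the example runs on its own:

```python
import json
import os
import tempfile


def load_tweets(path):
    """Read one JSON-encoded tweet per line and return them as a list of dicts."""
    tweets = []
    with open(path, "r", encoding="utf-8") as tweets_file:
        for line in tweets_file:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            tweets.append(json.loads(line))
    return tweets


# Demo: write two made-up tweets to a temporary file, then load them back
sample = [{"text": "hello", "lang": "en"}, {"text": "hola", "lang": "es"}]
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "tweets.txt")
    with open(sample_path, "w", encoding="utf-8") as sample_file:
        for tweet in sample:
            sample_file.write(json.dumps(tweet) + "\n")
    tweets_data = load_tweets(sample_path)

# Inspect which fields the first tweet carries
print(tweets_data[0].keys())
```

With real data, you would call `load_tweets` on the file your listener writes to and explore the keys of a few tweets before choosing dataframe columns.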

### Building a dataframe with our Twitter data

In [None]:
# Import package
import pandas as pd

# Build DataFrame of tweet texts and languages
df = pd.DataFrame(tweets_data, columns=['text', 'lang'])

# Print head of DataFrame
print(df.head())

### Analyzing some text
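
The cell in this section calls `word_in_text`, which is not defined anywhere in this notebook. A minimal sketch, assuming the helper should report whether a keyword occurs in the tweet text regardless of case (the matching rule is an assumption):

```python
import re


def word_in_text(word, text):
    """Return True if `word` occurs anywhere in `text`, ignoring case."""
    if not isinstance(text, str):
        return False  # e.g. a missing value in the dataframe
    return re.search(re.escape(word), text, flags=re.IGNORECASE) is not None


print(word_in_text("ukraine", "News from Ukraine today"))
print(word_in_text("russia", "Nothing relevant here"))
```

Because `True` and `False` behave as 1 and 0 in arithmetic, the result can be added directly to a counter. This version matches substrings, so "ukraine" also matches "ukrainian"-style hashtags only when the exact letters appear; switch to word boundaries (`\b`) if you need whole-word matches.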

In [None]:
# Initialize one counter per keyword
[russia, ukraine] = [0, 0]

# Iterate through df, counting the number of tweets in which
# each keyword is mentioned
for index, row in df.iterrows():
    russia += word_in_text('russia', row['text'])
    ukraine += word_in_text('ukraine', row['text'])

### Visualizing the data

In [None]:
# Import packages
import matplotlib.pyplot as plt
import seaborn as sns

# Set seaborn style
sns.set(color_codes=True)

# Create a list of labels: cd
cd = ['russia', 'ukraine']

# Plot the bar chart (keyword arguments work across seaborn versions)
ax = sns.barplot(x=cd, y=[russia, ukraine])
ax.set(ylabel="count")
plt.show()

## Part II: Storing in a SQLite database

The task here is to apply your knowledge of creating a SQLite database and try to reproduce the Streamlit app created in the following example:

https://github.com/jonathanreadshaw/streamlit-twitter-stream/blob/master/stream.py

Some explanation of the repo is provided here: https://towardsdatascience.com/tracking-the-race-for-10-downing-street-live-tweet-dashboard-using-tweepy-mysql-and-streamlit-6084e88b4dd8

**1. First let’s create our SQLite database.**

Creation of the table and database transactions will be handled using SQLAlchemy, a Python library most commonly used as an Object Relational Mapper (ORM) that handles the communication between our Python code and the database. SQLAlchemy allows database tables to be represented as Python classes and provides functions that execute the corresponding SQL statements automatically.

The class below represents the table we will use to store tweets:

In [None]:
#models.py

from sqlalchemy import Column, Integer, String, DateTime, Boolean, Float
from database import Base


class Tweet(Base):
    __tablename__ = 'tweets'
    id = Column(Integer, primary_key=True)
    body = Column(String(1000), nullable=False)
    keyword = Column(String(256), nullable=False)
    tweet_date = Column(DateTime, nullable=False)
    location = Column(String(100))
    verified_user = Column(Boolean)
    followers = Column(Integer)
    sentiment = Column(Float)

    def __init__(self, body, keyword, tweet_date, location, verified_user, followers, sentiment):
        self.body = body
        self.keyword = keyword
        self.tweet_date = tweet_date
        self.location = location
        self.verified_user = verified_user
        self.followers = followers
        self.sentiment = sentiment

    def __repr__(self):
        return '<Tweet %r>' % self.body

In a separate file, database.py, we define the variables required to perform our database operations using SQLAlchemy.

Make sure to include the code that creates the engine for your database.

In [None]:
#database.py

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker
from sqlalchemy.ext.declarative import declarative_base
from contextlib import contextmanager

from config import DBConfig


# CREATE ENGINE

#YOUR CODE HERE


Session = scoped_session(sessionmaker(autocommit=False, bind=engine))
Base = declarative_base()

@contextmanager
def session_scope():
    session = Session()
    try:
        yield session
        session.commit()
    except:
        session.rollback()
        raise
    finally:
        session.close()


def init_db():
    Base.metadata.create_all(bind=engine)

-session_scope(): this function uses the contextmanager decorator to provide a SQLAlchemy session object for performing transactions. The decorator allows it to be used with the with…as… syntax; it commits on success or rolls back on error, and closes the session in either case.

-init_db(): this function will create the tables defined by our SQLAlchemy models if they don’t exist.

For more details you can go to the SQLAlchemy documentation.
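
The commit-or-rollback behaviour of session_scope() can be seen in a small self-contained sketch. It assumes an in-memory SQLite database and a throwaway `Note` table; both are illustrative choices rather than part of the project, and your own engine URL will differ:

```python
from contextlib import contextmanager

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

# In-memory SQLite database, used here for illustration only
engine = create_engine("sqlite:///:memory:")
Session = sessionmaker(bind=engine)
Base = declarative_base()


class Note(Base):
    """Throwaway table for the demo (not part of the project)."""
    __tablename__ = "notes"
    id = Column(Integer, primary_key=True)
    body = Column(String(100), nullable=False)


@contextmanager
def session_scope():
    session = Session()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()


Base.metadata.create_all(bind=engine)

# Successful block: the row is committed on exit
with session_scope() as session:
    session.add(Note(body="first"))

# Failing block: the pending row is rolled back and the error re-raised
try:
    with session_scope() as session:
        session.add(Note(body="second"))
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

with session_scope() as session:
    bodies = [note.body for note in session.query(Note).all()]

print(bodies)
```

Only the row from the block that finished without an error is stored, which is the guarantee the context manager is meant to give.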

**2. Tweepy Stream**

The StreamListener class is used to define how each incoming tweet should be handled. Again, make sure to read the explanation of each part of the StreamListener class in the write-up of the example repo, here: https://towardsdatascience.com/tracking-the-race-for-10-downing-street-live-tweet-dashboard-using-tweepy-mysql-and-streamlit-6084e88b4dd8
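
As a minimal sketch of the per-tweet handling, the function below maps one raw tweet to the columns of the tweets table. It is written as a plain function so it runs without Tweepy; in Tweepy v3.10.0 it could be called from the listener's `on_data(self, raw_data)` method. The function name `tweet_to_row`, the keyword-matching rule, and the sample payload are assumptions, and sentiment is left empty rather than computed:

```python
import json
from datetime import datetime


def tweet_to_row(raw_data, keywords):
    """Map one raw tweet (JSON string) to the column values of the tweets table.

    Returns None for messages that are not tweets or that match no keyword.
    """
    tweet = json.loads(raw_data)
    if "text" not in tweet:
        return None  # e.g. delete notices or limit messages
    body = tweet["text"]
    matched = [k for k in keywords if k.lower() in body.lower()]
    if not matched:
        return None
    user = tweet.get("user", {})
    return {
        "body": body,
        "keyword": matched[0],
        # v1.1 timestamps look like "Wed Oct 10 20:19:24 +0000 2018"
        "tweet_date": datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y"),
        "location": user.get("location"),
        "verified_user": user.get("verified"),
        "followers": user.get("followers_count"),
        "sentiment": None,  # left empty here; compute it with the tool of your choice
    }


# Made-up payload using field names from the v1.1 tweet object
raw = json.dumps({
    "created_at": "Wed Oct 10 20:19:24 +0000 2018",
    "text": "Latest news from Ukraine",
    "user": {"location": "Kyiv", "verified": False, "followers_count": 42},
})
row = tweet_to_row(raw, ["russia", "ukraine"])
print(row["keyword"], row["tweet_date"].isoformat())
```

The dictionary keys match the arguments of the Tweet constructor in models.py, so a row could be stored with `session.add(Tweet(**row))` inside a `session_scope()` block.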

## Part III: Creating a Streamlit app

For your Streamlit web app, you can use the following repo code as a guide: https://github.com/jonathanreadshaw/streamlit-twitter-stream/blob/master/stream.py

Some ideas to include in your app (that have been included in the example repo):

-Influential Tweets: Tweets from users with the largest number of followers.

-Recent Tweets: The most recent tweets for the keywords in question.

-Hourly/Daily volume by keyword

-Hourly/Daily sentiment by keyword

-Location of tweets
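
Two of these metrics can be sketched in plain Python to make the computation concrete. The function names and sample rows are made up for illustration; in the app itself a pandas groupby over the query result would likely be more natural:

```python
from collections import Counter
from datetime import datetime


def hourly_volume(rows):
    """Count tweets per (hour, keyword); `rows` are (keyword, tweet_date) pairs."""
    counts = Counter()
    for keyword, tweet_date in rows:
        # Truncate the timestamp to the start of its hour
        hour = tweet_date.replace(minute=0, second=0, microsecond=0)
        counts[(hour, keyword)] += 1
    return counts


def most_influential(tweets, n=3):
    """Return the n tweets whose authors have the most followers."""
    return sorted(tweets, key=lambda t: t["followers"], reverse=True)[:n]


# Made-up rows for illustration
rows = [
    ("ukraine", datetime(2022, 3, 1, 10, 5)),
    ("ukraine", datetime(2022, 3, 1, 10, 40)),
    ("russia", datetime(2022, 3, 1, 11, 15)),
]
volume = hourly_volume(rows)
print(volume[(datetime(2022, 3, 1, 10), "ukraine")])

tweets = [
    {"body": "a", "followers": 10},
    {"body": "b", "followers": 500},
    {"body": "c", "followers": 75},
]
print([t["body"] for t in most_influential(tweets, n=2)])
```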

These metrics/visualisations will allow users to keep track of trends in near real-time. We make use of the following Streamlit features:

-Caching: this is a key feature of Streamlit. We can cache computationally expensive operations simply by using the @st.cache decorator (newer Streamlit releases replace it with @st.cache_data and @st.cache_resource). Cached results can be configured to expire after a set time (e.g. for loading fresh data).

-Widgets: Streamlit makes it easy to add interactivity to your app with buttons, drop-downs, etc.

Sources:

Cookiecutter Data Science: https://drivendata.github.io/cookiecutter-data-science/

Repo example explanation: https://towardsdatascience.com/tracking-the-race-for-10-downing-street-live-tweet-dashboard-using-tweepy-mysql-and-streamlit-6084e88b4dd8

Integrating SQLite with SQLAlchemy: https://realpython.com/python-sqlite-sqlalchemy/

Example repo: https://github.com/jonathanreadshaw/streamlit-twitter-stream/blob/master/stream.py