# Extracting Images and Textual Information From Twitter


## Introduction

This notebook uses the Twitter API to extract tweets related to brands of interest, together with the images attached to those tweets. Each tweet is passed through Google's sentiment analysis API and classified as positive, negative, or neutral. Images are passed through an object localization step and a CNN model to identify any logos they contain. All of this information is then collected and visualized in Tableau (a data visualization tool).

## Importing Essential Libraries

In [1]:
import tweepy #Library required for Twitter API
import csv, re
import pandas as pd
import os
import wget
import logging

### Authentication keys

The keys defined here authenticate us with the Twitter API so that we can call its functions and extract tweets for analysis.

You need to register for a Twitter developer account at https://developer.twitter.com

The Twitter data model is documented at https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/object-model/tweet

Step 1: Apply for a Twitter Developer Account

Go to the Twitter developer site to apply for a developer account.

Step 2: Create an Application

Twitter grants authentication credentials to apps, not accounts. An app can be any tool or bot that uses the Twitter API, so you need to register an app before you can make API calls.

To register your app, go to your Twitter apps page and select the Create an app option.

You need to provide the following information about your app and its purpose:

- App name: a name to identify your application (such as examplebot)
- Application description: the purpose of your application (such as An example bot for a Real Python article)
- Your or your application’s website URL: required, but can be your personal site’s URL since bots don’t need a URL to work
- Use of the app: how users will use your app (such as This app is a bot that will automatically respond to users)

Step 3: Create the Authentication Credentials

To create the authentication credentials, go to your Twitter apps page.

Here you’ll find the Details button of your app. Clicking this button takes you to the next page, where you can generate the credentials.

By selecting the Keys and tokens tab, you can generate and copy the key, token, and secrets to use in your code.

After generating the credentials, save them so you can use them later in your code.





In [2]:
# Replace these values with the credentials generated for your own app
consumer_key = "hlFX43j87dphdCnFdNUuWz5"
consumer_secret = "q2aE68MxpDS6UfIh0Q9OQMd12T7ci5eT8rGaoJHauEXLPk7"
access_key = "34749634-MLtSPUGnJfYk2PcGaDm3QxZfEPiJrgc4UwjgCu"
access_secret = "HXZymUSEauHQkWaKcS4VTgS5zj7RfUcCg7xU9OUXH"
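
Hard-coding credentials in a notebook makes them easy to leak when the notebook is shared. One common alternative, sketched here, reads them from environment variables. The variable names (`TWITTER_CONSUMER_KEY` and so on) and the helper `load_twitter_credentials` are assumptions made for this example; they are not names required by Twitter or tweepy.

```python
import os

# Hypothetical environment variable names; choose whatever fits your setup.
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_key": "TWITTER_ACCESS_KEY",
    "access_secret": "TWITTER_ACCESS_SECRET",
}


def load_twitter_credentials(environ=os.environ):
    """Return the four credential strings, or raise if any is missing."""
    missing = [env for env in ENV_NAMES.values() if not environ.get(env)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[env] for name, env in ENV_NAMES.items()}
```

With the variables set in the shell that launches the notebook, `creds = load_twitter_credentials()` returns a dictionary whose entries can be assigned to `consumer_key`, `consumer_secret`, `access_key`, and `access_secret`.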

In [3]:
# Create an empty dataframe to store the information
tweets = pd.DataFrame(columns=["id", "created_at", "text", "media_url", "location"])

In [4]:
tweets.columns

Index(['id', 'created_at', 'text', 'media_url', 'location'], dtype='object')

### Extracting Tweets

We use `tweepy.Cursor` to page through the tweets that match a search query, for example tweets hashtagged with 'dunkin-donuts'. Only tweets posted with an image are of interest, since images are a core part of the application.
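
The cells that follow do not filter the search results by image. As a minimal sketch of such a filter, the helpers below work on the plain dictionary form of a tweet (the `_json` attribute of a tweepy `Status`) and assume the v1.1 layout in which attached photos are listed under `entities["media"]`, each with a `media_url_https` field. The function names `first_photo_url` and `tweet_to_row` are hypothetical.

```python
def first_photo_url(tweet_json):
    """Return the URL of the first attached photo, or None if there is none."""
    media_items = tweet_json.get("entities", {}).get("media", [])
    for item in media_items:
        if item.get("type") == "photo":
            return item.get("media_url_https") or item.get("media_url")
    return None


def tweet_to_row(tweet_json):
    """Map a tweet dictionary onto the dataframe columns; None if it has no photo."""
    url = first_photo_url(tweet_json)
    if url is None:
        return None
    return {
        "id": tweet_json.get("id_str"),
        "created_at": tweet_json.get("created_at"),
        "text": tweet_json.get("text"),
        "media_url": url,
        "location": tweet_json.get("user", {}).get("location"),
    }
```

Calling `tweet_to_row(tweet._json)` inside the collection loop and skipping `None` results would keep only tweets that carry a photo.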

In [5]:
import datetime, time
# Date nine days ago (not used in the cells below)
last_week = datetime.date.today() - datetime.timedelta(9)
# Today's date at midnight, as a datetime object
since_tweets = datetime.datetime.strptime(time.strftime("%Y-%m-%d"), "%Y-%m-%d")
print(since_tweets)

2022-10-22 00:00:00


As we have previously seen, the Twitter API requires that all requests use OAuth to authenticate. So you need to create the required authentication credentials to be able to use the API. These credentials are four text strings:

- Consumer key
- Consumer secret
- Access token
- Access secret

In [6]:
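# Build the OAuth handler from the four credentials defined above
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
# wait_on_rate_limit=True makes tweepy wait until the rate limit resets
api = tweepy.API(auth, wait_on_rate_limit=True)

try:
    api.verify_credentials()
    print("Authentication OK")
except tweepy.TweepyException:
    print("Error during authentication")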

Authentication OK


In [7]:
timeline = api.home_timeline()
for tweet in timeline:
    print(f"{tweet.user.name} said {tweet.text}")

Larry Sharpe said Had dinner and talked about business, sports and the future of New York with potential supporters at Mo's Bar and G… https://t.co/w7FgGv8Q6R
Vermin Supreme (TM) said RT @wonderofscience: Fascinating footage of a human white blood cell chasing a bacterium captured through a microscope. Credit: David Roger…
WikiLeaks said Australia's Journalists Union: "The decision to uphold #Assange extradition imperils journalism everywhere"… https://t.co/37IukQWDE2
Tom Fitton said Biden schooled for equating PPP loans with student loan handout to bash GOP: ‘Policy-illiterate talking point'… https://t.co/t4PEucYDTm
Rolling Stone said British-Armenian producer Hagop Tchaparian has been documenting his life, travels, and family history through field… https://t.co/t9hCGrAaVz
Cato Institute said RT @HumanProgress: Starlink receivers are highly mobile, weighing just 15 lbs, and the cost of smuggling them into Iran would be relatively…
Northeastern Men’s Hockey said Attaboy Ritzy 🚨

#Howli

In [8]:
user = api.get_user(screen_name="NikBearBrown")
print("Most recent tweet: " + user.status.text)

Most recent tweet: I've always thought spiders shouldn't drink coffee. https://t.co/tXgEi8Dxo6


In [9]:
print("User details:")
print(user.name)
print(user.description)
print(user.location)

print("Last 20 Followers:")
for follower in user.followers():
    print(follower.name)

User details:
Nik Bear Brown
Northeastern Engineering Associate Teaching Professor - Aspiring TikTok Dancer
Pursue it. Relentlessly. #Nerdlife
Boston
Last 20 Followers:
Myah Rowold
Skyler Scrimsher
Rory Yerka
Milly Muir
Les Yetton
Isabelle Krinke
Ora Raheem
Esme-rose Mccaslin
Debbra Wadas
Ella-may Virzi
Vilma Rathgeb
Madilyn Rueckert
Nel Laskin
Murron Heinzelman
Fallon Poss
Loraine Gibbens
Teresa Adjutant
Myrta Kates
Almeda Linquist
Dmytro Iakubovskyi


In [10]:
import datetime, time
now = datetime.date.today()
# Today's date at midnight; used below as the upper bound of the search
date_since = datetime.datetime.strptime(time.strftime("%Y-%m-%d"), "%Y-%m-%d")
print(date_since)
keywords = ['KimKardashian', 'kardashian']
print(' '.join(keywords))
num_tweets = 5  # maximum number of tweets to retrieve
print("num_tweets ", num_tweets)

2022-10-22 00:00:00
KimKardashian kardashian
num_tweets  5


In [11]:
new_search = "kim+kardashian -filter:retweets"
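
The `keywords` list defined earlier is printed but is not used to build `new_search`. One way to derive a query from the list, shown here as a sketch, joins the terms with the `OR` search operator and appends `-filter:retweets`. The helper `build_query` is hypothetical, and its result is not the same query as the hand-written string above.

```python
def build_query(keywords, exclude_retweets=True):
    """Join search terms with OR; quote any term that contains a space."""
    terms = []
    for word in keywords:
        word = word.strip()
        if not word:
            continue
        terms.append(f'"{word}"' if " " in word else word)
    query = " OR ".join(terms)
    if exclude_retweets and query:
        query += " -filter:retweets"
    return query


print(build_query(["KimKardashian", "kardashian"]))
# KimKardashian OR kardashian -filter:retweets
```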

In [12]:
# https://developer.twitter.com/en/docs/twitter-api/v1/tweets/search/api-reference/get-search-tweets
# https://docs.tweepy.org/en/stable/api.html#search-tweets
# `until` returns tweets created before the given date, formatted as YYYY-MM-DD
tweets = tweepy.Cursor(api.search_tweets,
                       q=new_search,
                       lang="en",
                       until=date_since.strftime("%Y-%m-%d")).items(num_tweets)

In [13]:
# Print each raw Status object to inspect the fields it carries
cnt = 0
for tweet in tweets:
    print(cnt)
    print(tweet)
    cnt = cnt + 1

0
Status(_api=None, _json={'created_at': 'Fri Oct 21 15:21:39 +0000 2022', 'id': 1583478629308456962, 'id_str': '1583478629308456962', 'text': '@YouTube &amp; other online "influencers" are to the past decade what Kim Kardashian, Paris Hilton &amp; other brainless ta… https://t.co/ExvVAPH2lm', 'truncated': True, 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [{'screen_name': 'YouTube', 'name': 'YouTube', 'id': 10228272, 'id_str': '10228272', 'indices': [0, 8]}], 'urls': [{'url': 'https://t.co/ExvVAPH2lm', 'expanded_url': 'https://twitter.com/i/web/status/1583478629308456962', 'display_url': 'twitter.com/i/web/status/1…', 'indices': [125, 148]}]}, 'metadata': {'iso_language_code': 'en', 'result_type': 'recent'}, 'source': '<a href="https://mobile.twitter.com" rel="nofollow">Twitter Web App</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': 10228272, 'in_reply_to_user_id_str': '10228272', 'in_reply_to_screen_name': 'YouTube', 'use

In [14]:
# https://developer.twitter.com/en/docs/twitter-api/v1/tweets/search/api-reference/get-search-tweets
# https://docs.tweepy.org/en/stable/api.html#search-tweets
# Re-create the cursor: the iterator from In [12] was consumed by the loop in In [13]
tweets = tweepy.Cursor(api.search_tweets,
                       q=new_search,
                       lang="en",
                       until=date_since.strftime("%Y-%m-%d")).items(num_tweets)

In [15]:
tweets

<tweepy.cursor.ItemIterator at 0x7fdf90abf610>
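
The cursor is created again in In [14] because `.items()` returns an iterator, and the loop in In [13] already consumed the first one. The sketch below illustrates the same behavior with a plain Python generator standing in for the tweepy iterator (no API call is made); `fake_items` is a hypothetical stand-in.

```python
def fake_items(limit):
    """Stand-in for Cursor(...).items(limit): yields each item once."""
    for i in range(limit):
        yield f"tweet-{i}"


items = fake_items(3)
first_pass = list(items)   # consumes the iterator
second_pass = list(items)  # nothing left to yield

print(first_pass)   # ['tweet-0', 'tweet-1', 'tweet-2']
print(second_pass)  # []
```

To loop over the results more than once, either create the cursor again or store the items in a list first.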

In [16]:
cnt = 0
tweets_data = []  # master list holding one row per tweet
for tweet in tweets:
    print(cnt)
    # encode("utf-8") stores the text as bytes, so it displays as b'...' below
    tweets_data.append([tweet.id_str, tweet.created_at, tweet.text.encode("utf-8"), tweet.user.location])
    cnt = cnt + 1

0
1
2
3


In [17]:
tweets_data

[['1583478629308456962',
  datetime.datetime(2022, 10, 21, 15, 21, 39, tzinfo=datetime.timezone.utc),
  b'@YouTube &amp; other online "influencers" are to the past decade what Kim Kardashian, Paris Hilton &amp; other brainless ta\xe2\x80\xa6 https://t.co/ExvVAPH2lm',
  'By Bellingham Bay'],
 ['1582951338085859328',
  datetime.datetime(2022, 10, 20, 4, 26, 23, tzinfo=datetime.timezone.utc),
  b'Whatsup about... (Kim Kardashian Gets Candid About Body Image, Breaks Down Her Photo Editingon 20. October 2022 at\xe2\x80\xa6 https://t.co/42WLcziGwB',
  'on Earth! '],
 ['1582053837518651394',
  datetime.datetime(2022, 10, 17, 17, 0, 2, tzinfo=datetime.timezone.utc),
  b'\xf0\x9f\x9b\x91 CAUTION \xf0\x9f\x9b\x91\n\nKim Kardashian Coin ( $KKC / WETH)\nToken contract:\n0x1beE2C11B71f53F71629279823c6ff8AEeF94F1E\n98.90% l\xe2\x80\xa6 https://t.co/eWT5PXIRh8',
  'EtherScan'],
 ['1581735360274522115',
  datetime.datetime(2022, 10, 16, 19, 54, 31, tzinfo=datetime.timezone.utc),
  b'\xf0\x9f\x9b\x91 C

## Building a Dataframe From the Collected Tweets

In [19]:
tweets_df = pd.DataFrame(tweets_data, columns=["ID", "Created_at", "Text", "Location"])
tweets_df

| | ID | Created_at | Text | Location |
|---|---|---|---|---|
| 0 | 1583478629308456962 | 2022-10-21 15:21:39+00:00 | `b'@YouTube &amp; other online "influencers" ar...` | By Bellingham Bay |
| 1 | 1582951338085859328 | 2022-10-20 04:26:23+00:00 | `b'Whatsup about... (Kim Kardashian Gets Candid...` | on Earth! |
| 2 | 1582053837518651394 | 2022-10-17 17:00:02+00:00 | `b'\xf0\x9f\x9b\x91 CAUTION \xf0\x9f\x9b\x91\n\...` | EtherScan |
| 3 | 1581735360274522115 | 2022-10-16 19:54:31+00:00 | `b'\xf0\x9f\x9b\x91 CAUTION \xf0\x9f\x9b\x91\n\...` | EtherScan |
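
The `Text` column shows values of the form `b'...'` because each tweet's text was encoded to bytes before being stored, and HTML entities such as `&amp;` are left as the API delivered them. If readable strings are preferred, a sketch like the following decodes the bytes and unescapes the entities using only the standard library. The helper `clean_text` is hypothetical, and the sample value is made up to resemble the output above.

```python
import html


def clean_text(raw):
    """Decode UTF-8 bytes if needed and turn HTML entities into characters."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return html.unescape(raw)


# Illustrative value modeled on the rows above, not taken from the data
sample = b"\xf0\x9f\x9b\x91 CAUTION \xf0\x9f\x9b\x91 Kim &amp; co"
print(clean_text(sample))
```

Applied to the dataframe, this could look like `tweets_df["Text"] = tweets_df["Text"].apply(clean_text)`.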


## Appending the sentiment score to the existing dataframe and writing to a CSV File. Downloading images to a particular folder.

In [20]:
tweets_df.head()

| | ID | Created_at | Text | Location |
|---|---|---|---|---|
| 0 | 1583478629308456962 | 2022-10-21 15:21:39+00:00 | `b'@YouTube &amp; other online "influencers" ar...` | By Bellingham Bay |
| 1 | 1582951338085859328 | 2022-10-20 04:26:23+00:00 | `b'Whatsup about... (Kim Kardashian Gets Candid...` | on Earth! |
| 2 | 1582053837518651394 | 2022-10-17 17:00:02+00:00 | `b'\xf0\x9f\x9b\x91 CAUTION \xf0\x9f\x9b\x91\n\...` | EtherScan |
| 3 | 1581735360274522115 | 2022-10-16 19:54:31+00:00 | `b'\xf0\x9f\x9b\x91 CAUTION \xf0\x9f\x9b\x91\n\...` | EtherScan |


In [21]:
# Build the output file name from the query, replacing whitespace with underscores
outfile = re.sub(r"\s+", '_', new_search)
outfile = outfile + '.csv'
print(outfile)
tweets_df.to_csv(outfile, sep=',', encoding='utf-8')

kim+kardashian_-filter:retweets.csv
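
The generated name keeps the `+` and `:` characters from the query, and `:` is not permitted in file names on Windows. A more portable variant, sketched below with a hypothetical helper `query_to_filename`, replaces every run of characters other than letters, digits, dots, and hyphens with a single underscore.

```python
import re


def query_to_filename(query, extension=".csv"):
    """Turn a search query into a conservative, portable file name."""
    stem = re.sub(r"[^A-Za-z0-9.-]+", "_", query).strip("_")
    return (stem or "tweets") + extension


print(query_to_filename("kim+kardashian -filter:retweets"))
# kim_kardashian_-filter_retweets.csv
```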
