# Media Rating by Retweet and Quote Count

This notebook is meant to follow [Evaluating Content](./Evaluating_Content.ipynb).

The SQLite3 database created in that notebook will be accessed here.

## Objectives

Create big data visualizations using Pandas and Seaborn packages. Interact with data from an SQLite3 database using Pandas.

### Learning Goals
- Use Pandas to extract data from an SQLite3 database.
- Become familiar with Pandas DataFrames.
- Use the Seaborn package to create visualizations.

## Requirements

**None**



## Prepare the environment
- Load packages.
- Create a copy of the original database.
- Open a connection to the copy.

In [1]:
# Load Packages

# Enable Matplotlib Jupyter Widget Backend
%matplotlib widget
import sqlite3
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.colors import ListedColormap
from shutil import copyfile

In [2]:
# Copy Database
DB_FILE = "./pandasRRtweets.db"
copyfile(".tweetsRickyRenuncia-final.txt.db", DB_FILE)

'./pandasRRtweets.db'

In [3]:
# Connect to database copy
connection = sqlite3.connect(DB_FILE)

## **Access Database Query**

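Pandas (imported as `pd`) provides the function `pd.read_sql_query`, which takes an SQL query and a database connection and returns the result as a DataFrame.

Let's try it by listing the table names in the database copy, using `connection` and this query (the cell below also selects the `sql` column, which holds each table's `CREATE TABLE` statement):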
```
SELECT name 
FROM sqlite_master 
WHERE type='table';
```

In [4]:
tables = pd.read_sql_query(
    """SELECT name, sql
    FROM sqlite_master 
    WHERE type='table';""",
    connection
)
tables.head(15)

Unnamed: 0,name,sql
0,tweet,"CREATE TABLE tweet (tweet_id TEXT, state INTEG..."
1,tweet_detail,CREATE TABLE tweet_detail (\n tweet...
2,tweet_traduction,CREATE TABLE tweet_traduction (\n t...
3,tweet_user_detail,CREATE TABLE tweet_user_detail (\n ...
4,tweet_auto_detail,CREATE TABLE tweet_auto_detail (\n ...
5,tweet_user,CREATE TABLE tweet_user (\n user_id...
6,tweet_match_media,CREATE TABLE tweet_match_media(\n t...
7,tweet_media,CREATE TABLE tweet_media (\n media_...
8,db_update,CREATE TABLE db_update (\n version ...
9,tweet_slang,CREATE TABLE tweet_slang (\n tweet_...


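A complete table can be retrieved with a similar function, `pd.read_sql_table`, which takes a table name and a connection. It requires an SQLAlchemy connectable, however, so with the plain `sqlite3` connection used here the next cell calls `pd.read_sql_query` with `SELECT *` instead.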
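As a minimal, self-contained sketch of the calls described above, the following builds a throwaway in-memory database and reads it back with `pd.read_sql_query`. The `demo` table, its rows, and the variable names are hypothetical and are not part of the tweet database. `SELECT *` is used to fetch the whole table because `pd.read_sql_table` expects an SQLAlchemy connectable rather than a plain `sqlite3` connection.

```python
import sqlite3
import pandas as pd

# Hypothetical in-memory database, used only for illustration
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE demo (tweet_id TEXT, retweetCount INTEGER)")
demo_conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [("a", 3), ("b", 10), ("c", 0)],
)
demo_conn.commit()

# List the tables, as in the query above
demo_tables = pd.read_sql_query(
    "SELECT name FROM sqlite_master WHERE type='table';", demo_conn
)

# Fetch a complete table with SELECT *
demo_df = pd.read_sql_query("SELECT * FROM demo;", demo_conn)

# Values can be passed separately through `params` instead of string formatting
popular = pd.read_sql_query(
    "SELECT * FROM demo WHERE retweetCount > ?;", demo_conn, params=(5,)
)
demo_conn.close()
```

Passing values through `params` keeps the SQL text fixed, which is usually safer than building the query with string formatting.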

In [39]:
auto_detail = pd.read_sql_query("SELECT * FROM tweet_auto_detail;", connection)

# has_media is 1 when the tweet carries media and 0 otherwise
auto_detail.loc[ auto_detail.has_media == 0, "has_media_label"] = "No Media"
auto_detail.loc[ auto_detail.has_media == 1, "has_media_label"] = "Media"

# auto_detail["datePublished"]=pd.to_datetime(auto_detail["datePublished"], unit='s', origin='unix')
# auto_detail["datePublished"].dt.tz_localize('UTC').dt.tz_convert('America/Puerto_Rico')

auto_detail.head(5)

Unnamed: 0,tweet_id,isBasedOn,identifier,url,dateCreated,datePublished,user_id,has_media,language,retweetCount,quoteCount,text,favoriteCount,has_media_label
0,1151635675713855490,,https://twitter.com/32619518/status/1151635675713855490,https://twitter.com/any_user/status/1151635675713855490,1619755000.0,1563406000.0,32619518,0,es,606,0,"ES LA CORRUPCIÓN GENTE - El CPI lanza una bomba de historia que incluye elementos de la investigación federal q de ser cierto lo q esto dice, Elías Sánchez y otros montaron una organización para agenciarse dinero de forma abrumadora. #rickyrenuncia https://t.co/lH6cNQceif",897,No Media
1,1150860690971877377,,https://twitter.com/72967102/status/1150860690971877377,https://twitter.com/any_user/status/1150860690971877377,1619755000.0,1563222000.0,72967102,0,es,440,0,"Molusco al que tanto hemos criticado ya llegó a la protesta! Bad Bunny y Residente también dijeron que van. Ednita exhortó a que se tiraran a las calles. ESTAMOS VIVIENDO HISTORIA CABRONES, estamos viendo un gobierno caer en Puerto Rico por 1ra vez en la historia. \n#RickyRenuncia",819,No Media
2,1150888859254820864,,https://twitter.com/1040158665158864897/status/1150888859254820864,https://twitter.com/any_user/status/1150888859254820864,1619755000.0,1563228000.0,1040158665158864897,1,es,1742,0,“Lárgate para el carajo pa’ la China o pa’ el Japón” 😂😂😂\n\n#RickyRenuncia https://t.co/anixxn5PUO,2249,Media
3,1151502333420998656,,https://twitter.com/1665552775/status/1151502333420998656,https://twitter.com/any_user/status/1151502333420998656,1619755000.0,1563375000.0,1665552775,0,,14,0,#RickyVeteYa #RickyVeteYa #RickyVeteYa #RickyVeteYa #RickyVeteYa #RickyVeteYa #RickyVeteYa #RickyVeteYa #RickyVeteYa #RickyVeteYa,9,No Media
4,1151307556406431744,1.1512370765845832e+18,https://twitter.com/50846903/status/1151307556406431744,https://twitter.com/any_user/status/1151307556406431744,1619755000.0,1563328000.0,50846903,1,es,0,0,"""Ummmmm.... ehhhhh ....uhhhhh .... uhhhhh"" dicho por Ricky compone al menos 2 min de este video. #RickyRenunciaYa #RickyRenuncia https://t.co/OW2iNW8j0t",7,Media


In [40]:
auto_detail["datePublished"]

0       1.563406e+09
1       1.563222e+09
2       1.563228e+09
3       1.563375e+09
4       1.563328e+09
            ...     
2539    1.563233e+09
2540    1.563399e+09
2541    1.563414e+09
2542    1.563386e+09
2543    1.563247e+09
Name: datePublished, Length: 2544, dtype: float64
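The `float64` values above are epoch timestamps in seconds. A minimal sketch of turning such values into time-zone-aware datetimes is shown below, along the lines of the commented-out conversion in the earlier cell. The sample values are rounded figures used for illustration only, and `America/Puerto_Rico` is assumed to be the time zone of interest.

```python
import pandas as pd

# Hypothetical sample of epoch-second timestamps like those in datePublished
epoch_seconds = pd.Series([1563406000.0, 1563222000.0])

# Interpret the floats as seconds since 1970-01-01 UTC
published_utc = pd.to_datetime(epoch_seconds, unit="s", utc=True)

# Convert to local time in Puerto Rico (assumed time zone of interest)
published_local = published_utc.dt.tz_convert("America/Puerto_Rico")
print(published_local)
```

Once the column holds datetimes, accessors such as `.dt.day` and `.dt.hour` become available, which plain floats do not support.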

In [41]:
count_by_lang = auto_detail.rename(columns = {"language":"Language"}, inplace = False)[["Language", "tweet_id"]].groupby("Language").count()
count_by_lang=count_by_lang.rename(
    columns = {"tweet_id":"Total"}, inplace = False)
count_by_lang

Unnamed: 0_level_0,Total
Language,Unnamed: 1_level_1
ca,2
cy,2
de,1
en,344
es,1688
et,1
eu,2
fr,3
ht,3
in,5


In [42]:
auto_detail[["has_media", "tweet_id"]].groupby("has_media").count()

Unnamed: 0_level_0,tweet_id
has_media,Unnamed: 1_level_1
0,1363
1,1181
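The two counts above follow the same group-and-count pattern. The sketch below repeats it on a hypothetical five-row stand-in for `auto_detail` (the rows are made up) and shows `value_counts` as a shorter way to get the same totals.

```python
import pandas as pd

# Hypothetical miniature of the auto_detail table
sample = pd.DataFrame({
    "tweet_id": ["1", "2", "3", "4", "5"],
    "language": ["es", "es", "en", None, "es"],
    "has_media": [0, 1, 1, 0, 1],
})

# groupby + count, as in the cells above (rows with a missing language are dropped)
by_lang = sample.groupby("language")["tweet_id"].count().rename("Total")

# value_counts gives the same totals, sorted from most to least frequent
by_lang_sorted = sample["language"].value_counts()

# has_media is 0 or 1, so its mean is the share of tweets with media
media_share = sample["has_media"].mean()
```

Note that both approaches skip rows whose language is missing, so the totals may add up to less than the number of tweets.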


In [43]:
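# Show full column contents (e.g. complete URLs) instead of truncating them
pd.set_option('display.max_colwidth', None)
def prettyPrintDataFrame(df, url_columns=("url",)):
    """Render df as HTML, turning the given URL columns into clickable links."""
    from IPython.display import HTML
    tmp_df = df.copy()  # leave the caller's DataFrame untouched
    for column_name in url_columns:
        tmp_df[column_name] = tmp_df[column_name].apply(lambda x:'<a href="{0}">{0}</a>'.format(x))
    # escape=False keeps the <a> tags as HTML instead of showing them as text
    return HTML(tmp_df.to_html(escape=False))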

In [44]:
# Get Top 30 Tweets with Media by Retweet Count
top_30_retweet = auto_detail.loc[auto_detail["has_media"]==1].sort_values("retweetCount", ascending=False).head(30)[["retweetCount", "tweet_id", "url"]]
prettyPrintDataFrame(top_30_retweet)

Unnamed: 0,retweetCount,tweet_id,url
1130,12255,1151319129455906816,https://twitter.com/any_user/status/1151319129455906816
1349,6992,1150989830639235073,https://twitter.com/any_user/status/1150989830639235073
1025,6587,1151568699838685185,https://twitter.com/any_user/status/1151568699838685185
771,5649,1151652566138261504,https://twitter.com/any_user/status/1151652566138261504
1775,4939,1151229730504486912,https://twitter.com/any_user/status/1151229730504486912
838,4431,1150943366215024640,https://twitter.com/any_user/status/1150943366215024640
135,4336,1138785914757533696,https://twitter.com/any_user/status/1138785914757533696
1146,4292,1151992842178269185,https://twitter.com/any_user/status/1151992842178269185
1113,3669,1151579864027344897,https://twitter.com/any_user/status/1151579864027344897
1506,3527,1151182753922211843,https://twitter.com/any_user/status/1151182753922211843


In [45]:
# Get Top 30 Tweets with Media by Favorite(❤️) Count
top_30_favorite = auto_detail.loc[auto_detail["has_media"]==1].sort_values("favoriteCount", ascending=False).head(30)[["favoriteCount", "tweet_id", "url"]]
prettyPrintDataFrame(top_30_favorite)

Unnamed: 0,favoriteCount,tweet_id,url
1130,42931,1151319129455906816,https://twitter.com/any_user/status/1151319129455906816
1025,14453,1151568699838685185,https://twitter.com/any_user/status/1151568699838685185
1775,14198,1151229730504486912,https://twitter.com/any_user/status/1151229730504486912
771,12066,1151652566138261504,https://twitter.com/any_user/status/1151652566138261504
838,11292,1150943366215024640,https://twitter.com/any_user/status/1150943366215024640
1146,11154,1151992842178269185,https://twitter.com/any_user/status/1151992842178269185
211,10579,1151588571545001991,https://twitter.com/any_user/status/1151588571545001991
61,9943,1151987508760072192,https://twitter.com/any_user/status/1151987508760072192
1113,9555,1151579864027344897,https://twitter.com/any_user/status/1151579864027344897
1349,8682,1150989830639235073,https://twitter.com/any_user/status/1150989830639235073
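The two rankings above filter on `has_media`, sort, and keep the first rows. As a small sketch on hypothetical data (the rows below are made up), the same selection can also be written with `DataFrame.nlargest`:

```python
import pandas as pd

# Hypothetical miniature of the auto_detail table
sample = pd.DataFrame({
    "tweet_id": ["a", "b", "c", "d"],
    "has_media": [1, 0, 1, 1],
    "retweetCount": [5, 100, 42, 7],
})

# Keep only tweets with media
with_media = sample[sample["has_media"] == 1]

# Pattern used in the cells above: sort descending, then take the first rows
top_sorted = with_media.sort_values("retweetCount", ascending=False).head(2)

# Equivalent selection with nlargest
top_nlargest = with_media.nlargest(2, "retweetCount")
```

Both versions keep the original row index, which is why the index column in the outputs above is not sequential.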


In [46]:
# Graph Date vs
# auto_detail.datePublished.apply(lambda dt: int(str(dt.year)+ str(dt.month) + str(dt.day) +str(dt.hour) + str(dt.minute)))
# auto_detail.datePublished.apply(lambda dt: "{:02}{:02}{:02}".format(dt.day, dt.hour, dt.minute))

AttributeError: 'float' object has no attribute 'day'

In [55]:
import matplotlib.dates as mdates
# fig = plt.figure()
# ax=fig.add_subplot(projection="3d")
# ax=fig.add_subplot()

# fig = plt.figure(projection="3d", figsize=(8,6))
fig2 = plt.figure(figsize=(8,6))
ax = Axes3D(fig2, auto_add_to_figure=False)
fig2.add_axes(ax)

# get colormap from seaborn
cmap = ListedColormap(sns.color_palette("husl", 2).as_hex())
# Convert epoch seconds to Matplotlib date numbers so the z axis can show dates
dates = mdates.date2num(pd.to_datetime(auto_detail["datePublished"], unit="s"))
# print(dates)
# plot
sc = ax.scatter(
    auto_detail["retweetCount"], # X
    auto_detail["favoriteCount"], # Y
    dates, # Z
#     auto_detail.datePublished.apply(lambda dt: int("{:02}{:02}{:02}".format(dt.day, dt.hour, dt.minute))),
    s=40,
    c=auto_detail["has_media"],
    cmap=cmap,
    alpha=1,
    marker="o"
)
ax.set_xlabel("Retweet Count")
ax.set_ylabel("Favorite Count")
ax.set_zlabel("Date Published")
# A locator chooses the tick positions; a formatter turns them into labels
ax.zaxis.set_major_locator(mdates.DayLocator(interval=1))
ax.zaxis.set_major_formatter(mdates.DateFormatter('%d'))

# Legend
plt.legend(*sc.legend_elements(), bbox_to_anchor=(1., 1), loc=2)
# graph = sns.scatterplot(data=auto_detail, size="datePublished", y="favoriteCount", hue="has_media", x="retweetCount")
# sns.scatterplot(data=auto_detail, x="datePublished", y="favoriteCount")


# graph=sns.FacetGrid(auto_detail,col="has_media", hue="has_media")
# graph.map(sns.scatterplot, "datePublished","retweetCount",size="favoriteCount")
# graph.map(sns.histplot, "retweetCount")
# graph.set_yscale("log")
# graph.add_legend()

plt.show()

Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …

In [38]:

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from mpl_toolkits.mplot3d import Axes3D
# matplotlib.interactive(True)
sns.set(style="darkgrid")

fig = plt.figure()
ax = fig.add_subplot(111, projection = '3d')

# 3D axes need plain numbers: convert epoch seconds to Matplotlib date numbers
x = mdates.date2num(pd.to_datetime(auto_detail["datePublished"], unit="s"))
y = auto_detail["favoriteCount"]
z = auto_detail["retweetCount"]
hue = auto_detail["has_media"]

ax.scatter(x, y, z, c=hue)  # color the points by has_media
plt.show()

Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …

In [53]:
date_retweet_media = auto_detail[["datePublished", "retweetCount", "favoriteCount", "has_media"]]

In [54]:
from tweet_requester.display import TweetInteractiveClassifier as Tweet
from tweet_requester.session import TSess
from twitter_secrets import C_BEARER_TOKEN
tweet_session = TSess(
    C_BEARER_TOKEN, 
    compression_level=5, 
    sleep_time=3, # Minimal sleep between requests to avoid hitting rate limits
    cache_dir="./tweet_cache/", 
    hash_split=True
)
tweet = Tweet(tweet_id="1151319129455906816", session=tweet_session)

In [61]:
for key in tweet.data.keys():
    if "quote" in str(key).lower():
        print(key)
tweet.data.keys()

is_quote_status


dict_keys(['created_at', 'id', 'id_str', 'full_text', 'truncated', 'display_text_range', 'entities', 'extended_entities', 'source', 'in_reply_to_status_id', 'in_reply_to_status_id_str', 'in_reply_to_user_id', 'in_reply_to_user_id_str', 'in_reply_to_screen_name', 'user', 'geo', 'coordinates', 'place', 'contributors', 'is_quote_status', 'retweet_count', 'favorite_count', 'favorited', 'retweeted', 'possibly_sensitive', 'lang'])
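The key scan above finds `is_quote_status` but no quote count among the listed keys. The sketch below shows one way to collect the engagement fields that are present and flag the ones that are not, using a plain dictionary limited to a few of the keys listed above. The values, and the `quote_count` key name being probed, are hypothetical.

```python
# Hypothetical payload with a few of the keys listed above; the values are made up
payload = {
    "id_str": "123",
    "is_quote_status": False,
    "retweet_count": 10,
    "favorite_count": 25,
}

# Fields we would like to read; "quote_count" is an assumed name for a quote counter
wanted = ["retweet_count", "favorite_count", "quote_count"]

# dict.get returns None for absent keys instead of raising KeyError
metrics = {key: payload.get(key) for key in wanted}
missing = [key for key, value in metrics.items() if value is None]
print(metrics, missing)
```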