## Disney+ launch in Brazil: analyzing conversations on Twitter

Questions to answer:

- Who are the most engaged users?

- When do the peaks occur, and which specific events may be related to them?

- Which verified accounts tweeted about Disney+?

Important notes about the data:
    
- The Twitter API returns timestamps in UTC. Official Brasília time is UTC-3, so the timestamps shown here are 3 hours ahead of local time.
- The Twitter API does not return all tweets from each day, but a mix of popular and recent tweets.
- Specifications of the query: keyword:"DisneyPlus", language: "pt"
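
Because the timestamps are in UTC, a peak at 03:00 in this notebook corresponds to midnight in Brasília. A minimal sketch of how the conversion could be done with pandas, assuming `created_at` holds timezone-naive UTC values. The `sample` frame and the `created_at_brt` column are hypothetical and only stand in for the real data.

```python
import pandas as pd

# Hypothetical sample standing in for the `created_at` column (naive UTC timestamps)
sample = pd.DataFrame({
    "created_at": pd.to_datetime(["2020-11-17 03:00:00", "2020-11-17 15:30:00"])
})

# Mark the naive timestamps as UTC, then convert them to Brasília time (UTC-3)
sample["created_at_brt"] = (
    sample["created_at"]
    .dt.tz_localize("UTC")
    .dt.tz_convert("America/Sao_Paulo")
)

print(sample)
```

The rest of the notebook keeps working with the original UTC values, so a converted column like this would only be needed when reading the peaks in local time.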

## Opening data

In [7]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import networkx as nx
%matplotlib inline

In [8]:
#Reading data that has already been concatenated and preprocessed
data = pd.read_pickle(r'C:\Users\amand\Documents\Jupyter Notebooks\DisneyPlus\data_fulltext_pkl\disneyplus_concat_data.pkl')

In [None]:
#Structure of the data
data.shape

## Visualization Options

In [10]:
#Display the tweet full text
pd.set_option('display.max_colwidth', None)

In [11]:
#Show more than one output
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

## Cleaning

In [9]:
data.info()
#Fill missing values with 0 so they do not interfere with the id columns.
data = data.fillna(0)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 66127 entries, 0 to 66126
Data columns (total 19 columns):
 #   Column                      Non-Null Count  Dtype         
---  ------                      --------------  -----         
 0   index                       66127 non-null  int64         
 1   user_screen_name            66127 non-null  object        
 2   user_id                     66127 non-null  object        
 3   user_is_verified            66127 non-null  bool          
 4   retweeted_from_screen_name  44275 non-null  object        
 5   retweeted_from_id           66127 non-null  object        
 6   retweeted_from_is_verified  44275 non-null  object        
 7   screen_name_mention_1       51004 non-null  object        
 8   id_mention_1                66127 non-null  object        
 9   in_reply_to_screen_name     66127 non-null  object        
 10  in_reply_to_user_id         66127 non-null  int64         
 11  created_at                  66127 non-null  datetime64

## Subsets

In [None]:
## Data without RTs
mask_no_rt = data['retweeted_from_id'] == 0
data_no_rt = data[mask_no_rt].copy()

In [None]:
data_no_rt.head()
data_no_rt.info()

## Users - Overview

- The list contains different kinds of users: streaming services (globoplay, DisneyPlusBR), e-commerce (MercadoLivre), an app store (GooglePlay), guides (guiadisneyplus, disneygobr), fan accounts for the TV show Bia (nessavcb, irmasurquiza), and K-pop fans (GIFT4EHYUNG, pjmsmilez, PRINC3JIKK).

- Globoplay and Disney+ formed a partnership that lets customers sign up for both services at a special price.
- Judging by the number of tweets about it, Globoplay invested in interacting with its potential customers.

- The tweet text reveals another partnership, this time with Mercado Livre, an e-commerce platform.
- The offer included up to 6 months of Disney+ for free for customers with a certain number of points in its loyalty program, Mercado Pontos.

In [None]:
#Number of users considering RTs
data.user_id.unique().shape

In [None]:
# Number of users, excluding RTs
data_no_rt.user_id.unique().shape

In [None]:
#Most engaged users, excluding RTs
data_no_rt.user_screen_name.value_counts().head(15).to_frame()

In [None]:
#Closer look into why Globoplay tweeted about Disney+
mask_globoplay = data_no_rt['user_screen_name'] == 'globoplay'
data_no_rt[mask_globoplay]['full_text'].head()

In [None]:
#Closer look into why Mercado Livre tweeted about Disney+
mask_mercadolivre = data_no_rt['user_screen_name'] == 'MercadoLivre'
data_no_rt[mask_mercadolivre]['full_text'].head()

## Tweets histogram

In [None]:
#Histogram considering all data
px.histogram(data.created_at)
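
The peaks can also be located numerically instead of reading them off the chart. A minimal sketch, assuming the timestamps are grouped by hour; the `created_at` series below is hypothetical sample data standing in for `data.created_at`.

```python
import pandas as pd

# Hypothetical timestamps standing in for data['created_at']
created_at = pd.Series(pd.to_datetime([
    "2020-11-17 03:05:00", "2020-11-17 03:40:00", "2020-11-17 03:59:00",
    "2020-11-17 12:10:00", "2020-11-17 12:45:00",
    "2020-11-18 00:20:00",
]))

# Round each timestamp down to the hour and count the tweets per hour
tweets_per_hour = created_at.dt.floor("h").value_counts()

# value_counts sorts in descending order, so the busiest hours come first
top_hours = tweets_per_hour.head(3)
print(top_hours)
```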

## Analyzing Peaks

- RT percentage
- Comments
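
The peak cells below repeat the same time-window mask. A minimal sketch of a reusable helper, together with one possible way to compute the RT percentage of a window. The function names `peak_window` and `rt_percentage` and the sample rows are hypothetical; the sketch assumes that `retweeted_from_id == 0` marks an original tweet, as in the Subsets section.

```python
import pandas as pd

def peak_window(df, start, end):
    """Return the rows whose created_at falls in the half-open interval [start, end)."""
    mask = (df["created_at"] >= start) & (df["created_at"] < end)
    return df[mask]

def rt_percentage(df):
    """Percentage of rows that are retweets (retweeted_from_id different from 0)."""
    if len(df) == 0:
        return 0.0
    return 100 * (df["retweeted_from_id"] != 0).mean()

# Hypothetical rows using the notebook's column names
sample = pd.DataFrame({
    "created_at": pd.to_datetime([
        "2020-11-12 17:05:00", "2020-11-12 17:30:00",
        "2020-11-12 17:55:00", "2020-11-12 18:10:00",
    ]),
    "retweeted_from_id": [0, 123, 456, 0],
    "favorite_count": [10, 0, 0, 3],
})

window = peak_window(sample, "2020-11-12 17:00:00", "2020-11-12 18:00:00")
print(len(window), rt_percentage(window))
```

The half-open interval is a design choice: it includes tweets posted exactly on the hour and avoids the `:59:59` upper bound, so consecutive windows never overlap or leave gaps.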

### Nov 12: from 17:00 to 18:00

- Reactions to the announcement of the official release date of the new Marvel series WandaVision

In [None]:
mask_date = (data['created_at'] > '2020-11-12 17:00:00') & (data['created_at'] <= '2020-11-12 17:59:59')
data[mask_date].sort_values(by='favorite_count', ascending=False).full_text.head()

### Nov 15: from 2 to 3 am

- People were commenting on the special Disney+ launch content, which was available on several channels such as YouTube and GloboPlay, the streaming service of the Brazilian TV network Globo

In [None]:
mask2 = (data['created_at'] > '2020-11-15 02:00:00') & (data['created_at'] <= '2020-11-15 02:59:59')
data[mask2].sort_values(by='favorite_count', ascending=False).full_text.head()

In [None]:
# Nov 17 from 3 to 4 am

mask3 = (data['created_at'] > '2020-11-17 03:00:00') & (data['created_at'] <= '2020-11-17 03:59:59')
data[mask3].sort_values(by='favorite_count', ascending=False).full_text.head(10)

## The countdown to the app release.
## Some fans seemed disappointed because the app was not released at midnight.

In [None]:
## Nov 17 from 12:00 to 13:00

mask4 = (data['created_at'] > '2020-11-17 12:00:00') & (data['created_at'] <= '2020-11-17 12:59:59')
data[mask4].sort_values(by='favorite_count', ascending=False).full_text.head()

## Official launch: content available

In [None]:
## Nov 17 from 15:00 to 16:00

mask5 = (data['created_at'] > '2020-11-17 15:00:00') & (data['created_at'] <= '2020-11-17 15:59:59')
data[mask5].sort_values(by='favorite_count', ascending=False).full_text.head()
## Official launch: content available Part2

In [None]:
## Nov 18 from 0 to 1 am

mask6 = (data['created_at'] > '2020-11-18 00:00:00') & (data['created_at'] <= '2020-11-18 00:59:59')
data[mask6].sort_values(by='favorite_count', ascending=False).full_text.head()

## Official launch: content available

In [None]:
## Nov 18 from 17:00 to 18:00

mask7 = (data['created_at'] > '2020-11-18 17:00:00') & (data['created_at'] <= '2020-11-18 17:59:59')
data[mask7].sort_values(by='favorite_count', ascending=False).full_text.head()
## Official launch: content available

## Hall of fame publications

Most Retweeted Posts

- One of the most retweeted publications was about the characteristics of old movie tapes

- People promising Disney+ accounts for free

- People asking about other people's accounts





In [None]:
#Most retweeted publication
data.sort_values('retweet_count', ascending=False).head(1).retweet_full_text

In [None]:
# Most liked publication 
data.sort_values('favorite_count', ascending=False).head(1).full_text

## Verified accounts

- Verified accounts that tweeted about DisneyPlus

- Some accounts were expected to appear on that list, because they are Disney+ partners: 

    - RedeGlobo, for example, is the company that owns GloboPlay, a streaming service that formed a partnership with Disney+.
    - Claudia Leitte and Dani Calabresa are Brazilian artists who took part in the special show produced for the app launch in Brazil.
    - FoxSportsBrazil and ESPN are part of the Disney group of companies.

- We can also see media outlets and journalists specializing in movies and TV.

    - Omelete is a blog specializing in movies and TV.
    - Antonio Tabet is a



In [None]:
verified = data['user_is_verified'] == True

#Verified accounts who tweeted about Disney+ by number of tweets
data[verified]['user_screen_name'].value_counts().head(20).to_frame()

In [None]:
#Most followed verified accounts
data[verified].sort_values(by='user_followers', ascending=False).user_screen_name.unique()[:10]

In [None]:
user_mask = data['user_screen_name'] == 'otaviano'
data[user_mask].full_text

## Interactions - Source Code

In [None]:
# What this actually does is create a function that,
# for each user, builds a list of the users they interacted with.
# Then, when iterating over the dataset, for each row, row user


# Get the interactions between the different users
def get_interactions(row):
    # From every row of the original dataframe
    # First we obtain the 'user_id' and 'screen_name'
    user = row["user_id"], row["user_screen_name"]
    # Be careful if there is no user id
    if user[0] is None:
        return (None, None), []
    
    # The interactions are going to be a set of tuples
    interactions = set()
    
    # Add all interactions 
    # First, we add the interactions corresponding to replies adding the id and screen_name
    interactions.add((row["in_reply_to_user_id"], row["in_reply_to_screen_name"]))
    # After that, we add the interactions with retweets
    interactions.add((row["retweeted_from_id"], row["retweeted_from_screen_name"]))
    # And later, the interactions with user mentions
    interactions.add((row["id_mention_1"], row["screen_name_mention_1"]))
    
    # Discard if user id is in interactions
    interactions.discard((row["user_id"], row["user_screen_name"]))
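    # Discard all non-existing values
    interactions.discard((None, None))
    # Missing values were filled with 0 in the Cleaning step, so discard those pairs too
    interactions.discard((0, 0))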
    # Return user and interactions
    return user, interactions
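
A short usage sketch for the function above, on a single hypothetical row. The function is repeated here in condensed form so the example runs on its own; the `(0, 0)` discard is an assumption based on the `fillna(0)` call in the Cleaning step, and the user names and ids are made up.

```python
import pandas as pd

def get_interactions(row):
    # Condensed copy of the function above, repeated for a self-contained demo
    user = row["user_id"], row["user_screen_name"]
    if user[0] is None:
        return (None, None), []
    interactions = {
        (row["in_reply_to_user_id"], row["in_reply_to_screen_name"]),
        (row["retweeted_from_id"], row["retweeted_from_screen_name"]),
        (row["id_mention_1"], row["screen_name_mention_1"]),
    }
    interactions.discard(user)
    interactions.discard((None, None))
    # Assumption: missing values were replaced by 0 with fillna(0)
    interactions.discard((0, 0))
    return user, interactions

# Hypothetical row: user "ana" retweets "globoplay", which is also her first mention
row = pd.Series({
    "user_id": "1", "user_screen_name": "ana",
    "in_reply_to_user_id": 0, "in_reply_to_screen_name": 0,
    "retweeted_from_id": "2", "retweeted_from_screen_name": "globoplay",
    "id_mention_1": "2", "screen_name_mention_1": "globoplay",
})

user, interactions = get_interactions(row)
print(user, interactions)
```

Because the interactions are stored in a set, the retweet and the mention of the same account collapse into a single pair.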

## Get nodes

In [12]:
subset= data.head(50).copy()

In [13]:
interactions = []

for index, row in subset.iterrows():
    user_sn = row['user_screen_name']
    retweet_from = row['retweeted_from_screen_name']
    reply_to = row['in_reply_to_screen_name']
    mention = row['screen_name_mention_1']
    

    interactions.append((user_sn, retweet_from))
    interactions.append((user_sn, reply_to))
    interactions.append((user_sn, mention))

In [15]:
interactions_cleaned = [x for x in interactions if 0 not in x]

In [16]:
interactions_set = set(interactions_cleaned)
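
The same set of pairs could probably be built without `iterrows`, which tends to be slow on a dataset of this size. A possible vectorized sketch using `DataFrame.melt`, shown on hypothetical sample rows that use the notebook's column names, with 0 as the placeholder for missing values.

```python
import pandas as pd

# Hypothetical rows using the notebook's column names; 0 marks a missing value
sample = pd.DataFrame({
    "user_screen_name": ["ana", "bia", "ana"],
    "retweeted_from_screen_name": ["globoplay", 0, "globoplay"],
    "in_reply_to_screen_name": [0, "DisneyPlusBR", 0],
    "screen_name_mention_1": ["globoplay", "DisneyPlusBR", "globoplay"],
})

target_columns = [
    "retweeted_from_screen_name",
    "in_reply_to_screen_name",
    "screen_name_mention_1",
]

# Stack the three target columns into one long (source, target) table
long_df = sample.melt(
    id_vars="user_screen_name",
    value_vars=target_columns,
    value_name="target",
)

# Drop the 0 placeholders, then keep each (source, target) pair only once
long_df = long_df[long_df["target"] != 0]
edges = set(zip(long_df["user_screen_name"], long_df["target"]))
print(edges)
```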

## Graph- test

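G = nx.Graph()
# Adding only the edges is enough: networkx creates the nodes automatically.
# Passing the (source, target) tuples to add_nodes_from would add each tuple as an extra node.
G.add_edges_from(interactions_set)
from matplotlib.pyplot import figure
figure(figsize=(20, 20))
nx.draw_networkx(G, with_labels=True)
# Drawing options kept for later experiments
options = {
    "node_color": "#A0CBE2",
    # "edge_color": colors,  # `colors` is not defined above; it needs one value per edge
    "width": 4,
    "edge_cmap": plt.cm.Blues,
    "with_labels": False,
}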

- G = nx.Graph()
#### If I add only the edges, the nodes are created automatically.
- G.add_edges_from(interactions_set)
- fig2 = plt.figure(figsize=(20,20))
- nx.draw_networkx(G, with_labels=True);
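
The note above about edges creating nodes automatically can be checked on a small hypothetical edge list. The sketch also shows what happens when the same tuples are passed to `add_nodes_from`: each hashable tuple is stored as one extra node.

```python
import networkx as nx

# Hypothetical edge list in the same (source, target) shape as interactions_set
edges = {("ana", "globoplay"), ("bia", "DisneyPlusBR")}

# Only edges: the four users become nodes automatically
G_edges_only = nx.Graph()
G_edges_only.add_edges_from(edges)

# Tuples passed as nodes: each tuple becomes one additional node
G_with_tuples = nx.Graph()
G_with_tuples.add_nodes_from(edges)
G_with_tuples.add_edges_from(edges)

print(sorted(G_edges_only.nodes))
print(G_with_tuples.number_of_nodes())
```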

## To Gephi

In [None]:
##nx.write_gexf(F, "disneyplus_direct.gexf")

In [None]:
##nx.write_graphml(F, "disneyplus_3.graphml")
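
A minimal sketch of the export step, assuming a small directed graph and a temporary output path. The file name `disneyplus_demo.gexf` and the `interaction` attribute name are hypothetical; the resulting file is the kind that Gephi can open.

```python
import os
import tempfile
import networkx as nx

# Small hypothetical graph standing in for the interaction graph
F = nx.DiGraph()
F.add_edge("ana", "globoplay", interaction="retweet")

# Hypothetical output path inside a temporary directory
out_path = os.path.join(tempfile.mkdtemp(), "disneyplus_demo.gexf")
nx.write_gexf(F, out_path)

# Read the file back to confirm the edge survived the round trip
loaded = nx.read_gexf(out_path)
print(list(loaded.edges))
```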

## Directed Graph

In [None]:
H = nx.DiGraph()

In [None]:
H.add_edges_from(interactions_set, Type='retweet')

In [None]:
H.nodes.data

In [None]:
# H.degree counts incoming plus outgoing edges; H.in_degree would count incoming edges only
d = dict(H.degree)
in_degrees = d.values()
nodelist = list(d.keys())

In [None]:
pos = nx.spring_layout(H)
fig5 = plt.figure(figsize=(20,20))
node_size=[v * 100 for v in in_degrees]
nx.draw_networkx_nodes(H, pos, nodelist=nodelist, node_size = node_size);
# Draw the edges of H itself; labels default to the node names
nx.draw_networkx_edges(H, pos, edgelist=list(H.edges), edge_color='r');
nx.draw_networkx_labels(H, pos)

In [None]:
fig3 = plt.figure(figsize=(20,20))
nx.draw(H, nodelist=nodelist, node_size=[v * 100 for v in in_degrees], with_labels=True)
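
In a directed graph, `degree` and `in_degree` give different values, which matters when the node size is meant to reflect how often an account was retweeted or mentioned. A small sketch on hypothetical edges, where the source account interacts with the target account:

```python
import networkx as nx

# Hypothetical (source, target) edges: the source interacts with the target
edges = [("ana", "globoplay"), ("bia", "globoplay"), ("globoplay", "DisneyPlusBR")]

H = nx.DiGraph()
H.add_edges_from(edges)

total_degree = dict(H.degree)   # incoming plus outgoing edges
in_degree = dict(H.in_degree)   # incoming edges only

# Node sizes scaled by in-degree; the +1 keeps nodes without incoming edges visible
node_size = [100 * (in_degree[n] + 1) for n in H.nodes]
print(total_degree)
print(in_degree)
```

A list like `node_size` can be passed to `nx.draw_networkx_nodes` through its `node_size` parameter, as long as it follows the same node order as the `nodelist` argument.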

## Get Nodes v2

In [None]:
#get interactions separated by type

interactions_retweets = []
interactions_mentions = []
interactions_replies = []

for index, row in subset.iterrows():
    user_sn = row['user_screen_name']
    retweet_from = row['retweeted_from_screen_name']
    reply_to = row['in_reply_to_screen_name']
    mention = row['screen_name_mention_1']
    

    interactions_retweets.append((user_sn, retweet_from))
    interactions_replies.append((user_sn, reply_to))
    interactions_mentions.append((user_sn, mention))

In [None]:
interactions_rt_cleaned = [x for x in interactions_retweets if 0 not in x]
interactions_mt_cleaned = [x for x in interactions_mentions if 0 not in x]
interactions_rp_cleaned = [x for x in interactions_replies if 0 not in x]

interactions_rt = set(interactions_rt_cleaned)
interactions_mt = set(interactions_mt_cleaned)
interactions_rp = set(interactions_rp_cleaned)

## Colored graph

In [None]:
F = nx.DiGraph()

In [None]:
F.add_edges_from(interactions_rt_cleaned, type='retweet')

In [None]:
F.edges.data()

In [None]:
nodeslist2 = list(dict(F.degree).keys())

In [None]:
d2 = dict(F.degree)

In [None]:
fig3 = plt.figure(figsize=(20,10))
nx.draw(F, nodelist=nodeslist2, node_size=[v * 100 for v in d2.values()], with_labels=True)



In [None]:
F.clear()

In [None]:
#Add nodes and edges to the empty graph based on a list of tuples
F.add_edges_from(interactions_rt_cleaned)
#Compute the node positions on the graph with a force-directed algorithm
pos = nx.layout.spring_layout(F)

In [None]:
#The degree sets the node size
d2 = dict(F.degree)
#The node list comes from the keys of the degree dict, so it lines up with the node sizes
nodeslist2 = list(d2.keys())


In [None]:
fig4 = plt.figure(figsize=(20,20))
node_size=[v * 100 for v in d2.values()]
nx.draw_networkx_nodes(F, pos, nodelist=nodeslist2, node_size = node_size);
nx.draw_networkx_edges(F,pos,edgelist=interactions_rt_cleaned, edge_color='r');
# Labels default to the node names when no labels dict is given
nx.draw_networkx_labels(F, pos)

In [None]:
F.edges.data('color', default='red')
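
One possible way to color the edges by interaction type is to store the type as an edge attribute and map it to a color list. This is a sketch on hypothetical edges; the color scheme in `type_colors` is an assumption, not something defined in this notebook.

```python
import networkx as nx

# Hypothetical edge lists, one per interaction type
retweets = [("ana", "globoplay")]
mentions = [("bia", "DisneyPlusBR")]
replies = [("caio", "ana")]

# Assumed color scheme; any matplotlib color names would work
type_colors = {"retweet": "red", "mention": "blue", "reply": "green"}

F = nx.DiGraph()
F.add_edges_from(retweets, type="retweet")
F.add_edges_from(mentions, type="mention")
F.add_edges_from(replies, type="reply")

# Build the edge list and the color list in the same order
edgelist = list(F.edges)
edge_color = [type_colors[F.edges[u, v]["type"]] for u, v in edgelist]
print(list(zip(edgelist, edge_color)))
```

The two lists can then be passed as `edgelist` and `edge_color` to `nx.draw_networkx_edges`. A `DiGraph` keeps a single attribute dict per (source, target) pair, so a pair that appears under two interaction types keeps only the last type added; `nx.MultiDiGraph` would be an option if both need to be kept.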

## Neo4j

In [1]:
import neo4j

In [3]:
#Connect to the database. It is necessary to create a driver object
driver = neo4j.GraphDatabase.driver("bolt://localhost:7687",
             auth=("neo4j", "amanda"))

In [17]:
import nxneo4j as nx2

In [25]:
#Neo4j uses a dict of settings that establishes the relationship data

config = {
'node_label': 'user',
'relationship_type': 'retweeted',
'identifier_property': 'screen_name'}

G = nx2.DiGraph(driver, config)

In [26]:
G.add_edges_from(interactions_set)

In [1]:
G.delete_all()

NameError: name 'G' is not defined

In [27]:
nx2.draw(G)