<a href="https://colab.research.google.com/github/datacamp/Brand-Analysis-using-Social-Media-Data-in-R-Live-Training/blob/master/notebooks/brand_analysis_solution.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<p align="center">
<img src="https://github.com/datacamp/r-live-training-template/blob/master/assets/datacamp.svg?raw=True" alt = "DataCamp icon" width="50%">
</p>
<br><br>

## **Brand Analysis Using Social Media Data in R**

Welcome to this hands-on training, where you will learn how to perform brand analysis on social media data using R. We will use several R libraries to analyze Twitter data and derive insights.

In this session, you will learn how to:

* Compare brand popularity by extracting and comparing follower counts
* Promote a brand by identifying popular tweets
* Evaluate brand salience and compare it across two brands using tweet frequencies
* Understand brand perception through text mining and by visualizing key terms
* Perform sentiment analysis to understand customers' feelings about a brand
* Visualize brand presence by plotting tweets on a map

## **The Dataset**

The datasets used in this training session are stored in RDS format, a handy format for saving and later importing single R objects. They contain live tweets extracted with the `rtweet` library. The datasets are:
* **tesla.rds**: Tweets matching the keyword 'tesla', pre-extracted from Twitter
* **toyota.rds**: Tweets matching the keyword 'toyota', pre-extracted from Twitter
* **tesla_small.rds**: Tweets matching the keyword 'tesla', pre-extracted from Twitter. This is a smaller dataset with fewer tweets.
* **car.rds**: Tweets matching the keyword 'electric car', pre-extracted from Twitter

Note that we will not extract live tweets from Twitter during this session, as that involves a setup process. Instead, we will use pre-extracted tweets saved in RDS format.

* **tesla.rds**: 17979 records (tweets) and 90 columns of tweet text and associated metadata
* **toyota.rds**: 17798 records (tweets) and 90 columns of tweet text and associated metadata
* **tesla_small.rds**: 500 records (tweets) and 90 columns of tweet text and associated metadata
* **car.rds**: 12925 records (tweets) and 90 columns of tweet text and associated metadata
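
As a minimal sketch of why RDS is convenient for single objects, the round trip below saves one data frame with `saveRDS()` and restores it with `readRDS()`. The `demo_tweets` data frame and its values are hypothetical stand-ins, not rows from the training datasets.

```r
# Hypothetical stand-in for a tweets data frame (not real data)
demo_tweets <- data.frame(
  text = c("first tweet", "second tweet"),
  created_at = as.POSIXct(c("2020-01-01 10:00:00", "2020-01-02 11:30:00"), tz = "UTC"),
  retweet_count = c(5L, 0L),
  stringsAsFactors = FALSE
)

# Save the single object to a temporary .rds file
rds_path <- tempfile(fileext = ".rds")
saveRDS(demo_tweets, rds_path)

# Restore it under a new name and check that it matches the original
restored <- readRDS(rds_path)
identical(demo_tweets, restored)

# Column classes (character, POSIXct, integer) are kept as they were saved
str(restored)
```

Unlike a CSV export, the restored object keeps its column classes, so a timestamp column such as `created_at` does not need to be re-parsed after loading.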

All the datasets have the same set of columns and some of the important columns that we will work with are listed below:

- `user_id`: Unique ID that Twitter assigns to each user
- `created_at`: UTC time when the tweet was created
- `screen_name`: The screen name or Twitter handle that a user identifies themselves with
- `text`: The actual tweet text posted by a user
- `retweet_count`: Number of times a given tweet has been retweeted
- `followers_count`: Number of followers a Twitter account currently has
- `geo_coords`, `coords_coords`, `bbox_coords`: Geographic location of a tweet as reported by the user or client application


## **Getting started and exploring the dataset**

In [2]:
# Install R Packages
install.packages('rtweet')
#install.packages('dplyr')
#install.packages('reshape')
#install.packages('ggplot2')
#install.packages('qdapRegex')
#install.packages('tm')
#install.packages('qdap')
#install.packages('wordcloud')
#install.packages('RColorBrewer')
#install.packages('syuzhet')
#install.packages('maps')

Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)

also installing the dependency ‘httpuv’




In [0]:
# Load main rtweet library
library(rtweet)

### 1. Compare brand popularity by extracting and comparing follower counts

In [0]:
# Create a variable to store twitter account names of 4 auto magazines
users <- c("BreakingAuto", "Motorpic", "Mpgomatic", "Cjponyparts")

# Extract user data for the twitter accounts stored in users
# users_twt <- lookup_users(users)

# Save extracted data as a CSV file using fwrite from the data.table library
#fwrite(users_twt, file = "users1.csv")



In [7]:
# Import extracted user data from the CSV file into a data frame
# (use the raw file URL; the GitHub "blob" URL returns an HTML page)
users1 <- read.csv("https://raw.githubusercontent.com/datacamp/Brand-Analysis-using-Social-Media-Data-in-R-Live-Training/master/data/users1.csv")

# View the first few rows of the data frame
head(users1)

In [8]:
# Create a data frame of screen names and follower counts
user_df <- users1[,c("screen_name","followers_count")]

In [0]:
# Display and compare the follower counts for the 4 auto magazine accounts
user_df

Inference
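
One way to make the comparison explicit is to rank the accounts by `followers_count`. The sketch below uses made-up account names and follower numbers purely for illustration; `demo_users` is a hypothetical stand-in for `user_df`.

```r
# Hypothetical follower counts (illustrative values, not real data)
demo_users <- data.frame(
  screen_name = c("account_a", "account_b", "account_c", "account_d"),
  followers_count = c(1200L, 56000L, 830L, 19500L),
  stringsAsFactors = FALSE
)

# Order accounts from most to fewest followers
ranked <- demo_users[order(demo_users$followers_count, decreasing = TRUE), ]
rownames(ranked) <- NULL
ranked

# Screen name of the account with the largest following
ranked$screen_name[1]
```

The account at the top of the ranking has the widest reach, which makes it a natural first candidate when choosing where to promote a brand.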

### 2. Promote a brand by identifying popular tweets using retweet counts

In [0]:
# Extract 18000 tweets on Tesla
#tweets = search_tweets("tesla", n = 18000, lang = "en", include_rts = FALSE)
#saveRDS(tweets, "tesla.rds")

In [0]:
# Import extracted tweets in RDS format into a data frame
# (readRDS() needs a connection for remote files; use the raw file URL)
tesladf <- readRDS(gzcon(url("https://raw.githubusercontent.com/datacamp/Brand-Analysis-using-Social-Media-Data-in-R-Live-Training/master/data/tesla.rds")))
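
`readRDS()` treats a character argument as a local file path, so a remote file has to be read through a connection or downloaded first. The sketch below illustrates the connection pattern with a local temporary file standing in for a `url(...)` connection, so that it runs without network access; `demo_path` and its contents are hypothetical.

```r
# Write a small stand-in object to a temporary RDS file (hypothetical data)
demo_path <- tempfile(fileext = ".rds")
saveRDS(
  data.frame(text = c("a", "b"), retweet_count = c(3L, 1L),
             stringsAsFactors = FALSE),
  demo_path
)

# Open a binary connection and wrap it in gzcon() so the compressed
# RDS stream is decompressed while reading
con <- gzcon(file(demo_path, "rb"))
demo_df <- readRDS(con)
close(con)

dim(demo_df)
```

For a remote file, the same pattern should apply with `url("<raw file URL>", "rb")` in place of `file(demo_path, "rb")`.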

In [0]:
# Explore the tweet dataframe
dim(tesladf)
View(tesladf)

In [0]:
# Create a data frame of tweet text and retweet count
rtwt <- tesladf[,c("text", "retweet_count")]
head(rtwt)

In [0]:
# Import library
library(dplyr)

# Sort data frame based on descending order of retweet counts
rtwt_sort <- arrange(rtwt, desc(retweet_count))

In [0]:
# Exclude rows with duplicate text from sorted data frame
# (unique() on a data frame ignores `by`, so use dplyr's distinct() instead)
rtwt_unique <- distinct(rtwt_sort, text, .keep_all = TRUE)

In [0]:
# Print the 6 unique posts with the most retweets
rownames(rtwt_unique) <- NULL
head(rtwt_unique)

Inference
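
The sort-then-deduplicate steps above can be sketched on a toy data frame. The rows in `demo_rtwt` are invented for illustration, and base R's `duplicated()` is used here as one possible way to keep the first occurrence of each text.

```r
# Toy tweets with one repeated text (invented values)
demo_rtwt <- data.frame(
  text = c("Model 3 review", "New factory", "Model 3 review", "Charging tips"),
  retweet_count = c(40L, 95L, 12L, 7L),
  stringsAsFactors = FALSE
)

# Sort by retweet count, highest first
demo_sorted <- demo_rtwt[order(demo_rtwt$retweet_count, decreasing = TRUE), ]

# Keep only the first row for each distinct text
demo_unique <- demo_sorted[!duplicated(demo_sorted$text), ]
rownames(demo_unique) <- NULL
demo_unique
```

Because the rows are sorted before deduplication, the copy that survives for a repeated text is the one with the higher retweet count.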

### 3. Evaluate brand salience

#### a) Visualizing frequency of tweets using time series plots

In [0]:
# View the tweet dataframe
head(tesladf)