# Prospectus

For this final assignment, I will focus on the Canadian Tire data from the months of May, June, and December. In the data provided, these three months had the most tweets mentioning Canadian Tire across both years combined: June with 2940 mentions, May with 2465, and December with 2417. Analyzing the most popular months of both years should give us enough data to determine whether the change in the Canadian Tire slogan has shifted tweets to be more positive.
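
As a rough sketch of how per-month totals like these can be counted, the example below pools tweets by calendar month regardless of year. The `sample` DataFrame and the `month_of_year` column are made up for illustration; only the `created_at` format matches the real file.

```python
import pandas as pd

# Toy stand-in for the real file; only the 'created_at' format matters here.
sample = pd.DataFrame({
    "created_at": [
        "2013-05-02 10:00:00",
        "2013-06-11 12:30:00",
        "2014-06-20 08:15:00",
        "2014-06-21 09:45:00",
        "2014-12-24 18:00:00",
    ]
})

# In 'YYYY-MM-DD hh:mm:ss', slice(5, 7) picks out the two-digit month.
sample["month_of_year"] = sample["created_at"].str.slice(5, 7)

# Count tweets per calendar month, pooling 2013 and 2014 together.
mentions_by_month = sample["month_of_year"].value_counts()
print(mentions_by_month)
```

`value_counts` sorts from most to least frequent, so the busiest months appear at the top.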
  
One problem with the data, however, is that a number of columns are empty for long stretches of rows, which makes the information hard to read. Several columns also have little relevance to the analysis and will most likely be dropped later on. For example, the 'user_id' column contains the user's ID number. Although this can help identify a user, the data also provides their username and screen name, which makes the user_id column redundant.
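
A minimal sketch of how the empty and redundant columns could be handled, assuming a small made-up `sample` DataFrame that borrows a few column names from the real file. The names `missing_share` and `trimmed` are only for illustration.

```python
import pandas as pd

# Toy data: 'rt_text' is mostly empty, much like the retweet columns
# tend to be for tweets that are not retweets.
sample = pd.DataFrame({
    "user_id": [101, 102, 103],
    "user_screenname": ["a", "b", "c"],
    "text": ["x", "y", "z"],
    "rt_text": [None, None, "z"],
})

# Fraction of missing values in each column (0.0 = full, 1.0 = empty).
missing_share = sample.isna().mean()
print(missing_share)

# Drop a column that is not needed; the original DataFrame is left untouched.
trimmed = sample.drop(columns=["user_id"])
print(trimmed.columns.tolist())
```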

In [1]:
# importing things
import pandas as pd
import numpy as np

# this creates a DataFrame filled with all of the information from the CSV file
tweets = pd.read_csv("data/canadian-tire_cct490.csv", index_col=False)

In [2]:
tweets.columns
# here, we're just looking at what kind of columns are in the file itself.
# there's quite a lot, and most of these are things we probably won't need.

Index(['id', 'created_at', 'text', 'user_id', 'user_name', 'user_screenname',
       'user_location', 'rt_id', 'rt_created_at', 'rt_text', 'rt_user_id',
       'rt_user_name', 'rt_user_screenname', 'rt_user_location', 'positive',
       'negative', 'rt_positive', 'rt_negative'],
      dtype='object')

In [3]:
tweets['created_at'].head(2)
# this shows us the type of text that is shown in the 'created_at' column.
# since we're going to be focusing on the date, we're going to have to
# figure out a way to separate the time and date from each other.

0    2013-01-09 05:38:29
1    2013-01-07 01:30:15
Name: created_at, dtype: object

In [4]:
# here, we are taking the 'created_at' column and splitting the date from the time;
# the first piece of the split is the date, which goes into a column called 'date'
tweets['date'] = tweets['created_at'].str.split(expand=True)[0]
# this creates a column that holds just the year and month (e.g. '2013-05')
tweets['month'] = tweets['date'].str.slice(0, 7)
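
An alternative to string splitting, sketched here on a two-row made-up `sample`, is to parse the column as real timestamps with `pd.to_datetime` and format the pieces from there. This is not what the notebook does; it is only another way to get the same 'date' and 'month' strings.

```python
import pandas as pd

# Two rows in the same format as the real 'created_at' column.
sample = pd.DataFrame(
    {"created_at": ["2013-01-09 05:38:29", "2013-01-07 01:30:15"]}
)

# Parse the text into timestamps once, then format the parts we need.
parsed = pd.to_datetime(sample["created_at"])
sample["date"] = parsed.dt.strftime("%Y-%m-%d")   # e.g. '2013-01-09'
sample["month"] = parsed.dt.strftime("%Y-%m")     # e.g. '2013-01'
print(sample)
```

Parsing first also catches malformed timestamps, which plain string slicing would silently pass through.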

In [5]:
may_2013 = tweets[tweets['month'] == '2013-05']
may_2013.shape # this shows how many tweets were made in May 2013

(1380, 20)

In [6]:
may2013 = may_2013[['month','text', 'positive', 'negative']]
may13_positive = may2013[may2013['positive'] > 0]

In [7]:
may13_positive.shape 
# this shows us just how many tweets, out of 1380, were positive

(471, 4)

In [8]:
may13_negative = may2013[may2013['negative'] > 0]
may13_negative.shape
# this shows us just how many tweets, out of the 1380, were negative

(173, 4)

# Midterm Report

One of the biggest problems I had was figuring out how to split the date from the time. Thankfully, Professor Hanna helped me out there, showing me how to split the date from the time and then explaining how to keep just the year and month.

Once I knew how to do those two things, I was able to select the specific months I wanted to look at and count how many positive and negative tweets were made in each one, just as I did above for May 2013.
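
The same month-by-month counting can be sketched in one pass with `groupby`, rather than one set of cells per month. The `sample` data and the names `months_of_interest`, `flags`, and `summary` are made up for illustration; the `> 0` test is the same one used in the cells above.

```python
import pandas as pd

# Toy data with the three columns the counting relies on.
sample = pd.DataFrame({
    "month": ["2013-05", "2013-05", "2013-06", "2014-05", "2014-05", "2014-05"],
    "positive": [1, 0, 2, 0, 1, 0],
    "negative": [0, 1, 0, 0, 0, 3],
})

# Keep only the months being studied.
months_of_interest = ["2013-05", "2014-05"]
subset = sample[sample["month"].isin(months_of_interest)]

# Mark each tweet as positive and/or negative using the same '> 0' rule.
flags = subset.assign(
    is_positive=subset["positive"] > 0,
    is_negative=subset["negative"] > 0,
)

# Summing True/False values counts the tweets that pass each test.
summary = flags.groupby("month")[["is_positive", "is_negative"]].sum()
summary["total"] = flags.groupby("month").size()
print(summary)
```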

All that is left to do now is to go through the rest of the months (May 2014, June 2013 & 2014, December 2013 & 2014). I'll also look at whether anyone in 2014 mentioned the new slogan, and whether it received positive or negative feedback.
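
One possible way to look for slogan mentions is a case-insensitive text search, sketched below. The phrase stored in `slogan` is a hypothetical placeholder, not the actual slogan, and the `sample` tweets are invented.

```python
import pandas as pd

# Invented tweets; one row has no text, as can happen in real data.
sample = pd.DataFrame({
    "month": ["2014-05", "2014-06", "2014-12", "2013-06"],
    "text": [
        "Loving the new EXAMPLE SLOGAN ads from Canadian Tire",
        "Picked up a tent at Canadian Tire today",
        "That example slogan commercial is so annoying",
        None,
    ],
    "positive": [2, 0, 0, 0],
    "negative": [0, 0, 1, 0],
})

slogan = "example slogan"  # hypothetical placeholder for the real slogan

# Restrict to 2014, then search the text for the phrase.
# regex=False treats the phrase literally; na=False skips missing text.
in_2014 = sample["month"].str.startswith("2014")
mentions_slogan = sample["text"].str.contains(
    slogan, case=False, regex=False, na=False
)
slogan_tweets = sample[in_2014 & mentions_slogan]

# Count how many of those tweets were scored positive or negative.
slogan_positive = (slogan_tweets["positive"] > 0).sum()
slogan_negative = (slogan_tweets["negative"] > 0).sum()
print(len(slogan_tweets), slogan_positive, slogan_negative)
```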

In [10]:
june_2013 = tweets[tweets['month'] == '2013-06']
june_2013.shape # this shows how many tweets were made in June 2013

(2087, 20)

In [11]:
june2013 = june_2013[['month','text', 'positive', 'negative']]
june13_positive = june2013[june2013['positive'] > 0]

In [15]:
june13_positive.shape 
# this shows us just how many tweets, out of 2087, were positive

(805, 4)

In [23]:
june13_negative = june2013[june2013['negative'] > 0]
june13_negative.shape # this will show how many tweets were negative

(255, 4)

As we can see here, out of the 2087 tweets made about Canadian Tire in June 2013, 805 were positive towards the company, while 255 were negative. The remaining 1027 tweets (assuming no tweet was scored as both positive and negative) are neither positive nor negative towards the company. This means that nearly half of the tweets are most likely just mentioning Canadian Tire, rather than actually commenting on the company itself.
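
A quick check of this arithmetic, using the June 2013 counts reported above. The subtraction only gives the true "neither" count if no tweet is scored as both positive and negative, so the second half sketches, on an invented four-row `sample`, how that could be counted directly on the real data.

```python
import pandas as pd

# Counts reported above for June 2013.
total, positive, negative = 2087, 805, 255

# Tweets left over once positive and negative ones are removed.
neither = total - positive - negative
print(neither, round(neither / total, 3))

# Counting directly avoids the no-overlap assumption.
sample = pd.DataFrame({"positive": [1, 0, 0, 2], "negative": [0, 1, 0, 1]})
both = ((sample["positive"] > 0) & (sample["negative"] > 0)).sum()
neither_direct = ((sample["positive"] == 0) & (sample["negative"] == 0)).sum()
print(both, neither_direct)
```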

In [13]:
december_2013 = tweets[tweets['month'] == '2013-12']
december_2013.shape # this shows how many tweets were made in December 2013

(1272, 20)

In [24]:
december2013 = december_2013[['month','text', 'positive', 'negative']]
december13_positive = december2013[december2013['positive'] > 0]
december13_positive.shape 
# this shows us just how many tweets, out of 1272, were positive

(367, 4)

In [26]:
december13_negative = december2013[december2013['negative'] > 0]
december13_negative.shape # this will show how many tweets were negative

(206, 4)

In [17]:
may_2014 = tweets[tweets['month'] == '2014-05']
may_2014.shape # this shows how many tweets were made in May 2014

(1085, 20)

In [18]:
may2014 = may_2014[['month','text', 'positive', 'negative']]
may14_positive = may2014[may2014['positive'] > 0]
may14_positive.shape 
# this shows us just how many tweets, out of 1085, were positive

(404, 4)

In [27]:
may14_negative = may2014[may2014['negative'] > 0]
may14_negative.shape # this will show how many tweets were negative

(149, 4)

In [19]:
june_2014 = tweets[tweets['month'] == '2014-06']
june_2014.shape # this shows how many tweets were made in June 2014

(853, 20)

In [20]:
june2014 = june_2014[['month','text', 'positive', 'negative']]
june14_positive = june2014[june2014['positive'] > 0]
june14_positive.shape 
# this shows us just how many tweets, out of 853, were positive

(307, 4)

In [28]:
june14_negative = june2014[june2014['negative'] > 0]
june14_negative.shape # this will show how many tweets were negative in June 2014

In [21]:
december_2014 = tweets[tweets['month'] == '2014-12']
december_2014.shape # this shows how many tweets were made in December 2014

(1145, 20)

In [22]:
december2014 = december_2014[['month','text', 'positive', 'negative']]
december14_positive = december2014[december2014['positive'] > 0]
december14_positive.shape 
# this shows us just how many tweets, out of 1145, were positive

(425, 4)

In [29]:
december14_negative = december2014[december2014['negative'] > 0]
december14_negative.shape # this will show how many tweets were negative

(138, 4)
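
To line the months up side by side, the totals and positive counts reported in the cells above can be put into one small table. This is only a sketch of a raw share comparison, not a statistical test, and the names `counts` and `by_year` are made up for illustration.

```python
import pandas as pd

# Total and positive tweet counts as reported in the cells above.
counts = pd.DataFrame(
    {
        "total": [1380, 2087, 1272, 1085, 853, 1145],
        "positive": [471, 805, 367, 404, 307, 425],
    },
    index=pd.MultiIndex.from_tuples(
        [
            ("May", 2013), ("June", 2013), ("December", 2013),
            ("May", 2014), ("June", 2014), ("December", 2014),
        ],
        names=["month", "year"],
    ),
)

# Share of each month's tweets that were scored positive.
counts["positive_share"] = counts["positive"] / counts["total"]

# One row per month, one column per year, plus the year-over-year change.
by_year = counts["positive_share"].unstack("year")
by_year["change"] = by_year[2014] - by_year[2013]
print(by_year.round(3))
```

Comparing shares rather than raw counts matters here because the monthly totals differ quite a bit between 2013 and 2014.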