# Twitter Sentiment Classifier

We have provided some synthetic (fake, semi-randomly generated) Twitter data in a CSV file named project_twitter_data.csv, which has the following columns:
* text of a tweet
* the number of retweets of that tweet
* the number of replies to that tweet

We have also provided lists of words that express positive and negative sentiment, in the files positive_words.txt and negative_words.txt.
<br>
<br>

Your task is to build a sentiment classifier that detects how positive or negative each tweet is. The output will be a CSV file that contains the following columns:
* Number of Retweets
* Number of Replies
* Positive Score (which is how many happy words are in the tweet)
* Negative Score (which is how many angry words are in the tweet)
* Net Score for each tweet (Positive Score minus Negative Score)

At the end, upload the CSV file to Excel or Google Sheets and produce a graph of Net Score vs. Number of Retweets.
<br>
<br>
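
As a rough illustration of how the three scores relate, the sketch below counts words against two tiny made-up word sets and takes the net score as positive minus negative. The names `sample_positive`, `sample_negative`, and `sample_scores` are hypothetical, and the word sets are stand-ins, not the contents of positive_words.txt or negative_words.txt.

```python
# Hypothetical mini word sets; the real lists come from the provided files.
sample_positive = {"great", "love", "happy"}
sample_negative = {"awful", "hate", "angry"}

def sample_scores(tweet):
    """Return (positive, negative, net) word counts for a tweet string."""
    # Simplification: punctuation is only trimmed from the ends of each word here,
    # whereas the task asks for it to be removed from everywhere in the word.
    words = [w.strip(".,!?:;'\"#@").lower() for w in tweet.split()]
    positive = sum(1 for w in words if w in sample_positive)
    negative = sum(1 for w in words if w in sample_negative)
    return positive, negative, positive - negative

print(sample_scores("I love this great phone, but the battery is awful!"))
# prints (2, 1, 1)
```

Here "love" and "great" count as positive, "awful" counts as negative, so the net score is 1.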

To start, define a function called strip_punctuation that takes one parameter, a string representing a word, and returns that word with every punctuation character removed, wherever it appears in the word. (Hint: remember the .replace() method for strings.)
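
One possible shape for such a function, following the .replace() hint, is sketched below. The name `strip_punctuation_example` and the list `example_punctuation` are illustrative assumptions; the set of characters treated as punctuation is a choice, not something fixed by the task.

```python
# Assumed punctuation set for illustration only.
example_punctuation = ["'", '"', ",", ".", "!", ":", ";", "#", "@"]

def strip_punctuation_example(word):
    """Return word with every character in example_punctuation removed."""
    for char in example_punctuation:
        # str.replace removes all occurrences, not just leading or trailing ones
        word = word.replace(char, "")
    return word

print(strip_punctuation_example("#Amaz!ing..."))  # prints Amazing
```

Looping over the punctuation characters rather than over the word avoids modifying the string that is being iterated.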

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
# Read the Twitter data file with readlines() so we can iterate over every line.
# Open a new output file, write the header, then build one output row per input line.
# Split each line on commas into a list of columns so each column can be accessed by index.
# Build each new row as a formatted string and write it to the output file.


#--- Set Up Positive and Negative Word List
positive_file = "/content/drive/MyDrive/Interests/b. DS | ML | Programming/b. Progamming/Python/Projects/Twitter Sentiment Analysis - U.Michigan/positive_words.txt"
negative_file = "/content/drive/MyDrive/Interests/b. DS | ML | Programming/b. Progamming/Python/Projects/Twitter Sentiment Analysis - U.Michigan/negative_words.txt"

# Load the positive words, skipping ';' comment lines and blank lines
positive_words = []
with open(positive_file) as pos_file:
  for line in pos_file:
    if (line[0] != ';') and (line[0] != '\n'):
      positive_words.append(line.strip())

# Load the negative words the same way
negative_words = []
with open(negative_file) as neg_file:
  for line in neg_file:
    if (line[0] != ';') and (line[0] != '\n'):
      negative_words.append(line.strip())


#--- Define Functions
punctuation_chars = ["'", '"', ",", ".", "!", ":", ";", '#', '@']
def strip_punctuation(word):
  # Remove every occurrence of each punctuation character from the word
  for c in punctuation_chars:
    word = word.replace(c, '')
  return word

def pos_score(tweet_string):
  positive_count = 0
  tweet_words = tweet_string.split()
  for word in tweet_words:
    word = strip_punctuation(word).lower()
    if word in positive_words:
      positive_count += 1
  return positive_count

def neg_score(tweet_string):
  negative_count = 0
  tweet_words = tweet_string.split()
  for word in tweet_words:
    word = strip_punctuation(word).lower()
    if word in negative_words:
      negative_count += 1
  return negative_count

#--- Get Input File, Process Data, Create Output CSV
twitter_file = "/content/drive/MyDrive/Interests/b. DS | ML | Programming/b. Progamming/Python/Projects/Twitter Sentiment Analysis - U.Michigan/project_twitter_data.txt"
outputfile = '/content/drive/MyDrive/Interests/b. DS | ML | Programming/b. Progamming/Python/Projects/Twitter Sentiment Analysis - U.Michigan/twitter_outputfile.csv'

with open(twitter_file, 'r') as fh:
  source_rows = fh.readlines()

with open(outputfile, 'w') as outputcsv:
  outputcsv.write("Number of Retweets, Number of Replies, Positive Score, Negative Score, Net Score")
  outputcsv.write('\n')
  for source_row in source_rows[1:]:  # skip the header row
    # Naive comma split: assumes the tweet text itself contains no commas
    source_row_list = source_row.strip().split(',')
    tweet_string = source_row_list[0]
    # Compute each score once and reuse it for the net score
    positive = pos_score(tweet_string)
    negative = neg_score(tweet_string)
    output_row = '{},{},{},{},{}'.format(source_row_list[1], source_row_list[2],
                                         positive, negative, positive - negative)
    outputcsv.write(output_row)
    outputcsv.write('\n')