**PySDS Week 02 Day 04 v.1 - Friday Formative - Merging DataFrames**

# Exercise 1. Merging and reporting on data

Recall that we have a table called PySDS_PolCandidates.csv, which lists candidates with Twitter accounts. We now also have a database of tweets sent by British politicians and captured on the 5th and 6th of May, 2015. The expanded dataset also includes tweets sent as replies to these politicians, but those are not used here.
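A minimal sketch, assuming the `roottweets` table has the `username` and `text` columns used in this notebook: the candidate filter in Question 1.1 can also be pushed into the SQL query itself, so only candidates' tweets are ever loaded. The in-memory table, usernames and tweets below are made up for illustration.

```python
import sqlite3
import pandas as pd

# Toy stand-ins for the real files: an in-memory table shaped like `roottweets`
# (only the two columns used here) and a small candidate list.
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE roottweets (username TEXT, text TEXT)')
con.executemany('INSERT INTO roottweets VALUES (?, ?)', [
    ('alice_mp', 'Out canvassing today'),
    ('bob_mp', 'Polls open tomorrow'),
    ('not_a_candidate', 'Just a tweet'),
])
candidates = pd.DataFrame({'twitter_username': ['alice_mp', 'bob_mp', 'carol_mp']})

# One "?" placeholder per username, so the values are bound as parameters
# instead of being pasted into the SQL string.
usernames = candidates['twitter_username'].tolist()
placeholders = ', '.join('?' * len(usernames))
query = f'SELECT * FROM roottweets WHERE username IN ({placeholders})'
df_filtered = pd.read_sql(query, con, params=usernames)
con.close()

print(df_filtered)
```

A placeholder is generated per username because SQLite cannot bind a whole list to a single `?`. Very long lists can hit SQLite's limit on bound parameters (999 by default before version 3.32.0), in which case the pandas merge is the simpler route.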

In [None]:
# Question 1.1: There are accounts in the roottweets database that are 
# not in the PolCandidates list and vice versa. 
# Filter the roottweets table / dataframe down to only the candidates 
# in the PolCandidates table. Then enter the values in the sentence below. 


######################################
# Answer Below Here 
import sqlite3
import pandas as pd

# Open the database once and reuse that connection for the query
con = sqlite3.connect('PySDS_ElectionData_2015_may5-6.db')
df_twt = pd.read_sql('SELECT * FROM roottweets', con)
con.close()
# display(df_twt.head(3))

df_pol = pd.read_csv("PySDS_PolCandidates.csv")
# display(df_pol.head(3))

# Counts before filtering (tweets are counted as distinct tweet texts)
before_tweets = df_twt.text.nunique()
before_accounts = df_twt.username.nunique()

# An inner merge keeps only tweets whose username matches a candidate's twitter_username
df_polmerged = df_pol.merge(df_twt, left_on='twitter_username', right_on='username', how='inner')
# display(df_polmerged.head(3))

after_tweets = df_polmerged.text.nunique()
after_accounts = df_polmerged.username.nunique()

print("Before filtering there were %s Tweets and %s accounts." % (before_tweets, before_accounts),
      "\nAfter filtering there were %s Tweets and %s accounts." % (after_tweets, after_accounts))
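Question 1.1 notes that some accounts appear in only one of the two tables. A sketch of one way to see which ones, using an outer merge with `indicator=True` on hypothetical toy tables that share the notebook's key columns (`twitter_username`, `username`):

```python
import pandas as pd

# Hypothetical toy tables with the same key columns as the notebook
df_pol = pd.DataFrame({'twitter_username': ['alice_mp', 'bob_mp', 'carol_mp'],
                       'party': ['Labour Party', 'Conservative Party', 'Labour Party']})
df_twt = pd.DataFrame({'username': ['alice_mp', 'alice_mp', 'bob_mp', 'dave'],
                       'text': ['t1', 't2', 't3', 't4']})

# how='outer' keeps every row from both sides; indicator=True adds a '_merge'
# column whose value is 'left_only', 'right_only' or 'both' for each row.
check = df_pol.merge(df_twt, left_on='twitter_username', right_on='username',
                     how='outer', indicator=True, validate='one_to_many')

candidates_without_tweets = check.loc[check['_merge'] == 'left_only', 'twitter_username'].unique()
accounts_not_candidates = check.loc[check['_merge'] == 'right_only', 'username'].unique()
inner_rows = check.loc[check['_merge'] == 'both']  # the same rows how='inner' would keep

print(candidates_without_tweets, accounts_not_candidates, len(inner_rows))
```

`validate='one_to_many'` makes the merge raise a `MergeError` if a username is listed twice in the candidate table, which would otherwise silently duplicate that candidate's tweets in the inner merge.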

In [None]:
#####################################
# Question 1.1
# TA comments below here 

# ___ / 5. 
# Comments:
'''

'''


In [None]:
# Question 1.2: Using the newly filtered table, merge in the candidates' political 
# party from PolCandidates. Use this to enter values in the sentence below. 

######################################
# Answer Below Here 

# Subset the merged table by party (df_con is reused in Exercise 2)
df_con = df_polmerged.loc[df_polmerged['party'] == 'Conservative Party']
df_lab = df_polmerged.loc[df_polmerged['party'] == 'Labour Party']

def party_summary(df_party):
    """Return (number of candidates, distinct root tweets, top tweeter, top tweeter's tweet count)."""
    tweets_per_candidate = df_party.groupby('name')['text'].count()
    return (df_party['name'].nunique(), df_party['text'].nunique(),
            tweets_per_candidate.idxmax(), tweets_per_candidate.max())

for label, df_party in [('Conservative', df_con), ('Labour', df_lab)]:
    n_candidates, n_tweets, top_tweeter, top_count = party_summary(df_party)
    print("The %s candidates from the %s party sent %s root tweets. The top tweeter was %s with %s tweets."
          % (n_candidates, label, n_tweets, top_tweeter, top_count))
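The per-party numbers can also be produced for every party at once with `groupby`. A sketch on a made-up miniature of `df_polmerged` (only the `party`, `name` and `text` columns are assumed):

```python
import pandas as pd

# Hypothetical toy version of df_polmerged: one row per candidate tweet
df_polmerged = pd.DataFrame({
    'party': ['Labour Party'] * 3 + ['Conservative Party'] * 2,
    'name':  ['Ann', 'Ann', 'Ben', 'Cal', 'Cal'],
    'text':  ['t1', 't2', 't3', 't4', 't5'],
})

# Tweets per (party, candidate), then the busiest candidate within each party
per_candidate = df_polmerged.groupby(['party', 'name']).size()
top = per_candidate.groupby(level='party').idxmax()  # values are (party, name) tuples

# All pieces are indexed by party, so the DataFrame constructor lines them up
summary = pd.DataFrame({
    'candidates': df_polmerged.groupby('party')['name'].nunique(),
    'root_tweets': df_polmerged.groupby('party')['text'].nunique(),
    'top_tweeter': top.map(lambda key: key[1]),
    'top_tweet_count': per_candidate.groupby(level='party').max(),
})
print(summary)
```

As with `idxmax` in the answer above, a tie for top tweeter goes to the first candidate in the group's sorted order.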


In [None]:
#####################################
# Question 1.2
# TA comments below here 

# ___ / 5. 
# Comments:
'''

'''

# Exercise 2. An acrostic of tweets. 
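Whether a codephrase can be turned into an acrostic is a counting question: each letter needs its own word starting with that letter, so a letter used twice needs two different words. A sketch of that check, assuming the candidate words are already lower-cased and the codephrase is already reduced to the letters a-z (`can_make_acrostic` is a hypothetical helper, not part of the exercise):

```python
from collections import Counter

def can_make_acrostic(words, codephrase):
    """True if every letter of codephrase can be given its own distinct word
    starting with that letter (a repeated letter needs that many distinct words)."""
    starts = Counter(word[0] for word in set(words) if word)  # distinct words per first letter
    needed = Counter(codephrase)                              # how often each letter is needed
    return all(starts[letter] >= count for letter, count in needed.items())

words = ['vote', 'today', 'our', 'team', 'every', 'door']
print(can_make_acrostic(words, 'voted'))  # True
print(can_make_acrostic(words, 'otto'))   # False: needs two distinct 'o' words
```

A greedy first-match search gives the same yes/no answer, since a word can only ever serve its own first letter.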

In [None]:
#################################################################
#
# Perhaps
# You'd
# Take
# Hacking
# Over
# Nothing?
#
# See https://en.wikipedia.org/wiki/Acrostic
#
# Fun Fact! Lewis Carroll's Through the Looking Glass contained a 
# poem with an acrostic of the full name of the real-life Alice. 
# 
#################################################################

# This exercise consists of two parts. In the first, you have to
# print out an acrostic. You select a codephrase, and then the words that 
# are printed on each line should come from the tweets database. They do not 
# have to come from the filtered table unless you want the party affiliation.
# 
# The horizontal words for the acrostic should be the first word of the 
# tweet. The tweets should also be filtered somehow, such as 'tweets from the
# Liberal Democrat party', 'tweets with a url', or 'tweets that have an 
# @mention in them'.
#
# The second part is that you have to then provide a user input prompt
# so that a user can see if they can make an acrostic with the same 
# set of tweets. If they can (i.e. every letter of the codephrase can be 
# matched to the first letter of a different word from the set of tweets), 
# print out the acrostic. Otherwise, let the user know that the program 
# cannot find an acrostic with that phrase, and ask them to try another 
# phrase or type "exit()" to exit. 
#
'''
Using tweets that <user defined> I made an acrostic: 

Tweets 
Rarely 
Accommodate
Politicians

Using the same set of tweets, now you try to make one: 
[                            ]
'''


# Notes: 
# - Each line in the acrostic should be a unique word, even if the codephrase 
#       has two of the same letter.  
# - Your acrostic codephrase has to be longer than 5 characters. 
# - Don't worry about representing lower/upper case, spaces, or punctuation in 
#       your acrostic, but assume that users will try to type them into 
#       the input box.
# - If the user's attempted acrostic codephrase doesn't work,
#       the program should let the user try again. 
# - The codephrase should make sense, but I fully expect the word list
#       from tweets not to make a lot of sense. 
# - If you find that the first word doesn't cut it, you can take the first 
#       word that is not "rt", an @mention, or a #hashtag.
#
# hint: df['first_word'] = df["text"].map(lambda x: cleanWord(x))

#
#
# Rubric
# 5 pts.  Functionality: Does your code work as directed? (To test, 
#             we would enter your codephrase as input.)
# 5 pts.  Robustness: Will user input break the code? How does it handle junk characters?

# 5 pts.  Code factoring: e.g., how well did you use functions/data structures 
#             to help manage your queries?
# 5 pts.  Complexity of the filter on the tweets: a relative / subjective 
#             assessment based on how you decided to filter and select tweets.

######################################
# Answer Below Here 

# I will be using the tweets of the Conservative Party to make an acrostic.
import re

# Making a deep copy so I don't keep getting SettingWithCopyWarning messages.
df_con2 = df_con.copy(deep=True)

# Now we split each tweet on whitespace and take the first word ('' for an empty tweet).
df_con2['first_word'] = df_con2['text'].map(lambda tweet: (tweet.split() or [''])[0])

# Cleaning out weblinks and the retweet marker "RT" (only when it is the whole word).
# The results are assigned back to the column instead of using replace(..., inplace=True)
# on a column, which pandas flags as chained assignment and which stops updating the
# DataFrame under copy-on-write.
# I didn't remove '@' and other symbols: words starting with them are simply never chosen,
# because the user's codephrase is reduced to letters only.
df_con2['first_word'] = (df_con2['first_word']
                         .str.replace(r'https?://\S*', '', regex=True)
                         .str.replace(r'(?i)^rt$', '', regex=True))

# Now I'm making my words into a list cause it's intuitively easier for me to work with.
# Lower-casing, dropping empty strings and de-duplicating (dict.fromkeys keeps the
# first-seen order) means no word can appear twice in one acrostic.
acrowords = [x.lower() for x in df_con2['first_word'].tolist()]
acrowords = list(dict.fromkeys(filter(None, acrowords)))

def clean_codephrase(text):
    """Lower-case the user's input and keep only the letters a-z (drops spaces, digits, punctuation)."""
    return re.sub(r'[^a-z]', '', text.lower())

# My acrostic maker defined as a function.
# A note to myself that I used enumerate to help index the word so I was able to delete it
# so it didn't repeat the word. It's not there for bants.
def acrosticMaker(words, acroinput):
    """Return one unique word per letter of acroinput, or None if some letter has no unused word.

    Works on a copy of `words`, so the shared word list is not used up between calls.
    """
    available = list(words)
    acrostic = []
    for letter in acroinput:
        for index, word in enumerate(available):
            if word[0] == letter:
                acrostic.append(word)
                del available[index]
                break
        else:
            return None  # no remaining word starts with this letter
    return acrostic

# Testing 1, 2, 3.
print('A test acrostic:', acrosticMaker(acrowords, 'happytimes'))

my_acrostic = acrosticMaker(acrowords, 'voted')
print('\nUsing tweets from the Conservative Party I made an acrostic:\n')
print(*(my_acrostic or ['(no acrostic found)']), sep='\n')

# Now we let the audience try.
print('\nUsing the same set of tweets, now you try to make one (or type "exit()" to exit):\n')
while True:
    try:
        raw = input()
    except EOFError:  # input stream closed, e.g. when run non-interactively
        break
    if raw.strip().lower() == 'exit()':
        break
    codephrase = clean_codephrase(raw)
    if len(codephrase) <= 5:
        print("Your codephrase has to be longer than 5 letters, o' wise one. Please try again.\n")
        continue
    your_acrostic = acrosticMaker(acrowords, codephrase)
    if your_acrostic is None:
        print('Apologies, but we cannot generate an acrostic from the phrase you entered.\n'
              'Please try another phrase, or type "exit()" to exit.\n')
        continue
    print(*your_acrostic, sep='\n')
    break
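The notes suggest taking the first word that is not "rt", an @mention or a #hashtag when the plain first word doesn't cut it. A sketch of such a helper: the hint calls it `cleanWord`, and `clean_word` below is a hypothetical version that, as an extra assumption, also skips URLs and strips punctuation.

```python
import re

# Tokens to skip: the retweet marker, @mentions, #hashtags and URLs
SKIP = re.compile(r'^(rt|@\w+|#\w+|https?://\S+)$', re.IGNORECASE)

def clean_word(tweet):
    """Return the first word of a tweet that is not 'RT', an @mention, a #hashtag
    or a URL, lower-cased with non-letters removed; '' if there is none."""
    for token in tweet.split():
        token = token.rstrip(':,.!?')  # e.g. "@user:" straight after "RT"
        if SKIP.match(token):
            continue
        word = re.sub(r'[^a-z]', '', token.lower())
        if word:
            return word
    return ''

print(clean_word('RT @jo_bloggs: Polling stations open at 7am!'))  # polling
print(clean_word('#GE2015 https://t.co/abc Vote today'))            # vote
```

It could then be applied as in the hint, e.g. `df['first_word'] = df['text'].map(clean_word)`.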

In [None]:
#####################################
# TA comments below here 

# Functionality: 
# ___ / 5. 
# Comments 
'''

'''

# Robustness: 
# ___ / 5. 
# Comments 
'''

'''

# Code Factoring: 
# ___ / 5. 
# Comments 
'''

'''

# Filter Complexity: 
# ___ / 5. 
# Comments 
'''

'''