**PySDS Week 02 Day 03 v.1 - Exercise - Dates and more DataFrames**

Today we will continue to use the PySDS_PolCandidates.csv table and answer some more involved questions as DataFrame practice. 

First, I would like you to begin with a few practice exercises on parsing datetimes. Then, using only filtering, grouping and other DataFrame features, you should be able to answer the questions below. 

In [1]:
# Date Parsing exercises: 
from datetime import datetime, timezone 
import calendar, os
import pandas as pd

now = datetime.now(timezone.utc)
time_1 =  "June 20, 1985 12:35pm"
time_2 =  "10/10/10 10:10:10 +1000" # hint: the +1000 means UTC +10 hours 
time_3 = "534567890" # UTC time; hint: datetime.fromtimestamp(xx, tz=timezone.utc)

# Question 1. Using now() (which I realise will be a slightly
# different time for everyone), report the time elapsed between
# each of times 1, 2 and 3 and now().

# Question 2. For each of the times above, what day of the week was it? 


# Answer

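# %p (am/pm) only takes effect with the 12-hour %I, not %H; time 1 has no zone, so we assume UTC.
realtime_1 = datetime.strptime(time_1, "%B %d, %Y %I:%M%p").replace(tzinfo=timezone.utc)
# Spell out the fields instead of relying on the locale-dependent %x and %X.
realtime_2 = datetime.strptime(time_2, "%m/%d/%y %H:%M:%S %z")
# fromtimestamp(..., tz=timezone.utc) replaces utcfromtimestamp(), deprecated since Python 3.12.
realtime_3 = datetime.fromtimestamp(int(time_3), tz=timezone.utc)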

print("The time elapsed between now and {} is {}".format("realtime_1", now - realtime_1))
print("The time elapsed between now and {} is {}".format("realtime_2", now - realtime_2))
print("The time elapsed between now and {} is {}".format("realtime_3", now - realtime_3))

print("\nTime 1 was on {}.".format(calendar.day_name[realtime_1.weekday()]))
print("Time 2 was on {}.".format(calendar.day_name[realtime_2.weekday()]))
print("Time 3 was on {}.".format(calendar.day_name[realtime_3.weekday()]))


# Reviewer comments 




The time elapsed between now and realtime_1 is 12173 days, 18:18:44.931903
The time elapsed between now and realtime_2 is 2931 days, 6:43:34.931903
The time elapsed between now and realtime_3 is 11636 days, 3:48:54.931903

Time 1 was on Thursday.
Time 2 was on Sunday.
Time 3 was on Wednesday.
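A minimal sketch of the same parsing with every format string spelled out, assuming (as the answer does) that time 1 is in UTC and that "10/10/10" is month/day/year. `%I` is the 12-hour clock that `%p` works with; with `%H` the am/pm marker is ignored, so "12:35pm" only parses correctly because the hour happens to be 12. `strftime('%A')` returns the weekday name directly, as an alternative to `calendar.day_name`.

```python
from datetime import datetime, timezone

time_1 = "June 20, 1985 12:35pm"
time_2 = "10/10/10 10:10:10 +1000"
time_3 = "534567890"

# %I is the 12-hour clock; %p (am/pm) only has an effect when paired with %I.
# Assumption: time_1 carries no zone, so we treat it as UTC.
t1 = datetime.strptime(time_1, "%B %d, %Y %I:%M%p").replace(tzinfo=timezone.utc)
# Explicit month/day/year fields instead of %x / %X; %z parses "+1000".
t2 = datetime.strptime(time_2, "%m/%d/%y %H:%M:%S %z")
# Timezone-aware replacement for the deprecated utcfromtimestamp().
t3 = datetime.fromtimestamp(int(time_3), tz=timezone.utc)

now = datetime.now(timezone.utc)
for label, t in [("Time 1", t1), ("Time 2", t2), ("Time 3", t3)]:
    elapsed = now - t  # all three datetimes are aware, so subtraction is valid
    print(f"{label}: {t.isoformat()} was a {t.strftime('%A')}; {elapsed.days} days ago")
```

Because `now()` changes on every run, the elapsed days printed will differ from the output above.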


In [2]:
# Extended exercise part 1. 

# Using the data "PySDS_PolCandidates.csv" fill in the DataFrame below 
# with data. Also, try to ensure that it is formatted nicely. 
import pandas as pd 

media_combo_df = pd.DataFrame(columns=["Labour","Conservative","All Parties"],
                             index=["None",
                                   "Only Twitter",
                                   "Only Facebook",
                                   "Only Webpage",
                                   "Twitter and Facebook",
                                   "Facebook and Webpage",
                                   "Twitter and Webpage",
                                   "Twitter, Facebook and Webpage"
                                   ])
display(media_combo_df)

# Each cell should be a count of candidates. For example, the
# [None, Labour] cell would be the number of Labour candidates
# who had none of Twitter, Facebook or a webpage. 

# Here are some hints: if you ensure that the empty cells in
# PySDS_PolCandidates.csv are read as null, you can then use boolean logic to 
# select your rows. For example, 
# x = df['have_twitter'].notnull()
# y = df['have_facebook'].notnull() 
# then 
# have_both = df[x & y] 
# will get you the rows of the people who have both, and 
# have_both['party'].value_counts() 
# will get you the count, by party, of the people 
# who have both Twitter and Facebook. 

# First do it for the Total then slice it for Labour and Conservative separately.


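| | Labour | Conservative | All Parties |
|---|---|---|---|
| None | NaN | NaN | NaN |
| Only Twitter | NaN | NaN | NaN |
| Only Facebook | NaN | NaN | NaN |
| Only Webpage | NaN | NaN | NaN |
| Twitter and Facebook | NaN | NaN | NaN |
| Facebook and Webpage | NaN | NaN | NaN |
| Twitter and Webpage | NaN | NaN | NaN |
| Twitter, Facebook and Webpage | NaN | NaN | NaN |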

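The masks from the hint can be tried on a few rows first. This sketch borrows the column names used in the answer cell (`twitter_username`, `facebook_page_url`, `party`), but the data is invented. `pd.read_csv` reads empty cells as NaN by default, so `notnull()` should work on the real file without extra cleaning.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for PySDS_PolCandidates.csv.
df = pd.DataFrame({
    "party": ["Labour Party", "Labour Party", "Conservative Party", "Green Party"],
    "twitter_username": ["@a", "@b", np.nan, "@d"],
    "facebook_page_url": ["fb.com/a", np.nan, np.nan, "fb.com/d"],
})

x = df["twitter_username"].notnull()   # True where a Twitter handle exists
y = df["facebook_page_url"].notnull()  # True where a Facebook page exists

# Combine element-wise with & (and), | (or) and ~ (not). Python's `and`
# does not work on Series, and `df[x, y]` is not boolean indexing.
have_both = df[x & y]
print(have_both["party"].value_counts())

twitter_only = df[x & ~y]
print(twitter_only["party"].value_counts())
```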

In [3]:
df_pol = pd.read_csv("PySDS_PolCandidates.csv", index_col = 0)

# Filtering for individual social accounts. This does not yet account for those who have one and not the other
twt = df_pol["twitter_username"].notnull()
fb = df_pol["facebook_page_url"].notnull()
wp = df_pol["party_ppc_page_url"].notnull()

# Now we use the & combinations to get the specific numbers for the table
# Only Twitter
twtonly = df_pol[twt&~fb&~wp]
twtonlycounts = twtonly['party'].value_counts()

# Only Facebook
fbonly = df_pol[~twt&fb&~wp]
fbonlycounts = fbonly['party'].value_counts()

# Only Webpage
wponly = df_pol[~twt&~fb&wp]
wponlycounts = wponly['party'].value_counts()

# Only Twitter & Facebook
twtfb = df_pol[twt&fb&~wp]
twtfbcounts = twtfb['party'].value_counts()

# Only Twitter & Webpage
twtwp = df_pol[twt&~fb&wp]
twtwpcounts = twtwp['party'].value_counts()

# Only Facebook & Webpage
fbwp = df_pol[fb&wp&~twt]
fbwpcounts = fbwp['party'].value_counts()

# Have all of the above
allmedia = df_pol[twt&fb&wp]
allmediacounts = allmedia['party'].value_counts()

# Have none of the above
nomedia = df_pol[~twt&~fb&~wp]
nomediacounts = nomedia['party'].value_counts()

# Now we add the sums for All Parties to the table
media_combo_df.loc["Only Twitter", "All Parties"] = twtonlycounts.sum()
media_combo_df.loc["Only Facebook", "All Parties"] = fbonlycounts.sum()
media_combo_df.loc["Only Webpage", "All Parties"] = wponlycounts.sum()
media_combo_df.loc["Twitter and Facebook", "All Parties"] = twtfbcounts.sum()
media_combo_df.loc["Facebook and Webpage", "All Parties"] = fbwpcounts.sum()
media_combo_df.loc["Twitter and Webpage", "All Parties"] = twtwpcounts.sum()
media_combo_df.loc["Twitter, Facebook and Webpage", "All Parties"] = allmediacounts.sum()
media_combo_df.loc["None", "All Parties"] = nomediacounts.sum()

# Now we add the Labour figures. Series.get() returns 0 instead of raising
# a KeyError when a party does not appear in a given combination.
media_combo_df.loc["Only Twitter", "Labour"] = twtonlycounts.get('Labour Party', 0)
media_combo_df.loc["Only Facebook", "Labour"] = fbonlycounts.get('Labour Party', 0)
media_combo_df.loc["Only Webpage", "Labour"] = wponlycounts.get('Labour Party', 0)
media_combo_df.loc["Twitter and Facebook", "Labour"] = twtfbcounts.get('Labour Party', 0)
media_combo_df.loc["Facebook and Webpage", "Labour"] = fbwpcounts.get('Labour Party', 0)
media_combo_df.loc["Twitter and Webpage", "Labour"] = twtwpcounts.get('Labour Party', 0)
media_combo_df.loc["Twitter, Facebook and Webpage", "Labour"] = allmediacounts.get('Labour Party', 0)
media_combo_df.loc["None", "Labour"] = nomediacounts.get('Labour Party', 0)

# And finally the Conservative figures. No Conservative candidate falls in the
# "Only Facebook" or "None" groups, so .get() supplies 0 for those two cells.
media_combo_df.loc["Only Twitter", "Conservative"] = twtonlycounts.get('Conservative Party', 0)
media_combo_df.loc["Only Facebook", "Conservative"] = fbonlycounts.get('Conservative Party', 0)
media_combo_df.loc["Only Webpage", "Conservative"] = wponlycounts.get('Conservative Party', 0)
media_combo_df.loc["Twitter and Facebook", "Conservative"] = twtfbcounts.get('Conservative Party', 0)
media_combo_df.loc["Facebook and Webpage", "Conservative"] = fbwpcounts.get('Conservative Party', 0)
media_combo_df.loc["Twitter and Webpage", "Conservative"] = twtwpcounts.get('Conservative Party', 0)
media_combo_df.loc["Twitter, Facebook and Webpage", "Conservative"] = allmediacounts.get('Conservative Party', 0)
media_combo_df.loc["None", "Conservative"] = nomediacounts.get('Conservative Party', 0)

# Fill any cells that are still NaN with 0
media_combo_df.fillna(0, inplace = True)

display(media_combo_df)

| | Labour | Conservative | All Parties |
|---|---|---|---|
| None | 29 | 0 | 549 |
| Only Twitter | 211 | 2 | 672 |
| Only Facebook | 2 | 0 | 65 |
| Only Webpage | 9 | 73 | 395 |
| Twitter and Facebook | 66 | 1 | 313 |
| Facebook and Webpage | 4 | 28 | 113 |
| Twitter and Webpage | 180 | 318 | 1064 |
| Twitter, Facebook and Webpage | 88 | 209 | 800 |

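The eight filter-and-count steps above can also be collapsed into a single cross-tabulation. This is a sketch under stated assumptions: the rows are invented, and the `labels` mapping is a hypothetical helper rather than part of the exercise. Each candidate gets one combination label, `pd.crosstab` counts labels by party, and `reindex(..., fill_value=0)` supplies the zeros that otherwise needed `fillna`.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for df_pol; the column names match the answer above.
df_pol = pd.DataFrame({
    "party": ["Labour Party", "Labour Party", "Conservative Party",
              "Conservative Party", "Green Party"],
    "twitter_username": ["@a", np.nan, "@c", np.nan, np.nan],
    "facebook_page_url": ["fb/a", np.nan, np.nan, np.nan, "fb/e"],
    "party_ppc_page_url": [np.nan, np.nan, "web/c", "web/d", "web/e"],
})

twt = df_pol["twitter_username"].notnull()
fb = df_pol["facebook_page_url"].notnull()
wp = df_pol["party_ppc_page_url"].notnull()

# Hypothetical helper: each (Twitter, Facebook, Webpage) pattern -> table row label.
labels = {
    (False, False, False): "None",
    (True, False, False): "Only Twitter",
    (False, True, False): "Only Facebook",
    (False, False, True): "Only Webpage",
    (True, True, False): "Twitter and Facebook",
    (False, True, True): "Facebook and Webpage",
    (True, False, True): "Twitter and Webpage",
    (True, True, True): "Twitter, Facebook and Webpage",
}
combo = pd.Series([labels[key] for key in zip(twt, fb, wp)], index=df_pol.index)

# One cross-tabulation replaces the separate filters and .loc assignments;
# reindex adds rows of 0 for any combination that never occurs.
counts = pd.crosstab(combo, df_pol["party"]).reindex(
    index=list(labels.values()), fill_value=0
)
table = pd.DataFrame({
    "Labour": counts.get("Labour Party", 0),
    "Conservative": counts.get("Conservative Party", 0),
    "All Parties": counts.sum(axis=1),
})
print(table)
```

On the real data, the toy `df_pol` would be replaced by the DataFrame read from PySDS_PolCandidates.csv; the row order of `table` follows the insertion order of `labels`.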

In [22]:
# Extended exercise part 2. 

# The raw counts in the table are useful, 
# but showing the relative percentage would be even more useful. 
# Create a new table formatted like the one above, but showing
# each cell as a percentage of its column total.
# So for Labour that would be the percentage of Labour candidates
# who had 'only webpage', not the percentage of all candidates who
# are Labour and only have a webpage. 

# Hint: to display a DataFrame column as percentages, try this: 
# df = pd.DataFrame(pd.Series(range(10))/10,columns=["var1"])
# df['var2'] = df['var1'].map(lambda n: '{:,.1%}'.format(n))
# df

# Answer below here
# First work out the totals for Labour, Conservative and All Parties
totalL = df_pol['party'].value_counts()['Labour Party']
totalC = df_pol['party'].value_counts()['Conservative Party']
totalAll = df_pol['party'].value_counts().sum()

# Then compute the divisions
media_combo_df['Labour Raw'] = media_combo_df['Labour']/totalL
media_combo_df['Conservative Raw'] = media_combo_df['Conservative']/totalC
media_combo_df['All Parties Raw'] = media_combo_df['All Parties']/totalAll

# And finally map them onto percentages to one decimal place. I tried combining both this and the divisions but it didn't work
media_combo_df['Labour %'] = media_combo_df['Labour Raw'].map(lambda n: '{:,.1%}'.format(n))
media_combo_df['Conservative %'] = media_combo_df['Conservative Raw'].map(lambda n: '{:,.1%}'.format(n))
media_combo_df['All Parties %'] = media_combo_df['All Parties Raw'].map(lambda n: '{:,.1%}'.format(n))

display(media_combo_df)

# tot_col = pivot


# Reviewers' comments below here


| | Labour | Conservative | All Parties | Labour % | Conservative % | All Parties % | Labour Raw | Conservative Raw | All Parties Raw |
|---|---|---|---|---|---|---|---|---|---|
| None | 29 | 0 | 549 | 4.9% | 0.0% | 13.8% | 0.049236 | 0.0 | 0.138252 |
| Only Twitter | 211 | 2 | 672 | 35.8% | 0.3% | 16.9% | 0.358234 | 0.00317 | 0.169227 |
| Only Facebook | 2 | 0 | 65 | 0.3% | 0.0% | 1.6% | 0.003396 | 0.0 | 0.016369 |
| Only Webpage | 9 | 73 | 395 | 1.5% | 11.6% | 9.9% | 0.01528 | 0.115689 | 0.099471 |
| Twitter and Facebook | 66 | 1 | 313 | 11.2% | 0.2% | 7.9% | 0.112054 | 0.001585 | 0.078821 |
| Facebook and Webpage | 4 | 28 | 113 | 0.7% | 4.4% | 2.8% | 0.006791 | 0.044374 | 0.028456 |
| Twitter and Webpage | 180 | 318 | 1064 | 30.6% | 50.4% | 26.8% | 0.305603 | 0.503962 | 0.267943 |
| Twitter, Facebook and Webpage | 88 | 209 | 800 | 14.9% | 33.1% | 20.1% | 0.149406 | 0.33122 | 0.201461 |

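The answer notes that combining the division and the formatting did not work. One way to do both, sketched on a small hypothetical counts table with the same layout (the numbers are made up): dividing the whole DataFrame by `counts.sum()` divides each column by its own total, and the formatting is then applied column by column with `Series.map`.

```python
import pandas as pd

# Hypothetical counts table with the same layout as media_combo_df.
counts = pd.DataFrame(
    {"Labour": [3, 1, 0], "Conservative": [1, 1, 2], "All Parties": [5, 3, 2]},
    index=["None", "Only Twitter", "Only Webpage"],
)

# counts.sum() is a Series indexed by column name, so the division aligns on
# columns: each cell becomes a share of its own column total.
shares = counts / counts.sum()

# Format every column as percentage strings to one decimal place.
# Keep `shares` for arithmetic; the strings in `pct` are for display only.
pct = shares.apply(lambda col: col.map("{:.1%}".format))
print(pct)
```

For notebook display only, `shares.style.format("{:.1%}")` formats the cells without converting the underlying values to strings.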

In [25]:
# Extended exercise part 3. 

# Sum each of the columns in the previous exercise. 
# Does each column sum to 100%? It should. 
# Use this exercise as a check that 
# each column sums to the expected total. 

# hint. 
# print(df["var1"].sum())

# Answer below here
print('The Labour % sum to', media_combo_df['Labour Raw'].sum())
print('The Conservative % sum to', media_combo_df['Conservative Raw'].sum())
print('The All Parties % sum to', media_combo_df['All Parties Raw'].sum())


# Reviewers' comments below here 




The Labour % sum to 1.0
The Conservative % sum to 1.0
The All Parties % sum to 1.0
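Summing the unrounded Raw columns, as the answer does, is the right check. A short sketch using the Labour and Conservative counts from the table above shows why: the raw shares are best compared to 1.0 with a tolerance (`np.isclose`) rather than `==`, and the one-decimal percentages shown in the table can add up to slightly less than 100% purely through rounding.

```python
import numpy as np
import pandas as pd

# Labour and Conservative counts copied from the table produced above.
counts = pd.DataFrame({
    "Labour": [29, 211, 2, 9, 66, 4, 180, 88],
    "Conservative": [0, 2, 0, 73, 1, 28, 318, 209],
})
shares = counts / counts.sum()

# Unrounded shares: compare with a tolerance, because floating-point
# division can leave tiny rounding errors in the sum.
raw_ok = np.isclose(shares.sum(), 1.0)
print(raw_ok)

# Percentages rounded to one decimal place, as displayed in the table,
# need not add up to exactly 100.
rounded_sums = (shares * 100).round(1).sum().round(1)
print(rounded_sums)
```

Here the rounded Labour percentages add up to 99.9 while the Conservative ones reach 100.0, even though both raw columns sum to 1.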
