# Visualization

There is a meeting coming up on Wednesday, and I want to have some cool presentations to show what I can do.

This notebook is all about visualizing the results of the user study.

I'm going to make the most of the clean dataset I produced earlier, and I'll use Seaborn as much as possible, since it's easy to use and produces attractive plots. One of the biggest challenges is that the user study offers a huge number of answer options to reflect the breadth of the public service (many departments and communities of work), which means we have to drill fairly deep to find what we want.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
df = pd.read_csv('/Users/Owner/Documents/Work_transfer/User Study 2016/Clean_File.csv', encoding = 'latin1')

In [None]:
df.shape

In [None]:
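df['Department'].value_counts()

# Names of the 10 most common departments and the 5 most common communities.
# .index works in every pandas version; reset_index()['index'] raises a KeyError
# in pandas >= 2.0, where the reset column takes the Series' name instead.
top10dep = df['Department'].value_counts().index[:10].tolist()
top5comm = df['Community'].value_counts().index[:5].tolist()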

In [None]:
depdf = df[df['Department'].isin(top10dep) & df['Community'].isin(top5comm)]
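The top-N filtering in the two cells above can be wrapped in a reusable helper, which makes it easy to try other cut-offs. This is a minimal sketch assuming only pandas; `top_n_values` and `filter_top_n` are hypothetical names, and the toy DataFrame is made-up data standing in for the survey. As in the cells above, each column's top values are computed on the full frame before filtering.

```python
import pandas as pd

def top_n_values(series: pd.Series, n: int) -> list:
    """Return the n most frequent values in a Series, most frequent first."""
    return series.value_counts().index[:n].tolist()

def filter_top_n(frame: pd.DataFrame, limits: dict) -> pd.DataFrame:
    """Keep rows whose value in each column is among that column's top-n values.

    `limits` maps column name -> n, e.g. {"Department": 10, "Community": 5}.
    """
    mask = pd.Series(True, index=frame.index)
    for column, n in limits.items():
        mask &= frame[column].isin(top_n_values(frame[column], n))
    return frame[mask]

# Toy data standing in for the survey responses (made-up values)
toy = pd.DataFrame({
    "Department": ["A", "A", "A", "B", "B", "C"],
    "Community": ["x", "x", "y", "x", "z", "z"],
})
subset = filter_top_n(toy, {"Department": 2, "Community": 1})
print(subset)
```

On the real data this would presumably be called as `filter_top_n(df, {"Department": 10, "Community": 5})`.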

In [None]:
ax = sns.countplot(x = "Community", hue = "Department", data = depdf)
ax.legend(loc = 'upper left', bbox_to_anchor = (1,1))
ax.set_xticklabels(ax.xaxis.get_majorticklabels(), rotation=90)
ax.set_title("Top Departments by Top Communities")
plt.show()
# While this approach worked, this isn't really what I wanted to do.

In [None]:
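# Order the bars from most to least common so the "top" communities come first
ax1 = sns.countplot(x = "Community", data = df, order = df['Community'].value_counts().index)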
ax1.set_xticklabels(ax1.xaxis.get_majorticklabels(), rotation = 90)
ax1.set_title("Top Communities of Work on GCconnex")
plt.show()

In [None]:
# I want to look at the top communities of each department
# (factorplot was renamed catplot in seaborn 0.9, and `size` became `height`)
g = sns.catplot(x = "Community", col = "Department", data = df[df['Department'].isin(top10dep)], kind = "count",
                col_wrap = 5, height = 10)
g.set_xticklabels(rotation = 90)
plt.show()

The graph above is hardly presentable, but it does show what's going on. My goal is to take some of the top departments and pull out the top 5 communities of each one. That shouldn't be too hard: all I have to do is go through each department, pick its 5 most popular communities, and store the filtered rows in a dictionary keyed by department.
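The per-department loop described above can also be written with `groupby`, which iterates over each department once and skips rows with a missing department. A minimal sketch, assuming the `Department` and `Community` column names from this notebook; `top_communities_by_department` is a hypothetical helper and the toy data is made up. The last lines show the same top-n idea as a count table rather than a dictionary of filtered rows.

```python
import pandas as pd

def top_communities_by_department(frame: pd.DataFrame, n: int = 5) -> dict:
    """Map each department to its rows in that department's n most common communities."""
    result = {}
    # groupby skips rows whose Department is missing (dropna=True by default)
    for dep, group in frame.groupby("Department"):
        top = group["Community"].value_counts().index[:n]
        result[dep] = group[group["Community"].isin(top)]
    return result

# Made-up data: department A has communities x (3), y (2), z (1); B has p (2), q (1)
toy = pd.DataFrame({
    "Department": ["A"] * 6 + ["B"] * 3,
    "Community": ["x", "x", "x", "y", "y", "z", "p", "p", "q"],
})
depcommunity_toy = top_communities_by_department(toy, n=2)

# Same idea as a count table: value_counts sorts by count within each department,
# and head(n) keeps the first n rows per department
top_counts = toy.groupby("Department")["Community"].value_counts().groupby(level=0).head(2)
print(top_counts)
```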

In [None]:
# Drop missing departments: NaN would break the title and file-name concatenation below
deplist = list(df['Department'].dropna().unique())

In [None]:
depcommunity = {}
for dep in deplist:
    # The five most common communities within this department
    comms = df.loc[df['Department'] == dep, 'Community'].value_counts().index[:5]

    # Filter the dataframe to this department's rows in its top five communities
    depcommunity[dep] = df[(df['Department'] == dep) & (df['Community'].isin(comms))]

In [None]:
for dep in deplist:
    g = sns.countplot(x = "Community", hue = "EasyUse", data = depcommunity[dep])
    g.set_xticklabels(g.xaxis.get_majorticklabels(), rotation = 90)
    g.set_title("Ease of Use by Community of Work in " + dep)
    plt.tight_layout()  # adjust the layout before saving so rotated labels aren't clipped
    plt.savefig("/Users/Owner/Documents/Work_transfer/User Study 2016/Plots/" + dep + " Plot.pdf")
    plt.show()
    plt.close()  # release the figure so the next department starts on a fresh one


In [None]:
f = sns.countplot(x = "Community", hue = "EasyUse", data = df)
f.set_xticklabels(f.xaxis.get_majorticklabels(), rotation = 90)
f.set_title('Ease of Use of GCconnex by Community')
plt.show()

In [None]:
y = sns.countplot(x = "SMLevel", hue = "EasyUse", data = df, order = ["Beginner", "Intermediate", "Advanced"])
y.set_title("Ease of Use of GCconnex by Social Media Experience")
plt.show()

In [None]:
len(df['Department'].unique())

In [None]:
z = sns.countplot(x = "UsageLength", hue = "EasyUse", data = df,
                  order = ["Less than a month", "One month to less than six months", "Six months to less than one year",
                           "One year", "More than one year (please indicate the number of years):"])
z.set_title("Ease of Use by Length of Usage")
z.set_xticklabels(z.xaxis.get_majorticklabels(), rotation = 90)
plt.show() # No discernible pattern in this plot, so no evidence of a learning curve.
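Raw counts mix group size with ease of use: a large usage-length group has taller bars whatever its answers, which can hide a trend. One way to check for a learning curve more directly is to compare within-group shares. A minimal sketch using `pd.crosstab` with `normalize="index"`; the toy values are made up, and on the real data the two arguments would presumably be `df["UsageLength"]` and `df["EasyUse"]`.

```python
import pandas as pd

# Made-up responses: 4 people in one usage group, 2 in another
toy = pd.DataFrame({
    "UsageLength": ["Less than a month"] * 4 + ["One year"] * 2,
    "EasyUse": ["Yes", "No", "No", "No", "Yes", "Yes"],
})
# normalize="index" divides each row by its total, so every usage-length
# group sums to 1 regardless of how many people are in it
shares = pd.crosstab(toy["UsageLength"], toy["EasyUse"], normalize="index")
print(shares)
```

Calling `shares.plot.bar(stacked=True)` would then draw one full-height bar per group, which makes differences in proportions easier to see than side-by-side counts.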

In [None]:
r = sns.countplot(x = "HowOftenUse", hue = "EasyUse", data = df,
                  order = ["Rarely (e.g. less than once a month)",
                           "Occasionally (e.g. few times a month)",
                           "Frequently (e.g. few times a week)",
                           "Very Frequently (e.g. daily)"])
r.set_title("Ease of Use and Frequency of Use")
r.set_xticklabels(r.xaxis.get_majorticklabels(), rotation = 90)
plt.show()

In [None]:
df['HowOftenUse'].value_counts()

In [None]:
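# Indicator columns for why respondents don't use GCconnex; fillna(0) treats a missing answer as "not selected".
# The renaming below assumes filter() returns exactly these six NoUse columns, in this order.
nouse = df.filter(regex = "NoUse").fillna(0)
nouse.columns = ["I Don't\nKnow Why\nI Would Use It", "I do not\nrequire Collaboration",
                 "Uncomfortable with\nworking so publicly", "My Supervisor\nDoesn't Approve",
                 "Don't have time to\nlearn something new",
                 "People I collaborate\nwith don't use it"]
nouse["It doesn't have\nthe information\nI need"] = df['NoToolsInfo'].fillna(0)
nouse["I don't see its value or\npurpose"] = df['NoPurpose'].fillna(0)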

In [None]:
g = nouse.sum().plot.bar()
g.set_title("Why People Do Not Use GCconnex")
g.set_ylabel("Number of Responses")
g.set_xticklabels(g.xaxis.get_majorticklabels(), rotation = 70)
plt.show()
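The bar chart above keeps the reasons in column order. Sorting them by count and dividing by the number of rows can make the ranking easier to read. A minimal sketch, assuming (as the `fillna(0)` and `.sum()` above imply) that each reason column is a 0/1 indicator with one row per respondent; note the denominator is all respondents, not only non-users. The column names and values below are made up.

```python
import pandas as pd

# Toy 0/1 indicator table: one row per respondent, one column per reason (made-up data)
nouse_toy = pd.DataFrame({
    "No time to learn": [1, 0, 1, 0],
    "Don't see its value": [1, 1, 1, 0],
    "Supervisor doesn't approve": [0, 0, 0, 0],
})
# Sort so the most common reason comes first, then express as % of respondents
reason_counts = nouse_toy.sum().sort_values(ascending=False)
reason_pct = 100 * reason_counts / len(nouse_toy)
print(reason_pct)
```

`reason_pct.plot.bar()` would draw the sorted version of the chart.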

In [None]:
t = sns.countplot(x = 'Age', data = df, hue = 'HowOftenUse',
                  hue_order = ["Rarely (e.g. less than once a month)",
                               "Occasionally (e.g. few times a month)",
                               "Frequently (e.g. few times a week)",
                               "Very Frequently (e.g. daily)"],
                  order = ['24 years and under',
                           '25 to 29 years',
                           '30 to 34 years',
                           '35 to 39 years',
                           '40 to 44 years',
                           '45 to 49 years',
                           '50 to 54 years',
                           '55 to 59 years',
                           '60 years and over'])
t.set_xticklabels(t.xaxis.get_majorticklabels(), rotation = 90)
t.legend(loc = 'upper left', bbox_to_anchor = (1,1))
t.set_title("Frequency of Use by Age")

plt.show()

In [None]:
q = sns.countplot(x = 'Gender', data = df, hue = 'HowOftenUse',
                  hue_order = ["Rarely (e.g. less than once a month)",
                               "Occasionally (e.g. few times a month)",
                               "Frequently (e.g. few times a week)",
                               "Very Frequently (e.g. daily)"])
q.set_title("Frequency of Use by Gender")
q.set_xticklabels(q.xaxis.get_majorticklabels(), rotation = 0)
q.legend(loc = 'upper left', bbox_to_anchor = (1,1))
plt.show()

In [None]:
dfeasy = df[df['EasyUse'] == "Yes"]

In [None]:
dfeasy_sums = pd.Series(dfeasy.filter(regex = 'Why').sum())

In [None]:
dfnoteasy_sums = pd.Series(df[df['EasyUse'] == "No"].filter(regex = "Why").sum())

In [None]:
print(len(dfeasy))
print(len(df[df['EasyUse'] == "No"]))

In [None]:
dfeasy_sums = 100*dfeasy_sums/len(dfeasy)

In [None]:
dfnoteasy_sums = 100*dfnoteasy_sums/len(df[df['EasyUse'] == "No"])

In [None]:
df["EasyUse"].unique()

In [None]:
dfpercentages = pd.concat([dfeasy_sums, dfnoteasy_sums], axis = 1)


In [None]:
dfpercentages.columns = ['Easy', 'Not Easy']

ax = dfpercentages.plot.bar() # This is shown in percentages. So there isn't m
ax.set_ylabel("Percent of respondents")  # each value is 100 * count / group size
plt.show() # This is broken
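The per-group percentages built step by step in the cells above can also be produced in one `groupby`. A minimal sketch, assuming the `Why*` columns are 0/1 indicators where a missing value means "not selected" (hence `fillna(0)`, so the denominator is the full group size, as in the cells above). The column names `WhyShare` and `WhyNetwork` and their values are made up; this is an alternative formulation, not a diagnosis of why the plot above misbehaves.

```python
import pandas as pd

# Made-up data: two "easy" and two "not easy" respondents
toy = pd.DataFrame({
    "EasyUse": ["Yes", "Yes", "No", "No"],
    "WhyShare": [1, 0, 1, None],
    "WhyNetwork": [1, 1, 0, 0],
})
why_cols = toy.filter(regex="Why").columns
# Mean of a 0/1 column = fraction of the group that selected it; x100 for percent
percentages = (
    toy[why_cols].fillna(0)
    .groupby(toy["EasyUse"])
    .mean()
    .mul(100)
    .T  # one row per reason, one column per EasyUse answer
)
print(percentages)
```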

In [None]:
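sns.set_style("dark")
# Blue, green, red, purple, khaki, light blue
palette = ["#4C72B0", "#55A868", "#C44E52",
           "#8172B2", "#CCB974", "#64B5CD"]
# Reorder the first three to blue, red, green; hue levels take them in the order seaborn finds them
newcolor = [palette[0], palette[2], palette[1]]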


## Visually Exploring Some Results of the Random Forest Classification

In [None]:
ei = sns.countplot(x = "EasyInfo", data = df, hue = "EasyUse", palette = newcolor,
                   order = ["Yes", "No", "Don't know / Not sure"])
ei.set_title("Was it easy to find the information you needed?")
ei.set_xticklabels(ei.xaxis.get_majorticklabels(), rotation = 0)
ei.legend(loc = 'upper left', bbox_to_anchor = (1,1), title = "Did you find GCconnex easy to use?")
ei.set_xlabel(" ")
plt.show()

In [None]:
ob = sns.countplot(x = "EasyOnBoarding", data = df, hue = "EasyUse", palette = newcolor,
                  order = ["Yes", "No", "Don't know / Not sure / Don't use"])
ob.set_title("Was it easy to navigate the onboarding module?")
ob.set_xticklabels(ob.xaxis.get_majorticklabels(), rotation = 0)
ob.legend(loc = 'upper left', bbox_to_anchor = (1,1),  title = "Did you find GCconnex easy to use?")
ob.set_xlabel(" ")
plt.show()

In [None]:
ig = sns.countplot(x = "EasyInformationGroup", data = df, hue = "EasyUse", palette = newcolor,
                  order = ["Yes", "No", "Don't know / Not sure / Don't use"])
ig.set_title("Was it easy to find information in groups?")
ig.set_xticklabels(ig.xaxis.get_majorticklabels(), rotation = 0)
ig.legend(loc = 'upper left', bbox_to_anchor = (1,1),  title = "Did you find GCconnex easy to use?")
ig.set_xlabel(" ")
plt.show()

In [None]:
igc = sns.countplot(x = "EasyInformationGroup", data = df, palette = newcolor,
                    order = ["Yes", "No", "Don't know / Not sure / Don't use"])
igc.set_title("Was it easy to find information in groups?")
igc.set_xticklabels(igc.xaxis.get_majorticklabels(), rotation = 0)
igc.set_xlabel(" ")
plt.show()

In [None]:
ecg = sns.countplot(x = "EasyCollabGroup", data = df, palette = newcolor,
                    order = ["Yes", "No", "Don't know / Not sure / Don't use"])
ecg.set_title("Did you find collaborating and communicating in a group easy?")
ecg.set_xticklabels(ecg.xaxis.get_majorticklabels(), rotation = 0)
ecg.set_xlabel(" ")
plt.show()

In [None]:
ecgdf = df[df['EasyCollabGroup'] == "Don't know / Not sure / Don't use"]

In [None]:
# Plot EasyUse so the data matches the title (as in sectionanalysis below)
ecgdfcp = sns.countplot(x = "EasyUse", data = ecgdf, palette = newcolor,
                        order = ["Yes", "No", "Don't know / Not sure"])
ecgdfcp.set_title("Do you find GCconnex Easy to Use? (Collab = Don't Know)")
ecgdfcp.set_xticklabels(ecgdfcp.xaxis.get_majorticklabels(), rotation = 0)
ecgdfcp.set_xlabel(" ")
plt.show()

In [None]:
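# Community names, most common first (value_counts() alone would give the counts, not the names)
communityplotslist = df['Community'].value_counts().index.tolist()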

In [None]:
onboarding = df[df['EasyOnBoarding'] == 'No']

In [None]:
def sectionanalysis(dataframe):
    """Recreate the key ease-of-use plots for a given subset of respondents."""

    eu = sns.countplot(x = "EasyUse", data = dataframe, palette = newcolor,
                       order = ["Yes", "No", "Don't know / Not sure"])
    eu.set_title("Do you find GCconnex easy to use?")
    eu.set_xticklabels(eu.xaxis.get_majorticklabels(), rotation = 0)
    eu.set_xlabel(" ")
    plt.show()

    ei = sns.countplot(x = "EasyInfo", data = dataframe, hue = "EasyUse", palette = newcolor,
                       order = ["Yes", "No", "Don't know / Not sure"])
    ei.set_title("Was it easy to find the information you needed?")
    ei.set_xticklabels(ei.xaxis.get_majorticklabels(), rotation = 0)
    ei.legend(loc = 'upper left', bbox_to_anchor = (1,1), title = "Did you find GCconnex easy to use?")
    ei.set_xlabel(" ")
    plt.show()

    ecg = sns.countplot(x = "EasyCollabGroup", data = dataframe, palette = newcolor,
                        order = ["Yes", "No", "Don't know / Not sure / Don't use"])
    ecg.set_title("Did you find collaborating and communicating in a group easy?")
    ecg.set_xticklabels(ecg.xaxis.get_majorticklabels(), rotation = 0)
    ecg.set_xlabel(" ")
    plt.show()

    ig = sns.countplot(x = "EasyInformationGroup", data = dataframe, hue = "EasyUse", palette = newcolor,
                       order = ["Yes", "No", "Don't know / Not sure / Don't use"])
    ig.set_title("Was it easy to find information in groups?")
    ig.set_xticklabels(ig.xaxis.get_majorticklabels(), rotation = 0)
    ig.legend(loc = 'upper left', bbox_to_anchor = (1,1), title = "Did you find GCconnex easy to use?")
    ig.set_xlabel(" ")
    plt.show()
    

## Those Who Found Onboarding Not Easy

In [None]:
sectionanalysis(onboarding)

In [None]:
df_EasyInfo_No = df[df['EasyInfo'] == "No"]

sectionanalysis(df_EasyInfo_No)

In [None]:
df_EasyInfo_Yes = df[df['EasyInfo'] == "Yes"]

sectionanalysis(df_EasyInfo_Yes)
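The `sectionanalysis` plots compare respondent subsets visually; a small table of shares can make the same comparison easier to read side by side. A minimal sketch assuming only pandas; `compare_shares` is a hypothetical helper and the toy data is made up. On the real data it might be called as `compare_shares({"All": df, "EasyInfo = No": df_EasyInfo_No, "EasyInfo = Yes": df_EasyInfo_Yes}, "EasyUse")`.

```python
import pandas as pd

def compare_shares(frames: dict, column: str) -> pd.DataFrame:
    """Share of each answer in `column`, one table column per respondent subset."""
    return pd.concat(
        {name: frame[column].value_counts(normalize=True) for name, frame in frames.items()},
        axis=1,
    ).fillna(0)  # an answer missing from a subset has a share of 0

# Made-up responses
toy = pd.DataFrame({
    "EasyInfo": ["Yes", "Yes", "No", "No"],
    "EasyUse": ["Yes", "Yes", "No", "Yes"],
})
table = compare_shares(
    {"All": toy, "EasyInfo = No": toy[toy["EasyInfo"] == "No"]},
    "EasyUse",
)
print(table)
```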