danamkaplan/asana_takehome

Asana Takehome

Before any analysis or brainstorming could be done, the first step was to figure out which users are "adopted". This gnarly nested query labels every user_id that visited at least 3 times within a single 7-day period.
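
As a rough illustration of the labeling rule (this is not the SQL in adopted_users.sql, which is not reproduced here), the same check can be sketched in Python. The function name, the input shape, and the choice to count distinct calendar days rather than raw visits are assumptions made for this sketch.

```python
from collections import defaultdict
from datetime import datetime

def adopted_users(visits, min_visits=3, window_days=7):
    """Return the set of user ids with min_visits distinct visit days in any window_days span.

    visits: iterable of (user_id, datetime) pairs (hypothetical layout).
    """
    days_by_user = defaultdict(set)
    for user_id, ts in visits:
        # Assumption: several visits on the same calendar day count once.
        days_by_user[user_id].add(ts.date())

    adopted = set()
    for user_id, days in days_by_user.items():
        ordered = sorted(days)
        # Slide over runs of min_visits consecutive visit days.
        for i in range(len(ordered) - min_visits + 1):
            span = (ordered[i + min_visits - 1] - ordered[i]).days
            if span < window_days:
                adopted.add(user_id)
                break
    return adopted
```

If raw visit counts are what matters, dropping the `.date()` call and comparing timestamps directly would be the obvious variation.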

Created a csv from the query:

mysql -u interview --password=interview --database=interview -P 3306 -h data-challenge-9x.cswh4gchpi8n.us-east-1.rds.amazonaws.com < adopted_users.sql | tr '\t' ',' > adopted_users.csv
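
One caveat with the tr approach: it swaps every tab for a comma without quoting, so a field that already contains a comma (a name, for instance) would shift the columns. A minimal sketch of a quoting-aware conversion using Python's csv module follows. The function name is made up for this example, and it assumes the mysql output is plain tab-separated text.

```python
import csv
import io

def tsv_to_csv(tsv_text):
    """Convert tab-separated text to CSV, quoting fields that contain commas."""
    # QUOTE_NONE: treat quote characters in the input as ordinary data.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")  # default QUOTE_MINIMAL
    writer.writerows(reader)
    return out.getvalue()
```

For the columns used in this analysis (ids, flags, timestamps) the tr one-liner is likely fine; the difference only shows up with free-text fields.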

Then I considered what data came with the user table (initial thoughts follow the dash on each line):

  • name: the user's name - Probably not a useful signal.
  • object_id: the user's id - Probably not a useful signal.
  • email: email address - Possibly a useful signal, for example to segment work email from personal, but the string filtering would take too much effort for a project of a couple of hours.
  • creation_source: how their account was created - Definitely an obvious categorical variable. Will include. This takes on one of 5 values:
    • PERSONAL_PROJECTS: invited to join another user's personal workspace
    • GUEST_INVITE: invited to an organization as a guest (limited permissions)
    • ORG_INVITE: invited to an organization (as a full member)
    • SIGNUP: signed up via asana.com
    • SIGNUP_GOOGLE_AUTH: signed up using Google Authentication (using a Google email account for their login id)
  • creation_time: when they created their account - This variable could be used in many ways (hourly time bands, workday hours, or simply day of week) but, like email, that takes a little too much time, so I will use it only for a weekend-or-not variable.
  • last_session_creation_time: unix timestamp of last login - Last login doesn't seem to be early enough in the "funnel" timeframe to predict adoption. Leaving it out for now.
  • opted_in_to_mailing_list: whether they have opted into receiving marketing emails - Definitely an obvious categorical variable. Will include.
  • enabled_for_marketing_drip: whether they are on the regular marketing email drip - Definitely an obvious categorical variable. Will include.
  • org_id: the organization (group of users) they belong to - Hard to tell anything from the org id itself, but the size of the organization when the user joined might be interesting to consider.
  • invited_by_user_id: which user invited them to join (if applicable).
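
The variables marked "Will include" above could be encoded roughly as follows. This is a sketch, not the notebook's actual preprocessing: the function name, the dict-based row format, the timestamp format, and the choice of SIGNUP as the dropped baseline category are all assumptions.

```python
from datetime import datetime

# Assumption: SIGNUP is the baseline, so it gets no indicator column.
CREATION_SOURCES = ["PERSONAL_PROJECTS", "GUEST_INVITE", "ORG_INVITE", "SIGNUP_GOOGLE_AUTH"]

def encode_user(row):
    """Turn one user row (a dict of strings, e.g. from csv.DictReader) into numeric features."""
    # Assumption: creation_time looks like "2014-04-19 08:00:00".
    created = datetime.strptime(row["creation_time"], "%Y-%m-%d %H:%M:%S")
    features = {f"source_{s}": int(row["creation_source"] == s) for s in CREATION_SOURCES}
    # weekday(): Monday is 0, so 5 and 6 are Saturday and Sunday.
    features["created_on_weekend"] = int(created.weekday() >= 5)
    features["opted_in_to_mailing_list"] = int(row["opted_in_to_mailing_list"])
    features["enabled_for_marketing_drip"] = int(row["enabled_for_marketing_drip"])
    return features
```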

For the size of the org when a user joined, I just decided to do the work in SQL again and grabbed the csv with:

mysql -u interview --password=interview --database=interview -P 3306 -h data-challenge-9x.cswh4gchpi8n.us-east-1.rds.amazonaws.com < size_org_joined.sql | tr '\t' ',' > size_org_joined.csv
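
The SQL in size_org_joined.sql is not shown here, but the idea of "org size when a user joined" can be sketched in Python: for each user, count the members of the same org whose accounts were created earlier. The function name, the tuple layout, and the decision to exclude the joining user from the count are all assumptions.

```python
from collections import defaultdict

def org_size_at_join(users):
    """users: iterable of (user_id, org_id, creation_time).

    Returns {user_id: number of org members created before this user}.
    """
    by_org = defaultdict(list)
    for user_id, org_id, created in users:
        by_org[org_id].append((created, user_id))

    sizes = {}
    for members in by_org.values():
        # Sort by creation time; ties fall back to user_id order.
        for rank, (_, user_id) in enumerate(sorted(members)):
            sizes[user_id] = rank
    return sizes
```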

My first inclination was "I am trying to predict a binary outcome (adopted or not). Let's make a logistic regression." Please look at the notebook for it here. As you can see, the model was fairly weak. The only two statistically significant independent variables are:

  • Personal Projects as the creation source lowers the odds of adoption by a factor of ~32.9% (compared to the base case of organic signup through asana.com)
  • For every one-unit increase in the size of an org at the time a user joins, the odds of adoption are lowered by a factor of ~49.7%
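
For readers less familiar with logistic regression output: a fitted coefficient is on the log-odds scale, so exponentiating it gives the multiplicative change in odds. The sketch below uses a made-up coefficient, not a value from the notebook, and the helper name is hypothetical.

```python
import math

def odds_change(coef):
    """Convert a logistic regression coefficient to (odds ratio, percent change in odds)."""
    odds_ratio = math.exp(coef)
    pct_change = (odds_ratio - 1.0) * 100.0
    return odds_ratio, pct_change

# Hypothetical coefficient chosen so the odds ratio is one half.
ratio, pct = odds_change(math.log(0.5))
print(f"odds ratio {ratio:.2f}, change in odds {pct:+.0f}%")
```

An odds ratio below 1 means the odds shrink relative to the base case; it is a statement about odds, not about the probability of adoption directly.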

The problem I realized with this model (AFTER I did all the regression work, of course) is that we are using mostly categorical independent variables to predict a categorical outcome. You can see the model is a really bad fit to begin with (Pseudo R-squ. of 0.02423). Regressions are kind of in the language of hypothesis tests: "If I change to a different categorical label from the base case, or go up one more unit in a continuous variable, how will that change Y, and is it significant?" This question doesn't really have a solid "base case" (or maybe I don't know Asana's product well enough). Basically, I would reach for this method first to analyze an experiment run by Asana, not the type of data we already gathered.
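
For context on the fit statistic quoted above: the label "Pseudo R-squ." matches what statsmodels prints for McFadden's pseudo R-squared, so assuming that is what the notebook reports, it is one minus the ratio of the fitted model's log-likelihood to that of an intercept-only model. A small sketch with made-up labels and predictions (none of these numbers come from the notebook, and the function names are hypothetical):

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 labels y under predicted probabilities p."""
    return sum(math.log(pi) if yi else math.log(1.0 - pi) for yi, pi in zip(y, p))

def mcfadden_r2(y, p_model):
    """McFadden's pseudo R-squared: 1 - LL(model) / LL(intercept-only)."""
    base_rate = sum(y) / len(y)  # intercept-only model predicts the overall rate
    ll_null = log_likelihood(y, [base_rate] * len(y))
    ll_model = log_likelihood(y, p_model)
    return 1.0 - ll_model / ll_null
```

A value near 0 means the predictors add almost nothing over always predicting the overall adoption rate.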

A better way to start would have been some plots and visual interpretation, like the ones I quickly threw together with Tableau. Quick points:

  • All the plots are the same format
    • Blue Bars are total VOLUME of users who signed up
    • Orange line is total RATE of (adopted/total) users
  • Creation Source
    • You can see the MASSIVE dip in the Personal Projects rate that I found with the regression. Signups through Google and Guest Invites had the highest rates.
  • Enabled for Marketing
    • Only around 14.9% of total users were enabled for marketing, and it only bumped up the adoption rate by 0.07%.
  • Opted in to Mailing list
    • ~25% of users opted in, and there was still only a small bump of 0.53% in the adoption rate.
  • Date
    • Most interestingly, the day of week of signup has a relatively similar VOLUME of signups in each bin (only around a 100-user difference between days, out of 12000).
    • Wednesday, Saturday, and Sunday have the highest rates.
    • (Naively assuming weekend signups are non-business orgs) The two points above suggest that Asana has a high number of users, and potentially adopted users, among both business orgs and non-traditional teams or personal uses.
  • Size of Org
    • It is kind of fun to see my stat from earlier (negative correlation with org size) by scrolling right.
    • As the orange line (adoption rate) descends, you can see the org size (blue bar) increase sporadically.
  • MAJOR TAKEAWAYS
    • Users invited as guests to small teams (orgs), and users who sign up through Google or organically to join a small org, have the highest potential to become adopted users.
    • Mailing list opt-in and being enabled for marketing have a positive effect on adoption, but it is too small to be significant.
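
The two series in each plot (volume as bars, adoption rate as a line) come down to a grouped count and a grouped mean. A minimal sketch, assuming each user is a dict with a category field and a 0/1 adopted flag; the function name and field names are invented for illustration:

```python
from collections import defaultdict

def volume_and_rate(users, key):
    """Return {category: (total users, adopted / total)} for the given category field."""
    totals = defaultdict(int)
    adopted = defaultdict(int)
    for user in users:
        totals[user[key]] += 1
        adopted[user[key]] += user["adopted"]
    return {k: (totals[k], adopted[k] / totals[k]) for k in totals}
```

Calling it once per field (creation source, marketing flags, signup weekday, org size) would reproduce the numbers behind each chart.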

Further Consideration - Based on my points on this data set, if we actually wanted to predict adoption, I would use some sort of decision tree model. A lot of the signals are categorical segmentations, which lend themselves well to decision trees for both accuracy and observation of the segments.
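
To make the decision-tree idea concrete, the sketch below shows the core step such a model repeats: picking the categorical value whose one-vs-rest split most reduces Gini impurity. It is a hand-rolled illustration with invented function and field names, not a full tree and not anything fitted to the Asana data.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(rows, feature, target="adopted"):
    """Find the value of a categorical feature whose one-vs-rest split gives the largest impurity drop."""
    parent = gini([r[target] for r in rows])
    best_value, best_gain = None, 0.0
    for value in sorted({r[feature] for r in rows}):
        left = [r[target] for r in rows if r[feature] == value]
        right = [r[target] for r in rows if r[feature] != value]
        # Impurity after the split, weighted by the size of each side.
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        gain = parent - weighted
        if gain > best_gain:
            best_value, best_gain = value, gain
    return best_value, best_gain
```

A real tree would apply this recursively to each side of the split and across all features, which is what makes the resulting segments easy to read off.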

This could also be looked at through a cohort lens. If Asana changes its product often, I would look at the adoption rate of cohorts over a reasonable time frame (30 days out from signup or so). For a deeper dive, segmenting those cohorts by the above categories would also reveal better information about what purposes or intentions users have with the product that would make them adopted.
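
A cohort view along these lines could be computed as below. This is only a sketch: it groups users by signup month and asks whether each became adopted within a fixed horizon (30 days here). The tuple layout, the use of an "adopted on" date, and the function name are assumptions.

```python
from collections import defaultdict
from datetime import date

def cohort_adoption_rates(users, horizon_days=30):
    """users: iterable of (signup_date, adopted_date or None).

    Returns {"YYYY-MM" signup cohort: share adopted within horizon_days of signup}.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for signup, adopted_on in users:
        cohort = signup.strftime("%Y-%m")
        totals[cohort] += 1
        if adopted_on is not None and (adopted_on - signup).days <= horizon_days:
            hits[cohort] += 1
    return {c: hits[c] / totals[c] for c in sorted(totals)}
```

Keying the cohort on (month, creation source) instead of month alone would give the segmented version described above.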
