In [1]:
import pandas as pd
import numpy as np
from numpy import nan as NaN  # the `NaN` alias was removed in NumPy 2.0; `nan` is the supported name
import matplotlib
import matplotlib.pyplot as plt

pd.options.display.float_format = '{:,.2f}'.format

df = pd.read_csv("../output/all_services.csv", index_col=0, header=0, thousands=',')
df.sample(5)

Unnamed: 0,Segments,Services,Type,Service,Hires,Price,Price Type,Rating,Zip Code,Capital City,State,Abbreviation,Implied Revenue
26497,Wellness,Nutritionist,Main,Body Evolution 🍃 💪🏼,212.0,170.0,,5.0,78701,Austin,Texas,TX,36040.0
9567,Home Improvement,Handyman,Main,Besslopz Cleaning & Home Improvement Services LLC,41.0,30.0,hour,5.0,7102,Newark,New Jersey,NJ,1230.0
6613,Business,Logo Design,Main,Hope Tako,,350.0,logo,5.0,98507,Olympia,Washington,WA,
19363,Home Improvement,Lawn Mowing and Trimming,Main,Lawn Love Lawn Care,242.0,35.0,,3.5,75207,Dallas,Texas,TX,8470.0
16374,Wellness,Massage Therapy,Main,Wellness and Relaxation,600.0,180.0,,5.0,10007,New York City,New York,NY,108000.0


**General questions:**
1. Which segments is Thumbtack focusing on (indicators: # services, # hires, price * # hires), or is Thumbtack consistent across all segments and services? We are looking for disparities between segments.
2. Do demand and supply for services change from one region to another?
3. Which segments are doing better or worse? (indicators: # hires/# sellers rate; # hires/# sellers from top sellers)
4. Is Thumbtack offering this many services because it has a low # hires/# sellers rate? (hypothesis to explain the large number of services)
5. Are there any segments or services that show more uncaptured potential? (indicators: high # hires/# sellers rate, low # sellers)
6. What types of sellers are the most successful: individuals or businesses? Does it depend on the type of service and/or region? (indicator: manual review of top sellers)

**1. Which segments is Thumbtack focusing on?**

We can first look at a basic count of services (rows in the dataset) for each segment.

In [2]:
# See ranking by count
servicesCount = df.groupby("Segments")["Segments"].count().to_frame(
    name="Count").reset_index()
servicesCount = servicesCount.sort_values(by="Count", ascending=False)
servicesCount["Contribution"] = servicesCount["Count"] / servicesCount["Count"].sum()
servicesCount[:9]

Unnamed: 0,Segments,Count,Contribution
0,Business,12446,0.44
4,Wellness,9806,0.35
2,Home Improvement,4443,0.16
3,Pets,872,0.03
1,Events,849,0.03


Next, we can look at the implied revenue (price * # hires) from these segments, as well as by service and by state.

In [3]:
# Get sum of implied revenue by segment
dfRevenuBySegment = df.groupby("Segments")["Implied Revenue"].sum().to_frame(
    name="Implied Revenue").reset_index()
dfRevenuBySegment["Contribution"] = dfRevenuBySegment[
    "Implied Revenue"] / dfRevenuBySegment["Implied Revenue"].sum()

# Get sum of implied revenue by services
dfRevenuByServices = df.groupby("Services")["Implied Revenue"].sum().to_frame(
    name="Implied Revenue").reset_index()
dfRevenuByServices["Contribution"] = dfRevenuByServices[
    "Implied Revenue"] / dfRevenuByServices["Implied Revenue"].sum()

# Get sum of implied revenue by state (despite its name, dfRevenuByRevenue is grouped by State)
dfRevenuByRevenue = df.groupby("State")["Implied Revenue"].sum().to_frame(
    name="Implied Revenue").reset_index()
dfRevenuByRevenue["Contribution"] = dfRevenuByRevenue[
    "Implied Revenue"] / dfRevenuByRevenue["Implied Revenue"].sum()

In [4]:
# See the top segments by implied revenue (the [:9] slice keeps at most 9 rows)
dfRevenuBySegment.sort_values(by="Contribution", ascending=False)[:9]

Unnamed: 0,Segments,Implied Revenue,Contribution
0,Business,131620933.0,0.61
4,Wellness,64214887.0,0.3
1,Events,10949791.0,0.05
2,Home Improvement,7152116.0,0.03
3,Pets,539208.0,0.0


In [5]:
# See the top 9 services by implied revenue
dfRevenuByServices.sort_values(by="Contribution", ascending=False)[:9]

Unnamed: 0,Services,Implied Revenue,Contribution
14,Logo Design,83128150.0,0.39
2,Business Tax Preparation,32240677.0,0.15
16,Nutritionist,31362101.0,0.15
17,Personal Training,20585865.0,0.1
4,Computer Repair,13555566.0,0.06
6,DJ,10949791.0,0.05
13,Life Coaching,8646424.0,0.04
15,Massage Therapy,3042009.0,0.01
10,House Cleaning,2896900.0,0.01


In [6]:
# See the top 9 states by implied revenue
dfRevenuByRevenue.sort_values(by="Contribution", ascending=False)[:9]

Unnamed: 0,State,Implied Revenue,Contribution
4,California,13538273.0,0.06
43,Texas,9980152.0,0.05
9,Florida,9337052.0,0.04
32,New York,9302980.0,0.04
6,Connecticut,5647520.0,0.03
30,New Jersey,5467786.0,0.03
20,Maryland,5423309.0,0.03
38,Pennsylvania,5345279.0,0.02
7,Delaware,5200475.0,0.02


**2. Do demand and supply for services change from one region to another?**

Outputs per region, by segment and service:
- \# services
- \# hires
- \# hires/\# services rate
- List of sellers per region
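
The outputs listed above could be computed along these lines. This is only a sketch under assumptions: `toy` is a small made-up stand-in for `df` (its values are illustrative, not taken from the dataset), "# services" is read as the number of listing rows, and `State` is used as the region.

```python
import pandas as pd

# Hypothetical toy data with the same column names as `df`; values are made up.
toy = pd.DataFrame({
    "State": ["Texas", "Texas", "Texas", "New York", "New York"],
    "Segments": ["Wellness", "Wellness", "Business", "Wellness", "Business"],
    "Service": ["A", "B", "C", "D", "E"],   # seller / listing name
    "Hires": [200.0, 100.0, None, 600.0, 50.0],
})

# Number of listings and total hires per (state, segment).
# Note: "sum" skips missing Hires, so a listing without hires data counts as 0.
regional = (
    toy.groupby(["State", "Segments"])
    .agg(Listings=("Service", "count"), Hires=("Hires", "sum"))
    .reset_index()
)
regional["Hires per Listing"] = regional["Hires"] / regional["Listings"]

# List of sellers per region.
sellers = toy.groupby("State")["Service"].apply(list)

print(regional)
print(sellers)
```

Replacing `toy` with `df` should give the same tables for the real data; grouping by `"Services"` instead of `"Segments"` would give the per-service view.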

**3. Which segments are doing better or worse?**
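
One way to approach this question is to rank segments by hires per seller. The sketch below assumes that each distinct `Service` name is one seller, and it reads "from top sellers" as the share of a segment's hires that goes to its single most-hired seller; both are assumptions, and the `toy` values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data with the same column names as `df`; values are made up.
toy = pd.DataFrame({
    "Segments": ["Business", "Business", "Business", "Wellness", "Wellness", "Pets"],
    "Service": ["A", "B", "C", "D", "E", "F"],   # seller name
    "Hires": [300.0, 30.0, 30.0, 100.0, 100.0, 10.0],
})

# Sellers and total hires per segment.
by_segment = toy.groupby("Segments").agg(
    Sellers=("Service", "nunique"),
    Hires=("Hires", "sum"),
)
by_segment["Hires per Seller"] = by_segment["Hires"] / by_segment["Sellers"]

# Share of each segment's hires captured by its most-hired seller.
top_hires = toy.groupby("Segments")["Hires"].max()
by_segment["Top Seller Share"] = top_hires / by_segment["Hires"]

by_segment = by_segment.sort_values("Hires per Seller", ascending=False)
print(by_segment)
```

A high "Top Seller Share" would suggest that a segment's average is driven by a few sellers rather than by broad demand.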

**4. Is Thumbtack offering this many services because they have a low #hires/#sellers rate?**

**5. Are there any segments or services that showcase more uncaptured potential?**
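
The indicator for this question (high # hires/# sellers rate, low # sellers) could be turned into a simple filter. The sketch below uses the median as the cut-off for "high" and "low", which is an arbitrary choice and not something the analysis prescribes; the `toy` data and the service names X, Y and Z are hypothetical.

```python
import pandas as pd

# Hypothetical toy data with the same column names as `df`; values are made up.
toy = pd.DataFrame({
    "Services": ["X"] * 4 + ["Y"] + ["Z"] * 3,   # service category
    "Service": list("ABCDEFGH"),                  # seller name
    "Hires": [10.0, 20.0, 30.0, 40.0, 500.0, 5.0, 5.0, 5.0],
})

per_service = toy.groupby("Services").agg(
    Sellers=("Service", "nunique"),
    Hires=("Hires", "sum"),
)
per_service["Hires per Seller"] = per_service["Hires"] / per_service["Sellers"]

# Flag services with above-median demand per seller and below-median supply.
high_demand = per_service["Hires per Seller"] > per_service["Hires per Seller"].median()
low_supply = per_service["Sellers"] < per_service["Sellers"].median()
candidates = per_service[high_demand & low_supply]
print(candidates)
```

The same filter can be applied per segment by grouping on `"Segments"` instead of `"Services"`.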

**6. What types of sellers are the most successful: individuals or businesses? Does it depend on the type of service and/or region?**