In [1]:
import pandas as pd
import numpy as np
from numpy import nan  # the NaN alias was removed in NumPy 2.0
import matplotlib
import matplotlib.pyplot as plt

pd.options.display.float_format = '{:,.2f}'.format

df = pd.read_csv("../output/all_services.csv", index_col=0, header=0, thousands=',')
df.sample(5)

Unnamed: 0,Segments,Services,Type,Service,Hires,Price,Price Type,Rating,Zip Code,Capital City,State,Abbreviation,Implied Revenue
20461,Home Improvement,Handyman,Main,"Stanley's Home Improvement and Repairs, LLC",28.0,50.0,hour,5.0,73102,Oklahoma City,Oklahoma,OK,1400.0
20101,Events,Wedding and Event Makeup,Main,Remy C Bridal and Special Event Makeup,4.0,125.0,,5.0,7102,Newark,New Jersey,NJ,500.0
15953,Business,Corporate Law Attorney,Main,The Unger Firm LLC,,225.0,hour,5.0,85001,Phoenix,Arizona,AZ,
31757,Business,Statistical Data Analysis,Main,Project Consultants,,,,5.0,53205,Milwaukee,Wisconsin,WI,
21045,Business,Statistical Data Analysis,Main,Project Consultants,,,,5.0,23219,Richmond,Virginia,VA,


In [7]:
df.shape

(32174, 13)

**General questions:**
1. Which segments is Thumbtack focusing on? Or is its coverage consistent across all segments and services?
2. Do demand and supply for services vary from one region to another?
3. Which segments are doing better or worse?
4. Is Thumbtack offering this many services because it has a low # hires / # sellers ratio?
5. Are there any segments or services that show more uncaptured potential?
6. Which types of sellers are the most successful: individuals or businesses? Does it depend on the type of service and/or the region?

**1. Which segments is Thumbtack focusing on?**

We can start with a basic count of the services listed in each segment:

In [2]:
# Rank segments by number of listed services and compute each segment's share
servicesCount = df.groupby("Segments")["Segments"].count().to_frame(
    name="Count").reset_index()
servicesCount = servicesCount.sort_values(by="Count", ascending=False)
servicesCount["Contribution"] = servicesCount["Count"] / servicesCount["Count"].sum()
servicesCount[:9]

Unnamed: 0,Segments,Count,Contribution
0,Business,12446,0.39
4,Wellness,9806,0.3
1,Events,4607,0.14
2,Home Improvement,4443,0.14
3,Pets,872,0.03


Next, we can look at the implied revenue of these segments, then break it down by service and by state:
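The notebook does not define the `Implied Revenue` column. The sample rows shown earlier are consistent with hires multiplied by price (28 × 50 = 1,400 and 4 × 125 = 500), so the sketch below assumes that definition. It uses toy data, not the real file, and shows one consequence worth keeping in mind: a listing with no recorded hires has a missing implied revenue, and a grouped sum silently treats it as zero.

```python
import numpy as np
import pandas as pd

# Toy rows loosely mirroring the sample above; values are illustrative only.
toy = pd.DataFrame({
    "Segments": ["Home Improvement", "Events", "Business"],
    "Hires": [28.0, 4.0, np.nan],
    "Price": [50.0, 125.0, 225.0],
})

# Assumed definition (inferred from the sample, not stated in the notebook):
# implied revenue = hires x price.
toy["Implied Revenue"] = toy["Hires"] * toy["Price"]

# sum() skips NaN by default, so the Business group sums to 0.0
# rather than staying missing.
by_segment = toy.groupby("Segments")["Implied Revenue"].sum()
print(by_segment)
```

If groups without any hires should stay missing instead of showing as zero, passing `min_count=1` to `sum()` is one option. Note also that `Price` mixes hourly and flat rates (see `Price Type`), so totals are only a rough proxy.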

In [3]:
# Get sum of implied revenue by segment
dfRevenuBySegment = df.groupby("Segments")["Implied Revenue"].sum().to_frame(
    name="Implied Revenue").reset_index()
dfRevenuBySegment["Contribution"] = dfRevenuBySegment[
    "Implied Revenue"] / dfRevenuBySegment["Implied Revenue"].sum()

# Get sum of implied revenue by services
dfRevenuByServices = df.groupby("Services")["Implied Revenue"].sum().to_frame(
    name="Implied Revenue").reset_index()
dfRevenuByServices["Contribution"] = dfRevenuByServices[
    "Implied Revenue"] / dfRevenuByServices["Implied Revenue"].sum()

# Get sum of implied revenue by state
# (the variable is named dfRevenuByRevenue, but the grouping key is State)
dfRevenuByRevenue = df.groupby("State")["Implied Revenue"].sum().to_frame(
    name="Implied Revenue").reset_index()
dfRevenuByRevenue["Contribution"] = dfRevenuByRevenue[
    "Implied Revenue"] / dfRevenuByRevenue["Implied Revenue"].sum()

In [4]:
# See the top segments by implied revenue (the [:9] slice keeps at most 9 rows)
dfRevenuBySegment.sort_values(by="Contribution", ascending=False)[:9]

Unnamed: 0,Segments,Implied Revenue,Contribution
0,Business,131620933.0,0.54
4,Wellness,64214887.0,0.26
1,Events,41374452.0,0.17
2,Home Improvement,7152116.0,0.03
3,Pets,539208.0,0.0


In [5]:
# See the top 9 services by implied revenue
dfRevenuByServices.sort_values(by="Contribution", ascending=False)[:9]

Unnamed: 0,Services,Implied Revenue,Contribution
15,Logo Design,83128150.0,0.34
3,Business Tax Preparation,32240677.0,0.13
17,Nutritionist,31362101.0,0.13
18,Personal Training,20585865.0,0.08
25,Wedding Officiant,17048744.0,0.07
7,DJ,14368575.0,0.06
5,Computer Repair,13555566.0,0.06
14,Life Coaching,8646424.0,0.04
20,Photo Booth Rental,6365706.0,0.03


In [6]:
# See the top 9 states by implied revenue
dfRevenuByRevenue.sort_values(by="Contribution", ascending=False)[:9]

Unnamed: 0,State,Implied Revenue,Contribution
4,California,17106219.0,0.07
32,New York,12204914.0,0.05
43,Texas,11845890.0,0.05
9,Florida,10422044.0,0.04
30,New Jersey,7972786.0,0.03
20,Maryland,7575091.0,0.03
6,Connecticut,7454398.0,0.03
7,Delaware,6708264.0,0.03
38,Pennsylvania,6468783.0,0.03
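
The segment, service, and state breakdowns above all follow one pattern: group, sum, then divide by the total. A minimal sketch of a reusable helper is shown below; `revenue_contribution` is a hypothetical name that is not part of the original notebook, and the data is a toy frame.

```python
import pandas as pd

def revenue_contribution(df, key, value="Implied Revenue"):
    """Sum `value` per `key` and add each group's share of the total."""
    out = df.groupby(key)[value].sum().to_frame(name=value).reset_index()
    out["Contribution"] = out[value] / out[value].sum()
    # Largest contributors first, matching the sorted views in the notebook.
    return out.sort_values(by="Contribution", ascending=False)

# Toy data for illustration only.
toy = pd.DataFrame({
    "Segments": ["Business", "Business", "Pets"],
    "State": ["Texas", "Ohio", "Texas"],
    "Implied Revenue": [300.0, 100.0, 100.0],
})

by_segment = revenue_contribution(toy, "Segments")
by_state = revenue_contribution(toy, "State")
print(by_segment)
print(by_state)
```

With the real data, the same call with `"Segments"`, `"Services"`, or `"State"` as the key would reproduce the three tables.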


**2. Do demand and supply for services vary from one region to another?**

Outputs to compute per region, by segment and service:
- \# services
- \# hires
- \# hires / \# services ratio
- List of sellers per region
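
The outputs listed above could be computed with a single grouped aggregation. The sketch below is one possible approach on toy data. It assumes that each row is one listing, that `Service` holds the seller name (as the sample rows suggest), and that `State` stands in for "region"; the output column names are illustrative.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; seller names are made up.
toy = pd.DataFrame({
    "State": ["Texas", "Texas", "Texas", "Ohio"],
    "Segments": ["Business", "Business", "Pets", "Business"],
    "Service": ["A Co", "B Co", "C Co", "D Co"],
    "Hires": [10.0, np.nan, 4.0, 6.0],
})

regional = (
    toy.groupby(["State", "Segments"])
    .agg(
        listings=("Service", "size"),  # number of services (rows) in the group
        hires=("Hires", "sum"),        # total hires; missing values are skipped
        sellers=("Service", lambda s: ", ".join(sorted(s.unique()))),
    )
    .reset_index()
)

# Hires per listed service: a rough demand-to-supply indicator.
regional["hires_per_listing"] = regional["hires"] / regional["listings"]
print(regional)
```

Adding `"Services"` to the group keys would give the same metrics at the service level.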

**3. Which segments are doing better or worse?**

**4. Is Thumbtack offering this many services because it has a low # hires / # sellers ratio?**

**5. Are there any segments or services that showcase more uncaptured potential?**

**6. Which types of sellers are the most successful: individuals or businesses? Does it depend on the type of service and/or the region?**