In [1]:
import pandas as pd
import numpy as np
from numpy import nan  # the `NaN` alias was removed in NumPy 2.0
import matplotlib
import matplotlib.pyplot as plt

pd.options.display.float_format = '{:,.2f}'.format

df = pd.read_csv("../output/all_services.csv", index_col=0, header=0, thousands=',')
df.sample(5)

Unnamed: 0,Segments,Services,Type,Service,Hires,Price,Price Type,Rating,Zip Code,Capital City,State,Abbreviation,Implied Revenue
11293,Home Improvement,Roof Repair or Maintenance,Main,"Innovative Home Concepts, Inc.",21.0,,,5.0,60176,Chicago,Illinois,IL,
44065,Events,Bartending,Main,Honeywater painting,,,,,33132,Miami,Florida,FL,
3164,Events,Wedding Officiant,Main,Stat's Mobile Notary,3.0,130.0,,3.0,87102,Albuquerque,New Mexico,NM,390.0
15913,Wellness,Massage Therapy,Main,Less-In-Pain Massage,,80.0,,5.0,21401,Annapolis,Maryland,MD,
25511,Events,Wedding and Event Makeup,Main,Svetlana,6.0,85.0,,5.0,33132,Miami,Florida,FL,510.0


In [2]:
df.shape

(12958, 13)
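In the sampled rows above, `Implied Revenue` equals `Hires` × `Price` wherever both are present (3 × 130 = 390 and 6 × 85 = 510) and is missing otherwise. The sketch below checks that relationship on a toy frame rebuilt from those five rows. Treating this as the column's definition is an assumption drawn from the sample, not something the dataset documents; `sample`, `recomputed`, `matches`, and `nan_consistent` are illustrative names.

```python
import numpy as np
import pandas as pd

# Toy frame rebuilt from the five sampled rows shown above (not the full dataset).
sample = pd.DataFrame({
    "Hires": [21.0, np.nan, 3.0, np.nan, 6.0],
    "Price": [np.nan, np.nan, 130.0, 80.0, 85.0],
    "Implied Revenue": [np.nan, np.nan, 390.0, np.nan, 510.0],
})

# Assumption: implied revenue = hires * price; a NaN in either input propagates.
recomputed = sample["Hires"] * sample["Price"]

# Rows with a stored revenue value should match the recomputed product.
known = sample["Implied Revenue"].notna()
matches = bool(np.isclose(recomputed[known], sample.loc[known, "Implied Revenue"]).all())

# Rows without a stored value should also be NaN after recomputation.
nan_consistent = bool(recomputed[~known].isna().all())
print(matches, nan_consistent)
```

Running the same comparison on the full `df` would show whether the relationship holds beyond the sample.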

**General questions:**
1. Which segments is Thumbtack focusing on? Or is Thumbtack consistent across all segments and services?
2. Do demand and supply for services vary from one region to another?
3. Which segments are doing better or worse?
4. Is Thumbtack offering this many services because it has a low # hires/# sellers rate?
5. Are there any segments or services that show more uncaptured potential?
6. What types of sellers are the most successful: individuals or businesses? Does it depend on the type of service and/or region?

**1. Which segments is Thumbtack focusing on?**

We start with a basic count of service listings per segment, along with each segment's share of the total.

In [3]:
# See ranking by count
servicesCount = df.groupby("Segments")["Segments"].count().to_frame(
    name="Count").reset_index()
servicesCount = servicesCount.sort_values(by="Count", ascending=False)
servicesCount["Contribution"] = servicesCount["Count"] / servicesCount["Count"].sum()
servicesCount[:9]

Unnamed: 0,Segments,Count,Contribution
2,Home Improvement,4996,0.39
1,Events,3489,0.27
5,Wellness,1618,0.12
0,Business,1256,0.1
3,Lessons,1059,0.08
4,Pets,540,0.04
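The same count-and-share table can be produced more compactly with `Series.value_counts`, which sorts in descending order by default and returns fractions when `normalize=True`. This is a minimal sketch on a hypothetical toy frame (`toy`, `counts`, `shares`, and `summary` are illustrative names, and the rows are made up), not a replacement for the cell above.

```python
import pandas as pd

# Hypothetical toy data standing in for df["Segments"].
toy = pd.DataFrame({"Segments": ["Events", "Events", "Pets", "Wellness"]})

# Number of rows per segment, largest first.
counts = toy["Segments"].value_counts()

# Fraction of all rows per segment (sums to 1).
shares = toy["Segments"].value_counts(normalize=True)

# Combine both into one table indexed by segment name.
summary = pd.DataFrame({"Count": counts, "Contribution": shares})
print(summary)
```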


Next, we look at implied revenue by segment, by service, and by state.

In [4]:
# Get sum of implied revenue by segment
dfRevenuBySegment = df.groupby("Segments")["Implied Revenue"].sum().to_frame(
    name="Implied Revenue").reset_index()
dfRevenuBySegment["Contribution"] = dfRevenuBySegment[
    "Implied Revenue"] / dfRevenuBySegment["Implied Revenue"].sum()

# Get sum of implied revenue by services
dfRevenuByServices = df.groupby("Services")["Implied Revenue"].sum().to_frame(
    name="Implied Revenue").reset_index()
dfRevenuByServices["Contribution"] = dfRevenuByServices[
    "Implied Revenue"] / dfRevenuByServices["Implied Revenue"].sum()

# Get sum of implied revenue by state
# (despite its name, dfRevenuByRevenue is grouped by "State")
dfRevenuByRevenue = df.groupby("State")["Implied Revenue"].sum().to_frame(
    name="Implied Revenue").reset_index()
dfRevenuByRevenue["Contribution"] = dfRevenuByRevenue[
    "Implied Revenue"] / dfRevenuByRevenue["Implied Revenue"].sum()

In [5]:
# Rank segments by implied revenue (the slice shows at most 9 rows)
dfRevenuBySegment.sort_values(by="Contribution", ascending=False)[:9]

Unnamed: 0,Segments,Implied Revenue,Contribution
1,Events,21783105.0,0.52
2,Home Improvement,10936278.0,0.26
0,Business,4122663.0,0.1
5,Wellness,3626438.0,0.09
3,Lessons,1309301.0,0.03
4,Pets,307411.0,0.01


In [6]:
# See top 9 services by implied revenue
dfRevenuByServices.sort_values(by="Contribution", ascending=False)[:9]

Unnamed: 0,Services,Implied Revenue,Contribution
54,Wedding Officiant,9057776.0,0.22
20,DJ,5589660.0,0.13
34,Photo Booth Rental,3617129.0,0.09
14,Bed Bug Extermination,3336025.0,0.08
24,House Cleaning,2330147.0,0.06
29,Massage Therapy,2184432.0,0.05
56,Wedding and Event Makeup,1802999.0,0.04
3,Appliance Installation,1388462.0,0.03
11,Bartending,1356599.0,0.03


In [7]:
# See top 9 states by implied revenue
dfRevenuByRevenue.sort_values(by="Contribution", ascending=False)[:9]

Unnamed: 0,State,Implied Revenue,Contribution
4,California,5858569.0,0.14
43,Texas,4041976.0,0.1
9,Florida,2428355.0,0.06
30,New Jersey,2178612.0,0.05
32,New York,1773589.0,0.04
25,Missouri,1608322.0,0.04
10,Georgia,1546945.0,0.04
13,Illinois,1379278.0,0.03
20,Maryland,1303163.0,0.03
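The three rankings above all follow the same pattern: group by a column, sum `Implied Revenue`, and divide by the grand total. One way to avoid repeating it is a small helper. `revenue_by` is a hypothetical name, and the toy rows are made up for illustration.

```python
import pandas as pd

def revenue_by(frame, column):
    """Sum 'Implied Revenue' per value of `column` and add each group's share."""
    out = (
        frame.groupby(column)["Implied Revenue"]
        .sum()  # NaN revenues are skipped, so an all-NaN group sums to 0
        .to_frame(name="Implied Revenue")
        .reset_index()
    )
    out["Contribution"] = out["Implied Revenue"] / out["Implied Revenue"].sum()
    return out.sort_values(by="Contribution", ascending=False)

# Hypothetical toy data with the same column names as df.
toy = pd.DataFrame({
    "Segments": ["Events", "Events", "Wellness", "Pets"],
    "State": ["Florida", "New Mexico", "Maryland", "Florida"],
    "Implied Revenue": [510.0, 390.0, 100.0, None],
})

by_segment = revenue_by(toy, "Segments")
by_state = revenue_by(toy, "State")
print(by_segment)
```

With the real data, `revenue_by(df, "Segments")`, `revenue_by(df, "Services")`, and `revenue_by(df, "State")` would correspond to the three frames built in cell 4.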


**2. Do demand and supply for services vary from one region to another?**

Outputs per region, broken down by segment and service:
- \# services
- \# hires
- \# hires/\# services rate
- List of sellers per region
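The outputs listed above could be computed with a single named aggregation. This is a rough sketch under several assumptions: "# services" is read as the number of listings (rows) per group, missing `Hires` values count as zero hires, the `Service` column holds the seller name, and `State` stands in for "region". The toy rows reuse seller names from the sample in cell 1; `regional` and the output column names are illustrative.

```python
import numpy as np
import pandas as pd

# Toy rows with the same column names as df (seller names taken from the sample).
toy = pd.DataFrame({
    "State": ["Florida", "Florida", "New Mexico", "Illinois"],
    "Segments": ["Events", "Events", "Events", "Home Improvement"],
    "Service": ["Honeywater painting", "Svetlana",
                "Stat's Mobile Notary", "Innovative Home Concepts, Inc."],
    "Hires": [np.nan, 6.0, 3.0, 21.0],
})

regional = (
    toy.groupby(["State", "Segments"])
    .agg(
        listings=("Service", "size"),   # assumption: one row = one listing
        hires=("Hires", "sum"),         # NaN hires are skipped, i.e. treated as 0
        sellers=("Service", lambda s: "; ".join(sorted(s.unique()))),
    )
    .reset_index()
)

# Hires per listing as a simple demand/supply ratio for each group.
regional["hires_per_listing"] = regional["hires"] / regional["listings"]
print(regional)
```

Adding `"Services"` to the groupby keys would give the same breakdown at the service level.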

**3. Which segments are doing better or worse?**

**4. Is Thumbtack offering this many services because it has a low # hires/# sellers rate?**

**5. Are there any segments or services that show more uncaptured potential?**

**6. What types of sellers are the most successful: individuals or businesses? Does it depend on the type of service and/or region?**