# Market Overlap

### SCENARIO

The scenario: a fictitious company named 'Alpha' is interested in acquiring another fictitious company named 'Beta'. Both belong to the hospitality industry; more specifically, they are marketplaces where hotels list their accommodation details and handle the whole booking process with travellers.

### UNDERSTANDING THE COMPANIES

__Alpha__ offers hotels in destinations worldwide. It entered the Brazilian market recently and has been investing in Facebook and Google Ads to get more hotels to publish on its marketplace. However, the CAC (customer acquisition cost) of bringing hotels on board is too high, and after several weeks of trying to improve performance, Alpha's board has decided to look for other ways of increasing the number of hotels on its marketplace.

__Beta__, on the other hand, operates only in Brazil. Although it has a smaller scale than Alpha overall, it already has a considerable number of hotels on board and operating.

### WHY THE MARKET OVERLAP ANALYSIS

Seeking a way to reduce the CAC, Alpha has made a move to buy Beta with all its hotels and marketplace service. Even though Beta has shown interest in the deal, Alpha still wants to look at some real data to determine whether the CAC would really be lower than with its current marketing investments.

As of now, neither company knows whether its hotels are unique to it or shared with the other, as it is common practice among hotels to publish their accommodations on several marketplaces.
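One way to picture the question, as a minimal sketch: if every hotel carried an identifier shared across marketplaces, the overlap would simply be the intersection of the two ID sets. The ID values and the `overlap_summary` helper below are hypothetical, invented for illustration only.

```python
def overlap_summary(alpha_ids, beta_ids):
    """Count hotels unique to each marketplace and hotels listed on both."""
    alpha, beta = set(alpha_ids), set(beta_ids)
    new_to_alpha = beta - alpha
    return {
        "only_alpha": len(alpha - beta),
        "only_beta": len(new_to_alpha),
        "shared": len(alpha & beta),
        # Share of Beta's hotels that Alpha does not already have
        "new_to_alpha_pct": 100 * len(new_to_alpha) / len(beta) if beta else 0.0,
    }


# Hypothetical IDs: 103 and 104 are listed on both marketplaces
summary = overlap_summary([101, 102, 103, 104], [103, 104, 105])
print(summary)
```

The fewer of Beta's hotels that are new to Alpha, the fewer hotels the acquisition actually adds, which is what drives the effective acquisition cost per hotel.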

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [7]:
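# NOTE: this dtype mapping is defined but never passed to read_csv,
# so 'id' is still inferred as int64 (see the df.info() output).
dtype = {
    'id': str
}
df = pd.read_csv('alpha_hotels.csv')   # Alpha's hotel listings
df2 = pd.read_csv('Beta Hotels.csv')   # Beta's hotel listings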

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1843 entries, 0 to 1842
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   id                 1843 non-null   int64 
 1   hotel_name         1843 non-null   object
 2   address            1843 non-null   object
 3   city               1843 non-null   object
 4   country            1843 non-null   object
 5   registration_date  1843 non-null   object
 6   latest_booking     1843 non-null   object
 7   total_bookings     1843 non-null   int64 
dtypes: int64(2), object(6)
memory usage: 115.3+ KB


In [4]:
df.head()

Unnamed: 0,id,hotel_name,address,city,country,registration_date,latest_booking,total_bookings
0,1,Altenwerth-Wilderman,55 Buhler Place,São Paulo,BR,2021-01-01,2021-05-05,74
1,2,Mayer LLC,78279 Brown Crossing,Santos,BR,2021-01-01,2021-06-21,329
2,3,Durgan-Ullrich,5 Independence Place,Fortaleza,BR,2021-01-01,2021-02-24,352
3,4,Johnston-Osinski,8230 Warbler Plaza,Fortaleza,BR,2021-01-01,2021-08-23,572
4,5,"Simonis, Grimes and Okuneva",48 Dahle Terrace,Brasilia,BR,2021-01-01,2021-09-19,547


In [8]:
df2.head()

Unnamed: 0,Id,Hotel Name,Address,City State,Registration Date,Last Booking,Total Bookings
0,1,Smith-West,5 Logan Center,Porto Alegre - RS,2019-05-15,2021-04-12,204
1,2,Skiles-Feil,418 Luster Street,Brasilia - DF,2019-05-28,2021-09-27,673
2,3,"Hills, Welch and Bernier",16 Superior Pass,Brasilia - DF,2019-06-02,2021-04-29,540
3,4,Feeney-Tillman,42135 Di Loreto Crossing,Porto Alegre - RS,2019-06-17,2021-09-29,316
4,5,Nitzsche Inc,11432 Westport Center,Fortaleza - CE,2019-06-21,2021-05-24,623


In [20]:
# Align Beta's key column names with Alpha's so the two frames can be stacked
df2.rename({'Hotel Name': 'hotel_name', 'Id': 'id'}, axis=1, inplace=True)

In [21]:
# Stack the hotel names from both marketplaces into one frame ('nomes' = 'names')
nomes = pd.concat([df[['id', 'hotel_name']], df2[['id', 'hotel_name']]], ignore_index=True)

In [22]:
nomes.head()

Unnamed: 0,id,hotel_name
0,1,Altenwerth-Wilderman
1,2,Mayer LLC
2,3,Durgan-Ullrich
3,4,Johnston-Osinski
4,5,"Simonis, Grimes and Okuneva"


In [23]:
# Keep one row per distinct hotel name (the first occurrence wins)
nomes.drop_duplicates('hotel_name', inplace=True, ignore_index=True)

In [25]:
import random

In [26]:
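# Draw one 12-digit ID per hotel name, sampled without replacement so IDs are unique.
# No seed is set, so the values differ on every run.
nomes['registration_id'] = random.sample(range(150000000000, 950000000000), nomes.shape[0])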

In [29]:
nomes['registration_id']

0       439781659462
1       686594413636
2       916600154620
3       343118084017
4       598986938353
            ...     
3278    416225475038
3279    924811998291
3280    771365418194
3281    187119987662
3282    764530750402
Name: registration_id, Length: 3283, dtype: int64

In [30]:
nomes['registration_id'].value_counts()

439781659462    1
185792273369    1
407854512500    1
476911920228    1
383539793438    1
               ..
499638172372    1
592848393085    1
500359728211    1
383916749435    1
764530750402    1
Name: registration_id, Length: 3283, dtype: int64
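The `value_counts()` output above shows every ID occurring exactly once. A more direct check is `Series.is_unique`, and a dedicated `random.Random` instance with a fixed seed would make the IDs reproducible between runs. The following is only a sketch on a toy frame: the seed value `42` and the frame `names` are assumptions, not part of the original notebook.

```python
import random

import pandas as pd

# Toy stand-in for the deduplicated name list; hotel names taken from the samples above
names = pd.DataFrame({"hotel_name": ["Altenwerth-Wilderman", "Mayer LLC", "Smith-West"]})

# A seeded generator yields the same IDs on every run (the seed value is arbitrary)
rng = random.Random(42)

# Sampling without replacement from the range guarantees distinct IDs
names["registration_id"] = rng.sample(range(150_000_000_000, 950_000_000_000), len(names))

# Direct uniqueness check instead of scanning value_counts() by eye
print(names["registration_id"].is_unique)
```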

In [41]:
# Attach the generated registration_id to every Alpha hotel by matching on its name
df_alpha = pd.merge(left=df, right=nomes, how='inner', on='hotel_name')
df_alpha.head()

Unnamed: 0,id_x,hotel_name,address,city,country,registration_date,latest_booking,total_bookings,id_y,registration_id
0,1,Altenwerth-Wilderman,55 Buhler Place,São Paulo,BR,2021-01-01,2021-05-05,74,1,439781659462
1,2,Mayer LLC,78279 Brown Crossing,Santos,BR,2021-01-01,2021-06-21,329,2,686594413636
2,1215,Mayer LLC,1 Doe Crossing Street,Santos,BR,2021-12-25,2022-04-27,439,2,686594413636
3,3,Durgan-Ullrich,5 Independence Place,Fortaleza,BR,2021-01-01,2021-02-24,352,3,916600154620
4,4,Johnston-Osinski,8230 Warbler Plaza,Fortaleza,BR,2021-01-01,2021-08-23,572,4,343118084017


In [42]:
df_alpha.shape

(1843, 10)

In [43]:
df.shape

(1843, 8)
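The matching row counts above indicate that the merge neither dropped nor duplicated hotels. pandas can also enforce this at merge time: `validate='many_to_one'` raises `pandas.errors.MergeError` if a hotel name appears more than once on the right-hand side. Below is a sketch on three rows taken from the output above; the frame names `hotels`, `lookup` and `dup_lookup` are illustrative.

```python
import pandas as pd

# Three Alpha rows from the output above; 'Mayer LLC' legitimately appears twice
hotels = pd.DataFrame({
    "id": [1, 2, 1215],
    "hotel_name": ["Altenwerth-Wilderman", "Mayer LLC", "Mayer LLC"],
})

# One registration_id per distinct hotel name
lookup = pd.DataFrame({
    "hotel_name": ["Altenwerth-Wilderman", "Mayer LLC"],
    "registration_id": [439781659462, 686594413636],
})

# many_to_one: names may repeat on the left but must be unique on the right
merged = hotels.merge(lookup, on="hotel_name", how="inner", validate="many_to_one")
print(len(merged) == len(hotels))

# A lookup with a duplicated name is rejected instead of silently multiplying rows
dup_lookup = pd.concat([lookup, lookup.iloc[[1]]], ignore_index=True)
try:
    hotels.merge(dup_lookup, on="hotel_name", validate="many_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True
print(raised)
```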

In [44]:
df_alpha.drop('id_y', axis=1, inplace=True)
df_alpha.rename({'id_x': 'id'}, axis=1, inplace=True)

In [49]:
df_alpha.head()

Unnamed: 0,id,hotel_name,address,city,country,registration_date,latest_booking,total_bookings,registration_id
0,1,Altenwerth-Wilderman,55 Buhler Place,São Paulo,BR,2021-01-01,2021-05-05,74,439781659462
1,2,Mayer LLC,78279 Brown Crossing,Santos,BR,2021-01-01,2021-06-21,329,686594413636
2,3,Durgan-Ullrich,5 Independence Place,Fortaleza,BR,2021-01-01,2021-02-24,352,916600154620
3,4,Johnston-Osinski,8230 Warbler Plaza,Fortaleza,BR,2021-01-01,2021-08-23,572,343118084017
4,5,"Simonis, Grimes and Okuneva",48 Dahle Terrace,Brasilia,BR,2021-01-01,2021-09-19,547,598986938353


In [48]:
df_alpha.sort_values('id', ignore_index=True, inplace=True)
df_alpha.to_csv('alpha_hotels_fixed.csv', index=False)

In [53]:
# Attach the same registration_id to every Beta hotel by matching on its name
df_beta = pd.merge(left=df2, right=nomes, how='inner', on='hotel_name')
df_beta.head()

Unnamed: 0,id_x,hotel_name,Address,City State,Registration Date,Last Booking,Total Bookings,id_y,registration_id
0,1,Smith-West,5 Logan Center,Porto Alegre - RS,2019-05-15,2021-04-12,204,13,541486358072
1,2,Skiles-Feil,418 Luster Street,Brasilia - DF,2019-05-28,2021-09-27,673,9,402349901939
2,3,"Hills, Welch and Bernier",16 Superior Pass,Brasilia - DF,2019-06-02,2021-04-29,540,57,270040626150
3,4,Feeney-Tillman,42135 Di Loreto Crossing,Porto Alegre - RS,2019-06-17,2021-09-29,316,10,509930155842
4,5,Nitzsche Inc,11432 Westport Center,Fortaleza - CE,2019-06-21,2021-05-24,623,7,929930403364


In [54]:
df_beta.drop('id_y', axis=1, inplace=True)
df_beta.rename({'id_x': 'Id', 'registration_id': 'Registration Id', 'hotel_name': 'Hotel Name'}, axis=1, inplace=True)
df_beta.sort_values('Id', ignore_index=True, inplace=True)

In [55]:
df_beta.head()

Unnamed: 0,Id,Hotel Name,Address,City State,Registration Date,Last Booking,Total Bookings,Registration Id
0,1,Smith-West,5 Logan Center,Porto Alegre - RS,2019-05-15,2021-04-12,204,541486358072
1,2,Skiles-Feil,418 Luster Street,Brasilia - DF,2019-05-28,2021-09-27,673,402349901939
2,3,"Hills, Welch and Bernier",16 Superior Pass,Brasilia - DF,2019-06-02,2021-04-29,540,270040626150
3,4,Feeney-Tillman,42135 Di Loreto Crossing,Porto Alegre - RS,2019-06-17,2021-09-29,316,509930155842
4,5,Nitzsche Inc,11432 Westport Center,Fortaleza - CE,2019-06-21,2021-05-24,623,929930403364


In [56]:
df_beta.to_csv('beta_hotels_fixed.csv', index=False)
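With both files now carrying a registration ID, the overlap itself could be computed along the following lines. This is a sketch on made-up rows, not the notebook's result: only the column names `registration_id` and `Registration Id` come from the files written above, while the hotel names and ID values are invented.

```python
import pandas as pd

# Invented sample rows that mimic the columns of the two exported files
alpha = pd.DataFrame({
    "hotel_name": ["Hotel A", "Hotel B", "Hotel C"],
    "registration_id": [111, 222, 333],
})
beta = pd.DataFrame({
    "Hotel Name": ["Hotel B", "Hotel C", "Hotel D", "Hotel E"],
    "Registration Id": [222, 333, 444, 555],
})

# An outer merge keeps every hotel; indicator=True adds a '_merge' column that
# says whether the row came from Alpha only, Beta only, or both
merged = alpha.merge(
    beta,
    how="outer",
    left_on="registration_id",
    right_on="Registration Id",
    indicator=True,
)

counts = merged["_merge"].value_counts().to_dict()
print(counts)

# Share of Beta's hotels that Alpha already lists
overlap_pct = 100 * counts["both"] / len(beta)
print(overlap_pct)
```

On the real data, `alpha` and `beta` would instead be loaded with `pd.read_csv` from `alpha_hotels_fixed.csv` and `beta_hotels_fixed.csv`.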