# Introduction

[BusinessFinancing.co.uk](https://businessfinancing.co.uk/the-oldest-company-in-almost-every-country/) researched the oldest company still in business in almost every country and compiled the results into a dataset. The data is split across four files:
1. **businesses.csv**
2. **new_businesses.csv**
3. **categories.csv**
4. **countries.csv**

To better understand these historic businesses, we would like to know: what characteristics enable a business to stand the test of time?

Now let's inspect the data before moving on.

In [1]:
# Import the pandas library under its usual alias 
import pandas as pd

# Read and display all data
businesses = pd.read_csv('../input/oldest-businesses/businesses.csv')
new_businesses = pd.read_csv('../input/oldest-businesses/new_businesses.csv')
countries = pd.read_csv('../input/oldest-businesses/countries.csv')
categories = pd.read_csv('../input/oldest-businesses/categories.csv')

display(businesses.head(), new_businesses.head(), countries.head(), categories.head())

Unnamed: 0,business,year_founded,category_code,country_code
0,Hamoud Boualem,1878,CAT11,DZA
1,Communauté Électrique du Bénin,1968,CAT10,BEN
2,Botswana Meat Commission,1965,CAT1,BWA
3,Air Burkina,1967,CAT2,BFA
4,Brarudi,1955,CAT9,BDI


Unnamed: 0,business,year_founded,category_code,country_code
0,Fiji Times,1869,CAT13,FJI
1,J. Armando Bermúdez & Co.,1852,CAT9,DOM


Unnamed: 0,country_code,country,continent
0,AFG,Afghanistan,Asia
1,AGO,Angola,Africa
2,ALB,Albania,Europe
3,AND,Andorra,Europe
4,ARE,United Arab Emirates,Asia


Unnamed: 0,category_code,category
0,CAT1,Agriculture
1,CAT2,Aviation & Transport
2,CAT3,Banking & Finance
3,CAT4,"Cafés, Restaurants & Bars"
4,CAT5,Conglomerate


# 1. Concatenate & Sort for Oldest Businesses

There are two business datasets, and we want to combine them into a single DataFrame so that later analysis covers every business. Then we will sort by founding year to find the oldest businesses.

In [2]:
# Concatenate businesses and new_businesses
all_businesses = pd.concat([businesses, new_businesses])

# Sort businesses from oldest businesses to youngest
sorted_businesses = all_businesses.sort_values('year_founded')

# Display the first few lines of sorted_businesses
sorted_businesses.head()

Unnamed: 0,business,year_founded,category_code,country_code
64,Kongō Gumi,578,CAT6,JPN
94,St. Peter Stifts Kulinarium,803,CAT4,AUT
107,Staffelter Hof Winery,862,CAT9,DEU
106,Monnaie de Paris,864,CAT12,FRA
103,The Royal Mint,886,CAT12,GBR
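
One detail worth knowing about the concatenation above: by default `pd.concat` keeps each input's own row labels, so labels can repeat in the combined frame. The following is a minimal sketch on toy data (the `*_demo` frames are hypothetical stand-ins built from rows shown earlier, not the real files) that compares the default with `ignore_index=True`.

```python
import pandas as pd

# Hypothetical stand-ins for businesses and new_businesses
businesses_demo = pd.DataFrame({
    "business": ["Hamoud Boualem", "Brarudi"],
    "year_founded": [1878, 1955],
})
new_businesses_demo = pd.DataFrame({
    "business": ["Fiji Times", "J. Armando Bermúdez & Co."],
    "year_founded": [1869, 1852],
})

# Default behaviour: each frame keeps its own labels, so 0 and 1 appear twice
stacked = pd.concat([businesses_demo, new_businesses_demo])

# ignore_index=True renumbers the combined rows from 0 to n-1
stacked_clean = pd.concat([businesses_demo, new_businesses_demo], ignore_index=True)

# Sort ascending by founding year, oldest first
oldest_first = stacked_clean.sort_values("year_founded")
print(oldest_first)
```

Repeated labels are harmless for sorting and merging, but they matter for any later step that looks rows up by label.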


# 2. The Oldest Businesses in North America

So far we've learned that Kongō Gumi is the world's oldest continuously operating business, beating out the second oldest business by more than 200 years! It's a little hard to read the country codes, though. Wouldn't it be nice if we had a list of country names to go along with the country codes?

Having useful information in different files is a common problem: for data storage, it's better to keep different types of data separate, but for analysis, we want all the data in one place. To solve this, we'll have to join the two tables together.

In [3]:
# Merge sorted_businesses with countries
businesses_countries = sorted_businesses.merge(countries, on='country_code')

# Filter businesses_countries to include countries in North America only
north_america = businesses_countries.query('continent == "North America"')
north_america.head()

Unnamed: 0,business,year_founded,category_code,country_code,country,continent
22,La Casa de Moneda de México,1534,CAT12,MEX,Mexico,North America
28,Shirley Plantation,1638,CAT1,USA,United States,North America
33,Hudson's Bay Company,1670,CAT17,CAN,Canada,North America
35,Mount Gay Rum,1703,CAT9,BRB,Barbados,North America
40,Rose Hall,1770,CAT19,JAM,Jamaica,North America
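
The join above relies on each country code appearing only once in the countries table. As a sketch under that assumption, the toy example below (the `biz` and `ctry` frames are hypothetical, built from values shown in this notebook) passes `validate="many_to_one"` so that pandas raises an error if the assumption is violated, then applies the same `.query()` filter.

```python
import pandas as pd

# Hypothetical miniature versions of the businesses and countries tables
biz = pd.DataFrame({
    "business": ["Shirley Plantation", "Kongō Gumi", "Hudson's Bay Company"],
    "year_founded": [1638, 578, 1670],
    "country_code": ["USA", "JPN", "CAN"],
})
ctry = pd.DataFrame({
    "country_code": ["USA", "JPN", "CAN"],
    "country": ["United States", "Japan", "Canada"],
    "continent": ["North America", "Asia", "North America"],
})

# validate="many_to_one" raises a MergeError if country_code repeats in ctry
joined = biz.merge(ctry, on="country_code", validate="many_to_one")

# Keep only the rows whose continent is North America
north_america_demo = joined.query('continent == "North America"')
print(north_america_demo)
```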


# 3. The Oldest Business on Each Continent
Now we can see that the oldest company in North America is La Casa de Moneda de México, founded in 1534. Why stop there, though, when we could easily find out the oldest business on every continent?

In [4]:
# Create continent, which lists only the continent and oldest year_founded
continent = businesses_countries.groupby('continent', as_index=False).agg({'year_founded':'min'})

# Merge continent with businesses_countries on both keys, so that a business
# founded in the same year on a different continent cannot match by accident
merged_continent = continent.merge(businesses_countries, on=['continent', 'year_founded'])

# Subset merged_continent so that only the four columns of interest are included
subset_merged_continent = merged_continent[['continent', 'country', 'business', 'year_founded']]
subset_merged_continent

Unnamed: 0,continent,country,business,year_founded
0,Africa,Mauritius,Mauritius Post,1772
1,Asia,Japan,Kongō Gumi,578
2,Europe,Austria,St. Peter Stifts Kulinarium,803
3,North America,Mexico,La Casa de Moneda de México,1534
4,Oceania,Australia,Australia Post,1809
5,South America,Peru,Casa Nacional de Moneda,1565
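
An alternative to the aggregate-then-merge pattern is to ask each group for the row label of its minimum and select those rows directly. The sketch below uses `idxmin` on toy data (the `demo` frame is hypothetical, assembled from businesses that appear in this notebook). It assumes the index labels are unique, which is worth checking on any frame built with a plain `pd.concat`.

```python
import pandas as pd

# Hypothetical sample with two businesses per continent
demo = pd.DataFrame({
    "continent": ["Asia", "Asia", "Europe", "Europe"],
    "country": ["Japan", "China", "Austria", "Ireland"],
    "business": [
        "Kongō Gumi",
        "Ma Yu Ching's Bucket Chicken House",
        "St. Peter Stifts Kulinarium",
        "Sean's Bar",
    ],
    "year_founded": [578, 1153, 803, 900],
})

# For each continent, find the row label of the earliest founding year
oldest_idx = demo.groupby("continent")["year_founded"].idxmin()

# Select those rows and the columns of interest in one step
oldest_per_continent = demo.loc[
    oldest_idx, ["continent", "country", "business", "year_founded"]
]
print(oldest_per_continent)
```

Unlike the merge, this returns exactly one row per continent even when two businesses share the earliest year.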


# 4. Unknown oldest businesses
BusinessFinancing.co.uk wasn't able to determine the oldest business for some countries, and those countries are simply left out of the businesses data. However, the countries DataFrame that we loaded does include all countries in the world, regardless of whether the oldest business is known.

We can compare the two datasets in one DataFrame to find out which countries don't have a known oldest business!

In [5]:
# Use .merge() to create a DataFrame, all_countries
all_countries = all_businesses.merge(countries, on='country_code', how='right', indicator=True)

# Filter to include only countries without oldest businesses
missing_countries = all_countries[all_countries['_merge'] != 'both']

# Create a series of the country names with missing oldest business data
missing_countries_series = missing_countries['country']

# Display the series
print(missing_countries_series)

1                                Angola
7                   Antigua and Barbuda
18                              Bahamas
50                              Ecuador
59      Micronesia, Federated States of
63                                Ghana
65                               Gambia
69                              Grenada
79            Iran, Islamic Republic of
89                           Kyrgyzstan
91                             Kiribati
92                Saint Kitts and Nevis
107                              Monaco
108                Moldova, Republic of
110                            Maldives
112                    Marshall Islands
131                               Nauru
138                               Palau
139                    Papua New Guinea
143                            Paraguay
144                 Palestine, State of
153                     Solomon Islands
160                            Suriname
170                          Tajikistan
171                        Turkmenistan
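
To see what `indicator=True` contributes, here is a minimal sketch on toy data (the `biz` and `ctry` frames are hypothetical). With `how='right'`, every country is kept, and the extra `_merge` column records whether a matching business was found.

```python
import pandas as pd

# Hypothetical tables: Angola has no business on record
biz = pd.DataFrame({
    "business": ["Kongō Gumi", "The Royal Mint"],
    "country_code": ["JPN", "GBR"],
})
ctry = pd.DataFrame({
    "country_code": ["AGO", "GBR", "JPN"],
    "country": ["Angola", "United Kingdom", "Japan"],
})

# A right join keeps every row of ctry; _merge is "both" or "right_only"
merged = biz.merge(ctry, on="country_code", how="right", indicator=True)

# Countries that found no partner in biz
no_match = merged.loc[merged["_merge"] == "right_only", "country"]
print(merged)
print(no_match)
```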


# 5. Count Missing Countries in each Continent
It looks like we've got some holes in our dataset! Let's find out how many countries on each continent are missing an oldest business.

In [6]:
# Group by continent and create a "count_missing" column
count_missing = missing_countries.groupby('continent')[['country']].agg('count')

# Change column name to count_missing 
count_missing.columns = ['count_missing']

# Sort for the continent with most missing data
count_missing.sort_values('count_missing', ascending = False)

continent,count_missing
Oceania,10
Asia,7
North America,5
Africa,3
South America,3
Europe,2
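
The same tally can be sketched more compactly with `value_counts`, which counts and sorts in descending order in one call. The `missing_demo` frame below is a hypothetical sample, not the full list of missing countries.

```python
import pandas as pd

# Hypothetical sample of countries without a known oldest business
missing_demo = pd.DataFrame({
    "country": ["Angola", "Ghana", "Gambia", "Maldives", "Tajikistan", "Monaco"],
    "continent": ["Africa", "Africa", "Africa", "Asia", "Asia", "Europe"],
})

# value_counts tallies each continent and sorts from most to least frequent
counts = missing_demo["continent"].value_counts()

# Turn the Series into a one-column DataFrame named count_missing
count_missing_demo = counts.to_frame("count_missing")
print(count_missing_demo)
```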


# 6. The Oldest Industries
Remember our oldest business in the world, Kongō Gumi?

We know Kongō Gumi was founded in the year 578 in Japan, but it's a little hard to decipher which industry it's in.

Let's use categories to understand how many oldest businesses are in each category of industry.

In [7]:
# Merge all_businesses with categories
businesses_categories = all_businesses.merge(categories, on='category_code')

# Create a DataFrame which lists the number of oldest businesses in each category
count_business_cats = businesses_categories.groupby('category')[['business']].agg('count')

# Rename the column and display the five categories with the most businesses
count_business_cats.columns = ['count']

display(count_business_cats.sort_values('count', ascending = False).head())

category,count
Banking & Finance,37
"Distillers, Vintners, & Breweries",23
Aviation & Transport,19
Postal Service,16
Manufacturing & Production,15
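
The count-then-rename steps above can also be written as a single named aggregation, which sets the output column name at the point where the count is computed. This is a sketch on a hypothetical `demo` frame whose rows are taken from businesses shown in this notebook.

```python
import pandas as pd

# Hypothetical sample: three mints and one construction company
demo = pd.DataFrame({
    "business": [
        "Monnaie de Paris",
        "The Royal Mint",
        "La Casa de Moneda de México",
        "Kongō Gumi",
    ],
    "category": [
        "Manufacturing & Production",
        "Manufacturing & Production",
        "Manufacturing & Production",
        "Construction",
    ],
})

# Named aggregation: output column "count" = count of the "business" column
counts = (
    demo.groupby("category")
    .agg(count=("business", "count"))
    .sort_values("count", ascending=False)
)
print(counts)
```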


# 7. Restaurant Representation
Judging by this count, Banking & Finance looks like an excellent industry to be in if longevity is our goal! Let's zoom in on another industry: cafés, restaurants, and bars. Which restaurants in our dataset have been around since before the year 1800?

In [8]:
# Filter using .query() for CAT4 businesses founded before 1800
old_restaurants = businesses_categories.query('year_founded < 1800 and category_code == "CAT4"')

# Sort the DataFrame
old_restaurants = old_restaurants.sort_values('year_founded')
display(old_restaurants)

Unnamed: 0,business,year_founded,category_code,country_code,category
144,St. Peter Stifts Kulinarium,803,CAT4,AUT,"Cafés, Restaurants & Bars"
145,Sean's Bar,900,CAT4,IRL,"Cafés, Restaurants & Bars"
141,Ma Yu Ching's Bucket Chicken House,1153,CAT4,CHN,"Cafés, Restaurants & Bars"
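
For readers less familiar with `.query()`, the same filter can be expressed as a boolean mask. The sketch below, on a hypothetical `demo` frame built from rows shown in this notebook, runs both forms and produces the same rows.

```python
import pandas as pd

# Hypothetical sample mixing restaurants with other categories
demo = pd.DataFrame({
    "business": [
        "St. Peter Stifts Kulinarium",
        "Sean's Bar",
        "Staffelter Hof Winery",
        "Fiji Times",
    ],
    "year_founded": [803, 900, 862, 1869],
    "category_code": ["CAT4", "CAT4", "CAT9", "CAT13"],
})

# Form 1: the string expression used in the cell above
via_query = demo.query('year_founded < 1800 and category_code == "CAT4"')

# Form 2: an explicit boolean mask; each comparison needs its own parentheses
mask = (demo["year_founded"] < 1800) & (demo["category_code"] == "CAT4")
via_mask = demo[mask]

print(via_mask)
```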


# 8. Oldest Industries
St. Peter Stifts Kulinarium is old enough that the restaurant is believed to have served Mozart - and it would have been over 900 years old even when he was a patron! Let's finish by looking at the founding year of the oldest business in each category of commerce.

In [9]:
# Merge all businesses, countries, and categories together
businesses_categories_countries = businesses_categories.merge(countries, on='country_code')

# Sort businesses_categories_countries from oldest to most recent
businesses_categories_countries = businesses_categories_countries.sort_values('year_founded')

# Find the earliest founding year in each category
oldest_category = businesses_categories_countries.groupby('category')[['year_founded']].min().sort_values('year_founded')
display(oldest_category.head())

category,year_founded
Construction,578
"Cafés, Restaurants & Bars",803
"Distillers, Vintners, & Breweries",862
Manufacturing & Production,864
Agriculture,1218
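
The cell above groups by category alone. To break the result down by continent as well, one option is to pass both columns to `groupby`. The following is a sketch on a hypothetical `demo` frame (rows drawn from businesses shown in this notebook), not output from the full dataset.

```python
import pandas as pd

# Hypothetical sample spanning two continents and three categories
demo = pd.DataFrame({
    "continent": ["Asia", "Europe", "Europe", "Europe", "Asia"],
    "category": [
        "Construction",
        "Cafés, Restaurants & Bars",
        "Cafés, Restaurants & Bars",
        "Manufacturing & Production",
        "Cafés, Restaurants & Bars",
    ],
    "business": [
        "Kongō Gumi",
        "St. Peter Stifts Kulinarium",
        "Sean's Bar",
        "Monnaie de Paris",
        "Ma Yu Ching's Bucket Chicken House",
    ],
    "year_founded": [578, 803, 900, 864, 1153],
})

# Grouping on two keys yields one row per (continent, category) pair
oldest_by_continent_category = demo.groupby(["continent", "category"]).agg(
    year_founded=("year_founded", "min")
)
print(oldest_by_continent_category)
```

The result carries a two-level index, so a single value can be looked up with a tuple such as `("Europe", "Cafés, Restaurants & Bars")`.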
