**1. The oldest businesses in the world**

Staffelter Hof Winery is Germany's oldest business, established in 862 under the Carolingian dynasty. It has continued to serve customers through dramatic changes in Europe, spanning the Holy Roman Empire, the Ottoman Empire, and both world wars. What characteristics enable a business to stand the test of time?

To help answer this question, BusinessFinancing.co.uk researched the oldest company still in business in almost every country and compiled the results into a dataset. Let's explore this work to better understand these historic businesses. Our datasets contain the following information:


In [None]:
# Importing the pandas library
import pandas as pd

# Loading the businesses.csv file as a DataFrame called businesses
businesses = pd.read_csv('businesses.csv')

# Loading the categories.csv file as a DataFrame called categories
categories = pd.read_csv('categories.csv')

# Loading the countries.csv file as a DataFrame called countries
countries = pd.read_csv('countries.csv')

In [None]:
# Displaying business information
businesses.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 163 entries, 0 to 162
Data columns (total 4 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   business       163 non-null    object
 1   year_founded   163 non-null    int64 
 2   category_code  163 non-null    object
 3   country_code   163 non-null    object
dtypes: int64(1), object(3)
memory usage: 5.2+ KB


  **column : meaning**

*business* : Name of the business.

*year_founded* : Year the business was founded.

*category_code*	:	Code for the category of the business.

*country_code*	:	ISO 3166-1 3-letter country code.

In [None]:
# Displaying information about categories
categories.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19 entries, 0 to 18
Data columns (total 2 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   category_code  19 non-null     object
 1   category       19 non-null     object
dtypes: object(2)
memory usage: 432.0+ bytes


  **column : meaning**

*category_code* : Code for the category of the business.

*category* : Description of the business category.

In [None]:
# Displaying information about countries
countries.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 195 entries, 0 to 194
Data columns (total 3 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   country_code  195 non-null    object
 1   country       195 non-null    object
 2   continent     195 non-null    object
dtypes: object(3)
memory usage: 4.7+ KB


  **column : meaning**

*country_code* : ISO 3166-1 3-letter country code.

*country* : Name of the country.

*continent* : Name of the continent that the country is in.

In [None]:
# Sorting businesses from oldest businesses to youngest
sorted_businesses = businesses.sort_values('year_founded')

# Displaying the first few lines of sorted_businesses
sorted_businesses.head()

Unnamed: 0,business,year_founded,category_code,country_code
64,Kongō Gumi,578,CAT6,JPN
94,St. Peter Stifts Kulinarium,803,CAT4,AUT
107,Staffelter Hof Winery,862,CAT9,DEU
106,Monnaie de Paris,864,CAT12,FRA
103,The Royal Mint,886,CAT12,GBR


**2. The oldest businesses in North America**

So far we've learned that Kongō Gumi is the world's oldest continuously operating business, beating out the second oldest business by well over 100 years! It's a little hard to read the country codes, though. 

Having useful information in different files is a common problem: for data storage, it's better to keep different types of data separate, but for analysis, we want all the data in one place. To solve this, we'll join the two tables, `sorted_businesses` and `countries`, on their shared `country_code` column.

In [None]:
# Combining sorted_businesses with countries
businesses_countries = sorted_businesses.merge(countries, on="country_code")

# Filtering businesses_countries to include countries in North America only
north_america = businesses_countries[businesses_countries['continent']=='North America']
north_america.head()

Unnamed: 0,business,year_founded,category_code,country_code,country,continent
22,La Casa de Moneda de México,1534,CAT12,MEX,Mexico,North America
28,Shirley Plantation,1638,CAT1,USA,United States,North America
33,Hudson's Bay Company,1670,CAT17,CAN,Canada,North America
35,Mount Gay Rum,1703,CAT9,BRB,Barbados,North America
40,Rose Hall,1770,CAT19,JAM,Jamaica,North America
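
To make the join step concrete, here is a minimal sketch on toy data. The rows are copied from the outputs above, but the `toy_*` names are ours and not part of the original analysis. It also passes `validate="many_to_one"`, an optional check we are adding that raises an error if a `country_code` appears more than once in the countries table.

```python
import pandas as pd

# Toy stand-ins for the real tables (rows copied from the outputs above)
toy_businesses = pd.DataFrame({
    "business": ["Kongō Gumi", "Shirley Plantation", "Hudson's Bay Company"],
    "year_founded": [578, 1638, 1670],
    "country_code": ["JPN", "USA", "CAN"],
})
toy_countries = pd.DataFrame({
    "country_code": ["CAN", "JPN", "USA"],
    "country": ["Canada", "Japan", "United States"],
    "continent": ["North America", "Asia", "North America"],
})

# Inner join on the shared key; validate= raises pandas.errors.MergeError
# if country_code is not unique in toy_countries
toy_merged = toy_businesses.merge(
    toy_countries, on="country_code", validate="many_to_one"
)

# Same boolean filter as in the cell above
toy_north_america = toy_merged[toy_merged["continent"] == "North America"]
print(toy_north_america[["business", "country", "continent"]])
```

Each business row picks up the `country` and `continent` of the matching code, so the filter on `continent` works on the merged table even though `businesses` itself has no continent column.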


**3. The oldest business on each continent**

Now we can see that the oldest company in North America is La Casa de Moneda de México, founded in 1534. Let's now look at the oldest company for all the continents.

In [None]:
# Creating continent, which lists only the continent and oldest year_founded
continent = businesses_countries.groupby("continent").agg({"year_founded":"min"})

# Combining continent with businesses_countries
merged_continent = continent.merge(businesses_countries, on=["continent", "year_founded"])

# Subsetting merged_continent so that only the four columns of interest are included
subset_merged_continent = merged_continent[["continent", "country", "business", "year_founded"]]
subset_merged_continent

Unnamed: 0,continent,country,business,year_founded
0,Africa,Mauritius,Mauritius Post,1772
1,Asia,Japan,Kongō Gumi,578
2,Europe,Austria,St. Peter Stifts Kulinarium,803
3,North America,Mexico,La Casa de Moneda de México,1534
4,Oceania,Australia,Australia Post,1809
5,South America,Peru,Casa Nacional de Moneda,1565
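
As an aside, the same "oldest row per group" result can be sketched without a second merge by using `idxmin()`. This is an alternative we are suggesting, not the approach used above, and it runs on a small toy table whose rows are copied from earlier outputs. One difference to keep in mind: `idxmin()` returns a single row per continent, whereas the merge approach would keep every business tied for the earliest year.

```python
import pandas as pd

# Toy table with rows copied from earlier outputs
toy = pd.DataFrame({
    "continent": ["Asia", "Europe", "Europe", "North America", "North America"],
    "country": ["Japan", "Austria", "Germany", "Mexico", "United States"],
    "business": [
        "Kongō Gumi",
        "St. Peter Stifts Kulinarium",
        "Staffelter Hof Winery",
        "La Casa de Moneda de México",
        "Shirley Plantation",
    ],
    "year_founded": [578, 803, 862, 1534, 1638],
})

# Index label of the earliest founding year within each continent
oldest_idx = toy.groupby("continent")["year_founded"].idxmin()

# Pull the full rows for those labels
oldest_per_continent = toy.loc[oldest_idx].reset_index(drop=True)
print(oldest_per_continent)
```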


**4. Unknown oldest businesses**

BusinessFinancing.co.uk wasn't able to determine the oldest business for some countries, and those countries are simply left out of businesses.csv and, by extension, `businesses`. However, the `countries` DataFrame that we created does include all countries in the world, regardless of whether the oldest business is known.

We can combine the two datasets, `businesses` and `countries`, in one DataFrame to find out which countries don't have a known oldest business.

In [None]:
# Using .merge() to create a DataFrame, all_countries
all_countries = businesses.merge(countries, on="country_code", how="right",  indicator=True)

# Filtering to include only countries without oldest businesses
missing_countries = all_countries[all_countries["_merge"] != "both"]

# Creating a series of the country names with missing oldest business data
missing_countries_series = missing_countries["country"]

# Displaying the series
missing_countries_series

1                                Angola
7                   Antigua and Barbuda
18                              Bahamas
48                   Dominican Republic
50                              Ecuador
57                                 Fiji
59      Micronesia, Federated States of
63                                Ghana
65                               Gambia
69                              Grenada
79            Iran, Islamic Republic of
89                           Kyrgyzstan
91                             Kiribati
92                Saint Kitts and Nevis
107                              Monaco
108                Moldova, Republic of
110                            Maldives
112                    Marshall Islands
131                               Nauru
138                               Palau
139                    Papua New Guinea
143                            Paraguay
144                 Palestine, State of
153                     Solomon Islands
160                            Suriname
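
To see what `indicator=True` contributes, here is a small sketch on toy data (the `toy_*` names are ours). With `how="right"`, every row of the right-hand table is kept, and the extra `_merge` column records whether a match was found on the left: `both` when it was, `right_only` when it was not.

```python
import pandas as pd

# Two known businesses and three countries; Angola has no matching business
toy_businesses = pd.DataFrame({
    "business": ["Kongō Gumi", "Mauritius Post"],
    "country_code": ["JPN", "MUS"],
})
toy_countries = pd.DataFrame({
    "country_code": ["AGO", "JPN", "MUS"],
    "country": ["Angola", "Japan", "Mauritius"],
})

# Right join keeps every country; _merge flags where each row came from
toy_all = toy_businesses.merge(
    toy_countries, on="country_code", how="right", indicator=True
)
print(toy_all)

# Countries with no business on the left side
toy_missing = toy_all.loc[toy_all["_merge"] == "right_only", "country"]
print(toy_missing)
```

In a right join the only possible values are `both` and `right_only`, so filtering on `!= "both"` and on `== "right_only"` select the same rows.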


**5. Add new older business data**

It seems that there are some holes in the data. To extend the work of BusinessFinancing.co.uk, we have a `new_businesses.csv` file, which contains newly discovered businesses for some of the missing countries.

In [None]:
# Loading the new_businesses.csv file as a DataFrame called new_businesses
new_businesses = pd.read_csv('new_businesses.csv')

# Displaying information about new_businesses
new_businesses.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 4 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   business       2 non-null      object
 1   year_founded   2 non-null      int64 
 2   category_code  2 non-null      object
 3   country_code   2 non-null      object
dtypes: int64(1), object(3)
memory usage: 192.0+ bytes


  **column : meaning**

*business* : Name of the business.

*year_founded* : Year the business was founded.

*category_code*	:	Code for the category of the business.

*country_code*	:	ISO 3166-1 3-letter country code.

All we have to do is combine the two DataFrames, `new_businesses` and `businesses`, to get a more complete list of companies.

In [None]:
# Adding the data in new_businesses to the existing businesses
all_businesses = pd.concat([new_businesses, businesses])

# Combining and filtering to find countries with missing business data
new_all_countries = all_businesses.merge(countries, on="country_code", how="outer",  indicator=True)
new_missing_countries = new_all_countries[new_all_countries["_merge"] != "both"]

# Grouping by continent and creating a "count_missing" column
count_missing = new_missing_countries.groupby("continent").agg({"country":"count"})
count_missing.columns = ["count_missing"]
count_missing

continent,count_missing
Africa,3
Asia,7
Europe,2
North America,5
Oceania,10
South America,3
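
The stacking step can be sketched in isolation as follows. The row in `additions` is a made-up placeholder (hypothetical name, year, and country code), used only to show the mechanics. We also pass `ignore_index=True`, which the cell above does not do, so that the combined table gets fresh row labels instead of repeating `0` from both inputs.

```python
import pandas as pd

existing = pd.DataFrame({
    "business": ["Kongō Gumi"],
    "year_founded": [578],
    "country_code": ["JPN"],
})

# Hypothetical placeholder row, not taken from new_businesses.csv
additions = pd.DataFrame({
    "business": ["Hypothetical Trading Co."],
    "year_founded": [1900],
    "country_code": ["XXX"],
})

# Stack the rows; columns are aligned by name
combined = pd.concat([additions, existing], ignore_index=True)
print(combined)

# Optional sanity check: does any country now appear twice?
has_duplicate_country = combined["country_code"].duplicated().any()
print(has_duplicate_country)
```

The duplicate check is worth running on real data, because a country that appears in both files would otherwise be counted twice in later group-by steps.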


**6. The oldest industries**

Recall that the oldest business we found was Kongō Gumi, founded in the year 578 in Japan. It is a bit difficult, though, to tell which industry it is in. The meaning of each value in the category_code column is given in "categories.csv".

Let's use categories.csv to understand how many of the oldest companies are in each industry category.

In [None]:
# Loading categories.csv and merging it with businesses
categories = pd.read_csv("categories.csv")
businesses_categories = businesses.merge(categories, on="category_code")

# Creating a DataFrame which lists the number of oldest businesses in each category
count_business_cats = businesses_categories.groupby("category").agg({"business":"count"})

# Creating a DataFrame which lists the sum of founding years for each category
# (this sums the year_founded values themselves, not the years in operation)
years_business_cats = businesses_categories.groupby("category").agg({"year_founded":"sum"})

# Renaming columns and displaying the first five rows of both DataFrames
count_business_cats.columns = ["count"]
years_business_cats.columns = ["total_years_in_business"]
display(count_business_cats.head(), years_business_cats.head())

category,count
Agriculture,6
Aviation & Transport,19
Banking & Finance,37
"Cafés, Restaurants & Bars",6
Conglomerate,3


category,total_years_in_business
Agriculture,10669
Aviation & Transport,36598
Banking & Finance,70302
"Cafés, Restaurants & Bars",8532
Conglomerate,5671
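
The second table sums the `year_founded` values directly, so a larger number does not mean a longer time in business. If we wanted actual years in operation, one way to sketch it is to subtract each founding year from a reference year first. The reference year below (`REFERENCE_YEAR = 2020`) is our assumption and is not stated in the data; the toy rows are copied from earlier outputs.

```python
import pandas as pd

REFERENCE_YEAR = 2020  # assumption: the year against which age is measured

toy = pd.DataFrame({
    "business": [
        "St. Peter Stifts Kulinarium",
        "Sean's Bar",
        "Monnaie de Paris",
        "The Royal Mint",
    ],
    "year_founded": [803, 900, 864, 886],
    "category_code": ["CAT4", "CAT4", "CAT12", "CAT12"],
})

# Years each business has been operating as of REFERENCE_YEAR
toy = toy.assign(years_operating=REFERENCE_YEAR - toy["year_founded"])

# Count and total years in operation per category, using named aggregation
summary = toy.groupby("category_code").agg(
    count=("business", "count"),
    total_years_operating=("years_operating", "sum"),
)
print(summary)
```

With this measure, an older founding year increases the total rather than decreasing it, which matches what "years in business" suggests.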


**7. Restaurant representation**

No matter how we measure it, it seems that banking and finance is an excellent industry to be in if longevity is our goal.

This data lets us analyze other industries as well. As an example, let's look at cafés, restaurants, and bars. Which restaurants in our dataset have existed since before the year 1800?

In [None]:
# Filtering using .query() for CAT4 businesses founded before 1800
old_restaurants = businesses_categories.query('year_founded < 1800 and category_code == "CAT4"')

# Sorting the DataFrame
old_restaurants = old_restaurants.sort_values("year_founded")
old_restaurants

Unnamed: 0,business,year_founded,category_code,country_code,category
142,St. Peter Stifts Kulinarium,803,CAT4,AUT,"Cafés, Restaurants & Bars"
143,Sean's Bar,900,CAT4,IRL,"Cafés, Restaurants & Bars"
139,Ma Yu Ching's Bucket Chicken House,1153,CAT4,CHN,"Cafés, Restaurants & Bars"
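
If the cutoff year or category should be adjustable, `.query()` can reference Python variables with the `@` prefix. The sketch below uses toy rows copied from earlier outputs and variable names of our own choosing (`cutoff_year`, `wanted_category`), and it compares the result with the equivalent boolean-mask filter.

```python
import pandas as pd

toy = pd.DataFrame({
    "business": [
        "St. Peter Stifts Kulinarium",
        "Sean's Bar",
        "Monnaie de Paris",
        "Hudson's Bay Company",
    ],
    "year_founded": [803, 900, 864, 1670],
    "category_code": ["CAT4", "CAT4", "CAT12", "CAT17"],
})

cutoff_year = 1800
wanted_category = "CAT4"

# @name refers to a Python variable in the calling scope
by_query = toy.query(
    "year_founded < @cutoff_year and category_code == @wanted_category"
)

# Equivalent filter written as a boolean mask
by_mask = toy[
    (toy["year_founded"] < cutoff_year)
    & (toy["category_code"] == wanted_category)
]

print(by_query)
```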


**8. Categories and continents**

As a last example, let's finish by looking at the oldest businesses in each trade category for each continent.

In [None]:
# Combining all businesses, countries, and categories together
businesses_categories_countries = businesses_categories.merge(countries, on="country_code")

# Sorting businesses_categories_countries from oldest to most recent
businesses_categories_countries = businesses_categories_countries.sort_values("year_founded")

# Creating the oldest by continent and category DataFrame
oldest_by_continent_category = businesses_categories_countries.groupby(["continent", "category"]).agg({"year_founded":"min"})
oldest_by_continent_category.head(30)

continent,category,year_founded
Africa,Agriculture,1947
Africa,Aviation & Transport,1854
Africa,Banking & Finance,1892
Africa,"Distillers, Vintners, & Breweries",1933
Africa,Energy,1968
Africa,Food & Beverages,1878
Africa,Manufacturing & Production,1820
Africa,Media,1943
Africa,Mining,1962
Africa,Postal Service,1772
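
The table above gives the earliest founding year for each continent and category, but not the name of the business. One way to keep the whole row, sketched here on toy rows copied from earlier outputs, is to sort by `year_founded` and then keep the first row for each (continent, category) pair with `drop_duplicates()`. This is an alternative we are suggesting rather than part of the original analysis.

```python
import pandas as pd

toy = pd.DataFrame({
    "continent": ["Europe", "Europe", "Asia", "Africa"],
    "category": [
        "Cafés, Restaurants & Bars",
        "Cafés, Restaurants & Bars",
        "Cafés, Restaurants & Bars",
        "Postal Service",
    ],
    "business": [
        "Sean's Bar",
        "St. Peter Stifts Kulinarium",
        "Ma Yu Ching's Bucket Chicken House",
        "Mauritius Post",
    ],
    "year_founded": [900, 803, 1153, 1772],
})

# After sorting oldest-first, the first row per pair is the oldest business
oldest_rows = (
    toy.sort_values("year_founded")
    .drop_duplicates(subset=["continent", "category"])
    .sort_values(["continent", "category"])
    .reset_index(drop=True)
)
print(oldest_rows)
```

Here the sort really matters, because `drop_duplicates()` keeps whichever row comes first. In the cell above, by contrast, `min` gives the same answer whether or not the data is sorted.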


**Conclusion**

So, which are the world's oldest businesses, and where are they? The oldest company in the data is Kongō Gumi in Japan, operating since the year 578. The answer differs from continent to continent. In North America, the oldest company is La Casa de Moneda de México, operating since 1534. In Africa, it is Mauritius Post, operating since 1772. In Europe, it is St. Peter Stifts Kulinarium, founded in 803. In Oceania, it is Australia Post, founded in 1809, and in South America, it is Casa Nacional de Moneda in Peru, founded in 1565.

The data also reveals the deficiencies that can occur in any dataset, such as missing information. Some countries have no recorded oldest business, so filling the gaps requires a more targeted search and, in some cases, more time and resources. For example, even after adding the newly discovered businesses, 10 countries in Oceania still have no known oldest business.

Another important factor to consider is the category in which the oldest companies are found, which may hint at stability in some sectors. For example, the Banking & Finance category contains 37 businesses whose founding years sum to 70,302. That figure is a sum of founding years rather than of years in operation, so it needs to be interpreted carefully before it is used as evidence. A narrower search can be more telling: looking at restaurants, we find only three founded before 1800, namely St. Peter Stifts Kulinarium, Sean's Bar, and Ma Yu Ching's Bucket Chicken House, founded between the years 803 and 1153. This gives us an idea of the types of businesses that prevail on each continent and the years in which they were established.

The questions "Which?" and "Where?" can be answered in various ways, and the right approach always depends on the objective. It is therefore important to delimit the search and to set clear goals before starting the analysis.