# What and Where are the World's Oldest Businesses?

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/adamelliotfields/datacamp/blob/main/notebooks/projects/worlds_oldest_businesses/notebook.ipynb)
[![Render nbviewer](https://raw.githubusercontent.com/jupyter/design/main/logos/Badges/nbviewer_badge.svg)](https://nbviewer.org/github/adamelliotfields/datacamp/blob/main/notebooks/projects/worlds_oldest_businesses/notebook.ipynb)

<figure>
  <img
    src="Eingang_zum_St_Peter_Stiftskeller.jpg"
    alt="St. Peter Stiftskeller, founded 803."
    width="320"
  />
  <figcaption>St. Peter Stiftskeller, founded 803. Credit: <a href="" target="_blank" rel="noopener noreferrer">Pakeha</a>.</figcaption>
</figure>

**Contents**
1. [How many businesses were founded before 1000?](#How-many-businesses-were-founded-before-1000?)
2. [Which businesses were founded before 1000?](#Which-businesses-were-founded-before-1000?)
3. [Exploring the categories](#Exploring-the-categories)
4. [Counting the categories](#Counting-the-categories)
5. [Oldest business by continent](#Oldest-business-by-continent)
6. [Joining everything for further analysis](#Joining-everything-for-further-analysis)
7. [Counting categories by continent](#Counting-categories-by-continent)
8. [Filtering counts by continent and category](#Filtering-counts-by-continent-and-category)

An important part of business is planning for the future and ensuring that the company survives changing market conditions. Some businesses do this really well and last for hundreds of years.

[BusinessFinancing.co.uk](https://businessfinancing.co.uk) [researched the oldest company](https://businessfinancing.co.uk/the-oldest-company-in-almost-every-country) that is still in business in (almost) every country and compiled the results into a dataset. In this project, you'll explore that dataset to see what they found.


In [1]:
%load_ext sql

%config SqlMagic.autopandas = False
%config SqlMagic.displaycon = False
%config SqlMagic.feedback = 0

import duckdb

conn = duckdb.connect(database=':memory:', read_only=False)
%sql conn

# load data
%sql create table businesses as from read_csv('businesses.csv', header=true, auto_detect=true);
%sql create table categories as from read_csv('categories.csv', header=true, auto_detect=true);
%sql create table countries as from read_csv('countries.csv', header=true, auto_detect=true);


Count
195


In [2]:
%%sql
-- describe businesses
select column_name, data_type from information_schema.columns where table_name = 'businesses';


column_name,data_type
business,VARCHAR
year_founded,BIGINT
category_code,VARCHAR
country_code,VARCHAR


In [3]:
%%sql
-- describe categories
select column_name, data_type from information_schema.columns where table_name = 'categories';


column_name,data_type
category_code,VARCHAR
category,VARCHAR


In [4]:
%%sql
-- describe countries
select column_name, data_type from information_schema.columns where table_name = 'countries';


column_name,data_type
country_code,VARCHAR
country,VARCHAR
continent,VARCHAR


In [5]:
%%sql
-- select oldest and newest `year_founded` from businesses
select min(year_founded), max(year_founded) from businesses;


min(year_founded),max(year_founded)
578,1999


## How many businesses were founded before 1000?

Wow! That's a lot of variation between countries. In one country, the oldest business was founded only in 1999. By contrast, the oldest business in the world was founded back in 578. It's pretty incredible that a business has survived for more than a millennium.

I wonder how many other businesses there are like that.


In [6]:
%%sql
-- count rows in businesses where `year_founded` is before 1000
select count(*) from businesses where year_founded < 1000;


count_star()
6


## Which businesses were founded before 1000?

Having a count is all very well, but I'd like more detail. Which businesses have been around for more than a millennium?

In [7]:
%%sql
-- Select all columns from businesses where the founding year was before 1000
-- Arrange the results from oldest to newest
select * from businesses where year_founded < 1000 order by year_founded;


business,year_founded,category_code,country_code
Kongō Gumi,578,CAT6,JPN
St. Peter Stifts Kulinarium,803,CAT4,AUT
Staffelter Hof Winery,862,CAT9,DEU
Monnaie de Paris,864,CAT12,FRA
The Royal Mint,886,CAT12,GBR
Sean's Bar,900,CAT4,IRL


## Exploring the categories

Now we know that the oldest continuously operating company in the world is called Kongō Gumi. But what does that company do? The category codes in the businesses table aren't very helpful: the descriptions of the categories are stored in the categories table.

This is a common problem: for data storage, it's better to keep different types of data in different tables, but for analysis, you want all the data in one place. To solve this, you'll have to join the two tables together.

In [8]:
%%sql
-- Select business name, founding year, and country code from businesses; and category from categories
-- where the founding year was before 1000, arranged from oldest to newest
select b.business, b.year_founded, b.country_code, c.category from businesses b
inner join categories c on b.category_code = c.category_code
where b.year_founded < 1000
order by b.year_founded;


business,year_founded,country_code,category
Kongō Gumi,578,JPN,Construction
St. Peter Stifts Kulinarium,803,AUT,"Cafés, Restaurants & Bars"
Staffelter Hof Winery,862,DEU,"Distillers, Vintners, & Breweries"
Monnaie de Paris,864,FRA,Manufacturing & Production
The Royal Mint,886,GBR,Manufacturing & Production
Sean's Bar,900,IRL,"Cafés, Restaurants & Bars"
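
To see what the inner join does in isolation, here is a minimal sketch that rebuilds the six rows above with Python's built-in `sqlite3` module. SQLite is only a stand-in for DuckDB so the snippet runs without extra packages or the CSV files. The rows are copied from the two result sets shown so far, and the `categories` sample covers only the four codes that appear in them, not the full lookup table.

```python
import sqlite3

# Sample rows copied from the notebook output above (not the full CSV files).
businesses = [
    ("Kongō Gumi", 578, "CAT6", "JPN"),
    ("St. Peter Stifts Kulinarium", 803, "CAT4", "AUT"),
    ("Staffelter Hof Winery", 862, "CAT9", "DEU"),
    ("Monnaie de Paris", 864, "CAT12", "FRA"),
    ("The Royal Mint", 886, "CAT12", "GBR"),
    ("Sean's Bar", 900, "CAT4", "IRL"),
]
# Only the four category codes used by the rows above.
categories = [
    ("CAT4", "Cafés, Restaurants & Bars"),
    ("CAT6", "Construction"),
    ("CAT9", "Distillers, Vintners, & Breweries"),
    ("CAT12", "Manufacturing & Production"),
]

# In-memory SQLite database standing in for the DuckDB connection.
con = sqlite3.connect(":memory:")
con.execute(
    "create table businesses "
    "(business text, year_founded integer, category_code text, country_code text)"
)
con.execute("create table categories (category_code text, category text)")
con.executemany("insert into businesses values (?, ?, ?, ?)", businesses)
con.executemany("insert into categories values (?, ?)", categories)

# Each business row is paired with the category row sharing its category_code.
joined = con.execute(
    """
    select b.business, b.year_founded, c.category
    from businesses b
    inner join categories c on b.category_code = c.category_code
    order by b.year_founded
    """
).fetchall()

for business, year_founded, category in joined:
    print(f"{year_founded:>4}  {business}: {category}")
```

Because every code in the sample has a match in `categories`, all six businesses survive the join. An inner join would drop any business whose code had no match.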


## Counting the categories

With that extra detail about the oldest businesses, we can see that Kongō Gumi is a construction company. In that list of six businesses, we also see a restaurant, a winery, and a bar. The two companies recorded as "Manufacturing & Production" are both mints. That is, they produce currency.

I'm curious as to what other industries constitute the oldest companies around the world, and which industries are most common.

In [9]:
%%sql
-- Select the category and count of category (as "n")
-- arranged by descending count, limited to 10 most common categories
select c.category, count(c.category) as n from businesses b
inner join categories c on b.category_code = c.category_code
group by c.category
order by n desc
limit 10;


category,n
Banking & Finance,37
"Distillers, Vintners, & Breweries",22
Aviation & Transport,19
Postal Service,16
Manufacturing & Production,15
Media,7
Food & Beverages,6
Agriculture,6
"Cafés, Restaurants & Bars",6
Energy,4
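
The same group-and-count pattern can be tried on a smaller input. The sketch below is illustrative only: it uses Python's built-in `sqlite3` module as a stand-in for DuckDB, and its only data are the six (business, category) pairs from the pre-1000 join output earlier in this notebook. The table name `oldest` is made up for this example.

```python
import sqlite3

# (business, category) pairs copied from the pre-1000 join output earlier in the notebook.
oldest = [
    ("Kongō Gumi", "Construction"),
    ("St. Peter Stifts Kulinarium", "Cafés, Restaurants & Bars"),
    ("Staffelter Hof Winery", "Distillers, Vintners, & Breweries"),
    ("Monnaie de Paris", "Manufacturing & Production"),
    ("The Royal Mint", "Manufacturing & Production"),
    ("Sean's Bar", "Cafés, Restaurants & Bars"),
]

con = sqlite3.connect(":memory:")
# "oldest" is a hypothetical table name used only in this sketch.
con.execute("create table oldest (business text, category text)")
con.executemany("insert into oldest values (?, ?)", oldest)

# Count businesses per category; the second sort key breaks ties by name.
counts = con.execute(
    """
    select category, count(*) as n
    from oldest
    group by category
    order by n desc, category
    limit 10
    """
).fetchall()

for category, n in counts:
    print(f"{n}  {category}")
```

The extra `category` sort key is a design choice for this sketch. The notebook query orders by `n` alone, so the relative order of categories with equal counts (such as the three with 6) is not guaranteed.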


## Oldest business by continent

It looks like "Banking & Finance" is the most common category. Maybe that's where you should aim if you want to start a thousand-year business.

One thing we haven't looked at yet is where in the world these really old businesses are. To answer that, we'll need to join the businesses table to the countries table. Let's start by asking how old the oldest business is on each continent.

In [10]:
%%sql
-- Select the oldest founding year (as "oldest") from businesses, 
-- and continent from countries
-- for each continent, ordered from oldest to newest 
select min(b.year_founded) as oldest, c.continent from businesses b
inner join countries c on b.country_code = c.country_code
group by c.continent
order by oldest;


oldest,continent
578,Asia
803,Europe
1534,North America
1565,South America
1772,Africa
1809,Oceania
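
As a rough check on how `min` combines with `group by`, the sketch below runs the same query shape on the six pre-1000 businesses, using Python's built-in `sqlite3` module as a stand-in for DuckDB. The continent for each country code is an assumption filled in from general geography rather than copied from `countries.csv`, and the sample `countries` table here has only two columns.

```python
import sqlite3

# Pre-1000 businesses copied from the earlier notebook output.
businesses = [
    ("Kongō Gumi", 578, "JPN"),
    ("St. Peter Stifts Kulinarium", 803, "AUT"),
    ("Staffelter Hof Winery", 862, "DEU"),
    ("Monnaie de Paris", 864, "FRA"),
    ("The Royal Mint", 886, "GBR"),
    ("Sean's Bar", 900, "IRL"),
]
# Assumption: continents filled in by hand for these six codes, not read from countries.csv.
countries = [
    ("JPN", "Asia"),
    ("AUT", "Europe"),
    ("DEU", "Europe"),
    ("FRA", "Europe"),
    ("GBR", "Europe"),
    ("IRL", "Europe"),
]

con = sqlite3.connect(":memory:")
con.execute(
    "create table businesses (business text, year_founded integer, country_code text)"
)
con.execute("create table countries (country_code text, continent text)")
con.executemany("insert into businesses values (?, ?, ?)", businesses)
con.executemany("insert into countries values (?, ?)", countries)

# One output row per continent, carrying the earliest founding year in that group.
oldest_by_continent = con.execute(
    """
    select min(b.year_founded) as oldest, c.continent
    from businesses b
    inner join countries c on b.country_code = c.country_code
    group by c.continent
    order by oldest
    """
).fetchall()

for oldest, continent in oldest_by_continent:
    print(oldest, continent)
```

The five European rows collapse into a single group, so only two rows come back. They agree with the Asia and Europe rows in the full result above.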


## Joining everything for further analysis

Interesting. There's a jump in time from the older businesses in Asia and Europe to the 16th-century oldest businesses in North and South America, then to the 18th- and 19th-century oldest businesses in Africa and Oceania.

As mentioned earlier, when analyzing data it's often really helpful to have all the tables you want access to joined together into a single set of results that can be analyzed further. Here, that means we need to join all three tables.

In [11]:
%%sql
-- Select the business, founding year, category, country, and continent
-- Join businesses to categories and countries
select b.business, b.year_founded, c.category, co.country, co.continent from businesses b
inner join categories c on b.category_code = c.category_code
inner join countries co on b.country_code = co.country_code;


business,year_founded,category,country,continent
Spinzar Cotton Company,1930,Agriculture,Afghanistan,Asia
ALBtelecom,1912,Telecommunications,Albania,Europe
Andbank,1930,Banking & Finance,Andorra,Europe
Liwa Chemicals,1939,Manufacturing & Production,United Arab Emirates,Asia
Bank of the Province of Buenos Aires,1822,Banking & Finance,Argentina,South America
Yerevan Ararat Brandy-Wine-Vodka Factory,1877,"Distillers, Vintners, & Breweries",Armenia,Asia
Australia Post,1809,Postal Service,Australia,Oceania
St. Peter Stifts Kulinarium,803,"Cafés, Restaurants & Bars",Austria,Europe
Azerbaijan Caspian Shipping Company,1858,Aviation & Transport,Azerbaijan,Asia
Brarudi,1955,"Distillers, Vintners, & Breweries",Burundi,Africa


## Counting categories by continent

Having businesses joined to both categories and countries means we can ask questions that involve both at once. For example, which are the most common categories for the oldest businesses on each continent?

In [12]:
%%sql
-- Count the number of businesses in each continent and category
select co.continent, c.category, count(b.business) as n from businesses b
inner join categories c on b.category_code = c.category_code
inner join countries co on b.country_code = co.country_code
group by co.continent, c.category;


continent,category,n
Asia,Agriculture,1
Europe,Telecommunications,1
Asia,Manufacturing & Production,3
South America,Banking & Finance,3
Asia,"Distillers, Vintners, & Breweries",2
Oceania,Postal Service,1
Europe,"Distillers, Vintners, & Breweries",12
Africa,Energy,1
Africa,Aviation & Transport,10
Asia,Food & Beverages,2


## Filtering counts by continent and category

Combining continent and business category led to a lot of results. It's difficult to see what is important. To trim this down to a manageable size, let's restrict the results to only continent/category pairs with a high count.

In [13]:
%%sql
-- Repeat the previous query, keeping only groups with a count greater than 5,
-- ordered from highest to lowest count
select co.continent, c.category, count(b.business) as n from businesses b
inner join categories c on b.category_code = c.category_code
inner join countries co on b.country_code = co.country_code
group by co.continent, c.category
having n > 5
order by n desc;


continent,category,n
Africa,Banking & Finance,17
Europe,"Distillers, Vintners, & Breweries",12
Africa,Aviation & Transport,10
Africa,Postal Service,9
Europe,Manufacturing & Production,8
Asia,Aviation & Transport,7
Asia,Banking & Finance,6
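
`having` filters groups after aggregation, whereas `where` filters individual rows before they are grouped. The sketch below shows the effect on a small sample, using Python's built-in `sqlite3` module as a stand-in for DuckDB. Its data are the ten rows displayed from the three-table join earlier in this notebook, reduced to three columns. The table name `joined` and the threshold of 2 are choices made for this example, since no group in a ten-row sample can exceed 5 here.

```python
import sqlite3

# (business, year_founded, continent) taken from the ten joined rows shown earlier.
rows = [
    ("Spinzar Cotton Company", 1930, "Asia"),
    ("ALBtelecom", 1912, "Europe"),
    ("Andbank", 1930, "Europe"),
    ("Liwa Chemicals", 1939, "Asia"),
    ("Bank of the Province of Buenos Aires", 1822, "South America"),
    ("Yerevan Ararat Brandy-Wine-Vodka Factory", 1877, "Asia"),
    ("Australia Post", 1809, "Oceania"),
    ("St. Peter Stifts Kulinarium", 803, "Europe"),
    ("Azerbaijan Caspian Shipping Company", 1858, "Asia"),
    ("Brarudi", 1955, "Africa"),
]

con = sqlite3.connect(":memory:")
# "joined" is a hypothetical table name standing in for the three-table join result.
con.execute("create table joined (business text, year_founded integer, continent text)")
con.executemany("insert into joined values (?, ?, ?)", rows)

# Every group, before any filtering on the aggregate.
all_groups = con.execute(
    """
    select continent, count(*) as n
    from joined
    group by continent
    order by n desc, continent
    """
).fetchall()

# Same aggregation, but only groups whose count exceeds the threshold are kept.
large_groups = con.execute(
    """
    select continent, count(*) as n
    from joined
    group by continent
    having count(*) > 2
    order by n desc, continent
    """
).fetchall()

print("all groups:  ", all_groups)
print("having n > 2:", large_groups)
```

The condition has to sit in `having` rather than `where` because `count(*)` does not exist until the rows have been grouped.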
