# What and Where are the World's Oldest Businesses

## 1. Overview
An important part of business is planning for the future and ensuring that the business survives changing market conditions. Some businesses do this remarkably well and last for hundreds of years.

## 2. Objective

The objective is to explore data from [BusinessFinancing.co.uk](https://businessfinancing.co.uk/the-oldest-company-in-almost-every-country/) on the world's oldest businesses: when were they founded, and which industries do they belong to?

## 3. Data Collection

Like many business problems, the [data](https://www.kaggle.com/datasets/manuelandresespitia/what-and-where-are-the-worlds-oldest-businesses) we'll explore is spread across several datasets. The database contains the three tables described below.

<u>categories</u>

|column        |type    |meaning                                |
|--------------|--------|---------------------------------------|
|category_code |varchar |Code for the category of the business. |
|category      |varchar |Description of the business category.  |

<u>countries</u>

|column        |type    |meaning                                           |
|--------------|--------|--------------------------------------------------|
|country_code  |char    |ISO 3166-1 3-letter country code.                 |
|country       |varchar |Name of the country.                              |
|continent     |varchar |Name of the continent that the country exists in. |

<u>businesses</u>


|column        |type    |meaning                                |
|--------------|--------|---------------------------------------|
|business      |varchar |Name of the business.                  |
|year_founded  |int     |Year the business was founded.         |
|category_code |varchar |Code for the category of the business. |
|country_code  |char    |ISO 3166-1 3-letter country code.      |

### 3.1. Import libraries

In [1]:
# Data manipulation
import pandas as pd
# Data visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Database connection
import os
from dotenv import load_dotenv
from sqlalchemy import create_engine
from urllib.parse import quote_plus

### 3.2. Database Connection

In [2]:
load_dotenv()

# MySQL database connection using SQLAlchemy
username = os.getenv('MYSQL_ROOT_USER')
password = os.getenv('MYSQL_ROOT_PASSWORD')
host = "localhost"
port = "3306"
databasename = "PROJECT"

# URL-encode the password
encoded_password = quote_plus(password)

# Construct the connection string with the encoded password
db_uri = f"mysql+pymysql://{username}:{encoded_password}@{host}:{port}/{databasename}"
# echo=False (the default) keeps SQLAlchemy from echoing SQL statements to the log
engine = create_engine(db_uri, echo=False)
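
The reason for URL-encoding the password can be illustrated with a minimal sketch. The helper `build_mysql_uri` and the credentials below are hypothetical and not part of the original notebook; the sketch only assumes the same URI layout as the cell above. Characters such as `@`, `:` and `/` have a meaning inside a URI, so an unencoded password containing them would be parsed incorrectly.

```python
from urllib.parse import quote_plus


def build_mysql_uri(username, password, host="localhost", port="3306", database="PROJECT"):
    """Hypothetical helper: build a mysql+pymysql URI with an encoded password."""
    if password is None:
        # os.getenv returns None when the variable is missing from the environment
        raise ValueError("password is not set; check the .env file")
    encoded_password = quote_plus(password)
    return f"mysql+pymysql://{username}:{encoded_password}@{host}:{port}/{database}"


# Made-up credentials, for illustration only
example_uri = build_mysql_uri("analyst", "p@ss:word/1")
print(example_uri)
# mysql+pymysql://analyst:p%40ss%3Aword%2F1@localhost:3306/PROJECT
```

The explicit `None` check is a design choice of this sketch: without it, a missing environment variable would surface as a less obvious `TypeError` inside `quote_plus`.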

In [4]:
%load_ext sql
%sql engine
%config SqlMagic.displaylimit = 20


### 3.3. Data loading
First, load each CSV file into its own pandas DataFrame.

In [7]:
# Load datasets:
df_categories = pd.read_csv('data/categories.csv')
df_countries = pd.read_csv('data/countries.csv')
df_businesses = pd.read_csv('data/businesses.csv')

print('\nDataframe Shape:',df_categories.shape)
print('df_categories:\n',df_categories.head())

print('\nDataframe Shape:',df_countries.shape)
print('df_countries:\n',df_countries.head())

print('Dataframe Shape:',df_businesses.shape)
print('df_businesses:\n',df_businesses.head())


Dataframe Shape: (19, 2)
df_categories:
   category_code                   category
0          CAT1                Agriculture
1          CAT2       Aviation & Transport
2          CAT3          Banking & Finance
3          CAT4  Cafés, Restaurants & Bars
4          CAT5               Conglomerate

Dataframe Shape: (195, 3)
df_countries:
   country_code               country continent
0          AFG           Afghanistan      Asia
1          AGO                Angola    Africa
2          ALB               Albania    Europe
3          AND               Andorra    Europe
4          ARE  United Arab Emirates      Asia
Dataframe Shape: (163, 4)
df_businesses:
                          business  year_founded category_code country_code
0                  Hamoud Boualem          1878         CAT11          DZA
1  Communauté Électrique du Bénin          1968         CAT10          BEN
2        Botswana Meat Commission          1965          CAT1          BWA
3                     Air Burkina 

**OBS:** Each dataset has fewer than 1,000 rows, so we can load the data into the database in one go, without specifying a chunk size.
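
For larger datasets, loading in chunks keeps each write small. pandas exposes this through the `chunksize` parameter of `DataFrame.to_sql`. The sketch below only illustrates the idea with the standard library: it uses an in-memory SQLite database as a stand-in for the MySQL database, and `insert_in_chunks` is a hypothetical helper, not something the notebook uses.

```python
import sqlite3


def insert_in_chunks(conn, rows, chunk_size):
    """Hypothetical helper: insert rows in batches and return the number of batches."""
    batches = 0
    for start in range(0, len(rows), chunk_size):
        batch = rows[start:start + chunk_size]
        conn.executemany("INSERT INTO businesses VALUES (?, ?, ?, ?)", batch)
        conn.commit()  # one commit per batch
        batches += 1
    return batches


# In-memory SQLite stands in for the MySQL database (assumption of this sketch)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE businesses ("
    "business TEXT PRIMARY KEY, year_founded INTEGER, "
    "category_code TEXT, country_code TEXT)"
)

# The first three rows shown in the df_businesses preview
rows = [
    ("Hamoud Boualem", 1878, "CAT11", "DZA"),
    ("Communauté Électrique du Bénin", 1968, "CAT10", "BEN"),
    ("Botswana Meat Commission", 1965, "CAT1", "BWA"),
]

n_batches = insert_in_chunks(conn, rows, chunk_size=2)
n_rows = conn.execute("SELECT COUNT(*) FROM businesses").fetchone()[0]
print(n_batches, n_rows)  # 2 3
```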

Second, check whether the `categories`, `countries`, and `businesses` tables already exist in the database.

In [8]:
%%sql
-- Check whether the 'categories' table exists:
SHOW TABLES LIKE 'categories';

Tables_in_PROJECT (categories)
categories


In [9]:
%%sql
-- Check whether the 'countries' table exists:
SHOW TABLES LIKE 'countries';

Tables_in_PROJECT (countries)
countries


In [10]:
%%sql
-- Check whether the 'businesses' table exists:
SHOW TABLES LIKE 'businesses';

Tables_in_PROJECT (businesses)
businesses


In [11]:
%%sql
-- Drop the 'categories', 'countries' and 'businesses' tables if they exist:
DROP TABLE IF EXISTS categories;
DROP TABLE IF EXISTS countries;
DROP TABLE IF EXISTS businesses;

In [13]:
%%sql
-- Create 'categories' table:
CREATE TABLE categories (
    category_code VARCHAR(5) PRIMARY KEY,
    category VARCHAR(50)
    );

-- Create 'countries' table:
CREATE TABLE countries (
    country_code CHAR(3) PRIMARY KEY,
    country VARCHAR(50),
    continent VARCHAR(20)
    );

-- Create 'businesses' table:
CREATE TABLE businesses (
    business VARCHAR(64) PRIMARY KEY,
    year_founded INT,
    category_code VARCHAR(5),
    country_code CHAR(3)
    );

Finally, populate each table from its corresponding pandas DataFrame.

In [14]:
# if_exists='append' adds rows to the tables created above instead of replacing them
df_businesses.to_sql(name="businesses",
                     con=engine,
                     if_exists="append",
                     index=False)

df_categories.to_sql(name="categories",
                     con=engine,
                     if_exists="append",
                     index=False)

df_countries.to_sql(name="countries",
                    con=engine,
                    if_exists="append",
                    index=False)

# Close the connections held in the engine's pool
engine.dispose()

In [15]:
%%sql
SELECT * FROM businesses LIMIT 5;

business,year_founded,category_code,country_code
1st National Bank of St Lucia,1938,CAT3,LCA
Affligem Brewery,1074,CAT9,BEL
Air Burkina,1967,CAT2,BFA
Air Madagascar,1962,CAT2,MDG
Air Seychelles,1977,CAT2,SYC


In [16]:
%%sql
SELECT * FROM categories LIMIT 5;

category_code,category
CAT1,Agriculture
CAT10,Energy
CAT11,Food & Beverages
CAT12,Manufacturing & Production
CAT13,Media


In [17]:
%%sql
SELECT * FROM countries LIMIT 5;

country_code,country,continent
AFG,Afghanistan,Asia
AGO,Angola,Africa
ALB,Albania,Europe
AND,Andorra,Europe
ARE,United Arab Emirates,Asia


## 4. Exploratory Data Analysis (EDA)
### 4.1. Data Dimensions

In [19]:
%%sql
SELECT COUNT(*) AS Total_rows_info FROM businesses;

Total_rows_info
163


In [20]:
%%sql
SELECT COUNT(*) AS Total_rows_info FROM categories;

Total_rows_info
19


In [21]:
%%sql
SELECT COUNT(*) AS Total_rows_info FROM countries;

Total_rows_info
195


**OBS:** `businesses` table has 163 rows, `categories` table has 19 rows and `countries` table has 195 rows.

### 4.2. Data Type

In [22]:
%%sql
DESCRIBE businesses;

Field,Type,Null,Key,Default,Extra
business,varchar(64),NO,PRI,,
year_founded,int,YES,,,
category_code,varchar(5),YES,,,
country_code,char(3),YES,,,


In [23]:
%%sql
DESCRIBE categories;

Field,Type,Null,Key,Default,Extra
category_code,varchar(5),NO,PRI,,
category,varchar(50),YES,,,


In [24]:
%%sql
DESCRIBE countries;

Field,Type,Null,Key,Default,Extra
country_code,char(3),NO,PRI,,
country,varchar(50),YES,,,
continent,varchar(20),YES,,,


**OBS:** The data types match the schema defined when the tables were created.

### 4.3. Missing values
Let's identify missing values to explore the limitations of our database.

In [27]:
%%sql
-- For numeric, date, and time data types: missing value = NULL
-- For strings: missing value = NULL or ''
-- Join the three tables to find missing values in a single query
SELECT
    COUNT(CASE WHEN B.business IS NULL OR B.business = '' THEN 1 END) AS m_business,
    COUNT(CASE WHEN B.year_founded IS NULL THEN 1 END) AS m_year_founded,
    COUNT(CASE WHEN CA.category IS NULL OR CA.category ='' THEN 1 END) AS m_category,
    COUNT(CASE WHEN CO.country IS NULL OR CO.country ='' THEN 1 END) AS m_country,
    COUNT(CASE WHEN CO.continent IS NULL OR CO.continent = '' THEN 1 END) AS m_continent
FROM businesses AS B
LEFT JOIN categories AS CA
    ON B.category_code = CA.category_code
LEFT JOIN countries AS CO
    ON B.country_code = CO.country_code
;

m_business,m_year_founded,m_category,m_country,m_continent
0,0,0,0,0


**OBS:** There are no missing values.
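
The `COUNT(CASE WHEN ... THEN 1 END)` pattern works because a `CASE` with no `ELSE` returns NULL when the condition is false, and `COUNT` ignores NULLs. A minimal sketch of the behaviour follows. It assumes an in-memory SQLite table named `demo` as a stand-in for the MySQL tables, and two of the three rows are made up purely to trigger the checks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (business TEXT, year_founded INTEGER)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [
        ("Kongō Gumi", 578),  # complete row taken from the dataset
        ("", 1900),           # made-up row: empty-string name
        (None, None),         # made-up row: NULL name and NULL year
    ],
)

m_business, m_year_founded = conn.execute(
    """
    SELECT
        COUNT(CASE WHEN business IS NULL OR business = '' THEN 1 END),
        COUNT(CASE WHEN year_founded IS NULL THEN 1 END)
    FROM demo
    """
).fetchone()

print(m_business, m_year_founded)  # 2 1
```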

### 4.4. Duplicate rows

In [28]:
%%sql
SELECT *, COUNT(*)
FROM businesses
GROUP BY business, year_founded, category_code, country_code
HAVING COUNT(*) > 1;

business,year_founded,category_code,country_code,COUNT(*)


In [29]:
%%sql
SELECT *, COUNT(*)
FROM categories
GROUP BY category_code, category
HAVING COUNT(*) > 1;

category_code,category,COUNT(*)


In [30]:
%%sql
SELECT * , COUNT(*)
FROM countries
GROUP BY country_code , country, continent
HAVING COUNT(*) >1;

country_code,country,continent,COUNT(*)


**OBS:** All tables have the correct data type, none have missing values or duplicate rows.
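
Because each table was created with a primary key, fully duplicated rows cannot be stored in it, so the duplicate check is most informative on data that has no such constraint. The sketch below shows what the `GROUP BY ... HAVING COUNT(*) > 1` pattern returns when a duplicate is present. It assumes an in-memory SQLite table named `staging` without a primary key, and the repeated row is deliberate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical staging table with no primary key, so duplicates can exist
conn.execute("CREATE TABLE staging (category_code TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?)",
    [
        ("CAT1", "Agriculture"),
        ("CAT3", "Banking & Finance"),
        ("CAT1", "Agriculture"),  # deliberately repeated row
    ],
)

duplicates = conn.execute(
    """
    SELECT category_code, category, COUNT(*) AS n
    FROM staging
    GROUP BY category_code, category
    HAVING COUNT(*) > 1
    """
).fetchall()

print(duplicates)  # [('CAT1', 'Agriculture', 2)]
```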

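## 5. Data Preprocessing

No preprocessing was needed: the checks in section 4 found the expected data types, no missing values, and no duplicate rows.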


## 6. Data Analysis


### 6.1. The oldest business in the world
Find the businesses with the oldest and the most recent founding years in the `businesses` table.

In [31]:
%%sql
SELECT
    business,
    year_founded,
    category_code,
    country_code
FROM businesses
WHERE year_founded = (
        SELECT MIN(year_founded) FROM businesses)
    OR year_founded = (
        SELECT MAX(year_founded) FROM businesses);

business,year_founded,category_code,country_code
Kongō Gumi,578,CAT6,JPN
Meridian Corporation,1999,CAT13,XK


**OBS:** The oldest founding year is 578 (Kongō Gumi) and the most recent is 1999 (Meridian Corporation).

### 6.2. How many businesses were founded before 1000?
Count the rows in `businesses` where the founding year is before 1000.

In [32]:
%%sql
SELECT COUNT(*) AS Founded_before_1000
FROM businesses
WHERE businesses.year_founded < 1000;

Founded_before_1000
6


**OBS:** There are 6 companies that have survived for more than a millennium.

### 6.3. Which businesses were founded before 1000?
Select all columns from the `businesses` table where the founding year is before 1000. Arrange the results from oldest to newest.

In [33]:
%%sql
SELECT *
FROM businesses
WHERE year_founded < 1000
ORDER BY year_founded ASC;

business,year_founded,category_code,country_code
Kongō Gumi,578,CAT6,JPN
St. Peter Stifts Kulinarium,803,CAT4,AUT
Staffelter Hof Winery,862,CAT9,DEU
Monnaie de Paris,864,CAT12,FRA
The Royal Mint,886,CAT12,GBR
Sean's Bar,900,CAT4,IRL


**OBS:** The oldest continuously operating company in the world is Kongō Gumi.

### 6.4. Exploring the categories
Select the business name and founding year from the `businesses` table, the country from the `countries` table, and the category from the `categories` table.  
Keep only businesses founded before 1000, arranged from oldest to newest.

In [34]:
%%sql
SELECT B.business, B.year_founded, CO.country, CA.category
FROM businesses AS B
LEFT JOIN categories AS CA
    ON B.category_code = CA.category_code
LEFT JOIN countries AS CO
    ON B.country_code = CO.country_code
WHERE year_founded < 1000
ORDER BY year_founded;

business,year_founded,country,category
Kongō Gumi,578,Japan,Construction
St. Peter Stifts Kulinarium,803,Austria,"Cafés, Restaurants & Bars"
Staffelter Hof Winery,862,Germany,"Distillers, Vintners, & Breweries"
Monnaie de Paris,864,France,Manufacturing & Production
The Royal Mint,886,United Kingdom,Manufacturing & Production
Sean's Bar,900,Ireland,"Cafés, Restaurants & Bars"


**OBS:**
* Kongō Gumi is a construction company from Japan.
* The list also includes a restaurant (St. Peter Stifts Kulinarium), a winery (Staffelter Hof Winery), and a bar (Sean's Bar).
* The two companies recorded as "Manufacturing & Production" are both mints, which produce currency.

### 6.5. Counting the categories
Select each category and the number of businesses in it, arranged by descending count and limited to the 10 most common categories.

In [35]:
%%sql
-- Count matched businesses (B.business) so that a category with no businesses gets 0, not 1
SELECT CA.category, COUNT(B.business) AS Quantity
FROM categories AS CA
LEFT JOIN businesses AS B
    ON CA.category_code = B.category_code
GROUP BY CA.category
ORDER BY Quantity DESC
LIMIT 10;

category,Quantity
Banking & Finance,37
"Distillers, Vintners, & Breweries",22
Aviation & Transport,19
Postal Service,16
Manufacturing & Production,15
Media,7
Agriculture,6
Food & Beverages,6
"Cafés, Restaurants & Bars",6
Energy,4


**OBS:** "Banking & Finance" is the most common category, with 37 businesses.
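
One detail matters in a query of this shape. Because the join starts from `categories`, every category appears in the result even if no business belongs to it. Counting a `categories` column then reports at least 1 for such a category, while counting a `businesses` column reports 0. The sketch below illustrates the difference on an in-memory SQLite database. The category `CAT99` is made up for the example; the other rows come from the dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE categories (category_code TEXT PRIMARY KEY, category TEXT);
    CREATE TABLE businesses (business TEXT PRIMARY KEY, category_code TEXT);

    INSERT INTO categories VALUES ('CAT1', 'Agriculture');
    INSERT INTO categories VALUES ('CAT3', 'Banking & Finance');
    INSERT INTO categories VALUES ('CAT99', 'Unused category');  -- made-up, no businesses

    INSERT INTO businesses VALUES ('Botswana Meat Commission', 'CAT1');
    INSERT INTO businesses VALUES ('Andbank', 'CAT3');
    INSERT INTO businesses VALUES ('Arab Bank', 'CAT3');
    """
)

QUERY = """
    SELECT CA.category, COUNT({column})
    FROM categories AS CA
    LEFT JOIN businesses AS B
        ON CA.category_code = B.category_code
    GROUP BY CA.category
"""

# Counting a column from the left table: never less than 1 per category
count_left = dict(conn.execute(QUERY.format(column="CA.category")).fetchall())
# Counting a column from the right table: NULLs from unmatched rows are ignored
count_right = dict(conn.execute(QUERY.format(column="B.business")).fetchall())

print(count_left["Unused category"], count_right["Unused category"])  # 1 0
```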

### 6.6. Oldest business by continent
Find where in the world the oldest businesses are located. We'll need to join the `businesses` table to the `countries` table.

In [36]:
%%sql
SELECT MIN(B.year_founded) AS oldest, CO.continent
FROM businesses AS B
LEFT JOIN countries AS CO
    ON B.country_code = CO.country_code
GROUP BY CO.continent
ORDER BY oldest ASC;

oldest,continent
578,Asia
803,Europe
1534,North America
1565,South America
1772,Africa
1809,Oceania


**OBS:** There is a large gap between the oldest businesses in Asia and Europe (founded in 578 and 803) and the 16th-century oldest businesses in North and South America, followed by the 18th- and 19th-century oldest businesses in Africa and Oceania.
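
The query above returns only the founding year per continent. A possible extension, sketched here and not part of the original analysis, is to return the name of the oldest business as well by comparing each row with the minimum year of its own continent. The sketch assumes an in-memory SQLite database holding a handful of rows taken from the dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countries (country_code TEXT PRIMARY KEY, country TEXT, continent TEXT)")
conn.execute("CREATE TABLE businesses (business TEXT PRIMARY KEY, year_founded INTEGER, country_code TEXT)")

conn.executemany(
    "INSERT INTO countries VALUES (?, ?, ?)",
    [
        ("JPN", "Japan", "Asia"),
        ("AUT", "Austria", "Europe"),
        ("DEU", "Germany", "Europe"),
        ("AND", "Andorra", "Europe"),
    ],
)
conn.executemany(
    "INSERT INTO businesses VALUES (?, ?, ?)",
    [
        ("Kongō Gumi", 578, "JPN"),
        ("St. Peter Stifts Kulinarium", 803, "AUT"),
        ("Staffelter Hof Winery", 862, "DEU"),
        ("Andbank", 1930, "AND"),
    ],
)

oldest_per_continent = conn.execute(
    """
    SELECT CO.continent, B.business, B.year_founded
    FROM businesses AS B
    JOIN countries AS CO
        ON B.country_code = CO.country_code
    WHERE B.year_founded = (
        -- minimum founding year within the same continent as the outer row
        SELECT MIN(B2.year_founded)
        FROM businesses AS B2
        JOIN countries AS CO2
            ON B2.country_code = CO2.country_code
        WHERE CO2.continent = CO.continent
    )
    ORDER BY B.year_founded
    """
).fetchall()

print(oldest_per_continent)
# [('Asia', 'Kongō Gumi', 578), ('Europe', 'St. Peter Stifts Kulinarium', 803)]
```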

### 6.7. Joining everything for further analysis
When analyzing data it's often really helpful to have all the tables you want access to joined together into a single set of results that can be analyzed further.

In [37]:
%%sql
SELECT B.business, B.year_founded, CA.category, CO.country, CO.continent
FROM businesses AS B
LEFT JOIN categories AS CA
    ON B.category_code = CA.category_code
LEFT JOIN countries AS CO
    ON B.country_code = CO.country_code
LIMIT 10;

business,year_founded,category,country,continent
1st National Bank of St Lucia,1938,Banking & Finance,Saint Lucia,North America
Affligem Brewery,1074,"Distillers, Vintners, & Breweries",Belgium,Europe
Air Burkina,1967,Aviation & Transport,Burkina Faso,Africa
Air Madagascar,1962,Aviation & Transport,Madagascar,Africa
Air Seychelles,1977,Aviation & Transport,Seychelles,Asia
ALBtelecom,1912,Telecommunications,Albania,Europe
Andbank,1930,Banking & Finance,Andorra,Europe
Apatin Brewery,1756,"Distillers, Vintners, & Breweries",Serbia,Europe
Arab Bank,1930,Banking & Finance,Jordan,Asia
Arsenal AD,1878,Defense,Bulgaria,Europe


### 6.8. Counting categories by continent
Having businesses joined to categories and countries together means we can ask questions about both these things together. For example, which are the most common categories for the oldest businesses on each continent?

In [38]:
%%sql
SELECT CA.category, CO.continent, COUNT(B.business) AS Quantity
FROM businesses AS B
LEFT JOIN categories AS CA
    ON B.category_code = CA.category_code
LEFT JOIN countries AS CO
    ON B.country_code = CO.country_code
GROUP BY CA.category, CO.continent
LIMIT 10;

category,continent,Quantity
Banking & Finance,North America,4
"Distillers, Vintners, & Breweries",Europe,12
Aviation & Transport,Africa,10
Aviation & Transport,Asia,7
Telecommunications,Europe,1
Banking & Finance,Europe,5
Banking & Finance,Asia,6
Defense,Europe,1
"Distillers, Vintners, & Breweries",Africa,3
Banking & Finance,Africa,17


### 6.9. Filtering counts by continent and category
Combining continent and business category produces a lot of results, which makes it difficult to see what is important. To trim this down to a manageable size, let's sort the continent/category pairs by count and keep only the ten largest.

In [39]:
%%sql
SELECT CA.category, CO.continent, COUNT(B.business) AS Quantity
FROM businesses AS B
LEFT JOIN categories AS CA
    ON B.category_code = CA.category_code
LEFT JOIN countries AS CO
    ON B.country_code = CO.country_code
GROUP BY CA.category, CO.continent
ORDER BY Quantity DESC
LIMIT 10;

category,continent,Quantity
Banking & Finance,Africa,17
"Distillers, Vintners, & Breweries",Europe,12
Aviation & Transport,Africa,10
Postal Service,Africa,9
Manufacturing & Production,Europe,8
Aviation & Transport,Asia,7
Banking & Finance,Asia,6
Banking & Finance,Europe,5
"Distillers, Vintners, & Breweries",North America,5
Postal Service,Europe,4


## 7. Conclusion

* The oldest company is a construction company named `Kongō Gumi`, a `Japanese` company founded in the year `578`.

* There are 5 other companies founded before the year 1000. These companies are located in Europe and were established during the 800s and early 900s.

* `Banking & Finance` is the category with the largest number of companies.

* The most common continent/category pair is `Banking & Finance` in Africa (17 businesses), followed by `Distillers, Vintners, & Breweries` in Europe (12) and `Aviation & Transport` in Africa (10).

* Africa appears in several of the largest continent/category pairs. One possible explanation is the abundance of natural resources and inexpensive labor, but this dataset alone cannot confirm it.

## 8. References
* https://www.datacamp.com/projects/1168
* https://www.kaggle.com/datasets/manuelandresespitia/what-and-where-are-the-worlds-oldest-businesses
* https://deepnote.com/@manuelespitia1/What-and-where-are-the-worlds-oldest-businesses-cb2aea9f-e0fe-4c5b-9b5d-88d5d41160a5
* https://www.theceomagazine.com/business/management-leadership/japan-oldest-businesses/
* https://lacriaturacreativa.com/2020/02/19/este-mapa-muestra-las-empresas-mas-antiguas-de-cada-pais-del-mundo/