In [1]:
import pandas as pd
import numpy as np
from sqlalchemy import create_engine

In [2]:
df1 = pd.read_csv('./data/businesses.csv')

In [3]:
df2 = pd.read_csv('./data/categories.csv')

In [4]:
df3 = pd.read_csv('./data/countries.csv')

In [5]:
%load_ext sql

In [6]:
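# Connection settings for the local PostgreSQL instance
DB_NAME = "Oldest Business"
DB_USER = "postgres"
DB_PASS = "hellosql"
DB_HOST = "localhost"
DB_PORT = "5432"

# Include the port in the URL so that DB_PORT is actually used
engine = create_engine(
    f"postgresql+psycopg2://{DB_USER}:{DB_PASS}@{DB_HOST}:{DB_PORT}/{DB_NAME}"
)

# Table 1 (the name 'businessess' is spelled this way in every query below)
TABLE_NAME = 'businessess'
with engine.connect() as conn:
    df1.to_sql(name=TABLE_NAME, con=conn, index=False, if_exists='replace')

print("Data successfully loaded")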

Data successfully loaded


In [7]:
# Table 2: reuse the engine created in the previous cell
TABLE_NAME = 'categories'
with engine.connect() as conn:
    df2.to_sql(name=TABLE_NAME, con=conn, index=False, if_exists='replace')

print("Data successfully loaded")

Data successfully loaded


In [8]:
# Table 3: reuse the same engine again
TABLE_NAME = 'countries'
with engine.connect() as conn:
    df3.to_sql(name=TABLE_NAME, con=conn, index=False, if_exists='replace')

print("Data successfully loaded")

Data successfully loaded


In [9]:
with engine.connect() as conn:
    display(pd.read_sql('SELECT * FROM businessess', conn))

Unnamed: 0,business,year_founded,category_code,country_code
0,Hamoud Boualem,1878,CAT11,DZA
1,Communauté Électrique du Bénin,1968,CAT10,BEN
2,Botswana Meat Commission,1965,CAT1,BWA
3,Air Burkina,1967,CAT2,BFA
4,Brarudi,1955,CAT9,BDI
...,...,...,...,...
158,Cafe Brasilero,1877,CAT4,URY
159,Hacienda Chuao,1660,CAT11,VEN
160,Australia Post,1809,CAT16,AUS
161,Bank of New Zealand,1861,CAT3,NZL


In [11]:
%env DATABASE_URL= postgresql+psycopg2://postgres:hellosql@localhost:5432/Oldest Business

env: DATABASE_URL=postgresql+psycopg2://postgres:hellosql@localhost:5432/Oldest Business


In [12]:
%sql SELECT * FROM countries LIMIT 10

10 rows affected.


country_code,country,continent
AFG,Afghanistan,Asia
AGO,Angola,Africa
ALB,Albania,Europe
AND,Andorra,Europe
ARE,United Arab Emirates,Asia
ARG,Argentina,South America
ARM,Armenia,Asia
ATG,Antigua and Barbuda,North America
AUS,Australia,Oceania
AUT,Austria,Europe


### 1. Select the oldest and newest founding years from the businesses table

In [13]:
%%sql SELECT MIN(year_founded) as oldest, MAX(year_founded) as newest
FROM businessess


 * postgresql+psycopg2://postgres:***@localhost:5432/Oldest Business
1 rows affected.


oldest,newest
578,1999


### 2. How many businesses were founded before 1000?

In [14]:
%%sql SELECT COUNT(*) 
FROM businessess 
WHERE year_founded < 1000;

 * postgresql+psycopg2://postgres:***@localhost:5432/Oldest Business
1 rows affected.


count
6


### 3. Which businesses were founded before 1000?

In [15]:
%%sql SELECT *
FROM businessess 
WHERE year_founded < 1000;

 * postgresql+psycopg2://postgres:***@localhost:5432/Oldest Business
6 rows affected.


business,year_founded,category_code,country_code
Kongō Gumi,578,CAT6,JPN
St. Peter Stifts Kulinarium,803,CAT4,AUT
The Royal Mint,886,CAT12,GBR
Monnaie de Paris,864,CAT12,FRA
Staffelter Hof Winery,862,CAT9,DEU
Sean's Bar,900,CAT4,IRL


### 4. Exploring the categories

Now we know that the oldest continuously operating company in the world is called Kongō Gumi. But what does that company do? The category codes in the businesses table aren't very helpful; the descriptions of the categories are stored in the categories table.

This is a common problem: for data storage, it's better to keep different types of data in different tables, but for analysis, you want all the data in one place. To solve this, you'll have to join the two tables together.

In [16]:
%%sql SELECT *
FROM businessess as b
JOIN categories as c
ON b.category_code = c.category_code
WHERE business = 'Kongō Gumi'

 * postgresql+psycopg2://postgres:***@localhost:5432/Oldest Business
1 rows affected.


business,year_founded,category_code,country_code,category_code_1,category
Kongō Gumi,578,CAT6,JPN,CAT6,Construction
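
Because both tables contain `category_code`, `SELECT *` returns that column twice (the second copy shows up as `category_code_1`). A minimal sketch of the same join with an explicit column list is shown below. It assumes an in-memory SQLite database as a stand-in for the PostgreSQL instance so that it runs without a server, and it loads only two rows copied from the outputs above.

```python
import sqlite3

# Assumption: in-memory SQLite replaces the PostgreSQL database used in the notebook.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE businessess "
    "(business TEXT, year_founded INTEGER, category_code TEXT, country_code TEXT)"
)
conn.execute("CREATE TABLE categories (category_code TEXT, category TEXT)")

# Two sample rows taken from the outputs above, not the full dataset.
conn.executemany(
    "INSERT INTO businessess VALUES (?, ?, ?, ?)",
    [
        ("Kongō Gumi", 578, "CAT6", "JPN"),
        ("The Royal Mint", 886, "CAT12", "GBR"),
    ],
)
conn.executemany(
    "INSERT INTO categories VALUES (?, ?)",
    [
        ("CAT6", "Construction"),
        ("CAT12", "Manufacturing & Production"),
    ],
)

# Name the columns explicitly so category_code is not returned twice.
query = """
SELECT b.business, b.year_founded, c.category
FROM businessess AS b
JOIN categories AS c
  ON b.category_code = c.category_code
WHERE b.business = ?
"""
row = conn.execute(query, ("Kongō Gumi",)).fetchone()
print(row)
```

Passing the business name as a bound parameter (`?`) instead of embedding it in the SQL string also avoids quoting problems with names such as Sean's Bar.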


### 5. Counting the categories

With that extra detail about the oldest businesses, we can see that Kongō Gumi is a construction company. In that list of six businesses, we also see a café, a winery, and a bar. The two companies recorded as "Manufacturing & Production" are both mints. That is, they produce currency.

I'm curious which other industries the oldest companies around the world belong to, and which of those industries are most common.

In [17]:
%%sql SELECT c.category, COUNT(c.category) as n
FROM businessess as b
JOIN categories as c
ON b.category_code = c.category_code
GROUP BY c.category
ORDER BY COUNT(c.category) DESC;

 * postgresql+psycopg2://postgres:***@localhost:5432/Oldest Business
19 rows affected.


category,n
Banking & Finance,37
"Distillers, Vintners, & Breweries",22
Aviation & Transport,19
Postal Service,16
Manufacturing & Production,15
Media,7
Agriculture,6
Food & Beverages,6
"Cafés, Restaurants & Bars",6
Energy,4


### 6. Oldest business by continent
It looks like "Banking & Finance" is the most popular category. Maybe that's where you should aim if you want to start a thousand-year business.

One thing we haven't looked at yet is where in the world these really old businesses are. To answer this question, we'll need to join the businesses table to the countries table. Let's start by asking how old the oldest business is on each continent.

In [18]:
%%sql SELECT b.year_founded as oldest, cn.continent
FROM businessess as b
JOIN countries as cn
ON b.country_code = cn.country_code
WHERE b.year_founded < 1000
ORDER BY  b.year_founded ASC, cn.continent;

 * postgresql+psycopg2://postgres:***@localhost:5432/Oldest Business
6 rows affected.


oldest,continent
578,Asia
803,Europe
862,Europe
864,Europe
886,Europe
900,Europe
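
The query above filters on `year_founded < 1000`, so it lists the individual pre-1000 businesses rather than one row per continent. One way to get the oldest founding year for every continent is `MIN` combined with `GROUP BY`. The sketch below is an illustration under stated assumptions: it uses in-memory SQLite instead of PostgreSQL, and it loads only a handful of businesses that appear in the outputs of this notebook, so the minima it prints describe this sample and not the full table.

```python
import sqlite3

# Assumption: in-memory SQLite replaces the PostgreSQL database used in the notebook.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE businessess (business TEXT, year_founded INTEGER, country_code TEXT)"
)
conn.execute("CREATE TABLE countries (country_code TEXT, continent TEXT)")

# Sample rows only; the real tables are much larger.
conn.executemany(
    "INSERT INTO businessess VALUES (?, ?, ?)",
    [
        ("Kongō Gumi", 578, "JPN"),
        ("St. Peter Stifts Kulinarium", 803, "AUT"),
        ("Sean's Bar", 900, "IRL"),
        ("Shirley Plantation", 1638, "USA"),
        ("Hacienda Chuao", 1660, "VEN"),
        ("Cafe Brasilero", 1877, "URY"),
        ("Australia Post", 1809, "AUS"),
        ("Bank of New Zealand", 1861, "NZL"),
        ("Hamoud Boualem", 1878, "DZA"),
        ("Botswana Meat Commission", 1965, "BWA"),
    ],
)
conn.executemany(
    "INSERT INTO countries VALUES (?, ?)",
    [
        ("JPN", "Asia"), ("AUT", "Europe"), ("IRL", "Europe"),
        ("USA", "North America"), ("VEN", "South America"),
        ("URY", "South America"), ("AUS", "Oceania"), ("NZL", "Oceania"),
        ("DZA", "Africa"), ("BWA", "Africa"),
    ],
)

# One row per continent: the earliest founding year among its businesses.
query = """
SELECT cn.continent, MIN(b.year_founded) AS oldest
FROM businessess AS b
JOIN countries AS cn
  ON b.country_code = cn.country_code
GROUP BY cn.continent
ORDER BY oldest
"""
oldest_by_continent = conn.execute(query).fetchall()
for continent, oldest in oldest_by_continent:
    print(continent, oldest)
```

The same `SELECT` should run unchanged in a `%%sql` cell against the full tables.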


### 7. Joining everything for further analysis
Interesting. There's a jump in time from the older businesses in Asia and Europe to the 16th-century oldest businesses in North and South America, and then to the 18th- and 19th-century oldest businesses in Africa and Oceania.

As mentioned earlier, when analyzing data it's often really helpful to have all the tables you want access to joined together into a single set of results that can be analyzed further. Here, that means we need to join all three tables.

In [19]:
%%sql SELECT b.business, b.year_founded, c.category_code, c.category, cn.country_code, cn.continent
FROM categories as c
JOIN businessess as b
ON b.category_code = c.category_code
JOIN countries as cn
ON b.country_code = cn.country_code

 * postgresql+psycopg2://postgres:***@localhost:5432/Oldest Business
163 rows affected.


business,year_founded,category_code,category,country_code,continent
Botswana Meat Commission,1965,CAT1,Agriculture,BWA,Africa
Spinzar Cotton Company,1930,CAT1,Agriculture,AFG,Asia
Shirley Plantation,1638,CAT1,Agriculture,USA,North America
Cotontchad,1971,CAT1,Agriculture,TCD,Africa
Casa de Ganaderos,1218,CAT1,Agriculture,ESP,Europe
Cameroon Development Corporation,1947,CAT1,Agriculture,CMR,Africa
Petroleum Development Oman,1937,CAT10,Energy,OMN,Asia
Electricite du Laos,1959,CAT10,Energy,LAO,Asia
North Oil Company,1928,CAT10,Energy,IRQ,Asia
Communauté Électrique du Bénin,1968,CAT10,Energy,BEN,Africa
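
If the three-way join is going to be reused, one option is to store it as a view so that later queries can treat it like a single table. The following is a minimal sketch, not part of the original analysis: `business_details` is a hypothetical view name, in-memory SQLite stands in for PostgreSQL, the `country` name column is omitted for brevity, and the rows are a small sample copied from the output above.

```python
import sqlite3

# Assumption: in-memory SQLite replaces the PostgreSQL database used in the notebook.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE businessess
    (business TEXT, year_founded INTEGER, category_code TEXT, country_code TEXT);
CREATE TABLE categories (category_code TEXT, category TEXT);
CREATE TABLE countries (country_code TEXT, continent TEXT);

-- business_details is a hypothetical name chosen for this sketch
CREATE VIEW business_details AS
SELECT b.business, b.year_founded, c.category_code, c.category,
       cn.country_code, cn.continent
FROM categories AS c
JOIN businessess AS b ON b.category_code = c.category_code
JOIN countries AS cn ON b.country_code = cn.country_code;
""")

# Sample rows only, copied from the joined output above.
conn.executemany(
    "INSERT INTO businessess VALUES (?, ?, ?, ?)",
    [
        ("Botswana Meat Commission", 1965, "CAT1", "BWA"),
        ("Shirley Plantation", 1638, "CAT1", "USA"),
        ("Casa de Ganaderos", 1218, "CAT1", "ESP"),
        ("Communauté Électrique du Bénin", 1968, "CAT10", "BEN"),
        ("Petroleum Development Oman", 1937, "CAT10", "OMN"),
    ],
)
conn.executemany(
    "INSERT INTO categories VALUES (?, ?)",
    [("CAT1", "Agriculture"), ("CAT10", "Energy")],
)
conn.executemany(
    "INSERT INTO countries VALUES (?, ?)",
    [
        ("BWA", "Africa"), ("USA", "North America"), ("ESP", "Europe"),
        ("BEN", "Africa"), ("OMN", "Asia"),
    ],
)

# Later questions can query the view without repeating the joins.
per_continent = conn.execute(
    "SELECT continent, COUNT(*) AS n FROM business_details "
    "GROUP BY continent ORDER BY n DESC, continent"
).fetchall()
print(per_continent)
```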


### 8. Counting categories by continent

With businesses joined to both categories and countries, we can ask questions that involve both at once. For example, which are the most common categories for the oldest businesses on each continent?

In [20]:
%%sql SELECT   cn.continent, c.category, COUNT(c.category) as count
FROM categories as c
JOIN businessess as b
ON b.category_code = c.category_code
JOIN countries as cn
ON b.country_code = cn.country_code
GROUP BY cn.continent, c.category
ORDER BY count DESC;

 * postgresql+psycopg2://postgres:***@localhost:5432/Oldest Business
56 rows affected.


continent,category,count
Africa,Banking & Finance,17
Europe,"Distillers, Vintners, & Breweries",12
Africa,Aviation & Transport,10
Africa,Postal Service,9
Europe,Manufacturing & Production,8
Asia,Aviation & Transport,7
Asia,Banking & Finance,6
Europe,Banking & Finance,5
North America,"Distillers, Vintners, & Breweries",5
North America,Banking & Finance,4
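
The result above is sorted by count across all continents, so the top category of a given continent still has to be picked out by eye. A small sketch of one way to keep only the most common category per continent follows. It assumes in-memory SQLite as a stand-in for PostgreSQL and uses ten sample rows from the joined output shown earlier, so the counts describe that sample only.

```python
import sqlite3

# Assumption: in-memory SQLite replaces the PostgreSQL database used in the notebook.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE businessess (business TEXT, category_code TEXT, country_code TEXT)"
)
conn.execute("CREATE TABLE categories (category_code TEXT, category TEXT)")
conn.execute("CREATE TABLE countries (country_code TEXT, continent TEXT)")

# Sample rows only, taken from the three-table join output.
conn.executemany(
    "INSERT INTO businessess VALUES (?, ?, ?)",
    [
        ("Botswana Meat Commission", "CAT1", "BWA"),
        ("Spinzar Cotton Company", "CAT1", "AFG"),
        ("Shirley Plantation", "CAT1", "USA"),
        ("Cotontchad", "CAT1", "TCD"),
        ("Casa de Ganaderos", "CAT1", "ESP"),
        ("Cameroon Development Corporation", "CAT1", "CMR"),
        ("Petroleum Development Oman", "CAT10", "OMN"),
        ("Electricite du Laos", "CAT10", "LAO"),
        ("North Oil Company", "CAT10", "IRQ"),
        ("Communauté Électrique du Bénin", "CAT10", "BEN"),
    ],
)
conn.executemany(
    "INSERT INTO categories VALUES (?, ?)",
    [("CAT1", "Agriculture"), ("CAT10", "Energy")],
)
conn.executemany(
    "INSERT INTO countries VALUES (?, ?)",
    [
        ("BWA", "Africa"), ("AFG", "Asia"), ("USA", "North America"),
        ("TCD", "Africa"), ("ESP", "Europe"), ("CMR", "Africa"),
        ("OMN", "Asia"), ("LAO", "Asia"), ("IRQ", "Asia"), ("BEN", "Africa"),
    ],
)

# Sort within each continent so that its most common category comes first.
query = """
SELECT cn.continent, c.category, COUNT(*) AS n
FROM categories AS c
JOIN businessess AS b ON b.category_code = c.category_code
JOIN countries AS cn ON b.country_code = cn.country_code
GROUP BY cn.continent, c.category
ORDER BY cn.continent, n DESC, c.category
"""

# Keep only the first (highest-count) row seen for each continent.
top_category = {}
for continent, category, n in conn.execute(query):
    top_category.setdefault(continent, (category, n))

for continent, (category, n) in top_category.items():
    print(continent, category, n)
```

Ties are broken alphabetically by category name here; that rule is a choice made for this sketch.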


### 9. Filtering counts by continent and category

Combining continent and business category led to a lot of results. It's difficult to see what is important. To trim this down to a manageable size, let's restrict the results to only continent/category pairs with a high count.

In [21]:
%%sql SELECT   cn.continent, c.category, COUNT(c.category) as count
FROM categories as c
JOIN businessess as b
ON b.category_code = c.category_code
JOIN countries as cn
ON b.country_code = cn.country_code
GROUP BY cn.continent, c.category
HAVING COUNT(c.category) > 5
ORDER BY count DESC;

 * postgresql+psycopg2://postgres:***@localhost:5432/Oldest Business
7 rows affected.


continent,category,count
Africa,Banking & Finance,17
Europe,"Distillers, Vintners, & Breweries",12
Africa,Aviation & Transport,10
Africa,Postal Service,9
Europe,Manufacturing & Production,8
Asia,Aviation & Transport,7
Asia,Banking & Finance,6
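
`HAVING` is needed here because the filter applies to the aggregated count, which does not exist yet when `WHERE` is evaluated. The sketch below wraps the query in a hypothetical helper, `frequent_pairs`, so the threshold can be varied. It assumes in-memory SQLite in place of PostgreSQL and ten sample rows from the earlier join output, which is why it uses a lower threshold than the cell above.

```python
import sqlite3

# Assumption: in-memory SQLite replaces the PostgreSQL database used in the notebook.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE businessess (business TEXT, category_code TEXT, country_code TEXT)"
)
conn.execute("CREATE TABLE categories (category_code TEXT, category TEXT)")
conn.execute("CREATE TABLE countries (country_code TEXT, continent TEXT)")

# Sample rows only, taken from the three-table join output.
conn.executemany(
    "INSERT INTO businessess VALUES (?, ?, ?)",
    [
        ("Botswana Meat Commission", "CAT1", "BWA"),
        ("Spinzar Cotton Company", "CAT1", "AFG"),
        ("Shirley Plantation", "CAT1", "USA"),
        ("Cotontchad", "CAT1", "TCD"),
        ("Casa de Ganaderos", "CAT1", "ESP"),
        ("Cameroon Development Corporation", "CAT1", "CMR"),
        ("Petroleum Development Oman", "CAT10", "OMN"),
        ("Electricite du Laos", "CAT10", "LAO"),
        ("North Oil Company", "CAT10", "IRQ"),
        ("Communauté Électrique du Bénin", "CAT10", "BEN"),
    ],
)
conn.executemany(
    "INSERT INTO categories VALUES (?, ?)",
    [("CAT1", "Agriculture"), ("CAT10", "Energy")],
)
conn.executemany(
    "INSERT INTO countries VALUES (?, ?)",
    [
        ("BWA", "Africa"), ("AFG", "Asia"), ("USA", "North America"),
        ("TCD", "Africa"), ("ESP", "Europe"), ("CMR", "Africa"),
        ("OMN", "Asia"), ("LAO", "Asia"), ("IRQ", "Asia"), ("BEN", "Africa"),
    ],
)


def frequent_pairs(conn, min_count):
    """Return continent/category pairs whose count exceeds min_count.

    Hypothetical helper for this sketch; HAVING filters the groups
    after COUNT(*) has been computed for each of them.
    """
    query = """
    SELECT cn.continent, c.category, COUNT(*) AS n
    FROM categories AS c
    JOIN businessess AS b ON b.category_code = c.category_code
    JOIN countries AS cn ON b.country_code = cn.country_code
    GROUP BY cn.continent, c.category
    HAVING COUNT(*) > ?
    ORDER BY n DESC, cn.continent
    """
    return conn.execute(query, (min_count,)).fetchall()


all_pairs = frequent_pairs(conn, 0)   # every pair present in the sample
high_pairs = frequent_pairs(conn, 2)  # only pairs with more than 2 businesses
print(len(all_pairs), high_pairs)
```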
