## 1. The oldest businesses in the world
<p>This is Staffelter Hof Winery, Germany's oldest business, which was established in 862 under the Carolingian dynasty. It has continued to serve customers through dramatic changes in Europe such as the Holy Roman Empire, the Ottoman Empire, and both world wars. What characteristics enable a business to stand the test of time? Image credit: <a href="https://commons.wikimedia.org/wiki/File:MKn_Staffelter_Hof.jpg">Martin Kraft</a>
<img src="https://assets.datacamp.com/production/project_1383/./img/MKn_Staffelter_Hof.jpeg" alt="The entrance to Staffelter Hof Winery, a German winery established in 862." width="400"></p>
<p>To help answer this question, BusinessFinancing.co.uk <a href="https://businessfinancing.co.uk/the-oldest-company-in-almost-every-country">researched</a> the oldest company that is still in business in almost every country and compiled the results into a dataset. Let's explore this work to better understand these historic businesses. Our datasets, which are all located in the <code>datasets</code> directory, contain the following information: </p>
<h3 id="businessesandnew_businesses"><code>businesses</code> and <code>new_businesses</code></h3>
<table>
<thead>
<tr>
<th style="text-align:left;">column</th>
<th>type</th>
<th>meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;"><code>business</code></td>
<td>varchar</td>
<td>Name of the business.</td>
</tr>
<tr>
<td style="text-align:left;"><code>year_founded</code></td>
<td>int</td>
<td>Year the business was founded.</td>
</tr>
<tr>
<td style="text-align:left;"><code>category_code</code></td>
<td>varchar</td>
<td>Code for the category of the business.</td>
</tr>
<tr>
<td style="text-align:left;"><code>country_code</code></td>
<td>char</td>
<td>ISO 3166-1 3-letter country code.</td>
</tr>
</tbody>
</table>
<h3 id="countries"><code>countries</code></h3>
<table>
<thead>
<tr>
<th style="text-align:left;">column</th>
<th>type</th>
<th>meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;"><code>country_code</code></td>
<td>varchar</td>
<td>ISO 3166-1 3-letter country code.</td>
</tr>
<tr>
<td style="text-align:left;"><code>country</code></td>
<td>varchar</td>
<td>Name of the country.</td>
</tr>
<tr>
<td style="text-align:left;"><code>continent</code></td>
<td>varchar</td>
<td>Name of the continent that the country exists in.</td>
</tr>
</tbody>
</table>
<h3 id="categories"><code>categories</code></h3>
<table>
<thead>
<tr>
<th style="text-align:left;">column</th>
<th>type</th>
<th>meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;"><code>category_code</code></td>
<td>varchar</td>
<td>Code for the category of the business.</td>
</tr>
<tr>
<td style="text-align:left;"><code>category</code></td>
<td>varchar</td>
<td>Description of the business category.</td>
</tr>
</tbody>
</table>
<p>Now let's learn about some of the world's oldest businesses still in operation!</p>

In [46]:
# Import the pandas library under its usual alias 
import pandas as pd

# Load the businesses.csv file as a DataFrame called businesses
businesses = pd.read_csv('datasets/businesses.csv')
display(businesses)

# Sort businesses from oldest to youngest
sorted_businesses = businesses.sort_values(by = 'year_founded', ascending = True)

# Display the first few lines of sorted_businesses
sorted_businesses.head()

Unnamed: 0,business,year_founded,category_code,country_code
0,Hamoud Boualem,1878,CAT11,DZA
1,Communauté Électrique du Bénin,1968,CAT10,BEN
2,Botswana Meat Commission,1965,CAT1,BWA
3,Air Burkina,1967,CAT2,BFA
4,Brarudi,1955,CAT9,BDI
...,...,...,...,...
158,Cafe Brasilero,1877,CAT4,URY
159,Hacienda Chuao,1660,CAT11,VEN
160,Australia Post,1809,CAT16,AUS
161,Bank of New Zealand,1861,CAT3,NZL


Unnamed: 0,business,year_founded,category_code,country_code
64,Kongō Gumi,578,CAT6,JPN
94,St. Peter Stifts Kulinarium,803,CAT4,AUT
107,Staffelter Hof Winery,862,CAT9,DEU
106,Monnaie de Paris,864,CAT12,FRA
103,The Royal Mint,886,CAT12,GBR


## 2. The oldest businesses in North America
<p>So far we've learned that Kongō Gumi is the world's oldest continuously operating business, beating out the second oldest business by well over 100 years! It's a little hard to read the country codes, though. Wouldn't it be nice if we had a list of country names to go along with the country codes?</p>
<p>Enter <code>countries.csv</code>, which is also located in the <code>datasets</code> folder. Having useful information in different files is a common problem: for data storage, it's better to keep different types of data separate, but for analysis, we want all the data in one place. To solve this, we'll have to join the two tables together. </p>
<h3 id="countries"><code>countries</code></h3>
<table>
<thead>
<tr>
<th style="text-align:left;">column</th>
<th>type</th>
<th>meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;"><code>country_code</code></td>
<td>varchar</td>
<td>ISO 3166-1 3-letter country code.</td>
</tr>
<tr>
<td style="text-align:left;"><code>country</code></td>
<td>varchar</td>
<td>Name of the country.</td>
</tr>
<tr>
<td style="text-align:left;"><code>continent</code></td>
<td>varchar</td>
<td>Name of the continent that the country exists in.</td>
</tr>
</tbody>
</table>
<p>Since <code>countries.csv</code> contains a <code>continent</code> column, merging the datasets will also allow us to look at the oldest business on each continent! </p>
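
<p>As a minimal sketch of what the join does, the toy frames below stand in for the two files. The names <code>toy_businesses</code>, <code>toy_countries</code>, and <code>toy_joined</code> are hypothetical, and the rows are a hand-picked illustration rather than the full datasets.</p>

```python
import pandas as pd

# Hypothetical stand-ins for businesses.csv and countries.csv (illustrative rows only)
toy_businesses = pd.DataFrame({
    "business": ["Kongō Gumi", "Staffelter Hof Winery"],
    "year_founded": [578, 862],
    "country_code": ["JPN", "DEU"],
})
toy_countries = pd.DataFrame({
    "country_code": ["DEU", "JPN", "FRA"],
    "country": ["Germany", "Japan", "France"],
    "continent": ["Europe", "Asia", "Europe"],
})

# merge() defaults to an inner join: only codes found in both tables survive,
# so the FRA row (no matching business here) is dropped
toy_joined = toy_businesses.merge(toy_countries, on="country_code")
print(toy_joined)
```

<p>Each business row picks up the <code>country</code> and <code>continent</code> columns of its matching country code.</p>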

In [47]:
# Load countries.csv to a DataFrame
countries = pd.read_csv('datasets/countries.csv')

# Merge sorted_businesses with countries
businesses_countries = sorted_businesses.merge(countries, on = 'country_code')

# Filter businesses_countries to include countries in North America only
north_america = businesses_countries[businesses_countries['continent'] == 'North America']
north_america.head()

Unnamed: 0,business,year_founded,category_code,country_code,country,continent
22,La Casa de Moneda de México,1534,CAT12,MEX,Mexico,North America
28,Shirley Plantation,1638,CAT1,USA,United States,North America
33,Hudson's Bay Company,1670,CAT17,CAN,Canada,North America
35,Mount Gay Rum,1703,CAT9,BRB,Barbados,North America
40,Rose Hall,1770,CAT19,JAM,Jamaica,North America


## 3. The oldest business on each continent
<p>Now we can see that the oldest company in North America is La Casa de Moneda de México, founded in 1534. Why stop there, though, when we could easily find out the oldest business on every continent? </p>

### Explore the difference between two groupby methods (just a test)

In [48]:
continent = businesses_countries.groupby('continent').year_founded.min()
display(continent.head())

# Turn continent into a DataFrame
continent = continent.to_frame()
display(continent.head())
# Check the shape of continent
display(continent.shape)
# Now 'continent' is the index. Retrieve the data for Asia
display(continent.loc['Asia'])
display(continent.loc['Asia'].values[0])
display(continent['year_founded'])
display(continent['year_founded'].values)

continent = continent.reset_index()
display(continent.head())

continent
Africa           1772
Asia              578
Europe            803
North America    1534
Oceania          1809
Name: year_founded, dtype: int64

Unnamed: 0_level_0,year_founded
continent,Unnamed: 1_level_1
Africa,1772
Asia,578
Europe,803
North America,1534
Oceania,1809


(6, 1)

year_founded    578
Name: Asia, dtype: int64

578

continent
Africa           1772
Asia              578
Europe            803
North America    1534
Oceania          1809
South America    1565
Name: year_founded, dtype: int64

array([1772,  578,  803, 1534, 1809, 1565])

Unnamed: 0,continent,year_founded
0,Africa,1772
1,Asia,578
2,Europe,803
3,North America,1534
4,Oceania,1809


In [49]:
# Create continent, which lists only the continent and oldest year_founded
# The project checker raised: AssertionError: Your `continent` DataFrame should have `continent` as its index and should have a single column called `year_founded`.
# continent = businesses_countries.groupby('continent').year_founded.min() # this returns a Series, not a DataFrame; a Series has no .merge() method, so it cannot be used in the next step
continent = businesses_countries.groupby("continent").agg({"year_founded":"min"})

display(continent.head())

# Merge continent with businesses_countries
merged_continent = continent.merge(businesses_countries, on = ['continent', 'year_founded'])

# Subset merged_continent so that only the four columns of interest are included
subset_merged_continent = merged_continent[['continent', 'country', 'business', 'year_founded']]
subset_merged_continent

Unnamed: 0_level_0,year_founded
continent,Unnamed: 1_level_1
Africa,1772
Asia,578
Europe,803
North America,1534
Oceania,1809


Unnamed: 0,continent,country,business,year_founded
0,Africa,Mauritius,Mauritius Post,1772
1,Asia,Japan,Kongō Gumi,578
2,Europe,Austria,St. Peter Stifts Kulinarium,803
3,North America,Mexico,La Casa de Moneda de México,1534
4,Oceania,Australia,Australia Post,1809
5,South America,Peru,Casa Nacional de Moneda,1565


In [50]:

continent = businesses_countries.groupby("continent").agg({"year_founded":"min"})
display(continent.head())

continent = continent.reset_index()
display(continent.head())


Unnamed: 0_level_0,year_founded
continent,Unnamed: 1_level_1
Africa,1772
Asia,578
Europe,803
North America,1534
Oceania,1809


Unnamed: 0,continent,year_founded
0,Africa,1772
1,Asia,578
2,Europe,803
3,North America,1534
4,Oceania,1809
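
<p>The Series-versus-DataFrame distinction explored above can be reproduced on a toy frame. The sketch below also shows a different technique from the merge used in this notebook: <code>idxmin()</code>, which looks up the row label of each group's minimum. The frame <code>toy</code> and its placeholder business names are hypothetical.</p>

```python
import pandas as pd

# Hypothetical toy data; business names are placeholders
toy = pd.DataFrame({
    "continent": ["Asia", "Asia", "Europe", "Europe"],
    "business": ["A", "B", "C", "D"],
    "year_founded": [1900, 578, 862, 803],
})

# Selecting a column and calling .min() yields a Series indexed by continent
as_series = toy.groupby("continent")["year_founded"].min()

# .agg() with a dict yields a DataFrame with a single year_founded column
as_frame = toy.groupby("continent").agg({"year_founded": "min"})

# idxmin() returns the row label of the minimum in each group,
# so .loc can fetch the whole oldest row without a second merge
oldest_rows = toy.loc[toy.groupby("continent")["year_founded"].idxmin()]

print(type(as_series).__name__, type(as_frame).__name__)
print(oldest_rows)
```

<p>One difference worth keeping in mind: <code>idxmin()</code> returns a single row per group even when two businesses share the same founding year, whereas merging on <code>continent</code> and <code>year_founded</code> keeps every tied row.</p>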


## 4. Unknown oldest businesses
<p>BusinessFinancing.co.uk wasn't able to determine the oldest business for some countries, and those countries are simply left out of <code>businesses.csv</code> and, by extension, <code>businesses</code>. However, the <code>countries</code> DataFrame that we created <em>does</em> include all countries in the world, regardless of whether the oldest business is known. </p>
<p>We can compare the two datasets in one DataFrame to find out which countries don't have a known oldest business! </p>

### indicator=True in merge method (just a test)

In [51]:
# In pandas, merging with indicator=True adds a special column called _merge to the output DataFrame.
# It records the source of each row and can take three possible values:
#   'left_only'  - the key appears only in the left DataFrame
#   'right_only' - the key appears only in the right DataFrame
#   'both'       - the key appears in both DataFrames

import pandas as pd

# Create two sample DataFrames
df_left = pd.DataFrame({
    'id': [1, 2, 3],
    'value': ['A', 'B', 'C']
})

df_right = pd.DataFrame({
    'id': [2, 3, 4],
    'value': ['X', 'Y', 'Z']
})

display(df_left)
display(df_right)

# Perform an outer merge with indicator=True; the result is a DataFrame
result = pd.merge(df_left, df_right, on='id', how='outer', indicator=True)
display(result)


Unnamed: 0,id,value
0,1,A
1,2,B
2,3,C


Unnamed: 0,id,value
0,2,X
1,3,Y
2,4,Z


Unnamed: 0,id,value_x,value_y,_merge
0,1,A,,left_only
1,2,B,X,both
2,3,C,Y,both
3,4,,Z,right_only


In [52]:
# Use .merge() to create a DataFrame, all_countries
all_countries = businesses.merge(countries, on="country_code", how="right",  indicator=True)
display(all_countries.head())  

# Filter to include only countries without oldest businesses
missing_countries = all_countries[all_countries["_merge"] != "both"] # _merge is a column added by merge

# Create a series of the country names with missing oldest business data
missing_countries_series = missing_countries["country"]

# Display the series
missing_countries_series

Unnamed: 0,business,year_founded,category_code,country_code,country,continent,_merge
0,Spinzar Cotton Company,1930.0,CAT1,AFG,Afghanistan,Asia,both
1,,,,AGO,Angola,Africa,right_only
2,ALBtelecom,1912.0,CAT18,ALB,Albania,Europe,both
3,Andbank,1930.0,CAT3,AND,Andorra,Europe,both
4,Liwa Chemicals,1939.0,CAT12,ARE,United Arab Emirates,Asia,both


1                                Angola
7                   Antigua and Barbuda
18                              Bahamas
48                   Dominican Republic
50                              Ecuador
57                                 Fiji
59      Micronesia, Federated States of
63                                Ghana
65                               Gambia
69                              Grenada
79            Iran, Islamic Republic of
89                           Kyrgyzstan
91                             Kiribati
92                Saint Kitts and Nevis
107                              Monaco
108                Moldova, Republic of
110                            Maldives
112                    Marshall Islands
131                               Nauru
138                               Palau
139                    Papua New Guinea
143                            Paraguay
144                 Palestine, State of
153                     Solomon Islands
160                            Suriname


## 5. Adding new oldest business data
<p>It looks like we've got some holes in our dataset! Fortunately, we've taken it upon ourselves to improve upon BusinessFinancing.co.uk's work and find the oldest businesses in a few of the missing countries. We've stored the newfound oldest businesses in <code>new_businesses</code>, located at <code>"datasets/new_businesses.csv"</code>. It has the exact same structure as our <code>businesses</code> dataset. </p>
<h3 id="new_businesses"><code>new_businesses</code></h3>
<table>
<thead>
<tr>
<th style="text-align:left;">column</th>
<th>type</th>
<th>meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;"><code>business</code></td>
<td>varchar</td>
<td>Name of the business.</td>
</tr>
<tr>
<td style="text-align:left;"><code>year_founded</code></td>
<td>int</td>
<td>Year the business was founded.</td>
</tr>
<tr>
<td style="text-align:left;"><code>category_code</code></td>
<td>varchar</td>
<td>Code for the category of the business.</td>
</tr>
<tr>
<td style="text-align:left;"><code>country_code</code></td>
<td>char</td>
<td>ISO 3166-1 3-letter country code.</td>
</tr>
</tbody>
</table>
<p>All we have to do is combine the two so that we've got one more complete list of businesses!</p>

### Vertical Concatenation (just a test)

In [53]:
import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30]
})

df2 = pd.DataFrame({
    'Name': ['Clara', 'David'],
    'Age': [22, 29]
})

# Concatenate the DataFrames vertically
vertical_concat = pd.concat([df1, df2], axis=0)

display(vertical_concat)

Unnamed: 0,Name,Age
0,Alice,25
1,Bob,30
0,Clara,22
1,David,29
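
<p>Note that the stacked result above repeats the index labels 0 and 1. If a fresh 0..n-1 index is preferred, <code>pd.concat</code> accepts <code>ignore_index=True</code>. A minimal sketch, with hypothetical frame names <code>top</code> and <code>bottom</code>:</p>

```python
import pandas as pd

# Same toy data as the vertical concatenation test, under hypothetical names
top = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})
bottom = pd.DataFrame({"Name": ["Clara", "David"], "Age": [22, 29]})

# ignore_index=True discards the original labels and renumbers the rows
stacked = pd.concat([top, bottom], ignore_index=True)
print(stacked)
```

<p>With unique labels, <code>stacked.loc[2]</code> refers to exactly one row, which avoids surprises when selecting by label later.</p>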


### Horizontal Concatenation

In [54]:
# Create another DataFrame
df3 = pd.DataFrame({
    'City': ['New York', 'Los Angeles'],
    'Job': ['Engineer', 'Doctor']
})

# Concatenate df1 and df3 horizontally
horizontal_concat = pd.concat([df1, df3], axis=1)

display(horizontal_concat)

Unnamed: 0,Name,Age,City,Job
0,Alice,25,New York,Engineer
1,Bob,30,Los Angeles,Doctor


### Example of Concatenating DataFrames with Different Numbers of Columns

In [55]:
import pandas as pd

# Create two DataFrames with different columns
df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30]
})

df2 = pd.DataFrame({
    'Name': ['Clara', 'David'],
    'City': ['Boston', 'Seattle']
})

# Concatenate the DataFrames vertically
vertical_concat = pd.concat([df1, df2], axis=0, sort=False)
display(vertical_concat)

Unnamed: 0,Name,Age,City
0,Alice,25.0,
1,Bob,30.0,
0,Clara,,Boston
1,David,,Seattle


In [56]:
# Import new_businesses.csv
new_businesses = pd.read_csv("datasets/new_businesses.csv")
display(new_businesses.head())
display(businesses.head())
# Add the data in new_businesses to the existing businesses
all_businesses = pd.concat([new_businesses, businesses]) # default axis=0
display(all_businesses.head())

# Merge and filter to find countries with missing business data
new_all_countries = all_businesses.merge(countries, on="country_code", how="outer",  indicator=True)
display(new_all_countries.head())
new_missing_countries = new_all_countries[new_all_countries["_merge"] != "both"]

# Group by continent and create a "count_missing" column
count_missing = new_missing_countries.groupby("continent").agg({"country":"count"})
count_missing.columns = ["count_missing"]
count_missing

Unnamed: 0,business,year_founded,category_code,country_code
0,Fiji Times,1869,CAT13,FJI
1,J. Armando Bermúdez & Co.,1852,CAT9,DOM


Unnamed: 0,business,year_founded,category_code,country_code
0,Hamoud Boualem,1878,CAT11,DZA
1,Communauté Électrique du Bénin,1968,CAT10,BEN
2,Botswana Meat Commission,1965,CAT1,BWA
3,Air Burkina,1967,CAT2,BFA
4,Brarudi,1955,CAT9,BDI


Unnamed: 0,business,year_founded,category_code,country_code
0,Fiji Times,1869,CAT13,FJI
1,J. Armando Bermúdez & Co.,1852,CAT9,DOM
0,Hamoud Boualem,1878,CAT11,DZA
1,Communauté Électrique du Bénin,1968,CAT10,BEN
2,Botswana Meat Commission,1965,CAT1,BWA


Unnamed: 0,business,year_founded,category_code,country_code,country,continent,_merge
0,Spinzar Cotton Company,1930.0,CAT1,AFG,Afghanistan,Asia,both
1,,,,AGO,Angola,Africa,right_only
2,ALBtelecom,1912.0,CAT18,ALB,Albania,Europe,both
3,Andbank,1930.0,CAT3,AND,Andorra,Europe,both
4,Liwa Chemicals,1939.0,CAT12,ARE,United Arab Emirates,Asia,both


Unnamed: 0_level_0,count_missing
continent,Unnamed: 1_level_1
Africa,3
Asia,7
Europe,2
North America,5
Oceania,10
South America,3


## 6. The oldest industries
<p>Remember our oldest business in the world, Kongō Gumi? </p>
<table>
<thead>
<tr>
<th style="text-align:right;"></th>
<th style="text-align:left;">business</th>
<th style="text-align:right;">year_founded</th>
<th style="text-align:left;">category_code</th>
<th style="text-align:left;">country_code</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:right;">64</td>
<td style="text-align:left;">Kongō Gumi</td>
<td style="text-align:right;">578</td>
<td style="text-align:left;">CAT6</td>
<td style="text-align:left;">JPN</td>
</tr>
</tbody>
</table>
<p>We know Kongō Gumi was founded in the year 578 in Japan, but it's a little hard to decipher which industry it's in. Information about what the <code>category_code</code> column refers to is in <code>"datasets/categories.csv"</code>: </p>
<h3 id="categories"><code>categories</code></h3>
<table>
<thead>
<tr>
<th style="text-align:left;">column</th>
<th>type</th>
<th>meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;"><code>category_code</code></td>
<td>varchar</td>
<td>Code for the category of the business.</td>
</tr>
<tr>
<td style="text-align:left;"><code>category</code></td>
<td>varchar</td>
<td>Description of the business category.</td>
</tr>
</tbody>
</table>
<p>Let's use <code>categories.csv</code> to understand how many oldest businesses are in each category of industry.</p>

In [57]:
# Import categories.csv and merge to businesses
categories = pd.read_csv("datasets/categories.csv")
businesses_categories = businesses.merge(categories, on="category_code")
display(businesses_categories.head())

# Create a DataFrame which lists the number of oldest businesses in each category
count_business_cats = businesses_categories.groupby('category').agg({'business': 'count'})

# Rename column and display the first five rows of the DataFrame
count_business_cats.columns = ['count']
display(count_business_cats.head())

Unnamed: 0,business,year_founded,category_code,country_code,category
0,Hamoud Boualem,1878,CAT11,DZA,Food & Beverages
1,Communauté Électrique du Bénin,1968,CAT10,BEN,Energy
2,Botswana Meat Commission,1965,CAT1,BWA,Agriculture
3,Air Burkina,1967,CAT2,BFA,Aviation & Transport
4,Brarudi,1955,CAT9,BDI,"Distillers, Vintners, & Breweries"


Unnamed: 0_level_0,count
category,Unnamed: 1_level_1
Agriculture,6
Aviation & Transport,19
Banking & Finance,37
"Cafés, Restaurants & Bars",6
Conglomerate,3
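
<p>For a quick count per category, <code>value_counts()</code> is a common shortcut to the groupby-and-count pattern. The sketch below compares the two on hypothetical toy data; it is offered as an alternative, not as the approach the project expects.</p>

```python
import pandas as pd

# Hypothetical toy data; business names are placeholders
toy = pd.DataFrame({
    "business": ["A", "B", "C", "D"],
    "category": ["Banking & Finance", "Energy", "Banking & Finance", "Banking & Finance"],
})

# Pattern used in the notebook: group, count non-null business names, rename the column
via_agg = toy.groupby("category").agg({"business": "count"})
via_agg.columns = ["count"]

# Shortcut: count occurrences of each category, sorted from most to least frequent
via_value_counts = toy["category"].value_counts()

print(via_agg)
print(via_value_counts)
```

<p>The two differ slightly in what they count: <code>agg({"business": "count"})</code> skips rows where <code>business</code> is missing, while <code>value_counts()</code> counts every row with a non-missing <code>category</code>.</p>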


## 7. Restaurant representation
<p>No matter how we measure it, it looks like Banking and Finance is an excellent industry to be in if longevity is our goal! Let's zoom in on another industry: cafés, restaurants, and bars. Which restaurants in our dataset have been around since before the year 1800?</p>

### query method (just a test)

In [58]:
import pandas as pd

# Create a DataFrame
data = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 22],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Salary': [55000, 48000, 70000, 65000, 48000]
})

# Filter using query()
# Example 1: Select rows where Age is greater than 30
result1 = data.query('Age > 30')

# Example 2: Select rows where the City is either New York or Los Angeles
result2 = data.query("City in ['New York', 'Los Angeles']")

# Example 3: Use complex conditions, combining filters
# Select rows where the Age is greater than 25 and Salary is less than 65000
result3 = data.query('Age > 25 & Salary < 65000')

# Print the results
print("Filtered by Age > 30:")
display(result1)
print("\nFiltered by City (New York or Los Angeles):")
display(result2)
print("\nFiltered by Age and Salary condition:")
display(result3)

Filtered by Age > 30:


Unnamed: 0,Name,Age,City,Salary
2,Charlie,35,Chicago,70000
3,David,40,Houston,65000



Filtered by City (New York or Los Angeles):


Unnamed: 0,Name,Age,City,Salary
0,Alice,25,New York,55000
1,Bob,30,Los Angeles,48000



Filtered by Age and Salary condition:


Unnamed: 0,Name,Age,City,Salary
1,Bob,30,Los Angeles,48000


In [59]:
# Example 2: Select rows where the City is either New York or Los Angeles using .loc[] and .isin()
result4 = data.loc[data['City'].isin(['New York', 'Los Angeles'])]
display(result4)

result5 = data[data['City'].isin(['New York', 'Los Angeles'])]
result5

Unnamed: 0,Name,Age,City,Salary
0,Alice,25,New York,55000
1,Bob,30,Los Angeles,48000


Unnamed: 0,Name,Age,City,Salary
0,Alice,25,New York,55000
1,Bob,30,Los Angeles,48000


In [64]:
# Filter using .query() for CAT4 businesses founded before 1800; sort results
old_restaurants = businesses_categories.query('category_code == "CAT4" & year_founded < 1800') # query() comes from pandas
display(businesses_categories.head())
display(old_restaurants.head())

# Sort the DataFrame
old_restaurants = old_restaurants.sort_values(by = 'year_founded')
old_restaurants

Unnamed: 0,business,year_founded,category_code,country_code,category
0,Hamoud Boualem,1878,CAT11,DZA,Food & Beverages
1,Communauté Électrique du Bénin,1968,CAT10,BEN,Energy
2,Botswana Meat Commission,1965,CAT1,BWA,Agriculture
3,Air Burkina,1967,CAT2,BFA,Aviation & Transport
4,Brarudi,1955,CAT9,BDI,"Distillers, Vintners, & Breweries"


Unnamed: 0,business,year_founded,category_code,country_code,category
58,Ma Yu Ching's Bucket Chicken House,1153,CAT4,CHN,"Cafés, Restaurants & Bars"
94,St. Peter Stifts Kulinarium,803,CAT4,AUT,"Cafés, Restaurants & Bars"
111,Sean's Bar,900,CAT4,IRL,"Cafés, Restaurants & Bars"


Unnamed: 0,business,year_founded,category_code,country_code,category
94,St. Peter Stifts Kulinarium,803,CAT4,AUT,"Cafés, Restaurants & Bars"
111,Sean's Bar,900,CAT4,IRL,"Cafés, Restaurants & Bars"
58,Ma Yu Ching's Bucket Chicken House,1153,CAT4,CHN,"Cafés, Restaurants & Bars"
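
<p>When the cutoff year or category code lives in a Python variable, <code>query()</code> can reference it with the <code>@</code> prefix instead of hard-coding it in the string. A minimal sketch on a toy frame; the names <code>toy</code>, <code>cutoff</code>, <code>code</code>, and <code>old</code> are hypothetical.</p>

```python
import pandas as pd

# Toy frame built from three CAT4 rows that appear in the outputs above
toy = pd.DataFrame({
    "business": ["Cafe Brasilero", "Sean's Bar", "St. Peter Stifts Kulinarium"],
    "year_founded": [1877, 900, 803],
    "category_code": ["CAT4", "CAT4", "CAT4"],
})

cutoff = 1800
code = "CAT4"

# @name pulls the value of a Python variable into the query expression;
# `and` is accepted in place of `&` inside the query string
old = toy.query("category_code == @code and year_founded < @cutoff")
old = old.sort_values(by="year_founded")
print(old)
```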


## 8. Categories and continents
<p>St. Peter Stifts Kulinarium is old enough that the restaurant is believed to have served Mozart - and it would have been over 900 years old even when he was a patron! Let's finish by looking at the oldest business in each category of commerce for each continent. </p>

### Official answer, and a comparison of the results with and without the reset_index() method

In [71]:
# Merge all businesses, countries, and categories together
businesses_categories_countries = businesses_categories.merge(countries, on='country_code') # default how='inner'

# Sort businesses_categories_countries from oldest to most recent
businesses_categories_countries = businesses_categories_countries.sort_values(by='year_founded')

# Create the oldest by continent and category DataFrame
oldest_by_continent_category = businesses_categories_countries.groupby(['continent', 'category']).agg({'year_founded': 'min'}).reset_index()

oldest_by_continent_category.head()

Unnamed: 0,continent,category,year_founded
0,Africa,Agriculture,1947
1,Africa,Aviation & Transport,1854
2,Africa,Banking & Finance,1892
3,Africa,"Distillers, Vintners, & Breweries",1933
4,Africa,Energy,1968


In [74]:
# This is another method: the rows are already sorted by year_founded, so first() picks the oldest entry per group.
# first() returns the first non-null value of each column within each group, so the other columns are kept too.
oldest_by_continent_category = businesses_categories_countries.groupby(['continent', 'category']).first().reset_index()
oldest_by_continent_category.head()

Unnamed: 0,continent,category,business,year_founded,category_code,country_code,country
0,Africa,Agriculture,Cameroon Development Corporation,1947,CAT1,CMR,Cameroon
1,Africa,Aviation & Transport,Egyptian National Railways,1854,CAT2,EGY,Egypt
2,Africa,Banking & Finance,Standard Chartered Zimbabwe,1892,CAT3,ZWE,Zimbabwe
3,Africa,"Distillers, Vintners, & Breweries",Tanzania Breweries Limited,1933,CAT9,TZA,"Tanzania, United Republic of"
4,Africa,Energy,Communauté Électrique du Bénin,1968,CAT10,BEN,Benin


In [73]:
# Merge all businesses, countries, and categories together
businesses_categories_countries = businesses_categories.merge(countries, on="country_code")

# Sort businesses_categories_countries from oldest to most recent
businesses_categories_countries = businesses_categories_countries.sort_values("year_founded")

# Create the oldest by continent and category DataFrame
oldest_by_continent_category = businesses_categories_countries.groupby(["continent", "category"]).agg({"year_founded":"min"})
oldest_by_continent_category

Unnamed: 0_level_0,Unnamed: 1_level_0,year_founded
continent,category,Unnamed: 2_level_1
Africa,Agriculture,1947
Africa,Aviation & Transport,1854
Africa,Banking & Finance,1892
Africa,"Distillers, Vintners, & Breweries",1933
Africa,Energy,1968
Africa,Food & Beverages,1878
Africa,Manufacturing & Production,1820
Africa,Media,1943
Africa,Mining,1962
Africa,Postal Service,1772
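
<p>The practical difference between the two results is how rows are addressed afterwards: without <code>reset_index()</code>, <code>continent</code> and <code>category</code> form a two-level index; with it, they become ordinary columns. A minimal sketch on hypothetical toy data:</p>

```python
import pandas as pd

# Hypothetical toy data with two group keys
toy = pd.DataFrame({
    "continent": ["Africa", "Africa", "Asia"],
    "category": ["Energy", "Energy", "Mining"],
    "year_founded": [1968, 1990, 1900],
})

# Without reset_index(): the group keys become a two-level MultiIndex
indexed = toy.groupby(["continent", "category"]).agg({"year_founded": "min"})

# With reset_index(): the keys return as regular columns with a default integer index
flat = indexed.reset_index()

# MultiIndex lookup uses a tuple of keys; the flat frame uses boolean filtering
by_index = indexed.loc[("Africa", "Energy"), "year_founded"]
by_filter = flat[(flat["continent"] == "Africa") & (flat["category"] == "Energy")]

print(by_index)
print(by_filter)
```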
