# Pandas Practice with Fast Food Data
Author: JAAR

Date: 07/22/2025

In [3]:
# imports
import pandas as pd

In [4]:
# Load the fast food data
df=pd.read_csv('data\\US_top_50_fast_foods.csv')

Get the shape and basic information for the dataset

In [5]:
df.shape

(50, 7)

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 7 columns):
 #   Column                                            Non-Null Count  Dtype 
---  ------                                            --------------  ----- 
 0   Fast-Food Chains                                  50 non-null     object
 1   U.S. Systemwide Sales (Millions - U.S Dollars)    50 non-null     int64 
 2   Average Sales per Unit (Thousands - U.S Dollars)  50 non-null     int64 
 3   Franchised Stores                                 50 non-null     int64 
 4   Company Stores                                    50 non-null     int64 
 5   2021 Total Units                                  50 non-null     int64 
 6   Total Change in Units from 2020                   50 non-null     int64 
dtypes: int64(6), object(1)
memory usage: 2.9+ KB


What are the data types?

In [7]:
df.dtypes

Fast-Food Chains                                    object
U.S. Systemwide Sales (Millions - U.S Dollars)       int64
Average Sales per Unit (Thousands - U.S Dollars)     int64
Franchised Stores                                    int64
Company Stores                                       int64
2021 Total Units                                     int64
Total Change in Units from 2020                      int64
dtype: object

Are there any null values?

In [8]:
df.isnull().sum().sum()

np.int64(0)

Replace all spaces in the column names with underscores

In [9]:
df.columns=df.columns.str.replace(' ', '_')
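A minimal sketch of the same rename on a toy frame, using three of the column names listed by `df.info()` and values from the KFC row. The name `demo` is hypothetical, and passing `regex=False` is an optional addition that only makes the literal match explicit.

```python
import pandas as pd

# Toy frame with three of the original column names (values from the KFC row)
demo = pd.DataFrame({
    'Fast-Food Chains': ['KFC'],
    'Franchised Stores': [3906],
    '2021 Total Units': [3953],
})

# regex=False states explicitly that ' ' is a literal string, not a pattern
demo.columns = demo.columns.str.replace(' ', '_', regex=False)
print(list(demo.columns))
```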

Rename the column containing the fast-food chains to just 'chain'

In [11]:
df.rename(columns={'Fast-Food_Chains':'chain'}, inplace=True)

Get a sample of five entries

In [12]:
df.sample(5)

,chain,U.S._Systemwide_Sales_(Millions_-_U.S_Dollars),Average_Sales_per_Unit_(Thousands_-_U.S_Dollars),Franchised_Stores,Company_Stores,2021_Total_Units,Total_Change_in_Units_from_2020
37,Raising Cane’s,2377,4893,23,544,567,58
23,KFC,5100,1408,3906,47,3953,10
36,QDOBA,835,1006,406,333,739,2
17,Freddy’s Frozen Custard & Steakburgers,759,1842,391,29,420,32
5,Checkers/Rally’s,931,1145,568,266,834,-13


Rename 'Franchised_Stores' as 'franchised' and 'Company_Stores' as 'company_stores'

In [14]:
df.rename(columns={
    'Franchised_Stores':'franchised',
    'Company_Stores':'company_stores'
}, inplace=True)

Sort the companies by number of franchised stores (descending), breaking ties by chain name (ascending)

In [24]:
df.sort_values(by=['franchised', 'chain'], ascending=[False, True]).head()

chain,U.S._Systemwide_Sales_(Millions_-_U.S_Dollars),Average_Sales_per_Unit_(Thousands_-_U.S_Dollars),franchised,company_stores,2021_Total_Units,Total_Change_in_Units_from_2020
Subway,9350,438,21147,0,21147,-1043
McDonald’s,45960,3420,12775,663,13438,244
Dunkin',10416,1127,9244,0,9244,161
Burger King,10033,1470,7054,51,7105,24
Taco Bell,12600,1823,6540,462,7002,203
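The second sort key only matters when the first key has ties. A small sketch, assuming `chain` is the index as in the output above: `company_stores` is used here instead of `franchised` because Subway and Dunkin' both have 0 company stores, so the tie-break on the chain name is visible. The frame `demo` is a hypothetical four-row subset of the rows shown.

```python
import pandas as pd

# Four rows copied from the output above; 'chain' is the index name
demo = pd.DataFrame(
    {'company_stores': [0, 0, 51, 462]},
    index=pd.Index(['Subway', "Dunkin'", 'Burger King', 'Taco Bell'], name='chain'),
)

# `by` may mix column labels and index level names
ordered = demo.sort_values(by=['company_stores', 'chain'], ascending=[True, True])
print(ordered)
```

Dunkin' sorts ahead of Subway because both have 0 company stores and the name is the tie-breaker.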


Retrieve the first three columns

In [25]:
df.iloc[:, 0:3].head() # positional selection is easier here because the column names are long

chain,U.S._Systemwide_Sales_(Millions_-_U.S_Dollars),Average_Sales_per_Unit_(Thousands_-_U.S_Dollars),franchised
Arby’s,4462,1309,2293
Baskin-Robbins,686,296,2317
Bojangles,1485,1924,496
Burger King,10033,1470,7054
Carl’s Jr.,1560,1400,1011


Get every other row

In [26]:
df.iloc[::2].head() # truncated with head to show that it works

chain,U.S._Systemwide_Sales_(Millions_-_U.S_Dollars),Average_Sales_per_Unit_(Thousands_-_U.S_Dollars),franchised,company_stores,2021_Total_Units,Total_Change_in_Units_from_2020
Arby’s,4462,1309,2293,1116,3409,40
Bojangles,1485,1924,496,277,773,15
Carl’s Jr.,1560,1400,1011,47,1058,-21
Chick-fil-A,16700,6100,2650,82,2732,155
Church’s Chicken,776,870,731,161,892,-13


Which companies have 2616, 831, or 2293 franchised stores?

In [27]:
df.loc[df.franchised.isin([2616, 831, 2293])]

chain,U.S._Systemwide_Sales_(Millions_-_U.S_Dollars),Average_Sales_per_Unit_(Thousands_-_U.S_Dollars),franchised,company_stores,2021_Total_Units,Total_Change_in_Units_from_2020
Arby’s,4462,1309,2293,1116,3409,40
Culver’s,2489,3099,831,6,837,55
Jimmy John’s,2301,866,2616,41,2657,48


Assign the chain column as the index column

In [None]:
df.set_index('chain', inplace=True)

Which company has the most franchised stores?

In [29]:
# Several ways to solve this, depending on whether we need the index label, a sorted list, or just the max value
df.franchised.idxmax()

'Subway'
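The alternatives hinted at in the cell above can be sketched side by side. This assumes a Series indexed by chain name, rebuilt here from four rows shown earlier; the variable names are hypothetical.

```python
import pandas as pd

# Franchised store counts for four chains, copied from the sorted output earlier
franchised = pd.Series(
    [21147, 9244, 7054, 6540],
    index=pd.Index(['Subway', "Dunkin'", 'Burger King', 'Taco Bell'], name='chain'),
    name='franchised',
)

top_label = franchised.idxmax()   # index label of the largest value
top_value = franchised.max()      # the largest value itself
top_two = franchised.nlargest(2)  # Series of the n largest values, sorted descending

print(top_label, top_value)
print(top_two)
```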

Which company has the highest percentage of franchised stores?

In [30]:
(df.franchised / (df.franchised + df.company_stores)).sort_values(ascending=False).head()

chain
Baskin-Robbins    1.000000
Dunkin'           1.000000
Subway            1.000000
Tim Hortons       1.000000
Dairy Queen       0.999539
dtype: float64

Of the companies with 100% franchising, which has the most stores?

In [31]:
df.loc[df.index.isin(['Subway', 'Dunkin\'', 'Baskin-Robbins', 'Tim Hortons']), 'franchised']

chain
Baskin-Robbins     2317
Dunkin'            9244
Subway            21147
Tim Hortons         637
Name: franchised, dtype: int64
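The cell above hard-codes the four chain names. One possible alternative is to select the fully franchised chains with a boolean mask and let `idxmax` answer the question directly. This is a sketch on a hypothetical five-row subset; the zero `company_stores` values for Baskin-Robbins and Tim Hortons are inferred from their 1.000000 franchised share rather than read from a displayed row.

```python
import pandas as pd

demo = pd.DataFrame(
    {
        'franchised': [3906, 2317, 9244, 21147, 637],
        'company_stores': [47, 0, 0, 0, 0],
    },
    index=pd.Index(
        ['KFC', 'Baskin-Robbins', "Dunkin'", 'Subway', 'Tim Hortons'],
        name='chain',
    ),
)

# A chain is 100% franchised exactly when it has no company-owned stores
fully_franchised = demo.loc[demo.company_stores == 0, 'franchised']

# Index label of the fully franchised chain with the most stores
largest = fully_franchised.idxmax()
print(largest)
```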

Rename the 2021 units column to store_count, the 2020 unit change column to store_count_change, U.S. systemwide sales to sales_in_millions, and average sales per unit to store_sales_thousands

In [None]:
# Maybe also change the mil and store sales at this time?
df.rename(columns={
    'Total_Change_in_Units_from_2020':'store_count_change',
    '2021_Total_Units':'store_count',
    'U.S._Systemwide_Sales_(Millions_-_U.S_Dollars)':'sales_in_millions',
    'Average_Sales_per_Unit_(Thousands_-_U.S_Dollars)':'store_sales_thousands'
}, inplace=True)

Create a boolean column that is True where the change in store count is positive and False otherwise

In [36]:
df['positive_store_count'] = (df.store_count_change > 0)  # compare the change, not the total count

Get both the first and last chains

In [37]:
df.iloc[[0, -1]]

chain,U.S._Systemwide_Sales_(Millions_-_U.S_Dollars),Average_Sales_per_Unit_(Thousands_-_U.S_Dollars),franchised,company_stores,store_count,store_count_change,positive_store_count
Arby’s,4462,1309,2293,1116,3409,40,True
Zaxby’s,2233,2484,761,147,908,3,True


Drop the store_count_change column and store the result as a new DataFrame

In [38]:
drop_change=df.drop('store_count_change', axis=1)

Create a new column called positive growth where chains that opened stores have a 1 and chains that don't have a 0

In [39]:
df['positive_growth']=df.positive_store_count.astype(int)

Convert systemwide sales to thousands and store the value as a new column then drop the old column

In [42]:
df['sales_in_thousands'] = df.sales_in_millions * 1000

In [43]:
df=df.drop('sales_in_millions', axis=1)
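The two cells above could also be written as one chained expression that leaves the original frame untouched. A minimal sketch on two rows from the output shown earlier; `demo` and `converted` are hypothetical names.

```python
import pandas as pd

demo = pd.DataFrame(
    {'sales_in_millions': [16700, 2377]},
    index=pd.Index(['Chick-fil-A', 'Raising Cane’s'], name='chain'),
)

converted = (
    demo
    # assign returns a new frame with the extra column
    .assign(sales_in_thousands=lambda d: d.sales_in_millions * 1000)
    # drop the old column in the same chain
    .drop(columns='sales_in_millions')
)
print(converted)
```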

Replace Multiples

In [44]:
# FIND A WAY TO TEST REPLACE, MULTIPLES
# df.key.replace({
#     0:'C', 1:'C#', 2:'D', 3:'D#', 4:'E', 5:'F', 6:'F#', 7:'G',
#     8:'G#', 9:'A', 10:'A#', 11:'B'
# }, inplace=True
# )
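One way to try a multi-value `replace` on this dataset is to map the 0/1 flag from `positive_growth` to text labels. This is only a sketch: the labels `'growing'` and `'not growing'` and the names `demo` and `labels` are made up for illustration, and the three rows come from the sample shown earlier.

```python
import pandas as pd

demo = pd.DataFrame(
    {'store_count_change': [10, -13, 2]},
    index=pd.Index(['KFC', 'Checkers/Rally’s', 'QDOBA'], name='chain'),
)
demo['positive_growth'] = (demo.store_count_change > 0).astype(int)

# A dict passed to replace swaps several values in one call
labels = demo.positive_growth.replace({0: 'not growing', 1: 'growing'})
print(labels)
```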

Sort chains by sales per store descending and name ascending

In [51]:
df.sort_values(by=['store_sales_thousands', 'chain'], ascending=[False, True]).head()

chain,store_sales_thousands,franchised,company_stores,store_count,store_count_change,positive_store_count,positive_growth,sales_in_thousands
Chick-fil-A,6100,2650,82,2732,155,True,1,16700000
Raising Cane’s,4893,23,544,567,58,True,1,2377000
Krispy Kreme,4000,51,307,358,6,True,1,996000
Shake Shack,3679,25,218,243,38,True,1,777000
Whataburger,3640,131,742,873,29,True,1,3089000


Come up with an example where we use .clip()

In [57]:
df.columns

Index(['store_sales_thousands', 'franchised', 'company_stores', 'store_count',
       'store_count_change', 'positive_store_count', 'positive_growth',
       'sales_in_thousands'],
      dtype='object')

In [None]:
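# clip caps values at the bounds: anything below `lower` becomes `lower`, anything above `upper` becomes `upper`
# 'tempo' is not a column in this dataset, so store_count_change is used instead; the bounds are arbitrary
df['store_count_change'].clip(lower=-100, upper=100)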
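To see what `.clip()` does on this dataset's values, here is a sketch using the store count changes of five chains from the sorted output earlier. The bounds of -100 and 100 are arbitrary assumptions chosen only to make the effect visible.

```python
import pandas as pd

changes = pd.Series(
    [-1043, 244, 161, 24, 203],
    index=pd.Index(
        ['Subway', 'McDonald’s', "Dunkin'", 'Burger King', 'Taco Bell'],
        name='chain',
    ),
    name='store_count_change',
)

# Values outside [-100, 100] are replaced by the nearest bound; others are unchanged
clipped = changes.clip(lower=-100, upper=100)
print(clipped)
```

Only Burger King's change of 24 is already inside the bounds, so it is the only value left as-is.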

In [None]:
# df['artist_count']=(df['artists'].str.count(',').fillna(-1).astype(int) + 1)

In [None]:
# df.dropna().drop([...]).rename([...]).sort_values().head() CHAINING EXAMPLE
# df.drop(df.loc[df["Revenue"] < 80_000].index).sort_values(by='Revenue')
# df.drop(df.loc[df.Revenue < df.Revenue.mean()].index)
# df_usa_only = df.drop(df.loc[~df['Is American?']].index) is the same as df.loc[df['Is American?']] (when the index is unique)
# removing columns by dropping them and then also renaming columns are things worth looking into

# datetime accessor: .dt.year, .dt.month
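The chaining pattern and the `.dt` accessor from the notes above can be sketched with this notebook's own columns. The frame `demo` is a hypothetical three-row subset of the sample shown earlier. The fast food data has no date column, so the notebook's date (07/22/2025) stands in as a sample value.

```python
import pandas as pd

demo = pd.DataFrame({
    'chain': ['KFC', 'QDOBA', 'Checkers/Rally’s'],
    'Franchised_Stores': [3906, 406, 568],
    'Company_Stores': [47, 333, 266],
    'Total_Change_in_Units_from_2020': [10, 2, -13],
})

# Each method returns a new DataFrame, so the calls can be chained
result = (
    demo
    .dropna()
    .drop(columns='Total_Change_in_Units_from_2020')
    .rename(columns={'Franchised_Stores': 'franchised',
                     'Company_Stores': 'company_stores'})
    .sort_values('franchised', ascending=False)
    .head(2)
)
print(result)

# .dt exposes datetime components on a datetime Series
dates = pd.Series(pd.to_datetime(['2025-07-22']))
year = int(dates.dt.year.iloc[0])
month = int(dates.dt.month.iloc[0])
print(year, month)
```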


## Questions created by ChatGPT from easy to difficult
The following 30 questions were generated by ChatGPT to practice my understanding of Pandas

In [None]:
1.	Get the Series of fast-food chain names.

2.	Count the number of null values in each column.

3.	Get the Series of total U.S. sales.

4.	Find the maximum sales value.

In [None]:
5.	Find the chain with the minimum number of units.
6.	Get a Series showing whether each chain is American.
7.	Count how many are American.
8.	Count how many are not American.
9.	Get a boolean Series: Is sales > 5000 million?
10.	Find the average number of total units.
11.	Get all chain names with sales over $5B (as Series).
12.	Sort the Series of total units in descending order.
13.	Create a Series of lowercase chain names.
14.	Get chains that are not American.
15.	Create a Series showing sales per unit.
16.	Check which chains have "Pizza" in their name.
17.	Get the 5 smallest chains by unit count.
18.	Rename the column "U.S. Systemwide Sales (Millions)" to lowercase using .rename().
19.	Find the chain with the second highest sales.
20.	Create a Series: is chain name longer than 10 characters?
21.	Create a Series showing sales rank (1 = highest).
22.	Get the average sales for non-American chains.
23.	Create a Series of chain names sorted by sales per unit.
24.	Count how many chains contain "Burger" or "Chicken" in the name.
25.	Normalize the sales column (0–1 range).
26.	Create a Series flagging "Global Giants" (sales > 5000 & total units > 10000).
27.	Create a Series of first letters of chain names.
28.	Replace all spaces in chain names with underscores.
29.	Bucket sales into categories: Low (<1000), Medium (1000–5000), High (>5000)
30.	Get a Series of boolean values: does chain name start with "M"?
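As a starting point, questions 21, 25, and 29 could be approached as sketched below. This is one possible approach, not a reference answer. The Series `sales` is a hypothetical five-chain subset built from values shown earlier in the notebook, and the bucket edges assume whole-number sales in millions, so `999` separates "< 1000" from "1000 and above".

```python
import pandas as pd

sales = pd.Series(
    [16700, 9350, 5100, 1485, 835],
    index=pd.Index(['Chick-fil-A', 'Subway', 'KFC', 'Bojangles', 'QDOBA'], name='chain'),
    name='sales_in_millions',
)

# Q21: sales rank, 1 = highest
sales_rank = sales.rank(ascending=False).astype(int)

# Q25: min-max normalization to the 0-1 range
normalized = (sales - sales.min()) / (sales.max() - sales.min())

# Q29: Low (<1000), Medium (1000-5000), High (>5000)
# Intervals are right-closed: (-inf, 999], (999, 5000], (5000, inf)
buckets = pd.cut(
    sales,
    bins=[float('-inf'), 999, 5000, float('inf')],
    labels=['Low', 'Medium', 'High'],
)

print(pd.DataFrame({'rank': sales_rank, 'normalized': normalized, 'bucket': buckets}))
```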