# Rhino Africa Inquiry Data Analysis
**Student Name:** [Benjamin Goldberg]

**Student ID:** [3501-4910]

**Course:** ISM4641 - Fall 2025

---

## Assignment Overview
This notebook analyzes customer inquiry data from Rhino Africa (`inquiries_mini.csv`) using custom Python functions defined in the `my_rhino_goldberg.py` module. The analysis covers geographic patterns, budget trends, travel timing, and customer segmentation.

## Section 1: Data Loading and Overview (8 points)

In this section, we'll import our custom module and load the inquiry data to get a basic understanding of the dataset.

```python
# Import the custom module (file: my_rhino_goldberg.py)
import my_rhino_goldberg as my_rhino

# Load the inquiries_mini.csv data using the custom loader
inquiries = my_rhino.load_inquiries_data('inquiries_mini.csv')

# Display basic information about the dataset
my_rhino.print_data_summary(inquiries)
```

```
Total records: 45203
Number of columns: 16
Column names: inquiry_id, customer_id, timestamp_utc, timestamp_local, timezone_offset, completed, customer_country, customer_city, customer_latitude, customer_longitude, travel_year, travel_month, num_adults, num_children, budget_currency, budget_value
```


### Data Loading and Overview Commentary

This data represents customer inquiries. The dataset includes each customer's ID, country, number of adults and children, budget, and travel month, which are the fields analyzed later in this project. This first section imports the custom module into the Jupyter notebook and uses `load_inquiries_data` to load `inquiries_mini.csv`. The second function, `print_data_summary`, prints the total number of records (45,203), the number of columns (16), and each column name.
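The module's source is not shown in this notebook, so the following is only a minimal sketch of how `load_inquiries_data` and `print_data_summary` might work. It assumes the loader reads the CSV with Python's `csv.DictReader` and returns a list of dictionaries (one per inquiry); the function bodies and the two-row sample file are hypothetical, not the actual contents of `my_rhino_goldberg.py`.

```python
import csv
import tempfile
from pathlib import Path


def load_inquiries_data(path):
    """Hypothetical loader: return one dict per CSV row, keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def print_data_summary(rows):
    """Hypothetical summary: print record count, column count, and column names."""
    columns = list(rows[0].keys()) if rows else []
    print(f"Total records: {len(rows)}")
    print(f"Number of columns: {len(columns)}")
    print(f"Column names: {', '.join(columns)}")


# Made-up two-row file standing in for inquiries_mini.csv.
sample_text = (
    "inquiry_id,customer_country,travel_month,num_adults,num_children\n"
    "1,United States,July,2,0\n"
    "2,South Africa,Any month,2,2\n"
)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "inquiries_sample.csv"
    csv_path.write_text(sample_text, encoding="utf-8")
    sample_rows = load_inquiries_data(csv_path)

print_data_summary(sample_rows)
```

With `csv.DictReader`, every value comes back as a string, so numeric fields such as `num_adults` or `budget_value` would need converting (for example with `int` or `float`) before any arithmetic.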

## Section 2: Geographic Analysis (8 points)

Let's analyze where our customers are coming from by examining inquiry patterns by country.

```python
# Analyze inquiries from specific countries
us_inquiries = my_rhino.count_inquiries_by_country(inquiries, 'United States')
uk_inquiries = my_rhino.count_inquiries_by_country(inquiries, 'United Kingdom')
sa_inquiries = my_rhino.count_inquiries_by_country(inquiries, 'South Africa')

print(f"United States inquiries: {us_inquiries}")
print(f"United Kingdom inquiries: {uk_inquiries}")
print(f"South Africa inquiries: {sa_inquiries}")
```

```
United States inquiries: 14221
United Kingdom inquiries: 9069
South Africa inquiries: 8995
```


```python
# Build a top-5 list by checking several candidate countries by hand
countries_to_check = ['United States', 'United Kingdom', 'South Africa', 'Canada', 'Australia', 'France', 'Belgium']
country_counts = []

for country in countries_to_check:
    count = my_rhino.count_inquiries_by_country(inquiries, country)
    if count > 0:
        country_counts.append((country, count))

# Sort by count (descending)
country_counts.sort(key=lambda x: x[1], reverse=True)

print("Top Countries by Inquiry Count:")
for i, (country, count) in enumerate(country_counts[:5], 1):
    print(f"{i}. {country}: {count} inquiries")
```

```
Top Countries by Inquiry Count:
1. United States: 14221 inquiries
2. United Kingdom: 9069 inquiries
3. South Africa: 8995 inquiries
4. Australia: 2566 inquiries
5. Canada: 2218 inquiries
```


### Geographic Patterns Commentary

Inquiries are concentrated in a few English-speaking markets. The United States leads with 14,221 inquiries, followed by the United Kingdom (9,069) and South Africa (8,995); together these three account for about 71% of all 45,203 inquiries. Australia (2,566) and Canada (2,218) come next. This suggests strong brand reach in North America, the UK, and South Africa, with room to grow in continental Europe and Asia (France and Belgium were checked but did not make the top five). I was surprised by how many inquiries came from South Africa, given its proximity to the other African destinations.
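Checking a hand-picked list can miss countries that were never tested. A minimal sketch of an alternative counts every country in one pass with `collections.Counter`, assuming `inquiries` is a list of dictionaries with a `customer_country` key (that column appears in the Section 1 summary). The helper `top_countries` and the sample rows are hypothetical and not part of the custom module.

```python
from collections import Counter


def top_countries(rows, n=5):
    """Hypothetical helper: return the n most common non-empty customer_country values."""
    counts = Counter(row["customer_country"] for row in rows if row.get("customer_country"))
    return counts.most_common(n)


# Made-up rows for illustration; in the notebook this would be `inquiries`.
sample_rows = [
    {"customer_country": "United States"},
    {"customer_country": "United Kingdom"},
    {"customer_country": "United States"},
    {"customer_country": "South Africa"},
    {"customer_country": ""},
]

result = top_countries(sample_rows, n=3)
for rank, (country, count) in enumerate(result, 1):
    print(f"{rank}. {country}: {count} inquiries")
```

Applied to the full dataset, a call like `top_countries(inquiries)` would rank every country present rather than only the seven checked by hand.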

## Section 3: Budget Analysis (8 points)

Now let's examine the budget information from customer inquiries to understand spending patterns.

```python
# Get budget statistics
budget_stats = my_rhino.get_budget_statistics(inquiries)

print("Budget Statistics:")
print(f"Total Budget Value: ${budget_stats['total']:,.2f}")
print(f"Average Budget: ${budget_stats['average']:,.2f}")
print(f"Minimum Budget: ${budget_stats['min']:,.2f}")
print(f"Maximum Budget: ${budget_stats['max']:,.2f}")
print(f"Number of Valid Budgets: {budget_stats['count']}")
```

```
Budget Statistics:
Total Budget Value: $731,501,000.00
Average Budget: $16,182.58
Minimum Budget: $4,000.00
Maximum Budget: $40,000.00
Number of Valid Budgets: 45203
```


```python
# Calculate percentage of inquiries with valid budget information
total_inquiries = len(inquiries)
valid_budgets = budget_stats['count']
budget_completion_rate = (valid_budgets / total_inquiries) * 100

print("\nBudget Completion Analysis:")
print(f"Total Inquiries: {total_inquiries}")
print(f"Inquiries with Valid Budgets: {valid_budgets}")
print(f"Budget Completion Rate: {budget_completion_rate:.1f}%")
```

```
Budget Completion Analysis:
Total Inquiries: 45203
Inquiries with Valid Budgets: 45203
Budget Completion Rate: 100.0%
```


### Budget Patterns Commentary



The average budget (about $16,183) tells us roughly how much a typical customer plans to spend, so it can serve as a benchmark price for a standard trip. The minimum budget ($4,000) is useful for quoting: for example, the website could advertise safaris "starting from $4,000". Because we are using `inquiries_mini.csv`, every inquiry has a valid budget, giving a 100% completion rate. The total budget value (about $731.5 million) also gives a rough sense of the capital and supplier capacity needed to serve this demand.
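The dataset also has a `budget_currency` column, while the statistics above format every value with a dollar sign. If more than one currency appears, a single total mixes units. A minimal sketch, assuming `budget_value` holds numeric strings and `budget_currency` holds a currency code, groups budgets by currency and reports the count, mean, and median; the median is less sensitive than the mean to a few very large budgets. The helper `budget_stats_by_currency` and the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def budget_stats_by_currency(rows):
    """Hypothetical helper: count, mean, and median budget for each currency."""
    values_by_currency = defaultdict(list)
    for row in rows:
        try:
            value = float(row["budget_value"])
        except (KeyError, TypeError, ValueError):
            continue  # skip missing or non-numeric budgets
        values_by_currency[row.get("budget_currency", "unknown")].append(value)

    return {
        currency: {"count": len(values), "mean": mean(values), "median": median(values)}
        for currency, values in values_by_currency.items()
    }


# Made-up rows for illustration; in the notebook this would be `inquiries`.
sample_rows = [
    {"budget_currency": "USD", "budget_value": "4000"},
    {"budget_currency": "USD", "budget_value": "10000"},
    {"budget_currency": "USD", "budget_value": "40000"},
    {"budget_currency": "EUR", "budget_value": "8000"},
    {"budget_currency": "EUR", "budget_value": ""},
]

stats_by_currency = budget_stats_by_currency(sample_rows)
for currency, stats in stats_by_currency.items():
    print(f"{currency}: n={stats['count']}, mean={stats['mean']:,.2f}, median={stats['median']:,.2f}")
```

If only one currency code shows up in the real data, the dollar-formatted totals above stand as they are; otherwise each currency's figures would need to be reported separately or converted first.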

## Section 4: Travel Timing Patterns (8 points)

Let's analyze when customers prefer to travel by examining the travel month data.

```python
# Get popular travel months
popular_months = my_rhino.find_popular_travel_months(inquiries)

print("Popular Travel Months (Ranked by Frequency):")
for i, (month, count) in enumerate(popular_months, 1):
    print(f"{i}. {month}: {count} inquiries")
```

```
Popular Travel Months (Ranked by Frequency):
1. July: 4374 inquiries
2. August: 4245 inquiries
3. June: 4183 inquiries
4. May: 3995 inquiries
5. September: 3843 inquiries
6. April: 3755 inquiries
7. October: 3603 inquiries
8. March: 3472 inquiries
9. January: 3230 inquiries
10. November: 3040 inquiries
11. February: 2883 inquiries
12. December: 2836 inquiries
13. Any month: 1744 inquiries
```


```python
# Additional analysis: peak vs off-peak seasons
if popular_months:
    peak_month = popular_months[0]
    print("\nSeasonal Analysis:")
    print(f"Peak Month: {peak_month[0]} with {peak_month[1]} inquiries")

    # Percentage of inquiries with any travel-month answer (includes "Any month")
    total_with_months = sum(count for month, count in popular_months)
    total_inquiries = len(inquiries)
    month_completion_rate = (total_with_months / total_inquiries) * 100
    print(f"Travel Month Completion Rate: {month_completion_rate:.1f}%")
```

```
Seasonal Analysis:
Peak Month: July with 4374 inquiries
Travel Month Completion Rate: 100.0%
```


### Travel Timing Commentary

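Travel interest peaks in the middle of the year: July is highest, followed by August and June, with May and September also strong. This aligns with Southern Africa’s dry season (roughly May–October), when wildlife viewing is best and many Northern Hemisphere travelers have summer holidays. Lower interest from November to March may reflect wetter conditions and fewer school breaks. Note that the 100% travel-month completion rate counts the 1,744 "Any month" answers (about 3.9% of inquiries), so those customers are flexible rather than tied to a specific month.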
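The dry-season pattern can be checked directly against the month counts printed above. A minimal sketch, assuming the counts come as `(month, count)` pairs like the list `find_popular_travel_months` returns, computes the share of month-specific inquiries that fall in May through October, excluding the flexible "Any month" answers. The May to October window follows the commentary's rough definition, and the helper name `dry_season_share` is hypothetical.

```python
DRY_SEASON = {"May", "June", "July", "August", "September", "October"}


def dry_season_share(month_counts):
    """Percent of month-specific inquiries that fall in the dry season.

    month_counts: iterable of (month, count) pairs. "Any month" is excluded
    because it does not name a specific month.
    """
    specific = [(month, count) for month, count in month_counts if month != "Any month"]
    total = sum(count for _, count in specific)
    dry = sum(count for month, count in specific if month in DRY_SEASON)
    return dry / total * 100 if total else 0.0


# Counts copied from the Section 4 output above.
popular_months = [
    ("July", 4374), ("August", 4245), ("June", 4183), ("May", 3995),
    ("September", 3843), ("April", 3755), ("October", 3603), ("March", 3472),
    ("January", 3230), ("November", 3040), ("February", 2883), ("December", 2836),
    ("Any month", 1744),
]

print(f"Dry-season share (May-October): {dry_season_share(popular_months):.1f}%")
```

With these counts, 24,243 of the 43,459 month-specific inquiries (about 55.8%) target May through October, which is only half of the calendar year.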

## Section 5: Customer Segmentation (8 points)

Finally, let's analyze customer behavior patterns by looking at completion rates and group sizes.

```python
# Display completion report
print("Inquiry Completion Analysis:")
my_rhino.print_completion_report(inquiries)
```

```
Inquiry Completion Analysis:

{'TRUE': 23713, 'FALSE': 21490}
```
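The completion report prints raw counts only. A minimal sketch, assuming the counts are available as a dictionary shaped like the printed output (`'TRUE'` and `'FALSE'` string keys), converts them into percentages so the completion rate can be quoted directly. The helper `completion_percentages` is hypothetical.

```python
def completion_percentages(counts):
    """Hypothetical helper: convert {'TRUE': n, 'FALSE': m} counts to percentages."""
    total = sum(counts.values())
    if total == 0:
        return {key: 0.0 for key in counts}
    return {key: value / total * 100 for key, value in counts.items()}


# Counts from the completion report output above.
completion_counts = {"TRUE": 23713, "FALSE": 21490}
completion_pct = completion_percentages(completion_counts)
for status, pct in completion_pct.items():
    print(f"{status}: {pct:.1f}%")
```

For these counts the output is `TRUE: 52.5%` and `FALSE: 47.5%`.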

```python
# Analyze group sizes
group_categories = my_rhino.categorize_group_sizes(inquiries)

print("\nGroup Size Analysis:")
total_categorized = sum(group_categories.values())

for category, count in group_categories.items():
    percentage = (count / total_categorized) * 100
    print(f"{category}: {count} inquiries ({percentage:.1f}%)")

print(f"\nTotal Categorized Inquiries: {total_categorized}")
```

```
Group Size Analysis:
solo: 3042 inquiries (6.7%)
couple: 19258 inquiries (42.6%)
family: 19212 inquiries (42.5%)
large_group: 3691 inquiries (8.2%)
unknown: 0 inquiries (0.0%)

Total Categorized Inquiries: 45203
```
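The rules inside `categorize_group_sizes` are not shown in this notebook. The following is a minimal sketch of one plausible rule set, assuming `num_adults` and `num_children` are stored as numeric strings. The thresholds (any children means family, one adult is solo, two adults is a couple, three or more adults is a large group) and the function names are hypothetical and may differ from the module's actual logic.

```python
from collections import Counter


def categorize_group(num_adults, num_children):
    """Hypothetical rule set; the real module may use different thresholds."""
    try:
        adults = int(num_adults)
        children = int(num_children or 0)  # treat a blank child count as zero
    except (TypeError, ValueError):
        return "unknown"
    if children > 0:
        return "family"
    if adults == 1:
        return "solo"
    if adults == 2:
        return "couple"
    if adults >= 3:
        return "large_group"
    return "unknown"


def categorize_group_sizes_sketch(rows):
    """Count inquiries per category, keeping every category key in the result."""
    counts = Counter({"solo": 0, "couple": 0, "family": 0, "large_group": 0, "unknown": 0})
    for row in rows:
        counts[categorize_group(row.get("num_adults"), row.get("num_children"))] += 1
    return dict(counts)


# Made-up rows for illustration; in the notebook this would be `inquiries`.
sample_rows = [
    {"num_adults": "1", "num_children": "0"},
    {"num_adults": "2", "num_children": "0"},
    {"num_adults": "2", "num_children": "2"},
    {"num_adults": "4", "num_children": "0"},
    {"num_adults": "", "num_children": "0"},
]

print(categorize_group_sizes_sketch(sample_rows))
```

Pre-seeding the counter with all five categories keeps zero-count groups such as `unknown` visible in the report, matching the shape of the output above.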


```python
# Additional insight: Most common customer type
most_common_type = max(group_categories, key=group_categories.get)
print("\nCustomer Insights:")
print(f"Most Common Customer Type: {most_common_type} ({group_categories[most_common_type]} inquiries)")

# Calculate data completeness for group size information
total_inquiries = len(inquiries)
group_data_rate = (total_categorized / total_inquiries) * 100
print(f"Group Size Data Completion Rate: {group_data_rate:.1f}%")
```

```
Customer Insights:
Most Common Customer Type: couple (19258 inquiries)
Group Size Data Completion Rate: 100.0%
```


### Customer Behavior Commentary


The completion report shows that 23,713 inquiries (about 52.5%) are marked as completed and 21,490 (about 47.5%) are not. This is important to track because it tells us how many inquiries still need follow-up and gives a rough estimate of our conversion rate. Couples are the most common group (19,258 inquiries, 42.6%), only slightly ahead of families (19,212, 42.5%). This information is useful for marketing, because it tells us which audience to target, and for operations, because it tells us who to tailor our itineraries for. For example, we could add romantic itineraries such as honeymoon safaris to better serve couples.

## Summary and Conclusions



Overall, our strongest demand comes from North America (led by the United States), followed by the United Kingdom and South Africa. Our budget analysis gives us a good sense of typical trip budgets to guide pricing. Our peak season is June–September, matching the classic dry-season safari timing.

Some recommendations I have: our inventory and marketing should favor the peak season months; our target audiences are couples and families, so we could create honeymoon/romantic safari themes alongside family-oriented themes; and we should work on raising the inquiry completion rate, which is currently about 52.5%.