![gym](gym.png)


You are a product manager for a fitness studio and are interested in understanding the current demand for digital fitness classes. You plan to conduct a market analysis in Python to gauge demand and identify potential areas for growth of digital products and services.

### The Data

You are provided with several CSV files in the "Files/data" folder. They contain worldwide and country-level Google Trends data on keyword searches for fitness and related products.

### workout.csv

| Column     | Description              |
|------------|--------------------------|
| `'month'` | Month when the data was measured. |
| `'workout_worldwide'` | Index representing the popularity of the keyword 'workout', on a scale of 0 to 100. |

### three_keywords.csv

| Column     | Description              |
|------------|--------------------------|
| `'month'` | Month when the data was measured. |
| `'home_workout_worldwide'` | Index representing the popularity of the keyword 'home workout', on a scale of 0 to 100. |
| `'gym_workout_worldwide'` | Index representing the popularity of the keyword 'gym workout', on a scale of 0 to 100. |
| `'home_gym_worldwide'` | Index representing the popularity of the keyword 'home gym', on a scale of 0 to 100. |

### workout_geo.csv

| Column     | Description              |
|------------|--------------------------|
| `'country'` | Country where the data was measured. |
| `'workout_2018_2023'` | Index representing the popularity of the keyword 'workout' during the 5-year period. |

### three_keywords_geo.csv

| Column     | Description              |
|------------|--------------------------|
| `'Country'` | Country where the data was measured. |
| `'home_workout_2018_2023'` | Index representing the popularity of the keyword 'home workout' during the 5-year period. |
| `'gym_workout_2018_2023'` | Index representing the popularity of the keyword 'gym workout' during the 5-year period. |
| `'home_gym_2018_2023'` | Index representing the popularity of the keyword 'home gym' during the 5-year period. |

In [20]:
# Import the necessary libraries
import pandas as pd
import matplotlib.pyplot as plt

### When was the global search for 'workout' at its peak? 

To determine when the global search for 'workout' was at its peak, follow these steps:

1. Identify the highest value in the 'workout_worldwide' column.
2. Locate the row(s) in the DataFrame that match this highest value.
3. Extract the 'month' value from these row(s).
4. Keep only the year part of the 'month' value (its first four characters) to return the result in 'YYYY' format.
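
The steps above can be sketched as a small helper. This is only an illustration: `peak_year` and the `toy` DataFrame are hypothetical names, and the values in `toy` are made up rather than taken from `workout.csv`. It assumes the 'month' column holds strings in 'YYYY-MM' form, as in the preview of the data.

```python
import pandas as pd


def peak_year(df: pd.DataFrame, value_col: str = "workout_worldwide") -> str:
    """Return the year (YYYY) of the first row where value_col is highest."""
    # idxmax() gives the index label of the first occurrence of the maximum
    peak_row = df.loc[df[value_col].idxmax()]
    # 'month' is assumed to look like '2020-04', so the year is the first 4 characters
    return str(peak_row["month"])[:4]


# Made-up values for illustration only (not the real Google Trends data)
toy = pd.DataFrame(
    {
        "month": ["2019-12", "2020-04", "2020-05"],
        "workout_worldwide": [40, 90, 75],
    }
)

print(peak_year(toy))
```

If several months share the peak value, this sketch reports the earliest one.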


In [21]:
workout = pd.read_csv("data/workout.csv")
workout

Unnamed: 0,month,workout_worldwide
0,2018-03,59
1,2018-04,61
2,2018-05,57
3,2018-06,56
4,2018-07,51
...,...,...
56,2022-11,47
57,2022-12,44
58,2023-01,62
59,2023-02,57


In [1]:
import pandas as pd

workout = pd.read_csv("data/workout.csv")

# Find the highest value of the 'workout' popularity index
max_num = workout['workout_worldwide'].max()

# Take the first month that reaches this peak and keep only the year (YYYY)
year_str = workout.loc[workout['workout_worldwide'] == max_num, 'month'].values[0][:4]
print('The global search for workout was at its peak in the year ' + year_str + '.')


The global search for workout was at its peak in the year 2020.


### Of the keywords available, which was the most popular during the COVID pandemic, and which is the most popular now?

To determine which keyword was the most popular during the COVID pandemic and which one is the most popular now, follow these steps:

1. **Most Popular During the COVID Pandemic:**
   - Calculate the sum of each keyword column individually.
   - Identify the maximum value among these sums to determine the most popular keyword.
   - Find the keyword column name that corresponds to this maximum sum value.

2. **Most Popular Currently:**
   - Extract the last row from the DataFrame.
   - Identify the keyword with the highest popularity in this last row.
   - Return the name of the most searched keyword.
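
A compact way to express both steps is with `idxmax`, sketched below. The function names, the `KEYWORD_COLS` list, and the `toy` values are assumptions made for illustration and are not part of the dataset. The optional `start` and `end` arguments are also an addition: the steps above sum the whole column, whereas passing a month window (which dates count as "the pandemic" is left to the analyst) restricts the sum to that window.

```python
import pandas as pd

KEYWORD_COLS = [
    "home_workout_worldwide",
    "gym_workout_worldwide",
    "home_gym_worldwide",
]


def most_popular_overall(df: pd.DataFrame, start=None, end=None) -> str:
    """Name of the keyword column with the largest total.

    If start and end ('YYYY-MM' strings) are given, only months in that
    inclusive window are summed. 'YYYY-MM' strings sort chronologically,
    so a plain string comparison is enough here.
    """
    if start is not None and end is not None:
        df = df[df["month"].between(start, end)]
    return df[KEYWORD_COLS].sum().idxmax()


def most_popular_latest(df: pd.DataFrame) -> str:
    """Name of the keyword column with the highest value in the last row."""
    return df[KEYWORD_COLS].iloc[-1].idxmax()


# Made-up values for illustration only (not the real Google Trends data)
toy = pd.DataFrame(
    {
        "month": ["2020-04", "2020-05", "2023-02"],
        "home_workout_worldwide": [50, 40, 14],
        "gym_workout_worldwide": [10, 12, 21],
        "home_gym_worldwide": [20, 25, 12],
    }
)

print(most_popular_overall(toy))
print(most_popular_latest(toy))
```

On ties, `idxmax` returns the first column in `KEYWORD_COLS` order.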


In [23]:
three_keywords = pd.read_csv('data/three_keywords.csv')
three_keywords

Unnamed: 0,month,home_workout_worldwide,gym_workout_worldwide,home_gym_worldwide
0,2018-03,12,16,10
1,2018-04,12,18,10
2,2018-05,13,16,9
3,2018-06,12,17,9
4,2018-07,12,17,9
...,...,...,...,...
56,2022-11,11,18,12
57,2022-12,11,16,11
58,2023-01,17,22,15
59,2023-02,14,21,12


In [2]:
import pandas as pd

three_keywords = pd.read_csv('data/three_keywords.csv')

keyword_cols = ['home_workout_worldwide', 'gym_workout_worldwide', 'home_gym_worldwide']

# Sum each keyword column and take the name of the column with the largest total
peak_covid = three_keywords[keyword_cols].sum().idxmax()

print('Of the keywords available, the most popular during the covid pandemic was ' + peak_covid +'.')

# Use the last row (the most recent month) to find the current most popular keyword
current = three_keywords[keyword_cols].iloc[-1].idxmax()

print('The most popular now is ' + current + '.')



Of the keywords available, the most popular during the covid pandemic was home_workout_worldwide.
The most popular now is gym_workout_worldwide.


### Which country has the highest interest in workouts among the following: United States, Australia, or Japan?

To determine which country has the highest interest in workouts among the United States, Australia, and Japan, we need to:

1. Extract the values for each of these countries from the `workout_geo` DataFrame under the `workout_2018_2023` column.
2. Identify the maximum value among these three countries.
3. Return the country with the highest value as `top_country`.
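
As a rough sketch of these steps, the lookup and comparison can be done in one pass by filtering to the countries of interest and taking the row with the largest value. `top_country_among` and the `toy` DataFrame are hypothetical, and the numbers are made up. Rows with missing values are dropped first, since the preview of `workout_geo` shows that some countries have no index value.

```python
import pandas as pd


def top_country_among(
    df: pd.DataFrame,
    countries,
    value_col: str = "workout_2018_2023",
    country_col: str = "country",
) -> str:
    """Return the country from `countries` with the highest value_col."""
    # Keep only the requested countries, ignoring rows without a value
    subset = df[df[country_col].isin(countries)].dropna(subset=[value_col])
    if subset.empty:
        raise ValueError("None of the requested countries has a value.")
    # idxmax() returns the index label of the row with the largest value
    return subset.loc[subset[value_col].idxmax(), country_col]


# Made-up values for illustration only (not the real Google Trends data)
toy = pd.DataFrame(
    {
        "country": ["Guam", "United States", "Australia", "Japan"],
        "workout_2018_2023": [None, 3.0, 2.0, 1.0],
    }
)

top_country = top_country_among(toy, ["United States", "Australia", "Japan"])
print(top_country)
```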


In [25]:
workout_geo = pd.read_csv('data/workout_geo.csv')
workout_geo

Unnamed: 0,country,workout_2018_2023
0,Guam,
1,Falkland Islands (Islas Malvinas),
2,Cook Islands,
3,Brunei,
4,Palau,
...,...,...
245,Tokelau,
246,Tuvalu,
247,U.S. Outlying Islands,
248,Vatican City,


In [3]:
import pandas as pd

workout_geo = pd.read_csv('data/workout_geo.csv')

# Look up the 'workout' index for the US, Australia and Japan.
# .iloc[0] extracts a single number, so the comparisons below work on
# scalars rather than on one-element arrays.
US = workout_geo.loc[workout_geo['country'] == 'United States', 'workout_2018_2023'].iloc[0]
AUS = workout_geo.loc[workout_geo['country'] == 'Australia', 'workout_2018_2023'].iloc[0]
JPN = workout_geo.loc[workout_geo['country'] == 'Japan', 'workout_2018_2023'].iloc[0]

# Pick the country with the highest value
if US > AUS and US > JPN:
    top_country = 'United States'
elif AUS > US and AUS > JPN:
    top_country = 'Australia'
else:
    top_country = 'Japan'

print('The country that has the highest interest for workouts is ' + top_country + '.')

The country that has the highest interest for workouts is United States.


### Which of the two countries (Philippines or Malaysia) has the higher interest in home workouts?

To determine which country, the Philippines or Malaysia, has the higher interest in home workouts, we will follow these steps:

1. Retrieve the values in the `home_workout_2018_2023` column for both the Philippines and Malaysia from the `three_keywords_geo` DataFrame.
2. Compare the retrieved values to identify the country with the higher interest.
3. Return a statement indicating which country has the higher interest in home workouts.
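
One possible way to write this comparison is to index the values by country name and compare two scalars. The helper `higher_interest` and the `toy` DataFrame are hypothetical and the values are made up. The sketch assumes each country name appears exactly once and that the country column is named `Country`, as in the preview of `three_keywords_geo`.

```python
import pandas as pd


def higher_interest(
    df: pd.DataFrame,
    a: str,
    b: str,
    value_col: str = "home_workout_2018_2023",
    country_col: str = "Country",
) -> str:
    """Return whichever of the two countries has the larger value_col."""
    # Index the values by country name so each lookup returns a single number
    values = df.set_index(country_col)[value_col]
    # If the values are equal (or missing), this falls back to the second country
    return a if values[a] > values[b] else b


# Made-up values for illustration only (not the real Google Trends data)
toy = pd.DataFrame(
    {
        "Country": ["Gibraltar", "Philippines", "Malaysia"],
        "home_workout_2018_2023": [None, 52.0, 47.0],
    }
)

home_workout_geo = higher_interest(toy, "Philippines", "Malaysia")
print("The country with the higher interest in home workouts is " + home_workout_geo + ".")
```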


In [27]:
three_keywords_geo = pd.read_csv('data/three_keywords_geo.csv')
three_keywords_geo

Unnamed: 0,Country,home_workout_2018_2023,gym_workout_2018_2023,home_gym_2018_2023
0,Gibraltar,,,
1,Lesotho,,,
2,Guam,,,
3,Botswana,,,
4,Brunei,,,
...,...,...,...,...
245,Tokelau,,,
246,Tuvalu,,,
247,U.S. Outlying Islands,,,
248,Vatican City,,,


In [5]:
import pandas as pd

three_keywords_geo = pd.read_csv('data/three_keywords_geo.csv')

# Look up the 'home workout' index for each country as a single number
PH = three_keywords_geo.loc[three_keywords_geo['Country'] == 'Philippines', 'home_workout_2018_2023'].iloc[0]
MA = three_keywords_geo.loc[three_keywords_geo['Country'] == 'Malaysia', 'home_workout_2018_2023'].iloc[0]

# Keep the country with the larger value
home_workout_geo = 'Philippines' if PH > MA else 'Malaysia'

print('The country with the highest interest in home workouts is ' + home_workout_geo + '.')

The country with the highest interest in home workouts is Philippines.
