# Basic-Fit analysis

The Basic-Fit analysis project aims to extract insights from publicly available Google Maps data about Basic-Fit gyms. The primary goal is to analyze ratings, number of reviews, comments, and sentiment across branches. This can help the business better understand its customers' problems and identify weak points in specific branches.

With this information, Basic-Fit can make data-driven decisions to improve customer satisfaction and address recurring issues. The analysis could also identify branches that perform exceptionally well, which other branches could use as examples. Overall, the findings can show how Basic-Fit is performing and help the business improve its operations to better serve its customers.

## Setting up the configuration

In [39]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
import os

In [40]:
# Set up Chrome options
options = Options()
# Disable the Blink feature that flags the browser as automated
options.add_argument("--disable-blink-features=AutomationControlled")

# Map each country to the cities to search in
# Longer list of cities (currently disabled):
#countries_cities = {"Spain" : ['Madrid', 'Barcelona', 'Valencia', 'Seville', 'Zaragoza', 'Málaga', 'Murcia', 'Palma', 'Las Palmas de Gran Canaria', 'Bilbao', 'Alicante', 'Córdoba', 'Valladolid', 'Vigo', 'Gijón', 'L Hospitalet de Llobregat', 'A Coruña', 'Vitoria-Gasteiz', 'Granada', 'Elche']}
countries_cities = {"Spain" : ['Madrid', 'Barcelona', 'Valencia', 'Seville', 'Zaragoza', 'Málaga', 'Murcia', 'Palma', 'Las Palmas de Gran Canaria', 'Bilbao', 'Alicante', 'Córdoba', 'Valladolid', 'Vigo']}
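
The scraping loop turns each country/city pair into one Google Maps search. Some city names, such as `Las Palmas de Gran Canaria` or `Málaga`, contain spaces or accented characters, and the loop inserts them into the URL without encoding. Below is a minimal sketch that builds the same search URLs with explicit encoding through `urllib.parse.quote_plus`. The helper name `build_search_urls` is hypothetical, and the sketch assumes the same `basic-fit near <city> in <country>` query pattern the loop uses.

```python
from urllib.parse import quote_plus

def build_search_urls(countries_cities):
    """Return one Google Maps search URL per (country, city) pair (hypothetical helper).

    Uses the same 'basic-fit near <city> in <country>' query as the scraping loop,
    but URL-encodes it: spaces become '+', non-ASCII letters become %-escapes.
    """
    urls = []
    for country, cities in countries_cities.items():
        for city in cities:
            query = f"basic-fit near {city} in {country}"
            urls.append(f"https://www.google.com/maps/search/{quote_plus(query)}/")
    return urls

print(build_search_urls({"Spain": ["Málaga"]}))
# → ['https://www.google.com/maps/search/basic-fit+near+M%C3%A1laga+in+Spain/']
```

For plain ASCII names the output matches the hand-built f-string URL exactly. Only names with special characters change.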

In [41]:
# Load previously scraped data if it exists; otherwise start with an empty DataFrame
if os.path.isfile("basic-fit.csv"):
    # Read the saved file with explicit column dtypes
    df = pd.read_csv(
        "basic-fit.csv",
        dtype={
            "Name" : str,
            "Country" : str,
            "City" : str,
            "Address" : str,
            "Rating" : float,
            "Reviews" : int,
            "Site" : str,
            "Link" : str
        }
        )
else:
    # create an empty dataframe with columns
    df = pd.DataFrame({
        "Name" : pd.Series(dtype="str"),
        "Country" : pd.Series(dtype="str"),
        "City" : pd.Series(dtype="str"),  
        "Address" : pd.Series(dtype="str"),
        "Rating" : pd.Series(dtype="float"),
        "Reviews" : pd.Series(dtype="int"),
        "Site" : pd.Series(dtype="str"), 
        "Link" : pd.Series(dtype="str")
        })
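
The two branches above spell out the same eight-column schema twice, so they can drift apart. One way to keep them in sync is to declare the schema once and reuse it for both the CSV read and the empty fallback. Below is a sketch that assumes the same `basic-fit.csv` layout. `SCHEMA` and `load_or_empty` are hypothetical names, not part of the original notebook.

```python
import os
import pandas as pd

# Column name -> dtype, declared once and shared by both code paths (hypothetical)
SCHEMA = {
    "Name": str, "Country": str, "City": str, "Address": str,
    "Rating": float, "Reviews": int, "Site": str, "Link": str,
}

def load_or_empty(path, schema=SCHEMA):
    """Read `path` with the given dtypes, or return an empty frame with the same columns."""
    if os.path.isfile(path):
        return pd.read_csv(path, dtype=schema)
    # Empty typed columns, in the same order as the schema
    return pd.DataFrame({col: pd.Series(dtype=dtype) for col, dtype in schema.items()})
```

With this helper, the whole cell reduces to `df = load_or_empty("basic-fit.csv")`.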

## Defining the question

1. What is the overall customer satisfaction rating for Basic-Fit?
2. How does the customer satisfaction rating vary across Basic-Fit branches?
3. What are the most common complaints and the most common positive feedback from Basic-Fit customers?
4. How does the customer satisfaction rating correlate with the location, size, or age of a branch?
5. Are there notable differences in customer satisfaction between Basic-Fit branches in different countries or regions?
6. How does Basic-Fit compare with other fitness chains in terms of customer satisfaction rating?
7. What factors drive customer satisfaction at Basic-Fit?
8. How has the customer satisfaction rating for Basic-Fit changed over time?
9. What improvements could Basic-Fit make to enhance customer satisfaction?
10. Are there patterns or trends in customer feedback that can inform future business decisions?

## Collecting the data

In [42]:
# Scrape Basic-Fit branches from Google Maps search results
# (class names such as 'Nv2PK' are generated by Google and may change at any time)
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

BRANCH_XPATH = "//div[contains(@aria-label,'Basic-Fit') and @class='Nv2PK tH5CWc THOPZb ']"

# Reuse one browser for all cities and always close it, even if a city fails
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
rows = []
try:
    for country, cities in countries_cities.items():
        for city in cities:
            driver.get(f"https://www.google.com/maps/search/basic-fit+near+{city}+in+{country}/")

            # Accept the cookie consent dialog only if it is shown
            accept_buttons = driver.find_elements(By.XPATH, "//button[@aria-label='Accept all']")
            if accept_buttons:
                accept_buttons[0].click()

            # Wait up to 10 s for the branch cards to render; skip the city if none appear
            try:
                branches = WebDriverWait(driver, 10).until(
                    EC.presence_of_all_elements_located((By.XPATH, BRANCH_XPATH))
                )
            except TimeoutException:
                continue

            # Only the cards rendered on first load are read; the results list is not scrolled
            for branch in branches:
                # Branch name
                branch_name = branch.get_attribute("aria-label")

                # Link to the branch's detailed Google Maps page
                detailed_web_link = branch.find_element(By.XPATH, ".//a[@class='hfpxzc']").get_attribute("href")

                # Branch address
                branch_address = branch.find_element(
                    By.XPATH, ".//div[@class='W4Efsd']/span[2]/jsl/span[2]"
                ).get_attribute("innerHTML")

                # Rating and review count share one aria-label: "<rating> stars <count> ..."
                rating_text = branch.find_element(
                    By.XPATH, ".//div[@class='AJB7ye']/span[2]/span[2]"
                ).get_attribute("aria-label")
                stars = float(rating_text.split("stars")[0].strip())
                reviews = int(rating_text.split("stars")[1].strip().split(" ")[0].replace(",", ""))

                # Branch website (not every card has a website button)
                branch_web = branch.find_elements(By.XPATH, ".//a[@class='lcr4fd S9kvJb']")
                branch_web_link = branch_web[0].get_attribute("href") if branch_web else None

                # Collect the row; numeric fields are stored as numbers, not strings
                rows.append({
                    "Name": branch_name,
                    "Country": country,
                    "City": city,
                    "Address": branch_address,
                    "Rating": stars,
                    "Reviews": reviews,
                    "Site": branch_web_link,
                    "Link": detailed_web_link,
                })
finally:
    # Close the browser window
    driver.quit()

# Append all scraped rows to the dataframe in a single concat
df = pd.concat([df, pd.DataFrame(rows)], ignore_index=True)

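
The rating and the review count are both parsed out of a single `aria-label` string. Moving that parsing into a small pure function lets it be tested without launching a browser. The sketch below mirrors the split logic in the loop. The helper name is hypothetical, and the label format (`'<rating> stars <count> ...'`) is inferred from that split logic rather than taken from any Google documentation.

```python
def parse_rating_label(rating_text):
    """Split a label like '3.9 stars 1,135 Reviews' (assumed format) into (rating, review_count).

    Mirrors the notebook's parsing: the text before 'stars' is the rating, and the first
    word after 'stars' is the review count, with thousands separators removed.
    """
    before, after = rating_text.split("stars", 1)
    rating = float(before.strip())
    reviews = int(after.strip().split(" ")[0].replace(",", ""))
    return rating, reviews
```

Inside the loop, `stars, reviews = parse_rating_label(rating_text)` would replace the two `split` lines.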


## Cleaning the data

In [43]:
#get dataframe shape
df.shape

(64, 8)

In [60]:
#check column datatypes
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 64 entries, 0 to 63
Data columns (total 8 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   Name     64 non-null     object 
 1   Country  64 non-null     object 
 2   City     64 non-null     object 
 3   Address  64 non-null     object 
 4   Rating   64 non-null     float64
 5   Reviews  64 non-null     int64  
 6   Site     64 non-null     object 
 7   Link     64 non-null     object 
dtypes: float64(1), int64(1), object(6)
memory usage: 4.1+ KB


In [59]:
#change dtypes
df["Rating"] = df["Rating"].astype(float)
df["Reviews"] = df["Reviews"].astype(int)
df.dtypes

Name        object
Country     object
City        object
Address     object
Rating     float64
Reviews      int64
Site        object
Link        object
dtype: object
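
`astype(int)` raises an error as soon as a single value cannot be parsed, for example a review count that still contains a thousands separator, or an empty string. A more forgiving alternative is `pd.to_numeric(..., errors="coerce")`, which turns unparseable values into missing values so they can be inspected. The nullable `"Int64"` dtype then keeps the column integer-valued despite the gaps. Below is a sketch on made-up values (not the scraped data).

```python
import pandas as pd

# Illustrative values only: a clean row, one with a thousands separator, and an empty one
raw = pd.DataFrame({"Rating": ["3.9", "4.2", ""], "Reviews": ["524", "1,135", ""]})

clean = raw.copy()
# Unparseable ratings (here the empty string) become NaN
clean["Rating"] = pd.to_numeric(clean["Rating"], errors="coerce")
# Strip thousands separators first, then convert; failures become <NA> in the Int64 column
clean["Reviews"] = pd.to_numeric(
    clean["Reviews"].str.replace(",", "", regex=False), errors="coerce"
).astype("Int64")

print(clean.dtypes)
print(clean[clean.isna().any(axis=1)])  # rows that failed to parse
```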

In [45]:
#get data sample
df.head()

|   | Name | Country | City | Address | Rating | Reviews | Site | Link |
|---|------|---------|------|---------|--------|---------|------|------|
| 0 | Basic-Fit Madrid Goya | Spain | Madrid | Calle de Goya, 43 | 3.9 | 524 | https://www.google.com/aclk?sa=l&ai=DChcSEwi-1... | https://www.google.com/maps/place/Basic-Fit+Ma... |
| 1 | Basic-Fit | Spain | Madrid | C. de Atocha, 24 | 3.5 | 1000 | https://www.basic-fit.com/es-es/gimnasio/basic... | https://www.google.com/maps/place/Basic-Fit/da... |
| 2 | Basic-Fit Madrid Núñez de Balboa | Spain | Madrid | Calle de Núñez de Balboa, 115 | 4.0 | 600 | https://www.basic-fit.com/es-es/gimnasio/basic... | https://www.google.com/maps/place/Basic-Fit+Ma... |
| 3 | Basic-Fit | Spain | Madrid | C. de la Montera, 41 | 3.6 | 535 | https://www.basic-fit.com/es-es/gimnasio/basic... | https://www.google.com/maps/place/Basic-Fit/da... |
| 4 | Basic-Fit Madrid Sor Ángela de la Cruz | Spain | Madrid | C. de Sor Ángela de la Cruz, 24 | 3.4 | 773 | https://www.basic-fit.com/es-es/gimnasio/basic... | https://www.google.com/maps/place/Basic-Fit+Ma... |


In [61]:
# Summary statistics (computed on all 64 rows, before duplicates are removed below)
df.describe()

|       | Rating   | Reviews    |
|-------|----------|------------|
| count | 64.0     | 64.0       |
| mean  | 4.29375  | 309.046875 |
| std   | 0.445034 | 304.714513 |
| min   | 3.3      | 6.0        |
| 25%   | 3.9      | 42.0       |
| 50%   | 4.2      | 221.0      |
| 75%   | 4.8      | 479.0      |
| max   | 4.9      | 1135.0     |


In [62]:
# Number of unique addresses
df["Address"].nunique()

41

In [46]:
# Count rows per address and show the addresses that appear more than once
duplicates = (df.groupby(by=["Address"])
    .count()["Name"]
    .sort_values(ascending=False)
    )
    
duplicates.loc[duplicates > 1]

Address
Av. de Isaac Peral, 61              6
C. Armengual de la Mota, 26         4
Av. Palma de Mallorca, 20           4
Blvr. Louis Pasteur, 20             4
C. Félix García Palacios, 1         4
Carrer de Joaquín Costa, 6          2
Rodríguez Arias K., 58              2
Av. de César Augusto, 17-19         2
CC Los Fresnos, C. Río de Oro, 3    2
Av. del Mestre Rodrigo, 32          2
Av. Juan Pablo II, 1                2
Name: Name, dtype: int64
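
Counting by address shows *which* addresses repeat. `DataFrame.duplicated(keep=False)` flags every copy, which makes it easy to check whether repeated rows are identical or differ in fields such as `Name` or `City`. `value_counts` gives the same per-address tally as the `groupby`/`count` chain above. Below is a sketch on a small toy frame (illustrative rows, not the full dataset).

```python
import pandas as pd

# Toy data: one address appears twice
toy = pd.DataFrame({
    "Name": ["Basic-Fit", "Basic-Fit Madrid Goya", "Basic-Fit Madrid Goya"],
    "City": ["Madrid", "Madrid", "Madrid"],
    "Address": ["C. de Atocha, 24", "Calle de Goya, 43", "Calle de Goya, 43"],
})

# keep=False flags every row whose Address occurs more than once, not only the later copies
dupes = toy[toy.duplicated(subset=["Address"], keep=False)].sort_values("Address")
print(dupes)

# Per-address tally, filtered to repeated addresses
counts = toy["Address"].value_counts()
print(counts[counts > 1])
```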

In [47]:
#drop duplicates
df_unique = df.drop_duplicates(
    subset=["Name", "Address"],
    keep="first",
    inplace=False
    ).reset_index(drop=True)

#get dataframe shape
df_unique.shape

(41, 8)

In [48]:
#save data to csv
df_unique.to_csv("basic-fit.csv", index=False)
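
The first cell reloads `basic-fit.csv` and the scraping loop appends to it, so re-running the notebook adds rows for branches that are already stored. Below is a sketch of a save step that removes those repeats before writing, using the same `Name`/`Address` key as above, so repeated runs do not grow the file. `save_deduplicated` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

def save_deduplicated(df, path, key=("Name", "Address")):
    """Drop rows that repeat on `key` (keeping the first), write them to `path`, and return them."""
    unique = df.drop_duplicates(subset=list(key), keep="first").reset_index(drop=True)
    unique.to_csv(path, index=False)
    return unique
```

Calling `df_unique = save_deduplicated(df, "basic-fit.csv")` would combine the two cells above into one step.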

## Analyzing the data