# Extract Data from the Mindbody Completed Class Account Page for Analysis

### Objective
1. Harvest the workout class activity from my account page on Mindbody.com.
2. Use BeautifulSoup to extract the interesting data.
3. Write the data to a CSV for reporting and analysis.

**Challenges**
- Mindbody is behind authentication
- The HTML is not well structured

In [17]:
import pandas as pd
from bs4 import BeautifulSoup

Ingest the file


In [18]:
# Path to the account page saved from the browser
mhtml_file = 'data/Mindbody.htm'

with open(mhtml_file, 'r', encoding='utf-8') as f:
    html = f.read()
    
soup = BeautifulSoup(html, 'html.parser')

print(soup.prettify())

<!DOCTYPE html>
<!-- saved from url=(0055)https://www.mindbodyonline.com/explore/account/schedule -->
<html lang="en">
 <head>
  <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
  <meta content="A/kargTFyk8MR5ueravczef/wIlTkbVk1qXQesp39nV+xNECPdLBVeYffxrM8TmZT6RArWGQVCJ0LRivD7glcAUAAACQeyJvcmlnaW4iOiJodHRwczovL2dvb2dsZS5jb206NDQzIiwiZmVhdHVyZSI6IkRpc2FibGVUaGlyZFBhcnR5U3RvcmFnZVBhcnRpdGlvbmluZzIiLCJleHBpcnkiOjE3NDIzNDIzOTksImlzU3ViZG9tYWluIjp0cnVlLCJpc1RoaXJkUGFydHkiOnRydWV9" http-equiv="origin-trial"/>
  <script src="./full_Account _ Mindbody_files/169e250927" type="text/javascript">
  </script>
  <script src="./full_Account _ Mindbody_files/nr-spa-1208.min.js">
  </script>
  <script async="" charset="utf-8" crossorigin="anonymous" integrity="sha384-8mJgBUBw4uTWF9Ooxgb4sUuO9jKtaVm1I+8vb0qpxxX3cafec7ovH+goM3yD4UyO" src="./full_Account _ Mindbody_files/recaptcha__en.js" type="text/javascript">
  </script>
  <script async="" src="./full_Account _ Mindbody_files/mixpan

In [20]:
clean_mb = soup.prettify()

with open('data/clean_mb.html', 'w', encoding='utf-8') as f:
    f.write(clean_mb)


Ingest the cleaned data

In [21]:
with open('data/clean_mb.html', 'r', encoding='utf-8') as f:
    html = f.read()
    
soup = BeautifulSoup(html, 'html.parser')
print(soup.title)


<title>
   Account | Mindbody
  </title>


Write File to CSV

### EDA and Data Prep

---

#### Key Challenges
1. **Inconsistent Data Structure**:
   - Elements like dates and times may be missing or structured differently, leading to misalignment in the extracted data.
   - This kept my data from lining up consistently in the DataFrame.
2. **Dynamic Content**:
   - If the HTML contains JavaScript-rendered content, static parsing may miss certain elements.
3. **Data Alignment**:
   - Extracted lists for different attributes (e.g., dates, categories) have mismatched lengths.
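
To make the alignment problem concrete, here is a minimal sketch using a made-up two-item snippet. The `item`, `day`, and `category` class names are placeholders for illustration, not Mindbody's real ones. Collecting each attribute with a page-wide `find_all` gives lists of different lengths as soon as one item lacks a field.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: the second item has no category tag.
sample_html = """
<div class="item"><div class="day">13</div><span class="category">Martial arts</span></div>
<div class="item"><div class="day">11</div></div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# One page-wide search per attribute produces independent lists.
days = [tag.get_text(strip=True) for tag in soup.find_all("div", class_="day")]
categories = [tag.get_text(strip=True) for tag in soup.find_all("span", class_="category")]

# The lists no longer describe the same rows.
print(len(days), len(categories))  # 2 1
```

Passing lists like these to `pd.DataFrame` as columns would either fail or pair values with the wrong rows, which is why the parsing needs to happen per container.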

---

## Solution

### Step 1: Identify the Parent Container
- The workout schedule data is encapsulated in containers identified by the class `UserScheduleItem_wrapper__8PXfi`.
- Each container holds all relevant data (e.g., dates, times, categories, locations).
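
A quick way to check that the container selection works is to count the matches on a small sample. The sketch below uses a hand-written snippet rather than the real page, and `Footer_wrapper__abc12` is an invented class name. The suffix `__8PXfi` looks like a build-generated hash, so, as an assumption, it may change when the site is updated; matching on the prefix with a regular expression is one possible fallback.

```python
import re
from bs4 import BeautifulSoup

# Hand-written sample; only the wrapper and day class names come from the notebook.
sample_html = """
<div class="UserScheduleItem_wrapper__8PXfi"><div class="UserScheduleItemDate_day__1DJZ_">13</div></div>
<div class="UserScheduleItem_wrapper__8PXfi"><div class="UserScheduleItemDate_day__1DJZ_">11</div></div>
<div class="Footer_wrapper__abc12">not a class booking</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Exact class match, as used in this notebook.
exact = soup.find_all("div", class_="UserScheduleItem_wrapper__8PXfi")

# Prefix match, in case the hashed suffix differs in a later save of the page.
by_prefix = soup.find_all("div", class_=re.compile(r"^UserScheduleItem_wrapper__"))

print(len(exact), len(by_prefix))  # 2 2
```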

### Step 2: Iterative Parsing
- Iterate through each container and extract the following:
  - **Day of month (`day`)**: Extracted from `UserScheduleItemDate_day__1DJZ_`.
  - **Month and Year (`month_year`)**: Extracted from `UserScheduleItemDate_month__1Vlt2`.
  - **Category (`category`)**: Extracted from `CategoryTag_category__3BsVp`.
  - **Class Name (`class_name`)**: Extracted from `UserScheduleItemDetails_headerLink__1UVuC`.
  - **Location (`location`)**: Extracted from `UserScheduleItemDetails_link__u4Ftd`.
  - **Start Time (`time_of_day`)**: Extracted from `UserScheduleItemTime_start__1M87Z`.
  - **Duration (`duration`)**: Extracted from `UserScheduleItemTime_end__2Ykw-`.
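
One way to keep the field list above in a single place is a lookup table that maps each output column to its tag and class. This is only a sketch: `FIELD_SELECTORS` and `parse_item` are names introduced here for illustration, and the tag names (`div`, `span`, `h5`) are taken from the parsing cell later in this notebook. The sample HTML is hand-written and deliberately omits several fields.

```python
from bs4 import BeautifulSoup

# Output column -> (tag, CSS class). Hypothetical helper table, not part of the notebook.
FIELD_SELECTORS = {
    "day": ("div", "UserScheduleItemDate_day__1DJZ_"),
    "month_year": ("div", "UserScheduleItemDate_month__1Vlt2"),
    "category": ("span", "CategoryTag_category__3BsVp"),
    "class_name": ("span", "UserScheduleItemDetails_headerLink__1UVuC"),
    "location": ("span", "UserScheduleItemDetails_link__u4Ftd"),
    "time_of_day": ("h5", "UserScheduleItemTime_start__1M87Z"),
    "duration": ("div", "UserScheduleItemTime_end__2Ykw-"),
}


def parse_item(item):
    """Build one row dict from a single schedule container."""
    row = {}
    for field, (tag, css_class) in FIELD_SELECTORS.items():
        node = item.find(tag, class_=css_class)
        row[field] = node.get_text(strip=True) if node is not None else None
    return row


# Hand-written sample container with only three of the seven fields present.
sample_html = """
<div class="UserScheduleItem_wrapper__8PXfi">
  <div class="UserScheduleItemDate_day__1DJZ_">13</div>
  <div class="UserScheduleItemDate_month__1Vlt2">January, 2025</div>
  <h5 class="UserScheduleItemTime_start__1M87Z">12:30pm</h5>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
item = soup.find("div", class_="UserScheduleItem_wrapper__8PXfi")
row = parse_item(item)
print(row)
```

Every row comes back with the same seven keys, so adding or renaming a field means editing only the table.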

### Step 3: Handle Missing Data Gracefully
- Use `.find()` and conditional checks to avoid errors for missing elements.
- Append `None` for any missing data to maintain alignment.
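
The reason for the conditional check is that `.find()` returns `None` when nothing matches, and calling `.get_text()` on `None` raises an `AttributeError`. A minimal sketch, with a hypothetical helper named `text_or_none` and placeholder class names:

```python
from bs4 import BeautifulSoup


def text_or_none(parent, tag, css_class):
    """Return the stripped text of the first match, or None if it is absent."""
    node = parent.find(tag, class_=css_class)
    return node.get_text(strip=True) if node is not None else None


# Placeholder markup: the item has a day but no category.
soup = BeautifulSoup('<div class="item"><div class="day"> 13 </div></div>', "html.parser")
item = soup.find("div", class_="item")

# Without the guard, the missing element causes an exception.
try:
    item.find("span", class_="category").get_text()
except AttributeError as exc:
    error_message = str(exc)

present = text_or_none(item, "div", "day")
missing = text_or_none(item, "span", "category")
print(present, missing)  # 13 None
```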

### Step 4: Create a DataFrame
- Compile the extracted data into a Pandas DataFrame for easy manipulation and analysis.
- Ensure alignment across all attributes by processing each container independently.
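
Because each container becomes one dictionary, the DataFrame keeps one row per class even when a field is absent. The sketch below uses two invented rows to show how the `None` placeholders can then be counted per column with `isna()`.

```python
import pandas as pd

# Invented rows in the same shape as the parsed data; the second has no category.
rows = [
    {"day": "13", "month_year": "January, 2025", "category": "Martial arts"},
    {"day": "11", "month_year": "January, 2025", "category": None},
]

df = pd.DataFrame(rows)

# Count missing values per column to see which fields were not found.
missing_per_column = df.isna().sum()

print(df.shape)  # (2, 3)
print(missing_per_column)
```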

### Step 5: Save to CSV
- Save the final DataFrame to a CSV file for further analysis or reporting.
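
One thing to watch when the CSV is read back: `pd.read_csv` infers types by default, so a zero-padded day such as `08` comes back as the integer `8`. The sketch below round-trips a single invented row through an in-memory buffer (standing in for the file on disk) and shows that passing `dtype` keeps the column as text.

```python
import io

import pandas as pd

# One invented row with a zero-padded day, as in the extracted data.
df = pd.DataFrame([{"day": "08", "month_year": "January, 2025", "time_of_day": "12:30pm"}])

# Write to an in-memory buffer instead of a file so the example is self-contained.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Default type inference turns "08" into the integer 8.
default_read = pd.read_csv(io.StringIO(csv_text))

# Forcing the column to str preserves the leading zero.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"day": str})

print(default_read["day"].iloc[0])  # 8
print(as_text["day"].iloc[0])       # 08
```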

---



In [101]:
# Load the cleaned HTML file
html_file_path = "data/clean_mb.html"  # Replace with the correct file path

with open(html_file_path, "r", encoding="utf-8") as file:
    html_data = file.read()

# Parse the HTML content
soup = BeautifulSoup(html_data, "html.parser")

# Extract all 'UserScheduleItem_wrapper__8PXfi' containers
schedule_items = soup.find_all('div', class_='UserScheduleItem_wrapper__8PXfi')

# Parse data from each schedule item container
parsed_data = []
for item in schedule_items:
    day = item.find('div', class_='UserScheduleItemDate_day__1DJZ_')
    month_year = item.find('div', class_='UserScheduleItemDate_month__1Vlt2')
    category = item.find('span', class_='CategoryTag_category__3BsVp')
    class_name = item.find('span', class_='UserScheduleItemDetails_headerLink__1UVuC')
    location = item.find('span', class_='UserScheduleItemDetails_link__u4Ftd')
    time_of_day = item.find('h5', class_='UserScheduleItemTime_start__1M87Z')
    duration = item.find('div', class_='UserScheduleItemTime_end__2Ykw-')
    
    # Add parsed data to the list
    parsed_data.append({
        'day': day.get_text(strip=True) if day else None,
        'month_year': month_year.get_text(strip=True) if month_year else None,
        'category': category.get_text(strip=True) if category else None,
        'class_name': class_name.get_text(strip=True) if class_name else None,
        'location': location.get_text(strip=True) if location else None,
        'time_of_day': time_of_day.get_text(strip=True) if time_of_day else None,
        'duration': duration.get_text(strip=True) if duration else None,
    })

# Convert the parsed data to a DataFrame
df = pd.DataFrame(parsed_data)

# Save the DataFrame to a CSV file
output_file_path = "data/workout_schedule.csv"  # Replace with the desired output path
df.to_csv(output_file_path, index=False)

print(f"Cleaned workout schedule data saved to: {output_file_path}")


Cleaned workout schedule data saved to: data/workout_schedule.csv


In [104]:
# Display the DataFrame; only the last expression in a cell is rendered
df

| | day | month_year | category | class_name | location | time_of_day | duration |
|---|---|---|---|---|---|---|---|
| 0 | 13 | January, 2025 | Martial arts | Fighting Mastery | Cantu's Self-Defense, LLC. w/ Roy Cantu | 12:30pm | (30 min) |
| 1 | 13 | January, 2025 | Martial arts | Fighting Foundations | Cantu's Self-Defense, LLC. w/ Roy Cantu | 11:30am | (60 min) |
| 2 | 11 | January, 2025 | Martial arts | Sparring | Cantu's Self-Defense, LLC. w/ Roy Cantu | 11:00am | (60 min) |
| 3 | 08 | January, 2025 | Martial arts | Fighting Mastery | Cantu's Self-Defense, LLC. w/ Roy Cantu | 12:30pm | (30 min) |
| 4 | 08 | January, 2025 | Martial arts | Fighting Foundations | Cantu's Self-Defense, LLC. w/ Roy Cantu | 11:30am | (60 min) |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1233 | 11 | January, 2017 | Other | Adult Fundamentals | Cantu's Self-Defense, LLC. w/ Roy Cantu | 5:30am | (60 min) |
| 1234 | 09 | January, 2017 | Other | Adult Fundamentals | Cantu's Self-Defense, LLC. w/ Roy Cantu | 5:30am | (60 min) |
| 1235 | 07 | January, 2017 | Other | Adult Spar | Cantu's Self-Defense, LLC. w/ Roy Cantu | 11:00am | (60 min) |
| 1236 | 04 | January, 2017 | Other | Adult Fundamentals | Cantu's Self-Defense, LLC. w/ Roy Cantu | 5:30am | (60 min) |
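
All of these columns are still plain strings. As a possible next step for analysis (not something this notebook does), the date parts could be combined into one timestamp and the duration reduced to a number of minutes. The sketch below uses two rows copied from the output above; the `start` and `duration_min` column names are made up here, and the format string assumes English month names and the exact text layout shown.

```python
import pandas as pd

# Two rows copied from the output above.
df = pd.DataFrame({
    "day": ["13", "08"],
    "month_year": ["January, 2025", "January, 2025"],
    "time_of_day": ["12:30pm", "11:30am"],
    "duration": ["(30 min)", "(60 min)"],
})

# Combine day, month/year, and start time into a single timestamp column.
df["start"] = pd.to_datetime(
    df["day"] + " " + df["month_year"] + " " + df["time_of_day"],
    format="%d %B, %Y %I:%M%p",
)

# Pull the digits out of strings like "(60 min)".
df["duration_min"] = df["duration"].str.extract(r"(\d+)", expand=False).astype(int)

print(df[["start", "duration_min"]])
```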
