# Strava Exports to CSV Files
Now that all of our data is uncompressed, we have to transform the separate files into something that can be loaded into a database. MySQL's `LOAD DATA INFILE` is quite efficient for bulk loading, so if we can transform the data for each table into a CSV file, loading it should be straightforward.

As we work through data for multiple users, it gets harder to keep track of what each piece of data represents and who it belongs to, so I think an object-oriented approach is a better fit. In the end, we want to load all the data for each table into a pandas DataFrame, write it out with `DataFrame.to_csv()`, and then import the generated CSVs into MySQL.
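
As a minimal sketch of that last step (the column names and values here are made up for illustration, not taken from a real export), missing values can be written as the literal `NULL` through `na_rep`. As far as I understand MySQL's documentation, `LOAD DATA INFILE` reads an unquoted `NULL` as a NULL value when `FIELDS ENCLOSED BY` is set, but that is worth verifying against your MySQL version:

```python
import io

import pandas as pd

MYSQL_NULL = 'NULL'  # same sentinel the notebook defines below

# Hypothetical points data: the second point has no heart-rate reading
points = pd.DataFrame({'activity_id': [0, 0], 'hr': [142, None]})

# The nullable integer dtype keeps 142 from being written as 142.0
points['hr'] = points['hr'].astype('Int64')

# Write to an in-memory buffer instead of a file so the result is easy to inspect
buffer = io.StringIO()
points.to_csv(buffer, index=False, na_rep=MYSQL_NULL, lineterminator='\n')
csv_text = buffer.getvalue()
print(csv_text)
```

Setting `lineterminator='\n'` explicitly keeps the line endings identical on every operating system, which makes the matching `LINES TERMINATED BY` clause predictable.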

For information and advice on the three file types used (.fit, .gpx, and .tcx) and their contents, I relied heavily on this tutorial: [Parsing fitness tracker data with Python](https://towardsdatascience.com/parsing-fitness-tracker-data-with-python-a59e7dc17418/)

In [27]:
'''
Package Imports, Constants, Global Variable

Run this cell to import all the packages we need and define some constants. 
You'll likely need to install any missing packages to your Python environment
with pip or your package manager of choice.
'''

import os
import gpxpy
import fitdecode
import pandas as pd
import csv
from collections import namedtuple
import random
from datetime import datetime, timedelta

DATA_DIR_PATH = os.path.join('..', 'data')  # Path of data directory relative to this Jupyter Notebook
ACTIVITY_DIR_PATH = os.path.join(DATA_DIR_PATH, 'export_activities') # Parent directory of all exports
OUTPUT_DIR_PATH = os.path.join(DATA_DIR_PATH, 'outputs')
cur_activity_id = 0   # Global activity counter to give each activity a unique id across users
MYSQL_NULL = 'NULL'

# Namedtuple to pass around fields we want to end up in the activities table
Activity_Fields = namedtuple('Activity_Fields', ['user_id', 
                                                 'activity_id', 
                                                 'name', 
                                                 'type', 
                                                 'description', 
                                                 'filename'])

Let's define an `Activity` object. We will use this object to store all the data from an individual file in an export, whether it is a .fit, .gpx, or .tcx file. By passing the path of the activity file to the constructor, an `Activity` can load itself from that file as soon as it is instantiated.
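
The "load itself on instantiation" idea can be sketched in isolation. This is a stripped-down, hypothetical stand-in (`SketchActivity` and its loader methods are my own names, and the loaders only record which parser would run), not the real class defined in the next cell:

```python
import os


class SketchActivity:
    """Hypothetical stand-in that picks a loader from the file extension."""

    def __init__(self, filepath: str):
        self.filepath = filepath
        self.loaded_with = None
        # Loading happens here, so a constructed object is always populated
        self._load_from_file()

    def _load_from_file(self) -> None:
        # splitext keeps only the last extension, e.g. '.gpx'
        ext = os.path.splitext(self.filepath)[1].lower()
        loaders = {
            '.gpx': self._load_gpx,
            '.tcx': self._load_tcx,
            '.fit': self._load_fit,
        }
        loader = loaders.get(ext)
        if loader is None:
            # Failing loudly avoids silently creating an empty activity
            raise ValueError(f'Unsupported activity file type: {ext!r}')
        loader()

    def _load_gpx(self) -> None:
        self.loaded_with = 'gpx'

    def _load_tcx(self) -> None:
        self.loaded_with = 'tcx'

    def _load_fit(self) -> None:
        self.loaded_with = 'fit'


print(SketchActivity('exports/123.GPX').loaded_with)
```

One design choice in this sketch differs from the class below: an unknown extension raises a `ValueError` instead of being ignored, so the caller can decide whether to skip the file.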

In [None]:
class Activity:

  __activity_summary_keys = ['start_datetime', 'end_datetime', 
                             'distance_2d', 'distance_3d',
                             'avg_speed', 'max_speed',
                             'uphill', 'downhill',
                             'avg_hr', 'min_hr', 'max_hr',
                             'avg_cad','min_cad','max_cad',
                             'total_kcal']


  def __init__(self, activity_fields: Activity_Fields, activity_filepath: os.PathLike):
    self.__activity_id = activity_fields.activity_id
    self.__activity_filepath = activity_filepath
    self.__points_df = pd.DataFrame()

    self.__activity_summary = activity_fields._asdict()
    
    self.__activity_summary.update(dict.fromkeys(self.__activity_summary_keys,
                                                 MYSQL_NULL))

    self.__point_dict = {
      'activity_id': [],
      'latitude': [],
      'longitude': [],
      'elevation': [],
      'time': [],
      'speed': [],
      'hr': [],
      'cad': []
    }

    self.__load_from_file()
    

  def __load_from_file(self) -> None:
    match self.__activity_filepath.split('.')[-1].lower():
      case 'gpx':
        self.__load_from_gpx()
      case 'tcx':
        self.__load_from_tcx()
      case 'fit':
        self.__load_from_fit()


  def __load_from_gpx(self) -> None:

    with open(self.__activity_filepath) as f:
      gpx = gpxpy.parse(f)

      uphill, downhill = 0, 0

      if len(gpx.tracks) == 0:
        raise ValueError(f'No tracks found in gpx file {os.path.abspath(self.__activity_filepath)}')

      for track in gpx.tracks:

        uphill_downhill = track.get_uphill_downhill()
        uphill += uphill_downhill.uphill
        downhill += uphill_downhill.downhill
        
        for segment in track.segments:
          for point_idx, point in enumerate(segment.points):            
            self.__point_dict['activity_id'].append(self.__activity_id)
            self.__point_dict['time'].append(point.time)
            self.__point_dict['latitude'].append(point.latitude)
            self.__point_dict['longitude'].append(point.longitude)
            self.__point_dict['elevation'].append(point.elevation)

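            # Adding speed
            point_speed = point.speed
            if point_idx == 0:
              point_speed = 0
            elif point_speed is None:
              point_speed = point.speed_between(segment.points[point_idx - 1])
            # speed_between() returns None when no speed can be computed
            # (e.g. missing or identical timestamps), so fall back to 0
            if point_speed is None:
              point_speed = 0
            self.__point_dict['speed'].append(round(point_speed, 3))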

            # Adding extensions
            found_hr = False
            found_cad = False
            for extension in point.extensions:

              hr_element = extension.find('{http://www.garmin.com/xmlschemas/TrackPointExtension/v1}hr')
              cad_element = extension.find('{http://www.garmin.com/xmlschemas/TrackPointExtension/v1}cad')
              
              # Adding heart rate, if exists
              if hr_element is not None and hr_element.text:
                self.__point_dict['hr'].append(int(hr_element.text))
                found_hr = True
              
              # Adding cadence, if exists
              if cad_element is not None and cad_element.text:
                self.__point_dict['cad'].append(int(cad_element.text))
                found_cad = True
            
            # Adding nulls if cadence or heart rate don't exist
            if not found_hr:
              self.__point_dict['hr'].append(MYSQL_NULL)
            if not found_cad:
              self.__point_dict['cad'].append(MYSQL_NULL)
      
      # Creating the dataframe of points from the dictionary
      self.__points_df = pd.DataFrame(self.__point_dict)

      # Populating activity summary dictionary
      timebounds = gpx.get_time_bounds()
      self.__activity_summary.update({'start_datetime': timebounds.start_time,
                                      'end_datetime': timebounds.end_time,
                                      'distance_2d': round(gpx.length_2d(), 3),
                                      'distance_3d': round(gpx.length_3d(), 3),
                                      'avg_speed': round(self.__points_df['speed'].mean(), 3),
                                      'max_speed': round(self.__points_df['speed'].max(), 3),
                                      'uphill': round(uphill, 3),
                                      'downhill': round(downhill, 3)})
      
      questionable_cols = ['hr', 'cad']
      for col in questionable_cols:
        # The column is int64 only when every point had a value (no MYSQL_NULL placeholders)
        if self.__points_df[col].dtype == 'int64':
          self.__activity_summary.update({'avg_' + col: round(self.__points_df[col].mean(), 3),
                                          'min_' + col: self.__points_df[col].min(),
                                          'max_' + col: self.__points_df[col].max()})
    

    # Ensure activity_id is stored as integer
    self.__points_df['activity_id'] = self.__points_df['activity_id'].astype(int)


  # Stubs to be replaced with real file parsing code
  def __load_from_tcx(self) -> None:
    pass

  # Helper for __load_from_fit
  def __convert_semicircles_to_degrees(self, val):
      if val is None:
        return None
      try:
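        # Some fit decoders return semicircles; convert only when the value cannot be a degree value.
        # Longitudes in degrees reach +/-180, so 180 (not 90) is the safe cutoff for both axes.
        if abs(val) > 180:
          return val * (180.0 / (2**31))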
        return val
      except Exception:
        return val

  # The following function is an absolute mess and desperately needs a refactor :(
  def __load_from_fit(self) -> None:
    # Parse .fit file using fitdecode
    # Use session messages to populate activity summary and record messages to populate points

    # iterate through fit file messages
    with fitdecode.FitReader(self.__activity_filepath) as fit:

      current_speed = 0.0
      current_altitude = 0.0

      for frame in fit:
        # only process data messages
        if not isinstance(frame, fitdecode.records.FitDataMessage):
          continue
          
        # session messages -> summary fields. Assuming one session per fit file
        if frame.name == 'session':
          try:
            start_time = frame.get_value('start_time')
            duration = frame.get_value('total_elapsed_time')
            elapsed_timedelta = timedelta(seconds = duration)
            end_time = start_time + elapsed_timedelta
            total_distance = frame.get_value('total_distance')
          except Exception as e:
            # Chain the original exception so the root cause stays visible in the traceback
            raise ValueError(f'Unable to find time bounds or total distance in fit file session {os.path.abspath(self.__activity_filepath)}') from e
          
          total_calories = frame.get_value('total_calories')

          avg_speed, max_speed = 0,0
          try:
            avg_speed = frame.get_value('avg_speed')
            max_speed = frame.get_value('max_speed')
          except KeyError:
            pass
          

          # elevation gain/descent fields may have different names across devices
          total_ascent = 0
          try:
            total_ascent = frame.get_value('total_ascent')
          except KeyError:
            pass
          try:
            total_ascent = frame.get_value('total_elevation_gain')
          except KeyError:
            pass

          total_descent = 0
          try:
            total_descent = frame.get_value('total_descent')
          except KeyError:
            pass
          try:
            total_descent = frame.get_value('total_elevation_loss')
          except KeyError:
            pass

          uphill, downhill = total_ascent, total_descent

          self.__activity_summary.update({'start_datetime': start_time,
                                      'end_datetime': end_time,
                                      'distance_2d': round(total_distance, 3),
                                      'avg_speed': round(avg_speed, 3) if avg_speed else MYSQL_NULL,
                                      'max_speed': round(max_speed, 3) if max_speed else MYSQL_NULL,
                                      'uphill': round(uphill, 3) if uphill else MYSQL_NULL,
                                      'downhill': round(downhill, 3) if downhill else MYSQL_NULL,
                                      'total_kcal': int(total_calories) if total_calories is not None else MYSQL_NULL})

        # Using gps_metadata messages to substitute for when altitude is not available
        # in record frame
        elif frame.name == 'gps_metadata':
          altitude = current_altitude

          try:
            interim_altitude = frame.get_value('altitude')
            if interim_altitude:
              altitude = interim_altitude
          except KeyError:
            pass
          
          try:
            interim_altitude = frame.get_value('enhanced_altitude')
            if interim_altitude:
              altitude = interim_altitude
          except KeyError:
            pass

          current_altitude = altitude

        # record messages -> per-point data
        elif frame.name == 'record':

          lat, lon = None, None
          try:
            lat = frame.get_value('position_lat', raw_value = False)
            lon = frame.get_value('position_long', raw_value = False)
          except KeyError:
            pass

          # Skip only points without a position; 0.0 is a valid coordinate
          if lat is None or lon is None:
            continue

          # attempt conversion if semicircles
          lat = self.__convert_semicircles_to_degrees(lat)
          lon = self.__convert_semicircles_to_degrees(lon)

          self.__point_dict['activity_id'].append(self.__activity_id)

          self.__point_dict['latitude'].append(lat)
          self.__point_dict['longitude'].append(lon)

          timestamp = frame.get_value('timestamp')
          self.__point_dict['time'].append(timestamp)

          altitude = current_altitude
          try:
            interim_altitude = frame.get_value('altitude')
            if interim_altitude:
              altitude = interim_altitude
          except KeyError:
            pass
          try:
            interim_altitude = frame.get_value('enhanced_altitude')
            if interim_altitude:
              altitude = interim_altitude
          except KeyError:
            pass

          if isinstance(altitude, tuple):
            altitude = altitude[0]

          current_altitude = altitude
          self.__point_dict['elevation'].append(round(altitude, 3))

          speed = current_speed
          try:
            speed = frame.get_value('speed')
          except KeyError:
            pass
          try:
            speed = frame.get_value('enhanced_speed')
          except KeyError:
            pass
          current_speed = speed
          self.__point_dict['speed'].append(speed)
          
          hr = 0
          try:
            hr = frame.get_value('heart_rate')
          except KeyError:
            pass
          if hr:
            self.__point_dict['hr'].append(int(hr))
          else:
            self.__point_dict['hr'].append(MYSQL_NULL)
          
          cad = 0
          try:
            cad = frame.get_value('cadence')
          except KeyError:
            pass
          if cad:
            self.__point_dict['cad'].append(int(cad))
          else:
            self.__point_dict['cad'].append(MYSQL_NULL)

    # build dataframe from accumulated point dict
    self.__points_df = pd.DataFrame(self.__point_dict)

    # Timestamps are not always parsed consistently, so store them as plain strings
    self.__points_df['time'] = self.__points_df['time'].astype(str)

    # compute summary fields from points if missing
    if 'avg_speed' in self.__activity_summary_keys and (self.__activity_summary.get('avg_speed') == MYSQL_NULL or self.__activity_summary.get('avg_speed') is None):
      if not self.__points_df.empty and 'speed' in self.__points_df.columns:
        speeds = pd.to_numeric(self.__points_df['speed'], errors='coerce')
        if not speeds.dropna().empty:
          self.__activity_summary['avg_speed'] = round(speeds.mean(), 3)
          self.__activity_summary['max_speed'] = round(speeds.max(), 3)

    # compute heart rate / cadence summaries if available
    for col in ['hr', 'cad']:
      if col in self.__points_df.columns:
        col_numeric = pd.to_numeric(self.__points_df[col], errors='coerce')
        if not col_numeric.dropna().empty:
          self.__activity_summary['avg_' + col] = round(col_numeric.mean(), 3)
          self.__activity_summary['min_' + col] = int(col_numeric.min())
          self.__activity_summary['max_' + col] = int(col_numeric.max())

    self.__points_df['activity_id'] = self.__points_df['activity_id'].astype(int)


  # Getter methods
  def get_summary(self) -> dict:
    return self.__activity_summary
  

  def get_points(self) -> pd.DataFrame:
    return self.__points_df
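
For reference, FIT files store positions as 32-bit signed integers in semicircles, where 2^31 semicircles correspond to 180 degrees. A minimal sketch of the conversion that the helper in the class above applies (the standalone function name here is mine, chosen for illustration):

```python
def semicircles_to_degrees(semicircles: int) -> float:
    """Convert a FIT semicircle value to decimal degrees."""
    # 2**31 semicircles span 180 degrees, so scale by 180 / 2**31
    return semicircles * (180.0 / 2**31)


quarter_turn = semicircles_to_degrees(2**30)
print(quarter_turn)
```

Because the scale factor is so small, a raw semicircle reading anywhere away from the equator or prime meridian is a very large integer, which is what the helper's magnitude check relies on.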

Now let's define a `User`. An entire export directory of activity files belongs to a Strava user, so our `User` can have a list of `Activities`. By feeding the path to the export directory into the constructor, a `User` is able to initialize itself with all of its `Activities` upon instantiation.
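
One detail matters when a `User` matches each file to its row in `activities.csv`: `str.contains` treats its pattern as a regular expression by default, so the `.` in a filename matches any character, and substring matching lets `23.gpx` match `123.gpx`. A stricter sketch compares base names instead. The `Filename` values below (an `activities/` prefix and an optional `.gz` suffix) are my assumption about the export layout, and `find_activity_row` is a hypothetical helper, not part of the class:

```python
import os

import pandas as pd


def find_activity_row(activity_csv_df: pd.DataFrame, file_name: str):
    """Return the first row whose Filename has exactly this base name, or None."""
    basenames = (activity_csv_df['Filename']
                 .fillna('')                              # manual entries have no file
                 .map(os.path.basename)                   # drop any directory prefix
                 .str.replace(r'\.gz$', '', regex=True))  # drop a trailing .gz
    matches = activity_csv_df[basenames == file_name]
    return None if matches.empty else matches.iloc[0]


# Made-up rows for illustration only
activities = pd.DataFrame({
    'Activity Name': ['Morning Run', 'Lunch Ride', 'Manual Entry'],
    'Filename': ['activities/123.gpx', 'activities/1234.fit.gz', None],
})

row = find_activity_row(activities, '123.gpx')
print(row['Activity Name'])
print(find_activity_row(activities, '23.gpx'))
```

Returning `None` for a file with no matching row lets the caller skip it explicitly instead of hitting an `IndexError` from `.iloc[0]` on an empty selection.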

In [29]:
class User:

  __user_field_keys = ['user_id', 
                       'email_address', 
                       'first_name', 
                       'last_name', 
                       'description', 
                       'weight', 
                       'city', 
                       'state', 
                       'country']


  def __init__(self, export_filepath: os.PathLike):
    self.__export_filepath = export_filepath

    self.__user_id = 0
    self.__user_fields = {}
    self.__load_user_info(anonymized=True)

    self.__activities = []
    self.__load_all_activities()

    self.__load_challenges()

    self.__load_follows()


  def __load_user_info(self, anonymized = False):
    # Only the first row of profile.csv is needed; the context manager closes the file afterwards
    with open(os.path.join(self.__export_filepath, 'profile.csv'), encoding = 'utf-8', newline = '') as profile_file:
      profile_csv = csv.DictReader(profile_file)
      for row in profile_csv:
        if anonymized:
          self.__user_fields = dict.fromkeys(self.__user_field_keys, 'sample_data (anonymized)')
        else:
          self.__user_fields = dict.fromkeys(self.__user_field_keys, MYSQL_NULL)
          for key in self.__user_field_keys[1:]:
            colname = ' '.join(key.split('_')).title()
            self.__user_fields[key] = row[colname] if row[colname] != '' else MYSQL_NULL

        self.__user_id = self.__user_fields['user_id'] = row['Athlete ID']
        break


  def __load_all_activities(self):
    # Read in activities.csv for this User
    activity_csv_df = pd.read_csv(os.path.join(self.__export_filepath, 'activities.csv'),
                                  usecols=['Activity Name', 
                                           'Activity Type',
                                           'Activity Description',
                                           'Filename'])
    files = os.listdir(self.__export_filepath)
    for file in files:
      # Only process known file types
      file_ext = file.split('.')[-1].lower()
      if file_ext in ['gpx', 'fit']:  # 'tcx' is skipped until __load_from_tcx is implemented
        try:
          global cur_activity_id
          # regex=False: match the filename literally so '.' is not treated as a wildcard
          df_row = activity_csv_df[activity_csv_df['Filename'].str.contains(file, regex=False, na=False)]
          cur_activity_fields = Activity_Fields(user_id=self.__user_id, 
                                                activity_id=cur_activity_id,
                                                name=df_row['Activity Name'].iloc[0],
                                                type=df_row['Activity Type'].iloc[0],
                                                description=df_row['Activity Description'].iloc[0],
                                                filename=file)
          self.__activities.append(Activity(activity_fields=cur_activity_fields,
                                            activity_filepath = os.path.join(self.__export_filepath, file)))
          cur_activity_id += 1
        except ValueError as ve:
          print(f'Error occured when loading {file}: {ve}')


  def __load_challenges(self):
    colnames = ['join_datetime', 'name', 'completed']
    global_challenges_csv_df = pd.read_csv(os.path.join(self.__export_filepath, 'global_challenges.csv'),
                                           header = 0,
                                           names = colnames,
                                           parse_dates = ['join_datetime'])
    group_challenges_csv_df = pd.read_csv(os.path.join(self.__export_filepath, 'group_challenges.csv'),
                                          header = 0,
                                          names = colnames,
                                          parse_dates = ['join_datetime'])
    global_challenges_csv_df['type'] = 'global'
    group_challenges_csv_df['type'] = 'group'
    challenge_dfs = [global_challenges_csv_df, group_challenges_csv_df]
    nonempty_challenge_dfs = [df for df in challenge_dfs if not df.empty]
    self.__combined_challenge_df = pd.concat(nonempty_challenge_dfs)
    self.__combined_challenge_df['user_id'] = self.__user_id
    self.__combined_challenge_df['completed'] = self.__combined_challenge_df['completed'].astype(int)

  def __load_follows(self):
    colnames = ['follow_status', 'favorite_status']
    following_csv_df = pd.read_csv(os.path.join(self.__export_filepath, 'following.csv'),
                                   header = 0,
                                   names = ['followee_user_id'] + colnames,
                                   usecols = ['followee_user_id', 'follow_status'])
    following_csv_df['follower_user_id'] = self.__user_id
    followers_csv_df = pd.read_csv(os.path.join(self.__export_filepath, 'followers.csv'),
                                   header = 0,
                                   names = ['follower_user_id'] + colnames,
                                   usecols = ['follower_user_id', 'follow_status'])
    followers_csv_df['followee_user_id'] = self.__user_id
    self.__follow_df = pd.concat([following_csv_df, followers_csv_df])


  # Getters for exporting
  def get_follows(self) -> pd.DataFrame:
    return self.__follow_df

  def get_challenges(self) -> pd.DataFrame:
    return self.__combined_challenge_df


  def get_activity_summaries(self) -> pd.DataFrame:
    activity_summaries = [activity.get_summary() for activity in self.__activities]
    return pd.DataFrame(activity_summaries)


  def get_activity_points(self) -> pd.DataFrame:
    point_dfs = [activity.get_points() for activity in self.__activities if not activity.get_points().empty]
    return pd.concat(point_dfs)
  
  
  def get_user_fields(self) -> dict:
    return self.__user_fields

A `Secretary` keeps track of multiple `Users` and combines their data into one CSV file per table.
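
Combining per-user DataFrames is mostly a `pd.concat` call, but `pd.concat` raises a `ValueError` when it is given an empty list, which would happen if no user had any data for a table. A hedged sketch of a guard (the helper name `concat_or_empty` and the sample frames are hypothetical):

```python
import pandas as pd


def concat_or_empty(frames, columns):
    """Concatenate the non-empty frames, or return an empty frame with the given columns."""
    nonempty = [df for df in frames if not df.empty]
    if not nonempty:
        # Nothing to combine: return a frame that still writes a CSV header
        return pd.DataFrame(columns=columns)
    return pd.concat(nonempty, ignore_index=True)


# Made-up challenge rows for two users; the second user has none
user_a = pd.DataFrame({'name': ['January 5K'], 'user_id': [1]})
user_b = pd.DataFrame(columns=['name', 'user_id'])

combined = concat_or_empty([user_a, user_b], columns=['name', 'user_id'])
nothing = concat_or_empty([], columns=['name', 'user_id'])
print(len(combined), len(nothing))
```

With `ignore_index=True` the combined frame gets a fresh 0..n-1 index, which suits tables where the index is exported as an id column.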

In [30]:
class Secretary:

  def __init__(self, all_exports: os.PathLike):
    self.__all_exports_path = all_exports
    self.__users = []
    self.__load_all_users()

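  def __load_all_users(self):
    export_dirs = os.listdir(self.__all_exports_path)
    # enumerate replaces the manual counter; export_dir avoids shadowing the built-in dir()
    for i, export_dir in enumerate(export_dirs, start=1):
      print(f'Loading user {i} of {len(export_dirs)} ...')
      self.__users.append(User(os.path.join(self.__all_exports_path, export_dir)))
    print('Done loading in all users!')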

  def export_all_to_csvs(self, dest_path: os.PathLike):
    self.__create_points_csv(os.path.join(dest_path, 'points.csv'))
    self.__create_activities_csv(os.path.join(dest_path, 'activities.csv'))
    self.__create_users_csv(os.path.join(dest_path, 'users.csv'))
    self.__create_challenges_csv(os.path.join(dest_path, 'challenges.csv'))
    self.__create_follows_csv(os.path.join(dest_path, 'follow.csv'))

  def __create_points_csv(self, dest_path: os.PathLike):
    point_dfs = [user.get_activity_points() for user in self.__users if not user.get_activity_points().empty]
    combined_points_df = pd.concat(point_dfs)
    combined_points_df.to_csv(dest_path, index_label='seq_num', lineterminator='\n')

  def __create_activities_csv(self, dest_path: os.PathLike):
    activity_dfs = [user.get_activity_summaries() for user in self.__users if not user.get_activity_summaries().empty]
    combined_activities_df = pd.concat(activity_dfs)
    combined_activities_df.to_csv(dest_path, index = False, lineterminator='\n')

  def __create_users_csv(self, dest_path: os.PathLike):
    all_user_fields = [user.get_user_fields() for user in self.__users]
    combined_users_df = pd.DataFrame(all_user_fields)
    combined_users_df.to_csv(dest_path, index = False, lineterminator='\n')

  def __create_challenges_csv(self, dest_path: os.PathLike):
    challenge_dfs = [user.get_challenges() for user in self.__users if not user.get_challenges().empty]
    combined_challenges_df = pd.concat(challenge_dfs, ignore_index=True)
    combined_challenges_df.to_csv(dest_path, index_label='challenge_id', lineterminator='\n')

  def __create_follows_csv(self, dest_path: os.PathLike):
    follow_dfs = [user.get_follows() for user in self.__users]
    combined_follows_df = pd.concat(follow_dfs, ignore_index=True)

    # Ensure user IDs are integers and drop rows with NaN values
    combined_follows_df['follower_user_id'] = pd.to_numeric(combined_follows_df['follower_user_id'], errors='coerce')
    combined_follows_df['followee_user_id'] = pd.to_numeric(combined_follows_df['followee_user_id'], errors='coerce')
    combined_follows_df.dropna(subset=['follower_user_id', 'followee_user_id'], inplace=True)

    # Convert to integers after dropping NaNs
    combined_follows_df['follower_user_id'] = combined_follows_df['follower_user_id'].astype(int)
    combined_follows_df['followee_user_id'] = combined_follows_df['followee_user_id'].astype(int)

    user_ids = [int(user.get_user_fields()['user_id']) for user in self.__users]
    print("User IDs:", user_ids)
    print("Combined Follows DataFrame (Before Filtering):\n", combined_follows_df)

    # Filter rows where both follower and followee are in the user_ids list
    self.__combined_follows_df = combined_follows_df[
      combined_follows_df['follower_user_id'].isin(user_ids) &
      combined_follows_df['followee_user_id'].isin(user_ids)
    ]

    print("Filtered Follows DataFrame:\n", self.__combined_follows_df)
    self.__combined_follows_df.to_csv(dest_path, index=False, lineterminator='\n')

Let's test it out by initializing a `User`:

In [31]:
# steve = User(export_filepath = '../data/export_activities/export_96589216')

In [32]:
# pts = steve.get_activity_points()
# act = steve.get_activity_summaries()

# print(pts)

# print(act)
# act.to_csv('../data/act.csv', index = False)
# pts.to_csv('../data/pts.csv', index_label='seq_num', lineterminator='\n')

In [33]:
# print(steve.get_follows().info())
# follows_df = steve.get_follows()
# follows_df.to_csv('../data/follows.csv', index = False)

And let's look at the activity summaries CSV for this user:

In [34]:
# points_df = steve.get_activity_points()
# points_df.to_csv('../data/points.csv', index_label='seq_num', lineterminator='\n')

# activities_df = steve.get_activity_summaries()
# print(activities_df.head(15))
# activities_df.to_csv('../data/activities_summaries.csv', index = False, lineterminator='\n')

A `Secretary` object stores all the `Users`, so initializing the following object will load every `User` and `Activity` in the specified export directory.

In [35]:
secretary = Secretary(all_exports = os.path.abspath(ACTIVITY_DIR_PATH))

Loading user 1 of 4 ...
Error occured when loading 13902526382.fit: Unable to find time bounds or total distance in fit file session c:\Users\matth\Documents\Python\CS3200\CS3200_Strava_Secretary\data\export_activities\export_101635319\13902526382.fit
Error occured when loading 7077892227.gpx: No tracks found in gpx file c:\Users\matth\Documents\Python\CS3200\CS3200_Strava_Secretary\data\export_activities\export_101635319\7077892227.gpx


Loading user 2 of 4 ...
Error occured when loading 17205460653.fit: Unable to find time bounds or total distance in fit file session c:\Users\matth\Documents\Python\CS3200\CS3200_Strava_Secretary\data\export_activities\export_148511532\17205460653.fit
Error occured when loading 17249570840.fit: Unable to find time bounds or total distance in fit file session c:\Users\matth\Documents\Python\CS3200\CS3200_Strava_Secretary\data\export_activities\export_148511532\17249570840.fit


Loading user 3 of 4 ...
Loading user 4 of 4 ...
Error occured when loading 7968742010.fit: Unable to find time bounds or total distance in fit file session c:\Users\matth\Documents\Python\CS3200\CS3200_Strava_Secretary\data\export_activities\export_96589216\7968742010.fit
Done loading in all users!


Finally, we export all the table data from the `Secretary` to CSV files:

In [36]:
os.makedirs(OUTPUT_DIR_PATH, exist_ok=True)
secretary.export_all_to_csvs(OUTPUT_DIR_PATH)

User IDs: [101635319, 148511532, 57141745, 96589216]
Combined Follows DataFrame (Before Filtering):
      followee_user_id follow_status  follower_user_id
0            36064991      Accepted         101635319
1            41868570      Accepted         101635319
2            45050667      Accepted         101635319
3            45711813      Accepted         101635319
4            47388563      Accepted         101635319
..                ...           ...               ...
947          96589216      Accepted         148511532
948          96589216      Accepted         150679523
949          96589216      Accepted         153835805
950          96589216      Accepted         158514221
951          96589216      Accepted         159639266

[952 rows x 3 columns]
Filtered Follows DataFrame:
      followee_user_id follow_status  follower_user_id
10           57141745      Accepted         101635319
23           96589216      Accepted         101635319
39          148511532      Accepted 