# SPORT RECOMMENDATION

As awareness of the importance of health and fitness grows, so does the need for more personalized guidance in choosing a sport. Each person has different preferences for the kind of exercise that suits their physical condition, fitness goals, and daily routine. A recommender system addresses this need by suggesting sports that are more relevant and personalized.

One recommender approach that can be used here is Content-Based Filtering. It applies a text-processing technique, TF-IDF (Term Frequency-Inverse Document Frequency), to weigh the importance of each word in a sport's text attributes (this notebook uses the `Muscle` column), and Cosine Similarity to measure how alike two activities are. Together these let the system recommend sports that are relevant to one the user already does.
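To make the two ideas concrete, here is a minimal standard-library sketch of TF-IDF weighting followed by cosine similarity. It is an illustration only, not the code this notebook runs: the helper names (`tokenize`, `tfidf_vectors`, `cosine`) are hypothetical, the tokenizer is simplified, and the weighting assumes the smoothed IDF formula and L2 normalization that scikit-learn documents as the `TfidfVectorizer` defaults. The three input strings are the `Muscle` values of the first three rows of the dataset.

```python
import math
from collections import Counter

def tokenize(text):
    # Simplified tokenizer (assumption): lowercase, split on commas and whitespace
    return text.lower().replace(",", " ").split()

def tfidf_vectors(docs):
    """Return one L2-normalized TF-IDF dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
        weights = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1)
                   for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors

def cosine(u, v):
    # Both vectors are unit length, so the dot product equals the cosine
    return sum(w * v.get(t, 0.0) for t, w in u.items())

muscles = [
    "hamstrings, quads, calves",  # Joging
    "hamstrings, quads, calves",  # Running
    "leg, biceps, shoulders",     # Jumping Rope
]
vecs = tfidf_vectors(muscles)
print(round(cosine(vecs[0], vecs[1]), 6))  # identical text -> 1.0
print(cosine(vecs[0], vecs[2]))            # no shared terms -> 0.0
```

Two sports with the same muscle list score 1.0 and two with no muscle in common score 0.0, which is the pattern visible in the first rows of the similarity matrix later in the notebook.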

### Library

In [2]:
import os
import numpy as np
import pandas as pd

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

import seaborn as sns
import matplotlib.pyplot as plt

### Dataset Overview

In [3]:
data_sports = pd.read_csv('\\Users\\acer\\Downloads\\data activity_new version 2.csv')

In [4]:
data_sports.head()

Unnamed: 0,Sports,Description,Visual,Duration (Min),Location,Number of people,Equipment,Muscle,Category
0,Joging,A form of trotting or running at a slow or lei...,,20-30,Outdoor,1,No,"hamstrings, quads, calves",Lower Body
1,Running,The action of rapidly propelling yourself forw...,,20-30,Outdoor,1,No,"hamstrings, quads, calves",Lower Body
2,Jumping Rope,A form of exercise that involves swinging a ro...,,1-5,Indoor,1,Yes,"leg, biceps, shoulders",Lower Body
3,Aerobic,A series of movements accompanied by musical r...,,15-20,Indoor,1,No,"legs, stomach, arms, cardiac",Whole Body
4,Yoga,"A practice that connects the body, breath, and...",,20-40,Indoor,1,No,"pelvic floor, core, arms, glutes, back",Whole Body


### Data Preprocessing

In [5]:
data_sports.isnull().sum()

Sports               0
Description          0
Visual              55
Duration (Min)       0
Location             0
Number of people     0
Equipment            0
Muscle               0
Category             0
dtype: int64

In [6]:
data_sports.drop(["Visual"], axis=1, inplace=True)

In [7]:
data_sports.head()

Unnamed: 0,Sports,Description,Duration (Min),Location,Number of people,Equipment,Muscle,Category
0,Joging,A form of trotting or running at a slow or lei...,20-30,Outdoor,1,No,"hamstrings, quads, calves",Lower Body
1,Running,The action of rapidly propelling yourself forw...,20-30,Outdoor,1,No,"hamstrings, quads, calves",Lower Body
2,Jumping Rope,A form of exercise that involves swinging a ro...,1-5,Indoor,1,Yes,"leg, biceps, shoulders",Lower Body
3,Aerobic,A series of movements accompanied by musical r...,15-20,Indoor,1,No,"legs, stomach, arms, cardiac",Whole Body
4,Yoga,"A practice that connects the body, breath, and...",20-40,Indoor,1,No,"pelvic floor, core, arms, glutes, back",Whole Body


In [8]:
data_sports.drop_duplicates(subset="Sports",inplace=True, keep="first")

In [9]:
data_sports.head()

Unnamed: 0,Sports,Description,Duration (Min),Location,Number of people,Equipment,Muscle,Category
0,Joging,A form of trotting or running at a slow or lei...,20-30,Outdoor,1,No,"hamstrings, quads, calves",Lower Body
1,Running,The action of rapidly propelling yourself forw...,20-30,Outdoor,1,No,"hamstrings, quads, calves",Lower Body
2,Jumping Rope,A form of exercise that involves swinging a ro...,1-5,Indoor,1,Yes,"leg, biceps, shoulders",Lower Body
3,Aerobic,A series of movements accompanied by musical r...,15-20,Indoor,1,No,"legs, stomach, arms, cardiac",Whole Body
4,Yoga,"A practice that connects the body, breath, and...",20-40,Indoor,1,No,"pelvic floor, core, arms, glutes, back",Whole Body


In [10]:
data_sports.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 55 entries, 0 to 54
Data columns (total 8 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Sports            55 non-null     object
 1   Description       55 non-null     object
 2   Duration (Min)    55 non-null     object
 3   Location          55 non-null     object
 4   Number of people  55 non-null     object
 5   Equipment         55 non-null     object
 6   Muscle            55 non-null     object
 7   Category          55 non-null     object
dtypes: object(8)
memory usage: 3.9+ KB


In [11]:
print(f'Number of duplicate rows in the sports data: {data_sports.duplicated().sum()}')


Number of duplicate rows in the sports data: 0


### Text Processing with TF-IDF

In [12]:
tfidf = TfidfVectorizer()
tfidf.fit(data_sports.Muscle)
#tfidf.get_feature_names_out()

TfidfVectorizer()

In [13]:
tfidf_matrix = tfidf.fit_transform(data_sports.Muscle)
tfidf_matrix.shape

(55, 80)

In [14]:
tfidf_matrix.todense()

matrix([[0.        , 0.        , 0.        , ..., 0.        , 0.        ,
         0.        ],
        [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
         0.        ],
        [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
         0.        ],
        ...,
        [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
         0.        ],
        [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
         0.        ],
        [0.        , 0.        , 0.        , ..., 0.33466201, 0.        ,
         0.        ]])

In [15]:
pd.DataFrame(
    tfidf_matrix.todense(),
    columns = tfidf.get_feature_names_out(),  # get_feature_names() was removed in newer scikit-learn releases
    index   = data_sports.Sports
).sample(20, axis=1).sample(10, axis=0)

Sports,feet,leg,buttocks,and,gluteals,abductors,shoulders,stomach,abdomen,arm,grip,body,maximus,knees,rectus,neck,pecs,deltoids,ankles,adductors
Jumping Jack,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Football,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Shoulder Touch,0.0,0.0,0.0,0.0,0.0,0.0,0.40904,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Table Tennis,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.376739,0.376739,0.0
Futsal,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Aerobic,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.572865,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Sit-up,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.320674,0.297181,0.0,0.0,0.0,0.0
Golf,0.0,0.0,0.0,0.351558,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.525404,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Muay thai,0.0,0.488718,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.527354,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Bridge,0.0,0.0,0.748914,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
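The feature columns above include words such as `and`, `body`, and `maximus` because the vectorizer splits each `Muscle` string into single-word tokens. The sketch below reproduces that splitting with the standard `re` module, assuming the default `TfidfVectorizer` settings (lowercasing and the token pattern `(?u)\b\w\w+\b`, i.e. runs of two or more word characters). The helper name `tokens` is hypothetical.

```python
import re

# Token pattern documented as the default for scikit-learn's text vectorizers
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

def tokens(text):
    # Lowercase first, then extract every run of two or more word characters
    return TOKEN_PATTERN.findall(text.lower())

print(tokens("pelvic floor, core, arms, glutes, back"))
# ['pelvic', 'floor', 'core', 'arms', 'glutes', 'back']
print(tokens("hamstrings, quads, calves, gluts"))
# ['hamstrings', 'quads', 'calves', 'gluts']
```

Multi-word muscle names are broken apart ("pelvic floor" becomes two features), and spelling variants such as `gluts` and `glutes` become separate features that never match each other. Normalizing the `Muscle` text before vectorizing would be one way to avoid that, though the notebook does not do so.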


### Applying Cosine Similarity

In [16]:
cosine_sim = cosine_similarity(tfidf_matrix)
cosine_sim

array([[1.        , 1.        , 0.        , ..., 0.69759454, 0.        ,
        0.29114002],
       [1.        , 1.        , 0.        , ..., 0.69759454, 0.        ,
        0.29114002],
       [0.        , 0.        , 1.        , ..., 0.        , 0.        ,
        0.33239142],
       ...,
       [0.69759454, 0.69759454, 0.        , ..., 1.        , 0.        ,
        0.20309769],
       [0.        , 0.        , 0.        , ..., 0.        , 1.        ,
        0.        ],
       [0.29114002, 0.29114002, 0.33239142, ..., 0.20309769, 0.        ,
        1.        ]])

In [17]:
cosine_sim_df = pd.DataFrame(
    cosine_sim,
    columns = data_sports.Sports,
    index   = data_sports.Sports
)

print(f'Cosine Similarity Shape : {cosine_sim_df.shape}')

cosine_sim_df.sample(8, axis=1).sample(8, axis=0)

Cosine Similarity Shape : (55, 55)


Sports,Badminton,Donkey Kickback,Wrestling,Stepping,Swimming,Judo,Rugby,Zumba
Scissor Kick,0.147105,0.0,0.090724,0.0,0.0,0.078788,0.39821,0.0
Squat,0.384421,0.215115,0.095444,0.285474,0.061421,0.0,0.327419,0.0
Gym,0.43342,0.0,0.0,0.093651,0.122471,0.136755,0.052932,0.076197
Golf,0.149463,0.0,0.109423,0.0,0.0,0.205123,0.0,0.0
Wrestling,0.0,0.0,1.0,0.0,0.173322,0.136903,0.058479,0.0
Plank,0.0,0.0,0.078147,0.0,0.2229,0.067866,0.075206,0.0
Rugby,0.223704,0.45603,0.058479,0.260694,0.056089,0.050785,1.0,0.0
Taekwondo,0.112239,0.0,0.122469,0.0,0.081312,0.279426,0.0,0.09643


### Recommender System

In [18]:
def sports_recommendations(Sports, similarity_data=cosine_sim_df, items=data_sports[['Sports', 'Muscle','Duration (Min)','Category']], k=10):
    # Partition so that the k+1 largest similarities sit, in sorted order, at the end of `index`
    # (k+1 because the query sport itself is normally among them and is dropped below)
    index = similarity_data.loc[:,Sports].to_numpy().argpartition(range(-1, -(k+2), -1))
    closest = similarity_data.columns[index[-1:-(k+2):-1]]
    closest = closest.drop(Sports, errors='ignore')

    # Join the closest sports with their attributes
    recommendations = pd.DataFrame(closest).merge(items)

    # Add the similarity column
    similarity_values = [similarity_data.loc[Sports, sport] for sport in closest]
    recommendations['Similarity'] = similarity_values

    return recommendations.head(k)
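The core of the function is a top-k selection: take one sport's column of similarity scores, leave out the sport itself, and keep the k highest. The sketch below shows the same idea with the standard-library `heapq` module instead of NumPy. The function name `top_k_similar` is hypothetical, and the scores are copied from the `Joging` result table in this notebook, plus the self-similarity of 1.0 from the diagonal of the matrix.

```python
import heapq

def top_k_similar(scores, query, k=3):
    """Return the k (name, score) pairs most similar to `query`, excluding it."""
    candidates = ((name, s) for name, s in scores.items() if name != query)
    # nlargest returns the k best candidates, ordered from highest to lowest score
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])

# Similarity of each sport to 'Joging' (subset of the notebook's results)
joging_scores = {
    "Joging": 1.0,
    "Running": 1.0,
    "Stair Climbing": 0.697595,
    "Futsal": 0.668985,
    "Fencing": 0.474136,
    "Tennis": 0.338218,
}
print(top_k_similar(joging_scores, "Joging", k=3))
# [('Running', 1.0), ('Stair Climbing', 0.697595), ('Futsal', 0.668985)]
```

Filtering out the query before selecting avoids having to request k+1 items and drop one afterwards.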

In [19]:
########## SET THE RECOMMENDATION INPUT HERE ##########

performed_sports = 'Joging'

In [20]:
df=data_sports[data_sports.Sports.eq(performed_sports)]
df

Unnamed: 0,Sports,Description,Duration (Min),Location,Number of people,Equipment,Muscle,Category
0,Joging,A form of trotting or running at a slow or lei...,20-30,Outdoor,1,No,"hamstrings, quads, calves",Lower Body


### Recommendation Results

In [21]:
db=sports_recommendations(performed_sports).drop_duplicates()
db

Unnamed: 0,Sports,Muscle,Duration (Min),Category,Similarity
0,Running,"hamstrings, quads, calves",20-30,Lower Body,1.0
1,Stair Climbing,"hamstrings, quads, calves, gluts",1-5,Lower Body,0.697595
2,Futsal,"hamstrings, glutes, quads",50-60,Lower Body,0.668985
3,Football,"hamstrings, glutes, quads",50-60,Lower Body,0.668985
4,Fencing,"quadriceps, hamstrings, glutes, calves, core",20-30,Lower Body,0.474136
5,Badminton,"quadriceps, glutes, calves, hamstrings, arms, ...",50-60,Lower Body,0.425841
6,Boxing,"feet, calves, quads, hamstrings, hips, glutes,...",30-50,Upper Body,0.402474
7,Elliptical Training,"glutes, hamstrings, quads, chest, back, biceps...",1-5,Lower Body,0.379048
8,Tennis,"forearms, core, glutes, quads",50-60,Upper Body,0.338218
9,Squat,"gluteus, quadriceps, hamstrings, adductor, hip...",1-5,Lower Body,0.307148


In [22]:
data_sports['Category'].value_counts()

Lower Body    21
Core          14
Upper Body    12
Whole Body     8
Name: Category, dtype: int64

In [23]:
df['Category']

0    Lower Body
Name: Category, dtype: object

### Accuracy Evaluation

In [24]:
def TP():  # True positives: recommended sports in the same category as the input
    filtered_data = db[db['Category'].isin(df['Category'])]
    jlm = len(filtered_data)
    return jlm

def FP():  # False positives: recommended sports in a different category
    filtered_data2 = db[~db['Category'].isin(df['Category'])]
    jlm = len(filtered_data2)
    return jlm

def FN():  # False negatives: same category but not recommended
    # Note: this count includes the input sport itself, which is never in db
    filtered_data = data_sports[data_sports['Category'].isin(df['Category'])]
    filtered_data2 = db[db['Category'].isin(df['Category'])]
    jlm = len(filtered_data) - len(filtered_data2)
    return jlm 

def TN():  # True negatives: different category and not recommended
    filtered_data = data_sports[~data_sports['Category'].isin(df['Category'])]
    filtered_data2 = db[~db['Category'].isin(df['Category'])]
    jlm = len(filtered_data) - len(filtered_data2)
    return jlm 

def accuracy():  # accuracy over the 10 recommendations (no threshold applied)
    TP_val = TP()
    FP_val = FP()
    FN_val = FN()
    TN_val = TN()
    
    presisi = TP_val / (TP_val + FP_val)
    akurasi = (TP_val + TN_val) / (TP_val + TN_val + FP_val + FN_val)
    
    print(f"Recommended sports in the matching category: {TP_val}")
    print(f"Recommended sports outside the matching category: {FP_val}")
    print(f"Matching sports that were not recommended: {FN_val}")
    print(f"Non-matching sports that were not recommended: {TN_val}")
    print("Precision value recommendation:", presisi)
    print("Accuracy value recommendation:", akurasi)
    
    return

In [25]:
accuracy()

Recommended sports in the matching category: 8
Recommended sports outside the matching category: 2
Matching sports that were not recommended: 13
Non-matching sports that were not recommended: 32
Precision value recommendation: 0.8
Accuracy value recommendation: 0.7272727272727273
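
The figures above follow directly from the four counts. As a check, the sketch below recomputes them with a small hypothetical helper, `classification_metrics`. It also reports recall, which the notebook does not compute. Recall is added here only for illustration, because with a fixed list of 10 recommendations and 21 sports in the category it shows how much of the category the list can cover.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, and accuracy from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "precision": tp / (tp + fp),    # share of recommendations that match
        "recall": tp / (tp + fn),       # share of matching sports that were recommended
        "accuracy": (tp + tn) / total,  # share of all sports classified correctly
    }

# Counts printed by accuracy() for the input 'Joging'
m = classification_metrics(tp=8, fp=2, fn=13, tn=32)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Precision is 8 / 10 and accuracy is 40 / 55, matching the printed values. Accuracy here is driven largely by the 32 true negatives, so precision is the more direct measure of the quality of the recommendation list.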

