# MoneyMakingAlgorithm (MMA) project guide

This notebook walks through the entire data pipeline, from raw data to predictions, for predicting upcoming fights (or bouts, as Uncle Dana wants us to call them) in the Ultimate Bouting Championship (UFC), a Mixed Martial Arts (MMA) promotion that holds events weekly or biweekly. 

About this project: 
- This project cleans the data, engineers features, trains a model and makes predictions for past and future Ultimate Fighting Championship (UFC) fights. 
- Automatically updates the dataset and scrapes upcoming fight info. 
- Predicts 7 classes: KO win, Submission win, Decision win, Draw, Decision loss, Submission loss, KO loss. 
- Creates probability distributions for each of the 7 classes for price estimation and risk analysis.
- Makes predictions using a self-devised rating system and other features. 
- Machine learning method: Extreme Gradient Boosting (`xgboost`).
- Optimizes hyperparameters using n-times repeated m-fold cross-validation. 
- Includes singular value decomposition (SVD), feature selection and other optional data processing steps.
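To illustrate what "price estimation" can mean in practice, the sketch below converts a predicted win probability into fair decimal odds (1/p) and computes the expected profit of a bet at a bookmaker's decimal price. The helpers `fair_decimal_odds` and `expected_value` are hypothetical and not part of this project, and the sketch ignores the bookmaker's margin.

```python
# Hypothetical helpers (not part of this project): probability -> fair price and expected value.


def fair_decimal_odds(p: float) -> float:
    """Return the decimal odds at which a bet with win probability p breaks even."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return 1.0 / p


def expected_value(p: float, decimal_odds: float, stake: float = 1.0) -> float:
    """Expected profit: win (odds - 1) * stake with probability p, lose the stake otherwise."""
    return p * (decimal_odds - 1.0) * stake - (1.0 - p) * stake


if __name__ == "__main__":
    p_win = 0.55    # hypothetical model probability that fighter 1 wins
    offered = 2.10  # hypothetical bookmaker price
    print(f"fair odds: {fair_decimal_odds(p_win):.3f}")
    print(f"EV per unit stake: {expected_value(p_win, offered):+.3f}")
```

A positive expected value only means the model's probability implies a shorter price than the one offered; the spread of the probability distributions is what tells you how much to trust that.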


Notes 
- To keep my own edge, I may not publish my data cleaning module and certain feature sets. I will also only publish the mean of the final probability distributions to prevent risk analysis. You can, however, still use this notebook to make your own predictions on the nearest upcoming UFC event, but with fewer feature sets. 



## 1. Cleaning data

First, the raw ufcstats.com datasets are loaded directly from Greco1899's scrape_ufc_stats repository (https://github.com/Greco1899/scrape_ufc_stats), which is regularly updated by the corresponding scraper. Second, the datasets are cleaned and merged into a single dataset, basic features are created (dates, heights, time formats, etc.), and the result is made ready for further feature engineering. The output CSV is saved as `data/interim/clean_ufcstats-com_data.csv`.

Key module: `src/data_processing/clean_raw_data.py`
- Main function: `process_all_data(prefer_external=True, new_fights_only=False)`
- Core class: `UFCDataProcessor`

Notes:
- All events before UFC 31: Locked and Loaded (May 4, 2001) are excluded, because UFC 31 is the earliest event that both uses the Unified Rules of MMA and restricts itself to a 3-round or 5-round format (in contrast to UFC 30, which used a 2-round format). In other words, UFC 31 is the first standardized UFC event. 
- Keep `new_fights_only=False`; the `True` option is still WIP.
- Not all names in `ufc_fight_results.csv` match those in `ufc_fighter_tott`. In addition, some fighters in `ufc_fighter_tott` share the same name, or are listed twice (e.g. one entry with stats and one without). These issues have been resolved for all retired and currently active fighters, but they may reappear for newly debuting fighters. In that case, the code terminates and the user must follow the instructions in the log and comments to implement a simple hard-coded fix. 
- For dataset-specific reasons, sex cannot be inferred for some fighters, which then requires a hard-coded fix (this could theoretically occur but has not happened so far). In that event, the program writes to `data/interim/unknown_sex.csv` and stops for manual review. In the future, this could be resolved with an AI model that infers sex from first names. 
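The name consistency problem described in the notes above can be detected with a simple set comparison. The following is a minimal sketch, assuming plain lists of fighter names; `check_fighter_names` is a hypothetical helper, not the function used in `clean_raw_data.py` (which also deals with alternative spellings).

```python
from collections import Counter


# Hypothetical helper (not part of this project).
def check_fighter_names(result_names, tott_names):
    """Compare names in the fight results with names in the fighter (tott) table.

    Returns a tuple (unmatched, duplicated):
    - unmatched: names that appear in the results but not in the tott table
    - duplicated: names that occur more than once in the tott table
    """
    tott_set = set(tott_names)
    unmatched = sorted({name for name in result_names if name not in tott_set})
    counts = Counter(tott_names)
    duplicated = sorted(name for name, n in counts.items() if n > 1)
    return unmatched, duplicated


if __name__ == "__main__":
    results = ["Fighter A", "Fighter B", "Fighter C"]  # placeholder names
    tott = ["Fighter A", "Fighter B", "Fighter B"]
    unmatched, duplicated = check_fighter_names(results, tott)
    if unmatched or duplicated:
        # Stop and report, so the names can be fixed by hand.
        print("Unmatched:", unmatched)
        print("Duplicated:", duplicated)
```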

In [1]:

# Make sure your environment has the project's requirements installed.
from src.data_processing.clean_raw_data import process_all_data

# Tip: prefer_external=True tries to pull fresh CSVs from GitHub and falls back to local files on failure.
# IMPORTANT: new_fights_only is WIP; keep it False.
process_all_data(prefer_external=True, new_fights_only=False)

# After running, you should see: data/interim/clean_ufcstats-com_data.csv

2025-10-04 19:42:26.376 | INFO     | src.data_processing.clean_raw_data:process_all_data:704 - Starting UFC data processing pipeline
2025-10-04 19:42:26.382 | INFO     | src.data_processing.clean_raw_data:__init__:71 - Initialized UFC data processor
2025-10-04 19:42:26.382 | INFO     | src.data_processing.clean_raw_data:process_all_data:709 - Loading raw data
2025-10-04 19:42:27.361 | INFO     | src.data_processing.clean_raw_data:load_raw_data:93 - Succesfully loaded data from https://github.com/Greco1899/scrape_ufc_stats
2025-10-04 19:42:27.372 | DEBUG    | src.utils.general:store_csv:155 - Saved CSV: c:\Users\OAVAI\Desktop\mma - Copy\data\raw\ufc_event_details.csv (747 rows)



## 2. Constructing feature sets

+ We now construct the desired feature sets together with the feature sets they depend on. 

Key module: `src/data_processing/feature_manager.py`
- Class: `FeatureManager(feature_set_names=None, feature_set_params=None, overwrite_all=True)`
- Feature modules live in `src/feature_engineering` and are imported dynamically (e.g., `get_base_features.py`).
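Dynamic loading of feature modules can be sketched with `importlib.import_module`. This sketch assumes the naming convention suggested by the examples in this notebook (`base_features` maps to `get_base_features.py`, `elo_params` to `get_elo_params.py`); the helper names are hypothetical, and `FeatureManager` may resolve modules differently.

```python
import importlib
from types import ModuleType

# Package that holds the feature modules (taken from the text above).
FEATURE_PACKAGE = "src.feature_engineering"


# Hypothetical helper: assumes the get_<name> module naming convention.
def feature_module_path(feature_set_name: str, package: str = FEATURE_PACKAGE) -> str:
    """Map a feature set name such as 'wl_elos' to a dotted module path."""
    return f"{package}.get_{feature_set_name}"


# Hypothetical helper: imports the module at runtime.
def load_feature_module(feature_set_name: str, package: str = FEATURE_PACKAGE) -> ModuleType:
    """Import the module for a feature set; raises ModuleNotFoundError if it does not exist."""
    return importlib.import_module(feature_module_path(feature_set_name, package))
```

Run from the project root, `load_feature_module("elo_params")` would import `src.feature_engineering.get_elo_params`, so adding a feature set only requires dropping a new `get_<name>.py` file into the package.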

Finished sets to choose from:
- base_features (always include)
- elo_params (always include for elo feature sets)
- wl_elos
- stat_elos_round_averages
- stat_elos_per_round (alternative to the above; generally not recommended) 
- acc_elos_round_averages
- acc_elos_per_round (alternative to the above; generally not recommended)
- rock_paper_scissor (who beat whom, and how?)

Regarding `elo_params`:
- `get_elo_params` creates multiple K-parameters, which the Elo feature sets can then select via `which_K` (`'cust'` or `'log'`, per round or not per round). 
- I believe that K-parameters not used by the other Elo feature sets are currently discarded automatically in the final model, but I still have to double-check this. Either way, it may be worthwhile to keep them, because the K-parameters act as an experience measure for the fighters: they not only help the model understand the Elo rating system, but also help it understand the data directly. 
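For readers unfamiliar with Elo ratings, the standard update rule that Elo-style features build on is sketched below. The experience-dependent K (large for newcomers, shrinking logarithmically with the number of fights) is a hypothetical stand-in for the `'log'` option; the actual K-parameters in `get_elo_params.py` (including `'cust'` and the per-round variants) are not reproduced here.

```python
import math


def expected_score(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))


# Hypothetical K: not the project's actual formula.
def log_k(n_fights: int, k0: float = 64.0) -> float:
    """Experience-dependent K that shrinks as log(n_fights) grows."""
    return k0 / math.log(math.e + n_fights)


def elo_update(r_a, r_b, score_a, n_a, n_b):
    """Update both ratings after one bout.

    score_a is 1 for an A win, 0.5 for a draw and 0 for an A loss.
    n_a and n_b are the numbers of previous fights of each fighter.
    """
    e_a = expected_score(r_a, r_b)
    new_a = r_a + log_k(n_a) * (score_a - e_a)
    new_b = r_b + log_k(n_b) * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b


if __name__ == "__main__":
    # A debutant (0 fights) beats an experienced fighter (20 fights); both start at 1500.
    print(elo_update(1500.0, 1500.0, 1.0, n_a=0, n_b=20))
```

Because K depends on experience, the debutant's rating moves much more than the veteran's, which is also why K itself carries information about a fighter's experience.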

Notes:
- Keep the parameter `process_upcoming_fights=False` at this stage (upcoming fights are handled separately in Section 3.2).

In [None]:
# Example: generate the chosen feature sets
from src.data_processing.feature_manager import FeatureManager
from src.feature_engineering.get_elo_params import set_elo_params 
# set_elo_params() lets you set some of the parameters of the
# Elo system; most of them only apply to which_K = 'cust'.
feature_sets = {} 

# Parameters for the feature engineering functions  
base_features_params = {} 
elo_params_params = {'d_params': set_elo_params()} 
wl_elos_params = {'which_K': 'log'}
stat_elos_round_averages_params = {'which_K': 'log',
                                   'exact_score': True,
                                   'always_update': False}
stat_elos_per_round_params = {'which_K': 'cust',
                              'exact_score': True,
                              'always_update': False}
acc_elos_round_averages_params = {'which_K': 'log'} 
acc_elos_per_round_params = {} 
rock_paper_scissor_params = {'intervals': [0,2]} # Or [0,2,4] 

# Choose final feature sets and create them. 
feature_sets['base_features'] = base_features_params
feature_sets['elo_params'] = elo_params_params
feature_sets['wl_elos'] = wl_elos_params
feature_sets['stat_elos_round_averages'] = stat_elos_round_averages_params
#feature_sets['stat_elos_per_round'] = stat_elos_per_round_params
feature_sets['acc_elos_round_averages'] = acc_elos_round_averages_params
feature_sets['rock_paper_scissor'] = rock_paper_scissor_params

# Create the feature sets. Set overwrite_all=True if data/features is empty.
FeatureManager(feature_sets, overwrite_all=False)


2025-10-05 18:17:47.682 | DEBUG    | src.utils.general:open_csv:118 - Loaded CSV: c:\Users\OAVAI\Desktop\mma - Copy\data\features\base_features.csv (8088 rows)
2025-10-05 18:17:47.720 | DEBUG    | src.utils.general:open_csv:118 - Loaded CSV: c:\Users\OAVAI\Desktop\mma - Copy\data\features\elo_params.csv (8088 rows)
2025-10-05 18:17:47.737 | DEBUG    | src.utils.general:open_csv:118 - Loaded CSV: c:\Users\OAVAI\Desktop\mma - Copy\data\features\wl_elos.csv (8088 rows)
2025-10-05 18:17:47.871 | DEBUG    | src.utils.general:open_csv:118 - Loaded CSV: c:\Users\OAVAI\Desktop\mma - Copy\data\features\stat_elos_round_averages.csv (8088 rows)
2025-10-05 18:17:48.008 | DEBUG    | src.utils.general:open_csv:1


## 3. Final data processing 

Key module: `src/model_selection/trainvalpred.py`
- Class: `TrainValPred(feature_sets=None)`

### 3.1 Creating the training and validation data

+ Now we decide which feature sets to include in the model and merge them into one dataset.
+ To validate the model on a representative (recent) part of the data, we hold out the fights from the most recent `last_years` years, capped at the most recent fraction `sample_size` of all fights (e.g. `sample_size = 0.1` means at most 10%). 
+ The snippet below creates the files `interim/chosen_features_merged`, `processed/train.csv` and `processed/valid.csv`.
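The hold-out rule described above can be sketched as follows, assuming fights are `(date, id)` tuples and that `sample_size` is a fraction of all fights. `split_trainval_sketch` is a hypothetical helper; `TrainValPred.split_trainval` may differ in details, such as how a year is measured or how fights on the cutoff date are treated.

```python
from datetime import date, timedelta


# Hypothetical helper (not part of this project).
def split_trainval_sketch(fights, last_years=2, sample_size=0.1):
    """Split chronologically so that the validation set holds the most recent fights.

    fights: list of (fight_date, fight_id) tuples.
    A fight is a validation candidate if it took place within `last_years` years of
    the most recent fight; the validation set is then capped at the most recent
    `sample_size` fraction of all fights.
    """
    ordered = sorted(fights, key=lambda f: f[0])
    latest = ordered[-1][0]
    cutoff = latest - timedelta(days=round(365.25 * last_years))
    n_recent = sum(1 for d, _ in ordered if d > cutoff)
    n_valid = min(n_recent, int(sample_size * len(ordered)))
    split = len(ordered) - n_valid
    return ordered[:split], ordered[split:]


if __name__ == "__main__":
    # 100 synthetic fights spread over 2000-2009.
    fights = [(date(2000 + i // 10, 1 + i % 10, 1), i) for i in range(100)]
    train, valid = split_trainval_sketch(fights, last_years=2, sample_size=0.1)
    print(len(train), len(valid))
```

Holding out by time rather than at random keeps the validation set free of information from the future, which matters for rating features such as Elo that are built up chronologically.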

In [3]:
from src.model_selection.trainvalpred import TrainValPred

# Choose the final feature sets and initialize.
# feature_sets comes from Section 2 (run that cell first); you may also omit some sets here.
feature_sets = feature_sets
TVP = TrainValPred(feature_sets) 

# Merge features (set overwrite_feature_sets=True if the feature set CSVs have not been created yet)
TVP.merge_features(overwrite_feature_sets=False) 

# Set parameters to choose your validation set. 
last_years = 2
sample_size = 0.1

TVP.split_trainval(last_years=last_years, sample_size=sample_size)     


### 3.2 Constructing the prediction data 

- Before further processing, we need to construct the prediction data. 
- The snippet below runs the entire prediction data pipeline from scraping `ufcstats.com`'s upcoming event data to creating all the features. 
- Creates files: `raw/pred_raw.csv`, `interim/pred_clean`, `processed/pred.csv`, +1
- Tip: rerun this snippet if any bouts get cancelled/replaced. 


In [4]:
from src.data_processing.scrape_pred import scrape_pred 
from src.data_processing.clean_pred import clean_pred 

scrape_pred() 
clean_pred() 
TVP.construct_pred() 

2025-10-04 19:43:17.437 | DEBUG    | src.utils.general:open_csv:118 - Loaded CSV: c:\Users\OAVAI\Desktop\mma - Copy\data\raw\pred_raw.csv (14 rows)

Wrote 14 rows to C:\Users\OAVAI\Desktop\mma - Copy\upcoming_event_fights.csv

2025-10-04 19:43:18.108 | DEBUG    | src.utils.general:open_csv:118 - Loaded CSV: c:\Users\OAVAI\Desktop\mma - Copy\data\interim\clean_ufcstats-com_data.csv (8088 rows)
2025-10-04 19:43:18.136 | DEBUG    | src.utils.general:open_csv:118 - Loaded CSV: c:\Users\OAVAI\Desktop\mma - Copy\data\interim\alternative_spellings_internal.csv (59 rows)
2025-10-04 19:43:18.145 | DEBUG    | src.utils.general:open_csv:118 - Loaded CSV: c:\Users\OAVAI\Desktop\mma - Copy\data\interim\alternative_spellings_external.csv (940 rows)
2025-10-04 19:43:18.183 | DEBUG    | src.utils.general:open_csv:118 - Loaded CSV: c:\Users\OAVAI\Desktop\mma - Copy\data\interim\ufc_fighter_tott_clean.csv (4442 rows)
2025-10-04 19:43:19.278 | DEBUG    | sr

### 3.3 Further processing 

There are now three options: 
1. No further processing; go straight to training (`suffix = ""`). 
2. Make (anti-)symmetric features, i.e. `fighter1_features -> (fighter1_features + fighter2_features)/sqrt(2)` and `fighter2_features -> (fighter1_features - fighter2_features)/sqrt(2)`, leaving shared features unchanged. In this case, set `suffix = "symm"`. 
3. Perform a singular value decomposition (SVD) on the data and transform to the Schmidt basis (`suffix = "svd"`). 

Notes 
- The SVD step also standardizes the data and makes (anti-)symmetric pairs. However, in contrast to `symmetrize(for_svd = False)`, it does not transform one-hot encoded features to flags, because one-hot encoded features may be favorable for the SVD but otherwise waste xgb splits. 
- Creates the datasets `processed/train_{suffix}`, `processed/valid_{suffix}` and `processed/pred_{suffix}`.
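The (anti-)symmetric transform in option 2 is an orthogonal transformation of each fighter1/fighter2 feature pair (it is its own inverse), so no information is lost. A minimal sketch for a single pair of values, with hypothetical helper names; `TVP.symmetrize` applies the idea column-wise and additionally converts one-hot features to flags.

```python
import math

SQRT2 = math.sqrt(2.0)


# Hypothetical helper (not part of this project).
def symmetrize_pair(f1: float, f2: float) -> tuple[float, float]:
    """Map (fighter1, fighter2) values to (symmetric, antisymmetric) components."""
    return (f1 + f2) / SQRT2, (f1 - f2) / SQRT2


# Hypothetical helper (not part of this project).
def desymmetrize_pair(s: float, a: float) -> tuple[float, float]:
    """Inverse transform: applying the same map again recovers the original pair."""
    return (s + a) / SQRT2, (s - a) / SQRT2


if __name__ == "__main__":
    # Hypothetical reach values in cm for fighter1 and fighter2.
    s, a = symmetrize_pair(190.0, 180.0)
    print(s, a)
    print(desymmetrize_pair(s, a))  # the original pair, up to rounding
```

Swapping fighter1 and fighter2 leaves the symmetric component unchanged and only flips the sign of the antisymmetric one, so the corner assignment no longer shuffles information between columns.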

In [5]:
suffix = 'symm'

if suffix == 'symm': 
    TVP.symmetrize(for_svd = False) 

elif suffix == 'svd':
    # Because you probably want to check where to truncate,
    # you may have to run the SVD twice:
    # TVP.svd(k = 10e6, plot_sv = True)
    
    TVP.do_svd(k=204)  




## 4. Training and making predictions

The training pipeline works as follows. 
1. Optimize hyperparameters with Optuna using stratified 5-fold cross-validation.  
2. Perform feature selection using xgboost's built-in feature importance.
3. Optimize again, varying both the hyperparameters and the number `k_selected` of most important features determined in the previous step. 
4. Create probability distributions per win and loss type to (hopefully) see the price spread. 
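Step 1 relies on stratified folds, i.e. every fold keeps roughly the same share of each of the 7 outcome classes. Below is a stdlib-only sketch of repeated stratified k-fold assignment; `repeated_stratified_folds` is a hypothetical helper. The project's own fold construction (`get_folds` in `trainvalpred.py`, according to the logs) may differ, and scikit-learn's `RepeatedStratifiedKFold` implements the same idea off the shelf.

```python
import random
from collections import defaultdict


# Hypothetical helper (not part of this project).
def repeated_stratified_folds(labels, n_folds=5, n_repeats=1, seed=30):
    """Return one fold assignment per repeat.

    Each assignment is a list where position i holds the fold index of sample i.
    Within every class, samples are shuffled and dealt round-robin over the folds,
    so each fold gets (almost) the same share of each class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    assignments = []
    for _ in range(n_repeats):
        fold_of = [0] * len(labels)
        offset = 0
        for label in sorted(by_class):
            members = by_class[label][:]
            rng.shuffle(members)
            for j, idx in enumerate(members):
                # Continue the round-robin across classes so fold sizes stay balanced.
                fold_of[idx] = (offset + j) % n_folds
            offset += len(members)
        assignments.append(fold_of)
    return assignments
```

Stratification matters here because rare classes such as draws would otherwise be missing from some validation folds, which makes the per-fold logloss noisy.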

Key module: `src/model_selection/cv.py`
- Main function: `CrossValidation.run_cv(select_features, predict)`, called on an instance created with `CrossValidation(data_params, cv_params, xgb_params)`
- Core class: `CrossValidation`

Suffix reminders:
- "": plain merged features
- "symm": (anti-)symmetric pair features; one-hot encoded features to flags
- "svd": Transformation to Schmidt basis



### 4.1 Training

+ The code snippet below begins initial training. `cv_params['n_trials']` is the number of `Optuna` trials over the parameter ranges in `xgb_params`. Keep `k_selected = None` for now. 

+ There are two options for finding optimal hyperparameters: 
    - Minimize the logloss on the training set (i.e., the average logloss over the validation folds).
    - Minimize the logloss on a partition `valid_train` of the validation set, validating the model on its remainder `valid_valid`. 
    The reasoning behind this option is that more recent fights may be a more relevant part of the data, allowing the model to optimize on those instead of the entire UFC history ("the sport evolves"; "fighters are much more well-rounded these days"). In this case, set `vv_size > 0` and choose whether to split `valid_train` and `valid_valid` by recency (`vv_random_split=False`, with `valid_valid` being the most recent fights) or randomly (`vv_random_split=True`). In the latter case, set `vv_seed`.  

+ The code snippet below writes `output/metrics/param_optimization_{suffix}`. It logs the following values: 
    - run-specific (hyper)parameter values
    - model metrics: accuracy, logloss and macro f1-score on the training set and the validation set(s), for BOTH 3 and 7 classes. Note that all training and predictions are done on 7 classes; for the 3-class metrics, the predicted probabilities are simply summed. 
    - The above metrics are averages over training the model `n_repeats * n_folds` times for the given hyperparameter combination. 
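The `valid_train`/`valid_valid` split of the second option can be sketched as below, assuming the validation rows are already sorted by date and that `vv_size` is the fraction of rows that goes to `valid_valid`. The function name is hypothetical and the project's implementation may interpret `vv_size` differently.

```python
import random


# Hypothetical helper (not part of this project).
def split_valid(rows, vv_size=0.3, random_split=False, seed=6):
    """Split date-sorted validation rows into (valid_train, valid_valid).

    - random_split=False: valid_valid is the most recent `vv_size` fraction.
    - random_split=True: valid_valid is a random `vv_size` fraction, reproducible via `seed`.
    """
    n_vv = int(round(vv_size * len(rows)))
    if n_vv == 0:
        # vv_size = 0 corresponds to the first option: no extra split.
        return list(rows), []
    if not random_split:
        return list(rows[:-n_vv]), list(rows[-n_vv:])
    rng = random.Random(seed)
    vv_idx = set(rng.sample(range(len(rows)), n_vv))
    valid_train = [r for i, r in enumerate(rows) if i not in vv_idx]
    valid_valid = [r for i, r in enumerate(rows) if i in vv_idx]
    return valid_train, valid_valid
```

The recency split is the stricter test (the model never sees fights later than those it is scored on), while the random split uses the recent data more evenly at the cost of some look-ahead.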

In [1]:
from src.model_selection.cv import CrossValidation

# Overrides the suffix chosen in Section 3.3; remove this line to keep that choice.
suffix = 'svd'

xgb_params = {
    "max_depth": (3, 6),
    "learning_rate": (0.02, 0.03),
    "n_estimators": (300, 750),
    "min_child_weight": (0, 50),
    "gamma": (0, 10),
    "subsample": (0.5, 1.0),
    "colsample_bytree": (0.8, 1.0),
    "reg_alpha": 0.0,
    "reg_lambda": 1.0
}
data_params = {
    'suffix': suffix,
    'k_selected': None,
    'vv_seed': 6,
    'vv_size': 0,
    'vv_random_split': False,
    'save_as_n_classes': 7,             # See 4.3. 
    'measure_calibration': False,       # See 4.3 
}
cv_params = {
    'fold_seed': 30,
    'n_folds': 5,
    'n_repeats': 1,
    'n_trials': 25
}
CV = CrossValidation(data_params, cv_params, xgb_params)
CV.run_cv(select_features=False, predict=False)  # feature selection or predictions also supported

2025-10-06 18:39:55.209 | INFO     | src.model_selection.trainvalpred:TrainValPred:69 - Features merged.
2025-10-06 18:39:55.704 | DEBUG    | src.utils.general:open_csv:118 - Loaded CSV: c:\Users\OAVAI\Desktop\mma - Copy\data\processed\train_svd.csv (14364 rows)
2025-10-06 18:39:55.764 | INFO     | src.model_selection.trainvalpred:get_folds:322 - Created 1 unique 5-folds
2025-10-06 18:39:55.775 | DEBUG    | src.utils.general:open_csv:118 - Loaded CSV: c:\Users\OAVAI\Desktop\mma - Copy\data\output\feature_selection\feature_frequency_svd.csv (216 rows)
2025-10-06 18:39:55.832 | DEBUG    | src.utils.general:open_csv:118 - Loaded CSV: c:\Users\OAVAI\Desktop\mm


### 4.2 Feature selection and re-training 

+ The following code snippet automatically selects the best hyperparameters from the output metrics file and starts feature selection. It outputs the file `output/feature_selection/feature_frequency_{suffix}`.
+ After ranking all features by importance, it optimizes the hyperparameters again, this time also varying over a range of `k_selected`, the number of most important features to keep. It appends to `output/feature_selection/feature_frequency_{suffix}`.
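The ranking step can be sketched as follows, assuming each fold yields a `{feature: importance}` dict, like the one returned by xgboost's `Booster.get_score()` (which omits features that were never used in a split). Features are ranked by mean importance across folds, and `k_selected = k` keeps the top `k`. These helpers are hypothetical; the project's `feature_frequency_{suffix}.csv` may be built from selection counts rather than averaged importances.

```python
from collections import defaultdict


# Hypothetical helper (not part of this project).
def rank_features(fold_importances):
    """Rank features by mean importance over folds; a feature missing in a fold counts as 0."""
    totals = defaultdict(float)
    for imp in fold_importances:
        for name, value in imp.items():
            totals[name] += value
    n = len(fold_importances)
    # Sort by mean importance (descending) and break ties by name for reproducibility.
    return sorted(totals, key=lambda name: (-totals[name] / n, name))


# Hypothetical helper (not part of this project).
def select_top_k(ranking, k):
    """k_selected = k keeps the k most important features, not only the k-th one."""
    return ranking[:k]


if __name__ == "__main__":
    folds = [{"a": 3.0, "b": 1.0}, {"a": 1.0, "c": 5.0}]
    ranking = rank_features(folds)  # mean importances: c = 2.5, a = 2.0, b = 0.5
    print(ranking, select_top_k(ranking, 2))
```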



In [3]:
### Feature selection 
cv_params['n_repeats'] = 3 # Should also be done above for rigor, but let's assume stability over seeds.

# WIP: xgb_params is not used in this step but still has to be provided.
CV = CrossValidation(data_params, cv_params, xgb_params)
CV.run_cv(select_features=True, predict=False)


### Re-training with selection of features 
# Set range of k_selected most important features to vary over. 
# Note: k_selected = 50 means the top 50 features, not "only the 50th top feature"
data_params['k_selected'] = (50,150)
cv_params['n_repeats'] = 1
cv_params['n_trials'] = 50

xgb_params = {
    "max_depth": (5, 6),
    "learning_rate": (0.02, 0.025),
    "n_estimators": (450, 750),
    "min_child_weight": (0, 30),
    "gamma": (0, 2),
    "subsample": (0.75,0.9),
    "colsample_bytree": 1.0,
    "reg_alpha": 0.0,
    "reg_lambda": 1.0
}

CV = CrossValidation(data_params, cv_params, xgb_params)
CV.run_cv(select_features=False, predict=False)

### Making predictions. 


2025-10-06 18:33:14.114 | DEBUG    | src.utils.general:open_csv:118 - Loaded CSV: c:\Users\OAVAI\Desktop\mma - Copy\data\processed\train_symm.csv (14364 rows)
2025-10-06 18:33:14.206 | INFO     | src.model_selection.trainvalpred:get_folds:322 - Created 1 unique 5-folds
2025-10-06 18:33:14.222 | DEBUG    | src.utils.general:open_csv:118 - Loaded CSV: c:\Users\OAVAI\Desktop\mma - Copy\data\output\feature_selection\feature_frequency_symm.csv (260 rows)
2025-10-06 18:33:14.281 | DEBUG    | src.utils.general:open_csv:118 - Loaded CSV: c:\Users\OAVAI\Desktop\mma - Copy\data\processed\valid_symm.csv (1596 rows)
2025-10-06 18:33:14.295 | DEBUG    | src.utils.general:open_csv:118 - Loaded CSV: c:\Users\O


### 4.3 Making predictions

+ Now we can start making predictions. The program automatically selects the best hyperparameters and the best number `k_selected` of most important features, and calculates probabilities for each of the 7 classes (so `xgb_params` and `data_params['k_selected']` do not have to be provided). 
+ The output file `output/predictions/pred_{suffix}` contains the mean, standard deviation, mean ± 2 std, 5th and 95th percentiles, and minimum and maximum for each class. These values describe the probability distributions obtained by predicting with the model trained in each fold of the `n_repeats` unique 5-fold splits (so `n_repeats * n_folds` samples).  
+ Depending on your goal, you can create probability distributions for 2 (win/loss), 3 (win/draw/loss) or 7 class outcomes using `data_params['save_as_n_classes']`. 
Note: the model still trains and predicts on 7 classes; the predicted probabilities are simply summed. For example, for binary outcomes the win probability is $$P_{\text{win}} = P_{\text{KO win}} + P_{\text{Submission win}} + P_{\text{Decision win}} + \frac{1}{2}P_{\text{Draw}}$$
+ Optionally, to check how well the model is calibrated, we can store the predictions the model made on the validation set. Actually comparing accuracy against confidence is still WIP. 
+ The model also makes predictions for debuting fighters, but it does not take their pre-UFC career stats into account. This means that, at best, only height, reach and age are available. Keep this in mind when competing against other models and comparing accuracies. 
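The class collapse and the per-class summary statistics described above can be sketched as follows. The class order follows the list at the top of this notebook, and the helper names are hypothetical; each sampled probability stands for one prediction from one of the `n_repeats * n_folds` trained models. Percentiles use linear interpolation (`method="inclusive"`), which matches NumPy's default.

```python
import statistics

# Assumed class order, following the list at the top of this notebook.
CLASSES_7 = ["KO win", "Submission win", "Decision win", "Draw",
             "Decision loss", "Submission loss", "KO loss"]


# Hypothetical helper (not part of this project).
def collapse(p7, n_classes=3):
    """Sum 7-class probabilities into 3 (win/draw/loss) or 2 (win/loss) classes."""
    if n_classes == 7:
        return list(p7)
    win, draw, loss = sum(p7[0:3]), p7[3], sum(p7[4:7])
    if n_classes == 3:
        return [win, draw, loss]
    if n_classes == 2:
        # A draw counts as half a win and half a loss.
        return [win + 0.5 * draw, loss + 0.5 * draw]
    raise ValueError("n_classes must be 2, 3 or 7")


# Hypothetical helper (not part of this project); needs at least 2 values.
def summarize(values):
    """Mean, std, mean +/- 2 std, 5th/95th percentiles, min and max of one class probability."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    q = statistics.quantiles(values, n=20, method="inclusive")  # q[0] = 5th, q[-1] = 95th
    return {"mean": mean, "std": std, "mean-2std": mean - 2 * std, "mean+2std": mean + 2 * std,
            "p5": q[0], "p95": q[-1], "min": min(values), "max": max(values)}


if __name__ == "__main__":
    samples = [[0.20, 0.10, 0.25, 0.01, 0.24, 0.08, 0.12],
               [0.22, 0.09, 0.27, 0.01, 0.22, 0.07, 0.12],
               [0.18, 0.11, 0.24, 0.02, 0.25, 0.09, 0.11]]
    p_win = [collapse(s, 2)[0] for s in samples]
    print(summarize(p_win))
```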



In [2]:
from src.model_selection.cv import CrossValidation

# Optionally check the calibration of the model
# data_params['measure_calibration'] = True

# Optional: look at only 3 classes
data_params['save_as_n_classes'] = 3    # 2, 3 or 7

cv_params['n_repeats'] = 200

# WIP: xgb_params is not used here, but the argument still has to be passed.
CV = CrossValidation(data_params, cv_params, None)
CV.run_cv(select_features=False, predict=True)


In [1]:
# Quickstart: MLflow UI in Codespaces or locally
import os
import subprocess
import time
import socket
from pathlib import Path


def _is_port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        return True
    except Exception:
        return False
    finally:
        s.close()

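# Ensure a local tracking directory by default.
# Use a file:// URI: a bare Windows path such as c:\... is parsed as the unknown
# URI scheme "c", which MLflow's model registry rejects.
mlruns_dir = Path.cwd() / "mlruns"
os.environ.setdefault("MLFLOW_TRACKING_URI", mlruns_dir.as_uri())
mlruns_dir.mkdir(parents=True, exist_ok=True)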

PORT = int(os.environ.get("MLFLOW_PORT", "5000"))
IN_CODESPACES = os.environ.get("CODESPACES") == "true" or bool(os.environ.get("CODESPACE_NAME"))
HOST = "0.0.0.0" if IN_CODESPACES else "127.0.0.1"

# Start MLflow UI if it's not already running
if not _is_port_open("127.0.0.1", PORT):
    try:
        subprocess.Popen(
            ["mlflow", "ui", "--host", HOST, "--port", str(PORT)],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # Wait briefly for the server to come up
        for _ in range(20):
            if _is_port_open("127.0.0.1", PORT):
                break
            time.sleep(0.3)
    except FileNotFoundError:
        print("mlflow not found. Install deps with: pip install -r requirements.txt")
    except Exception as e:
        print(f"Could not start MLflow UI: {e}")

# Build a convenient URL
if IN_CODESPACES and os.environ.get("CODESPACE_NAME") and os.environ.get("GITHUB_CODESPACES_PORT_FORWARDING_DOMAIN"):
    url = (
        f"https://{PORT}-"
        f"{os.environ['CODESPACE_NAME']}."
        f"{os.environ['GITHUB_CODESPACES_PORT_FORWARDING_DOMAIN']}"
    )
else:
    url = f"http://127.0.0.1:{PORT}"

print("MLflow tracking directory:", os.environ.get("MLFLOW_TRACKING_URI"))
print("MLflow UI:", url)
if IN_CODESPACES:
    print("Tip: If the link doesn't auto-open, check the Ports panel for the forwarded URL.")

MLflow tracking directory: c:\Users\OAVAI\Desktop\mma - Copy\mlruns
MLflow UI: http://127.0.0.1:5000
