# Fighter Biography and Physical Attributes

This notebook builds a DataFrame named `fighters_detailed_stats_grouped` that combines biographical, physical, and performance data for each fighter. It contains the following columns:

1. **name**: The fighter's name.
2. **wins**: The fighter's total career wins.
3. **losses**: The fighter's total career losses.
4. **height**: The fighter's height in centimeters.
5. **weight**: The fighter's weight in kilograms.
6. **reach**: The fighter's reach (arm span) in centimeters.
7. **stance**: The fighting stance used by the fighter (e.g., Orthodox, Southpaw, Switch).
8. **age**: The fighter's age as recorded in `fighter_stats.csv`.
9. **SLpM** (Significant Strikes Landed per Minute): The average number of significant strikes the fighter lands per minute.
10. **sig_str_acc** (Significant Strike Accuracy): The share of the fighter's significant strike attempts that land, stored as a fraction (e.g., 0.46).
11. **SApM** (Significant Strikes Absorbed per Minute): The average number of significant strikes the fighter absorbs per minute.
12. **str_def** (Strike Defense): The share of opponents' significant strike attempts that the fighter avoids, stored as a fraction.
13. **td_avg** (Takedown Average): The average number of successful takedowns per 15 minutes.
14. **td_acc** (Takedown Accuracy): The share of the fighter's takedown attempts that succeed, stored as a fraction.
15. **td_def** (Takedown Defense): The share of opponents' takedown attempts that the fighter defends, stored as a fraction.
16. **sub_avg** (Submission Average): The average number of submission attempts per 15 minutes.
17. **r_str_x** (Strikes): Strikes landed by the fighter in one matched fight (the first non-null value after merging), not a career total. The `_x` suffix is added by pandas because another merged dataset also has a column with this name.
18. **r_td_x** (Takedowns): Takedowns landed by the fighter in one matched fight, taken the same way as `r_str_x`.
19. **r_sub_att** (Submission Attempts): Submission attempts by the fighter in one matched fight.
20. **event_name**: The unique names of the events in which the fighter appears, joined into a single comma-separated string.
21. **winner**: The unique values of the `winner` field across the fighter's matched fights, joined into a comma-separated string (one entry per distinct value, not one per fight).
22. **Method**: The unique methods of victory or defeat (e.g., KO, submission), joined into a comma-separated string.
23. **gender**: The fighter's gender (male or female).

This DataFrame is intended for analyzing fighters' career statistics, evaluating performance metrics, and supporting tasks such as predicting fight outcomes, characterizing fighting styles, and generating fighter profiles for reports or visualizations. It combines five source datasets into a single row per fighter.
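A lightweight way to confirm that the final DataFrame exposes exactly these 23 columns is a schema check. The sketch below is an illustration rather than part of the original notebook: `EXPECTED_COLUMNS` and `check_schema` are hypothetical names, and the function takes a plain list of column names (for example `list(fighters_detailed_stats_grouped.columns)`) so it does not depend on pandas.

```python
# Hypothetical constant: the 23 columns listed above, in order.
EXPECTED_COLUMNS = [
    "name", "wins", "losses", "height", "weight", "reach", "stance", "age",
    "SLpM", "sig_str_acc", "SApM", "str_def", "td_avg", "td_acc", "td_def",
    "sub_avg", "r_str_x", "r_td_x", "r_sub_att", "event_name", "winner",
    "Method", "gender",
]


def check_schema(columns):
    """Compare actual column names with EXPECTED_COLUMNS.

    Returns (missing, unexpected): expected names that are absent, and
    actual names that were not expected, each in their original order.
    """
    expected = set(EXPECTED_COLUMNS)
    actual = set(columns)
    missing = [c for c in EXPECTED_COLUMNS if c not in actual]
    unexpected = [c for c in columns if c not in expected]
    return missing, unexpected
```

An empty pair of lists means the DataFrame matches the documented layout; anything else points at a merge or aggregation step that added or dropped a column.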

## Preliminary Data Assessment for UFC Datasets

1. **Loading Data and Analyzing Structure**: This section loads five UFC-related datasets: `fighter_stats`, `large_dataset`, `medium_dataset`, `ufc`, and `ufc_master`. It then displays the column names of each dataset to show its structure and to identify common variables and differences.
2. **Checking for Missing Values**: The code counts missing values in each column of every dataset. This identifies data quality issues that may need to be addressed, for example by filling in missing values or removing problematic rows.
3. **Checking for Duplicates**: The code counts duplicate rows in each dataset to detect data errors that could affect later analysis and model performance.
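The three checks above repeat the same calls for each of the five datasets. A loop over a dictionary can produce one compact summary table instead. This is a sketch: `summarize_datasets` is a hypothetical helper, and the small frames in the test are made up.

```python
import pandas as pd


def summarize_datasets(datasets):
    """Return one row per dataset with its shape, missing cells, and duplicate rows."""
    rows = []
    for name, df in datasets.items():
        rows.append({
            "dataset": name,
            "rows": len(df),
            "columns": df.shape[1],
            # Total number of missing cells across all columns
            "missing_cells": int(df.isnull().sum().sum()),
            # Rows that exactly repeat an earlier row
            "duplicate_rows": int(df.duplicated().sum()),
        })
    return pd.DataFrame(rows).set_index("dataset")
```

In the notebook this would be called as `summarize_datasets({"fighter_stats": fighter_stats, "large_dataset": large_dataset, "medium_dataset": medium_dataset, "ufc": ufc, "ufc_master": ufc_master})`; the per-column missing counts are still useful for deciding how to treat individual columns.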

In [18]:
import pandas as pd

# Load the datasets
fighter_stats = pd.read_csv('data/raw/Fighter_stats/fighter_stats.csv')
large_dataset = pd.read_csv('data/raw/Large_set/large_dataset.csv')
medium_dataset = pd.read_csv('data/raw/Medium_set/medium_dataset.csv')
ufc = pd.read_csv('data/raw/ufc.csv')
ufc_master = pd.read_csv('data/raw/ufc-master.csv')

# Display the column names of each dataset to understand their structure
fighter_stats_columns = fighter_stats.columns
large_dataset_columns = large_dataset.columns
medium_dataset_columns = medium_dataset.columns
ufc_columns = ufc.columns
ufc_master_columns = ufc_master.columns

# Check for missing values in each dataset
fighter_stats_missing = fighter_stats.isnull().sum()
large_dataset_missing = large_dataset.isnull().sum()
medium_dataset_missing = medium_dataset.isnull().sum()
ufc_missing = ufc.isnull().sum()
ufc_master_missing = ufc_master.isnull().sum()

# Check for duplicates in each dataset
fighter_stats_duplicates = fighter_stats.duplicated().sum()
large_dataset_duplicates = large_dataset.duplicated().sum()
medium_dataset_duplicates = medium_dataset.duplicated().sum()
ufc_duplicates = ufc.duplicated().sum()
ufc_master_duplicates = ufc_master.duplicated().sum()

# Output the information for analysis with headings for clarity
print("=== Column Names for Each Dataset ===")
print("Fighter Stats Columns:", fighter_stats_columns)
print("Large Dataset Columns:", large_dataset_columns)
print("Medium Dataset Columns:", medium_dataset_columns)
print("UFC Columns:", ufc_columns)
print("UFC Master Columns:", ufc_master_columns)
print("\n")

print("=== Missing Values in Each Dataset ===")
print("Fighter Stats Missing Values:\n", fighter_stats_missing)
print("Large Dataset Missing Values:\n", large_dataset_missing)
print("Medium Dataset Missing Values:\n", medium_dataset_missing)
print("UFC Missing Values:\n", ufc_missing)
print("UFC Master Missing Values:\n", ufc_master_missing)
print("\n")

print("=== Duplicate Rows in Each Dataset ===")
print("Fighter Stats Duplicates:", fighter_stats_duplicates)
print("Large Dataset Duplicates:", large_dataset_duplicates)
print("Medium Dataset Duplicates:", medium_dataset_duplicates)
print("UFC Duplicates:", ufc_duplicates)
print("UFC Master Duplicates:", ufc_master_duplicates)

=== Column Names for Each Dataset ===
Fighter Stats Columns: Index(['name', 'wins', 'losses', 'height', 'weight', 'reach', 'stance', 'age',
       'SLpM', 'sig_str_acc', 'SApM', 'str_def', 'td_avg', 'td_acc', 'td_def',
       'sub_avg'],
      dtype='object')
Large Dataset Columns: Index(['event_name', 'r_fighter', 'b_fighter', 'winner', 'weight_class',
       'is_title_bout', 'gender', 'method', 'finish_round', 'total_rounds',
       'time_sec', 'referee', 'r_kd', 'r_sig_str', 'r_sig_str_att',
       'r_sig_str_acc', 'r_str', 'r_str_att', 'r_str_acc', 'r_td', 'r_td_att',
       'r_td_acc', 'r_sub_att', 'r_rev', 'r_ctrl_sec', 'r_wins_total',
       'r_losses_total', 'r_age', 'r_height', 'r_weight', 'r_reach',
       'r_stance', 'r_SLpM_total', 'r_SApM_total', 'r_sig_str_acc_total',
       'r_td_acc_total', 'r_str_def_total', 'r_td_def_total', 'r_sub_avg',
       'r_td_avg', 'b_kd', 'b_sig_str', 'b_sig_str_att', 'b_sig_str_acc',
       'b_str', 'b_str_att', 'b_str_acc', 'b_td', 'b_td_at

## Consolidating UFC Fighter Profiles and Fight Statistics

The code performs the following steps:

### 1. **Remove Duplicate Records**
   - Duplicate rows are dropped from each dataset based on key columns (for example, fighter and event), so that each combination appears only once. Doing this before merging reduces redundant rows in the merged result.

### 2. **Merge Datasets**
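   - The fighter statistics are merged in turn with `large_dataset`, `medium_dataset`, `ufc`, and `ufc_master` by matching each fighter's `name` against one fighter column per dataset (`r_fighter`, `Fighter 1`, and `RedFighter`). Left joins keep every fighter from `fighter_stats`, including those with no matching fights, and add events, victory methods, and per-fight statistics. Because only one fighter column is matched, fights in which the fighter appears in the other column (for example, the blue corner) are not captured.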

### 3. **Group and Aggregate Data**
   - The merged data is grouped by fighter name, and various columns are aggregated:
     - `event_name`, `winner`, and `Method`: Aggregated into comma-separated strings of unique values.
     - Fighter statistics (`wins`, `losses`, `height`, etc.): The first non-null value in each group is kept.

### 4. **Select Relevant Columns**
   - Only the necessary columns are kept for the final output, including fighter details, events, results, victory methods, and gender.

### 5. **Save the Resulting DataFrame**
   - The cleaned and aggregated DataFrame can be saved as a CSV file (`data/processed/fighter_profile_stats.csv`) for future use. In the cell, the save call is currently commented out.
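The dedupe, left-join, and group-by steps above can be sketched at toy scale. The rows are made up (fighters "A", "B", "C" and the event names are placeholders), and `join_unique` is a hypothetical helper. It mirrors the notebook's aggregation but casts values to `str` before joining, because `', '.join` raises a `TypeError` on non-string values.

```python
import pandas as pd


def join_unique(values):
    """Comma-separated unique non-null values, in first-seen order."""
    return ", ".join(values.dropna().astype(str).unique())


fighters = pd.DataFrame({"name": ["A", "B"], "wins": [10, 5]})
fights = pd.DataFrame({
    "r_fighter": ["A", "A", "A", "C"],
    "event_name": ["UFC 1", "UFC 2", "UFC 2", "UFC 3"],
    "method": ["KO/TKO", "Decision", "Decision", "Submission"],
})

# 1. Keep one row per (fighter, event) pair
fights = fights.drop_duplicates(subset=["r_fighter", "event_name"])

# 2. Left join: every fighter is kept, even without fights
merged = fighters.merge(fights, left_on="name", right_on="r_fighter", how="left")

# 3. One row per fighter: first stat value, joined unique events and methods
profile = merged.groupby("name").agg(
    wins=("wins", "first"),
    event_name=("event_name", join_unique),
    method=("method", join_unique),
).reset_index()
print(profile)
```

Fighter "C" appears only in the fight table, so the left join drops it, while fighter "B" has no fights and ends up with empty strings rather than missing values.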

In [23]:
# Remove duplicates before merging
fighter_stats = fighter_stats.drop_duplicates(subset='name')
large_dataset = large_dataset.drop_duplicates(subset=['r_fighter', 'event_name'])
medium_dataset = medium_dataset.drop_duplicates(subset=['r_fighter', 'event'])
ufc = ufc.drop_duplicates(subset=['Fighter 1', 'Event Name'])
ufc_master = ufc_master.drop_duplicates(subset=['RedFighter', 'Date'])

# Merge the fighter statistics with the fight data, including 'Method' from ufc.csv
fighters_detailed_stats = pd.merge(fighter_stats, large_dataset, left_on='name', right_on='r_fighter', how='left')
fighters_detailed_stats = pd.merge(fighters_detailed_stats, medium_dataset, left_on='name', right_on='r_fighter', how='left')
fighters_detailed_stats = pd.merge(fighters_detailed_stats, ufc[['Fighter 1', 'Event Name', 'Method']], left_on='name', right_on='Fighter 1', how='left')
fighters_detailed_stats = pd.merge(fighters_detailed_stats, ufc_master, left_on='name', right_on='RedFighter', how='left')

# Group and aggregate data to create lists of events, results, and victory methods for each fighter, and add gender
fighters_detailed_stats_grouped = fighters_detailed_stats.groupby('name').agg({
    'event_name': lambda x: ', '.join(x.dropna().unique()),  # List unique events separated by commas
    'winner': lambda x: ', '.join(x.dropna().unique()),      # List unique results separated by commas
    'Method': lambda x: ', '.join(x.dropna().unique()),      # List victory methods (KO, submission, etc.)
    'gender': 'first',                                       # Take the first gender value (male/female)
    'wins': 'first',                                         # Keep the original fighter's data
    'losses': 'first',
    'height': 'first',
    'weight': 'first',
    'reach': 'first',
    'stance': 'first',
    'age': 'first',
    'SLpM': 'first',
    'sig_str_acc': 'first',
    'SApM': 'first',
    'str_def': 'first',
    'td_avg': 'first',
    'td_acc': 'first',
    'td_def': 'first',
    'sub_avg': 'first',
    'r_str_x': 'first',     # Fight statistics columns
    'r_td_x': 'first',
    'r_sub_att': 'first'
}).reset_index()

# Select the relevant columns including events, victory methods, gender, and results
relevant_columns = ['name', 'wins', 'losses', 'height', 'weight', 'reach', 'stance', 'age', 
                    'SLpM', 'sig_str_acc', 'SApM', 'str_def', 'td_avg', 'td_acc', 'td_def', 'sub_avg', 
                    'r_str_x', 'r_td_x', 'r_sub_att', 'event_name', 'winner', 'Method', 'gender']

# Keep only the necessary columns
fighters_detailed_stats_grouped = fighters_detailed_stats_grouped[relevant_columns]

# Display the first few rows of the resulting DataFrame
print(fighters_detailed_stats_grouped.head())

# Save the final DataFrame to a CSV file (disabled; uncomment these lines to write it)
# fighters_detailed_stats_grouped.to_csv('data/processed/fighter_profile_stats.csv', index=False)
# print("File containing fighters, events, victory methods, gender, and results has been saved to 'data/processed/fighter_profile_stats.csv'.")


            name  wins  losses  height  weight   reach    stance   age  SLpM  \
0  AJ Cunningham  11.0     4.0  177.80   70.31  180.34  Orthodox  29.0  7.66   
1      AJ Dobson   7.0     3.0  185.42   83.91  193.04  Orthodox  32.0  4.29   
2    AJ Fletcher  10.0     3.0  177.80   77.11  170.18    Switch  27.0  3.36   
3     Aalon Cruz   8.0     4.0  182.88   65.77  198.12    Switch  34.0  7.58   
4    Aaron Brink  26.0    26.0  190.50   92.99     NaN  Orthodox  49.0  3.49   

   sig_str_acc  ...  td_acc  td_def  sub_avg  r_str_x  r_td_x  r_sub_att  \
0         0.29  ...    0.00    0.33      0.0      NaN     NaN        NaN   
1         0.46  ...    0.75    0.65      0.3     98.0     2.0        0.0   
2         0.49  ...    0.35    0.33      0.9     23.0     0.0        1.0   
3         0.39  ...    0.00    0.00      0.0      2.0     0.0        0.0   
4         0.42  ...    0.00    0.00      0.0      NaN     NaN        NaN   

                                          event_name     winne

## Data Quality Assessment for UFC Fighter Profiles


1. **Calculating Missing Values**: Counts missing values in each column.
2. **Checking for Duplicates**: Counts duplicate rows.
3. **Generating Statistical Summary**: Computes summary statistics for the numerical columns with `describe()` (stored in `stats_summary`, which is not printed).
4. **Listing Unique Values**: Shows unique values for 'gender' and 'stance'.
5. **Displaying Data Types**: Lists the data type of each column.
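The cell below writes the unique values and the duplicate count into extra rows of the per-column table, which mixes strings and counts in the `Missing Values` column. A sketch that keeps dataset-level facts separate from per-column ones, assuming only pandas; `quality_report` is a hypothetical name and the test frame is made up.

```python
import pandas as pd


def quality_report(df, categorical=("gender", "stance")):
    """Return (per_column, extras): a per-column table and dataset-level facts."""
    per_column = pd.DataFrame({
        "missing": df.isnull().sum(),
        "missing_pct": (df.isnull().mean() * 100).round(1),
        "dtype": df.dtypes.astype(str),
    })
    extras = {
        "duplicate_rows": int(df.duplicated().sum()),
        # Sorted unique non-null values for each categorical column present
        "unique_values": {
            col: sorted(df[col].dropna().astype(str).unique())
            for col in categorical
            if col in df.columns
        },
    }
    return per_column, extras
```

Keeping `missing` numeric lets it be sorted or plotted directly, and the percentage column makes datasets of different sizes comparable.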



In [20]:

# Calculate missing values
missing_values = fighters_detailed_stats_grouped.isnull().sum()

# Count the number of duplicate rows
duplicate_rows = fighters_detailed_stats_grouped.duplicated().sum()

# Get the statistical summary
stats_summary = fighters_detailed_stats_grouped.describe()

# Get unique values for certain columns
unique_genders = fighters_detailed_stats_grouped['gender'].unique()
unique_stances = fighters_detailed_stats_grouped['stance'].unique()

# Get data types of each column
data_types = fighters_detailed_stats_grouped.dtypes

# Combine the results into a DataFrame
quality_check_df = pd.DataFrame({
    'Missing Values': missing_values,
    'Data Type': data_types,
})

# Add unique value information for 'gender' and 'stance'
quality_check_df.loc['Unique Genders'] = [', '.join(map(str, unique_genders)), '']
quality_check_df.loc['Unique Stances'] = [', '.join(map(str, unique_stances)), '']

# Convert the number of duplicate rows into a DataFrame row
quality_check_df.loc['Duplicate Rows'] = [duplicate_rows, '']

# Display the new DataFrame
print(quality_check_df)

# If needed, save this quality check DataFrame to a CSV file (disabled; uncomment to write it)
# quality_check_df.to_csv('data/processed/quality_check_fighters_stats.csv', index=True)
# print("Quality check results saved to 'data/processed/quality_check_fighters_stats.csv'.")


                                                   Missing Values Data Type
name                                                            0    object
wins                                                            0   float64
losses                                                          0   float64
height                                                          0   float64
weight                                                          0   float64
reach                                                         651   float64
stance                                                         75    object
age                                                           156   float64
SLpM                                                            0   float64
sig_str_acc                                                     0   float64
SApM                                                            0   float64
str_def                                                         0   float64
td_avg      

## Filtering Inactive UFC Fighters from the Dataset


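1. **Check Fighter Activity Status**:
   - For each fighter, the code builds the URL of their UFC.com athlete page (the name in lowercase, with spaces replaced by hyphens) and checks whether the page contains a `hero-profile__tag` element with the text 'Not Fighting'.
   - If the tag is found, or if the page fails to load, the fighter is considered inactive. As a result, fighters whose pages cannot be fetched (such as those that returned HTTP 403 in the output below) are also removed, whether or not they are still active.

2. **Filter Inactive Fighters**:
   - Adds a column indicating whether the fighter is active. Inactive fighters are then removed from the DataFrame.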
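Because fetch errors are folded into "inactive", blocked or missing pages silently remove fighters. One alternative, sketched under stated assumptions, keeps three outcomes and drops only fighters confirmed as 'Not Fighting'. It uses the standard-library `html.parser` instead of BeautifulSoup so it runs without extra packages. `Status`, `classify_profile`, and `fighter_slug` are hypothetical names, and the `hero-profile__tag` class and 'Not Fighting' text are taken from the notebook's own check, not verified against current UFC.com markup.

```python
from enum import Enum
from html.parser import HTMLParser


class Status(Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"
    UNKNOWN = "unknown"  # page could not be fetched; activity not determined


class _ClassTextCollector(HTMLParser):
    """Collect the text of every <p> element whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.texts = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            classes = (dict(attrs).get("class") or "").split()
            self._inside = self.target_class in classes
            if self._inside:
                self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._inside = False

    def handle_data(self, data):
        # Text from nested inline tags (e.g. <span>) is included as well
        if self._inside:
            self.texts[-1] += data


def fighter_slug(name):
    # Same slug rule as is_fighter_active: lowercase, spaces replaced by hyphens
    return name.replace(" ", "-").lower()


def classify_profile(status_code, html_text):
    """Map an HTTP status code and page body to ACTIVE, INACTIVE, or UNKNOWN."""
    if status_code != 200:
        return Status.UNKNOWN
    collector = _ClassTextCollector("hero-profile__tag")
    collector.feed(html_text)
    collector.close()
    if any("Not Fighting" in text for text in collector.texts):
        return Status.INACTIVE
    return Status.ACTIVE
```

In the notebook, the response from `requests.get(url, timeout=10)` would be passed as `classify_profile(response.status_code, response.text)`; rows would then be kept unless the result is `Status.INACTIVE`, and `UNKNOWN` fighters could be logged for a later retry instead of being dropped.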


In [21]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Function to check if the fighter is active
def is_fighter_active(fighter_name):
    # Create the URL for the fighter's page on the UFC website
    fighter_url = f"https://www.ufc.com/athlete/{fighter_name.replace(' ', '-').lower()}"
    
    try:
        # Load the fighter's page
        response = requests.get(fighter_url, timeout=10)  # time out instead of hanging on an unresponsive server
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')

        # Check if the tag with class 'hero-profile__tag' and text 'Not Fighting' exists
        tag = soup.find('p', class_='hero-profile__tag')
        if tag and 'Not Fighting' in tag.text:
            print(f"Removing inactive fighter: {fighter_name} ({fighter_url})")
            return False  # Fighter is inactive
        return True  # Fighter is active
    except requests.RequestException as e:
        print(f"Error loading page for {fighter_name}: {e}")
        return False  # If there's an error loading the page, consider the fighter inactive

# Function to filter inactive fighters and update the existing DataFrame
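def filter_inactive_fighters(df):
    # Flag each fighter in a temporary 'active' column (assign returns a copy, so the input is not modified)
    df = df.assign(active=df['name'].apply(is_fighter_active))
    # Use boolean indexing to keep only active fighters, then drop the helper column
    df = df[df['active']].drop(columns=['active'])
    return df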

# Apply the filter to the existing DataFrame
fighters_detailed_stats_grouped = filter_inactive_fighters(fighters_detailed_stats_grouped)




Error loading page for Billy Miles: 403 Client Error: Forbidden for url: https://www.ufc.com/athlete/billy-miles
Error loading page for Brandon Melendez: 403 Client Error: Forbidden for url: https://www.ufc.com/athlete/brandon-melendez
Error loading page for Daniel Sarafian: 403 Client Error: Forbidden for url: https://www.ufc.com/athlete/daniel-sarafian
Error loading page for David Roberts: 403 Client Error: Forbidden for url: https://www.ufc.com/athlete/david-roberts
Error loading page for Jason Godsey: 403 Client Error: Forbidden for url: https://www.ufc.com/athlete/jason-godsey
Error loading page for Lance Gibson: 403 Client Error: Forbidden for url: https://www.ufc.com/athlete/lance-gibson
Error loading page for Steven Graham: 403 Client Error: Forbidden for url: https://www.ufc.com/athlete/steven-graham
Error loading page for Zelim Imadaev: 403 Client Error: Forbidden for url: https://www.ufc.com/athlete/zelim-imadaev
The filtered DataFrame has been updated and saved to 'data/pro

## Post-Filtering Quality Check for UFC Fighter Dataset


1. **Calculate Missing Values**: Counts missing values in each column.
2. **Check for Duplicates**: Counts duplicate rows.
3. **Generate Statistical Summary**: Computes a summary of the numerical columns with `describe()` (stored in `stats_summary`, which is not printed).
4. **List Unique Values**: Shows unique values for 'gender' and 'stance'.
5. **Display Data Types**: Lists each column's data type.
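Because filtering removes rows, raw missing-value counts can drop simply because there are fewer fighters (for example, `reach` goes from 651 to 585 missing between the two checks). Comparing percentages alongside counts makes a before/after check easier to interpret. A minimal sketch with a hypothetical `compare_missing` helper and toy data:

```python
import pandas as pd


def compare_missing(before, after):
    """Missing-value counts and percentages for two versions of a DataFrame."""
    table = pd.DataFrame({
        "before": before.isnull().sum(),
        "after": after.isnull().sum(),
        "before_pct": (before.isnull().mean() * 100).round(1),
        "after_pct": (after.isnull().mean() * 100).round(1),
    })
    # Negative values mean fewer missing cells after the change
    table["change"] = table["after"] - table["before"]
    return table
```

Calling it with a copy of `fighters_detailed_stats_grouped` taken before `filter_inactive_fighters` and the filtered result shows whether the share of missing `reach`, `stance`, and `age` values actually changed.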


In [22]:
import pandas as pd

# Assuming your DataFrame is named 'fighters_detailed_stats_grouped'

# Calculate missing values
missing_values = fighters_detailed_stats_grouped.isnull().sum()

# Count the number of duplicate rows
duplicate_rows = fighters_detailed_stats_grouped.duplicated().sum()

# Get the statistical summary
stats_summary = fighters_detailed_stats_grouped.describe()

# Get unique values for certain columns
unique_genders = fighters_detailed_stats_grouped['gender'].unique()
unique_stances = fighters_detailed_stats_grouped['stance'].unique()

# Get data types of each column
data_types = fighters_detailed_stats_grouped.dtypes

# Combine the results into a DataFrame
quality_check_df = pd.DataFrame({
    'Missing Values': missing_values,
    'Data Type': data_types,
})

# Add unique value information for 'gender' and 'stance'
quality_check_df.loc['Unique Genders'] = [', '.join(map(str, unique_genders)), '']
quality_check_df.loc['Unique Stances'] = [', '.join(map(str, unique_stances)), '']

# Convert the number of duplicate rows into a DataFrame row
quality_check_df.loc['Duplicate Rows'] = [duplicate_rows, '']

# Display the new DataFrame
print(quality_check_df)

                                                   Missing Values Data Type
name                                                            0    object
wins                                                            0   float64
losses                                                          0   float64
height                                                          0   float64
weight                                                          0   float64
reach                                                         585   float64
stance                                                         55    object
age                                                           102   float64
SLpM                                                            0   float64
sig_str_acc                                                     0   float64
SApM                                                            0   float64
str_def                                                         0   float64
td_avg      

## Filling Missing Values in UFC Fighter Dataset Using Median


1. **Fill Missing Values**:
   - For numeric columns, missing values are replaced with the median of the column.
   - Non-numeric columns remain unchanged.

2. **Check for Remaining Missing Values**:
   - The code then checks if any missing values remain after the filling process.
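The same median fill can be written with `select_dtypes`, which makes the numeric/non-numeric split explicit and works on a copy. This is a sketch with a hypothetical `fill_numeric_with_median` helper; unlike the notebook's `'biufc'` check, `include="number"` skips boolean columns. Categorical columns such as `stance` and `gender` stay missing, as in the notebook's output.

```python
import numpy as np
import pandas as pd


def fill_numeric_with_median(df):
    """Return a copy in which NaNs in numeric columns are replaced by that column's median."""
    out = df.copy()
    numeric_cols = out.select_dtypes(include="number").columns
    # median() skips NaN by default; fillna aligns the medians by column name
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    return out


df = pd.DataFrame({
    "reach": [170.0, np.nan, 190.0, 180.0],
    "stance": ["Orthodox", None, "Southpaw", "Orthodox"],
})
filled = fill_numeric_with_median(df)
print(filled)
```

Here the missing `reach` becomes 180.0, the median of 170, 180, and 190, while the missing `stance` is left untouched.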


In [6]:
import pandas as pd

# Fill missing values with the median for numeric columns only
fighters_detailed_stats_grouped = fighters_detailed_stats_grouped.apply(
    lambda x: x.fillna(x.median()) if x.dtype.kind in 'biufc' else x
)

# Check if there are still any missing values
missing_values = fighters_detailed_stats_grouped.isnull().sum()
print("Missing values after filling with median:\n", missing_values)


Missing values after filling with median:
 name             0
wins             0
losses           0
height           0
weight           0
reach            0
stance          75
age              0
SLpM             0
sig_str_acc      0
SApM             0
str_def          0
td_avg           0
td_acc           0
td_def           0
sub_avg          0
r_str_x          0
r_td_x           0
r_sub_att        0
event_name       0
winner           0
Method           0
gender         676
dtype: int64


In [None]:
# Save the updated DataFrame to a CSV file
fighters_detailed_stats_grouped.to_csv('data/processed/filtered_fighters.csv', index=False)
print("The filtered DataFrame has been updated and saved to 'data/processed/filtered_fighters.csv'")

## Preliminary Work and Experimental Ideas for UFC Data Processing

This section contains preliminary work and experiments on processing UFC fighter data: saving intermediate results, attempts to fill missing values by querying the OpenAI API, and parsing fighter profiles from saved UFC.com HTML pages. These are trial ideas that have not been incorporated into the final pipeline or analysis.
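The OpenAI experiment below writes `pd.NA` into `r_str_x` whenever a lookup fails, which overwrites whatever value was already in the cell. When filling gaps from any external source, a safer pattern updates only the cells that are currently missing. This sketch uses pandas' `DataFrame.update` with `overwrite=False`; `fill_gaps_from` is a hypothetical helper and the `external` frame stands in for whatever source supplies the values.

```python
import numpy as np
import pandas as pd


def fill_gaps_from(df, external, key="name"):
    """Fill only the missing cells of df with values from external, matched on key."""
    result = df.set_index(key)  # set_index returns a new frame, so df is not modified
    # overwrite=False: existing non-null values are kept; NaNs in `external` never
    # replace anything; rows and columns that exist only in `external` are ignored.
    result.update(external.set_index(key), overwrite=False)
    return result.reset_index()


df = pd.DataFrame({"name": ["A", "B"], "r_str_x": [98.0, np.nan], "age": [30.0, np.nan]})
external = pd.DataFrame({
    "name": ["A", "B", "C"],
    "r_str_x": [1.0, 23.0, 5.0],
    "age": [99.0, np.nan, 40.0],
})
res = fill_gaps_from(df, external)
print(res)
```

Fighter "A" keeps its original `r_str_x` of 98.0, "B" gains 23.0, and "C" is not added because it was not in the original frame.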

In [24]:
# Save the updated DataFrame to a new CSV file
fighters_detailed_stats_grouped.to_csv('data/processed/fighters_detailed_stats_filled_new.csv', index=False)

print("Updated file with filled values saved as 'data/processed/fighters_detailed_stats_filled_new.csv'.")


Updated file with filled values saved as 'data/processed/fighters_detailed_stats_filled_new.csv'.


In [1]:
pip install openai


Collecting openai
  Downloading openai-1.52.2-py3-none-any.whl.metadata (24 kB)
Collecting anyio<5,>=3.5.0 (from openai)
  Using cached anyio-4.6.2.post1-py3-none-any.whl.metadata (4.7 kB)
Collecting distro<2,>=1.7.0 (from openai)
  Downloading distro-1.9.0-py3-none-any.whl.metadata (6.8 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Using cached httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)
Collecting jiter<1,>=0.4.0 (from openai)
  Downloading jiter-0.6.1-cp311-cp311-macosx_11_0_arm64.whl.metadata (5.2 kB)
Collecting pydantic<3,>=1.9.0 (from openai)
  Using cached pydantic-2.9.2-py3-none-any.whl.metadata (149 kB)
Collecting sniffio (from openai)
  Using cached sniffio-1.3.1-py3-none-any.whl.metadata (3.9 kB)
Collecting tqdm>4 (from openai)
  Using cached tqdm-4.66.5-py3-none-any.whl.metadata (57 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai)
  Using cached httpcore-1.0.6-py3-none-any.whl.metadata (21 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.

In [119]:
import pandas as pd
import json
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable instead of hard-coding it

# Function to call ChatGPT
def get_chatgpt_response(user_text):
    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "user", "content": user_text}
            ],
        )
        response_text = response.choices[0].message.content
        return response_text
    except Exception as e:
        return f"Error calling OpenAI: {str(e)}"

# Function to get the r_str_x data for a fighter
def get_fighter_r_str_x(fighter_info):
    user_text = (
        f"I need information about the fighter {fighter_info['name']}.\n"
        "Specifically, provide the value for Recorded Strikes (r_str_x) as a float "
        "in their UFC statistics. Return the response in JSON format, like this: "
        "{\"r_str_x\": value}."
    )
    
    chatgpt_response = get_chatgpt_response(user_text)

    # Try parsing the response as JSON and extract r_str_x
    try:
        response_json = json.loads(chatgpt_response)
        r_str_x_value = response_json.get("r_str_x")
        if isinstance(r_str_x_value, (int, float)):
            return r_str_x_value
        else:
            print(f"Invalid r_str_x format for {fighter_info['name']}: {chatgpt_response}")
            return None
    except (json.JSONDecodeError, TypeError):
        print(f"Could not get r_str_x value for {fighter_info['name']}: {chatgpt_response}")
        return None

# Example usage with an existing DataFrame `fighters_detailed_stats_grouped`
for i, row in fighters_detailed_stats_grouped.head(5).iterrows():  # Taking the first 5 fighters as an example
    fighter_info = {
        'name': row['name']
    }

    # Get r_str_x value for the current fighter
    r_str_x_value = get_fighter_r_str_x(fighter_info)
    if r_str_x_value is not None:
        # Update the r_str_x column for the DataFrame
        fighters_detailed_stats_grouped.at[i, 'r_str_x'] = r_str_x_value
        print(f"Successfully updated r_str_x for {fighter_info['name']}")
    else:
        # Set to missing (pd.NA) if the value couldn't be retrieved; note that this overwrites any existing value
        fighters_detailed_stats_grouped.at[i, 'r_str_x'] = pd.NA
        print(f"Error updating r_str_x for {fighter_info['name']}")

# Display the updated DataFrame
print(fighters_detailed_stats_grouped[['name', 'r_str_x']].head(5))


Could not get r_str_x value for AJ Cunningham: I'm sorry, but I don't have real-time access to databases or the internet to provide the current statistics of AJ Cunningham or any UFC fighter. For up-to-date information, please check the official UFC website or other reliable sports statistics sources.
Error updating r_str_x for AJ Cunningham
Could not get r_str_x value for AJ Dobson: As of my last update, I can't provide real-time data or the latest sports statistics. For the most accurate and up-to-date statistics for AJ Dobson, such as "Recorded Strikes (r_str_x)," you should check the official UFC statistics or another reputable sports statistics database. They often provide detailed fighter statistics. Once you locate the correct value, you can format it in JSON like this:

```json
{"r_str_x": value}
```

Simply replace `value` with the actual figure you find.
Error updating r_str_x for AJ Dobson
Could not get r_str_x value for AJ Fletcher: 

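The replies above show why `json.loads` fails here: the model answers in prose, and even its suggested template (`{"r_str_x": value}`) is not valid JSON. A more tolerant parser can still pick up a real JSON object when one is embedded in text or in a fenced `json` code block. This is a hypothetical helper (`extract_json_object`), not part of the OpenAI SDK, and it cannot make the model return data it does not have.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built here to keep this block's own fence intact
_FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json_object(text):
    """Return the first JSON object (dict) found in a model reply, or None."""
    if not isinstance(text, str):
        return None
    # 1. The whole reply, then the contents of any fenced code blocks
    for candidate in [text, *_FENCE_RE.findall(text)]:
        try:
            obj = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    # 2. Fall back to the first '{' that starts a valid JSON object
    decoder = json.JSONDecoder()
    for match in re.finditer(r"\{", text):
        try:
            obj, _ = decoder.raw_decode(text, match.start())
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    return None
```

In `get_fighter_r_str_x`, `json.loads(chatgpt_response)` could be replaced by `extract_json_object(chatgpt_response)`, treating a `None` result as a failed lookup.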
In [122]:
import pandas as pd
import json
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable instead of hard-coding it

# Create a new DataFrame with only the 'name' column and new columns for additional information
new_columns = ['name', 'wins', 'losses', 'height', 'weight', 'reach', 'stance', 'age', 'r_str_x', 'r_td_x', 'r_sub_att']
fighters_data = pd.DataFrame(columns=new_columns)

# Fill the 'name' column with the first 5 fighters' names from the original DataFrame
fighters_data['name'] = fighters_detailed_stats_grouped.head(5)['name']

# Function to call ChatGPT
def get_chatgpt_response(user_text):
    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "user", "content": user_text}
            ],
        )
        response_text = response.choices[0].message.content
        return response_text
    except Exception as e:
        return f"Error calling OpenAI: {str(e)}"

# Function to fill the DataFrame with ChatGPT's responses
def fill_fighter_data_with_chatgpt(df, index, fighter_name):
    # Request information about the fighter in JSON format
    user_text = (
        f"I need detailed information about the fighter {fighter_name} in JSON format:\n"
        "{\"wins\": value, \"losses\": value, \"height\": \"value\", \"weight\": \"value\", "
        "\"reach\": \"value\", \"stance\": \"value\", \"age\": value, "
        "\"r_str_x\": value, \"r_td_x\": value, \"r_sub_att\": value}"
    )
    
    chatgpt_response = get_chatgpt_response(user_text)
    
    # Try parsing the response as JSON and filling the DataFrame
    try:
        fighter_data = json.loads(chatgpt_response)
        for key in fighter_data:
            if key in df.columns:
                df.at[index, key] = fighter_data[key]
        print(f"Successfully updated data for {fighter_name}")
    except (json.JSONDecodeError, TypeError):
        print(f"Failed to parse or update data for {fighter_name}: {chatgpt_response}")

# Fill the new DataFrame with data from ChatGPT
for i, row in fighters_data.iterrows():
    fill_fighter_data_with_chatgpt(fighters_data, i, row['name'])

# Display the updated DataFrame
print(fighters_data)


Failed to parse or update data for AJ Cunningham: As of my last update in October 2023, I don't have specific detailed statistics or current fight-related metrics for individual fighters like AJ Cunningham beyond what might be available through broader sports or MMA-specific databases and resources. For the most current and detailed statistics on fighters like AJ Cunningham, check the official UFC website, MMA record keeper resources like Sherdog or Tapology, or look for athlete profiles and announcements from the fighter's management team.

However, I can help you create a JSON template based on typical fighter statistics categories, which you can fill in with AJ Cunningham's current data as you find it:

```json
{
  "wins": null,
  "losses": null,
  "height": "null",
  "weight": "null",
  "reach": "null",
  "stance": "null",
  "age": null,
  "r_str_x": null,
  "r_td_x": null,
  "r_sub_att": null
}
```

Replace `null` with specific values as you acquire the information. Here is a brie

In [8]:
import pandas as pd
from bs4 import BeautifulSoup

def parse_fighter_info_from_html(file_path):
    # Open and read the HTML file
    with open(file_path, 'r', encoding='utf-8') as file:
        content = file.read()

    soup = BeautifulSoup(content, 'html.parser')
    
    # Extract the profile fields (find() returns only the first match, so each file yields one fighter)
    name = soup.find('h1', class_='hero-profile__name').text.strip() if soup.find('h1', class_='hero-profile__name') else 'Not specified'
    nickname = soup.find('p', class_='hero-profile__nickname').text.strip() if soup.find('p', class_='hero-profile__nickname') else 'N/A'
    record_text = soup.find('p', class_='hero-profile__division-body').text.strip() if soup.find('p', class_='hero-profile__division-body') else 'No data'
    weight_class = soup.find('p', class_='hero-profile__division-title').text.strip() if soup.find('p', class_='hero-profile__division-title') else 'No data'
    
    # Split the record into wins, losses, and draws
    wins, losses, draws = None, None, None
    if record_text and '-' in record_text:
        record_parts = record_text.split('-')
        if len(record_parts) == 3:  # Expected format "W-L-D"
            try:
                wins, losses, draws = map(int, record_parts)
            except ValueError:
                print(f"Could not parse record: {record_text}")
        elif len(record_parts) == 2:  # Possibly "W-L" format, without draws
            try:
                wins, losses = map(int, record_parts)
                draws = 0  # Set draws to 0 if the information is missing
            except ValueError:
                print(f"Could not parse record: {record_text}")
    
    # Return the result as a dictionary
    return {
        'Name': name,
        'Nickname': nickname,
        'Weight Class': weight_class,
        'Wins': wins,
        'Losses': losses,
        'Draws': draws,
    }

# List of paths to HTML files
html_files = [
    'fighters_stats/all_fighters_stats.html',
    # Add more file paths here
]

# Collect the data into a DataFrame
fighters_data = []

for file_path in html_files:
    fighter_info = parse_fighter_info_from_html(file_path)
    if fighter_info:
        fighters_data.append(fighter_info)

fighters_df = pd.DataFrame(fighters_data)

# Show the result
print(fighters_df)


           Name     Nickname           Weight Class  Wins Losses Draws
0  Bryan Battle  "Pooh Bear"  Welterweight Division  None   None  None
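In the output above, `Wins`, `Losses`, and `Draws` are all `None`. One plausible cause, assuming the record text on the page looks like `17-4-0 (W-L-D)` (an assumption about the page markup that the notebook does not confirm), is that splitting on every hyphen yields five parts, so neither the three-part nor the two-part branch runs. A regex that captures only the leading numbers avoids this; `parse_record` is a hypothetical helper.

```python
import re

# Leading "W-L" or "W-L-D" numbers; trailing text such as "(W-L-D)" is ignored
_RECORD_RE = re.compile(r"(\d+)-(\d+)(?:-(\d+))?")


def parse_record(record_text):
    """Return (wins, losses, draws) as ints, or (None, None, None) if no record is found."""
    match = _RECORD_RE.search(record_text or "")
    if match is None:
        return None, None, None
    wins, losses, draws = match.groups()
    # Treat a missing draws field as 0, matching the notebook's "W-L" branch
    return int(wins), int(losses), (int(draws) if draws is not None else 0)
```

Inside `parse_fighter_info_from_html`, the whole split-and-`map` block could be replaced by `wins, losses, draws = parse_record(record_text)`; the 'No data' default then yields `(None, None, None)` as before.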
