project_md = """
# Soccer Player Market Value Analysis

This document analyzes predicted vs. actual market values for top soccer players across different seasons using a linear regression model with performance-based features.

---

## Features Used in the Model

- xG Per Avg Match  
- Shots  
- OnTarget  
- Shots Per Avg Match  
- On Target Per Avg Match  
- goals  
- goals_per_shot  
- assists  
- passes_completed  
- assisted_shots  
- touches  
- height  
- games_starts  
- minutes  

> **Note:** `passes_blocked` was not included due to missing data for some players.
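
`goals_per_shot` is derived from `goals` and `Shots`, so a quick consistency check can catch data problems before training. The sketch below hard-codes the values from the season tables in this report. The `check_goals_per_shot` helper and the 0.005 tolerance are illustrative choices, not part of the original pipeline.

```python
# Values copied from the season tables in this report: (goals, Shots, goals_per_shot)
seasons = {
    'Messi 2011-2012': (73, 278, 0.263),
    'Messi 2019-2020': (25, 39, 0.13),
    'Ronaldo 2017-2018': (44, 260, 0.169),
    'Ronaldo 2019-2020': (31, 26, 0.1),
}

def check_goals_per_shot(rows, tol=0.005):
    # Hypothetical helper: list seasons whose stored goals_per_shot disagrees with goals / Shots
    flagged = []
    for name, (goals, shots, stored) in rows.items():
        recomputed = goals / shots
        if abs(recomputed - stored) > tol:
            flagged.append((name, round(recomputed, 3), stored))
    return flagged

for name, recomputed, stored in check_goals_per_shot(seasons):
    print(f'{name}: goals / Shots = {recomputed}, table says {stored}')
```

Run on these values, the check flags both 2019-2020 rows. For example, Ronaldo's 31 goals from 26 shots gives a ratio above 1. This suggests that the `Shots` and `OnTarget` figures for those seasons may not be full-season totals and are worth re-checking against the source data.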

---

## 1. Messi

### Prime Season: 2011-2012

| Feature | Value |
|---------|-------|
| xG Per Avg Match | 1.35 |
| Shots | 278 |
| OnTarget | 135 |
| Shots Per Avg Match | 4.63 |
| On Target Per Avg Match | 2.25 |
| goals | 73 |
| goals_per_shot | 0.263 |
| assists | 30 |
| passes_completed | 3700 |
| assisted_shots | 125 |
| touches | 2008 |
| height | 170 |
| games_starts | 37 |
| minutes | 3269 |

- **Actual Market Value:** 150-200 Million Euros  
- **Predicted Market Value:** 258.57 Million Euros  
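- **Observation:** This was Messi's peak season. The model overestimates his value by a wide margin (258.57 million predicted vs. 150-200 million actual), consistent with a linear model overshooting on extreme inputs such as 73 goals and 30 assists.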

### 2019-2020 Season

| Feature | Value |
|---------|-------|
| xG Per Avg Match | 0.63 |
| Shots | 39 |
| OnTarget | 19 |
| Shots Per Avg Match | 4.5 |
| On Target Per Avg Match | 2.19 |
| goals | 25 |
| goals_per_shot | 0.13 |
| assists | 21 |
| passes_completed | 1700 |
| assisted_shots | 86 |
| touches | 2614 |
| height | 170 |
| games_starts | 32 |
| minutes | 2880 |

- **Actual Market Value:** 112 Million Euros  
- **Predicted Market Value:** 106.10 Million Euros  
- **Observation:** The model slightly underestimates Messi's value (about 5% below the actual figure), possibly because it lacks context-based features such as influence in key matches.

---

## 2. Ronaldo

### Prime Season: 2017-2018

| Feature | Value |
|---------|-------|
| xG Per Avg Match | 0.7 |
| Shots | 260 |
| OnTarget | 111 |
| Shots Per Avg Match | 4.81 |
| On Target Per Avg Match | 2.06 |
| goals | 44 |
| goals_per_shot | 0.169 |
| assists | 8 |
| passes_completed | 1210 |
| assisted_shots | 53 |
| touches | 1174 |
| height | 188 |
| games_starts | 44 |
| minutes | 3670 |

- **Actual Market Value:** ~135 Million Euros  
- **Predicted Market Value:** 220.62 Million Euros  
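- **Observation:** Ronaldo's prime season shows high goal-scoring output, but the model overestimates his value by about 86 million euros (roughly 63%). As with Messi's prime season, the linear model appears to overshoot for peak seasons.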

### 2019-2020 Season

| Feature | Value |
|---------|-------|
| xG Per Avg Match | 1.27 |
| Shots | 26 |
| OnTarget | 13 |
| Shots Per Avg Match | 6.22 |
| On Target Per Avg Match | 3.11 |
| goals | 31 |
| goals_per_shot | 0.1 |
| assists | 5 |
| passes_completed | 1086 |
| assisted_shots | 50 |
| touches | 1761 |
| height | 187 |
| games_starts | 33 |
| minutes | 2917 |

- **Actual Market Value:** 60 Million Euros  
- **Predicted Market Value:** 69.69 Million Euros  
- **Observation:** The model slightly overestimates Ronaldo's market value (about 16% high). Despite his strong goal numbers, his real-world value had dropped. The likely causes are his age, which is not a model feature, along with lower assist and passing totals.

---

## 3. Analysis Observations

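- Players’ **prime seasons** generally produce higher predicted market values because of stronger statistics (goals, assists, xG, etc.). In both prime seasons analyzed here, the prediction exceeds the actual value.  
- The model may **underestimate value in shorter or injury-affected seasons** (e.g., Messi 2019-2020). Ronaldo's 2019-2020 season was overestimated instead.  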
- Adding features like **team influence, competition level, or key match performance** could improve prediction accuracy.
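
The direction and size of each miss can be tabulated directly from the numbers in this report. The following is a minimal sketch. The `signed_error` helper is hypothetical, and 175 million is an assumed midpoint for Messi's 2011-2012 range of 150-200 million.

```python
# (actual, predicted) in millions of euros, copied from the season sections above.
# Messi 2011-2012 is reported as a 150-200 range; the midpoint 175 is an assumption.
results = {
    'Messi 2011-2012': (175.0, 258.57),
    'Messi 2019-2020': (112.0, 106.10),
    'Ronaldo 2017-2018': (135.0, 220.62),
    'Ronaldo 2019-2020': (60.0, 69.69),
}

def signed_error(actual, predicted):
    # Positive means the model overestimates; negative means it underestimates
    error = predicted - actual
    return error, 100.0 * error / actual

for name, (actual, predicted) in results.items():
    error, pct = signed_error(actual, predicted)
    label = 'over' if error > 0 else 'under'
    print(f'{name}: {error:+.2f}M ({pct:+.1f}%), {label}estimate')
```

On these four seasons, only Messi 2019-2020 comes out as an underestimate (about 5% low). The other three are overestimates, with the two prime seasons off by roughly 48% and 63%.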
"""



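import os

md_file_path = "summary/soccer_market_value_analysis.md"

# Create the output folder if needed; UTF-8 keeps characters such as ’ and ² intact on every platform
os.makedirs(os.path.dirname(md_file_path), exist_ok=True)
with open(md_file_path, "w", encoding="utf-8") as f:
    f.write(project_md)

print(f"Markdown file saved to: {md_file_path}")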


project_summary_md = """
# Soccer Player Market Value Prediction – Project Summary

This project explores predicting soccer players’ market values using performance statistics from various seasons. The main goal is to understand how measurable on-field performance can explain or estimate a player’s monetary value in the transfer market.

---

## 1. Objective

- Predict the market value of top soccer players based on key performance metrics.  
- Compare predicted values to actual market values to identify patterns and discrepancies.  
- Analyze the effect of prime seasons versus less productive seasons on market valuation.

---

## 2. Data

- Player data sourced from `soccer_player.csv`.  
- Seasons analyzed include Messi 2011-2012, Messi 2019-2020, Ronaldo 2017-2018, Ronaldo 2019-2020, Lewandowski 2019-2020, and Harry Kane 2019-2020.  
- Features used in the model:
  - xG Per Avg Match  
  - Shots, OnTarget  
  - Shots Per Avg Match, On Target Per Avg Match  
  - Goals, Goals per Shot, Assists  
  - Passes Completed, Assisted Shots, Touches  
  - Height, Games Started, Minutes Played  

> **Note:** `passes_blocked` was not included due to missing data for several players.

---

## 3. Methodology

- A **linear regression model** was implemented from scratch and trained with gradient descent.  
- Each feature was standardized (z-score) using its mean and standard deviation across the dataset.  
- Predictions were compared against actual market values.
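
The methodology above, z-score normalization followed by batch gradient descent on mean squared error, can be sketched in NumPy as follows. This is an illustrative reconstruction, not the project's actual code. The function names, learning rate, iteration count, and the synthetic three-feature data are all assumptions.

```python
import numpy as np

def fit_standardizer(X):
    # Per-feature mean and standard deviation (z-score parameters)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    return mean, std

def train_linear_regression(X, y, lr=0.1, n_iters=2000):
    # Batch gradient descent on mean squared error for y ~ X @ w + b
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iters):
        residual = X @ w + b - y
        grad_w = (2.0 / n) * (X.T @ residual)
        grad_b = (2.0 / n) * residual.sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical data: three made-up features (e.g. goals, minutes, height) and a noiseless target
rng = np.random.default_rng(0)
X_raw = rng.normal(loc=[20.0, 2500.0, 180.0], scale=[10.0, 800.0, 7.0], size=(200, 3))
y = X_raw @ np.array([2.0, 0.01, -0.5]) + 30.0

mean, std = fit_standardizer(X_raw)
X = (X_raw - mean) / std
w, b = train_linear_regression(X, y)

# A new player must be scaled with the training mean/std before predicting
new_player = np.array([[25.0, 3000.0, 175.0]])
print(((new_player - mean) / std) @ w + b)
```

Standardizing first puts features such as minutes and height on comparable scales, so a single learning rate works for all weights. Reusing the training `mean` and `std` for new players keeps their predictions consistent with the fitted model.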

---

## 4. Key Observations

- **Prime seasons** result in higher predicted market values due to strong statistics (e.g., Messi 2011-2012, Ronaldo 2017-2018).  
- The model tends to **overestimate market values** when a player has exceptional goal-scoring output but weaker playmaking metrics such as assists or passes (e.g., Ronaldo 2019-2020).  
- The model tends to **underestimate market values** for shorter or injury-affected seasons (e.g., Harry Kane 2019-2020).  
- An R² of 0.438 indicates moderate explanatory power, which is reasonable given the linear model and limited features. The very high MAPE (336.31%), however, shows that percentage errors are large for at least some players.

---

## 5. Limitations & Future Work

- Only linear regression was used; other models (e.g., random forest, XGBoost, or neural networks) may capture non-linear relationships better.  
- Important contextual features like team influence, competition level, and key match performance are missing.  
- Expanding the dataset to include more seasons and players could improve model accuracy.  
- Visualization of predictions vs. actual values helps identify over- and under-estimations clearly.

---

## 6. Deliverables

- **Plots** comparing predicted vs. actual market values for top players.  
- **Markdown reports** summarizing predictions, observations, and analysis.  
- **Evaluation metrics** to quantify model performance.  

---

## 7. Model Evaluation Metrics

The following metrics were calculated on the dataset:

- **R² Score:** 0.438  
- **Mean Absolute Error (MAE):** 15,841,811.81 Euros  
- **Root Mean Squared Error (RMSE):** 21,637,107.99 Euros  
- **Mean Absolute Percentage Error (MAPE):** 336.31%  

> **Observation:** The linear model captures some trends but struggles with extreme values such as prime seasons. This helps explain why high-performing seasons like Messi 2011-2012 are predicted far above their actual market value.
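
For reference, the four metrics can be computed as follows. The arrays are small hypothetical values in millions of euros, chosen to show why MAPE can be far larger than the other metrics suggest. They are not the project's data.

```python
import numpy as np

def regression_metrics(actual, predicted):
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    error = predicted - actual
    ss_res = np.sum(error ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return {
        'r2': 1.0 - ss_res / ss_tot,
        'mae': np.mean(np.abs(error)),
        'rmse': np.sqrt(np.mean(error ** 2)),
        # Each error is divided by the actual value, so small actual values dominate
        'mape': 100.0 * np.mean(np.abs(error) / np.abs(actual)),
    }

# Hypothetical values: every prediction is off by exactly 10 million
actual = [100.0, 50.0, 2.0]
predicted = [90.0, 60.0, 12.0]
print(regression_metrics(actual, predicted))
```

Here MAE and RMSE are both 10 and R² is about 0.94, yet MAPE is about 177% because the 2-million player contributes a 500% error. A similar effect from low-valued players in the dataset could plausibly explain why the reported MAPE (336.31%) is so high relative to the R².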


This project provides a foundation for understanding how performance metrics translate into market value and serves as a stepping stone for more advanced player valuation models.
"""


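summary_file_path = "summary/project_overview.md"

# Same safeguards as above: ensure the folder exists and write UTF-8 explicitly
os.makedirs(os.path.dirname(summary_file_path), exist_ok=True)
with open(summary_file_path, "w", encoding="utf-8") as f:
    f.write(project_summary_md)

print(f"Project summary Markdown saved to: {summary_file_path}")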


Markdown file saved to: summary/soccer_market_value_analysis.md
Project summary Markdown saved to: summary/project_overview.md
