# 📘 AUTOFIX
This notebook provides a **premium-level consulting explanation** of your `autofix.py` preprocessing step.
It includes:
- Business & data value explanation
- Technical breakdown with reasoning
- Full line-by-line code appendix
- Executable code cells


## 📑 Table of Contents

- [Part A: High-Level Summary](#part-a)
- [Part B: Technical Audit](#part-b)
- [Code Appendix & Execution](#appendix)
- [Run: Test the Function](#run-test)


---

# 🧠 Part A: High-Level Summary (Consulting View)
<a id="part-a"></a>
### What this preprocessing step achieves
- Converts a raw CSV into a standardized, analysis-ready dataset.
- Fixes messy column names (stray whitespace, spaces, hyphens) and rewrites the file with a consistent `;` separator and UTF-8 encoding.
- Ensures downstream scripts (cleaning, loading, ML) receive a **stable, predictable structure**.

### Why it matters for a client
- Eliminates the typical *dirty CSV headache* before EDA/ML.
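- Improves compatibility with automated pipelines by giving them predictable column names.
- Reduces failure risk in production: unreadable input is reported instead of raising, and the output is always written as UTF-8 with `;` separators.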


---

# 🛠️ Part B: Technical Audit (How It Works)
<a id="part-b"></a>
Below is the full script, explained in structured sections.


## 1. Imports
```python
import pandas as pd
```
**Why?** Pandas is required to load, clean, and export structured CSV data.


## 2. File paths
```python
INPUT_CSV = "data/raw/animal_data_dirty.csv"
OUTPUT_CSV = INPUT_CSV.replace(".csv", "_reworked.csv")
```
**Why?** Deriving the output name from the input name avoids a second hardcoded path and, provided the input path contains `.csv`, keeps the original raw dataset from being overwritten.
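
`str.replace` only changes the name when the literal `.csv` appears in the path; for an input such as `animals.CSV` the result is the unchanged input path. A minimal sketch of a path-based alternative, assuming a hypothetical helper `reworked_path` that is not part of `autofix.py`:

```python
from pathlib import Path


def reworked_path(path_in: str) -> Path:
    """Return path_in with '_reworked' inserted before the file suffix."""
    p = Path(path_in)
    # with_name swaps only the final component, so directories are untouched
    return p.with_name(f"{p.stem}_reworked{p.suffix}")


print(reworked_path("data/raw/animal_data_dirty.csv").as_posix())
# data/raw/animal_data_dirty_reworked.csv
print(reworked_path("data/raw/animals.CSV").name)
# animals_reworked.CSV
```

Because the suffix is taken from the path itself, the output name always differs from the input name, whatever the extension's case.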


## 3. CSV loading with safe error handling
```python
# Excerpt from the body of clean_csv(path_in, path_out)
try:
    df: pd.DataFrame = pd.read_csv(path_in, sep=";")
except Exception as e:
    print(f"❌ Read error: {e}")
    return
```
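**Purpose**: Report read failures (missing file, unparseable content) without raising, so a bad input does not crash the pipeline. On failure the function returns `None` and writes no output. A file that uses a different separator does not raise here; it typically loads as a single column.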
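
A narrower variant of this guard, sketched under the assumption that only a missing file, an empty file, or a parse failure should be tolerated. The helper name `load_csv` is hypothetical; the exception classes are pandas' own `pandas.errors.EmptyDataError` and `pandas.errors.ParserError`.

```python
import io
from typing import Optional

import pandas as pd


def load_csv(source, sep: str = ";") -> Optional[pd.DataFrame]:
    """Return the parsed DataFrame, or None if the source cannot be read."""
    try:
        return pd.read_csv(source, sep=sep)
    except FileNotFoundError as e:
        print(f"❌ File not found: {e}")
    except pd.errors.EmptyDataError as e:
        print(f"❌ Empty file: {e}")
    except pd.errors.ParserError as e:
        print(f"❌ Malformed CSV: {e}")
    return None


# A missing file and an empty buffer both yield None instead of raising
print(load_csv("does/not/exist.csv"))
print(load_csv(io.StringIO("")))

# A comma-separated input read with sep=";" does not raise:
# the whole header becomes one column name
df = load_csv(io.StringIO("name,age\nrex,3\n"))
print(df.shape)  # (1, 1)
```

Callers need to check for `None` before using the result. Any other exception propagates, which makes unexpected failures visible instead of silently skipping the file.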


## 4. Column normalization
```python
df.columns = (
    df.columns.str.strip()
             .str.replace(" ", "_", regex=False)
             .str.replace("-", "_", regex=False)
)
```
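**Why?** Consistent column names make merges, ML training, and downstream code more stable. The chain trims surrounding whitespace and replaces spaces and hyphens with underscores; it does not change letter case.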
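
To make the effect concrete, here is the same chain applied to a few invented headers (the column names are illustrative, not taken from the real dataset). Each space maps to its own underscore, so a double space becomes a double underscore.

```python
import pandas as pd

# Invented headers: padded, spaced, hyphenated, and double-spaced
df = pd.DataFrame(columns=["  Animal Name ", "birth-date", "Weight  kg"])

df.columns = (
    df.columns.str.strip()
             .str.replace(" ", "_", regex=False)
             .str.replace("-", "_", regex=False)
)

print(list(df.columns))
# ['Animal_Name', 'birth_date', 'Weight__kg']
```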


## 5. Saving cleaned file
```python
df.to_csv(path_out, sep=";", index=False, encoding="utf-8")
```
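Writing UTF-8 with `sep=";"` keeps the semicolon-delimited layout common in European locales, where the comma serves as the decimal separator. `index=False` omits the DataFrame index from the output.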
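
A quick way to check these settings is a round trip: write a small frame containing an accented value, then read it back with the same separator and encoding. This sketch uses a temporary directory and invented sample rows.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Invented sample rows, including a non-ASCII character
df = pd.DataFrame({"name": ["Zoé", "Rex"], "weight_kg": [4.5, 30.0]})

with tempfile.TemporaryDirectory() as tmp:
    path_out = Path(tmp) / "animals_reworked.csv"
    df.to_csv(path_out, sep=";", index=False, encoding="utf-8")

    # Inspect the raw text, then parse it back with matching settings
    raw_text = path_out.read_text(encoding="utf-8")
    back = pd.read_csv(path_out, sep=";", encoding="utf-8")

print(raw_text.splitlines()[0])  # name;weight_kg
print(back["name"].tolist())     # ['Zoé', 'Rex']
```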


---

## 💻 Code Appendix & Execution
<a id="appendix"></a>
Here is your complete script, ready for study or debugging.


```python
# Full autofix.py code
import pandas as pd

INPUT_CSV = "data/raw/animal_data_dirty.csv"
OUTPUT_CSV = INPUT_CSV.replace(".csv", "_reworked.csv")

def clean_csv(path_in, path_out):
    print(f"📂 Reading: {path_in}")

    # Report read failures and stop without writing any output
    try:
        df: pd.DataFrame = pd.read_csv(path_in, sep=";")
    except Exception as e:
        print(f"❌ Read error: {e}")
        return

    print("✨ CSV loaded with separator ';'")

    # Trim whitespace, then replace spaces and hyphens with underscores
    df.columns = (
        df.columns.str.strip()
                 .str.replace(" ", "_", regex=False)
                 .str.replace("-", "_", regex=False)
    )
    print("✨ Column names cleaned")

    df.to_csv(path_out, sep=";", index=False, encoding="utf-8")
    print(f"💾 Clean CSV saved → {path_out}")

if __name__ == "__main__":
    clean_csv(INPUT_CSV, OUTPUT_CSV)
```


---

# ▶️ RUN: Test the function here
<a id="run-test"></a>
Place your raw CSV into `data/raw/` and run the cell below.


```python
from autofix import clean_csv, INPUT_CSV, OUTPUT_CSV
clean_csv(INPUT_CSV, OUTPUT_CSV)
```