# Process Data with Module

### Instructions
Click 'Run all', then proceed to `03_retrieve_odds.ipynb`.
The first code cell prints the dataframes in case you want to check that everything looks correct.


This notebook shows how to process the raw data with the `ProcessData` module after running `01_scrape_and_save_data.ipynb`.
The logic from `02_process_data.ipynb` is now performed by the `ProcessData` module to simplify a frequently repeated task.

`ProcessData()` initializes an object holding cleaned dataframes with calculated `gpm_scored` (goals per match scored) and `gpm_conceded` (goals per match conceded) columns.
The home and away dataframes are stored on the `ProcessData` object and can be accessed as needed with `<obj_name>.home_df` and `<obj_name>.away_df`.
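The goals-per-match columns are simple ratios of goals to matches played. Below is a minimal pandas sketch of that calculation. It assumes hypothetical raw columns `matches`, `goals_scored` and `goals_conceded` (the real Understat column names used inside `ProcessData` may differ), and `add_gpm_columns` is an illustrative helper, not part of the module.

```python
import pandas as pd


def add_gpm_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with goals-per-match columns added.

    Assumes hypothetical columns 'matches', 'goals_scored' and
    'goals_conceded'; adjust these to the real raw-table headers.
    """
    out = df.copy()
    # Goals scored per match played
    out["gpm_scored"] = out["goals_scored"] / out["matches"]
    # Goals conceded per match played
    out["gpm_conceded"] = out["goals_conceded"] / out["matches"]
    return out


# Illustrative numbers only, not real league data
home = pd.DataFrame(
    {
        "team": ["Team A", "Team B"],
        "matches": [10, 8],
        "goals_scored": [25, 12],
        "goals_conceded": [10, 16],
    }
)
print(add_gpm_columns(home)[["team", "gpm_scored", "gpm_conceded"]])
```

A team with zero matches would yield `inf` or `NaN` in this sketch; how `ProcessData` handles that case is not documented here.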

### Note on file structure:
Currently, `ProcessData` requires the following directory structure and file names:
```bash
├── README.md
├── data
│   ├── processed
│   │   ├── away_table.csv
│   │   ├── home_table.csv
│   │   └── odds
│   └── raw
│       ├── away_table_raw.csv
│       └── home_table_raw.csv
├── notebooks
├── outputs
├── requirements.txt
├── src
```
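Because a missing raw file is the most likely reason for `ProcessData()` to fail, it can help to check the layout first. This is a sketch using `pathlib`, assuming the default raw paths resolve to `data/raw/` under the project root as the tree suggests; `missing_raw_files` is a hypothetical helper, not part of the module.

```python
from __future__ import annotations

from pathlib import Path

# Raw input files expected by the default layout shown in the tree above
REQUIRED_RAW_FILES = (
    Path("data/raw/home_table_raw.csv"),
    Path("data/raw/away_table_raw.csv"),
)


def missing_raw_files(project_root: str | Path) -> list[Path]:
    """Return the expected raw CSVs that do not exist under project_root."""
    root = Path(project_root)
    return [root / rel for rel in REQUIRED_RAW_FILES if not (root / rel).is_file()]


# From the notebooks/ directory, the project root is one level up
missing = missing_raw_files("..")
if missing:
    print("Run 01_scrape_and_save_data.ipynb first; missing:", *missing, sep="\n  ")
else:
    print("Raw data found.")
```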
If alternate file paths are needed, they can be passed to `ProcessData` as the `home_table_raw_path` and `away_table_raw_path` parameters, e.g.:
```python
data = ProcessData(
    home_table_raw_path='path/to/raw_home_table',
    away_table_raw_path='path/to/raw_away_table'
)
```
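When the raw files live elsewhere, building absolute paths avoids surprises from the notebook's working directory. A minimal sketch: only the keyword names `home_table_raw_path` and `away_table_raw_path` come from `ProcessData`; the helper `raw_path_kwargs` and its default file names are illustrative assumptions, and the paths are passed as strings in case the module does not accept `pathlib.Path` objects.

```python
from pathlib import Path


def raw_path_kwargs(raw_dir, home_name="home_table_raw.csv", away_name="away_table_raw.csv"):
    """Build keyword arguments for ProcessData from a directory of raw CSVs.

    Hypothetical helper; file names default to those in the standard layout.
    """
    raw_dir = Path(raw_dir).expanduser().resolve()
    # Convert to str in case ProcessData expects plain string paths
    return {
        "home_table_raw_path": str(raw_dir / home_name),
        "away_table_raw_path": str(raw_dir / away_name),
    }


print(raw_path_kwargs("~/football/raw"))
```

In the notebook this would be used as `data = ProcessData(**raw_path_kwargs('~/football/raw'))`.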
Likewise, the output directory of `save_to_csv` can be customized with the `directory_path` parameter, e.g.:
```python
data.save_to_csv(directory_path='custom/dir/path')
```
Note that the CSV files are always saved as `away_table.csv` and `home_table.csv`.
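The behaviour described above (a configurable directory but fixed file names) can be illustrated with plain pandas. This is a sketch of that behaviour, not the module's implementation: `save_tables` is hypothetical, and the default directory and `index=False` are assumptions.

```python
from pathlib import Path

import pandas as pd


def save_tables(home_df: pd.DataFrame, away_df: pd.DataFrame, directory_path="data/processed"):
    """Write both tables under fixed file names inside directory_path.

    Illustrative stand-in for save_to_csv; defaults are assumptions.
    """
    out_dir = Path(directory_path)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    home_path = out_dir / "home_table.csv"  # file names never change,
    away_path = out_dir / "away_table.csv"  # only the directory does
    home_df.to_csv(home_path, index=False)
    away_df.to_csv(away_path, index=False)
    return home_path, away_path
```

For example, `save_tables(data.home_df, data.away_df, 'custom/dir/path')` would write both files into `custom/dir/path`; in the notebook itself, `data.save_to_csv()` remains the method to use.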

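```python
import sys

# Make the project's src/ directory importable (path is relative to notebooks/)
sys.path.append('../src')

from understat_api.process_data import ProcessData

# Load the raw tables and build the cleaned home and away dataframes
data = ProcessData()

# Print both dataframes to check that everything looks correct
print("\n <~~~ Home Data ~~~>\n", data.home_df)

print("\n\n <~~~ Away Data ~~~>\n", data.away_df)
```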
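Printing works for a quick visual check; a programmatic check can catch problems the eye misses. A sketch, assuming that "looks correct" means the goals-per-match columns exist and hold finite, non-negative numbers; `check_processed` is a hypothetical helper, and these checks are not guarantees documented by the module.

```python
import numpy as np
import pandas as pd

GPM_COLUMNS = ("gpm_scored", "gpm_conceded")


def check_processed(df: pd.DataFrame, name: str) -> None:
    """Raise ValueError if df lacks usable goals-per-match columns."""
    missing = [c for c in GPM_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"{name}: missing columns {missing}")
    arr = df[list(GPM_COLUMNS)].to_numpy(dtype=float)
    # isfinite is False for both NaN and +/-inf
    if not np.isfinite(arr).all():
        raise ValueError(f"{name}: NaN or infinite goals-per-match values")
    if (arr < 0).any():
        raise ValueError(f"{name}: negative goals-per-match values")
```

After the cell above, this could be run as `for name, df in (("home", data.home_df), ("away", data.away_df)): check_processed(df, name)`.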
```python
# Save the processed tables as home_table.csv and away_table.csv (default directory)
data.save_to_csv()
```