## Scraping Data for the Last 15 Years: 2010–2024

In this notebook, I use the functions defined in the `utils.py` file to scrape transplant data for all transplant centers in Italy over the past 15 years (2010–2024). The scraped data will be saved in the `data_raw` folder, organized by year and organ. 

Once collected, the data will be loaded, cleaned, and reshaped into a single consolidated DataFrame, which will serve as the foundation for further analysis and data visualization.

To minimize the risk of data loss in case of an error during execution, I will perform the scraping in blocks of 3 to 5 years. This approach allows me to save partial results progressively and resume easily if something goes wrong during the loop.
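
The block-wise approach can be sketched as follows. This is a minimal illustration, not part of `utils.py`: the helper name `year_blocks` and the newest-first ordering are assumptions made for the example.

```python
def year_blocks(first_year, last_year, block_size=3):
    """Split an inclusive year range into consecutive blocks, newest first.

    Hypothetical helper for illustration; it is not defined in utils.py.
    """
    # Years are kept as strings because the scraping calls in this notebook pass them that way.
    years = [str(y) for y in range(last_year, first_year - 1, -1)]
    return [years[i:i + block_size] for i in range(0, len(years), block_size)]


blocks = year_blocks(2010, 2024, block_size=3)
for block in blocks:
    print(block)  # each block would be scraped in its own run
```

Each block can then be assigned to `years` in the scraping loop, so a failure only affects the block currently being processed.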

### 🔁 Handling Missing Data from Previous Scraping Runs
If any organ-year combinations fail during scraping (e.g., due to stale element errors or loading delays), I can re-run the scraping for just the missing items after reviewing the console warnings or scrape reports.

For example, if `2023-Fegato` and `2023-Polmone` were skipped, I can retry them:

```python
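from selenium import webdriver
from utils import save_each_organ_table_for_year

organs = ['Fegato', 'Polmone']
year = '2023'

driver = webdriver.Chrome()
try:
    save_each_organ_table_for_year(driver, year=year, organs=organs, output_folder=f"../data_raw/{year}")
finally:
    driver.quit()  # always close the browser, even if scraping fails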
```
This strategy avoids repeating successful scrapes and ensures that all organ-year data is eventually collected in the correct folder structure.
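
Besides reading the console warnings, the missing items could also be found by checking which files exist on disk. The sketch below assumes the `<year>/<year>_<organ>.csv` naming that appears in this notebook's output; `find_missing` is a hypothetical helper, not a function from `utils.py`.

```python
from pathlib import Path


def find_missing(base_folder, years, organs):
    """Return (year, organ) pairs that have no saved CSV under base_folder/<year>/.

    Hypothetical helper; assumes files are named <year>_<organ>.csv.
    """
    base = Path(base_folder)
    return [
        (year, organ)
        for year in years
        for organ in organs
        if not (base / year / f"{year}_{organ}.csv").is_file()
    ]


# Example: list which of the two 2023 files still need to be scraped.
missing = find_missing("../data_raw", ["2023"], ["Fegato", "Polmone"])
print(missing)
```

The returned pairs can be grouped by year and passed back to `save_each_organ_table_for_year` as the `organs` list for that year.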

### ⚠️ Organs Skipped Due to Legitimate Absence of Data
During scraping, some organ-year combinations like `2020-Intestino` and `2021-Intestino` may return timeout errors because **no transplants were performed for that organ in that year**.
This is not a scraping failure but a reflection of the real data available on the website.

Such cases can be skipped safely.
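
One way to keep these cases out of the retry list is to record them explicitly. This is only a sketch: `KNOWN_EMPTY` and `split_failures` are hypothetical names, and the set holds just the two combinations mentioned above. Any other entry should be confirmed on the website before it is added.

```python
# Combinations mentioned above as having no transplants; extend only after
# confirming on the website that the data is really absent.
KNOWN_EMPTY = {("2020", "Intestino"), ("2021", "Intestino")}


def split_failures(failed, known_empty=KNOWN_EMPTY):
    """Separate failed (year, organ) pairs into ones to retry and ones to ignore."""
    to_retry = [pair for pair in failed if pair not in known_empty]
    to_ignore = [pair for pair in failed if pair in known_empty]
    return to_retry, to_ignore


to_retry, to_ignore = split_failures([("2020", "Intestino"), ("2012", "Rene")])
print("Retry:", to_retry)
print("Ignore:", to_ignore)
```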

In [1]:
from utils import scrape_year, reset_main_page, save_each_organ_table_for_year
from selenium import webdriver

In [2]:
organs = ['Rene', 'Fegato', 'Cuore', 'Polmone', 'Pancreas', 'Intestino']
years = ['2012', '2011', '2010']

for year in years:
    driver = None  # reset so the finally block is safe if Chrome fails to start
    try:
        print(f"🔄 Starting scraping for year {year}...")
        driver = webdriver.Chrome()
        output_folder = f"../data_raw/{year}"
        save_each_organ_table_for_year(driver, year=year, organs=organs, output_folder=output_folder)
        print(f"✅ Finished scraping for year {year}.")
    except Exception as e:
        print(f"❌ Error while scraping year {year}: {e}")
    finally:
        # Close the browser only if it was actually created for this year
        if driver is not None:
            driver.quit()


🔄 Starting scraping for year 2012...
🔄 Scraping Rene data for 2012...
[Error] 2012-Rene: Message: stale element reference: stale element not found
  (Session info: chrome=137.0.7151.69); For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#stale-element-reference-exception
Stacktrace:
	GetHandleVerifier [0x0x7ff794b8fea5+79173]
	GetHandleVerifier [0x0x7ff794b8ff00+79264]
	(No symbol) [0x0x7ff794949e5a]
	(No symbol) [0x0x7ff794960264]
	(No symbol) [0x0x7ff79495ed33]
	(No symbol) [0x0x7ff794952551]
	(No symbol) [0x0x7ff7949526b1]
	(No symbol) [0x0x7ff79495041f]
	(No symbol) [0x0x7ff794954be1]
	(No symbol) [0x0x7ff7949f22b4]
	(No symbol) [0x0x7ff7949c896a]
	(No symbol) [0x0x7ff7949f100d]
	(No symbol) [0x0x7ff7949c8743]
	(No symbol) [0x0x7ff7949914c1]
	(No symbol) [0x0x7ff794992253]
	GetHandleVerifier [0x0x7ff794e5a2dd+3004797]
	GetHandleVerifier [0x0x7ff794e5472d+2981325]
	GetHandleVerifier [0x0x7ff794e73380+3107360]
	GetHa

### 🔁 Final Retry for Missing Data
This is the final retry to collect data for `'Rene'` in **2012** and `'Cuore'` in **2011**, which were previously skipped during the loop despite being available on the website.

With this final step, I’ve now successfully collected all available data for **all organs**, across **all transplant centers**, for **each year from 2010 to 2024**.
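
Manual retries like the ones in this section could also be automated with a small wrapper. The sketch below is an assumption-laden illustration, not part of `utils.py`: `retry_until_saved` is a hypothetical helper that treats the presence of the expected CSV file as the success signal, since the log output suggests scraping errors are printed rather than raised.

```python
import time
from pathlib import Path


def retry_until_saved(scrape, expected_file, attempts=3, delay_seconds=5.0):
    """Run scrape() until expected_file exists; return True on success, False otherwise.

    Hypothetical helper: `scrape` is any zero-argument callable that should
    produce `expected_file` as a side effect.
    """
    target = Path(expected_file)
    for attempt in range(1, attempts + 1):
        try:
            scrape()
        except Exception as error:  # e.g. a stale element or timeout error
            print(f"Attempt {attempt}/{attempts} raised: {error}")
        if target.is_file():
            return True
        if attempt < attempts:
            time.sleep(delay_seconds)  # give the page time to settle before retrying
    return False


# Possible usage (assumes an open `driver` and the function imported from utils.py):
# retry_until_saved(
#     lambda: save_each_organ_table_for_year(
#         driver, year="2012", organs=["Rene"], output_folder="../data_raw/2012"
#     ),
#     "../data_raw/2012/2012_Rene.csv",
# )
```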

In [None]:
#organs = ['Rene']
#year = '2012'

#driver = webdriver.Chrome()
#save_each_organ_table_for_year(driver, year=year, organs=organs, output_folder=f"../data_raw/{year}")
#driver.quit()

🔄 Scraping Rene data for 2012...
✅ Saved: ../data_raw/2012/2012_Rene.csv


In [None]:
#organs = ['Cuore']
#year = '2011'

#driver = webdriver.Chrome()
#save_each_organ_table_for_year(driver, year=year, organs=organs, output_folder=f"../data_raw/{year}")
#driver.quit()

🔄 Scraping Cuore data for 2011...
✅ Saved: ../data_raw/2011/2011_Cuore.csv


### ✅ Scraping Summary
With this notebook, I have completed the scraping of transplant data for:

- **All organs**: Rene, Fegato, Cuore, Polmone, Pancreas, Intestino
- **All planned years**: from 2010 to 2024
- **All transplant centers** in Italy

Each organ's data has been saved by year in the `data_raw/` folder following a structured directory layout.

This dataset is the complete raw input for building a unified, analysis-ready table. In the next step, outlined in the notebook `06_Manipulating_DataFrames_to_build_one_single_DataFrame.ipynb`, I will:

- Load these individual CSV files
- Clean and reshape them
- Consolidate everything into a single **pivoted long-format DataFrame** ready for data exploration and visualization
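
Before loading, it may help to take an inventory of the saved files. The sketch below assumes the `<year>/<year>_<organ>.csv` layout seen in this notebook's output; `list_raw_files` is a hypothetical helper, and the actual loading and reshaping is left to the next notebook because the column layout of the CSV files is not shown here.

```python
from pathlib import Path


def list_raw_files(base_folder):
    """Return sorted (year, organ, path) tuples for files named <year>_<organ>.csv.

    Hypothetical helper; files that do not match the naming pattern are skipped.
    """
    entries = []
    for path in Path(base_folder).glob("*/*.csv"):
        year, _, organ = path.stem.partition("_")
        if year.isdigit() and organ:
            entries.append((year, organ, path))
    return sorted(entries)


for year, organ, path in list_raw_files("../data_raw"):
    print(year, organ, path)
```

Each listed path could then be read with `pandas.read_csv` and tagged with its year and organ before concatenation.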