# Pre-processing the [EM-DAT](https://doc.emdat.be/docs/introduction/) dataset

## Variable descriptions

<table border="1" cellspacing="0" cellpadding="8" style="border-collapse: collapse; width: 100%; font-family: Arial, sans-serif; font-size: 14px;">
  <thead style="background-color: #040348;">
    <tr>
      <th>Column Name</th>
      <th>Type</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>DisNo.</td>
      <td>ID, Mandatory</td>
      <td>A unique 8-digit identifier consisting of the year (4 digits) and a sequential number (4 digits) for each disaster event (e.g., 2004-0659). In the EM-DAT Public Table, the ISO country code is appended. See column ISO below.</td>
    </tr>
    <tr>
      <td>Historic</td>
      <td>Yes/No, Mandatory</td>
      <td>Binary field specifying whether or not the disaster happened before 2000, using the Start Year. Data before 2000 should be considered of lesser quality (see Time Bias).</td>
    </tr>
    <tr>
      <td>Classification Key</td>
      <td>ID, Mandatory</td>
      <td>A unique 15-character string identifying disasters in terms of the Group, Subgroup, Type and Subtype classification hierarchy. See Disaster Classification System.</td>
    </tr>
    <tr>
      <td>Disaster Group</td>
      <td>Name, Mandatory</td>
      <td>The disaster group, i.e., “Natural” or “Technological.” See Disaster Classification System.</td>
    </tr>
    <tr>
      <td>Disaster Subgroup</td>
      <td>Name, Mandatory</td>
      <td>The disaster subgroup. See Disaster Classification System.</td>
    </tr>
    <tr>
      <td>Disaster Type</td>
      <td>Name, Mandatory</td>
      <td>The disaster type. See Disaster Classification System.</td>
    </tr>
    <tr>
      <td>Disaster Subtype</td>
      <td>Name, Mandatory</td>
      <td>The disaster subtype. See Disaster Classification System.</td>
    </tr>
    <tr>
      <td>External IDs</td>
      <td>IDs List, Optional</td>
      <td>List of identifiers for external resources (GLIDE, USGS, DFO, HANZE), in the format “&lt;source&gt;:&lt;identifier&gt;” and separated by the pipe character ("|").</td>
    </tr>
    <tr>
      <td>Event Name</td>
      <td>Optional</td>
      <td>Short specification for disaster identification, e.g., storm names (“Mitch”), plane type (“Boeing 707”), disease (“Cholera”), or volcano (“Etna”).</td>
    </tr>
    <tr>
      <td>ISO</td>
      <td>ID, Mandatory</td>
      <td>The ISO 3-letter country code (ISO 3166). See Spatial Information and Geocoding.</td>
    </tr>
    <tr>
      <td>Country</td>
      <td>Name, Mandatory</td>
      <td>Country where the disaster occurred, using UN M49 Standard names. If multiple countries are affected, each has a separate entry linked to the same DisNo.</td>
    </tr>
    <tr>
      <td>Subregion</td>
      <td>Name, Mandatory</td>
      <td>Subregion of occurrence based on UN M49 standard, automatically linked to the Country field.</td>
    </tr>
    <tr>
      <td>Region</td>
      <td>Name, Mandatory</td>
      <td>Region or continent of occurrence based on UN M49 standard, automatically linked to the Country field.</td>
    </tr>
    <tr>
      <td>Location</td>
      <td>Text, Optional</td>
      <td>Geographical location name as in the sources (city, province, etc.), used to identify GAUL Admin Units.</td>
    </tr>
    <tr>
      <td>Origin</td>
      <td>Text, Optional</td>
      <td>Additional contextual factors that led to the event (e.g., “heavy rains” for floods).</td>
    </tr>
    <tr>
      <td>Associated Types</td>
      <td>Names List, Optional</td>
      <td>List of secondary disaster types cascading from or co-occurring with the main type (e.g., landslide after flood).</td>
    </tr>
    <tr>
      <td>OFDA/BHA Response</td>
      <td>Yes/No, Mandatory</td>
      <td>Specifies whether the OFDA or BHA responded to the disaster.</td>
    </tr>
    <tr>
      <td>Appeal</td>
      <td>Yes/No, Mandatory</td>
      <td>Specifies whether there was a request for international assistance.</td>
    </tr>
    <tr>
      <td>Declaration</td>
      <td>Yes/No, Mandatory</td>
      <td>Specifies whether a state of emergency was declared in the country.</td>
    </tr>
    <tr>
      <td>AID Contribution (‘000 US$)</td>
      <td>Unadjusted Monetary Amount (‘000 US$), Optional</td>
      <td>Total amount (in thousands of US$) of contributions for relief activities, sourced from OCHA FTS (1992–2015). Not maintained after 2015.</td>
    </tr>
    <tr>
      <td>Magnitude</td>
      <td>Disaster-Type-Dependent, Optional</td>
      <td>The intensity of a specific disaster (see Hazard and Disaster Magnitude Units).</td>
    </tr>
    <tr>
      <td>Magnitude Scale</td>
      <td>Disaster-Type-Dependent, Optional</td>
      <td>The associated unit for the Magnitude column.</td>
    </tr>
    <tr>
      <td>Latitude</td>
      <td>Degrees [-90;90], Optional</td>
      <td>North-South coordinates, mainly for earthquakes and volcanic activity.</td>
    </tr>
    <tr>
      <td>Longitude</td>
      <td>Degrees [-180;180], Optional</td>
      <td>East-West coordinates, mainly for earthquakes and volcanic activity.</td>
    </tr>
    <tr>
      <td>River Basin</td>
      <td>Text, Optional</td>
      <td>Name of affected river basins (typically used for floods).</td>
    </tr>
    <tr>
      <td>Start Year</td>
      <td>Numeric, Mandatory</td>
      <td>Year of occurrence of the disaster.</td>
    </tr>
    <tr>
      <td>Start Month</td>
      <td>Numeric, Optional</td>
      <td>Month of occurrence of the disaster. Optional for long-duration disasters like droughts.</td>
    </tr>
    <tr>
      <td>Start Day</td>
      <td>Numeric, Optional</td>
      <td>Day of occurrence of the disaster. Optional for long-duration disasters.</td>
    </tr>
    <tr>
      <td>End Year</td>
      <td>Numeric, Optional</td>
      <td>Year of disaster conclusion.</td>
    </tr>
    <tr>
      <td>End Month</td>
      <td>Numeric, Optional</td>
      <td>Month of disaster conclusion.</td>
    </tr>
    <tr>
      <td>End Day</td>
      <td>Numeric, Optional</td>
      <td>Day of disaster conclusion.</td>
    </tr>
    <tr>
      <td>Total Deaths</td>
      <td>Numeric, Optional</td>
      <td>Total fatalities (deceased + missing).</td>
    </tr>
    <tr>
      <td>No. Injured</td>
      <td>Numeric, Optional</td>
      <td>Number of injured or ill people requiring immediate medical assistance.</td>
    </tr>
    <tr>
      <td>No. Affected</td>
      <td>Numeric, Optional</td>
      <td>Number of people requiring immediate assistance due to the disaster.</td>
    </tr>
    <tr>
      <td>No. Homeless</td>
      <td>Numeric, Optional</td>
      <td>Number of people requiring shelter due to house destruction or damage.</td>
    </tr>
    <tr>
      <td>Total Affected</td>
      <td>Numeric, Optional</td>
      <td>Total number of affected people (injured + affected + homeless).</td>
    </tr>
    <tr>
      <td>Reconstruction Costs (‘000 US$)</td>
      <td>Unadjusted Monetary Amount (‘000 US$), Optional</td>
      <td>Replacement costs for lost assets in thousands of US$, unadjusted for inflation.</td>
    </tr>
    <tr>
      <td>Reconstruction Costs, Adjusted (‘000 US$)</td>
      <td>Adjusted Monetary Amount (‘000 US$), Optional</td>
      <td>Reconstruction costs adjusted for inflation using the CPI.</td>
    </tr>
    <tr>
      <td>Insured Damage (‘000 US$)</td>
      <td>Unadjusted Monetary Amount (‘000 US$), Optional</td>
      <td>Economic damage covered by insurance companies, unadjusted for inflation.</td>
    </tr>
    <tr>
      <td>Insured Damage, Adjusted (‘000 US$)</td>
      <td>Adjusted Monetary Amount (‘000 US$), Optional</td>
      <td>Insured damage adjusted for inflation using CPI.</td>
    </tr>
    <tr>
      <td>Total Damage (‘000 US$)</td>
      <td>Unadjusted Monetary Amount (‘000 US$), Optional</td>
      <td>Total economic losses due to the disaster, unadjusted for inflation.</td>
    </tr>
    <tr>
      <td>Total Damage, Adjusted (‘000 US$)</td>
      <td>Adjusted Monetary Amount (‘000 US$), Optional</td>
      <td>Total damage adjusted for inflation using CPI.</td>
    </tr>
    <tr>
      <td>CPI</td>
      <td>Conversion Ratio, Optional</td>
      <td>Consumer Price Index from OECD used to adjust US$ values for inflation.</td>
    </tr>
    <tr>
      <td>Admin Units</td>
      <td>JSON Array of Objects, Optional</td>
      <td>Collection of impacted Administrative Units from FAO GAUL 2015. Geocoding is maintained for non-biological natural hazards from 2000 onwards.</td>
    </tr>
    <tr>
      <td>Entry Date</td>
      <td>Date, Mandatory</td>
      <td>The day on which the event record was created in EM-DAT.</td>
    </tr>
    <tr>
      <td>Last Update</td>
      <td>Date, Mandatory</td>
      <td>The last modification of the event or its associated records in EM-DAT.</td>
    </tr>
  </tbody>
</table>
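
The `DisNo.` and `External IDs` fields described above pack several values into one string, so it can help to split them before analysis. The following is a minimal sketch, not part of the original notebook. The `DisNo.` values are taken from the public-table format (year, sequence number, ISO code), while the `External IDs` value, the helper `parse_external_ids`, and the new column names are hypothetical and only illustrate the documented `<source>:<identifier>` pipe-separated format.

```python
import pandas as pd

# Tiny illustrative frame; the External IDs string is made up for this sketch.
sample = pd.DataFrame({
    "DisNo.": ["1900-0003-USA", "1901-0003-BEL"],
    "External IDs": ["GLIDE:EQ-2004-000147|USGS:abc123", None],
})

# Split DisNo. into year, sequential number and ISO code.
parts = sample["DisNo."].str.extract(
    r"^(?P<year>\d{4})-(?P<seq>\d{4})-(?P<iso>[A-Z]{3})$"
)
sample["dis_year"] = parts["year"].astype(int)
sample["dis_seq"] = parts["seq"]  # kept as text to preserve leading zeros
sample["dis_iso"] = parts["iso"]


def parse_external_ids(value):
    """Turn 'SRC:id|SRC:id' into a list of (source, identifier) tuples."""
    if pd.isna(value):
        return []
    pairs = []
    for item in value.split("|"):
        if ":" in item:
            source, identifier = item.split(":", 1)
            pairs.append((source.strip(), identifier.strip()))
    return pairs


sample["external_ids"] = sample["External IDs"].apply(parse_external_ids)
print(sample[["dis_year", "dis_seq", "dis_iso", "external_ids"]])
```

A list of tuples is used rather than a dictionary so that several identifiers from the same source, if they occur, are not silently overwritten.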

## Loading the dataset

In [1]:
import numpy as np
import pandas as pd

In [2]:
# Read the EM-DAT Excel export from the working directory (.xlsx files need the openpyxl package)
df = pd.read_excel("emdata.xlsx")
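
Because `Start Month` and `Start Day` are optional, a single start-date column cannot be built directly from the three date fields. One possible approach, sketched here on a small hypothetical frame rather than on `df` itself, is to fill missing months and days with 1 and to keep a flag marking the rows where that happened. The default of 1 and the column names `start_date` and `start_date_imputed` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical rows; column names follow the variable table.
sample = pd.DataFrame({
    "Start Year": [2004, 1984, 2010],
    "Start Month": [12, None, 1],
    "Start Day": [26, None, None],
})

# pd.to_datetime accepts a frame with 'year', 'month' and 'day' columns.
parts = pd.DataFrame({
    "year": sample["Start Year"],
    "month": sample["Start Month"].fillna(1),  # assumption: default to January
    "day": sample["Start Day"].fillna(1),      # assumption: default to day 1
}).astype(int)

sample["start_date"] = pd.to_datetime(parts)
# Flag rows where month or day was missing, so imputed dates stay identifiable.
sample["start_date_imputed"] = sample[["Start Month", "Start Day"]].isna().any(axis=1)
print(sample)
```

Keeping the flag makes it possible to exclude imputed dates later, for example when computing event durations.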

In [3]:
df.head(10)

Unnamed: 0,DisNo.,Historic,Classification Key,Disaster Group,Disaster Subgroup,Disaster Type,Disaster Subtype,External IDs,Event Name,ISO,...,Reconstruction Costs ('000 US$),"Reconstruction Costs, Adjusted ('000 US$)",Insured Damage ('000 US$),"Insured Damage, Adjusted ('000 US$)",Total Damage ('000 US$),"Total Damage, Adjusted ('000 US$)",CPI,Admin Units,Entry Date,Last Update
0,1900-0003-USA,Yes,nat-met-sto-tro,Natural,Meteorological,Storm,Tropical cyclone,,,USA,...,,,,,30000.0,1131126.0,2.652223,,2004-10-18,2023-10-17
1,1900-0005-USA,Yes,tec-ind-fir-fir,Technological,Industrial accident,Fire (Industrial),Fire (Industrial),,,USA,...,,,,,,,2.652223,,2003-07-01,2023-09-25
2,1900-0006-JAM,Yes,nat-hyd-flo-flo,Natural,Hydrological,Flood,Flood (General),,,JAM,...,,,,,,,2.652223,,2003-07-01,2023-09-25
3,1900-0007-JAM,Yes,nat-bio-epi-vir,Natural,Biological,Epidemic,Viral disease,,Gastroenteritis,JAM,...,,,,,,,2.652223,,2003-07-01,2023-09-25
4,1900-0008-JPN,Yes,nat-geo-vol-ash,Natural,Geophysical,Volcanic activity,Ash fall,,,JPN,...,,,,,,,2.652223,,2003-07-01,2023-09-25
5,1900-0009-TUR,Yes,nat-geo-ear-gro,Natural,Geophysical,Earthquake,Ground movement,,,TUR,...,,,,,,,2.652223,,2019-08-05,2023-09-25
6,1900-9001-IND,Yes,nat-cli-dro-dro,Natural,Climatological,Drought,Drought,,,IND,...,,,,,,,2.652223,,2006-12-01,2025-03-06
7,1900-9002-CPV,Yes,nat-cli-dro-dro,Natural,Climatological,Drought,Drought,,,CPV,...,,,,,,,2.652223,,2006-12-01,2025-03-06
8,1901-0001-UGA,Yes,nat-bio-epi-dis,Natural,Biological,Epidemic,Infectious disease (General),,,UGA,...,,,,,,,2.652223,,2003-07-01,2023-09-25
9,1901-0003-BEL,Yes,tec-ind-exp-exp,Technological,Industrial accident,Explosion (Industrial),Explosion (Industrial),,Coal mine,BEL,...,,,,,,,2.652223,,2005-04-13,2023-09-25
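
The first row of the preview offers a quick check of how the adjusted monetary columns relate to `CPI`. The numbers are consistent with the adjusted value being the unadjusted value multiplied by 100 and divided by `CPI`, that is, with `CPI` expressed as a percentage of the reference year. This reading is inferred from the displayed values and is an assumption, not a statement taken from the EM-DAT documentation.

```python
import math

# Values copied from row 0 of the preview (1900-0003-USA).
total_damage = 30000.0          # Total Damage ('000 US$)
cpi = 2.652223                  # CPI, as displayed (rounded)
reported_adjusted = 1131126.0   # Total Damage, Adjusted ('000 US$)

# Assumed relation: adjusted = unadjusted * 100 / CPI
recomputed = total_damage * 100 / cpi

# Allow a small tolerance because the displayed CPI is rounded.
print(math.isclose(recomputed, reported_adjusted, rel_tol=1e-5))
```

If the relation holds across the dataset, the same comparison can be applied column-wise to spot rows where the adjusted and unadjusted values disagree.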
