## Our mission 
In this module, we will preprocess the Chernobyl air data by doing the following: 

- Rename PAYS to be country.
    - To avoid confusion. 
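- Rename variables X and Y to lattitude and longitude, respectively (lattitude is the column spelling used throughout this notebook).
    - To avoid confusion. Note that the RISOE rows in the output below have X = 12.07 and Y = 55.7, which look like a longitude and a latitude, so this mapping may be swapped and is worth checking before any mapping work.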
- Rename chemical concentration variables. 
    - As the initial air.head() output below shows, typing out a name like I 131 (Bq/m3) every time
    I need to look up the variable gets tedious. Renaming will speed up the analysis.
- Rename duration.
    - Same as above. 
- Convert date to date object.
    - Dates in date object make for easier analysis.
- Convert end of sampling to hour.
    - Same as above, except for time.
- Convert all chemical concentrations to floats and insert NaN for any non-numeric string.

In [1]:
import pandas as pd 
import numpy as np

In [38]:
air = pd.read_csv('CHERNAIR.csv')

## Renaming variables

In [34]:
air.head()

Unnamed: 0,PAYS,Code,Ville,X,Y,Date,End of sampling,Duration(h.min),I 131 (Bq/m3),Cs 134 (Bq/m3),Cs 137 (Bq/m3)
0,SE,1,RISOE,12.07,55.7,86/04/27,24:00:00,24.0,1.0,0.0,0.24
1,SE,1,RISOE,12.07,55.7,86/04/28,24:00:00,24.0,0.0046,0.00054,0.00098
2,SE,1,RISOE,12.07,55.7,86/04/29,12:00,12.0,0.0147,0.0043,0.0074
3,SE,1,RISOE,12.07,55.7,86/04/29,24:00:00,12.0,0.00061,0.0,9e-05
4,SE,1,RISOE,12.07,55.7,86/04/30,24:00:00,24.0,0.00075,0.0001,0.00028


In [39]:
air.rename(columns = {'PAYS': 'country',
                      'X': 'lattitude', 'Y': 'longitude',
                      'End of sampling': 'endsampling',
                      'I 131 (Bq/m3)': 'i131',
                      'Cs 134 (Bq/m3)': 'cs134',
                      'Cs 137 (Bq/m3)': 'cs137',
                      'Duration(h.min)': 'duration'}, inplace = True)
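
One thing to watch with a rename like this: a misspelled key is silently ignored by default. The sketch below uses a small stand-in DataFrame (the name `sample` and its contents are assumptions for illustration, with values copied from the first rows above) to show how `errors='raise'` surfaces such a typo.

```python
import pandas as pd

# Hypothetical stand-in for the raw file, not the real CHERNAIR.csv.
sample = pd.DataFrame({
    'PAYS': ['SE', 'SE'],
    'X': [12.07, 12.07],
    'Y': [55.7, 55.7],
    'I 131 (Bq/m3)': [1.0, 0.0046],
})

# rename returns a new DataFrame and leaves `sample` untouched.
renamed = sample.rename(columns={'PAYS': 'country', 'I 131 (Bq/m3)': 'i131'})
print(list(renamed.columns))

# With errors='raise', a key that matches no column raises KeyError
# instead of being skipped.
try:
    sample.rename(columns={'PAYSS': 'country'}, errors='raise')
except KeyError as err:
    print('rename failed:', err)
```

Passing `errors='raise'` is optional; it is mainly useful while the column names are still being worked out.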

In [40]:
air.head()

Unnamed: 0,country,Code,Ville,lattitude,longitude,Date,endsampling,duration,i131,cs134,cs137
0,SE,1,RISOE,12.07,55.7,86/04/27,24:00:00,24.0,1.0,0.0,0.24
1,SE,1,RISOE,12.07,55.7,86/04/28,24:00:00,24.0,0.0046,0.00054,0.00098
2,SE,1,RISOE,12.07,55.7,86/04/29,12:00,12.0,0.0147,0.0043,0.0074
3,SE,1,RISOE,12.07,55.7,86/04/29,24:00:00,12.0,0.00061,0.0,9e-05
4,SE,1,RISOE,12.07,55.7,86/04/30,24:00:00,24.0,0.00075,0.0001,0.00028


## Converting to date and time objects

In [41]:
air.Date = pd.to_datetime(air.Date)
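
The raw dates are written as two-digit year, month, day (for example 86/04/27). The sketch below, on a small made-up Series named `dates`, shows one way to state that layout explicitly with `format='%y/%m/%d'` rather than relying on format inference. Whether every row of the real file follows this layout is an assumption.

```python
import pandas as pd

# Hypothetical sample in the same layout as the Date column.
dates = pd.Series(['86/04/27', '86/04/28', '86/05/01'])

# %y is a two-digit year, so '86' is read as 1986.
parsed = pd.to_datetime(dates, format='%y/%m/%d')

print(parsed.dt.year.unique())
print(parsed.dt.month.tolist())
```

With an explicit format, a row that does not match raises an error instead of being parsed in an unexpected way.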

In [47]:
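# '24:00:00' is not a valid clock time, so map it to midnight before parsing.
# Assigning the result back avoids relying on an in-place edit of a column accessed through the frame.
air['endsampling'] = air['endsampling'].replace({'24:00:00': '00:00:00'})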

In [53]:
air.endsampling = pd.to_datetime(air.endsampling).dt.hour
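
The sample rows show the end-of-sampling column mixing two layouts (12:00 and 24:00:00), which some pandas versions may refuse to parse in a single `pd.to_datetime` call. As an alternative sketch, and not the method used in this notebook, the hour can be taken directly from the text before the first colon. The Series `times` below is made up for illustration.

```python
import pandas as pd

# Hypothetical sample mixing HH:MM and HH:MM:SS, as in the rows shown earlier.
times = pd.Series(['24:00:00', '12:00', '24:00:00', '06:30'])

# Map the out-of-range '24:00:00' label to midnight first.
times = times.replace({'24:00:00': '00:00:00'})

# Keep the text before the first colon and convert it to an integer hour.
hours = times.str.split(':').str[0].astype(int)

print(hours.tolist())  # [0, 12, 0, 6]
```

This skips datetime parsing entirely, so it only makes sense when the hour is the only part that is needed.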

## Convert chemical strings to numeric values

In [73]:
air.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2051 entries, 0 to 2050
Data columns (total 11 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   country      2051 non-null   object        
 1   Code         2051 non-null   int64         
 2   Ville        2051 non-null   object        
 3   lattitude    2051 non-null   float64       
 4   longitude    2051 non-null   float64       
 5   Date         2051 non-null   datetime64[ns]
 6   endsampling  2051 non-null   int64         
 7   duration     2051 non-null   float64       
 8   i131         2009 non-null   float64       
 9   cs134        1801 non-null   float64       
 10  cs137        1506 non-null   float64       
dtypes: datetime64[ns](1), float64(6), int64(2), object(2)
memory usage: 176.4+ KB


In [63]:
air.i131 = pd.to_numeric(air.i131, errors = 'coerce')

In [66]:
air.cs134 = pd.to_numeric(air.cs134, errors = 'coerce')

In [69]:
air.cs137 = pd.to_numeric(air.cs137, errors = 'coerce')
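
To illustrate what `errors = 'coerce'` does, here is a minimal sketch on a made-up DataFrame `df`. The placeholder strings 'L' and 'N' are assumptions standing in for whatever non-numeric markers the raw file contains. It also shows the three conversions written as a single `apply` over the chemical columns.

```python
import pandas as pd

# Hypothetical raw values; 'L' and 'N' stand in for non-numeric markers.
df = pd.DataFrame({
    'i131': ['1.0', 'L'],
    'cs134': ['0.0', '0.00054'],
    'cs137': ['N', '9e-05'],
})

chem_cols = ['i131', 'cs134', 'cs137']

# Strings that cannot be parsed as numbers become NaN instead of raising.
df[chem_cols] = df[chem_cols].apply(pd.to_numeric, errors='coerce')

print(df.dtypes)
print(df.isna().sum())
```

Counting the NaN values per column afterwards gives a quick check of how many entries were not numeric.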

In [74]:
air.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2051 entries, 0 to 2050
Data columns (total 11 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   country      2051 non-null   object        
 1   Code         2051 non-null   int64         
 2   Ville        2051 non-null   object        
 3   lattitude    2051 non-null   float64       
 4   longitude    2051 non-null   float64       
 5   Date         2051 non-null   datetime64[ns]
 6   endsampling  2051 non-null   int64         
 7   duration     2051 non-null   float64       
 8   i131         2009 non-null   float64       
 9   cs134        1801 non-null   float64       
 10  cs137        1506 non-null   float64       
dtypes: datetime64[ns](1), float64(6), int64(2), object(2)
memory usage: 176.4+ KB
