
Air pollution in the EU by Dirk Engfer, Germany
Input data file (provided): airpollution2012.csv

------------------------------
Original source of Input Data:
------------------------------
Creator: EEA - European Environment Agency, Copenhagen
Title: National emissions reported to the Convention on Long-range Transboundary Air Pollution (LRTAP Convention)
Copyright: European Environment Agency (EEA) is the owner of copyrights and database rights
Copyright notice: Information, documents and material available on this website and for which the EEA holds the rights of use are public and may be re-used without prior permission, free of charge, for commercial or non-commercial purposes, provided that the EEA is always acknowledged as the original source of the material and that the original meaning or message of the content is not distorted. Such acknowledgment must be included in each copy of the material. The re-use of the content on the EEA website covers the reproduction, adaptation and/or distribution, irrespective of the means and/or the format used. The re-use of certain data may be subject to different conditions, and if so the item concerned is accompanied by a copyright mark or other mention of the specific conditions relating to it. The above mentioned permissions do not apply to content supplied by third parties. Therefore, for documents where the copyright lies with a third party, permission for reproduction must be obtained from the copyright holder.
Copyright source: https://www.eea.europa.eu/legal/copyright
Download date: 14-May-2020
License notice: https://creativecommons.org/licenses/by/2.5/dk/deed.en_GB 
Data download source: https://www.eea.europa.eu/data-and-maps/data/national-emissions-reported-to-the-convention-on-long-range-transboundary-air-pollution-lrtap-convention-13
Disclaimer notice: No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.
(Disclaimer source: https://creativecommons.org/licenses/by/2.5/dk/deed.en_GB)


import os

import numpy as np
import pandas as pd

# Input file: ~/Dokumente/python-apps/tensorflow/eu_air_pollution_data/airpollution2012.csv
homedir = os.getenv('HOME')
datapath = os.path.join(homedir, 'Dokumente', 'python-apps', 'tensorflow', 'eu_air_pollution_data')
datafile = 'airpollution2012.csv'
indatapath = os.path.join(datapath, datafile)
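The same input path can also be built with `pathlib`. This is a minimal sketch, assuming the same folder layout under the home directory. `os.getenv('HOME')` returns `None` where `HOME` is not set (typically on Windows), whereas `Path.home()` falls back to the platform's user profile directory.

```python
from pathlib import Path

# Assumed layout, same as above:
# ~/Dokumente/python-apps/tensorflow/eu_air_pollution_data/airpollution2012.csv
datapath = Path.home() / 'Dokumente' / 'python-apps' / 'tensorflow' / 'eu_air_pollution_data'
indatapath = datapath / 'airpollution2012.csv'
print(indatapath)
```

`pd.read_csv` accepts a `Path` object directly, so `indatapath` can be passed as is.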

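# Converter: statistic_value is read as text with a decimal comma; swap it for a dot
def comma_to_dot(value):
    return value.replace(',', '.')

df = pd.read_csv(indatapath, skiprows=10, header=0, sep=',', usecols=[0, 1, 2, 3, 5, 7, 9, 13, 14],
                 converters={'statistic_value': comma_to_dot})
# Keep urban traffic stations in Linz and Salzburg; .copy() avoids a SettingWithCopyWarning below
df = df.loc[(df['type_of_station'] == 'Traffic') & (df['station_type_of_area'] == 'urban')
            & (df['city_name'].isin(['Linz', 'Salzburg']))].copy()
df['statistic_value'] = df['statistic_value'].astype(float)  # np.float was removed in NumPy 1.24
# sort_values returns a new frame, so assign it; FIRST and the running sum depend on this order
df = df.sort_values(by=['station_european_code', 'country_name', 'city_name', 'statistics_year'])
df.drop(['type_of_station', 'station_type_of_area', 'component_caption', 'measurement_unit'], axis=1, inplace=True)
groups = df.groupby(['station_european_code', 'country_name', 'city_name'])
df['transform_first_of_values'] = groups['statistic_value'].transform('first')  # group's first value on every row
df['cumsum_of_values'] = groups['statistic_value'].cumsum()  # running total within each group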
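To see what `transform('first')` and `cumsum()` produce per BY group, here is a small self-contained sketch on a hypothetical in-memory CSV. The station codes `S1`/`S2`, the column names and the values are made up for illustration and are not taken from the EEA file. It applies the same decimal-comma conversion, sorts by station and year, and then computes both columns.

```python
import io

import pandas as pd

# Hypothetical toy data: two stations, decimal-comma values, rows deliberately out of year order
raw = io.StringIO(
    'station,year,value\n'
    'S1,2002,"2,5"\n'
    'S1,2001,"1,0"\n'
    'S2,2001,"4,0"\n'
    'S2,2002,"3,5"\n'
)
df = pd.read_csv(raw, converters={'value': lambda s: s.replace(',', '.')})
df['value'] = df['value'].astype(float)

# Sorting first matters: 'first' and the running sum follow row order within each group
df = df.sort_values(by=['station', 'year'])
groups = df.groupby('station')
df['first_value'] = groups['value'].transform('first')  # broadcast to every row of the group
df['running_sum'] = groups['value'].cumsum()             # cumulative total per group
print(df)
```

Without the sort, `S1` would start with its 2002 row, so both its `first_value` and its running sum would differ.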

pandas provides FIRST and LAST on grouped data (GroupBy.first() and GroupBy.last()).
This cell checks how well pandas handles FIRST and LAST within BY-group processing as known from SAS.
Unlike SAS, where the FIRST. and LAST. flags let you assign a value to those records directly,
pandas does not let you assign a value to the first or last record through the grouped object.
Instead, we compute the first record of each group and MERGE it back into the ungrouped, original data frame.
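For comparison, pandas can also flag the first and last record of each group directly on the ungrouped frame, without a merge, using `GroupBy.cumcount()`, which numbers the rows within each group in their current order. This is an alternative sketch, not the approach used in this notebook; the toy data and the column names `station`, `year`, `value`, `is_first` and `is_last` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data, already sorted by station and year
df = pd.DataFrame({
    'station': ['S1', 'S1', 'S1', 'S2', 'S2'],
    'year':    [2001, 2002, 2003, 2001, 2002],
    'value':   [1.0, 2.0, 3.0, 4.0, 5.0],
})
groups = df.groupby('station')

# Counterparts of SAS FIRST.station / LAST.station as boolean flags on every row
df['is_first'] = groups.cumcount() == 0
df['is_last'] = groups.cumcount(ascending=False) == 0

# A value can then be assigned to those records directly; other rows get NaN
df.loc[df['is_first'], 'first_of_group'] = 'yes'
print(df)
```

Like SAS, this relies on the frame being sorted by the BY variables before the flags are computed.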

# Get the first row per group (GroupBy.first() takes the first non-null value of each column)
first = groups.first()
# Label first-by-group values as such
first['first_of_group'] = 'yes'
# Drop the columns that are not needed for the merge
first.drop(['statistic_value', 'transform_first_of_values', 'cumsum_of_values'], axis=1, inplace=True)
print(first)
# Merge the flag back into the original data BY the identifier of each group's first "record";
# the group keys are index levels of `first`, which merge accepts in `on`
df2 = df.merge(first, how='left', on=['station_european_code', 'country_name', 'city_name', 'statistics_year'])
df2.head(61)

                                              statistics_year first_of_group
station_european_code country_name city_name                                
AT4S415               Austria      Linz                  2001            yes
AT4S431               Austria      Linz                  2001            yes
AT51000               Austria      Salzburg              2000            yes
AT51066               Austria      Salzburg              2001            yes


,station_european_code,statistics_year,statistic_value,country_name,city_name,transform_first_of_values,cumsum_of_values,first_of_group
0,AT4S415,2001,53.000,Austria,Linz,53.0,53.000,yes
1,AT4S415,2002,57.000,Austria,Linz,53.0,110.000,
2,AT4S415,2003,53.000,Austria,Linz,53.0,163.000,
3,AT4S415,2004,43.900,Austria,Linz,53.0,206.900,
4,AT4S415,2005,56.500,Austria,Linz,53.0,263.400,
...,...,...,...,...,...,...,...,...
56,AT51066,2008,34.850,Austria,Salzburg,46.8,322.350,
57,AT51066,2009,39.000,Austria,Salzburg,46.8,361.350,
58,AT51066,2010,44.969,Austria,Salzburg,46.8,406.319,
59,AT51066,2011,37.967,Austria,Salzburg,46.8,444.286,
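The same merge pattern extends to LAST: take the last year of each group, flag it, and left-merge it back on the group key plus the year. This is a minimal sketch on hypothetical toy data (not the EEA file); the column names `station`, `year`, `value` and `last_of_group` are assumptions for illustration, and `reset_index()` turns the group key into an ordinary column before the merge.

```python
import pandas as pd

# Hypothetical toy data, sorted by station and year as in the notebook
df = pd.DataFrame({
    'station': ['S1', 'S1', 'S1', 'S2', 'S2'],
    'year':    [2001, 2002, 2003, 2001, 2002],
    'value':   [1.0, 2.0, 3.0, 4.0, 5.0],
})
groups = df.groupby('station')

# Last year per group (GroupBy.last() takes the last non-null value)
last = groups['year'].last().reset_index()
last['last_of_group'] = 'yes'

# Left merge on the group key plus the year marks the last "record" of each group
df2 = df.merge(last, how='left', on=['station', 'year'])
print(df2)
```

Rows that are not the last of their group end up with NaN in `last_of_group`, mirroring the empty `first_of_group` cells in the output above.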
