# Programming for Data Analysis - Project 2

**Ciaran Moran**

***

**Standard imports**

In [1]:
# Some user warning messages were being displayed, so I use this filter to suppress them
# https://stackoverflow.com/questions/9134795/how-to-get-rid-of-specific-warning-messages-in-python-while-keeping-all-other-wa
import warnings
warnings.simplefilter("ignore", category=Warning)

# Imports
import matplotlib.pyplot as plt 
import random
import datetime
import pandas as pd 
import seaborn as sns
import numpy as np
import os

## Open the .csv files
#### The leading rows of each .csv are skipped (skiprows=6 in the code below) because they are not data rows.
#### Reading the file initially raised the error "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 33: invalid start byte"
####
#### Looking online, I tried various suggestions from 
#### https://stackoverflow.com/questions/22216076/unicodedecodeerror-utf8-codec-cant-decode-byte-0xa5-in-position-0-invalid-s
####
#### The option that worked was encoding='unicode_escape'
####
#### The next issue was rows made up entirely of NaN values, which could cause problems later on.
#### For this I tried keep_default_na=False and also skip_blank_lines=True from 
#### https://stackoverflow.com/questions/39297878/how-to-skip-an-unknown-number-of-empty-lines-before-header-on-pandas-read-csv
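
As a side note on the two issues above, here is a minimal sketch of one alternative approach. It assumes the file is Latin-1 encoded, which the 0xfc byte (the character "ü" in Latin-1) suggests but does not prove. The bytes and column names below are made up for illustration and are not taken from the real data files.

```python
import io
import pandas as pd

# Made-up bytes standing in for a real file: a Latin-1 encoded header that
# contains the byte 0xfc ("ü"), two data rows and one row with no values.
raw = "Depth (m),K\u00fchlung\n1.0,10\n,\n2.0,20\n".encode("latin-1")

# Decoding these bytes as UTF-8 fails, as in the error quoted above.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    print("UTF-8 decode failed:", err.reason)

# Latin-1 maps every byte to a character, so decoding cannot fail.
# Whether it is the *correct* encoding for a given file is an assumption.
demo = pd.read_csv(io.BytesIO(raw), encoding="latin-1")

# With the default NaN handling, the empty row is parsed as all-NaN
# and can be dropped explicitly.
demo_clean = demo.dropna(how="all").reset_index(drop=True)
print(demo_clean)
```

Note that `dropna(how="all")` only finds the empty rows if pandas is allowed to parse empty fields as NaN. With `keep_default_na=False`, empty fields are kept as empty strings instead.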


### Data standardisation

##### File: 41586_2008_BFnature06949_MOESM31_ESM.csv

Here I attempt to standardise the data.

The initial issue is that the file holds 2 sets of data side by side.

So I extract the data for University of Bern into a separate dataframe.

Then I extract the data for LGGE in Grenoble into its own dataframe.

I then rename the LGGE column titles to match those of University of Bern.

Then the dataframes are concatenated into one dataframe.

The result is a .csv with the data listed in a consistent order.
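
The steps above can be sketched on a tiny made-up table before applying them to the real file. The values and the "Lab A" / "Lab B" labels are hypothetical; only the column-naming pattern (pandas appends `.1` to a repeated header name) mirrors the real data.

```python
import pandas as pd

# Made-up stand-in for the raw sheet: two labs' measurements side by side.
# The second lab has fewer rows, so its columns are padded with missing values.
wide = pd.DataFrame({
    "Depth (m)": [100.0, 101.0, 102.0],
    "CO2 (ppmv)": [200.0, 201.0, 202.0],
    "Depth (m).1": [300.0, 301.0, None],
    "CO2 (ppmv).1": [250.0, 251.0, None],
})

# Step 1: slice out the first lab's columns and tag the rows.
lab_a = wide.iloc[:, 0:2].copy()
lab_a["uni"] = "Lab A"

# Step 2: slice out the second lab's columns and drop the padding rows.
lab_b = wide.iloc[:, 2:4].dropna(how="all").copy()

# Step 3: strip the ".1" suffix so the column names match the first lab.
lab_b.columns = lab_b.columns.str.replace(r"\.1$", "", regex=True)
lab_b["uni"] = "Lab B"

# Step 4: stack the two frames into one table with a fresh index.
combined = pd.concat([lab_a, lab_b], ignore_index=True)
print(combined)
```

Taking `.copy()` of each slice keeps the new columns from being written to a view of the original frame, and `ignore_index=True` avoids duplicate index labels in the combined table.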


In [2]:
# Here we read in 41586_2008_BFnature06949_MOESM31_ESM.csv
#
df = pd.read_csv('data/41586_2008_BFnature06949_MOESM31_ESM.csv', \
                 skiprows=6, encoding='unicode_escape',  skip_blank_lines=True, keep_default_na=False)

####################
# University of Bern
####################
# Ref https://stackoverflow.com/questions/61553063/read-csv-file-by-column-number-in-pandas-python
#
moesm31_1 = df.iloc[0:247, 0:4].copy() # Copy columns 0 to 3 for rows 0-246; .copy() makes this an independent dataframe
#Now add in some constants to standardise the data frame
moesm31_1['station'] = 'moesm31'
moesm31_1['uni'] = 'University of Bern'
save_filename = 'data/moesm31_1.csv'
if os.path.isfile(save_filename): os.remove(save_filename) # delete if exists
moesm31_1.to_csv(save_filename, index=False)

print(moesm31_1.tail())

###################
# LGGE in Grenoble
###################
moesm31_2 = df.iloc[0:47, 4:7].copy() # Copy columns 4 to 6 for rows 0-46; .copy() makes this an independent dataframe

print(moesm31_2.head())
# https://stackoverflow.com/questions/11346283/renaming-column-names-in-pandas
moesm31_2.rename(columns={'Depth (m).1': 'Depth (m)', 'EDC3_gas_a (yr).1': 'EDC3_gas_a (yr)', \
                   'CO2 (ppmv).1': 'CO2 (ppmv)' }, inplace=True)
print(moesm31_2.head())

#Now add in some constants to standardise the data frame
moesm31_2['sigma (ppmv)'] = '' # this data isn't present
moesm31_2['station'] = 'moesm31'
moesm31_2['uni'] = 'LGGE in Grenoble'
		

save_filename = 'data/moesm31_2.csv'
if os.path.isfile(save_filename): os.remove(save_filename) # delete if exists
moesm31_2.to_csv(save_filename, index=False)

# We can append (concat) both of the new dataframes into one
# Ref: https://www.usepandas.com/csv/append-csv-files
moesm31_combined=pd.concat([moesm31_1, moesm31_2])
# write out to csv, not necessary but handy for checking data
save_filename = 'data/moesm31_combined.csv'
if os.path.isfile(save_filename): os.remove(save_filename) # delete if exists
moesm31_combined.to_csv(save_filename, index=False)
print(moesm31_combined.head())  # call the method; without () the bound method object is printed
print(moesm31_combined.tail())

df2 = pd.read_csv('data/grl52461-sup-0003-supplementary.csv', skiprows=6, encoding='unicode_escape',  skip_blank_lines=True, keep_default_na=False)


    Depth (m) EDC3_gas_a (yr) CO2 (ppmv) sigma (ppmv)  station  \
242   3187.87          794608      199.4          1.7  moesm31   
243   3188.23          795202      195.2          2.0  moesm31   
244   3188.98          796467      189.3          2.1  moesm31   
245   3189.33          797099      188.4          1.4  moesm31   
246   3190.08          798512      191.0          2.2  moesm31   

                    uni  
242  University of Bern  
243  University of Bern  
244  University of Bern  
245  University of Bern  
246  University of Bern  
  Depth (m).1 EDC3_gas_a (yr).1 CO2 (ppmv).1
0     3061.71            667435        178.5
1     3063.98            670124        189.0
2     3085.78            688035        234.0
3     3086.88            688751        235.4
4     3087.98            689444        241.0
  Depth (m) EDC3_gas_a (yr) CO2 (ppmv)
0   3061.71          667435      178.5
1   3063.98          670124      189.0
2   3085.78          688035      234.0
3   3086.88          

***

## End