
<font size=6> Sea ice decline and its relation to Aitken-size particles </font>
<br><br>
<font size=4>
    **Tuuli Lehmusjärvi** (tuuli.lehmusjarvi@helsinki.fi) <p>
    15 Nov, 2019<p>
    NeGI course 2019 - Climate in high latitudes: eScience for linking Arctic measurements and modeling<p>
    Group assistant: Lisa Beck

# Abstract

In this report I studied the relation between the concentration of Aitken-size particles and the sea ice extent near Svalbard in summer. I used three particle size distribution datasets from Zeppelin station and compared them with reanalyzed sea ice extent data from ECMWF. The observational data show a seasonal cycle in the particle concentration as well as variation from year to year. The reanalyzed data showed that the summer sea ice extent also varies from year to year. As expected, the summer sea ice maxima coincide well with the particle concentration minima. The sea ice minima matched the particle concentration maxima only partly. I tried to explain this mismatch with wind direction data from Zeppelin station, but this was not successful. Based on the analysis, the particle concentration and the sea ice extent are linked, but other factors need to be considered to fully explain the relation. Future studies could explore the link with a newer reanalysis (e.g. ERA5) and use wind speed in addition to wind direction.



# Introduction

Aerosols have a direct and an indirect effect on the radiative balance. The direct effects are scattering, reflection and absorption of shortwave radiation. The indirect effect occurs when aerosol particles act as cloud condensation nuclei (CCN) and affect cloud properties such as lifetime and reflectivity. Atmospheric aerosols currently cause the largest uncertainty in global radiative forcing predictions, and this uncertainty is largest in the Arctic (Freud et al. 2017).

New particle formation plays a big role in CCN formation. In the Arctic, new particle formation is strongest in summer, and even though it occurs regionally it affects aerosol number concentrations worldwide. New particles in the Arctic are mainly formed from emissions of biogenic sulphur gases (Dall'Osto et al. 2018).

In the course my task was to look at observational particle size distribution data from Zeppelin station at Ny-Ålesund and compare it to reanalyzed sea ice extent data. The aim was to see whether sea ice loss affects the particle concentrations. Dall’Osto et al. (2018) claim that declining sea ice, and therefore increased exposure of open water, is increasing new particle formation in the Arctic. They concentrated on observational data collected at Villum, North East Greenland. The idea here is to study this effect in Svalbard. The reanalyzed data showed the ice extent in the whole Arctic Ocean from 1979-2012, but I wanted to concentrate on the area west of Svalbard (Fram Strait), which affects the measurements at Zeppelin the most since the station is located on the west coast. From the particle size distribution data from 2000-2016 I wanted to select the time periods when the wind was blowing from the west or north-west. With this filter I hoped to see the link between the sea ice extent west of Svalbard and the particle concentration more clearly. However, this turned out not to be as successful as expected, and therefore I ended up using all the observed and modelled data.

The observational dataset of particle size distribution was from NILU and I got two datasets from EBAS (years 2000-2007 and 2008-2009) and the third dataset (2010-2016) from the NeGI server.
The reanalyzed dataset I used was ERA-Interim from ECMWF for the years 1979-2012.

In the methods section ([3](#Methods)) of this notebook I show how I processed and analysed both the modelled and observed data and how I made them comparable. In the results and discussion section ([4](#Results-and-discussion)) I present and discuss my main findings. In the last section ([5](#Conclusions-and-outlook)) I present the conclusions and outlook of this report.

# Methods

Reading, analyzing and plotting of the observational data is presented in  [Section 3.1](#Observational-data). [Section 3.2](#Reanalyzed-model-data) will show the same thing but with model data. All the code is commented to make it easier to understand. 

In [None]:

# Import all needed packages
%load_ext autoreload
%autoreload 2
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import pyaerocom as pya
import glob
import cartopy.feature as cfeature
import cartopy.crs as ccrs
from IPython.display import Image
from tuuli_functions import size_dist_to_xarray
from tuuli_functions import lognorm_to_concentration
from tuuli_functions import load_seaice_xarray


## Observational data

The observational data was aerosol particle size distribution data from Zeppelin station in Ny-Ålesund, Svalbard, fetched from the EBAS database. I wanted as long a time series as possible and ended up taking data from 2000 until 2015. A single long time series was not available in EBAS, only two shorter ones: the first from 2000 to 2007 and the second from 2008 to 2009. The third dataset, from 2010 to 2016, I found in our NeGI course folder. So I had three datasets which needed to be read separately due to different file formats. The datasets also had different time steps: the first two had a measurement every half an hour, but the last one every hour. I also ended up using the last dataset only until the end of 2012, because the model data I used ended in that year. In addition, there were many major data gaps in 2014 and 2015 which could have made the analysis unreliable. The particle size range in the data was 20-500nm. Since I wanted to look at the Aitken-size particles, I selected the size range 20-50nm from all three datasets.

I also planned to use wind direction data from Zeppelin station for the same years as the particle size distribution datasets (2000-2012). Eventually I decided to drop the wind data analysis since it didn't work as I had planned. The reading and processing of the wind data is still shown in subsection ([3.1.4](#Wind-data-from-Zeppelin)), and in results and discussion ([4](#Results-and-discussion)) I elaborate on the reasons for disregarding the data.

### Reading and cleaning the different datasets

First, the datasets needed to be loaded into memory from the different files. After that all the flag values must be removed and new arrays created from the raw data. First the years 2000-2007:

In [2]:

# Reading all the files in the first part of the dataset (2000-2007)
# into a list of arrays with pyaerocom
arrs1 = []
for filepath in glob.glob('EBAS_FILES_2000_2007/*.nas'):
    filedata = pya.io.EbasNasaAmesFile(filepath)
    arrs1.append(size_dist_to_xarray(filedata))    

# Concatenating the arrays in the list along the time dimension
arr1 = arrs1[0]
for array1 in arrs1[1:]:
    arr1 = xr.concat([arr1, array1], dim='time')


Next the years 2008-2009:

In [3]:

# Reading all the files in the second part of the datasets (2008-2009)
# together into a new array with pyaerocom
arrs2 = []
for filepath2 in glob.glob('ebasfiles_2008_2009/*.nas'):
    filedata2 = pya.io.EbasNasaAmesFile(filepath2)
    arrs2.append(size_dist_to_xarray(filedata2, step=1))
    
# Concatenating the arrays in the list along the time dimension
arr2 = arrs2[0]
for array2 in arrs2[1:]:
    arr2 = xr.concat([arr2, array2], dim='time')  
  

Last the years 2010-2015. Here the reading is done a little differently, since the files were in .csv format, which is easier to read with Pandas built-ins than the .nas files of the two previous datasets.

In [4]:

#Reading the last part of datasets (2010-2015)
datadir = '/home/2daa7756-2d5725-2d4dfb-2db0ff-2d5e0a6858a009/shared-ns1000k/inputs//Aerosol_sizedist_obs/'
filenam1 = datadir + 'Zeppelin_2010_hourly.csv'
filenam2 = datadir + 'Zeppelin_2011_hourly.csv'
filenam3 = datadir + 'Zeppelin_2012_hourly.csv'
filenam4 = datadir + 'Zeppelin_2013_hourly.csv'
filenam5 = datadir + 'Zeppelin_2015_hourly.csv'

flist=[filenam1, filenam2, filenam3, filenam4, filenam5]

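# Creating a date parser.
# pd.to_datetime is used because pd.datetime is deprecated and absent from recent pandas versions
mydateparser = lambda x: pd.to_datetime(x, format="%Y %m %d %H %M")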

# Go through each file and append them together. 
# Date is split in five first columns, so we parse them together using 'mydateparser'.

datalist = []
for f in flist:
    datalist.append(pd.read_csv(f, parse_dates=[['0','0.1','0.2','0.3','0.4']],date_parser = mydateparser))
# Concatenate the dataframes in 'datalist' into one Pandas dataframe
data3 = pd.concat(datalist, axis=0)


To convert the first two datasets from raw data to readable arrays I used a function called 'size_dist_to_xarray': 

```python
def size_dist_to_xarray(loaded_ebas_file, step=2 ):
    filedata = loaded_ebas_file
    all_d = filedata.data[:,2:-1:step]
    time =  list(filedata.time_stamps)
    ds = xr.Dataset({'time':time})

    ds['sized'] = xr.DataArray(
        all_d, 
        dims={'time':ds['time'], 'd_index': np.arange(len(all_d[0,:]))}
    )
    return ds['sized']
```
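This function removes the flag values from the raw data and returns the result as an xarray `DataArray`. The slice `[:, 2:-1:step]` skips the first two columns and the last column and keeps every `step`-th column in between.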
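
The effect of the column slice can be checked on a small array. The following is a minimal sketch with a made-up layout (two leading time columns, then value and flag columns in turn, then one trailing column); the real column layout of the EBAS files is an assumption here and may differ.

```python
import numpy as np

# Toy stand-in for `filedata.data` (hypothetical layout and values):
# columns 0-1 are start/end times, then (value, flag) pairs, then one trailing column
raw = np.array([
    [0.0, 0.5, 110.0, 0.0, 220.0, 0.0, 330.0, 0.0, 9.0],
    [0.5, 1.0, 120.0, 0.0, 240.0, 0.0, 360.0, 0.0, 9.0],
])

# step=2 keeps every second column between the time columns and the last column,
# which in this layout are the measured values without the flags
values_step2 = raw[:, 2:-1:2]

# step=1 keeps every column between the time columns and the last column
values_step1 = raw[:, 2:-1:1]

print(values_step2)
print(values_step1.shape)
```

With `step=2` only the value columns survive, which is why the two EBAS datasets are read with different `step` values when their flag columns are arranged differently.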

### Processing data

Now the data stored in xarrays is converted to Pandas dataframes for easier joining, so that all three datasets are in the same format. Some of the data in the 2008-2009 dataset was clearly faulty, and this section also shows how I removed these erroneous data points. The data in the other datasets was usable as is.

The particle size distribution data doesn’t directly give the concentration of the particles. The data are given as a distribution over the logarithm of the particle diameter, so the unit of the 'concentrations' is actually:

$$\frac{dN_i}{d\log{D_p}}$$ 

where $N_i$ is the concentration of the particles and $D_p$ is the diameter of the particles. So when processing the data I need to calculate the real concentration from the measured data. For that I used a function called 'lognorm_to_concentration'. Its parameters are the dataframe with the cleaned data, the names of the first and last column, and the start and end date of the period. The two columns define the size range of the particles that I want to study (20-50nm). Now we can integrate over $\log_{10}{D_p}$ to get the total number concentration ($N_{tot}$), which is the return value of the function.

```python
# Get data for diameters within 20-50nm
def lognorm_to_concentration(data, start_col, end_col, start_date, end_date):
    df_sizes = data.loc[start_col:end_col,start_date:end_date].T

# Integrate over log10(Dp) to get Ntot
    Ntot = pd.Series(np.trapz(df_sizes,
    x = np.log10(df_sizes.columns)),
    index=df_sizes.index.copy())
    
    return Ntot
```
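
To make the integration step concrete, here is a minimal sketch of the same trapezoidal rule written out by hand for one time step. The diameters and $dN/d\log_{10}D_p$ values are hypothetical, and the helper name `integrate_number_concentration` is only used for this illustration.

```python
import numpy as np

# Hypothetical bin diameters (nm) and dN/dlog10(Dp) values (cm^-3) for one time step
diameters_nm = np.array([20.0, 25.0, 32.0, 40.0, 50.0])
dn_dlogdp = np.array([100.0, 150.0, 200.0, 150.0, 100.0])


def integrate_number_concentration(dn_dlogdp, diameters_nm):
    """Trapezoidal integral of dN/dlog10(Dp) over log10(Dp)."""
    log_dp = np.log10(diameters_nm)
    widths = np.diff(log_dp)                             # bin widths in log10 space
    heights = 0.5 * (dn_dlogdp[1:] + dn_dlogdp[:-1])     # mean of neighbouring values
    return float(np.sum(widths * heights))


n_tot = integrate_number_concentration(dn_dlogdp, diameters_nm)
print(f"Ntot between 20 and 50 nm: {n_tot:.1f} cm^-3")
```

A useful check is a flat distribution: if $dN/d\log_{10}D_p$ is 100 everywhere, the integral is simply $100 \cdot \log_{10}(50/20)$, because the integration variable is the logarithm of the diameter and not the diameter itself.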

Processing the first dataset:

In [5]:

# Turn arr1 to Pandas-dataframe for easier handling
df1 = arr1.to_dataframe().unstack('d_index')
cols = [x[1] for x in df1.columns.values]
# Transpose the dataframe for the further analysis
df1_turned = df1.T.loc['sized']

# Get data for 1 Mar 2000 - 31 Dec 2007 for diameters within 20-50nm
# Column names 1-5 represents diameters from 20-50nm 
Ntot_20_50_2000_2007 = lognorm_to_concentration(df1_turned,1,5,'2000-03-01 00:30:00','2007-12-31 19:29:59')

Processing the second dataset and removing incorrect data at the end of 2008:

In [None]:

# Turn arr2 to Pandas-dataframe for easier handling
df2 = arr2.to_dataframe().unstack('d_index')
cols2 = [y[1] for y in df2.columns.values]

# Masking the data to remove the overly large values which are not real

dd = df2.iloc[:,3:8].copy() # First selecting the right particle size range (20-50nm) into a new dataframe dd
mask = dd > 10000 # Creating a mask of values over 10000
dd[mask] = np.nan # Setting those values to NaN

# Plotting shows that at the end of 2008 there is a weird peak
dd.plot(figsize=(14, 6))
plt.ylabel('Particle size distribution')
plt.savefig('particle_size_08_09_not_converted.png')

![](particle_size_08_09_not_converted.png)

<center><font size=2.99> <b>Fig. 1:</b> First look at the second observational dataset, years 2008-2009.  </font> </center>

In [None]:

# Plotting only the weird subset to see better
weird_subset = dd["2008-12-01":"2008-12-24"]
weird_subset.plot(figsize=(14, 6))
plt.ylabel('Particle size distribution')
plt.savefig('particle_size_weird_subset.png')



![](particle_size_weird_subset.png)

<center><font size=2.99> <b>Fig. 2:</b> Closer look at the strange-looking section of the data at the end of 2008.  </font> </center>

In [None]:
# Setting the time when the weird subset happened to NaN 
dd["2008-12-01":"2008-12-24"] = np.nan

# Transpose the dataframe for the further analysis
dd_turned = dd.T.loc['sized']

# Get data for 1 Jan 2008 - 31 Dec 2009 for diameters within 20-50nm
# Column names 3-7 represents diameters from 20-50nm 
Ntot_20_50_2008_2009 = lognorm_to_concentration(dd_turned,3,7,'2008-01-01 00:30:00','2009-12-31 23:29:59')

Processing the third dataset:

In [9]:

# When parsing the header is also affected so we give it a new name, 'date'
data3.rename(columns={'0_0.1_0.2_0.3_0.4':'date'}, inplace = True)
# Set indices of the rows to date
data3 = data3.set_index('date')
# Remove last column
data3.drop(labels='0.6', axis=1, inplace=True) 

# Change all the incorrect values (-999) to NaN
data3 = data3.replace(-999,np.nan)

# Convert the column names to floats, so that they are the same as the diameters of the particles
data3.columns = [float(ii) for ii in data3.columns]

# Transpose the dataframe for the further analysis
new_d=data3.T
# Get data for 1 Jan 2010 - 31 Dec 2012 for diameters within 20-50nm
# Column names 20.0-50.238 represent diameters from 20-50nm
Ntot_20_50_2010_2013 = lognorm_to_concentration(new_d,20.0,50.238,'2010-01-01 00:00:00','2012-12-31 08:00:00')
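
The cleaning steps for the third dataset can be tried out on a small table. This is a minimal sketch with made-up diameters and values (only the fill value -999 and the column limits 20.0 and 50.238 come from the cell above).

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt of an hourly size distribution table;
# column names are diameters in nm, read from the CSV header as strings
table = pd.DataFrame({
    "15.8": [5.0, -999.0],
    "20.0": [100.0, -999.0],
    "31.6": [150.0, 140.0],
    "50.238": [90.0, 80.0],
    "79.4": [40.0, 30.0],
})

# Replace the fill value with NaN so that it does not enter any mean or integral
table = table.replace(-999, np.nan)

# Convert the column names to floats so that they can be sliced by diameter
table.columns = [float(name) for name in table.columns]

# Label-based slicing includes both end points: 20.0 up to and including 50.238
aitken = table.loc[:, 20.0:50.238]
print(aitken)
```

Slicing by label instead of by position keeps the selection readable, but it relies on the column names being sorted numbers, which is why the conversion to floats comes first.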


### Plotting data

When processing data, it’s good practice to look at the data as a sanity check to make sure everything is fine.

In [None]:
# Take a look at the data as a sanity check
Ntot_20_50_2000_2007.plot(figsize=(12,6))
plt.ylabel('Particle concentration')
plt.savefig('sanitycheck_1.png')

![](sanitycheck_1.png)

<center><font size=2.99> <b>Fig. 3:</b> Sanity check of the first part of observational data, years 2000 to 2008. </font> </center>

In [None]:
# Take a look at the data as a sanity check
Ntot_20_50_2008_2009.plot(figsize=(12,6))
plt.ylabel('Particle concentration')
plt.savefig('sanitycheck_2.png')

![](sanitycheck_2.png)

<center><font size=2.99> <b>Fig. 4:</b> Sanity check of the second part of observational data, years 2008 to 2010. </font> </center>

In [None]:
# Take a look at the data as a sanity check
Ntot_20_50_2010_2013.plot(figsize=(12,6))
plt.ylabel('Particle concentration')
plt.savefig('sanitycheck_3.png')

![](sanitycheck_3.png)

<center><font size=2.99> <b>Fig. 5:</b> Sanity check of the third part of observational data, years 2010 to 2013. </font> </center>

### Wind data from Zeppelin

Reading and cleaning the wind data is done much in the same way as the particle data before. 

In [13]:
# reading all the wind direction files together into a new array with pyaerocom
winddir = []
for filepath4 in glob.glob('ebas_winddir/*.nas'):
    filedata4 = pya.io.EbasNasaAmesFile(filepath4)
    winddir.append(filedata4.to_dataframe().wind_direction_deg)
    
# Sorting the previously acquired array by time
all_wind = winddir[0]
for data in winddir[1:]:
    all_wind = pd.concat([all_wind, data], axis=0)
all_wind.sort_index(inplace=True)    

To use the wind data as a mask to choose the correct days from the processed particle data, they need to have exactly the  same time frame. Here I created the filter mask to match the time of the particle data, and selected only the times when the wind came from between west and northwest. 

In [14]:
# Taking the hourly mean of the data.
# Wind data and concentration data need to be on the same time steps so that they can be used together
all_wind_avg = all_wind.resample('H').mean()

# Creating a mask for the wind directions. 
# Selecting only those times (hours) when wind comes from west or northwest
filter_mask = all_wind_avg.between(260, 350) 
# Concentration starts on 1st of March 2000, so the wind data is made to start at the same time
March_index = filter_mask.index > pd.to_datetime('2000-03-01 00:00:00')  
new_filter = filter_mask[March_index]
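
How such a mask is applied to the concentration series can be sketched on a few hours of made-up data. The values and the variable names `wind_dir_deg`, `concentration` and `west_sector` are hypothetical; the `reindex` step is one possible way to handle hours that are missing from the wind record and is not necessarily what the analysis above needs.

```python
import numpy as np
import pandas as pd

hours = pd.date_range("2000-03-01 00:00", periods=6, freq=pd.Timedelta(hours=1))

# Hypothetical hourly concentrations (cm^-3) on the full time axis
concentration = pd.Series([50.0, 60.0, 70.0, 80.0, 90.0, 100.0], index=hours)

# Hypothetical hourly wind directions (degrees); the first hour is missing
wind_dir_deg = pd.Series([300.0, 90.0, np.nan, 355.0, 260.0], index=hours[1:])

# True where the wind comes from the sector 260-350 degrees (NaN counts as False)
west_sector = wind_dir_deg.between(260, 350)

# Put the mask on the same index as the concentrations;
# hours without wind data are treated as "not from the west"
west_sector = west_sector.reindex(concentration.index, fill_value=False)

filtered = concentration[west_sector]
kept_fraction = west_sector.mean()
print(filtered)
print(f"Fraction of hours kept: {kept_fraction:.2f}")
```

The kept fraction is a quick way to see how much of the time series survives the filter before plotting it.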

## Reanalyzed model data

The reanalyzed model data was ERA-Interim from ECMWF, and it showed the sea ice extent in the Arctic from 1979 to 2012. Because my observational particle data was from Zeppelin station in Svalbard (78°54'29" N, 11°52'53" E), I needed to cut out an area near Svalbard from the model data.

### Reading and processing the model data

First, the reanalyzed data is read from the server and converted to xarray format, which makes it easier to handle later. In the original file the data is already averaged to monthly means.

In [15]:
DATA_DIR = '/home/notebook/shared-ns1000k/inputs/SEAICE_M/'
FILE = 'ci.mon.mean.nc'

In [16]:
# Loading the data from the given file path and returning it as an xarray DataArray
def load_seaice_xarray(filepath, shift_lons=True):
    import iris, xarray
    # Use the path that was passed in instead of the global DATA_DIR + FILE
    cube = iris.load_cube(filepath)
    if shift_lons:
        # Express the longitudes in the range -180..180
        cube = cube.intersection(longitude=(-180, 180))
    return xarray.DataArray.from_iris(cube)

In [None]:
seaice_xarr = load_seaice_xarray(DATA_DIR + FILE)

Next I define a subset of the data which contains only the area near Svalbard. The area selected in the code below is 76$^{\circ}$- 82$^{\circ}$N, -2$^{\circ}$- 20$^{\circ}$E. Since the sea ice extent changes most in the summer, I also wanted to look only at the summer months (from April to September). Here I also created a new dataframe which contains only the yearly mean of the summer months.

In [18]:
# Creating a new subset from the whole sea ice dataset (which covers the whole Arctic Ocean).
# The Svalbard subset contains only the sea ice extent near Svalbard
svalbard_subset = seaice_xarr.sel(latitude=slice(82, 76),longitude=slice(-2,20))
sea_ice_timeseries = svalbard_subset.mean(('latitude', 'longitude'))
df_seaice = sea_ice_timeseries.to_dataframe()
# .copy() so that a 'month' column can be added later without touching df_seaice
df_everymonth = df_seaice.loc['2000-03-01':'2012-12-31'].copy()

In [None]:
# Taking only the summer months (Apr-Sep) and creating a yearly mean out of them
df_everymonth['month'] = df_everymonth.index.month
mask = df_everymonth.month.between(4,9)
only_summer = df_everymonth[mask]
summer_mean = only_summer.resample('Y').mean()
df_everymonth.drop(['month'], axis=1, inplace=True)
summer_mean.drop(['month'], axis=1, inplace=True)
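
The summer averaging can be illustrated on a synthetic monthly series. This is a minimal sketch: the seasonal cycle below is invented and does not come from ERA-Interim, and the yearly grouping uses `groupby` on the year instead of `resample('Y')`, which gives the same yearly means but labels them with the year number.

```python
import numpy as np
import pandas as pd

# Two years of synthetic monthly sea ice cover (0-1) with a maximum in March
months = pd.date_range("2000-01-01", periods=24, freq="MS")
month_numbers = months.month.to_numpy()
ice_cover = pd.Series(
    0.3 + 0.2 * np.cos(2 * np.pi * (month_numbers - 3) / 12),
    index=months,
)

# Keep only April to September
is_summer = np.isin(month_numbers, [4, 5, 6, 7, 8, 9])
summer = ice_cover[is_summer]

# One mean value per year over the selected months
summer_mean = summer.groupby(summer.index.year).mean()
print(summer_mean)
```

Because the selection removes the months with the most ice, the summer mean is lower than the mean over the full year, which is also what the orange line in the later sea ice figure reflects.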

# Results and discussion

The main results and their discussion are presented in this section. First I go through the observational particle data and what we can see from it. After that I show the main results from the model data and then discuss how we can compare the two. Last, in [Section 4.1](#Discussion-for-the-wind-data) I explain why I tried to use the observational wind data from Zeppelin and why it didn't work as I planned.

As I mentioned in [Section 3.1](#Observational-data), the observational data was in three datasets which had to be read separately. These datasets had different time steps, so I resampled the first and second dataset to hourly means to match the third one, which had the longest time step. I then concatenated the three datasets into one and plotted it.

In [None]:
# Creating a variable which contains all of the particle concentration data
part1 = Ntot_20_50_2000_2007.resample('H').mean()
part2 = Ntot_20_50_2008_2009.resample('H').mean()
part3 = Ntot_20_50_2010_2013

frames = [part1,part2,part3]
Ntot_20_50_ALL = pd.concat(frames)
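
What the resampling and concatenation do can be seen on a few made-up values. This is a minimal sketch with hypothetical numbers; the time step is passed as a `pd.Timedelta` here, which is equivalent to the hourly string alias used above.

```python
import pandas as pd

# Hypothetical half-hourly concentrations at the end of one dataset
half_hourly_index = pd.date_range(
    "2009-12-31 22:00", periods=4, freq=pd.Timedelta(minutes=30)
)
half_hourly = pd.Series([10.0, 20.0, 30.0, 50.0], index=half_hourly_index)

# Hypothetical hourly concentrations at the start of the next dataset
hourly_index = pd.date_range(
    "2010-01-01 00:00", periods=2, freq=pd.Timedelta(hours=1)
)
hourly = pd.Series([70.0, 90.0], index=hourly_index)

# Average the two half-hour values that fall into each hour
to_hourly = half_hourly.resample(pd.Timedelta(hours=1)).mean()

# Join the two series into one continuous hourly series
combined = pd.concat([to_hourly, hourly])
print(combined)
```

Each hourly value of the first series is the mean of the two half-hourly values in that hour, so both parts of the combined series describe the same averaging interval.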

In [None]:
# Plotting the final particle concentration plot
plt.figure(1,figsize=[16,8])
plt.plot(Ntot_20_50_ALL.resample('M').mean(),color='g', linewidth = 2)
plt.xlim('2000','2013')
plt.xlabel('Date')
plt.ylabel('Concentration ($cm^{-3}$)')
plt.title('Monthly mean of particle concentration for 20-50nm at Zeppelin station (2000-2012)')
plt.savefig('particle_cons_month_mean.png')

![](particle_cons_month_mean.png)

<center><font size=2.99> <b>Fig. 6:</b> Time series of the monthly mean of particle concentration for 20-50 nm sized particles at Zeppelin station (2000-2012). The y axis shows the concentration (cm$^{-3}$) and the x axis the date.</font> </center>

Fig. 6 shows the monthly mean of the total particle concentration at Zeppelin station from 2000 to 2012. The yearly concentration peak is always in spring and summer, and the minimum is in winter. This is because new particle formation in the Arctic needs solar radiation to occur. In winter there is no sunlight because of the polar night, so there is no new particle formation either. The plot also has small gaps, for example at the end of 2002, because observational data is never perfect and the instruments can fail from time to time. Figure 6 also shows that the particle concentration can vary quite a lot between years. The concentration is largest in 2000, 2001, 2011 and 2012. On the other hand, the clear minimum of the concentration is in 2008 and 2009.

From the modelled data it was possible to plot the sea ice extent in monthly means as well, like the particle concentrations. With the monthly means, the variation between winter and summer is easier to see. The extent is always at its lowest in summer and highest in winter. To compare the ice extent and the particle concentration we need to look at the summer sea ice and the peaks in concentration since, as mentioned previously, new particle formation happens only in summer. Now that both main datasets cover the same period and are both averaged by month, comparing them is more straightforward.

In [None]:
# Plotting the monthly average sea ice extent and yearly summer-month mean together
plt.figure(1, figsize=[16,8])

ax = plt.subplot(1, 1, 1)
df_everymonth.plot(ax=ax,linewidth=2)
summer_mean.plot(ax=ax,linewidth=2.5)
plt.xlim('2000','2013')
plt.xlabel('Date',fontsize=16)
plt.ylabel('Sea ice extent (0-1)',fontsize=16)
leg=ax.legend(["Monthly average","Yearly mean over April - September"],fontsize=14)
plt.title('Sea ice coverage near Svalbard (2000-2012)',fontsize=16)
plt.savefig('seaice_extent_2000_2012.png')


![](seaice_extent_2000_2012.png)

<center><font size=2.99> <b>Fig. 7:</b> Sea ice coverage near Svalbard in 2000-2012. The blue line shows the monthly average of the ice extent and the orange line shows the yearly mean over the summer months, which in this case are April to September. The y axis shows the ice extent, which is unitless and ranges from 0 to 1. The x axis shows the date. </font> </center>

Fig. 7 shows a time series of the sea ice coverage near Svalbard (the coordinates of the area are given in [section 3.2.1](#Reading-and-processing-the-model-data)) and how it varies. The yearly mean over the summer months is especially interesting since we want to see if the particle formation correlates with the ice extent in summer. The largest extent is always in winter and the smallest in summer. From Fig. 7 one can see that in 2001, 2002, 2004, 2011 and 2012 the summer sea ice is lower than in other years, with an extent of only around 0.05 or even lower. The largest summer ice extent is in 2007, 2008 and 2009.

Comparing these main results, we can notice some interesting things. The clearest is that in Fig. 6 the years where the particle concentration was lowest were 2008 and 2009. These are the same years when the summer sea ice extent was highest (see both the blue and the orange line in Fig. 7). From 2008 on the concentration of the particles has been rising steadily, while the summer ice extent has been decreasing. On the other hand, the years with less sea ice in summer (2001, 2002, 2004, 2011, 2012) only partly match the years with the high concentration peaks (2000, 2001, 2011 and 2012).
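
The comparison in this report is done by eye from the two figures. One possible way to put a number on it would be a correlation coefficient between yearly summer means. The sketch below only shows the mechanics: the five value pairs are synthetic and are not taken from the Zeppelin or ERA-Interim data.

```python
import numpy as np

# Synthetic yearly summer means, for illustration only
summer_ice_cover = np.array([0.10, 0.05, 0.06, 0.20, 0.25])            # unitless, 0-1
summer_concentration = np.array([120.0, 150.0, 140.0, 60.0, 50.0])     # cm^-3

# Pearson correlation coefficient between the two yearly series
r = np.corrcoef(summer_ice_cover, summer_concentration)[0, 1]
print(f"Pearson r = {r:.2f}")
```

A value of r close to -1 would mean that years with more summer ice tend to have lower concentrations. With only 13 years of data such a number would still have to be interpreted carefully.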

With the model data we can have a closer look at the sea ice extent with maps of the Svalbard area. Even though the sea ice extent is at its lowest at the end of summer in September, I chose to concentrate on July since that is the month when most of the new particle formation happens (Dall'Osto et al. 2018).

In [None]:
# Creating variables of different years, which have the same selected month in every year.
# The time axis is monthly and starts in January 1979, so index 258 is July 2000
# and every step of 12 is one year. The variables named '_jun' hold the July fields.

selected_years_jun = seaice_xarr.isel(time=[258,270,282,294,306,318,330,342,354,366,378,390,402]).sel(latitude=slice(82,75),longitude=slice(-5,30))
selected_years_sep = seaice_xarr.isel(time=[272,284,296,308,320,332,344,356,368,380,392,404]).sel(latitude=slice(82,75),longitude=slice(-5,30))
selected_years_may = seaice_xarr.isel(time=[268,280,292,304,316,328,340,352,364,376,388,400]).sel(latitude=slice(82,75),longitude=slice(-10,30))
# July 2000, 2001, 2002, 2004, 2011 and 2012, the years shown in Fig. 9
selected_years_jun_min = seaice_xarr.isel(time=[258,270,282,306,390,402]).sel(latitude=slice(82,75),longitude=slice(-5,30))
# July 2008 and 2009, the years shown in Fig. 8
selected_years_jun_max = seaice_xarr.isel(time=[354,366]).sel(latitude=slice(82,75),longitude=slice(-5,30))
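
The integer positions used above follow from simple arithmetic, assuming that the monthly time axis starts in January 1979. A minimal sketch, with a hypothetical helper `monthly_index`:

```python
def monthly_index(year, month, first_year=1979):
    """Zero-based position of (year, month) in a monthly series
    that starts in January of `first_year`."""
    return (year - first_year) * 12 + (month - 1)


# Positions of July for every year from 2000 to 2012
july_indices = [monthly_index(year, 7) for year in range(2000, 2013)]
print(july_indices)

# Positions of July 2008 and July 2009
print(monthly_index(2008, 7), monthly_index(2009, 7))
```

Computing the positions this way makes it easier to check which month and year a hard-coded index refers to.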

In [None]:
# Plotting together sea ice extent maps of all the years with the same month
extent = [-5, 30, 75, 82]
proj_plot = ccrs.NorthPolarStereo(central_longitude=12)

p = selected_years_jun_max.plot(x='longitude', y='latitude', transform=ccrs.PlateCarree(),
              subplot_kws={"projection": proj_plot},
              col='time', col_wrap=2, robust=True, cmap='viridis')

# Adding coastlines, gridlines and a month-year title to every map
for ax,i in zip(p.axes.flat,selected_years_jun_max.time.values):
    ax.set_extent(extent)
    ax.coastlines(resolution='50m')
    ax.gridlines(draw_labels=True)
    ax.set_title(pd.to_datetime(str(i)).strftime("%B %Y"), fontsize=18, y=-0.35)

# Saving the figure under the file name used for Fig. 8
plt.savefig('seaice_maps_max.png')

![](seaice_maps_max.png)

<center><font size=2.99> <b>Fig. 8:</b> Sea ice extent maps from the area near Svalbard from July 2008 and July 2009. Coordinates of the area are 75$^{\circ}$- 82$^{\circ}$N, -5$^{\circ}$- 30$^{\circ}$E. Colorbar shows the sea ice cover. </font> </center>

The best connection between the summer sea ice and the particle concentration could be seen in 2008 and 2009. With the maps in Fig. 8, the amount of sea ice can be understood better than from Fig. 7. One can see that in these years the sea ice north of Svalbard is uniform over the Fram Strait and quite close to the west side of Svalbard. In 2008 the ice extends all the way to the shore near Ny-Ålesund. In these years the large amount of sea ice may have led to the lower particle concentration seen in Fig. 6.

In [None]:
# Plotting together sea ice extent maps of all the years with the same month
extent = [-5, 30, 75, 82]
proj_plot = ccrs.NorthPolarStereo(central_longitude=12)

# Plotting the years with low sea ice (the loop below uses the same variable)
p = selected_years_jun_min.plot(x='longitude', y='latitude', transform=ccrs.PlateCarree(),
              subplot_kws={"projection": proj_plot},
              col='time', col_wrap=2, robust=True, cmap='viridis')

# Adding coastlines, gridlines and a month-year title to every map
for ax,i in zip(p.axes.flat,selected_years_jun_min.time.values):
    ax.set_extent(extent)
    ax.coastlines(resolution='50m')
    ax.gridlines(draw_labels=True)
    ax.set_title(pd.to_datetime(str(i)).strftime("%B %Y"), fontsize=18, y=-0.35)

# Saving the figure under the file name used for Fig. 9
plt.savefig('seaice_maps_min.png')

![](seaice_maps_min.png)

<center><font size=2.99> <b>Fig. 9:</b> Sea ice extent maps from the area near Svalbard from July 2000, 2001, 2002, 2004, 2011 and 2012. Coordinates of the area are 75$^{\circ}$- 82$^{\circ}$N, -5$^{\circ}$- 30$^{\circ}$E. Colorbar shows the sea ice cover. </font> </center>

The years of the sea ice minima and the highest concentration peaks were not as easy to connect as the opposite case. The particle concentration was highest in 2000, 2001, 2011 and 2012, and the sea ice was at its lowest in 2001, 2002, 2004, 2011 and 2012. The years 2001, 2011 and 2012 seemed to match quite well: in those years the particle concentrations were high, and the maps show that the ice was far away from the west coast of Svalbard. In these years there was also much more open water north of Svalbard than in 2008 and 2009, when the ice extent was at its highest. The question is why in 2002 and 2004 the summer sea ice extent is really low according to Fig. 7 but the particle concentration (Fig. 6) is not remarkably higher than in other years. One reason could be that in the two maps for 2002 and 2004 in Fig. 9 there is some ice present really close to the west side of Svalbard, and so really close to Ny-Ålesund. Although the total sea ice extent of the area is smaller than in other years, the ice in this small area near the coast may matter more. The year 2000 is also different: the sea ice extent is quite high (Fig. 7 and 9) but the particle concentrations are the highest in the whole time series. With these datasets I can't explain it.

### Discussion for the wind data

As mentioned in [Section 3.1](#Observational-data) there was also wind data available for the time period I was concentrating on. The processing of the wind data was shown in [Section 3.1.4](#Wind-data-from-Zeppelin). There I made a mask of the hours when the wind was coming from the west or north-west, which could be applied to the particle concentration data. The plan was to see if the wind made a difference to the time series, and if the connection between the sea ice extent and the particle concentration measured at Zeppelin could be seen more clearly. However, this didn't work as planned. When the wind direction mask was applied to the particle data, almost half of the time series disappeared (Fig. 10). From this we see that the wind has not often been from the west or north-west. Due to this less than optimal result I decided not to use the filtered data.

In [None]:
# Plotting the particle concentration with the westwind-filter
# west_filter is assumed to be the wind mask from Section 3.1.4 (new_filter),
# aligned with the concentration index; hours without wind data are left out
west_filter = new_filter.reindex(Ntot_20_50_ALL.index, fill_value=False)
plt.figure(1,figsize=[18,10])
plt.plot(Ntot_20_50_ALL[west_filter].resample('M').mean(),linewidth=3)
plt.savefig('particle_cons_month_mean_westwind.png')
plt.xlim('2000','2013')
plt.ylabel('Concentration ($cm^{-3}$)',fontsize=24)
plt.xlabel('Date',fontsize=24)
plt.title('Monthly mean of particle concentration for 20-50nm at Zeppelin station (2000-2012) with the westwind-filter',fontsize=22)
plt.savefig('filtered_particle_cons_month_mean.png',bbox_inches = 'tight',pad_inches=0.1)

![](filtered_particle_cons_month_mean.png)

<center><font size=2.99> <b>Fig. 10:</b> Time series of the monthly mean of particle concentration for 20-50nm sized particles at Zeppelin station with the westwind-filter. The y axis shows the concentration (cm$^{-3}$) and the x axis the date.  </font> </center>

# Conclusions and outlook

New particle formation refers to particles in the size range 1-10nm, and Aitken size to 10-50nm. The observational particle data was available only from 20 nm upwards. With this data I couldn't look directly at the small particles formed by new particle formation, so I decided to concentrate on the Aitken size. In this report we therefore assume that newly formed particles grow straight to bigger sizes. With this assumption it is possible to connect sea ice loss, via new particle formation, to the concentration of Aitken-size particles.

In this study the aim was to see if there is a clearly noticeable connection between the total concentration of Aitken-size particles measured at Zeppelin and the sea ice extent near Svalbard. When drawing conclusions from the data we need to take into consideration that the local sea ice coverage in Kongsfjorden, closest to Ny-Ålesund, probably affects the particle formation the most. I tried to correlate the sea ice extent of the larger marine area with the particle concentration data, but the spatial resolution of the model is not high enough to show the details of the area. Because of this I also tried to use the wind direction data to see if the wind could have an effect on the particle concentrations. In the end I realised that wind direction alone is not enough for an analysis and that it would need to be combined with the wind speed. Due to time limitations this is left to future studies.



I found out that the connection can be seen, as Dall'Osto et al. (2018) claimed, but it is not as clear as I hoped it would be. The relation between the sea ice extent from the ERA-Interim model and the particle concentration observed at Zeppelin station can be seen better when there has been more sea ice than usual (years 2008 and 2009). At that time the particle concentration was low as the sea ice extent was high. In turn, when the sea ice coverage was low, the expectation was that the particle concentration would be high. This was true in some of the years (2001, 2011 and 2012), but not in others. That is probably because many other factors also affect the particle concentrations, not only the ice extent, and this study didn't take them into account.
With the data used and the results derived from it, it can still be said that sea ice loss is going to affect the concentrations of Aitken-size particles.

This study could be improved with a longer time series. The problem, however, is that the observational data was not really good after 2012 due to many data gaps. For the future it would also be a good idea to look into different models and how they differ. For example, one could compare the ERA-Interim data used here with the newer ERA5, which claims to have more consistent sea surface temperature and sea ice.

# Acknowledgements

I would like to thank Lisa Beck, the assistant of group 1. She helped me to find this topic and this report was made under her supervision. Also thank you to Luis Santos who was a great group member and help. All the other teachers and assistants were also a great help to me, especially for teaching Python. 

# References

Dall'Osto, M., Geels, C., Beddows, D. C. S., Boertmann, D., Lange, R., Nøjgaard, J. K., ... & Massling, A. (2018). Regions of open water and melting sea ice drive new particle formation in North East Greenland. Scientific Reports, 8(1), 6109.

Freud, E., Krejci, R., Tunved, P., Leaitch, R., Nguyen, Q. T., Massling, A., ... & Barrie, L. (2017). Pan-Arctic aerosol number size distributions: seasonality and transport patterns. Atmospheric Chemistry and Physics, 17(13), 8101-8128.